cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
natalie_
Level V

Normal Distributions and Transformations

Hi Everyone,

 

I have some measured data and when I try a continuous normal fit, I can see that my data is not normal.  However, I can see from the Goodness-of-Fit Test that the data is from the Johnson Su distribution.

 

This distribution has two shape, one location and one scale parameter.  From my research online, I can see how to calculate variance from these parameters and from that the standard deviation.  I used Excel to calculate that, but is there a way in JMP to do this?  From my understanding, the Summary Statics table from the "Distributions" analysis calculates these statistics assuming the data is from the normal distribution.

 

Thanks in advance!

 

Natalie

46 REPLIES 46
Reinaldo
Level IV

Re: Normal Distributions and Transformations

Of course the script should be run separately as I posted before. I just enclosed the single file (Test.jmp) for convenience. Shouldn't your script record the normal data after Johnson SI transformation in the third column called Y (in that data table file) in order to run it and see its normal shape?

The issue is that there is no command line in your script to save the normal data in that data table. That's why I have been asked you about it.

If you look one of those split files and run your script, the third column Y is still in blank. I thought each split data table could be completed by the normal data after Johnson SI transformation in the third column Y because after that I would like to combine them again into one file to proceed my stats analysis.

 

Timepoint 1:

JohnsonMean = -3.80945275324507e-15;

Col Mean(:X) = -7.4569375;

Gettrans("X", "Johnson Sl Transform X By Timepoint", JohnsonMean, myFormula) = -10.7617217296089;

 

Timepoint 2:

JohnsonMean = 7.27196081129478e-15;

Col Mean(:X) = 3.257375;

Gettrans("X", "Johnson Sl Transform X By Timepoint", JohnsonMean, myFormula) = 1.87687074670016;

 

Thank you for your help.

~Rei
txnelson
Super User

Re: Normal Distributions and Transformations

The column called "Johnson Sl Transform X By Timepoint" is the transformed data.  It is this column that contains the normalized data.  You would run your analyses based upon this column.  

Therefore, when you run an anlysis to determine if there is a difference between Timepoint1 and Timepoint2 data, you would run it on the transformed data column.  Now the issue is, that it will report values in the scale of the transformed data.  Thus, in order to relate it back to the original column X, you run the script, entering in the data point value from the analysis, and getting back, what the value in terms of X.

Jim
THack501
Level I

Re: Normal Distributions and Transformations

Hi txnelson,

I have a question about the program/script you linked. It will reverse a johnson transormation easily, but it does not work for other transformations. In particular, the logNormal Density transformation. Would you mind if I PM you to see if there is a way the script could be modified to work?
txnelson
Super User

Re: Normal Distributions and Transformations

I don't understand what the issue is?  Since all the code is doing is taking taking values from the original column and passing them through the formula until the result is equal to the target value from the transformed column(typically the mean of the new distribution).  I reworked the script from my previous responses, but this time, using the Log Normal Density formula determined by the Distribution platform, and the code seems to work for me

Names Default to Here(1);
dt=New Table( "Test",
	Add Rows( 100 ),
	New Column("PNP3", 
		Numeric,
		"Continuous",
		Format( "Best", 10 ),
		Set Values(
			[130.378809500886, 132.736937590235, 136.831952164704, 136.969154239163,
			136.622623419893, 137.480356512481, 138.094011176166, 142.10586919757,
			134.750041237898, 129.719685093372, 136.520172900026, 134.778624216716,
			136.991131754497, 140.420940666186, 143.257827228986, 132.213765487819,
			144.671172276421, 134.703176752174, 136.744212473486, 137.187637284245,
			139.788226018039, 139.372617199359, 136.112484488985, 142.809097034809,
			137.799370430979, 138.482493403385, 135.236982575217, 136.251777781494,
			130.768356509183, 138.248554728086, 139.77894292052, 134.25405167366,
			147.680687116943, 131.351711991517, 132.84274608728, 129.925216236015,
			133.47206414316, 143.339607103893, 145.341236691691, 139.200187547183,
			142.775409342827, 140.276696563388, 130.623979847275, 140.899814366103,
			136.839389290019, 137.239125319, 133.5265281641, 139.356927352471,
			130.278640163464, 144.604061001983, 135.286715550332, 134.465744849174,
			131.37612790407, 131.830655714309, 140.69724979219, 142.88152043774,
			135.253839945611, 127.349434776131, 129.499730399113, 128.447533754611,
			130.916853702805, 134.599575929218, 140.761701916093, 136.870661473033,
			138.253066182015, 140.403627077024, 134.522643679098, 124.842978178703,
			131.803059455053, 125.886786494664, 133.013566701611, 136.940299158936,
			133.263913979648, 144.695171321015, 149.541020434399, 144.503845528521,
			136.086324063453, 139.530943158798, 138.421460162451, 133.180943784947,
			142.166796633818, 142.676541730822, 135.723943440339, 143.957996114985,
			145.712158558794, 138.38937502716, 140.535304753392, 142.140619481175,
			131.379414509756, 144.949299702964, 133.349854687882, 139.470639804255,
			140.160558367008, 137.130662662612, 145.692632562344, 131.870848869966,
			136.391733804566, 134.219740661271, 139.021827550389, 147.958038547157]
		)
	),
	New Column( "LogNormal Density PNP3",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Formula(
			Normal Density( Log( :PNP3 ), 4.92072961984847, 0.0370684539915594 ) / :PNP3
		)
	)
);

/***********************************************************************/
/*                                                                     */
/* The getformula column retrieves the formula from the translation    */
/* column and replaces the Original Column name in the formula with    */
/* string "__value__".  The value of this variable is what will be     */
/* evaluated in the successive approximations done by the script.      */
/*                                                                     */
/***********************************************************************/

getformula = Function( {ColName, FormulaColName},
	{ColName, FormulaColName, TheFormula, coloncolname}, 
	//__value__ = .;
	// Get the transformed data columns formula as a literal string
	TheFormula=Column( FormulaColName ) << Get Formula ;
	
	// Check to see that a formula was found
	If( Is Empty( TheFormula ) == 1 ,
		Dialog(
			"   The column specified as",
			"the Transformed Column does",
			"      not contain a formula. ",
			" ",
			"    Please rerun and select",
			"       the correct column"
		);
		Throw();
	);
	TheFormula=char( Column( FormulaColName ) << Get Formula );

	// Get the actual name of the orignal column since
	// the upper,lower case and spacing is critical in determining
	// where in the formula the column name actually occurs
	ColName = Column( ColName ) << Get Name;
	

	// Determine if the reference to the column name in the 
	// formula is a simple :colname reference or a complex
	// reference :Name(\!"colname\!")
	// If the column name isn't found set the return code to -1
	If(
		Contains( TheFormula, ":" || ColName ), ColonColName = ":" || ColName, // Else
		Contains( TheFormula, ":Name(\!"" || ColName || "\!")" ), ColonColName = ":Name(\!"" || ColName || "\!")", // Else
		rc = -1
	);
		
	// Replace all of the column references in the formula with
	// the string "(__value__)" so that when the formula is 
	// evaluated later, it will take the then value of the memory
	// variable called __value__ and use it in the formula
	If( Contains( TheFormula, ColonColName ) > 0,
		While( Contains( TheFormula, ColonColName ) > 0, TheFormula = Munger( TheFormula, 1, ColonColName, "(__Value__)" ) ),
		Dialog(
			"   The column specified as",
			"the Transformed Column does",
			"     not contain a reference",
			"      to the original column.",
			"           in it's formula.",
			" ",
			"    Please rerun and select",
			"       the correct column"
		);
		Throw();
	);
	TheFormula;
);

/***********************************************************************/
/*                                                                     */
/* The gettrans function evaluates the formula in the transformed      */
/* and converts the formula into a generic form for repeated use in    */
/* running of the script.                                              */
/*                                                                     */
/***********************************************************************/

gettrans = Function( {ColName, FormulaColName, TheTarget, Theformula},
//colname="PNP3";formulacolname="Johnson Sl Transform PNP3"; Thetarget=johnsonmean;theformula=myformula;
	{ColName, FormulaColName, TheFormula, High, Low, TheTarget, TheMax, Themin, __value__}, 

	// The program uses successive approximations to determine the different 
	// parametrics.  The way it works is that it calculates the needed parameter
	// such as Mean, or Standard Deviation, and then by using successive 
	// approximations from the original column's values, and passing those
	// values through the columns formula, when the approximation value matches
	// the calculated value from the transformed column, the retransformed value
	// has been found

	// Set the extreem values
	High = Col Maximum( If( Excluded( Row State( Empty() ) ) == 0, Column( ColName ), . ) );
	Low = Col Minimum( If( Excluded( Row State( Empty() ) ) == 0, Column( ColName ), . ) );
	
	// Make a guess at the first value 
	__value__ = Mean( High, Low );

	// Iterate the guessing for up to 100 times, adjusting by 1/2 on each loop
	For( i = 1, i <= 100, i++,
		TheResult = Eval( Parse( theformula ) );

		If(
			TheResult > TheTarget, High = __value__,
			TheResult < TheTarget, Low = __value__,
			Break()
		);
		If( High == Low, Break() );
		__value__ = Mean( High, Low );
	);
	
	__value__; // Expose the return value
); // End of function gettrans

myFormula = getformula( "PNP3","LogNormal Density PNP3" );

dt<<run formulas;

show(
	colMean(:PNP3),
	Col Mean(dt:LogNormal Density PNP3),
	Gettrans(
		"PNP3", // The original column
		"LogNormal Density PNP3", // The Formula Column
		colMean(:LogNormal Density PNP3), // The Target Value
		myFormula // The formula
	));
Jim
THack501
Level I

Re: Normal Distributions and Transformations

Perhaps it is my dataset then? In short, I was picking a value from my data set and pulling out the lognormal transformed value for testing/validating the code. My data set is below. A value of 7 corresponds to a transformed value of 0.112832236739791, therefore I should be able to use the transformed value as the target in the code and the script should spit out 7. Correct?

 

 

Names Default to Here(1);
dt=New Table( "Test",
	Add Rows( 30 ),
	New Column("PNP3", 
		Numeric,
		"Continuous",
		Format( "Best", 10 ),
		Set Values(
			[12, 4, 12, 13, 7, 5, 5, 15, 5, 5, 5, 15, 7, 7, 16, 5, 9, 11, 18, 8, 16, 12, 7, 14, 6, 11, 7, 10, 14, 10]
		)
	),
	New Column( "LogNormal Density PNP3",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Formula(
			Normal Density( Log( :PNP3 ), 2.17970974904887, 0.43804625261609 ) / :PNP3
		)
	)
);

/***********************************************************************/
/*                                                                     */
/* The getformula column retrieves the formula from the translation    */
/* column and replaces the Original Column name in the formula with    */
/* string "__value__".  The value of this variable is what will be     */
/* evaluated in the successive approximations done by the script.      */
/*                                                                     */
/***********************************************************************/

getformula = Function( {ColName, FormulaColName},
	{ColName, FormulaColName, TheFormula, coloncolname}, 
	//__value__ = .;
	// Get the transformed data columns formula as a literal string
	TheFormula=Column( FormulaColName ) << Get Formula ;
	
	// Check to see that a formula was found
	If( Is Empty( TheFormula ) == 1 ,
		Dialog(
			"   The column specified as",
			"the Transformed Column does",
			"      not contain a formula. ",
			" ",
			"    Please rerun and select",
			"       the correct column"
		);
		Throw();
	);
	TheFormula=char( Column( FormulaColName ) << Get Formula );

	// Get the actual name of the orignal column since
	// the upper,lower case and spacing is critical in determining
	// where in the formula the column name actually occurs
	ColName = Column( ColName ) << Get Name;
	

	// Determine if the reference to the column name in the 
	// formula is a simple :colname reference or a complex
	// reference :Name(\!"colname\!")
	// If the column name isn't found set the return code to -1
	If(
		Contains( TheFormula, ":" || ColName ), ColonColName = ":" || ColName, // Else
		Contains( TheFormula, ":Name(\!"" || ColName || "\!")" ), ColonColName = ":Name(\!"" || ColName || "\!")", // Else
		rc = -1
	);
		
	// Replace all of the column references in the formula with
	// the string "(__value__)" so that when the formula is 
	// evaluated later, it will take the then value of the memory
	// variable called __value__ and use it in the formula
	If( Contains( TheFormula, ColonColName ) > 0,
		While( Contains( TheFormula, ColonColName ) > 0, TheFormula = Munger( TheFormula, 1, ColonColName, "(__Value__)" ) ),
		Dialog(
			"   The column specified as",
			"the Transformed Column does",
			"     not contain a reference",
			"      to the original column.",
			"           in it's formula.",
			" ",
			"    Please rerun and select",
			"       the correct column"
		);
		Throw();
	);
	TheFormula;
);

/***********************************************************************/
/*                                                                     */
/* The gettrans function evaluates the formula in the transformed      */
/* and converts the formula into a generic form for repeated use in    */
/* running of the script.                                              */
/*                                                                     */
/***********************************************************************/

gettrans = Function( {ColName, FormulaColName, TheTarget, Theformula},
//colname="PNP3";formulacolname="Johnson Sl Transform PNP3"; Thetarget=johnsonmean;theformula=myformula;
	{ColName, FormulaColName, TheFormula, High, Low, TheTarget, TheMax, Themin, __value__}, 

	// The program uses successive approximations to determine the different 
	// parametrics.  The way it works is that it calculates the needed parameter
	// such as Mean, or Standard Deviation, and then by using successive 
	// approximations from the original column's values, and passing those
	// values through the columns formula, when the approximation value matches
	// the calculated value from the transformed column, the retransformed value
	// has been found

	// Set the extreem values
	High = Col Maximum( If( Excluded( Row State( Empty() ) ) == 0, Column( ColName ), . ) );
	Low = Col Minimum( If( Excluded( Row State( Empty() ) ) == 0, Column( ColName ), . ) );
	
	// Make a guess at the first value 
	__value__ = Mean( High, Low );

	// Iterate the guessing for up to 100 times, adjusting by 1/2 on each loop
	For( i = 1, i <= 100, i++,
		TheResult = Eval( Parse( theformula ) );

		If(
			TheResult > TheTarget, High = __value__,
			TheResult < TheTarget, Low = __value__,
			Break()
		);
		If( High == Low, Break() );
		__value__ = Mean( High, Low );
	);
	
	__value__; // Expose the return value
); // End of function gettrans

myFormula = getformula( "PNP3","LogNormal Density PNP3" );

MatchValue = 7;

TargetValue = 0.112832236739791;

dt<<run formulas;

show(
	MatchValue,
	TargetValue,
	Gettrans(
		"PNP3", // The original column
		"LogNormal Density PNP3", // The Formula Column
		0.112832236739791, // The Target Value
		myFormula // The formula
	));

 

After running the code, here is the log output

 

MatchValue = 7;
TargetValue = 0.112832236739791;
Gettrans("PNP3", "LogNormal Density PNP3", 0.112832236739791, myFormula) = 18;

 

The output of the Gettrans function should be 7. The value of 18 that the code is outputting corresponds to a transformed value of 0.0135701784925184.

txnelson
Super User

Re: Normal Distributions and Transformations

The script will not work for the Denisity Transformation.  The function produces multiple values for the same density value and the code can not handle that.  See graph below

density.PNG

 

Jim
THack501
Level I

Re: Normal Distributions and Transformations

Thank you for the reply, txnelson. I didn't think of that and I wouldn't have expected that to be the case. I am wondering what the value of that transformation is then if it makes it impossible to reverse the transformation with accuracy. If I develop a model off of the transformed data, I need to know what the output of that model is in the original data form. It seems silly, then, that JMP would even include this transformation as an option. Kudos on the original script. I will save it for future use as needed.