Normal Distributions and Transformations

natalie_

Community Trekker

Joined:

Jan 6, 2016

Hi Everyone,

 

I have some measured data, and when I try a continuous normal fit, I can see that my data is not normal.  However, the Goodness-of-Fit Test indicates that the data is consistent with the Johnson Su distribution.

 

This distribution has two shape parameters, one location parameter, and one scale parameter.  From my research online, I can see how to calculate the variance from these parameters, and from that the standard deviation.  I used Excel to calculate that, but is there a way to do this in JMP?  From my understanding, the Summary Statistics table from the "Distributions" analysis calculates these statistics assuming the data is from the normal distribution.

 

Thanks in advance!

 

Natalie
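For reference, the calculation from the fitted parameters described above can be sketched outside JMP as follows. This is only a sketch: it assumes the common parameterization z = gamma + delta * asinh((x - theta) / sigma), with z standard normal, and the function name `johnson_su_mean_sd` and the example parameter values are made up for illustration. Check that the parameterization matches the one your fit reports before relying on it.

```python
import math


def johnson_su_mean_sd(gamma, delta, theta, sigma):
    """Mean and standard deviation of a Johnson Su distribution.

    Assumed parameterization: z = gamma + delta * asinh((x - theta) / sigma),
    where z is standard normal. gamma and delta are the shape parameters,
    theta is the location and sigma is the scale.
    """
    w = math.exp(1.0 / delta**2)
    omega = gamma / delta
    mean = theta - sigma * math.sqrt(w) * math.sinh(omega)
    variance = 0.5 * sigma**2 * (w - 1.0) * (w * math.cosh(2.0 * omega) + 1.0)
    return mean, math.sqrt(variance)


# Hypothetical fitted parameters, for illustration only
mean, sd = johnson_su_mean_sd(gamma=-1.0, delta=1.5, theta=10.0, sigma=2.0)
print(f"mean = {mean:.4f}, sd = {sd:.4f}")
```

This gives the standard deviation implied by the fitted distribution, which can differ slightly from the sample standard deviation of the raw data.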

txnelson

Super User

Joined:

Jun 22, 2012

Natalie,

You should be able to simply save the transform to a new column, and then run the distribution on that column.

 

Jim

David_Burnham

Super User

Joined:

Jul 13, 2011

Natalie

The formulas for variance and standard deviation don't make any assumptions about the shape of the distribution.  They are just algebra (in the same way that the calculation of an average value doesn't make any assumptions about the type of distribution).

-Dave
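To illustrate the point, here is a small sketch that computes the sample standard deviation directly from its definition. The data values are made up (and deliberately skewed); nothing in the arithmetic refers to a normal distribution.

```python
import math
import statistics

# Made-up, right-skewed sample for illustration only
data = [2.1, 2.4, 2.2, 3.9, 2.3, 2.6, 5.8, 2.5]

n = len(data)
mean = sum(data) / n

# Sample variance: sum of squared deviations from the mean, divided by (n - 1)
variance = sum((x - mean) ** 2 for x in data) / (n - 1)
sd = math.sqrt(variance)

# The standard library computes the same quantity
print(sd, statistics.stdev(data))
```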
natalie_

Community Trekker

Joined:

Jan 6, 2016

Oh, I thought it did matter for standard deviation, though.  For example, the 68-95-99.7 (three standard deviations) rule is used to find the values within a band around the mean in a normal distribution.  However, if my data is not normal, it might not make sense to use this.  For example, if the on-resistance of my transistor is not normal, and I want to see what the value is at 3 standard deviations from the mean, I might get a negative value or a very low value that doesn't make any sense.

 

Sorry if I am being confusing or misunderstanding something, I am just starting to get back into learning statistics again since university!

David_Burnham

Super User

Joined:

Jul 13, 2011

I think I missed the point of your question.  If you want to calculate "bands" based on probability, then the location of these bands will differ according to the type of distribution you have.  Your numbers 68-95-99.7 are not standard deviations; they are the probabilities associated with "bands" at distances of 1, 2, and 3 standard deviations from the mean of a normal distribution.  If you don't have a normal distribution, the problem is not with the calculation of the standard deviation, but with the conversion to probabilities.  If you want to have +/- 3 standard deviation bands, then you are assuming the distribution is normal, or at least symmetric.  Depending on what you want to do, you can either calculate asymmetric bands (JMP has probability distributions not only for the normal distribution, but for all distributions), or you have to perform a transformation to normalise the data (and then back-transform whenever you want to convert back to natural metrics).  My preference would be to use asymmetric bands and use the JOHNSON SU function to calculate them.

online help

-Dave
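The asymmetric-band idea can be sketched like this: take the probabilities that 1, 2, and 3 standard deviations would give under a normal distribution, and look up the Johnson Su quantiles at those probabilities. The sketch assumes the parameterization z = gamma + delta * asinh((x - theta) / sigma); the function names and parameter values are hypothetical and stand in for whatever the fit reports.

```python
import math
from statistics import NormalDist


def johnson_su_quantile(p, gamma, delta, theta, sigma):
    """Quantile of a Johnson Su distribution under the assumed parameterization."""
    z = NormalDist().inv_cdf(p)  # standard normal quantile
    return theta + sigma * math.sinh((z - gamma) / delta)


def asymmetric_bands(gamma, delta, theta, sigma, ks=(1, 2, 3)):
    """Bands holding the same probability as +/- k sigma does for a normal."""
    std_normal = NormalDist()
    bands = {}
    for k in ks:
        p_low = std_normal.cdf(-k)   # e.g. about 0.00135 for k = 3
        p_high = std_normal.cdf(k)   # e.g. about 0.99865 for k = 3
        bands[k] = (
            johnson_su_quantile(p_low, gamma, delta, theta, sigma),
            johnson_su_quantile(p_high, gamma, delta, theta, sigma),
        )
    return bands


# Hypothetical fitted parameters, for illustration only
params = dict(gamma=-1.0, delta=1.5, theta=10.0, sigma=2.0)
median = johnson_su_quantile(0.5, **params)
bands = asymmetric_bands(**params)
for k, (low, high) in bands.items():
    print(f"{k} sigma equivalent: {low:.3f} .. {high:.3f} (median {median:.3f})")
```

For a skewed fit, the lower and upper limits sit at different distances from the median, which is exactly why the symmetric mean +/- 3 standard deviations can land on values that make no physical sense.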
natalie_

Community Trekker

Joined:

Jan 6, 2016

Thank you, I see how it did that.  Now that I see that the transformed data is normal, how can I use this to find the standard deviation?  The summary statistics show a value that makes sense for the transformed data, but I would like to know what the standard deviation is for the original data.  Perhaps I don't understand the purpose of transforming data.

 

txnelson

Super User

Joined:

Jun 22, 2012

Solution
Here is what I do. To set limits on my original data based on the transformed values, I take the standard deviation of the transformed data, calculate the values 1, 2, 3, etc. standard deviations above and below the mean, and then reverse the transformation to get back to the original scale. In some cases, such as the Johnson Su, there isn't an easy way to transform the values back. What I do then is run a little script that passes a value through the original transformation, compares the result with the targeted value, and iterates until there is a match. At that point you have found the value in the original data that, when transformed, gives the targeted transformed value. Remember that when you do this, the distances above and below the mean in your original data will not be the same.
Jim
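The procedure described above can be sketched as follows. One assumption to flag: if the saved transform has the form z = gamma + delta * asinh((x - theta) / sigma), then the inverse can be written directly as x = theta + sigma * sinh((z - gamma) / delta), and no iteration is needed for that case. The data, the parameter values, and the function names here are all hypothetical.

```python
import math
import statistics


def to_normal(x, gamma, delta, theta, sigma):
    """Assumed Johnson Su transform to the (approximately) normal scale."""
    return gamma + delta * math.asinh((x - theta) / sigma)


def from_normal(z, gamma, delta, theta, sigma):
    """Inverse of to_normal: back to the original measurement scale."""
    return theta + sigma * math.sinh((z - gamma) / delta)


def limits_on_original_scale(data, params, ks=(1, 2, 3)):
    """Mean +/- k std on the transformed scale, mapped back to original units."""
    z = [to_normal(x, **params) for x in data]
    z_mean = statistics.mean(z)
    z_sd = statistics.stdev(z)
    center = from_normal(z_mean, **params)
    limits = {
        k: (
            from_normal(z_mean - k * z_sd, **params),
            from_normal(z_mean + k * z_sd, **params),
        )
        for k in ks
    }
    return center, limits


# Made-up measurements and hypothetical fitted parameters
data = [10.2, 11.0, 11.5, 12.3, 13.8, 16.0, 10.9, 11.9]
params = dict(gamma=-1.0, delta=1.5, theta=10.0, sigma=2.0)
center, limits = limits_on_original_scale(data, params)
for k, (low, high) in limits.items():
    print(f"{k} std: {low:.3f} .. {high:.3f} (center {center:.3f})")
```

As noted in the post, the back-transformed limits are not equidistant from the center.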
natalie_

Community Trekker

Joined:

Jan 6, 2016

Thanks for your reply, Jim! I will give this a shot.
ckronig

Community Trekker

Joined:

Feb 27, 2015

Hi Jim,

I have the same issue. I have modelled some Johnson Si transformed data, and got a predicted model. I tried to use the inverse function to transform the predicted data back, but it's not working. Would you be able to provide more guidance on writing a script to do this? I keep coming across these types of distributions when modeling responses from DOE experiments, so it would be really useful to know how to transform the data back.
Many thanks!

Christel

txnelson

Super User

Joined:

Jun 22, 2012

Here are two functions that I pulled out of a running system; they use successive approximations to get the resolved values:

/***********************************************************************/
/*                                                                     */
/* The getformula function retrieves the formula from the transformed  */
/* column and replaces the Original Column name in the formula with    */
/* string "__value__".  The value of this variable is what will be     */
/* evaluated in the successive approximations done by the script.      */
/*                                                                     */
/***********************************************************************/

getformula = Function( {ColName, FormulaColName},
	{ColName, FormulaColName, TheFormula, coloncolname}, 
	//__value__ = .;
	// Get the transformed data columns formula as a literal string
	TheFormula=Column( FormulaColName ) << Get Formula ;
	
	// Check to see that a formula was found
	If( Is Empty( TheFormula ) == 1 ,
		Dialog(
			"   The column specified as",
			"the Transformed Column does",
			"      not contain a formula. ",
			" ",
			"    Please rerun and select",
			"       the correct column"
		);
		Throw();
	);
	TheFormula=char( Column( FormulaColName ) << Get Formula );

	// Get the actual name of the original column since
	// the upper,lower case and spacing is critical in determining
	// where in the formula the column name actually occurs
	ColName = Column( ColName ) << Get Name;
	

	// Determine if the reference to the column name in the 
	// formula is a simple :colname reference or a complex
	// reference :Name(\!"colname\!")
	// If the column name isn't found set the return code to -1
	If(
		Contains( TheFormula, ":" || ColName ), ColonColName = ":" || ColName, // Else
		Contains( TheFormula, ":Name(\!"" || ColName || "\!")" ), ColonColName = ":Name(\!"" || ColName || "\!")", // Else
		rc = -1
	);
		
	// Replace all of the column references in the formula with
	// the string "(__value__)" so that when the formula is 
	// evaluated later, it will take the then value of the memory
	// variable called __value__ and use it in the formula
	If( Contains( TheFormula, ColonColName ) > 0,
		While( Contains( TheFormula, ColonColName ) > 0, TheFormula = Munger( TheFormula, 1, ColonColName, "(__Value__)" ) ),
		Dialog(
			"   The column specified as",
			"the Transformed Column does",
			"     not contain a reference",
			"      to the original column",
			"           in its formula.",
			" ",
			"    Please rerun and select",
			"       the correct column"
		);
		Throw();
	);
	TheFormula;
);


/***********************************************************************/
/*                                                                     */
/* The gettrans function uses successive approximations to find the    */
/* value in the original column that, when passed through the          */
/* transformation formula, matches the target value.                   */
/*                                                                     */
/***********************************************************************/

gettrans = Function( {ColName, FormulaColName, TheTarget, Theformula},
	{ColName, FormulaColName, TheFormula, High, Low, TheTarget, TheMax, Themin, __value__}, 

	// The program uses successive approximations to determine the different 
	// parametrics.  The way it works is that it calculates the needed parameter
	// such as Mean, or Standard Deviation, and then by using successive 
	// approximations from the original column's values, and passing those
	// values through the columns formula, when the approximation value matches
	// the calculated value from the transformed column, the retransformed value
	// has been found

	// Set the extreme values
	High = Col Maximum( If( Excluded( Row State( Empty() ) ) == 0, Column( ColName ), . ) );
	Low = Col Minimum( If( Excluded( Row State( Empty() ) ) == 0, Column( ColName ), . ) );
	Highm = 999999999999999999999999;
	Lowm = -999999999999999999999999;
	If( Highm > High,
		High = Highm
	);
	If( Lowm < Low, low = lowm );
	
	// Make a guess at the first value 
	__value__ = Mean( High, Low );

	// Iterate up to 100 times, halving the search interval on each loop.
	// The comparisons below assume the formula increases as __value__ increases.
	For( i = 1, i <= 100, i++,
		TheResult = Eval( Parse( theformula ) );
		If(
			TheResult > TheTarget, High = __value__,
			TheResult < TheTarget, Low = __value__,
			Break()
		);
		If( High == Low, Break() );
		__value__ = Mean( High, Low );
	);
	
	__value__; // Expose the return value
); // End of function gettrans

 

 

Jim
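For readers who want to try the same idea outside JMP, here is a minimal Python sketch of the successive-approximation (bisection) search. It is not a translation of the JSL above: the function name `invert_by_bisection`, the explicit bracket `[low, high]`, the tolerance, and the example transform are assumptions made for illustration, and like the script it assumes the transform is increasing.

```python
import math


def invert_by_bisection(transform, target, low, high, tol=1e-10, max_iter=200):
    """Find x in [low, high] such that transform(x) is close to target.

    Assumes transform is monotonically increasing on [low, high].
    """
    if not (transform(low) <= target <= transform(high)):
        raise ValueError("target is not bracketed by [low, high]")
    for _ in range(max_iter):
        mid = 0.5 * (low + high)
        value = transform(mid)
        if value > target:
            high = mid   # guess too large: move the upper bound down
        elif value < target:
            low = mid    # guess too small: move the lower bound up
        else:
            return mid   # exact match
        if high - low <= tol:
            break
    return 0.5 * (low + high)


# Hypothetical transform standing in for a saved transform column formula
def example_transform(x):
    return -1.0 + 1.5 * math.asinh((x - 10.0) / 2.0)


x = invert_by_bisection(example_transform, target=1.0, low=0.0, high=100.0)
print(x, example_transform(x))
```

Bracketing the search with the range of the observed data, rather than a very wide fixed range, keeps the number of iterations small and avoids evaluating the formula far outside where it was fitted.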