natalie_
Level V

Normal Distributions and Transformations

Hi Everyone,

 

I have some measured data, and when I try a continuous normal fit, I can see that my data is not normal.  However, the Goodness-of-Fit Test indicates that the data is consistent with the Johnson Su distribution.

 

This distribution has two shape parameters, one location parameter and one scale parameter.  From my research online, I can see how to calculate the variance from these parameters, and from that the standard deviation.  I used Excel to calculate that, but is there a way to do this in JMP?  From my understanding, the Summary Statistics table from the "Distribution" analysis calculates these statistics assuming the data is from the normal distribution.
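For reference, the calculation described above can be sketched in a few lines of Python. This is only an illustration of the standard Johnson Su moment formulas, not a JMP feature. It assumes the parameterization in which gamma + delta * asinh((X - theta) / sigma) is standard normal, and the parameter names and example values are made up for the sketch; check them against the parameter estimates your fit reports.

```python
import math

def johnson_su_moments(gamma, delta, theta, sigma):
    """Mean, variance and SD of a Johnson Su variable X, assuming
    gamma + delta * asinh((X - theta) / sigma) ~ Normal(0, 1).
    gamma, delta are the shape parameters; theta is location; sigma is scale."""
    if delta <= 0 or sigma <= 0:
        raise ValueError("delta and sigma must be positive")
    w = math.exp(1.0 / delta**2)   # exp(delta^-2)
    omega = gamma / delta
    mean = theta - sigma * math.sqrt(w) * math.sinh(omega)
    variance = 0.5 * sigma**2 * (w - 1.0) * (w * math.cosh(2.0 * omega) + 1.0)
    return mean, variance, math.sqrt(variance)

# Hypothetical parameter values, chosen only to exercise the function.
mean, variance, sd = johnson_su_moments(gamma=0.0, delta=1.0, theta=0.0, sigma=1.0)
print(mean, variance, sd)
```

With gamma = 0 the distribution is symmetric about theta, so the mean equals theta and only the variance depends on delta and sigma.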

 

Thanks in advance!

 

Natalie

46 REPLIES
txnelson
Super User

Re: Normal Distributions and Transformations

They typically are applied to manufacturing processes

Jim
Reinaldo
Level IV

Re: Normal Distributions and Transformations

Thank you for your post, Jim! I think now the Capability Analysis is understood! :)

 

Regarding the transformation from non-normal data to normality, I created an example (Test.jmp) which contains two timepoints and X as the non-normal variable. I followed your procedure:

 

1. Analyze -> Distribution

2. X as "Y, Columns" and Timepoint as "By"

3. I clicked on Distributions Timepoint -> Stack

4. For each timepoint, I clicked on the red triangle and "Continuous Fit -> Johnson Sl"

5. I clicked on "Fitted Johnson Sl -> Save Transformed"

 

Then JMP generated the column "Johnson Sl Transform X By Timepoint"

 

Here is the problem: I tried to run your script, which is in the top-left panel of Test.jmp, but nothing happened.

 

Could you explain what I did wrong with that script, please?

 

PS: I would like the Y column to hold the normalized values of X after the Johnson Sl transformation.

 

Many thanks!

~Rei
txnelson
Super User

Re: Normal Distributions and Transformations

1. In one of my earlier responses, I asked whether you had looked in the Log to see if there were any errors.  I ran the script exactly as it was included in the data table attached to your last response.  The log reported:

The namespace "dt" is not defined in access or evaluation of 'dt:Johnson Sl Transform X By Timepoint' , 
dt:Johnson Sl Transform X By Timepoint/*###*/

Taking that message and searching the script, the "dt" reference is in the line

JohnsonMean = Col Mean( dt:Johnson Sl Transform X By Timepoint );

In this context, dt is expected to be a variable that points to the data table to read the data from, but the script never defines it.  So the correction to the program is to add the following line before the first use of the reference in the code:

dt = Current Data Table();

I added it in my copy of the code, as the first line in the script.

Once I did that, I got the following results from the script in the log:

JohnsonMean = 1.72084568816899e-15;
Col Mean(:X) = -2.09978125;
Gettrans("X", "Johnson Sl Transform X By Timepoint", JohnsonMean, myFormula) = 1.1;

Please note: the results are written to the log.

 

 

Jim
Reinaldo
Level IV

Re: Normal Distributions and Transformations

Sorry, I didn't report what was in the Log because I couldn't find it. I am a beginner in JMP. Now I can find it: View -> Log.

Okay, I added that line and I got the same result!

What's the next step? :D

 

PS: Doesn't the script handle the two timepoints separately when calculating those parameters? Perhaps a new parameter called "Timepoint" could be added to the functions "getformula" and "gettrans".

~Rei
txnelson
Super User

Re: Normal Distributions and Transformations

1. If you need to get separate results for each timepoint, you will need to subset the data table into separate tables for each timepoint, and then run the script on each of them separately.

2. You now need to proceed with your analysis using the transformed data column, as if it were your Y column.

3. Read and digest the document

     Help==>Books==>Discovering JMP

Jim
Reinaldo
Level IV

Re: Normal Distributions and Transformations

1. Okay, I will run it separately.

2. I have the outputs you mentioned (in the Log) and the Johnson Sl Transform X By Timepoint column (= non-normal distribution), but no data assigned to the Y column.

~Rei
Reinaldo
Level IV

Re: Normal Distributions and Transformations

I created two subsets: one per timepoint. How do I use the outputs from the Log to obtain and plot the normalized data?

~Rei
txnelson
Super User

Re: Normal Distributions and Transformations

Go back to the data table with the combined timepoints.  Then go to Distribution and run it against the transformed data.

 

Then, go to 

     Help==>Books==>Discovering JMP

 

You have to learn more about how JMP works!

Jim
Reinaldo
Level IV

Re: Normal Distributions and Transformations

Hi Jim,

I think there is a misunderstanding here. Although the Log shows:

JohnsonMean = 1.72084568816899e-15;
Col Mean(:X) = -2.09978125;
Gettrans("X", "Johnson Sl Transform X By Timepoint", JohnsonMean, myFormula) = 1.1;

Your script doesn't save the normalized data from the "Johnson Sl Transform X By Timepoint" column into the Y column in my data table.

Best,

Reinaldo

~Rei
txnelson
Super User

Re: Normal Distributions and Transformations

The answer to your question requires a human decision: do you know for a fact that the different Timepoint groups come from different overall distributions?  My assumption was that the best way to determine whether the overall distribution is normal is to combine all of the data into a single evaluation.  It follows that you would use a single transformation for all of the data and then perform subsequent analyses on that transformed data.  Any true differences between the groups should then show up within the transformed data.  If, however, you have independent a priori knowledge that the Timepoint groups come from separate distributions, then you will need to subset the data into the separate Timepoint groups and run the script separately on each subset.
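The trade-off between one pooled transformation and a separate transformation per Timepoint can be illustrated with a small Python sketch. It is not the JMP script from this thread: plain standardization (subtract the mean, divide by the standard deviation) is used as a simple stand-in for the Johnson transformation, and the data values and function names are invented for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev

def standardize(values):
    """Stand-in for a normalizing transform: centre to mean 0, scale to SD 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def pooled_transform(values):
    """One transformation fitted to all rows together."""
    return standardize(values)

def per_group_transform(timepoints, values):
    """A separate transformation fitted within each timepoint."""
    rows_by_group = defaultdict(list)
    for i, t in enumerate(timepoints):
        rows_by_group[t].append(i)
    out = [0.0] * len(values)
    for rows in rows_by_group.values():
        for i, z in zip(rows, standardize([values[i] for i in rows])):
            out[i] = z
    return out

def group_means(timepoints, values):
    """Mean of the values within each timepoint."""
    groups = defaultdict(list)
    for t, v in zip(timepoints, values):
        groups[t].append(v)
    return {t: mean(vs) for t, vs in groups.items()}

# Made-up data: timepoint 2 is shifted upward relative to timepoint 1.
timepoints = [1, 1, 1, 1, 2, 2, 2, 2]
values = [1.0, 2.0, 3.0, 4.0, 11.0, 12.0, 13.0, 14.0]

pooled = group_means(timepoints, pooled_transform(values))
separate = group_means(timepoints, per_group_transform(timepoints, values))
print("pooled:", pooled)
print("separate:", separate)
```

In this toy case the pooled version keeps the shift between the two timepoints visible in the transformed values, while the per-group version centres each timepoint on zero, so a difference in location between the groups no longer appears in the transformed column.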

Jim