Kate
Level II

How to get the confidence interval (CI ) and to make t-test with multi distributions in one histogram (as direct input data)?

The input data is already a histogram in %. There are 5 samples in total. Each sample includes 2 distributions (bottom and top position). The goal is to calculate the difference between the 2 means (physically, it's a height). The raw data ("CI_t test_data_Q12345.jmp") is attached.

Questions:

  1. How to obtain the statistics for each distribution (bottom and top position)?
  2. How to deduce the statistics for height of each sample (1 to 5)?
  3. How to run the t-test and the equivalence test to compare the 5 deduced heights?

Many thanks for your help!

CI_t-test_JMP.JPG

 


SDF1
Super User

Re: How to get the confidence interval (CI ) and to make t-test with multi distributions in one histogram (as direct input data)?

Hi @Kate ,

 

  To me, it sounds like you need to do an ANOVA test of the data. Use the Fit Y by X platform, and use um as Y and sample number as X. This will show a picture like this:

SDF1_0-1643224410758.png

Then, go to the red hot button next to Oneway and select Means/Anova. You should get this:

SDF1_1-1643224459664.png

Notice that "Prob > F" is <.0001, which is strong evidence that the mean of Y (um) differs between at least two of the sample numbers. Before continuing, you will want to test for unequal variances to see whether the variance is the same from sample number to sample number. Again, red hot button, then select Unequal Variances. You should get this:

SDF1_2-1643224630341.png

All of the unequal variance tests have Prob > F at <.0001, meaning that the variances are NOT equal. The Welch test, which does not assume equal variances, also has Prob > F at <.0001, so the conclusion that the means differ still holds and you can continue with the comparison. Next, you CAN do a Student's t test, but since you have more than two levels (Sample number has 5 levels), you'll actually want to do a Tukey HSD test. If you run unadjusted Student's t tests on all 10 pairs of this data set, you inflate the chance of declaring a difference that is not really there (a Type I error). Again, go red hot button, then Compare Means > All Pairs, Tukey HSD. You should get this:
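To put a number on why the adjusted test matters with 5 samples, here is a rough Python sketch of the familywise error rate. It assumes the pairwise tests are independent, which is only an approximation since the pairs share groups, and the function name is just illustrative.

```python
from math import comb


def familywise_error_rate(n_groups: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across all unadjusted pairwise tests.

    Assumes independent tests, which is only approximately true here.
    """
    n_pairs = comb(n_groups, 2)  # 5 groups give 10 pairwise comparisons
    return 1.0 - (1.0 - alpha) ** n_pairs


n_pairs = comb(5, 2)
rate = familywise_error_rate(5)
print(f"{n_pairs} comparisons, familywise error rate about {rate:.2f}")
```

With 10 comparisons at alpha = 0.05 the rate comes out far above the nominal 5%, which is what Tukey HSD corrects for.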

SDF1_3-1643224958402.png

You can see there is some overlap between levels 2 and 5, but otherwise, there are distinct differences in the means. Within the Oneway platform, you can also do an equivalence test. Again, red hot button, then select Equivalence Test. For the variance assumption, you'll need to select Unequal Variances since the tests all came back with very low p-values. You'll then need to enter the difference that you consider practically equivalent. You'll then get this (I put in 0.2):

SDF1_4-1643225214085.png

What this tells you (given my value for equivalence) is that 3 out of the 10 pair-wise comparisons are different while the other 7 are practically equivalent.
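For anyone curious what an equivalence check does under the hood, here is a minimal Python sketch. It is not JMP's implementation: it uses a normal approximation (reasonable only for large samples) and unequal variances, and declares two means practically equivalent when the 90% confidence interval of their difference lies inside the chosen margin. All input numbers are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def equivalence_by_ci(mean1, sd1, n1, mean2, sd2, n2, margin, alpha=0.05):
    """Return (lower, upper, equivalent) for the difference mean1 - mean2.

    Uses a normal approximation with unequal variances; the two means are
    called equivalent when the (1 - 2*alpha) CI lies within +/- margin.
    """
    diff = mean1 - mean2
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # unequal-variance standard error
    z = NormalDist().inv_cdf(1.0 - alpha)     # one-sided critical value
    lower, upper = diff - z * se, diff + z * se
    return lower, upper, (-margin < lower and upper < margin)


# Hypothetical summary statistics, with 0.2 as the practical margin.
close_pair = equivalence_by_ci(1.00, 0.30, 200, 1.05, 0.40, 200, margin=0.2)
far_pair = equivalence_by_ci(1.00, 0.30, 200, 1.30, 0.40, 200, margin=0.2)
print("close pair equivalent:", close_pair[2])
print("far pair equivalent:", far_pair[2])
```

The margin plays the same role as the 0.2 entered in the dialog: a larger margin makes more pairs count as equivalent.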

 

To get the statistics on the distributions, click the Distribution icon in JMP (or go to Analyze > Distribution), then cast um into Y and Sample number into By. If you then click the red hot button next to Summary Statistics, you can customize what you see. In this case, I have also selected Minimum and Maximum to be in the summary statistics. If you hold down the CTRL key while doing that, it will "broadcast" the change to all the other distributions (one for each sample) and you'll get all the data you need. If you want to analyze things further, you can right-click the data in the Summary Statistics and select Make Into Combined Data Table. Very useful. Below is an example for sample number 1.

SDF1_5-1643225638190.png

I'm attaching the original data table and scripts so you can see what I did.

 

Out of curiosity, if you can share, what NanoScope Analysis are you using (instrumentation)? I'm a nano-physicist.

 

Hope this helps!

DS

 

Kate
Level II

Re: How to get the confidence interval (CI ) and to make t-test with multi distributions in one histogram (as direct input data)?

Hi, @SDF1 ,

Thank you very much for your help! Thanks for your script and explanations. Very helpful! The tricks you shared are nice! I like them.

 

Sorry I didn't describe my request clearly; that caused confusion. I have tried to describe it better in my answer to Martin below. Yes, I also work in micro- and nanotechnologies. It's an AFM measurement result. Glad to meet you here!!

Re: How to get the confidence interval (CI ) and to make t-test with multi distributions in one histogram (as direct input data)?

Hi Kate, I'm not sure I understand what you are trying to do. I need clarification on a few things:

1. The two attached tables are completely identical in both name and content (checked with Compare Data Tables in JMP).

2. You mention top and bottom. What do you mean by that? If I run a distribution of um and %, I see that for each sample um is almost uniformly distributed, while most % values are at or near 0, with a few single values up to about 18. I cannot see two different distributions in a sample. What am I missing?

3. What do you mean by histogram in %?

I guess with this information (and a check that the data is correct) we can work on your questions and how to calculate what you need.

/****NeverStopLearning****/

Re: How to get the confidence interval (CI ) and to make t-test with multi distributions in one histogram (as direct input data)?

OK, I think I didn't look at the picture carefully enough. So % has two peaks over the course of um and you would like to get the position of the peaks, or some information about the spread of the data around the first peak (bottom) and the same for the values around the second peak (top). 

With height, you want to get the mean um value for the top and bottom distributions and compare the 5 samples' top positions with each other, as well as the 5 bottom positions.

 

Looking at the distribution across um, it looks very uniform because the measurements across um seem to be equidistant, so each bin holds a similar number of values. A distribution describes one variable, whereas you are using two in relation to each other. I'll take a second look at it and come back to you soon.

/****NeverStopLearning****/

Re: How to get the confidence interval (CI ) and to make t-test with multi distributions in one histogram (as direct input data)?

So my best bet for this analysis is Fit Curve with a peak model such as Gaussian Peak. You can find it under Analyze > Specialized Modeling > Fit Curve. Use um as X and % as Y, put the sample number in Group, and press OK. Then, from the hotspot, select Gaussian Peak under Peak Models. This fits five peak models, and in the hotspot of the Gaussian Peak outline you then have Compare Parameter Estimates, Equivalence Test, and so forth. The fit concentrates on the first peak, so if you want to do this for the bottom and the top separately, you first need to split the um data into two regions; 0.75 seems to be a good separator. Then you can use the new column as a By group.
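To illustrate what fitting a Gaussian peak means outside JMP, here is a small Python sketch. It is not what Fit Curve does internally: it swaps in a simple log-parabola least-squares fit, which works on clean data but is sensitive to noise in the tails. It assumes the peak has the form y = a * exp(-0.5 * ((x - b) / c)^2), with a the peak value, b the peak location, and c the width. The data is synthetic.

```python
from math import exp, log, sqrt


def solve3(m, v):
    """Solve a 3x3 linear system m @ x = v with Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

    d = det(m)
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = v[r]  # replace one column with the right-hand side
        solution.append(det(mc) / d)
    return solution


def fit_gaussian_peak(xs, ys):
    """Return (a, b, c) for y = a * exp(-0.5 * ((x - b) / c) ** 2).

    Fits ln(y) = A + B*u + C*u**2 by least squares, with u = x - mean(x).
    Points with y <= 0 are skipped because their logarithm is undefined.
    """
    pts = [(x, log(y)) for x, y in zip(xs, ys) if y > 0]
    x0 = sum(x for x, _ in pts) / len(pts)  # centering improves conditioning
    s = [sum((x - x0) ** k for x, _ in pts) for k in range(5)]
    t = [sum(ly * (x - x0) ** k for x, ly in pts) for k in range(3)]
    m = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    A, B, C = solve3(m, t)
    c = sqrt(-1.0 / (2.0 * C))      # C must be negative for a peak
    b = x0 - B / (2.0 * C)          # shift back to the original x scale
    a = exp(A - B * B / (4.0 * C))
    return a, b, c


# Synthetic "bottom" peak: value 18 %, located at 0.50 um, width 0.05 um.
xs = [0.30 + 0.01 * i for i in range(41)]
ys = [18.0 * exp(-0.5 * ((x - 0.50) / 0.05) ** 2) for x in xs]
a, b, c = fit_gaussian_peak(xs, ys)
print(f"peak value {a:.3f}, location {b:.4f} um, width {c:.4f} um")
```

On real AFM histograms you would first restrict the data to one region (for example um below the 0.75 separator) and fit each peak on its own.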

 

Hope this helps. I've attached the data table with some scripts that should give you a starting point. From there you can probably continue on your own.

/****NeverStopLearning****/
Kate
Level II

Re: How to get the confidence interval (CI ) and to make t-test with multi distributions in one histogram (as direct input data)?

Hi, Martin,

It was great to learn about Fit Curve and the peak model! It automatically takes care of the headache of noise, which is very helpful. Many thanks! Sorry that I didn't describe my request clearly and caused confusion.

 

I still face difficulties, because in the end I need to compare the heights and the statistics of the height in µm. The height of each sample can be defined as the difference between the z position of the bottom peak (in um) and the z position of the top peak (in um). The height should also have a confidence interval, deduced from its corresponding top and bottom distributions, correct? I don't know how to do this.
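One common way to get such an interval is simple error propagation, sketched here in Python. It assumes you already have a location estimate and a standard error for each peak (for example from a peak fit), that the two estimates are independent, and that a normal approximation is acceptable. The numbers are made up.

```python
from math import sqrt
from statistics import NormalDist


def height_with_ci(bottom, se_bottom, top, se_top, level=0.95):
    """Return (height, lower, upper) for height = top - bottom.

    Assumes independent peak-location estimates, so their variances add.
    """
    height = top - bottom
    se = sqrt(se_bottom ** 2 + se_top ** 2)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided critical value
    return height, height - z * se, height + z * se


# Hypothetical peak locations (um) and standard errors for one sample.
height, lower, upper = height_with_ci(0.50, 0.003, 1.00, 0.004)
print(f"height {height:.4f} um, 95% CI [{lower:.4f}, {upper:.4f}]")
```

If the top and bottom estimates are correlated, the combined standard error needs a covariance term as well, so treat this as a starting point only.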

 

I think the difficulty is that the input data is a histogram containing the z position (in um) and the frequency in %. The frequency indicates how the z positions are distributed. The physical goal is to analyze and compare the heights of the binary (step) samples, which is why the height is calculated as the z position difference between top and bottom.

 

Putting the % column into Freq under Analyze > Distribution seems to work, but I still can't get the height statistics or compare the sample heights in JMP.
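Using % as a frequency amounts to computing weighted statistics. A minimal Python sketch of that idea follows, splitting the histogram at a separator (0.75 um, as suggested earlier in the thread) and taking the height as the difference of the two weighted means. The data is invented, and the SD is a population-style value because percentages do not tell you the underlying count.

```python
from math import sqrt


def weighted_stats(z_values, freqs):
    """Return (mean, sd) of z weighted by frequency (for example in %)."""
    total = sum(freqs)
    mean = sum(z * f for z, f in zip(z_values, freqs)) / total
    var = sum(f * (z - mean) ** 2 for z, f in zip(z_values, freqs)) / total
    return mean, sqrt(var)


def step_height(z_values, freqs, separator=0.75):
    """Split the histogram at the separator; return bottom mean, top mean, height."""
    bottom = [(z, f) for z, f in zip(z_values, freqs) if z < separator]
    top = [(z, f) for z, f in zip(z_values, freqs) if z >= separator]
    bottom_mean, _ = weighted_stats(*zip(*bottom))
    top_mean, _ = weighted_stats(*zip(*top))
    return bottom_mean, top_mean, top_mean - bottom_mean


# Hypothetical histogram for one sample: z position (um) and frequency (%).
z = [0.4, 0.5, 0.6, 0.9, 1.0, 1.1]
pct = [5, 10, 5, 4, 8, 4]
bottom_mean, top_mean, height = step_height(z, pct)
print(f"bottom {bottom_mean:.3f} um, top {top_mean:.3f} um, height {height:.3f} um")
```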

 

One of the outputs I need is a parameter comparison plot like the one below (example copied from your solution), but comparing the height in µm instead of the Peak Value (I know you only gave that as an example). Thanks for your help!

Kate_0-1643279885959.png

 

Re: How to get the confidence interval (CI ) and to make t-test with multi distributions in one histogram (as direct input data)?

So you want to get the um locations of the peaks and compare them instead of the peak value? You probably need to use formula columns to extract the peak locations first and then proceed as described before. I'm not sure if that is exactly what you are looking for, but I added two columns to the data table.
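The logic of such formula columns can be sketched in Python as follows. This is not the content of the two added columns, just one plausible reading: within each sample, take the um value where % is largest below and above a separator (0.75 is assumed here), then subtract. The row layout and values are hypothetical.

```python
from collections import defaultdict


def peak_locations(rows, separator=0.75):
    """rows: iterable of (sample, um, pct) tuples.

    Returns {sample: (bottom_peak_um, top_peak_um)} using the um value with
    the largest pct on each side of the separator.
    """
    best = defaultdict(dict)
    for sample, um, pct in rows:
        region = "bottom" if um < separator else "top"
        current = best[sample].get(region)
        if current is None or pct > current[1]:
            best[sample][region] = (um, pct)
    return {s: (r["bottom"][0], r["top"][0])
            for s, r in best.items() if len(r) == 2}


# Hypothetical rows: (sample number, um, %).
rows = [
    (1, 0.45, 3.0), (1, 0.50, 17.5), (1, 0.55, 4.0),
    (1, 0.95, 2.0), (1, 1.00, 12.0), (1, 1.05, 3.5),
    (2, 0.50, 6.0), (2, 0.55, 15.0), (2, 1.05, 9.0), (2, 1.10, 14.0),
]
peaks = peak_locations(rows)
heights = {s: top - bottom for s, (bottom, top) in peaks.items()}
print(heights)
```

Taking the raw maximum is sensitive to noise, so a fitted peak location is usually the more robust choice.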

 

Other scripts for detecting peaks in a distribution can be found in other threads by searching the community for "peak", e.g. this one: https://community.jmp.com/t5/Discussions/find-multiple-peak-values-in-a-column/m-p/62919#M33840

/****NeverStopLearning****/
Kate
Level II

Re: How to get the confidence interval (CI ) and to make t-test with multi distributions in one histogram (as direct input data)?

@martindemel, Thanks a lot!

Kate
Level II

Re: How to get the confidence interval (CI ) and to make t-test with multi distributions in one histogram (as direct input data)?

@martindemel, after applying your method to more data, I find it provides more information than my initial target required. The parameters from the fitted model can serve as metrics for quality control. That's fantastic, thanks!