New Distribution options in JMP 10: Custom Quantiles

My previous blog post discussed two new options in the JMP 10 Distribution platform for customizing the summary statistics and default quantiles reports. This post discusses another new JMP 10 feature of the Distribution platform for continuous data: Custom Quantiles. Custom Quantiles, found under the Display Options menu, lets you enter as many quantiles as you like at whatever probabilities you choose. For each one, it reports two different types of quantile estimates along with their confidence intervals.

To demonstrate the new Custom Quantiles features, let us examine the weights of the students found in Big Class.jmp in JMP’s Sample Data folder. In particular, we want to study the 5%, 50% (median) and 95% weight quantiles for this set of students as well as their 95% confidence intervals. Figure 1 shows the Custom Quantiles dialog with our quantiles of interest entered into the list, which is the default entry method.

Figure 2 shows the Custom Quantiles report towards the bottom. The first type of quantile estimate is the same type of empirical quantile estimate as those found in the default quantiles report. Nonparametric confidence intervals based on order statistics are calculated for these empirical quantiles. With data sets that contain few rows, a significant drawback of these types of confidence intervals is that their actual probability of containing the true population quantile can be much lower than the specified confidence level. To address this limitation, we also added smoothed empirical likelihood quantile estimates that are based on a kernel density estimate. An advantage of these quantile estimates is that their confidence intervals tend to contain the true quantile with the promised confidence level.
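To make the coverage problem concrete, here is a minimal Python sketch of the standard distribution-free argument. It assumes independent draws from a continuous distribution, so the number of observations falling below the true quantile is binomial. The function name `order_stat_coverage` and the 10-row sample size are illustrative choices, not anything taken from JMP, and this is not a description of how JMP selects its order statistics.

```python
from math import comb


def order_stat_coverage(n: int, p: float, l: int, u: int) -> float:
    """Exact coverage of the interval [X_(l), X_(u)] for the p-th quantile.

    Assumes n i.i.d. draws from a continuous distribution. The interval
    contains the true quantile exactly when the count of observations below
    it, which is Binomial(n, p), lies between l and u - 1.
    """
    if not (1 <= l < u <= n):
        raise ValueError("need 1 <= l < u <= n")
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(l, u))


# Hypothetical sample of only 10 rows.
median_cov = order_stat_coverage(10, 0.5, 2, 9)  # [X_(2), X_(9)] for the median
tail_cov = order_stat_coverage(10, 0.9, 1, 10)   # [min, max] for the 90% quantile
print(f"median: {median_cov:.4f}  90% quantile: {tail_cov:.4f}")
```

With so few rows, even the widest possible interval for a tail quantile falls well short of a nominal 95% level, while an interval for the median can overshoot it.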

Looking at our Custom Quantiles report for the weights, notice that the empirical quantile estimates and the smoothed empirical likelihood quantile estimates are fairly similar. Examining the first part of the report with the empirical quantile estimates, you see that the actual coverage of the confidence intervals for the 95% and 5% quantile estimates does not meet the 95% confidence level. These confidence intervals span the entire range of the data, yet they reach only an 87% confidence level. Because an interval based on order statistics cannot be wider than the range of the data, and the sample is relatively small at 40 observations, it is impossible to create these confidence intervals at a 95% confidence level. In addition, the confidence interval for the median, the 50% quantile estimate, is a little conservative with a confidence level of 96%. Now examine the second part of the report with the smoothed empirical likelihood quantile estimates. You see that we do get true 95% confidence intervals for all of these quantile estimates.
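The 87% figure can be checked by hand. An interval running from the sample minimum to the sample maximum misses the true quantile only when all 40 observations fall on the same side of it. The Python sketch below works through that arithmetic, assuming independent draws from a continuous distribution. The helper names are hypothetical, and the "smallest n" result applies only to this min-to-max interval, not to JMP's general procedure.

```python
def full_range_coverage(n: int, p: float) -> float:
    """Coverage of [min, max] for the p-th quantile of a continuous distribution."""
    # The interval misses only if all n points fall below the quantile (p**n)
    # or all n points fall above it ((1 - p)**n).
    return 1.0 - p**n - (1.0 - p) ** n


def min_n_for_level(p: float, level: float) -> int:
    """Smallest n at which [min, max] reaches the requested confidence level."""
    n = 2
    while full_range_coverage(n, p) < level:
        n += 1
    return n


cov_40 = full_range_coverage(40, 0.05)  # identical for p = 0.95 by symmetry
n_needed = min_n_for_level(0.05, 0.95)
print(f"coverage at n = 40: {cov_40:.3f}; smallest n reaching 95%: {n_needed}")
```

Under these assumptions the coverage at 40 observations is about 0.87, and the sample would need 59 observations before a min-to-max interval for the 5% or 95% quantile could reach 95%.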

The following are references to learn more about these quantiles and their confidence intervals:

Hahn, G. J. and Meeker, W. Q. (1991), Statistical Intervals: A Guide for Practitioners, New York: John Wiley & Sons.

Chen, S. X. and Hall, P. (1993), Smoothed Empirical Likelihood Confidence Intervals for Quantiles, The Annals of Statistics, Vol. 21, No. 3, 1166-1181.

6 Comments
Community Member

Cyrus wrote:

Did this ever get implemented? I'm running JMP 11 and can't find this ability anywhere.

Staff

Laura Lancaster wrote:

Hi Cyrus,

Yes. This was added beginning with JMP 10. To find it, open the Distribution platform with a continuous column of data. Under the little red triangle menu for your column's distribution, go to the "Display Options" submenu and then select "Custom Quantiles". Also, here is a JSL script you can use to see this using Big Class.jmp from the sample data.

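// Open the sample data table so the script runs on its own
Open( "$SAMPLE_DATA/Big Class.jmp" );

// Request 95% confidence intervals for the 20%, 40%, 60% and 80% quantiles of height
Distribution(
	Continuous Distribution(
		Column( :height ),
		Custom Quantiles( 0.95, [0.2, 0.4, 0.6, 0.8] )
	)
);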

Let me know if this is still not clear.

Thanks!

Laura

Community Trekker

Hi @laural_jmp Thank you again for this contribution and the explanation. One thing this made me realize in the context of the work I am doing is that I did not fully understand what quantiles are in the first place! It may be useful to elaborate on this to some extent in the context of either the JMP help literature or Wikipedia (which has some explanation on the topic, although I feel it may not be practical enough for the applied engineering statistician).

 

Can you help clarify the principal difference between the "Quantiles: Uncentered and Unscaled" values (indicated in the green box in my attached image and generated under "Fit Continuous Distribution") and the default Quantiles (or those generated using "Custom Quantiles", also shown in a green box) that JMP calculates?

 

See attached. 

 

Thanks! @PatrickGiuliano 

Untitled.png

Staff

Hi Patrick,

 

The big difference between the quantiles that you have highlighted is that the "Quantiles: Uncentered and Unscaled" below the Fitted SHASH outline are quantiles of the fitted SHASH distribution. (You can learn more about the SHASH distribution here: https://www.jmp.com/support/help/en/15.0/#page/jmp%2Ffitted-quantiles.shtml%23ww1162331#ww175052/ )

The quantiles below the Custom Quantiles outline do not assume any type of distribution (Normal, SHASH, etc.) and just use the data to calculate the quantiles. These are empirical quantiles. The method we use is described here: https://www.jmp.com/support/help/en/15.0/?os=win&source=application#page/jmp%2Fquantiles.shtml%23ww...
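As a rough illustration of that distinction, the Python sketch below computes both kinds of quantile for the same sample. Several things here are assumptions rather than JMP's behavior: the data are a hypothetical skewed sample, the empirical rule is the common (n + 1)p interpolation between order statistics (whether it matches JMP in every detail is not verified here), and a normal fit stands in for the SHASH fit because SHASH is not available in the Python standard library.

```python
from statistics import NormalDist


def empirical_quantile(values, p):
    """Distribution-free quantile: interpolate at fractional rank (n + 1) * p."""
    ys = sorted(values)
    n = len(ys)
    rank = (n + 1) * p  # 1-based fractional rank
    i = int(rank)       # integer part
    f = rank - i        # fractional part
    if i < 1:           # rank falls below the smallest observation
        return ys[0]
    if i >= n:          # rank falls at or above the largest observation
        return ys[-1]
    return (1 - f) * ys[i - 1] + f * ys[i]


# Hypothetical right-skewed sample, not taken from this thread.
data = [1, 2, 2, 3, 3, 4, 5, 7, 11, 22]

# Normal fit used as a stand-in for a fitted SHASH distribution.
fitted = NormalDist.from_samples(data)

for p in (0.05, 0.50, 0.95):
    print(p, empirical_quantile(data, p), round(fitted.inv_cdf(p), 2))
```

The empirical values always lie within the observed data, while the fitted values come from the model's inverse CDF and can fall outside the data range when the model fits poorly.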

 

I hope this helps.

 

Thanks.

 

Laura

 

Community Trekker

Hello Laura, 

 

I am having trouble computing the 50% confidence interval on the 95th percentile. It is giving me an estimate, but not the upper and lower CI. Do you know why this may be the case? I have a continuous variable in the distribution, with no missing data. I have attached a screenshot below.

Screen Shot 2019-08-08 at 3.08.14 PM.png

 

Thanks,
Todd

Community Trekker

Hi @spirotodd, I tried to reproduce a dataset as close as I could to the one you summarized in your Distribution platform image. I came up with a similar scenario, except that the median of my dataset is equal to 1 and not 2 like yours (see attached). I'm not sure what the math is for the upper and lower confidence limits on the 95th percentile (it should be a nonparametric estimate, so it will not use standard t- or z-tables, and it will not be equal to a smoothed empirical likelihood function), but fundamentally, the estimates have to be based on some function of the data.

 

I generated the CDF plot inside the Distribution platform. This highlights the insight that you have limited data spanning the range of your response variable, and in many cases you don't have values within this range. In other words, your data is 'resolution limited': it displays symptoms of inadequate discrimination (aka readability or resolution).
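The flat stretches can be seen in a small Python sketch of an empirical CDF. The sample is hypothetical (coarse integer values with a wide gap), not the data from the screenshot, and `ecdf` is an illustrative helper.

```python
from bisect import bisect_right


def ecdf(values):
    """Return a function F where F(x) is the fraction of observations <= x."""
    ys = sorted(values)
    n = len(ys)
    return lambda x: bisect_right(ys, x) / n


# Hypothetical coarse, integer-valued sample with a wide gap between 5 and 24.
data = [1, 1, 1, 1, 2, 2, 3, 5, 24, 26]
F = ecdf(data)

# Between two adjacent distinct observations the ECDF does not change,
# so every x in the gap maps to the same cumulative probability.
for x in (5, 10, 15, 23.9, 24):
    print(x, F(x))
```

Every value in the gap between 5 and 24 gets the same cumulative probability, so the data cannot distinguish between candidate quantile values there.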

 

CDF plot0.png

CDF plot.png

You can also see this in an I&MR chart (you can generate one in the Control Chart Builder). You can randomize the order of the data if the order doesn't matter, and you will see that many of the moving ranges are zero or very close to zero.
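A minimal Python sketch of the moving-range check, using a hypothetical coarse sample rather than the data from the thread:

```python
def moving_ranges(values):
    """Absolute differences between consecutive observations (span of 2)."""
    return [abs(b - a) for a, b in zip(values, values[1:])]


# Hypothetical low-resolution measurements in run order.
data = [1, 1, 2, 2, 2, 1, 1, 3, 3, 1]

mr = moving_ranges(data)
zero_share = sum(r == 0 for r in mr) / len(mr)
print(mr, f"share of zero moving ranges: {zero_share:.2f}")
```

A large share of zero moving ranges is one symptom that the measurement increment is coarse relative to the variation in the data.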

I-MR Chart.png

For the sake of argument, let's assume the lower and upper 50% confidence limits on the observed 95th percentile of 24 are equal to 23 and 26 respectively, to align with those calculated using the Smoothed Empirical Likelihood Quantiles. What I notice here is that at those particular values, the CDF is perfectly "flat". Might we take this to mean that the values calculated on the basis of this data are (mathematically) undefined? Consider a similar scenario in which we are at the 75th percentile and we try to compute the lower and upper 50% confidence limits. Here we might be between ~5.5 and ~19 if the distribution function were piecewise smooth. In that case we don't have any resolution (again we are on the flat part of the CDF).

Untitled.png

 

CDF plot2.png

This is just my thinking; hopefully it helps in some way. Cheers