Modernizing the Distribution platform

tonya_mauldin · Nov 27, 2019 01:10 PM

The Distribution platform is one of the most widely used platforms in JMP. It is used not only for testing which distribution fits your data, but also for data exploration, capability analysis, and much more. The platform has been around since the first version of JMP, and it was time for a modernization. JMP 15 delivers that modernization.

My first blog post detailed the new fitters in JMP 15. The second blog post detailed comparing distribution fits. This blog post details the new options available for fitted distributions.

Diagnostic Plots

The diagnostic plot has been replaced with two new plots (QQ and PP plots). The quantile-quantile (QQ) plot shows the relationship between the observations and the quantiles obtained using the estimated parameters. The percentile-percentile (PP) plot shows the relationship between the empirical cumulative distribution function (CDF) and the fitted CDF obtained using the estimated parameters.
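To make the two definitions concrete, here is a minimal Python sketch of how the points for each plot could be computed. It is not JMP's implementation: the (i - 0.5)/n plotting positions, the helper names, the geometric distribution used as a stand-in for a fitted model (a negative binomial with size 1), and the data values are all assumptions made for illustration.

```python
import math

def qq_points(data, quantile_fn):
    """Pair each sorted observation with the fitted quantile at its plotting position."""
    xs = sorted(data)
    n = len(xs)
    # Plotting positions (i - 0.5) / n are an assumption; other conventions exist.
    return [(quantile_fn((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

def pp_points(data, cdf_fn):
    """Pair the fitted CDF with the empirical CDF at each distinct observed value."""
    xs = sorted(data)
    n = len(xs)
    points = []
    for value in sorted(set(xs)):
        empirical = sum(1 for x in xs if x <= value) / n
        points.append((cdf_fn(value), empirical))
    return points

# Hypothetical fitted model: geometric distribution (negative binomial with size 1).
P_SUCCESS = 0.25  # made-up parameter, not an estimate from Braces.jmp

def geom_cdf(k):
    """P(X <= k) for the number of failures before the first success."""
    return 1.0 - (1.0 - P_SUCCESS) ** (k + 1) if k >= 0 else 0.0

def geom_quantile(q):
    """Smallest integer k with geom_cdf(k) >= q, for 0 < q < 1."""
    return max(0, math.ceil(math.log(1.0 - q) / math.log(1.0 - P_SUCCESS) - 1.0))

counts = [0, 1, 1, 2, 3, 3, 4, 6, 7, 11]  # made-up counts for illustration

qq = qq_points(counts, geom_quantile)  # (fitted quantile, observation) pairs
pp = pp_points(counts, geom_cdf)       # (fitted CDF, empirical CDF) pairs
```

Plotting either list of pairs against the line y = x gives the reference-line comparison that both plots rely on.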

To see an example of these plots, open Braces.jmp found in the Quality Control sample data folder.

Open("$SAMPLE_DATA/Quality Control/Braces.jmp");

Select Analyze->Distribution. Specify # defects as Y, Columns and click OK.

From the red triangle next to # defects, select Discrete Fit->Fit Negative Binomial. From the red triangle next to Fitted Negative Binomial Distribution, choose QQ Plot and PP Plot.

If the two sets (the fitted negative binomial quantiles and the # defects values) come from the same distribution, the points in the QQ plot should fall approximately along the reference line. Here, most observations fall close to the diagonal line, although the last point is questionable. The PP plot has a similar interpretation. Its points fall approximately along the reference line, indicating that the fitted Negative Binomial CDF agrees closely with the empirical CDF.

Profilers

Two Profilers have been added. The Distribution Profiler is a Prediction Profiler of the cumulative distribution function (CDF). The Quantile Profiler is a Prediction Profiler of the quantile function.

Using the previous example, choose Distribution Profiler and Quantile Profiler from the red triangle menu next to Fitted Negative Binomial Distribution.

If the data follow the fitted negative binomial distribution, the Distribution Profiler shows that the probability of getting 27 defects or fewer is 0.896934 (~90%). The Quantile Profiler gives the inverse of the Distribution Profiler. For this example, a probability of 0.1 corresponds to seven defects; that is, the probability that we get seven or fewer defects is about 10%. The Profilers are interactive. You can move the slider to see what the probability would be for various numbers of defects.
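The inverse relationship between the two Profilers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the size/probability parameterization, the function names, and the parameter values R and P are hypothetical, and they are not the estimates JMP reports for Braces.jmp.

```python
import math

def nb_pmf(k, r, p):
    """P(X = k) for a negative binomial with size r > 0 and success probability p."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)

def nb_cdf(k, r, p):
    """P(X <= k): the curve a CDF profiler traces as k moves."""
    total = 0.0
    for i in range(int(k) + 1):  # empty range for k < 0, so the result is 0.0
        total += nb_pmf(i, r, p)
    return min(1.0, total)

def nb_quantile(q, r, p, max_k=100_000):
    """Smallest k with P(X <= k) >= q: the inverse view of nb_cdf."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    total = 0.0
    for k in range(max_k + 1):
        total += nb_pmf(k, r, p)
        if total >= q:
            return k
    raise ValueError("quantile not reached; increase max_k")

R, P = 3.0, 0.2  # hypothetical parameters, chosen only for illustration

k90 = nb_quantile(0.90, R, P)    # count at which the CDF first reaches 0.90
prob_at_k90 = nb_cdf(k90, R, P)  # reading the CDF back at that count
```

Because the distribution is discrete, the CDF evaluated at a quantile is at least the requested probability rather than exactly equal to it.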

Save Columns

Two new save column features have been added. Save Distribution Formula saves a column to the data table that contains the cumulative distribution function (CDF) formula computed using the estimated parameter values. Save Simulation Formula saves a column to the data table that contains a formula that generates simulated values using the estimated parameters. This column can be used in the Simulate utility.
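For intuition about what a simulation formula does, the Python sketch below draws random values from a negative binomial distribution with fixed parameters. It is not the formula JMP saves: it uses a gamma-Poisson mixture as one common way to generate such values, and the parameterization, function names, seed, and parameter values are all assumptions.

```python
import math
import random

def random_poisson(rng, lam):
    """Knuth's multiplication method; adequate for the modest rates used here."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def random_negative_binomial(rng, r, p):
    """Draw a Poisson count whose rate is gamma distributed (shape r, scale (1 - p) / p)."""
    lam = rng.gammavariate(r, (1.0 - p) / p)
    return random_poisson(rng, lam)

rng = random.Random(2019)  # fixed seed so the illustration is repeatable
R, P = 3.0, 0.2            # hypothetical parameters, not estimates from Braces.jmp

simulated = [random_negative_binomial(rng, R, P) for _ in range(20_000)]
sample_mean = sum(simulated) / len(simulated)
sample_var = sum((x - sample_mean) ** 2 for x in simulated) / (len(simulated) - 1)

# Theoretical moments for this parameterization, for comparison.
theory_mean = R * (1.0 - P) / P
theory_var = R * (1.0 - P) / P ** 2
```

With enough draws, the sample mean and variance should land close to the theoretical values, which is the same kind of check you can run on a simulated column.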

Goodness of Fit

The goodness of fit test has been standardized to the Anderson-Darling test for continuous fits and the Pearson Chi-Square test for discrete fits. The Pearson Chi-Square test has improved bin creation. JMP now satisfies the rule of thumb that there are at least five expected observations in each bin.
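The rule of thumb can be illustrated with a small Python sketch that pools adjacent bins until each has an expected count of at least five, and then computes the Pearson statistic. JMP's actual binning algorithm is not described here, so the left-to-right pooling strategy, the function names, and the observed and expected counts below are all assumptions.

```python
def pool_bins(observed, expected, min_expected=5.0):
    """Merge adjacent bins left to right until each pooled bin expects >= min_expected."""
    pooled = []
    obs_acc, exp_acc = 0.0, 0.0
    for o, e in zip(observed, expected):
        obs_acc += o
        exp_acc += e
        if exp_acc >= min_expected:
            pooled.append((obs_acc, exp_acc))
            obs_acc, exp_acc = 0.0, 0.0
    # Any leftover tail is too small on its own, so fold it into the last bin.
    if obs_acc > 0 or exp_acc > 0:
        if pooled:
            last_o, last_e = pooled.pop()
            pooled.append((last_o + obs_acc, last_e + exp_acc))
        else:
            pooled.append((obs_acc, exp_acc))
    return pooled

def pearson_chi_square(pooled):
    """Sum of (observed - expected)^2 / expected over the pooled bins."""
    return sum((o - e) ** 2 / e for o, e in pooled)

# Made-up counts for values 0, 1, 2, ...; not taken from Braces.jmp.
observed = [2, 6, 9, 12, 8, 7, 3, 2, 1]
expected = [2.5, 5.5, 9.0, 11.0, 9.5, 6.5, 3.5, 1.5, 1.0]

pooled = pool_bins(observed, expected)
statistic = pearson_chi_square(pooled)

# Usual degrees of freedom: bins - 1 - number of estimated parameters.
n_estimated_params = 2  # hypothetical: a two-parameter fit
degrees_of_freedom = len(pooled) - 1 - n_estimated_params
```

In this made-up example, the nine raw bins collapse to six pooled bins, and the small-expectation bins at both ends are absorbed by their neighbors.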

For the previous example, click on the red triangle next to Fitted Negative Binomial Distribution and choose Goodness of Fit.

The p-value is 0.4971. At any conventional alpha level (for example, 0.05), we fail to reject the null hypothesis that the data are from the Negative Binomial distribution.

Summary

QQ Plots
PP Plots
Distribution Profiler
Quantile Profiler
Save Distribution Formula
Save Simulation Formula
Anderson-Darling
Pearson Chi-Square improved binning

These three blog posts only scratch the surface of the JMP 15 new features in the Distribution platform. Look for my next blog post in which I detail new capability options in the Distribution platform.

TracyCamp · 06-22-2021

The previous diagnostic plot included confidence bounds and looked like JMP's normal quantile plot. The new QQ plot does not seem to allow for the bounds and appears to be a mirror image of the normal quantile plot. What was the reason for this change? Can the intervals be included in the QQ plot? (This lack of intervals is causing a pretty major reconstruction of a training presentation, and I'd like to understand the reason for the change.)

tonya_mauldin · 06-23-2021

In previous versions, there was frustration that some distributions essentially had a q-q plot and others had a p-p plot. So we decided to offer both for each of the distributions rather than choosing which diagnostic plot was most appropriate for each.

As you discovered, confidence interval bands have not yet been added to these plots. The developer does intend to add them. We are currently investigating a possible implementation in JMP 17.

David_Burnham · 08-07-2021

The refresh to the fitters is very welcome. However, there is now a significant performance hit when a goodness of fit is included:

Fitting 3 distributions to 20 column variables:

legacy: 0.4 sec

new: 6.3 sec

Given that I'm not fitting 20 column variables but closer to 1000, that's quite a performance hit!

tonya_mauldin · 08-25-2021

The Anderson-Darling test does require more processing time. That is the price you pay for getting a more trustworthy p-value.