Modernization of the Distribution Platform (2020-US-30MP-536)
Tonya Mauldin, Principal Analytics Software Tester, SAS
Distribution is one of the most widely used platforms in JMP. It has been around since the first version of JMP. It's useful for all sorts of things, including data exploration, capability analysis – and of course, testing which distribution fits your data. Version 15 introduces a modernized distribution platform. This talk discusses the changes made to the distribution platform in Version 15 of JMP.
Auto-generated transcript...
Speaker | Transcript |
tobake | The Modernization of the Distribution Platform. My name is Tonya Mauldin, and I am the tester for distribution. |
For this presentation I will be using version 15.2 of JMP. | |
Distribution is one of the most widely used platforms in JMP. This platform is not only used for testing which distribution fits your data, it is used for data exploration, capability analysis, and so much more. JMP 15 brings a modernization to this commonly used platform. | |
Why did we decide to update this platform? Distribution has been around since the first version of JMP. It was time to make this platform more modern. We did this by improving the flow, providing a cleaner look, | |
making the product more consistent and easier to maintain. | |
Distribution fitters now use the same code as generalized regression. Capability analysis now uses the same code as process capability. | |
New fitters include the negative binomial distribution, which is equivalent to the gamma Poisson, | |
the Cauchy distribution, zero-inflated Poisson, and ZI negative binomial. | |
Johnson fits are now a single command that selects the best-fitting distribution from the Johnson system of distributions. | |
This is the same method, quantile matching, that is used in the process capability platform. | |
This method is more stable and faster than maximum likelihood. | |
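The talk does not spell out how the quantile-matching selection works inside JMP. As a rough sketch of the general idea only, the following uses a Slifker-Shapiro style rule, which compares the spacing of four matched sample quantiles to choose among the Johnson SU, SB, and SL families. The function names, the choice `z=0.5`, and the tolerance are assumptions for illustration, not JMP's implementation.

```python
from statistics import NormalDist


def sample_quantile(sorted_x, p):
    """Linear-interpolation sample quantile for 0 < p < 1."""
    pos = p * (len(sorted_x) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])


def johnson_family(data, z=0.5, tol=0.05):
    """Pick a Johnson family from four matched quantiles.

    Quantiles are taken at the normal probabilities of -3z, -z, z, and 3z.
    The ratio m*n/p**2 is above 1 for unbounded (SU) shapes, below 1 for
    bounded (SB) shapes, and near 1 for lognormal-like (SL) shapes.
    Assumes the data are not constant, so the central spacing p is nonzero.
    """
    xs = sorted(data)
    phi = NormalDist().cdf
    x_m3, x_m1, x_p1, x_p3 = (sample_quantile(xs, phi(k * z)) for k in (-3, -1, 1, 3))
    m = x_p3 - x_p1  # upper-tail spacing
    n = x_m1 - x_m3  # lower-tail spacing
    p = x_p1 - x_m1  # central spacing
    ratio = m * n / (p * p)
    if ratio > 1 + tol:
        return "SU", ratio
    if ratio < 1 - tol:
        return "SB", ratio
    return "SL", ratio
```

Because the rule needs only four sample quantiles, it avoids the iterative optimization that maximum likelihood requires, which is consistent with the stability and speed remarks above.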
Users now have the ability to type a specific bandwidth parameter for the smooth curve fit. | |
Standard errors have been added for the SHASH distribution. | |
Let's use JMP to investigate some of these. | |
Airlines Delays is part of the sample data that comes with JMP. | |
Let's perform a distribution analysis on the column Arrival Delay and fit a Johnson distribution. | |
Notice that rather than seeing the three different Johnson fitters, there's one option called fit Johnson. After choosing this option, we see the best fit from the Johnson distribution family for this data is the Johnson SU distribution. | |
Let's also add a smooth curve fit to this report. | |
Prior to JMP 15, the only way to control the bandwidth parameter for the nonparametric density was via a slider. | |
Now there is an option to specify a specific value for the bandwidth parameter within the user interface as well as via JSL. | |
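To see what the bandwidth parameter controls, here is a minimal sketch of a kernel density estimate with a user-specified bandwidth. It assumes a Gaussian kernel, which the talk does not confirm for JMP's smooth curve, and the delay values are made up rather than taken from Airline Delays.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function f(x): a Gaussian kernel density estimate with a fixed bandwidth."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, each with the same width.
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


delays = [-5.0, -2.0, 0.0, 3.0, 4.0, 12.0, 30.0]  # made-up values for illustration
smooth = gaussian_kde(delays, bandwidth=10.0)  # wide bandwidth: smoother curve
rough = gaussian_kde(delays, bandwidth=1.0)    # narrow bandwidth: follows each point
```

A larger bandwidth spreads each observation's contribution more widely, so the curve is smoother; a smaller one produces a bumpier curve that tracks individual points.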
In addition to altering the fitters themselves, JMP has changed the way fit comparisons are made. As each fit is selected, it is added to a compare distributions report. | |
In previous versions of JMP you only got a compare distributions report when using the All option under continuous fits. | |
AICc weight and BIC have been added to this report to make it more consistent with other platforms in JMP. | |
The histogram legend has been removed from the report. This information is now contained within the compare distributions report, which always appears directly below the histogram. | |
Additionally overlaid CDF plots have been added to the report. | |
Let's look at some of these in JMP. | |
This data table Washers is provided in the sample data that comes with JMP. | |
Let's perform a distribution analysis on the #defective column. | |
Since this is count data, let's fit the negative binomial. This is a new fitter for JMP 15. It is equivalent to the gamma poisson fit. | |
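The equivalence between the negative binomial and the gamma Poisson can be checked numerically. The sketch below assumes a size/probability parameterization for the negative binomial and a shape/scale gamma mixing distribution; JMP's own parameterization may differ, and the function names are illustrative.

```python
import math


def neg_binomial_pmf(x, r, p):
    """P(X = x) for a negative binomial with size r and success probability p."""
    log_pmf = (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
               + r * math.log(p) + x * math.log(1.0 - p))
    return math.exp(log_pmf)


def gamma_poisson_pmf(x, shape, scale, steps=30000, upper=60.0):
    """P(X = x) when X | lam ~ Poisson(lam) and lam ~ Gamma(shape, scale).

    The integral over lam is approximated with the midpoint rule on (0, upper);
    upper must be large enough that the integrand is negligible beyond it.
    """
    step = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * step
        log_poisson = x * math.log(lam) - lam - math.lgamma(x + 1)
        log_gamma = ((shape - 1.0) * math.log(lam) - lam / scale
                     - math.lgamma(shape) - shape * math.log(scale))
        total += math.exp(log_poisson + log_gamma)
    return total * step
```

With `r = shape` and `p = 1 / (1 + scale)`, the two functions should agree up to the integration error, which is the sense in which the two fits are the same model.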
The compare distributions report was added automatically. | |
Let's also fit the ZI negative binomial distribution to see how it compares. | |
Remember the zero inflated negative binomial was also a new fitter available in JMP 15. | |
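A zero-inflated count distribution mixes a point mass at zero with an ordinary count distribution. The following sketch shows that construction with a Poisson base; the same wrapper would apply to a negative binomial base. The names `pi_zero` and `base_pmf` are hypothetical, not JMP parameter names.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))


def zero_inflated_pmf(x, pi_zero, base_pmf):
    """Mix a point mass at zero (weight pi_zero) with a count distribution.

    base_pmf is any function returning P(X = x) for the base distribution.
    """
    if not 0.0 <= pi_zero < 1.0:
        raise ValueError("pi_zero must be in [0, 1)")
    base = (1.0 - pi_zero) * base_pmf(x)
    # Only the zero count receives the extra probability mass.
    return pi_zero + base if x == 0 else base


def zip_pmf(x, pi_zero=0.3, lam=2.0):
    """Zero-inflated Poisson with illustrative (made-up) parameter values."""
    return zero_inflated_pmf(x, pi_zero, lambda k: poisson_pmf(k, lam))
```

When `pi_zero` is 0 the zero-inflated version reduces to the base distribution, which is why comparing the two fits side by side is informative.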
Information about this fit was also added to the compare distributions report. | |
You will also notice some changes within the compare distribution report itself. | |
AICc weight (a weight based on the corrected Akaike information criterion) and BIC (Bayesian information criterion) have been added to this report. | |
These statistics were already available in other parts of JMP, such as model comparison. They have now been added to the distribution platform to make JMP more consistent. | |
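For reference, these comparison statistics can be computed from a fit's -2 log-likelihood, its number of parameters, and the sample size. The sketch below uses the standard textbook formulas; the function names are illustrative and the inputs are assumed to come from fits to the same data.

```python
import math


def aicc(neg2_loglik, k, n):
    """Corrected AIC for a fit with k parameters and n observations (needs n > k + 1)."""
    return neg2_loglik + 2 * k + (2 * k * (k + 1)) / (n - k - 1)


def bic(neg2_loglik, k, n):
    """Bayesian information criterion."""
    return neg2_loglik + k * math.log(n)


def aicc_weights(aicc_values):
    """Normalized weights: each fit's relative support among the fits compared."""
    best = min(aicc_values)
    relative = [math.exp(-0.5 * (a - best)) for a in aicc_values]
    total = sum(relative)
    return [r / total for r in relative]
```

The weights sum to 1 across the fits in the comparison, so a fit with a weight near 1 is strongly preferred over the others listed with it.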
Notice that the compare distributions report always appears directly underneath the histogram. | |
This gave us the ability to remove the legend that previously appeared beneath the histograms. | |
This information is now contained within the compare distributions reports. | |
We find that the green line is for the negative binomial fit, the blue line is for the ZI negative binomial fit. | |
CDF plots have also changed. As fits are added, the CDFs for the fitted distributions are superimposed on to the empirical CDFs. | |
We see that both the green and blue lines closely follow the empirical CDFs for this example. | |
The histogram shows that the green and blue lines closely follow the data. In the compare distributions report, | |
the negative binomial distribution appears first because it has a smaller AICc value, indicating a better fit. If we wanted to use BIC, rather than AICc, as our criterion for best fit, we can simply click on that column header to sort on that column. | |
Something that is helpful but may not be obvious is that you can remove a fit from the report window entirely by double clicking on the fit name in the compare distribution section. Here, I double clicked on ZI negative binomial to remove it from the report. | |
QQ and PP plots. The quantile-quantile plot shows the relationship between the observations and the quantiles obtained using the estimated parameters. | |
The percentile-percentile plot shows the relationship between the empirical cumulative distribution function and the fitted CDFs obtained using the estimated parameters. | |
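As a sketch of what the two plots contain, the following computes QQ and PP coordinates for an exponential fit. The plotting position `(i - 0.5) / n` is an assumed convention, and the exponential distribution is used only because its CDF and quantile function have simple closed forms.

```python
import math


def exponential_qq_pp(data):
    """Return (qq, pp) coordinate lists for an exponential fit to positive data."""
    xs = sorted(data)
    n = len(xs)
    scale = sum(xs) / n  # maximum likelihood estimate of the exponential mean
    qq, pp = [], []
    for i, x in enumerate(xs, start=1):
        prob = (i - 0.5) / n  # empirical probability (assumed plotting position)
        fitted_quantile = -scale * math.log(1.0 - prob)
        fitted_cdf = 1.0 - math.exp(-x / scale)
        qq.append((fitted_quantile, x))  # QQ: fitted quantile vs. observation
        pp.append((prob, fitted_cdf))    # PP: empirical vs. fitted probability
    return qq, pp
```

In both plots, points near the 45-degree reference line indicate that the fitted distribution reproduces the data well.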
Two profilers have been added. The distribution profiler is the prediction profiler of the cumulative distribution function. | |
The quantile profiler is the prediction profiler of the quantile function. | |
Other new options include save distribution formula and save simulation formula. | |
The goodness of fit tests have been standardized to use Anderson-Darling and Pearson chi-square. The Pearson chi-square test has improved bin creation. Beginning with version 15 of JMP, we now satisfy the rule of thumb that there are at least five expected observations in each bin. | |
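One simple way to satisfy the five-expected-observations rule of thumb is to pool adjacent categories before computing the Pearson statistic. The sketch below pools left to right; this particular pooling strategy and the function names are assumptions, since the talk does not describe JMP's exact procedure.

```python
def merge_bins(observed, expected, min_expected=5.0):
    """Pool adjacent bins, left to right, until each has expected count >= min_expected."""
    merged_obs, merged_exp = [], []
    acc_obs = acc_exp = 0.0
    for obs, exp in zip(observed, expected):
        acc_obs += obs
        acc_exp += exp
        if acc_exp >= min_expected:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
            acc_obs = acc_exp = 0.0
    if acc_obs > 0 or acc_exp > 0:
        # Leftover sparse tail: fold it into the last completed bin if there is one.
        if merged_exp:
            merged_obs[-1] += acc_obs
            merged_exp[-1] += acc_exp
        else:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
    return merged_obs, merged_exp


def pearson_chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over the bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

Pooling keeps the totals unchanged while preventing very small expected counts from inflating the statistic.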
Let's look at some of these in JMP. From the previous Washers report, select QQ plot and PP plot. | |
For each of these plots, we are looking to see how closely the points fall to the reference line. The closer the points are to the reference line the better the fit. | |
Add the distribution and quantile profilers to the report. | |
If the data follow the negative binomial distribution, the probability of having four defective or fewer is 84%. | |
As with other profilers in JMP, you can alter the input settings to see what effect it will have on the probability. | |
The quantile profiler works in a similar manner, which shows the relationship between the probability and the negative binomial quantile. | |
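The two profilers trace the cumulative distribution function and its inverse. A minimal sketch for a negative binomial follows; the parameter values `r=2` and `p=0.5` are hypothetical and are not the estimates from the Washers fit, so the probabilities will not match the 84% shown in the report.

```python
import math


def nb_pmf(x, r, p):
    """P(X = x) for a negative binomial with size r and success probability p."""
    return math.exp(math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
                    + r * math.log(p) + x * math.log(1.0 - p))


def nb_cdf(x, r, p):
    """Distribution profiler view: P(X <= x)."""
    return sum(nb_pmf(k, r, p) for k in range(int(x) + 1))


def nb_quantile(prob, r, p):
    """Quantile profiler view: smallest count whose CDF reaches prob."""
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must be strictly between 0 and 1")
    x = 0
    cumulative = nb_pmf(0, r, p)
    while cumulative < prob:
        x += 1
        cumulative += nb_pmf(x, r, p)
    return x


prob_four_or_fewer = nb_cdf(4, r=2.0, p=0.5)  # hypothetical parameters
```

Changing the input count in `nb_cdf`, or the probability in `nb_quantile`, mirrors dragging the vertical line in the corresponding profiler.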
Another new option is the save simulation formula. | |
This option saves a new column to the data table that contains a formula that generates simulated values using the estimated parameters. | |
This column can be used in the simulate utility as well as in other parts of JMP. | |
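Conceptually, a simulation formula draws random values from the fitted distribution. The sketch below generates negative binomial values through the gamma Poisson mixture, using hypothetical shape and scale values in place of estimated parameters; it illustrates the idea and is not the formula JMP saves.

```python
import math
import random


def simulate_negative_binomial(n, shape, scale, seed=None):
    """Draw n counts: lam ~ Gamma(shape, scale), then count ~ Poisson(lam)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        lam = rng.gammavariate(shape, scale)
        # Knuth's multiplication method for a Poisson draw (suitable for small lam).
        limit = math.exp(-lam)
        count = 0
        product = rng.random()
        while product > limit:
            count += 1
            product *= rng.random()
        draws.append(count)
    return draws


simulated = simulate_negative_binomial(20000, shape=2.0, scale=1.5, seed=1)
```

The simulated mean should be close to `shape * scale`, which gives a quick sanity check that the draws reflect the intended parameters.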
Although we are dealing with discrete data, let's add the exponential distribution to the report so we can view the goodness of fit report. | |
You can see from the histogram that the exponential distribution may not be a good fit for this data. To test this hypothesis, select goodness of fit. | |
The Anderson-Darling test is used for continuous distributions. Here we have a simulated p-value of less than .05, which indicates that we should reject the null hypothesis that this data comes from the exponential distribution. | |
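A simulated p-value of this kind can be obtained with a parametric bootstrap: compute the Anderson-Darling statistic for the data, then compare it with the statistic for many samples drawn from the fitted distribution. The sketch below does this for an exponential fit; the number of simulations and the clamping constant are assumptions, and the talk does not describe JMP's simulation scheme.

```python
import math
import random


def anderson_darling_exponential(data):
    """Anderson-Darling statistic for an exponential fit (mean estimated from data)."""
    xs = sorted(data)
    n = len(xs)
    scale = sum(xs) / n  # assumes nonnegative data with a positive mean
    eps = 1e-12  # clamp so that log() never sees exactly 0 or 1 (e.g., for zero counts)
    cdf = [min(max(1.0 - math.exp(-x / scale), eps), 1.0 - eps) for x in xs]
    total = sum((2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1.0 - cdf[n - i]))
                for i in range(1, n + 1))
    return -n - total / n


def simulated_p_value(data, n_sim=500, seed=0):
    """Share of simulated exponential samples whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = anderson_darling_exponential(data)
    n = len(data)
    scale = sum(data) / n
    exceed = 0
    for _ in range(n_sim):
        sample = [rng.expovariate(1.0 / scale) for _ in range(n)]
        if anderson_darling_exponential(sample) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)
```

A small p-value means that samples truly drawn from the fitted exponential rarely look as far from the fit as the observed data do.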
Process Capability was introduced in JMP version 12. Parts of this new platform were added to control chart builder at the same time. | |
Now with JMP 15, parts of this platform have been added to the distribution platform. This makes the platforms not only easier to maintain, but it also makes them more consistent with each other. | |
There are several differences at the launch of a distribution report. | |
The histogram only shows spec limits if show as graph reference lines is checked in the spec limits column property. | |
The user is given the ability to disable the capability analysis, even if there are spec limit column properties. | |
If there is a process capability distribution column property, that distribution will be used for the capability analysis. | |
The workflow for the quantiles option for fitted distributions has been simplified into one launch, whereas past versions required two platform calls. | |
Normal and non normal capability analyses now use similar dialogues. | |
Let's look at JMP. The script I'm running opens the process measurements sample data table and alters some of the column properties. | |
For Process 1, there's a spec limits column property with show as graph reference lines unchecked. | |
For Process 2, there's a spec limits column property with show as graph reference lines checked. For Process 3, there's a process capability distribution column property with Weibull defined, as well as a spec limits column property with show as graph reference lines unchecked. | |
In the distribution dialogue, I will assign these three processes as the Y. | |
Notice the new checkbox in the lower left corner, create process capability. This checkbox only appears if at least one Y column has a spec limits column property. | |
Uncheck this box and click OK. | |
In previous versions of JMP, you would get a capability analysis for each of these three processes. Additionally, there would be spec limits drawn in each of the three histograms. | |
In this report, no capability analysis is given at all because we unchecked that box. Spec limits are only shown for Process 2 because it was the only Y whose spec limit column property had show as graph reference lines checked. | |
Let's go back to the distribution dialogue and check the box, create process capability, to compare. | |
Capability analysis is now given for all three process variables. | |
In the main histogram at the top, spec limits are only shown for Process 2. | |
The capability analysis for Process 3 is based on the Weibull distribution. | |
The capability report looks the same. It has the same options that you would get with the individual detail reports in the process capability platform that was introduced in JMP 12. | |
Note that Ppk labeling is now the default for the overall sigma. | |
The report shows both within and overall sigma indices when a normal distribution is assumed. | |
Let's investigate the case where the data table has no spec limit column properties. Thickness.jmp is a sample data table with no column properties. | |
In the distribution dialogue, notice there is no checkbox for create process capability. This is because there are no spec limit column properties. | |
Let's investigate the distribution report for Thickness 3 and Thickness 4. | |
We can add a process capability report for Thickness 3 by selecting this option from the red triangle menu. | |
These options work in the same manner as the options you would find in our process capability platform introduced in JMP 12. | |
Let's perform a simple capability analysis and define LSL as .03, target as .045, and USL as .05. Let's also turn on show limits. | |
A capability analysis based on the normal distribution is given for the specification limits. The limits are shown in the graph because we checked that option in the dialogue. | |
You're provided with both within and overall sigma capability statistics. | |
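For a normal-based analysis, the within and overall indices differ only in which sigma estimate they use. The sketch below uses the sample standard deviation for overall sigma and the average moving range divided by 1.128 for within sigma; that within-sigma estimator is an assumption chosen for illustration, and the function name is hypothetical.

```python
from statistics import mean, stdev


def capability_indices(data, lsl, usl):
    """Return Pp/Ppk (overall sigma) and Cp/Cpk (within sigma) for normal-based capability."""
    mu = mean(data)
    overall_sigma = stdev(data)
    # Within sigma from the average moving range of consecutive points
    # (1.128 is the d2 constant for ranges of size 2).
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    within_sigma = mean(moving_ranges) / 1.128

    def index_pair(sigma):
        spread = (usl - lsl) / (6.0 * sigma)
        centered = min(usl - mu, mu - lsl) / (3.0 * sigma)
        return spread, centered

    pp, ppk = index_pair(overall_sigma)
    cp, cpk = index_pair(within_sigma)
    return {"Pp": pp, "Ppk": ppk, "Cp": cp, "Cpk": cpk}


# Made-up thickness readings with the spec limits used in the demonstration.
readings = [0.040, 0.042, 0.041, 0.043, 0.039, 0.041, 0.042, 0.040]
indices = capability_indices(readings, lsl=0.03, usl=0.05)
```

The centered indices (Ppk, Cpk) can never exceed their spread counterparts (Pp, Cp); they are equal only when the process mean sits midway between the limits.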
For Thickness 4, let's fit a beta distribution. | |
Process Capability for fitted distribution yields the following dialogue options. | |
The calculate quantile spec limits options section of this dialogue contains the quantiles option and the set k for k sigma options that were available in the legacy fitters. | |
For this example, specify the LSL prob as .05, the target prob as .5, | |
and the USL prob as .95. | |
When we click the calculate spec limit button, the spec limits are calculated for the given probabilities using the fitted beta distribution. | |
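Calculating spec limits from probabilities amounts to evaluating the fitted distribution's quantile function. Here is a minimal sketch for a beta distribution using numerical integration and bisection; the shape values `a=2` and `b=5` are hypothetical stand-ins for the fitted parameters, and the code assumes both shapes are at least 1.

```python
import math


def beta_pdf(x, a, b):
    """Beta density on [0, 1]; assumes a >= 1 and b >= 1 so it is bounded."""
    norm = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)


def beta_cdf(x, a, b, intervals=400):
    """Integrate the density over [0, x] with Simpson's rule (intervals must be even)."""
    h = x / intervals
    total = beta_pdf(0.0, a, b) + beta_pdf(x, a, b)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * beta_pdf(i * h, a, b)
    return total * h / 3.0


def beta_quantile(prob, a, b):
    """Invert the CDF by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if beta_cdf(mid, a, b) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Probabilities from the demonstration; shape parameters are hypothetical.
lsl, target, usl = (beta_quantile(p, a=2.0, b=5.0) for p in (0.05, 0.5, 0.95))
```

The resulting limits bracket the central 90% of the fitted distribution, with the target at its median.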
In previous versions of JMP, you had to fit a distribution, get the quantiles, and pass them back to the platform, which required two distribution calls. Now you can do this in a single call. | |
The capability analysis given is based on the beta distribution. You're provided with the overall sigma capability statistics. Spec limits are not shown in the top histogram because we did not check the show limits option in the dialogue. | |
In conclusion, the distribution platform in version 15 has been modernized to improve flow, provide a cleaner look, provide consistency, and require less maintenance. | |
Thank you for attending my presentation about the modernization of the distribution platform. |