Scripting - How to select best fit distribution and collect capability parameters?

Lisa (New Contributor, joined Feb 28, 2017)

I want to create a script that does the following:

1) Take a data set (one column at a time)

2) Fit all of the continuous distributions

3) Find the best fit, apply the spec limits, and analyze the capability parameters (Cpk specifically)

4) Output Cpk (and probably other statistics) to a table
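
The four steps above amount to a fit, rank, score, and tabulate loop. To make that logic concrete outside JMP, here is a minimal sketch in Python (standard library only, not JSL) under stated assumptions: only two candidate fits (Normal and LogNormal, both by maximum likelihood) stand in for JMP's much larger "All" comparison, the fits are ranked by AICc, Cpk is computed with the percentile method (0.135th, 50th, and 99.865th percentiles of the chosen fit), and every function name is hypothetical.

```python
import csv
import io
import math
from statistics import NormalDist, fmean, pstdev

# Hypothetical sketch: only Normal and LogNormal candidates are fitted here,
# whereas JMP's Fit Distribution("All") compares many more families.

def aicc(loglik, k, n):
    """Corrected Akaike information criterion (smaller is better)."""
    return -2 * loglik + 2 * k + 2 * k * (k + 1) / (n - k - 1)

def fit_normal(x):
    """Maximum-likelihood normal fit -> (name, log-likelihood, #params, quantile fn)."""
    dist = NormalDist(fmean(x), pstdev(x))
    loglik = sum(math.log(dist.pdf(v)) for v in x)
    return "Normal", loglik, 2, dist.inv_cdf

def fit_lognormal(x):
    """Maximum-likelihood lognormal fit; requires strictly positive data."""
    logs = [math.log(v) for v in x]
    dist = NormalDist(fmean(logs), pstdev(logs))
    # log f(x) = log phi(log x) - log x  (Jacobian of the log transform)
    loglik = sum(math.log(dist.pdf(u)) - u for u in logs)
    return "LogNormal", loglik, 2, lambda p: math.exp(dist.inv_cdf(p))

def percentile_cpk(quantile, lsl, usl):
    """Percentile-method Cpk computed from a fitted distribution's quantiles."""
    lo, med, hi = quantile(0.00135), quantile(0.5), quantile(0.99865)
    return min((med - lsl) / (med - lo), (usl - med) / (hi - med))

def analyze_column(name, x, lsl, usl):
    """Steps 2-3: fit each candidate, keep the lowest AICc, then compute Cpk."""
    fits = [fit_normal(x)]
    if min(x) > 0:
        fits.append(fit_lognormal(x))
    n = len(x)
    best_name, best_ll, best_k, best_q = min(fits, key=lambda f: aicc(f[1], f[2], n))
    return {
        "column": name,
        "best_fit": best_name,
        "aicc": round(aicc(best_ll, best_k, n), 3),
        "cpk": round(percentile_cpk(best_q, lsl, usl), 3),
    }

def to_csv(rows):
    """Step 4: one output row per analyzed column."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["column", "best_fit", "aicc", "cpk"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

if __name__ == "__main__":
    # Made-up, strongly right-skewed column (lognormal quantiles), for illustration only
    skewed = [math.exp(NormalDist().inv_cdf((i + 0.5) / 50)) for i in range(50)]
    print(to_csv([analyze_column("Skewed", skewed, lsl=0.05, usl=10.0)]))
```

The design point is that each fit carries its own quantile function, so step 3 scores the chosen distribution instead of silently falling back to a normal model.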

I know how to do this manually, but even after I do it and save the script, the script doesn't reproduce the analysis I originally did. If I set the spec limits within a given fitted distribution (the red triangle beside Normal 3 Mixture rather than the top-level red triangle), they are lost from the saved script entirely. If I set them from the top level, they are stored, but I have 50-100 data sets to run this analysis on, and they follow a range of distributions. I need the script to find the BEST fit and then do this analysis.

Example 1 below is a code snippet I've gotten to work (it doesn't output to a table yet, but it at least gets the right distribution and analysis), but it requires me to know in advance that the data will be a Normal 3 Mixture. If I use Example 2, the capability analysis defaults to a Normal distribution, which is wrong. If I could figure out how to get the best fit as a value, I could use it in a switch function (or something comparable) to set up the syntax required for each distribution.

// Example 1 that gives meaningful results, but requires me to know the distribution

Distribution(
	Column( "Test Data" ),
	Fit Distribution(
		Normal Mixtures( Spec Limits( LSL( -1 ), USL( 1 ) ), Clusters( 3 ) )
	)
);

// Example 2 gives garbage because it assumes the distribution is normal, and it's never normal

Distribution(
	Column("Test Data"),
	Fit Distribution ("All"), Capability Analysis ( LSL( -1 ), USL( 1 ) ) 
);
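
To see why a normal-theory capability analysis can mislead on skewed data, here is a small Python sketch (not JSL) using a hypothetical lognormal process and made-up spec limits. It compares the out-of-spec fractions predicted by a normal model that has the same mean and standard deviation as the lognormal against the lognormal's own tail fractions, and also reports the normal-theory Cpk.

```python
import math
from statistics import NormalDist

# Hypothetical process and spec limits, chosen only for illustration
MU, S = 0.0, 0.5          # log-scale mean and standard deviation
LSL, USL = 0.2, 3.0

log_scale = NormalDist(MU, S)

def lognormal_cdf(v):
    """CDF of the lognormal: P(X <= v) = Phi((ln v - MU) / S)."""
    return log_scale.cdf(math.log(v))

# A normal model with the same mean and standard deviation as the lognormal
mean = math.exp(MU + S**2 / 2)
sd = math.sqrt((math.exp(S**2) - 1) * math.exp(2 * MU + S**2))
normal = NormalDist(mean, sd)

def out_of_spec(cdf):
    """(fraction below LSL, fraction above USL) under the given CDF."""
    return cdf(LSL), 1 - cdf(USL)

true_below, true_above = out_of_spec(lognormal_cdf)
norm_below, norm_above = out_of_spec(normal.cdf)
cpk_normal = min(USL - mean, mean - LSL) / (3 * sd)

print(f"normal model : below {norm_below:.4f}, above {norm_above:.4f}, Cpk {cpk_normal:.2f}")
print(f"lognormal    : below {true_below:.4f}, above {true_above:.4f}")
```

With these assumed numbers the normal model predicts roughly 6% below LSL against well under 0.1% for the lognormal, and it flags the lower tail as the worse one when the upper tail is actually worse.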

Any other help with outputting the results to a table would be great too. This is my first JMP script, and I'm struggling to find clear examples in the Scripting Guide.

4 REPLIES

txnelson (Super User, joined Jun 22, 2012)

Here is a script that should give you a good start. It performs your desired tasks for a single column:

names default to here(1);
dt=open("$SAMPLE_DATA\big class.jmp");
// Set the limits for the target column
dt:height << set property("spec limits",{LSL(57),USL(68),show limits(1)});
// Run the Distribution Platform
dis=Distribution(
	Continuous Distribution( Column( :height ), Fit Distribution( "All" ) ),
	capability analysis(1)
);
// Point to the first capability analysis and create a data table from it
// (numeric Outline Box indices depend on the report layout, so recheck them if the platform options change)
dtCpk = report(dis)[outline box(9)][1][3][1]<<make data table;
// Take the distribution name from the outline heading and use it as the new data table's name
dtCpk << Set Name(report(dis)[outline box(6)]<<get title);
// Close the unwanted output
report(dis)<<close window;
Jim

Lisa (New Contributor, joined Feb 28, 2017)

Thanks for the help, but the capability analysis still defaulted to a Normal distribution. You did give me some other ideas on how to improve my script, though! The "outline box" reference is very useful.

I was able to use "outline box" to parse out the top fit from the Compare Distributions table, then key off that name (Normal 3 Mixture) to build the capability analysis call string for that distribution. This is NOT elegant, but it's working so far.
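
Keying off the best-fit name, as described above, is essentially a lookup from the Compare Distributions fit name to the matching Fit Distribution() syntax. Here is a small Python sketch of that lookup (in JSL the same idea would be an associative array or a Match() call, with the resulting string run through Eval( Parse( ... ) )). Only the "Normal 3 Mixture" template is taken from the working Example 1 in this thread; FIT_TEMPLATES and build_distribution_call are hypothetical names, and the syntax for any other distribution would have to be worked out separately, which this sketch deliberately does not guess.

```python
# Templates for the Fit Distribution() argument, keyed by the fit name shown in
# the Compare Distributions table. Only "Normal 3 Mixture" comes from Example 1;
# entries for other distributions are intentionally left out.
FIT_TEMPLATES = {
    "Normal 3 Mixture": "Normal Mixtures( Spec Limits( LSL( {lsl} ), USL( {usl} ) ), Clusters( 3 ) )",
}

def build_distribution_call(column, best_fit, lsl, usl):
    """Return a JSL Distribution() call string for the given best-fit name."""
    try:
        template = FIT_TEMPLATES[best_fit]
    except KeyError:
        raise ValueError(f"no template for {best_fit!r}; add one before running") from None
    fit = template.format(lsl=lsl, usl=usl)
    return f'Distribution( Column( "{column}" ), Fit Distribution( {fit} ) );'

print(build_distribution_call("Test Data", "Normal 3 Mixture", -1, 1))
```

Failing loudly on an unknown name is the design choice here: a silent default would reproduce the original problem of falling back to a Normal analysis.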

Thanks. 

stephen_pearson (Community Trekker, joined Oct 6, 2014)

I would be careful about using the best fit, as the numerically best fit is not always the logically best fit. In my experience, the Distribution platform tends to overfit: for example, it suggested a Normal 3 Mixture as the best fit, but based on the theory of how the data were generated, a lognormal fit was most appropriate.
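
One way to build this caution into a script is a parsimony rule: among all fits whose AICc is within some tolerance of the minimum, prefer the one with the fewest parameters. Here is a minimal Python sketch (not JSL); the function name is hypothetical, the tolerance of 2 AICc units is a common rule of thumb assumed here rather than anything the Distribution platform applies, and the AICc values are made up.

```python
def parsimonious_choice(fits, delta=2.0):
    """Return the name of the simplest fit whose AICc is within `delta` of the best.

    fits  -- list of (name, aicc, n_params) tuples
    delta -- AICc tolerance; 2 is a common rule of thumb (an assumption here)
    Among the near-best fits, fewer parameters wins; ties go to the lower AICc.
    """
    best = min(aicc for _, aicc, _ in fits)
    near_best = [f for f in fits if f[1] - best <= delta]
    return min(near_best, key=lambda f: (f[2], f[1]))[0]

# Made-up AICc values, for illustration only
fits = [
    ("Normal 3 Mixture", 100.0, 8),  # 3 means, 3 SDs, 2 free mixing proportions
    ("LogNormal", 101.2, 2),
    ("Normal", 140.0, 2),
]
print(parsimonious_choice(fits))             # LogNormal
print(parsimonious_choice(fits, delta=1.0))  # Normal 3 Mixture
```

The rule does not remove the need for judgment; it only makes the judgment explicit in one tunable number.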

It requires a degree of subjective evaluation: in the Normal 3 Mixture example, the AICc scores for the more exotic/complex fits were only fractionally lower. If many of the columns share the same mechanism for how the data are generated, it might be prudent to pick the best-ranking fit across all of those columns by combining the ranks from the individual columns.

This will require a bit more JSL, as you will need to save the best-fit table for each column, rank the fits, compare them, and then extract the best fit across the columns.
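
The cross-column ranking described above can be sketched as a rank sum: rank the fits by AICc within each column, add up each fit's ranks, and take the fit with the smallest total. Here is a minimal Python sketch (not JSL) under two assumptions: each column's AICc values have already been pulled out of its best-fit table, and only fits present in every column are compared. The function name and the AICc values are hypothetical.

```python
def combined_ranking(aicc_by_column):
    """Rank fits within each column by AICc (1 = best) and sum the ranks.

    aicc_by_column -- {column name: {fit name: AICc}}
    Only fits present in every column are compared, so a family that could not
    be fitted to some column (e.g. LogNormal on negative data) is left out
    rather than compared on partial ranks.
    Returns fit names ordered from lowest (best) to highest rank sum.
    """
    common = set.intersection(*(set(scores) for scores in aicc_by_column.values()))
    totals = dict.fromkeys(common, 0)
    for scores in aicc_by_column.values():
        ordered = sorted(common, key=lambda fit: (scores[fit], fit))
        for rank, fit in enumerate(ordered, start=1):
            totals[fit] += rank
    return sorted(totals, key=lambda fit: (totals[fit], fit))

# Made-up AICc values for three columns, for illustration only
table = {
    "A": {"Normal 3 Mixture": 100.0, "LogNormal": 100.5, "Normal": 120.0},
    "B": {"Normal 3 Mixture": 210.0, "LogNormal": 200.0, "Normal": 230.0},
    "C": {"Normal 3 Mixture": 55.0, "LogNormal": 50.0, "Normal": 52.0},
}
print(combined_ranking(table))  # ['LogNormal', 'Normal 3 Mixture', 'Normal']
```

Ranks rather than raw AICc values are summed because AICc values are not comparable across columns with different data and sample sizes.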

markbailey (Staff, joined Jun 23, 2011)

I concur with Stephen's caution.

I understand the desire to reduce the tedium of performing such an analysis by hand with a script. A script is a perfect way to automate JMP activity.

But the ability to fit and rank all of the distribution models at once is intended for exploratory analysis. You might conduct such an exploration early in process development, when little is known about process behavior. On the other hand, a capability study is a confirmatory analysis, conducted late in process development when much is known and nothing about the process is changing. For example, a proper capability analysis requires that process control has been demonstrated, at least through Phase I.
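
If a script is going to compute capability for many columns anyway, it could at least screen each column for obvious instability first. Here is a minimal Python sketch (not JSL) of a Phase I individuals-chart check, assuming individual measurements in time order, limits from the average moving range with the standard constant d2 = 1.128 for ranges of two, and only the points-beyond-limits test (no run rules). The function names are hypothetical, and this is a screen, not a substitute for a proper Phase I study.

```python
from statistics import fmean

D2 = 1.128  # bias-correction constant for moving ranges of size 2

def imr_limits(x):
    """Individuals-chart (LCL, center, UCL) from the average moving range."""
    moving_ranges = [abs(b - a) for a, b in zip(x, x[1:])]
    center = fmean(x)
    half_width = 3 * fmean(moving_ranges) / D2
    return center - half_width, center, center + half_width

def in_control(x):
    """True if no point falls outside the individuals-chart limits."""
    lcl, _, ucl = imr_limits(x)
    return all(lcl <= v <= ucl for v in x)

print(in_control([1, 2, 3, 2, 1]))                   # True
print(in_control([1, 2, 1, 2, 1, 2, 1, 2, 1, 20]))   # False
```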

You can't really automate a subjective decision about the distribution model using objective methods. Or maybe I should put it like this: "Just because you can doesn't mean you should."

Learn it once, use it forever!