JSL function to perform fits of distributions to data

nikles · 12-05-2023

What inspired this wish list request?

At my company we need to compute ppk and new limits for datasets containing ~10k rows x 10k cols, and this needs to be done on multiple datasets semi-daily. This can take a very long time. I've explored writing scripts to automate this using either the Distribution or Process Capability platforms and output the ppk, the best-fit, the parameters for that fit, and a new set of limits based on a target ppk. Some colleagues have written a script in Matlab to do the same. Despite all my efforts to optimize my JMP script, the Matlab script is still about 10x faster.

What is the improvement you would like to see?

I would like to see a JSL function that can be run in a script and does not create any report windows. It would have these inputs:

the data for a column as a 1xn matrix,
a list of distributions to fit to the data,
the preferred ppkmethod to use (zscore or percentile),
target ppk

And it would output the following:

Name of the distribution with the best fit (minimum AICC)
a vector or list of the fit parameters for the best-fit distribution type
the ppk,
ppkmethod used (zscore or percentile),
aicc (nice to have, but not essential)
new limits, computed using the target ppk (nice to have, but not essential)

Lastly, it should be able to fit these distributions: Beta, Exponential, Gamma, Johnson, Lognormal, Normal, Normal Mixture, SHASH, and Weibull.
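
To make the requested interface concrete, here is a minimal Python sketch of the fit-and-select step: take one column of data and a list of distribution names, fit each, and return the one with the lowest AICc together with its parameters. This is an illustration under assumptions, not JSL and not JMP's fitting algorithm. The name `fit_best` and the return layout are hypothetical, and only Normal, Lognormal, and Exponential are covered, because their maximum-likelihood estimates have closed forms. The other requested distributions (Beta, Gamma, Johnson, Normal Mixture, SHASH, Weibull) would need a numerical optimizer.

```python
import math


def _aicc(loglik, k, n):
    """Corrected AIC for k fitted parameters and n observations (needs n > k + 1)."""
    return 2 * k - 2 * loglik + (2 * k * (k + 1)) / (n - k - 1)


def _fit_normal(x):
    """Closed-form MLE for a normal; assumes the data are not all identical."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n  # MLE variance divides by n
    loglik = -0.5 * n * (math.log(2 * math.pi * var) + 1)
    return (mu, math.sqrt(var)), loglik


def _fit_lognormal(x):
    """Normal fit on log(x); the log-likelihood gets the Jacobian term -sum(log x)."""
    logs = [math.log(v) for v in x]
    params, loglik = _fit_normal(logs)
    return params, loglik - sum(logs)


def _fit_exponential(x):
    """Exponential parameterized by its scale (mean); the MLE is the sample mean."""
    n = len(x)
    scale = sum(x) / n
    return (scale,), -n * (math.log(scale) + 1)


# name -> (fitter, requires strictly positive data)
FITTERS = {
    "Normal": (_fit_normal, False),
    "Lognormal": (_fit_lognormal, True),
    "Exponential": (_fit_exponential, True),
}


def fit_best(data, distributions):
    """Return (name, params, aicc) of the candidate with the smallest AICc.

    Hypothetical interface. Candidates that cannot describe the data
    (positive-only distributions on data containing values <= 0) are skipped.
    """
    n = len(data)
    positive = min(data) > 0
    results = {}
    for name in distributions:
        fitter, needs_positive = FITTERS[name]
        if needs_positive and not positive:
            continue
        params, loglik = fitter(data)
        results[name] = (params, _aicc(loglik, len(params), n))
    best = min(results, key=lambda key: results[key][1])
    return best, results[best][0], results[best][1]
```

Calling `fit_best(column_values, ["Normal", "Lognormal", "Exponential"])` once per column produces only numbers and a name, with no report objects, which is the behavior the request is after.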

Why is this idea important?

(1) I want my team to use JMP, but it is nearly impossible to convince them when my script takes 30 min to run while the Matlab script does the same thing in only 1 min. My script is slower because of the overhead of creating a report window for 10k columns. Instead, it would be great if these calculations could be done with a function that skips report generation and just provides the results I need. I believe this would be much faster, as well as easier to code.

(2) We must be able to analyze datasets with more than 10k columns. I've noticed that my script just stops in the middle of execution when I try to run it with 12000 columns. I have not identified the root cause yet, but it works fine for 3600 columns or fewer. I suspect there may be a limitation in the report window generation causing an error.

Other things I've tried

1. Both the Fit Censored() and Fit Transform to Normal() functions are similar to the function I'm requesting, in that both can be used to get the fit parameters. They do not compute ppk or new limits, but I can use the fit parameters to compute ppk and new limits myself. However, these functions support only a subset of the distributions I would like to have.

2. I'm currently exploring writing something in Python and incorporating that into my existing script, replacing the usage of the Distribution and Process Capability platforms. This will take me a lot of time though.
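
The calculation mentioned in item 1 (deriving ppk and new limits from fitted parameters) could look roughly like the Python sketch below. It is an illustration under assumptions, not JMP's implementation. The helper names `lognormal`, `ppk`, and `limits_for_target` are hypothetical. The percentile method is assumed to be the ratio of distances from the median to the 0.135% and 99.865% quantiles, the z-score method is assumed to convert the fitted probabilities at the limits into standard normal z-values divided by 3, and "new limits" are assumed to mean the limits at which both sides equal the target ppk.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for z-scores
P_LO, P_HI = STD.cdf(-3.0), STD.cdf(3.0)  # about 0.135% and 99.865%


def _z(p):
    """Standard normal quantile, returning +/-inf for probabilities of 0 or 1."""
    if p <= 0.0:
        return -math.inf
    if p >= 1.0:
        return math.inf
    return STD.inv_cdf(p)


def lognormal(mu, sigma):
    """Return (cdf, quantile) callables for a lognormal with log-scale mu, sigma."""
    def cdf(x):
        return STD.cdf((math.log(x) - mu) / sigma) if x > 0 else 0.0

    def quantile(p):
        return math.exp(mu + sigma * STD.inv_cdf(p))

    return cdf, quantile


def ppk(cdf, quantile, lsl, usl, method="percentile"):
    """Ppk of a fitted distribution against two-sided limits (assumed definitions)."""
    if method == "percentile":
        med = quantile(0.5)
        upper = (usl - med) / (quantile(P_HI) - med)
        lower = (med - lsl) / (med - quantile(P_LO))
    elif method == "zscore":
        upper = _z(cdf(usl)) / 3.0
        lower = -_z(cdf(lsl)) / 3.0
    else:
        raise ValueError("method must be 'percentile' or 'zscore'")
    return min(upper, lower)


def limits_for_target(quantile, target, method="percentile"):
    """Limits (lsl, usl) at which both sides give exactly the target Ppk."""
    if method == "percentile":
        med = quantile(0.5)
        return (med - target * (med - quantile(P_LO)),
                med + target * (quantile(P_HI) - med))
    if method == "zscore":
        return (quantile(STD.cdf(-3.0 * target)),
                quantile(STD.cdf(3.0 * target)))
    raise ValueError("method must be 'percentile' or 'zscore'")
```

For example, with `cdf, quantile = lognormal(0.0, 0.25)`, passing the result of `limits_for_target(quantile, 1.33, "zscore")` back into `ppk(cdf, quantile, lsl, usl, "zscore")` recovers 1.33 up to floating-point error. Any other distribution can be plugged in by supplying its own `cdf` and `quantile` callables built from the fitted parameters.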

Unfortunately, the scripts to reproduce my problem are rather long. I can provide them on request, but hopefully my explanation of the problem above is sufficient.

My Specifics:

JMP Pro 17.2.0

macOS Ventura 13.6.1