Default Kernel Std in Smooth Curve fit (Distribution Platform)

When selecting "Smooth Curve" in the Distribution platform JMP displays a fitted Kernel Density function along with a slider for varying the "Kernel Std".

I have looked in the manuals for details on how JMP sets the default Kernel Std, without luck. I have also experimented with ratios between the standard deviation (or SE) and the default Kernel Std for datasets of different sizes, trying to find a consistent pattern.

Does anyone know how JMP chooses the default Kernel Std, i.e. the one showing up initially before the user touches the slider?

And, second, is it possible to set the Kernel Std via a JSL script?

Message was edited by: MS
The Smooth Curve in the Distribution platform is a kernel density estimator. You are right, the method for estimating the initial Kernel Std is not in the documentation, but that was deliberate. The method is complicated, and we felt it better not to try to document it. The Kernel Std cannot be set through JSL, although I think making it settable from JSL is a good idea.
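For readers who want a concrete picture of what a kernel density estimator with a "Kernel Std" does: the sketch below is illustrative only. JMP's actual default-bandwidth method is undocumented (as stated above); Silverman's rule of thumb is merely a common default in other statistical software, shown here as an assumption, not as JMP's formula.

```python
import math

def silverman_bandwidth(data):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel:
    h = 0.9 * min(std, IQR/1.34) * n**(-1/5).
    NOTE: this is NOT JMP's undocumented method, just a common default."""
    n = len(data)
    mean = sum(data) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    s = sorted(data)

    def quantile(p):
        # simple linear-interpolation quantile on the sorted data
        k = (n - 1) * p
        lo, hi = int(math.floor(k)), int(math.ceil(k))
        return s[lo] + (s[hi] - s[lo]) * (k - lo)

    iqr = quantile(0.75) - quantile(0.25)
    return 0.9 * min(std, iqr / 1.34) * n ** (-0.2)

def kde(data, h):
    """Return a Gaussian kernel density estimate with bandwidth (Kernel Std) h."""
    n = len(data)
    c = 1.0 / (n * h * math.sqrt(2 * math.pi))
    return lambda x: c * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)
```

Moving the Kernel Std slider in JMP corresponds to changing `h` here: a small `h` hugs the data (possibly overfitting), a large `h` smooths modes away.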

The Distribution platform in JMP v9 will feature a new distribution you can fit, called Normal Mixtures. If a distribution is multimodal, it fits a normal distribution (with a separate mean and variance) to each group. The pdf ends up being a weighted sum of the component pdfs. I mention this because the Smooth Curve is just a special case of the Normal Mixture. If you fit enough groups, the Normal Mixture fit is essentially identical to the Smooth Curve fit. But the Normal Mixture fit has a big advantage in that it is much easier to understand and explain.
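The "weighted sum of component pdfs" can be written out directly. This sketch (function names are mine, not JMP's) also shows the special-case relationship: a kernel density estimate is a normal mixture with one component per data point, equal weights 1/n, and a common sigma equal to the Kernel Std.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a Normal(mu, sigma) distribution at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, weights, means, sigmas):
    """Weighted sum of component normal pdfs; weights should sum to 1."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas))

def kde_as_mixture(x, data, h):
    """A Gaussian KDE expressed as the special-case normal mixture:
    n components centered at the data points, equal weights, common sigma h."""
    n = len(data)
    return mixture_pdf(x, [1.0 / n] * n, data, [h] * n)
```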

The reason I am asking is that I am trying to figure out whether I can use quantiles from the smooth curve fit as a robust way to routinely identify outliers in interlaboratory tests, in cases where the distribution is clearly asymmetric or multimodal. The quantiles I can set and retrieve via JSL, but I am not sure I can trust the default Kernel Std to be adequate for each round of tests, which may have a different number of participants or distribution shape.

It would be neat to be able to set the Kernel Std via JSL to, e.g., 3/4 of the standard deviation (or whatever). But maybe the JMP definition and default choice, although too complicated to document, is good enough to allow comparison of the kernel density quantiles among different sets of data.

Some textbooks and other statistical software's documentation refer to the kernel density "bandwidth". Is that comparable to JMP's Kernel Std?

(Btw, I look forward to JMP v9, and as a Macintosh user I would very much welcome the return of at least some rudimentary AppleScript support.)

Yes, it is also called bandwidth.

If the data is simply asymmetric, then many of the available distributions will fit it, particularly the Johnson distributions and the Glog; they can handle heavy skewness. If it's not too skewed, then perhaps one of the common ones will work too, like Weibull or LogNormal. For different shapes of skewness, the fitted parameters will of course differ.

If the data is multimodal, then the only option right now is the smooth curve fit. If the shape of each data set is relatively similar, then the estimated Kernel Std will be roughly similar; if the shapes are very different, the Kernel Std will differ too. Don't try to compare the Kernel Stds directly, but it is OK to make a rough comparison of the quantiles. I haven't formally investigated the performance of the smooth curve fit across different samples that are supposed to be similar or different, so I am guessing at these things. And, of course, the quantiles can vary depending on overfitting or underfitting of the smooth curve.

If you want to estimate quantiles without worrying about a fitted distribution, there is an empirical quantile function in the formula editor, under the Statistical group. Table > Summary can also produce quantiles.

If you want to investigate for outliers, sometimes the best way to identify them is by eye, e.g. on a histogram or the outlier box plot. But there may well be an automated/quantitative way to do it. Keep in mind, though, that what is an outlier relative to one fitted distribution may not be an outlier relative to another.
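The outlier box plot mentioned above flags points beyond the 1.5 x IQR whisker fences (Tukey's convention). A quantitative version of that rule can be sketched as follows; whether it is adequate for interlaboratory data with asymmetric or multimodal distributions is exactly the open question in this thread, so take it as a starting point only.

```python
def iqr_outliers(data, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR], the rule behind
    box-plot whiskers (k=1.5 is Tukey's convention)."""
    s = sorted(data)
    n = len(s)

    def q(p):
        # linear-interpolation quantile
        idx = (n - 1) * p
        lo = int(idx)
        hi = min(lo + 1, n - 1)
        return s[lo] + (s[hi] - s[lo]) * (idx - lo)

    q1, q3 = q(0.25), q(0.75)
    iqr = q3 - q1
    return [x for x in data if x < q1 - k * iqr or x > q3 + k * iqr]
```

Note that this rule is symmetric around the quartiles, which is part of why it can misbehave on strongly skewed or multimodal data.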