Default Kernel Std in Smooth Curve fit (Distribution Platform)

Sep 25, 2009 4:33 AM
(4775 views)

When selecting "Smooth Curve" in the Distribution platform, JMP displays a fitted kernel density function along with a slider for varying the "Kernel Std".

I have searched the manuals, without luck, for details on how JMP sets the default Kernel Std. I have also experimented with the ratio between the StdDev (or SE) and the default Kernel Std for datasets of different sizes, hoping to find a consistent pattern.

Does anyone know how JMP chooses the default Kernel Std, i.e. the one showing up initially before the user touches the slider?

And, second, is it possible to set the Kernel Std from a JSL script?

Message was edited by: MS


4 REPLIES

Re: Default Kernel Std in Smooth Curve fit (Distribution Platform)

The Smooth Curve in the Distribution platform is a kernel density estimator. You are right that the method for estimating the initial Kernel Std is not in the documentation, but that was on purpose: the method is a complicated one, and we felt it better not to try to document it. The Kernel Std cannot be set through JSL, although I think it would be a good idea to make that possible.

The Distribution platform in JMP v9 will feature a new distribution you can fit, called Normal Mixtures. If a distribution is multimodal, it fits a normal distribution (with a separate mean and variance) to each group, and the resulting pdf is a weighted sum of the component pdfs. I mention this because the Smooth Curve is just a special case of the Normal Mixture: if you fit enough groups, the Normal Mixture fit is essentially identical to the Smooth Curve fit. The Normal Mixture fit, however, has the big advantage of being much easier to understand and explain.
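To spell out the "special case" relationship, here is the textbook form of both estimators. This assumes a Gaussian kernel, which the name "Kernel Std" suggests but which is not confirmed anywhere in this thread; the symbols $w_k$, $\mu_k$, $\sigma_k$ and $h$ are notation chosen for this sketch, not JMP's own.

```latex
% Normal mixture with K components; \phi(x;\mu,\sigma) is the normal pdf
% with mean \mu and standard deviation \sigma
f(x) = \sum_{k=1}^{K} w_k \, \phi(x;\, \mu_k, \sigma_k),
\qquad \sum_{k=1}^{K} w_k = 1

% Gaussian kernel density estimate from observations x_1, \dots, x_n
% with kernel standard deviation (bandwidth) h
\hat{f}_h(x) = \frac{1}{n} \sum_{i=1}^{n} \phi(x;\, x_i, h)
```

Under that assumption the kernel density estimate is a normal mixture with $K = n$, $w_k = 1/n$, $\mu_k = x_k$ and a common $\sigma_k = h$: one equally weighted component centered on each data point, all sharing the kernel std.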

Re: Default Kernel Std in Smooth Curve fit (Distribution Platform)

Thanks for your reply!

The reason I am asking is that I am trying to figure out whether I can use quantiles of the smooth curve fit as a robust way to routinely identify outliers in interlaboratory tests, in cases where the distribution is clearly asymmetric or multimodal. I can set and retrieve the quantiles via JSL, but I am not sure whether I can trust the default Kernel Std to be adequate for each round of tests, which may differ in the number of participants or in distribution shape.

It would be neat to be able to set the Kernel Std to e.g. 3/4 of the standard deviation (or whatever) by JSL. But maybe the JMP definition and default choice, although too complicated to document, is good enough to allow comparison of the kernel density quantiles among different sets of data.

Some textbooks and the documentation of other statistical software refer to the kernel density "bandwidth". Is that comparable to JMP's Kernel Std?

(Btw, I look forward to JMP v9, and as a Macintosh user I would very much welcome the return of at least some rudimentary AppleScript support.)

Message was edited by: MS

Re: Default Kernel Std in Smooth Curve fit (Distribution Platform)

Probably a bit late, but I found this post while looking for a way to set the kernel value. Here is a solution for creating a reproducible smooth fit with an adjusted bandwidth, in case somebody looks for it again. I hope, though, that the JMP team improves on that...

```
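// Open the sample table and launch Distribution with a Smooth Curve fit on :height
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
dst = Distribution(
	Automatic Recalc( 1 ),
	Continuous Distribution(
		Column( :height ),
		Vertical( 0 ),
		Fit Distribution( Smooth Curve ),
		Confidence Interval( 0.95 )
	),
	Histograms Only
);
// Get the report layer so that its display boxes can be addressed
dstr = dst << Report;
// Set the first slider box in the report (the Kernel Std slider here) to a fixed value
dstr[SliderBox( 1 )] << Set Value( 0.7 );
// Fetch the scriptable object behind the third outline box (the fit) and request quantiles
npd = dstr[OutlineBox( 3 )] << Get Scriptable Object;
npd << Quantiles( 0.00135, 0.99865 );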
```

Re: Default Kernel Std in Smooth Curve fit (Distribution Platform)

Yes, it is also called bandwidth.

If the data is simply asymmetric, then many of the available distributions will fit it, particularly the Johnson distributions and Glog, which can handle heavy skewness. If it's not too skewed, then perhaps one of the common ones will work too, like Weibull or LogNormal. For different shapes of skewness, the fitted parameters will of course be different.

If data is multimodal, then the only option now is the smooth curve fit. If the shape of each data set is relatively the same, then the estimated kernel std will be roughly similar. If the shapes are very different, then the kernel std is different. Don't try to compare the kernel std's, but it is ok to make a rough comparison of the quantiles. I haven't formally investigated the performance of the smooth curve fit across different samples that are supposed to be similar or different, so I am guessing at these things. And, of course, quantiles can vary depending on overfitting or underfitting of the smooth curve.
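To illustrate how the quantiles move with overfitting or underfitting, here is a rough sketch in plain Python (outside JMP). It assumes a Gaussian kernel and simply sets the kernel std to a chosen fraction of the sample standard deviation; it does not reproduce JMP's undocumented default. The function names and the data are made up for the example.

```python
import math
import statistics

def kde_cdf(x, data, h):
    """CDF of a Gaussian kernel density estimate with kernel std (bandwidth) h."""
    s = h * math.sqrt(2.0)
    return sum(0.5 * (1.0 + math.erf((x - xi) / s)) for xi in data) / len(data)

def kde_quantile(p, data, h, tol=1e-9):
    """Invert the KDE CDF by bisection to get the p-th quantile."""
    lo = min(data) - 10.0 * h  # CDF is essentially 0 here
    hi = max(data) + 10.0 * h  # CDF is essentially 1 here
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, data, h) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Made-up bimodal sample, for illustration only
data = [9.8, 10.1, 10.0, 10.3, 9.9, 10.2, 12.9, 13.1, 13.0, 13.2]
sd = statistics.stdev(data)

# Small ratios follow the data closely (overfit); large ratios oversmooth
for ratio in (0.25, 0.75, 1.5):
    h = ratio * sd
    q_lo = kde_quantile(0.00135, data, h)
    q_hi = kde_quantile(0.99865, data, h)
    print(f"kernel std = {ratio:.2f} * sd: [{q_lo:.2f}, {q_hi:.2f}]")
```

The interval between the two tail quantiles widens as the kernel std grows, which is why a rough comparison of quantiles across data sets is only meaningful when the amount of smoothing is comparable.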

If you want to estimate quantiles without worrying about a fitted distribution, there is an empirical quantile function in the formula editor, under the Statistical group. Tables > Summary can also spit out quantiles.
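For reference, this is roughly what an empirical quantile does, sketched in plain Python (outside JMP). It uses one common convention, linear interpolation between order statistics; JMP's own quantile definition may differ, and `empirical_quantile` is a name made up for this example.

```python
import math

def empirical_quantile(data, p):
    """p-th sample quantile by linear interpolation between order statistics."""
    xs = sorted(data)
    pos = p * (len(xs) - 1)  # fractional index into the sorted sample
    k = math.floor(pos)
    if k + 1 >= len(xs):     # p == 1 lands on the maximum
        return xs[-1]
    frac = pos - k
    return xs[k] + frac * (xs[k + 1] - xs[k])

# Made-up sample, for illustration only
data = [9.8, 10.1, 10.0, 10.3, 9.9, 10.2, 12.9, 13.1, 13.0, 13.2]
print(empirical_quantile(data, 0.00135), empirical_quantile(data, 0.99865))
```

With only ten values, the 0.00135 and 0.99865 quantiles sit essentially at the minimum and maximum, so empirical quantiles this far in the tails say little for small rounds of tests.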

If you want to investigate for outliers, sometimes the best way to identify them is by eye, for example on a histogram or the outlier box plot. There is probably an automated/quantitative way to do it as well. Keep in mind, though, that what is an outlier to one fitted distribution may not be an outlier to another.
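One simple automated rule, sketched here in plain Python (outside JMP), is the conventional 1.5 × IQR fence that outlier box plots are typically based on. Quartile conventions differ between packages, so the flagged points may not match JMP exactly; `tukey_outliers` and the data are made up for the example.

```python
import statistics

def tukey_outliers(data, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]

# Made-up results with one obviously deviating value
results = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(tukey_outliers(results))
```

This rule does not depend on any fitted distribution, but it implicitly assumes a single, roughly symmetric bulk of data, so for clearly multimodal results it can flag points that belong to a legitimate second mode.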
