Dennisbur
Level IV

Data Dispersion

Hello,

I have about 80K tests with numeric results.

Some of the tests look OK, without much dispersion in the data, while others show a lot.

I need to detect the tests whose data are very widely dispersed.

I can see this visually in the graph, but with 80K tests I need a mathematical formula that will detect and show me the tests with a very wide distribution.

I tried using the mean and 3-sigma limits (standard deviation), but that does not flag the problematic tests.

Is there a technique in JMP to detect these tests?

 

[Attached image: Dennisbur_0-1675700986610.png]

 

4 REPLIES
txnelson
Super User

Re: Data Dispersion

I would look for something like the percent of points outside of the interquartile range. That is where I would start, and if that doesn't work, move the limits further out and see whether you can find a cutoff that gives you clarity.
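
As a rough illustration of this idea outside JMP, the sketch below scores each test by the fraction of its points that fall outside limits built from the quartiles. The function names, the multiplier `k`, and the sample data are all assumptions made for the example: `k = 0` counts points outside the interquartile range itself, and a larger `k` moves the limits further out.

```python
from statistics import quantiles


def fraction_outside_fences(values, k=1.5):
    """Fraction of values outside [Q1 - k*IQR, Q3 + k*IQR].

    k is a hypothetical tuning knob: 0 uses the quartiles themselves,
    larger values push the limits further out.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outside = sum(1 for v in values if v < low or v > high)
    return outside / len(values)


def rank_tests(results, k=1.5):
    """Sort tests from the largest to the smallest fraction of outside points.

    results maps a test name to its list of numeric results.
    """
    scores = {name: fraction_outside_fences(vals, k) for name, vals in results.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up data for illustration only.
results = {
    "test_A": [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9],
    "test_B": [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 25.0, -4.0],
}

ranking = rank_tests(results)
for name, score in ranking:
    print(f"{name}: {score:.1%} of points outside the limits")
```

Because the limits come from quartiles rather than from the mean and standard deviation, a few extreme points do not widen them much, which may be why a 3-sigma rule misses tests that this kind of score picks up.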

Jim

Re: Data Dispersion

Dispersion typically refers to the variance. I think you are asking about highly skewed data. If so, there are a couple of ways that you might go about it. You could launch the Distribution platform with the data columns for all the tests. Press the Control key on Windows or the Command key on Mac and click the red triangle next to Sample Statistics. Select Skewness. Now right-click on the result and select Make Into Data Table. You can then look for columns with unusually large positive skewness (a long right-hand tail).
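
For readers who want to see what such a skewness screen computes, here is a minimal sketch in Python. It uses the adjusted Fisher-Pearson sample skewness, which I am assuming is comparable to what the Distribution platform reports. The column names and data are made up.

```python
from math import sqrt
from statistics import mean


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (needs at least 3 values)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    m = mean(values)
    s = sqrt(sum((v - m) ** 2 for v in values) / (n - 1))  # sample std dev
    if s == 0:
        return 0.0  # constant column: treat as not skewed
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)


# Made-up columns for illustration only.
columns = {
    "test_A": [1.0, 2.0, 3.0, 4.0, 5.0],   # symmetric
    "test_B": [1.0, 1.0, 1.0, 1.0, 10.0],  # long right-hand tail
}

# Largest positive skewness first.
ranked = sorted(
    ((name, sample_skewness(vals)) for name, vals in columns.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, skew in ranked:
    print(f"{name}: skewness = {skew:.3f}")
```

Sorting the resulting table by skewness puts the most right-skewed tests at the top, so only the first few need to be inspected by eye.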

 

You might instead find one of the outlier tools useful to identify these columns. See the documentation for exploring outliers.

jthi
Super User

Re: Data Dispersion

JMP does offer methods of outlier detection. One starting point could be the Outliers blog posts.

-Jarmo
statman
Super User

Re: Data Dispersion

The others have provided some options, but I don't understand the situation well enough.  Here are my thoughts/questions:

1. Are you looking for a way to find these after the fact or while it is happening?  This can greatly impact the tools you want to use.

2. Why do you want to find these "events"?  Do you plan to anticipate, predict or eliminate these events?  Are you interested in understanding why this happens? Or do you just want to find them and react?

3. It is hard to tell from your picture whether these are indeed "outliers", whether the distributions are skewed or whether there are issues with some of the tests (e.g., measurement errors, processing errors).  Can you post a subset of the data table?

"All models are wrong, some are useful" G.E.P. Box