Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- JMP User Community
- :
- Discussions
- :
- Automatically selecting outliers

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

Highlighted

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Nov 26, 2018 7:59 AM
(6307 views)

I have 7000 rows, that consists of multiple batches. I am doing a distribution by batch and using the outlier box plot to look for outliers. But I have 100+ batches, is there a way to have JMP automatically select which rows are outliers?

1 ACCEPTED SOLUTION

Accepted Solutions

Highlighted

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

now see if it works for you...

the script you sent was not adapted to your data at all...

24 REPLIES 24

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Automatically selecting outliers

You might want to look under Analyze > Screening > Explore Outliers.

Dan Obermiller

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Automatically selecting outliers

Created:
Nov 26, 2018 8:22 AM
| Last Modified: Nov 26, 2018 8:25 AM
(6301 views)
| Posted in reply to message from lmaley 11-26-2018

I am not sure of exactly what you are looking for. If you want to select the rows for a given distribution, when viewing the output, you can simply drag across the displayed outlier data points in the outlier box plot, and it will select the outlier data points.

But if you want to go from column to column and select outlier rows, a simple script could do that, however, remember that row selection goes across all columns, and what you would end up will are selected rows where any of the columns are selected.

What you may really want, is a simple script that colors all of the individual cells that are outliers. Maybe red if they are to far above the mean, and orange if they are outliers below the mean.

Can you please clarify exactly what you are looking for?

See Dan's reply for a better answer.

Jim

Highlighted
##

The screening outliers does not work if I need a variable in the by. What I want to do is my Y is in one column. Another column is labeled batch. I put batch in by and make the number of distributions as I do batches. So say 100 distributions. I want a script to highlight all the outliers in all 100 distributions. Ideally it will only highlight the rows so I can pick my own color since I use colors for different thing. Make more sense now?

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Automatically selecting outliers

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Automatically selecting outliers

If you had searched the **File Exchange** portion of the **JMP Community** for '*outlier*,' you would have found these two utility scripts:

Of course, the statistical basis for each of these tests must be suited to your situation and criterion.

Learn it once, use it forever!

Highlighted
##

I tried the Grubb's script. This just shows the outliers and does not highlight the rows right? I need something that will highlight the rows.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Automatically selecting outliers

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Automatically selecting outliers

Hi,

perhaps here is a solution:

Ron

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Automatically selecting outliers

I can't get this to work with my data, I changed the height and weight to my columns and removed the two lines where it opened the table and embedded it into my table, but it says evaluations are done but will not do anything.

I also need something that uses the by and this for each batch individually.

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Automatically selecting outliers

try to split the table by batch and group by unique value (so you do not lose any rows) then run my script.

otherwise you need a different strategy than the one i offer.

does it make sense?

otherwise you need a different strategy than the one i offer.

does it make sense?

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Automatically selecting outliers

let me try again

it is not a sleek solution but steps are clear and easy to modify

```
Names Default To Here( 1 );
// make a sample data table
dt = New Table( "Data",
add rows( 10000 ),
New Column( "it", Numeric, Continues, Format( "Best", 8 ), Formula( Row() ) ),
New Column( "Batch" ,Numeric, Ordinal, Format( "Best", 8 ), formula( random integer (110) )),
New Column( "y" ,Numeric, Continues, Format( "Best", 8 ), formula( random log normal (:Batch / 10, 5) )),
New Column( "Out" ,Numeric, Ordinal, Format( "Best", 8 ), formula (0)),
);
dt << run formulas;
column (dt, "it") << delete formula;
column (dt, "Batch") << delete formula;
column (dt, "y") << delete formula;
column (dt, "Out") << delete formula;
// now we get to work
// extract the list of the by variable levels
byvarlist = Associative Array( dt:Batch ) << get keys;
// run across each group and locate the outliers
For( i = 1, i <= N Items( byvarlist ), i++,
// produce a subset data table for each group using selection method
dt << select where (dt:batch == byvarlist[i]);
subdt = dt << Subset(
Selected Rows( 1 ),
);
dt << clear select;
// indicate outliers within the group
subdt:Out << set formula(
If(
:y > Col Quantile( :y, 0.75 ) + (Col Quantile( :y, 0.75 ) -
Col Quantile( :y, 0.25 )) * 1.5, 1,
:y < Col Quantile( :y, 0.25 ) - (Col Quantile( :y, 0.75 ) -
Col Quantile( :y, 0.25 )) * 1.5, 1,
0
)
);
subdt << run formulas;
// prepare table for updating original by removing formulas and columns that are not needed
column (subdt, "Out") << delete formula;
keep = {"it", "Out"};
todelete = subdt << get column names( "string" );
For( icol = 1, icol <= N Items( keep ), icol++,
Remove From( todelete, Contains( todelete, Keep[icol] ), 1 )
);
subdt << delete columns( todelete );
// update
dt << Update(
With( subdt ),
Match Columns( :it = :it )
);
Close( subdt, No Save );
);
// now the original data has all the outliers indicated in a column
selected = dt << select where (:Out == 1);
selected << Markers( 11 );
selected << Marker size ( 8 );// not working???
selected << Row Colors( red );// not working???
dt << clear select;
```

Article Labels

There are no labels assigned to this post.