cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Browse apps to extend the software in the new JMP Marketplace
Choose Language Hide Translation Bar
lmaley
Level I

Automatically selecting outliers

I have 7000 rows, that consists of multiple batches.   I am doing a distribution by batch and using the outlier box plot to look for outliers.  But I have 100+ batches, is there a way to have JMP automatically select which rows are outliers?

 

1 ACCEPTED SOLUTION

Accepted Solutions
ron_horne
Super User (Alumni)

Re: Automatically selecting outliers

now see if it works for you...

the script you sent was not adapted to your data at all...

View solution in original post

24 REPLIES 24

Re: Automatically selecting outliers

You might want to look under Analyze > Screening > Explore Outliers.

Dan Obermiller
txnelson
Super User

Re: Automatically selecting outliers

I am not sure of exactly what you are looking for.  If you want to select the rows for a given distribution, when viewing the output, you can simply drag across the displayed outlier data points in the outlier box plot, and it will select the outlier data points.

outlier1.PNG

But if you want to go from column to column and select outlier rows, a simple script could do that, however, remember that row selection goes across all columns, and what you would end up will are selected rows where any of the columns are selected.

What you may really want, is a simple script that colors all of the individual cells that are outliers.  Maybe red if they are to far above the mean, and orange if they are outliers below the mean.

Can you please clarify exactly what you are looking for?

 

See Dan's reply for a better answer.

Jim
lmaley
Level I

Re: Automatically selecting outliers

The screening outliers does not work if I need a variable in the by.   What I want to do is my Y is in one column.   Another column is labeled batch.   I put batch in by and make the number of distributions as I do batches.   So say 100 distributions.   I want a script to highlight all the outliers in all 100 distributions.  Ideally it will only highlight the rows so I can pick my own color since I use colors for different thing.  Make more sense now?

Re: Automatically selecting outliers

If you had searched the File Exchange portion of the JMP Community for 'outlier,' you would have found these two utility scripts:

 

 

Of course, the statistical basis for each of these tests must be suited to your situation and criterion.

lmaley
Level I

Re: Automatically selecting outliers

I tried the Grubb's script.   This just shows the outliers and does not highlight the rows right?  I need something that will highlight the rows.

ron_horne
Super User (Alumni)

Re: Automatically selecting outliers

lmaley
Level I

Re: Automatically selecting outliers

I can't get this to work with my data, I changed the height and weight to my columns and removed the two lines where it opened the table and embedded it into my table, but it says evaluations are done but will not do anything.

 

I also need something that uses the by and this for each batch individually.   

 

 

ron_horne
Super User (Alumni)

Re: Automatically selecting outliers

try to split the table by batch and group by unique value (so you do not lose any rows) then run my script.
otherwise you need a different strategy than the one i offer.
does it make sense?
ron_horne
Super User (Alumni)

Re: Automatically selecting outliers

let me try again

it is not a sleek solution but steps are clear and easy to modify

 

Names Default To Here( 1 );

// make a sample data table
dt = New Table( "Data",
	add rows( 10000 ),
	New Column( "it", Numeric, Continues, Format( "Best", 8 ), Formula( Row() ) ),
	New Column( "Batch" ,Numeric, Ordinal, Format( "Best", 8 ), 	formula( random integer (110) )),
	New Column( "y" ,Numeric, Continues, Format( "Best", 8 ), 	formula( random log normal (:Batch / 10, 5) )),
	New Column( "Out" ,Numeric, Ordinal, Format( "Best", 8 ), formula (0)),

);

dt << run formulas;
column (dt, "it") << delete formula;
column (dt, "Batch") << delete formula;
column (dt, "y") << delete formula;
column (dt, "Out") << delete formula;

// now we get to work

// extract the list of the by variable levels
byvarlist = Associative Array( dt:Batch ) << get keys;

// run across each group and locate the outliers
For( i = 1, i <= N Items( byvarlist ), i++,

// produce a subset data table for each group using selection method
dt << select where (dt:batch == byvarlist[i]); 
subdt = dt << Subset(
	Selected Rows( 1 ),
); 

dt << clear select;

// indicate outliers within the group
subdt:Out << set formula(
	If(
		:y > Col Quantile( :y, 0.75 ) + (Col Quantile( :y, 0.75 ) -
		Col Quantile( :y, 0.25 )) * 1.5, 1,
		:y < Col Quantile( :y, 0.25 ) - (Col Quantile( :y, 0.75 ) -
		Col Quantile( :y, 0.25 )) * 1.5, 1,
		0
	)
);
subdt << run formulas;

// prepare table for updating original by removing formulas and columns that are not needed
column (subdt, "Out") << delete formula;

keep = {"it", "Out"};
todelete = subdt << get column names( "string" );
For( icol = 1, icol <= N Items( keep ), icol++,
	Remove From( todelete, Contains( todelete, Keep[icol] ), 1 )
);

subdt << delete columns( todelete );

// update
dt << Update(
	With( subdt ),
	Match Columns( :it = :it )
);

Close( subdt, No Save );

);

// now the original data has all the outliers indicated in a column
selected = dt << select where (:Out == 1);
selected << Markers( 11 );
selected << Marker size ( 8 );// not working???
selected << Row Colors( red );// not working???
dt << clear select;