cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
JMP is taking Discovery online, April 16 and 18. Register today and join us for interactive sessions featuring popular presentation topics, networking, and discussions with the experts.
Choose Language Hide Translation Bar
lmaley
Level I

Automatically selecting outliers

I have 7000 rows, that consists of multiple batches.   I am doing a distribution by batch and using the outlier box plot to look for outliers.  But I have 100+ batches, is there a way to have JMP automatically select which rows are outliers?

 

24 REPLIES 24
lmaley
Level I

Re: Automatically selecting outliers

when I ran this, it made a subset table with the first batch only. I
changed this to do my open table:

dt = currentdatatable();
byvarlist = Associative Array( dt:Batch ) << get keys;
For( i = 1, i <= N Items( byvarlist ), i++,
dt << select where( dt:batch == byvarlist[i] );
subdt = dt << Subset( Selected Rows( 1 ) );
dt << clear select;
subdt:Out << set formula(
If(
:y > Col Quantile( :y, 0.75 ) + (Col Quantile( :y, 0.75 )
-Col Quantile( :y, 0.25 )) * 1.5, 1,
:y < Col Quantile( :y, 0.25 ) - (Col Quantile( :y, 0.75 )
-Col Quantile( :y, 0.25 )) * 1.5, 1,
0
)
);
subdt << run formulas;
Column( subdt, "Out" ) << delete formula;
keep = {"it", "Out"};
todelete = subdt << get column names( "string" );
For( icol = 1, icol <= N Items( keep ), icol++,
Remove From( todelete, Contains( todelete, Keep[icol] ), 1 )
);
subdt << delete columns( todelete );
dt << Update( With( subdt ), Match Columns( :it = :it ) );
Close( subdt, No Save );
);
selected = dt << select where( :Out == 1 );
selected << Markers( 11 );
selected << Marker Size( 8 );
selected << Row Colors( red );
dt << clear select;
ron_horne
Super User (Alumni)

Re: Automatically selecting outliers

the column name is case sensitive, be consistent when using :Batch or :batch
lmaley
Level I

Re: Automatically selecting outliers

My batch column is capitilzed:

[image: image.png]
ron_horne
Super User (Alumni)

Re: Automatically selecting outliers

in the for loop it isn't
lmaley
Level I

Re: Automatically selecting outliers

I am not sure if you got my last message I just sent, I got a weird
automated email back. I fixed the second one and it is still doing the
same thing.
ron_horne
Super User (Alumni)

Re: Automatically selecting outliers

this worked for me.

at the end you should have 3 tables. original with all rows, one with just outliers and the last without outliers. let me know if it works.


 

Names Default To Here( 1 );

// make a sample data table
dt = New Table( "Data",
    add rows( 10000 ),
    New Column( "it", Numeric, Continues, Format( "Best", 8 ), Formula( Row() ) ),
    New Column( "Batch", Numeric, Ordinal, Format( "Best", 8 ), formula( Random Integer( 110 ) ) ),
    New Column( "y", Numeric, Continues, Format( "Best", 8 ), formula( Random Lognormal( :Batch / 10, 5 ) ) ),
    New Column( "Out", Numeric, Ordinal, Format( "Best", 8 ), formula( 0 ) ),

);

dt << run formulas;
Column( dt, "it" ) << delete formula;
Column( dt, "Batch" ) << delete formula;
Column( dt, "y" ) << delete formula;
Column( dt, "Out" ) << delete formula;

// now we get to work
dt = Current Data Table();
byvarlist = Associative Array( dt:Batch ) << get keys;
For( i = 1, i <= N Items( byvarlist ), i++,
    dt << select where( dt:Batch == byvarlist[i] );
    subdt = dt << Subset( Selected Rows( 1 ) );
    dt << clear select;
    subdt:Out << set formula(
        If(
            :y > Col Quantile( :y, 0.75 ) + (Col Quantile( :y, 0.75 ) - Col Quantile( :y, 0.25 )) * 1.5, 1,
            :y < Col Quantile( :y, 0.25 ) - (Col Quantile( :y, 0.75 ) - Col Quantile( :y, 0.25 )) * 1.5, 1,
            0
        )
    );
    subdt << run formulas;
    Column( subdt, "Out" ) << delete formula;
    keep = {"it", "Out"};
    todelete = subdt << get column names( "string" );
    For( icol = 1, icol <= N Items( keep ), icol++,
        Remove From( todelete, Contains( todelete, Keep[icol] ), 1 )
    );
    subdt << delete columns( todelete );
    dt << Update( With( subdt ), Match Columns( :it = :it ) );
    Close( subdt, No Save );
    Wait( 0.00001 );
);
selected = dt << select where( :Out == 1 );
selected << Markers( 11 );
selected << Marker Size( 8 );
selected << Row Colors( red );
dt << clear select;

dt << select where( dt:Out == 1 );
Outdt = dt << Subset( Selected Rows( 1 ) );
dt << clear select;
    
Wait( 0.00001 );
    
dt << select where( dt:Out == 0 );
indt = dt << Subset( Selected Rows( 1 ) );
dt << clear select;

 

lmaley
Level I

Re: Automatically selecting outliers

I copy everything from //now we get to work on down. When I run this in
my own table, it just creates one table with one batch only. I must say
one thing, is that the number of points per batch are not consistent. I do
not know if that matters.
ron_horne
Super User (Alumni)

Re: Automatically selecting outliers

if you run the whole script can you see that you get what you wanted (just not on your data)?
ron_horne
Super User (Alumni)

Re: Automatically selecting outliers

my formula is not sensitive to batch size. furthermore, it will work even if batch column is character (and not numeric).
can you post a sample, even with just 2 or three different batches?
lmaley
Level I

Re: Automatically selecting outliers

When I run it, I get one table, and here is a snapshot of it, this will
work since I can select rows where Out = 1, if I want to select all of the
outliers.

[image: image.png]