cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Submit your abstract to the call for content for Discovery Summit Americas by April 23. Selected abstracts will be presented at Discovery Summit, Oct. 21- 24.
Discovery is online this week, April 16 and 18. Join us for these exciting interactive sessions.
Choose Language Hide Translation Bar
lmaley
Level I

Automatically selecting outliers

I have 7000 rows, that consists of multiple batches.   I am doing a distribution by batch and using the outlier box plot to look for outliers.  But I have 100+ batches, is there a way to have JMP automatically select which rows are outliers?

 

24 REPLIES 24
lmaley
Level I

Re: Automatically selecting outliers

when I ran this, it made a subset table with the first batch only. I
changed this to do my open table:

dt = currentdatatable();
byvarlist = Associative Array( dt:Batch ) << get keys;
For( i = 1, i <= N Items( byvarlist ), i++,
dt << select where( dt:batch == byvarlist[i] );
subdt = dt << Subset( Selected Rows( 1 ) );
dt << clear select;
subdt:Out << set formula(
If(
:y > Col Quantile( :y, 0.75 ) + (Col Quantile( :y, 0.75 )
-Col Quantile( :y, 0.25 )) * 1.5, 1,
:y < Col Quantile( :y, 0.25 ) - (Col Quantile( :y, 0.75 )
-Col Quantile( :y, 0.25 )) * 1.5, 1,
0
)
);
subdt << run formulas;
Column( subdt, "Out" ) << delete formula;
keep = {"it", "Out"};
todelete = subdt << get column names( "string" );
For( icol = 1, icol <= N Items( keep ), icol++,
Remove From( todelete, Contains( todelete, Keep[icol] ), 1 )
);
subdt << delete columns( todelete );
dt << Update( With( subdt ), Match Columns( :it = :it ) );
Close( subdt, No Save );
);
selected = dt << select where( :Out == 1 );
selected << Markers( 11 );
selected << Marker Size( 8 );
selected << Row Colors( red );
dt << clear select;
ron_horne
Super User (Alumni)

Re: Automatically selecting outliers

the column name is case sensitive, be consistent when using :Batch or :batch
lmaley
Level I

Re: Automatically selecting outliers

My batch column is capitilzed:

[image: image.png]
ron_horne
Super User (Alumni)

Re: Automatically selecting outliers

in the for loop it isn't
lmaley
Level I

Re: Automatically selecting outliers

I am not sure if you got my last message I just sent, I got a weird
automated email back. I fixed the second one and it is still doing the
same thing.
ron_horne
Super User (Alumni)

Re: Automatically selecting outliers

this worked for me.

at the end you should have 3 tables. original with all rows, one with just outliers and the last without outliers. let me know if it works.


 

Names Default To Here( 1 );

// make a sample data table
dt = New Table( "Data",
    add rows( 10000 ),
    New Column( "it", Numeric, Continues, Format( "Best", 8 ), Formula( Row() ) ),
    New Column( "Batch", Numeric, Ordinal, Format( "Best", 8 ), formula( Random Integer( 110 ) ) ),
    New Column( "y", Numeric, Continues, Format( "Best", 8 ), formula( Random Lognormal( :Batch / 10, 5 ) ) ),
    New Column( "Out", Numeric, Ordinal, Format( "Best", 8 ), formula( 0 ) ),

);

dt << run formulas;
Column( dt, "it" ) << delete formula;
Column( dt, "Batch" ) << delete formula;
Column( dt, "y" ) << delete formula;
Column( dt, "Out" ) << delete formula;

// now we get to work
dt = Current Data Table();
byvarlist = Associative Array( dt:Batch ) << get keys;
For( i = 1, i <= N Items( byvarlist ), i++,
    dt << select where( dt:Batch == byvarlist[i] );
    subdt = dt << Subset( Selected Rows( 1 ) );
    dt << clear select;
    subdt:Out << set formula(
        If(
            :y > Col Quantile( :y, 0.75 ) + (Col Quantile( :y, 0.75 ) - Col Quantile( :y, 0.25 )) * 1.5, 1,
            :y < Col Quantile( :y, 0.25 ) - (Col Quantile( :y, 0.75 ) - Col Quantile( :y, 0.25 )) * 1.5, 1,
            0
        )
    );
    subdt << run formulas;
    Column( subdt, "Out" ) << delete formula;
    keep = {"it", "Out"};
    todelete = subdt << get column names( "string" );
    For( icol = 1, icol <= N Items( keep ), icol++,
        Remove From( todelete, Contains( todelete, Keep[icol] ), 1 )
    );
    subdt << delete columns( todelete );
    dt << Update( With( subdt ), Match Columns( :it = :it ) );
    Close( subdt, No Save );
    Wait( 0.00001 );
);
selected = dt << select where( :Out == 1 );
selected << Markers( 11 );
selected << Marker Size( 8 );
selected << Row Colors( red );
dt << clear select;

dt << select where( dt:Out == 1 );
Outdt = dt << Subset( Selected Rows( 1 ) );
dt << clear select;
    
Wait( 0.00001 );
    
dt << select where( dt:Out == 0 );
indt = dt << Subset( Selected Rows( 1 ) );
dt << clear select;

 

lmaley
Level I

Re: Automatically selecting outliers

I copy everything from //now we get to work on down. When I run this in
my own table, it just creates one table with one batch only. I must say
one thing, is that the number of points per batch are not consistent. I do
not know if that matters.
ron_horne
Super User (Alumni)

Re: Automatically selecting outliers

if you run the whole script can you see that you get what you wanted (just not on your data)?
ron_horne
Super User (Alumni)

Re: Automatically selecting outliers

my formula is not sensitive to batch size. furthermore, it will work even if batch column is character (and not numeric).
can you post a sample, even with just 2 or three different batches?
lmaley
Level I

Re: Automatically selecting outliers

When I run it, I get one table, and here is a snapshot of it, this will
work since I can select rows where Out = 1, if I want to select all of the
outliers.

[image: image.png]