I have 7000 rows, that consists of multiple batches. I am doing a distribution by batch and using the outlier box plot to look for outliers. But I have 100+ batches, is there a way to have JMP automatically select which rows are outliers?
now see if it works for you...
the script you sent was not adapted to your data at all...
You might want to look under Analyze > Screening > Explore Outliers.
I am not sure of exactly what you are looking for. If you want to select the rows for a given distribution, when viewing the output, you can simply drag across the displayed outlier data points in the outlier box plot, and it will select the outlier data points.
But if you want to go from column to column and select outlier rows, a simple script could do that, however, remember that row selection goes across all columns, and what you would end up will are selected rows where any of the columns are selected.
What you may really want, is a simple script that colors all of the individual cells that are outliers. Maybe red if they are to far above the mean, and orange if they are outliers below the mean.
Can you please clarify exactly what you are looking for?
See Dan's reply for a better answer.
The screening outliers does not work if I need a variable in the by. What I want to do is my Y is in one column. Another column is labeled batch. I put batch in by and make the number of distributions as I do batches. So say 100 distributions. I want a script to highlight all the outliers in all 100 distributions. Ideally it will only highlight the rows so I can pick my own color since I use colors for different thing. Make more sense now?
If you had searched the File Exchange portion of the JMP Community for 'outlier,' you would have found these two utility scripts:
Of course, the statistical basis for each of these tests must be suited to your situation and criterion.
I tried the Grubb's script. This just shows the outliers and does not highlight the rows right? I need something that will highlight the rows.
Hi,
perhaps here is a solution:
Ron
I can't get this to work with my data, I changed the height and weight to my columns and removed the two lines where it opened the table and embedded it into my table, but it says evaluations are done but will not do anything.
I also need something that uses the by and this for each batch individually.
let me try again
it is not a sleek solution but steps are clear and easy to modify
Names Default To Here( 1 );
// make a sample data table
dt = New Table( "Data",
add rows( 10000 ),
New Column( "it", Numeric, Continues, Format( "Best", 8 ), Formula( Row() ) ),
New Column( "Batch" ,Numeric, Ordinal, Format( "Best", 8 ), formula( random integer (110) )),
New Column( "y" ,Numeric, Continues, Format( "Best", 8 ), formula( random log normal (:Batch / 10, 5) )),
New Column( "Out" ,Numeric, Ordinal, Format( "Best", 8 ), formula (0)),
);
dt << run formulas;
column (dt, "it") << delete formula;
column (dt, "Batch") << delete formula;
column (dt, "y") << delete formula;
column (dt, "Out") << delete formula;
// now we get to work
// extract the list of the by variable levels
byvarlist = Associative Array( dt:Batch ) << get keys;
// run across each group and locate the outliers
For( i = 1, i <= N Items( byvarlist ), i++,
// produce a subset data table for each group using selection method
dt << select where (dt:batch == byvarlist[i]);
subdt = dt << Subset(
Selected Rows( 1 ),
);
dt << clear select;
// indicate outliers within the group
subdt:Out << set formula(
If(
:y > Col Quantile( :y, 0.75 ) + (Col Quantile( :y, 0.75 ) -
Col Quantile( :y, 0.25 )) * 1.5, 1,
:y < Col Quantile( :y, 0.25 ) - (Col Quantile( :y, 0.75 ) -
Col Quantile( :y, 0.25 )) * 1.5, 1,
0
)
);
subdt << run formulas;
// prepare table for updating original by removing formulas and columns that are not needed
column (subdt, "Out") << delete formula;
keep = {"it", "Out"};
todelete = subdt << get column names( "string" );
For( icol = 1, icol <= N Items( keep ), icol++,
Remove From( todelete, Contains( todelete, Keep[icol] ), 1 )
);
subdt << delete columns( todelete );
// update
dt << Update(
With( subdt ),
Match Columns( :it = :it )
);
Close( subdt, No Save );
);
// now the original data has all the outliers indicated in a column
selected = dt << select where (:Out == 1);
selected << Markers( 11 );
selected << Marker size ( 8 );// not working???
selected << Row Colors( red );// not working???
dt << clear select;