Solved: Re: Diving into Explore Outliers

Yngeinstn · May 14, 2019 6:48 AM

I just stumbled on the Explore Outliers platform and I am giddy, to say the least. My question is about the Exclude Rows option under Robust Fit Outliers. I am trying to see how the script behind this platform works, specifically how it determines which rows to exclude. I have 200+ columns after I split the table with the appropriate screening limits.

I can use Save Script to Script Window, but it just appears to show me how to write the script in macro form, not the actual row functions.

Thanks

ian_jmp · May 16, 2019 05:51 AM

I've not attempted to script 'Explore Outliers' before, so there may be a better way - The code below worked on the only case I had time to try, so it might get you started:

NamesDefaultToHere(1);

// Example data
dt1 = Open( "$SAMPLE_DATA/Probe.jmp" );

// Copy the data to a new table
dt2 = Eval(dt1 << getScript);
dt2 << setName((dt1 << getName) || " Screened");

// List of columns to screen
colsToScreen = {:VDP_M1, :VDP_M2, :VDP_NBASE};

// Screen for outliers using your favourite method
eo = dt2 << Explore Outliers(Y(Eval(colsToScreen)), Quantile Range Outliers( 1 ), Show Only Columns With Outliers(1), Invisible);

// Using the report, find the columns that have outliers
eoRep = Report(eo);
table = eoRep[TableBox(1)];
colList = eoRep[StringColBox(1)];
// Loop over these columns . . .
nCols = NItems(colList << get);
for(c=1, c<=nCols, c++,
	// Select this column (described by a row)
	CMD = Expr( table << setSelectedRows({colTBD}) );
	SubstituteInto(CMD, Expr(colTBD), Eval(c));
	CMD;
	// Update dt2 for this column: Cells that were considered outliers are coloured red
	eo << ColorCells(1);
	eo << ChangeToMissing(1);
	);
eoRep << closeWindow;

Rather than set cells to missing, you could consider using missing value codes.
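
As a rough illustration of the missing value codes idea, the sketch below attaches a "Missing Value Codes" column property to one column of the sample table, so that platforms treat the coded value as missing while the cell itself is left unchanged. The sentinel -9999 is a hypothetical value chosen only for this example, and the sketch does not show how Explore Outliers itself would write such codes.

```jsl
Names Default To Here( 1 );

// Example data (same sample table as above)
dt = Open( "$SAMPLE_DATA/Probe.jmp" );

// -9999 is a hypothetical sentinel chosen for illustration only
Column( dt, "VDP_M1" ) << Set Property( "Missing Value Codes", {-9999} );

// Confirm the property is now attached to the column
Show( Column( dt, "VDP_M1" ) << Get Property( "Missing Value Codes" ) );
```

Any cell in that column holding the coded value should then be treated as missing by analyses, and removing the property restores the original behaviour.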

ian_jmp · May 15, 2019 04:33 AM

Regarding 'specifically how it determines which rows to exclude', did you take a look at the requisite help page?
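
For orientation, here is a minimal hand-rolled sketch of the Quantile Range rule as I understand it from the documentation: a value is flagged when it lies more than Q interquantile ranges beyond the lower or upper tail quantile. This is not the platform's own code and it is not the Robust Fit method. The values of tailQ and Q are assumptions set to what I believe are the defaults, and the sketch also assumes that Quantile() accepts a matrix argument.

```jsl
Names Default To Here( 1 );

dt = Open( "$SAMPLE_DATA/Probe.jmp" );

tailQ = 0.1; // assumed tail quantile
Q = 3;       // assumed multiplier on the interquantile range

// Pull one column into a matrix and compute the two tail quantiles
vals = Column( dt, "VDP_M1" ) << Get Values;
lo = Quantile( tailQ, vals );
hi = Quantile( 1 - tailQ, vals );
span = hi - lo;

// Rows whose value falls outside [lo - Q*span, hi + Q*span]
outRows = Loc( (vals < lo - Q * span) | (vals > hi + Q * span) );

// Exclude those rows (this affects the whole row, not just this column)
If( N Rows( outRows ) > 0,
	dt << Select Rows( outRows );
	dt << Exclude( 1 );
	dt << Clear Select;
);
Show( N Rows( outRows ) );
```

Because exclusion is a row state, it applies to every column on that row, so the row states would need clearing before the next column is screened.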

Yngeinstn · May 15, 2019 10:35 AM

Thanks for the reply. I found that link you were referring to and I am working through all the information to determine what is the best method to use for my data.

I apologize for not being more specific in my first post. What I would like to do is automate the Explore Outliers platform as part of a number of plotting functions I am creating. For example: load my data table, split the column, and run Explore Outliers, which is able to select the rows that are considered outliers and then exclude them. I noticed that once a row has been excluded, an error pops up. I would need to be able to ignore it somehow in order to move on to the next column. I have attached a data table and included JSL showing how I logically see the script running. It doesn't work yet, which is why I am here. Some tests have 300+ Split By( :SPEC_COL_NAMES ) levels.

Thanks in advance.

----- Begin Loop

----- Run Explore Outliers on cols[1]

----- Exclude Row

----- Plot Distribution

----- Clear Row States

----- Run Explore Outliers on cols[2]

----- Exclude Row

----- Plot Distribution

----- Clear Row States

----- End Loop after cols[i]

	dt = Current Data Table();
	
// Split Data Table by SPEC_COL_NAMES : 	
// This is used for a variety of things, specifically Spec Limits and Range Checks

	dtsplitmeas = dt << Split(
		Invisible,
		Split By( :SPEC_COL_NAMES ),
		Split( :Output_1 ),
		Group( :wafer_number, :rownum, :colnum, :subrow, :subcol, :RowCol ),
		Remaining Columns( Drop All )
	);
	
	cols = dtsplitmeas << Get Column Names( Numeric );
	
	dtsum = dt << Summary(
		Invisible,
		Group( :wafer_number )
	);
	
	jjrn1 = New Window( "Distribution - Output_1 ", << Journal );
	
// I am taking a shot at the syntax and the way I think it should be coded.
// By all means, correct me if I am wrong and make any modifications you see fit.
// I don't actually know how to address the different wafer numbers and the
// SPEC_COL_NAMES at the same time. This is why I figured two For() loops.
	
	For( i = 1, i <= N Items( cols ), i++,
		For( j = 1, j <= N Rows( dtsum ), j++,
			test = cols[i];
			wfr = dtsum:wafer_number[j];

			eo = Explore Outliers(
				SendToByGroup( Bygroup Default ),
				Y( As Column( test ) ),
				Robust Fit Outliers,
				Where( :wafer_number == wfr )
			);
			
//------------------------------------------------------------
// < Insert script to exclude rows for col[1] in dtsplitmeas >
// This is what my original question was referring to
//------------------------------------------------------------

// Step into Control Chart, Control Chart Builder, Distributions, Fit Y by X plots
// Control Chart Builder is just an example
				
			gb = Control Chart Builder(
				Size( 847, 990 ),
				Show Control Panel( 0 ),
				Variables( Y( As Column( test ) ) ),
				Chart( Position( 1 ), Limits( Sigma( "Levey Jennings" ) ) ),
				Chart( Position( 2 ), Limits( Sigma( "Moving Range" ) ) )
			);

// Create Control Chart for all SPEC_COL_NAMES place it into a report / journal
	
			(gb << top Report)[Text Edit Box( 1 )] << delete;  //delete the where statement
			Report( gb )[Outline Box( 1 )] << Set Title( cols[i] );
			
			jjrn1 << Append( Report( gb ) );
			gb << Close Window;
	
// Clear Row States because when you exclude a row for a given column, it extends
// to all other columns on that row
			
			dtsplitmeas << Clear Row States();
			
// Rinse and repeat for all SPEC_COL_NAMES
			
		);
	);

Yngeinstn · May 16, 2019 03:05 PM

I can't thank you enough for this... This is unbelievable! The outlier issue has been the bane of my existence (along with wafer map creation) since I was in charge of plotting all this data. Mr. txnelson enlightened me about the range check method, which is good when I have spec limits to compare against; however, just plotting raw data where I can't use the check was hard for me.

Yngeinstn · Dec 13, 2019 06:02 AM

@ian_jmp

Could you please help me with this error message I am getting? When I run this script manually it works just fine and doesn't throw any errors (see screenshot of log). However, if I try to use it in an expression and run it automatically, I get the following error message and then subsequent error messages (one for every line of the log that says values were replaced by missing). I tried putting a Wait() in there, but that didn't help.

Thanks David

[screenshot: error]

ian_jmp · Dec 17, 2019 07:32 AM

Try putting:

  Batch Interactive( 1 );

at the start of your code.
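
A minimal sketch of how that might look when the screening code lives in an expression. The name screenOutliers and its body are hypothetical placeholders for the real screening code, and switching batch mode back off at the end is my own assumption about good housekeeping rather than something stated in this thread.

```jsl
Names Default To Here( 1 );

// Hypothetical stand-in for the screening code discussed earlier in the thread
screenOutliers = Expr(
	Print( "screening would run here" )
);

// Turn batch mode on so the script is not interrupted by interactive messages
Batch Interactive( 1 );

// Run the expression; if it throws, report the message and carry on
Try( screenOutliers, Show( exception_msg ) );

// Restore normal interactive behaviour afterwards
Batch Interactive( 0 );
```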