Yngeinstn
Level IV

Integration of GrubbsOutlierTest2 into multivariable data table

As previously stated, I am dealing with a complex set of data tables joined using SQL scripts. With help from this community, I have managed to create an "application"-style script that creates plots and distributions based on user-selected conditions.

 

The last remaining issue is dealing with those pesky outliers. As usual, my problem lies in looping through the test conditions, identifying the outliers, and then removing them from the data set. I understand that removing them can be risky, but given our complex tests, our parts either fall in a tight distribution or fail catastrophically. I am trying to implement: if g > g0, find the outliers and delete them.
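For reference, the g > g0 comparison is the two-sided Grubbs' test. Written out with the same quantities the script further down in this thread computes (n non-missing observations in a group, sample mean and standard deviation of that group, significance level α, which the script sets to 0.05), it looks like this:

```latex
G = \frac{\max_i \lvert y_i - \bar{y} \rvert}{s},
\qquad
G_0 = \frac{n-1}{\sqrt{n}} \sqrt{\frac{t^2}{n - 2 + t^2}},
\qquad
t = t_{\,1-\alpha/(2n),\; n-2}
```

Here t is the 1 - α/(2n) quantile of Student's t distribution with n - 2 degrees of freedom, and the most extreme point in the group is flagged as an outlier when G > G_0.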

 

Attached: 3 subsets from the main table and the main table.

 

I found this THREAD, which does it for a single column, but again, how does one loop through everything?

 

I have quite a few scripts that loop through some of these conditions but I cannot figure out the correct combination.

 

Again thanks for the help.

5 REPLIES
ian_jmp
Staff

Re: Integration of GrubbsOutlierTest2 into multivariable data table

Let's focus on your table #1 below, and on just 'test_1' within that. I think you are saying that you want to consider each of the 3 * 2 * 16 = 96 'looping conditions' to define a group, and to look for outliers within each of these groups. Is that correct, please? But the attached table shows that #1 has 105 such groups, not 96?

Yngeinstn
Level IV

Re: Integration of GrubbsOutlierTest2 into multivariable data table

I apologize for not communicating this clearly. test_1 includes 3 different set_ids {7000, 7100, 7200}.

set_id 7000 has 1 channel (switching) × 2 supplies = 2 test outputs

set_id 7100 has 16 channels × 2 supplies = 32 test outputs

set_id 7200 has 16 channels × 2 supplies = 32 test outputs

Total = 66

I got ahead of myself and forgot that set_id 7000 has just 2 conditions. All 3 tests together = 105.

 

I have also been giving it some thought and need to rewrite the original post. I think it is going to be easier to subset each of the tests (_1, _2, _3) and then do what I need to do with the distributions. The main table is good for my graph plots. The subsets will be the tests and test conditions that we screen at...


Re: Integration of GrubbsOutlierTest2 into multivariable data table

The script for Grubbs' outlier test can use a grouping variable in the By analysis role.

Learn it once, use it forever!
Yngeinstn
Level IV

Re: Integration of GrubbsOutlierTest2 into multivariable data table

Yes sir, however I would like to use that in an automated form that evaluates g > g0 for each test, By: channel. Based on that result, the script would then locate the outliers and exclude them from the data set, instead of me doing it manually.

 

Yngeinstn
Level IV

Re: Integration of GrubbsOutlierTest2 into multivariable data table

Update: I managed to get GrubbsOutlier to sort of work. I can take one test condition and get what I need. I added a For loop to capture two test conditions; however, it hangs.

 

You can run this script against sample data _ 1 wafers (test_3).jmp and see the output I am looking for. I would appreciate any help with adding the 2nd trmode, and even with doing this on a dataset with multiple wafers: sample data _ 6 wafers (test_3).jmp

 

After this is resolved, I am going to try something like If( g > g0 ), Loc( <the outliers> ), and then delete them from the dataset.

 

Thank You

 

 

dt = Current Data Table();

	dtsum = dt << Summary(
		Group( :channel, :trmode ),
		Interquartile Range( :Output ),
		Freq( "None" ),
		Weight( "None" ),
//		invisible
	);

	Current Data Table( dt );
// <Insert For Loop Here>
dist = dt << Distribution(
	Y( Column( "Output" ) ),
	By( Column( "channel" ) ),
	Normal Quantile Plot( 1 ),
	Fit Distribution( Normal( Goodness of Fit( 1 ) ) ),
// Added a For() loop, located at the bottom of the script; the script hangs up
//	Where( dt:trmode == mode )
	Where( dt:trmode == "Tx" )
);

distr = dist << Report;
bCol = Column( "channel" );
Summarize( group = By( bCol ) );
yy = Column( "Output" ) << Get As Matrix;
exRows = dt << Get Excluded Rows();
yy[exRows] = .;	// ignore rows that are already excluded

For( i = 1, i <= N Items( group ), i++,
	// Group name is the text after "=" in the report title
	groupName = Trim( Word( 2, distr[i][OutlineBox( 1 )] << Get Title, "=" ) );
	getRows = dt << Get Rows Where( bCol[] == groupName );
	yVal = yy[getRows];
	yVal[Loc( Is Missing( yVal ) )] = [];	// drop missing values
	n = N Row( yVal );
	a = 0.05;
	t0Sqr = t Quantile( 1 - a / (2 * n), n - 2 ) ^ 2;
	g = Maximum( Abs( yVal - Mean( yVal ) ) ) / Std Dev( yVal );	// test statistic
	g0 = ((n - 1) / Sqrt( n )) * Sqrt( t0Sqr / (n - 2 + t0Sqr) );	// critical value
	distr[i][Outline Box( 2 )] << Append(
		Outline Box( "Grubbs' Outlier Test",
			Table Box(
				String Col Box( "Statistic", {"G", "G(" || Char( a ) || ")"} ),
				Number Col Box( "Estimate", Matrix( {g, g0} ) )
			),
			Text Box( If( g > g0, "Outlier detected", "No outlier detected" ) )
		)
	);
);

// For Loop I was trying to use
// For( i = 1, i <= N Rows( dtsum ), i++,
//	mode = dtsum:trmode[i];
//	< Insert Script From Above >
// );
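The automated "if g > g0, locate and delete" step could be sketched as below. This is an untested sketch under stated assumptions, not a confirmed fix for the hang. It skips the Distribution report entirely, loops over the channel/trmode combinations from a Summary table, and applies the same g and g0 formulas as the script above to each group. The column names (:channel, :trmode, :Output) are taken from that script. The n > 2 guard, the single pass per group, and deleting rather than excluding the rows are my assumptions.

```jsl
Names Default To Here( 1 );
dt = Current Data Table();
alpha = 0.05;	// assumed significance level, same value as in the script above

// One row per channel/trmode combination; unlinked so it does not react to deletions
dtsum = dt << Summary(
	Group( :channel, :trmode ),
	Freq( "None" ),
	Weight( "None" ),
	Link to original data table( 0 ),
	invisible
);
Current Data Table( dt );

outRows = [];	// row numbers flagged as outliers across all groups
For( i = 1, i <= N Rows( dtsum ), i++,
	ch = dtsum:channel[i];
	mode = dtsum:trmode[i];
	// Non-excluded rows of this group that have a non-missing Output
	rows = dt << Get Rows Where(
		:channel == ch & :trmode == mode & !Excluded( Row State() ) & !Is Missing( :Output )
	);
	n = N Rows( rows );
	If( n > 2,	// assumption: skip groups too small for the test
		y = dt:Output[rows];
		dev = Abs( y - Mean( y ) );
		g = Max( dev ) / Std Dev( y );
		t0Sqr = t Quantile( 1 - alpha / (2 * n), n - 2 ) ^ 2;
		g0 = ((n - 1) / Sqrt( n )) * Sqrt( t0Sqr / (n - 2 + t0Sqr) );
		// Flag only the most extreme point(s) of the group in this pass
		If( g > g0, outRows = outRows |/ rows[Loc( dev == Max( dev ) )] );
	);
);

Close( dtsum, No Save );
If( N Rows( outRows ) > 0, dt << Delete Rows( outRows ) );
```

Grubbs' test looks for one outlier at a time, so each run removes at most the most extreme value per group. Running the script again on the reduced table would check for a further outlier in each group. Since Delete Rows is permanent, it is probably safer to try this on a copy or a subset of the table first.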

 
