Subscribe Bookmark RSS Feed

I am having trouble getting Column Stats to work in a loop?

apeden

Community Trekker

Joined:

Jun 23, 2011

I am trying to write a script it remove outliers from multiple columns in a table by using a loop to look at each column name and pass this into a function which has the following line of code to work out the 99th percentile.

test_upper_quantile=Col Quantile( Column( Test_Number ), 0.99 );

I then use a For Each Row loop to compare the values against the calculated percentile and remove any Rows which fall outside this.

The code works fine if I only do one column however if I set the loop up with multiple columns reading the column names into the Test_Number variable, jmp always uses the value of the quantile first calculated for the first column, it doesnt recalculate for each column. The loop looks to be set up correctly as it displays the variable Test_Number in the log and this changes to match the column as expected. So if I do 10 columns I get the 10 correct column names displayed but only the quantile from the first column repeated 10 times!!!



Also when the script finishes if I highlight the line of code below, it calculates the value for the last column it encountered correctly.

test_upper_quantile=Col Quantile( Column( Test_Number ), 0.99 );


I have tried clearing the global test_upper_quantile which holds the quantile value at the begining of the function but this makes no difference?

Any ideas? I did notice when you hover the cursor over the column quantile comand it says it gets cached. Is this my problem if so is there a way to clear the cache?

The complete function is shown

remove_outliers = Function( {Test_Number},

show(test_number);
test_lower_quantile = Col Quantile( Column( Test_Number ), 0.01 );
test_upper_quantile = Col Quantile( Column( Test_Number ), 0.99 );

show(test_upper_quantile);

lower_cutoff = test_lower_quantile;
upper_cutoff = test_upper_quantile;

For Each Row(

If( Column( test_number )[] < lower_cutoff,
Excluded( Row State() ) = 1
);

If( Column( test_number )[] > upper_cutoff,
Excluded( Row State() ) = 1
);
);

);

Message was edited by: apeden
4 REPLIES
mikedriscoll

Community Trekker

Joined:

Jun 23, 2011

I know this post is years old, but is there a solution for this problem?  I am seeing the exact same thing.  Using col quantile in a loop will result in only the first iteration of the loop being stored into my list.

Is there a work around for getting quantile data that does not involve the col quantile() function?   I don't really want to do summary tables (summary with quantile data) because it will increase execution time for several thousand columns.

Thanks

Jeff_Perkinson

Community Manager

Joined:

Jun 23, 2011

Perhaps you could post some code that causing you difficulty.

This works for me:

dt=open("$SAMPLE_DATA/Solubility.jmp");

nc=dt<<get column names(Numeric);

for(i=1, i<=nitems(nc), i++,

  show(col quantile(column(nc[i]), .5));

);

/*:

Col Quantile(Column(nc), 0.5) = 1.36;

Col Quantile(Column(nc), 0.5) = 1.21;

Col Quantile(Column(nc), 0.5) = 0.95;

Col Quantile(Column(nc), 0.5) = 0.445;

Col Quantile(Column(nc), 0.5) = -0.015;

Col Quantile(Column(nc), 0.5) = -0.62;

-Jeff
mikedriscoll

Community Trekker

Joined:

Jun 23, 2011

Hi Jeff,

Your code works for me also.  It is very similar to my code, the only difference i can see is how the column name list is generated. From a dialog box window, i use:

   ButtonBox("Ok",
   yParamNames = colListY << get Items;

which returns a list of strings, each column name in quotes.  colListY is a column list box that is populated by the user.

I've found a so-so workaround. It grabs everything in teh column instead of letting JMP do the work.  The downside of the workaround that I see is that it does not ignore hidden and excluded data unless i manually program that in.

      tempList1 = {};

      tempList1 = column( dtData, yParamList[i]) << get as matrix;

      medVal[i] = quantile(0.5, tempList1);

      q1Val[i] = quantile(0.25, tempList1);

      q3Val[i] = quantile(0.75, tempList1);

//    medVal = col quantile(column( dtData, yParamList), 0.5);

//    q1Val = col quantile(column( dtData, yParamList), 0.25);

//    q3Val = col quantile(column( dtData, yParamList), 0.75);   

Jeff_Perkinson

Community Manager

Joined:

Jun 23, 2011

Try the As Column() function to convert what you get from the Col List Box()<<get selected to a column reference.

Names Default To Here( 1 );

dt = Open( "$SAMPLE_DATA/solubility.jmp" );

nw = New Window( "Select columns for quantiles",

  lb = Col List Box( all ),

  Button Box( "Ok",

  cols = lb << get selected;

  nw << close window;

  )

);

For( i = 1, i <= N Items( cols ), i++,

  Show( Col Quantile( As Column( cols[i] ), .5 ) )

);

-Jeff

-Jeff