Below is a script that is a slight modification of yours and should be the starting point (with a minor change: the stdev is compared to a threshold). As you know, I need an actual script to make further progress, since this deletion will be done automatically by the script. The next phase of our internal code, which runs after the JMP data selection, fails when zero variance is encountered. Your prior version (i.e. the one below) proved that these columns can be taken out. However, I am trying to avoid deleting parameters that have zero variance but also a difference in means.
I cannot use a threshold that depends on the actual parameter mean and distribution. E.g. I cannot specify a practical difference, as this would require in-depth examination of all 2000 parameters, which is not feasible. Not sure if I mentioned it, but 2000 is just the tip of the iceberg; more are coming. A single dataset, however, contains roughly 2000 parameters.
I can use p_value to filter parameters in or out, depending on which ones cause a failure in our internal code. From that standpoint, I think a t-test may be a better option. However, I do not know how to insert the t-test and extract a p_value so that I can add it as an additional filter.
I feel like I have two options ...
1) Either I use the code below as is, with no further 2-sample means comparison. It would delete all such cases even if the means are different. I would then need to record the names of the columns that were deleted (I need help with that), so that at the end of the overall study I can turn my attention to them and analyze them in JMP to detect a means difference, if any. This would be a much smaller set of parameters to analyze; percentage-wise, I expect zero-variance cases to be a very small portion of the dataset.
2) Or I add a t-test, or any other test that offers a normalized threshold, to fine-tune the population of significant differences. E.g. I can use p_value to confirm or reject significance.
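For option 2, the closest I can get on my own is computing the p_value by hand from the group summary statistics rather than pulling it from a JMP platform report. The following is a rough, untested sketch, assuming a Welch (unequal-variance) two-sample t-test is acceptable. `welchP` is a name I made up, and the numbers in the example call are placeholders, not real data. Note that when both groups have zero variance the t statistic is undefined, so the sketch returns a missing value in exactly the cases I am worried about.

```jsl
// Hypothetical helper (not a built-in): two-sided Welch t-test p-value
// computed from group means (m), standard deviations (s) and counts (n).
welchP = Function( {m1, s1, n1, m2, s2, n2},
	{se2, tStat, df},
	// Squared standard error of the difference in means
	se2 = s1 ^ 2 / n1 + s2 ^ 2 / n2;
	If( se2 == 0,
		// Both groups have zero variance: the statistic is undefined
		.,
		tStat = (m1 - m2) / Sqrt( se2 );
		// Welch-Satterthwaite approximation for the degrees of freedom
		df = se2 ^ 2 / ((s1 ^ 2 / n1) ^ 2 / (n1 - 1) + (s2 ^ 2 / n2) ^ 2 / (n2 - 1));
		2 * (1 - t Distribution( Abs( tStat ), df ))
	)
);

// Example call with placeholder summary statistics
Show( welchP( 10.2, 0.5, 30, 10.6, 0.4, 28 ) );
```

In the real script, I imagine the six arguments would come from Col Mean, Col Std Dev and Col Number applied to the same column in each of the two subset tables.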
What do you think?
Names Default To Here( 1 );
dt = current datatable();
dtcol1 = column(dt, 1);
dt1 = dt << subset(rows(dt<<get rows Where(ascolumn(dtcol1) == dtcol1[1])), invisible, selected columns(0));
dt2 = dt << subset(rows(dt<<get rows Where(ascolumn(dtcol1) != dtcol1[1])), invisible, selected columns(0));
// Get all continuous data columns
colList = dt << get column names( numeric, continuous );
// Loop across all columns; drop from colList those with variance in both groups,
// so that colList ends up holding only the (near) zero-variance columns
For( i = N Items( colList ), i >= 1, i--,
	If( Col Std Dev( Column( dt1, colList[i] ) ) >= 0.01 & Col Std Dev( Column( dt2, colList[i] ) ) >= 0.01,
		colList = Remove( colList, i, 1 )
	)
);
// Delete the columns with no variance
dt<<delete columns(colList);
close(dt1, nosave);
close(dt2, nosave);
Note: the thresholds on the stdevs are not settled yet. 0.01 is just one example; I will fine-tune that.
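On recording the deleted names for option 1: my untested guess is that the list can be copied into a small table just before the delete step. This is a minimal sketch, assuming colList holds the columns about to be deleted. `deletedNames`, `dtDeleted`, the table title and the stand-in list are all names I made up for illustration.

```jsl
// Stand-in for the list left over after the loop (hypothetical column names)
colList = {"Param A", "Param B"};

// Copy the names as strings; Char() is meant to cover the case where
// the list holds column names rather than quoted strings (assumption, untested)
deletedNames = {};
For( i = 1, i <= N Items( colList ), i++,
	Insert Into( deletedNames, Char( colList[i] ) )
);

// Keep the names in a small table so they can be reviewed after the study
dtDeleted = New Table( "Deleted zero-variance columns",
	New Column( "Column Name", Character, Nominal, Set Values( deletedNames ) )
);
```

In the real script this would go right before the delete columns call, without the stand-in list.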