Solved: Re: How to write a script for Wilcoxon rank-sum test for thousands of features

Windermere · Nov 6, 2018 04:49 PM

Hi all experts, I am a JMP user. I would like to learn how to write a script that calculates Wilcoxon rank-sum test results for thousands of features. The grouping column and the feature columns (feature_1 to feature_n) are laid out as shown in the figure above. Here is what I am trying to do:

1) Run a Wilcoxon rank-sum test of "Treatment A vs. No treatment" for feature_1 to feature_2000. I understand I need to use the "Fit Y by X" platform. X should be the grouping column (Treatment), and Y should be the values of feature_1, feature_2, ..., feature_n.

2) I understand how to do it manually. The code for each feature looks roughly like this:

Fit Group(
	Oneway( Y( :Feature_1 ), X( :Treatment ), Wilcoxon Test( 1 ) )
	// ...followed by one Oneway() call for each remaining feature
);

However, I don't know how to write a script that picks up only the rows where the grouping column is "No treatment" or "Treatment A", or how to make it run as a loop that processes feature 1, then feature 2, and so on up to feature 2000.

I have tried my best to read the Scripting Guide. However, I can only grasp the basic concepts and have not been able to solve this myself. I would be really grateful if anyone here could give me a hand.

Thank you so much..

Best,

Windermere

txnelson · Nov 14, 2018 11:53 AM

Here is an example of doing what you want. I am using a sample data table that has a lot of continuous variables. I change the name of one of the continuous columns to "Feature_1" to mimic the name of the column you are targeting. I then delete from colList all continuous columns that appear before the "Feature_1" column, and run the analysis on all of the remaining columns.

Names Default To Here( 1 );

/* Open a sample data table */
dt = Open( "$SAMPLE_DATA/Semiconductor Capability.jmp" );

// For illustration, change the name of one of the columns to "Feature_1"
Column( dt, 10 ) << set name( "Feature_1" );

/* Obtain a list of numeric/continuous column names */
colList = dt << Get Column Names( Continuous );

// Remove from the list of columns, all columns before the column called
// "Feature_1"
If( Contains( colList, Parse( "Feature_1" ) ) > 1,
	colList = Remove( colList, 1, Contains( colList, Parse( "Feature_1" ) ) - 1 )
);

// Run the analysis
Fit Group(
	Oneway( Y( Eval( colList ) ), X( :Wafer ), Wilcoxon Test( 1 ) ), 
	// If you want all of the outputs in a single row and you don't 
	// know how many there will be, a simple way to handle this is
	// to just specify a large number
	<<{Arrange in Rows( 200 )}
);
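
The script above uses every level of the X column. To restrict the comparison to "No treatment" vs. "Treatment A", as asked in the original question, one option is to subset the rows before running the analysis. This is only a minimal sketch: it assumes the table of interest is the current data table, that the grouping column is a character column named Treatment with exactly those two level names (taken from the question, not checked against the real table), and that every continuous column is a feature.

```jsl
Names Default To Here( 1 );

// Assumption: the table to analyze is the current data table
dt = Current Data Table();

// Select only the rows that belong to the two groups being compared
dt << Select Where( :Treatment == "No treatment" | :Treatment == "Treatment A" );

// Copy the selected rows (with all columns) into a new table
dtSub = dt << Subset( Selected Rows( 1 ), Selected Columns Only( 0 ) );
dt << Clear Select;

// Assumption: every continuous column in the subset is a feature
colList = dtSub << Get Column Names( Continuous );

// One Oneway report per feature, each with the Wilcoxon test turned on
ow = dtSub << Oneway( Y( Eval( colList ) ), X( :Treatment ), Wilcoxon Test( 1 ) );
```

Working on a subset leaves the original table untouched, so the other treatment levels remain available for later comparisons.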

Please take the time to read through the Scripting Guide to learn what scripting is all about:

Help==>Books==>Scripting Guide

Jim

Windermere · Nov 14, 2018 12:50 PM

Hi Jim, what if I want to open a data table that is not in SAMPLE_DATA but somewhere else on my Windows machine, for example "C:\Users\yuanyli4\Desktop\JMP TEST\Sheet1.jmp"?

Should I write it like this?

dt= Open("$C:\Users\yuanyli4\Desktop\JMP TEST\Sheet1.jmp");

Thank you

Windermere · Nov 14, 2018 01:05 PM

Finally, it works...

That was a good learning experience.

Thank you

Windermere

Windermere · Nov 14, 2018 9:55 AM

dt = Open( "$C:\Users\yuanyli4\Desktop\JMP TEST\Sheet1.jmp" );
colList = dt << Get Column Names( Continuous );
dt2 = dt << Stack(
	columns( colList ),
	source Label Column( "Feature" ),
	Stacked Data Column( "Data" ),
	Output Table( (dt << getName) || "Stacked" )
);
Close( dt, NoSave );
ow = dt2 << Oneway(
	Y( :Data ),
	X( :Treatment ),
	wilcoxon Test( 1 ),
	By( :Feature )
);
firstOwRep = Report( ow[1] );
dt3 = firstOwRep[Table Box( 3 )] << makeCombinedDataTable;
dt3 << setName( "One Way Chisquare Results By Feature" );

This is what I wrote based on the latest suggestions.

It is still not working... :(

When I try to run it, nothing happens at all...

ian_jmp · Nov 14, 2018 12:55 PM

I think you just need “C” (not “$C”) at the start of the name of the file you are trying to open.
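
As a small illustration of that fix: in JSL the "$" prefix marks a path variable such as $SAMPLE_DATA, so a literal Windows path should start directly with the drive letter. The path below is the one quoted earlier in this thread and will differ on other machines.

```jsl
Names Default To Here( 1 );

// "$" introduces a path variable (for example $SAMPLE_DATA or $DESKTOP);
// a literal drive path starts directly with the drive letter
dt = Open( "C:\Users\yuanyli4\Desktop\JMP TEST\Sheet1.jmp" );

// Forward slashes are also accepted on Windows:
// dt = Open( "C:/Users/yuanyli4/Desktop/JMP TEST/Sheet1.jmp" );
```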

Windermere · Nov 14, 2018 01:06 PM

Yep... you are right. I should not use the $.

Thanks

I believe both your solution and Jim's work now.

KarenC · Nov 9, 2018 08:16 AM

Another option to consider is the JMP Response Screening platform. It doesn't do the rank-sum test; instead it runs a t-test or a robust test (to downweight outliers). It runs fast, provides a table of p-values and graphs of significance vs. effect size, and, best of all, has a "Fit Selected" option, so you can easily run Fit Y by X on the features of most interest and then run the Wilcoxon test on that subset of the 2000 features.

https://www.jmp.com/support/help/14/response-screening.shtml
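
For readers who want to script the Response Screening suggestion rather than launch it from the menu, a launch might look roughly like the sketch below. It assumes the current data table holds the features as continuous columns and a grouping column named Treatment (names taken from the question). All other options are left at their defaults, and the follow-up selection of interesting features is still done interactively in the report.

```jsl
Names Default To Here( 1 );

// Assumption: the table to analyze is the current data table
dt = Current Data Table();

// Assumption: every continuous column is a feature to screen
colList = dt << Get Column Names( Continuous );

// Launch Response Screening with all features as Y and the grouping as X
rs = dt << Response Screening( Y( Eval( colList ) ), X( :Treatment ) );
```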