Windermere
Level I

How to write a script for Wilcoxon rank-sum test for thousands features

[Image: demo.PNG]

Hi all experts, I am a JMP user. I would like to learn how to write a script that calculates Wilcoxon rank-sum test results for thousands of features. The grouping column and the feature columns (from feature_1 to feature_n) are shown in the figure above. Here is what I am trying to do:

1) Run a Wilcoxon rank-sum test of "Treatment A" vs. "No treatment" for feature_1 to feature_2000. I understand I need to use the "Fit Y by X" platform. The X should be the grouping column (Treatment), and the Y should be the values of feature_1, 2, ..., n.

2) I understand how to do it manually. The code for each feature is roughly like this:

Fit Group(
	Oneway( Y( :Feature_1 ), X( :Treatment ), Wilcoxon Test( 1 ) )
);

However, I don't know how to write a script that uses only the rows where Treatment is "No treatment" or "Treatment A", or how to make it loop so that it runs feature 1, then feature 2, and so on up to feature 2000.

 

I have tried my best to read the Scripting Guide. However, I have only grasped the basic concepts and cannot solve this myself. I would be really grateful if anyone here could lend a hand.

 

Thank you so much.

 

Best,

 

Windermere

16 REPLIES
txnelson
Super User

Re: How to write a script for Wilcoxon rank-sum test for thousands features

Here is an example of doing what you want. I am using a sample data table that has a lot of continuous variables. I change the name of one of the continuous columns to "Feature_1" to mimic the name of the column you are targeting. I then remove from colList all continuous columns that appear before the "Feature_1" column and run the analysis on all of the remaining columns.

Names Default To Here( 1 );

/* Open a sample data table */
dt = Open( "$SAMPLE_DATA\semiconductor capability.jmp" );

// For illustration, change the name of one of the columns to "Feature_1"
Column( dt, 10 ) << set name( "Feature_1" );

/* Obtain a list of numeric/continuous column names */
colList = dt << Get Column Names( Continuous );

// Remove from the list of columns, all columns before the column called
// "Feature_1"
If( Contains( colList, Parse( "Feature_1" ) ) > 1,
	colList = Remove( colList, 1, Contains( colList, Parse( "Feature_1" ) ) - 1 )
);

// Run the analysis
Fit Group(
	Oneway( Y( Eval( colList ) ), X( :Wafer ), Wilcoxon Test( 1 ) ), 
	// If you want all of the outputs in a single row and you don't 
	// know how many there will be, a simple way to handle this is
	// to just specify a large number
	<<{Arrange in Rows( 200 )}
);
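
The script above runs the test across every level of X. A minimal sketch of one way to restrict it to the two groups from the question, assuming the table has a :Treatment column with the values "No treatment" and "Treatment A" and feature columns whose names start with "Feature_" (those names come from the question, not from the sample table):

```jsl
Names Default To Here( 1 );

// Assumption: the current table has a :Treatment column and feature columns
// whose names start with "Feature_" (names taken from the question).
dt = Current Data Table();

// Collect the continuous feature columns by name
allNames = dt << Get Column Names( Continuous, String );
featList = {};
For( i = 1, i <= N Items( allNames ), i++,
	If( Starts With( allNames[i], "Feature_" ),
		Insert Into( featList, allNames[i] )
	)
);

// One Oneway report per feature, restricted to the two groups of interest
ow = dt << Oneway(
	Y( Eval( featList ) ),
	X( :Treatment ),
	Wilcoxon Test( 1 ),
	Where( :Treatment == "No treatment" | :Treatment == "Treatment A" )
);
```

Passing the whole list to Y() lets the platform iterate over the features, so no explicit loop over the 2000 columns is needed. Adjust the prefix if your columns are named "feature_1" in lowercase.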

Please take the time to read through the Scripting Guide to learn what scripting is all about

     Help==>Books==>Scripting Guide

Jim
Windermere
Level I

Re: How to write a script for Wilcoxon rank-sum test for thousands features

Hi Jim, what if I want to open a data table that is not in SAMPLE_DATA but somewhere else on Windows, for example "C:\Users\yuanyli4\Desktop\JMP TEST\Sheet1.jmp"?

Should I write it like this?

dt= Open("$C:\Users\yuanyli4\Desktop\JMP TEST\Sheet1.jmp"); 

 

Thank you

Windermere
Level I

Re: How to write a script for Wilcoxon rank-sum test for thousands features

Finally, it works...

That was a good learning experience.

 

Thank you

 

Windermere

Windermere
Level I

Re: How to write a script for Wilcoxon rank-sum test for thousands features

dt = Open( "$C:\Users\yuanyli4\Desktop\JMP TEST\Sheet1.jmp" );
colList = dt << Get Column Names( Continuous );
dt2 = dt << Stack(
	columns( colList ),
	source Label Column( "Feature" ),
	Stacked Data Column( "Data" ),
	Output Table( (dt << getName) || "Stacked" )
);
Close( dt, NoSave );
ow = dt2 << Oneway(
	Y( :Data ),
	X( :Treatment ),
	wilcoxon Test( 1 ),
	By( :Feature )
);
firstOwRep = Report( ow[1] );
dt3 = firstOwRep[Table Box( 3 )] << makeCombinedDataTable;
dt3 << setName( "One Way Chisquare Results By Feature" );

This is what I wrote based on the latest suggestions.

It is still not working... :(

When I try to run it, nothing happens at all...

ian_jmp
Staff

Re: How to write a script for Wilcoxon rank-sum test for thousands features

I think you just need "C" (not "$C") at the start of the name of the file you are trying to open.
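
A short sketch of the difference, as I understand it: a leading "$" marks a JMP path variable such as $SAMPLE_DATA, whereas an absolute Windows path is written as is. The file path is the one from the post above.

```jsl
Names Default To Here( 1 );

// "$SAMPLE_DATA" is a built-in path variable; this prints the folder it expands to
Show( Get Path Variable( "SAMPLE_DATA" ) );

// An absolute path starts with the drive letter and has no "$" prefix
dt = Open( "C:\Users\yuanyli4\Desktop\JMP TEST\Sheet1.jmp" );
```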
Windermere
Level I

Re: How to write a script for Wilcoxon rank-sum test for thousands features

Yep...you are right. I should not use the $.

 

Thanks

 

I believe both your solution and Jim's work now.

 

KarenC
Super User (Alumni)

Re: How to write a script for Wilcoxon rank-sum test for thousands features

Another option to consider is the JMP Response Screening platform. It doesn't do the rank-sum test; instead it runs a t-test or a robust test (to downweight outliers). It runs fast, provides a table of p-values and graphs of significance vs. effect size, and best of all has a "Fit Selected" option, so you can easily run Fit Y by X on the features of most interest to you and then run the Wilcoxon test on that subset of the 2000 features.

 

https://www.jmp.com/support/help/14/response-screening.shtml
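
A minimal sketch of launching the platform from a script, assuming the current table has a :Treatment column and that every continuous column is a feature (filter the list first if that is not the case):

```jsl
Names Default To Here( 1 );

// Assumption: the table with :Treatment and the feature columns is the current table
dt = Current Data Table();

// Assumption: all continuous columns are features to be screened
colList = dt << Get Column Names( Continuous );

// Screen every feature against the grouping column in one pass
rs = dt << Response Screening( Y( Eval( colList ) ), X( :Treatment ) );
```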