Windermere
Level I

How to write a script for Wilcoxon rank-sum test for thousands features

[Figure: demo.PNG]

Hi all experts, I am a JMP user. I would like to learn how to write a script that calculates Wilcoxon rank-sum test results for thousands of features. The grouping column and the feature columns (feature_1 to feature_n) are laid out as in the figure above. Here is what I want to do:

1) Run a Wilcoxon rank-sum test of "Treatment A vs. No treatment" for feature_1 through feature_2000. I understand I need to use the "Fit Y by X" function. X should be the grouping column (Treatment), and Y should be the values of feature_1, 2, ..., n.

2) I understand how to do it manually. The code for each feature looks roughly like this:

  Fit Group(
 Oneway( Y( :Feature_1 ), X( :Treatment ), Wilcoxon Test( 1 ) )
  );

However, I don't know how to write a script that picks up only the rows whose Treatment value is "No treatment" or "Treatment A", or how to run it as a loop over feature 1, then feature 2, and so on up to feature 2000.

 

I have tried my best to read the Scripting Guide. However, I can only grasp the basic concepts and cannot solve this on my own. I would be really grateful if anyone here could give me a hand.

 

Thank you so much..

 

Best,

 

Windermere

16 REPLIES
txnelson
Super User

Re: How to write a script for Wilcoxon rank-sum test for thousands features

Here is an example of doing what you want. I am using a sample data table that has a lot of continuous variables. I change the name of one of the continuous columns to "Feature_1" to mimic the name of the column you are targeting. I then delete from colList all continuous columns that appear before the "Feature_1" column and run the analysis on all of the remaining columns.

Names Default To Here( 1 );

/* Open a sample data table */
dt = Open( "$SAMPLE_DATA\semiconductor capability.jmp" );

// For illustration, change the name of one of the columns to "Feature_1"
Column( dt, 10 ) << set name( "Feature_1" );

/* Obtain a list of numeric/continuous column names */
colList = dt << Get Column Names( Continuous );

// Remove from the list of columns, all columns before the column called
// "Feature_1"
If( Contains( colList, Parse( "Feature_1" ) ) > 1,
	colList = Remove( colList, 1, Contains( colList, Parse( "Feature_1" ) ) - 1 )
);

// Run the analysis
Fit Group(
	Oneway( Y( Eval( colList ) ), X( :Wafer ), Wilcoxon Test( 1 ) ), 
	// If you want all of the outputs in a single row and you don't 
	// know how many there will be, a simple way to handle this is
	// to just specify a large number
	<<{Arrange in Rows( 200 )}
);
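
The original question also asked how to limit the comparison to the "No treatment" and "Treatment A" rows. One way this might be done is with a Where clause on the platform launch. The following is only a minimal sketch, not tested against the poster's table: it assumes the table is already open as the current data table, that the grouping column is named :Treatment with exactly those two level labels, and that every continuous column is a feature (trim colList as shown above if that is not the case).

```jsl
Names Default To Here( 1 );

// Assumption: the table with :Treatment and Feature_1 ... Feature_n is the current data table
dt = Current Data Table();

// Collect the continuous columns; here they are all assumed to be features
colList = dt << Get Column Names( Continuous );

// One Oneway report per feature; the Where clause keeps only the rows
// belonging to the two groups being compared
ow = dt << Oneway(
	Y( Eval( colList ) ),
	X( :Treatment ),
	Wilcoxon Test( 1 ),
	Where( :Treatment == "No treatment" | :Treatment == "Treatment A" )
);
```

Passing the whole column list to Y() replaces an explicit loop: the platform is launched once and produces a separate analysis for each Y column.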

Please take the time to read through the Scripting Guide to learn what scripting is all about

     Help==>Books==>Scripting Guide

Jim
Windermere
Level I

Re: How to write a script for Wilcoxon rank-sum test for thousands features

Hi Jim, suppose I want to open a data table that is not in SAMPLE_DATA but somewhere else on my Windows machine, for example "C:\Users\yuanyli4\Desktop\JMP TEST\Sheet1.jmp".

Should I write it like this?

dt= Open("$C:\Users\yuanyli4\Desktop\JMP TEST\Sheet1.jmp"); 

 

Thank you

Windermere
Level I

Re: How to write a script for Wilcoxon rank-sum test for thousands features

Finally, it works...

 

That was a good learning experience.

 

Thank you

 

Windermere

Windermere
Level I

Re: How to write a script for Wilcoxon rank-sum test for thousands features

dt = Open( "$C:\Users\yuanyli4\Desktop\JMP TEST\Sheet1.jmp" );
colList = dt << Get Column Names( Continuous );
dt2 = dt << Stack(
	columns( colList ),
	source Label Column( "Feature" ),
	Stacked Data Column( "Data" ),
	Output Table( (dt << getName) || "Stacked" )
);
Close( dt, NoSave );
ow = dt2 << Oneway(
	Y( :Data ),
	X( :Treatment ),
	wilcoxon Test( 1 ),
	By( :Feature )
);
firstOwRep = Report( ow[1] );
dt3 = firstOwRep[Table Box( 3 )] << makeCombinedDataTable;
dt3 << setName( "One Way Chisquare Results By Feature" );

This is what I wrote based on the most recent suggestions.

 

It is still not working... :(

When I try to run it, there is no response at all...

ian_jmp
Level X

Re: How to write a script for Wilcoxon rank-sum test for thousands features

Think you just need “C” (not “$C”) at the start of the filename of the file you are trying to open.
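
For illustration, a short sketch of the two forms, using the path quoted earlier in this thread. The "$" prefix is only for JMP path variables such as $SAMPLE_DATA; an ordinary Windows path starts with the drive letter.

```jsl
Names Default To Here( 1 );

// A full Windows path starts with the drive letter, with no leading "$"
dt = Open( "C:\Users\yuanyli4\Desktop\JMP TEST\Sheet1.jmp" );

// Forward slashes should also be accepted for the same file:
// dt = Open( "C:/Users/yuanyli4/Desktop/JMP TEST/Sheet1.jmp" );

// "$" is reserved for path variables, as in the earlier sample-data example:
// dt = Open( "$SAMPLE_DATA\semiconductor capability.jmp" );
```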
Windermere
Level I

Re: How to write a script for Wilcoxon rank-sum test for thousands features

Yep...you are right. I should not use the $.

 

Thanks

 

I believe both yours and Jim's solutions work now

 

KarenC
Super User (Alumni)

Re: How to write a script for Wilcoxon rank-sum test for thousands features

Another option to consider is the JMP Response Screening platform. It doesn't do the rank-sum test; instead it runs a t-test or a robust test (to downweight outliers). It runs fast, provides a table of p-values, and graphs significance against effect size. Best of all, it has a "fit selected" option, so you can easily run Fit Y by X on the features of most interest to you and then run the Wilcoxon test on that subset of the 2000 features.

 

https://www.jmp.com/support/help/14/response-screening.shtml
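
If you prefer to launch it from a script, something along these lines may work. This is a rough sketch only: it assumes the table is the current data table, that the grouping column is named :Treatment as in the original question, and that all continuous columns are features. Check the exact launch options for your JMP version in the Scripting Index.

```jsl
Names Default To Here( 1 );

// Assumption: the table with :Treatment and the feature columns is the current data table
dt = Current Data Table();

// All continuous columns are treated as responses here
colList = dt << Get Column Names( Continuous );

// Screen every feature against the grouping column in a single launch
rs = dt << Response Screening(
	Y( Eval( colList ) ),
	X( :Treatment )
);
```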