Solved: Re: How to write a script for Wilcoxon rank-sum test for thousands of features

Windermere · Nov 6, 2018 04:49 PM

Hi all experts, I am a JMP user. I would like to learn how to write a script that calculates Wilcoxon rank-sum test results for thousands of features. The grouping and feature tables (from feature_1 to feature_n) are attached as the figure above. Here is what I am going to do:

1) Wilcoxon rank-sum test of "Treatment A vs. No treatment" for feature_1 to feature_2000. I understand I need to use the "Fit Y by X" function. The X should be the grouping (Treatment), and the Y should be the values of feature_1, 2, ..., n.

2) I understand how to do it manually. The code for each feature looks something like this:

Fit Group(
	Oneway( Y( :Feature_1 ), X( :Treatment ), Wilcoxon Test( 1 ) )
);

However, I don't know how to write a script that picks up only the rows with "No treatment" and "Treatment A", or how to run it as a loop that finishes feature 1, then feature 2, and so on until feature 2000.

I have tried my best to read the Scripting Guide. However, I can only grasp the basic concepts and cannot solve this myself. I would be really grateful if anyone here could give me a hand.

Thank you so much.

Best,

Windermere

txnelson · Nov 14, 2018 11:53 AM

Here is an example of doing what you want. I am using a sample data table that has a lot of continuous variables. I change the name of one of the continuous columns to "Feature_1" to mimic the name of the column you are targeting. I then delete all continuous columns from colList that appear before the "Feature_1" column, and then run the analysis on all of the remaining columns.

Names Default To Here( 1 );

/* Open a sample data table */
dt = Open( "$SAMPLE_DATA\semiconductor capability.jmp" );

// For illustration, change the name of one of the columns to "Feature_1"
Column( dt, 10 ) << set name( "Feature_1" );

/* Obtain a list of numeric/continuous column names */
colList = dt << Get Column Names( Continuous );

// Remove from the list of columns, all columns before the column called
// "Feature_1"
If( Contains( colList, Parse( "Feature_1" ) ) > 1,
	colList = Remove( colList, 1, Contains( colList, Parse( "Feature_1" ) ) - 1 )
);

// Run the analysis
Fit Group(
	Oneway( Y( Eval( colList ) ), X( :Wafer ), Wilcoxon Test( 1 ) ), 
	// If you want all of the outputs in a single row and you don't 
	// know how many there will be, a simple way to handle this is
	// to just specify a large number
	<<{Arrange in Rows( 200 )}
);

Please take the time to read through the Scripting Guide to learn what scripting is all about:

Help==>Books==>Scripting Guide

Jim

txnelson · Nov 6, 2018 05:09 PM

The two solutions that I can quickly see are as follows. Either set the Row State of all rows that do not have "No Treatment" or "Treatment A" to "Hide and Exclude", and then run your Fit Y by X analyses:

names default to here(1);
dt=current data table();

dt << select where(:treatment !="No Treatment" & :treatment !="Treatment A");

dt << Hide and Exclude;

dt << Fit Group(
     Oneway(………………………..));

Or, subset the data into a new table that only has the "No Treatment" and "Treatment A" rows, and then do the analysis:

names default to here(1);
dt=current data table();

dt << select where(:treatment =="No Treatment" | :treatment =="Treatment A");

dt2=dt<<subset(selected columns(0), selected rows(1));

dt2 << Fit Group(
     Oneway(………………………..));

Jim

Windermere · Nov 6, 2018 05:20 PM

Hi,

Thank you for your solution for picking the group information.

But how do I run the test for feature_1 to feature_2000 as a loop? Could you please also provide a solution for that?

Thanks

Windermere

txnelson · Nov 6, 2018 05:45 PM

Looping across columns and running one of the platforms is one of the most common items handled in the Discussion Forum. Searching on "Looping" brings up multiple selections. Below is an example of looping across columns and running the Graph Builder platform. The looping for the Oneway platform will be very similar.

/* Open a sample data table */
dt = Open( "$SAMPLE_DATA\Fitness.jmp" );

/* Obtain a list of numeric/continuous column names as strings */
colList = dt << Get Column Names( Continuous, String );

/* Loop through the list to generate the desired Graph Builder 
   and save the report as a JPEG file */
For( i = 2, i <= Nitems( colList ), i++,
	gb = dt << Graph Builder(
		Size( 534, 454 ),
		Show Control Panel( 0 ),
		Variables(
			X( :Age ),
			Y( Column( dt, colList[i] ) ),
			Color( :Age )
		),
		Elements(
			Points(
				X,
				Y,
				Legend( 3 ),
				Summary Statistic( "Mean" ),
				Error Bars( "Confidence Interval" )
			)
		),
		SendToReport(
			Dispatch(
				{},
				"Graph Builder",
				FrameBox,
				{Marker Size( 5 )}
			)
		), 
		Invisible
	);
	gb << Save Picture( "E:\Trash\Graph Builder_" || colList[i] ||".jpg", JPEG );
);

/* Close table and all invisible reports */
Close( dt, No Save );
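
As a rough sketch of how the same loop could be adapted to the Oneway platform: the column names "Treatment" and "Feature_..." are taken from the question and are assumptions about the actual table, and the name test with Starts With is just one possible way of skipping non-feature columns.

```jsl
Names Default To Here( 1 );
dt = Current Data Table();

/* Obtain the continuous column names as strings */
colList = dt << Get Column Names( Continuous, String );

/* i is the position in colList: i = 1 is the first name in the list,
   and N Items( colList ) is the position of the last one */
For( i = 1, i <= N Items( colList ), i++,
	// Assumption: the columns to test are all named "Feature_..."
	If( Starts With( colList[i], "Feature_" ),
		dt << Oneway(
			Y( Column( dt, colList[i] ) ),
			X( :Treatment ),
			Wilcoxon Test( 1 )
		)
	)
);
```

This opens one Oneway report per feature column, so with thousands of columns it is probably worth trying it on a small table first.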

Jim

Windermere · Nov 7, 2018 12:09 AM

Hi txnelson,

Thank you for your help. I just want to make sure I understand the following things correctly.

1.

/* Obtain a list of numeric/continuous column names as strings */
colList = dt << Get Column Names( Continuous, String );

For my table, I should write it as

colList = dt << "Feature_1"( Continuous, String );

Is it correct?

2.

For( i = 2, i <= Nitems( colList ), i++,

In my case, it should be written as

For( i = 0, i <= feature_2000( colList ), i++,)

Is that correct?

So, I don't really understand what i=0, i=1, i=2 mean. Could you please explain?

"Nitems" should be changed to the column name of the last column (feature_2000). Is my understanding correct?

Also, do we need a ")" to end this loop? The template you showed did not have the ")".

Thank you.

I will try the script tomorrow and give you feedback.

Windermere

Windermere · Nov 8, 2018 10:07 AM

/* Open a sample data table */
dt = Open( "$Sheet1.jmp" );
/* Obtain a list of numeric/continuous column names as strings */
colList = dt << Feature_1( Continuous, String );
For( i = 2, i <= N Items( colList ), i++, );
Oneway(
	Y( :colList ),
	X( :Treatment ),
	Wilcoxon Test( 1 )
);
<<{Arrange in Rows( 1 )};

This is the script I used.

Unfortunately, it does not work at all. There is no response.

Does anyone know why?

Thanks

txnelson · Nov 8, 2018 11:04 AM

Below is an annotated script that should give you what you want.

/* Open a sample data table */
dt = Open( "$SAMPLE_DATA\big class.jmp" );
/* Obtain a list of numeric/continuous column names */
colList = dt << Get Column Names( Continuous );

// If you run a Oneway analysis, and specify to Arrange in Rows
// and then save the script, it generates
/*
Fit Group(
	Oneway( Y( :height ), X( :sex ) ),
	Oneway( Y( :weight ), X( :sex ) ),
	<<{Arrange in Rows( 200 )}
)
*/
// A little modification to include all of the columns that
// were extracted using the << Get Column Names message, and
// the script below will give you what you want
Fit Group(
	Oneway( Y( Eval( colList ) ), X( :sex ) ),
	// If you want all of the outputs in a single row and you don't 
	// know how many there will be, a simple way to handle this is
	// to just specify a large number
	<<{Arrange in Rows( 200 )}
);

All functions and messages in the script above are documented with examples in the Scripting Index

Help==>Scripting Index

The JSL structures and methods are documented in the Scripting Guide

Help==>Books==>Scripting Guide

However, it seems that you are currently limited in your programming knowledge and experience. I strongly suggest that you pick up one of the many books on Introduction to programming. While you will not find a book on "Introduction to Programming for JMP Scripting Language", what is really needed is to learn the concepts of programming. Once that is understood, then switching between languages in the programming world becomes a simple matter of learning the new syntax of the new language.

Jim

Windermere · Nov 8, 2018 11:51 AM

Thank you, and you are absolutely right that I don't have any training or experience in writing scripts.

I will try again, and I also need to read more. I hope you don't mind if I post here again when there is something I don't understand.

Thank you again for your patience.

Best,

Windermere

ian_jmp · Nov 9, 2018 1:50 AM

Jim has already given you some great help and guidance.

If I understand correctly, you may have upwards of 2,000 features, so it might be marginally more efficient to avoid looping over them. Also, looking at 2,000 reports is not great, so you probably need to summarise the results in some way.

In this case you can indeed avoid the loop, so you might like to understand how the code below works to help you go further with JSL. Generally, stacking the data and using a 'By' variable is a good coding pattern to be aware of. Of course, in the example there are only two 'features' (height and weight).

// Open a sample data table
dt = Open( "$SAMPLE_DATA\big class.jmp" );

// Obtain a list of column names containing the 'features'
colList = dt << Get Column Names( Continuous );

// Stack the data
dt2 = dt << Stack(
	columns( colList ),
	Source Label Column( "Feature" ),
	Stacked Data Column( "Data" ),
	Output Table( (dt << getName)||" Stacked" )
);
Close(dt, NoSave);

// Do the Wilcoxon tests all at the same time (this will be a list, due to the use of the 'By' variable)
ow = dt2 << Oneway( Y( :Data ), X( :sex ), Wilcoxon Test( 1 ), By(:Feature) );

// Get a link to the first report
firstOwRep = Report(ow[1]); 

// Get the one way ChiSq results from ALL reports (features) into a single table
dt3 = firstOwRep[TableBox(3)] << makeCombinedDataTable;
dt3 << setName("One Way ChiSquare Results By Feature");

Of course, when applying a statistical test thousands of times, you need to consider that the false alarm rate will inevitably be high.
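
One common way to keep that in check is to adjust the p-values, for example with the Benjamini-Hochberg procedure. That is not part of the scripts above; the sketch below is only an illustration of the arithmetic, using made-up p-values in a column vector rather than results from a real table.

```jsl
Names Default To Here( 1 );

// Hypothetical p-values, one per feature (made up for illustration)
p = [0.001, 0.04, 0.03, 0.2];
n = N Rows( p );

// Indices that put p into ascending order
order = Rank Index( p );

// Benjamini-Hochberg: walk from the largest p-value down, scaling each
// one by n / rank and keeping the running minimum
adj = J( n, 1, 0 );
prev = 1;
For( k = n, k >= 1, k--,
	idx = order[k];
	prev = Min( prev, p[idx] * n / k );
	adj[idx] = prev;
);

Show( adj );
```

Features whose adjusted value falls below the chosen threshold (0.05, say) would then be the ones to look at first.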

Windermere · Nov 14, 2018 11:28 AM

I am sorry. It still does not work.

I am wondering whether anyone can help with my specific case. I am sorry that I don't know how to write the loop so that it starts from the column named Feature_1 and runs to the end. I really hope that someone can explain how to write the script for that.

Thanks