cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Choose Language Hide Translation Bar

Makeover Challenge Suggestion: Something about Loops or finding elements/values in a list/matrix/...

 
6 Comments
Monomorphist
Level III

80% of my scripting involves loops, going through data sets that commonly comprise +100k rows and +50 columns. Depending on the conditions that I set, the script might extract 2, or 3 or 5, or 10 or 20 of these columns for a variety of calculations (eg log-ratio pairs), the result of which are then stored in additional "result" columns, perhaps 10, or 20 or 100 of those (which I then need to compare for similarities...). If I want to work now with the result columns, but do not know with how many of those I will eventually end up.... how do I write my formulas so that they are always valid, regardless whether it's for 2 or for 200 columns to be applied?

I can dump every result column into a list, so I know how many result columns I have. But I still don't know how to formulate the equation (such as  x = n1 + n2 for 2 columns, but  n1 + n2 + .... + nn for n columns),.

Cheers,

Kofi

SDF1
Super User

Hi @Monomorphist ,

 

  Here's some modified code I found from searching through the discussion forums. The original post is here.

 

  I think if you have a list of your result columns, which should be pretty easy to either generate or get, then you should be able to implement something like my modified AddCols to generate the formulas that you need. The original post was very informative and interesting.

Names Default To Here( 1 );

// Make a table
nc = 30;
dt = New Table( "Test" );
For( c = 1, c <= nc, c++,
	dt << New Column( "Col" || Char( c ), Numeric, Continuous, Formula( Random Normal() ) )
);

dt << AddRows( 100 );

// Select half the columns
n = Round( nc / 2 );
c = Random Index( N Col( dt ), n );
For( i = 1, i <= n, i++,
	Column( dt, c[i] ) << setSelected( 1 )
);


//This part generates the column formulas based on the SelCols list.
SelCols = dt << Get Selected Columns;

For( j = N Items(SelCols), j >= 1, j--,
	AddCols = Expr(

		dt << New Column( "Sum_" || Char( j ), Formula( Sum( colsTBD ) ) )
	
	);
	Substitute Into( AddCols, Expr( colsTBD ), Eval( selCols[1::j] ) );
	AddCols;
);

Hope this helps!,

DS

Monomorphist
Level III

Hi DS,

 

well, that's about how I've done it so far: Looping through the rows and then for each row looping through the columns. I was hoping, though, for a more elegant solution. A simple one-liner....

Here's an specific, and very simple, example:

I've 50 columns, and I want to calculate for each row the arithmetic mean of the columns that I choose to select. Let's say, I choose 3 columns E1, E6 and E12 to calculate the mean. When I do this manually, it takes me a few seconds: Add a column in the data table, pick the 3 columns and divide by 3 ->  (:E1 + :E6 + :E12) / 3.

If I want to do this in a script, and want to keep the script valid for the case that I choose not 3, but 4 or 5 or 20 columns, I haven't found any other way - neither here in the community nor in the JMP library/scripting guides etc - than looping vertically through each row and then nesting a horizontal loop through the columns. This seems to be very clunky? The other option would be to write 49 different macros, each for the exact number of columns selected, that is, if I want to select 7 columns, I run script7 and if I want to select 21 columns, I run script21, with the formula adjusted to 21 items. This I find rather ludicrous (then I better keep my clunky nested loops....)

Is there no JSL syntax which condenses a formula to a simple one-liner like:

dt << New Column ("AnyTitle", numeric, continuous, formula ( Sum_of_all_selected_columns / N_items) )?

How would the expresssion/column formula  "Sum_of_all_selected_columns" look like in JSL (if it exists?!?), when the selected columns are retrievable from a ColDialog ColList, Is it perhaps the :: bit ->  Sum(v1::vn) ,where v1 is the first item in the ColList and vn the last, and the formula sums up the first, the last and everything in between, without the need to specify how many items there are in between?

I come across this issue quite frequently, where I want the script to calculate something (eg sum or products etc) with an a priori unknown number of items. 

 

Cheers,

Kofi

 

 

Georg
Level VII

Dear @Monomorphist ,

maybe this helps. When working with lots of columns I find it advantageous to work with column groups. The below example defines a group (many examples can be found in scripting index), and builds a mean formula for that columns and implements it.

The same is scriptable for perhaps selected columns, if needed.

 

Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Blood Pressure.jmp" );
Wait( 1 );
theGroup = dt << Group Columns( "theGroup", :BP 8M :: :BP 12W );
Wait( 1 );

// build an expression for formula with column replacement
formula_expr = expr(dt << New Column( "Mean_of_Group",
	Numeric,
	"Continuous",
	Format( "Best", 10 ),
	Formula( mean( expr(current data table() <<get column group( "theGroup" )) ) ),
));

// perform replacement and formula implementation
eval(eval expr(formula_expr));
jthi
Super User

Similar idea as Georg has. Everytime you select new columns, run the EvalEvalExpr()) part again (can be made into script/addin/shortcut if needed).

 

Names Default To Here(1);
dt = Open("$SAMPLE_DATA/Semiconductor Capability.jmp");

//create placeholder for colum
new_col = dt << New Column("Sum_selected", Numeric, Continuous);

//after column selection run this part to update script
Eval(EValExpr(new_col << Set Formula(Sum(Expr(Current Data Table() << Get Selected Columns)))));

JMP doesn't have (to my knowledge) "Column State Handler" so you cannot trigger events after column selection changes like you can do with rows.

 

 

You can also try something like this, but I would advice against it and move the formula updating to button press instead of Filter Col Box selection change as it is easier to manage that way

Names Default To Here(1);
dt = Open("$SAMPLE_DATA/Semiconductor Capability.jmp");


nw = New Window("Update formula",
	window:new_col = dt << New Column("Sum_selected", Numeric, Continuous); //create column inside new window
	window:dt = dt;
	flc = Filter Col Selector(
		dt, << character(0), << nominal(0), << ordinal(0),
		<< On Change(
			show(window:new_col, window:dt);
			window:dt << Select columns((flc << Get Selected));
			Eval(EValExpr(window:new_col<< Set Formula(Sum(Expr(window:dt << Get Selected Columns)))));
		)
	)
);
Monomorphist
Level III

Hi @jthi ,

 

I've tried your first suggestion - and it works as intended. Thanks a lot. The last line is pretty much the one-liner I was looking for!