Choose Language Hide Translation Bar
Highlighted
KfirDarina
Occasional Contributor

JSL loop that add statistics by run ID for multiple columns take ages

Hi guys.

Need your advice with the following.

I careted this loop that look for all Numeric columns that their name starts with "MEAS" (for measurement) and create a new statistic columns based on data in those columns grouped by runid data.

The loop is working but it takes ages to complete - any idea why? or what can be done to improve it?

 

Thx

 

 

dt_ncol=ncol(dt);
for(i=1,i<=dt_ncol,i++,
if(column(i)<<get data type=="Numeric" & substr(char(column name(1)),1,4)=="MEAS",

((dt<<new column(column name(i)||"_MEAN",
formula(col mean(as column(column name(i)),:runid))));
(dt<<new column(column name(i)||"_MAX",
formula(col max(as column(column name(i)),:runid))));
(dt<<new column(column name(i)||"_MIN",
formula(col min(as column(column name(i)),:runid)))))
,column(i)));
0 Kudos
1 ACCEPTED SOLUTION

Accepted Solutions
KfirDarina
Occasional Contributor

Re: JSL loop that add statistics by run ID for multiple columns take ages

Morgen thx for your replay.

Couldn't thank you on time, but aventually I took your advice to summarize data within a temporrary table and update the new columns from that table in the main one.

I did it a little differently from the script you suggested, mainly due to issue that I had with the using of a list name within summary calculations (Mean(cols)).

Anyway, I came up with this short script that is working perfect for me:

 

 

//Add measurements statistics

 

// List relevant column names

meascol = dt << get column names( numeric );
measnc = N Items( meascol );
meascollist = {};
For( i = 1, i <= measnc, i++,
If( Contains( meascol[i], "MEAS" ),
Insert Into( meascollist, meascol[i] )
));

// Summary relevant columns and update table

for(k=1,k<=n items(meascollist),k++,

dt << Summary(
Group( :RUNID ),
Mean( char(meascollist[k]) ),
Max( char(meascollist[k]) ),
Min( char(meascollist[k]) ),
output table name( "tempmeas" )
);

dt << Update(
With( Data Table( "tempmeas" ) ),
Match Columns( :RUNID = :RUNID )
);

close(data table("tempmeas"),no save));

View solution in original post

0 Kudos
3 REPLIES 3
gzmorgan0
Super User

Re: JSL loop that add statistics by run ID for multiple columns take ages

This is taking time due to the number of calculations (order) and rendering (drawing) the table values.  Also, column formulas might be slowing it down versus using values.

 

Note the method you are using is finding and calculating the runid rows, by (:runid), 3 times for each column.  Two alternate methods are:

  1. Use table manipulation and forget column formulas. An example script is below, using the JMP sample data table Semiconductor Capability.jmp
  2. Add a sequence column, I call it seq which is  row(). Use Summarize( rid = By(:runid), ibeg=Min(:seq), iend=Max(:seq) ). Then use table assignments. I did not show an example of this methods because I believe it would be slower than table manipulations.  
Names Default to Here(1);

dt = Open("$sample_data/semiconductor Capability.jmp");

//change lot_id column name to runid

dt:lot_id << set name("runid");

//select the columns here. I will just use columns with a name that starts with I 
cols = {};
for(i=5, i<=ncol(dt), i++,
  nme = column(dt,i) << get name;
  if( substr(nme, 1, 1) == "I", insert into(cols, nme) );
);  
  
sumdt = dt << Summary( Group(:runid), Mean(cols), Max(cols), Min(cols), 
		statistics column name format( "column stat" ), Invisible,
			Link to original data table( 0 )
);
ord = {"runid"};
for(i=1, i<=ngrp, i++,
    nme = cols[i];
    column(sumdt, nme || "Mean") << Set Name(nme || "_MEAN");
    column(sumdt, nme || "Min") << Set Name(nme || "_MIN");
    column(sumdt, nme || "Max") << Set Name(nme || "_MAX");

	insert into( ord, eval list({nme||"_MEAN", nme||"_MIN", nme||"_MAX"}))
);

newdt = sumdt << Subset( All Rows, Columns(ord), Invisible );
close(sumdt, NoSave);
wait(0);

dt2 = dt << Join(
	With( newdt ),
	By Matching Columns( :runid = :runid ),
	Merge Same name columns(1),
	Match Flag(0),
	Drop multiples( 0, 0 ),
	Include Nonmatches( 1, 0 ),
	Preserve main table order( 1 )
);

close(newdt,NoSave);
nme = dt << get name;
Close(dt, NoSave);
dt = dt2;
dt << set name(nme);
dt << delete scripts("Source");

A JSL nice to have would be a data table message to Order By (list), dt << Order Columns By( mylist)  or dt << Move Columns( list, After | Before | To Last | To First ).  Currently, I find Subset( Columns (list) )  the fastest.

0 Kudos
KfirDarina
Occasional Contributor

Re: JSL loop that add statistics by run ID for multiple columns take ages

Morgen thx for your replay.

Couldn't thank you on time, but aventually I took your advice to summarize data within a temporrary table and update the new columns from that table in the main one.

I did it a little differently from the script you suggested, mainly due to issue that I had with the using of a list name within summary calculations (Mean(cols)).

Anyway, I came up with this short script that is working perfect for me:

 

//Add measurements statistics

meas_ncol=n col(dt);

// List relevant column names

meascol = dt << get column names( numeric );
measnc = N Items( meascol );
meascollist = {};
For( i = 1, i <= measnc, i++,
If( Contains( meascol[i], "MEAS" ),
Insert Into( meascollist, meascol[i] )
));

// Summary relevant columns and update table

for(k=1,k<=n items(meascollist),k++,

dt << Summary(
Group( :RUNID ),
Mean( char(meascollist[k]) ),
Max( char(meascollist[k]) ),
Min( char(meascollist[k]) ),
output table name( "tempmeas" )
);

dt << Update(
With( Data Table( "tempmeas" ) ),
Match Columns( :RUNID = :RUNID )
);

close(data table("tempmeas"),no save));

0 Kudos
KfirDarina
Occasional Contributor

Re: JSL loop that add statistics by run ID for multiple columns take ages

Morgen thx for your replay.

Couldn't thank you on time, but aventually I took your advice to summarize data within a temporrary table and update the new columns from that table in the main one.

I did it a little differently from the script you suggested, mainly due to issue that I had with the using of a list name within summary calculations (Mean(cols)).

Anyway, I came up with this short script that is working perfect for me:

 

 

//Add measurements statistics

 

// List relevant column names

meascol = dt << get column names( numeric );
measnc = N Items( meascol );
meascollist = {};
For( i = 1, i <= measnc, i++,
If( Contains( meascol[i], "MEAS" ),
Insert Into( meascollist, meascol[i] )
));

// Summary relevant columns and update table

for(k=1,k<=n items(meascollist),k++,

dt << Summary(
Group( :RUNID ),
Mean( char(meascollist[k]) ),
Max( char(meascollist[k]) ),
Min( char(meascollist[k]) ),
output table name( "tempmeas" )
);

dt << Update(
With( Data Table( "tempmeas" ) ),
Match Columns( :RUNID = :RUNID )
);

close(data table("tempmeas"),no save));

View solution in original post

0 Kudos