Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- JMP User Community
- :
- Discussions
- :
- Re: JSL loop that add statistics by run ID for multiple columns take ages

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Dec 20, 2018 1:10 AM
(1730 views)

Hi guys.

Need your advice with the following.

I careted this loop that look for all Numeric columns that their name starts with "MEAS" (for measurement) and create a new statistic columns based on data in those columns grouped by runid data.

The loop is working but it takes ages to complete - any idea why? or what can be done to improve it?

Thx

```
dt_ncol=ncol(dt);
for(i=1,i<=dt_ncol,i++,
if(column(i)<<get data type=="Numeric" & substr(char(column name(1)),1,4)=="MEAS",
((dt<<new column(column name(i)||"_MEAN",
formula(col mean(as column(column name(i)),:runid))));
(dt<<new column(column name(i)||"_MAX",
formula(col max(as column(column name(i)),:runid))));
(dt<<new column(column name(i)||"_MIN",
formula(col min(as column(column name(i)),:runid)))))
,column(i)));
```

1 ACCEPTED SOLUTION

Accepted Solutions

Highlighted

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Morgen thx for your replay.

Couldn't thank you on time, but aventually I took your advice to summarize data within a temporrary table and update the new columns from that table in the main one.

I did it a little differently from the script you suggested, mainly due to issue that I had with the using of a list name within summary calculations (Mean(cols)).

Anyway, I came up with this short script that is working perfect for me:

//Add measurements statistics

// List relevant column names

meascol = dt << get column names( numeric );

measnc = N Items( meascol );

meascollist = {};

For( i = 1, i <= measnc, i++,

If( Contains( meascol[i], "MEAS" ),

Insert Into( meascollist, meascol[i] )

));

// Summary relevant columns and update table

for(k=1,k<=n items(meascollist),k++,

dt << Summary(

Group( :RUNID ),

Mean( char(meascollist[k]) ),

Max( char(meascollist[k]) ),

Min( char(meascollist[k]) ),

output table name( "tempmeas" )

);

dt << Update(

With( Data Table( "tempmeas" ) ),

Match Columns( :RUNID = :RUNID )

);

close(data table("tempmeas"),no save));

3 REPLIES 3

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: JSL loop that add statistics by run ID for multiple columns take ages

This is taking time due to the number of calculations (order) and rendering (drawing) the table values. Also, column formulas might be slowing it down versus using values.

Note the method you are using is finding and calculating the runid rows, by (:runid), 3 times for each column. Two alternate methods are:

- Use table manipulation and forget column formulas. An example script is below, using the JMP sample data table Semiconductor Capability.jmp
- Add a sequence column, I call it seq which is row(). Use Summarize( rid = By(:runid), ibeg=Min(:seq), iend=Max(:seq) ). Then use table assignments. I did not show an example of this methods because I believe it would be slower than table manipulations.

```
Names Default to Here(1);
dt = Open("$sample_data/semiconductor Capability.jmp");
//change lot_id column name to runid
dt:lot_id << set name("runid");
//select the columns here. I will just use columns with a name that starts with I
cols = {};
for(i=5, i<=ncol(dt), i++,
nme = column(dt,i) << get name;
if( substr(nme, 1, 1) == "I", insert into(cols, nme) );
);
sumdt = dt << Summary( Group(:runid), Mean(cols), Max(cols), Min(cols),
statistics column name format( "column stat" ), Invisible,
Link to original data table( 0 )
);
ord = {"runid"};
for(i=1, i<=ngrp, i++,
nme = cols[i];
column(sumdt, nme || "Mean") << Set Name(nme || "_MEAN");
column(sumdt, nme || "Min") << Set Name(nme || "_MIN");
column(sumdt, nme || "Max") << Set Name(nme || "_MAX");
insert into( ord, eval list({nme||"_MEAN", nme||"_MIN", nme||"_MAX"}))
);
newdt = sumdt << Subset( All Rows, Columns(ord), Invisible );
close(sumdt, NoSave);
wait(0);
dt2 = dt << Join(
With( newdt ),
By Matching Columns( :runid = :runid ),
Merge Same name columns(1),
Match Flag(0),
Drop multiples( 0, 0 ),
Include Nonmatches( 1, 0 ),
Preserve main table order( 1 )
);
close(newdt,NoSave);
nme = dt << get name;
Close(dt, NoSave);
dt = dt2;
dt << set name(nme);
dt << delete scripts("Source");
```

A JSL nice to have would be a data table message to Order By (list), dt << Order Columns By( mylist) or dt << Move Columns( list, After | Before | To Last | To First ). Currently, I find Subset( Columns (list) ) the fastest.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: JSL loop that add statistics by run ID for multiple columns take ages

Morgen thx for your replay.

Couldn't thank you on time, but aventually I took your advice to summarize data within a temporrary table and update the new columns from that table in the main one.

I did it a little differently from the script you suggested, mainly due to issue that I had with the using of a list name within summary calculations (Mean(cols)).

Anyway, I came up with this short script that is working perfect for me:

//Add measurements statistics

meas_ncol=n col(dt);

// List relevant column names

meascol = dt << get column names( numeric );

measnc = N Items( meascol );

meascollist = {};

For( i = 1, i <= measnc, i++,

If( Contains( meascol[i], "MEAS" ),

Insert Into( meascollist, meascol[i] )

));

// Summary relevant columns and update table

for(k=1,k<=n items(meascollist),k++,

dt << Summary(

Group( :RUNID ),

Mean( char(meascollist[k]) ),

Max( char(meascollist[k]) ),

Min( char(meascollist[k]) ),

output table name( "tempmeas" )

);

dt << Update(

With( Data Table( "tempmeas" ) ),

Match Columns( :RUNID = :RUNID )

);

close(data table("tempmeas"),no save));

Highlighted

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Morgen thx for your replay.

Couldn't thank you on time, but aventually I took your advice to summarize data within a temporrary table and update the new columns from that table in the main one.

I did it a little differently from the script you suggested, mainly due to issue that I had with the using of a list name within summary calculations (Mean(cols)).

Anyway, I came up with this short script that is working perfect for me:

//Add measurements statistics

// List relevant column names

meascol = dt << get column names( numeric );

measnc = N Items( meascol );

meascollist = {};

For( i = 1, i <= measnc, i++,

If( Contains( meascol[i], "MEAS" ),

Insert Into( meascollist, meascol[i] )

));

// Summary relevant columns and update table

for(k=1,k<=n items(meascollist),k++,

dt << Summary(

Group( :RUNID ),

Mean( char(meascollist[k]) ),

Max( char(meascollist[k]) ),

Min( char(meascollist[k]) ),

output table name( "tempmeas" )

);

dt << Update(

With( Data Table( "tempmeas" ) ),

Match Columns( :RUNID = :RUNID )

);

close(data table("tempmeas"),no save));