A recent Community Discussion script to find and replace a numeric value with a missing value in the whole tablehad a very nice entry provided by @SDF1 that was selected as the solution
names default to here(1);
dt=current data table();
For( i = 1, i <= N Rows( dt ), i++,
For( l = 1, l <= N Cols( dt ), l++,
If( Column( l )[i] == 999,
Column( l )[i] = .
)
)
);
When I viewed the solution, an old teaching popped into my brain that roughly stated is, "Don't loop through a data table. Looping is slow. Use the JMP built in capabilities."
So I started out to modify @SDF1 code.
I determined that my fallback standard of "Get Rows Where()" would be faster. After all, it allows all found rows to be set with one assignment statement, after the "very fast" Get Rows Where() has found the rows.
Another solution, that I have seen pushed recommended by the community is the Recode facility in JMP.
So:
Hypothesis: The built in JMP functions of Get Rows Where and Recode will be faster than Looping down and across an entire data table.
I selected the Semiconductor Capability data table as a good sized table, 132 columns and 1455 rows
I wrote a little script that sets 10 random values in each column to 999.
names default to here(1);
dt=current data table();
for(i=5,i<=ncols(dt),i++,
if(col stddev(ascolumn(dt,i))!=0,
for(k=1,k<=10,k++,
column(dt,i)[randominteger(1,1455)]=999;
)
));
Names Default To Here( 1 );
dt = Current Data Table();
start = HP Time();
For( i = 5, i <= N Cols( dt ), i++,
dt << Recode Column(
As Column( dt, i ),
{If( _rcNow == 999, ., _rcNow )},
Target Column( As Column( dt, i ) )
)
);
show(hptime()-start)
This is the original value to be changed in the Discussion mentioned above. The 999 is to be changed to missing (.) by the scripts.
Method:
- Startup JMP 16.
- Read in the Semiconductor Capability data table
- Run the script to set the 999 values in each column
- Repeat the above 2 steps for each of the 3 methods and recode the time down to the millisecond.
- Repeat the above steps for JMP15
Here is the script for the Recode
Names Default To Here( 1 );
dt = Current Data Table();
start = HP Time();
For( i = 5, i <= N Cols( dt ), i++,
dt << Recode Column(
As Column( dt, i ),
{If( _rcNow == 999, ., _rcNow )},
Target Column( As Column( dt, i ) )
)
);
show(hptime()-start)
The code for the Get Rows Where
names default to here(1);
dt=current data table();
start=hptime();
for(i=5,i<= ncols(dt),i++,
column(dt,i)[dt<<get rows where(ascolumn(dt,i)==999)]=.;
if(mod(i,10)==0,show(i));
);
show(hptime()-start)
And here are the surprising results
The For Loop wins hands down. And from the results, the Get Rows Where should be used sparingly.
Comments and corrections are welcomed......
Jim