cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
mikedriscoll
Level VI

What is the fastest way to populate a new column with character values?

Hi, I have a script that does an advanced table concatenation, and I want to add 'source columns' to indicate the original table names. I have a few methods that work but all are slow for large tables. I also tried a formula method but it wasn't quick either. Is there some other trick I can use?

Example below very closely matches what I'm doing. Data table sizes of 1M rows are not unusual, so I'd really like to get this operation down into the low seconds for large tables.

Thanks,

Mike

dt = new table();

dtList = {dt};

rowCount = 50000;

dt << add rows(rowCount);

time2 = today();

if(0,

      dt << new column("test", character, set each value(eval(dtList[1]) << get name)); // 24 sec at 50k rows.

,

      tempList = {};

           for(i = 1, i <= rowCount, i++, tempList[i] = eval(dtList[1]) << get name);      //104 sec at 50k rows.

      dt << new column("test", character, values(templist));

);

show(eval(today()-time2));

1 ACCEPTED SOLUTION

Accepted Solutions
Craige_Hales
Super User

Re: What is the fastest way to populate a new column with character values?

start = Tick Seconds();

dt = New Table();

Eval(

    Eval Expr(

        dt << New Column( "test",

            Character,

            "Nominal",

            Formula( Expr( dt << getname ) )

        )

    )

);

dt << addrows( 1e7 );

dt << runformulas;

stop = Tick Seconds();

Show( stop - start );

stop - start = 3.80000000004657; // 10,000,000 rows in 4 seconds

The eval( evalExpr( ... expr( ... ) ... )) pattern is one way to manipulate an expression in JMP.  To understand it, start with EvalExpr( ... ) which looks at its argument to find any Expr( ... ).  the Expr parts are evaluated and the result (a string in this case) is left in place of the Expr.  The result of EvalExpr is an expression, which needs to be evaluated by the outer eval(...)

There might be a faster way; I didn't explore very far.

12108_table.PNG

Craige

View solution in original post

3 REPLIES 3
Craige_Hales
Super User

Re: What is the fastest way to populate a new column with character values?

start = Tick Seconds();

dt = New Table();

Eval(

    Eval Expr(

        dt << New Column( "test",

            Character,

            "Nominal",

            Formula( Expr( dt << getname ) )

        )

    )

);

dt << addrows( 1e7 );

dt << runformulas;

stop = Tick Seconds();

Show( stop - start );

stop - start = 3.80000000004657; // 10,000,000 rows in 4 seconds

The eval( evalExpr( ... expr( ... ) ... )) pattern is one way to manipulate an expression in JMP.  To understand it, start with EvalExpr( ... ) which looks at its argument to find any Expr( ... ).  the Expr parts are evaluated and the result (a string in this case) is left in place of the Expr.  The result of EvalExpr is an expression, which needs to be evaluated by the outer eval(...)

There might be a faster way; I didn't explore very far.

12108_table.PNG

Craige
mikedriscoll
Level VI

Re: What is the fastest way to populate a new column with character values?

Thanks Craige. After I posted my original message, it occurred to me that JMP's native concatenate() tables function has the 'create source column' option, and I was able to find the code to this using the standard concatenate feature and then click on the red triangle / source script to see how it worked. This option doesn't seem to be documented anywhere.  My script works with two copies of the original tables, so this intermediate solution required a little manipulation to get it back to the original table names.  Had it been the original tables this would have been a good solution.

Your method is fairly clean, and certainly is fast. The need for something like this comes up fairly often so I'll bookmark this one. I think it is time I learned the value of those function combinations. I've seen them used, I've used them when I've been told it's how to get something to work properly, but I rarely seem to be able to recognize when to use it. I was close this time... I thought I got the formula working (mentioned in OP) but looking at the code I remember it didn't work because the expression just showed up in the formula. I figured I would need something like this but wasn't sure how to do it.

Is there some rule of thumb regarding which types of functions can / should be wrapped in expressions to optimize my scripts? Or would you recommend just running it and doing a timing analysis and fixing what's slow, which is what I do?

Thanks again,

Mike

PS thanks for reminding me about tick seconds(). I knew there was a better function than today() for that.

Craige_Hales
Super User

Re: What is the fastest way to populate a new column with character values?

There is an easier way than my first example, at least for this case.

start = Tick Seconds();

dt = New Table();

dt << New Column( "test", Character, "Nominal", Formula( Eval( dt << getname ) ) );

dt << addrows( 1e7 );

dt << runformulas;

stop = Tick Seconds();

Show( stop - start );

The Formula argument of the NewColumn message to a data table was originally designed to be very easy to use for the common cases, something like

Formula( height + weight )

where height and weight are data table columns and the addition should be done on each row.  So, Formula does not evaluate its argument, but just stores the unevaluated expression as the column's formula.

Unless...the outer function is Eval(...) which you see in the easier example above.  When the formula(...) argument is processed, it looks for the Eval and uses the evaluated result.  There are other places in JMP that do this too.  And, there are other places, like the substitute function, that use Expr(...) to prevent an evaluation of expression arguments.

Craige