turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- JMP User Community
- :
- Discussions
- :
- Discussions
- :
- What is the fastest way to populate a new column with character values?

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Jul 29, 2016 5:31 PM
(4051 views)

Hi, I have a script that does an advanced table concatenation, and I want to add 'source columns' to indicate the original table names. I have a few methods that work but all are slow for large tables. I also tried a formula method but it wasn't quick either. Is there some other trick I can use?

Example below very closely matches what I'm doing. Data table sizes of 1M rows are not unusual, so I'd really like to get this operation down into the low seconds for large tables.

Thanks,

Mike

dt = new table**()**;

dtList = **{**dt**}**;

rowCount = **50000**;

dt << **add rows(**rowCount**)**;

time2 = today**()**;

if**(0**,

dt << **new column(**"test", character, set each value**(**eval**(**dtList**[****1])** << **get name))**; // 24 sec at 50k rows.

,

tempList = **{}**;

for**(**i = **1**, i <= rowCount, i++, tempList**[**i**]** = eval**(**dtList**[****1])** << **get name)**; //104 sec at 50k rows.

dt << **new column(**"test", character, values**(**templist**))**;

**)**;

show**(**eval**(**today**()**-time2**))**;

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Jul 29, 2016 8:18 PM
(7112 views)

start = Tick Seconds();

dt = New Table();

Eval(

Eval Expr(

dt << New Column( "test",

Character,

"Nominal",

Formula( Expr( dt << getname ) )

)

)

);

dt << addrows( 1e7 );

dt << runformulas;

stop = Tick Seconds();

Show( stop - start );

**stop - start = 3.80000000004657; // 10,000,000 rows in 4 seconds**

The eval( evalExpr( ... expr( ... ) ... )) pattern is one way to manipulate an expression in JMP. To understand it, start with EvalExpr( ... ) which looks at its argument to find any Expr( ... ). the Expr parts are evaluated and the result (a string in this case) is left in place of the Expr. The result of EvalExpr is an expression, which needs to be evaluated by the outer eval(...)

There might be a faster way; I didn't explore very far.

Craige

3 REPLIES

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Jul 29, 2016 8:18 PM
(7113 views)

start = Tick Seconds();

dt = New Table();

Eval(

Eval Expr(

dt << New Column( "test",

Character,

"Nominal",

Formula( Expr( dt << getname ) )

)

)

);

dt << addrows( 1e7 );

dt << runformulas;

stop = Tick Seconds();

Show( stop - start );

**stop - start = 3.80000000004657; // 10,000,000 rows in 4 seconds**

The eval( evalExpr( ... expr( ... ) ... )) pattern is one way to manipulate an expression in JMP. To understand it, start with EvalExpr( ... ) which looks at its argument to find any Expr( ... ). the Expr parts are evaluated and the result (a string in this case) is left in place of the Expr. The result of EvalExpr is an expression, which needs to be evaluated by the outer eval(...)

There might be a faster way; I didn't explore very far.

Craige

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Jul 30, 2016 6:05 AM
(3920 views)

Thanks Craige. After I posted my original message, it occurred to me that JMP's native concatenate() tables function has the 'create source column' option, and I was able to find the code to this using the standard concatenate feature and then click on the red triangle / source script to see how it worked. This option doesn't seem to be documented anywhere. My script works with two copies of the original tables, so this intermediate solution required a little manipulation to get it back to the original table names. Had it been the original tables this would have been a good solution.

Your method is fairly clean, and certainly is fast. The need for something like this comes up fairly often so I'll bookmark this one. I think it is time I learned the value of those function combinations. I've seen them used, I've used them when I've been told it's how to get something to work properly, but I rarely seem to be able to recognize when to use it. I was close this time... I thought I got the formula working (mentioned in OP) but looking at the code I remember it didn't work because the expression just showed up in the formula. I figured I would need something like this but wasn't sure how to do it.

Is there some rule of thumb regarding which types of functions can / should be wrapped in expressions to optimize my scripts? Or would you recommend just running it and doing a timing analysis and fixing what's slow, which is what I do?

Thanks again,

Mike

PS thanks for reminding me about tick seconds(). I knew there was a better function than today() for that.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Jul 30, 2016 7:14 AM
(3920 views)

There is an easier way than my first example, at least for this case.

start = Tick Seconds();

dt = New Table();

dt << New Column( "test", Character, "Nominal", Formula( **Eval( dt << getname )** ) );

dt << addrows( 1e7 );

dt << runformulas;

stop = Tick Seconds();

Show( stop - start );

The Formula argument of the NewColumn message to a data table was originally designed to be very easy to use for the common cases, something like

Formula( height + weight )

where height and weight are data table columns and the addition should be done on each row. So, Formula does not evaluate its argument, but just stores the unevaluated expression as the column's formula.

Unless...the outer function is **Eval(...)** which you see in the easier example above. When the formula(...) argument is processed, it looks for the Eval and uses the evaluated result. There are other places in JMP that do this too. And, there are other places, like the substitute function, that use **Expr(...)** to prevent an evaluation of expression arguments.

Craige