Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- JMP User Community
- :
- Discussions
- :
- optimizing an IF statement?

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

Highlighted

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Jan 17, 2016 5:30 PM
(12541 views)

I have a list of specific values I am interested in identifying in my data set (specific values of the :MeasurementGroup column). I created a new column called "Value of Interest?" with a formula to generate an indicator: either YES or NO

dt1 << New Column( "Value of Interest?",

Character,

Nominal,

Formula( If(

:MeasurementGroup == "A", "YES",

:MeasurementGroup == "B", "YES",

:MeasurementGroup == "C", "YES",

:MeasurementGroup == "D", "YES",

:MeasurementGroup == "E", "YES",

:MeasurementGroup == "F", "YES",

:MeasurementGroup == "G", "YES",

:MeasurementGroup == "AA", "YES",

:MeasurementGroup == "BB", "YES",

:MeasurementGroup == "CC", "YES",

:MeasurementGroup == "ZZ", "YES",

:MeasurementGroup == "ABC", "YES",

:MeasurementGroup == "DEC", "YES",

:MeasurementGroup == "WSA", "YES",

:MeasurementGroup == "RSS", "YES",

:MeasurementGroup == "DVC", "YES",

:MeasurementGroup == "SFS", "YES",

:MeasurementGroup == "AAZ", "YES",

:MeasurementGroup == "ZZA", "YES",

:MeasurementGroup == "TTT", "YES",

"NO") )

);

My datatable has close to 1 million rows... and it seems to churn for a bit before populating. Is there a more efficient way to acheive this same result?

Any help is greatly appreciated! thanks

1 ACCEPTED SOLUTION

Accepted Solutions

Highlighted

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

The Match function is effective for multiple choices. I did a quick test: with a million rows *Match*, *Loc* and *If* complete in 2.1 sec, 6.9 sec and 10.2 sec, respectively.

dt << **New Column****(**"Value of Interest",

Character,

Nominal,

Formula**(**

Match**(**:MeasurementGroup,

"A", "YES",

"B", "YES",

"C", "YES",

"D", "YES",

"E", "YES",

"F", "YES",

"G", "YES",

"AA", "YES",

"BB", "YES",

"CC", "YES",

"ZZ", "YES",

"ABC", "YES",

"DEC", "YES",

"WSA", "YES",

"RSS", "YES",

"DVC", "YES",

"SFS", "YES",

"AAZ", "YES",

"ZZA", "YES",

"TTT", "YES",

"NO"

**)**

**)**

**)**;

7 REPLIES 7

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: optimizing an IF statement?

I would try using the LOC() function

if(N Rows(Loc({"A","B","C",...............},MeasurementGroup))>0,"Yes","NO")

Jim

Highlighted

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

The Match function is effective for multiple choices. I did a quick test: with a million rows *Match*, *Loc* and *If* complete in 2.1 sec, 6.9 sec and 10.2 sec, respectively.

dt << **New Column****(**"Value of Interest",

Character,

Nominal,

Formula**(**

Match**(**:MeasurementGroup,

"A", "YES",

"B", "YES",

"C", "YES",

"D", "YES",

"E", "YES",

"F", "YES",

"G", "YES",

"AA", "YES",

"BB", "YES",

"CC", "YES",

"ZZ", "YES",

"ABC", "YES",

"DEC", "YES",

"WSA", "YES",

"RSS", "YES",

"DVC", "YES",

"SFS", "YES",

"AAZ", "YES",

"ZZA", "YES",

"TTT", "YES",

"NO"

**)**

**)**

**)**;

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: optimizing an IF statement?

Give this man an award! This is great.

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: optimizing an IF statement?

Jim, thank you for the quick response! I was able to implement the LOC function and see an immediate improvement in speed!

MS, thank you for improving upon this further and teaching me the MATCH function (and providing the speed comparison)!

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: optimizing an IF statement?

Hi JMPBeginner,

**** Please refer to the attached .jsl file to see these scripts as intended. I have been having trouble getting my posts to display correctly, and if you copy/paste or type in the scripts that appear below, I am not sure if they will work. They look incorrect when I post them, but they look OK when I edit, so I am not sure what to do except attach a script file. My apologies!

*****

If you are willing to use value labels (for large tables, this can save a good deal of disk space, as well), you have another couple of speedy options:

1) A formula-based approach using the Contains() function:

dt << New Column( "y/n 2",

formula(

Contains(

{"A", "B", "C", "D", "E", "F", "G", "AA", "BB", "CC", "ZZ", "ABC", "DEC", "WSA", "RSS", "DVC", "SFS", "AAZ", "ZZA", "TTT"},

:MeasurementGroup

) != 0

),

<<Add Column Properties( value labels( {0, 1}, {"NO", "YES"} ) )

);

2) A non-formula based approach: JMP works with tables very quickly; the speed of this approach shocked me a little, given that a completely new table must be created and closed:

dt = Current Data Table();

dt2 = New Table( "mt", invisible );

Column( dt2, 1 ) << datatype( character ) << set name( "MeasurementGroup" ) << set values(

{"A", "B", "C", "D", "E", "F", "G", "AA", "BB", "CC", "ZZ", "ABC", "DEC", "WSA", "RSS", "DVC", "SFS", "AAZ", "ZZA", "TTT"}

);

dt2 << New Column( "y/n", <<set each value( 1 ) );

dt << New Column( "y/n", <<set each value( 0 ) );

dt << Update( With( dt2 ), Match Columns( :Measurement Group = :MeasurementGroup ), Add Columns from Update table( None ) );

Column( dt, "y/n" ) << Add Column Properties( value labels( {0, 1}, {"NO", "YES"} ) );

Close( dt2, nosave );

On a million row table, I was getting runtimes of ~ 2.4 seconds for the Match function. (MS's computer seems to be faster than mine ;-)

The runtimes of approach 1 above were ~ 0.8 seconds, while the runtimes of approach 2 were ~ 0.4 seconds.

Cheers,

Brady

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: optimizing an IF statement?

Looks good Brady!

BTW when I post code I copy/paste the JSL into a word document, and then copy/paste from there to the forum. That preserves the look of the JSL editor.

For example:

dt << **New Column(** "y/n 2",

formula**(**

Contains**(****{**"A", "B", "C", "D", "E", "F", "G", "AA", "BB", "CC", "ZZ", "ABC", "DEC", "WSA",

"RSS", "DVC", "SFS", "AAZ", "ZZA", "TTT"**}**, :MeasurementGroup

**)** != **0**

**)**,

**)**;

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: optimizing an IF statement?

Thanks Peter, I will use this trick from now on when I post.

Cheers,

Brady