cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
  • Sign-in to the JMP Community will be unavailable intermittently Dec. 6-7 due to a system update. Thank you for your understanding!
  • We’re retiring the File Exchange at the end of this year. The JMP Marketplace is now your destination for add-ins and extensions.
  • JMP 19 is here! Learn more about the new features.

Discussions

Solve problems, and share tips and tricks with other JMP users.
Choose Language Hide Translation Bar
MeanChris
Level III

Update Table Match Speed - which is faster: Matching multiple columns X, Y, Z (all integers) or Matching Col "X.Y.Z" unique combined string?

I'm Updating a large data table (4.4 million rows) of stacked data with design info from a much smaller table.  I have 1,927 unique rows in the smaller table that correspond to the stacked parameter.  They can be identified either by a column with a unique string (X.Y.Z), or the combination of three columns that contain X, Y and Z individually.  The same columns (X, Y, Z and X.Y.Z) are in the large data table.  X, Y and Z are all integers.

 

I'm not sure how to test this, but I'm curious if matching 3 integer columns is faster than matching 1 string column by any appreciable amount.  May need to repeat this fairly often, and the large data table is still growing.

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Update Table Match Speed - which is faster: Matching multiple columns X, Y, Z (all integers) or Matching Col "X.Y.Z" unique combined string?

I created some random data to test this on 1 million rows.  Updating by matching three integer columns was on average twice as fast as on the single character column.

 

Names Default to Here( 1 );
dt = New Table( "Match Test", New Column( "X", numeric ), New Column( "Y", numeric ), New Column( "Z", numeric ), New Column( "XYZ", character ), New Column( "Update", character ) );
dt << Add Rows( 1e6 );
For Each Row( dt, :X = Random Integer( 999 ); :Y = Random Integer( 999 ); :Z = Random Integer( 999 ); :XYZ = Char( :X ) || " " || Char(:Y ) || " " || Char( :Z );  );
dt2 = New Table( "Match Test", New Column( "X", numeric ), New Column( "Y", numeric ), New Column( "Z", numeric ), New Column( "XYZ", character ), New Column( "Update", character ) );
dt2 << Add Rows( 1e6 );
For Each Row( dt2, :X = Random Integer( 999 ); :Y = Random Integer( 999 ); :Z = Random Integer( 999 ); :XYZ = Char( :X ) || " " || Char(:Y ) || " " || Char( :Z ); :Update = {"taco", "cat", "goat", "cheese", "pizza"}[Random Integer( 5 )] );
For Each( {v, i}, 1::10, 
start_time = HP Time();
dt << Update( With( dt2 ), Match Columns( :XYZ = :XYZ ) );
char_time = HP Time() - start_time;
Print( "Update on single character column match: " || Char( char_time / 1e6 ) || " seconds" );
start_time = HP Time();
dt << Update( With( dt2 ), Match Columns( :X = :X, :Y = :Y, :Z = :Z ) );
num_time = HP Time() - start_time;
Print( "Update on three integer columns match: " || Char( num_time / 1e6 ) || " seconds" ) );

//"Update on single character column match: 1.700707 seconds"
//"Update on three integer columns match: 0.70421 seconds"
//"Update on single character column match: 1.508058 seconds"
//"Update on three integer columns match: 0.718748 seconds"
//"Update on single character column match: 1.316761 seconds"
//"Update on three integer columns match: 0.654922 seconds"
//"Update on single character column match: 1.344123 seconds"
//"Update on three integer columns match: 0.659323 seconds"
//"Update on single character column match: 1.284766 seconds"
//"Update on three integer columns match: 0.654989 seconds"
//"Update on single character column match: 1.272866 seconds"
//"Update on three integer columns match: 0.666974 seconds"
//"Update on single character column match: 1.289576 seconds"
//"Update on three integer columns match: 0.656926 seconds"
//"Update on single character column match: 1.276214 seconds"
//"Update on three integer columns match: 0.707188 seconds"
//"Update on single character column match: 1.689671 seconds"
//"Update on three integer columns match: 0.668045 seconds"
//"Update on single character column match: 1.375964 seconds"
//"Update on three integer columns match: 0.675418 seconds"

View solution in original post

4 REPLIES 4
MeanChris
Level III

Re: Update Table Match Speed - which is faster: Matching multiple columns X, Y, Z (all integers) or Matching Col "X.Y.Z" unique combined string?

Update:  I created duplicate tables and updated them with the different methods.  I was hoping to see an execution time in the JMP Log window or something that would tip me off as to how long each took.

 

Both were so fast I spent nearly 10^6 longer writing this post hypothesizing about it.  But if I had to pick, it seemed like the single column string match was an eye-blink slower.

Re: Update Table Match Speed - which is faster: Matching multiple columns X, Y, Z (all integers) or Matching Col "X.Y.Z" unique combined string?

I created some random data to test this on 1 million rows.  Updating by matching three integer columns was on average twice as fast as on the single character column.

 

Names Default to Here( 1 );
dt = New Table( "Match Test", New Column( "X", numeric ), New Column( "Y", numeric ), New Column( "Z", numeric ), New Column( "XYZ", character ), New Column( "Update", character ) );
dt << Add Rows( 1e6 );
For Each Row( dt, :X = Random Integer( 999 ); :Y = Random Integer( 999 ); :Z = Random Integer( 999 ); :XYZ = Char( :X ) || " " || Char(:Y ) || " " || Char( :Z );  );
dt2 = New Table( "Match Test", New Column( "X", numeric ), New Column( "Y", numeric ), New Column( "Z", numeric ), New Column( "XYZ", character ), New Column( "Update", character ) );
dt2 << Add Rows( 1e6 );
For Each Row( dt2, :X = Random Integer( 999 ); :Y = Random Integer( 999 ); :Z = Random Integer( 999 ); :XYZ = Char( :X ) || " " || Char(:Y ) || " " || Char( :Z ); :Update = {"taco", "cat", "goat", "cheese", "pizza"}[Random Integer( 5 )] );
For Each( {v, i}, 1::10, 
start_time = HP Time();
dt << Update( With( dt2 ), Match Columns( :XYZ = :XYZ ) );
char_time = HP Time() - start_time;
Print( "Update on single character column match: " || Char( char_time / 1e6 ) || " seconds" );
start_time = HP Time();
dt << Update( With( dt2 ), Match Columns( :X = :X, :Y = :Y, :Z = :Z ) );
num_time = HP Time() - start_time;
Print( "Update on three integer columns match: " || Char( num_time / 1e6 ) || " seconds" ) );

//"Update on single character column match: 1.700707 seconds"
//"Update on three integer columns match: 0.70421 seconds"
//"Update on single character column match: 1.508058 seconds"
//"Update on three integer columns match: 0.718748 seconds"
//"Update on single character column match: 1.316761 seconds"
//"Update on three integer columns match: 0.654922 seconds"
//"Update on single character column match: 1.344123 seconds"
//"Update on three integer columns match: 0.659323 seconds"
//"Update on single character column match: 1.284766 seconds"
//"Update on three integer columns match: 0.654989 seconds"
//"Update on single character column match: 1.272866 seconds"
//"Update on three integer columns match: 0.666974 seconds"
//"Update on single character column match: 1.289576 seconds"
//"Update on three integer columns match: 0.656926 seconds"
//"Update on single character column match: 1.276214 seconds"
//"Update on three integer columns match: 0.707188 seconds"
//"Update on single character column match: 1.689671 seconds"
//"Update on three integer columns match: 0.668045 seconds"
//"Update on single character column match: 1.375964 seconds"
//"Update on three integer columns match: 0.675418 seconds"

hogi
Level XIII

Re: Update Table Match Speed - which is faster: Matching multiple columns X, Y, Z (all integers) or Matching Col "X.Y.Z" unique combined string?

Please have a look atSpeed up Tables/Update for further details about the speed issue of Tables/Update.

If the matches are unique (every entry shows up once), the function is fast.

 

It gets extremely slow when values show up multiple times:

The root cause:
- the match is detected for each row in the target table

- TIMES each match in the source table (the last one "sticks")

- TIMES number of columns to add and update (the information about the matches is generated again and again)

 

Therefore, even for small tables wich just a few 100k rows, it can take hours till update finishes - enough time to write multiple posts on the topic ....

 

Who dares to try

start_time = HP Time();
dt << Update( With( dt2 ), Match Columns( :X = :X ) );
char_time = HP Time() - start_time;
Print( "Update on single numeric column match: " || Char( char_time / 1e6 ) || " seconds" );

... for a table with 5 columns?

hogi
Level XIII

Re: Update Table Match Speed - which is faster: Matching multiple columns X, Y, Z (all integers) or Matching Col "X.Y.Z" unique combined string?

So, if JMP gets stuck during Tables/Update, it's much faster to start a new instance of JMP, load the last saved version of your data file, execute again some unsaved steps and then 
a) use Tabes/Join
or

b) generate a Summary table of the source table (-> unique matches) and use Tables / Update

Recommended Articles