cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
  • Register to attend Discovery Summit 2025 Online: Early Users Edition, Sept. 24-25.
  • New JMP features coming to desktops everywhere this September. Sign up to learn more at jmp.com/launch.
Choose Language Hide Translation Bar
MeanChris
Level II

Update Table Match Speed - which is faster: Matching multiple columns X, Y, Z (all integers) or Matching Col "X.Y.Z" unique combined string?

I'm Updating a large data table (4.4 million rows) of stacked data with design info from a much smaller table.  I have 1,927 unique rows in the smaller table that correspond to the stacked parameter.  They can be identified either by a column with a unique string (X.Y.Z), or the combination of three columns that contain X, Y and Z individually.  The same columns (X, Y, Z and X.Y.Z) are in the large data table.  X, Y and Z are all integers.

 

I'm not sure how to test this, but I'm curious if matching 3 integer columns is faster than matching 1 string column by any appreciable amount.  May need to repeat this fairly often, and the large data table is still growing.

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Update Table Match Speed - which is faster: Matching multiple columns X, Y, Z (all integers) or Matching Col "X.Y.Z" unique combined string?

I created some random data to test this on 1 million rows.  Updating by matching three integer columns was on average twice as fast as on the single character column.

 

Names Default to Here( 1 );
dt = New Table( "Match Test", New Column( "X", numeric ), New Column( "Y", numeric ), New Column( "Z", numeric ), New Column( "XYZ", character ), New Column( "Update", character ) );
dt << Add Rows( 1e6 );
For Each Row( dt, :X = Random Integer( 999 ); :Y = Random Integer( 999 ); :Z = Random Integer( 999 ); :XYZ = Char( :X ) || " " || Char(:Y ) || " " || Char( :Z );  );
dt2 = New Table( "Match Test", New Column( "X", numeric ), New Column( "Y", numeric ), New Column( "Z", numeric ), New Column( "XYZ", character ), New Column( "Update", character ) );
dt2 << Add Rows( 1e6 );
For Each Row( dt2, :X = Random Integer( 999 ); :Y = Random Integer( 999 ); :Z = Random Integer( 999 ); :XYZ = Char( :X ) || " " || Char(:Y ) || " " || Char( :Z ); :Update = {"taco", "cat", "goat", "cheese", "pizza"}[Random Integer( 5 )] );
For Each( {v, i}, 1::10, 
start_time = HP Time();
dt << Update( With( dt2 ), Match Columns( :XYZ = :XYZ ) );
char_time = HP Time() - start_time;
Print( "Update on single character column match: " || Char( char_time / 1e6 ) || " seconds" );
start_time = HP Time();
dt << Update( With( dt2 ), Match Columns( :X = :X, :Y = :Y, :Z = :Z ) );
num_time = HP Time() - start_time;
Print( "Update on three integer columns match: " || Char( num_time / 1e6 ) || " seconds" ) );

//"Update on single character column match: 1.700707 seconds"
//"Update on three integer columns match: 0.70421 seconds"
//"Update on single character column match: 1.508058 seconds"
//"Update on three integer columns match: 0.718748 seconds"
//"Update on single character column match: 1.316761 seconds"
//"Update on three integer columns match: 0.654922 seconds"
//"Update on single character column match: 1.344123 seconds"
//"Update on three integer columns match: 0.659323 seconds"
//"Update on single character column match: 1.284766 seconds"
//"Update on three integer columns match: 0.654989 seconds"
//"Update on single character column match: 1.272866 seconds"
//"Update on three integer columns match: 0.666974 seconds"
//"Update on single character column match: 1.289576 seconds"
//"Update on three integer columns match: 0.656926 seconds"
//"Update on single character column match: 1.276214 seconds"
//"Update on three integer columns match: 0.707188 seconds"
//"Update on single character column match: 1.689671 seconds"
//"Update on three integer columns match: 0.668045 seconds"
//"Update on single character column match: 1.375964 seconds"
//"Update on three integer columns match: 0.675418 seconds"

View solution in original post

4 REPLIES 4
MeanChris
Level II

Re: Update Table Match Speed - which is faster: Matching multiple columns X, Y, Z (all integers) or Matching Col "X.Y.Z" unique combined string?

Update:  I created duplicate tables and updated them with the different methods.  I was hoping to see an execution time in the JMP Log window or something that would tip me off as to how long each took.

 

Both were so fast I spent nearly 10^6 longer writing this post hypothesizing about it.  But if I had to pick, it seemed like the single column string match was an eye-blink slower.

Re: Update Table Match Speed - which is faster: Matching multiple columns X, Y, Z (all integers) or Matching Col "X.Y.Z" unique combined string?

I created some random data to test this on 1 million rows.  Updating by matching three integer columns was on average twice as fast as on the single character column.

 

Names Default to Here( 1 );
dt = New Table( "Match Test", New Column( "X", numeric ), New Column( "Y", numeric ), New Column( "Z", numeric ), New Column( "XYZ", character ), New Column( "Update", character ) );
dt << Add Rows( 1e6 );
For Each Row( dt, :X = Random Integer( 999 ); :Y = Random Integer( 999 ); :Z = Random Integer( 999 ); :XYZ = Char( :X ) || " " || Char(:Y ) || " " || Char( :Z );  );
dt2 = New Table( "Match Test", New Column( "X", numeric ), New Column( "Y", numeric ), New Column( "Z", numeric ), New Column( "XYZ", character ), New Column( "Update", character ) );
dt2 << Add Rows( 1e6 );
For Each Row( dt2, :X = Random Integer( 999 ); :Y = Random Integer( 999 ); :Z = Random Integer( 999 ); :XYZ = Char( :X ) || " " || Char(:Y ) || " " || Char( :Z ); :Update = {"taco", "cat", "goat", "cheese", "pizza"}[Random Integer( 5 )] );
For Each( {v, i}, 1::10, 
start_time = HP Time();
dt << Update( With( dt2 ), Match Columns( :XYZ = :XYZ ) );
char_time = HP Time() - start_time;
Print( "Update on single character column match: " || Char( char_time / 1e6 ) || " seconds" );
start_time = HP Time();
dt << Update( With( dt2 ), Match Columns( :X = :X, :Y = :Y, :Z = :Z ) );
num_time = HP Time() - start_time;
Print( "Update on three integer columns match: " || Char( num_time / 1e6 ) || " seconds" ) );

//"Update on single character column match: 1.700707 seconds"
//"Update on three integer columns match: 0.70421 seconds"
//"Update on single character column match: 1.508058 seconds"
//"Update on three integer columns match: 0.718748 seconds"
//"Update on single character column match: 1.316761 seconds"
//"Update on three integer columns match: 0.654922 seconds"
//"Update on single character column match: 1.344123 seconds"
//"Update on three integer columns match: 0.659323 seconds"
//"Update on single character column match: 1.284766 seconds"
//"Update on three integer columns match: 0.654989 seconds"
//"Update on single character column match: 1.272866 seconds"
//"Update on three integer columns match: 0.666974 seconds"
//"Update on single character column match: 1.289576 seconds"
//"Update on three integer columns match: 0.656926 seconds"
//"Update on single character column match: 1.276214 seconds"
//"Update on three integer columns match: 0.707188 seconds"
//"Update on single character column match: 1.689671 seconds"
//"Update on three integer columns match: 0.668045 seconds"
//"Update on single character column match: 1.375964 seconds"
//"Update on three integer columns match: 0.675418 seconds"

hogi
Level XII

Re: Update Table Match Speed - which is faster: Matching multiple columns X, Y, Z (all integers) or Matching Col "X.Y.Z" unique combined string?

Please have a look atSpeed up Tables/Update for further details about the speed issue of Tables/Update.

If the matches are unique (every entry shows up once), the function is fast.

 

It gets extremely slow when values show up multiple times:

The root cause:
- the match is detected for each row in the target table

- TIMES each match in the source table (the last one "sticks")

- TIMES number of columns to add and update (the information about the matches is generated again and again)

 

Therefore, even for small tables wich just a few 100k rows, it can take hours till update finishes - enough time to write multiple posts on the topic ....

 

Who dares to try

start_time = HP Time();
dt << Update( With( dt2 ), Match Columns( :X = :X ) );
char_time = HP Time() - start_time;
Print( "Update on single numeric column match: " || Char( char_time / 1e6 ) || " seconds" );

... for a table with 5 columns?

hogi
Level XII

Re: Update Table Match Speed - which is faster: Matching multiple columns X, Y, Z (all integers) or Matching Col "X.Y.Z" unique combined string?

So, if JMP gets stuck during Tables/Update, it's much faster to start a new instance of JMP, load the last saved version of your data file, execute again some unsaved steps and then 
a) use Tabes/Join
or

b) generate a Summary table of the source table (-> unique matches) and use Tables / Update

Recommended Articles