cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Choose Language Hide Translation Bar

Speed up Tables/Update

What inspired this wish list request? 

☐ cool new feature
☑ could help many users!

☑ removes a „bug“

☐ nice to have

☐ nobody needs it

 

#myTop10_2023a
(either this one or Tables join: option to update main table )

 

Tables Update is a really nice Platfrom which can be used to merge columns into a data table or to update columns with the data from another data table.

But sometimes the user has to wait for a long time till Jmp finishes a data table update - even with medium large data tables.

 

Sometimes, Jmp already gets stuck while editing settings in the Update GUI.
Then it seems that Jmp crashed - , but after some minutes (or sometimes:  hours), Jmp finishes the calculation for the data preview and gives the user again the chance to edit the settings.

It's clear that it takes a while to match hundreds of thousands of values in one table with values in another table.

But I am surprised that merging two columns takes twice as long as merging one column

Is the matching step triggered again for each column that is merged - or is it actually the sorting of the values which takes so long?

 

Often, I don't have unique identifiers in the source table: a single match value shows up several times in a column - with identical values also in the column to be merged. In this case, I wonder if the value of the first match is entered to the target data table, then replaced by the second  - and so on ...
This could be accelerated by searching the last matching value in the source table - and just inserting this value.

 

A big drawback of the Update Platform:

It starts with add columns and replace columns activated.

If a table has e.g. 1000 columns, it takes a while till all columns are added to the preview window.
The issue scales with rows_main     x     rows_2nd    x    columns_2nd.

And as stupid as I am, I often start by defining the match columns ... which allows me to go for a long walk till I am allowed to pick the (1-2) column(s) I actually want to add to the table.

There is a functionality which disables the preview if the table is too large.
But I fear that the function just checks the length of the table -  and not the number of columns.
Or the threshold is too large?

 

Please share your experience - how often you had to fill the time till your Jmp session was back again after an update of the preview tab.

 

Another issue is the Recall Button
(addressed here: Disable Preview while filling the GUI via Recall  )
After clicking on Recall, the Update menu is filled with the saved settings - and after each change of a GUI setting there seems to be an update of the preview tab.
So if you want to join large data tables - even if you just want to add a singe column - don't dare to press Recall ... before you disable the preview.

 

What is the improvement you would like to see? 

Could you please improve the performance of Tables/Update.

 

Why is this idea important? 

Tables/Update is a wonderful function. The Preview Tab is amazingly useful, I don't want to work without it.

But I'm starting to get paranoia - how long will it take the next time when I try to add columns to a data tables.

7 Comments
hogi
Level XI

just try this - 40 students, N x:

 

Names Default to Here(1);
// adjust the size and check// +1 means: data table size x 2// ... and update time x 4   
mySize=12;

dt0 = Open( "$SAMPLE_DATA/Big Class.jmp" );

increaseSize= Function({dt,cycles},
for(i=1,i<=cycles,i++,
dt << Concatenate(dt,	Append to first table))
);

increaseSize(dt0,mySize);

show(nrows (dt0));

dt2 = subset(dt0);

//if you are brave, enable this:
/*for (i=1,i<5,i++,
dt2 << new column("new", set each value (random uniform()))
);*/

//generate a list of columns which should be added
Cols = dt2 << get column names();

remove from (cols, {1,2,3} );

dt0 << Delete Columns( :height, :weight );

t1=hptime();
Eval(Eval Expr(dt0 << Update(
	With( dt2 ),
	Match Columns( :name = :name, :sex = :sex ),
	Add Columns from Update Table(
		Expr(Cols)
	),
	Replace Columns in Main Table( None )
)));
show((hptime()-t1)/1000000); // on my notebook: ~ 15s per column

close(dt0, noSave);
close(dt2, noSave);  
hogi
Level XI

At first place, a progress bar could help the user

... slightly: to understand that Jmp is not crashed, but doing something in the background.

hogi
Level XI

It seems that a tremendous improvement could be achieved if UPDATE didn't use it's own function
... but just triggered JOIN in the background.

hogi
Level XI

Interesting: disabling the

remove from (cols, {1,2,3} );

costs additional 10s with Tables/Update.

 

Seems that the row matching  is not only done for each column which is added.

It is also done to some extent for columns which are already there.

hogi
Level XI

@hogi wrote:

But I fear that the function just checks the length of the table -  and not the number of columns.


Actually, it seems that Tables/Update checks the number of columns

- if there are > 100 columns, auto-update is disabled.

 

But it seems that Jmp just checks the  # of rows/ columns of the main table
not of the second table.


So, if the # of cols of the main table is too large, the preview is disabled - although # cols (main table) doesn't have any impact on the speed

 

Names Default to Here(1);
// adjust the size and check// +1 means: data table size x 2//
mySize1=7;

dt0 = Open( "$SAMPLE_DATA/Big Class.jmp" );

new column ("one", set each value(1));

increaseSize= Function({dt,cycles},
for(i=1,i<=cycles,i++,
dt << Concatenate(dt,	Append to first table))
);

increaseSize(dt0,mySize1);

numberRows0 = Nrows(dt0);

dt2 = dt0 << Subset( All rows, columns( :name, :age ) );


for (i=1,i<1000,i++,
dt2 << new column("new", set each value (random uniform()))
);

numberRows2 = Nrows(dt2);
dt2Name = dt2 << get name();

New Column( "Rank",	Formula( Col Rank( Row(), :name ) ));


Try(updateWindow << close window());
updateWindow = New window("are you brave?",
dt 0 << update()	
);

updateWindow[Text Box(1)] << background color (Orange) << text color (white )<< set font ("Arial Black", 10)
<<  set text("The tables have <10k rows. \!nBut the second table has > 1000 columns. \!nSo I deactivated the preview for you.\!nIf you have some time, activate it again ...");
updateWindow["Preview", CheckboxBox( 1 )] << set( 1, 0 );
updateWindow[ListBoxBox( 1 )] << set selected (loc(updateWindow[ListBoxBox( 1 )] <<get items,dt2Name)) ;

updateWindow[PanelBox(2),ListBoxBox( 1 )] << set selected (loc(updateWindow[ListBoxBox( 2 )] <<get items,"name")) ;
updateWindow[PanelBox(2),ListBoxBox( 2 )] << set selected (loc(updateWindow[ListBoxBox( 3 )] <<get items,"name")) ;

// what can easily fix the issue: an additional match condition which reduces the number of possible matches:
//updateWindow[PanelBox(2),ListBoxBox( 1 )] << set selected (loc(updateWindow[ListBoxBox( 2 )] <<get items,"one")) ;
//updateWindow[PanelBox(2),ListBoxBox( 2 )] << set selected (loc(updateWindow[ListBoxBox( 3 )] <<get items,"Rank")) ;

updateWindow[ButtonBox(1)]<< click();


Edit:
Sometimes it's worth the effort to add an additional match condition (e.g. rank in the source table )

This reduces the number of matches and tremendously speeds up the preview & actual update.

hogi
Level XI

One could replace the Tables/Update platform with a user-defined one.

  • it sets Add/Replace Columns to Selected/None (instead of All/All) -->  less columns are evaluated for the preview
  • a user defined Button OK (high-speed) can trigger Tables/Join after converting the settings
    via UpdateHS()

hogi_0-1687969618049.png

 

Names Default to Here(1);
dt = Current Data Table();

If(isempty(dt) | N rows(dt )==0, stop());

Try( myDTUpdate << close window() );

myDTUpdate = New Window( "Update - high speed", dt << update() );

myDTUpdate[Lineup Box( 1 )] << Prepend( mb = MouseBox( Button Box( "OK (HS)", updateHS() ) ) );
mb << settooltip( "high speed via Tables/join" );
//DtUpdateHS:myDTUpdate["Preview", CheckboxBox( 1 )] << set( 1, 0 );
myDTUpdate["Add Columns from Update Table", Radio Box( 1 )] << set( 3 );
myDTUpdate["Replace Columns in Main Table", Radio Box( 1 )] << set( 2 );


 

hogi
Level XI

Concerning the speed differences between Tables/Join and Tables/Update, I got the feedback from Jmp Support (TS-00044631) that

by design, Update and Join do different things when there’s multiple matches. [...] Join uses the first match, so as soon as it finds a match it’s done.

[...] Update [looks] through all of those matches, applying each one, and only the last one sticks. *)


This explains the huge timing difference for the special case above because of the many copies of the data.

 

So, be cautious when using Update if you have tables with non-unique values and many columns.
On the other hand, the issue will be easily fixed if Jmp checks the presence of non-unique values and many columns before it decides to calculate the preview

 

*) The difference between Tables/Join and Tables/Update is nicely illustrated by this example from Jmp Support:

Notice that the Joined table has all 1’s in col2, while the Updated table has all 3’s in col2.

dt0 = New Table( "MyTable",
        Add Rows( 3 ),
        New Column( "col1",
               Character, "Nominal",
               Set Values({"a", "a", "a"})
        ),
        New Column( "col2",
               Numeric,
               "Continuous",
               Set Values([1, 2, 3])

        )

);
dt2 = subset( dt0 );


 

Cols = dt2 << get column names();
Remove From( cols, {1} );
dt0 << Delete Columns( :col2 );

Eval(Eval Expr(dt0 << Join(
        With( dt2 ),
        Select( :col1 ),
        SelectWith(Expr( Cols) ),
        By Matching Columns( :col1 = :col1 ),
        Drop multiples( 0, 1 ),
        Preserve main table order( 1 ),
)));

Eval(
        Eval Expr(
               dt0 << Update(
                       With( dt2 ),
                       Match Columns( :col1 = :col1 ),
                       Add Columns from Update Table( Expr( Cols ) ),
                       Replace Columns in Main Table( None )
               )
        )
);