Discussions

shuey · Mar 3, 2026 08:27 AM

I often need to Stack very large tables with millions of rows, perform some operation, and then Split the table.
What options are there to speed all this up, besides working with "invisible" data tables in the JSL code?

I'm familiar with Begin Data Update / End Data Update, but I'm not sure whether it is relevant here or offers any improvement over invisible tables.

jthi · Mar 3, 2026 08:42 AM

Invisible (or Private) tables are a good starting point; beyond that, it depends a lot on your tables. Sometimes you might even be able to avoid Split/Stack altogether.
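
As a rough sketch of what that can look like: the small table below and its column names (ID, A, B) are hypothetical stand-ins for a large table, and the doubling step is only a placeholder for the real operation. The intermediate stacked table is created as Private so that it is never displayed, and the operation is wrapped in Begin Data Update / End Data Update. Whether that wrapper gains anything on top of a private table is not settled in this thread, so it is worth timing on real data.

```jsl
Names Default To Here(1);

// Hypothetical stand-in for a large table (names are illustrative only)
dt = New Table("wide", Invisible,
	New Column("ID", Numeric, Set Values([1, 2, 3])),
	New Column("A", Numeric, Set Values([10, 20, 30])),
	New Column("B", Numeric, Set Values([11, 21, 31]))
);

// Private: the stacked table is never shown in a window
dt_stacked = dt << Stack(
	Columns(:A, :B),
	Source Label Column("Label"),
	Stacked Data Column("Data"),
	Private
);

// Placeholder operation, wrapped in Begin/End Data Update
dt_stacked << Begin Data Update;
For Each Row(dt_stacked, :Data = :Data * 2);
dt_stacked << End Data Update;

// Split back to the wide layout, one row per ID
dt_split = dt_stacked << Split(
	Split By(:Label),
	Split(:Data),
	Group(:ID),
	Invisible
);

// Release the intermediate table as soon as it is no longer needed
Close(dt_stacked, No Save);
```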

-Jarmo

shuey · Mar 3, 2026 09:18 AM

My goal is to coalesce columns that represent the same variable. I attached an example table in which each of the columns Age, Weight, and Height is spread across two columns; the goal is to merge each pair of columns.
Originally I would stack the columns, cut out the relevant part (Age/Weight/Height), and then split back, but this becomes a problem with very large data tables.

christian-z · Mar 3, 2026 6:55 AM

~~Have you tried Query Builder?~~

Sorry, I misunderstood the question. I don't think Query Builder helps here.

jthi · Mar 3, 2026 09:44 AM

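// Keep variables local to this script
Names Default To Here(1);

dt = Open("$DOWNLOADS/example_input.jmp");
// Number of leading columns that will hold the merged values
valid_cols = 4;

// Read all rows and all columns of the table
m = dt[0,0];
// From each set of values, keep only the nonmissing ones
r = Transform Each({vals}, m,
	nonmissing = Where(!IsMissing(vals));
	vals[nonmissing];
);

// Write the result back into the first valid_cols columns
dt[0, 1::valid_cols] = r;
// Delete columns valid_cols through the last column
dt << Delete Columns(valid_cols::N Cols(dt));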
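
For a single pair of columns, the same idea can also be sketched with Loc on the column values. This is a different technique from the script above, not a restatement of it, and it assumes numeric columns. The table and the column names "Age" and "Age 2" are hypothetical, since the attached example table is not shown in the thread.

```jsl
Names Default To Here(1);

// Hypothetical example: "Age 2" holds the values that are missing in "Age"
dt = New Table("coalesce example", Private,
	New Column("Age", Numeric, Set Values([12, ., 14, .])),
	New Column("Age 2", Numeric, Set Values([., 13, ., 15]))
);

// Pull both columns out as matrices
a = dt:Age << Get Values;
b = Column(dt, "Age 2") << Get Values;

// Row indices where the first column is missing
miss = Loc(Is Missing(a));

// Fill those positions from the second column
If(N Rows(miss) > 0,
	a[miss] = b[miss]
);

// Write the merged values back; Age is now 12, 13, 14, 15
dt:Age << Set Values(a);
dt << Delete Columns("Age 2");
```

Working on matrices avoids creating any intermediate table, which is the main appeal for tables with millions of rows.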

-Jarmo

jthi · Mar 3, 2026 09:54 AM

And if this is really what the data looks like:

1. Pick your "duplicate" columns.
2. Make a subset with them plus an ID column (you might have to create one if you don't already have one).
3. Drop the rows that have missing values.
4. Rename your columns to be the same in both tables (and drop any extra columns).
5. Use Update on the original table, matching on the ID column.
6. Close the subset.
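
The steps above can be sketched as follows for one pair of columns. The table and the column names "ID", "Age", and "Age 2" are hypothetical, since the real table is not shown in the thread. The final Delete Columns call is optional cleanup and is not one of the listed steps.

```jsl
Names Default To Here(1);

// Hypothetical stand-in for the original table
dt = New Table("main", Invisible,
	New Column("ID", Numeric, Set Values([1, 2, 3, 4])),
	New Column("Age", Numeric, Set Values([12, ., 14, .])),
	New Column("Age 2", Numeric, Set Values([., 13, ., 15]))
);

// Subset of the duplicate column plus the ID column
dt_sub = dt << Subset(All Rows, Columns(:ID, :Name("Age 2")), Private);

// Drop rows where the duplicate column is missing
rows = dt_sub << Get Rows Where(Is Missing(:Name("Age 2")));
If(N Rows(rows) > 0,
	dt_sub << Delete Rows(rows)
);

// Rename so the column name matches the one in the original table
Column(dt_sub, "Age 2") << Set Name("Age");

// Update the original table, matching rows on ID
dt << Update(With(dt_sub), Match Columns(:ID = :ID));

// Close the subset without saving
Close(dt_sub, No Save);

// Optional cleanup (not in the listed steps): drop the now-redundant column
dt << Delete Columns("Age 2");
```

Dropping the missing rows first matters here: only rows that actually carry a value in the duplicate column are matched, so existing values in the original column are not overwritten with missing ones.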

-Jarmo

How to speed up Data Table Operations (e.g Stack, Split, ...)?
