cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
] />

Discussions

Solve problems, and share tips and tricks with other JMP users.
Choose Language Hide Translation Bar
shuey
Level II

How to speed up Data Table Operations (e.g Stack, Split, ...)?

I often need to Stack very large tables with millions of rows, do some operation and then Split the table.
What options are there to speed all this up besides working with "invisible" data tables in the jsl code?

I'm familiar with the command Begin/End Data Update, however I'm not sure it's relevant here or has any improvement on top of invisible tables.

 

2 ACCEPTED SOLUTIONS

Accepted Solutions
jthi
Super User

Re: How to speed up Data Table Operations (e.g Stack, Split, ...)?

Names Default To Here(1);

dt = Open("$DOWNLOADS/example_input.jmp");
valid_cols = 4;

m = dt[0,0];
r = Transform Each({vals}, m,
	nonmissing = Where(!IsMissing(vals));
	vals[nonmissing];
);

dt[0, 1::valid_cols] = r;
dt << Delete Columns(valid_cols::N Cols(dt));
-Jarmo

View solution in original post

jthi
Super User

Re: How to speed up Data Table Operations (e.g Stack, Split, ...)?

And if this is how the data really looks like

  1. Pick your "duplicate" columns
  2. Make a subset with them + Id column (might have to create one if you don't already have one)
  3. Drop rows which have missing values
  4. Rename your columns to be same in both tables (+drop extra columns)
  5. Use Update on the original table with the ID column
  6. Close the subset
-Jarmo

View solution in original post

5 REPLIES 5
jthi
Super User

Re: How to speed up Data Table Operations (e.g Stack, Split, ...)?

Invisible (or Private) are good starting points and then it depends a lot on your tables, sometimes you might be even able to avoid split/stack. 

-Jarmo
shuey
Level II

Re: How to speed up Data Table Operations (e.g Stack, Split, ...)?

My goal is to Coalesce Columns which represent the same column, meaning I want to merge columns which represent the same column. I attached an example table where each of the columns Age, Weight, Height is spread across 2 columns. the goal is to merge each couple of the columns.
originally I would stack the columns, cut out the relevant part (Age/Weight/Height) and then split back, however this becomes a problem with very large data tables.

Re: How to speed up Data Table Operations (e.g Stack, Split, ...)?

Have you tried Query Builder? 

Sorry, I misunderstood the question. I don't think Query Builder helps here. 

jthi
Super User

Re: How to speed up Data Table Operations (e.g Stack, Split, ...)?

Names Default To Here(1);

dt = Open("$DOWNLOADS/example_input.jmp");
valid_cols = 4;

m = dt[0,0];
r = Transform Each({vals}, m,
	nonmissing = Where(!IsMissing(vals));
	vals[nonmissing];
);

dt[0, 1::valid_cols] = r;
dt << Delete Columns(valid_cols::N Cols(dt));
-Jarmo
jthi
Super User

Re: How to speed up Data Table Operations (e.g Stack, Split, ...)?

And if this is how the data really looks like

  1. Pick your "duplicate" columns
  2. Make a subset with them + Id column (might have to create one if you don't already have one)
  3. Drop rows which have missing values
  4. Rename your columns to be same in both tables (+drop extra columns)
  5. Use Update on the original table with the ID column
  6. Close the subset
-Jarmo

Recommended Articles