Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- JMP User Community
- :
- Discussions
- :
- Re: Biggest problem in JMP in sub setting tables.... sluggishness

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Biggest problem in JMP in sub setting tables.... sluggishness

May 30, 2019 9:01 AM
(4313 views)

I have used multiple ways to subset a long table (rows >100K, very common size of datatable). The problem arises because subsetting requires selecting rows which means a operation on the table before subsetting. And if the table is long it makes it very slow. specially when a foor loop is required to generate plots by subsetting everytime a new parameter.

it does not matter whatever you use, it is just slow: I am not sure if JMP team have any solution to speed up the subsetting??????

select rows where

get rows where

25 REPLIES 25

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Biggest problem in JMP in sub setting tables.... sluggishness

Why are you subsetting the data table?

Try selecting the data you don't want, set the Hide and Exclude row state, make your plot, and repeat for the next group.

Learn it once, use it forever!

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Biggest problem in JMP in sub setting tables.... sluggishness

Also, could you accomplish the same task using filters?

Dan Obermiller

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Biggest problem in JMP in sub setting tables.... sluggishness

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Biggest problem in JMP in sub setting tables.... sluggishness

I think i mentioned it clearly that problem arises when you select the data, which slows it down.

but i may be not able to understand what you are suggesting. could you provide JSL example. surely i will try if it speeds up things.

Thanks

Ram

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Biggest problem in JMP in sub setting tables.... sluggishness

I've never had a problem if I use get rows where or even just the rows. You don't actually have to select them.

```
names default to here(1);
dt = open("$SAMPLE_DATA\Big Class.jmp");
rows = dt << Get Rows Where(:sex == "M");
dt_sub = dt << Subset(rows(rows));
```

What are you using to select the rows? It's possible that's slow.

Vince Faller - Predictum

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Biggest problem in JMP in sub setting tables.... sluggishness

@vince_faller currently i use the same method. but when file size becomes bigger, even get rows matrix becomes bigger and slow.

if you use small data table like

`Big Class.jmp`

, anything can work. problem is with Big data files.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Biggest problem in JMP in sub setting tables.... sluggishness

So with this table of a million rows. It takes a sixth of second to run. It's probably something to do with your script. Like Dan said, if you gave an example script of it running slow we may be able to help you.

```
Names Default to Here( 1 );
dt = New Table("Test",
Add Rows(1000000),
New Column("X", Character, <<Set Each Value(Random Category(.3, "A", .3, "B", "C"))),
New Column("Y", continuous, <<Set Each Value(Random Normal(0, 1)))
);
st = HPTime();
rows = dt << Get Rows Where(:X == "C");
dt << Subset(rows(rows));
tot = HPTime() - st;
show(tot);
//RETURNS
//tot = 148508;
```

If you're calling a sixth of a second sluggish, then I don't know what to tell you.

Also, if you're using a for loop to generate a bunch of plots for each parameter, is there a reason you're not using a by?

Vince Faller - Predictum

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Biggest problem in JMP in sub setting tables.... sluggishness

Here is a simple example of how to do the analyses by groups without subsetting

```
Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/big class.jmp" );
// Reset the data table row states
dt << clear row states;
// Find the different values for Sex
Summarize( byGroup = By( :sex ) );
// Loop across the different groups and analyze them
For( i = 1, i <= N Items( byGroup ), i++,
// Reset the data table row states
dt << clear row states;
// Hide and Exclude the unwanted rows
dt << select where( :Sex != byGroup[i] );
dt << hide and exclude;
// Run the analysis
ow = dt << oneway( x( :age ), y( :height ) );
// Change the title of the report to show what data are in the analysis
Report( ow )[Outline Box( 1 )] << set title(
Report( ow )[Outline Box( 1 )] << get title || ": Sex=" || byGroup[i]
);
);
// Reset the data table row states
dt << clear row states;
```

Jim

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Biggest problem in JMP in sub setting tables.... sluggishness

There are two problems with your aproach.

1. you are using "select where(" which is the real root cause for bigger table. once you use select where it is operation of updating the data table of selecting rows (in my case >300K rows) you are just using Big class which is not even counted for a file size.

2. this does not work for the situation below

if i had to correlate height data on X_axis for sex="M" and height data on Y_axis for sex="F".

i hope it can help you to undestand the problem now

Thanks