Re: Biggest problem in JMP in sub setting tables.... sluggishness

From the sounds of it, you are subsetting a table by a column and joining the pieces back together. You may have just oversimplified for the sake of example, but if I wanted to plot Y(height, sex == M) against X(height, sex == F), I would just split the table. The two approaches below, a subset and join versus a split, show that the split takes about a third of the time.

```
names default to here(1);

// Build a 1,000,000-row test table. "Joiner" is a running count within
// each sex, so it can serve as the join key (and the split grouping).
dt = new table("Test",
	add rows(1000000),
	New column("sex", character, <<Set Each Value(choose(random integer(1, 2), "M", "F"))),
	New Column("height", <<Set Each Value(random normal(70, 5))),
	New Column("Joiner", <<Set Each Value(col cumulative sum(1, :sex)))
);

// Approach 1: subset and join
st = HPTime();
rows = dt << get rows where(:sex == "M");
dt_sub1 = dt << subset(rows(rows), "linked", private, output table("M"));
rows = dt << get rows where(:sex == "F");
dt_sub2 = dt << subset(rows(rows), "linked", output table("F"));
dt_j = dt_sub1 << Join(
	With( dt_sub2 ),
	By Matching Columns( :Joiner = :Joiner ),
	Drop multiples( 0, 0 ),
	Include Nonmatches( 1, 1 ),
	Preserve main table order( 1 ),
	invisible
);
close(dt_sub1, no save);
close(dt_sub2, no save);
tot1 = HPTime() - st;

// Approach 2: split
st = HPTime();
dt_split = dt << Split(
	Split By( :sex ),
	Split( :height ),
	Group( :Joiner ),
	Output Table( "Other Table" ),
	Sort by Column Property
);
tot_split = HPTime() - st;

show(tot1, tot_split);
// Results (HPTime() is in microseconds):
//tot1 = 1193242;
//tot_split = 382229;
```

Also, with a linked subset you can't change column names or other properties, so you can't rename the columns before the join, which makes the joined table look a little uglier.

Vince Faller - Predictum

Re: Biggest problem in JMP in sub setting tables.... sluggishness

With this table of a million rows, the subset takes about a sixth of a second to run, so the slowness probably has something to do with your script. As @danschikore said, if you post an example script that runs slowly, we may be able to help you.

```
Names Default to Here( 1 );
dt = New Table("Test",
Add Rows(1000000),
New Column("X", Character, <<Set Each Value(Random Category(.3, "A", .3, "B", "C"))),
New Column("Y", continuous, <<Set Each Value(Random Normal(0, 1)))
);
st = HPTime();
rows = dt << Get Rows Where(:X == "C");
dt << Subset(rows(rows));
tot = HPTime() - st;
show(tot);
// Returns (HPTime() is in microseconds):
//tot = 148508;
```

If you're calling a sixth of a second sluggish, then I don't know what to tell you.

Vince Faller - Predictum

Re: Biggest problem in JMP in sub setting tables.... sluggishness

Maybe your select criteria are very complex and that's what is slowing things down. We can't help you any further unless you provide:

1. Your script

2. A sample of your data (anonymized if necessary)

Re: Biggest problem in JMP in sub setting tables.... sluggishness

Just being practical here rather than a pure theorist.

Below is the script with some added columns. Depending on the run, it can take up to 2 seconds. I also added a wait(0) so that the timing includes the display update.

This time covers only creating the two subsets. On top of that comes the time the script takes to join the two tables by some criteria and then make a bivariate chart. It really becomes slow.

One lot's worth of data is fast enough, but if I add just a few more lots...

```
Try(close(datatable("Test"),nosave));
dt = New Table("Test",
	Add Rows(1000000),
	New Column("X", Character, <<Set Each Value(Random Category(0.4, "A", 0.4, "B", 0.4, "C", 0.2, "D", 0.25, "E", 0.25, "F", 0.25, "G"))),
	New Column("Y", continuous, Width( 15 ), <<Set Each Value(Random Normal(1, 134500))),
	New Column("a", continuous, Width( 15 ), <<Set Each Value(Random Normal(23, 200))),
	New Column("b", continuous, Width( 15 ), <<Set Each Value(Random Normal(04, 300))),
	New Column("Yc", continuous, Width( 15 ), <<Set Each Value(Random Normal(10,4100))),
	New Column("Yd", continuous, Width( 15 ), <<Set Each Value(Random Normal(10, 500000))),
	New Column("Ye", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 600000))),
	New Column("Yf", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 76789))),
	New Column("Yg", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 8345678))),
	New Column("Yh", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 934567889))),
	New Column("Yi", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 1000))),
	New Column("Yj", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 1100))),
	New Column("Yaa", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 12000))),
	New Column("Ybb", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 13000))),
	New Column("Yvv", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 1400))),
	New Column("Ycc", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 1500))),
	New Column("Ycd", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 16))),
	New Column("Yav", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 76456575))),
	New Column("Ydg", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 10))),
	New Column("Yab", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 11))),
	New Column("Ycd", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 12))),
	New Column("Yef", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 13))),
	New Column("Ygh", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 14))),
	new Column("X1", Character, <<Set Each Value(Random Category(.4, "X", .4, "Y", 0.3, "Z", 0.2, "P", 0.2, "Q"))),
	New Column("Y1", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 12342543))),
	New Column("a", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 2345375))),
	New Column("b", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 3345687))),
	New Column("Yc", continuous, Width( 15 ), <<Set Each Value(Random Normal(0,4199686))),
	New Column("Yd", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 51223454))),
	New Column("Ye", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 64545658))),
	New Column("Yf", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 74565))),
	New Column("Yg", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 8436557))),
	New Column("Yh", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 9354))),
	New Column("Yi", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 10243))),
	New Column("Yj", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 1124321))),
	New Column("Yaa", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 1232543))),
	New Column("Ybb", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 1356786))),
	New Column("Yvv", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 14214213))),
	New Column("Ycc", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 15678))),
	New Column("Ycd", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 1612312))),
	New Column("Yav", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 7576))),
	New Column("Ydg", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 91231))),
	New Column("Yab", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 9978))),
	New Column("Ycd", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 32131))),
	New Column("Yef", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 156756))),
	New Column("Yaa", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 1232543))),
	New Column("Ybb", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 1356786))),
	New Column("Yvv", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 14214213))),
	New Column("Ycc", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 15678))),
	New Column("Ycd", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 1612312))),
	New Column("Yav", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 7576))),
	New Column("Ydg", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 91231))),
	New Column("Yab", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 9978))),
	New Column("Ycd", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 32131))),
	New Column("Yef", continuous, Width( 15 ), <<Set Each Value(Random Normal(0, 156756)))
);
clearlog();
Try(close(datatable("sub1"),nosave));
Try(close(datatable("sub2"),nosave));
st=HPTIME();
rows = dt << Get Rows Where(:X == "C");
dt << Subset(rows(rows),outputtable("sub1"));
rows = dt << Get Rows Where(:X == "B");
dt << Subset(rows(rows),outputtable("sub2"));
wait(0);
st=1e-6*(HPTIME()-st);
print(char(st)|| " Sec");
```

Re: Biggest problem in JMP in sub setting tables.... sluggishness

We can only do theoretical stuff because you haven't told us what you're actually doing.

When I ran this, it took about 0.3-0.4 seconds for a single table (not two of them), which I find completely reasonable. If you're doing a bunch of other actions along with the subset, then it's not fair to say subset is slow. If you're running these in a For loop, then it's likely whatever else you're doing in the loop that is causing the problem. If you're running a script, don't put in the wait(0) unless you need the user to see the table. In fact, you should make the subsets private or invisible so that they never render and waste time.

What are you actually trying to do? I feel like this is an XY problem.

If you let us know what you're actually doing we might be able to help you speed it up.

Vince Faller - Predictum

Re: Biggest problem in JMP in sub setting tables.... sluggishness

How complex are your "get rows where" queries? What you posted should be quite fast because it's only looking for one value, but I've seen slower performance with more complicated queries.

Here's a simple example that shows a time difference for the subset command, by making the subsets invisible and then private.

```
dt = data table("Probe");
t1 = hp time();
new_rows = dt << get rows where(:Process == "New");
t2 = hp time();
old_rows = dt << get rows where(:Process == "Old");
t3 = hp time();
//s1 = dt << subset(rows(new_rows), invisible);
s1 = dt << subset(rows(new_rows), private);
t4 = hp time();
//s2 = dt << subset(rows(old_rows), invisible);
s2 = dt << subset(rows(old_rows), private);
t5 = hp time();
tn = t2 - t1;
to = t3 - t2;
ts1 = t4 - t3;
ts2 = t5 - t4;
print("New rows: " || char(tn));
print("Old rows: " || char(to));
print("New subset: " || char(ts1));
print("Old subset: " || char(ts2));
```

Invisible subset timings:

"New rows: 1095" "Old rows: 964" "New subset: 3527" "Old subset: 3366"

Private subset timings:

"New rows: 949" "Old rows: 883" "New subset: 1861" "Old subset: 1638"

As you can see, changing from invisible to private cut the subset time roughly in half.

Re: Biggest problem in JMP in sub setting tables.... sluggishness

My computer is really busy, so the numbers vary a bit, but changing to linked subsets and removing the wait helps a bunch. Removing the wait is roughly the same as making the subset invisible: it takes the table's paint-to-screen out of the measurement. Using a linked subset is a big win since a lot of data no longer needs to be moved about.

```
st=HPTIME();
rows = dt << Get Rows Where(:X == "C");
dt << Subset(rows(rows),outputtable("sub1"),"linked");
rows = dt << Get Rows Where(:X == "B");
dt << Subset(rows(rows),outputtable("sub2"),"linked");
//wait(0);
st=1e-6*(HPTIME()-st);
print(char(st)|| " Sec");
```

"2.534772 Sec" -- unlinked with wait(0)

"1.50435 Sec" -- unlinked

"0.467098 Sec" -- linked
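
Since single runs vary, one way to get steadier numbers is to repeat the measurement and look at the mean, minimum, and maximum. The following is only a minimal sketch under stated assumptions: the table name "Timing Test", the 100,000-row size, and the repetition count `n_reps` are all hypothetical, so swap in your own table and subset options (for example "linked") to compare variants.

```
Names Default To Here( 1 );

// Hypothetical test table (assumption); replace with your own data table.
dt = New Table( "Timing Test",
	Add Rows( 100000 ),
	New Column( "X", Character, <<Set Each Value( Random Category( 0.3, "A", 0.3, "B", "C" ) ) ),
	New Column( "Y", Continuous, <<Set Each Value( Random Normal( 0, 1 ) ) ),
	Private
);

n_reps = 10;                // assumed number of repetitions
times = J( n_reps, 1, . );  // one elapsed time (in seconds) per repetition

For( i = 1, i <= n_reps, i++,
	st = HP Time();
	r = dt << Get Rows Where( :X == "C" );
	sub = dt << Subset( Rows( r ), Private );
	times[i] = 1e-6 * (HP Time() - st);  // HP Time() returns microseconds
	Close( sub, No Save );               // cleanup happens outside the timed region
);

Show( Mean( times ), Min( times ), Max( times ) );
Close( dt, No Save );
```

The first repetition may run slower than the rest, so the minimum is often a more stable figure to compare than the mean.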

Craige

Re: Biggest problem in JMP in sub setting tables.... sluggishness

@Craige_Hales A linked subset seems to reduce the time by a lot, but I have to check two things with the script I posted:

1. Whether I get the improved average time when repeating the run.

2. Whether I will still be able to join the two subset tables afterwards to make the XY chart.

Thanks