vince_faller
Super User

Re: Biggest problem in JMP in sub setting tables.... sluggishness

From the sounds of it, you are subsetting tables by a column and joining them back together. You may have oversimplified it for example's sake, but if I wanted to plot Y(height, sex == "M") against X(height, sex == "F"), I would just split the table. These two scripts, a subset-and-join versus a split, show that the split takes about a third of the time.

names default to here(1);
dt = new table("Test", 
	add rows(1000000), 
	New column("sex", character, <<Set Each Value(choose(random integer(1, 2), "M", "F"))), 
	New Column("height", <<Set Each Value(random normal(70, 5))), 
	New Column("Joiner", <<Set Each Value(col cumulative sum(1, :sex)))
);


st = HPTime();
// subset and join
rows = dt << get rows where(:sex == "M");
dt_sub1 = dt << subset(rows(rows), "linked", private, output table("M"));

rows = dt << get rows where(:sex == "F");
dt_sub2 = dt << subset(rows(rows), "linked", private, output table("F"));

dt_j = dt_sub1 << Join(
	With( dt_sub2 ),
	By Matching Columns( :Joiner = :Joiner ),
	Drop multiples( 0, 0 ),
	Include Nonmatches( 1, 1 ),
	Preserve main table order( 1 ), 
	invisible
);

close(dt_sub1, no save);
close(dt_sub2, no save);

tot1 = HPTime() - st;

st = HPTime();
dt_split = dt << Split(
	Split By( :sex ),
	Split( :height ),
	Group( :Joiner ),
	Output Table( "Other Table" ),
	Sort by Column Property
);
tot_split = HPTime() - st;
show(tot1, tot_split);


//tot1 = 1193242;      // microseconds, ~1.19 s
//tot_split = 382229;  // microseconds, ~0.38 s

Also, with a linked subset you can't change column names or other properties, so the join looks a little uglier.

Vince Faller - Predictum
danschikore

Re: Biggest problem in JMP in sub setting tables.... sluggishness

It sounds like there are several things going on in your script: selection, subsetting, and creating reports, all inside a loop. I think it would be helpful to see a sample script. How is the selection being done? Are the subsets linked to the original data table? Which platforms are being created? How many iterations of the loop are run? If the subset tables are linked and the reports have auto-recalc enabled, then the process may slow down with each iteration as all previous reports get notified about the selection occurring in the source table. If this is the problem, the data filter approach that others have suggested may fix it, because each report would be isolated from the next iteration of the loop.
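As a rough sketch of that data filter approach (the table, columns, and groups here are placeholders taken from the Big Class sample data, not the poster's real data; For Each requires JMP 16 or later), each report gets its own Local Data Filter instead of a linked subset table:

```jsl
Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );

// One report per group, each with its own Local Data Filter,
// so no subset tables are created and later selections in the
// source table do not have to update a pile of linked tables.
For Each( {level}, {"M", "F"},
	biv = dt << Bivariate( Y( :height ), X( :weight ) );
	biv << Local Data Filter(
		Add Filter( Columns( :sex ), Where( :sex == level ) )
	);
);
```

Because each platform owns its filter, the loop never touches row selection in the source table, which is the notification traffic suspected above.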

ram_asra_gmail_
Community Trekker

Re: Biggest problem in JMP in sub setting tables.... sluggishness

@danschikore 

It is not the number of iterations; it is the selection of rows that is slow when the file size is large.

vince_faller
Super User

Re: Biggest problem in JMP in sub setting tables.... sluggishness

So with this table of a million rows, it takes a sixth of a second to run. It's probably something to do with your script. Like @danschikore said, if you give us an example script that runs slowly, we may be able to help you.

Names Default to Here( 1 );
dt = New Table("Test", 
	Add Rows(1000000), 
	New Column("X", Character, <<Set Each Value(Random Category(.3, "A", .3, "B", "C"))), 
	New Column("Y", continuous, <<Set Each Value(Random Normal(0, 1)))
);

st = HPTime();
rows = dt << Get Rows Where(:X == "C");
dt << Subset(rows(rows));
tot = HPTime() - st;
show(tot);
//RETURNS
//tot = 148508;  // microseconds, ~0.15 s

If you're calling a sixth of a second sluggish, then I don't know what to tell you.  

Vince Faller - Predictum
pmroz
Super User

Re: Biggest problem in JMP in sub setting tables.... sluggishness

Maybe your selection criteria are very complex and that's what is slowing things down. We can't help you any further unless you provide:

1. Your script

2. A sample of your data (anonymized if necessary)

ram_asra_gmail_
Community Trekker

Re: Biggest problem in JMP in sub setting tables.... sluggishness

@pmroz 

@vince_faller  

Just being practical here instead of a pure theorist.

Below is the script with some added columns. Depending on the run, the result can take up to 2 seconds. I also added a wait(0) so that the time to update the display is included.

This time is just for subsetting the two tables. After that, add the extra time the script takes to join the two tables by some criterion and then make a bivariate chart. It really becomes slow.

One lot's worth of data runs at an acceptable speed, but if I just add a few more lots...

Try( close( datatable( "Test" ), nosave ) );
dt = New Table( "Test",
	Add Rows( 1000000 ),
	New Column( "X", Character, <<Set Each Value( Random Category( 0.4, "A", 0.4, "B", 0.4, "C", 0.2, "D", 0.25, "E", 0.25, "F", 0.25, "G" ) ) ),
	New Column( "Y", continuous, Width( 15 ), <<Set Each Value( Random Normal( 1, 134500 ) ) ),
	New Column( "a", continuous, Width( 15 ), <<Set Each Value( Random Normal( 23, 200 ) ) ),
	New Column( "b", continuous, Width( 15 ), <<Set Each Value( Random Normal( 4, 300 ) ) ),
	New Column( "Yc", continuous, Width( 15 ), <<Set Each Value( Random Normal( 10, 4100 ) ) ),
	New Column( "Yd", continuous, Width( 15 ), <<Set Each Value( Random Normal( 10, 500000 ) ) ),
	New Column( "Ye", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 600000 ) ) ),
	New Column( "Yf", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 76789 ) ) ),
	New Column( "Yg", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 8345678 ) ) ),
	New Column( "Yh", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 934567889 ) ) ),
	New Column( "Yi", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 1000 ) ) ),
	New Column( "Yj", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 1100 ) ) ),
	New Column( "Yaa", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 12000 ) ) ),
	New Column( "Ybb", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 13000 ) ) ),
	New Column( "Yvv", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 1400 ) ) ),
	New Column( "Ycc", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 1500 ) ) ),
	New Column( "Ycd", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 16 ) ) ),
	New Column( "Yav", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 76456575 ) ) ),
	New Column( "Ydg", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 10 ) ) ),
	New Column( "Yab", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 11 ) ) ),
	New Column( "Ycd", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 12 ) ) ),
	New Column( "Yef", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 13 ) ) ),
	New Column( "Ygh", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 14 ) ) ),
	New Column( "X1", Character, <<Set Each Value( Random Category( 0.4, "X", 0.4, "Y", 0.3, "Z", 0.2, "P", 0.2, "Q" ) ) ),
	New Column( "Y1", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 12342543 ) ) ),
	New Column( "a", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 2345375 ) ) ),
	New Column( "b", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 3345687 ) ) ),
	New Column( "Yc", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 4199686 ) ) ),
	New Column( "Yd", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 51223454 ) ) ),
	New Column( "Ye", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 64545658 ) ) ),
	New Column( "Yf", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 74565 ) ) ),
	New Column( "Yg", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 8436557 ) ) ),
	New Column( "Yh", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 9354 ) ) ),
	New Column( "Yi", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 10243 ) ) ),
	New Column( "Yj", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 1124321 ) ) ),
	New Column( "Yaa", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 1232543 ) ) ),
	New Column( "Ybb", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 1356786 ) ) ),
	New Column( "Yvv", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 14214213 ) ) ),
	New Column( "Ycc", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 15678 ) ) ),
	New Column( "Ycd", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 1612312 ) ) ),
	New Column( "Yav", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 7576 ) ) ),
	New Column( "Ydg", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 91231 ) ) ),
	New Column( "Yab", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 9978 ) ) ),
	New Column( "Ycd", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 32131 ) ) ),
	New Column( "Yef", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 156756 ) ) ),
	New Column( "Yaa", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 1232543 ) ) ),
	New Column( "Ybb", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 1356786 ) ) ),
	New Column( "Yvv", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 14214213 ) ) ),
	New Column( "Ycc", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 15678 ) ) ),
	New Column( "Ycd", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 1612312 ) ) ),
	New Column( "Yav", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 7576 ) ) ),
	New Column( "Ydg", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 91231 ) ) ),
	New Column( "Yab", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 9978 ) ) ),
	New Column( "Ycd", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 32131 ) ) ),
	New Column( "Yef", continuous, Width( 15 ), <<Set Each Value( Random Normal( 0, 156756 ) ) )
);
clearlog();
Try( close( datatable( "sub1" ), nosave ) );
Try( close( datatable( "sub2" ), nosave ) );
st = HPTime();
rows = dt << Get Rows Where( :X == "C" );
dt << Subset( rows( rows ), output table( "sub1" ) );
rows = dt << Get Rows Where( :X == "B" );
dt << Subset( rows( rows ), output table( "sub2" ) );
wait( 0 );
st = 1e-6 * (HPTime() - st);
print( char( st ) || " Sec" );
vince_faller
Super User

Re: Biggest problem in JMP in sub setting tables.... sluggishness

We can only do theoretical stuff because you haven't told us what you're actually doing.

Running this took about 0.3-0.4 seconds for me for a single table (not two of them), which I find completely reasonable. If you're doing a bunch of other actions along with the subset, then it's not fair to say the subset is slow. If you're running these in a for loop, then it's likely what you're doing inside the loop that is causing the problem. If you're running a script, don't put in the wait(0) unless you need the user to see the table; in fact, you should make the tables private or invisible so that they never render and never use unnecessary time.

What are you actually trying to do? I feel like this is an XY problem.

If you let us know what you're actually doing we might be able to help you speed it up.  

Vince Faller - Predictum
pmroz
Super User

Re: Biggest problem in JMP in sub setting tables.... sluggishness

How complex are your "get rows where" queries? What you posted should be quite fast because it's only looking for one value, but I've seen slower performance with more complicated queries.

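For instance, a compound query along these lines (the column names below are made up for illustration, not from the poster's data) evaluates several clauses for every row and can run noticeably slower than a single equality test:

```jsl
// Hypothetical complex selection: each clause is evaluated
// for every row of the table before the subset is taken.
rows = dt << Get Rows Where(
	:Process == "Old" & :Site > 3 & Contains( :Lot ID, "X7" )
);
```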
Here's a simple example that shows a time difference for the subset command, by making the subsets invisible and then private.

dt = data table("Probe");
t1 = hp time();
new_rows = dt << get rows where(:Process == "New");
t2 = hp time();

old_rows = dt << get rows where(:Process == "Old");
t3 = hp time();

//s1 = dt << subset(rows(new_rows), invisible);
s1 = dt << subset(rows(new_rows), private);
t4 = hp time();

//s2 = dt << subset(rows(old_rows), invisible);
s2 = dt << subset(rows(old_rows), private);
t5 = hp time();

tn = t2 - t1;
to = t3 - t2;
ts1 = t4 - t3;
ts2 = t5 - t4;

print("New rows: " || char(tn));
print("Old rows: " || char(to));
print("New subset: " || char(ts1));
print("Old subset: " || char(ts2));

Invisible subset timings:

"New rows: 1095"
"Old rows: 964"
"New subset: 3527"
"Old subset: 3366"

Private subset timings:

"New rows: 949"
"Old rows: 883"
"New subset: 1861"
"Old subset: 1638"

As you can see, changing from invisible to private cut the subset time roughly in half.

Re: Biggest problem in JMP in sub setting tables.... sluggishness

My computer is really busy, so the numbers vary a bit, but changing to linked subsets and removing the wait helps a bunch. Removing the wait is roughly the same as making the tables invisible; it takes the table's paint-to-screen time out of the measurement. Using a linked subset is a big win since a lot of data no longer needs to be moved about.

st=HPTIME();
rows = dt << Get Rows Where(:X == "C");
dt << Subset(rows(rows),outputtable("sub1"),"linked");
rows = dt << Get Rows Where(:X == "B");
dt << Subset(rows(rows),outputtable("sub2"),"linked");
//wait(0);
st=1e-6*(HPTIME()-st);
print(char(st)|| " Sec");

"2.534772 Sec" -- unlinked with wait(0)

"1.50435 Sec" -- unlinked

"0.467098 Sec" -- linked

Craige
ram_asra_gmail_
Community Trekker

Re: Biggest problem in JMP in sub setting tables.... sluggishness

@Craige_Hales A linked subset seems to reduce the time by a lot, but I have to check two things with the script I posted:

1. Whether repeated runs still give the improved average time.

2. Whether I will still be able to join the two subset tables to make an XY chart afterwards.

Thanks
