ram_asra_gmail_
Community Trekker

Re: Biggest problem in JMP in subsetting tables... sluggishness

I think this trick can help reduce the time somewhat. I already use invisible subsets, but that only helps so much.
Thanks for running these experiments.
ram_asra_gmail_
Community Trekker


Here is a simplified version of what I am doing. Remember that this is the simplest form of the data table and script, and it is still fast compared to the real one. On my computer it averages about 2 seconds per iteration. Some people say that is not bad, but remember we are talking about computer time, not human time: in computer time, 2 seconds is like going back to WW2.

To see how bad it can get, try decreasing and increasing the number of rows by a factor of 10 in the Add Rows(1000000) line.

Try(close(datatable("Test"),nosave));
dt = New Table("Test", 
	Add Rows(1000000), 
	New Column("X", Character, <<Set Each Value(Random Category(0.4, "Alpha_alpha_alpha", 0.4, "Beta_beta_beta", 0.4, "Ceta_ceta_ceta", 0.2, "Data_data_ceta", 0.25, "Eta_eta_eta", 0.25, "Feta_feta_feta", 0.25, "Geta_geta_geta"))), 
	New Column("Y", continuous, Width( 15 ), <<Set Each Value(Random Normal(1, 134500))),
	New Column("a", continuous, Width( 15 ), <<Set Each Value(Random Normal(23, 200))),
	New Column("b", continuous, Width( 15 ), <<Set Each Value(Random Normal(04, 300))),
	New Column("Yc", continuous, Width( 15 ), <<Set Each Value(Random Normal(10,4100))),
	New Column("Yd", continuous, Width( 15 ), <<Set Each Value(Random Normal(10, 500000))),
	New Column("Ye", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 600000))),
	New Column("Yf", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 76789))),
	New Column("Yg", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 8345678))),
	New Column("Yh", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 934567889))),
	New Column("Yi", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 1000))),
	New Column("Yj", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 1100))),
	New Column("Yaa", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 12000))),
	New Column("Ybb", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 13000))),
	New Column("Yvv", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 1400))),
	New Column("Ycc", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 1500))),
	New Column("Ycd", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 160))),
	New Column("Yav", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 76456575))),
	New Column("Ydg", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 1000))),
	New Column("Yab", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 1001))),
	New Column("Ycd", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 1002))),
	New Column("Yef", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 1003))),
	New Column("Ygh", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 1004))),
	
	new Column("X1", Character, <<Set Each Value(Random Category(.4, "X_alpha", .4, "Y_beta", 0.3, "Z_ceta", 0.2, "P_data", 0.2, "Q_eta"))),
	new Column("P1", Character, <<Set Each Value(Random Category(.4, "XA_alpha", .4, "YA_beta", 0.3, "ZA_ceta", 0.2, "PA_data", 0.2, "QA_eta"))), 
	New Column("Y1", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 12342543))),
	New Column("a", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 2345375))),
	New Column("b", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 3345687))),
	New Column("Yc", continuous, Width( 15 ), <<Set Each Value(Random Normal(5,4199686))),
	New Column("Yd", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 51223454))),
	New Column("Ye", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 64545658))),
	New Column("Yf", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 74565))),
	New Column("Yg", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 8436557))),
	New Column("Yh", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 9354))),
	New Column("Yi", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 10243))),
	New Column("Yj", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 1124321))),
	New Column("Yaa", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 1232543))),
	New Column("Ybb", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 1356786))),
	New Column("Yvv", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 14214213))),
	New Column("Ycc", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 15678))),
	New Column("Ycd", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 1612312))),
	New Column("Yav", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 7576))),
	New Column("Ydg", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 91231))),
	New Column("Yab", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 9978))),
	New Column("Ycd", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 32131))),
	New Column("Yef", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 156756))),
	New Column("Yaa", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 1232543))),
	New Column("Ybb", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 1356786))),
	New Column("Yvv", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 14214213))),
	New Column("Ycc", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 15678))),
	New Column("Ycd", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 1612312))),
	New Column("Yav", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 7576))),
	New Column("Ydg", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 91231))),
	New Column("Yab", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 9978))),
	New Column("Ycd", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 32131))),
	New Column("Yef", continuous, Width( 15 ), <<Set Each Value(Random Normal(5, 156756))),invisible
);
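avgmatrix={}; clearlog(); // one elapsed time per iteration

// Repeat the whole subset/update/plot cycle 12 times
for(i=1,i<=12,i++,

// Close the subsets left over from the previous iteration
Try(close(datatable("sub1"),nosave)); Try(close(datatable("sub2"),nosave));

st=HPTIME(); // HP Time() counts microseconds
// sub1: rows where X is "Alpha_alpha_alpha"
rows = dt << Get Rows Where(:X == "Alpha_alpha_alpha");
dt << Subset(rows(rows),outputtable("sub1"),invisible);
// sub2: rows where X is "Beta_beta_beta", with Y1/X1/P1 renamed to Y2/X2/P2
rows = dt << Get Rows Where(:X == "Beta_beta_beta");
dt << Subset(rows(rows),outputtable("sub2"),invisible);
column(datatable("sub2"),"Y1")<<setname("Y2");
column(datatable("sub2"),"X1")<<setname("X2");
column(datatable("sub2"),"P1")<<setname("P2");

// Intended: bring sub2's values into sub1, matched on X1 = X2 and P1 = P2
datatable("sub1") << update(
	With( Data Table( "Sub2" ) ),
	Matching Columns( :X1 = :X2, :P1 = :P2 )
);
// Plot Y2 against Y1 and add a row legend for P2
Legend_Item="P2";
Biv=datatable("sub1")<<Bivariate(	Y( :Y2 ),	X( :Y1 ),	Automatic Recalc( 1 ));
eval(parse("Report(Biv)[FrameBox(1)]<<Row Legend(\!""|| Legend_Item||"\!", Marker( 0 ), Marker Theme(\!" \!" ) )")) ;

wait(0); // let JMP process pending display updates before stopping the clock
avgmatrix[i]=1e-6*(HPTIME()-st); // elapsed seconds
//print(char(st)|| " Sec");
);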
show(avgmatrix);
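A rough way to check how much of that time is the subsetting itself is to follow the factor-of-10 suggestion above but time only the Get Rows Where and Subset steps. This is a minimal sketch under stated assumptions: a small stand-in table with just an X and a Y column, and hypothetical table names SizeTest / SizeTestSub. It deliberately leaves out the update and the Bivariate.

```jsl
// Sketch: time only Get Rows Where + Subset at three table sizes.
// The stand-in table and the names "SizeTest"/"SizeTestSub" are hypothetical.
sizes = {10000, 100000, 1000000};
subsetSec = J( N Items( sizes ), 1, . );
For( k = 1, k <= N Items( sizes ), k++,
	tmp = New Table( "SizeTest",
		Add Rows( sizes[k] ),
		New Column( "X", Character,
			<<Set Each Value( Random Category( 0.5, "Alpha_alpha_alpha", 0.5, "Beta_beta_beta" ) )
		),
		New Column( "Y", Continuous, <<Set Each Value( Random Normal( 0, 1 ) ) ),
		Invisible
	);
	st = HP Time(); // microseconds
	r = tmp << Get Rows Where( :X == "Alpha_alpha_alpha" );
	sub = tmp << Subset( Rows( r ), Output Table( "SizeTestSub" ), Invisible );
	subsetSec[k] = 1e-6 * (HP Time() - st); // seconds
	Close( sub, NoSave );
	Close( tmp, NoSave );
);
Show( sizes, subsetSec );
```

If these timings stay well below the per-iteration average of the full loop, the rest of the time is going into the update and the Bivariate, not the subset.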

David_Burnham
Super User


The biggest problem might be something you didn't mention: the update/join.

 

[Attached screenshot: Capture5.PNG]

-Dave
julian
Staff


Hi @ram_asra_gmail_,

I'm not sure how much of an issue this introduces in terms of speed, but it certainly will in terms of results: the syntax for your update appears incorrect. For Update, the matching argument is Match Columns(), not Matching Columns(). The latter will glue the update table onto your original table row-wise, which will very likely be faster than doing the actual match. If you use Join, the argument is By Matching Columns().

 

Here's the correct syntax for Update:

 

datatable("sub1") << update(
	With( Data Table( "Sub2" ) ),
	Match Columns( :X1 = :X2, :P1 = :P2 )
);

 

This is certainly not the only thing going on, but I thought I should mention it in case it wasn't spotted elsewhere!

@julian 

 

 

vince_faller
Super User


Yeah, this has almost nothing to do with subsetting tables. I can bring this script down to 100 rows and it still takes 7 seconds.

 

My first suggestion would be not to run each Bivariate independently. Run them as platforms (or inside V List Box()) so that they don't each evaluate and append, then just show them at the end. Doing just that brought me down to 3 seconds. Script is attached.
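A minimal sketch of the "one window" idea, assuming the Test table from the original post is open: both Bivariates are launched inside a single V List Box in one New Window, so JMP builds one report window instead of one per launch. As a swapped-in technique, a Where() launch option picks the rows for each report, so no subset tables are created; the update step is left out, and the window title is hypothetical.

```jsl
// Sketch: put several Bivariate reports in one window instead of one window each.
// Assumes the "Test" table from the original post is open; the window title
// is hypothetical. Where() picks the rows, so no subset tables are created.
dt = Data Table( "Test" );
nw = New Window( "All Bivariates",
	V List Box(
		dt << Bivariate( Y( :Y1 ), X( :Y ), Where( :X == "Alpha_alpha_alpha" ) ),
		dt << Bivariate( Y( :Y1 ), X( :Y ), Where( :X == "Beta_beta_beta" ) )
	)
);
```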

 

But it REALLY looks like you're just trying to split a data table, and the Split function would probably be better than doing this. If you have to do it this way for whatever reason, I'd concatenate all your dt_sub1 tables together with a By column and run them all at once. But I bet this could be optimized a ton.
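A sketch of the "run them all at once" idea, assuming the Test table from the original post: because both subsets come from the same table, the existing X column can act as the By column, so a single subset holding both levels plus one Bivariate launch with By( :X ) yields one report per level. The output table name "both" is hypothetical.

```jsl
// Sketch: one subset holding both levels, then a single Bivariate with By( :X ).
// Assumes the "Test" table from the original post; "both" is a hypothetical name.
dt = Data Table( "Test" );
r = dt << Get Rows Where( :X == "Alpha_alpha_alpha" | :X == "Beta_beta_beta" );
both = dt << Subset( Rows( r ), Output Table( "both" ), Invisible );
// With By(), the launch returns a list of Bivariate references, one per level
bivs = both << Bivariate( Y( :Y1 ), X( :Y ), By( :X ), Invisible );
Show( N Items( bivs ) );
```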

Vince Faller - Predictum
David_Burnham
Super User


With the corrected syntax pointed out by @julian, and the row count reduced to 10,000, I had an average execution time of 2.45 seconds (I also made the Bivariates invisible).
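For reference, one iteration with the corrected Update syntax and an invisible Bivariate might look like this sketch. It assumes sub1 and sub2 already exist as built in the original loop, and that Update adds sub2's Y2 column to sub1, as the original script relies on. It also swaps the Eval( Parse() ) string for Eval( Eval Expr() ), and it uses P1 for the legend on the assumption that, after matching, P1 holds the same values as P2 and is certain to exist in sub1.

```jsl
// Sketch of one iteration's update + plot, assuming sub1 and sub2 exist as
// built in the original loop (sub2 columns already renamed to X2, P2, Y2).
sub1 = Data Table( "sub1" );
sub1 << Update(
	With( Data Table( "sub2" ) ),
	Match Columns( :X1 = :X2, :P1 = :P2 )
);
legendItem = "P1"; // assumption: after matching, P1 carries the same values as P2
biv = sub1 << Bivariate( Y( :Y2 ), X( :Y1 ), Invisible );
// Eval Expr substitutes the value of legendItem without building code as a string
Eval(
	Eval Expr(
		Report( biv )[Frame Box( 1 )] << Row Legend( Expr( legendItem ), Marker( 0 ), Marker Theme( " " ) )
	)
);
```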

 

Replacing the update with an invisible join reduced the average execution time to 0.67 seconds. The critical step was dropping the columns not needed for the Bivariate; keeping them made it significantly slower.
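A sketch of that join replacement, assuming sub1 and sub2 from the original loop. Select() and SelectWith() keep only the columns the Bivariate needs. Drop Multiples( 0, 1 ) with Include Nonmatches( 1, 0 ) is an assumed way to mimic Update (keep every sub1 row, take at most one sub2 row per key); without it, a key shared by many rows on both sides makes the joined table much larger. The output name "joined" is hypothetical.

```jsl
// Sketch: replace the Update with an invisible Join that keeps only the columns
// the Bivariate needs. Assumes sub1/sub2 from the original loop; "joined" is a
// hypothetical name. Drop Multiples( 0, 1 ) + Include Nonmatches( 1, 0 ) is an
// assumed way to mimic Update: every sub1 row, at most one sub2 row per key.
joined = Data Table( "sub1" ) << Join(
	With( Data Table( "sub2" ) ),
	Select( :X1, :P1, :Y1 ),
	SelectWith( :Y2 ),
	By Matching Columns( :X1 = :X2, :P1 = :P2 ),
	Drop Multiples( 0, 1 ),
	Include Nonmatches( 1, 0 ),
	Preserve Main Table Order( 1 ),
	Output Table( "joined" ),
	Invisible
);
biv = joined << Bivariate( Y( :Y2 ), X( :Y1 ), Invisible );
```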

 

If the join is the bottleneck, you might want to consider a virtual join. To do that, you would first need to create a single column that serves as the unique key for the join.
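A minimal sketch of the key-building step, assuming sub1 and sub2 from the original loop: it concatenates the two match columns into one character column named Key (a hypothetical name), checks whether that key is unique in sub2, and only then marks it with the Link ID column property. The matching Link Reference property on sub1's Key is left out here; check the Scripting Index for its arguments in your JMP version. With this test data the key will not be unique in sub2, so sub2 would first need to be reduced to one row per key.

```jsl
// Sketch: build a single join key from the two match columns.
// "Key" is a hypothetical column name; sub1/sub2 come from the original loop.
sub1 = Data Table( "sub1" );
sub2 = Data Table( "sub2" );
sub1 << New Column( "Key", Character, Formula( :X1 || "|" || :P1 ) );
sub2 << New Column( "Key", Character, Formula( :X2 || "|" || :P2 ) );
sub1 << Run Formulas;
sub2 << Run Formulas;

// A virtual join needs the key to be unique in the table that holds the Link ID
keys = sub2:Key << Get Values;
isUnique = N Items( Associative Array( keys ) ) == N Items( keys );
If( isUnique,
	sub2:Key << Set Property( "Link ID", 1 ),
	Write( "Key is not unique in sub2; reduce it to one row per key first.\!N" )
);
```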

 

-Dave