Choose Language Hide Translation Bar
Highlighted

Biggest problem in JMP in sub setting tables.... sluggishness

I have used multiple ways to subset a long table (rows >100K, very common size of datatable). The problem arises because subsetting requires selecting rows which means a operation on the table before subsetting. And if the table is long it makes it very slow. specially when a foor loop is required to generate plots by subsetting everytime a new parameter.

it does not matter whatever you use, it is just slow: I am not sure if JMP team have any solution to speed up the subsetting??????

 

select rows where

get rows where

25 REPLIES 25
Highlighted

Re: Biggest problem in JMP in sub setting tables.... sluggishness

Why are you subsetting the data table?

 

Try selecting the data you don't want, set the Hide and Exclude row state, make your plot, and repeat for the next group.

Learn it once, use it forever!
Highlighted

Re: Biggest problem in JMP in sub setting tables.... sluggishness

Also, could you accomplish the same task using filters?

Dan Obermiller
Highlighted

Re: Biggest problem in JMP in sub setting tables.... sluggishness

@Dan_Obermiller 

could you add JSL to do that in for loop without the filter pop up?

Thanks

Highlighted

Re: Biggest problem in JMP in sub setting tables.... sluggishness

I think i mentioned it clearly that problem arises when you select the data, which slows it down.

but i may be not able to understand what you are suggesting. could you provide JSL example. surely i will try if it speeds up things.

Thanks

Ram

Highlighted
vince_faller
Super User

Re: Biggest problem in JMP in sub setting tables.... sluggishness

I've never had a problem if I use get rows where or even just the rows.  You don't actually have to select them.  

 

names default to here(1);
dt = open("$SAMPLE_DATA\Big Class.jmp");
rows = dt << Get Rows Where(:sex == "M");
dt_sub = dt << Subset(rows(rows));

 What are you using to select the rows?  It's possible that's slow. 

Vince Faller - Predictum
Highlighted

Re: Biggest problem in JMP in sub setting tables.... sluggishness

@vince_faller  currently i use the same method. but when file size becomes bigger, even get rows  matrix becomes bigger and slow. 

 

if you use small data table like

Big Class.jmp

, anything can work. problem is with Big data files.

Highlighted
vince_faller
Super User

Re: Biggest problem in JMP in sub setting tables.... sluggishness

So with this table of a million rows.  It takes a sixth of second to run.  It's probably something to do with your script.  Like Dan said, if you gave an example script of it running slow we may be able to help you. 

 

Names Default to Here( 1 );
dt = New Table("Test", 
	Add Rows(1000000), 
	New Column("X", Character, <<Set Each Value(Random Category(.3, "A", .3, "B", "C"))), 
	New Column("Y", continuous, <<Set Each Value(Random Normal(0, 1)))
);

st = HPTime();
rows = dt << Get Rows Where(:X == "C");
dt << Subset(rows(rows));
tot = HPTime() - st;
show(tot);
//RETURNS
//tot = 148508;

If you're calling a sixth of a second sluggish, then I don't know what to tell you.  

 

Also, if you're using a for loop to generate a bunch of plots for each parameter, is there a reason you're not using a by?

Vince Faller - Predictum
Highlighted
txnelson
Super User

Re: Biggest problem in JMP in sub setting tables.... sluggishness

Here is a simple example of how to do the analyses by groups without subsetting

Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/big class.jmp" );

// Reset the data table row states
dt << clear row states;

// Find the different values for Sex
Summarize( byGroup = By( :sex ) );

// Loop across the different groups and analyze them
For( i = 1, i <= N Items( byGroup ), i++,

	// Reset the data table row states
	dt << clear row states;
	
	// Hide and Exclude the unwanted rows
	dt << select where( :Sex != byGroup[i] );
	dt << hide and exclude;
	
	// Run the analysis
	ow = dt << oneway( x( :age ), y( :height ) );
	
	// Change the title of the report to show what data are in the analysis
	Report( ow )[Outline Box( 1 )] << set title(
		Report( ow )[Outline Box( 1 )] << get title || ": Sex=" || byGroup[i]
	);
);

// Reset the data table row states
dt << clear row states;
Jim
Highlighted

Re: Biggest problem in JMP in sub setting tables.... sluggishness

@txnelson 

There are two  problems with your aproach.

1. you are using "select where(" which is the real root cause for bigger table. once you use select where it is operation of updating the data table of selecting rows (in my case >300K rows) you are just using Big class which is not even counted for a file size.

 

2. this does not work for the situation below

if i had to correlate height data on X_axis for sex="M" and height data on Y_axis for sex="F".

i hope it can help you to undestand the problem now

Thanks

Article Labels

    There are no labels assigned to this post.