cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
jlrouquette
Level III

Return a Specific Row With Multiple Criteria - For Beginners

I'm not new to JMP but I'm pretty much brand new to the scripting of this sort.  

 

I have a table with ≈8mm rows and 80 columns.  Ive just updated it with 3 additional columns and 2,000 rows.  I need to return a specific "Formation" from the 2,000-row update for each of the 8mm rows from the original table. 

 

I need to do so under a couple criteria.  IF "well.name" = "Well.Name" AND "BitDepth" < "FormationTVD" THEN "Formation".

 

I can break the update back out of the table if that makes more sense but I couldn't figure out how to manage this another way either.  

 

Lastly, I've read several discussions on this or similar topics and I've seen things similar to "dt = blahblah; dt << blah" which I don't understand at all.  If explaining this in those terms makes the most sense please do as thoroughly as you're able to.        

 

28 REPLIES 28
julian
Community Manager Community Manager

Re: Return a Specific Row With Multiple Criteria - For Beginners

Hi @Justin_Chilton,

Very similar! I wonder which will be fastest. In your script you loop through the main table, and mine I loop through the lookup table. @jlrouquette will have to report back :)

 

By the way, if there is ever an interest in *all* matching formations, JMP Query Builder (Tables > JMP Query Builder) can be set up with these conditions and will return as separate rows all the matching formations. 

Screen Shot 2018-02-15 at 5.21.13 PM.png@julian

jlrouquette
Level III

Re: Return a Specific Row With Multiple Criteria - For Beginners

I always appreciate a little healthy competition!  In this case, its hards to say though.  Both of the solutions ran on my test set but neither will process on the large set.  Is it possible it's a RAM thing?  They instantly force JMP into "Not Responding" and neither has come out of it yet (I'm running both codes side by side on my work laptop and school laptop).       

jlrouquette
Level III

Re: Return a Specific Row With Multiple Criteria - For Beginners

I tried subsetting my larger dataset into sets of around 250k rows and the script still won't process.  My machine has 16GM.  Not sure what could be causing this problem.    

Craige_Hales
Super User

Re: Return a Specific Row With Multiple Criteria - For Beginners

Both of the solutions have nested loops: the outer for loop, and the inner (implied) where loop. This makes the run time proportional to nrows(dt1) * nrows(dt2). I would expect one of the solutions to be faster than the other, but still too slow if both nrows are large. If it is possible to do this in smaller sets of data, then combine the results, you're done.

Otherwise you might  sort both tables, by WellName and Bit Depth and by WellReference and TVD Reference. Then a while loop moves row indexes forward in both tables, in parallel. If needed, a final sort can put the table back into original order.

Craige
julian
Community Manager Community Manager

Re: Return a Specific Row With Multiple Criteria - For Beginners

Well that's not what we were hoping for!  What happens with an even smaller subset, say 10k rows? After you subset, be sure to change the data table reference at the start of the script so it points to the subset, not the original table.

 

dt1 = Data Table( "Subset of MainTable" );

 

Alternatively, you could add the following jsl after defining your tables to do the subset of your main table and replace the data table pointer in one step:

dt1 = dt1<< Subset( Sample Size( 10000 ), Selected columns only( 0 ) );

 

jlrouquette
Level III

Re: Return a Specific Row With Multiple Criteria - For Beginners

It works on ≈1k and ≈10k just fine but takes a little time.  I tried it on 100k and it takes forever.  The fact that it still works is fantastic but subsetting 20mm rows in such a way that I can actually do the work will still talk a lot of time.

 

I added some script for a progress bar with Justin's help so this is what I'm using right now based on how I subset the data.  The actual progress bar isn't working now though for some reason even though the actual search for formation is working [slowly].

 

Can you suggest any corrections here that would 1) fix the address bar situation or 2) speed it up?  I saw that you added some script in your response for reducing the data that it finds but I think I've got that covered (unless you say otherwise) now with simply subsetting by well.    

 

 

progress = New Namespace(
"progress"
);
progress:init = Function( {workload},
progress:TotalWork = workload;
progress:Current = -1;
progress:tick = Tick Seconds();
progress:Window = New Window( "progress",
H List Box(
Spacer Box( size( 20, 20 ) ),
progress:busy = Busy Light( <<size( 60, 60 ) ),
Border Box( Left( 20 ), Right( 20 ), top( 20 ), bottom( 20 ),
H List Box(
progress:left = Spacer Box( color( RGB Color( 20, 200, 20 ) ) ),
progress:right = Spacer Box( color( RGB Color( 100, 100, 100 ) ) ),
Spacer Box( size( 20, 10 ) ),
progress:text = Text Box( "0" )
)
)
)
);
progress:set( 0 );
);

progress:set = Function( {workdone},
{now = Tick Seconds(), pct = Round( 100 * workdone / progress:TotalWork )},
If( now - progress:tick > 1 / 30,
progress:tick = now;
If( pct != progress:Current,
progress:Current = pct;
progress:left << size( progress:Current, 20 );
progress:right << size( 100 - progress:Current, 20 );
progress:text << settext( Char( pct ) || "%" );
);
progress:busy << advance;
Wait( 0 );
)
);

progress:term = Function( {},
progress:Window << closeWindow
);

/////////////////////////
// test
/////////////////////////

dt1 = Data Table( "well name=xxxxxxxxx" );
dt2 = Data Table( "Tops" );
dt1 << Begin Data Update();
// loop through each table in the main table
For( iRow = 1, iRow <= Nrow( dt1 ), iRow++,

// get rows that meet the two conditions
r = dt2 << Get Rows Where(
And(
dt1:well name[iRow] == dt2:WellReference,
dt1:BitDepth[iRow] < dt2:TVDReference
)
);

// if we have at least one matching row, get the first formation
If( Nrow( r ) > 0,
dt1:ActualFormation[iRow] = dt2:FormationReference[r[1]];
);
);
dt1 << End Data Update();
progress:term();

 

jlrouquette
Level III

Re: Return a Specific Row With Multiple Criteria - For Beginners

I've removed wells from the secondary table that aren't present in each respective subset of data.  I also removed the progress bar script as well as the and portion of the code since the well reference doesn't matter if I remove all of the wells that don't match the set I'm looking at.  I will still have to do this individually for each of the 80 subsets but it took the process took less than 10 seconds for 160k rows, I can live with that!  

 

 

dt1 = Data Table( "well name=OAK 55-1-30 UNIT 1H" );
dt2 = Data Table( "Tops" );
dt1 << Begin Data Update();
// loop through each table in the main table
For( iRow = 1, iRow <= Nrow( dt1 ), iRow++,

// get rows that meet the two conditions
r = dt2 << Get Rows Where(
dt1:BitDepth[iRow] < dt2:TVDReference
);

// if we have at least one matching row, get the first formation
If( Nrow( r ) > 0,
dt1:ActualFormation[iRow] = dt2:FormationReference[r[1]];
);
);
dt1 << End Data Update();
progress:term();

 

Re: Return a Specific Row With Multiple Criteria - For Beginners

One last thing for you to try. I think that putting this same logic within a Column Formula might work better (although not necessarily faster). 

The reason I say it is better is because JMP is handling the formula execution for each row, instead of a single long running JSL script. With a formula, each execution of the formula is done for each row, where an individual row does not take a ton of time, but when done 19 million times in a single script execution, it really adds up. Also, since formula execution is done as a background task, you should not get JMP into an unresponsive state, but it will still take a really long time.  

If( Row() == 1,
	dt2 = Data Table( "formation" )
);
wn = :well_name;
bd = :BitDepth;
r = dt2 << Get Rows Where( wn == dt2:WellReference & bd < dt2:TVDReference );
If( N Row( r ) > 0,
	dt2:FormationReference[r[1]],
	""
);

On my computer here (i7-4770 with 32 GB of RAM), it took about 25 min to get through 18 million rows (the 36 rows from the subset repeating) from your dataset.

I have a feeling that the previous method of using the JSL script may have been working, but since JMP didn't respond for so long, there's no way of knowing where it's at in the process. With the formula, I was able to do other things while the formula is evaluating in the background (although I wouldn't reccomend trying to do anything else).

EDIT:  The script above should be added as a formula in your "main" table, which would be the "ActualFormation" column.

Justin
julian
Community Manager Community Manager

Re: Return a Specific Row With Multiple Criteria - For Beginners

Hi @jlrouquette and @Justin_Chilton,

 

I spent a little more time thinking about how to accomplish this join interactively in JMP, and I figured out a way to handle it through JMP Query Builder for Tables (Tables > Query Builder). This may or may not be faster than Justin's last solution, but it's worth a try. Below is a short video showing how I did it. General steps are as follows (easier than it sounds): 1) Sort lookup table ascending based on TVD, 2) Number the Formations 1..n, 3) Launch Tables > JMP Query Builder, 4) Setup Join to have WellName = WellReference, and BitDepth < TVD Reference, 5) Include only WellName, Bit Depth, and Formation Number as columns in new table, 6) Aggregate (min) on Formation Number; 7) Use Tables > Update to bring in Formation Reference based on Formation number. 

 

I hope this might help!

 

@julian

 

edit: If you *do* want to go the scripting approach, I've attached an example script that uses the JMP Query Builder, and performs the update at the end to bring in the formation names. You will probably need to change based on what columns you want to end up with, but this should get you started. I think this might be much faster -- I was able to process a mock table of 250k rows in about 5 seconds.

 

edit 2:  This definitely seems faster. 18 million rows processed in 7 minutes 56 seconds (MBP 2.7ghzi7, 16gb ram).

 

 

 

Names Default To Here(1);


//get handles for the tables
dt1 = Data Table( "MainTable" );
dt2 = Data Table( "LookupTable" );

//Sort Table
dt2 << Sort( By( :TVD Reference ), Order( Ascending ), Replace Table );

//Make formation number column
formNum = dt2 << New Column("Formation Num", Numeric);
formNum << Set Each Value(Row());


dt3 = qbTable = New SQL Query(
	Version( 130 ),
	Connection( "JMP" ),
	JMP Tables(
	
	//---------------------------
	//Change table locations here if other names
	//and change references in script	
	[
		"LookupTable" => "_MEMORY_",
		"MainTable" => "_MEMORY_"
	]
	//--------------------------	
		
		
	),
	QueryName( "SQLQuery2" ),
	Select(
		Column( "WellName", "t1" ),
		Column( "BitDepth", "t1" ),
		Column(
			"Formation Num",
			"t2",
			Alias( "Minimum-Formation Num" ),
			SavedJMPName( "Formation Num" ),
			Aggregation( "Minimum" )
		)
	),
	From(
		Table( "MainTable", Alias( "t1" ) ),
		Table(
			"LookupTable",
			Alias( "t2" ),
			Join(
				Type( Left Outer ),
				EQ( Column( "WellName", "t1" ), Column( "WellReference", "t2" ) ) &
				LT( Column( "BitDepth", "t1" ), Column( "TVD Reference", "t2" ) )
			)
		)
	),
	Group By( Column( "WellName", "t1" ), Column( "BitDepth", "t1" ) )
) << Run;


//Update from Lookup table to get formation names
dt3 << Update(
	With( dt2 ),
	Match Columns( :Name( "Minimum-Formation Num" ) = :Formation Num ),
	Add Columns from Update table( :FormationReference )
);