cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Check out the JMP® Marketplace featured Capability Explorer add-in
Choose Language Hide Translation Bar
Newbie2Jumpie
Level IV

How to filter a dataset using a separate dataset with 1000s of IDs (subquery, scripting)

I have a "BIG" dataset containing several million rows. Each row has a specific IDENT field. IDENT may occur multiple times. I have another, smaller dataset "SMALL", that contains just hundreds of 1000s of rows. Each row has also a specific IDENT field.   

Now I just want to keep all the rows from BIG, that have the same IDENT in SMALL. In other words: How to filter BIG using the IDENT field from SMALL. Using SAS SQL, that would be easy-peasy, e.g.

proc sql ;

cretate table FILTERED

as select * from BIG

where IDENT in (select IDENT from SMALL) ;

quit ;

Can I do something like this in JMP?

kind regards

Newbie

1 ACCEPTED SOLUTION

Accepted Solutions
txnelson
Super User

Re: How to filter a dataset using a separate dataset with 1000s of IDs (subquery, scripting)

If you run the Join interactively, you can then go to the data table that was just created, right click on the "Source" item, next to the green triangle in the table panel on the left edge of the data table, and select "Edit".  It will show you the JSL that was used to create the joined data table.

Jim

View solution in original post

4 REPLIES 4

Re: How to filter a dataset using a separate dataset with 1000s of IDs (subquery, scripting)

Dear Newbie2Jumpie,

 

this is possible in several ways. It depends, if you want to create a new table or if you just want to filter. Let's start with creating a new table (check if you have enough RAM):

  1. Open both tables in JMP
  2. In the table menu you have "Join" where you can do a database like join as you require it to be, like this e.g. JoinWindow.JPG

    3. Press ok and you'll get a shortened new table and can close the other two tables.

Similar thing you can achieve with the menu Tables -> JMP Query Builder, however there you have even more options like in a database Query Builder and you get besides the source script an update script and other options you can use to change options for query. (Attached an example to illustrate this (Big Class.jmp, ID_Table.jmp, SQLQuery1.jmp)

 

The other option allows you filtering on the fly: Using Linked/referenced tables aka virtual join. There you need in the referenced table a column with the same name as in the Big table for the id, and another column with the same content as the ID column of the small table (so actually twice the same ID column, just with different names. (Please take a look in the second example. In the small table you specify the ID Column with the same name as in the big table as Link ID (right click on the column and Link ID). In the big table you right click on the ID column and select Link reference and as reference you choose the small table. You get all columns of the small table now also visualized in the big table as hidden columns. Now you can filter through this second ID column which only has entries like in the small table. This will exclude all others and therefore only take those rows into account which are of interest, but listed in a different table.virtual join.JPG

 

 

 

 

 

 

/****NeverStopLearning****/
Newbie2Jumpie
Level IV

Re: How to filter a dataset using a separate dataset with 1000s of IDs (subquery, scripting)

I appreciate your efforts but I was interested in a scripting answer, not an interactive explanation.

txnelson
Super User

Re: How to filter a dataset using a separate dataset with 1000s of IDs (subquery, scripting)

If you run the Join interactively, you can then go to the data table that was just created, right click on the "Source" item, next to the green triangle in the table panel on the left edge of the data table, and select "Edit".  It will show you the JSL that was used to create the joined data table.

Jim
Newbie2Jumpie
Level IV

Re: How to filter a dataset using a separate dataset with 1000s of IDs (subquery, scripting)

in JSL script it looks like this:

data table ("Big") << join (with(data table("Small")),
by matching columns (::ID=:ID),
drop mutiples (0,0),
include nonmatches(0,0),
preserve main table order(1)) ;


Beware of possible tiny typos...