Solved: Re: How to filter a dataset using a separate dataset with 1000s of IDs (subquery...

Newbie2Jumpie · Jun 28, 2019 01:13 AM

I have a "BIG" dataset containing several million rows. Each row has an IDENT field, and the same IDENT may occur multiple times. I have another, smaller dataset, "SMALL", that contains just hundreds of thousands of rows. Each of its rows also has an IDENT field.

Now I just want to keep the rows from BIG whose IDENT also occurs in SMALL. In other words: how do I filter BIG using the IDENT field from SMALL? In SAS SQL that would be easy-peasy, e.g.:

proc sql ;
  create table FILTERED
  as select * from BIG
  where IDENT in (select IDENT from SMALL) ;
quit ;

Can I do something like this in JMP?

kind regards

Newbie

martindemel · Jun 28, 2019 03:02 AM

Dear Newbie2Jumpie,

This is possible in several ways. It depends on whether you want to create a new table or just filter. Let's start with creating a new table (check that you have enough RAM):

1. Open both tables in JMP.
2. In the Tables menu, choose "Join", which lets you set up a database-like join the way you need it.
3. Press OK. You get a new, shortened table and can close the other two tables.

You can achieve something similar with the menu Tables -> JMP Query Builder. There you have even more options, as in a database query builder, and besides the source script you also get an update script and other options for changing the query. (Attached is an example to illustrate this: Big Class.jmp, ID_Table.jmp, SQLQuery1.jmp.)
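For a scripted counterpart, JSL's Query() function can run a SQL statement directly against open JMP data tables, so the subquery from the original question carries over almost unchanged. What follows is only a minimal sketch: it assumes two open tables named BIG and SMALL, each with an IDENT column (the names from the question, not from the attached examples), and the aliases b and s are arbitrary.

```jsl
// Assumption: tables named "BIG" and "SMALL" are already open in JMP.
dtBig = Data Table( "BIG" );
dtSmall = Data Table( "SMALL" );

// Give each table an alias and filter BIG with an IN subquery.
dtFiltered = Query(
	Table( dtBig, "b" ),
	Table( dtSmall, "s" ),
	"SELECT * FROM b WHERE IDENT IN (SELECT IDENT FROM s)"
);

// Name the result table like the one in the SAS example.
dtFiltered << Set Name( "FILTERED" );
```

Because the result is a new table, the RAM caveat mentioned for Join applies here as well.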

The other option lets you filter on the fly, using linked (referenced) tables, also known as a virtual join. For this, the referenced (small) table needs a column with the same name as the ID column in the big table, plus another column with the same content as the ID column of the small table (so effectively the same ID column twice, just under different names; see the second attached example).

In the small table, mark the ID column that has the same name as in the big table as the Link ID (right-click the column and choose Link ID). In the big table, right-click the ID column, select Link Reference, and choose the small table as the reference. All columns of the small table then also appear in the big table as hidden columns. Now you can filter on the second ID column, which only has entries for IDs present in the small table. This excludes all other rows, so only the rows of interest, the ones listed in the other table, are taken into account.
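The two right-click steps can also be set from a script through column properties. The sketch below is hedged: the file paths are hypothetical, it assumes both tables have been saved to disk (the link reference points at a file), that the key column is called IDENT in both tables, and that its values are unique in the small table, which a Link ID column requires.

```jsl
// Assumption: hypothetical paths; both tables must exist as saved .jmp files.
dtSmall = Open( "$DOCUMENTS/SMALL.jmp" );
dtBig = Open( "$DOCUMENTS/BIG.jmp" );

// Mark IDENT in the small table as the key column (Link ID).
dtSmall:IDENT << Set Property( "Link ID", 1 );

// Point IDENT in the big table at the small table (Link Reference).
dtBig:IDENT << Set Property(
	"Link Reference",
	Reference Table( "$DOCUMENTS/SMALL.jmp" )
);
```

After this, the small table's columns should be available in the big table as referenced columns, and filtering proceeds as described above.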

/****NeverStopLearning****/

Newbie2Jumpie · Jun 28, 2019 1:20 AM

I appreciate your efforts but I was interested in a scripting answer, not an interactive explanation.

txnelson · Jun 28, 2019 3:37 AM

If you run the Join interactively, you can then go to the data table that was just created, right click on the "Source" item, next to the green triangle in the table panel on the left edge of the data table, and select "Edit". It will show you the JSL that was used to create the joined data table.

Jim

Newbie2Jumpie · Jul 15, 2019 8:17 AM

In a JSL script it looks like this:

Data Table( "Big" ) << Join(
	With( Data Table( "Small" ) ),
	By Matching Columns( :ID = :ID ),
	Drop multiples( 0, 0 ),
	Include Nonmatches( 0, 0 ),
	Preserve main table order( 1 )
);

Beware of possible tiny typos...
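If the goal is strictly the behaviour of the SQL IN subquery (each row of the big table kept at most once, with no columns added from the small table), a set lookup is one alternative to a join. This is only a sketch under assumptions: the tables are open under the names BIG and SMALL and the key column is called IDENT in both, as in the original question.

```jsl
// Assumption: tables named "BIG" and "SMALL" are open, both with an IDENT column.
dtBig = Data Table( "BIG" );
dtSmall = Data Table( "SMALL" );

// Build a set of the distinct IDENT values in SMALL.
ids = Associative Array( dtSmall:IDENT << Get Values );

// Select the rows of BIG whose IDENT is in that set.
dtBig << Select Where( ids << Contains( :IDENT ) );

// Copy the selected rows, with all columns, into a new table.
dtFiltered = dtBig << Subset(
	Selected Rows( 1 ),
	Selected Columns Only( 0 ),
	Output Table( "FILTERED" )
);
```

Duplicate IDENT values in SMALL collapse into a single key here, so they cannot multiply rows of BIG the way repeated matches in a join can.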