cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
Polygon0516
Level I

Remove Duplicate Entries from SQLQuery Based on Date

Hello,

 

I've tried to use this video linked below to use the SQLQuery function. However, I'm having an issue.

community.jmp.com/t5/video/gallerypage/video-id/tvcW9lZTE6Ie3IFXxeru0AYNqvj5AB5r

 

 

Problem: Take Table 2 with "cylinder" "Received_Date" and "lot" & merge that into Table 1 that has "batch", "cylinder" and "Batch_Date"

I need to match exactly on "Cylinder" and then take the nearest backward "Received Date" to the "Batch Date" (Batch_Date >= Received_Date).

I created the query & also did the step to create a numeric tag that should theoretically provide only 1 unique lot number for each batch (not shown in table below)

 

However, this is what I'm getting

BatchBatch DateCylinder NumberLot NumberReceived Date
12/13/2023 55A6/14/2022
12/13/2023 55B10/21/2022
12/13/2023 55C11/30/2022

 

There should only be 1 row populated from the query (should be the last row 11/30/2022); the other ones are "ghosts" as we only made 1 "Batch 1"

 

Any help on how to improve the SQL code to prevent duplicate entries & pick the entry only if the received date is the closest <= batch date?

 

This was the code from the SQLQuery interface that was given

 

SELECT DISTINCT t1.Batch, t1.Batch_Date AS Date, t1.Cylinder_Number,
t2.Lot_Number, t2.Cylinder_Number AS "Cylinder_Number 2", t2.Lot_Date AS "Date 2", MIN(t2."Batch Num") AS "Minimum-Batch Num"
FROM "Table 1" t1
LEFT OUTER JOIN "Table 2" t2
ON ( ( t1.Cylinder_Number = t2.Cylinder_Number ) AND ( t1.Cat_Date >= t2.Lot_Date ) )
GROUP BY t1.Batch, t1.Batch_Date, t1.Cylinder_Number, 
t2.Lot_Number, t2.Cylinder_Number, t2.Lot_Date;

1 ACCEPTED SOLUTION

Accepted Solutions
jthi
Super User

Re: Remove Duplicate Entries from SQLQuery Based on Date

This is quite close to the limit on what I would attempt to do with JMP's limited SQL capabilities (at least when using data tables). It is most likely simpler to use SQL + JSL

Names Default To Here(1);

res = New SQL Query(
	Version(130),
	Connection("JMP"),
	JMP Tables(
		["t1" => "$DOWNLOADS\Table1-1.jmp",
		"t2" => "$DOWNLOADS\Table 2-2.jmp"]
	),
	QueryName("SQLQuery2"),
	CustomSQL(
		"\[SELECT 
			t1.Cylinder, 
			t1.Batch, 
			t1."Batch Date", 
			t2."Received Date",
			t2.Supplier, 
			t2.Lot
		FROM t1
			LEFT OUTER JOIN t2 ON 
				t1.Cylinder = t2.Cylinder
				AND t1."Batch Date" >= t2."Received Date"

	]\")
) << Run;


res << select where(Row() == Col Max(Row(), :Cylinder, :Batch) | Is Missing(:Lot));
res << Invert Row Selection;
res << delete rows;
-Jarmo

View solution in original post

10 REPLIES 10
jthi
Super User

Re: Remove Duplicate Entries from SQLQuery Based on Date

Are these JMP tables you are performing the SQL query to? Can you share the tables?

-Jarmo
Polygon0516
Level I

Re: Remove Duplicate Entries from SQLQuery Based on Date

Just replied to the main thread with the files, thanks for reaching out

jthi
Super User

Re: Remove Duplicate Entries from SQLQuery Based on Date

This is quite close to the limit on what I would attempt to do with JMP's limited SQL capabilities (at least when using data tables). It is most likely simpler to use SQL + JSL

Names Default To Here(1);

res = New SQL Query(
	Version(130),
	Connection("JMP"),
	JMP Tables(
		["t1" => "$DOWNLOADS\Table1-1.jmp",
		"t2" => "$DOWNLOADS\Table 2-2.jmp"]
	),
	QueryName("SQLQuery2"),
	CustomSQL(
		"\[SELECT 
			t1.Cylinder, 
			t1.Batch, 
			t1."Batch Date", 
			t2."Received Date",
			t2.Supplier, 
			t2.Lot
		FROM t1
			LEFT OUTER JOIN t2 ON 
				t1.Cylinder = t2.Cylinder
				AND t1."Batch Date" >= t2."Received Date"

	]\")
) << Run;


res << select where(Row() == Col Max(Row(), :Cylinder, :Batch) | Is Missing(:Lot));
res << Invert Row Selection;
res << delete rows;
-Jarmo
Polygon0516
Level I

Re: Remove Duplicate Entries from SQLQuery Based on Date

Hello - thank you for the quick help! I tried the macro & unfortunately for some items it is keeping the oldest possible date. It doesn't appear to be doing this for all entries though ( see last row of the table below).

 

So this is a definite improvement that there aren't multiple rows for the same batch, but it's not consistently pulling the right lot/received date for that batch in question.

 

Example

 

CylinderBatchBatch DateReceived DateSupplierLot Should Be Received Date
822738/9/20235/20/2022B55 6/22/2023
322748/10/20235/20/2022B55 6/22/2023
222758/11/20235/20/2022B55 6/22/2023
1022768/12/20235/20/2022B55 6/22/2023
622778/12/20235/20/2022B55 6/22/2023
922788/13/20235/20/2022B55 6/22/2023
122798/14/20235/20/2022B55 6/22/2023
21222808/15/20236/22/2023B74 (file is correct)
jthi
Super User

Re: Remove Duplicate Entries from SQLQuery Based on Date

I'm not really sure how to read your table. If I take a look at Table 2-2 there is only 1 row for Cylinder 1 which has Received date 05/20/2022

jthi_0-1733329094365.png

 

-Jarmo
Polygon0516
Level I

Re: Remove Duplicate Entries from SQLQuery Based on Date

Hi - replying to keep the topic active. I think something occurred when I updated my file to remove proprietary information. I will look at it more tomorrow & will get back to you with a reply.

Polygon0516
Level I

Re: Remove Duplicate Entries from SQLQuery Based on Date

Thank you for this assistance! I realized there was an error within my original data set (I modified it for entry on this website to anonymize the data). This code has been working well, will reach out with any other questions.
Polygon0516
Level I

Re: Remove Duplicate Entries from SQLQuery Based on Date

Here are the files - sorry, meant to attach these.

Table 1-1 and Table 2-2 are as described above.

Query Results shows the final table where there are multiple entries for each batch (but there should only be 1 entry per batch as described above).

Polygon0516
Level I

Re: Remove Duplicate Entries from SQLQuery Based on Date

One last question - how can I add where more columns are pulled over from Table 2 into the final query table, even if I'm not using them as a match criteria? 

I tried adding into your code above a t2."wt%" column but the code now throws me an error. I have other columns I'd like to pull in at the same time (about 10 columns in total) but not sure how to do this without getting an error.

 

Thank you!