cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Check out the JMP® Marketplace featured Capability Explorer add-in
Choose Language Hide Translation Bar
Polygon0516
Level I

Remove Duplicate Entries from SQLQuery Based on Date

Hello,

 

I've tried to use this video linked below to use the SQLQuery function. However, I'm having an issue.

community.jmp.com/t5/video/gallerypage/video-id/tvcW9lZTE6Ie3IFXxeru0AYNqvj5AB5r

 

 

Problem: Take Table 2 with "cylinder" "Received_Date" and "lot" & merge that into Table 1 that has "batch", "cylinder" and "Batch_Date"

I need to match exactly on "Cylinder" and then take the nearest backward "Received Date" to the "Batch Date" (Batch_Date >= Received_Date).

I created the query & also did the step to create a numeric tag that should theoretically provide only 1 unique lot number for each batch (not shown in table below)

 

However, this is what I'm getting

BatchBatch DateCylinder NumberLot NumberReceived Date
12/13/2023 55A6/14/2022
12/13/2023 55B10/21/2022
12/13/2023 55C11/30/2022

 

There should only be 1 row populated from the query (should be the last row 11/30/2022); the other ones are "ghosts" as we only made 1 "Batch 1"

 

Any help on how to improve the SQL code to prevent duplicate entries & pick the entry only if the received date is the closest <= batch date?

 

This was the code from the SQLQuery interface that was given

 

SELECT DISTINCT t1.Batch, t1.Batch_Date AS Date, t1.Cylinder_Number,
t2.Lot_Number, t2.Cylinder_Number AS "Cylinder_Number 2", t2.Lot_Date AS "Date 2", MIN(t2."Batch Num") AS "Minimum-Batch Num"
FROM "Table 1" t1
LEFT OUTER JOIN "Table 2" t2
ON ( ( t1.Cylinder_Number = t2.Cylinder_Number ) AND ( t1.Cat_Date >= t2.Lot_Date ) )
GROUP BY t1.Batch, t1.Batch_Date, t1.Cylinder_Number, 
t2.Lot_Number, t2.Cylinder_Number, t2.Lot_Date;

5 REPLIES 5
jthi
Super User

Re: Remove Duplicate Entries from SQLQuery Based on Date

Are these JMP tables you are performing the SQL query to? Can you share the tables?

-Jarmo
Polygon0516
Level I

Re: Remove Duplicate Entries from SQLQuery Based on Date

Just replied to the main thread with the files, thanks for reaching out

jthi
Super User

Re: Remove Duplicate Entries from SQLQuery Based on Date

This is quite close to the limit on what I would attempt to do with JMP's limited SQL capabilities (at least when using data tables). It is most likely simpler to use SQL + JSL

Names Default To Here(1);

res = New SQL Query(
	Version(130),
	Connection("JMP"),
	JMP Tables(
		["t1" => "$DOWNLOADS\Table1-1.jmp",
		"t2" => "$DOWNLOADS\Table 2-2.jmp"]
	),
	QueryName("SQLQuery2"),
	CustomSQL(
		"\[SELECT 
			t1.Cylinder, 
			t1.Batch, 
			t1."Batch Date", 
			t2."Received Date",
			t2.Supplier, 
			t2.Lot
		FROM t1
			LEFT OUTER JOIN t2 ON 
				t1.Cylinder = t2.Cylinder
				AND t1."Batch Date" >= t2."Received Date"

	]\")
) << Run;


res << select where(Row() == Col Max(Row(), :Cylinder, :Batch) | Is Missing(:Lot));
res << Invert Row Selection;
res << delete rows;
-Jarmo
Polygon0516
Level I

Re: Remove Duplicate Entries from SQLQuery Based on Date

Hello - thank you for the quick help! I tried the macro & unfortunately for some items it is keeping the oldest possible date. It doesn't appear to be doing this for all entries though ( see last row of the table below).

 

So this is a definite improvement that there aren't multiple rows for the same batch, but it's not consistently pulling the right lot/received date for that batch in question.

 

Example

 

CylinderBatchBatch DateReceived DateSupplierLot Should Be Received Date
822738/9/20235/20/2022B55 6/22/2023
322748/10/20235/20/2022B55 6/22/2023
222758/11/20235/20/2022B55 6/22/2023
1022768/12/20235/20/2022B55 6/22/2023
622778/12/20235/20/2022B55 6/22/2023
922788/13/20235/20/2022B55 6/22/2023
122798/14/20235/20/2022B55 6/22/2023
21222808/15/20236/22/2023B74 (file is correct)
Polygon0516
Level I

Re: Remove Duplicate Entries from SQLQuery Based on Date

Here are the files - sorry, meant to attach these.

Table 1-1 and Table 2-2 are as described above.

Query Results shows the final table where there are multiple entries for each batch (but there should only be 1 entry per batch as described above).