cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Browse apps to extend the software in the new JMP Marketplace
Choose Language Hide Translation Bar
doraemengs
Level III

How can I select rows based on multiple conditions for the same ID?

Hello, I need assistance in cleaning up some data. The data in the table contain multiple IDs, dates, and categories. I am interested in IDs with category 1 and 2, or 1 and 2 and 3, or 1 and 3, while retaining only the entries with the maximum date for each category. Could you please provide the best script for this solution? Thank you!

IDcategorydatecostAcostBcostCcostD
111/21/16500003000500
111/21/1610005000100000
111/21/162000001050
121/22/1630000050000
132/1/161109000
214/2/130100100005000
224/3/13000642.49
214/2/131109000
323/6/155000259.58
1 ACCEPTED SOLUTION

Accepted Solutions
txnelson
Super User

Re: How can I select rows based on multiple conditions for the same ID?

My example from above had Category as a character column.  Your example is numeric.  I also changed the logic to ensure that there is at least categories 1&2, 1&3 or 1&2&3

Names Default To Here( 1 );
dt = Current Data Table();

For Each Row(
	If( :date == Col Maximum( :date, :ID ),
		currID = :ID;
		cats = Associative Array( dt:category[dt << get rows where( :ID == currID )] ) << get keys;
		show(cats);
		If( Length( cats ) > 1,
			If( cats[1] == 1 & (cats[2] == 2 | cats[2] == 3),
				dt << select rows( Row() )
			)
		);
	)
);

dtNew = dt << subset( selected columns( 0 ), selected rows( 1 ), output table( "Final" ) );
Jim

View solution in original post

4 REPLIES 4
txnelson
Super User

Re: How can I select rows based on multiple conditions for the same ID?

Here is a little script that should work.  It finds the rows that meet your definition.  Those rows are then subsetted into a new table.

Names Default To Here( 1 );
dt = Current Data Table();

For Each Row(
	If( :date == Col Maximum( :date, :ID ),
		currID = :ID;
		//currID="1"
		cats = Associative Array( dt:category[dt << get rows where( :ID == currID )] ) << get keys;
		If( cats[1] == "1" & Length( cats ) > 1,
			dt << select rows( Row() )
		);
	)
);

dtNew = dt << subset( selected columns(0), selected rows(1),
	output table("Final")
);

// The above code will allow multiple rows within a given ID, when there are 
// multiple rows with the same maximum date.  Run the below code to 
// get rid of the multiple entries, if that is what is desired

dtNew << select duplicate rows(Match(:ID));
dtNew << delete rows;

txnelson_0-1711471791138.png

 

txnelson_2-1711471846520.png

If the last 2 lines of the above script are run, then the duplicate ID rows are deleted.

txnelson_3-1711471972582.png

 

 

Jim
doraemengs
Level III

Re: How can I select rows based on multiple conditions for the same ID?

ello Jim,

Thank you for your help. I apologize for any confusion. I need each ID that must have category 1 and 2 and/or 3 in the final table. If there are repeated IDs with the same category, I need the row that has the maximum date. The table should be provided here. Thank you for your help in advance.

IDcategorydatecostAcostBcostCcostD
111/21/162000001050
121/22/1630000050000
214/2/130100100005000
224/3/13000642.49
313/6/15300000817
323/6/155000259.58
413/6/151000000710.16
423/6/155000001087
433/16/15000326.7
txnelson
Super User

Re: How can I select rows based on multiple conditions for the same ID?

My example from above had Category as a character column.  Your example is numeric.  I also changed the logic to ensure that there is at least categories 1&2, 1&3 or 1&2&3

Names Default To Here( 1 );
dt = Current Data Table();

For Each Row(
	If( :date == Col Maximum( :date, :ID ),
		currID = :ID;
		cats = Associative Array( dt:category[dt << get rows where( :ID == currID )] ) << get keys;
		show(cats);
		If( Length( cats ) > 1,
			If( cats[1] == 1 & (cats[2] == 2 | cats[2] == 3),
				dt << select rows( Row() )
			)
		);
	)
);

dtNew = dt << subset( selected columns( 0 ), selected rows( 1 ), output table( "Final" ) );
Jim
jthi
Super User

Re: How can I select rows based on multiple conditions for the same ID?

Here is one way you could approach this using Col-functions

Names Default To Here(1);

dt = New Table("Untitled 3",
	Add Rows(11),
	Compress File When Saved(1),
	New Column("ID", Numeric, "Continuous", Format("Best", 12), Set Values([1, 1, 1, 2, 3, 3, 4, 4, 4, 4, 4])),
	New Column("Category", Character, "Nominal", Set Values({"A", "A", "B", "A", "A", "C", "A", "A", "B", "B", "C"})),
	New Column("Date", Numeric, "Continuous", Format("Best", 12), Set Values([1, 1, 2, 1, 1, 2, 1, 2, 1, 2, 1]))
);

dt << New Column("KEEP", Numeric, Nominal, Formula(
	Col Sum(Row() == Col Min(Row(), :category, :id), :id) > 1 & :date == Col Max(:date, :category, :id)
));

Change the formula to << Select Where / << Get Rows Where or use it with subset to take the rows you want to keep.

jthi_0-1711515810659.png

 

Edit:

Original version didn't take into account that category 1 could be missing (this assumes that category is numerical column)

New Column("KEEP", Numeric, Nominal, Formula(
	Col Sum(Row() == Col Min(Row(), :category, :id), :id) > 1 
	& :date == Col Max(:date, :category, :id) 
	& Col Sum(If(:category == 1, 1, .), :id)
));
-Jarmo