cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
  • Sign-in to the JMP Community will be unavailable intermittently Dec. 6-7 due to a system update. Thank you for your understanding!
  • We’re retiring the File Exchange at the end of this year. The JMP Marketplace is now your destination for add-ins and extensions.
  • JMP 19 is here! Learn more about the new features.

Discussions

Solve problems, and share tips and tricks with other JMP users.
Choose Language Hide Translation Bar
doraemengs
Level III

How can I select rows based on multiple conditions for the same ID?

Hello, I need assistance in cleaning up some data. The data in the table contain multiple IDs, dates, and categories. I am interested in IDs with category 1 and 2, or 1 and 2 and 3, or 1 and 3, while retaining only the entries with the maximum date for each category. Could you please provide the best script for this solution? Thank you!

IDcategorydatecostAcostBcostCcostD
111/21/16500003000500
111/21/1610005000100000
111/21/162000001050
121/22/1630000050000
132/1/161109000
214/2/130100100005000
224/3/13000642.49
214/2/131109000
323/6/155000259.58
1 ACCEPTED SOLUTION

Accepted Solutions
txnelson
Super User

Re: How can I select rows based on multiple conditions for the same ID?

My example from above had Category as a character column.  Your example is numeric.  I also changed the logic to ensure that there is at least categories 1&2, 1&3 or 1&2&3

Names Default To Here( 1 );
dt = Current Data Table();

For Each Row(
	If( :date == Col Maximum( :date, :ID ),
		currID = :ID;
		cats = Associative Array( dt:category[dt << get rows where( :ID == currID )] ) << get keys;
		show(cats);
		If( Length( cats ) > 1,
			If( cats[1] == 1 & (cats[2] == 2 | cats[2] == 3),
				dt << select rows( Row() )
			)
		);
	)
);

dtNew = dt << subset( selected columns( 0 ), selected rows( 1 ), output table( "Final" ) );
Jim

View solution in original post

4 REPLIES 4
txnelson
Super User

Re: How can I select rows based on multiple conditions for the same ID?

Here is a little script that should work.  It finds the rows that meet your definition.  Those rows are then subsetted into a new table.

Names Default To Here( 1 );
dt = Current Data Table();

For Each Row(
	If( :date == Col Maximum( :date, :ID ),
		currID = :ID;
		//currID="1"
		cats = Associative Array( dt:category[dt << get rows where( :ID == currID )] ) << get keys;
		If( cats[1] == "1" & Length( cats ) > 1,
			dt << select rows( Row() )
		);
	)
);

dtNew = dt << subset( selected columns(0), selected rows(1),
	output table("Final")
);

// The above code will allow multiple rows within a given ID, when there are 
// multiple rows with the same maximum date.  Run the below code to 
// get rid of the multiple entries, if that is what is desired

dtNew << select duplicate rows(Match(:ID));
dtNew << delete rows;

txnelson_0-1711471791138.png

 

txnelson_2-1711471846520.png

If the last 2 lines of the above script are run, then the duplicate ID rows are deleted.

txnelson_3-1711471972582.png

 

 

Jim
doraemengs
Level III

Re: How can I select rows based on multiple conditions for the same ID?

ello Jim,

Thank you for your help. I apologize for any confusion. I need each ID that must have category 1 and 2 and/or 3 in the final table. If there are repeated IDs with the same category, I need the row that has the maximum date. The table should be provided here. Thank you for your help in advance.

IDcategorydatecostAcostBcostCcostD
111/21/162000001050
121/22/1630000050000
214/2/130100100005000
224/3/13000642.49
313/6/15300000817
323/6/155000259.58
413/6/151000000710.16
423/6/155000001087
433/16/15000326.7
txnelson
Super User

Re: How can I select rows based on multiple conditions for the same ID?

My example from above had Category as a character column.  Your example is numeric.  I also changed the logic to ensure that there is at least categories 1&2, 1&3 or 1&2&3

Names Default To Here( 1 );
dt = Current Data Table();

For Each Row(
	If( :date == Col Maximum( :date, :ID ),
		currID = :ID;
		cats = Associative Array( dt:category[dt << get rows where( :ID == currID )] ) << get keys;
		show(cats);
		If( Length( cats ) > 1,
			If( cats[1] == 1 & (cats[2] == 2 | cats[2] == 3),
				dt << select rows( Row() )
			)
		);
	)
);

dtNew = dt << subset( selected columns( 0 ), selected rows( 1 ), output table( "Final" ) );
Jim
jthi
Super User

Re: How can I select rows based on multiple conditions for the same ID?

Here is one way you could approach this using Col-functions

Names Default To Here(1);

dt = New Table("Untitled 3",
	Add Rows(11),
	Compress File When Saved(1),
	New Column("ID", Numeric, "Continuous", Format("Best", 12), Set Values([1, 1, 1, 2, 3, 3, 4, 4, 4, 4, 4])),
	New Column("Category", Character, "Nominal", Set Values({"A", "A", "B", "A", "A", "C", "A", "A", "B", "B", "C"})),
	New Column("Date", Numeric, "Continuous", Format("Best", 12), Set Values([1, 1, 2, 1, 1, 2, 1, 2, 1, 2, 1]))
);

dt << New Column("KEEP", Numeric, Nominal, Formula(
	Col Sum(Row() == Col Min(Row(), :category, :id), :id) > 1 & :date == Col Max(:date, :category, :id)
));

Change the formula to << Select Where / << Get Rows Where or use it with subset to take the rows you want to keep.

jthi_0-1711515810659.png

 

Edit:

Original version didn't take into account that category 1 could be missing (this assumes that category is numerical column)

New Column("KEEP", Numeric, Nominal, Formula(
	Col Sum(Row() == Col Min(Row(), :category, :id), :id) > 1 
	& :date == Col Max(:date, :category, :id) 
	& Col Sum(If(:category == 1, 1, .), :id)
));
-Jarmo

Recommended Articles