Discussions

jsolo01 · Apr 24, 2024 06:27 PM

i have a large dataset that has a static date on a subject (date of chest CT scan) and i need to find the smallest date difference and keep that row (delete the others). jmp 16.2.0

Example below - i want to keep rows 2 and 4 (the smallest day difference for subject 1 and 2).

subj ID. CT date. lung function date. day difference

1 8/30/18 1/29/18 -213

1 8/30/18 8/31/18 1

1 8/30/18 3/4/19 186

2 12/19/17 11/13/18 329

2 12/19/17 9/3/19 623

2 12/19/17 10/4/21 1385

WebDesignesCrow · Apr 24, 2024 09:16 PM

Hi @jsolo01 ,

How about adding a new formula column "Col Rank" to rank the day difference according to subj ID.?

Then, you can select row "Col Rank" not equal to 0 & delete the rows.

For;

subj ID 1, the min day difference is the 2nd row.

subj ID 2, the min day difference is the 4th row

1) Create new formula column: Col Rank

2) Go to Rows > Row Selection > Select Where

3) Define the condition: Col Ran does not equal 1 > click "Add Condition"

4) Right click the selected rows > Delete Rows

5) Final result as below

Hope it helps.

If it's a repetitive analysis, you can automate the steps through workflow or JSL

txnelson · Apr 24, 2024 09:54 PM

Here is my take on a simple way to get what you are asking for; Just a simple script

Names Default To Here( 1 );
dt = Current Data Table();
dt << select where( :day difference == Col Min( :day difference, :subject ID ) );

dt << subset( selected columns( 0 ), selected rows( 1 ) );

Jim

jthi · Apr 25, 2024 01:00 AM

Just to extend a little on @WebDesignesCrow interactive solution:

If you don't already have the difference column, you can create it interactively (select both of your dates, right click on the column header and find Difference (reverse order). This will create a new column where the differences are in seconds. If you want to see them in yours/days you can double click on the created column and modify it as needed (such as add division by In Days(1)) but seconds work for this case

Then right click on this new column's header and make absolute value column out of it

After you have that column, right click on the subjectid column and make it grouping column

next you can right click on the date difference column and create rank formula (note the grouping)

from the final column, right click on one of the "ones", select matching cells

From here you can either create subset using selected rows

OR right click on the row list, invert selection and after inversion delete rows

Then if you need to repeat this you can either make a workflow or capture the script from enhanced log

Below is script created by Workflow builder. Only thing it is missing is opening the table as I don't have the table saved (and some column names are incorrect for the same reason) so I added table creation script (workflow is also attached).

Names Default To Here(1);

//Custom Action - Create table 'Untitled 2'
dt = New Table("Untitled 2",
	Add Rows(6),
	Compress File When Saved(1),
	New Column("subj", Numeric, "Continuous", Format("Best", 12), Set Values([1, 1, 1, 2, 2, 2])),
	New Column("day1",
		Numeric,
		"Continuous",
		Format("m/d/y", 12),
		Input Format("m/d/y"),
		Set Values([3618432000, 3618432000, 3618432000, 3596486400, 3596486400, 3596486400])
	),
	New Column("day2",
		Numeric,
		"Continuous",
		Format("m/d/y", 12),
		Input Format("m/d/y"),
		Set Values([3600028800, 3618518400, 3634502400, 3624912000, 3650313600, 3716150400])
	)
);;

//New formula column: day2-day1
Data Table("Untitled 2") << New Formula Column(
	Operation(Category("Combine"), "Difference (reverse order)"),
	Columns(:day1, :day2)
);

//New formula column: Abs[day2-day1]
Data Table("Untitled 2") << New Formula Column(
	Operation(Category("Transform"), "Absolute Value"),
	Columns(:"day2-day1"n)
);

//New formula column: Rank[Abs[day2-day1]][subj]
Data Table("Untitled 2") << New Formula Column(
	Operation(Category("Distributional"), "Rank"),
	Columns(:"Abs[day2-day1]"n),
	Group By(:subj)
);

//Select matching cells
Data Table("Untitled 2") << Select Where(:"Rank[Abs[day2-day1]][subj]"n == 1);

//Subset data table
Data Table("Untitled 2") << Select Where(:"Rank[Abs[day2-day1]][subj]"n == 1) <<
Subset(Selected Rows(1), Selected columns only(0));

Names Default To Here(1); //Custom Action - Create table 'Untitled 2' dt = New Table("Untitled 2", Add Rows(6), Compress File When Saved(1), New Column("subj", Numeric, "Continuous", Format("Best", 12), Set Values([1, 1, 1, 2, 2, 2])), New Column("day1", Numeric, "Continuous", Format("m/d/y", 12), Input Format("m/d/y"), Set Values([3618432000, 3618432000, 3618432000, 3596486400, 3596486400, 3596486400]) ), New Column("day2", Numeric, "Continuous", Format("m/d/y", 12), Input Format("m/d/y"), Set Values([3600028800, 3618518400, 3634502400, 3624912000, 3650313600, 3716150400]) ) );; //New formula column: day2-day1 Data Table("Untitled 2") << New Formula Column( Operation(Category("Combine"), "Difference (reverse order)"), Columns(:day1, :day2) ); //New formula column: Abs[day2-day1] Data Table("Untitled 2") << New Formula Column( Operation(Category("Transform"), "Absolute Value"), Columns(:"day2-day1"n) ); //New formula column: Rank[Abs[day2-day1]][subj] Data Table("Untitled 2") << New Formula Column( Operation(Category("Distributional"), "Rank"), Columns(:"Abs[day2-day1]"n), Group By(:subj) ); //Select matching cells Data Table("Untitled 2") << Select Where(:"Rank[Abs[day2-day1]][subj]"n == 1); //Subset data table Data Table("Untitled 2") << Select Where(:"Rank[Abs[day2-day1]][subj]"n == 1) << Subset(Selected Rows(1), Selected columns only(0));

-Jarmo

Discussions

isolating the smallest date difference

Re: isolating the smallest date difference

Re: isolating the smallest date difference

Re: isolating the smallest date difference

Recommended Articles