cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Check out the JMP® Marketplace featured Capability Explorer add-in
Choose Language Hide Translation Bar
jsolo01
Level I

isolating the smallest date difference

i have a large dataset that has a static date on a subject (date of chest CT scan) and i need to find the smallest date difference and keep that row (delete the others).  jmp 16.2.0

 

Example below - i want to keep rows 2 and 4 (the smallest day difference for subject 1 and 2). 

 

subj ID.    CT date.      lung function date.      day difference

1               8/30/18            1/29/18                          -213  

1               8/30/18            8/31/18                              1

1               8/30/18            3/4/19                              186

2              12/19/17           11/13/18                          329

2              12/19/17            9/3/19                             623

2              12/19/17           10/4/21                            1385

3 REPLIES 3
WebDesignesCrow
Super User

Re: isolating the smallest date difference

Hi @jsolo01 ,

How about adding a new formula column "Col Rank" to rank the day difference according to subj ID.?

Then, you can select row "Col Rank" not equal to 0 & delete the rows.

For;

subj ID 1, the min day difference is the 2nd row.

subj ID 2, the min day difference is the 4th row

 

1) Create new formula column: Col Rank

WebDesignesCrow_1-1714007353565.png

WebDesignesCrow_2-1714007436165.png

 

2) Go to Rows > Row Selection > Select Where

WebDesignesCrow_3-1714007503396.png

3) Define the condition: Col Ran does not equal 1 > click "Add Condition"

WebDesignesCrow_4-1714007638691.png

WebDesignesCrow_5-1714007657925.png

 

4) Right click the selected rows > Delete Rows

WebDesignesCrow_6-1714007730940.png

 

5) Final result as below

WebDesignesCrow_7-1714007750159.png

 

Hope it helps.

If it's a repetitive analysis, you can automate the steps through workflow or JSL

 

txnelson
Super User

Re: isolating the smallest date difference

Here is my take on a simple way to get what you are asking for;  Just a simple script

Names Default To Here( 1 );
dt = Current Data Table();
dt << select where( :day difference == Col Min( :day difference, :subject ID ) );

dt << subset( selected columns( 0 ), selected rows( 1 ) );
Jim
jthi
Super User

Re: isolating the smallest date difference

Just to extend a little on @WebDesignesCrow interactive solution:

 

If you don't already have the difference column, you can create it interactively (select both of your dates, right click on the column header and find Difference (reverse order). This will create a new column where the differences are in seconds. If you want to see them in yours/days you can double click on the created column and modify it as needed (such as add division by In Days(1)) but seconds work for this case

jthi_1-1714020133160.png

Then right click on this new column's header and make absolute value column out of it

jthi_5-1714020507585.png

After you have that column, right click on the subjectid column and make it grouping column

jthi_3-1714020241035.png

next you can right click on the date difference column and create rank formula (note the grouping)

jthi_8-1714020609139.png

from the final column, right click on one of the "ones", select matching cells

jthi_7-1714020569347.png

From here you can either create subset using selected rows

jthi_9-1714020648045.png

OR right click on the row list, invert selection and after inversion delete rows

jthi_10-1714020672795.png

jthi_11-1714020685944.png

jthi_12-1714020694899.png

Then if you need to repeat this you can either make a workflow or capture the script from enhanced log

 

Below is script created by Workflow builder. Only thing it is missing is opening the table as I don't have the table saved (and some column names are incorrect for the same reason) so I added table creation script (workflow is also attached).

View more...
Names Default To Here(1);

//Custom Action - Create table 'Untitled 2'
dt = New Table("Untitled 2",
	Add Rows(6),
	Compress File When Saved(1),
	New Column("subj", Numeric, "Continuous", Format("Best", 12), Set Values([1, 1, 1, 2, 2, 2])),
	New Column("day1",
		Numeric,
		"Continuous",
		Format("m/d/y", 12),
		Input Format("m/d/y"),
		Set Values([3618432000, 3618432000, 3618432000, 3596486400, 3596486400, 3596486400])
	),
	New Column("day2",
		Numeric,
		"Continuous",
		Format("m/d/y", 12),
		Input Format("m/d/y"),
		Set Values([3600028800, 3618518400, 3634502400, 3624912000, 3650313600, 3716150400])
	)
);;

//New formula column: day2-day1
Data Table("Untitled 2") << New Formula Column(
	Operation(Category("Combine"), "Difference (reverse order)"),
	Columns(:day1, :day2)
);

//New formula column: Abs[day2-day1]
Data Table("Untitled 2") << New Formula Column(
	Operation(Category("Transform"), "Absolute Value"),
	Columns(:"day2-day1"n)
);

//New formula column: Rank[Abs[day2-day1]][subj]
Data Table("Untitled 2") << New Formula Column(
	Operation(Category("Distributional"), "Rank"),
	Columns(:"Abs[day2-day1]"n),
	Group By(:subj)
);

//Select matching cells
Data Table("Untitled 2") << Select Where(:"Rank[Abs[day2-day1]][subj]"n == 1);

//Subset data table
Data Table("Untitled 2") << Select Where(:"Rank[Abs[day2-day1]][subj]"n == 1) <<
Subset(Selected Rows(1), Selected columns only(0));

 

 

-Jarmo