cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Choose Language Hide Translation Bar
jsolo01
Level I

date differences in two datasets

Hi,

 

I have two datasets on a cohort of patients.  One has dates of test “a” and the other has dates of test “b”.  the number of tests a and b on a subject vary (i.e., some have a lot of a and few b or visa versa)

 

I need to find all instances where tests a and b are within a timeframe (90 days for this project).  How could I automate the comparison of all combinations of test a and b to find the tests within this parameter?  Stated differently, how do I compare every test a date against all test b dates to caculate a date difference (and isolate those <90d)?  if I just get a list of all the date-diff combinations I can easily sort to find those <90d

 

Example below

 

DATASET1

Subject #         Test A date                             

1                      1/1/23                                    

1                      2/1/23

1                      3/1/23

1                      4/1/23

2                      2/1/10

2                      5/1/10

 

 

DATASET2

Subject #         Test B date

1                      1/1/14

1                      6/1/23

1                      11/1/22

2                      1/1/10

2                      1/1/23

2                      1/5/23

 

RESULTS
subject            test a date       test b date       date diff         

1                      1/1/23             11/1/22           60d

1                      3/1/23             6/1/23             90d

2                      2/1/10             1/1/10             30d     

7 REPLIES 7
dale_lehman
Level VII

Re: date differences in two datasets

I would try joining the data sets by subject, and then using the datedifference function to calculate the differences between tests A and tests B.  It is a little unclear to me whether you want all of the differences for all pairs of tests or just the minimum or maximum difference.  If the minimum or maximum, you should be able to create new columns for the earliest and/or latest dates for tests A and B before calculating the date difference.  If you want all the differences, then you may need to restructure your data set (perhaps by stacking the test A and test B columns after joining the datasets) before calculating the date differences.

dale_lehman
Level VII

Re: date differences in two datasets

Attached are the 3 data sets for your example above.  I'm not sure what you mean by your Results table - there are more matches than you are showing.  So, if it is an abbreviated from of what I attached, then this should work for you.  If you really only wanted those 3 matches and differences shown in your Results, then I'm not understanding how you are choosing which matches you want.

jsolo01
Level I

Re: date differences in two datasets

my goal is to isolate subjects to include in a study who have tests a and b done within 90 days of each other.  My results only lists the pair of tests with <=90d in between as that would be the desired result.

dale_lehman
Level VII

Re: date differences in two datasets

So either my solution or txnelson's will work (he's the coding wizard, I just use my mouse).  Once you have the date differences calculated just choose those < 90 days or create a column identifying those <90 with an indicator (e.g., 1 vs 0).

txnelson
Super User

Re: date differences in two datasets

I believe the 

     Tables=>Join will do what you want.

txnelson_0-1696017593695.png

 

Here is a script that creates your 2 sample tables, and then Joins them together and highlights all of the rows with date differences greater than 90.  

Names Default To Here( 1 );
dt1 = New Table( "Dataset1",
	Add Rows( 6 ),
	New Column( "Subject",
		Character,
		"Nominal",
		Set Property( "Value Order", {Numerical Order( 0 )} ),
		Set Values( {"1", "1", "1", "1", "2", "2"} )
	),
	New Column( "Test A Date",
		Numeric,
		"Nominal",
		Format( "m/d/y", 12 ),
		Input Format( "m/d/y" ),
		Set Property( "Value Order", {Numerical Order( 0 )} ),
		Set Selected,
		Set Values( [3755376000, 3758054400, 3760473600, 3763152000, 3347827200, 3355516800] )
	)
);
dt2 = New Table( "Dataset2",
	Add Rows( 6 ),
	New Column( "Subject",
		Character,
		"Nominal",
		Set Property( "Value Order", {Numerical Order( 0 )} ),
		Set Values( {"1", "1", "1", "2", "2", "2"} )
	),
	New Column( "Test B Date",
		Numeric,
		"Nominal",
		Format( "m/d/y", 12 ),
		Input Format( "m/d/y" ),
		Set Property( "Value Order", {Numerical Order( 0 )} ),
		Set Selected,
		Set Values( [3471379200, 3768422400, 3750105600, 3345148800, 3755376000, 3755721600] )
	),
	Set Row States( [1, 0, 0, 0, 0, 0] )
);


// The code below is the code that does the work.  
dtFinal = dt1 << Join(
	With( dt2 ),
	Merge Same Name Columns,
	Match Flag( 0 ),
	By Matching Columns( :Subject = :Subject ),
	Drop multiples( 0, 0 ),
	Include Nonmatches( 0, 0 ),
	Preserve main table order( 1 ),
	Output Table( "Final" )
);

dtFinal << New Column( "Date Diff in Days", formula( Abs( :Test A Date - :Test B Date ) / In Days( 1 ) ) );

dtFinal << select where(:Date Diff in Days > 90 );

 

Jim
dale_lehman
Level VII

Re: date differences in two datasets

I got the same thing as you, except for 2 items:  I calculated the difference in months rather than days (an oversight - days makes more sense), and you used the absolute value of the difference.  That factor is unclear from the request - I would normally think of the tests as sequential (A preceding B), but given that a number of the B tests preceded the A tests, I guess the absolute value makes more sense.

jthi
Super User

Re: date differences in two datasets

I have also written an add-in which could be used for this (it is clunky to use and automating it might be a bit difficult, but it should be possible). Join Nearest Rows - Join tables by using continuous columns and nearest values . I hope that I have time to rewrite this at some point

Tolerance is set to 7776000 as JMP datetimes are in seconds. Join direction is Nearest as we can join backwards or forwards

jthi_0-1696054369216.png

and from result you can remove extra rows with no matches

jthi_1-1696054423776.png

After some cleanup

jthi_2-1696054577787.png

 

If you know SQL you could use that

Query(
	Table(dt1, "dt1"),
	Table(dt2, "dt2"), 
	"\[SELECT dt1.*, dt2."Test B Date", Abs(dt1."Test A Date" - dt2."Test B Date") / (24*60*60) DeltaDays
		FROM dt1
		join dt2 on dt1.subject = dt2.subject
		where
			Abs(dt1."Test A Date" - dt2."Test B Date") <= 90*24*60*60
	]\"
);

 

But in this case I think using JMP's interactive solution and taking the script JMP creates for you (join by subject, add new column based on date differences (right click menu can do this), add second column to check if the difference is small enough, perform selection based on that and remove extra rows). Both @txnelson and @dale_lehman did more or less suggest this

Join:

jthi_3-1696055513159.png

Adding first cormula column

jthi_4-1696055545288.png

Adding second formula

jthi_5-1696055640618.png

Select 0 rows and delete those

jthi_6-1696055666118.png

Final result

jthi_8-1696055713523.png

JMP created script (this should be cleaned up a bit, good post regarding that Better JSL Smoother Migration: How To Fix Your Own Code​ )

jthi_7-1696055708813.png

-Jarmo