date differences in two datasets

Report Inappropriate Content · Sep 29, 2023 02:22 PM

Hi,

I have two datasets on a cohort of patients. One has dates of test “a” and the other has dates of test “b”. the number of tests a and b on a subject vary (i.e., some have a lot of a and few b or visa versa)

I need to find all instances where tests a and b are within a timeframe (90 days for this project). How could I automate the comparison of all combinations of test a and b to find the tests within this parameter? Stated differently, how do I compare every test a date against all test b dates to caculate a date difference (and isolate those <90d)? if I just get a list of all the date-diff combinations I can easily sort to find those <90d

Example below

DATASET1

Subject # Test A date

1 1/1/23

1 2/1/23

1 3/1/23

1 4/1/23

2 2/1/10

2 5/1/10

DATASET2

Subject # Test B date

1 1/1/14

1 6/1/23

1 11/1/22

2 1/1/10

2 1/1/23

2 1/5/23

RESULTS
subject test a date test b date date diff

1 1/1/23 11/1/22 60d

1 3/1/23 6/1/23 90d

2 2/1/10 1/1/10 30d

dale_lehman · Sep 29, 2023 03:37 PM

I would try joining the data sets by subject, and then using the datedifference function to calculate the differences between tests A and tests B. It is a little unclear to me whether you want all of the differences for all pairs of tests or just the minimum or maximum difference. If the minimum or maximum, you should be able to create new columns for the earliest and/or latest dates for tests A and B before calculating the date difference. If you want all the differences, then you may need to restructure your data set (perhaps by stacking the test A and test B columns after joining the datasets) before calculating the date differences.

dale_lehman · Sep 29, 2023 03:47 PM

Attached are the 3 data sets for your example above. I'm not sure what you mean by your Results table - there are more matches than you are showing. So, if it is an abbreviated from of what I attached, then this should work for you. If you really only wanted those 3 matches and differences shown in your Results, then I'm not understanding how you are choosing which matches you want.

jsolo01 · Sep 29, 2023 04:40 PM

my goal is to isolate subjects to include in a study who have tests a and b done within 90 days of each other. My results only lists the pair of tests with <=90d in between as that would be the desired result.

dale_lehman · Sep 29, 2023 05:44 PM

So either my solution or txnelson's will work (he's the coding wizard, I just use my mouse). Once you have the date differences calculated just choose those < 90 days or create a column identifying those <90 with an indicator (e.g., 1 vs 0).

txnelson · Sep 29, 2023 04:01 PM

I believe the

Tables=>Join will do what you want.

Here is a script that creates your 2 sample tables, and then Joins them together and highlights all of the rows with date differences greater than 90.

Names Default To Here( 1 );
dt1 = New Table( "Dataset1",
	Add Rows( 6 ),
	New Column( "Subject",
		Character,
		"Nominal",
		Set Property( "Value Order", {Numerical Order( 0 )} ),
		Set Values( {"1", "1", "1", "1", "2", "2"} )
	),
	New Column( "Test A Date",
		Numeric,
		"Nominal",
		Format( "m/d/y", 12 ),
		Input Format( "m/d/y" ),
		Set Property( "Value Order", {Numerical Order( 0 )} ),
		Set Selected,
		Set Values( [3755376000, 3758054400, 3760473600, 3763152000, 3347827200, 3355516800] )
	)
);
dt2 = New Table( "Dataset2",
	Add Rows( 6 ),
	New Column( "Subject",
		Character,
		"Nominal",
		Set Property( "Value Order", {Numerical Order( 0 )} ),
		Set Values( {"1", "1", "1", "2", "2", "2"} )
	),
	New Column( "Test B Date",
		Numeric,
		"Nominal",
		Format( "m/d/y", 12 ),
		Input Format( "m/d/y" ),
		Set Property( "Value Order", {Numerical Order( 0 )} ),
		Set Selected,
		Set Values( [3471379200, 3768422400, 3750105600, 3345148800, 3755376000, 3755721600] )
	),
	Set Row States( [1, 0, 0, 0, 0, 0] )
);


// The code below is the code that does the work.  
dtFinal = dt1 << Join(
	With( dt2 ),
	Merge Same Name Columns,
	Match Flag( 0 ),
	By Matching Columns( :Subject = :Subject ),
	Drop multiples( 0, 0 ),
	Include Nonmatches( 0, 0 ),
	Preserve main table order( 1 ),
	Output Table( "Final" )
);

dtFinal << New Column( "Date Diff in Days", formula( Abs( :Test A Date - :Test B Date ) / In Days( 1 ) ) );

dtFinal << select where(:Date Diff in Days > 90 );

Jim

dale_lehman · Sep 29, 2023 04:14 PM

I got the same thing as you, except for 2 items: I calculated the difference in months rather than days (an oversight - days makes more sense), and you used the absolute value of the difference. That factor is unclear from the request - I would normally think of the tests as sequential (A preceding B), but given that a number of the B tests preceded the A tests, I guess the absolute value makes more sense.

jthi · Sep 30, 2023 02:41 AM

I have also written an add-in which could be used for this (it is clunky to use and automating it might be a bit difficult, but it should be possible). Join Nearest Rows - Join tables by using continuous columns and nearest values . I hope that I have time to rewrite this at some point

Tolerance is set to 7776000 as JMP datetimes are in seconds. Join direction is Nearest as we can join backwards or forwards

and from result you can remove extra rows with no matches

After some cleanup

If you know SQL you could use that

Query(
	Table(dt1, "dt1"),
	Table(dt2, "dt2"), 
	"\[SELECT dt1.*, dt2."Test B Date", Abs(dt1."Test A Date" - dt2."Test B Date") / (24*60*60) DeltaDays
		FROM dt1
		join dt2 on dt1.subject = dt2.subject
		where
			Abs(dt1."Test A Date" - dt2."Test B Date") <= 90*24*60*60
	]\"
);

But in this case I think using JMP's interactive solution and taking the script JMP creates for you (join by subject, add new column based on date differences (right click menu can do this), add second column to check if the difference is small enough, perform selection based on that and remove extra rows). Both @txnelson and @dale_lehman did more or less suggest this

Join:

Adding first cormula column

Adding second formula

Select 0 rows and delete those

Final result

JMP created script (this should be cleaned up a bit, good post regarding that )

-Jarmo

date differences in two datasets

Re: date differences in two datasets

Re: date differences in two datasets

Re: date differences in two datasets

Re: date differences in two datasets

Re: date differences in two datasets

Re: date differences in two datasets

Re: date differences in two datasets

Recommended Articles

Get Going with JMP: Essentials for Using JMP

Getting Started with JMP: On Demand Course

Hiding and Excluding Data