- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
date differences in two datasets
Hi,
I have two datasets on a cohort of patients. One has dates of test “a” and the other has dates of test “b”. the number of tests a and b on a subject vary (i.e., some have a lot of a and few b or visa versa)
I need to find all instances where tests a and b are within a timeframe (90 days for this project). How could I automate the comparison of all combinations of test a and b to find the tests within this parameter? Stated differently, how do I compare every test a date against all test b dates to caculate a date difference (and isolate those <90d)? if I just get a list of all the date-diff combinations I can easily sort to find those <90d
Example below
DATASET1
Subject # Test A date
1 1/1/23
1 2/1/23
1 3/1/23
1 4/1/23
2 2/1/10
2 5/1/10
DATASET2
Subject # Test B date
1 1/1/14
1 6/1/23
1 11/1/22
2 1/1/10
2 1/1/23
2 1/5/23
RESULTS
subject test a date test b date date diff
1 1/1/23 11/1/22 60d
1 3/1/23 6/1/23 90d
2 2/1/10 1/1/10 30d
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: date differences in two datasets
I would try joining the data sets by subject, and then using the datedifference function to calculate the differences between tests A and tests B. It is a little unclear to me whether you want all of the differences for all pairs of tests or just the minimum or maximum difference. If the minimum or maximum, you should be able to create new columns for the earliest and/or latest dates for tests A and B before calculating the date difference. If you want all the differences, then you may need to restructure your data set (perhaps by stacking the test A and test B columns after joining the datasets) before calculating the date differences.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: date differences in two datasets
Attached are the 3 data sets for your example above. I'm not sure what you mean by your Results table - there are more matches than you are showing. So, if it is an abbreviated from of what I attached, then this should work for you. If you really only wanted those 3 matches and differences shown in your Results, then I'm not understanding how you are choosing which matches you want.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: date differences in two datasets
my goal is to isolate subjects to include in a study who have tests a and b done within 90 days of each other. My results only lists the pair of tests with <=90d in between as that would be the desired result.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: date differences in two datasets
So either my solution or txnelson's will work (he's the coding wizard, I just use my mouse). Once you have the date differences calculated just choose those < 90 days or create a column identifying those <90 with an indicator (e.g., 1 vs 0).
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: date differences in two datasets
I believe the
Tables=>Join will do what you want.
Here is a script that creates your 2 sample tables, and then Joins them together and highlights all of the rows with date differences greater than 90.
Names Default To Here( 1 );
dt1 = New Table( "Dataset1",
Add Rows( 6 ),
New Column( "Subject",
Character,
"Nominal",
Set Property( "Value Order", {Numerical Order( 0 )} ),
Set Values( {"1", "1", "1", "1", "2", "2"} )
),
New Column( "Test A Date",
Numeric,
"Nominal",
Format( "m/d/y", 12 ),
Input Format( "m/d/y" ),
Set Property( "Value Order", {Numerical Order( 0 )} ),
Set Selected,
Set Values( [3755376000, 3758054400, 3760473600, 3763152000, 3347827200, 3355516800] )
)
);
dt2 = New Table( "Dataset2",
Add Rows( 6 ),
New Column( "Subject",
Character,
"Nominal",
Set Property( "Value Order", {Numerical Order( 0 )} ),
Set Values( {"1", "1", "1", "2", "2", "2"} )
),
New Column( "Test B Date",
Numeric,
"Nominal",
Format( "m/d/y", 12 ),
Input Format( "m/d/y" ),
Set Property( "Value Order", {Numerical Order( 0 )} ),
Set Selected,
Set Values( [3471379200, 3768422400, 3750105600, 3345148800, 3755376000, 3755721600] )
),
Set Row States( [1, 0, 0, 0, 0, 0] )
);
// The code below is the code that does the work.
dtFinal = dt1 << Join(
With( dt2 ),
Merge Same Name Columns,
Match Flag( 0 ),
By Matching Columns( :Subject = :Subject ),
Drop multiples( 0, 0 ),
Include Nonmatches( 0, 0 ),
Preserve main table order( 1 ),
Output Table( "Final" )
);
dtFinal << New Column( "Date Diff in Days", formula( Abs( :Test A Date - :Test B Date ) / In Days( 1 ) ) );
dtFinal << select where(:Date Diff in Days > 90 );
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: date differences in two datasets
I got the same thing as you, except for 2 items: I calculated the difference in months rather than days (an oversight - days makes more sense), and you used the absolute value of the difference. That factor is unclear from the request - I would normally think of the tests as sequential (A preceding B), but given that a number of the B tests preceded the A tests, I guess the absolute value makes more sense.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: date differences in two datasets
I have also written an add-in which could be used for this (it is clunky to use and automating it might be a bit difficult, but it should be possible). Join Nearest Rows - Join tables by using continuous columns and nearest values . I hope that I have time to rewrite this at some point
Tolerance is set to 7776000 as JMP datetimes are in seconds. Join direction is Nearest as we can join backwards or forwards
and from result you can remove extra rows with no matches
After some cleanup
If you know SQL you could use that
Query(
Table(dt1, "dt1"),
Table(dt2, "dt2"),
"\[SELECT dt1.*, dt2."Test B Date", Abs(dt1."Test A Date" - dt2."Test B Date") / (24*60*60) DeltaDays
FROM dt1
join dt2 on dt1.subject = dt2.subject
where
Abs(dt1."Test A Date" - dt2."Test B Date") <= 90*24*60*60
]\"
);
But in this case I think using JMP's interactive solution and taking the script JMP creates for you (join by subject, add new column based on date differences (right click menu can do this), add second column to check if the difference is small enough, perform selection based on that and remove extra rows). Both @txnelson and @dale_lehman did more or less suggest this
Join:
Adding first cormula column
Adding second formula
Select 0 rows and delete those
Final result
JMP created script (this should be cleaned up a bit, good post regarding that