cancel
Showing results for
Show  only  | Search instead for
Did you mean:
Choose Language Hide Translation Bar
Level I

## date differences in two datasets

Hi,

I have two datasets on a cohort of patients.  One has dates of test “a” and the other has dates of test “b”.  the number of tests a and b on a subject vary (i.e., some have a lot of a and few b or visa versa)

I need to find all instances where tests a and b are within a timeframe (90 days for this project).  How could I automate the comparison of all combinations of test a and b to find the tests within this parameter?  Stated differently, how do I compare every test a date against all test b dates to caculate a date difference (and isolate those <90d)?  if I just get a list of all the date-diff combinations I can easily sort to find those <90d

Example below

DATASET1

Subject #         Test A date

1                      1/1/23

1                      2/1/23

1                      3/1/23

1                      4/1/23

2                      2/1/10

2                      5/1/10

DATASET2

Subject #         Test B date

1                      1/1/14

1                      6/1/23

1                      11/1/22

2                      1/1/10

2                      1/1/23

2                      1/5/23

RESULTS
subject            test a date       test b date       date diff

1                      1/1/23             11/1/22           60d

1                      3/1/23             6/1/23             90d

2                      2/1/10             1/1/10             30d

7 REPLIES 7
Level VII

## Re: date differences in two datasets

I would try joining the data sets by subject, and then using the datedifference function to calculate the differences between tests A and tests B.  It is a little unclear to me whether you want all of the differences for all pairs of tests or just the minimum or maximum difference.  If the minimum or maximum, you should be able to create new columns for the earliest and/or latest dates for tests A and B before calculating the date difference.  If you want all the differences, then you may need to restructure your data set (perhaps by stacking the test A and test B columns after joining the datasets) before calculating the date differences.

Level VII

## Re: date differences in two datasets

Attached are the 3 data sets for your example above.  I'm not sure what you mean by your Results table - there are more matches than you are showing.  So, if it is an abbreviated from of what I attached, then this should work for you.  If you really only wanted those 3 matches and differences shown in your Results, then I'm not understanding how you are choosing which matches you want.

Level I

## Re: date differences in two datasets

my goal is to isolate subjects to include in a study who have tests a and b done within 90 days of each other.  My results only lists the pair of tests with <=90d in between as that would be the desired result.

Level VII

## Re: date differences in two datasets

So either my solution or txnelson's will work (he's the coding wizard, I just use my mouse).  Once you have the date differences calculated just choose those < 90 days or create a column identifying those <90 with an indicator (e.g., 1 vs 0).

Super User

## Re: date differences in two datasets

I believe the

Tables=>Join will do what you want.

Here is a script that creates your 2 sample tables, and then Joins them together and highlights all of the rows with date differences greater than 90.

``````Names Default To Here( 1 );
dt1 = New Table( "Dataset1",
New Column( "Subject",
Character,
"Nominal",
Set Property( "Value Order", {Numerical Order( 0 )} ),
Set Values( {"1", "1", "1", "1", "2", "2"} )
),
New Column( "Test A Date",
Numeric,
"Nominal",
Format( "m/d/y", 12 ),
Input Format( "m/d/y" ),
Set Property( "Value Order", {Numerical Order( 0 )} ),
Set Selected,
Set Values( [3755376000, 3758054400, 3760473600, 3763152000, 3347827200, 3355516800] )
)
);
dt2 = New Table( "Dataset2",
New Column( "Subject",
Character,
"Nominal",
Set Property( "Value Order", {Numerical Order( 0 )} ),
Set Values( {"1", "1", "1", "2", "2", "2"} )
),
New Column( "Test B Date",
Numeric,
"Nominal",
Format( "m/d/y", 12 ),
Input Format( "m/d/y" ),
Set Property( "Value Order", {Numerical Order( 0 )} ),
Set Selected,
Set Values( [3471379200, 3768422400, 3750105600, 3345148800, 3755376000, 3755721600] )
),
Set Row States( [1, 0, 0, 0, 0, 0] )
);

// The code below is the code that does the work.
dtFinal = dt1 << Join(
With( dt2 ),
Merge Same Name Columns,
Match Flag( 0 ),
By Matching Columns( :Subject = :Subject ),
Drop multiples( 0, 0 ),
Include Nonmatches( 0, 0 ),
Preserve main table order( 1 ),
Output Table( "Final" )
);

dtFinal << New Column( "Date Diff in Days", formula( Abs( :Test A Date - :Test B Date ) / In Days( 1 ) ) );

dtFinal << select where(:Date Diff in Days > 90 );``````

Jim
Level VII

## Re: date differences in two datasets

I got the same thing as you, except for 2 items:  I calculated the difference in months rather than days (an oversight - days makes more sense), and you used the absolute value of the difference.  That factor is unclear from the request - I would normally think of the tests as sequential (A preceding B), but given that a number of the B tests preceded the A tests, I guess the absolute value makes more sense.

Super User

## Re: date differences in two datasets

I have also written an add-in which could be used for this (it is clunky to use and automating it might be a bit difficult, but it should be possible). Join Nearest Rows - Join tables by using continuous columns and nearest values . I hope that I have time to rewrite this at some point

Tolerance is set to 7776000 as JMP datetimes are in seconds. Join direction is Nearest as we can join backwards or forwards

and from result you can remove extra rows with no matches

After some cleanup

If you know SQL you could use that

``````Query(
Table(dt1, "dt1"),
Table(dt2, "dt2"),
"\[SELECT dt1.*, dt2."Test B Date", Abs(dt1."Test A Date" - dt2."Test B Date") / (24*60*60) DeltaDays
FROM dt1
join dt2 on dt1.subject = dt2.subject
where
Abs(dt1."Test A Date" - dt2."Test B Date") <= 90*24*60*60
]\"
);``````

But in this case I think using JMP's interactive solution and taking the script JMP creates for you (join by subject, add new column based on date differences (right click menu can do this), add second column to check if the difference is small enough, perform selection based on that and remove extra rows). Both @txnelson and @dale_lehman did more or less suggest this

Join: