"Fuzzymerge" - join tables with slightly different timestamps

15 Kudos

I often have data files from different sources that I want to join by time, but as they come from different systems their time stamps are just slightly different. We solved this by using JMPs KDTable object which allows for crazy fast nearest neighbor searching. This "fuzzymerge" will find the closest row in time in a second table for each row in the main table and join those rows into the resulting single table.

I found a few other examples of this in my search on the community, but none with a GUI to make it accessible to non-scripters.

To use this, first run the 'fn_fuzzymerge_generic' script to load the custom function into JMP. Then run the fuzzymerge_generic_GUI script with your data tables open. You might want to take a look at a distribution of the 'Fuzzy Merge delta' column which is the difference between the time columns of the two tables for the rows that were joined. Large values indicate that there is no data "close by" that row in the main table.

Note that this will find the closest row *in any direction*, forward or backward in time.

Hope this helps someone!

Kristo

Ressel · ‎03-18-2021

Very nice. I was looking for something like this. Thanks!

klk · ‎03-18-2021

Glad it helps! Note that I just updated it with a tiny fix.

Ressel · ‎03-18-2021

Great script & good support. I like.

fionaweston · ‎03-30-2021

Kristo, yes I have been in this exact situation, as I'm sure a lot of people have. I just decided it would be too hard to figure out. Glad you took the time to write this script.

I know I'll have the opportunity to use it soon.

Thanks

Fiona

FN · ‎06-28-2021

Thanks for sharing! A tolerance factor will be great to have so we don't join indexes that are 'too far'.

Lavik17 · ‎05-13-2022

@klk this script is great!

I was wondering if you could complement it with a couple of items:

1. Add a cancel button to the GUI window

2. Would it be possible to do the join by group? for example I might have two time series that should be kept separated, and the series name is found in one of the columns.

Thanks!!!

Lavik17 · ‎05-13-2022

@klk I have one table with timestamps every second, and another table with a timestamp every hour. Both columns are showing the time as hours. When I join the tables with your script it will match the second table into multiple places in the first table until it reaches a delta of 0.5, and then it would match the next timestamp. I end up with multiple matches instead of a single match.

Is there a way to enforce a single fuzzy match? perhaps by manually defining a single delta?

klk · ‎05-13-2022

So you have table A with one row every second, and table B with one row every hour. If you use this fuzzymerge, and merge into A (the main table) from table B (the 'with' table), you will end up finding the nearest row in B for every row in A. It sounds like what you are asking for is to just do it the other way around - set table B as the 'main' table, and table A as the 'with' table. Let's call it table C. That will get you the single closet row in A for each row in B. If you still want a table with as many rows as table A, but empty data for B, you could then table->join C back with A by matching on timestamp.

Audra · ‎06-21-2023

Thank you so much, this has been very helpful for our group since we often have to combine several files with different timestamps. Works perfectly! I would suggest an option to save the results into a new table with a new name, rather than changing the main table itself.

Recommended Articles