Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

- JMP User Community
- :
- Discussions
- :
- Correlating Data sets with non-identical time stamps

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

Highlighted

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Nov 21, 2018 6:21 AM
(4272 views)

I'm brand new to JMP. I'm sure my issue is quite common, but I didn't know the right terms to look for help.

I ran an experiment where the data was collected manually a random time points.

To better illustrate the type of data I have, I've made a sample table below. Let's say I see a problem with Result 2, so I want to see if there is a correlation between Result 2 and Result 1 and also between Result 2 and Result 3.

The numbers are completely made up, but you will see that, in general, the data is not gathered at the same time points (but sometimes they are). The data sets are not the same size.

Time | Result 1 | Result 2 | Result 3 |

01:00 | 2000 | ||

02:00 | 5200 | ||

03:14 | 5000 | ||

04:00 | 5100 | ||

05:23 | 5200 | ||

06:00 | 5300 | ||

07:52 | 5000 | ||

01:15 | 12 | ||

04:23 | 16 | ||

06:00 | 22 | ||

01:17 | 345 | ||

02:34 | 345 | ||

02:50 | 347 | ||

04:23 | 347 | ||

05:53 | 342 | ||

08:23 | 235 | ||

I've been able to convert the time into a date time because my data spans multiple days (not shown in example table). I don't know the best way to go about this. If my datasets were small, I would try and match Results 2 and 3 with the closest dataset of Result 1. Say, match 1:00 of Result 1 and 1:15 of Result 2 and 1:17 of Result 3.

But I have lots of data (a few thousand data points or so). I just need to get an overall sense of what correlates and what does not. I don't need to keep the data in it's original format and I don't need to agnoize over small differences (hence, ok to correlation 1:00 with 1:15 with 1:17). I just need be able to cycle through lots of different variables and determine... yes there is a correlation, or no there is not.

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions

Highlighted

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Here is a simple example of one way to solve the question you asked

```
Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/semiconductor capability.jmp" );
colNamesList = dt << get column names( numeric, continuous );
dtSum = dt << Summary(
Mean( Eval( colNamesList ) ),
Freq( "None" ),
Weight( "None" ),
Link to original data table( 0 )
);
```

Documentation on the method and functions are found in the Scripting Guide and Scripting Index

Help==>Books==>Scripting Guide

Help==>Scripting Index

Jim

4 REPLIES 4

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Correlating Data sets with non-identical time stamps

To "match" data there should be some context for matching. From your description, it seemed you wanted to match by nearest hour. If that were the case then you could create a new column, I'll call it Rounded Hour.

`Round( :Time / (60 * 60), 0 ) * (60 * 60)`

Then you could use Table Summary for your results, requesting the mean, or min or max, since there should be just one,

```
Data Table( "Example" ) << Summary(
Group( :Round Hour ),
Mean( :Result 1 ),
Mean( :Result 2 ),
Mean( :Result 3 ),
Freq( "None" ),
Weight( "None" ),
statistics column name format( "column" ),
Link to original data table( 0 )
)
```

However, given your made up data there would be a problem with this. For Result 3 you have a reading at 2:34, which would round to 3:00 and another at 2:50 that would round to 3:00. What would happen here? How would you resolve this issue?

If it takes more logic, such as, "if 2 results at the same rounded hour, assign the closest to that hour and if none assigned to the previous hour, assign the other to the previous hour" and similar rules, then a script would be required.

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Correlating Data sets with non-identical time stamps

That's super awesome. It's exactly what I needed. Thank you so much.

I do have one more question... for the script to make the summary table, I can type the column names (e.g. "Result 2") if I only have a few columns. What do I do if I have 100 columns with various names that need to be merged into the data table?

Thank you!

I do have one more question... for the script to make the summary table, I can type the column names (e.g. "Result 2") if I only have a few columns. What do I do if I have 100 columns with various names that need to be merged into the data table?

Thank you!

Highlighted

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Here is a simple example of one way to solve the question you asked

```
Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/semiconductor capability.jmp" );
colNamesList = dt << get column names( numeric, continuous );
dtSum = dt << Summary(
Mean( Eval( colNamesList ) ),
Freq( "None" ),
Weight( "None" ),
Link to original data table( 0 )
);
```

Documentation on the method and functions are found in the Scripting Guide and Scripting Index

Help==>Books==>Scripting Guide

Help==>Scripting Index

Jim

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Correlating Data sets with non-identical time stamps

Thank you so much!

Article Labels

There are no labels assigned to this post.