Share your ideas for the JMP Scripting Unsession at Discovery Summit by September 17th. We hope to see you there!
Choose Language Hide Translation Bar
Highlighted
pgfan
Level II

Correlating Data sets with non-identical time stamps

I'm brand new to JMP. I'm sure my issue is quite common, but I didn't know the right terms to look for help.

 

I ran an experiment where the data was collected manually a random time points.

 

To better illustrate the type of data I have, I've made a sample table below. Let's say I see a problem with Result 2, so I want to see if there is a correlation between Result 2 and Result 1 and also between Result 2 and Result 3.

 

The numbers are completely made up, but you will see that, in general, the data is not gathered at the same time points (but sometimes they are). The data sets are not the same size.

 

TimeResult 1Result 2Result 3

01:00

2000  
02:005200  
03:145000  
04:005100  

05:23

5200  
06:005300  
07:525000  
01:15 12 
04:23 16 
06:00 22 
01:17  345
02:34  345
02:50  347
04:23  347
05:53  342
08:23  235
    

 

I've been able to convert the time into a date time because my data spans multiple days (not shown in example table). I don't know the best way to go about this. If my datasets were small, I would try and match Results 2 and 3 with the closest dataset of Result 1. Say, match 1:00 of Result 1 and 1:15 of Result 2 and 1:17 of Result 3.

 

But I have lots of data (a few thousand data points or so). I just need to get an overall sense of what correlates and what does not. I don't need to keep the data in it's original format and I don't need to agnoize over small differences (hence, ok to correlation 1:00 with 1:15 with 1:17). I just need be able to cycle through lots of different variables and determine... yes there is a correlation, or no there is not.

 

Thank you!

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted
txnelson
Super User

Re: Correlating Data sets with non-identical time stamps

Here is a simple example of one way to solve the question you asked

Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/semiconductor capability.jmp" );

colNamesList = dt << get column names( numeric, continuous );

dtSum = dt << Summary(
	Mean( Eval( colNamesList ) ),
	Freq( "None" ),
	Weight( "None" ),
	Link to original data table( 0 )
);

Documentation on the method and functions are found in the Scripting Guide and Scripting Index

     Help==>Books==>Scripting Guide

     Help==>Scripting Index

Jim

View solution in original post

4 REPLIES 4
Highlighted
gzmorgan0
Super User

Re: Correlating Data sets with non-identical time stamps

To "match" data there should be some context for matching. From your description, it seemed you wanted to match by nearest hour. If that were the case then you could create a new column, I'll call it Rounded Hour. 

Round( :Time / (60 * 60), 0 ) * (60 * 60)

Then you could use Table Summary for your results, requesting the mean, or min or max, since there should be just one,

Data Table( "Example" ) << Summary(
	Group( :Round Hour ),
	Mean( :Result 1 ),
	Mean( :Result 2 ),
	Mean( :Result 3 ),
	Freq( "None" ),
	Weight( "None" ),
	statistics column name format( "column" ),
	Link to original data table( 0 )
)

 However, given your made up data there would be a problem with this. For Result 3 you have a reading at 2:34, which would round to 3:00 and another at 2:50 that would round to 3:00.  What would happen here?  How would you resolve this issue?

 

If it takes more logic, such as, "if 2 results at the same rounded hour, assign the closest to that hour and if none assigned to the previous hour, assign the other to the previous hour" and similar rules, then a script would be required.

 

 

 

 

Highlighted
pgfan
Level II

Re: Correlating Data sets with non-identical time stamps

That's super awesome. It's exactly what I needed. Thank you so much.

I do have one more question... for the script to make the summary table, I can type the column names (e.g. "Result 2") if I only have a few columns. What do I do if I have 100 columns with various names that need to be merged into the data table?

Thank you!
Highlighted
txnelson
Super User

Re: Correlating Data sets with non-identical time stamps

Here is a simple example of one way to solve the question you asked

Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/semiconductor capability.jmp" );

colNamesList = dt << get column names( numeric, continuous );

dtSum = dt << Summary(
	Mean( Eval( colNamesList ) ),
	Freq( "None" ),
	Weight( "None" ),
	Link to original data table( 0 )
);

Documentation on the method and functions are found in the Scripting Guide and Scripting Index

     Help==>Books==>Scripting Guide

     Help==>Scripting Index

Jim

View solution in original post

Highlighted
pgfan
Level II

Re: Correlating Data sets with non-identical time stamps

Thank you so much!
Article Labels

    There are no labels assigned to this post.