My name is Wendy,
and I'm a technical lead working with health and Life Sciences in the Midwest.
My motivation for this talk comes from working with JMP customers and prospects
across a broad range of industries.
When I reflect on my customers' questions over the past five years,
the biggest opportunity with data analysis
has been in using all the data that's being collected to make decisions.
With sensors becoming more prevalent in measurement systems
and manufacturing equipment,
more data is being collected over time.
It's cheaper to collect and store the data,
and as a result, scientists and engineers are often swimming in data.
They want to use the data
to better understand their processes, to make better products,
but they have a knowledge gap when it comes to how to process all the data.
This talk is intended to be an introduction
in understanding how to work with data collected over time.
What do I mean by timestamp data or data collected over time?
I have some examples here.
It could just be a time or a date or both a date and a time.
This data can be formatted in many different ways.
When I say timestamp data, I'm referring to all of these variations,
and it really is the granularity at which your data is stored.
It's a pretty broad category,
but this is what we're talking about today.
The title of my presentation is,
If I only had Time to Work with Timestamp Data,
I would learn so much about my process.
I would almost retitle that to say, If I only had the knowledge.
I've seen people struggle with working with this type of data,
like wrestling with a pig,
because they simply don't have some of the fundamentals
of understanding how JMP recognizes this data
and how to marry data together by timestamp.
It doesn't have to be like this.
Today, what I'm hoping you walk away with are two things.
One is just some basics about timestamp data,
and then an introduction to JMP Query Builder
if you've never used this before, to join Timestamp Data.
All right, so let's start with the ABCs.
I'm bringing open a spreadsheet here,
and we're going to bring this data into JMP
to just talk about some of the basics when it comes to timestamp data.
I've got three columns here.
We're first going to focus on start date and end date.
Excel is a very common way that raw data is stored.
Let's bring this into JMP.
Okay?
Let's first focus on the Start and End date.
I'll hide my special column here just for a second.
Let's first focus on the way that this data came in.
All of these columns actually came in.
JMP is recognizing them as nominal data.
The first thing you want to know about timestamp data,
if you want to work with it,
is that it should be stored in JMP as numeric and continuous.
Let's first open up Start Date and just take a look here.
JMP recognized it as character and so therefore it made it nominal.
We need to communicate to JMP that this is
date data or timestamp data by changing it to numeric and continuous.
That is the type of data,
dates, times and timestamp data need to be coded in JMP.
Now I'll keep this open and I'm going to click apply.
You can see that the values change .
So this is correct .
This looks nonsensical to us,
but this is the other thing you need to know about timestamp data
is the way it's stored in JMP
is the number of seconds from a reference date.
It's specifically the number of seconds from January 1st 1904 .
You don't need to remember the January 1st 1904 .
You just need to remember that dates, times and timestamps
are stored as the number of seconds.
The way that we make it look sensible to us is by changing the format.
I almost think of the format as a mask.
This is correct.
The software understands this, but I want to be able to read it easily.
Now let's work with the format.
I'll go to this drop down and I am just going to call out these three menus.
There's a date menu, a time menu, and then a duration.
If you're working with timestamp data,
these are the three menus that are going to be relevant to you
in order to format the data.
We'll talk about a special case in a second.
Let's format this in the same way that it appears.
Because this is formatted as a date,
sorry, day and then a month and then a year,
let's make the selection that preserves that.
I'm going to select this particular option,
click 'Apply' to confirm that that's what I want to see.
Now I'm good to go.
Notice again in the columns area.
Now start date is numeric and continuous and now JMP knows.
January 26 is five days after the 21st.
It doesn't know that right now for end date.
Now let's do the same thing with end date.
This is going to allow us to subtract the two.
To do date time math.
I'm going to come in here again, numeric and continuous.
We'll click 'Apply' .
That's the number of seconds from that reference date.
Now I'm going to change the format.
Okay, so now I have numbers that I can perform math with .
I can't perform math on character data.
If you want to do something like calculate a date difference,
now we can do that now that our data is being stored and JMP properly.
Let me show you one way that you can work.
Do, let's say, column math with dates.
I'm going to select these two columns.
I'm going to right click, and I'm popping into this new formula column menu.
I can ask JMP to help me write this formula.
Because I have two columns selected,
I'm going to go to this combined menu and tell JMP I want to take the difference.
I'll take the difference
in reverse order because I want end date, minus start date.
There you go.
Again, I'm not alarmed
because I remember that JMP stores this type of data in seconds.
To convert this to another unit, let's say days,
we just need to work with this formula a little bit more.
Let's do that, I'm going to hit the plus sign.
Now I'm in the Formula editor .
We can convert this to days by dividing this by 60 seconds are in a minute,
60 minutes are in an hour,
and then there's 24 hours in a day.
I'll click 'Okay' .
Now we can confirm, right, that this is the difference between the two.
You can just do some mental math here.
It's not too challenging.
That looks good.
I want to present you another way to do date time math or timestamp math.
We could use the Formula editor.
Let's go ahead and create a new column.
I will right click and go to the full Formula Editor.
There is a more comprehensive list of date time functions in the full Formula Editor
than you're getting in the data table when you right click.
This date difference,
if I hover over it, you can get a peek of what it offers you.
It allows you to specify an interval name.
I'll double click on that to bring it into the view.
Now let's tell JMP that we want to take that date difference with the two dates,
and then we can specify the units that we want .
We don't have to do the 60 X 60 X 24.
I'll put the end date first,
start date, and then my interval name does need to be in quotations.
This little hover help does let you know that that's the structure.
Let's click 'Okay'.
Now I can see I went backwards on that.
Let's double click in here.
We just need to swap those two.
Start date should be the first one.
There you go.
These are the same values we got when we performed that division.
Two different ways to perform these calculations.
But the big takeaway is
to familiarize yourself with some date time formats.
Not formats, but calculations.
You can do that by exploring these quick formulas.
It's via a right click in the data table.
There's a date time menu here
or in the full Formula Editor
that's going to be this very large menu of various date time formulas.
Okay, so now let's wrap up the ABCs
by looking at this special date time column that I hid.
I'll unhide it so that we can focus on it.
Now this is a full week day of the week name.
You see the date and the time.
Let's try formatting this,
or I should say communicating to JMP how this data should be stored.
I'm going to right click and go
into column info just like we did with Start and End date.
I'm going to go to numeric and continuous .
Because that's how this type of data needs to be stored in JMP.
I'm going to click apply.
Now I'm disappointed because JMP has basically deleted all those values.
It's telling me that it does not recognize
that information as being a date .
It worked,
we got seconds when we did that with Start and End Date,
but we're not getting that here.
This tells us
that we need to work a little bit harder because this is a special format
to tell JMP that this is time data or date time data.
We're going to leverage and undo here to get our data back.
Now I'll right click go back into column info
and here let's first do numeric continuous.
Here we're going to go to the format menu
and I'm going to introduce format pattern.
We talked about dates, time durations.
Here's where you want to go if you have a special situation like we have here.
Now I'm going to say set format pattern.
Now the process is about communicating to JMP.
Each element in that it is a date, time duration or other.
See this other menu.
That's what we're going to do is each piece we're going to tell JMP what it is.
Wednesday, I'm kind of focusing on this first line.
What is that?
Make this a little bit bigger.
Well, that's the day of the week .
It's the full name.
I'm going to make that selection.
Then I do have to tell JMP. "Okay, well, I have a comma next,
and a space".
And now I have the full month next.
I'm just scrolling and finding that in the menu.
Here is month long name.
I'm going to make that selection.
Again, I have a space and now I have a two digit day
and then I have a comma and then I have a year.
It's a four digit year .
You can see it being a little bit of a preview of what I'm specifying here.
That's really the process.
Just to kind of shorten this up, I'm going to do my little cheat thing
and I'm just going to paste in the full thing .
We built it up to year, but the process would be the same for
telling JMP about the hour, minute and AM, PM.
This is a game changer for those of you who have special formatting.
I do see this more and more often with
the various pieces of equipment that are collecting data over time.
All right, so now let's look to see,
now JMP does recognize this as date time data.
We have numeric and continuous,
and we can see via the formatting that this data is correct.
Okay, let's do some review, and then we'll push on to part 2 here.
What have we talked about so far?
JMP recognizes, let's call it timestamp data broadly
as the number of seconds.
You don't need to remember the January 1st 1904 .
Just know that it's the number of seconds.
The data should be stored as numeric and continuous,
and the format is the way that you make it legible or understandable to you.
JMP understands it as the number of seconds.
We need to view it as a specific type of format so that we can understand it.
The last thing I'll say is develop a familiarity with date formulas.
We had a chance to do that both in the data table with those quick
formulas and then in the full formula editor.
Oh, yes, I said that was the last thing.
But that special case we addressed
with the special date times, if that's your situation,
you can deal with it using set format pattern
so that you can ultimately work with that special formatting of timestamp data.
Okay, now we're going to go to part two,
and this is where we are going to do something that
I have seen lots of customers struggle with, which is,
they want to align data by timestamps,
but they can't do it because of a lack of awareness of tools.
That tool is going to be JMP Query Builder.
We're going to just dive into an example
that I think will illustrate this particular challenge.
All right, so here's a case study.
We are making this product called Forever Young Elixir,
who doesn't want that product.
To make Forever Young Elixir, here's a little bit of a sketch of that process.
We start with raw materials.
Then these raw materials go into a pressure step.
This is very abstract.
Then they go into a temperature step,
and then we get our finished product.
What we're most concerned about
with respect to this finished product is potency .
We want high potency in our finished product.
Every finished batch,
we take a sample and we measure potency.
So here you can see this is a sample of the data that's being collected.
This first batch finished at 10:00 AM on 6:28,
and it had a potency of 95.41%, etc .
I have one row per batch because I get one potency measure for each batch.
I also collect temperature data over time.
There's a sensor on this temperature step that continuously measures the temperature
that let's call it my product in process is going through.
I have the same thing for my pressure step.
There's a sensor here too, and I'm collecting pressure over time.
I'm making this product and I had some ideas about how to set the process.
But I want to know, can I make this better?
Can I get more batches that are of higher potency,
more consistently higher potency batches?
I want to use my sensor data
to figure out is there a correlation between the temperature and potency
and ultimately also pressure and potency,
and where should I try to get that process to sit at in these steps
to get to maximize potency .
That's what we're marching towards.
I want to identify
temperature settings that are going to give me the highest potency.
Let's focus on this graph here.
I'm plotting every single batch here
and its potency value on the y axis, and I have the average temperature
that it was experiencing in that temperature step.
We're going to focus on temperature,
but these steps would be analogous for that pressure information as well.
This is what I'm marching towards.
Being able to build a graph like this allows me to see,
oh, I can maybe maximize potency
by maintaining a temperature setting
of between 63 and maybe 68 .
This is where I'm getting the highest potency values.
We need to ultimately get a data table that has both potency values
and temperature values so that we can perform this correlation.
A little bit of an asterisk here only
because there are certainly other ways to analyze this type of data.
We're going to take the average,
but there are certainly other tools and JMP and other approaches one could take
to analyze this type of data.
We're not going to address that here.
Just kind of putting it out there as another thing to explore.
We're going to stay focused on this
data preparation aspect of working with this data.
When we have this data aligned, let's focus on batch 1, 2, 3, 4, 5.
We have one potency value for batch 1, 2, 3, 4, 5, that's the 95.41,
and that's why you see it across all the rows.
I only have one value for that batch,
but I have many values for temperature .
It's just sort of streaming and being collected.
How do I do this alignment?
What do you do,
when you don't have a batch ID?
That's really kind of the crux of the issue.
When we have unique identifiers,
we can join that data by those unique identifiers.
When we do not,
how do we address this problem?
That's what we're going to talk about now.
Really the first step before even getting into JMP
is to think about the process.
With a hypothetical batch finishing at 09:10 AM,
when is it inside of this temperature step?
What is the relevant time range
for a finished batch that finishes at 09:10 AM?
That's not a software question.
This is a let's understand the process question .
In talking to the manufacturing engineers,
we've determined that a batch spends five minutes in this temperature step
before it becomes a finished product.
We're going to use this five minutes,
and we're going to calculate from the finished product
timestamp and determine that for a 9:10 finish,
the product or the pre made product
started in this temperature step at 9:05AM.
That is the work ahead of us in the software is to associate
the sensor data using that five minutes to that end product .
We've got the finished time for the Forever Young Elixir batch,
and we need to collect the sensor time that's relevant for that finished product.
Okay, so let's dive into the software.
Here is our potency data.
We have one row per batch,
100 batches in here.
Let's go take a look at the temperature sensor data.
This is being collected almost every second.
We have 500 rows in here,
so no batch ID.
this is the challenge
that Query Builder is going to help us with.
If I scroll down, you can see, oh, some missing values, maybe.
I don't know if the sensor was down, that can happen.
Our first step really is
we're going to work with this temperature data I'm sorry, the potency data.
We're going to calculate a start time.
A start time meaning when did it start being in that temperature step.
Let's do some labeling because we're going to have a lot of timestamps here.
I'm going to call this batch finish timestamp.
Now let's do some math.
We're going to subtract five minutes from the batch finish timestamp.
Let's go to the formula editor, and I'm going to select
batch finish timestamp, and I'm going to say minus five
now, because I know, we all know now that JMP likes to work in seconds.
This five won't work.
I would need to change five to seconds,
or I could ask JMP to do that for me .
I could change this to 5 X 60.
Or I can come here and say that five is in minutes.
Now let's check it out.
Again, we're not alarmed because we know
we just need to change the mask, right, the format.
Let's change it to a timestamp.
We'll make it consistent with what we have.
There we go. We'll call this temperature start time.
For batch 1, 2, 3, 4, 5, that finishes at 10:00 AM
we're going to start collecting that sensor start time at 9:55.
Let's go pop over to Potency data, sorry, temperature data.
If you're used to using the join function in the Tables menu,
you don't want to do that for date data or timestamp data.
What you want to use is JMP Query Builder.
That's what we're going to use.
I'm going to join temperature sensor data with potency data.
I'm going to double click in here to set the join criteria.
This is going to allow us to set two criteria for aligning this data.
The first one is I want to collect temperature data
that's greater than my temperature start time,
or I should say greater than or equal to my temperature start time.
In my potency data that I calculated.
I have a second criterion that I have, which is
I want to stop collecting data for a particular batch
and use the batch finish time.
We did greater than before and now it's a less than.
I'll click 'Okay' .
Now let's build the query.
We've told the software how we want to align these rows
and now we're going to go to the next window, which is the build query part.
First step is we need to tell the software
which columns we want in the resulting table.
I'll just add all of the columns
and you can see you get this preview down here,
right, so we can start to see do some sanity checks on the data.
This is the join that we're about to do.
Does it look correct?
I think what's worked well for me is
to focus on a single batch so we can pick on 1, 2, 3, 4, 5.
This first batch here,
I can see that I have a single potency value.
Then I can also see that
I've collected temperature values that are inside that window of 9:55 and 10:00 AM.
This looks good.
Now, because I just have one potency value,
I do need to summarize that temperature data,
if I want to create a correlation.
This is where we're going to use an average.
I could run the query and get this raw data and then summarize
using table summary.
But I want to show you how you can do this in Query Builder.
We'll stay in this window
and we're going to use this aggregation option for temperature.
We need to get rid of some of our columns here
because we don't actually want this level of granularity.
We want this all at the batch ID level.
I'm going to get rid of timestamp,
which is the timestamp for the temperature sensor.
I can keep the batch finish timestamp
because there's only one value for each batch ID
and I don't necessarily need this temperature start time.
We can reorganize this in a more logical way .
Maybe we start with batch ID, we preserve the finish time
and then we've got our Potency and our temperature.
Now we're going to go to the temperature one and take an average .
Certainly you could calculate additional statistics as well.
Maybe you want to also look at,
you could look at a min value.
You could look at a max value .
There are many other ways to kind of look at this data.
We're just going to stick to temperature
so it looks like we maybe had some sensor data
where we didn't have batch finished batch.
Okay, so this looks good. Let's just do one more sanity check.
Batch 1, 2, 3, 4, 5, the average temperature is 56
and we have this potency value.
You could certainly do more checks.
That's something I would recommend is
just to go to your raw data and just confirm.
But this looks good.
We're ready to create the table.
At this point, I'm going to say run query
and we're ready to look at correlations .
I've got this blank row, I could just delete this guy,
but now I have Potency and average temperature for each batch.
Now I can go into graph builder and look at that relationship.
Here's potency, here's temperature
and maybe we add a model to it .
Let's change this to a line of fit.
It certainly doesn't look like it's linear,
probably better modeled as quadratic.
We're done.
We could add some more statistics here, but this is really a nice starting place
where we can start to see,
to make statements like I'm seeing higher P otency values
when temperature is
maybe between 64 and 68.
I should go back to my process and try to see if I can set
the temperatures to stay within that operating window.
Okay, so with that, I'll conclude
and again, just remember,
when you're working with timestamp data and you want align rows,
think of JMP Query Builder, don't think of Tables Join.
You really need to set to be able to set
two criterion for the boundaries in order to align that data.
Just as a wrap up, I'll do some shameless promotion.
I've documented this case study in a blog post.
If you'd like to review it, you can take a look here.
I even include the data set in here as well so you can recreate this.
There are some nice references as well to some other blog posts
that others have written on this topic of working with timestamp data.
All right, thank you very much.