cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
The Doctor Cures  Your Date and Time Import Problems

 

If you work with data for any amount of time, you’re almost certainly going to encounter date and time issues along the way. In this episode of Ask the Data Doctor, Brady Brady shows you how to simple it is in JMP to shape, reshape or change your data – in particular date and time data – to make it look the way you want or need it to. Brady will also give you a brief look at how JMP reads, interprets and displays dates and times.

 

For you do it yourselfers, head on over to the Add-Ins section of the community and grab yourself the fix for all your date and time formatting ailments

 

Full Transcript (Automatically Generated)

View more...

Hi, everybody. Thanks for joining us. And today we are going to talk about date and time issues. So, welcome to the data doctor. And what this show is going to do is just discuss different ways to massage your data, maybe convert different formats, reshape tables, what have you. There's lots of different ways for jump to do that.

 

So we'll go over some that may be new to you, and may give you some new ways to think about things. So let's let's start today with a scenario where, you know, I've got a data table here with some dates, and I also have some data to go along with that. And I've got a nice graph that comes up. Okay, and this is, maybe I like this graph a lot. And I think, wow, I, you know, I'd love to do that with my own data. And so that's easy enough to do because I can just come here and Copy this script and then run it on my own data table. So let's say I go and do that I got this other data table looks pretty similar, it actually looks almost identical. And I go ahead and run this, and I get nothing. So the question then is what happened?

 

And how can I fix this? And so, you know, we try to do some investigation. There's nothing really looks different here. They these data tables have the same number of rows, they've got the same data types, it looks like you know, you got a categorical date and continuous apples for both datasets. So this, you know, we're kind of wondering what's going on here. We could take a look at the distributions of those values with these. These graph headers. This is new in jump 15. So this is a really cool feature. And due to the way these data are distributed, there's actually nothing I can see here either so now Were maybe a little bit worried, you know, what's what's what's happening here. And let's, let's increase the width, okay of both of these columns.

 

Let's look at the first data table. Notice how both columns are right justified. What that means is that jump is representing these values as numbers. I look in the second data table, the date is left justified, okay, it's aligned to the left. And that is telling me that jump is representing those as characters. And that's our problem. So this is a case and this happens a lot to people where jump is representing the data as a character just like a string like Joe or Bob.

 

Okay, and so jump cannot use that the way you'd like it to be used. So what we're going to talk about today are some ways to get around this to get jumped to do what we want. And we'll go in increasing order of difficulty if you will, or Just, you know, I guess time investment, the quickest, easiest way to do this, if you can, is just change your date datatype and then change your format, as as you would like it to be. So we'll see if that works. If that doesn't work, then we'll go on to steps two and three. So for this example, let's go back to that second data set. And to change our data type, we're just going to right click on the column header, go to Column Information. And we're just going to try to change this data type to numeric. That's what we want.

 

And let's see if that works. And we can see that it does work. If I drag this further out, the column width, I can see that all those values are aligned to the right, they're being represented as numbers. And, you know, for those of you that are interested, this is actually each of these numbers is the number of seconds since January 1 1904. And so you can see they're all pretty large numbers. Now this probably isn't the way you want to look at the data. But the fact of it is jump can still use this data intelligently. If I just run the script, I do get the graph I was after. And because I've used different formatting and the x axis, I'm still getting my years to show up. That said, you probably don't even want to look at these this way in your data table. So once you've managed to convert these two numbers, it's easy enough to go back into that column information. And just pick the format you want. So you've got date and time formats down here.

 

And if you pick a time format, you'll get hours, minutes, seconds after the date. And if you just pick a date format, you will just get the date, month and year. And I'm going to just pick this format. This is the format I like day and then a three digit or a three character month, followed by years. And we'll go ahead and hit OK. And so now we can see that these formats are going to match if I decrease the width of these things, columns and go back and compare to that first table. You know, as you can see, here it looks, everything looks the same. So that's your first first thing you're going to try. First thing you're going to try is to change the data type and the format to be what you want. Well, what if that doesn't work? Let's show an example where that doesn't work. Here, I have lots and lots of different formats. Okay, and if I just try to change any of these, let's just right click here and try what we did before. I'm just going to change this data type two numeric. Okay, and I hope it works. And boy, oh, Miss, I lost all my data, missing data for everything. So that clearly didn't work. I'm going to undo that operation. And so now I've got to try something else. Well, let's, let's try the in format function.

 

And so we're going to build a formula column. And we'll use a single function and see what happens. And so notice in this data table, I have lots and lots of different formats. If I, if I look at this example column, this data is all character data. And it's all represented with, with the exception of these last four lines, which are in the same format, I got different formats for all of these lines. So there's a lot going on here. Let's see how many of these jump can handle natively. I'll double click to make a new column, I'll right click and select formula. And then if you go ahead over here to the date time section, you'll see a little bit of the way down in format. That means input input format. So we're just going to in this first argument for string, we're going to put the column of interest and I'm not even going to bother with that second column right now. I just want to see if I can turn this thing into numbers. And look at this. This works a lot of the time.

 

I mean, actually There's only let's let's look at the missings here. Okay, 14 out of 55 cases are still missing, but the rest of them jumped digested. So all of these different if I, if I invert this here and take a look at these formats, all these highlighted formats, junk can deal with them natively, you don't have to do anything, you just use that input, or excuse me in format function. And you can turn these into numbers. And once you've done that, then you can set the format like you did before. So I can come over here and just say, let's, let's put a time on these. And this is the format I like the best date, month, year, hours, minutes, seconds, hit OK. And we've turned all these into times. So that's gonna cover you know, 90% plus of your cases unless you have a really strange format. That'll do you know, you're probably only going to get to the first two steps. In these fixes, for everything else we've got or close to everything else, we've got a an add in, that you can use to customize how you look at a character string and turn it into a date. So let's take a look at that. Where you can find that if you go to the jump user community, you scroll down into this first row here, you'll see the file exchange.

 

So if I click there, that's going to take me to another page where I can select add ins, scripts or sample data. And you're going to want to pick add ins here. And when you get to that page, you're going to type Data Table Tools. And so the first hit that comes up is the Data Table Tools add in and that's what we want. And so when you click on that link, it'll take you to a place where you can download the add in. So you'll just click here to download the add in Once you've done that, if you go back into jump and open that add in, you'll notice that it's now living in an add ins menu.

 

And so mine is right here, Data Table Tools. And I've got all my choices here. So we'll talk about that in just a second. So let's suppose you've done that. And you would like to, I'm going to delete column three here just to clean things up. Let's suppose that I want to convert this date, this character string into a date. Alright, how can I do that with the with the add in? Well, let's go to the add in. And you could just come here to special formula columns, and open up the custom date formula writer. In fact, I'm not going to do that. I'm going to run all of these in a separate table. This this data table tools has lots of different applications in it, one of which is the custom date formula writer. I'll just leave this open. So I can click that link anytime I like. So let's go ahead and click that link. And we have to pick a data table. And the data table we're working with right now is this example date formats. We pick a column where our date of interest lives and the first row in the table is returned only the first row.

 

So for the the table we're using, you know, I'll end up writing a formula that can convert this first row, but these other rows are not going to be converted. So this this add in assumes, as is the case, usually, that I've got a data table and all the dates are in the same format, it just may not be the format I want. Now there's two ways to delimit here. Okay, we'll do an example of each. The first way is is when you've got character delimiting like something is setting off fields whether it's commas, colons spaces, what have you. And that's what we have here. When you click on a delimiter, it will highlight that and highlights all the rest of those spaces so that you don't have to do it over and over. So we've got spaces now I'm going to click on commas. We've got colons.

 

And once I've done that everything's broken apart. These are the chunks that I need to process. We look at our year, it's a four digit year. So we make sure we've made that selection and this little radio button, and we'll just apply the delimiting. And we just go through and tell john, what each of these fields mean. So March is our month, and we have to pick what kind of a month it is. It's a complete name month, as opposed to a numeric or a three letter month. 20 twos the day, 1999 a year. And then we've got five hours, six minutes, seven seconds. And here we've got an AMPM designator and once we fill this out, all we have to do is determine if we'd like an hour offset to be used and you can see here that our timezone central European time.

 

So if I want to leave it in that time, I don't apply an hour offset. Maybe I'd like to convert that to Greenwich Mean Time when then I could apply the hour offset that is that is applicable here. And then I pick the display format I want lots of choices here. pick my favorite. That's it, build a formula column. Let's take a look. We come back in our data table. And we can see right justified values. So we know that jump is representing this as a as a numeric, you can also look over here in the column section to the left and see that that's got the little blue triangle. So it's continuous data. That's what we wanted. And it's done our formatting for us.

 

And in fact, if we right click and look at the formula, you can see the jump has written this script for us. And so if you're interested in dates and times and some of the functions associated with them, you can certainly go through the script and and see what's going on. But the beauty of it is if you don't care, you don't have to care. This was all written automatically by, you know, interpreting those selections that you made in the menus. Right. So that's our first type of problem where we have character delimiting. And that'll work on, you know, any of these, you could, you could do that way. Now, what happens in the other case, where Take a look at this military time format, and something interesting going on here. If you look at what I've highlighted, there are six numbers.

 

And then there's a Z, that's Zulu time, that's Greenwich Mean Time. I need to break these up based on position there, there are no delimiters and in fact, that 11 that starts this off, that's actually the day that is 11, April, and then 1430. You know, it's 2:30pm. So I To tell jump how many characters we're working with here so that it can break things up. So let's see an example of that. I will, however, delete this date column that we built. Just to get us back to start. I'm just going to toggle back and forth between these to refresh the display.

 

And here is that first row of data now. And when I when I choose fix delimiting, watch what happens I get this little red tick mark. And I'm gonna put tick marks in front and behind everything that I want to to limit. So this 11 I have to tick mark in front and behind and the 14 and the 32 beginning and end of the April, the beginning and end of the 98. And that's what I care about. Now I have to look at this year. Okay, right now it's 98. So it's a two digit year and I've got a couple of choices. I could just assume that this is the 2000s but that's Not what we want here, right? We don't want this date to be 2098. So we'll do a custom look back.

 

And the way I like to think about a custom look back is you think about how far in the future you'd actually like to look and then subtract that from 100. So let's say in this data set, I'm never going to get a date that's more than 10 years into the future. Well, then I'd like to look back 90 years. So I can just come here, look back 90 years. And that's going to make sure that things like 98 shows up as 1998 38 shows up as 1938. But something like 25 would show up as 2025. So we can go ahead and apply the delimiting and that's going to refresh these drop down menus. So now remember, I said that this 11 was actually the day and the 14 is our hours, 30 or minutes.

 

April's our month but it's a three letter month. We don't we don't have complete name months here. So we got to make that selection 98 is our year. And this is Zulu time. So I'm not going to apply any offset. But I will put things in my favorite format. Go ahead and build that column. And let's take a look. Okay, and as we see, let's, let's take a look at the years this last line had a 38 year well, that's getting written as 1938 to 11 April. Now 1230 is going to be 12:30pm. And so we've got that. Here we've had 1430. So that's written as 2:30pm. So it's all looking good. Again, if you want to take a look at that formula and see what's going on, there it is.

 

And so, you know, between or among, I guess I should say, these options This should get you pretty far, whether you're just able to change the data type, like we did in the first example, whether you have to do a really quick formula column and use the in format command to turn things into numbers. Or whether you have a really special case, and you need to define explicitly some delimiting scheme, whether that's fixed or whether it's character delimited you've got options here and this should you know, once you try these, this will fix it put it this way, if it doesn't fit your date and time issues, you got to let me know and I'll, I'll see what I can do about updating the ad and because this, this covers most of the options I can think of out there. So looks like we got a couple minutes left. That's That's all I've got. Julian I don't know if there are any questions or anything like that or if you just want to take it back. Thanks, Brady. Yeah, I'll I'll go back. And actually, you know, one question I think some people may have is how they can ask you questions or put in? I don't know, request for the doctor to cover something.

 

Oh, that's Yeah, I'm glad you mentioned that, because I forgot to and, you know, for this segment, as Julian said in the beginning, you know, this is about you guys, and about the problems you're having. And so I can try to think about the problems you're having. But you know, it's better if you'd let us know. So I don't know if you've set up a special space. So we could do that on the community somewhere, Julian, right. Yeah. Yeah. So your episode page actually will be a great place people can comment on this episode, but we'll probably get you a special place to be is, I imagine a lot of people would like to like to ask you some specific questions. Perfect. Yeah, we report that back. And this is a recurring segment. So every Friday we get to hear from the doctor. So yeah, definitely, please submit your submit your questions.

 

Comments
Ake

Why did I not come across this before? I needed this so many times! Many thanks!

Ake

hogi

Thanks @brady_brady for this wonderful talk on how to fight with data formats.

How about 

hogi_4-1710777118237.png

(from the video)

or 15b (with points) :

hogi_5-1710777136572.png


looks very dangerous - hard to detect the inconsistency/issue !

hogi_0-1710843462652.png

 

 

@hogi 

 

Both of those examples are delimited, so you can fill out the dialog as below. The second example has slash delimiters rather than dots, but it will otherwise be the same.

 

brady_brady_0-1710781283984.png

 

hogi

Thanks

 

Another way to get March 04:

Informat( "4.3.2023", "Format Pattern", "<DD></><MM></><YYYY>" );
Informat( "4/3/2023", "Format Pattern", "<DD></><MM></><YYYY>" );