Hi JMP Community,
I'm looking for help with forcing a date format on import using the Excel Import Wizard via JSL. The data I'm trying to import sits in only two columns of an Excel file, one date and one data, and looks like the following:
The problem I have is at the transition from the dates 12.02.2019 to 13.02.2019 (marked by red at the left side). For some reason, JMP is reading the date values from the start to the last 12.02.2019 value as dd.mm.yyyy format, but then switches at 13.02.2019 to reading dates as mm.dd.yyyy format. When I go into the Excel file and review the formatting for the cells, they're identical. I can't figure out why JMP is suddenly switching. When it makes this switch, it leaves the date cells empty in the JMP table.
If I import the data in two different blocks, those from the start to the last dd.mm.yyyy format, and a second one for the others, it reads each date correctly, however the Column Info for the two different dates are formatted differently and are incompatible with each other. If I try concatenating the two sub-sets, it just switches the day/month of whichever table I concatenate to.
I've also tried changing several of the toggle options in JMP preferences to try and force system or JMP settings, but all were unsuccessful.
If I can import the date as a character, I can modify things accordingly and then switch it back to a continuous data type, and it should be all OK. The only problem is I am not sure how to force this, or if this is the best way.
As with most automation attempts, I STRONGLY prefer to not go in and edit every Excel sheet or file. I want to have JMP do this via JSL script.
The JSL code I use to import is:
Open(
    "file_location\file.xlsx",
    Worksheets( "Sheet 1" ),
    Use for all sheets( 1 ),
    Concatenate Worksheets( 0 ),
    Create Concatenation Column( 0 ),
    Worksheet Settings(
        1,
        Has Column Headers( 1 ),
        Number of Rows in Headers( 1 ),
        Headers Start on Row( 1 ),
        Data Starts on Row( 2 ),
        Data Starts on Column( 15 ),
        Data Ends on Row( 0 ),
        Data Ends on Column( 16 ),
        Replicated Spanned Rows( 1 ),
        Replicated Spanned Headers( 0 ),
        Suppress Hidden Rows( 1 ),
        Suppress Hidden Columns( 1 ),
        Suppress Empty Columns( 1 ),
        Treat as Hierarchy( 0 ),
        Multiple Series Stack( 0 ),
        Import Cell Colors( 0 ),
        Limit Column Detect( 0 ),
        Column Separator String( "-" )
    )
)
Is it related to the "Limit Column Detect(0)" option? I can't find any documentation on this and what it does.
This issue is similar to one posted by @ghartel back in May of 2017 (https://community.jmp.com/t5/Discussions/Excel-Import-Date-Format/td-p/16619), which didn't get a direct solution to their specific issue, at least as far as I can tell.
Any help is much appreciated!
Hmmmm, @DS, you might have found a bug.
The approach I take is to read in the data, let it be a string, then convert it. However, if I try to use only the last Data Type() message, I see the same behavior that you described. If I keep the same informat and format statement, then the problem is not seen. Once converted, set the column to your chosen format.
Names default to here(1);
dt = Open(
    "C:\temp\ExcelDateBlog.xlsx",
    Worksheets( "Sheet1" ),
    Use for all sheets( 0 ),
    Concatenate Worksheets( 0 ),
    Create Concatenation Column( 0 ),
    Worksheet Settings(
        1,
        Has Column Headers( 1 ),
        Number of Rows in Headers( 1 ),
        Headers Start on Row( 1 ),
        Data Starts on Row( 2 ),
        Data Starts on Column( 1 ),
        Data Ends on Row( 0 ),
        Data Ends on Column( 0 ),
        Replicated Spanned Rows( 1 ),
        Replicated Spanned Headers( 0 ),
        Suppress Hidden Rows( 1 ),
        Suppress Hidden Columns( 1 ),
        Suppress Empty Columns( 1 ),
        Treat as Hierarchy( 0 ),
        Multiple Series Stack( 0 ),
        Import Cell Colors( 0 ),
        Limit Column Detect( 0 ),
        Column Separator String( "-" )
    )
);
wait(0); //change to 2 seconds to see that the data is read in as text
dt:Date << Data Type (Numeric, "Continuous", Format( "ddmmyyyy", 12 ), Input Format( "ddmmyyyy" ));
dt:Date << Data Type (Numeric, "Continuous", Format( "m/d/y", 12 ), Input Format( "ddmmyyyy" ));
Hope that helps.
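If the dates do arrive as text, another way to avoid any format guessing is to split each string and build the date from its parts. The following is only a minimal sketch under stated assumptions: the imported table is the current data table, it has a character column named Date holding values such as 13.02.2019, and the new column name "Date Parsed" is made up here for illustration.

```jsl
Names Default To Here( 1 );
dt = Current Data Table(); // assumption: the imported table is the current one

// Build a numeric date from the day, month and year pieces of the text,
// so that no day/month order is ever inferred from the values.
dt << New Column( "Date Parsed",
    Numeric,
    "Continuous",
    Format( "d/m/y", 10 ),
    Formula(
        Date DMY(
            Num( Word( 1, :Date, "." ) ), // day
            Num( Word( 2, :Date, "." ) ), // month
            Num( Word( 3, :Date, "." ) )  // year
        )
    )
);
```

This only applies when the Date column is character; if the import has already turned it into a numeric date, Word() has no text to split and the damage is done before this step.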
The Excel Preview looks at the first 100 rows to determine the data type for the preview, for performance reasons. During the actual import operation, JMP looks at all the rows. If it sees data that leads it to a different conclusion after row 100, it can produce a different result. If you want the Preview to look at all the rows to produce what you will see on import, you can select the "Show all rows" option in the Preview Pane Refresh of the UI. If you want the import to look at only the first 100 rows when reading the data, to produce the behavior of the Preview, you can go to "Advanced Options" in the second pane of the import dialog and select "Limit column type detection".
JMP 15 will introduce the ability to force the numeric formatting types of individual columns to whatever the user chooses.
Thanks for the input. I tried your suggestion, but unfortunately, it did not work. Even in the preview pane when "limiting column detect" and showing all rows, JMP has an issue with reading in the format. It reads in the first 205 rows just fine, it's at row 206 where it switches format for some reason.
What doesn't make sense is that the formatting within Excel doesn't change from one row to the next. If it did, I can understand why JMP might read them in differently. All cells are formatted as "Date" dd.mm.yyyy in Excel. The same thing happens on the next tab that I'm importing, but at rows 52 to 53. I could understand if there was an issue in the original Excel file generation where the same rows across all tables had some glitch that saved the dates differently, but it's not even at the same location from tab to tab when JMP reads in the data.
The only formatting thing from the Excel side that I see could be that the "type" option in the "date" category starts with an *, see attached image below. But, this should update according to the OS settings. All dates in the column are formatted this way, so if JMP has an issue with one, it should have an issue with all.
I will try @gzmorgan0's suggestion for importing and see if that works.
That will be nice to have JMP force certain formatting types during import, especially when it comes to building a JSL code for automating the process.
Thanks for your thoughts and input. Unfortunately, this approach also doesn't work. The problem stems from JMP misreading the date format before it even imports it. Whether I run the script or go through the wizard GUI, JMP misclassifies the dates as either d.m.y or m.d.y format when they are actually the other one.
Even modifying your code for different variations in date format doesn't solve the problem. It simply doesn't import the mixed format and only treats one kind or the other as continuous, the others it'll just ignore. The JSL code won't import it as a nominal data type either -- it comes straight in as continuous.
I did a little more digging into the Excel file and the one common thing across all tabs that causes this mistake when I try to import the data is when it reads down the date column and goes from the date 12.02.2019 (12th Feb, 2019) to 13.02.2019 (13th Feb, 2019).
Here's my theory: I think JMP is mixing up the format for the dates before the 12th to after. My original date range is from 02 Jan 2019 to 28 Feb 2019: 02.01.2019 to 28.02.2019.
As JMP is reading down the date column, it appears to actually be interpreting the date 02.01.2019 as 01 Feb 2019, so when it gets to 12.02.2019, it's actually reading in the date as 02 Dec 2019. As a consequence, when it gets to 13.02.2019 (13 Feb 2019), it doesn't know how to read in the date since there is no 13th month. The column properties in JMP always show it as "continuous" modeling type, "d.m.y" format and "d.m.y" input format. It does this unless I split the import into two sections.
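The ambiguity in this theory can be reproduced in isolation with JSL's Informat() function. This is only a sketch of the idea, not a description of what the Excel importer does internally. The slash-separated strings are stand-ins for the dotted dates in the workbook, on the assumption that the separator is not what matters here.

```jsl
Names Default To Here( 1 );

// A date whose day is 12 or less parses under both orders,
// but to two different calendar dates.
asDMY = Informat( "12/02/2019", "d/m/y" ); // day first: 12 Feb 2019
asMDY = Informat( "12/02/2019", "m/d/y" ); // month first: 2 Dec 2019
Show( Format( asDMY, "ddMonyyyy" ), Format( asMDY, "ddMonyyyy" ) );

// A day above 12 has no valid month-first reading, so the
// month-first parse is expected to come back missing.
late = Informat( "13/02/2019", "m/d/y" );
Show( Is Missing( late ) );
```

If the log shows two different dates for the first pair and a missing value for the last one, that matches the pattern seen in the import: plausible but wrong values up to the 12th, and empty cells from the 13th on.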
If I split the import into two different parts, one from 02.01.2019 to 12.02.2019 and then from 13.02.2019 to 28.02.2019, JMP imports the data appropriately and assigns the column format appropriately. I can then concatenate the two into a correct data table. I don't know why it wasn't working in my original post, but it can work this way.
I can also confirm that there is no dependence on the data being correctly imported on what preferences I've set within JMP.
This is not so conducive to automation since the original file might change the location (row) of this 12th/13th date change. I guess I might have to make it work via the split route, though. If this is a bug in JMP, it would be great if this could be fixed.
JMP should honor formats that are applied specifically to a column in Excel. Have you gone into Excel and used the Format Cells dialog to explicitly set the column type to Date and then a European format like German d.m.y? In that case JMP should bring the data in correctly.
Hi Brian (@briancorcoran),
Yes, in fact the column is set to "date" with a German style d.m.y. format. Please check a previous post in this thread where I include a screen shot of the Excel column property window and the setting. Even my system setting is like that, see below.
dt:Date << Data Type (Numeric, "Continuous", Format( "m/d/y", 12 ), Input Format( "ddmmyyyy" ));
dt:Date << Data Type (Numeric, "Continuous", Format( "ddmmyyyy", 12 ), Input Format( "ddmmyyyy" ));
As stated in my response, this was very unexpected behavior and seems to be a bug to me. I am using JMP Pro 14.3.