JMP and Excel - A Never ending story? - Part 2 cont.
Feb 16, 2017 4:50 PM
In this blog series about "JMP and Excel – A Never ending Story?" I spoke about importing spread sheet data, and described the first few typical challenges when importing data from spreadsheets (or sometimes also other sources). This time I will focus on dates and date formats, the third topic I listed in Part 2.
Day and Month will be wrongly interpreted
When you paste dates such as “10/11/2012” into a data table, JMP might need to resolve the ambiguity between m/d/y and d/m/y formats:
Is “10/11/2012” in October or November? If you assign a date format to the column, there is no ambiguity. BUT: special or some custom formats (e.g. formats with a * prefix) in Excel will not always be recognized
JMP’s default guess is m/d/y unless the first pasted value can only be a d/m/y date! To work around you either specify the proper date format upfront or have a distinct dummy date value in the first row, e.g. 13/1/99.
All changes and imports underlie the fact that values that do not match the format you choose will be set to missing. You will receive a JMP warning for that.
In some cases, JMP chooses a character format for the column. You can change the character column to a numeric data column later. Let’s discuss this a bit deeper: Column info lets you change the date format. There is an input and output format you could set. Be sure you make it right : Changing Date formats in JMP Column Info with Input and output format)
Please Note: Between Windows and Macintosh, the number of digits after a decimal point and the date format of imported data might differ. For example, “10/25/2012” might be formatted as “25Oct2012” on Macintosh. Columns might be imported as character columns on Macintosh but not on Windows.
As said, not all formats may be recognized, therefore the displayed imported data might differ from the expected. However, the full data will be in memory, so a change of the display will show the full information. If not specified upfront you also can change here the date format to a time or duration format or vice versa
In some cases the easiest way would be to change the display language preference setting in JMP e.g. to English or German. You can see the "how" in my previous blog post. But not always this resolves all cases.
Date format not recognized
Sometimes date formats are hard to recognize and JMP will import them as character. Examples can be customized dates including calendar weeks, quarters and others. In such cases you might need to create new columns with a formula to extract the specific information for further analysis. I want to show an example:
“KW14/2016” could be transferred into two new columns “weekNr” and “year” using the formula editor or JMP Scripting Language. The Formula Editor is a good friend for those tasks, as well the WORD or other character functions, of course you could use REGEX as well, usually this needs some familiarity with this function. The Year you can simply extract using this script (or analog in the Formula editor):
Num(Word(2, :MyDate, "/"))
Also other ways you might find valuable, I leave that up to your preferences.
Define your day, month and other date information
A different approach of transferring date data imported as characters is to use Brady Brady’s “Data Table Tools” Add-in. There you’ll have a nice functionality to handle these cases. Let’s have a brief look at it, more you’ll find in the description and blogs from Brady itself. Let’s assume you have data like in this data table:
The date column is a character format, as the format is not known by JMP. Changing the data and input format will therefore not work here resulting in missing values (no matter what language setting you are using). In the Add-in menu select "Data Table Tools" => "Special Formula Columns" => "Custom Date Formula Writer" and then do the following:
Select Table and column to change
Define the delimiters, here the underscore, it will replicate to other places in the string (Note that you can choose whatever you like interactively – my personal favorite)
Specify digit format for the year (e.g. important if you have a two-digit year)
Map each fraction to what it should represent (day, minute, year, …)
Important for the month is the format it has (Numeric, 3-Letter or complete name)
Tell what the output format should be and then press “Build Formula Column”.
The result is clean numeric date data in a new formula column “NewDate” where you can look at the formula and see how the conversion could have been done manually
That helped me already in a plenty of situations. Though not in all. First it didn’t work with local names as Okt for October in German. Second I had one case where the data from a database was not read in correctly and the result was already missing values after the import. There we could work around using the Language setting change I mentioned already. There might have been also the possibility to check the database connection and how it grabbed the data, but this is much more to dig into and hard if it is not your machine you could easily share with development and IT :)
What comes next?
I hope this will be useful information for your work and I’m happy for your feedback on how those hints work out for you. In my next two and probably last blog post for this JMP and Excel series I will talk about data quality issues in the data to be imported. Especially where should I start with proofing and editing the data – in Excel or after import. JMP or JMP, that’s the question ;)
Also I will talk about best practices importing data into JMP and further resources which will help as reference.
What is your experience? What challenges do you face?
I’m happy to learn from you and see how you have overcome those challenges and if my suggestions help you to reduce your challenges importing (spreadsheet) data.