Have you ever experienced an issue importing data from Excel or other sources? I bet you are not alone, at least I had mine. In my previous blogs Part 1 and Part 1 cont. I showed in general how to import spreadsheet data into JMP. Let’s now talk about typical challenges importing data I come across over and over again, both working with data on my own or when listening to our customers and users.
Most often I faced challenges when the system of the creator of the file has different language settings than the receiver of the file aka I. E.g. someone in Germany gets a file from someone in the US. Then you probably get in trouble with the dates or numbers format. Although this is not only a challenge restricted to JMP Users, it can be frustrating. Changing system settings would often do the trick but is not the preferred way for several reasons – e.g. it will affect also other software. So what can you do?
What is your experience? What challenges do you face?
Living in Germany (and that’s true for most non-US countries) there are some other typical issues with Umlaut, special characters, comma vs. decimal separated floating point, date formats (especially month vs. day defaults) and I’m sure you know more. If so please post them as comments so we can collect those and check if we can find ways to improve importing international data – as I want to call it.
However, now I will try to describe how I dealt with several of those. At all they are meant to be suggestions and may not be valid for all cases. However, I found them quite helpful in most of the cases I worked with international data.
Floating point data will be separated in comma instead of decimal points or vice versa
Two Digit years are wrongly imported
Date format will not be recognized or Day and Month will be wrongly interpreted
Floating Point Numbers
In US and other English based countries you will find a number format separated with decimals rather than comma, which is commonly used in European countries. Why is this important?
If the language setting aligns with the decimal format (US with decimals, GER with comma, …) the data will be imported and displayed correct as continuous and like in the source table. If the language setting differs, e.g. US number format in the spreadsheet and GER setting in JMP and/or the system, you will recognize that the data type changes from continuous to nominal character.
Usually you would use Column Info and change the Data Type from character to numeric and Modeling Type from nominal to continuous. In this case this will result in empty data as the Language setting cannot interpret the comma or the decimal, depending on the language.
So there are two things you could do to work around this:
Change the display language setting and use the checkbox to use JMP’s language rather than the system’s one.
Use a script to substitute the comma/decimal and change then the column information appropriately
How to do this?
In JMP you can change your language preference settings to handle different language format for numbers, dates or currencies (I will take care of Windows systems, on Mac you need to change the Mac systems language setting (at least up to today):
In Windows: Under File à Preferences select “Windows Specific”
Changing your Display language will only change your displayed language for menu, headers and statistics
Checking the small box will allow JMP to use the selected language also for numbers, dates and currency formats rather than the systems language settings.
This will address most of the time conversion issues getting files from outside Europe with other defaults like points instead of comma.
You can use JSL to set (de)activate this checkbox, but not to change the display language itself (therefore this will be just half the way to solution):
Preferences(Use JMP Locale Settings( 1 ))
I have written two scripts, depending what language setting you have and what you want to convert into. They can be found in the file exchange including a guide on how to use the scripts. I have added them to my menu list, so I can use them whenever it’s necessary: Substitute_Decimal2Comma.jsl and Substitute_Comma2Decimal.jsl
Two Digit Year numbers
In case of two-digit year numbers you should take care that JMP recognize the correct 100 years. To do so there is a setting for the underlying two-digit year rule JMP should refer to:
Under File à Preferences select “Text Data Files”
Changing the Two-digit year rule will have impact on your date interpretation
Select the 100-year range in wich your dates fall.
For example, if the earliest date is 2/2/79, and the year is 1979, select 1970-2069. If the earliest date is 2/2/12, and the year is 2012, select 2000-2099 and not 1910-2009.
If dates span centuries, you must recode the dates with four-digit years before importing the data!
The third topic of date formats will be part of my next blog. Please share your experiences using the comment option below. I also would like to know if you found the scripts valuable. Thank you!