JMP Blog

arati_mejdal · Jul 17, 2017 10:00 AM

Do you use JMP in one language and then share a data table with colleagues who use JMP in another language? Or do you ever have columns in your data table that contain data in more than one language? Those capabilities are made possible by the use of Unicode in our software.

When JMP Director of Development and data visualization expert Xan Gregg found out about the Adopt-a-Character program from Unicode, he thought JMP ought to be part of that since our software uses Unicode.

"Unicode has been a great asset for JMP and data sharing in general, and the Adopt-a-Character program is a great way to show our support," Xan says.

JMP began using Unicode in JMP 6 (released in 2005) for text processing and storage across multiple operating systems, and JMP is now offered in seven languages (English, Japanese, Simplified Chinese, Castilian Spanish, German, French and Italian).

Before Unicode, JMP relied on language-specific code pages that were stored with individual data tables. When a data table that had been created in one language was opened by someone using JMP in a different language, there was a problem. But Unicode solved that problem.
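To make the code page problem concrete, here is a minimal Python sketch. It does not show how JMP stores data. The code pages (`shift_jis` and `cp1252`), the sample strings and the helper names are assumptions chosen only for illustration. The sketch writes text under one legacy code page, reads it back under another, and then repeats the round trip with a Unicode encoding (UTF-8).

```python
def misread_with_wrong_code_page(text, writer_encoding, reader_encoding):
    """Encode text with one code page, then decode the bytes with another."""
    stored = text.encode(writer_encoding)
    # errors="replace" substitutes a placeholder for bytes the reader's
    # code page cannot interpret, instead of raising an exception.
    return stored.decode(reader_encoding, errors="replace")


def round_trip_utf8(text):
    """Encode and decode with the same Unicode encoding on both sides."""
    return text.encode("utf-8").decode("utf-8")


japanese = "データ"  # illustrative sample value

# Written under a Japanese code page, read under a Western European one:
garbled = misread_with_wrong_code_page(japanese, "shift_jis", "cp1252")
print("survived code page mismatch:", garbled == japanese)

# With Unicode, one encoding covers every language in the same "row":
mixed_row = ["data", "Größe", "データ"]
print("survived UTF-8 round trip:", all(round_trip_utf8(s) == s for s in mixed_row))
```

The mismatch case loses the original text because the same bytes mean different characters under each code page, while the UTF-8 round trip preserves English, German and Japanese values side by side.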

"You can open any JMP file in any language, because of Unicode," says JMP developer Michael Hecht, who has worked on JMP since its first version and is the resident Unicode expert. "You can also combine different languages in the same data table. You could have a column in English, another in German and a third in Japanese."

The Unicode Consortium assigns a unique code number to each character used in electronic text, and that number is the same regardless of platform, application or language. Its standards also include such things as statistical symbols, emoji and music notation. Unicode doesn’t just unify text across different languages; it also unifies the way different operating systems encode text.
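As a rough illustration of what "a unique code number" means in practice, the Python sketch below looks up the code point, official name and UTF-8 bytes for three characters: a statistical symbol, the lightbulb emoji and a music symbol. The helper name `describe` and the choice of sample characters are assumptions made for this example. The lookups use Python's standard `unicodedata` module.

```python
import unicodedata


def describe(ch):
    """Return (code point, official Unicode name, UTF-8 bytes in hex) for one character."""
    code_point = f"U+{ord(ch):04X}"
    name = unicodedata.name(ch)
    utf8_hex = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
    return code_point, name, utf8_hex


samples = [
    "\u03C3",      # Greek small letter sigma, common in statistics
    "\U0001F4A1",  # lightbulb emoji
    "\U0001D11E",  # musical G clef
]

for ch in samples:
    code_point, name, utf8_hex = describe(ch)
    print(f"{code_point:<8} {name:<28} {utf8_hex}")
```

The code point and name are the same on every platform. Only the byte encoding (UTF-8 here) is a storage choice layered on top.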

Hecht has a big Unicode book in his office, The Unicode Standard 4.0 (see below). It's a reference book that lists all the Unicode characters up until 2003.

[Photo: JMP developer Michael Hecht pages through The Unicode Standard 4.0 in his office. JMP recently adopted a Unicode character, the lightbulb emoji.]

The Unicode Standard is now up to 10.0. "Version 10.0 adds 8,518 characters, for a total of 136,690 characters. These additions include four new scripts, for a total of 139 scripts, as well as 56 new emoji characters," according to Unicode.

The sponsorship (via the adoption of a Unicode character) "directly funds the work of the Unicode Consortium in enabling modern software and computing systems to support the widest range of human languages," Unicode says.

After considering some statistical symbols and various emoji, we settled on sponsoring the lightbulb symbol. Why?

"It represents those transformative aha moments that our users -- who are scientists, engineers and other data explorers -- experience when they use JMP," says Diana Levey, Director of Marketing for JMP.

Beyond making this fun connection to discovery and innovation through data exploration, JMP is happy to support the important work of the Unicode Consortium.

"It's a huge vision," says Michael Hecht. "Look at the size of this book! Unicode has a massive ongoing task: always revising and supporting new languages and symbols."

Thanks, Unicode! Shine on!