Just look at the list of Character functions available in the JMP Formula Editor.
Not that anyone is counting, but that's 42 functions! Needless to say, if you've got a character column and want to change it somehow, JMP probably has a function to do it for you.
Notice, though, that nine of those functions are all – or mostly – about extracting some portion of a character value. Admittedly, this is one of the most common operations – extracting some part of a larger string of characters. This is known as a substring.
The most obvious function for this is the Substr() function, but this function requires that you know exactly where the part you're interested in lies in the larger string. It also means that the part you want has to be in the same place in every string.
That's why I love the Word() function. It's amazingly flexible and works the way that I think.
The Word() function takes two required arguments and one optional.
Word(n, "text", <"delimiters">)
The Word() function divides the text up into words based on the delimiters you specified and returns the nth word. If you don't specify the delimiters argument, then the default delimiter is the same one we use to separate words in sentences: a space.
As an example, consider the following column of days:
Day |
December 6 |
December 16 |
September 28 |
May 19 |
September 29 |
September 11 |
March 14 |
November 13 |
October 27 |
January 7 |
Separating the month is easy with the Word() function. You just want the first word of each string.
Word( 1, Day )
Month |
December |
December |
September |
May |
September |
September |
March |
November |
October |
January |
That works well in many cases, but the real power comes when you think of using other characters as delimiters to divide a string up into "words."
Here's another list. These are locations. This time the latitude and longitude is embedded in the string behind the city name.
Location |
1600 NEW BERN AVE Raleigh, NC [35.779892815000494, -78.61572674699966] |
3800 NEW BERN AVE Raleigh, NC [35.79777009600048, -78.57633099599968] |
3200 CAPITAL BLVD Raleigh, NC [35.822875239000496, -78.58848179699964] |
700 VARDAMAN ST Raleigh, NC [35.77016025800049, -78.61995064499968] |
2900 BASSWOOD DR Raleigh, NC [35.74121031600049, -78.60477047599966] |
1300 CURFMAN ST Raleigh, NC [35.76619700500049, -78.65143264899967] |
8500 SWARTHMORE DR Raleigh, NC [35.89068092800045, -78.60012101699965] |
1200 CANYON ROCK CT Raleigh, NC [35.76156160400046, -78.55683519199965] |
800 GREENWICH ST Raleigh, NC [35.768321829000456, -78.59393748999963] |
300 ROSELLE CT Raleigh, NC [35.788000157000454, -78.58367178499964] |
It would be useful to have latitude and longitude in another column.
If you think of the square brackets as delimiters, this is pretty simple. You just want the second "word."
Word(2, Location, "[]")
Now you've got a column (LatLong) of strings that have latitude and longitude separated by a comma and a space.
LatLong |
35.779892815000494, -78.61572674699966 |
35.79777009600048, -78.57633099599968 |
35.822875239000496, -78.58848179699964 |
35.77016025800049, -78.61995064499968 |
35.74121031600049, -78.60477047599966 |
35.76619700500049, -78.65143264899967 |
35.89068092800045, -78.60012101699965 |
35.76156160400046, -78.55683519199965 |
The Word() function comes to the rescue again. This time, we want to use comma and space as the delimiters. If we use only a comma, then the longitude string will have a space at the start. Using both delimiters gives us strings with no space.
Latitude
Word(1, LatLong, ", ")
Longitude
Word(2, LatLong, ", ")
These should be cleaned up further by converting them to numbers using the Num() function, but that's an exercise for another post.
The key to taking advantage of the Word() function is to view your strings as "sentences" and think abstractly about choosing delimiters.
Here's a final example. Suppose you have a column of lot numbers (LotNo) with a name embedded in the middle:
LotNo |
A30941ALTE69073172 |
A82886ILEAK51521781 |
A54715XSURVXC88997379 |
A7860BATTBC97523479 |
A79004ILEAK52159080 |
A59508ABAEG58962051 |
A44172ILEAK28255713 |
A21319XSURVXC88014800 |
A58308BATTBC49519851 |
A19412C83938451 |
This is difficult since the name doesn't occur in the same position in the string in each row. If you think of the numbers the same way you think of spaces when reading, then the name becomes the "second" word.
Word(2, LotNo, "0123456789")
NAME |
ALTE |
ILEAK |
XSURVXC |
BATTBC |
ILEAK |
ABAEG |
ILEAK |
XSURVXC |
BATTBC |
C |
If you have time to learn only one of the 42 character functions in the formula editor, I'd start with the Word() function.