Using the Extract Segment tool in the Recode platform
Oct 29, 2019 10:32 AM
| Last Modified: Nov 19, 2019 6:31 AM
Recode has always been a great tool for cleaning up messy data, but it’s also useful for extracting data embedded in text columns. JMP 15 adds the flexible Extract Segment tool, and the platform has been enhanced to generate semantic scripts in many cases so that the script or formula can be executed against new data.
For example, let’s take the sample data set Drosophila Aging Distances. Say we want to get the WK number out of the Observation ID column and put it in a separate numeric column. We notice that the field has a regular pattern where the WK number is the third field.
Step 1. Recode the Observation ID column
Step 2. From the red triangle menu, choose Advanced->Extract Segment...
Note that the Text to Capture section shows that there is only one word in our sample text. However, for this purpose, our words are delimited by the underscore character. We’ll add the underscore as a delimiter.
Once we add the delimiter, we can see that the first sample contains four words. I have used the slider to move the selection to the third word. You can see in the Preview area that we are now capturing the WK portion of the string. We use the OK button to finalize this step in our process.
Step 3. Now that we have isolated our desired data, let’s use Extract Segment again to peel off everything but the numeric portion. If we remove all delimiters, then each character is treated like a word. Notice that I have selected the third word. However, I have also switched the end to point to the end of the string (a negative number indicates we are counting from the end of the string). We only have a single digit in this case, but if we had multiple we would be capturing from the third word to the end of the string.
Step 4. We now have the digits as strings. We can go to the red triangle menu again and choose Parse as Numbers to retrieve the numbers.
Step 5. Save to a new column or a script. Note in this script that it captured the three steps performed and can be run against new data. The Recode Column portion is very similar to the Recode command that would be written out to a formula column.
The resulting column looks like this:
Get Drosophila Aging Distances, give it try and let me know what you think.