JMP cleanup from Excel data that has some rows that need to be split into separate columns
Sep 5, 2018 9:01 AM
Recently, I needed to analyze data from a website that stored data in columns and rows. Every column cell in the main row related to the upper left cell (Title). However, each Title had two cells below it (the first cell in the subsequent two rows). One contained a description of the Title; the other contained content tags related to the Title.
Unfortunately, the web page was not set up to allow me to import as .html or .txt using JMP File>Internet Open. So, I cut and pasted into Excel. A JMP systems engineer colleague, Olivia Lippincott @olippincott, said that for Excel users she has met, the Excel format (Figure 1) is common and she often helps them with cleanup.
Multiple rows needed to be split into separate columns. This is how we did it.
Figure 1: Table of Excel data imported into JMPSecond, we created a new column called Title Type, with Column Properties Data Type = Numeric and Modeling Type = Nominal. We populated the cells with a repeated sequence of 1,2,3 by typing 1, 2, and 3 in the first three cells in Title Type column, highlighting them, and then selecting FILL>REPEAT SEQUENCE TO END OF TABLE. This gave us an ID column to use to identify the Title field pattern.
Figure 2: Add Title Type to identify the pattern 1-3 for the three Title TypesThird, we split the table (TABLE>SPLIT) to break Title Type 2 and Title Type 3 into two different columns. I saved the script to the data table, too, for reference.
Figure 3: Splitting Title Column into 3 columns by Title TypeFigure 4: JSL script to split Title Column into 3 columns by Title Type
Fourth, we moved some of the columns around, renamed title Title Type 2 to Description and Title Type 3 to Tags. Finally, we got rid of the leading Tags: text on the Tags column by recoding Tags: to a blank space using the Replace String option under the Recode red triangle.Figure 5: Accessing Column Recode using Replace String
Figure 6: Using Recode to replace consistent Tags: text found in each cell in a column
Figure 7. Final Videos Table ready for JMP analysis
The whole cleanup process using JMP took about 1 minute. Perhaps these steps will help you during your own data cleanup and preparation.