Hello! Welcome to today’s Need For Speed! Today is all about Fixing Structural Errors!
My name is 2-Click Clovis, and I am truly passionate about data analysis and
most important, time efficiency! JMP was an integral tool for my previous work in the semiconductor and manufacturing industries. Since joining JMP, I have learned so many new tips and tricks. I’ll never forgive myself for not knowing what I now know because I could have saved so many hours in my data analysis workflow! I see it as my current responsibility share my newfound knowledge with all current JMP users to help them regain their precious time.
After I demonstrate how quickly JMP can perform your routine data manipulation and analyses compared to other tools, I’ll show you the quickest way to get it done within JMP.
Fixing Structural Errors
Have you ever fantasized taking a baseball bat and completely obliterating your work laptop? That’s a rhetorical question, because I already know you have. No need to lie to me.
Well, I definitely have…
I used to have that exact fantasy each time I opened a spreadsheet to find it filled with messy data. I just wanted to do data analysis! I did not want to spend hours fixing typos, inconsistent capitalization, or dealing with different notations of the same value.
When I discovered JMP’s Recode platform, an hour of data cleanup became seconds! I went from being an ideal candidate for anger management therapy to being cool as a cucumber.
Fixing Structural Errors for One Column
Structural errors are when you measure or transfer data and notice strange naming conventions, typos, or incorrect capitalization. These types of errors often occur for user-inputted data, when multiple users have access to the same spreadsheet, or when joining tables from different measurement sources. These types of errors can really skew your models and make them inefficient, so they need to be addressed immediately!
In the example below, I am interested in a Particle Size variable where I should only have values of Large, Medium, and Small. However, you will notice that I have some typos (ex: Larg, Mediumm, Smaall, etc.) and different notations for the same value (ex: S, s, and small for Small). Each of these would be considered separate and unique in your analyses, which is unacceptable!
With JMP, there is no need to manually go through each row and fix the typos or standardize the notation style. Just right-click the column of interest and select Recode.
A new window appears. In this window, you will have all your current values displayed under the Old Values column. In this example, we have 13 different values for Particle Size when there should only be three. You will see the number of appearances for each value in the Count column. For example, we have 26 instances of the value Large in our table.
There is a drop-down menu in the top-left corner of the window, where you can choose to create a new column with new values, a formula column, or replace the values in the current column with In Place. For this example, we will be using In Place.
We can select multiple of the old values and group them together by clicking on the Group box to the top-right of window. The default new value will be the old value in the group that had the highest count. However, you can always change the new value by clicking on it and typing something different.
In the example below, watch as I group the 13 old values into three new values of Large, Medium, and Small.
When satisfied with the new values, click on the Recode box at the bottom-right of window; the table updates to reflect the changes.
Tip: Clicking on the red arrow at the top-left of the Recode window has shortcuts for fixing common structural errors such as converting characters to title case/lowercase/uppercase, removing whitespace or punctuation, extracting or replacing part of a string .
Fixing Structural Errors for Multiple Columns
What if you have the same type of structural error across multiple columns? JMP has a shortcut: Standardize Attributes! All you need to do is right-click the columns of interest, select Standardize Attributes, and click on Recode.
In the example below, we have several categorical variables that have structural errors. The presence or lack of an underscore to our strings is creating separate unique values.
An easy way to standardize the values and solve our problem would be to use the Replace String… option of Recode under the red triangle. You can choose to replace the space with an underscore or vice versa.
In the example below, watch as I use the Standardize Attributes feature and the Replace String… option to clean the data and replace a space with an underscore.
For previous Need for Speed posts, click here!
On that note,
is clicking out!
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.