If you never have to deal with missing data, please count your blessings; there is nothing to see here today! However, for those of us who are not so lucky, I’ve got some good news: throughout JMP 11, we have added capabilities for dealing with missing data. The Reliability Growth platform now handles missing data. And in JMP Pro, the Partition platform and most Fit Model personalities now have an “informative missing” option, while Partial Least Squares supports imputation.
But — what if you want to preprocess a data set, performing imputations before you begin modeling? For that, I’ve uploaded a new add-in to the JMP File Exchange: the Imputation Add-In for JMP Pro. (Download requires a free SAS profile.)
The add-in allows you to choose from a variety of statistics and pre-processing options, and perform imputations for one or several variables, all in one step. Its interface is modeled closely after the Summary platform, so you’ll find it easy to use. I’ll also go through an example that will illustrate how to use the add-in and explain its output.
The example will use the “Missing Car Physical Data.jmp” data set. This data set was formed by deleting some of the data from the “Car Physical Data.jmp” data set included in the sample files in JMP. I’ve placed it in the sample data section of the file exchange in case you want to work through these examples.
After opening the file and using the Columns Viewer (a great new platform in JMP 11; give it a try!), we see that each column is missing data:
In this example, we will group by Country, imputing for the Displacement variable in three different ways:
Mean
Trimmed mean (20% total trim)
95% upper confidence limit for the mean
To impute the mean Displacement, grouping by Country, we cast Country into the Group role, then select Displacement and choose Mean from the drop-down menu:
Once you’ve done this, the panel next to the drop-down will be populated with your choice:
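If it helps to see the idea outside the dialog, here is a minimal Python sketch of group-wise mean imputation. It is not the add-in's code; the rows, the values, and the function names are all made up for illustration, and None stands in for a missing cell.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (Country, Displacement); None marks a missing value.
rows = [
    ("USA", 300.0),
    ("USA", None),
    ("USA", 200.0),
    ("Japan", 100.0),
    ("Japan", 140.0),
    ("Japan", None),
]

def group_means(rows):
    """Mean of the non-missing values within each group."""
    observed = defaultdict(list)
    for group, value in rows:
        if value is not None:
            observed[group].append(value)
    return {group: mean(values) for group, values in observed.items()}

def impute_group_mean(rows):
    """Replace each missing value with the mean of its own group.

    A group with no observed values has no mean, so its rows stay missing.
    """
    means = group_means(rows)
    return [(g, means.get(g) if v is None else v) for g, v in rows]

imputed = impute_group_mean(rows)
for country, displacement in imputed:
    print(country, displacement)
```

Observed values pass through unchanged; only the missing cells pick up their group's mean.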
To impute the trimmed mean,
Select “Trim” from the Trim/Winsorize drop-down.
Choose the level of trimming. Here we trim a total of 20% of the data (i.e., the central 80% of the data will be used) by entering a value of 0.20.
Select Displacement in the columns menu and choose “Mean” from the Imputation Method drop-down.
Notice that T[0.2]-> appears before the Mean(Displacement) text, to indicate that the data will receive a 20% pre-processing trim before computing the mean.
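To make the trimming step concrete, here is a rough Python sketch of a trimmed mean with a 20% total trim. I am assuming the trim is symmetric (10% from each tail) and that the count dropped per tail is rounded down; the add-in may handle rounding differently. The function name and the sample values are hypothetical.

```python
import math
from statistics import mean

def trimmed_mean(values, total_trim=0.20):
    """Mean after dropping total_trim / 2 of the observations from each tail.

    Assumption: the number dropped per tail is floor(n * total_trim / 2).
    Missing values (None) are ignored before trimming.
    """
    if not 0 <= total_trim < 1:
        raise ValueError("total_trim must be in [0, 1)")
    data = sorted(v for v in values if v is not None)
    if not data:
        return None
    # Small epsilon guards against floating-point results such as 0.9999999.
    k = math.floor(len(data) * total_trim / 2 + 1e-9)
    return mean(data[k:len(data) - k])

# Hypothetical sample with one extreme value and one missing value.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0, None]
print(mean(v for v in sample if v is not None))  # ordinary mean
print(trimmed_mean(sample, 0.20))                # mean of the central 80%
```

With ten observed values, a 0.20 trim drops the single smallest and single largest value, so the extreme value no longer pulls the mean upward.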
To impute the upper 95% confidence limit for the mean,
First, select “No” from the Trim/Winsorize drop-down. Otherwise, the data will be trimmed as a preprocessing step.
Next, ensure that 0.95 is entered in the CI Confidence box.
Select Displacement in the columns menu and choose “Upper CI Mean” from the Imputation Method drop-down.
Click OK to proceed with the imputation.
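For readers curious about what an upper confidence limit for the mean looks like numerically, here is a small Python sketch. Two loud assumptions: I treat the limit as the upper endpoint of a two-sided interval, and I use a normal quantile as a stand-in for the Student's t quantile because the Python standard library has no t distribution. With small groups, a t-based limit would be somewhat larger than what this returns. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def upper_ci_mean(values, confidence=0.95):
    """Approximate upper confidence limit for the mean.

    Assumptions: two-sided interval, and a normal quantile substituted
    for the t quantile (an approximation, not the exact t-based limit).
    """
    data = [v for v in values if v is not None]
    if len(data) < 2:
        return None  # the standard deviation needs at least two values
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return mean(data) + z * stdev(data) / sqrt(len(data))

# Hypothetical Displacement values for one group, with one missing.
group = [10.0, 12.0, 14.0, None, 16.0, 18.0]
print(upper_ci_mean(group, 0.95))
```

Imputing an upper limit rather than the mean is a deliberately conservative choice: missing cells are filled with a value toward the high end of where the group mean plausibly lies.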
Two tables are produced. The first is a summary table, listing the values that will be imputed, by grouping level:
The next table is a copy of the original table, with additional columns from the imputation(s). Imputation columns are grouped with the original column for easy reference. Here, we see that four new columns have been created:
An indicator column, Impute_Flag(Displacement), set to 1 if imputation was performed (i.e., if the value was missing in the original data) and 0 otherwise.
A column for each of the three imputation methods we requested.
Using the Columns Viewer platform on this new table, we see that, as expected, the three imputed columns contain no missing values:
Further inspecting the new columns, we see that when the imputation flag is 0, the original value is used. When the imputation flag is 1, we insert the appropriate statistic (by group level, if grouping was used):
The columns are formula-based, so if you change the value of Country in a given row, or delete (or enter) the Displacement in a given row, the flags and imputed values update automatically. This allows these columns to be used intelligently by the prediction formulas produced by modeling platforms.
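Here is one way to picture that formula-based behavior in Python: instead of storing imputed values, each "column" is a function evaluated against the current state of the table. This only illustrates the idea and says nothing about how the add-in builds its JMP formulas; the table contents and function names are hypothetical.

```python
from statistics import mean

# Hypothetical table: each row is a dict; None marks a missing Displacement.
table = [
    {"Country": "USA", "Displacement": 300.0},
    {"Country": "USA", "Displacement": None},
    {"Country": "Japan", "Displacement": 100.0},
    {"Country": "Japan", "Displacement": 140.0},
]

def impute_flag(table, i):
    """1 if row i is currently missing Displacement, else 0."""
    return 1 if table[i]["Displacement"] is None else 0

def imputed_mean(table, i):
    """Original value if present, otherwise the current mean of the row's group."""
    row = table[i]
    if row["Displacement"] is not None:
        return row["Displacement"]
    peers = [r["Displacement"] for r in table
             if r["Country"] == row["Country"] and r["Displacement"] is not None]
    return mean(peers) if peers else None

# Row 1 is missing, so it takes the mean of the observed USA values.
before = (impute_flag(table, 1), imputed_mean(table, 1))

# Changing the group re-evaluates the "formula" against the Japan rows.
table[1]["Country"] = "Japan"
after_regroup = (impute_flag(table, 1), imputed_mean(table, 1))

# Entering a value clears the flag and the original value is used.
table[1]["Displacement"] = 250.0
after_entry = (impute_flag(table, 1), imputed_mean(table, 1))

print(before, after_regroup, after_entry)
```

Because nothing is cached, every edit to Country or Displacement is reflected the next time the flag or the imputed value is read.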
The add-in also has help buttons to make it easy to get started: Click on any of the “?” buttons to access help text.