brady_brady (Staff; joined Jun 9, 2012)

New Imputation Add-In for JMP Pro

If you never have to deal with missing data, please count your blessings; there’s nothing to see here today! For those of us who are not so lucky, I’ve got some good news. JMP 11 adds capabilities for dealing with missing data in several places. The Reliability Growth platform now handles missing data. In JMP Pro, the Partition platform and most Fit Model personalities now have an “informative missing” option, and Partial Least Squares supports imputation.

But what if you want to preprocess a data set, performing imputations before you begin modeling? For that, I’ve uploaded a new add-in to the JMP File Exchange: the Imputation Add-In for JMP Pro. (Download requires a free SAS profile.)

The add-in lets you choose from a variety of statistics and preprocessing options and impute one or several variables in a single step. Its interface is modeled closely on the Summary platform, so it should feel familiar. Below, I walk through an example that illustrates how to use the add-in and explains its output.

The example uses the “Missing Car Physical Data.jmp” data set. This data set was formed by deleting some of the data from the “Car Physical Data.jmp” data set included in the JMP sample files. I’ve placed it in the sample data section of the File Exchange in case you want to work through these examples.

After opening the file and running Columns Viewer (a great new platform in JMP 11; give it a try!), we see that every column has missing values:
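If you want to reproduce this kind of missing-value check outside JMP, here is a minimal Python sketch. It assumes the table is held as a list of dicts with `None` marking a missing cell; that layout and the toy rows are illustrative only, not the contents of the sample file.

```python
def missing_counts(rows):
    """Count missing (None) cells in each column of a list-of-dicts table."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            # A bool adds as 0 or 1, so this tallies the missing cells.
            counts[column] = counts.get(column, 0) + (value is None)
    return counts


# Hypothetical rows: column names borrowed from the example, values made up.
rows = [
    {"Country": "USA", "Displacement": 300},
    {"Country": None, "Displacement": 180},
    {"Country": "Japan", "Displacement": None},
    {"Country": "Japan", "Displacement": None},
]
print(missing_counts(rows))  # {'Country': 1, 'Displacement': 2}
```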

In this example, we will group by Country, imputing for the Displacement variable in three different ways:

  • Mean
  • Trimmed mean
  • 95% upper confidence limit for the mean
To impute the mean Displacement, grouping by Country, we cast Country into the Group role, then select Displacement and choose Mean from the drop-down menu:

Once you’ve done this, the panel next to the drop-down will be populated with your choice:
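The statistic being imputed here is simply the mean of the non-missing Displacement values within each Country. As a minimal Python sketch of that computation (not the add-in’s implementation), assuming a list-of-dicts table with `None` for missing cells and made-up toy values:

```python
from collections import defaultdict
from statistics import mean


def group_means(rows, group_col, value_col):
    """Mean of the non-missing values of value_col within each level of group_col."""
    by_group = defaultdict(list)
    for row in rows:
        value = row[value_col]
        if value is not None:  # missing values are skipped, not counted as zero
            by_group[row[group_col]].append(value)
    # A level whose values are all missing gets no entry at all.
    return {level: mean(values) for level, values in by_group.items()}


# Toy data for illustration only (not taken from the sample file).
rows = [
    {"Country": "USA", "Displacement": 300},
    {"Country": "USA", "Displacement": None},
    {"Country": "USA", "Displacement": 200},
    {"Country": "Japan", "Displacement": 120},
    {"Country": "Japan", "Displacement": None},
]
means = group_means(rows, "Country", "Displacement")
print(means)  # {'USA': 250, 'Japan': 120}
```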

To impute the trimmed mean:

  • Select “Trim” from the Trim/Winsorize drop-down.
  • Choose the level of trimming. Here we trim a total of 20% of the data (i.e., 10% from each tail, so the central 80% of the data is used) by entering a value of 0.20.
  • Select Displacement in the columns menu and choose “Mean” from the Imputation Method drop-down.
  • Notice that T[0.2]-> appears before the Mean(Displacement) text, indicating that the data will receive a 20% preprocessing trim before the mean is computed.
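The trimming arithmetic can be sketched in Python. A total trim of 0.20 drops 10% of the sorted, non-missing values from each tail before averaging. How the add-in rounds when that fraction is not a whole number of rows isn’t stated here, so this sketch assumes rounding down. SciPy’s `scipy.stats.trim_mean` also rounds down, but it takes the per-tail proportion (0.10 here), not the total.

```python
import math


def trimmed_mean(values, total_trim=0.20):
    """Mean after dropping total_trim / 2 of the sorted data from each tail.

    Assumption: the per-tail count is rounded down; the add-in's exact
    rounding rule may differ.
    """
    data = sorted(v for v in values if v is not None)  # ignore missing cells
    k = math.floor(len(data) * total_trim / 2)  # points dropped from each tail
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


# Toy values with one outlier and one missing cell.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100, None]
print(trimmed_mean(values))  # 5.5 (the untrimmed mean would be 14.5)
```

With ten non-missing values, one value is dropped from each end (1 and 100), so the outlier no longer pulls the imputed value upward.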
To impute the upper 95% confidence limit for the mean:

  • First, select “No” from the Trim/Winsorize drop-down. Otherwise, the data will be trimmed as a preprocessing step.
  • Next, make sure 0.95 is entered in the CI Confidence box.
  • Select Displacement in the columns menu and choose “Upper CI Mean” from the Imputation Method drop-down.
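For reference, the usual t-based upper limit of a two-sided confidence interval for the mean is x̄ + t(1 − α/2, n − 1) · s/√n. The Python sketch below assumes that is what “Upper CI Mean” computes; the add-in’s exact formula is not given here. To stay within the standard library, it finds the t quantile numerically. With SciPy available, `scipy.stats.t.ppf(0.975, n - 1)` gives the same quantile.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: Simpson's rule on [0, x], plus 0.5 by symmetry."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile of Student's t for p > 0.5, found by bisection on t_cdf."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def upper_ci_mean(values, confidence=0.95):
    """Upper limit of the two-sided t interval for the mean of non-missing values."""
    data = [v for v in values if v is not None]
    n = len(data)
    if n < 2:
        raise ValueError("need at least two non-missing values")
    t = t_quantile(1 - (1 - confidence) / 2, n - 1)
    return mean(data) + t * stdev(data) / math.sqrt(n)


print(upper_ci_mean([10, 12, 14, None]))  # about 16.97
```

Unlike the mean or trimmed mean, this value sits deliberately above the center of each group, so it is a conservative (high-side) fill for missing values.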
Click OK to proceed with the imputation.

Two tables are produced. The first is a summary table listing the values that will be imputed for each grouping level:

The second table is a copy of the original table with additional columns from the imputation(s). Imputation columns are grouped with the original column for easy reference. Here, we see that four new columns have been created:

  • An indicator column, Impute_Flag(Displacement), set to 1 if imputation was performed (i.e., if the value was missing in the original data) and 0 otherwise.
  • A column for each of the three imputation methods we requested.

Running Columns Viewer on this new table, we see that, as expected, the three imputed columns contain no missing values:

Inspecting the new columns further, we see that when the imputation flag is 0, the original value is used. When the imputation flag is 1, the appropriate statistic is inserted (by group level, if grouping was used):
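The flag-then-fill rule just described can be sketched in Python. This illustrates the logic only, not the add-in’s code: the list-of-dicts layout, the toy data, and the imputed column name `Mean(Displacement)` are assumptions. Only the `Impute_Flag(Displacement)` name comes from the add-in’s output.

```python
from collections import defaultdict
from statistics import mean


def impute_by_group(rows, group_col, value_col, stat=mean, label="Mean"):
    """Return copies of rows with a 0/1 flag column and an imputed column added."""
    # Statistic of the non-missing values within each group level.
    groups = defaultdict(list)
    for row in rows:
        if row[value_col] is not None:
            groups[row[group_col]].append(row[value_col])
    fill = {level: stat(values) for level, values in groups.items()}

    result = []
    for row in rows:
        new_row = dict(row)
        missing = row[value_col] is None
        new_row[f"Impute_Flag({value_col})"] = 1 if missing else 0
        # Flag 0 keeps the original value; flag 1 takes the group's statistic
        # (None if the group has no non-missing values to summarize).
        new_row[f"{label}({value_col})"] = (
            fill.get(row[group_col]) if missing else row[value_col]
        )
        result.append(new_row)
    return result


rows = [
    {"Country": "USA", "Displacement": 300},
    {"Country": "USA", "Displacement": None},
    {"Country": "USA", "Displacement": 200},
    {"Country": "Japan", "Displacement": 120},
    {"Country": "Japan", "Displacement": None},
]
imputed = impute_by_group(rows, "Country", "Displacement")
for r in imputed:
    print(r)
```

Passing a different `stat` (for example, a trimmed-mean function) produces the other imputation columns with the same flag.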

The columns are formula-based, so if you change the value of Country in a given row, or delete (or enter) the Displacement value in a given row, the flags and imputed values update automatically. This allows these columns to be used intelligently by the prediction formulas produced by modeling platforms.
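In Python terms, a formula column behaves like a value computed on every read rather than stored once. The class below is only an analogy for JMP formula columns, with a hypothetical `FormulaTable` name and toy data. It shows why edits to Country or Displacement flow through to the flag and the imputed value without re-running anything.

```python
from statistics import mean


class FormulaTable:
    """Toy table whose flag and imputed values are recomputed on each access."""

    def __init__(self, rows):
        self.rows = rows

    def flag(self, i):
        return 1 if self.rows[i]["Displacement"] is None else 0

    def imputed(self, i):
        row = self.rows[i]
        if self.flag(i) == 0:
            return row["Displacement"]
        # Nothing is cached: the group mean reflects the table as it is now.
        same_group = [
            r["Displacement"]
            for r in self.rows
            if r["Country"] == row["Country"] and r["Displacement"] is not None
        ]
        return mean(same_group) if same_group else None


table = FormulaTable([
    {"Country": "USA", "Displacement": 300},
    {"Country": "USA", "Displacement": None},
    {"Country": "USA", "Displacement": 200},
    {"Country": "Japan", "Displacement": 120},
])
before = table.imputed(1)             # 250: mean of the USA rows
table.rows[1]["Country"] = "Japan"    # change the group of row 1
after_regroup = table.imputed(1)      # 120: now the Japan mean
table.rows[1]["Displacement"] = 150   # enter a value in row 1
after_entry = (table.flag(1), table.imputed(1))  # (0, 150): original value kept
```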

The add-in also has help buttons to make it easy to get started: click any of the “?” buttons to access help text.

You can download this add-in (and many others) for free at the JMP File Exchange. (A free SAS profile is required.)

Happy imputing!

2 Comments

Ursula Garczarek (Community Member) wrote:

Dear Brady, I'd love to use the tool, but I seem to be missing a very important piece of information: when I download the file, it comes as a .zip containing three files, one of which is a .jsl file that looks like machine code when I open it in an editor. I tried renaming the .zip to .addin and similar things, but that does not work. Can you send me a link to instructions on how I can actually install this add-in? Kind regards,

Ursula

Brady Brady wrote:

Hi Ursula,

We have seen this problem before, although we are not sure why it happens. If you are using Windows (we've not heard of this happening with Macs), here is the fix:

1. Download the file; from your post, it looks like you have already done this.

2. Find the file using Windows Explorer, but do not extract the .zip file; leave it intact.

3. In the Windows Explorer folder options, ensure that the box for "Hide extensions for known file types" is NOT checked.

4. Rename the file so that its extension is .jmpaddin instead of .zip. (Note: if you did not perform step 3, the file will now APPEAR to have the extension .jmpaddin, but its name will actually end in .jmpaddin.zip, so your problem will persist.)

5. Double-click the file, or open it from within JMP.
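If you'd rather not change the Explorer setting, the rename in step 4 can also be done with a short Python script, because `pathlib` always sees a file's real extension regardless of what Explorer displays. This is a sketch of the same rename, not an official installer; the file name in the usage comment is hypothetical.

```python
from pathlib import Path


def zip_to_jmpaddin(path):
    """Rename X.zip or X.jmpaddin.zip to X.jmpaddin and return the new path."""
    src = Path(path)
    if src.suffix.lower() != ".zip":
        raise ValueError(f"expected a .zip file, got {src.name}")
    target = src.with_suffix("")  # drop only the trailing .zip
    if target.suffix.lower() != ".jmpaddin":
        # e.g. Addin.zip -> Addin.jmpaddin
        target = target.with_name(target.name + ".jmpaddin")
    return src.rename(target)  # Path.rename returns the new path (Python 3.8+)


# Usage (hypothetical file name):
# zip_to_jmpaddin("Imputation.jmpaddin.zip")
```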

That should do it. Good luck!

Brady