
How do I handle missing data values in JMP?

I only have missing values in categorical columns. I'm trying Multivariate Normal imputation, but not all rows are getting imputed. What should I do to ensure that every row in the column receives an imputed value?

 

Or should I do something completely different? 

WebDesignesCrow
Super User

Re: how do i handle Missing Data values in JMP

For missing values:

1) If I still need the rows with missing values for my analysis, I would recode the missing values with a label such as "invalid" or "empty".

2) If the rows with missing categorical values can be dropped from the analysis, I would remove all of them, because they would give an inaccurate representation in the analysis.
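The two options above can be sketched outside JMP. This is a minimal Python illustration, not JMP's Recode feature or JSL: rows are assumed to be dicts mapping column name to value, with None or an empty string marking a missing cell, and the column names and the "Missing" label are hypothetical.

```python
# Hypothetical layout: each row is a dict of column name -> value;
# None or "" marks a missing cell. The "Missing" label is an assumption.
MISSING_LABEL = "Missing"


def is_missing(value):
    """Treat None and empty strings as missing."""
    return value is None or value == ""


def recode_missing(rows, columns, label=MISSING_LABEL):
    """Option 1: keep every row, replacing missing categorical values with an explicit label."""
    recoded = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        for col in columns:
            if is_missing(new_row.get(col)):
                new_row[col] = label
        recoded.append(new_row)
    return recoded


def drop_missing(rows, columns):
    """Option 2: keep only rows that have a value in every listed categorical column."""
    return [row for row in rows if not any(is_missing(row.get(col)) for col in columns)]


rows = [
    {"color": "red", "size": "S"},
    {"color": None, "size": "M"},
    {"color": "blue", "size": ""},
]
print(recode_missing(rows, ["color", "size"]))
print(len(drop_missing(rows, ["color", "size"])))
```

Recoding keeps all rows and lets the analysis treat "Missing" as an ordinary level; dropping keeps only complete rows, which can shrink the data a lot when many rows have a gap.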

dlehman1
Level V

Re: how do i handle Missing Data values in JMP

Imputing missing categorical values offers a few options, and I think choosing among them depends heavily on subject-matter knowledge. Some analyses, if you use "informative missing," will replace the missing value with the most common non-missing value in the data set. That is rarely a sensible thing to do, and it is likely not what you have in mind, since you were trying to model the data to impute the missing values. If you have enough variables and data, you could try a predictive model that predicts the missing categorical values as a function of the non-missing data, using any of the predictive models available (e.g., partitions, neural nets). If all the missing values are in a single variable, that might work; but if the missing values are in different columns for different observations, it is unlikely to work well.
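As a rough illustration of predicting a missing categorical value from the non-missing data, the sketch below uses the simplest possible "model": for each row with a missing target, it takes the most frequent target value among complete rows that share the same value of one predictor column. This conditional-mode lookup is a deliberately crude stand-in, not one of JMP's partition or neural net platforms. The function and column names are hypothetical, and the fallback to the overall mode is the "most common value" approach described above, used here only when the predictor gives no information.

```python
from collections import Counter, defaultdict


def impute_by_conditional_mode(rows, target, predictor):
    """Fill missing `target` values with the most frequent target value observed
    among complete rows sharing the same `predictor` value.
    Falls back to the overall mode when the predictor value is missing or unseen.
    Rows are hypothetical dicts; None marks a missing cell."""
    by_group = defaultdict(Counter)  # predictor value -> counts of target values
    overall = Counter()              # counts of target values over all complete rows
    for row in rows:
        value = row.get(target)
        if value is not None:
            overall[value] += 1
            key = row.get(predictor)
            if key is not None:
                by_group[key][value] += 1
    if not overall:
        return [dict(row) for row in rows]  # nothing observed to learn from
    overall_mode = overall.most_common(1)[0][0]

    filled = []
    for row in rows:
        new_row = dict(row)
        if new_row.get(target) is None:
            key = row.get(predictor)
            # .get() avoids creating empty groups in the defaultdict
            group = by_group.get(key) if key is not None else None
            new_row[target] = group.most_common(1)[0][0] if group else overall_mode
        filled.append(new_row)
    return filled


rows = [
    {"region": "north", "grade": "A"},
    {"region": "north", "grade": "A"},
    {"region": "north", "grade": "B"},
    {"region": "south", "grade": "B"},
    {"region": "south", "grade": "B"},
    {"region": "north", "grade": None},
    {"region": "south", "grade": None},
    {"region": "east", "grade": None},
]
for row in impute_by_conditional_mode(rows, "grade", "region"):
    print(row)
```

With more predictors, a real classifier would replace the lookup table; the structure (learn from complete rows, predict for the incomplete ones) stays the same. As the post notes, this only works well when the missing values are concentrated in the column being predicted.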

 

You could try excluding those observations, but this obviously depends on how much data is missing, on the context of the problem, and on the nature of the missing data. I think WebDesignesCrow's suggestion of replacing missing values with a label (such as "missing") generally makes the most sense. It doesn't require any assumptions about the nature of the missing data, and it lets whatever analysis you do uncover the relevance (or lack thereof) of the missing data. There was a great talk by John Sall a couple of years ago on "ghost data" (you should be able to find it in the Community forum) that might give you more ideas.

Re: how do i handle Missing Data values in JMP

Thank you for the answer.

Half of the dataset contains missing values.

The dataset has about 375,000 rows and 20 columns.
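"Half of the dataset contains missing values" can mean two different things: half of the cells are missing, or half of the rows have at least one missing cell. A minimal Python sketch that computes both, plus the per-column rate, assuming the same hypothetical layout of rows as dicts with None or an empty string marking a missing cell:

```python
def missingness_summary(rows, columns):
    """Return (per-column fraction of missing cells,
               fraction of rows with no missing cell,
               overall fraction of missing cells).
    Rows are hypothetical dicts; None or "" counts as missing."""
    if not rows or not columns:
        raise ValueError("need at least one row and one column")
    n = len(rows)
    missing_counts = {col: 0 for col in columns}
    complete_rows = 0
    for row in rows:
        row_complete = True
        for col in columns:
            if row.get(col) in (None, ""):
                missing_counts[col] += 1
                row_complete = False
        if row_complete:
            complete_rows += 1
    per_column = {col: count / n for col, count in missing_counts.items()}
    overall = sum(missing_counts.values()) / (n * len(columns))
    return per_column, complete_rows / n, overall


rows = [
    {"color": "red", "size": "S"},
    {"color": None, "size": "M"},
    {"color": "blue", "size": None},
    {"color": "green", "size": "L"},
]
print(missingness_summary(rows, ["color", "size"]))
```

If the complete-row fraction is much lower than the per-column rates, the missing values are scattered across columns, so dropping incomplete rows discards more data than any single column suggests.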

Re: how do i handle Missing Data values in JMP

Is dimensionality reduction appropriate to use in this case?

dlehman1
Level V

Re: how do i handle Missing Data values in JMP

I am guessing that you are interested in predicting price in this data. You should have no problem with the predictive models if you just use the "informative missing" option. See here for some details: https://www.jmp.com/support/help/en/17.0/?os=win&source=application#page/jmp/informative-missing-2.s.... While more elaborate treatment of missing values is possible, I don't think it is worth the effort in this case. At the very least, I would first examine the models to see whether the variables with a lot of missing data are highly influential; if they are, you may want to examine those variables (and their missingness) more carefully.
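One quick way to check whether a variable's missingness matters is to compare the response across rows where that variable is missing and rows where it is present. A minimal Python sketch, assuming the response is "price" (as guessed above) and is fully observed; the column names are hypothetical, and this is a simple diagnostic, not JMP's informative-missing coding:

```python
from statistics import mean


def mean_target_by_missingness(rows, column, target):
    """Return (mean of `target` where `column` is missing,
               mean of `target` where `column` is present).
    Either entry is None if that group is empty.
    Assumes `target` is observed in every row (hypothetical row-of-dicts layout)."""
    missing = [row[target] for row in rows if row.get(column) in (None, "")]
    present = [row[target] for row in rows if row.get(column) not in (None, "")]
    return (mean(missing) if missing else None,
            mean(present) if present else None)


rows = [
    {"color": None, "price": 10.0},
    {"color": "red", "price": 20.0},
    {"color": "", "price": 30.0},
    {"color": "blue", "price": 40.0},
]
print(mean_target_by_missingness(rows, "color", "price"))
```

A large gap between the two means suggests the missingness itself carries information, which is what an explicit "missing" level or indicator lets a model pick up.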