Dessan300
Level I

Handling missing values

I want to impute missing values, but I am getting a message that there is insufficient data to impute them. The attached image provides more details.

mv.JPG

1 ACCEPTED SOLUTION

Victor_G
Super User

Re: Handling missing values

Hi @Dessan300,

 

It's hard to help without more context or a toy dataset that would let us understand and reproduce your problem.
The error message is fairly self-explanatory: you don't have enough values to perform Automated Data Imputation (see Overview of the Explore Missing Values Platform). ADI is robust and flexible, but it is a complex imputation method and needs enough data to train the model that fills in the missing values. Dimensionality matters a lot for this technique, because it is based on low-rank matrix approximation (a dimensionality reduction method). You can read the discussion on this topic: Some missing values are not imputed with ADI (Automated Data Imputation)
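
To give a feel for why a low-rank method needs enough observed data, here is a minimal Python sketch of iterative SVD imputation. It is not JMP's ADI algorithm, only an illustration of the general low-rank idea; the function name `svd_impute`, the chosen rank, and the toy matrix are assumptions made for this example.

```python
import numpy as np

def svd_impute(X, rank=1, n_iter=200):
    """Fill NaNs by repeatedly projecting onto a rank-`rank` approximation.

    Illustrative only: a simple iterative scheme, not JMP's ADI.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from the column means of the observed values.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep the observed cells, update only the missing ones.
        filled = np.where(missing, low_rank, X)
    return filled

# Toy rank-1 table: every row is a multiple of (1, 2, 3).
true = np.outer(np.arange(1.0, 7.0), [1.0, 2.0, 3.0])
X = true.copy()
X[1, 2] = np.nan
X[4, 0] = np.nan

imputed = svd_impute(X, rank=1)
max_error = np.abs(imputed - true).max()
```

The missing cells can be recovered here because the other rows and columns pin down the low-rank structure. With too few observed values relative to the number of columns, there is not enough information to estimate that structure, which is one plausible reading of the "insufficient data" message.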

 

I would first recommend investigating the source or cause of these missing values, and spending some time on the missing-value patterns: are the data Missing Completely at Random (MCAR), Missing at Random (MAR), or Missing Not at Random (MNAR)? More info here: https://muratkoptur.com/MyDsProjects/MissingData/Analysis.html

The Missing Value Snapshot available in this platform is very helpful for visualizing and understanding these patterns.

Discovering Patterns in Missing Values 
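
If you want to look at the same kind of pattern summary outside JMP, it can be sketched in Python with pandas. The column names and the toy table below are assumptions made for illustration, not data from this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table with missing cells.
df = pd.DataFrame({
    "temperature": [20.1, np.nan, 22.3, 23.0, np.nan, 25.2],
    "pressure":    [1.0, 1.1, np.nan, 1.3, np.nan, 1.5],
    "yield_pct":   [80.0, 82.0, 85.0, np.nan, np.nan, 90.0],
})

# Fraction of missing cells per column.
missing_rate = df.isna().mean()

# Encode each row's pattern as a string such as "010" (1 = missing),
# then count how often each pattern occurs.
pattern = df.isna().astype(int).astype(str).agg("".join, axis=1)
pattern_counts = pattern.value_counts()
```

Patterns that occur much more often than chance would suggest, or that line up with a particular region of the factor space, are a hint that the values are not missing completely at random.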

Depending on the missing-value pattern, if the values are not missing at random, it may be best to keep them as missing: they could indicate that part of the experimental region is not feasible, or that the equipment cannot measure in that region (for example because the signal is too low or too high there). Imputing values in this situation may be detrimental, because you would lose the "non-feasibility" information about that region.

 

You may try "easier" options and compare them, such as the Multivariate Imputation options, or use techniques that are not in this platform, such as K-Nearest Neighbors (choosing K according to the dimensionality of your dataset and the precision/robustness you want to obtain).
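
As a rough illustration of the K-Nearest Neighbors option outside JMP, here is a minimal sketch using scikit-learn's `KNNImputer`. The toy matrix and the choice of `n_neighbors=1` are assumptions made for the example; in practice K is something to tune.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical toy data: two clusters of rows, some cells missing.
X = np.array([
    [1.0, 2.0, 3.0],
    [1.1, 2.1, np.nan],
    [0.9, np.nan, 2.9],
    [8.0, 9.0, 10.0],
    [8.2, 9.1, np.nan],
])

# Each missing cell is replaced by the mean of that feature over the
# n_neighbors closest rows that have it observed (distances are
# computed on the features both rows have in common).
imputer = KNNImputer(n_neighbors=1)
X_knn = imputer.fit_transform(X)
```

A small K follows local structure closely but is sensitive to noise, while a larger K smooths the imputed values toward the overall mean.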

Also keep in mind that some modeling methods can handle missing values directly and do not require imputation, such as tree-based methods and Partial Least Squares.

 

No matter the method and its complexity (from mean/median/mode imputation to ADI or Multivariate SVD/RPCA imputation), imputing missing values may bias your dataset. Depending on your use case, it can be worthwhile to build one model on the data with missing values and another on the imputed data, then compare the outcomes of the two models using domain expertise.
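
One simple way to see this kind of bias is to check what mean imputation does to the spread of a variable. The sketch below uses simulated data (an assumption for illustration) and compares the standard deviation before and after imputation.

```python
import numpy as np

# Simulated measurements (hypothetical): mean 50, standard deviation 10.
rng = np.random.default_rng(1)
x = rng.normal(loc=50.0, scale=10.0, size=1000)

# Remove about 30% of the values completely at random (MCAR).
x_missing = x.copy()
x_missing[rng.random(x.size) < 0.3] = np.nan

# Mean imputation: every missing value becomes the observed mean.
observed_mean = np.nanmean(x_missing)
x_imputed = np.where(np.isnan(x_missing), observed_mean, x_missing)

# Spread of the observed values vs. spread after imputation.
std_observed = np.nanstd(x_missing, ddof=1)
std_imputed = np.std(x_imputed, ddof=1)
```

The mean is unchanged, but `std_imputed` is smaller than `std_observed`, because the imputed cells add no variation. Any downstream model then sees artificially tight data, which is one reason to compare results with and without imputation.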

You can also watch Exploring Missing Values.

 

Hope this first answer might help you,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
