dale_lehman
Level VII

Missing Data and SVM

I have started using the Support Vector Machines platform, but predictions appear to be missing for observations that are missing a continuous factor. There is no Informative Missing setting for SVM, yet the prediction formula shows that missing nominal data is used in the formula. When a continuous factor is missing, however, the prediction is missing, and the prediction formula has no term that handles a missing continuous factor the way it does for missing nominal factors. Is there a reason why SVM cannot handle missing continuous factors but can use missing nominal factors?

Accepted Solutions
eclaassen
Staff

Re: Missing Data and SVM

SVM does not provide predictions if any X factor is missing - whether categorical or continuous.

SVM requires continuous factors to be centered and scaled, so if a continuous factor is missing, the (:X-mean)/stddev term in the formula is also missing. For categorical factors, a "design matrix" is created for each factor. If the factor level is missing, the design matrix is created as missing, i.e., [. .] for a two-level categorical factor.

If any of the centered and scaled X's or design matrix X's have missing values, the final predicted value is missing.
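
As a rough illustration of the propagation described above, here is a minimal Python sketch (not JMP's implementation). It assumes a linear scoring function, one indicator column per factor level, and hypothetical names and numbers (`standardize`, `indicator_columns`, `predict`, the mean, standard deviation, weights, and bias). Missing values are represented as `math.nan` for the continuous factor and `None` for the categorical one.

```python
import math

def standardize(x, mean, std):
    # Centering and scaling: NaN in gives NaN out.
    return (x - mean) / std

def indicator_columns(level, levels):
    # Assumed coding: one indicator column per level.
    # A missing level yields an all-missing row, like [. .] for two levels.
    if level is None:
        return [math.nan] * len(levels)
    return [1.0 if level == lv else 0.0 for lv in levels]

def predict(x, level):
    # Hypothetical fitted values, chosen only for illustration.
    features = [standardize(x, 40.0, 10.0)] + indicator_columns(level, ["a", "b"])
    weights = [0.5, 1.0, -1.0]
    bias = 0.1
    # Any NaN feature makes the whole weighted sum NaN.
    return sum(w * f for w, f in zip(weights, features)) + bias

complete = predict(50.0, "a")
missing_continuous = predict(math.nan, "a")
missing_categorical = predict(50.0, None)
```

In this sketch a missing value in either kind of factor produces a missing (`nan`) prediction, which matches the behavior described for the saved predicted values.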

ih
Super User (Alumni)

Re: Missing Data and SVM

I suspect that the value returned by the prediction formula from the new SVM platform when a categorical input is missing is incorrect and should not be displayed. If you need to handle missing data, I suggest using a different platform (for example, Explore Missing Values under the Analyze > Screening menu) to impute those values first, then moving to SVM.

At least in my test cases, when a prediction formula is saved and an input is missing, all of the intermediate columns return missing, and the final formula is just the first possible result. When you just save Predicteds, those rows are missing as expected.
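
To make the impute-first idea concrete, here is a small Python sketch. It is a deliberately simple stand-in (mean imputation for continuous factors, most-frequent-level imputation for categorical ones), not the methods offered by Explore Missing Values, and the function names `impute_mean` and `impute_mode` are hypothetical.

```python
import math
from collections import Counter

def impute_mean(values):
    # Replace NaN entries with the mean of the observed values.
    # Raises ZeroDivisionError if every value is missing.
    observed = [v for v in values if not math.isnan(v)]
    mean = sum(observed) / len(observed)
    return [mean if math.isnan(v) else v for v in values]

def impute_mode(values):
    # Replace None entries with the most frequent observed level.
    # Raises IndexError if every value is missing.
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

ages = impute_mean([1.0, math.nan, 3.0])
groups = impute_mode(["a", None, "a", "b"])
```

Once every factor is filled in this way, no row carries a missing input into the model. Whether such simple imputation is appropriate depends on why the values are missing.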

dale_lehman
Level VII

Re: Missing Data and SVM

Thank you. The documentation should probably state that SVM cannot handle missing values (without some manual pre-processing). Also, in my experience SVM has been very slow and not particularly accurate compared with other prediction models. I'd be interested in hearing whether that has been true for other people as well.

Re: Missing Data and SVM

Regarding the documentation suggestion, our technical writers have taken steps to add this information to the Help and Documentation. Thanks!