frankderuyck
Level VI

SVEM ADD IN FOR SUPPORT VECTOR MACHINE

I have DOE results that I would like to model with a Support Vector Machine (there are strong non-linear effects that can't be estimated with a polynomial model). How do I install the SVEM add-in?

10 REPLIES
Victor_G
Super User

Re: SVEM ADD IN FOR SUPPORT VECTOR MACHINE

Hi @frankderuyck,

 

These are two different considerations:

  • SVEM is a validation strategy that can be applied to any model, from linear regression to neural networks.
  • Random Forest is a Machine Learning model.

 

Both share a way to prevent overfitting when modeling:

  • Random Forest uses bootstrap samples and the "Out-of-Bag" sample as "internal" model validation. The "Out-of-Bag" sample is the part of the data not used in training: training samples are bootstrap samples drawn with replacement from the original dataset. Theoretically, you can calculate that when sampling your original data with replacement, around 1/3 of your data won't be sampled: machine learning - Random Forests out-of-bag sample size - Cross Validated (stackexchange.com) 
    This sample constitutes the "Out-of-Bag" sample, which is not used in the training of the Random Forest and enables you to assess/validate the relevance and precision of the model.
  • SVEM uses anticorrelated weights for training and validation, meaning that an experiment with a high weight for training will have a low weight for validation. By fitting a model with this validation setup, then changing the weights and refitting the model with new training and validation weights a large number of times, you obtain slightly different models that you can ensemble/combine in order to reduce variance (and prevent overfitting). 
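The "around 1/3 out-of-bag" claim above is easy to check numerically. Here is a minimal stdlib-Python sketch (not tied to any JMP feature) that simulates bootstrap resampling and measures the fraction of rows never drawn; it converges to (1 - 1/n)^n → 1/e ≈ 0.368:

```python
import random

random.seed(1)

n = 1000       # number of rows in the original dataset
trials = 200   # number of bootstrap resamples to average over

oob_fractions = []
for _ in range(trials):
    # Bootstrap sample: draw n row indices with replacement
    drawn = {random.randrange(n) for _ in range(n)}
    # Rows never drawn form the out-of-bag sample
    oob_fractions.append((n - len(drawn)) / n)

avg_oob = sum(oob_fractions) / trials
# Theoretical limit: (1 - 1/n)^n -> 1/e, about 0.368
print(round(avg_oob, 3))
```

So each tree in a Random Forest gets a "free" validation set of roughly a third of the data, which is what the linked Cross Validated answer derives analytically.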

 

Now, my personal opinion and choice:

  • What I frequently see is people using SVEM for their DoE with highly complex models, without any further consideration or prior analysis/modeling. You often see, for example, SVEM applied with Neural Networks to relatively simple designs and results. That sounds to me like "using a bazooka to kill a fly": I tend to choose the simplest options/models before increasing the complexity (if necessary): Ockham's razor. You also have to consider how many levels you have for your factors, and other considerations (like dimensionality), as Machine Learning models are interpolating models, very efficient at finding a pattern between points. So if you only have 2-3 levels for your factors, that might not be enough to really gain a benefit from Machine Learning.
  • If you have enough levels and traditional linear modeling doesn't seem appropriate because of non-linear relationships, then you might be interested in trying Machine Learning models.
  • Since your DoE gives you a small but high-quality training dataset, you have to choose an ML model that is simple (no, or few and easy, hyperparameters to fine-tune), robust (less sensitive to hyperparameter settings), and less prone to overfitting, since you can't split your DoE data into training and validation sets. This is where Random Forests (with their internal "Out-of-Bag" sample validation) and SVM (where you can evaluate the risk of overfitting through the hyperparameter choices and the number of support vectors) are interesting. 

Also, using a robust ML algorithm instead of a complex one within an SVEM strategy has other benefits, such as lower computational time and easier interpretability.
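To make the anticorrelated-weight idea concrete: one published variant of SVEM (the fractionally weighted bootstrap) draws a uniform number u for each run and uses -ln(u) as the training weight and -ln(1 - u) as the validation weight, so a run emphasized in training is de-emphasized in validation. This stdlib-Python sketch (an illustration of that scheme, not JMP's internal code) generates such weight pairs and confirms they are negatively correlated:

```python
import math
import random

random.seed(42)

n = 10000
u = [random.random() for _ in range(n)]
# Fractionally weighted bootstrap: exponential weights from the same uniform draw
w_train = [-math.log(x) for x in u]      # training weight
w_val = [-math.log(1 - x) for x in u]    # validation weight, anticorrelated

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    sa = math.sqrt(sum((x - ma) ** 2 for x in a) / len(a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b) / len(b))
    return cov / (sa * sb)

r = corr(w_train, w_val)
print(round(r, 2))  # strongly negative
```

Refitting the model under many independent draws of u and averaging the fits is what produces the ensemble, without ever holding out rows from the small DoE dataset.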

 

Hope this answer helps and makes sense to you,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)