frankderuyck
Level VI

SVEM ADD IN FOR SUPPORT VECTOR MACHINE

I have DOE results I would like to model with a Support Vector Machine (there are strong non-linear effects that can't be estimated with a polynomial model). How do I install the SVEM add-in?

10 REPLIES
Victor_G
Super User

Re: SVEM ADD IN FOR SUPPORT VECTOR MACHINE

Hi @frankderuyck,

 

These are two different considerations:

  • SVEM is a validation strategy that could be applied to any model, from linear regression to neural networks.
  • Random Forest is a Machine Learning model.

 

Both share a way to prevent overfitting when modeling:

  • Random Forest uses bootstrap samples and an "Out-of-Bag" sample as "internal" model validation. The "Out-of-Bag" sample is the part of the data not used in training: training samples are bootstrap samples drawn with replacement from the original dataset. Theoretically, when sampling with replacement from your original data, around 1/3 of it won't be sampled (the chance that a given point is never drawn is (1 - 1/n)^n ≈ e⁻¹ ≈ 37%): machine learning - Random Forests out-of-bag sample size - Cross Validated (stackexchange.com)
    This sample constitutes the "Out-of-Bag" sample, not used in the training of the Random Forest, and it enables assessing/validating the relevance and precision of the model.
  • SVEM uses anticorrelated weights for training and validation, meaning that an experiment with a high weight for training will have a low weight for validation. By fitting a model with this validation setup, then changing the weights and refitting the model with new training and validation weights a large number of times, you get slightly different models that you can ensemble/combine in order to reduce variance (and prevent overfitting); see the sketch right after this list.
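To make the anticorrelated-weight idea concrete, here is a minimal Python sketch (assuming scikit-learn is available, using a Ridge model, a toy dataset and exponential "fractional bootstrap" weights purely for illustration; this is not JMP's exact SVEM implementation):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Toy DOE-like data: a small design with a non-linear response (hypothetical)
X = rng.uniform(-1, 1, size=(16, 3))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=16)

n_models = 200
models, val_errors = [], []

for _ in range(n_models):
    u = rng.uniform(size=len(y))
    w_train = -np.log(u)        # a high training weight ...
    w_val = -np.log(1.0 - u)    # ... pairs with a low validation weight

    # Fit with the training weights, score with the validation weights
    model = Ridge(alpha=1.0).fit(X, y, sample_weight=w_train)
    resid = y - model.predict(X)
    val_errors.append(np.average(resid ** 2, weights=w_val))
    models.append(model)

# Ensemble prediction: average the members' predictions at a new point
x_new = np.array([[0.2, -0.5, 0.7]])
y_hat = np.mean([m.predict(x_new) for m in models])
print(f"Ensemble prediction: {y_hat:.3f}, "
      f"mean validation-weighted MSE: {np.mean(val_errors):.3f}")
```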

 

Now, my personal opinion and choice:

  • What I frequently see is people using SVEM for their DoE with highly complex models, without any further consideration or prior analysis/modeling. You often see, for example, SVEM applied with Neural Networks to relatively simple designs and results. That sounds like "using a bazooka to kill a fly" to me: I tend to choose the simplest options/models before increasing the complexity (if necessary): Ockham's razor. You also have to consider how many levels you have for your factors, and other considerations (like dimensionality), as Machine Learning models are interpolating models, very efficient at finding a pattern between points. So if you only have 2-3 levels for your factors, that might not be enough to really gain a benefit from using Machine Learning.
  • If you have enough levels and traditional linear modeling doesn't seem appropriate because of non-linear relationships, then you might be interested in trying Machine Learning models.
  • Since you have a small but high-quality training dataset with your DoE, you have to choose an ML model that is simple (no, or few and easy, hyperparameters to fine-tune), robust (less sensitive to hyperparameter settings) and less prone to overfitting, since you can't split your DoE data into training and validation sets. This is where Random Forests (with their internal "Out-of-Bag" sample validation) and SVM (where you can evaluate the risk of overfitting through the hyperparameter choices and the number of support vectors) are interesting; a quick illustration follows this list.
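As an illustration of those two built-in checks, here is a small scikit-learn sketch (the dataset and hyperparameter values are hypothetical, just to show where the OOB score and the support-vector count are read):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(1)

# Toy DOE-like data with a non-linear response (hypothetical)
X = rng.uniform(-1, 1, size=(20, 3))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=20)

# Random Forest: the OOB score acts as an "internal" validation R^2
rf = RandomForestRegressor(n_estimators=500, oob_score=True,
                           random_state=1).fit(X, y)
print(f"Random Forest OOB R^2: {rf.oob_score_:.2f}")

# SVM (here SVR with an RBF kernel): if almost every point becomes a
# support vector, the model is likely memorizing the data (overfitting)
svr = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y)
print(f"SVR uses {len(svr.support_)} support vectors out of {len(y)} points")
```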

Also, using a robust ML algorithm instead of a complex one with an SVEM strategy has other benefits, such as lower computational time and easier interpretability.

 

Hope this answer helps and makes sense to you,

Victor GUILLER
L'Oréal Data & Analytics

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)