
Genetic Algorithms, Simulated Annealing, etc. for Optimizing QSAR Regression Equations

I am interested in creating regression equations for QSAR-related problems.  The general idea is to find a linear regression equation from a pool of many (>200) independent variables or descriptors to predict one dependent variable.  Typically, the number of rows/observations is small (n=50 at a minimum).

 

There are ways to do this using Generalized Regression in JMP.  While the penalized regression methods or simple forward selection or backward elimination can find a "good" linear regression equation, my concern is that these methods may get stuck in a local minimum and not find the globally optimal solution.

 

There are other techniques outside of JMP that can sample many more candidate models, such as genetic algorithms, simulated annealing, particle swarm optimization, and ant/bee colony optimization. They work by taking subsets of descriptors, scoring the resulting model with an evaluation function, and then splitting up or recombining the subsets to make and evaluate new ones.
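The subset-search idea described above can be sketched in Python with simulated annealing. This is only a minimal illustration under stated assumptions, not an established QSAR workflow: the function names (`cv_rmse`, `anneal_subset`), the fixed subset size, the swap-one-descriptor move, the geometric cooling schedule, and the synthetic data are all hypothetical choices. The evaluation function here is the k-fold cross-validated RMSE of an ordinary least-squares fit.

```python
import numpy as np

def cv_rmse(X, y, cols, k=5, seed=0):
    """K-fold cross-validated RMSE of an OLS fit using only the columns in `cols`."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)  # same folds for every subset
    sq_err = 0.0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        A = np.column_stack([np.ones(len(train)), X[np.ix_(train, cols)]])
        beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        B = np.column_stack([np.ones(len(fold)), X[np.ix_(fold, cols)]])
        sq_err += np.sum((y[fold] - B @ beta) ** 2)
    return float(np.sqrt(sq_err / n))

def anneal_subset(X, y, size=3, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for a descriptor subset of fixed `size` by simulated annealing."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    current = list(rng.choice(p, size=size, replace=False))
    cur_score = cv_rmse(X, y, current)
    best, best_score = list(current), cur_score
    temp = t0
    for _ in range(steps):
        # Propose a neighbor: swap one chosen descriptor for one not in the subset.
        cand = list(current)
        outside = np.setdiff1d(np.arange(p), cand)
        cand[rng.integers(size)] = rng.choice(outside)
        score = cv_rmse(X, y, cand)
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature falls.
        if score < cur_score or rng.random() < np.exp((cur_score - score) / temp):
            current, cur_score = cand, score
            if score < best_score:
                best, best_score = list(cand), score
        temp *= cooling
    return sorted(int(c) for c in best), best_score

# Synthetic toy data (an assumption for illustration): 60 rows, 40 descriptors,
# with the response depending on descriptors 3, 17, and 25 only.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 40))
y = 2.0 * X[:, 3] - 3.0 * X[:, 17] + 1.5 * X[:, 25] + 0.1 * rng.normal(size=60)

subset, score = anneal_subset(X, y, size=3, steps=3000)
print("selected descriptors:", subset, "CV RMSE:", round(score, 3))
```

Scoring by cross-validation rather than by training-set fit is a deliberate choice in this sketch: with far more descriptors than rows, a search that samples many subsets can otherwise reward chance correlations.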

 

I was wondering if anyone has created a way to do this within JMP, or has a way to do this outside of JMP (e.g. in R, Python, Matlab, SAS, etc.) and can perhaps think of a way to add this functionality to JMP as an add-in?  My specific need is to make linear regression equations and not neural networks, random forests, or other types of models that JMP can also make.

5 REPLIES
P_Bartell
Level VIII

Re: Genetic Algorithms, Simulated Annealing, etc. for Optimizing QSAR Regression Equations

The penalized methods are one way to go. But have you considered partial least squares? Your data set is wide and shallow: tailor-made for PLS. It is still a linear equation. And if your predictors are correlated...even better.

Re: Genetic Algorithms, Simulated Annealing, etc. for Optimizing QSAR Regression Equations

An issue with PLS is that it won't select subsets of the independent variables and will just include every independent variable.  This may result in overfitting issues.

P_Bartell
Level VIII

Re: Genetic Algorithms, Simulated Annealing, etc. for Optimizing QSAR Regression Equations

It's not clear to me what you mean by 'subsets of variables'...if variable selection is part of your problem solving goals, PLS can be used quite nicely. You have the flexibility in JMP Pro to select the number of latent factors to work with. Then find the most influential variables within that construct...JMP Pro also has a flexible set of cross validation methods...KFold might work best for your shallow data?
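Outside of JMP, the idea of fitting PLS with a chosen number of latent factors and then ranking the variables by influence can be sketched as below. This is a minimal NumPy illustration, assuming a single response (PLS1 fitted with the NIPALS algorithm), descriptors that are centered but not rescaled, and VIP (variable importance in projection) as the influence measure. The function name `pls1_vip` and the synthetic data are hypothetical, and this is not a description of how JMP Pro computes its results.

```python
import numpy as np

def pls1_vip(X, y, n_components=2):
    """PLS1 via NIPALS. Returns (intercept, coefficients, VIP scores)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean            # centered working copies
    p = X.shape[1]
    W = np.zeros((p, n_components))            # unit-length weight vectors
    P = np.zeros((p, n_components))            # X loadings
    q = np.zeros(n_components)                 # y loadings
    ss = np.zeros(n_components)                # y sum of squares explained per factor
    for a in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w                             # scores for this latent factor
        tt = t @ t
        P[:, a] = Xr.T @ t / tt
        q[a] = yr @ t / tt
        Xr = Xr - np.outer(t, P[:, a])         # deflate X
        yr = yr - q[a] * t                     # deflate y
        W[:, a] = w
        ss[a] = q[a] ** 2 * tt
    # Regression coefficients on the original (uncentered) descriptor scale.
    coef = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ coef
    # VIP: weights squared, averaged over factors by explained y variation.
    vip = np.sqrt(p * ((W ** 2) @ ss) / ss.sum())
    return intercept, coef, vip

# Synthetic toy data (an assumption for illustration): the response depends
# on descriptors 5 and 12 only.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 40))
y = 3.0 * X[:, 5] + 3.0 * X[:, 12] + 0.1 * rng.normal(size=100)

intercept, coef, vip = pls1_vip(X, y, n_components=2)
pred = intercept + X @ coef
r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
top = np.argsort(vip)[::-1][:5]                # five most influential descriptors
print("top descriptors by VIP:", top.tolist(), "training R^2:", round(float(r2), 3))
```

The R^2 printed here is a training-set figure only; for shallow data the number of latent factors would normally be chosen by cross-validation such as K-fold.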

Byron_JMP
Staff

Re: Genetic Algorithms, Simulated Annealing, etc. for Optimizing QSAR Regression Equations

I know @P_Bartell loves PLS, and it's a good solution in a lot of places. I'm a huge fan of the Tree Methods, and recently an add-in for XGBoost was published: XGBoost Add-In for JMP Pro

 

Since it sounds like you have a JMP Pro license, it might be worth your time to take a look at it.

 

JMP Systems Engineer, Health and Life Sciences (Pharma)
Craige_Hales
Super User

Re: Genetic Algorithms, Simulated Annealing, etc. for Optimizing QSAR Regression Equations

None of these will help, except to suggest that JSL can be written to do simulated annealing and simulations of systems:

How to use Define Class (annealing/spring-force)

Halloween Trilogy: all in one (annealing/spring-force)

Wind Visualization (simulation/visualization of a system)

Video flock/swarm/school (swarming/flocking)

These are JSL intensive projects; you'll know a lot about JSL after you go down this path.

Craige