I am interested in creating regression equations for QSAR-related problems. The general idea is to find a linear regression equation from a pool of many (>200) independent variables or descriptors to predict one dependent variable. Typically, the number of rows/observations is small (n=50 at a minimum).
There are ways to do this using Generalized Regression in JMP. While penalized regression methods or simple forward/backward elimination can find a "good" linear regression equation, my concern is that these greedy or regularized searches may not find the globally optimal subset of descriptors and can instead get stuck in a local minimum.
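To illustrate the local-minimum concern, here is a minimal sketch of greedy forward selection in plain Python/NumPy. This is not JMP's actual GenReg implementation; the AIC criterion, the stopping rule, and the toy data are all illustrative assumptions. Note how the loop stops the moment no single additional descriptor improves the criterion, which is exactly the behavior that can strand the search at a local optimum.

```python
import numpy as np

def aic_ols(X, y):
    """AIC of an OLS fit with an intercept (smaller is better).
    Illustrative criterion; other choices (BIC, PRESS) work the same way."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ beta) ** 2)
    return n * np.log(rss / n) + 2 * A.shape[1]

def forward_select(X, y, max_terms=5):
    """Greedy forward selection: repeatedly add the one descriptor that
    most improves AIC; stop when no single addition helps (a local minimum)."""
    remaining = list(range(X.shape[1]))
    chosen, best = [], np.inf
    while remaining and len(chosen) < max_terms:
        score, j = min((aic_ols(X[:, chosen + [j]], y), j) for j in remaining)
        if score >= best:
            break  # no candidate improves the criterion: search halts here
        best = score
        chosen.append(j)
        remaining.remove(j)
    return chosen, best

# Toy data: y really depends only on descriptors 0 and 3 out of 20
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 20))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=50)
subset, aic = forward_select(X, y)
print(sorted(subset))
```

Because additions are never revisited, an early greedy pick that later turns out to be suboptimal is locked in, which is why global-search methods are attractive here.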
There are other techniques outside of JMP that can sample many more candidate models, such as genetic algorithms, simulated annealing, particle swarm optimization, and ant/bee colony optimization. These work by taking subsets of descriptors, scoring the resulting model with a fitness function, and then recombining or perturbing the subsets to generate and evaluate new candidates.
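As a rough sketch of the genetic-algorithm variant of this idea, the following is one possible implementation, not a reference one. Assumed details: descriptor subsets encoded as boolean masks, adjusted R² as the fitness function, one-point crossover, bit-flip mutation, and elitism; the pool is shrunk from >200 descriptors to 40 so the demo runs quickly.

```python
import numpy as np

def adj_r2(mask, X, y):
    """Adjusted R^2 of an OLS fit on the masked descriptors (higher is better)."""
    n, p = len(y), int(mask.sum())
    if p == 0 or p >= n - 2:
        return -np.inf  # empty or near-saturated models are disallowed
    A = np.column_stack([np.ones(n), X[:, mask]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1 - (rss / (n - p - 1)) / (tss / (n - 1))

def ga_select(X, y, pop=40, gens=60, seed=0):
    """Genetic algorithm over descriptor subsets encoded as boolean masks."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    popn = rng.random((pop, d)) < 4 / d          # sparse random starting subsets
    for _ in range(gens):
        scores = np.array([adj_r2(m, X, y) for m in popn])
        popn = popn[np.argsort(scores)[::-1]]    # sort best-first
        children = []
        for _ in range(pop // 2):
            a, b = rng.integers(0, pop // 2, size=2)  # parents from top half
            cut = int(rng.integers(1, d))             # one-point crossover
            child = np.concatenate([popn[a, :cut], popn[b, cut:]])
            child ^= rng.random(d) < 2 / d            # bit-flip mutation
            children.append(child)
        # Elitism: keep the best half, replace the rest with children
        popn = np.vstack([popn[: pop - len(children)], children])
    scores = np.array([adj_r2(m, X, y) for m in popn])
    return popn[np.argmax(scores)]

# Toy data: y really depends only on descriptors 5 and 12 out of 40
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 40))
y = 2.0 * X[:, 5] - 1.5 * X[:, 12] + rng.normal(scale=0.5, size=50)
best = ga_select(X, y)
print(np.flatnonzero(best))
```

Because crossover and mutation can drop a descriptor that was added early, this search can escape the local minima that a purely greedy forward pass cannot; the trade-off is many more model evaluations.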
I was wondering if anyone has implemented something like this within JMP, or has done it outside of JMP (e.g., in R, Python, MATLAB, or SAS) and can think of a way to package the functionality as a JMP add-in. My specific need is linear regression equations, not neural networks, random forests, or the other model types JMP can also fit.