
Identification of Critical factors through PLS

I am trying to identify critical factors by running PLS repeatedly. Initially I included all factors and, based on the variable importance plot (VIP), reduced the set of factors. I then ran PLS a second time with only the selected factors, checked VIP again, and reduced the set further. I continued this process until only the critical factors remained (VIP > 0.8; this could be 3 or 4 factors). Is this the right approach for identifying critical factors?

Accepted Solutions
P_Bartell
Level VIII

Re: Identification of Critical factors through PLS

Generally, I wouldn't recommend iteratively running PLS models in a stepwise fashion to identify the key x's in a system. IMO there are many mathematical reasons why. I'll try to keep it on a conceptual level, and I hope others chime in.

 

One of the advantages of PLS is that, on the x side of the model, it leverages ALL the x's to look for latent structures among them, and finds the latent structures that are most active on the y's. Then it reports back, through some elegant mathematics, which of the individual x's are most influential. If you start eliminating x's entirely, it's as if they don't exist in the system from a modeling point of view: you lose information on each subsequent pass, and the eliminated x's can't contribute anything to the latent structures built on later iterations.
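
To make the "reports back" step a bit more concrete, here is a minimal sketch of how a VIP score can be computed from a single-response PLS fit. It assumes NumPy, a textbook NIPALS PLS1 algorithm, and the commonly published VIP formula; it is not taken from JMP, and the function name `pls1_vip` and the simulated data are hypothetical. Note how each VIP depends on weights estimated from all the x's together, which is why dropping x's changes the scores of the ones that remain.

```python
import numpy as np

def pls1_vip(X, y, n_components):
    """Fit single-response PLS by NIPALS and return one VIP score per x column."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center and scale, as is usual before PLS.
    Xr = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yr = (y - y.mean()) / y.std(ddof=1)
    p = Xr.shape[1]
    W = np.zeros((p, n_components))   # unit-length weight vectors
    ss = np.zeros(n_components)       # y variation explained per component
    for a in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)        # weights use every x column at once
        t = Xr @ w                    # latent score for this component
        tt = t @ t
        loading = Xr.T @ t / tt
        q = (yr @ t) / tt
        Xr = Xr - np.outer(t, loading)  # deflate X
        yr = yr - q * t                 # deflate y
        W[:, a] = w
        ss[a] = q**2 * tt
    # VIP_j = sqrt(p * sum_a(ss_a * w_ja^2) / sum_a(ss_a))
    return np.sqrt(p * (W**2 @ ss) / ss.sum())

# Simulated example (hypothetical data): only the first two x's drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.5 * rng.normal(size=200)
vip = pls1_vip(X, y, n_components=2)
print(np.round(vip, 2))
```

Because the squared VIPs always average to 1, the scores are relative to whichever x's are in the model, so a VIP > 0.8 cutoff means something different on every reduced pass.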

 

Here are my thoughts to help you whittle down the x's, if whittling down to some minimum number of x's is your main goal. Depending on the type of data you have and the shape of your matrices (long and wide, narrow and long, wide and short, etc.), there are many analysis methods besides PLS that are adept at variable identification. If you have JMP Pro, look at the many modeling types in the Generalized Regression platform. Tree-based modeling methods might work as well. Why not try these methods on your data too, and look for consistency among the conclusions and insights of each? Depending on the amount of correlation among the x's, even good old-fashioned stepwise OLS might work, but tread carefully with respect to multicollinearity among the x's here.
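
One way to gauge the multicollinearity concern before trying stepwise OLS is to look at variance inflation factors (VIFs). The sketch below assumes NumPy and uses the standard identity that the VIFs are the diagonal of the inverse correlation matrix of the x's; the function name and the simulated columns are hypothetical, and this is an illustration rather than a JMP feature.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return one VIF per column of X (diagonal of the inverse correlation matrix)."""
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Simulated example (hypothetical data): x2 is nearly a copy of x1, x3 is independent.
rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
vif = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(np.round(vif, 1))
```

A common rule of thumb treats VIFs above roughly 5 to 10 as a warning sign. In that situation stepwise selection can pick either of two correlated x's more or less arbitrarily, which is one reason to cross-check against other methods.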

 

Lastly, the gold standard method of all would be to take the influential x's from your first PLS pass and build a designed experiment with screening as the main goal. Back in my days in industry, we used this approach a lot and had lots of success with it in solving root cause analysis problems. As a wise engineer I once worked with was fond of saying, "Until you can turn a failure mode off and on, you don't know root cause."
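
As a rough illustration of what a screening design for four shortlisted x's could look like, here is a sketch that builds a two-level half-fraction (a 2^(4-1) design with the generator D = A*B*C) in plain Python. The factor names are hypothetical placeholders, and in practice a DOE tool would build the design and randomize the run order for you.

```python
from itertools import product

def half_fraction_four_factors(factor_names):
    """Return the 8 runs of a 2^(4-1) design, with the fourth factor set to A*B*C."""
    if len(factor_names) != 4:
        raise ValueError("this sketch handles exactly four factors")
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        d = a * b * c  # generator D = ABC, so levels are coded -1 (low) / +1 (high)
        runs.append(dict(zip(factor_names, (a, b, c, d))))
    return runs

# Hypothetical factor names standing in for the x's kept after the first PLS pass.
factors = ["x1", "x2", "x3", "x4"]
design = half_fraction_four_factors(factors)
for run in design:
    print(run)
```

Each factor is run at its low and high level equally often, and the columns are mutually orthogonal, so the main effect of each x can be estimated separately in 8 runs instead of the 16 a full factorial would need.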


2 REPLIES

Re: Identification of Critical factors through PLS

Thank you for the detailed explanation and clarification.