Solved: Re: Identification of Critical factors through PLS

Avinashpathade · Apr 4, 2024 12:20 AM

I am trying to identify critical factors by running PLS repeatedly. Initially I included all factors and, based on the variable importance plot (VIP), reduced the set. I then ran PLS a second time with the selected factors, checked VIP again, and reduced the set further. I continued this process until only the critical factors remained (VIP > 0.8; there could be 3 or 4 factors). Is this the right approach for finding critical factors?

P_Bartell · Apr 4, 2024 07:40 AM

Generally I wouldn't recommend iteratively running PLS models in a stepwise fashion to identify the key x's in a system. IMO there are many mathematical reasons why. I'll try to keep it on a conceptual level, and I hope others chime in.

One of the advantages of PLS is that, on the x side of the model, it leverages ALL the x's to look for latent structures among them and to find the ones that are most active on the y's. Then it reports back, through some elegant mathematics, which of the individual x's are most influential. If you start eliminating x's entirely, it's as if they don't exist in the system from a modeling point of view: you lose information on each subsequent pass, and the dropped x's can't contribute anything to the latent structures built in later iterations.
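To make the "elegant mathematics" a bit more concrete, here is a minimal sketch of how VIP scores can be computed from a single-response PLS fit. It assumes the commonly cited definition of VIP (squared, normalized weights averaged over components, weighted by the y-variance each component explains) and a plain NIPALS fit; JMP's own implementation may differ in details. The helper names `standardize` and `pls1_vip` and the simulated data are illustrative assumptions, not anything from the original analysis.

```python
import math
import random
import statistics

def standardize(rows):
    """Center and scale each column to mean 0 and sample sd 1."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def pls1_vip(X, y, n_components):
    """VIP scores from a single-response PLS model fitted with NIPALS.

    X: list of rows (already standardized); y: list of centered responses.
    """
    n, p = len(X), len(X[0])
    X = [row[:] for row in X]   # copies, because deflation modifies them
    y = y[:]
    weights, explained = [], []
    for _ in range(n_components):
        # Weight vector: x-space direction with maximal covariance with y.
        w = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0:
            break               # nothing left in X that covaries with y
        w = [v / norm for v in w]
        # Scores, x-loadings, and y-loading for this component.
        t = [sum(X[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        p_load = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(y[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        for i in range(n):
            for j in range(p):
                X[i][j] -= t[i] * p_load[j]
            y[i] -= q * t[i]
        weights.append(w)
        explained.append(q * q * tt)   # y sum of squares explained
    total = sum(explained)
    return [
        math.sqrt(p * sum(ss * w[j] ** 2 for ss, w in zip(explained, weights)) / total)
        for j in range(p)
    ]

# Simulated example (an assumption): x0 and x1 drive y, x2..x4 are noise.
random.seed(0)
raw = [[random.gauss(0, 1) for _ in range(5)] for _ in range(200)]
resp = [2 * r[0] + r[1] + random.gauss(0, 0.1) for r in raw]
X = standardize(raw)
y_mean = statistics.mean(resp)
y = [v - y_mean for v in resp]
vip = pls1_vip(X, y, n_components=2)
print([round(v, 2) for v in vip])
```

Note that with this definition the squared VIP scores always average to 1 across the x's in the model, so each VIP is relative to whatever set of x's was included. Refitting on a reduced set rescales the scores, which is one reason a fixed cutoff such as 0.8 means something different on every pass.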

Here are my thoughts to help you whittle down the x's, if whittling down to some minimum number of x's is your main goal. Depending on the type of data you have and the shape of your matrices (long and wide, narrow and long, wide and short, etc.), there are many analysis methods besides PLS that are adept at variable identification. If you have JMP Pro, look at the many modeling types in the Generalized Regression platform. Tree based modeling methods might work as well. Why not try these methods on your data too and look for consistency among the conclusions and insights of each? Depending on the amount of correlation among the x's, even good old fashioned stepwise OLS might work...but tread carefully wrt multicollinearity among the x's here.
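As one rough way to act on the multicollinearity caution before trying stepwise OLS, the sketch below flags pairs of x's whose pairwise Pearson correlation is large in absolute value. The 0.9 cutoff, the helper names `pearson` and `flag_correlated_pairs`, and the toy columns are all assumptions for illustration; pairwise correlation also only catches two-variable collinearity, not more complex linear dependencies.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    dev_a = [v - mean_a for v in a]
    dev_b = [v - mean_b for v in b]
    cross = sum(u * v for u, v in zip(dev_a, dev_b))
    return cross / math.sqrt(sum(u * u for u in dev_a) * sum(v * v for v in dev_b))

def flag_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold.

    columns: dict mapping a variable name to its list of values.
    threshold: illustrative cutoff, not a rule from the thread.
    """
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Toy data (an assumption): x2 is nearly a multiple of x1, x3 is unrelated.
columns = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10.5],
    "x3": [5, 1, 4, 2, 3],
}
flagged = flag_correlated_pairs(columns)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b} are highly correlated (r = {r:.3f})")
```

If a check like this flags many pairs, stepwise OLS selections are likely to be unstable, and the methods built for correlated x's mentioned above are probably the safer route.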

Lastly, the grandfather of them all, the gold standard method, would be to take the influential x's from your first PLS pass and build a designed experiment with screening as the main goal. Back in my days in industry we used this approach a lot and had lots of success with it solving root cause analysis type problems. As a wise engineer I once worked with was fond of saying, "Until you can turn a failure mode off and on, you don't know root cause."
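For readers who haven't built a screening design before, here is a minimal sketch of one common type: a two-level fractional factorial, in this case a half fraction for four factors in 8 runs with the fourth factor set by the generator D = ABC. The factor names and the choice of design are assumptions for illustration only; the post doesn't say which screening design was used, and in practice a DOE tool such as JMP's design platforms would pick the design and randomize the run order.

```python
from itertools import product
from math import prod

def fractional_factorial(base_factors, generators):
    """Build a two-level fractional factorial design in coded units (-1, +1).

    base_factors: names of the factors that get a full factorial.
    generators: dict mapping each extra factor name to the tuple of base
                factors whose product defines its level in every run.
    """
    runs = []
    for levels in product((-1, 1), repeat=len(base_factors)):
        run = dict(zip(base_factors, levels))
        for name, parents in generators.items():
            run[name] = prod(run[parent] for parent in parents)
        runs.append(run)
    return runs

# Hypothetical factors A..D standing in for the influential x's from PLS.
design = fractional_factorial(["A", "B", "C"], {"D": ("A", "B", "C")})
for run in design:
    print(run)
```

Each factor appears at its low and high level equally often and the main-effect columns are mutually orthogonal, which is what lets a small number of runs separate the individual factor effects that observational data often confound.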

Avinashpathade · Apr 7, 2024 09:56 AM

Thank you for the detailed explanation and clarification.
