Principal component regression
Anyway, I have looked and looked, and don't see any reference, much less functionality, in JMP for PCR (or integration with DoE for that matter). Am I missing something, and does this functionality exist in JMP, even if only in JSL form? If not, does anyone have experience with it? Would you just use the inverted correlation matrix on the PCs, basically the inverse of the PC formula column?
Thanks!
Re: Principal component regression
Hi,
I dabbled with the same idea and was told that was nonsense on this board. Still, there may be a labor-intensive way to get what you want.
- Calculate the Principal Components from your data set (Multivariate Methods > Principal Components)
- In the report, go to Save Columns > Save Principal Components Values. The number of PCs you save depends on your data; I tend to keep enough PCs to capture at least 80% of the data variation.
- Then, treat the PCs like any other variables in your model (e.g., Response Screening, Fit Model).
Note: This approach is directly inspired by the workstream in R as explained on this page (LINK)
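For readers outside JMP, the three steps above can be sketched in Python with NumPy. This is only an illustration under assumptions: the data are synthetic, the function names `pcr_fit` and `pcr_predict` are made up for this example, and the 80% cutoff is just the rule of thumb mentioned above, not a JMP default.

```python
import numpy as np

def pcr_fit(X, y, var_threshold=0.80):
    """PCA on the correlation matrix of X, then OLS of y on the retained PC scores."""
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Xs = (X - mu) / sd                                   # standardize the columns
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]                    # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum_share = np.cumsum(eigvals) / eigvals.sum()
    # smallest number of PCs whose cumulative variance share reaches the threshold
    k = min(int(np.searchsorted(cum_share, var_threshold)) + 1, X.shape[1])
    directions = eigvecs[:, :k]
    scores = Xs @ directions                             # the saved PC columns
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)    # ordinary least squares on the PCs
    return {"k": k, "cum_share": cum_share, "coef": coef,
            "directions": directions, "mu": mu, "sd": sd}

def pcr_predict(model, X_new):
    """Project new rows onto the retained PCs and apply the fitted coefficients."""
    scores = ((X_new - model["mu"]) / model["sd"]) @ model["directions"]
    return model["coef"][0] + scores @ model["coef"][1:]

# Synthetic example (assumption): 6 correlated columns driven by 2 latent factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(50, 6))
y = 2.0 * latent[:, 0] - latent[:, 1] + 0.2 * rng.normal(size=50)

model = pcr_fit(X, y)
print("components kept:", model["k"])
print("cumulative variance share:", np.round(model["cum_share"][:model["k"]], 3))
```

The number of components is chosen from the X's alone here, which is exactly the point debated further down the thread.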
Best,
TS
Re: Principal component regression
Thanks for the reply! So what is people's beef with PCR? Is it a fundamental statistical issue, or is it something else? For example, I have a paper that uses it on 700 solvents across 100 properties. It seems highly unlikely that there are 100 truly independent, fundamental aspects to the solvent space, and PCR seems like a reasonable way to reduce it to a manageable size. I'm sure there are many other applications too, for zeolites, minerals, etc. And that's just chemistry.
Re: Principal component regression
Could Partial Least Squares regression be a simpler solution?
"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
Re: Principal component regression
I'm not sure that people have "beef" with PCR, but it has some limitations. Ultimately, you must be certain about what you want from your modeling effort.
PCR is a biased regression technique. This is not necessarily bad, but your model would be predictive only. It is not intended to provide any insights into what CAUSES things to happen like a designed experiment would. It is truly just built on correlations, with no claims on causation.
When you perform PCA on the X's as @Thierry_S suggests, the decision to keep only the first few principal components is based on the X's alone, not on the target or response variable, Y. So you may not end up with the best model. The last principal component might have the best predictive ability for the target, so you should keep ALL principal components when you first build your PCR model. Once you fit a model with all PCs, you can then remove the insignificant ones to get a final model.
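A small synthetic illustration of this point, sketched in Python with NumPy. The data are constructed on purpose (an assumption, not a real data set) so that the response depends on the difference between two nearly collinear columns, which is the direction PCA ranks last.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)        # nearly collinear with x1
x3 = rng.normal(size=n)                   # unrelated column
X = np.column_stack([x1, x2, x3])
# The response depends on the small difference x1 - x2 (a low-variance direction)
y = 10.0 * (x1 - x2) + 0.1 * rng.normal(size=n)

# PCA on the correlation matrix, components sorted by decreasing variance
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores = Xs @ eigvecs

var_share = eigvals / eigvals.sum()
corr_with_y = np.array([np.corrcoef(scores[:, j], y)[0, 1]
                        for j in range(scores.shape[1])])

for j in range(scores.shape[1]):
    print(f"PC{j + 1}: variance share = {var_share[j]:.4f}, "
          f"corr with y = {corr_with_y[j]:+.3f}")
```

In this constructed case the last PC explains almost none of the variation in X yet is the one most strongly correlated with y, so a variance-based cutoff would discard it.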
Your last post gets at the major issue with PCR: how do you translate back to the original X space? There are several possible approaches. One approach is to keep the factor that has the highest loading from each of the significant PCs. A second approach is to remove the factor that has the lowest loading from each of the significant PCs. And there are other possibilities. Which approach you choose will depend on what you want from a model.
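Separately from the factor-selection approaches above, the fitted PC coefficients can also be mapped back algebraically: because each PC score is a linear combination of the standardized X's, multiplying the PC coefficients by the eigenvector matrix gives coefficients on the original columns. This is roughly the "inverse of the PC formula" idea from the first post. A minimal sketch in Python with NumPy, assuming PCA on correlations; the function name is hypothetical and the data are synthetic.

```python
import numpy as np

def pcr_coefficients_in_x_space(X, y, k):
    """Fit y on the first k PCs, then express the fit as intercept + X @ beta."""
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Xs = (X - mu) / sd
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    V = eigvecs[:, np.argsort(eigvals)[::-1]][:, :k]     # first k eigenvectors
    Z = Xs @ V                                           # PC scores
    design = np.column_stack([np.ones(len(y)), Z])
    gamma, *_ = np.linalg.lstsq(design, y, rcond=None)   # coefficients on the PCs
    beta_std = V @ gamma[1:]                             # back to standardized X's
    beta = beta_std / sd                                 # back to original units
    intercept = gamma[0] - beta @ mu
    return intercept, beta

# Synthetic, moderately correlated columns (assumption for illustration)
rng = np.random.default_rng(3)
X = rng.normal(size=(40, 4)) + 0.5 * rng.normal(size=(40, 1))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.3 * rng.normal(size=40)

b0_all, b_all = pcr_coefficients_in_x_space(X, y, k=4)   # all PCs retained
b0_two, b_two = pcr_coefficients_in_x_space(X, y, k=2)   # truncated, biased fit
print("all PCs :", np.round(b_all, 3))
print("two PCs :", np.round(b_two, 3))
```

With all PCs retained, the back-transformed coefficients coincide with ordinary least squares on the original columns; dropping PCs is what introduces the bias. Note that every original factor still gets a coefficient, so this mapping by itself does not reduce the number of factors.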
Re: Principal component regression
OK, so here's a thought. This is delving into stuff I'm really not familiar with, so let me know if it is absurd. Even though an experimental design built on PCs results in a model in terms of PCs, the y vector is still in real units. So could one do what @Thierry_S suggested and generate the PCs, then load those into the DoE platform as covariates? That way you are able to generate a design that at least tries to pick points along the directions of maximum variance. And then (I was just looking through the PLS platform @Victor_G shared), instead of making a model in terms of the PCs, maybe do a PLS regression of the final data against the original data set.
That way you are just using the PCs to make sure the usage of the original column space is maximized in your design, but you're using a regression model that prioritizes the original, physically grounded column space.
Is this reasonable? My first worry is that this just relegates the optimality criterion to the proverbial back room, but from what I gather, PLS is designed to handle highly correlated variables, so the need for an optimality criterion is lessened? There might be other issues as well.
I'm just a lowly chemical engineer, so let me know if I'm not treating the statistics well here.
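To make the PLS idea concrete, here is a minimal single-response PLS sketch (the NIPALS-style PLS1 algorithm) in Python with NumPy. It is not how the JMP PLS platform is implemented; the function name and the data are assumptions for illustration. The key difference from PCR is that each component's weight vector is built from the covariance between the X's and y, and the result is reported as coefficients on the original columns.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Single-response PLS on standardized X; returns intercept and coefficients in X units."""
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Xr = (X - mu) / sd                      # working copy of X, deflated each round
    y_mean = y.mean()
    yr = y - y_mean                         # working copy of y, deflated each round
    p = X.shape[1]
    W = np.zeros((p, n_components))         # weight vectors
    P = np.zeros((p, n_components))         # X loadings
    q = np.zeros(n_components)              # y loadings
    for a in range(n_components):
        w = Xr.T @ yr                       # direction of maximum covariance with y
        w /= np.linalg.norm(w)
        t = Xr @ w                          # component scores
        tt = t @ t
        P[:, a] = Xr.T @ t / tt
        q[a] = yr @ t / tt
        W[:, a] = w
        Xr = Xr - np.outer(t, P[:, a])      # remove what this component explains
        yr = yr - q[a] * t
    beta_std = W @ np.linalg.solve(P.T @ W, q)
    beta = beta_std / sd                    # coefficients on the original columns
    intercept = y_mean - beta @ mu
    return intercept, beta

# Synthetic data with two nearly collinear columns (assumption for illustration)
rng = np.random.default_rng(2)
x1 = rng.normal(size=80)
X = np.column_stack([x1, x1 + 0.1 * rng.normal(size=80), rng.normal(size=80)])
y = 3.0 * X[:, 0] + 1.0 * X[:, 2] + 0.2 * rng.normal(size=80)

b0, b = pls1_fit(X, y, n_components=2)
print("intercept:", round(float(b0), 3), "coefficients:", np.round(b, 3))
```

How many components to keep is a separate question, usually settled by cross-validation, which this sketch leaves out.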
Re: Principal component regression
@ehchandlerjr, for the design creation, I see two options:
- Using PCs as the factors in the design (with the risks mentioned by @Dan_Obermiller in terms of predictivity, but you might have a good experimental space coverage),
- Directly use your factors as covariates. Depending on which factors you have, you might not be able to change the levels independently: for example, with chemical characteristics, you can't change molecular weight, topological surface area, and carbon chain length fully independently for a set of molecules. Preparing a candidate set with all possible combinations of your factors and using these factors as covariates could help you stay in the original input space, get good coverage of your experimental space, and directly model your responses with these factors (through PLS or other models). More info here: "What is a covariate in design of experiments?" and "Developer Tutorial - Handling Covariates Effectively when Designing Experiments" (JMP User Community).
Good luck for L'Oreal, keep me informed
Best,
Re: Principal component regression
Perhaps my definition of an experiment is different from yours? I guess my practical question is: What are you actually manipulating in the experiment? I don't think a PC is necessarily directly translatable to a factor, or even a set of factors, that can be specifically and independently manipulated.
You can certainly regress on those, but I'm not sure this is an experiment. I mean, if all you have is covariates, that is not an experiment; that is regression. Usually experiments include factors that are considered fixed effects, while covariates are treated as random effects. Together you get a mixed model. You are typically limited in the number of random variables you can account for in the mixed model.