ehchandlerjr
Level V

Principal components and screening DoE

Hello - I am currently using PCA on 60+ descriptors of chemical elements to reduce the dimensionality and convert "element" from a categorical variable to a continuous variable. However, I cannot find a simple way to use a set of factors with lots of required level combinations. My two thoughts are the "set covariate factors" option or the disallowed combinations script. However, the covariate approach A) makes the odd suggestion that the number of runs equal the number of levels of my PCs, even without me checking "Include all selected covariate rows in the design", and B) seems antithetical to the concept of PCA, which generates uncorrelated components. I assume the disallowed combinations script would work (I haven't explored it yet), but I have never played with JSL, so I wanted to see if anyone had any suggestions before I go down that road.

 

Thanks so much!

 

P.S. I also included the (unfinished) table and a separate one for the PC's that I'm using to generate the DoE if that helps.

Edward Hamer Chandler, Jr.
Accepted Solutions

Re: Principal components and screening DoE

You can add a covariate factor. JMP will assume that you want to use every row in the covariate data table. Reduce the number of runs, though, and JMP will find the optimal subset of rows. It will highlight them in the source data table, too.
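
To make "find the optimal subset of rows" a bit more concrete, here is a minimal Python sketch of the idea. It is not JMP's actual algorithm, just a greedy stand-in: it assumes a main-effects model in two covariates and, one row at a time, adds the candidate that gives the largest det(X'X) (the D-optimality criterion). The candidate values, the function names, and the small ridge term are all made up for illustration.

```python
def model_row(pc1, pc2):
    """Main-effects model row: intercept plus the two covariates (an assumption)."""
    return [1.0, float(pc1), float(pc2)]

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def information_matrix(rows, ridge=1e-6):
    """X'X for the chosen rows, plus a small ridge so the first steps are not singular."""
    p = len(rows[0])
    m = [[ridge if i == j else 0.0 for j in range(p)] for i in range(p)]
    for x in rows:
        for i in range(p):
            for j in range(p):
                m[i][j] += x[i] * x[j]
    return m

def greedy_d_optimal(candidates, n_runs):
    """Pick n_runs candidate rows, one at a time, each time maximizing det(X'X)."""
    chosen = []
    remaining = list(range(len(candidates)))
    for _ in range(n_runs):
        def score(idx):
            rows = [model_row(*candidates[i]) for i in chosen + [idx]]
            return determinant(information_matrix(rows))
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return sorted(chosen)

# Hypothetical PC scores: four "extreme" materials and three near the center.
candidates = [(-1, -1), (-1, 1), (1, -1), (1, 1), (0, 0), (0.1, -0.1), (0.2, 0.1)]
selected = greedy_d_optimal(candidates, n_runs=4)
print(selected)  # indices of the selected candidate rows
```

With fewer runs than candidate rows, the rows that survive are the ones that spread out over the covariate space, which is the behavior described above.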

Phil_Kay
Staff

Re: Principal components and screening DoE

Hi,

 

I think that you might need to add a bit more explanation of what you are trying to achieve. Can you elaborate on "use a set of factors with lots of required level combinations."

 

Using PCA to reduce the dimensionality of a system like this is a fairly standard approach. The next step is often to build an experiment using the principal components as covariate factors and selecting candidates to give good coverage of the principal components space.
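
For anyone who wants to see the mechanics outside of JMP, below is a rough Python sketch of the PCA step on standardized descriptors (i.e. PCA on correlations). It is written in plain Python with power iteration so that it runs without extra packages; in practice a numerical library would be the usual choice. The descriptor values and function names are invented for illustration, and this is not how JMP computes its components internally.

```python
import math

def standardize(table):
    """Center each column and scale it to unit standard deviation."""
    n = len(table)
    cols = list(zip(*table))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1)) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in table]

def correlation_matrix(z):
    """Correlation matrix of already standardized data."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)] for i in range(p)]

def leading_eigenvector(matrix, iterations=200):
    """Power iteration: multiply by the matrix and renormalize, repeatedly."""
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigenvalue, v

def pca_scores(table, n_components=2):
    """Return one row of component scores per row of the descriptor table."""
    z = standardize(table)
    c = correlation_matrix(z)
    loadings = []
    for _ in range(n_components):
        eigenvalue, v = leading_eigenvector(c)
        loadings.append(v)
        # Deflate: remove the component just found so the next one can emerge.
        p = len(v)
        c = [[c[i][j] - eigenvalue * v[i] * v[j] for j in range(p)] for i in range(p)]
    return [[sum(zi * li for zi, li in zip(row, v)) for v in loadings] for row in z]

# Hypothetical descriptor table: 5 materials x 3 descriptors.
descriptors = [
    [1.0, 2.1, 5.0],
    [2.0, 3.9, 3.0],
    [3.0, 6.2, 4.0],
    [4.0, 8.1, 1.0],
    [5.0, 9.8, 2.0],
]
scores = pca_scores(descriptors)
pc1 = [s[0] for s in scores]
pc2 = [s[1] for s in scores]
cross_product = sum(a * b for a, b in zip(pc1, pc2))
print(cross_product)  # close to zero: the component scores are uncorrelated
```

Each material ends up with a pair of continuous scores, and those score columns are what would be supplied as the covariate factors.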

 

In the attached example with solvents, PCA reduces the solvent properties to 2 components (essentially "polarity" and "bulkiness"). These components are then used as covariate factors in Custom Design to select 9 solvents to estimate the response surface model. You can see the 9 solvents selected cover the PC space, and hence the original descriptor space, very nicely:

 

Phil_Kay_0-1670256915140.png
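
As a quick sanity check on the run count, the sketch below lists the terms of a full response surface model in two factors, assuming the usual definition (intercept, main effects, two-factor interaction, and quadratic terms). The function name is made up for illustration.

```python
from itertools import combinations_with_replacement

def response_surface_terms(factors):
    """Intercept, main effects, and all second-order terms (squares and interactions)."""
    terms = ["Intercept"] + list(factors)
    terms += [f"{a}*{b}" for a, b in combinations_with_replacement(factors, 2)]
    return terms

terms = response_surface_terms(["PC1", "PC2"])
print(terms)
# ['Intercept', 'PC1', 'PC2', 'PC1*PC1', 'PC1*PC2', 'PC2*PC2']
print(len(terms))  # 6
```

So the 9 selected solvents cover the 6 model parameters with 3 runs to spare.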

 

I hope this helps,

Phil

 

12 REPLIES

ehchandlerjr
Level V

Re: Principal components and screening DoE

Ah that makes sense then. Thanks!

Edward Hamer Chandler, Jr.
ehchandlerjr
Level V

Re: Principal components and screening DoE

One more question. If I am doing this with two sets of covariates (one PC set for elements, and another for solvents), how would I do this? As a trial, I tried putting a second set in the table, with its data occupying different rows than the first PCs (as seen below), and the design failed.

 

Thanks.

 

ehchandler_0-1670256557343.png

 

Edward Hamer Chandler, Jr.
ehchandlerjr
Level V

Re: Principal components and screening DoE

Hi @Phil_Kay - Yes so that's exactly what I meant. I worded it that way because I was talking about what I was trying to get JMP to do on a mechanical level, rather than ask a more statistical question. The solvent plot you shared is exactly the kind of thing I'm planning on doing, with both solvents and f-block metals, though I can't figure out how to put both sets of PC's into a DoE simultaneously without JMP thinking the PC's are all covariates of each other.

Edward Hamer Chandler, Jr.
Phil_Kay
Staff

Re: Principal components and screening DoE

It's a good question (re: selecting 2 different types of materials in a DOE). I don't think there is an easy way to do this, but I think that a workaround is possible.

The workaround would be to create a table with 1 row for every possible combination of solvents and f-block metals, so the number of rows would equal the number of solvents × the number of metals. I would have to think about the easiest way to create this.

You would need columns of PCs for solvents, and columns of PCs for metals (I think you could create these before or after the big table of all combinations). Then use all PC columns as the covariate factors.
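
One possible way to build that all-combinations table, sketched in Python rather than JMP. The solvent and metal names are only placeholders and the PC values are invented; the point is the shape of the result, with every solvent paired with every metal and both sets of PC columns carried along.

```python
from itertools import product

# Hypothetical PC scores (PC1, PC2) for each material; the values are made up.
solvents = {"water": (2.1, -0.8), "toluene": (-1.5, 0.9), "ethanol": (0.7, -0.3)}
metals = {"La": (-1.2, 0.4), "Ce": (-0.9, 0.1), "Nd": (0.3, -0.6), "Gd": (1.8, 0.1)}

def cross_join(solvents, metals):
    """One row per (solvent, metal) pair, carrying both sets of PC columns."""
    rows = []
    for (solvent, (s1, s2)), (metal, (m1, m2)) in product(solvents.items(), metals.items()):
        rows.append({
            "Solvent": solvent,
            "Metal": metal,
            "Solvent PC1": s1,
            "Solvent PC2": s2,
            "Metal PC1": m1,
            "Metal PC2": m2,
        })
    return rows

candidate_table = cross_join(solvents, metals)
print(len(candidate_table))  # 3 solvents x 4 metals = 12 rows
print(candidate_table[0])
```

All four PC columns of this table would then be the covariate factors, with the number of runs set lower than the number of rows.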

 

Re: Principal components and screening DoE

You cannot combine two sets of PCAs as covariates. Custom Design considers the table of covariate values as the runs in your design to be completed with the levels of the other factors. JMP finds the optimal factor levels in combination with the covariate combinations. Operationally and philosophically, you would need a covariate data table with the metals and the solvents. Still, there is no meaningful way to match the metals and the solvents row-by-row. They are truly independent.

ehchandlerjr
Level V

Re: Principal components and screening DoE

So I used Microsoft's Power Query functionality in Excel to make the paired table of covariates like @Phil_Kay suggested, and I've attached it, but @Mark_Bailey you said "Still, there is no meaningful way to match the metals and the solvents row-by-row." Does this mean you don't think this would work, or just that it's more the brute force method than one that makes meaningful sense?

Edward Hamer Chandler, Jr.

Re: Principal components and screening DoE

Combining the metal table and the solvent table using a Cartesian join will give you all possible pairings of metals and solvents. Then you can optimally subset this table with Custom Design. I was thinking in the 'Custom Design' box, not the 'do-it-yourself box' as Phil was.
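
One property of the fully crossed table that may be worth seeing in numbers: because every metal is paired with every solvent, a metal PC column and a solvent PC column are exactly uncorrelated in the full table, whereas an arbitrary subset of rows generally is not. The short Python sketch below illustrates this with invented PC values; the helper function and the "first five rows" subset are purely for illustration.

```python
from itertools import product

def correlation(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical first principal component scores for 4 metals and 3 solvents.
metal_pc1 = [-1.2, -0.9, 0.3, 1.8]
solvent_pc1 = [2.1, -1.5, 0.7]

# Cartesian join: every metal value paired with every solvent value.
pairs = list(product(metal_pc1, solvent_pc1))
full_r = correlation([m for m, _ in pairs], [s for _, s in pairs])

# An arbitrary (not optimized) subset: just the first five rows.
subset = pairs[:5]
subset_r = correlation([m for m, _ in subset], [s for _, s in subset])

print(full_r)    # zero up to rounding error
print(subset_r)  # not zero
```

This is one way to read the role of the optimal subsetting step: it picks rows from the crossed table in a way that keeps the model terms well estimated, rather than taking whatever rows happen to come first.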