MarcP
Level III

LDA vs Predictor screening to determine which process parameters describe differences between runs

Currently I'm investigating a unit in a chemical process. This unit runs for several months before it's cleaned. The performance differs between runs. My performance indicator for a run has 3 levels: Low, Med, High. There are about 30 process parameters (pressures, temperatures, etc.) identified as candidates that could describe or influence the performance indicator. These process parameters vary during a run. I'm trying to identify which of these process parameters are the most influential and what their influences are (positive/negative).

I used LDA and predictor screening. The list of the top 10 most influential process parameters differs between the 2 methods. Because the methods are different I expected some differences, but not differences this extensive. My question: how can I identify which of the 2 methods gives me the most reliable answer?

1 ACCEPTED SOLUTION

P_Bartell
Level VIII

Re: LDA vs Predictor screening to determine which process parameters describe differences between runs

I (forgive me, I should have read more closely) just noticed you only have 13 runs. So you have far fewer runs than predictor variables (30). Typically, tree-based and many linear modeling methods need more runs than the number of parameters you are trying to estimate, so I'm not surprised you are seeing differing answers. LDA has some dimensionality reduction features. Quite frankly, without JMP Pro your options are limited (if you had it, you could use FDE, partial least squares, or perhaps one of the penalized regression methods). Good luck.
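
To illustrate the point about having fewer runs than predictors, here is a minimal Python sketch (not JMP, and not any of the platforms mentioned above). It assumes 13 runs with level counts of 4/4/5, codes the levels as Low=0, Med=1, High=2, and fills 30 predictors with pure noise. All names and numbers in it are illustrative assumptions. It shows how large the best-looking correlation tends to be even when no predictor carries any signal.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def max_spurious_corr(response, n_predictors, rng):
    """Largest |r| between the response and n_predictors pure-noise columns."""
    best = 0.0
    for _ in range(n_predictors):
        noise = [rng.gauss(0.0, 1.0) for _ in response]
        best = max(best, abs(pearson(noise, response)))
    return best


# Assumed layout: 13 runs, ordinal coding Low=0, Med=1, High=2.
response = [0] * 4 + [1] * 4 + [2] * 5
rng = random.Random(1)
trials = 200

avg_30 = statistics.fmean(max_spurious_corr(response, 30, rng) for _ in range(trials))
avg_1 = statistics.fmean(max_spurious_corr(response, 1, rng) for _ in range(trials))

print(f"average max |r| over 30 noise predictors: {avg_30:.2f}")
print(f"average |r| for a single noise predictor: {avg_1:.2f}")
```

If the best of 30 noise columns routinely looks strongly related to the response, two different screening methods can easily pick different "winners" from the same 13 runs.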

10 REPLIES
ian_jmp
Level X

Re: LDA vs Predictor screening to determine which process parameters describe differences between runs

Very hard to say without the data, or a reasonable sample thereof. Given that 'process parameters vary during a run', what does this variation actually look like within a run, and how many runs do you have data for? How many times does each level of the performance indicator occur?

MarcP
Level III

Re: LDA vs Predictor screening to determine which process parameters describe differences between runs

I have 13 runs. Each level occurs approximately 4 times. Flows change depending on the required throughput; pressures can change because of ambient conditions or long-term fouling.

Re: LDA vs Predictor screening to determine which process parameters describe differences between runs

So the predictor variables vary over time (during a run)? You might try Functional Data Explorer in JMP Pro to obtain the functional principal components, and then use them as predictors instead of the original measurements.

P_Bartell
Level VIII

Re: LDA vs Predictor screening to determine which process parameters describe differences between runs

Let me start with one basic question. What are your criteria for the 'most reliable answer'? Are you conducting an investigation where you are seeking an exploratory 'answer' or a predictive 'answer'? The data analytics methods I'd recommend will in large measure be dictated by your answer.

MarcP
Level III

Re: LDA vs Predictor screening to determine which process parameters describe differences between runs

I want to identify which process variables explain why a run has a low or high performance. The next step is to see whether I can hold these variables within a band (or compensate with other variables) to maintain high runs on a consistent basis. From an engineering perspective, a number of candidate variables have been identified. The analysis is meant to find the most influential variables (like a root cause analysis).

I hope this answers your question. I wanted to avoid mislabeling my question as exploratory or predictive.

P_Bartell
Level VIII

Re: LDA vs Predictor screening to determine which process parameters describe differences between runs

Ideally, I recommend the idea @Mark_Bailey shares. Depending on what you observe in the predictor variables, particularly if they are highly correlated with each other, Functional Data Explorer is one path for identifying the most influential variables. An issue you may run into is multicollinearity among the predictor variables. FDE can handle this issue, but if possible, once you have used FDE for variable identification, nothing beats DOE for (A) confirming the variable identification and (B) developing a predictive model for the process.

MarcP
Level III

Re: LDA vs Predictor screening to determine which process parameters describe differences between runs

Unfortunately I don't have JMP Pro; I only have JMP. So what would then be the best alternative?

P_Bartell
Level VIII

Re: LDA vs Predictor screening to determine which process parameters describe differences between runs

Here's where I'd start. My experience with processes similar to what you have described so far is that there may be a lot of multicollinearity among the predictor variables. Have you investigated this issue? Your best bet is to start with time series plots of the predictors; you don't have too many, so something as simple as Graph Builder could work. In fact, in the FDE platform these types of plots are a standard and very valuable report output. From there, if a lot of multicollinearity is present, you might want to use principal components analysis to identify which components contribute the majority of the variability in the predictors. Having only a categorical response limits your modeling options as well.

But my strongest recommendation, if you aren't familiar with this pathway or these methods, is to find someone who is and who is willing to work with you through the data and the problem. There are lots of ways to go, and it's hard to walk through all the different pathways without seeing the actual data and understanding more about the chemical process of interest.
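
As a rough companion to the multicollinearity check suggested above, here is a minimal Python sketch (an illustration outside JMP, not a JMP feature). It flags predictor pairs whose pairwise correlation exceeds a threshold. The column names, the values, and the 0.9 cutoff are all hypothetical, and the sketch assumes each predictor has already been reduced to one value per run (for example a run average).

```python
import itertools
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def collinear_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every predictor pair with |r| >= threshold."""
    flagged = []
    for (a, xa), (b, xb) in itertools.combinations(columns.items(), 2):
        r = pearson(xa, xb)
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical per-run averages; names and values are made up for illustration.
data = {
    "feed_flow": [10.0, 12.0, 11.0, 15.0, 14.0, 13.0],
    "inlet_pressure": [2.1, 2.6, 2.3, 3.0, 2.9, 2.7],  # tracks feed_flow closely
    "ambient_temp": [18.0, 25.0, 9.0, 14.0, 30.0, 21.0],
}

for a, b, r in collinear_pairs(data):
    print(f"{a} ~ {b}: r = {r}")
```

Pairs flagged this way carry largely the same information, so any ranking that has to choose between them can flip from one method (or one subset of runs) to the next.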

MarcP
Level III

Re: LDA vs Predictor screening to determine which process parameters describe differences between runs

You're correct: there's a significant amount of collinearity. That's why I selected LDA and predictor screening. I get 4% mislabeling in my LDA and can clearly distinguish between lows and highs. Next I looked at the standardized scoring coefficients and ranked them in order of magnitude.

Additionally, I ran a predictor screening with 1,000 trees. When I compare the predictor ranking with the ranking of the standardized scoring coefficients from the LDA, there's hardly any match between a high predictor ranking and high standardized scoring coefficients.

Because they are 2 different methods I expected some differences, but not this much.

What could be the reason, and which ranking would be the most reliable? (How can I check the performance of the predictor screening?)
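
One possible way to probe how reliable a ranking is with so few runs is to drop one run at a time, re-rank, and count how often each predictor stays near the top. The Python sketch below does this under loud assumptions: the score is a simple stand-in (absolute correlation with the levels coded Low=0, Med=1, High=2), not LDA scoring coefficients and not JMP's predictor screening, and the data are synthetic with one built-in "driver" column plus 29 noise columns. In practice the same leave-one-run-out loop would be wrapped around whichever method is being checked.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def top_k(columns, response, k):
    """Names of the k predictors with the largest |correlation| to the response."""
    scores = {name: abs(pearson(vals, response)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def leave_one_run_out_counts(columns, response, k):
    """Count how often each predictor lands in the top k when one run is dropped."""
    counts = {name: 0 for name in columns}
    n = len(response)
    for drop in range(n):
        keep = [i for i in range(n) if i != drop]
        sub_cols = {name: [vals[i] for i in keep] for name, vals in columns.items()}
        sub_resp = [response[i] for i in keep]
        for name in top_k(sub_cols, sub_resp, k):
            counts[name] += 1
    return counts


# Synthetic stand-in data (assumption): 13 runs, one real driver, 29 noise columns.
rng = random.Random(0)
response = [0] * 4 + [1] * 4 + [2] * 5
columns = {"driver": [y + rng.gauss(0.0, 0.2) for y in response]}
for j in range(29):
    columns[f"noise_{j:02d}"] = [rng.gauss(0.0, 1.0) for _ in response]

counts = leave_one_run_out_counts(columns, response, k=3)
for name, c in sorted(counts.items(), key=lambda kv: -kv[1])[:6]:
    print(f"{name}: in the top 3 for {c} of {len(response)} refits")
```

A predictor that stays in the top group across all refits is a more believable candidate than one that appears only when a particular run is included.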