MarcP
Level III

LDA vs Predictor screening to determine which process parameters describe differences between runs

Currently I'm investigating a unit in a chemical process. This unit runs for several months before it's cleaned, and performance differs between runs. My performance indicator for a run has 3 levels: Low, Med, High. About 30 process parameters (pressures, temperatures, etc.) have been identified as candidates that could describe/influence the performance indicator. These process parameters vary during a run. I'm trying to identify which of these process parameters are the most influential and what their influences are (positive/negative).

I used LDA and Predictor Screening. The lists of the top 10 most influential process parameters differ between the two methods. Because the methods are different I expected some differences, but not this extensive. My question: how do I identify which of the two methods gives me the most reliable answer?

1 ACCEPTED SOLUTION

Accepted Solutions
P_Bartell
Level VIII

Re: LDA vs Predictor screening to determine which process parameters describe differences between runs

I (forgive me, I should have read closer) just noticed you only have 13 runs? So you automatically have far fewer runs than predictor variables (30). Tree-based and many linear modeling methods typically need more runs than the number of parameters you are trying to estimate, so I'm not surprised you are seeing differing answers. LDA has some dimensionality reduction features. Quite frankly, without JMP Pro your options are limited; if you had it, you could use FDE, partial least squares, or perhaps one of the penalized regression methods. Good luck.
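To illustrate the penalized-regression idea outside JMP, here is a minimal sketch using scikit-learn's L1-penalized (lasso) multinomial logistic regression as an open-source stand-in for the JMP Pro methods mentioned above. The data, penalty strength, and class coding are all synthetic assumptions; the point is only that with 13 runs and 30 predictors, the L1 penalty shrinks most coefficients to exactly zero, leaving a short list of surviving predictors.

```python
# Sketch (synthetic data): lasso-penalized multinomial model for 13 runs x 30 predictors
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(13, 30))          # 13 runs x 30 process parameters (synthetic)
y = np.array([0, 1, 2] * 4 + [0])      # Low/Med/High coded 0/1/2, ~4 runs per level

X_std = StandardScaler().fit_transform(X)  # standardize so coefficients are comparable
model = LogisticRegression(penalty="l1", solver="saga", C=0.5, max_iter=5000)
model.fit(X_std, y)

# Predictors with any nonzero coefficient survive the shrinkage
surviving = np.where(np.any(model.coef_ != 0, axis=0))[0]
print("surviving predictors:", surviving)
```

With so few runs, the surviving set will be unstable from sample to sample, which is exactly the caveat raised above about having fewer runs than predictors.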


10 REPLIES
ian_jmp
Staff

Re: LDA vs Predictor screening to determine which process parameters describe differences between runs

Very hard to say without the data, or a reasonable sample thereof. Given that 'process parameters vary during a run', what does this variation actually look like within a run, and how many runs do you have data for? How many times does each level of the performance indicator occur?

MarcP
Level III

Re: LDA vs Predictor screening to determine which process parameters describe differences between runs

I have 13 runs. Each level occurs approx 4 times. Flows change depending on required throughput; pressures can change because of ambient conditions or long-term fouling.

Mark_Bailey

Re: LDA vs Predictor screening to determine which process parameters describe differences between runs

So the predictor variables vary over time (during a run)? You might try Functional Data Explorer in JMP Pro to obtain the functional principal components, and then use their scores as predictors instead of the original measurements.
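The idea above can be sketched outside JMP with plain PCA as a rough stand-in for functional principal components: resample each run's parameter trace to a common time grid, then keep a few component scores per run as predictors. The traces and grid size here are synthetic assumptions, not data from the thread.

```python
# Sketch: per-run "functional" scores via PCA on resampled time traces (synthetic data)
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
n_runs, n_times = 13, 50
# One process parameter measured over time for each run (synthetic random-walk traces)
curves = np.cumsum(rng.normal(size=(n_runs, n_times)), axis=1)

pca = PCA(n_components=3)
scores = pca.fit_transform(curves)      # 3 scores per run, usable as LDA predictors
print(scores.shape)                     # one row of scores per run
```

This collapses each within-run curve to a handful of numbers, which is what makes the run-level modeling tractable with only 13 runs.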

P_Bartell
Level VIII

Re: LDA vs Predictor screening to determine which process parameters describe differences between runs

Let me start with one basic question. What are your criteria for '...most reliable answer.'? Are you conducting an investigation where you are seeking an exploratory 'answer' or a predictive 'answer'? The data analytics methods I'd recommend will in large measure be dictated by your answer.

 

MarcP
Level III

Re: LDA vs Predictor screening to determine which process parameters describe differences between runs

I want to identify which process variables explain why a run has low or high performance. The next step is to see if I can keep these variables within a bandwidth (or compensate with other variables) to maintain high-performing runs on a consistent basis. From an engineering perspective, a number of candidate variables have been identified; the analysis is meant to find the most influential ones (like a root cause analysis).

 

I hope this answers your question. I wanted to avoid mislabeling my question as exploratory or predictive.

P_Bartell
Level VIII

Re: LDA vs Predictor screening to determine which process parameters describe differences between runs

Ideally, I recommend the idea @Mark_Bailey shares. Depending on what you observe in the predictor variables, if they are highly correlated with each other, Functional Data Explorer is one path for identifying the most influential variables. An issue you may run into is multicollinearity among the predictor variables; FDE can handle this. But if possible, once you have used FDE for variable identification, nothing beats DOE for confirming A. the variable identification and B. developing a predictive model for the process.

MarcP
Level III

Re: LDA vs Predictor screening to determine which process parameters describe differences between runs

Unfortunately I don't have JMP Pro; I only have JMP. So what would then be the best alternative?

P_Bartell
Level VIII

Re: LDA vs Predictor screening to determine which process parameters describe differences between runs

Here's where I'd start: my experience with processes similar to what you have described is that there may be lots of multicollinearity among the predictor variables. Have you investigated this issue? Your best bet is to start with time series plots of the predictors; you don't have too many, so something as simple as Graph Builder could work. (In the FDE platform, these plots are a standard and very valuable report output.) If lots of multicollinearity is present, you might then use principal components analysis to identify which components contribute the majority of the variability in the predictors. Having only a categorical response limits your modeling options as well.
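As a rough illustration of the multicollinearity check suggested above, here is a sketch that runs PCA on a deliberately correlated synthetic predictor matrix and counts how few components carry most of the variance. The data is invented for the example; with real process data, a small count here would confirm strong collinearity among the 30 parameters.

```python
# Sketch: how many principal components explain 90% of predictor variance (synthetic data)
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
base = rng.normal(size=(13, 5))                                        # 5 hidden drivers
X = base @ rng.normal(size=(5, 30)) + 0.1 * rng.normal(size=(13, 30))  # 30 correlated predictors

pca = PCA().fit(StandardScaler().fit_transform(X))
cum = np.cumsum(pca.explained_variance_ratio_)
n_comp = int(np.searchsorted(cum, 0.90)) + 1
print("components for 90% of variance:", n_comp)
```

If a handful of components dominate, per-variable rankings from any method will be unstable, since collinear variables can trade importance freely.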

 

But my strongest recommendation: if you aren't familiar with this pathway or these methods, find someone who is and who is willing to work through the data and problem with you. There are lots of ways to go, and it's hard to walk through all the different pathways without seeing the actual data and understanding more about the chemical process of interest.

MarcP
Level III

Re: LDA vs Predictor screening to determine which process parameters describe differences between runs

You're correct: there's a significant amount of collinearity. That's why I selected LDA and Predictor Screening. I get 4% misclassification in my LDA and can clearly distinguish between Lows and Highs. Next I looked at the standardized scoring coefficients and ranked them in order of magnitude.

 

Additionally, I ran a Predictor Screening with 1000 trees. When I compare the predictor ranking with the ranking of the standardized scoring coefficients from the LDA, there's hardly any match between high predictor rankings and high standardized scoring coefficients.

Because they are 2 different methods I expected some differences, but not this much.

What could be the reason, and which ranking list would be the most reliable? (How can I check the performance of the Predictor Screening?)
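One way to quantify the disagreement described above, sketched outside JMP with synthetic data: compute the Spearman rank correlation between the |LDA coefficient| ranking and the random-forest importances (a forest being the open-source analogue of Predictor Screening), and check the forest's cross-validated accuracy as a stability gauge. None of the numbers below come from the thread's data.

```python
# Sketch: compare LDA-based and forest-based rankings, and cross-validate (synthetic data)
import numpy as np
from scipy.stats import spearmanr
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(13, 30))
y = np.array([0, 1, 2] * 4 + [0])      # Low/Med/High, ~4 runs per level

lda = LinearDiscriminantAnalysis().fit(X, y)
lda_importance = np.abs(lda.coef_).sum(axis=0)     # crude per-predictor magnitude

rf = RandomForestClassifier(n_estimators=1000, random_state=0).fit(X, y)
rho, _ = spearmanr(lda_importance, rf.feature_importances_)
print("rank agreement (Spearman):", round(float(rho), 2))

# Low cross-validated accuracy on 13 runs warns that neither ranking is stable
acc = cross_val_score(rf, X, y, cv=3).mean()
print("CV accuracy:", round(float(acc), 2))
```

With 13 runs and 30 collinear predictors, a near-zero Spearman rho and poor cross-validated accuracy would both point to the sample-size issue flagged in the accepted solution rather than to one method being "right".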