
Using the Functional Data Explorer to Combine Online With Offline Data for Statistical Analysis (2020-EU-EPO-344)

Level: Intermediate

Sven Steinbusch, Senior Project Leader Upstream, Lonza AG
Martin Demel, JMP Senior Systems Engineer, SAS

The ability to explore the knowledge behind data will be an important strategic asset in the future. For offline data, statistical analyses are well established and used extensively. The statistical analysis of online data and its correlation with the respective offline data, however, is highly complex. The new data analysis tools implemented in JMP Pro provide a way to make online data readily available for statistical analysis.

Using the Functional Data Explorer (FDE) and the Generalised Regression application, we successfully analysed data from a biopharmaceutical process and discovered hidden and unexpected correlations between online and offline data. These new tools give us a better understanding of our processes, which in turn will help us identify options to optimise them.

Comments

Hi Sven, interesting poster!

If I understand correctly, you (1) collect several online spectra, (2) use the Functional Data Explorer to find the eigenfunctions and FPCs, and (3) use generalized regression to build a model with the scores of the spectra on the FPCs as input factors and the product concentration as the response. Why generalized regression and not OLS?

Thanks for the info! Frank

Hi Frank,

For the first part, I'll let @SvenSteinbusch answer. Regarding GenReg vs. OLS, there are two main reasons:

  1. Part of the data came from a DOE, so we used the built-in GenReg functionality in FDE to model the FPCs with the DOE factors as supplementary variables.
  2. GenReg provides an interactive solution path plot that indicates when you start to overfit as you add more and more model effects, so you can do variable selection much more interactively than with Fit Model and OLS. In addition, you can choose from many more advanced methods that are optimized for particular situations and goals. Many of these methods are more robust against outliers, multicollinearity, missing values, distributional issues, censored data and more, and these problems occur more often than is commonly expected or than was considered in the past.
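
To illustrate the idea behind a solution path, here is a minimal sketch in Python (NumPy only) of a generic lasso path computed by coordinate descent. It is not JMP's GenReg implementation; the data and the function names (`soft_threshold`, `lasso_cd`, `lasso_path`) are hypothetical. All coefficients are zero at the largest penalty and enter one by one as the penalty shrinks. In GenReg, the accompanying validation plot is what signals overfitting; this sketch only traces the coefficients.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; returns exactly 0 when |z| <= gamma."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def lasso_cd(X, y, lam, beta, n_iter=1000, tol=1e-10):
    """Coordinate descent for (1/(2n))||y - X b||^2 + lam * ||b||_1.

    Assumes X is standardised and y centred, so no intercept is needed.
    `beta` is the warm start taken from the previous (larger) penalty.
    """
    n, p = X.shape
    beta = beta.copy()
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Partial residual with effect j's current contribution added back
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            max_change = max(max_change, abs(new_bj - beta[j]))
            beta[j] = new_bj
        if max_change < tol:
            break
    return beta


def lasso_path(X, y, n_lambdas=30):
    """Return the penalty grid and the coefficient path (one row per penalty)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    n = len(y)
    # Smallest penalty at which every coefficient is exactly zero
    lam_max = np.max(np.abs(Xs.T @ yc)) / n
    lambdas = lam_max * np.logspace(0, -3, n_lambdas)
    beta = np.zeros(X.shape[1])
    path = []
    for lam in lambdas:
        beta = lasso_cd(Xs, yc, lam, beta)
        path.append(beta)
    return lambdas, np.array(path)


# Hypothetical data: 5 candidate effects, only effects 0 and 2 matter
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=40)

lambdas, path = lasso_path(X, y)
for j in range(X.shape[1]):
    active = np.flatnonzero(np.abs(path[:, j]) > 1e-10)
    entry = lambdas[active[0]] if active.size else None
    print(f"effect {j}: enters at lambda = {entry}")
```

The warm start (reusing the previous solution as the starting point for the next, smaller penalty) is a common design choice that keeps the path cheap to compute.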

To summarize: since you have JMP Pro with the Functional Data Explorer, you should also take advantage of the more advanced modeling methods. You can easily compare these models with the OLS model and see whether it makes a difference. In many situations you will get a more useful model; to be fair, though, in some situations an OLS model is equally useful.
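
One simple way to make such a comparison is leave-one-out cross-validation (one of the validation options in GenReg). The sketch below, in Python with NumPy, compares OLS with ridge regression, one of the penalized methods GenReg offers, on hypothetical data with two nearly collinear inputs. The functions `fit_ridge` and `loo_rmse`, the penalty values and the data are illustrative assumptions, not JMP's implementation; with `alpha=0` the ridge fit reduces to OLS.

```python
import numpy as np


def fit_ridge(X, y, alpha):
    """Ridge regression with an unpenalised intercept; alpha=0 gives OLS."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta


def loo_rmse(X, y, alpha):
    """Root mean squared leave-one-out prediction error."""
    n = len(y)
    errors = []
    for i in range(n):
        keep = np.arange(n) != i          # drop run i and refit on the rest
        b0, b = fit_ridge(X[keep], y[keep], alpha)
        errors.append(y[i] - (b0 + X[i] @ b))
    return float(np.sqrt(np.mean(np.square(errors))))


# Hypothetical data: 12 runs, inputs 0 and 1 almost collinear
rng = np.random.default_rng(42)
n = 12
x0 = rng.normal(size=n)
X = np.column_stack([
    x0,
    x0 + rng.normal(scale=0.05, size=n),
    rng.normal(size=n),
    rng.normal(size=n),
])
y = 2.0 * x0 + 1.0 * X[:, 2] + rng.normal(scale=0.5, size=n)

for alpha in (0.0, 0.1, 1.0, 10.0):
    print(f"alpha = {alpha:5.1f}: LOO RMSE = {loo_rmse(X, y, alpha):.3f}")
```

Whichever model gives the lower out-of-sample error on your own data is the more useful one; on some data sets the OLS row will win, which matches the caveat above.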

Hope this helps a bit. Martin

Hi Martin, thanks for your clarifying answer. I still have a couple of questions:

1. Regarding your first point: did you model the response "Product Concentration" using the FPCs and the DOE factors as input variables?

2. In the presentation I read that you predict the product concentration at several time points. I assume that you collected concentration data at several time intervals, c(t), and created a separate model for the concentration c(t_i) at each time interval t_i?

Thanks for the input! Frank

Hi Frank,

Many thanks for your feedback!

I'll try to answer your questions:

  1. We collected online process data for one parameter from several experiments.
  2. We used FDE to analyse the data, create a functional model and obtain the FPCs for the subsequent analysis.
  3. Finally, the FPCs were linked with the offline product concentration data in order to predict the concentration from the FPCs, and thus from the online process data.
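
The three steps above can be sketched in Python with NumPy as a simplified stand-in, not what FDE does internally: FDE first fits a basis model (for example B-splines or P-splines) to each curve before extracting FPCs, whereas this sketch assumes all runs are sampled on the same time grid and performs FPCA directly with an SVD. The simulated curves, the concentration values and the function names (`fpca`, `fit_on_scores`) are hypothetical, and an ordinary least-squares fit on the scores is used for brevity where GenReg would offer penalized alternatives.

```python
import numpy as np


def fpca(curves, n_components):
    """FPCA on curves sampled on a shared grid (rows = runs, columns = time points)."""
    mean_curve = curves.mean(axis=0)
    centred = curves - mean_curve
    # Right singular vectors are the discretised, orthonormal eigenfunctions
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    eigenfunctions = vt[:n_components]
    # One row of FPC scores per run (grid units, no quadrature weights)
    scores = centred @ eigenfunctions.T
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return mean_curve, eigenfunctions, scores, explained


def fit_on_scores(scores, response):
    """Least-squares model: response ~ intercept + FPC scores."""
    design = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


# Hypothetical online data: 20 runs of one process parameter, 50 time points each
rng = np.random.default_rng(7)
t = np.linspace(0.0, 1.0, 50)
xi = rng.normal(scale=[2.0, 1.0], size=(20, 2))   # true run-to-run variation
curves = (1.0 + t
          + np.outer(xi[:, 0], np.sin(np.pi * t))
          + np.outer(xi[:, 1], np.cos(np.pi * t))
          + rng.normal(scale=0.05, size=(20, 50)))
# Hypothetical offline response driven by the first mode of variation
concentration = 5.0 + 0.8 * xi[:, 0] + rng.normal(scale=0.1, size=20)

mean_curve, eigenfunctions, scores, explained = fpca(curves, n_components=2)
coef = fit_on_scores(scores, concentration)
fitted = coef[0] + scores @ coef[1:]
ss_res = np.sum((concentration - fitted) ** 2)
ss_tot = np.sum((concentration - concentration.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print("variance explained by FPC1, FPC2:", explained.round(3))
print("R^2 of concentration model:", round(r2, 3))
```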

In the first part of the poster (left side), we used FDE to correlate the variation in the online process data with the product concentration.

In the second part (right side), we predicted the product concentration of a running process based on the historical data. The historical data were used to create a prediction model for the product concentration based on the FPCs from the FDE analysis. At different time points, we took the currently available data of the running process, added it to the FDE model, extracted the FPC values of the running process and entered them into the existing prediction model. In this way, we could predict the product concentration at different time points of the running process.
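
The prediction step for a running process can be sketched in Python with NumPy under simplifying assumptions. The mean curve, eigenfunctions and model coefficients below are hypothetical placeholders for what the historical FDE and regression models would provide, and the scores of the running process are estimated by an ordinary least-squares fit of the mean curve plus eigenfunctions to the points observed so far. This is not necessarily how FDE scores new data; with very short data prefixes such a fit can be unstable, and conditional-expectation approaches (as in PACE) are a common, more robust alternative.

```python
import numpy as np


def scores_from_partial_run(observed, mean_curve, eigenfunctions):
    """Least-squares FPC scores from the first len(observed) points of a run."""
    m = len(observed)
    phi = eigenfunctions[:, :m].T   # (m, k): observed part of each eigenfunction
    if np.linalg.matrix_rank(phi) < phi.shape[1]:
        raise ValueError("too few observed points to identify all FPC scores")
    est, *_ = np.linalg.lstsq(phi, observed - mean_curve[:m], rcond=None)
    return est


# Hypothetical historical model on a 50-point grid: mean curve, two orthonormal
# eigenfunctions and regression coefficients (intercept, FPC1, FPC2)
t = np.linspace(0.0, 1.0, 50)
mean_curve = 1.0 + t
q, _ = np.linalg.qr(np.column_stack([np.sin(np.pi * t), np.cos(np.pi * t)]))
eigenfunctions = q.T
coef = np.array([5.0, 0.8, -0.3])

# A running process whose complete trajectory would have scores (1.5, -0.7)
true_scores = np.array([1.5, -0.7])
full_run = mean_curve + true_scores @ eigenfunctions
rng = np.random.default_rng(1)
measured = full_run + rng.normal(scale=0.01, size=t.size)

# Re-estimate scores and predicted concentration as more of the run becomes available
for m in (15, 30, 50):
    est = scores_from_partial_run(measured[:m], mean_curve, eigenfunctions)
    predicted = coef[0] + est @ coef[1:]
    print(f"first {m} points: scores = {est.round(2)}, "
          f"predicted concentration = {predicted:.2f}")
```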

Best wishes

Sven

Sven, thanks for this clear answer, great job! Frank