Today I'm presenting with Chris Gotwalth from JMP.
We're going to talk about how to model data from designed experiments
when the response is a functional curve.
Functional or curve responses occur very often in industry.
Thanks to new developments in JMP,
we can now model and predict functional responses
as a function of key DOE or product design factors
using either functional DOE or curve DOE modeling.
A functional DOE model is purely empirical.
However, a curve DOE model can take into account mechanistic
or expert knowledge on the functional form of the curve responses.
In this presentation, the methods and results of predicting
functional responses using functional DOE and curve DOE modeling will be compared
using case studies from the consumer product industry.
This is the outline of the talk.
We will break the talk into two parts.
In the first part, Chris will talk about
what functional data are, with some examples of functional data,
and then he will give you a fundamental understanding
of functional DOE modeling,
including functional principal component analysis,
as well as curve DOE modeling.
In the second part,
I will use two examples from Procter & Gamble
and compare the results of functional DOE and curve DOE modeling
using these two examples.
The first example is modeling viscosity
over time data from a formulation experiment.
The second example is modeling absorption volume over time data
from a diaper design of experiment.
Then I will finish the talk with a brief summary and conclusion.
Thanks, Fangyi.
Now I'm going to give a quick intro to functional and curve data analysis.
But first I want to point out
that there is a lot of this kind of data out there and JMP really has made
analyzing curve response data as fast, easy and accurate as possible.
If you haven't heard of functional data analysis before,
you have certainly seen it out there.
It's all over the place,
and I'll show you some examples to make that clear.
For example, here are annual home price indices
from 1992 to 2021 for all 50 US states.
Each function has a beginning measurement
followed by a sequence of other measurements
and then a final measurement.
They all have a beginning, a middle and an end.
The functions don't have to all have the same start and endpoints
or measurements at the same times.
In a time series analysis, we are really interested
in predicting forward into the future using data observed from the past.
In a functional data analysis or a curve data analysis,
we are generally more interested
in explaining the variation internal to the functions
than predicting beyond the range of times we've observed.
In product and process improvement in industry,
we are often working on non-financial curves.
I'm going to show you some examples that our customers have shared with us.
Here we see a set of infrared spectra of gasoline samples
used to develop an inexpensive tool to measure octane in gasoline.
The green curves had high octane, and the red ones were low in octane.
The height of the left peak turned out to be critical
for predicting octane level.
Microbial growth curves
are a common type of functional data in the biotech industry.
Today, Fangyi will be demonstrating two methods in JMP
that can be used for analyzing DOEs,
where the response is a set of measurements.
The first method is called functional DOE analysis
and is best for complicated response functions like spectra
when you need the model to really learn the curves from the data, from scratch.
The second is a curve DOE analysis,
which is based on non-linear regression models.
When you can use the curve DOE analysis,
I found that you get more accurate results with it.
But if you can't get the curve DOE analysis to work,
you can always fall back on the functional DOE analysis,
as it's more general than curve DOE.
The critical step in functional data analysis
that will be new to most people
is called functional principal components analysis,
also called FPCA for short.
This is how we decompose the curves into shape components
that describe the typical patterns we see in the curves,
as well as weights that quantify how strongly each individual curve
correlates with those shape components.
It's a kind of dimension reduction and data compression technique
that reduces all the information in the curves
into the most compact representation possible.
To illustrate FPCA, take a look at the set of curves in the plot here.
What do they have in common?
How do they differ from one another?
What I see in common
is a set of peak shapes with one peak per curve,
and the shapes go to zero away from the peak.
They also appear to be symmetric around the center of the peak.
In terms of differences, I see variation in peak heights,
and there are clear horizontal shifts from left to right,
and some curves are also narrower than other ones.
In a functional data analysis,
the first thing we do is find a smoothing model
that approximates the discrete measurements,
converting them into continuous functions.
There's a variety of smoothing models in the Functional Data Explorer, or FDE.
I don't really have a firm rule as to which one is the best in general,
but here are my observations about the most common ones.
Wavelets and splines have different strengths.
Wavelets are new in JMP Pro 17
and are very fast and are generally the best with complicated functions
such as spectra, as long as the X coordinates of the data are on a grid.
On the other hand, there are B-splines and P-splines,
which are slower computationally
but are better for data with irregularly spaced Xs,
and are also often better
when there are only a dozen or fewer measurements per function.
If the data aren't large, I would try both splines and wavelets
and see which one is giving us the best fit
by looking at the graphs.
The main graphs I use to make decisions about smoothing models
are actual by predicted plots
and you want the one that hugs the 45-degree line more closely.
In this case, I would choose the wavelets model on the right
over the spline model on the left,
because those points are tighter around that 45-degree line.
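As a numerical companion to the actual by predicted plot, the tightness of the points around the 45-degree line can be summarized with a root mean square error. The following is a minimal sketch in Python, not JMP output: the function name `rmse` and all of the numbers are made up for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean square vertical distance of the points from the 45-degree line."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical measurements and the fitted values from two smoothing models.
actual = [0.0, 1.0, 4.0, 9.0, 16.0]
spline_fit = [0.5, 1.5, 3.0, 9.5, 15.0]
wavelet_fit = [0.1, 1.1, 3.9, 9.1, 15.9]

scores = {
    "spline": rmse(actual, spline_fit),
    "wavelet": rmse(actual, wavelet_fit),
}
# The smoother with the smaller RMSE hugs the 45-degree line more closely.
best = min(scores, key=scores.get)
print(scores, best)
```

A single number like this is a complement to the plot, not a replacement: the plot also shows whether the misfit is concentrated in one region of the curves.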
Immediately after JMP Pro fits a smoothing model to the data,
it decomposes the signals
into dominant characteristic shapes it found in the data.
In mathematical language, these shapes are called eigenfunctions,
but a better and more approachable name would be to call them shape components.
Here we see that JMP has found
that the overall mean function is a peak shape
and that there are three shape components
that explain 97% of the variation in the data.
The first shape component appears to correspond to a peak height.
I've learned to recognize that the second shape
is a type of left-right peak shift pattern and that the third shape component
is something that would control the peak width.
Remember that these are shapes learned from the data,
not something that I gave JMP outside of the data.
What has happened is that the observed spectra in the data
have been decomposed into an additive combination
of the shape components
with unique weights for each individual curve.
The functional PCA is like reverse engineering the recipe of the curves
in terms of the shape components.
The mean function is the thing that they all have in common.
The shape components are the main ingredients.
And the weights are the amounts of the ingredients
in the individual curves.
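The recipe analogy can be sketched in a few lines of Python. This is a simplified illustration, not JMP's algorithm: it assumes the curves are already sampled on a shared grid, skips the smoothing step, and extracts only the leading shape component by power iteration. The function name `leading_shape_component` and the synthetic peak data are hypothetical.

```python
import math


def leading_shape_component(curves, iterations=50):
    """Return (mean, shape, weights) for curves sampled on a shared grid."""
    n, m = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(m)]
    centered = [[c[j] - mean[j] for j in range(m)] for c in curves]

    # Power iteration on the sample covariance, applied as X^T (X v).
    shape = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        proj = [sum(row[j] * shape[j] for j in range(m)) for row in centered]
        w = [sum(centered[i][j] * proj[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variation left to explain
        shape = [x / norm for x in w]

    # Each curve's weight is its projection onto the shape component.
    weights = [sum(row[j] * shape[j] for j in range(m)) for row in centered]
    return mean, shape, weights


# Synthetic peaks that differ only in height, so one component is enough.
grid = [i / 10 for i in range(21)]
base = [math.exp(-0.5 * ((t - 1.0) / 0.3) ** 2) for t in grid]
heights = [1.0, 1.5, 2.0, 2.5]
curves = [[h * b for b in base] for h in heights]

mean, shape, weights = leading_shape_component(curves)

# Recipe: curve = mean function + weight * shape component.
rebuilt = [[mu + w * s for mu, s in zip(mean, shape)] for w in weights]
max_error = max(
    abs(r - c) for rb, cv in zip(rebuilt, curves) for r, c in zip(rb, cv)
)
print(weights, max_error)
```

Because these synthetic curves vary in only one way, the mean plus a single weighted shape rebuilds each curve essentially exactly. Real data would need several components, each with its own weight per curve.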
The functional DOE analysis is the same mathematically
as extracting the scores or weights
and modeling them in Fit Model with the Generalized Regression platform.
Fortunately, there is a red triangle option
in the Functional Data Explorer that automates the modeling,
linking up the DOE models with the shape functions for you
and presenting you with a profiler
that connects the DOE models with the shape functions.
You can directly see how changing the DOE factors
leads to changes in the predicted curve or spectra.
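The two layers of a functional DOE model can be sketched as follows. This is an illustration under stated assumptions, not JMP's implementation: it uses one DOE factor, one shape component, and ordinary least squares as a stand-in for generalized regression. The names `fit_line` and `predict_curve` and all numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def predict_curve(factor, mean, shape, intercept, slope):
    """Predict the weight from the factor, then rebuild the curve."""
    weight = intercept + slope * factor
    return [mu + weight * s for mu, s in zip(mean, shape)]


# Hypothetical mean function and shape component on a five-point grid.
mean = [0.0, 1.0, 2.0, 1.0, 0.0]
shape = [0.0, 0.5, 1.0, 0.5, 0.0]

# Hypothetical coded factor settings and the FPCA weight of each run.
factor = [-1.0, 0.0, 1.0]
weights = [-2.0, 0.0, 2.0]

intercept, slope = fit_line(factor, weights)
curve = predict_curve(0.5, mean, shape, intercept, slope)
print(curve)
```

Moving the factor changes the predicted weight, which in turn changes the whole predicted curve. That is the link a profiler displays.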
There are many potential applications of functional DOE analysis,
some of which Fangyi will be presenting later in this talk.
There is another approach in JMP called curve DOE modeling.
This answers the same kind of question as functional DOE,
but it is nonlinear regression based rather than spline or wavelet based.
What that means is that if you have a good idea of a nonlinear model,
like a three-parameter logistic model, and if that model fits your data well,
you can get models and results
that generalize better than a functional DOE model,
because the general shape of the curve
doesn't have to be learned from scratch from the data using splines or wavelets.
The idea is that if you can make assumptions about your data
that reduce the modeling effort needed,
your predictions will be more accurate, especially from small data sets.
Curve DOE analysis has a very similar workflow
to a functional DOE analysis,
except that you go through the Fit Curve platform
instead of the Functional Data Explorer,
and instead of choosing wavelets or splines,
you choose a parametric model from the platform.
Just like in a functional DOE analysis,
you want to review the actual by predicted plot
to make sure that your nonlinear model is doing a good job of fitting the data.
A curve DOE analysis is the same as modeling
the nonlinear regression parameters
extracted from the curves using the Generalized Regression platform.
This is the same thing as what's going on with a functional DOE analysis
with the FPCA weights.
Fit Curve automates the modeling and visualization just as FDE does.
Once you know functional DOE analysis,
it's really not very hard at all to learn curve DOE analysis.
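The same two-layer idea can be sketched for curve DOE in Python. This assumes a common three-parameter logistic form, which may be parameterized differently from the one in Fit Curve, and it uses hand-written linear parameter models in place of fitted generalized regression models. All names and coefficients are hypothetical.

```python
import math


def logistic_3p(t, rate, inflection, asymptote):
    """A common three-parameter logistic curve (assumed form)."""
    return asymptote / (1.0 + math.exp(-rate * (t - inflection)))


# Hypothetical parameter models: each maps a coded DOE factor x
# to one of the nonlinear regression parameters.
PARAMETER_MODELS = {
    "rate": lambda x: 1.0 + 0.25 * x,
    "inflection": lambda x: 5.0 - 1.0 * x,
    "asymptote": lambda x: 10.0 + 2.0 * x,
}


def predict_curve(x, times):
    """Predict the parameters from the factor, then evaluate the curve."""
    params = {name: model(x) for name, model in PARAMETER_MODELS.items()}
    return [logistic_3p(t, **params) for t in times]


times = [0.0, 2.5, 5.0, 7.5, 10.0]
low = predict_curve(-1.0, times)
high = predict_curve(1.0, times)
print(low)
print(high)
```

The shape of the curve is fixed by the formula, so only three numbers per curve have to be learned from the data. That is where the potential gain on small data sets comes from.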
Now I'm going to hand it over to Fangyi,
who has some nice examples illustrating functional DOE and curve DOE.
Thanks, Chris.
Next I'm going to talk about two examples from Procter & Gamble.
The first example is viscosity over time curves
collected from a number of historical formulation experiments
for the same type of liquid formulation.
There are six factors we would like to consider for the modeling.
They are all formulation ingredients and we call them factor one to factor six.
The goal of our modeling is to use these formulation factors
to predict or optimize the viscosity over time curve.
The response for the modeling is viscosity over time.
This slide shows you some viscosity over time data.
For the majority of our formulations, the viscosity of the formulations
would increase first with time and then decrease later on.
Next, we're going to perform functional DOE analysis on viscosity over time data.
Before functional DOE analysis,
we need to perform functional principal component analysis
on the curves, smoothed using different methods.
Here, we apply functional principal component analysis
to the curves first using B-splines
and find five functional principal components
that cumulatively explain about 100% of the variation in the curves.
Each curve can be expressed
as the sum of the mean function plus a linear combination
of the five functional principal components,
or eigenfunctions, also called shape functions.
We also apply direct functional principal component analysis to the data,
which finds four functional principal components
that cumulatively explain
about 100% of the variation across the viscosity over time curves.
Each curve will then be expressed as the mean function
plus a linear combination of the four functional principal components.
This slide compares the functional principal component analysis model fit
using two different options.
The first one is using the B-spline option
and the second one is using the direct functional PCA analysis.
As you can see, using the B-spline option, the model fit seems to be smoother
as compared to the model fit using direct functional PCA analysis.
This slide shows you the diagnostic plots,
the observed versus predicted viscosity
from the functional principal component analysis
using two different options.
Using direct functional PCA analysis,
the points are closer to the 45-degree line
as compared to the one using the B-spline option,
indicating that direct functional PCA analysis
fits the viscosity over time data
slightly better than the functional principal component analysis
using the B-spline option.
After performing functional principal component analysis,
there's an option in JMP to perform functional DOE modeling
and get a functional DOE profiler.
Functional DOE modeling
is basically combining the functional principal component analysis
with the model for the functional principal component scores
using formulation factors.
With this profiler we can predict the functional response,
which in our case is the viscosity over time curve, using different formulation factors.
You can select a combination of the formulation factors
and it's able to predict the viscosity over time curve.
This slide shows you the diagnostic plots, the observed versus predicted viscosity
and also the residual plots from the functional DOE modeling.
As you can see, the residuals from the functional DOE modeling
are larger than those from the functional principal component analysis
before the functional DOE modeling.
Our colleagues at Procter & Gamble
actually found that the Gaussian Peak model fits
individual viscosity curves very well.
This Gaussian Peak model has three parameters: A, B, and C.
A indicates the peak value of the viscosity over time curve,
B is the critical point,
which is the time when viscosity reaches its maximum,
and C is the growth rate,
the rate of the viscosity increase during the initial phase.
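The talk does not show the formula, so the following Python sketch assumes the usual Gaussian peak form, a * exp(-0.5 * ((t - b) / c)^2). In that form the parameter called the growth rate acts as a width: a smaller c gives a faster rise to the peak and a faster decline after it. The parameter values below are made up for illustration.

```python
import math


def gaussian_peak(t, a, b, c):
    """Assumed Gaussian peak form: a * exp(-0.5 * ((t - b) / c) ** 2).

    a: peak value, b: time at which the peak occurs, c: width-like rate term.
    """
    return a * math.exp(-0.5 * ((t - b) / c) ** 2)


# Hypothetical parameters for one viscosity over time curve.
a, b, c = 120.0, 10.0, 4.0
times = [0.0, 5.0, 10.0, 15.0, 20.0]
viscosity = [gaussian_peak(t, a, b, c) for t in times]
print(viscosity)
```

The curve rises to its maximum at t = b and then falls symmetrically, which matches the increase-then-decrease pattern described for most of the formulations. It also means the model cannot follow a curve whose rise and decline have different speeds.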
This is the fitting of the viscosity over time curve
using the Gaussian Peak model
using a feature in JMP called Fit Curve.
These are the diagnostic plots
of the viscosity curve fitting using the Gaussian Peak models.
It looks like the model fit is not too bad;
however, the errors seem to be larger than the errors from the fitting
using functional principal component analysis.
After curve fitting using the Gaussian Peak model,
there's an option in JMP to perform curve DOE modeling.
Basically, the curve DOE model is combining
the parametric model for the curves, the Gaussian Peak model,
with the model for the parameters of the Gaussian Peak model,
which expresses each parameter as a function of the formulation factors
using generalized regression models.
Then you get the curve DOE model
and this is a profiler of the curve DOE model.
Using this profiler, you can predict the shape of the curve
by specifying a combination of the formulation factors.
Actually, this profiler is somewhat different
from the functional DOE profiler we got previously.
These are the diagnostic plots from the curve DOE model.
As you can see here, the curve DOE model
does not fit the data well, and it's much worse than the functional DOE model.
These are the curve DOE model fits on the original data.
As you can see, for a number of formulations,
the curve DOE model does not fit the data well.
This is a comparison of the profilers
from the functional DOE model and the curve DOE model.
As you can see, the profilers look quite different.
This compares the diagnostic plots
from the functional DOE model and the curve DOE model.
As you can see, the functional DOE model
fits the data much better than the curve DOE model,
with a smaller root mean square error.
Now I'm going to show you the second example.
This example is from a diaper design of experiment
with four different products, A, B, C, and D,
at three different stations labeled S1, S2, and S3,
so it's a factorial design.
Diaper absorption volume was measured over time
for these four different products at three different stations.
The response is diaper absorption volume over time
and the goal is to understand the difference
in diaper absorption curves across different products and stations.
These are a few examples of diaper absorption volume over time curves
where the fitting lines are smoothing curves.
We performed functional principal component analysis
on the diaper absorption volume over time curves
and this functional principal component analysis
was able to find five functional principal components
that, cumulatively,
explain almost 100% of the variation among the curves.
These are the functional principal component analysis model fits.
As you can see, for almost all the curves,
the fitted curve plateaued after a certain time point.
The functional principal component analysis model fits the curves really well,
as you can see from the diagnostic plots.
We performed functional DOE modeling
on the functional principal component analysis,
and this is the profiler of the functional DOE model.
This model allows us to evaluate the shape of the curve
for different diaper products at different measuring stations.
The product comparison at station two seems to be different
from the product comparisons at station one and station three.
These are the diagnostic plots of the functional DOE model.
Next, we would like to perform curve DOE modeling.
Before curve DOE modeling,
we would like to find some parametric model
that fits the diaper absorption volume over time data well.
I found that there's a model in JMP called the biexponential 4P model.
This model is a mixture of two exponential models
with four unknown parameters.
This model fits all the diaper absorption volume over time curves really well.
These are the diagnostic plots of the curve fitting and you can see
that the biexponential 4P model fits all the curves really well.
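The formula is not shown in the talk. The Python sketch below assumes the common sum-of-two-exponentials form, a * exp(-b * t) + c * exp(-d * t), with two scale parameters and two rate parameters. The parameter values are made up so that the curve rises quickly at first and then more slowly; they are not fitted values from the diaper data.

```python
import math


def biexponential_4p(t, a, b, c, d):
    """Assumed form: a * exp(-b * t) + c * exp(-d * t)."""
    return a * math.exp(-b * t) + c * math.exp(-d * t)


# Hypothetical parameters: a fast term that dies out plus a slow term.
params = {"a": -150.0, "b": 0.8, "c": 150.0, "d": -0.02}

times = [float(t) for t in range(0, 31, 5)]
volume = [biexponential_4p(t, **params) for t in times]
print(volume)
```

With these made-up values the fast term is gone after the first few time units, while the slow term keeps the curve climbing. A parametric form like this can therefore keep increasing at later time points rather than leveling off.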
After fitting diaper absorption volume over time curves
using the biexponential 4P model, we performed curve DOE modeling using JMP,
and this is a profiler of the curve DOE model.
Using this profiler, you are able to see the shape of the curve
as a function of diaper product as well as measuring station.
This is a profiler of product A at station two and then station three.
These are the diagnostic plots of the curve DOE model
and you can see that the curve DOE model fits the data well,
except that at higher diaper absorption volume,
the residuals are getting larger.
These are the curve DOE model fits on the original data.
As you can see, for most of the curves,
this model fits the data really well.
This compares the model profilers
of the functional DOE model versus the curve DOE model.
As you may notice, there's some difference
between these two profilers at the later time points.
The predicted diaper absorption volume at the later time points
tends to plateau in the functional DOE model,
but it continues to increase at the later time points
using the curve DOE model.
This compares the diagnostic plots from the functional DOE model
versus the curve DOE model using the biexponential 4P model.
As you can see, both of these models fit the data really well,
with functional DOE being slightly better,
with a slightly smaller root mean square error.
Now, you have seen the comparison of functional DOE modeling
versus curve DOE modeling using two P&G examples
and this is our summary and conclusions.
Functional DOE modeling is always a good choice.
When the parametric model fits all the curve data well,
curve DOE modeling may perform really well.
However, if the parametric model does not fit the curve data well,
then the curve DOE modeling may perform poorly.
The functional DOE model is purely empirical.
However, the curve DOE model
may take into account mechanistic understanding
or expert knowledge in the modeling, so it can be hybrid.
It's good to try different methods, like different smoothing methods,
before functional principal component analysis.
In modeling functional responses,
try the functional DOE model versus the curve DOE model
and see which one performs best.
This is the end of our presentation.
Thank you all for your attention.