My name is Zhiwu Liang,
a statistician at Procter & Gamble.
I support the business at the Brussels Innovation Center for P&G.
My main job is consumer survey data analysis.
Today, Narayanan and I will present
Growth Curve Modeling to Measure the Impact
of Temperature and Usage Amount on Detergent Performance.
Next slide, please.
Here are the contents we will cover today.
First, I will give a brief introduction to structural equation models
and a bit about the data we will be using for our modeling.
Then I will turn it over to Narayanan to introduce growth curve modeling
and the model building process, plus the JMP demo.
Finally, I will present the conclusions and next steps.
Next slide, please.
Structural equation modeling
is a multivariate technique
that is used to test a set of relationships
between observed and latent variables
by comparing the model-predicted covariance matrix
with the observed covariance matrix.
In SEM,
the observed, or manifest, variables
serve as indicators for the latent variables,
and this part is constructed using the measurement model.
The latent variables then form a regression network,
which we call the structural model.
Here is an example with three latent variables
and eight observed variables in JMP, showing the SEM structure.
As you can see in the bottom left chart,
the circles represent the latent variables,
which are calculated through their indicators.
For example, the cleaning latent variable is indicated by the filled squares,
which represent the manifest variables:
overall cleaning, stain removal, whiteness, and brightness.
The same goes for the freshness latent variable, indicated by three manifest variables.
If you look at the right side of the window,
the Loadings window shows the structure of the measurement model:
how each latent variable relates to its indicators.
The Regressions window at the bottom shows the two regression models:
cleaning drives overall rating, and freshness drives overall rating.
This is the structure of the structural equation model.
Next slide, please.
The data we use for our growth curve modeling
is survey data from a study we conducted in France with 119 consumers.
We divided these 119 consumers into two groups.
Sixty of them used the control product,
which is Ariel soluble unit dose (SUD), the pods,
marked as 0 in our data set.
The other 59 consumers used the test product, an Ecolabel product, coded as 1.
Each consumer went through 12 weeks of testing:
for the first four weeks, they used their own products.
Then they went into the eight test weeks,
using their assigned product, either Ariel SUD or Ecolabel.
Then, for each wash,
the consumer filled in a questionnaire
providing information about their washing behavior,
such as the washing temperature,
the number of pods used, the soil level of the fabric (how dirty it is),
and an overall rating of the product's performance.
Our modeling objective is to test whether there is an effect
of the product, the washing temperature,
and the number of pods used
on the overall performance rating
for each wash.
Next slide, please.
Since every consumer has different washing habits
and different conditions,
not all consumers have the same number of washes during the test weeks.
Therefore, to give every consumer
equal weight in our model-building data set,
we first aggregate the consumer data to the panelist level on a weekly basis.
We take the average washing temperature
during that week for each consumer,
the average number of pods used,
and the average overall rating across the loads during that week.
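As a sketch of that aggregation step in Python with pandas (the column names and values here are hypothetical, not our actual data set):

```python
import pandas as pd

# Hypothetical wash-level records: one row per wash, per consumer.
washes = pd.DataFrame({
    "panelist":    [1, 1, 1, 2, 2],
    "week":        [9, 9, 10, 9, 9],
    "temperature": [30, 40, 30, 60, 40],
    "pods":        [1, 2, 1, 2, 2],
    "oar":         [80, 70, 75, 60, 65],
})

# Aggregate to one row per panelist per week, so every consumer
# carries equal weight regardless of how many washes they did.
weekly = (washes
          .groupby(["panelist", "week"], as_index=False)
          [["temperature", "pods", "oar"]]
          .mean())
```

For panelist 1 in week 9, for example, the two washes at 30 and 40 degrees average to 35.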
After aggregating the data,
we use an exploratory tool like JMP Graph Builder
to identify whether there is any linear trend in the overall rating,
in the temperature, or in the number of pods used during the test weeks.
Since, at the exploratory stage, OAR is pretty stable
from week 9 to week 12, we use an intercept-only model for OAR.
For the temperature and for the number of pods used,
the exploratory stage
shows either an increasing or a decreasing trend.
Therefore, we use linear growth models to describe the temperature indicator
and the number-of-pods indicator.
To capture the product impact,
we also include the product manufacturer variable in our model.
We first build a growth curve model for temperature and the number of pods,
then use these latent variables to build a regression model from the product variable,
the intercept of temperature, the slope of temperature,
and the intercept and slope of the number of pods used
to the intercept of OAR, giving a multivariate growth curve model.
Now I will turn it over to Narayanan to introduce the latent growth curve model.
Narayanan, it's your turn.
Thank you, Zhiwu, for the great [inaudible 00:06:49].
Hi, everyone. My name is Narayanan.
I am an adjunct professor at the University of Cincinnati,
where I teach courses on data mining using JMP.
I'd like to start by giving a very broad definition
of what is latent growth curve modeling.
As we go along, I may use the letters LGCM to represent
latent growth curve modeling,
and SEM to represent structural equation modeling.
Latent growth curve modeling is basically a way to model
longitudinal data using the SEM framework.
Because it is built in the SEM framework,
it has all the advantages of specifying and testing relationship,
as Zhiwu was explaining with the example of structural equation modeling.
As a side note, I would like to mention that LGCM
is actually an application of confirmatory factor analysis,
which is actually a submodel within structural equation modeling
with the added mean structure,
and this will be explained when we get into JMP.
One of the benefits of using the SEM framework
is that we are able to evaluate model fit.
Let us look at the statement there,
which says, every model implies a covariance matrix and mean structure.
What this really means is that
the observed covariance matrix and the mean vector
can be actually reproduced by the model parameter estimates
which are estimated using the latent growth curve modeling.
The equality between the two is what many of these fit indices
are actually testing.
One of the oldest is the chi-square test,
and the hypothesis it is testing is actually listed there:
the equality between the population
and the model-predicted covariance matrices
and mean vectors.
However, this test, which is one of the oldest,
has some watch-outs.
One is that the test statistic is a function of sample size,
which means that larger sample sizes will tend to reject the model
even for trivial differences.
Another is that the test is global
and does not reflect local fit, such as could be measured by R-square.
Also, the fit being tested is exact fit, as specified in the hypothesis.
We know from the famous Box statement that all models are wrong.
Our models are only approximations.
Because of this, there have been several alternative fit measures
that have been proposed.
I'd like to mention three of them here.
The first is the Root Mean Square Error of Approximation.
This is actually measuring model misfit, adjusting for the sample size,
which was an issue with the chi-square test.
This is actually a badness-of-fit measure, so lower numbers are better.
But one of the advantages of using this fit measure
is that we have a confidence interval for it,
and the suggested threshold for this fit measure
is that the upper bound
of the confidence interval is less than 0.10.
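As a sketch of the idea, the RMSEA point estimate can be computed from the chi-square statistic, its degrees of freedom, and the sample size. (The confidence interval requires the noncentral chi-square distribution and is omitted here; note also that some software uses N rather than N-1 in the denominator.)

```python
import math

def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation (point estimate).

    Measures model misfit per degree of freedom, adjusted for
    sample size; lower is better, and 0 means exact fit.
    """
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))
```

A model whose chi-square is at or below its degrees of freedom gets an RMSEA of exactly 0.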
The next is the Comparative Fit Index and the Non-Normed Fit Index.
These are relative estimates, and they're actually testing
how good is your proposed model
compared to a baseline model,
which is usually a model of no relationship.
This is a goodness-of-fit measure,
and so the suggested criteria here is that these fit measures
cross a threshold of at least 0.95.
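As a sketch of the CFI calculation: it compares the excess chi-square (beyond the degrees of freedom) of the proposed model against that of the baseline no-relationship model.

```python
def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative Fit Index: how much better the proposed model fits
    than a baseline model of no relationships; higher is better,
    with a maximum of 1."""
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, 0.0, d_model)
    return 1.0 - d_model / d_baseline if d_baseline > 0 else 1.0
```

A model with chi-square no larger than its degrees of freedom gets a CFI of 1, the best possible value.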
The last one is the Standardized Root Mean Squared Residual.
This is the square root of the average squared standardized residual
over all the elements in the covariance matrix.
This is a badness-of-fit measure.
Again, we are looking for smaller numbers,
and the suggested threshold here is that this value is less than 0.08.
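As a sketch under the common definition (standardize each residual by the observed standard deviations, then take the root mean square over the unique elements of the covariance matrix; JMP's exact formula may differ in detail):

```python
import numpy as np

def srmr(s_obs, s_model):
    """Standardized Root Mean Squared Residual between the observed
    and model-implied covariance matrices; lower is better."""
    s_obs = np.asarray(s_obs, dtype=float)
    s_model = np.asarray(s_model, dtype=float)
    d = np.sqrt(np.diag(s_obs))                  # observed standard deviations
    resid = (s_obs - s_model) / np.outer(d, d)   # standardized residuals
    idx = np.tril_indices_from(resid)            # unique elements incl. diagonal
    return float(np.sqrt(np.mean(resid[idx] ** 2)))
```

If the model reproduces the observed covariance matrix exactly, SRMR is 0.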
On top of all this, finally, do not forget to check the actual residuals,
the standardized residuals.
What we are looking for here is numbers that are beyond
the minus 2 and plus 2 thresholds.
The idea here is to look at the totality of fit
and not just any one measure.
Having discussed fit measures,
now let us look at the longitudinal process we want to study.
Zhiwu described three different processes.
The first one is the success criterion, as measured by the overall satisfaction rating
from week 9 to week 12.
Then we have two time-varying covariates.
That means these are varying over time.
One is the temperature setting
in which the product was used, from week 5 to week 12,
and then the amount of product used, also from week 5 to week 12.
Then finally, we have an indicator variable
indicating what type of product it is; this is a time-invariant covariate
and doesn't change with time.
The modeling strategy we are going to use:
first, we're going to visualize the data using Graph Builder.
Then we select a univariate latent growth curve model
for each of the processes.
Then we combine all of them
into a multivariate LGCM.
Then we'll finally test the hypothesis that Zhiwu proposed,
which is how the product and the other growth factors
impact overall satisfaction.
We will choose the simplest model as we build.
I am going to get into JMP.
I am running JMP 18, which is an early adopter version,
and I am going to show some scripts,
and I will show you how I got to some of these from the JMP platforms.
The first thing I want to do is visualize the overall satisfaction,
and these are the trajectories.
Each line is basically one individual from week 9 to week 12.
The overall satisfaction is plotted here
for each of the 119 consumers,
one trajectory per consumer.
If you look at this particular consumer, row number 16,
that person's trajectory is on a downward trend
from week 9 through week 12.
They started somewhere in the mid-50s, and by the time they are in week 12,
their satisfaction measure has come down to about 37.5 on a scale of 0-100.
Let us look at another person.
This person here who used the Ariel product,
their trajectory is on an upward swing
going from the mid-70s probably to the early 90s
by the time they reach week 12.
They are getting more and more satisfied
week over week.
Sorry for that.
A bubble popup showed up.
What we want to do is we want to understand
how different consumers
are experiencing satisfaction over the weeks,
and the change in these processes for these consumers
is what we want to model using LGCM.
What I'm going to do is turn on the script,
LGCM of overall satisfaction.
I have built here three different models,
plus a fourth model,
which is a simplification of the first.
In each of them, a latent variable corresponds to an intercept
for the repeated measures of overall satisfaction
from week 9 through week 12.
The first is a no-growth model,
which means different people
have different levels of satisfaction at the beginning, which is week 9,
but their trajectories flatten out and do not grow over time.
The second model is a linear growth model, which means the trajectories do change
in a linear fashion over time.
The third model is a quadratic model, which means the trajectories change
in a quadratic fashion over time.
Then finally, I've got a simplification of the first model,
assuming no change in the variance across time.
I'm going to look at these fit measures
that I talked about
and choose the model that fits the best.
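To make the three specifications concrete, here is a small sketch (not JMP output) of the fixed loading matrices each model places on the four repeated measures, and how the latent means imply a mean trajectory. The latent mean values are made up for illustration.

```python
import numpy as np

t = np.arange(4)  # weeks 9-12 coded as times 0, 1, 2, 3

# Fixed loadings: one column per growth factor.
L_nogrowth  = np.ones((4, 1))                       # intercept only
L_linear    = np.column_stack([np.ones(4), t])      # intercept, slope
L_quadratic = np.column_stack([np.ones(4), t, t**2])  # + quadratic slope

# Model-implied mean trajectory = loadings @ latent means.
# Hypothetical latent means: start at 71, grow 0.5 per week.
alpha = np.array([71.0, 0.5])
mu = L_linear @ alpha  # implied means: 71, 71.5, 72, 72.5
```

The no-growth model drops the slope column, so its implied trajectory is flat; the quadratic model adds the squared-time column.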
What I'm looking for is low values of chi-square,
high values of CFI, which means CFI goes on a scale from 0-1
and low values of RMSEA, which also goes on a scale from 0-1.
It looks like all my models, no-growth, linear growth, and quadratic growth,
fit the data equally well.
However, I'm going to take the simplest of the models
because, if I look at the estimates
in the path diagram,
many of the coefficients relating to the slope,
the linear slope or the quadratic slope,
are actually not significant, as shown by the dotted lines.
In this linear growth model, what we have is an intercept,
which measures the initial level of satisfaction,
and slope, which measures
the rate of increase of the satisfaction over time
or rate of decrease of satisfaction over time.
Slope measures that; intercept measures the initial level.
We can see all the estimates related to the slope
are actually not significant as indicated by dotted lines.
The same is the situation for the quadratic model also.
Therefore, I'm going to take the simplest of the models,
which is the no-growth model, for this process,
which is overall satisfaction.
Let me show you how I do this.
In JMP, go under Analyze, pick Multivariate Methods, and choose
the Structural Equation Models platform.
Choose the repeated measures, in this case OAR from week 9
through week 12.
Drop them in the Model Variables box and click OK.
We have got these four repeated measures
available as modeling variables in the path diagram area.
I can build this model from scratch using the path diagram,
but JMP has made it easier by using shortcuts.
I'm going to go under the Model Shortcuts red triangle,
choose Longitudinal Analysis, and check the linear latent growth curve
or the intercept-only model.
If I choose the intercept-only model,
I get this path diagram which you saw in my script.
If I run the model,
you will get the estimates and the fit statistic for this model.
If you want to add the linear growth model,
you do the same thing: come under Model Shortcuts,
Longitudinal Analysis, and Linear Growth Curve Model.
Now we have got not only an initial level
as represented by the intercept latent variable,
we've got the rate of growth of this process
as represented by the slope latent variable.
We can run this model.
Click on Run,
and you get the model estimates, as I showed you before,
which are not significant for the slope latent variable.
You get the fit statistics right here under the Model Comparison table.
These models are easy to fit in JMP
using the shortcuts available under the Model Shortcuts menu.
I'm going to close the one I just created.
We have so far built
a univariate LGCM for a single process.
I'm going to repeat the same thing for the other two growth processes we have,
and we're going to look at the wash temperature trajectories.
Let me show you how to do this in JMP.
In JMP, under the Graph menu, click on Graph Builder and open up the temperature variables.
We want to look at temperature from week 5 through week 12.
Drop them on the x-axis.
For the type of graph you want,
choose the last icon in the bar at the top.
This is a parallel plot.
There will be some smoothing associated with this.
Drag the smoothing slider all the way to the left
so there is no smoothing at all.
Take the product variable, which is an indicator variable,
put them on Overlay.
Now you get individual trajectories.
If you want to add the average trajectory,
choose the sixth icon from the left on the toolbar.
Hold the Shift key and click on it.
Now you get the average trajectory
of temperature used over these eight weeks.
Click on Done
to get the plot with more real estate.
This is exactly the plot that I showed using the script.
You can clearly see that
from week 7 onwards, there might be a growth
in the temperature setting.
It looks like people are increasing the temperature
as time progresses from week 7 through week 12.
I'm going to close this.
We have a graph to visualize
the trajectories of the temperature setting.
We repeat the same thing.
We want to choose a model for that process.
As before, I built the same three models:
a no-growth, a linear growth, and a quadratic growth.
I'm going to look at the fit statistic here.
This time, we definitely see a significant improvement
in going from the no-growth to the linear growth model
in terms of the fit statistics.
The quadratic growth is only a marginal improvement over the linear growth model.
Again, for the same reason as before,
all the estimates in the quadratic slope are actually not significant.
To keep things simple, I'm going to choose the simpler model,
which is the linear growth for temperature.
The last process is the pod usage.
This is the number of pods.
Now we can see clearly an increasing trend,
more so for the Ecolabel product,
which means people are using more and more products
when they use Ecolabel as compared to Ariel,
which is a P&G product.
I want to model this.
Let me close that.
Click on the script for LGCM of pod usage.
I'm going to look at the fit statistic.
Again, I see a good model fit,
especially the linear and the quadratic.
For the same reason as before, I'm going to choose the linear model.
Here I want to look at
the estimates for the quadratic slope,
and this is what I mean by not choosing the quadratic model:
all the parameters pointing to it are not significant.
Now we have got a model for each of the three processes.
We chose a no-growth model for overall satisfaction,
and we chose linear growth models for wash temperature and pod usage.
Now I'm going to put them all together
using a multivariate, latent growth curve model.
This is basically all the three processes put together.
Here, I want to show you the similarity
between a confirmatory factor analysis model
and latent growth curve model as was pointed out in the previous slide.
You can see that there is a mean structure added to it
with a triangle with a number one,
and there are lines going from that to each of the latent variables.
If I right-click and use the Show option to hide the mean structure,
you can see the familiar confirmatory factor analysis model
with latent variables and the indicators associated with each one of them.
We have a single latent variable, intercept for the overall satisfaction.
We have two latent variables for the temperature,
which is initial intercept and the slope.
We have the same two latent variables indicating the pod usage.
The initial level is represented by int pods,
and the rate of change of pod usage is indicated by slp pods,
which is basically the slope of pods.
Let me put the means back on.
Now we can actually look at the estimates of these,
which are really one of the important parts
of the latent growth curve model.
What we have here is an estimate
of the initial level of satisfaction at week 9
because that was the starting time period for overall satisfaction.
That's about 71 on a scale of 0-100.
This is the average temperature setting at week 9,
which is 36 degrees Celsius.
Here is the product usage, 1.4 pouches.
Here is the rate of change of product usage,
because there is a slope-of-product-usage latent variable,
which is about 0.02.
People are using slightly more as time goes on.
That is what we get.
The overall fit of this model is also fairly good.
I think we saw that.
CFI exactly at the threshold 0.95,
and our upper bound of the RMSEA is definitely less than 0.1.
Now we go to the last model,
which is the hypothesis that Zhiwu wanted to test,
where we want to see if product,
the indicator variable, and the other growth factors
have a significant impact on overall satisfaction.
In order to remove the clutter,
I have not shown all the indicators.
All we are seeing is only the circles,
which represent the latent factors for each of the growth curve models
and a single product variable indicating what type of product it is.
Again, let us look at the fit of this model.
Fit of this model is indeed good.
We have a 0.95 for the CFI.
We have less than 0.1 for the upper bound of the RMSEA.
We will look at more fit indices
after we interpret some of the estimates here.
I'm going to interpret the solid lines which are significant coefficients.
We have a significant product effect
from the product variable to the intercept of overall satisfaction.
This can be interpreted basically as a regression coefficient,
which is the average level of satisfaction
for product coded 1
minus the average level of satisfaction for product coded 0.
Ariel is coded as product 0,
so we have much more satisfaction with Ariel:
a delta of negative 9, in favor of Ariel,
on a scale of 0-100.
That is a big change,
a big delta in favor of the Ariel product.
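That interpretation can be sketched numerically: with a 0/1 product code, the regression slope on the dummy equals the difference in group means. The satisfaction numbers below are invented for illustration, chosen so the delta comes out to minus 9.

```python
import numpy as np

# Hypothetical panelist-level data: product code (0 = Ariel, 1 = Ecolabel)
# and overall satisfaction rating on a 0-100 scale.
product = np.array([0, 0, 0, 1, 1, 1])
oar     = np.array([80.0, 75.0, 70.0, 68.0, 66.0, 64.0])

# Slope of the simple regression of oar on the 0/1 dummy...
X = np.column_stack([np.ones_like(product), product])
beta = np.linalg.lstsq(X, oar, rcond=None)[0][1]

# ...equals mean(product coded 1) minus mean(product coded 0).
delta = oar[product == 1].mean() - oar[product == 0].mean()  # -9.0
```

A negative delta means the group coded 0 (Ariel) has the higher average satisfaction.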
Let us look at the product effect on pods.
Again, in the same way: the average amount of product used
for product coded 1 minus product coded 0.
This time, more of the Ecolabel product is being used.
If you are the manufacturer of Ariel, this is good news for you.
Also, the rate of change of product use
is higher for Ecolabel compared to Ariel,
by 0.02 pouches from week to week.
Finally, we have the intercept of temperature
having a negative impact on the overall satisfaction,
which means higher temperatures lead to less satisfaction.
Remember, these are products which are marketed as cold-wash products.
That means they should work better in cold temperatures
and not higher temperatures.
I also want to show you where you can look for other fit statistics beyond
what is coming out in the model comparison table.
Under the Structural Equation Models red triangle,
if you check on Fit Indices, which I've already checked,
there are more fit indices that can be shown at the bottom.
We want to look at CFI and RMSEA, which we've already seen,
and here is the Standardized Root Mean Squared Residual,
which I discussed.
This is also exactly at the threshold of 0.08.
All in all, in terms of fit indices, our model does fit quite well.
Finally,
I told you not to forget the residuals.
These are normalized residuals for the measured variables.
We have 21 measured variables: eight for pods, eight for temperature,
four for overall satisfaction, and one for the product variable.
This is a 21 by 21 matrix.
What we are looking for
is numbers that are outside the plus 2, minus 2 range.
There are just too many numbers to look at in the table,
but JMP produces a heatmap.
Heatmap option is also under the red triangle.
What we are looking for is dark red or dark blue.
Here, we have two dark reds,
which are the relationships between pod usage at week 6
and temperature at week 12,
and between pod usage at week 6 and temperature at week 9.
Finally, we have one more,
but it is just a mirror image of the one that is here.
This is the relationship between temperature at week 9
and temperature at week 10, which is not modeled.
This could actually be modeled
by adding an error covariance, which I did not do.
If I did this, the model, in fact, would be even better.
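The residual screening can be sketched as a simple flagging function. The matrix below is a made-up stand-in for JMP's normalized residual matrix, not real output.

```python
import numpy as np

def flag_residuals(resid, threshold=2.0):
    """Return (row, col) pairs in the lower triangle whose normalized
    residual lies outside [-threshold, +threshold]."""
    resid = np.asarray(resid, dtype=float)
    rows, cols = np.tril_indices_from(resid, k=-1)  # below the diagonal
    mask = np.abs(resid[rows, cols]) > threshold
    return [(int(r), int(c)) for r, c in zip(rows[mask], cols[mask])]

# Tiny 3x3 stand-in for the 21x21 normalized residual matrix.
resid = np.array([[0.0, 0.5, 2.5],
                  [0.5, 0.0, -1.0],
                  [2.5, -1.0, 0.0]])
flagged = flag_residuals(resid)  # [(2, 0)]
```

Only the lower triangle is scanned, since the matrix is symmetric and the upper triangle is a mirror image.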
I want to go back to the presentation and summarize what we have found.
Oops, sorry, wrong slide.
In terms of conclusions, we started with Graph Builder
to visualize our trajectories,
and we built latent growth curve models using the SEM platform.
We extended from univariate to multivariate models.
All our models, including the last one, had acceptable fit, in fact, good fit.
Product had a significant impact on OAR,
which means Ariel is better than Ecolabel in terms of overall satisfaction,
and a significant impact on the number of pods,
which means less product was used for Ariel compared to Ecolabel,
and the gap also grew from week to week.
The temperature intercept had a negative impact on OAR, which means people prefer
a lower temperature setting to a higher temperature setting.
If you are P&G, the manufacturer, this is good news for you,
because Ariel works better than Ecolabel
in the modeling framework we have used.
I'm going to turn it over to Zhiwu to see what the next steps are
from this model results.
Zhiwu?
Thank you very much.
Thank you, Narayanan, for a very excellent presentation
and a wonderful demo.
As Narayanan mentioned,
the modeling results show the product has a significant impact
on the overall satisfaction with the performance
of the detergent products in our test.
This result gives us the confidence to make a very clear claim:
Ariel products are favorable for the cold wash
and can be used in smaller amounts than normal products.
The modeling also confirms the consumer behavior change:
if you use the Ariel product, you will have more washing loads
go to the cold wash, using less energy and less product.
Also, we plan to conduct a bigger consumer study
to include more covariates in the future modeling stage,
such as additive usage,
the washing cycle of each wash, and the load size per wash.
This is our next step.
Next slide.
Now we would like to take questions if you have any.
Thank you very much for attending the presentation.
We look forward to your questions, probably at the JMP Summit.