Level: Intermediate
Jeremy Ash, JMP Analytics Software Tester, SAS
The Model Driven Multivariate Control Chart (MDMCC) platform enables users to build control charts based on PCA or PLS models. These can be used for fault detection and diagnosis of high dimensional data sets. We demonstrate MDMCC monitoring of a PLS model using the simulation of a real-world industrial chemical process: the Tennessee Eastman Process. During the simulation, quality and process variables are measured as a chemical reactor produces liquid products from gaseous reactants. We demonstrate how MDMCC can perform online monitoring by connecting JMP to an external database. Measuring product quality variables often involves a time delay before measurements are available which can delay fault detection substantially. When MDMCC monitors a PLS model, the variation of product quality variables is monitored as a function of process variables. Since process variables are often more readily available, this can aid in the early detection of faults. We also demonstrate fault diagnosis in an offline setting. This often involves switching between multivariate control charts, univariate control charts and diagnostic plots. MDMCC provides a user-friendly interface to move between these plots.
Auto-generated transcript...
Hello, I'm Jeremy Ash. I'm a
statistician in JMP R&D. My job
primarily consists of testing
the multivariate statistics
platforms in JMP, but I also
help research and evaluate
methodology. Today I'm going to
be analyzing the Tennessee
Eastman process using some
statistical process control
methods in JMP. I'm going to
be paying particular attention
to the model driven multivariate
control chart platform, which is
a new addition to JMP 15.
These data provide an opportunity to showcase a number of the platform's features. And just as a quick
disclaimer, this is similar to
my Discovery Americas talk. We
realized that Europe hadn't seen a
model driven multivariate
control chart talk due to all the
craziness around COVID, so I
decided to focus on the basics.
But there is some new material
at the end of the talk. I'll
briefly cover a few additional
example analyses that I've put on the Community page for the talk.
First, I'll assume some knowledge
of statistical process control
in this talk. The main thing it
would be helpful to know about
is control charts. If you're
not familiar with these, these
are charts used to monitor
complex industrial systems to
determine when they deviate
from normal operating
conditions.
I'm not gonna have much time to
go into the methodology of model
driven multivariate control
chart, so I'll refer to these other
great talks that are freely
available on the JMP Community
if you want more details. I
should also say that Jianfeng
Ding was the primary
developer of the model driven
multivariate control
chart in collaboration with
Chris Gotwalt and that Tonya
Mauldin and I were testers. The
focus of this talk will be using
multivariate control charts to
monitor a realistic chemical process; another novel
aspect will be using control
charts for online process
monitoring. This means we'll be
monitoring data continuously as
it's added to a database and
detecting faults in real time.
So I'm going to start off with
the obligatory slide on the
advantages of multivariate
control charts. So why not use
univariate control charts? There
are a number of excellent
options in JMP. Univariate
control charts are excellent
tools for analyzing a few
variables at a time. However,
quality control data are often
high dimensional and the number
of control charts you need to
look at can quickly become
overwhelming. Multivariate
control charts can summarize a
high dimensional process in
just a couple of control charts,
so that's a key advantage.
But that's not to say that
univariate control charts aren't
useful in this setting. You'll
see throughout the talk that
fault diagnosis often involves
switching between multivariate
and univariate charts.
Multivariate control charts give
you a sense of the overall
health of the process, while
univariate charts allow you to
monitor specific aspects of the
process. So the information is
complementary. One of the goals of model driven multivariate control chart is to provide some
useful tools for switching
between these two types of
charts. One disadvantage of
univariate charts is that
observations can appear to be in
control when they're actually
out of control in the multivariate
sense and these plots show what I
mean by this. The univariate control charts for oil and density show the two observations in red as in control. However, oil and density are highly correlated and both observations are out of control in the multivariate sense, especially observation 51, which clearly violates the correlation structure of the two variables,
so multivariate control charts
can pick up on these types of
outliers, while univariate
control charts can't.
Model driven multivariate
control chart uses projection
methods to construct the charts.
I'm going to start by explaining PCA
because it's easy to build up
from there. PCA reduces the
dimensionality of the process by
projecting data onto a low
dimensional surface. Um,
this is shown in the picture
on the right. We have P
process variables and N
observations, and
the loading vectors in the P
matrix give the coefficients for
linear combinations of our X
variables that result in
score variables with
dimension A, where the dimension
A is much less than P. And then
this is shown in the equations on the left here. The X matrix can be predicted as a function of the scores and loadings, where E is
the prediction error.
These scores are selected to
minimize the prediction error,
and another way to think about
this is that you're maximizing
the amount of variance explained
in the X matrix.
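In symbols, the decomposition just described is the standard PCA model; this is a generic statement of the method with the dimensions defined above, not a formula taken from the slides:

\[ X = T P^{\top} + E \]

where X is the N x P data matrix, T is the N x A matrix of score variables, P is the P x A loading matrix (the symbol P is reused for both the loading matrix and the number of variables, following the slides), and E is the matrix of prediction errors. The scores are chosen to maximize the variance of X explained by the A dimensions.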
Then PLS is a more suitable projection method when you have a set of process variables and a set of quality variables. You really want to ensure that the quality variables are kept in control, but these variables are often expensive or time consuming to collect. The plant could be making product with out-of-control quality for a long time before a fault is detected.
So PLS models allow you to
monitor your quality variables
as a function of your process
variables and you can see that
the PLS models find the score
variables that maximize the
amount of variation explained of
the quality variables.
These process variables are
often cheaper or more readily
available, so PLS can enable you
to detect faults in quality
early and make your process
monitoring cheaper. And from here
on out I'm going to focus on PLS
models because it's more
appropriate for the example.
So the PLS model partitions your
data into two components. The
first component is the model
component. This gives the
predicted values of your process
variables. Another way to think
about it is that your data has
been projected into the model
plane defined by your score
variables and T squared monitors
the variation of your data
within this model plane.
And the second component is the
error component. This is the
distance between your original
data and the predicted data and
squared prediction error (SPE)
charts monitor this variation.
Another alternative metric we
provide is the distance to model
X plane or DModX. This is just
a normalized alternative to SPE
that some people prefer.
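To make the two components concrete, here are the usual definitions of the two statistics for an observation with scores in the model plane and a predicted value from the model; these are the standard textbook formulas, and the platform's documentation has the exact details:

\[ T^2_i = \sum_{a=1}^{A} \frac{t_{ia}^2}{s_{t_a}^2}, \qquad \mathrm{SPE}_i = \lVert x_i - \hat{x}_i \rVert^2 = \sum_{j=1}^{P} (x_{ij} - \hat{x}_{ij})^2 \]

where t_{ia} is the a-th score of observation i and s_{t_a}^2 is the variance of that score in the historical data. DModX is essentially the residual standard deviation of an observation rescaled by the pooled residual standard deviation of the historical data, which is why some people prefer it to SPE.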
The last concept that's
important to understand for the
demo is the distinction between
historical and current data.
Historical data are typically
collected when the process was
known to be in control. These
data are used to build the PLS
model and define the normal
process variation so that a
control limit can be obtained.
And current data are assigned
scores based on the model but
are independent of the model.
Another way to think about this
is that we have training and
test sets. The T squared control limit is lower for the training data because we expect less variability for the observations used to train the model, whereas there's greater variability in T squared when the model generalizes to a test set. Fortunately, the theory for the variance of T squared has been worked out, so we can get these control limits based on some distributional assumptions.
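For reference, the commonly used limits under a multivariate normality assumption are a scaled beta quantile for the historical (training) observations and a scaled F quantile for the current (test) observations; these are the standard textbook formulas, and the platform's limits may differ in detail:

\[ \mathrm{UCL}_{\text{hist}} = \frac{(n-1)^2}{n}\, B_{1-\alpha}\!\left(\tfrac{A}{2}, \tfrac{n-A-1}{2}\right), \qquad \mathrm{UCL}_{\text{curr}} = \frac{A\,(n+1)(n-1)}{n\,(n-A)}\, F_{1-\alpha}(A,\, n-A) \]

where n is the number of historical observations and A is the number of score dimensions. The current-data limit is the larger one, which is the training/test asymmetry just described.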
In the demo we'll be monitoring
the Tennessee Eastman process.
I'm going to present a short
introduction to these data. This
is a simulation of a chemical
process developed by Downs and
Vogel, two chemists at Eastman
Chemical. It was originally
written in Fortran, but there
are wrappers for Matlab and
Python now. I just wanted to note
that while this data set was
generated in the '90s, it's still
one of the primary data sets
used to benchmark multivariate
control methods in the
literature. It covers the
main tasks of multivariate
control well and there is
an impressive amount of
realism in the simulation.
And the simulation is based on
an industrial process that's
still relevant today.
So the data were manipulated
to protect proprietary
information. The simulated
process is the production of
two liquid products from
gaseous reactants within a
chemical plant. And F here is
a byproduct
that will need to be siphoned
off from the desired product.
Um and...
That's about all I'll say about that.
So the process diagram looks
complicated, but it really isn't
that bad, so I'll walk you
through it. Gaseous
reactants A, D, and E flow into
the reactor here.
The reaction occurs and the
product leaves as a gas. It's
then cooled and condensed into
liquid in the condenser.
Then a vapor liquid separator
recycles any remaining vapor and
sends it back to the reactor
through a compressor, and the
byproduct and inert chemical B
are purged in the purge stream,
and that's to prevent any
accumulation. The liquid product
is pumped through a stripper,
where the remaining reactants
are stripped off.
And then sent back to the reactor.
And then finally, the
purified liquid product
exits the process.
The first set of variables being
monitored are the manipulated
variables. These look like bow
ties in the diagram. I think they're actually meant to be valves, and the manipulated variables mostly control the flow rate through different streams of the process.
And these variables can be set
to any values within limits and
have some Gaussian noise.
The manipulated variables can be sampled at any rate, but we use the default 3 minute sampling interval here.
Some examples of the manipulated
variables are the valves that
control the flow of reactants
into the reactor.
Another example is a valve
that controls the flow of
steam into the stripper.
And another is a valve that
controls the flow of coolant
into the reactor.
The next set of variables are
measurement variables. These are
shown as circles in the diagram.
They were also sampled at three
minute intervals. The
difference between manipulated
variables and measurement
variables is that the
measurement variables can't be
manipulated in the simulation.
Our quality variables will be
the percent composition of
two liquid products and you
can see the analyzer
measuring the products here.
These variables are sampled with
a considerable time delay, so
we're looking at the purge
stream instead of the exit
stream, because these data are
available earlier. And we'll use a PLS model to monitor process variables as a proxy for these variables, because the process variables have less delay and a faster sampling rate.
So that should be enough
background on the data. In
total there are 33 process
variables and two quality
variables. The process of
collecting the variables is
simulated with a set of
differential equations. And this
is just a simulation, but as you
can see a considerable amount of
care went into modeling this
after a real world process. Here
is an overview of the demo I'm
about to show you. We will collect
data on our process and store
these data in a database.
I wanted to have an example that
was easy to share, so I'll be
using a SQLite database, but
the workflow is relevant to most
types of databases since most
support ODBC connections.
Once JMP forms an ODBC
connection with the database,
JMP can periodically check for
new observations and add them to
a data table.
If we have a model driven
multivariate control chart
report open with automatic
recalc turned on, we have a
mechanism for updating the
control charts as new data come
in and the whole process of
adding data to a database would
likely be going on a separate
computer from the computer
that's doing the monitoring. So
I have two sessions of JMP open
to emulate this. Both sessions
have their own journal
in the materials on the
Community, and the session
adding new simulated data to
the database will be called
the Streaming Session, and the session updating the reports
as new data come in will be
called the Monitoring Session.
One thing I really liked about
the Downs and Vogel paper was
that they didn't provide a
single metric to evaluate the
control of the process. I have
a quote from the paper here
"We felt that the tradeoffs
among the possible control
strategies and techniques
involved much more than a
mathematical expression."
So here are some of the goals
they listed in their paper,
which are relevant to our
problem. They wanted to maintain
the process variables at
desired values. They wanted to
minimize variability of product
quality during disturbances, and
they wanted to recover quickly
and smoothly from disturbances.
So we'll see how well our
process achieves these goals
with our monitoring methods.
So to start off in the
Monitoring Session journal, I'll
show you our first data set.
The data table contains all of the variables I introduced earlier. The first variables are the measurement variables; the second are the composition variables.
And the third are the
manipulated variables.
The script up here will fit
a PLS model. It excludes the
last 100 rows as a test set.
Just as a reminder,
the model is predicting 2
product composition
variables as a function of
the process variables. If
you have JMP Pro, there
have been some speed
improvements to PLS
in JMP 16.
PLS now has a
fast SVD option.
You can switch to the classic method in the red triangle menu. There have also been a number of performance improvements under the hood, mostly relevant for data sets with a large number of observations, but that's common in the multivariate process monitoring setting.
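As a rough idea of what a script like that looks like, here is a minimal JSL sketch for fitting the PLS model with the last 100 rows held out. The table and column names are placeholders, and the launch arguments may differ slightly across JMP versions; the reliable way to get an exact script is to fit the model interactively and save the script from the report.

// Minimal JSL sketch (JMP Pro assumed); table and column names are placeholders.
dt = Data Table( "TEP Historical" );

// Exclude the last 100 rows so they serve as a test set.
dt << Clear Row States;
dt << Select Where( Row() > N Rows( dt ) - 100 );
dt << Exclude;        // toggles the Excluded row state for the selected rows
dt << Clear Select;

// Fit PLS: the two product composition (Y) variables as a function of the process (X) variables.
pls = dt << Partial Least Squares(
	Y( :Composition G, :Composition H ),
	X( :Process 1, :Process 2, :Process 3 )   // ...and so on for all 33 process variables
);

// From the fit's red triangle menu, save the score formula columns to the data table
// so model driven multivariate control chart can use them.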
But PLS is not the focus of the
talk, so I've already fit the
model and output score columns
and you can see them here.
One reason that model driven multivariate control chart was designed the way it is: imagine you're a statistician and you want to share your model with an engineer so they can construct control charts. All
you need to do is provide the
data table with these formula
columns. You don't need to share
all the gory details of how you
fit your model.
Next, I'll provide the score columns to model driven multivariate control chart. I'll drag the window to the right here.
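For reference, launching the platform from JSL on the saved score columns looks roughly like this; the score column names below are placeholders for whatever names the PLS fit saved to the table.

// Minimal sketch: launch model driven multivariate control chart on the saved PLS score formula columns.
dt = Data Table( "TEP Monitoring" );
mdmcc = dt << Model Driven Multivariate Control Chart(
	Process( :X Score 1, :X Score 2, :X Score 3 )   // placeholder names for the saved score columns
);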
So on the left here you can see two types of control charts: the T squared and SPE charts.
Um, there are 860 observations
that were used to estimate the
model and these are labeled as
historical. And then the hundred
that were left out as a test set
are your current data.
And you can see in the limit
summaries, the number of points
that are out of control and the
significance level. Um, if you
want to change the significance
level, you can do it up here in
the red triangle menu.
Because the reactor's in normal
operating conditions, we expect
no observations to be out of
control, but we have a few false
positives here because we
haven't made any adjustments for
multiple comparisons. It's
uncommon to do this, as far as I
can tell, in multivariate
control charts. I suppose you
have higher power to detect out
of control signals without a
correction. In control chart lingo, this means your out-of-control average run length is kept low.
So on the right here we
also have contribution
plots and on the Y axis are
the observations; on the X
axis, the variables. A
contribution is expressed as a proportion.
And then at the bottom here,
we have score plots. Right
now I'm plotting the first
score dimension versus the
second score dimension, but
you can look at any
combination of score
dimensions using these
dropdown menus or the arrow
button.
OK, so I think we're oriented
to the report. I'm going to
now switch over to the
scripts I've used to stream
data into the database that
the report is monitoring.
In order to do anything for this
example, you'll need to have a
SQLite ODBC driver installed
on your computer. This is much easier
to do on a Windows computer,
which is what you're often using
when actually connecting to a
database. The process on the Mac
is more involved, but I put some
instructions on the Community
page. And then I don't have time
to talk about this, but I
created the SQLite database
I'll be using in JMP and I
plan to put some instructions on how to do this on the Community Web page. And hopefully
that example is helpful to you
if you're trying to do this with
data on your own.
Next I'm going to show
you the files that I put
in the SQLite database.
Here I have the historical data.
This was used to construct
the PLS model. There are 960
observations that are in
control. Then I have the
monitoring data, which at first
just contains the historical
data, but I'll gradually add new
data to this. This is the data
that the multivariate control
chart will be monitoring.
And then I've simulated new
data already and added it to the
data table here. These are
another 960 odd measurements
where a fault is introduced at
some time point. I wanted to
have something that was easy to
share, so I'm not going to run
my simulation script and add to
the database that way. We're
just going to take observations
from this new data table and
move them over to the monitoring
data table using some JSL and
SQL statements. This is just an example emulating the process of new data coming into a database. In practice you might not actually do this with JMP, but this was an opportunity to show how you can do it with JSL.
Clean up here.
And next I'll show you this
streaming script. This is a
simple script, so I'm going to
walk you through it real quick.
This first set of
commands will open the
new data table and
it's in the SQLite database,
so it opens the table in the
background so I don't have to
deal with the window.
Then I'm going to take pieces
from this data table and add
them to the monitoring data
table. I call the pieces
bites and the bite size is 20.
And then this next command will
connect to the database. This
will allow me to send the
database SQL statements.
And then this next bit
of code is
iteratively sending SQL
statements that insert new
data into the monitoring data.
And I'm going to
initialize K and show you the
first iteration of this.
This is a simple SQL INSERT INTO statement that inserts the first 20 observations into the data table. This print statement is
commented out so that the code
runs faster and then I also
have a wait statement to slow
things down slightly so that
we can see their progression
in the control chart.
And this would just go too fast
if I didn't slow it down.
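In outline, the streaming script does something like the sketch below. The DSN, table names, and pacing are placeholders for the ones in the journal; the important pieces are Create Database Connection, the INSERT INTO statements sent with Execute SQL, and the Wait call that slows the loop down.

// Rough sketch of the streaming script; the DSN and table names are placeholders.
newDt = Open Database( "DSN=TEP_SQLite;", "SELECT * FROM new_data", "new data" );

biteSize = 20;
dbc = Create Database Connection( "DSN=TEP_SQLite;" );

For( k = 0, k < N Rows( newDt ) / biteSize, k++,
	// Send an INSERT INTO statement that copies the next bite of rows into the monitoring table.
	// (A real script would key on a time stamp or row id rather than relying on LIMIT/OFFSET order.)
	sql = "INSERT INTO monitoring_data SELECT * FROM new_data LIMIT " ||
		Char( biteSize ) || " OFFSET " || Char( k * biteSize ) || ";";
	Execute SQL( dbc, sql );
	Wait( 1 );   // slow things down so the progression is visible in the control charts
);

Close Database Connection( dbc );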
Um, so next I'm going to move
over to the monitoring session
to show you the scripts
that will update the report
as new data come in.
This first script is a simple script that will check the database every 0.2 seconds for new observations and add them
to the JMP table. Since the
report has automatic recalc
turned on, the report will update
whenever new data are added. And
I should add that
realistically,
you probably wouldn't use a
script that just iterates like
this. You'd probably use Task Scheduler on Windows or Automator on Mac to better schedule runs of the script.
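A bare-bones version of that polling script might look like the sketch below; the DSN and table names are placeholders, and as just mentioned, a scheduled task would usually replace the endless loop.

// Rough sketch of the monitoring-session script; DSN and table names are placeholders.
dt = Data Table( "TEP Monitoring" );   // table the report (with automatic recalc on) was built from
dbc = Create Database Connection( "DSN=TEP_SQLite;" );

While( 1,
	// Pull only the rows that aren't in the JMP table yet, keyed on the current row count.
	sql = "SELECT * FROM monitoring_data LIMIT -1 OFFSET " || Char( N Rows( dt ) ) || ";";
	newRows = Execute SQL( dbc, sql, "new rows" );
	If( N Rows( newRows ) > 0,
		dt << Concatenate( newRows, Append to first table )
	);
	Close( newRows, No Save );
	Wait( 0.2 );   // check for new observations every 0.2 seconds
);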
And then there's also another
script that will
push the report to JMP Public
whenever the report is updated,
and I was really excited that
this is possible with JMP 15.
It enables any computer with a
web browser to view updates to
the control chart. Then you
can even view the report on
your smartphone, so this makes
it really easy to share
results across organizations.
And you can also use JMP Live
if you wanted the reports to be on a restricted server.
I'm not going to have time
to go into this in this
demo, but you can check out
my Discovery Americas talk.
Then finally down here, there is
a script that recreates the
historical data in the data
table if you want to run the
example multiple times.
Alright, so next...make sure
that we have the historical data...
I'm going to run the
streaming script and see
how the report updates.
So the data is in control at
first and then a fault is
introduced, but there's a
plantwide control system
that's implemented in the
simulation, and you can see
how the control system
eventually brings the process
to a new equilibrium.
Wait for it to finish here.
So if we zoom in, it seems like the process first went out of control around this time point, so I'm going to color it and label it so that it will show up in other plots.
And then in the SPE plot,
it looks like this
observation is also out of
control but only slightly.
And then if we zoom in on
the time point in the
contribution plots, you can
see that there are many
variables contributing to
the out of control signal at
first. But then once the
process reaches a new
equilibrium, there's only
two large contributors.
So I'm going to remove the heat
maps now to clean up a bit.
You can hover over
the point at which the process
first went out of control and
get a peek at the top ten
contributing variables. This
is great for giving you a
quick overview which variables
are contributing most to the
out of control signal.
And then if I click on the plot,
this will be appended to the
fault diagnosis section.
And as you can see, there are several variables with large contributions, and I've just sorted on the contribution.
And for variables with
red bars the observation is
out of control in the univariate
control charts. You can see
this by hovering over one of
the bars and these graphlets
are IR charts for an
individual variable with a
three Sigma control limit.
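For reference, the three-sigma limits in an individuals (IR) chart like these graphlets are conventionally estimated from the average moving range; this is the textbook formula, not necessarily the platform's exact computation:

\[ \mathrm{UCL},\ \mathrm{LCL} = \bar{x} \pm 3\,\frac{\overline{MR}}{d_2}, \qquad d_2 = 1.128 \]

where the average moving range is taken over consecutive observations in the historical data.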
You can see in the stripper
pressure variable that the
observation is out of
control, but eventually the
process is brought back under
control. And this is the case
for the other top
contributors. I'll also show you the univariate control chart for one of the variables that's in control. So there are many variables out of control in the process at the beginning, but the process eventually reaches a new equilibrium.
Um...
To see the variables that
contribute most to the shift in
the process, we can use the mean contribution proportion plots.
These plots show the average
contribution that the variables
have to T squared for the group
I've selected. Um, here if I
sort on these.
The only two variables with
large contributions measure the
rate of flow of reactant A in
stream one, which is the flow of
this reactant into the reactor.
Both of these variables are
measuring essentially the
same thing, except one
measurement...one is a
measurement variable and the
other is a manipulated
variable.
You can see that there is a
large step change in the flow
rate, which is what I programmed
in the simulation. So these
contribution plots allow you to
quickly identify the root cause.
And then in my previous talk I
showed many other ways to
visualize and diagnose faults
using tools in the score plot.
This includes plotting the
loadings on the score plots and
doing some group comparisons.
You can check out my Discovery
Americas talk on the JMP
Community for that. Instead, I'm
going to spend the rest of this
time introducing a few new
examples, which I put on the
Community page for this talk.
So.
There are 20 programmable faults
in the Tennessee Eastman process
and they can be introduced in any
combination. I provided two other
representative faults here. Fault
1 that I showed previously was
easy to detect because the out
of control signal is so large
and so many variables are
involved. The focus of the previous demo was to show how to use the tools to identify faults out of a large number of variables, and not necessarily to benchmark the methods.
Fault 4, on the other hand,
is a more subtle fault,
and I'll show you it here.
The fault that's programmed is a sudden increase in the temperature in the reactor.
And this is compensated for by
the control system by increasing
the flow rate of coolant.
And you can see that
variable picked up here and
you can see the shift in
contribution plots.
And then you can also see
that most other variables
aren't affected
by the fault. You can see a
spike in the temperature here
is quickly brought back under
control. Because most other
variables aren't affected, this
is hard to detect for some
multivariate control methods.
And it can be more
difficult to diagnose.
The last fault I'll show you
is Fault 11.
Like Fault 4, it also involves
the flow of coolant into the
reactor, except now the fault
introduces large oscillations in
the flow rate, which we can
see in the univariate control
chart. And this results in a
fluctuation of reactor
temperature. The other
variables aren't really
affected again, so this can be
harder to detect for some
methods. Some multivariate
control methods can pick up on
Fault 4, but not Fault 11 or
vice versa. But our method was
able to pick up on both.
And then finally, all the
examples I created using the
Tennessee Eastman process had
faults that were apparent in
both T squared and SPE plots. To
show some newer features in
model driven multivariate
control chart, I wanted to show
an example of a fault that
appears in the SPE chart but not
T squared. And to find a good
example of this, I revisited a
data set which Jianfeng Ding
presented in her former talk, and
I provided a link to her talk
in this journal.
On her Community page,
she provides several
useful examples that are
also worth checking out.
This is a data set from Kourti and MacGregor's classic paper on multivariate control charts. The data are process variables measured in a reactor producing polyethylene, and you can find more background in Jianfeng's talk.
have a process that went out of
control. Let me show you this.
And it's out of control earlier in the SPE chart than in the T squared chart.
And if we look at the mean
contribution
plots for SPE,
you can see that there is one variable with a large contribution that also shows a large shift in the univariate control chart, but there are also other variables with large contributions that are still in control in the univariate control charts.
And it's difficult to determine from the bar charts alone why these variables had large contributions. Large SPE values
happen when new data don't
follow the correlation structure
of the historical data, which is
often the case when new data are
collected, and this means that
the PLS model you trained is
no longer applicable.
From the bar charts, it's hard
to know which pair of variables
have their correlation structure
broken. So new in 15.2, you
can launch scatterplot matrices.
And it's clear in the
scatterplot matrix that the
violation of correlations
with Z2 is what's driving
these large contributions.
OK, I'm gonna switch back
to the PowerPoint.
And real quick, I'll summarize
the key features of model driven
multivariate control chart that
were shown in the demo. The
platform is capable of
performing both online fault
detection and offline fault
diagnosis. There are many
methods provided in the platform
for drilling down to the root
cause of faults. I'm showing you
here some plots from a popular
book, Fault Detection and
Diagnosis in Industrial Systems.
Throughout the book, the authors
demonstrate how one needs to
use multivariate and univariate
control charts side by side
to get a sense of what's going
on in a process.
And one particularly useful
feature in model driven multivariate
control chart is how
interactive and user friendly
it is to switch between these
two types of charts.
And that's my talk. Here is
my email if you have any
further questions. And
thanks to everyone that
tuned in to watch this.