Transcript


Hello, I'm Jeremy Ash. I'm a 

statistician in JMP R&D. My job 

primarily consists of testing 

the multivariate statistics 

platforms in JMP, but I also 

help research and evaluate 

methodology. Today I'm going to 

be analyzing the Tennessee 

Eastman process using some 

statistical process control 

methods in JMP. I'm going to 

be paying particular attention 

to the model driven multivariate 

control chart platform, which is 

a new addition to JMP 15. 

These data provide an opportunity to showcase a number of the platform's features. And just as a quick

disclaimer, this is similar to 

my Discovery Americas talk. We 

realized that Europe hadn't seen a 

model driven multivariate 

control chart talk due to all the 

craziness around COVID, so I 

decided to focus on the basics. 

But there is some new material 

at the end of the talk. I'll briefly cover a few additional example analyses that I put on the Community page for the talk.

First, I'll assume some knowledge 

of statistical process control 

in this talk. The main thing it 

would be helpful to know about 

is control charts. If you're 

not familiar with these, these 

are charts used to monitor 

complex industrial systems to 

determine when they deviate 

from normal operating 

conditions. 

I'm not gonna have much time to 

go into the methodology of model 

driven multivariate control 

chart, so I'll refer to these other 

great talks that are freely 

available on the JMP Community 

if you want more details. I 

should also say that Jianfeng 

Ding was the primary 

developer of the model driven 

multivariate control

chart in collaboration with 

Chris Gotwalt and that Tonya 

Mauldin and I were testers. The 

focus of this talk will be using 

multivariate control charts to 

monitor a realistic chemical process; another novel

aspect will be using control 

charts for online process 

monitoring. This means we'll be 

monitoring data continuously as 

it's added to a database and 

detecting faults in real time. 

So I'm going to start off with 

the obligatory slide on the 

advantages of multivariate 

control charts. So why not use 

univariate control charts? There 

are a number of excellent 

options in JMP. Univariate 

control charts are excellent 

tools for analyzing a few 

variables at a time. However, 

quality control data are often 

high dimensional and the number 

of control charts you need to 

look at can quickly become 

overwhelming. Multivariate 

control charts can summarize a 

high dimensional process in 

just a couple of control charts, 

so that's a key advantage. 

But that's not to say that 

univariate control charts aren't

useful in this setting. You'll 

see throughout the talk that 

fault diagnosis often involves 

switching between multivariate 

and univariate charts. 

Multivariate control charts give 

you a sense of the overall 

health of the process, while 

univariate charts allow you to 

monitor specific aspects of the 

process. So the information is 

complementary. One of the goals of model driven multivariate control chart is to provide some

useful tools for switching 

between these two types of 

charts. One disadvantage of 

univariate charts is that 

observations can appear to be in 

control when they're actually 

out of control in the multivariate 

sense and these plots show what I 

mean by this. The univariate control charts for oil and density show the two observations in red as in control. However, oil and density are highly correlated, and both observations are out of control in the multivariate sense, especially observation 51, which clearly violates the correlation structure of the two variables,

so multivariate control charts 

can pick up on these types of 

outliers, while univariate 

control charts can't. 
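To make that picture concrete, here's a minimal sketch (mine, not from the talk) of how a point with unremarkable univariate z-scores can still be a multivariate outlier, using the Mahalanobis distance that underlies T squared. The variable names and numbers are illustrative, not the talk's actual oil and density data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate two highly correlated in-control variables (like oil and density).
cov = np.array([[1.0, 0.95], [0.95, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], cov, size=200)

# An outlier that breaks the correlation while each coordinate stays modest.
outlier = np.array([1.5, -1.5])

mean, S = X.mean(axis=0), np.cov(X, rowvar=False)
z = (outlier - mean) / X.std(axis=0)                         # univariate z-scores
d2 = (outlier - mean) @ np.linalg.inv(S) @ (outlier - mean)  # squared Mahalanobis distance

print(z)   # each |z| well under 3: in control on both univariate charts
print(d2)  # large: out of control in the multivariate sense
```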

Model driven multivariate 

control chart uses projection 

methods to construct the charts. 

I'm going to start by explaining PCA 

because it's easy to build up 

from there. PCA reduces the 

dimensionality of the process by 

projecting data onto a low 

dimensional surface. Um, 

this is shown in the picture 

on the right. We have P 

process variables and N 

observations, and 

the loading vectors in the P 

matrix give the coefficients for 

linear combinations of our X 

variables that result in 

score variables with

dimension A, where the dimension 

A is much less than P. And then 

this is shown in equations on 

the left here. The X can be 

predicted as a function of the 

score and loadings, where E is 

the prediction error. 

These scores are selected to 

minimize the prediction error, 

and another way to think about 

this is that you're maximizing 

the amount of variance explained 

in the X matrix. 
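Those equations can be sketched in a few lines, assuming a simple SVD-based PCA with simulated data standing in for the process variables (this is an illustration of the decomposition X = TP' + E, not JMP's implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

# N observations of P process variables, with A underlying dimensions.
N, P, A = 100, 6, 2
T_true = rng.normal(size=(N, A))
P_true = rng.normal(size=(P, A))
X = T_true @ P_true.T + 0.1 * rng.normal(size=(N, P))
X = X - X.mean(axis=0)

# PCA via SVD: the loadings are the top-A right singular vectors,
# and the scores are the projections of X onto them.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
P_load = Vt[:A].T            # P x A loading matrix
T_score = X @ P_load         # N x A score matrix

# X is reconstructed from scores and loadings; E is the prediction error.
E = X - T_score @ P_load.T
print(np.linalg.norm(E) / np.linalg.norm(X))  # small: most variance explained
```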

Then PLS is a more suitable projection method when you have a set of process variables and a set of quality variables. You really want to ensure that the quality variables are kept in control, but these variables are often expensive or time consuming to collect. The plant could be making product with out-of-control quality for a long time before a fault is detected.

So PLS models allow you to 

monitor your quality variables 

as a function of your process 

variables and you can see that 

the PLS models find the score 

variables that maximize the 

amount of variation explained of 

the quality variables. 

These process variables are 

often cheaper or more readily 

available, so PLS can enable you 

to detect faults in quality 

early and make your process 

monitoring cheaper. And from here 

on out I'm going to focus on PLS 

models because it's more 

appropriate for the example. 
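For the curious, here's a rough NIPALS-style sketch of the PLS idea: score variables are extracted from X so as to explain variation in the quality variables Y. This is a textbook illustration with simulated data, not JMP's implementation, and the helper name `pls_scores` is mine.

```python
import numpy as np

def pls_scores(X, Y, A, n_iter=50):
    """Minimal NIPALS PLS sketch: extract A score vectors from X that
    maximize covariance with the quality variables Y."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    T = np.zeros((X.shape[0], A))
    for a in range(A):
        u = Y[:, 0]
        for _ in range(n_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)      # X weights
            t = X @ w                   # X scores
            q = Y.T @ t / (t @ t)       # Y loadings
            u = Y @ q / (q @ q)         # Y scores
        p = X.T @ t / (t @ t)
        X = X - np.outer(t, p)          # deflate X
        Y = Y - np.outer(t, q)          # deflate Y
        T[:, a] = t
    return T

# Process variables X drive two quality variables Y through a noisy linear map.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 8))
B = rng.normal(size=(8, 2))
Y = X @ B + 0.1 * rng.normal(size=(200, 2))

T = pls_scores(X, Y, A=2)
# Regressing Y on the two PLS scores recovers most of its variation.
Yc = Y - Y.mean(axis=0)
Yhat = T @ np.linalg.lstsq(T, Yc, rcond=None)[0]
print(np.linalg.norm(Yc - Yhat) / np.linalg.norm(Yc))  # small residual
```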

So PLS model partitions your 

data into two components. The 

first component is the model 

component. This gives the 

predicted values of your process 

variables. Another way to think 

about it is that your data has 

been projected into the model 

plane defined by your score 

variables and T squared monitors 

the variation of your data 

within this model plane. 

And the second component is the 

error component. This is the 

distance between your original 

data and the predicted data and 

squared prediction error (SPE) 

charts monitor this variation. 

Another alternative metric we 

provide is the distance to model 

X plane or DModX. This is just 

a normalized alternative to SPE 

that some people prefer. 
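The two chart statistics can be computed from a fitted projection model like this. These are the standard textbook definitions (T squared monitors scaled scores within the model plane, SPE the squared distance off the plane), shown with a simulated PCA model rather than JMP's internals:

```python
import numpy as np

rng = np.random.default_rng(3)

# Historical in-control data: fit a 2-component PCA model.
N, P, A = 200, 6, 2
base = rng.normal(size=(N, A)) @ rng.normal(size=(A, P)) + 0.1 * rng.normal(size=(N, P))
mu = base.mean(axis=0)
U, s, Vt = np.linalg.svd(base - mu, full_matrices=False)
load = Vt[:A].T                      # P x A loadings
lam = (s[:A] ** 2) / (N - 1)         # score variances

def t2_and_spe(x):
    """T2: variation within the model plane (scores scaled by their
    variances). SPE: squared distance from the observation to the plane."""
    t = (x - mu) @ load              # scores
    e = (x - mu) - t @ load.T        # residual off the model plane
    return np.sum(t ** 2 / lam), e @ e

t2_in, spe_in = t2_and_spe(base[0])

# A step orthogonal to the model plane inflates SPE but leaves T2 alone.
off = base[0] + 3.0 * Vt[-1]
t2_out, spe_out = t2_and_spe(off)
print(spe_out > spe_in, abs(t2_out - t2_in) < 1e-8)
```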

The last concept that's 

important to understand for the 

demo is the distinction between 

historical and current data. 

Historical data are typically 

collected when the process was 

known to be in control. These 

data are used to build the PLS 

model and define the normal 

process variation so that a 

control limit can be obtained. 

And current data are assigned 

scores based on the model but 

are independent of the model. 

Another way to think about this 

is that we have training and 

test sets. The T squared control limit is lower for the training data because we expect less variability for the observations used to train the model, whereas there's greater variability in T squared when the model generalizes to a test set. Fortunately, the theory for the variance of T squared has been worked out, so we can get these control limits based on some distributional assumptions.
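For reference, the commonly cited distributional results behind these limits, for individual observations with estimated mean and covariance, are the following (this is the standard multivariate SPC formulation; JMP's exact formulas may differ in detail):

```latex
% T^2 limit for the N historical observations used to fit the model
% (their T^2 values follow a scaled beta distribution):
UCL_{\mathrm{hist}} = \frac{(N-1)^2}{N}\,
  B_{1-\alpha}\!\left(\tfrac{A}{2},\, \tfrac{N-A-1}{2}\right)

% T^2 limit for new (current) observations, independent of the model
% (their T^2 values follow a scaled F distribution):
UCL_{\mathrm{curr}} = \frac{A\,(N+1)(N-1)}{N\,(N-A)}\,
  F_{1-\alpha}(A,\, N-A)
```

Here A is the number of score dimensions, N the number of historical observations, and α the significance level; note the current-data limit is larger, matching the higher limit seen for the test set in the demo.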

In the demo we'll be monitoring

the Tennessee Eastman process. 

I'm going to present a short 

introduction to these data. This 

is a simulation of a chemical 

process developed by Downs and 

Vogel, two chemists at Eastman 

Chemical. It was originally 

written in Fortran, but there 

are wrappers for Matlab and 

Python now. I just wanted to note 

that while this data set was 

generated in the '90s, it's still 

one of the primary data sets 

used to benchmark multivariate 

control methods in the 

literature. It covers the 

main tasks of multivariate 

control well and there is 

an impressive amount of 

realism in the simulation. 

And the simulation is based on 

an industrial process that's 

still relevant today. 

So the data were manipulated 

to protect proprietary 

information. The simulated 

process is the production of 

two liquid products from 

gaseous reactants within a 

chemical plant. And F here is 

a byproduct 

that will need to be siphoned 

off from the desired product. 

Um and... 

That's about all I'll say about that. 

So the process diagram looks 

complicated, but it really isn't 

that bad, so I'll walk you 

through it. Gaseous 

reactants A, D, and E flow into 

the reactor here. 

The reaction occurs and the 

product leaves as a gas. It's 

then cooled and condensed into 

liquid in the condenser. 

Then a vapor liquid separator 

recycles any remaining vapor and 

sends it back to the reactor 

through a compressor, and the 

byproduct and inert chemical B 

are purged in the purge stream, 

and that's to prevent any 

accumulation. The liquid product 

is pumped through a stripper, 

where the remaining reactants 

are stripped off. 

And then sent back to the reactor. 

And then finally, the 

purified liquid product 

exits the process. 

The first set of variables being 

monitored are the manipulated 

variables. These look like bow 

ties in the diagram. I think they're actually meant to be valves, and the manipulated variables mostly control the flow rate through different streams of the process.

And these variables can be set 

to any values within limits and 

have some Gaussian noise. 

The manipulated variables can be sampled at any rate, but we use the default 3 minute sampling interval.

Some examples of the manipulated 

variables are the valves that 

control the flow of reactants 

into the reactor. 

Another example is a valve 

that controls the flow of 

steam into the stripper. 

And another is a valve that 

controls the flow of coolant 

into the reactor. 

The next set of variables are 

measurement variables. These are 

shown as circles in the diagram. 

They were also sampled at three 

minute intervals. The 

difference between manipulated 

variables and measurement 

variables is that the 

measurement variables can't be 

manipulated in the simulation. 

Our quality variables will be 

the percent composition of 

two liquid products and you 

can see the analyzer 

measuring the products here. 

These variables are sampled with 

a considerable time delay, so 

we're looking at the purge 

stream instead of the exit 

stream, because these data are 

available earlier. And we'll use a PLS model to monitor process variables as a proxy for these variables, because the process variables have less delay and a faster sampling rate.

So that should be enough 

background on the data. In 

total there are 33 process 

variables and two quality 

variables. The process of 

collecting the variables is 

simulated with a set of 

differential equations. And this 

is just a simulation, but as you 

can see a considerable amount of 

care went into modeling this 

after a real world process. Here 

is an overview of the demo I'm 

about to show you. We will collect 

data on our process and store 

these data in a database. 

I wanted to have an example that 

was easy to share, so I'll be 

using a SQLite database, but 

the workflow is relevant to most 

types of databases since most 

support ODBC connections. 

Once JMP forms an ODBC 

connection with the database, 

JMP can periodically check for 

new observations and add them to 

a data table. 

If we have a model driven 

multivariate control chart 

report open with automatic 

recalc turned on, we have a 

mechanism for updating the 

control charts as new data come 

in, and the whole process of adding data to a database would likely be happening on a separate computer from the computer that's doing the monitoring. So

I have two sessions of JMP open 

to emulate this. Both sessions 

have their own journal 

in the materials on the 

Community, and the session 

adding new simulated data to 

the database will be called 

the Streaming Session, and the session updating the reports

as new data come in will be 

called the Monitoring Session. 

One thing I really liked about 

the Downs and Vogel paper was 

that they didn't provide a 

single metric to evaluate the 

control of the process. I have 

a quote from the paper here 

"We felt that the tradeoffs 

among the possible control 

strategies and techniques 

involved much more than a 

mathematical expression." 

So here are some of the goals 

they listed in their paper, 

which are relevant to our 

problem. They wanted to maintain 

the process variables at 

desired values. They wanted to 

minimize variability of product 

quality during disturbances, and 

they wanted to recover quickly 

and smoothly from disturbances. 

So we'll see how well our 

process achieves these goals 

with our monitoring methods. 

So to start off in the 

Monitoring Session journal, I'll 

show you our first data set. 

The data table contains all of

the variables I introduced 

earlier. The first variables are 

the measurement variables; the 

second are the composition variables.

And the third are the 

manipulated variables. 

The script up here will fit 

a PLS model. It excludes the 

last 100 rows as a test set. 

Just as a reminder, 

the model is predicting 2 

product composition 

variables as a function of 

the process variables. If 

you have JMP Pro, there 

have been some speed 

improvements to PLS 

in JMP 16. 

PLS now has a 

fast SVD option. 

You can switch to the 

classic in the red 

triangle menu. There's 

also been a number of 

performance improvements 

under the hood. 

Mostly relevant for datasets 

with a large number of 

observations, but that's 

common in the multivariate 

process monitoring setting. 

But PLS is not the focus of the 

talk, so I've already fit the 

model and output score columns 

and you can see them here. 

One reason that model driven multivariate control chart was designed the way it is, is this: imagine you're a statistician

and you want to share your model 

with an engineer so they can 

construct control charts. All 

you need to do is provide the 

data table with these formula 

columns. You don't need to share 

all the gory details of how you 

fit your model. 

Next, I'll provide the score columns to model driven multivariate control chart.

Drag it to the right here. 

So on the left here you can see two types of control charts: the T squared and SPE.

Um, there are 860 observations 

that were used to estimate the 

model and these are labeled as 

historical. And then the hundred 

that were left out as a test set 

are your current data. 

And you can see in the limit 

summaries, the number of points 

that are out of control and the 

significance level. Um, if you 

want to change the significance 

level, you can do it up here in 

the red triangle menu. 

Because the reactor's in normal 

operating conditions, we expect 

no observations to be out of 

control, but we have a few false 

positives here because we 

haven't made any adjustments for 

multiple comparisons. It's uncommon to do this, as far as I can tell, in multivariate control charts. I suppose you have higher power to detect out of control signals without a correction. In control chart lingo, this means your out-of-control average run length is kept low.

So on the right here we 

also have contribution 

plots and on the Y axis are 

the observations; on the X 

axis, the variables. A contribution is expressed as a proportion.

And then at the bottom here, 

we have score plots. Right 

now I'm plotting the first 

score dimension versus the 

second score dimension, but 

you can look at any 

combination of score 

dimensions using these dropdown menus or the arrow

button. 

OK, so I think we're oriented 

to the report. I'm going to 

now switch over to the 

scripts I've used to stream 

data into the database that 

the report is monitoring. 

In order to do anything for this 

example, you'll need to have a 

SQLite ODBC driver installed 

for your computer. This is much easier 

to do on a Windows computer, 

which is what you're often using 

when actually connecting to a 

database. The process on the Mac 

is more involved, but I put some 

instructions on the Community 

page. And then I don't have time 

to talk about this, but I 

created the SQLite database 

I'll be using in JMP and I 

plan to put some instructions on how to do this on the

Community Web page. And hopefully 

that example is helpful to you 

if you're trying to do this with 

data on your own. 

Next I'm going to show 

you the files that I put 

in the SQLite database. 

Here I have the historical data. 

This was used to construct 

the PLS model. There are 960 

observations that are in 

control. Then I have the 

monitoring data, which at first 

just contains the historical 

data, but I'll gradually add new 

data to this. This is the data 

that the multivariate control 

chart will be monitoring. 

And then I've simulated new 

data already and added it to the 

data table here. These are 

another 960 odd measurements 

where a fault is introduced at 

some time point. I wanted to 

have something that was easy to 

share, so I'm not going to run 

my simulation script and add to 

the database that way. We're 

just going to take observations 

from this new data table and 

move them over to the monitoring 

data table using some JSL and 

SQL statements. This is just an example emulating the process of new data coming into a database. In practice you might not actually do this with JMP, but this was an opportunity to show how you can do it with JSL.

Clean up here. 

And next I'll show you this 

streaming script. This is a 

simple script, so I'm going to 

walk you through it real quick. 

This first set of 

commands will open the 

new data table and 

it's in the SQLite database, 

so it opens the table in the 

background so I don't have to 

deal with the window. 

Then I'm going to take pieces 

from this data table and add 

them to the monitoring data 

table. I call the pieces 

bites and the bite size is 20. 

And then this next command will 

connect to the database. This 

will allow me to send the 

database SQL statements. 

And then this next bit 

of code is 

iteratively sending SQL 

statements that insert new 

data into the monitoring data. 

And I'm going to 

initialize K and show you the 

first iteration of this. 

This is a simple SQL statement, 

insert into statement that 

inserts the first 20 

observations into the data 

table. This print statement is 

commented out so that the code 

runs faster and then I also 

have a wait statement to slow 

things down slightly so that 

we can see their progression 

in the control chart. 

And this would just go too fast 

if I didn't slow it down. 
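The streaming script itself is JSL and isn't reproduced in full here, but the same idea can be sketched in Python with the standard `sqlite3` module: move rows from the new-data table into the monitored table in bites of 20, pausing between bites so the chart's progression is visible. The table and column names below are illustrative, not the talk's actual schema.

```python
import sqlite3
import time

# Illustrative stand-ins for the talk's new-data and monitoring tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE new_data (id INTEGER, value REAL)")
conn.execute("CREATE TABLE monitoring_data (id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO new_data VALUES (?, ?)",
    [(i, float(i) * 0.1) for i in range(100)],
)

bite_size = 20
n_rows = conn.execute("SELECT COUNT(*) FROM new_data").fetchone()[0]

for k in range(0, n_rows, bite_size):
    # Insert the next bite of rows into the monitored table.
    conn.execute(
        "INSERT INTO monitoring_data "
        "SELECT * FROM new_data WHERE id >= ? AND id < ?",
        (k, k + bite_size),
    )
    conn.commit()
    time.sleep(0.01)  # the talk waits longer so the chart updates visibly

print(conn.execute("SELECT COUNT(*) FROM monitoring_data").fetchone()[0])
```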

Um, so next I'm going to move 

over to the monitoring sessions 

to show you the scripts 

that will update the report 

as new data come in. 

This first script is a simple script that will check the database every 0.2 seconds for

new observations and add them 

to the JMP table. Since the 

report has automatic recalc 

turned on, the report will update 

whenever new data are added. And 

I should add that 

realistically, 

you probably wouldn't use a 

script that just iterates like 

this. You'd probably use Task Scheduler on Windows or Automator on Mac to better

schedule runs of the script. 
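The monitoring loop's idea can likewise be sketched in Python: poll the database for rows beyond what we've already pulled, and hand any new ones to the report. The names and the `update_report` stub are illustrative; in the talk this is JSL appending to the JMP data table, with automatic recalc redrawing the report.

```python
import sqlite3
import time

# Illustrative monitored table with 30 rows already present.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monitoring_data (id INTEGER, value REAL)")
conn.executemany("INSERT INTO monitoring_data VALUES (?, ?)",
                 [(i, 0.0) for i in range(30)])

seen = 0           # rows already added to the JMP data table
updates = []

def update_report(rows):
    # Stand-in for appending rows to the data table; with automatic recalc
    # on, the report would redraw itself at this point.
    updates.append(len(rows))

for _ in range(3):                    # the real loop would run indefinitely
    rows = conn.execute(
        "SELECT * FROM monitoring_data WHERE id >= ?", (seen,)
    ).fetchall()
    if rows:
        update_report(rows)
        seen += len(rows)
    time.sleep(0.2)                   # check every 0.2 seconds

print(seen)
```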

And then there's also another 

script that will 

push the report to JMP Public 

whenever the report is updated, 

and I was really excited that 

this is possible with JMP 15. 

It enables any computer with a 

web browser to view updates to 

the control chart. Then you 

can even view the report on 

your smartphone, so this makes 

it really easy to share 

results across organizations. 

And you can also use JMP Live 

if you wanted the reports to be on a restricted server.

I'm not going to have time 

to go into this in this 

demo, but you can check out 

my Discovery Americas talk. 

Then finally down here, there is 

a script that recreates the 

historical data in the data 

table if you want to run the 

example multiple times. 

Alright, so next...make sure 

that we have the historical data... 

I'm going to run the 

streaming script and see 

how the report updates. 

So the data is in control at 

first and then a fault is 

introduced, but there's a 

plantwide control system 

that's implemented in the 

simulation, and you can see 

how the control system 

eventually brings the process 

to a new equilibrium. 

Wait for it to finish here. 

So if we zoom in, 

seems like the process first 

went out of control around this 

time point, so I'm going to 

color it and 

label it, so it will show up in other plots.

And then in the SPE plot, 

it looks like this 

observation is also out of 

control but only slightly. 

And then if we zoom in on 

the time point in the 

contribution plots, you can 

see that there are many 

variables contributing to 

the out of control signal at 

first. But then once the 

process reaches a new 

equilibrium, there's only 

two large contributors. 

So I'm going to remove the heat 

maps now to clean up a bit. 

You can hover over 

the point at which the process 

first went out of control and 

get a peek at the top ten 

contributing variables. This 

is great for giving you a 

quick overview which variables 

are contributing most to the 

out of control signal. 

And then if I click on the plot, 

this will be appended to the 

fault diagnosis section. 

And as you can see, there are several variables with large contributions, and I've just sorted on the contribution.

And for variables with 

red bars the observation is 

out of control in the univariate 

control charts. You can see 

this by hovering over one of 

the bars and these graphlets 

are IR charts for an 

individual variable with a 

three Sigma control limit. 
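The three sigma limits on those IR graphlets follow the usual individuals-chart construction: sigma is estimated from the average moving range divided by d2 = 1.128, and the limits sit three of those sigmas around the mean. Whether JMP's graphlets use exactly this estimator isn't something the talk specifies, so take this as the standard textbook sketch:

```python
import numpy as np

def ir_chart_limits(x):
    """Standard individuals (IR) chart limits: sigma from the average
    moving range over d2 = 1.128, limits at the mean plus/minus 3 sigma."""
    x = np.asarray(x, dtype=float)
    mr_bar = np.mean(np.abs(np.diff(x)))   # average moving range
    sigma = mr_bar / 1.128
    center = x.mean()
    return center - 3 * sigma, center, center + 3 * sigma

rng = np.random.default_rng(4)
x = rng.normal(loc=10.0, scale=1.0, size=200)   # a simulated in-control variable
lcl, center, ucl = ir_chart_limits(x)
print(lcl, center, ucl)   # roughly 7, 10, 13 for this simulated variable
```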

You can see in the stripper 

pressure variable that the 

observation is out of 

control, but eventually the 

process is brought back under 

control. And this is the case 

for the other top 

contributors. I'll also show you one of the variables that's in control in the univariate control chart. So there are many variables out of control in the process at the beginning, but the process eventually reaches a new equilibrium.

Um... 

To see the variables that 

contribute most to the shift in 

the process, we can use the mean contribution proportion plots.

These plots show the average 

contribution that the variables 

have to T squared for the group 

I've selected. Um, here if I 

sort on these. 

The only two variables with 

large contributions measure the 

rate of flow of reactant A in 

stream one, which is the flow of 

this reactant into the reactor. 

Both of these variables are 

measuring essentially the 

same thing, except one 

measurement...one is a 

measurement variable and the 

other is a manipulated 

variable. 

You can see that there is a 

large step change in the flow 

rate, which is what I programmed 

in the simulation. So these 

contribution plots allow you to 

quickly identify the root cause. 

And then in my previous talk I 

showed many other ways to 

visualize and diagnose faults 

using tools in the score plot. 

This includes plotting the 

loadings on the score plots and 

doing some group comparisons. 

You can check out my Discovery 

Americas talk on the JMP 

Community for that. Instead, I'm 

going to spend the rest of this 

time introducing a few new 

examples, which I put on the 

Community page for this talk. 

So. 

There are 20 programmable faults 

in the Tennessee Eastman process 

and they can be introduced in any 

combination. I provided two other 

representative faults here. Fault 

1 that I showed previously was 

easy to detect because the out 

of control signal is so large 

and so many variables are 

involved. The focus of the previous demo was to show how to use the tools to identify faults out of a large number of variables, and not necessarily to benchmark the methods.

Fault 4, on the other hand, 

is a more subtle fault, 

and I'll show you it here. 

The fault that's programmed

is a sudden increase in the 

temperature in the reactor. 

And this is compensated for by 

the control system by increasing 

the flow rate of coolant. 

And you can see that 

variable picked up here and 

you can see the shift in 

contribution plots. 

And then you can also see 

that most other variables 

aren't affected 

by the fault. You can see a 

spike in the temperature here 

is quickly brought back under 

control. Because most other 

variables aren't affected, this 

is hard to detect for some 

multivariate control methods. 

And it can be more 

difficult to diagnose. 

The last fault I'll show you 

is Fault 11. 

Like Fault 4, it also involves 

the flow of coolant into the 

reactor, except now the fault 

introduces large oscillations in 

the flow rate, which we can 

see in the univariate control 

chart. And this results in a 

fluctuation of reactor 

temperature. The other 

variables aren't really 

affected again, so this can be 

harder to detect for some 

methods. Some multivariate 

control methods can pick up on 

Fault 4, but not Fault 11 or 

vice versa. But our method was 

able to pick up on both. 

And then finally, all the 

examples I created using the 

Tennessee Eastman process had 

faults that were apparent in 

both T squared and SPE plots. To 

show some newer features in 

model driven multivariate 

control chart, I wanted to show 

an example of a fault that 

appears in the SPE chart but not 

T squared. And to find a good 

example of this, I revisited a 

data set which Jianfeng Ding 

presented in her former talk, and 

I provided a link to her talk 

in this journal. 

On her Community page, 

she provides several 

useful examples that are 

also worth checking out. 

This is a data set from Kourti and MacGregor's classic paper on

multivariate control charts. The 

data are process variables measured in a reactor producing polyethylene, and you can find

more background in Jianfeng's 

talk. In this example, we 

have a process that went out of 

control. Let me show you this. 

And it's out of control earlier in the SPE chart than in

the T squared. 

And if we look at the mean 

contribution 

plots for SPE, 

you can 

see that there is one variable with a large contribution, and it also shows a large shift in the univariate control chart. But there are also other variables with large contributions that are still in control in the univariate control charts. And it's difficult to determine from the bar charts alone why these variables had large contributions. Large SPE values

happen when new data don't 

follow the correlation structure 

of the historical data, which is 

often the case when new data are 

collected, and this means that 

the PLS model you trained is

no longer applicable. 

From the bar charts, it's hard 

to know which pair of variables 

have their correlation structure 

broken. So new in 15.2, you 

can launch scatterplot matrices. 

And it's clear in the 

scatterplot matrix that the 

violation of correlations 

with Z2 is what's driving 

these large contributions. 

OK, I'm gonna switch back 

to the PowerPoint. 

And real quick, I'll summarize 

the key features of model driven 

multivariate control chart that 

were shown in the demo. The 

platform is capable of 

performing both online fault 

detection and offline fault 

diagnosis. There are many 

methods provided in the platform 

for drilling down to the root 

cause of faults. I'm showing you 

here some plots from a popular 

book, Fault Detection and 

Diagnosis in Industrial Systems. 

Throughout the book, the authors 

demonstrate how one needs to 

use multivariate and univariate 

control charts side by side 

to get a sense of what's going 

on in a process. 

And one particularly useful

feature in model driven multivariate 

control chart is how 

interactive and user friendly 

it is to switch between these 

two types of charts. 

And that's my talk. Here is 

my email if you have any 

further questions. And 

thanks to everyone that 

tuned in to watch this. 