
Fault Detection and Diagnosis of the Tennessee Eastman Process using Multivariate Control Charts (2020-US-45MP-606)

Level: Intermediate

 

Jeremy Ash, JMP Analytics Software Tester, JMP

 

The Model Driven Multivariate Control Chart (MDMVCC) platform enables users to build control charts based on PCA or PLS models. These can be used for fault detection and diagnosis of high-dimensional data sets. We demonstrate MDMVCC monitoring of a PLS model using the simulation of a real-world industrial chemical process, the Tennessee Eastman Process. During the simulation, quality and process variables are measured as a chemical reactor produces liquid products from gaseous reactants. We demonstrate fault diagnosis in an offline setting. This often involves switching between multivariate control charts, univariate control charts, and diagnostic plots. MDMVCC provides a user-friendly way to move between these plots. Next, we demonstrate how MDMVCC can perform online monitoring by connecting JMP to an external database. Measuring product quality variables often involves a time delay before measurements are available, which can delay fault detection substantially. When MDMVCC monitors a PLS model, the variation of product quality variables is monitored as a function of process variables. Since process variables are often more readily available, this can aid in the early detection of faults.

Example Files

Download and extract streaming_example.zip. There is a README file with some additional setup instructions that you will need to perform before following along with the example in the video. There are also additional fault diagnosis examples provided. Message me on the Community if you find any issues or have any questions.

Auto-generated transcript...

 



Jeremy Ash Hello, I'm Jeremy Ash. I'm a statistician in JMP R&D. My job primarily consists of testing the multivariate statistics platforms in JMP, but
  I also help research and evaluate methodology, and today I'm going to be analyzing the Tennessee Eastman process using some statistical process control methods in JMP.
  I'm going to be paying particular attention to the Model Driven Multivariate Control Chart platform, which is a new addition to JMP. I'm really excited about this platform, and these data provided a new opportunity to showcase some of its features.
  First, I'm assuming some knowledge of statistical process control in this talk.
  The main thing you need to know about is control charts. If you're not familiar with these, they are charts used to monitor complex industrial systems to determine when they deviate from normal operating conditions.
  I'm not going to have much time to go into the methodology in Model Driven Multivariate Control Chart, so I'll refer you to these other great talks, which are freely available, for more details.
  I should also mention that Jianfeng Ding was the primary developer of Model Driven Multivariate Control Chart, in collaboration with Chris Gotwalt, and Tonya Mauldin and I were testers.
  So the focus of this talk will be using multivariate control charts to monitor a real-world chemical process.
  Another novel aspect of this talk will be using control charts for online process monitoring. This means we'll be monitoring data continuously as it's added to a database and detecting faults in real time.
  So I'm going to start with the obligatory slide on the advantages of multivariate control charts. So why not use univariate control charts? There are a number of excellent options in JMP.
  Univariate control charts are excellent tools for analyzing a few variables at a time. However, quality control data sets are often high dimensional,
  and the number of charts that you need to look at can quickly become overwhelming. So multivariate control charts summarize a high-dimensional process in just a few charts, and that's a key advantage.
  But that's not to say that univariate control charts aren't useful in this setting; you'll see throughout the talk that fault diagnosis often involves switching between multivariate and univariate control charts.
  Multivariate control charts give you a sense of the overall health of a process, while univariate control charts allow you to
  look at specific aspects. So the information is complementary, and one of the main goals of Model Driven Multivariate Control Chart was to provide some tools that make it easy to switch between those two types of charts.
  One disadvantage of the univariate control chart is that observations can appear to be in control when they're actually out of control in the multivariate sense. So I have two
  IR charts for oil and density, and these two observations in red are in control. But oil and density are highly correlated, and these observations are outliers in the multivariate sense; in particular, observation 51 severely violates the correlation structure.
  So multivariate control charts can pick up on these types of outliers when univariate control charts can't.
  Model Driven Multivariate Control Chart uses projection methods to construct its control charts. I'm going to start by explaining PCA because it's easy to build up from there.
  PCA reduces the dimensionality of your process variables by projecting them into a low-dimensional space.
  This is shown in the picture to the right. We have p process variables and n observations, and we want to reduce the dimensionality of the process to A, where A is much less than p.
  To do this, we use the P loading matrix, which provides the coefficients for linear combinations of our X variables that give the score variables, shown in the equations on the left.
  T times P transpose will give you predicted values for your process variables from the low-dimensional representation, with some prediction error. Your score variables are selected
  in a way that minimizes this squared prediction error. Another way to think about it is that you're maximizing the amount of variance explained in X.
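For reference, here are the equations being described, in standard PCA notation (a sketch; the original slides aren't reproduced here):

% X is the n x p matrix of process variables, P is the p x A loading matrix,
% and T = XP is the n x A score matrix, with A << p
T = XP, \qquad \hat{X} = T P^{\top}
% The loadings minimize the squared prediction error; equivalently,
% they maximize the variance explained in X:
\min_{P} \; \lVert X - X P P^{\top} \rVert_{F}^{2}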
  PLS is more suitable when you have a set of process variables and a set of quality variables, and you really want to ensure that the quality variables are kept in control, but these variables are often expensive or time-consuming to collect.
  A plant can be making out-of-control quality for a long time before a fault is detected.
  PLS models allow you to monitor your quality variables as a function of your process variables, and you can see here that PLS will find score variables that maximize the variance explained in the Y variables.
  The process variables are often cheaper and more readily available, so PLS models can allow you to detect quality faults early and can make process monitoring cheaper.
  So from here on out, I'm just going to focus on PLS models, because that's more appropriate for our example.
  So PLS partitions your data into two components. The first component is your model component; this gives you the predicted values.
  Another way to think about this is that your data have been projected onto a model plane defined by your score variables, and T² charts will monitor variation in this model plane.
  The second component is your error component. This is the distance between your original data and the predicted data, and squared prediction error charts, or SPE charts, will monitor
  variation in this component.
  We also provide an alternative, Distance to Model X Plane (DModX); this is just a normalized version of SPE.
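In the same notation, the quantities being monitored are as follows (sketching the standard definitions; DModX normalizations vary slightly across references, and the platform documentation has the exact forms it uses):

% T^2 monitors variation within the A-dimensional model plane;
% lambda_a is the variance of score a estimated from the historical data
T_i^2 = \sum_{a=1}^{A} \frac{t_{ia}^2}{\lambda_a}
% SPE monitors variation in the error component
\mathrm{SPE}_i = \sum_{j=1}^{p} \left( x_{ij} - \hat{x}_{ij} \right)^2
% DModX, a normalized version of SPE (one common convention)
\mathrm{DModX}_i = \sqrt{ \mathrm{SPE}_i / (p - A) }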
  The last concept that's important to understand for the demo is the distinction between historical and current data.
  Historical data are typically collected when the process is known to be in control. These data are used to build the PLS model and define
  normal process variation, and this allows a control limit to be obtained. Current data are assigned scores based on the model but are independent of the model.
  Another way to think about this is that we have a training and a test set, and the T² control limit is lower for the training data because we expect lower variability for
  observations used to train the model, whereas there's greater variability in T² when the model generalizes to a test set. Fortunately, there's some theory that's been worked out for the
  variance of T² that allows us to obtain control limits based on some distributional assumptions.
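The theory being referenced is the standard one from the multivariate SPC literature (e.g., Tracy, Young, and Mason). Sketching it under the usual normality assumptions, with n historical observations, A factors, and significance level alpha:

% training (historical) observations: a beta-distribution-based limit
T^2_{\mathrm{lim}} = \frac{(n-1)^2}{n} \, B_{1-\alpha}\!\left( \tfrac{A}{2}, \tfrac{n-A-1}{2} \right)
% new (current) observations: an F-distribution-based limit
T^2_{\mathrm{lim}} = \frac{A(n+1)(n-1)}{n(n-A)} \, F_{1-\alpha}(A, \, n-A)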
  In the demo, we'll be monitoring the Tennessee Eastman process. I'm going to present a short introduction to these data.
  This is a simulation of a chemical process developed by Downs and Vogel, two chemists at Eastman Chemical. It was originally written in Fortran, but there are wrappers for it in MATLAB and Python now.
  The simulation was based on a real industrial process, but it was manipulated to protect proprietary information.
  The simulation models the production of two liquid products
  from gaseous reactants, and F is a byproduct that will need to be siphoned off from the desired product.
  The Tennessee Eastman process is pervasive in the literature on benchmarking multivariate process control methods.
  So this is the process diagram. It looks complicated, but it's really not that bad, so I'm going to walk you through it.
  The gaseous reactants A, D, and E flow into the reactor here. The reaction occurs, and product leaves as a gas; it's then cooled and condensed into a liquid in the condenser.
  Then we have a vapor-liquid separator that will remove any remaining vapor and recycle it back to the reactor through the compressor, and there's also a purge stream here that will
  vent byproduct and an inert chemical to prevent them from accumulating. Then the liquid product is pumped through a stripper, where the remaining reactants are stripped off, and the final purified product leaves here in the exit stream.
  The first set of variables being monitored are the manipulated variables. These look like bow ties in the diagram;
  I think they're actually meant to be valves. The manipulated variables mostly control the flow rate through different streams of the process.
  These variables can be set to specific values within limits and have some Gaussian noise, and the manipulated variables can be sampled at any rate; we're using the default three-minute sampling interval.
  Some examples of the manipulated variables are the flow rate of the reactants into the reactor,
  the flow rate of steam into the stripper,
  and the flow of coolant into the reactor.
  The next set of variables are measurement variables. These are shown as circles in the diagram, and they're also sampled at three-minute intervals; the difference is that the measurement variables can't be manipulated in the simulation.
  Our quality variables will be the percent composition of two liquid products. You can see
  the analyzer measuring the composition here.
  These variables are collected with a considerable time delay, so
  we're looking at the product in this stream, because
  these variables can be measured more readily than the product leaving in the exit stream. We'll also be building a PLS model to monitor
  our quality variables by means of our process variables, which have substantially less delay and a faster sampling rate.
  Okay, so that's a bit of background on the data. In total, there are 33 process variables and two quality variables.
  The process of collecting the variables is simulated with a series of differential equations. So this is just a simulation, but you can see that a considerable amount of care went into modeling this as a real-world process.
  So here's an overview of the demo I'm about to show you. We'll collect data on our process and then store these data in a database.
  I wanted to have an example that was easy to share, so I'll be using a SQLite database, but this workflow is relevant to most types of databases.
  Most databases support ODBC connections. Once JMP connects to the database, it can periodically check for new observations and update the JMP table as they come in.
  And then, if we have a Model Driven Multivariate Control Chart report open with automatic recalc turned on, we have a mechanism for updating the control charts as new data come in.
  The whole process of adding data to a database will likely be going on on a separate computer from the computer doing the monitoring.
  So I have two sessions of JMP open to emulate this. Both sessions have their own journal in the materials provided on the Community.
  The first session will add simulated data to the database; it's called the streaming session. The other session will update reports as data come into the database; I'm calling that the monitoring session.
  One thing I really liked about the Downs and Vogel paper was that they didn't provide a single metric to evaluate the control of the process. I have a quote from the paper here:
  "We felt that the trade-offs among possible control strategies and techniques involved much more than a mathematical expression."
  So here are some of the goals they listed in their paper that are relevant to our problem: maintain the process variables at desired values, minimize variability of the product quality during disturbances, and recover quickly and smoothly from disturbances.
  So we will assess how well our process achieves these goals using our monitoring methods.
  Okay.
  So to start off, I'm in the monitoring session journal, and I'll show you our first data set. The data table contains all the variables I introduced earlier: the first set are the measurement variables, the next set are the composition variables, and the last set are the manipulated variables.
  The first script attached here will fit a PLS model; it excludes the last hundred rows as a test set.
  And just as a reminder, this model is predicting our two product composition variables as a function of our process variables. But PLS modeling is not the focus of the talk, so I've already fit the model and output score columns here.
  And if we look at the column properties, you can see that there's an MDMCC Historical Statistics property that contains all the information
  about your model that you need to construct the multivariate control charts. One of the reasons Model Driven Multivariate Control Chart was designed this way:
  imagine you're a statistician, and you want to share your model with an engineer so they can construct control charts. All you need to do is provide the data table with these formula columns. You don't need to share all the gory details of how you fit your model.
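As a hypothetical JSL sketch of that handoff (the table and column names here are made up; the property name is as described above):

// Hypothetical sketch: the engineer opens the shared table and inspects the
// model information stored on a score column. Names are placeholders.
dt = Open( "Historical Data.jmp" );
prop = Column( dt, "Score 1" ) << Get Property( "MDMCC Historical Statistics" );
Show( prop ); // everything needed for the control charts travels with the table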
  So next, I'll use the score columns to create our control charts.
  On the left, I have two control charts, T² and SPE. There are 860 observations that were used to estimate the model, and these are labeled as historical; then I have 100 observations that were held out as a test set.
  And you can see in the limit summaries down here that I performed a Bonferroni correction for multiple testing
  based on the historical data. I did this up here in the red triangle menu; you can set the alpha level to anything you want.
  I did this correction because the data are known to be from normal operating conditions, so we expect no observations to be out of control, and after this multiplicity adjustment there are zero false alarms.
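The adjustment being described is the usual Bonferroni correction: each of the m historical observations is tested at level

% Bonferroni-adjusted significance level for m historical observations
\alpha^{*} = \frac{\alpha}{m}, \qquad \text{e.g., } 0.05 / 860 \approx 5.8 \times 10^{-5} \text{ for the 860 historical observations here}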
  On the right are the contribution proportion heat maps. These indicate how much each variable contributes to the out-of-control signal; each observation is on the y-axis, and the contributions are expressed as proportions.
  And you can see in both of these plots that the contributions are spread pretty evenly across the variables.
  And at the bottom, I have a score plot.
  Right now we're just plotting the first score dimension versus the second, but you can look at any combination of the score dimensions using these drop-down menus or this arrow.
  Okay, so now that we're oriented to the report, I'm going to switch over to the streaming session,
  which will stream data into the database.
  In order to do anything for this example, you'll need to have a SQLite ODBC driver installed. It's easy to do; you can just follow this link here.
  I don't have time to talk about this, but I created the SQLite database I'll be using in JMP. I have instructions on how to do this and how to connect JMP to the database on my Community web page.
  This example might be helpful if you want to try this out on data of your own.
  I've already created a connection to this database.
  And I've shared the database on the Community. So I'm going to take a peek at the data tables in Query Builder;
  I can do that with a table snapshot.
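If you'd rather script the connection than use Query Builder, here is a minimal sketch in JSL, assuming a DSN named "TEP" has been configured in the ODBC driver settings (the DSN and table names are placeholders; the README has the actual setup):

// Pull a snapshot of a table from the SQLite database over ODBC.
// "TEP" and "monitoring_data" are placeholder names.
dt = Open Database(
	"DSN=TEP;",                      // ODBC connection string
	"SELECT * FROM monitoring_data", // SQL to run against the database
	"monitoring data"                // name for the resulting JMP table
);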
  The first data set is the historical data. I've used this to construct a PLS model; there are 960 observations that are in control.
  The next data table is the monitoring data table. It just contains the historical data at first, but I'll gradually add new data to it, and this is what our multivariate control chart will be monitoring.
  And then I've simulated the new data already and added it to this data table here; you can see it starts at time stamp 961,
  and there are another 960 observations, but I've introduced a fault at some time point.
  I wanted to have something easy to share, so I'm not going to run my simulation script and add to the database that way.
  I'm just going to take observations from this new data table and move them over to the monitoring data table using some JSL with SQL statements.
  This is just a simple example emulating the process of new data coming into a database somehow. You might not actually do this with JMP, but this is an opportunity to show how you can do it with JSL.
  Next, I'll show you the script we'll use to stream in the data.
  This is a simple script, so I'm just going to walk you through it real quick.
  The first set of commands will open the new data table from the SQLite database. It opens up in the background, so I don't have to deal with the window. Then I'm going to take pieces from this new data table and
  move them to the monitoring data table. I'm calling the pieces "bites," and the bite size is 20.
  And then this will create a database connection, which will allow me to send the database SQL statements. And then this last bit of code will iteratively construct SQL statements that insert new data into the monitoring data table. So I'm going to initialize,
  okay, and show you the first iteration of this loop.
  So this is just a simple
  SQL INSERT INTO statement that inserts the first 20 observations.
  I'll comment that out so it runs faster. And there's a Wait statement down here; this will just slow down the stream
  so that we have enough time to see the progression of the data in the control charts. If I didn't have this, the streaming example would just be over too quickly.
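The actual script ships in the example files; below is a stripped-down sketch of the same pattern, with placeholder DSN and table names, so you can see the shape of it:

// Sketch of the streaming loop: move rows from new_data to monitoring_data
// in bites of 20, pausing between bites. Assumes all columns are numeric.
newDt = Open Database( "DSN=TEP;", "SELECT * FROM new_data", "new data" );
dbc = Create Database Connection( "DSN=TEP;" );
biteSize = 20;
For( i = 1, i <= N Rows( newDt ), i += biteSize,
	last = Min( i + biteSize - 1, N Rows( newDt ) );
	// Build one INSERT INTO statement covering this bite of rows
	sql = "INSERT INTO monitoring_data VALUES ";
	For( r = i, r <= last, r++,
		vals = {};
		For( c = 1, c <= N Cols( newDt ), c++,
			Insert Into( vals, Char( Column( newDt, c )[r] ) )
		);
		sql ||= "(" || Concat Items( vals, ", " ) || ")" || If( r < last, ", ", "" );
	);
	Execute SQL( dbc, sql, invisible );
	Wait( 0.5 ); // slow the stream so the chart updates are visible
);
Close Database Connection( dbc );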
  Okay, so I'm going to
  switch back to the monitoring session and show you some scripts that will update the report.
  I'll move this over to the right so you can see the report and the scripts at the same time.
  So,
  this "read from monitoring data" script is a simple script that checks the database every 0.2 seconds and adds new data to the JMP table. And since the report has automatic recalc turned on,
  the report will update whenever new data are added. I should add that, realistically, you probably wouldn't use a script that just iterates like this; you'd probably use Task Scheduler on Windows or Automator on the Mac to schedule the runs.
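Here is a sketch of that polling idea in JSL, assuming a time_stamp column and the placeholder names from before (the real script is in the example files, and as noted above you'd normally schedule this rather than loop forever):

// Poll the database every 0.2 seconds; append rows newer than what we have.
// With automatic recalc on, the MDMVCC report updates on each append.
dbc = Create Database Connection( "DSN=TEP;" );
While( 1,
	lastTime = Max( Column( dt, "time_stamp" ) << Get Values ); // newest row so far
	newRows = Execute SQL( dbc,
		"SELECT * FROM monitoring_data WHERE time_stamp > " || Char( lastTime ),
		invisible
	);
	If( N Rows( newRows ) > 0,
		dt << Concatenate( newRows, Append to first table )
	);
	Close( newRows, No Save );
	Wait( 0.2 );
);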
  And then the next script here
  will push the report to JMP Public whenever the report is updated.
  I was really excited that this is possible in JMP.
  It enables any computer with a web browser to view updates to the control chart. You can even view the report on your smartphone. So this makes it easy to share results across organizations. You can also use JMP Live if you wanted the reports to be on a restricted server.
  And then this script will recreate the historical data in the data table, in case you want to run the example multiple times.
  Okay, so let's run the streaming script.
  And look at how the report updates.
  You can see the data is in control at first, but then a fault is introduced. There's a large out-of-control signal, but there's a plant-wide control system that's been implemented in the simulation, which brings the system to a new equilibrium.
  I'll give this a second to finish.
  And now that I've updated the control chart, I'm going to push the results to JMP Public.
  On my JMP Public page, I have at first the control chart with the data in control at the beginning.
  And this should be updated with the addition of the new data.
  So let's zoom in on when the process first went out of control.
Jeremy Ash It looks like that was sample 1125. I'm going to color that
  and label it,
  so that it shows up in other plots. And then,
  in the SPE plot, it looks like this observation is still in control.
  Which chart will catch faults earlier depends on your model and how many factors you've chosen.
  We can also zoom in on
  that time point in the contribution plot. And you can see, when the process first goes out of control, there's a large number of variables contributing to the out-of-control signal. But then, when the system reaches a new equilibrium, only two variables have large contributions.
  So I'm going to remove these heat maps so that I have more room in the diagnostics section.
  And I've made everything pretty large so that the text shows up on your screen.
  If I hover over the first point that's out of control, you can get a peek at the top 10 contributing variables.
  This is great for quickly identifying which variables are contributing the most to the out-of-control signal. I can also click on that plot and append it to the diagnostics section, and
  you can see that there's a large number of variables contributing to the out-of-control signal.
  I'll zoom in here a little bit.
  So if one of the bars is red, this means that variable is out of control
  in a univariate control chart, and you can see this by hovering over the bars.
  I'm going to pin a couple of those.
  And these graphlets are IR charts for the individual variables with 3-sigma control limits.
  You can see, for the stripper pressure variable, the observation is out of control in the univariate control chart, but the variable is eventually brought back under control by our control system. And that's true for
  most of the
  large contributing variables. I'll also show you one of the variables where the observation is in control.
  So once the control system responds, many variables are brought back under control, and the process reaches
  a new equilibrium.
  But there's obviously a shift in the process. So to identify the variables that are contributing to the shift, one thing you can look at is a mean contribution
  plot.
  If I sort this and look at
  the variables that are contributing the most, it looks like just two variables have large contributions, and both of these are measuring the flow rate of reactant A in stream 1, which is coming into the reactor.
  And these are measuring essentially the same thing, except one is a measurement variable and one's a manipulated variable. And you can see
  in the univariate control chart that there's a large step change in the flow rate,
  this one as well. And this is the step change that I programmed in the simulation. So these contributions allow us to quickly identify the root cause.
  So I'm going to present a few other alternative methods to identify the same cause of the shift. The reason is that in real data,
  process shifts are often more subtle, and some of the tools may be more useful in identifying them than others. We'll consistently arrive at the same conclusion with these alternative methods, so this will show some of the ways these methods are connected.
  Down here, I have a score plot, which can provide supplementary information about shifts in the T² plot.
  It's more limited in its ability to capture high-dimensional shifts, because only two dimensions of the model are visualized at a time; however, it can provide a more intuitive visualization of the process, as it visualizes it in a low-dimensional representation.
  And in fact, one of the main reasons why multivariate control charts are split into T² and SPE in the first place is that it provides enough dimensionality reduction to easily visualize the process in a scatterplot.
  So we want to identify the variables that are
  causing the shift. So I'm going to color the points before and after the shift
  so that they show up in the score plot.
  Typically, we'd look through all combinations of the six factors, but that's a lot of score plots to look through.
  So something that's very handy is the ability to cycle through all combinations quickly with this arrow down here, and we can look through the factor combinations and find one where there's large separation.
  And if we want to identify where the shift first occurred in the score plots, we can connect the dots and see that the shift occurred around sample 1125 again.
  Another useful tool, if you want to identify
  score dimensions where an observation shows the largest separation from the historical data and you don't want to look through all the score plots, is the normalized score plot. So I'm going to select a point after the shift and look at the normalized score plot.
  I'm actually going to choose another one.
  Okay.
Jeremy Ash ...because I want to look at dimensions five and six.
  These plots show the magnitude of the score in each dimension, normalized so that the dimensions are on the same scale. And since the mean of the historical data is at zero for each score dimension, the dimensions with the largest magnitude will show the largest separation
  between the selected point and the historical data. So it looks like here, dimensions five and six show the greatest separation, and
  I'm going to move to those.
  So there's large separation here between our
  shifted data and the historical data. The score plot visualization can also be more interpretable, because you can use the variable loadings to assign meaning to the factors.
  And
  here I have the loading vectors.
  We have too many variables to see all the labels for them,
  but you can hover over and see them. And you can see, if I look in the direction of the shift, that the two variables that were the cause show up there as well.
  We can also explore differences between subgroups in the process with the group comparisons tool. To do that, I'll select all the points before the shift and call that the reference group, and everything after and call that the group I'm comparing to the reference.
  And this contribution plot will give me the variables that are contributing the most to the difference between these two groups. And you can see that this also identifies the variables that caused the shift.
  The group comparisons tool is particularly useful when there are multiple shifts in a score plot, or when you can see more than two distinct subgroups in your data.
  In our case, as we're comparing a group in our current data to the historical data, we could also just select the data after the shift and look at a mean contribution score plot.
  And this will give us
  the average contributions of each variable to the scores in the orange group. And since large scores indicate a large difference from the historical data, these contribution plots can also identify the cause.
  These use the same formula as the contribution formula for T², but now we're just using the two factors from the score plot.
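For reference, one common form of the T² contribution of variable j from the fault-diagnosis literature (the platform's exact formula isn't shown in the talk, so treat this as a sketch):

% contribution of variable j, summing over the model dimensions a = 1..A;
% the score-plot version restricts the sum to the two plotted factors
c_{j} = \sum_{a=1}^{A} \frac{t_{a}}{\lambda_{a}} \, p_{ja} \, x_{j}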
  Okay, I'm going to find my PowerPoint again.
  So real quick, I'm going to summarize the key features of Model Driven Multivariate Control Chart that were shown in the demo.
  The platform is capable of performing both online fault detection and offline fault diagnosis. There are many methods provided in the platform for drilling down to the root cause of faults.
  I'm showing here some plots from the popular book Fault Detection and Diagnosis in Industrial Systems. Throughout the book, the authors
  demonstrate how one needs to use multivariate and univariate control charts side by side to get a sense of what's going on in the process.
  And one particularly useful feature of Model Driven Multivariate Control Chart is how interactive and user-friendly it is to switch between these types of charts.
  So that's my talk. Here's my email if you have any further questions, and thanks to everyone who tuned in to watch this.
Comments

Is the example zip file still available for download?

@David_Burnham Sorry about that! I uploaded the file again. It mysteriously disappeared! I have noticed this has happened to a few other people on the Community recently. I will let the relevant people know. Thanks!

LKJP

You said in the streaming session to construct a SQLite database and connect JMP to this DB, and that this procedure is on the Community homepage.

I could not find it! I may be looking at the wrong site! Would you show me where I can find this procedure?

Thank you!

Luiz

jochenkoch

@LKJP The procedure is not straightforward, since it depends on your environment, but I got it to work with Jeremy's help. He is very supportive. There is a README in the example.zip that includes our lessons learned.

regards

JK

@LKJP I forgot to add instructions for constructing a SQL database to the readme.  Thank you for pointing this out.  I just updated the readme. 

 

There is some documentation here: https://www.jmp.com/support/help/en/16.0/index.shtml#page/jmp/save-data-tables-to-a-database.shtml.  But the readme should have more detailed instructions.

 

Jochen is right that the setup is slightly more complicated on the Mac. ODBC connections are more complicated on the Mac in general. But I think that users should mostly be using Windows for ODBC connections.

 

Let me know if the instructions work for you, and if you have any other questions.

 

 

LKJP

Thank you for the updates.

I could reproduce your procedure.

I did not use JMP Public. I will try it later.

One question: the progression of the stream is much faster than in your video.

I tried to increase "Wait(.5)" from .5 up to .9, but it did not change much.

Am I adjusting the wrong parameter?

[stream_data]

// Slow down the stream, so you can see the progression in the control charts
Wait(.5);

@LKJP You are modifying the right parameter. I think the issue is probably that you are not changing the wait time to a long enough period. The unit is seconds, so you probably want to try something like Wait(3) to see a substantial slowdown. Let me know if that works for you.

 

You can check out the documentation on the Wait command in the Scripting Index for more info.

@LKJP Happy that info worked for you and that you were able to get this running. Let me know if there is anything that could be improved. Hopefully JMP Public works for you. It should if you are using 15.2, but I think there were changes to the JSL for pushing to JMP Public in 16.

LKJP

Jeremy, thank you for your help. I could fully reproduce your procedure.

My trivial mistake was not opening "monitoring_session.jmpprj" and "streaming_session.jmpprj" in separate instances of JMP ("JMP Home Window-JMP" & "JMP Home Window-JMP[2]").