This presentation examines a workflow for building a multivariate control chart for an industrial process system. First, a PLS model is created to predict a result from the process, such as yield or the protein content of a product, to name a couple of examples. From there, a model-driven multivariate control chart (MDMCC) is created, representing the variables critical to making that prediction.
The MDMCC is then uploaded to JMP Live. In addition, the data used to create the model and MDMCC is pulled via an add-in that was created to retrieve data from a PI data historian. Within JMP Live, scripting is done to automate the data pull so that it refreshes on its own, creating a live model-driven multivariate control chart. It is an excellent tool for process engineers, since it can indicate T² and squared prediction error (SPE) deviations. We like these charts because they provide more information than typical control charts: they show not only deviations in the univariate control of variables but also deviations in the relationships between the X variables, which typical control charts cannot see.

Hello, I'm Drew Luby. Today, I'm going to talk to you about my poster presentation called Automatically Updated Multivariate Control Charts with JMP Live. First, a little bit of background. I'm an Analytics Engineer with ADM. ADM is an agricultural processing company; we actually have many plants across the world. A lot of these plants are heavily instrumented with sensor data: pressure, temperature, flow, those things. This is high-frequency data that we want to be able to analyze, troubleshoot, and take action on right at that moment.
Here's an example of that data, or what the process might look like. In this example, you're seeing a germ drying process; it's corn germ. There are five dryers here that all come together to make one final germ moisture at the end. You can see all of this sensor data. We have valve outputs (OP) and amps, we have steam flows and pressures, and we have temperatures. There's lots of different data available in our process here, and we want to relate all of this data together and compare it to this finished moisture, which is our spec for this product.
The challenge we have is that we want to look at lots of different variables, but we need to be able to look at the whole process all in one view. To do that, our objective is to build a model-driven multivariate control chart that will allow us to look at all of those variables I just showed on one chart, rather than having to trend them all individually each day.
The model-driven multivariate control chart will then let us see if there are any variables that are outside of normal operation, and it will also let us see if there are any variables that are performing differently than we would expect. From that chart, we're able to dig in deeper and find out if there are single variables that are failing, that are out of spec, or that are running at different values than they typically do. That directs us to them so that an engineer or an operator in the plant can take action right away.
That's ultimately our objective: to use the anomaly detection to alert us to those process changes and take some action so that we don't run for a long time with a parameter that is outside of spec, or with a valve performing differently than we would expect it to. There are lots of examples there. We want to identify them quickly, and we want to be able to take action. The model-driven multivariate control chart does that for us.
The approach we'll take to do this is shown right here. I'll go through these steps both here in the poster and then actually show them in JMP. The first thing we need to do to access our data is to use an add-in that we have called OSIsoft PI Tools. This is something that we've created; it is available in the community, and I believe it's linked on the poster. This add-in allows us to pull these PI tags (PI is our data historian) into JMP. Not only that, it creates a JSL script that we can copy and paste on the JMP Live side so that we can automatically pull new data every day, which is necessary for troubleshooting the process in the moment.
After we pull the data in, we'll build a PLS (Partial Least Squares) model. Technically, we could build a model-driven multivariate control chart off of either PCA or PLS models, but we'll choose PLS here since it's a multivariate model that predicts a Y variable. In this example, we want to use all those process parameters I showed (the temperatures, the pressures, the amps, the flows) to try to predict that final moisture. We'll build a PLS model and utilize that for our model-driven multivariate control chart.
Once we have the model-driven multivariate control chart that's created from PLS, we'll build a dashboard in JMP that shows the control chart alongside a Process Screening platform that lets us look at each of these variables individually as well. Then, finally, once we have that published to JMP Live, I'll go through the steps and show how we automate that data pull.
Again, the objective is to create a tool for our process engineers and our operators that's accessible to them and that shows new, near-real-time data when they look at it. That tool will show them where there may be problems in the process without their having to trend every single variable. Really, the model-driven multivariate control chart, combined with JMP Live, has served that purpose perfectly. We'll head over to JMP, and I'll show exactly how this is done.
Within JMP, we have created this add-in, JMP OSI PI data, right here. This is a nice add-in. Basically, it allows us to type in the address for our server; some of these fields are drop-downs. Then, if you're familiar with PI data, it gives you the types of queries that PI uses, and it also has the start and end times using PI time. If you're not familiar with that, it might be something to look up. If your facility uses PI, though, you probably are familiar with it.
In this example, I'm going to use a 7-day pull, so star minus 7 days (*-7d in PI time), and I'm going to pull interpolated values every hour. The tags that I'm going to use, I've already gone through and preselected. I'm not using every tag from that display I showed you; these are some of the critical ones, but I could use every tag. I simply paste them in the window here, and I can change my searches. I can make a long table versus a wide table. There are lots of things I can do here to shape my data pull into what we're looking for. The critical part of this add-in is that it generates a script, and this script can be copied and pasted into JMP Live. It's fairly long, but it can be copied and pasted into JMP Live to allow the recreation of this data table.
Importantly, because I'm using relative timestamps (star minus 7 days to star means from 7 days ago through today), every time it runs, it will pull new data based on whatever time JMP Live is telling it to rerun. When this runs tomorrow, it will pull the 7 days ending tomorrow. I've always got live data pulling.
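To make the relative-timestamp idea concrete, here is a minimal JSL sketch of a pull like this against the PI Web API. This is not the add-in's generated script; the server address, web ID, and endpoint details are placeholder assumptions, and your PI installation may differ.

// A minimal sketch, not the add-in's generated script. The server address
// and web ID are placeholders; *-7d and * are PI relative times, so every
// run pulls a fresh rolling window.
req = New HTTP Request(
	URL( "https://pi.example.com/piwebapi/streams/WEBID/interpolated"
		|| "?startTime=*-7d&endTime=*&interval=1h" ),
	Method( "GET" )
);
json = req << Send;          // response body as text
values = Parse JSON( json ); // associative array of timestamped values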
I won't let that run; it can actually take a bit of time. For the video here, I already recreated this a bit ago, and I have done some hiding and excluding to clean up the data table some. This is prepared, but you can see it's these same tags up above, and it's a 7-day run. This one happened to start on July 23rd and run through July 30th.
Normally, when I bring this data in, there's a lot of cleaning and organizing of the data table that needs to happen, and that's already been done here. I don't want to act like I'm skipping steps, but I've just pre-done that work.
To make that Partial Least Squares model, I have JMP Pro, so I can do it within the Fit Model platform, or it's right here in Partial Least Squares under Multivariate Methods. I'm going to predict this germ dryer moisture from all of these process variables. I'm going to go ahead and let it do K-fold validation. Partial Least Squares, I hit Run, I use the default settings here, and the model is created.
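For reference, a scripted launch of the same platform looks roughly like the sketch below. The column names are illustrative stand-ins for the actual tag columns, and the validation argument is an assumption; the platform's red-triangle script will show the exact syntax for your version.

// Assumed launch syntax; column names are placeholders for the real tags.
Partial Least Squares(
	Y( :Germ Dryer Moisture ),
	X( :Dryer 1 Temp, :Dryer 1 Steam Flow, :Dryer 1 Amps ),
	Validation Method( "KFold" )  // argument name assumed
);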
For my profession, where I work, this is a pretty good model. We have a lot of process noise, so we're not necessarily getting to an R² of 0.9. This is acceptable for what we do. Four factors here, and that's fine. I could probably go further and clean this up a little bit, but this is the type of model that, directionally, I think is correct, and as far as accuracy is concerned, I think it's good enough for our example. Again, normally I'd iterate to clean this up a little bit.
To create a model-driven multivariate control chart, there are a couple of ways from here. I can go straight to Model-Driven Multivariate Control Chart for Saved X Scores. That will save those X Scores to the data table and create a model-driven multivariate control chart off of them, and that's fine. What I'm going to choose to do, though, is save the prediction as an X Scores formula, because in this example, maybe I also want to create a trend showing my predicted moisture in between my samples, alongside the model-driven multivariate control chart. That's the pathway I took here. Either one gets you to the state I'm about to show, which is some columns of X Scores formulas.
If you're not as familiar with factors and PLS modeling, that might be something we can discuss at Discovery, but this is a factor, and you can see that it is a formula that's a linear combination of all those variables. There are four of those factors here. Then this prediction formula is actually a linear combination of the four factors. That's how the PLS model is represented there.
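In essence, each saved factor (X score) has the form

$$t_a = \sum_j w_{ja}\, x_j,$$

a weighted sum of the centered and scaled process variables, and the saved prediction is

$$\hat{y} = b_0 + \sum_{a=1}^{4} b_a\, t_a,$$

a linear combination of the four factors, which is exactly what those formula columns encode.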
To make a model-driven multivariate control chart, I come to Analyze, Quality and Process, Model-Driven Multivariate Control Chart. I just put these X Scores into Process Variables and time into Time ID, and it generates a plot of T².
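Scripted, that launch looks something like the sketch below; the role names match the dialog, but treat the exact syntax as an assumption and check your version's red-triangle script.

// Assumed launch syntax; the X Score columns are the ones PLS saved.
Model Driven Multivariate Control Chart(
	Process( :X Score 1, :X Score 2, :X Score 3, :X Score 4 ),
	Time ID( :Timestamp )
);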
Without getting too heavy into statistics, the T² is essentially measuring how far away I am from the center of my process across all of those variables. One of the ways I explain this: we have a rate that runs through our plant. We call it a grind rate; we're grinding corn in that situation. We typically have a standard rate that we run. If we were down to half rate or zero grind, my T² would be high, because that would be very different from the normal state. There are also things like my moisture coming out of one of the dryers being extremely high, or the steam being much higher than it typically is; the variables could be running either really high or really low, and that's going to contribute to a high T². It's a statistic that lets me tell the state of my process for all those variables by looking at one statistic, the T².
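For a score-based chart like this one, the statistic is, in essence,

$$T^2_i = \sum_{a=1}^{A} \frac{t_{ia}^2}{s_a^2},$$

where $t_{ia}$ is observation $i$'s score on factor $a$ and $s_a^2$ is that score's variance in the historical data. A large deviation on any factor, high or low, pushes $T^2$ up.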
An example here: if I look at this point, you can see that it allows me to show those bars. Then right here, let me highlight that point to make it clearer. You can see that that's the one that's most out of control. Sure enough, this germ dryer 4 moisture was running way higher than it typically does. What's nice is that I've got a control chart for germ dryer 4 here showing that it's way out of spec, but I'm able to monitor all of those variables with one control chart. That's really the value-added principle here: I can look at hundreds of variables with one statistic.
I like to do a Contribution Heat Map here, which I believe helps a process engineer troubleshoot this. I can tell the germ dryer 4 moisture was red there. Really quickly, I can see the T² and what caused it: this one right here. They'd know right away where they need to go look and see what the problem is.
The other thing I like to add to my model-driven multivariate control chart, under Monitor the Process, is a squared prediction error plot. I'm going to resize the scale a little bit. SPE, or squared prediction error, is measuring something a little bit different than T². What SPE is measuring is how far away we are from the prediction of the model, or outside of the model plane, so to speak.
If SPE is high, what that means is that the model's reconstruction of the X variables is different from what actually occurred. That means that the relationship between my variables was different than I expected. In my situation with my example, and I've looked into this a bit, you can see that germ dryer 1 moisture is the variable that's high here when SPE is high.
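In essence, for each observation the statistic is

$$\mathrm{SPE}_i = \sum_j \left(x_{ij} - \hat{x}_{ij}\right)^2,$$

where $\hat{x}_{ij}$ is the model's reconstruction of variable $j$ from the scores. It grows when the variables stop moving together the way the model learned, even if each variable is individually within its limits.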
What's actually happening in that situation is that usually, when we increase steam (the steam valve opens and the steam flow meter increases), we'll see the germ dryer moisture come down. In this case, it did not come down. That tells me that there's something different in the prediction structure of the model, and SPE alerts by going high. That's something that neither a univariate control chart nor T² will always catch, because each of those variables alone may be within its normal control limits.
Normally, they're positively correlated, and for this time period, they're negatively correlated. That's important to our plant because it might indicate that a lab sample was an improper value; maybe it was sampled in the wrong spot. It could mean that my steam flow meter has failed. It could mean that there's something wrong with my dryer. That's important for us to discover. Again, I do the same thing with a Contribution Heat Map, and I stretch it out a bit.
It just so happens in this example, you can see there's missing data here. We happened to have some outages in here, and you can see how things are highlighting because of that. That is the model-driven multivariate control chart that we've found to be most useful for our engineers and process engineers to troubleshoot the process.
A common thing that they need to do is this: they see a high T², or they see a high SPE, and then they want to see that variable on a univariate control chart. I think the best way to show that is under Analyze, Quality and Process, Process Screening, where I'll put all of these values into Process Variables and Time into Time.
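Scripted, that's roughly the launch below; the role names mirror the dialog, but take the exact argument syntax as an assumption and confirm it against the red-triangle script.

// Assumed launch syntax; columns are placeholders for the real tags.
Process Screening(
	Process Variables( :Dryer 1 Temp, :Dryer 1 Steam Flow, :Dryer 1 Amps ),
	Time( :Timestamp )
);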
Now I get a handy view. You can tell we have an outage in this data set, so everything's red, but it actually sorts the variables by least stable. That way, again, you'd know which variable to go look at when you're trying to troubleshoot the process. If I highlight all of them and do Show Charts for Selected, I get univariate control charts, and I'll turn those markers on. You can see this one going down, so there's that example.
This tool, in combination with that model-driven multivariate control chart, is very powerful: if I can quickly see that T² is in line and SPE is in line, then in theory, as an engineer, I can move on to another process. I don't need to look at the germ process; it's running like it normally does, and all the variables are interacting like they normally do. If I saw something, I could use this heat map to try to figure out which variable is off, and then I could come over, look at the univariate control chart, and see what's going on with it.
Now the challenge is: that's great, but how do we get it into a form that's easy to share, that people can access easily, and that's going to update on its own without somebody having to come in every morning and pull it into JMP again? The way we do that is to publish this to JMP Live. First, we're going to make a dashboard. I'm going to combine windows. We'll call it Dashboard. That's fine. Let's try that again; we need to choose the ones that we want to combine.
Now we have a dashboard. In my experience, this doesn't look as good in JMP Live, because you get scroll bars and you have to zoom through things, so I like to come up here to this red-triangle hotspot, Edit Dashboard. I'm going to click this one, drag it, and dock it right here where it says Insert After Tab, and then I'll run that dashboard.
Now I have a nice dashboard that's tabbed. You can imagine my engineer coming in here, looking at this, and seeing if they need to pay attention to the germ system or not. If they do, on the next tab over, they have all their univariate control charts.
This is what I actually want to publish to JMP Live. I'll go to File, Publish Reports to JMP Live. It just takes a moment. We're going to publish that dashboard, and I'm going to put it in a poster presentation folder right now. I'm going to give it the title Poster Presentation Germ Example. We'll publish a new data table, and then I just hit Publish, and I can open it right here. Let's bring it over here. This is the JMP Live side. The benefit of JMP Live is that anybody in our company can access it, whether or not they have JMP. It gives them the opportunity to review these reports. It's interactive: if I highlight dots there, I can see them here, so I can see the points and bring them over here. It's all connected and interactive, which is really handy, much better than sending out PDFs or images.
This is exactly what I wanted it to look like. You can see where I had excluded data; we would have had really high T² there, because that's where I believe we had a lot of outages. That's why SPE was so high during those times, too. This right here is what I want to automate. I want this to pull in new data every hour so that we can always see the most recent data.
The way JMP Live works, this is a report, and behind the scenes is actually the data table that we created. Publishing put my data table into JMP Live, so the key to automation is to have this data table update with new PI data at whatever frequency we choose, and then it will automatically regenerate that report.
On the data table side, I can go to Settings, and I can do a Refresh data via script. Back in my JMP data table, right here is the script that my add-in generated. I can copy and paste this right into that JMP Live window, and it will work; it will recreate the table as I originally made it. It will not, however, recreate these X Scores formulas or the prediction.
What you have to do (and you can use the Log or your Workflow Builder when you create these columns) is copy the scripts for any changes you make to your data table, such as creating formula columns or putting spec limits in. I've already created that here just to save some time, but you can see down at the bottom of my script, I've added new formula columns, basically, and I just copied them using the Log or Workflow Builder. I copy both the formula for each column and the model-driven multivariate control chart statistics.
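Put together, the refresh script has the two-part shape sketched below. This is a sketch only: Part 1 stands in for the add-in's generated pull, and every column name and coefficient is a placeholder.

// --- Part 1: stand-in for the add-in's generated PI pull ---
// (the real version uses relative times like *-7d, so each run pulls fresh data)
dt = New Table( "Germ Example",
	New Column( "Timestamp", Numeric, Continuous, Format( "y/m/d h:m:s" ) ),
	New Column( "Dryer 1 Temp", Numeric, Continuous ),
	New Column( "Dryer 1 Steam Flow", Numeric, Continuous )
);

// --- Part 2: appended by hand from the Log / Workflow Builder ---
// Recreate every derived column the report depends on (placeholder formulas).
dt << New Column( "X Score 1", Numeric, Continuous,
	Formula( 0.42 * :Dryer 1 Temp + 0.31 * :Dryer 1 Steam Flow )
);
dt << New Column( "Pred Formula Moisture", Numeric, Continuous,
	Formula( 4.1 + 0.9 * :X Score 1 )  // prediction is a linear combination of the factors
);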
That's all copied and pasted at the end. The rest of this stuff at the top comes straight from that add-in. All I need to do is come to this empty script, paste it in there, hit Save, and give it a refresh schedule. I'm going to have this repeat every hour, and I'm going to have my example cancel tomorrow so it's not using the server too much. I could have it repeat every 5 minutes if I wanted to. I'm just going to have it run every hour, and I can choose a schedule.
Every time that runs, this script is going to go back out to our PI Server, call for new data, and rebuild this data table, and then the report will automatically be regenerated on that new data. I also have to give it some credentials so that it can access our PI Server. I hit Save, and then I can hit Refresh Data to verify it works. It should just take a second to show that it works.
It failed on me, of course, here during my demo, because what I still had in my clipboard was those old tags that we pasted into the add-in. We're going to copy that and paste that. Sorry about that. We'll hit Save. Now it's saved, and now we should be able to refresh the data. Sorry about that error. While this is running, I'll mention one of the other advantages we see with JMP Live: sometimes we have data tables that we want to update every day, every month, whatever it is. Instead of having a person go in and update JMP every time, you can just build it in here and set your update frequency. Here I can reload it, and you can always pull that data table right back out of JMP Live. It's a really good tool for keeping data tables updated, let alone creating reports that automatically update.
Now you can look at my report here, and you can see our data now goes from 8/25 to 8/28. You can see that the T² is coming down; I know that plant is getting through some maintenance issues they were going through there. In fact, I believe we had a dryer down, which is why that steam pressure controller is always red there. This is the tool we want our people to have and that we have found value in. Again, they navigate here with a quick look: is my T² in spec, just like I'm looking at any control chart, and then I'm looking at my SPE. If both of those are good, then I really don't need to worry about what could be hundreds of univariate variables behind the scenes.
If there is something concerning, like a high T² up here, I can either look here at my heat maps or come over here and look at the Process Screening, and you can see that's the one that was down. Again, I have an interesting example here where we had some ups and downs with our dryers, but you can then quickly find which variable is problematic. Is it running out of spec? Is something operating differently than we would expect it to be? Once you find that, you can more easily and quickly take action on it.
That is all that I wanted to show. If you're going to be at Discovery and you have questions about this, I will be there and we can discuss. Thank you.