Using JMP to Build Soft Sensors for Efficient Monitoring of Chemical Production Processes
Production processes are routinely sampled to determine the concentration of key components to meet safety, environmental, or quality criteria. The deployment of temperature-compensated density meters provides an opportunity for live process monitoring to replace offline sampling. Historic process data is not suitable for model building as offline analysis occurs sporadically, with uncertainty about the exact time of sampling and with limited variability in results.
A naïve approach of varying the temperature and concentration before measuring the density (in the laboratory) leads to inflated errors (since concentration is the desired prediction). The method has merits since it only involves solvent addition and temperature control, which can be automated. We show how this model can be used as a first pass, to target evenly spaced temperatures and densities, followed by sampling to determine the concentration, to produce a model with much lower prediction uncertainties.
Exporting the model to Seeq and Power BI enabled continuous monitoring and decreased costs from daily sampling by €15,000 per year (for a single process). The implementation removed delays waiting on the offline analysis, reduced the risk of operator exposure to process chemicals, and enabled the production team to predict and plan interventions, thus increasing operational time.
Today, I'm going to be talking to you about how we can use JMP to build soft sensors for the efficient monitoring of chemical production processes. I'd like to start by acknowledging Chris Knight, who did the actual laboratory work to gather the data, and Daniel Garten, who helped with the transfer of the models from the laboratory into the production process. These things start, as they often do in data science, with someone asking you to build a model. Naturally, that leads you to ask the question, where did the data come from? How was it generated? And there's been an improvement over time in the sensors available to us on our chemical manufacturing processes.
And this miniaturization and improvement has actually led to the point at which we can now use the same sensors in the laboratory as we do at the full production scale. This opens up a lot of interesting opportunities because, with these sensors, you can build a model at one scale and directly transfer it to the other, since both are running exactly the same equipment. In this case, I'm going to be talking about temperature-compensated density meters, which are a type of device that we use not only to measure flow but also to measure the density and temperature at the same time.
It's an interesting device. As you can see here, one is set up in the laboratory. We have our small system here, and we have a loop where we pass the liquid through the sensor, and then this is the electronics which communicates the results back to the controller. Now, when it comes to this approach, very quickly, you reach a problem. So naturally, the first inclination is, well, I've got different concentrations. I can make those up. They're easy to do. And I can vary the temperature. That's easy to do. So therefore, that's what I'll do. I'll vary the concentration and the temperature, and I'll measure the density.
Now, when you do that, there are a number of issues when it comes to building a model. One, because you had fixed concentrations at varying temperatures, you end up with only a limited number of results on your Y-axis when it comes to the prediction part of the model. Equally, if there are any interactions between temperature and density, these will not be captured by this data set. And you will not be able to do the reverse. You can't just take the equation for density and reverse it to predict concentration, which is what you actually want to do.
If we take a step back, and we think about how data is generated in the real world, what happens in the real world is we have a solution, at a fixed temperature, which reaches a density outwith our control. Then we take a sample of that solution, and we analyze it to find what the concentration of the analyte is. This really informs our lab process. In the lab, what we can do is, if we take a solution at a fixed temperature and either dilute it down or add analyte, we can get to the point where we hit our target density. And then we sample it to work out what the analyte concentration was. This both mimics the way it works in the chemical process and gives us the same error structure, which allows us to fit an appropriate model. And the error with this approach is about half the error with the first approach. We've gone from about plus or minus 0.6 weight/weight % down to plus or minus 0.3 weight/weight % on the individual prediction errors. So this is an improvement. So what does this look like in practice?
Here's an example where we have taken five temperatures. We have taken evenly spaced densities and tried to get them within the analyte range of 4 to 12. Then we can see here that the slopes of the lines are changing as we change the temperatures, suggesting there is an interaction somewhere between density and temperature. Therefore, this model would take this into account. If we hadn't done it this way round, we may have missed the opportunity to take that interaction into account.
Now, we have this model. We can then apply it at the process scale. Our processes are monitored using software called Seeq, and that gives us one way of doing this. The formula tool in Seeq and the formula tool in JMP are very similar. A simple find and replace can take a formula built in JMP, and it can then be used directly in Seeq. What we can do now is, we've built a model, and we want to validate it. Here we can see, over time, the upper and lower bounds of the model, along with some bars representing when process samples were taken and the results obtained from those process samples.
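The changing slopes described above correspond to a density-by-temperature interaction term in the model. As a rough illustration only, the sketch below fits concentration as a function of density, temperature, and their product by ordinary least squares in plain Python. The function names, the centring of the predictors, and the use of ordinary least squares are all assumptions made for this example; it is not the fitting procedure used in JMP for the talk.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_interaction_model(densities, temperatures, concentrations):
    """Least-squares fit of concentration on density, temperature, and their product.

    Predictors are centred on their means before the product is formed
    (an assumption for this sketch, to keep the normal equations well behaved).
    """
    n = len(concentrations)
    d_mean = sum(densities) / n
    t_mean = sum(temperatures) / n
    rows = [[1.0, d - d_mean, t - t_mean, (d - d_mean) * (t - t_mean)]
            for d, t in zip(densities, temperatures)]
    p = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * c for r, c in zip(rows, concentrations)) for i in range(p)]
    coef = solve_linear_system(xtx, xty)
    return {"d_mean": d_mean, "t_mean": t_mean, "coef": coef}


def predict_concentration(model, density, temperature):
    """Predict concentration from a live density and temperature reading."""
    d = density - model["d_mean"]
    t = temperature - model["t_mean"]
    b0, b_d, b_t, b_dt = model["coef"]
    return b0 + b_d * d + b_t * t + b_dt * d * t
```

If the interaction coefficient (the last entry in `coef`) is clearly non-zero, the slope of concentration against density depends on temperature, which is what the changing slopes in the plot suggest.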
You can see that we're getting a reasonable agreement. Having this model live allows us to target when we want samples to be taken. Rather than them all coming from the same part of the range, when it's low or high, we can start deliberately targeting and say, "Okay, for us to validate our model, we want a spread of data across the range seen and not just at points at the upper or lower bound." It gives us better data. When we've done that, we can then go back to our modeling process. Here I have an example where we have the spread across the laboratory scale in red circles and the spread seen in the process in blue crosses. We can see here that the process data tends to cover a much narrower range, which would present problems building models if we just used process data. This is where the strength of having the laboratory data comes in: we can build a model over a wider range, which is valid across the range typically seen in the process. Now we've got that, we can rerun our modeling process. In this case, I used SVEM.
... self-validated ensemble models. I fitted 100 models using generalized linear regression with the Lasso. Then we took the average of those models in order to account for variability that might be due to the samples we have available to us. From that, I was able to see that in this system, where I wasn't able to get real process liquids and had to work with idealized substitutes, there was a 1% offset between what we were seeing in the laboratory and what we were seeing in the plant, in the process. Now that we've put that through SVEM, we've corrected for that offset, and we have a model which we know works both at the laboratory scale and at the process scale.
We want to do something with that now. We want to make it more permanent. Going beyond Seeq, we can think about something like Power BI. Here, we take that same equation, and we now put it into Power BI, along with our plus-or-minus bounds. Our density and temperature go into Power BI from the process historians. That's calculated with our model to give us our expected values. Then the results from our quality control system go into Power BI as well, and we can then plot them on the graph together.
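To make the ensemble step more concrete, here is a minimal plain-Python sketch of the general SVEM idea: each model is fitted with random fractional training weights, its Lasso penalty is chosen using anti-correlated validation weights on the same rows, and the selected models are averaged. This is an illustration under stated assumptions, not JMP's implementation; the coordinate-descent solver, the penalty grid `lambdas`, and all function names are hypothetical choices for the example.

```python
import math
import random


def soft_threshold(z, lam):
    """Lasso shrinkage operator."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def predict(intercept, beta, x):
    return intercept + sum(b * xi for b, xi in zip(beta, x))


def weighted_lasso(X, y, w, lam, n_iter=50):
    """Minimise 0.5 * sum(w * resid^2) + lam * sum(|beta|) by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    w_sum = sum(w)
    intercept = sum(wi * yi for wi, yi in zip(w, y)) / w_sum
    for _ in range(n_iter):
        for j in range(p):
            num = den = 0.0
            for i in range(n):
                # Residual with feature j's own contribution added back in.
                partial = y[i] - predict(intercept, beta, X[i]) + beta[j] * X[i][j]
                num += w[i] * X[i][j] * partial
                den += w[i] * X[i][j] ** 2
            beta[j] = soft_threshold(num, lam) / den if den > 0 else 0.0
        intercept = sum(
            w[i] * (y[i] - predict(0.0, beta, X[i])) for i in range(n)
        ) / w_sum
    return intercept, beta


def svem_fit(X, y, lambdas, n_models=100, seed=0):
    """Average n_models Lasso fits, each tuned on anti-correlated validation weights."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    intercepts, betas = [], []
    for _ in range(n_models):
        # Clamp the uniform draws so that neither logarithm sees zero.
        u = [min(max(rng.random(), 1e-12), 1.0 - 1e-12) for _ in range(n)]
        w_train = [-math.log(ui) for ui in u]        # large when u is small
        w_valid = [-math.log(1.0 - ui) for ui in u]  # large when u is large
        best = None
        for lam in lambdas:
            b0, beta = weighted_lasso(X, y, w_train, lam)
            err = sum(wv * (yi - predict(b0, beta, xi)) ** 2
                      for wv, yi, xi in zip(w_valid, y, X))
            if best is None or err < best[0]:
                best = (err, b0, beta)
        intercepts.append(best[1])
        betas.append(best[2])
    avg_intercept = sum(intercepts) / n_models
    avg_beta = [sum(b[j] for b in betas) / n_models for j in range(p)]
    return avg_intercept, avg_beta
```

Because every row takes part in both training and validation, only with different weights, no data has to be held out, which matters when the number of laboratory and process samples is small. Averaging the coefficients of linear models is equivalent to averaging their predictions.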
Equally, we can do math, so we can find out how different the model is from the result and compare that against action limits. We can even get emails. So we can now automate it to get an email saying, "This is the difference between the model and the last routine check," and what actions to take if there's a problem. This is nice because the problem is not always with the model. With these temperature-compensated density meters, one issue you sometimes have is that process interruptions or process issues cause the sensor to be coated and fouled. Then you'll get an offset, a sudden step-change offset in the measured density for the same process liquid. This model then also gives us a warning to say, "Look, the process sensor health is a problem, and you need to consider acting on that."
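The comparison behind such an alert can be sketched in a few lines. The example below is written in Python purely for illustration (the talk does this inside Power BI), and the function name, the message wording, and the numeric limit used in the usage note are hypothetical.

```python
def check_agreement(predicted, measured, action_limit):
    """Compare a model prediction with a routine laboratory result.

    Returns the signed difference, whether it lies within the action
    limit, and a short message suitable for an automated email.
    """
    difference = measured - predicted
    within_limits = abs(difference) <= action_limit
    if within_limits:
        message = (
            f"Model and routine check agree (difference {difference:+.2f})."
        )
    else:
        message = (
            f"Difference {difference:+.2f} exceeds the action limit of "
            f"{action_limit:.2f}: check the model and the sensor for fouling."
        )
    return {
        "difference": difference,
        "within_limits": within_limits,
        "message": message,
    }
```

For example, `check_agreement(8.0, 9.0, 0.5)` reports a difference outside the limit. As noted above, a breach does not say on its own whether the model or the sensor is at fault, so the message points at both.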
In conclusion, we have improved process operator safety. They are no longer having to take as many samples, so rather than sampling daily, we now only have to sample once a month to check that our measurement systems and our model are still in agreement. It's also helped us to move from a reactive process to a proactive one. Before, they were blind to what was going on in the process, but now we can go, "Okay, right. We're heading up, and maybe in three days' time, I need to schedule a changeover because the process liquid is getting within the range we want to take action on," rather than guessing and taking samples every day and trying to work out, "Where might I be in three days' time?" We've got a much better view of the trajectory because, as you can see here, it's not always the same. Sometimes it's increasing quite rapidly. Other times, these values increase more slowly, depending on what's going on in the process. So that's been helpful. Equally, it removes delays as well, because before, you would take a sample, that would go to the quality control laboratory, who would analyze it and send back your result, and only then would you decide on your action.
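The talk does not say how the "three days' time" estimate is made. One simple possibility, shown here only as an assumption, is to fit a straight line to the most recent model predictions and extrapolate to the level at which action is needed. The function name and the choice of a linear trend are hypothetical, and since the rate of increase varies, any such projection is a rough guide at best.

```python
def days_until_limit(days, values, limit):
    """Estimate days from the last reading until a rising trend reaches `limit`.

    Fits a least-squares straight line through (days, values). Returns None
    when the trend is flat or falling, and 0.0 if the fitted line has
    already reached the limit by the last reading.
    """
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (limit - intercept) / slope
    return max(0.0, crossing_day - days[-1])
```

Restricting `days` and `values` to a recent window keeps the slope responsive to the faster and slower periods mentioned above.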
So now we can be more proactive, and that's a good thing in terms of increased throughput through a process. And then there's also a direct saving, so about €15,000 per year per process by reducing the labor required to monitor these processes and the consumables required to analyze them. If you're interested in more details, I've summarized the whole talk as a paper on the community, where you can find the data along with a detailed method. Thank you very much.