
Structural Equation Modeling of Coupled Twin-Distillation Columns (2021-EU-45MP-756)

Markus Schafheutle, Consultant, Business Consulting
Laura Castro-Schilo, JMP Senior Research Statistician Developer, SAS
Christopher Gotwalt, JMP Director of Statistical R&D, SAS

 

We describe a case study for modeling manufacturing data from a chemical process. The goal of the research was to identify optimal settings for the controllable factors in the manufacturing process, such that quality of the product was kept high while minimizing costs. We used structural equation modeling (SEM) to fit multivariate time series models that captured the complexity of the multivariate associations between the numerous process variables. Using the model-implied covariance matrix from SEM, we then created a Prediction Profiler that enabled estimation of optimal settings for controllable factors. Results were validated by domain experts and by comparing predictions against those of a thermodynamic model. After successful validation, the SEM and Profiler results were tested in the chemical plant with positive outcomes; the optimized predicted settings pointed in the correct direction for optimizing quality and cost. We conclude by outlining the challenges in modeling these data with methodology that is often used in social and behavioral sciences, rather than in engineering.

 

 

Auto-generated transcript...

 

Speaker

Transcript

Hello, I'm Chris Gotwalt and
today I'm going to be
presenting with Markus Schafheutle
and Laura Castro-Schilo
on an industrial application of
structural equations models, or
SEM. This talk showcases one of
the things I enjoy most about
my work with JMP. In JMP
statistical development, we have a
bird's eye view of what is
happening in data analysis
across many fields, which gives
us the opportunity to cross
fertilize best practices across
disciplines.
In JMP Pro 15, we added a new
structural equations modeling
platform. This is the dominant
data analytic framework in a
lot of social sciences because
it flexibly models complex
relationships in multivariate
settings. One of the key
features is that variables may
be used as both regressors and
responses at the same time as a
part of the same model.
Furthermore, it occurred to me that these complicated models are represented with diagrams that, at least on the surface, look like diagrams representing manufacturing processes. I wasn't the only one to make this connection. Markus, who was working with a chemical company, thought the same thing. He was working on a problem with a two-column twin-distillation manufacturing process where they wanted to minimize energy costs, which were largely going to steam production, while still making product that stayed within specification. He reached out to his JMP sales engineer, Martin Demel, who then connected Markus to Laura and me.
We had a series of meetings
where he showed and described
the data, the problem and the
goals of the company. We were
able to model the data
remarkably well. Our model was validated by sharing the results, communicated with the JMP Profiler, with the company's internal experts, then against a first-principles simulator, and then against new physical data from the plant. This was a clear
success as a data science
project. However, I would like
to add a caveat. The success
required the joint effort of
Laura, who is a top tier expert
in structural equations
modeling. Prior to joining JMP,
she was faculty in quantitative
psychology at the University of
North Carolina, Chapel Hill, one
of the top departments in the
US. She is also the inventor of
the SEM platform in JMP Pro.
This exercise was challenging
even for her. She had to write a
JSL program that itself wrote
a JSL program that specified
this model, for example.
The model we fit was perhaps
the largest and most
sophisticated SEM model of all
time. I want to call out the
truly outstanding and
groundbreaking work that
Laura's done both with the SEM
platform generally and in this
case study in particular. Now
I'm going to hand it over to
Markus, who is going to give
background to the problem the
customer wanted to solve, then
Laura is going to talk about
SEM and her model for this
problem. I'll do a brief
discussion of how we set up the
profiler and then Markus will
wrap up and talk about the
takeaways from the project and
the actions that the customer
took based on our results.
Thank you, Chris, for the
introduction.
Before I start with the problem,
I want to make you familiar with
the principles of distillation.
Distillation is a process which separates a mixture, most of the time of liquids, into its individual components. So how does this work? Here you see a schematic view of a lab distillation setup. You see here a flask containing the crude mixture which has to be separated. You heat this up, in this case with an oil bath, and stir it, and then it starts boiling. The lowest-boiling material starts first; the vapor goes up here, passes the thermometer here, which reads the boiling temperature, and then goes further into this cooler here, where it condenses, and the condensate drops here into the other little flask.
So as I said, it's built for separating a mixture of liquids with different boiling points, and those are separated by boiling point. For example, everybody knows that if you want to make schnapps from a mash, you just put the mash inside this flask, heat it up, and then you distill the alcohol over here and get the schnapps.
While it looks very simple here in the lab, in industry it's a bit more complicated, because the equipment is not only bigger but also more complex, mainly because of the engineering part of the story.
So in this study, the
distillation was not done
batchwise as you've seen before,
but in a continuous manner. This
means the crude mixture is
pumped somewhere in the middle of
the column and then the low
boiling material goes up as a
vapor to the top of the column
and there it leaves the column
and the other higher boiling
material flows downward and
leaves the column on the bottom.
And to make it a bit more complicated, in our case the bottom stream is then pumped into a second column and distilled again, to separate another material from the residual stuff. So actually we separated the original crude mixture into three parts: Distillate 1, Distillate 2, and what's left, the bottom stream of the second still. And to make it even more complex, we used the heat of the distillate of this second column to heat the first column, in order to save energy.
So in a schematic view,
this looks like this.
Here you have Still 1 and there's Still 2. And here we have the raw material mix, which is actually an assembly of distillates from the manufacturing, and what we want to do is separate the value material from the rest. So we pumped this crude mix into the first still, as I said, somewhere in the middle. It separates into the low-boiling part, which is the first value material we want to have, and the rest leaves the first still at the bottom. Then it's stored here in a tank, and from here it's pumped again into the second still, again somewhere in the middle, and it separates into the second value material and a waste stream, which was then disposed of. To heat the stuff, we start on this side, because this is the higher-boiling material, so we need higher temperatures, and this means also more energy. So we pump in the steam here, heat this still, and the material leaving here at the top has the boiling temperature of this material, so we used that residual heat to heat the first still. And if there's a little gap between what's coming up from here and what we need for distillation here in the first still, we can add a little extra steam to keep everything running.
So this is a very high-level view of things, and if you want to go a bit more into the details, here it's kind of the same picture. But here I show you all the different tags which we have for all the readouts and quality control and temperature control and so on and so forth. So what do you see here? We start here, with Feed 1 into the first still. And then we separate into the top stream, where we control the density, which is a quality characteristic, and the bottom stream, which goes into this intermediate tank. And then from here it's fed again into the second still and separated again into a bottom stream and a top stream. And again we are testing the density here for quality control. Here we also add the steam to heat all these things up, and the top flow then goes via a heat exchanger into the first still and heats that up again. And here we have the possibility to add some extra steam to keep everything in balance.
So what we see is that on a local basis you have a lot of correlation, which is shown here with the color code of the arrows. So for example, the feed here together with the feed density, which is a measure of the composition of this feed: these two together define the top stream and its quality here, and the bottom stream, more or less. So you have some local predictions. Also over here, the material going in here and here defines the stuff over here. But if you want a total description of the entire equipment, then it gets tricky, because you can do local least-squares correlations here, you can do it here, you can do it separately for the steam, or also here. But as you see, we have a mass stream which starts here, goes through the first still, through the second still, to here, and we have an energy stream which starts more or less here, going through here via the heat exchanger and also down here. So it's a kind of circuit that we have here, and all these things are correlating more or less in a kind of circuit, and this gives us the difficulty that we actually didn't know what the Xs and the Ys were. And that was the reason we started to think about other possibilities to model this.
So the target for this study was to find the optimal flow and steam settings for all these varying incoming factors, in a way that everything stays in spec. So the distillate quality should stay in spec, but also the internal operational specs and the spec for the final waste stream should be met. And the most interesting part, at least money-wise: we want to minimize the consumption of the steam.
So what we actually needed was, first of all, a good model which describes this, and that was the point where Laura came into the game and developed this structural equation model. And we also needed a kind of profiler which enables us to figure out the best settings, the optimal settings, for all these incoming variations, in order to stay within all these specs. And that was the point where Chris came in, building, on the model from Laura, a profiler which we can use for doing all the predictions we need. So now I want to pass
over to Laura to describe the
model she built from this
data here. Laura, please.
Thank you, Markus. I'm Laura
Castro-Schilo and I'm going
to tell you about the steps
we followed to model the
distillation process using
the structural equation
models platform.
So when Markus first came and
talked to us about his project,
there were three specific
features that made me realize
that SEM would be a good tool
for him. The first is that
there was a very specific
theory of how the processes
affect each other, and we saw
that on the diagram that he
showed.
An important feature of that
diagram is that all variables
had dual roles. In other words,
you can see that arrows point at
the nodes of the diagram, but
those nodes also point at other
variables, so there wasn't a
clear distinction between what
was an input and what was an
output. Rather, variables had
both of those roles.
Lastly, it was important to
realize that we were dealing
with processes that were
measured repeatedly. In other
words, we had time series data
and so all of these features
made me realize that SEM would
be a good tool for Markus. Now, if you're not familiar with SEM, you might wonder why. SEM is a very general framework that affords lots of flexibility for dealing with these types of problems. I've listed in this slide a number of different features that make SEM a good tool, but since we're not going to be able to go through all of these, I also included a link where you can go to learn more about SEM if you're interested. Now I'm
going to focus on two of the
points I have here. The first
is that SEM allows us to test
theories of multivariate
relations among variables,
which was exactly what Markus
wanted to do.
Also, there are very useful
tools in SEM called path
diagrams. These diagrams are
very intuitive and they
represent the statistical models
that we're fitting.
So let's talk about that point a
little more. Here is an example
of a path diagram that we could
draw in the SEM platform to
represent a simple linear
regression, and the diagram is
drawn with very specific
features. For example, we're
using rectangles to represent
the variables that we have
measured. Here, it's X and Y. We
also have a one-headed arrow to
represent regression effects. And
notice the double-headed arrows
that start and end on the same
variables represent variances.
Now, if these were to start and
end on a different variable,
those double-headed arrows would
then represent a covariance. In
this case, we just have the
variance of X and the residual
variance of Y, which is the part
that's not explained by the
prediction of X.
So this is the path diagram
representation of a simple
linear regression. But of course
we could also look at the
equations that are represented
by that diagram. And notice that
for Y, this equation is that of
simply a linear regression. And
I've omitted here the means and
intercepts just for simplicity.
It's important to note that
all of the parameters in
the equations are
represented in the path
diagram, so these diagrams
really do convey the
precise statistical model
that we're fitting.
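In symbols, and using generic notation of my own rather than the exact labels on the slide, the regression the diagram represents is, with means and intercepts omitted as in the talk:

y_i = \beta x_i + \varepsilon_i, \qquad \mathrm{Var}(x_i) = \phi_x, \qquad \mathrm{Var}(\varepsilon_i) = \psi_y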
Now in SEM, the diagrams or
models that we specify imply a
very specific covariance
structure. This is the
covariance structure that we
would expect given the simple
linear regression model. So you
can see we have epsilon X as the
variance of X. We also have the
variance of Y, which is a
function of both the variance of
X and the residual variance of
Y, and we also have an
expression for the covariance of
X and Y. And generally speaking, the
way that model fit is assessed
in SEM is by comparing the model
implied covariance structure to
the actual observed sample
covariance of the data, and if
these two are fairly close to
each other, we would then say
that the model fits very well.
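With that same generic notation, the model-implied covariance structure described here can be sketched as

\Sigma(\theta) = \begin{pmatrix} \phi_x & \beta\,\phi_x \\ \beta\,\phi_x & \beta^2\phi_x + \psi_y \end{pmatrix}

and fit is assessed by comparing \Sigma(\theta) with the observed sample covariance matrix S.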
So a number of different models
can be fit in SEM.
And today our focus is
going to be specifically
on time series models.
When we talk about time series,
we're speaking specifically about
a collection of data where there
is dependence on previous data
points, and these data are
usually collected across equally
spaced time intervals.
And the way that time series
analysis deals with the
dependencies in the data is by
regressing on the past. So one type of these models is called autoregressive processes, or AR(p). And you can see here, where Y represents a process that is measured at time T, the autoregressive models consist of regressing that process on previous observations of that process, up to time T minus p.
So if we're talking
specifically about an
autoregressive one process,
then you can see we have the
process YT regressed on its
immediately adjacent past YT
minus one.
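As equations, again in generic notation, the AR(p) and AR(1) models described here are

y_t = \beta_1 y_{t-1} + \beta_2 y_{t-2} + \cdots + \beta_p y_{t-p} + \varepsilon_t, \qquad \text{AR(1):}\quad y_t = \beta_1 y_{t-1} + \varepsilon_t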
And the way that we would
implement this in SEM is simply
by specifying, as we saw before,
the regression of YT on YT minus
one. So notice that here the
regression equation is very
similar to what we saw in
the previous slide, and so
it's no surprise that the
path diagram looks the same.
And we can extend this AR(1)
model to one that includes two
lags, in other words, an
autoregressive of order two. And
here we see we have the process
YT that is being regressed on
both T minus 1 and T minus 2.
And if we look at the path
diagram that represents that
model, we see that we have an
explicit representation for the
process at the current time, but
also at the lag one and lag two.
A very specific aspect of this
diagram is that the paths for
adjacent time points are set to
be equal to each other, and this
is an important part of the
specification that allows us to
specify the model correctly. So notice here we're using beta 1 to represent these lag-1 effects, and we also have to set equality constraints on the residual variances. Lastly, we also have the effect of YT minus 2 predicting YT, and so these are the lag-2 effects.
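Putting that together, a sketch of the AR(2) specification in this generic notation is

y_t = \beta_1 y_{t-1} + \beta_2 y_{t-2} + \varepsilon_t, \qquad \mathrm{Var}(\varepsilon_t) = \mathrm{Var}(\varepsilon_{t-1}) = \psi

where \beta_1 also labels the path from y_{t-2} to y_{t-1}; those equality constraints on adjacent lag-1 paths and on the residual variances are what make the explicit-lag diagram a proper AR(2) model.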
Now all of these models are
univariate time series models,
and you can fit them using the
structural equation modeling
platform in JMP or you could
also use the time series
platform that we have available.
However, the problem we were dealing with in Markus' data required more complexity. It required us to look at multivariate time series models, and one type of these models is called vector autoregressive models. And what I'd like to show you is one of these models of order two.
So we have a process for X and
another one for Y, and the same
autoregressive effects that we
saw before are included here.
Notice we have our equality
constraints which are really
important for proper
specification. But we also have
the cross lagged effects which
tell us how the processes
influence each other. And notice
here gamma 1 and gamma 1 and
also gamma 3, gamma 3,
suggesting here that we have to
put equality constraints on
those lag 1 effects across
processes.
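As a sketch in generic notation (my symbols, not necessarily the labels on the slide), the two equations of such a bivariate VAR(2) are

x_t = \beta_1 x_{t-1} + \beta_2 x_{t-2} + \gamma_1 y_{t-1} + \gamma_3 y_{t-2} + \varepsilon_{x,t}
y_t = \beta_3 y_{t-1} + \beta_4 y_{t-2} + \gamma_2 x_{t-1} + \gamma_4 x_{t-2} + \varepsilon_{y,t}

and the equality constraints Laura describes tie the corresponding paths from T minus 2 to T minus 1 to these same coefficients.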
We also have to incorporate the
covariances across the processes
so we have their covariance at
time T minus 2. But we also have
the residual covariances at time
T minus 1 and T and notice these
have to have equality
constraints again to have proper
specification. So I'm going
to show you in this video how
we would fit a bivariate time
series model just like the
one I showed you, using JMP
Pro. We're going to start by
manipulating our data so that
they're ready for SEM. First,
we standardize these two
processes because they are in
very different scales.
Then we create lagged variables
to represent explicitly the time
points prior to time T. So we're
going to have
T minus 1 and T minus 2.
We launch the SEM platform and input the Xs and then the Ys, so that it's easier to specify our models.
And now I sped up the video
so that you can quickly see
how the model is specified.
Here we're adding the
cross lagged effects
for lag 1.
And then directly using the
interactivity of the diagram, we
add the lag 2 effects.
And what remains is to
specify all the equality
constraints that are required
for these models within
process and across processes.
We name our model.
And lastly, we're
going to run it.
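For reference, here is a minimal JSL sketch of the data preparation shown in the video; the column names X and Y are placeholders, and the real table would of course use the actual process names.

// A minimal JSL sketch of the data prep shown in the video.
// "X" and "Y" are placeholder column names, not the real process names.
dt = Current Data Table();

// Standardize the two processes (they are on very different scales)
dt << New Column( "X_std", Numeric, Continuous, Formula( Col Standardize( :X ) ) );
dt << New Column( "Y_std", Numeric, Continuous, Formula( Col Standardize( :Y ) ) );

// Create explicit lag-1 and lag-2 columns for the SEM specification
dt << New Column( "X_std_lag1", Numeric, Continuous, Formula( Lag( :X_std, 1 ) ) );
dt << New Column( "X_std_lag2", Numeric, Continuous, Formula( Lag( :X_std, 2 ) ) );
dt << New Column( "Y_std_lag1", Numeric, Continuous, Formula( Lag( :Y_std, 1 ) ) );
dt << New Column( "Y_std_lag2", Numeric, Continuous, Formula( Lag( :Y_std, 2 ) ) );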
As you could see, even just a
bivariate time series model that
only incorporates two processes
requires a number of equality
constraints and nuances in the
specification that make it
relatively challenging. However,
in the case of the distillation
process data, we had a lot more
than two processes. We were
actually dealing with 26 of
these processes and in total we
had about 45,000
measurements, which were taken
at 10 minute intervals.
And so our first approach was
to explore the univariate
time series models using the
time series platform in JMP.
And when we did this, we
realized that for most
processes an AR(1) or AR(2)
model fit best, and so this
made me realize that really
at the very least we needed
to fit multivariate models in
SEM that incorporated at
least two lags.
We also had to follow a number
of preprocessing steps for
getting the data ready into SEM.
On the one hand, we had a lot of
missing data, and even though
SEM can handle missing data just
fine, with models that are as
complex as these ones, it became
computationally very very
intensive. And so we decided to
select a subset of data where we
had complete data for all of the
processes and that left us with
about 13,000 observations.
Also, as we saw in the video, we
had large scale differences
across the processes, so we had
to standardize all of them. And
lastly we created lag variables
to make sure that we could
specify the models in SEM.
Now for model specification, the equality constraints in particular are a very big challenge, because it would take a lot of time to specify them manually, and it would be, of course, tedious and error-prone.
So our approach for dealing with
this was to generate a JSL
script that would then generate
another JSL script for launching
the SEM platform.
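To give a flavor of that script-writes-a-script approach, here is a small JSL sketch with made-up process names. The real generated payload was the full SEM launch expression with all the lagged paths and equality constraints; as a simpler stand-in, this loop just generates the JSL that creates lag-1 and lag-2 columns for each process.

// A sketch of the "script that writes a script" idea, with hypothetical names.
procs = {"Feed1", "Steam2", "TopDensity1"};   // placeholder process names
code = "";
For( i = 1, i <= N Items( procs ), i++,
	For( lag = 1, lag <= 2, lag++,
		// build one JSL statement per process and lag
		code = code || Eval Insert(
			"New Column( \!"^procs[i]^_lag^lag^\!", Formula( Lag( :^procs[i]^, ^lag^ ) ) );"
		);
	)
);
Eval( Parse( code ) );   // run the generated JSL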
And what you see here is the
final model that we fit in the
platform and thankfully, after
estimating this model, we are
able to obtain a covariance
structure that is implied by the
model and that was the piece of
information that I could pass
over to Chris Gotwalt, who
then used the information from
that matrix in order to create a
profiler that Markus could use
for his purposes.
So Chris, why don't you tell us
how you created that profiler?
Thank you, Laura. Now I'm going
to show the highlights of how
I was able to take the model
results and turn them into a
profiler that the company
could easily work with.
So Laura ran her model on the standardized data and sent me a table containing the SEM model intercepts, and she also included the original means and standard deviations that were used to standardize the data. On the right we have the SEM model-implied covariance matrix, which includes the covariances between the present values and the lagged values from the immediate past. This information describes how all the variables relate to one another. In this form, though, the model is not ready to be used for prediction. To see
how certain variables change
as a function of others, we have
to use this information to
derive the conditional
distribution of the response
variables, given the variables
that we want to use as inputs.
So essentially we need the
conditional mean of the
responses given the inputs. So
to do that, we need to implement
this formula right here.
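That formula is the conditional mean of a multivariate normal distribution. In generic notation, with Y the responses, X the chosen inputs, and the mean vector and covariance matrix partitioned accordingly, it is

\mu_{Y \mid X = x} = \mu_Y + \Sigma_{YX}\,\Sigma_{XX}^{-1}\,(x - \mu_X)

which is exactly what the Profiler needs in order to predict the responses at any input setting.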
And to do that, we use the SWEEP operator in JSL. The SWEEP operator is a mathematical tool that was created by SAS CEO and co-founder Jim Goodnight. It was published in The American Statistician in 1979. The SWEEP operator is probably the single most important contribution to computational statistics in the last 100 years. Most JMP users don't know that the SWEEP operator is used by every single JMP statistical platform in many ways. We use it for matrix inversion and the calculation of sums of squares, and it can also be exploited as a simple and elegant way to compute conditional distributions if you know how to use it properly.
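As a small, made-up illustration of that idea (the 3x3 matrix and numbers are invented, and this assumes standardized, zero-mean variables), sweeping the covariance matrix on the input variables leaves the regression coefficients of the responses on those inputs in the swept matrix:

// S is ordered with the k input variables first and the response(s) last.
S = [ 1.0 0.6 0.3,
      0.6 1.0 0.5,
      0.3 0.5 1.0 ];
k = 2;                                    // number of conditioning (input) variables
Swept = Sweep( S, 1 :: k );               // sweep on the input rows/columns
B = Swept[1 :: k, k + 1 :: N Col( S )];   // coefficients of the response(s) on the inputs
x = [0.5 1.2];                            // hypothetical standardized input settings (row vector)
condMean = x * B;                         // conditional (predicted) mean of the response(s)
Show( B, condMean );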
I created a table with columns
for all the variables. The two
rows in the table are the
minimums and maximums of the
original data, which lets the
profiler know how to set up the
ranges. I added formula columns for the response variables, using the swept version of the covariance matrix from Laura's model, and put those formulas at the far right of the data table here.
Here's what one of the formulas looked like. I pulled in the results from the analysis as matrices. Laura's model included the estimated covariances between the current Ys and the two preceding values, because it was a large multivariate autoregressive model of order 2. Predicting the present by controlling the two previous values of the input variables was going to be very cumbersome to operationalize, so I made a simplifying assumption that these two values were the same, which collapsed the model into a form that was easier to use. To do this, I simply used the same column label when addressing the lag-one and lag-two entries for a term.
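In formula terms, under that simplifying assumption, if B_1 and B_2 are the coefficient blocks for the lag-one and lag-two values of the inputs (generic notation again), setting both lagged inputs to the same value x collapses the conditional mean to

\hat{y} = \mu_Y + (B_1 + B_2)\,(x - \mu_X)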
With that machinery in place, I created a profiler for the response columns of interest. I set up desirability functions that reflected the company's goals. They wanted to match a target on A2TopTemp, maximize A2BotTemp, and so on, ultimately wanting to minimize the sum of the steam that came out of the two columns.
So you can lock certain variables in the profiler by control-clicking on a pane. The locked variables will have their values drawn with a solid red line, and then once we've done that we can enter values for them, and when we run the optimizer, or maximize desirability, the locked variables will be held fixed.
This way we find settings of the
variables that we can control
that keep the product being made
to specification while
minimizing energy costs.
It's fair to say that it would
be difficult for someone else to
repeat Laura's modeling approach
on a new problem, and it would
be difficult for another person
to set up a profiler like I did
here. If enough people see this
presentation and want us to add a
platform that makes this kind of
analysis easier in the future,
you should let us know by
reaching out to Technical
Support via support@JMP.com.
Now I'm going to hand it back
over to Markus who will talk
about what the customer did with
the model and our conclusions.
Thank you, Chris.
So with the Prediction Profiler which Chris just presented, we made, let's say, a predictive landscape, which helps us understand what the best settings should be in order to achieve all the necessary quality specs. There are three factors which are, let's say, varying with only a limited extent of influence on our side: the feed for Still 1, the feed for Still 2, and also the quality, or the composition, of the feed into Still 1. And what turned out in the model as well is that the cooling water temperature was also playing an important role in this scenario. All the other variables are of smaller importance, so we neglected them in this first approach.
Here you see the landscape. It's kind of a variability chart, so to say. We have here the feed density for Still 1, the feed into Still 1, and the feed into Still 2, and all possible combinations, more or less. And here you see the settings which are predicted to be best in order to stay within the specifications. And some of these have specifications as well, so we have to stay inside them: for example, the steam flow for Still 2, the reflux there, the boil-up, and the same things for Still 1. And here on the right side, you see the predicted outcomes, the quality specs, so to say: the temperature of the top flow in Still 1, the density of the distillate, the density of the distillate of Still 2, and so on and so forth. So what you see, if you have a look at the desirability, which is the bottom row here, is that there are big areas where we cannot really achieve a good performance of our system. And if you look into the details, you see: OK, here we are off spec, here we are off spec, here at some points we are off spec, and so on and so forth. But what you also see is that this in-spec/off-spec behavior is governed not only by these three components down here, but also by the river temperature; for the moment the lowest river temperature, 1 degree, is highlighted. With that, you see we are staying in spec most of the time; there are only rare combinations of these three factors where we aren't. But if we increase the river temperature, for example to 24 degrees, then the areas where we are off spec become much more predominant. Here, too, it's very hard to stay within the specifications. So what we learned from the model is that we have problems staying within our specifications when the river temperature is above 7 degrees C. So then the question was: why is that? And the engineers suspected that this was because of the cooling capacity of the cooler. But before we went into the real trial, we compared our SEM model against a thermodynamic model based on ChemCAD. And what it showed was that both models are pointing in the same direction, so there were no real discrepancies between the two.
OK, this put us in an optimistic mood, and so we did some real trials with the best settings and, let's say, confirmed the predictions from the models. And so it turned out, as I said already, what the engineers suspected: the cooling capacity of the cooler is not sufficient. So when you have a higher river temperature, the heat transfer is too small, and the equipment doesn't really run anymore. So the next step now is to use the data from this study to justify another investment, building a cooler here with a better heat-exchange capacity. So thanks to Laura and Chris, we could set up this investment. If you have questions, please feel free to ask now. Thank you.