Analysis of a Distillation Process Stream Using JMP® Pro 15 (2021-EU-45MP-766)


Level: Advanced

Stanley Siranovich, Principal Analyst, Crucial Connection LLC

In this session, we will analyze actual process data from an industrial source. Our dataset consists of 253 snapshot measurements of 27 variables from a distillation column measured over two and a half years. For independent variables we have 15 temperature columns, four flow columns, a pressure column, and five calculated columns. For the dependent variable we have the Reid Vapor Pressure.

Several characteristics make this dataset particularly challenging:

Seasonality, as the refinery transitions from winter to summer blends and back again through two and a half cycles.
A reboiler, which takes the stream after it leaves the column, raises its temperature, and returns it to the column.
Non-uniform sampling intervals.
Numerous outliers, mainly on the high end of the range.

Because of the above and because we want to understand the process, not just predict an outcome variable, this analysis should prove instructive to those working in other industries. Finally, in order to better demonstrate the analytical flow and successive discovery, we will conduct most of this session as a live demo.

Auto-generated transcript...

Speaker	Transcript
Stan Siranovich	Good morning, good afternoon and good evening everyone. I am Stan Siranovich. I am the principal analyst at Crucial Connection LLC.
	And we are located in Jeffersonville, Indiana, right across the river from scenic Louisville, Kentucky and today I'm going to talk to you about an analysis of the distillation process stream using JMP Pro 15.
	Okay, let's give a little context here. What I've downloaded is some sensor data from a small Canadian oil refinery.
	The data was sampled several times per week, so the individual records are about two to three days apart, and what we want to do today is use the data to predict vapor pressure.
	Now, vapor pressure has to be measured in the lab; we can't measure it online, so somebody has to go out into a plant or a refinery
	that looks a whole lot like this one. They have to go out, find the valve, take an actual physical sample, bring it back to the lab and measure it.
	So it would be nice if we could build a model to predict what the vapor pressure is; that is job number one. Job number two is to build a model so that we can understand our process, and I will talk about the difference between those
	two approaches a little bit later in the presentation.
	So this is the data that we're going to be working with. I downloaded it as a CSV from the website, and I will provide links to the data
	later on. And when I downloaded it and imported it, it looked like... let me find it.
	It looked something like this. We had the date in the first column and then a whole conglomerate of temperatures and flows: we've got Temp1, FlowC1, TempC2.
	We have pressure, and we notice there are apparently two categories of temperature, one with the C appended
	and another without.
	And we don't know the sensor locations. That's another issue, so we can't make any assumptions here.
	We do know, however, that there is a reboiler which takes the stream after it leaves the bottom of the column, increases temperature and returns it back to the top.
	We also know that Temp11 is actually the difference between Temp3 and Temp9, so the system that we have is something that looks
	like this. We have sensors, and we have the crude oil coming in the bottom. It's heated, vaporized, and it's in reflux. We take the gasoline off the top; that's the stream whose vapor pressure we want to measure.
	And then the heavy portions with the higher boiling points settle to the bottom, and we siphon those off here, put them through the reboiler and put them back into the column to increase our efficiency. So let me close that one.
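	Before modeling, it can be worth confirming a stated relationship such as Temp11 being the difference between Temp3 and Temp9: a derived column carries no information beyond its parents and is exactly collinear with them if all three go into a regression. The following is a minimal Python sketch, assuming the columns have been exported as lists of floats and that the sign is Temp3 minus Temp9 (the transcript does not state the sign). The tolerance and the toy values are illustrative assumptions, not the refinery data.

```python
# Sketch only: column names mirror the transcript, values are made up.
def is_difference_column(a, b, diff, tol=1e-6):
    """Return True if diff[i] equals a[i] - b[i] (within tol) for every row."""
    if not (len(a) == len(b) == len(diff)):
        raise ValueError("columns must have the same length")
    # Loosen tol if the exported CSV rounds values to a few decimals.
    return all(abs((x - y) - d) <= tol for x, y, d in zip(a, b, diff))


# Toy values for illustration only (assumed sign: Temp11 = Temp3 - Temp9).
temp3 = [150.2, 148.7, 151.0]
temp9 = [95.1, 94.0, 96.3]
temp11 = [55.1, 54.7, 54.7]
print(is_difference_column(temp3, temp9, temp11))  # prints: True
```

	If the check fails for every row with one sign but passes with the other, the relationship is simply Temp9 minus Temp3.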
	And then I did some rearranging.
	The way I prefer to rearrange my data before I start the analysis is to drag the columns in the columns list, rather than scrolling back and forth across the top, Excel style. I prefer to make a selection here,
	drag it up, and the columns change places. So let me close that. I went through
	some of those changes, and this is what I ended up with. I put the dependent variable, the vapor pressure,
	in the first column, then the date, then the temperatures, the inverted temperatures, and some other categories, in a logical order. I'll talk about that a little bit too. It makes it a whole lot easier when you go on to the next step.
	So much for the context and orientation. Let's start with the actual work here, starting with Graph Builder.
	And we'll go to Graph Builder and we get a window that looks like this.
	We're interested in the vapor pressure, so we'll put that here on the Y axis, and to see what it looks like over time, we'll put the date on the X axis.
	Let me make that a little bit bigger so it's easier to see.
	And right away, just after that simple step,
	a whole bunch of stuff comes to light. First of all, we apparently have some seasonality here, with the higher vapor pressures around the month of January, wintertime in the northern hemisphere, so that makes sense.
	In the summer, it tends to be lower. And what else do we see here? Well, it looks like there may be a slight upward trend, and we can check that out too, but we also see that we have a whole lot of outliers.
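	The seasonal pattern seen in Graph Builder can also be checked numerically by pooling the readings by calendar month across years. Here is a minimal Python sketch, assuming the dates and vapor pressures are available as parallel lists; the values are illustrative, not taken from the dataset. Because the sampling intervals are non-uniform, each month's mean rests on a different number of samples.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_means(dates, values):
    """Average the values for each calendar month (1-12), pooling all years."""
    groups = defaultdict(list)
    for d, v in zip(dates, values):
        groups[d.month].append(v)
    return {m: mean(vs) for m, vs in sorted(groups.items())}


# Illustrative readings only; not the refinery data.
dates = [date(2018, 1, 3), date(2018, 1, 6), date(2018, 7, 2),
         date(2019, 1, 4), date(2019, 7, 5)]
rvp = [13.0, 12.6, 8.1, 12.8, 7.9]
print(monthly_means(dates, rvp))
```

	A winter-high, summer-low pattern shows up as larger means for the months around January than for the months around July.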
	If we hover over a point, we get some data that's associated with that particular point. We can see at the bottom
	we have the date and the vapor pressure, because those are the values that I dragged into the graph, and we also see Temp3, Temp9 and Temp11.
	Remember that Temp11 was the difference between Temp3 and Temp9, so I labeled those columns, and JMP put them in the hover label automatically. So let's examine that a bit further.
	On a Windows system, we can Ctrl-click to select several points; we'll select that one. Let's pick some other outliers here.
	Looks like we have a couple here that almost coincide. Got them both. Same up here.
	And the rest of them, I think I'll just let go. We'd like to get a better idea of what's going on with those outliers, so let's go up here to Tables > Subset.
	We have the Selected Rows radio button already selected, and if we click on that,
	we get a new data table, and if we want, we can delve further into the intricacies of the outlying points.
	The other thing we can do is save that. If we wanted to, we could click here and save the subset, do some other manipulations, and also save the script, but for right now, let me just close that and we will move on.
	Okay, we also saw that we apparently have some upward drift here, so what we can do is get rid of that line and put in a fit line, and sure enough, we
	notice some upward drift over the last two to two and a half years.
	We've also got confidence intervals that JMP puts in automatically. And if we hover over the line, right-click, and select Line of Fit,
	we get another menu. Let's check the R squared on that, and JMP puts the R squared right up here in the upper left-hand corner of the graph. We notice it's about 0.09, or 9 percent: small, but let's check the significance on that. Go back to the line, choose Line of Fit,
	and we'll choose F Test. And it turns out that the R squared is small but the drift is significant, right here: p < .0001.
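	The fit line's R squared and F test can be reproduced by hand, which also shows why a small R squared can still be highly significant with this many rows. Below is a minimal Python sketch of simple least-squares regression, assuming time has been converted to a number (for example, days since the first sample). For a straight-line fit, F = R²(n − 2)/(1 − R²) with 1 and n − 2 degrees of freedom. The p-value here uses a large-sample normal approximation to the t statistic (t² = F) rather than the exact F distribution, a shortcut that is reasonable at n = 253 but is an assumption of this sketch.

```python
import math
from statistics import NormalDist, fmean


def f_from_r2(r2, n):
    """F statistic (1 and n - 2 df) of a simple linear regression with this R^2."""
    if r2 >= 1:
        return math.inf  # perfect fit: no residual variance left
    return r2 * (n - 2) / (1 - r2)


def line_fit_summary(x, y):
    """Least-squares line y = b0 + b1*x, plus its R^2 and F statistic."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need two equal-length columns with at least 3 rows")
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx                   # slope: drift per unit of x
    b0 = my - b1 * mx                # intercept
    sst = sum((yi - my) ** 2 for yi in y)                          # total SS
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # residual SS
    r2 = 1 - sse / sst
    return b0, b1, r2, f_from_r2(r2, n)


def approx_p_value(f):
    """Two-sided p-value for the slope, treating t = sqrt(F) as standard normal."""
    return 2 * NormalDist().cdf(-math.sqrt(f))


# With n = 253 rows, an R^2 of about 0.09 already yields a sizable F.
f = f_from_r2(0.09, 253)
print(round(f, 1), approx_p_value(f) < 1e-4)  # prints: 24.8 True
```

	By the same formula, an R squared of 0.0009 (0.09 percent) would give F of only about 0.23, nowhere near significant, which is why the value read off the graph is about 0.09 rather than 0.09 percent.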
	To get a little more information, let's revisit that particular menu and put the equation in. We now have the equation, and we did all of that
	in one window. So go up here and click on the red triangle. We could save the script if we wanted to, but for right now, let's just come up here.
	We'll minimize it and close it. Now, another thing we can do is come up here and click Header Graphs, and
	we see a bunch of graphs up here. They are histograms, and each one gives the distribution of the column underneath it.
	If we look very closely up here, we can see that our outliers are still selected, and
	all these graphs are connected and interactive. If we just scroll over, we can see, for example, that here the outliers look like they're associated with the lower range of Temp5, and they also look like they're in the lower range of Temp3. We can continue our scrolling.
	Over here we have the pressure and some flows. Let me move the window so I can see what's going on. If we scroll a little bit more, we come to our flow columns, and it looks like FlowC3
	and FlowC4, if you set aside some of the taller bars, have an almost uniform distribution.
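	The header graphs highlight the currently selected rows inside each column's histogram, which is how the outliers show up in the lower range of Temp5 and Temp3. The same comparison can be sketched numerically: bin a column, then count how many of the selected rows fall into each bin. This is a minimal Python sketch; the five equal-width bins, the toy values, and the selected row indices are illustrative assumptions, and JMP chooses its own bin widths.

```python
def binned_counts(values, lo, hi, nbins):
    """Count values in nbins equal-width bins on [lo, hi]; hi falls in the last bin."""
    if hi <= lo:
        raise ValueError("need hi > lo")
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in values:
        if lo <= v <= hi:
            i = min(int((v - lo) / width), nbins - 1)
            counts[i] += 1
    return counts


def selected_share_by_bin(values, selected, nbins=5):
    """For each bin, return (all rows, selected rows), like a highlighted histogram."""
    lo, hi = min(values), max(values)
    all_counts = binned_counts(values, lo, hi, nbins)
    sel_counts = binned_counts([values[i] for i in selected], lo, hi, nbins)
    return list(zip(all_counts, sel_counts))


# Toy Temp5-like column and hypothetical selected outlier rows (0-based indices).
temp5 = [10.0, 11.0, 12.0, 14.0, 15.0, 16.0, 18.0, 19.0, 20.0, 20.0]
outlier_rows = [0, 1]
print(selected_share_by_bin(temp5, outlier_rows))
# prints: [(2, 2), (1, 0), (2, 0), (1, 0), (4, 0)]
```

	Here both selected rows land in the lowest bin, the kind of pattern the highlighted header graphs make visible at a glance.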
	Now, one of the things we can do here...