In a production environment, control charts are indispensable for differentiating common cause from special cause variation. With an established process, control chart reviewers focus mainly on identifying and responding to large deviations from expected process performance, with a minor focus on drift toward the upper or lower control limits.
Small periodic shifts in the process are easy to miss, and even when observed they are often ignored because they represent only a small amount of directional noise. Identifying and investigating these shifts provides an opportunity to uncover accidental manufacturing experiments and input changes that have not caused a problem yet.
The toolkit for finding these shifts has traditionally centered on the use of cumulative sum (CUSUM) and exponentially weighted moving average (EWMA) charts. Both methods require some level of parameter tuning to “correctly” identify shift boundaries. Updates to the JMP CUSUM and EWMA platforms have made iterative interactions with these methods much simpler. In addition to the traditional methods, a machine learning method using a fused lasso approach is also effective for quickly identifying shift boundaries.
In this paper, CUSUM, EWMA, and fused lasso are demonstrated and compared for their ability to detect small shifts and ignore spurious patterns in noisy process data.

Hey, hello. My name is Byron Wingerd, and I'm one of the JMP systems engineers. I usually work with pharma companies on the East Coast. Today I'm going to be talking about detecting small shifts: looking at process data or assay data and trying to understand when little shifts happen and how to detect them. That also means going back through data retrospectively, trying to understand when shifts might have happened in the past. We're talking about finding signals in the noise of data that's captured continuously.
I've got some warnings for you. Your process doesn't love you. Your system, whatever your process is (a measurement system, a manufacturing system, purifications, whatever), is more sensitive to input variation than your incoming quality control methods are. What do I mean by this? The stuff you're looking for, like purity, that's great if you can detect it, but you're not going to be able to detect things that you're not expecting to find.
On top of that, CoAs (Certificates of Analysis) from vendors, especially for chemicals, sometimes just get copied from lot to lot. There's not a lot of warning about small changes in your incoming raw materials unless you've got crazy sensitive methods for detecting them, and for the vast majority of people, those just don't exist.
I've got another warning for you. Your tracking and trending programs? They don't love you either. Part of that is our fault: we've become complacent. Everybody talks about how, if you have too many alarms, people start to ignore them. But the other side of that is no alarms at all. Everything looks great, everything's in control, we don't have to worry about anything, and we stop looking carefully at the data. Between these two problems, that we can't detect things that are going to affect our process, and that everything looks great so we're not looking carefully, we can run into trouble.
Why do we use control charts? Why do we do SPC, CPV, whatever we want to call it? The number one thing is that we want to differentiate common cause from special cause variation. It's a mechanism for responding to changes we detect in the process. With statistics, we can be pretty sensitive about what we're finding. We can also automate alerts so we can find things when, I don't know, we're not paying attention.
But this whole tracking and trending thing isn't really optional. If you're in pharma, it's required by law. If you're doing basically any other kind of manufacturing, ISO 9000 talks about it too. It's not optional. We should all be doing it and watching our processes.
Another warning: your process might be quietly warning you about things that are going to come and get you later. Here's what I mean. There are these accidental manufacturing experiments. There are close calls, like almost going out of control or almost losing control of a batch. There's weird stuff that isn't out of control, but where you can find small changes in the data. The problem is you can't see what you're not looking for.
On the other side of the coin, you don't want to be fooled by randomness, just chasing down random signals. This is the whole point of control charts: we only want to respond to special cause variation, not common cause. We're working in that margin, looking at the data in between.
What I want to talk about is the application of some tools in our toolbox for detecting small shifts. We have CUSUM charts. These are fantastic. Ideally, these are used prospectively, flagging changes as they happen by tracking the cumulative sum of movement up or down in a process, and we have different methods for interpreting them. Another really good one is the EWMA, or exponentially weighted moving average, chart.
Then we can get into this idea of change point detection. This is where something called the fused lasso comes in, which we're going to talk about, and maybe even some tree methods, like partition, predictor screening, and bootstrap forest, to look for small changes, shifts, and drifts. We have some great tools in JMP for this, and I'd like to talk about some of the key ones.
CUSUM is a great tool for detecting small shifts in a process, especially drifts that occur slowly and then start to accelerate. We can find those by looking for patterns in the charts that pop out really obviously. In the past, there was this whole thing with the V-mask, where you watched where the data went relative to the V-mask, and it felt a little like handwaving. It works, it's totally solid, but in JMP we rebuilt that platform a couple of years ago so that it looks more like a regular control chart. It's very handy to use.
The CUSUM chart uses some parameters. There's Sigma, the process standard deviation; the head start, or fast initial response; and the K and H values, where K sets how big a shift you're trying to detect (the reference value, in Sigma units) and H sets the decision interval, how far the cumulative sum can wander before the chart signals. These parameters need to be tuned to dial the chart in really well.
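For reference, here's the standard tabular CUSUM recursion those parameters feed into (this is the textbook form, e.g. from Montgomery, not anything JMP-specific):

```latex
C_i^{+} = \max\!\left(0,\; x_i - (\mu_0 + K) + C_{i-1}^{+}\right), \qquad
C_i^{-} = \max\!\left(0,\; (\mu_0 - K) - x_i + C_{i-1}^{-}\right)
```

with $C_0^{+} = C_0^{-} = 0$ (or a nonzero head start). The chart signals when $C_i^{+} > H$ or $C_i^{-} > H$. A common rule of thumb is $K = (\delta/2)\,\sigma$ to detect a shift of $\delta$ sigma, with $H$ around 4 or 5 sigma.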
Now, another control chart we mentioned is the EWMA. Where CUSUM looks at the cumulative sum, the EWMA is a control chart with limited memory: how far back do you let previous observations influence the current point? You weight that memory exponentially. Again, it has some parameters: the target, the Sigma, and especially the Lambda. Lambda is the smoothing parameter, controlling whether you remember everything, nothing, or something in between in that weighting.
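Again for reference, the textbook EWMA statistic (standard form, not JMP-specific) is:

```latex
z_i = \lambda x_i + (1 - \lambda)\, z_{i-1}, \qquad z_0 = \mu_0
```

so $\lambda = 1$ means no memory (the chart is just the raw data) and small $\lambda$ means long memory. The control limits are $\mu_0 \pm L\,\sigma\sqrt{\tfrac{\lambda}{2-\lambda}\left[1 - (1-\lambda)^{2i}\right]}$ for a chosen multiplier $L$.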
For both of these methods, which are really common for finding shifts and drifts, it takes a little bit of background and expertise. You have to know how to dial them in to detect exactly what you're looking for, because if you over- or under-adjust the parameters, you might detect things that aren't there, or miss things that are, and not make really accurate calls.
Wouldn't it be nice if there were some, I don't know, machine learning-ish tool that could make those calls for us, one that's a little less biased by our background and what we understand? Well, that would be pretty cool. There is a method that can help us do that, and it's called the fused lasso. It's part of the toolbox of advanced modern regression methods, and we can get at it using generalized regression, a penalized regression feature in JMP Pro.
In the past, people have talked about this as a method for finding change points. It gets down to some complicated math and some complicated-ish ideas about how it finds where those change points might be in the data. I have to acknowledge that 10 years ago, Clay Barker presented this idea in one of his talks on Genreg: change point detection with the fused lasso.
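The core idea, in its usual signal-approximator form (the textbook formulation, not necessarily exactly what Genreg solves internally), is to fit a mean level $\theta_i$ to every row while penalizing differences between neighboring levels:

```latex
\hat{\theta} = \arg\min_{\theta}\;
\frac{1}{2}\sum_{i=1}^{n}\left(y_i - \theta_i\right)^2
\;+\; \lambda \sum_{i=2}^{n}\left|\theta_i - \theta_{i-1}\right|
```

The L1 penalty forces most neighboring differences to exactly zero, so the fit comes out piecewise constant, and the rows where $\theta_i \ne \theta_{i-1}$ are the estimated change points.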
I think it's one of those things that ended up as a footnote even though it's really important. I've been using this for a couple of years, but I've seen that not a lot of people know about it. I mean, I hang on every word that Clay Barker says, he's my hero. But maybe not everybody else.
At this point, you may be saying, "Hey, Byron, I need you to pump the brakes a little bit. Advanced modern regression methods? Genreg? I'm feeling a little despair because you're talking about JMP Pro and some really complicated math that I don't want to follow." It turns out this is only a little bit complicated. It's relatively easy to set up, and you can detect signals in a way that might have some advantages over the simpler methods.
Just some background. Something people have been using for a long time is the partition platform in JMP. If you put in time as your X and whatever response you're looking at as your Y, you can use it to find places where the mean has changed over time, especially if your data is on a regular schedule, like by row number. It works pretty well at finding where a mean shift might have happened in your process data. It can find some of those breakpoints, and it's pretty good at it.
There are a couple of little problems. If the breakpoints are too close together, it can't find them. If your noise is too big, it can't find them. If you have a lot of them, it might only find the first one and not the later ones. But it's a really cool idea, because what partition is doing, just like the fused lasso, is trying to find neighboring groups that have similar means and then finding the differences between those means.
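As a minimal sketch of that traditional approach (assuming a data table with a Row column for time order and a response column Y; the names are made up), the partition launch is just:

```jsl
// Hypothetical sketch: use Partition to split a time-ordered response
// into segments with similar means. Each split point is a candidate shift.
dt = Current Data Table();
dt << Partition(
	Y( :Y ),
	X( :Row )
);
```

Each split the platform makes on Row is a candidate breakpoint where the mean changed.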
Going back to these advanced regression techniques, what we need to do is set up a special data structure with a really simple (and by simple, I mean short, not actually simple) little JSL statement that builds a matrix for me. What this matrix looks like is shown here on the slide. Imagine a lower-triangular matrix where the first column is all ones, the next column is a zero and then all ones, and the next column is zero, zero, and then all ones. We have a column for every row that we want to look at in our data table.
If we've got a hundred rows of data, we get a hundred columns. If you've got a thousand rows of data, you get a thousand columns in this triangular matrix. For simplicity, later on I'm going to call it the discovery matrix, but that's just a totally made-up term. It works for me, though.
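Here's a minimal JSL sketch of one way to build that kind of discovery matrix (my own reconstruction, not Clay Barker's original script; the column names are made up):

```jsl
// Build a "discovery" matrix of step columns in the current data table.
// Column j is 0 for rows before j and 1 from row j on, so a model that
// selects column j is claiming a mean shift starting at row j.
dt = Current Data Table();
n = N Rows( dt );
For( j = 1, j <= n, j++,
	col = dt << New Column( "Shift " || Char( j ), Numeric, Continuous );
	For( i = 1, i <= n, i++,
		col[i] = If( i >= j, 1, 0 );
	);
);
```

The first column ("Shift 1") is all ones and acts like an intercept; every other column encodes a step starting at one specific row.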
Setting this up in the past, we could take this discovery matrix, feed it into the partition platform, and do the same thing. It made partition much more sensitive for detecting the breaks, because when it split on a column, that column corresponds to a change in the mean at that row of the data set. It worked really well, and it was pretty simple to set up.
JMP has moved a lot in the past couple of years. One of my favorite things in JMP is called predictor screening. What predictor screening does is build a lot of trees and then average them together.
Okay, so here's the key thing. Predictor screening is going to make, say, 100 trees, and for each tree it takes a random sample of the columns from that discovery matrix and a random sample of the rows from my data, just a random sample of the Y I'm trying to explain. It builds a tree, and then it does that again and again and again.
What happens is that when one shift is close to another shift, the columns for those shifts don't end up in all the trees together, so we can find shifts that are close to each other, and we can find lots of shifts across the data set. This solves some of partition's problems: not being able to find things close together, and not being able to find more than a couple of shifts. This lets us find a whole lot more.
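A minimal sketch of launching predictor screening against those step columns might look like this (assuming the discovery matrix columns from the earlier sketch; treat the details as approximate rather than a definitive recipe):

```jsl
// Hypothetical sketch: screen all "Shift j" step columns against Y.
// Columns that rank highly are candidate change points.
dt = Current Data Table();
n = N Rows( dt );
shiftCols = {};
For( j = 2, j <= n, j++,
	Insert Into( shiftCols, Column( dt, "Shift " || Char( j ) ) );
);
dt << Predictor Screening(
	Y( :Y ),
	X( shiftCols )
);
```

The top-ranked columns are the rows where a mean shift most plausibly starts.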
I set up a sandbox to test how well this actually works. How small is small, and what does my sandbox look like? Well, I want to find things that are a 1-2 Sigma shift. I want to find systematic changes. This is where I'm going with the accidental manufacturing experiments, or changes in raw materials: they didn't take us out of control, but they might have shifted things up a little, and when that lot of material runs out, it shifts back down again. These are tiny shifts happening while the process is running, and this lets me go back and forensically diagnose some of them. I want to find really small things.
You can imagine something like this. In my sandbox, I've got about 200 rows. In the first hundred rows, I've got an upshift of plus 1 at row 33 and a downshift of minus 1 at row 66. Let's say the mean is 3, shifting up to 4, and then back down to 3 again, with noise on top of that. This is the really clean, pretty picture. You look at this and say, "Well, Byron, if you start looking at data like that, I can find that all the time." Okay, but I'll tell you the catch at the end.
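Here's a minimal JSL sketch of generating one of these sandbox data sets as described (my reconstruction, not the actual sandbox tool):

```jsl
// One simulated sandbox set: mean 3, +1 shift at row 33, back down at row 66,
// then pure noise from row 101 to 200 where nothing should ever be detected.
n = 200;
dt = New Table( "Sandbox", Add Rows( n ) );
yCol = dt << New Column( "Y", Numeric, Continuous );
For( i = 1, i <= n, i++,
	mu = If( i >= 33 & i < 66, 4, 3 );
	yCol[i] = mu + Random Normal( 0, 1 );
);
```

With unit noise, that's a 1-sigma shift, right in the small-shift range the sandbox is meant to probe.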
The blue range of this is where I've got no shift. In the first 100 rows, I've got defined shifts; in the second hundred, there's nothing, just noise. I should never find anything in that section, and I should always find shifts at 33 and 66.
It's pretty bold to say that I've got a method that can do this when the signal is easy to find. But let's say my data looks something like this: several thousand sets of data, all plotted on top of each other. There's a haze, and I always want to find the shifts at 33 and 66. On average, you can still look at this data and say, "Yeah, Byron, I can totally see that's there." But some of these sets are going to shift really cleanly, some of them aren't going to shift visibly, and some of them are going to have noise or random walks in that blue section by chance alone. How good am I at detecting a change versus not detecting a change?
Here's an example to make this a little clearer. At about row 33, that first shift, I can see just by looking that something went up. But that second shift looks like a trend down; it's not real clear. And in the no-drift, no-shift section toward the end, is it going up? Is there a saw-tooth pattern? I'm not sure. These shifts might not be easy to see or detect.
Okay, I'm going to get jumpy here for a little bit and show you what this actually looks like in real life, and then we'll come back to evaluate some of the results. All right. Here's my toolbox. I've got a random sample of data, and this one has the shift up labeled with a red line and the shift down labeled with a red line. This is just a control chart of the data. Just looking at it, I would have a hard time saying, yes, there's really something there.
I come down here and look at a CUSUM control chart, and the purple lines are where shifts might happen. Here we can see, lined up pretty clearly with the red, that CUSUM is detecting that shift. If I change the K, H, and Sigma values in here, I can make this more sensitive. Let's make this 0.15. Now, by changing the K, I can bring that in a little smaller. Let's make it even smaller. It's not finding 66, but it's finding something out here in the 80-90 range, saying something happened. But this is a good test.
I can come down here to my EWMA chart. Let me make this a little bigger. I have my Lambda slider, and I can change it. If I go here, there's no memory at all; if I go here, there's all memory, and it's pretty much exactly like a CUSUM chart. I said that, and I know there are people who do control charts a lot who are going to hear it. Please don't kick me out for saying it, but it is what it is. Here, by adjusting Lambda, I can get change points identified right around 33, but I can't really find anything out around 66.
Let's look at the third method. Here I'm using predictor screening, and I've made a little Pareto chart. For predictor screening, I've set up my discovery matrix. We could take a look at the data set really quick... no, I'm not going to look at the data set. Trust me, it's there. What I see here is that the first guess says there could be a shift at row 33, and guess number 2, rank number 2, is 65. We see something out at 142, and then 35, and then 40, and 80, and 70. Our first couple of guesses are hitting right on target.
What I've done on this graph is draw a blue line for each of the rows indicated by predictor screening, with the proportion it explains shown as the thickness of the blue line. It's saying: I think we see something around 33, and I think we see something around 65. This does a pretty good job of finding that shift in the mean, because you can see there's this one point right here, and if you were to ignore that point completely, it would look like there was a shift of this stuff relative to this stuff. Our eyes catch that point, and the other methods catch it too and try to include it. But it wasn't included in all the trees, so it still let us find that shift, which is pretty cool.
How frequently is this going to work correctly? I would want to know that. I'm going to come back over here to PowerPoint. Let's get this thing going again. What I did was simulate thousands of these individual data sets. Each one was equally noisy; some of them were easy to detect and some were hard, but there were thousands of them. I ran predictor screening against each one of those sets of data, captured the first 10 ranks, and checked: did you pick the right row or not?
This is what I found. The majority of the time, we're picking really close to or right on the shift points that we defined. Very infrequently, we're picking points out in that noise past row 100, where I know for sure nothing's happening. It looks like it's working pretty well.
Let's look at some of the stats. This is with 8,000 different sandboxes; by the time we get to Discovery, this will probably be a bigger number. Scored with predictor screening, using only 100 trees and looking at the top 10 ranks, with a window of plus or minus 3: did it pick within 33 plus or minus 3 and 66 plus or minus 3? Both of them at the same time in the top 10: 95% of the time, yes. It found the shift at 33 90% of the time, and the shift at 66 92% of the time. It's doing a pretty good job of finding these, which is fantastic.
In this graph, I'm trying to express this idea of sensitivity and specificity. From predictor screening, we get the first 10 ranks, the first 10 picks. I'm asking: what percentage of the time is a pick within a plus-or-minus-3 window, or within a plus-or-minus-12 window? The first pick was hitting the shifts almost always. In the noise, the first pick, and especially the first five, showed up very infrequently. It's a pretty sensitive method to use.
The conclusions. One, small shift detection is really accessible by multiple methods in JMP, and this idea of using the fused lasso approach is fantastic. A small Easter egg is coming in JMP 19 Graph Builder, in one of the line personalities; I don't know it for sure, but check that out when 19 comes out. These advanced methods are pretty easy because you can just use a script to generate that discovery matrix. Without any tuning, this fused lasso approach works really well. Just a little skill and a lot of opportunity.
Is there going to be a script you can use to do this yourself? Am I going to share that sandbox tool? Yeah. By the time Discovery comes around, there'll be a really nice add-in: you can pick your data and use the sandbox tool as an explorer for multiple shift detection methods. I'll also republish the script Clay had used to make that discovery matrix. Hopefully this was useful, or at least entertaining. I hope to see you at Discovery. If you see me out in the hallway, say hi. I should be there.
Catch you all later. Stay jumpy.