More Unsupervised Engineering: Hacking the Short Run Control Chart
So, let’s just say the quiet part out loud: if you’re strict about it, semiconductor manufacturing data, as it comes out of the databases, shouldn’t be on typical control charts, for many, many reasons. But fear not! There is a solution. Join Mike Anderson as he walks through a case study showing the pitfalls of using standard control charts with semiconductor data and how to address them using the new Short Run Control Chart introduced in JMP 18.
Let's have a look at this problem with control charting. By way of introduction, let's start with something really simple: what is a control chart? A control chart is actually a simple idea. It's a trend chart, or time series chart, with specific calculated limits and summary statistics that, when we abide by certain assumptions, let us check whether a process is behaving as we would expect it to over time. It doesn't really speak to whether the process is doing well or badly. It speaks to whether anything is changing.
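For reference, here is a minimal sketch of the textbook X-bar chart calculation described above, in Python rather than JMP: the center line is the grand mean of the subgroup means, and the limits sit three standard errors away, with sigma estimated from the average subgroup range divided by the standard d2 constant. It assumes equal subgroup sizes of 2 to 5, and the function names (`xbar_limits`, `out_of_control`) are hypothetical; JMP offers other sigma estimators.

```python
import math
import statistics

# d2 bias-correction constants for estimating sigma from the average range,
# indexed by subgroup size n (standard SPC table values).
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}


def xbar_limits(subgroups):
    """Return (lcl, center, ucl) for an X-bar chart from equal-size subgroups."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups):
        raise ValueError("this sketch assumes equal subgroup sizes")
    means = [statistics.fmean(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    grand_mean = statistics.fmean(means)
    sigma_hat = statistics.fmean(ranges) / D2[n]   # within-subgroup sigma estimate
    half_width = 3 * sigma_hat / math.sqrt(n)      # 3-sigma limits for a subgroup mean
    return grand_mean - half_width, grand_mean, grand_mean + half_width


def out_of_control(subgroups):
    """Indices of subgroups whose mean falls outside the X-bar limits."""
    lcl, _, ucl = xbar_limits(subgroups)
    return [i for i, g in enumerate(subgroups)
            if not lcl <= statistics.fmean(g) <= ucl]
```

Note that the limits come from the within-subgroup spread, which is exactly why the way subgroups are formed matters so much later in this talk.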
But that is an incredibly powerful tool when we're dealing with manufacturing. We want to know our processes are running steady and normal with no unexpected variation, or we want to be able to track drift and things like that. Control charts are an integral part of statistical process control. They're seen all over factories, and they're one of the cornerstones for judging the health of a process as it moves through manufacturing.
When we look at the assumptions I mentioned, three in particular are going to cause the headache I'm talking about today. The first is that we expect a randomized data stream. We expect to be randomly sampling parts from different lots, or randomly sampling sites on different parts, as much as possible.
We also expect to be able to use appropriate summary statistics. There are some questions there about normality and things like that; I'm not going to dive too deep into that. The first two assumptions are interlinked through the third, which is really the problem we're talking about today: no special cause variation. When I talk about special cause variation, I mean things that you, as an engineer or a scientist, look at and say, "Yeah, I expect to see that behavior because of x, y, and z." If you expect to see that behavior because of x, y, and z, that's an assignable, or special, cause.
Generally speaking, we want to get those out of the data stream as much as possible. Sometimes we can't, but for the most part we want to mitigate special cause variation, so that the only thing we see in our control chart is the main signal mixed with the random noise of life. That's the idea we're dealing with.
Why is this a problem in semiconductor manufacturing in particular? To start with, we have specific sites that we can measure on our chips, on our wafer. We've got this piece of silicon that we're inscribing wires and devices into, and there are only specific locations on it that we can measure. We also don't want to measure every single location, because that becomes extremely expensive. So we use sampling plans. In the middle and on the right you can see a couple of examples of different sampling plans that we use over time.
That in itself isn't much of a problem, although it does violate the randomization rule we talked about. Ideally, we'd want to randomly sample across all the available sites in the full wafer map I've got on the left here. It's not an insurmountable problem, but it does complicate things.
The bigger problem shows up when we look at the data that comes out of these plans. This is a fairly typical set of data from a semiconductor process: the full wafer map at the top, and two different sampling plans below it. If we were to just put this data, as it comes off the tool, on a control chart, we wouldn't see anything obviously wrong. What we would see is a fairly wide and fairly biased spread in the data points on our control charts. That spread is going to drive the mean in directions we really wouldn't expect, and in some cases it's going to make the process look off-target.
We can figure out why really quickly with what's called a radial transform. All I'm doing is computing the radius of each data point, its distance from the center of the wafer. When we do that, we suddenly realize something screwy is going on: in the full wafer map, there's a bump in the middle, and we see it regardless of sampling plan.
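The radial transform itself is a one-liner. Below is a minimal Python sketch, assuming each measurement is an `(x, y, value)` tuple with coordinates measured from the wafer center (the center is a parameter in case the coordinates are offset); `radial_transform`, `radial_profile`, and the bin width are hypothetical names and choices. Averaging by radius bin is one simple way to make a center bump like the one above stand out.

```python
import math


def radial_transform(sites, center=(0.0, 0.0)):
    """Map (x, y, value) measurements to (radius, value) pairs."""
    cx, cy = center
    return [(math.hypot(x - cx, y - cy), v) for x, y, v in sites]


def radial_profile(sites, bin_width=25.0):
    """Average value in each radius bin, keyed by the bin's lower edge."""
    bins = {}
    for r, v in radial_transform(sites):
        key = int(r // bin_width) * bin_width
        bins.setdefault(key, []).append(v)
    return {k: sum(vs) / len(vs) for k, vs in sorted(bins.items())}
```

Sites that look unrelated on the wafer map (a dot here, a blob there) land in the same bin if they share a radius.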
Those of us in the field would look at that and go, "Oh, I know what that is." We might see it go up, or we might see a dip, depending on the technology that is generating the data, but we would say, "Oh, yeah. I know exactly where that's coming from," and that is an assignable cause.
In this case, the other two problems we have, the lack of randomization in the sampling plan and that assignable cause, are working together against us. Let's look very closely at the mean. Set aside for a second whether the mean is the right statistic to use here; we're just using it for illustration.
The mean of the full wafer data is around 112, or call it 113 with rounding error. This sampling plan reads almost half a unit high. A different sampling plan, though, gets us within rounding error of the number we would actually expect. Coupling a non-randomized sampling plan with special cause variation leads to unreliable summary statistics. These three issues are causing all kinds of trouble with one another. The interesting thing is that, because of that interplay, we can actually use it to fix the problem itself. It's an interesting repurposing of these problems.
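To see how a sampling plan plus an assignable cause biases the mean, here is a small Python sketch on a synthetic, noise-free wafer. The numbers are hypothetical, not the data from the slides: the wafer is flat at 112 except for a one-unit bump inside radius 50, on a wafer of radius 150. Plan A puts 5 of its 9 sites inside the bump; plan B puts 1 of 9 there, close to the bump's share of the wafer area, (50/150)² = 1/9.

```python
import math
import statistics

WAFER_RADIUS = 150.0   # hypothetical wafer radius, same units as the site coordinates
BASELINE = 112.0       # hypothetical flat-process value
BUMP = 1.0             # hypothetical center bump (the assignable cause)


def thickness(x, y):
    """Synthetic, noise-free wafer: flat everywhere except a bump inside r < 50."""
    return BASELINE + (BUMP if math.hypot(x, y) < 50 else 0.0)


def plan_mean(sites):
    """Mean of the synthetic thickness over a list of (x, y) sites."""
    return statistics.fmean(thickness(x, y) for x, y in sites)


# Full wafer map: every point of a 10-unit grid that lies on the wafer.
full_map = [(x, y) for x in range(-150, 151, 10) for y in range(-150, 151, 10)
            if math.hypot(x, y) <= WAFER_RADIUS]

# Hypothetical sampling plans with 9 sites each.
plan_a = [(0, 0), (30, 0), (-30, 0), (0, 30), (0, -30),
          (100, 0), (-100, 0), (0, 100), (0, -100)]
plan_b = [(0, 0), (100, 0), (-100, 0), (0, 100), (0, -100),
          (140, 0), (-140, 0), (0, 140), (0, -140)]
```

Comparing `plan_mean(full_map)`, `plan_mean(plan_a)`, and `plan_mean(plan_b)` shows plan A reading roughly half a unit high while plan B lands close to the full-map mean, even though the process is identical in all three cases.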
To put it another way, if we had no special cause variation on a wafer, we would expect to see the topology on the left: a flat, static value with random noise. There are places in the fab where we do see this. We see it in plating, sometimes in CMP, and in photoresist as well, if we've got everything nicely dialed in with our deposition and spin coat.
We can see that behavior in the fab, but when we get to our deposition tools or our etch tools, we see this topology instead, which is a bit of a problem. Quite a lot of a problem, actually. This picture is exaggerated, obviously, but it's a toroidal shape that pops up all over the place. You may not notice it at first, because the sampling plan sometimes masks it: it'll show up as a dot here and maybe a blob over there, but they all sit on the same radius. It's a good habit to transform the data as a matter of course and ask, "Are those connected somehow?" In this case, that's the problem.
Let's summarize the problem. We have little to no randomization. We have appropriate summary statistics, but they're problematic because the lack of randomization is coupled with special cause variation. A lot of the time in the plant, we have special cause variation and don't account for it in our control charts.
Our control charts are therefore unreliable, because the special cause variation can do all kinds of screwy things. It can drive down the standard deviation and pull the calculated limits in, because our variation appears lower, and then we start getting false alarms. Or it can do the exact opposite: it can blow out our control limits and make everything look hunky-dory when, in fact, things are dancing all over the place and are very much out of control. We need to fix this.
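Both failure modes can be reproduced with a few lines of Python. This sketch uses hypothetical numbers: 15 sites per wafer, 5 per zone, identical noise in every zone, and a fixed offset per zone standing in for the radial special cause. If the whole wafer is one subgroup, the zone offsets leak into the within-subgroup spread and inflate it, so the limits blow out. If each zone is its own subgroup, the within-subgroup spread is pure noise, but the subgroup means jump between zones by far more than that noise predicts, so the limits look too tight and the chart raises false alarms.

```python
import math
import statistics

# Hypothetical within-wafer noise, identical in every zone, so the only
# difference between zones is the systematic (special cause) offset.
NOISE = [-0.2, 0.1, 0.0, 0.2, -0.1]
ZONE_OFFSET = {"center": 1.0, "ring": 0.0, "edge": -0.5}   # hypothetical offsets


def wafer_sites(base):
    """15 sites: 5 per zone, each zone shifted by its systematic offset."""
    return {zone: [base + off + e for e in NOISE] for zone, off in ZONE_OFFSET.items()}


sites = wafer_sites(base=112.0)
pure_noise_sd = statistics.stdev(NOISE)

# Case 1: one subgroup per wafer. The zone offsets inflate the spread.
mixed_sd = statistics.stdev([v for zone in sites.values() for v in zone])

# Case 2: one subgroup per zone. Pooled within-zone sigma is just the noise...
sigma_within = math.sqrt(statistics.fmean(statistics.variance(v) for v in sites.values()))
zone_means = {z: statistics.fmean(v) for z, v in sites.items()}
grand = statistics.fmean(zone_means.values())
half_width = 3 * sigma_within / math.sqrt(len(NOISE))
# ...so zones that are behaving exactly as expected still fall outside the limits.
alarms = [z for z, m in zone_means.items() if abs(m - grand) > half_width]
```

Here `mixed_sd` comes out several times larger than `pure_noise_sd`, and `alarms` flags the center and edge zones even though nothing in this synthetic process changed.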
It turns out that JMP 18 gives us a brand-new way to fix this, and it's quite handy. Let's go over and have a look. In JMP 18, we have the Short Run Control Chart. Depending on where you got your statistical process control experience, be it Six Sigma, formal training, or one of the other programs, you may hear it called a short run control chart, a target chart, or a Z chart. They're all the same flavor of chart.
What makes these charts interesting is that, instead of looking directly at the average value, we look at the deviation of each subgroup from its expected value, its target. In this case, what I've done is make some pseudo products within the wafer. For those of us in the industry, segmenting a wafer into rings, quadrants, or other shapes to monitor different things isn't groundbreaking. What's unique here is that we're putting it all on one control chart and treating it as one data stream, because it is one.
All those different parts of the wafer see the same process, as far as the recipe goes. They just see different parts of the same process. We don't really have the ability to control that fine toroidal shape. We have certain knobs on the tool that we can change, but for the most part the zones are highly correlated, and we can't do much about that correlation.
So we can treat them as different products seeing the same process, and that lets us use the Short Run Control Chart. That's really handy because it lets us assign a different target value to each pseudo product, which in turn lets us model out that toroidal shape in the middle. We can basically say, "No matter what we do, we expect this middle section to be much higher than the others. We'll assign it a higher target and make sure it's sitting where we expect it to be relative to the ring and the outer edge." That's the idea.
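The core of the centered short run chart can be sketched in Python: each plotted point is the subgroup mean minus the target of that subgroup's pseudo product, so every zone shares one center line at zero. This is a minimal illustration, not JMP's implementation; the targets below are hypothetical, and in practice they would come from historical data or engineering knowledge of the radial profile.

```python
import statistics

# Hypothetical per-pseudo-product targets.
TARGETS = {"center": 113.0, "ring": 112.0, "edge": 111.5}


def centered_points(subgroups, targets=TARGETS):
    """Centered short run plot points: subgroup mean minus its product's target.

    subgroups: list of (product, [measurements]) in time order.
    """
    return [statistics.fmean(values) - targets[product] for product, values in subgroups]
```

A higher target for the center zone is what "models out" the bump: the center's expected offset is subtracted before plotting, so only departures from that expected shape remain.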
Now, how did I do this? Let's start with segmenting the process. I'm going to pull up my pseudo products; this is what they look like. I defined zones by radius: 0-50, then 50-100, and then 100-150. When I assigned those, I took the time to make sure each pseudo product had the same number of data points in it; in this case, each has five. That's 15 sites in total, which is in the range of normal sampling plans in the fabs I've seen (roughly 13-15), but we want a number that divides evenly across the zones. So we have five sites in each.
Then, in the data set, all I did was create a formula that says if my radius is less than 50, it's zone 1; if it's less than 100, zone 2; otherwise, zone 3. To make it easier for everybody to read, I assigned value labels so it's clear which zone is which. That's really all the magic we need to make this work.
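Here is the zone formula described above as a Python function, rather than a JMP column formula, plus a quick check that the zones get equal site counts. Two assumptions to flag: a radius exactly on a boundary (50 or 100) goes to the outer zone, and the "Center", "Ring", "Edge" value labels are hypothetical names matching how the zones are described later in the talk.

```python
from collections import Counter

ZONE_LABELS = {1: "Center", 2: "Ring", 3: "Edge"}  # hypothetical value labels


def zone(radius):
    """Zone 1 for r < 50, zone 2 for 50 <= r < 100, zone 3 otherwise."""
    if radius < 0:
        raise ValueError("radius must be non-negative")
    if radius < 50:
        return 1
    if radius < 100:
        return 2
    return 3


def sites_per_zone(radii):
    """Count sampling-plan sites per zone; equal counts keep subgroup sizes equal."""
    return Counter(zone(r) for r in radii)
```

Running `sites_per_zone` on a candidate sampling plan's radii is a cheap way to confirm each pseudo product gets the same number of sites before building the chart.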
Next, just to make the control chart look a little cleaner, I created a wafer ID (these are fake wafer IDs). Then I added each product as a subscript on it, so the products are broken up in the control chart and you can see them a little better. That's really all we need to do to get this done.
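The wafer-plus-zone subgroup label can be sketched the same way in Python. The `"W05-Center"` label format and the `(wafer_id, zone_label, value)` row layout are assumptions for illustration; the point is that each wafer contributes one subgroup per zone, in time order.

```python
from collections import defaultdict
import statistics


def subgroup_key(wafer_id, zone_label):
    """Combine a (hypothetical) wafer ID with the zone label, e.g. 'W05-Center'."""
    return f"{wafer_id}-{zone_label}"


def summarize(rows):
    """rows: iterable of (wafer_id, zone_label, value) in measurement order.

    Returns {subgroup_key: (n, mean)}, keeping first-seen order.
    """
    groups = defaultdict(list)
    for wafer_id, zone_label, value in rows:
        groups[subgroup_key(wafer_id, zone_label)].append(value)
    return {k: (len(v), statistics.fmean(v)) for k, v in groups.items()}
```

Checking that every `n` in the summary equals five is another way to catch a sampling plan that doesn't split evenly across zones.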
Now let me show you why this is important. Breaking up these rings gives us a center, a ring, and an edge, and each of those has its own target value. The edge, we expect to be somewhere around 308 or 309; maybe 318 for this one, and 310 for this one. Handling those different targets is what we're going to let the control chart do. It's basically going to plot three control charts in one graph, using three different targets.
Let's go ahead and build this. To build it, we go under Analyze, Quality and Process. It also exists under the Control Charts menu, but I like building it from Control Chart Builder. Let me make sure something's turned off. Good. I'm going to start by changing my control chart type to Short Run. The reason for that is that I now get a part role (product is another way of talking about this), and I can put my zone in that group here. So I'll take my zone and drop it over here. You can see I've got my summary statistics; I've got everything in place.
Now I'm going to take that wafer-and-zone column and drag it into my subgroup. I'm only looking at ten lots of material here; you could do more, it just takes a little longer to render. Then we'll put my thickness in there. By default, this control chart sets up with little dividing lines between each product. I'm not crazy about that, so we can turn it off under the red triangle. All the answers are under the red triangle. I'm going to turn that off.
Now, some advantages of the Short Run Control Chart. The primary one, as I said, is that we can put multiple products with similar conditions, products we'd expect to behave in a similar manner, on one control chart. That's one massive benefit. The other is that, because the plotted values are deviations from each product's target, the control limits don't shift from product to product; apart from the estimated standard deviation, they depend only on the subgroup size. That's really handy for these long control charts: as long as your subgroup size stays the same (in this case, 5), the limits won't move, which is wonderful because you don't have to recalculate them.
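Why the limits stay put can be seen in a textbook version of the centered chart, sketched here in Python. The assumption is that sigma is pooled from deviations around each subgroup's own mean (so the product targets never enter it) and that the limits are ±3σ/√n around zero. That is one common estimator, not necessarily the one JMP uses by default, and the function names are hypothetical.

```python
import math
import statistics


def pooled_sigma(subgroups):
    """Pooled within-subgroup standard deviation across all products.

    subgroups: list of (product, [values]); each subgroup needs >= 2 values.
    Deviations are taken from each subgroup's own mean, so targets do not matter.
    """
    ss = sum((len(v) - 1) * statistics.variance(v) for _, v in subgroups)
    dof = sum(len(v) - 1 for _, v in subgroups)
    return math.sqrt(ss / dof)


def centered_limits(sigma, n):
    """(lcl, center, ucl) for a centered short run chart of subgroup means."""
    half_width = 3 * sigma / math.sqrt(n)
    return -half_width, 0.0, half_width
```

Shifting one product's values by a constant leaves `pooled_sigma` unchanged, and for a fixed `n` the limits are the same for every product on the chart.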
That holds whether we use the centered or the standardized version. The standardized version divides by the standard error, so it's a Z score. You'll notice the chart has changed to a Short Run Z chart up here; that's where the Z comes from. Instead of plotting just the difference from target, as with the centered chart, we're plotting the Z score of each subgroup. In some cases that's more useful, depending on how closely matched your standard deviations are across products. You can look in the documentation to see how that works out.
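Here is a Python sketch of the standardized (Z) form: each point is the subgroup mean minus the product's target, divided by that product's standard error, σ/√n. It assumes each product's sigma is known or estimated separately. In this textbook form the limits are fixed at -3, 0, and +3 for every product, because both the per-product sigma and the subgroup size are absorbed into the standard error. JMP's exact computation may differ, and the names are hypothetical.

```python
import math
import statistics

Z_LIMITS = (-3.0, 0.0, 3.0)   # fixed limits for the standardized chart


def z_points(subgroups, targets, sigmas):
    """Standardized short run points: (mean - target) / (sigma / sqrt(n)).

    subgroups: list of (product, [values]); targets and sigmas map product -> value.
    """
    points = []
    for product, values in subgroups:
        se = sigmas[product] / math.sqrt(len(values))   # standard error of the mean
        points.append((statistics.fmean(values) - targets[product]) / se)
    return points
```

This is why the standardized chart helps when zones have noticeably different spreads: a noisy edge zone and a tight center zone are put on the same scale before plotting.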
But, again, these are incredibly powerful. You can see really clearly how this shift pops out. We can look at it and go, "These three groups appear to be from the same wafer." Those are from lot 6, wafer 5. If we look down here, oh dear, lot 7 wafer 1 had a big jump, and it stayed there through two lots and then bounced back up. With this chart, we can see the trends and patterns that indicate the health of our process, or changes in it, much more easily.
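Shifts like the one above are what run rules are designed to catch. Below is a minimal Python sketch of two common checks applied to centered or standardized points: points beyond the limits, and a run of consecutive points on one side of the center line. The run length is a parameter; eight is the classic Western Electric choice, and JMP's built-in tests use their own definitions, so treat this as an illustration only.

```python
def beyond_limits(points, lcl, ucl):
    """Indices of points outside the control limits."""
    return [i for i, p in enumerate(points) if p < lcl or p > ucl]


def same_side_runs(points, center=0.0, run_length=8):
    """Indices at which a run of `run_length` consecutive points on one side of
    the center line is reached (and every later point that extends the run).

    A point exactly on the center line breaks the run.
    """
    flagged, run, side = [], 0, 0
    for i, p in enumerate(points):
        s = (p > center) - (p < center)   # +1 above, -1 below, 0 on the line
        if s != 0 and s == side:
            run += 1
        else:
            run, side = (1, s) if s != 0 else (0, 0)
        if run >= run_length:
            flagged.append(i)
    return flagged
```

A level shift that sits inside the limits, like a zone that jumps and stays there for two lots, tends to trip the run rule even when `beyond_limits` finds nothing.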
More importantly, those trends and patterns can also show us changes in wafer topology. Say we're doing something specifically to drive down that middle section; maybe we've got a new tuning on our tool. We can tune it down and see whether the middle section starts coming back in line with the rest of the wafer. We can see those things now, because we're modeling out that confounding variable, the correlated factor that we built in.
Again, this is incredibly useful and easy to work with. One more time: the value of a Short Run Control Chart is in combining similar, smaller product runs that share common conditions but have different expected values. In our case, we're repurposing it to model out the special cause variation we see, which gets us back to a condition where we're not violating the assumptions of the control chart itself.
If you want some additional reading, the documentation is very good. There's also a really good book on this, Innovative Control Charting, which came out in 2006. If you read our documentation and look at the notes, that book is the primary reference for the methods in this particular technique.