Scripting an Interactive Tool for Exploration of Historical Throughput Data
Semiconductor factory capacity modeling involves maintenance of throughput values (as parts per hour [PPH]) for thousands of process recipes. Processing data is stored for each run, providing a rich pool of historical data that can be used to determine PPH values. In capacity modeling, inefficiencies that degrade capacity are isolated from throughput, so modeled PPH should represent the highest capable performance under continuous operation. These requirements complicate the process of sifting through historical data, and special care is needed from process experts to identify non-steady-state runs that should be excluded from the analysis.
Using JSL, an interactive tool was created to assist in analysis of historical throughput data to evaluate, select, and document PPHs by recipe for use in capacity modeling. The script constructs a fully interactive, self-contained dashboard application that utilizes JSL-enhanced platforms and custom controls to assist in exploring sources of variation and isolation of steady-state runs. PPH value selection is made and documented directly in the dashboard, and finally, a summary of the selected values and distributions can be exported to PowerPoint for a complete analysis report.
This presentation includes a demonstration of the dashboard tool and walks through parts of the script.
Hi, my name is Michelle Terwilliger. I'm an industrial engineer for Texas Instruments at our South Portland, Maine location. Today, I'll be talking about an interactive tool for exploring historical throughput data made using scripting in JSL.
Before I talk about my script, I want to share first a little bit about factory capacity modeling, just to give some background on how we're using JMP to do it. We start with our information about a toolset that we want to model capacity for, including the throughput by recipe and total toolset hours for that toolset.
We then apply a Start Profile that we're considering running in the factory, find our passes by recipe, and combine that with the throughput by recipe to get the required production hours to run that material.
We then take the inefficiencies that we expect to have for the toolset, and we use that to scale the total toolset hours, which gives us the planned actual production hours. Then the ratio of these two values gives us percent loaded. That tells us whether or not we can run that planned profile or if we need to add or remove any wafers.
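In other words, restating the ratio just described in plain terms:

percent loaded = required production hours ÷ planned actual production hours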
For this presentation, I'll be talking about throughput by recipe. This is something that we measure in parts per hour (PPH); you'll hear me say that a few times. Throughput values are stored for pretty much every recipe that we have in the factory. That's thousands of individual recipes. As you can imagine, that's quite a bit to store, maintain, and keep up to date over time.
I also want to point out that the inefficiencies are completely isolated from that throughput. When we're thinking about throughput values for the model, we want to try to capture values that are representative of the highest possible performance that that recipe can maintain, modeled without any inefficiencies.
As we think about selecting values for the model, there's a few conditions that we want to keep in mind. We want the highest capable performance, no inefficiencies. We want to represent steady state conditions under continuous operation. Just a reminder that the inefficiencies are represented elsewhere in the model, so we're not ignoring them.
Now, this is very difficult or impossible to replicate in a manufacturing environment. Rather than trying to run tests that reproduce this scenario, we use historical production data to try to find insight into what values might be if we were to run in this scenario. That's where JMP comes in.
We use JMP to visualize and explore the sources of variation. We use it to exclude runs with known inefficiencies, isolate the runs that are representative of steady state conditions, and then document those results.
As we approach a data set for review, we'll have a sample of throughput measurements for a given recipe, and we can assume that within this, there are going to be losses from many different sources: things like tool differences, differences between recipes, the influence of the previous run on what the next run looks like, load timing, number of chambers used, et cetera.
Given that, we should also have measurements where those losses approach zero, which should result in an upper limit to the observed throughput when all of the losses are as close as they can be to zero. It turns out that that is the case: we really see quite a significant plateau in our data at an upper limit here. This is what we're looking at for that best capable performance that we want for modeling purposes.
There's a process to selecting PPH. We'll start by isolating the steady state runs, keeping in mind that some runs might appear faster, but they occur in situations that can't be maintained over multiple runs, so they're not steady state.
We then want to remove the runs that have identifiable sources of throughput losses, items like we see up here. We want to select a PPH value for the model according to that observed upper limit. We tend to use the 98th percentile as a suggestion, but there's obviously wiggle room there; a small sketch of that suggestion follows.
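As a rough illustration of that suggestion, here is a minimal JSL sketch, not the production script; the table reference dt and the column name "PPH" are placeholders:

```jsl
// Suggested starting value: the 98th percentile of the runs kept by
// the steps above ("dt" and "PPH" are placeholder names).
pphValues = Column( dt, "PPH" ) << Get Values;  // matrix of run throughputs
pph98 = Quantile( 0.98, pphValues );            // suggested model PPH
```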
Then we repeat this for all recipes and export a report of the findings. I'm going to move into a demo. This is an example table that we would export for run data from a toolset. You'll see each row is one run of various recipes across a variety of tools. We have some PPH measurements taken with a couple of different methodologies. Then further over here, I have some of those factors that influence variability.
I'll run my PPH tool, and we'll start with this column selector. I'll use Takt PPH for this toolset, and my sources of variation are grouped here. We're going to look at tool, total wafers, chamber count, and previous recipe spec.
I'll run this. This is our main view of our dashboard. Up here, we have some global controls. This panel contains distributions of each recipe spec, and you'll see below each one, I have some custom controls that apply directly to the recipe spec above it. I have a summary table down here that contains information for each recipe, and this will populate as we do a review.
This panel is for analysis. Right now, it's pretty empty, but you'll see it will populate as we start to click on some of these options here. I've got time plots for each recipe, and then finally, just data filters over here. We'll start with this first recipe.
First thing I'll do is filter to it. You'll notice that this predictor screening changed when I did that. Now we can see that for this recipe, the biggest factor contributing to variability is tool. I'll click on that and then click variability plot, and I can color by selected.
Now we can see within this data set how it varies across tool. What I notice here is that pretty much every tool is capable of hitting that upper limit that we noticed. I would feel pretty comfortable placing my model PPH value at this limit.
I should point out that in this example, I've got a line here for model PPH. This would be the previous value that we were modeling for this toolset. In this case, we're way understating the capacity for this tool. We're saying it can run at 35.1, but actually it's capable of 37.48.
There's a whole gap here where we're understating capacity for the factory when this isn't set correctly. It's important to find the proper value so that we can really model the correct capacity for the factory.
When I'm thinking about selecting this value, another option I have, if there are some points here that seem like flyers that you want to remove from the data, is a button called Trim, which will take off the top and bottom 1% of the data.
We could select either the model PPH or this recommended 98% PPH, or we have the option to enter a custom value that you can type in and move up and down using these buttons. Then once you feel like you've found the value that you want, you can select it down here and hit Select. You'll notice that it populates down in the summary table. We can also add notes if we want to, and those will populate here.
Now I'll move on to a little bit more of a complicated recipe spec to show what it's really like to dig into data like this. I'll start by filtering. You'll see that this panel updated. Now, our top contributing factor to variability in this data set is previous recipe specs. What was running before I loaded this lot of wafers?
I'll look at that variability plot and I'll color it. To me, this one stands out quite a bit: it runs faster when it's running after recipe B. You'll notice we're actually looking at recipe B, so that satisfies the steady state condition; we're running the same recipe one after the other. That's the most efficient way to run this recipe, and that is what we want to capture in the model.
I'm actually going to select the rest of them, and I'll use this button to just hide and exclude, so we can get rid of that source of variation that's known and identified. Now, looking at our predictor screening, tool is, once again, the biggest contributing factor.
I'll just take a look at that. To me, it seems like most tools are capable of hitting that high value. I would feel pretty comfortable, once again, with selecting this 98% recommended value. I do notice that tool 6 has quite a significant tail compared to the others. That's something I might want to make a note of so that I could talk with a process or equipment engineer and see if there's anything they want to check on that tool. I'll put that note in and I'll select my 98% value.
Now let's say you're doing this review and you want to revisit a previous review, or you didn't finish and want to import a partially finished summary. We can click on this button and import a summary table, which would be a saved-off version of this table, and it will actually repopulate the table with all of your previous values, as well as update all of the selected values in the distributions. You can just pop back into a review that you had done previously.
Then finally, once you feel you're done and you're ready to export your results, this button here will create a PowerPoint that has a table of all of the values that you selected, your notes, where they came from, and then thumbnails of each recipe spec. You could use these in another import, and then this table can get imported right into the model.
Moving back to my slideshow. Rather than step through the entire script for this tool, which is quite extensive, I'm going to focus on three distinct features that I think really bring it all together and make it possible.
First, when you're working with display elements created using a By variable, it can get complicated to interact with specific elements within the group. This would be my list of distributions by recipe: how do I select which one I want to interact with?
My solution for that is to name the platform when I create that distribution, and that gives me a list of references to the individual display elements. I can use that with a loop or reference directly by index, as in the sketch below.
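Here's a minimal sketch of that idea, using JMP sample data rather than the production table; the column names and reference-line value are placeholders:

```jsl
// Launching a platform with a By variable returns a LIST of platform
// references, one per By group.
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
distList = dt << Distribution(
	Continuous Distribution( Column( :height ) ),
	By( :sex )
);
// distList[1], distList[2], ... each reference one distribution, so
// you can loop over them or index a specific one directly.
For( i = 1, i <= N Items( distList ), i++,
	Report( distList[i] )[Axis Box( 1 )] << Add Ref Line( 60, "Dotted", "Red" )
);
```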
Second is managing data. There are a lot of different values associated with each recipe spec, and different aspects within this report, that need to be maintained and passed between display elements and other places. The solution is another indexed list of recipe names that maps back directly to the display-element list we had up here, using associative arrays to store the data by name; see the sketch below.
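A hedged sketch of that structure; the recipe names and array names here are placeholders:

```jsl
// An ordered list of keys that parallels the display-element list,
// plus associative arrays keyed by those same names.
recipeList = {"Recipe A", "Recipe B", "Recipe C"};   // placeholder names
quantilePPH = Associative Array();
selectedPPH = Associative Array();
For( i = 1, i <= N Items( recipeList ), i++,
	quantilePPH[recipeList[i]] = .;   // filled later from the summary table
	selectedPPH[recipeList[i]] = .;
);
// Because recipeList[i] pairs with distList[i], the same index ties a
// recipe's data to its display element anywhere in the script.
```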
Third is the interactive elements: they need to identify which distribution you want them to interact with when you click on them. They need to be tied back to a specific recipe spec, but when you create them in a loop, they don't retain the index value that they were created with.
The solution is to give them the ability to self-identify their context by storing a reference to the parent element and then using the XPath command to extract the index value where they reside. I'll step through some examples of all of those things to hopefully make that a little more clear.
Here are my distributions created with a By variable, by recipe spec, and you'll see I've named it dist_right_here. That actually contains a list of all of the distribution elements that make up this display. Then I have my recipe names also stored in a list, which maps back directly to these.
I'm going to loop through this list of distributions and do things like populate the data by recipe spec. Right here, I have an associative array, and I'm populating it using the fourth column of my summary data table.
That's giving me my quantiles, and you can see that each individual recipe name points to a value. You can also interact with graph elements within the same loop. Here I'm making some changes to the axis, and in this case, I'm adding a reference line.
To do this, I'm using a nested naming convention here: my recipe list stores the name, and that gets nested inside of the associative array, which references the name and gives back the value. That nested technique is really a key feature of being able to pass things back and forth between all the different places that need access to that value; a sketch follows below.
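Continuing the sketch from above, with recipeList, quantilePPH, and distList as before; dtSummary is a placeholder name for the summary data table, and the column position and line style are assumptions:

```jsl
// Pull each recipe's quantile from the 4th column of the summary
// table, store it by name, then draw it as a reference line on that
// recipe's histogram axis.
For( i = 1, i <= N Items( recipeList ), i++,
	quantilePPH[recipeList[i]] = Column( dtSummary, 4 )[i];
	Report( distList[i] )[Axis Box( 1 )] << Add Ref Line(
		quantilePPH[recipeList[i]],   // nested lookup: list index -> name -> value
		"Dashed", "Blue", "98%"
	);
);
```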
Next, the interactive elements get created within the same loop that we just talked about. They get appended as siblings to each of the individual distribution elements, which I reference here with an index.
Here's an example for the Select button. There's a lot of code going on here, but it starts with this path check, walking the display path back to the distribution box. I have a reference stored to the parent container that held the distribution when it was created, so this function can step back to it, and it returns text containing a number, which is the index at which that display element exists.
When you click the Select button, it checks where it is, and it gives an index that maps back and says, "Okay, I want to select this for recipe A." Then that recipe, whether it's the index or the name, can get passed to other functions. You can do things like interact with specific rows in your summary table by that recipe name, change your display elements, or access data values. Anything you need to do with that recipe name can be done in other functions once you've identified which recipe the click came from; a sketch follows.
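The tool itself does this identification with the XPath lookup just described. As a hedged alternative sketch that achieves the same self-identification without it, you can bake the literal index into each button's script at creation with Eval( Eval Expr() ); the Outline Box(1) subscript assumes the default Distribution report layout:

```jsl
// Append a Select button as a sibling of each distribution's outline.
// Expr( i ) is replaced by the literal index when the button is built,
// so each button knows its own recipe at click time.
For( i = 1, i <= N Items( recipeList ), i++,
	Report( distList[i] )[Outline Box( 1 )] << Sib Append(
		Eval( Eval Expr(
			Button Box( "Select",
				idx = Expr( i );             // frozen at creation
				recipe = recipeList[idx];    // map index back to the name
				Show( idx, recipe );         // stand-in for the real select logic
			)
		) )
	)
);
```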
I actually have another demo that I'm going to go through. This is a smaller, shorter example made using JMP sample data, just to show a simpler example doing some of the same things that I have in the bigger script. I want to emphasize that this is a lot simpler than it seems: it's only 73 lines of code if you take out the spacing and the comments. This one I will go through step by step.
I'll start by running the script. It opens up this Consumer Prices sample data. We have a summary table that has each of our different items, the mean price, and then a selected price, and underneath each distribution, just a small interactive element here.
You can default to the mean value, and you can move this up and down, or you can select a value and then select your price, and that gets populated here. Just a little example to show a similar thing to what I've done in the larger script.
Let's step through this script, just starting off with the basics here. I've got declarations: my list of items; my associative arrays for mean price; a temporary price, which is a placeholder to keep track of what's in this value; and then my selected price, which is used to grab the value that you've chosen from here and populate it into the summary table there.
I start by building the summary table, changing some formatting, and adding a new column. Then I build a display window, and inside of this H Splitter box, I have a reference to that summary table that we just made, and now I'm creating my distributions right in there. You'll notice, once again, this parent reference to the scroll box that holds the distributions, named in a reference here. This is the entire thing used to create the display box that you're seeing.
Now, right after that, I create my list of items, which is going to map back to the individual display elements within this list of distributions. Now I'll do my loop, which is the same loop that we talked about previously. I'll start by populating the mean price, which comes right from the summary table, and then the temporary price is the same. A rough reconstruction of this skeleton is below.
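A hedged reconstruction of that skeleton. The talk uses the Consumer Prices sample table; since its column names aren't shown here, Big Class stands in (:age playing the role of item, :height the role of price), and the inline By-platform embedding is assumed:

```jsl
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );

// Summary table with one row per group, plus a column for the pick.
dtSum = dt << Summary( Group( :age ), Mean( :height ) );
dtSum << New Column( "Selected Value", Numeric );

// Window: summary table on the left, By-group distributions on the
// right inside a named scroll box (the parent the controls step back to).
win = New Window( "Selector Demo",
	H Splitter Box(
		Data Table Box( dtSum ),
		scrollBox = Scroll Box(
			V List Box(
				distList = dt << Distribution(
					Continuous Distribution( Column( :height ) ),
					By( :age )
				)
			)
		)
	)
);

// Ordered key list that parallels distList, pulled from the summary table.
itemList = As List( Column( dtSum, "age" ) << Get Values );
```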
Now I'll change the axis formats for each, and I'm adding reference lines at that mean value using the nested technique that we talked about. Now I'll append as a sibling to each of the distributions. I've got a bunch of containers here just for spacing, and then my number edit box, which is the actual box that you can type in and move.
When that is interacted with, it will execute this function that's inside of it. The first thing it does is check its path. I've got it set to show the values there: if we just pick one and start messing with it, you can see in the log down here that it's going to spit out the XPath, extract the index, and give the name.
As we do different ones, you can see it's giving you different names based on what you click on. After that, it's just accessing the axis box in order to interact with it. We're removing any old reference lines, just to make sure that we're not duplicating the reference line each time we do this, and adding back any that may have been removed; if any lines overlapped, they would have been removed by that previous step. That's what's happening here.
Then the temp price gets updated by item name with the value that's inside that number box, and then we add a reference line specifically for the value that you're selecting. Now, this is the code for the Select button. Just like the number box, we're checking our path back in order to extract the item name and the index.
We're going to remove any previously selected axis reference lines, add back any that may have been deleted, and update the selected price with that temp price that we just created when we interacted with the box earlier. Then we're going to update the summary table with the price that we just selected, and create a bold green line just to indicate that that was the selected value. A rough sketch of both handlers is below.
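A hedged sketch of those two handlers, continuing the Big Class stand-in above. It bakes the index in at creation (as in the earlier sketch) rather than re-deriving it from the XPath, and it omits the reference-line cleanup the real script does, so repeated edits will stack lines; the Outline Box(1) and Axis Box(1) subscripts assume the default Distribution layout:

```jsl
// One number box + Select button pair per distribution.
meanPrice = Associative Array();
tempPrice = Associative Array();
selectedPrice = Associative Array();
For( i = 1, i <= N Items( itemList ), i++,
	meanPrice[itemList[i]] = Column( dtSum, "Mean(height)" )[i];
	tempPrice[itemList[i]] = meanPrice[itemList[i]];
	Eval( Eval Expr(
		Report( distList[Expr( i )] )[Outline Box( 1 )] << Sib Append(
			H List Box(
				Number Edit Box( tempPrice[itemList[Expr( i )]], 8,
					<<Set Function(
						Function( {self}, {item},
							item = itemList[Expr( i )];
							tempPrice[item] = self << Get;  // remember the proposal
							Report( distList[Expr( i )] )[Axis Box( 1 )]
								<< Add Ref Line( tempPrice[item], "Dashed", "Green" );
						)
					)
				),
				Button Box( "Select",
					item = itemList[Expr( i )];
					selectedPrice[item] = tempPrice[item];
					// write the pick into the matching summary-table row
					Column( dtSum, "Selected Value" )[Expr( i )] = selectedPrice[item];
					Report( distList[Expr( i )] )[Axis Box( 1 )]
						<< Add Ref Line( selectedPrice[item], "Solid", "Green", "selected", 3 );
				)
			)
		)
	) );
);
```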
That's all there is. I hope this didn't seem too complicated. I think most of it is actually just interacting with the axis and adding reference lines, and anything that actually creates the functionality of the interactive elements is really pretty simple.
This script, as well as the sample data and the larger PPH tool script, will all be uploaded with my presentation content. Those should be available to access if you want to mess around with them or dig deeper into how any of it is done.
I'm going to wrap up with just some learnings that I've had through this process of creating and using the PPH tool. The first is that appending interactive elements to the platforms in JMP really expands the analysis capability.
This is things like the option to select a value and have it populate elsewhere, or, in the PPH tool, the predictor screening, being able to select values from there and then expand further with a variability plot.
Next is the ability to document and export findings, which really helps to streamline the communication pipeline. We start from raw data, we end up with somebody performing an analysis, and then they're able to export that into a table that can be sent to whoever's maintaining the model.
You can make notes that can get sent over to a process or equipment engineer for different things they might want to check on their tools. Next, when it comes to scripting, those ordered lists, used as index keys with associative arrays to store and look up data and pass it between different display elements, were really key in getting this to work with a big set of data.
Then, creating the interactive controls within a By-variable display element by going through the loop: there's a lot that you can do just by looping through that display element list that you get when you make something using a By variable.
Then finally, the biggest thing I want to get across is that the scripting is not as hard as it looks. The PPH tool is a big script and might be intimidating to look at, but built up over time, it's really not that difficult to create something like that.
When I am scripting, I'm always using the Scripting Index, I always have the JMP Help forums open, and I always have my log open to just look at what's happening as I'm doing things in JMP.
I really want to encourage anyone who hasn't tried it or may be intimidated by it to please just give it a try because it really expands what you can do with JMP. That's all I have. Thank you for your time.