Modern chip designs have multiple IP components with different process, voltage, and temperature sensitivities. These components must all function across a wide range of operating conditions, but different components will be more challenged at different process corners: some at high temperature, some at low voltage, etc. Their performance can be predicted prior to manufacturing, and then characterized when actual silicon is available to test; the more IP components, the more modeling/characterization work is required.
Customers have their own performance requirements layered on top of this: one may require a very high operating speed with little concern for power, while another may have extremely stringent power requirements, possibly turning off some chip components entirely. These requirements aren't met by running separate processes or product versions; instead each production batch is a mix of different performance grades. Optimizing this mix to match different customer requirements (bins) across the manufacturing process window has a huge gross margin impact: too much or too little of a given bin results in unsellable product, which is essentially a yield loss.
Monte Carlo simulations are well suited to simulating these speed-power distributions and bin mixes under different conditions: they can handle the large number of statistical parameters (typically 40-60, plus a covariance matrix) and bin limits (~20-100) involved. Consistent parameter extraction and documentation are critical to managing these, which is where JMP scripting comes in: it provides the statistical tools plus journaling capability to ensure "first time right" simulations. Furthermore, its fast execution time (1-5 minutes/run) and consistent documentation allow multiple business case comparisons in real time.
The diagram below shows the process flow for a typical series of simulation runs. We start with the predicted or actual performance characteristics, and from them extract the underlying statistical parameters describing the product; we may then scale these parameters (especially variance) to evaluate different scenarios. At this point the Monte Carlo engine generates a simulated "cloud" of speed-power distributions. Overlaid on this are the bin limits (customer requirements). Numerically integrating the cloud volume within the limits determines the statistical distribution of bins for this particular cloud. The script automatically repeats this across different manufacturing conditions to determine the process window over which the bin mix is acceptable (no nonsellable die). At this point we may wish to consider different scenarios, in which case we can rerun the simulation with different scaling and/or limits, until we finally reach the optimal combination of process and limits for a given demand mix.
The plot below shows a typical (simulated) speed vs. power plot for a single IP domain. The frequency is nearly always a linear function of the log of the static leakage current, log(SIdd), so it is described by a slope, intercept, and RMSE (noise). log(SIdd) itself is described by a mean and standard deviation. Because we commonly evaluate standard deviation changes we include a scaling parameter as one of our inputs, which means the cloud has three parameters for power (mean, standard deviation, and scaling), plus three (slope, intercept, and RMSE) for every frequency test at each voltage and temperature. In the plot below we show a sample bin limit: die below 0.32 logA static power and above 1040 MHz (inside the green box) pass this bin's requirements.
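The parameterization above can be illustrated with a minimal Python sketch (not the JMP script itself). It draws log(SIdd) from a normal distribution, places frequency on a line plus noise, and counts the die inside one bin box. The bin limits (0.32 logA, 1040 MHz) come from the example above; every other number, and the function names, are hypothetical values chosen only for illustration.

```python
import random

def simulate_cloud(n_die, logsidd_mean, logsidd_sd, sd_scale,
                   slope, intercept, rmse, seed=0):
    """Return a list of (log_sidd, freq_mhz) pairs for one IP domain."""
    rng = random.Random(seed)
    cloud = []
    for _ in range(n_die):
        # Power: normal in log(SIdd); sd_scale is the variance-scaling input.
        log_sidd = rng.gauss(logsidd_mean, logsidd_sd * sd_scale)
        # Speed: linear in log(SIdd) plus residual noise (RMSE).
        freq = intercept + slope * log_sidd + rng.gauss(0.0, rmse)
        cloud.append((log_sidd, freq))
    return cloud

def fraction_in_bin(cloud, max_log_sidd, min_freq):
    """Fraction of die at or below the power limit and at or above the speed limit."""
    passing = sum(1 for p, f in cloud if p <= max_log_sidd and f >= min_freq)
    return passing / len(cloud)

# Hypothetical statistical parameters; only the two bin limits are from the text.
cloud = simulate_cloud(20000, logsidd_mean=0.0, logsidd_sd=0.25, sd_scale=1.0,
                       slope=200.0, intercept=1050.0, rmse=15.0)
frac = fraction_in_bin(cloud, max_log_sidd=0.32, min_freq=1040.0)
print(f"Die inside the bin box: {frac:.1%}")
```

Counting points inside the box is the Monte Carlo equivalent of integrating the cloud within the limits; a real product would add one slope/intercept/RMSE triple per frequency test and a covariance matrix linking them.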
We can expand the number of passing die by adjusting the nominal test voltage (within the customer spec range). At higher voltages parts run faster, so slower die can now pass the frequency requirement. However, the part also consumes more static and dynamic power, and there is a fixed budget for that. To stay within that budget the allowed static component at a fixed reference voltage must be lower. The light green box in the picture below shows the additional die which pass the bin requirements at high voltage. The reverse is true for lower voltages; the dark green box shows additional die captured under this scenario.
Die that do not pass one bin, even with different voltages, may still pass another. The plot below shows two additional bins that capture these outlier die: a low-speed, low-power bin (blue) and a high-speed, high-power bin (red).
Of course many die can pass multiple specs; the next plot shows these overlapping regions, with die in the darkest green quadrant passing all voltages for the green spec limits, as well as the red and blue spec limits. The binning regions are described by a combination of logical ORs across voltage, plus logical ANDs across temperature and IP domains. These die can be used to flex supply as needed from one bin to another.
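One plausible reading of the OR/AND rule above is sketched below in Python: a die passes a bin if there is at least one voltage at which it passes every temperature and IP-domain test. The data layout, the voltage values, and the domain names are hypothetical; the actual script may organize its tests differently.

```python
def passes_bin(results):
    """results[voltage][(temperature, domain)] -> bool for one die and one bin.

    OR across voltages: any single voltage is enough.
    AND across temperatures and IP domains: every test at that voltage must pass.
    """
    return any(all(tests.values()) for tests in results.values())

# Hypothetical die: fails one test at the low and high voltages,
# but passes everything at the nominal voltage.
die_results = {
    0.95: {("hot", "cpu"): True, ("cold", "cpu"): False,
           ("hot", "gpu"): True, ("cold", "gpu"): True},
    1.00: {("hot", "cpu"): True, ("cold", "cpu"): True,
           ("hot", "gpu"): True, ("cold", "gpu"): True},
    1.05: {("hot", "cpu"): True, ("cold", "cpu"): True,
           ("hot", "gpu"): False, ("cold", "gpu"): True},
}

# Hypothetical die that fails a different test at each voltage.
failing_results = {
    0.95: {("hot", "cpu"): False, ("cold", "cpu"): True},
    1.00: {("hot", "cpu"): True, ("cold", "cpu"): False},
}

print(passes_bin(die_results))      # passes via the 1.00 V column
print(passes_bin(failing_results))  # no voltage passes every test
```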
There are two ways to count die for each bin. The first is the maximum distribution, counting all possible die which can pass the bin's speed/power requirements. This is an important supply constraint: if demand exceeds the maximum distribution then the supply will always be short, resulting in nonsellable material (yield loss). For this metric die in overlapping regions count for each bin they pass, so the distribution sum across bins can exceed 100%.
The second way to count die is the bin mix. In this methodology the user specifies the sort order in which die are considered for each bin; once a die passes a bin it is removed from further consideration. The sum of the bin mix distributions is therefore always 100%, because no double-counting takes place. This is the second major supply constraint we consider: it determines whether we will have enough die to meet demand in actual production. Unlike the maximum distribution, the bin mix distribution can be changed by using a different sort order, and we commonly evaluate these options throughout a product's lifetime.
The table below shows how the maximum and bin mix distributions differ for a set of five die:
While the principles of bin assignment and counting are straightforward, the implementation can be very complex. This is where scripting is a powerful aid: it ensures the logic is executed correctly for each die every time, removing a major potential source of user error.
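The two counting methods described above can be sketched in a few lines of Python. The five die and their passing bins are made-up illustration data (not the values from the authors' table), and the "none" category for die that pass no bin is an assumption added so that the bin mix always sums to 100%.

```python
def max_distribution(pass_sets, bins):
    """Each die counts toward every bin it passes, so the sum can exceed 100%."""
    n = len(pass_sets)
    return {b: sum(1 for passed in pass_sets if b in passed) / n for b in bins}

def bin_mix(pass_sets, sort_order):
    """Each die is assigned to the first bin it passes in sort_order."""
    counts = {b: 0 for b in sort_order}
    counts["none"] = 0  # die that pass no bin at all
    for passed in pass_sets:
        for b in sort_order:
            if b in passed:
                counts[b] += 1
                break
        else:
            counts["none"] += 1
    n = len(pass_sets)
    return {b: c / n for b, c in counts.items()}

# Hypothetical pass results for five die (set of bins each die can pass).
dies = [{"green", "red"}, {"green"}, {"blue", "green"}, {"red"}, {"blue"}]

max_dist = max_distribution(dies, ["green", "red", "blue"])
mix_green_first = bin_mix(dies, ["green", "red", "blue"])
mix_red_first = bin_mix(dies, ["red", "green", "blue"])

print(max_dist)         # green 0.6, red 0.4, blue 0.4 (sums to 140%)
print(mix_green_first)  # green 0.6, red 0.2, blue 0.2, none 0.0
print(mix_red_first)    # red 0.4, green 0.4, blue 0.2, none 0.0
```

Swapping the sort order moves the die that pass both green and red from one bin to the other, while the maximum distribution is unchanged.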
We wish to contain all the myriad inputs for a simulation in one file for easy, consistent documentation. In addition, we would like a script interface that will manage statistical extractions and actual simulation runs. Fortunately, JMP can support all of this in a single data object, which we use for all of our inputs. An example looks like this:
The upper region is used for various scripting routines. Because parameter names are often long and complex, we include the option to rename them to something less cumbersome using JMP's text parsing commands. During the initial run based on a data file the script extracts and stores all of the necessary statistical parameters in the configuration file; future runs can simply use these stored parameters. Users do not need to store the data file separately, allowing anyone to run an identical simulation using the product's configuration file. This avoids inconsistencies when users base their analyses on different input data sets.
The script also includes an option to validate the configuration table. This is a critical tool to enable "first time right" simulations, because it checks for a host of potential errors that may produce spurious results. Rather than relying on expert user review each time, the script ensures that no key inputs are missing (limits, tests, temperature or voltage grouping parameters, etc.), and also checks for more subtle errors that could produce bogus (but realistic-looking) simulations: perfectly-correlated parameters, nonphysical correlations, or bin/pattern/grouping combinations that guarantee zero or 100% distributions at all times.
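As an illustration of the correlation checks mentioned above, the Python sketch below flags perfectly correlated parameter pairs and nonphysical correlation matrices. Using an attempted Cholesky factorization to detect a matrix that is not positive definite is our choice for this sketch; the source does not say how the JMP script performs the check. The function name, tolerance, and 0.999 threshold are hypothetical.

```python
import math

def check_correlation_matrix(names, corr, tol=1e-9, max_abs_r=0.999):
    """Return a list of human-readable problems (empty list means OK)."""
    problems = []
    n = len(names)
    for i in range(n):
        if abs(corr[i][i] - 1.0) > tol:
            problems.append(f"{names[i]}: diagonal entry is not 1")
        for j in range(i):
            r = corr[i][j]
            pair = f"{names[j]}/{names[i]}"
            if abs(r - corr[j][i]) > tol:
                problems.append(f"{pair}: matrix is not symmetric")
            if abs(r) > 1.0 + tol:
                problems.append(f"{pair}: |r| exceeds 1")
            elif abs(r) >= max_abs_r:
                problems.append(f"{pair}: perfectly correlated")
    # Attempt a Cholesky factorization; it fails exactly when the matrix
    # is not positive definite, i.e. the correlations cannot all hold at once.
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = corr[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    problems.append("correlation matrix is not positive definite")
                    return problems
                chol[i][j] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    return problems

good = check_correlation_matrix(["f1", "f2"], [[1.0, 0.5], [0.5, 1.0]])
perfect = check_correlation_matrix(["f1", "f2"], [[1.0, 1.0], [1.0, 1.0]])
# A~B and B~C strongly positive while A~C is strongly negative: impossible.
nonphysical = check_correlation_matrix(
    ["a", "b", "c"],
    [[1.0, 0.9, -0.9], [0.9, 1.0, 0.9], [-0.9, 0.9, 1.0]],
)
print(good, perfect, nonphysical, sep="\n")
```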
We use JMP's journal file capability to aggregate and standardize our outputs, ensuring best practices for documentation. The key features for most analyses are the simulated speed-power distribution (multiple parameter pairs can be plotted if desired), and the plots of maximum and bin mix capability distributions across the manufacturing process range. In addition to the graphs the file also includes the statistical parameters and all other inputs from the configuration file, so that we have complete traceability of all underlying simulation details. This is critical when we are early in a product's life cycle and considering several scenarios: failure to document any of the underlying inputs can cause a great deal of confusion, but by scripting the output we ensure this never happens.
With documentation automated, we are able to focus on rapidly evaluating different scenarios. Because the simulation time is very short (1-5 minutes) the more common ones can be evaluated in real time in meetings, greatly facilitating discussions with the Marketing and Design teams. We show some examples below, starting with one of the simplest: scaling the process variance. The first graph shows the impact on a single bin's maximum distribution when the factory process control improves: the distribution drops in some regions, but at the optimum target it is significantly better. This situation is so common that the variance scaling is a tunable parameter in the configuration file.
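The variance-scaling behavior described above can be reproduced with a small Python sketch that sweeps the process target (the mean of log(SIdd)) for two values of the scaling parameter. The line fit, the baseline standard deviation, the sweep range, and the 0.7 scale factor are hypothetical; only the 0.32 logA and 1040 MHz limits are reused from the earlier example.

```python
import random

def bin_fraction(target, sd, n_die=5000, seed=1):
    """Fraction of simulated die inside one bin for a given process target."""
    slope, intercept, rmse = 200.0, 1050.0, 15.0   # hypothetical line fit
    max_log_sidd, min_freq = 0.32, 1040.0          # example bin limits
    rng = random.Random(seed)  # same seed for every run: common random numbers
    hits = 0
    for _ in range(n_die):
        log_sidd = rng.gauss(target, sd)
        freq = intercept + slope * log_sidd + rng.gauss(0.0, rmse)
        if log_sidd <= max_log_sidd and freq >= min_freq:
            hits += 1
    return hits / n_die

BASE_SD = 0.25                                       # hypothetical baseline sigma
targets = [round(-0.4 + 0.1 * i, 1) for i in range(11)]  # process sweep
curves = {}
for scale in (1.0, 0.7):                             # 0.7 = tighter process control
    curves[scale] = [bin_fraction(t, BASE_SD * scale) for t in targets]

for t, base, tight in zip(targets, curves[1.0], curves[0.7]):
    print(f"target {t:+.1f}: baseline {base:.1%}, tightened {tight:.1%}")
```

With these inputs the tightened curve is higher near the optimum target and lower far from it, which is the qualitative pattern described for the first graph.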
The second, extremely common variation to explore is that of changing bin sort order; this can occur throughout a product's life cycle. The plots below show the bin mix distribution across the manufacturing process when the sort order is: orange, red, light green, dark green, blue. (The maximum distribution plot isn't shown here, since it's unaffected by sort order.) At the process target (vertical black line) the orange bin dominates, with small amounts of light green and red-binned die, and virtually no blue or dark green distribution.
We now wish to evaluate the effect of exchanging the red and dark green bins' sort order: how much less of the former and more of the latter will we now be able to ship? Sort order is a parameter we can easily change in the configuration file, so this comparison only takes a couple of minutes to run; the plot below shows the result (note that the light green bin is completely covered by the dark green bin, which is tested first in the new order).
While the orange bin distribution remains unchanged, a significant portion of die (~30%) will now fall into the dark green bin rather than red or light green as before. This result can be compared to customer demand to determine if the test program should be changed.
Another common scenario requiring simulation is pre-assembly die sorting. In this case, products may be built into different packages depending on their performance. In addition to understanding overall bin mix, we now wish to understand what the mix will be for each package type, introducing an additional layer of complexity. We also wish to understand how well our pre-assembly testing directs die to the correct package. This is easy to do with the JMP script, because pass/fail data for every test and every bin is included for every die. In the plot below we can see that die failing the wafer pre-assembly speed test (turquoise) are indeed among the slowest when tested after assembly to ACSCAN_PLAT_MIN_FMAX, but the correlation isn't perfect: the turquoise die overlap fairly heavily with other bin boundaries (note that this simulation uses different tests and bins than the previous example).
Whether this correlation is good enough to use in production depends on whether the resulting per-package bin distributions are acceptable. The graphs below show the difference between building all die (on the left) or all but the turquoise die (on the right). For clarity, the corresponding graphs for bin mix distributions are not shown, but would be considered as part of the assessment.
A common, but somewhat more involved situation is when we wish to consider different bin limits; this occurs most often during the product definition phase. There may be some uncertainty in what power to offer a customer, or the Design team may want to understand the implications of a piece of IP running slower than desired. Our simulation script has the flexibility to vary bin limits in discrete steps; we can then study the distribution impact. Because these analyses can be very different we do not generate special graphical outputs, but the distribution data are easily saved as separate data tables that we can study however we like. The example below shows the standard output for "Bin 1 Combo" maximum distribution as the power limit is varied in 11 steps: the peak becomes lower and shifts to the right for tighter power limits. (Bins 2-5 have only single curves because their limits do not change.) To the right we show a custom graph that plots Bin 1 distribution at the target process (vertical black line), as a function of the bin limit (labeled by step number; we can convert this to actual power values as needed). As usual we could perform this analysis on the bin mix distribution too, if desired.
From the graph on the right we can see we have room to tighten the power limit by 3-4 steps before Bin 1 distributions start to drop off sharply; at that point the supply becomes very sensitive to minor changes. We would feed this information back to the Marketing team as they consider what power specifications to offer a customer.
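A limit sweep of this kind can be sketched in Python by generating one cloud at the process target and re-applying the bin box with the power limit tightened in 11 steps. The cloud parameters, the starting limit, and the step size are hypothetical; the source does not give the actual step values.

```python
import random

# One simulated cloud at a hypothetical process target, reused for every step
# so that differences between steps come only from the changing limit.
rng = random.Random(2)
cloud = []
for _ in range(10000):
    log_sidd = rng.gauss(0.1, 0.25)
    freq = 1050.0 + 200.0 * log_sidd + rng.gauss(0.0, 15.0)
    cloud.append((log_sidd, freq))

MIN_FREQ = 1040.0  # speed limit held fixed during the sweep
# Step 0 is the loosest power limit; each step tightens it by 0.04 logA.
power_limits = [0.50 - 0.04 * step for step in range(11)]

fractions = []
for limit in power_limits:
    passing = sum(1 for p, f in cloud if p <= limit and f >= MIN_FREQ)
    fractions.append(passing / len(cloud))

for step, (limit, frac) in enumerate(zip(power_limits, fractions)):
    print(f"step {step:2d}: power limit {limit:.2f} logA -> {frac:.1%} of die")
```

Plotting the fraction against the step number gives a curve of the same kind as the custom graph described above, from which the point where supply starts to fall off can be read.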
We may also want to evaluate the speed sensitivity to a particular piece of IP. We again step the bin limits and consider three metrics. The first is the optimal process target (peak distribution location on the X axis): this is not infinitely tunable, and not trivially done, so we don't want the process target to be sensitive to speed loss. We also need a reasonable process window (the range over which supply matches demand), and of course we need enough die passing that bin to meet customer orders. All three of the graphs below tell a similar story: we can tolerate up to ~40 MHz speed loss, but beyond that the optimal process target is very sensitive to any speed variation, the process window rapidly shrinks, and the bin supply itself begins to collapse. This input gives the Design team a clear criterion for success, and a solid feel for how much margin they should include in their simulations.
In particularly complex situations speed and power variations can be considered simultaneously; for example, if improving speed margin results in a power cost. The scripting tool's built-in journaling is particularly important here, documenting all the changing inputs automatically.
Using a JMP script has greatly improved our product performance-modeling abilities. We can manage complex inputs with built-in validation to avoid common user errors. The script handles the math behind the Monte Carlo simulation itself, freeing the users to focus on value-added results rather than the statistical details. It also ensures consistent output encompassing our best practices, so that every analysis is thoroughly documented without the need for expert review or a lengthy documentation spec.
The tool can be used with either silicon data, or seeded with performance estimates before we have products in hand. This is where its greatest value lies: we can impact products very early in their life cycle and provide meaningful feedback to the Design and Marketing teams before the final specifications are locked down. This enables better customer engagement, and ensures that when our products launch all functional die are sellable. Finally, we can quickly perform sensitivity analyses that ensure we have a robust process window. Products that meet these criteria at launch are more profitable, ensure smooth manufacturing operations, and better meet customer needs over their entire lifetimes.