Georgia Z. Morgan, Sr Statistician, Retired
Cause analysis is a general term applied to the tasks used to investigate, analyze and eventually determine the causal factors of one or more anomalies. Advanced machine learning methods and other data mining tools have reduced the time and effort needed to analyze data.
Data prep, the steps to acquire data and transform it into information, requires domain expertise. To root cause a complex manufacturing process, the domain experts are likely to come from different functional organizations and the data from different sources.
My experience is shared by others:
“Often, the data is in different systems and needs to be accessed and turned into a data set that can be used for data mining and machine learning. …
This often requires a significant amount of data aggregation and transformation. Once a single analytics base table for the analysis has been aggregated, the other aspects of the life cycle come into play. Because it is necessary to experiment with data, the preparation stage is also very iterative with the analyst trying different types of data to get the most accurate predictive results.”
(n.a.), (n.d.), Data Mining from A to Z. Retrieved July 31, 2017 from the SAS website https://www.sas.com/content/dam/SAS/en_us/doc/whitepaper1/data-mining-from-a-z-104937.pdf
“As I’ve noted on countless occasions, there is a tremendous difference between mining data and answering questions. Any given dataset offers a particular glimpse onto reality, but no single dataset offers a perfect, holistic and unbiased view of the entirety of all existence. This means that when it comes to analyzing data, in many ways it is far more important to have any understanding of the data one is looking at than it is to have a PhD in statistics.”
Leetaru, K. (June 12, 2016) Why We Need More Domain Experts In The Data Sciences. Retrieved July 31, 2017 from the Forbes website https://www.forbes.com/sites/kalevleetaru/2016/06/12/why-we-need-more-domain-experts-in-the-data-sci...
It is also my experience that:
This presentation has two goals. My primary goal is to demonstrate how the JMP Virtual Join can help a root cause analysis team be more agile when it comes to combining tables. My secondary goal is to gather alternate proposals for the cause analyses that will be presented. The example is hypothetical in that it does not describe an actual problem that I have seen, but it is based upon several experiences of what can go wrong. Also, all of the data is simulated. While this example draws from semiconductor (IC) manufacturing, this problem could happen in any processing factory or even in drug studies.
A brief overview of aspects of IC manufacturing, to provide context for the data and anomaly, will be followed by some details about Virtual Joins, and then the analyses. Note, the example presented here is one of two that are written up in the attached paper.
Keywords: anomaly, RCA (Root Cause Analysis), data mining tools, semiconductor manufacturing, JMP® virtual join, JSL, reentrant process, IC (integrated circuit), FEOL (Front End Of Line), BEOL (Back End Of Line), EOL (End Of Line, typically, yield and performance testing and Die Prep), PM (Preventive Maintenance).
Integrated circuits (ICs) are created through numerous processing steps: oxidation, implantation, deposition of films and sputtered metals with different conductivity and barrier (non-conductivity) characteristics, patterning, planarization and, of course, cleaning and measurement. Processing starts with a bare, doped silicon wafer and ends with multiple die (ICs) per wafer. The maximum number of die per wafer depends upon the wafer size (diameter) and die size. See Figure 1(L).
Figure 1(R) is a cartoon depicting a very small segment of a chip with a 5-layer Cu metal interconnect BEOL. Even though the architecture, metal thickness, layout and possibly the recipe, might be different for each Cu layer, typically, they are processed in the same factory area (fab bay) and on the same tools.
Figure 1 (L) Carrier of 25 bare silicon wafers with flat and fully processed wafer with notch (R) An IC graphic of a five layer Cu interconnect BEOL.
The FEOL and BEOL are segmented because even a minute amount of Cu quickly diffuses through oxides, creating defects and degrading the gate oxide quality.
Figure 2 depicts a possible fab (factory) layout for a FEOL. Processing equipment of the same type and function is located in bays. The layout is optimized for plumbing to bulk chemical delivery, exhausts and drains. An overhead rail system, called an automated material handling system (AMHS), carries fab lots from one station to the next, or temporarily stores them in a remote stocker until a tool becomes available. There is no standard layout; each semiconductor corporation has its own factory planning group. Most of the factory queues are reentrant. For example, the four tools in Bays 7 and 9 might perform seven different clean steps. Knowledge of the factory flow and equipment usage is essential for root cause analysis. In this example, a single tool could potentially affect a fab lot up to seven times, and some fab lots might never have been processed on that piece of equipment.
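The reentrant point above can be illustrated with a minimal sketch. It is written in plain Python rather than JSL, and the lot names, step names and tool names are hypothetical, not taken from the simulation. It counts how many times each lot was processed on each tool, which is the exposure a cause analysis would need to know.

```python
from collections import Counter

# Hypothetical lot history rows: (lot, step, tool). All names are illustrative.
history = [
    ("LOT_A", "CLEAN1", "CLN01"),
    ("LOT_A", "CLEAN2", "CLN02"),
    ("LOT_A", "CLEAN3", "CLN01"),
    ("LOT_B", "CLEAN1", "CLN03"),
    ("LOT_B", "CLEAN2", "CLN03"),
    ("LOT_B", "CLEAN3", "CLN02"),
]

def tool_exposure(rows):
    """Count how many times each lot was processed on each tool."""
    counts = Counter()
    for lot, _step, tool in rows:
        counts[(lot, tool)] += 1
    return counts

exposure = tool_exposure(history)
print(exposure[("LOT_A", "CLN01")])  # LOT_A visited CLN01 at two clean steps
print(exposure[("LOT_B", "CLN01")])  # LOT_B never ran on CLN01
```

A lot with a count of zero for a suspect tool is as informative as a lot with a high count, since it can serve as a comparison group.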
See the references or do a quick web search for videos and other sources to learn more about semiconductor manufacturing.
Figure 2 Possible fab layout for a FEOL process.
Processing sequence, time between steps, the time at which a lot was processed on a specific tool, and handling can all be important contributing factors to an anomaly, a problem that affects the IC quality. Each silicon wafer has a wafer number scribed into the silicon. There is an industry standard that can trace each wafer to its origin. For about 20 years, fab (factory) processing equipment and handlers have had RFID readers. Manufacturing systems capture the exact time wafers enter and exit a tool, and more. The simulation data presented (and attached) uses a simple numbering system. It is not a replica of fab data, but provides the desired characteristics for this example.
A Virtual Join:
Figure 3 (L) depicts 7 tables joined by the unique wafer number (Wafernum). The Link Reference is displayed as a blue shadowed key when linked and gray when unlinked. Click on the disclosure icon to reference the columns for that table.
Figure 3 (L) EOL Yield/Performance/Anomaly Data Virtually Joined with Equipment Data for 7 Operations, (M) M1 Auxiliary Table Panel with Link ID, (R) Table Menu, Merge Referenced Data
Figure 3 (M) depicts the table panel for one of these tables (M1, the first Cu layer). The Link ID is a gold key. The Link Reference and Link ID must be the same data type. The Link ID values must be unique, with no duplicate values. If using JSL, or to be able to re-link tables in another session, the tables need to be saved to disk. The list below provides some characteristics of virtual joins. Figure 3 (R) shows that a selected set of table links can be merged at any time. Select the Link References to be merged, or select none for all linked columns, then select Merge Referenced Data from the main table menu. It is as easy as that to merge them into the main table.
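The linking rules above can be sketched outside of JMP. The following is a minimal Python illustration of the idea, not JMP's implementation: the auxiliary table is indexed by its Link ID (rejecting duplicates), referenced columns are looked up on demand, and a merge simply materializes those lookups. The function names and the data values are hypothetical; the column names Wafernum, Wafernum.M1 and Tool follow the example in this paper.

```python
def build_link(aux_rows, link_id):
    """Index an auxiliary table by its Link ID; duplicate key values are rejected."""
    index = {}
    for row in aux_rows:
        key = row[link_id]
        if key in index:
            raise ValueError(f"duplicate Link ID value: {key!r}")
        index[key] = row
    return index

def referenced(main_row, link_ref, index, column):
    """Look up an auxiliary column on demand; None when there is no match."""
    aux = index.get(main_row[link_ref])
    return None if aux is None else aux[column]

def merge_referenced(main_rows, link_ref, index, columns):
    """Materialize referenced columns into copies of the main-table rows."""
    merged = []
    for row in main_rows:
        new_row = dict(row)
        for col in columns:
            new_row[f"{col}[{link_ref}]"] = referenced(row, link_ref, index, col)
        merged.append(new_row)
    return merged

# Hypothetical data: EOL main table and the M1 equipment table.
eol = [
    {"Wafernum.M1": 1001, "M3.SRHO": 0.52},
    {"Wafernum.M1": 1002, "M3.SRHO": 0.91},
]
m1 = [
    {"Wafernum": 1001, "Tool": "SPUTT01"},
    {"Wafernum": 1002, "Tool": "SPUTT02"},
]

m1_link = build_link(m1, "Wafernum")
merged = merge_referenced(eol, "Wafernum.M1", m1_link, ["Tool"])
print(merged[1]["Tool[Wafernum.M1]"])
```

Because the lookup is deferred until a column is requested, replacing the auxiliary table with a revised file only requires rebuilding the index, which is the agility the virtual join offers.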
Items to Note
Referenced Column( "Tool[Wafernum.M1]",
Reference( Column(:Wafernum.M1), Reference(Column( :Tool ))))
This simpler syntax seems to work:
:Name("Tool[Wafernum.M1]"), in other words, :Name("colname[Link ID]")
Create an add-in or scripts to automatically create links, and open established link files.
Virtual Join Benefits:
Figure 4 Fab Equipment Data for 3 Operations, Same Column Names
Simulated Example - Intermittent Failing EOL M3 Resistance
One of the more difficult causes to root out is a contamination problem, especially if the contamination is removed by scheduled preventive maintenance (PM), dissipates with usage (repairable), and is conditional. A simulation of a BEOL semiconductor manufacturing fab with a 7-layer metal sputter was created to demonstrate this anomaly.
Epidemiology, drug studies and food and drug processing likely experience a similar type of anomaly at some frequency.
SRHO is a metal resistance quality metric. There is no shortage of potential causes for an increase in SRHO, even when restricting the investigation to just the M3 loop. If lines are too narrow, resistance increases. Cracked lines, other defects, or metal contamination can also affect the metal's conductive characteristics. Figure 5 displays the effect. The top graph represents baseline material for several weeks before a holiday shutdown. The bottom graph depicts post-holiday material. The EOL indicators for M3 resistance on a few intermittent lots fail SPC and are so far out of control that the wafers must be scrapped. At first there were only a few anomalous lots, then none were seen. Then the “eventual rule” kicks in: small signals eventually become large signals. A flurry of bad lots appears at the end of line.
Figure 5 M3.SRHO (T) Baseline Lot Avg (B) Post-Holiday Lot Avg versus M3 Sputter Tools
Cause Analysis Questions
Only M1-M7 sputter was simulated, using a FIFO queue with one exception. Full lots are 25 wafers, half lots are 12 wafers, and quick-turn lots are 6 wafers. The FIFO exception is that a 6-wafer lot takes precedence. To add some reality to the simulation, random throughput times for "other" process steps were used to produce varying reentrant times between M1 and M2, etc. These other processing times were based upon fixed setup times, processing time based upon the number of wafers and precedence, and random queue times. In a real cause analysis with real data, each tool and queue in the loop needs investigation. Only the sputter tools were simulated to limit complexity and focus on methods. Also note that the paper uses the narrative that inline critical dimension data showed no change in line size. To eliminate line breakage or cracks, typically a strip back or inline visuals are reviewed. Nothing was found, so the focus is on the metal itself.
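The queue rule described above can be sketched as follows. This is a minimal Python illustration, not the simulation code used for the paper. It assumes that a 6-wafer lot moves ahead of all waiting 12- and 25-wafer lots and that 6-wafer lots are served first-in, first-out among themselves; the class name and lot IDs are hypothetical.

```python
import heapq
from itertools import count

class SputterQueue:
    """FIFO queue in which 6-wafer (quick-turn) lots take precedence."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # arrival order breaks ties within a priority

    def add(self, lot_id, wafers):
        priority = 0 if wafers == 6 else 1  # assumption: only 6-wafer lots jump ahead
        heapq.heappush(self._heap, (priority, next(self._arrival), lot_id, wafers))

    def next_lot(self):
        _priority, _arrival, lot_id, wafers = heapq.heappop(self._heap)
        return lot_id, wafers

    def __len__(self):
        return len(self._heap)

queue = SputterQueue()
for lot_id, wafers in [("L1", 25), ("L2", 12), ("L3", 6), ("L4", 25), ("L5", 6)]:
    queue.add(lot_id, wafers)

order = [queue.next_lot()[0] for _ in range(len(queue))]
print(order)  # quick-turn lots L3 and L5 first, then L1, L2, L4 in arrival order
```

One consequence worth noting for the analysis: a precedence rule like this changes which layers run back to back on a tool, so the tool's event sequence is not simply the arrival sequence.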
To answer question #1, a comparison of incidence by tool is reviewed. Most often this is a one-tool-at-a-time review done by the modules and reviewed at the first RCA team meeting. Alternatively, a data mining tool with feature selection could be applied to a massive table with columns from multiple steps. Instead of a single monolithic file, a virtually linked main table like the one seen in Figure 3 (L) could be used.
For question #2, to continue the narrative, a chemical review of the bulk systems finds nothing. All tools are plumbed to the same systems. At this point, the anomaly data could be linked with tables of inline monitor data and tool sensor data, and then analyzed.
As the narrative continues, an engineer states that a couple of weeks before shutdown, the chemical mix in the M5 recipe was changed to make the deposition more conformal to the topography (structures) at M5. The characterization prior to implementation included a few M5 wafers followed by a few wafers of M1, and this was repeated for the other layers. All results were within baseline.
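A comparison of incidence by tool amounts to a small tabulation. The sketch below is a Python illustration with made-up lot results; the function name and the data are hypothetical, and only the tool naming style (SPUTT02) comes from this paper.

```python
from collections import defaultdict

def incidence_by_tool(rows):
    """Return {tool: (failing lots, total lots, failing fraction)}."""
    totals = defaultdict(lambda: [0, 0])  # tool -> [failing lots, total lots]
    for tool, failed in rows:
        totals[tool][1] += 1
        if failed:
            totals[tool][0] += 1
    return {tool: (bad, n, bad / n) for tool, (bad, n) in totals.items()}

# Hypothetical post-holiday lots: (M3 sputter tool, failed EOL M3 resistance?)
lots = [
    ("SPUTT01", False), ("SPUTT01", False), ("SPUTT01", False), ("SPUTT01", False),
    ("SPUTT02", True), ("SPUTT02", False), ("SPUTT02", True), ("SPUTT02", False),
    ("SPUTT03", False), ("SPUTT03", False),
]

incidence = incidence_by_tool(lots)
for tool, (bad, n, rate) in sorted(incidence.items()):
    print(f"{tool}: {bad}/{n} failing lots ({rate:.0%})")
```

With intermittent failures, a tool-level rate like this can look unremarkable, which is why the later steps look at tool state and event sequence rather than the tool alone.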
The RCA team chair asks for an analysis of wafer counts for both M3 and M5, trying to answer question #3, about tool state. Suppose the engineer's first file only had total wafer counts. Close the table and have the engineer add a column for the cumulative number of M5 wafers run prior to M3. When the data is modified, open the new file and link. Figure 6 demonstrates that different files and different iterations can be linked to the anomaly data.
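The added column, cumulative M5 wafers run prior to each M3 lot, could be derived from a tool event log along these lines. This is a minimal Python sketch under stated assumptions: the log is already in time order, each row is (tool, event, lot, wafers), and the count resets at each PM. The reset rule, the function name and all data values are assumptions for illustration, not details from the paper's simulation.

```python
def m5_wafers_before_m3(events):
    """For each M3 lot, report the M5 wafers run on that tool since its last PM."""
    running = {}  # tool -> M5 wafers accumulated since the last PM
    result = []
    for tool, event, lot, wafers in events:  # events must be in time order
        if event == "PM":
            running[tool] = 0  # assumption: a PM resets the accumulation
        elif event == "M5":
            running[tool] = running.get(tool, 0) + wafers
        elif event == "M3":
            result.append((lot, tool, running.get(tool, 0)))
    return result

# Hypothetical event log for one tool.
log = [
    ("SPUTT02", "PM", None, 0),
    ("SPUTT02", "M5", "L10", 25),
    ("SPUTT02", "M3", "L11", 25),
    ("SPUTT02", "M5", "L12", 12),
    ("SPUTT02", "M1", "L13", 25),
    ("SPUTT02", "M3", "L14", 6),
    ("SPUTT02", "PM", None, 0),
    ("SPUTT02", "M3", "L15", 25),
]

m5_counts = m5_wafers_before_m3(log)
print(m5_counts)
```

The output is one row per M3 lot, keyed by lot, so it can be saved as its own table and linked to the anomaly data without touching the main table.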
Figure 6 Post-Holiday EOL M3 Anomaly Data Linked with M3 Tool PMCounters
Figure 7 - Post-Holiday M3.SRHO versus PM_WFRCOUNT by Tool (only SPUTT02 shown here)
Figure 7, a Fit Y by X plot of M3.SRHO data versus wafer counts for SPUTT02, demonstrates that failures occurred with both low and high wafer counts. The simulation scheduled SPC every 500 wafers and a PM after 5000 wafers, or about 2500 wafers per chamber. There seems to be a depletion effect on the full lots (a decrease with each wafer). Also, due to the nature of this simulation, early post-holiday material had many 6-wafer, quick-turn lots.
Given that the pre-implementation characterization showed no effect with only a few M5 wafers run, could this be an accumulation effect or a sequencing effect, in other words, question #4?
For past analyses, I have used event analyses for my next steps.
Event sequencing can be done fairly quickly using the JMP function Lag( :col, n ), which works on character columns. In this case, the tool history (what the tool has run previously) is created.
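For readers without JMP at hand, the lag idea can be sketched in Python. The function below is an illustrative stand-in for Lag( :col, n ), not JMP's implementation: it returns the value from n rows earlier and uses None where no earlier row exists. The event names other than M5LotMoveIn and the column labels are hypothetical.

```python
def lag(values, n):
    """Return a list where position i holds values[i - n], or None if i < n."""
    if n < 0:
        raise ValueError("n must be non-negative")
    values = list(values)
    return [None] * min(n, len(values)) + values[: max(len(values) - n, 0)]

# Hypothetical time-ordered event column for one sputter tool.
events = ["PM", "M1LotMoveIn", "M5LotMoveIn", "M3LotMoveIn", "M5LotMoveIn"]

# Build a small tool-history table: each row sees its previous three events.
tool_history = [
    {"Event": e, "Lag1": l1, "Lag2": l2, "Lag3": l3}
    for e, l1, l2, l3 in zip(events, lag(events, 1), lag(events, 2), lag(events, 3))
]
for row in tool_history:
    print(row)
```

In this form, each M3 row carries what the tool ran immediately beforehand, which is the structure a partition or other modeling platform needs to split on prior events.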
Figure 8 Snippet of table M3 Last 20.jmp Structure with lags of Tool Events
Figure 9 Partition of M3.SRHO Lot Average vs. Last 20 Tool Events
Figure 8 displays a snippet of the table structure and Figure 9 the Partition history. Lags 1, 2 and 3 were chosen in order, each time splitting on the event M5LotMoveIn. Figure 8 also displays a formula column that simplifies the sequencing. It is the concatenation of coded events: 0 = PM, 5 = M5 and 1 = all other events. Figure 10 is the graph presented for the RCA management update.
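The coded-event column can be sketched as a short Python function. The coding (0 = PM, 5 = M5, 1 = everything else) is from the text above; the ordering of the lags (most recent first), the use of "." for a missing lag, and the event names other than M5LotMoveIn are assumptions made for illustration.

```python
def code_event(event):
    """Map one tool event to a single character code."""
    if event is None:
        return "."  # assumption: placeholder when no earlier event exists
    if event == "PM":
        return "0"
    if event.startswith("M5"):
        return "5"
    return "1"

def sequence_code(lagged_events):
    """Concatenate the codes of the lagged events, most recent first."""
    return "".join(code_event(e) for e in lagged_events)

# Hypothetical lags 1 to 4 for one M3 lot: two M5 lots ran just before it.
recent = ["M5LotMoveIn", "M5LotMoveIn", "M1LotMoveIn", "PM"]
print(sequence_code(recent))

# A lot that ran immediately after a PM, with no earlier history in the table.
print(sequence_code(["PM", None, None, None]))
```

A single string per lot makes it easy to group or color lots by their recent tool history in a presentation graph.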
Figure 10 Presentation Graph of M3SRHO Lot Processing Effect
Virtual Joins make for agile data updates and restructuring, which is especially useful for RCA.
Figure 1(L) Bare Silicon Wafer, (n.a.), (n.d.), Silicon Wafer. Retrieved July 31, 2017, from TechInstro website, https://www.techinstro.com/silicon-wafer/
Figure 1(L), (n.a.), April 09, 2013, Die Per Wafer Formula and (free) Calculator. Retrieved July 31, 2017, from anysilicon website, http://anysilicon.com/die-per-wafer-formula-free-calculators/
Figure 1(R), (n.a.), (n.d.), Back End of Line. Retrieved July 31, 2017, from Wikipedia, The Free Encyclopedia, website, https://en.wikipedia.org/wiki/Back_end_of_line
Figure 2, 유광재, January 20, 2015, Optimal Fab SDM LAB, KAIST Industrial & Systems Engineering, Figure 2. Optimal FAB layout design based on the material flow among the process types. Retrieved July 31, 2017, from http://sdm.kaist.ac.kr/wordpress/korean/%EC%9C%A0%EA%B4%91%EC%9E%AC/, http://sdm.kaist.ac.kr/wordpress/korean/wp-content/uploads/sites/2/2015/01/YGJ_2.png.
Leetaru, K. (June 12, 2016) Why We Need More Domain Experts In The Data Sciences. Retrieved July 31, 2017, from the Forbes website https://www.forbes.com/sites/kalevleetaru/2016/06/12/why-we-need-more-domain-experts-in-the-data-sci...
(n.a.), (n.d.), Data Mining from A to Z. Retrieved July 31, 2017, from the SAS website https://www.sas.com/content/dam/SAS/en_us/doc/whitepaper1/data-mining-from-a-z-104937.pdf
Other websites to learn more about semiconductor manufacturing:
https://www.youtube.com/watch?v=vmAyXWvLHeI (FOUP Load Movie)
https://www.youtube.com/watch?v=4Q_n4vdyZzc (Semiconductor Technology at TSMC, 2011)