
GRR Analysis of Effects from Measurement Queue Time on SiO2 Thin Film Thickness

SiO2 thin films have been widely used as STI liners, gate oxides, spacers, and more in the semiconductor industry. The thickness of SiO2 layers is strictly controlled and is affected by facilities, chambers, and measurements. Among these factors, thickness is directly susceptible to measurements: if the measurement queue time is too long, the true thickness of the SiO2 layer formed by the thermal process may be distorted, because the thickness can increase naturally in the atmosphere.

To analyse the effects of queue time and measurement on SiO2 thickness, JMP GRR analysis was used. After the operation is defined, a cause-and-effect diagram is used to summarize possible factors behind thickness shifts. Next, thickness data from coupons are collected based on the JMP MSA Design platform. The thickness of each coupon is measured multiple times as repeatability and degradation tests, and the same repeatability tests are repeated every three hours as reproducibility tests. Once the variability in thickness from repeatability and reproducibility is analysed using Xbar and S charts, GRR analysis is performed to evaluate current GRR performance. Finally, relationships among the P/T ratio, alpha/beta risks, and spec tolerance, along with regression models between thickness and queue time, are built to determine whether the measured thickness can be trusted.

 

Hi, everyone. I am Jiaping Shen, a Process Support Engineer at Applied Materials. Applied Materials is a leader in materials engineering solutions used to produce virtually every new chip and advanced display in the world.

There is an internal JMP program at Applied Materials that helps engineers solve engineering problems with JMP. Today, as a member of that program, I'd like to share how I use JMP to perform gauge repeatability and reproducibility analysis on silicon dioxide thickness to support queue time control and measurement capability evaluation.

In wafer fabrication, process engineers rely on metrology tools to monitor each layer and ensure product quality. If measurement results are not accurate, they may lead to quality issues. So how are measurement results affected? If one tool measures the thickness of a part several times and the variation is large, the tool's repeatability is poor. If another tool measures the same part and the gap between the two tools is large, reproducibility is poor.

The analysis of gauge repeatability and reproducibility is called GRR analysis. In this project, I use silicon dioxide thickness as an example to show how to evaluate measurement capability. Unlike other GRR projects, I use measurement queue time levels to introduce reproducibility.

Here is a general overview of the analysis flow. Based on the data collected, we evaluate GRR performance and conduct root cause analysis to see whether further improvement is possible. Then we discuss current process capability and explore future opportunities.

The silicon dioxide thickness was collected from 15 coupons on a wafer. Each coupon was measured four times at each of three queue times: zero, three, and six hours after silicon dioxide formation.

In total, we collected 180 data points (15 coupons × 3 queue times × 4 repeats) following the JMP MSA Design platform. The thickness spec is 97 to 103 angstroms. For GRR performance, we have four success criteria: P/T ratio, P/TV ratio, P/V ratio, and P/M ratio.

In all four criteria, the numerator is precision, which is calculated from the variation due to repeatability and reproducibility. In this project, the tolerance is 6 angstroms (103 - 97), and I will use the P/T ratio as the success criterion.
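As a minimal sketch of the P/T criterion, the Python below combines the repeatability and reproducibility standard deviations in quadrature and divides six of them by the 6-angstrom tolerance. The multiplier k = 6 is an assumption (some references use 5.15), and the variance components in the usage lines are hypothetical, not values from this study.

```python
import math


def grr_sigma(sigma_repeat: float, sigma_repro: float) -> float:
    """Combine repeatability and reproducibility standard deviations.

    Variances add, so the standard deviations combine in quadrature.
    """
    return math.sqrt(sigma_repeat**2 + sigma_repro**2)


def precision_to_tolerance(sigma_grr: float, lsl: float, usl: float, k: float = 6.0) -> float:
    """P/T = k * sigma_GRR / (USL - LSL); k = 6 is an assumed convention."""
    return k * sigma_grr / (usl - lsl)


# Spec from the talk: 97 to 103 angstroms, i.e. a tolerance of 6 angstroms.
LSL, USL = 97.0, 103.0

# Hypothetical variance components (angstroms), for illustration only.
sigma = grr_sigma(sigma_repeat=0.06, sigma_repro=0.03)
print(f"sigma_GRR = {sigma:.4f} A, P/T = {precision_to_tolerance(sigma, LSL, USL):.1%}")
```

With k = 6 and a 6-angstrom tolerance, the P/T ratio numerically equals sigma_GRR in angstroms, which makes the reported 8.88% easy to interpret.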

How about GRR performance? The first model shows that the P/T ratio is 8.88%, less than 10%, which means measurement capability is good. However, the P/TV ratio is 31%, greater than 30%, which would mean measurement capability is poor. Why the contradiction? The part range is too tight, so the P/TV ratio cannot be trusted; we rely on the P/T ratio instead, and it shows that measurement capability is good.
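The P/T versus P/TV contradiction can be reproduced with a small sketch. It assumes P/T = 6 * sigma_GRR / tolerance and P/TV = sigma_GRR / sigma_total, with sigma_total^2 = sigma_part^2 + sigma_GRR^2, backs the implied standard deviations out of the reported 8.88% and 31%, and then widens only the part spread. P/T never changes because the gauge does not change, while P/TV falls as the parts spread out.

```python
import math


def precision_to_total_variation(sigma_grr: float, sigma_part: float) -> float:
    """P/TV = sigma_GRR / sigma_total, where total variance = part + GRR variance."""
    sigma_total = math.sqrt(sigma_part**2 + sigma_grr**2)
    return sigma_grr / sigma_total


TOL = 6.0
# Implied sigma_GRR from the reported P/T of 8.88% (assumes P/T = 6 * sigma / TOL).
sigma_grr = 0.0888 * TOL / 6.0
# Implied part-to-part sigma from the reported P/TV of 31%.
sigma_total = sigma_grr / 0.31
sigma_part = math.sqrt(sigma_total**2 - sigma_grr**2)

for scale in (1, 2, 4):
    # Same gauge, wider selection of parts: only the part spread changes.
    ratio = precision_to_total_variation(sigma_grr, scale * sigma_part)
    print(f"part spread x{scale}: P/TV = {ratio:.1%}, P/T stays {6 * sigma_grr / TOL:.2%}")
```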

How about the interaction between part and queue time? In the crossed GRR model, the interaction accounts for only 0.2% of the tolerance, so it is negligible. With the current capability, how likely are we to make a mistake in judging whether a part is within spec? The risk that a good part is falsely rejected is called alpha risk; higher alpha risk increases production cost.

The risk that a bad part is falsely accepted is called beta risk; higher beta risk passes risk on to customers. In production, parts at the target have zero alpha and beta risk, good parts near the spec limits have high alpha risk, and bad parts near the spec limits have high beta risk.
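To make the alpha/beta definitions concrete, the sketch below computes the misclassification probability for a single part with a known true thickness. It assumes normally distributed measurement error with sigma_M = 0.0888 angstroms (back-calculated from P/T = 8.88% under the k = 6 convention); the true thickness values are hypothetical.

```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative probability of a Normal(mu, sigma) distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def misclassification_risk(true_value, sigma_m, lsl, usl):
    """Risk that one measurement misclassifies a part with a known true value.

    Good part (inside spec): alpha risk = P(measured outside spec).
    Bad part (outside spec): beta risk = P(measured inside spec).
    """
    p_inside = normal_cdf(usl, true_value, sigma_m) - normal_cdf(lsl, true_value, sigma_m)
    if lsl <= true_value <= usl:
        return "alpha", 1.0 - p_inside
    return "beta", p_inside


LSL, USL, SIGMA_M = 97.0, 103.0, 0.0888  # SIGMA_M is an assumption, see lead-in

for t in (100.0, 102.95, 103.05):  # hypothetical true thicknesses
    kind, risk = misclassification_risk(t, SIGMA_M, LSL, USL)
    print(f"true = {t:.2f} A -> {kind} risk = {risk:.3f}")
```

A part at target has essentially zero risk, while parts a few hundredths of an angstrom on either side of a limit carry large alpha or beta risk, matching the description above.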

How about alpha and beta risk in this project? Both are zero. Can we trust that? No, because all the parts are within the spec limits, which is very different from actual production. As a result, we cannot rely on the calculated risks. Next time, we should deliberately select parts, about 90% of which are uniformly distributed across the spec range, to simulate actual production.
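Why an all-in-spec sample reports zero risk can be seen by averaging the per-part risks over two simulated samples. This is a sketch under stated assumptions: normal measurement error with sigma_M = 0.0888 angstroms, a hypothetical "narrow" sample of parts between 99.5 and 100.5 angstroms, and a hypothetical "designed" sample with 90% of parts uniform across the spec and 10% placed within 0.5 angstroms outside it. The 0.5-angstrom band is an illustrative choice, not part of the talk.

```python
import math
import random


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def _mean(values):
    return sum(values) / len(values) if values else None


def expected_risks(true_values, sigma_m, lsl, usl):
    """Mean alpha risk over good parts and mean beta risk over bad parts.

    A risk is None when the sample contains no parts of that kind.
    """
    alphas, betas = [], []
    for t in true_values:
        p_in = normal_cdf(usl, t, sigma_m) - normal_cdf(lsl, t, sigma_m)
        if lsl <= t <= usl:
            alphas.append(1.0 - p_in)  # good part falsely rejected
        else:
            betas.append(p_in)  # bad part falsely accepted
    return _mean(alphas), _mean(betas)


LSL, USL, SIGMA_M = 97.0, 103.0, 0.0888  # SIGMA_M assumed from P/T = 8.88%
rng = random.Random(0)

# Hypothetical sample resembling the study: every part comfortably inside spec.
narrow = [rng.uniform(99.5, 100.5) for _ in range(1000)]

# Hypothetical designed sample: 90% uniform across spec, 10% just outside it.
designed = [rng.uniform(LSL, USL) for _ in range(900)]
for _ in range(100):
    low_side = rng.random() < 0.5
    designed.append(rng.uniform(LSL - 0.5, LSL) if low_side else rng.uniform(USL, USL + 0.5))

for name, sample in (("narrow", narrow), ("designed", designed)):
    alpha, beta = expected_risks(sample, SIGMA_M, LSL, USL)
    print(f"{name}: alpha = {alpha}, beta = {beta}")
```

The narrow sample yields no bad parts at all, so beta risk cannot even be estimated, which is why the zero risks reported for this study should not be trusted.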

The current measurement capability is good, but are there improvement opportunities for the future? I will use Xbar-S charts to analyze the root causes of GRR from repeatability and reproducibility.

In the top repeatability chart, the X-axis shows the 15 parts at three queue time levels. The Y-axis is the standard deviation of each part's four repeats, representing repeatability. Overall, the standard deviation is very stable, so queue time does not affect repeatability. The purple line is the average standard deviation for each queue time level, and it shows no trend.
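The repeatability (S) chart can be sketched with textbook control-chart constants: each subgroup is one part's four repeats at one queue time, the centerline is the average subgroup standard deviation S-bar, and the limits are B3 * S-bar and B4 * S-bar derived from the unbiasing constant c4. JMP's exact settings may differ; the subgroup data below are hypothetical.

```python
import math
import statistics


def c4(n: int) -> float:
    """Unbiasing constant for the standard deviation of n normal observations."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)


def s_chart(subgroups):
    """Per-subgroup S values, centerline S-bar, and 3-sigma control limits.

    All subgroups are assumed to have the same size n (here, four repeats).
    """
    n = len(subgroups[0])
    s_values = [statistics.stdev(g) for g in subgroups]
    s_bar = statistics.fmean(s_values)
    k = 3.0 * math.sqrt(1.0 - c4(n) ** 2) / c4(n)
    lcl = max(0.0, 1.0 - k) * s_bar  # B3 * S-bar
    ucl = (1.0 + k) * s_bar  # B4 * S-bar
    return s_values, s_bar, lcl, ucl


# Hypothetical repeats (angstroms) for three part/queue-time subgroups.
subgroups = [
    [100.02, 100.05, 99.98, 100.01],
    [99.61, 99.66, 99.63, 99.60],
    [100.41, 100.38, 100.45, 100.40],
]
s_values, s_bar, lcl, ucl = s_chart(subgroups)
flagged = [i for i, s in enumerate(s_values) if s < lcl or s > ucl]
print(f"S-bar = {s_bar:.4f}, LCL = {lcl:.4f}, UCL = {ucl:.4f}, flagged = {flagged}")
```

For n = 4, B3 is 0 and B4 is about 2.266, so only unusually noisy repeats would be flagged.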

How about the standard deviation for each part? It is lower at the wafer center and higher at the edge, which may be attributed to higher stress at the wafer edge. Still, repeatability is very stable. How about reproducibility? Most of the parts fall beyond the red measurement error limits, so the metrology tool can differentiate between parts.
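The "beyond the measurement error limits" reading of the Xbar chart can be sketched as follows. The limits are the grand mean plus or minus A3 * S-bar, with A3 = 3 / (c4 * sqrt(n)), so they reflect repeatability only; the share of part means outside them indicates how well the gauge discriminates between parts. The data are hypothetical.

```python
import math
import statistics


def c4(n: int) -> float:
    """Unbiasing constant for the standard deviation of n normal observations."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)


def xbar_discrimination(subgroups):
    """Xbar chart limits based on S-bar, and the share of subgroup means outside them.

    The limits reflect within-subgroup (repeatability) error only, so a mean
    outside them belongs to a part the gauge can distinguish from the average.
    """
    n = len(subgroups[0])
    means = [statistics.fmean(g) for g in subgroups]
    s_bar = statistics.fmean([statistics.stdev(g) for g in subgroups])
    grand_mean = statistics.fmean(means)
    a3 = 3.0 / (c4(n) * math.sqrt(n))
    lcl, ucl = grand_mean - a3 * s_bar, grand_mean + a3 * s_bar
    outside = sum(1 for m in means if m < lcl or m > ucl)
    return lcl, ucl, outside / len(means)


# Hypothetical data: three parts with well-separated means and small repeat noise.
subgroups = [
    [97.95, 98.00, 98.05, 98.00],
    [99.95, 100.00, 100.05, 100.00],
    [101.95, 102.00, 102.05, 102.00],
]
lcl, ucl, share_outside = xbar_discrimination(subgroups)
print(f"LCL = {lcl:.3f}, UCL = {ucl:.3f}, share of parts outside = {share_outside:.0%}")
```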

The rising purple line indicates that the average thickness increased by 0.2 angstroms over 6 hours, far below the spec tolerance of 6 angstroms, so the long-term degradation risk is low. The M-shaped curve is what we want to get the best [inaudible 00:07:27] uniformity. If we overlay the three M-shaped curves, they are parallel, so there is little part-by-queue-time interaction.
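A thickness-versus-queue-time regression can be sketched with ordinary least squares. Only the two facts stated in the talk are used (no shift at 0 h, +0.2 angstroms at 6 h); the "drift budget" helper is a hypothetical illustration, and extending the linear trend beyond the 6 hours studied is an assumption, not a result.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def hours_to_use_tolerance(slope_per_hour, tolerance, fraction):
    """Hypothetical helper: queue time at which a linear drift uses `fraction` of the tolerance.

    This is a linear extrapolation and is only an assumption beyond 6 h.
    """
    return fraction * tolerance / slope_per_hour


# Mean thickness shift (angstroms) at 0 h and 6 h, as stated in the talk.
queue_hours = [0.0, 6.0]
mean_shift = [0.0, 0.2]
intercept, slope = fit_line(queue_hours, mean_shift)
print(f"drift = {slope:.4f} A/h; 10% of a 6 A tolerance used after "
      f"{hours_to_use_tolerance(slope, 6.0, 0.10):.1f} h (extrapolated)")
```

With the full 180-point data set, the same fit could be run on individual measurements rather than on two means.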

Repeatability is still stable, and reproducibility is also good compared with our spec tolerance. Still, paired t-tests between the first and fourth repeats were conducted to evaluate the short-term degradation risk due to native oxidation.

The difference is statistically significant but not practically significant compared with the spec tolerance. So there is little concern about measurement degradation of any part within four repeats, and the crossed ANOVA GRR model is safe to use.
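The statistical-versus-practical distinction can be sketched with a paired t statistic on per-part differences (fourth repeat minus first), followed by a comparison of the mean shift with the 6-angstrom tolerance. The repeat values are hypothetical; for a p-value, `scipy.stats.ttest_rel` performs the same paired test.

```python
import math
import statistics


def paired_t(first, fourth):
    """Mean paired difference (fourth - first), t statistic, and degrees of freedom."""
    diffs = [b - a for a, b in zip(first, fourth)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    t_stat = mean_d / (statistics.stdev(diffs) / math.sqrt(n))
    return mean_d, t_stat, n - 1


def share_of_tolerance(mean_diff, tolerance=6.0):
    """Practical check: the mean shift as a fraction of the spec tolerance."""
    return abs(mean_diff) / tolerance


# Hypothetical first and fourth repeats (angstroms) for four parts.
first = [100.00, 99.60, 100.40, 100.10]
fourth = [100.03, 99.62, 100.42, 100.13]
mean_d, t_stat, dof = paired_t(first, fourth)
print(f"mean shift = {mean_d:.3f} A, t = {t_stat:.2f} (df = {dof}), "
      f"{share_of_tolerance(mean_d):.2%} of tolerance")
```

A very consistent small shift gives a large t statistic even though it uses well under 1% of the tolerance, which is the situation described above.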

So far we have discussed measurement capability. How about process capability? Process capability, Cp, is calculated from ICC and the P/T ratio. Here, ICC is 0.9 and the P/T ratio is 8.88%. The resulting Cp is greater than 2 and falls into the green region, which means the process is capable, the measurement is capable, and both are stable within 6 hours.

However, because ICC depends heavily on sample selection, it is less reliable than the P/T ratio. So if we want to make an adjustment, our first move should be to keep ICC fixed and move P/T horizontally. Keeping ICC fixed and moving the P/T ratio from 0.08 to 0.16, we reach the green boundary. In this case, the spec tolerance is tightened from 6 to 3.35 angstroms.
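One way to connect ICC, P/T, and Cp is sketched below. It assumes P/T = 6 * sigma_m / tolerance, Cp = tolerance / (6 * sigma_total), and ICC = sigma_part^2 / sigma_total^2, which together give Cp = sqrt(1 - ICC) / (P/T). JMP's internal formula may differ. Because P/T scales as 1 / tolerance when the gauge is unchanged, the tolerance at the Cp = 2 boundary follows directly.

```python
import math


def observed_cp(icc: float, p_t: float) -> float:
    """Cp = sqrt(1 - ICC) / (P/T), under the assumptions in the lead-in."""
    return math.sqrt(1.0 - icc) / p_t


def p_t_at_cp(icc: float, cp_target: float) -> float:
    """P/T that gives cp_target when ICC is held fixed."""
    return math.sqrt(1.0 - icc) / cp_target


ICC, P_T, TOL = 0.9, 0.0888, 6.0  # values stated in the talk
cp_now = observed_cp(ICC, P_T)
p_t_boundary = p_t_at_cp(ICC, 2.0)
# Same gauge, tighter spec: P/T grows as 1 / tolerance.
tol_boundary = TOL * P_T / p_t_boundary
print(f"Cp = {cp_now:.2f}; P/T at Cp = 2: {p_t_boundary:.3f}; tolerance there: {tol_boundary:.2f} A")
```

This gives a P/T of about 0.158 and a tolerance of about 3.37 angstroms, close to the 0.16 and 3.35 reported; the small gap presumably comes from rounded inputs. Under the same assumptions, sqrt(1 - ICC) is also the P/TV ratio, and sqrt(0.1) of about 0.316 agrees with the 31% reported earlier.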

How about other risks? Three graphs show the P/T ratio, alpha risk, and beta risk as the tolerance is reduced from 100% to 30% of its current value. As the tolerance is reduced, the P/T ratio increases to 29.6%, which is marginally acceptable. Alpha risk stays under 5%. Beta risk goes beyond 10% when the tolerance is reduced by 40%. Based on these three criteria, we can tighten the tolerance to between 3 and 3.6 angstroms, about 50% to 60% of the current tolerance.
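The P/T curve in the first graph follows from simple arithmetic: with the gauge unchanged, P/T scales as 1 / tolerance. The sketch below assumes only that scaling and the reported P/T of 8.88%; it does not reproduce the alpha and beta risk curves, which also depend on the part distribution.

```python
def p_t_vs_tolerance(p_t_now, fractions):
    """P/T when the tolerance is scaled to each fraction of its current value."""
    return {f: p_t_now / f for f in fractions}


table = p_t_vs_tolerance(0.0888, [1.0, 0.6, 0.5, 0.4, 0.3])
for f, pt in table.items():
    print(f"tolerance {f:.0%} ({6.0 * f:.1f} A): P/T = {pt:.1%}")
```

At 30% of the current tolerance this gives the 29.6% quoted above.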

This graph summarizes how we iteratively and continuously improve process and measurement capability in different scenarios. When Cp is greater than 2 and P/T is less than 0.3, marked by the light green stars, we should consider tightening the spec until Cp equals 1.33, to be ready for further improvement.

When Cp is less than 1.33 and P/T is less than 0.3, marked by the blue star, we should improve part-to-part process capability and reduce ICC until Cp equals 2.

When Cp is less than 1.33 and P/T is greater than 0.3, marked by the orange star, we should consider optimizing GRR performance to reduce the P/T ratio below 30% while improving Cp at the same time. That is how we decide whether to improve the measurement or the process in different cases, and this is how we conduct GRR analysis based on different queue time levels. Thank you for listening.
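The three scenarios in the decision graph can be encoded as a small rule function. This is a hypothetical sketch covering only the three cases described in the talk; any other combination of Cp and P/T is reported as not covered rather than guessed.

```python
def recommend(cp: float, p_t: float) -> str:
    """Hypothetical encoding of the three decision scenarios from the talk."""
    if p_t < 0.3 and cp > 2.0:
        return "tighten the spec toward Cp = 1.33"  # light green stars
    if p_t < 0.3 and cp < 1.33:
        return "improve part-to-part process capability until Cp reaches 2"  # blue star
    if p_t > 0.3 and cp < 1.33:
        return "optimize GRR to bring P/T below 0.3 while improving Cp"  # orange star
    return "not covered by the three scenarios"


# Current state from the talk: Cp above 2 with P/T = 0.0888.
print(recommend(3.56, 0.0888))
```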