Hello, good morning, and good evening, everyone.
My name is Raisa.
I'm a manufacturing quality engineer at Applied Materials, Taiwan.
I started to learn JMP at the beginning of this year.
Recently, in July, I passed this certification exam
with a score of 925.
Today, I'd like to give a short presentation
about QN Immediate Fix Time Analysis with JMP.
As we know, once a quality notification (QN) is created,
it takes additional time, more or less, to fix the issue,
which may impact production planning and scheduling.
Therefore, we'd like to find the worst case through analysis.
Okay.
For the analysis, there are five subtopics on the agenda.
First, the Root Cause Analysis of QN Fix Cycle Time,
the Graphical Root Cause Analysis Summary,
a comparison of the Fit Model, Partition, and Neural models,
and then Hybrid Text Mining
and Data Mining Analysis.
Finally, Takeaway Learnings.
Okay, let's get started.
The Histogram.
The first layer of the Root Cause Analysis of QN Fix Cycle Time.
Before investigating, think about this:
what scenarios impact QN fix cycle time,
and how long is endurable?
First,
define five days as the criterion
and also the key condition, following the company-wide wording:
within five days,
we are in spec, which meets the success criteria.
On the other hand, over five days is out of spec and goes to failure analysis.
Then, I break the data directly down into SA (success analysis) and FA (failure analysis).
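The five-day rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming each QN record carries its fix time in whole days; the record IDs and values are hypothetical, not taken from the QN database.

```python
from collections import Counter

# Hypothetical QN records: (qn_id, fix_days). Values are illustrative only.
records = [("QN-001", 2), ("QN-002", 5), ("QN-003", 9), ("QN-004", 34), ("QN-005", 1)]

THRESHOLD_DAYS = 5  # success criterion from the talk: within five days is in spec

def classify(fix_days, threshold=THRESHOLD_DAYS):
    """Return 'SA' (in spec, within the threshold) or 'FA' (out of spec)."""
    return "SA" if fix_days <= threshold else "FA"

labels = {qn: classify(days) for qn, days in records}
print(labels)
print(Counter(labels.values()))
```

Treating exactly five days as in spec follows the wording "within five days"; if the business rule is strict, change `<=` to `<`.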
Notice the shape of the distribution between SA and FA.
Meanwhile, look at the Mosaic Plot
for the proportion of each category
to infer a potential root cause.
For the success analysis,
we can see that Workmanship and MFG rework
seem to have a quick response and better fix cycle time.
For FA, dimension issues take more fix cycle time.
There is obvious variation in FA time.
From the distribution between SA and FA,
we suppose the defect type is one of the key factors, X₁,
that impacts the fix cycle time.
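The mosaic plot stacks, inside each SA or FA bar, the share of each category. A minimal Python sketch of those within-group shares follows; the (group, category) pairs are hypothetical and stand in for the real QN table.

```python
from collections import Counter, defaultdict

# Hypothetical (group, category) pairs; group comes from the five-day SA/FA rule.
rows = [
    ("SA", "Workmanship"), ("SA", "MFG rework"), ("SA", "Workmanship"), ("SA", "Dimension"),
    ("FA", "Dimension"), ("FA", "Dimension"), ("FA", "Damage"), ("FA", "Workmanship"),
]

def mosaic_proportions(pairs):
    """Share of each category within each group (the stacked segments of a mosaic bar)."""
    counts = defaultdict(Counter)
    for group, category in pairs:
        counts[group][category] += 1
    return {
        group: {cat: n / sum(c.values()) for cat, n in c.items()}
        for group, c in counts.items()
    }

for group, shares in mosaic_proportions(rows).items():
    print(group, {k: round(v, 2) for k, v in shares.items()})
```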
The Box Plot.
The box plot is a graph of the distribution of a continuous variable.
Therefore, plot the continuous fix cycle time versus a nested structure,
categorical country under containment,
to check whether other factors impact fix cycle time.
It displays the five-number summary of a set of data.
It is a non-parametric tool that uses the median as the measure of central tendency.
Besides, there are some observations on the box plot graph.
First, at least seven points are needed to detect the first outlier.
Otherwise, when the sample size is less than seven,
it becomes a whisker (skew) group problem.
Second, observe a skewed distribution
by box width or whisker length.
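The five-number summary, the usual 1.5 × IQR whisker rule for flagging outliers, and a quartile-based skew hint can be sketched in Python as below. This is an illustration under assumptions: the quartile method (`"inclusive"`) is chosen for the sketch and JMP may compute quantiles differently, and the fix times are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Min, Q1, median, Q3, max. The quartile method is an assumption for this sketch."""
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, med, q3, max(data)

def tukey_outliers(data, k=1.5):
    """Points beyond Q1 - k*IQR or Q3 + k*IQR (the common box-plot whisker rule)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

fix_days = [1, 2, 2, 3, 3, 4, 5, 6, 8, 34]  # hypothetical fix cycle times in days
summary = five_number_summary(fix_days)
print(summary)
print(tukey_outliers(fix_days))

# A longer upper half of the box (Q3 - median > median - Q1) hints at right skew.
_, q1, med, q3, _ = summary
print("right-skewed" if (q3 - med) > (med - q1) else "not right-skewed")
```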
How do we handle marginal outliers,
which we treat as two-sigma GRR noise near the whisker?
Back to the Root Cause Analysis:
it is not difficult to find that the fix cycle time
of the Replacement containment
is much longer than that of the other containments.
And with that, we get
X₂, containment, and X₃, country, here.
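Comparing medians for country nested under containment is what the box plot makes visible. A minimal Python sketch, with hypothetical rows and a hypothetical "Other" containment level, shows how the Replacement group would stand out:

```python
import statistics
from collections import defaultdict

# Hypothetical rows: (containment, country, fix_days). "Other" is a placeholder level.
rows = [
    ("Replacement", "Taiwan", 30), ("Replacement", "Taiwan", 41),
    ("Replacement", "United States", 12), ("Rework", "Taiwan", 3),
    ("Rework", "United States", 2), ("Other", "Taiwan", 1),
]

def median_by_nested_group(rows):
    """Median fix days per (containment, country): country nested under containment."""
    groups = defaultdict(list)
    for containment, country, days in rows:
        groups[(containment, country)].append(days)
    return {key: statistics.median(vals) for key, vals in sorted(groups.items())}

medians = median_by_nested_group(rows)
for (containment, country), med in medians.items():
    print(f"{containment:12s} {country:14s} median={med}")
```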
Heatmap.
Heatmap is another graphical tool
to depict data values by color.
Again, so far we have gathered three input factors:
defect type from the histogram, and containment and country from the box plot.
Next, we further study
how all the input factors impact fix cycle time.
Here, put the categorical column defect type on the Y axis,
color by cycle time,
and keep the nested structure:
categorical country under containment in the X and Group X zones.
Then use an 8 by 9 layout that looks balanced
to quickly catch the maximum
and minimum cycle time scenarios.
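Behind the heatmap, each cell is a summary of cycle time for one defect type and one containment/country pair. The Python sketch below assumes the cell color encodes the mean; the rows are hypothetical.

```python
import statistics
from collections import defaultdict

# Hypothetical rows: (defect_type, containment, country, fix_days).
rows = [
    ("Damage", "Replacement", "Taiwan", 34), ("Damage", "Replacement", "Taiwan", 30),
    ("Dimension", "Replacement", "United States", 20), ("Labeling", "Rework", "Taiwan", 1),
    ("Dimension", "Rework", "Taiwan", 4),
]

def heatmap_cells(rows):
    """Mean fix days per cell: Y = defect type, X = (containment, country)."""
    cells = defaultdict(list)
    for defect, containment, country, days in rows:
        cells[(defect, (containment, country))].append(days)
    return {cell: statistics.mean(v) for cell, v in cells.items()}

cells = heatmap_cells(rows)
worst = max(cells, key=cells.get)  # the "red" cell
best = min(cells, key=cells.get)
print("longest:", worst, cells[worst])
print("shortest:", best, cells[best])
```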
For FA, it is easy to find a little red area, right?
The red area highlights the longest fix cycle time.
With that, Replacement, Taiwan, damaged parts
is the worst case for cycle time.
Replacement, United States,
dimension issue parts is the second-worst scenario.
For SA, except for dimension and damage defects,
the others are easy to fix quickly.
For the Pareto Chart,
to further analyze FA and SA from the heatmap,
we use a two-dimensional Pareto Chart by two variables:
defect type and country under a specific containment.
Here are X₁ defect type,
X₂ containment, and X₃ country, which we mentioned before.
Then add an additional factor, workstation, X₄, here
in the Pareto Chart to visualize event frequency.
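A Pareto chart ranks categories by frequency and adds a cumulative percentage line. Here is a minimal Python sketch that treats each (defect type, country) combination as one category; the events are hypothetical.

```python
from collections import Counter

# Hypothetical FA events, each labeled by (defect_type, country).
events = [
    ("Damage", "Taiwan"), ("Damage", "Taiwan"), ("Dimension", "United States"),
    ("Damage", "Taiwan"), ("Workmanship", "Taiwan"), ("Dimension", "United States"),
    ("Damage", "Taiwan"),
]

def pareto_table(items):
    """Categories sorted by descending frequency, with cumulative percentage."""
    counts = Counter(items).most_common()
    total = sum(n for _, n in counts)
    table, running = [], 0
    for category, n in counts:
        running += n
        table.append((category, n, 100.0 * running / total))
    return table

for category, n, cum_pct in pareto_table(events):
    print(f"{' / '.join(category):28s} {n:3d} {cum_pct:6.1f}%")
```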
Now, for FA (failure analysis),
we see that replacement for Taiwan supplier damage issues
frequently happens on the CVD service floor,
and replacement for United States supplier dimension issues
often happens at the CVD workstation.
In the same way, for SA (success analysis),
except for dimension issues and damage defects,
United States supplier functional and workmanship issues
can be fixed quickly at the CVD module test.
Now we have four input factors and the SA and FA frequencies.
For inference, what we are more interested in
is the pass or fail frequency and the fix cycle time.
Next, Tabulate.
Here, put the previously mentioned factors X₁ to X₄ on Tabulate.
Meanwhile, take fix cycle time and frequency into account in Tabulate
for further comparison.
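The Tabulate step reports, for each X₁ to X₄ combination, the frequency (N) and the mean fix cycle time. A rough Python equivalent over hypothetical rows:

```python
import statistics
from collections import defaultdict

# Hypothetical rows: (defect_type, containment, country, workstation, fix_days).
rows = [
    ("Damage", "Replacement", "Taiwan", "CVD service", 30),
    ("Damage", "Replacement", "Taiwan", "CVD service", 38),
    ("Measurement", "MFG rework", "United States", "CVD module test", 1),
    ("Dimension", "Replacement", "United States", "CVD workstation", 12),
]

def tabulate(rows):
    """N and mean fix days for each X1-X4 combination."""
    cells = defaultdict(list)
    for *keys, days in rows:
        cells[tuple(keys)].append(days)
    return {k: (len(v), statistics.mean(v)) for k, v in cells.items()}

# Sort by mean fix days, longest first.
for key, (n, mean_days) in sorted(tabulate(rows).items(), key=lambda kv: -kv[1][1]):
    print(" | ".join(key), f"N={n}", f"mean={mean_days:.1f}")
```

Keeping N next to the mean matters: a combination with N=1 can show a very short or very long mean that is not reliable.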
For FA, on the CVD service floor, damage issues
from Taiwan suppliers requiring replacement
did take a longer cycle time.
Although the frequency is not the highest,
only seven times here,
the mean cycle time, 34 days, is much longer than the others.
For SA, at the CVD module test,
we can see a measurement issue from a United States supplier
fixed by MFG rework.
Even though the table shows only one day,
the frequency is far too low to be conclusive.
Here I summarize the main points.
First, for Root Cause Analysis,
use different graphical JMP platforms
in an engineering and logical sequence
to conduct a proper Root Cause Analysis.
In the previous slides, I showed the Histogram, Box Plot,
Heatmap, Pareto Chart, and Tabulate.
Second, identify potential input Xs
to predict QN fix cycle time.
According to the Tabulate, the FA results from damage issues,
replacement, Taiwan suppliers, and the CVD service workstation.
Next, build a model to predict QN fix cycle time
and validate the root cause.
Before going into each model's details,
I'd like to introduce model selection and comparison first.
The Fit Model considers data structure and distribution.
Here are some challenges in the Fit Model.
For the skewed distribution, I used a log transformation, but it did not help.
All input variables, X₁ to X₄, are categorical.
After I filtered out 60% of the workstation categories,
R-square increased by only 6%.
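With categorical inputs, R-square compares the error around fitted values with the error around the grand mean. For a least-squares model with a single categorical X, the fitted value is simply each level's mean, which the Python sketch below uses. With several categorical Xs, a main-effects model no longer reduces to cell means, so this is only the one-factor case. The levels and values are hypothetical.

```python
import statistics
from collections import defaultdict

def r_squared_categorical(xs, ys):
    """R-square of a least-squares fit with one categorical X (fitted value = level mean)."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    means = {g: statistics.mean(v) for g, v in groups.items()}
    grand = statistics.mean(ys)
    sst = sum((y - grand) ** 2 for y in ys)         # total sum of squares
    sse = sum((y - means[x]) ** 2 for x, y in zip(xs, ys))  # residual sum of squares
    return 1 - sse / sst

workstation = ["A", "A", "B", "B", "C", "C"]  # hypothetical levels
fix_days = [2, 4, 10, 14, 3, 3]
r = r_squared_categorical(workstation, fix_days)
print(round(r, 3))
```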
Check dependency among the categorical variables
with the correspondence analysis plot.
The risk is low: the closer points are to the origin,
the less distinct they probably are.
In other words, the farther away, the more distinct.
Second, proximity between labels probably indicates similarity.
For the partition tree model,
the pluses are that it is a distribution-free model,
splits are based on the available data,
and there is little overfitting concern,
but the minus is the recursive split.
Therefore, use JMP Predictor Screening, which uses a random forest,
to average over recursive partitions
and find the top five input factors with their ranking.
It is a convenient and quick way to find important inputs
to optimize or improve the model.
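A partition tree is built from repeated single splits, and a forest repeats this over many resampled trees. The Python sketch below shows one split of a categorical X for a continuous Y using plain sum-of-squares reduction. It is not JMP's LogWorth criterion, and the defect levels and days are hypothetical. For squared error, ordering the levels by mean Y and scanning the cut points is enough to find the best two-way split.

```python
import statistics
from collections import defaultdict

def sse(values):
    """Sum of squared deviations from the mean."""
    m = statistics.mean(values)
    return sum((v - m) ** 2 for v in values)

def best_categorical_split(xs, ys):
    """Best two-way split of one categorical X by largest SSE reduction."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    levels = sorted(groups, key=lambda g: statistics.mean(groups[g]))
    total = sse(ys)
    best = None
    for cut in range(1, len(levels)):
        left = [y for g in levels[:cut] for y in groups[g]]
        right = [y for g in levels[cut:] for y in groups[g]]
        gain = total - sse(left) - sse(right)
        if best is None or gain > best[0]:
            best = (gain, levels[:cut], levels[cut:])
    return best

defect = ["Labeling", "Labeling", "Dimension", "Dimension", "Damage", "Damage"]
days = [1, 2, 8, 10, 30, 38]  # hypothetical fix days
gain, left, right = best_categorical_split(defect, days)
print(left, "|", right, round(gain, 2))
```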
Regarding the neural network,
the pluses are that it is a strong transformation model
with a two-step training and validation process.
However, the minus is a significant overfitting concern.
Which model is more appropriate to believe?
Let's go through each model's results.
Back to the Fit Model, main effects only.
The R-square isn't high,
only 30%,
because the data is severely right-skewed.
We observed a significant level of risk,
so a max R-square of just around 47% is not worth it.
I also used a log transformation of the cycle time variable
to avoid a negative lower bound in the 95% confidence interval,
but it did not help much, so the log option was dropped.
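The negative lower bound comes from building a symmetric interval on right-skewed data. The Python sketch below compares a simple normal-theory 95% range for individual fix times (mean ± 1.96 SD) on the raw scale and on the log scale after back-transforming. The talk does not specify which interval JMP reported, so this is an approximation under that assumption, and the data are hypothetical and must be positive for the log.

```python
import math
import statistics

Z95 = 1.96  # normal approximation, an assumption for this sketch

def raw_interval(data):
    """Mean +/- 1.96 SD on the original scale; can go negative for right-skewed data."""
    m, s = statistics.mean(data), statistics.stdev(data)
    return m - Z95 * s, m + Z95 * s

def log_interval(data):
    """Same range built on log(days), then back-transformed; always positive."""
    logs = [math.log(x) for x in data]  # requires every value > 0
    m, s = statistics.mean(logs), statistics.stdev(logs)
    return math.exp(m - Z95 * s), math.exp(m + Z95 * s)

fix_days = [1, 1, 2, 2, 3, 4, 30, 45]  # hypothetical right-skewed cycle times
print(raw_interval(fix_days))
print(log_interval(fix_days))
```

The log version removes the negative bound, but it does not raise R-square by itself, which matches the observation that the log option did not help much.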
Next is the Partition Tree Model.
Here are three partition models:
the baseline model,
model augmentation,
and model simplification.
Through a series of improvements based on engineering and logical thinking,
R-square improved to 62% from the 38% baseline model.
I will show all the details step by step in the following slides.
Model augmentation.
During this step, we improved the model by 20%.
Where does it come from?
First, a 10% improvement here,
from changing QN age to immediate fix cycle time, based on practical experience.
Second, 6% from adding one X factor, workstation.
Remember, it is the X₄ we discovered from the Pareto Chart.
UD code becomes less critical, dropping from 26% to only 8%.
Third, another 4% from changing UD code to containment.
Now, check the contribution ranking here.
Number two becomes workstation instead of country.
In model simplification,
we improve R-square by an additional 6%.
Before model simplification,
the plus is that all scenarios are under consideration,
but the minus is that too many categories might dilute predictive power.
In the simplified model,
filter out minor categories with fewer counts,
for example, removing 60% of the workstation categories;
the total count decreased from 426 to 270.
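Filtering out minor categories means dropping rows whose category appears too rarely. The talk does not state the count threshold, so `min_count` below is a hypothetical parameter, and the rows are illustrative.

```python
from collections import Counter

def drop_minor_categories(rows, key, min_count):
    """Keep only rows whose category (row[key]) appears at least min_count times."""
    counts = Counter(r[key] for r in rows)
    return [r for r in rows if counts[r[key]] >= min_count]

# Hypothetical rows; "Rare station" stands in for a low-count workstation.
rows = [
    {"workstation": "CVD service", "fix_days": 30},
    {"workstation": "CVD service", "fix_days": 12},
    {"workstation": "CVD service", "fix_days": 8},
    {"workstation": "PVD mechanical", "fix_days": 4},
    {"workstation": "PVD mechanical", "fix_days": 6},
    {"workstation": "Rare station", "fix_days": 2},
]
kept = drop_minor_categories(rows, "workstation", min_count=2)
print(len(rows), "->", len(kept))
```

Raising `min_count` trades scenario coverage for predictive power, which is the plus/minus described above.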
Check the contribution ranking again:
defect type and workstation are still number one and number two.
Now, we have more confidence to use the model
to predict FA and SA.
For Partition Tree Model optimization,
as I mentioned before,
the major contributors are defect type and workstation,
around 80%, following the Pareto concept.
Compare defect type in SA and FA prediction.
For SA here, it's a labeling issue.
That makes sense: we don't spend much time fixing that kind of issue.
For FA, it's damage.
Yes, that takes much more cycle time.
As for the workstation comparison between SA and FA,
PVD mechanical and CVD module tester,
it still needs further analysis and understanding.
As for the FA country here, the prediction profile is flat.
Doesn't country impact QN fix cycle time?
Is that right?
To answer this question, I introduce the model limitation
of recursive partitioning:
sequential dependency risk.
The factor country is split six times,
and only once in the higher cycle time cluster.
Such recursive dependency may limit the predictive model.
The third model,
Neural Network (Artificial Intelligence).
Here we observe a severe overfitting concern
between the training and validation R-square.
If the R-square gap between the training set and validation set
is over 20%, there is an overfitting concern.
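The overfitting rule of thumb can be written as a small check. This sketch assumes that "over 20%" means training R-square minus validation R-square greater than 0.20 (an absolute gap). The actual and predicted values are hypothetical.

```python
def r_squared(actual, predicted):
    """1 - SSE/SST for one data set."""
    mean = sum(actual) / len(actual)
    sst = sum((a - mean) ** 2 for a in actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - sse / sst

def overfit_flag(r2_train, r2_valid, max_gap=0.20):
    """True when training R-square exceeds validation R-square by more than max_gap."""
    return (r2_train - r2_valid) > max_gap

# Hypothetical actual vs predicted fix days for the training and validation sets.
r2_train = r_squared([2, 4, 10, 14], [2.5, 4.0, 9.5, 14.0])
r2_valid = r_squared([3, 9, 20, 5], [6.0, 6.0, 6.0, 6.0])
print(round(r2_train, 3), round(r2_valid, 3), overfit_flag(r2_train, r2_valid))
```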
Besides, we find that in the neural model,
the number one ranking is workstation,
and number two is Fault By,
which is different from the previous partition model.
For SA, the workstation is staging,
CCT staging, where materials are brought together
before entering the MFG floor,
and it doesn't have a complicated operation process.
Once an issue happens, it can be fixed quickly.
That makes sense.
For FA, the CVD service floor
has a complicated operation process.
Yes, it does have a longer cycle time to treat difficult issues.
So far, we have three models,
the Fit Model, the Partition Tree Model, and the Neural Model.
Which model is more appropriate and closer to reality?
Therefore,
model comparison and selection.
From the graphical Root Cause Analysis tools,
damage issues requiring replacement
from Taiwan at the CVD service floor are the worst scenario, with the longest fix cycle time.
Currently, the Neural model identifies the same scenario
as the Graphical Root Cause Analysis;
the only concern is overfitting risk.
Besides, the three models give very close predictions
of the worst cycle time, within 1.2 days.
Finally, I will introduce the Text Mining and Data Mining Hybrid.
Currently, the QN database still has text fields
that we can search for more information about long cycle times in the QN text variables.
Use JMP Text Explorer to discover frequent keywords,
such as, circled here, replace, rework, dimension, and the F10246
project, for further analysis,
then convert them to binary indicators,
and conduct further Data Mining and Root Cause Analysis
on the F10246 case via a heatmap.
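JMP Text Explorer does this interactively. As a rough plain-Python stand-in, assuming simple lowercase tokenization with no stemming or stop-word handling, the sketch below counts term frequencies and converts chosen keywords into 0/1 indicator columns. The comments are hypothetical.

```python
import re
from collections import Counter

# Hypothetical QN free-text comments.
comments = [
    "Replace damaged bracket, F10246 dimension out of spec",
    "Rework weld, F10246",
    "Label missing, rework at staging",
    "Replace part due to dimension issue",
]

def tokenize(text):
    """Lowercase alphanumeric tokens (no stemming: 'damaged' stays distinct from 'damage')."""
    return re.findall(r"[a-z0-9]+", text.lower())

term_counts = Counter(t for c in comments for t in tokenize(c))
print(term_counts.most_common(5))

KEYWORDS = ["replace", "rework", "dimension", "f10246"]

def indicators(text, keywords=KEYWORDS):
    """One binary column per keyword: 1 if it appears as a token, else 0."""
    tokens = set(tokenize(text))
    return {k: int(k in tokens) for k in keywords}

for c in comments:
    print(indicators(c))
```

The indicator columns can then be used like any other categorical X, for example as axes in the heatmap.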
Here, put the dimension indicator under F10246,
and the containment indicators, replace and rework, on Y.
According to the heatmap results, F10246
did suffer much longer fix cycle time
than other projects, judging by color.
Checking the dimension indicator, we observe that
not only dimension issues but also various other defects cause
long cycle time, even if they are just fixed by rework.
In the end, here are my takeaway learnings.
JMP graphical platforms are very powerful for conducting
deeper root cause analysis through an engineering,
logical, data-driven process.
Compare and select the more appropriate JMP model
from the classic Fit Model, Partition, and Neural Network
by knowing each model's limitations
and risks, and connecting back to the previous Graphical Root Cause Analysis.
Conduct a hybrid Text Mining and Data Mining Root Cause Analysis
on the complicated QN database.
Finally, I'd like to thank GCI MBB Charles Chen, my project mentor.
That's all for my presentation.
Thank you for your time and attention.
Thank you.