Exploring JMP DOE Design and Modeling Functions to Improve Sputtering Process (2022-US-30MP-1126)

Ting Cheng, Process Engineer, Applied Materials
Cui Yue, Process Engineer, Applied Materials

In an AMAT Six Sigma Black Belt project on the PVD sputtering process, the project goals were to optimize several film properties. A Pugh matrix was used to select the most feasible hardware design. PVD sputtering process parameters (Xs) were chosen based on the physics of the PVD sputtering process.

To improve the design structure, a definitive screening design and design augmentation were used to avoid Resolution II and III confounding. An RSM model and least squares fit were used to construct predictive models of these film properties. In addition to main effects, several interaction effects were found to be significant in the reduced RSM model. Each interaction effect uncovered potential insights from two competing PVD sputtering physics models.

To further optimize the multi-step sputtering process, a group orthogonal supersaturated design (GOSS) DOE was used to optimize each sputtering process block. The film properties (Ys) compete with one another when searching for the optimal design. Three JMP techniques were used to find the optimal robust design:

  • Set up a simulation-based DSD to optimize the desirability functions.
  • Conduct Profiler Robust Optimization to find the optimal design.
  • Run a Monte Carlo simulation to estimate the non-conforming defect %.

By using these JMP 16 modeling platforms and techniques, this Black Belt project was very successful, not only in improving the film properties but also in furthering the understanding of how the physics interact in the process. This multifaceted JMP DOE, modeling, and simulation-based approach became the benchmark for several new projects taking a similar approach.

Hi. Hello, everyone. This is Cheng Ting here, and the other presenter today will be Cui Yue. Today, we are going to share with you our experience of exploring JMP DOE design and modeling functions to improve the sputtering process.

First of all, let's have a brief background introduction to the project. This is a Black Belt Six Sigma DMAIC project regarding the sputtering process. The project goals were to optimize several film properties. In the define phase, we identified three CTQs and three correlated success criteria.

CTQ1 is the most important and challenging one. Its success criterion is a measurement result larger than 0.5. CTQ2 and CTQ3 are equally important; the success criterion for both of them is a measured result less than 0.05.

Different JMP tools have been applied extensively throughout the measure phase, analyze phase, and improve phase. We will share our experience of using the JMP tools here. In the measure phase, we did an MSA and finalized the data collection for the baseline hardware.

Three tuning knobs, namely X₁, X₂, and X₃, are involved. After data collection, we used a Monte Carlo simulation in JMP to analyze the baseline capability; this is the first tool we introduce today. In order to establish the baseline model, we used augmented DOE, RSM, the prediction profiler, as well as the interaction profiler functions in JMP.

After entering the analyze phase, in order to do root-cause analysis and capability analysis, we used the goal plot, desirability functions, multivariate methods, and graphical analysis tools.

In the improve phase, we did a hardware modification, where another tuning knob, X₄, was introduced. Interactive graphs, augmented DOE, RSM, desirability functions, and the interaction profiler were used in this section to further improve the process. The GOSS stepwise fit and desirability functions were then used in the robust DOE modeling to improve the process further. Some of these tools will be demonstrated by Cui Yue later. The control phase will not be covered in today's presentation.

As we mentioned, in the measure phase, after the baseline collection, we used the Monte Carlo simulation to understand the baseline process capability. These are the baseline capabilities for the three CTQs. We can see that all of them follow normal distributions, indicating a strong linear correlation with the input parameters.

For CTQ1, the sample mean is out of the spec limit; thus, we have a very negative Ppk value. 99% of the baseline process results cannot meet the CTQ1 spec at all. CTQ2 is the closest to spec among the three CTQs. Its sample mean is lower than the upper spec limit; thus, we have a positive Ppk. Still, 48% of the baseline process results do not meet the spec for CTQ2. The sample mean for CTQ3 is out of the spec limit as well; thus, it has a negative Ppk, and 64% of the baseline process results did not meet the spec. This baseline capability confirmed that the biggest challenge for this project is CTQ1.
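For readers following along outside JMP, the capability numbers above come from the standard Ppk formula, Ppk = min((USL − μ)/3σ, (μ − LSL)/3σ), plus a tail-probability estimate for the non-conforming fraction. Below is a minimal Python sketch of that calculation; the means, sigmas, and spec limits are illustrative stand-ins chosen to mimic the CTQ1 and CTQ2 behavior described above, not the project's actual data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins for the baseline CTQ distributions; the real
# means, sigmas, and spec limits come from the project's measured data.
n = 100_000
ctq1 = rng.normal(loc=0.30, scale=0.08, size=n)    # spec: >= 0.5 (LSL)
ctq2 = rng.normal(loc=0.0495, scale=0.01, size=n)  # spec: <= 0.05 (USL)

def ppk(sample, lsl=None, usl=None):
    """Ppk = min((USL - mean) / (3*sigma), (mean - LSL) / (3*sigma))."""
    mu, sigma = sample.mean(), sample.std(ddof=1)
    candidates = []
    if usl is not None:
        candidates.append((usl - mu) / (3 * sigma))
    if lsl is not None:
        candidates.append((mu - lsl) / (3 * sigma))
    return min(candidates)

print("CTQ1 Ppk:", ppk(ctq1, lsl=0.5))             # strongly negative
print("CTQ1 % fail:", 100 * np.mean(ctq1 < 0.5))   # roughly 99%
print("CTQ2 Ppk:", ppk(ctq2, usl=0.05))            # slightly positive
print("CTQ2 % fail:", 100 * np.mean(ctq2 > 0.05))  # roughly 48%
```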

Apparently, the baseline process condition, which we will call process condition 1 here, cannot meet our CTQ success criteria, especially for CTQ1. We will need to tune the process condition. However, before that, we need to know whether the current hardware can meet the requirement or not. The subject matter experts (SMEs) proposed two hypotheses and advised us to shift the process to condition 2 based on the second hypothesis. Before doing so, we needed to check whether the prediction model is good for both conditions.

Hence, we used a scatter plot to check the collected data structure. As we see here, the collected data is not in an orthogonal structure. This is because we actually used a two-step evaluation design and widened the process range to meet the success criteria of CTQ1. We do have weak prediction capability in the whiter area. However, we still have good prediction for condition 1 and condition 2. We also did a confounding analysis; in fact, there is a certain confounding risk at Resolution II between X and X₃. Nonetheless, we still built a prediction model.

We used the response surface method for the fitting. In this case, the main effects, interactions, and quadratic terms are fit together. Based on the R², the adjusted R², and the p-value, we can see it's a valid prediction model.
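For context, the "main effects + interactions + quadratics" structure of an RSM fit can be reproduced outside JMP with any least-squares tool. Here is a minimal Python sketch using statsmodels; the data and coefficients are hypothetical placeholders, and the printed summary is where you would read off the R², adjusted R², and term p-values mentioned above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical runs standing in for the collected data: 30 settings of
# (X1, X2, X3) with a made-up CTQ1 response.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.uniform(0, 1, size=(30, 3)), columns=["X1", "X2", "X3"])
df["CTQ1"] = (0.3 + 0.4 * df.X1 - 0.2 * df.X2 + 0.5 * df.X1 * df.X3
              - 0.3 * df.X2 ** 2 + rng.normal(0, 0.02, 30))

# Full RSM model: main effects, all two-way interactions, and quadratics.
model = smf.ols(
    "CTQ1 ~ (X1 + X2 + X3)**2 + I(X1**2) + I(X2**2) + I(X3**2)", data=df
).fit()
print(model.summary())  # check R-squared, adjusted R-squared, and p-values
```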

From the effect summary, we can see that only the significant terms are included in the model. With the interaction profiler, we can see two interactions which correlate with the two hypotheses we mentioned before.

With the prediction profiler, we picked process condition 2. At this process condition, the 95% confidence interval of CTQ1 is between 0.5 and 0.6; this CTQ has been tuned into spec. However, in the meantime, CTQ2 and CTQ3 are out of spec.

Hence, we used the goal plot to compare the two process conditions and realized that we improved CTQ1 from process condition 1 to process condition 2 by getting closer to the target and narrowing down the standard deviation. In the meantime, however, CTQ2 and CTQ3 were compromised, with larger standard deviations and a greater distance to the target. Hence, there is a tradeoff among the three CTQs.

In this case, we tried to find an optimized solution with the desirability functions in JMP. For CTQ1, the success criterion is more than 0.5. Here, we used the maximum-plateau shape when setting the desirability function, meaning any value more than or equal to the target will be equally preferred. We also highlighted the importance of CTQ1 by setting its importance factor to 1.5. For CTQ2 and CTQ3, the success criterion is less than 0.05. Hence, we used the minimum-plateau shape when setting the desirability: any value less than or equal to the target will be equally preferred.
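For readers who want the mechanics behind these settings: each response is mapped to a 0-to-1 desirability that is flat at 1 beyond its target (the plateau) and ramps down elsewhere, and the overall desirability is an importance-weighted geometric mean. The Python below is a minimal sketch with placeholder ramp endpoints and predictions, not JMP's exact implementation.

```python
import numpy as np

def desirability_max_plateau(y, low, target):
    """1.0 at/above target, 0.0 at/below low, linear ramp in between."""
    return np.clip((y - low) / (target - low), 0.0, 1.0)

def desirability_min_plateau(y, target, high):
    """1.0 at/below target, 0.0 at/above high, linear ramp in between."""
    return np.clip((high - y) / (high - target), 0.0, 1.0)

def overall(d_values, importances):
    """Importance-weighted geometric mean of individual desirabilities."""
    d = np.asarray(d_values, dtype=float)
    w = np.asarray(importances, dtype=float)
    return float(np.prod(d ** (w / w.sum())))

# Hypothetical predictions at one candidate setting; the ramp endpoints
# (0.3 and 0.10) are placeholders, not the project's actual values.
d1 = desirability_max_plateau(0.52, low=0.3, target=0.5)     # CTQ1 >= 0.5
d2 = desirability_min_plateau(0.06, target=0.05, high=0.10)  # CTQ2 <= 0.05
d3 = desirability_min_plateau(0.04, target=0.05, high=0.10)  # CTQ3 <= 0.05
print(overall([d1, d2, d3], importances=[1.5, 1.0, 1.0]))
```

An optimizer would then search over the factor settings for the combination that maximizes this overall value, which is what maximizing the desirability in the profiler does.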

However, after maximizing the desirability, the calculated optimized solution is only around 0.02, and none of the three CTQs can meet the success criteria. Hence, we can conclude that there is a hardware limitation in this case.

After discussion with the SMEs, we decided to introduce Y₄ into our data analysis. Y₄ is not a CTQ here, but it is a measurement that reflects an intrinsic property of the process. This intrinsic property affects CTQ2 and CTQ3 directly. If Y₄ is more than zero, it's a positive process, and when Y₄ is less than zero, it's a negative process. If Y₄ is close to zero, we call it a neutral process, which leads to smaller values of both CTQ2 and CTQ3.

Here is the distribution of Y₄ on the baseline hardware. As we can see, with the baseline hardware, Y₄ is always more than zero; that is, we will always have a positive process. The multivariate graph here shows the relationships among Y₄, CTQ2, and CTQ3. They are strongly correlated: if we have a smaller Y₄, we will also have smaller CTQ2 and CTQ3.

In order to have a wider range of Y₄, we decided to add another factor, X₄, in the improved hardware. Together, another two scientific hypotheses were proposed by the SMEs. We collected data on the new hardware and compared the Y₄ distributions on the two hardware configurations.

On the baseline hardware, without X₄, we collected data orthogonally and within a certain range for each factor. This is the distribution of Y₄ under these conditions. With the improved hardware, with X₄ introduced, we collected data in the same range of X₁, X₂, and X₃. This time, Y₄ at different X₄ values was also collected. Comparing the two distributions, we can see that without X₄, we only see one cluster for Y₄, with the peak value more than zero. However, with X₄ introduced, we can observe a bimodal distribution for Y₄, with one peak whose mean is more than zero and another peak whose mean is less than zero.

The process conditions that make Y₄ less than zero drew our attention. Under these process conditions, we will have a negative process, and this may help us to improve CTQ2 and CTQ3. We may choose such a process if we cannot meet all CTQs in one process only, because a neutral process benefits CTQ2 and CTQ3. We did a simple screen of the process conditions where we have a negative process, and this led us to a certain range of X₄. That's why we collected more data in this range: it is our condition of interest.

Now, we concluded that X₄ did impact Y₄, and thus it can impact CTQ2 and CTQ3. We can now further check the impact of X₄ on CTQ1 and build another model for the improved hardware.

Prior to the data collection, we prescreened the conditions of interest using the interactive graph in JMP. We would collect more data within a certain range of X₄, because this range of X₄ can give us negative Y₄ values. It also covers most of the range of Y₄.

As we can see here, before the data collection, this is not the most orthogonal structure, since we collected more data at the conditions of interest. Even so, after doing a design evaluation, we found a low confounding risk. The data structure is still good for modeling.
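A design evaluation of this kind can be approximated outside JMP by inspecting the correlation structure of the design columns. The sketch below uses a hypothetical design matrix; near-zero pairwise correlations and VIFs near 1 indicate low confounding risk, in the same spirit as the diagnostics described here.

```python
import numpy as np
import pandas as pd

# Hypothetical design matrix; in practice these are the planned run settings.
rng = np.random.default_rng(4)
design = pd.DataFrame(rng.uniform(-1, 1, size=(24, 4)),
                      columns=["X1", "X2", "X3", "X4"])

# Pairwise column correlations: values near 0 mean near-orthogonal factors;
# values near +/-1 flag confounding risk between the corresponding effects.
print(design.corr().round(2))

# Variance inflation factors from the inverse correlation matrix
# (VIF < 5 is a common rule of thumb for acceptable collinearity).
vif = np.diag(np.linalg.inv(design.corr().to_numpy()))
print(dict(zip(design.columns, vif.round(2))))
```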

This is the model we constructed. As we can see here, we have an adequate model. Only factors with a p-value less than 0.05 were included in the model. The R² is more than 0.8, the difference between the adjusted R² and the R² is less than 10%, and the p-value for the whole model is always less than 0.001. Also, through the interaction profiler, hypotheses 1 to 4 have been validated.

This time, can we find an optimized solution? Again, we ran the desirability function. The left side is the optimized solution provided by the baseline hardware before the X₄ installation, and the right side is the optimized solution with the improved hardware after the X₄ installation. As we can see, compared with the baseline hardware, the improved hardware did provide an optimized solution with higher desirability and improved results for each CTQ. However, the desirability is still low, only 0.27. Not all CTQs meet the success criteria in one step.

So we still did not find an adequate solution in a one-step process for the project. However, as we mentioned previously, since we have a cluster of process conditions that allows a negative process with Y₄ less than zero, we can propose a potential solution with a two-step process.

The solution with a two-step process may not be that straightforward. As we know, if we can find the optimized solution in one step, all we need to do is run the process at the conditions that give the maximized desirability. The result will be predictable since we have a known model for it.

Now we want to have a two-step process. For each step, we have a known model once the process condition is determined. However, due to the different process durations for each step, we will have a new model for the final results. In this new model, we will have nine variables in total: X₁ to X₄ for each step, plus the duration of each step. Now, the question is, how are we going to find a proper solution for the two-step process?

We've got two strategies. The first one is to do a DSD modeling for the nine variables. In this case, we will need at least 25 runs for the first trial. Of course, we will have an orthogonal data structure and the RSM model can be constructed, but the cost will be very high. The other strategy is to screen the design with a group orthogonal supersaturated design first, which is the GOSS design. In this case, we can screen the impact of seven variables with six runs. This is why it's supersaturated: we have more variables than data points. Of course, we will need to screen out two variables before the GOSS, and we used the interactive graph again for this; the details will be reviewed in the next slides.

The GOSS design provides two independent blocks, one for each process step. There are no interactions between the factors across blocks. The data structure is orthogonal within each block, making it possible to screen effects with a supersaturated data collection. However, this GOSS will show the impact of main effects only; no interactions will be considered. This is a low-cost approach from which we can plan further DOE design. The follow-up DOE can be a DSD, augmentation, or OFAT. Each of these has its own pros and cons, which will not be covered in this presentation.

Anyway, to save cost, we decided to proceed with strategy 2. We started with a GOSS design. As we discussed, we have nine different variables; however, in the GOSS, we can only include seven. In order to narrow down the parameters for the GOSS design, we did a simple screen with the interactive graph. For step 1, we chose the process conditions that allow us to have a good CTQ2. After screening, we decided to fix X₂ in this case, based on the previous learning. As seen here, when CTQ1 is more than 0.5, all we have is a positive process with Y₄ more than 20%. Hence, for step 2, we chose process conditions that allow us to have Y₄ less than -0.5, so we can have a negative process. In this case, adding the two steps together, the final Y₄ will be closer to zero, which improves CTQ2 and CTQ3. After screening, we decided to fix X₁ for this step, based on the previous learning.

After data collection, we did a stepwise fit with main effects only, since in the GOSS, as we mentioned previously, only main effects are considered. All three CTQs validated the model with p-values less than 0.05, adjusted R² around 0.8, and VIF less than 5.
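For readers who want the flavor of a main-effects-only stepwise fit with a minimum-BIC stopping rule, here is a simplified Python sketch. The run matrix and response are hypothetical, and this plain forward selection is a stand-in for, not a copy of, JMP's stepwise algorithm.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_stepwise_bic(X, y):
    """Forward selection over main effects, stopping at minimum BIC."""
    selected = []
    best_bic = sm.OLS(y, np.ones(len(y))).fit().bic  # intercept-only model
    improved = True
    while improved:
        improved = False
        for col in X.columns.difference(selected):
            trial = sm.OLS(y, sm.add_constant(X[selected + [col]])).fit()
            if trial.bic < best_bic:
                best_bic, best_col, improved = trial.bic, col, True
        if improved:
            selected.append(best_col)
    return selected, best_bic

# Hypothetical supersaturated-style block: 6 runs, several candidate factors.
rng = np.random.default_rng(2)
X = pd.DataFrame(rng.choice([-1, 1], size=(6, 4)),
                 columns=["X1", "X3", "X4", "t"])
y = 0.5 * X["X4"] + 0.2 * X["X3"] + rng.normal(0, 0.05, 6)
print(forward_stepwise_bic(X, y))
```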

After maximizing the desirability, the model provided us with an optimized solution with desirability more than 0.96, which is way higher than 0.27, not to mention 0.02. Hence, we can actually lock the process parameters and further verify the optimized solution in the next step, for which we chose to use OFAT. But this will not be covered here.

Here, I'd like to summarize what we have discussed in this presentation. We have shared with you our experience of using different JMP tools in the data analysis throughout the different stages of the DMAIC project. For the baseline capability analysis, we used Monte Carlo simulation, and we also used the goal plot. For the root-cause analysis, we used multivariate methods as well as graphical analysis. To help with the DOE, we used augmented DOE, GOSS, and design diagnostics. To obtain a good model and prediction, we used different fitting functions, the prediction profiler, and the interaction profiler. As mentioned, these profilers are not only used for modeling and prediction; they also help us gain a deeper understanding of the process itself. To screen out the conditions of interest, we used the interactive graph, which is simple but very useful and powerful. For decision making, we used the desirability functions to help us make decisions.

So far, we have shared with you our experience of how JMP can help us do the analysis. Last but not least, we would like to thank Charles for his mentorship along the way. Thank you. My partner Cui Yue will now share with you some demonstrations on the JMP side. She will demonstrate how we can use the interactive graphical analysis as well as the stepwise fitting in the GOSS model. Okay, thank you.

Thank you, Cheng Ting. I think you can see my screen now, right? First, I would like to introduce the interactive plot Cheng Ting just mentioned. This is actually one of my personal favorite functions in JMP; it's simple, but it's very powerful. Here, our purpose is to screen which factor is most related to Y₄: which one, from X₁ to X₄, contributes the most to Y₄? We can simply select all the factors of interest here and click OK. Now we have the distributions of all the factors.

Now, as Cheng Ting mentioned, we want to know what contributes the most to Y₄ on the negative side; for example, here. We can see that only X₄ from 13-14, X₃ from 0-1, X₂ in this range, 2.5-5, and X₁ at 19-20 can make this happen. Thus, it's easy to read off a range for each factor, how it contributes to Y₄, and how we're going to choose the factors if we want Y₄ to reach a certain level.

Similarly, if we want it to be slightly higher than 0%, we can also simply click this area. Actually, to be quite straightforward and solve the problem in one shot, we just select these two together. Then we can see that X₄ should be in this range, maybe 10-14. X₃ definitely should concentrate at 0-1. X₂ has a slightly wider distribution. For X₁, there are only two candidates for this direction. From this, we can easily and intuitively find the contributing factors we want.
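Outside JMP, this linked-selection screen can be approximated by filtering the run table on the response and reading off the factor ranges of the selected rows. The sketch below uses a hypothetical run table with a made-up Y4 relationship, purely to illustrate the idea.

```python
import numpy as np
import pandas as pd

# Hypothetical run table; in the demo this is the collected process data.
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "X1": rng.uniform(15, 20, 200),
    "X2": rng.uniform(0, 5, 200),
    "X3": rng.uniform(0, 3, 200),
    "X4": rng.uniform(8, 16, 200),
})
df["Y4"] = 0.1 * (df.X4 - 13) - 0.05 * df.X3 + rng.normal(0, 0.1, 200)

# Emulate clicking the negative-side bars of the Y4 histogram: select the
# rows with Y4 < 0 and inspect the factor ranges of that selection.
neg = df[df.Y4 < 0]
print(neg[["X1", "X2", "X3", "X4"]].describe().loc[["min", "max"]])
```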

This is the first function I would like to demonstrate to you: the distribution function. There are many other things the distribution function can do, including examining the data distribution and running a lot of tests, [inaudible 00:24:49] tests, et cetera, so I won't go into them here.

The second thing I want to share with you is the GOSS, or actually the GOSS stepwise fit. Now, we have three CTQs, and the analysis dialog is open here for all of them; each of the three has a separate analysis dialog. To do it in a very straightforward way, we just hold Control and click Go. It will select the factors for all three CTQs at once. Very straightforward. These three will use the same stopping rule, which is the minimum BIC. Then, let's click Run Model. This fit model will give us our fits separately for CTQ1, CTQ2, and CTQ3.

Then, we need to modify, or reduce, the models one by one. Here, our criterion is to choose a p-value lower than 0.05. We define that value: when the p-value is lower than 5%, the factor is significant. Here, we can remove X₄. Next one, for CTQ2... For CTQ1, we are done here. For CTQ2, we can remove X and X₂. Okay, now both are lower than 0.05. For CTQ3, we can do the same thing accordingly. Let's remove X₄ first. All three factors' p-values are now lower than 0.05. We have obtained the reduced model.

Here at the bottom, we have a prediction profiler. If you don't have it, you can add it from the profiler function. Then, we would like to find the optimum condition. How are we going to do that? We are going to use the desirability function. The first step is always to set the desirability, which is already set here: we have one response to maximize and two to minimize. Then, let's use the Maximize Desirability function. Here, we can find our optimum condition. If we use Maximize and Remember, here is our optimal condition. Then, we can use this condition to run the process and validate the sequence again. These are the two functions I'd like to introduce to you. Okay, thank you.

Thank you.

Comments
MxAdn

Nice presentation on the use of JMP to solve a very challenging research problem. I am amazed at the success of GOSS with just a few experiments!