A Cross-step Insight With Paired Upstream and Downstream DOE (2023-EU-30MP-1335)

During process development, the operating conditions must be characterized and optimized at both the upstream and downstream levels. However, despite the well-known effect of upstream outcome properties on downstream performance, these two levels are often studied and tested separately, and the interlinked correlations are ignored. As a result, it is not uncommon to see cases where a significant improvement upstream is not retained downstream and, in some extreme cases, even causes additional downstream challenges and a lower overall process yield. Hence, it is important to have a comprehensive cross-step view in the design of experiments during process characterization. The JMP DOE toolbox is ideal for addressing such cases. In this work, a case study demonstrates a systematic two-step method that uses the JMP Custom Design toolbox and the covariate factors option for a cross-step experimental design. The design focuses on pairing the upstream and downstream experimental conditions in an efficient manner. Moreover, operational practicalities such as batch parallelization are included in the final design.

Hi, my name is Tannaz Tajsoleiman. Today, I'm going to talk about an application of the JMP data analysis platform in the pharmaceutical industry. The focus of this presentation is on how we can get more cross-level insight using a smarter Design of Experiments (DoE), and afterwards, how we can analyze it better in order to find optimum process settings, specifically for pharmaceutical processes.

But what do I mean by cross-level insight? To get to that, let me explain the case story we worked on with one of our customers. They were in the process development phase of a vaccine production, and they wanted to characterize this process. If we divide vaccine production into its different phases, we can split it into two main phases: Upstream and Downstream.

The Upstream phase is where they infect the host cells with a limited number of viruses and then keep them in a controlled environment under specific process conditions for the virus to grow until the population reaches the target level. After that, they move the material to the next unit, the Downstream unit, for purification and formulation to make the vaccine ready for injection and release to the market.

The customer was focusing on the process development of these two phases, meaning they needed to characterize the process. To do that, they needed to identify the most significant parameters for each of these units, both Upstream and Downstream, using Design of Experiments. Then they would collect the data to be able to model the process for each individual unit, and finally use the models to find the optimum robust settings for each of them.

It's a very typical task within the bio-industries to split the process into main units and then characterize each unit individually. In most cases, these characterizations are treated as if they do not influence each other, so they are run as completely independent tasks and Designs of Experiments.

Keeping this in mind, the agenda of this talk is how we can make these DOEs more descriptive and comprehensive, so that we get the most information about the different levels of the process, and afterwards, how we can use the JMP data analysis platform to first model the data and then use the model to optimize the processes.

Let's start with the Design of Experiments for this specific case study. We began by running several workshops with the teams from both Upstream and Downstream to figure out the most important factors or parameters in their processes.

We could easily see at first glance that we were ending up with such a high number of factors that it was practically impossible, or out of budget, to run such a big Design of Experiments. We had to narrow it down by scoring each of these parameters based on its importance, how much it varies within normal production, and how easy it is to change. With that scoring, we narrowed the number of parameters down to eight for each of the two units.

Within those workshops, we could also clearly see the strong influence of, for example, the Upstream outcome on the Downstream unit, simply because these two units are highly connected and can easily influence each other. This means that if they characterized the Downstream separately and found an optimum setting there, it most probably would not perform as well as expected, because it is highly influenced by the outcome of the Upstream unit.

Instead of two individual, independent Designs of Experiments, it is better to have a joint DoE that covers both sides together and also covers the interactions. That gives us a very good cross-factor overview within our Design of Experiments, helps minimize the number of runs, and also gives a much better randomization of our runs, of course under practical supervision.

How did we do this in practice? Before I jump to the demo, let me give you some information about the limitations we had in this study. First of all, each Upstream batch could feed directly into one Downstream batch. But in the Downstream, they had the possibility to run two reactors in parallel to increase the Downstream capacity.

The biggest limitation in the Downstream was that when they ran those parallel reactors, two of the design factors had to be set the same for both. In this case, they had to keep the Enzyme time and the Hold-up time constant across the two parallel reactors.

It was also very important to have proper time planning between Upstream and Downstream, meaning they couldn't have a big lag from the end of an Upstream batch until it reached the Downstream.

Between these two phases, we could also identify a factor called initial cell density, coming mostly from the Upstream, that was common to both units. We used it as a pairing factor to join the two DoEs and connect them into one combined design.

To start, we also needed to understand the minimum number of Upstream and Downstream batches required. As you can see here, both units had eight design factors, but since the Downstream process put the extra limitation on our design, it would be the one controlling the minimum number of experiments. That's why I start the DoE by looking at the Downstream process.

Let's have a look at the demo. As you can see here, these are the design factors we had for the Downstream unit. I want to know the minimum number of experiments, or batches, I need from the Upstream to be able to cover all the required batches in my DoE.

Starting with Custom Design, I can load my factors. These are all the factors I have for the Downstream, plus my initial cell density, the pairing factor that comes from the Upstream batches. As you can see here, almost all of these factors have their changes set to Easy, except the Hold-up time and the Enzyme time, which are the two factors I want to keep the same for the two batches running in parallel.

With this setting, I can be sure that those batches are set to have the same Hold-up time and Enzyme time. Moreover, I'm interested in the main effects in my model and the interactions. Let's have a look here. As you can see, some of the interactions are set to If Possible, and I want to have them as Necessary. Now we are good here.

As you can see, JMP suggests that I need a minimum of 19 experiments in the Downstream to cover these interactions. To be safe, we always want at least one extra on top. That gives me 20 experiments, or 20 Upstream batches, as the minimum starting point for the Upstream.
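
As a rough sanity check on where such a minimum comes from: a custom design needs at least as many runs as there are model parameters (intercept, main effects, and any interactions marked Necessary). The counts below are illustrative back-of-the-envelope numbers, not the actual model specification from this study.

```python
from math import comb

# Illustrative arithmetic: a custom design needs at least as many runs
# as model parameters to estimate.
n_factors = 9                                   # 8 downstream factors + the covariate
p_all_2fi = 1 + n_factors + comb(n_factors, 2)  # intercept + mains + all two-factor interactions
print(p_all_2fi)                                # 46 -- far too many to make all "Necessary"

# A suggested minimum of 19 runs would be consistent with the intercept,
# 9 main effects, and roughly 9 interactions flagged as Necessary:
print(1 + 9 + 9)                                # 19
```

This is why marking every interaction as Necessary would blow up the run budget, while a targeted subset keeps the design small.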

Okay, then let's have a look at my Upstream. These are my Upstream factors, and I can load them into my Custom Design. Here I am also interested in two interactions. Since we said we want 20 Upstream experiments, I can specify as a user how many runs I want. Then I can make the design by pressing this button.

To save time, I already saved the design, so let me bring it up here. This is the experimental design for my Upstream at the first level. Now, I want to use this design to design my Downstream experiment. To do that, I go back to my design factors for the Downstream. Again, Custom Design, and I load my factors as I want them to be.

Now, I want to include this designed Upstream experiment in my Downstream design. To do that, I first select that design window, then I can call it in, select the covariate factors, and import that experiment into my design. As you can see here, these are the runs that were designed for the Upstream, and they are set as covariates.

The next step is that I want all my Downstream parameters, plus my initial cell density, to appear in interaction terms as well. But the factors that come from the Upstream I keep as main effects only. As you can see, some are still set to If Possible; I need to be sure they are set as Necessary.

Since I already included the designed experiment, I can exclude the factors that come from the Upstream. Maybe I can just remove them all here and add them again: the main factors, the second-order terms, and then the factors from the Upstream only as main effects.

Then I can choose them. If I go up here and select perfusion rate, this one, this one, and this one, these are the factors coming from my Upstream, and I can set them to If Possible. I just need to be sure I have everything included, and this one also has to be If Possible. I don't need to force them into this design anymore because they are already included. Now, I can set it to have 10 whole plots and 20 runs and make the design.

To save time again, I have already made the design, and this is the outcome. Now I have a combined DoE for both the Upstream and the Downstream. The nice part is that I now have a column called Whole Plots, which tells me that within each whole-plot number, that is, within each pair of parallel units, I have a fixed Hold-up time and Enzyme time.

As you can see here, these values stay constant within each pair of runs. I am now combining both the Upstream and Downstream designs into one unit. I can also prioritize which Upstream batch has to run first to keep the time balance between the Upstream and the Downstream, and I have a lot of flexibility in keeping the conditions fixed between the parallel experiments. This gives me a very good overview of both levels and both phases together.
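
The pairing logic behind the Whole Plots column can be sketched in a few lines: the two hard-to-change factors are set once per whole plot and inherited by both parallel reactors, while the easy-to-change factors vary freely. This is a simplified illustration with made-up coded levels and factor names, not the optimized design JMP produces.

```python
import random

random.seed(1)

# Sketch of the split-plot pairing: hard-to-change factors (hold-up time,
# enzyme time) are set once per whole plot and shared by the two parallel
# downstream reactors; easy-to-change factors vary run to run.
levels = [-1, 1]
runs = []
for whole_plot in range(1, 11):                 # 10 whole plots -> 20 runs
    hold_up = random.choice(levels)             # constant within the whole plot
    enzyme = random.choice(levels)              # constant within the whole plot
    for reactor in ("A", "B"):                  # two reactors in parallel
        easy = {f"x{i}": random.choice(levels) for i in range(1, 7)}
        runs.append({"whole_plot": whole_plot, "reactor": reactor,
                     "hold_up": hold_up, "enzyme": enzyme, **easy})

# Verify the split-plot constraint: hard-to-change settings match within each pair
for wp in range(1, 11):
    pair = [r for r in runs if r["whole_plot"] == wp]
    assert pair[0]["hold_up"] == pair[1]["hold_up"]
    assert pair[0]["enzyme"] == pair[1]["enzyme"]
print(len(runs))  # 20
```
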

Let's go back to the slides. Now we have a good Design of Experiments covering both phases. Let's have a look at how we can do the analysis. To save time, I'm only going to look at the Upstream process and its data. The aim is to model the growth phase of the virus in the Upstream using kinetic modeling.

Once I have that model, I can find the optimal process conditions for the Upstream. How can we use the built-in functionality in JMP? This is my data set; at the moment, it's a normalized data set. As you can see, this is the data we collected during the experiment, across 30 different batches.

We ran different process conditions and monitored the growth phase of the virus over different days, and we could see how the virus population builds up. Under some conditions, after a while, the virus population starts to degrade and die off.

Now we want to characterize, or model, this growth profile of the virus to be able to predict the best combination of cells, virus, and environmental conditions, and also how many days we need to run the Upstream.

To do that, JMP has a very nice feature under Specialized Modeling called Fit Curve. I put the yield, the virus concentration over time, as Y; the day as X; then the batch number; and the factors controlling my process, which are the incubator, the temperature, and the initial virus concentration on a logarithmic scale.

Fit Curve gives me a very nice initial overview of how each of these batches builds up over time individually. But now I want to model these batches, this kinetic characteristic. We have a very nice option here in JMP called Exponential Growth and Decay, and then Fit Cell Growth. It's a built-in library that you can easily choose to fit a logistic growth model to your profiles.

As you can see here, this cell growth model tries to fit this function to each of my curves, covering Y0, the initial virus concentration; YMax, the maximum virus concentration reached; and then the division rate and the mortality rate. It then gives you a nice summary of each of these batches, showing the fitted values of each of these parameters for each batch.
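
For readers without JMP, the same kind of four-parameter growth-and-decay fit can be sketched with a generic nonlinear least-squares routine. The parameterization below, logistic growth from Y0 toward YMax at a division rate k, damped by a mortality rate d, is my assumption for illustration; JMP's Fit Cell Growth model may use a different functional form, and the data here are synthetic.

```python
import numpy as np
from scipy.optimize import curve_fit

# Assumed four-parameter growth/decay form (not necessarily JMP's exact
# parameterization): logistic growth damped by an exponential mortality term.
def cell_growth(t, y0, ymax, k, d):
    return ymax / (1.0 + (ymax - y0) / y0 * np.exp(-k * t)) * np.exp(-d * t)

# Synthetic data standing in for one measured batch growth profile
rng = np.random.default_rng(0)
t = np.linspace(0, 5, 25)
true_params = (0.05, 1.0, 3.0, 0.15)            # y0, ymax, k, d
y = cell_growth(t, *true_params) + rng.normal(0, 0.01, t.size)

# Nonlinear least-squares fit, as Fit Curve does per batch
popt, _ = curve_fit(cell_growth, t, y, p0=(0.1, 1.0, 1.0, 0.1), maxfev=10000)
print(np.round(popt, 2))  # recovered (y0, ymax, k, d), close to the true values
```

In JMP this fit is repeated per batch automatically; the table of fitted parameters per batch is what feeds the next analysis step.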

But I want to go one step further and figure out how each of my factors affects these process parameters. In the Pro version of JMP, there is a very nice option called Curve DOE Analysis that lets you analyze which factors impact each of the curve parameters: YMax, for example, or the division rate, the mortality rate, and so on.

As you can see here, it gives you a combined window where, for each of these four parameters, you can use Generalized Regression to analyze or model each of them individually. As an example, for YMax, I can see that the temperature and the initial virus concentration have a significant effect, and that the incubators do not behave the same. You really get a good overview of the different parameters.

Or if I go to mortality and look at the Profiler, I also see the effect of these two factors for each of the incubators. It has a very good user interface that combines all these analyses together and gives a very nice overview of the different factors and their effects on my process.

It also gives you a nice Profiler at the bottom, with good information about how my yield is affected if I change the incubator, the temperature, and the initial virus concentration. So if I want, for example, to reach my target within half a day, what should my process conditions be? I can easily use the desirability function here and maximize it.

What I see here is that if I want to reach my target and get the maximum virus concentration within half a day, I have to start with the highest initial virus concentration and the highest temperature. But if I give it more time and say, for example, that it's fine to run my Upstream for two and a half days, then the situation is different.

In that case, to get the higher yield, you need to start with a lower cell density, or virus density, and since the viruses don't like that high temperature over a longer run, you should go with a lower temperature. It's a very nice, intuitive way to play around with the different factors, and then, based on the variability of your system and the practicalities of your process, you can fine-tune and find the optimum robust process conditions.
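
The maximization the Profiler performs can be illustrated with a minimal "larger is better" desirability function in the Derringer-Suich style. The yield model below is a made-up stand-in that only mimics the talk's qualitative finding (short runs favor high temperature and high inoculum), not the customer's fitted model; all names and coefficients are hypothetical.

```python
import numpy as np

# "Larger is better" desirability: 0 below `low`, 1 above `high`, linear ramp between
def desirability(y, low, high):
    return np.clip((y - low) / (high - low), 0.0, 1.0)

# Hypothetical fitted yield surface in coded units (0..1): early harvests favor
# high temperature / high inoculum, longer runs favor the opposite
def yield_model(temp, v0, days):
    return 0.5 + 0.3 * temp * v0 * np.exp(-days) + 0.2 * days * (1 - temp) * (1 - v0)

# Grid search over coded settings for a fixed run length of half a day
temps = v0s = np.linspace(0, 1, 21)
best = max((desirability(yield_model(T, V, 0.5), 0.4, 1.0), T, V)
           for T in temps for V in v0s)
print(best[1], best[2])  # optimum sits at the high ends for a short run
```

Re-running the same search with `days=2.5` flips the optimum to the low ends, which is exactly the trade-off the Profiler lets you explore interactively.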

This is fantastic. But the biggest problem I have with Curve DOE Analysis is that I'm missing some very nice functionality from the standard modeling platforms. For example, among the diagnostic plots, I'm missing the studentized residual plot that helps me find outliers, so I cannot identify whether I have an outlier here or not.

Nor can I apply a transformation, such as a Box-Cox transformation. I really miss that functionality in this part, but I still have an option to compensate for it: extracting the parameter summary and then modeling each of these parameters individually. Let's do that.

I extracted the parameters, as you can see here, the group summary parameters, and added them to my data table. Now I can look at each of them individually and run the analysis to figure things out.

For example, I go for Y0, and then I have the option, if I go here, to apply my own modeling routine: include all the factors, use Generalized Regression, and then move on to figure out whether I need to apply a transformation or remove any outliers. For example, if I look at the YMax model and try to model it with the normal routine, I can see that I have one outlier I have to remove.

Or, for example, I might need to apply a transformation on my data first to clean it up and make it more normally distributed, and then continue with the rest of the analysis. Let me compare: now I want to remove the outlier here.

Once I remove the outlier, I see that I no longer need any Box-Cox transformation. What I see here is going to be the profile from the normal, or standard, modeling approach for each of the parameters. Whereas back in the Curve DOE Analysis, I was missing that diagnostic, so the Profiler I got there still had the outlier in the data set, and I couldn't see that; it was therefore showing something else.
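
The missing diagnostic is easy to reproduce outside the platform: studentized residuals scale each raw residual by its estimated standard deviation, so a genuine outlier stands out even though it inflates the overall error estimate. A minimal sketch on synthetic data with one planted outlier:

```python
import numpy as np

# Internally studentized residuals for a simple linear fit -- the diagnostic
# the Curve DOE Analysis lacks. Data are synthetic, with one planted outlier.
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 20)
y = 2.0 + 0.5 * x + rng.normal(0, 0.1, x.size)
y[7] += 2.0                                      # planted outlier at index 7

X = np.column_stack([np.ones_like(x), x])        # design matrix: intercept + slope
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
H = X @ np.linalg.inv(X.T @ X) @ X.T             # hat (leverage) matrix
n, p = X.shape
mse = resid @ resid / (n - p)
student = resid / np.sqrt(mse * (1 - np.diag(H)))  # studentized residuals

flagged = np.where(np.abs(student) > 3)[0]
print(flagged)  # the planted outlier stands out
```
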

If I do this first, my initial inspection, then I can come back here to my Curve DOE Analysis. I first need to remove this fit again, with Remove Fit. Then, for example, I can apply a local data filter, exclude my outlier, redo the modeling part, and move on with the rest.

So I have this option. And of course, I also have the option to force factors in the Model Launch, using the forced-terms control, to force in the factors that I could see in my previous analysis have to be there from the start. I have these two options to compensate for the missing diagnostics in this analysis.

But in general, it's a perfect tool to get a very good overview of the potential factors affecting my process.

Okay. To summarize how we can use this functionality: as you can see here, we started with fitting a curve, then we extracted the model parameters, and then we could investigate each of those parameters individually. Then we either force the significant factors in Fit Curve, or exclude the outliers, or apply beforehand any transformation that is needed, and finally we get a combined overview as such in the fit model. This is a very nice analysis and a very useful package of functionality for our case.

To summarize what we have done: first of all, I really want to emphasize that it's very important to know the complexity of the system, especially when we have different levels of a process that partly interact, before moving on to several DoEs that work independently and are used independently to characterize the system.

Try to combine these DoEs into one coherent design, because it really gives you a good overview of the multi-level interactions, and a lot of information that you would miss in practice if you ran several independent DOEs.

JMP also gives you a lot of flexibility in building up these DoEs and in implementing a lot of practical information in them. And at the end, we could see how the different data analysis functionalities and built-in libraries can help characterize different processes, in this case the cell growth process; how we can use some very powerful tools, especially in JMP Pro; and how we can interact with the other standard platforms in JMP to compensate for some missing features in the JMP Pro advanced analysis toolbox. With that, I want to thank you so much for your attention, and I hope you liked the presentation.