CMC, SVEM, Neural Networks, DOE, and Complexity: It’s All About Prediction - (2023-US-30MP-1499)

Philip Ramsey, Consultant and Professor, North Haven Group and University of New Hampshire
Patricia D. McNeill, Associate Director of Cell Culture Development, Lundbeck Seattle BioPharmaceuticals, Inc.

 

Scientists in biopharma working along the CMC development pathway are challenged by the increasing complexity of biologic-based therapeutics and processes that produce them. Many critical responses exist (often 10-20) that are themselves functions of numerous, highly interactive input process factors.  

 

We use a large case study to show that current experimental design strategies combined with current response surface methods are generally inadequate to deal with the complex kinetic behaviors observed to be ever-changing across the design space. The case study consists of a 7-factor, hybrid experimental design used to develop a bioprocess with 13 critical responses. Employing a combination of SVEM, neural network models, and the hybrid experimental design, we show that accurate predictive models can be estimated for all responses that interpolate satisfactorily throughout the design space. 

 

Furthermore, we show that the powerful tools in JMP, specifically the Prediction Profiler and the Design Space Profiler, are essential to visualizing and understanding the experimental design space and optimizing the bioprocess. We also discuss the need for new optimal design strategies. JMP Pro 17 is used throughout the talk.

 

 

I want to thank the JMP steering committee and the JMP organizers

for inviting Phil and myself to come and present our exciting talk

on CMC, SVEM, DOE, and Complexity: It's All About Prediction.

I want to start by thanking Dr. Tiffany Rao; she's been involved with the planning

and numerous conversations for the work that we're going to present today.

I'm going to do an overview, tell you who Lundbeck is, who I work for,

and then provide the background

for the DOE that we're going to talk about,

which is process development for a biologic drug.

Our case study follows what I've started to do for development,

which is to start with the first step of doing a DSD for mid-to-late-stage development,

then follow that with a second step

of augmenting with a space-filling design.

Then we are hoping to show you today that, for the analysis, SVEM

allows us to have better prediction for all of this work and allows us to have

better timelines for the work that we're doing.

Lundbeck is headquartered in Copenhagen;

we're over 6,000 employees in over 50 countries,

and we are striving to be the number one in brain health.

The part of the company that I work with is the CMC biologics group,

and we're basically located in the Copenhagen area

and in the Seattle area, where I'm located.

Let 's  talk  about  the  background  for   the  DOE  that  we 're  going  to  present  today .

The  process  that  we  want  to  develop for  drug  substance ,  for  these  biologics ,

we start with a vial of cells, we take those out of the freezer,

we  then  expand  in  shake  flasks , go  bigger  into  culture  bags ,

maybe  a  seed  bioreactor , then  to  a  production  bioreactor .

That  production  bioreactor goes  approximately  two  weeks .

We  have  complex  nutrient  feeds ,

we have pH control, temperature control, there's the base that we're adding.

Once  we  finish  that  14 -day  production , we  need  to  figure  out  a  way

to  get  the  cells  that  are  secreting our  molecule  into  the  supernatant .

How  do  we  separate the  cells  from  the  product ?

That  harvest  can  be  a  centrifuge , it  can  be  depth  filtration .

Then  we  pass  it  on to  our  downstream  colleagues .

They  first  usually  do  a  capture  step where  they 're  getting  rid

of  most  of  the  host  cell  proteins , the  host  cell  DNA .

But then we need to do two polishing steps where we're then saying,

"Okay ,  what  are the  product -related  impurities ?

Maybe  there 's  not  the  full  molecule  there , so  we  have  to  get  rid  of  those ."

Then finally, we have to make sure, through ultrafiltration and diafiltration,

that we can transfer it into the buffer

that it's going to be in when it is transferred for the patient's use,

and that it's also at the right concentration.

You  can  imagine , every  step  along  this  way ,

there  are  many  factors ,

there  are  many  knobs  that  we  can  turn to  control  this  process ,

make  sure  that  it 's  robust

and  we 're  making the  same  product  every  time .

When  we 're  focused on  treating  the  patient ,

we  also  want  to  focus  on  the  business .

We can't put all of our development resources into every molecule.

We  want  to  right -size the  research  that  we 're  doing

at  the  right  stage  of  the  product .

There 's  many  things that  could  kill  a  product ,

but  if  we  can  develop  this in  the  right  time  and  the  right  space

using  these  tools  from  JMP ,

we  can  shift this  development  timeline  to  the  left

and  we  can  also  reduce the  amount  of  resources

and  the  cost  to  the  company .

If  we 're  first  getting  a  molecule ,

that 's  when  you 're  going  to  start  looking at  your  categorical  factors .

We  might  be  doing  the  cell  line  screening .

We  want  to  make  sure  that  we  have the  right  cell  line

that 's  going  to  last  all  the  way through  commercialization .

For  the  downstream  group , they  may  be  looking  at  resins

for  both  upstream  and  downstream ,

looking at media and buffer components and the formulations of those.

That 's  when  you 're  making  sure that  you  have  the  right  thing ,

that 's  going  to  keep  you  going through  your  development  pathway .

But  then  once  you 're  in  the  clinic ,

now  you  want  to  really  start  to  gain understanding  of  the  process  parameters .

Our  strategy  is  to  start with  a  development  screening  design

and  we  want  to  be  bold in  our  level  settings  at  this  stage

and  I 'll  talk  a  little  bit  more about  that  later ,

for  the  late  stage  development .

Then  we  can  build  on  what  we  learned from  the  Definitive  Screening  Designs

by  augmenting  those  designs with  space -filling  or  other  designs

so  that  we  really  understand that  design  space .

What 's  different that  we 're  hoping  to  show  now

than  traditional  walks through  this  pathway

is  that  in  the  past , we 've  been  throwing  out

the  factors  that  we 've  said aren 't  important .

But  with  modern  designs and  modern  ways  of  doing  analysis ,

we  can  keep  all  of  the  factors and  all  of  the  work  that  we 've  done  so  far

and  gain  better  understanding of  the  whole  process ,

especially  with  biologics that  are  quite  complex .

Before I pass the baton to Phil, I just wanted to talk about one more thing.

Let's see if I can...

I'm going to minimize this screen just for a minute so I can show you this.

This  is  an  experiment  that  I  did to  prove  the  power  of  DOE  for  my  boss .

The full data set was an OFAT for pH, and the response was titer.

We  wanted  to  do very  many  different  levels

in  a  wide  range because  he  wasn 't  sure  at  the  time

that  we  were  going  to  be  able to  pick  what  the  optimized  level  was .

But  what  I  wanted  to  show  him  was  that ,

"Okay ,  we  did  this  experiment , we  have  all  of  this  data .

We  were  able  to  model where  the  optimized  condition  was ,"

and  that 's  shown  in  blue ,

and  that  turned  out to  be  the  correct  case .

When  we  tested  the  model , that  was  the  optimized  condition .

Let 's  pretend  now  that  we 're  starting , we  don 't  know  that  data .

If  we  had  picked  a  conservative range  setting  for  our  experiment ,

our  noise  to  signal  would  be  quite  high

and  so  we  would  have  missed finding  the  optimized  spot .

But  if  we  had  picked  a  wider  range in  our  settings

and  still  with  only  three  points ,

the  model  still  would  have  chosen the  optimized  spot .

What I'm going to challenge the subject matter experts to do,

when you're designing your DSDs, is to really be bold in your range setting.

You  will  still  find  the  optimized  spot

and  you  have  to  have  some  knowledge of  your  process  so  that  you  can  complete

the  design  of  experiment

and  have  all  of  the  runs at  least  have  enough  signal

that  you  can  measure and  then  subsequently  model .
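
To make that range-setting point concrete, here is a small toy simulation (all numbers invented, not the actual pH/titer data): with the same measurement noise, a conservative range buries the curvature in noise, while a bold range recovers the optimum from only three levels.

```python
# Toy simulation of the range-setting argument (invented numbers only).
import numpy as np

rng = np.random.default_rng(7)
TRUE_OPTIMUM = 7.0

def titer(ph, noise_sd=0.3):
    """Hypothetical quadratic titer response to pH, plus measurement noise."""
    ph = np.asarray(ph, dtype=float)
    return 10.0 - 0.5 * (ph - TRUE_OPTIMUM) ** 2 + rng.normal(0.0, noise_sd, ph.size)

def fitted_optimum(levels):
    """Fit a quadratic to three observed levels and return its stationary point."""
    levels = np.asarray(levels, dtype=float)
    c2, c1, _ = np.polyfit(levels, titer(levels), 2)
    return -c1 / (2.0 * c2)

print("conservative range:", round(fitted_optimum([6.8, 7.0, 7.2]), 2))  # often far from 7
print("bold range:        ", round(fitted_optimum([6.0, 7.0, 8.0]), 2))  # typically close to 7
```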

Once  you  learn from  your  Definitive  Screening  Designs

more  about  your  design  space , you  can  come  back

and  then  you  can  be  internal to  that  space .

That 's  when  you  augment with  a  space -filling  design .

Now  I 'm  going  to  pass  the  baton  to  Phil

and  he 's  going  to  take  you through  the  analysis .

Okay ,  thank  you . Thank  you ,  Patty .

We 're  going  to  talk  about  a  very  nice

and somewhat complicated experiment that Patty and her team ran.

They  do  a  lot  of  great  work and  they 're  big  advocates  of  DOE  and  JMP

and  I 'm  very  happy  they  let  me get  to  play  with  them  sometimes .

It 's  fascinating  work .

But  before  I  get  into  the  actual  analysis ,

I  wanted  to  talk  about a  few  relevant  concepts

that  members  of  the  audience  may

or  may  not  be  familiar  with , and  that  includes  complexity .

It 's  a  really  hot  topic  out  there .

I'll talk about what prediction actually is.

That  is  a  muddled  concept  to  many  people .

Then  from  there , I 'll  launch  into  talking  about

how  we  analyze  prediction and  how  we  did  with  Patty 's  experiment .

Complexity: a fellow named Daniele Fanelli

from the London School of Economics has written much about this,

and he calls it "the elephant in the room" that statistics and many,

what he calls, "metasciences" are ignoring, and they're ignoring it at their peril.

I  won 't  get  into  a  lot  of  detail .

You  can  look  him  up  on  the  internet , he  has  a  lot  of  videos  and  papers .

But  complexity  is  a  huge  problem .

It  is  staring  science  and  statistics

and  data  science  and  machine  learning in  the  face  and  it  needs  to  be  dealt  with .

At  present ,  we 're  not  really  dealing with  it  directly  in  statistics .

By  the  way there  are  now   whole  applied  math  programs

based  on  studying  complex  systems .

My  bottom  line  is ,  complexity  is  real .

Complexity  requires  new  thinking .

We  really  have  to  rethink DOE  and  analysis .

You 're  going  to  see  that for  complex  systems,

and we also have to understand something else. Systems theory 101 is that

complex  systems  are  defined by  their  interactive  behavior .

In  point  of  fact ,  main  effects are  actually  even  misleading .

You  have  to  somehow  be  experimenting

in  a  manner  that  you  can  capture this  interactive  behavior ,

and  you 're  going  to  see  current  strategies fall  short  of  that  goal .

Patty 's  already  mentioned  the  CMC  pathway .

Nowhere  is  this  problem  of  complexity more  obvious  than  in  bioprocesses .

You  have  complex  combinations of  biology  and  chemistry ,

and  interactions  are  everywhere .

When  I  talk  to  scientists in  biotechnology ,

they  know  right  up  front  we 're  dealing with  really  complex  interactive  systems .

But  first ,  I  need  to  point  out  prediction .

If  you 're  working  in  CMC  development  work , it 's  all  about  prediction .

The  ICH  guidelines  that  are  used by  scientists  in  the  CMC  development  work

don 't  specifically  say  prediction ,

but  if  you  read  what  they  say , it 's  all  about  prediction .

Basically ,  you 're  building  processes to  manufacture  biologics ,

and  with  the  new  cell  and  gene  therapies ,

these  processes  are  becoming hopelessly  complicated .

I  personally  rely  heavily on  the  scientists  to  explain  it  to  me ,

and  they 're  the  people who  really  make  all  the  decisions .

I 'm  the  helper ,  and  I 'm  very  happy to  be  there  as  part  of  it .

But  it 's  all  about  prediction .

That  is  not  how  many  scientists and  even  statisticians ,

have  viewed  CMC  work .

By  the  way ,  this  applies to  all  areas  of  science .

I 'm  focused  with  Patty on  the  CMC  development  pathway ,

but  prediction  is  important .

What  is  prediction ?

It 's  muddled . It 's  not  clearly  defined  in  disciplines .

Here 's  what  it  really  is and  how  I  define  it .

It 's  a  measure  of  how  well

models  that  you  develop interpolate  over  a  design  region .

In  other  words ,  we 're  going  to  fit  a  model to  what  we  call  a  training  set ,

and  then  we  need  some  way  of  knowing how  that  model  would  apply

over  the  whole  design  region .

In  CMC  work ,  especially  late  stage , that  is  very  important .

You need to be able to do that, as many of you know.

You  really  have  a  training  set to  fit  the  model .

That  training  set  in  no  way can  evaluate  prediction .

I  know  there 's  a  common  belief

you  can  evaluate  prediction on  training  sets .

You  simply  can  not .

You  must  have  a  test  set .

Also, I'll talk a little bit about what I see in dealing with scientists,

a lot of it in chemistries and biologics.

Again ,  I  do  a  lot  of  it  in  biotechnology ,

but  also  in  other  areas like  battery  technology ,  material  science .

It  is  becoming  very  obvious .

The  kinetics  are  complicated .

They 're  constantly  changing over  design  regions .

The  kinetic  behavior that  you  see  around  the  boundaries

is  often  very  different from  what 's  happening  on  the  interior .

Why  does  this  matter ?

Well ,  the  classic  approach to  response  surface ,

even  including  optimal  designs , relies  upon  what  I  call  boundary  designs .

Almost  all  of  your  observations  are  around the  boundaries  of  the  design  region .

In  point  of  fact , whether  people  want  to  hear  it  or  not ,

the  central  composite  design ,

commonly  used  in  response  surface ,

is  about  the  worst  design you  could  think  of  for  prediction .

The  interior  of  the  space  is  empty .

If  you  fit  these  models  on  the  boundary ,

and  then  you  predict what 's  happening  on  the  interior ,

it 's  not  prediction ,  it 's  speculation .

You  don 't  know . You  have  no  data .

I 'm  going  to  show  you in  the  case  study ,

you 're  probably  going  to  reach some  wrong  conclusions .

The  boundary  regions ,  indeed , often  behave  very  differently ,

and  we  have  a  need  to  reconsider our  approach  to  designs .

Another  issue

in  response  surface  and  statistics

is  this  ubiquitous  use of  full  quadratic  models .

They  are  not  sufficient to  model  complex  response  surfaces .

In  fact ,  they 're  far  from  it .

Unfortunately ,  I  get  a  lot  of  pushback

from  statisticians who  claim  it  is  good  enough .

My  answer  is , "Well ,  if  you  actually  use  designs

that  had  sufficient  interior  points ,

you'd quickly discover they don't fit well at all."

Again ,  trying  to  measure  prediction

on  the  interior  of  a  design  region using  boundary  designs  is  futile .

By  the  way ,  my  good  friend , the  late  John  Cornell  and  Doug  Montgomery ,

published  a  paper  on  this  in  1998 , and  I 'll  be  polite ,  they  were  ignored .

It  was  actually  somewhat  nastier than  ignored  by  the  statistics  community .

They  showed  in  the  paper that  full  quadratic  models

are  just  not  sufficient to  cover  a  design  region .

Patty mentioned SVEM, self-validated ensemble modeling.

It 's  an  algorithm .

I'm one of the co-developers, with Dr. Chris Gotwalt of JMP,

a  person  I  hold  in  very  high  regard .

I  won 't  get  into  the  algorithm  by  the  way ,

there  are  references  at  the  end where  you  can  go  and  learn  more  about  it .

It  has  been  talked  about at  discovery  conferences  actually ,

going  all  the  way  back to  Frankfurt  in  2017 .

But SVEM is an algorithm that allows you to apply machine learning methods.

Machine learning methods are all about predictive modeling.

Believe me, people in that field know a lot more than you may think

about prediction, and SVEM lets you apply those methods to data from small sets like DOEs.

I  won 't  get  into  SVEM .

It 's  a  whole  new  way  of  thinking about  building  predictive  models ,

and  I  think  it 's  in  its  infancy ,

but  it 's  already  proving  very  powerful and  useful  in  biotechnology .
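
The algorithm itself is covered in the references. Purely to convey the flavor, here is a hedged sketch of a SVEM-style procedure in Python: anti-correlated fractional training and validation weights, many refits, and an averaged (ensemble) prediction. This is an illustrative approximation, not JMP Pro's implementation or the Predictum add-in.

```python
# Hedged sketch of a SVEM-style procedure (illustrative only, not JMP Pro's
# implementation): fractionally weighted autovalidation plus model averaging.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import make_pipeline

def svem_like_predict(X_train, y_train, X_new, n_refits=100,
                      alphas=(0.01, 0.1, 1.0), seed=0):
    """Average predictions from many fractionally weighted lasso refits."""
    rng = np.random.default_rng(seed)
    y_train = np.asarray(y_train, dtype=float)
    n = len(y_train)
    preds = []
    for _ in range(n_refits):
        u = rng.uniform(size=n)
        w_train = -np.log(u)          # fractional training weights
        w_valid = -np.log(1.0 - u)    # anti-correlated validation weights
        best_model, best_err = None, np.inf
        for alpha in alphas:          # pick tuning by validation-weighted error
            model = make_pipeline(
                PolynomialFeatures(degree=2, include_bias=False),
                StandardScaler(),
                Lasso(alpha=alpha, max_iter=20000))
            model.fit(X_train, y_train, lasso__sample_weight=w_train)
            err = np.sum(w_valid * (y_train - model.predict(X_train)) ** 2)
            if err < best_err:
                best_model, best_err = model, err
        preds.append(best_model.predict(X_new))
    return np.mean(preds, axis=0)     # the ensemble (averaged) prediction
```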

Let 's  get  to  the  experiment .

This  is  actually  a  hybrid  experiment that  Patty  and  her  team  created .

There  are  seven  factors and  there  are  13  responses .

But  due  to  time  constraints , I 'm  only  going  to  focus  on  four ,

and  even  that 's  going  to  be  hard to  get  it  all  in .

The  data  and  the  experiment are  highly  proprietary .

I  do  thank  Lundbeck  and  Patty for  actually  allowing  us  to  use

an  anonymized  version  of  this  design .

I  have  a  lot  of  case  studies , some  of  them  similar  to  this ,

and  the  people  who  own  the  data

wouldn 't  even  let  me  discuss  it if  I  anonymized  it .

That  was  very  nice  of  them .

I  think  we  have a  really  important  story  to  tell  here .

This  is  a  hybrid  design .

It 's  comprised  of  a  19 -run Definitive  Screening  Design

around  the  boundaries .

Then it has 16 space-filling points on the interior.

There  are  center  points in  both  parts  of  the  design .
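
As a generic illustration of what the interior portion of such a hybrid design can look like (this is not the proprietary Lundbeck design), a Latin hypercube in coded units is one way to generate the space-filling block; scipy stands in here for JMP's Space Filling Design platform.

```python
# Generic illustration (not the actual proprietary design): 16 interior
# space-filling runs for 7 coded factors via a Latin hypercube, plus a center point.
import numpy as np
from scipy.stats import qmc

n_factors, n_interior = 7, 16
sampler = qmc.LatinHypercube(d=n_factors, seed=1)
unit_points = sampler.random(n=n_interior)                                 # in [0, 1]^7
interior = qmc.scale(unit_points, [-1.0] * n_factors, [1.0] * n_factors)   # coded [-1, 1]

center = np.zeros((1, n_factors))                                          # shared center point
space_filling_block = np.vstack([interior, center])
print(space_filling_block.round(3))
```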

How  would  we  analyze  this ?

Well ,  what  I  want  to  do  is  discuss the  strategies  of  analysis  that  are  used ,

the  algorithms  that  are  used , and  make  comparisons  to  SVEM .

I 'll  tell  you  in  advance , SVEM  is  going  to  do  very  well .

Then  we 'll  talk  about  some  of  the  issues with  the  models  themselves

and  how  we  use  them .

I 'm  going  to  do what  most  people  currently  do .

I 'm  going  to  take  the  boundary  points , the  DSDs ,

fit models, and then apply them to the space-filling design points as a test set

and  see  how  well  my  model  interpolates .

Step  two ,  I 'll  reverse  the  process .

I 'll  fit  models to  the  space -filling  points ,

and  then  I 'll  use  the  DSD  as  a  test  set and  see  how  well  my  model

actually  extrapolates a  little  bit  to  the  boundaries .

Three  is  a  common  strategy used  in  machine  learning .

I 'm  going  to  use  a  holdback  test  set .

I'm going to take the 35 runs and break them up.

I did this in a way to make them both as equivalent as I could:

a training set containing both SFD and DSD points,

and then also a holdback test set that has a representation of both.

Then  finally ,  step  four , what  many  people  would  automatically  do .

I 'll  just  fit  models to  the  whole  data  set .

In  general ,  I  don 't  recommend  this because  there 's  no  way  to  test  the  model .

I  will  say  up  front ,

because  we  do  have  a  lot of  space -filling  points  on  the  interior ,

I 'm  more  comfortable  with  this  approach than  I  am  in  practice .

But  these ,  I  find ,  are  the  four basic  strategies  that  would  be  used .
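
For readers who want to mimic this comparison outside JMP, here is a hedged sketch of the four strategies expressed as train/test splits. The DataFrame, the column names, and the fit_fn callback are hypothetical placeholders, not the actual analysis scripts.

```python
# Hedged sketch of the four train/test strategies, assuming a pandas DataFrame
# `df` with factor columns, a response column, and a `design` column marked
# "DSD" or "SFD" (all names are hypothetical placeholders).
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

def rase(y_true, y_pred):
    """Root average squared prediction error on a test set."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def evaluate_strategies(fit_fn, df, factors, response):
    """fit_fn(X, y) must return a fitted model with a .predict method."""
    dsd, sfd = df[df.design == "DSD"], df[df.design == "SFD"]
    strategies = {
        "1: train DSD, test SFD": (dsd, sfd),
        "2: train SFD, test DSD": (sfd, dsd),
    }
    # 3: stratified holdback containing both DSD and SFD points.
    train, test = train_test_split(df, test_size=0.3, stratify=df.design, random_state=1)
    strategies["3: mixed holdback"] = (train, test)
    # 4: fit to everything (no honest test set; scored on itself).
    strategies["4: all data"] = (df, df)
    results = {}
    for name, (tr, te) in strategies.items():
        model = fit_fn(tr[factors], tr[response])
        results[name] = rase(te[response], model.predict(te[factors]))
    return results
```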

How  do  I  analyze  it ?

Well ,  if  you  have  a  DSD ,

people  like  to  use Fit  Definitive  Screening ,

I 'll  look  at  it ,  it  only  applies  to  DSDs .

Honestly ,  it 's  not  really a  predictive- modeling  strategy ,

nor  do  they  claim  it  is .

But  I  find  people  seem  to  use  it  that  way .

I 'll  use  Forward  Selection .

If  you  know  what  the  AICc  statistic  is , we 'll  do  that  in  GenReg ,  in  JMP  17 .

Then  we 'll  look  at  something they  have  in  GenReg  that 's  very  nice .

That  is  the  SVEM  algorithm .

I 'm  going  to  use  that   with  Forward  Selection .

Then  I 'm  going  to  look at  something  people  may  not  know .

It 's  a  hidden  gem  in  JMP .

Something called Model Averaging in the Stepwise platform.

John Sall put it there many years ago.

I  think  he  was  being  very  insightful .

Then  we 're  going  to  talk  about SVEM  and  Neural  Networks .

Basically ,  no  software  does  this .

I have worked with Predictum,

some of you know Wayne Levin and Predictum, to develop an add-in to do this.

It 's  currently  the  only   software  available  that  does  this .

The  SVEM  add -in  was  used to  do  the  Neural  Networks .

I  won 't  get  into  the  add -in  particularly ,

I 'll  just  quickly  show  people where  these  things  are .

Then finally, I said the fourth strategy used the whole data set.

Because I get asked about this all the time,

I just threw in some K-fold cross-validation to use

with the SVEM methods and some of the other methods.

Those  are  the  methods  we 'll  use

and  for  methods  like  Fit  Definitive ,

Forward Selection and Model Averaging methods,

we 'll  assume  a  full  quadratic  model as  that  is  the  tradition .
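
For reference, the full quadratic model for seven factors has 7 main effects, 21 two-factor interactions, and 7 squared terms, 35 terms before the intercept, which is as many terms as there are runs in this experiment; that is one reason some form of selection or shrinkage is unavoidable. A quick sketch (the factor values here are just random placeholders):

```python
# The "full quadratic" expansion for 7 coded factors: 7 main effects,
# 21 two-factor interactions, and 7 squared terms (35 terms plus an intercept).
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.random.default_rng(0).uniform(-1, 1, size=(35, 7))   # placeholder 35-run matrix
quad = PolynomialFeatures(degree=2, include_bias=False)
X_quad = quad.fit_transform(X)

print(X_quad.shape)                    # (35, 35): as many model terms as runs
print(quad.get_feature_names_out())    # 'x0', ..., 'x0 x1', ..., 'x0^2', ...
```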

The  other  methods ,  again , we 're  going  to  use  a  Neural  Network

which  is  more  flexible .

There  are  four  responses, and  this  is  really  important .

I  didn 't  randomly  select  them .

There  are  four  of  them and  they  vary  in  complexity .

Again ,  I 'll  admit  this  is  subjective .

There  is  no  internationally   approved  measure  of  complexity

and  this  is  based  upon  the  ability to  model  the  responses .

Again ,  there  are  13  responses .

Typically ,  in  CMC  pathway  work , there  are  10 -20 ,  maybe  more ,

most  of  them  critical  quality  attributes .

They  are  important

and  they  vary  within  the  experiment from  some  are  fairly  low  in  complexity ,

some  are  very  high , very  difficult  to  model .

Frankly ,  in  those  cases ,

Neural  Networks  are  basically   your  only  option .

So  pay  attention  to  this   because  this  complexity

turns  out  to  be  very  important in  how  you  would  go  about  modeling .

Then  the  question  is   if  I 'm  going  to  evaluate  prediction ,

well ,  how  do  I  do  that ?

Remember ,  I  prefer  prediction  be   on  an  independent  test  set

with  new  settings  of  the  factors .

That 's  how  we  judge  interpolation .

Well, something called the Root Average Squared Error,

or RASE score, is very common.

This  is  the  standard  deviation of  prediction  error .

Again ,  it 's  commonly  used to  judge  how  well  you  predict .
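
For reference, on a test set of n runs the RASE is just the square root of the average squared prediction error:

```latex
\mathrm{RASE} \;=\; \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
```

where the y_i are the observed test-set responses and the ŷ_i are the model predictions at the same factor settings.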

Smaller  is  better ,  obviously ,

but  there  is  a  problem  with  it that  we 've  particularly  uncovered ,

especially  in  simulations .

Models  with  low  RASE  scores  often   have  substantial  prediction  bias  in  them .

In  prediction ,  there  really  is still  a  bias -variance  trade -off .

So  how  do  we  evaluate  bias ?

Well ,  there 's  no  agreed  upon approach  to  that  either .

But  the  easiest  way and  the  most  visual  way

is actual-by-predicted plots on a test set.

Ideally, if you were to fit a slope

to the actual-by-predicted plot (I'll show an example),

the ideal prediction equation would have a slope of one with an intercept of zero.

The farther the slope is from one, the greater the bias.

For  purposes  of  demonstration , I 'm  going  to  set

a specification of 0.85 to 1.15 with a target of 1 for the slope.

If  you  can  stay  within  that  range ,

then  I 'd  say  you  probably  have acceptable  amounts  of  bias .

In  reality  that  happens  to  be more  of  a  subject  matter  issue .

Then finally I said, "Well, you can fit a slope to the actual-by-predicted plot.

There 's  an  additional  problem ."

The  predictor  is  the  predicted  values .

They  have  a  lot  of  error  in  them .

So  this  is  actually  an  errors and  variables  problem ,

which  is  not  commonly  recognized .

But  JMP  17  has  a  really  nice  solution .

It 's  called  the  Passing -Bablok   modeling  algorithm

and it's been well-established, especially in biopharma.

This  fits  a  slope ,  taking  into  account errors  in  X ,  the  predictor .

So  how  does  it  work ?

Well ,  it  fits  a  slope .

If  you  look  on  the  left ,

you'll see the slope is about 0.5. We have strong bias.

There 's  a  lot  of  prediction  bias .

What  I  really  like  in  the  application in  JMP ,  they  give  you  the  reference  line .

The  dashed  blue  line  is  the  ideal  line slope  of  one ,  intercept  of  zero .

On  the  left ,  our  predictive  model   is  showing  a  lot  of  bias .

It 's  systematically  not predicting  the  response .

To  the  right ,  is  a  case  where   there 's  actually  a  small  amount  of  bias

in  general ,  that  would  be  acceptable .

By the way, these were picked as models

that  had  relatively   low  overall  RASE  scores .

These  are  called  the  Passing -Bablok  slopes

and  they  are  integral   to  how  I  evaluate  prediction ,

the  overall  RASE  and  the  slopes .
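
For anyone who wants to reproduce the slope outside JMP, here is a minimal sketch of the basic Passing-Bablok estimator (no tie handling, no confidence intervals, and no guards for degenerate cases; JMP 17's implementation is the one to use in practice):

```python
# Minimal sketch of the basic Passing-Bablok estimator for actual-by-predicted
# pairs: allows for error in both x (predicted) and y (actual).
import numpy as np

def passing_bablok(x, y):
    """Return (intercept, slope) of the Passing-Bablok fit."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    slopes = []
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx, dy = x[j] - x[i], y[j] - y[i]
            if dx == 0.0:
                continue                      # skip undefined pairwise slopes
            s = dy / dx
            if s != -1.0:                     # slopes of exactly -1 are excluded
                slopes.append(s)
    slopes = np.sort(np.array(slopes))
    k = int(np.sum(slopes < -1.0))            # offset for the shifted median
    m = len(slopes)
    idx = (m - 1) / 2 + k                     # shifted median position (0-based)
    lo, hi = int(np.floor(idx)), int(np.ceil(idx))
    slope = 0.5 * (slopes[lo] + slopes[hi])
    intercept = float(np.median(y - slope * x))
    return intercept, slope

# Usage idea: slope near 1 and intercept near 0 indicate little prediction bias.
# a, b = passing_bablok(predicted_test, actual_test)
```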

What  I 'm  going  to  do  at  this  point ,

I 'm  going  to  actually  go  over  to  JMP ,   if  you  don 't  mind .

I 'll  make  a  quick  change   in  the  screen  here

and  I 'll  make  this  as  big  as  I  can for  everybody .

Overall  in  this  exercise ,

I  fit  close  to  140  models  and  I  did  them all  individually  and  evaluated  them .

Yes ,  it  took  quite  a  while

and  I 'm  going  to  show  a  graphic   to  try  to  summarize  the  results

for  the  different  methods .

I 'm  going  to  open a  Graph  Builder  script .

I 'll  make  this  as  big  as  I  possibly  can for  everyone .

I 'm  using  some  local  data  filters ,   to  define  the  display .

Notice  we  have  four  training  scenarios .

I 'll  start  with  where the  DSD  is  the  training  set .

We  fit  models  to  the  boundary

and  then  we  evaluate  them   and  how  they  predicted

the  space -filling  design  points .

Y2  is  the  easy  response .

I  expected  all  approaches to  do  well ,  they  did .

Notice I set these spec limits at 0.85 to 1.15;

all fell within that allowable region.

Of the methods that did well,

I particularly liked the Model Averaging;

it did pretty well.

None  of  them  had  a  slope  of  exactly  one .

The  DSD  points  don 't  exactly  predict what 's  going  on

in  the  space -filling  design  points , but  they  all  did  relatively  well .

Now  we 'll  go  to  moderate  complexity .

Now  you  start  to  see  some  separation .

It 's  getting  harder  to  model  the  surface .

Again, I'm using this interval of 0.85 to 1.15.

I'm looking on the y-axis at the RASE score, the standard deviation of prediction error.

On  the  x -axis ,  I 'm  looking  at  slope .

For  Y1 ,  using  the  DSDs  to  predict

the  space -filling  design  points as  the  test  set .

The  only  models  that  really  performed  well were  the  Neural  Networks  with  SVEM .

By the way, the code is: NN is Neural Network,

H is the number of hidden nodes.

We  have  models   with  varying  levels  of  hidden  nodes

and  I  simply  evaluated   RASE  scores  and  slope .
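
To give the flavor of that scan outside JMP (sklearn's MLP is a stand-in here, not JMP's Neural platform or the Predictum SVEM add-in, and the node counts are only examples), one can loop over hidden-node counts and score each fit on the held-out points; the Passing-Bablok slope from the sketch later in this talk can be computed on the same predictions.

```python
# Illustrative scan over hidden-node counts (sklearn MLP stand-in, not JMP's
# Neural platform): score each fit on the held-out test points by RASE.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def scan_hidden_nodes(X_train, y_train, X_test, y_test, node_counts=(6, 12, 27)):
    y_test = np.asarray(y_test, dtype=float)
    scores = {}
    for h in node_counts:
        nn = make_pipeline(
            StandardScaler(),
            MLPRegressor(hidden_layer_sizes=(h,), activation="tanh",
                         max_iter=5000, random_state=0))
        nn.fit(X_train, y_train)
        pred = nn.predict(X_test)
        scores[f"NN H={h}"] = float(np.sqrt(np.mean((y_test - pred) ** 2)))  # RASE
    return scores
```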

We  go  to  more  complexity .

Now  Y3  has  high  complexity .

It  is  hard  to  model .

The  lowest  RASE  scores  were the  methods  you  see  on  the  lower  right ,

but  you  can  see there 's  substantial  prediction  bias .

I  felt  overall   the  best  combination  of  low -bias

and  RASE  score  were  Neural  Networks , particularly  one  with  27  hidden  nodes .

Then  finally  number  four is  high  complexity .

We  fit  the  model  to  the  DSDs  and applied  it  to  the  space -filling  points .

I  didn 't  think  any of  the  models  did  great .

All  of  them  showed  some  prediction  bias .

Maybe  the  best  performance  was a  Neural  Network  with  12  hidden  nodes .

It  had  the  lowest  RASE  score ,  but  still , there  were  some  issues  with  bias .

So  that 's  one  strategy .

Well ,  what  if  I  were  to  do  the  opposite ?

I  fit  the  model   to  the  space -filling  points

and  then  apply  them to  the  boundary  DSD  points .

Again ,  let 's  start  with  the  easiest  case .

Y2 really does well. It's a pretty simple response.

Actually, the SVEM method in GenReg, SVEM with Forward Selection, did very well.

The next best, I thought, was a Neural Network with 10 hidden nodes.

Remember,  there 's  a  little  bit   of  extrapolation   going  on  here .

Finally ,  Y1  with  moderate  complexity .

Again ,  only  the  Neural  Networks  did  well .

As  we  go  up  in  complexity ,  increasingly just  the  Neural  Networks  are  working.

You 'll  find  similar  results for  the  other  approaches .

I  won 't  show  all  of  them , they 're  covered  in  the  notes .

But  the  general  conclusion  by  the  way ,  is

that  when  you  use   the  boundary  points  as  a  test  set

or  you  use  the  space -filling  designs  as   a  test  set  and  try  to  predict  the  other ,

they 're  just  not  doing as  well  as  they  should .

In  other  words ,  as  I  said  earlier , the  boundary  points ,

the  DSD  points   and  the  space -filling  design  points ,

there  are  differences  in  their  kinetic behavior  that  we 're  not  picking  up .

The  only  way  we 're  going  to  pick  it  up

is  to  actually  fit  models over  the  whole  design  space .

We  did  do  that  by  the  way .

I  should  just  quickly  show  you .

I  used  the  whole  data  and  we  fit  models and  we  actually  did  pretty  well .

I  didn 't  show  the  Passing -Bablok  slopes .

I  will  just  quickly  do  a  little  more  work with  JMP  for  those  who  are  interested .

The  Passing -Bablok  slopes can  be  done  in  Fit  Y  by  X .

I  will  admit  we  wrote  a  script

and  added  it  to  the  predictive add -in  to  do  this  in  Fit  Y  by  X ,

but  you  can  easily  do  it  yourself .

Here ,  and  I 'll  pick  one  of  the  cases , is  the  DSD  data  and  I 'll  pick  Y1 .

How  did  we  do  fitting  models ?

If  you  look  in  the  menu , there 's  the  Passing -Bablok .

I  strongly  suggest  you  look  at  it .

A  lot  of  regression  problems are  errors  in  variables .

How  did  the  method  do  it  overall ?

I  want  to  explain  something  else .

The  orange  points  are  the  DSDs , the  boundaries .

The  blue  points  are the  space -filling  design  points .

Here  I  fit  models  to  the  DSD

and  the  Passing -Bablok  slopes  are  being   fit  to  the  space -filling  design  points .

Overall ,  the  best  performance was  turned  in  by  the  DSDs .

There 's  one  of  them  here .

It 's  Saywood  6 .

Another one that had... I forgot what it was.

Let  me  widen  this  out  for  you .

Nineteen .

Notice  the  slope  is  close  to  one ,

but   you  can  clearly  see there  is  some  bias .

In  other  words ,  you  can  see  an  offset between  the  fitted  slope

and  the  ideal  slope ,  the  dashed  blue  line .

This  is  pretty  typical  overall .

I 'll  just  very  quickly  show  you .

If  you  have  JMP  Pro  and  you  want   to  do  SVEM  using  linear  models ,

just  go  to  Fit  Model ,  Recall .

This  is  a  full  quadratic  model .

You  could  do  others .

Go  to  GenReg   and  then  under  estimation  methods .

There 's  SVEM  Forward .

There 's  SVEM  Lasso .

These  work  very  well .

From  a  lot  of  work  in  these  methods ,

I  still  find  SVEM  Forward gives  you  the  best  results .

The  Lasso  tends  to  give  you a  lot  of  biased  results

on  test  sets  in  particular .

If  you 're  interested  in  model  averaging ,   if  you  have  JMP  standard ,

just  going  to  hit  recall  again , just  go  to  the  Stepwise  platform .

Didn 't  do  it .  Stepwise .

I  won 't  run  it .

It  will  take  too  long  because  model averaging  uses  best  subsets  regression .

It 's  time -consuming ,  but  it 's  there .

Again ,  Neural  Networks  with  SVEM ,

you  have  to  have   the  Predictum  add- in  to  do  that .

There 's  a  link  to  it  if  you 're  interested .

At  this  point ,

I 'm  going  to  not  do   too  much  more  analysis .

Again ,  you  can  go  through   and  look  at  the  various  slopes

for  the  various  responses

and  you  can  see  many  of  these  methods   resulted  in  highly  biased  slopes .

In  other  words ,  the  DSD  points  and  the   space -filling  designs  are  too  different .

We 've  really  got  to  understand   we  need  to  fit  models

over  the  entire  design  region .

At  this  point , I 'm  going  to  just  finish  up .

By  the  way ,  there  is  enough  material  here,

and I do have basically many talks that are combined in here.

I  apologize ,  but  I  think there 's  an  important  message  here .

By  the  way ,  I 'm  just  showing  slides   with  the  Passing -Bablok  slopes .

Then  finally ,  I  want  to  just   give  you  some  final  thoughts .

I  think  we  really  need  some new  thinking  in  statistics .

We  don 't  have  to  throw  out everything  we 've  been  doing .

I 'm  not  saying  that .

The most important point is that we are in the era of digital science.

Digital  chemistry ,  digital  biology , digital  biotechnology  are  here .

They 're  not  tomorrow . We 've  got  far  more  automation .

There are lots of great pilot- and bench-scale devices,

especially in biotechnology,

that scale nicely, where we can do lots of experiments.

The  problem  is  complexity .

We  need  to  think  differently .

Machine  learning  methods  via  SVEM

are  very  important  for  fitting these  complex  systems .

We  need  to  get  away  from   the  response  surface  approaches

that  really  haven 't  changed .

Maybe  we 've  got  computers and  some  new  designs .

I  think  DSDs  are  really  very  clever .

We  have  optimal  designs ,  but  they  suffer from  the  fact  they 're  boundary  designs

and  people  keep  insisting on  full  quadratic  models .

That 's  a  mistake ,   as  I 've  tried  to  show  briefly  in  the  talk ,

and  you  will  be  able  to  download  the  talk ,

you  can  see  how  poorly  these  methods generally  did  with  the  complex  responses .

As  far  as  I 'm  concerned ,   we  need  new  types  of  optimal  designs .

At  a  minimum ,  these  need to  accommodate  a  lot  of  factors .

Patty, by the way, without getting into details, has run a DSD...

Not a DSD. You did a space-filling design with 18 runs.

Given they have Ambr technology available,

if  you  know  what  that  is , they  can  do  it .

Why  do  we  need  that ?

Because  these  systems  are  interactive .

We  need  to  stop  thinking they 're  a  minor  part  of  the  equation .

Main  effects  do  not  describe the  behavior  of  a  complex  system .

Its  interactivity  is what  drives  the  behavior .

We  need  to  cover  the  interior of  the  design  region .

Yes ,  we  would  like  to  cover the  boundaries  too .

We  don 't  want  to  be  specifying  a  model .

Optimal  designs  require  you  specify what  is  usually  a  full  quadratic  model .

We  need  to  get  away  from  that .

Space -filling  designs ,  by  the  way ,

are  optimal  designs  that  do  not require  a  model  be  specified .

But  they 're  not  the  total  answer .

We  need  to  cover  the  design  space .

We need to give the user, that would be the scientists,

a lot of input on how they distribute the points.

The work of Lu Lu and Anderson-Cook points the way.

I  won 't  have  time  to  get  into  that .

That 's  another  topic .

We  need  to  be  able  to  easily  combine our  design  with  other  data .

That  includes  engineering  runs ,  GMP  runs ,

even  models  from  partial  differential equations  and  simulations .

Especially  if  you  want   to  get  into  digital  twins ,

you 've  got  to  be  able  to  do  that   using  what  I  call  meta  models .

Then  finally ,  Patty  mentioned  this , so  I  wanted  to  bring  it  up .

The  standard  practice  in  design of  experiments ,  assuming

that somehow you've got to screen out factors,

is  actually  a  really  high -risk ,   no -reward  strategy  in  complex  systems .

You  will  regret  it .

You  will  someday ,  at  a  later  stage ,  come back  and  have  to  redo  experimental  work .

I 've  seen  this  time  and  again .

In  complex  systems ,

this  idea  that  there  are  active and  inactive  factors  is  simply  wrong .

They  all  matter  at  some  level somewhere  in  the  design  space .

Frankly ,  with  our  modern  tools , you  don 't  need  to  do  it  anyway .

Also, something else people do reflexively is reduce linear models.

We've shown in our research on SVEM,

and also in a nice paper by Smucker and Edwards,

that reducing models degrades prediction.

Why? Because you're making your model stiffer and stiffer,

and it's not going to interpolate well.

I  will  stop  at  this  point and  there  are  some  references  at  the  end .

Comments

Excellent talk @pattidimcneill  @nph !   I had to look up CMC = Chemistry, Manufacturing, and Controls as I kept thinking CNC = Computer Numerical Control.  

I believe all models considered are trained by minimizing mean squared error (MSE, including penalized variations) and thus will tend to exhibit typical bias-variance tradeoffs since MSE = Variance + Bias^2.    You’ve presented nice motivation for avoiding high-bias models by also considering the Passing-Bablok slope (btw, PB Is also available in the Method Comparison add-in Accuracy routine).    Minimizing MSE typically involves some shrinkage of predictions toward the center, and so it seems best MSE models will tend to have a PB slope > 1.   You have a good set of examples to compare, including that one case where both slopes are near 1 but the lines are offset, indicating that the PB intercept may also be of interest.   


More complex models tend to exhibit less bias and higher variance, and along these lines it would be interesting to try both XGBoost and the new Torch Deep Learning add-in (available by request at JMP 18 Early Adopter ) on these data.   Boosted trees and deeper nets might work well given their inherent ability to pick up on nonlinear interactions, presuming sufficient data to reveal them.   I’d be happy to explore this further with you:  russ.wolfinger@jmp.com.    



philramsey

Hi Russ

I agree one way or the other the bias-variance tradeoff applies to all modeling.  The catch has always been trying to quantify bias.  At least in the case of prediction (in the sense of interpolation) where a test set exists, data the model has never seen provides a way to assess bias, and so far the Passing-Bablok slopes and intercepts appear to be an effective bias measure.  How to come up with an overall MSE measure combining RASE and the slope and the intercept eludes me, but there must be a way to do so.  For now, one can make some good subjective judgements, as we show in the talk, using plots and tables of the values.  I agree that in general I expected the slopes to be > 1, but as you see in the talk, sometimes they are << 1 and even negative on occasion.  Although, for the neural network models they do tend to be > 1, that is not necessarily true for the linear-type models we explore.

 

I appreciate your offer to consider XGBoost and Torch (I assume PyTorch).  I admit I did try the Torch Deep Learning add-in in JMP 18 EA for the talk, but I could not get the add-in to work for unknown reasons.  In the talk, we used the Neural platform in JMP, which has performed well in practice, but I am very interested in seeing how other instances of neural networks or learning networks in general perform.  I have briefly explored NNs in Julia and they do not seem to work the same way as JMP's NN, but that is not surprising nor a criticism of JMP.  Bottom line, I would very much appreciate the opportunity to work with you on these other approaches.  Boosted trees are definitely of interest, but I have not had a chance to explore them for analyses of experimental data; agreed, they tend to work very well with nonlinear, complex systems such as we have in biotechnology.  Please let me know how you would like to proceed. I appreciate the offer.