Time-Efficient Strategy for Selecting a Test Set in the Validation of an Image Detection Algorithm - (2023-US-30MP-1362)

Caroll Co, Statistician, Social & Scientific Systems
Sandra McBride, Principal Statistician, Social & Scientific Systems Inc., A DLH Holdings Corp. Company
Shawn Harris, Statistician, Social & Scientific Systems Inc., A DLH Holdings Corp. Company
Debra A. Tokarz, Senior Pathologist, Experimental Pathology Laboratories, Inc.
Thomas J. Steinbach, Vice President/Senior Pathologist, Experimental Pathology Laboratories, Inc.
Mark F. Cesta, Pathologist, National Institute of Environmental Health Sciences
Helen Cunny, Toxicologist, National Institute of Environmental Health Sciences
Keith R. Shockley, Staff Scientist, National Institute of Environmental Health Sciences

Advances in digital image analysis have created opportunities for quantitative histopathology assessments in rodent toxicology studies. Microscopic evaluation of rodent spleen is performed to assess for test article-induced immunotoxic effects but can be subject to inter- and intra-pathologist variability in characterization of differences between treatment groups and across studies.

 

To address this problem, an image detection algorithm was trained to quantify tissue compartments in histologic sections of rodent spleen. Our aim was to design a study to evaluate how the image detection algorithm compared to digital annotations performed by human raters for specific features of the spleen, while keeping within operational constraints (e.g., rater time and effort).

 

In this talk, we show how we used JMP Custom Designer, using data generated by the image algorithm as inputs, to select and allocate a test set across human raters. We used a response surface model, which is designed to select samples that fall on the boundaries and center of the input space. The resulting study design allowed us to strategically select a test set and create a balanced sampling plan for use across several pathologists from different institutions.

 

 

Hello everyone. I'm Caroll Co. I'm a statistician at DLH. Today, I will talk about a project I worked on, creating a time-efficient strategy for selecting a test set in the validation of an image detection algorithm. This work was done in collaboration with scientists from the National Institute of Environmental Health Sciences, pathologists from Experimental Pathology Laboratories, and my fellow coworkers at DLH.

In rodent toxicology, advances in digital image analysis offer opportunities for quantitative histopathology assessments. An example of where digital image analysis could be useful is in the evaluation of test article-induced immunotoxic effects in rodent spleens. Typically, pathologists use a microscope to evaluate whether there is immunotoxicity in the spleen and make judgments on whether spleens from animals that received a treatment are different from the control group. This workflow is prone to inter- and intra-rater variability in characterizing differences within a study and also across different studies.

Here's an example of a zoomed-in cross-section view of a rodent spleen. I'm just pointing out specific features of interest to our collaborators that we want to capture.

Our problem: when we got involved in this project, our pathologist collaborators had already trained an algorithm that measures these features of interest. The question posed to us was, how do we validate this algorithm? Validation is a very broad term. To narrow our focus, we thought about the main questions we wanted to address. First, the algorithm was trained by a select few people, so we wanted to see whether different pathologists from different laboratories would agree with the algorithm output. Second, there were multiple features that the algorithm was trained to measure, and we wanted to see if there are any specific features where performance is better or worse. Then, we also wanted to see if this algorithm can hold up against a wide range of cases. If there are any blind spots, can we find them? I'll also mention that from here onwards, I will refer to the image algorithm as the AI.

One solution we thought of was to have both humans and the AI annotate the same images and compare the output. This is how it would work. A tissue sample gets scanned so it turns into a digital image, or whole slide image (WSI), and that image gets fed into the AI for processing. The humans view the same image using image software where they can manually annotate the features of interest.

One of the questions we got asked was, how many images do we need to validate? As statisticians, our answer is always as many as you can: do you have hundreds? Do you have thousands? But after talking to the pathologists, we realized there were a number of operational constraints on implementing this.

The first constraint was that each image needed to be evaluated by three different people. Having three people was useful so that we could also estimate the variability between raters. It also turns out that this annotation process is very time-consuming; after talking to the pathologists, the maximum number of images they were willing to annotate was about 24 per person. Lastly, one of the goals of this project was to get buy-in, or support, from pathologists at other labs. That meant we needed participation from multiple labs and multiple people from each lab. In this study, we got participation from three centers, with three pathologists representing each center, so in total we had nine pathologists recruited for the study. Based on all of these constraints, we determined that we could validate only 72 images.

The sampling plan. Now we know our sample size; how do we select the 72? Random selection is okay, but can we do better? What we came up with was this: since the cost per image for the AI is relatively low, we had the AI process a larger set of images and then used information from the AI output to better select our 72 samples. We used a response surface model (RSM) to select our points. An RSM will select points towards the boundaries and center of the input space; it's a model containing your main effects, two-way interaction effects, and quadratic effects. This model was particularly useful for our validation problem because we wanted to look for areas where agreement fails between humans and the AI. Generally, bugs tend to occur on the boundaries and edges of the space, so this type of model fits the problem that we had.
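As a rough illustration of the idea (outside of JMP), here is a minimal Python sketch of selecting a test set from AI-scored candidates by greedily maximizing the determinant of the RSM information matrix. This is not JMP's Custom Design algorithm; the column names (feature_1 through feature_4) and the greedy exchange are assumptions for the example. It only shows why an RSM criterion pushes picks toward the boundaries and center of the input space.

```python
import numpy as np
import pandas as pd
from itertools import combinations

def rsm_model_matrix(df, factors):
    """Intercept, main effects, two-way interactions, and quadratics."""
    cols = [np.ones(len(df))]
    cols += [df[f].to_numpy(float) for f in factors]
    cols += [df[a].to_numpy(float) * df[b].to_numpy(float)
             for a, b in combinations(factors, 2)]
    cols += [df[f].to_numpy(float) ** 2 for f in factors]
    return np.column_stack(cols)

def greedy_d_optimal(candidates, factors, n_runs):
    """Greedily pick n_runs candidate rows that maximize log|X'X| for the RSM."""
    X = rsm_model_matrix(candidates, factors)
    ridge = 1e-8 * np.eye(X.shape[1])      # keeps early iterations non-singular
    chosen = []
    for _ in range(n_runs):
        best_i, best_logdet = None, -np.inf
        for i in range(len(candidates)):
            if i in chosen:
                continue
            Xt = X[chosen + [i]]
            _, logdet = np.linalg.slogdet(Xt.T @ Xt + ridge)
            if logdet > best_logdet:
                best_i, best_logdet = i, logdet
        chosen.append(best_i)
    return candidates.iloc[chosen]

# candidates = pd.DataFrame of AI output with columns feature_1 ... feature_4
# test_set = greedy_d_optimal(candidates, ["feature_1", "feature_2",
#                                          "feature_3", "feature_4"], n_runs=72)
```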

Now we have a plan to select the images. The question now is, how do we allocate these 72 images across nine raters? We expected the samples to have a wide range of complexity, and we wanted to make sure that everyone got a balanced mix of slides. This complexity, or case mix, is determined by the output we got from the AI. We also needed to satisfy the constraint that each image is seen by three different people.

Here is my workflow. I will show you how I created the sampling plan that satisfied all of our operational constraints in JMP. There are three steps in this workflow. First, I'm going to show you how we selected the 72 images from a larger set. Second is how we replicated the 72 images three times, so we have 216 runs; we needed to do this because we wanted each image to be seen by three raters. In the last step, I'll show you how we allocated the 216 runs across nine raters so that each person gets exactly 24 images. In each of these steps, I will be using the DOE platform.

Now I'm just going to move over to JMP. I have my JMP journal here. I am going to open a sample data set, and I say sample because our data has not been published yet, so for the purposes of this demonstration I will be using a fake data set. This data set has the same features as the original data that we collected.

This is what the data looks like. I have my slide ID, which is just a numeric variable going from 1 to 100. I have four variables collected by the AI: features 1, 2, 3, and 4. Three of them are continuous and one is a count variable. If you go under Analyze, Multivariate Methods, Multivariate, and show the scatterplot matrix of all of the variables, you'll see that all of them are uncorrelated. This is the spread and range of our variables.

Step one. Under DOE, there's Custom Design; that's what I'm clicking. We can leave the response Y here alone. For the factors, there are actually two ways to do this. One way is to click on this button that says Add Factor; about a fifth of the way down there's a Covariate selection. If you click that, it'll ask you which columns of covariates you want to include in the design. In my case, it's these four features, so I'm just selecting them and clicking OK. JMP will automatically populate the min and max for each of these variables, and you'll see that they're all treated as covariates. That looks good.

I'll just quickly close this and show you another way of doing the same thing. Again, in DOE, Custom Design: instead of adding the factors the way I just showed you, you can use the button that says Select Covariate Factors, and it'll give you the exact same thing. Again, I'm picking the factors that I'm interested in, and it automatically populates the Factors window. I'm just scrolling down to the bottom and clicking Continue.

This is where we can put in the model that we want. For us, we wanted an RSM, or response surface model. There is again a shortcut button right here, so I'm just going to click that. We already have our main factors in there, and what that did is add our quadratic and two-way interaction effects. Then, lastly, at the bottom where it says Number of Runs, we don't want 100, because we actually just want to select 72 out of the 100, so I'm going to change this to 72. This all looks good to me, so I'll click Make Design and give JMP a few seconds to create the design for us.

Here we go. This is the design that it created for us. As you can see, if you scroll down, it only gave us 72 rows, which is what we had asked for. I'm going to turn this into a data table by hitting Make Table right here on the bottom left, and then close this for now.

I'm  going  to  close  this  for  now.

Now,  before  I  show  you what  this  design  looks  like,

I  actually  want  to  go  back   to  the  original  table,

just  so  you  could  see...

We  can  compare  what  happened to the observations that were picked

versus  observations  that  were  not  picked.

If  you  go  back  to  the  original  data  table, the  ones  that  were  chosen,

that  72  are  actually  highlighted.

What  I'd  like  to  do  at  this  point is  I'd  like  to  create  a  new  column

that  would  identify  which  rows   were  selected  and  which  ones  weren't.

To  do  that,  you  go  under  Rows.

Under  Row  Selection,

there  is  the  last  option  in  here says  Name  Selection  in  Column.

What  it'll  do  is it'll  label   the  currently selected rows

and  save  whatever  values   you  assign  for  that  column.

I'm  going  to  click  that.

The  column  name.

Y ou  get  to  create   a  name for that column.

T he  rows  that  are  highlighted,   I'm  going to give them a value of 1.

The  ones  that  were  not  selected, I  want  to  give  them  a  value  of  zero.

I'm  going  to  press O kay, and  that  created  this  column  right  here.

All  of  my  wants,  there's  72...

There's  72  rows  for  the  ones   that  were selected

and  then  28  for  the  ones that  were  not  selected.
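For readers following along outside of JMP, a pandas equivalent of this selection flag might look like the short sketch below (assuming `candidates` is the 100-slide table and `test_set` is the 72-row selection from the earlier sketch; the column names are assumptions).

```python
# Flag which of the 100 candidate slides made it into the 72-image test set.
candidates["selected"] = candidates["slide_id"].isin(test_set["slide_id"]).astype(int)
print(candidates["selected"].value_counts())   # expect 72 ones and 28 zeros
```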

I want to go back to my scatterplot matrix to see which observations were picked and which ones weren't. But before we do that, I want to color-code them so that when you look at the graph, you can immediately spot which ones were picked. A quick way of doing that is Rows, Color or Mark by Column. Here, I want to use this column called Selected, so that all the zeros will be identified by an orange circle and all the ones by blue. We'll use a marker as well, which might make it easier to identify the observations. I'll just click OK, and now all of my rows are marked by the pluses or the circles.

If I go back to my scatterplot matrix (again, it's under Multivariate Methods, Multivariate, and I just hit Recall to do the same thing as I did initially), click OK, and make this a little bit bigger, you'll now see that the blue pluses are the observations that were selected, and the orange circles are the ones that weren't. The ones that were selected tend to occur more on the boundaries and edges of our space. This is exactly what we wanted; this looks great to me. Let's close this.

Just to convince yourself that the model is doing what it's supposed to be doing, I did the same setup, but this time, instead of picking 72, I'm only picking 24, to make it a little bit more extreme and show you an example of what that looks like. Now there are only 24 rows selected. Let's do the same thing and make this a little bit bigger. You see fewer blue pluses, because there should only be 24, but you'll see how those observations are getting picked versus the ones that were not picked: they do tend to occur more on the outer boundaries, but still some in the center.

Let's go back to our original problem; we'll just close this.

This was our original data table, and the one created by the design is this one. What do we have here? We're still keeping the same variables that we asked JMP to include: features 1, 2, 3, and 4. Our response Y is still missing. We now have this new column that says Covariate Row Index. Basically, this just links you back to your original table: if it says 88, it is row 88 of your original table that got captured, so this row here should be the same as that row right there. Before we move on, I'm just going to rename this to Slide ID; in my case, the slide ID, which is just a number from 1 to 100, is actually the same as the Covariate Row Index. That was all for step one.

Step two is now replicating the 72 images. How do we do that? A quick way is to go to DOE; the second selection is Augment Design. A window will pop up asking for your responses and factors. We don't really have a Y response, but I think you still need to put something in there, so that's okay; you just put it in even though it's all missing values. Then, for the factors, we're going to select our features 1 to 4, and this time we also need to select our Slide ID. Click OK.

A new window pops up with all of our factors, and again JMP auto-populates the ranges of all of our variables. On the bottom, under Augmentation Choices, there's a Replicate button, and that's exactly what we need. Click that, and it'll ask how many times you want to perform each run. The default is two, but we actually want three, because we need each image to show up three times in our design. Then we click OK. Now you'll see that instead of just 72 runs, our design has 72 times three; scrolling all the way to the bottom, we now have 216 rows. I'll just click Make Table to turn that into a data table, and we'll close this.

We'll  close  this.

A  few  more  things  that  I  want  to  check.

A  couple  more  things  I  want  to  check before  we  move  on  to  step  three.

That  is,   every  time  you're  doing   these  steps,

you  want  to  make  sure  that  it's  actually doing  what  you  think  it  should  be  doing.

In  this  case,

I  wanted  to  check  that  each  slide  ID actually  occurs  three  times.

We're  going  to  use  tabulate  to  do  that.

Tabulate, S lide  ID,  and  I  should  have a  count  of  three  for  each  ID.

That's  what  we  have.

That  looks  great.

That's  it  for  step  two.
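Outside of JMP, the replicate-and-check step could be mimicked with a short pandas sketch like the one below (assuming `test_set` is the 72-row table from step one with a `slide_id` column; the names are assumptions).

```python
import pandas as pd

# Replicate each selected slide three times, analogous to Augment Design > Replicate.
runs = pd.concat([test_set] * 3, ignore_index=True)

# The Tabulate check: 216 runs in total, each slide ID appearing exactly three times.
assert len(runs) == 216
assert (runs["slide_id"].value_counts() == 3).all()
```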

Let's move on to the last step, which is the most exciting one. Now we have 216 rows and we have the slide IDs, and we want to distribute them to nine different people. How would we do that in a way that gives each person a balanced mix? A cool way to do it is to use DOE again: click DOE, Custom Design. Like before, we're still going to use our covariates as factors, so select features 1 to 4 and our Slide ID, and JMP will auto-populate the mins and maxes. All of the rows here are listed as covariates.

At this point, I actually want to add two more factors. One factor is a categorical factor with three levels, and that is the center, or laboratory, because we have nine pathologists but they're coming from three different centers; I'm just going to name the levels A, B, and C. Within each center, we also have three different people participating, so I want to add another categorical factor, again with three levels. This is going to be our rater, that is, the individual people; let's name the levels 1, 2, and 3. Basically, what I'm saying is that Center A will have Raters 1, 2, and 3, Center B has Raters 1, 2, and 3, and so on. In total, we have nine different people, nine different combinations of center and rater. I'm going to minimize this window; it just shows the data table where we pulled our covariates from. Then hit Continue.

Now we get to tell JMP what kind of model we want, and there are a couple of things to note here. JMP automatically puts in your main effects; these are all the factors in my model. I wanted to add an interaction term between center and rater, because I wanted to make sure that all combinations of center and rater appear in our design, and that guarantees it. The other thing to know is that we actually don't have enough runs to estimate a slide ID effect. There are 72 distinct slide IDs in here, but we don't actually want a slide ID effect; we just want JMP to take slide ID into account when it's constructing the design. Under Estimability, you can change the estimability for slide ID from Necessary to If Possible. Then, lastly, the number of runs: I think the number it calculated for me was 18, but we actually wanted to use up all of our runs, so I set this to 216 and let JMP tell us how to allocate those 216. Make Design.

This one might take a little bit longer, so while it runs I'll talk about how we check the design and a little bit about the run order. This is what the design looks like. We have our original features 1 to 4 in here, the slide ID, and now it has added a center and rater assignment for each slide ID. What this is saying is that rater B1 would have to annotate slide number 27, A1 will annotate slide 96, and so on. There should be 216 runs in here, and that looks okay.
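To make the allocation constraints concrete, here is a minimal Python sketch of the combinatorial part only: each slide goes to three distinct raters, and each of the nine raters ends up with exactly 24 slides. This is not what JMP's Custom Design does; the optimal design additionally balances the case mix through the center, rater, and center-by-rater terms, which this greedy assignment ignores. The variable names and the least-loaded heuristic are assumptions for the example.

```python
import random

raters = [f"{c}{r}" for c in "ABC" for r in "123"]    # A1 ... C3, nine raters
load = {p: 0 for p in raters}
assignment = {}                                       # slide_id -> three raters

slide_ids = list(range(1, 73))                        # stand-in for the 72 selected slide IDs
random.shuffle(slide_ids)
for sid in slide_ids:
    # Give the slide to the three least-loaded raters (ties broken at random).
    picks = sorted(raters, key=lambda p: (load[p], random.random()))[:3]
    for p in picks:
        load[p] += 1
    assignment[sid] = picks

assert all(n == 24 for n in load.values())                    # 24 images per rater
assert all(len(set(v)) == 3 for v in assignment.values())     # 3 distinct raters per slide
```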

The last part here, under Data Table Options, is a checkbox for Include Run Order Column. I'm going to check that because, for annotation, you might expect there to be some type of time effect in whatever process you're doing. In our case, we were worried about whether there would be a learning curve or a fatigue effect. We want to make sure that not everyone starts with the lower-numbered slide IDs and ends with the higher-numbered slide IDs.
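In the Python sketch, the same idea is just a per-rater shuffle of each person's 24 slides, so that slide number is not confounded with any learning-curve or fatigue effect (this assumes the `assignment`, `raters`, and `random` objects from the allocation sketch above).

```python
# Build each rater's slide list from the allocation, then randomize their run order.
slides_by_rater = {p: [sid for sid, picks in assignment.items() if p in picks]
                   for p in raters}
run_order = {}
for rater, slides in slides_by_rater.items():
    order = slides[:]           # copy so the allocation itself is left untouched
    random.shuffle(order)
    run_order[rater] = order    # the sequence this rater annotates, first to last
```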

I'm just going to click Make Table and close this window for now. This is what our design now looks like. Before we do our checks, I'm going to create a new column where I concatenate the center and rater: I highlight these two columns, right-click, and under New Formula Column, Character, I concatenate them with a comma. This is going to be our Center, Rater variable. There you go.

There are three things we're checking here. First, let's use Tabulate to make sure that each person has exactly 24 images. Tabulating by Center, Rater, I see 24 for each, so that's great. The other thing we want to check is that there are no repeats; for example, we don't want slide ID 1 being assigned to the same person twice, because that would not be fun for that person. If I do a crosstab of slide ID by Center, Rater, I should see just a column of 1s, which means each slide ID was assigned to three different people. I'm scrolling through this table to confirm that there are no 2s or 3s in here, and that looks great.
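The equivalent checks in pandas might look like the sketch below, assuming `design` is the 216-run table exported with Make Table and that it has `slide_id`, `center`, and `rater` columns (those names are assumptions).

```python
import pandas as pd

# Concatenate center and rater, like the New Formula Column step.
design["center_rater"] = design["center"].astype(str) + "," + design["rater"].astype(str)

# Check 1: each of the nine raters gets exactly 24 images.
assert (design["center_rater"].value_counts() == 24).all()

# Checks 2 and 3: no slide is assigned to the same rater twice,
# and every slide appears exactly three times overall.
xtab = pd.crosstab(design["slide_id"], design["center_rater"])
assert xtab.to_numpy().max() == 1
assert (xtab.sum(axis=1) == 3).all()
```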

Then, lastly, I want to check what the case mix looks like across these nine raters. One way I thought of to at least check that visually is to do a parallel plot using Graph Builder. I highlight features 1, 2, 3, and 4, because those are my original variables, drag Center, Rater onto the x-axis, and hit the parallel plot option on the top right icon. Actually, we don't want Center, Rater in there as one of the plotted variables, so I'll turn that off; what I want is Center, Rater as its own panel. What this shows us is the assignments for the nine people, A1 all the way to C3, and the case mix of the images they were assigned. Again, we're just checking this visually. What I'm looking for is that, taken as a whole, the panels all look about the same: the lines are somewhat blended, there's no clumping happening, and they look to be okay.

Another way to do it is to overlay them. It gets a little bit hard to read because I have nine different colors in here, but again, you're just looking to make sure there are no patterns, no clumps of green up here or down here or anywhere in this space. You could also check it by center; I guess my colors are not the best, with yellows, blues, and greens, but they all look to be well-mixed.
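A quick non-JMP version of this visual check could use pandas' parallel_coordinates, as in the sketch below (same assumed `design` table and `center_rater` column as in the previous sketch; matplotlib is assumed to be available).

```python
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

# One line per run, colored by the rater it was assigned to; a well-mixed plot
# (no clumping of any one color) suggests a balanced case mix across raters.
cols = ["feature_1", "feature_2", "feature_3", "feature_4", "center_rater"]
parallel_coordinates(design[cols], class_column="center_rater", alpha=0.4)
plt.show()
```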

Then, I did mention the run order. The point is that when you're telling people how to do their annotation, you want that order to be randomized as well, so that if there is a time effect, it's already taken into account. What I'm plotting here is the slide ID, which is sequential from 1 to 100, against the run order, the sequence in which the pathologists would rate these images. This looks random. We can also plot it by center and rater, so this is for each individual person, and they look good. I'm just going to close that.

Now, I'll just go back to my PowerPoint slides and end with some conclusions. We can use DOE to select samples or test cases when you have prior information. If you have data or covariates that you can use to inform the selection, why not use them? A response surface model is advantageous if you're interested in the boundary or edge cases. The second point is that you can use the replicate option in Augment Design if you have a situation where you need multiple raters per sample. The design also gives you an opportunity to factor run order into your plan, which can be really useful if you expect there to be a time effect such as a learning curve or fatigue. I have a couple of links here to blog posts that discuss what a covariate is in design of experiments; please check them out if you want to learn more about this technique. Then, lastly, I just want to say thank you to all my collaborators who helped make this project possible. Thanks.