
The Unique Challenges of Cell Therapies: JMP Scripts for Power and Sample Size Calculations - (2023-US-30MP-1473)

The most novel, innovative, and promising therapeutics in biopharmaceuticals are cell therapies. Cell therapies transfer human cells into a patient to treat disease. These cells either come directly from the patient or from a healthy (cell) donor. Multiple regulatory guidance documents recognize the importance of providing cell therapy manufacturers the flexibility to improve their processes. Therefore, it is imperative to show that the pre- and post-change processes are comparable and process changes pose no threat to the safety or efficacy of the drug product.

 

One method used to ensure comparability is an equivalence test of means. There is a regulatory expectation that the study is done as a paired design, often referred to as a split-apheresis study, unless there is minimal donor-to-donor variability. In split-apheresis studies, the same donor material is split and processed in the pre- and post-change process for comparison. The design of this study presents unique challenges in cell therapies as they require adequate sample sizes to ensure properly powered designs, yet the number of healthy donors available is usually quite low (three to six donors). Additionally, the power depends on lot-to-lot and assay variability, assay replication strategy, and the effect size used for the equivalence acceptance criterion (EAC).

 

This talk presents a series of JMP scripts that extend the existing capabilities of the Sample Size Explorer platform to address each of these relevant study nuances, as well as the capability to overlay power curves to address trade-offs with different sample sizes and approaches.

 

 

I  am  Heath  Rushing.

Although Andrew Karl, Jeff Hofer, and Rick Burdick, some teammates of mine,

did the majority of the technical work here, I'm going to be the one presenting today.

I'm  going to  talk  about  how JMP  and  JMP  scripts  can  be  used

in a very specific application in cell therapies.

I'm  going to  talk  a  little  bit  about what   gene and cell therapies  are

and  the  very  specific  instance that  I  want to  talk  about

is  comparability.

I'm  going to  focus  on  process  changes.

Interestingly  enough,  last  year,

I  gave  a  talk , and  it  focused on  cell  and  gene  therapies.

They're  very  novel  therapeutics.

The  first  one  was  approved in  the  United  States  in  2017.

They're a little bit different from most of

what I call the small-molecule and large-molecule therapeutics

that you may have heard of in the past.

Let  me  just  touch  base on  what  is  a  cell  and  gene  therapy.

First  thing  I'm  going to  do is  touch  base  on  what  a  gene  therapy  is.

What  you're  essentially  doing is  you're  replacing

a  gene  with  a  healthy  one,

or  turning  off   bad  genes.

A  lot  of  cancers are  caused  by  defective  genes.

What you're doing is you're inserting these healthy genes back into a patient,

either in vivo or in vitro.

An in vitro approach would be more like a bone marrow transplant.

Last year, I talked about how

the challenge with gene therapies is that patient-to-patient variability.

I  focused  on  process  development.

Then  I  talked  about  cell  therapies.

In cell therapies, what you're doing is you're replacing diseased cells.

You're either transferring some sort of healthy cell into a patient,

or you're replacing missing cells in a patient.

Where  do  these  cells  come  from?

They  either  come from  the  patient  themselves,

so  you  would  have  to  deal with  that  patient -to -patient  variability,

or   in  most  cases, they  come  from  a  healthy  donor.

Now  you're  not  dealing with  this   patient-to-patient variability,

but  you're  dealing with  donor -to -donor  variability.

Whenever  I  say  donor, I'm  talking  about  a  healthy  donor.

I  could  be  a  healthy  donor.

Then  someone  else could  be  a  healthy  donor  also.

In both of those cases,

you have to deal with that patient-to-patient

or donor-to-donor variability.

What's interesting is that last year, I gave an example in process development,

and it looks something like this.

It was the exact same data set that I used last year, where I said,

say that you were developing a process where you look at time, temperature, and pH,

and you're measuring their effect on cell viability and byproduct.

In that case, I cannot use one donor material;

I had to split that up into four different donors.

I said, "If you ran these experiments for process development,

and you did not consider that there was donor-to-donor variability,

this is what you would see."

Given that we're looking for p-values that are below 0.05,

you would say nothing affects cell viability

and nothing affects byproduct.

You were not able to detect that you had any significant

or critical process parameters

for the very reason that you did not consider

that there could be a difference in donor.

Now, if you do consider donor as what's called a fixed donor effect

(the only thing that I did is I brought in donor),

then you see that,

and it really sticks out what significantly affects cell viability

and what significantly affects byproduct.

The  whole  talk  was  on  how  does that   donor-to-donor  variability

affect  statistical  inference and  also  process  capability.

I'm  going to  focus on  that  statistical  inference.

What you are trying to do in process development

is determine if things like pH, temperature, and time

significantly affect your critical quality attributes.

Say  that  I  was  a  drug  manufacturer ,

and  I  have  set  up a  process  development  study .

I  send  this  process  development  study .

I  want  to  determine if  temperature  affects,

and  I'm  going to  call  it  cell  viability.

I say, "Hey, whenever I'm looking at that, I want to make sure that if something

significantly affects my quality attributes,

I control that in my process.

But if it doesn't, I am not spending money and time and resources controlling it."

What  I'm  concerned  with as  a  drug  manufacturer

is the  Type I  error  rate.

I  do  not  want  to  inflate a   Type I  error  rate.

A   Type I  error  rate  would  say,

"Hey , this  is  significant when,  in  fact,  it's  not."

What do you think regulatory agencies would be more concerned with?

You controlling more things,

or you not controlling things that should be controlled?

That is exactly right: they'd be more concerned

about that patient risk, that Type II error.

In process development, drug manufacturers

do not want to inflate the Type I error.

They also want high power. Why?

Because that controls that patient risk.

The whole point of me showing that last year

was to show the effect of donor-to-donor variability

on trying to determine your critical process parameters.

I  call  it  statistical  inference.

Right   now, what  happens   if  I  change  my  process?

Just last week, I was working with a colleague.

We were talking about cell and gene therapies, and she said,

and this is her quote, "Heath, in cell and gene therapies,

things are constantly changing.

You could have things like analytical methods change.

You could have things like process change."

Today, I'm going to focus on this process right here.

Mainly, I'm going to focus on that process change.

I  do  want to  point  out that  regulatory  agencies  understand

that  you  have  a  need for  improving  your  process.

Even if you improve your process, you are changing your process.

They  recognize  the  need  for  that, but  they  also  recognize  the  need

that  the  therapeutics that  you're  making  from  that  process

should  be  similar in  terms  of  product  quality.

You're  using  these  in  clinical  trials.

What  does  it  mean  to  be  similar?

That doesn't say that they have to be exactly the same,

just that they have to be similar or comparable.

In  terms  of  me  saying that  something  is  similar,

what  I  want to  do  is  I  want to  make  sure that  I  have  some   similarity  condition.

That's  the  whole  point  of  comparability.

For  very  low  risk  attributes,

what I can do is I can show that process A and process B

are similar in side-by-side plots.

For higher-risk attributes,

what I want to do is maybe something like a quality range.

For quality ranges,

I just take that reference group, the old process,

and I build some range around it

and ensure that all of the measured quality attributes

from the new process fall within that range.

For  very  high  risk  attributes,

what I want to do is I want to do equivalence testing.

This  is  what  I'm  going to  focus on  today,

tell  you  about what  equivalence  testing  is,

and  how  that  acceptable  difference or  that  similarity  condition  is  set.

It's called equivalence testing, or the two one-sided t-tests (TOST).

To  reiterate  what  we  talked  about  before,

whenever  I'm  using  design  of  experiments in  process  development,

what I do is I'm changing some variable, like temperature,

from low to high,

and I'm measuring the effect on my critical quality attributes.

I  am  assuming  in  the  null  hypothesis that  they  are  the  same.

What  I  do  is  I  set  up  a  design to  see  if  they're  different.

A   Type I  error  in  that  case would  be  me  saying,

"Wow,  they're  different " when , in  fact,  they're  not.

That  would  mean  that  I  would  control  that.

I  would  spend  resources controlling  that  in  the  process.

If  I'm  a  drug  manufacturer,

I  do  not  want  to  control  things that  I  don't  need  to.

I'm  concerned about  that   Type I  error  rate .

If  I  was  a  regulatory  agency,

I would be even more concerned with the Type II error:

saying there's no difference when, in fact, there is.

You  should  be  controlling  something and  you're  not.

If  I  was  a  regulatory  agency,

I'd  be  more  concerned with  the   Type II  error.

Now, we're  going  to  flip  it.

We're  going to  talk about  equivalence  testing.

Equivalence  testing is  I'm  not  saying  that  they  are  the  same.

I  am  assuming  that  there  is  a  difference.

I just  want to  make  sure that  the  difference  isn't  too  big.

That  too  big,  I'm  going to  call  delta.

There are a lot of different ways to calculate that delta.

I'm going to call it d, or that delta right there;

it's often called the equivalence acceptance criterion.

I would like it to come from subject matter expertise,

but the majority of times, it comes from me taking

some k-value times the historical standard deviation.

That's split into two different tests.

In one, I want to show in the alternate hypothesis that the difference is less than positive d.

In the other one, I want to show in the alternate hypothesis

that the difference is greater than negative d.

I'm going on two different sides.

Those are what's shown on the left-hand side, at the bottom

and at the top.

If I was a drug manufacturer, what would I want to do?

I would want to be able to reject both of those null hypotheses.

I would want low Type II error and high power.

This is equivalent to taking a 90% confidence interval

around the difference in means and ensuring that the 90% confidence interval,

whenever I'm looking at its low and high ends, is within the bounds of the lower delta

and the upper delta.
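A minimal sketch of that confidence-interval view follows. It is not the JMP platform's implementation. It runs the two one-sided tests for two independent groups by checking whether the 90% interval sits inside plus or minus delta. The lot values, the function name `tost_independent`, and the hard-coded t critical value are illustrative assumptions.

```python
import math
import statistics

def tost_independent(a, b, delta, t_crit):
    """Equivalence check for two independent groups via the 90% CI.

    t_crit is the 0.95 quantile of Student's t with len(a) + len(b) - 2
    degrees of freedom, supplied by the caller (an assumption of this sketch).
    """
    n_a, n_b = len(a), len(b)
    diff = statistics.mean(b) - statistics.mean(a)
    # Pooled variance assumes both groups share one true variance.
    pooled_var = ((n_a - 1) * statistics.variance(a)
                  + (n_b - 1) * statistics.variance(b)) / (n_a + n_b - 2)
    half_width = t_crit * math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    lower, upper = diff - half_width, diff + half_width
    # Both one-sided nulls are rejected exactly when the CI is inside the bounds.
    return lower, upper, (-delta < lower and upper < delta)

# Hypothetical data: 10 historical lots and 5 new lots.
old = [98.1, 101.3, 99.4, 100.8, 100.2, 99.0, 101.9, 100.5, 98.7, 100.1]
new = [100.4, 99.2, 101.0, 100.9, 99.6]
delta = 2 * statistics.stdev(old)   # k = 2 historical standard deviations
lower, upper, equivalent = tost_independent(old, new, delta, t_crit=1.771)  # df = 13
print(round(lower, 2), round(upper, 2), equivalent)
```

Equivalence is claimed only when both ends of the interval fall inside the bounds, which is the same decision as rejecting both one-sided null hypotheses at alpha = 0.05.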

If you're looking at this, you should think to yourself,

"I want the width of that confidence interval to be very small."

What  are  the  different  ways that  I  could  make  the  width

of  that  confidence  interval

for  the  difference  between those  two  means  very  small?

I  could  decrease  my  standard  deviation.

That's  a  good  thing.

I  could  increase  my  sample  size.

That's  a  good  thing.

I  could  also  increase  my  alpha  level.

Maybe  that  wouldn't  be  so  good because  what  you're  doing

is  you're  inflating your   Type I  error  rate.

In  inflating  your   Type I  error  rate, what  you're  saying   is,

I  am  stating  that  they're  equivalent when  indeed they're not.

The different ways to control the width of that confidence interval

are to lower s, increase n, or increase alpha.

We  talked  about   two of  those  being  good and  one  of  those  not  being  good.
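To make those three levers concrete, here is a small sketch of the interval half-width. It uses a normal quantile in place of a t quantile to stay short (a known-sigma simplification), and the function name `half_width` and the numbers are illustrative assumptions.

```python
import math
from statistics import NormalDist

def half_width(s, n_old, n_new, alpha):
    """Half-width of the (1 - 2*alpha) interval for a difference in means.

    Uses a normal quantile (known-sigma simplification) to keep the sketch short.
    """
    z = NormalDist().inv_cdf(1 - alpha)
    return z * s * math.sqrt(1 / n_old + 1 / n_new)

base = half_width(s=1.0, n_old=10, n_new=5, alpha=0.05)
print("baseline    ", round(base, 3))
print("smaller s   ", round(half_width(0.5, 10, 5, 0.05), 3))
print("larger n    ", round(half_width(1.0, 10, 10, 0.05), 3))
print("larger alpha", round(half_width(1.0, 10, 5, 0.10), 3))
```

All three changes shrink the interval, but only the first two do so without raising the Type I error rate.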

It  makes  sense  that   if  I'm  a  drug  manufacturer,

I  want  to  maximize the  power  of  the  design.

That's  the  flip. I  want  to  minimize  my   Type II  error.

Regulatory  agencies  want  to make sure that  you  do  not  inflate

that   Type I  error  rate.

That Type I error would be you assuming equivalence,

or stating equivalence, when indeed they're not equivalent.

In  JMP,  you  can  do these  equivalence  tests,

and  I  want to  show  you  an  example  of  that.

From my journal, the first thing I want to do

is show you that, in terms of determining

your Type I error rates and your Type II error rates,

JMP provides power curves

under Sample Size Explorer, Power, Two Sample Independent Equivalence.

Caleb  King  did an  awful  great  job  with  this.

I say  awful  great  job, but  he  did  a  great  job  with  this.

Let's  just  say  that  my  margin, my  equivalence  acceptance  criteria,

is  plus  or  minus  2  standard  deviations.

I'm  just  going  to  put  a  2  here,

and  that's  just  2  times  the  standard deviations  that  I'm  talking  about.

That's  all  that  I'm  doing.

Let's just say that in my historical process,

I have 10 lots,

and I'm going to compare it to a new process that has 5 lots.

I want to see what the power is if they are exactly the same,

that is, there's no difference between these.

A few things that I want to point out here: JMP gives those power calculations.

The other thing that it does is it allows you to change those.

What's going to happen if I do things like increase

the number of samples in my new process?

My power is going to go up.

What would happen if I do things like,

"Hey, Heath, I want to decrease that margin

from 2 standard deviations to, say, maybe 1.5 standard deviations"?

Essentially, I'm taking those boundaries

and I'm tightening them up.

What  I  see  is  my  power is  going  to  go  down.

I'm  able  to  ask  myself   all  those  typical  questions

that  you  would  in  equivalence  testing.

Something else that I want to show you, which is going to come up,

is that JMP has the ability to ask,

do I know the true standard deviation or not?

If  I  know  the  true  standard  deviation, that  is  going  to  be  better.

You're  going to  see that  your  power  goes  up.

Indeed,  what  happens is  my  power  goes  up.

That's usually not the case.

I always call that the utopia;

the usual case is that I do not know what that true standard deviation is.

I always call the known case the optimum, or the utopia.

I always call the "no" the realism.
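For the known-standard-deviation ("utopia") case, the power can be written in closed form. The sketch below is an illustration under that assumption, not a reproduction of the Sample Size Explorer's calculation; the function name `tost_power_known_sigma` is hypothetical.

```python
import math
from statistics import NormalDist

def tost_power_known_sigma(shift, margin, sigma, n_old, n_new, alpha=0.05):
    """Probability of claiming equivalence when sigma is known.

    shift and margin are in the same units as sigma. Equivalence is claimed
    when the (1 - 2*alpha) z-interval for the difference lies inside +/- margin.
    """
    nd = NormalDist()
    se = sigma * math.sqrt(1 / n_old + 1 / n_new)
    z = nd.inv_cdf(1 - alpha)
    upper_cut = (margin - shift) / se - z    # standardized room below +margin
    lower_cut = (-margin - shift) / se + z   # standardized room above -margin
    return max(0.0, nd.cdf(upper_cut) - nd.cdf(lower_cut))

# Margin of 2 SD, 10 historical lots, 5 new lots, no true difference.
print(round(tost_power_known_sigma(0.0, 2.0, 1.0, 10, 5), 3))
# Tighter margin of 1.5 SD lowers the power.
print(round(tost_power_known_sigma(0.0, 1.5, 1.0, 10, 5), 3))
# More new lots raises the power.
print(round(tost_power_known_sigma(0.0, 2.0, 1.0, 10, 10), 3))
# True shift exactly at the margin: the pass rate is the alpha level.
print(round(tost_power_known_sigma(2.0, 2.0, 1.0, 10, 5), 3))
```

With an unknown standard deviation the interval uses a t quantile and an estimated standard deviation, so the power is lower than these values.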

I  would  be  remiss

if  I  did  not  show  you  the  tools that  JMP  does  have

for  showing  that  equivalence

like  if  I  had  an  historical  process where  I  had  10  lots  and  I  made  5  new  ones.

First  thing  I  want to  do

is  I  want to  look  at  this through Graph Builder,

and  I  see  that  there  is  no  effect between  those  two.

I can see both of those, and they both look like

they came from the same process, the blue versus the red.

How  about  if  there  is  an  effect ? What  I  do  is  I  see  a  shift.

Just like I showed you before, that is the two one-sided t-tests.

JMP  has  tools  for  that.

Jin  Feng  did  a  great  job  with  this. My  goodness.  I  love  the  scores  plot.

Here's  the  difference  in  means. Here's  the  lower , and  here's  the  upper,

and  that's  within  the  boundaries.

In that case, what you've done is you rejected both null hypotheses

in favor of the alternates,

which is the same as what you see in the picture.

What you also see here is that if there is an effect,

I am not going to reject both of the nulls.

One of those I am going to fail to reject, and indeed I did.

What you'll see is my confidence interval is outside that boundary.

I  would  like  to  talk about  a  very  specific  case.

A very specific case in cell therapy is called a split-apheresis design.

A split-apheresis design is a situation where,

in cell therapies, you're changing the process.

What you do is you use donor material split

between the two different processes.

We kept getting questions over and over and over again

from our customers:

"Can I look at the sample size and power calculations

for these paired designs?

Can I overlay them?

Can I see how they depend upon that donor-to-donor variability?"

Let's  talk  about  a  split  apheresis  design.

In  a  split  apheresis  design, first  thing  I  want to  do

is  I  want  to  tell  you  about the  regulatory  expectation.

This is even in a recent draft guidance document from the FDA,

in July of 2023, just last month.

In it, they said that you need to select a suitable statistical test

for analysis of differences between paired data,

where those donors are paired up.

That's  where  the  split  apheresis  design comes  from.

For  every  single  donor  material that  you  have,

you  split  it  in  between process   A and  process  B.

This  is  not  two  independent   t-tests.

What  this   is, is  a  paired  design.

That's  the  first  thing that  I   wanted  to  talk  about.

The second thing

I wanted to point out is that you are very often in early stage,

so you do not have a lot of donor materials,

so you have very low sample sizes.

It's  hard  to  get  power out  of  low  sample  sizes.

The third thing that I'm going to tell you is

how you come up with your EAC.

How do you come up with your similarity condition,

that acceptable difference?

What you do is you use historical data that is made from multiple donors.

You take the standard deviation of that historical data.

I'm going to call the number of historical lots n1.

You take some number k of standard deviations of the historical data.

You do a test, and you're using the split-apheresis design

to judge against that historical data.
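As an illustration of that recipe, the sketch below sets the EAC to k times the standard deviation of the historical lots and then runs a paired equivalence check on the within-donor differences. All data values are hypothetical, and `paired_tost` and the hard-coded t critical value are assumptions of this sketch, not part of the JMP scripts.

```python
import math
import statistics

def paired_tost(pre, post, eac, t_crit):
    """Paired equivalence check: 90% CI of the mean within-donor
    difference compared against +/- EAC."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    half_width = t_crit * statistics.stdev(diffs) / math.sqrt(n)
    lower, upper = mean_diff - half_width, mean_diff + half_width
    return lower, upper, (-eac < lower and upper < eac)

# Hypothetical historical lots (n1 = 10), each from a different donor.
historical = [71.2, 78.5, 66.9, 74.0, 80.3, 69.8, 76.1, 72.7, 83.4, 68.1]
k = 2
eac = k * statistics.stdev(historical)

# Hypothetical split-apheresis results (n2 = 6 donors), same donor order in both lists.
process_a = [70.5, 79.1, 67.3, 75.2, 81.0, 72.4]
process_b = [71.3, 78.6, 68.4, 75.9, 82.1, 72.2]

lower, upper, equivalent = paired_tost(process_a, process_b, eac, t_crit=2.015)  # df = 5
print(round(eac, 2), round(lower, 2), round(upper, 2), equivalent)
```

In this made-up data the EAC reflects the spread between donors, while the interval reflects only within-donor differences, so the two are on quite different scales.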

These are two examples that I want to show you.

The first example here

is  where  you're  looking at  process   A and  process  B.

What  you  see   is  you  do  see six different  donors  here.

What you see in the one on the left is that the majority of the variation is coming

from donor-to-donor variability,

not the difference between process A and process B.

You have high donor-to-donor variability.

I'm going to call that high rho.

In the case on the right,

the majority of the variation is coming from the difference

between process A and process B, not the donor-to-donor variability.

The majority of the variation is coming from the analytical or the process.

What that tells you is you have very low rho.

You'd have low donor-to-donor variability.
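One way to read rho is as the share of total variance that is donor-to-donor. Under the assumed model "measured value = donor effect + process/analytical residual" (an assumption of this sketch, with a hypothetical function name), the donor effect cancels in each within-donor difference, so the paired difference gets tighter as rho grows:

```python
import math

def paired_diff_sd(total_sd, rho):
    """SD of a within-donor difference (B - A) when a fraction rho of the
    total variance is donor-to-donor and the rest is process/analytical."""
    residual_var = (1 - rho) * total_sd ** 2
    # The shared donor effect cancels in B - A, leaving two residual terms.
    return math.sqrt(2 * residual_var)

for rho in (0.9, 0.6, 0.3, 0.0):
    print(rho, round(paired_diff_sd(1.0, rho), 3))
```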

I'm  going to  show  you  a  series of  scripts  that  we  worked  on.

These  are  typical  questions that  came  from  our  customers.

In our case, we do not know what the standard deviation is.

How  does  that  compare  to  the  known?

How  about  those   Type I and   Type II  error  rates?

Remember,  if  I'm  a  drug  manufacturer, I  want  to  increase  the  power.

If  I'm  a  regulatory  agency,

I  want  to  make  sure  that  you  do  not inflate  that   Type I  error  rate.

How  are  we  going to  do  this?

This is from the European Medicines Agency, 2001.

The best way to do that is with things

called expected operating characteristic curves.

That gives you power on the y-axis against a shift in the mean.

I'm going to go through a series of scripts,

and this series of scripts...

It's really one script that I have right here,

and it's going to allow me to change things like that rho,

that proportion of donor-to-donor variability.

That k-value, remember, is how I set the acceptance criteria:

it is k times that standard deviation.

The typical way of doing this is k times the standard deviation of those historical lots.

The number of historical lots that you use is n1.

n2 is the number of lots that I'm going to use for that paired design.

Whenever you run the script,

it does a series of simulations.

In this case, it did 5,000 simulations,

and it calculates the power for you:

in those 5,000 runs, what percentage of those passed?
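The general idea of such a simulation can be sketched in a few lines of Python. This is not the JMP script from the talk. It assumes a simple model (value = donor effect + residual, total standard deviation fixed at 1), and the names `simulate_pass_rate` and `T95` are hypothetical.

```python
import math
import random
import statistics

# 0.95 quantiles of Student's t for small degrees of freedom (standard table values).
T95 = {2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015, 6: 1.943, 7: 1.895}

def simulate_pass_rate(shift, rho, k, n1, n2, n_sims=5000, seed=1):
    """Fraction of simulated studies that claim equivalence.

    Assumed model (total SD fixed at 1): value = donor effect + residual,
    with Var(donor) = rho and Var(residual) = 1 - rho. `shift` is the true
    mean difference (B - A) in total-SD units.
    """
    rng = random.Random(seed)
    sd_donor, sd_resid = math.sqrt(rho), math.sqrt(1 - rho)
    t_crit = T95[n2 - 1]
    passes = 0
    for _ in range(n_sims):
        # n1 historical lots, each from its own donor; EAC = k * historical SD.
        hist = [rng.gauss(0, sd_donor) + rng.gauss(0, sd_resid) for _ in range(n1)]
        eac = k * statistics.stdev(hist)
        # n2 split-apheresis donors; the donor effect cancels in each difference.
        diffs = []
        for _ in range(n2):
            donor = rng.gauss(0, sd_donor)
            a = donor + rng.gauss(0, sd_resid)
            b = donor + shift + rng.gauss(0, sd_resid)
            diffs.append(b - a)
        half_width = t_crit * statistics.stdev(diffs) / math.sqrt(n2)
        mean_diff = statistics.mean(diffs)
        if -eac < mean_diff - half_width and mean_diff + half_width < eac:
            passes += 1
    return passes / n_sims

for rho in (0.9, 0.6, 0.3, 0.0):
    print(rho, simulate_pass_rate(shift=0.0, rho=rho, k=2, n1=10, n2=6))
```

Sweeping `shift` over a grid and plotting the pass rate gives one operating characteristic curve; repeating that for different `rho`, `k`, `n1`, or `n2` gives curves to overlay.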

It  looks  something  like  this. It  gives  you  a  lot  of  different  options.

My  goodness . I  can  look  at  different  k -values.

I  can  look  at  a  different  number  of   n1, which  are  called  historical  lots.

I  can  also  look  at  the  different number  of  n 2  or  paired  lots.

Right   now, I  want to  talk  about...

Whenever  I  do  this,  what  I  can  do

is I  can  select  which  of  these different  cases  that  I  want  to  look  at

to be  able  to  answer  typical  questions.

Let  me  open  up my  typical  comparisons  here.

The first one I want to talk about is,

"Heath, what if I have a known standard deviation?"

It looks something like this.

That's  what  the  known standard  deviation  looks  like.

A few things that I want to point out:

this is the percentage of time that you're going to claim equivalence.

If they're exactly the same, you see that you're going to claim equivalence

a high percentage of the time.

If there's a huge difference between them, like a two standard deviation shift

or a three standard deviation shift, you're not going to claim equivalence.

That's  a  good  thing.

The other thing that I want to show you here

is that if you're looking at this alpha of 0.05,

given that I set my k-value at 2

(k standard deviations,

using the standard deviation of 10 historical lots),

you would expect that at a shift of 2 standard deviations the pass rate would be 0.05,

the exact alpha level that I set in my equivalence test.

Now, the thing that I want to show you

is that this is for a proportion of donor-to-donor variability of 90%.

What happens if I change that?

What happens if I change that to 60%?

What happens if I change that to 30%?

There's no donor-to-donor variability.

What you see is that for the paired test, the power curve looks really good

whenever I have high donor-to-donor variability.

The other thing that you notice with the known standard deviation

is that the alpha level, regardless of the operating characteristic curve,

is always at 0.05.

Let's  talk  about   some  other  typical  questions.

One typical question is,

how does it compare for the different levels of rho?

How does my typical way of doing this compare,

where I do not know what the standard deviation is?

My typical way of doing this is in the blue.

The known standard deviation is in the red.

One  thing  that  I  want to  point  out

is  I  want to  point  out this  one  right  here.

What you see is that with the preferred approach,

the approach that even regulatory documents have said that you should do,

the paired approach,

using the standard deviation

that is calculated from my historical lots,

I have an inflated Type I error rate.

This should be 0.05, just like it is here.

That  was  really strange  to  us, and  we  looked  into  this.

When  we  looked  into  it, what  we  found  is ,

it  has  everything  to  do with  this  right  here.

The reason it has everything to do with this right here

goes back to what I said: how do I decrease the width of that confidence interval?

The way that I decrease the width of that confidence interval

was either to decrease s, or increase n, or increase my alpha level.

Understand  this.

This is why you have an inflated Type I error rate

with this paired test:

those deltas, which you're using to judge this against,

are using the standard deviation of historical data

that contains donor-to-donor variability.

That confidence interval right there

does not contain donor-to-donor variability.

Why? Because you did a paired test.

It contains only analytical and process variability.

That's where that inflated Type I error rate comes from.
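A rough way to see this numerically is to put the true shift exactly at k total standard deviations and count how often the test still passes. The sketch below is a simplification and not the script from the talk: it treats the paired-difference standard deviation as known (so a z-interval is used) and only estimates the EAC from simulated historical lots. The function name and settings are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist

def boundary_pass_rate(rho, k=2, n1=10, n2=6, alpha=0.05, n_sims=4000, seed=7):
    """Pass rate when the true shift sits exactly at k total SDs (total SD = 1).

    Simplifications (assumptions): the paired-difference SD is treated as
    known, so a z-interval is used; only the EAC is estimated from data.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha)
    se_diff = math.sqrt(2 * (1 - rho) / n2)   # no donor term: it cancels in pairs
    half_width = z * se_diff
    passes = 0
    for _ in range(n_sims):
        # Historical lots carry the full (donor + residual) variance of 1.
        hist = [rng.gauss(0, 1) for _ in range(n1)]
        eac = k * statistics.stdev(hist)
        mean_diff = rng.gauss(k, se_diff)     # true shift exactly at the boundary
        if -eac < mean_diff - half_width and mean_diff + half_width < eac:
            passes += 1
    return passes / n_sims

for rho in (0.9, 0.3, 0.0):
    print(rho, boundary_pass_rate(rho))
```

Under these assumptions the pass rate at the boundary can be compared directly with the nominal 0.05 for each value of rho.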

If you use this paired approach, understand that

you have an inflated Type I error rate.

We see that, and it's even more prevalent

when you have high donor-to-donor variability.

Why? Because if you have low donor-to-donor variability,

that process variability

is the largest part of the variance components that you have.

Let's  look  at  a  few  more questions  that  you  have.

As I said, this one script answers these different questions.

This is answering the question,

"Hey, Heath, if I use that paired approach that's recommended,

can I look at what happens as I increase sample size

from 3 to 4 to 5 to 6?"

Two things that I want to point out here.

Number one, what you see as I increase sample size

is I'm going to have higher power.

I still do not have adequate power if there's no donor-to-donor variability,

meaning that I have 0 donor-to-donor variability.

I would need a sample size of at least 8, or 8 different donors.

If I do have high donor-to-donor variability,

like 0.9, 90% of that variability, what you see is I do have high power

for no difference between the means.

What I can do is answer those questions

with overlaid operating characteristic curves for different sample sizes.

I can also answer that question if I say, "Hey,

I've stated my different sample sizes,

but what if we look at the different k-values?"

Understand that your acceptance criteria are k standard deviations.

What's going to happen is that those acceptance criteria,

or what I call goalposts, are going to widen as you increase k.

Therefore, you're going to have a much higher ability

to pass equivalence,

and you're going to have much higher power.

Another  typical  question  is  this.

What  if  I  want to  change both  of  those  together?

I'm  a  big  fan  of   Graph Builder .

With Graph Builder, what you're looking at here

is not only,

"Hey, Heath, I am increasing sample size: in blue, that would be 3,

in red, that would be 4,

in green, that would be 5, and in purple, that would be 6,"

but I also looked at it for different k-values.

What would your operating characteristic curves look like?

Good?

I  want to  revisit  this.

Just like I said before, I want to revisit this

and show you that

whenever I have a large proportion of donor-to-donor variability,

what you see for 2 right here is that I would expect my alpha level,

the proportion of time that I pass this test, to be 0.05.

But what you see is you have an inflated Type I error rate.

How  does  this  look?

Whenever  I'm  looking  at   a  rho

or  a  proportion of   donor-to-donor  variability

that  is  very  small,

I  do  not  have  much  power.

The question was, what if we did this instead?

If we had low donor-to-donor variability,

what if we used information from those historical lots?

If I have no donor-to-donor variability or very low donor-to-donor variability,

why couldn't I just do an independent t-test,

where for process A, my historical process, I use

not just the paired lots

but also those 10 historical lots,

and now compare that to the mean of the new process?

We  wanted  to  see  how  that  compared .

Done that way, the independent test is in the red.

The  paired  way  is  in  the  blue.

What you see is, if I have little to no donor-to-donor variability

in my cell therapy split-apheresis process,

you see that the independent t-test

has a much better profile than the paired approach.

However,  if  I  have  high donor-to-donor  variability,

that  paired  approach  in  the  blue

has  a  much  better  operating characteristic  than  the  red.

Now, the question is: instead of just automatically

doing that split-apheresis paired design,

maybe it would be better

to make a decision based upon that donor-to-donor variability.
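A back-of-the-envelope comparison of standard errors hints at why the better choice depends on rho. The sketch below is only a rough approximation: it ignores the different degrees of freedom of the two tests and, for the independent case, ignores that the paired lots share donors across groups. The function names and lot counts are illustrative assumptions.

```python
import math

def paired_se(rho, n2, total_sd=1.0):
    """Standard error of the mean within-donor difference (donor effect cancels)."""
    return total_sd * math.sqrt(2 * (1 - rho) / n2)

def independent_se(n_old, n_new, total_sd=1.0):
    """Standard error of a difference in means when lots are treated as
    independent, so each lot carries the full donor + residual variance."""
    return total_sd * math.sqrt(1 / n_old + 1 / n_new)

n1, n2 = 10, 6                       # historical lots, paired donors
for rho in (0.0, 0.3, 0.6, 0.9):
    p = paired_se(rho, n2)
    i = independent_se(n1 + n2, n2)  # process A pools historical + paired lots
    print(rho, round(p, 3), round(i, 3), "paired" if p < i else "independent")
```

With these particular numbers the independent comparison has the smaller standard error at low rho and the paired comparison wins at high rho, in line with the pattern described in the talk.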

How does this compare whenever I'm looking at different k-values?

I see the exact same thing, the exact same phenomenon:

with low donor-to-donor variability,

it makes sense to do the independent t-test.

With high donor-to-donor variability, I have a much better

operating characteristic curve,

or higher power, associated with the paired approach.

It doesn't matter if I looked at a k of 1.5, or 2, or even 3.0.

Regardless of the k-value,

I have a much better operating characteristic curve

if I consider that donor-to-donor variability.

What if I looked at different numbers of lots?

I looked at 3.

We looked at 4.

We looked at 5 paired lots. We looked at 6 paired lots.

Regardless, you see the same phenomenon.

We're  currently  writing  a  paper  on  this

to  try  to  propose

that  if  you  have  low donor-to-donor  variability,

maybe  it  does  not  make  sense for  you  to  use  a  split   apheresis

or  a paired  analysis  approach.

Maybe   the  approach  is  only  good

whenever  you  have  high donor -to -donor  variability.

These are typical questions that are asked in split-apheresis designs.

What I want to do is cover

two or three more of these

just to show you a few other things that you could do.

These are different things that we were looking at.

We looked at, "Hey, how does the operating characteristic curve

compare if, in the blue,

we use nothing but the historical lots

to estimate the standard deviation,

versus if you use the paired and the historical lots,

which is in the red?"

What you see is there's not much difference between these two,

especially if I'm using higher sample sizes for n2.

We also looked at,

"Hey, what if I estimated that standard deviation

a few different ways?

What if I looked at estimating

that standard deviation using the historical lots,

which is in the blue, versus, in the red,

using the historical lots and the paired lots,

and I compare the independent case versus the paired case?

What do I see?"

As I said before, you see that exact same phenomenon:

with low donor-to-donor variability,

the much better way of doing this would be an independent t-test.

On the lower right-hand corner,

that is where you have high donor-to-donor variability.

It makes sense that we would use the paired approach.

Last  one  that  I  want  to  show  you

is  this  is  something that  we've  been  working  on.

We  looked  at  the  paired  approach versus  the  independent.

The  paired  approach  is  in  the  blue.

The  independent  is  in  the  red.

I've said this over and over and over again:

it makes sense that if I have low donor-to-donor variability,

the independent case in the red looks much better.

If I have high donor-to-donor variability, the paired approach looks better.

But one thing that we did is we took a look and just said,

"What if I took the approach that gave me the shortest

width of that confidence interval?"

That's  in  the  green .

What  you  see  is  that  usually gives  you  the  best  approach

regardless  of   what  your  rho  is

or  what  your  proportion of  donor -to -donor  variability  is.

In  closing,

I  would  like to  just  point  out  a  few  things.

This script that we have answers the typical questions

that our customers have on operating characteristic curves

associated with these split-apheresis designs.

What I do want you to take away from here, though,

is that if you do have a low proportion of donor-to-donor variability,

you'll see that these designs are very underpowered

for fewer than 8 lots, fewer than 8 different donor materials.

We live in a world in cell therapies

where you do not have a lot of donor materials,

so you have very low sample sizes.

It would be much more efficient, if you had low donor-to-donor variability,

to use the independent case.

We do have other revisions that we made to this,

for cases where you are able to make multiple lots

for those paired approaches with the same donor,

or where you're able to take multiple measurements,

to be able to look at those operating characteristic curves.

Thank  you.