Approaches to Comparisons with JMP (2022-US-45MP-1102)

It is common to need to compare two populations with only a sample of each population. Statistical inference is often used to help the comparison. Our presentation is limited to statistical inference that involves two hypotheses: the null hypothesis and the alternative hypothesis. Sometimes the goal of the comparison is to provide sufficient evidence to decide that there is a significant difference between two populations. At other times, the goal is to provide sufficient evidence that there is significant equivalence, non-inferiority, or superiority between two populations. Both situations can be assisted with a hypothesis test, but they require different tests. We review these situations, the appropriate hypotheses, and the appropriate tests using common examples.

Another common comparison is between two measurements of the same quantity. This situation is broadly covered by Measurement System Analysis. Our presentation focuses instead on the Method Comparison protocol for chemical and biological assays used by pharmaceutical and biotechnology development and manufacturing. We present two methods that are available in JMP 17 to assess the accuracy of a new test method against an established reference method. One method is known as Deming regression or Fit Orthogonal in JMP. The second method is known as Passing-Bablok regression. We review the background of assessing accuracy, the unique nature of data from method comparisons, and demonstrate both regression methods with examples.

Hello. My name is Mark Bailey. I'm a Senior Analytics Software Tester at JMP. My co-presenter today is Jianfeng Ding, a Senior Research Statistician Developer.

I'm going to begin with an introduction to our topic: some new approaches to comparisons that will be available in JMP 17. Before we talk about specific comparisons, we'd like to introduce some fundamental concepts. All of this has to do with comparing populations. Comparing populations is a very common task, and the comparison, we hope, will lead to a decision between two hypotheses. Samples from these populations are often collected for the comparison. Statistical inference can provide some valuable information about our samples; in particular, whether there is sufficient evidence to reject one hypothesis about these populations.

A clear statement of the hypotheses is essential to making the correct choice of a test for your comparison. These hypotheses represent two mutually exclusive ideas that together include the only possibilities. They're called the alternative and null hypotheses. The alternative hypothesis is a statement about the conclusion that we want to claim about the populations, and it requires sufficient evidence to overthrow the other hypothesis, which is the null hypothesis. The null hypothesis states the opposing conclusion that must be overcome with strong evidence. It serves as a reference for comparison, and it's assumed to be true.

It's important that we sort this out today because, historically, statistical training has presented only one way of using these hypotheses. The most often taught statistical tests are used to demonstrate a difference between the populations. But that's not the only possibility, and a lack of understanding about this distinction can lead to misusing these tests. The choice of a test is not a matter of what data are collected or how they are collected. It's strictly a matter of the hypotheses stated for the purpose of your comparison.

Let's look at two similar examples that are actually fundamentally different. We'll start with a case where the purpose is to demonstrate a difference. In this example, let's say I would like to demonstrate that a change in temperature will cause a new outcome, an improvement perhaps. We want to claim that a new level of our response will result from changing the process temperature. We'll use a designed experiment to randomly sample from a population for the low temperature condition and the high temperature condition. The null hypothesis states that the temperature does not affect the outcome; this will be our reference. The alternative states our claim, which is that the temperature affects the outcome, but we accept it only if the evidence is strong enough to reject the null hypothesis.

This is going to sound very similar, but it's exactly the opposite. In example two, we need to demonstrate equivalence. Here we want to demonstrate that a temperature change does not cause a new outcome; that is, after the change, we have the same outcome. For example, we might be planning to change the process temperature to improve the yield, but we want to make sure that the change doesn't affect the level of an impurity in our product. We design the same experiment to collect the same data, and we have the same two hypotheses, but now they're reversed. It's the null that states that the temperature affects the outcome, that is, there's a difference, while the alternative states that our change in temperature will not affect the outcome.

Are we testing for a difference or for equivalence? We see from these examples that the data are identical, but the tests are different. The choice is not about the data; it's about our claim, or in other words, how we state our hypotheses.
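To make the contrast concrete, here is one way to write both pairs of hypotheses for the temperature example. The symbols μ_low and μ_high (mean response at each temperature) and the practical margin δ are notation introduced here for illustration; some margin is needed because "no effect" can only be demonstrated within a tolerance, which the equivalence tests later in the talk make explicit.

```latex
\begin{aligned}
\text{Example 1 (difference):}\quad & H_0:\ \mu_{\text{high}} = \mu_{\text{low}}
  && H_1:\ \mu_{\text{high}} \neq \mu_{\text{low}} \\
\text{Example 2 (equivalence):}\quad & H_0:\ \lvert \mu_{\text{high}} - \mu_{\text{low}} \rvert \ge \delta
  && H_1:\ \lvert \mu_{\text{high}} - \mu_{\text{low}} \rvert < \delta
\end{aligned}
```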

Also remember that hypothesis tests are unidirectional. They serve only to reject a null hypothesis, with high probability, when it's false.

In our presentation today, we'd like to introduce some new equivalence tests as well as some additional methods that are used when comparing two measurement systems. I'm now going to hand it over to Jianfeng to talk about equivalence tests.

Thanks, Mark. Hello. I'm Jianfeng Ding, a Research Statistician Developer at JMP. In this video, I'm going to talk about the equivalence, non-inferiority, and superiority tests in JMP 17.

The classical hypothesis test on the left is the test that most quality professionals are familiar with. It is often used to compare two or more groups of data to determine whether they are statistically different. The parameter theta can be a mean response for a continuous outcome or a proportion when the outcome variable is binary. Theta T represents the response from the treatment group, and theta zero represents the response from a control group. There are three types of classical hypothesis test: the first is the two-sided test, and the other two are one-sided tests. If you look at the two-sided test on the left, the null hypothesis is that the treatment means are the same, and the alternative hypothesis is that the treatment means are different.

Sometimes we really need to establish that things are substantially the same, and the machinery to do that is called an equivalence test. An equivalence test shows that the difference between theta T and theta zero is within a prespecified margin delta, which allows us to conclude equivalence with a specified confidence level. If you look at the equivalence test, the null hypothesis is that the treatment means are different, and the alternative hypothesis is that the treatment means are within a fixed delta of one another. This is different from the two-sided hypothesis test on the left.

Another alternative testing scenario is the non-inferiority test, which aims to demonstrate that results are not substantially worse. There is also a testing scenario called superiority testing that is similar to non-inferiority testing, except that the goal is to demonstrate that results are substantially better.
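In the notation used here (theta T for the treatment, theta zero for the control, delta for the margin), the standard formulation of the three alternatives is sketched below for the case where a larger theta is better; when a smaller theta is better, the one-sided inequalities reverse. This is the usual textbook form, not a transcription of JMP's report.

```latex
\begin{aligned}
\text{Equivalence:}\quad & H_0:\ \theta_T - \theta_0 \le -\delta \ \text{or}\ \theta_T - \theta_0 \ge \delta
  && H_1:\ -\delta < \theta_T - \theta_0 < \delta \\
\text{Non-inferiority:}\quad & H_0:\ \theta_T - \theta_0 \le -\delta
  && H_1:\ \theta_T - \theta_0 > -\delta \\
\text{Superiority:}\quad & H_0:\ \theta_T - \theta_0 \le \delta
  && H_1:\ \theta_T - \theta_0 > \delta
\end{aligned}
```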

There are five different types of equivalence-type tests, depending on the situation. When to use each test is discussed next. These tests are very important in industry, especially in the biotech and pharma industries.

Here are some examples. If the goal is to show that the new treatment does not differ from the standard one by more than some small margin, then an equivalence test should be used. For example, suppose a generic drug is less expensive and causes fewer side effects than a popular name-brand drug. You would like to prove it has the same efficacy as the name-brand drug.

The typical goal in non-inferiority testing is to conclude that a new treatment or process is not significantly worse than the standard one. For example, suppose a new manufacturing process is faster. You would want to make sure it creates no more product defects than the standard process.

A superiority test tries to prove that the new treatment is substantially better than the standard one. For example, a new fertilizer has been developed with several improvements. The researcher wants to show that the new fertilizer is better than the current fertilizer.

How do we set up the hypotheses? The graph on the left summarizes these five different types of equivalence-type tests very nicely. This graph was created by SAS/STAT colleagues John Castelloe and Donna Watts; you can find their white paper easily on the web. Which test to choose depends on the situation.

For each situation, the blue region is the region that you are trying to establish with the test. For equivalence analysis, you can construct an equivalence region with upper bound theta zero plus delta and lower bound theta zero minus delta. You can conduct an equivalence test by checking whether the confidence interval of theta lies entirely in the blue equivalence region. Likewise, you can conduct a non-inferiority test by checking whether the confidence interval of theta lies entirely above the lower bound if a larger theta is better, or entirely below the upper bound if a smaller theta is better.
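The interval checks just described reduce to a few comparisons. This minimal Python sketch assumes you already have a confidence interval for theta; for the equivalence check, the usual choice is a 100(1 − 2α)% interval, which corresponds to two one-sided tests at level α. The function name `classify_interval` and its arguments are hypothetical, not part of JMP.

```python
def classify_interval(lower, upper, theta0, delta, larger_is_better=True):
    """Compare a confidence interval [lower, upper] for theta with the
    regions bounded by theta0 - delta and theta0 + delta (hypothetical helper)."""
    low_bound, high_bound = theta0 - delta, theta0 + delta
    # Equivalence: the whole interval sits inside the equivalence region.
    equivalent = low_bound < lower and upper < high_bound
    if larger_is_better:
        non_inferior = lower > low_bound   # entirely above the lower bound
        superior = lower > high_bound      # entirely above the upper bound
    else:
        non_inferior = upper < high_bound  # entirely below the upper bound
        superior = upper < low_bound       # entirely below the lower bound
    return {"equivalent": equivalent, "non_inferior": non_inferior, "superior": superior}


# Usage: intervals for a difference (theta0 = 0) with a margin of 3.
print(classify_interval(-1.2, 1.5, theta0=0.0, delta=3.0))
# {'equivalent': True, 'non_inferior': True, 'superior': False}
print(classify_interval(4.0, 6.0, theta0=0.0, delta=3.0))
# {'equivalent': False, 'non_inferior': True, 'superior': True}
```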

These tests are available in the Oneway platform for comparing normal means and in the Contingency platform for comparing response rates. The graphical user interface of the equivalence test launch dialog makes it easy for you to find the type of test that corresponds to what you are trying to establish. A forest plot in the report summarizes the comparison very nicely and makes it easy for you to interpret the results.

Next, I'm going to demonstrate equivalence tests. As my first example, I'm going to use the data set called Drug Measurements, which is in the JMP sample data. Twelve different subjects were given three different drugs, A, B, and C, and 32 continuous measurements were collected.

We go to Fit Y by X and load the response and the treatment. This brings up the Oneway analysis. Under the red triangle, find Equivalence Test. There are two options, means and standard deviations; we are going to focus on means in this talk.

We bring up the dialog, where you can select the test that you would like to conduct, and the graph will represent the selected test. For the superiority or non-inferiority test, there are two scenarios: a larger difference is better, or a smaller difference is better. Choose the option that fits the situation. You also need to specify the margin delta here, as well as the significance level alpha. You can choose to use pooled variance or unequal variance to run the test. You can do all pairwise comparisons, or you can do comparisons with a control group.

We're going to run an equivalence test first, and we will specify three as the margin for the difference. We click the OK button.

Here is the result of the equivalence test. From this forest plot, you can see that the confidence interval for the mean difference between drug A and drug C is completely contained in the blue equivalence region. The max P-value is essentially zero, less than .05, so we can conclude at the .05 significance level that drug A and drug C are equivalent. But if we look at drugs A and B, and drugs B and C, we can see that their confidence intervals for the mean difference both extend beyond the blue region. At the .05 significance level, we cannot conclude that drugs A and B, or drugs B and C, are equivalent.
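In the standard formulation, the max P-value of an equivalence test comes from two one-sided tests (TOST): one against the lower margin and one against the upper margin. Equivalence is concluded only if both reject, so the larger of the two p-values decides. The Python sketch below illustrates that logic from two samples under stated assumptions: it uses a normal (z) reference distribution instead of Student's t, so for small groups such as 12 subjects per drug its p-values come out somewhat smaller than a t-based test's. The function names and data are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def mean_difference_se(a, b, pooled=True):
    """Difference in sample means (a - b) and its standard error."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a), variance(b)  # sample variances (n - 1 denominator)
    if pooled:
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    else:
        se = math.sqrt(v1 / n1 + v2 / n2)  # unequal-variance standard error
    return mean(a) - mean(b), se


def tost(diff, se, delta, alpha=0.05):
    """Two one-sided tests of H0: |diff| >= delta (z approximation)."""
    z = NormalDist()
    p_lower = 1 - z.cdf((diff + delta) / se)  # H0: diff <= -delta
    p_upper = z.cdf((diff - delta) / se)      # H0: diff >= +delta
    p_max = max(p_lower, p_upper)
    return p_max, p_max < alpha


# Made-up measurements, not the Drug Measurements sample data.
drug_a = [5.1, 6.3, 4.8, 5.9, 6.0, 5.5]
drug_c = [5.4, 6.1, 5.0, 5.7, 6.4, 5.2]
diff, se = mean_difference_se(drug_a, drug_c)
print(tost(diff, se, delta=3.0))
```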

Assume drug C is our standard drug, and we would like to find out whether the measurements for drug A or B are much better than for drug C. We can run a superiority test to find out. Let me close this outline node first and bring up the launch dialog again. This time we're going to do a superiority test. For this test, we believe a larger difference is better, so we keep this selection. Also, for this study, we want to set drug C as our control group. We plug in the margin delta, .04 for this case, and click the OK button.

Here is the result of the superiority test. From the forest plot, you can easily see that the confidence interval of the mean difference between drugs B and C is completely contained in the superior region, and the P-value is less than .05. We conclude that drug B is superior to drug C. The confidence interval of the mean difference between drugs A and C extends beyond the blue region, and the P-value here is much bigger than .05, so at the .05 significance level we cannot conclude that drug A is superior to drug C.
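With "larger is better", a superiority test is a single one-sided test of H0: difference ≤ δ against H1: difference > δ, and rejecting it is the same as the one-sided lower confidence bound clearing δ. This short Python sketch shows that both views agree. It assumes a normal (z) approximation rather than Student's t, takes the difference and its standard error as inputs, and uses a hypothetical function name.

```python
from statistics import NormalDist


def superiority_test(diff, se, delta, alpha=0.05):
    """One-sided test of H0: diff <= delta vs H1: diff > delta (larger is better)."""
    z = NormalDist()
    p_value = 1 - z.cdf((diff - delta) / se)
    # One-sided 100(1 - alpha)% lower confidence bound for the difference.
    lower_bound = diff - z.inv_cdf(1 - alpha) * se
    # p_value < alpha exactly when lower_bound > delta.
    return p_value, lower_bound, p_value < alpha


print(superiority_test(diff=1.0, se=0.2, delta=0.4))  # superior
print(superiority_test(diff=0.5, se=0.2, delta=0.4))  # not shown superior
```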

This concludes my first example. Now I'm going to use a second example, based on the relative risk between two proportions, to show you how to conduct a non-inferiority test.

Let's bring up the data table. The trial compares a drug called FIDAX, as an alternative to the drug VANCO, for the treatment of colon infections. Both drugs have similar efficacy and safety. 221 out of 224 patients treated with FIDAX achieved clinical cure by the end of the study, compared to 223 out of 257 patients treated with VANCO.

We're going to launch Fit Y by X again, put in our response and treatment variables, and assign the count to the Freq role. Since the response variable is categorical, a Contingency analysis is produced, and all the tests here are classical hypothesis tests. The P-value tells us that we cannot conclude that clinical cure differs across the drugs.

But for this study, we really want to find out whether drug FIDAX is not inferior to drug VANCO. We go to the red triangle menu and find Equivalence Test. There are two options, risk difference and relative risk; we are going to choose relative risk for this case. In the launch dialog, we choose the non-inferiority test, and a larger ratio is preferred for this study. We also need to specify the category of interest; for this study, we select Yes. We also need to plug in our ratio margin here. We specify 0.09.

We click the OK button, and here is the result of the non-inferiority test. From the forest plot, you can easily see that the confidence interval for the relative risk between drug FIDAX and drug VANCO is completely contained in the non-inferior region. We conclude, at the .05 significance level, that drug FIDAX is not inferior to drug VANCO.
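For the relative-risk version, one common approach, assumed here since JMP's interval method may differ, is a Wald interval on the log scale: the log of the risk ratio has approximate standard error sqrt(1/x1 − 1/n1 + 1/x2 − 1/n2), and non-inferiority holds when the one-sided lower confidence bound for the ratio exceeds a lower bound. The talk does not say how JMP converts the entered margin into that bound, so the value 1 − 0.09 = 0.91 below is an illustrative assumption, as is the function name.

```python
import math
from statistics import NormalDist


def rr_noninferiority(x1, n1, x2, n2, lower_bound, alpha=0.05):
    """Non-inferiority of group 1 vs group 2 on the risk ratio (larger is better).

    Uses a log-scale Wald interval; x = responders, n = group size.
    """
    rr = (x1 / n1) / (x2 / n2)
    se_log = math.sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    z = NormalDist().inv_cdf(1 - alpha)
    lower = math.exp(math.log(rr) - z * se_log)  # one-sided lower confidence bound
    return rr, lower, lower > lower_bound


# Counts quoted in the talk: FIDAX 221/224, VANCO 223/257; the 0.91 bound is assumed.
rr, lower, non_inferior = rr_noninferiority(221, 224, 223, 257, lower_bound=0.91)
print(f"RR = {rr:.3f}, lower bound = {lower:.3f}, non-inferior: {non_inferior}")
```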

This concludes my talk, and I will give it back to Mark.

Thank you, Jianfeng.

I'm now going to talk about a very common procedure called method comparison. It's standard practice whenever new measurement methods are being developed. We assume that a standard method already exists to measure the level of some quantity, perhaps the temperature or the potency of a drug. A new method has been developed for some reason, and we want to make sure that its performance is comparable to the standard method. Today there are many standards, developed over many years by various organizations, to make sure that this is done properly.

Ideally, the new test method returns the same value as the standard method. A scatter plot of the test method versus the standard method would then show that the data agree with the identity line Y = X. But of course the data points won't agree perfectly, because of measurement error in both the standard method and the new test method. Regression analysis can determine the best-fit line for these data, and the estimated model parameters can be compared to the identity line.

This ends up being stated in the two hypotheses as follows. The null hypothesis says that the methods are not comparable; another way of saying that is that the intercept is not zero or the slope is not one. The alternative represents our claim that the new method is comparable, so we would expect the intercept to be zero and the slope to be one.
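Written out for a fitted line Y = β0 + β1·X (test method on standard method), the two hypotheses are as follows. The null is the complement of "intercept 0 and slope 1", so it holds when either parameter differs; the demonstration later in the talk checks whether the confidence intervals for the intercept and slope include 0 and 1.

```latex
\begin{aligned}
H_0:&\quad \beta_0 \neq 0 \ \ \text{or}\ \ \beta_1 \neq 1 \\
H_1:&\quad \beta_0 = 0 \ \ \text{and}\ \ \beta_1 = 1
\end{aligned}
```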

We'll compare by using regression, and ordinary least squares regression assumes a few things. It assumes that Y and X are linearly related. It assumes that there are statistical errors in Y but not in X. It assumes that the size of these errors does not depend on Y; that is, the error variance is constant for all Y. And it assumes that no data points exert excessive influence on the estimates.

But in a method comparison, the data often violate one or more of these assumptions. There are measurement errors in the standard method as well. Also, the errors are not always constant; instead, we might observe that the coefficient of variation is constant. That is, the errors are proportional to the level, so the standard deviation is not constant. Finally, outliers are often present that can strongly influence the estimation of these parameters.

Other regression methods can help. Deming regression simultaneously minimizes the squared errors in both Y and X. Passing-Bablok regression is a nonparametric method based on the median of all possible pairwise slopes, and because of that it's resistant to outliers and non-constant errors.

Deming regression is available in JMP through the Bivariate platform, using the Fit Orthogonal command. It can estimate the regression several ways: it can estimate the errors in both Y and X, it can assume that the errors in Y and X are equal, or it can use a given ratio of the error in Y to the error in X.
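Once the ratio λ of the error variances (Y error over X error) is fixed, Deming regression has a closed-form slope; λ = 1 is the equal-error case, which is orthogonal regression. The Python sketch below implements that textbook formula from sample moments. It covers only the given-ratio option; how JMP estimates the errors in both Y and X is not reproduced, and `deming_fit` is a hypothetical name.

```python
import math
from statistics import mean


def deming_fit(x, y, error_ratio=1.0):
    """Deming regression of y on x; returns (intercept, slope).

    error_ratio = (error variance of y) / (error variance of x);
    1.0 gives orthogonal regression.
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; the Deming slope is undefined")
    lam = error_ratio
    slope = (syy - lam * sxx + math.sqrt((syy - lam * sxx) ** 2 + 4 * lam * sxy ** 2)) / (2 * sxy)
    intercept = my - slope * mx
    return intercept, slope


print(deming_fit([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]))  # (1.0, 2.0)
```

With perfectly linear toy data every error ratio returns the same line; λ only matters once the points scatter.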

Passing-Bablok regression is now available in JMP 17, again through the Bivariate platform, using the Fit Passing Bablok command. It also includes checks of the assumptions that the measurements are highly positively correlated and exhibit a linear relationship.
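The core of Passing-Bablok fits in a short Python sketch: collect all pairwise slopes, drop undefined ones and slopes of exactly −1, take a median shifted by the number K of slopes below −1, and set the intercept to the median of y − b·x. This follows the original procedure as commonly described; it omits the correlation and linearity checks, assumes positively correlated data so the shifted index stays in range, and is not JMP's implementation.

```python
import math
from statistics import NormalDist, median


def passing_bablok(x, y, alpha=0.05):
    """Passing-Bablok fit of y on x: returns (intercept, slope, slope_ci)."""
    n = len(x)
    slopes = []
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[j] - x[i], y[j] - y[i]
            if dx == 0 and dy == 0:
                continue  # identical points carry no slope information
            s = dy / dx if dx != 0 else math.copysign(math.inf, dy)
            if s != -1:
                slopes.append(s)  # slopes of exactly -1 are discarded
    slopes.sort()
    N = len(slopes)
    K = sum(1 for s in slopes if s < -1)  # offset for the shifted median
    # Shifted median, converted from 1-based to 0-based indexing.
    if N % 2 == 1:
        b = slopes[(N + 1) // 2 + K - 1]
    else:
        b = 0.5 * (slopes[N // 2 + K - 1] + slopes[N // 2 + K])
    a = median([yi - b * xi for xi, yi in zip(x, y)])
    # Rank-based confidence interval for the slope.
    c = NormalDist().inv_cdf(1 - alpha / 2) * math.sqrt(n * (n - 1) * (2 * n + 5) / 18)
    m1 = round((N - c) / 2)
    m2 = N - m1 + 1
    lo = slopes[min(max(m1 + K, 1), N) - 1]
    hi = slopes[min(max(m2 + K, 1), N) - 1]
    return a, b, (lo, hi)


# One gross outlier (the last point) does not move the fit.
x = list(range(1, 11))
y = x[:-1] + [30]
print(passing_bablok(x, y))  # (0.0, 1.0, (1.0, 1.0))
```

For comparison, an ordinary least squares line through the same ten points has a slope of about 2.1, pulled up by the single outlier.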

There's also a comparison by difference. The Bland-Altman analysis compares the pairwise differences to the pairwise means to assess the bias between the two measurements. The results are presented in a scatter plot of the differences versus the means, for your examination and also to reveal any anomalies in the differences. This is all provided through the Matched Pairs platform, along with several hypothesis tests.
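The Bland-Altman quantities are simple to compute: the bias is the mean of the paired differences (test minus standard), and the 95% limits of agreement are conventionally the bias ± 1.96 standard deviations of those differences. The Python sketch below returns the points to plot (means and differences) along with those summaries. It does not compute the confidence interval for the bias that the report also shows, and `bland_altman` is a hypothetical name.

```python
from statistics import mean, stdev


def bland_altman(standard, test, z=1.96):
    """Return (means, differences, bias, (lower_loa, upper_loa)).

    Differences are test - standard; limits of agreement are bias +/- z * SD.
    """
    diffs = [t - s for s, t in zip(standard, test)]
    means = [(s + t) / 2 for s, t in zip(standard, test)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return means, diffs, bias, (bias - z * sd, bias + z * sd)


# Made-up illustrative values, not the 20-sample data set from the demo.
standard = [10.0, 20.0, 30.0, 40.0]
method_1 = [11.0, 19.0, 31.0, 39.0]
means, diffs, bias, (lo, hi) = bland_altman(standard, method_1)
print(diffs, bias, round(lo, 3), round(hi, 3))
```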

I'll now demonstrate these features. I'm going to show you Deming regression for completeness; it has actually been available in JMP for many years.

I'm going to use a data table that has measurements for 20 samples by the standard method and by four different test methods. I'm just going to use Method 1. I start by selecting the Analyze menu, then Fit Y by X. The standard goes in the X role, while Method 1 goes in the Y role.

Here we have the scatter plot to begin with. I'll click the red triangle and select Fit Orthogonal, and you can see the different choices I mentioned a moment ago. I'm going to have JMP estimate the errors in Y and X. There's a best-fit line using Deming regression, along with information about it. We can see that the intercept of the estimated line is close to zero and the slope is close to one; in fact, the confidence interval for the slope includes one.

Now I'm going to show you Passing-Bablok. I return to the same red triangle and select Fit Passing Bablok, and a new fitted line is added to my scatter plot. It looks very much like the result from the Deming regression. But remember that Passing-Bablok is resistant to outliers and non-constant variance.

First we have Kendall's test, which tells us about the correlation: the positive correlation is statistically significant. We then have a check for linearity, and the high P-value here indicates that we cannot reject linearity.
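Kendall's tau compares the number of concordant and discordant pairs, and without ties a large-sample z statistic tests it. The Python sketch below uses that normal approximation with a one-sided alternative of positive correlation, which is an assumption about how the check is framed. It ignores ties (tau-a), JMP's exact variant and tie handling may differ, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def kendall_tau_positive(x, y):
    """Kendall's tau-a and a one-sided p-value for positive association."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[j] - x[i]) * (y[j] - y[i])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1  # tied pairs count as neither
    tau = (concordant - discordant) / (n * (n - 1) / 2)
    var_tau = 2 * (2 * n + 5) / (9 * n * (n - 1))  # null variance, no ties
    z = tau / math.sqrt(var_tau)
    return tau, 1 - NormalDist().cdf(z)


tau, p = kendall_tau_positive([1, 2, 3, 4, 5, 6, 7, 8],
                              [1.2, 1.9, 3.1, 4.2, 4.8, 6.3, 6.9, 8.1])
print(tau, p)
```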

Finally, we have the regression results. I see that I have an intercept close to one, but the interval includes zero, so I can't reject zero. The slope is close to one, and the interval includes one, so I can't reject that the slope is one.

Finally, I'll click the red triangle for the Passing-Bablok fit and select Bland-Altman Analysis. This launches the Matched Pairs platform, so it opens in a separate window.

Here we are looking at the pairwise differences between Method 1 and the standard versus the mean of those two values. We're using this to assess bias. The Bland-Altman analysis is reported at the bottom. The bias is the average difference, and we hope that it's zero. The estimate is not exactly zero, but we can see that the confidence interval includes zero, so we would not reject zero. We also have the lower and upper limits of agreement, and we see that the range between them includes zero as well.

The standard methods used when comparing two measurement methods are now available in JMP 17. That concludes our presentation. Thank you for watching.