
Drones Flying in Warehouses: An Application of Attribute Gauge Analysis (2022-US-30MP-1108)

Attribute gauge analysis is typically applied to compare agreement or lack thereof between two rating approaches to a problem. For example, two inspectors may have differences of opinion as to whether a part is conforming (Pass) or non-conforming (Fail) based on consideration of specific quality indicators in individual parts. How do we quantitatively measure the degree of agreement? In more complicated situations, attribute gauge analysis may be applied to compare agreement among multiple raters for multiple responses, including agreement to a standard. We describe a personal consulting case involving the use of drones flying in warehouses to read labels of stacked inventory shelves in place of manual efforts by humans. We illustrate the application of JMP’s attribute gauge analysis platform to provide graphical and quantitative assessments such as the Kappa statistic and effectiveness measures to analyze such data.

 

 

Hi,  I'm  Dave  Trindade,

founder  and  owner  of  Stat-Tech, a  consulting  firm  specializing

in  the  use  of  JMP  software for  solving  industrial  problems.

Today I'm going to talk about a consulting project that I worked on over the last year with a robotics company. We're going to be talking about Drones Flying in Warehouses: An Application of Attribute Gauge Analysis.

Attribute gauge analysis is typically applied to compare agreement, or lack thereof, between two or more rating approaches to a problem. For example, two inspectors may have differences of opinion as to whether a part is conforming (call it pass) or nonconforming (call it fail), based on consideration of specific quality indicators for individual parts. How do we quantitatively measure the degree of agreement?

Let's  actually  start  off  with  an  example.

Let's say we have two inspectors, Inspector 1 and Inspector 2. They are presented with 100 parts, along with the critical characteristics of those 100 parts, and asked to determine whether each part should be classified as a pass or a fail.

I've summarized the results in the partial table shown to the right. There are 100 rows in the table, and all variables are nominal. The first column is the part, 1 through 100. The second column is the rating by Inspector 1, whether the part is a pass or a fail, and the third column is the corresponding pass or fail rating by Inspector 2.

Now, if we were not familiar with JMP's attribute gauge analysis platform, a first step that we could take would be to look at the two classification distributions and use dynamic linking to compare them. What I will do is go through the slides first, and then demonstrate the results in JMP after I've covered a certain amount of material.

For example, let's generate distributions of the two columns, Inspector 1 and Inspector 2. If we click on the Fail bar for Inspector 1, we see mostly matches for Inspector 2, but there are a few disagreements over here: some parts that Inspector 1 classified as a fail but Inspector 2 classified as a pass.

When you click on that bar, JMP highlights the actual rows that correspond to it. You can see over here, for example, that in row four Inspector 1 called the part a fail and Inspector 2 called it a pass. Generally, though, they're mostly in agreement: fail, fail, fail, fail, and so forth.

We could also do that by clicking on Inspector 2's Fail bar and then seeing how it compares to Inspector 1. Either way, we find disagreements between the two inspectors: there are five instances where Inspector 1 classified a part as a fail and Inspector 2 classified it as a pass.

Now  we  can  also  visualize the  inspector  comparison  data.

To  do  that,  we  can  use  graph builder with  tabulate  to  view  agree  and  disagree

counts  between  the  two  inspectors. Here's  one  way  of  visualizing  it.

We  can  put   Inspector 1 using  graph builder  on  the  horizontal  axis

and   Inspector 2  on  the  vertical  axis.

Then, with color coding, we can see whether each row is an agreement or a disagreement: agreeing rows have green markers and disagreeing rows have red markers. Now we can see the actual distribution. Then we can use Tabulate to total the numbers that are involved.

We can see over here that Inspector 1 and Inspector 2 agreed on the fail categorization for 42 of the parts, and they agreed on 44 of the pass parts. They disagreed on nine instances where Inspector 2 called the part a fail and Inspector 1 called it a pass, and on five instances where Inspector 2 called it a pass and Inspector 1 called it a fail. Those disagreements total 14. The inspectors agreed on a classification for 86% of the parts and disagreed on 14%.
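
As a side note, the same tabulation is easy to reproduce outside JMP. Here is a minimal Python/pandas sketch that assumes only the agree/disagree counts quoted above; the 100-row table is reconstructed from those counts rather than read from the original data file.

```python
import pandas as pd

# Rebuild the 100 ratings from the counts in the talk:
# 42 fail/fail, 44 pass/pass, 9 (Insp 1 pass, Insp 2 fail), 5 (Insp 1 fail, Insp 2 pass)
rows = ([("Fail", "Fail")] * 42 + [("Pass", "Pass")] * 44 +
        [("Pass", "Fail")] * 9 + [("Fail", "Pass")] * 5)
df = pd.DataFrame(rows, columns=["Inspector 1", "Inspector 2"])

# Cross-tabulation with marginal totals, analogous to Tabulate in JMP
print(pd.crosstab(df["Inspector 2"], df["Inspector 1"], margins=True))

agree = (df["Inspector 1"] == df["Inspector 2"]).mean()
print(f"Percent agreement: {agree:.0%}")   # 86%
```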

From  there  now  we  can  go and  do  attribute  gauge  analysis

and  see  what  JMP  can  do  for  this  analysis.

To run attribute gauge analysis, we go to Analyze > Quality and Process > Variability / Attribute Gauge Chart. Then we cast the columns into roles: both inspectors are listed under Y, Response, and the Part column is cast as the grouping variable. These are required entries. We make sure the Chart Type is Attribute and click OK.

And  now  JMP  provides  us  with  this attribute  gauge  analysis  report.

The  first  chart  that's  shown  over  here is  the  percent  agreement  for  each  part.

So  we  have  100  parts  on  the horizontal  rows  axis  over  here,

and  when  there's  100%, it  means  the  two  inspectors  agreed.

When there's 0%, it means they disagreed.

The  left  chart  shows  the  overall percent  agreement  86%  by  inspector.

Since  the  comparison is  between  only  two  inspectors,

both  are  going  to  have the  same  86%  agreement  value.

The  agreement  report  now  includes

a  numerical  summary of  the  overall  86%  agreement.

You can see 86 matches out of 100 parts inspected, and the individual inspector values are the same since there is only the single question of whether a given part was a pass or a fail.

And  95%  confidence  limits  are provided  for  the  two  results,

both  for  the  inspector and  for  the  agreement.

Now  the  agreement  comparisons report  includes  a  new  statistic

that  perhaps  many  people are  not  familiar  with,

called the Kappa statistic. It was devised by Cohen; the reference is given at the end. The Cohen Kappa statistic, which in this case is 0.7203, is designed to correct for agreement by chance alone.

This  was  very  interesting  to  me when  I  first  read  about  this.

Like, "What  do  we  mean by  agreement  by  chance  alone?"

Let's go into a little bit of an explanation of agreement by chance and how we can estimate it.

Let's  consider  two  raters,  R1  and  R 2.

We'll  assume  totally  random  choices

for each rater for each sample, that is, for each part. We further assume that the probability a rater selects either choice, pass or fail, over the other is 50%, so it's 50/50. One hundred samples or trials are independently categorized as pass or fail by each rater, similar to flipping a coin for each choice.

Just visualize two inspectors each flipping a coin and counting how often they match: heads/heads, tails/tails, heads/tails, or tails/heads.

What's  the  expected  fraction of  agreements  by  chance?

Well,  it's  a  simple problem  in  probability.

Similar  to  tossing  two  coins, there  are  only  four  possible

and  equally  likely  chance  outcomes  between the  two  inspectors  for  each  part.

Rater 1 could call it a fail and Rater 2 could call it a fail; they would agree. Rater 1 could call it a pass and Rater 2 could call it a pass; there would be agreement there too. The disagreements are the two outcomes where one rater says pass and the other says fail. Of these four equally likely outcomes, two are agreements and two are disagreements. Therefore, the probability of agreement by chance alone is two out of four, or 50%.

It's a  simple  probability  problem.
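
A quick way to convince yourself of this is a short simulation. This is just an illustrative sketch (the talk does not include one): two raters make independent 50/50 pass/fail calls on many parts, and the fraction of matches settles near 0.5.

```python
import random

# Two raters independently "flip a coin" (Pass/Fail, 50/50) for many parts;
# the fraction of matching calls estimates agreement by chance alone.
random.seed(1)
n = 100_000
matches = sum(random.choice("PF") == random.choice("PF") for _ in range(n))
print(matches / n)   # close to 0.5, as the four-equally-likely-outcomes argument predicts
```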

Now, how do we calculate the Kappa statistic? As I say, it's meant to correct for this expected probability of agreement by chance alone. The simple formula that you can put into JMP for the Kappa statistic starts with the percent agreement, in this case 86%, minus the agreement expected by chance from the data, which we know is going to be around 50%.

How do we actually use the data itself to estimate the expected agreement by chance? The estimation for the Cohen Kappa statistic is shown below, and this is basically how it's done. This is the tabulated table we saw earlier: agreement on fail/fail between Inspector 1 and Inspector 2 in 42 instances, and agreement between Inspector 1 and Inspector 2 on the pass criterion in 44 instances. Add those up and that's 86%. I show over here, in the Excel format we added, that 42 plus 44 divided by 100 gives the agreement. The disagreement is one minus that, or just five plus nine divided by 100.

Now, to estimate the agreement by chance, the Cohen Kappa approach takes the sum of the products of the marginal fractions for each pass/fail category. Here are the marginal fractions. For fail, the marginal fractions are 51 divided by 100 and 47 divided by 100, so we form the product of those two: 51/100 times 47/100. For the pass category, we take 49 out of 100 and 53 out of 100, multiply those two together, and add the two products. When we calculate it out, that gives us 0.4994, which is obviously very close to 0.5.

Then the Kappa statistic is the percent agreement minus the expected agreement by chance, divided by one minus the expected agreement by chance. That comes out to be 0.7203 in this case.
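
Here is a worked version of that calculation in Python, using the counts from the tabulation (42 and 44 agreements, 9 and 5 disagreements). This is only a sketch of the arithmetic described above, not JMP's internal code.

```python
# Cohen's kappa from the 2x2 table of counts
#                  Insp 1 Fail   Insp 1 Pass
#  Insp 2 Fail          42            9
#  Insp 2 Pass           5           44
n = 100.0
p_observed = (42 + 44) / n                             # 0.86, the percent agreement
# Expected agreement by chance: sum of products of the marginal fractions
p_chance = (47 / n) * (51 / n) + (53 / n) * (49 / n)   # 0.4994
kappa = (p_observed - p_chance) / (1 - p_chance)
print(round(p_chance, 4), round(kappa, 4))             # 0.4994 0.7203
```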

Here are some guidelines for interpreting Kappa. Going back up again, I want to show you that the Kappa was 0.7203. The guidelines for interpreting Kappa are: if Kappa is greater than 0.75, agreement is excellent; if it's between 0.40 and 0.75, it's good; and if it's less than 0.40, it's called marginal or poor. These dividing lines are somewhat arbitrary. Total 100% agreement would give a Kappa of one, and we could actually get a negative Kappa, which would indicate agreement that is less than expected by chance alone. The references for these guidelines are given at the end.

All right, let me just stop here and go into JMP to show what we've done so far.

The  data  file  that  we've  got over  here  is  the  inspectors

that  I  talked  about,  and  I  said  we could  take  a  look  at  the  distribution.

Obviously,  if  you're  familiar  with  JMP, you  just  go  analyze  distribution

and  put  in  the  inspectors  over  here.

Then we can put this next to the rows, and if we click Fail, we see the comparison between the fails for Inspector 1 versus the fails for Inspector 2, plus some disagreements. You can also see which rows disagree; row four is an example. Similarly, if we click over there, we can compare Inspector 2 to Inspector 1 and get a different set of numbers over here.

The other option that I mentioned for visualization is to use Graph Builder. In Graph Builder, we can put Inspector 1 on the horizontal axis and Inspector 2 on the vertical axis. Now we have a comparison, and we can actually see the counts: for example, how many times did Inspector 1 rate something as a pass when Inspector 2 rated it as a fail? If we go back to the data table, we see that there are nine instances of that, and those rows are highlighted. This is a very quick way of seeing what the numbers are for the different categories that we're working with. Let's click Done over here.

The  other  thing  we  can  use as  I  mentioned,  is  the  tabulate  feature.

We  can  go  to  tabulate, and  we  can  put   Inspector 1  on  the  top

and   Inspector 2  on  the bottom  over  here  this  way.

Then  we  can  add  in  another  row  for  the marginal  totals  down  here  and  so  forth.

Now  we  have  the  summary that  we  can  put  next  to  the  graph builder

and  see  what  the  actual tabulations  are  that  we've  got  for  that.

That's  something  that  we  would  do

perhaps if we were not familiar with JMP's attribute gauge platform.

But  let's  use  JMP  now  to  do  the  analysis.

We're  going  to  come  over  here, we're  going  to  go  to  Quality  and  Process.

We're  going  to  go to  variability  attribute  gauge  chart.

We're  going  to  put in  the  inspectors  over  here.

We're  going  to  put  in  the  part.

Over here you notice the Attribute chart type is required. We click OK, and now we have our output. This output again shows the agreement for each part: 0% means the inspectors disagreed, and 100% means they agreed. This shows the agreement between the two inspectors, 86%. This is the summary, 86%. Here's our Kappa index over here, and we have the agreement within raters. This is somewhat redundant here because we're only looking at one binary comparison.

Then  further  on  down  here, we  can  do  the  agreement  by  categories.

We  can  actually  calculate  how  much is  the  agreement  by  the  fail,

individually,  or  by  the  pass  individually.

Okay,  so  that's  how  we  would do  it  for  a  simple  comparison.

But  what  if  we  now  consider that  the  actual  diagnosis

of  the  part  was  known  or  confirmed?

Let's go back into our PowerPoint presentation. I now introduce a standard. This gives us a measure of what we call effectiveness. We're going to assume the correct part classification was either known or subsequently confirmed, so this is the true, correct diagnosis that should have been made for each part. How accurate are the inspectors' choices? In other words, how well does each inspector match up with the true standard?

We  set  up  that  column  in  JMP, and  now  we  can  go  through  the  process

that  we  said  earlier of  looking  at  a  distribution.

For  example, if  we  add  in  the  distribution,

this  time  we  include  a  standard, now  we  can  click  on  pass,

and  we  can  see  the  agreements between   Inspector 1  and   Inspector 2

on  pass  classifications.

You can see both of them had some misclassifications, some wrong diagnoses.

We  can  click  on  fail  and  do  the  same thing  for  the  other  category.

Then  JMP  will  highlight those  in  the  data  table.

To  do  the  attribute  gauge analysis  in  JMP  using  the  standard,

all  we  have  to   do now  is  enter  standard into  the  dialog  box  as  we  did  before.

This  is  the  additional  column.

The  big  difference  now, and  this  is  not  highlighted  in  the  manual,

is  that  under  attribute  gauge, we  can  now  get  a  chart

that  applies  specifically to  the  effectiveness.

What  we're  going  to  do  is  unclick  these agreement  points  on  the  chart,

and  click  instead  the  effectiveness points  under  attribute  gauge.

When  we  do  that,  we  get  another chart  that  measures  the  effectiveness.

And  this  effectiveness has  three  ratings  for  it.

This  gauge  attribute  chart now  shows  the  percent  agreement,

0, 50 %,  or  100%  of  the  two inspectors  to  the  standard  for  each  part.

A 0% implies both inspectors misdiagnosed the part; seven instances of that occurred. A 50% signifies that one of the inspectors got the correct classification,

and  obviously, 100%  means  they  both  got  it  right.

Then  the  left  chart  shows the  overall  percent  agreement

to  the  standard  for  each  inspector.

We  noticed  that  there  was  some  slight difference  between  the  two  inspectors.

We  now  generate  the  effectiveness  report

that  incorporates the  pass/ fail  comparisons

to  the  standard  for  each  inspector.

You can see Inspector 1 got 42 of the fails correct and 43 of the passes correct, but he got 10 of the fails incorrect, calling them passes, and he got five of the passes incorrect, calling them fails.

I  find  this  notation a  little  bit  confusing.

I  put  it  down  at  the  bottom.

When  we  say  incorrect  fail,

that  means  a  fail  was incorrectly  classified  as  a  pass.

When  we  say  incorrect  pass, it  means  a  pass  was  incorrectly

classified  as  a  fail.

You  can  get  your  mind  going  in  crazy  ways

just  trying  to  always interpret  what's  in  there.

What I've done is create my own chart to simplify things. The misclassification summary over here shows that 17 actual fail parts were classified as pass, and 11 pass parts were classified as fail. That's in the JMP output. But what I show over here, and I'd love to see JMP include something similar as a clear explanation of what's going on, is a breakdown by inspector and standard. When the standard is pass, the correct classifications as pass are 43 for Inspector 1 and 42 for Inspector 2, and the parts misclassified as fail are five and six. When the standard is fail, the correct choices by Inspector 1 and Inspector 2 are 42 and 45, and of the fails, 10 were misclassified as pass by Inspector 1 and seven by Inspector 2.

Now, understand, a fail part classified as a pass is a defective part going out; that's called a miss. On the other hand, a pass part classified as a fail is basically producer's risk; that's a false alarm. JMP uses those terms, false alarm and miss, and I'll explain them later on.

I  like  this  chart  because  it  seems to  make  a  clear  explanation

of  what's  going  on.

Using Graph Builder, we can again view the classifications by each inspector, as shown over here, and again you can highlight specific issues there.

JMP  also  allows  you to  define  a  conformance.

In  other  words,  we  said  non- conforming is  a  fail  and  conforming  is  a  pass.

That  way  we  can  take  a  look at  the  rate  of  false  alarms  and  misses

in  the  data  itself  as determined  by  the  inspectors.

We can see that the probability of a false alarm for Inspector 1 was about 0.10 and for Inspector 2 was 0.125. The probability of a miss, which means we let a defective part go out, was higher for Inspector 1 than for Inspector 2. I'll show how these calculations are done.

To emphasize this, a false alarm occurs when a part is incorrectly classified as a fail when it is actually a pass. That's called a false positive. The false alarm rate is the number of parts that have been incorrectly judged to be fails, divided by the total number of parts that are actually passes according to the standard. That's how this calculation is done over here: if I go up here, here are the passes misclassified as fail, so if I take five out of 48, I end up with that number, 0.1042.

Now the next thing is a miss. A miss occurs when a part is incorrectly classified as a pass when it actually is a fail. That's a false negative; in this case, we're sending out a defective part. The miss rate is the number of parts that have been incorrectly judged to be passes, divided by the total number of parts that are actually fails, which is 10 out of 42 plus 10, or 0.1923. Again, going back to this table, these are the parts that are fails, but 10 of them were misclassified as a pass, so the number of parts that should have been classified as fail is 52. Ten divided by 52 gives you that number, 0.1923. So I like that table; it is easier to interpret.
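
To make the arithmetic concrete, here is a small Python sketch of the effectiveness, false alarm, and miss calculations for Inspector 1, using the counts from the misclassification table above (the variable names are mine, not JMP's).

```python
# Inspector 1 versus the standard (counts from the misclassification table)
correct_pass, pass_called_fail = 43, 5     # standard = Pass: 48 parts in total
correct_fail, fail_called_pass = 42, 10    # standard = Fail: 52 parts in total

effectiveness = (correct_pass + correct_fail) / 100                    # 0.85
p_false_alarm = pass_called_fail / (correct_pass + pass_called_fail)   # 5/48  = 0.1042
p_miss        = fail_called_pass / (correct_fail + fail_called_pass)   # 10/52 = 0.1923
print(effectiveness, round(p_false_alarm, 4), round(p_miss, 4))
```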

The  final  thing  about the  conformance  report

is  you  can  change your  conformance  category,

or you  can  switch  conform  to  non-conform.

You can also calculate an escape rate, which is the probability that a non-conforming part is produced and not detected. To do that, we have to provide some estimate of the probability of non-conformance to the JMP program. I put in 10%; let's say 10% of the time we produce a defective part. Given that we've produced a defective part, what's the probability that it's going to be a miss and escape? That's the escape rate, and it's the multiplication of the two: the probability that the process produces a fail part times the probability of a miss.
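
Continuing the sketch above, the escape rate for Inspector 1 under that assumed 10% defect rate works out as follows (again, illustrative Python rather than JMP output).

```python
# Escape rate: probability a non-conforming part is produced AND not detected
p_nonconforming = 0.10        # assumed rate at which the process makes defective parts
p_miss_insp1 = 10 / 52        # Inspector 1's miss rate from the previous sketch
escape_rate = p_nonconforming * p_miss_insp1
print(round(escape_rate, 4))  # about 0.019, i.e. roughly 2% of all parts would escape
```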

Now  let's  go  into  JMP  again

and  we're  going  to  use the  inspection  with  the  standard.

I'm quickly going to go through this. We do Analyze > Distribution again and put in the three columns over here. Now we can click on the standard down here, and then we can highlight and compare Inspector 1 and Inspector 2.

Another  way  to  visualize  it  is  to  use graph  builder  as  we've  done  before.

Then  we  can  put  Inspector 1  over  here.

Let  me  do  it  this  way.

And   Inspector 2  can  be  on  this  side  now.

Then  we  can  enter the  standard  over  here  on  this  side.

And  now  we  have  a  way  of  clicking and  seeing  what  the  categories

were  relative  to  the  standard.

That's a very nice little graph, and if you wanted to ask, "Okay, how many times did Inspector 1 classify a part as a pass when the standard was a fail?" we can now compare that against the standard, and the corresponding rows are highlighted too.

Let's go into JMP. Now we're going to go to Analyze > Quality and Process > Variability / Attribute Gauge Chart, recall the previous settings, and add in the standard over here. We cast the standard column. Now here's the issue.

JMP  gives  us  the  attribute  gauge  chart, but  this  was  for  the  agreement.

What  we'd  like  to  measure is  against  the  standard.

We  come  up  here on  the  attribute  gauge  chart

and  what we're  going  to  do  is  unclick anything  that  says  agreement.

And  click  on  anything that  says  effectiveness.

There  might  be  a  simpler way  to  do  this  eventually

in the  [inaudible 00:23:31] programming  in  JMP.

Now we have the effectiveness chart. Again, as I said, 50% means that one of the inspectors got it right, and 0% means they both got it wrong. We also have the agreement report showing the 86% that we've seen before.

But  what  we  want  to  get  down  to is  the  effectiveness  rating,

the  effectiveness  report.

And  now  we  see that  Inspector  1  was  85%  effective.

Inspector 2  was  87%  effective.

Overall  it  was  86%  effective.

Here's the summary of the misclassifications.

And  these  are  the  ones that  are  listed  over  here.

As I say, with this terminology you need to understand that "incorrect fails" were fails classified as passes, and "incorrect passes" were passes classified as fails.

Then  the  conformance  report  is  down  here, we  showed  you  how  to  do  the  calculation

and  then  we  can  change  the  conforming category  by  doing  that  over  here.

Or we can calculate the probability of escape, the escape rate, by putting in a number that estimates how often we'd expect to see a defective part. I'm putting in 0.1 over here and clicking OK.

And  then  JMP  gives  us  the  probability of  non-conformance  and  the  escape  rates

as  shown  over  here  now  for  each  inspector.

Let's go back now to my PowerPoint presentation. Now that we have a feeling for these concepts of agreement, effectiveness, and the Kappa index, let us see how we can apply the approach to a more complex problem in gauge analysis: inventory tracking. This came up as part of a consulting project with a robotics company, Vimaan Robotics. By the way, there are some wonderful videos, if you click over here, that show the drones flying in the warehouse, doing the readings, and some of the results from the analysis. As part of the consulting project, I was introduced to the problem of drones flying in a warehouse using optical character recognition (OCR) to read inventory labels on boxes and shelves.

In measurement system analysis (MSA),

the  purpose  is  to  determine  if the  variability  in  the  measurement  system

is  low  enough  to  accurately  detect

differences in  product- to- product  variability.

A  further  objective  is  to  verify that  the  measurement  system

is  accurate,  precise  and  stable.

In this study, the product to be measured via OCR on drones is the label on a container stored in racks in a warehouse. The measurement system must read the labels accurately. Furthermore, the measurement system will also validate the ability to detect, for example, empty bins, damaged items, counts of items in a given location, dimensions, and so forth, all being done by the drones.

In gauge R&R studies, one  concern  addresses  pure  error,

that  is  the  repeatability  of  repeated measurements  on  the  same  label.

Repeatability  is  a  measure  of  precision.

In  addition,  in  gauge R&R studies,

a  second  concern  is  the  bias  associated with  differences  in  the  tools,

that  is,  differences among  the  drones  reading  the  same  labels.

This  aspect  is  called  reproducibility, that's  a  measure  of  accuracy.

The  design  that  I  proposed

was a crossed study in which the same locations in the warehouse bins are measured multiple times (that's for repeatability) across different bias factors, the drones (that's for reproducibility).

The  proposal  will  define  several standards  for  the  drones  to  measure.

The  comparisons  will  involve  both within- drone  repeatability,

drone- to- drone  agreement  consistency,

and  drone- to- standard  accuracy.

The plan was to measure 50 locations, numbered 1 through 50. Three drones would be used to measure reproducibility (drone-to-drone comparisons), and there would be three passes over each location by each drone to measure repeatability.

Now,  multiple  responses  can  be  measured against  each  specific  standard.

So  we  don't  have  to  have  just one  item  and  a  standard.

We  can  have  different  characteristics.

The  reading  can  be  binary,  that  is, classified  as  either  correct  or  incorrect.

And  also  the  reading  can  provide  status

reporting for a location, like the number of units, any damaged units, and so forth.

Examples  of  different  responses

are  how  accurately can  the  drones  read  a  standard  label?

Are  there  any  missing  or  inverted  labels?

Are  the  inventory  items in  the  correct  location?

Is  the  quantity  of  boxes in  a  location  correct?

Are  any  of  the  boxes  damaged? This  would  be  something

that a human would be checking as part of inventory control, but now we're doing it all with drones.

Here's  the  proposal.

I have 50 locations over here, and 150 rows, because each location is being read three times by each drone. There are columns for drone A, drone B, and drone C, and these are the results of a comparison to the standard. We're classifying against five standards, A, B, C, D, and E, randomly arranged across the locations, with one standard characteristic specified for each of the 50 locations. Since we're doing three readings, it's 150 rows: three drones for reproducibility, three passes for each location by each drone for repeatability, and the standard specified for each location.

I'm  going  to  make  an  important  statement over  here  that  the  data  that  I'm  using

for  illustration  is  made- up  data and  not  actual  experimental

results  from  the  company.
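
For readers who want to play with a comparable table, here is an illustrative Python sketch that builds a crossed design of this shape: 50 locations by 3 passes, with three drone columns and a known standard. The 5% read-error rate and the file-free construction are my own assumptions, mirroring the made-up nature of the demonstration data.

```python
import random
import pandas as pd

# Illustrative crossed MSA design: 50 locations x 3 passes, read by 3 drones,
# with a known standard (A-E) at each location. All values are made up.
random.seed(2022)
standards = {loc: random.choice("ABCDE") for loc in range(1, 51)}

rows = []
for loc in range(1, 51):
    for rep in range(1, 4):                       # 3 passes per location -> 150 rows
        reads = {drone: (standards[loc] if random.random() > 0.05   # assumed 5% read error
                         else random.choice("ABCDE"))
                 for drone in ("Drone A", "Drone B", "Drone C")}
        rows.append({"Location": loc, "Pass": rep, **reads, "Standard": standards[loc]})

design = pd.DataFrame(rows)
print(design.head())
```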

We  can  start  off  with distributions  and  dynamic  linking.

We  can  now  compare  the  classification of  the  drones  by  standard.

We  generate  the  distributions and  then  we  click  on  say,  standard  A,

and we can see how many drones got standard A right, or whether any drones misdiagnosed it. Similarly, if we click on standard E, we can see that drone A had a higher propensity for misclassifying standard E, and the same is true of drone C.

Now  the  chart  below  shows  how  well

the  drones  agreed  with  each other  for  each  location.

Here are the 50 locations, and we're looking at how the drones compare. Now, when you're comparing a drone to other drones, you've got a lot of comparisons. You're comparing each drone to itself across its three passes, and you're comparing drone one's readings 1, 2, 3 to drone two's readings 1, 2, 3, and so on for each of the measurements. You're comparing all possible combinations of those readings. That's why the calculations get a little bit complex when you have multiple drones in there. This shows the agreement among all the comparisons.
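
One plausible way to compute that per-location agreement is as the fraction of matching pairs among all pairs of readings for the location (here, 9 readings give 36 pairs). Treat this as my reading of the pairwise-matching convention rather than a statement of JMP's exact internal formula.

```python
from itertools import combinations

def percent_agreement(readings):
    """Fraction of all pairs of readings (across drones and passes) that match."""
    pairs = list(combinations(readings, 2))      # 9 readings -> 36 pairs
    return sum(a == b for a, b in pairs) / len(pairs)

# Example: a location where eight readings say "A" and one drone pass says "E"
print(percent_agreement(["A"] * 8 + ["E"]))      # 28/36, about 0.78
```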

Now we noticed that for the locations between five and 10, the accuracy dropped quite significantly, and that prompted further investigation as to why.

It  could  have  been  the  lighting, it  could  have  been  the  location,

it  could  have  been  something  else  that  was interfering  with  the  proper  diagnosis.

You  see  most  of  the  drones are  reading  accurately  100%.

This  is  an  agreement between  the  drones,

so they were agreeing roughly 90 to 91% of the time.

And  these  are  the  confidence intervals  for  each  drone.

So  this  told  us  how  well  the  drones were  comparing  to  each  other.

Now we have the agreement comparisons. The tables show the agreement values comparing pairs of drones, and drones to the standard. The Kappa index is given against the standard, and repeatability within drones and reproducibility between drones are all excellent based on the Kappa statistics; agreement across the categories is also excellent. So we're comparing here drone A to drone B, drone A to drone C, and drone B to drone C, all showing excellent agreement. We're comparing here the drones to the standard, all in excellent agreement. Then this is basically a summary of the agreement, and this is the agreement by the different categories.

Now  again,  we  can  look  at  the  attribute chart  for  effectiveness.

In the same way, we uncheck all the agreement check boxes and then check the effectiveness boxes.

We see again over here that locations seven and eight had the lowest agreement to the standard.

Again,  that  could  have  been  something associated  with  the  lighting.

It  could  have  been  something  associated with  some  other  issue  there.

Then  the  overall  agreement to  the  standard  by  drone,

you  can  see  they're  about  95%.

The drones are pretty accurate, they were pretty reproducible, and the repeatability was excellent.

This  is  the  effectiveness  report. Now  this  is  a  little  bit  more  elaborate

because  now  we're  comparing  it  for  each of  the  five  characteristic  standards

and  these  are  the  incorrect  choices that  were  made  for  each  one.

Out  of  150  possible  measurements,

drone  A  measured  142  correctly, drone  B  145  and  drone  C  140.

So  effectiveness  is  the  important  one.

How  accurate  were  the  drones?

We can see that the drones are all running at an average of about 95%.

This  appears  to  be  highly  effective.

Then we have a detailed analysis by level, provided in the misclassification report. So we can see individually how each of the different characterizations was measured correctly or incorrectly by each drone. These are the misclassifications.

Again, let me go into JMP. Oh, one further example I meant to show over here: using Graph Builder, we can view the classifications and misclassifications by each drone.

This  is  a  really  neat  way  of  showing  it. I  wish  JMP  would  include  this  possibly

as  part  of  the  output,  but  you  can  see where  the  misclassifications  occur.

For example, for drone A and drone C, most of the labels were classified correctly, but there are a few that were not; these show the misclassifications. I like that kind of representation in Graph Builder.

Now let's go back into JMP, and we're going to do attribute gauge analysis with multiple raters on the experiment data. Okay, so we can run Analyze > Distribution to do this.

We  can  compare  the  drones  to  the  standard.

Again,  we  can  just  click  on  a  standard  and see  how  it  compares  across  the  drones.

We can also use Graph Builder. We can put drone A, then drone B, and then drone C over here.

And  then  we  can  put  the  standard  in  there

and  it  shows  very  clearly what's  happening  with  that.

But we can also go into JMP and use Quality and Process > Variability / Attribute Gauge Chart. We add the three drones in here, we add the standard, and we put in the location, and we get our attribute gauge chart report showing that the drones' agreement with each other is at about 90%. This region has the most difficult locations to characterize.

Here  are  the  agreement reports  that  I've  shown  you.

Drone  A,  Drone  B  and  Drone  C  agreement with  the  other  drones  and  with  itself  too.

Drone  A  to  drone  B, these  are  the  Kappa  values.

This  is  the  measurement to  the  standard,  all  very  high.

And  then  these  are  the  agreement across  categories.

And  then  for  the  effectiveness,

to  get  that  graph  that  we  like  to  see for  the  effectiveness  report,

we take out the agreement options over here and click on the effectiveness options instead. We now have the effectiveness plot at the top that shows us how the drones agreed with the standard.

We now go back into the PowerPoint presentation over here. Okay, to summarize what we've done,

the  use  of  attribute  gauge  analysis allowed  the  company  to  provide  solid  data

on  the  agreement  and  effectiveness of  drones  for  inventory  management.

The results are very impressive. Subsequent results reported on the company's website show inventory counts to be 35% faster, inventory costs reduced by 40%, and missed shipments and damage claims reduced by 50% compared to the previous methods.

In  addition,  the  system generates  what  we  call  actionable  data

for  more  accurate,  effective,  safer, more  cost  effective,

and  faster  inventory  control.

Some excellent references are listed over here: Cohen's original paper; the book by Fleiss, which is excellent and has a lot of detail; and the book by Le, which is also well done.

I  thank  you  very  much  for  listening. Have  a  good  day.