
A Method To Strategically Pre-process Data From Industrial Processes Before Storage and Analysis (2023-EU-30MP-1238)

Günes Pekmezci, Senior Expert for Industrial Internet of Things, Bundesdruckerei
Luís Fernando Ferreira Furtado, Senior Expert in Industrial Controllers and Robotics, Bundesdruckerei
Michael List, Manufacturing Engineer, Bundesdruckerei


Despite the development of new network and media technologies, intense use of bandwidth and data storage can be a limiting factor in industrial applications. When recording sensor signals from multiple machines, a question must always be asked: which meaningful information can be extracted from the data, and what should be saved for later analysis? The answer to this question is a method proposed and implemented by the Production Data Engineering Team at Bundesdruckerei GmbH in Berlin, a wholly owned subsidiary of the German federal government that produces security documents and digital solutions. The method focuses on pre-processing data directly in the machine controller, strategically reducing the amount of data so that only the meaningful information is sent over OPC UA, stored in the database, and further analyzed in JMP. A case study is presented, describing the implementation of this method on torque and position data from a servomotor used in a cutting process. The JMP Scripting Language is used to automatically generate reports on cutting tool wear, which is also analyzed in combination with quality data from the product. These reports allow the production engineers to understand the machines better and to plan tool changes strategically.



Hi, I'm Günes Pekmezci, and this is my colleague, Luís Furtado. We both work at Bundesdruckerei as engineers on the data team in the production department. Today, we would like to present a method to strategically process data from industrial processes before analysis and storage.

First of all, I would like to tell you a little bit more about our company. Bundesdruckerei is a government-owned company that produces security documents and digital solutions. We are getting bigger every day. Right now, we have 3,500 employees, and we continue to grow. These figures are from 2021.

In that year, we had sales of €774 million, and we hold over 4,200 patents. Most of our profit comes from German ID systems, which I will talk about a little bit more in the next slides. Secure digitization solutions are another big profit driver for us.

If we look at our target markets and customers, we see, like I said, official ID documents first. This means that we physically and digitally produce official identity documents such as ID cards, passports, and residence permits, and this is our biggest market.

We also produce security documents, which means bank notes, postage stamps, tax stamps, and related security features for the government. On top of that, we have a growing department for eGovernment. Here we create solutions for the authorities, mostly German state authorities, to digitalize their public administration systems.

We also have high-security solutions. In this department, we create solutions with higher security requirements for security authorities and organizations. Another target market for us is the health industry, where we create products and systems for a secure and trusted digitalized health system.

Other than that, we are also active in the finance field. Here we create products and systems to control and secure financial transactions, in both the public and enterprise sectors, which includes taxes, banks, insurance, et cetera.

Coming to what we want to share with you today: a use case that we decided to implement for predictive maintenance. Like every other company, our aim was to create use cases for the new digital era. We thought about what we could analyze with big data, predictive maintenance, and things like that.

We decided to start with our most important document, the German passport. This document is very complex, and it has a lifetime of 10 years. We also have a high production rate here, so we decided to create a predictive maintenance use case for one process in the making of this document.

Our process is the punching process. It was a good process for us because we have a good understanding of it, and, which is very important in the Industrial Internet of Things, we had access to the data that we could analyze to create our information.

Our objective for this use case was to achieve better product quality by doing predictive maintenance on the tool wear state. Instead of reacting once the tool is worn out, we decided to look at the data and create information that allows us to plan our tool change time.

With this, we can minimize our downtime and our scrap rates. We could also apply this use case to different machines and use it to track the long-term behavior of the process. It was a really good use case for us to start with.

I will hand over to Luís to explain further how we approached this use case, what exactly we did, what our challenges were, and how we found solutions for them.

Thank you. I'm going to present a bit more about our product and process. The product that we are analyzing in this study is the passport.

The passport, when you think about it, is a book. It's like a sandwich full of pages, and those pages have a lot of security features: the picture that is printed, the data that is lasered, the chip, the antenna for the chip, and holography layers. There are several security features inside the German passport.

When you make the sandwich, there are also a lot of machines that bring all those features to the product. Once the sandwich is made, you need to cut it to the right size according to the norm. When you cut it, we separate the finished book from the borders that we don't need anymore.

The point is that for this cutting process, we use a punching machine. The tool installed at the end of this punching machine wears with time, and the quality of the cut at the end is not as good as it was in the beginning. What we are trying to do in this project is determine the perfect time to change the tool.

Here is a picture of the end product, the passport, and the borders that were cut off. I'm going to present the details with a sketch of the machine, how it works, and what the original idea was.

But first, we have our original architecture for the data acquisition. We have a machine with several sensors: sensor number 1, 2, up to as many sensors as we need for measurement. We bring all the sensor signals to the machine PLC, which is the controller of the machine, then we mirror this data to the master computer, and we mirror it again to the database.

That was our first, original implementation. The database holds a lot of data, and then we start to analyze it and try to understand what is happening in the machine, in this case, what is happening with the punching tool that is cutting the passport.

When you look at the sketch of this machine, we have a servomotor that turns a wheel. Through a mechanical linkage, we can move the punching tool up and down. At the end, we have the cutting tool, which has exactly the final shape that we need.

This tool wears with time. It's not that sharp anymore, and then we start to get poor quality in the product we are producing. Then you need to change the tool to make it sharp again.

Good. How can you be sure that this tool is still good enough to cut? We measure the position of the servomotor, and we also measure the torque of the servomotor, and we bring all the position and torque data to the controller. Then, as I presented in the previous slide, we mirror the data to the master computer and then to the database.

In an industrial controller, the curve is not continuous like here; it's discrete. You have to think of it as one measurement every CPU cycle, at every clock tick of the CPU. In this case, we get all this data, it is transferred to the master computer, and then we do the analysis from the database.

But the point is, we realized that over OPC UA, not 100% of the data arrives. This is the scenario where everything is fine: we have all the points in the server and in the database. But sometimes we have missing areas, gaps where the data is not coming. We realized that we get only 95% of the data; 5% of the data is lost at the CPU cycle rate we were sampling with.

Well, this loss could happen at a point that doesn't matter for the measurement, but it could also be exactly the point of the peak. When you miss data there, we compromise our measurement of the tool.
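The problem described above can be illustrated with a tiny sketch (the torque values are invented for illustration; the real signal comes from the servomotor): if the one sample that holds the peak is among the ~5% lost in transmission, the peak stored in the database is far too low.

```python
# Hypothetical torque samples for one cut cycle (not real machine data).
torque = [0.1, 0.3, 0.8, 2.4, 9.7, 2.1, 0.6, 0.2]
peak_index = torque.index(max(torque))  # the sample holding the true peak

# Simulate the unlucky case: exactly that sample is lost on its way
# to the database (part of the ~5% transmission loss).
received = [t for i, t in enumerate(torque) if i != peak_index]

print(max(torque))    # true peak at the machine: 9.7
print(max(received))  # peak seen in the database: 2.4 -> wear is underestimated
```

This is why a loss of only 5% can still invalidate the analysis: the peak is a single sample, so losing it is as bad as losing the whole curve.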

And even with a data loss of 5%, the 95% of the data that does reach storage, for all the sensors in a machine and for all the machines in the production process, is still a lot of data. After a year, you have a huge amount of database storage, and this is something you want to reduce.

So with this original implementation, we still had missing data, often exactly at the points we needed to measure. That left us with open questions. The first: is it possible to measure this tool wear in a reliable way using the motor torque? The other: how do we reduce the amount of data that we send to the database?

Good. The first idea we had: we decided, "Okay, we won't take the data from the database. We're going to collect the data directly on the machine with a different method, so that we don't lose any data. 100% of the data reaches the computer because we are measuring directly in the machine controller."

Let's do this experiment many times for different settings of the machine. Let's see if the curve always has the same form and if it changes a bit in amplitude when we change the scenario. In the end, we had four scenarios, we ran this test extensively on the machine, and this is the result of the experiment.

We tried an old and worn tool, so a tool that was not that sharp anymore, with a passport with 32 pages. We have two products: a passport with 32 pages and a passport with 48 pages.

The customer can choose: "Okay, if you're going to travel a lot, then order 48 pages." We tried the old and worn tool with 32 pages and with 48 pages. Then we changed the tool for a new one and repeated the experiment with the new and sharp tool for the 32-page and the 48-page product.

This is the result. We realized that all the curves have the same shape; this is a superposition of many curves that we recorded, and the variation is very small. But we can also see very clearly that the peak value for the old tool with 48 pages is a bit higher than for the old tool with 32 pages. Also, for the new tool, the peak value is lower because less force is needed to cut.

This is the torque in the motor. When less force is needed to cut because the tool is sharp, you get an even lower torque.

Good. With this, we got some information. All the scenarios present the same curve shape, and we realized, "Okay, then I don't need to record the whole curve. I can record only the peak." This is what is interesting for us in the new implementation that we are proposing here.

The peak value can be used for two different things. First, for tool wear monitoring, which was the original idea. Second, for product classification, which is also important for us: from the peak, you can check in a safe way whether the product being produced has 32 or 48 pages.
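Both uses of the peak value can be sketched with simple thresholds. The threshold numbers below are pure assumptions for illustration; the real boundaries would come from the experiment curves shown in the talk.

```python
# Hypothetical thresholds (invented for illustration, in arbitrary torque units):
PAGE_THRESHOLD = 6.0  # assumed boundary between 32-page and 48-page peak torques
WEAR_LIMIT = 9.0      # assumed peak level at which a tool change should be planned

def classify_product(peak_torque):
    """Infer the page count of the product from the cut's peak torque."""
    return 48 if peak_torque >= PAGE_THRESHOLD else 32

def tool_change_due(peak_torque):
    """Flag a cut whose peak suggests the tool is getting worn out."""
    return peak_torque >= WEAR_LIMIT

print(classify_product(4.8), tool_change_due(4.8))  # 32 False
print(classify_product(9.3), tool_change_due(9.3))  # 48 True
```

In practice the wear decision is made from the weekly trend of peaks rather than a single cut, but the idea is the same: one scalar per cut carries both pieces of information.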

Good. So what is the difference? The difference is the implementation directly in the controller of the machine. The whole sketch of the machine is the same, and we get the data into the controller in the same way.

But what we do differently here is that we preprocess the data: we filter, and we define a window. In this window, we search for the peak. When we find it, we take the peak of the torque and the motor position at which this peak happened. Then we transfer just one set of values, not the whole curve.
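The windowed peak extraction described above can be sketched as follows. This is an illustration in Python, not the authors' actual PLC code, and the function and sample names are assumptions:

```python
def extract_peak(samples, window_start, window_end):
    """samples: list of (position, torque) pairs for one machine cycle.

    Keep only the samples whose motor position falls inside the cut window,
    then return a single record: the torque peak and the position at which
    it occurred. One record replaces the whole curve.
    """
    in_window = [(pos, tq) for pos, tq in samples
                 if window_start <= pos <= window_end]
    pos_at_peak, peak_torque = max(in_window, key=lambda pt: pt[1])
    return peak_torque, pos_at_peak

# One invented cycle of (position, torque) samples:
cycle = [(10, 0.2), (20, 0.5), (30, 3.1), (35, 8.7), (40, 2.0), (50, 0.3)]
print(extract_peak(cycle, 25, 45))  # (8.7, 35): one pair instead of six samples
```

Because the search runs in the controller at every CPU cycle, no sample can be lost on the network before the peak is found; only the final (peak, position) pair is ever transmitted over OPC UA.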

How does that work out in the end? With the original implementation, per sensor in a machine, we stored 11.7 gigabytes every year. That is quite a lot. When you consider that we have several hundred, almost a thousand, sensors in a machine, and we have more machines in our production area, this is very critical for us.

With the proposed implementation, everything is very similar. The sensors go to the machine, but inside the machine we do the preprocessing. We filter out just the meaningful information that we need, then we transfer less data to the master computer and less data to the database. We do our analysis with this smaller amount of data, but it is the meaningful part.

In this case, we reduced the data by more than a thousand times. Now it's 8 megabytes per year per sensor. This is a good implementation.
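A quick back-of-the-envelope check of that reduction, using the two figures from the talk (and assuming binary GB/MB units):

```python
# Figures from the talk, per sensor per year:
per_sensor_before = 11.7 * 1024**3  # ~11.7 GB with the original implementation
per_sensor_after = 8 * 1024**2      # ~8 MB with controller-side peak extraction

reduction = per_sensor_before / per_sensor_after
print(round(reduction))  # 1498, i.e. roughly 1,500 times less data
```

Scaled to hundreds of sensors per machine and several machines, the original scheme would accumulate terabytes per year, while the peak-only scheme stays in the low gigabytes.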

This was implemented in JMP and JMP Live. I'm going to hand back to Günes, so she can explain the next steps and what we did afterwards.

Thank you, Luís. How did we generate information in JMP with this analysis? Like everyone else, we started by analyzing our data in JMP first, and it was easy to analyze even our huge data sets, like 20 million records, in JMP. But when we decided to keep just the peak values, we were able to make our reports much lighter and more informative. It worked so well that we decided to send our results to JMP Live.

Right now in JMP Live, we have the following reports, generated automatically every week. There is a weekly meeting for the machine colleagues, and they look at this report to decide when it is time to change the tool.

Here you can see the different machines; we have six machines of this kind. You can see the peak value of the torque and its development through the weeks.

You can also see that after a tool change in machines 1 and 2, the peak values the next week automatically start again from a lower level, and Luís already explained why that happens.

This is the JMP Live report with which we plan the change time for the tool. Coming to the method that we are proposing: I want to tell you again how we approached this use case. We started, like every other use case, by defining our project requirements. Then we took all the data, like many other companies in the Industrial Internet of Things try to do.

We said, "Okay, we need all the data." We tried to take all the signals from the machine and analyzed them somewhere else. Then we looked at the data and asked, "Okay, is the quality of this information good enough? Does it meet our project requirements?"

It did not meet our project requirements because of the missing data. With the missing data, we weren't able to see the right values to extract the relevant information. So we said, "Okay, let's go to the machine and understand the process a little better. Why is this happening? What can we do about it?"

Then we started doing the experiments that Luís explained directly on the machine, and we collected the data locally. We came back to our analysis process and said, "Yes, now the data is good, the quality is good."

Then we also asked, "Okay, is all of this data relevant? Is there a way to reduce the storage without reducing the data quality?" So we decided to implement this preprocessing algorithm directly on the machine to reduce the size of the data.

What we suggest for you, too, is that when you start a use case for production processes, after defining your project requirements, it is better to go directly to the machine, start doing experiments there, and collect the data locally. When you do this step first, you will save yourself a lot of the time needed to build the architecture for getting all this data somewhere else.

You will also save yourself a lot of money, because you may not need that much space on your servers, et cetera. If you start directly here, you can go through all the other steps, you will end up with a use case that works well, and you will need less time for it.

If we summarize our lessons learned and the benefits of the use case, we can definitely say that an application-oriented approach is very good for implementing production use cases. You really need a deep understanding of the process and the machine for Industrial Internet of Things use cases.

It will definitely be better if you create a team of engineers, the people who work at the machines, and the data people together, because you need a really deep understanding of what is happening and what exactly you need in order to get a benefit out of it.

Our personal benefit from this specific use case was a method that we can use for other machines and processes, which we are also sharing with you today, hoping that you can use it for your processes too. Indeed, we were able to reuse this method on other machines and the punching processes they have.

We also gained really good knowledge about our tool wear state by the end of this use case. We could reduce our downtime, because instead of waiting for a tool to be worn out, we were able to plan the downtime. That automatically means we also decreased our costs.

On top of that, we were able to use this method and analysis to track the long-term behavior of our tools, which is also great, because in the end we had a true predictive maintenance use case. As the cherry on top, we were able to reduce our data storage needs significantly.

In today's world, where we talk so much about energy, it's very important to keep just the relevant data on our servers, because it's more sustainable and more energy efficient. We were really happy with our results, and we hope that you get some inspiration from our method, and maybe you'll be able to use it yourselves.

Thank you for your attention, and that was our method. Have a nice day.
