Choose Language Hide Translation Bar

Digitalizing the Data Dictionary and Other Report Contextual Information - (2023-US-30MP-1465)

Have you gone back to your analysis, only to realize you've forgotten how you defined a metric or the general context of your data? Have you had stakeholders ask for this information or be confused on how to interpret a metric?

 

If so, why not consider digitalizing this information so that it's easy to convey among report periods or across common reports? This method helps provide report consistency, making it easier for stakeholders to gain the information they need when interpreting your analysis. It also saves you time, because this information is now in a centralized place, automating it as part of the analysis.  

 

JMP data tables can be used to build data dictionaries and store other report facts that can be easily populated in JMP reports. This presentation shows how you can build this capability, whether you are a basic JMP user or a seasoned JSL scripter. Make metric definitions part of your JMP analytic workflow, rather than an afterthought!

 

 

Have  you  gone  back  to  your  analysis  and tried  to  figure  out  what  you  were  doing?

 

Maybe  the  analysis  was  done  six  months  ago

and  you're  trying  to  go through  your  analysis

and  trying  to  figure out  what  was  this  metric  again?

How  did  I  define  it, what  were  the  assumptions  behind  it?

Now  think  about  your  audience.

If  they're  looking  at  something  that  you

don't  necessarily  understand  or  they  may have  want  a  little  bit  more  nuance  about

what  the  definition of  a  metric  is  in  the  analysis.

Well,  the  great  thing  is  we  can  do  that.

There  are  things  called  data dictionaries  that  we  can  be  utilizing.

What  is  a  data  dictionary?

A  data  dictionary  is  really  a  catalog

that  communicates  the  structure and  the  content  of  the  data.

More  importantly,  what  we're  going  to  be talking  about  today  is  really  meaningful

descriptions  around   what the  metrics  are  so  that

you  and  your  audience  can  understand  what  is  contained  in  that  analysis.

The  purpose  of  this  is  really  to  be able  to  over  communicate  so  that

everyone's  on  the  same  page  of  what the  variables  or  metrics  are

and  how  they  were  calculated

or  any  other  important  information  to  help either  your  audience  or  your  future  self

understand  and  document what  those  metrics  are.

We're  also  going  to  be  showcasing  how to  be  able  to  centralize  these  definitions

in  one  place  and  be  able to  digitalize  this  content.

What  I  mean  by  digitalized  is  just like  you  take  data  from  raw  form

and  being  able  to  present  that into  a  graph  or  an  analysis

for  people  to  consume,

we're  going  to  do  the  same thing  with  a  data  dictionary.

We're  going  to  be  able to  pull  the  data  dictionary

to  be  able  to  be  tied  into  your analysis  so  that  your  audience  or  yourself

do  not  have  to  go  find where  that  information  is.

It's  all  tied  together for  easy  user  experience.

Okay,  so  today  we're  going  to  talk  about I  have  three  options  here  for  folks.

The  first  solution  is  just really  a  simple  no  code.

If  you're  one  of  the  people  who  love the  GUI  around  JMP

and  you  do  not  want  to  have  anything with  code,  this  is  your  solution.

Let  me  walk  you  through  a  quick  way of  being  able  to  do  this

so  that  you  can  use   the  data  dictionary  on  concept.

The  second  part  is  if  you're  into  JSL and  you  have  a  very  simple  data  dictionary

that  you  want  to  have and  be  able  to  tie  on,

I'll  show  you  how  to  do it  with  a  little  dose  of  JSL.

Now,  if  you're  a  supercharger   and  really  want  to  see  JSL

in  all  its  glory  and  being  able to  essentially  be  able  to

have  a  way  of  being  able to  categorize  your  work

along  with  defining the  metrics  as  you  go  along

and  then  be  able  to  centralize  that,

really  be  able to  digitalize  the  data  dictionary

and  then  be  able  to  pull  out exactly  what  you  need  for  a  given  report.

We'll  do  this  with   the  supercharged  full  out  JSL.

Then  we'll  talk  about a  call  to  action  at  the  end.

Let's  talk  quickly.

Let's  just  JMP  in  here  and   talk about  the  simple  no  code  option.

First  we're  just  going  to  be  using a  JMP  data  table,

and  then  we're  going  to  embed the  definitions  into  that  data  table.

Let  me  open  up  the  data and  the  definition  table.

Here's  what  a  data  dictionary, a  very  simple  data  dictionary  looks  like.

You  have  the  variables and  a  description,

which  each  of  these  variables  are columns  here  in  the  data  table.

You  have  the  variable  or  the  column name  along  with  the  meaning.

I  also  added  the  references  over  here.

We  can  come  over  here  and   just plot  our  data  and  we  can  see   what

the  various  different metrics  look  like  in  here.

But  maybe  your  audience  comes  here and  goes,  what  does  abundance  mean?

Or  how  are  we  defining  richness? Or  what's  the  Shannon  Metric?

Well,  they're  going  to  have  to  go and  figure  out  what  that  actually  means.

What  we  can  do  here  is  we  can  tie  this data  table  to  the  data  dictionary  table,

to  the  data  table,   and  we  can  do  this  really  quickly.

If  you  go  over  to  this  triangle  over  here,  click  on  the  red,

go  down  to  copy  data  table.

We  can  come  over  here, do  the  same  thing  over  in  the  data  table.

But  this  time we're  going  to  hit  new  script

and  we're  going to  type  in  definition.

We're  going  to  then  paste.

I  have  command  v  or  control  v  on  here.

You'll  see  some  code.

You  do  not  need  to  understand what  that  actually  means.

You're  going  to  say  save   and  then  you're  going  to  say  okay.

You're  going  to  see that  you  have  the  definition.

If  your  audience  or  your  future  self

wants  to  come  in  here  and  say,   hey,  here's  my  plot,  that's  great,

I  see  my  data,  but  I  need to  know  what  the  definition?

They  just  need  to  hit  the  plot  and  voila.

You  have   well  formulated  your  definitions.

Okay,  very  simple.

Hopefully  if  somebody  has a  data  table,  come  up  with  some  metrics,

use  another  JMP  data  table  for  it   and  just  attach  it

to  your  data  table   and  you  have  a  great  reference.

Okay,  so  that  is  the  simple  notebook.

Now  let's  go  to  simple with  a  little  dose  of  JSL.

All  right,  so  here   we're  going  to  publish  the  analysis

along  with  the  definition  table   to  a  window,

and  then  we're  going  to  export the  analysis  to  a  PowerPoint  deck.

What  I'm  going  to  do   is  I'm  going  to  run  the  script

and  then  I'll  walk  you  through how  I  did  the  script.

The  first  thing  is  let's  get  this.

This  is  what  the  outcome of  the  script  does.

You  see  the  same  analysis  up  here, but  what  you  see  down  here  is

we  have  a  definition  table,  and  you can  just  open  up  the  definition  table,

and  then  you  can  see   what the  different  rows  are

or  the  definitions of  the  different  meanings  of  the  data

down  here  along  with  the  reference

if  people  wanted  to  go and  know  where  to  go

to  look  for  more  of  that  information,

How  did  we  actually  do  this?

Essentially  the  first  thing  we  did  was basically  opened  up  the  data  tables  here

and  then  the  next  thing  we  did  was  we  created  a  window.

This  is  basically  this  window  right  here

and  then  an  outline  box  with  the  analysis.

This  is  just  a  simple  graph  builder,  nothing  fancy.

Then  down  here  we  did  another outline  box  here  where  I  had  it  closed.

But  we're  going  to  need  to  keep that  open  for  right  now.

Then  simply  just  did  a  data  table  box

that  just  basically pasted  that  into  the  window.

Okay,  now  let's  say  that  this  is  great,

but  we  want  to  export this  to  a  PowerPoint.

We  can  very  well  easily  do  that  by  making sure  you  have  the  definition  window  open.

Don't  have  it  closed  or  it  won't  export to  the  PowerPoint.

Make  sure  it's  open  and  then just  say  save  to  PowerPoint.

Voila. There  you  go.

It  exports  the  data  to  the  PowerPoint.

Now  there's  a  little  bit  of  things that  you're  going  to  need  to  do.

You  can  make  this  bigger, you  can  make  this  change  the  color  on  it.

The  titles  up  here,  make  that  white  just so  it  makes  it  stand  out  or  change  it.

Then  also  here's the  data  dictionary  as  well.

All  you  need  to  do  is this  is  all  edible  so  that  you  can

actually  make  this a  little  bit  more  user  friendly.

I  find  that  12  font  makes  it  better,

but  you  can  have  the  definitions  in  the  backup  of  your  slides

so  that  people  can  come  back   and  be  able  to  reference  it.

It  doesn't  have  to  be  part  of  the  meet  of  the  presentation.

It  could  just  be  in  the  backup  slides.

Makes  it  really  easy  to  be  able to  export  all  this  information

to  PowerPoint  and  just  have it  tied  with  your  analysis.

All  in  really  essentially a  few  lines  of  code.

Basically  all  I  did  to  export  this  was I  created  the  button  save  the  PowerPoint.

Just  basically  save  the  graph and  then  save

on  one  page,  on  page  two

and  then  on  page  three  I  had  the  data  dictionary  part.

This  is  the  object  for  the  data  dictionary and  this  is  the  object  for  the  graph.

Pretty  straightforward to  be  able  to  do  that.

We're  going  to  close  all  these  slides files  and  we  can  close  this  as  well.

Now  there  is  a  drawback with  the  data  table  box.

I'll  get  into  a  little  bit  of  that in  my  next  example  on  that.

In  the  supercharger,

I'm going to talk about the drawbacks on the data table function itself,

but  I'm  also  going  to  showcase how  to  capture  dynamic  metric  definitions

as  you  do  a  report  and  then also  giving  you  a  little  bit  more  detail,

how  to  get  more  details  around  that.

Then  I'm  going  to  showcase  how to  do  a  dynamic  definition  box  around.

This  is  going  to  take a  little  bit  more  coding

than  just  specifying that  data  table  box.

Let's  talk  about   what  the  drawback  is

and  let's  take  this data  dictionary  from  the  World  Bank.

You  can  see  that  they  have  some  really long  definitions  on  here

and  we  have  some  more  information  around these  various  two  metrics.

But  if  I  just  do  a  data  table  box, what  you'll  see  is

it  doesn't  take  any  of  the  cell  height, width  or  set  wrap  on  here.

This  is  something  actually  I  had  to  go  back  and  to  JMP  support

and  they  confirmed  that using  the  data  table  box,

it  doesn't  capture  these  aspects.

This  isn't  ideal  on  here.

Now,  I  did  put  a  request in  the  JMP  wish list

that  they  would  take  this  so  that  we wouldn't  necessarily  have  to  do  the  coding

that  I  will  be  showing  you in  a  few  more  minutes.

This  isn't  necessarily  ideal,

but  basically  this  data  table  box, there's  some  drawbacks  on  that.

The  way  that  I  got  around  to  it, just  to   give  you  an  idea  of  just

using  the  table  is  I  literally,

let  me  just  show  you   this.

This  was  my  attempt on  here,  recreating  it.

This  is  the   the  desired  outcome and  it  took  a  lot  of  coding  on  here.

Just  to   give  you  an  idea, I  mean,  you  could  try  to  do  this,

but  this  isn't  really  dynamic.

This  is  just  basically you  would  have  to  explicitly,

you're  basically  coding  each  one  of  these  cells,

the  titles  and  then each  one  of  the  text  boxes  down  here.

For  example,  I  took  this  indicator  name from  the  first  row  and  put  it  here,

long  definition  from  the  first  row  of the  data  table  to  here,  blah,  blah,  blah,

and  just  gone  on  and  on  and  on  and  on.

This  isn't  ideal  at  all.

It  requires  a  lot  of  explicit  coding.

Also  the  other  thing  I  noticed  is

you  can't  do  for  loops  within the  display  boxes  at  all.

I  went  back  to  JMP  support  and  said

is  there  anything  that's  just  not documented  of  how  to  get  the  display  boxes

or  the  cell  heights  from  the  data tables  translated  back  over?

They  actually  gave  me  a  solution

and  I'll  show  you   how  we  go  through  that in  a  few  minutes.

But  because  basically  the  data  table  box does  not  allow  you  to  set  the  cell  height,

the  cell  width  or  the  wrap.

Let's  look  at   how if  we  did  want  to  do  something

that  is  a  little  bit  more  dynamic

and  building  out  a  data  dictionary from  your  analysis.

Here,  what  I've  done  here

is  really   as  I  was  going  through building  metrics,

I  define   the  different  aspects that  I  would  want  in  my  data  dictionary.

Here  I  have  initial  metric  for  the  name of  metric,  the  definition,  the  source,

the  frequency  of  the  metric, and  the  specific.

I  will  be  capturing  that  every  time I  go  through,  I  create  a  new  metric.

Let's  say  I  work  at  the  World  Bank,

I'm  creating  these  metrics and  this  data and so forth.

Imagine  I  have  a  whole  bunch  of  code  that basically  develops  a  metric  and  so  forth,

takes  that  data,  does  stuff,

I  create  this  GDP  growth, annual  percentage  type  of  thing.

This  is  essentially  how  I  basically would  have  captured  the  information.

I  just  take  the  information for  the  definition  name.

I  would  put  the  definition  up  here

in  a  separate  line  and  then  carry  that  as an  object  in,  and  just  do  an  insert  into

and  then   insert  this into  the  list  and  so  forth.

That  essentially  if  I  run  the  script just  so  that

and  make  sure  I  don't  have anything  highlighted,

I'm  essentially  taking  each  one  of  these metrics,  and  I  have  four  metrics  down  here

and  just  basically  defining each  one  of  these  metrics

with  the  different  information as  we're  going  through  and  building  out.

One  of  the  things  I  did  put in  here  is  this  report.

Up  here and  why  I  have  this, and  this  is  important  is

maybe  you  want  to  have  an  idea where  that  report  is  going  to.

Basically  this  metric  is  going to  be  used  for  this  report.

These  metrics  are  going to  be  for  that  report

so  that  you  can  centralize all  your  definitions  in  one  place.

Like  you  can  go and  put  this  into  a  database

or  consolidate  it  all into   one  location

so that  everybody  can  have  access to  that  information  and  understand,

have  one  working  definition and  know  where  that  is.

If  you're  doing  multiple  reports,

to say  what  reports  those  are  in

and  then  also  know if  they  need  to  make  changes,

which  reports are  these  metrics  in.

It's  a  really  great  way to  keep  track  of  things

and  also  keep some  governance  around  those  names.

What  I  did was,  essentially  did, as  you  can  see,  is  I  built  out,

I  just  basically  came  up  with  a  list,

and  then  essentially for  each  one  of  these  lists,

I  just  set  those values  into  that  column.

If  we  go  back  to  here,

each  one  of  these  objects,

which  is  a  list,  would  be  basically everything  for  the  indicator  name.

There  should  be  four  things.

If  we  go  over  here  and  look  at the  death  name,

and  I  go  to  my...

You  can  see  that  I  have a  list  here  of  names

and  that  just  becomes all  the  content  within  the  indicator  name.

That  was  just  basically  done  by creating  a  new  table

and  then  just setting  that  list  for  that  column.

I  did  that  for  each  one  of  those.

Then  once  you  have  these  all consolidated  into  one  place,

you  can  then  just  go  save  this to  a  database  or  some  central  location

so that  not  just  you, but  anybody  else  can  get  it.

Then  you  can  also pull  this  into  your  reports

based  on   what you  want  to  have  done.

That's  what  we're going  to  talk  about  next.

Let  me  just  close  this  out.

Go  back  to ...

Close  this.

Close  this.

All  right,  so  we  just  built  a  definition

and  now  let's  talk  about  kind of  building  a  dynamic  analysis.

We  just  created that  table  of  indicators.

What  we're  going  to  want  to  do  is   we  want  to  create  a  report,

but  we  don't  want  all  maybe we  don't  want  necessarily...

maybe  we  have  like  100  indicators

and  now  we  just  want just  a  few  of  those  indicators.

The  first  thing  we  can  do is  just  basically,

we  can  open up  these  data  tables.

Now  I'm  adding  in,

I  have  another  data  dictionary,

which  is  basically the  different  countries  here.

I  have  the  data  dictionary of  the  indicator

or  the  metrics  that  we  just  created.

Then  we  actually  have  the  data  over  here,

which  is  all  the  actual  data along  with  the  countries  on  there.

But  we  want  to  create  a  report with  the  definitions.

Once  we  do  that, we  can  then  go  and  figure  out

we  can  get  the  values  of  the  variables here  from  the  data  dictionary.

We  basically  are  going  to  pull all  these  metrics

and  then  we're  going  to combine  that  with   the  other

information  that  we  are  getting from  the  actual  metric  definitions.

I  got  a  lot  of  things  going  on  here.

Basically  I'm  going  to  be  pulling on  here  these  four  data  points.

But  I  also  need  to  be  from  this I  want  to  also  be  pulling

this  information,  the  country  name, the  country  code,  and  the  year.

I  basically  added  those  variables in  there  and  I  did  another  insert  here.

That  essentially  I'm  just  pulling the  columns  from  our  metrics.

I  won't  be  getting  all the  metrics  from  this  data  table.

Let's  run  this  and  then  I'm  going to  just  subset  the  data

based  on   what  we  need.

Okay,  so  this  is  our  subset  of  data.

W e  have  our  metadata  over  here

along  with  those  four  data  metric   variables  that  we  wanted.

The  next  part  is we're  going  to  just  do  a...

We  want  to  come  up  with  our  analysis. I'm  just  doing  something  really  simple.

Just  run  that.

Then  here  comes  where...

So  if  I  go  back  to  this  data  dictionary,

we  saw  that  there's  already  some

cell  height  width  that  are  really  specific for  each  one  of  these  cells

that  make  it  easy  reading  for  your  audience.

We  want  to  be  able  to  capture  that

and  be  able  to  translate  that  back  into the  presentation  that  we  have

when  we  take  these  tables  and  put  this  on.

We  can't  do  that  with  the  data  table

because  basically  the  data  table doesn't  allow  that  for  that  formatting.

Thank  you  to  Jay  Sun  from  Tech  Support,

who  helped   figure  out  how  to  do this  because  there  was  a  couple  of  things.

One,  as  I  said  before,

you  can't  do  for  loops  within  table  boxes or  other  displays,

you  need  to  be able  to  do  this  by  a  lot  of  object.

I'm  struggling  with  the  terminology  here,

but  essentially  you  really  have  to  be taking  this  and  building  the  table  box.

Then  also,  depending  on  how  many

columns  and  also   how  many  indicators  you  have,

you  want  to  have  this  dynamic.

You  don't  want  to  be coding  this  specifically.

Essentially  we're  capturing  both the  cell  heights,

we're  getting  the  names,

I  have  coded  the  wisp  explicitly.

The  code  that  we  had  was  working

and  then  for  some  reason  yesterday it  wasn't  working.

The  only  way  I  could  get  around with  this   was  to  explicitly

map  that  out  to  make  this  work.

Then  basically  building  out  the  data table  with  some  columns

and  then  indicators  and  basically   just  basically  having  some  for  loops,

basically  taking  the  formatting

and  the  content   and  building  out  the  table.

Let's  run  this  and   see what  this  actually  looks  like.

This  is  what  this  actually  looks  like.

Again,  similarly,  we  had  the  analysis,

I  keep  the  boxes  closed

and  basically  all  you  need  to  do  is  open  up

and  then  here's   all  your  definitions  on  here

along  with  the  indicator  name,

which  is  the  y  axis  over  here along  with  the  long  definition.

Folks  can  really   geek  out  on  that,

where  the  source  is, where  the  annual,

how  frequently  the  metric is  actually  taken,

and  the  other   metadata  around  it,

like  how  these  metrics  were  actually  calculated.

You  can  do  this  for...

Here,  I  have  it  for the  metric  definitions,

but  I  also  have  the  definitions for  the  different  countries  too.

For  example,  I  was  looking  at  this going  Eurozone,  what's  Euro zone.

I  know  what  Europe  is and  I  know  there's  some  countries

that  take  euros,  but  they  may not  necessarily  be  in  the  euro.

For  example,  I  think  Montenegro  is  one of  the  countries  that  uses  the  Euro,

but  they're  not  considered  the  Eurozone.

Same  thing  with  Europe  and  Central  Asia. What  does  that  mean?

Is  Mexico  in  North  America  or  not?

Depends  on  different  definitions.

This  really  helps  the  audience  hopefully get  the  information  that  they  want

so  they  can  interpret this  data  fairly  well.

As  you  can  see,  this  took a  bit  more  coding  on  that.

I  won't  necessarily  go through  all  the  detail  of  that  coding,

but  if  you  want  to  get that  coding,  feel  free  to...

I  would  be  more  than  willing  to  be able  to  share  this  coding  with  you.

Just  email  me

at  sarahcallison@achievemorewithdata.com and  I'll  be  more  than  glad  to  share

the  data  that  I  have  with  you along  with  the  code.

Okay,  so  call  to  action.

Well,  my  call  to  action  to  everyone  is

try  to  build  in  data dictionaries  with  your  analysis.

It  really  helps  your  audience really  understand  the  data.

It  really  helps  them  be  able  to  go, oh  yeah,  that's  what  it  means,

and  be  able  to  focus  versus   questioning  what  they're  actually  seeing.

Also,  number  two,  it  helps  you  too,

as  the  analyst,  remember,  oh  yeah, this  metric  was  percentages  or

dollars  versus  euros  or  whatever  else  that you  may  need  to  know  about  that  analysis.

The  other  aspect  too  is  vote for  on  the  JMP  wish list.

Enable  cell  formatting  for  Data  table  box.

I  totally  believe  that  this  could  help simplify  the  code  that  we  use

versus  having  to  have  for  loop   and  having  to  understand  a  lot  of  JSL.

I  think  this  would  really  open  up  things for  getting  more  people

to  do  data  dictionaries

if  we  can  enable  simpler coding  just  with  the  data  table  box.

So,  I  really  appreciate  everyone's  time in  listening  to  this

and   seeing an  alternative  way  of  being  able

to  present  your  data  and  add  more contextual  information  to  your  data.

Please  let  me  know   how  you're using  the  data  dictionary

and  if  you  have  other

unique  ways  of  being  able to  present  that  information.

So  thank  you  very  much

and  hopefully  I'll  see  you  at  the Junk  Discovery  Summit  in  October.

Thank  you.