Digitalizing the Data Dictionary and Other Report Contextual Information - (2023...

Have you gone back to your analysis and tried to figure out what you were doing?

Maybe the analysis was done six months ago

and you're trying to go through your analysis

and trying to figure out what was this metric again?

How did I define it, what were the assumptions behind it?

Now think about your audience.

If they're looking at something that you

don't necessarily understand or they may have want a little bit more nuance about

what the definition of a metric is in the analysis.

Well, the great thing is we can do that.

There are things called data dictionaries that we can be utilizing.

What is a data dictionary?

A data dictionary is really a catalog

that communicates the structure and the content of the data.

More importantly, what we're going to be talking about today is really meaningful

descriptions around what the metrics are so that

you and your audience can understand what is contained in that analysis.

The purpose of this is really to be able to over communicate so that

everyone's on the same page of what the variables or metrics are

and how they were calculated

or any other important information to help either your audience or your future self

understand and document what those metrics are.

We're also going to be showcasing how to be able to centralize these definitions

in one place and be able to digitalize this content.

What I mean by digitalized is just like you take data from raw form

and being able to present that into a graph or an analysis

for people to consume,

we're going to do the same thing with a data dictionary.

We're going to be able to pull the data dictionary

to be able to be tied into your analysis so that your audience or yourself

do not have to go find where that information is.

It's all tied together for easy user experience.

Okay, so today we're going to talk about I have three options here for folks.

The first solution is just really a simple no code.

If you're one of the people who love the GUI around JMP

and you do not want to have anything with code, this is your solution.

Let me walk you through a quick way of being able to do this

so that you can use the data dictionary on concept.

The second part is if you're into JSL and you have a very simple data dictionary

that you want to have and be able to tie on,

I'll show you how to do it with a little dose of JSL.

Now, if you're a supercharger and really want to see JSL

in all its glory and being able to essentially be able to

have a way of being able to categorize your work

along with defining the metrics as you go along

and then be able to centralize that,

really be able to digitalize the data dictionary

and then be able to pull out exactly what you need for a given report.

We'll do this with the supercharged full out JSL.

Then we'll talk about a call to action at the end.

Let's talk quickly.

Let's just JMP in here and talk about the simple no code option.

First we're just going to be using a JMP data table,

and then we're going to embed the definitions into that data table.

Let me open up the data and the definition table.

Here's what a data dictionary, a very simple data dictionary looks like.

You have the variables and a description,

which each of these variables are columns here in the data table.

You have the variable or the column name along with the meaning.

I also added the references over here.

We can come over here and just plot our data and we can see what

the various different metrics look like in here.

But maybe your audience comes here and goes, what does abundance mean?

Or how are we defining richness? Or what's the Shannon Metric?

Well, they're going to have to go and figure out what that actually means.

What we can do here is we can tie this data table to the data dictionary table,

to the data table, and we can do this really quickly.

If you go over to this triangle over here, click on the red,

go down to copy data table.

We can come over here, do the same thing over in the data table.

But this time we're going to hit new script

and we're going to type in definition.

We're going to then paste.

I have command v or control v on here.

You'll see some code.

You do not need to understand what that actually means.

You're going to say save and then you're going to say okay.

You're going to see that you have the definition.

If your audience or your future self

wants to come in here and say, hey, here's my plot, that's great,

I see my data, but I need to know what the definition?

They just need to hit the plot and voila.

You have well formulated your definitions.

Okay, very simple.

Hopefully if somebody has a data table, come up with some metrics,

use another JMP data table for it and just attach it

to your data table and you have a great reference.

Okay, so that is the simple notebook.

Now let's go to simple with a little dose of JSL.

All right, so here we're going to publish the analysis

along with the definition table to a window,

and then we're going to export the analysis to a PowerPoint deck.

What I'm going to do is I'm going to run the script

and then I'll walk you through how I did the script.

The first thing is let's get this.

This is what the outcome of the script does.

You see the same analysis up here, but what you see down here is

we have a definition table, and you can just open up the definition table,

and then you can see what the different rows are

or the definitions of the different meanings of the data

down here along with the reference

if people wanted to go and know where to go

to look for more of that information,

How did we actually do this?

Essentially the first thing we did was basically opened up the data tables here

and then the next thing we did was we created a window.

This is basically this window right here

and then an outline box with the analysis.

This is just a simple graph builder, nothing fancy.

Then down here we did another outline box here where I had it closed.

But we're going to need to keep that open for right now.

Then simply just did a data table box

that just basically pasted that into the window.

Okay, now let's say that this is great,

but we want to export this to a PowerPoint.

We can very well easily do that by making sure you have the definition window open.

Don't have it closed or it won't export to the PowerPoint.

Make sure it's open and then just say save to PowerPoint.

Voila. There you go.

It exports the data to the PowerPoint.

Now there's a little bit of things that you're going to need to do.

You can make this bigger, you can make this change the color on it.

The titles up here, make that white just so it makes it stand out or change it.

Then also here's the data dictionary as well.

All you need to do is this is all edible so that you can

actually make this a little bit more user friendly.

I find that 12 font makes it better,

but you can have the definitions in the backup of your slides

so that people can come back and be able to reference it.

It doesn't have to be part of the meet of the presentation.

It could just be in the backup slides.

Makes it really easy to be able to export all this information

to PowerPoint and just have it tied with your analysis.

All in really essentially a few lines of code.

Basically all I did to export this was I created the button save the PowerPoint.

Just basically save the graph and then save

on one page, on page two

and then on page three I had the data dictionary part.

This is the object for the data dictionary and this is the object for the graph.

Pretty straightforward to be able to do that.

We're going to close all these slides files and we can close this as well.

Now there is a drawback with the data table box.

I'll get into a little bit of that in my next example on that.

In the supercharger,

I'm going to talk about the drawbacks on the data table function itself,

but I'm also going to showcase how to capture dynamic metric definitions

as you do a report and then also giving you a little bit more detail,

how to get more details around that.

Then I'm going to showcase how to do a dynamic definition box around.

This is going to take a little bit more coding

than just specifying that data table box.

Let's talk about what the drawback is

and let's take this data dictionary from the World Bank.

You can see that they have some really long definitions on here

and we have some more information around these various two metrics.

But if I just do a data table box, what you'll see is

it doesn't take any of the cell height, width or set wrap on here.

This is something actually I had to go back and to JMP support

and they confirmed that using the data table box,

it doesn't capture these aspects.

This isn't ideal on here.

Now, I did put a request in the JMP wish list

that they would take this so that we wouldn't necessarily have to do the coding

that I will be showing you in a few more minutes.

This isn't necessarily ideal,

but basically this data table box, there's some drawbacks on that.

The way that I got around to it, just to give you an idea of just

using the table is I literally,

let me just show you this.

This was my attempt on here, recreating it.

This is the the desired outcome and it took a lot of coding on here.

Just to give you an idea, I mean, you could try to do this,

but this isn't really dynamic.

This is just basically you would have to explicitly,

you're basically coding each one of these cells,

the titles and then each one of the text boxes down here.

For example, I took this indicator name from the first row and put it here,

long definition from the first row of the data table to here, blah, blah, blah,

and just gone on and on and on and on.

This isn't ideal at all.

It requires a lot of explicit coding.

Also the other thing I noticed is

you can't do for loops within the display boxes at all.

I went back to JMP support and said

is there anything that's just not documented of how to get the display boxes

or the cell heights from the data tables translated back over?

They actually gave me a solution

and I'll show you how we go through that in a few minutes.

But because basically the data table box does not allow you to set the cell height,

the cell width or the wrap.

Let's look at how if we did want to do something

that is a little bit more dynamic

and building out a data dictionary from your analysis.

Here, what I've done here

is really as I was going through building metrics,

I define the different aspects that I would want in my data dictionary.

Here I have initial metric for the name of metric, the definition, the source,

the frequency of the metric, and the specific.

I will be capturing that every time I go through, I create a new metric.

Let's say I work at the World Bank,

I'm creating these metrics and this data and so forth.

Imagine I have a whole bunch of code that basically develops a metric and so forth,

takes that data, does stuff,

I create this GDP growth, annual percentage type of thing.

This is essentially how I basically would have captured the information.

I just take the information for the definition name.

I would put the definition up here

in a separate line and then carry that as an object in, and just do an insert into

and then insert this into the list and so forth.

That essentially if I run the script just so that

and make sure I don't have anything highlighted,

I'm essentially taking each one of these metrics, and I have four metrics down here

and just basically defining each one of these metrics

with the different information as we're going through and building out.

One of the things I did put in here is this report.

Up here and why I have this, and this is important is

maybe you want to have an idea where that report is going to.

Basically this metric is going to be used for this report.

These metrics are going to be for that report

so that you can centralize all your definitions in one place.

Like you can go and put this into a database

or consolidate it all into one location

so that everybody can have access to that information and understand,

have one working definition and know where that is.

If you're doing multiple reports,

to say what reports those are in

and then also know if they need to make changes,

which reports are these metrics in.

It's a really great way to keep track of things

and also keep some governance around those names.

What I did was, essentially did, as you can see, is I built out,

I just basically came up with a list,

and then essentially for each one of these lists,

I just set those values into that column.

If we go back to here,

each one of these objects,

which is a list, would be basically everything for the indicator name.

There should be four things.

If we go over here and look at the death name,

and I go to my...

You can see that I have a list here of names

and that just becomes all the content within the indicator name.

That was just basically done by creating a new table

and then just setting that list for that column.

I did that for each one of those.

Then once you have these all consolidated into one place,

you can then just go save this to a database or some central location

so that not just you, but anybody else can get it.

Then you can also pull this into your reports

based on what you want to have done.

That's what we're going to talk about next.

Let me just close this out.

Go back to ...

Close this.

All right, so we just built a definition

and now let's talk about kind of building a dynamic analysis.

We just created that table of indicators.

What we're going to want to do is we want to create a report,

but we don't want all maybe we don't want necessarily...

maybe we have like 100 indicators

and now we just want just a few of those indicators.

The first thing we can do is just basically,

we can open up these data tables.

Now I'm adding in,

I have another data dictionary,

which is basically the different countries here.

I have the data dictionary of the indicator

or the metrics that we just created.

Then we actually have the data over here,

which is all the actual data along with the countries on there.

But we want to create a report with the definitions.

Once we do that, we can then go and figure out

we can get the values of the variables here from the data dictionary.

We basically are going to pull all these metrics

and then we're going to combine that with the other

information that we are getting from the actual metric definitions.

I got a lot of things going on here.

Basically I'm going to be pulling on here these four data points.

But I also need to be from this I want to also be pulling

this information, the country name, the country code, and the year.

I basically added those variables in there and I did another insert here.

That essentially I'm just pulling the columns from our metrics.

I won't be getting all the metrics from this data table.

Let's run this and then I'm going to just subset the data

based on what we need.

Okay, so this is our subset of data.

W e have our metadata over here

along with those four data metric variables that we wanted.

The next part is we're going to just do a...

We want to come up with our analysis. I'm just doing something really simple.

Just run that.

Then here comes where...

So if I go back to this data dictionary,

we saw that there's already some

cell height width that are really specific for each one of these cells

that make it easy reading for your audience.

We want to be able to capture that

and be able to translate that back into the presentation that we have

when we take these tables and put this on.

We can't do that with the data table

because basically the data table doesn't allow that for that formatting.

Thank you to Jay Sun from Tech Support,

who helped figure out how to do this because there was a couple of things.

One, as I said before,

you can't do for loops within table boxes or other displays,

you need to be able to do this by a lot of object.

I'm struggling with the terminology here,

but essentially you really have to be taking this and building the table box.

Then also, depending on how many

columns and also how many indicators you have,

you want to have this dynamic.

You don't want to be coding this specifically.

Essentially we're capturing both the cell heights,

we're getting the names,

I have coded the wisp explicitly.

The code that we had was working

and then for some reason yesterday it wasn't working.

The only way I could get around with this was to explicitly

map that out to make this work.

Then basically building out the data table with some columns

and then indicators and basically just basically having some for loops,

basically taking the formatting

and the content and building out the table.

Let's run this and see what this actually looks like.

This is what this actually looks like.

Again, similarly, we had the analysis,

I keep the boxes closed

and basically all you need to do is open up

and then here's all your definitions on here

along with the indicator name,

which is the y axis over here along with the long definition.

Folks can really geek out on that,

where the source is, where the annual,

how frequently the metric is actually taken,

and the other metadata around it,

like how these metrics were actually calculated.

You can do this for...

Here, I have it for the metric definitions,

but I also have the definitions for the different countries too.

For example, I was looking at this going Eurozone, what's Euro zone.

I know what Europe is and I know there's some countries

that take euros, but they may not necessarily be in the euro.

For example, I think Montenegro is one of the countries that uses the Euro,

but they're not considered the Eurozone.

Same thing with Europe and Central Asia. What does that mean?

Is Mexico in North America or not?

Depends on different definitions.

This really helps the audience hopefully get the information that they want

so they can interpret this data fairly well.

As you can see, this took a bit more coding on that.

I won't necessarily go through all the detail of that coding,

but if you want to get that coding, feel free to...

I would be more than willing to be able to share this coding with you.

Just email me

at sarahcallison@achievemorewithdata.com and I'll be more than glad to share

the data that I have with you along with the code.

Okay, so call to action.

Well, my call to action to everyone is

try to build in data dictionaries with your analysis.

It really helps your audience really understand the data.

It really helps them be able to go, oh yeah, that's what it means,

and be able to focus versus questioning what they're actually seeing.

Also, number two, it helps you too,

as the analyst, remember, oh yeah, this metric was percentages or

dollars versus euros or whatever else that you may need to know about that analysis.

The other aspect too is vote for on the JMP wish list.

Enable cell formatting for Data table box.

I totally believe that this could help simplify the code that we use

versus having to have for loop and having to understand a lot of JSL.

I think this would really open up things for getting more people

to do data dictionaries

if we can enable simpler coding just with the data table box.

So, I really appreciate everyone's time in listening to this

and seeing an alternative way of being able

to present your data and add more contextual information to your data.

Please let me know how you're using the data dictionary

and if you have other

unique ways of being able to present that information.

So thank you very much

and hopefully I'll see you at the Junk Discovery Summit in October.

Thank you.

Files