A Dark Tale: Visual Storytelling With JMP® (2021-EU-30MP-792)

Level: Intermediate


Caleb King, JMP Research Statistician Tester, SAS


In this talk, we illustrate how you can use the wide array of graphical features in JMP, including new capabilities in JMP 16, to help tell the story of your data. Using the FBI database on reported hate crimes from 1991-2019, we’ll demonstrate how key tools in JMP’s graphical toolbox, such as graphlets and interactive feature modification, can lead viewers to new insights. But the fun doesn’t stop at graphs. We’ll also show how you can let your audience “choose their own adventure” by creating table scripts to subset your data into smaller data sets, each with their own graphs ready to provide new perspectives on the overall narrative. And don’t worry. Not all is as dark as it seems...



Auto-generated transcript...




Caleb King Hello, my name is Caleb King. I am a developer at JMP. I'm specifically in the design of experiments group, but today I'm going to be
  a little off topic and talk about how you can use the graphics and some of the other tools in JMP to help with sort of the visual storytelling of your data.
  Now the data set I've chosen to illustrate that is the hate crime data collected by the FBI, so maybe a bit of a controversial data set, but also pretty relevant to what's been happening.
  And my goal is to make this more illustrative so I'll be walking through a couple graphics that I've created previously
  for this data set. And I won't necessarily be showing you how I made the graphs but the purpose is to kind of illustrate for you how you can use JMP
  for, like I said, visual storytelling so use the interactivity to help lead the people looking at the graphs and interacting with it
  to maybe ask more questions, maybe you'll be answering some of their questions, as you go along. But kind of encourage that data exploration, which is what we're all about here at JMP.
  So with that said let's get right in.
  I'll first kind of give you a little bit of overview of the data set itself, so I'll kind of just scroll along here. So there's a lot of information about
  where the incidents took place.
  As we keep going, there's the date when that incident occurred. You have a little bit of information about the offenders
  and what type of offense was committed. Again, some basic information: what type of bias was present in the incident. Some information about the victims.
  And an overall discrimination category, and then some additional information I provided about latitude and longitude, if it's available, as well as some population data that I'll be using in other graphics. Now, just to be clear,
  the FBI, that's the United States Federal Bureau of Investigation, defines a hate crime as any
  criminal offense that takes place that's motivated by biases against a particular group. So that bias could be racial, against religion, gender, gender identity and so forth.
  So, as long as a crime is motivated by a particular bias, it's considered a hate crime, and this data consists of all the incidents that have been
  collected by the FBI, going back all the way to the year 1991 and as recent as 2019. I don't have data from 2020, as interesting as that certainly would be,
  but that's because the FBI likes to take its time in making sure the data is thoroughly cleaned and prepared before they actually create their reports. So
  you can rest assured that this data is pretty accurate, given how much time and effort they put into making sure it is.
  Alright, so with that let's kind of get started and do some basic visual summary of the data. So I'll start by running this little table script up here. And all this does is basically give us
  a count over the days, so each day, how many incidents occurred, according to a particular bias. From this I'm going to create a basic plot, in this case a simple sort of line plot
  here, showing the number of incidents that happen each day over the entire range. So you can see the whole range of the data, 1991 to 2019,
  and how many incidents occurred. Now this in and of itself would probably be a good static image, because you kind of get a sense of
  where the number of incidents falls. In fact, I'm going to change the axis settings here a little bit. Let's see, we've got increments in 50s; let's do it by 20s. There we go.
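  As a rough illustration of the summary that table script produces (a count of incidents per day, broken out by bias), here is a minimal Python sketch. It is not the JMP table script from the talk; the record layout, the bias labels, and the function names are all assumptions made for the example.

```python
from collections import Counter
from datetime import date
from statistics import mean

# Hypothetical incident records: (incident date, bias category).
# The labels are illustrative, not the FBI's exact category names.
incidents = [
    (date(1992, 5, 1), "Anti-White"),
    (date(1992, 5, 1), "Anti-White"),
    (date(1992, 5, 1), "Anti-Black or African American"),
    (date(1992, 5, 2), "Anti-Jewish"),
    (date(1992, 5, 3), "Anti-White"),
]


def daily_counts_by_bias(records):
    """Count incidents for each (date, bias) pair."""
    return Counter(records)


def daily_totals(records):
    """Count incidents per day, ignoring the bias category."""
    return Counter(day for day, _bias in records)


by_bias = daily_counts_by_bias(incidents)
totals = daily_totals(incidents)

# Average over days that have at least one record; days with no
# incidents do not appear in `totals` and are not counted here.
average_per_day = mean(totals.values())
```

  The per-day totals are what a line plot over time would show, and the per-bias counts are the kind of breakdown a hover graphlet could display for a single day.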
  So there's a little bit of interactivity for you... interaction. We changed the scales to kind of refine it and get a better sense of
  how many incidents, on average, there are. I ran a bit of a distribution and, on average, there are around 20 incidents per day
  that we see here. Now of course, you're probably wondering why I have not yet addressed the two spikes that we see in the data.
  So yes, there are clearly two really tall spikes. And so, if this were
  any other type of software, you might say, okay, I'd like to look into that. So you go back to the data, you try and, you
  know, isolate where those dates are and maybe try and present some plots or do some analysis to show what's going on there. Well, this is JMP and we have something that can
  help with that, and it's something that we introduced in JMP 15 called graphlets and it works like this. I'm just going to hover and boom.
  A little graphlet has appeared to help further summarize what's going on at that point. So in this case there's a lot of information.
  We'll notice first the date, May 1, 1992. So if you're familiar with American history, you might know what's happening here, but if not,
  you can get a little bit of an additional clue by clicking on the graph. So now you'll see that I'm showing you
  the incidents by the particular bias of the incident. So we see here that most of the incidents were against white individuals and then next group is Black or African-American and it continues on down.
  I kind of give away the answer here, in that the incidents that occurred around this time were the Rodney King riots in California. Rodney King,
  an African-American man, was beaten by police officers, and the acquittal of those officers led to a lot of backlash and rioting
  around this time. So that's what we're seeing captured in this particular data point, and if you didn't know that, you would at least have this information to try to start and go looking there...looking online to figure out what happened.
  We can do the same thing here with a very large spike. And again, I'll use the hover graphlet, so hover over it. I'll pause to let you look. So we look at the date, September 12, 2001. That in and of itself is a very big clue
  as to what happened. But if we look here at the breakdown, we can see that most of the incidents were against individuals of Muslim faith, of
  Arab ethnicity or some other type of similar ethnicity or ancestry. In this case, we can clearly see
  that, after the unfortunate events of September 11, the terrorist attacks that occurred then, there was on the following day, a lot of backlash against
  members who were of similar ethnicity, similar faith and so forth, so we had an unfortunate backlash happening at that time.
  So already with just this one plot and some of that interactivity, we've been able to glean a lot of information, a lot of high level information in areas where you might want to look further.
  But we can keep going. Now something new in JMP 16 is, because we have date here on the X axis, we can actually bin the dates into a larger category, so in this case let's bin it by month.
  And we see that the plot disappears. So here's what I'm going to do. I'm going to rerun it and let's see.
  There we go.
  You never know what will happen. In this case, so this is what's supposed to happen; don't worry.
  So we've binned it by month and we noticed an interesting pattern here. There seems to be some sort of seasonal trend occurring, and let's use the hover graphlets to kind of help us identify what might be happening. So I'm going to hover over the lower points. So if I do that,
  we see okay, January, December, okay. Interesting. Let's hover over another one, December.
  And yet another one, December. Ah, there might actually be some actual seasonal trend in this case going on. We seem to hit low points around the winter months.
  And in fact, if I go back to my data table, I've actually seen that before. It was something I kind of discovered while exploring that technique and I've already created a plot to kind of address that.
  So this was something I created based off of that, kind of, look at, you know, what's the variation in the number of incidents over all the years within this month.
  And here we can see the mean trends, but we also see a lot of variation, especially here in September because of that huge spike there.
  So maybe we need something a little more appropriate. So I'll open the control panel and hey, let's pick the median. That's more robust, and maybe look at the interquartile range, so that way we have a little bit more robust
  metrics to play with.
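  To see why the median and interquartile range hold up better than the mean when one year has a huge spike, here is a small Python sketch. The monthly counts are made up for illustration (they are not the FBI figures), and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean, median, quantiles

# Hypothetical (year, month, incident count) rows; September 2001 is
# given an exaggerated spike to mimic the outlier discussed in the talk.
monthly_counts = [
    (1999, 9, 700), (2000, 9, 720), (2001, 9, 1500), (2002, 9, 690), (2003, 9, 710),
    (1999, 12, 500), (2000, 12, 520), (2001, 12, 540), (2002, 12, 510), (2003, 12, 530),
]


def summarize_by_month(rows):
    """Group counts by calendar month and compute mean, median, and IQR."""
    grouped = defaultdict(list)
    for _year, month, count in rows:
        grouped[month].append(count)

    summary = {}
    for month, counts in grouped.items():
        # quantiles(..., n=4) returns the three quartile cut points.
        q1, _q2, q3 = quantiles(counts, n=4, method="inclusive")
        summary[month] = {
            "mean": mean(counts),
            "median": median(counts),
            "iqr": q3 - q1,
        }
    return summary


summary = summarize_by_month(monthly_counts)
```

  With these made-up numbers, the single spike pulls the September mean well above the September median, while the median and interquartile range stay close to the typical years.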
  And so, again, we see that seasonal trend, so it seems that there's definitely a large dip within the winter months
  as opposed to peaking kind of in the spring and summer months. Now
  this might be something someone might want to look further into and research why is that happening.
  You might have your own explanations. My personal explanation is that I believe the Christmas spirit is so powerful that it overcomes whatever hate or bias individuals might have in December.
  Again that's just my personal preference, you probably have your own. But again, with just a single plot, I was able to discover this trend and
  make another plot to kind of explore that further. So again with just this one plot, I've encouraged more research. And we can keep going.
  So let's see, let's bin it by year, and if we do that, we can clearly see this kind of overall trend.
  So we see a kind of peak in the late '90s, around the early 2000s, before dropping, you know, in almost linear fashion,
  until it hits its low point in about the mid 2010s before starting to rise again. So keep that in mind, you might see similar trends in other plots we show. But again, let's take a step back and just realize that in this one plot we've seen different
  aspects of the data. We even
  answered some questions, but we've also maybe brought up a few more. What's with that seasonal trend?
  And if you didn't know what those events were that I told you, you know, what were those particular events? So that's the beauty of the interactivity of JMP graphics is it allows the user to engage and explore and encourages it all within just one particular medium.
  All right.
  Let's keep going. So I mentioned this is sort of visual storytelling, so you can think of that sort of as a prologue, as sort of the overall view:
  what's the overall situation? Now let's look at kind of the actors,
  that is, who's committing these types of offenses? Who are they committing them against? What information can we find out about that? So here I've
  created, again, a plot to kind of help answer that. Now this might be a good start. Here I've created a heat map
  that emphasizes the counts by, in this case, the offender's race versus their particular bias. So we see that
  a lot of what's happening, in this case I've sorted the columns, so we can see there's a lot going on. Most of it's here in this upper
  left corner and not too much going on down here, which I guess is good news. There's a lot of biases where there's not a lot happening, most of it's happening here in this corner.
  Now, this might be a good plot, but again there's a lot of open space here. So maybe we can play around with things to try and emphasize what's going on. So one way I can do that is I'll come here to the X axis and I'm going to size by the count.
  Now you'll see here, I had something hidden behind the scenes. I'd actually put a label, a percentage label on top of these.
  There was just so much going on before that you couldn't even see it, but now we can actually see some of that information. So kind of a nice way to summarize it as opposed to counts.
  But even with just the visualization, we can clearly see the largest amount of bias is against Black or African American citizens and then Jewish and on down until there's hardly any down here. So just by looking at the X axis, that gives you a lot of information about what's going on.
  We can do the same with the Y, so again, size by the count.
  And again, there's a lot of information contained just within the size and how I've adjusted the axes.
  In this case we've really emphasized that corner, so we can clearly see who the top players are. In this case, most
  offenders are of white or unknown race against African-Americans, the next one being against Jewish, and then
  anti-white, and then it just keeps dropping down. So we get a nice little summary here of what's going on. Now, you may have noticed that as I'm hovering around, we see our little circle. That's my graphlet indicator, so again I've got a tool here.
  We've interacted a little bit and again, this could be a great static image, but let's use the power of JMP, especially those graphlets,
  to interact and see what further information we can figure out. So in this case, I'll hover over here.
  And right here, a graph, in this case, a packed bar chart, courtesy of our graph guru Xan Gregg. In this case, not only can you see, you know,
  what people are committing the offenses and against whom, your next question might have been, you know, what types of offenses are being committed? Well, with a graphlet, I've answered that for you.
  We can see here the overwhelming type of offense is intimidation, followed by simple and aggravated assault, and then the rest of these, that's the beauty of the packed bar chart.
  We can see all the other types of offenses that are committed. If you stack them all on top of each other, they don't even compare. They don't even break the top three.
  So that tells you a lot about these types of offenses, how dominant they are.
  Now, another question you might have is, okay, we've seen the actors, we've seen the actions they're taking,
  but there's a time aspect to this. Obviously this is happening over time, so has this been a consistent thing? Has there been a change in the trends? Well,
  have no fear. Graphlets again to the rescue. In this case, I can actually show you those trends. So here we can see how
  the number of intimidation incidents has changed over time. And again, we see that the pattern seems to follow what that overall trend was:
  a peak in the late '90s, and then the steep, almost linear drop until about the mid 2010s, before kind of upticking again more recently.
  And again we can maybe see that trend and others. I won't click to zoom in, but you can just see from the plot here, those trends in simple assault here and aggravated assault as well, a little bit there.
  And you can keep exploring. So let's look at the unknown against African-Americans and see what difference there might be there. In this case, we can clearly see
  that there are two types of offenses that really dominate, in this case, destruction or damage to property (which, if you think about it, might make sense; if you see your property's been damaged, there's a good chance you may not know who did it)
  and intimidation are the dominant ones. And again, you can...the nice thing about this is the hover labels kind of persist, so you can again look and see what trends are happening there.
  So in this case, we see with damage, there's actually two peaks, kind of peaked here in the late '90s early 2000s, before dropping again. And with intimidation, we see a similar trend as we did before.
  Again, within just one graphic, there's a lot of information contained that you, as the user, can interact with to try and emphasize certain key areas, and then you, just by looking at this and interacting with it, can play around and glean a lot of information.
  All right.
  And let's keep going. Now you'll notice that amongst the reporting agencies, so, most of them are city/county level
  police departments and so forth, but there's also some universities in here. So there might be someone out there who might be interested in seeing, you know, what's happening at the universities.
  And so, with that, I've created this nice little table script to answer that. Now this time,
  I've been just running the table scripts and I mentioned, I won't go too much behind the scenes, this is more illustrative.
  Here I'm going to let you take a peek, because I want to not only show you the power of the graphics but also the power of the table script. Now if you're familiar with JMP,
  you might know, okay, the table script's nice because I can save my analyses, I can save my reports, I can even use it to save graphics like I did in the last one, but you may not have noticed that you can also save
  scripts to help run additional tables and summary tables and so forth. So let me show you what all is happening behind here. In fact, when I ran the script, I actually created two data tables.
  You only saw the one. In this case I first created the data table that selected all the universities, and then from that data table it created a summary, and then I closed the old one.
  And then I also added to that some of the graphics. So I won't go into too much detail here about how I set this up, because I want to save that for after the next one. I'll give you a hint. It's based off of a new feature in JMP 16 that will really amaze you.
  All right, let's go back to...excuse incidents.
  And here again I've saved the table script. This one will show us a graphic.
  So here again is that packed bar chart, and here I'm kind of showing you which universities had the most incidents. Now again, this in and of itself might be a pretty good standard graphic.
  You can see which universities seem to have the most incidents happening, and again it's kind of nice to see that there's no real dominating one. You can still pack the other universities
  on top of them, and nobody is dominating one or the other. So that in and of itself is kind of good news, but again there's a time aspect to this. So
  have these been maybe... has the University of Michigan Ann Arbor, have they had trouble the entire time? Have they...would they have always remained on top? Did they just happen to have a bad year? Again, graphlets to the rescue.
  In this case,
  you'll see an interesting plot here. You might say, you know, what is this thing? This looks like it belongs in an Art Deco museum.
  What kind of plot is this? Well, it's actually one we've seen before. I'm just using something new that came out in JMP 16, so I'm going to give you a behind the scenes look.
  And in this case, we can see, this is actually a heat map. All I've done
  is I do a trick that I often like to do, which is to emphasize things two different ways, so not only emphasizing the counts by color, which is what you would typically do in a heat map, the whites are the missing entries, I can also now in JMP 16 emphasize by size.
  And so I think this again gets back to where we sized those axes before. It helps emphasize certain areas. So here we can see now maybe there's a little bit of an issue with incidents against African-Americans,
  that has been pretty consistent, with an especially bad year in apparently 2017, as opposed to all of the other incidents that have been occurring.
  Now there's no extra hover labels here.
  All I'll do is summarize the data, but that's okay.
  This in and of itself gives you a lot of information, so this is a new thing that came out in JMP 16 that can again help with that emphasis.
  And again, we can keep going. We can look at other universities, so here, this might be an example of a university where they seem to have a pretty bad period of time,
  the University of Maryland in College Park, but then there was an area where
  things were really good, and so you might be interested in knowing, well, what happened to make this such a great period?
  Is there something the university instituted, what they did that seemed to cause the count, the number of incidents to drop significantly? That might be something worth looking into.
  And you can keep going and looking again to see whether it's a systemic issue, whether like, in this case, it seemed there's just a really bad year that dominated, overall they were just doing okay.
  They were doing pretty good. Again, this might be another one. They had a really bad time early on, but recently they've been doing pretty good,
  and so forth. So again, kind of highlighting that interactivity yet again,
  and in this case, with some of the newer features in JMP 16. Now, before we transition to the last one, I have a confession. I'm a bit of a map nerd, so I really like maps and any type of data analysis that, you know, relates to maps.
  I don't know why. I just really like it and so I'm really excited to show you this next one, because now we look at the geography of the incidents.
  But I'm also excited because this really, I believe, highlights the power of both the table scripts and the JMP graphics, especially the hover graphs.
  So hopefully that got you excited as well, so let's run it. Now this one's going to take a little while because there's actually a lot going on with this table script. It's creating a new table. It's also doing a lot of functions in that table
  and computing a lot of things. So here we've got not just, you know, pulling in information but also there's a lot of these columns here near the end that have been calculated behind the scenes.
  Now I have to take a brief moment to talk about a particular metric I'm going to be using. So a while back, I wrote a blog entry called the Crime Rate Conundrum on the JMP Community,
  so shameless plug there, but in that I talked about how, you know, typically when you're reporting incidents, especially crime incidents,
  usually we kind of know that you don't just want to report the raw counts, because
  there might be a certain area where it has a high number of counts, a high number of incidents. Is that just because there's a problem there?
  Or is it because there's just a lot of people there? And so we, of course, would expect a lot of incidents because there's just a lot of people. So of course people
  report incident rates. Now that's fine because everybody's now on a level playing field, but one side effect of that is it tends to elevate
  places that have small populations. Essentially, if you have a small denominator, you will tend to have a larger ratio just because of that.
  And so that's sort of an unfortunate side effect, and so there, I talk about an interesting case where we have a place with a really small population that gets really inflated.
  And how some people deal with that. One way I tried to address that was through the use of a weighted incident rate. Essentially, the idea is
  I take your incident rate, but then I weight it by, basically,
  how many people you have there. In this case, I have a particular weight: I basically rank the populations
  so that the largest place has the highest rank and the smallest the lowest. In this case there are 50 states, so the state with the largest population would have a
  rank of 50 and the smallest state a rank of one. If you take that and divide it by the maximum rank, that's essentially your weight, so it's a way to kind of put
  a weight corresponding to your total population and the idea here is that, if your incident rate is such that it overcomes this weight penalty, if you will,
  then that means that you might be someone worth looking into. So it tries to counteract
  that inflation, just due to a small population. If you are still relatively small, but your incident rate is high enough that you overcome your weight essentially,
  we might want to look into you. So hopefully that wasn't too much information, but that's the metric that I'll be primarily using. So I'll run the script, and
  here we go. So first I've got a straightforward line plot that kind of shows the weighted incident rates over time for all the states.
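  The weighted incident rate described above (rank the populations, divide each rank by the maximum rank to get a weight, then multiply the incident rate by that weight) can be sketched in Python as follows. This is an illustration of the idea as stated in the talk, not the actual column formulas in the JMP table; the region names and numbers are made up, the per-1,000 scaling is taken from the rate quoted in the talk, and breaking ties in population by sort order is an assumption.

```python
def weighted_incident_rates(regions, per=1_000):
    """Compute raw and rank-weighted incident rates.

    regions: dict mapping region name -> (incident count, population).
    Populations are ranked from smallest (rank 1) to largest
    (rank = number of regions); weight = rank / maximum rank.
    Ties in population are broken by sort order (an assumption).
    """
    ordered = sorted(regions, key=lambda name: regions[name][1])
    max_rank = len(ordered)

    result = {}
    for rank, name in enumerate(ordered, start=1):
        incidents, population = regions[name]
        rate = incidents / population * per
        weight = rank / max_rank
        result[name] = {
            "rate": rate,
            "weight": weight,
            "weighted_rate": rate * weight,
        }
    return result


# Hypothetical regions: (incident count, population).
regions = {
    "Small": (5, 1_000),
    "Medium": (200, 100_000),
    "Large": (3_000, 1_000_000),
}

rates = weighted_incident_rates(regions)
highest_raw = max(rates, key=lambda name: rates[name]["rate"])
highest_weighted = max(rates, key=lambda name: rates[name]["weighted_rate"])
```

  In this made-up example the smallest region has the highest raw rate, but after weighting the largest region comes out on top, which is the inflation-from-a-small-denominator effect the weight is meant to counteract.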
  Now I'll use a new feature here. We can see here that New Jersey seems to dominate. Again interactivity, we can actually click to highlight it.
  There's some new things that we do, especially in JMP 16. I'm going to right click here and I'm going to add some labels. So let's do the maximum value and let's do the last value
  just for comparison.
  So here we can see this...the peak here was about 11.4 incidents per 1,000 (that's a weighted incident rate) here in sort of the early '90s.
  And then we see a decreasing trend; again it seems to drop about the same as the overall incident rate did, before starting to peak again here in
  2019. So again with just some brief numbers again this, in and of itself, would be an interesting plot to look at, but as you could see, my little graphlet indicator is going, so there's more.
  Here's where the map part comes in. So I'm going to hover over a particular point.
  In this case, not only
  can you see sort of the overall rate, I can actually break it down for you, in this case by county. So here I've colored the
  counties by the total number of incidents within that year. And again, there's that time aspect, so this shows you a snapshot for one particular year, in this case 2008.
  But maybe you're interested in the overall trend, so one way you could do that is, hey, these are graphlets. I could go back, hover over another spot, pull up that graph, click on it to zoom,
  repeat as needed. You could do that or you could use this new trick I found actually while preparing this presentation.
  Let's unhide...notice over here to the side, we have a local data filter. That's really the key behind these graphlets.
  I'm going to come here to the year and I'm going to change its modeling type to nominal, rather than continuous, because now, I can do something like this. I can actually go through
  and select individual years or, now this is JMP, we can do better.
  Let me go here and I'm going to do an animation. I'm going to make it a little fast here. I'm going to click play, and now I can just sit back relax and, you know, watch as JMP does things for me.
  So here we can see it cycle through and getting a sense of what's happening. I'll let it cycle through a bit. We see...already starting to see some interesting things happening here.
  Let's let it cycle through, get the full picture, you know. We want the complete picture, not that I'm showing off or anything.
  Alright, so we've cycled through and we noticed something. So let's go down here to about, say, 2004, 2005. So somewhere around here, we noticed this one county here, in particular, seems to be highlighted.
  And in fact, you saw my little graphlet indicator. So again, I can hover over it, and here
  yet another map. Now you can see why I'm so excited.
  Again, in this case, I can actually show you at the county level, so the individual county level...
  Excuse me, let me...let's move that over a bit. There we are. Some minor adjustments and again, you can see my trick of emphasizing things two ways by both size and color.
  We can kind of see dispersion within the ???, this is individual locations and because there's that time aspect again,
  we know better: we don't have to go back and click and get multiple graphs, we can again use the local data filter trick. So I can go back. I'll do
  the year, and so in this case, we can again click through. Here I'm just going to use the arrow keys on my keyboard to kind of cycle through.
  And just kind of get a sense of how things are varying over time. In this case, you see a particular area, you've probably already seen it, starting about 2006, 2007ish frame.
  There's this one area...this here.
  Keansburg, which seems to be highlighted and you'll notice yet another graphlet. How far do you want to go?
  Graphception, if you will.
  We can
  keep going down further and further in. In this case, I break it out by what the bias was, and again I could do that trick if I wanted to, to go through and cycle through by year.
  So, again so much power in these graphs. With this one graphlet, I was able to explore geographical variation at county level and even further below, and so it might be
  allowing you to kind of explore different aspects of the data, allowing you to generate more questions. What was happening in Keansburg around this time to make it pop like this?
  That's something you might want to know.
  So that's all I have for you today. Hopefully I've whetted your appetite and was able to clearly illustrate for you how powerful JMP visualization is in exploring the data.
  If you want to know more, there's going to be a plenary talk on data viz. I definitely encourage you to explore that and it kind of helps address different ways of visualization and how JMP can help out with that.
  But I did promise you, at one point, to give you a peek as to how I was able to create these pretty amazing table scripts and I'll do that right now.
  It's called the enhanced log now in JMP 16. This is one of the coolest new features in JMP 16. Enhanced log actually follows along as you interact and it keeps track of it. And so whenever I closed, in this case, closed a data table, opened a data table, ran a data table,
  if I added a new column, if I created a new graph, it gets recorded here in the log.
  This is something that John Sall will be talking about in his plenary talk. It's, again, one of the most amazing new features here.
  And this is the key to how I was able to create these table scripts. I can honestly say that if this hadn't been present, I probably wouldn't have been able to create these pretty cool table scripts, because it'd be a lot of work to do.
  So again, this is a really cool feature that's available in JMP 16. So I hope I was able to convince you that JMP is a great tool for exploring data, for creating awesome visualizations, interactive visualizations. And that's all I have. Thank you for coming.