Ever since Ben Shneiderman developed the Treemap in the early 1990s, people have been coming up with new algorithms for laying out its tiles, the rectangles that represent the data. A Treemap is an area-filling visualization: each rectangle's area is proportional to the value it represents, so you can compare values by comparing areas. You may be familiar with the Treemap platform in JMP or with the Treemap option in Graph Builder.
Treemaps in JMP have always generated tiles using a Split algorithm. This algorithm has the advantage of preserving the order of your data in the visualization. It is also stable -- meaning that as you resize the visualization, the rectangles tend to stay in the same general area. This makes it easier to find specific values.
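To make the idea of an order-preserving layout concrete, here is a minimal Python sketch. It does not show JMP's actual Split implementation, which this post doesn't describe in detail. It is a generic recursive binary split under my own assumptions: the function name `split_layout`, the rule of cutting the longer side, and the rule of balancing the two halves by total value are all hypothetical choices made for illustration.

```python
def split_layout(items, x, y, w, h):
    """Lay out (label, value) pairs, in their given order, inside the
    rectangle (x, y, w, h). Values are assumed to be positive.

    Returns a list of (label, x, y, w, h) tuples in the original order.
    """
    if not items:
        return []
    if len(items) == 1:
        return [(items[0][0], x, y, w, h)]

    total = sum(v for _, v in items)

    # Find the cut point that leaves the two groups closest to equal
    # weight. The items are never reordered, only divided in two.
    running, best_k, best_gap = 0.0, 1, float("inf")
    for k in range(1, len(items)):
        running += items[k - 1][1]
        gap = abs(total - 2 * running)
        if gap < best_gap:
            best_k, best_gap = k, gap

    first, second = items[:best_k], items[best_k:]
    frac = sum(v for _, v in first) / total

    if w >= h:
        # Wide rectangle: cut it with a vertical line.
        return (split_layout(first, x, y, w * frac, h)
                + split_layout(second, x + w * frac, y, w * (1 - frac), h))
    # Tall rectangle: cut it with a horizontal line.
    return (split_layout(first, x, y, w, h * frac)
            + split_layout(second, x, y + h * frac, w, h * (1 - frac)))
```

Calling `split_layout([("A", 6), ("B", 3), ("C", 1)], 0, 0, 100, 50)` returns the tiles in the order A, B, C, with areas in the ratio 6:3:1. Because each cut depends only on the data order and on the proportions of the rectangle, tiles tend to stay in the same neighborhood when the rectangle is resized.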
Let's take a look at a Treemap of the San Francisco crime data, which you can find in the sample data folder in JMP.
In this Treemap, we see the number of incidents of each category of crime over a certain time period. The larger the rectangle, the more often that type of crime occurred. It would appear that Larceny/Theft is the most frequent. But which is next? Is it Non-Criminal or Other Offenses? Similar values can be hard to compare, especially when their rectangles are not located near each other. Also, how does Fraud compare to Suspicious Occ (short for Occurrences)? With the current algorithm in JMP, we sometimes get odd-shaped rectangles, either tall and thin or short and fat. This makes it difficult to compare those values as well.
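One way to put a number on "odd-shaped" is the aspect ratio of a tile: the longer side divided by the shorter side. The small Python helper below is my own illustration (the name `aspect_ratio` is hypothetical, not something from JMP). It shows two tiles with the same area that look very different.

```python
def aspect_ratio(w, h):
    """Longer side divided by shorter side; 1.0 means a perfect square."""
    return max(w / h, h / w)


# Two tiles with the same area (100 square units) but different shapes.
square_tile = aspect_ratio(10, 10)   # 1.0
sliver_tile = aspect_ratio(50, 2)    # 25.0
```

Both tiles represent the same value, but the sliver is much harder to judge by eye against the square.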
To help with these issues, Treemap in JMP 12 now includes a new tiling algorithm called Squarify. You got a sneak peek of this in Anne Milley's post, "What aspiring data analysts need to know." Squarify orders the rectangles by size so we can more easily compare similar values. The largest value will always be found in the top-left corner and the smallest value in the bottom-right corner. Clear lines show how the space was divided up. They indicate the order in which the rectangles were generated, and therefore the order of the values.
So let's choose Squarify instead of Split from the Layout menu option. Larceny/Theft is again clearly the largest value. Using the line that extends from the top to the bottom along the right side of Larceny/Theft as a clue, I know that Other Offenses is next largest, followed by Non-Criminal, Assault, Vandalism and so on.
Squarify also attempts to keep each rectangle's aspect ratio close to 1, so it is close to being square. This avoids odd-shaped rectangles that are difficult to compare or in some cases even difficult to see. In the first picture, you may notice a thin red horizontal rectangle in the upper-left corner, above Assault. This represents the value for Arson. Because of the odd shape and the fact that it is a low value, it is difficult to compare and even difficult to find. In the second picture, showing Squarify, you can find Arson near the lower-right corner, but you can see there are other rectangles below it -- this indicates that Arson isn't the smallest value.
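For the curious, here is a rough Python sketch of how a squarified layout can be computed. It follows the general approach published as "Squarified Treemaps" by Bruls, Huizing and van Wijk. I am assuming, not asserting, that JMP's Squarify behaves along these lines. The function names `worst_ratio` and `squarify` are hypothetical, values are assumed to be positive, and y is assumed to grow downward so that (0, 0) is the top-left corner.

```python
def worst_ratio(areas, side):
    """Worst aspect ratio among tiles if `areas` share one row along `side`."""
    s = sum(areas)
    return max(max(side * side * a / (s * s), s * s / (side * side * a))
               for a in areas)


def squarify(items, x, y, w, h):
    """items: (label, value) pairs. Returns (label, x, y, w, h) tuples,
    largest value first, filling the rectangle from the top-left."""
    items = sorted(items, key=lambda kv: kv[1], reverse=True)
    scale = w * h / sum(v for _, v in items)
    todo = [(label, v * scale) for label, v in items]  # values -> areas
    tiles = []
    while todo:
        side = min(w, h)  # rows are always built along the shorter side
        row, n = [todo[0]], 1
        # Keep adding tiles while the worst aspect ratio does not get worse.
        while n < len(todo):
            candidate = row + [todo[n]]
            if (worst_ratio([a for _, a in candidate], side)
                    > worst_ratio([a for _, a in row], side)):
                break
            row, n = candidate, n + 1
        todo = todo[n:]
        thickness = sum(a for _, a in row) / side
        if w >= h:
            # Wide space: stack the row as a column on the left edge.
            cy = y
            for label, a in row:
                tiles.append((label, x, cy, thickness, a / thickness))
                cy += a / thickness
            x, w = x + thickness, w - thickness
        else:
            # Tall space: lay the row out along the top edge.
            cx = x
            for label, a in row:
                tiles.append((label, cx, y, a / thickness, thickness))
                cx += a / thickness
            y, h = y + thickness, h - thickness
    return tiles
```

With the values 6, 6, 4, 3, 2, 2, 1 in a 6-by-4 rectangle, the two largest tiles end up stacked in a 3-by-2 column at the left edge, and the smallest tile lands in the bottom-right corner, which matches the ordering described above.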
So if you are using Treemap and you want to find a specific value, Split might be the right algorithm for you. It will preserve the order of your data and make a specific value easier to find. But if the question you are trying to answer is which value is the largest, or smallest, or if you want to compare similar values, then give Squarify a try. It might make it easier to find the answer.
So what happens if you have nested categories in your Treemap? That is the topic for another blog post.