If you missed the Statistically Speaking episode on Data Visualization for Scientists and Engineers, the on-demand version is now available. We heard some really inspiring perspectives on the importance of data visualization throughout the whole analytic workflow!
Amoolya Singh, Head of Discovery Technologies at Calico Labs, gave an amazing plenary talk, “Learning to see patterns in data: How data visualization facilitates the analytic workflow.” Who knew that weaving, computing and visualization were so interconnected? Fascinating!
Amoolya also joined the panel discussion with Kathleen Schneider, Senior Research Associate at Lundbeck; and Scott Wise, Principal Analytic Training Consultant at SAS. Since we didn’t have time to answer all the questions from our audience, our panelists have kindly agreed to answer some of them here.
Many visuals show data separated by color. What accommodations would you suggest for colorblind people?
Scott:
Xan Gregg, Director of Visualization Development at JMP, has some good recommendations on graphing in JMP for colorblind people:
While the node graphs and the tSNE plots are both data-rich, I often find them information-poor. I feel as though I have not developed the graphical literacy to interpret them. I always wish for graphs and displays that have a pre-attentive quality to them, such that I can see clearly which bits are informative. Any suggestions?
Scott:
Both node graphs (network diagrams, etc.) and tSNE plots (PCA results, etc.) really need to include the context and story in the graph (either verbally explained or featured as text on/around the graph) for you to really understand “how” to view the patterns in the graph. Don’t be afraid to incorporate the telling of the story in conjunction with these useful graphs.
What impact will Web 2.0 have on data science and analytics?
Note: Web 2.0 is the second stage of development of the World Wide Web, characterized especially by the change from static web pages to dynamic or user-generated content and the growth of social media.
Scott:
Web 2.0 is already here (as seen in the rise of social media). If you aren’t gathering this dynamic data source to analyze and visualize trends (like using text mining on social media about your company, products, patents, competitors, etc.), then you are already behind the curve. For a good introduction, check out this “Thriving in a Web 2.0 World” blog from SAS.
3D graphs were brought up, but I've found that they are hard to share in a fixed format. How would you recommend sharing 3D graph findings?
Scott:
The first rule is to stay away from 3D graphs if they do not add value for the viewer. That is why you don’t see 3D bar charts in JMP: they are often confusing and don’t add any value.
The second rule is that when 3D graphs are valuable, use them interactively (with filters, interactive HTML5 formats, or even videos) instead of leaving them in a fixed, static format. Doing so allows the user to properly interact with and interpret them. For example, the JMP 3D scatterplot (as featured in our panel discussion on checking three-factor experimental settings) is considerably less effective outside of JMP if you aren’t able to move the 3D cube around to change the viewpoint orientation of the points within the image. So, for people who don’t use JMP, try simulating the interactivity of exploring a 3D scatterplot by setting up local data filters that change the viewpoint orientation over a period of time, and save the result as interactive HTML5. Another option is to create a short video of yourself exploring the 3D graph that the user can play back.
How have you seen an animated bubble plot used at different companies and industries? Can they be annotated as the animation occurs?
Scott:
I have seen animated bubble plots and maps used in nearly all industries (high tech, healthcare, government, etc.) and in many different organizational areas. In manufacturing/supply chain, animated bubble plots are used successfully to show improvements or trends in products over time (like my example in the panel discussion). In research, animated bubble plots are often used to bring life to growth or changes in products/components throughout their R&D cycles. In sales, animated bubble maps excel in showing sales performance over time across geographic regions. You can create annotations directly on bubble plots, but they won’t move over time. Instead, use the bubble labels to show the desired annotation information over time.
I teach AP statistics so I may be the only one here who is not in the private sector. As we try to grow interest in this field, I am always looking for videos, TED talks and guest speakers who can walk through a collection of real-world, present-day data and create a visualization and demonstrate the process step by step, so my students can see how particular careers might apply to what we are learning. Do you have any suggested videos?
Scott:
For teaching graphicacy, the formula that always seems to work is engaging your students early with compelling graphics that tell a story and make the statistics come alive. I can think of no better example than showing Hans Rosling’s 2007 TED talk “The best stats you’ve never seen.” If they don’t get excited after watching Hans interact with the chart, they probably don’t have a pulse!
Also, the visualization-themed JMP Blogs in the JMP Community are another great place to find intriguing stories of visualizations enhancing statistics across many interesting topics. Make sure to check out the contributions from me, Ryan DeWitt and Byron Wingerd. In addition, the JMP Public site has an array of interesting and often interactive graphics that tell a story!
Please define graphicacy.
Scott:
I like the definition in Wikipedia: “Graphicacy is defined as the ability to understand and present information in the form of sketches, photographs, diagrams, maps, plans, charts, graphs and other non-textual formats.” This definition emphasizes that graphicacy encompasses both understanding and presentation. It also shows that it can happen via many different creative forms.
How can JMP help the user evaluate not only statistical significance, but also practical significance?
Scott:
Practical significance means showing the magnitude of the difference or effect size. Therefore, I think the best way to combine practical and statistical significance is to always include the graph (which demonstrates the practical significance) next to the statistics (which address the statistical significance). Begin with the graph and then show the statistics, as this seems to work best with how our minds process differences in information.
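As a rough sketch outside JMP, the magnitude of a difference can be put next to a test statistic in a few lines of Python. The two groups below are invented numbers, the function names are hypothetical, and pairing Cohen's d with Welch's t is only one common way to show effect size beside a test statistic, not something the panel prescribed.

```python
import math
import statistics


def cohens_d(a, b):
    """Effect size: difference in means divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(b) - statistics.mean(a)) / pooled_sd


def welch_t(a, b):
    """Welch's t statistic for the difference in means (unequal variances allowed)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se


# Hypothetical yields (%) from two process settings; invented for illustration only.
old = [90.1, 90.3, 89.9, 90.2, 90.0, 90.1]
new = [90.4, 90.6, 90.3, 90.5, 90.4, 90.6]

# Magnitude first (practical significance), then the test statistic.
print(f"mean difference: {statistics.mean(new) - statistics.mean(old):.2f}")
print(f"Cohen's d:       {cohens_d(old, new):.2f}")
print(f"Welch's t:       {welch_t(old, new):.2f}")
```

Reading the mean difference and effect size before the test statistic follows the same graph-first, statistics-second order suggested above: a large t on a difference too small to matter in practice is statistically but not practically significant.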
Kathleen:
I work in the biopharmaceutical industry designing downstream (purification/separation) processes so that they can be scaled-up and transferred to manufacturing facilities. Frequently, the processes that I help design are complex and have many interactions. Typically, when I am using JMP, I am trying to solve practical problems including increasing process efficiency (greater yield or shorter process times) or mitigating risk (reducing impurities and risk of process failures). Often, increasing the yield or process efficiency will come at the cost of increasing the impurities or increasing the risk of process failure.
The process steps that we design have multiple substeps. Sometimes early process conditions can lead to increased process impurities or other negative outcomes that are not observed until later process steps. Due to the complexities of the interactions, using JMP for design of experiments and multivariate analysis – including evaluating the statistical significance of main factors, squared factors (whether the effect is linear or has a curve) and two-factor interactions – is a powerful way to evaluate which main factors and two-factor interactions are the most important factors to track and to modify, and which factors are not critical for the process. Using JMP for visualizing this data is also very powerful. The data is used to improve the process or reduce process- and product-related impurities or mitigate process failure. The data can also provide a guide for optimizing a process to produce the highest process efficiency with the lowest levels of impurities and lowest risk of process failure.
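To make the three kinds of terms concrete, here is a minimal Python sketch (not JMP output, and not the actual process model) that lists the main, squared and two-factor interaction terms for a set of factors and expands one run into the matching model-matrix columns. The factor names and settings are placeholders, and both function names are hypothetical.

```python
from itertools import combinations


def quadratic_model_terms(factors):
    """List main, squared and two-factor interaction terms, in that order."""
    main = list(factors)
    squared = [f"{f}*{f}" for f in factors]
    interactions = [f"{a}*{b}" for a, b in combinations(factors, 2)]
    return main + squared + interactions


def model_row(settings, factors):
    """Expand one run's factor settings into model-matrix columns.

    Column order: intercept, then the same order as quadratic_model_terms.
    """
    x = [settings[f] for f in factors]
    squares = [v * v for v in x]
    cross = [a * b for a, b in combinations(x, 2)]
    return [1.0] + x + squares + cross  # leading 1.0 is the intercept column


factors = ["A", "B", "C"]  # placeholder factor names
print(quadratic_model_terms(factors))
# One run with coded settings (low = -1, center = 0, high = +1); values are illustrative.
print(model_row({"A": -1.0, "B": 0.0, "C": 1.0}, factors))
```

With three factors this gives three main effects, three squared terms and three two-factor interactions, which shows how quickly the number of terms to estimate grows as more factors are tracked.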
As an example, I was working on a process step as part of a team trying to reduce Impurity U. There was much discussion in the team about whether Factor A or Factor B was more important to the level of Response U. We discovered that we were all wrong. The critical factor turned out to be Factor C! Fortunately, we had been tracking Factor C and had data available to track the factor in the process. From this and other experiences, I recommend tracking as many factors as you can, including factors that you think may not be important. You might be surprised.
We appreciate these experts taking the time to provide their perspectives on some of the many questions we received. The rich experiences they have are evident in these answers, as well as in the on-demand version of this episode of Statistically Speaking.