adam_terwillige

Joined: Aug 13, 2014
Scagnostics JMP Add-in – A New Way to Explore your Data

Scagnostics (scatterplot diagnostics) was first proposed by John and Paul Tukey and later popularized by Leland Wilkinson and colleagues in Graph-Theoretic Scagnostics (2005). The measures were redefined in High-Dimensional Visual Analytics: Interactive Exploration Guided by Pairwise Views of Point Distributions (2006).

The beauty of scagnostics is the ability to visually explore a dataset. JMP has a built-in feature called Scatterplot Matrix (SPLOM), which lets the user compare the relationships between many pairs of variables simultaneously.

However, SPLOMs lose their effectiveness when the number of variables gets too large, because the number of panels grows quadratically with the number of variables.
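To put numbers on this, a SPLOM of p variables contains p(p-1)/2 distinct pairwise scatterplots. The short Python sketch below only illustrates that arithmetic; the function name `splom_panel_count` is made up for this example and is not part of JMP or the add-in.

```python
from math import comb


def splom_panel_count(n_variables: int) -> int:
    """Number of distinct variable pairs, i.e. unique scatterplots in a SPLOM."""
    return comb(n_variables, 2)


# Print how quickly the panel count grows with the number of variables.
for p in (5, 20, 100, 1200):
    print(f"{p} variables -> {splom_panel_count(p)} scatterplots")
```

Even at 100 variables there are several thousand panels, far more than anyone can scan by eye.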

That’s where scagnostics comes in! Scagnostics assesses five aspects of scatterplots: outliers, shape, trend, density, and coherence.

This summer, I had the privilege of writing this JMP add-in, which allows the user to interactively explore data using nine graph-theoretic measures. The add-in combines three existing features of JMP: Distribution, Scatterplot Matrix, and Graph Builder. Each point in the scatterplot matrix represents a 2D scatterplot of one pair of variables. When the user selects a point in the scatterplot matrix in the bottom left, Graph Builder shows the corresponding scatterplot for the two variables in the bottom right.

7104_ScagnosticsExample.PNG
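The idea that every pair of variables becomes a single point can be sketched outside JMP. The Python example below is not the add-in's code. It computes only one measure, Monotonic, which the Wilkinson papers define as the squared Spearman rank correlation, and it builds a small table with one row per pair of columns. The function names and the toy data are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def monotonic(x, y):
    """Squared Spearman rank correlation, a value between 0 and 1."""
    return pearson(average_ranks(x), average_ranks(y)) ** 2


def characteristics_table(columns):
    """One row per pair of variables: (name_x, name_y, monotonic score)."""
    return [
        (a, b, monotonic(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]


# Toy data (hypothetical): one strongly monotonic pair, one oscillating column.
data = {
    "x": [1, 2, 3, 4, 5, 6],
    "x_cubed": [1, 8, 27, 64, 125, 216],
    "zigzag": [3, 1, 3, 1, 3, 1],
}

for row in characteristics_table(data):
    print(row)
```

A full scagnostics table would hold one column per measure, so each row (each pair of variables) could be plotted as a point in a scatterplot matrix of the measures, which is the view the add-in provides.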

With scagnostics, exploratory visual data analysis can surface informative pairs of variables that would otherwise be lost in a large scatterplot matrix.

Update: Version 2.1 of the Scagnostics Add-In has been uploaded. It updates the launcher to optionally take a pre-computed table as an argument and adds a red-triangle option to turn on box plots for the distributions.

Comments

Thank you. This is very useful!

Adam, this add-in was intriguing, so I went back to the source paper to get a better understanding of the methodology, and I would encourage others to take a look. My main request: please save the "platform" to the characteristics table so that we can save our tables and revisit the graphics later! A secondary request would be to have the outlier box plots on the histograms to make selecting outliers in one variable even easier. Thanks for the add-in. I have already used it in a data discovery project and plan to use it more in the future. Looking forward to that save platform feature!

Karen

danschikore

karen@boulderstats - thanks for your comments. I think the updated Add-In will address both of your suggestions. You can now save the characteristics table and provide it when you re-run the Scagnostics tool to avoid recomputing the data (thanks Ian@JMP!). We have also added a red-triangle menu option to show the box plots, which does make it easier to select the outliers.

Thank you! I ran this on a data set with 1200 columns (while I slept) for fun. In doing so, I found two markers (out of the 1200) that had plate-to-plate differences (2 plates' worth of data). That is truly a needle in a haystack (or 2 needles in the haystack).

colemde

Great concept! Unfortunately, it does not appear to work in JMP 13; it fails with a "Name Unresolved" error. I have tried with multiple datasets.

mkennke

I have the same problem with JMP 13 Pro: "Name Unresolved".