Why model visualization is integral to model building
Apr 5, 2016 11:00 AM
Model visualization? Data visualization has gained traction in the past few years, with numerous interesting books and talks focusing on improving our data visualization skills. (JMP’s own Xan Gregg recently spoke about data visualization on Analytically Speaking.) Model visualization is simply applying data visualization to models.
When I can “see” the model, my confidence in it and my ability to share and explain it both increase. While the Profiler is one model visualization tool in JMP not to be overlooked, here I will look at a simple data plot for a specific type of model as an illustration of model visualization.
One area that I work in is medical diagnostics (e.g., a blood test for some condition). We often start with a large number (say 1,000) of potential model factors (often referred to as markers, as in a biological marker for a disease, condition or state) and then work to reduce this to a small number (say 10) of factors to build an algorithm (model) that can be implemented on a piece of lab equipment.
Throughout this process, we have candidate models to evaluate. ROC curves and the area under the ROC curve (AUC) are one standard way to evaluate such diagnostic models. However, while these are helpful, they don’t tell the whole story. Curves with different shapes can have similar AUCs, and often one region of the curve (say where sensitivity is high) is of more clinical interest than the curve as a whole.
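To make the point about curve shape concrete, here is a minimal sketch in Python (not JMP) using made-up scores for two hypothetical models. The data, the function names and the choice to compare specificity at 100% sensitivity are all assumptions for illustration; AUC is computed as the fraction of positive/negative pairs the model ranks correctly, with ties counted as half.

```python
def auc(pos, neg):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def specificity_at_full_sensitivity(pos, neg):
    """Specificity when the cut-off is set low enough to catch every positive."""
    cutoff = min(pos)  # score >= cutoff is called positive
    return sum(n < cutoff for n in neg) / len(neg)


# Made-up model outcomes (higher score = more likely diseased)
model_a = {"pos": [1.5, 3.5, 5, 6], "neg": [1, 2, 3, 4]}
model_b = {"pos": [4, 5, 6, 6.5], "neg": [1, 2, 3, 7]}

for name, m in (("A", model_a), ("B", model_b)):
    print(
        f"Model {name}: AUC = {auc(m['pos'], m['neg']):.2f}, "
        f"specificity at 100% sensitivity = "
        f"{specificity_at_full_sensitivity(m['pos'], m['neg']):.2f}"
    )
# Model A: AUC = 0.75, specificity at 100% sensitivity = 0.25
# Model B: AUC = 0.75, specificity at 100% sensitivity = 0.75
```

Both toy models have the same AUC, yet they behave very differently in the high-sensitivity region, which is exactly the kind of difference a single summary number hides.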
In addition to ROC curves, I have found that a plot of the model outcome by disease state, showing the individual data points together with violin plots (as shown below), aids in evaluating and understanding the model. This plot gives a sense of how well the model separates the two groups and an immediate feel for how shifting the cut-off (the value used to distinguish positive from negative) affects diagnostic performance.
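For readers working outside JMP, a rough Python sketch of the same idea follows. The scores are made up, the function names are hypothetical, and I am assuming that a score at or above the cut-off is called positive. The plotting function uses matplotlib and is only one possible way to draw points over violins; the loop at the end prints the numbers that the eye reads off the plot as the cut-off moves.

```python
import random


def sens_spec(pos, neg, cutoff):
    """Sensitivity and specificity when score >= cutoff is called positive."""
    sensitivity = sum(s >= cutoff for s in pos) / len(pos)
    specificity = sum(s < cutoff for s in neg) / len(neg)
    return sensitivity, specificity


def plot_by_disease_state(pos, neg, cutoff):
    """Violin plots with jittered data points, one column per disease state."""
    import matplotlib.pyplot as plt  # imported lazily; only the plot needs it

    rng = random.Random(0)  # fixed seed so the jitter is reproducible
    fig, ax = plt.subplots()
    ax.violinplot([neg, pos], positions=[0, 1], showmedians=True)
    for x, scores in ((0, neg), (1, pos)):
        jittered_x = [x + rng.uniform(-0.08, 0.08) for _ in scores]
        ax.scatter(jittered_x, scores, s=12, color="black", zorder=3)
    ax.axhline(cutoff, linestyle="--", color="gray")  # candidate cut-off
    ax.set_xticks([0, 1])
    ax.set_xticklabels(["Negative", "Positive"])
    ax.set_xlabel("Disease state")
    ax.set_ylabel("Model outcome")
    return fig


# Made-up model outcomes for illustration only
neg = [0.10, 0.20, 0.25, 0.30, 0.45, 0.55]
pos = [0.40, 0.50, 0.60, 0.70, 0.80, 0.90]

for cutoff in (0.35, 0.50, 0.65):
    sens, spec = sens_spec(pos, neg, cutoff)
    print(f"cut-off {cutoff:.2f}: sensitivity {sens:.2f}, specificity {spec:.2f}")
# cut-off 0.35: sensitivity 1.00, specificity 0.67
# cut-off 0.50: sensitivity 0.83, specificity 0.83
# cut-off 0.65: sensitivity 0.50, specificity 1.00
```

Calling `plot_by_disease_state(pos, neg, 0.50).savefig("scores.png")` would write the figure to a file; sliding the dashed line up or down corresponds to the trade-off printed above.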
Such a plot is also useful in understanding whether the diagnostic test may work better as a three-way test, where you can confidently identify negative and positive subjects but are left with subjects in the middle for whom additional information is warranted before a clinical judgment is made.
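A three-way test amounts to using two cut-offs instead of one. The sketch below is a hypothetical Python illustration with made-up scores and arbitrarily chosen cut-offs (0.40 and 0.60); in practice the cut-offs would be chosen from the data and the clinical requirements.

```python
from collections import Counter


def three_way_call(score, low, high):
    """Classify a score as negative, positive, or indeterminate."""
    if score < low:
        return "negative"
    if score >= high:
        return "positive"
    return "indeterminate"  # middle zone: gather more information


# Made-up model outcomes for illustration only
neg = [0.10, 0.20, 0.25, 0.30, 0.45, 0.55]
pos = [0.40, 0.50, 0.60, 0.70, 0.80, 0.90]
low, high = 0.40, 0.60  # assumed cut-offs, not taken from any real test

# Tally test calls separately for truly negative and truly positive subjects
calls_neg = Counter(three_way_call(s, low, high) for s in neg)
calls_pos = Counter(three_way_call(s, low, high) for s in pos)

print("True negatives called:", dict(calls_neg))
print("True positives called:", dict(calls_pos))
# True negatives called: {'negative': 4, 'indeterminate': 2}
# True positives called: {'indeterminate': 2, 'positive': 4}
```

With these toy numbers, every confident call is correct, at the cost of leaving 4 of the 12 subjects in the indeterminate zone.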
Model visualization is an integral part of my model building strategy. Do you have a favorite model visualization?