An analyst’s new year’s resolution: Add some good fat to your diet when necessary

wendytseng · Jan 26, 2021 02:16 PM


Am I reading a statistics blog post or a nutrition and/or self-help guide? If the title of this post threw you off, you are likely new to this series. This post is part three of a five-part series in which I address questions that scientists and engineers have asked me many times over the years: Which model outputs should I look at? What do they mean? What do I do with them?

With the goal of simplifying and providing clear guidance (not of making you hungry or guilty about eating processed foods), I have devised the framework shown in Figure 1. You can use this “menu” to focus on the key outputs for your situation (your data and disposition).

Figure 1: Key outputs in Standard Least Squares. Choose your meal based on your appetite!

In this post, I am going to focus on the outputs that you can look at in addition to the Veggie Plate outputs if you need some first aid for model health issues, or if you are a data nerd and want to nerd it up (you’re in good company, so hold your head high!). First, let’s review the outputs that you can use to answer the question, “Is my model healthy?” Next, let’s examine the outputs that will help you interpret the results to increase your understanding of the process and communicate them to your colleagues. For each of the outputs, I’ll try to be clear and concise about: 1) what the output is, 2) what you should look for and 3) what you can do with the information.

If you want to follow along ("taste test") with the sample data set that I use in the screenshots, see the first blog post.

Is my model healthy?

Lack of Fit

*Lack of Fit
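
If you like to see the arithmetic behind a report, here is a minimal Python sketch of a lack-of-fit calculation. It assumes a straight-line model with one predictor and replicated runs at each setting, and it is not JMP's implementation. The function name `lack_of_fit` and the data values are hypothetical, made up for illustration. The idea is to split the residual sum of squares into pure error (variation among replicates) and lack of fit (what is left over), then compare the two mean squares.

```python
from collections import defaultdict


def lack_of_fit(x, y):
    """Lack-of-fit pieces for a straight-line fit (hypothetical helper).

    Assumes at least three distinct x settings and at least one replicated
    setting whose replicates differ. Returns
    (ss_lof, df_lof, ss_pe, df_pe, f_ratio).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Ordinary least squares for y = intercept + slope * x.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Total residual (error) sum of squares.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

    # Pure error: variation of replicates around their own group mean.
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    ss_pe = 0.0
    for ys in groups.values():
        group_mean = sum(ys) / len(ys)
        ss_pe += sum((v - group_mean) ** 2 for v in ys)

    df_pe = n - len(groups)    # observations minus distinct settings
    df_lof = len(groups) - 2   # distinct settings minus model parameters
    ss_lof = sse - ss_pe
    f_ratio = (ss_lof / df_lof) / (ss_pe / df_pe)
    return ss_lof, df_lof, ss_pe, df_pe, f_ratio


# Made-up data: the response curves, so a straight line should fit poorly.
x = [1, 1, 2, 2, 3, 3]
y = [0.9, 1.1, 3.9, 4.1, 8.9, 9.1]
ss_lof, df_lof, ss_pe, df_pe, f_ratio = lack_of_fit(x, y)
print(f"Lack of fit SS = {ss_lof:.3f} (df {df_lof})")
print(f"Pure error SS  = {ss_pe:.3f} (df {df_pe})")
print(f"F ratio        = {f_ratio:.1f}")
```

A large F ratio suggests that the model is missing something, such as a curvature or interaction term. Turning the F ratio into a p-value requires the F distribution, which this sketch leaves out.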

Box-Cox Transformation

Transforming Data to Make Better Predictions
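
For the curious, here is a rough Python sketch of the idea behind a Box-Cox search. It assumes the simplest possible case, a strictly positive response with an intercept-only model, so it is an illustration rather than what JMP computes for a full regression model. The function names, the lambda grid and the data values are all assumptions made for this example. The transformation is (y^lambda - 1) / lambda, with log(y) used when lambda is 0, and the sketch picks the lambda on the grid that maximizes the profile log-likelihood.

```python
import math


def box_cox(y, lam):
    """Apply the Box-Cox transformation to strictly positive values."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]


def profile_log_likelihood(y, lam):
    """Profile log-likelihood of lambda for an intercept-only model
    (additive constants dropped)."""
    n = len(y)
    z = box_cox(y, lam)
    z_bar = sum(z) / n
    variance = sum((v - z_bar) ** 2 for v in z) / n
    # The second term is the Jacobian of the transformation.
    return -0.5 * n * math.log(variance) + (lam - 1) * sum(math.log(v) for v in y)


def best_lambda(y, grid=None):
    """Return the grid value of lambda with the highest profile log-likelihood."""
    if grid is None:
        # Assumed search grid: -2 to 2 in steps of 0.1.
        grid = [i / 10 for i in range(-20, 21)]
    return max(grid, key=lambda lam: profile_log_likelihood(y, lam))


# Made-up, right-skewed response values.
response = [1.2, 1.5, 1.9, 2.4, 3.1, 4.6, 7.8, 15.3]
lam = best_lambda(response)
print("Suggested lambda:", lam)
print("Transformed response:", [round(v, 3) for v in box_cox(response, lam)])
```

In practice, people often round the suggested lambda to an interpretable value (for example, 0 for a log transformation or 0.5 for a square root) before refitting the model.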

How do I interpret the results?

Variable Importance

Assess Variable Importance

Effect Tests

Summary

In this post, we discussed two outputs to help you assess and, if necessary, remedy model health issues: Lack of Fit and the Box-Cox Transformation. We also discussed two outputs to help you further understand the process and, ultimately, decide how to improve it and communicate your actions to stakeholders: Variable Importance and Effect Tests.

If you are working with a historical data set that may have a lot of correlated predictors, I encourage you to push forward to Part 4 of this series next week, where I’ll be discussing VIFs and how they can help you assess model health. I’ll also be walking through Interaction Plots, which can be a helpful way to visualize how factors influence each other.

As mentioned in previous posts, there is no substitute for the foundational knowledge that the Statistical Thinking for Industrial Problem Solving Course can give you. Consider completing the Correlation and Regression module, specifically. Upon completion, you earn a badge that you can post on LinkedIn and add to your résumé and annual review. Make 2021 the year of personal and professional growth!