An analyst’s new year’s resolution: Use models to solve more problems

As a new year begins (was there ever a time you wanted a fresh start more???), you may have begun formulating some new goals for the year or started working toward them already. We are a few days into 2021, so hopefully you haven’t yet abandoned the ones you made!

If you have room for another new year’s resolution (or if you need a new one), may I suggest one that I have heard a lot of analysts articulate: “I would like to use models (or do more modeling) in my work.” This goal sounds more achievable than “I will lose 10 lbs.” or “I will get eight hours of sleep,” right?

Analysts are a motivated bunch, but they often get stuck because they don’t have a method to follow. The biggest barrier I have seen for analysts who want to use models to solve their problems is that they get confused or bogged down by all the outputs they need to look at once they’ve built a model.

“Which model outputs should I be looking at? There are so many! And how should I interpret them?” 

The truth is this: There are a lot of outputs you can look at, and it can be pretty overwhelming if you don’t know which ones are the most important and what they are telling you.

Quick detour

Knowing that this topic (and what I put forth in this series) may spark a lot of debate and discussion, let me tell you a little bit about my background so that you have some context for my point of view. My education is in biomedical engineering, and I worked in medical device R&D for several years before transitioning to a role as an applied statistician for a consumer packaged goods (CPG) company, supporting shampoo formulators in designing and analyzing experiments to better formulate shampoo. (The next time you wash your hair, pay tribute to the statistical models that were used to perfect the lather volume and texture!)

After leaving the CPG company, I spent a good chunk of my career using data and analytics in retail. The scientists, engineers, business analysts and other collaborators I worked with all had to use data, analytics and software to make quick but sound decisions. A few had some formal training in statistics, while the majority simply learned statistics and statistical software on the job and were looking for clear guidance on best practices, with an emphasis on the principle “less is more.” I tried to structure this guidance with these practitioners in mind (myself included).

What you'll learn

In this five-part blog series, I am going to focus on regression model outputs, specifically Standard Least Squares output, because that is the most commonly used modeling method. Standard Least Squares is a great place to begin, particularly for those new to modeling.

First, be comforted by the fact that the statisticians and developers at JMP set up the default reports in a way that presents the most important information to analysts.

For those already using JMP: Have you ever noticed the Emphasis in the Fit Model input dialog and wondered what it is? I know I was pretty confused about it for a while. In contrast to the Personality options right above it, Emphasis does not change the type of analysis that is done. The different options simply turn on different model outputs by default when you click Run. JMP uses the amount of data you have (number of rows) in combination with the number of model terms you want to estimate (number of terms in the Model Effects dialog) to determine the best set of outputs to show you by setting a default Emphasis (Figure 1). JMP is always guiding analysts toward the best path, and the Emphasis selected is no exception. (For more specifics on how it does this, you can read the documentation.)

Figure 1: Standard Least Squares Emphasis. JMP sets a default Emphasis based on your data, but you can change it. Emphasis determines the set of outputs that are shown; it does not affect the statistical analysis.

The outputs that are shown after you click Run are the ones you should focus on first. The outputs that are the most important given your data and the model you are trying to fit are expanded and displayed.

That being said, there are still a lot of graphs and tables to look at, and without some training, you will not know which outputs to prioritize or how to interpret and act on them. Sadly, there is still no pill that you can take to give you this knowledge!

Until that pill exists, I hope this blog post serves as a quick guide.

Model health and interpretation menu

Model outputs fall into two main categories: model health and interpretation. Model health outputs help you determine if your model is healthy or if you need to perform some “first aid.” Interpretation outputs help you decide which factors are important and how they influence the response(s).

I have developed the menu in Figure 2 to help you focus on the key outputs based on how deep you want to go (or how hungry you are, so to speak). I think it’s only appropriate to use healthy foods in this menu because we are all on our best behavior (or have the best intentions) when setting our new year’s resolutions! My previous drafts of this blog post used a fast-food menu board, which seemed…incongruous. I offer you three meals:

Figure 2: Focus on the key outputs in Standard Least Squares. Choose your meal based on your appetite!

 

  • The Veggie Plate has the core outputs only and is for the analyst who is looking to move quickly or whose analysis allows for that simplicity.

The Veggie Plate

  • The Veggie Plate + Healthy Fat (salmon is one of my faves!) includes the Veggie Plate outputs plus additional outputs that help you find remedies for poor model health. It also suits analysts who really like getting into the data. Omega-3s have so many healing benefits!

Veggie Plate + Healthy Fat

 

  • Finally, the Full Square Meal includes outputs (“Whole Grains”) that are appropriate when you are struggling to get a good model or when you are just feeling extra hungry for more outputs to help you understand your problem.

The Full Square Meal: Veggie Plate + Healthy Fat + Whole Grains

 

Taste testing the menu

If you want to follow along, I’ll be using the Tiretread data set in JMP’s sample data library. If you don’t have JMP, you can download a free 30-day trial or use it in a remote desktop environment when you take the Statistical Thinking for Industrial Problem Solving course (also free; link in the P.S.).

The Tiretread data set comes from a designed experiment where the goal was to optimize four product characteristics (responses), Abrasion, Modulus, Elongation and Hardness, using three ingredients (factors): Silica, Silane and Sulfur. The analysis of the experiment involved building and using four standard least squares models (one for each response) relating the factors (via model terms describing main effects, interactions and quadratics*) to each of the four responses.

 

*Not to beat a dead horse, but the Statistical Thinking for Industrial Problem Solving modules on Correlation and Regression and the Design of Experiments will help you understand the terminology better.
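For readers who like to see the mechanics, here is a minimal, dependency-free Python sketch of what "a standard least squares model with main effects, interactions and quadratics" means for three factors. It is an illustration only, not how JMP's Fit Model platform is implemented. The function names (`term_names`, `model_row`, `solve`, `fit_least_squares`), the three-level design, the coefficients and the noise are all made up for the example; none of the numbers come from the Tiretread data.

```python
import itertools
import random


def term_names(factors):
    """Labels for intercept, main effects, two-way interactions and quadratics."""
    interactions = [f"{a}*{b}" for a, b in itertools.combinations(factors, 2)]
    quadratics = [f"{a}*{a}" for a in factors]
    return ["Intercept"] + list(factors) + interactions + quadratics


def model_row(x):
    """Expand one run's factor settings into the model terms, in the same order."""
    interactions = [a * b for a, b in itertools.combinations(x, 2)]
    quadratics = [a * a for a in x]
    return [1.0] + list(x) + interactions + quadratics


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_least_squares(X, y):
    """Least squares estimates via the normal equations (X'X) b = X'y."""
    rows = [model_row(x) for x in X]
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(XtX, Xty)


# Hypothetical design: every combination of three coded levels (-1, 0, 1).
design = [list(run) for run in itertools.product([-1.0, 0.0, 1.0], repeat=3)]

# Made-up "true" coefficients, one per model term, plus a little random noise.
true_coef = [10.0, 2.0, -1.0, 0.5, 0.0, 1.5, 0.0, -2.0, 0.0, 0.0]
rng = random.Random(0)
response = [
    sum(c * v for c, v in zip(true_coef, model_row(x))) + rng.gauss(0.0, 0.1)
    for x in design
]

coef = fit_least_squares(design, response)
for name, value in zip(term_names(["Silica", "Silane", "Sulfur"]), coef):
    print(f"{name:>15s}: {value: .3f}")
```

Three factors give ten model terms: one intercept, three main effects, three two-way interactions and three quadratics. The normal equations are used here only because they are short to write out; statistical software typically relies on more numerically stable methods and reports far more than the coefficient estimates.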

Summary

I hope you will be bold in 2021 and pursue this goal of using models in your work and learning how to interpret the outputs. Know that JMP has your back with the default settings in the Fit Model platform. JMP considers the amount of data you have and the number of effects you are trying to estimate and presents the most important outputs to you.

And I also have your back! I hope you will continue reading the rest of this blog series to get an additional boost of confidence using models.

In the next three posts, I will describe how to interpret the outputs that are in each of the three “meals” and what actions to take in response to them where appropriate. In the last blog post of this series, I will show how you can set your Preferences in JMP so that your favorite outputs always show up in your model results.

Back in a week!

P.S. To go deeper on the topic of modeling, I highly recommend the free Statistical Thinking for Industrial Problem Solving course we offer, specifically the Correlation and Regression module. It covers the statistical concepts in more depth and includes hands-on exercises as well (along with free access to our software).

Last Modified: Feb 3, 2021 4:14 PM
Comments
Phil_Kay
Staff

Nice. Great idea for a blog series, @wendytseng. This should be really valuable for our scientist and engineer customers that are struggling with how to use their data. If you can understand Fit Model and standard least squares you can go a long way. And I really like your use of analogy.

qacarolk
Level I

Great information - love the pictures! Looking forward to more meals.....  

@wendytseng 

wendytseng
Staff

Thanks for your interest, @qacarolk ! Excited to have you join in the discussion.

P_Bartell
Level VIII

Love the food/meal metaphor. How about looking over the 'ingredients' BEFORE the main course? I found plotting univariate to multivariate response and predictor vs. response graphs before modeling a great way to find outliers, suspicious data, or features in the data that might influence the quality of the results of the main course. It's a great way to ensure that the ingredients in the main course are indeed fresh and ready for consumption.

wendytseng
Staff

@P_Bartell I love the analogy!  That is a great idea for a follow-up blog post: Distribution, Outlier Screening, Missing Data, etc.