Logistic regression and why I lose at board games to my daughter
There are lots of things that make SAS a great place to work. For me, picking a favorite thing about working at SAS is pretty easy: the onsite child care. I have two children who have been in the onsite day care and preschool, and the teachers and facilities have been truly great. Plus, just knowing that they're nearby throughout the day is awesome (ignoring the occasional meltdown on the commute to/from work). As my older daughter has just started school, I've had plenty of opportunities to reflect on how much I've enjoyed having them on campus.
Last year, one of my daughter's preschool teachers made a board game for each kid in the class. The game is perfect for helping a child learn to play board games: Each turn, you roll a die and move along the board until someone gets to the finish line. Of course, the board is decorated with stickers of popular animated characters to help catch a preschooler's attention. There's only one problem: I seem to always lose. Can I use JMP to figure out what's going on? Better yet, can I use some of the new features in JMP 14?
I always lose playing this homemade board game. My daughter's hippo has a big early lead on my dolphin here.
Why do I keep losing? Could my daughter be better at rolling the die? And let's hope she's not cheating. Getting to the bottom of this issue should be easy: We'll play a handful of games, and I'll record the result of every turn for my daughter and me. Each row in the data set will be a record of who rolled what (1, 2, ..., 6). If I build a model using my data and the player has a significant effect, I'll know something is up.
A snapshot of the data.
With my data collected (about 60 turns for each of us), I just need to build a model for "Roll" as a function of "Player." Since Roll only takes six discrete values (each with equal probability... I hope), I probably should not do a traditional ANOVA, which assumes the response is normally distributed. Instead, we can build an ordinal logistic regression model since there is a natural ordering to the response. Luckily, JMP has us covered. JMP has the Ordinal Logistic personality in Fit Model, and the Generalized Regression (Genreg) platform in JMP Pro added support for ordinal logistic regression in version 14. For the purpose of this post, we'll focus on Genreg since it offers a few powerful features (which we'll see in a follow-up post).
Building an ordinal logistic model in Genreg is easy. From the Fit Model dialog, we specify Roll as the response and Player as the only model effect. Then we select "Generalized Regression" from the Personality drop-down menu. Since the Roll column has an Ordinal modeling type, the model dialog knows to make the distribution Ordinal Logistic by default. If the response column did not have an ordinal type, we would not have been able to choose Ordinal Logistic.
Launch Genreg from the Fit Model dialog in JMP Pro.
So what do the results look like? Since ordinal regression is not new to JMP, I won’t go into too much detail about the model we’ve fit. Instead, you can look at JMP documentation or the UCLA Institute for Digital Research and Education site for more information. What are we looking for? If the coefficient for my daughter is negative and significantly different from zero, I know I have a problem. That would suggest that my daughter is more likely than I am to roll larger numbers. Looking at the model output, I see that she has a negative coefficient (-0.458), but the p-value (0.31) wouldn’t be significant at any reasonable testing level. That means she tended to roll better numbers than I did over the games I recorded, but the difference is no bigger than we would expect from chance alone. The Prediction Profiler is a nice way to see what our model is telling us. The probability lines are diagonal because poor old dad is slightly more likely to roll lower numbers like 1, 2 or 3.
Ordinal Logistic Results
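To make the sign of the coefficient concrete, here is a minimal Python sketch of a cumulative logit model. It assumes the parameterization P(Roll <= j) = logistic(alpha_j + shift), which is consistent with the reading above that a negative coefficient favors larger rolls. The thresholds are chosen to reproduce a fair die, and applying the -0.458 estimate as a single shift is for illustration only, since JMP's coding of the Player effect may differ. The function names are hypothetical.

```python
import math


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_probs(thresholds, shift):
    """Category probabilities under P(Y <= j) = logistic(alpha_j + shift).

    This parameterization is an assumption made for illustration.
    """
    cumulative = [logistic(alpha + shift) for alpha in thresholds] + [1.0]
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs


# Thresholds that reproduce a fair six-sided die when shift = 0:
# alpha_j = logit(j / 6) for j = 1..5.
thresholds = [math.log((j / 6) / (1 - j / 6)) for j in range(1, 6)]

fair = ordinal_probs(thresholds, 0.0)
shifted = ordinal_probs(thresholds, -0.458)  # illustrative negative coefficient

for face, (p_fair, p_shift) in enumerate(zip(fair, shifted), start=1):
    print(f"Roll {face}: fair={p_fair:.3f}  shifted={p_shift:.3f}")
```

A single negative shift lowers every cumulative probability at once, so probability mass moves from the small faces toward the large ones. That is why one parameter is enough to describe the Player effect in the ordinal model.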
Our response has an obvious ordering: Two is better than one, three is better than two, and so on. But we could ignore that information and treat the response as if it had a nominal modeling type. Genreg can handle that case now, too, since we added support for the multinomial distribution in version 14. When we fit the ordinal model, including the player effect contributed only a single parameter to our model. With the multinomial, the player effect contributes five parameters to our model -- one parameter for each level of our response except the last. This makes the multinomial model much more flexible than the ordinal model, but the model can quickly become unstable if we start adding too many effects. And as we see in the model output, the multinomial model does not suggest that there is a significant Player effect.
Multinomial results - still no evidence that my daughter is better with the die.
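The difference in model size is easy to tabulate. The short Python sketch below counts parameters under the usual textbook parameterizations: an ordinal model has one threshold per response level except the last plus one slope per effect column, while a multinomial model has a separate intercept and slope set for each level except the last. The function names are hypothetical, and JMP's report may organize the estimates differently.

```python
def ordinal_param_count(n_levels, n_effect_columns):
    """Thresholds (n_levels - 1) plus one shared slope per effect column."""
    return (n_levels - 1) + n_effect_columns


def multinomial_param_count(n_levels, n_effect_columns):
    """An intercept and a full set of slopes for each level except the last."""
    return (n_levels - 1) * (1 + n_effect_columns)


n_levels = 6  # faces of the die

# Player has two levels, so it enters the model as a single column.
for n_columns in (1, 5, 10):
    print(
        f"{n_columns:2d} effect column(s): "
        f"ordinal={ordinal_param_count(n_levels, n_columns)}, "
        f"multinomial={multinomial_param_count(n_levels, n_columns)}"
    )
```

With only Player in the model the gap is small, but each additional effect column costs the multinomial model five parameters against one for the ordinal model, which is where the instability comes from with only about 120 rolls of data.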
If she isn't cheating, why do I keep losing?
Neither model suggests that my daughter has a significant advantage rolling the die, so why am I always losing? In our house, we have a rule when we play games: The youngest player goes first. And with only 22 spaces on the game board, whoever goes first has a pretty substantial advantage in getting to the goal first. For such a simple game, it was easy to write a JSL program to simulate the game and estimate how likely the first player is to win. For our game, it looks like the first player has about a 62 percent chance of winning. So I’ll lose a little more than three out of every five games we play. That sounds about right. Out of curiosity, I went all the way out to 100 spaces on the board, and the first player still has a decent advantage of about 55 percent.
Simulated probability that the player who goes first will win
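For readers without JMP, the first-player advantage can be sketched in Python. This is not the JSL script from the post. It assumes both players start at space 0, take turns rolling a fair six-sided die, and that the first to reach or pass the last space wins; the real board's rules may differ in the details. The block includes both a seeded simulation and an exact calculation, and all function names are hypothetical.

```python
import random


def simulate_first_player_win_rate(n_spaces, n_games, seed=0):
    """Monte Carlo estimate of how often the player who rolls first wins."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_games):
        first = second = 0
        while True:
            first += rng.randint(1, 6)
            if first >= n_spaces:
                wins += 1
                break
            second += rng.randint(1, 6)
            if second >= n_spaces:
                break
    return wins / n_games


def rolls_to_finish_pmf(n_spaces, sides=6):
    """pmf[t - 1] = probability a single player finishes on exactly roll t."""
    unfinished = {0: 1.0}  # position -> probability of being there, not done
    pmf = []
    while unfinished:
        next_unfinished = {}
        finished = 0.0
        for position, prob in unfinished.items():
            for face in range(1, sides + 1):
                new_position = position + face
                if new_position >= n_spaces:
                    finished += prob / sides
                else:
                    next_unfinished[new_position] = (
                        next_unfinished.get(new_position, 0.0) + prob / sides
                    )
        pmf.append(finished)
        unfinished = next_unfinished
    return pmf


def exact_first_player_win_prob(n_spaces):
    """The first player wins if they need no more rolls than the second."""
    win = 0.0
    second_still_playing = 1.0  # P(second player needs at least t rolls)
    for p in rolls_to_finish_pmf(n_spaces):
        win += p * second_still_playing
        second_still_playing -= p
    return win


for n_spaces in (22, 100):
    simulated = simulate_first_player_win_rate(n_spaces, 20000)
    exact = exact_first_player_win_prob(n_spaces)
    print(f"{n_spaces} spaces: simulated={simulated:.3f}, exact={exact:.3f}")
```

The exact calculation makes the source of the advantage visible: the first player wins every game in which both players would need the same number of rolls, and on a short board such ties are common.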
Want to try it yourself?
Get the data table, which includes scripts for reproducing my results and a script that simulates the win probabilities.
Look for the follow-up!
We've used new features in JMP Pro to solve a pretty important problem here: the mystery of why I've been losing at my daughter's board game. So why did I use Genreg to analyze these data? You'll have to check out my next post to find out. But here's a hint: For bigger problems, these models are particularly easy to overfit, which makes the interactive variable selection tools in Genreg especially useful.