If you remember, in both ANOVA and simple linear regression we calculate the total Sums of Squares the same way: we take each data point, calculate the difference between it and the overall mean of all the data points, square that difference, and add up the squares. So the total Sums of Squares (or SST) is the same whether we look at Rihanna songs continuously by year or by pre- and post-2010. So where’s the difference?
The difference is in the Sums of Squares “Model”, or SSM (in regression, we call this Sums of Squares Regression; in ANOVA, we sometimes call this Sums of Squares Group). In either case, our “Model” is how we think X will affect our Y, our variable of interest. In the regression case, we think year will have a continuous linear effect on Y – each additional unit of X adds the same amount to Y. In the binned ANOVA case, we think that being pre- or post-2010 will have a group effect on Y.
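Written out, the two models differ only in what they predict for each song. The notation here is my own shorthand rather than anything from the original figures: y_i is the number of distinct words in song i, x_i is its year, ȳ is the overall mean, and ŷ_i is the model's prediction for song i.

```latex
\begin{aligned}
\text{SST} &= \sum_{i=1}^{n} (y_i - \bar{y})^2
  && \text{(same for both models)} \\
\text{SSM} &= \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2,
\qquad
\text{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,
\qquad
\text{SST} = \text{SSM} + \text{SSE} \\
\text{Regression:}\quad \hat{y}_i &= b_0 + b_1 x_i \\
\text{Binned ANOVA:}\quad \hat{y}_i &= \bar{y}_{g(i)},
  \quad g(i) \in \{\text{pre-2010},\ \text{post-2010}\}
\end{aligned}
```

Since SST is fixed, whatever a model fails to capture in SSM ends up in SSE.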
Our SSM for ANOVA bins all the pre-2010 songs together (and all the post-2010 songs together), so instead of a general downward trend across the years, we would simply expect pre-2010 songs to have a higher number of distinct words than post-2010 songs. Under this ANOVA model, our best guess for any pre-2010 song would be the pre-2010 group mean. Everything else is considered error.
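To make the comparison concrete, here is a minimal sketch in plain Python. The data are made up for illustration (they are not the real song data), the names `sums_of_squares`, `fitted_reg`, and `fitted_anova` are hypothetical, and I am assuming that 2010 itself counts as "post".

```python
from statistics import mean

# Hypothetical data (NOT the real song data): three songs per year, 2005-2016,
# with a made-up downward trend in distinct words plus a small fixed wiggle.
years = [y for y in range(2005, 2017) for _ in range(3)]
wiggle = [4, -7, 2, -3, 6, -1, 5, -6, 0, 3, -4, 1]
words = [120 - 3 * (y - 2005) + wiggle[i % len(wiggle)]
         for i, y in enumerate(years)]


def sums_of_squares(y, fitted):
    """Return (SST, SSM, SSE) for observed values y and model predictions."""
    grand_mean = mean(y)
    sst = sum((yi - grand_mean) ** 2 for yi in y)
    ssm = sum((fi - grand_mean) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return sst, ssm, sse


# Regression model: least-squares line through (year, words).
x_bar, y_bar = mean(years), mean(words)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(years, words))
         / sum((x - x_bar) ** 2 for x in years))
intercept = y_bar - slope * x_bar
fitted_reg = [intercept + slope * x for x in years]

# Binned ANOVA model: every song is predicted by its group mean.
# Assumption: 2010 itself is counted as "post".
pre_mean = mean([w for x, w in zip(years, words) if x < 2010])
post_mean = mean([w for x, w in zip(years, words) if x >= 2010])
fitted_anova = [pre_mean if x < 2010 else post_mean for x in years]

sst_reg, ssm_reg, sse_reg = sums_of_squares(words, fitted_reg)
sst_anova, ssm_anova, sse_anova = sums_of_squares(words, fitted_anova)

print(f"Regression: SST={sst_reg:.1f}  SSM={ssm_reg:.1f}  SSE={sse_reg:.1f}")
print(f"ANOVA:      SST={sst_anova:.1f}  SSM={ssm_anova:.1f}  SSE={sse_anova:.1f}")
```

SST is identical in both rows because it never looks at the model. For these made-up, roughly linear data, the regression model captures more of it as SSM, and the binned model pushes the within-group trend into error.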
The vertical distance between the red and blue highlighted dots is the average difference in power when you dichotomize vs. leave your variable as continuous. When your continuous model has 80% power (n=20), your dichotomized model has only about 65% power – that’s a drop of 15 percentage points! To get back to 80% power with your dichotomized model, you’d need a larger sample (more cells, humans, or isotopes), and NSF won't pay for that!
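If you want to see this kind of power loss for yourself, a small simulation is enough. The sketch below is my own setup, not the one behind the figure: I assume a true correlation of 0.58 (picked so the continuous test lands near 80% power at n=20), a median split for the dichotomized version, and a two-sided 5% test. Both tests have n - 2 = 18 degrees of freedom, so they share the critical value 2.101; the pooled two-sample t statistic is the same as the correlation-based t statistic computed on a 0/1 predictor. Expect the dichotomized power to come out lower, though not necessarily at the exact 65% shown above.

```python
import math
import random

N = 20          # sample size, as in the text
RHO = 0.58      # ASSUMED true correlation (gives roughly 80% power at n=20)
T_CRIT = 2.101  # two-sided 5% critical value of t with n - 2 = 18 df
N_SIMS = 5000   # number of simulated studies


def t_from_correlation(x, y):
    """t statistic for H0: no linear association between x and y.

    With a 0/1 predictor this equals the pooled two-sample t statistic.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r * r))


def simulate_power(seed=1):
    """Return (power with continuous x, power with median-split x)."""
    rng = random.Random(seed)
    hits_cont = hits_dich = 0
    for _ in range(N_SIMS):
        x = [rng.gauss(0, 1) for _ in range(N)]
        y = [RHO * xi + math.sqrt(1 - RHO ** 2) * rng.gauss(0, 1) for xi in x]
        # Median split: lower 10 observations coded 0, upper 10 coded 1.
        cutoff = sorted(x)[N // 2]
        x_binned = [1.0 if xi >= cutoff else 0.0 for xi in x]
        hits_cont += abs(t_from_correlation(x, y)) > T_CRIT
        hits_dich += abs(t_from_correlation(x_binned, y)) > T_CRIT
    return hits_cont / N_SIMS, hits_dich / N_SIMS


power_cont, power_dich = simulate_power()
print(f"Power, continuous x:   {power_cont:.2f}")
print(f"Power, dichotomized x: {power_dich:.2f}")
```

Changing `RHO` or `N` shows how the gap moves with effect size and sample size.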