In my previous blog entry, I discussed the discrepancy between WordPress and Google Analytics view counts. Today, I'd like to look more closely at what the data may be telling us.
If I were modeling the number of WordPress views a blog entry receives, I would expect Google views alone to be a sufficient predictor. Recalling the scatterplot from the previous post, though, something still looks amiss (for those interested, the R-squared value is 0.44).
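For anyone who wants to reproduce that kind of number outside JMP, here is a minimal sketch, assuming NumPy, of the R-squared for a simple least-squares line predicting WordPress views from Google views. The function name is hypothetical, and the sketch assumes the two view counts are supplied as equal-length arrays.

```python
import numpy as np

def r_squared(x, y):
    """R-squared of an ordinary least-squares fit y ~ b0 + b1*x.

    x: predictor (e.g. Google views per post, hypothetical input)
    y: response  (e.g. WordPress views per post, hypothetical input)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with an intercept column and the single predictor
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ss_res = resid @ resid                 # residual sum of squares
    ss_tot = ((y - y.mean()) ** 2).sum()   # total sum of squares about the mean
    return 1.0 - ss_res / ss_tot
```

With a single predictor and an intercept, this value equals the squared Pearson correlation between the two columns, so an R-squared of 0.44 corresponds to a correlation of about 0.66.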
Since I have additional information available, I thought it would be interesting to see whether anything else drives the WordPress views. As a first try, I want to keep things simple; here are the variables I have at my disposal:
Google views: Should be a good predictor, but I’m not so convinced based on what I see.
Days online: Without Google views, I would expect that the longer a blog post is up, the more views it gets. But does that still hold once Google views are in the model?
Comment count: The number of comments that appear in the bottom of the blog entry.
Tweets: You would think this is self-explanatory, but the numbers don’t always match what shows up in Twitter. These are notionally tweets and retweets, but some adjustment gets made when a person (re)tweets more than once; I’d be curious to hear your own experiences with this.
Facebook “Like”: This is not necessarily what it appears to be. The count combines the number of times the blog post has been shared, the number of “Likes” it receives and the number of people who comment on it. Not that I would know from experience (haha), but this means you can have a discussion about a blog entry on Facebook that has nothing to do with views of the blog post itself.
LinkedIn Shares: This number seems a bit more reliable. It reflects the number of times someone has clicked a link on LinkedIn that brought them to the blog entry. However, this only helps if someone has actually posted the blog entry on LinkedIn.
Did anything seem useful?
My different modeling attempts are best left for another day (or blog post), but here are the results from using Fit Model with Stepwise on the factors above, including main effects and two-factor interactions:
Actual by Predicted Plot
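JMP’s Stepwise platform handled the term selection above. For readers who want to try a similar idea outside JMP, here is a minimal Python sketch, assuming NumPy, of forward selection over main effects plus all two-factor interactions. It is a stand-in, not JMP’s procedure: it scores candidate terms with BIC, which may differ from the stopping rule used for the model above, never removes a term once added, and does not enforce effect heredity (an interaction can enter without its main effects). The function names and column names are hypothetical.

```python
import itertools
import numpy as np

def build_terms(data):
    """Main effects plus all two-factor interactions.

    data: dict mapping factor name -> 1-D array, e.g. hypothetical keys
    "google", "days", "comments", "tweets", "facebook", "linkedin".
    Returns a dict mapping term name -> column.
    """
    terms = {name: np.asarray(col, dtype=float) for name, col in data.items()}
    for a, b in itertools.combinations(list(data), 2):
        terms[f"{a}*{b}"] = terms[a] * terms[b]
    return terms

def bic(X, y):
    """BIC of an OLS fit with Gaussian errors (up to an additive constant)."""
    n = len(y)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sse = max(resid @ resid, 1e-12)  # guard against log(0) on a perfect fit
    return n * np.log(sse / n) + X.shape[1] * np.log(n)

def forward_stepwise(terms, y):
    """Greedily add the term that lowers BIC most; stop when none helps."""
    y = np.asarray(y, dtype=float)
    X = np.ones((len(y), 1))  # start from the intercept-only model
    best = bic(X, y)
    selected = []
    while True:
        candidates = [t for t in terms if t not in selected]
        if not candidates:
            break
        scores = {t: bic(np.column_stack([X, terms[t]]), y) for t in candidates}
        t_best = min(scores, key=scores.get)
        if scores[t_best] >= best:
            break  # no remaining term improves BIC
        selected.append(t_best)
        X = np.column_stack([X, terms[t_best]])
        best = scores[t_best]
    return selected
```

Usage would look like `forward_stepwise(build_terms(data), wordpress_views)`, where `data` holds one array per factor and `wordpress_views` is the response. A greedy search like this can land on a different set of terms than a bidirectional search, which is one reason different modeling attempts on the same data can disagree.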
While it’s not surprising that Google views is in the model, what did surprise me is how much comments drive up the WordPress count. I also might have thought Google views would already account for days online; the fact that they don’t (along with the nonzero intercept) suggests that WordPress counts accumulate much more over time.
The LinkedIn counts also show up as significant, but not quite at the same level as days and comments in some of the other modeling I tried. Based on the data, it’s not possible to tell how many people actually saw the link and were encouraged to click on it. Likewise, with tweets not even showing up in the model, I don’t know how many followers the people who tweeted have. So the LinkedIn and Twitter counts don’t really help us understand view-count popularity.
I also looked for a model that didn’t even use the Google views, and while it had some extra terms, it still looks pretty good:
I should mention that fitting a model for Google views is not very effective (even with the number of days online and the other factors, which is a bit surprising). The model for WordPress views without Google views had more terms, but the biggest drivers were still days online and comments.
This model doesn’t perform that well on posts from previous years. Some posts seem to get lots of traffic, whether due to keywords or some other mechanism. As for the large discrepancy between WordPress and Google, I think the true number of views (whatever that actually means) is somewhere in between. Some of the Google views are so low that it’s hard to imagine so few people have seen some of the entries. However, it’s also hard to gauge how many people have “viewed” entries via the index page for the JMP Blog (http://blogs.sas.com/content/jmp/) rather than clicking on an individual post to read it.
Note to self (and my fellow bloggers): To try to make the top 10 list for 2015, post early and get lots of comments… although I doubt leaving myself many comments will have the desired effect. An anonymous colleague jokingly offered another idea: purposefully add typos, hoping a kind soul leaves a comment to correct it.