June 22, 2010

Why I’m skeptical of Nate Silver’s pollster ratings

Mark Blumenthal quoted me in a blog post at Pollster.com (where I frequently cross-post) commenting on Nate Silver’s pollster ratings — here’s what I wrote:

It’s not necessarily true that the dummy variable for each firm (i.e. the “raw score”) actually “reflects the pollster’s skill” as Silver states. These estimates instead capture the expected difference in accuracy of that firm’s polls controlling for other factors — a difference that could be the result of a variety of factors other than skill. For instance, if certain pollsters tend to poll in races with well-known incumbents that are easier to poll, this could affect the expected accuracy of their polls even after adjusting for other factors. Without random assignment of pollsters to campaigns, it’s important to be cautious in interpreting regression coefficients.
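To make the selection concern concrete, here is a minimal simulation sketch. It is not Silver's model or data. Everything in it is an assumption chosen for illustration: two hypothetical firms with identical skill, "easy" races with an error standard deviation of 2 points and "hard" races with 5 points, and a race-difficulty variable that the analyst does not observe and so cannot control for.

```python
import random
import statistics

def simulate_firm(share_easy_races, num_polls, rng):
    """Return absolute poll errors for one hypothetical firm.

    Every firm uses the same error process (identical skill);
    only the mix of easy and hard races differs.
    """
    errors = []
    for _ in range(num_polls):
        easy = rng.random() < share_easy_races
        # Assumed values: easy races are less noisy than hard ones.
        race_noise_sd = 2.0 if easy else 5.0
        errors.append(abs(rng.gauss(0.0, race_noise_sd)))
    return errors

rng = random.Random(0)  # fixed seed so the sketch is reproducible
firm_a = simulate_firm(share_easy_races=0.8, num_polls=5000, rng=rng)
firm_b = simulate_firm(share_easy_races=0.2, num_polls=5000, rng=rng)

mean_a = statistics.mean(firm_a)
mean_b = statistics.mean(firm_b)
gap = mean_b - mean_a  # positive means firm B looks less accurate

print(f"Firm A mean absolute error: {mean_a:.2f}")
print(f"Firm B mean absolute error: {mean_b:.2f}")
print(f"Apparent accuracy gap:      {gap:.2f}")
```

Under these assumptions firm A looks more accurate than firm B even though the two are equally skilled by construction. The gap comes entirely from which races each firm polls.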

Silver has expressed bewilderment at the criticism he has received and the lack of competing measures, writing that “the task is not inherently more daunting than, say, trying to determine which are the most skilled baseball players are based on their batting averages.” As the comment above suggests, I think this is incorrect.

The problem is that polling isn’t like baseball. The performance of hitters is measured hundreds of times per year in a wide range of contexts that the hitters generally do not select (setting aside platoons, pinch-hitting, etc.), and it is relatively easy to adjust for external factors such as stadiums, opposing pitchers, and defense. By contrast, pollsters must be hired by candidates or assigned by staff to poll in specific races, so the races they poll are not a random sample. In addition, their performance (as measured by forecasts of election outcomes) is measured very infrequently: only fifteen firms in Silver’s ratings have a sample size of more than fifty polls. It is also subject to inherent sampling error. For all of these reasons, it’s not clear that we can distinguish between the performance of most pollsters without placing heroic faith in our modeling assumptions and neglecting the error associated with the projected ratings.
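The small-sample point can be sketched with a simple simulation. This is an illustration under assumed numbers, not a reanalysis of Silver's ratings: fifty hypothetical firms of identical skill, each with poll errors drawn from the same normal distribution (a standard deviation of 4 points is an arbitrary choice), rated by their average absolute error.

```python
import random
import statistics

def rating_spread(num_firms, polls_per_firm, seed=0):
    """Gap between the best and worst average error across firms.

    All firms draw errors from the same distribution, so any gap
    in their ratings is sampling noise rather than skill.
    """
    rng = random.Random(seed)
    averages = []
    for _ in range(num_firms):
        # Assumed error process: normal with a 4-point standard deviation.
        errors = [abs(rng.gauss(0.0, 4.0)) for _ in range(polls_per_firm)]
        averages.append(statistics.mean(errors))
    return max(averages) - min(averages)

# Hypothetical sample sizes: a handful of polls versus hundreds.
spread_small = rating_spread(num_firms=50, polls_per_firm=10)
spread_large = rating_spread(num_firms=50, polls_per_firm=500)

print(f"Best-to-worst gap with 10 polls per firm:  {spread_small:.2f}")
print(f"Best-to-worst gap with 500 polls per firm: {spread_large:.2f}")
```

With only ten polls per firm, the best and worst firms appear far apart despite having the same skill. The gap narrows substantially only when each firm is measured hundreds of times, which is the situation of hitters rather than most pollsters.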