Mark Blumenthal quoted me in a blog post at Pollster.com (where I frequently cross-post) commenting on Nate Silver's pollster ratings -- here's what I wrote:
It's not necessarily true that the coefficient on the dummy variable for each firm (i.e. the "raw score") actually "reflects the pollster's skill" as Silver states. These estimates instead capture the expected difference in the accuracy of that firm's polls after controlling for other factors -- a difference that could be the result of a variety of factors other than skill. For instance, if certain pollsters tend to poll in races with well-known incumbents that are easier to poll, this could affect the expected accuracy of their polls even after adjusting for other factors. Without random assignment of pollsters to campaigns, it's important to be cautious in interpreting regression coefficients.
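To make the assignment concern concrete, here's a minimal simulation sketch with made-up numbers (nothing like Silver's actual model or data): two firms with identical skill, where one simply happens to poll the easier races.

    # Minimal sketch (hypothetical data, not Silver's model): two pollsters with
    # identical skill, but Firm A disproportionately polls easier races
    # (e.g., well-known incumbents), so its firm dummy absorbs race difficulty.
    import numpy as np

    rng = np.random.default_rng(0)
    n_polls = 500

    # Race difficulty: "easy" races have a small baseline polling error, "hard" races a larger one.
    easy = rng.random(n_polls) < 0.5
    baseline_error = np.where(easy, 1.0, 4.0)  # expected absolute error, in points

    # Non-random assignment: Firm A mostly polls the easy races.
    firm_a = np.where(easy, rng.random(n_polls) < 0.8, rng.random(n_polls) < 0.2)

    # Both firms have IDENTICAL skill: each poll's error is the race baseline plus the same noise.
    error = baseline_error + rng.normal(0, 1, n_polls)

    # Regress poll error on a Firm A dummy, with no control for race difficulty.
    X = np.column_stack([np.ones(n_polls), firm_a.astype(float)])
    coef, *_ = np.linalg.lstsq(X, error, rcond=None)
    print(f"Firm A 'raw score' advantage: {coef[1]:.2f} points")  # roughly -1.8, despite equal skill

Controlling for observable race characteristics narrows the gap, but only for the characteristics we can actually measure -- which is the basic worry when assignment isn't random.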
Silver has expressed bewilderment at the criticism he has received and the lack of competing measures, writing that "the task is not inherently more daunting than, say, trying to determine which are the most skilled baseball players based on their batting averages." As the comment above suggests, I think this is incorrect.
The problem is that polling isn't like baseball. The performance of hitters is measured hundreds of times per year in a wide range of contexts that the hitters generally do not select (setting aside platoons, pinch-hitting, etc.), and it is relatively easy to adjust for external factors such as stadiums, opposing pitchers, defense, etc. By contrast, pollsters must be hired by candidates or assigned by staff to poll in specific races. In addition, their performance (as measured by forecasts of election outcomes) is measured very infrequently -- only fifteen firms in Silver's ratings have a sample size of more than fifty polls -- and is subject to inherent sampling error. For all of these reasons, it's not clear that we can distinguish the performance of most pollsters without a heroic level of belief in our modeling assumptions and a neglect of the error associated with the projected ratings.
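On the sample-size point, a rough back-of-the-envelope calculation (the 3-point spread below is an assumption for illustration, not an estimate from Silver's data) shows how slowly the uncertainty around a firm's average error shrinks:

    # Rough illustration with made-up numbers: how precisely can we estimate a
    # firm's average poll error from n polls, given poll-to-poll sampling noise?
    import math

    poll_sd = 3.0  # assumed spread (in points) of an individual poll's error

    for n_polls in (20, 50, 200, 1000):
        se = poll_sd / math.sqrt(n_polls)  # standard error of the firm's mean error
        print(f"n={n_polls:4d}: 95% CI half-width of about {1.96 * se:.2f} points")
    # With 20-50 polls, the half-width (roughly 0.8 to 1.3 points) is as large as or
    # larger than plausible skill differences, so most firms can't be told apart statistically.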
I'm fascinated by the effort to rate pollsters. Over the years, we casualty actuaries have pondered a comparable task: rating insurance companies' estimates of ultimate loss.
E.g., an actuary may predict that his or her company's ultimate loss from claims arising from the BP oil spill will be $X. How close does the actual loss come to that estimate?
This is a difficult comparison, because the full ultimate loss won't be known for many years. Nevertheless, the IRS has used this sort of comparison to see if an insurance company has under-reported earnings by over-estimating ultimate loss.
Posted by: David in Cal | June 22, 2010 at 11:45 AM
To someone not in the field, your objections aren't at all convincing. Or if they are, they're an attack on the basic idea of measuring polling accuracy.
Say you can't figure out whether or not a certain polling organization is any good. Doesn't it follow that you also can't figure out if any of their polls were any good? If you can measure one, seems like you can get the other one too.
Posted by: Phonebanshee | June 23, 2010 at 04:16 PM
Or a different objection: if you can't measure how good a polling firm is in comparison to the competition, shouldn't the market be completely dominated by the lowest-cost players?
Posted by: Phonebanshee | June 23, 2010 at 04:19 PM
On point 1, we can measure, as Silver does, the extent to which a firm's pre-election polls accurately predicted the election outcome. That's one way to define whether a poll is "any good." Unfortunately, combining those results into a systematic measure of skill is difficult given that pollsters are non-randomly assigned to campaigns, that we observe relatively few campaigns for most polling firms, and that their work is subject to sampling error.
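To be concrete about the ingredient being measured, the basic building block is something like the gap between a poll's margin and the actual margin (a simplified illustration with hypothetical numbers; Silver's metric adds further adjustments for sample size, timing, etc.):

    # Simplified illustration of scoring a single poll against the outcome
    # (hypothetical numbers; not Silver's exact formula).
    def margin_error(poll_dem, poll_rep, result_dem, result_rep):
        """Absolute gap between the poll's margin and the election margin, in points."""
        return abs((poll_dem - poll_rep) - (result_dem - result_rep))

    # A hypothetical final poll of D 48 / R 44 in a race decided D 52 / R 47:
    print(margin_error(48, 44, 52, 47))  # -> 1 point of margin error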
On point 2, there are many areas in which people pay a premium despite a lack of evidence that it leads to improved performance -- CEOs, for instance. All it takes is a perception that some firms are better than others. (The market may also respond to other factors such as polling firm prestige, connections, service quality, etc.)
Posted by: bnyhan | June 23, 2010 at 04:42 PM
A part of Political Science involves sophisticated analysis of political polls. If the skill of the pollsters can't be measured, then for all we know most polls may be mediocre. In that case, mightn't Political Science suffer from the GIGO (garbage in, garbage out) problem?
In other words, those who analyze polls professionally implicitly assume that the polls meet some standard of accuracy and usefulness.
Posted by: David in Cal | June 23, 2010 at 06:33 PM
You mention the differences that make pollster rating more difficult than batting averages, but there are some forces that work the other way. For example, at-bats produce binary values (hit vs. non-hit), which decreases the informational value of each at-bat. But polls produce continuous values with much less variance per observation, so each poll should be much more informative. I buy that non-random assignment can make it more difficult, but I'm not sure that, say, 20 polls by a small pollster would give less information than 100 at-bats by a called-up minor league player (rough numbers below).
(Though, I imagine that academic circles tend to use much stricter confidence levels than baseball managers, which might be a significant increase in difficulty in itself... =P )
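A back-of-the-envelope version of what I mean, with purely illustrative numbers:

    # Back-of-the-envelope comparison (illustrative numbers only).
    import math

    # Batting: binary outcomes. A .300 hitter over 100 at-bats:
    p, n_ab = 0.30, 100
    se_ba = math.sqrt(p * (1 - p) / n_ab)   # ~0.046, i.e. ~46 points of batting average
    print(f"SE of batting average over {n_ab} ABs: {se_ba:.3f}")

    # Polling: continuous errors. Assume each poll's error has an SD of ~3 points:
    poll_sd, n_polls = 3.0, 20
    se_poll = poll_sd / math.sqrt(n_polls)  # ~0.67 points
    print(f"SE of mean poll error over {n_polls} polls: {se_poll:.2f} points")

    # The units differ, so which sample is "more informative" depends on how large
    # the real skill differences are relative to each SE in its own domain.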
Posted by: AySz88 | June 23, 2010 at 06:33 PM