Brendan Nyhan

James O. Freedman Presidential Professor of Government

Dartmouth College

The attack on election forecasting straw men

In recent days, journalists, bloggers, and commentators have reared up to bash a fictitious conventional wisdom about election forecasting.

The premise for many of these statements is that political scientists believe that campaigns and other non-economic factors don’t matter in presidential elections. For instance, The Daily Beast’s Michael Tomasky describes “the political-science theory of presidential elections and economic determinism” as “pretty much strictly a function of economic conditions.” At Real Clear Politics, Sean Trende states that Emory’s Alan Abramowitz thinks “presidential elections can be reduced to a simple equation.” And in a Bloomberg View column, Ronald Klain, the former chief of staff for Al Gore and Joe Biden, writes that “a group of political scientists, mathematicians and scholars have argued that a handful of factors determine [the outcome] of presidential elections, irrespective of the campaigns.”

But as the political scientists John Sides and Seth Masket have already pointed out, these are straw men. Very few political scientists think campaigns don’t matter or that elections can be perfectly forecast in advance. In an earlier post, Sides expressed this point well:

Because people continually overestimate the effect of campaigns, this blog holds up the other end of the dialectic by emphasizing the economy and defending those who do. But plenty of research has identified the effects of campaigns too… it’s time to abandon this whole it’s-either-the-economy-or-the-campaign dichotomy…

Even New York Times blogger Nate Silver, who has become something of a professional critic of political scientists, concedes the point in a post today, writing that the view Klain ascribes to forecasters “is certainly not the majority opinion within the discipline.”

What’s bizarre about this flurry of articles is that election forecasting is such an obscure topic in the political press. The conventional wisdom that presidential election outcomes are largely unpredictable in advance and that the outcomes we observe are primarily the result of campaign strategy is vastly more prominent. So why is everyone so worried about forecasting models?

A related straw man is the idea that political scientists think their models make perfectly precise predictions. Here, for instance, is what Silver wrote:

[P]olitical scientists as a group badly overestimate how accurately they can forecast elections from economic variables alone. I have written up lengthy critiques of several of these models in the past…

The three posts that Silver links to critique a historian’s non-quantitative model which few political scientists would endorse, Ray Fair’s forecasting model, and the “Bread and Peace” model of Douglas Hibbs. Only the last two are representative of the field, and political scientists have criticized Fair’s model at length in the past (PDFs).

More generally, as Jacob Montgomery and I wrote last week, there is certainly reason to be concerned that these models are too confident about their predictions, but most sophisticated quantitative researchers in political science are aware of these concerns and do not interpret the forecasts so literally. Moreover, we can evaluate which models perform well in making forecasts beyond the data used in estimation and combine their predictions to create more accurate forecasts with appropriate estimates of uncertainty, as Montgomery and his co-authors do in their article (PDF). Silver dismisses this exercise as a “game show” and disparages any attempt to evaluate the models by their future performance — “most of how they perform over the next few elections will be determined by luck” — but we can and should aspire to better.

[Cross-posted to HuffPost Pollster]

November 16, 2011
Twitter roundup


November 14, 2011
Forecasting 2012: How much does ideology matter?

By Brendan Nyhan and Jacob Montgomery

One year in advance of the 2012 election, New York Times blogger Nate Silver published a presidential forecasting model. The model includes measures of presidential approval and economic performance — standard variables in election forecasting models — as well as a novel measure of challenger ideology that appears to have substantial effects. Based on this model, Silver estimates that “The difference between [Mitt] Romney and [Rick] Perry amounts to about 4 percentage points” — a huge predicted effect that could easily swing the outcome of the election. Consider, for instance, Seth Masket’s graphic illustrating how the predicted probability of a Republican win depends heavily on the estimated ideology of the GOP candidate:

Though candidate positioning is likely to influence presidential election outcomes, there are important reasons to question whether the challenger ideology effect Silver identifies is so powerful.

First, when the economy is growing and presidential approval is high, strong moderate candidates may be scared off from entering the race, leaving only ideologues. A similar effect has been shown when one party has held the presidency for a long period of time. When this happens, the opposition tends to perform better due to the perception that it is “time for a change”, and opposition parties are likely to nominate more moderate candidates in the hopes of regaining control of the White House at the expense of ideological purity. In both cases, the apparent link between challenger ideology and vote share partly reflects the political context rather than a causal effect of ideology itself.

Second, the estimates of challenger ideology that Silver uses are primarily drawn from voter perceptions of the candidates. However, these perceptions are driven by the content of the campaign, which is itself shaped by the economic context. Candidates who appear extreme in one era may seem less so in the next (consider the changing perceptions of Ronald Reagan between 1976 and 1980, for instance). For all of these reasons, Silver’s estimates of the effects of challenger ideology on election outcomes are likely to be significantly exaggerated.

In addition, as we demonstrate below, Silver’s model does not substantially improve the accuracy of presidential election forecasts, which casts further doubt on the importance of candidate ideology (see also Alan Abramowitz).

Silver’s model includes three predictor variables – presidential approval one year in advance of the election, election year GDP growth, and an estimate of challenger extremism (i.e., the extremism of the candidate of the party that doesn’t control the presidency at the time of the election). Using just three variables to predict the outcome of a presidential election may seem simplistic, but in forecasting simplicity is a virtue. With only 17 elections since 1944 to work with, including many indicators in a statistical model is likely to result in the identification of factors that are highly correlated with the election results we have already observed, but which do a horrible job in predicting the future.

For related reasons, Silver criticizes other forecasting models that use relatively obscure economic variables such as growth in real per-capita disposable income:

The government tracks literally 39,000 economic indicators each year…. When you have this much data to sort through but only 17 elections since 1944 to test them upon, some indicators will perform superficially better based on chance alone…. The advantage of looking at G.D.P. is that it represents the broadest overall evaluation of economic activity in the United States.

As Silver notes, there are legitimate reasons to worry that the search for statistically significant predictors will result in identifying indicators that perform well by “chance alone” (an extreme example: Washington Redskins home wins). Using such indicators can cause us to be overconfident in our statistical models (what statisticians call overfitting the data) and tends to make accurately predicting future events — like next year’s election — very difficult or impossible.
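To make the “chance alone” concern concrete, here is a minimal Python sketch that uses purely random numbers, not real economic or election data. The function names and the choice of 1,000 candidate indicators are illustrative assumptions. It draws 17 random “outcomes” and many unrelated random “indicators,” then reports the strongest correlation that turns up by luck.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def best_chance_correlation(n_elections=17, n_indicators=1000, seed=0):
    """Largest |correlation| between random outcomes and unrelated random indicators."""
    rng = random.Random(seed)
    # Hypothetical "election outcomes": pure noise by construction.
    outcome = [rng.gauss(0, 1) for _ in range(n_elections)]
    best = 0.0
    for _ in range(n_indicators):
        # Each "indicator" is also pure noise, so any correlation is spurious.
        indicator = [rng.gauss(0, 1) for _ in range(n_elections)]
        best = max(best, abs(pearson(indicator, outcome)))
    return best


if __name__ == "__main__":
    for k in (1, 10, 100, 1000):
        print(k, "indicators -> best |r| =", round(best_chance_correlation(n_indicators=k), 2))
```

Because every series here is noise, whatever correlation the search finds says nothing about the future. The more indicators are screened against the same 17 outcomes, the stronger the best spurious match tends to look.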

As you might expect, scholars have spilled a lot of ink debating the best forecasting indicators for outcomes ranging from the paths of hurricanes to stock prices. But rather than have a philosophical debate, we can evaluate this concern empirically to determine the extent to which specific forecasting models can successfully predict election outcomes beyond the range of the data used to estimate them. In particular, if models are spuriously identifying chance relationships, then they should perform relatively poorly after the point at which they were first published.

To do so, we began with Silver’s source data, which was compiled from the New York Times website and generously shared with us by Harry Enten.* Using a standard linear regression model, we almost precisely replicated the coefficients in the JavaScript code for the interactive calculator on the Times website.
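The out-of-sample check described above can be sketched in a few lines of Python. This is a simplified illustration, not our actual analysis (which was done in Stata): it uses a single predictor rather than Silver’s three, the function names are hypothetical, and the numbers are synthetic rather than real election data. The idea is to estimate the regression on an early block of elections and then score its predictions only on later elections it never saw.

```python
import random


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope


def holdout_errors(rows, last_training_year):
    """rows: (year, predictor, vote_share) tuples.

    Fits on years <= last_training_year and returns {year: forecast - outcome}
    for every later year.
    """
    train = [r for r in rows if r[0] <= last_training_year]
    test = [r for r in rows if r[0] > last_training_year]
    intercept, slope = fit_line([r[1] for r in train], [r[2] for r in train])
    return {year: (intercept + slope * x) - y for year, x, y in test}


if __name__ == "__main__":
    # Synthetic example: invented "growth" values and vote shares, NOT real data.
    rng = random.Random(1)
    rows = []
    for year in range(1952, 2012, 4):
        growth = rng.uniform(-2.0, 6.0)
        vote = 46.0 + 1.5 * growth + rng.gauss(0, 2.0)
        rows.append((year, growth, vote))
    for year, err in sorted(holdout_errors(rows, 1988).items()):
        print(year, round(err, 2))
```

The key design choice is that the later elections play no role in estimating the coefficients, so a model that merely memorized quirks of the early elections has no way to look good on them.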

As a starting point for evaluating Silver’s model, we first compare it with Douglas Hibbs’s “Bread and Peace” model, which uses the real per-capita disposable income variable described above. Silver has previously criticized the Hibbs model as performing relatively poorly outside the range of the years in which the model was first estimated (1952-1988). Here is the key graphic in question:

However, when we estimate both models using the same data as Silver’s graph above (1952-1988) and predict the outcome for the 1992-2008 period in terms of the share of the two-party vote received by the party of the president (the standard outcome variable in the literature), we find that Hibbs’s model generally performs better than Silver’s (Stata data and do files available upon request):

Of course, these are not the only available models for comparison. Indeed, political scientists and economists have estimated dozens of other presidential forecasting models over the past twenty years. For example, PS: Political Science and Politics published a pre-election symposium in 2008 that included presidential election forecasts from numerous scholars (see also here, here, here, or here). Most such models make predictions based on economic conditions and/or public opinion, but they typically do not include a measure of candidate ideology.

While it is fun to compare the performance of these forecasts, we should be clear that there is no one “correct” model. Rather than relying on a single model, we can instead combine the forecasts of numerous models using a technique called ensemble Bayesian model averaging, which creates a composite forecast weighted by the predictive performance of the component models. This approach was developed for combining weather forecasting models (see here) and has been applied to political outcomes in a paper (PDF) by Montgomery, Ward, and Hollenbach.
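For readers who want a feel for how such a composite is built, here is a stripped-down Python sketch of the weighting idea. It treats the outcome as a mixture of normal distributions centered on each model’s forecast and estimates the weights with a standard EM algorithm. This is an assumption-laden simplification: it omits refinements such as bias correction that a full implementation may include, the function names are hypothetical, and the numbers in the example are invented, not real forecasts.

```python
import math


def normal_pdf(y, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((y - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def ebma_weights(forecasts, outcomes, n_iter=200):
    """Estimate mixture weights by EM.

    forecasts: list of K lists, each holding one model's T past forecasts.
    outcomes:  list of the T observed results.
    Returns (weights, common_variance).
    """
    k, t = len(forecasts), len(outcomes)
    weights = [1.0 / k] * k
    var = sum((forecasts[m][i] - outcomes[i]) ** 2
              for m in range(k) for i in range(t)) / (k * t)
    for _ in range(n_iter):
        var = max(var, 1e-6)  # guard against a degenerate zero variance
        # E-step: how plausibly does each model "explain" each past outcome?
        resp = []
        for i in range(t):
            dens = [weights[m] * normal_pdf(outcomes[i], forecasts[m][i], var)
                    for m in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: update weights and the shared variance.
        weights = [sum(resp[i][m] for i in range(t)) / t for m in range(k)]
        var = sum(resp[i][m] * (outcomes[i] - forecasts[m][i]) ** 2
                  for i in range(t) for m in range(k)) / t
    return weights, var


def ensemble_forecast(weights, new_forecasts):
    """Composite point forecast: weighted average of the component forecasts."""
    return sum(w * f for w, f in zip(weights, new_forecasts))


if __name__ == "__main__":
    # Invented numbers for two hypothetical models over five past elections.
    outcomes = [52.0, 48.5, 55.1, 50.2, 46.9]
    model_a = [51.5, 49.0, 54.6, 50.9, 47.5]   # usually close
    model_b = [55.0, 45.0, 51.0, 54.0, 50.0]   # usually further off
    w, v = ebma_weights([model_a, model_b], outcomes)
    print("weights:", [round(x, 3) for x in w])
    print("composite:", round(ensemble_forecast(w, [51.0, 53.0]), 2))
```

Models whose past forecasts landed closer to the eventual outcomes receive more weight, so the composite leans on the components with the better track record.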

The figure below, which uses the methodology described by MWH to create Figure 3 in their paper, compares one-step-ahead predictions from Silver’s model, the most recent versions of six prominent models in the literature (Campbell’s “Trial-Heat and Economy Model,” Abramowitz’s “Time-for-Change Model,” Hibbs’s “Bread and Peace Model,” Fair’s presidential vote-share model, Lewis-Beck and Tien’s “Jobs Model Forecast,” and Erikson and Wlezien’s “Leading Economic Indicators and Poll” forecast**), and a composite forecast created using the ensemble technique. The forecast of each model is plotted with its 67% and 90% confidence intervals against the eventual outcome, which is represented with a dotted line:

Silver’s model performs well in some elections, but it is very inaccurate in comparison to its peers in 1992 and 2008. With those exceptions, it does not appear to differ from other models dramatically, though its overall performance is worse on average than the comparison models. The ensemble forecast appears to perform quite well, producing predictions that are relatively close to the actual outcome.

The figure below, which is adapted from Table 4 in MWH, compares the accuracy of the models more precisely using mean absolute error, an intuitive (though imperfect) metric for comparing forecasting models that measures the average amount by which they mispredict the final outcome.

We can see that all of the models are relatively accurate on average. They mispredict the vote share for the incumbent party by an average of 1.7 to 3.4 percentage points — an impressive record given that most models include only two or three variables. By this metric, Silver’s forecasts are the least accurate in the group and the ensemble forecast is the most accurate. (See MWH for a discussion of the extent to which these models appropriately express uncertainty about their predictions.) Since some of these models — and implicitly the ensemble model that relies on them — have been around for twenty years, this result should not be especially surprising. The literature on presidential forecasting is relatively mature.
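For reference, the accuracy metric used in this comparison is simple to compute: take the size of each miss, ignoring its direction, and average across elections. The short Python sketch below illustrates it with invented numbers; the function name is a hypothetical choice for this example.

```python
def mean_absolute_error(forecasts, outcomes):
    """Average size of the forecast misses, ignoring their sign."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(f - y) for f, y in zip(forecasts, outcomes)) / len(forecasts)


if __name__ == "__main__":
    # Invented forecasts and outcomes (incumbent-party vote share, in points).
    forecasts = [52.0, 48.0, 51.0]
    outcomes = [50.0, 49.0, 54.0]
    print(mean_absolute_error(forecasts, outcomes))
```

Taking absolute values matters: a model that is two points too high in one election and two points too low in the next has an average signed error of zero but still misses by two points each time.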

At this point, we should note two important but wonky caveats. First, we follow MWH in using the most recent version of each of the forecasting models from political science. In some cases, model specifications may have been revised to account for previous results, which could artificially improve their performance in one-step-ahead prediction tasks (see footnote 14 in MWH). Second, these models differ in the extent to which we would even expect them to make accurate forecasts far in advance of a presidential election. For instance, Campbell’s model includes the president’s trial heat performance on the Labor Day before the election. Silver’s model, on the other hand, takes on the more ambitious challenge of using approval data from a year before the election (though it relies on GDP growth during the election year, which is of course not available in advance).

Ultimately, almost every analyst agrees at this point that it is still too soon to say with much confidence whether President Obama will win in November. In particular, there is still too much uncertainty about the state of the economy next year. However, both theory and data suggest that the conservatism of his opponent is likely to matter less than Silver’s model suggests.

Correction 11/10 1:19 PM: A previous version of this post stated that James Campbell’s “Trial Heat and Economy” model uses presidential approval on the Labor Day before the election as a predictor. It actually uses the president’s performance in a trial heat poll against his opponent on Labor Day.

Update 11/11 10:57 AM: We would like to clarify that the model comparison we performed did not directly test whether adding estimates of challenger ideology to existing forecasting models would improve their performance. It would be desirable to do so. In the time that we had, our goals were to (a) raise concerns about causal inference and the difficulty of measuring challenger ideology and (b) compare the performance of Silver’s model against others in the forecasting literature.

Update 11/16 1:48 PM: See also Silver’s followup post and my response to him and other recent critics of election forecasting.

[Cross-posted to HuffPost Pollster]

* Silver’s challenger ideology data are primarily derived from The Party Decides by Marty Cohen, David Karol, Hans Noel, and John Zaller. We used the exact data underlying Figure 4.1 in The Party Decides as provided by Noel. The Times presents these data on a 0-100 scale but the underlying data are actually on a 0-7 scale based on estimated distance from the ideological center. We used Silver’s challenger ideology estimates for John Kerry in 2004 and Barack Obama in 2008 but converted them from the 0-100 scale to the original Party Decides metric.

** All of these authors generously shared their data with MWH.

November 10, 2011
Twitter roundup


November 7, 2011
Twitter roundup


October 31, 2011
Twitter roundup


October 26, 2011
Twitter roundup


October 21, 2011
Twitter roundup


October 17, 2011
Perry’s “16th century” gaffe (with audio)

I attended the post-debate fraternity event last night where Texas governor Rick Perry mistakenly placed the American Revolution in the 16th century:

“Our Founding Fathers never meant for Washington, D.C. to be the fount of all wisdom. As a matter of fact they were very much afraid of that because they’d just had this experience with this far-away government that had centralized thought process and planning and what have you, and then it was actually the reason that we fought the revolution in the 16th century was to get away from that kind of onerous crown if you will.”

The incident has been widely reported, but I haven’t seen any audio or video, so here is an MP3 audio clip from my recording of the event. It’s low-quality but hopefully intelligible.

While I think the whole issue is pretty trivial, it does threaten to solidify the conventional wisdom that Perry is dumb. Yesterday, for instance, Wall Street Journal editorial writer Joseph Rago said at a pre-debate panel at Dartmouth (where I am on the faculty) that Perry seemed like he “had some sort of mental disability” during the previous debate. A few more incidents like this and Perry will be covered more like Sarah Palin and Dan Quayle than Mitt Romney. And as Palin and Quayle can tell you, once the narrative forms, reporters start looking for anecdotes to reinforce the story they want to tell. It’s a cycle that is very difficult to break.

Update 10/12 11:59 AM: NBC News has video:


October 12, 2011
Perry’s challenges in Dartmouth GOP debate

Going into tonight’s GOP debate at Dartmouth College (where I am a faculty member), the challenge for Rick Perry, as TAP’s Jamelle Bouie notes, is to reassure nervous elites that he’s a capable national-level candidate while attracting support from anti-Romney conservatives who have swung toward Herman Cain:

Romney is leading the field with 38 percent support among likely voters in the New Hampshire presidential primary. Herman Cain takes the second place spot with 20 percent of the vote, and Ron Paul finishes third with 13 percent of the vote. The remaining candidates, including Rick Perry, poll at 5 percent or less.

This obviously isn’t great news for the Texas governor. But it’s not terrible news either. The simple fact is that Herman Cain isn’t a serious candidate. His policy knowledge is slim and his political organization is nonexistent. Yes, he’s traveled to a few primary states, but that has more to do with book sales than it does with actually running for president. Sooner or later, his bubble will pop, and he’ll fall back down to earth.

But while Cain’s candidacy is a sideshow, his constituency is not. Cain represents the largest faction in the anti-Romney wing of the Republican base, which is as large—if not larger—than Romney’s own base of support. In New Hampshire and elsewhere, these voters have attached to Cain for lack of a better choice.

To put this another way, Herman Cain has sucked the oxygen out of Perry’s bid for anti-Romney conservatives. As such, Perry’s task for tomorrow’s debate and the weeks ahead, is to reassure Republicans of his conservative credentials and re-establish himself as the real alternative to Romney. Part of that, as I noted earlier, will involve attacks on Romney’s record. But part of it, I think, will require Perry to gently show conservatives that while Herman Cain is a great guy, he’s not quite presidential material.

The problem is that Perry is uniquely ill-suited to go after Cain. First, the former Godfather’s Pizza CEO’s primary vulnerability is his lack of detailed policy knowledge, but the same is true of Perry. In addition, it would be awkward for Perry to target Cain so soon after the controversy over a racially offensive term painted on a rock at a hunting camp leased by Perry and his family. Cain, the only African American running for the GOP nomination, said afterward that Perry showed “a lack of sensitivity.”

For these reasons, it’s likely that Perry will instead focus his fire on Romney as he did at the Values Voter Summit and in an online video. He has to hope that other contenders will take on Cain in the hopes of attracting some of the gadfly candidate’s supporters once his boomlet dissipates.

What’s been strange to observe, though, is how Perry’s handlers and allies have failed to play the expectations game in a savvy way. Rather than downplaying his likely performance in the debate tonight, they seemed to promise a major improvement in a New York Times story on Perry’s struggles that included an unfortunate comparison of the candidate to a “tired puppy.” By comparison, expectations for George W. Bush were set so low that it was considered a victory when he “survived” his first debate in 1999 “without any major gaffe” and the AP later reported that “Even the Texan’s allies sounded underwhelmed” by his early debate performances. If Perry’s camp is smart, they will avoid creating an expectation of a dramatic turning point that he is unlikely to deliver.

Going forward, Perry’s principal challenge is to stay viable so that more elites don’t defect to Romney. He is well-funded and has a favorable primary calendar. Regardless of his standing in national polls, he has a decent chance to mount a comeback against Romney because support in multi-candidate primaries is so fluid. When there are relatively minor ideological differences between candidates, it’s possible to make rapid gains as voters shift to their second or third choices for strategic or stylistic reasons. If Perry can adapt to the rigors of a national-level campaign, his odds of consolidating enough of the anti-Romney vote to win the nomination are significantly better than the current Intrade estimate of 18.9%.

[Cross-posted to HuffPost Pollster]

October 11, 2011