
February 11, 2008


It looks like you're missing the data point for New York.

I wonder if there's a correlation between Obama's share of the white vote in a state and the share of the state's population that is black....

It would be great if you'd include the r-squared values for those regressions. It would also be pretty interesting to see a multiple regression on all the factors.
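For anyone who wants to compute the r-squared values themselves, here is a minimal sketch in plain Python for a one-variable linear fit. The function names and the numbers at the bottom are made up for illustration; they are not the state data from the post.

```python
def linear_fit(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the x values are not all identical
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys):
    """Share of the variance in ys explained by the fitted line."""
    slope, intercept = linear_fit(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical numbers, NOT the actual state results.
example_x = [1.0, 2.0, 3.0, 4.0]
example_y = [2.1, 3.9, 6.2, 7.8]
print(round(r_squared(example_x, example_y), 3))
```

With only a couple dozen states, even a respectable-looking r-squared can come from noise, so it is best read alongside the scatterplot rather than instead of it.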

The Obama support by black population graph is interesting. Its quadratic fit is a function of two competing trends. In primaries, the blacker the state, the more votes Obama gets. In caucuses, the reverse seems to be true. My guess is that in caucuses, the more blacks are standing in Obama's corner, the less comfortable whites are supporting him. In the privacy of the polling booth though, things are different. That's an interesting result.

California is an outlier on so many dimensions that it's going to skew any analysis done that includes us. There are many reasons why Clinton won so handily here, most of which aren't covered at all above (early voting, the unique media markets in this state, the importance of particular power brokers in certain constituencies, etc.).

More importantly, as others have pointed out, the conclusions above are only relevant to the general election if you think there's a significant chance that Obama would lose states like California or New York to McCain.

I'm not so sure about your choice of "Log(population) * Democratic Presidential Vote" as the x coordinate of your last graph. Let's say P is the state's population, and D is the proportion of Democratic votes in 2004. Then if you want to approximate the number of Democratic voters by state, you want P * D. But, as you say, this puts California too far ahead for comparison. So if you do the same trick you did for the 5th graph, you should be using Log(P * D) = Log(P) + Log(D), not Log(P) * D. What you've got there is Log(P^D).
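A quick numerical check of the two quantities, as a sketch. The values of P and D below are made up for illustration and are not any state's real figures; the function names are my own.

```python
import math


def log_of_product(p, d):
    """Log of the approximate number of Democratic voters: Log(P * D)."""
    return math.log10(p * d)


def log_times_share(p, d):
    """The x coordinate as described in the post: Log(P) * D."""
    return math.log10(p) * d


# Hypothetical state: one million people, half voting Democratic.
P = 1_000_000
D = 0.5

# Log(P * D) splits into a sum of logs.
assert math.isclose(log_of_product(P, D), math.log10(P) + math.log10(D))

# Log(P) * D is the log of P raised to the power D, a different quantity.
assert math.isclose(log_times_share(P, D), math.log10(P ** D))

print(log_of_product(P, D), log_times_share(P, D))
```

The two numbers differ, so the choice of x coordinate changes where each state lands on the axis.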

Dan makes an interesting point. The phenomenon of Obama doing best in states with the smallest and largest black percentages of population has been discussed widely, and Brendan's curve shows that. But dividing the results between primary states and caucus states seems to yield a different result. Just eyeballing the data in Brendan's graph, it appears that if only primary states are considered, the result would be a direct correlation between percentage of black population in the state and Obama's margin. Only Utah seems to be an outlier in that case. Of course there aren't many data points, and plenty of other factors are present.

It seems to me that the key piece of information you're forgetting is where Obama spent most of his campaign efforts.

Looking at polls over the last year, Hillary Clinton started with a considerable advantage (20-30%) both nationwide and in most individual states. Obama tended to close the gap only in the last 2 months. There is a decided uptick in his numbers just before the election in most states, with the notable exception of Florida (where the candidates agreed not to campaign.)

My assumption is that Obama focused on the initial states (IA, NH, SC, NV) and then the smaller, less costly states before super Tuesday and his efforts paid off (I rarely saw any campaign ads in California).

Now with super Tuesday passed, candidates can focus state-by-state again. If my assumption is true, you'll see significantly better than expected numbers for Obama in the upcoming races.

The US isn't just some mass carved up into states with different proportions of whites, blacks, and Hispanics, and Democrats and Republicans. There are pretty big regional variations in political culture. None of this analysis seems to take into account any of that. Which makes it close to useless, I think. I noted in the comments that you were making pretty unreasonable extrapolations from the behavior of whites in SC to the whole country and here you seem to be making a version of the same mistake writ large. Did you try variables for "southernness", or "westernness", for example? What about urban / rural or Catholic / Protestant? I'm guessing that "big state" is not a really useful analytic category, but any of the ones that I mentioned might be.

Also, the fellow on DailyKos used Southern Baptists as a proxy for cultural southernness, which was pretty clever given the increasingly complex political geography of the "New South" . . .

Bingo! Thanks Brendan! I told ya all along.

I'd be interested in seeing white religiosity or a broader religiousness measure per state among whites. Maybe use the state level turnout in the 2004 general from exit polls among whites who attend church regularly. Midwestern states like Wisconsin don't have a lot of Southern Baptists, but do have a lot of highly religious whites.

Question: What would be the delegate count if the Dems had a winner take all primary? Thanks

Question: The bulk of these states happened on Super Tuesday. The difficulty of campaigning in 22 states at once in a very short window of time has me asking what if California had been held on a Tuesday like today (Feb. 12) -- fewer states, more campaign time devoted to the state -- any difference? I guess my thought is that Obama doesn't have a "big state problem" but a "time in state" problem. One that might not be so important in the general election.

In population genetics, we call this sort of statistical analysis "story telling". This description is supported by the independent observation that these stats will help you fall asleep faster if read to you in a gentle, parental tone.

You need to provide the statistical significance (p values) for all of your fits. Just from looking at them, I can tell you right off that you do not have acceptably low p values (p = 0.05 is a typical cut-off in the scientific community) for many of the trends you report. Without statistics, those plots amount to reading tea leaves.

Steve, as I try to make clear above, this is just an exploratory, descriptive analysis (see here for my real political science research). And of course, as I also try to make clear, many of these (apparent) relationships are not statistically significant, which is a given with such a small sample. However, as noted above, I don't put much stock in p-values.

Another vote for r2s here. And I hear you about the p-values, but reporting them corrected for the number of hypotheses tested would put them in perspective.
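One simple way to do that correction is Bonferroni: multiply each p-value by the number of hypotheses tested and cap the result at 1. A minimal sketch follows; the p-values in it are invented for illustration and are not taken from the post's fits.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: each p times the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical p-values for three separate fits, NOT results from the post.
raw = [0.01, 0.04, 0.30]
adjusted = bonferroni(raw)

for p, p_adj in zip(raw, adjusted):
    verdict = "below" if p_adj < 0.05 else "not below"
    print(f"raw p = {p:.2f}, adjusted p = {p_adj:.2f} ({verdict} 0.05)")
```

Bonferroni is conservative, so less strict procedures exist, but it is the easiest one to apply by hand to a handful of graphs.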

Given that you show a difference (pval??) between primary and caucus support for Obama, it'd be interesting (and defensible) to split analyses up that way. For example, the first graph, Obama support by black population--I eyeball there that the primary states alone would be a decent linear fit, while the caucus states wouldn't really fit any model on their own but would cluster well above the primary line. Wouldn't that be consistent with the Bradley Effect (and the supposition that the public nature of caucuses suppresses it)?
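Splitting the fit by contest type is easy to sketch. The code below groups rows by a label and fits a separate least-squares slope within each group. The function name, the row layout, and all the numbers are assumptions made for illustration, not the actual state results.

```python
from collections import defaultdict


def slope_by_group(rows):
    """rows: iterable of (group, x, y). Returns {group: OLS slope of y on x}."""
    groups = defaultdict(list)
    for group, x, y in rows:
        groups[group].append((x, y))

    slopes = {}
    for group, points in groups.items():
        n = len(points)
        mean_x = sum(x for x, _ in points) / n
        mean_y = sum(y for _, y in points) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in points)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
        slopes[group] = sxy / sxx  # assumes x varies within each group
    return slopes


# Hypothetical rows: (contest type, % black population, Obama margin).
toy_rows = [
    ("primary", 1.0, 2.0),
    ("primary", 2.0, 4.0),
    ("primary", 3.0, 6.0),
    ("caucus", 1.0, 5.0),
    ("caucus", 2.0, 4.0),
    ("caucus", 3.0, 3.0),
]
print(slope_by_group(toy_rows))
```

If the two slopes have opposite signs, a single pooled fit through all states will hide both trends, which is how a curve can show up in the combined graph.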


Thanks for the nice mention of my piece. On the issue of endogeneity of my "campaign effects" variable - I had a few comments.

My intention of including the variable was as a way to differentiate the early and the big 2/5 states - Iowa, New Hampshire, South Carolina, Nevada, California - from the remaining states.

It seems to me that endogeneity would be a problem if and only if (a) candidates know a priori their standing in the states, and (b) that knowledge induces them to change their campaign itineraries. I think this is a problematic case to make in the primaries.

In the general election - it is not a problem because candidates can develop early and confident estimates of their standing in the polls because of party identification. It serves as a quickie cue - i.e. McCain knows he is not winning Vermont and Clinton knows she is not winning Utah. But in a primary campaign - without party identification operating as a cue - there are problems with assuming that this variable is endogenous.

First, most of the polling in most of these states for most of the cycle is done of voters who are not paying attention. And, indeed, most of the polls have "broken" in one direction or another late in the cycle...*after* this independent variable has already been largely formed. So, even if a candidate like Obama saw that he was "down by 20 points" in New Hampshire as late as October (or whenever) - he is not going to alter his behavior.

Second, and relatedly, the polling has been very poor for most of the cycle. Of course, it is possible that internal campaign polling is getting it right while media polling is getting it wrong. However, I doubt it. My intuition is that all polling is "screwy" because of the absence of party identification as a guidepost. This is causing these late, dramatic shifts in the polls. There are simply more voters in the primaries who do not have an easily accessible cognitive framework for making their vote choice.

Third, the Democrats allocate delegates proportionally - thus, they have an incentive to campaign even if they know they are going to lose (e.g. Obama might have known he was going to lose Massachusetts).

A good case in point of all of this is Clinton in South Carolina. She had spent a lot of time and effort in South Carolina *before* the bottom fell out. As late as the fall - Team Clinton was bragging (if you believe the Atlantic) that Bill Clinton's relationship with African American pastors would ensure victory. She learned late that this was not the case. By that point, my "campaign effects" variable was already largely formed for South Carolina. Nevertheless, she still held a few late events in the Palmetto State. Her late realization that she would lose SC might have influenced this variable at the margins - but it was already essentially populated.

To be careful, I ran a test to see if this campaign effects variable violates Gauss-Markov 4. I picked up no relationship between it and the residuals; such a relationship is the kind of thing to expect if endogeneity is causing problems with the model itself.

Jay Cost

Thank you for these very interesting statistics and analyses. Hope you don't mind a few comments and suggestions from a mostly qualitative cultural anthropologist.

Beware the variable “Hispanic”. It is too simplistic and bears no relationship to Spanish speaking populations in the US who are very dissimilar. Voting patterns of Puerto Ricans on the East coast tend to be different depending upon whether born on the mainland or on the island. Cuban Americans do not vote the same way as Puerto Ricans, though there are beginning to be differences in Florida between first generation Cubans and their children. There is also a difference between Cuban Americans who came in the first wave and those who came in the Mariel boatlifts. All of these groups are also demonstrating variation by age set.

A significant group in the East who are ignored are the rising number of Dominican Americans – most first and second generation, and currently the largest foreign born population in NYC. They do not relate to the part of the traditional Democratic party machine controlled by Puerto Ricans. Close attention should be paid to the recent endorsements of Obama by unions like the SEIU who have a large Spanish-speaking and youthful membership who engineered the nod.

Mexicans are different from Mexican-Americans, and Chicanos in California are not the same as Tejanos in Texas. Few demographers and quantitative researchers have begun to tease out the growing populations of Hondurans, Salvadorans, and other Central Americans who are growing in number, both in California and Texas (as well as other states).

Complicating this further are the factors of “race” and identity. East Coast Puerto Rican, Dominican and Cuban communities have a greater percentage of persons who cross-identify as both black and Hispanic – even though there are also tensions around these identifiers. Mexicans and Mexican Americans do not.

Another variable within these groups that should be examined is the religious factor. Though the conventional wisdom is usually that “Hispanics are Catholics”, the rising influence of Evangelical Christianity in many of these communities should be examined more closely.

These are similar questions raised by attempts to predict the voting of “Asians” ignoring Japanese, Chinese, Filipino, Hawaiian, Vietnamese, Indonesian ancestry or nativity.
