Ken Schultz, a political scientist at Stanford, was inspired by the misleading Wall Street Journal graphic and the since-removed Tax Foundation blog post to illustrate just how easy it is to manipulate bar graphs by changing the boundaries of the bins:
I thought it would be an interesting exercise to see how easily someone without scruples could twist the same data to support whatever argument they wanted to make about the distribution of taxable income (and, by implication, the proper targets for taxation). The attached file presents four graphs using the same data to depict the income distributions four different ways. This way, people can pick their preferred tax policy and then select the graph that supports their pick. No need for data to constrain your policy prescriptions!
And here are the graphics Schultz made using the same IRS data as the Journal (see Kevin Drum for a similar approach):
In short, it's possible to draw almost any conclusion you want from the data if you mess with the bin sizes enough. That doesn't mean that all bar graphs are equally valid, however. Some readers of my previous post have been arguing that the Journal's original graphic wasn't misleading, but that's wrong when it is considered in the context of the editorial. Here's the relevant passage:
The rich, in short, aren't nearly rich enough to finance Mr. Obama's entitlement state ambitions—even before his health-care plan kicks in.
So who else is there to tax? Well, in 2008, there was about $5.65 trillion in total taxable income from all individual taxpayers, and most of that came from middle income earners. The nearby chart shows the distribution, and the big hump in the center is where Democrats are inevitably headed for the same reason that Willie Sutton robbed banks.
There are many problems with the editorial's logic, but the relevant one here is the idea that the graph proves that most taxable income comes from "middle income earners." That's empirically false if you define "middle income" to mean the middle of the income distribution. The peak of the "hump in the center" of the Journal's own graphic is for people who make $100,000-$200,000, but as the Journal notes, the top 10% (including joint filers) make $114,000 and above. That's not the middle unless you stretch out the distribution (as the Journal did) by including numerous bins for the very small number of people making over $200,000. In reality, the top 20% earned 50% of all money income in 2009 (PDF; see Table 3), with the top 5% taking home 22%. The middle quintile -- the true "middle income earners" -- made a whopping 15%.*
Update 5/19 5:07 PM: In fairness to the Journal, I should note (as Schultz points out via email) that the IRS data the Journal used (Excel spreadsheet) is grouped into the same income ranges included in their bar chart. They didn't change the bin sizes to fit their preferred conclusion, but they did plot it in a way that misrepresented the shape of the US income distribution across the population.
* The Census data are money income, not total taxable income, but the conclusion holds.
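To make the bin-boundary point concrete, here is a minimal sketch in Python. The numbers are invented for illustration (arbitrary units that sum to 100) and are not the IRS figures, and the helper names `rebin` and `peak` are hypothetical. It simply shows that the same underlying totals can put the tallest bar at the bottom, in the middle, or at the top, depending on where the bin edges are drawn.

```python
# Hypothetical fine-grained data: (lower bound, upper bound, share of income).
# Bounds are in thousands of dollars; shares are INVENTED for illustration
# and are NOT taken from the IRS spreadsheet discussed in the post.
FINE_BINS = [
    (0, 25, 5),
    (25, 50, 10),
    (50, 75, 13),
    (75, 100, 12),
    (100, 200, 25),
    (200, 500, 13),
    (500, 1000, 7),
    (1000, 10000, 15),
]


def rebin(fine_bins, edges):
    """Sum fine-bin totals into coarser bins defined by `edges`.

    Assumes every edge coincides with a boundary of the fine bins.
    """
    coarse = []
    for lo, hi in zip(edges, edges[1:]):
        total = sum(t for (a, b, t) in fine_bins if a >= lo and b <= hi)
        coarse.append((lo, hi, total))
    return coarse


def peak(bins):
    """Return the bin with the largest total, i.e. the tallest bar."""
    return max(bins, key=lambda b: b[2])


# Three ways of slicing the same data.
EDGES_AS_GIVEN = [0, 25, 50, 75, 100, 200, 500, 1000, 10000]
EDGES_LUMP_TOP = [0, 50, 100, 200, 10000]            # one big "rich" bin
EDGES_LUMP_BOTTOM = [0, 100, 200, 500, 1000, 10000]  # one big "under 100" bin

for label, edges in [
    ("as given", EDGES_AS_GIVEN),
    ("lump the top", EDGES_LUMP_TOP),
    ("lump the bottom", EDGES_LUMP_BOTTOM),
]:
    lo, hi, total = peak(rebin(FINE_BINS, edges))
    print(f"{label}: tallest bar is {lo}-{hi} with {total}")
# as given: tallest bar is 100-200 with 25
# lump the top: tallest bar is 200-10000 with 35
# lump the bottom: tallest bar is 0-100 with 40
```

None of the three versions changes a single number in the data; only the edges move. That is why the location of the tallest bar says little on its own about where "the middle" of the distribution is.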
Of course, it just shows the lump sum in each category, not individual earnings, so it is even more dishonest. 9,000 people making $100 each in an economy of $1,000,000 make most of the money, but the one person making $100,000 beats them all hollow in terms of taxable income. It is hard for me to fathom why people just plain don't understand that.
Posted by: Carol | May 19, 2011 at 02:06 PM
One could make the width of the bars for each category proportional to the total amount of income; the total area of the bar would indicate how much taxable income there is in each bracket.
Worth looking at, anyway.
Posted by: Mike the Mad Biologist | May 19, 2011 at 02:25 PM
Thanks for the 5/19 Update, Brendan. Apparently the WSJ chart is simply the output that Excel produces when it's asked to create a bar chart from the IRS numerical data. That's the most straightforward approach imaginable. Indeed, had the WSJ done anything else, there might have been a valid basis to accuse them of manipulating the data.
Brendan's Update shows that the Tax Foundation acted properly when they withdrew Kasprak's post. That post incorrectly and ignorantly criticized the WSJ.
The Update shows that Brendan and Krugman and many commenters at their sites were all mistaken in deducing malign intent or conspiracy. It's striking to read the venomous comments on Krugman's site. They show that mean-spirited, ignorant conspiracy mongers are found on the left as well as on the right.
Brendan, I don't buy the excuse that the WSJ supposedly "did plot it in a way that misrepresented the shape of the US income distribution across the population." As pointed out earlier, the chart clearly and correctly depicts the total income in each range. Nor have you provided the plot that you think properly represents the shape of the US income distribution. I think you should have simply admitted that you were wrong.
Five points may be worth mentioning:
1. Ken Schultz and Kasprak both claimed to prove that the WSJ exhibit was biased by providing other arrangements that they said were biased. Where's the logic in that? That's like "proving" that Obama wasn't born in Hawaii by showing how some other hypothetical person might have falsified birth documents.
2. Not a single WSJ critic came up with what they claimed was the proper way to set the bin ranges.
3. I think the real problem may be the difference of opinion on what constitutes "middle class." As I read the WSJ editorial, they considered the middle class to be everyone (other than the poor) below Obama's target group of "millionaires and billionaires." That's a vague definition. I think the WSJ imagined that upper middle class goes up to $200,000 or even more. With that definition the chart accurately shows that the rich earn a relatively small portion of the country's income. However, many people would set the bottom of the "rich" range at a lower dollar figure. For them, the "rich" do earn a large portion of the country's income.
4. Krugman and many commenters misread the WSJ editorial as a recommendation of whose taxes should be increased. The WSJ was actually predicting who would be taxed more. BTW the WSJ didn't actually convince me that they were right. But an unconvincing argument still deserves to be accurately represented.
5. Does anyone think that Krugman will now admit that the WSJ did not manipulate the data, but rather charted it just as it came from the IRS? I'd give odds he will never do so. Brendan, I want to acknowledge that you have more integrity than Krugman and the NY Times.
P.S. to Mike the Mad Biologist -- I think your idea would work. However, the WSJ (using the normal Excel chart function) achieved the same thing. In the WSJ chart, the height of the bar for each category was proportional to the total amount of income. Thus, the total area of each bar did indicate how much taxable income there was in each bracket.
Posted by: David in Cal | May 19, 2011 at 07:44 PM
So the whole argument comes down to the definition of "middle class". If that means people within some interval of average earnings, you get a narrower income definition.
But it seems rather clear that the WSJ includes in "middle class" the people making well more than the average but still below the artificial threshold somewhat promoted by the Obama admin as "rich" - >$200K.
Maybe there is an argument the WSJ shouldn't call this group "middle class", but this is all semantics as their point is that you can't tax just the >$200K earners (i.e. the "rich") and balance the budget.
So far I haven't seen many comments (if any) address that point, which makes the whole "misleading" argument an irrelevant sideshow.
Posted by: MartyB | May 20, 2011 at 02:11 PM
I agree with Brendan that the bar chart was not the best way to display this data. Anyone who's accustomed to working with financial data would realize that the bin widths could not be the same size. She would know that the bin widths are arbitrary. So, a sophisticated reader wouldn't be misled by the WSJ chart nor by any of the supposedly biased alternatives created by Schultz and Kasprak.
However, a naive reader might confuse the WSJ chart with other charts where the bin widths are equal. Thus, the naive reader might be misled by the shape of the chart.
It would have been better IMHO to use a pie chart rather than a bar chart. The pie chart wouldn't mislead viewers into thinking that they could find the center of the distribution from its highest bar. A pie chart would make it easier visually to add up all the segments corresponding to whatever definition of "rich" or "middle class" one wanted to use and see what fraction of total income was earned by that group.
Posted by: David in Cal | May 21, 2011 at 12:38 PM