May 16, 2011


Edward Tufte would be proud!

Nonsense, Ed Tufte wouldn't be satisfied unless the chart also showed Napoleon's retreat from Russia and his army's expenditures on Kleenex.

Serious problems with the Tax Foundation analysis:

1. The WSJ chart is the normal way that income ranges are graphed. Their chart isn't particularly misleading. If you want to compare total income for two ranges, (e.g. $1 million and above vs. $50,000 - $1 million), just mentally add the heights of the bars in each of the two ranges.
2. The Tax Foundation complains that one range (100K - 200K) is 4 times as wide as another (75K - 100K) The implication would seem to be that they want all bars to have equal width ranges. But, that's impractical. E.g., if every range width were 25K, you'd need 200 bars to cover the range of $5 million to $10 million. And then the bars for larger income groups would be very short, because there are few very high earners in each specific $25,000 width range.

3. The graph by percentile is particularly silly. It puts into separate categories incomes that are close to each other in dollar terms, simply because many other families earn pretty close to that amount. Nor does the Tax Foundation claim that the percentile approach is right. They simply use it to illustrate that different approaches are possible.

4. The Tax Foundation concludes that income distribution can be charted many different ways. So, what? Does that mean we are barred from using charts?

5. Note that the author of the post never says what the right way would be to set these ranges. It seems clear to me that he doesn't know the answer to this question.
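To put a number on point 2, here is a minimal Python sketch that counts how many equal-width bins a dollar range needs. The helper name `equal_width_bin_count` is hypothetical; the $25,000 width and the $5 million to $10 million range are the ones used in point 2.

```python
def equal_width_bin_count(low, high, width):
    """Number of equal-width bins needed to cover the range [low, high)."""
    # Ceiling division, so a partial bin at the top still counts as a bin.
    return -(-(high - low) // width)

print(equal_width_bin_count(5_000_000, 10_000_000, 25_000))  # 200
print(equal_width_bin_count(0, 10_000_000, 25_000))          # 400
```

Covering everything from $0 to $10 million at that width would take 400 bars, which is the practical objection in point 2.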

Brendan almost seems to be implying some sort of conspiracy because the Tax Foundation withdrew this post for unexplained "editorial and content reasons." I'm not at all surprised that they removed the post. It was evidently written by someone statistically naive. The post was an embarrassment to the Tax Foundation.

It's disappointing that Brendan blogged about this obviously flawed post as if it were valid. Of course it criticized Brendan's political adversaries on the WSJ editorial page. Brendan's decision illustrates how people interpret data so as to confirm the opinions they already hold.

David in Cal: "It was evidently written by someone statistically naive."

And your comment was evidently written by someone with limited reading comprehension skills.

The entire point of the Tax Foundation piece can be found in this sentence: "Regardless of the broader merits of the editorial, this chart [found in the WSJ editorial] is a textbook example of how to lie with statistics." The author then proceeds to use the percentile graph to illustrate how the same tax numbers used by the WSJ could be used to generate a chart showing a completely different income distribution that would undermine the WSJ's argument. In fact, the author of the Tax Foundation's blog post is making the exact same argument you are making -- i.e. that "people interpret data so as to confirm the opinions they already hold." How are we to take seriously your critique of Nyhan when it is based around a fundamental misunderstanding of the basic point of the Tax Foundation's original blog post?

The idea that percentiles are not a reasonable way to graph incomes is goofy, of course it is reasonable. Many economic statistics are published this way.

If you don't like percentiles and think income amounts are the only legitimate way to graph incomes, there are a couple of standard ways to do this, either linear (equal-sized bins) or logarithmic. It would be interesting to see all three ways of graphing. Maybe David in Cal can do this for us, since he criticizes Brendan for commenting on the Journal's bins without giving the right bins.

As for the Journal's bins, there is nothing normal about them. The ratios of the sizes of adjacent bins are 1, 1, 1, 1, 1, 2, 1, 2.5, 1, 4, 3, 1.6, 1, 1, 6, 1.6, infinity. I don't see an obvious rationale for these bin sizes. Possibly they want median income to be in the middle bin. If you do that with a straight linear or logarithmic graph, the top bin looks too big. Thus the stretching of the bins to the right of median income.
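The ratio sequence above can be reproduced from bin edges. This sketch assumes the WSJ chart used the standard IRS Statistics of Income AGI brackets (under $5K, $5K-$10K, ..., $5M-$10M, $10M and up); the edges are my reconstruction, not something stated in the comment, although they do match the listed sequence (with 1.6 standing for 5/3).

```python
import math

# Assumed bin edges in dollars (hypothetical reconstruction of the IRS AGI
# brackets). The last bin, $10M and up, is open-ended, so its width is infinite.
EDGES = [0, 5_000, 10_000, 15_000, 20_000, 25_000, 30_000, 40_000, 50_000,
         75_000, 100_000, 200_000, 500_000, 1_000_000, 1_500_000,
         2_000_000, 5_000_000, 10_000_000, math.inf]

def bin_widths(edges):
    """Width of each bin, in dollars."""
    return [hi - lo for lo, hi in zip(edges, edges[1:])]

def adjacent_ratios(edges):
    """Ratio of each bin's width to the width of the bin just below it."""
    widths = bin_widths(edges)
    return [upper / lower for lower, upper in zip(widths, widths[1:])]

print([round(r, 2) if math.isfinite(r) else r for r in adjacent_ratios(EDGES)])
```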

Exercise: In a graph with income on a logarithmic scale with median income in the middle bin, how much bigger than any other bin would the top bin be?

mercunio -- I teased Brendan about looking at facts so as to confirm his previous views because of his research into this very subject. I spent 40+ years in the business of analyzing data. To be clear, I believe there are correct ways to display data and deceptive ways. IMHO the WSJ approach was correct and appropriate, whereas the withdrawn piece said it was deceptive -- a view presumably endorsed by Brendan.

Setting technical questions aside, why did Brendan blog about this Tax Foundation post? Nobody stands behind it. The Tax Foundation withdrew it. They say it has problems with content and that it needs revision. Brendan doesn't know who wrote it. He can't evaluate the author's knowledge or his beliefs. The post really deserves no more credibility than some internet rumor or anonymous e-mail.

Nevertheless Brendan, who is usually so careful, promoted the post. I'd venture to guess that other liberals are sharing that post with each other. Why? IMHO because it criticizes the WSJ editorial page.

The critics have a point that changing the widths of the ranges would affect the look of the chart. However, that shouldn't affect its impact. E.g., suppose the range of $100K - $200K were split in half. Then, instead of one high bar in the middle, you'd have two bars about half as high. To calculate the income earned by those with AGI between $50K and $500K, you'd add the heights of all the bars in this range. The total would be the same whether a single $100K - $200K range was used or that range was split in two. Either way, the chart would show that those with AGI between $50K and $500K earned well over half the income.
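The additivity argument can be checked with toy numbers. In this sketch the bin totals are made up (think of them as billions of dollars of AGI, not real IRS figures), and `total_in_range` is a hypothetical helper that sums every bin lying entirely inside a dollar range.

```python
def total_in_range(bins, lo, hi):
    """Sum the income of every bin that lies entirely within [lo, hi)."""
    return sum(amount for (b_lo, b_hi), amount in bins.items()
               if lo <= b_lo and b_hi <= hi)

# Made-up totals for illustration only.
coarse = {(50_000, 100_000): 30, (100_000, 200_000): 40, (200_000, 500_000): 20}
# Same data, with the $100K-$200K bin split into two halves.
fine = {(50_000, 100_000): 30, (100_000, 150_000): 24,
        (150_000, 200_000): 16, (200_000, 500_000): 20}

print(total_in_range(coarse, 50_000, 500_000))  # 90
print(total_in_range(fine, 50_000, 500_000))    # 90
```

Splitting a bin changes the shape of the bars but not any sum taken over whole bins.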

I see Brendan now has the name and background of the author, Nick Kasprak. Kasprak's background is in web-based tools, programming, physics, astronomy, and mathematics -- not statistics nor economics. This supports my belief that he is not sophisticated in the field of economic statistics.

David in Cal: I can't believe you'd actually defend that graphic. It's such a distortion of the data that it's practically fraudulent. The sizes of the bins are specifically chosen so that it looks like Americans with median incomes make the bulk of the money. But that middle bin is 100-200K, which is emphatically not a median salary! 100K puts a household in the top quintile (if not higher), 200K is close to the top 3%. These are elite incomes, not average ones.

Frankly David in Cal,

Americans deserve the honor of putting the first man on the Moon, and of messing up the economy with structured products based on, yes, economic statistics.

Please, don't compare your statistics knowledge with a B.A. in Physics and Astronomy like Mr. Kasprak. You only make the things worse.

Actually I find both graphs deceptive.

Note that the top percentile (95%+) would include some of the earners in the original WSJ graph's $100-$200k bin. Thus, the new graphic would suggest we need to raise taxes on moderately high income households.

Furthermore, the attack on the WSJ graph is a bit disingenuous. Exactly how would you propose setting the income bins? Just doing percentiles is a bit of a cop-out: it provides a different perspective, but an income-bin approach is also valuable. So what would your bins be? Would you use a constant $100k interval? I guess you could do that until you reach $1MM and then use a single $1MM+ bin. Note that this would make the $0-$100k group the highest bar by far.

The WSJ graph is misleading because the bins don't adjust for how many people are in them. That's why the Tax Foundation's chart is much more honest. If they wanted, they could put dollar values on each of their percentile bins, but their being adjusted so that each bin has equal population is precisely the right way to address the data.
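A minimal sketch of equal-population bins that also puts dollar figures on the bin boundaries. The incomes are simulated from a log-normal distribution purely for illustration (the parameters are arbitrary assumptions, not fitted to IRS data); `statistics.quantiles` from the Python standard library (3.8+) computes the cut points, and the helper names are hypothetical.

```python
import bisect
import random
import statistics

# Simulated incomes for illustration only: log-normal with arbitrary
# parameters (median roughly exp(10.8), about $49,000), NOT real IRS data.
rng = random.Random(0)
incomes = [rng.lognormvariate(10.8, 0.8) for _ in range(1_000)]

def quantile_edges(values, n_bins):
    """Dollar cut points that split values into n_bins equal-population bins."""
    return statistics.quantiles(values, n=n_bins)

def bin_counts(values, edges):
    """Count how many values fall into each bin defined by the cut points."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    return counts

edges = quantile_edges(incomes, 5)
print([round(e) for e in edges])   # dollar boundaries of the quintiles
print(bin_counts(incomes, edges))  # equal head counts per bin
```

Each bin holds the same number of people, and the printed edges supply the dollar labels for the bin boundaries.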

Thanks, Brendan, for the link to Kasprak's follow-up column. Mr. Kasprak is obviously smart, but his general unfamiliarity with statistics and economic presentations stands out.

His idea of having ranges that go up by a fixed multiple is clever and sensible. However, economic statistics are rarely presented that way. Usually range values are chosen as round numbers, either round dollar figures or round percentiles.

Kasprak's idea of charting the portion of the income within each range is clever. I've used that technique in my work. However, it's a rare and potentially confusing approach. I once got into a dispute after I had presented data using that sort of chart. The counter-party, who was an expert, claimed (after he lost money on the deal) that he had misunderstood my chart. The counter-party used his alleged misunderstanding as a negotiating point to mitigate his loss. His allegation was plausible enough that he was able to negotiate a reduction in his loss. Most newspaper readers wouldn't understand this sort of chart.

Kasprak says: "I didn’t want to pick arbitrary bin ranges and thus be guilty of the same distortion I accused the Wall Street Journal of yesterday." Having said this, he proceeds to pick arbitrary bin ranges. He uses an arbitrary formula rather than arbitrary dollar figures. Arbitrariness is unavoidable when displaying data. Some sort of bin ranges have to be chosen. That choice is always arbitrary. Nevertheless, such graphs are widely used and have considerable value.

Alnair's comment above reflects an attitude that's all too common -- that there's nothing to statistics. Anyone can do it. No doubt there are fields that are more challenging than probability and statistics. However, every field has its own techniques and traps.

Serious statistical errors are made by smart people who don't fully understand statistics. That's why top medical journals generally won't publish an article involving statistical analysis unless a biostatistician is a co-author. My biostatistician wife recently published a correction in a medical journal involving research done by a Doctor Paolo Zamboni. Zamboni has gotten a lot of publicity for his recommended treatment of Multiple Sclerosis, but he did his own statistical analysis. My wife's correction doesn't necessarily invalidate his conclusions, but there were errors in his work.

Or, for those too young to remember it, read up on Marilyn vos Savant's Monty Hall problem. (The problem's creator was actually Steve Selvin, one of my wife's professors in graduate school.) The fact that there's more to probability and statistics than one might think is illustrated by this:

Many people refuse to believe that switching is beneficial. After the Monty Hall problem appeared in Parade, approximately 10,000 readers, including nearly 1,000 with PhDs, wrote to the magazine claiming that switching was wrong. (Tierney 1991) Even when given explanations, simulations, and formal mathematical proofs, many people still do not accept that switching is the best strategy.
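For readers who want to check the switching claim themselves, here is a small Monte Carlo sketch of the standard Monty Hall setup, assuming the host always opens a door hiding a goat and never opens the contestant's door. The function names are hypothetical. With enough trials, switching should win about 2/3 of the time and staying about 1/3.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=1):
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print(f"switch: {win_rate(True):.3f}, stay: {win_rate(False):.3f}")
```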

Fargus -- note that Kasprak doesn't assert that his percentile chart is the right way to display the data. He merely says it's an alternative.

I agree with you that it would be useful to show a percentile chart that included the dollar figures at the ends of each bin range. That's a more Tufte-like approach.

However, IMHO no matter how the data is organized, the same conclusion will stand out: Well over half the income is earned by the middle class.

Depends largely on how you define the middle class, for one. If you define it broadly enough, then sure, well over half of the income is earned by them. But for instance, the top 400 families earn more than the bottom 50% of households in the country. Adjusting for population isn't just a technique; leaving it out is flat out dishonest.

What I'd like to see, actually, is something based entirely on income. For instance, how much income is made in the country that falls into each income bracket? For everybody who makes above that amount, that portion of their income would go into that bin. For somebody who makes $50,000, the first $8500 would go into that first bin, the next $26000 would go into the second bin, and the remaining $15,500 would go into the third. That would give us a much better idea of the overall effect of changing tax rates on each bracket, since it's income that's taxed, not earners.
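The per-bracket tally described above can be sketched as follows. The two thresholds ($8,500 and $34,500) are the ones implied by the $50,000 example; a real bracket schedule would have more thresholds, and the function names are hypothetical.

```python
def split_by_bracket(income, thresholds):
    """Split one person's income into the slice that falls in each bracket."""
    slices = []
    lower = 0
    for upper in thresholds:
        slices.append(max(0, min(income, upper) - lower))
        lower = upper
    slices.append(max(0, income - lower))  # everything above the last threshold
    return slices

def income_per_bracket(incomes, thresholds):
    """Total income, across all earners, that falls into each bracket."""
    totals = [0] * (len(thresholds) + 1)
    for income in incomes:
        for i, amount in enumerate(split_by_bracket(income, thresholds)):
            totals[i] += amount
    return totals

THRESHOLDS = [8_500, 34_500]  # assumed, from the $50,000 example above
print(split_by_bracket(50_000, THRESHOLDS))             # [8500, 26000, 15500]
print(income_per_bracket([50_000, 5_000], THRESHOLDS))  # [13500, 26000, 15500]
```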

@Dave in Cal: "Well over half the income is earned by the middle class"

Really? O.K., let's ask "Who are / what is "The Middle Class?"

"As for middle class, well, the Congressional Research Service issued a report last year pegging middle class income as between $19,000 a year and $91,000 a year." http://marketplace.publicradio.org/display/web/2008/01/11/what_is_the_middle_class

Using that definition, and eyeballing the WSJ graph, I'd say "No way."

As a former professor of Demography, Sociology, and Social Research Methods at the college level, I'm inclined to use percentages, say from 25% to 85%. Eyeballing the second chart, I'd say: "Very unlikely."

Finally, I'd like to point out that there is a big difference between descriptive and inferential statistics. Statistical errors are almost always errors of inference. These are descriptive stats.

In addition to Tufte, I highly recommend Hans Zeisel's "Say It With Figures."

David in cal,

"why did Brendan blog about this Tax Foundation post?...The post really deserves no more credibility than some internet rumor or anonymous e-mail."

Are you certain that you are commenting in good faith? Brendan blogged about the post because it was published by the Tax Foundation. It was not analogous to a rumor; it was instead a published post, one that its publisher concedes was not the result of a hacker. It was not analogous to an anonymous email; it was a post formally published by a fully identified, well-known publisher. Again: Brendan blogged about it because it was a formally published post, and the fact that its publisher withdrew it in its entirety (as best it could) after publishing it is in itself noteworthy. This was not obvious to you?

To be fair to the WSJ, they appear to have used the same bins as were used in the IRS table they were referencing:


Here's a relevant data point. There was about 4.5 times as much money earned (in AGI) by those earners making from $100,000 to $200,000 per year than by those making over $10,000,000.

There were about 1,000 times as many earners making $100,000 to $200,000 than there were making over $10,000,000.

Fair points, Hugh Loebner. If middle class ends at $91,000, then the middle class earns less than half of America's income.

I'm inclined to define "middle class" based on how people actually live, rather than arbitrary percentiles. Also, although I didn't say so, I meant "middle class" to include lower-middle class and upper-middle class. By those definitions, I'd say most families earning $200,000 are middle class. In fact, it's so expensive to live here in Silicon Valley that plenty of families with two working spouses earn considerably more than $200,000 but cannot afford the lifestyle that we associate with the rich: servants, yachts, independent wealth, mansions, routine use of limousines, etc.

Sam Brasel, you and I will just have to agree to disagree. The Tax Foundation said they withdrew the post because of "content reasons". I read that as a euphemism for admitting that the post was incorrect. I don't think it's unusual for an organization to withdraw a release when they discover it to be erroneous.

IMHO deducing a flimflam from the Tax Foundation's decision not to release this article is kind of like deducing a flimflam from Mr. Obama's decision (until recently) not to release his LFBC.

Let me add the reason why I look at "middle class" as the non-rich (and, of course, not poor.) Some are claiming that the deficit could be cured by increasing taxes only on the rich. ISTM that when they say this, they are inviting an image of "the rich" as upper crust people with mansions, servants, yachts, etc., rather than families where the husband and wife both work full time in order to live an ordinary life.

E.g., IMHO it would be a "bait-and-switch" to claim that taxes would only increase for the "rich", but then have taxes increase for a married couple of New York City public school teachers whose combined income (before taxes) is $200,000 a year.

The lesson here would more appropriately be that you can tell many different stories from the same data, rather than the old trope about "lying with statistics", which implies malign intent where there may be none.

Near as I can tell, there were about 3,964 straight up teachers in New York City who made over $100,000 in 2009 (once we subtract out administrators, trainers and special education teachers). Those are going to be the ones that have over 22 years of experience and a lot of education. Looking at the 2008 number of 80,000 teachers in the NYC public school system, that would mean that about 5% of public school teachers in NYC make the $100,000 you posit. It's pretty dishonest of you to act like this is the norm, or anywhere near the majority of the people who would be impacted by marginal tax rates increasing on couples making over $200,000.

What's more, the top teacher salary in NYC is just barely over $100,000. If there was a married couple who both made that top rate, they'd only see taxes increase on the last $98 of their income.

But it's even worse than that! Assuming they don't itemize their deductions and don't have any kids, this couple is going to be able to deduct about $19,000 between their personal exemptions and standard deduction, so their taxable income is going to be somewhere in the $180,000 range.
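The arithmetic in the last two paragraphs can be written out. This sketch assumes the couple's combined pay is $200,098 (inferred from the "last $98" figure), about $19,000 of exemptions plus standard deduction, and a higher rate starting at $200,000 of taxable income, as in the discussion; none of these are official figures, and `income_over_threshold` is a hypothetical helper.

```python
def income_over_threshold(gross, deductions, threshold=200_000):
    """Return (taxable income, dollars of it above the threshold)."""
    taxable = gross - deductions
    return taxable, max(0, taxable - threshold)

print(income_over_threshold(200_098, 0))       # (200098, 98): before deductions
print(income_over_threshold(200_098, 19_000))  # (181098, 0): after deductions
```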

But you know all this.

The 2010 U.S. median household income was $31,111. That is where the middle of the middle class lives, not at $200,000.

When deciding whose taxes should be raised, I would ask these questions:

* Which class has benefited the most from tax cuts? The middle middle class, which received about $300 a year in Bush tax cuts, or those with higher incomes?

* Which income classes can better afford a tax hike? The middle middle class, which is struggling to keep a roof over its head, eat, stay healthy and educate its young, or those who can easily afford such basic necessities, and then some?

* Which income classes pay payroll taxes, which average about 30%, and which income classes pay taxes on income, capital gains and dividends, which are far lower? http://www.fool.com/investing/general/2010/10/06/warren-buffett-on-taxes.aspx

What is fairer, a wealthy person that pays taxes at a low rate telling a household earning $31,111 that their taxes should be raised to pay for the deficit that was exacerbated by tax cuts they've received since Reagan, or a household earning $31,111 asking that the wealthy pay taxes at the same rate that they do?

I agree, Fargus, that if tax rates are raised for brackets above $200,000, the hypothetical pair of top-pay-bracket NYC school teachers would pay at most a bit more tax. (Note that they could have AGI above $200,000 if they had summer jobs or earned dividends and interest on their savings.)

However, I was addressing the question of where middle class ends and rich begins. My point was that one wouldn't describe these two school teachers as "rich". Most of us would consider them to be upper middle class.

And my point was that your sob story about the pair of teachers living in NYC doesn't even say what you meant it to say.

This discussion on the whole is exceptionally intelligent and brings out valid points (on all sides). There are three separate arguments. One: How to display the data on income. Two: What an effective tax increase strategy would be. Three: What a fair tax increase strategy would be.

I'm a mathematician but not a statistician, so my comments are based on general principles (I hope).

The original WSJ display, even if based on IRS categories, is misleading. (I leave out the questions of intention and of what is customary.) One gets more useful information by using either a logarithmic income scale, where the division points multiply by an approximately constant factor (they can be rounded to nice numbers), or by population percentages. Both of these graphs complement each other.
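Division points that multiply by an approximately constant factor can be generated and then rounded to nice numbers. In this sketch the factor is assumed to be the cube root of 10 (about 2.15) and each edge is rounded to one significant figure, which turns the raw edges into the familiar 1-2-5 series; both choices are illustrative, not taken from Kasprak's follow-up or the IRS tables, and the function names are hypothetical.

```python
import math

def geometric_edges(start, factor, stop):
    """Edges start, start*factor, start*factor**2, ... not exceeding stop."""
    edges = [start]
    # Small tolerance so floating-point drift doesn't drop the final edge.
    while edges[-1] * factor <= stop * (1 + 1e-9):
        edges.append(edges[-1] * factor)
    return edges

def round_1_sig(x):
    """Round a positive number to one significant figure."""
    return round(x, -math.floor(math.log10(x)))

raw = geometric_edges(1_000, 10 ** (1 / 3), 10_000_000)
nice = sorted({int(round_1_sig(x)) for x in raw})
print(nice)  # 1,000, 2,000, 5,000, 10,000, ... up to 10,000,000
```

After rounding, the ratio between neighbouring edges alternates between 2 and 2.5, so the factor is only approximately constant, which is the trade-off for round-number labels.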

The observation that most income is in the middle (broadly defined) should not be surprising but is possibly not widely known, and it should be. The large share of income -- and the small number of people -- at the top should be widely known also.

The distribution of income does not imply tax policy.

Two remarks on policy: (1) You cannot tax people at the bottom heavily without severe consequences to their health and welfare. (2) There are valid reasons to tax the highest income levels the most heavily. The majority of those with such incomes think there are valid reasons not to do so. This is a public policy question that needs to be debated honestly instead of in sound bites. (I have my own opinion, of course, for which I could quote Ferdinand Lot in "The End of the Roman Empire and the Beginning of the Middle Ages" about the effect of Rome's extreme inequality of income distribution.)

Can we also all keep in mind that annual income and wealth are correlated but not perfectly so?

A 30-something making $50k/yr for a non-profit who also has access to a family vacation home and will receive a tax-free inheritance of $1MM is wealthier than a 30-something lawyer with a $150k salary, $150k in tuition debt, no family wealth and a need to care for an elderly parent.

Many people earning in the $100k - $300k range are actually working hard to save and do better than their parents did. I doubt many of them are collecting substantial "economic rents". However, it's likely that as incomes move higher and get into the millions the recipient is receiving some economic rents. For example, pro athletes, top entertainers, CEOs. They've all worked very hard to get where they are, but the salaries reflect a "winner take all" power law relationship that is indicative of economic rents.

Let's focus on taxing economic rents where we are unlikely to provide a disincentive to hard work.

Fargus writes "Which income classes pay payroll taxes, which average about 30%, and which income classes pay taxes on income, capital gains and dividends, which are far lower?"

Where does that 30% number come from? Last I checked over 40% of U.S. households paid no federal income tax ("payroll tax"). Many more pay below 20%. As for "income, capital gains and dividends", the "income" is taxed at the earner's income tax rate, just like wage income. For capital gains remember you get taxed on inflation too. For example, if you bought a stock for $100 in 1985 and sold it for $200 in 2010, you'd pay federal capital gains of $15 even though after adjusting for inflation you made nothing (so you end up losing on an after-tax basis). That's why I'm all for getting rid of any and all preferential taxes on capital gains and dividends in exchange for indexing the basis with inflation (just like COLA and many contracts do).
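The inflation example can be made concrete. This sketch assumes the 15% rate from the example and treats the price level as having exactly doubled between purchase and sale (the comment's premise, not a CPI lookup); `capital_gains_tax` and its `price_ratio` parameter are hypothetical.

```python
def capital_gains_tax(cost, proceeds, rate=0.15, price_ratio=1.0):
    """Tax on a gain; price_ratio > 1 indexes the cost basis for inflation."""
    indexed_basis = cost * price_ratio
    gain = max(0.0, proceeds - indexed_basis)
    return gain * rate

print(capital_gains_tax(100, 200))                   # nominal basis: 15.0
print(capital_gains_tax(100, 200, price_ratio=2.0))  # indexed basis: 0.0
```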

Remember too that all income, regardless of source, above $250k for a household will be subject to an *additional* tax of 2.9% beginning in a few years (part of the healthcare legislation). That is on top of federal tax rates over 30% plus rates in states like California over 9%.

"Last I checked over 40% of U.S. households paid no federal income tax ('payroll tax')."

No, you can't call "payroll tax" equivalent to income tax. Payroll taxes include FICA (Social Security and Medicare). FICA is normally a 7.65% deduction from pay, matched by a 7.65% contribution by the employer (which - one could very well argue - directly reduces what the employer is able/willing to pay). That tax totals 15.3% (on the first $106,800, above which the rates drop to 1.45% each, or 2.9% total). Those taxes have no deductions (as income taxes do).

For the 2011 tax year, the employee contribution is reduced to 5.65%, for a collective rate of 13.3%.

That's federal payroll taxes.
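The federal payroll-tax figures above can be computed directly. The rates and the $106,800 wage base are the ones quoted in the comment; `fica` is a hypothetical helper, and the sketch ignores the self-employed case.

```python
def fica(wages, ss_rate_employee=0.062, ss_rate_employer=0.062,
         medicare_rate=0.0145, wage_base=106_800):
    """Return (employee share, employer share) of Social Security + Medicare."""
    ss_wages = min(wages, wage_base)  # Social Security stops at the wage base
    employee = ss_wages * ss_rate_employee + wages * medicare_rate
    employer = ss_wages * ss_rate_employer + wages * medicare_rate
    return employee, employer

emp, er = fica(50_000)
print(f"2010 combined rate: {(emp + er) / 50_000:.1%}")  # 15.3%
emp, er = fica(50_000, ss_rate_employee=0.042)            # 2011 employee cut
print(f"2011 combined rate: {(emp + er) / 50_000:.1%}")  # 13.3%
```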

State/City/County/Transit District/Other Special Government District Taxes usually hit their top marginal rate at a pretty low income. Where I work, there's a ~9% state tax, another ~0.7% transit tax (matched by the employer, so it's more like ~1.4%), and I'm not subject to a city income tax - so my tax rate can be over 25% in payroll taxes before paying any federal income tax.

Jack E. Lope,

We all get an annual statement from social security stating our contribution levels and we all know that the more we contribute the more we receive in retirement. So it's very different from other general taxes. In fact, social security benefits in retirement are *proportionally* higher for lower contributors. That's just the way the formula works.

As for those top marginal rates starting at "a pretty low income", it sounds like you're saying that incomes above median are "pretty low", since I know of no state where the top marginal rate is at or below median. More importantly, for a taxpayer it's the actual effective tax rate that matters. Even if somebody is paying 9% on a sliver of income, that person may pay only, say, a 3% total rate on all income. You probably know this.
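To illustrate the marginal-versus-effective distinction, here is a sketch with a made-up three-bracket state schedule (1%, 4%, and 9%); it does not correspond to any real state's rates, and `tax_from_schedule` is a hypothetical helper.

```python
def tax_from_schedule(income, schedule):
    """Progressive tax; schedule is an ascending list of (bracket_start, rate)."""
    tax = 0.0
    for i, (start, rate) in enumerate(schedule):
        end = schedule[i + 1][0] if i + 1 < len(schedule) else float("inf")
        if income > start:
            tax += (min(income, end) - start) * rate
    return tax

HYPOTHETICAL_STATE = [(0, 0.01), (20_000, 0.04), (50_000, 0.09)]
income = 60_000
tax = tax_from_schedule(income, HYPOTHETICAL_STATE)
print(f"tax ${tax:,.0f}, top marginal rate 9%, effective rate {tax / income:.1%}")
```

Only the last $10,000 is taxed at 9%, so the effective rate on the whole $60,000 comes out under 4%.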

Hmmm, seems to me that some pretty smart people in these comments disagree about what the best measurement system for these statistics should be.

This in itself tells me that the issue is not as black and white as Brendan's original post suggested regarding how "misleading" the WSJ's chart was.

A nice lesson in how implying malign intent can itself be misleading.

MartyB, I agree that some pretty smart people disagree here. But that is a meaningless statement when one of those smart people, David in cal, has dissembled in post after post after post, not the least of which: (1) defends the WSJ piece by stating "Either way, the chart would show that those with AGI between 50K and 500K earned well over half the income" without stating why he selected that particular income range, (2) states "The [TF] post really deserves no more credibility than some internet rumor or anonymous e-mail" when it was neither a rumor nor anonymous but instead a formally published piece, (3) makes the obvious statement "In fact, it's so expensive to live here in Silicon Valley that plenty of families with two working spouses earn considerably more than $200,000 but cannot afford the lifestyle that we associate with the rich: servants, yachts, independent wealth, mansions, routine use of limousines, etc." [REALLY David? I guess such people really ARE middle class then! Who could have known?] It's not that his posts are outright dishonest; it's that they're chock-a-block full of dissembling remarks that are confusing or off-point or make little sense. Why a smart person would do this repeatedly, I do not know.

In any event, note that it was not Brendan who trashed the WSJ piece, but the Tax Foundation. What Brendan did was to express his surprise at the nature of the piece; report on various facts concerning its bizarre withdrawal; and partly foil the withdrawal by reproducing the piece.

