Comparison of personal income distributions reported by the IRS and Census Bureau

We have discussed many times in this blog that personal income is not well defined, and that the lack of a comprehensive definition rules out any accurate estimate of income inequality. Here we compare two data sets for 2001. The IRS reported 128,227,145 people with a cumulative income of $6.37E+12. The Census Bureau (CB) found 221,591,000 people with a total income of $6.45E+12, out of a working age population of 243,946,000. All figures published by the CB are obtained by projecting the roughly 200,000 people covered by the Current Population Survey onto the working age population as a whole using so-called population controls, and thus the higher incomes, which have just a few representatives in the sample, are subject to large biases. The Bureau of Economic Analysis (BEA) has estimated personal income in 2001 at $8.89E+12. Therefore the IRS and the CB both reported only about 72% of the gross personal income (GPI), based on 53% and 87% of the working age population, respectively. The BEA provides no personal income distribution (PID), and thus its data are worthless for the analysis of income inequality. The IRS and CB data sets are not comprehensive either, since they give two different and mainly independent slices of the personal income distribution. One might try to recover the overall PID from these data sets.

The IRS and CB provide PIDs in different income bins. This rules out a direct comparison of the two PIDs. The CB covers incomes between $0 and $250K, with bins of $10K below $100K and $50K above. The IRS distribution spans the interval between $0 and $10M, with the bin width varying from $1K to $5M. All incomes above $250K and $10M, respectively, fall into open-ended bins whose width cannot be determined. We have calculated two probability density functions (PDFs), one for the IRS and one for the CB, by dividing their PIDs by the widths of the income bins and by the total population. (We did not normalize to the total incomes because they are practically identical.) Figure 1 presents both PDFs. These curves represent the portion of the total population in $1 bins as a function of income. Between $15K and $40K, the PDFs practically coincide. Below $15K, the probability density reported by the CB is higher, and above $40K the IRS curve lies above the CB one. Both curves reveal a power law distribution above approximately $70K. This allows an extension of the CB curve above its limit of $250K by a power law function with an index of -3.34, as shown in Figure 2. From Figure 1, one can conclude that the excess of about 93,000,000 people in the CB’s PID is inherently related to low incomes. The IRS compensates for the total income deficit associated with the lack of low-income people by a larger portion of people with higher incomes. In that sense, the CB better covers the sources of low incomes and the IRS includes more accurate sources of incomes above $50,000.
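The density calculation and the power law extension described above can be sketched in Python. This is only an illustration under stated assumptions: the function names bin_density and power_law_extend are hypothetical, the bin counts are toy values rather than the published IRS or CB figures, and the reference density at $250K is made up. Only the index of -3.34 and the $250K limit come from the text.

```python
def bin_density(edges, counts, total_population):
    """Share of the total population per $1 of income in each closed bin.

    edges: bin boundaries in dollars, one more entry than counts.
    counts: number of people in each bin.
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than counts")
    densities = []
    for lo, hi, n in zip(edges[:-1], edges[1:], counts):
        width = hi - lo  # open-ended top bins have no width and are excluded
        densities.append(n / (width * total_population))
    return densities


def power_law_extend(x_ref, density_ref, index, x):
    """Extrapolate a density from a reference point along a power law."""
    return density_ref * (x / x_ref) ** index


# Toy counts (assumption): NOT the published IRS or CB numbers.
toy_edges = [0, 10_000, 20_000, 30_000]
toy_counts = [30, 50, 20]
toy_total = sum(toy_counts)
toy_pdf = bin_density(toy_edges, toy_counts, toy_total)

# Extension above $250K with the index quoted in the text.
# The reference density 1e-7 is a made-up placeholder.
extended = power_law_extend(250_000, 1e-7, -3.34, 500_000)
```

Dividing by the bin width is what makes distributions reported in different bins comparable: each value is a share of people per $1 of income, so the densities multiplied by their bin widths sum to 1 when every person falls into a closed bin.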

In order to construct a comprehensive definition, one should combine all sources of income over the whole income axis. The simplest way is to use the CB’s PID below and the IRS’ PID above some threshold. We have chosen the level of $75K because both the IRS and the CB have bins starting at this value. The numbers of people reported by the IRS and CB with incomes above $75K differ: 19,452,000 and 15,218,000, respectively. The former number is more accurate since the IRS includes almost all sources of high incomes, so we take the joint (merged IRS/CB) distribution at high incomes to be that reported by the IRS. The extra 4,234,000 people with incomes above $75K might be counted by the CB as having lower incomes. However, one cannot easily redistribute the CB’s PID by extracting these four million people. Therefore, we simply added 4,234,000 to the 221,591,000 reported by the CB in order to calculate the basis for the corresponding PDF. This is a crude approximation, but it should not introduce a large bias in the lower income bins since less than 2% is added. At lower incomes, we use the CB’s PID. Figure 3 shows the new merged PID (black line), which includes 225,000,000 people and a total income of $7,819B. The total income in the merged distribution is closer (89%) to the GPI.
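The merging step can be sketched in Python as follows. This is a hedged illustration, not the exact procedure used for Figure 3: the function merge_at_threshold and the (low, high, count) bin layout are hypothetical, and the toy bins are invented. The population figures are the ones quoted in the text.

```python
# Figures quoted in the text.
CB_TOTAL = 221_591_000        # people in the CB distribution
IRS_ABOVE_75K = 19_452_000    # IRS count with incomes above $75K
CB_ABOVE_75K = 15_218_000     # CB count with incomes above $75K

extra_people = IRS_ABOVE_75K - CB_ABOVE_75K   # people the CB places lower
merged_basis = CB_TOTAL + extra_people        # normalization basis
added_share = extra_people / CB_TOTAL         # relative size of the addition


def merge_at_threshold(cb_bins, irs_bins, threshold, basis):
    """Take CB bins below the threshold and IRS bins at or above it.

    Each bin is a (low, high, count) tuple (hypothetical layout).
    Returns (low, high, density) with density per $1 of income,
    normalized to the common basis.
    """
    merged = [b for b in cb_bins if b[1] <= threshold]
    merged += [b for b in irs_bins if b[0] >= threshold]
    return [(lo, hi, n / ((hi - lo) * basis)) for lo, hi, n in merged]


# Toy bins (assumption): NOT the published data.
toy_cb = [(0, 50_000, 600), (50_000, 75_000, 200), (75_000, 100_000, 50)]
toy_irs = [(0, 75_000, 500), (75_000, 100_000, 80), (100_000, 200_000, 40)]
toy_merged = merge_at_threshold(toy_cb, toy_irs, 75_000, basis=1_000)
```

Normalizing both halves to the same basis keeps the merged curve continuous in meaning: every value is a share of one common population, which is why the high-income part runs parallel to the IRS curve but at a lower level.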
Figure 1. PDFs for IRS and CB

Figure 2. The high income PDFs for the IRS and CB. The actual CB PDF is shown by red circles and is extended by a power law with an index of -3.34, as shown by yellow circles. The highest two values reported by the IRS lie above the power law distribution and are shown by blue squares with a red contour. The expected values are shown by yellow squares.
Figure 3. The merged personal income distribution. At lower incomes, we retained $5K bins instead of the $10K bins used in Figure 1. At higher incomes, the merged PID is parallel to that reported by the IRS but is much lower because the normalization basis has been increased from 128M to 225M people.

