The IRS and CB provide PIDs in different income bins, which precludes any direct comparison of the two. The CB covers incomes between $0 and $250K, with bins of $10K below $100K and $50K above it. The IRS distribution spans the interval between $0 and $10M, with the bin width varying from $1K to $5M. All incomes above $250K (CB) and $10M (IRS) are covered by open-ended bins whose width cannot be determined. We have calculated two probability density functions (PDFs), one for the IRS and one for the CB, by dividing each PID by the widths of its income bins and by the total population. (We did not normalize to the total incomes because they are practically identical.)
Figure 1 presents both PDFs. These curves represent the portion of the total population in $1 bins as a function of income. Between $15K and $40K, the PDFs practically coincide. Below $15K, the probability density reported by the CB is higher, and above $40K the IRS curve lies above the CB one. Both curves reveal a power law distribution above approximately $70K. This allows the CB curve to be extended beyond its limit of $250K by a power law function with an index of -3.34, as shown in Figure 2. From Figure 1, one can conclude that the excess of 93,000,000 people in the CB’s PID is inherently related to low incomes. The IRS compensates for the total income deficit associated with the lack of low-income earners by a larger portion of people with higher incomes. In that sense, the CB better covers the sources of low incomes and the IRS includes more accurate sources of incomes above $50,000.
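The normalization described above (bin count divided by bin width and by total population) and the power law extension can be sketched in a few lines of Python. This is only an illustration: the function names, the bin edges, and the counts are hypothetical and are not IRS or CB data, and the sketch assumes that the index of -3.34 applies to the density itself.

```python
def pid_to_pdf(bin_edges, counts, total_population):
    """Turn per-bin head counts into a density per $1 of income.

    bin_edges has one more entry than counts. Open-ended bins must be
    left out because their width is undefined.
    """
    if len(bin_edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than counts")
    densities = []
    for low, high, count in zip(bin_edges[:-1], bin_edges[1:], counts):
        width = high - low  # bin width in dollars
        densities.append(count / (width * total_population))
    return densities


def power_law_extension(income, anchor_income, anchor_density, index=-3.34):
    """Extrapolate a density beyond anchor_income with a power law."""
    return anchor_density * (income / anchor_income) ** index


# Hypothetical counts, for illustration only (not IRS or CB data).
edges = [0, 10_000, 20_000, 30_000]
counts = [30, 50, 20]
pdf = pid_to_pdf(edges, counts, total_population=sum(counts))

# Doubling the income scales the extrapolated density by 2 ** -3.34.
ratio = power_law_extension(500_000, 250_000, 1.0)
```

Because each density is divided by the bin width, distributions reported in bins of different widths become comparable on one plot, which is the purpose of Figure 1.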
In order to construct a comprehensive definition, one should combine all sources of income over the whole income axis. The simplest way is to use the CB’s PID below and the IRS’ PID above some threshold. We have chosen the level of $75K because both the IRS and CB have bins starting at this value. The numbers of people reported by the IRS and CB with incomes above $75K differ: 19,452,000 and 15,218,000, respectively. The former number is more accurate since the IRS includes almost all sources of high incomes, so we take the joint (merged IRS/CB) distribution at high incomes to be that reported by the IRS. The extra 4,234,000 people with incomes above $75K might be counted by the CB as having lower incomes. However, one cannot easily redistribute the CB’s PID by extracting these four million people. Therefore, we simply added 4,234,000 to the 221,591,000 reported by the CB in order to calculate the basis for the corresponding PDF. This is a crude approximation, but it should not introduce a large bias in the lower income bins since less than 2% is added. At lower incomes, we use the CB’s PID. Figure 3 shows the new merged PID (black line), which includes about 225,000,000 people and $7,819B of total income. The total income in the merged distribution is closer (895) to the GPI.
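The arithmetic behind the merged basis, together with one possible way to splice the two distributions at the $75K threshold, is sketched below. The head counts are those quoted in the text. The `merge_pid` helper, the `(low, high, count)` bin layout, and the sample bins are hypothetical and serve only to illustrate the procedure.

```python
THRESHOLD = 75_000          # both sources have a bin starting here
CB_TOTAL = 221_591_000      # people reported by the CB
IRS_ABOVE = 19_452_000      # IRS count with incomes above $75K
CB_ABOVE = 15_218_000       # CB count with incomes above $75K

extra = IRS_ABOVE - CB_ABOVE      # 4,234,000 people
merged_basis = CB_TOTAL + extra   # 225,825,000, roughly 225M
added_share = extra / CB_TOTAL    # about 0.019, i.e. less than 2%


def merge_pid(cb_bins, irs_bins, threshold=THRESHOLD):
    """Keep CB bins below the threshold and IRS bins at or above it.

    Each bin is a (low, high, count) tuple; high is None for an
    open-ended bin.
    """
    low_part = [b for b in cb_bins if b[1] is not None and b[1] <= threshold]
    high_part = [b for b in irs_bins if b[0] >= threshold]
    return low_part + high_part


# Hypothetical bins, for illustration only (not IRS or CB data).
cb_bins = [(0, 50_000, 100), (50_000, 75_000, 40),
           (75_000, 100_000, 10), (100_000, None, 5)]
irs_bins = [(0, 75_000, 90), (75_000, 200_000, 20), (200_000, None, 8)]
merged = merge_pid(cb_bins, irs_bins)
```

In this sketch the CB counts below $75K are left unchanged, as in the text, so only the normalization basis grows by the extra 4,234,000 people.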
Figure 1. PDFs for IRS and CB
Figure 2. The high income PDFs for the IRS and CB. The actual CB PDF is shown by red circles and is extended by a power law with an index of -3.34, as shown by yellow circles. The highest two values reported by the IRS lie above the power law distribution and are shown by blue squares with a red contour. The expected values are shown by yellow squares.
Figure 3. The merged personal income distribution. At lower incomes, we retained $5K bins instead of the $10K bins used in Figure 1. At higher incomes, the merged PID is parallel to that reported by the IRS but is much lower because the normalization basis has been increased from 128M to 225M people.