Income distribution is a fundamental process in all economic systems. Conventional economic theories provide a variety of views on the mechanism driving the division of gross domestic product among economic agents. Income distribution at the personal level has not received much attention from mainstream economists, who focus on households. We do not share this approach and consider personal income to be the natural and indivisible level for theoretical consideration. The total income of families and households corresponds to a higher level of aggregation, and its evolution is prone to all the disturbances associated with fluctuations in household composition and average size over time. Therefore, we introduce and elaborate a concept describing the distribution of personal income and its evolution. Because of data availability, quality, and time coverage, the United States is the unavoidable choice for our study.
The main thread of our investigation follows the answer to one key question: is the configuration of personal incomes in the US the result of the distribution of a random part of nominal GDP, growing at a rate prone to stochastic external (in economics, exogenous) shocks, or does there exist a deterministic and fixed hierarchy of personal incomes whose evolution defines the rate of GDP growth? If the distribution is a stochastic process, together with the part of GDP related to personal incomes, i.e. gross personal income (GPI), one should develop a statistical approach. If the distribution is fixed and defines the overall growth of the economy, one would be able to formulate a deterministic (e.g. mechanical) model. In this Chapter, we try to show that the second answer is valid and that the evolution of each and every personal income is predictable, potentially as accurately as in classical mechanics.
We do not feel that economics as a science is currently able to provide adequate concepts and methods to analyze personal incomes in quantitative terms. So, we adopt an interdisciplinary approach, which has already proven fruitful in many scientific and technological areas. This success is not only due to the coincidence of the formal descriptions of various physical, chemical, biological, and sociological processes, but also reflects the existence of very deep common roots in nature. For example, the power law distribution of sizes is observed in economics (Pareto distribution), in the frequencies of words in longer texts, in seismology (Gutenberg-Richter recurrence curve), in geomechanics (fractured particle sizes), and in many other areas. Recent studies associate the power law distribution with the realization of stochastic processes known as "self-organized criticality" (SOC).
Economics and its numerous applications in real life demand a huge amount of numerical data in order to estimate the current state of a given economy and its future development. Such data have been gathered continuously from the very beginning of capitalism as an economic system, but the 20th century, and especially its second half, is characterized by a dramatic increase in the number of economic observations and measurements. The resulting data set has become an object of thorough study not only for professional economists but also for specialists in many other disciplines. There are many examples of the successful application of mathematical and physical methods from adjacent disciplines to the understanding of economic phenomena and processes.
Personal income distribution (PID) represents one of the high-quality sets of quantitative data, with a history of more than sixty years of continuous measurement with increasing accuracy. Irrespective of the nature of these data, even the simplest scatter plot reveals specific features that are often observed in physics: growth and fall are well approximated by exponential and power law functions. Some of these functions are solutions of ordinary differential equations, and thus one can presume that the processes behind the data can also be described by such equations. This makes it very attractive to apply standard methods of analysis and to model the evolution of personal incomes according to 'first principles' adopted in the natural sciences.
Among numerous possibilities, we selected the geomechanical model of a solid with inhomogeneous inclusions, proposed and developed by V.N. Rodionov and co-authors (1982), as an analogue of an economy expressed as a set of personal incomes. The economy plays the role of a solid body, and personal incomes correspond to inelastic stresses on the inclusions. We expected that some of the already available equations and solutions for a solid would provide an adequate description of incomes, and that some of the equations would need modification. The intuition behind this assumption was based not only on our professional experience in both disciplines but also on a formal equivalence of the PID in the United States and the Gutenberg-Richter recurrence curve.
The original geomechanical model describes the distribution of stresses in a solid by separating them into elastic and inelastic components. Inelastic stresses are concentrated only on inhomogeneous inclusions and play an important role in the processes of deformation and fracturing. In the model, the size distribution of inclusions, d(l), is chosen so that the total volume of inclusions is the same for any size l: $d(l) \sim l^{-3}$. In other words, the number of inclusions of a given size l (per unit volume) decreases in inverse proportion to the size cubed. This is a power law, or scale-free, size distribution. The lower limit of l is likely constrained by the characteristic length of an atom, and the largest size should be substantially smaller than the size of the solid.
The growth rate of inelastic stresses is proportional to the rate of elastic deformation. Inelastic stresses are irreversible and dissipate over time. This is a fundamental property of real solids: no stress or deformation can be retained forever, and even such hard rock as basalt undergoes plastic deformation and dissipation of stored energy. The defining property of the geomechanical model is the assumption that the rate of dissipation of inelastic stresses is inversely proportional to the size of the inclusion, i.e. the larger the inclusion, the longer the time needed to dissipate the same level of inelastic stress. (When applied to economics, this rule says that larger incomes are more resistant to decline, i.e. they decay at a lower rate than small incomes.) To simplify the relevant mathematics, only one deformation process with a constant rate is usually considered in the geomechanical model. The deformation is caused by some external forces, which provide a constant energy supply.
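The two rules stated above (growth proportional to the rate of elastic deformation, slower dissipation for larger inclusions) can be illustrated numerically. The sketch below is not the formulation of Rodionov and co-authors. It assumes, for illustration only, first-order relaxation, i.e. that the dissipation rate is proportional to the current stress divided by the inclusion size. The names `alpha`, `beta`, `strain_rate`, `inelastic_stress`, and `residual_stress` are hypothetical.

```python
import math


def inelastic_stress(t, size, strain_rate=1.0, alpha=1.0, beta=1.0):
    """Stress on an inclusion of a given size after loading for time t.

    Closed-form solution of the ASSUMED equation
        d(sigma)/dt = alpha * strain_rate - beta * sigma / size,
    with sigma(0) = 0 and a constant deformation rate.
    """
    # Level at which growth and dissipation balance each other.
    saturation = alpha * strain_rate * size / beta
    return saturation * (1.0 - math.exp(-beta * t / size))


def residual_stress(sigma0, t, size, beta=1.0):
    """Stress left after time t once the external loading has stopped.

    With the growth term switched off, the same assumed equation gives
    exponential decay with characteristic time size / beta.
    """
    return sigma0 * math.exp(-beta * t / size)
```

Under these assumptions the stress on an inclusion saturates at a level proportional to its size, and once loading stops, an inclusion ten times larger needs ten times longer to lose the same fraction of its stress.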
This geomechanical model has been adapted and modified for the purposes of economic modelling. Formally, the size of inclusion is interpreted as the size of some tool or means, which is used to generate or earn income. Such words as "generate", "produce", "earn" and their synonyms are equivalent in the framework of our model and express the assumption that the sum of all personal incomes is equal to GDP. The proposed model is a microeconomic model because it addresses the evolution of personal incomes depending on individual properties and conditions. On the other hand, when aggregated over the whole working age population, the model allows a macroeconomic level of consideration. Thus the model is a dual one expressing the fact that by definition Gross Personal Income (GPI) is equal to GDP. Here we assume that GPI is equal to Gross Domestic Income and there is no impersonal income, because any income, personal or corporate, ultimately has its personal owner who can use this income for consumption, saving or investment.
In contrast to the geomechanical model, observations of income force the size of earning means to be distributed uniformly from some nonzero minimum to a finite maximum value. Uniform distributions of sizes are not usual in physics. As a rule, larger objects are less frequent. Because the PIDs measured in the US and their aggregates are well predicted with a uniform size distribution of earning tools, we did not thoroughly analyze alternatives. It could be a good exercise for students, however.
In the microeconomic model, deformation caused by external forces is interpreted as the capability of a person to generate income, independent of the size of earning means. As an inherent characteristic of a person, it could hardly be changed under normal conditions. This property is related only to money earning and does not depend on other personal talents and deficiencies. In this sense, two persons with equal talent in some profession may have quite different salaries. Unlike talent, the capability to earn money is a measurable characteristic expressed in monetary units.
The income earned per year, or income rate, as an analogue of the inelastic stress concentrated on an inclusion, is proportional to the product of the size of earning means and the capability to earn money. These capabilities (or rates of external deformation) are also distributed uniformly among people of working age. Both the capabilities and the sizes of earning means grow as real GDP per capita grows. As a result, the evolution of the system of personal incomes is described by equations that include some features additional to those in the geomechanical model. The microeconomic model has the same functional dependence between defining variables and a similar formal solution. So, in mathematical terms, we are ready to start modelling personal incomes.
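The product rule described above can be made concrete with a small enumeration. The sketch below assumes evenly spaced (uniform) grids of relative sizes and capabilities. The grid bounds, the `scale` factor, and the function names `uniform_grid` and `income_rates` are hypothetical and are not the parameters of the model introduced in §1.3.

```python
import itertools


def uniform_grid(minimum, maximum, steps):
    """Evenly spaced values from minimum to maximum, both ends included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    step = (maximum - minimum) / (steps - 1)
    return [minimum + i * step for i in range(steps)]


def income_rates(sizes, capabilities, scale=1.0):
    """Income rate for every (size, capability) pair, sorted ascending.

    Each pair is treated as equally populated, which is what a uniform
    distribution of both properties implies.
    """
    return sorted(scale * s * c
                  for s, c in itertools.product(sizes, capabilities))
```

For example, with both properties taking the values 1, 2, and 3, the nine income rates are 1, 2, 2, 3, 3, 4, 6, 6, 9: the mean is 4 and the median is 3. Two uniform inputs thus already yield a skewed distribution of incomes.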
Before one starts quantitative modelling, a thorough investigation of data availability and quality should be carried out. No model can be proved valid or invalid when the relevant data do not provide appropriate resolution. Breakthroughs in the natural sciences always happen at the edge of resolution, leaving firm knowledge behind. Following this tradition, §1.2 is fully devoted to the assessment of data quality. The distribution of personal incomes is measured by various institutions, both governmental and private. We rely on the data gathered by the US Census Bureau in the March Supplements of the Current Population Surveys (CPS) since 1947. Other sources cover shorter periods or have gaps in measurements. Moreover, the Census Bureau provides the dependence on age, a feature most important for an evolutionary model. At the same time, there are numerous and severe deficiencies in the CPS data. The most painful and dangerous for the consistency of quantitative modelling is the incompatibility of data after any new revision to the CPS questionnaire: the unit of income measurement has been changing randomly through time. In physics, metrology was introduced several centuries ago and has served ever since as the backbone of any empirical investigation.
The microeconomic model is formally introduced in §1.3. This is the final result of an extended empirical investigation. Selecting an initial model from numerous alternatives, modifying it to match the bulk of observations, and estimating empirical parameters and coefficients required time and effort. In its computer version, the main loop of the model programmed in FORTRAN took around 25 lines. A few subprograms allow different levels of aggregation: from individual income to GPI. The programming is straightforward, and one can quickly repeat it using the defining equations and reported parameters. Real GDP per capita is the driving force of the model. Therefore, we do not need to integrate ordinary differential equations numerically; we use measured GDP instead.
To begin with, we test the predictive power of the model by estimating the overall PIDs in the United States. This is an intermediate level of aggregation, which disregards the dependence of individual incomes on age. Together with predicted PIDs, §1.4 introduces initial conditions for actual modelling. Initial values of defining parameters are obtained by a standard trial-and-error method. Since the Pareto law is an empirical one and is obtained directly from observations, the microeconomic model covers only the low income zone. This zone includes 90% of the working age population, however.
One of the basic results of §1.4 is the finding of a rigid hierarchy of personal incomes. When normalized to gross personal income and total working age population, all PIDs between 1994 and 2001 collapse to one curve. In other words, the normalized PID is an invariant. In classical mechanics, such invariants (in closed systems) as energy and momentum provide fundamental constraints on the possible evolution of the systems and also result in strict links between aggregate variables. These links are usually expressed in the eponymous equations of classical mechanics. One can refer to the representations given by Euler and Lagrange, for example. If similar invariants existed in a real economy, one could derive numerical conclusions and, likely, a sound theory.
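The collapse of PIDs onto one curve can be checked with a simple normalization. The sketch below shows one plausible reading, assumed here rather than taken from §1.4: income bin edges are expressed in units of mean income (GPI divided by the working age population), and bin counts are expressed as population shares. The function name `normalize_pid` and its argument layout are hypothetical.

```python
def normalize_pid(bin_edges, counts, gross_personal_income):
    """Return (edges in units of mean income, population share per bin).

    bin_edges has one more element than counts; counts[i] is the number
    of people with income between bin_edges[i] and bin_edges[i + 1].
    """
    if len(bin_edges) != len(counts) + 1:
        raise ValueError("bin_edges must have len(counts) + 1 elements")
    population = sum(counts)
    if population <= 0 or gross_personal_income <= 0:
        raise ValueError("population and income must be positive")
    mean_income = gross_personal_income / population
    normalized_edges = [edge / mean_income for edge in bin_edges]
    shares = [count / population for count in counts]
    return normalized_edges, shares
```

If two years differ only by a common scaling of all incomes and of the population, the two normalized distributions coincide, which is the sense in which the normalized PID would be an invariant.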
Understanding and modelling of age-dependent PID deserve special attention, as demonstrated in §1.5. There are really dramatic changes in the shape of PID: from a practically exponential fall in the youngest and oldest age groups to a piecewise function in the mid-age groups. All these features are successfully modelled for the period between 1994 and 2002. The success is further enhanced by the fact that the analysis and prediction were based on the same microeconomic model and parameters as obtained for the overall PID in §1.4. It is a formal quantitative validation of the model: it predicts beyond the set of data used for the estimation of empirical parameters and coefficients.
Therefore, the microeconomic model quantitatively describes the evolution (with age and over time) of each and every personal income as a function of the individual capacity to earn money, the size of earning means, and real economic growth. At this stage, the modelling of age-dependent PID was not accompanied by the explicit prediction of the level of income inequality.
In §1.6, a different set of data is modelled: the dependence of average and median income on work experience. This data set spans a longer period, starting in 1967. Here we first test the consistency of the model at higher incomes described by the Pareto law. The modelling meets significant difficulties related to changes in the portion of GPI in GDP and in the income definition in the CPS questionnaire. The revisions to the CPS and to population estimates after decennial censuses create artificial steps in the PIDs. Median income may be a more robust variable due to its lower sensitivity to higher incomes. Its dynamics are better predicted by the model. Overall, the dependence of mean and median income on work experience and its evolution over time validate the model.
§1.7 addresses several problems associated with the Pareto distribution. There is no general understanding or formal model of the processes leading to the power law distribution of personal incomes. This is a challenge for the future. However, there are several quantitative features of the Pareto distribution which can be modelled. Of crucial importance is the dependence of the portion of people in the Pareto distribution on work experience. Apparently, the youngest and oldest age groups should be characterized by lower portions than intermediate groups. The model is able to accurately predict this dependence and its evolution through time. It is another point in favor of the model.
Numerous quantitative features related to economic inequality are discussed in §§ 1.8 and 1.9. This type of inequality is an apparently inevitable and multi-dimensional phenomenon in any social system. Due to its practical and emotional importance for everyone, inequality attracts much attention from economists, politicians, and ordinary people. Economists focus on revealing potential quasi-deterministic or statistical links between economic inequality and numerous micro- and macroeconomic variables. There is no clear understanding of whether economic inequality is a positive or negative factor for such fundamental economic parameters as real economic growth, inflation, and unemployment (Galbraith, 1998).
Income inequality is one of the quantitative measures of economic inequality. There are many theories of inequality arising from the distribution of income. Neal and Rosen (2000) presented an almost comprehensive overview of the state of the art in this field. In spite of the efforts associated with the development of a consistent model of income distribution, some problems are yet to be resolved. Moreover, modern economic theories do not meet some fundamental requirements applied to scientific theories: a concise description of accurately measured variables and the prediction of their evolution beyond the period of currently available measurements.
In §1.8, we model the most popular aggregate measure of income inequality - the Gini coefficient, G, for PIDs in the United States. This coefficient is characterized by a number of advantages such as relative simplicity, anonymity, scale independence, and population independence. On the other hand, the Gini coefficient belongs to the group of operational measures: its evolution through time is not theoretically linked to macroeconomic variables and the differences in Gini coefficient observed between various countries are not well explained. These caveats make the Gini coefficient more useful in political and social applications but not in economics as a potentially quantitative (hard) science.
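For readers who want to reproduce such estimates, the Gini coefficient of a sample of individual incomes can be computed directly from the sorted values. The sketch below uses the standard rank-based formula for ungrouped data; it is not necessarily the procedure applied to the binned Census Bureau PIDs in §1.8, and the function name `gini` is hypothetical.

```python
def gini(incomes):
    """Gini coefficient of a list of non-negative incomes.

    Uses G = sum_i (2*i - n - 1) * x_i / (n * sum_i x_i),
    where x_1 <= ... <= x_n are the sorted incomes and i starts at 1.
    """
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    if n == 0 or total <= 0:
        raise ValueError("need at least one income and a positive total")
    if values[0] < 0:
        raise ValueError("incomes must be non-negative")
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(values, start=1))
    return weighted / (n * total)
```

The two properties named above are easy to verify with this function: multiplying every income by the same factor (scale independence) or replicating the whole sample several times (population independence) leaves the result unchanged.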
As a rule, the Gini coefficient is estimated from household surveys, and inequality is reported at family and household levels. Such an aggregation is affected by social and demographic processes, which may bias true economic mechanisms driving income inequality. Theoretically, the indivisible level for the study of income inequality is the personal income. In our framework, personal incomes are presumed to be sensitive only to macroeconomic variables.
§1.9 describes the data on personal income distribution in various age groups, presents estimates of the relevant Gini coefficients, elaborates on empirical PIDs, and compares the evolution of observed Gini coefficients to that predicted by our model. The age-dependent PID in the youngest group is characterized by large differences from the overall PIDs. Obviously, all individuals start with zero income, and the initial part of the income trajectory in time, as personal income observations show, is close to exponential growth. In the mid-age groups, PIDs are similar to the overall PID. In the oldest age group, the PID is also different and is closer to that in the youngest group. Accordingly, the Gini coefficient undergoes a substantial evolution from the youngest to the oldest age groups.
In §1.10, we join the lively discussion of increasing economic inequality in the United States. Our quantitative assessment of personal income inequality is quite different from that articulated by many economists. In §1.8, we estimate the Gini coefficient using the personal income distributions reported since 1947 by the US Census Bureau and find that this coefficient has been practically constant over time. Given a constant Gini coefficient since (at least) 1947, one might find it strange that other researchers and the media thoroughly discuss increasing inequality during the last 25 years. It is difficult to understand why those researchers do not use the US Census data even though the Census Bureau (2004) explicitly states:
Because of its detailed questionnaire and its experienced interviewing staff trained to explain concepts and answer questions, the CPS ASEC is the source of timely official national estimates of poverty levels and rates and of widely used estimates of household income and individual earnings, as well as the distribution of that income.
Paul Krugman, the 2008 Laureate of the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel, explained why he and other researchers are forced to deny the estimates based on Census Bureau data:
First, because Census data are based on a limited sample, not the whole population, they're unreliable in tracking the income of small groups – and the really rich are a small group, who just happen to bulk large in the economy. Second, the questionnaire is "top-coded": if the individual interviewed has earnings higher than $999,999, those earnings are recorded simply as $999,999. Since a lot of income growth in the last few decades has taken place among people with multimillion-dollar incomes, the Census data miss an important part of the story.
In practical and theoretical terms, both statements (reasons) are wrong. First, in the hard sciences, one is not often able to measure the true values of desired variables, but usually measures some portion of them. For example, nobody tries to invent a weighing machine in order to measure the Earth's mass. It is enough to measure gravitational acceleration at one point, since this acceleration is proportional to the total mass. Therefore, if a portion of a whole object is representative and is measured consistently over time, one can carry out a reliable quantitative analysis. A randomly changing portion, as sometimes happens to macroeconomic variables after the introduction of new definitions, would obviously ruin any such quantitative analysis. So, using surveys of small population samples does create a problem with internal precision, but should not necessarily disturb the results of the overall quantitative analysis. This Chapter provides extensive quantitative results which confirm that the Census Bureau has been collecting high-quality data.
Second, the "top-coded" approach does not harm the estimates of income in "the richest of the rich" group. This effect has been known for more than a hundred years. Higher incomes are very accurately distributed according to the Pareto law. As a matter of fact, one does not need to measure any personal income in the high-income group. One only needs to estimate the number of persons with income above some given (high) threshold. Then, one can use simple mathematical equations to obtain an accurate population density at any income level and also the total income above any threshold.
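The "simple mathematical equations" mentioned above follow from the standard properties of a Pareto tail. The sketch below assumes that the Pareto index `k` is known (estimated elsewhere) and larger than 1, and that `n0` persons are counted above the threshold `x0`. The function names and argument names are hypothetical.

```python
def count_above(n0, x0, x, k):
    """Number of persons with income above x >= x0 in a Pareto tail."""
    if x < x0:
        raise ValueError("x must not be below the threshold x0")
    return n0 * (x / x0) ** (-k)


def density_at(n0, x0, x, k):
    """Number of persons per unit of income at level x >= x0."""
    if x < x0:
        raise ValueError("x must not be below the threshold x0")
    return n0 * k / x0 * (x / x0) ** (-k - 1.0)


def total_income_above(n0, x0, x, k):
    """Total income of all persons with income above x >= x0.

    Requires k > 1; the mean income above x is then k * x / (k - 1).
    """
    if k <= 1.0:
        raise ValueError("total income is finite only for k > 1")
    return count_above(n0, x0, x, k) * k * x / (k - 1.0)
```

Only the head count above the threshold enters these formulas. As long as the threshold lies below the top-code value, a top-coded record is still counted correctly, so the recorded amount itself is not needed.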
The logic of presenting a new model in a book differs fundamentally from that of the scientific research itself. When studying some process or phenomenon, one does not possess complete knowledge about the defining relationships and parameters. This state of incomplete knowledge gives birth to numerous questions and problems one has to address during the study. When the research is finished, the obtained model should accurately describe the corresponding observations, and all (or almost all) wrong hypotheses and irrelevant assumptions are eliminated. As a result, the presentation of a finished model usually skips all unnecessary details and is focused on a comprehensive description of relevant relationships and parameters. Following this tradition, in Chapter 1 we first present our model and then demonstrate how accurately it predicts various quantitative properties of personal income distributions in the US as related to some measured macroeconomic and demographic parameters. The model is validated according to standard procedures involving comparison of predicted and observed data.
Bureau of Economic Analysis, (2005). Current-Dollar and "Real" Gross Domestic Product, (Seasonally adjusted annual rates), table, last modified 05.25.05, http://bea.gov/bea/dn/gdplev.xls
Bureau of Economic Analysis, (2008). National Economic Accounts, tables, retrieved on March 30, 2008
Census Bureau, (2000). The Changing Shape of the Nation's Income Distribution, retrieved on February 26, 2007 from http://www.census.gov/prod/2000pubs/p60-204.pdf
Census Bureau, (2002). Technical Paper 63RV: Current Population Survey Design and Methodology, issued March 2002, http://www.census.gov/prod/2002pubs/tp63rv.pdf
Census Bureau, (2002). Source and Accuracy of the Data for the March 2002 CPS, http://www.bls.census.gov/cps/ads/2002/S&A_02.pdf
Census Bureau, (2004). Detailed Income Tabulations from the CPS. Last revised: August 26 2004; http://www.census.gov/hhes/income/dinctabs.html
Census Bureau, (2004). Methodology: National Intercensal Population Estimates. Last revised: August 20, 2004 at 07:19:10 AM. http://www.census.gov/popest/archives/methodology/intercensal_nat_meth.html
Census Bureau, (2004). Historical Income Tables - People. (Table) P-9. Age-People (All Races) by Mean Income and Sex: 1967 to 2001. Last revised: Thursday 13-May-2004 11:31:11 EDT.
Census Bureau, (2004). Historical Income Tables - People. (Table) P-10. Age--People (Both Sexes Combined--All Races) by Median and Mean Income: 1974 to 2003. Last revised: Thursday 13-May-2004 11:31:11 EDT.
Census Bureau, (2004). Guidance on Differences in Income and Poverty Estimates from Different Sources, August 19, 2004
Census Bureau, (2005). U.S. Interim Projections by Age, Sex, Race, and Hispanic Origin. Table. Last revised: 8:40 am on May 13th.
Census Bureau, (2005). Changes in Methodology for the March Current Population Survey. Last Revised: May 13, 2005.
Census Bureau, (2007). Population Estimates. Retrieved March 14, 2007 from http://www.census.gov/popest
Census Bureau, (2008). Current Population Reports. Consumer Income Reports from 1946-2006 (P60), http://www.census.gov/prod/www/abs/income.html
Internal Revenue Service, (2007). Selected Income and Tax Items from Inflation-Indexed Individual Tax Returns. All Returns: Sources of Income, Adjustments, and Tax Items in Constant 1990 Dollars, http://www.irs.gov/taxstats/indtaxstats
Mechanical model of PID: Introduction