1.1. The model flexibility
Our microeconomic model has many degrees of freedom although it is driven by a single exogenous variable, real GDP per capita. One can change the distribution of the capabilities to earn money and the sizes of work capital, or scale them in a different way. The dissipation factor, α, has to be estimated from data, like all other constants and variables in the model, but one can also change it according to one's own understanding of income discounting. The critical age, Tc, can be changed as well as its functional dependence on GDP. The index of the exponent describing the fall beyond the critical age is difficult to estimate because the data for the youngest and eldest population are sparse and of low accuracy. All parameters are adjustable and do change the model outcome. But this change is not random and cannot fit an artificially designed personal income distribution. The changes associated with the adjustment of model parameters have to fit actual observations in the U.S., and they do fit these observations.
For the overall population, we have estimated the best-fit set of constants and variables. Any change reduces the best fit with observations. At the same time, the male and female PIDs, their derivatives and aggregates are so different that some changes in defining parameters of the original model are inevitable. Moreover, there are a few new features associated with the female income distribution, which we did not observe in the overall PID. These features have to be included in the model, but they should not disturb the results of the original model. In this Subsection, we demonstrate how the original model reacts to the change of some defining parameters in view of the new features discussed in Section 2.
In this study, we consider males and females separately. They differ in the age pyramid, which is an important exogenous parameter of the model. Crudely, women make up approximately 51% of the total working age population in the U.S., and thus the males’ share is 49%. For the sake of simplicity, we multiplied the overall population pyramid by factors of 0.51 and 0.49 and obtained the female and male age pyramids, respectively. This approach may introduce some minor errors in the gender ratio for some ages, but these errors are smaller than those related to income measurements and GDP estimates. Moreover, the population pyramid does not affect the mean income predictions, which are based on the proportional distribution of people over the capability to earn money and the sizes of work instruments.
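The constant-share split described above can be sketched in a few lines. This is only an illustration: the function name and the toy population counts are hypothetical, and only the 0.51/0.49 shares come from the text.

```python
def split_pyramid(total_by_age, female_share=0.51):
    """Split an overall age pyramid into female and male pyramids
    using one constant share for every age (the simplification in the text)."""
    female = {age: n * female_share for age, n in total_by_age.items()}
    male = {age: n * (1.0 - female_share) for age, n in total_by_age.items()}
    return female, male

# Hypothetical counts (thousands of people), for illustration only.
toy_pyramid = {20: 4000.0, 40: 4200.0, 60: 3000.0}
female_pyramid, male_pyramid = split_pyramid(toy_pyramid)
```

Because the same share is applied at every age, the two pyramids always sum back to the overall one, and any age-specific deviation of the true gender ratio is ignored by construction.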
The initial value of the critical age, Tc, is borrowed from the overall model and is fixed to 19.07 years in 1930 in all models [Kitov and Kitov, 2013]. In some versions of the gender-dependent model we move the start year to 1960. Then the initial value of critical age is changed according to (19), i.e. as the square root of the cumulative change in the corrected real GDP per capita. It makes 26.08 years in 1960. This value is also fixed in the model. The index of the Pareto law is fixed to k=3.35 for both sexes and does not depend on calendar years and age. As we found in the previous Section, the largest deviations in k are observed for ages having extremely low representation in top incomes. Females’ participation in the Pareto zone is low throughout the entire 20th century. As a result, constant k does not affect the accuracy of model predictions.
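The square-root scaling of the critical age can be illustrated with a short sketch. It assumes the relationship has the form Tc(t) = Tc(t0)·sqrt(Y(t)/Y(t0)), which is how the text paraphrases (19). The function names are hypothetical, and the GDP ratio below is back-calculated from the two quoted values of Tc rather than taken from data.

```python
import math

def critical_age(tc_start, gdp_ratio):
    """Critical age after real GDP per capita has grown by gdp_ratio
    relative to the start year (assumed square-root dependence)."""
    return tc_start * math.sqrt(gdp_ratio)

def implied_gdp_ratio(tc_start, tc_end):
    """Cumulative GDP per capita growth implied by two critical ages."""
    return (tc_end / tc_start) ** 2

# Back-calculation from the values quoted in the text (19.07 and 26.08 years).
ratio_1930_1960 = implied_gdp_ratio(19.07, 26.08)
tc_1960 = critical_age(19.07, ratio_1930_1960)
```

Under this assumption, the move from 19.07 to 26.08 years corresponds to a cumulative growth factor of roughly 1.87 in the corrected real GDP per capita between 1930 and 1960.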
In Section 2, we discussed the possibility that the size of instruments used to earn money may be smaller for women than for men. The instrument size affects the time needed to reach a given threshold as well as the level of income one can obtain. In order to change the instrument size we introduce a scaling factor, FL, and multiply all standard sizes by this factor. Figure 23 presents the evolution of predicted mean income in 1930 and 2011 for three different cases: FL=0.5, 1.0, and 1.5. In order to keep the portion of the population above the Pareto threshold at a constant level, we scale the standard MP=0.43 by the same factors as the instruments. As a result, MP in Figure 23 has values 0.215, 0.43, and 0.645 for FL=0.5, 1.0, and 1.5, respectively.
The increase of all standard sizes Lj by a factor of 1.5 results in a larger relative income, which just scales the mean income curve: the ratio of the peak incomes in 1930 and 2011 is retained in all three cases. The critical age Tc also does not change because it depends only on GDP. The slopes of the two FL=1.5 curves in Figure 23 decrease relative to those in the standard model with FL=1.0, however. This is the same effect as observed with increasing real GDP per capita in the overall mean income curves (Figure 1). All in all, the increased sizes of work capital do not create new effects.
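The synchronized scaling of the Pareto threshold is simple arithmetic, reproduced here as a sketch. The names are hypothetical; the standard threshold 0.43 and the three FL values are those given in the text.

```python
STANDARD_MP = 0.43  # standard Pareto threshold quoted in the text

def scaled_threshold(fl, standard_mp=STANDARD_MP):
    """Pareto threshold scaled by the same factor FL as the instrument sizes."""
    return fl * standard_mp

# Thresholds for the three cases shown in Figure 23.
thresholds = {fl: round(scaled_threshold(fl), 3) for fl in (0.5, 1.0, 1.5)}
```

Scaling MP together with the instrument sizes keeps the threshold at the same position relative to the attainable incomes, which is why the portion of people above it is retained.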
Figure 23. Mean income as a function of age in 1930 and 2011. The size of work instrument is multiplied by a factor FL=0.5, 1.0, and 1.5. The Pareto thresholds MP are also multiplied by the same factor in order to retain the portion of people with the highest incomes. They are 0.215, 0.43 and 0.645, respectively. Notice a shelf in the mean income curve for 1930 with FL=0.5.
Figure 24. The number (left panel) and portion (right panel) of people above the Pareto threshold as a function of age in 1930 and 2011. The size of work instrument is multiplied by a factor FL=0.5, 1.0, and 1.5. The Pareto thresholds MP are also multiplied by the same factor in order to retain the portion of people with the highest incomes. They are 0.215, 0.43 and 0.645, respectively.
For smaller sizes, the mean income curves become different. They contain periods when mean income does not change: between 5 and 18 years of work experience in 1930 and from 20 to 37 years in 2011. This is the effect we have found in the females’ mean income in the 1960s and 1970s (see Figure 15). As we know from the discussion in Section 1, smaller instruments imply faster growth of all incomes. The initial segments of mean income demonstrate steeper growth to maximum values for all earning capabilities Si from 2 to 30. The lowered Pareto threshold suggests that everyone who could reach it in normal conditions (FL=1.0) does achieve it, but much faster. When all people reach their maximum incomes, including those in the Pareto zone, the model suggests no further changes before the critical age. With growing GDP, both Tc and the time when all incomes reach their maximum increase. Accordingly, the start point and duration of the two shelves in Figure 23 both increase.
The portion of people above the Pareto threshold is presented in Figure 24. Both curves for FL=0.5 are characterized by early and steep growth. The number of people is displayed in the left panel. Here, we use the males’ age pyramid, and all observed fluctuations are associated with the varying size of the population rather than with the model parameters. It is instructive that the total number of people increases with falling FL: faster income growth brings younger people into the Pareto zone. The number above MP at a given age does not affect the mean income since the portion does not depend on the total number. That is why the mean income curves in Figure 23 are smooth.
When the Pareto threshold is retained at the same level for all FL, the number of people is much lower for FL=0.5, as Figure 25 demonstrates. This might be the reason for the very low representation of women in the top incomes in the 20th century. For FL=0.5, the number of people is just marginally above zero in 1930 and 2011, as was observed in the females’ distribution in Figure 18. So, the lowered sizes of work instruments available for women in the U.S. result in a very low number of women with top incomes.
Figure 25. The number of people above the Pareto threshold as a function of age in 1930 and 2011. The size of work instrument is multiplied by a factor FL=0.5, 1.0, and 1.5. The Pareto thresholds MP=0.43 for all cases.
In Section 3.2, we demonstrate that, by changing the size of the instruments people use to earn money together with a synchronized change (or no change) in the Pareto threshold, our model is able to explain qualitatively some striking differences in income distribution observed for men and women. This is a good basis for accurate quantitative prediction of the principal features observed in the males’ and females’ PIDs and their derivatives. The age dependence of the mean income and the portion above the Pareto threshold are likely the most prominent features, and they demonstrate a secular evolution coherent with the growth in real GDP per capita. At first glance, the male population in the U.S. demonstrates simpler behaviour. We begin gender-dependent modelling with the prediction of men’s personal incomes.
There are some new features we can predict using the original setting of our model. Figure 14 shows that the male mean income curves have the same critical age and rate of growth as the total population. In 2013, the critical age is slightly larger for men. One may suggest that men in the U.S. economy use larger work capitals than women. In essence, their work instruments create the size distribution defined in Section 1. Within our framework, men’s incomes can be modeled with standard instruments.
In Figure 26, we present the predicted and observed mean income curves for 1962, 1977, 1987, and 2011. All defining parameters are the same as in the original model, with 1930 as the start year. There are no income microdata before 1962, so we can compare our predictions with actual measurements only from that year on. The fit between the curves depends on age and year. For the model, the most important part of income evolution is before the critical age. This segment is the most sensitive to defining parameters, including the critical age. Therefore, the almost perfect match observed before the critical age during the period from 1962 to 2014 shows that our model predicts personal incomes of the male population in the U.S. with very high accuracy.
The change in the slope and shape of the initial segments is described precisely as a function of GDP. Real economic growth leads to slower relative income growth, as was already found in the overall model. The critical age, i.e. the age of peak mean income, has grown from 43 years in 1962 and is currently above 56 years. We expect a further increase in the age of peak income.
There is a common feature observed in all measured curves above the critical age, which is best highlighted by comparison with the predicted curves following the exponential fall defined by TA=60 years of work experience and A=0.65 (see equation (18) for details). At an age that rises from 64 in 1962 to 68 in 2011, the measured curves experience a sharp drop to a level between 0.4 and 0.5. The period between Tc(t) and this specific age decreases with time, from 20 years in 1962 to 12 years in 2011. In Figure 13, this effect was ironed out by smoothing with MA(7). As we discussed for the female population, the reason behind this drop is likely associated with retirement. Retirees have a constant income as a share of their work incomes. The time and amplitude of the drop can then be explained by the performance of the social security system.
Figure 26. Comparison of measured and predicted mean income as a function of age. Selected years between 1962 and 2011 are presented. The measured and predicted curves start to diverge above 64 in 1962 and above 68 years of age in 2011.
The overall fit between the predicted and observed mean income suggests that the portion of people above the Pareto threshold should also be accurately estimated. The overall model exactly predicts the number of people of a given age above the Pareto threshold. One can directly calculate their total sub-critical income as a sum of all Mij(τ,t)>MP, as if they did not move to the super-critical regime of income distribution. The net gain obtained by these people when they move to the super-critical power law distribution can be calculated since we have the number of people for each age and the overall index k.
Obviously, the net gain is constant for a given MP since the sub- and super-critical total incomes are fixed. For the original model the ratio of the super and subcritical total incomes is 1.33 for MP=0.43 [Kitov, 2009]. As a result, we do not need to calculate individual incomes in the super-critical power law distribution to get an estimate of the total contribution of rich people to the mean income. We just need to multiply all Mij(τ,t)>MP by a factor of 1.33. Hence, when the mean income dependence on age is accurately predicted we believe that the number of people above the Pareto threshold is also accurately described as a function of age.
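The shortcut described above, multiplying every model income above MP by a constant factor instead of simulating individual Pareto incomes, can be sketched as follows. The function name, the toy incomes, and the population weights are hypothetical; only MP=0.43 and the factor 1.33 come from the text.

```python
MP = 0.43           # Pareto threshold quoted in the text
PARETO_GAIN = 1.33  # ratio of super- to sub-critical total income quoted in the text

def mean_income_with_gain(incomes, weights, mp=MP, gain=PARETO_GAIN):
    """Population-weighted mean income in which each model income above mp
    is multiplied by a constant gain (the shortcut described in the text)."""
    adjusted = [m * gain if m > mp else m for m in incomes]
    return sum(w * m for w, m in zip(weights, adjusted)) / sum(weights)

# Hypothetical sub-critical model incomes and population shares for one age.
toy_incomes = [0.1, 0.3, 0.5]
toy_weights = [0.5, 0.4, 0.1]
toy_mean = mean_income_with_gain(toy_incomes, toy_weights)
```

In this toy case only the income 0.5 exceeds the threshold, so the mean rises from 0.22 to 0.2365. The size of the uplift depends only on the portion of people above MP, which is why an accurately predicted mean income curve constrains that portion.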
Figure 27. Comparison of measured and predicted number of people above the Pareto threshold. Actual thresholds are $11,000 in 1962 and $87,000 in 2010.
Figure 27 illustrates this statement. In the left panel, the measured and predicted number of males is presented as a function of age for 1962 and 2010. The overall fit is excellent considering that only one parameter (real GDP) describes the whole variety of changes (e.g., inflationary periods, recessions, high and low oil prices, changes in fiscal and budgetary rules, varying accuracy of all involved measurements, and revisions to all involved variables, among many others) during the past half-century. Moreover, to predict the number of the older males the model involves their almost 50-year history of work experience, which includes the real GDP per capita time series since 1910. Figure 4 illustrates the effect of work experience history on income in a given year. In the right panel of Figure 27, we demonstrate the difference in the critical ages. Unfortunately, the age estimates are subject to bias because of high-amplitude fluctuations, which are related to the data scarcity at higher incomes and are partially induced by topcoding.
The drop at the age of retirement is not seen in Figure 27. This observation suggests that retirement does not affect the processes in the Pareto distribution. At the same time, the fall in the number of people above the threshold beyond the retirement age is accurately predicted by our model, which uses the capability to earn money and the size of work capital as defining parameters evolving as the square root of the real GDP per capita. Therefore, the effect of retirement is likely an artificial feature related to sub-critical incomes, which should be incorporated into the model as it is.
The difference between the male and female income distributions in the U.S. is a well-established fact. We do not consider the reasons behind the catastrophic disparity, but have to stress that the work capabilities used in the model to predict personal incomes are likely the same for men and women. From a technical point of view, this difference allows us to improve the model and introduce new features which were not observed in the overall PID and its aggregates and derivatives. The updated model successfully meets a number of challenges.
There are features in the females’ curves in Section 2 which likely manifest some changes in parameters that the original setting treats as constant. The convergence trend of the male and female PIDs observed since the early 1960s suggests that the size of work instruments available for women has been growing. It has not reached the level of the males’ instruments, however. So far, we have used fixed relative sizes of work instruments. So, our model has to include a new option allowing the sizes to change according to some predefined function of time or real GDP.
It is reasonable to start with an approximate estimate of the FL that best fits the observed features for different years. We have calculated a number of models with FL changing from 0.2 to 1.0 with a 0.05 step and all other parameters as in the original model. As we know from Section 2, the most sensitive part of the mean income dependence on age is the initial segment. Figure 28 depicts the measured and predicted mean income curves for 1962 and 1977 as obtained with FL=0.45. This FL value best describes the dynamics of mean income growth in the youngest population and also reproduces the feature of constant mean income before the critical age. The predicted curves start to fall long before the observed critical age, however. But Tc is controlled by a different dependence on real GDP per capita and does not influence the initial growth. So, the estimate of FL=0.45 is not compromised by the deviation between the measured and predicted critical ages, which we address next.
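The scan over FL can be pictured as a plain grid search. The sketch below is only an assumption about how such a scan might be organized: the text does not state the fit criterion, so the sum of squared residuals is used here for illustration, and `toy_predict` is a hypothetical stand-in for the actual income model.

```python
def fl_grid(start=0.20, stop=1.00, step=0.05):
    """FL values from start to stop inclusive, as in the scan described in the text."""
    n = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(n)]

def best_fl(observed, predict, grid):
    """Return the FL from grid that minimizes the sum of squared residuals.
    predict(fl) must return a curve with the same length as observed."""
    def sse(fl):
        return sum((o - p) ** 2 for o, p in zip(observed, predict(fl)))
    return min(grid, key=sse)

# Hypothetical stand-in model: the curve simply scales with FL.
base_curve = [0.2, 0.5, 0.8, 1.0]

def toy_predict(fl):
    return [fl * x for x in base_curve]

grid = fl_grid()
toy_observed = toy_predict(0.45)
fl_hat = best_fl(toy_observed, toy_predict, grid)
```

In practice the residuals would be restricted to the initial segment of the mean income curve, since that is the part the text identifies as most sensitive to FL.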
In Figure 29 we present similar curves for 1996 and 2011, but for FL=0.65. The measured curves are accurately approximated by the model between the start point and the critical age. The fall is also well predicted but there are some deviations in 1996, which could have some connection to retirement, as discussed in the previous Subsection. Comparing the curves in Figures 28 and 29 one can conclude that FL has to grow from 0.45 in 1962 to 0.65 in 2011 in order to fit the rate and duration of the initial growth.
Figure 28. The observed and predicted mean income for 1962 (left panel) and 1977 (right panel). In both models FL=0.45.
The simplest way to match the critical age observed between 1962 and 1977 is to raise the initial value Tc(1929). Then the predicted values in 1962 and 1977 change accordingly, as the square root of real GDP per capita. Figure 30 displays the modified mean income curves (green dotted lines) for the same years as in Figures 28 and 29, which now fit observations in 1962 and 1977 before and beyond the critical age. For comparison, we have also drawn the curves from the original model (red dotted lines). The change in Tc is not justified by any reasonable relationship, however. In addition, the curves predicted for 1996 and 2011 with the higher initial value of Tc fit observations neither in the initial segment nor at the critical age. The shelf exists in all models, but its length is not well predicted. Considering the large number of contradictions we met when modelling the female mean income, it is necessary to extend our original model to match all observed changes in a consistent way.
Figure 29. The observed and predicted mean income for 1996 (left panel) and 2011 (right panel). In both models FL=0.65. Notice the fit between the observed and predicted critical ages.
Figure 30. The observed and predicted mean income as a function of age for selected years between 1962 and 2011. Two models are shown: standard model with FL=0.45 and Tc =19.07 years in 1929 (red dotted line); standard model with FL=0.45 and Tc=26.0 years in 1929 (green dotted line).
A lower FL in the earlier years provides an extended “no-change” period before the critical age. The difference between the observed and predicted critical ages in Figure 28 suggests that the concept of critical age belongs to the super-critical distribution rather than to the low-middle incomes predicted by the model. Moreover, the critical age for women observed between 1962 and the early 1980s is almost constant and close to the retirement age. It is reasonable to suggest that there are two critical ages: one for the super-critical distribution, Tc, and another for the sub-critical distribution, TS. The former depends on GDP. The latter is practically constant and may correlate with the retirement age. It is easy to model the retirement effect by an exponential fall in the portion of people in the labour force [Munell, 2011]. Using the basic concept of (18), one can write the following equation:
η = −lnB / (TB – TS) (23)
where B is the constant relative level of income rate at age TB>TS. Both constants in (23) have to be estimated from data. Then the evolution of sub-critical incomes above the retirement age is described by (23) and the evolution of super-critical incomes is defined by (18). When a super-critical income falls below the Pareto threshold it starts to follow (23). As we know, the portion of people in the Pareto zone falls exponentially and
The effect of TS in the overall distribution was masked by the income dominance of the male population, by data roughness, and by the fact that only the most recent period was modelled. As a result, TS is missing from the original model. The use of the detailed IPUMS income microdata available from 1962 has revealed this effect for both genders. Figure 26 indicates that this effect cannot be neglected even in the males’ PIDs, and we consider it later on.
With the new concept of critical age for sub-critical incomes, TS, we can now model the lengthy shelf observed in the mean income curves. Three parameters in (23) have to be fixed in the model: TS, TB, and B. The age of retirement may vary by a few years over the whole modelled period between 1962 and 2011, as we see in the measured mean income curves. Based on the lengthy period of income measurements, on the estimated age of women’s retirement, and on the results of extensive modelling, we fix TS to 43 years of work experience, or 58 years of age. This is the start of the retirement process, which is described by an exponential fall in the portion of people in the labour force. The estimated age of retirement is defined as the age at which 50% remain in the labour force [Munell, 2011]. To match observations, we vary A and B in (18) and (23) and obtain the best-fit estimates by a simple regression procedure over ages from 15 to 70 years. Larger deviations observed beyond 70 years of age are likely related to poor measurements and should be neglected as noise.
From Figure 30, we have estimated the overall increase in FL from 1962 to 2011. From the IPUMS data, one cannot estimate the exact functional dependence of women’s Lj on time or real GDP per capita before 1962. It is reasonable to choose 1960 as the start year and to recalculate for 1960 all initial values that the original model defines for 1930. They are as follows: τ0=1960, Tc=25.92 years, α=0.0795, and Y(1960)=1. For the dependence of FL on time, we suggest the simplest linear growth between 1962 and 2011. This gives a 0.004 annual step in FL starting from 0.45 in 1960. We understand that the actual FL dependence on time may deviate from the simple linear approximation, but the accuracy of the IPUMS data is not good enough to estimate the exact FL for each year between 1962 and 2011.
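Relationship (23) can be made concrete with a small sketch. It assumes that the sub-critical income rate stays at its full level up to TS and then falls as exp(−η(t − TS)), so that it equals B at t = TB. The values of TB and B below are hypothetical placeholders, since the text gives only TS = 43 years of work experience.

```python
import math

def decay_index(level, t_level, t_start):
    """Index of exponential fall from (23): eta = -ln(B) / (TB - TS)."""
    return -math.log(level) / (t_level - t_start)

def relative_income_rate(t, t_start, eta):
    """Relative income rate: 1 up to t_start, exponential fall afterwards
    (assumed functional form)."""
    return 1.0 if t <= t_start else math.exp(-eta * (t - t_start))

TS = 43.0  # years of work experience, fixed in the text
TB = 60.0  # hypothetical reference point, for illustration only
B = 0.5    # hypothetical relative level at TB, for illustration only

eta = decay_index(B, TB, TS)
rate_at_tb = relative_income_rate(TB, TS, eta)
```

By construction the rate at TB reproduces B, so fitting B at a fixed TB is equivalent to fitting the index η directly.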
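The linear approximation for FL is easy to check numerically. A minimal sketch with a hypothetical function name, using the start value 0.45 in 1960 and the annual step 0.004 from the text:

```python
def fl_linear(year, start_year=1960, fl_start=0.45, step=0.004):
    """Instrument scaling factor FL under the linear-in-time approximation."""
    return fl_start + step * (year - start_year)

fl_1960 = fl_linear(1960)
fl_2011 = fl_linear(2011)
```

The step of 0.004 per year takes FL from 0.45 in 1960 to about 0.654 in 2011, consistent with the value FL=0.65 found for the most recent curves.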
There is one model parameter pending adjustment – the Pareto threshold for women, which likely increases over time. As we discussed in Section 3.1, the Pareto threshold may have a direct link to the size of work instrument. Interestingly, the reduction factor FL for women can be best estimated from the rate of growth at the initial stage before 30 years of age. The Pareto distribution is most prominent for the ages above 30 and below 60. But they are connected by some immanent bond, which manifests itself in a slightly different way for women and men.
Observations in Figures 9 and 22 suggest that the power law approximation works from different incomes for males and females. For example, in 1962 the Pareto threshold is $9,000 for males and $6,000 for females. If we consider the males’ threshold as related to normal sizes of work instruments, then the females’ threshold should be reduced by a factor of 2/3. This gives MP=0.29 in 1960, and this threshold should then increase to a level close to 0.43 in 2011. In order to match the linear growth in FL, we suggest a similar time function for MP for women. The initial value is MP(1960)=0.29, and it then rises at a rate of 0.002 per year, reaching 0.39 in 2010. The growth in MP and FL is thus synchronized.
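The arithmetic behind the females’ threshold can be traced step by step. In this sketch the names are hypothetical; the dollar thresholds, the standard MP=0.43, and the annual rate 0.002 are those quoted in the text.

```python
STANDARD_MP = 0.43

# Ratio of the 1962 Pareto thresholds quoted in the text: $6,000 / $9,000.
threshold_ratio = 6000 / 9000

# Females' initial threshold, rounded to two decimals as in the text.
mp_start_female = round(STANDARD_MP * threshold_ratio, 2)

def mp_linear(year, start_year=1960, mp_start=mp_start_female, rate=0.002):
    """Females' Pareto threshold under the linear-in-time approximation."""
    return mp_start + rate * (year - start_year)

mp_2010 = mp_linear(2010)
```

The ratio 2/3 applied to 0.43 gives approximately 0.29, and fifty years of growth at 0.002 per year bring the threshold to 0.39 in 2010, still somewhat below the standard 0.43.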
There are several novelties proposed in Section 3 to accommodate a few specific income distribution features observed for females. First of all, we have found that the processes in the sub-critical and super-critical income zones have different critical ages, Tc and TS, and indexes of exponential fall - relationships (18) and (23). Essentially, the high-income dynamic processes are decoupled from those in the low-middle income zone. The only connection between them is the exchange of people reaching the Pareto threshold from below and above. The joint distribution of incomes driven by different processes in two income zones is able to describe all features observed in the mean income curves for females.
The multivariable matching process results in the best-fit model over all involved parameters. Figure 31 depicts the observed and predicted mean income as a function of age for four years between 1962 and 2011. The fit of the initial segments is the same as in Figures 28 and 29 because the reduction factor FL for the size of instruments to earn money changes from 0.45 in 1960 to 0.65 in 2011. The shelf modelled in Figure 30 by different critical times Tc now consists of two segments related to the critical ages in the sub-critical (TS) and super-critical (Tc) zones. Since TS is constant, the duration of the second leg, i.e. TS-Tc, has been decreasing with time while the duration of the first leg has been increasing. The second leg of the shelf has a length of a few years in 2011. Following this trend, the second leg will disappear in the future when Tc>TS. The U.S. political and economic authorities may increase the age of retirement, however.
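The shrinking second leg of the shelf follows directly from a constant TS and a growing Tc. The sketch below uses ages rather than work experience and is only indicative: TS = 58 years of age is the value fixed earlier, while 43 and 56 years are the peak ages quoted earlier for 1962 and the most recent data, used here as stand-ins for Tc. The function name and the future value of 60 years are hypothetical.

```python
def second_leg_years(tc_age, ts_age=58.0):
    """Length of the second leg of the shelf, TS - Tc,
    floored at zero once Tc exceeds TS."""
    return max(ts_age - tc_age, 0.0)

leg_1962 = second_leg_years(43.0)
leg_2011 = second_leg_years(56.0)
leg_future = second_leg_years(60.0)  # hypothetical future critical age
```

With these inputs the second leg shortens from 15 years to 2 years and vanishes once the critical age passes the age of retirement, unless the latter is raised.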
Figure 31. The observed and predicted mean income as a function of age for selected years between 1962 and 2011.
The overall fit between the observed and predicted mean income curves in Figure 31 is much better than that obtained by the original model. In 1993, we notice a slightly lower agreement between the observed and predicted mean income during the second leg of the shelf. This is likely the result of the linear approximation of FL and MP. In reality, these parameters differ from those predicted by a linear time function. It is worth noting that this discrepancy disappears in 2011 because of the overall convergence between the males’ and females’ characteristics. In a few decades, the male and female PIDs should be identical. Meanwhile, it is instructive to apply the upgraded model to males.
1.4. Male model upgraded
The relative sizes of work instruments Lj and the Pareto threshold MP for males do not change with time as well as the age of retirement TS. This fact facilitates the use of the upgraded model. We have to adjust only the levels A and B in (18) and (23) in order to obtain the best fit model. Figure 32 depicts the measured and predicted mean income curves for six years between 1962 and 2011. The overall fit is better than that achieved in Section 3.2 (see Figure 26). The fit below the critical age Tc is the same, however, since it is controlled by the parameters, which were not changed in the upgraded model.
There are two distinct segments beyond the critical age for the super-critical incomes, i.e. those distributed according to the Pareto law. The first segment spans the ages between Tc and TS and thus shortens with time. Because of the larger portion of males above the Pareto threshold, the first segment demonstrates a clear fall, which is not well seen in Figure 31. The slope of this segment also increases with time since the exponent index γ̃ in (18) depends on Tc(τ). The second segment of the mean income curves is driven by two processes of exponential fall. The highest incomes continue to fall as in the first segment, and the sub-critical incomes start their fall beyond the age of retirement. The joint effect of the two processes is expressed by an accelerated drop beyond TS.
The first segment related to the super-critical incomes is predicted with excellent accuracy for all presented years, as are the trajectories before Tc. This is an important verification of the model's predictive power. The slope and length of the pre-critical stage (t<Tc) and the same features of the first segment all change over time as defined by their dependence on one external variable, real GDP per capita. The predicted curves not only fit the measured ones for the selected years. They demonstrate synchronized change with time (real GDP), which is the ultimate demand of any dynamic model in physics. Hence, we conclude that our microeconomic model is a physical one rather than an economic one.
The prediction of the second segment is complicated by a few shortcomings. Firstly, the age of retirement changes with time depending not only on legislation but also on the economic situation. After the last financial and economic crisis, many discouraged people had to leave the labour force. The rate of participation in the labour force fell from 66.4% in 2007 to 62.4% in September 2015. (During the same period the rate of unemployment changed from 4.6% to 5.1%, with an extremely high rate in between, however.) As a result, the age of retirement may drop. Secondly, the accuracy of income measurements is significantly lower for the eldest population because of the overall under-representation. Thirdly, the population pyramid for the elderly fluctuates as a consequence of WWII and the post-war baby boom. Our model can be adjusted for all these aspects and predict the second segment better than its current version. These would be non-physical amendments, however, which are not a priority of quantitative modelling.
Figure 32. The observed and predicted mean income curves for male population as a function of age.