Originally, this article was published in the Journal of Applied Economic Sciences:

Kitov, I., Kitov, O., Dolinskaya, S., (2009). Modelling real GDP per capita in the USA: cointegration tests, Journal of Applied Economic Sciences, Spiru Haret University, Faculty of Financial Management and Accounting Craiova, vol. 4(1(7)_ Spr), pp. 80-96.

We have already introduced at Seeking Alpha the model for the evolution of real GDP per capita in developed countries. Specifically, we presented models for Japan [1], France, Germany, New Zealand, and the UK [2], and demonstrated that the model allows prediction of GDP at various time horizons. Originally, the model was revealed using exceptional population estimates reported for the USA [3]. Moreover, we have conducted a cointegration test, which has confirmed the presence of a long-term equilibrium relation between real GDP per capita and the change in population of a country-specific age. To begin with, we re-introduce the model.

**Model and data**

For a large developed economy, there is a measured macroeconomic variable characterized by long-term predictability: the annual increment of real GDP per capita. One can distinguish two principal sources of the intensive part of real economic growth, i.e. the evolution of real GDP per capita, G: the change in the number of 9-year-olds, and the economic growth trend associated with per capita GDP, G_{t}. The trend has the simplest form, a constant mean annual increment, as expressed by the following relationship:

dG_{t}(t)/dt = A (1)

where G_{t}(t) is the trend component of the absolute level of real GDP per capita at time t, and A is an empirical, country-specific constant. The solution of this ordinary differential equation is as follows:

G_{t}(t) = At + B (2)

where B = G_{t}(t_{0}) and t_{0} is the starting time of the studied period, from which t is counted. Then the relative growth rate (or economic growth trend) of real GDP per capita is:

g_{trend}(t) = dG_{t}/(G_{t}dt) = A/G (3)

which indicates that the (trend) rate is inversely proportional to the attained level of the real GDP per capita and the growth rate should asymptotically decay to zero.
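The decay of the trend rate can be illustrated with a minimal sketch. It assumes the rate is evaluated on the trend level itself, with time counted in years from t_{0}. The value of A is the postcensal coefficient quoted in the caption of Figure 5, while the starting level B and the function names are hypothetical and chosen only for illustration.

```python
def trend_level(years_since_start, a, b):
    """Trend level from Eq. (2): G_t = A*t + B, with t counted from t_0."""
    return a * years_since_start + b


def trend_growth_rate(years_since_start, a, b):
    """Relative trend growth rate from Eq. (3): A divided by the attained level."""
    return a / trend_level(years_since_start, a, b)


# A is the postcensal value quoted in the caption of Figure 5.
# B is a hypothetical starting level, not a value taken from the article.
A = 547.1325
B = 15000.0

horizons = (0, 10, 20, 40)
rates = [trend_growth_rate(t, A, B) for t in horizons]
for years, rate in zip(horizons, rates):
    print(f"t = {years:2d} years: trend growth rate = {rate:.4f}")
```

Each printed rate is smaller than the previous one, because the constant increment A is divided by an ever larger level.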

One principal correction has to be applied to the per capita GDP values published by the Bureau of Economic Analysis. This is the correction for the difference between the total population and the population of 15 years of age and above. Our concept requires that only this economically active population should be considered when per capita values are calculated.
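One plausible reading of this correction, given here only as an assumption, is a simple rescaling: total GDP is kept fixed and divided by the population aged 15 and above instead of the total population. The function name and the round numbers below are hypothetical and are not BEA or Census Bureau figures.

```python
def gdp_per_capita_15_plus(gdp_per_capita_total, total_population, population_15_plus):
    """Rescale per capita GDP from the total population to the population aged 15+."""
    if population_15_plus <= 0:
        raise ValueError("population_15_plus must be positive")
    # Total GDP = per capita value * total population; divide it by the 15+ population.
    return gdp_per_capita_total * total_population / population_15_plus


# Hypothetical round numbers for illustration only.
corrected = gdp_per_capita_15_plus(30000.0, 300e6, 240e6)
print(corrected)
```

Because the population aged 15 and above is smaller than the total population, the corrected per capita value is always larger than the published one.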

Following the general concept of the two principal sources of real economic growth one can write an equation for the growth rate of real GDP per capita, g_{pc}(t):

g_{pc}(t) = dG(t)/(dt‧G(t)) = 0.5dN_{9}(t)/(dt‧N_{9}(t)) + g_{trend}(t) (4)

where N_{9}(t) is the number of 9-year-olds at time t. One can obtain a reversed relationship defining the evolution of the 9-year-old population as a function of real economic growth:

d(lnN_{9}(t)) = 2(g_{pc}(t) - A/G(t))dt (5)

Equation (5) defines the evolution of the number of 9-year-olds as described by the growth rate of real GDP per capita. The starting point of the evolution has to be characterized by some (actual) initial population. However, various population estimates (for example, post- and intercensal ones) potentially require different initial values and a different coefficient A.
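The step from (4) to (5) can be written out explicitly. The sketch below only rearranges the terms already defined above, using the identity dN_{9}/(N_{9}dt) = d(lnN_{9})/dt and the trend rate A/G(t) from (3).

```latex
\begin{align*}
g_{pc}(t) &= \frac{1}{2}\,\frac{d\ln N_{9}(t)}{dt} + \frac{A}{G(t)} \\
\frac{d\ln N_{9}(t)}{dt} &= 2\left(g_{pc}(t) - \frac{A}{G(t)}\right) \\
d\ln N_{9}(t) &= 2\left(g_{pc}(t) - \frac{A}{G(t)}\right)dt
\end{align*}
```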

Instead of integrating (5) analytically, we use the annual readings of all the involved variables and rewrite (5) in a discrete form:

N_{9}(t+Δt) = N_{9}(t)[1 + 2Δt(g_{pc}(t) - A/G(t))] (6)

where Δt is the time unit equal to one year. Equation (6) uses a simple representation of the time derivative of the population estimates, where the derivative is approximated by its estimate at point t. The time series g_{pc} and N_{9} are independently measured variables. To obtain the best prediction of N_{9}(t) by trial and error, one has to vary coefficient A and (only slightly, within the uncertainty of the population estimates) the initial value, N_{9}(t_{0}). The best-fit parameters can be obtained by some standard technique minimising the RMS difference between the predicted and measured series. In this study, only a visual fit between the curves is used, with the average difference minimised to zero. This approach might not provide the lowest standard deviation.
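The recursion in (6) and a simple search for coefficient A can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: it applies the RMS criterion mentioned above instead of the visual fit, it scans a fixed grid of candidate values, and all input series are synthetic. Only the initial population of 3900000 is taken from the figure captions; the function names are hypothetical.

```python
import math


def predict_n9(n9_start, g_pc, gdp_pc, a, dt=1.0):
    """Iterate Eq. (6): N9(t + dt) = N9(t) * (1 + 2*dt*(g_pc(t) - A/G(t)))."""
    series = [n9_start]
    for growth, level in zip(g_pc, gdp_pc):
        series.append(series[-1] * (1.0 + 2.0 * dt * (growth - a / level)))
    return series


def rms_difference(predicted, measured):
    """Root-mean-square difference between two series of equal length."""
    squares = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squares) / len(squares))


def fit_a(n9_start, g_pc, gdp_pc, measured, candidates):
    """Return the candidate A that minimises the RMS misfit (plain grid search)."""
    return min(
        candidates,
        key=lambda a: rms_difference(predict_n9(n9_start, g_pc, gdp_pc, a), measured),
    )


# Synthetic inputs for illustration only (not BEA or Census Bureau data).
gdp_pc = [20000.0, 20500.0, 21100.0, 21500.0, 22200.0]  # level G(t)
g_pc = [0.025, 0.029, 0.019, 0.033, 0.020]               # growth rate g_pc(t)
true_a = 500.0
measured = predict_n9(3900000.0, g_pc, gdp_pc, true_a)   # stands in for measured N9

candidates = [400 + 10 * i for i in range(21)]           # 400, 410, ..., 600
best_a = fit_a(3900000.0, g_pc, gdp_pc, measured, candidates)
print("best-fit A:", best_a)
```

Because the synthetic "measured" series is generated with A = 500, the grid search recovers that value. With real data the initial value N_{9}(t_{0}) would be varied as well.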

Equation (6) can be interpreted in the following way: the deviation between the observed growth rate of GDP per capita and that defined by the long-term trend is completely defined by the rate of change of the number of 9-year-olds. The reverse statement can hardly be correct: the number of people of some specific age cannot be completely, or even in large part, defined by contemporary real economic growth. Specifically, the causality principle prohibits the present from influencing the birth rate nine years ago. Econometrically speaking, the number of 9-year-olds has to be a weakly exogenous variable relative to contemporary economic growth.

In fact, Eq. (6) provides an estimate of the number of 9-year-olds using only independent measurements of real GDP per capita. Therefore, the amplitude and statistical properties of the deviation between the measured and predicted number of 9-year-olds can serve for the validation of (4) and (5).

**Cointegration test**

Skipping the unit root tests, we continue with a number of cointegration tests. The assumption that the measured number of 9-year-olds in the USA, N_{9m}(t), and that predicted from the real economic growth, N_{9p}(t), are two cointegrated non-stationary time series is equivalent to the assumption that their difference, e(t) = N_{9m}(t) - N_{9p}(t), is a stationary or I(0) process. The predicted and measured series corresponding to the post- and intercensal population estimates are shown in Figures 4 and 6, and their differences in Figures 5 and 7, respectively.

**Figure 4.** Comparison of the measured and predicted postcensal population estimates between 1960 and 2002.

**Figure 5.** The difference between the measured and predicted population estimates presented in Figure 4. For the period between 1962 and 2002, the average difference is 0 and the standard deviation is 164926 for coefficient A=547.1325 and an initial population of 3900000 in 1959. Linear regression is represented by a bold straight line.

**Figure 6.** Comparison of the measured and predicted intercensal population estimates between 1960 and 2002.

**Figure 7.** The difference between the measured and predicted population estimates presented in Figure 6. For the period between 1962 and 2002, the average difference is -1 and the standard deviation is 165744 for coefficient A=546.079 and an initial population of 3900000 in 1959. Linear regression is represented by a bold straight line.

It is natural to start with unit root tests of the difference. If e(t) is a non-stationary variable having a unit root, the null hypothesis of the existence of a cointegrating relation can be rejected. Such a test is associated with the Engle-Granger approach, which, however, requires N_{9m}(t) to be regressed on N_{9p}(t) as the first step. It is worth noting that the predicted variable is obtained by a procedure similar to that of linear regression and provides the best visual fit between the corresponding curves. The Engle-Granger approach is most reliable and effective when one of the two involved variables is weakly exogenous, i.e. is driven by some forces not associated with the second variable. This is the case for the GDP per capita and the number of 9-year-olds: the latter variable can hardly be driven by the former. The existence of the opposite causality direction is the main object of this study.
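The idea of a unit root test on the difference e(t) can be illustrated with the simplest Dickey-Fuller regression, with no constant and no lagged differences. This is a minimal sketch on synthetic series and not the ADF or DF-GLS implementation used for the article. The function name is hypothetical, and the 1% critical value of -2.64 is the one quoted in Tables 3 and 4; critical values depend on the sample size, so it serves here only as a reference level.

```python
import math
import random


def dickey_fuller_t(series):
    """t-statistic of rho in  delta_e(t) = rho * e(t-1) + u(t)  (no constant, lag 0)."""
    lagged = series[:-1]
    diffs = [curr - prev for prev, curr in zip(series[:-1], series[1:])]
    sxx = sum(x * x for x in lagged)
    rho = sum(x * d for x, d in zip(lagged, diffs)) / sxx
    residuals = [d - rho * x for x, d in zip(lagged, diffs)]
    variance = sum(u * u for u in residuals) / (len(diffs) - 1)
    return rho / math.sqrt(variance / sxx)


CRITICAL_1_PERCENT = -2.64  # value quoted in Tables 3 and 4 of the article

# Synthetic series for illustration only.
rng = random.Random(1)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(200)]  # stationary by construction
ramp = [float(k) for k in range(1, 41)]                   # steadily growing, non-stationary

t_noise = dickey_fuller_t(white_noise)
t_ramp = dickey_fuller_t(ramp)
print("white noise rejects a unit root:", t_noise < CRITICAL_1_PERCENT)
print("ramp rejects a unit root:", t_ramp < CRITICAL_1_PERCENT)
```

A statistic below the critical value rejects the unit root, which is the outcome needed for e(t) to be treated as an I(0) process.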

The results of the ADF and DF-GLS tests, listed in Table 3, demonstrate the absence of a unit root in the measured-predicted difference series for both the post- and intercensal population estimates. Since the predicted series are constructed under the assumption of a zero average difference, the trend specification in these tests is “*none*”. The maximum lag order in the tests is 3. These results give strong evidence in favor of the existence of a cointegrating relation between the measured and predicted time series. Therefore, from the econometric point of view, it is difficult to deny that the number of 9-year-olds is *the only* defining force behind the observed fluctuations of real economic growth. These fluctuations are observed around the growth trend defined by the constant annual increment, A, of real GDP per capita.

**Table 3.** Unit root tests for the differences between the measured and predicted number of 9-year-olds. Trend specification is *none*. The maximum lag order is 3.

| Test | Lag | Postcensal | Intercensal | 1% critical value |
|---|---|---|---|---|
| ADF | 0 | -2.87* | -2.85* | -2.64 |
| | 1 | -3.67* | -3.59* | -2.64 |
| | 2 | -2.99* | -3.92* | -2.64 |
| | 3 | -2.90* | -2.83* | -2.64 |
| DF-GLS | 1 | -3.55* | -3.47* | -2.64 |
| | 2 | -2.98* | -2.92* | -2.64 |
| | 3 | -2.92* | -2.85* | -2.64 |

The next step is to use the Engle-Granger approach again and to study the statistical properties of the residuals obtained from linear regressions of the measured single-year-of-age populations on the predicted ones. A pitfall of the regression analysis is a slight time shift between the measured and predicted series: the former variable is assigned to July 1 (averaged population) and the latter to December 31 (cumulative GDP increase) of the same year. Such a phase shift apparently degrades the regression results but cannot be corrected, since only annual population estimates are available before 1980.

Table 4 presents a summary of the relevant unit root tests with the same specifications as accepted for the difference of the same series. The null hypothesis of a unit root is rejected for both time series and all time lags. Therefore, the residuals of the regression form an I(0) time series, and the Engle-Granger test proves that the predicted and measured variables are cointegrated.
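The first step of the Engle-Granger procedure, the regression of the measured series on the predicted one, can be sketched with ordinary least squares. The code below is an illustration only: the function name and the short input series are hypothetical, and the residuals it returns would then be passed to a unit root test.

```python
def ols_residuals(measured, predicted):
    """Regress measured on predicted (with intercept); return intercept, slope, residuals."""
    n = len(measured)
    mean_x = sum(predicted) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, measured))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(predicted, measured)]
    return intercept, slope, residuals


# Hypothetical numbers of 9-year-olds, for illustration only.
predicted = [3.90e6, 4.00e6, 4.10e6, 4.05e6, 3.95e6]
measured = [3.95e6, 3.98e6, 4.15e6, 4.00e6, 3.97e6]

intercept, slope, residuals = ols_residuals(measured, predicted)
print("slope:", slope)
print("mean residual:", sum(residuals) / len(residuals))
```

With an intercept in the regression, the residuals average to zero by construction, which is why the subsequent unit root tests can be run without a constant.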

**Table 4.** Unit root tests for the residual time series of a linear regression of the measured series on the predicted one. The measured and predicted series are the numbers of 9-year-olds. Trend specification is *none* (zero average value of the residuals) and the maximum lag order is 3.

| Test | Lag | Postcensal | Intercensal | 1% critical value |
|---|---|---|---|---|
| ADF | 0 | -3.03* | -3.02* | -2.64 |
| | 1 | -3.88* | -3.86* | -2.64 |
| | 2 | -3.15* | -3.13* | -2.64 |
| | 3 | -3.05* | -3.01* | -2.64 |
| DF-GLS | 1 | -3.71* | -3.69* | -2.64 |
| | 2 | -3.06* | -3.04* | -2.64 |
| | 3 | -2.98* | -2.95* | -2.64 |

The Johansen approach is based on the maximum likelihood estimation procedure and tests for the number of cointegrating relations in the vector-autoregressive representation. The Johansen technique allows simultaneous testing for the existence of cointegrating relations and determining their number (rank). For two variables, only one cointegrating relation is possible. When the cointegration rank is 0, any linear combination of the two variables is a non-stationary process. When the rank is 2, both variables have to be stationary. When the Johansen test results in rank 1, a cointegrating relation between the involved variables does exist.

In the Johansen approach, one first has to analyze some specific properties of the underlying VAR model for the two variables. Table 5 lists selection statistics for the pre-estimated maximum lag order in the VAR. The standard trace statistic is supplemented by several useful information criteria: the final prediction error, FPE; the Akaike information criterion, AIC; the Schwarz Bayesian information criterion, SBIC; and the Hannan and Quinn information criterion, HQIC. All tests and information criteria in Table 5 indicate a maximum pre-estimated lag order of 1 for VARs and vector error-correction models, VECMs. Therefore, the maximum lag order of 1 was used in the Johansen tests along with “*constant*” as the trend specification.

**Table 5.** Pre-estimation lag order selection statistics. All tests and information criteria indicate the maximum lag order 1 as an optimal one for VARs and VECMs.

| Time series | Lag | LR | FPE | AIC | HQIC | SBIC |
|---|---|---|---|---|---|---|
| postcensal | 1 | 63.03* | 5.8e+09* | 25.31* | 25.36* | 25.44* |
| intercensal | 1 | 61.63* | 6.1e+09* | 25.38* | 25.42* | 25.51* |

FPE - the final prediction error, AIC - the Akaike information criterion, SBIC - the Schwarz Bayesian information criterion, HQIC - the Hannan and Quinn information criterion

The properties of the VAR error term are of critical importance for the Johansen test [8]. A number of diagnostic tests were carried out for the VAR residuals. The Lagrange multiplier test for the postcensal time series resulted in χ^{2} of 0.34 and 0.09 for lags 1 and 2, respectively. This test accepts the null hypothesis of the absence of any autocorrelation at these lags. The Jarque-Bera test gives χ^{2}=7.06 (Prob>0.03) with skewness=0.96 and kurtosis=3.77, the skewness being of the highest importance for the normality test and the validity of statistical inference. Hence, the residuals are probably not normally distributed, as expected from the artificial features of the measured population time series. The stability of the VAR model is guaranteed by the eigenvalues of the companion matrix, whose moduli are lower than 0.63. As a whole, the VAR model accurately describes the data and satisfies the principal statistical requirements applied to the residuals.
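Two of these diagnostics can be sketched in a few lines. The first function computes the textbook Jarque-Bera statistic, n/6·(S² + (K−3)²/4), from sample skewness S and kurtosis K; statistical packages may apply the test to VAR residuals in a slightly different form. The second checks stability for a bivariate VAR with one lag, where the companion matrix is simply the 2×2 coefficient matrix. The function names, the sample, and the coefficient matrix are hypothetical.

```python
import cmath


def jarque_bera(sample):
    """Jarque-Bera statistic n/6 * (S^2 + (K - 3)^2 / 4) from sample moments."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


def max_eigenvalue_modulus(matrix):
    """Largest eigenvalue modulus of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = matrix
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4.0 * det)  # complex-safe square root
    return max(abs((trace + root) / 2.0), abs((trace - root) / 2.0))


# Hypothetical inputs for illustration only.
jb = jarque_bera([-1.0, 0.0, 1.0])
modulus = max_eigenvalue_modulus([[0.5, 0.1], [0.2, 0.3]])
print("Jarque-Bera statistic:", jb)
print("VAR(1) is stable:", modulus < 1.0)
```

A VAR is stable when every eigenvalue of the companion matrix lies strictly inside the unit circle, which is the condition the quoted bound of 0.63 satisfies.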

Table 6 presents some results of the Johansen tests. In both cases the cointegrating rank is 1. Hence, there exists a long-run equilibrium relation between the measured and predicted number of 9-year-olds in the USA. The predicted number is obtained solely from the readings of real GDP per capita measured and reported by the BEA. We do not test for the causality direction between the variables because the only possible way of influence, if it exists, is absolutely obvious.

**Table 6.** Johansen test for cointegration rank for the measured and predicted time series. Trend specification is constant. Maximum lag order is 2.

| Time series | Rank | Eigenvalue | SBIC | HQIC | Trace statistics | 5% critical value |
|---|---|---|---|---|---|---|
| postcensal | 1 | 0.397 | 52.48* | 52.23* | 2.198* | 3.76 |
| intercensal | 1 | 0.379 | 52.55* | 52.30* | 2.117* | 3.76 |

In this section, three different tests have demonstrated with a high level of confidence that the measured and predicted numbers of 9-year-olds in the USA are cointegrated. One can use the cointegrating relation for a reliable prediction of real economic growth in the USA. This finding proves that the evolution of a developed economy is predictable in principle.

**References**

[1] Kitov, I., (2006). *The Japanese economy*, MPRA Paper 2737, University Library of Munich, Germany, http://ideas.repec.org/p/pra/mprapa/2737.html

[2] Kitov, I., (2009). *Predicting real GDP per capita in France, Germany, New Zealand, and the UK*, MPRA Paper 15503, University Library of Munich, Germany.

[3] Kitov, I., (2006). *GDP growth rate and population*, Working Paper 42, ECINEQ, Society for the Study of Economic Inequality.