Perspectives of cross correlation in seismic monitoring at the International Data Centre

This is a preprint of our paper submitted to Pure and Applied Geophysics:

We demonstrate that several techniques based on cross correlation are able to significantly reduce the detection threshold of seismic sources worldwide and to improve the reliability of IDC arrivals by a more accurate estimation of their defining parameters. More than ninety per cent of smaller REB events can be built in automatic processing while completely fitting the REB event definition criteria. The rate of false alarms, as compared to the events rejected from the SEL3 in the current interactive processing, has also been dramatically reduced by several powerful filters. The principal filter is the difference of arrival times between the master events and newly built events at three or more primary stations, which should lie in a narrow range of a few seconds. Two effective pre-filters are f-k analysis and Fprob based on correlation traces instead of original waveforms. As a result, cross correlation may reduce the overall workload related to IDC interactive analysis and provide a precise tool for quality check for both arrivals and events. Some major improvements in automatic and interactive processing achieved by cross correlation are illustrated by an aftershock sequence of a large continental earthquake. Exploring this sequence, we describe schematically the next steps for the development of a processing pipeline parallel to the existing IDC one in order to improve the quality of the REB together with the reduction of the magnitude threshold. The current IDC processing pipeline should be focused on the events in areas without historical seismicity which are not properly covered by REB events.


How can we validate the statistics of the Russian elections?

Statistically, there is a relatively simple way to validate the overall distribution (or to demonstrate its falsification). It is commonly used by census bureaus and statistical agencies. One takes a small random subset from the studied distribution (statistics) and repeats the survey (voting) under full control. If the re-estimated statistics are quite different from the previously obtained result, one may doubt the accuracy of the original survey (voting). The size of the re-estimated set should be defined by the originally observed statistics, and the set should be drawn specifically from the polling stations with the most suspicious results.
Therefore, one does not need to repeat the full election process everywhere. We need only a relatively small number of polling stations, which can be determined by lottery from the set where the distribution of votes in favour of United Russia is far from the expected Gaussian distribution. United Russia should be happy to prove that these results are not falsified in order to calm down the protests and indignation.
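
The lottery can be sketched in a few lines of Python. This is only an illustration: the station names and shares are invented, the mean of 29% and standard deviation of 5% are taken from the Gaussian fit in the "SINE function" post, and the cut-off of three standard deviations (the parameter k) is our assumption, not a rule used by any statistical agency.

```python
import random

def draw_recheck_sample(stations, mean=29.0, stdev=5.0, k=3.0, size=5, seed=0):
    """Pick polling stations for a controlled re-vote by lottery.

    stations: dict mapping a station id to the UR share in percent.
    A station is treated as suspicious when its share exceeds
    mean + k * stdev (k is an assumed cut-off).
    """
    suspicious = sorted(sid for sid, share in stations.items()
                        if share > mean + k * stdev)
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    return rng.sample(suspicious, min(size, len(suspicious)))

# Invented example data: station id -> UR share, %
stations = {"A": 25.0, "B": 31.5, "C": 62.0, "D": 99.0, "E": 48.0, "F": 75.5}
sample = draw_recheck_sample(stations, size=2)
print(sample)
```

A public seed (or a physical lottery) lets every party check that the stations were not hand-picked.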


20,000,000, anyway

In our previous post, we approximated the dependence of the number of polling stations on the portion of United Russia votes at a given polling station. Now we can estimate the number of votes falsified by UR from the simple functional dependences obtained in that post, i.e. the expected normal distribution and the sine function above 29%.
Let’s take one polling station with, say, 60% in favor of UR. The mean value estimated from the normal distribution is 29%. Therefore, the expected value is 29% as well, and the portion of likely falsified votes is 31%. Here we disregard the falsification technique. It can be a redistribution of votes from other parties or the direct addition of fake votes to the correct result. In any case, 31% of the total number is wrongly assigned to UR. For a middle-size polling station of, say, 1000 people this makes 310 wrong votes. Our estimate of the number of polling stations between 60% and 61% is 1160. (Notice that the original bins were only 0.5% wide.) Hence, there are 1160x310=359,600, or about 360,000, wrong votes altogether. In the range between 29% and 30%, the difference between the normal and sine distributions is small and the number of wrong votes is only 600. In the range between 99% and 100%, one has 472,000 falsified votes.
Summing over all bins, one obtains 20,000,000 votes biasing the election outcome, the same as estimated initially using very crude approximations.
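
The arithmetic for a single bin can be written down as a short Python sketch. The function name is ours; the numbers (expected share of 29%, 1000 voters per station, 1160 stations between 60% and 61%) are those quoted in the text.

```python
EXPECTED_SHARE = 29.0      # mean of the normal distribution, %
VOTERS_PER_STATION = 1000  # assumed middle-size polling station

def wrong_votes(share, n_stations,
                voters=VOTERS_PER_STATION, expected=EXPECTED_SHARE):
    """Votes assigned to UR above the expected share, summed over one bin."""
    excess = max(share - expected, 0.0) / 100.0
    return round(excess * voters * n_stations)

# 60% bin: (60 - 29)% of 1000 voters at each of 1160 stations
print(wrong_votes(60.0, 1160))
```

The total is the sum of such terms over all bins above 29%, with the number of stations per bin read from the observed curve.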


The SINE function of the Russian elections

Several days ago we presented a preliminary analysis of the distribution of votes over polling stations in Russia. We borrowed the original graph from Maxim Pshenichnikov and crudely estimated the difference between the observed curve and that expected from a normal (Gaussian) distribution of the parties' portions. All parties except United Russia showed the expected normal distribution, which is also observed during elections in other countries, where UR does not rule. Today we illustrate this difference and estimate it quantitatively. Moreover, we propose a simple functional dependence for those who want to carry out their own research. Figure 1 reproduces the original distribution (brown curve), which depicts the number of polling stations in 0.5% bins as a function of the UR portion of votes, from 0% to 100%. We have approximated the original distribution with a normal one with mean=29% and stdev=5%. The black curve fits the left wing of the measured function and fails on the right one. When elections are honest, the distribution must be normal, as the black line shows. However, the observed brown line is better approximated by a sine function shown by the red line. We have found the following equation (not optimal):
s(V) = 637+403sin(V/18)
where V is the portion of the UR votes (x-axis) and s(V) is the number of polling stations in a given V-bin. The curve starts from 30%, where the normal distribution fails to fit the observed line. All in all, the elections’ results demonstrate a sine line of the UR success.
Having this function, one can easily estimate the number of polling stations with falsified elections and the number of votes falsified in favor of UR.
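
For readers who want to play with the curve, here is a minimal Python sketch of the equation above. It assumes that V is given in percent and that the argument V/18 is taken in radians; the post does not say so explicitly, so treat this as our reading of the formula.

```python
import math

def stations_sine(v):
    """s(V) = 637 + 403*sin(V/18): polling stations in a 0.5% bin at UR share V (%).

    Assumption: the sine argument V/18 is in radians.
    """
    return 637.0 + 403.0 * math.sin(v / 18.0)

# Tabulate the approximation over the range where it replaces the Gaussian
for v in range(30, 101, 10):
    print(v, round(stations_sine(v)))
```

By construction the curve stays between 637-403=234 and 637+403=1040 stations per bin.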

Figure 1. The number of polling stations in 0.5% bins (vertical axis) with a given portion of votes (between 0% and 100%) for the UR.


Can statistics prove the (Russian) elections fraud?

Many people think that statistics cannot prove the Russian elections' fraud. This is not the case, however. Statistics can reveal reliable links between various measurable parameters of the elections, links which should not exist at all. For example, the results of elections in psychiatric clinics in Moscow (http://www.vedomosti.ru/tnews/geo/moscow-elections) show that United Russia had more than 90% of votes, including legally capable patients. This is a fantastic result for the UR but a very bad message for its supporters. The large percentage in favor of the UR is evidence, in the statistical sense, of a reliable link between voting for the UR and the predisposition to a psychiatric disease. A medical doctor has to include this link in the differential diagnosis of the diseases. When a patient comes, the doctor first has to ask which party s/he has voted for. And this link exists only in Russia.

I am not sure that the UR would agree that there was no fraud in this specific area which would characterize the party proponents as subject to psychiatric diseases.  There are many other links that look absolutely weird and the UR should not be happy with them.

P.S. Not serious.

As a novice in psychiatry I missed an important possibility: the propensity to the UR is contagious and can easily be transmitted in closed groups. Since this propensity correlates strongly with the predisposition to psychiatric disease, the Disease Control and Prevention Centers should issue an alarm. Until then, individual measures should be taken: gloves and a mask when communicating with the UR people.


A thought

One is not able to buy for money what s/he must sacrifice to get rich.


Russian elections: statistical bias between 10,000,000 and 20,000,000 votes

How many votes might be stolen in Russia?
Statistics is a powerful tool to reveal frauds. Almost everything in this world is distributed according to a few simple laws: normal, power law, exponential, linear. The most famous is the normal distribution, which occurs often because of the Central Limit Theorem. Briefly, if some process is controlled by a very big number of parameters, its outcome usually obeys the normal (Gaussian) distribution. The 2011 elections in Russia are a good example of the Gaussian distribution and of potential fraud. Figure 1 displays the distribution of the number of polling stations as a function of the percentage given to the five participating parties (borrowed from http://oude-rus.livejournal.com/542295.html; EP - United Russia, КПРФ - the communist party, etc.). The communists’ curve demonstrates an example of an approximately normal distribution between 3% and 40%. There is a bias at low values, which we can ignore in this case as a consequence of the United Russia behaviour. Specifically, the EP curve demonstrates a clear Gaussian between 15% and 27% and then a long heavy tail. Here we have to notice that it is the only party with this tail. Moreover, there are weird peaks around “magic” numbers: 50%, 55%, …, 100%. Hence, the expected normal behaviour is violated, and this can only be the result of a deliberate action. If the number of causes behind voting is large, then only a normal distribution can be observed. If such a distribution were demonstrated by some lottery, it would be a fraud and the distribution could be used as a proof. (It is worth noting that heavy-tail distributions are observed in the stock market, where they are less prominent.)
Using the EP distribution, one can easily estimate the number of votes above that dictated by the normal distribution. Let us continue the EP curve symmetrically beyond 27%, as a mirror reflection. The distribution should fall to zero at approximately 50% (x-axis). The difference between the measured EP curve and the imaginary extension is the bias introduced by a controlling force. On average, there are 400 polling stations in any 0.5% bin above 50% in the EP curve. With two bins per percent over 50 percentage points, one has 400x2x50=40,000 polling stations with biases.
Now we have to evaluate the number of votes. The mean value for the EP distribution is 27%. Then for all polling stations the bias can be estimated as the measured value less 27%. Between 50% and 100% one has 48% on average, i.e. (100%+50%)/2 - 27%. For an average polling station with 1000 voters, one has a positive bias of 480 voters. For all 40,000 stations this gives about 19,000,000 votes. This figure depends on the average size of polling stations. For stations with 500 voters the final number is about 9,500,000.
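
The same back-of-the-envelope estimate as a Python sketch. All inputs are the rough figures quoted above; the variable and function names are ours.

```python
BINS_PER_PERCENT = 2        # the bins are 0.5% wide
STATIONS_PER_BIN = 400      # average count per bin above 50%
MEAN_SHARE = 27.0           # mean of the EP Gaussian, %

# Number of biased polling stations between 50% and 100%
n_stations = STATIONS_PER_BIN * BINS_PER_PERCENT * (100 - 50)

# Average bias: midpoint of the 50-100% range minus the mean share
avg_bias = (100 + 50) / 2 - MEAN_SHARE

def biased_votes(voters_per_station):
    """Total votes above the expected share for a given station size."""
    return round(n_stations * voters_per_station * avg_bias / 100)

print(n_stations, avg_bias)
print(biased_votes(1000), biased_votes(500))
```

The exact products are 19,200,000 and 9,600,000; given the crudeness of the inputs, they are quoted in the text as about 19,000,000 and 9,500,000.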

Anyway, it is big. 


Figure 1. The number of polling stations in 0.5% bins (vertical axis) with a given portion of votes (between 0% and 100%) for five major parties.  


On faulty philosophy of economics

There is a vivid discussion of the 2008/2009 economic and financial crisis, which often touches upon the failure of mainstream economic theory to predict and describe major events. This theory is actually a bunch of assumptions, sometimes mutually exclusive. There are common features, however, which are shared by all schools of economic thought. They all assume that economic agents (individuals or firms) have some freedom and can act according to their own rational or irrational choices. When these choices are not well coordinated, synchronized and balanced throughout a given economy, slowdowns and even recessions are likely to happen. In the economic literature, these events are often introduced as shocks (to demand or supply). The term “shock” is a euphemism for the physical realization of unknown psychological processes in the economy. Generally, mainstream economics does not try to explain these shocks as a result of real processes.
We see an inherent problem in the conventional approach to economic processes. It skips the first and fundamental step, which is a must for any theory. The basic assumption should be that economic agents do not have free will and follow only prescribed trajectories. This notion is supported by the observation of a “frozen” personal income distribution. The normalized income distribution is constant over time, as obtained from the reports of the Annual Social and Economic Supplements to the Current Population Surveys conducted by the Census Bureau.
All in all, we assume that no economic agent can change the rate of real economic growth, as expressed by real GDP per capita, from inside the economy. There are no endogenous forces which can divert the economy from its predefined trajectory of inertial growth. (We do not consider here wars, pandemics and other economy-wide catastrophes.) This system is also resilient to exogenous forces because the agents can react to any event only in a predefined way. Then, the only driving force is the quantitative change in the distribution of agents. That’s why the influx of individuals is the most probable source of shocks to the economy. The outflow is stationary and cannot change the system. As a result, the evolution of real GDP per capita is driven by the number of young people entering the economy. And this is our fundamental model, which must be the basis for any model with mobile agents. For no clear reason, this model is rejected by conventional economics.
In classical mechanics, the most fundamental laws and models are first based on the assumption that all objects are identical and have ideal properties. In physics textbooks, one operates with points, ideal spheres, perfect rigidity, constant coefficients, and so on. In reality, there are tangible deviations from the perfect world of classical mechanics, but all fundamental laws are always valid. Moreover, before one starts to inspect the real world, s/he has to learn classical mechanics with its perfect relationships. Mainstream economics has made a trivial philosophical mistake and missed the most fundamental part of the scientific approach. We have filled this gap with our book “mecħanomics. Economics as Classical Mechanics.” It resolves the most urgent problems of economics as a science and provides a solid basis for further development. We use only quantities of economic agents whose only property is to exist. In that sense they are similar to ideal rigid spheres and do not have freedom.


Some science instead of economics

I am an amateur economist, and economics is rather an entertainment for me. My profession is geophysics and seismology. We have started an important study, and here is an extended abstract (https://na22.nnsa.doe.gov/prod/researchreview/2011/PAPERS/06-01.PDF) presented in September 2011.


Our objective is to assess the performance of a cross-correlation technique as applied to automatic and interactive processing of aftershock sequences at the International Data Centre (IDC). This technique allows a flexible approach to time windows, frequency bands, correlation thresholds and other parameters controlling the flux of detections. For array stations, we used vertical channels to calculate a unique cross-correlation coefficient. All detections obtained by cross-correlation were then used to build events according to IDC definitions. To investigate the influence of all defining parameters on the final bulletin, we selected the aftershock sequence of the March 20, 2008, earthquake in China with mb(IDC) = 5.41. As templates, fragments of P- and Pn-waves from two sets of the IDC Reviewed Event Bulletin (REB) events were selected: all 19 events from the second and third hour after the main shock and 50 events from the entire sequence with mb between 3.0 and 4.0. By varying the threshold of correlation coefficient and F-statistics which were applied to original waveforms and to the cross-correlation time series, we obtained several bulletins with different numbers of events, which could be compared to the original REB and also checked manually. These events were split into four categories: (1) new events having a counterpart (origin time within 10 sec) in the REB, (2) new valid events not having a counterpart in the REB, (3) valid REB events not having counterparts in any bulletin created by cross-correlation, and (4) bogus (invalid) events created by cross-correlation and which are in the REB.
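
The first step of such a comparison, pairing events of two bulletins by origin time, might look like the Python sketch below. It is not the IDC software: the function name, the greedy one-to-one matching and the sample times are our illustrative assumptions; only the 10-second window comes from the abstract. Events left without a counterpart still have to be checked manually to decide whether they are valid or bogus.

```python
def compare_bulletins(xc_times, reb_times, tol=10.0):
    """Pair cross-correlation events with REB events by origin time.

    xc_times, reb_times: origin times in seconds.
    Returns (xc events with an REB counterpart,
             xc events without a counterpart,
             REB events missed by cross-correlation).
    Matching is greedy and one-to-one (an assumption of this sketch).
    """
    matched_xc, matched_reb = set(), set()
    for i, t in enumerate(xc_times):
        for j, r in enumerate(reb_times):
            if j not in matched_reb and abs(t - r) <= tol:
                matched_xc.add(i)
                matched_reb.add(j)
                break
    with_counterpart = [t for i, t in enumerate(xc_times) if i in matched_xc]
    without_counterpart = [t for i, t in enumerate(xc_times) if i not in matched_xc]
    missed_reb = [r for j, r in enumerate(reb_times) if j not in matched_reb]
    return with_counterpart, without_counterpart, missed_reb

# Invented origin times, seconds after the main shock
found, new, missed = compare_bulletins([100.0, 205.0, 400.0], [95.0, 300.0, 412.0])
print(found, new, missed)
```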


Real real GDP

We have already reported that real GDP in the United States is biased by the change in the definition of the GDP deflator around 1978. (According to “Concepts and Methods of the U.S. NIPA”, the growth rate of real GDP is the growth rate of nominal GDP reduced by the overall change in prices as expressed by the GDP deflator, or the economy-wide price index.) Figure 1 shows that between 1929 and 1978 the GDP deflator and the CPI were similar and their difference was negligible. In 1978, a new definition of the GDP deflator was introduced, and the curves coinciding before 1978 started to deviate. In 2010, the deviation was approximately 20%.
A reasonable assumption about the new definition of the GDP deflator is that it should also be applied to the time series before 1978. This would reduce the bias introduced in the time series around 1978. Figure 1 demonstrates (dashed line) that the growth rate of the CPI after 1978 is approximately 20% higher than the growth rate of the GDP deflator. Without loss of generality, one may assume that the GDP deflator had been growing at a rate approximately 20% lower than the CPI before 1978. In Figure 1, the green line represents the GDP deflator before 1978. Since the growth rate of the GDP deflator was lower, the overall change between 1929 and 2010 is also smaller than that of the CPI.
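
A minimal Python sketch of this correction follows. The annual rates are invented for illustration and are not BEA or BLS data; the factor 0.8 encodes our reading of "a rate approximately 20% lower than the CPI", and the function names are ours.

```python
def corrected_rates(years, cpi_rates, deflator_rates, break_year=1978, factor=0.8):
    """Before break_year use factor * CPI rate; from break_year on keep the deflator rate."""
    return [factor * cpi if year < break_year else dgdp
            for year, cpi, dgdp in zip(years, cpi_rates, deflator_rates)]

def cumulative(rates):
    """Running sum of annual rates, as plotted in Figure 1."""
    total, out = 0.0, []
    for rate in rates:
        total += rate
        out.append(total)
    return out

# Invented annual inflation rates (fractions, not per cent)
years = [1976, 1977, 1978, 1979]
cpi = [0.05, 0.06, 0.07, 0.10]
deflator = [0.05, 0.06, 0.06, 0.08]

corrected = corrected_rates(years, cpi, deflator)
print(cumulative(cpi))
print(cumulative(corrected))
```

With a lower deflator before 1978, the same nominal GDP growth translates into higher real growth for those years.
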
The difference between the GDP deflator and the CPI has an immediate consequence for real GDP. When applied to the real GDP estimates published by the Bureau of Economic Analysis, the corrected GDP deflator provides a more accurate time series. One must use this corrected time series in economic and econometric research in order to avoid the apparent bias.
To begin with, we have updated our comparison of real GDP and real GDP per capita growth. This comparison has demonstrated that the fall in real GDP (recession) has actually returned the growth trajectory to the long-term trend and there is no output gap as estimated from real GDP time series. Figure 2 depicts two old and two new (corrected) curves. One can see that the new curves are above the old ones since the corrected GDP deflator is lower than the CPI and the updated real GDP estimates are higher than the original estimates. The corrected real GDP curve implies a much larger output gap that makes this hypothesis truly void. Essentially, the growth rate between 1930 and 1960 was so high that it can never be repeated.  As a result, the Solow model (constant returns to scale) behind the output gap is likely to be wrong.
Figure 1. Cumulative growth rate (the sum of annual inflation rates) of various definitions of inflation since 1929. 
Figure 2. Old and corrected (corr.) estimates of real GDP and real GDP per capita. The former estimates are below the new ones. 
Update 28.10.2011. Corrected table of real GDP per capita and real GDP in 2005 US $
Year, real GDP per capita, real GDP
2010 42270 13108145194
2009 41377 12722814211
2008 43242 13181625195
2007 43791 13225891467
2006 43399 12978410715
2005 42681 12643309646
2004 41792 12265967955
2003 40769 11857542216
2002 40108 11556298822
2001 39769 11347579296
2000 39750 11226180266
1999 38592 10779826176
1998 37238 10283422652
1997 36102 9854329716
1996 34977 9433786578
1995 34112 9093849856
1994 33671 8870793305
1993 32747 8523454654
1992 32255 8287019110
1991 31614 8015097420
1990 32112 8033812272
1989 31877 7885955399
1988 31069 7613800209
1987 30115 7313216945
1986 29443 7086429569
1985 28717 6849176802
1984 27823 6577190262
1983 26186 6136243938
1982 25282 5870935476
1981 26030 5987108240
1980 25640 5838894640
1979 26010 5855007060
1978 25503 5677707387
1977 24150 5320037949
1976 23103 5038386795
1975 21814 4711459533
1974 21689 4639185468
1973 21796 4619427179
1972 20696 4344618556
1971 19729 4097469291
1970 19161 3929633433
1969 19185 3889478681
1968 18673 3748418138
1967 17905 3558699243
1966 17580 3456136296
1965 16656 3237016460
1964 15817 3035727724
1963 15129 2864008171
1962 14684 2739924324
1961 14039 2579516585
1960 13910 2514340896
1959 13838 2451111168
1958 13078 2277479719
1957 13353 2287102029
1956 13298 2237003680
1955 13280 2194815712
1954 12594 2045223652
1953 12885 2055915430
1952 12487 1959856824
1951 12096 1866190324
1950 11400 1729138416
1949 10677 1592940944
1948 10795 1582831140
1947 10309 1485823522
1946 10482 1482021159
1945 11855 1658907921
1944 12093 1673619102
1943 11231 1535740268
1942 9643 1300405392
1941 8175 1090577563
1940 7044 930616096
1939 6542 857231031
1938 6120 795393311
1937 6356 819695449
1936 6072 778270922
1935 5389 686317930
1934 4963 627750020
1933 4535 570028178
1932 4683 585161480
1931 5488 681318834
1930 5933 730845272
1929 6562 799800744
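
As a usage example, annual growth rates of real GDP per capita can be computed directly from the table. The Python sketch below copies only the last four rows (year and the per capita column); the function name is ours.

```python
# Real GDP per capita, 2005 US $, last four rows of the table above
gdp_per_capita = {2007: 43791, 2008: 43242, 2009: 41377, 2010: 42270}

def growth_rate(year, series=gdp_per_capita):
    """Relative change of the series from the previous year."""
    return series[year] / series[year - 1] - 1.0

for year in (2008, 2009, 2010):
    print(year, f"{100 * growth_rate(year):.2f}%")
```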


Real GDP is not correct. Continued

I have found some more data on the GDP deflator and CPI. These are time series for France and Japan.
For France, we found a structural break in Okun's law around 1993. For Japan, it was in 1975.
The figures below validate our finding that both structural breaks were artificial and induced by the change in the real GDP definition.

Real GDP is NOT correct

This is an extension of the story on the wrong metrology of macroeconomic measurements. Economics chiefly fails and produces a great amount of counterproductive work due to wrong measurements of basic macroeconomic variables. In our previous post we focused on real GDP in the US, and here we extend the case to Australia, Canada, and the United Kingdom. As mentioned before, we have devoted considerable effort to revealing and repairing many such trivial cases in our book “mecħanomics. Economics as Classical Mechanics”.
The growth rate of real GDP (see Concepts and Methods of the U.S. NIPA for details) is the growth rate of nominal GDP less the change in the GDP deflator (price index). The latter is not easy to calculate or even evaluate. We found that the problem is so sophisticated that before 1978 there was no practical difference between the cumulative inflation values of the CPI and the GDP deflator in the US, as Figure 1 demonstrates. (The cumulative inflation, i.e. the cumulative sum of inflation rates, is different from the price index when differently calibrated in the beginning.) Effectively, the curves in the Figure diverge from 1978. There is no direct statement about the reasons for the change in definitions in the aforementioned conceptual document, but we might guess that it is related to the introduction of a new methodology to evaluate the overall price inflation. This difference has affected our analysis of Okun’s law and forced the introduction of a structural break in 1978 in the dependence between the unemployment (and employment) rate and the rate of real economic growth. As we reported lately, this was an artificial break completely related to the change in the real GDP definition in 1978.
Thus, before 1978 the CPI was used to estimate the overall price inflation. Since 1978, the GDP deflator (dGDP) has been used. The difference between these two variables cannot be neglected: the cumulative change in inflation between 1978 and 2009 is 20 percentage points. This implies that, when applied to the estimates before 1978, the concept of the dGDP would result in a bigger change in real GDP estimates. The overall real GDP increase since 1929 should be much larger if the current definition of the GDP deflator is applied.
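
To make the distinction between cumulative inflation and a price index concrete, here is a small Python sketch. The two 10% rates are invented, and the function names are ours.

```python
def cumulative_inflation(rates):
    """Plain sum of annual inflation rates, the quantity shown in the figures."""
    return sum(rates)

def price_index(rates, base=1.0):
    """Compounded price index starting from the given base level."""
    index = base
    for rate in rates:
        index *= 1.0 + rate
    return index

rates = [0.10, 0.10]                # two invented years of 10% inflation
print(cumulative_inflation(rates))  # sum of rates
print(price_index(rates) - 1.0)     # compounded change of the index
```

The sum gives 20 percentage points, while the compounded index rises by 21%; over long periods the two measures drift apart even for identical annual rates.
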
To validate this finding we have borrowed data from the OECD and calculated the CPI and dGDP cumulative inflation in Australia, Canada and the UK, as shown in Figure 2. There are clear breaks in different years: for Australia in 1983 (also 1983 was estimated from a structural break in Okun’s law); for Canada – 1980 (1982 was estimated from a structural break in Okun’s law); for the UK – 1979 (1982 was estimated from a structural break in Okun’s law). For the UK, the CPI and dGDP curves start to diverge in 1979 but the pace of deviation is very slow and the year of structural break in Okun’s law is hard to determine accurately.
One can conclude that all structural breaks in the previously estimated models of the rate of unemployment (Okun’s law) and the employment/population ratio for the US, Australia, Canada and the United Kingdom were entirely artificial and forced by the change in the real GDP definition in the years of these breaks. (The OECD does not provide sufficient data length for other modeled countries, and we need to find other sources of information for France, Japan and Spain.) Hence, real GDP estimates are incompatible over the break years and thus wrong. One must not use them for modeling and statistical analysis.
Real GDP is NOT correct!
Figure 1. Cumulative rate of inflation (the sum of annual inflation rates, which is different from the inflation index) in the United States since 1929, as described by the CPI and dGDP.
Figure 2. Same as in Figure 1 for Australia, Canada and the United Kingdom.

... as a young economist

RePEc (Research Papers in Economics) is the largest archive of economic papers (articles/chapters/software) with ~30,000 voluntary participants from 75 countries. It provides a number of rankings, including one for young economists. I am not a young scientist, but I started my amateur research in economics in 2003, with the first paper published in 2005. Therefore, I belong to the category of young (10 years or less) economists, with rank 86 (http://ideas.repec.org/top/top.young.html). It is also the second rank among those who started after 2005. Not bad to begin with.


Weird PPI of oil - illustration

Following our previous post on the oil price, we compare the price of oil futures (dark blue) and the PPI of oil (BLS estimates) between 2007 and September 2011. Even a simple visual inspection shows that the September PPI estimate differs from its expected value when converted from the oil price. Why?

Update 20.10.2011
I have replaced the Figure with not seasonally adjusted PPI and medium monthly oil price between 2008 and September 2011. Now the difference in September is prominent.


Do not understand the growth in the producer price index of oil

It looks weird. The figure below shows the daily change in oil price futures during the previous three months. There is no big difference between August and September 2011. I would estimate the change as negligible. At the same time, the PPI of crude petroleum (not seasonally adjusted) grew from 241 to 275.9, which is approximately the level of June. It must be a mistake.

Principal idea of capitalism: the people only work because and so long as they are poor

This is an excerpt from "The Protestant Ethic and the Spirit of Capitalism"  by Max Weber Another obvious possibility, to retur...