Perspectives of cross correlation in seismic monitoring at the International Data Centre

This is a preprint of our paper submitted to Pure and Applied Geophysics:

We demonstrate that several techniques based on cross correlation are able to significantly reduce the detection threshold of seismic sources worldwide and to improve the reliability of IDC arrivals by a more accurate estimation of their defining parameters. More than ninety per cent of smaller REB events can be built in automatic processing while completely fitting the REB event definition criteria. The rate of false alarms, as compared to the events rejected from the SEL3 in the current interactive processing, has also been dramatically reduced by several powerful filters. The principal filter is the difference of arrival times between the master events and newly built events at three or more primary stations, which should lie in a narrow range of a few seconds. Two effective pre-filters are f-k analysis and Fprob based on correlation traces instead of original waveforms. As a result, cross correlation may reduce the overall workload related to IDC interactive analysis and provide a precise tool for quality check for both arrivals and events. Some major improvements in automatic and interactive processing achieved by cross correlation are illustrated by an aftershock sequence of a large continental earthquake. Exploring this sequence, we describe schematically the next steps for the development of a processing pipeline parallel to the existing IDC one in order to improve the quality of the REB together with the reduction of the magnitude threshold. The current IDC processing pipeline should be focused on the events in areas without historical seismicity which are not properly covered by REB events.


How can we validate the statistics of the Russian elections?

Statistically, there is a relatively simple way to validate the overall distribution (or to demonstrate falsification). It is commonly used by census bureaus and statistical agencies. One takes a small random subset from the studied distribution (statistics) and repeats the survey (voting) under full control. If the re-estimated statistics are quite different from the previously obtained result, one may doubt the accuracy of the original survey (voting). The size of the re-estimated set should be defined by the originally observed statistics, and the set should be drawn specifically from the polling stations with the most suspicious results.
Therefore, one does not need to repeat the full election process everywhere. We need only a relatively small number of polling stations, which can be determined by lottery from the set where the distribution of votes in favour of United Russia is far from the expected Gaussian distribution. United Russia would be happy to prove that these results are not falsified in order to calm down the protests and indignation.
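
The lottery described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the threshold used to call a station "suspicious", and the example data are all hypothetical and are not taken from any real election results.

```python
import random

def draw_recount_sample(stations, threshold, k, seed=None):
    """Pick k stations by lottery among those whose UR share exceeds threshold.

    stations: dict mapping a station id to its reported UR share in percent.
    threshold: hypothetical cut-off (percent) above which a station is suspicious.
    """
    # Sort the ids so that a fixed seed always gives the same draw
    suspicious = sorted(sid for sid, share in stations.items() if share > threshold)
    rng = random.Random(seed)
    return rng.sample(suspicious, min(k, len(suspicious)))

# Made-up example data, not real results
example = {"A": 28.0, "B": 61.5, "C": 99.0, "D": 33.0, "E": 75.0}
picked = draw_recount_sample(example, threshold=50.0, k=2, seed=1)
print(picked)
```

Publishing the seed in advance would let anyone reproduce the draw and check that the stations were not hand-picked.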


20,000,000, anyway

In our previous post, we approximated the dependence of the number of polling stations on the portion of votes for United Russia (UR) at a given polling station. Now we can estimate the number of votes falsified by UR from the simple functional dependences obtained in that post, i.e. the expected normal distribution and the sine function above 29%.
Let's take one polling station with, say, 60% in favor of UR. The mean value estimated from the normal distribution is 29%. Therefore, the expected value is 29% as well, and the portion of likely falsified votes is 31%. Here we disregard the falsification technique. It can be a redistribution of votes from other parties or a direct addition of fake votes to the correct result. In any case, 31% of the total number is wrongly assigned to UR. For a medium-sized polling station of, say, 1000 people this makes 310 wrong votes. Our estimate of the number of polling stations between 60% and 61% is 1160. (Notice that the original bins were only 0.5% wide.) Hence, there are 1160 x 310 = 359,600, or roughly 360,000, wrong votes altogether. In the range between 29% and 30%, the difference between the normal and sine distributions is small and the number of wrong votes is only 600. In the range between 99% and 100%, one has 472,000 votes hacked.
Summing up all bins, one obtains 20,000,000 votes biasing the election outcome, the same as estimated initially using very crude approximations.
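
The per-bin arithmetic for the 60% example can be written as a small Python sketch. The function name and its parameters are ours and purely illustrative; the 29% expected share, the 1000-voter station size and the 1160 stations are the figures quoted in the text.

```python
def wrong_votes(ur_share, n_stations, expected_share=29.0, voters_per_station=1000):
    """Votes assigned to UR above the expected share, summed over one bin.

    ur_share and expected_share are in percent; shares below the expected
    value contribute nothing.
    """
    excess = max(ur_share - expected_share, 0.0) / 100.0
    return round(n_stations * voters_per_station * excess)

# 1160 stations at 60% with 1000 voters each: 1160 * 310 votes
print(wrong_votes(60.0, 1160))
```

Repeating this for every bin and summing gives the total; the number of stations per bin has to be read from the observed curve or from its approximation.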


The SINE function of the Russian elections

Several days ago we presented a preliminary analysis of the distribution of votes over polling stations in Russia. We borrowed the original graph from Maxim Pshenichnikov and crudely estimated the difference between the observed curve and the one expected from a normal (Gaussian) distribution of the parties' portions. All parties except United Russia (UR) showed the expected normal distribution, which is also observed during elections in other countries, where UR does not rule. Today we illustrate this difference and demonstrate it quantitatively. Moreover, we propose a simple functional dependence for those who want to carry out their own research. Figure 1 reproduces the original distribution (brown curve), which depicts the number of polling stations in 0.5% bins as a function of the UR portion of votes, from 0% to 100%. We have approximated the original distribution with a normal one with mean=29% and stdev=5%. The black curve fits the left wing of the measured function and fails on the right one. When elections are honest, the distribution must be normal, as the black line shows. However, the observed brown line is better approximated by a sine function, shown by the red line. We have found the following equation (not optimal):
s(V) = 637+403sin(V/18)
where V is the portion of the UR votes (x-axis) and s(V) is the number of polling stations in a given V-bin. The curve starts from 30%, where the normal distribution fails to fit the observed line. All in all, the election results demonstrate a sine line of the UR success.
Having this function, one can easily estimate the number of polling stations with falsified elections and the number of votes falsified in favor of UR.
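
For readers who want to play with the sine approximation, here is a minimal Python sketch that evaluates s(V) on the 0.5% bins from 30% to 100%. It assumes that V is expressed in percent and that the argument V/18 is in radians; the post does not state this explicitly, so treat both as assumptions. The function name is ours.

```python
import math

def stations_in_bin(v_percent):
    """Sine approximation s(V) = 637 + 403*sin(V/18): stations per 0.5% bin.

    Assumption: V is in percent and V/18 is taken in radians.
    Meant for V >= 30%, where the normal curve no longer fits.
    """
    return 637 + 403 * math.sin(v_percent / 18)

# Left edges of the 0.5%-wide bins from 30% up to (but not including) 100%
bins = [30 + 0.5 * i for i in range(140)]

# Total number of polling stations under the sine curve
total_stations = sum(stations_in_bin(v) for v in bins)
print(round(total_stations))
```

Since the equation is described as not optimal, the values from this sketch need not coincide exactly with the per-bin counts read from the graph.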

Figure 1. The number of polling stations in 0.5% bins (vertical axis) with a given portion of votes (between 0% and 100%) for the UR.


Can statistics prove the (Russian) elections fraud?

Many people think that statistics cannot prove the Russian elections' fraud. This is not the case, however. Statistics allows revealing reliable links between various measurable parameters during the elections, links which cannot exist at all. For example, the results of elections in psychiatric clinics in Moscow (http://www.vedomosti.ru/tnews/geo/moscow-elections) show that United Russia had more than 90% of votes, including legally capable patients. This is a fantastic result for the UR but a very bad message for its supporters. The large percentage in favor of the UR is evidence, in a statistical sense, that there is a reliable link between voting for the UR and the predisposition to a psychiatric disease. A medical doctor has to include this link in the differential diagnosis of the diseases. When a patient comes, the doctor first has to ask which party s/he has voted for. And this link exists only in Russia.

I am not sure that the UR would agree that there was no fraud in this specific area, since that would characterize the party's proponents as subject to psychiatric diseases. There are many other links that look absolutely weird, and the UR should not be happy with them.

P.S. Not serious.

As a novice in psychiatry I missed an important opportunity: the propensity to the UR is contagious and can easily be transmitted in closed groups. Since this propensity has a strong correlation with the predisposition to psychiatric disease, the Centers for Disease Control and Prevention should issue an alarm. Before that, individual measures should be taken: gloves and a mask when communicating with the UR people.


A thought

One is not able to buy for money what s/he must sacrifice to get rich.


Russian elections: statistical bias between 10,000,000 and 20,000,000 votes

How many votes might be stolen in Russia?
Statistics is a powerful tool to reveal fraud. Almost everything in this world is distributed according to a few simple laws: normal, power law, exponential, linear. The most famous is the normal distribution, which occurs often because of the Central Limit Theorem. Briefly, if some process is controlled by a very large number of parameters, its outcome usually obeys the normal (Gaussian) distribution. The 2011 elections in Russia are a good example of the Gaussian distribution and of potential fraud. Figure 1 displays the distribution of the number of polling stations as a function of the percentage given to the five participating parties (borrowed from http://oude-rus.livejournal.com/542295.html; EP - United Russia, КПРФ - communist party, etc.). The communists' curve is an example of an approximately normal distribution between 3% and 40%. There is a bias at low values, which we can ignore for this case as a consequence of the United Russia behaviour. Specifically, the EP curve demonstrates a clear Gaussian between 15% and 27% and then a long heavy tail. Here we have to notice that it is the only party with this tail. Moreover, there are weird peaks around "magic" numbers: 50%, 55%, ..., 100%. Hence, the expected normal behaviour is violated, and this can only be the result of a deliberate action. If the number of causes behind voting is large, then only a normal distribution can be observed. If such a distribution were produced by some lottery, it would be a fraud and the distribution could be used as proof. (It is worth noting that heavy-tail distributions are observed in the stock market, where they are less prominent.)
Using the EP distribution one can easily estimate the number of votes above that dictated by the normal distribution. Let us continue the EP curve symmetrically beyond 27%, as a mirror reflection. The distribution should fall to zero at approximately 50% (x-axis). The difference between the measured EP curve and the imaginary extension is the bias introduced by a control force. On average, there are 400 polling stations in any 0.5% bin above 50% in the EP curve. With two bins per percent over 50 percentage points, one has 400x2x50=40,000 polling stations with biases.
Now we have to evaluate the number of votes. The mean value for the EP distribution is 27%. Then for all polling stations the bias can be estimated as the measured value less 27%. Between 50% and 100% one has 48% on average, i.e. (100%+50%)/2 - 27%. For an average polling station with 1000 voters one has a positive bias of 480 voters. For all 40,000 stations this gives about 19,000,000. This figure depends on the average size of polling stations. For stations with 500 voters the final number is about 9,500,000.

Anyway, it is big. 
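
The crude estimate above is easy to reproduce. The following Python sketch uses only the numbers quoted in the text (400 stations per 0.5% bin, the 50% to 100% range, a mean of 27%); the variable and function names are ours.

```python
BIN_WIDTH = 0.5          # bin width in percent
STATIONS_PER_BIN = 400   # average stations per bin above 50%
LOW, HIGH = 50.0, 100.0  # range of the heavy tail, percent
MEAN_SHARE = 27.0        # mean of the fitted normal curve, percent

n_bins = int((HIGH - LOW) / BIN_WIDTH)       # 100 bins
biased_stations = STATIONS_PER_BIN * n_bins  # 40,000 stations
mean_bias = (HIGH + LOW) / 2 - MEAN_SHARE    # 48 percentage points

def biased_votes(voters_per_station):
    """Total excess votes for a given average station size."""
    return round(biased_stations * voters_per_station * mean_bias / 100)

for size in (1000, 500):
    print(size, biased_votes(size))
```

With these inputs the exact products are 19,200,000 and 9,600,000, which the text rounds to roughly 19 and 9.5 million. The result scales linearly with the assumed average station size.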


Figure 1. The number of polling stations in 0.5% bins (vertical axis) with a given portion of votes (between 0% and 100%) for five major parties.  

The mean income gap between white males and black females grows during the democratic presidencies

Two days ago, we compared the mean income evolution of the white and black population and demonstrated that the difference did not change mu...