How many votes might have been stolen in Russia?
Statistics is a powerful tool for revealing fraud. Many quantities in this world are distributed according to a few simple laws: normal, power law, exponential, linear. The most famous is the normal distribution, which occurs often because of the Central Limit Theorem. Briefly, if the outcome of some process is the combined result of a very large number of independent factors, it usually obeys the normal (Gaussian) distribution.
The 2011 elections in Russia are a good example of both the Gaussian distribution and potential fraud. Figure 1 displays the number of polling stations as a function of the percentage of votes given to each of the five participating parties (borrowed from http://oude-rus.livejournal.com/542295.html ; EP is United Russia, КПРФ is the Communist Party, etc.). The communists’ curve is an example of an approximately normal distribution between 3% and 40%. There is a bias at low values, which we can ignore here as a consequence of United Russia's behaviour. The EP curve, in turn, shows a clear Gaussian between 15% and 27% and then a long heavy tail. It is the only party with such a tail. Moreover, there are weird peaks around “magic” numbers: 50%, 55%, …, 100%.
Hence, the expected normal behaviour is violated, and this can only be the result of deliberate action. If the number of causes behind voting is large, then only a normal distribution can be observed. If a lottery produced such a distribution, it would be considered fraud, and the distribution could be used as proof. (It is worth noting that heavy-tailed distributions are also observed in stock markets, where they are less prominent.)
Using the EP distribution, one can easily estimate the number of votes in excess of what the normal distribution dictates. Let us extend the EP curve symmetrically beyond 27%, as a mirror reflection of its left side. The extended curve should fall to zero at approximately 50% on the x-axis. The difference between the measured EP curve and this imaginary extension is the bias introduced by a controlling force. On average, there are 400 polling stations in every 0.5% bin above 50% in the EP curve. With two bins per percentage point over the 50 points between 50% and 100%, one has 400 x 2 x 50 = 40,000 polling stations with a bias.
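The mirror-reflection step can be sketched in code. This is only a minimal illustration under stated assumptions: the function name mirror_excess, the choice of the peak bin and the toy histogram are all hypothetical, and the numbers are made up rather than read from Figure 1.

```python
def mirror_excess(counts, peak_index):
    """Excess of each bin right of the peak over a mirror image of the left side.

    counts: number of polling stations per bin, ordered by increasing vote share.
    peak_index: index of the bin taken as the centre of the Gaussian part
                (an assumption; the text uses 27% for the EP curve).
    """
    excess = []
    for i in range(peak_index + 1, len(counts)):
        mirrored = 2 * peak_index - i  # bin at the same distance left of the peak
        # Beyond the left edge the mirrored curve is taken to be zero.
        expected = counts[mirrored] if mirrored >= 0 else 0
        excess.append(max(counts[i] - expected, 0))
    return excess


# Toy histogram with made-up numbers (NOT the election data).
toy_counts = [1, 4, 9, 4, 3, 2, 2]
toy_excess = mirror_excess(toy_counts, peak_index=2)
print(toy_excess)       # [0, 2, 2, 2]
print(sum(toy_excess))  # 6 stations above the mirrored curve
```

With real data, counts would be the EP curve in 0.5% bins, and the sum of the excess would replace the rough figure of 400 stations per bin.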
Now we have to evaluate the number of votes. The mean value of the EP distribution is 27%. For each biased polling station, the bias can then be estimated as the measured value minus 27%. Between 50% and 100% the average measured value is (50% + 100%)/2 = 75%, so the average bias is 75% - 27% = 48%. For an average polling station with 1000 voters, this is a positive bias of 480 votes. For all 40,000 stations this gives 480 x 40,000 = 19,200,000, or roughly 19,000,000 votes. This figure depends on the average size of a polling station. For stations with 500 voters the estimate halves to roughly 9,500,000.
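The arithmetic of this estimate can be reproduced in a few lines. The sketch below uses only the numbers quoted in the text; the constant and function names are hypothetical, and the average station size remains an assumption, just as in the estimate itself.

```python
BIN_WIDTH = 0.5          # percent, bin width in Figure 1
STATIONS_PER_BIN = 400   # average number of stations per bin above 50% (from the text)
TAIL_START = 50.0        # percent, where the mirrored curve should reach zero
TAIL_END = 100.0         # percent
MEAN_SHARE = 27.0        # percent, mean of the EP distribution (from the text)


def estimate_excess_votes(voters_per_station):
    """Rough number of excess votes for an assumed average station size."""
    n_bins = round((TAIL_END - TAIL_START) / BIN_WIDTH)     # 100 bins
    biased_stations = STATIONS_PER_BIN * n_bins             # 40,000 stations
    mean_bias = (TAIL_START + TAIL_END) / 2 - MEAN_SHARE    # 48 percentage points
    return biased_stations * voters_per_station * mean_bias / 100


print(estimate_excess_votes(1000))  # 19200000.0
print(estimate_excess_votes(500))   # 9600000.0
```

The exact products are 19.2 and 9.6 million; the figures in the text are these values rounded down.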
Either way, the number is big.
Figure 1. The number of polling stations in 0.5% bins (vertical axis) with a given portion of votes (between 0% and 100%) for five major parties.