Integer percentages as fingerprints of electoral falsification (arxiv.org)
225 points by merraksh on July 3, 2016 | 70 comments



Absolutely fascinating research!

I wonder if someone has some free time to run the models on several recent elections that were filled with fraud. I'm interested in: the Austrian presidential election [0], the Turkish elections [1], and the Democratic Party primaries [2][3]. Each of them was analyzed for voting fraud, and fraud was found in each of them (some, like the Turkish elections, were full of massive vote fraud).

PS: What's with all the downvotes? Is the evidence of massive voter fraud in the West so unsettling that you have to downvote me?

[0] >Austria presidential poll result overturned

http://www.bbc.com/news/world-europe-36681475

[1] >Turkey Elections Massive Vote Fraud

https://erikmeyersson.com/2015/11/04/digit-tests-and-the-pec...

[2] >Hillary Clinton Favored By Election Fraud In Democratic Primaries

http://www.inquisitr.com/3127046/hillary-clinton-favored-by-...

[3] >Election Fraud Watch 2016 (J: awesome blog that tracks election fraud in primaries)

https://electionfraud2016.wordpress.com/


Austria is actually not suspecting any fraud. About 100,000 postal ballots were counted without fully following the protocol demanded by law, and the election was decided by a difference of about 30,000 votes. They have no evidence or suspicion of actual fraud, but the legitimacy of the result can still be questioned, so they decided to re-run the election.


The Sanders movement, even though I disagreed with his platform, was true grassroots democracy: Common people getting organized and trying to make a change.

I don't care if I get downvoted into oblivion for saying this next part: when you look at the spread between the types of polls, there is a massive difference between the voting elite and the people's will. I don't advocate against the electoral college, but what was supposed to be a system that prevented buyouts has proven to be completely for sale by the Clintons. I didn't see massive pro-Clinton rallies. We had quicker investigations into Samsung vs. Apple for inconsequential things. No one has been held accountable, and no explanations have been given for massive breaches of protocol other than that it's a plot by the Repuglicans.

I'm tired of everything from the Clinton family. My state, KS, has a good chance of throwing electoral votes to Gary Johnson. Please, California, if you are listening, don't reward this family for their actions.


I think a major mistake many Sanders supporters are making is not distinguishing between Clinton being an awful candidate (which she is) and Clinton still winning (which she is). There may be a lot of explanations - from Sanders being an even worse candidate (a socialist as a US president? Come on!) to Sanders failing tactically and strategically. One explanation that does not follow from any of Clinton's deficiencies, however, is fraud. It also does not follow from polls - polls are getting increasingly worse at predicting elections, especially close ones. That has happened many times recently, and pollsters being wrong means pretty much nothing now, except that maybe we need better pollsters.


I think polls are increasingly proving fallible. I'm not sure people are willing to admit, even pseudonymously, what they really think and how they vote in the sanctity of the voting booth, for fear that they will be ostracized for not doing what they're supposed to do.


The main reason is probably the abandonment of landlines and phone books, and the adoption of caller id before it. It used to be easy to survey a representative sample of the population, and now it's impossible.


Possibly for polls during the campaigns, but exit polls, where interviewers survey random samples of people leaving the place where they cast their votes, have historically been very close to the actual vote results. In some areas of the US, the exit polls are spot-on. In other places, the numbers are off by a significant margin.

Some people try to explain it away by saying people vote for one candidate but are ashamed to admit it to a person asking them who they voted for, but wouldn't that tendency show up statistically across all precincts rather than in specific ones?

There is also an alarming number of times this occurs specifically in precincts with electronic voting machines.


It's not exactly "ashamed" I think, more like cautious. I mean, I've read multiple reports of people being beaten bloody at a candidate's rally, so people would just prefer to stay out of trouble - even if they are proud of their choice, why would they risk talking about it publicly to a random stranger? I mean, the pollster probably won't beat them up, but what if some crazy person overhears it, follows them, and makes trouble? Nobody needs that. Sad, but such intolerance is part of the culture in the US; many people just don't get "I disagree with you, but I respect your right to have your own opinion".


  ashamed to admit it
Eh, no need for shame, it could just be that one candidate appeals more to busy (or privacy-minded) voters who decline to be polled.

I generally avoid people with clipboards who are approaching people on the street, and I can believe that's something that correlates with opinions on other things.


If you really think that Clinton didn't get the votes she got fairly, you're living in a bubble. Sanders supporters have this problem of forgetting that women and minorities exist.

(I can't stand the Clintons, but I don't think they cheated)


I am an outsider (European) looking in but closely following the US presidential primaries this year, so I don't have a horse in the race. There are a number of levels at which there has been clear unfairness or cheating (depending on your perspective) in the DNC primaries. Here are some of the simpler ones:

- The recent release of hacked documents from the DNC shows that the primaries were not set up to be a fair competition from the start.
- The rules are specifically set up in a number of states (New York is a prime example) to disenfranchise new Democratic Party members from voting.
- Thousands of registered voters were removed from the voter rolls. There is even a case where a lifelong Democrat running for a congressional seat on a Democratic ticket was removed.
- Media coverage made it appear that Clinton was in the lead before any voting even took place. That coverage has been shown to have been orchestrated (at least at the start) by the DNC.
- Funds raised in the DNC's name were disproportionately funnelled to the Clinton campaign.
- Voting stations were closed down in huge numbers in some states and territories (Puerto Rico and California being prime examples).


NY was pretty ridiculous.

It's going to look pretty ridiculous if the DNC is ever audited, or has someone more credible than Guccifer2.0 hack and release their documents.


If you look at the demographics of the people who voted for Sanders, you'll find they're representative of the overall population.

E.g., 50% of Sanders' voters were women. You'll get similarly proportional percentages of minorities.

Simpson's paradox creates the perception that Sanders' voters are mostly white males. It's untrue.


Rallies aren't a very good indicator of popular support. Having 4 million or so more votes is.


See also: Howard Dean, 2004


The electoral college plays no role in the primary nominating process.


I'm with you buddy



Refuted? Which instance? There were so many of them. Go check out this series of posts to see corruption in action:

https://electionfraud2016.wordpress.com/


As the NYTimes article points out, a discrepancy between exit polls and results (the main claim of the article you linked to) is not evidence of fraud, but rather of the quality of the exit polling, which is subject to a number of biases. If those biases are not correctly eliminated, the results will differ. Also, with due respect, that blog looks rather crackpotty.

As if anyone should be surprised that a leftie social democrat wouldn't do well in a US election.


> Is the evidence of massive voter fraud in the West so unsettling that you have to downvote me?

Not downvoting you. But if fraud turns out to be the case, I'm going to be unsettled as hell. It should be unsettling if you live in the West and/or care about democracy there, obviously. I doubt we disagree about this, though :)

But even if no fraud is suspected, or if the analysis finds nothing, this is data science. Unlike, say, medicine or food science, there's no shortage of data (in theory). So if there's a country whose democracy we can reasonably be sure is entirely above board[0], throw it into the test and use that to calibrate the others.

One problem I believe is that there is no such thing as "the test" :-)

This research was based on a particular system of elections--if you look beyond your own democratic system (or the way you learned in school it's supposed to work), you'll see how insanely different they all are. The fact that we actually use a single word, "democracy", to describe them all is pretty crazy, now that I think about it.

Both the numbers themselves (what aggregate counts are available) and the specific kind of fraud suspected (which you kind of need for a statistical hypothesis) depend heavily on the democratic system involved. For instance, that messy thing with the US elections a while back, Bush's first term IIRC: whether there was fraud or not, it was fought over a couple of hundred votes in one district or county, was it not?

To disappear or push a few hundred votes one way or another requires a different type of fraud, and therefore a different statistical test, than, say, the type of fraud that makes a whole country of several hundred million almost always divide its votes between two options so close to 50/50 that a few hundred votes can swing the outcome in the first place.

And that's just one country. There are so many different creative ways to mess up whatever is locally deemed to constitute democracy that you probably need at least an equal number of different creative ways to statistically test whether elections are fraudulent or not.

[0] come to think of it, maybe there is a shortage of data ...


You are off the mark with Austria I'm afraid.


(another author here) I doubt you will find this specific anomaly in the US or even Turkey (at least I see nothing in my graphs for the 2014 Turkish presidential election). I'd say this type of falsification is reminiscent of the purely fictional Soviet elections, with their 99% turnout and 99% for the only official candidate. A similar pattern is seen, e.g., in the 2013 Azerbaijan presidential election (dozens of precincts reporting identical, albeit in this case non-round, percentages for the incumbent).


I thought this would just be an application of Benford's law, but they note in the beginning that using Benford's law doesn't work for elections (citing this paper: http://www.vote.caltech.edu/sites/default/files/benford_pdf_...)

Does anyone know why Benford's law doesn't work here but does work for other made up numbers in applications like accounting?


Benford's law applies when the underlying distribution is approximately exponential. Because candidates with an exponentially small chance to win are not likely to run, it wouldn't make much sense for elections to have an exponential distribution. Much more plausible distributions are (truncated) normal or uniform, neither of which satisfies Benford's law.
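To see the difference concretely, here is a quick sketch of my own (not from the paper): draw one sample spanning several orders of magnitude and one sample of bounded percentages, and compare their leading-digit frequencies against Benford's prediction.

  # Rough illustration (not from the paper): Benford's law fits data that
  # spans several orders of magnitude, but not percentages bounded in [0, 100].
  import numpy as np

  def digit_freqs(x):
      """Frequencies of the first significant digit (1..9) of positive values."""
      x = np.asarray(x, dtype=float)
      x = x[x > 0]
      d = np.floor(x / 10 ** np.floor(np.log10(x))).astype(int)
      return np.array([(d == k).mean() for k in range(1, 10)])

  rng = np.random.default_rng(0)
  benford = np.log10(1 + 1 / np.arange(1, 10))   # theoretical Benford frequencies
  wide = 10 ** rng.uniform(0, 6, 100_000)        # spans six orders of magnitude
  shares = rng.uniform(0, 100, 100_000)          # bounded, like vote percentages

  print("Benford    :", np.round(benford, 3))
  print("wide data  :", np.round(digit_freqs(wide), 3))    # close to Benford
  print("percentages:", np.round(digit_freqs(shares), 3))  # roughly flat, not Benford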


I like to visualize it as throwing random darts at graph-paper that has logarithmic marks on both axes.


For Benford's law to apply, the numbers (the non-fabricated numbers) need to be coming from a source/distribution covering several orders of magnitude. The fabricated voting percentages are limited to between 0 and 100.


Raw vote totals then?


We ran a test for one of the problem sets in a course I was TAing. Raw vote counts did follow Benford's law in 2000 US presidential elections. We did not portray it as a tool to detect or disprove fraud, however.


Perhaps the sizes (the number of eligible voters, and also the number of people who actually voted) of the voting districts follow the law? Then you could take the total vote counts, distribute them back to the candidates in a lot of artificial ways (e.g. always 50-50, or a random percentage between 30 and 70, say), and probably obtain data that still follows the law.
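That seems plausible. Here is a rough simulation of the idea (my own sketch, with invented district sizes, not real election data): if the district totals themselves follow Benford's law, splitting them by an arbitrary random share still produces counts with roughly Benford-distributed first digits.

  # Hypothetical simulation of the parent's suggestion: Benford-distributed
  # district totals, split artificially between two candidates, still yield
  # roughly Benford-distributed candidate counts.
  import numpy as np

  def digit_freqs(x):
      """Frequencies of the first significant digit (1..9) of positive values."""
      x = np.asarray(x, dtype=float)
      x = x[x > 0]
      d = np.floor(x / 10 ** np.floor(np.log10(x))).astype(int)
      return np.array([(d == k).mean() for k in range(1, 10)])

  rng = np.random.default_rng(1)
  totals = np.round(10 ** rng.uniform(2, 5, 50_000))   # district sizes, ~100 to ~100,000
  shares = rng.uniform(0.3, 0.7, totals.size)          # arbitrary split, not real politics
  candidate_a = np.round(totals * shares)

  benford = np.log10(1 + 1 / np.arange(1, 10))
  print("Benford     :", np.round(benford, 3))
  print("totals      :", np.round(digit_freqs(totals), 3))
  print("candidate A :", np.round(digit_freqs(candidate_a), 3))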


Benford's law best applies to data that spans multiple orders of magnitude.


Isn't it that Benford's law applies when parts of the data are multiplied by other parts (as with most things in the real world)? E.g., two uniform distributions multiplied together give a non-uniform distribution?


I have seen Benford's law applied to voter turnout, where the distribution could be expected to be closer to exponential (but still not truly, due to districting practices).


It's kinda sad to see our elections being used as a reference for falsification-detection research...

This isn't exactly new; such analysis was done as early as 2011, e.g. http://lleo.me/dnevnik/2011/12/07_gauss.html

Although they were more focused on the peculiar distribution than on the integer spikes.


The figure that is reproduced and discussed in that lleo.me blog post was made by one of the authors of the paper that is being discussed here. It just took us from the end of 2011 until early 2016 to actually get the whole thing published.


This is fascinating.

I'm tempted to argue the improbably-round numbers might be due to lazily counting/sampling the ballots rather than actual malicious fraud... but I guess sloppily running the election still constitutes fraud in some sense.


The paper did mention (IIRC) that the ballot counts themselves didn't appear to have any tendency to align with "round" numbers (1250, 1300, etc) which they likely would have if someone was lazily counting the ballots. It was only the calculated percentages (#votes for winner over #votes overall) that had the trend.


(One of the authors here.) Actually the ballot counts do have some tendency to align with round numbers. We did not discuss this in the paper because we felt that this "anomaly" can easily be contested, precisely as @jcalvinowens argued up the thread (it is written in our discussion section too). So what we say instead is that the round percentages anomaly persists even if we exclude all ballot stations with round ballot counts, i.e. it cannot be explained as a by-product of round counts.

There are other problems associated with round ballot counts. Here is a funny one: a sizable number of polling stations reported a round number of counted ballots, coinciding exactly with the total number of ballots received by the polling station prior to election day (this information is available too). As if so many people came to vote that every single ballot was used up, with turnout close to 100%. There is little doubt that it was all made up out of thin air.
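For anyone who wants to try this on another data set, a minimal version of the integer-percentage check might look like the sketch below. The input file and column names are hypothetical, and this is not the authors' code; the paper builds a much more careful Monte Carlo null model, but the core comparison is the same: how often does the leader's percentage land exactly on an integer, versus how often binomial noise alone would put it there?

  # Minimal sketch of an integer-percentage check over per-station results.
  # "stations.csv" and the column names are made up for illustration.
  import numpy as np
  import pandas as pd

  def integer_share(leader, total):
      """Fraction of stations whose leader's percentage is exactly an integer."""
      leader, total = np.asarray(leader), np.asarray(total)
      ok = total > 0
      return ((100 * leader[ok]) % total[ok] == 0).mean()

  def baseline_integer_share(leader, total, n_sims=200, seed=0):
      """Monte Carlo baseline: redraw each station's leader count binomially
      around its observed share and see how often integer percentages occur
      by chance alone."""
      rng = np.random.default_rng(seed)
      leader, total = np.asarray(leader), np.asarray(total)
      ok = total > 0
      p = leader[ok] / total[ok]
      sims = rng.binomial(total[ok], p, size=(n_sims, ok.sum()))
      return ((100 * sims) % total[ok] == 0).mean()

  df = pd.read_csv("stations.csv")   # hypothetical per-polling-station file
  obs = integer_share(df["leader_votes"], df["total_votes"])
  exp = baseline_integer_share(df["leader_votes"], df["total_votes"])
  print(f"observed integer share: {obs:.4f}  vs  binomial baseline: {exp:.4f}")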


Or sloppy/lazy counting because they already know the predetermined outcome.


Improbable. Sharing a pre-determined result with Joe Random Votecounters * number of voting stations = near certainty of someone leaking that, and a field day for the media.

So, while theoretically possible, realistically it is summed up by "Three may keep a secret, if two of them are dead."

(And specifically here, if I were a conspirator going for a predetermined result, I would want a somewhat close result, yes - but not an almost-tie as seen here. Something like 55:45 or somesuch, not a "every single vote matters" scenario, where a few thousand votes could swing the outcome)


I think it would be more of an "open secret" pattern: e.g. no one explicitly orders people to make sure the ruling party always wins, but everyone knows someone who knows someone who was punished for reporting the actual count instead of a rigged count.


What you are describing is not an "open secret," that's a "conspiracy theory."


This is likely true and does not indicate fraud. The majority of precincts overwhelmingly favor one candidate, as do many states.


Elections are not decided precinct by precinct, however. E.g., in a presidential election, many states are won or lost as a bloc. If one candidate gets only ten votes in a given precinct, so those ballots are simply thrown out, and she then loses by five votes over the whole state, that certainly is fraud.


Is there a field that studies inconsistencies of this type?

Recently on HN there was a related test (the Grim test) https://news.ycombinator.com/item?id=11787560
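For reference, the GRIM check itself fits in a few lines. The sketch below is my own reading of the published idea, not any particular library: given a mean reported to a fixed number of decimal places and a sample size, test whether any integer-valued total could actually round to that mean.

  # Sketch of the GRIM test: can a mean reported to `decimals` places arise
  # from integer data with sample size n?
  def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
      nearest_sum = round(reported_mean * n)         # closest achievable integer total
      achievable = round(nearest_sum / n, decimals)  # how that total would be reported
      return abs(achievable - reported_mean) < 10 ** -(decimals + 1)

  # A mean of 5.19 from n = 28 integer responses is impossible: no integer
  # total divided by 28 rounds to 5.19 (145/28 = 5.18, 146/28 = 5.21).
  print(grim_consistent(5.19, 28))   # False
  print(grim_consistent(5.18, 28))   # True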


I think the field you're looking for is statistics?


More specifically anomaly detection.

https://en.wikipedia.org/wiki/Anomaly_detection


Thank you!


It would be very interesting to see this method applied to other elections to see what unexpected results can be found. Although still interesting, it is not a huge surprise that there has been large-scale vote fraud in Russia.


>it is not a huge surprise that there has been large vote fraud in Russia

Russian elections have become for statistics what "Lena" is for image processing. The typical "double-humped camel" Gaussian of Russian elections - polling stations without observers - is shown in Fig. 2 of http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3545790/pdf/pnas... ("United Russia" is Putin's party).


Nice work. It's interesting that they used Russia as the source of the election results data. Did I miss an explanation of their data selection criteria in the paper?


> We used election data from three countries besides Russia: 2011 general election in Spain, 2010 presidential election in Poland (1st round), and 2009 federal election in Germany (Zweitstimmen, i.e. party votes). These three elections were chosen because the data are publicly available down to the single polling station level, and because the number and size of polling stations are comparable to those in Russia

I guess this means they found the data first, then decided to try to analyze it and compare it with similar data elsewhere. It's probably not their work-related research (http://www.fchampalimaud.org/en/the-foundation/mission/).


Well, I can give you the explanation: all of the authors are Russian.


Should try it on the USA election results...


Why Electronic Voting is a BAD Idea - Computerphile

https://www.youtube.com/watch?v=w3_0x6oaDmI


This thread just reminded me of this quote:

"If you tell the truth, you don't have to remember anything." - Mark Twain


This is brilliant. I wonder how the authors got the initial motivation to do the study -- seems like quite a bit of work to find a very specific statistical anomaly; doubt they went into it blindly not really knowing there was something to find...


In 2014, a lot of attention was drawn in Russia and Ukraine to the votes in Crimea and Donbass (the "independence referendums", and the elections of local leaders). The latter two votes, in particular, had nice exact percentages, which drew the attention of people familiar with statistics almost immediately and were heavily discussed on LJ.


The study that is being discussed here was essentially done at the end of 2011 and in early 2012, mostly triggered by the results of the 2011 legislative election (heavily discussed on LJ as well; in fact, this study was itself mostly developed on LJ).

The election results in Crimea and Donbass do contain some funny patterns and are most likely largely made up, but no information is available on the polling station level so there is little to analyze statistically.


Good to know, thanks! Regarding this:

>> no information is available on the polling station level so there is little to analyze statistically.

What they did was look at the percentages and the exact vote counts for each choice, and notice that they match down to one vote (i.e. if you take the percentage, multiply it by the relevant total, and round, you get the vote count exactly). For example, in the referendum in Lugansk, the official stats are as follows:

1,349,360 valid bulletins
1,298,084 (96.2%) voted yes

Normally, the percentage is a number with many digits after the decimal point, which is then rounded for presentation purposes. But here, if you take 96.2% and multiply it by 1,349,360, you get 1,298,084.32. In other words, the reported 96.2% is actually accurate to 4 digits after the decimal point, three of which happen to be zeroes (96.2000%).

Of course, it could be just a very unlikely (< 1/1000) coincidence that the number of "yes" votes just happened to fit exactly into three digits of precision. But a more reasonable explanation is that someone started with the percentage that they wanted to get, and then computed the requisite number of bulletins from that.

Said explanation becomes even more reasonable when you take the reported numbers for turnout, and realize that the same relation holds there. Specifically:

1,807,739 eligible voters
1,359,419 voted (75.2%)

1,807,739 × 75.2% = 1,359,419.73

Oh look, another perfect percentage (75.2000%).
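The arithmetic is easy to reproduce. Here is a tiny sketch of the parent's check, using only the figures quoted above (the 5e-5 tolerance simply encodes "agrees with the one-decimal reported figure to four decimal places"):

  # Reproduce the parent comment's check: does the reported count equal the
  # reported one-decimal percentage applied to the total almost exactly,
  # i.e. is the percentage "too round" (X.X000%) to be organic?
  def too_round(total: int, count: int, reported_pct: float) -> bool:
      exact_pct = 100 * count / total
      # agreement to four decimal places with the one-decimal reported figure
      return abs(exact_pct - reported_pct) < 5e-5

  print(too_round(1_349_360, 1_298_084, 96.2))   # True: 96.19998% rounds to 96.2000%
  print(too_round(1_807_739, 1_359_419, 75.2))   # True: 75.19996% rounds to 75.2000%

An organically produced count would typically leave those extra digits non-zero; a count back-computed from a round percentage lands within about 1/total of it, far inside the tolerance.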


On reading just the abstract, there does not appear to be any control in this experiment. I'd have expected them to mention inclusion of election results known with certainty to not be fraudulent.

Likewise, I'd expect there to be proof of fraud by other reliable means in order to validate this method. It is not enough for them to just assert that there can be no other explanation for this data, so these were fraudulent results, so our method must be working.

Absent a control, the strong conclusion that fraud can be detected this way seems unsupported.


They include Poland, Germany, and Spain, see page 10, Fig. 3 and the discussion next to it. They also say the elections prior to 2004 have no such anomaly.


I would suggest looking at the paper in more detail before saying it's uncontrolled. I found the control just by skimming the figures.


I would do so, but the site prevents access to non-subscribers.

Would you kindly tell me the method they used to independently prove fraud in some of the analysed data?


I had no trouble accessing the paper using the PDF link on the page (https://arxiv.org/pdf/1410.6059v4.pdf)


Thanks for the link to the PDF. The "This URL" link takes you to a page that asks for a login.

Having now read the article, I stand by my original point. There seems to be no verification that the proposed method does indeed identify (at least in some cases) elections that were independently known, as a matter of fact, to have been fraudulent.

They seem to be claiming two things, but you cannot have it both ways:

a) This method can identify fraudulent elections and we proved it by analysing elections in Russia

b) These elections in Russia that we analysed using this method were fraudulent - as proved by the method.

I would expect to see a control that takes data from elections that were already known to be fraudulent. e.g. by confession, video evidence or some other reliable means to show that the effect was observable in some of those demonstrably fraudulent elections.


Actually there is a TON of evidence that e.g. the 2011 Russian elections were heavily falsified. There are dozens (probably hundreds) of reports from independent observers, plenty of videos capturing ballot stuffing, multiple cases where election papers are known to have been forged after the ballot count was finished, etc. I don't know what counts as "reliable means" for you, though; all that evidence has been outright rejected by every single Russian official or court.


Yes. And what bothers me about this study is that it seems to rely on our implicit understanding of this circumstance, rather than using a data set where they can explicitly compare fraudulent and legitimate results.

Surely there would have been a better set of test data than one for which politicians are still actively serving.

Unfortunately the science is somewhat tainted by the politics here since it is very unlikely they'll get independent confirmation of fraud for the test set.


> These elections in Russia that we analysed using this method were fraudulent - as proved by the method.

Not "as proved by the method", but "as proved by the science of statistics". So let me fix it for you:

a) This method can identify fraudulent elections, because it's scientifically correct.

b) These elections in Russia that we analysed using this method were fraudulent - as proved by the science of statistics.

FYI, the paper was published in "Annals of Applied Statistics", which is a peer-reviewed journal.


I acknowledge that I am no expert, so thanks for your time to try to set me straight. If I may ask, how does statistics let them be so confident of fraud when there is no other hard evidence of fraud in their data? In other words, how can they be sure there's no other explanation, when they apparently have not yet demonstrated the effect on other known fraudulent elections?

I have asked this question in different ways, but to my (admittedly limited) understanding, the question has not been answered by you or anyone else here.



