Integer percentages as fingerprints of electoral falsification (arxiv.org)
225 points by merraksh on July 3, 2016 | 70 comments



Absolutely fascinating research!

I wonder if someone has some free time to run the models on several recent elections that were filled with fraud. I'm interested in: the Austrian presidential election [0], the Turkish elections [1], and the Democratic Party primaries [2][3]. Each of them was analyzed for voting fraud, and fraud was found in each of them (some, like the Turkish elections, were full of massive vote fraud).

PS: What's with all the downvotes? Is the evidence of massive voter fraud in the West so unsettling that you have to downvote me?

[0] >Austria presidential poll result overturned

http://www.bbc.com/news/world-europe-36681475

[1] >Turkey Elections Massive Vote Fraud

https://erikmeyersson.com/2015/11/04/digit-tests-and-the-pec...

[2] >Hillary Clinton Favored By Election Fraud In Democratic Primaries

http://www.inquisitr.com/3127046/hillary-clinton-favored-by-...

[3] >Election Fraud Watch 2016 (J: awesome blog that tracks election fraud in primaries)

https://electionfraud2016.wordpress.com/


Austria is actually not suspecting any fraud. About 100,000 postal ballots were counted without fully following the protocol demanded by law, and the election was decided by a difference of about 30,000 votes. They have no evidence or suspicion of actual fraud, but the legitimacy of the result can still be questioned, so they decided to re-run the election.


The Sanders movement, even though I disagreed with his platform, was true grassroots democracy: Common people getting organized and trying to make a change.

I don't care if I get downvoted into oblivion for saying this next part: when you look at the spread between the types of polls, there is a massive difference between the voting elite and the people's will. I don't advocate against the electoral college, but what was supposed to be a system that prevented buyouts has proven to be completely for sale by the Clintons. I didn't see massive pro-Clinton rallies. We had quicker investigations into Samsung vs. Apple for inconsequential things. No one has been held accountable, and no explanations have been given for massive breaches of protocol other than that it's a plot by the Repuglicans.

I'm tired of everything from the Clinton family. My state, KS, has a good chance of throwing electoral votes to Gary Johnson. Please, California, if you are listening, don't reward this family for their actions.


I think a major mistake many Sanders supporters are making is not distinguishing between Clinton being an awful candidate (which she is) and Clinton still winning (which she is). There may be a lot of explanations - from Sanders being an even worse candidate (a socialist as a US president? Come on!) to Sanders failing tactically and strategically. One explanation that does not follow from any of Clinton's deficiencies, however, is fraud. It also does not follow from polls - polls are getting increasingly worse at predicting elections, especially close ones. That has happened many times recently, and pollsters being wrong means pretty much nothing now, except that maybe we need better pollsters.


I think polls are increasingly proving fallible. I'm not sure people are willing to admit, even pseudonymously, what they really think and how they vote in the sanctity of the voting booth, for fear that they will be ostracized for not doing what they're supposed to do.


The main reason is probably the abandonment of landlines and phone books, and the adoption of caller id before it. It used to be easy to survey a representative sample of the population, and now it's impossible.


Possibly for polls during the campaigns, but exit polls, where interviewers survey random samples of people leaving the place where they cast their votes, have historically been very close to the actual vote results. In some areas of the US, the exit polls are spot-on. In other places, the numbers are off by a significant margin.

Some people try to explain it away by saying people vote for one candidate but are ashamed to admit it to a person asking them who they voted for, but wouldn't that tendency show up statistically across all precincts rather than in specific ones?

There is also an alarming number of times this occurs specifically in precincts with electronic voting machines.


It's not exactly "ashamed" I think, more like cautious. I mean, I've read multiple reports of people being beaten bloody at a candidate's rally, so people would just prefer to stay out of trouble - even if they are proud of their choice, why would they risk talking about it publicly to a random stranger? I mean, the pollster probably won't beat them up, but what if some crazy person overhears it, follows them, and makes trouble? Nobody needs that. Sad, but such intolerance is part of the culture in the US; many people just don't get "I disagree with you, but I respect your right to have your own opinion".


  ashamed to admit it
Eh, no need for shame, it could just be that one candidate appeals more to busy (or privacy-minded) voters who decline to be polled.

I generally avoid people with clipboards who are approaching people on the street, and I can believe that's something that correlates with opinions on other things.


If you really think that Clinton didn't get the votes she got fairly, you're living in a bubble. Sanders supporters have this problem of forgetting that women and minorities exist.

(I can't stand the Clintons, but I don't think they cheated)


I am an outsider (European) looking in but closely following the US presidential primaries this year, so I don't have a horse in the race. There are a number of levels at which there has been clear unfairness or cheating (depending on your perspective) in the DNC primaries. Here are some of the simpler ones:

- The recent release of hacked documents from the DNC shows that the primaries were not set up to be a fair competition from the start.
- The rules are specifically set up in a number of states (New York is a prime example) to disenfranchise new Democratic Party members from voting.
- Thousands of registered voters were removed from the voter rolls. There is even a case where a lifelong Democrat running for a congressional seat on a Democratic ticket was removed.
- Media coverage made it appear that Clinton was in the lead before any voting even took place. That coverage has been shown to have been orchestrated (at least at the start) by the DNC.
- Funds raised in the DNC's name were disproportionately funnelled to the Clinton campaign.
- Voting stations were closed down in huge numbers in some states and territories (Puerto Rico and California being prime examples).


NY was pretty ridiculous.

It's going to look pretty ridiculous if the DNC is ever audited, or has someone more credible than Guccifer2.0 hack and release their documents.


If you look at the demographics of the people who voted for Sanders, you'll find they're representative of the overall population.

E.g., 50% of Sanders' voters were women. You'll get similarly proportional percentages of minorities.

Simpson's paradox creates the perception that Sanders' voters are mostly white males. It's untrue.


Rallies aren't a very good indicator of popular support. Having 4 million or so more votes is.


See also: Howard Dean, 2004


The electoral college plays no role in the primary nominating process.


I'm with you buddy



Refuted? Which instance? There were so many of them. Go check out this series of posts to see corruption in action:

https://electionfraud2016.wordpress.com/


As the NYTimes article points out, a discrepancy between exit polls and results (the main claim of the article you linked to) is not evidence of fraud, but rather of the quality of the exit polling, which is subject to a number of biases. If those biases are not correctly eliminated, the results will differ. Also, with due respect, that blog looks rather crackpotty.

As if anyone should be surprised that a leftie social democrat wouldn't do well in a US election.


> Is the evidence of massive voter fraud in the West so unsettling that you have to downvote me?

Not downvoting you. But if fraud turns out to be the case, I'm going to be unsettled as hell. It should be unsettling if you live in the West and/or care about democracy there, obviously. I doubt we disagree about this, though :)

But even if no fraud is suspected, or if the analysis finds nothing, this is data science. Unlike, say, medicine or food science, there's no shortage of data (in theory). So if there's a country whose democracy we can reasonably be sure is entirely above board[0], throw it into the test and use that to calibrate the others.

One problem I believe is that there is no such thing as "the test" :-)

This research was based on a particular system of elections--if you look beyond your own democratic system (or the way you learned in school it's supposed to work), you'll see how insanely different they all are. The fact that we actually use a single word, "democracy", to describe them all is pretty crazy, now that I think about it.

Both the numbers themselves (what aggregate counts are available) and the specific kind of fraud suspected (which you kind of need for a statistical hypothesis) depend heavily on the democratic system involved. For instance, that messy thing with the US elections a while back, Bush's first term IIRC: whether there was fraud or not, it was fought over a couple of hundred votes in one district or county, was it not?

To disappear or push a few hundred votes one way or another requires a different type of fraud, and therefore a different statistical test, than, say, the type of fraud that makes a whole country of several hundred million almost always divide its votes between two options so close to 50/50 that a few hundred votes can swing the outcome in the first place.

And that's just one country. There are so many different creative ways to mess up whatever is locally deemed to constitute democracy that you probably need at least an equal number of different creative ways to statistically test whether elections are fraudulent or not.

[0] come to think of it, maybe there is a shortage of data ...


You are off the mark with Austria I'm afraid.


(another author here) I doubt you will find this specific anomaly in the US or even Turkey (at least I see nothing in my graphs for the 2014 Turkish presidential election). I'd say this type of falsification is reminiscent of the purely fictional Soviet elections, with their 99% turnout and 99% for the only official candidate. A similar pattern is seen, e.g., in the 2013 Azerbaijan presidential election (dozens of precincts reporting identical, albeit in this case non-round, percentages for the incumbent).


I thought this would just be an application of Benford's law, but they note in the beginning that using Benford's law doesn't work for elections (citing this paper: http://www.vote.caltech.edu/sites/default/files/benford_pdf_...)

Does anyone know why Benford's law doesn't work here but does work for other made up numbers in applications like accounting?


Benford's law applies when the underlying distribution is approximately exponential. Because candidates with an exponentially small chance to win are not likely to run, it wouldn't make much sense for elections to have an exponential distribution. Much more plausible distributions are (truncated) normal or uniform, neither of which satisfies Benford's law.
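To see the difference concretely, here is a quick sketch of my own (not from the paper): draw one sample spanning several orders of magnitude and one sample of bounded percentages, and compare their leading-digit frequencies against Benford's prediction.

  # Rough illustration (not from the paper): Benford's law fits data that
  # spans several orders of magnitude, but not percentages bounded in [0, 100].
  import numpy as np

  def digit_freqs(x):
      """Frequencies of the first significant digit (1..9) of positive values."""
      x = np.asarray(x, dtype=float)
      x = x[x > 0]
      d = np.floor(x / 10 ** np.floor(np.log10(x))).astype(int)
      return np.array([(d == k).mean() for k in range(1, 10)])

  rng = np.random.default_rng(0)
  benford = np.log10(1 + 1 / np.arange(1, 10))   # theoretical Benford frequencies
  wide = 10 ** rng.uniform(0, 6, 100_000)        # spans six orders of magnitude
  shares = rng.uniform(0, 100, 100_000)          # bounded, like vote percentages

  print("Benford    :", np.round(benford, 3))
  print("wide data  :", np.round(digit_freqs(wide), 3))    # close to Benford
  print("percentages:", np.round(digit_freqs(shares), 3))  # roughly flat, not Benford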


I like to visualize it as throwing random darts at graph-paper that has logarithmic marks on both axes.


For Benford's law to apply, the numbers (the non-fabricated numbers) need to be coming from a source/distribution covering several orders of magnitude. The fabricated voting percentages are limited to between 0 and 100.


Raw vote totals then?


We ran a test for one of the problem sets in a course I was TAing. Raw vote counts did follow Benford's law in 2000 US presidential elections. We did not portray it as a tool to detect or disprove fraud, however.


Perhaps the sizes (the number of eligible voters, and also the number of people who actually voted) of the voting districts follow the law? Then you could take the total vote counts, distribute them back to the candidates in a lot of artificial ways (e.g. always 50-50, or a random percentage between 30 and 70, say), and probably obtain data that still follows the law.
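That seems plausible. Here is a rough simulation of the idea (my own sketch, with invented district sizes, not real election data): if the district totals themselves follow Benford's law, splitting them by an arbitrary random share still produces counts with roughly Benford-distributed first digits.

  # Hypothetical simulation of the parent's suggestion: Benford-distributed
  # district totals, split artificially between two candidates, still yield
  # roughly Benford-distributed candidate counts.
  import numpy as np

  def digit_freqs(x):
      """Frequencies of the first significant digit (1..9) of positive values."""
      x = np.asarray(x, dtype=float)
      x = x[x > 0]
      d = np.floor(x / 10 ** np.floor(np.log10(x))).astype(int)
      return np.array([(d == k).mean() for k in range(1, 10)])

  rng = np.random.default_rng(1)
  totals = np.round(10 ** rng.uniform(2, 5, 50_000))   # district sizes, ~100 to ~100,000
  shares = rng.uniform(0.3, 0.7, totals.size)          # arbitrary split, not real politics
  candidate_a = np.round(totals * shares)

  benford = np.log10(1 + 1 / np.arange(1, 10))
  print("Benford     :", np.round(benford, 3))
  print("totals      :", np.round(digit_freqs(totals), 3))
  print("candidate A :", np.round(digit_freqs(candidate_a), 3))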


Benford's law best applies to data that spans multiple orders of magnitude.


Isn't it that Benford's law applies when parts of the data are multiplied by other parts (as with most things in the real world)? E.g., two uniform distributions multiplied together give a non-uniform distribution?


I have seen Benford's law applied to voter turnout, where the distribution could be expected to be closer to exponential (but still not truly, due to districting practices).


It's kinda sad to see our elections being used as a reference for falsification-detection research...

This isn't exactly new; such analysis was done as early as 2011, e.g. http://lleo.me/dnevnik/2011/12/07_gauss.html

Although they were more focused on the peculiar distribution than on the integer spikes.


The figure that is reproduced and discussed in that lleo.me blog post was made by one of the authors of the paper that is being discussed here. It just took us from the end of 2011 until early 2016 to actually get the whole thing published.


This is fascinating.

I'm tempted to argue the improbably-round numbers might be due to lazily counting/sampling the ballots rather than actual malicious fraud... but I guess sloppily running the election still constitutes fraud in some sense.


The paper did mention (IIRC) that the ballot counts themselves didn't appear to have any tendency to align with "round" numbers (1250, 1300, etc) which they likely would have if someone was lazily counting the ballots. It was only the calculated percentages (#votes for winner over #votes overall) that had the trend.


(One of the authors here.) Actually the ballot counts do have some tendency to align with round numbers. We did not discuss this in the paper because we felt that this "anomaly" can easily be contested, precisely as @jcalvinowens argued up the thread (it is written in our discussion section too). So what we say instead is that the round percentages anomaly persists even if we exclude all ballot stations with round ballot counts, i.e. it cannot be explained as a by-product of round counts.

There are other problems associated with round ballot counts. Here is a funny one: a sizable number of polling stations reported a round number of counted ballots, coinciding exactly with the total number of ballots received by the polling station prior to election day (this information is available too). As if so many people came to vote that every single ballot was used up, with turnout close to 100%. There is little doubt that it was all made up out of thin air.
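For anyone who wants to try this on another data set, a minimal version of the integer-percentage check might look like the sketch below. The input file and column names are hypothetical, and this is not the authors' code; the paper builds a much more careful Monte Carlo null model, but the core comparison is the same: how often does the leader's percentage land exactly on an integer, versus how often binomial noise alone would put it there?

  # Minimal sketch of an integer-percentage check over per-station results.
  # "stations.csv" and the column names are made up for illustration.
  import numpy as np
  import pandas as pd

  def integer_share(leader, total):
      """Fraction of stations whose leader's percentage is exactly an integer."""
      leader, total = np.asarray(leader), np.asarray(total)
      ok = total > 0
      return ((100 * leader[ok]) % total[ok] == 0).mean()

  def baseline_integer_share(leader, total, n_sims=200, seed=0):
      """Monte Carlo baseline: redraw each station's leader count binomially
      around its observed share and see how often integer percentages occur
      by chance alone."""
      rng = np.random.default_rng(seed)
      leader, total = np.asarray(leader), np.asarray(total)
      ok = total > 0
      p = leader[ok] / total[ok]
      sims = rng.binomial(total[ok], p, size=(n_sims, ok.sum()))
      return ((100 * sims) % total[ok] == 0).mean()

  df = pd.read_csv("stations.csv")   # hypothetical per-polling-station file
  obs = integer_share(df["leader_votes"], df["total_votes"])
  exp = baseline_integer_share(df["leader_votes"], df["total_votes"])
  print(f"observed integer share: {obs:.4f}  vs  binomial baseline: {exp:.4f}")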


Or sloppy/lazy counting because they already know the predetermined outcome.


Improbable. Sharing a pre-determined result with Joe Random Votecounters * number of voting stations = near certainty of someone leaking that, and a field day for the media.

So, while theoretically possible, realistically it is summed up by "Three may keep a secret, if two of them are dead."

(And specifically here, if I were a conspirator going for a predetermined result, I would want a somewhat close result, yes - but not an almost-tie as seen here. Something like 55:45 or somesuch, not a "every single vote matters" scenario, where a few thousand votes could swing the outcome)


I think it would be more of an "open secret" pattern: e.g. no one explicitly orders people to make sure the ruling party always wins, but everyone knows someone who knows someone who was punished for reporting the actual count instead of a rigged count.


What you are describing is not an "open secret," that's a "conspiracy theory."


This is likely true and does not indicate fraud. The majority of precincts overwhelmingly favor one candidate, as do many states.


Elections are not decided precinct by precinct, however. E.g., in a presidential election, many states are won or lost as a bloc. If one candidate gets only ten votes in a given precinct, so those ballots are simply thrown out, and she then loses by five votes over the whole state, that certainly is fraud.


Is there a field that studies inconsistencies of this type?

Recently on HN there was a related test (the Grim test) https://news.ycombinator.com/item?id=11787560
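For reference, the GRIM check itself fits in a few lines. The sketch below is my own reading of the published idea, not any particular library: given a mean reported to a fixed number of decimal places and a sample size, test whether any integer-valued total could actually round to that mean.

  # Sketch of the GRIM test: can a mean reported to `decimals` places arise
  # from integer data with sample size n?
  def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
      nearest_sum = round(reported_mean * n)         # closest achievable integer total
      achievable = round(nearest_sum / n, decimals)  # how that total would be reported
      return abs(achievable - reported_mean) < 10 ** -(decimals + 1)

  # A mean of 5.19 from n = 28 integer responses is impossible: no integer
  # total divided by 28 rounds to 5.19 (145/28 = 5.18, 146/28 = 5.21).
  print(grim_consistent(5.19, 28))   # False
  print(grim_consistent(5.18, 28))   # True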


I think the field you're looking for is statistics?


More specifically anomaly detection.

https://en.wikipedia.org/wiki/Anomaly_detection


Thank you!


It would be very interesting to see this method applied to other elections to see what unexpected results can be found. Although still interesting, it is not a huge surprise that there has been large-scale vote fraud in Russia.


>it is not a huge surprise that there has been large vote fraud in Russia

Russian elections have become for statistics what "Lena" is for image processing. The typical "double-humped camel" Gaussian of Russian elections - polling stations without observers - is shown in Fig. 2 of http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3545790/pdf/pnas... ("United Russia" is Putin's party).


Nice work. It's interesting that they used Russia as the source of the election results data. Did I miss an explanation of their data selection criteria in the paper?


> We used election data from three countries besides Russia: 2011 general election in Spain, 2010 presidential election in Poland (1st round), and 2009 federal election in Germany (Zweitstimmen, i.e. party votes). These three elections were chosen because the data are publicly available down to the single polling station level, and because the number and size of polling stations are comparable to those in Russia

I guess this means they found the data first, then decided to try to analyze it and compare it with similar data elsewhere. It's probably not their work-related research (http://www.fchampalimaud.org/en/the-foundation/mission/).


Well, I can give you the explanation: all of the authors are Russian.


Should try it on the USA election results...


Why Electronic Voting is a BAD Idea - Computerphile

https://www.youtube.com/watch?v=w3_0x6oaDmI


This thread just reminded me of this quote:

"If you tell the truth, you don't have to remember anything." - Mark Twain


This is brilliant. I wonder how the authors got the initial motivation to do the study -- seems like quite a bit of work to find a very specific statistical anomaly; doubt they went into it blindly not really knowing there was something to find...


In 2014, a lot of attention was drawn in Russia and Ukraine to the votes in Crimea and Donbass (the "independence referendums", and the elections of local leaders). The latter two votes, in particular, had nice exact percentages, which drew the attention of people familiar with statistics almost immediately and were heavily discussed on LJ.


The study that is being discussed here was essentially done at the end of 2011 and in early 2012, mostly triggered by the results of the 2011 legislative election (heavily discussed on LJ as well; in fact, this study was itself mostly developed on LJ).

The election results in Crimea and Donbass do contain some funny patterns and are most likely largely made up, but no information is available on the polling station level so there is little to analyze statistically.


Good to know, thanks! Regarding this:

>> no information is available on the polling station level so there is little to analyze statistically.

What they did was look at the percentages and the exact vote counts for each choice, and notice that they match down to one vote (i.e. if you take the percentage, multiply it by the relevant total, and round, you get the vote count exactly). For example, in the referendum in Lugansk, the official stats are as follows:

1,349,360 valid bulletins
1,298,084 (96.2%) voted yes

Normally, the percentage is a number with many digits after the decimal point, which is then rounded for presentation purposes. But here, if you take 96.2% and multiply it by 1,349,360, you get 1,298,084.32. In other words, the reported 96.2% is actually accurate to 4 digits after the decimal point, three of which happen to be zeroes (96.2000%).

Of course, it could be just a very unlikely (< 1/1000) coincidence that the number of "yes" votes just happened to fit exactly into three digits of precision. But a more reasonable explanation is that someone started with the percentage that they wanted to get, and then computed the requisite number of bulletins from that.

Said explanation becomes even more reasonable when you take the reported numbers for turnout, and realize that the same relation holds there. Specifically:

1,807,739 eligible voters
1,359,419 voted (75.2%)

1,807,739 × 75.2% = 1,359,419.73

Oh look, another perfect percentage (75.2000%).
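The arithmetic is easy to reproduce. Here is a tiny sketch of the parent's check, using only the figures quoted above (the 5e-5 tolerance simply encodes "agrees with the one-decimal reported figure to four decimal places"):

  # Reproduce the parent comment's check: does the reported count equal the
  # reported one-decimal percentage applied to the total almost exactly,
  # i.e. is the percentage "too round" (X.X000%) to be organic?
  def too_round(total: int, count: int, reported_pct: float) -> bool:
      exact_pct = 100 * count / total
      # agreement to four decimal places with the one-decimal reported figure
      return abs(exact_pct - reported_pct) < 5e-5

  print(too_round(1_349_360, 1_298_084, 96.2))   # True: 96.19998% rounds to 96.2000%
  print(too_round(1_807_739, 1_359_419, 75.2))   # True: 75.19996% rounds to 75.2000%

An organically produced count would typically leave those extra digits non-zero; a count back-computed from a round percentage lands within about 1/total of it, far inside the tolerance.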


On reading just the abstract, there does not appear to be any control in this experiment. I'd have expected them to mention inclusion of election results known with certainty to not be fraudulent.

Likewise, I'd expect there to be proof of fraud by other reliable means in order to validate this method. It is not enough for them to just assert that there can be no other explanation for this data, so these were fraudulent results, so our method must be working.

Absent a control, the strong conclusion that fraud can be detected this way seems unsupported.


They include Poland, Germany, and Spain, see page 10, Fig. 3 and the discussion next to it. They also say the elections prior to 2004 have no such anomaly.


I would suggest looking at the paper in more detail before saying it's uncontrolled. I found the control just by skimming the figures.


I would do so, but the site prevents access to non-subscribers.

Would you kindly tell me the method they used to independently prove fraud in some of the analysed data?


I had no trouble accessing the paper using the PDF link on the page (https://arxiv.org/pdf/1410.6059v4.pdf)


Thanks for the link to the PDF. The "This URL" link takes you to a page that asks for a login.

Having now read the article, I stand by my original point. There seems to be no verification that the proposed method does indeed identify (at least in some cases) elections that were independently known, as a matter of fact, to have been fraudulent.

They seem to be claiming two things, but you cannot have it both ways:

a) This method can identify fraudulent elections and we proved it by analysing elections in Russia

b) These elections in Russia that we analysed using this method were fraudulent - as proved by the method.

I would expect to see a control that takes data from elections that were already known to be fraudulent. e.g. by confession, video evidence or some other reliable means to show that the effect was observable in some of those demonstrably fraudulent elections.


Actually there is a TON of evidence that e.g. the 2011 Russian elections were heavily falsified. There are dozens (probably hundreds) of reports from independent observers, plenty of videos capturing ballot stuffing, multiple cases where election papers are known to have been forged after the ballot count was finished, etc. I don't know what counts as "reliable means" for you, though; all that evidence has been outright rejected by every single Russian official or court.


Yes. And what bothers me about this study is that it seems to rely on our implicit understanding of this circumstance, rather than using a data set where they can explicitly compare fraudulent and legitimate results.

Surely there would have been a better set of test data than one for which politicians are still actively serving.

Unfortunately the science is somewhat tainted by the politics here since it is very unlikely they'll get independent confirmation of fraud for the test set.


> These elections in Russia that we analysed using this method were fraudulent - as proved by the method.

Not "as proved by the method", but "as proved by the science of statistics". So let me fix it for you:

a) This method can identify fraudulent elections, because it's scientifically correct.

b) These elections in Russia that we analysed using this method were fraudulent - as proved by the science of statistics.

FYI, the paper was published in "Annals of Applied Statistics", which is a peer-reviewed journal.


I acknowledge that I am no expert, so thanks for your time to try to set me straight. If I may ask, how does statistics let them be so confident of fraud when there is no other hard evidence of fraud in their data? In other words, how can they be sure there's no other explanation, when they apparently have not yet demonstrated the effect on other known fraudulent elections?

I have asked this question in different ways, but to my (admittedly limited) understanding, the question has not been answered by you or anyone else here.



