Statistical Analysis of Election Manipulation in Republican Primaries (themoneyparty.org)
27 points by ca136 1847 days ago | 9 comments



The take-home message:

> "...highly anomalous election results indicate a widespread, systematic exchange of votes favoring one candidate"

> "Mitt Romney, based on our analysis, should have (statistically) gotten third rank in Iowa’s election (as opposed to second); second rank in New Hampshire (as opposed to the first rank), and so on, resulting most likely to a brokered convention at the Republican National Convention in Tampa, FL."

Statistics is awesome, although I suspect that the level of public/media knowledge of statistics will mean this is brushed aside.


A recent discussion on /r/statistics: http://www.reddit.com/r/statistics/comments/11ydmt/20082012_...

And this: http://www.reddit.com/r/statistics/comments/123pt7/would_rst...

And this, from 8 months ago: http://www.reddit.com/r/AskReddit/comments/qb9ea/reddit_can_...

The result seems to be that the more one knows about statistics, the less convincing the case for election manipulation is. All the analysis actually seems to show is that Romney did better in precincts with a large number of votes cast, which is pretty much what we'd expect, as those are more likely to be precincts in denser areas.

By sorting the precincts by number of votes cast, and then plotting percentage split of the cumulative votes across that sorted list, they are pretty much forcing the general shape of the curves that they get. The smaller rural precincts are (1) more likely to go for other candidates, and (2) are going to show much more variation, thus making it almost certain the split on the left side of the graphs will be far from the split on the right side of the graph.
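To make the construction concrete, here is a minimal sketch of the sort-then-accumulate step described above. The five precincts are made up for illustration (they are not real data), and `cumulative_share` is a hypothetical helper name. Candidate A's support simply rises with precinct size, with no tampering of any kind, and the cumulative curve climbs anyway.

```python
from itertools import accumulate

# Hypothetical precincts: (votes cast, votes for candidate A). Not real data.
precincts = [(400, 200), (10, 2), (150, 60), (40, 10), (900, 495)]

def cumulative_share(precincts):
    """Sort by precinct size, then return A's share of the running vote total."""
    ordered = sorted(precincts, key=lambda p: p[0])
    cast = list(accumulate(p[0] for p in ordered))    # running votes cast
    for_a = list(accumulate(p[1] for p in ordered))   # running votes for A
    return [a / c for a, c in zip(for_a, cast)]

shares = cumulative_share(precincts)
print([round(s, 3) for s in shares])
```

The curve starts at the smallest precinct's split and drifts toward the split of the large precincts, because those dominate the running total once they enter the sum.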

People also seem to be making much of the fact that when Romney's curve moves up on the graph, one or more of the other curves move down by an amount that EXACTLY BALANCES Romney's gain, and so it must have been vote flipping. No, it is because they are showing the percentage split, which BY DEFINITION must add to 100, and so any change in one curve must be balanced exactly by the net changes in the other curves.
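The balancing effect can be checked with a few lines of arithmetic. This is only an illustrative sketch with invented counts: every candidate gains votes between the two snapshots, nobody loses a single vote, and yet three of the four percentage curves go down by exactly what the first one goes up.

```python
def shares(counts):
    """Convert raw vote counts into fractions of the total."""
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical cumulative counts before and after adding one larger precinct.
# Every candidate's count increases; candidate 0 just gains the most.
before = shares([300, 250, 150, 100])
after = shares([420, 270, 160, 110])

changes = [a - b for a, b in zip(after, before)]
print([round(c, 4) for c in changes])
print(abs(sum(changes)) < 1e-12)  # the changes cancel, up to float rounding
```

Since the shares sum to 1 both before and after, their differences have to sum to 0, whatever the underlying counts did.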

Another thing worth noting is that exit polls agreed well with the reported results, as did news organization forecasts based on early returns, as did projections based on pre-vote polling. If there were significant fraud, it would throw those off (unless the fraudsters managed to ALSO rig the polls and projections...).

EDIT: here's an example. Take a set of precincts where the number of votes cast matches those of the 2012 Arizona GOP primary, with large meaning a precinct had 150 or more votes, small meaning it had fewer than 50 votes, and medium being anything else. Suppose the distribution of votes is 44.6% Romney, 31.4% Santorum, 16.0% Gingrich, and 8.0% Paul in large precincts; 29.6%, 36.4%, 21.0%, and 13.0% in medium precincts; and 14.6%, 38.4%, 27.5%, and 19.5% in small precincts. Here's what the curve of cumulative vote percentage, sorted by precinct size, looks like for a simulated election: http://imgur.com/s4SrE

My numbers there aren't meant to reflect actual Arizona numbers. They are just meant to illustrate the kind of curve we expect if there is a distribution difference that correlates with precinct size.

Romney is blue, Santorum is yellow, Gingrich is green, and Paul is red. Note the similarity to the actual Arizona curve. I expect that if someone could dig up polling data from each precinct from before the vote, and used that to control the vote distribution for each precinct, the match would be very close.

If anyone wants to play around with this, here's some quick and dirty Python code:

    #!/usr/bin/python
    import random
    import matplotlib.pyplot as plt

    precinct_size = [1, 1, 1, 3, 5, 7, 8, 9, 10, 12, 18, 19, 21, 26, 28, 29, 30, 30, 34, 41,
    46, 50, 56, 57, 57, 58, 58, 61, 68, 69, 72, 78, 79, 85, 88, 94, 99, 100,
    103, 109, 109, 120, 126, 126, 129, 132, 133, 133, 136, 139, 139, 141, 147, 152,
    157, 158, 162, 162, 165, 166, 169, 172, 173, 175, 177, 177, 179, 181, 192,
    201, 224, 231, 235, 238, 246, 249, 249, 251, 258, 264, 268, 270, 272, 276,
    277, 281, 281, 293, 315, 322, 327, 333, 334, 337, 346, 348, 349, 350, 363, 366,
    367, 369, 370, 374, 374, 384, 385, 386, 387, 394, 397, 405, 407, 412, 413, 419, 420,
    421, 425, 429, 434, 438, 439, 449, 467, 474, 483, 483, 496, 502, 505, 507, 508, 513,
    518, 525, 526, 538, 545, 548, 555, 560, 571, 577, 581, 583, 584, 606, 606, 612, 620,
    620, 625, 635, 641, 646, 647, 650, 652, 662, 663, 666, 669, 673, 674, 710, 711, 721, 728,
    737, 744, 745, 747, 747, 780, 786, 787, 790, 791, 818, 824, 833, 834, 878, 899,
    903, 927, 928, 949, 961, 1002, 1031, 1059, 1070, 1133, 1587]

    vote_dist  = [.446, .314, .160, .080]    # romney, santorum, gingrich, paul
    vote_dist2 = [.296, .364, .210, .130]
    vote_dist3 = [.146, .384, .275, .195]

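    def vote(d):
        """Pick a candidate index at random, weighted by the probabilities in d."""
        r = random.random()
        s = 0
        for v in range(len(d)):
            s += d[v]
            if s >= r:
                return v
        # Guard against float rounding leaving the running sum just below r.
        return len(d) - 1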


    def trial():
        total = [0, 0, 0, 0]    # running counts: romney, santorum, gingrich, paul
        x = []
        romney = []
        santorum = []
        gingrich = []
        paul = []
        i = 1
        vt = 0                  # running total of all votes cast
        # precinct_size is already sorted from smallest to largest
        for s in precinct_size:
            # pick the distribution once per precinct, based on its size
            if s < 50:
                d = vote_dist3
            elif s < 150:
                d = vote_dist2
            else:
                d = vote_dist
            for voter in range(s):
                total[vote(d)] += 1
                vt += 1
            # record each candidate's share of the cumulative vote so far
            x.append(float(i))
            i += 1
            romney.append(float(total[0])/vt)
            santorum.append(float(total[1])/vt)
            gingrich.append(float(total[2])/vt)
            paul.append(float(total[3])/vt)

        plt.plot(x, paul, 'ro')
        plt.plot(x, gingrich, 'go')
        plt.plot(x, santorum, 'yo')
        plt.plot(x, romney, 'bo')
        plt.show()

    trial()

EDIT 2: I'm not actually sure that the precinct sizes match Arizona. I got the numbers from a spreadsheet attached to one of the news stories about this. However, the total number of votes is only about 75k, which is much less than the correct number for Arizona.


To expand on this a bit, Romney is likely to win a larger share of urban Republicans than of suburban Republicans. This is similar to how, in the general election, Obama is expected to do better with urban voters than with suburban voters. I suspect this "analysis" could just as easily be done to show that Obama "tampered" with votes after the general.

You gave a much better explanation, but in all seriousness folks, flag this story (I already have) and get it off the front page. It's nothing but pseudo-statistics from someone who doesn't have a clue, hoping to stir controversy over "evil republicans".


This isn't necessarily true. It's important not to conflate "small precinct" with "suburban/rural area" and "large precinct" with "urban area", since many rural precincts might have more voters than urban ones.

I wish someone would look at the underlying precinct data and redo this analysis while factoring in precinct population density.


> This isn't necessarily true. It's important not to conflate "small precinct" with "suburban/rural area" and "large precinct" with "urban area", since many rural precincts might have more voters than urban ones.

And in fact, the paper spends quite a bit of time addressing this.


The main problem with this paper seems to be 'publication bias.' By this I mean: suppose n statisticians look at a certain fair election, each using a novel method. Then we expect βn papers to come out of it, where β is the chance of falsely rejecting the hypothesis that the election was fair. Since each statistician uses a number of tests, which may or may not correlate with each other, it is all but impossible to determine a sufficient level of certainty without committing in advance to the statistical tests used.
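A rough sketch of that arithmetic, under the simplifying assumption that the n tests are independent (the comment above notes they may well be correlated, in which case only the expected count carries over unchanged). The function names and the 0.05 false-positive rate are illustrative choices, not taken from the paper.

```python
def expected_false_alarms(n, beta):
    """Expected number of tests that wrongly reject a fair election."""
    return n * beta

def prob_at_least_one(n, beta):
    """Chance that at least one of n tests wrongly rejects.

    Assumes the tests are independent, which is a simplification.
    """
    return 1 - (1 - beta) ** n

# beta = 0.05 is an illustrative false-positive rate, not a value from the paper.
for n in (1, 10, 50):
    print(n, expected_false_alarms(n, 0.05), round(prob_at_least_one(n, 0.05), 3))
```

Even with a modest per-test error rate, the chance that somebody finds an "anomaly" in a perfectly fair election grows quickly with the number of methods tried.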


Hacker News?


It relates to hacking elections.


And possibly in a much more traditional sense than that word's typical usage around HN!



