

Statistical Analysis of Election Manipulation in Republican Primaries - ca136
http://www.themoneyparty.org/main/wp-content/uploads/2012/10/Republican-Primary-Election-Results-Amazing-Statistical-Anomalies_V2.0.pdf

======
tzs
A recent discussion on /r/statistics:
[http://www.reddit.com/r/statistics/comments/11ydmt/20082012_...](http://www.reddit.com/r/statistics/comments/11ydmt/20082012_election_anomalies_results_analysis_and/)

And this:
[http://www.reddit.com/r/statistics/comments/123pt7/would_rst...](http://www.reddit.com/r/statistics/comments/123pt7/would_rstatistics_care_to_critique_the/)

And this, from 8 months ago:
[http://www.reddit.com/r/AskReddit/comments/qb9ea/reddit_can_...](http://www.reddit.com/r/AskReddit/comments/qb9ea/reddit_can_you_debunk_this_some_people_with/)

The result seems to be that the more one knows about statistics, the less
convincing the case is for election manipulation. All the analysis actually
seems to show is that Romney did better in precincts with a large number of
votes cast, which is pretty much what we'd expect, as those are more
likely to be precincts in denser areas.

By sorting the precincts by number of votes cast, and then plotting percentage
split of the cumulative votes across that sorted list, they are pretty much
forcing the general shape of the curves that they get. The smaller rural
precincts are (1) more likely to go for other candidates, and (2) are going to
show much more variation, thus making it almost certain the split on the left
side of the graphs will be far from the split on the right side of the graph.
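
As a rough illustration of point (2), here is a sketch that assumes every voter
independently backs a candidate with the same probability. The 40% support
level, the precinct sizes of 20 and 800, and the `observed_shares` helper are
all made up for the example; none of it comes from the paper or from real
returns. Even with identical underlying support, the observed share should
swing far more from one small precinct to the next than it does among large
ones.

```python
import random
import statistics


def observed_shares(precinct_size, support, n_precincts, rng):
    """Simulate n_precincts precincts of equal size and return the
    candidate's observed vote share in each one."""
    shares = []
    for _ in range(n_precincts):
        # Each voter independently picks the candidate with probability `support`.
        votes = sum(1 for voter in range(precinct_size) if rng.random() < support)
        shares.append(votes / precinct_size)
    return shares


rng = random.Random(0)  # fixed seed so the run is repeatable

# Same true support (40%) everywhere; only the precinct size differs.
small = observed_shares(20, 0.40, 500, rng)
large = observed_shares(800, 0.40, 500, rng)

spread_small = statistics.pstdev(small)
spread_large = statistics.pstdev(large)

print("std dev of share, 20-voter precincts: ", spread_small)
print("std dev of share, 800-voter precincts:", spread_large)
```

Under this independence assumption the spread shrinks roughly with the square
root of the precinct size, so the left side of a size-sorted cumulative plot
is noisy by construction.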

People also seem to be making much of the fact that when Romney's curve moves
up on the graph, one or more of the other curves move down by an amount that EXACTLY
BALANCES Romney's gain--and so it must have been vote flipping. No, it is
because they are showing the percentage split, which BY DEFINITION must add to
100, and so any change in one curve must be balanced exactly by the net
changes in the other curves.
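
A minimal sketch of that arithmetic, using invented cumulative vote counts
(the numbers and the `shares` helper are assumptions for illustration, not
data from any primary): whatever one candidate gains in percentage points,
the others lose in total.

```python
def shares(counts):
    """Convert raw vote counts into percentage shares of the total."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]


# Hypothetical cumulative counts: romney, santorum, gingrich, paul
before = [300, 350, 200, 150]
after = [520, 480, 290, 210]  # the same tally after more precincts are added

s_before = shares(before)
s_after = shares(after)

# Change in each candidate's percentage share between the two points.
changes = [a - b for a, b in zip(s_after, s_before)]

print("changes in share:", changes)
print("sum of changes:  ", sum(changes))  # zero, up to floating point rounding
```

No votes are moved between candidates here; everyone's raw count only goes
up. The shares still trade off against each other because they are forced to
sum to 100.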

Another thing worth noting is that exit polls agreed well with the reported
results, as did news organization forecasts based on early returns, as did
projections based on pre-vote polling. If there were significant fraud, it
would throw those off (unless the fraudsters managed to ALSO rig the
polls and projections...).

EDIT: here's an example. Take a set of precincts where the number of votes
cast in each matches the 2012 Arizona GOP primary. Call a precinct large if it
had 150 or more votes, small if it had fewer than 50 votes, and medium
otherwise. Suppose the distribution of votes is 44.6% Romney, 31.4% Santorum,
16.0% Gingrich, and 8.0% Paul in large precincts; 29.6%, 36.4%, 21.0%, 13.0%
in medium precincts; and 14.6%, 38.4%, 27.5%, 19.5% in small precincts. Here's
what the curve of cumulative vote share, with precincts sorted by size, looks
like for a simulated election: <http://imgur.com/s4SrE>

My numbers there aren't meant to reflect actual Arizona numbers. They are just
meant to illustrate the kind of curve we expect if there is a distribution
difference that correlates with precinct size.

Romney is blue, Santorum is yellow, Gingrich is green, and Paul is red. Note
the similarity to the actual Arizona curve. I expect that if someone could dig
up polling data from each precinct from before the vote, and use that to
control the vote distribution for each precinct, the match would be very
close.

If anyone wants to play around with this, here's some quick and dirty Python
code:

    
    
        #!/usr/bin/python
        import random
        import matplotlib.pyplot as plt
    
        precinct_size = [1, 1, 1, 3, 5, 7, 8, 9, 10, 12, 18, 19, 21, 26, 28, 29, 30, 30, 34, 41,
        46, 50, 56, 57, 57, 58, 58, 61, 68, 69, 72, 78, 79, 85, 88, 94, 99, 100,
        103, 109, 109, 120, 126, 126, 129, 132, 133, 133, 136, 139, 139, 141, 147, 152,
        157, 158, 162, 162, 165, 166, 169, 172, 173, 175, 177, 177, 179, 181, 192,
        201, 224, 231, 235, 238, 246, 249, 249, 251, 258, 264, 268, 270, 272, 276,
        277, 281, 281, 293, 315, 322, 327, 333, 334, 337, 346, 348, 349, 350, 363, 366,
        367, 369, 370, 374, 374, 384, 385, 386, 387, 394, 397, 405, 407, 412, 413, 419, 420,
        421, 425, 429, 434, 438, 439, 449, 467, 474, 483, 483, 496, 502, 505, 507, 508, 513,
        518, 525, 526, 538, 545, 548, 555, 560, 571, 577, 581, 583, 584, 606, 606, 612, 620,
        620, 625, 635, 641, 646, 647, 650, 652, 662, 663, 666, 669, 673, 674, 710, 711, 721, 728,
        737, 744, 745, 747, 747, 780, 786, 787, 790, 791, 818, 824, 833, 834, 878, 899,
        903, 927, 928, 949, 961, 1002, 1031, 1059, 1070, 1133, 1587]
    
        vote_dist  = [.446, .314, .160, .080]    # romney, santorum, gingrich, paul
        vote_dist2 = [.296, .364, .210, .130]
        vote_dist3 = [.146, .384, .275, .195]
    
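        def vote(d):
            # Pick a candidate index at random, weighted by the probabilities in d.
            r = random.random()
            s = 0
            for v in range(len(d)):
                s += d[v]
                if s >= r:
                    return v
            # Guard against floating point rounding leaving the sum just below r.
            return len(d) - 1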
    
    
        def trial():
            total = [0, 0, 0, 0]
            x = []
            romney = []
            santorum = []
            gingrich = []
            paul = []
            i = 1
            vt = 0
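            for s in precinct_size:
                # Choose the vote distribution once per precinct, based on its size.
                if s < 50:
                    dist = vote_dist3    # small precinct
                elif s < 150:
                    dist = vote_dist2    # medium precinct
                else:
                    dist = vote_dist     # large precinct
                for voter in range(s):
                    total[vote(dist)] += 1
                    vt += 1
                # Record each candidate's cumulative share after this precinct.
                x.append(float(i))
                i += 1
                romney.append(float(total[0])/vt)
                santorum.append(float(total[1])/vt)
                gingrich.append(float(total[2])/vt)
                paul.append(float(total[3])/vt)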
    
            plt.plot(x, paul, 'ro')
            plt.plot(x, gingrich, 'go')
            plt.plot(x, santorum, 'yo')
            plt.plot(x, romney, 'bo')
            plt.show()
    
        trial()
    

EDIT 2: I'm not actually sure that the precinct sizes match Arizona. I got the
numbers from a spreadsheet attached to one of the news stories about this.
However, the total number of votes is only about 75k, which is much less than
the correct number for Arizona.

~~~
jhspaybar
To expand on this a bit, Romney is likely to win urban Republicans in greater
numbers than suburban Republicans. This is similar to how, in the general
election, Obama is expected to win urban voters over suburban voters. I
suspect this "analysis" could just as easily be done to show Obama "tampered"
with votes after the general.

You gave a much better explanation, but in all seriousness folks, flag this
story (I already have) and get it off the front page. It's nothing but
pseudo-statistics from someone who doesn't have a clue hoping to stir
controversy over "evil Republicans".

~~~
clarkm
This isn't necessarily true. It's important not to conflate "small precinct"
with "suburban/rural area" and "large precinct" with "urban area", since many
rural precincts might have more voters than urban ones.

I wish someone would look at the underlying precinct data and redo this
analysis while factoring in precinct population density.

~~~
argv_empty
_This isn't necessarily true. It's important not to conflate "small precinct"
with "suburban/rural area" and "large precinct" with "urban area", since many
rural precincts might have more voters than urban ones._

And in fact, the paper spends quite a bit of time addressing this.

------
polemic
The take home message:

> _"...highly anomalous election results indicate a widespread, systematic
> exchange of votes favoring one candidate"_

> _"Mitt Romney, based on our analysis, should have (statistically) gotten
> third rank in Iowa’s election (as opposed to second); second rank in New
> Hampshire (as opposed to the first rank), and so on, resulting most likely
> to a brokered convention at the Republican National Convention in Tampa,
> FL."_

Statistics is awesome, although I suspect that public/media knowledge will
mean this is brushed aside.

------
yk
The main problem with this paper seems to be 'publication bias.' By this I
mean: suppose n statisticians each look at a given fair election, each using
a novel method. Then we expect β n papers to come out of it, where β is the
chance of falsely rejecting the hypothesis that the election was fair. Since
each statistician uses a number of tests, which may or may not correlate with
each other, it is practically impossible to determine a sufficient level of
certainty without committing to the statistical tests in advance.
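
To put rough numbers on this, here is a sketch that assumes the n analyses are
independent and share the same false-rejection probability. Both are
simplifying assumptions (the comment above notes the tests may be correlated),
and the values n = 20 and 0.05 are picked only for illustration.

```python
def expected_false_alarms(n, false_reject_prob):
    """Expected number of analyses that wrongly flag a fair election."""
    return n * false_reject_prob


def prob_at_least_one(n, false_reject_prob):
    """Chance that at least one of n independent analyses wrongly flags it."""
    return 1 - (1 - false_reject_prob) ** n


n = 20                    # hypothetical number of analysts or tests
false_reject_prob = 0.05  # hypothetical per-test false-rejection rate

expected = expected_false_alarms(n, false_reject_prob)
at_least_one = prob_at_least_one(n, false_reject_prob)

print("expected false alarms:         ", expected)
print("P(at least one false alarm):   ", round(at_least_one, 4))
```

With these made-up values, one false alarm is expected on average, and the
chance of seeing at least one is well over half, even though the election in
the scenario is fair.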

------
cantastoria
Hacker News?

~~~
meatsock
it relates to hacking elections.

~~~
argv_empty
And possibly in a much more traditional sense than that word's typical usage
around HN!

