
Company using Mechanical Turk botches U.S. Senate campaign finance records - danso
https://www.publicintegrity.org/2018/09/04/22214/company-using-foreign-workers-botches-us-senate-campaign-finance-records
======
apendleton
For a bit more background: in general, House and Senate campaigns use the same
software to manage their campaign contribution data, and this software is
capable of doing electronic submission (because the House requires it), but
the Senate instead prints it all out and hands it in on paper, because they're
the worst, and then it all gets typed back in again at a cost to the taxpayer
of about $250k per filing deadline (how many of those per year there are
varies -- they get more frequent closer to elections). The main motivation for
the unwillingness to change things here isn't that they like the inaccuracies
so much as that they like the delay -- it effectively means the last couple of
pre-election filings aren't public until after the election, rather than, you
know, instantly.

~~~
everybodyknows
Article says McConnell is the blocker of electronic filing. Anyone know about
his own campaign finances in the last few months before elections?

~~~
apendleton
[https://www.opensecrets.org/](https://www.opensecrets.org/) is a good
resource, but honestly (having worked on money on politics stuff for several
years) none of it is that juicy in isolation. Everybody gets money from
everybody. For the most part they're not scared of any particular story coming
out, they'd just rather not have an "In the final days of the campaign, X took
$Y from Z" story come out the day before the election at all, for any Y or Z
(it looks bad pretty much regardless of who it is: banks, pharmaceuticals,
defense, lobbyists, energy, whatever)

------
slics
There are so many factors in play here. The government issues a contract for
scanning and loading all the paperwork. A contractor wins the contract (prime),
then hires sub-contractors to do the work, who in turn can hire other
sub-contractors. If you think about it from a requirements perspective, the
government asks for apples, and by the time those requirements get triaged
three levels deep, they get oranges. The government's attitude toward the
shitty product delivered: you can still make juice out of the fruit we got.
They keep pumping millions of dollars a year into that same minimum viable
product.

------
vitorbaptistaa
A bit off-topic, but as an example of what could be done with this kind of
information, the Operation Serenata de Amor from Brazil created a robot that
analyses the reimbursement claims from Brazilian politicians looking for
outliers, filing complaints and tweeting the politicians to ask for
clarifications. There have been some pretty funny conversations between
politicians and twitter bots because of this :)

[https://serenata.ai/en/](https://serenata.ai/en/)

~~~
jessaustin
They argue about reimbursements over _Twitter_? That seems like a topic for
private communication methods. Email bots are also a thing.

~~~
mmt
> That seems like a topic for private communication methods.

Since the topic is public officials acting in their official capacity, I
disagree.

------
ohashi
Captricity is terrible on MTurk. They reject and don't pay. A lot of people
who regularly work on MTurk avoid them because they are a scummy company. Read
their TurkOpticon reviews or on Reddit or other MTurk communities. I hope they
disappear.

------
seveibar
Mechanical Turk doesn't have built-in quality assurance, so unless you're very
skilled in QA systems or have a lot of money to burn on validation, it's going
to give poor results. Btw, it's relatively hard to build a QA system that
validates work, establishes trusted workers and optimizes for cost without
significant redundancy.
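To make the "establishes trusted workers" part concrete, here is a minimal Python sketch of one common approach: mix gold-standard items (scans whose correct answer is already known) into the queue, and trust a worker only after enough correct gold answers. The `GOLD` dict, the `WorkerTracker` class and both thresholds are hypothetical illustrations, not MTurk's API or any particular vendor's system.

```python
from collections import defaultdict

# Hypothetical gold-standard items: scans whose correct transcription is already known.
GOLD = {"scan-001": "250.00", "scan-002": "1000.00"}


class WorkerTracker:
    """Tracks each worker's accuracy on gold-standard items (illustrative sketch)."""

    def __init__(self, min_gold: int = 2, trust_threshold: float = 0.95):
        self.correct = defaultdict(int)  # gold items answered correctly, per worker
        self.seen = defaultdict(int)     # gold items answered at all, per worker
        self.min_gold = min_gold
        self.trust_threshold = trust_threshold

    def record(self, worker: str, item_id: str, answer: str) -> None:
        # Only gold items say anything about accuracy; ordinary work is ignored here.
        if item_id in GOLD:
            self.seen[worker] += 1
            if answer.strip() == GOLD[item_id]:
                self.correct[worker] += 1

    def accuracy(self, worker: str) -> float:
        seen = self.seen.get(worker, 0)
        return self.correct.get(worker, 0) / seen if seen else 0.0

    def is_trusted(self, worker: str) -> bool:
        # Require a minimum number of gold answers before trusting anyone.
        return (
            self.seen.get(worker, 0) >= self.min_gold
            and self.accuracy(worker) >= self.trust_threshold
        )


tracker = WorkerTracker()
tracker.record("w1", "scan-001", "250.00")
tracker.record("w1", "scan-002", "1000.00")
tracker.record("w2", "scan-001", "25.00")  # typo on a gold item
tracker.record("w2", "scan-002", "1000.00")
print(tracker.is_trusted("w1"), tracker.is_trusted("w2"))  # True False
```

Once workers are trusted, their work can be sampled less often, which is one way to cut cost without running every item through several workers.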

I do the technology at an MTurk competitor (workaround.online) that automates
the quality assurance/training process and pays fair wages to refugees to do
the work. If anyone is interested in an alternative to MTurk that is still
relatively cheap, I would highly recommend it.

------
cdoxsey
Shouldn't they be running each scan through the mechanical turk multiple times
to remove errors? I guess that would double or triple their costs.

~~~
electroly
My experience with MTurk is that 3 isn't enough runs if you need the data to
be correct and can't afford to pay someone (who ISN'T from MTurk) to validate
every entry.

We regularly ran into these two situations:

- All three workers got different answers

- Two of the three workers agreed on the wrong answer

I think five or more runs may be necessary for data transcription on MTurk.
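A minimal Python sketch of the redundancy scheme discussed here: collect several transcriptions per field, normalize them, and accept a value only if enough workers agree and there is no tie; otherwise return `None` so the field can go to review. `consensus`, `min_agree` and the dollar-amount normalization are assumptions for illustration. Voting cannot catch the "two of three agreed on the wrong answer" case; more runs and a higher `min_agree` only make it less likely.

```python
from collections import Counter
from typing import List, Optional


def normalize(answer: str) -> str:
    # Assumed normalization for dollar amounts: trim whitespace, drop "$" and thousands separators.
    return answer.strip().lstrip("$").replace(",", "")


def consensus(answers: List[str], min_agree: int) -> Optional[str]:
    """Return the winning answer, or None if agreement is too weak or tied (send to review)."""
    if not answers:
        return None
    ranked = Counter(normalize(a) for a in answers).most_common()
    value, count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == count
    if count < min_agree or tied:
        return None
    return value


print(consensus(["250", "205", "2500"], min_agree=2))                # None: all differ
print(consensus(["$250", "250", "2,500"], min_agree=2))              # 250
print(consensus(["250", "250", "205", "250", "2500"], min_agree=3))  # 250
```

With three runs, `min_agree=2` is a simple majority; with five runs, `min_agree=3` is the smallest strict majority.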

~~~
ig1
You should consider using qualifications / simplifying the requests.

The error rate I get for data entry tasks is around 0.5%-1% discrepancy
between double entry. If you use prior reliability of the worker to tie break
between who's right it drops to <0.1% error rate.
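The double-entry approach can be sketched in Python as follows. It assumes each worker already has a prior reliability score between 0 and 1 (for example, accuracy on earlier checked work). The comment doesn't say exactly how reliability is measured, so `resolve_double_entry` and the `reliability` mapping are hypothetical.

```python
from typing import Dict, Tuple


def resolve_double_entry(
    entry_a: Tuple[str, str],
    entry_b: Tuple[str, str],
    reliability: Dict[str, float],
) -> Tuple[str, bool]:
    """Each entry is (worker_id, value). Returns (chosen_value, disputed)."""
    (worker_a, value_a), (worker_b, value_b) = entry_a, entry_b
    if value_a == value_b:
        return value_a, False
    # Disagreement: trust whichever worker has the better prior record.
    # Unknown workers score 0.0; equal scores fall back to entry_a.
    if reliability.get(worker_a, 0.0) >= reliability.get(worker_b, 0.0):
        return value_a, True
    return value_b, True


# Hypothetical prior accuracy scores per worker.
scores = {"w1": 0.97, "w2": 0.99}
print(resolve_double_entry(("w1", "250"), ("w2", "250"), scores))  # ('250', False)
print(resolve_double_entry(("w1", "250"), ("w2", "205"), scores))  # ('205', True)
```

Returning the `disputed` flag lets tie-broken fields be logged or spot-checked instead of being accepted silently.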

------
logfromblammo
I once did campaign finance data entry as a child laborer for a newspaper
reporter. I entered all the contribution reports for all the state house
and senate races. It was crates upon boxes of public-record paper documents.
According to the reporter, I probably only made one mistake in the dollar
amounts, due to a single missing contribution report. Not bad for a kid.

But it took forever, and it was relatively expensive for the newspaper. This
was way back in the early 1990s. Since then, paper filings tapered off, and
electronic filings replaced them. It really is a far better way to do it.
Clearly, these records need to be digitized to create public transparency, and
that need is apparently being met by bottom-feeding tax-eaters doing a
minimum-effort job at a top-shelf price. I am not surprised these records are
being botched, but I am surprised this story is coming out only now, rather
than back in 2001.

------
kokey
Oh this reminded me immediately of this story about a data migration project
[https://thedailywtf.com/articles/Importing-Data-the-WTF-Way](https://thedailywtf.com/articles/Importing-Data-the-WTF-Way)

------
ianhawes
My understanding is that Senate campaigns _can_ file electronically, but are
not legally required and therefore do not.

~~~
eli
Because they want their records to be hard to read and analyze. This is a
policy/political problem, not an OCR one.

~~~
cabaalis
Also, it directly benefits the senators for the company hired to process the
data to fail in doing so. Any issues raised from the dataset can immediately
be refuted.

------
Dowwie
This is a political issue more than a technological one.

~~~
komali2
Agreed, especially considering

>Reform advocates say this is in large part because of opposition by a small
group of Senate Republicans, most notably Senate Majority Leader Mitch
McConnell

which is exactly what I'd expect from the dude, but not something I feel like
screaming about on HN.

------
esseti
I did part of my PhD on crowdsourcing, and the poor quality results still
remain a problem ;)

~~~
mygo
I remember back when Yahoo Answers was the go-to for crowd-sourced Q&A. Nobody
was getting paid and most answers were absolutely terrible.

Then Quora came out. Nobody was getting paid but suddenly experts were
answering questions in their field.

Wonder if there could be a Quora for MTurk?

~~~
booleandilemma
I think this is the first positive thing I’ve ever read about Quora anywhere.

~~~
JoelTheSuperior
I mean, it's not perfect - I have seen my fair share of "experts" but I would
say as a general rule the answers are fairly high quality.

------
stuaxo
Mechanical Turk is basically laundering of grossly underpaid labour.

------
mirimir
Checking data for amount=ID seems pretty trivial. Also looking for outliers.
And of course, for records with nothing in name fields.
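These checks are easy to sketch in Python. This assumes a hypothetical record layout (`record_id`, `donor_name`, `amount`), reads "amount=ID" as an amount that simply duplicates the record's numeric ID, and flags outliers with a modified z-score based on the median absolute deviation (3.5 is a commonly used cutoff). A real pipeline would need checks tuned to the actual filing format.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Contribution:
    record_id: str  # hypothetical field names, not the actual Senate/FEC schema
    donor_name: str
    amount: float


def flag_records(records: List[Contribution], cutoff: float = 3.5) -> Dict[str, List[str]]:
    """Return {record_id: [reasons]} for records that fail simple sanity checks."""
    if not records:
        return {}
    amounts = [r.amount for r in records]
    median = statistics.median(amounts)
    # Median absolute deviation: a spread estimate that one huge typo can't inflate.
    mad = statistics.median([abs(a - median) for a in amounts])
    flags: Dict[str, List[str]] = {}
    for r in records:
        reasons = []
        if not r.donor_name.strip():
            reasons.append("empty name")
        # Amount identical to the numeric record ID suggests a value typed into the wrong field.
        if r.record_id.isdigit() and r.amount == float(r.record_id):
            reasons.append("amount equals ID")
        # Modified z-score; skipped when MAD is 0 (at least half the amounts equal the median).
        if mad > 0 and 0.6745 * abs(r.amount - median) / mad > cutoff:
            reasons.append("outlier amount")
        if reasons:
            flags[r.record_id] = reasons
    return flags


records = [
    Contribution("1001", "Alice", 250.0),
    Contribution("1002", "", 500.0),
    Contribution("1003", "Bob", 1003.0),
    Contribution("1004", "Carol", 250.0),
    Contribution("1005", "Dan", 300.0),
    Contribution("1006", "Eve", 99999.0),
]
print(flag_records(records))
# {'1002': ['empty name'], '1003': ['amount equals ID'], '1006': ['outlier amount']}
```

The median and MAD are used instead of mean and standard deviation so that the very outliers being hunted don't distort the threshold.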

------
txcwpalpha
> Captricity turns these images into machine-readable text through what its
> website calls a “groundbreaking collaboration between humans and computers.”

> Captricity administers this kind of work through Mechanical Turk, an Amazon-
> owned online labor marketplace.

Ah yes, "groundbreaking collaboration between humans and computers", which in
this case actually means "pay a human below minimum wage to type numbers on a
keyboard".

Gotta love corporate marketing speak. The only thing that would make this even
more ridiculous is if Captricity claimed they were using the mystical "machine
learning" too. Hmm, let's check their website. [1]

>Captricity then uses sophisticated machine learning to package up these
fields (we call them “shreds”) into quickly identifiable packets.

...lol

1: [https://support.captricity.com/blog/captricitys-secret-weapo...](https://support.captricity.com/blog/captricitys-secret-weapon-crowdsourcing/)

~~~
pavel_lishin
But are the results stored in the blockchain?

~~~
redleggedfrog
In the cloud?

~~~
kbenson
No, no, no! They print the blockchain to paper so it can be duplicated and
stored in multiple secure physical locations. Blockchain in the cloud is so
last year.

~~~
SaltyBackendGuy
A plant based media storage ledger, with redundancy. I smell a crypto startup.

