
How Amazon's Mechanical Turkers Got Squeezed Inside the Machine - headalgorithm
https://spectrum.ieee.org/tech-talk/tech-history/dawn-of-electronics/untold-history-of-ai-mechanical-turk-revisited-tktkt
======
Someone1234
I tried to do some Turking just for a little side-cash.

I stopped after I realized that buying a higher tier of tax software just to
handle multiple income sources ate at least the first $15 of income (which
represents a significant time investment on Mechanical Turk).

In other words the income was so low that even paying the taxes on it ruined
the whole venture. Even Bing's rewards have a higher rate of return. The
$2/hour from this article entirely mirrors my experience (although my average
might be a little lower).

~~~
giarc
I have an anecdotal story on the other side. I had a small startup and we had
a list of URLs. We loaded them into MTurk and asked Turkers to visit each site
and find an email address (we probably could have automated this ourselves,
but we didn't). Most of the responses were just info@<<websitename>>.com, and
we could see the task was completed in <10 seconds. So although info@ is a
legitimate answer, after checking a few we found the Turkers had just guessed
it would be a suitable answer and collected the pay. We basically couldn't
trust any of the answers and just walked away from the service.

~~~
notafraudster
You set up the HIT the wrong way. The correct way to do this is to assign the
same URL to multiple Turkers and look for cases where people disagree in order
to verify the response quality. Granted it's still possible that both Turkers
assigned to the HIT could commit fraud in the same way, but surely just as a
matter of chance this bounds your error rate pretty substantially.
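As a rough sketch of that agreement check (all names here are hypothetical, in Python) — collect every worker's answer for a URL and only accept it once enough of them independently agree:

```python
from collections import Counter

def consensus(responses, min_agreement=2):
    """Accept an answer only if at least `min_agreement` of the workers
    assigned to the same URL independently gave it; otherwise return None
    so the task can be re-posted or reviewed by hand."""
    counts = Counter(r.strip().lower() for r in responses)
    answer, n = counts.most_common(1)[0]
    return answer if n >= min_agreement else None
```

So `consensus(["info@example.com", "Info@example.com ", "sales@example.com"])` returns `"info@example.com"` (two of three agree after normalization), while three different guesses return `None`. Colluding workers can still defeat this, but as the parent says, chance alone makes identical wrong answers much rarer.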

~~~
ceejayoz
You can also use qualification tests
([https://docs.aws.amazon.com/AWSMechTurk/latest/AWSMechanical...](https://docs.aws.amazon.com/AWSMechTurk/latest/AWSMechanicalTurkRequester/Concepts_QualificationsArticle.html#qualification-tests))
to require Turkers to demonstrate competence before doing tasks. I'd
imagine you could inject a certain percentage of known-answer tasks into the
queue as well, as ongoing quality control.
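A sketch of the known-answer idea, assuming you keep a small pool of tasks you've already answered yourself (the names and the 10% fraction are illustrative, not anything MTurk prescribes):

```python
import random

def build_queue(real_tasks, gold_tasks, gold_fraction=0.1):
    """Mix known-answer ('gold') tasks into the queue of real tasks so
    each worker's accuracy can be spot-checked as they go."""
    n_gold = max(1, int(len(real_tasks) * gold_fraction))
    queue = real_tasks + random.sample(gold_tasks, n_gold)
    random.shuffle(queue)
    return queue

def worker_accuracy(submissions, gold_answers):
    """Score a worker only on the gold tasks they received.
    `submissions` maps task id -> worker's answer; `gold_answers`
    maps gold task id -> the known-correct answer."""
    graded = [ans == gold_answers[task]
              for task, ans in submissions.items() if task in gold_answers]
    return sum(graded) / len(graded) if graded else None
```

Workers below some accuracy threshold on the gold tasks can then be blocked or have their real submissions re-checked.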

------
elektor
So many social science papers rely on MTurk, which is concerning given the
poor compensation. With such a low payment structure, MTurk incentivizes
people to rush through tasks and give the answers they anticipate the
researchers want to see.

~~~
notafraudster
(Note: I am an academic social scientist who has done an MTurk pilot but never
published anything based on MTurk responses)

Yes, this is true, but this is true outside MTurk as well. If you rely on the
YouGov panel (people complete surveys to get points for t-shirts), or really
anything else, there's a strong incentive to cheat.

This is why good survey research is going to involve attempting to bound fraud
through a variety of measures: attention checks ("What's your favorite color?
Ignore this question and answer yellow."), looking for straight-lining
(respondents always picking the leftmost answer), looking for unusual
contradictions (e.g. asking the same question in two opposite ways and
looking for people who don't have the expected relation in their responses),
looking at the distribution of completion speed and scrutinizing the lower
quantiles, attempting to log participants who take the survey multiple times
through dummy accounts, etc.
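Two of those checks — straight-lining and the completion-speed cutoff — can be sketched in a few lines of Python (the thresholds are illustrative, not standard values from the literature):

```python
def is_straightliner(answers, threshold=0.9):
    """True if one option accounts for nearly all of a respondent's
    answers (e.g. always picking the leftmost choice)."""
    top = max(answers.count(a) for a in set(answers))
    return top / len(answers) >= threshold

def speed_cutoff(durations, fraction=0.1):
    """Completion time below which a respondent falls into the fastest
    `fraction` of the sample and deserves closer scrutiny."""
    k = max(1, int(len(durations) * fraction))
    return sorted(durations)[k - 1]
```

Neither is proof of fraud on its own; they just narrow down which responses to inspect by hand.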

Oddly, my experience was that Turkers were fairly conscientious. I know one
theory for this is that many Turkers are in fact Turking on the job, so their
reservation wages are set not by what the MTurk HIT pays but by what they're
being paid to sit at their desk and not work.

~~~
ben_w
> "What's your favorite color? Ignore this question and answer yellow."

I’ve noticed something similar to that in some of the YouGov surveys I’ve
answered over the last decade. I wonder what it says about me that I sometimes
write, in the errors/comments section at the end of the surveys, that I have
seen surprising questions such as “what do you think about Channel 4?” when I
have previously answered _that I haven’t watched Channel 4 recently_.

~~~
Operyl
In this scenario, one might not watch Channel 4 for a reason, so asking for
that reason is sensible. “User A doesn’t watch channel 4, this is what they
think about it.”

~~~
ben_w
Could be; while it seemed very wrong when I saw it, I didn’t record the exact
words and it may have made more sense than it seemed to.

------
mturkRequester
Many of the problems in these comments come from new requesters and workers
who use poor qualifications or none at all, have no quality control, and don't
research the platform before using it. If you think you can just dip your toe
into the MTurk pool and profit, you're wrong. There is a huge learning curve
for both workers and requesters. If you don't invest the time in the platform,
you will get garbage pay and garbage results.

------
halfnibble
MTurk sucks. I need over a thousand video transcripts manually fixed, and so
far only about 15% come back even close to correct. Most Turkers submit the
original, unmodified transcript, or something completely irrelevant. I have to
use diffing software just to quickly scan each submission for scammers trying
to make a quick buck. Granted, I feel like a scammer myself for paying so
little (but the payout decision comes from above me).
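For what it's worth, that diff scan can be automated with Python's stdlib `difflib`; the two thresholds below are made-up values for illustration:

```python
import difflib

def flag_submission(original, submission,
                    identical_above=0.99, unrelated_below=0.30):
    """Classify a returned transcript by its similarity to the original:
    a ratio near 1.0 means the worker changed nothing, near 0 means the
    text is unrelated; anything in between is worth a real review."""
    ratio = difflib.SequenceMatcher(None, original, submission).ratio()
    if ratio >= identical_above:
        return "unmodified"
    if ratio <= unrelated_below:
        return "irrelevant"
    return "needs-review"
```

That turns the manual scan into a triage step: auto-reject the two scam patterns described above and only eyeball the middle bucket.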

------
zepearl
(I keep reading about it but I never used it)

Does MTurk have some sort of quality-related metric, with higher rewards
linked to it?

For example, when I submit some work, can I specify "I want only
4-star-or-better people (out of 5, where '5' is for people who rarely make
mistakes) to work on this, and yes, I will pay $4 in extra fees for 4-star
and $8 in extra fees for 5-star workers"?

Thx

