
Everybody is spamming everybody else on Mechanical Turk - Siah
http://openresearch.wordpress.com/2011/03/09/everybody-is-spamming-everybody-else-on-mechanical-turk/
======
marcua
You don't ask assembly line workers to build an amazing car on their own in a
single step. Similarly, you shouldn't ask low-paid information workers to
synthesize amazing text on their own in a single step.

I think that your HIT design highlights several common mistakes requesters
make on MTurk:

- You are underpaying for the task (would you write a good review of
Berkeley, CA for $1 for a stranger?)

- You provide no aggregation or verification step to ensure that turkers
know their work should jibe with other turkers' output. You also give no
indication that such verification is possible or likely to happen.

- Your task output is poorly defined and open to interpretation. You may have
asked a straightforward question, but I assume you placed a blank textbox on
the screen and expected well-formed paragraphs in return.

If you want a great example of text synthesis of relatively high quality using
MTurk for prices in the range of your budget, see
<http://borismus.com/crowdforge/>

If you want to learn more about how to design HIT workflows, see
<http://projects.csail.mit.edu/soylent/> (disclosure: I share an office with
and work with Michael Bernstein, but not on this work). One of Soylent's
contributions was the Find-Fix-Verify design pattern, which helps with some of
the problems you raise.
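To make the pattern concrete, here is a toy sketch in Python. The worker
functions are simulated stand-ins for crowd responses; none of this is
Soylent's actual implementation, and the agreement thresholds are arbitrary:

```python
from collections import Counter

def find_fix_verify(text, find_workers, fix_workers, verify_workers):
    """Toy sketch of the Find-Fix-Verify pattern.

    find_workers: callables that each return a set of problem spans
    fix_workers: callables mapping a span to a proposed rewrite
    verify_workers: callables voting on (span, fix) pairs -> bool
    """
    # Find: keep only spans flagged by at least two independent workers
    flags = Counter(span for w in find_workers for span in w(text))
    spans = [s for s, n in flags.items() if n >= 2]

    result = text
    for span in spans:
        # Fix: collect independent candidate rewrites
        candidates = [w(span) for w in fix_workers]
        # Verify: pick the candidate most verifiers approve of
        best, votes = None, 0
        for cand in set(candidates):
            approvals = sum(1 for w in verify_workers if w(span, cand))
            if approvals > votes:
                best, votes = cand, approvals
        # Apply the fix only with a majority of verifier approvals
        if best is not None and votes > len(verify_workers) // 2:
            result = result.replace(span, best)
    return result
```

The point of splitting the work this way is that no single turker's sloppy
output survives: finders check each other, and verifiers check the fixers.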

Your task is even harder, of course, since you require subject-matter experts
in a fictional location. So perhaps MTurk is the wrong crowd for your task.

------
lukas
I run a company called CrowdFlower that provides quality control on top of
Mechanical Turk and other pools of workers, from traditional outsourcing
companies to offerwalls (where people earn in-game credits for doing our
tasks).

I think this article doesn't reflect everyone's experience with Mechanical
Turk. We get lots of high quality work out of Mechanical Turk and lots of
other companies do as well. It does take a fair amount of work to get the
quality right - that's how we got started as a business and that's why many
people still come to us.

As an aside, if the author of the article is reading this thread and wants
data, we would be happy to talk about it.

------
snikolov
The article raises an interesting point: many turkers just assume there is
no quality assurance being done on the requester end and that everything will
automatically be accepted and paid for. Since it is tricky to automate QA for
huge sets of tasks, I would guess this assumption is mostly correct, and
turkers take advantage of it.

~~~
jellicle
"Tricky to automate" what? Are you not literally in the middle of using a tool
that helps you automate QA for huge sets of tasks?

It should be trivial to create a task, create a task for evaluating that task,
and yet another task for evaluating _that_ task. Run all three long enough and
you will in fact get good results.

Obviously if you're going to use an unreliable protocol there have to be
management protocols in effect to correct errors, or you will end up with
errors. This is not a revelation.
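The simplest form of that error correction is redundancy plus a quorum: post
each task to several workers and accept the output only when enough of them
agree. A first-cut sketch (the quorum size here is arbitrary):

```python
from collections import Counter

def majority_answer(answers, quorum=2):
    """Accept a task's output only when enough independent workers agree.

    answers: raw strings from different workers for the same HIT.
    Returns the agreed (normalized) answer, or None if no answer reaches
    the quorum -- in which case the HIT should be re-posted or reviewed.
    """
    normalized = [a.strip().lower() for a in answers]
    answer, count = Counter(normalized).most_common(1)[0]
    return answer if count >= quorum else None
```

Disagreements then feed the next-level task: a HIT whose job is to grade the
answers that failed the quorum.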

~~~
nmcfarl
This should be easy - but it is not. Many, many requesters submit tasks to
Turk through the Amazon-provided UI, or some other simplified UI with no
concept of a workflow, which makes this stupidly hard.

So you'd think this tool would do this for you - but instead you need another
layer on top, either one you code, or some 3rd party tool like CrowdFlower.

------
scrrr
Unique online identities that belong to real people, like the ones Facebook
offers, seem to be the only way to prevent rating-spam.

Or are they?

What if "mechanical turks" simply use their FB accounts to do the same?

That would make any rating system almost useless.

And since I will be publishing an Android app soon: wouldn't it be wise to
hire people to rate it with 5 stars, say a few hundred times? It seems like my
competition will do it.

~~~
kkowalczyk
A solution does exist: providers of mturk-like services could disallow such
work items and enforce that (inci-meta-dentally they could use mturk itself to
crowd-source spam identification on the cheap).

There is additional work for the service provider but it would seem to me that
it does align with their self-interest at some level. I don't think Amazon
really wants mturk to be associated with providing a spam work force.

I believe one of the things that CrowdFlower explicitly calls out as an
advantage over mturk is quality control (although for this particular solution
to work, all crowd-sourcing providers would have to do it - it takes only one
bad provider to enable bad behavior).

As to your hopefully hypothetical question: a risk you're running is that
Google will pull your app from the store. I haven't heard of such a case with
Google, but I'm pretty sure apps have been pulled from Apple's App Store for
manipulating ratings, so the downside could be big (your hard work could
amount to nothing).

~~~
pavel_lishin
> providers of mturk-like services could disallow such work items

Except for the one shady site that doesn't, and ends up raking in profits.

------
reddot
MIT is doing some really interesting research into using crowd sourcing like
mturk. Check it out:
<http://groups.csail.mit.edu/uid/research.shtml#crowdcomp>

They are tackling tasks like extremely difficult OCR and collaborative editing
and proofreading.

I've used mturk at work to automate transcribing short recordings and have
found that it works pretty well. The trick is to qualify your workers so that
they pass some kind of test. You can also accept only workers whose rating is
above some minimum. Then, critically, as suggested by others here, get each
task done multiple times for cross-checking. And make sure that your
instructions are clear.
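For transcription in particular, exact-match voting is too strict - two honest
workers rarely type identical strings - so fuzzy agreement works better. A
sketch using Python's difflib (the 0.9 threshold is a guess; tune it on your
own data):

```python
from difflib import SequenceMatcher

def agreed_transcript(transcripts, threshold=0.9):
    """Cross-check redundant transcriptions of the same recording.

    Returns a transcript that closely matches at least one other
    independent submission, or None if no pair agrees -- in which case
    the HIT should be flagged for manual review or re-posting.
    """
    for i, a in enumerate(transcripts):
        for b in transcripts[i + 1:]:
            ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if ratio >= threshold:
                return a
    return None
```

With three assignments per recording, one lazy or spammy worker is outvoted
by the two who actually listened.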

------
snikolov
I've spent a good chunk of time modifying an image labeling interface to make
it more intuitive for mechanical turk workers to label obscure things like
biological scans. The hope is that a better interface will increase quality
(they did not seem to have a clue what to do using the old interface), but I'm
starting to question whether the interface will make that much of a difference
after all.

------
orionlogic
Is there any app or start-up which delegates work to my social connections?
Like a Mechanical Turk for my social sphere. Would be a nice solution.

~~~
theklub
What about a local mech turk type program for small tasks? Maybe craigslist or
angieslist is filling this need?

~~~
orionlogic
There are places in the world where craigslist is not even heard of. On second
thought, there are some issues with this approach: what if my social circle is
narrow? Scalability and reliability (reliable sources) sit on opposite sides.

------
galuggus
Eye-opening.

I was planning on using MTurk for a project I'm working on.

Has anyone got any pointers on how to get the best out of Mechanical Turk?
Advice much appreciated.

~~~
imx
The mturk manuals are junk and very hard to follow. HIT data cleansing is the
biggest issue. Instead of using the command line tool, use their API to
integrate with your app, as it will save plenty of time down the road...

To "weed out" ineligible workers, try this approach:

1. Post a bunch (1000-5000) of cheap multiple-choice HITs.

2. Allow no more than 10 HITs per worker.

3. Have each HIT get 3 responses from different workers.

4. Review answers, compile the list of "good" workers, and blacklist the
"bad" ones.

5. Post another bunch of HITs, available only to the eligible workers found
in step 4. This time the HITs can be more demanding; individually review
results for each worker -> the best ones go on your "preferred worker" list.

6. Repeat steps 1-5 as necessary.

From then on it's fairly safe to rely on mturk workers from your preferred
list.
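The review in steps 3-4 can be sketched in Python, using per-HIT majority
agreement as a cheap stand-in for ground truth (a simulation of the bookkeeping,
not the actual MTurk API; the 0.8 cutoff is arbitrary):

```python
from collections import Counter, defaultdict

def screen_workers(responses, min_agreement=0.8):
    """Compile "good" and "bad" worker lists from multiple-choice HITs.

    responses: list of (hit_id, worker_id, answer) triples, where each
    HIT was answered by several workers. A worker is "good" when their
    answers match the per-HIT majority at least min_agreement of the
    time; everyone else goes on the blacklist.
    """
    by_hit = defaultdict(list)
    for hit_id, worker_id, answer in responses:
        by_hit[hit_id].append((worker_id, answer))

    # The majority answer per HIT is the (cheap) stand-in for ground truth.
    consensus = {h: Counter(a for _, a in ws).most_common(1)[0][0]
                 for h, ws in by_hit.items()}

    hits_done, hits_right = Counter(), Counter()
    for hit_id, workers in by_hit.items():
        for worker_id, answer in workers:
            hits_done[worker_id] += 1
            if answer == consensus[hit_id]:
                hits_right[worker_id] += 1

    good = {w for w in hits_done
            if hits_right[w] / hits_done[w] >= min_agreement}
    return good, set(hits_done) - good
```

One caveat worth noting: consensus grading only works on the cheap
multiple-choice round; the step-5 review of harder HITs still has to be
done by hand.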

~~~
user24
looks like there's a need for a mturk preferred worker aggregation service.

~~~
patio11
Seriously, for being a fire-and-forget API to the lowest possible level of
human tasks, it requires a heck of a lot of hands on management, including
_arguing with identifiable people over two cents_. (YOU DIDN'T SAY TO TURN OFF
CAPS. I wish I were exaggerating.)

I ended up writing off five hours to goodwill when I did a project with a $100
turking component for a client. To use a line favored by my old Indian
colleagues: if you pay peanuts, you get monkeys. Lesson learned.

Next time I will just find a freelancer with a high tolerance for repetition.

~~~
todayiamme
> Next time I will just find a freelancer with a high tolerance for
> repetition.

Can you shoot me an email when you do?

I have the ominous feeling that it will be mind numbingly boring, but
nonetheless money is money.

------
s00pcan
This is still being used? I remember how easily scammed it was when this first
came out. You'd think that massive failure would have meant something to them.

------
saturn
Money quote:

> We all know that Mechanical Turk challenges the whole “Junk-in, Junk-out”
> dilemma and makes it more like “Always junk-out, regardless of the input
> process”

Couldn't be more true, IMO. mturk is basically useless except for this "meta"
kind of research, and it's a good example of a community that needed active
management and positive incentives going to absolute shit in the absence of
both.

