
Mechanical Turk Lessons Learned - kmano8
http://engineering.curalate.com/2017/02/01/mechanical-turk-lessons-learned.html
======
jbob2000
I wanted to read your article, but the design of your blog really sucks. I can
hardly read the text, it's quite small. Monospace fonts are great for
programming, bad for reading. Animated gifs every 5 lines is really annoying.

Sorry!

~~~
manarth
I ended up using dev-tools to delete all the gifs. I can live with play-once-
then-stop, but the continual repeated movement was distracting.

~~~
lordchair
document.querySelectorAll('img').forEach((im) => { im.style.display = 'none';
})

~~~
fudgy73
+ document.body.style.backgroundColor = 'white'; = can read now

------
emeryberger
We (the PLASMA research group at UMass,
[http://plasma.cs.umass.edu](http://plasma.cs.umass.edu)) developed a system
called AutoMan specifically designed to automatically manage quality (as well
as to automatically compute pay and time) for a wide variety of tasks. You
basically invoke people _as functions_ and it just works, with statistical
guarantees (it also handles payment, etc. without any additional effort).
Makes dealing with MTurk _much_ nicer. Best used in Scala but also can be used
from Java.

[http://automan-lang.com](http://automan-lang.com)
[https://github.com/plasma-umass/AutoMan](https://github.com/plasma-umass/AutoMan)

Paper here on AutoMan, round one:
[http://cacm.acm.org/magazines/2016/6/202648-automan/abstract](http://cacm.acm.org/magazines/2016/6/202648-automan/abstract)
(CACM Research Highlight, 2015)

Original paper, not behind a paywall:
[https://people.cs.umass.edu/~emery/pubs/res0007-barowy.pdf](https://people.cs.umass.edu/~emery/pubs/res0007-barowy.pdf)
(OOPSLA '12)

New features described here have been rolled into AutoMan: "VoxPL: Programming
with the Wisdom of the Crowd" (CHI '17, to appear):
[https://people.cs.umass.edu/~emery/pubs/voxpl-chi.pdf](https://people.cs.umass.edu/~emery/pubs/voxpl-chi.pdf)

~~~
DavidHm
This is both amazing and a bit unnerving.

~~~
brilliantcode
Matrioshka brain confirmed. But seriously, I was looking for a tool like this.

------
rhema
For US-based people, I suggest using Mechanical Turk as a worker before
creating HITs as a Requester. See
[https://www.reddit.com/r/HITsWorthTurkingFor/](https://www.reddit.com/r/HITsWorthTurkingFor/)
for decent examples.

It makes me sad that so much of the work available on Mechanical Turk is
poorly designed, and that workers have little recourse to bad Requesters.

~~~
ayw
Definitely. That's one of our focuses at Scale API (www.scaleapi.com). We
build the UIs and the tooling to make sure they're efficient and intuitive, to
avoid the exact problems of requesters making poorly designed work.

~~~
mustacheemperor
You mention your business with a hyperlink four separate times in this thread.
I have to wonder how much of your purpose in posting is informative after the
second or third time.

------
giarc
I recently tested MTurk for my startup. We set up about 500 HITs to collect
website URL and email from various businesses. We set the price at $0.05
(Amazon takes an additional $0.01). Jobs quickly got started and within 24
hours we had all of our data collected.

I'm not sure I would do it again though. A lot of the businesses we were
targeting don't have a web presence and therefore "No URL/No Email" was a
viable answer. However, when I went through the list to see 150 "No URL/No
Email" answers I didn't know for sure whether that is true or whether the
Turker realized they could just copy/paste and make a quick buck. Amazon does
provide the amount of time they spent on the task so I rejected any that were
less than 10 seconds as I felt like they didn't give it a good enough try.
Over that, I just accepted the answer, realizing that it may be false.

In the end I think I spent more time going through results and correcting them
than it actually saved me. I'm excited to use MTurk in the future again, but
only for appropriate projects.

~~~
fenwick67
I think you probably needed to have some redundancy, have each business
checked at least twice.
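
A minimal sketch of that redundancy idea in Python, assuming each business was answered by two or more independent workers. The `consolidate` helper, the `min_agreement` parameter, and the sample data are hypothetical and not part of any MTurk API:

```python
from collections import Counter

def consolidate(answers, min_agreement=2):
    """Return the answer enough workers agree on, or None if they disagree."""
    if not answers:
        return None
    # Normalize so trivial whitespace/case differences do not count as disagreement.
    normalized = [a.strip().lower() for a in answers]
    top, count = Counter(normalized).most_common(1)[0]
    return top if count >= min_agreement else None

# Hypothetical results: business name -> answers from independent workers.
results = {
    "Acme Plumbing": ["http://acme.example", "http://acme.example "],
    "Bob's Diner": ["No URL/No Email", "http://bobs.example"],
}

agreed = {name: consolidate(ans) for name, ans in results.items()}
# Anything without agreement goes to a third worker or a manual check.
needs_review = [name for name, ans in agreed.items() if ans is None]
```

Only the disagreements need a human look, which should be much less work than reviewing every "No URL/No Email" answer by hand.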

~~~
pitt1980
are there any tools that help you do QA on MT data entry in a Bayesian way?

seems like the situation is ripe for such a tool

\---------

start with a subset of questions you have a predetermined answer to, only keep
feeding questions if the person responding has met a certain quality threshold
on those questions

every so often feed them a QA question

every so often send the same question to someone else to check it for
redundancy

seems like there is a lot you could do to adjust these based on Bayesian
confidence intervals, and exactly how mission critical you need certain data
to be
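
One way to sketch the threshold part, as an illustration only: treat a worker's accuracy on the known-answer questions as a Beta posterior (a uniform prior is assumed) and keep feeding them questions only while the probability that their accuracy exceeds a target stays high. The function names, the 0.8 target, and the 0.9 cutoff are all made up for the example:

```python
import random

def prob_accuracy_above(correct, wrong, target, samples=20000, seed=0):
    """Monte Carlo estimate of P(accuracy > target) under Beta(1 + correct, 1 + wrong)."""
    rng = random.Random(seed)
    hits = sum(
        rng.betavariate(1 + correct, 1 + wrong) > target for _ in range(samples)
    )
    return hits / samples

def keep_feeding(correct, wrong, target=0.8, cutoff=0.9):
    """Decide whether a worker should keep receiving real questions."""
    return prob_accuracy_above(correct, wrong, target) >= cutoff

# Hypothetical workers: (correct, wrong) counts on known-answer questions.
print(keep_feeding(19, 1))
print(keep_feeding(5, 5))
```

Raising `cutoff` for mission-critical data and lowering it for cheap, easily rechecked data is one way to trade cost against confidence.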

maybe something like that already exists, idk

\------------- (edit: is this what the Scale API does?)

~~~
doleson
Yup. My old company (www.crowdflower.com) has a platform set up to do this.

------
omouse
_So why use Mechanical Turk in the first place? Turkers will work for a single
penny in many cases._

Exactly what I was looking for: the cold brutal logic of capitalism. It's all
good if it's low-cost.

_Even after all of this you will still get bad answers._

...yes, of course, you're not paying for quality, you're paying for quantity
and to reduce your costs. If you were paying for quality you would put up a
few posters on college campuses and pay more.

------
werdnapk
Those animated gifs are extremely distracting.

~~~
jwilk
Dark background, low contrast, ugly font, and the GIFs.

It's almost like somebody was trying hard to make this article unreadable.

------
ayw
Hey guys! I'm cofounder of Scale API (www.scaleapi.com), a YC S16 company
building an API for human intelligence. We've been working to obviate the need
to tune your system to work on products like MTurk, and instead have a really
simple API that _just works_. We've worked to build technology to guarantee
quality to our customers and build a simple developer experience.

I really respect your ability to work with MTurk and have it work for you
guys. In our experience, it often takes significant effort to get anything
remotely functional and reliable on MTurk. That's why we're building Scale :)

~~~
_nothing
I listened to you on Software Engineering Daily! Haven't found a use for the
service yet but I've been keeping you guys in the back of my mind in case a
good use case comes up at work.

------
dalacv
Your profile pic gif is broken. It just shows a still picture of you.

------
markovbling
There was a great talk by some AWS guys at the AWS re:Invent summit on ways to
improve accuracy using ideas similar to cross-validation...

It's on iTunes - title is: "Getting to ground truth with Amazon web services
mechanical Turk"

Video also available on YouTube:
[https://m.youtube.com/watch?v=vRtLdeNl7Tg](https://m.youtube.com/watch?v=vRtLdeNl7Tg)

------
tmaly
I tested out Mechanical Turk back in 2008. I think I was trying to get a
YouTube video promoted on some site.

I still have some credits on the system.

I am not even sure if I could come up with a good use. For those with current
usage experience, could I create in theory a task where people would look up
the best sights to see in the top 10 travel destinations?

Would this be a valid use case, and how would you deal with duplicates?

~~~
VLM
If you have the credits, survey how popular it is to drink and turk. I have
coworkers who think it's funny and claim it's popular in their younger peer
group. That anecdote is not actual data. I don't know if it's popular or almost
unheard of. I don't know if Amazon allows meta-gaming like this. Amazon does
not tolerate outright identity theft (send me your SS number to verify your
accuracy) but a simple question like "how many alcohol drinks have you had
today" is probably anonymous enough. Amazon spends a lot on advertising, so
you can see why editors would not cover this story assuming it were true
(which it might not be)

Drink and Turk is a drinking game where you try to turk enough to pay for your
alcohol. I've seen some non-cooperative behavior of trying to find the
funniest turk so the group laughs or give the most ridiculous answer still
providing payment. Most turking is pretty boring but you'd be surprised what
alcohol and youth and a slightly warped mind can do. There is supposedly a
weed variation on this game, you can guess its obvious single rule.

~~~
brianwawok
Want to cofound a bar? You get a tablet at each table, and have to earn enough
credits to get the next drink. No need to pay cash, drinks will be paid right
out of your turk balance.

------
k__
Does a turker make at least minimum wage? (8,50€/h here in Germany)

~~~
c0nducktr
It's very unlikely. I tried it a couple months ago just for the experience,
and in the ~1.5 hours I spent, I may have made $5. I honestly don't understand
why anyone would participate as a worker on the site.

~~~
nmat
That's more than the minimum wage where I live, and I am in Europe. I suppose
there are many countries with people that would be happy to work for that
rate.

~~~
corysama
From what I've read, there are 3 popular demographics that are happy working
on MTurk: 1) People living in areas where a non-strenuous, $5/hour is hard to
come by. 2) People who are bored and find MTurk a somewhat interesting way to
make some spare cash (stay-at-home parents, elderly, students) 3) People who
are bored of sitting in front of a screen at work all day with nearly nothing
to do (security guards, cashiers) and could use some extra cash

------
sqeaky
Even Firefox's new reader view didn't get rid of most of those things. But it
did fix the monospace font and contrast issues.

------
brilliantcode
Can turkers be asked to install a chrome extension or run javascript in their
consoles?

Also can you include javascript in the HIT HTML file?

------
duncanmeech
Needs more animated gifs. What was the blog post about again?

