

Ask HN: Mechanical Turk & user testing data? - paul7986

We use Mechanical Turk a lot when adding a new feature. Before releasing it we want to know if the UI clearly defines this new feature.

Turk is great overall, and usually 7 out of 10 Turk testers are able to figure out the new feature from our UI. Though I wonder, should I worry about the 3 (various tests) in which our UI failed them?

Would you not release an update until your UI passed the 10 out of 10 test?
======
Anon84
Humm... I always wondered about the biases introduced by using Mechanical Turk
for feature testing. I've seen it used a lot in several different contexts
(like almost every other talk at WSCD'09 and WSDM'09). Isn't there a risk that
we are training "professional beta testers" that can considerably skew the
results?

It seems like a clear case where the different parties have very different
incentives. The hackers want to have the best possible sample of users, while
the Turks want to get their respective tasks done as quickly as possible so
they can move on to the next paying one. I would think this would lead to
Turks specializing in very specific (and relatively common and repetitive)
tasks so they could maximize their throughput.

In your case, I would prefer to "watch" users out in the wild, instead of
"professional" turks to truly assess the quality of a feature implementation.

~~~
oomkiller
That's a good point, almost like testing with users that are power users for
your software. This would be fine though if your target users have the same
experience as the turk users.

------
systemtrigger
> Would you not release an update until your UI passed the 10 out of 10 test?

No. You cannot expect 100% of unqualified users to grok any UI.

> Should I worry about the 3 in which our UI failed them?

You should want to know _why_ these 3 didn't pass your test.

To say much beyond this in the way of advice I think we need to know about
your UI, the test and the worker incentives. What exactly did your HIT
Description say?

------
floozyspeak
7 out of 10 is good enough to establish a pattern, but you should really try
and figure out who your 10 turks were demographically before walkin off into
the sunset.

do some basic demographic questions if you can, and always try to avoid
easily answered questions with turks because in the end they are on
turk to make a buck, and they will blow thru scale questions and simple
yes/nos faster than all get out to get paid.

the other thing i'd do is try to get some folks you know who use the tool, or
if it's for a broad audience get someone you know, a friend of a friend, and
test them to see how they stack up against your turk findings. videotape
these sessions for the dev team so they can relive the awe of "oh my frickin
god" when and if it occurs

overall turk is fine and your 7 of 10 is fine if it's a broad range acceptance
target like "everyone searches" or what not, but if it's really demographic
specific you need to know that piece of data from the turks because odds are
that 10 really represents up to 3 or 4 different demographics, which makes
your 7 more like 2 or 3
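
To put rough numbers on how little 7 of 10 (or 2 of 3) pins down, here is a minimal sketch using the Wilson score interval for a pass rate. The function name `wilson_interval` and the "2 of 3 in the target demographic" split are illustrative assumptions, not figures from the original poster's tests.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    )
    return max(0.0, center - half_width), min(1.0, center + half_width)

# All 10 turk testers, 7 of whom figured out the feature.
overall = wilson_interval(7, 10)

# Hypothetical split: only 3 testers match the target demographic, 2 passed.
subgroup = wilson_interval(2, 3)

print("overall 7/10:  %.0f%% to %.0f%%" % (overall[0] * 100, overall[1] * 100))
print("subgroup 2/3:  %.0f%% to %.0f%%" % (subgroup[0] * 100, subgroup[1] * 100))
```

With 10 testers the plausible pass rate spans roughly 40% to 89%, and a 3-person subgroup is wider still, so the demographic breakdown matters more than the headline 7.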

------
tokenadult
Who will use the new feature? Why not directly observe a few users?

<http://www.useit.com/alertbox/20030825.html>

<http://www.useit.com/alertbox/20030120.html>

------
pclark
Do you know _what_ users struggle with your UI?

Do you log which turkers can and can't use your app (e.g., are they old ladies
or comp sci nerds?)

Turk is _ok_ for small traction. It's better if you simply post a link _here_
and we'll review it. Quality testers are hard to come by.

------
frisco
Has anyone ever posted a survey HIT which asks the Turk workers about their
backgrounds (and posted the results)? I'd be interested to see what the
workers' qualifications and backgrounds are, and where they are, native
languages, etc.

~~~
menloparkbum
A few people have researched Turker demographics:

[http://behind-the-enemy-
lines.blogspot.com/2008/03/mechanica...](http://behind-the-enemy-
lines.blogspot.com/2008/03/mechanical-turk-demographics.html)

[http://behind-the-enemy-lines.blogspot.com/2008/03/why-
peopl...](http://behind-the-enemy-lines.blogspot.com/2008/03/why-people-
participate-on-mechanical.html)

<http://waxy.org/2008/11/the_faces_of_mechanical_turk/>

There may be others. Those are just what I had saved in my bookmarks.

------
ryanspahn
thanks for all the comments and helpful resources!

I wish HN had a section devoted to testing new features & UI! That would be
great, as I'm not asking for feedback on my app (graciously received that); I'd
just like to know if the simple UI defines our new feature.

Feel free to take the Turk - 9 assignments remain: <http://tinyurl.com/cx5czv>

