We use Mechanical Turk a lot when adding a new feature. Before releasing it, we want to know whether the UI presents the new feature clearly.
Turk is great overall, and usually 7 out of 10 Turk testers are able to figure out the new feature through our UI. Still, I wonder: should I worry about the 3 (across various tests) for whom our UI failed?
Would you not release an update until your UI passed the 10 out of 10 test?
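One thing worth keeping in mind before agonizing over the 3: with only 10 testers per run, the uncertainty around 7/10 is very wide. A quick sketch in Python (assuming scipy is available; the 7/10 figure is just the number reported above, and independence between testers is an assumption):

    from scipy.stats import binomtest

    # 7 of 10 Turk testers found the new feature unaided (the reported numbers).
    result = binomtest(k=7, n=10)

    # Exact (Clopper-Pearson) 95% confidence interval for the true discovery rate.
    ci = result.proportion_ci(confidence_level=0.95)
    print(f"observed rate: {result.statistic:.0%}")   # 70%
    print(f"95% CI: {ci.low:.0%} to {ci.high:.0%}")   # roughly 35% to 93%

In other words, at that sample size the true discovery rate could plausibly be anywhere from about a third of users to nearly all of them, so the 7-vs-3 split by itself says less than it seems.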
It seems like a clear case where the different parties have very different incentives. The hackers want the best possible sample of users, while the Turks want to get their respective tasks done as quickly as possible so they can move on to the next paying one. I would think this would lead to Turks specializing in very specific (and relatively common and repetitive) tasks so they could maximize their throughput.
In your case, I would prefer to "watch" users out in the wild, rather than "professional" Turks, to truly assess the quality of a feature implementation.
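(To make that concrete: "watching in the wild" can be as lightweight as logging an exposure event and a first-use event for the feature, then comparing the two. A rough sketch, where the event names and fetch_events() are hypothetical placeholders for whatever analytics pipeline you already have:)

    # Rough sketch: measure real-world discovery instead of a Turk pass rate.
    # `events` is an iterable of (user_id, event_name) pairs; the event names
    # and fetch_events() below are placeholders, not a specific product's API.
    def discovery_rate(events):
        exposed, discovered = set(), set()
        for user_id, event_name in events:
            if event_name == "new_feature_visible":
                exposed.add(user_id)
            elif event_name == "new_feature_used":
                discovered.add(user_id)
        return len(discovered & exposed) / len(exposed) if exposed else 0.0

    # e.g. discovery_rate(fetch_events()) -> 0.63 means 63% of users who saw
    # the entry point actually found their way to the feature.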