Ask HN: Mechanical Turk & user testing data?
12 points by paul7986 on March 29, 2009 | 9 comments
We use Mechanical Turk a lot when adding a new feature. Before releasing it we want to know if the UI clearly defines this new feature.

Turk is great overall, and usually 7 out of 10 Turk testers are able to figure out the new feature through our UI. Still, I wonder: should I worry about the 3 (across various tests) whom our UI failed?

Would you not release an update until your UI passed the 10 out of 10 test?




Humm... I always wondered about the biases introduced by using Mechanical Turk for feature testing. I've seen it used a lot in several different contexts (like almost every other talk at WSCD'09 and WSDM'09). Isn't there a risk that we are training "professional beta testers" that can considerably skew the results?

It seems like a clear case where the different parties have very different incentives. The hackers want to have the best possible sample of users, while the Turks want to get their respective tasks done as quickly as possible so they can move on to the next paying one. I would think this would lead to Turks specializing in very specific (and relatively common and repetitive) tasks so they could maximize their throughput.

In your case, I would prefer to "watch" users out in the wild, instead of "professional" turks to truly assess the quality of a feature implementation.


That's a good point, almost like testing with users who are power users of your software. This would be fine, though, if your target users have the same experience as the turk users.


7 out of 10 is good enough to establish a pattern, but you should really try to figure out who your 10 turks were demographically before walking off into the sunset.

Ask some basic demographic questions if you can, and always try to avoid easily answered questions with turks, because in the end they are on turk to make a buck, and they will blow through scale questions and simple yes/nos faster than all get out to get paid.

The other thing I'd do is try to get some folks you know who use the tool, or, if it's for a broad audience, someone you know or a friend of a friend, and test them to see how they stack up against your turk findings. Videotape these sessions for the dev team so they can relive the awe of "oh my frickin god" when and if it occurs.

Overall turk is fine, and your 7 of 10 is fine if it's a broad-range acceptance target like "everyone searches" or whatnot. But if it's really demographic specific, you need to know that piece of data from the turks, because odds are that 10 really represents up to 3 or 4 different demographics, which makes your 7 more like 2 or 3.
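To put rough numbers on that last point, here is a minimal sketch of how much uncertainty sits behind a 7 of 10 result, and how it grows once the 10 testers are split into smaller demographic groups. It uses the Wilson score interval, which is my choice for illustration and not something anyone in this thread proposed. The function name `wilson_interval` and the 2 of 3 subgroup are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a pass rate.

    successes: testers who figured out the feature
    n: total testers
    z: normal quantile (1.96 is roughly a 95% interval)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# The 7 of 10 result from the original post.
low, high = wilson_interval(7, 10)
print(f"7 of 10: {low:.3f} to {high:.3f}")

# Hypothetical: only 3 of the 10 match your target demographic, and 2 passed.
sub_low, sub_high = wilson_interval(2, 3)
print(f"2 of 3:  {sub_low:.3f} to {sub_high:.3f}")
```

Under these assumptions, 7 of 10 is consistent with a true pass rate anywhere from roughly 40% to 89%, and the 2 of 3 subgroup is wider still, so a single batch of 10 says little about any one demographic.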


> Would you not release an update until your UI passed the 10 out of 10 test?

No. You cannot expect 100% of unqualified users to grok any UI.

> Should I worry about the 3 in which our UI failed them?

You should want to know why these 3 didn't pass your test.

To say much beyond this in the way of advice, I think we need to know about your UI, the test, and the worker incentives. What exactly did your HIT Description say?


Who will use the new feature? Why not directly observe a few users?

http://www.useit.com/alertbox/20030825.html

http://www.useit.com/alertbox/20030120.html


Do you know which users struggle with your UI?

Do you log which turkers can and can't use your app (e.g., are they old ladies or comp sci nerds)?

Turk is ok for small traction. It's better if you simply post a link here and we'll review it. Quality testers are hard to come by.


Has anyone ever posted a survey HIT which asks the Turk workers about their backgrounds (and posted the results)? I'd be interested to see what the workers' qualifications and backgrounds are, and where they are, native languages, etc.



Thanks for all the comments and helpful resources!

I wish HN had a section devoted to testing new features & UI! That would be great, as I'm not asking for feedback on my app (graciously received that already); I'd just like to know if the simple UI clearly defines our new feature.

Feel free to take the Turk - 9 assignments remain: http://tinyurl.com/cx5czv



