
AI Defeats the Hivemind - J3L2404
http://www.technologyreview.com/blog/mimssbits/26186/
======
futuremint
Wait... the naive-Bayes classifier was trained on Yelp data? Isn't Yelp data
_also_ crowd-sourced information? I may not be thinking about this right, but
it seems to me that if you train the classifier on crowd-sourced data and then
compare that to Mechanical Turk... in the end you're just comparing the
quality of two crowd-sourced data sets against each other?

~~~
JonnieCache
In fact you're comparing individually crowdsourced data to massively
mechanically aggregated crowdsourced data. When viewed like this the results
are not in the least surprising.

------
nicpottier
Making good Turk tasks is a science in and of itself. Figuring out the
incentive is the key, and sometimes you have to think a bit outside the box.

We actually used turking at my company for some really nutty stuff: logo
generation. Basically we'd give people a URL and ask them to generate a 160x40
logo for it. We had some base rules, like the background had to be solid, have
no scaling artifacts, etc.

We assigned each logo to five people.

Our reward was essentially this:

  - anybody who met all the rules got 25c
  - the best of all that met the rules got a 50c bonus

It took a few days for people to get the hang of it, but after that we
consistently got excellent results, with some really creative stuff coming
back. Yes, we were paying up to $1.50 for the logos, but we weren't using them
for every site, only the really popular ones, and having it automated made it
worth it. Every day we spent maybe 60 seconds picking the best logo of five
submissions for a few dozen sites, everything else was automated.

The product that used these by the way is NewsRoom, a pretty sexy RSS reader
available on Android. All the logos you see for sites there were generated by
Turkers.

Anyways, finding the right equation for that task took some experimentation,
but I was impressed by the results in the end.

------
bad_user

          79 passed. This was an extremely basic multiple choice test.
          It makes one wonder how the other 4,581 were smart enough to
          operate a web browser in the first place.
    

I stopped reading right there.

As for the question itself, that's simple: people come for the money, and
since "Turkers" are paid pennies for those tasks, they have to do a lot of
them; so replying randomly on a test is a no-brainer (I wouldn't even bother
to click and type, I'd just write a script).

It's a good thing we've got these magazines reminding us how we are so smart
and the rest of the world is so stupid. What would I do without my over-
inflated ego?

~~~
LiveTheDream
If you had kept on reading, you would have seen that the article specifically
identifies low wages as a likely cause for the low quality.

~~~
bad_user
Yes, but why keep reading an article that insults people? The reason for the
low accuracy was not the point of my comment.

My own father is "not smart enough" to operate a browser. His lack of English
skills doesn't help him. But he can read French and Russian just fine, he has
a Ph.D. in his profession and a career in politics (former advisor to the
prime minister, currently a senator in an Eastern European country).

------
carbocation
They don't mention the price per HIT. If they're paying between $0.01 and
$0.05 for these HITs, I'm not surprised by these results.

I looked at the cited paper and did not see the cost, but without the cost I
really would not bother interpreting these results. "Machines work for
electricity; humans need real money. News at 11."

~~~
mattmcknight
Who is to say that the mechanical turk-ers aren't AI?

~~~
d4nt
Now there's an idea. It would be a beautiful irony if, in a few years from
now, the mechanical turk API was used as an open platform for AI applications
to make money solving difficult problems.

~~~
JonnieCache
Well according to the article, this would be lucrative to some extent right
now. As ever it would be a problem of matching problems to algorithms.

EDIT: maybe we can get the real MTers to do the algorithm/problem matching
bit...

------
nl
Did anyone else read the paper? The summary doesn't seem accurate to me.

From the summary:

 _The results weren't pretty: in order to find a population of Turkers whose
work was passable, the researchers first used Mechanical Turk to administer a
test to 4,660 applicants. It was a multiple choice test to determine whether
or not a Turker could identify the correct category for a business
(Restaurant, Shopping, etc.) and verify, via its official website or by phone,
its correct phone number and address.

79 passed. This was an extremely basic multiple choice test. It makes one
wonder how the other 4,581 were smart enough to operate a web browser in the
first place._

From the paper:

 _Of the 4,660 workers who took this test, only 1,658 (35.6%) workers earned a
passing score, and over 25% of workers answered fewer than half of the
questions correctly.

To investigate the high failure rate, we conversed with workers directly on
TurkerNation and through private email. Based upon worker’s names and email
addresses, we believe that we conversed with a representative sample of
workers both inside and outside the United States. We found that the test was
not too difﬁcult and that most workers comprehended the questions. We believe
that many applicants simply try to gain access to tasks as quickly as possible
and do not actually put care into completing the test._

i.e., 1,658 of 4,660 workers passed this test, NOT 79 (!!)

Then later they describe some additional filtering they put in place to
attempt to find the best workers (they tried estimated location and time to
complete task). Based on these filters they said: _Using a combination of pre-
screening and the test tasks described above, only 79 workers of 4,660
applicants qualiﬁed to process real business changes._
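For what it's worth, the arithmetic checks out using only the numbers quoted above. A quick sanity check:

```python
# Reconciling the two figures quoted from the paper: 1,658 of 4,660
# applicants passed the multiple-choice test (the paper's 35.6%), while
# only 79 survived the combined test plus pre-screening filters.
applicants = 4660
passed_test = 1658
fully_qualified = 79

print(round(100 * passed_test / applicants, 1))      # 35.6
print(round(100 * fully_qualified / applicants, 1))  # 1.7
```

So the article's "79 passed" conflates the test itself with the much stricter combined filter.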

------
john_horton
I was at NIPS and talked to one of the authors. I thought the paper was
interesting, but I think the "you're not paying enough" critique is spot on.
Humans clearly _can_ be better at this task---you just can't give them strong
incentives to cut corners on quality, which happens with a low piece-rate and
a task that takes on the order of 3-4 minutes to do properly.

------
JonnieCache
Am I right in thinking that a naive Bayes classifier is beyond "not even the
best out there," and is in fact about as simple a learning algorithm as you
can get, and straight out of AI 101?

~~~
gjm11
Pretty much, yes. (Though that doesn't mean it's not a good technique. Lots of
quite effective spam filters are more or less naive-Bayes.)
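For the curious, the whole technique really does fit in a few lines. A minimal sketch of a naive-Bayes text classifier of the kind spam filters use (the training data and whitespace tokenizer here are simplified illustrations, not a production setup):

```python
# Minimal naive-Bayes spam filter: count word frequencies per class,
# then score a new message by summing per-word log-likelihoods plus
# the log prior, with Laplace smoothing for unseen words.
import math
from collections import Counter

def train(messages):
    """messages: list of (text, label) pairs, label 'spam' or 'ham'."""
    counts = {"spam": Counter(), "ham": Counter()}
    totals = Counter()
    for text, label in messages:
        for word in text.lower().split():
            counts[label][word] += 1
        totals[label] += 1
    return counts, totals

def classify(text, counts, totals):
    vocab = set(counts["spam"]) | set(counts["ham"])
    scores = {}
    for label in ("spam", "ham"):
        denom = sum(counts[label].values()) + len(vocab)  # Laplace smoothing
        score = math.log(totals[label] / sum(totals.values()))  # log prior
        for word in text.lower().split():
            score += math.log((counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

data = [("buy cheap pills now", "spam"),
        ("cheap viagra offer now", "spam"),
        ("meeting notes attached", "ham"),
        ("lunch at noon tomorrow", "ham")]
counts, totals = train(data)
print(classify("cheap pills offer", counts, totals))     # spam
print(classify("notes from the meeting", counts, totals))  # ham
```

The "naive" part is the final loop: each word's likelihood is multiplied in as if the words were independent given the class.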

~~~
abeppu
They're sometimes a good technique only because some problems are really
simple. There are almost no problems where the extreme independence
assumptions of naive Bayes create a reasonable likelihood function. The
consequence is that when it's wrong, it tends to be very, very certain
that it's right. I think the aphorism that gets passed around is "Naive Bayes
classifiers are often in error but never uncertain".
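The "never uncertain" failure mode is easy to demonstrate numerically. A small sketch (the likelihood values are made up for illustration): counting the same correlated evidence several times as if it were independent drives the posterior toward certainty, even though no new information has arrived:

```python
# Why naive Bayes is "often in error but never uncertain": treating
# correlated features as independent multiplies the same evidence in
# repeatedly, pushing the posterior toward 0 or 1.

def nb_posterior(likelihood_a, likelihood_b, copies):
    """Posterior P(A | evidence) under equal priors when one binary
    feature is (wrongly) counted `copies` times as independent."""
    pa = likelihood_a ** copies
    pb = likelihood_b ** copies
    return pa / (pa + pb)

# One genuine observation: mildly informative.
print(round(nb_posterior(0.8, 0.6, 1), 3))   # 0.571

# Ten correlated copies of that observation: near certainty.
print(round(nb_posterior(0.8, 0.6, 10), 3))  # 0.947
```

With enough correlated features the posterior saturates, which is exactly the overconfidence described above.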

~~~
gjm11
Yup. But some problems -- for instance, discriminating between spam and non-
spam emails, and keeping up decent discrimination as spammers vary their
tactics -- are (1) "really simple" in that sense and (2) apparently quite
difficult to solve, given that there basically were no really effective spam
filters before naive-Bayes ones came along.

------
yarapavan
The original NIPS 2010 paper that sourced this article is "Towards Building
a High-Quality Workforce with Mechanical Turk". Available at
[http://www.cs.umass.edu/~wallach/workshops/nips2010css/paper...](http://www.cs.umass.edu/~wallach/workshops/nips2010css/papers/wais.pdf)

HN submission of the same (13 days ago) here:
<http://news.ycombinator.com/item?id=1984130>

------
ianferrel
Does this indicate that the majority of Turkers are _already_ just simple
scripts? Perhaps just not as well adapted to particular problem sets as this
custom-built one was.

