
How Cheap Labor Drives China’s A.I. Ambitions - kawera
https://www.nytimes.com/2018/11/25/business/china-artificial-intelligence-labeling.html
======
stcredzero
_“I used to think the machines are geniuses,” Ms. Hou, 24, said. “Now I know
we’re the reason for their genius.”_

I recall John Searle's Google Tech Talk, when he said outright that AI was
nothing more than the automation and repackaging of the output of human
intelligence. Most of the room was clearly against him, just a bit short of
booing him. Nowadays, the "smart talk" around AI is exactly what he said.

For those who may not be aware, John Searle is notorious for his "Chinese
Room" argument against hard AI.

[https://en.wikipedia.org/wiki/Chinese_room](https://en.wikipedia.org/wiki/Chinese_room)

~~~
ThomPete
That argument is fundamentally flawed.

It correctly points out that the person in the room doesn't know Chinese but
it misses the bigger point.

Of course, the person in the room doesn't know Chinese just as the individual
neuron in the brain doesn't know Chinese.

The room or the house (i.e. the feedback loop) is the conscious part.

The correct answer is "we don't know". But we do know that we ourselves are
the result of emergent complexity built from much simpler structures, and that
those are built on even simpler structures, all the way down to some
primordial soup.

The mistake is isolating any individual element in the feedback loop and
claiming that because that doesn't have some "magic" properties the system as
a whole can't.

~~~
rjsw
The person in the room will also start to recognize patterns in the input
messages and the resulting output messages. It isn't the canonical
understanding of Chinese, but it is "an" understanding.

If Searle were correct, we could do an MRI on a blind person and detect when
their brain switched into "symbol processing mode" whenever a conversation
with them covered visual concepts.

~~~
mannykannot
The Chinese Room argument fails to make its case, but not for that reason.
Searle claims that no such instruction book could be written, and hence that
the experiment could not be performed, so the subject would never have the
chance to learn from the outputs it generated. And if the experiment could be
started, whatever the subject learned from performing the role would be moot:
merely by getting started, without initially knowing anything of the language
the questions were posed in, it would have refuted Searle.

The reason the argument fails to make its case is that Searle does not have an
effective response to the 'systems reply': in the paper, his response is a
mixture of appeals to intuition, together with the erroneous claim that if the
subject memorized the instructions, then the systems reply would not apply.

------
will_brown
Doesn’t Google’s captcha leverage free labor to train AI? Every time I get one
of their picture puzzles (click on all the pictures with cars, stores,
mountains, etc.), I get about two wrong on purpose. Even on government
services like business searches in the state of Delaware, I constantly
conclude I’m being forced to train Google’s self-driving cars, so I do the
only thing in my power as a protest (unfortunately I can’t boycott the
government systems).

I purposely leave one or two unclicked and click a false one. I have often
wondered how my behavior is classified and whether, at scale, it would have
any impact. My reasoning may be misguided, and I’m sure the impact is nil, but
it serves as a reminder to think about these systems, the things I do, and why
I do them. I also do similar things with chatbots; sometimes I’ll add
“thank-yous” and really try to interact genuinely, as though I don’t know I’m
talking to a bot. I like to think the chatbot devs are somewhere high-fiving,
putting screenshots of my chat in their slide decks.

~~~
jake_the_third
> I get about 2 wrong every time on purpose.

I do the same, but actually get google to accept bad data as good. The trick
is to get the system to trust you (e.g. by supplying one honest answer), then
selecting answers that the system is likely to be unsure about. If done
correctly, you can get them to accept a lot of bad input this way.

Like you, I am under no illusion that this will have some sort of effect on
the correctness of the resulting ML models, but I like to think that I am at
least delaying or otherwise decreasing the training rate.

Had Google adopted Stack Overflow's model of making the results of our unpaid
work available to us, I would have answered these challenges honestly.

~~~
will_brown
>but actually get google to accept bad data as good.

I should have been clearer, I purposely get them wrong and Google accepts the
bad data for me too.

------
baybal2
Unfortunately, the reported AI boom is of the same kind as Japan's "5th
generation" computing was.

When the nouveau riches bankrolling the AI party see that no amount of neural
algorithms adds up to "money out of thin air", they quickly lose interest,
even without ever finding out what neural algorithms actually are.

The atmosphere in the industry feels very much like pets.com 2.0.

Here in Shenzhen, I often ask those intrepid samodelkins "how are you supposed
to make money with it?" and never hear back a concise answer.

China's real economic strength does not lie with that type of enterprise.

~~~
NicoJuicy
They get billions in funding for a self-checkout register.

PS: Here in Belgium, the lines for a human cashier are longer than those for
self-checkout. Some say it would be better to make room for humans again.

~~~
shanghaiaway
Generally, self-checkout has a minimum of one staff member per checkout area
to show customers how the thing works.

~~~
NicoJuicy
One per six if they're arranged in a cube, though.

------
pcurve
"Two dozen young people go through photos and videos, labeling just about
everything they see. That’s a car. That’s a traffic light. That’s bread,
that’s milk, that’s chocolate. That’s what it looks like when a person walks."

But in the U.S., we do it for free via Captcha. ;-)

~~~
oh-kumudo
For captcha, you are pointing out the one or several pictures that don't
belong to a certain category, so it seems to me the service needs to know the
answer beforehand, doesn't it? It could be used for QA purposes, though.

~~~
agnokapathetic
You don't need ground truth for crowd-sourced annotation, just agreement
between n different users:

1. Randomly present the captcha to a few known-safe/low-risk users.

2. Compare their answers with the answers from higher-risk users' captcha
responses.
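A minimal sketch of how that agreement step might work. The function name and
the `min_agreement` threshold are illustrative assumptions, not anything
Google documents about its actual pipeline:

```python
from collections import Counter

def consensus_label(answers, min_agreement=0.75):
    """Return the majority answer if enough annotators agree, else None.

    `answers` is a list of labels from different low-risk users for the
    same image tile; no ground truth is needed, only mutual agreement.
    """
    if not answers:
        return None
    label, count = Counter(answers).most_common(1)[0]
    return label if count / len(answers) >= min_agreement else None

# Trusted users mostly said "car", so a higher-risk user's answer
# can later be checked against this consensus.
print(consensus_label(["car", "car", "car", "not_car"]))  # -> car
print(consensus_label(["car", "not_car"]))                # -> None
```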

------
nitwit005
> But the ability to tag that data may be China’s true A.I. strength, the only
> one that the United States may not be able to match.

Why would the United States need to match it? Presumably some of these
companies are doing outsourced work for foreign firms.

~~~
NicoJuicy
Africans are actually labelling data for multinationals right now.

Seems like a FOMO article.

~~~
ardy42
Part of the benefit is that Chinese taggers would have Chinese cultural
knowledge, which is probably required for problems like the bakery one
mentioned in the article (Africans probably don't eat steamed pork buns and
wouldn't be able to correctly tag them).

However, that would also mean the Chinese "AI" would have reduced
effectiveness outside of China, where different cultural knowledge is needed.

------
msamwald
I don't see how this is a competitive advantage when any company or
institution can outsource these kinds of annotation jobs to low-wage
countries.

~~~
CardenB
There's a lot of work involved in how the data is annotated and how the
labelers are managed. You have to minimize the error of the labels, because
they aren't perfect and the tasks can be more difficult than you might
realize.

Secondly, using the data is also difficult.

You have to decide what data you need, how you are going to get clean data,
and how you are going to use the data. That's the competitive advantage.
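One standard way to quantify the label error mentioned above is
inter-annotator agreement, for example Cohen's kappa. This is a generic
sketch, not anything from the article; the example labels are made up:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    assert len(a) == len(b) and a
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: probability both annotators pick label k independently.
    expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

# Two labelers tagging the same four images; a kappa around 0.5 suggests
# only moderate agreement, i.e. the task guidelines need tightening.
print(cohens_kappa(["cat", "dog", "cat", "dog"],
                   ["cat", "dog", "dog", "dog"]))  # -> 0.5
```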

------
bgee
I'm really surprised that nobody has mentioned ImageNet and its use of
Mechanical Turk (Amazon crowdsourcing platform) to label 14 million images
[0].

[0]:
[https://en.wikipedia.org/wiki/ImageNet](https://en.wikipedia.org/wiki/ImageNet)

------
billylindeman
It's not just China. Numerous Bay Area AI companies employ similar methods.

~~~
heinrichf
"Tech companies pay poor Kenyans to produce training data for AI"
[https://news.ycombinator.com/item?id=18369384](https://news.ycombinator.com/item?id=18369384)

------
NicoJuicy
Something to think about:

> “It was the same work, same movement, day after day,” said Yi Zhenzhen, a
> 28-year-old Ruijin employee who once worked at an electronic component
> company. “Now I have to use my brain a little bit.”

------
microwork
I'm the founder of an image labelling company, happy to answer any questions
you might have.

------
contingencies
Observation: it's highly inefficient to put everyone in the same physical
space when you could just use their cellphones. By the article's reckoning,
that's a loss of at least $21,000 per year on a business doing $2,000
projects.

------
qbig
Cheap labor is driving everything in China, not just
XXX-evil-sounding-taking-over-the-world ambitions. Is it just me, or is the
nytimes running out of shitty PR about China to write about?

------
Tsubasachan
Don't worry, Americans, us Euros will always prefer flying into California
over Shenzhen. We will stick with you until the end of... wait, this Xiaomi
smart home is 50% cheaper?

------
jaimex2
This is machine learning not AI.

~~~
cwyers
Pretty much every popularized AI advancement of the past decade has been
machine learning.

EDIT: To be clear, I think you're right that the term AI is being used
carelessly here, but that ship has sailed.

------
rahimnathwani
tl;dr

- supervised learning requires labelled data

- some companies specialise in labelling data for other companies
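The tl;dr in toy-code form: a 1-nearest-neighbor "model" that is nothing
without human-labelled pairs. The feature vectors and labels below are made
up purely for illustration:

```python
def nearest_neighbor(train, query):
    """Predict the label of the training point closest to `query`.

    `train` is a list of (feature_vector, label) pairs; every label
    had to be supplied by a human annotator.
    """
    def sq_dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))
    return min(train, key=lambda pair: sq_dist(pair[0], query))[1]

labelled = [((0.0, 0.0), "bread"), ((1.0, 1.0), "milk")]
print(nearest_neighbor(labelled, (0.9, 0.8)))  # -> milk
```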

------
friedman23
I'm very skeptical that employing human labelers will generate enough training
data. I don't see how this method is superior to what Google is doing with
captcha.

Also this is already being done by many American corporations hiring people in
countries like Bangladesh.

~~~
linkingday
Amazon already has a Mechanical Turk service for labelling data; no need to
actually put people on your payroll for this.

