
India’s data labellers are powering the global AI race - yarapavan
https://factordaily.com/indian-data-labellers-powering-the-global-ai-race/
======
LogicRiver
This is the new back office for AI processes, and it looks like India is
tapping into it at just the right time, much like the BPO boom of the early
2000s.

~~~
Gpetrium
Unsurprising, since they have a strong tech workforce with low labor costs
and a language advantage (English). It will be interesting to see them move up
the value chain and the disruption that may create.

------
snrji
Unsupervised learning is the dark matter of AI, as LeCun said. We cannot rely
on labeled data, which is expensive and scarce.

The first company to figure out how to leverage the massive unlabeled data
that is available will win the AI arms race.

We have already seen impressive progress (language models, GANs...) but much
more work remains. Models requiring labeled data or only working with toy data
(even if unlabeled) will soon become irrelevant.

~~~
mijail
What's the current sentiment on synthetic data? It may be a self-serving
question since I work at a synthetic data company, but I'm curious, for fear
of being stuck in an echo chamber.

~~~
ska
It's no magic bullet. I think it can help in particular instances, but I see
a lot of people chasing their own tails.

------
uberneo
Well, that's a really great initiative, as it gives good money to local
villagers who are just educated enough to do this respectable office job. One
concern is how they handle sensitive client data, and why clients trust them
with sensitive proprietary data.

~~~
mark_l_watson
I agree that it is great to provide meaningful work and salaries that are
competitive by local norms.

re: client trust: compare to systems like Mechanical Turk. An established data
labeling company can monitor what employees are doing, provide ethics
training/warnings, etc.

------
Abishek_Muthian
There are captcha-defeating click farms in the country, whose employees are
paid ~$2/day.

I hope data-set labelling employees are at least in a better position, as
they are expected to have a better skill set. This is a better job than
working at illegal/unethical farms.

------
ashildr
Throwaway storyline: future AIs' perception is skewed by 'the' Indian
perspective on the world. The first self-aware AI will feel and act Indian -
whatever that means - and any other AI on a global scale, too :)

------
deppp
Late to the party. This article mentions AI-based labeling tools and we're
building one of them. If you're interested in trying it out, send me an email:
mik @ heartex.net

------
thisisit
One of the earliest companies I know of in this field was Playment, a YC
company that is also mentioned in the article:

[https://news.ycombinator.com/item?id=13640084](https://news.ycombinator.com/item?id=13640084)

------
baybal2
I wonder how much of it grew out of the "human captcha solving" market?

~~~
speeq
Meanwhile Google reCAPTCHA makes us label cars and road signs for free..

~~~
Cthulhu_
In exchange, we are exposed to a minimal amount of comment spam. If it's
anything like email, systems like captcha prevent 99.9% of spam messages.

Besides, how often do you see captchas anyway? If you're not using a heavily
privacy-focused or tracking-blocking browser, they'll remain hidden for the
most part, and the ones you see are the simple 'check this box' variety.

~~~
pavs
Because of how Cloudflare works, and how ubiquitous it is, internet users
from non-western countries in particular can get bombarded with reCAPTCHAs.
It is so prevalent and annoying that I had to resort to using a VPN (located
in a western country) to avoid this nuisance.

It literally breaks the internet for me. I had to go through reCAPTCHAs 12-15
times a day.

------
samtrack2019
We (humans) are helping our robot lords build Skynet!

~~~
martin_a
I, for one, welcome our new robot overlords.

------
drinane
I hope they have a sense of humor and drop some great Easter eggs for god 2.0

