
The rise of 'pseudo-AI': how tech firms quietly use humans to do bots' work - YeGoblynQueenne
https://www.theguardian.com/technology/2018/jul/06/artificial-intelligence-ai-humans-bots-tech-companies
======
deepGem
Am glad this is highlighted now. Back in 2015, when we wanted to build a
meeting-scheduling bot, we naively thought we could use only machines to get
the job done. 3 months later we realized that was in no way feasible, not then
and not in the next 10 years. So the common feedback we got was to just use
low-cost labor in India/the Philippines to get the job done. To us, that was a
no-go because we kept privacy as the top criterion for doing anything with
people's inboxes. Even after obfuscation, we didn't want any human to read or
parse someone's private messages. So we dropped the idea and shut shop. Sure,
we failed, but at least we were true to our conscience.

Another way to have tackled this problem was to just go down the research
route and build a product once the ML was fully baked, but VCs won't fund such
adventures.

To get VC funding, we would have had to lie through our teeth, which was a
total no-go. I am not making a righteousness argument, but for once I feel
really good that the views we held then are corroborated now; some kind of
confirmation bias kicking in.

~~~
evrydayhustling
It's great that you stuck to your guns, but there's another ethical path
forward: transparency. Many companies already pay third-party employees to
look at their schedules, so it's not a non-starter. And there are a few VCs
out there who understand training costs for AI and are willing to engage with
a journey that includes them - as long as the cards are on the table.

What we need is a set of accounting metrics about the cost of training and the
rate of improvement, so that VCs can get comfortable making projections rather
than choosing between buying snake oil and not participating in the AI dev
cycle.

PS - emphatically _not_ taking aim here at your specific decision, OP.
Startups have different opportunities and interests, and these decisions are
tough. I'm sure you know better than I do whether the ethical wizard-of-oz
model above is relevant to you in particular.

~~~
deepGem
Thank you. The latter point you mention, about accounting metrics, is very
valid. I will think along these lines. I am very much interested in reaching
out to VCs who are supportive and understand training costs for AI. If you
know of any, please do point them out.

------
adamdrake
I gave a lecture a couple of months ago called Developing your AI BS Detector
addressing this very topic.

[https://adamdrake.com/developing-your-ai-bs-detector.html](https://adamdrake.com/developing-your-ai-bs-detector.html)

The main thesis is that there seem to be more and more companies out there
solving interesting problems, which by itself is great, but bolting a lot of
AI wording on top. Most of this seems to be an attempt to differentiate
themselves in the market and access funding, and I find it incredibly
dishonest.

The whole discussion around AI these days has become so tainted by shysters
trying to attract funding and attention that I actively try to distance myself
from any association with it.

We need to focus a lot more on Intelligence Amplification (IA)
([https://en.wikipedia.org/wiki/Intelligence_amplification](https://en.wikipedia.org/wiki/Intelligence_amplification)),
i.e., building tools for humans to become more productive, and less on AI.

Douglas Engelbart and others had this figured out 50 years ago (see: the
Mother of All Demos). The AI hype is dangerous, since it will no doubt lead to
the trough of disillusionment.

~~~
dfee
Well the idea is that all emerging ideas with broad appeal go through the Hype
Cycle. I don’t know that AI is different or dangerous in that regard.

~~~
adamdrake
I'm not of the opinion that it is dangerous as an idea, but rather that the
intentional deception I see on the AI topic is dangerous (not to mention
unethical).

I can't think of a single example of an "AI" company I talked with that did
not know it was explicitly exaggerating its claims and capabilities for the
express purpose of attracting funding and fueling the hype fire.

The solid tech startups I've seen are focused on the problem they are solving
for customers rather than the tooling (AI).

~~~
mrhappyunhappy
Not to mention that AI as we define it does not actually exist yet. Anyone who
uses that term is doing so dishonestly. Show me any intelligent code and I'll
shut up. Until then I call BS on anything that claims to be powered by AI.

~~~
Retric
I think you can reasonably call self driving cars AI without it being BS. Many
things like translation may qualify based on how you frame the question, but
driving is an open ended real world task.

------
hyperrail
My favorite example of this sort of thing, mentioned in the Guardian story, is
the kerfuffle late last year when Expensify's receipt-reading SmartScan
feature was found out to be partly backed by human receipt readers:
[https://qz.com/1141695/startup-expensifys-smart-scanning-technology-used-humans-hired-on-amazon-mechanical-turk/](https://qz.com/1141695/startup-expensifys-smart-scanning-technology-used-humans-hired-on-amazon-mechanical-turk/)

(I don't file expenses often, but personally when I do so through Expensify, I
find SmartScan so slow that I now assume _all_ my "SmartScanned" receipts are
going to a human.)

~~~
baybal2
Quote from the article

>How to start an AI startup

>1. Hire a bunch of minimum wage humans to pretend to be AI pretending to be
human

>2. Wait for AI to be invented

~~~
chris123
Translation: _Lean Startup Concierge / Wizard of Oz MVP_. For anyone wanting
to learn or refresh on that, here's a decent article I just Googled up (I
reviewed a few and picked this one): [https://grasshopperherder.com/concierge-vs-wizard-of-oz-test/](https://grasshopperherder.com/concierge-vs-wizard-of-oz-test/)

------
dalbasal
I think this article may be overstating a little.

First, prototyping/bootstrapping with humans is not a terrible idea. It takes
a hurdle and moves it down the line a bit. You still need to get over that
hurdle. It's still a potential failure point, but it's not an irrational
approach. There are a ton of businesses started on the basis of:

    
    
      1. make free service
      2. get 1bn customers
      3. ???
      4. profit
    

Twitter, Google, Facebook... They deferred that little "make money" hurdle
until they were ready for IPOs. Obviously, this opens a door for stupid plans,
where the AI task will not be achievable. But, the "monetisation step" has
comparable problems too.

The deception is a problem, though. Building companies based on deception _is_
a problem and will probably lead to a nasty crash.

Second, if the goal is a result rather than to use AI... Say you're trying to
solve some specific transcription use case, like Spinvox or Expensify. First,
the problem seems solvable or soon to be solvable. Second, it's helpful to
have a stream of transcription attempts coming to you with real-life accuracy
requirements. Third, it's not a bad way of approaching human-AI hybrid
systems. Start with 100% human labour and gradually hybridise. Maybe you never
solve the hard problems that will get you to 100% AI, but you find ways of
leveraging human labour to solve it efficiently.

If what they reach is a tech/process that allows a person to transcribe 10k
words per hour... that's pretty good too.

~~~
gaius
_First, the prototyping /bootstrapping with humans is not a terrible idea_

Well, it might be. Are human drivers _really_ prototyping/bootstrapping self-
driving cars? We are deeply into the territory of Moravec’s Paradox here.

~~~
dalbasal
Interesting, care to elaborate?

For a lot of NN/ML applications, the magic ingredient is "humans" and a
record of humans doing something enough times to describe statistically. AI
sign recognition, sentence completion or checkers playing is very often based
on estimates of "what would a human do."

"Would a human say this photo contains cats?" is really how a lot of ML
interprets the question "where is my cat?"

~~~
gaius
_Interesting, care to elaborate?_

Moravec's Paradox is that everyone thought that sensory input would be easy
and reasoning about those inputs would be hard. But it turns out that the
sensory input part is very hard, much harder than anyone thought, and once you
have that down, the reasoning is actually simple. So any service that is
relying on humans doing the sensory input bit is handwaving away the difficult
part. Humans aren't really aware of how much processing is involved in things
we take for granted such as sight and hearing. We think that thinking is
"special" but it requires relatively little power to do that once the
underlying hardware/wetware has already processed the signals.

For example, a fundamental bit of logic is

    
    
        if pedestrian():
            apply_brakes()
    

Which is trivial... once you have a pedestrian() function that is.
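
To make that concrete, here's a rough sketch of a pedestrian() built on
OpenCV's stock HOG-plus-SVM people detector (the image path is a placeholder,
and a single-frame detector like this is nowhere near self-driving grade; it
just shows where the weight of the problem sits):

    import cv2  # pip install opencv-python

    # Decades of computer vision research, hiding behind two lines
    # of setup: the bundled HOG descriptor + linear SVM detector.
    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

    def pedestrian(frame):
        # Returns bounding boxes and confidence weights for people.
        boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8))
        return len(boxes) > 0

    def apply_brakes():
        pass  # actuating hardware is its own hard problem

    # The "trivial" part, now runnable against a placeholder image:
    frame = cv2.imread("road.jpg")
    if pedestrian(frame):
        apply_brakes()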

 _" Would a human say this photo contains cats", is really how a lot of ML
interprets the question "where is my cat"?_

Yes and no. If you ask a human, why do you think this is a cat, he or she will
say, well look at the whiskers and the ears and the paws and the general
overall cuteness. But an NN doesn't have a whisker coefficient or a cuteness
multiplier. It doesn't "think" like a human and nor should we expect it to. ML
that is more human-like is a decision tree, but once you start getting into
random decision jungles and the like, the explainability starts to diminish.
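
To illustrate the contrast (a sketch assuming scikit-learn, with iris as a toy
dataset): a small decision tree's learned rules print as readable if/else
statements, which is exactly what you lose with jungles and nets:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)
    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

    # The fitted model is a human-readable set of if/else rules;
    # a neural net's weight matrices admit no such direct reading.
    print(export_text(tree, feature_names=[
        "sepal_len", "sepal_wid", "petal_len", "petal_wid"]))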

~~~
dalbasal
I didn't mean that it thinks like a human, rather that its thinking is based
on a statistical aggregation of a bunch of humans thinking. 1,000 humans
identify cats. Train an NN to identify cats based on that training set.
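
Sketched in code (scikit-learn assumed; the "photos" here are random
placeholder vectors, not real data):

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    # Stand-in for feature vectors of 1,000 photos, plus the
    # "is this a cat?" answers collected from humans.
    rng = np.random.default_rng(0)
    photos = rng.normal(size=(1000, 64))
    human_says_cat = rng.integers(0, 2, size=1000)

    # The net learns to predict what a human would have said.
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)
    clf.fit(photos, human_says_cat)

    new_photo = rng.normal(size=(1, 64))
    print(clf.predict(new_photo))  # 1 ~ "a human would say: cat"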

The point about the paradox is a good one, and very relevant to this issue. I
can't tell if it is a paradox about computers/intelligence or a point about
people. I suppose it's all the same when it comes to building wizard-of-Oz
companies: because of Moravec's Paradox, you will probably misidentify the
easy and/or hard parts of the problem you are trying to solve with AI.

------
bencollier49
Great to see the Guardian referencing back to Spinvox here, whose speech-to-
text service turned out to be largely run by sweatshop workers in the
Philippines. And a warning from history: this shtick works as long as you can
transition to AI. If not, then the service will become increasingly flaky
until the business collapses.

~~~
thaumasiotes
> a warning from history: This shtick works as long as you can transition to
> AI. If not, then the service will become increasingly flaky until the
> business collapses.

The business model "use low-paid labor to service wealthy clients" seems a
little more inherently stable than that. In most cases, nobody expects a
collapse. Why in speech-to-text?

~~~
edanm
Say you're not profitable, and raise money on the expectation that you can
eventually lower costs and only then become profitable. If you then find out
you can't lower costs (i.e. you can't automate something you could, and keep
having to rely on more costly labor), then you eventually collapse.

~~~
randomsearch
Uber

~~~
specialist
Uber, Airbnb, others get their "partners" to provide the working capital.

This is a new take on the "value added reseller" business model. (I don't know
what else to call it. Multi-level marketing?)

For instance, Autodesk's dealer channel covered the costs of marketing, sales,
technical support, training, etc., while Autodesk kept the sweet nectar of
profits for itself.

~~~
microtherion
> This is a new take on the "value added reseller" business model. (I don't
> know what else to call it. Multi-level marketing?)

"franchising" ?

------
xamuel
Absolutely unsurprising given the current state of hype.

How to get rich: (A) Start a page that applies Photoshop effects to user-
uploaded pictures. (B) Secretly accomplish A using low-paid grunts. (C) Claim
it's done with AI; obtain a billion dollars of VC money.

~~~
raverbashing
You don't even need to do it secretly, just say it's the next step. See: Uber

~~~
googlemike
But Uber has what seems to be a world-class machine learning lab (if that's
not the heart of AI, what is?). I am not well versed enough to compare it to
Google's or OpenAI's, but it surely seems like they are at least trying to
push the research envelope?

~~~
gaius
Evading regulators (Greyball) is a very different problem domain to self-
driving cars

~~~
googlemike
I was not speaking about either, actually:

[https://github.com/uber/pyro/](https://github.com/uber/pyro/)

and

[https://github.com/uber/horovod](https://github.com/uber/horovod)

I'm by no means a machine learning expert, but sometimes on a Sunday I'll pour
myself a mug of coffee and go through some neat tutorial, and sometimes the
stuff is Uber's.

They are by no means free from criticism for their wrongs, but to ignore what
they have done correctly paints the world far too simply.

------
User23
AI is nonsense. Dijkstra was right.

I'm not saying that silicon/mechanical intelligence isn't possible. I'm
unaware of any physical law that precludes it. But what we currently call "AI"
is just the pathetic fallacy run wild.

All that said, multidimensional data-driven linear recognizers are pretty
impressive.

~~~
davnn
> AI is nonsense. Dijkstra was right.

Define AI first.

One of the first few lines on Wikipedia about AI: The scope of AI is disputed:
as machines become increasingly capable, tasks considered as requiring
"intelligence" are often removed from the definition, a phenomenon known as
the AI effect, leading to the quip, "AI is whatever hasn't been done yet."

~~~
ekianjo
> "AI is whatever hasn't been done yet."

If you can't replicate what a human does, it's not AI. The fact that we can
only beat humans at very, very narrow applications/games and that we don't
have a generalized model for learning is a clear failure of the AI hype.

~~~
edanm
It's totally valid to define AI this way.

But just keep in mind, when most people talk about AI, they're knowingly
talking about something much more limited. So you're going to constantly have
communication failures with people who define AI differently than you do.

(Many people nowadays use the term AGI [Artificial General Intelligence] to
mean what you think of as AI, btw).

~~~
ekianjo
Yeah, I know about AGI, but I don't like that term because it implies that the
classifiers we have nowadays are good enough to be called "intelligence". They
are just statistical models with a great number of layers, nothing else.

~~~
YeGoblynQueenne
Your description ("statistical models with great number of layers") tells me
you're talking about neural networks. However, we have "nowadays" many
classifiers that are not neural networks and therefore have no layers of any
sort, like SVMs, KNN or logistic regression and are not even statistical, like
decision trees/forests.
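
To make that concrete, a quick sketch (scikit-learn naming assumed, iris as a
stand-in for any labeled dataset): four classifiers, none of them a neural
network, none with layers of any sort:

    from sklearn.datasets import load_iris
    from sklearn.svm import SVC
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)

    # Same fit/predict interface, wildly different internals,
    # and not a layer in sight.
    for model in (SVC(), KNeighborsClassifier(),
                  LogisticRegression(max_iter=1000),
                  RandomForestClassifier()):
        print(type(model).__name__, model.fit(X, y).score(X, y))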

I should also point out that literally _all_ the classifiers "we have
nowadays" as per your comment, have been known for at least 20 years
(including deep neural networks).

I'm pointing all this out because your comment suggests to me that your
knowledge of AI, and machine learning in particular, is very recent and goes
back perhaps five or six years, to when deep nets popularised the field.

If that is so, please consider reading up on the history of AI. It is an
interesting field that goes back several decades and has had many impressive
successes (and some resounding failures) that predate deep learning by many
years. I recommend the classic AI textbook "Artificial Intelligence: A Modern
Approach" by Stuart Russell and Peter Norvig. You'll notice there that, even
in recent editions, machine learning is a tiny part of the material covered,
because there is so much more to AI than just deep neural networks or
statistical classifiers.

If I'm wrong, on the other hand, and you already have a broad knowledge of the
field, then I apologise for assuming too much.

~~~
User23
Peter Norvig's Paradigms of AI Programming: Case Studies in Common Lisp is a
good read too.

------
nostrademons
A lot of this comes from the availability of cheap hype-driven capital. The
way it's supposed to work is that AI lets you replace expensive human labor
with cheap computers, so that a job that might've cost $10 in wages instead
costs $0.001 in server time. The way it actually works is that you tell a
bunch of investors that you've got an AI that reduces the cost of X by a
factor of 10,000, they give you $100M in capital, and now you have so much
money available that you can pay the original $10 in wages and worry about how
to _actually_ reduce the cost of X by 10,000x later, usually once it's clear
that no more funding is forthcoming.
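
Back of the envelope, using those round numbers:

    wage_per_task = 10.00       # human cost per job, per the pitch
    server_per_task = 0.001     # the claimed AI cost: 10,000x cheaper
    funding = 100_000_000       # what the claim raises

    # Jobs you can quietly service by hand before the money runs out:
    print(f"{funding / wage_per_task:,.0f}")    # 10,000,000
    # Jobs the same money would buy if the AI were real:
    print(f"{funding / server_per_task:,.0f}")  # 100,000,000,000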

This is not really a healthy state for the economy, but seems to be how every
technology wave happens. The real innovation will come when the cheap money
dries up.

~~~
wolf550e
What they do could be viable, if getting your customers to spend their own
money to switch from a manual system (spreadsheet files sent over WhatsApp) to
using your API makes you the only player with a huge dataset of real customer
data. You can use that data to understand which subset of tasks can actually
be automated using existing technology, automate only that subset using deep
learning or whatever (for which you want the biggest dataset you can get), and
keep doing the other stuff manually. You win by making people switch to your
API, getting network effects, etc.

If you don't cut corners in the way you do the manual tasks and you charge
enough to cover your costs, and enough tasks can be automated so that using
your service is not more expensive than not using your service, you're ok. But
you probably don't have the margins the VCs want.
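
The routing logic, sketched as Python (the names, the model/human_queue
interfaces, and the 0.9 threshold are all invented for illustration, not any
real system):

    CONFIDENT = 0.9  # made-up automation threshold

    def handle(task, model, human_queue):
        """Automate the subset the model handles well; keep the rest manual."""
        label, confidence = model.predict_with_confidence(task)
        if confidence >= CONFIDENT:
            return label  # the cheap path: existing tech suffices
        # The expensive-but-correct path: a person does the task,
        # and their answer feeds the next training run.
        answer = human_queue.submit(task)
        model.record_example(task, answer)
        return answer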

------
ErikAugust
Just the latest incarnation of the good old Mechanical Turk:

[https://en.wikipedia.org/wiki/The_Turk](https://en.wikipedia.org/wiki/The_Turk)

~~~
zitterbewegung
One of the services that you can use to implement pseudo-AI is also called
that.

See
[https://aws.amazon.com/documentation/mturk/](https://aws.amazon.com/documentation/mturk/)

~~~
nerdponx
Interestingly, this service is also used to build the large datasets required
to train certain ML models.
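
For example, a labeling task might be posted roughly like this with boto3 (a
sketch: the external URL, reward, and other values are illustrative, and it
targets the free sandbox endpoint):

    import boto3

    mturk = boto3.client(
        "mturk",
        region_name="us-east-1",
        endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
    )

    # ExternalQuestion XML pointing at a (hypothetical) labeling page.
    question = """<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
      <ExternalURL>https://example.com/label-this-image</ExternalURL>
      <FrameHeight>400</FrameHeight>
    </ExternalQuestion>"""

    hit = mturk.create_hit(
        Title="Is there a cat in this photo?",
        Description="Look at one image and answer yes or no.",
        Reward="0.02",                     # USD, as a string
        MaxAssignments=3,                  # 3 workers for a majority vote
        LifetimeInSeconds=86400,
        AssignmentDurationInSeconds=120,
        Question=question,
    )
    print(hit["HIT"]["HITId"])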

------
hyperrail
Not quite on topic, but I can't help noticing the overused word "quietly" in
the title of the story: [https://www.geekwire.com/2017/tech-news-sites-quietly-rely-word-create-drama-headlines-analysis-reveals/](https://www.geekwire.com/2017/tech-news-sites-quietly-rely-word-create-drama-headlines-analysis-reveals/)
(This is a rare case where the word might be appropriate, though!)

~~~
srtjstjsj
> Apparently whenever a tech company does anything without publishing a press
> release and running ads during Monday Night Football, tech news sites have
> decided that the best word to describe it is “quietly.”

Yes, "quietly" means "without making noise". I don't know why GeekWire is so
breathless about using the English language properly.

------
gaius
I like the term “artificial artificial intelligence” to describe this model

------
wmij
Some colleagues of mine talk about the prevalence of companies doing this
man-behind-the-curtain thing all the time while claiming that their solution
is AI/cognitive. When we come across examples of this happening, we just refer
to that Seinfeld Moviefone episode and say 'Why don't you just tell me [foo]',
or whatever the "AI" is trying to solve.

[https://www.youtube.com/watch?v=gSQ6q_rGpI8](https://www.youtube.com/watch?v=gSQ6q_rGpI8)

It never fails to get a good laugh.

Anyway, I think that human interaction for the training aspects of AI
(preparing data, labeling examples, testing models, etc.) is really hard to
automate entirely and should be considered part of the development process.
The execution side of an application component that is marketed as
AI/cognitive, however, is not true AI unless it is totally free of human
interaction.

~~~
kashyapc
_Even if_ the program was totally "free of human interaction", it shouldn't be
marketed as "cognitive"—an exceptional word that requires exceptional
evidence when used in the context of software.

Otherwise, it's all bovine manure.

------
sorokod
Echoes of Theranos here. Putting AI aside, aren't these companies swindling
their investors?

------
pm24601
It was my understanding that all AI technologies/approaches need some sort of
training, and that this training requires some sort of human intervention to
grade "correctness".

In companies that I have worked for that wanted to use AI, there was always a
clause in the privacy agreement allowing employees to look at users' content
in order to determine how well the algorithm was doing.

There are some differences: 1. PII was redacted (to the extent possible); 2.
customers could opt out; 3. the AI wasn't core to the service.

That said - sci-fi AI does seem to require humans behind the curtain.

Personally, I am more comfortable with a human being my virtual "AI" assistant
than I am with real AI. I feel a human is less likely to make a mistake that
will cause me personal damage; there is more responsibility/liability attached
to doing a good job.

------
malmsteen
Sounds a bit like the Stripe (or is it Square?) story, where every bank
transaction was done by humans before they got the rights to automate it. I'm
not surprised, and I find it considerably smart considering all the money
pouring into everything with a machine learning buzzword attached to it.

------
dusieaw
While deceiving your customers is maybe just wire fraud, deceiving your
investors about this is securities fraud.

------
rdlecler1
X.ai barely works, and it was pretty obvious that humans are behind it. It
would make so many one-off, non-systematic errors even when there was almost
no variance in the text.

------
joejerryronnie
"The rise of 'pseudo-AI': how tech firms quietly use humans to do bots' work"

Does this include Uber's human drivers?

------
erikb
I can already see how AI assistants call AI assistants to book an AI driven
service for their AI bosses, to find out 0.3 seconds later that their AI boss
being an AI doesn't need that service, calling again to remove the booking.
Nobody notices that this happens, because no human is involved in the actual
conversation. But of course both of these conversations lead to events where
data needs to be transfered from the calendars to finance, billing etc. These
transfers are done by humans of course because AI would be too expensive for
that.

I see people going to work 16 hours a day transferring these transactions,
related to haircuts for AI bosses that don't physically exist (and therefore
don't need haircuts), from one database to the other. The AI boss doesn't see
the need for 8-hour workdays, because he's not human. And the money for the
work doesn't need to be enough to live on, right? The AI government will pay a
basic income to everybody anyway.

------
amelius
This seems completely logical to me. A new business needs humans to train
their AI. So why not use them to bootstrap your business while you're at it?

When we have transfer learning or one-shot learning, it will be a different
story.

~~~
sorokod
Sounds reasonable as long as you are not lying about what the company does.

~~~
pixl97
Under those conditions, we would never have gotten chlorinated water, which
saved billions of lives.

------
nradov
This is a good way to quickly build an MVP to gauge customer demand before
incurring the time and expense of building a real scalable product.

~~~
tzahola
So, deceiving your customers about how their data is being handled is a "good
MVP". With this much cynicism you must be an "entrepreneur".

~~~
nradov
Where is the deception? Why should the customers care how the service is
implemented on the back end? Unless the service lies and claims the data will
never be seen by humans then they're not doing anything unethical. All that
matters is whether customers are getting value from it.

~~~
tzahola
Because if a service offers to automatically tag my photos, I don’t expect
random people looking at them. In fact, I don’t want _anyone_ looking at my
family photos other than those whom I explicitly gave permission to do so.

Same applies to my voice recordings. Same applies to my receipts. Same applies
to my health data.

~~~
nradov
Unless your data is encrypted using keys that only you control then you have
to assume that random people are looking at it. That is inherent in the nature
of SaaS.

~~~
tzahola
Technically yes. But since GMail became widespread, there’s a tacit agreement
that my data won’t be looked at by random humans.

~~~
wool_gather
Is there? It's an inherent part of the service that Gmail software reads your
mail (including, until recently, for advertising). How do you expect
developers to work on that software without at least occasionally looking at
the real inputs?

------
Wingman4l7
So, the worst of both worlds -- inflexible responses that aren't even scalable
to demand.

------
sho
Fake it 'til you make it!

------
macinjosh
One issue with artificial intelligence is that it is typically referred to and
understood as being similarly functional in its domain as actual (i.e. human)
intelligence would be. In reality, AI is more akin to artificial cheese.
Artificial cheese is _not_ cheese that happens to be made with different
ingredients; it is something vaguely cheese-like, made in a way completely
dissimilar to real cheese.

------
shawn
See also: the front page of HN. You didn’t think that was an algorithm
choosing which stories you see, did you? :)

It works well. And considering how much effect HN has on our daily lives, it’s
neat that the algorithm reduces to “this human has good taste.”

I see no reason why HN won’t be around for decades. And that’s exciting. It’s
the only newspaper that feels like a community.

This used to feel strange — if you think about the position of influence and
power HN commands, it's hard to feel OK about ceding control to a handful of
people, no matter how benevolent. And on certain topics this has indeed been
an issue — if a certain behavior or conversation isn't tolerated on HN, it's
easy to feel like you're a misfit who doesn't belong in tech, or even that you
don't identify with the tech community.

One way to become comfortable with this situation is to trust incentives, not
people. The only way HN wins is if HN stays fascinating. It’s why most of us
keep coming back: to monitor the pulse of the tech scene, or to learn physics
factoids[1], or to spot a new tool that saves you hours.

And that’s why HN can’t be fully automated. Which stories fascinate you? If
you could write an algorithm to generate an endless stream of interesting
content, you’d have invented an AI with good taste. And for the moment, that’s
beyond our capabilities.[2]

[1] As the earth orbits the sun, the area swept out by the triangle formed by
the sun->earth->earth3WeeksLater is equal to the area of the triangle during
the next three weeks, and the next three weeks after that, and so on. Equal
time periods = equal areas. That’s why the earth moves faster when it’s closer
to the sun: it has to cover a greater distance in order to sweep out the same
area as during the previous 3 weeks. This gives you an intuition about how
gravity behaves. And with a slight tweak it also holds true for e.g. an ice
skater twirling around. If you stick your leg out while spinning, your leg
sweeps out more area than when it’s near your body. That’s why you spin more
slowly: if you’re sweeping out more area per time period, your rotation must
slow down proportionally. (This is conservation of angular momentum, dressed
up in intuitive clothing.)
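
In symbols (a sketch of the standard derivation; m is the mass, r the distance
from the center, and L the angular momentum):

    % Areal velocity: area swept per unit time by the radius vector.
    \frac{dA}{dt} = \tfrac{1}{2} r^2 \dot{\theta} = \frac{L}{2m},
    \qquad L = m r^2 \dot{\theta}
    % With no external torque, L is conserved, so dA/dt is constant:
    % equal areas in equal times. Shrink r (the skater pulls the leg
    % in) and \dot{\theta} must grow to keep L fixed.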

[2] This is in contrast with sites like YouTube, which give an endless stream
of interesting videos. Part of HN's strength is its unified front page. We
all see the same thing. And that's why writing an algorithm to make the front
page interesting is much harder than creating a personalized neural network
trained to show you all the interesting things you haven't seen yet.

~~~
pm24601
YouTube's algorithm is designed to give the viewer stuff similar to what they
have already seen, be it GOTO Conference videos, Nazi rants, or wacky people
doing fingernail designs. YouTube's AI is not designed to take chances and
offer up something 'surprising'. 'Surprising' might be offensive, not
enlightening.

~~~
srtjstjsj
"Similar" also might be offensive not enlightening.

YT has a problem with trolls and harassers posting reaction videos that YT
gets tricked into thinking are "similar".

------
justonepost
Not sure the Guardian understands how AI training works.

~~~
maym86
There is a difference between labeling training data and just using humans to
do the work. Some things cannot be achieved yet even with lots of labeled
training data but companies are pretending they have solved hard ML problems
at a high level of performance when the technology and research aren't there
yet.

~~~
pacavaca
But why should it matter for customers who does the job? I mean, if you don't
tell anybody, and pretend it's 100% AI then it's bad, but if it "will
eventually become AI", and your investors and everybody interested in the
technical details know how it's actually done, then what's wrong? A true "AI"
should be able to pass a Turing test, so for the customer, it should be
indistinguishable, and shouldn't matter.

Of course, the privacy concerns are there, but then again, if it's a real
"AI", then it may be worse for the computer to read your data than for a
random low paid worker ;)

~~~
maym86
Because the business costs don't scale well unless they can invent technology
to remove the humans, and for harder ML problems that currently isn't
possible. The business is only based on being able to sell the hype of "AI"
to investors.

