The rise of 'pseudo-AI': how tech firms quietly use humans to do bots' work (theguardian.com)
276 points by YeGoblynQueenne 9 months ago | 137 comments

I'm glad this is highlighted now. Back in 2015, when we wanted to build a meeting-scheduling bot, we naively thought we could use only machines to get the job done. Three months later we realized that was in no way feasible, not then and not in the next 10 years. The common feedback we got was to just use low-cost labor in India/the Philippines to get the job done. To us, that was a no-go because we kept privacy as the top criterion for doing anything with people's inboxes. Even after obfuscation, we didn't want any human to read or parse someone's private messages. So we dropped the idea and shut up shop. Sure we failed, but at least we were true to our conscience.

Another way to have tackled this problem was to just go down the research route and build a product once the ML was fully baked, but VCs won't fund such adventures.

To get VC funding, we had to lie through our teeth, which was a total no-go. I'm not making a righteousness argument, but for once I feel really good that the viewpoints we held then are corroborated now; some kind of confirmation bias kicking in.

It's great that you stuck to your guns, but there's another ethical path forward: transparency. Many companies already pay third-party employees to look at their schedules, so it's not a non-starter. And there are a few VCs out there who understand training costs for AI and are willing to engage with a journey that includes them, as long as the cards are on the table.

What we need is a set of accounting metrics for the cost of training and the rate of improvement, so that VCs can get comfortable making projections rather than choosing between buying snake oil and not participating in the AI dev cycle.

PS - emphatically not taking aim here at your specific decision, OP. Startups have different opportunities and interests, and these decisions are tough. I'm sure you know better than I do whether the ethical wizard-of-oz model above is relevant to you in particular.

Thank you. The latter point you mention about accounting metrics is very valid. I will think along these lines. I am very much interested in reaching out to VCs who are supportive and understand training costs for AI. If you know of any, please do point them out.

Could you expand upon the technical problems you ran into? I would have thought it possible to build a reasonably good meeting scheduling bot with just classical constrained optimization algorithms (no AI / ML), but I'm probably missing something.

Well, for starters we are looking at question-answering-style natural language processing, something superior to Google's smart replies. So imagine a flow like "<Bot>, schedule a meeting with <p1> at noon tomorrow." Now the <bot> has to figure out the availability of <p1> at noon. This is relatively simple if <p1> has given the bot access to their calendar, but that's unlikely. So the bot has to start a series of dialogs to figure out the availability. Now you are looking at NLP + some state maintenance, and it gets even trickier if you have more than one participant, if there are no available overlaps, or if the bot has to force a schedule dictated by the boss. Using a phone app you can solve some of these by prompting with custom keys, but then you run into app-distribution challenges.
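For what it's worth, the scheduling arithmetic itself (once availability has somehow been extracted) really is classical interval computation with no ML. A minimal sketch, with all names and times hypothetical:

```python
from datetime import datetime, timedelta

def first_free_slot(busy_by_person, day_start, day_end, duration):
    """Return the first (start, end) slot of `duration` that is free for
    every participant, or None.  busy_by_person maps a name to a list of
    (start, end) busy intervals."""
    # Flatten everyone's busy intervals and sweep left to right.
    busy = sorted(iv for ivs in busy_by_person.values() for iv in ivs)
    cursor = day_start
    for start, end in busy:
        if start - cursor >= duration:  # big-enough gap before this busy block
            return cursor, cursor + duration
        cursor = max(cursor, end)
    if day_end - cursor >= duration:
        return cursor, cursor + duration
    return None

day = datetime(2018, 7, 10)
busy = {
    "p1": [(day.replace(hour=9), day.replace(hour=12))],
    "p2": [(day.replace(hour=11), day.replace(hour=13))],
}
slot = first_free_slot(busy, day.replace(hour=9), day.replace(hour=17),
                       timedelta(hours=1))
```

The hard part, as the comment says, is populating `busy_by_person` in the first place via open-ended dialog.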

I gave a lecture a couple of months ago called Developing your AI BS Detector addressing this very topic.


The main thesis is that there seem to be more and more companies out there solving interesting problems, which by itself is great, but they're bolting a lot of AI wording on top. Most of this seems to be an attempt to differentiate themselves in the market and access funding, and I find it incredibly dishonest.

The whole discussion around AI these days has become so tainted by shysters trying to attract funding and attention that I actively try to distance myself from any association.

We need to focus a lot more on Intelligence Amplification (IA) (https://en.wikipedia.org/wiki/Intelligence_amplification), i.e., building tools for humans to become more productive, and less on AI.

Douglas Engelbart and others had this figured out 50 years ago (see: the Mother of All Demos). The AI hype is dangerous, since it will no doubt lead to the trough of disillusionment.

Well the idea is that all emerging ideas with broad appeal go through the Hype Cycle. I don’t know that AI is different or dangerous in that regard.

I'm not of the opinion that it is dangerous as an idea, but rather that the intentional deception I see on the AI topic is dangerous (not to mention unethical).

I can't think of a single example of an "AI" company I talked with who did not know that they were explicitly exaggerating their claims and capabilities for the express purpose of attracting funding and fueling the hype fire.

The solid tech startups I've seen are focused on the problem they are solving for customers rather than the tooling (AI).

Not to mention that AI as we define it does not actually exist yet. Anyone who uses that term is doing so dishonestly. Show me any intelligent code and I'll shut up. Until then I call BS on anything that claims to be powered by AI.

I think you can reasonably call self driving cars AI without it being BS. Many things like translation may qualify based on how you frame the question, but driving is an open ended real world task.

Artificial Intelligence: the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages. [Google]

I wrote an AI to detect AI hype. So far it's batting a thousand. Let me know if you want in at the ground level.

Yes, anything that works is not AI, by definition.

My favorite example of this sort of thing, mentioned in the Guardian story, is the kerfuffle late last year when Expensify's receipt-reading SmartScan feature was found out to be partly backed by human receipt readers: https://qz.com/1141695/startup-expensifys-smart-scanning-tec...

(I don't file expenses often, but personally when I do so through Expensify, I find SmartScan so slow that I now assume all my "SmartScanned" receipts are going to a human.)

Quote from the article

>How to start an AI startup

>1. Hire a bunch of minimum wage humans to pretend to be AI pretending to be human

>2. Wait for AI to be invented

Translation: Lean Startup Concierge/Wizard of Oz MVP. For anyone wanting to learn or refresh about that, here's a decent article I just Googled up (I reviewed a few and picked this one): https://grasshopperherder.com/concierge-vs-wizard-of-oz-test...

How do they handle bogus results from MTurk? I know people who tried to use it and had to stop because 95% of the "workers" just put in meaningless data, hoping you wouldn't notice and they'd get paid.

It depends on the task, but if you can get them to do multiple tasks in one, you can ask them to do a task you already know the answer to. If they get the known answer wrong, instant reject.
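A minimal sketch of that known-answer ("gold question") check; the task IDs and answers here are purely hypothetical:

```python
def grade_batch(answers, gold):
    """Approve a worker's batch only if every known-answer ('gold')
    task in it was answered correctly.
    answers: {task_id: answer}; gold: {task_id: correct_answer}."""
    for task_id, correct in gold.items():
        if answers.get(task_id) != correct:
            return "reject"
    return "approve"

gold = {"t7": "cat"}
assert grade_batch({"t1": "dog", "t7": "cat"}, gold) == "approve"
assert grade_batch({"t1": "dog", "t7": "giraffe"}, gold) == "reject"
```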

Couldn't you just solve that by sending the job to multiple workers and comparing the results?
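Redundant assignment plus a majority vote is straightforward to implement; a minimal sketch, with the labels and agreement threshold purely hypothetical:

```python
from collections import Counter

def majority_answer(responses, min_agreement=2):
    """Give the same task to several workers; accept the most common
    answer only if at least `min_agreement` of them agree."""
    answer, votes = Counter(responses).most_common(1)[0]
    return answer if votes >= min_agreement else None

assert majority_answer(["cat", "cat", "dog"]) == "cat"
assert majority_answer(["cat", "dog", "bird"]) is None  # no consensus
```

Of course, this multiplies the per-task cost by the number of redundant workers, which is the trade-off raised in the replies.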

That's an obvious solution, but you might start approaching a point where it's actually cheaper to hire a dedicated person for it.

That doesn't solve the problem of a person being wrong.

Think about it.

I consulted with a great company in this field and they also use a mix of OCR and humans. Difference is they have trained in-house staff, rather than farming it out to just anyone for the cheapest they can get.

I think this article may be overstating a little.

First, the prototyping/bootstrapping with humans is not a terrible idea. It takes a hurdle and moves it down the line a bit. You still need to get over that hurdle. It's still a potential failure point, but it's not an irrational approach. There are a ton of businesses started on the basis of:

  1. make free service
  2. get 1bn customers
  3. ???
  4. profit
Twitter, Google, Facebook... They deferred that little "make money" hurdle until they were ready for IPOs. Obviously, this opens a door for stupid plans, where the AI task will not be achievable. But, the "monetisation step" has comparable problems too.

The deception is a problem though, building companies based on deception is a problem and will probably lead to a nasty crash.

Second, if the goal is a result rather than to use AI... Say you're trying to solve some specific transcription use case, like Spinvox or Expensify. First, the problem seems solvable or soon to be solvable. Second, it's helpful to have a stream of transcription attempts coming to you with real-life accuracy requirements. Third, it's not a bad way of approaching human-AI hybrid systems. Start with 100% human labour and gradually hybridise. Maybe you never solve the hard problems that would get you to 100% AI, but you find ways of leveraging human labour to solve it efficiently.

If what they reach is a tech/process that allows a person to transcribe 10k words per hour... that's pretty good too.

> First, the prototyping/bootstrapping with humans is not a terrible idea

Well, it might be. Are human drivers really prototyping/bootstrapping self-driving cars? We are deeply into the territory of Moravec’s Paradox here.

Interesting, care to elaborate?

For a lot of NN/ML applications, the magic ingredient is "humans" and a record of humans doing something enough times to describe statistically. AI sign recognition, sentence completion or checkers playing is very often based on estimates of "what would a human do."

"Would a human say this photo contains cats", is really how a lot of ML interprets the question "where is my cat"?

> Interesting, care to elaborate?

Moravec's Paradox is that everyone thought that sensory input would be easy and reasoning about those inputs would be hard. But it turns out that the sensory input part is very hard, much harder than anyone thought, and once you have that down, the reasoning is actually simple. So any service that is relying on humans doing the sensory input bit is handwaving away the difficult part. Humans aren't really aware of how much processing is involved in things we take for granted such as sight and hearing. We think that thinking is "special" but it requires relatively little power to do that once the underlying hardware/wetware has already processed the signals.

For example, a fundamental bit of logic is

    if pedestrian():
Which is trivial... once you have a pedestrian() function that is.

> "Would a human say this photo contains cats", is really how a lot of ML interprets the question "where is my cat"?

Yes and no. If you ask a human, why do you think this is a cat, he or she will say, well look at the whiskers and the ears and the paws and the general overall cuteness. But an NN doesn't have a whisker coefficient or a cuteness multiplier. It doesn't "think" like a human and nor should we expect it to. ML that is more human-like is a decision tree, but once you start getting into random decision jungles and the like, the explainability starts to diminish.

I didn't mean that it thinks like a human, rather that its thinking is based on a statistical aggregation of a bunch of humans thinking. 1,000 humans identify cats; train an NN to identify cats based on that training set.

The point about the paradox is a good one, and very relevant to this issue. I can't tell if it is a paradox about computers/intelligence or a point about people. I suppose it's all the same when it comes to building Wizard of Oz companies. Because of Moravec's Paradox, you will probably misidentify the easy and/or hard parts of the problem you are trying to solve with AI.

Google and Facebook both had massive revenue and operating profit before IPO.

Great to see the Guardian referencing back to Spinvox here, whose speech-to-text service turned out to be largely run by sweatshop workers in the Philippines; and a warning from history: This shtick works as long as you can transition to AI. If not, then the service will become increasingly flaky until the business collapses.

The business model works fine and is in use in many industries. The problem comes from raising and spending money on the belief that costs will drop to zero because of AI, which may never happen.

Sadly most of these startups could do much better by just hiring and using human staff to provide the same services.

Yep, agreed. The problems come from misleading people, whether that's about the privacy of the service thanks to AI, or about the Opex/Capex profile of the business as a result of hidden manpower.

> a warning from history: This shtick works as long as you can transition to AI. If not, then the service will become increasingly flaky until the business collapses.

The business model "use low-paid labor to service wealthy clients" seems a little more inherently stable than that. In most cases, nobody expects a collapse. Why in speech-to-text?

Say you're not profitable, and raise money on the expectation that you can eventually lower costs and only then become profitable. If you then find out you can't lower costs (i.e., you can't automate something you thought you could, and keep having to rely on more costly labor), then you eventually collapse.


Uber, Airbnb, others get their "partners" to provide the working capital.

This is a new take on the "value added reseller" business model. (I don't know what else to call it. Multi-level marketing?)

For instance, Autodesk's dealer channel covered the costs of marketing, sales, technical support, training, etc., while Autodesk kept the sweet nectar of profits for itself.

> This is a new take on the "value added reseller" business model. (I don't know what else to call it. Multi-level marketing?)

"franchising" ?

If you are producing physical goods then becoming more popular just means that the stores that sell your product run out of stock faster. Then you scale up production.

With an internet based service, if you allow new people to sign up at any time and you become popular and you are not able to scale then you will not be able to service all of your users quickly enough. The people that you have pretending to be AI will be a bottleneck that makes it difficult to scale because you won’t be able to find more people suitable for that job and train them for it quickly enough. Then the business collapses.

The key distinction for many such "AI-enabled" services is that they're providing a service that is not cost-efficient even for low-paid labor, so they are selling below cost unless the expected AI-based automation gains materialize and achieve a drastic reduction in the manual labor required.

Another example here is Leverton, which is basically outsourcing all their "AI" work to sweatshop paralegals in Poland. It's apparently great to get funding though.

Could you elaborate a little more? I'm very interested in Legal Tech, however, I noticed there are few useful applications actually on the market/being used.

It depends on the use case and business model. At the end of the day, the question is can the business scale with good unit economics. Software, machine learning, etc.. are just potential levers. Reminder that no businesses have 100% gross margins, even SaaS companies cap out around 80-85% because they need to provide human support alongside their software.

Absolutely unsurprising given the current state of hype.

How to get rich: (A) Start a page that applies photoshop effects to user-uploaded pictures. (B) Secretly accomplish A using low-paid grunts. (C) Claim it's done with AI, obtain a billion dollars of VC money

You don't even need to do it secretly, just say it's the next step. See: Uber

But Uber has what seems to be a world-class machine learning lab (if that's not the heart of AI, what is?). I am not well versed enough to compare it to Google's or OpenAI's, but it surely seems like they are at least trying to push the research envelope?

Evading regulators (Greyball) is a very different problem domain to self-driving cars

I was not speaking about either, actually.

I'm by no means a machine learning expert, but sometimes on a Sunday I'll pour myself a mug of coffee and go through some neat tutorial, and sometimes the stuff is Uber's.

They are by no means free from criticism for their wrongs, but to ignore what they have done correctly paints the world far too simply.

Yes, and if you want to do it correctly you use your AI story to convince some VCs to make you a huge pile of money, which gives you the advantage over anyone else who gets a similar idea.

At this rate, every company should just start claiming their workers are human-lookalike cyborgs, since being a plain old honest company that doesn't lie about using human workers doesn't get the same boost as one that uses human workers but claims the "AI" treatment.

I think another big reason is that people expect near perfection from a professional human-powered service but might be much more forgiving when the service is thought to be done by AI.


But I believe that concern about privacy is the main issue. If you trust some firm with your data, you also trust their data systems. And their staff. But if they're giving your data to numerous third parties, there's far more potential for leaks and malicious activity.

This reminds me of NSA's argument that data collection and processing isn't illegal, because ...

> According to USSID 18, a top-secret NSA manual of definitions and legal directives, an "intercept" only occurs when the database is queried — when someone actually reads the text on a screen.


Yes, because the NSA believes Heisenberg's uncertainty principle applies to that data. Such data is simultaneously in the "collected"/"not-collected" state. It is only when the data is observed that its state resolves.

What are some examples of companies claiming to use AI but instead actually using human workers?

Mentioned in the article: Edison Software, Spinvox, X.ai, Clara, Expensify, Facebook.

No... maybe I wasn't clear on this, but the article doesn't claim those companies do what I was asking about:

- Edison Software: The complaint is that engineers went through some users' personal emails to "improve" a feature, and that this was a privacy violation. This is not addressing what I was asking. To me this means their AI really is doing the work by default, but in cases where the AI doesn't behave well, they have humans intervene. Moreover, it's not clear to me that they ever claimed the work was fully AI-driven in the first place.

- Spinvox: Again, same thing: "The ratio of humans to messages and humans to number of users is very, very low."

- X.ai: The link is behind a paywall, so I can't tell what their case was.

- Clara: same as X.ai

- Expensify: Again, same as Spinvox and Edison Software: "Expensify admitted that it had been using humans to transcribe at least some of the receipts it claimed to process using its “smartscan technology”."

- Facebook: Did Facebook actually claim M was entirely AI-powered? The fact that humans were used isn't enough to answer my question of whether they were "claiming to use AI but instead actually using human workers".

Note that I'm NOT defending these practices by any means, or suggesting that they're not privacy violations. All I'm saying is they don't answer my question, which was entirely about the use of AI, not about privacy violations.

To hopefully clarify what I was looking for: examples of companies that explicitly advertise their services as AI-powered but actually use humans to do the heavy lifting. Meaning I'm looking for a company that does much more than have humans go through user data to improve its existing AI or handle edge cases. That humans occasionally become involved at some point to improve the service is not enough -- the company needs to be more or less actively lying about its use of AI while the work is being delegated to human workers. (I asked for examples of this because this seemed to be the criterion the parent comment had.)

AI is nonsense. Dijkstra was right.

I'm not saying that silicon/mechanical intelligence isn't possible. I'm unaware of any physical law that precludes it. But what we currently call "AI" is just the pathetic fallacy run wild.

All that said, multidimensional data-driven linear recognizers are pretty impressive.

> AI is nonsense. Dijkstra was right.

Define AI first.

One of the first few lines on Wikipedia about AI: The scope of AI is disputed: as machines become increasingly capable, tasks considered as requiring "intelligence" are often removed from the definition, a phenomenon known as the AI effect, leading to the quip, "AI is whatever hasn't been done yet."

I think the key concept here is Moravec’s Paradox. People believing in the "AI is whatever hasn't been done yet" quip are moving the goalpost in the definition.

Consider speech recognition. Obviously an AI problem solved, right? Except, the AI problem and what the current solutions are solving are two different things. The AI problem is understanding and appropriately reacting to spoken words, as if a human was on the other end. Current implementations are glorified pattern matchers run over clever hashes of sound recordings. There is no understanding happening, there are no concepts forming within the machine (and the understanding is not back-fed into pattern matcher to correct the sensory input on the fly). The difficult parts, the ones that make speech an AI problem, have been entirely sidestepped with mathematical tricks. It sort of works, but its scope is nowhere near the original AI problem.

Similar analysis can be made for anything that is mentioned with the "no longer AI" quip. Can a DNN recognize a hot dog? Sort of, for some definition of "hot dog", only if input images are clear and similar enough to the training set. It's a cute trick, and you can make a business out of it if you control enough of the environment around the system, but it's still nowhere near what we mean when thinking about AI recognizing objects.

> "AI is whatever hasn't been done yet."

If you can't replicate what a human does, it's not AI. The fact that we can only beat humans at very, very narrow applications/games, and that we don't have a generalized model for learning, is a clear failure of the AI hype.

We do have a generalised model for learning- it's called PAC learning [1] and it's part of Computational Learning Theory, which studies learning in computers.

Note that machine learning may be popular today thanks to the recent successes of deep learning, but learning is not the only thing that makes humans intelligent. For instance, there is reasoning about what you know, and possibly other stuff like motivation, etc.


[1] https://en.wikipedia.org/wiki/Probably_approximately_correct...
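For a finite hypothesis class in the realizable case, the PAC framework even gives a concrete sufficient sample size: m >= (ln|H| + ln(1/delta)) / epsilon labeled examples for a consistent learner. A quick sketch of that bound (the numbers are just an illustration):

```python
import math

def pac_sample_bound(h_size, epsilon, delta):
    """Samples sufficient for a consistent learner over a finite hypothesis
    class to have true error <= epsilon with probability >= 1 - delta
    (realizable case)."""
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / epsilon)

# e.g. |H| = 2**20 hypotheses, 5% error tolerance, 95% confidence
m = pac_sample_bound(2 ** 20, 0.05, 0.05)
```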

Valid definition, why not. My point is that talking about AI without defining what you mean doesn't make sense.

Honestly, that's also the problem with most articles about AI. Articles about AI are either praising the great mystical AI for recognizing cats or blaming AI for not achieving X, but humans can.

It's totally valid to define AI this way.

But just keep in mind, when most people talk about AI, they're knowingly talking of something much more limited. So you're going to constantly have communication failures with people who are defining AI differently than you.

(Many people nowadays use the term AGI [Artificial General Intelligence] to mean what you think of as AI, btw).

Yeah I know about AGI but I dont like that term because it implies that the classifiers we have nowadays are good enough to be called "intelligence". They are just statistical models with great number of layers, nothing else.

Your description ("statistical models with great number of layers") tells me you're talking about neural networks. However, we have "nowadays" many classifiers that are not neural networks and therefore have no layers of any sort, like SVMs, KNN, or logistic regression; some, like decision trees/forests, are not even statistical.

I should also point out that literally all the classifiers "we have nowadays" as per your comment, have been known for at least 20 years (including deep neural networks).

I'm pointing all this out because your comment suggests to me that your knowledge of AI and machine learning in particular is very recent and goes as far as perhaps the last five or six years, when deep nets popularised the field.

If that is so- please consider reading up on the history of AI. It is an interesting field that goes back several decades and has had many impressive successes (and some resounding failures) that predate deep learning by many years. I recommend the classic AI textbook "AI- A modern Approach" by Stuart Russel and Peter Norvig. You'll notice there that, even in recent versions, machine learning is a tiny part of the material covered. Because there is so much more to AI than just deep neural networks, or statistical classifiers.

If I'm wrong, on the other hand, and you already have a broad knowledge of the field, then I apologise for assuming too much.

Peter Norvig's Paradigms of AI Programming: Case Studies in Common Lisp is a good read too.

Can you prove that human intelligence is not statistical models with great number of layers, nothing else (besides some hard-coded instincts)?

A dog can't replicate what a human can do, therefore it's not intelligent, right?

Intelligence is a spectrum, some of the things we have created are intelligent by the definition of the word. They are not human level intelligence, yet.

The things we have created aren't even at rodent level intelligence yet, never mind dogs.

The things humans can do aren't even at calculator-level intelligence yet, let alone CNNs.

How about this (incomplete) one: the ability to learn a subject at hand and then to apply this knowledge into another field.

For example, the machine becomes a master chess player then uses this ability to become a master at backgammon.

That's called transfer learning in ML, and we are not very good at it yet.

I'm not sure about the quality of my linked article, but it seems that humans becoming good at chess does not noticeably improve their other skills. They will simply be better at chess.

So our expectations might be too high for AI getting vastly better only by doing better transfer learning.


Learning chess will make you better at other board games, even if it doesn't improve, say, your maths or language skills. For instance, chess may teach you lessons about sacrificing assets to earn a reward later on, or lessons about material advantage, tempo, controlling the board and so on, that can be applied to many different games. That counts as transfer learning and it is something that statistical machine learning cannot do at all, basically (i.e. a new model has to be trained from scratch for each different game).

As an aside- you might argue that there are common elements of board games' design that make it easier to reuse knowledge. However, statistical machine learning systems can still not transfer knowledge of one game to others, even given the common design elements that should make it easier.

Almost there! Have you seen Feynman talking about how knowing the birds' names doesn't mean you know anything about them?

Let's see how the machine learns about the concept of transfer learning in the first place :)

I heard a good set of definitions.

Data science: observing trends in data

Machine learning: humans develop models that are fit to data to make predictions

AI: Computers make modeling decisions entirely autonomously. No human input. Dump data, get predictions.

So would you consider unsupervised classifiers to be AI?

I believe that intelligence is more than quantifiable. Von Neumann machines, no matter how complex, will never achieve intelligence. Again, to be completely clear, I’m not claiming machine intelligence is impossible, just that what we see presently isn’t it in any honest sense.

What do you even mean by "Von Neumann machines, no matter how complex, will never achieve intelligence."? If you mean machines that self-replicate, then there're plenty counterexamples around already (https://www.xkcd.com/387/). If you mean computers using the von Neumann architecture, then that's weirdly specific, since modern computers are at best vaguely inspired by that model, and GPUs in particular work very differently.

You're using Von Neumann machine in the sense of a self-replicating machine. I think the commenter you're replying to meant Von Neumann architecture, which is the basic layout of a computer, and has nothing to do with self-replication.

See this Wikipedia disambiguation page: https://en.wikipedia.org/wiki/Von_Neumann_machine

Thanks for clarifying. The man was such a polymath and has made so many contributions I should have been more clear myself.

How is self-replication a requirement for intelligence?

I charitably assume he's drawing from the fact that all true intelligences we can observe are capable of self-replication.

At one point that was true for 'entities that can perform addition'.

I have a feeling your statement will not stand the test of time well at all.

I can't help repeat this quote from a previous thread. It's from an interview of a lecturer at the London Business School who appeared on a BBC TV programme about technology:

"...Expert systems are really slightly dumb systems that exploit the speed and cheapness of computer chips...There are many expert systems in the literature which are nothing more than a series of fast if-then-else rules...you do that a couple of hundred thousand times it can look remarkably intelligent"

The interview above was broadcast in...1984. There have been advances since then of course, but at the same time this 30-year-old quote still has a certain ring of truth to it.

Here's the clip from the TV programme featuring the quote above: https://computer-literacy-project.pilots.bbcconnectedstudio....

The surprising thing might be that humans and other "thinking" animals are nothing more than a series of fast if-then-else rules too.

Not sure if this one has been debunked yet, but I remember learning in my psych class about the guy who fell asleep, got out of bed, drove a car a bunch of miles, and at his destination murdered his in-laws, all while apparently asleep.

It was used as an example for conscious/unconscious behavior.

You must remember that "Expert System" is a term used in machine learning which specifically means a system in which the input is (logically, not literally) processed by humans at every step until the output is provided, rather than a term to describe "the best" AI systems.

As mentioned in the quote, this is usually done through if/else: "if this is the case, then the next step is that".

The reason they are called Expert Systems is that they provably provide the correct output for any input 100% of the time.
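In the spirit of that quote, the core of such a rule-based system is just a forward-chaining loop over if-then rules. A toy sketch (the rules themselves are made up):

```python
def forward_chain(facts, rules):
    """The classic expert-system inference loop: keep firing
    'if premises then conclusion' rules until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if set(premises) <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

rules = [
    (["has_fever", "has_rash"], "suspect_measles"),
    (["suspect_measles"], "recommend_specialist"),
]
derived = forward_chain(["has_fever", "has_rash"], rules)
```

Run a few hundred thousand such rule firings per second and, as the 1984 quote suggests, it can look remarkably intelligent.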

The idea that AI will be a revolutionary leap is probably nonsense. What we have seen so far is slow, incremental, evolutionary development of the tech over decades. Everyone raising money is claiming that the quantum leap is just around the corner, but there is very little evidence to support that.

What you actually see is that improvements in perceived machine intelligence show diminishing returns to increasing compute capacity, which is a good sign that people are on the wrong track to achieving general AI, and that future improvements in perceived intelligence will grow at a slower rate, not increase exponentially.

Unless you mean General AI. We don't have that yet.

AI is huge. Machine learning is definitely changing how people work and will work. It is not there yet, but it'll get there. Some people just want to monetize the hype.

Could you please provide a reference for said Dijkstra's sentiments on AI?

AI => machine learning => automated statistics

Rendered in English: Automated statistics is a consequence of machine learning. Machine learning is a consequence of AI.

I don't understand your point. Please explain.

They mean "greater than or equal to", which doesn't make it correct, but at least makes sense of their point.

Even then, "greater than or equal to" is >= not =>

I guess they're meant to be some kind of arrow?

Ya, it's an arrow, bud.

A lot of this comes from the availability of cheap hype-driven capital. The way it's supposed to work is that AI lets you replace expensive human labor with cheap computers so that a job that might've cost $10 in wages instead costs $0.0001 in server time. The way it actually works is that you tell a bunch of investors that you've got an AI that reduces the cost of X by a factor of 10,000, they give you $100M in capital, and now you have 10,000x the amount of money available, so you can pay the original $10 in wages and worry about how to actually reduce the cost of X by 10,000 later, usually once it's clear that no more funding is forthcoming.

This is not really a healthy state for the economy, but seems to be how every technology wave happens. The real innovation will come when the cheap money dries up.

What they do could be a viable strategy. If getting your customers to spend their own money switching from a manual system (spreadsheet files sent over WhatsApp) to your API makes you the only player with a huge dataset of real customer data, you can use that data to figure out which subset of tasks can actually be automated with existing technology. Automate only that subset using deep learning or whatever (for which you want the biggest dataset you can get), and keep doing the other stuff manually. You win by making people switch to your API, getting network effects, etc.

If you don't cut corners in the way you do the manual tasks and you charge enough to cover your costs, and enough tasks can be automated so that using your service is not more expensive than not using your service, you're ok. But you probably don't have the margins the VCs want.

It's working for Uber...still waiting for those autonomous taxis that don't run people over....

Just the latest incarnation of the good old Mechanical Turk:


One of the services that you can use to implement pseudo-AI is also called that.

See https://aws.amazon.com/documentation/mturk/

Interestingly this service is also used to build the large data sets required to train certain ML models.

Not quite on topic, but I can't help noticing the overused word "quietly" in the title of the story: https://www.geekwire.com/2017/tech-news-sites-quietly-rely-w... (This is a rare case where the word might be appropriate, though!)

> Apparently whenever a tech company does anything without publishing a press release and running ads during Monday Night Football, tech news sites have decided that the best word to describe it is “quietly.”

Yes, quietly means "without making noise". I don't know why Geekwire is so breathless about using the English language properly.

Glad to see other people also find it annoying.

I went searching to see if someone had written about the phenomenon last year, but eventually concluded I was probably mad. Thanks!

I like the term “artificial artificial intelligence” to describe this model

Some colleagues of mine talk about the prevalence of companies doing this man-behind-the-curtain thing all the time while claiming that their solution is AI/cognitive. When we come across examples of this happening, we just refer to that Seinfeld Moviefone episode and say 'Why don't you just tell me [foo]' - or whatever the "AI" is trying to solve.


It never fails to get a good laugh.

Anyway, I think that human interaction for the training aspects of AI to prepare data, label examples, test models, etc. is really hard to automate entirely and should be considered part of the development process. The execution side of an application component that is marketed as AI/cognitive however is not true AI unless it is totally free of human interaction.

Even if the program was totally "free of human interaction", it shouldn't be marketed as "cognitive"—an exceptional word that requires exceptional evidence, when used in context of software.

Otherwise, it's all bovine manure.

Echoes of Theranos here. Putting AI aside, aren't these companies swindling their investors?

It was my understanding that all AI technologies/approaches need some sort of training, and that this training requires some sort of human intervention to grade "correctness".
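That grading step can be as simple as scoring the model's predictions against human-supplied gold labels. A minimal sketch, with all names and data invented for illustration:

```python
# Minimal sketch of human-in-the-loop grading: humans supply the gold
# labels, and we score the model's predictions against them.

def grade(predictions, human_labels):
    """Fraction of predictions that match the human labels."""
    assert len(predictions) == len(human_labels)
    correct = sum(p == h for p, h in zip(predictions, human_labels))
    return correct / len(predictions)
```

Real evaluation pipelines add per-class metrics, inter-annotator agreement, and so on, but the human grading at the core is the same idea.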

In companies I have worked for that wanted to use AI, there was always a clause in the privacy agreement allowing employees to look at users' content in order to determine how well the algorithm was doing.

There were some differences: 1. PII was redacted (to the extent possible); 2. customers could opt out; 3. the AI wasn't core to the service.
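For point 1, redaction often amounts to masking obvious identifiers before a human reviewer sees the text. A deliberately naive sketch; real PII redaction is far harder than a couple of regexes:

```python
import re

# Illustrative-only redaction: mask things that look like email addresses
# or US-style phone numbers before a human reviewer sees the text.
# This is just the flavor of the idea, not a production approach.

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def redact(text):
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)
```

Patterns like names, addresses, or account numbers embedded in free text are exactly what such simple masking misses, hence the "to the extent possible" caveat.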

That said - sci-fi AI does seem to require humans behind the curtain.

Personally, I am more comfortable with a human being my virtual "AI" assistant than I am with real AI. I feel a human is less likely to make a mistake that will cause me personal damage; there is more responsibility/liability attached to doing a good job.

Sounds a bit like the Stripe (or is it Square?) story where every bank transaction was done by hand before they got the rights to automate it. I'm not surprised, and I find it rather smart considering all the money pouring into anything with a machine learning buzzword attached to it.

While deceiving your customers is maybe just wire fraud, deceiving your investors about this is securities fraud.

X.ai barely works and it was pretty obvious that humans are behind it. It would make so many one-time non-systematic errors when there was almost no variance in the text.

"The rise of 'pseudo-AI': how tech firms quietly use humans to do bots' work"

Does this include Uber's human drivers?

I can already see how AI assistants call AI assistants to book an AI-driven service for their AI bosses, to find out 0.3 seconds later that their AI boss, being an AI, doesn't need that service, calling again to remove the booking. Nobody notices that this happens, because no human is involved in the actual conversation. But of course both of these conversations lead to events where data needs to be transferred from the calendars to finance, billing etc. These transfers are done by humans of course, because AI would be too expensive for that.

I see people going to work 16 hours a day transferring these transactions related to haircuts for AI bosses that don't physically exist (and therefore don't need haircuts) from one database to the other. The AI boss doesn't see the need for 8h workdays, because it's not human. And the money for the work doesn't need to be enough to live on, right? The AI government will pay a base income to anybody anyways.

This seems completely logical to me. A new business needs humans to train their AI. So why not use them to bootstrap your business while you're at it?

When we have transfer learning or one-shot learning, it will be a different story.

Sounds reasonable as long as you are not lying about what the company does.

Under those conditions we would have never got chlorinated water, saving billions of lives.

This is a good way to quickly build an MVP to gauge customer demand before incurring the time and expense of building a real scalable product.

So, deceiving your customers about how their data is being handled is a “good MVP”. With this much cynicism you must be an “entrepreneur”.

Where is the deception? Why should the customers care how the service is implemented on the back end? Unless the service lies and claims the data will never be seen by humans then they're not doing anything unethical. All that matters is whether customers are getting value from it.

Because if a service offers to automatically tag my photos, I don’t expect random people looking at them. In fact, I don’t want anyone looking at my family photos other than those whom I explicitly gave permission to do so.

Same applies to my voice recordings. Same applies to my receipts. Same applies to my health data.

Unless your data is encrypted using keys that only you control then you have to assume that random people are looking at it. That is inherent in the nature of SaaS.

Technically yes. But since GMail became widespread, there’s a tacit agreement that my data won’t be looked at by random humans.

Is there? It's an inherent part of the service that Gmail software reads your mail (including until recently, for advertisement). How do you expect developers to work on that software without, at least occasionally, looking at the real inputs?

Sure, if you present it that way, it's fine. As in "I'll do xyz for you, somehow".

If you say you're taking people from A to B in a motor-powered cab but you actually come to pick them up in a rickshaw, that's wrong. People will naturally form expectations about the ride based on what you say.

So, the worst of both worlds -- inflexible responses that aren't even scalable to demand.

Fake it 'til you make it!

One issue with "artificial intelligence" is that the term is typically understood to mean something functionally similar, within its domain, to actual (i.e. human) intelligence. In reality AI is more akin to artificial cheese. Artificial cheese is not cheese made with different ingredients; it is something vaguely cheese-like made in a way completely dissimilar to real cheese.

See also: the front page of HN. You didn’t think that was an algorithm choosing which stories you see, did you? :)

It works well. And considering how much effect HN has on our daily lives, it’s neat that the algorithm reduces to “this human has good taste.”

I see no reason why HN won’t be around for decades. And that’s exciting. It’s the only newspaper that feels like a community.

This used to feel strange — if you think about the position of influence and power HN commands, it’s hard to feel ok about ceding control to a handful of people, no matter how benevolent. And on certain topics this has indeed been an issue — if a certain behavior or conversation isn’t tolerated on HN, it’s easy to feel like you’re a misfit who doesn’t belong in tech, or even that you don’t identify with the tech community.

One way to become comfortable with this situation is to trust incentives, not people. The only way HN wins is if HN stays fascinating. It’s why most of us keep coming back: to monitor the pulse of the tech scene, or to learn physics factoids[1], or to spot a new tool that saves you hours.

And that’s why HN can’t be fully automated. Which stories fascinate you? If you could write an algorithm to generate an endless stream of interesting content, you’d have invented an AI with good taste. And for the moment, that’s beyond our capabilities.[2]

[1] As the earth orbits the sun, the area swept out by the triangle formed by the sun->earth->earth3WeeksLater is equal to the area of the triangle during the next three weeks, and the next three weeks after that, and so on. Equal time periods = equal areas. That’s why the earth moves faster when it’s closer to the sun: it has to cover a greater distance in order to sweep out the same area as during the previous 3 weeks. This gives you an intuition about how gravity behaves. And with a slight tweak it also holds true for e.g. an ice skater twirling around. If you stick your leg out while spinning, your leg sweeps out more area than when it’s near your body. That’s why you spin more slowly: if you’re sweeping out more area per time period, your rotation must slow down proportionally. (This is conservation of angular momentum, dressed up in intuitive clothing.)
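The equal-area claim is easy to check numerically: simulate an orbit under an inverse-square force and add up the thin triangles swept out per fixed time interval. A rough sketch, in arbitrary units with GM = 1 and made-up starting conditions:

```python
# Rough numerical check of Kepler's equal-area law for an inverse-square
# force. Units are arbitrary (GM = 1); the starting position and velocity
# are invented just to give an eccentric orbit.

def sweep_areas(steps_per_interval=5000, intervals=4, dt=1e-4):
    GM = 1.0
    x, y = 1.0, 0.0        # start at r = 1
    vx, vy = 0.0, 1.2      # off-circular speed -> eccentric orbit
    areas = []
    for _ in range(intervals):
        area = 0.0
        for _ in range(steps_per_interval):
            r3 = (x * x + y * y) ** 1.5
            ax, ay = -GM * x / r3, -GM * y / r3
            # Symplectic Euler step: update velocity, then position.
            vx += ax * dt
            vy += ay * dt
            nx, ny = x + vx * dt, y + vy * dt
            # Area of the thin triangle sun -> old position -> new position.
            area += abs(x * ny - y * nx) / 2.0
            x, y = nx, ny
        areas.append(area)
    return areas
```

Every interval sweeps out the same area even though the planet's speed varies along the orbit, which is the footnote's point.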

[2] This is in contrast with sites like YouTube, which give an endless stream of interesting videos. Part of HN’s strength is its unified front page. We all see the same thing. And that’s why writing an algorithm to make the front page interesting is much harder than creating a personalized neural network trained to show you all the interesting things you haven’t seen yet.

Youtube's algorithm is designed to give the viewer stuff similar to what they have already seen, be it Goto Conference videos, Nazi rants, or wacky people doing fingernail designs. Youtube's AI is not designed to take chances and offer up something 'surprising'. 'Surprising' might be offensive, not enlightening.

"Similar" also might be offensive not enlightening.

YT has a problem with trolls and harassers posting reaction videos that YT gets tricked into thinking are "similar".

Not sure the Guardian understands how training AI works.

There is a difference between labeling training data and just using humans to do the work. Some things cannot be achieved yet even with lots of labeled training data but companies are pretending they have solved hard ML problems at a high level of performance when the technology and research aren't there yet.

But why should it matter for customers who does the job? I mean, if you don't tell anybody, and pretend it's 100% AI then it's bad, but if it "will eventually become AI", and your investors and everybody interested in the technical details know how it's actually done, then what's wrong? A true "AI" should be able to pass a Turing test, so for the customer, it should be indistinguishable, and shouldn't matter.

Of course, the privacy concerns are there, but then again, if it's a real "AI", then it may be worse for the computer to read your data than for a random low paid worker ;)

Because the business costs don't scale well unless they can invent technology to remove the humans, and for harder ML problems that currently isn't possible. The business is only based on being able to sell the hype of "AI" to investors.

Ask yourself the reverse question - if they already have a working, useful service that's human-powered, why are they lying by saying it's done by AI? Answer: because they're trying to get things they're not entitled to - like better funding, better sales, more attention. In other words, they're trying to cheat other people out of their money or time.
