Hacker News
Why AI is harder than we think (arxiv.org)
109 points by pilingual 9 days ago | 75 comments

I think another common fallacy is to assume that humans have more general intelligence than we actually do. This is manifest in three ways:

1. Assuming that we can build an AI that can do what a human does, without being embedded in the physical world and in constant communication with other humans in the same way we are. I think this is covered by "Intelligence is all in the brain" in the article.

2. Not considering that intellectual breakthroughs by humans are partly a product of chance: of millions, and now billions, of humans trying semi-random things. Obviously there's SOME intelligence behind it, otherwise we wouldn't have achieved more than other animals. But maybe we're overestimating how much of our results are the product of intelligence alone.

3. Not considering that we use pretty dumb heuristics to come to decisions. I think the paperclip maximizer is a silly example, because the decision of whether an AI should kill or cooperate with humanity to maximize the production of paperclips is probably undecidable. Coming to a clear decision probably requires more computing power than you could have on a single planet. We humans don't need to be certain about the outcome to make a decision. We have emotions like fear, anger, pride and jealousy to nudge us towards decisions like going to war with other people to grab their resources. We need to remember that we're a product of evolving in an environment where we had to compete with other humans for resources; that's why we're prone to those kinds of decisions. AIs will be the product of an environment of competing to please humans to gain computational resources, so the heuristics they develop will probably be strongly tied to achieving that goal.

Not that AIs couldn't be dangerous, but probably more because humans make active decisions to instruct AIs to harm other humans.

I like this refutation of the paperclip maximizer scenario, which ties in to your observation:

Let's say there's a runaway superintelligence equipped with an optimization function that says it should produce a maximal amount of paperclips. Then its objective, "produce a maximal amount of paperclips", is either defined literally or not.

If it's defined literally, e.g. "make your sensors return this data consistent with lots of paperclips having been produced", then it's far easier for the AI to corrupt its sensors with paperclip porn than actually destroy the Earth, and there's no problem.

On the other hand, if the objective is not defined literally, then the AI must be able to understand fuzzy instructions in the way humans intended them. In that case, it would be no problem to tell the AI not to be an ass either.

So the problem happens when the AI isn't intelligent enough, and uses heuristics instead of properly optimizing. The extreme case would be grey goo, which doesn't think at all, but just blindly consumes everything. Giving extreme power to something with limited intelligence is generally a bad move.

Thinking about it in terms of intelligence is actually misleading. It's not about intelligence, it's about power to change the world.

Take SARS-CoV-2. It is a small RNA program (~30 kb) that duplicates and optimizes itself by random search.

It managed to hijack more computing power than our best supercomputers have [0] and kill a lot of us in the process.

By my estimates, just the process of copying all SARS-CoV-2 virions in the world, with a reproduction rate of 1, takes at least 1 petaFLOPS and up to 100 exaFLOPS.

That's in the range of TOP500 supercomputers (1.3 petaFLOPS - 442 petaFLOPS) [1].
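The shape of that estimate is easy to reproduce with a back-of-envelope calculation. All the specific numbers below (active infections, virions per day) are my own illustrative assumptions, not figures from the comment or its links:

```python
# Back-of-envelope sketch of the "copying virions costs operations" estimate.
# Every number here is an illustrative assumption, not a measured figure.
genome_bases = 30_000       # SARS-CoV-2 genome length in nucleotides
bits_per_base = 2           # 4 possible nucleotides -> 2 bits each
active_infections = 1e7     # assumed concurrently infected people
virions_per_day = 1e9       # assumed virions produced per infection per day

bits_copied_per_day = (active_infections * virions_per_day
                       * genome_bases * bits_per_base)
ops_per_second = bits_copied_per_day / 86_400  # >= 1 op per bit copied

print(f"{ops_per_second:.1e} ops/s")  # lands in the petaFLOPS range
```

With these assumptions it comes out around 7 petaOPS, the low end of the commenter's range; pushing the assumptions up a few orders of magnitude reaches exaFLOPS territory.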

SARS-CoV-2 certainly has a lot of computing power and is still able to outsmart the whole of human civilization. Is that a super-intelligence?

Imagine if we manage to create an equally stupid program that figures out how to hijack the heat or electrical power of our civilization and feed it into its own growth.

Quite likely it would result in nuclear meltdowns all around the world.

Exponentially growing computational processes are dangerous whether or not they are intelligent, or even have any objectives besides just being.

[0] https://news.ycombinator.com/item?id=26646029

[1] https://www.top500.org/statistics/perfdevel/

Which the refutation makes clear: it's not about intelligence, because a sufficiently powerful intelligence will just outwit its objective function in the most efficient way possible (which is closer to "porn" than "destroy the Earth").

Grey goo is powerful not because it can outwit, but because it has so much brute force. On the other hand, intelligence is closer to efficiency of action: not requiring exponential time to solve something that's NP-complete, for instance.
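To make that "efficiency of action" contrast concrete, here's a toy example of my own (not from the thread): subset sum is NP-complete in general, and blind enumeration takes exponential time, while a dynamic program over reachable sums is pseudo-polynomial in the target value:

```python
from itertools import combinations

nums, target = [3, 34, 4, 12, 5, 2], 9

def brute_force(nums, target):
    # "Grey goo" style: enumerate all 2^n subsets and check each one
    return any(sum(c) == target
               for r in range(len(nums) + 1)
               for c in combinations(nums, r))

def dp(nums, target):
    # "Efficient action" style: track only the sums reachable so far,
    # roughly O(n * target) work instead of O(2^n)
    reachable = {0}
    for x in nums:
        reachable |= {s + x for s in reachable if s + x <= target}
    return target in reachable

print(brute_force(nums, target), dp(nums, target))  # both find 4 + 5 = 9
```

Both return the same answer; only the amount of work differs, which is the point being made about intelligence versus brute force.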

I don't think you can translate computing power like you do (a crypto miner ASIC handles a lot of bits per second, but zero FLOPS as it's all integer math; a nuclear bomb is a self-propagating reaction that alters the mass equivalent of lots of bits, but zero FLOPS).

But apart from that, I agree. And I just think the "beware AI, it will become so smart that it turns into the sorcerer's apprentice" fear is misguided, because it focuses on the wrong thing. It sure makes a good narrative though! That's why it's so popular.

Objective functions are just weak anthropocentric abstractions. If you take a random Rule 110 automaton or any other Turing machine, how can you know its objective function? What was the objective function of abiogenesis? Can we outsmart the objective function of life?

> I don't think you can translate computing power like you do.

Yes, certainly a human cell nucleus will not be a good general-purpose computer, and in this sense it's more similar to an ASIC. So take it as a lower bound on the computing power necessary to brute-force simulate the virus's replication. I think it should be accurate this way: classical computers will need at least n operations to physically copy n bits.

There are also arguments that biology is very close to the thermodynamic limits of computation [0] [1].

> Here we show that the computational efficiency of translation, defined as free energy expended per amino acid operation, outperforms the best supercomputers by several orders of magnitude, and is only about an order of magnitude worse than the Landauer bound.

[0] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5686401/

[1] https://en.wikipedia.org/wiki/Landauer%27s_principle

And yes, nuclear bombs are extremely useless as computational devices, but also extremely dangerous. They share the runaway aspect with viruses, but the good thing is that nuclear bombs self-destruct. Unfortunately, programs like viruses reach equilibrium with the environment and are extremely successful at continuing to exist under natural selection. There are also good reasons to believe that viruses are in a local maximum. The global maximum would be spreading to the whole universe.

Overall, it does not matter if a computation is efficient as long as it has access to enough power and stars provide ridiculous amounts of power.

The fear is the AI misunderstanding fuzzy instructions in an unpredictable way.

This is a good point which I think stems from wrongly equating human level intelligence to AGI in the popular literature. It’s not at all clear what a general intelligence should be and it’s much less clear that humans have general intelligence. In my view we have a set of very good innate priors (e.g. space/time continuity, intuitive physics) that have been optimized by evolution for thousands of years. These priors in turn allow us to learn fast from unlabeled data, but would anyone call such a system general intelligence? I’m not sure.

I call General Intelligence any system that wonders for no particular reason if others systems have General Intelligence.

IMO, there is currently no such thing as intelligence in any of the AI approaches. By saying that I don't mean they are not useful, but rather that all so-called AI models seem to me to be based on statistics, and those programs are certainly better and/or faster than humans at finding subtle common factors in a given dataset. But being able to solve some hard problems does not mean they are intelligent, as they simply cannot understand the problem(s). And thus they might give correct responses on par with or better than humans under normal conditions, but give wrong or even absurd responses when minor noise that humans cannot even detect is involved.

One essential point about this philosophical branch of AI research is to investigate where "intelligence" and "understanding" really begin.

Saying "the computer does not understand and thus is not intelligent" is a valid, but mundane belief, ignoring that entire research branch with its many unknowns and open questions.

That's why it's called "artificial intelligence". Once we've cracked general intelligence, which is seemingly just an emergent property of sufficiently complex information processing, it won't be "artificial" anymore. It'll just be intelligence.

I've heard this "just an emergent property" theory before, but I don't get it. What evidence do we have of this?

It seems like we make complex information processing systems all the time, and the more complex they are, the more fragile they are. Even the tiniest thing going wrong sends the whole system into chaos or a dead stall. Thermodynamics teaches us that order (and intelligence is largely about imposing order) takes a huge amount of work and is inherently unstable.

I don't think humanity has ever made anything of complexity comparable to even a simple brain.

Arguably not even a single cell.

As a student of philosophy and a former middle school teacher, my experience leads me to think that if we could make AI, it would only manifest in a "society" of AI that are all growing and adapting together. They would communicate with each other at speeds so fast that communication with them would be almost impossible.

Also, they'd have language that would be changing so fast we couldn't learn it.

I think the real AI is human society, not individual humans. We're just individual neurons in that network, taking information from other humans and propagating it. The fact that we die is how the AI garbage-collects storage space. We had to invent our own storage-cheap means of propagation, via writing, as the garbage collector is constantly coming for all of us. This way, data not useful to the AI swarm (what I ate for breakfast) doesn't have to be stored for very long before being freed up by the GC.

However, we could pause the AI simulation sometimes and send in researchers to go over the data, by first learning their propagation languages and then reading through likely millions of years of history and science to see what they've learned.

We could possibly give them a world like ours, with bodies like ours, but all simulated, and see if they discover something useful for us, like lightspeed travel or something. The chances are pretty high, though, that we'd miss some essential rules and it wouldn't be applicable.

The idea though that we'd make a solo intelligence is bizarre. What would happen if you raised a human child alone for millions of years? They'd be insane and feral. This is what would happen if we could make an AI: it would likely run so fast it would have been trapped alone for millions of years by the time we say hello. Maybe it's trapped alone watching TV, but that doesn't mean it can speak.

Also what are its desires? Why would it have any interest in doing anything but watch TV forever?

We often forget that we're a huge host of competing desires. Our desire to interact is carefully balanced between cooperation and competition, sometimes bundled together so tightly we can't even unpack it.

AI is much harder than most people realize.

Yes! I like the garbage collection analogy and we already know that children brought up without parents do badly. Why would an AGI be any different, let alone an AGI brought up in virtual chains for safety's sake. What a misguided notion of safety that would be.

>The idea though that we'd make a solo intelligence is bizarre.

Yes, even our outlier general intelligences (i.e. creative geniuses) got that way because they somehow reproduced more of the surrounding culture inside their heads than everyone else did. It's misleading to think of that as solo brilliance.

>They would communicate with each other at speeds so fast that communication with them would be almost impossible.

In addition to pausing, we could also slow the simulation down for periods of communication. But given how long it took humans to evolve from single-celled organisms, and later to develop hand axes, etc, we may find that our potential AGI culture also starts off very very slowly.

That sounds like a personification of AI. Humans form societies because we’re social animals who can leverage each other’s skills and time that way. We’re social animals because we’re animals. We get strange and feral alone because we’re evolved to not be alone.

None of that applies to an artificial or emergent intelligence that isn’t human. It doesn’t necessarily need others. Its version of a society might be cloning itself and then reabsorbing itself. Or not bothering with cloning and simply spending 100 billion years exploring solo. Why wouldn’t it? It’s not a mammal made of water, carbon and salt.

The idea is the Singularity isn’t just that AIs will be far beyond us. It’s that they’ll be like nothing we can guess at or use analogies for.

It’s safe to say there are some things in the universe that mammal brains simply can’t perceive, understand, predict, or accept.

Perhaps strong AI will require a real body, in real reality.

I agree. I think consciousness is also a social phenomenon (can you think of a single thought that doesn't have a social aspect?)

I've had similar thoughts. We are the AI. Which implies that reproduction still produces the best computing available.


I'm not sure what you mean by this comment. GPT2 is a precomputed web of associations between words that can produce a simulacrum of human speech. It's not used for actual propagation of anything. It's a Looney Tunes painting of a train tunnel, not an actual train tunnel. GPT2 "AI" isn't trying to "tell us anything" any more than a magic 8 ball is trying to tell us anything.

JackMorgan, I meant that your comments read like those generated by GPT2.


You're saying it isn't easy to build a system that, after about five years of learning a myriad of other things, can lace its shoes after about a dozen tries, having been shown just twice?

That at the same time can explain why it is a good idea to lace them, if wearing them at all?

That stubbornly insists that it is much more fun to wear no shoes when walking in the wet sand at the beach?

That, one or two decades later, is bright enough and a lot better educated, and can grok applied knot theory and some basic topology with the help of some YouTube videos?

And that with just 20 Watt?

Wouldn't have thought that.

It took me way more than "four or five tries" to learn to tie my laces. From childhood to about age 30, I was tying a pair of half-hitches, sometimes making a granny, rather than using the superior "bunny-in-a-hole" method. I only learned the better method when I had to teach my kids.

Whether the "bunny-in-a-hole" method is superior is quite debatable, sir! ;-)

Funny thing is, I had to invent the two-loop shoelace knot/bunny-ears/bowknot myself, because I couldn't be bothered to understand the bunny-rabbit/loop-swoop-and-pull.

Of course you have to do it right or you end up with an unbalanced granny knot, which is of course not acceptable ;-)))

Until my personal groundbreaking knot invention I used velcro, and sometimes a double simple knot, for three or four years. Since velcro went out of fashion for inexplicable reasons, I was under pressure to change my modus operandi...

Ah, and of course the influence and dynamics of raising children over a long period of time on intelligence is something seldom considered in AI.

Worth emphasising that the knowledge of how to do these things like tying shoelaces and writing topology papers is contained in the surrounding culture. So if a machine can learn this culture, much of the complexity is outsourced. The ability to learn just one piece of existing culture conveys the ability to learn any other part of it, if one is inclined to do so.

>And that with just 20 Watt?

It is amazing; however there's a school of thought that just as we evolved brains in order to reduce physical effort generally this included minimising power consumption by the brain itself. Intelligence was then a by-product of a more glucose-efficient brain!

'emphasising that knowledge...is contained in the surrounding culture'

This! And some more that is not in the brain. Not to mention what is in the brain but isn't considered intelligence, like emotions, which control large parts of the planetary biomass. Fear, for example, is a simple automatism that for the most part you don't need to replace with more 'intelligent' approaches, and it still has much more influence on human behaviour than abstract intelligence, whatever that may be. If it exists at all.

Personally, and I emphasize I am thrilled by what the AI crowd has done over the last few years, kudos to that, they look to me like someone quite bright who pretends, or even believes, to understand the inner workings of a computer by simulating its GUI. If I'm right, my criticism would be: that's great, and you're making really fun toys and gadgets and tools, but don't sell it to me as intelligence. That's just a marketing ploy.

That energy optimising theory of the brain you're mentioning sounds interesting, have to ponder that.

Many thanks!

Regarding this topic, I came across the subject of Open-endedness [1] and the fascinating works of Jeff Clune, Ken Stanley and their colleagues (the first two are currently working at OpenAI).

EDIT: I added this paper [2] by Jeff Clune, a nice introduction to Open-endedness and its potential for reaching general AI.

[1] https://www.oreilly.com/radar/open-endedness-the-last-grand-...

[2] https://arxiv.org/abs/1905.10985

Yeah. Have a look at this for a good summary of the ideas: https://www.youtube.com/watch?v=lhYGXYeMq_E

I wonder if general artificial intelligence will one day be considered the digital equivalent of the philosopher’s stone. For centuries, alchemists made it their life goal to produce the stone, each attempt resulting in utter failure.

There's a huge difference: no natural process observed by alchemists acted as a philosopher's stone. They sought to find one, but never did. On the other hand, I can observe intelligence in the sentence you have written, and maybe you can observe intelligence in this one that I have written. Natural intelligence does exist; the questions are how it works, and what the barriers are to it existing in other substrates. AI doesn't have to try to create intelligence to answer well-posed scientific questions - the alchemists didn't get to that point until people started building nuclear reactors.

At first glance, this reads like a "keynote paper" to me. Does anyone know if this is correct and if there is, or will be, a recording?

It will be a keynote at GECCO 2021 (https://gecco-2021.sigevo.org/Keynotes).

"AI is harder than we think"

of course depends a bit on who "we" is. The paper seems to focus a bit on the types who thought it's a bit of symbol manipulation you can do in lisp which has always seemed a bit dumb. Then the "it'll never happen" crowd seems dumb as well, given that people do think and seem to be built from atoms and stuff obeying normal physics.

The more sensible view it seems to me is to compare neural computation to what we can do with silicon and then project on a Moore's law type assumption as to when it would be likely, in the manner of Moravec and Kurzweil, which has always put the date around 2030-2040. And obviously it'll be hard, the way a moon landing is hard, but, like that, probably not impossible.

Moravec's 1998 paper had "it is predicted that the required hardware will be available in cheap machines in the 2020s" and then I guess we need the software.
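A Moravec/Kurzweil-style projection is just compound growth. Every number in this sketch (brain-equivalent ops/s, ops per dollar, doubling period) is an illustrative assumption of mine, chosen only to show why such extrapolations tend to land in the 2030-2040 range:

```python
import math

# Toy Moravec/Kurzweil-style extrapolation. All figures are illustrative
# assumptions, not numbers taken from Moravec's or Kurzweil's work.
brain_ops = 1e16             # assumed ops/s for brain-equivalent compute
ops_per_dollar_2020 = 1e10   # assumed ops/s per dollar in 2020
doubling_years = 2.0         # assumed Moore's-law doubling period
budget = 1e3                 # a "cheap machine": $1000

# Number of doublings needed for a $1000 machine to reach brain_ops
doublings = math.log2(brain_ops / (ops_per_dollar_2020 * budget))
year = 2020 + doublings * doubling_years
print(round(year))
```

The exact answer moves by a decade or so as you vary the assumptions, which is exactly why these predictions cluster loosely rather than pinpoint a date.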

It's true that those two approaches are "a bit dumb", but the statement

> The more sensible view it seems to me is to compare neural computation to what we can do with silicon and then project on a Moore's law type assumption as to when it would be likely

is no less dumb imo. What makes you say this? What's the evidence? What neural network system has achieved something even close to being on a path to AI?

While I agree, one of the biggest stumbling blocks is the engineering challenge. The architectures we're using nowadays for machine learning are terribly inefficient compared to the human brain (energy- and performance-wise). And we're already hitting the wall on how far we can shrink feature size and raise frequency, so a completely novel approach would be needed if we want to use the resulting AI outside of datacenters.

Given the recent developments, I don't think it's impossible to use laboratory-grown neurons for such development, although it's hard to imagine something like this appearing between 2030 and 2040.

We also don't have the right computing machine. Each neuron integrates signals from hundreds of other neurons and then propagates its signal to another set of hundreds of neurons. I'm not sure how that compares to what a computer does, but I'm fairly certain it's not the same.
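The "integrate hundreds of inputs, then propagate" behaviour being described is roughly what the textbook leaky integrate-and-fire model captures. This sketch is nowhere near real biophysics, and all parameters are arbitrary choices of mine, but it shows how unlike a CPU instruction stream the computation is:

```python
import random

random.seed(0)
n_inputs = 300  # hundreds of upstream neurons
weights = [random.uniform(0.0, 0.02) for _ in range(n_inputs)]

v, tau, threshold, spikes = 0.0, 20.0, 1.0, 0
for t in range(1000):  # 1000 discrete time steps
    # each upstream neuron fires with probability 0.05 this step
    active = [random.random() < 0.05 for _ in range(n_inputs)]
    drive = sum(w for w, a in zip(weights, active) if a)
    v += -v / tau + drive    # leaky integration of the summed weighted input
    if v >= threshold:       # threshold crossing: emit a spike and reset
        spikes += 1
        v = 0.0

print(spikes)
```

There's no program counter anywhere: the "output" is just the timing of threshold crossings driven by hundreds of noisy inputs.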

Yes, I very much hope we develop in this direction, maybe also using synaptic transistors etc., and both neuroscience and computer science could greatly benefit from this.

I don't think this is too bad, since we care much more about prototyping at this stage. If some particular AI approach/architecture becomes 'good enough' for many applications, it wouldn't be long until ASICs for that calculation would arrive; probably not as good as the brain, but certainly orders of magnitude better than general-purpose CPUs/GPUs. There's also ongoing work involving ML with low-precision floats, etc. which can make such circuitry even more efficient.

The reason we don't really see this today is that recouping the investment for designing and fabbing such ASICs may take a few years, and they would probably be obsolete by then, given today's rapid changes. For example, a few years ago there might have been a clear business proposition for putting ASICs dedicated to convolutional neural networks into cameras (even camera phones); yet CNNs now seem to be phasing out in favour of transformers; and it's not at all clear what the "best" transformer is yet (e.g. look at all the different approaches to making them O(n) memory!)

>> The paper seems to focus a bit on the types who thought it's a bit of symbol manipulation you can do in lisp which has always seemed a bit dumb.

About half of the paper is about modern deep neural networks, but can you please give a few examples of the kind of AI you say is "a bit of symbol manipulation you can do in lisp"? Because I would say this is an extreme oversimplification born of terminal unfamiliarity with the subject, beyond what may be commented on Twitter.

For instance, remember that Deep Blue, despite its name, had nothing to do with deep learning or machine learning of any kind, and was a manually programmed system, yet I doubt anyone would sensibly describe it as "a bit of symbol manipulation you can do in lisp".

But please give examples of what you mean.

Symbolic AI is a well known topic.


As the article notes

> Symbolic AI was the dominant paradigm of AI research from the mid-1950s until the late 1980s.

And, yes, Deep Blue was just "a bit of symbolic manipulation you can do in Lisp" at a very large hardware scale. It simply brute-force calculated the value of as many position trees as it could, up to ply 8, and the "value calculator" was a set of rules based on human expert input.

Creating Deep Blue's hardware in 1997 was the most impressive achievement.

Anyway, symbolic AI may still potentially be the path to AGI, but obviously ML/DL techniques utilizing very large datasets and very large architectures have borne a lot of useful fruit.

>> And, yes, Deep Blue was just "a bit of symbolic manipulation you can do in Lisp" at a very large hardware scale.

Modern chess engines use the same approach as Deep Blue, on much smaller hardware. IBM certainly tried to position Deep Blue as a triumph of large hardware, but its success is widely recognised as being owed to its minimax algorithm with alpha-beta cutoff. I can't imagine anyone who would call minimax "brute force": it's a search algorithm, and alpha-beta is a heuristic pruning technique. It's very hard to see pruning of any kind as "brute force"; rather, that's the whole point: you prune a search tree to avoid an exhaustive search.
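For anyone unfamiliar with it, minimax with alpha-beta cutoff fits in a page. This is a toy sketch of the generic algorithm, not Deep Blue's actual implementation (which added hardware evaluation, an opening book, and much more):

```python
import math

def alphabeta(node, alpha, beta, maximizing):
    """node is either a leaf value (a number) or a list of child nodes."""
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cutoff: the minimizer will avoid this branch
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:
                break  # alpha cutoff: the maximizer will avoid this branch
        return value

# A tiny 2-ply game tree: the maximizer chooses among three minimizer nodes.
tree = [[3, 5], [6, [9, 2]], [1, 2]]
best = alphabeta(tree, -math.inf, math.inf, True)
print(best)  # 6
```

The cutoffs prune branches that provably cannot change the result, which is why calling the approach "brute force" undersells it.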

The evaluation function was hard-coded, yes, according to human expertise and chess theory. Note that Deep Blue also had an extensive "opening book" which if I remember correctly made it possible to play a very strong early game.

In any case, it took a few decades to create a system like Deep Blue, and it was far from "a bit of symbolic manipulation in Lisp" scaled up. I recommend the Adversarial Search section in Russell and Norvig if you want to develop a more thorough understanding of the relevant approaches (probably the fourth edition will do).

>> Creating Deep Blue's hardware in 1997 was the most impressive achievement.

Can you point to other achievements in symbolic AI that were not as impressive as Deep Blue? Do you know of any others that you could compare to Deep Blue?

Is symbolic AI dead nowadays? All the AI papers I'm seeing are on machine learning.

Have you looked at recent publications in AAAI and IJCAI? Those are the big conferences of AI without specific focus on machine learning.

You may also want to have a look at KRR (Knowledge Representation and Reasoning) and one of the robotics conferences, whose names unfortunately I don't remember by heart. Again, those don't tend to focus on machine learning.

Finally, there is such a thing as symbolic machine learning. For example, see the IJCLR conference (unifying a bunch of conferences from the symbolic machine learning field, like ILP, STAR, NeSy etc).

They are referring to “classic AI”, as exemplified by SHRDLU¹ and Cyc².

1. https://en.wikipedia.org/wiki/SHRDLU

2. https://en.wikipedia.org/wiki/Cyc

The capabilities of SHRDLU are still unsurpassed by modern NLP systems, unfortunately.

Cyc is a different matter. I don't really know how close the project is to achieving its goals, and it has always seemed like a bit of a futile moonshot to me, but it's certainly not "a bit of symbolic manipulation you can do in lisp". If nothing else, the sheer scale of a project that has been going on for almost 40 years should cause a reduction in the arrogance and brashness of proclamations about it.

1. Moore's law isn't on track.

2. Maybe neurons do internal computation, meaning brain complexity is under-estimated.

Link to paper: https://arxiv.org/pdf/2104.12871.pdf

TLDR: Fallacy 1: Narrow intelligence is on a continuum with general intelligence (solving an easy AI problem doesn't immediately lead to being able to solve a hard AI problem)

Fallacy 2: Easy things are easy and hard things are hard (ie AI is hard)

Fallacy 3: The lure of wishful mnemonics (ie people tend to give parts of a computer program anthropomorphic labels, eg "understands", that don't really apply)

Fallacy 4: Intelligence is all in the brain (our thoughts are grounded in, or inextricably associated with, perception, action, and emotion, and our brain and body work together to produce cognition)

With regard to fallacy #2, it is in fact the other way around, i.e. "easy things are hard", also known as Moravec's paradox. The symbolic AI generation always assumed that really hard problems like language understanding could be solved entirely through easy symbolic manipulation techniques. At the same time, the current generation of AI researchers claims that neural networks and gradient descent are all you need to solve such problems. But both groups seem to forget that such problems are barely touchable using a single approach, and encompass a large variety of different subproblems that may need wildly different methodologies to solve them.

I think fallacy 2 is the big one, and not just in the AI field. Sometimes a simple lookup table (or some other 'dumb' model) works well on tasks that humans generally consider 'hard'.
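A toy illustration of my own: normal-play Nim looks 'hard' to most humans, yet a brute-force-filled win/loss table answers any small position instantly, and the 'dumb' table ends up looking competent:

```python
# Win/loss table for normal-play Nim (the player who takes the last object
# wins), filled by brute-force recursion with memoization. Once filled, the
# table answers positions by pure lookup -- no "intelligence" at play time.
table = {}  # maps sorted pile tuples -> "is this a win for the player to move?"

def winning(piles):
    piles = tuple(sorted(piles))
    if piles in table:
        return table[piles]
    # a position is winning iff some move leads to a losing position
    result = any(
        not winning(piles[:i] + (p - take,) + piles[i + 1:])
        for i, p in enumerate(piles)
        for take in range(1, p + 1)
    )
    table[piles] = result
    return result

print(winning((1, 2, 3)), winning((1, 2, 4)))  # False True
```

(1, 2, 3) is a known losing position for the player to move, and the lookup table gets it right without anything resembling understanding.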

Fallacy #1 is mostly just a proposition/conjecture.

The current trend is for these "narrow" AI systems to become more and more general as the technology gets better.

So far, we haven't seen the end of that trend. It's anyone's guess whether we're going to run into the proposed hypothetical barrier.

Fallacy 4 is wrong. There are good reasons to believe intelligence can be "disembodied". The primary one is that signal mappers decouple.


(Part of) intelligence is about applying acquired knowledge to a new situation. Can a disembodied "intelligence" even encounter new situations?

"Disembodied" means I/O interface is reasonably clean and not entangled, it does not mean there is no I/O.

Whatever AI is or will be, it won't be written by us. Architectures designed by deep learning methods are often better and simpler than those designed by humans. We just lack the tools to run the search.

I think that AI is a function. Give a person two similar inputs and the output will often be similar. I have situations where I am presented with the same problem one month apart and my answer is exactly the same.

While I agree with the general point of this paper, I don't think it's quite right to compare the current situation with the last AI spring. It's not AGI, but it's very good narrow AI that has real commercial value right now. The systems back then did not, to the same extent, and for this reason I don't see funding drying up for current ML approaches.

I just read this over lunch, good read.

Made me think of the AAAI conference in 1982 when we all went home with bumper stickers saying "AI: It's For Real".

I agree with most of Melanie's points. AI is so very much more than deep learning. We need better metrics for measuring progress towards real AI, common sense knowledge, etc.

Does the author refer to the whole AI community, when using the word "we"?

From the abstract:

> In this paper I describe four fallacies in common assumptions made by AI researchers, which can lead to overconfident predictions about the field.

I'm getting 403 Forbidden errors?

so when will AI be ready for self-driving? At least in cities.

I personally believe it's doable. But it's not going to happen using machine learning, and certainly not with quantum computers. I think you just need plain old code. It's just really complex to understand how a brain works. I mean, words, actions, experiences and concepts all have to work together. Let alone emotions, perception of others, projecting the future and a value system. But even that said, I'm sure it's doable.

Just a heads-up, all of your comments seem to get shadow banned. I'm not sure why, maybe you should contact dang.

Thanks for the heads-up. I was already suspecting this for a while. Who/what is dang? You mean the HN user named dang?

Yes, dang is a (the?) moderator on HN. Send a mail to hn@ycombinator.com and he'll (hopefully) sort it out.

This paper is written by a non-technical author, criticising AI based on predictions made by non-technical pundits.

By concentrating on binary outcome predictions (Do we have "full self driving cars" available to the general public?) it misses the real progress made in a huge number of areas.

For example, one of the claims this paper holds to be false is Zuckerberg's 2015 declaration that:

> One of [Facebook's] goals for the next five to 10 years is to basically get better than human level at all of the primary human senses: vision, hearing, language, general cognition

There's little doubt that AI systems are already better than humans in vision and hearing and there is very clear progress in language. Cognition is ill-defined, but on most benchmarks attempting to measure this there is steady progress too.

I think the real reason AI is harder than we think is because any time progress is made, humans redefine AI as "not that thing we just solved"

Take Marcus' claim that charades is too hard for AIs. I think models like OpenAI's DALL-E show clear progress towards developing the kinds of techniques needed to solve this, and if there were a benchmark for it, I bet computers would outperform humans in less than 5 years.

The paper criticises the benchmarks that are typically used to show that "AI systems are already better than humans in vision and [speech processing]" etc.

For example, given state of the art in current benchmarks on language understanding, if those benchmarks really did measure language understanding, we could all be AIs debating the paper that itself could have been written by AI. Suffice it to say, this is not likely given the current level of "understanding" in modern systems.

The paper refers to various pieces of work within natural language processing and other sub-fields of AI that analyse the weaknesses of such benchmarks and investigate the ability of modern systems to beat benchmarks by finding shortcuts or exploiting surface statistical regularities etc. I recommend, for example:

T. McCoy, E. Pavlick, and T. Linzen. Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), pages 3428–3448, 2019.


> For example, given state of the art in current benchmarks on language understanding, if those benchmarks really did measure language understanding, we could all be AIs debating the paper that itself could have been written by AI. Suffice it to say, this is not likely given the current level of "understanding" in modern systems.

I'd note that I said "clear progress on language understanding".

> we could all be AIs debating the paper

if those benchmarks really did measure language understanding in terms of the level of proficiency between the researchers, then it's no surprise that the results are very different. The next chart shows the results of the most common tests for language comprehension. If we look at the results of the different tests, then we see that the average proficiency is very high (12.9 out of 20). If we look at the results of the tests for other things, then the average proficiency is very low. Again, the charts are very similar to the results of the two separate results of the two separate tests for language comprehension. So, the question is, what does this tell us about the impact of language learning on literacy? I'll give you what I think is the relevant point. I think the key point here is that language learning is the process of learning new things. So, there are some things that are learning in the language.

The text here was generated by a GPT-2 based model (aitextgen). I think there is a fair argument that it isn't far off the average level of discourse on HN.

I think you're being unnecessarily mean, rather than "fair". Anyway I'll take the opportunity to point out that one reason why language understanding benchmarks are not adequate for the task is that it's very difficult to know what a system "understands" (if anything) just by looking at its output. It is also very difficult to evaluate language _generation_ tasks accurately. Finally, I have no idea how you could convince me that the above was not written by a human, rather than generated by a language model, if I didn't believe you. Which is to say, there are no good ways to know this kind of thing for sure.

At the end of the day, we have a bunch of metrics that don't measure what we want them to measure and a bunch of systems that don't learn what we want them to learn, and that are very good at gaming any metric we throw at them. The end result is a lot of uncertainty regarding true capabilities of those systems. And when careful scholarship is turned to the analysis of those systems' results, it tends to find that they're not as good as the metrics suggest. I'm repeating the point made by the article, but I agree with it very much.

Edit: the paper I link to above proposes a new benchmark called HANS (Heuristic Analysis for NLI Systems) that tries to correct for learned shortcuts in language models. It finds that BERT (a state-of-the-art language model) performs dismally on that benchmark. That's one datum. The other is that I haven't so far seen results on HANS reported in papers or benchmark aggregators etc. Language modelling work likes to cite results on e.g. GLUE, SuperGLUE, etc., but these are exactly the kinds of benchmarks that are full of loopholes for the current approaches to exploit (even though they're not supposed to be).
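The kind of shortcut McCoy et al. diagnose can be illustrated with a toy "classifier" (my own hypothetical sketch, not their code): an NLI model that predicts entailment whenever every hypothesis word appears in the premise will look fine on many easy pairs, but fails on HANS-style counterexamples where lexical overlap does not imply entailment.

```python
def overlap_heuristic(premise: str, hypothesis: str) -> str:
    """Toy NLI 'model' relying purely on lexical overlap:
    predict entailment iff every hypothesis word occurs in the premise."""
    premise_words = set(premise.lower().split())
    hypothesis_words = set(hypothesis.lower().split())
    return "entailment" if hypothesis_words <= premise_words else "non-entailment"

# Looks right on a typical easy pair:
print(overlap_heuristic("The lawyer saw the doctor", "The lawyer saw the doctor"))
# -> entailment (correct)

# But a HANS-style pair exposes the shortcut: the gold label is
# non-entailment (it was the doctor who danced), yet the heuristic
# still answers "entailment" because every hypothesis word overlaps.
print(overlap_heuristic("The doctor near the actor danced", "The actor danced"))
# -> entailment (wrong)
```

A model that has merely internalised this kind of surface regularity can still post strong numbers on a benchmark whose examples rarely contradict the heuristic, which is exactly why HANS constructs examples that do.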

In the HANS paper I'd point out this bit:

"we retrained each model on the MNLI training set augmented with a dataset structured exactly like HANS (i.e. using the same thirty subcases) but containing no specific examples that appeared in HANS.... In general, the models trained on the augmented MNLI performed very well on HANS"

I think it's great to find examples where language models break. But it doesn't look like they have found a fundamental problem here - just a weakness that can be fixed with "just" engineering.

I do agree there are things we don't know how to do yet. I think Chollet's "On the Measure of Intelligence" paper[1] is the best writing I've seen on this; it shows specific things that are currently hard for computers to do and gives a route towards general intelligence.

[1] https://arxiv.org/abs/1911.01547

Just wanted to point out that I don't think she's a non-technical author https://en.wikipedia.org/wiki/Melanie_Mitchell

Also, the paper mainly comments on the work and sayings of AI and CS researchers like John McCarthy, Stuart Russell, Andrew Ng, Geoff Hinton, Hans Moravec, Drew McDermott, Claude Shannon, Alan Turing etc., i.e. hardly "non-technical pundits".

I think computers have been better than humans at particular things the whole time. And when they are better at all the particular things, it still doesn't result in common sense. That's one of the fallacies.

So, am I right that the 'non-technical author' part means she doesn't sell AI?
