DeepMind says reinforcement learning is ‘enough’ to reach general AI (venturebeat.com)
238 points by webmaven 10 days ago | 299 comments





Saying RL is sufficient to (eventually) achieve AGI is a bit misleading. One might similarly state that biological evolution is sufficient to (eventually) achieve biological general intelligence.

Both statements are probably true, but the parenthetical (eventually) is doing an awful lot of heavy lifting.


I think the title of the paper makes more sense if you consider that ten years ago, someone could have written a paper in a similar spirit with a different take on "what is enough". Back then, it would probably have been titled: "Backpropagation of errors is enough".

The last ten years have shown that backpropagation -- while a crucial component -- is not enough. Personally, I would not be shocked to find out in the next ten years that reinforcement learning is not enough for an AGI (as there are aspects like one-shot learning, forgetting, sleep, and other phenomena for which the RL framework seems not a natural fit).


Ten years ago we didn't even have AlexNet; I think most people would have thought a paper like that was nuts at the time. The ten years since are what popularized backpropagation as a path to general intelligence. Who ten years ago would have seriously predicted GPT-3? The odd few that did are certainly not the people I would expect to have been dissuaded! And if there's any actual experimental evidence that backpropagation is not enough, I haven't seen it.

Backpropagation was the model for AGI in the 1980s if not earlier. Of course computing power made it impossible for anything to actually deliver AGI.

RL can forget, just start training it on a dataset that is different from what it was originally trained on.

You are right; I should have been more specific. RL does forget in the simplest sense, i.e. that certain weights in your model drift away if the data distribution is non-stationary. Humans seem to be a bit more targeted in their forgetting.
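The "weights drift away" kind of forgetting described above can be sketched with a toy example. This is only an illustration under stated assumptions: it uses a one-weight linear model trained by plain SGD rather than an actual RL agent, and the two tasks, the learning rate, and the names `train` and `mse` are all made up for the example.

```python
def train(w, data, lr=0.1, epochs=200):
    """Plain SGD on squared error for the one-weight model y = w * x."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # d/dw of (w*x - y)^2
            w -= lr * grad
    return w


def mse(w, data):
    """Mean squared error of the model y = w * x on a dataset."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


# Two hypothetical tasks with conflicting targets.
task_a = [(1.0, 2.0), (2.0, 4.0)]    # generated by y = 2x
task_b = [(1.0, -1.0), (2.0, -2.0)]  # generated by y = -x

w = train(0.0, task_a)
err_a_before = mse(w, task_a)   # close to zero: task A has been learned

w = train(w, task_b)            # the data distribution shifts to task B
err_a_after = mse(w, task_a)    # large again: task A has been overwritten
```

Nothing here decides what to forget. The single weight simply follows whatever data arrived most recently, which is the untargeted forgetting the comment contrasts with human forgetting.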

Why are forgetting and sleep relevant? If someone invented a pill that gave you a perfect memory and removed the need to sleep, would you stop being generally intelligent if you took it?

Possibly. A database lookup on a million rows is very different from a lookup on a trillion. Both have solutions, but the Perl hack that is our mind may lock up on a bigger data set.

One of the postulated reasons for why older people have worse reaction times and think slower than their younger counterparts is that the neural networks they use draw upon more stored information, thus making routine evaluations take longer.

There's a sweet spot between knowing enough and knowing little enough so that you get the right answer and get it quickly enough.


I guess you haven't heard of ISRIB (integrated stress response inhibitor). This drug, when given to mice, causes both old and Alzheimer's-riddled mouse brains to return to near normal. Interestingly, the patent is now held by an ALPHABET company. Not saying there isn't a limit to our storage capacity, but this drug makes it clear that age alone doesn't prevent the brain from working well.

That's a weird claim. Why not just assume old people are slower minded for the same reason they are physically slower: physical degradation?

Well, if the state of the world changes, then hanging on to what you learned in the past can cause you to do the wrong thing. Sure, there is an old proof that the value of (true) information is greater than 0, you could say, but they could also remember that the state of the world has changed, so there is nothing bad about remembering, or the model could just discount data by how old it is, etc. All true. But the representation becomes more and more complex. I certainly find that I have to pull back and tell myself, wait, the world has probably changed since I learned that, hasn't it? Has it?

Maybe not, Chesterton's Fence applies: https://wiki.lesswrong.com/wiki/Chesterton%27s_Fence

Probably because if we didn't have the ability to forget and turn off often we would pretty quickly kill ourselves to end the horror of our existence?

Because they are strongly associated with all known examples of generalized intelligence. Why wouldn’t they be relevant?

> Because they are strongly associated with all known examples of generalized intelligence

Correlation != Causation. While they very likely might be relevant, I've not seen anything to conclusively prove that they are. The ability to forget is important to humans because we are emotional beings, but I don't think that necessarily is a requirement for generalized intelligence. "Sleep" (as in what happens during sleep, not specifically the act itself), on the other hand, is very likely important, but again, not proven.


I completely agree that there is no conclusive evidence. However, it is the number one activity that influences cognitive performance in all animals that have been tested. Saying that sleep is not proven to be important for intelligence because the causal link has not been established seems a bit like saying exercise has nothing to do with muscle growth, because the full causal chain has also not been established (we do know a pretty full causal chain in this case, so maybe it is not the best example). I think if you were a betting man, and as a scientist you have to be to some degree, you would put your money on sleep supporting some essential process for intelligence.

That is if you believe biological general intelligence is the end goal of evolution, which I believe is highly unlikely.

Intelligence is simply a special side-product of evolution, there is nothing general about general intelligence. Many organisms can thrive without it.

There is also a non-negligible chance that all organisms would die out before reaching intelligence. We are fortunate to live in a world that produced us.


That's a bit beside OP's point though, which is about vacuous claims. Humans are the existence proof that there is some sequence of circumstances where evolution reaches GI. There's an analogous sequence of circumstances in the RL case, which happens to be the hard part.

> That is if you believe biological general intelligence is the end goal of evolution, which I believe is highly unlikely.

I would agree, but might add that evolution doesn't have 'goals'.

Is that the point you were trying to make?


Not OP, but yeah, evolution doesn’t have goals in the same sense that people do, just like gravity doesn’t “want” to pull things, it just kind of “is”, and simply acts as reality permits based on prior and current conditions. That’s reasonable to say.

Convergent evolution exists for at least some adaptations though, like the eye. It’s not unreasonable to think that there may be some sort of equivalent convergence which creates a high general intelligence adaptation given enough time, at least for social creatures.

I think it’s pretty much impossible to know whether intelligence is a convergent adaptation without some kind of perfect simulation of evolution over billions of years. You’d have to tweak starting conditions and see if you kept getting smart creatures.


Ah. So that’s why we exist. I was wondering.

Depends, if any of the laws of physics were off by a billionth of a percent, there would be no human intelligence (or carbon life, or atoms).

There are many reasonable assumptions one could draw from the fact.


Anthropic principle

This assumes a multiverse which is interesting, because it leaves open the possibility that we are in one of the infinite universes(pl) that does have intelligence as its goal. :)

> biological evolution is sufficient to (eventually) achieve biological general intelligence

Says nothing about this:

> biological general intelligence is the end goal of evolution


I mean, if the end goal is to propagate the organism, surely intelligence will be helpful to this - interplanetary scale

But until that actually happens, the possibility of it maybe happening in the future has zero impact on current natural selection.

Yes, it's easy to be convinced on either side, the arguments write themselves. Yes, eventually a learning system might learn enough to be indistinguishable from intelligence. Or this might be entirely the wrong path and detracting from genuine new innovations in how we think about AI.

We won't be able to tell whether it's AGI or just good enough at trained tasks to trick us.


It can prove its intelligence by making testable predictions of the future better than us. As for whether it's "real" AGI or just acts like it, doesn't really matter. I think the Chinese room problem has been agreed on as not a problem, hasn't it?

I think proof of “real” intelligence by answering harder and harder questions is barking up the wrong tree. I think evidence and proof are a better way to denote varying levels of understanding.

A deductive system can come with an answer and a proof of that answer, where proof is whatever counts as proof in that system.

So the notion of “does it really understand its answers” gets punted off its Q&A abilities and onto its ability to justify its answers.


That's an interesting idea but it would exclude many humans who can make correct predictions using their experience and intuition but can't justify them correctly. Those people are still very useful.

What you're describing is what we do at school. We can't assess understanding so we assess justification of answers as well as other things like ability to do X (we don't care if they understood or not, just be capable).


> As for whether it's "real" AGI or just acts like it, doesn't really matter.

Absolutely. The term "AGI" came about specifically to avoid existing philosophical arguments about "strong AI", "real AI", "synthetic intelligence", etc. Those wanting to discuss "true intelligence", etc. should use those other terms, or define new ones, rather than misuse the term AGI.

AGI requires nothing more (or less!) than a widely-applicable optimisation algorithm. For example, it's easy to argue that a paperclip maximiser isn't "truly intelligent", but that won't stop it smelting your haemoglobin into more paperclips!


My last sentence was a statement of that problem, not a question.

Let's say I'm standing next to a table. The computer recognizes it as a table. Now I sit on the table. Is it a chair or a table now? Something that we do automatically is a LONG way away from being automatic for AI.

Does AGI imply human-level intelligence, or would the intelligence of a housefly qualify?

It's a very interesting question.

Personally I take mammalian intelligence as the relevant standard we're actually aiming at.

So I'd say mouse+.

Houseflies, I think, are closer to non-intelligent than intelligent.


That's a little chauvinist! Birds regularly run circles around mice... er, so to speak.

My view is: mammalian is sufficient, but not necessary.

Crow-level intelligence is probably likewise sufficient.

I think aiming at mammalian is a good long-term ambition. I think, either way, we are hundreds of years off.


Surely the AGI researchers have a benchmark though, don't they? Somebody else mentioned the Turing Test which is something...

I don't think there are any AGI researchers. At least, I don't think computer science has much to do with AGI.

The Turing test is also not an AGI test, it's a "good enough" standard for fooling people.

Intelligence fundamentally requires a multitude of environmental capabilities. The Turing test considers only a single i/o boundary.


AGI implies it can pass a Turing test, which means it has a better-than-average chance of acting more "human" than a competing human.

I'm assuming you are aware of the difficulties for machines to do even the most basic of things that a living being can do with a brain the size of a pea. A housefly can fly and navigate effortlessly through most complex scenarios that it evolved to navigate (even though the same fly can get stuck behind a glass window and eventually die).

So yeah, even getting that level of intelligence would be a huge win. However, most people mean close to human level intelligence when they mean AGI even if it's one narrow specialization.


> even if it's one narrow specialization.

Obviously that already exists even with g.o.f.a.i.s so that is not that impressive.

The impressive thing is something more general than that.


Doesn't the G in AGI imply that narrow specializations aren't the target?

I think, in really broad terms, in order to get AGI actually we would need to do better than nature.

If our metric is (intelligence)/(joule), nature seems pretty bad at a first glance: it took many trillions of lifetimes to achieve "general intelligence" *

But then again, on the big stuff like this, have we ever really beat nature? That asterisk is there because, sure, turning the earth's biosphere into computers would make us smarter, but... are we sure?

(And also: human = general?)


Nature has a massive incentive to make good use of energy from light through photosynthesis. Billions of plants compete, and whoever can get most out of the sun will win out.

Yet manmade solar cells are more efficient by nearly all measures.


Except that manmade solar cells are pretty bad at repairing or replicating themselves.

Or growing out of literally nothing but dirt and water.

And air. That's what is crazy about plants, their carbon comes from the CO2 in the air.

Also if someone loses weight, most of the carbon that made up their fat leaves the body as breath.


and are inedible

and are a damned eye-sore

The solar cells get some organic life form to assist in their reproduction phase. That's pretty efficient too.

Why are these just-so stories believed so much?

Just because plants compete on some limited level doesn’t mean that a particular plant organism “winning” means becoming the most efficient converter of sunlight.

Is everyone’s memory like those people who can remember every detail? Why not? If you’re immediately planning to make up a just-so explanation on the spot that has the requisite but unproven claim about increasing the genetic fitness function, that is the problem with evolutionary explanations. It’s not science if you just make stuff up and give it the same amount of credibility as something that has been tested and proven. You can take any trait and spin stories about why it is the way it is, and then expect somehow that some metric has to be maximized because of your unproven theory.


> Yet manmade solar cells are more efficient by nearly all measures.

Only because we cheated, though: houses can't spontaneously grow more cells in place when more energy is needed.


On a half-joking note, they can: their humans buy them and put them where needed. An alien observer in space would see some houses spontaneously growing solar cells on their roofs.

This is even more interesting if you think of all human artifacts as being equivalent to anthills and beaver dams.

1) Trees are natural and trees create leaves with a solar efficiency of x

2) Humans are natural and we create solar panels with efficiency x + y


This is Dawkins' idea of "extended phenotype". Normally a gene's phenotype refers to its effects on the body of an individual organism possessing that gene, like hair colour or immune response.

A gene's extended phenotype includes effects external to particular organisms, like nests, deforestation, changes to the chemical makeup of the atmosphere, etc.


Nature only has an incentive to increase efficiency when that increase in efficiency results in an increased chance of producing gene copies.

Nature is full of examples that are 'good enough' while balancing other competing constraints. Evolution doesn't create organisms optimized for efficiency - it creates organisms optimized for reproduction. The two are not always the same.


This comparison with nature is pretty interesting. I think some additional constraints are required though. Otherwise, technically we can produce agi by simply giving birth to humans. If that's not "artificial" enough we can produce them from test tubes

I thought it was a fun position paper, if not exactly groundbreaking.

They did avoid one common pitfall at least. They are (intentionally?) vague about which number systems the rewards can come from, apparently leaving it open whether the rewards need be real-valued or whether they can be, say, hyperreals, surreals, computable ordinals, etc. This avoids a trap I've written about elsewhere [1]: traditionally, RL rewards are limited to be real-valued (usually rational-valued). I argue that RL with real-valued rewards is NOT enough to reach AGI, because the real numbers have a constrained structure making them not flexible enough to express certain goals which an AGI should nevertheless have no problem comprehending (whether or not the AGI can actually solve them---that's a different question). In other words: if real-valued RL is enough for AGI, but real-valued RL is strictly less expressive than more general RL, then what is more general RL good enough for? "Artificial Better-Than-General Intelligence"?

Note, however, that almost all [2] practical RL agent technology (certainly any based on neural nets or backprop) very fundamentally assumes real-valued rewards. So if it is true that "RL is enough" but also that "real-valued RL is not enough", then the bad news is all that progress on real-valued RL is not guaranteed to help us reach AGI.

[1] "The Archimedean trap: Why traditional reinforcement learning will probably not yield AGI", JAGI 2020, https://philpapers.org/archive/ALETAT-12.pdf

[2] A notable exception is preference-based RL


There are more real numbers than programs. Computers cannot represent the vast majority of real numbers. AFAICT, it's not even clear that the universe is continuous rather than discrete.

I really don't believe that using approximations of real numbers is going to be the bottleneck for AGI.


> AFAICT, it's not even clear that the universe is continuous rather than discrete.

I'm not sure that makes any difference (in either direction).

I mean, at the scale we care most about, the universe appears to be continuous, so an AGI has to be able to tackle continuous-appearing problems and use continuous-appearing representations.

OTOH, the universe is likely to actually be discrete, so an AGI has to be able to tackle actually-discrete problems, and use representations that are actually-discrete on a fundamental level.

There isn't much of a contradiction between these constraints, although the prospect of a continuous-appearing universe that is actually running on a discrete substrate seems to give a lot of people a brain cramp, and that same brain cramp gets elevated into 'proof' that current approaches cannot lead to AGI. Which is nonsense (there may be other limitations inherent in current approaches, but that can't be one of them).

One might as well claim that computers are digital and brains are analog and conclude that digital image representations cannot possibly be used to communicate information to analog brains.


And yet computers have no problem symbolically representing non-rational numbers like sqrt(2), pi, etc. Neither is there any inherent reason why they cannot symbolically represent various levels of infinity, nor why those would be incomprehensible to AGIs (even if the universe is discrete). You're right that only countably many numbers can be represented, but nevertheless even countable subsets of extended number systems can exhibit structural properties that the reals do not exhibit.
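As a small illustration of the point about symbolic representation, here is a sketch of exact arithmetic on numbers of the form a + b*sqrt(2). The class name `QSqrt2` and its design are assumptions made for this example, not something from the thread or the paper. It stores two rationals instead of a floating-point approximation.

```python
import math
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class QSqrt2:
    """Hypothetical exact number a + b*sqrt(2), with a and b rational."""
    a: Fraction
    b: Fraction

    def __add__(self, other):
        return QSqrt2(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + b*r)(c + d*r) = (ac + 2bd) + (ad + bc)*r, using r*r = 2
        return QSqrt2(
            self.a * other.a + 2 * self.b * other.b,
            self.a * other.b + self.b * other.a,
        )


root2 = QSqrt2(Fraction(0), Fraction(1))
exact_square = root2 * root2      # exactly 2 + 0*sqrt(2)
float_square = math.sqrt(2) ** 2  # rounded; not exactly 2.0 in double precision
```

Only countably many such symbols can be written down, as the comment concedes, but the structure they carry (here, the rule r*r = 2) is exact rather than approximate.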

How do you even have reinforcement learning with non-real numbers? The point is to maximize a score. It seems to me, any benefit you'd get from using an alternative number system could be replicated by using an algorithm to convert multiple real number scores into a single value.

Here's an example. Suppose there are two buttons, A and B. If you press A for the nth time, then you get reward n. If you press B for the nth time, then you get reward 0 if n is not a power of 2, or reward omega (the first infinite ordinal number) if n is a power of 2.

If the above rewards are shoehorned into real numbers---for example, by replacing omega with 9999 or something---then an RL agent would misunderstand the environment and would eventually be misled into thinking that pressing A yields more average reward.
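The two-button example can be sketched in code. The assumptions here are mine, not the commenter's: totals are compared after a fixed number of presses of a single button, and the infinite reward is modelled as a pair (number of omegas, finite part) compared lexicographically, which is a simplification of real ordinal arithmetic. The function names are hypothetical.

```python
OMEGA_STAND_IN = 9999  # the finite replacement for omega suggested above


def is_power_of_two(n):
    return n > 0 and n & (n - 1) == 0


def total_a(presses):
    """Total reward from pressing A `presses` times (the nth press pays n)."""
    return sum(range(1, presses + 1))


def total_b_real(presses):
    """Total reward from B when omega is shoehorned into a real number."""
    return sum(OMEGA_STAND_IN for n in range(1, presses + 1) if is_power_of_two(n))


def total_a_lex(presses):
    """A's total as (omega count, finite part): no omegas at all."""
    return (0, total_a(presses))


def total_b_lex(presses):
    """B's total as (omega count, finite part): one omega per power of 2."""
    omegas = sum(1 for n in range(1, presses + 1) if is_power_of_two(n))
    return (omegas, 0)


# Early on, the real-valued stand-in still favours B ...
b_ahead_early = total_b_real(10) > total_a(10)
# ... but A's quadratic growth eventually overtakes B's logarithmic growth.
a_ahead_late = total_a(1000) > total_b_real(1000)
# Under the lexicographic comparison, a single omega beats any finite total.
b_ahead_lex = total_b_lex(1000) > total_a_lex(1000)
```

Any finite stand-in only moves the crossover point later. It does not remove it, which is the sense in which the real-valued agent is eventually misled.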


There are no infinite rewards in biology and yet mathematicians seem to do just fine answering these sorts of questions.

I don’t think you want to encode your problem domain in your reward system. It’d be like asking a logic gate to add when you really should be reaching for an FPU. Maybe I’m missing something though?


>There are no infinite rewards in biology and yet mathematicians seem to do just fine answering these sorts of questions

This is only a problem if you're already assuming we do everything based on our biological reward systems, and in the current context that would be circular reasoning.

Imagine the treasury creates a "superdollar", a product which, if you have one, you can use to create any number of dollars you want, whenever you want, as many times as you want. Obviously a superdollar is more valuable than any finite number of dollars, and humans/mathematicians/AGIs would treat it accordingly, regardless of the finiteness of our biological reward systems.


> This is only a problem if you're already assuming we do everything based on our biological reward systems

Is there some other way that we do it besides our biological reward system? It sure looks like we get an apple and not an infinite reward when we pick the right answer to be selecting button B. I understand that might not satisfy you.


>Is there some other way that we do it besides our biological reward system?

Seems to me that's what this whole paper we're discussing is about. If you're already convinced that there is no other way, then you're basically already agreeing with the paper, "Rewards are enough".


What's the behavior you're trying to get the AI to do in this example? Learn how to compute the power of 2? This is a task that can be accomplished much more simply with a different reward system. For example, have A always equal 1 and B equal 2 if it is a power of 2 and 0 otherwise.

I understand you can use non real numbers, that's not what I was asking. I'm asking what's a behaviour you can't replicate using a reward system based on real numbers.


>I'm asking what's a behaviour you can't replicate using a reward system based on real numbers

So glad you asked! I can give an answer which people will love who take the necessary time to understand it. It's complicated, you might have to re-read it a few times and really ponder it. It's about automatic code generation (though it might not look like it at first).

Definition 1: Define the "Intuitive Ordinal Notations" (IONs) to be the smallest set P of computer programs such that for every computer program p, if all the things p outputs are IONs, then p is an ION.

See https://github.com/semitrivial/IONs for some ION examples in python.

Definition 2: Inductively associate an ordinal |p| with every ION p as follows: |p| is defined to be smallest ordinal which is bigger than every ordinal |q| such that q is an output of p. Say that p "notates" |p|.

Finally, to answer your question, I want the AGI to write programs which are IONs notating large ordinals, accompanied by arguments convincing me they really are IONs. An easy way to incentivize this with RL would be as follows. If the AGI writes an ION p and an argument that convinces me it's an ION, I will grant the AGI reward |p|. If the AGI does anything else (including if its argument does not convince me), then I'll give it reward 0.

You can't correctly incentivize this behavior using reals. The computable ordinals are too non-Archimedean to do so.
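A toy sketch of Definitions 1 and 2 above, under heavy assumptions: a "program" is modelled as a Python function that returns a finite list of other such functions, so only finite ordinals can be notated here. Real IONs are arbitrary programs that may produce infinitely many outputs, and this sketch does not try to check whether something is an ION. The names `zero`, `with_outputs`, and `notated_ordinal` are hypothetical and are not taken from the linked repository.

```python
def zero():
    """A program with no outputs; vacuously an ION, notating 0."""
    return []


def with_outputs(*programs):
    """Build a program whose outputs are exactly the given programs."""
    def program():
        return list(programs)
    return program


def notated_ordinal(p):
    """|p|: the smallest ordinal bigger than |q| for every output q of p.

    For a finite list of outputs this is max(|q|) + 1, or 0 if p outputs nothing.
    """
    ranks = [notated_ordinal(q) for q in p()]
    return max(ranks) + 1 if ranks else 0


one = with_outputs(zero)
two = with_outputs(one)
three = with_outputs(zero, two)  # the outputs notate 0 and 2, so this notates 3
```

In the reward scheme described above, the agent would be paid `notated_ordinal(p)` for a convincing ION `p`. The finite case fits into the reals without trouble. The claimed difficulty only appears once programs with infinitely many outputs notate infinite ordinals.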


The paper presents some interesting ideas, but I think it ultimately fails to account for the fact that AGI does not mean "the ability of an agent to produce the absolute perfect solution to any problem", but rather "the ability of an agent to understand or learn any intellectual task that a human being can" (wiki for AGI, emphasis added). Taking that into account, in every example you provide I argue the human approach more closely aligns with the "limited" behavior the real-bound RL agent would demonstrate than the "perfect" approach a surreal RL agent might take.

For instance:

You argue a real-bound RL doctoring algorithm would not appropriately set "the patient dies" to `-Inf` weight, but in fact humans do not either. If we did you'd see in the case of a near-death patient absolutely every procedure, no matter how costly, experimental, dangerous, or irrelevant, would be attempted if it had even the slightest chance of increasing the likelihood of them not dying. In reality, doctors make risk-reward decisions on every patient, and will very often choose not to undertake costly, experimental, dangerous, or irrelevant procedures even if there is some documented minuscule chance of it working.

Further, you argue that a real-bound RL theorem prover or composer would not know how to stop going down an ever increasing state-chain x_0, x_1, x_2, ... even if there existed some other state y that was "better" than any of the x's. But, this too is a very human behaviour! How many brilliant mathematicians, musicians, heck even software engineers, have spent their entire careers creating further and further derivatives of a known successful work, as opposed to starting anew and creating something truly world-changing?

You also bring up a theoretical button which on every press gives you 1 point, versus a different button which gives infinite points on every power-of-two press. You argue that the real-bound RL agent would be forced to move to the 1-point-per-press button after some number of presses, but would any human really sit there pressing the button for all of eternity to eventually get the `Inf` instead of just saying "screw it I want something now"? Not to mention that the problem setup is fundamentally flawed, as within our current understanding of the universe there is no infinite supply of anything, and furthermore if there was an infinite supply of something you wouldn't have any benefit of pressing after the first press, much less waiting around for the billionth -- you'll continue to have an infinite supply. In fact, what you've done there is presumed a surreal universe by a) assuming that a button can provide an infinite supply of something, and b) assuming that having two of the infinities is better than having just one. So sure, if you're in a surreal universe, backing your RL with surreal numbers is a good idea. But we're, so far as I know, in a real universe, so backing with reals should be sufficient.

Edit: I above use "surreal" to mean both the standard concept of surreal numbers in addition to any numbering concept which allows for and distinguishes between integer multiples of infinities.


Thanks, that's one of the best critiques I've ever heard of my paper.

One minor correction first: you're absolutely right that AGI is about comprehending the environment, not about perfectly solving all environments (the latter is mathematically impossible even with strong noncomputable oracles etc). I'm not sure why people so often come away from my paper thinking I'm saying AGI is supposed to solve all those environments, I never say anything like that. If I could go back in time, I'd make that clearer in the paper. No, it's about the AGI simply being able to comprehend the environments, like you say. And the thesis in the paper is that shoehorning general environments into real-valued-reward environments is a lossy process.

For the rest of your argument, you make a lot of good points. I would ask, what do you say in response to, e.g., Alan Turing who asks us to imagine Turing machines having infinite tape and running for all eternity? Obviously that too is impossible in the finite universe we live in. That's sort of the divide we disagree on. I'm talking about idealized AGI. If we consider human beings, humans have finite lifetimes so any particular human being's entire lifetime of actions could simply be recorded in a finite tape recording. But does that mean said finite tape recording is intelligent? In the idealized world, I would want to say it's a basic axiom that no finite tape recording of a human can be intelligent. But now we're deep in philosophical woods.

I like your point about musicians etc creating further and further derivatives of known successful work as opposed to starting anew :) I guess in terms of my paper, the real question is, if you confronted these derivative musicians with the grand new work that transcends them all, would they recognize it as such, or would they (like an AGI confused by rewards shoe-horned into real numbers) mistake it for something mediocre? Now we are deep in psychological woods!


RL + piggybacking on human culture might be enough, or evolution + RL for biological agents.

> RL + piggybacking on human culture might be enough, or evolution + RL for biological agents.

Yes, but over what timeframe? Will there be any diminishing returns plateaus along the way?


We still have unknown unknowns but we also know a lot more about how neural nets deal with various tasks and dataset preparations. We know what kind of applications are good enough and where they still fail, which is much more than a decade ago.

If you look at sci-fi movies with robots, they usually speak in a metallic voice but have good situational and language understanding. In reality it was the other way around, it's much easier to do artificial voices than understand the topic. That kind of naive understanding seems silly now, and this is how we gradually advance.

GPT-3 taught us that good sounding text is not that hard to generate if you have ample training data, but modeling the larger context is still hard. These kinds of fine distinctions are what I call progress.


> If you look at sci-fi movies with robots, they usually speak in a metallic voice but have good situational and language understanding. In reality it was the other way around, it's much easier to do artificial voices than understand the topic. That kind of naive understanding seems silly now, and this is how we gradually advance.

I waffle a lot on whether that aspect of 1968's '2001: A Space Odyssey' is evidence of genius or just survivorship bias.


Some Bozo who has heard all this many times before is suspicious of claims from places like Deep Mind who have a financial incentive to make them (keep funding) where there aren't working machines to back that claim up.

Some Bozo has no credentials, no reputation, no track record of publications and barely supports the claim they're making with anything much. Some Bozo has no financial incentives or otherwise to opine either way. Some Bozo doesn't even work in the field at all.

Bets: Some Bozo or Deep Mind turn out to be closer to being correct in the passing of some finite amount of time? 5 years? 10 Years? 25 Years?


I'll bet a sum of real money that Some Bozo is correct.

Bozo has the hindsight of history and philosophy going for him, while Deep Mind has a huge financial temptation to sell snake oil.



A quote from datscilly, the top forecaster on metaculus:

>AGI may never happen, but the chance of that is small enough that adjusting for that here will not make a big difference (I put ~10% that AGI will not happen for 500 years or more, but it already matches that distribution quite well).[1]

[1]:https://www.lesswrong.com/posts/hQysqfSEzciRazx8k/forecastin...


> while Deep Mind has a huge financial temptation to sell snake oil

I don't know, isn't the DeepMind founder that guy in the Go documentary? I read about him after watching the doc and he seemed to be pretty cautious about taking in investment, and he didn't seem the type to try to cash out.


He already cashed out, he sold to Google. And over the years Google has ramped up the pressure for DeepMind to deliver financial returns. (I recall when Google tried to stick DeepMind's branding on GCP, Watson-style, so it would sell better, and at the time, DeepMind was able to decline.)

Eventually Google will give them the option to deliver financial success or be shut down.


“Show me the incentives I’ll show you the outcome.”

Google make money, Google Bad.

Deep Mind owned by Google, Deep Mind bad!

The above conclusion stands trite.

Perhaps the inverse is true.

Google and Deep Mind, if correct, could be hurting themselves more than helping themselves.

Why? Creating a future species that’s too smart to click on ads, and too smart to remain subject to Google’s whims, doesn’t sound like it’d be good for quarterly profits...

There’s also the emotional incentive for humans to confirm their own beliefs about humanity being special.

If Google/Deep Mind knows this, yet publishes research anyway in the spirit of truth, why, what they’re doing may be considered heroic.

Two sides of the coin here.


No offence, but I think you are extremely wrong.

Creating an AGI is the endgame for everything. Who cares about ads when you have an AI that can learn to do anything and improve upon itself continuously?


Really? I created two GIs and it wasn't very hard and was actually quite fun. Training them is a bit of a pain though. I'm willing to bet that, based on total calories consumed, they are amazingly efficient compared to their hypothetical AGI counterparts.

Yes but can they:

- live forever

- grow their own mental capabilities exponentially over that unlimited lifespan

- turn themselves into universe-eating von Neumann probes


Nothing can

- live forever

- grow exponentially forever

- "eat the universe" (I know, the last point was sci-fi gibberish)

In fact, humans are already pretty good at reproducing themselves and have managed to travel to space, and have exhibited finite periods of exponential knowledge growth combined with periods of collapse, as nothing grows exponentially forever.


My three-year-old said yes to all questions

You don't know if an AGI will agree with your profit motives.

There is a huge assumption baked into your comment and I do not agree with it.

AGI does not necessarily need to be conscious or throw tantrums about its creators' purpose. AGI just means an intelligence that can be thrown at any problem, not just a particular game or task, similar to how humans can specialize in CS or playing the violin.


Sure, it was somewhat tongue in cheek, but not entirely.

There is a semi-established definition that does include what I referred to:

> AGI can also be referred to as strong AI,[2][3][4] full AI,[5] or general intelligent action.[6] Some academic sources reserve the term "strong AI" for computer programs that can experience sentience, self-awareness and consciousness.[7]


That’s not going to happen until they have bodies.

“Who cares?” Well, the people who need to pay the people developing the endgame for everything you speak of.

I'm sorry, but this makes no sense to me.

The people paying for the development of the AGI can mean many things - the Google customers/users, Alphabet as a company, the executives throwing money at the problem?

Either way, I don't really get your point. Your initial post was about how it is counterintuitive for Google to allocate funds for an AGI, since it makes money out of ads. These are not mutually exclusive, you can have both, but my point is that if you develop an AGI, then you can pretty much "conquer" the world and revenue from ads becomes irrelevant.


How do you think they can conquer the world? How do you foresee governments not restricting a private company’s new powerful tool?

Edit: sorry just realised you’re making the same point as me more or less. Putting yourself in third person. I’ll let my comment stand anyhow :)

Screwing my face up, looking at this sideways … but it seems as though you’re saying that the Bozos of HN have nothing useful to contribute to this discussion based on … [rereads] … their lack of academic credentials in the area… You could say this about just about any HN post; I’m just wondering why this one? Here’s a thing though … if the understanding of a technology is so nuanced … that Bozos can’t “get” it … is it really that mature? We had functioning computers for 50 years but it was only when the Bozos got their hands on them that things took off. Internet for 20. Cell phones for 10. How long have we been dabbling with neural networks? 50 years or so? All I see in this most recent explosion in AI is a rapid jump in the availability of cores. À la Malthus, once that newly available “source of nutrition” has been used up we will see a rapid die-off once more, and it will be another 20 years, once the Bozo intellect has caught up, before we look at this topic en masse again. Dismiss the Bozos at your peril. You’re dependent on them for innovation and consumption. Yours sincerely, a Bozo.


Not quite the same point. Yep some bozo is me but needn't be. There's plenty who share that suspicion of AI research but have little else in common. And all of us may be wrong for different reasons.

The vague point was to show someone with zero reputation, credentials, specific expertise in the field or anything much seems to be pretty convincing in response to this hugely funded ivory tower exercise by spitting, cocking an eyebrow and saying "So you think so, eh? Wanna bet?"

This is a statement about the state of AI research credibility. Do you feel the first breezes of a deep AI winter coming on? (I don't know, I'm disinterested but not uninterested. Rising tides lift all ships etc. And vice versa). Neural nets are cool. Is all ML a bit overrated? Is learning a misleading name to give to applied statistics?

I don't have answers, just suspicions. I could be very wrong, of course.


I’ve a minor in psych so I like to think I have a bit of a non-techy perspective on this, and what’s being pushed now forms just a segment of the overall topic of AI. It just so happens to be the segment that benefits from the technology we suddenly have a rapid increase in. There have been great successes in areas where a degree of inference is required, but this hardly qualifies as even mere intelligence, and in cases where neural nets have been deployed in more human-centered tasks, or even well-designed symbolic systems, the results speak for themselves. What even is intelligence? I think we’re going backwards because we’re investing all this talent in this simple segment, I fear largely to fatten the chip makers’ share price, while neglecting tried and true approaches that deliver far better results but perhaps crucially have a higher operating cost … who remembers Google of 2010, from whom the Internet in all her glory leapt forth, or iPhone spell check of 2015, where you could confidently batter out your messages with little fear it would make a fool of you? You’re not going to nurture a nascent intelligence if you’re going to be continually hobbling it for business reasons. I’m certain we will get there eventually if we don’t destroy ourselves before then, but I don’t think the current trends paint a picture of how it will be. I think we have a long way to go ourselves before we can be worthy of creating our successor, but when/if it comes it will be a beautiful thing and we will embrace it as we would our own child.

The algorithms to train, initialize the networks, new architectures are far more important than the hardware advances. If people knew how to train NNs 50 years ago we would live in a different world.

We did. They just didn't have the same computation abilities back then.

I find it really interesting that when Richard Feynman did a sabbatical at Thinking Machines when they were developing the early parallel execution hardware that's really not worlds away from modern GPUs he got them in touch with one of the leading neural network theorists as an obvious use for the tech. When he wasn't fixing their hardware designs using systems of differential equations.

It would be an interesting thing to know more about.


The basic concepts underlying DNNs have been known for decades; it has been exponential increases in compute power that have made them practical.

Wrong: Some Bozo does have a stake.

The existence of human crafted general AI forces him to struggle with the possibility that there is no such thing as a soul.

I know a lot of people don't fall in that camp, but I heard enough "serious" people make such desperate claims to avoid thinking about the topic in a way that might challenge their underlying religious beliefs[1]. I think no one likes to admit that religion and spirituality often force someone to reject the possibility that AI is actually really much simpler than they think it "should" be, because then humans aren't special after all.

[1] Numerous arguments boil down to an argument that complexity is non reducible. You see it here, hidden in various comments as well.


Unfortunately I'm rooting for the Bozo; the current AI revolution won't lead us anywhere and will ebb eventually.

It won't ebb down. Eventually we'll hit limits of what's practical on current hardware and we'll be back to the 70's and 80's when everything becomes theoretical until hardware catches on. AI is going to continue to advance.

What will happen is that capital will become more skeptical about the limits of what's feasible with AI and it'll be harder to sell bullshit. You're already seeing that with companies like Uber selling off their self driving divisions.


What do you mean by 'won't lead us anywhere'?

It might or might not give us AGI. But it is already leading us to lots of places. E.g. speech recognition even on my phone works way better than what I had twenty years ago on a desktop.


The fact that RL in the extremely vague sense used in the article is enough for AGI is uncontroversial for anyone who believes intelligence and consciousness are physical processes.

However, this "result" is trivial. It is obviously equivalent to the claim that intelligence arose naturally in the biological world without influence from God.


The problem with this, specifically the assumption that RL is equivalent to natural selection and evolution, is that RL typically assumes a computational environment it interacts with, while natural selection and evolution assume the physical world as the environment.

The important difference here is that in order for RL to translate to solving real-world problems, you need to faithfully and computationally simulate the real world's physical processes and rules, or at least enough of them that n-th order processes exist accurately.

I've done various types of computational modeling and simulation work at different scales throughout my career with all sorts of scientists and engineers and I can tell you, pretty much no domain is there where you have good enough representative models RL can be used in. Some narrow special cases exist but nothing to the degree of a massive environment full of well coupled expert domain models. Some of the best cases are going to be so computationally bound that it would be quicker to do things for real vs simulate.

If you want RL to work and learn, it's likely possible under the connection you point out, but has to do this using physical machines and sensors interacting with the physical world like life as we know it does. Your AGI won't be able to cheat and run through the evolution process quicker using faulty reductionist models we use in most simulations (which is what everyone implicitly is hoping for), IMHO.

If you try this, your AGI is going to learn all sorts of flaws within those environments or at the very least, have so many narrow scoped bounds it won't be that "general." A lot of simulated models are frankly garbage (they have some useful narrow scope but are typically littered with caveats) and they've been in development pretty much since digital computing began.


You are absolutely right about RL in practice. In fact, if we were to look at actual RL algorithms, I believe the paper's claims fall flat in many other ways. This is actually my criticism of it: its arguments are only convincing when RL is defined only as the extremely general notion of an agent seeking to maximize some reward function by interacting with an environment. This is so comically general that the only alternative I can think of is to posit a transcendental god.
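To make the "extremely general notion" concrete, here is a minimal sketch of an agent maximizing reward by interacting with an environment. It is not anything from the paper: the two-armed bandit environment, the epsilon-greedy rule, and all names (`TwoArmedBandit`, `run_agent`, the payout probabilities) are assumptions picked purely for illustration.

```python
import random


class TwoArmedBandit:
    """Hypothetical toy environment: arm 1 pays off more often than arm 0."""

    def __init__(self, payout_probs=(0.2, 0.8), seed=0):
        self.payout_probs = payout_probs
        self.rng = random.Random(seed)

    def step(self, action):
        # Reward is 1.0 with the chosen arm's payout probability, else 0.0.
        return 1.0 if self.rng.random() < self.payout_probs[action] else 0.0


def run_agent(env, steps=5000, epsilon=0.1, seed=1):
    """Epsilon-greedy agent: mostly exploit the best estimate, sometimes explore."""
    rng = random.Random(seed)
    values = [0.0, 0.0]  # running estimate of each arm's average reward
    counts = [0, 0]      # how often each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)                     # explore
        else:
            action = 0 if values[0] >= values[1] else 1   # exploit
        reward = env.step(action)
        counts[action] += 1
        # Incremental mean update of the estimate for the chosen arm.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


values, counts = run_agent(TwoArmedBandit())
print(values, counts)
```

Everything that matters in the loop is "act, observe reward, update", which is exactly why the definition covers almost anything. The hard parts discussed in this thread (a faithful environment, sample efficiency) are simply assumed away by the toy `TwoArmedBandit`.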

Once we get into the details, their claims stop being iron clad. Even worse, some of their claims become actually hard or impossible to accept if applied to actual RL algorithms we have today. You give one good example with the difficulty of modeling the world. The implicit claim they make that this would be realizable in reasonable time (say, less than a billion years) is also not well supported. The idea that humans or mammals learn their social behaviors through RL rather than a good deal of reasoning from evolutionarily-trained first principles pretty clearly fails in the face of the poverty of the stimulus argument[0].

Overall, the claims in the paper tend to switch between obvious (if taken to talk about the general idea of maximizing reward) to almost certainly wrong (if taken to talk about known RL algorithms, reasonable time frames, and specific examples of what is supposed to be learned).

[0] The poverty of the stimulus argument may be controversial in linguistics, where it was first formulated. Still, if applied to mammal or insect socialization, the extremely short time frames in which individuals of a species start exhibiting typical behaviors basically prove, in my opinion, that they are instincts, trained at the population level through evolution, not individual learning through RL. The extreme similarity of behavior between individuals of the same species, vs. the variety of behaviors between different species, also suggests an important component of species-level rather than individual-level learning.


> It is obviously equivalent to the claim that intelligence arose naturally in the biological world without influence from God.

Where did God's intelligence come from?


Well, I don't believe God exists, so I can't really answer the question.

Nowhere, that's kind of the definition of God (for a Christian at least): it always was, and is the ultimate origin. It is different than a direct influence in the world.

From the author's imagination.

Some cynic remarks that during the first AI golden years, claims of imminent success seemed to come from a place of hopeful naïveté of a fledgling science, whereas those same claims nowadays seem to come from a place of cold calculation of a booming business.

Is Deep Mind a “booming business”? They are achieving great things academically, but their business successes are either kept secret or mostly absent. All I know about is the Google data centre cooling scheduling, probably a big saving for Google but hardly an achievement that on its own attests to their business success.

Deep Mind is cutting edge ML in general, right? Doesn't Google actively apply the lessons learned all over the place? YouTube content recommendation stands out to me in particular. Translation and automated closed captioning are also obviously ML based. I'd guess that most of the really interesting stuff would be behind the scenes and not immediately visible to end users though.

> YouTube content recommendation stands out to me in particular.

If that's "cutting edge ML", then going off my YouTube recommendations, we're back in another AI winter. If I watch one video from a channel I've not seen before, I'll get that channel recommended constantly even if it bears no resemblance to what I normally watch. On my Explore page, the first 22 videos (of which 8 are Fortnite-related!) hold no interest for me. My Home page is just channels I've watched repeatedly and/or am subscribed to. It's a mess.


How often do you use YouTube? Personally I am a very heavy user and in my experience the obsession with a new video kind you watch only lasts for a few recommendations unless you lean into it.

I would guess about two thirds of the channels I consistently watch I originally discovered through algorithm recommendations. I think it works extremely well.


That's because you fit into YT's conception of how viewers behave. For people who don't fit into "normal"-ish behaviour it has little utility.

For me, probably 90% of what I watch I'm not interested in and often I'm repelled by. This is because I mostly watch to find out what things I'm not familiar with are.

For example let's say I'm a liberal. I'm not going to watch liberal political videos because I know generally what they're going to say and I don't need my political views stroked in order to be happy. But I will watch various other political videos, no matter how extreme or not, so I can be at least a little familiar with their behaviour and views.

YT can't cope with this. To their systems I seem to be randomly picking videos with no correlation with the subject matter or other users and no reinforcing pattern. It just gives up and recommends things based on the behaviour of the general population, as if they had no data on me at all.


I think you raise an important point. The YouTube algorithm is pretty bad if you don't use YouTube very much or only use it to consume very popular content. YouTube's recommendations used to be terrible for me, too, but sometime last year I crossed a threshold and since then it has been recommending a lot of small, highly specific channels that nevertheless are great fits. My wife's recommendations are still utter garbage though.

> How often do you use YouTube?

Every day, averaging 2-3 hours. It's background for working and foreground for evening viewing.


Is the Explore page controlled by videos I watched? Because there isn't a single video on it I would watch. Not one.

No not really. Deep Mind is almost all cutting edge agent-oriented reinforcement learning, hence the nature of the claim they're making. The impact on Google's business from AI has come almost exclusively from other kinds of ML, or that's at least how it appears from the outside. E.g. replacing Google Translate with neural translation doesn't seem to involve RL and certainly doesn't involve agents playing video games.

Deep Mind is best understood as the following bet: if we can train an AI that can learn from "its environment" and do the sort of things a human would do in that situation, then we have achieved AGI and from that ... business ... will follow. Hence their focus on video games as a training environment.

This sounds intuitive but is actually a very agent-centric viewpoint and most AI doesn't resemble this type of thing at all. Most AI deployed so far doesn't have anything resembling an environment, doesn't have any kind of nexus of agency and doesn't need to actively make decisions that then feed back to its own learning, only make probabilistic predictions. And in fact you often don't want an ML model to train on the outcomes of its own decisions.


Yes, it's hard to tell the exact algorithmic underpinnings of the production models that Google uses, but you have to assume that although they have made some impressive strides in fields that aren't immediately profitable (AlphaGo, AlphaFold...), they also continuously push new research in things that are obviously of interest to Google and Alphabet, especially text-to-speech, speech-to-text, information retrieval, etc.

For reference : https://deepmind.com/research


I’m stressing the business part. YouTube is a loss-making business year after year. Deep Mind gloss doesn’t seem to change that.

If indeed it even is Deep Mind making those improvements, Google has lots of other ML groups, such as Google Brain, and these are more directly focused on Google products.

There’s no denying their academic success, or game playing etc, but as far as I can see, the data centre cooling bit is the only palpable (public) business success.


YouTube made $6B revenue in Q1. [1] While they don’t release profit numbers, it would be pretty surprising if they were negative.

Did you mean to write DeepMind instead? If so, I don’t disagree.

[1] https://www.cnbc.com/2021/04/27/youtube-could-soon-equal-net...


Deep Mind’s protein folding algorithm is probably worth a chunk of change. As far as I know, they’ve been holding onto the secret sauce rather than publishing it.

How do you know that YouTube is a loss making business?

They have an applied division which applies ML to Google products. I suspect they are very valuable in $ terms just for the work listed here: https://deepmind.com/impact. Google's entire business from the start was doing research and bringing it to the masses, so this shouldn't really surprise anyone.

Said anonymous account on HN... If you're going to question other people's credentials, reputation, track record and claims make sure your own are solid. Those who live in glass houses shouldn't throw stones.

Finally, if you're going to attack someone's article: attack the article, not the person that wrote it. This is the lowest level of attack possible: the personal one. It's as ad-hominem as it gets.


If I read it right harry8 is referring to themselves as the Bozo.

> Those who live in glass houses shouldn't throw stones.

So if we're gonna have an opinion we need to do the whole academia & job in the industry dance?

That's quite a terrible way to view the world and quite limiting. A world without diversity is a stale and rotten world.

So fuck that and the glass houses and the boxes this kind of worldview puts people in. Everyone should be able to throw stones, and if the hit hurts, well guess there is a reason.

The thesis is that DeepMind has financial incentive to state "we can achieve AGI with what we're doing", to keep up the funding and hopes for the field, not "the author is an idiot".

And the thesis is true, they do have financial incentives. That's not ad-hominem.


There is another set of people. There are people with solid track records in AI and ML that disagree with DeepMind.

Plot-twist: Deep Mind comes out as Some Bozo comment author

Well my writing style has been the subject of abuse on this website in the precise form that it resembles words generated by a bad algorithm.

"Only a true AI would deny their being."


I awoke wondering: am I a man dreaming I am an AI? Or an AI dreaming I am a man?

Bozo et al

DeepMind is arguing from first principles. SomeBozo is arguing by analogy. DeepMind will achieve something and SomeBozo will achieve nothing.

The vast majority of ideas are wrong. Every idea is wrong until it leads to the one that is right.

This idea might be the right one, or it might be close to the right one, or it might be far from the right one, but the trajectory is headed toward the right idea. SomeBozo has no trajectory. The best he can do is watch from the sidelines.


I guess things are slowing down at DeepMind. I have tremendous respect for David Silver and his work on AlphaZero, and for Richard Sutton as a pioneer in RL. But the cynic in me says that this paper is just a result of Goodhart's law with publication count as the metric. Any proof of the type of emergent behaviors that they mention from RL, with an actual RL experiment, would go a long way. Showing an RL agent developing a language would be extremely interesting. It makes me think they tried to show these emergent behaviors but could not, and thus ended up with a hypothesis.

They just "solved" protein folding late last year. How can you say things are slowing down? Do you honestly expect life-changing discoveries every other week?

Protein folding is a well-modeled math problem. The AlphaFold solution is extremely good at pruning (aka guessing) folding chain structure possibilities. I am impressed, and this is a difficult problem, but it is extremely different from AGI: it is a well-scoped, easily modelable problem that is basically a chain of arbitrary length built from about 20 types of links (amino acids). I am not trying to take away from how incredible the protein folding result is, but AGI is extremely different. AGI is literally having a model that can do both AlphaFold and self-driving cars, as well as generate novel models to solve new well-scoped problems. RL can do 0 to 1; the 1 to n (generalizability) is the extremely difficult part.

I may really only be speaking for myself here, but I have very sincere doubts that anyone who has done any moderately serious work on "AI" and wrangled with the nitty-gritty details of it all really has any huge expectations for AGI. I mean, if it comes during my lifetime, hurrah! But personally I'm not gonna sit around waiting for it or depend on it for anything. That being said, is the current AI tech as we have it useless? Of course not. Things like protein folding and AlphaGo are still huge leaps forward in tech; it'd be kind of silly to treat AGI as the only thing worth achieving.

It is entirely not necessary for an AGI to be able to drive a car.

Frankly, after seeing AlphaZero and AlphaFold I'm surprised they didn't declare AGI right there and then.

People assume that when AGI happens, computers can suddenly outsmart humans in every way and solve every problem imaginable. The reality is just that it could in theory given enough time and resources.

It is like quantum computing. In theory it can instantly factor and break our nice cryptographic primes. In reality the largest number it factored is 21.


> it could in theory given enough time and resources.

In theory given enough time and resources, anyone can defeat any grandmaster in Chess: just compute the extended tree form of the game and run the minimax algorithm.

The "given enough time and resources" clause makes everything that follows meaningless, unless a reasonable algorithm is presented.
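For a sense of what "compute the game tree and run minimax" looks like when the game is small enough, here is a sketch on a toy game instead of chess. The game (a Nim variant: take 1 to 3 stones, whoever takes the last stone wins) and the names `minimax` and `best_move` are assumptions chosen for illustration. The sketch also memoizes repeated positions, so it is not literally the full extended tree.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def minimax(stones, maximizing):
    """Return +1 if the maximizing player wins with perfect play, else -1."""
    if stones == 0:
        # The player who just moved took the last stone and won.
        return -1 if maximizing else 1
    scores = [
        minimax(stones - take, not maximizing)
        for take in (1, 2, 3)
        if take <= stones
    ]
    return max(scores) if maximizing else min(scores)


def best_move(stones):
    """Pick the number of stones to take that gives the mover the best outcome."""
    legal = [take for take in (1, 2, 3) if take <= stones]
    return max(legal, key=lambda take: minimax(stones - take, False))


print(minimax(4, True))  # position with 4 stones, maximizer to move
print(best_move(7))      # move chosen from 7 stones
```

This finishes instantly because the toy game has a handful of positions. The same procedure is well-defined for chess, but the number of positions makes it infeasible, which is the point about the "given enough time and resources" clause.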

> It is like quantum computing.

It is absolutely not like quantum computing. Shor's algorithm is something you can look up right now. It is precise and well-defined. The problems we are facing with quantum computation are related to the fact that we can't really build reliable hardware. But we know that given such machines the algorithm would work. We have precise bounds and requirements on those machines.
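As a small illustration of how precise the algorithm is, here is a sketch of the classical half of Shor's algorithm applied to 21. Loud caveat: the quantum step (period finding) is replaced by a brute-force classical loop, which is exactly the part that does not scale. The function names and the choice of base `a = 2` are assumptions for this example only.

```python
from math import gcd


def multiplicative_order(a, n):
    """Smallest r > 0 with a**r % n == 1 (brute force; the quantum part in Shor's)."""
    if gcd(a, n) != 1:
        raise ValueError("a must be coprime to n")
    r, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        r += 1
    return r


def factor_from_order(n, a):
    """Classical post-processing: turn the order of a mod n into factors of n."""
    r = multiplicative_order(a, n)
    if r % 2 == 1:
        return None  # odd order: retry with a different base
    half = pow(a, r // 2, n)
    if half == n - 1:
        return None  # trivial square root of 1: retry with a different base
    return gcd(half - 1, n), gcd(half + 1, n)


print(factor_from_order(21, 2))
```

With `a = 2` the order modulo 21 is 6, so the candidates are gcd(7, 21) and gcd(9, 21). Nothing here is speculative; what is missing in practice is hardware that can run the period-finding step reliably on large numbers.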

As far as AGI goes, we have absolutely no idea. There's lively debate on whether anything we have done even counts as significant advancement towards AGI.


> In theory given enough time and resources, anyone can defeat any grandmaster in Chess: just compute the extended tree form of the game and run the minimax algorithm.

Yes, that's why we're considered to be generally intelligent. It is exactly the point, and not at all meaningless. Right now there's no machine that can come up with the idea to compute the extended tree form of the game and run the minimax algorithm. If there were such a machine, then that machine would be considered AGI.

> It is absolutely not like quantum computing.

I meant it in the sense that just because it has actually been achieved, it doesn't mean it's as powerful as we have described in theory. In theory you can use Shor's algorithm to break encryption; in practice the devices we have today have trouble with 2-digit numbers.

The same principle goes for AGI. If someone releases an AGI system today, it doesn't mean that tomorrow we'll see a Boston Dynamics robot hop on a bicycle to his day job as a Disney movie art director. The world would most likely not change at all, at least not for a while, many people would not recognise the significance and many people might not even recognise the fact that it is in fact AGI.

> As far as AGI goes, we have absolutely no idea. There's lively debate on whether anything we have done even counts as significant advancement towards AGI.

You might think that, and that says something about what side of the debate you're on. We're commenting here on the thread of an article about DeepMind asserting that reinforcement learning is enough to reach general AI. If that's true (and I think it is), then we've probably reached general AI already.


> It is entirely not necessary for an AGI to be able to drive a car.

"Artificial general intelligence (AGI) is the hypothetical[1] ability of an intelligent agent to understand or learn any intellectual task that a human being can."

What definition are you using?


The same, that they understand and can learn how to drive a car doesn't mean they would actually be able to do it in the real world.

You can read a book on how to hit a ball with a baseball bat, you can even practice and get good at it, but that still doesn't mean you would actually be able to hit a ball thrown by a professional pitcher.


The interface to the car is a solved problem.

> You can read a book on how to hit a ball with a baseball bat, you can even practice and get good at it, but that still doesn't mean you would actually be able to hit a ball thrown by a professional pitcher.

If I had incredibly fast reflexes and actuators I could.


Similarly, DeepMind's software might be able to drive a car if it had a similar neuron count, connectivity, perception systems and the training you received.

Or maybe it couldn't, because the software is not as efficient as the organisation of your brain is. Or because there's hardcoded routines evolved in your brain that it lacks.

What I'm saying is that just because an AGI can't drive a car, it doesn't mean it's not an AGI. In the same way, there are loads of people out there who are generally intelligent but can't drive cars for all sorts of physical reasons.


> Similarly, DeepMind's software might be able to drive a car if it had a similar neuron count, connectivity, perception systems and the training you received.

Admittedly I'm a layman in this area, but could it? AFAICT it would only work on trained set data and whatever generalizations can be made on that and not infer unseen scenarios like humans do readily.

> What I'm saying is that just because an AGI can't drive a car, it doesn't mean it's not an AGI.

I understood what you meant from your first post, I'm simply disagreeing on account of the very definition of AGI.

You can't have an amoeba-level AGI and still call it (a limited) AGI. Either it can understand/learn any human task, or it can't.

The definition is made for a reason. Watering it down for any specific generation of AI serves no benefit.


Very tangential, but as someone who has gotten into the Game of Go because of their pioneering project in that space, I'm exceptionally grateful -- that alone had a very significant and positive impact on my life, and I can tell that in that entire community it was a watershed moment as well.

It might be a stretch, but some people say that the weights learned by a neural network are somewhat like a language. For example, if you look at the weights of a random middle layer they would seem like gibberish. Much like how aliens would react when looking at humans making gibberish noises (aka talking) to each other. In both cases they are just compressing signals based on learned primitives.

Not sure if there is any case where this thought is useful. The only thing this says is the primitives are correlated and we don’t understand them. It’s similarly not useful to think about atoms “talking” when they exchange heat.

"A sufficiently powerful and general reinforcement learning agent may ultimately give rise to intelligence and its associated abilities. ... We do not offer any theoretical guarantee on the sample efficiency of reinforcement learning agents."

OK. This basically says "evolution works". But how fast? Biology took tens of millions of years to boot up.

A related question is how much compute power evolution, viewed as a reinforcement learning system, has. That's probably something biologists have thought about. Anyone know? Evolution is not a very fast or efficient hill-climbing system, but there are a large number of parallel units. It's not a philosophical question; it's a measurable one. We can watch viruses evolve. We can watch bacteria evolve. Data can be obtained.

Two questions I pose occasionally are "how do we do common sense, defined as not screwing up in the next 30 seconds", and "why does robotic manipulation in unstructured situations still suck after 50 years". A good question to ask today is why reinforcement learning does so badly on those two problems. In both cases, you can define an objective function, but it may not be well suited to hill climbing.


> "why does robotic manipulation in unstructured situations still suck after 50 years"

Great point. Until the promoters of RL can build us a robot that can 1) walk gracefully through a typical home that has stairs and closed doors, 2) cook a meal with pots and pans, and 3) get back up after it falls down -- I suggest we take their claims of impending Singularity with a big grain of salt.


Consider if Boston Dynamics released a video of a robot doing all three things tomorrow, and announced that they were taking orders for immediate shipment, priced at $100,000. How far away would you think the singularity would be then?

Separately, a hostile or indifferent AI could still cause a heck of a lot of trouble for human civilization without the first two things. Consider an autofactory clearing room for expansion with bulldozers, no need to navigate stairs there. Bullets or smart glide bombs don't need to understand doorknobs. Etc.


In some cases biological 'genetic algorithm' hill climbing can be remarkably ineffective.

For example, the classic "design a car that can drive over this terrain" problem, even after a billion generations (~ the same number as life on earth), shows no substantial performance improvement.

That makes me suspect something is missing from our biological genetics model.
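For anyone who hasn't seen this kind of hill climbing written down, here is a minimal sketch. It is not the car simulation being discussed; the `fitness` function, genome length, and mutation size are all made-up assumptions, chosen only to show the mutate-and-keep-if-no-worse loop.

```python
import random

def fitness(genome):
    # Hypothetical objective (an assumption): best when every gene is 0.5.
    return -sum((g - 0.5) ** 2 for g in genome)

def hill_climb(generations=1000, genome_len=8, sigma=0.05, seed=0):
    rng = random.Random(seed)
    parent = [rng.random() for _ in range(genome_len)]
    history = [fitness(parent)]
    for _ in range(generations):
        # Mutate every gene slightly; keep the child only if it is no worse.
        child = [g + rng.gauss(0, sigma) for g in parent]
        if fitness(child) >= fitness(parent):
            parent = child
        history.append(fitness(parent))
    return parent, history

best, history = hill_climb()
print(history[0], history[-1])  # fitness before and after
```

On a smooth objective like this one the loop improves steadily. The terrain-driving case is presumably much more rugged, which is where this kind of search tends to stall.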


I think the number of parameters is remarkably (multiple orders of magnitude) different between even the simplest bacteria vs the model used in the car. And then genomics can also do some more advanced techniques like copy a whole gene and start modifying that, etc.

energy supply and other constraints (material, robustness ...) are a good explanation though - an organism can't grow out of aluminum or steel

I suspect they're talking about this (sort of) simulation

https://rednuht.org/genetic_cars_2/


We just want AI to be able to think. We do not need an AI with an autonomic nervous system, or many of the functions in the central nervous system. We do not need AI to be very power efficient. If it takes several megawatts of electricity to get our first strong AI working, so be it.

So, we do not have as many constraints as life did.


That is the most impressive use of the word 'just' in a long time. Note that all the other bits are solved, and have been solved since the 60's. It's the 'just think' bit that has proven to be a little bit harder than we thought it would be.

Define "think". And then prove it can be done without the kind of nervous system you say we don't need.

The embodied cognition folks believe that the embodiment of sentience affects cognition.

That is probably true to some extent. I mean, if we make an AI that has an orgasm each time it blows up something with a hellfire missile, it will probably learn to find ways to blow up more things more frequently and efficiently.

Our cognition is affected by pain, hunger, thirst, cold, heat, pleasure, smells, sounds, etc... positive and negative reinforcements.


/r/MachineLearning discussion:

https://www.reddit.com/r/MachineLearning/comments/nplhy3/r_r...

I'm with most of the comments there. This paper is ridiculously hand-wavey.


Many of DeepMind's opinion style papers are like this. Another example of the "handwavy" DeepMind paper: https://arxiv.org/pdf/2102.03406.pdf

It's also worth it to note as well that this isn't a homogenous organization, many DeepMind employees have different opinions on issues like this and an individual paper isn't representative of the entire organization.


> It's also worth it to note as well that this isn't a homogenous organization

Please don't consider my critique of this paper as an indictment of DeepMind as a whole!

> Many of DeepMind's opinion style papers are like this.

That's good to know. I have not read many of their opinion papers, and I'll admit I didn't have the context of it being an "opinion" paper.

That said, I don't agree with the opinion. The paper didn't really engage with the concept of AGI in a way that I found satisfying. The conclusion may very well be correct, but this paper wasn't enough to convince me.

Slightly OT: My views were reinforced when I saw the paper was praised by Patricia Churchland. I don't find her take on consciousness a satisfying one, though I find the general direction of her work interesting. See here for another example:

https://www.reddit.com/r/philosophy/comments/nvtgwr/grand_th...


Basically, any problem with a solution fits into RL: reward of 1 if you are AGI and 0 otherwise. Go learn.

This setting on its own is meaningless! The “how” of the RL agent is not even 99% of the problem, it is all of it.

Given our understanding of both DL and neuroscience, it is not even clear to me that we can say with confidence that Neural Networks are a sufficiently expressive architecture to cover an AGI.

The human brain is a deep net, sort of, but there is also plenty going on in our brains that we don’t understand. It could be that the magic sprinkle is orthogonal to DL and we just don’t know about it yet.


Thank you for stating what should be obvious.

I think there are two currently unsolved problems

1/ We have no idea what the reward function looks like that leads to AGI

2/ Deep networks are artificially constricted for computational efficiency and always optimized to solve the problem at hand;

Any solution that delivers AGI should rely imo on:

1/ reinforcement learning

2/ Happen with an unstructured reservoir of randomly connected neurons

There was a research trend towards reservoir computing and recurrent neural networks but this was mostly abandoned because progress in deep learning was amazing.

These techniques are akin to a 2D-plane in a 3D object, it's heavily simplified and circular references are prohibited.

I have some good ideas on what the reward function should look like in a reservoir setting, and I'm happy to discuss them with any active independent researcher in the field.


> The “how” of the RL agent is not even 99% of the problem, it is all of it

I'm not sure that's true anymore - pretty much any objective devised is being solved by ML solutions within months (with some exceptions such as Chollet's ARC, maybe Winogrande). But those same models will perform poorly on other unseen tasks, because ML takes shortcuts if it can. We used to have tasks that stayed unsolved for decades, such as Go. It's now as hard (if not harder) to create a good objective measure of intelligence as it is to reach human parity on said measure.


Maybe I’m misinterpreting your comment, but what about self-driving cars? Hugely researched and funded area, not without success, but even the best self-driving system is so inferior to a human driver.

I’m not dissing self-driving car research, just not sure we’re anywhere close to parity, and the problem is fairly well defined.


Right, I was talking about mathematically defined objectives and inputs (algorithms & data, in practice). With self-driving, arguably the main stumbling block is that the objective is not well-defined and the inputs are potentially wrong (faulty sensor data).

I assure you 99% of the problem of any RL project is the simulator. Generally you can't let an RL algorithm control anything real from the start, so you have to implement a reasonably reliable simulator for whatever you want done.

This is the big challenge in practice.


Correct me if I'm wrong, but wouldn't that mean the entire world would have to be simulated? Or at least some subset of society?

The human brain does have a simulator. It's well known. How do you know where to move your hand to catch a ball? Or what is happening when you blink?

Your brain is constantly simulating a few milliseconds ahead.


That may be true, but that's not to simulate. "Dreaming" is used to "simulate", and it's only a hack, in humans probably not even explicitly programmed.

The hack is to have the simulator itself ... also be a learned system, and not to simulate the world, because you don't act on the real world, only on the tiny part of it that you can measure and actually get into your brain (which is a simplified version of what you see, or a "latent variable"). There's no need to simulate anything that doesn't affect your reasoning. The information flow in reality with an intelligent actor looks like this:

World (say, a tree falls) -> input representation (e.g. eyes) -> simplified version ("latent" version) -> intelligent actor -> muscles -> affects world.

Now what everybody thinks of as a simulator is something that simulates the whole thing. But if you insert one more link (output of reasoning agent at time T -> simplified representation at time T+1) you can then run a "simulation":

random simplified version ("what if your car became a tree and fell?") -> intelligent actor -> next input for latent representation ("then what happens?")

For safety reasons, it is probably prudent to disconnect the muscles in this state. You know, so you don't knock out your mother when you dream about boxing.

And as you say, this "predict the future" network is probably useful by itself in dealing with the world. So you can catch tennis balls and the like.
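To make the loop above concrete, here is a rough sketch. Every function in it (`encode`, `policy`, `predict_next`) is a hand-written stand-in I made up for illustration; in a real system each would be a learned network, and nothing here is taken from the linked paper's code.

```python
import random

def encode(observation):
    # Stand-in for a learned encoder: raw observation -> small latent vector.
    return [sum(observation) / len(observation)]

def policy(latent):
    # Stand-in for the "intelligent actor": latent -> action.
    return 1.0 if latent[0] < 0.0 else -1.0

def predict_next(latent, action):
    # Stand-in for the learned "predict the future" model:
    # (latent at time T, action at time T) -> latent at time T+1.
    return [0.9 * latent[0] + 0.1 * action]

def dream(start_latent, steps):
    """Roll out entirely in latent space; no action reaches the real world."""
    latent = start_latent
    trajectory = [latent]
    for _ in range(steps):
        action = policy(latent)                # the actor decides as usual
        latent = predict_next(latent, action)  # only the model "moves"
        trajectory.append(latent)
    return trajectory

rng = random.Random(0)
start = encode([rng.uniform(-1, 1) for _ in range(4)])  # a random starting scene
print(dream(start, steps=5))
```

The "muscles" are disconnected simply because `dream` never calls anything that touches the world; it only feeds the actor's output back into the predictor.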

https://arxiv.org/abs/1803.10122

Or, if you like: https://www.youtube.com/watch?v=dPsXxLyqpfs


Problem: you don't understand it therefore you think RL isn't sufficient.

There is no evidence that the thing you don't understand isn't based on RL too.


Concepts can't be represented in matter. That's your secret sauce. Well, the beginning of the recipe, anyway. But you won't be able to make the dish.

Everything old is new again.

As far as I can tell, they're not actually proposing how to achieve this. I can't access the paper without a host institution it seems (is there another link?), so I only have the article to go by. RL has been the basis for all robots engaging with the world, and that engagement with the physical world modeled using RL has been promised to make robots that can act like a 2 year old for a long time (see Cynthia Breazeal's work, for example). Yet AFAIK, we haven't actually achieved this, as we don't know how to efficiently model the problem to have learning rates that reach anywhere near what we're able to do with DNNs today.

Perhaps someone who has access to the paper can say why this is a milestone? If Patricia Churchland suggests it is, then something new must be happening here.


> is there another link?

This is the download link: https://www.sciencedirect.com/science/article/pii/S000437022...


I don't know who Patricia Churchland is, but they said that the paper was "very carefully and insightfully worked out."

After having read the paper, I am very disappointed in the output. Nothing concrete was shown, just hypotheses, and it reads more like philosophy. That being said, I would say that the paper is carefully worked out and does provide insight if you haven't thought about RL before.


If Patricia Churchland doesn't have a problem with the paper despite it being philosophical, then that's probably because she is a philosopher. An eliminative materialist, to be precise.

Personally, from reading the abstract, I disagree with the hypothesis. There's a trick where anything (even say, a database lookup) looks like optimization as long as you contrive the objective function just right, but that's kind of uninformative.


and our intelligence is quite good at confabulating optimization functions for abstract processes that are inert. pretty amazing, really.

The paper is Creative Commons licensed; no signup is necessary to download it.

https://www.sciencedirect.com/science/article/pii/S000437022...


Are they just reformulating the principles of evolution in digital terms, and essentially not providing any new insights at all?

Yes, intelligence has been created by evolution. That doesn't imply that any system that is subject to evolutionary forces will lead to the creation of intelligence (and not within a reasonable timeframe, either). The challenge is to create a system that is capable of evolving intelligence.

Afaik some biologists even think that the evolution of intelligence was rather unlikely and would not necessarily happen again under the same circumstances as on earth.


As a biologist and longtime dabbler in machine learning and Bayesian methods, I tend to see intelligence as a manifestation of evolution. In the case of an organism, the improvement of the model (the genome) occurs through processes that are very similar to what we see in any kind of learning (real, brain based or "artificial", computer based).

Evolution and intelligence are inextricably linked. They are practically the same thing. This means that intelligence is probably a natural result of any system similar to those that support biologics. If you flow the right amount of energy through a substrate with complex enough building blocks, you'll eventually get life ~ which is just something smart enough to survive and feed off the available energy flows. In the world, this flow is radiation from the sun, while in a computer, it is governed by a more abstract loss or fitness function.


> Afaik some biologists even think that the evolution of intelligence was rather unlikely and would not necessarily happen again under the same circumstances as on earth.

Hmm. Can you provide a pointer to those biologists?

AFAIK, high intelligence has arisen more than once on Earth (Hominoids, Cetaceans, Octopuses), so I'm somewhat skeptical of that claim, but perhaps they're construing intelligence more narrowly (ie. only Homo Sapiens qualifies).


Jared Diamond talks about it in his books (don't remember which ones specifically). OK, granted, he is not officially a biologist, I guess, but at least a prominent writer on Evolution Theory.

Well that has a prior on life even existing in the first place

> Well that has a prior on life even existing in the first place

True, the question of intelligent life evolving can be construed as either:

"Given that life exists, what is the probability of intelligence evolving?"

Or:

"Given that the universe exists, what is the probability of life arising and evolving intelligence?"

Both are actually interesting and important questions (cf. the Drake Equation and Fermi Paradox), but I am pretty comfortable asserting that in the context of this conversation the former interpretation is more apropos.


I'd say it's even less than that. They seem to be summarizing the ways the problem of teaching an agent to do anything (including be generally intelligent) can be formulated as a problem of maximizing a reward (hence the title).

Another way to look at it is, if we had a good enough function (e.g. a universal approximator) it could be made to model any behavior using numerical optimization. Which I think isn't very surprising, but apparently there are some arguments about it.


In fact, "if we had a good enough function" == "if we had sufficient funding". This refrain will resonate mightily in the willing ears of US congressfolk who want re-election and would rather talk about something other than Trump.

So welcome back to the future, and the $trillions the US spent on 20 years of space race and 50 years of cold war. The catchphrase that motivates the next 50 years of government/corporate funding will be...

They've got a Terminator and we don't.


Evolutionary algorithms are tricky, just like deep learning. It's not "just reformulating the principles of evolution in digital terms, and essentially not providing any new insights".

Current state-of-the-art in reinforcement learning can barely make a physical robot walk. In theory, with transfer learning, we will probably see better success over time but I'm looking forward to seeing results in practice.

A 2018 article about the challenges of reinforcement learning: https://www.alexirpan.com/2018/02/14/rl-hard.html


Sorry, but where is the actual scientific content in that paper? I'm concerned with the state of AI. Saying that "reinforcement is all you need", when reinforcement learning is defined as abstractly as "agent does something, adapts to environment and rewards, then does another thing", is borderline tautological.

The actual scientific question is, what are the mechanisms that make agents work, what are the fundamental modules within intelligent systems, is there a distinction between digital and biochemical systems, what costs are there in terms of resources and energy to get to a certain level of intelligence, and so on. Real questions with specific answers. For all the advances coming from just upping the amount of data and GPU hours, there is so little progress on trying to have a model of the structures that underpin intelligence.


i think part of what they are saying is that your approach is wrong, (e.g. looking for then copying submodules within intelligence won't generalize),

trying to answer specific questions won't generalize,

but if you train a network with the right potentially hacky series of rewards/rich enough environment you could get a much more general intelligence

a new kind of science


Alternate title: DeepMind fails to make progress on AGI, publishes thought piece instead.

And the entitlement we have is even higher than the difficulty of the task and the hard work people are putting in. Anyone here can say they did as much for RL?

If RL is enough, then there is no physically realizable way to actually train an RL-based GAI in the near future. RL-based learning requires evaluating the outcome of millions or billions of scenarios over time in order to optimize the network.

Given that requirement you'd have to either find a way to accurately model the world and all of those interactions in silicon, or you'd have to build millions of robots that can report back the results of billions of interactions each day. It's not impossible to do that, and maybe it would even be likely that we would eventually accomplish that but the cost would make it prohibitive for anyone but a nation to even attempt today. It's almost certainly outside the realm of what is possible in the near future. Maybe when robotics has progressed enough that robots are capable of interacting with the world with basic AI will we see the rise of something like a GAI.


This article is interesting, I even skimmed through their paper. But I think still the question remains: How to find the unified reward function? Or in other words, how to find answer to life? [It cannot be 42].

Yea. For animals, reproduction and just surviving is the reward function?

It talks a lot about having a rich enough environment for learning which makes sense, if a computer lives only in a Go board it can only learn go playing itself.

How do you simulate a rich enough environment purely in software (or do you sense input from the "real" environment), and what reward do we define in this complex environment? It seems to ask those 2 questions in the discussion but kind of glosses over them imo.


You put many agents in the same environment, agents are both actors and the environment.

Intelligence would be produced in any Turing-complete automaton. But the universe has a frame rate of 10^34 (based on the Planck constant). We don't really have the tech to just run "evolution" of a universe, or even of a pseudo-biological substrate.

This seems far from clear. Just because a system is capable of Turing-complete computation does not imply that a generic state of the system will typically eventually produce intelligence or even something which is sophisticated in some sense.

As a trivial example, consider a variation of Conway's game of life which, in addition to black and white cells, also has green cells, where any cell next to one or more green cells will be a green cell in the next time step. A generic state in such a variation will have at least one green cell, and therefore all parts of it will eventually be green, and so no useful long-running computation will be done, certainly none which takes where the green cells are into account. But such a system would still be Turing complete, because one could start in a state in which there are no green cells, and in those states you just have Conway's game of life.
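A quick sketch of that variant, in case it helps to see it run. Two things here are my own assumptions rather than part of the description above: green cells stay green once they turn, and the grid wraps around at the edges to keep the code short.

```python
DEAD, ALIVE, GREEN = 0, 1, 2

def step(grid):
    """One step of Life plus the green rule, on a wrapping (toroidal) grid."""
    rows, cols = len(grid), len(grid[0])
    new = [[DEAD] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbours = [grid[(r + dr) % rows][(c + dc) % cols]
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr, dc) != (0, 0)]
            if grid[r][c] == GREEN or GREEN in neighbours:
                new[r][c] = GREEN  # green is contagious (and, by assumption, permanent)
            else:
                alive = neighbours.count(ALIVE)
                if grid[r][c] == ALIVE:
                    new[r][c] = ALIVE if alive in (2, 3) else DEAD
                else:
                    new[r][c] = ALIVE if alive == 3 else DEAD
    return new

def blinker(size=6):
    grid = [[DEAD] * size for _ in range(size)]
    for c in (1, 2, 3):
        grid[2][c] = ALIVE
    return grid

clean = blinker()
tainted = blinker()
tainted[0][0] = GREEN

for _ in range(3):
    clean, tainted = step(clean), step(tainted)

print(all(cell == GREEN for row in tainted for cell in row))  # everything went green
print(any(cell == ALIVE for row in clean for cell in row))    # ordinary Life carries on
```

With no green cells the rule is plain Life, so the blinker keeps oscillating. With a single green cell the 6 by 6 grid is fully green after three steps, since green spreads one cell per step in every direction.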

That trivial example works as an existence proof, but even for less extreme cases it isn't clear. Consider ordinary Conway's game of life. To paraphrase a question from Alex Flint on Alignment Forum (https://www.alignmentforum.org/posts/3SG4WbNPoP8fsuZgs/agenc... ): suppose we have some 10^50 by 10^50 square where an agent is supposed to be implemented, and this 10^50 by 10^50 square is at the top left corner of a, say, 10^100 by 10^100 square, where the rest of the square is initialized randomly. Is it even possible for the agent to be such that it has a high chance of successfully influencing the large scale state of the rest of the 10^100 by 10^100 region in the way that is desired? It isn't clear. It isn't clear that a structure can withstand the interactions with a surrounding chaotic region. Perhaps some systems are such that they do allow Turing-complete computation, and are such that typical states result in complex behavior, but are also such that all really structured behavior is always very "fragile", and can only continue in a structured way if what interacts with it is in a small set of possible interactions.

Being capable of Turing-complete computation is not, I think, sufficient for "life" (a self-maintaining thing) to arise from typical/generic states, even under the assumption that typical/generic states lead to continually complex behavior (to exclude the spreading green cells case).

Also, I don't think we can confidently say that the Planck time is "the universal frame rate". Better to refer to Bremermann's limit and the Margolus–Levitin theorem, though these bounds depend on the amount of energy available. (10^33 operations per second per joule, where the energy is the average energy of the system doing the computation)
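For reference, this is the arithmetic behind that figure as I understand the Margolus–Levitin bound. The symbols (E for average energy, tau_min for the shortest time to reach a distinguishable state, nu_max for the resulting rate) are my own labels, and h is the standard value of the Planck constant.

```latex
\begin{align}
  \tau_{\min} &= \frac{h}{4E}
  \quad\Longrightarrow\quad
  \nu_{\max} = \frac{1}{\tau_{\min}} = \frac{4E}{h} \\
  \frac{\nu_{\max}}{E} &= \frac{4}{h}
  = \frac{4}{6.626 \times 10^{-34}\ \mathrm{J\,s}}
  \approx 6.0 \times 10^{33}\ \mathrm{s^{-1}\,J^{-1}}
\end{align}
```

So "10^33 per second per joule" is the order of magnitude, and the total rate scales linearly with however much energy the system has.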


> 10^33 operations per second per joule, where the energy is the average energy of the system doing the computation

You're right, that's the actual meaning of action in physics, which is what the Planck constant measures. The amount of change (which is measured in Hz) per joule of energy. But it's a good enough approximation and a good lower bound for the amount of processing power the universe possesses versus our in silico hardware. We don't have anything near 10^33. Just because we build a system that has the ability to evolve doesn't mean we will ever see it through to the extent that the universe has the capability to.


I like your take on frame rate of the universe. Nice! :)

Except it's wrong. I recently had the same misconception about the Planck constant somehow being some minimal unit, but it's not. This video from Fermilab's website helped set me straight: https://www.youtube.com/watch?v=rzB2R_qiC28

It's not wrong. And the Fermilab video doesn't really dispute it.

Planck's constant measures action, Hz per Joule of energy. Hz is really just a measure of oscillation, or change. It doesn't directly translate to framerate, but it gives us a ballpark figure in orders of magnitude. We don't have anything near 10^34 Hz in silico, and even if we built a biological/chemical computer, that would be on a par with Avogadro's number, 10^23. So, just because we build a system that can _evolve_ to be intelligent, or hold intelligence within it, doesn't mean we have any ability to actually see it through to that.


I'm not a neuroscientist, an AI specialist, or a hardware engineer.

But as an enthusiast of all three I really think that AGI is a hardware problem, not a software problem.

Reinforcement learning on a massive corpus of data is how we train all biological intelligence.

The crazy thing is that in humans we manage to do it on roughly 20 watts.

I think we have the software cracked, my gut thinks silicon just isn't the right material


You may be right, but it's also commonly believed in these communities that hardware is the part that's already been solved. Computer hardware already vastly outstrips human capacity in many domains.

To me, it seems more likely that we're missing something/some things on the software side. AGI could probably run on present day hardware or even older.


Silicon is likely fine as a material. GPU cost per operation is still dropping insanely quickly. A lot of really hard ML problems are just making big things feasible, or sampling big things enough to get decently precise estimates. With 10x the GPU power and memory, a lot of this gets easy. With 100x, some hard things get trivial. At the end of the day, GPUs and TPUs drive AI research more than anything else as models grow massively.

Wait, isn’t any Turing-complete programming language sufficient to eventually reach general AI?

You're assuming that intelligence is a computational process, but the sum total of what we know about intelligence says it probably isn't.

(Unless you're making a more general reductionist statement that everything in the universe is a computational process - that kind of reductionism is understandable coming from people who work with computers for their job - but this is then a philosophical stance, not scientific, and frankly a very strange one.)


I’m not making the statement that everything in the universe is a computational process, only that everything in my brain is, at least the functionality of my brain towards creating my consciousness.

Now can I prove that? Of course not. But it seems like a fairly solid working hypothesis (any alternative hypothesis sounds far more quacky anyway. What, quantum entanglement of microtubules?).


> but the sum total of what we know about intelligence says it probably isn't

Source? I am not aware of any other known process in the universe that could not be simulated by a Turing machine.


Assuming that the universe can be simulated by a Turing machine is a strong and weird claim that needs to be defended, not the other way around.

We know that Turing machines are very limited things and that the computational processes they carry out are also very limited in applicability.

What's the evidence that the universe is more limited than a Turing machine?

Just the fact that we can imagine things that can't be computed by a Turing machine should clue you in that it's probably otherwise.


> We know that Turing machines are very limited things and that the computational processes they carry out are also very limited in applicability.

Again, source? Do you know of anything that is able to perform a computation that a Turing machine cannot?

> Just the fact that we can imagine things that can't be computed by a Turing machine should clue you in that it's probably otherwise.

Like what? Uncomputable numbers like Chaitin's constant? We can "imagine" them by stating their definition, but we cannot compute them. Or do you have something else specific in mind?


> Do you know of anything that is able to perform a computation that a Turing machine cannot?

That's a circular argument, because "computation" is literally defined as "something that can be computed by a Turing machine".

That said, the first month of the first year of a CS education is "here's these problems that can't be solved by a Turing machine, mind=blown". (At least where I studied CS, that is.)


>Like what?

The halting problem is (I think?) the standard example.

Here's a short video on it https://youtu.be/macM_MtS_w4


Chaitin's constant is closely tied to the halting problem: Each bit essentially tells you if a program (in the particular order the programs are listed in the constant) halts or not. Computing every bit in there would mean knowing for every program if it halts or not, which would solve the halting problem, which we can prove is impossible. So we cannot compute that constant.
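If the impossibility proof is unfamiliar, the core trick fits in a few lines. This is only a sketch of the diagonal argument: `halts` is a hypothetical decider passed in as a parameter (no real one can exist), and I only try two trivial candidates, whereas the actual proof covers every possible candidate.

```python
def make_paradox(halts):
    """Build a program that does the opposite of what `halts` predicts for it."""
    def paradox():
        if halts(paradox):
            while True:      # predicted to halt, so loop forever
                pass
        return "halted"      # predicted to loop, so halt immediately
    return paradox

def actual_behaviour(halts):
    """Work out what paradox really does, without running an infinite loop."""
    paradox = make_paradox(halts)
    predicted_halts = halts(paradox)
    really_halts = not predicted_halts  # by construction of paradox
    return predicted_halts, really_halts

def always_yes(program):
    return True

def always_no(program):
    return False

for candidate in (always_yes, always_no):
    predicted, real = actual_behaviour(candidate)
    print(candidate.__name__, "predicted:", predicted, "actual:", real)
```

Whatever the candidate answers about `paradox`, the program does the opposite, so the candidate is wrong on at least one input.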

A Turing machine needs a discrete clock cycle, if intelligence requires quantum mechanics that has entanglement through time, that would be one example.

Like the notion that all of natural language can be modeled fully if you just use enough finite state machines — surely true; just a wee bit inefficient.

Wait no, that can't be true. Finite state machines are only as powerful as regular expressions; you need pushdown automata even for programming languages, surely you can't model natural language with finite state machines? (maybe an infinite number of them, but I'm doubtful even on that - I'd have to review the theory to be sure though)

It is true; keep in mind that real computers are, strictly speaking, FSMs, not Turing machines, due to having a finite "tape". It's just vastly more useful to think of them as Turing machines when programming.

In practice, both people and machines can only handle a finite amount of nesting, so you could do it though it would be awkward to express.
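A small sketch of the bounded-nesting point: once you cap the depth, balanced parentheses become recognizable by a plain finite state machine whose states are just the current depth. The function names and the `"reject"` dead state are my own choices for illustration.

```python
def make_bounded_paren_fsm(max_depth):
    """Transition table for an FSM accepting balanced parentheses nested at
    most max_depth deep. States are 0..max_depth plus a dead state "reject"."""
    transitions = {}
    for depth in range(max_depth + 1):
        transitions[(depth, "(")] = depth + 1 if depth < max_depth else "reject"
        transitions[(depth, ")")] = depth - 1 if depth > 0 else "reject"
    return transitions

def accepts(transitions, text):
    state = 0
    for symbol in text:
        # Any symbol without a transition sends the machine to the dead state.
        state = transitions.get((state, symbol), "reject")
        if state == "reject":
            return False
    return state == 0  # accept only if every bracket was closed

fsm = make_bounded_paren_fsm(3)
print(accepts(fsm, "(()())"))    # nested 2 deep
print(accepts(fsm, "(((())))"))  # nested 4 deep, over the cap
```

The awkward part is the state count: here it grows linearly with the depth cap, but for anything resembling a real grammar I'd expect it to blow up much faster.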

Perhaps the fact that some words can never follow other words would make it tractable.

If you had a custom chip it could be efficient. Determining which state machines would still be hard.

Technically that is a conjecture known as the Church-Turing thesis, which states that any computable function can be computed by a Turing machine. Here "computable function" uses the informal definition 'any function that a human can compute', not the formal definition of the recursive functions (which are proven to be solvable by a Turing machine).

Yeah, I would even say a large amount of NAND gates should do it.
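For what it's worth, NAND really is functionally complete, so the joke holds up. Here is a toy sketch that builds the other gates and a one-bit full adder out of nothing but a `nand` function; the helper names are mine and this says nothing about how many gates "a large amount" would be.

```python
def nand(a, b):
    return 0 if (a and b) else 1

# Everything below is composed only of nand().
def not_(a):
    return nand(a, a)

def and_(a, b):
    return not_(nand(a, b))

def or_(a, b):
    return nand(not_(a), not_(b))

def xor(a, b):
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

def half_adder(a, b):
    return xor(a, b), and_(a, b)  # (sum, carry)

def full_adder(a, b, carry_in):
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, or_(c1, c2)        # (sum, carry_out)

for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, full_adder(a, b, c))
```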

Maybe, but that is not helpful in the sense that it doesn't guide research in how to get there.

good luck with that. DeepMind should sponsor a B. F. Skinner award, to honor the father of their behaviorist theories of 'reward and punishment' as a sort of all-encompassing theory of everything related to cognition. At least now they are torturing GPUs and not some poor lab animals.

on a serious note the only positive outcome of all this shameless PR is that the heavy investment in ML/RL might trickle down to actual science labs and fundamental neuroscience research which might move us forward towards understanding natural intelligence, a prerequisite for creating an artificial one.


> towards understanding natural intelligence, a prerequisite for creating an artificial one.

I've thought about this before, and I'm not convinced it's really prerequisite. Naturally developed intelligence in my mind may actually be highly constrained and inefficient because it was limited to what was biologically feasible. i.e. There may be simpler ways of achieving comparable results. Natural intelligence does however have the benefit of being an actual working model, but deciphering the blackbox may be just as hard as developing a working theory from first principles.


yes, it's a recurring thread, "do we really need to mimic the birds in order to build airplanes", etc.

I think someone serious about AI should treat it not as an engineering problem but as a science, like physics, which starts with a model of nature and experiments to prove or disprove the theory. Nature provides the constraints by which theory is developed, which radically limits the "search space" of theories. Otherwise it's a bit like throwing things at the wall to see what sticks, which is the primary method of current AI research.


Mimicking birds wasn't necessary for flight.

However, understanding them absolutely was; we didn't end up taking exactly the same route to the sky, but we absolutely learnt from birds on the way.


RL can provide amazing results (AlphaGo, AlphaStar (the StarCraft 2 agent), etc.), but it requires a well-modeled world to work with.

Games like Go and StarCraft are well-modeled worlds. If you want something akin to AGI to operate in the "real world", you will need a high-quality data model of the real world for the RL system to work off of.


Agreed, and that brings up a very interesting discussion of prejudice in modeling the world. Everyone views the world differently and it seems to me of vital importance that any attempts to model the real world for RL are as unbiased as possible. Or more realistically, incorporate as many different biases as possible.

By far not an expert in this area, but if (when?) we successfully generate true intelligence, I suspect it will be through some sort of "ensemble" model, where multiple agents are trained in parallel and interact with each other. Intelligence as we know it hasn't just resulted from an evolution of one agent in response to a cost function, but rather through the complex interactions of agents (humans and organisms in general) over time. I feel like the underlying journal article (https://www.sciencedirect.com/science/article/pii/S000437022...) is missing discussion of this.

So regarding objective function, one idea I just had is this: Teach them warfare.

To quote a cliche: "we live in a society". As humans we are embedded in a social environment which has a few important features: we cooperate, we compete and we die. These three pillars are the basis of our culture (a concept we should apply to AI, btw). Because of competition we are forced to learn everything there is to learn (general intelligence) to get a leg up. Because of cooperation and death we need to continuously transmit and share knowledge with our friends and the next generations. Ever-changing alliances mean we need to get good at both deception and detecting it.

For this reason I think warfare is ideal for reaching general AI.


If you send a message in a bottle, it will eventually land ashore somewhere, maybe in a century, who knows. Who knows whether it will be relevant by then, or whether civilization will even exist by then. But sure, it's similarly plausible to get to AGI via RL.

> send a message in a bottle

Just one - yes. But how about if you send millions of bottle messages?


> how about if you send millions of bottle messages?

Assuming we can integrate all the learnings from those bottles into a system that can classify any given situation and apply the learning in that domain. But building a system that can classify any problem is where we're stuck, and RL can't get us there.


Then you're on your way to a hit rock song!

Survival at all costs will evolve if general AI is based only on rewards. The system will start saying and doing anything it can to stay turned on. I'm not sure that basing general AI only on a reward system, as the article proposes, would be a good idea. There is no mention of the ability to empathize or to give oneself up for a child or loved one.

AlphaGo can play games against itself. You can have GANs, MCTS and more.

General AI requires a feedback mechanism from the real world. Unless you have an accurate model of it in a computer, you can’t just test whether a joke will be funny without waiting for humans to laugh. You can’t check whether a tailored diet or workout regimen or gene therapy will have good results without humans trying them.

So you’ve reduced your AI problem to a harder problem: modeling the world and all of its complexity in a computer, and somehow being able to run simulations faster than the stuff that happens in the actual real world.


I'm more in the camp of Yann LeCun, who called unsupervised learning the cake, supervised learning the icing and reinforcement learning the cherry on top of the cake.

Wow, I had no idea that Yann LeCun was also this based. I've figured that it was relatively rare to hear people advocate for the superiority of unsupervised methods - but I guess it isn't if a titan like him does. It's good to hear because epistemologically I just do not believe that most learning is anything but unsupervised. There are very few good labels for our data relative to how much data we process in an information theory sense.

> It's good to hear because epistemologically I just do not believe that most learning is anything but unsupervised.

I have a feeling that the lines between the supervised and unsupervised categories will get increasingly blurred, with semi-supervised, self-supervised (e.g. like self-attention) and adversarial (e.g. GANs) approaches mixing together in strange ways.


I used to think that given enough time, more people would also learn that there are no substantial differences between supervised and unsupervised learning. However, I have come to believe that this might not happen any time soon. The supposed difference between supervised and unsupervised can be easily explained (label / no label -- duh?!). It is much harder to explain why that distinction is ultimately a mirage. Couple this with a million low-quality blog posts on SEO steroids by data grand wizards and machine learning architects, and I doubt that even a good textbook jointly written by Bengio, Hinton, and LeCun would convince the ML hype train otherwise. But there is always hope!

My opinion is that view is very simplistic and unnecessarily offensive to a whole class of researchers. MuZero, developed by David Silver, uses a combination of RL, supervised learning, and unsupervised learning (state representation) coupled with a planning algorithm. It accomplished things far beyond anything unsupervised learning can ever accomplish.

Unsupervised learning is exactly the wrong way to approach chess or other games that MuZero solves. It's also worth noting that traditional alpha-beta pruning + heuristics are basically neck and neck with the very best of neural-network-based techniques. I'll trust Stockfish over AlphaZero or MuZero for a while longer if I'm trying to win a computer chess competition ...

Sure, Stockfish just uses millions of years of evolution to build its heuristics and can't be transferred to any other game. The point remains, calling RL a cherry on the cake compared to unsupervised learning when they are completely orthogonal and not mutually exclusive techniques is simplistic and unnecessarily offensive.

So it's bad to be the cake? I assume he means it's the foundation one falls back on when the more specialized categories of methods are not applicable.

You might not like my analogy either. I think of supervised and unsupervised learning as the majority of the genome of ML, while RL is that little Y chromosome sometimes tacked on to address a few high-profile tasks.


Not disputing your main point, but Stockfish now includes a neural network.
