Looking at it from a CogSci perspective, it is essentially an offshoot of behaviorism: a coarse and extremely inefficient model of learning as reward and punishment, an iterative trial-and-error process.
This 'Skinnerism' was discredited in cognitive psychology decades ago and makes absolutely no biological sense, for the simple reason that any organism trying to adapt this way would be eaten by predators before minimizing its "error function" sufficiently.
Living, learning organisms have limited resources (energy and time), and they cut the search space drastically through shortcuts, heuristics, and hardcoded biases instead of doing some kind of brute-force optimization.
This is a case where computational efficiency comes first and sets the constraints under which the cognitive apparatus has to develop.
As for actual models of cognition, a good place to start is not the ML/AI field (which, as a whole, tends to get stuck in local minima), but state-of-the-art cognitive psychology, and maybe research in "distributional semantics", "concept spaces", "sparse representations", "small-world networks", and "learning and memory" neuroscience.
You'd be surprised how much knowledge we have gained about the mind since those RL & ANN models were developed in the 1940s.
> "Living learning organisms have limited resources (energy and time), and they cut the search space drastically through shortcuts and heuristics and hardcoded biases instead of doing some kind of brute force optimization."
But those heuristics and hardcoded biases were themselves developed through brute-force optimization over the course of billions of years, a massive amount of energy input, and many organisms being devoured.
That is true at the level of the universe as a whole, though, not of the organism itself.
Or unlucky, as the case may be.
One can rationally argue either way over the speculative proposition that reinforcement learning will yield AI in less than a few million years, but that it took evolution half a billion years is hardly conclusive, and certainly not grounds for stopping work.
I'm no expert, but my personal opinion is that AGI will probably be some hybrid approach that mixes reinforcement learning with other techniques. At the very least, I think an AGI will need to exist in an interactive environment rather than just be trained on preset datasets. Prior context or not, a child doesn't learn by being shown a lot of images; it learns by being able to poke at the world to see what happens. I think an AGI will likely require some aspect of that (and apply reinforcement learning that way).
But like I said, I’m no expert and that’s just my layperson opinion.
If the goal is AGI, that is; if it's not, then of course there's no reason to stop.
Some people are doing just that, of course.
Well, if it follows, then it follows necessarily. But maybe that's just déformation professionnelle? I spend a lot of time working with automated theorem proving, where there are no ifs and buts about conclusions following from premises.
In your work on theorem proving, am I right in guessing that there are no 'ifs' or 'buts' because the truth of the premises is not an issue? In the "evolution argument", the premises/lemmas are not just that evolution took a long time, but also something along the lines of a significant speedup not being possible.
You might notice that in another comment, I suggested that we might still be in the AI Cambrian. I'm not being inconsistent, as no-one knows for sure one way or the other.
But, to be sure, as is common when this kind of informal conversation suddenly sprouts semi-formal language like "argument", "claim", "proof", "necessarily follows", etc., I am not even sure what exactly it is we are arguing about anymore. What exactly is your disagreement with my comment? Could you please explain?
Insofar as I have an issue with your comment, it is that it is not going anywhere, as I explained in my previous post.
I see this god-moding of my comment as a pretend-polite way to tell me I'm talking nonsense, which seems to be designed to avoid criticism for being rude to one's interlocutor on a site that has strong norms against that sort of thing, but without really trying to understand why those norms exist, i.e. because they make for more productive conversations and less wasting of everyone's time.
You made a comment to say that unless I claim X (which you came up with), then my comment is not going anywhere. The intellectually courteous and honest response to a comment with which one does not agree is to try and understand the reasoning of the comment, not to claim that there is only one possible explanation and therefore the comment must be wrong. That is just a straw man in sheep's clothing.
And this is not surprising, given that it comes on the heels of nitpicking about supposedly important terminology (necessarily!). This is how discussions like this one go, very often. And that's why they should be avoided: they just waste everyone's time.
Meanwhile, in a branching thread, I had a short discussion with the author of the post I originally replied to, in which I agreed with the points he made there. Both of us, I think, clarified our positions and reached common ground. That is how it is supposed to go.
I did not set out to pick a fight with you, and if I had anticipated how you would take my words, I would have phrased things more clearly.
In short: Turing completeness is no guaranteed path to AGI. Assuming an "AGI program" exists, it is hidden away in an infinity of almost, but not quite, identical programs.
It's always 5 years away because mainstream AI researchers are stuck yak-shaving their gradient descents.
I mean, you can't just throw things at the wall and hope they stick, but that's literally the state of the art if you follow ML conferences and their world-beating toy-benchmark results, with a lot of pseudo-rigorous handwaving for theory.
The reason physics has been so successful is that their theory closely followed empirical data and constraints imposed by nature.
I think the only hope of achieving common sense in AI is to align it with the hard constraints living organisms have, using those constraints as a guide.
A few of the terms I mentioned come from that POV; if you dig a bit deeper, they all have direct physical manifestations in natural learning systems.
A small correction: that's deep learning researchers, not AI researchers, and not even all machine learning researchers. To be charitable, it's not even all deep learning researchers. It's just that the field of deep learning research has been inundated with new entrants who are sufficiently skilled to grok the practicalities but lack understanding of AI scholarship and produce unfortunately shoddy work that does not advance the field (any field, any of the aforementioned ones).
As a personal example, my current PhD studies are in Inductive Logic Programming which is, in short, machine-learning of logic programs (you know, Prolog etc). I would not be able to publish any papers without a theoretical section with actual theoretical results (i.e. theorems and their proofs - and it better be a theorem other than "more parameters beget better accuracy", which is not really a theorem). Reviewers would just reject such a paper without second thought, regardless of how many leaderboards I beat in my empirical results section.
And of course there are all the other fields of AI where work continues: search, classical planning, constraint satisfaction, automated theorem proving, knowledge engineering, and so on and so forth.
Bottom line: the shoddy scholarship you flag up does not characterise the field of AI research as a whole; it only afflicts a majority of modern deep learning research.
Brainfuck is also Turing complete, so logically if we just do, for instance, Markov chain Monte Carlo for Bayesian program learning in Brainfuck, we can realize AGI that way.
"Everything is possible, but nothing is easy." The Turing tarpit.
This seems like overstating your point. Nobody has been able to rigorously define "AI" yet, so there's no way of saying whether it's possible with a Turing machine architecture. The human brain, at least, doesn't seem that similar to a Turing architecture. Neurons don't carry out anything like discrete operations.
Maybe it's possible to run AGI on a Turing machine, maybe it's not, but there are more options than simply "possible with a Turing machine" or "completely impossible".
The article at the top of this thread is specifically about properties ("The Generalized Archimedean Property") that real numbers do not possess.
There's also a little bit of slipperiness around the use of "AI" vs. "AGI" - you could easily argue (and people do!) that we've already achieved "AI" for many specialized domains. It's the General bit that seems to be the sticking point, and that this article focuses on.
Arguing from ignorance, of course nothing is knowable for certain. However there has been a lot of work on the universality of Turing machines, showing that a Turing machine can simulate any conceivable concept of finite computation and can approximate any conventional physical system.
I think a more useful way to express your intuition is to note that if human-built AGI comes into existence, it might be runnable on a Turing machine but quite possibly not efficiently so.
I'm a layman (just a software engineer) but am curious: I train my cat only with rewards (never punishment, because apparently it doesn't work on cats), and the kitty learned how to high-five me, sit, jump, follow me, etc. It seems to work really well for us. Basically, every time he does something desirable, I click my pen and give him his favorite treats. Is this ineffective?
But regardless, the broader point is: yeah, combine something akin to a cognition capability with the proper training routine and there you go, AGI from "reinforcement learning", broadly defined.
For example, your cat's brain isn't just a randomly initialised neural net. Your cat comes pre-wired in such a way that it understands certain things about its environment and has certain innate biases that allow you to train it to do simple tricks with relative ease through a reward mechanism.
A more analogous example would be building a cat-like robot with four legs and a neural processor then switching it on and expecting to be able to train it with treats. Without a useful initial neural state (founded with an understanding of cognitive psychology and neuroscience) it would be almost totally useless.
Obviously I could be missing some great breadth and depth of research (there's definitely a lot I don't know), but from what I've read, "we have no idea" is a pretty accurate description of how far we've come when it comes to consciousness, and I would imagine even less for the newer field of AI/AGI (consciousness has been around for a while P: and our theories have mostly sidestepped this real-world phenomenon).
> "The idea of using "rewards" as a learning mechanism and a path to actual cognition is just wrong, full stop."
This to me is a huge red flag (mostly of ego/hubris). I think if we rephrased the goal not to talk about "AGI" but around quantitative things like the ones you've listed ("computational efficiency", likelihood of being stuck in local minima, etc.), then I'd happily concede that we should be looking at "X" and not "Y". But unless I've missed something (again, likely), when we're talking about AGI, we're talking about consciousness (an epiphenomenon that comes about through physical/deterministic interactions). A quick way to gut-check myself here is to twist what you state is not a good place to start ("ML/AI field... gets stuck in local minima") and ask myself: is it possible that local minima (which we consider "bad" for current/traditional tasks) could be necessary for consciousness? I think the widely accepted answer to this is currently "we don't know".
If achieving AGI is going to look similar to the algorithms and architectures we currently use (where the likelihood of being stuck in a local minimum is something we can examine), then sure, your opinions stand. But that is just a guess, and unless I'm mistaken, AGI hasn't been achieved because we don't know how to do it.
This isn't to say that we should have 100% of the data before making strong judgements like this about a subject. It's just that the subject of "consciousness" is a big one (I'd say THE big one), so making such strong statements about something we know we don't know much about is interesting. <- This is where I get flashbacks to the SE world, where a missing piece of data can really throw you off or lead to wrong assumptions, and when I think about consciousness, we know we don't know a lot.
"Penrose argues that human consciousness is non-algorithmic, and thus is not capable of being modeled by a conventional Turing machine, which includes a digital computer. Penrose hypothesizes that quantum mechanics plays an essential role in the understanding of human consciousness. The collapse of the quantum wavefunction is seen as playing an important role in brain function."
All talk about this is premature and "pre-science", before we figure out more basic, fundamental things like object storage and recall from memory, object recognition from sensory input, concept representation and formation, the exact mechanism of "chunking", "translational invariance", generalization along concept hierarchies and across scales, representation of causal structures, proximity search and heuristics, the innate coordinate system, and innate "grammar".
Even having a working, biologically-plausible model of navigation in 3D spaces by mice, without spending a ton of energy training the model, would be a good first step. In fact, there is evidence that navigational capacity is the basis of more abstract forms of thinking.
On all of these things we have decades' worth of research and widely published, fundamental, Nobel-winning discoveries, which are almost completely ignored by an AI field stuck in its comfort zone. Saying "we have no idea" is just being lazy.
Edit: As for the OP's actual paper, I think something like complex-valued RL might bypass his main claims entirely. But my point is that RL itself is a dead end, trivializing the problem at hand.
Look, uh, I've read Gardenfors too, but are those really the state of the art? I don't remember there being anything about them at CogSci this past summer. Maybe I wasn't paying close-enough attention?
Some interesting recent work related to Gardenfors's ideas combined them with the discovery of place and grid cells, extending the "cognitive maps" and spatial navigation machinery into concept spaces, treating the innate coordinate system as the foundation for abstraction and generalization facilities.
And they actually found empirical data to prove it in  and related papers, so Gardenfors was right.
I believe it has to be the starting point for anyone seriously considering AI, kind of like a Cartesian foundation. It also aligns nicely with the rich "distributional semantics" work and popular vector space models.
A lot of research in RL is focused on intrinsic motivation and the question of whether we can bootstrap our own 'rewards' from our ability to predict and control the future according to some self-defined goals/hypotheses.
Edit: for an authoritative reference on biologically-plausible learning, see anything by Edmund Rolls. He explicitly stated in his recent book that something like back-propagation, or similar error-correction mechanisms, has no supporting evidence in the experimental data collected so far.
The paper argues that the real number system isn't robust enough to express the goal/purpose of more complicated, "abstract" tasks, and it speculates that a higher-order number system (such as the hyperreal or surreal numbers) would be able to achieve this. I currently disagree with this view: I see "intelligence" as we know it today more as an emergent phenomenon of complex systems of autonomous agents (in the case of human intelligence, the emergent phenomenon of neurons and other cells interacting with the external world), but that's a topic for another day.
>it speculates that a higher-order number system (such as the hyperreal or surreal numbers) would be able to achieve this
I didn't mean to give that impression, sorry if it came off that way. Rather, what I say is that those number systems don't suffer the particular flaw that the real numbers suffer. There might still be other flaws. That's why in the beginning of Section 4 I wrote: "There are at least two potential ways to change RL so as to make it applicable to such tasks and, thus, at least potentially capable of leading to AGI. Of course, there is no guarantee that removing the roadblock in this paper will cause RL to lead to AGI. There might be other roadblocks besides the inadequate reward number system"
Furthermore, infinite rewards are not compatible with human behavior. Humans never optimize for a single event at infinite expense w.r.t. other goals.
If that's correct, this may be just one of those many problems where the optimal solution is far harder than a near-optimal solution. Examples include integer linear programming and the traveling salesman problem, where a true optimum is NP-hard to find, but you can get very close with far less work.
I was going to criticize this paper as crankery in the vein of Penrose, but first I thought I'd just compute all possible ordinals in my brain to make sure I'm a general intelligence.
Rather, my paper is on the less ambitious question of whether the traditional RL model (with its real-valued rewards) accurately captures the full set of reward-giving environments an AGI should be capable of comprehending.
Hah. Still got it.
But I have often wondered - why are people hung up on linear ordering? Why not non-total partial orders?
Is this insistence on linear orders because of simplistic modeling on the part of cognitive science and AI people, or is there some problem with general partial orders?
Since you mentioned higher dimensional representations for rewards, I want to remind that the sub-fields of Inverse RL and Model-based RL are concerned with reward representation and prediction by neural nets.
Also, it doesn't seem like a good idea to try to disprove an entire field with a purely theoretical (a-priori) argument. There should be at least some consideration given to the state of the art in the field.
I mean, humans are general intelligences, and you can translate pretty much any human reward into money, which is a real number.
The paper is super long though. Maybe someone can give a TL;DR that makes sense.
To repeat an example I posted for someone else: Suppose there's something called a "superdollar". If you have a superdollar, you can use it to create an arbitrary number of dollars for yourself, any time you want, which you can trade for goods and services. If you want, you can also trade the superdollar itself. Now picture an environment with two buttons, one of which always rewards you one dollar, and the other of which always rewards you one superdollar. Shoe-horning this environment into traditional RL, you'd have to assign the superdollar button some finite reward, say a million. But then you would mislead the traditional-RL-agent into thinking a million dollars was as good as one superdollar, which clearly is not true.
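A minimal sketch of that shoe-horning (the scalar encoding and helper below are hypothetical, just to make the mismatch concrete):

    SUPERDOLLAR_AS_SCALAR = 1_000_000  # arbitrary finite stand-in for the superdollar

    def perceived_return(dollar_presses, superdollar_presses):
        """Total reward as a traditional, real-valued-reward RL agent sees it."""
        return dollar_presses * 1 + superdollar_presses * SUPERDOLLAR_AS_SCALAR

    # Under this encoding, 1,000,001 dollar-button presses beat one superdollar...
    print(perceived_return(1_000_001, 0) > perceived_return(0, 1))  # True
    # ...but in the true environment no finite pile of dollars beats a superdollar.

Whatever finite constant you pick for the superdollar, the same flip happens one press later.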
Actually I thought about this some more, and maybe money wasn't the best example, but I think there must be some internal measure of utility that humans use that can be represented by real numbers.
Imagine you are presented with an array of possible actions with associated (possibly estimated) rewards. You can only pick one. Maybe there are some doors but you can only open one - behind the first is $1m, behind the second is a superdollar, behind the third is a button that cures world hunger, behind the 4th is your loving family, whatever.
As a human I can pick one. No matter what the rewards are. Even if one reward is "you essentially become God". That means I can order them, and therefore that they can be represented by real numbers (plus infinity for the god option).
I don't see why the infinity would cause an issue: the "you can now do literally anything" reward is worth more than every other reward, but it's the only one. Also it doesn't actually exist so who cares?
Actually I guess it can exist in games, e.g. God mode in Quake. But that should have an infinite reward and agents should choose it over everything else so I can't see the problem really.
A lot of people have written quite a lot of arguments that this is false.
I think this is complete nonsense. Humans don't receive infinite rewards, either, but we still think about infinite numbers. We typically think about infinite numbers in terms of finite representations, like finite proofs about their properties. Reinforcement learning could in theory do the same thing.
The real reason that traditional reinforcement learning will not yield AGI is that the model is too limited. The current state of the art in reinforcement learning relies on representing the entire world and your entire strategy as a single vector. For most RL tasks we haven't even gotten deep networks working; RL is somewhat behind areas like vision or natural language processing.
The way forward isn't to worry about infinite numbers, it's to develop better architectures that let RL solve more problems.
Maybe you're right, but I don't think it's as obvious as you imply.
Consider this question. Can there exist hypothetical rewards x1,x2,x3,..., each one of which is significantly better than the previous, and at the same time another hypothetical reward y such that y is significantly better than each x_i?
If such rewards can exist, and if "significantly better" implies "at least +1 better", then that necessarily means y must be infinite.
Now, how would you argue that such rewards can't exist? If you say, "because then y would have to be infinite", then you're arguing in circles, assuming what you want to prove in the first place. It might well be that such rewards indeed can't exist, but if so, the argument would have to be different, and I suspect nontrivial.
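To spell the step out (a sketch of the arithmetic, under the reading that "significantly better" means "at least +1 better"):

    x_{i+1} >= x_i + 1   implies   x_i >= x_1 + (i - 1)   for every i
    y >= x_i + 1         implies   y   >= x_1 + i         for every i

So y would have to exceed x_1 + i for every i, which no real number can do.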
Human-equivalent minds can obviously be implemented on top of a framework that does not have inherently infinite or infinitely divisible values, because human minds are implemented on top of a substrate that uses a finite amount of discrete neurotransmitters to do everything, and they manage to work just fine.
If human minds can reason about infinite concepts using "hardware" where all the neural signals used to represent those concepts are finite, then why should it be impossible for a reinforcement learning system to learn to reason about infinite concepts while using only finite numbers to form that representation?
This is like saying computers can't represent infinity because they have only finitely many bytes.
Suppose the treasury rewarded you a "superdollar", which is a special object that allows you to create any number of dollars that you want, on demand, as many times as you want. How many dollars would you say this superdollar is worth? Obviously, no finite number of dollars would be worth that one superdollar. The human mind can certainly understand the relative value of a superdollar vs. any number of dollars. That the human mind is implemented through finite electrochemical processes is irrelevant.
In a similar manner, the total amount of reward that you might ever get is capped by the amount of pleasure your brain can perceive at any given moment (which is finite) times your lifespan (which is finite).
You might reason and hypothesise about agents perceiving infinite rewards (as we are doing now), but this has nothing to do with the reality of homo sapiens rewards system(s), or, in fact, the rewards system of any agent existing in our physical reality, which is effectively bounded both in time and space.
If you have an environment which gives rewards of this type, and you want the model which gets the highest reward, you are likely to have an issue if you try to represent the rewards using a single floating point number. (well, you could just use the first entry, and do decently well, but you would lose out a bit on what could be accomplished on the second number.)
Of course, because there are, in actuality, only finitely many floating point values of a given precision, you can actually give an enumeration of the values of this type in order, and if you use that enumeration for the rewards, then that could work.
However, when we use floating point numbers, we sometimes sorta-pretend that they don't have a finite range of actual-number-values. We sometimes sorta-pretend that they are the actual real numbers (with a little fuzziness and errors tacked on, when we are being careful). We use them in computing "derivatives" and such. And this works pretty well!
But if we wanted to use the enumeration of the pairs of floats, if we just treated them as the (bigint) integer index of the pair, we would lose all the nice interpretation that goes with floating point numbers, and be left with only the ordering. The notion of the relative distances between the different values would be lost. The whole "use the derivative of this function (except we are using floating point numbers)" trick stops being applicable.
While adding together any number of copies of (0.0, 1.0) would never surpass (1.0, -5.0), that's not really a reason to leave the preferences we want to represent unable to handle that situation.
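A minimal sketch of that pair-valued comparison, assuming we simply order the pairs lexicographically (Python's built-in tuple comparison already does this):

    total = (0.0, 0.0)
    for _ in range(1_000_000):                # pile up a million copies of (0.0, 1.0)
        total = (total[0] + 0.0, total[1] + 1.0)

    print(total)                              # (0.0, 1000000.0)
    print(total < (1.0, -5.0))                # True: still strictly below (1.0, -5.0)

The ordering is perfectly well defined; what's lost, as noted above, is the distance/derivative interpretation.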
People can and do use statements like "death must be avoided at all costs" (it's also mentioned in the paper), but that statement is simply hyperbole, not an exactly literally true description of reality.
It's not saying you need rewards that are infinite in value per se. It's saying that you might have more situations that need to be differentiated from each other than there are real numbers. For instance, you might need to have one set of rewards that map to the real numbers, and then another set of rewards that also map to the real numbers when compared to themselves, but are all considered strictly greater than the first set.
The author demonstrates that a system without such a capability would not be able to solve a certain set of problems. However, going from that claim to a claim that this capability is needed for human level AGI is a non-sequitur - there is no evidence that such a capability is needed for human-level intelligence, and there's no evidence (at least not mentioned in the paper) that humans have the exact capability described.
I'm skeptical about AGI anyway. The proof is unnecessary; I tend toward an even more reductionist view: our failure to understand where intelligence resides in the brain leads to our failure to model it. "Morally" claiming "it's alive" based on a lack of understanding is a bit like doing chemistry by alchemy. If you don't know why it works, you didn't really make it, any more than making a baby would have.
Feynman said it better: lots of physics proofs are built on partial models of the sub-atomic, which become unknowables too if you ask questions about them. It's turtles-all-the-way-down stuff.
Traditional reinforcement learning will probably not yield AGI, any more than any other current method, because we do not understand GI; and until we understand GI, I do not believe any connectionist or learning model will derive it.
I think this kind of machine intelligence is already within reach of current models, or we are close to being able to make an "e-cockroach".
I think we'll learn a lot of things just seconds after putting that kind of limited artificial intelligence out into the world.
And then the models, through sensors, will have the entire world within reach to begin self-improvement tasks.
And this would address the objection "we're just training the models with very limited data, how could they evolve faster than over millions of years?"
I think the FAANGs, the big players, realized this limitation a LONG time ago (10 years, maybe).
And they are already trying things to solve the limited-data issue, giving their models all the information they can extract from cellphones: the current iteration of a "massive network of sensors" to train BIG (multiple, almost hidden from the public) models.
If I had to bet, I'd bet all that information is being stored to be "replayed" when whole new, more advanced models eventually emerge.
In some ways this way of thinking is too meta. In order to be a good mathematician, it is not necessary to understand the nature of mathematics from an outside perspective. That view can be very useful, e.g. if you're working on foundations, but it doesn't mean that before being good at mathematics one must be good at understanding the nature of mathematics. That seems like the job of philosophers, not mathematicians. (Well, sometimes the sets intersect; e.g. Brouwer, Hilbert, Gödel, and Penrose (a theoretical physicist) wrote some works on the philosophy of mathematics.)
EDIT: To express this slightly more formally: in order to understand a theory of a model, you do not need to understand the model comprehensively. You can be an expert in intelligence science by studying the falsifiable and predictive theories of intelligence science without understanding the nature of intelligence itself.
I'm quite fond of a terrible mathematical joke which this reminded me of.
"I'm worried about my nephew, I was trying to teach him to add numbers, but he can't even pronounce zermelo fraenkel set theory, how's he ever going to learn it?!"
This goes hand in hand with a functional definition of knowledge, where we judge whether a system or an agent "really knows" something through measuring success or failure in a variety of scenarios where knowing that thing (and properly applying that knowledge) is necessary to make an effective decision. Or, quoting Forrest Gump, stupid is as stupid does.
Of course AGI does not need to be "animal-like intelligence".
You can use natural numbers to represent real numbers with arbitrary precision. You can also use natural numbers to represent non-Archimedean number systems.
In fact, your entire comment is just an ordered sequence of natural numbers, and it does not seem to be very misleading (though it tried to trick).
Besides, the article is supposed to be a rigorous mathematical proof that current reward representations in RL are unsuitable for AGI, but I haven't seen a definition of "misleading" there.
Well, obviously! That is basically the definition of cardinality!
Review my comment a few steps above about Big-O. The reasoning there shows that the real numbers, despite having the cardinality of the continuum, are inadequate for measuring even the following countable set of Big-O complexity classes: O(2^n) together with O(n), O(n^2), ..., O(n^i), ...
Is there something special about intelligence that implies we only need to consider RL-environments with real-valued rewards, and not, say, big-O-complexity-class rewards? Maybe there is. But if there is, the proof will have to be nontrivial---at a bare minimum, it should make some reference to what the reals are (e.g., the unique complete ordered field). This would be quite world-shattering, to real analysts if to no-one else. But none of the RL literature seems to make any reference to what the reals actually are, instead just taking it for granted that the reals are a magical number system as flexible as anybody could ever want. (Other authors essentially pointed this out before I did; see footnote 10 in my paper.)
Suppose the true environment has various tasks the agent can do which give big-O-complexity-valued rewards, one task giving a reward of O(2^n), and others giving rewards of O(n^i) for various i. For concreteness, say Task A rewards O(2^n), Task B rewards O(n^10000), and Task C rewards O(n^20000).
Now suppose you present this to the agent using real-valued rewards, say where O(n^i) is replaced by arctan(i) and O(2^n) is replaced by arctan(2)+pi, as you suggest. Then the agent will be deluded into thinking, e.g., that Task B and Task C give almost identical rewards (Task B gives reward 1.57069633 and Task C gives reward 1.57074633, which barely differ from each other at all). This is misleading because in the true environment, Task C gives much more reward than Task B. Yes, the agent understands Task C gives a bigger reward, but the agent totally misunderstands how much bigger :)
The agent might (or might not) find progressively more and more optimal solutions to the interpreted environment, but not to the true environment (unless by dumb luck), because it does not see what the true environment actually is. For example, in the concrete environment described above, you could suppose completing Task C takes more steps than Task B. Then the agent is likely to conclude, on the basis of the misleading arctan rewards, that Task C is not worth the extra steps. Depending on the extra steps, this conclusion could be badly false.
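For concreteness, a quick numeric check of that arctan encoding (the encoding itself is the one suggested above, not something from the paper):

    import math

    reward_task_b = math.atan(10_000)         # O(n^10000) -> ~1.57069633
    reward_task_c = math.atan(20_000)         # O(n^20000) -> ~1.57074633
    reward_task_a = math.atan(2) + math.pi    # O(2^n)     -> ~4.24874
    print(reward_task_c - reward_task_b)      # ~5e-05: Tasks B and C look nearly identical
    print(reward_task_a - reward_task_c)      # ~2.678: only the exponential task stands apart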