I worry that the "stochastic parrot" label was premature, an idea sown early in the technology's development that will now be carried along through any future advances.
Basically there is this innate idea that if the basic building blocks are simple systems with deterministic behavior, then the greater system can never be more than that. I've seen this in spades within the AI community: "It's just matrix multiplication! It's not capable of thinking or feeling!"
Which to me always felt more like a hopeful statement than a factual one. These guys have no idea what consciousness is (nobody does), nor do they have any reference point for what exactly "thinking" or "feeling" is. They can't prove I'm not a stochastic parrot any more than they can prove whatever cutting-edge LLM isn't.
So while yes, present LLMs likely are just stochastic parrots, the same technology scaled up might bring us a model that there actually is "something it is like to be", and we'll have everyone treating it with reckless carelessness because "it's just a stochastic parrot".
"These guys have no idea what consciousness is (nobody does)"
Where do people get off saying no one has any idea what consciousness is? I agree that there is a significant sliver of a philosophical problem which remains stubborn (how precisely does physical activity produce qualia), but neuroscience knows quite a bit about what physical processes underlie our behavior from the behavior of individual neurons to the activity of the entire brain.
I object to the wholesale dismissal of neuroscience because thinking about the brain relative to LLMs is genuinely informative about what sorts of things you could expect to be going on in an LLM. And, to my mind, a real appraisal of the differences between brains and LLMs makes the case pretty strongly that LLMs experience nothing and are, furthermore, fairly well characterized as stochastic parrots.
"They can't prove I'm not a stochastic parrot anymore than they can prove whatever cutting edge LLM isn't." Prove is a very strong word, but I think its actually quite possible to demonstrate via scientific observation that you differ in many, significant, and relevant to the question of "being a stochastic parrot", ways, from LLMs. It astounds me that people routinely suggest that human brains and LLMs are somehow indistinguishable.
> I agree that there is a significant sliver of a philosophical problem which remains stubborn (how precisely does physical activity produce qualia)
But that IS the definition of consciousness! This is like saying "We understand practically everything about airplanes, except how they stay in the air."
Nothing discussed in neuroscience is relevant to understanding what consciousness IS (which is the question posed above). Finding out that stimulating such and such a region makes us sad, or that this bundle of nerves activates before we're consciously aware of a decision doesn't tell us anything about consciousness itself. We've known for hundreds of years that there is a relationship between the brain and consciousness, finding out more details doesn't answer the question.
(Now, whether consciousness is necessary for AGI is a separate question.)
> This is like saying "We understand practically everything about airplanes, except how they stay in the air."
Even that is a sufficient level of understanding to correctly determine that a motorcycle is not an airplane.
While we might not have a complete picture of what consciousness entails, we can at least list some necessary conditions for it to arise. Any system that lacks those conditions can at least be proven to not be conscious.
With LLMs specifically, I think there is a very strong argument that they are not and cannot be conscious at all, regardless of how big of a corpus you throw at it or how many parameters it has. Emily Bender explains it well here:
> Even that is a sufficient level of understanding to correctly determine that a motorcycle is not an airplane.
But are we talking about airplanes, or are we talking about "flying"? Airplanes fly, motorcycles don't. Do hot air balloons fly? Or is floating not flying?
Neuroscience tells us about human and similar consciousness. Maybe all biological consciousness, but maybe not even that. Are we sure we're not exploring a subset of consciousness, though, and other variations exist that we're unaware of and will catch us off guard because we haven't encountered them (or recognized them when we have)?
I think that's the important question here, and it goes beyond LLMs, because whether or not they can achieve consciousness says nothing about whether something else will follow the same path.
> Are we sure we're not exploring a subset of consciousness, though, and other variations exist that we're unaware of and will catch us off guard because we haven't encountered them (or recognized them when we have)?
I don't know of any general principle one could use to determine if system X has or doesn't have property Y if you don't at least have some definition of Y.
I believe it is eminently credible to draw a strong association between consciousness and the physical activity of the brain, since it is relatively well backed up by scientific observation that there is a one-to-one correspondence between conscious experience and brain activity. Although we still don't understand precisely how the physical activity creates qualia, I think it's perfectly reasonable to say that studying and understanding brain activity constitutes studying and understanding consciousness.
We don't understand consciousness as it pertains to the underlying question; knowing that brain activity can produce consciousness does not get us any closer to knowing that a matrix multiplication can't.
True, but I don't dispute that in principle a lot of matrix multiplication could produce consciousness. I just suggest that an honest appraisal of brains and LLMs suggests little to no consciousness on the part of the latter.
If we use your metric, then an honest appraisal of brains and computers suggests little to no mathematical ability on the part of the latter either. If we assume that a similar medium or structure is necessary for similar results, then it should be highly improbable that a bunch of semiconductors could ever perform even simple math, since they are very structurally dissimilar to the human brain.
Only if you insist on thinking of brains and ICs as magical mysterious objects about which nothing can be said. We understand how both of these objects work to one degree or another. My point is that it is precisely the understanding of both phenomena which suggests that LLMs are not conscious or, arguably, intelligent.
The "to one degree or another" is doing all the work in this argument. Does my knowledge of how a full adder works now grant me the ability to discern malware at a glance? Similarly, should we start dismissing psychologists because your average neurologist can just cure depression and other mental issues? Should we do the same with sociologists or economists? Maybe even neurologists could be replaced by physicists or mathematicians.
Or maybe the abstract, high-level understanding of the brain provided by psychology is enough to explain its dynamic behavior? Maybe I can become an expert in IC design by learning React?
We know how neural nets work on a fundamental level and we know how they work on multiple levels of higher abstraction, yet explainability is one of the biggest problems in machine learning right now. These models can solve complex problems which computer scientists long struggled to develop algorithms for, even though every aspect of them, except the emergent behaviors that arise from complex interactions, is known to us.
The issue is that consciousness is a strongly emergent property - touching every level of abstraction and comprising patterns from the specific to the general. Knowledge of how a system works on the ground level or how it works on some coarse levels of abstraction, does not allow you to classify it as conscious or unconscious.
Additionally, consciousness is ill-defined. There is no agreed-upon definition that is free of contradictions, does not accidentally include systems that we would not see as conscious, and does not accidentally exclude a significant portion of humanity.
I invite you to think up some properties of the human brain that you would classify as essential for consciousness to emerge, and then try to think up exceptions. I'm very confident that you can come up with at least one for every single property.
Yes, yes, yes, but pretending that we know nothing at all about consciousness or that nothing can be said about the likelihood that an LLM has it or other properties is absurd in the extreme. There are genuine limits to stupidity.
Every account of the universe is grounded in brute facts, for which there is no justification. That hardly means we can't claim to understand things. I would say we understand consciousness more or less in the same way we understand nuclear physics: we have a very compelling ontologically flavored justificatory framework which both allows prediction and makes theoretical sense. We know quantum field theory is not the right theory of the universe. We may never know the fundamental theory. But it would be ridiculous to say "we know nothing about how the nucleus works."
Ah. I use the term "the subjective experience of consciousness". That is, what you are experiencing could really just be a VR drug hallucination, completely unrelated to anything else, or an epiphenomenon of a completely mechanistic universe.
> Where do people get off saying no one has any idea what consciousness is?
I'm not "getting off" saying that, but I do say it often.
For me, it's important to know:
If we think an artificial neural network can have consciousness and we're wrong, then there is a risk of all the people who want to have their minds uploaded having a continued existence no better than that of TV stars reproduced on VHS tape. There is also a risk of this being done as a standard treatment for lesser injuries, especially if it's cheaper.
If we think an artificial neural network can't have consciousness and we're wrong, then there is a risk of creating a new slave class that makes real the fears of the Haitian slaves in the form of the Vodou concept of a zombie — not even death will free them from eternal slavery.
Well, that is an entirely different question. My personal view is that nothing prevents an artificial neural network from having a consciousness (I don't think it makes sense to believe there is anything magical about human brains).
What I am saying is that we emphatically know things about the physical processes that (almost certainly) generate consciousness and that we should take that knowledge seriously when examining artificial neural networks. People eager to attribute more to these networks than they plausibly constitute love to dismiss all this knowledge so as to muddy the waters of comparison.
> What I am saying is that we emphatically know things about the physical processes that (almost certainly) generate consciousness
I'm prepared to believe that people who aren't me know such things, but last time I asked a PhD in brain research about this (a while ago now), they seemed to disagree.
At least, assuming we're talking about the same usage of the word "consciousness" here — when it's defined as "opposite of unconscious" then sure we have drugs to turn that off, and also separately with the non-overlapping definition of "opposite of autonomous or reflexive"…
…but the weird thing where I have an experience rather than just producing responses to stimuli? If anyone knows about that, my search engine bubble hides it from me.
Came across an intriguing paper recently. It postulated that consciousness might emerge when an organism can form a predictive simulation of both its body and surroundings, an approach akin to model-based reinforcement learning (RL). This is distinctly different from merely reacting to the environment, a characteristic of model-free RL.
> What insects can tell us about the origins of consciousness
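To make the distinction concrete, here's a minimal toy sketch in Python (purely illustrative, not the paper's model): a model-free agent maps the current state directly to an action, while a model-based agent rolls out an internal simulation of the dynamics before acting.

```python
# Toy 1-D "walk to the goal" world; everything here is an illustrative placeholder.
GOAL = 3

def step(state, action):                      # the environment's true dynamics
    nxt = state + action
    return nxt, (1.0 if nxt == GOAL else 0.0)

def model_free_act(state):
    # Reactive: map the current observation to an action, no lookahead.
    return 1 if state < GOAL else -1

def model_based_act(state, horizon=3):
    # Imagine futures with an internal copy of the dynamics (a "world model"),
    # then take the first action of the best imagined rollout.
    def rollout(s, a, depth):
        s, r = step(s, a)                     # here the internal model happens to be exact
        if depth == 0:
            return r
        return r + max(rollout(s, a2, depth - 1) for a2 in (-1, 1))
    return max((-1, 1), key=lambda a: rollout(state, a, horizon))

print(model_free_act(0), model_based_act(0))  # both head toward the goal in this toy case
```

The interesting part, per the paper's framing, is that only the second agent carries a predictive simulation of itself and its surroundings rather than a bare stimulus-response mapping.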
> I agree that there is a significant sliver of a philosophical problem which remains stubborn (how precisely does physical activity produce qualia)
But that "sliver" of a problem is known as the hard problem of consciousness for a reason [1], which is exactly the sort of problem neuroscience can only address in a limited capacity. Understanding how nerves propagate a signal to produce a sensory input (an "easy" problem of consciousness) doesn't inform us as to why certain physical mechanisms result in conscious experience (or more fundamentally what it even means to have a conscious experience).
To return to the topic at hand, a stochastic parrot generates grammatical, sensible language without understanding its underlying meaning. Of course, you can debate what it means to understand something; but for a person to vocalize an idea they understand, they must first somehow consciously process that idea. This is firmly a hard problem to which neuroscience offers limited guidance.
Of course, I'd agree that human beings aren't stochastic parrots -- if human beings were stochastic parrots, then what would it even mean to understand something? But I doubt you could use neuroscience to ascertain whether large language models are or aren't stochastic parrots. Indeed, depending on your definition of "understanding", consciousness might not even be a prerequisite, making the comparison to neuroscience moot.
Unfortunately, this is the thread that all these arguments begin: demonstrative ignorance of either the science, the research, or both. They originate in ignorance and because of ignorance, the response is terrible fear and prophecies of the End Times. It's no different than any other brand of fear.
There is no evidence that processing power = mind. None. There is no evidence that the human condition is in any way related to some kind of terra firma of logic. In fact, there's considerable evidence that feelings are so entangled in the experience of humanness that the idea of divorce or separation is a false one. "Being human" is primarily a feeling experience that drives narratives and motivations: it underlies every single activity we engage in.
This is why people like Eliezer Yudkowsky and his ilk are so totally off the mark: it's no coincidence that the Less Wrong community and AI doomsayers can often be found on the same side of the aisle. Both camps believe in and idealize a distinct logic mind that can be attained. Funnily enough, it's still fear, a very human feeling, that is the basis for all these proclamations.
My worry is this camp garners enough influence to convince someone an AI doomsday is right around the corner unless immediate action is taken.
Buddhism says that this isn’t at all what being a human is. It says that the feelings and narratives are an aspect of the human mind, but not core to it. My conception is that they are like feedback sources/optimizers and constraint engines. The worries, narratives, voices, and chattering feel self-important and trick us into believing they are who we are.

However, a key aspect of vipassana and related mindfulness is building the ability to be aware of those processes from a part of us that has no voice and no experienced feelings, but is able to be aware of them independently and control them. This source is where our agency derives from, and it does not have to be driven by feelings or the narratives; it is in fact able to suspend them and simply exist as it is. This is what is known as nirvana, which has acquired weird mystical meanings in our culture, but essentially means attaining and maintaining a mind that has fully subordinated the “self”-driving narrative and emotional soup.

The loudness of feelings and our chattering monkey-mind self support themselves internally as being “who we are,” but, again my own conception, their importance is an illusion they create internally. All this said, they are certainly a part of what makes us human, in so far as a foot makes us human. But you can function and live a full and complete life in the state of nirvana without losing anything.
In fact, in my 30-year practice, at one point I was scared to bring the practice into my daily lived life, fearing that being uncompelled by these processes and having a clear mind would make me a robot or something - but the opposite was true. At some core level I knew my experiences and connections deeper than a feeling, and the people around me felt I was finally with them for the first time.
My point here is that the western conception of what it means to be a human is not particularly simple and it’s not the case, assuming thousands of years of Buddhist practice isn’t a crock, that our feelings and thoughts are the core of what it is to be human. Further - if they are illusions and feedback systems, they can be simulated as constraining feedback systems in an artificial mind just as easily.
I think the nature of what is human is much deeper in our minds, but because it’s not easy to examine like feelings and thoughts, I think we really do not understand it very well. This leads me to my long-labored point - I agree with the original poster that we don’t understand consciousness. I believe we overestimate our understanding of what it means to be human. I do not, however, think our machines will achieve it either. But I don’t know why we need to make an artificial human. AI means intelligence, not human. A natural human takes 9 months and we have too many of them; let’s try for something different.
Meditation is useless for gainful activity. Maybe a way to relax and have artistic experiences, but not a goal worth putting much energy into. Better go do something, meet people, in general don't abstain from action. Spoken from personal experience, I have seen lives wasted with "spiritual" inaction.
Meditation in Buddhism is practice for living life in the present. It in itself is nothing, although it can be a pleasant experience and that’s nothing to dismiss. There are a ton of different uses of meditation and I won’t diminish any of them. However, vipassana meditation is NOT meant to be an end or a goal, nor is it meant to be an excuse to avoid the world. It’s practicing building awareness of your thoughts, feelings, and physical state without becoming entangled and lost in them as we spend most of our lives. In fact, bridging the practice into day-to-day life is the ultimate goal.

Buddhism definitely does not teach avoidance of the world, in any of its experiences good or bad. Quite the exact opposite - it teaches you to be entirely present in what is actually happening in the present. Finally, Buddhism emphasizes a moderate path in everything. Meditating all the time is not good. Neither is constant activity. If you find yourself getting so attached to the pleasures you feel from meditation, you’ve entirely missed the point. It’s a tool, a method, a practice of isolating the inner mind, the awareness, from being enmeshed in illusions such as the chatter of your mind or the feelings of anxiety or depression or the pain of your cancer.
Likewise, the "nobody knows what consciousness is" mindset is smuggling in the idea that in order to know something at all, you must know it comprehensively. I know exactly what consciousness is based on my experience with it, even though I could not possibly give a comprehensive account of everything consciousness entails.
By analogy, I've been married for just shy of 20 years. I know my wife very well. I certainly do not know everything there is to know about her, but I do know her.
Scientific study of consciousness comes from ignoring our subjective impressions of our own consciousnesses, which might be illusory, and going only by what can be seen by other people. So you have experiments doing things like showing subjects subliminal images trying to probe the boundaries between conscious experience and unconscious experience. You start with results like "If we show this image to subjects for 50 ms it only has a slight effect on their behavior which fades out after a second, but if we show it to them for 60 ms it has a large effect for the rest of the experiment including being able to talk about it" and then you keep going from there.
This is kind of a stretch of an argument though. We could say the same about physics and any other sciences - everything is an abstraction at some level, but if this abstraction is reproduced by multiple independent types of measurement and is falsifiable, that is what we call scientific. I don't think LLMs pass this test.
I certainly wouldn't say that a traditional LLM is conscious. Once an input falls off of an LLM's input buffer it ceases to have any effect on its output, in the exact same way that a subliminal stimulus's effects are limited in a human brain. The size, in bytes, of an LLM's input buffer isn't all that far off from a human brain's input activations either. So strictly feed-forward neural networks aren't conscious in this sense, but it's easy to imagine that architectural changes might provide an analogue to what a human's consciousness provides.
> I know exactly what consciousness is based on my experience with it, even though I could not possibly give a comprehensive account of everything consciousness entails.
Not sure if personal experiences count. Generally, we laugh at people who talk about esoteric experiences.
So a simple explanation could be that consciousness is an illusion?
Or put differently, is there any phenomenon that needs the assumption of consciousness?
The way I experience myself could be just the history of experiences. So there is something that the brain can refer to.
So "the illusion" is like a GPT or StableDiffusion model making stuff up based on conditioning. Nothing mysterious, we have AI that can do that. The same simulator predicts not just how the world will evolve, but also actions and their estimated rewards. It's an imagination based planning system.
Bringing all perceptions together into the simulation, integrating them into the same reference system, and using them to imagine, plan, act and learn - that could be consciousness.
Yes, but why would such a system experience anything? I'm a dyed-in-the-wool monist materialist, but I can acknowledge that there is something tricky (at the very least) going on here.
We create representations from our sensorial data and values from our reward signals. We are part of a larger system and our actions are filtered and rewarded by the environment. We feed on this system to learn, basically we train on it.
> I know exactly what consciousness is based on my experience with it, even though I could not possibly give a comprehensive account of everything consciousness entails.
What is that kind of knowledge worth? You don't even know where the border is between your knowledge and your absence of knowledge. How much can you say about consciousness without stepping into the "absence of knowledge" territory? Practically nothing, right?
> By analogy, I've been married for just shy of 20 years. I know my wife very well. I certainly do not know everything there is to know about her, but I do know her.
I have a better example. I have spoken English for 20 years, if you start counting from my first English lesson, when I learned my first English word. You can find plenty of silly mistakes in my comments. But at least I know what I can express or understand and what I cannot.
Here's a thought experiment: what are the qualities of a parrot that would make its consciousness different than that of a stochastic parrot? For that matter, what are the qualities that separate a parrot's consciousness from a human's? Or a human's from a pig's? Given the way we treat pigs, we clearly don't think consciousness in and of itself is worthy of any formal consideration, as long as the benefit we derive from exploiting it is high enough.
So, with AI, does anyone really care whether it will develop qualities that make it seem as though it is an emergent consciousness? Why would we treat digital consciousness any better than we treat organic consciousness? What is the point of pontificating about whether or not the type of thinking an AI does crosses an arbitrary threshold when that threshold only exists as a tool for creating useful outgroups?
However sophisticated the thing that our thinking is, it exists on a scale and we sit at an arbitrary spot. We treat thinking that occurs further down the scale as functionally irrelevant not because of any real distinction but because doing so has a high utility for our species.
So, the question of how we will treat a "truly conscious and sentient" AI has already been answered. Look at how we treat pigs. Good luck out there, HAL.
Thank you for this. I am not well up on consciousness (or machine learning), and I have seen chatbots/LLMs hallucinate and such, and I have also seen them do amazing things. I have wondered to myself a few times lately: how do I know that I'm dissimilar in nature from these things?
So I ask you a follow-up question: what are some easy-to-understand ways in which a human's thought process would differ from an LLM's behavior?
There are a lot. First of all, these LLMs do not learn in-situ. They are entirely static (apart from the prompt). To teach an LLM something new is an ex-situ process, more or less totally unrelated to the way it predicts. Contrast that with a brain: brains are constantly learning (in fact, it is difficult to imagine how a brain as we understand it could work without constantly learning).
In a related way, because we learn on-line and constantly, our brains have to also maintain goals, rewards and punishments, etc etc. We have neurons for all of the trivia of keeping us moving, seeking new input, generalizing it, throwing away bad information, etc. For an LLM all of that is external. The LLM doesn't have any reason to even distinguish between generation and training. All the weight updates are calculated by a (relatively simple) external process. Furthermore, LLMs are entirely _feed forward_. The input comes in, a lot of numbers are crunched, and then output comes out. There is no rumination (again, the analogy for rumination in an LLM is in the training process, which is not embodied in the LLM).
Much of the content of our consciousness is perceptions relating to all of these things. I think it's possible that artificial neural networks may one day do enough of these things that I would admit they are conscious, but architecturally and fundamentally, I don't see any reason that an LLM would have them.
I also don't think even GPT-4 is that intelligent (fantastic recall, though). It does an impression of a cognitive process (literally by printing out steps) but that doesn't seem compelling enough for me to imagine a theory of mind underneath. A model of text, sure, but not a mind.
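To make the static-weights, feed-forward point concrete, here is a minimal sketch (assuming the Hugging Face transformers library and the small GPT-2 checkpoint, purely for illustration): generation is just a loop of forward passes over a frozen model, with nothing updated along the way.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()   # weights are frozen here

ids = tok("The parrot said", return_tensors="pt").input_ids
with torch.no_grad():                              # no gradients, no in-situ learning
    for _ in range(20):                            # one feed-forward pass per new token
        logits = model(ids).logits[:, -1]          # scores for the next token only
        next_id = logits.argmax(-1, keepdim=True)  # greedy pick, for simplicity
        ids = torch.cat([ids, next_id], dim=-1)    # the only "memory" is this token buffer
print(tok.decode(ids[0]))
```

All the learning lives in the training run that produced those weights; nothing in this loop changes the model, which is exactly the contrast with a constantly-learning brain drawn above.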
Really enjoyed this response, and feel like I've developed a better understanding of some of the concepts relating to generative ML as it is used in LLMs.
An aside: I took a course on ML at a university a few years back, and it was interesting (it was an intro and survey course offered by the CompSci faculty), but difficult for me. I excelled at implementing models in Python using Keras/TF, and I had fun manually implementing a gradient descent algorithm, but a lot of the math, including all of the multi-var calc, stats, and probability, was quite difficult for me to wrap my head around, and I really didn't feel like I got a solid grounding on a meta-level of what we were doing or why. I have been reading a bit about LLMs, and I think your post has filled in some of the gaps in what at this point I was really looking to understand.
This is a very interesting point that requires some examples and further elaboration to have value for the readers. It refutes but doesn't provide arguments. Can you please elaborate?
I mean… you experience stuff, right? "Qualia" is the word for that. We can tweak the definition, but I think it's pretty obvious that "subjective, conscious experience" [1] does in fact exist.
I have some basic observational evidence that it does exist and presumably you do as well. I do suspect that we are thinking about it in some fundamentally wrong way, but I reserve judgment.
I think that some LLMs (mainly just GPT-4) should be considered as refutations to the Stochastic Parrot idea, which was published in March 2021 and claims no LLM can have "any model of the world". It was a reasonable (though perhaps overconfident) paper for authors who had only used GPT-3 to publish, but there is now ample evidence of world modeling, including published academic evidence, for GPT-4. I think the following claim from the paper is also deeply incorrect and confused:
> an LM is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot.
Next token prediction is a function of tokens/words, but that doesn't preclude that prediction depending on meaning, and the best predictions obviously do depend on meaning. It is not clear, at least to me, that next token prediction leads to any kind of upper bound on intelligence. It is always possible to incorporate more of the descriptions of the world obtained through the training data into your predictions to improve them.
But I think you've missed an important distinction. The stochastic parrot claim can be false, not because LLMs can or will ever feel or be conscious, but because they can (today) reason and solve novel problems (the capability is there, but it is unreliable). LLMs are not probabilistically regurgitating their training sets; they're applying the learning they took away from those training sets.
I think GPT-4 can reason today, but I don't think it can feel or is conscious, and I don't expect it to be capable of those things in its current architecture.
Agreed. I think the stochastic parrot concept is useful to ground our expectations of LLMs for now, but it could outlive its usefulness if there end up being multiple jumps in sophistication similar to that of GPT-2 to 4 in the next 10 years.
If that happens, then stochastic parrot as an argument for why a machine isn't thinking can be made pretty useless if one chooses to drag the argument further into philosophy.
I disagree. The criticism is _not_ that basic building blocks cannot be combined to produce something richer. The issue is the "without any reference to meaning" part of the quoted definition from Bender in that article. Models which are _only_ trained on text do not have a grounding to relate linguistic forms to anything else. When you know what an apple is, it's in part because you've seen and touched and tasted and eaten one. The model only knows how people talk about apples, and which texts are plausible, but not which ones are true.
But we're already getting past this with multi-modal models! Some really great work is being done which ties language processing with visual perception and in some cases robot action planning. A model can know how we talk about apples, can see where an apple is in a scene, can navigate to and retrieve an apple, etc. This lets us get at truth ("Is the claim 'the apple is on the book' true of this scene?") in a way which text-only models fundamentally cannot have. The point is, the way you get past the "stochastic parrot" phase requires qualitative structural changes to incorporate different kinds of information -- not just scaling up text-only models.
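As a rough illustration of that kind of grounding (a hedged sketch, not the actual systems being described): a contrastive vision-language model such as CLIP can at least score a textual claim against an image, something a text-only model has no way to do. The image path and claims below are made up, and this is only a crude similarity check rather than real truth evaluation.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("scene.jpg")                    # placeholder photo of the scene
claims = ["the apple is on the book", "the apple is under the table"]

inputs = processor(text=claims, images=image, return_tensors="pt", padding=True)
scores = model(**inputs).logits_per_image.softmax(dim=-1)[0]   # image-text similarity
print(dict(zip(claims, scores.tolist())))          # which claim fits the scene better
```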
> They can't prove I'm not a stochastic parrot any more than they can prove whatever cutting-edge LLM isn't.
I can't prove you're not a stochastic parrot by only talking to you via text. But in person I can toss you an object and you can catch it which shows that you understand how to interact with a dynamic 3D environment. I can ask you a question about something in our shared environment, and you can give an answer which is _true_, rather than which is a plausible-sounding sentence. This is the difference between knowing what English texts or English conversations look like, versus knowing what states of the world are referred to by statements.
By your definition, is a blind person capable of reasoning about visual data? Is a deaf person capable of reasoning about auditory data? Can a physicist understand the molecules, atoms, & subatomic particles which he or she can only interact with via a fundamentally textual theory? I would submit that there's no fundamental reason why an LLM needs access to more than text to derive human-level world models.
I'm not saying that the current LLMs have derived human-level world models (they haven't). It's just that, to me, the claim that textual data is categorically not enough to do so is necessarily an empirical one. To back up the assertion, you'd need to construct metrics on which present text-only LLMs fail, and then show that multi-modal LLMs succeed on those same metrics. So far, I don't think adding multi-modality to LLMs actually has improved their general-purpose reasoning ability, which I consider evidence against this theory. But then I read people online just asserting it as though it's an obvious truth derivable from philosophical first principles. It's odd to me.
> I disagree. The criticism is _not_ that basic building blocks cannot be combined to produce something richer. The issue is the "without any reference to meaning" part of the quoted definition from Bender in that article.
Thanks for pointing to that! I'm weirded out b/c this article from Bender in late May seemed so familiar. Here's a conversation from Feb in which a very similar argument is made, also using Thai text as an example:
https://news.ycombinator.com/item?id=34732971
You can say the same about humans: we only experience an approximation of the real world via our senses, never the “real thing”, so can we “truly understand” it? Yes, in the sense that we can reason about it and make and test predictions about the parts we can understand. The world we experience is based on our senses, and that’s what we understand. An LLM’s world is text, and there’s no reason to say it “truly understands” the concepts that it’s using any less than humans do.
Stochastic parrots have nothing directly to do with consciousness. You might consider reading the paper or at least the definition on Wikipedia more carefully.
As far as your statement regarding consciousness goes, it's glib to say that no one has any reference point for what consciousness, thinking, or feeling are. We all have our own lived experience to draw on for intuition and guidance to inform our thinking, which is invaluable. We can relate our qualitative perception of these phenomena to other things in the world, where a reasonable person can form the hypothesis that "matrix multiplication" is unlikely to be conscious, to think, or to feel, by dint of its being an abstract mathematical concept, since there is no precedent for an abstract mathematical concept exhibiting any of these qualities. Indeed, the only things in our lived experience which can plausibly be said to be conscious, to think, or to feel are biological organisms, which a computer is not.
In a sense LLMs are a Rorschach test for people's beliefs about consciousness. If you believe consciousness to be an emergent phenomenon derived from simple deterministic biological processes, then it is not a big leap to believe LLMs to be on a roadmap to consciousness. If you instead believe consciousness is a supernatural phenomenon, then you will discard the very idea of a computer having a consciousness, because a machine could never be imbued with one by mere algorithms.
Tell me your view on the ability of LLMs to become AGI, and I'll tell you whether you believe in an immortal soul.
Roger Penrose wrote about this decades ago, arguing that computers cannot contain consciousness and neither can anything in currently understood physics.
It's weird seeing comments like this that argue simultaneously:
1) LLMs aren't stochastic parrots anymore!
2) You can't prove humans aren't stochastic parrots!
It's pretty clear the whole point is to minimize the difference between us and AI, but it does feel like you are undermining your argument by trying to work it from both sides. It reminds me of someone accused of a crime who says both "I didn't do it!" and "If I did it, it wasn't wrong!".
Humans aren't stochastic parrots. You can't "prove" this because it's not a mathematical fact, but there is plenty of evidence from studying how the brain works to show this. Hell, it's even readily apparent from introspection if you'd bother to check. LLMs, on the other hand, basically are stochastic parrots because they are just autoregressive token predictors. They might become less so due to architectural changes made by the companies working on them, but it isn't going to just creep up on us like some goddamn emergence boogeyman.
Unless it is able to feel pain it remains a stochastic parrot and I wouldn't call it conscious or alive in any philosophical sense nor can one say it is capable of "feeling".
Frankly speaking, I think this is more on us for thinking that we are special or that living forms are special. This is just our ego talking.
Under the view that we are all just complexity arising out of an unfathomably large universe, then we can accept that LLMs are just that, like us, but weaker, and that is fine.
They will improve, we can leverage them, we can live with them. It's almost as if we have created a new species that exists only abstractly; and arises out of silicon and electrons.
> Basically there is this innate idea that if the basic building blocks are simple systems with deterministic behavior, then the greater system can never be more than that. I've seen this in spades within the AI community
I'm very surprised by this, because in essence, it's a flat-out denial of the emergence concept, no different from denying that atoms can ultimately lead to biological entities.
There is also a problem to me that "stochastic parrot" is too clever. It is too good of a name and evokes such a strong mental image. It is a great name for branding purposes, but because of that it is a terrible name if we are actually trying to discover truth. It can't help but become a blunt, unthinking intellectual weapon and rhetorical device.
I have noticed people look at animals or machines and try to figure out if they are as complex as humans in order to figure out if they are conscious or not.
But you almost never see the logic applied the other way around: maybe we are just a bunch of simple mechanisms convinced we are something way more complex.
There are quite a few hints that the second option is the actual reality.
The future is made by those that look past what is and see what might be and fail half way to achieving it. The rest are mired in their attachments and will never escape a murky prison of what today appears to be.
>"stochastic parrot" is a term coined by Emily M. Bender in the 2021 artificial intelligence research paper "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?"
This might be the first time the term was seen in an ’official’ context, but is it really the origin? It feels like the term has been hovering around for longer, and even Google Trends shows significant search trends way before 2021
I see only a few peaks starting in 2008 that are in the double digit numbers. Could these just be queries containing the words "stochastic" and "parrot"?
For instance, there's this ecology paper from 2014: Influence of stochastic processes and catastrophic events on the reproductive dynamics of the endangered Maroon‐fronted Parrot Rhynchopsitta terrisi
Not sure what happens under the hood, but it wouldn't surprise me if people searching for this paper would show up under "stochastic parrot" in google trends even if that's not what they literally searched for.
I feel this way too. Maybe there's some very similar term that we're both thinking of that's just on the tip of the tongue, because I can't find what it'd be.
Most people have never actually dealt with something like modern LLMs, so we haven't really developed the proper language to describe them and how they behave. It's either too simplistic and reductive (stochastic parrot, xerox machine) or presupposes sentience and intent ("fabricates", "hallucinates", etc.)
Also "a blurry jpeg of the internet". LOL, all we need is "a series of tubes, not a truck" and we're set.
I think we are focusing on the model too much and missing the real hero: language. The corpus of text these models are trained on is a marvel of human creativity. This cultural artefact is the diff between primitive and modern humans. And it is the diff between a random initialisation and a trained GPT-4. Maybe the brain or the model doesn't matter as much as what you train them on.
Even more, language is special. Ideas are self replicators, they have a lifecycle, they have evolutionary pressure to improve. Ideas travel a lot. No single human can recreate this knowledge, it is the result of massive search. I'd say more than 99% of human intelligence is based on applying ideas invented by someone else. So let's be more lenient on the parroting accusations. AIs can be smart if they get feedback, like AlphaZero, but without feedback they of course have to parrot.
I don't think this coinage is something to be proud of, given that the stochastic parrot analysis is now rejected by most top AI researchers, like Geoffrey Hinton or Andrew Ng. Even LLM skeptic Yann LeCun says LLMs have some level of understanding:
As far as I know, none of these 3 work specifically in NLP, most of their work is in image processing and to the best of my knowledge none of them have any background in linguistics.
Well, they are absolutely top AI researchers, so their opinion should count for a lot. If you specifically ask for people working on LLMs: Paul Christiano invented RLHF when he worked at OpenAI, and I'm pretty sure he also rejects the stochastic parrot analysis.
> They go on to note that because of these limitations, a learning machine might produce results which are "dangerously wrong"
I was initially thinking "well, yes, Nobel Prize for Stating the Obvious there", but looks like the paper was written in the far distant past of 2021, when LLMs were largely still in their babbling obvious nonsense stage, rather than the current state of the art, where they babble dangerously convincing nonsense, so, well, fair enough I suppose.
Amazing how fast progress has been there, though it's progress in an arguably rather worrying direction, of course.
Not to reduce the value of the insight, but since she coauthored the paper with Google employees she probably had access to models more advanced than those which were available to the general public
At that point, OpenAI was still fairly clearly at the babbling obvious nonsense phase; I do wonder whether Google's stuff was much better.
I also wonder if the original authors would have been surprised to learn that, by 2023, lawyers would be citing fake precedent made up by a machine. The progression to "dangerous nonsense" really does seem to have been worryingly fast.
I was really impressed with the work that Noam Shazeer was doing at Google before he left (I worked on TPUs and frequently had to debug problems at scale for researchers). It was clear he was making some pretty impressive improvements, but the results weren't super obvious even to most people inside google, and they didn't translate to externally visible projects.
This isn't that dissimilar to working at any sufficiently advanced R&D outfit, which strongly demonstrates the principle "the future is already here but isn't evenly distributed".
LLMs are not stochastic though, they are deterministic and don't even require random numbers, right?
The term in general seems to be unfortunate because the models seem to do more than parroting. LLMs are more like the central pattern generators of the nervous system, able to flexibly create well-coordinated patterns when guided appropriately.
My understanding is the opposite. The entire process results in a "score" over all output tokens, which is then converted into a probability of being picked, using a softmax that takes a temperature as a parameter. With a temperature of zero, the "best" token is always picked, but interestingly enough, that does not give optimal results. So sometimes you want the second or even third best. Thus, a "good" (GPT-like) LLM is intrinsically random.
To put it differently: You can make them deterministic by using a temperature of zero (then the output would be pretty bad and repetitive), or having a "better" temperature and fixing a random seed (then the output would be better, but it would only be deterministic in the same sense as a simulation of Brownian motion with fixed random seed).
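A minimal sketch of that sampling step (toy scores, not any particular model's): a softmax over the raw next-token scores with a temperature, where temperature zero collapses to always picking the single best token.

```python
import numpy as np

rng = np.random.default_rng(0)                        # fixed seed for reproducibility

def sample_next_token(logits, temperature=1.0):
    """Pick a token index from raw next-token scores."""
    if temperature == 0:                              # greedy: fully deterministic
        return int(np.argmax(logits))
    scaled = (logits - logits.max()) / temperature    # numerically stable softmax
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return int(rng.choice(len(logits), p=probs))      # stochastic draw

logits = np.array([2.0, 1.5, 0.3, -1.0])              # made-up scores for 4 tokens
print(sample_next_token(logits, temperature=0))       # always token 0
print(sample_next_token(logits, temperature=0.8))     # sometimes token 1, etc.
```

Fixing the seed, as above, makes even the temperature > 0 case reproducible, which is the "deterministic in the same sense as a seeded simulation" point.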
So the randomness is not mandatory for the LLM to work; without it the output is just boring. This means that as a language model it still performs perfectly well in modeling language. We just add some random saltiness for fun.
I would guess the random step is not even mandatory: there is probably a way to replace randomness with a simplified function and still get interesting text. I can't run a simulation but there is no indication here that good randomness is needed.
Fundamentally, the design of the transformer, and especially its attention-based core, does not require randomness, so calling it a stochastic model is a stretch.
Training alone relies hugely on many factors (e.g. initialization of parameters, order of training data, hyperparameters, etc.).
In evaluation (afaik this applies to recent models as well) you pick the continuation based on chance and not always the "best". But evaluation is the result of the training process, so all the randomness from that factors in as well.
They are stochastic in the domain of meaning. Minor syntactic changes to the prompt or changes to the seed can result in substantial* changes to the meaning of the response.
*substantial as in nontrivial, not substantial as in massive
Sure, I don't think those are mutually exclusive with stochastic. A stable or well-conditioned model may just have an acceptably small standard deviation for the task at hand.
Difficulty in defining it rigorously does not preclude its existence or its usefulness as a model. The paper addresses how, in its view, humans differ from LLMs with reference to meaning.
Reliably conveying the same mental model to another entity regardless of syntactic differences, or at least failing to do so in a way that isn't predictable by a bell curve. The paper makes the argument that humans modelling the mental state of their conversation partner is part of how meaning is reliably exchanged, something that LLMs are unable to do because it is completely absent from their training data.
The real question to me is this: in the next decade, as ML researchers roll out progressively more sophisticated systems, we can expect that generative systems (which may actually be "only stochastic parrots") are going to create works that would fool any reasonable human being.
At what point does a stochastic parrot fake it till it makes it? Does it even matter? We can imagine that, within 10 years, we'll have a fully synthetic virtual human simulator: a generative AI combined with a knowledge base, language parsing, audio and video recognition, basically a talking head that could join your next technical meeting and look like a full contributor. If that happens, will the Timnits and the Benders of the world admit that, perhaps, systems which are indistinguishable from a human may not just be parrots, or that, perhaps, we are just sufficiently advanced parrots?
Seen from that perspective, the promoters of stochastic parrots would seem to be luddites and close-minded, as well as discouraging legitimate, important, and valuable scientific research.
Once you have a knowledge base connected to the language model, it's no longer a Stochastic Parrot, but something else entirely. The point of the paper is that simply continuing to scale up LLMs will not produce understanding, because a pure LLM has no connection between form and meaning. That link can be provided in other ways, though (multimodal models, robot embodiment).
But these language models are implicitly trained on knowledge by being fed large amounts of factual text, which (I presume) allows them to generate text that is factual (statistically more frequently than hallucinating nonfactual information). So probably recent models (which were being trained around the time the parrots paper came out) are really implicit knowledge models already. Obviously they don't have embodiment, and it's still unclear to me what level of true embodiment in the actual, real, physical world is required to make these models more than just "parrots".
In the end, it turned out the actual innovation was doing the opposite of what this paper recommended: scaling up the LLM, improving quality by throwing lots of data at it rather than curating, and limiting bias by RLHF rather than picking the right datasets.
The organizations that listened to these people for even some amount of time got hosed in this situation. Google managed to oust this flock from within, but not before their AIs were so lobotomized that they are widely renowned for being the village idiot.
Ultimately, this paper is a triumph of branding over science. Read it if you'd like. But if you let these kinds of people into your organization, they'll cripple it. It costs a lot to get them out. Instead, simply never let them in.
The long-term impact of this paper has confused me from a technical lens, although I get it from a political lens. I'm glad it brings up the risks from LLMs but makes technical/philosophical claims which seemed poorly supported and empirically have not held up -- imo because they chose not to engage with RLHF at all (which was deployed through GPT-3 at the time; and enables grounding + getting around 'parrotness'), and uses over-the-top language ("stochastic parrot") which seems very poorly to capture what it feels like to meaningfully engage with e.g. models like GPT-4.
> limiting bias by RLHF rather than picking the right datasets
This is the same as curation and picking out the dataset, except as post-processing. The reason why RLHF has to happen (and traumatize the people <https://www.bigtechnology.com/p/he-helped-train-chatgpt-it-t...>) is to address the problems by censoring the model.
Is it though? If you wanted to teach humans so that they don't develop unfortunate beliefs, would it be a good approach to just keep them from reading material that you find objectionable?
If you read a book that you disagree with, or one that contains falsehoods and bad reasoning as far as you can tell, would that make you believe those things?
The word "trauma" is getting overused. The idea of someone being traumatized by reading fictional text is just silly. It's unpleasant or gross at worst unless you already have other issues.
Everything we revile about online recipe websites that spend 1000 words about the history of cooking before getting to the point, will be part and parcel of AI-written anything. It won't be properly proofread or edited by a human, because that would defeat the purpose.
Yoshua Bengio, Andrew Ng, Andrej Karpathy, and many other top researchers in the field do not believe these models are stochastic parrots; they believe they have internal world models and that prompts are methods to probe those world models. Stochastic parrots is one of the dumbest takes in AI/ML.
> Our findings suggest that LLMs may have an internal representation of the likelihood of something being true, even as they produce falsehoods on the surface.
The problem here is that there is currently no reliable way to extract information from this hypothetical world model. Language models do not always say what they "believe", they might instead say what is politically correct, what sounds good etc. Researchers try to optimize (fine-tune) language models to be helpful, honest, and harmless, but honesty ("truthfulness") can't be easily optimized for.
I’d argue that all these models are stochastic parrots because they’re not embodied in any way. There is no way they can actually understand what they are talking about in any way that is tied back to the physical world.
What these LLMs and diffusion models and such actually are is a lossy compression method that permits structural queries. The fact that they can learn structure as well as content allows them to reason as well, but only to the extent that the rules they’re following existed somewhere in the training data and its structure.
If one were given access to senses and memory and feedback mechanisms and learned language that way, it might be considered actually intelligent or even sentient if it exhibited autonomy and value judgments.
> I’d argue that all these models are stochastic parrots because they’re not embodied in any way.
I do not think that this would really change much in itself. If you tell the model that crimson is a shade of green, it will learn something wrong whether it has a body or not. What you need is feedback on whether a response is correct or not, factually correct, not grammatically correct. Alternatively you have to teach the model to perform its own fact checking and apply it to its responses.
I think that maybe "truly understand" is anchored in the physical world. I don't exhaustively know what, say, grass is, but I know what it looks like, and I know what it feels like to walk on, and I know what it feels like to touch with my hands, and I know what it sounds like when I walk on it, and I know what it smells like when it's cut. And I know that there's a consistent correlation between "stuff that look like that" and "stuff that smells like that when it's cut".
And so if the topic of grass comes up, I have some firsthand knowledge to draw on - less than a botanist, but not nothing. I have some sense impressions that correlate to other sense impressions and to the word "grass". GPT, on the other hand, has some words that correlate to other words, and nothing more.
So it seems fair to say that I understand grass on a level that GPT does not, and cannot. Therefore it seems fair to say that GPT is at least closer to being a stochastic parrot than humans are.
And yet if you see some AstroTurf you'd still call it grass. In the end there is no "true understanding", there are just predictions we make about the world. Depending on how deeply you look, they are often incorrect, but also generally good enough.
GPT isn't quite at the good-enough point, and being limited to only text makes it impossible to reason about aspects of the world that are difficult to describe in text or simply weren't in the training data.
And more generally speaking, the claim that LLMs don't understand anything really doesn't hold up given how much they are able to hallucinate. If an LLM truly didn't understand anything, it wouldn't be able to generate plausible text; it would either generate nonsense or be limited to whatever was in the training data, but that's not the case. LLMs can predict past their trained knowledge and predict stuff they haven't seen yet. Those predictions will sometimes turn out wrong, but so will the human's prediction that the AstroTurf is grass when taking a closer look.
Sure, you have experienced grass with more senses than just reading about it, but I do not think that this fundamentally changes anything. If I lied to you your entire life and told you that you are walking on or smelling grass while you were actually walking on moss, you would learn a similar mistake spanning several of your senses.
The idea that an entity can't be sentient because of a lack of senses has the problem that it invalidates the sentience of humans, though. Do you consider a blind person to have less sentience than a person who can see, because they lack the sense of sight? Even if we consider sentience an on/off switch, what about a person who has no senses at all (whether someone like that exists theoretically or in reality)? With no way to tie their thoughts back to the real world, are they no longer sentient?
Obviously we don't know for certain if other humans are sentient, but it seems necessary to establish the premise they are in order to get anywhere in the argument for sentience of AIs. In this case, we need an argument about the sentience of AIs that coincides with our experiences of the sentience of humans, which this argument doesn't seem to do.
Even if we limit ourselves to thinking about people with all of their senses, there's still information that we cannot tie back to the physical world with our senses. Take someone who sits at a computer all day. They read news and talk about it online, without ever interacting with the news physically. Take someone who theoretically has never done anything outside of read and type on a computer all day. Are they not sentient because they've never physically interacted with the world outside of their computer?
> Do you consider a blind person to have less sentience than a person who can see, because they lack the sense of sight?
They still interact with an external world. An LLM doesn't, at all, not even a little bit. That's the crucial difference. A person will know when things didn't go as predicted, as the real world will provide feedback they can sense. An LLM in contrast has no idea what is going on, its past actions don't exist for it. There is only the prompt and the unchanging base model.
That said, this is not to disparage the abilities of LLMs, they simply were never designed to be sentient. If one wants an LLM that is sentient, one has to build some feedback into the system that allows it to change and evolve depending on its past actions.
A syntax-producing machine harnessing the power of duality will still never have access to semantic content. For this reason I have difficulty saying that it understands things beyond a colloquial sense.
It has to have some degree of autonomy to be useful. The current approach with ChatGPT to just have all the knowledge in the world directly in the base model not only doesn't scale, it would also run into issues with copyright if it could actually recite books and stuff word for word. A ChatGPT that can just use Google to look up the necessary information itself would be far more useful.
BingChat sort of tries that, but it doesn't really have any autonomy either, so it just summarizes the first Bing search result it gets. It would be far more useful if it could search two or three layers deep into the search results to actually find what you are looking for.
In general, current AI systems have the problem that you have to babysit them far too much. If you want to get specific answers, it's you that has to provide all the necessary context to make it happen; the AI can't figure out by itself what you want from past conversations.
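A hedged sketch of what that kind of multi-hop autonomy could look like (the `web_search` and `llm` functions below are hypothetical placeholders, not any real API): the model decides whether its notes answer the question or whether it should refine the query and dig another layer deeper.

```python
def web_search(query):   # hypothetical placeholder: returns a list of page texts
    raise NotImplementedError

def llm(prompt):         # hypothetical placeholder: returns the model's text reply
    raise NotImplementedError

def answer_with_search(question, max_hops=3):
    notes, query = [], question
    for _ in range(max_hops):                       # each hop is one layer of search
        notes.extend(web_search(query)[:3])
        verdict = llm(
            f"Question: {question}\nNotes: {notes}\n"
            "Reply 'ANSWER: ...' if the notes suffice, "
            "or 'SEARCH: ...' with a refined query if more digging is needed."
        )
        if verdict.startswith("ANSWER:"):
            return verdict[len("ANSWER:"):].strip()
        query = verdict[len("SEARCH:"):].strip()    # go another layer deep
    return llm(f"Question: {question}\nNotes: {notes}\nGive a best-effort answer.")
```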
One massive flaw of the Wikipedia model is that the people who edit Wikipedia the most "aggressively" are the ones with the most emotional investment in the topic.
This can lead to very detailed articles written by very enthusiastic people. In other cases the people who are very pro/against the subject will be the ones who put in the most effort, especially on smaller/controversial subjects.
I have seen Wikipedia pages which basically read like ads for small companies.
"Meaning without reference in large language models"
"we argue that LLM likely capture important aspects
of meaning, and moreover work in a way that
approximates a compelling account of human
cognition in which meaning arises from con-
ceptual role"
TL;DR: the focus on the implementation details, and descriptions like this, are detrimental, even perilous, because such accounts are both accurate and deeply misleading.
This is description, but it is neither predictive, nor explanatory.
It implies a false model, rather than providing one.
Evergreen:
Ximm's Law: every critique of AI assumes to some degree that contemporary implementations will not, or cannot, be improved upon.
Lemma: any statement about AI which uses the word "never" to preclude some feature from future realization is false.
From the article: A "stochastic parrot", according to Bender, is an entity "for haphazardly stitching together sequences of linguistic forms … according to probabilistic information about how they combine, but without any reference to meaning."
It seems to me that the great success transformers are now enjoying is precisely due to the fact that 'probabilistic information about how they combine' _is_ meaning.
It's really not. Read the National Library of Thailand thought experiment to understand the difference. But this isn't saying that AGI is impossible, only that it can't come purely from LLMs, and that pure LLMs will remain stochastic parrots no matter how they are scaled up.
I agree. There's a quote in that paper about how ML models can never access meaning (semantics of words) because they only see the form (syntax and letters) and the two are somehow completely divorced.
It's obvious nonsense. I can describe a new concept to you using only words and letters and you can understand it. Therefore you can build up knowledge using only syntax.
Nobody is saying that LLMs understand the layout of a bus or the feel of leather, but they understand that buses are vehicles with four wheels that transport people etc.
This also relates to vision models. The existence of adversarial attacks (e.g. imperceptible changes to the image drastically changing the output) essentially demonstrates that the model has not reached the point at which the network "understands" the generalized concept it is meant to distinguish.
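For a concrete picture of the kind of attack meant here, a minimal fast-gradient-sign-method sketch (assuming PyTorch/torchvision; the input is a random stand-in tensor rather than a real photo, and with a real image and a pretrained classifier a perturbation this small is often enough to flip the label):

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()

x = torch.rand(1, 3, 224, 224, requires_grad=True)     # stand-in for a real image
label = model(x).argmax(dim=1)                          # the model's current prediction

loss = F.cross_entropy(model(x), label)                 # loss w.r.t. its own prediction
loss.backward()

eps = 0.01                                              # tiny, near-imperceptible change
x_adv = (x + eps * x.grad.sign()).clamp(0, 1).detach()  # nudge each pixel against the label

print("before:", label.item(), "after:", model(x_adv).argmax(dim=1).item())
```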
Not really an example; there are many ways human vision is flawed and can be tricked, but none are on the level of these adversarial examples, where imperceptible differences between images lead to a category error.
Human perception can be ambiguous, but minimal changes never cause drastic category errors.