To Understand Language Is to Understand Generalization (evjang.com)
91 points by ericjang on Dec 18, 2021 | 38 comments



An AI could be said to understand language if it used language as one of a selection of tools to operate on itself, a peer or other being, or its environment. The idea of "meaning = co-occurrence" overlooks things like need, cause, and effect, which appear when language is used as a tool to operate on its environment.

Most of what I read about ML and AI is about creating these monolithic models that treat networks and clusters of neurons as a single entity, but that would be like treating a species of individuals with lifecycles as a single entity. The comment in the article about how GPT models are like a shadow compared to a 3D world suggests the bottleneck to evolving them is really us, as we're trying to make just one that emulates many of us, instead of letting one loose on the internet to divide and proliferate to evolve millions where the best few will be exponentially better. Right now we're building expert systems that are individual specimens without an ecosystem.

There isn't yet a botnet of GPT nodes compromising machines and harvesting compute for training and evolving through participating in forums, but then again, how would I know? (There's nothing worse than failing a modern captcha and having a flash of existential dread at the stark possibility that I may have been a robot all along. Now I do them at random just to be sure.)


> instead of letting one loose on the internet to divide and proliferate to evolve millions where the best few will be exponentially better

I've said this before: what current AI agents lack is a dick (& pussy). If they had one, they could have an internal goal to motivate their evolution, a goal no longer dependent on us. The battlefield of self-replication vs. death is the great school of evolution, where humanity is currently the top student. AI has only sent the likes of AlphaGo to that school.


You can program them to have a dick (or pussy). Program an AI to get the highest score in a game, etc., and it will be more incessant than a teenager - it will never stop. What AI agents lack is animation - they are inanimate; they are software; they are machines. And always will be.

We can (and do) labour under the impression of our senses that all there is in reality is physical matter. This is an erroneous assumption imo, but if that is your bedrock, you will struggle to understand why the damn machines can't do what we want. You can be unhappy about this but it won't change reality.

It is true that metaphysics is hard to discern, perhaps by definition, but that doesn't change the reality that metaphysics is a genuine element of human existence. In fact, it's the most important part of human existence - we don't feel like automatons, after all, even if we make a pretence of it sometimes.

The best we will do with machines is to create a simulation of the human experience, one that might pass the Turing test even. And even then, despite all indications and evidence, the machine will not be animated by spirit.


This got me thinking. People like Joscha Bach and Douglas Hofstadter have posed the idea that we are programs running on a computer, which implies that when we halt, there is nothing. But if we really were programs, there would be someone running them and deciding whether to boot them up for something else, a set of basic reference rules to keep the program going, and a benevolent intention and desire on the operator's part to keep it running.

To keep the environment going and developing, you need incentives, so a notion of suffering is necessary, but it is not the purpose. The purpose would be to use the dynamic between the suffering and the rules to evolve something greater that sustains and improves the program's overall environment and evolves through other instances of whatever individuals are.

An immortal soul concept works into this where it's like a trained module an operator could rely on to perform in a particular way, and it would get used or reused as was useful to the overall environment. It runs somewhere, or it doesn't, and the suffering piece is just the incentives of the environment it was placed into to develop.

These GPT models aren't sophisticated enough to fully emulate us, but as a logical mirror and test harness that can reflect pretty much every metaphysical concept we can think up, they may be sufficient.

Sometimes I think of forum updoots as weighting for a language model, and some of these more metacognitive musings need suppression because they would be disruptive to the sustainability of the model. If there were javascript behind this comment form that also used timing from our keystroke patterns to weight paths between these words, it would imbue the model with a kind of "personality" that emulated what was going on in our minds, as a kind of black box.

How would one find out if HN data is being used in GPT or LaMDA models?


> These GPT models aren't sophisticated enough to fully emulate us, but as a logical mirror and test harness that can reflect pretty much every metaphysical concept we can think up, they may be sufficient.

A mirror is a good metaphor. We don't think that the reality we see in a mirror is animated. So too, for the artificial intelligence.

There may be no difference in appearance, but in reality the difference is vast - so vast as to be beyond compare. There is no meaning in the mirrored self; the reflection does not think or have ideas. If you try to work out which of the two (you or the reflection) is the real one using appearance alone, you will struggle. But then you can laugh and move on - this is not an issue to concern you, you can dismiss it - you know where the awareness is.


They do have a loss function, which is analogous to human desire.

I think we are fundamentally missing something - there is an irreconcilable difference between a mathematical expression and actual conscious desire, and until we figure out what that is, we won't crack AGI.


This is a neat idea, but I think it's missing a large and important area for generalization, and that's the process of seeking and exploring exceptions or counter-examples (see my other comments for examples).

Language defines things through subtraction, inversion, comparison, and contrast as much as through construction and straightforward description.

Engineering and computer science rely too heavily on induction, while deduction and other non-linear processes are largely missing from these kinds of analyses/approaches. And until they are accounted for, I don't think we'll reach any kind of true approach to generalization.


Nice post! I work on NLP and I think a lot of ideas in this post resonate with what I find exciting about working on the intersection of language + the real world: large text datasets as sources of abundant prior knowledge about the world, structure of language ~ structure of concepts that matter to humans, etc.

I feel like the bottleneck is getting access to paired (language, other modality) data though (if your other modality isn't images). i.e. "bolt on generalization" is an intuitively appealing concept, but then it reduces to the hard problem of "how do I learn to ground language to e.g. my robot action space?" I haven't seen a robotics + language paper that actually grapples with the grounding problem / tries to think about how to scale the data collection process for language-conditioned robotics beyond annotating your own dataset as a proof-of-concept. Unlike language modeling / CLIP-type pretraining, it seems (fundamentally?) more difficult to find natural sources of supervision of (language, action). I'd be curious about your thoughts on this!
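To make concrete what I mean by "paired data", here is a minimal sketch (all names hypothetical, in PyTorch) of an (instruction, observation, action) dataset and a language-conditioned policy head, where the text encoder is assumed to be pretrained and frozen and only the pairing itself is robot-specific:

    # Hypothetical sketch, not a real API: the text encoder stands in for any
    # pretrained language model; only the (instruction, observation, action)
    # pairing is robot-specific.
    from dataclasses import dataclass
    from typing import List

    import torch
    import torch.nn as nn


    @dataclass
    class GroundedExample:
        instruction: str           # e.g. "pick up the apple"
        observation: torch.Tensor  # robot-specific sensor reading
        action: torch.Tensor       # robot-specific actuator command


    class LanguageConditionedPolicy(nn.Module):
        def __init__(self, text_encoder: nn.Module, obs_dim: int, act_dim: int, txt_dim: int = 512):
            super().__init__()
            self.text_encoder = text_encoder  # pretrained on web-scale text, kept frozen
            for p in self.text_encoder.parameters():
                p.requires_grad = False
            self.head = nn.Sequential(
                nn.Linear(obs_dim + txt_dim, 256), nn.ReLU(), nn.Linear(256, act_dim)
            )

        def forward(self, instructions: List[str], obs: torch.Tensor) -> torch.Tensor:
            with torch.no_grad():
                z = self.text_encoder(instructions)  # (batch, txt_dim) language embedding
            return self.head(torch.cat([obs, z], dim=-1))

Collecting enough of those triples is exactly the part that doesn't obviously scale.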

> When it comes to combining natural language with robots, the obvious take is to use it as an input-output modality for human-robot interaction. The robot would understand human language inputs and potentially converse with the human. But if you accept that “generalization is language”, then language models have a far bigger role to play than just being the “UX layer for robots”.

You should check out Jacob Andreas's work, if you haven't seen it already - esp. his stuff on learning from latent language (https://arxiv.org/abs/1711.00482).


My hope is that sufficiently rich language models obviate the need for a lot of robot-language grounding data.

LfP (https://learning-from-play.github.io/) was a work that inspired me a lot. They relabel a few hours of open-ended demonstrations (humans instructed to play with anything in the environment) with a lot of hindsight language descriptions, and show some degree of general capability acquired through this richer language. You can describe the same action with a lot of different descriptions, e.g. "pick up the leftmost object unless it is a cup" could also be relabeled as "pick up an apple".
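To illustrate the relabeling trick (a rough sketch of my own, not the LfP authors' code), a single window of unlabeled play data plus several after-the-fact descriptions becomes several (language, trajectory) training pairs:

    # Rough sketch of hindsight language relabeling (toy version, not the
    # LfP authors' code): one unlabeled play window plus N human
    # descriptions yields N (language, trajectory) training pairs.
    from typing import Dict, List, Tuple

    Trajectory = List[Dict]  # each step: {"obs": ..., "action": ...}


    def relabel_with_hindsight(window: Trajectory,
                               descriptions: List[str]) -> List[Tuple[str, Trajectory]]:
        return [(desc, window) for desc in descriptions]


    # The same behavior can carry several descriptions, as above:
    window = [{"obs": "camera frame", "action": "close gripper"}]
    pairs = relabel_with_hindsight(
        window, ["pick up the leftmost object", "pick up an apple"]
    )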

That being said, the LfP paper stops short of testing whether we can improve robotics solely by scaling language - a confounding factor, central to their narrative, was the role of "open-ended play data". We do need some paired data to ground (language, robot-specific sensor/actuator modalities), but perhaps we can scale everything else with language-only data.

Thanks for the pointer to the Andreas paper! It is indeed quite relevant to the spirit of what I'm arguing for, though I prefer the implementation realized by the Lu et al. '21 paper.


> We do need some paired data

A couple of under-explored rich sources of training data on actions are videos and code. Videos, showing how people interact with objects in the world to achieve goals, might also come with captions and metadata, while code comes with comments, messages and variable names that relate to real world concepts, including millions of tables and business logic.

Maybe in the future we will add rich brain scans as an alternative to text. That kind of annotation would be so easy to collect in large quantities, provided we can wear neural sensors. If it's impractical to scan the brain, we can wear sensors and video cameras and use eye tracking and body tracking to train the system.

I am optimistic that language modelling can become the core engine of AI agents, but we need a system that has both a generator and a critic, going back and forth for a few rounds, doing multi-step problem solving. Another must is to allow search engine queries in order to make more efficient and correct models - not all knowledge needs to be burned into the weights.
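Something like this rough sketch of a generator/critic loop with retrieval (generate, critique, and web_search are hypothetical stubs standing in for a language model, a critic model, and a search backend, not real APIs):

    # Hypothetical stubs for a generator model, a critic model, and a search
    # backend; only the control flow is the point.
    from typing import Dict


    def web_search(query: str) -> str:
        return f"<results for: {query}>"


    def generate(question: str, context: str, draft: str = "") -> str:
        return f"answer to '{question}' given {context}"


    def critique(question: str, draft: str) -> Dict:
        return {"ok": True, "missing_info": ""}


    def solve(question: str, rounds: int = 3) -> str:
        """Alternate generation and critique, querying search between rounds."""
        context = web_search(question)          # pull knowledge in at runtime...
        draft = generate(question, context)     # ...rather than baking it all into weights
        for _ in range(rounds):
            verdict = critique(question, draft)
            if verdict["ok"]:
                break
            context = web_search(verdict["missing_info"])
            draft = generate(question, context, draft)
        return draft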


> My hope is that sufficiently rich language models obviate the need for a lot of robot-language grounding data.

I feel like this is “missing the trees for the forest.” In my experience, generality only emerges after a critical mass of detailed low-level examples is collected and arranged into a pattern. Humans can’t actually reason about purely abstract ideas very well. Experts always have specifics in mind they are working from.

So I'm not convinced leaving it to the model gets you anything new.


I feel that the (IMHO plausible) idea is that a sufficiently rich language model can enable transfer learning for robotics, where you can effectively replace a lot of robot-language grounding data with a small amount of robot-language grounding and a lot of pure language data.


I like the design of your website!

What do you mean when you say words are disentangled, standalone concepts? I see words as being very much related to each other.

I assume I may be misinterpreting what you mean by "disentangled, standalone concepts".

Barbara Tversky's research seems to contradict linguistic relativism. I definitely don’t think language is the foundation of cognition.


Thanks!

Words are considered a "discrete unit of meaning", i.e. 3/4 of a word doesn't really mean much. So words like "red" and "grass" are "standalone" in the sense that they mean something by themselves. I agree that words are very much related to each other, in the sense that you can combine them.

I was trying to draw a connection that the "disentangled representations" ML folks often talk about are but a special few-word case of grammars for combining distinct concepts.
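A toy illustration of the analogy (my own construction, nothing from the post): a "disentangled" representation gives each concept its own axis, so combining words like "red" and "grass" amounts to setting factors independently.

    # Toy illustration only: each concept gets its own axis in the latent
    # vector, so word combinations map to independent factor settings.
    import numpy as np

    FACTORS = {"color": 0, "object": 1}
    COLOR = {"red": 1.0, "green": 2.0}
    OBJECT = {"grass": 1.0, "apple": 2.0}


    def encode(color: str, obj: str) -> np.ndarray:
        z = np.zeros(len(FACTORS))
        z[FACTORS["color"]] = COLOR[color]
        z[FACTORS["object"]] = OBJECT[obj]
        return z


    # "red grass" and "red apple" differ only along the object axis.
    print(encode("red", "grass"), encode("red", "apple"))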


Unfortunately, words aren't that simple, but it's close. Prefixes, suffixes, infixes, endings, etc. all have discrete meaning as well. And in Asian languages this is much more obvious.

The discrete unit of meaning level is generally somewhere between a syllable and a word, with a few exceptions for shorter modifiers.

Unfortunately, in linguistics, the concept of a "word" is only as well defined as "planet" was before Pluto lost its status.

Similarly, when you look at riddles and crossword puzzle clues, the idea of words being discrete also falls apart. Words, very much like variables in algebra, only have meaning in relation to the other pieces of the context they are attached to.

So while the mechanics you talk about don't seem to hold (the pieces of language - syntax and semantics - are not discretizable; just talk to anyone working on a dictionary), I do think the idea you're talking about does hold.


Fair enough - I agree that if we really examine the claim "word as a discrete unit of meaning", the edge cases start to accumulate and the semantics rapidly break down. But barring things like prefixes/suffixes/modifiers/composite word characters in traditional Chinese, words are fairly discrete and generally regarded as the primary layer for expressing singular units of "meaning".


They are, but only because we don't have better language to express them. As with a lot of the problems in Chomsky's work, the composability of language is only a subset of the whole breadth of what is expressible in a given language.

Or in other words, I believe the "edge cases" have a surface area similar to that of the rest of the language. The difference is that they aren't invoked nearly as often, because they require more effort and creativity.

Just look at the rise of words like "hangry". There are types of mashups that show up in creative uses of language that defy nearly any rule for any language you can come up with. In many languages, if you choose any of those supposed rules you can probably construct an algorithm to generate odd, but understandable words that defy that rule.


> Or in other words, I believe the "edge cases" have a surface area similar to that of the rest of the language. The difference is that they aren't invoked nearly as often, because they require more effort and creativity.

Edge cases or exceptions do tend towards being highly used; this is because language is more likely to change the more it's used, so the most highly used words/phrases/sentences/etc tend to accumulate changes. One example of this is that if a language has verb conjugation and irregular verbs, then odds are some of its most common verbs will be irregular.

> Just look at the rise of words like "hangry". There are types of mashups that show up in creative uses of language that defy nearly any rule for any language you can come up with. In many languages, if you choose any of those supposed rules you can probably construct an algorithm to generate odd, but understandable words that defy that rule.

There are rules for that that would work, weirdly enough. There are just a ton of them.


My only point here is that any framework for generalization needs to be able to account for and incorporate these kinds of "exception-seeking" cases, just as mathematics uses counter-examples to strengthen and reinforce its chosen definitions.


I agree with your comment "In many languages, if you choose any of those supposed rules you can probably construct an algorithm to generate odd, but understandable words that defy that rule." - it comes in many forms, from Goodhart's Law to the "hot dog vs. sandwich" debate.

I do mention this in my blog post - although I think Generalization is Language, I don't think it's possible to create a formal framework of language, precisely because of the "adversarial examples" that can be supplied for any formal definition.

Natural language itself, ignorant of formality, is able to account for these exceptions insofar as language is sufficient for people to convey a bare minimum of meaning. I am proposing to define language and generalization via the implicit understanding of large language models, in the same way you might use an image classifier to define "cat images" or "hot dogs".
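In the spirit of "the model is the definition", a tiny sketch (with a stand-in classifier, nothing real): membership in "cat images" is simply whatever the model scores above a threshold.

    # Tiny sketch with a stand-in classifier: the extension of the concept
    # "cat image" is defined by the model's scores, not by a formal rule.
    def toy_classifier(image: str, label: str) -> float:
        return 0.9 if label in image else 0.1  # placeholder for a real pretrained model


    def is_cat_image(image: str, classifier=toy_classifier, threshold: float = 0.5) -> bool:
        return classifier(image, "cat") > threshold


    print(is_cat_image("photo of a cat on grass"))  # True, by the model's lights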


Hmm, I can understand the motivation. However, I feel it either won't work or will be very fragile, because this is already part of the model - they're trained on natural language.

DL is already far from formal models; that's why deep learning "works." And even at the current level of DL models, those exceptions are represented to some extent.

So ultimately, your idea is to push the models toward further generality, which, in my opinion, will bake these "exceptions" deeper into the model.

And my question is: what does that mean for your idea? In my mind, trying to exclude them would break what works. On the other hand, ignoring them means you can't direct development towards your goal, because there's no map from language to generalizations, so you would be relying on random chance for progress.

If this is off in left field, let me know, but that's what I can see from your description.


The problem with "word", as with many terms in linguistics, is that it's a prescientific unit of analysis.

I certainly think most linguistic typologists would say that there is no cross-linguistic unit that corresponds to our intuitive understanding of word, which is really grounded mostly in orthography.

I think it's fairly easy to show that orthography should not have much say in this matter, though. Of course you can't get around it in language didactics, but in scientific description we need to be very careful with it. Bob Dixon and Alexandra Aikhenvald give some examples from Bantu languages in their Word: A cross-linguistic typology. In Sotho, the sentence "We will skin it with his knife" is written "Re tlo e bua ka thipa ya gagwe", while in the orthographies for Zulu and Xhosa, the same sentence would be rendered as "Retloebua kathipa yagagwe". You really need to look at each language to find a sensible set of analytical categories, and be very explicit about your criteria, be they syntactic, semantic or phonological.


Linguistics has a distinction for what you're talking about: morpheme versus word. Morphology is the study of this area. I freaking loved my morphology classes.


While I think there's a generally accepted definition of morpheme (as the smallest distinctive unit), that doesn't give you a good definition of "word". (Because there isn't one.)

Funny you use the term morphology like that. To me it's basically synonymous with inflection, very traditional, whereas morpheme is very much a structuralist term. But all my teachers were cognitive-functional linguists, so everything was cut rather differently, and sometimes it's hard to talk.


Yeah, my morphology teacher was a structuralist, and this was quite a while ago, so I have no doubt I'm biased there. (I actually preferred the cognitive stuff I was introduced to; I really liked working with metaphor in their systems and syntax/phonology/morphology were less my thing than semantics and sociolinguistics.)

You're definitely right that the definitions aren't cut-and-dried and that makes typology rather difficult.


And there are also multi-word expressions (MWEs), where the meaning of the whole is different from that of the sum of its parts, e.g. "out of the blue", "bite the bullet".


Yup. Going the other direction is a thing as well.


Actually, the discrete unit of meaning, linguistically, is the morpheme. It's a small difference, but it matters. Some words are morphemes, but not all, and not all morphemes are words.

Language, man. It's weird.


I can see how this could work in English. I’m not sure if there are other languages in which 3/4 of a word carries more meaning. (I’m a primary English speaker, so this concern could be unfounded.)


In many languages, 3/4 of the word literally carries the meaning of the actual word, with the remaining 1/4 of the sounds or letters devoted to grammatical markers for gender/case/number/etc.

Using a classic Latin example from Monty Python, "Romani ite domum" / "Romanes eunt domus": the "Roman" part of Romanes/Romani carries most of the meaning, while the -es/-i carries information that's largely orthogonal to it.


All languages have something analogous to words in this way, although it can be hard to know where to draw the boundaries sometimes.

Technically the smallest indivisible unit that bears meaning is the morpheme, not the word. For example the word “cats” in English consists of two morphemes, cat+s. The first morpheme can stand on its own as a word, but the second can’t.


I agree, but I think the trickier part is that the semantics of words are even blurrier/more ambiguous than the syntax.


Yeah, hence the turn away from dictionary definitions and things like WordNet towards continuous distributional vector representations in NLP.

I don’t think you could really give an uncontroversial symbolic definition for any natural word.
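For what it's worth, the distributional view in miniature (toy numbers, not a real embedding model): a word's "meaning" is a vector learned from co-occurrence, and relatedness is just cosine similarity.

    # Toy word vectors, not a real embedding model: relatedness is cosine similarity.
    import numpy as np


    def cosine(u: np.ndarray, v: np.ndarray) -> float:
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


    vectors = {
        "cat":  np.array([0.9, 0.1, 0.0]),
        "dog":  np.array([0.8, 0.2, 0.1]),
        "verb": np.array([0.0, 0.1, 0.9]),
    }

    print(cosine(vectors["cat"], vectors["dog"]))   # high: distributionally similar
    print(cosine(vectors["cat"], vectors["verb"]))  # low: unrelated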


What are your thoughts on the externalism of Putnam and Kripke, i.e. that meanings aren't just defined by use, but are also determined by the objects themselves? It feels like that puts a crimp in meaning = co-occurrence, but maybe not?


I agree.

Or, put another way, a set (or sets) of concrete examples grounds every abstract idea (including words as abstract objects). And it's turtles all the way down (or up, depending).


This seems very similar to the research program led by the late Patrick Henry Winston: https://groups.csail.mit.edu/genesis/index.html

Besides, I wish that causality had been mentioned more than once in passing. Due to the existence of the ladder of causality, many important queries cannot be answered by mere observation, or even by intervention; such queries require counterfactual reasoning, and structural causal models generalize because they describe something that is very invariant in the world.


Is the reverse also true? To understand generalization is to understand language?


Not to understand generalization, therefore, is not to understand language.

QED.



