Does it just require a *lot* more training? Im talking about the boring stuff. C...

sjducb · on Jan 25, 2023

A human trains on way less data then an AI.

Chat GPT has processed over 500GB of text files from books, about 44 billion words.

If you read a book a week you might hit 70 million words by age 18

2OEH8eoCRo0 · on Jan 25, 2023

I disagree.

Starting from birth, humans train continuously on streamed audio, visual, and other data from 5 senses. An inconceivable amount.

danenania · on Jan 25, 2023

And prior to that was billions of years of training by evolution that got us to the point where we could 'fine tune' with our senses and brains. A little bit of data was involved in all that too.

wizofaus · on Jan 26, 2023

I'd argue that is the fundamental difference though - brains that were able to make good guesses about what was going on in the environment with very limited information are the ones whose owners reproduced successfully etc. And it's not unreasonable to note that the information available to the brains of our forebears is therefore in a rather indirect but still significant way "encoded" into our brains (at birth). Do LLMs have an element of that at all in their programming? Do they need more, and if so, how could it be best created?

mcswell · on Jan 26, 2023

You missed the point. ChatGPT trained on a gazillion words to "learn" a language. Children learn their language from a tiny fraction of that. Streamed visual, smell, touch etc. don't help learn the grammars of (spoken) languages.

hypertele-Xii · on Jan 26, 2023

> visual, smell, touch etc. don't help learn the grammars of (spoken) languages.

Of course they do! These are literally the things children learn to associate language with. "Ouch!" is what is said when you feel pain.

An ML model can learn to associate the word "ouch" with the words "pain" and "feel", but it doesn't actually know what pain is, because it doesn't feel.

OzCrimson · on Jan 27, 2023

Isn't it more complicated than that? "Ouch" can be a lot of things, and that's where a lot of problems crop up in the AI world.

If one of my friends insults another friend, I might say,"OUCH!" I'm not in pain but I might want to express that the insult was a bit much. If someone tries to insult me and it's weak, I could reply with a dry, sarcastic "ouch."

Combine that with facial expression and tone of voice and 'ouch' is highly contextual.

One problem with some of the tools used to take down offensive comments on social media platforms is that they don't get context.

Let's say that 'ouch' is highly offensive and you got into trouble for calling someone an "ouch." If I want to discuss the issue and agree that you were being offensive, I could get into trouble with the ML/AI tools for quoting you.

mcswell · on Jan 27, 2023

No. First off, I said grammar, not word meaning.

Second, saying "Ouch" is not even language. My cat says something when I step on her paw. That doesn't mean she understands language, nor that she speaks some language.

Third, you're right about pain, but an ML model can associate the word "red" with the color, and "walk" with images of people walking, and "sailboat" with certain images or videos, and plenty of other concepts. If that was what learning a language was, then AIs would understand language in lots of areas, if not in the specific domain of pain. But they don't.

sdiupIGPWEfh · on Jan 26, 2023

That has me wondering now.

It's absolutely true that children learn (and even generate) language grammar from a ridiculously small number of samples compared to LLMs.

But could the availability of a world model, in the form of other sensory inputs, contribute to that capacity? Younger children who haven't fully mastered correct grammar are still able to communicate more sensibly than earlier LLMs, whereas the earlier LLMs tend toward more grammatically correct gibberish. What if the missing secret sauce to better LLM training is figuring out how to wire, say, image recognition into the training process?

sdiupIGPWEfh · on Jan 26, 2023

It amuses me that this would be not unlike teaching an LLM with picture books.

Maursault · on Jan 26, 2023

> and other data from 5 senses.

It only makes your point stronger, but there are way more[1] than 5 human senses, not counting senses we don't have that, say, dolphins or other animals do. I can only name a few others, such as proprioception, direction, balance, and weight discrimination, but there are too many to keep track of them all.

[1] https://www.nytimes.com/1964/03/15/archives/we-have-more-tha...

ASalazarMX · on Jan 26, 2023

Last Christmas one of my nephews was gifted a noisy baby toy. I don't know what are his goals and constraints, but he's still training with it. Must have learned a lot by now.

VMG · on Jan 26, 2023

You can run the numbers with HD video and audio with high words-per-minute and you'd probably still be orders of magnitude below the model sizes

nitwit005 · on Jan 25, 2023

Imagine someone has the idea of strapping mannequins to their car in hopes the AI cars will get out of the way.

Sure, you could add that to the training the AI gets, but it's just one malicious idea. There's effectively an infinite set of those ideas, as people come up with novel ideas all the time.

mlboss · on Jan 25, 2023

Reinforcement learning should solve this problem. We need to give robots the ability to do experiments and learn from failure like children.

majormajor · on Jan 26, 2023

Need to make those robots as harmless as children when they do that learning too. ;)

"Whoops, that killed a few too many people, but now I've learned better!" - some machine-learning-using car, probably