It doesn't matter; the most important factor is the training data. The network implementation, and sometimes even the topology, doesn't matter much as long as it has some basic properties, one of which is allowing pairwise interactions between tokens.
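
To make "pairwise interactions" concrete, here's a rough numpy sketch (my own toy illustration, not any specific architecture; the function name and dimensions are made up) of a mixing step where every token's output depends on its similarity to every other token:

    import numpy as np

    def pairwise_mixing(x):                # toy illustration; x: (seq_len, d) token embeddings
        scores = x @ x.T / np.sqrt(x.shape[-1])                           # (seq_len, seq_len) pairwise scores
        weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)  # row-wise softmax
        return weights @ x                 # each output token is a mixture of all tokens

    x = np.random.randn(5, 16)
    print(pairwise_mixing(x).shape)        # (5, 16)

Attention is the pairwise version of this; RWKV, S4 and MLP-Mixer mix tokens differently, but the point is that the exact mixing mechanism matters less than having one.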

We have seen RWKV, S4, MLP-Mixer, T5, etc. - they all give results comparable to a vanilla GPT when trained on the same dataset. Similarly, no two people have identical neural wiring, but when they take the same course they gain similar abilities. There are only small differences.

On the other hand, it appears that even small models like Phi-1.5, when trained on "textbook quality" data, perform like models 5x their size, and when you train a big LLM on 10x more data - GPT-4 was rumored to be trained on 13T tokens - its abilities are superior to all other models.

So better data or more data makes a difference; architecture makes little difference, controlling for model size. Interesting tidbit - since transformers were invented in 2017, almost no change to the architecture has been widely adopted, only a small change or two (pre-norm instead of post-norm, and GELU activation in the feedforward), and not for lack of trying. There are hundreds of papers trying to invent a better transformer. Where are they now?
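
For reference, here's a rough PyTorch sketch (my own illustration, not taken from any particular paper; class name and sizes are arbitrary) of a block with those two tweaks applied to the 2017 recipe: normalization moved before each sublayer, and GELU instead of ReLU in the feedforward:

    import torch
    import torch.nn as nn

    class PreNormBlock(nn.Module):         # toy sketch of a modern transformer block
        def __init__(self, d_model=512, n_heads=8, d_ff=2048):
            super().__init__()
            self.norm1 = nn.LayerNorm(d_model)
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.norm2 = nn.LayerNorm(d_model)
            self.ff = nn.Sequential(
                nn.Linear(d_model, d_ff),
                nn.GELU(),                 # GELU replaces the original ReLU
                nn.Linear(d_ff, d_model),
            )

        def forward(self, x):              # x: (batch, seq_len, d_model)
            h = self.norm1(x)              # pre-norm: normalize before attention
            a, _ = self.attn(h, h, h)
            x = x + a                      # residual around attention
            return x + self.ff(self.norm2(x))   # pre-norm + GELU feedforward, residual

Everything else (residual stream, attention, stacked blocks) is still the 2017 design.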

The brain just sits inside a better data engine than AI models do. That's the magic behind the brain: not some kind of special learning ability, but continual learning from a persistent environment, one with the highest level of complexity and diversity. Humans can make causal interventions in the environment to test their hypotheses. AI models trained on the internet have no such power of causal intervention unless we provide embodiment and an environment.

Humans also have access to many other humans; AIs train on a text corpus, not live humans. I reject the idea that brains have some special sauce. It's everything else around the brain that is better.

Thanks for putting language to a vague feeling I've had wrt "causal intervention". I tend to think there's no such thing as a self, or self-preservation/fear-of-death, or desire-to-love that might facilitate actual agentic decision making, without being embedded in a fight for survival against an adversarial environment.

As far as embodiment and environment go, does simulating an environment get us anywhere? Agents given avatars in games practice causal intervention, no? Is that too a matter of training a model in a rich enough, accurate enough simulation? I guess that's the problem self-driving cars face, and they at least have a programmed-in fear of mortality, tho not their own, in the form of avoiding a fatal wreck.


> does simulating an environment get us anywhere?

Sure does. One of the few super-human AIs, AlphaZero, learned from an environment made of the Go board and its opponent. Diverse exploration is essential, but ultimately everything was learned from that one bit of reward signal at the end of the game.
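
Not AlphaZero itself, but here's a toy Python sketch of that point: a policy over moves can be trained with nothing but a win/lose signal delivered once, at the end of each game (a basic REINFORCE-style update; every name and number here is made up for the illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    theta = np.zeros(2)                    # toy "policy": preferences over two possible moves

    def play_episode(theta):
        probs = np.exp(theta) / np.exp(theta).sum()
        moves = [rng.choice(2, p=probs) for _ in range(10)]    # play 10 moves
        reward = 1.0 if sum(moves) > 5 else -1.0               # one bit, only at the end
        return moves, probs, reward

    for _ in range(500):
        moves, probs, reward = play_episode(theta)
        for m in moves:
            grad = np.eye(2)[m] - probs    # gradient of log-prob of the chosen move
            theta += 0.01 * reward * grad  # the single terminal reward credits every move

    print(theta)                           # move 1 ends up preferred

AlphaZero adds search and a value network on top, but the supervision from the environment really is that sparse.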
