The scaling laws of large language models are very specific to language models and the way they're trained. The important thing LLMs demonstrate is that transformers are capable of this kind of scale, where other approaches have not been.
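For anyone who hasn't seen them written out, the scaling laws people usually mean here are the Kaplan/Chinchilla-style fits of loss against parameter count and token count. A rough sketch in Python (the constants are approximately the published Chinchilla fit and are only illustrative):

  # Chinchilla-style parametric loss fit (Hoffmann et al. 2022):
  # L(N, D) ~ E + A / N^alpha + B / D^beta
  # N = parameter count, D = training tokens.
  def chinchilla_loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
      return E + A / N**alpha + B / D**beta

  # e.g. a 70B-parameter model trained on 1.4T tokens:
  print(chinchilla_loss(70e9, 1.4e12))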

In the RL space, a sufficiently complex, stochastic environment is effectively a data generator.
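A minimal sketch of what I mean, with a made-up toy environment (the class, dynamics, and reward are purely illustrative): every rollout samples fresh transitions, so there is no fixed dataset to exhaust.

  import random

  class NoisyWalk:
      """Toy stochastic environment: a 1-D walk with noisy transitions."""
      def reset(self):
          self.pos = 0
          return self.pos

      def step(self, action):  # action is -1 or +1
          self.pos += action + random.choice([-1, 0, 1])  # stochastic dynamics
          return self.pos, -abs(self.pos)  # reward for staying near the origin

  def rollout(env, policy, steps=1000):
      """Each call generates a brand-new trajectory of (obs, action, reward, next_obs)."""
      obs, traj = env.reset(), []
      for _ in range(steps):
          action = policy(obs)
          next_obs, reward = env.step(action)
          traj.append((obs, action, reward, next_obs))
          obs = next_obs
      return traj

  data = rollout(NoisyWalk(), policy=lambda obs: random.choice([-1, 1]))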

I'm not sure I agree that there is a distinction between scaling laws and a model being "capable of scale." RNNs are Turing complete, so from that perspective they should in theory be sufficient for AGI. But of course they are not, because their scaling with regard to network depth and sequence length is abysmal. LLMs do scale with depth and sequence length, but if their scaling laws with regard to dataset size prevent us from training them adequately, then we are stuck nonetheless.

I haven't heard of any groups that are studying data-constrained learning in the context of LLMs, but that will probably change as models get bigger. And at that point, architectures with better scaling laws may be right around the corner, or they may not. That's the pain of trying to project these things into the future.


The scaling laws for LLMs depend heavily on the quality of data. For example, if you add an additional 100 GB of data but it only contains the same repeating word, that will hurt the model. If you add 100 GB of completely random words, that will also hurt the model. Between these two extremes (low and high entropy), human language has a certain amount of natural entropy that helps the model gauge the true co-occurrence frequency of words in a sentence. The scaling laws for LLMs aren't just a reflection of the model but also of the conditional entropy of human-generated sentences.
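A toy way to see the two extremes (at the unigram level, ignoring context): the repeated-word corpus has zero entropy per token, uniformly random words sit at the maximum, and natural language lands in between with co-occurrence structure the model can actually learn. A quick sketch, purely illustrative:

  import math
  import random
  from collections import Counter

  def unigram_entropy(tokens):
      """Per-token Shannon entropy in bits, estimated from unigram counts."""
      counts = Counter(tokens)
      total = len(tokens)
      return -sum((c / total) * math.log2(c / total) for c in counts.values())

  # One repeated word: 0 bits per token, nothing for the model to learn.
  print(unigram_entropy(["the"] * 100_000))

  # Uniformly random words from a 1,000-word vocabulary: close to
  # log2(1000) ~ 10 bits per token, but the co-occurrence statistics are pure noise.
  vocab = [f"w{i}" for i in range(1_000)]
  print(unigram_entropy([random.choice(vocab) for _ in range(100_000)]))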

RL is such a different field that you can't apply these scaling laws directly. E.g., agents playing tic-tac-toe and checkers would stop scaling at a very low ceiling.


One possible risk I see is that, with the amount of model-generated text out there, we will at some point inevitably end up feeding the output of one model into another unless the source of the text is meticulously traced. (My assumption is that that would hurt the model you are trying to train as well.)