The Matrix: A Bayesian learning model for LLMs (arxiv.org)
139 points by stoniejohnson 59 days ago | 10 comments



Conclusion from the paper is:

In this paper we present a new model to explain the behavior of Large Language Models. Our frame of reference is an abstract probability matrix, which contains the multinomial probabilities for next-token prediction in each row, where the row represents a specific prompt. We then demonstrate that LLM text generation is consistent with a compact representation of this abstract matrix through a combination of embeddings and Bayesian learning. Our model explains (the emergence of) In-Context learning with the scale of LLMs, as well as other phenomena like Chain of Thought reasoning and the problem with large context windows. Finally, we outline implications of our model and some directions for future exploration.
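
To make that framing concrete, here's a toy sketch of the "matrix" view: one row per prompt, each row a multinomial over the next token. The vocabulary, prompts, and sizes are made up for illustration, not from the paper:

    import numpy as np

    # Toy version of the paper's abstract matrix: one row per prompt,
    # each row a multinomial distribution over the next token.
    vocab_size = 5                       # hypothetical tiny vocabulary
    prompts = ["the cat", "the dog", "a cat"]

    rng = np.random.default_rng(0)
    # Each row is drawn from a Dirichlet, so it sums to 1 like a multinomial.
    matrix = rng.dirichlet(np.ones(vocab_size), size=len(prompts))

    row = matrix[prompts.index("the cat")]
    next_token = rng.choice(vocab_size, p=row)  # sample the next token for a prompt

The paper's claim, as I read it, is that an LLM is a compact (embedding-based, Bayesian) representation of this astronomically large matrix.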

Where does the "Cannot Recursively Improve" come from?


Looks like someone edited the title of this thread. I assume "Cannot Recursively Improve" was in the old one?


Yeah, looks like it. This is better!


Theoretically this sounds great. I would worry about scalability issues with the Bayesian learning model's practical implementation when dealing with the vast parameter space and data requirements of state-of-the-art models like GPT-3 and beyond.

Would love to see practical implementations on large-scale datasets and in varied contexts. I liked the use of Dirichlet distributions to approximate any prior over multinomial distributions.
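
For context, the Dirichlet is conjugate to the multinomial, so the Bayesian update is just adding observed counts to the prior. A minimal sketch with made-up counts (not from the paper):

    import numpy as np

    # Dirichlet prior over a 4-outcome multinomial (mixtures of
    # Dirichlets can approximate any prior over the simplex).
    alpha = np.ones(4)                  # uniform prior
    counts = np.array([7, 1, 0, 2])     # hypothetical observed next-token counts

    posterior_alpha = alpha + counts    # conjugate update: just add counts
    posterior_mean = posterior_alpha / posterior_alpha.sum()
    print(posterior_mean)               # smoothed next-token probabilities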


I didn't read through the paper (just the abstract), but isn't the whole point of the KL divergence loss to get the best compression, which is equivalent to Bayesian learning? I don't really see how this is novel, like I'm sure people were doing this with Markov chains back in the 90s.
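
For the curious: a bigram Markov model with add-one smoothing is exactly the posterior mean under a uniform Dirichlet prior, i.e. Bayesian learning of next-token probabilities. Toy sketch (my own, not from the paper):

    from collections import Counter, defaultdict

    # Bigram "LLM": count next-token transitions, then apply add-one
    # (Dirichlet) smoothing -- the posterior mean under a uniform prior.
    text = "the cat sat on the mat the cat ran".split()
    vocab = sorted(set(text))

    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1

    def next_token_probs(prev):
        total = sum(counts[prev].values()) + len(vocab)  # +1 per vocab word
        return {w: (counts[prev][w] + 1) / total for w in vocab}

    print(next_token_probs("the"))  # smoothed next-token distribution after "the"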


In fact, it is nothing new.


The title of the paper is actually `The Matrix: A Bayesian learning model for LLMs` and the conclusion presented in the title of this post is not to be found in the abstract... Just a heads up y'all.


I don’t really care, I just want some vague reassurance that we’re probably not on the verge of launching Skynet…


Completely editorialized title. The article talks about LLMs, not transformers.


[flagged]


Could you please stop posting unsubstantive comments and flamebait? It's not what this site is for, and destroys what it is for.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.



