The information wasn't removed: it was *moved*. Whenever a human remembers somet...

og_kalu · on Nov 16, 2023

The comment was just to tell a fascinating story of the conceptual origins of what we have today. But the predictor Claude imagined actually works quite a bit differently than what we have today.

Yes Shannon argued meaning and semantic wasn't necessary but today, we know that our language models develop meaning and semantics. We know they build models. We know they try to model the casual processes that generate this data and implicit structure that was never explicitly stated in the text can find themselves emerging in the inner layers.

>LLMs do not use logic at all. The logic of invalidation is missing.

This is a fascinating idea that i see that just doesn't square with reality. In fact, this is all they do. What do you imagine training to be ?

Prediction requires a model of some sort. It need not be completely accurate, or how you imagine it. But to performantly make predictions, you must model your data in some way.

The important bit here is that the current paradigm doesn't just stop at that. Here, the predictor is learning to predict.

We have some optimizer that is tirelessly working to reduce loss. But what does a reduction in loss of internet skill data distribution mean?

It means better and better models of the data set. Every single time a language model fails a prediction, it's a signal to the optimizer that the current model is incomplete, insufficient in some way, work needs to be done, and work will be done, bit by bit. The models in a LLM at any point in time, A, is different from the models at any point in time, B during the training process but it's not a random difference. It's a difference that trends in the direction of a more robust worldview of the data.

This is why language model don't bottleneck on some arbitrary competence level humans like to shoe-horn it on.

There is a projection of the world in text. Text is the world and the language model is very much interacting with it.

The optimizer may be dumb but this restructuring of neurons to better represent the world as seen in the text is absolutely happening.

thomastjeffery · on Nov 16, 2023

> >LLMs do not use logic at all. The logic of invalidation is missing.

> This is a fascinating idea that i see that just doesn't square with reality. In fact, this is all they do. What do you imagine training to be ?

I could have been more clear, but I didn't want to write a novel. The ambiguity here is what they invalidate: memory reconstructions, not logical assertions.

An LLM can't tell the difference between fact and fiction, because it can't apply logic.

Better memory will never suddenly spawn itself the feature to think objectively about that memory. LLMs improve, yes, but they didn't start as a poor-quality equivalent to human thought. They started out as a poor quality equivalent to human memory.

> There is a projection of the world in text. Text is the world and the language model is very much interacting with it.

The language model becomes that world. It does not inhabit it. It does not explore. It does not think, it only knows.

og_kalu · on Nov 16, 2023

>An LLM can't tell the difference between fact and fiction, because it can't apply logic.

Not true. They can differentiate it just fine. Of course being able to tell the difference and being incentivized to communicate it are 2 different things.

GPT-4 logits calibration pre RLHF - https://imgur.com/a/3gYel9r

Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback - https://arxiv.org/abs/2305.14975

Teaching Models to Express Their Uncertainty in Words - https://arxiv.org/abs/2205.14334

Language Models (Mostly) Know What They Know - https://arxiv.org/abs/2207.05221

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets - https://arxiv.org/abs/2310.06824

thomastjeffery · on Nov 16, 2023

> Not true. They can differentiate it just fine.

I'm not convinced. I think the reality has been lost in its presentation.

It's always a human who makes the assertion of truthiness. The LLM does not make determinations or assertions: it only remembers them.

When we "train" an LLM, we structure it along our intentions. It never restructures itself. It never acts.