What the model is doing in latent space is auxiliary to anthropomorphic interpretations of the tokens, though. And if the latent reasoning matched a ground-truth procedure (A*), we'd expect it to be projectable onto semantic tokens, but it isn't. So it seems the model has learned an alternative method for solving these problems.
You’re thinking about this as though the final layer of the model is all that exists. It’s highly likely that reasoning is happening at a lower layer, in a different latent space that can’t natively be projected into logits.
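To make "projected into logits" concrete: the usual way to check this is a logit-lens style readout, decoding each layer's residual stream through the model's final LayerNorm and unembedding matrix. Here's a minimal sketch, assuming a GPT-2-style Hugging Face model and an illustrative prompt (not the paper's setup); intermediate layers often decode to unrelated tokens even when the final layer is coherent.

```python
# Logit-lens sketch: decode each layer's hidden state at the last position.
# Model and prompt are illustrative assumptions, not taken from the paper.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The maze solver moves from the start cell toward"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding output; [1..n] are the transformer blocks.
for layer, h in enumerate(out.hidden_states):
    # Project the last position's residual stream through the final LayerNorm
    # and the unembedding matrix -- the "logit lens" readout.
    logits = model.lm_head(model.transformer.ln_f(h[:, -1, :]))
    top = tok.decode(logits.argmax(-1))
    print(f"layer {layer:2d}: top token = {top!r}")
```

If the lower layers are doing their work in a basis the unembedding doesn't align with, this readout tells you little, which is the point: absence of a sensible token-level projection doesn't imply absence of structured computation.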
Exactly, the traces lack semantics and shouldn't be anthropomorphized. (I'm one of the students in the lab that wrote this, but not one of the authors)
Thanks! So, how does this impact Deliberative Alignment[1], where IIUC the intermediate tokens are assessed (e.g. for referencing the appropriate policy fragment)?
Do you see your result as putting that paradigm in question, or does the explicit assessment of the reasoning perhaps ameliorate the issue?