
I think the wildest thing is actually Meta’s latest paper, where they show a method for LLMs to reason not in English but in latent space:

https://arxiv.org/pdf/2412.06769

I’ve done research myself adjacent to this (mapping parts of a latent space onto a manifold), but this is a bit eerie, even to me.
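For anyone curious how that looks mechanically: my reading of the paper is that, at each "reasoning" step, the last hidden state is fed straight back in as the next input embedding instead of being decoded into a token. A rough sketch of that loop, using an off-the-shelf GPT-2 via Hugging Face purely as a stand-in (it hasn't been trained for this, so the output is meaningless; the point is just the mechanics, not the authors' actual training setup):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    prompt = "2 + 3 * 4 ="
    ids = tok(prompt, return_tensors="pt").input_ids
    embeds = model.get_input_embeddings()(ids)

    n_latent_steps = 4  # "thoughts" taken in latent space, never decoded to tokens
    with torch.no_grad():
        for _ in range(n_latent_steps):
            out = model(inputs_embeds=embeds, output_hidden_states=True)
            last_hidden = out.hidden_states[-1][:, -1:, :]  # final layer, last position
            # Feed the continuous state back in as the next "token" -- no sampling,
            # no detour through the vocabulary.
            embeds = torch.cat([embeds, last_hidden], dim=1)

        # Only at the very end drop back to token space to read out an answer.
        next_id = model(inputs_embeds=embeds).logits[:, -1, :].argmax(-1)
    print(tok.decode(next_id))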






Is it "eerie"? LeCun has been talking about it for some time, and may also be OpenAI's rumored q-star, mentioned shortly after Noam Brown (diplomacybot) joining OpenAI. You can't hill climb tokens, but you can climb manifolds.

I wasn’t aware of others attempting manifolds for this before - just something I stumbled upon independently. To me the “eerie” part is the thought of an LLM no longer using human language to reason - it’s like something out of a sci-fi movie where humans encounter an alien species that thinks in a way humans cannot even comprehend due to biological limitations.

I am hopeful that progress in mechanistic interpretability will serve as a healthy counterbalance to this approach when it comes to explainability, though I kinda worry that at a certain point something resembling a scaling law may put an upper bound on even that.


Is it really alien or is it more similar to how we think? We don't think purely in language, it's more a kind of soup of language, sounds, images, emotions and senses that we then turn into language when we communicate with each other.

I remember the (apocryphal?) story about Facebook's chatbots developing a pidgin to communicate with other chatbots. Every layer of the NN except the first and last already "thinks" in latent space, so is this surprising?

> You can't hill climb tokens, but you can climb manifolds.

Could you explain this a bit please?


I imagine he means that when you reason in latent space, the final answer is a smooth function of the parameters, so you can use gradient descent to directly optimize the model to produce a desired final output without knowing the correct reasoning steps to get there.

When you reason in token space (like everyone is doing now), sampling after each token is a discrete, non-differentiable step, so you have to use some kind of reinforcement learning algorithm to learn the weights.
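A toy illustration of that difference (my own sketch, nothing to do with the paper): keep the chain of thought as continuous vectors and gradients flow end to end; insert an argmax/sample over a vocabulary and the chain breaks, which is why people reach for RL-style estimators.

    import torch

    torch.manual_seed(0)
    W = torch.randn(8, 8, requires_grad=True)  # stand-in for model parameters
    E = torch.randn(16, 8)                     # stand-in embedding table (vocab of 16)
    h0 = torch.randn(1, 8)                     # initial hidden state

    # Latent-space "reasoning": iterate on the continuous state.
    z = h0
    for _ in range(3):
        z = torch.tanh(z @ W)
    z.pow(2).sum().backward()
    print("latent-space grad norm:", W.grad.norm().item())  # nonzero: backprop works

    # Token-space "reasoning": commit to a discrete token at each step.
    W.grad = None
    z = h0
    for _ in range(3):
        logits = torch.tanh(z @ W) @ E.t()
        next_tok = logits.argmax(dim=-1)  # discrete choice: gradient stops here
        z = E[next_tok]                   # next state is just a looked-up embedding
    print("token-space z requires grad:", z.requires_grad)  # False: the chain is
    # broken, so you need REINFORCE / RL-style training instead of plain backprop.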


Links to Yann LeCun's lecture:

Title: "Objective Driven AI: Towards Machines that can Learn, Reason, and Plan"

Lytle Lecture Page: https://ece.uw.edu/news-events/lytle-lecture-series/

Slides: https://drive.google.com/file/d/1e6EtQPQMCreP3pwi5E9kKRsVs2N...

Video: https://youtu.be/d_bdU3LsLzE?si=UeLf0MhMzjXcSCAb


It's just concept space. The entire LLM works in this space once the embedding layer is done. It's not really that novel at all.

Kinda how we do it. Language is just an I/O interface (but also neural, obv) on top of our reasoning engine.

It’s not just a protocol buffer for concepts, though (weak Sapir-Whorf, Lakoff’s ubiquitous metaphors). Language itself is also a concept layer, and plasticity and concept development are bidirectional. But (I’m not very versed in the terminology here re ‘latent space’) I would imagine the forward pass through the layers converges towards near-token matches before output, so you get something very similar to token/language reasoning even in latent/conceptual reasoning? Like the neurons that respond almost exclusively to a single token, for example.
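One way to poke at that intuition is a "logit lens"-style probe: project each layer's hidden state through the output head and see how early the residual stream starts pointing at the eventual token. Rough sketch with GPT-2 as a stand-in (layer-norm details vary by architecture, so treat this as illustrative only):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    ids = tok("The capital of France is", return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, output_hidden_states=True)
        unembed = model.get_output_embeddings().weight  # (vocab, hidden)
        ln_f = model.transformer.ln_f                   # GPT-2's final layer norm

        for layer, h in enumerate(out.hidden_states):
            last = ln_f(h[:, -1, :])                    # last position, normalized
            top = (last @ unembed.t()).argmax(-1)
            print(f"layer {layer:2d} -> {tok.decode(top)!r}")

In the small models people have probed this way, the later layers tend to land on or near the eventual output token well before the final layer, which at least rhymes with the "converges towards near-token matches" guess.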

There are lots of papers that do this.


