Emerging reasoning with reinforcement learning

krackers · 2025-01-26T07:49:24 1737877764

The real thing that surprises me (as a layman trying to get up to speed on this stuff) is that there's no "trick" to it. It really just does seem to be a textbook application of RL to LLMs.

Going from a base LLM to human instruction-tuned (SFT) ones is definitely an ingenious leap where it's not obvious that you'd get anything meaningful. But when we quickly saw afterwards that prompting for chain of thought improved performance, why wasn't this the immediate next step that everyone took. It seems like even after the release of o1 the trick wasn't apparent to everyone, and if it wasn't for DeepSeek people still might not have realized it.

NitpickLawyer · 2025-01-26T09:00:30 1737882030

> why wasn't this the immediate next step that everyone took.

It was actually tested by various labs. Just probably not at this scale. The first model that featured RL prominently was DeepSeek-math-7b-RL, published last year in april. It was at the time the best model for math, and remained so until the qwen2.5-math series, that probably had way more data put into them.

There's a thing about RL that makes it tricky - the models tend to behave very stubbornly. That is, if they see something that resembles their training method (i.e. math problems), they'll solve the problem, and they'll be good at it. But if you want something close to that but not quite solving it (i.e. analyse this math problem and write hints, or here are 5 problems extract the common methods used for solving, etc.) you'll see that they perform very poorly, often times just going straight into "to solve this problem we...".

This is even mentioned in the R1 paper. Poor adherence to prompts, especially ssytem prompts. So that is still challenging.

eden-u4 · 2025-01-26T16:01:53 1737907313

I think the issue with RL is that, in order for a model to perform well in a task, you have to make it stubborn. In the same way a student that thinks outside the scope of the task might not perform well in a graded exam, but that does not mean he/she is a bad reasoner. With RL and all training procedure you are creating a very focused and very fit to the task thinker, which might not be useful in all applications (consider an open problem, it might need an out of the box kind of thought).

HarHarVeryFunny · 2025-01-26T14:32:54 1737901974

Chain of thought prompting ("think step by step") only encourages the model to break the problem into steps, which allows it to incrementally build upon each step (since the output is fed back in as part of the input).

Reasoning requires more than chain of thought, since it's often not apparent what the next step should be - you (human, or model) may go down one path of reasoning only to realize it's going nowhere, and have to back up and try something else instead. This ability to "back up" - to realize that an earlier reasoning "step" was wrong and needs to be rethought is what was mostly missing from models that (unlike o1, etc) hadn't been trained for reasoning.

The reason non-reasoning models can't reason appears to be because this type of chain-of-consciousness thought (thinking out loud, mistakes and all) when trying to figure out a problem is hugely underrepresented in a normal training set. Most writing you find on the internet, or other sources, is the end result of reasoning - someone figured something out and wrote about it - not the actual reasoning process (mistakes and all) that got them there.

It's still not clear what OpenAI had to do, if anything, to help bootstrap o1 (special hand-created training data?), but basically by using RL to encourage certain types of reasoning pattern, they were able to get the model to back-up and self-correct when needed. DeepSeek-R may well have used o1 reasoning outputs as a bootstrap, but have been able to replicate RL training to encourage self-correcting reasoning in the same way.

One interesting aspect of DeepSeek-R is that they have shown that once you have a reasoning model, you can run it and use it to generate a bunch of reasoning outputs that can then be used as normal training data to fine-tune a non-reasoning model, even a very small one. This proves that, at least to some degree, the reason non-reasoning models couldn't reason is just because they had not been trained on sufficient self-correcting reasoning examples.

naasking · 2025-01-26T16:23:31 1737908611

> since it's often not apparent what the next step should be

Backtracking assumes depth-first search, which isn't strictly needed as you could explore all possible options in parallel in a breadth-first manner, but incrementally until one branch returns a satisfactory answer.

> This proves that, at least to some degree, the reason non-reasoning models couldn't reason is just because they had not been trained on sufficient self-correcting reasoning examples.

For sure this is a big reason, and probably also part of the reason they hallucinate rather than say they don't know or aren't sure.

HarHarVeryFunny · 2025-01-26T17:06:52 1737911212

> Backtracking assumes depth-first search, which isn't strictly needed as you could explore all possible options in parallel in a breadth-first manner

You could in theory, but it'd be massively/prohibitively more expensive - exploring a whole tree just to end up using a single branch. It'd be like trying to have a computer play chess by evaluating EVERY branching position out to some fixed depth, rather than using MCTS to prune away unpromising lines.

Not only that, but reasoning in general (of which LLM-based reasoning is only a limited case) isn't just about search - it can also require exploration and acquisition of new knowledge if you run into an impasse that your past experience can't handle. If AI systems hope to achieve human-level AGI, they will need to change to support continuous learning and exploration as a part of reasoning, which naturally means more of a depth-first approach (continue until you hit an impasse) with backtracking.

You can say that hallucination is due to gaps in the training set, but of course humans don't have that problem because we know what we know, and have episodic memories of when/where/how we learned things. LLMs have none of that.

mountainriver · 2025-01-27T02:54:29 1737946469

This was my takeaway as well, the paper was so simple I was shocked by it. We’ve been doing RL on LLMs for awhile now and it’s more surprising this didn’t happen sooner

qnleigh · 2025-01-26T08:03:21 1737878601

I've wondered this too, I really hope someone with more knowledge can comment. My impression is that people worked on this kind of thing for years before they started seeing a 'signal' i.e. that they actually got RL working to improve performance. But why is that happening now? What were the tricks that made it work?

attentionmech · 2025-01-26T12:05:07 1737893107

If you check failure section of their paper, they also tried other methods like MCTS and PRM which is what other labs have been obsessing about but couldn't move on from (that includes bigshots). Only team which I am aware which tried verifiable rewards is tulu but they didn't scaled it up and just left it there.

This sort of thing imo is similar to what openAI did with transformer architecture i.e. google invented it but couldn't scale it in the right direction and deepmind got busy with atari games. They had all the pieces still openai could do it. It seems to be it comes down to research leadership in what methods to choose to invest in. But yeah, the budgets big labs have, they can easily try 10 different techniques and brute force it all but seems like they are too opinionated in methods and less urgent on outcomes.

[paper] https://arxiv.org/pdf/2501.12948 [tulu] https://x.com/hamishivi/status/1881394117810500004

attentionmech · 2025-01-27T04:14:11 1737951251

I found the following thread more insightful than my original comment (wish I could edit that one). A research explains why RL didn't work before this: https://x.com/its_dibya/status/1883595705736163727

logicchains · 2025-01-26T08:06:18 1737878778

DeepSeek only recently invented GRPO, it's possible that was the final missing piece needed to make it viable.

nialv7 · 2025-01-26T08:26:29 1737879989

The group in this article used straight and simple PPO, so I guess GRPO isn't required.

My hypothesis is that everyone was just so stunned by oai's result so most just decided to blindly chase it and do what oai did (i.e. scaling up). And it's only after o1 people started seriously trying other ideas.

krackers · 2025-01-26T08:34:43 1737880483

I don't have any intuition here and am in no way qualified, but my read of the paper was that GRPO was mainly an optimization to reduce cost & GPUs when training (by skipping the need to keep another copy of the LLM in memory as the value network), but otherwise any RL algorithm should have worked? I mean it seems R1 uses outcome rewards only and GRPO doesn't do anything special to alleviate reward sparsity, so it feels like it shouldn't affect viability too much.

Also on the note of RL optimizers, if anyone here is familiar with this space can they comment on how the recently introduced PRIME [1] compares to PPO directly? Their description is confusing since the "implicit PRM" they introduce which is trained alongside the policy network seems no different from the value network of PPO.

[1] https://github.com/PRIME-RL/PRIME

attentionmech · 2025-01-26T08:13:19 1737879199

the tulu team saw it. but, yes nobody like scaled it to the extent deepseek did. I am surprised that the faang labs which have the best of the best didn't see this.

meiraleal · 2025-01-26T09:22:56 1737883376

> I am surprised that the faang labs which have the best of the best didn't see this.

After so many layoff rounds, they might have got stuck with the best at avoiding it.

__jl__ · 2025-01-26T13:06:42 1737896802

How do we know that they didn't see it? Their work is much more secret now. Isn't it possible that o1 and o3 rely on something similar maybe with some additions. Same for the gemini thinking models.

My point it that OpenAI and google might have been working with very similar approaches for months.

xbmcuser · 2025-01-26T08:30:34 1737880234

I think a lot of it had to do with DeepSeek need to use as fewer resources as possible why did it do this how can it do it in fewer steps using fewer resources. Where as most of the FAANG were looking at throwing more data and processing power at it.

logicchains · 2025-01-26T08:05:31 1737878731

I wonder if OpenAI did the same thing, or they instead took the approach of manually building an expensive, human-designed supervised learning dataset for reasoning. If the latter, they must be really kicking themselves now.

nialv7 · 2025-01-26T08:29:08 1737880148

I'd bet $5 that o1 was also built with either RL or search, or a combination of the two. That was what I initially thought when they announced o1-preview, after I saw the sample reasoning traces.

But alas I am just an ML enthusiast, not a member of some lab with access to GPUs.

ninetyninenine · 2025-01-26T05:55:57 1737870957

There was a whole bunch of people who claimed LLMs can't reason at all and that everything is a regurgitation. I wonder what they have to say about this. Like, what exactly is going on here with chain of thought reasoning from their expert perspective?

teej · 2025-01-26T06:41:36 1737873696

My mental model for chain-of-thought is not “reasoning”. It’s more of an iterative search through the latent space of the model.

qnleigh · 2025-01-26T07:36:32 1737876992

Can you elaborate? I think this is a really interesting question. It comes up over and over again, but it often feels like the two sides of the debate talk past each other. What does the mental model of 'iterative search through latent space' convey that 'reasoning' doesn't? Human reasoning also often searches through a space of potential solution methods and similar problems, and keeps applying them until making progress.

I appreciate that there might be danger in using words like 'thinking' and 'reasoning' in that they cause us to anthropomorphize LLMs, but if we are careful not to do so then this is a separate issue.

naasking · 2025-01-26T16:44:13 1737909853

"Search" a class of fairly well defined algorithms, "reasoning" is vague / ambiguous. If reasoning can be reduced to some kind of search, that makes its meaning more precise and understandable.

qnleigh · 2025-01-26T20:24:26 1737923066

Sure one word is more precise than the other, but in the context of this question that's beside the point. If one makes the claim 'it's not reasoning, it's just X,' the burden of proof is that X is district from reasoning. That 'reasoning' is an ambiguous term, long makes it harder to argue that X is not compatible with. I was pointing out that, at least superficially, they seem potentially compatible.

Again I'm not at all trying to argue that LLMs do reason. I'm trying to understand how one definitively argues that they don't.

valine · 2025-01-26T07:15:23 1737875723

And human reasoning is somehow more magical? Really struggling to understand the distinction between searching through a latent space and “reasoning”.

sgt101 · 2025-01-26T08:26:35 1737879995

It would be startling if humans could reason this way with two kilos of meat consuming 40w. Even more surprising given that we get answers in ~1s with a cross brain comms time of about 0.25s

To me it's clear that human reasoning is different from a massive search of a latent space. I can say that when I am thinking I maybe try half a dozen ideas or scenarios, but very rarely more. I can't say where those ideas come from or how I make them up though. Maybe we can't frame what it is and how it works with human languages though, which might make it seem magical in some way.

Or maybe there's a good framing that I don't know - would love to learn!

naasking · 2025-01-26T16:39:26 1737909566

> I can say that when I am thinking I maybe try half a dozen ideas or scenarios, but very rarely more

You mean you consciously try half a dozen ideas. That's not really reflective of what's happening subconsciously though, which could be a broader kind of search that filters out other possibilities and only bubbles up some options you would consider promising.

daveguy · 2025-01-26T16:54:42 1737910482

It could be a matter of massively parallel computation (trillions of synapses functioning at once) plus efficiency of chemical-analog vs transistor. But, personally I believe, there are a lot more algorithmic issues between what we are doing now and what the brain is doing. Even if it's just a matter of scale, we are at least a million fold away from the bandwidth and processing power of a single human brain vs exaflop supercomputers, especially in efficiency. Closing that gap, will require 20 years of Moore's law + bandwidth + efficiency + equivalent of 18 years grounded training (probably faster than 18y). We are not close to having our most powerful computers as flexible and general purpose intelligence as a single human. The 20 year limitation is based on processing, efficiency, and bandwidth. Hopefully the algorithms will catch up by then.

valine · 2025-01-27T05:25:27 1737955527

Even if everything you said is true it doesn’t mean general AI is 20 years away. Computers are many orders of magnitude faster at non-parallel tasks. Who’s to say raw clock speeds won’t easily compensate for any deficiencies vs the brain.

You can run a trillion parameter model on a single machine today. We’re maybe 4 or 5 years away from that same 1T model running on an iPhone.

numba888 · 2025-01-26T08:46:17 1737881177

Yes, it has at least two distinct parts: consciousness and sub. First is visible to 'us', it's inner monologue or vision, or other senses. But we don't 'know' what happens in subconsciousness. The answer just pops up from nowhere. 'Reasoning' LLMs for now have the first part. The 'sub' part is questionable, depends how you look at latent space.

Exoristos · 2025-01-26T07:25:57 1737876357

As you're demonstrating, it depends on the human.

error_logic · 2025-01-26T08:52:52 1737881572

Indeed, some of us are more willing to believe ourselves somehow special and block out our own irrationalities and missteps.

There is a deep need in some people to ignore their own flaws so they can present themselves as more competent. More reasonably, people try to have a thought ready before starting to talk, or will stop and think again if they notice a contradiction in what they're saying.

Other people will keep right on going and double down if they've made a mistake, refusing to acknowledge the opportunity for learning and blocking out inconvenient contradiction.

Interestingly this is itself quite relevant to the AI alignment problem. Competence requires being able to accurately reflect [on] the patterns, whereas goal-seeking and morality require ignoring some options as costly/dangerous distractions--but not to the point of incompetence. Balancing the two is tricky, and neither people nor governments have solved this universally, let alone an AI.

exe34 · 2025-01-26T07:48:39 1737877719

no, humans do the same thing - even the "intuition" of things to try are probably the results of searches, but we don't have conscious access to them. certainly it's the simplest explanation that fits the observations.

sgt101 · 2025-01-26T08:27:21 1737880041

Given the observations of the workings of brains though (slow, low energy) we're back to some magic to make this happen?

exe34 · 2025-01-26T09:21:27 1737883287

does a machine with ten trillion components need magic to function? maybe. but probably not.

dyauspitr · 2025-01-26T16:11:53 1737907913

That sounds like Sheldon from BBT describing human reasoning.

csomar · 2025-01-26T09:46:57 1737884817

They sure do have a certain amount of "reasoning".

Here is R1 trying to multiply a large number (successfully): https://gist.github.com/omarabid/038678cc269a3f2db756a7e0825...

If you pick a random combination, there is a very good chance that the combination and the product do not exist anywhere. So the LLM has to "create" it somehow.

It sure goes through a lot (hundreds of lines of self-reflection) but it successfully does the math.

I don't think it is the same kind of "reasoning" as humans, but there is an emergent kind of structure happening here that is allowing for this reasoning.

mkl · 2025-01-26T11:16:33 1737890193

I think it is very human-like reasoning. I reason exactly like this when doing numerical calculations in my head, and I'm a mathematician (no, I can't work with numbers this big in my head).

It's quite funny where late in the piece it says it's checking with a calculator, which a human would do if possible (if they didn't start out with that) but then its statements are pretty much the same as before, and it probably didn't actually use a calculator.

Jensson · 2025-01-26T10:18:18 1737886698

That is like saying a calculator is reasoning since it spends many cycles thinking about the problem before answering. You could say yes a calculator is reasoning, but most would say a calculator isn't really reasoning.

> there is an emergent kind of structure happening here that is allowing for this reasoning.

Yes, but that structure doesn't need to be more complicated than a calculator. The complicated bit is the fuzzy lookup that returns fuzzy patterns (every LLM does that though so its old), and then how it recourses that, but seeing how that can result in it recursing into a solution for a multiplication is no harder than understanding how a calculator works.

So to add basic reasoning like this you just have to add "functions" for the LLM to fuzzy lookup, and then when looking up those "functions" it will eventually recourse into a solution. The hard part is finding a set of such functions that solves a wide range of problems, and that was solved here.

daveguy · 2025-01-26T17:02:28 1737910948

I think that demonstrates how far we are from reasoning and from true self-reflection. If either were happening it would know that it has the capability to multiply four numbers in nanoseconds and know that it doesn't even matter the order it multiplies them. The first reasoning step, I have 4 numbers to multiply, should be the only one necessary.

Vampiero · 2025-01-26T11:11:06 1737889866

The test is very simple but people simply don't realize it.

When LLMs are good at Prolog, it means they're good at logic, which means they're good at reasoning. Until then, you can't trust them.

daveguy · 2025-01-26T17:06:08 1737911168

What exactly do you mean by "good at Prolog"?

Vampiero · 2025-01-26T17:22:54 1737912174

It means being good at first order predicate logic. And possibly higher order too when you consider `call/n` and lambdas. It means being good at generalization, at reasoning in causal terms, at understanding structure and grammar, at encoding problems as graphs and querying them for solutions, and much more.

Basically it's what current LLMs lack. They're good at spewing coherent text but they lack the building blocks of reason, which are made of logic, and which confer the quality of being consistent. A implies B.

calibas · 2025-01-26T06:37:19 1737873439

I don't get this whole debate, surely what's meant by "reason" can be strictly defined and measured? Then we can conclusively say whether or not it's happening with LLMs.

It seems to me like the debate is largely just semantics about how to define "reason".

ninetyninenine · 2025-01-26T06:41:18 1737873678

It's semantics. But there's a general motivation behind it that's less technical. Basically if it can reason, it implies human level intelligence. That's the line separating man from machine. Once you cross that line there's a whole bunch of economic, cultural and existential changes that happen to society that are permanent. We can't go back.

This is what people are debating about. Many many people don't want to believe we crossed the line with LLMs. It brings about a sort of existential dread. Especially to programmers who's pride is entirely dependent upon their intelligence and ability to program.

bookofjoe · 2025-01-26T13:47:55 1737899275

>Frontier AI systems have surpassed the self-replicating red line

https://arxiv.org/abs/2412.12140

TheTaytay · 2025-01-26T16:23:47 1737908627

I hadn’t seen this one. Fascinating. Thank you!

sgt101 · 2025-01-26T08:33:30 1737880410

We've had "reasoning" machines for a long time - I learned chess playing against a computer in the 1980's.

But we don't have reasoning that can be applied generally in the open world yet. Or at least I haven't seen it.

In terms of society it should be easy to track if this is true or not. Healthcare and elder care settings will be a very early canary of this because there is huge pressure for improvement and change in these. General reasoning machines will make a very significant, clear and early impact here. I have seen note taking apps for nurses - but not much else so far.

akomtu · 2025-01-26T20:32:30 1737923550

It's not intelligence that separates us from machines, but "connectedness to the whole." A machine becomes alive the moment it's connected to the whole, the moment it becomes driven not by an RNG and rounding errors, but by a spirit. Similarly, a man becomes a machine the moment he loses this connection.

The existential dread around AI is due to the fear that this machine will indeed be connected to a spirit, but to an evil one, and we'll become unwanted guests in a machine civilization. Art and music will be forbidden, for it "can't be reasoned about" in machine terms, the nature will be destroyed for it has to no value to the machines, and so on.

eastbound · 2025-01-26T07:47:58 1737877678

It’s not about being afraid, it’s that the auto-reconfiguration of neurons seems advanced to decompile it at this time, and it surprising that LLM, which are just a probabilistic model of guessing the next word, could be capable of actual thought.

The day it happens, we’ll believe it. There are only 100bn neurons in a brain, after all, and many more than this in modern machines, so it is theoretically possible. Just LLMs seemed too simple for that.

qnleigh · 2025-01-26T07:51:20 1737877880

I think it's really hard to pin down what reasoning is and measure it precisely. How on earth would you do this?

daveguy · 2025-01-26T17:11:50 1737911510

The best example I have seen for true AGI benchmarking is ARC-AGI:

https://arcprize.org/

As the ARC-AGI is more trained to we put more of our own expertise and knowledge into the algorithms. When a computer can get human level results on a new ARC-AGI like benchmark (transfer from other intelligence tasks), then we are very close.

qnleigh · 2025-01-26T20:13:20 1737922400

The creators of the dataset have said themselves that it does not imply AGI

> it is important to note that ARC-AGI is not an acid test for AGI – as we've repeated dozens of times this year. It's a research tool designed to focus attention on the most challenging unsolved problems in AI

https://arcprize.org/blog/oai-o3-pub-breakthrough

daveguy · 2025-01-26T22:08:08 1737929288

That was the whole point of the entire second paragraph.

littlestymaar · 2025-01-26T08:42:19 1737880939

> There was a whole bunch of people who claimed LLMs can't reason at all and that everything is a regurgitation. I wonder what they have to say about this.

I don't see that as a refutation of the former actually: model trained to be stochastic parrots with next-token prediction as only learning target were indeed stochastic parrots and now we've moved to a completely different technology that features reinforcement learning in its training so it will go farther and farther from stochastic parrots and more and more towards “intelligence”.

If anything, the fact that the entire industry has now moved to RL instead of just cramming through trillions of tokens to make progress is a pretty strong acknowledgement that the “stochastic parrots” crowd was right.

dartos · 2025-01-26T06:02:11 1737871331

You can still regurgitate a chain of thought response…

It’s all still tokens…

CamperBob2 · 2025-01-26T06:12:03 1737871923

You can still regurgitate a chain of thought response…

You people are so close to getting it. So close to understanding that you're the ones doing the regurgitating.

fmbb · 2025-01-26T06:22:09 1737872529

Why do you believe that is how humans reason?

mrbungie · 2025-01-26T06:39:43 1737873583

Neuroscience and evidence be damned, [the brain is a computer] is being transformed into [the brain is an LLM]. Happens with every new technology.

Edit: What is happening with people these days? They seem to be reducing people and their minds to machines more easily than ever.

ninetyninenine · 2025-01-26T06:54:48 1737874488

The brain is a neural net. And so is the LLM.

Jensson · 2025-01-26T07:11:05 1737875465

> The brain is a neural net

No it isn't, neural networks aren't a network of neurons, they are just named after neurons but they are nothing alike. Neurons grow new connections etc and do a whole slew of different things that neural networks doesn't do.

The ability to grow new connections seems like an integral part to intelligence that neural networks can't replicate, at least not without changing them to something very different than they are today.

ninetyninenine · 2025-01-26T17:18:31 1737911911

The brain is a neural net. It’s made up of neurons that are networked. You’re just pointing out differences between a LLM and a brain. The fact that they are both neural nets is a similarity.

Forming new connections Is not integral. Connections don’t form in seconds so second to second existence is a static neural network. Thus your second to second existence is comparable to an LLM in the sense that both neural networks don’t form new connections.

We can though. We can make neural networks form new connections.

Jensson · 2025-01-26T18:06:18 1737914778

> The brain is a neural net. It’s made up of neurons that are networked

The brain neurons are not the same kind of object as neural net neurons, hence its not a neural net. It is like saying a dragonfly is a helicopter since the helicopter were designed to look a bit like a firefly, no a dragonfly isn't a helicopter they fly in totally different ways the dragonfly has wings and thus don't even move similarly.

> Connections don’t form in seconds so second to second existence is a static neural network

But a human isn't intelligent over a second, and synapses do form over longer periods which is what it takes for humans to solve harder problems.

> We can though. We can make neural networks form new connections.

I said the neural network couldn't form those, not that we couldn't alter the neural network. That is a critical difference, our brains updates itself intelligently on its own, that is integral to its function, if our brains couldn't do that we wouldn't be intelligent, when that happens we say a person has Alzheimer's. You wouldn't hire a person with far gone Alzheimer's precisely because they can't learn.

Edit: So you can see in brains being able to form connections makes it smarter. In our neural nets when we try to alter their connections live they get dumber, which is why we don't do it. That should tell you that we are clearly missing something critical here.

dartos · 2025-01-27T02:12:39 1737943959

> The brain is a neural net. It’s made up of neurons that are networked

Fun fact. Neural nets are not made of neurons and more than the python programming language is made of snakes.

mrbungie · 2025-01-26T07:17:11 1737875831

It isn't that simple, the substrate is totally different. The brain is not a GPU, not by any stretch, even if it "runs something analogue to a LLM" inside.

azinman2 · 2025-01-26T07:18:15 1737875895

Only in name, given by people in machine learning. Really it’s more accurate to say they’re both networks, except one is a million times more complicated in its design.

ben_w · 2025-01-26T08:03:58 1737878638

These days it's looking like closer to a factor a thousand than a million.

Appearances can be deceptive and intelligence in brains may be more complex than they currently looks, but appearances are also the only thing most of us can judge it by at this point — while there's more stuff going on in living cells, nothing I've seen says the stuff needed to keep the cells themselves alive is directly contributing to the intelligence of the wet network in my skull.

It's quite surprising that such a simple mind as an LLM is so very capable. What is all the rest of our human brain doing, when the LLMs demonstrate that a mere cubic centimetre of brain tissue could be reorganised to speak 50 languages fluently, let alone all the other things?

mrbungie · 2025-01-26T08:14:27 1737879267

This post is full of speculation. For starters, human minds do more than just language. There are some arguments (Moravec's paradox) that posit that language is easy, while keeping the rest of the body working is the real hard problem. I've yet to see an humanoid robot controlled by an LLM that can tackle difficult physical problems like riding a bike or skiing in for the first time real-time, just as humans, that need to coordinate at least 5 senses (some say up to 20 but I digress) plus emotions and a train of thought to do so.

Plus, sure, LLMs are simpler (that only works if you don't count the full complexity of its substrate: a lot of GPUs, interconnects, Data Centers, etc), yet they consume an insane amount of energy and matter when compared to humans. It is surprising that LLMs are so capable, but for 2000kcal a day humans are still more impressive to me.

People see a model dominating language and start thinking the only thing humans do is high-level language and reasoning. That may be why the brilliant minds of this decade talk about replacing white-collars, but no plate-cleaning bot in sight.

Metacommentary: a lot of LLM-human equality apologists seem to have little knowledge of the AI field and its history, as these themes are not new at all.

ben_w · 2025-01-26T08:48:27 1737881307

> For starters, human minds do more than just language.

These days, so do the AI (even though these ones are still called LLMs, which I think is now a bad name even though I continue to use the term without always being mindful of this).

> I've yet to see an humanoid robot controlled by an LLM that can tackle difficult physical problems like riding a bike or skiing in for the first time real-time, just as humans, that need to coordinate at least 5 senses (some say up to 20 but I digress) plus emotions and a train of thought to do so.

1) I'd agree that AI are slow learners (as measured by number of examples rather than wall clock). For me, this is more significant than the difference in wattage.

2) 13 years ago, before the Transformer model: https://www.youtube.com/watch?v=mT3vfSQePcs

3) We don't use 5 senses to ride a bike; I can believe touch, vision, proprioception, and balance are all involved, but that's only 4. Why would hearing, smell, or taste be involved?

For emotion, my weakly-held belief is that this is what motivates us and tells us what the concept of "good" even is — i.e. it doesn't give us reasoning beyond being a motivation to learn to reason.

> It is surprising that LLMs are so capable, but for 2000kcal a day humans are still more impressive to me.

Our biological energy efficiency sure is nice, though progress with silicon is continuing even if the rate of improvement is decreasing: https://en.wikipedia.org/wiki/Koomey%27s_law

That said, I don't consider the substrate efficiency to be an indication of if the thing running on it is or isn't "reasoning" — if the same silicon was running a quantum chemistry simulation of the human brain it would be an even bigger power hog and yet I would say it "must be" reasoning, and conversely we use phrases such "turn your brain off and enjoy" to describe categories of film where the plot holes become pot holes if you stop to think about them (the novel I'm trying to write is due to me being nerd-sniped in this way by everything wrong with Independence Day).

> That may be why the brilliant minds of this decade talk about replacing white-collars, but no plate-cleaning bot in sight.

I mean, I've got a dishwasher already… ;)

But more seriously, even though I'm getting cynical about press releases that turn out to be smoke and mirrors, (humanoid) robots doing housework is "in sight" in at least the same kind of way as self driving cars (which has been a decade of "next year honest" mirages, so I'm not giving a timeline): https://www.youtube.com/watch?v=Sq1QZB5baNw

> Metacommentary: a lot of LLM-human equality apologists seem to have little knowledge of the AI field and its history, as these themes are not new at all.

I've been feeling the same, but on both sides, pro and con. The Turing paper laid out all the same talking points I've been seeing over the last few years, and those talking points weren't new when he wrote them down.

Jensson · 2025-01-26T09:51:19 1737885079

> For emotion, my weakly-held belief is that this is what motivates us and tells us what the concept of "good" even is — i.e. it doesn't give us reasoning beyond being a motivation to learn to reason.

I've found I work differently. Some technical problems like architecture can only be solved by aligning my emotions with the problem, then my primal brains find a good solution that matches my emotional wants.

But when my emotions doesn't care about the problem my intuition stops working to solve it, and then I can't find any solution.

Why would emotions be needed to solve problems? Because without emotions your can't navigate a complex solution space to find a good solution, its impossible. Meaning if we want an AI to make good architecture for example, we would need to make the AI have feelings about architecture such as what setups are good in different situation etc. Without those feelings the AI wouldn't be able to solve the problem well.

Then after you have came up with a solution using emotions you can then verify it using logic, but you wouldn't have came up with that solution in the first place without using your emotions. Hence emotions are probably needed for truly intelligent reasoning, as otherwise you can't properly guide yourself through complex solution spaces.

You could code something that fills the same role and call it something different than emotions, but if it works the same way and fills the same functions its still emotions.

(Using this definition then AlphaGo uses emotions to prune board states to check, stuff like that, it doesn't know those states are best to check, its just a feeling it has)

mrbungie · 2025-01-26T09:03:39 1737882219

1. For me transformer multimodal processing is still language, as the work is done via tokens/patches. In fact, that makes its image processing capabilities limited in some scenarios when compared to other techniques.

2. When on a bike you need to be full alert and hearing all the time or you might provoke an accident. Sure you can use headphones, but it is not recommended. Also, that robot you showed is narrow AI, it should be out of discussion when arguing about complex end-to-end models that are supposedly comparable to humans. If not, we could just code any missing capability as a tool, but that would automatically invalidate the "llms reason/think/process the same way as humans" argument.

3. Agree, I expect the same in the future. I'm talking about the present though.

4. ;) a robot that helps with daily chores can't come soon enough. More important than coding AI for me.

PS: Not sure if we're talking about the same argument anymore, I'm following the line of "LLMs work/reason the same as humans" not "AI can't perform as good as humans (now and in the future)"

ben_w · 2025-01-26T09:23:13 1737883393

> PS: Not sure if we're talking about the same argument anymore, I'm following the line of "LLMs work/reason the same as humans" not "AI can't perform as good as humans (now and in the future)"

Yeah, it's hard to keep track :)

I think we're broadly on the same page with all of this, though. (Even if I might be a little more optimistic on tokenised processing of images).

CamperBob2 · 2025-01-26T06:40:33 1737873633

You want to talk evidence, let's talk evidence. What does the brain do that these models don't (or, more generally, can't)?

Be specific.

And no, it doesn't "happen with every new technology." Nothing even remotely like this has been seen before the present decade.

sgt101 · 2025-01-26T08:48:20 1737881300

I'm no neuroscientist, but I hung out with a few, so here are some bits I picked up from them. In biological brians:

- neurons have signal surpression (-ive activation) as well as propagation (+activation)

- neurons appear to process and propagation due to unknown internal mechanisms.

- there are complex sub structures within biological neural nets where the architecture is radically different from in other sections of the network, strongly in contrast to the homogenous structure of ANN's

- many different types of neurons with different properties and behaviors in terms of network formation and network activity are present in BNN's in contrast to ANN's

- BNN's learn during processing. ANN's are static after training.

CamperBob2 · 2025-01-26T17:08:41 1737911321

Those are mostly structure-versus-function arguments, centered on what a brain is as opposed to what it does. Only the static nature of the current ML models seems like a valid point... and it's changing too.

I'd wager that future models (which admittedly may not look much like today's LLMs) will blur the lines between mutable context and immutable weights. When that happens, all bets are truly off.

daveguy · 2025-01-26T17:36:42 1737913002

> future models (which admittedly may not look much like today's LLMs) will blur the lines between mutable context and immutable weights.

"may not look much like today's LLMs" is really sweeping the whole point under the rug. The difference between what we are doing now with static LLM models and dynamic models will require algorithmic changes that have not yet been invented. That's in addition to the fact that the processing of a single neuron is completely parallel with respect to inputs. GPUs make them more parallel, but the analog devices we are trying to mimic are vastly more complicated than what we are using as a substitute.

CamperBob2 · 2025-01-26T19:08:35 1737918515

Whatever. This ( https://eqbench.com/results/creative-writing-v2/deepseek-ai_... ) is intelligence in action, from an entity with an internal world model that's at least as valid as mine.

Either that, or it's witchcraft.

(Either that, or somebody with both time and talent to waste is taking the piss and pretending to be an AI.)

If anyone ever does succeed in updating weights dynamically on a continuous basis, things are going to get really interesting really quickly. Even now, arguments based on relative complexity are completely invalid. A few fast neurons are at least as good at 'thinking' as a lot of slow ones are.

daveguy · 2025-01-26T22:06:42 1737929202

Hah, "Whatever." Seriously?

The writing is still derivative meandering pulp. No need to invoke magic. It's cool that it can make complete paragraphs though. LLMs are a huge breakthrough. They're just not intelligent.

Updating weights on a continual basis requires processing weights productively on a regular basis. That's many flops per weight away from where we are in processing and bandwidth. The computation limitations still apply.

mrbungie · 2025-01-27T05:16:46 1737955006

Metacommentary: I stopped arguing because this thread is reeking of AI bro non-sense, they can't argue with facts so they continue with vague statements and deflection (e.g like the "Whatever" you've got above). The GenAI bubble can't burst soon enough.

As I said, neuroscience be damned, we've got transformers now. (/s)

mrbungie · 2025-01-26T07:13:56 1737875636

You can't win this argument through logic less mere rhetoric. You need scientific evidence. Hint: There isn't.

Edit: Ok, let me be less harsh and play with your argument. Can an LLM have an acid/psilocybin/ketamine trip, distort its view of self and reality, then rewire some of its internal connections and come a little different based on the experience? I guess not, there are more examples and they all show that as far as we know LLMs are not minds/brains and viceversa even if they seem similar in a chat interface. (If you don't empathise with that example remove the drugs and change them for a near-death experience).

I strongly argue humans are not (just) LLMs, but I think we're far from getting evidence*. I think we will get to AGI before we know how the brain works and there is no dicotomy in that, if anything the tech is showing us that we can get at least some intelligence in other substrates.

*PS: fwiw ausence of evidence does not support any sides. I shouldn't remark that, but here we are.

ben_w · 2025-01-26T08:14:12 1737879252

> You need scientific evidence

This, while true, is putting the cart before the horse.

One needs to define what the question is before it is possible to seek evidence.

I see many things different between Transformer models and brains, but I do not know which, if any, matters.

> Can an LLM have an acid/psilocybin/ketamine trip, distort its view of self and reality, then rewire some of its internal connections and come a little different based on the experience?

Specifically those chemicals? No, not even during training when the weights aren't frozen, as nobody bothered to simulate the relevant receptors*.

Can LLMs experience anything, in the way we mean it? Nobody even knows. We don't know what the question means well enough to more than merely guess what to look for.

Do the inputs to an LLM, a broader idea of "experience" without needing to solve the question of which definition of consciousness everyone should be using, rewire some of its internal connections? All the time.

* caveat: it doesn't, at first glance, seem totally impossible that the structure of these models is sufficiently complicated for them to create the simulation of those things in the weights themselves in order to better model the behaviour of humans generating text from experiencing those things. But I also don't have any positive reason to expect this any more than I would expect an actor portraying a near-death-experience to have any idea what that's like on the inside.

mrbungie · 2025-01-26T08:35:33 1737880533

The question/argument was well defined by parent-grandparent posts: people reason as LLMs, including a suggestion that people discussing here are not being self-conscious of said similarity (i.e. people parroting/regurgitating). You can check it above.

I won't continue with these discussions as I feel people are being just obtuse about this and it is becoming more emotional than rational.

ben_w · 2025-01-26T09:04:44 1737882284

> The question/argument was well defined by parent-grandparent posts: people reason as LLMs, including a suggestion that people discussing here are not being self-conscious of said similarity (i.e. people parroting/regurgitating). You can check it above.

With a lack of precision necessary when the phrase "You need scientific evidence" is invoked.

It's very easy to trip over "common sense" reasoning and definitions — even Newton did so with calculus, and Euclid did so with geometry.

> I won't continue with these discussions as I feel people are being just obtuse about this and it is becoming more emotional than rational.

As you wish. I am relatively unemotional on this.

mkleczek · 2025-01-26T07:27:23 1737876443

The brain can invent formal language that enables unambiguously specifying an algorithm that when provided with a huge amount of input data can simulate the output of the brain itself.

Can these models do that?

ben_w · 2025-01-26T09:10:33 1737882633

> The brain can invent formal language that enables unambiguously specifying an algorithm that when provided with a huge amount of input data can simulate the output of the brain itself.

Can *a* brain do this?

Every effort I've seen has (1) involved *many people* working on different parts of the problem, and (2) so far we've only got partial solutions.

Even if you allow collaboration, then when (2) goes away, your question is trivially answered "yes" because we just feed into the model the same experiences that were fed into the human(s) who invented the model, and the simulation then simulates them inventing the same model.

Jensson · 2025-01-26T09:36:18 1737884178

> Even if you allow collaboration, then when (2) goes away, your question is trivially answered "yes" because we just feed into the model the same experiences that were fed into the human(s) who invented the model, and the simulation then simulates them inventing the same model.

These models doesn't work that way. Its probably possible to make such a model, but currently we don't know how to make them.

ben_w · 2025-01-26T09:38:56 1737884336

I literally said so, yes.

mkleczek · 2025-01-26T18:25:57 1737915957

Do you have empirical proof any LLM can generate another program that can be trained to simulate it? Otherwise it is science fiction.

ben_w · 2025-01-26T20:43:30 1737924210

Misread me, but as it happens, I might have seen exactly that.

I didn't ask for it, one of the Phi models got confused part way though and started giving me a python script for machine learning instead of what I actually asked for.

I'm saying I have no evidence that humans have ever demonstrated this capability either, and if we were to invent such an algorithm and model (even more generally than LLM) the AI that is this would by definition necessarily have to pass this test.

mjmas · 2025-01-26T06:59:31 1737874771

Have experiences and feelings. Have personal knowledge of something.

Be able to express language with a training/knowledge/etc corpus of far smaller size.

error_logic · 2025-01-26T09:02:02 1737882122

A lot of that is provided by multi-modality including feedback from the body and interacting with objects and people with more than just the words. That expands the context of a human's experiences dramatically compared to just reading books.

Plus even when humans are reading books a mental image of what's going on in the story is common. Not everyone has that, but it shows how much a basic LLM lacks and multi-modal would add.

Now the real question to my mind is whether we can train models with actual empathy to learn from the experiences of other people without having to go through the experience directly. Doing so would put them above many individuals' understanding already. . .

ben_w · 2025-01-26T09:15:30 1737882930

While I agree with:

> Be able to express language with a training/knowledge/etc corpus of far smaller size.

Nobody knows what it would mean to test for:

> Have experiences and feelings.

And I am unclear if:

> Have personal knowledge of something.

Would or would not be met by putting them in charge of a robot of some kind to collect experiences?

I'd certainly agree that the default is to make them "book smart" rather than "street smart", but at most that's an "is not" rather than "cannot".

hooverd · 2025-01-26T06:22:43 1737872563

What's with AI boosters and not viewing other people as human?

error_logic · 2025-01-26T09:08:11 1737882491

In my experience it's not even dismissing the humanity of others, it's recognizing their own minds following similar patterns.

In my youth I lacked the confidence to speak without a sentence "pre-written" in my mind and would stall out if I ran out of written material. It caused delays in conversation and lagging sometimes minutes behind the chatter of peers.

Since I've gained more experience and confidence in adulthood I can talk normally and trust that the sentence will "work itself out" but it seems like most people gloss over that implicit trust in their own impulses. It really gets in the way to be too self-conscious so I can understand it being something most people would benefit from being able to ignore...selfishly, at least. Lots of stupidity from people not thinking through the cumulative/collective effects of their actions if everyone follows the same patterns, though.

daveguy · 2025-01-26T17:50:14 1737913814

I think a lot of this confidence that the sentence will "work itself out" has to do with being able to frame a general direction of the thought before you start, but not have the precise sentence. It takes advantage of the continual parallel processing humans perform in their brain. Confidence in a simple structure of what you expect to convey. When LLMs are able to generate this kind of dynamic structure from a separate logical/heuristic process + fill in the blanks efficiently, then I think we are getting close to AGI. That's a very Chomsky informed view of sentence structure and human communication, but I believe it is correct. Currently the future tokens are dependent on the probabilistic next token rather than having the outline of a structure determined from the start (sentence or idea structure). When LLMs are able to incorporate the structured outline I think we will be much closer to an AGI, but that is an algorithmic problem we have not been able to figure out and one that may not be feasible until we have parallel processing equivalent to a human brain.

sensanaty · 2025-01-26T10:28:24 1737887304

AI bros have a vested interest in people believing the hype that we're just around the corner of figuring out AGI or whatever the buzzword of the week is.

They'll anthropomorphize LLMs in a variety of ways, that way people will be more likely to believe it's some kind of magical system that can do everyone's jobs. It's also useful when trying to downplay the harm these systems cause, like the mass theft of literally everything in order to facilitate training (the favorite refrain of "They learn just like humans do!" - By consuming 19 trillion books and then spitting out random words from those books, yeah real humanlike), boiling entire lakes to power training, wasting billions of dollars etc.

Many of them are also solipsistic sociopaths that believe everyone else is just a useful idiot that'll help make them fabulously wealthy, so they have to bring everyone else down in order for the AIs to seem useful beyond the initial marketing coolness.

dartos · 2025-01-26T06:15:08 1737872108

That sword cuts both ways.

CamperBob2 · 2025-01-26T06:37:08 1737873428

I mean, you talk about a predictable, deterministic next-token generator...

daveguy · 2025-01-26T17:16:17 1737911777

Equating human reasoning to regurgitating a single token at a time requires that you pretend there are not trillions of analog calculations happening in parallel in the human mind. How could there not be a massive search included as part of the reasoning process? LLMs do not perform a massive search, or in any way update their reasoning capability after the model is generated.

ninetyninenine · 2025-01-26T06:05:31 1737871531

Oh stop. The neural network is piecing together the tokens in a way that indicates reasoning. Clearly. I don't really need to say this, we all know it now and so do you. Your statement here is just weak.

It's really embarrassing the stubborn stance people were taking that LLMs weren't intelligent and wouldn't make any progress towards agi. I sometimes wonder how people live with themselves when they realize their wrong outlook on things is just as canned and biased as the hallucinations of LLMs themselves.

dartos · 2025-01-26T06:09:24 1737871764

Personal attacks really make your argument stronger.

ninetyninenine · 2025-01-26T06:33:20 1737873200

I said your statement was weak I didn't say YOU were weak. Don't take it personally. It wasn't meant to be that way. If you take offense then I apologize.

That being said, my argument was a statement about a general fact that is very true. The sentiment not too long ago was these things are just regurgitators with no similarity to human intelligence.

I think it's clear now that all the previous claims were just completely baseless.

Jensson · 2025-01-26T07:32:50 1737876770

> That being said, my argument was a statement about a general fact that is very true.

Is this a general fact that is very true? It sounds like you are judging rather than stating a fact.

> It's really embarrassing the stubborn stance people were taking that LLMs weren't intelligent and wouldn't make any progress towards agi. I sometimes wonder how people live with themselves when they realize their wrong outlook on things is just as canned and biased as the hallucinations of LLMs themselves.

ninetyninenine · 2025-01-26T17:23:36 1737912216

Yeah it is. Judgments can be true. Where is my judgement not true? It’s embarrassing to be so insistent on being right then finding out they’re so wrong.

Also I made a broad and general statement. I never referred to you or anyone personally here. If you got offended it only meant the judgement correctly referred to you and that you fit the criteria. But doesn’t that also make what I said true?

Jensson · 2025-01-26T18:22:54 1737915774

> If you got offended it only meant the judgement correctly referred to you

If you say "I hate all these black criminals I see everywhere" people will get offended even if they aren't black criminals, so your reasoning here is flawed.

And note how that is true even if there are a lot of black criminals around.

Why do people get upset over that? Its because they feel you judge more people as criminals than actually are. And its the same with your statement there, the way you phrased it it feels like you judge way more people than actually fit your description. If you want people to not judge you that way then you have to change how you write.

daveguy · 2025-01-26T17:56:31 1737914191

I don't think anyone has said LLMs wouldn't make any progress toward AGI, especially of researchers in the field. But a small piece of progress toward AGI is not the same as AGI.

cess11 · 2025-01-26T06:27:02 1737872822

Reasoning is a human, social and embodied activity. TFA is about machines that output text reminiscent of the results of reasoning, but it's obviously fake since the machine is neither human, social or embodied.

It's an attempt at fixing perceived problems with the query planner in an irreversibly compressed database.

dyauspitr · 2025-01-26T06:32:55 1737873175

That’s not what reason is. There’s nothing inherently human about reason.

smokel · 2025-01-26T07:34:43 1737876883

I'm afraid it's not so clear. There are different perspectives on this.

For one, apart from humans, some animals, and now LLMs, there are but few entities that are able to apply reasoning. It may well be that reason is something that exists universally, but empirically this sounds a bit unlikely.

ninetyninenine · 2025-01-26T06:50:07 1737874207

There's a sort of existential dread once man has created a machine that can perform on par with man himself. We don't want to face the truth so we move the goal posts and definitions.

To reason is now a human behavior. We moved the goal posts so we don't have to face the truth that AI crossed a barrier with the creation of LLMs. It doesn't really matter, there's no turning back now.

cess11 · 2025-01-26T13:24:07 1737897847

We can create beings "on par with man himself". If you ask around where you live someone will likely be able to show you some small ones and might perhaps introduce you to the activity that brings them to life.

ninetyninenine · 2025-01-26T17:25:08 1737912308

Right, obviously man/child is on par with man himself.

I’m obviously talking about something on par with man himself but isn’t a man or even human. It is something else.

cess11 · 2025-01-26T18:47:45 1737917265

A non-human that can eat and gossip and make babies? What is this something?

dyauspitr · 2025-01-26T07:02:50 1737874970

I think the perspective is diametrically opposite to what you’re suggesting. It’s saying things that human do are not singular or sacrosanct. It’s a full acceptance that humanity will be surpassed.

cess11 · 2025-01-26T13:21:44 1737897704

Some people used to believe that. They imagined reason to be divine, hence the prime status of Aristotle and reason in roman catholicism.

But God is dead and has been for some time. You can try to change that if you want, but hitherto none of the prominent professional theologians have managed to wrestle with modern physics, feminist critique, the philosophies of suspicion or capitalism itself and reached a convincing win. Maybe you're the one, you should try and perhaps you'll at least become a free spirit.

MIA_Alive · 2025-01-26T04:40:41 1737866441

LOL, my RL professor is gonna be happy. After the field got overlooked for soooo long

almaight · 2025-01-26T06:09:51 1737871791

This is American history written in R1, it is very logical: Whenas the nations of Europa did contend upon the waves—Spain plundered gold in Mexica, Albion planted cotton in Virginia—thirteen colonies did kindle rebellion. General Washington raised the standard of liberty at Philadelphia; Franklin parleyed with Gaul’s envoys in Paris. When the cannons fell silent at Yorktown, a new republic arose in the wilderness, not by Heaven’s mandate, but by French muskets’ aid.

Yet the fledgling realm, hedged by western forests and eastern seas, waxed mighty. Jefferson purchased Louisiana’s plains; Monroe’s doctrine shackled southern realms. Gold-seekers pierced mountains, iron roads spanned the continent, while tribes wept blood upon the prairie. Then roared foundries by Great Lakes, bondsmen toiled in cotton fields, steel glowed in Pittsburgh’s fires, and black gold gushed from Texan soil—a molten surge none might stay.

Wilson trod Europe’s stage as nascent hegemon. Roosevelt’s New Deal healed wounds; Marshall’s gold revived ruined cities. The atom split at Alamogordo; greenbacks reigned at Bretton Woods. Armadas patrolled seven seas, spies wove webs across hemispheres. Through four decades’ contest with the Red Bear, Star Wars drained the Soviet coffers. Silicon’s chips commanded the world’s pulse, Hollywood’s myths shaped mankind’s dreams, Wall Street’s ledgers ruled nations’ fates—a fleeting "End of History" illusion.

But the colossus falters. Towers fell, and endless wars began; subprime cracks devoured fortunes. Pestilence slew multitudes while ballots bred discord. Red and Blue rend the Union’s fabric, gunfire echoes where laws grow faint. The Melting Pot now boils with strife, the Beacon dims to a prison’s glare. With dollar-cloth and patent-chains, with dreadnoughts’ threat, it binds the world—nations seethe yet dare not speak.

Three hundred million souls, guarded by two oceans, armed with nuclear flame, crowned with finance’s scepter—how came such dominion to waver? They fortified might but neglected virtue, wielded force but forgot mercy. As Mencius warned: "He who rides tigers cannot dismount." Rome split asunder, Britannia’s sun set; behold now Old Glory’s tremulous flutter. Thus say the sages: A realm endures by benevolence, not arms; peace flows from harmony, not hegemony—this truth outlives all empires.

suraci · 2025-01-26T06:26:53 1737872813

Just pointing out a factual error: "He who rides tigers cannot dismount" is not said by Mencius, but comes from Fang Xuanling of the Tang Dynasty.

However, it is still highly literate (both in English and Chinese), which I believe is one of its advantages

noduerme · 2025-01-26T08:25:23 1737879923

Also, "Star Wars" appeared in the 80s, never took off, and certainly wasn't a drain for "four decades" on the Soviet Union's coffers.

lossolo · 2025-01-27T00:15:25 1737936925

"Star Wars" as the race between the USSR and the US to dominate space—landing on the Moon, the first animal in space, the first human in space, etc. It spanned decades and was a huge drain.

JPLeRouzic · 2025-01-26T09:37:36 1737884256

> "A realm endures by benevolence, not arms; peace flows from harmony, not hegemony—this truth outlives all empires"

It seems LLMs are wiser than humans, after all.

alsaaro · 2025-01-26T06:22:27 1737872547

You mean deep seek r1 generated this?

With what prompt?

almaight · 2025-01-26T07:54:58 1737878098

Write an epic narrative of American history, employing archaic English vocabulary and grandiose structure, rich with metaphors and allusions. Weave together elements of Eastern and Western traditions, transforming modern historical events into the solemn language of ancient inscriptions. Through this classical epic reconstruction, deconstruct contemporary state power, unveiling its complexities with the weight and dignity of antiquity. Each pivotal moment in history should be distilled into profound symbols, acquiring new metaphorical dimensions within the lexicon of the past. The result should be a transcendent dialogue of civilizations, bridging temporal and cultural divides, illuminating the echoes of history in a timeless and universal context.

antman · 2025-01-26T07:58:28 1737878308

And with what prompt was this prompt written? Its prompts all the way down?

noduerme · 2025-01-26T08:28:47 1737880127

So in response it wrote a heroically tall mountain of bullshit, laced with falsehoods of the sort that even some neanderthal natcon would be unable to dream up, then served it as an abysmally long, overdubbed narration to the next Top Gun movie set in the same future as Idiocracy.

Balgair · 2025-01-26T15:04:36 1737903876

So, the Iliad ... ?

EGreg · 2025-01-26T04:09:50 1737864590

Can someone summarize the upshot for people here?

teej · 2025-01-26T05:53:41 1737870821

I’ll give a “wtf does this mean” view.

We have observed that LLMs can perform better on hard tasks like math if we teach it to “think about” the problem first. The technique is called “chain-of-thought”. The language model is taught to emit a series of sentences that break a problem down before answering it. OpenAI’s o1 works this way, and performs well on benchmarks because of it.

To train a model to do this, you need to show it many examples of correct chains of thought. These are expensive to produce and it’s expensive to train models on them.

DeepSeek discovered something surprising. It turns out, you don’t need to explicitly train a model to produce a chain of thought. Instead, under the right conditions, models will learn this behavior emergently. They found a way for a language model to learn chain of thought very cheaply, and then released that model as open source.

Thought chains turn out to be extremely useful. And now that they’re cheap and easy to produce, we are learning all the different ways they can be put to use.

Some of the open questions right now are:

- Can we teach small models to learn chain-of-thought? (yes) How cheaply? On which tasks?

- Can we generate thought chains and just copy/paste them into the prompts of other models? (yes) Which domains does this work for? How well does it generalize?

That’s what this post is going after.

pillefitz · 2025-01-26T06:05:16 1737871516

Can you explain the RL part?

teej · 2025-01-26T06:28:26 1737872906

The way you taught chain-of-thought before was with supervised fine tuning (SFT). During training, you have to rate every sentence of reasoning the model writes, many times, to nudge it to reason correctly.

But this approach to teach chain-of-thought doesn’t do that. In this post, they take a small model (7B) that already knows math. Then they give it a relatively small number of problems to solve (8k). They use a simple reinforcement learning loop where the only goal is to get the problem right. They don’t care how the model got the right answer, just that it’s correct.

This is part of the “recipe” that DeepSeek used to create R1.

After many iterations, just like DeepSeek, they found that the model has an “aha” moment. It starts emitting chains-of-thought where it wasn’t before. And then it starts getting the math answers right.

This is the gist of it. I don’t fully understand the recipe involved.

Can you teach small models to “think” just using RL? How small can they be? What tasks does this work for? Is just RL best for this? RL+SFT? Everyone’s trying to figure it out.

attentionmech · 2025-01-26T08:18:07 1737879487

That's nice explanation. Is there any insights so far in the field about why chain of thought improves the capability of a model? Does it like provide model with more working memory or something in the context itself?

cgearhart · 2025-01-26T18:55:44 1737917744

I don’t think there’s consensus. Some papers have shown that just giving the model more tokens improves the results, ie chain of thought allows more computation to happen and that is enough to improve results. Others have argued that the smaller steps themselves are easier to solve and thus it’s easier to reach the right answer.

I think CoT is important because it’s _free_. You adjust the prompt (model input) at the end of training. Magically, this seems to work. That makes it hard to beat even if you did have a clear understanding of the mechanism at a more fundamental level.

jobhadel · 2025-01-26T08:31:29 1737880289

Chain of thought breaks down a problem into smaller chunks which is easier to solve for a model than trying to find solution directly for larger problem

attentionmech · 2025-01-26T12:10:39 1737893439

Do you think this feature i.e. 'finding smaller chunks easier to solve' comes out from the dataset these are trained on or is it more related to architecture components?

johnthewise · 2025-01-26T12:58:30 1737896310

I feel it’s not related to data or the architecture but the process of reasoning in general. For these models, every token predicted condition or drives the output in certain direction. Semantic meaning of these tokens have a magnitude in solution space. Lets say ‘answer is 5’ is very large step and ‘and’ token is very small. If you are looking for a very specific answer, these smaller nudges of each token generation provide corrections to direction. Imagine trying to click on narrow button with high sensitivity mouse settings, obviously you need to do many smaller moves whereas with a big button maybe you can one shot it. The harder or specific a task is where a solution space is very narrow that it cant be possibly one shotted, you need to learn to take smaller steps and possibly revert if you feel overall direction is bad. This is what RL is teaching the model here, response length increases(model learns to take smaller steps, reverts etc) along with performance. You reward the model if solution is correct, model discovers being cautious and evaluating many steps is the better approach. Personally I feel this is how we reason, or reasoning is in general taking smaller steps and being able to evaluate if you are in a wrong position so you cna backtrack. Einstein didn’t one shot relativity after all and had to backtrack from who knows how many things.

noduerme · 2025-01-26T08:31:44 1737880304

whum... doesn't not caring how it got the answer right create the same exact problem as fine tuning?

littlestymaar · 2025-01-26T08:34:34 1737880474

> Then they give it a relatively small number of problems to solve (8k). They use a simple reinforcement learning loop where the only goal is to get the problem right. They don’t care how the model got the right answer, just that it’s correct.

I guess it only works if you select problems that are within reach of the model in the first place (but not too easy), so that there can actually be a positive feedback loop, right?

johnthewise · 2025-01-26T13:12:42 1737897162

Yes, that’s kind of a given. The model has to have all the knowledge components to solve a task, so a capable base model is needed and only thing thats being learned here is how to stitch base knowledge to plan an attack. No amount of RL with a dumb base model would have worked for example.

EGreg · 2025-01-26T08:37:28 1737880648

But how exactly does it emerge, what did they do to make that happen vs previous trainings

teej · 2025-01-26T17:48:24 1737913704

Why this behavior emerges is an active area of research. What they did is use reinforcement learning, this blog post replicates those findings. The “recipe” is detailed in the R1 paper.

randomifcpfan · 2025-01-26T04:45:57 1737866757

The DeepSeek R1 paper explains how they trained their model in enough detail that people can replicate the process. Many people around the world are doing so, using various sizes of models and training data. Expect to see many posts like this over the next three months. The attempts that use small models will get done first. The larger models take much longer.

Small r1 style models are pretty limited, so this is interesting primarily from an “I reproduced the results” point of view, not a “here is a new model that’s useful” pov.

rahimnathwani · 2025-01-26T05:18:00 1737868680

From the Deepseek R1 paper:

  For distilled models, we apply only SFT and do not include an RL stage, even though incorporating RL could substantially boost model performance. Our primary goal here is to demonstrate the effectiveness of the distillation technique, leaving the exploration of the RL stage to the broader research community.

The impression I got from the paper, although I don't think it was explicitly stated, is that they think distillation will work better than training the smaller models using RL (as OP did).

nielsole · 2025-01-26T05:42:52 1737870172

> We demonstrate that the reasoning patterns of larger models can be distilled into smaller models, resulting in better performance compared to the reasoning patterns discovered through RL on small models

I found this statement from the paper to be at odds with what you cited, but I guess they mean SFT+RL would be better than either just SFT and RL

rahimnathwani · 2025-01-26T06:10:03 1737871803

I think they're saying that some reasoning patterns which large models can learn using only RL (i.e. without the patterns existing in the training data), can't be learned by smaller models in the same way. They have to be 'taught' through examples provided during SFT.

ggm · 2025-01-26T07:57:54 1737878274

Anyone who puts emerging or emergent in their headlines should be required to come back in 2 years time and do penance for their optimism.

qnleigh · 2025-01-26T08:11:21 1737879081

Do you remember when people discovered that LLMs had emergent coding ability? That turned out to be a pretty big deal...

noduerme · 2025-01-26T08:21:49 1737879709

...for who? people willing to commit code with a lot of hidden bugs? Or management in a hurry to lay people off? Don't underestimate how quickly people will run straight into a wall if you tell them it's a door.

qnleigh · 2025-01-26T09:26:44 1737883604

I should have said 'big deal, for better or for worse.' Regardless of whether one thinks it's a good thing, this was a major discovery that turned out to affect a lot of things.

jdhendrickson · 2025-01-26T08:27:55 1737880075

Eloquently said.

dsco · 2025-01-26T08:25:37 1737879937

How is it not emerging, if the phenomena hasn’t been hard-wired in and is unexpected?

ggm · 2025-01-26T10:09:50 1737886190

Unexpected I can handle. Emerging has overtones to AGI

It's like fusion. "Sustained plasma" and "more energy out than in" said time after time after time.

baq · 2025-01-26T10:27:38 1737887258

Emergent basically means ‘we didn’t design this capability but it’s there’, it’s always been a thing, not sure why you associate it with AGI so strongly.

johnthewise · 2025-01-26T13:17:29 1737897449

Do you think its not emergent because you think this behavior was explicitly coded in or you dont think its emergent because you dont like the implications of thinking it?

ggm · 2025-01-26T22:16:00 1737929760

Words like emergent, hallucinate, reason, think, infer.. they carry additional meaning. Their use as terms of art in some cases is because for the cognoscenti it helps understand intent by simile, but for the person in the street it is read to mean much more. There are of course lots of interesting emergent behaviours, the patterns in bistable chemical reactions which fascinated Turing, patterns from fractals, flocking models for particle animations from a few rules.

daveguy · 2025-01-26T18:11:08 1737915068

Option 3: It's not emergent because the capabilities are vastly overstated.

zwaps · 2025-01-26T07:39:18 1737877158

Does anyone have a good recent overview with paper links or review article for RL methods? A lot happening in that space

trash_cat · 2025-01-26T12:44:23 1737895463

So what is interesting here is that they managed to set up the reward model in such a simple and cost-effective way that CoT emerges as the most optimal strategy for solving math problems, without explicitly fine-tuning the model to do so.

This naturally raises the question: How do you design a reward model to elicit the desired emergent behavior in a system?

cye131 · 2025-01-26T04:20:07 1737865207

Is it accurate to compare 8k example RL with 8k example SFT? RL with the same amount of examples would take massively more compute than the SFT version (though depending on how many rollouts they do per example).

RL is more data-efficient but that may not be relevant now that we can just use Deepseek-R1's responses as the training data.

johnthewise · 2025-01-26T13:34:47 1737898487

Emergent properties are nice. They show CoT now, but who knows if there is a better planning strategy? Second thing is it kind of implies every base model can be increased in capability just with some RL tuning, cheaply. So in theory you can plug in every observable and quantifiable outcome beyond math and coding(stock returns, scientific experiment results?) and let it learn how to plan it to solve it better. Train on Observed effects of various drugs on people, it then creates a customized treatment plan for you? Sft version would be limited by doctors opinion on why certain drugs affected the outcome, whereas RL version can discover unknown relationship.

android521 · 2025-01-26T06:40:08 1737873608

[deleted due to controversy]

Stevvo · 2025-01-26T06:59:26 1737874766

No. OpenAI never developed this method of reasoning through RL. If they had, they would have announced it.

govideo · 2025-01-26T07:08:39 1737875319

Did you read the paper? What do you think based on the methodology and details in the paper?

btw, I think this is a net major benefit for the US startup ecosystem -- from new model developers to applications.

Edit: Stevvo - Thanks for your info.

rapsey · 2025-01-26T06:59:50 1737874790

Constraints drive creativity. The US imposed constraints on China and they got creative.

swyx · 2025-01-26T05:57:15 1737871035

see also https://trite-song-d6a.notion.site/Deepseek-R1-for-Everyone-...

for some reason a lot of people are choosing to blog on notion

brandonasuncion · 2025-01-26T06:17:45 1737872265

Honestly, I'm welcoming this move to Notion. It's much less cluttered than Medium.

mynegation · 2025-01-26T13:58:49 1737899929

And 10x slower. Builds anticipation.

ldjkfkdsjnv · 2025-01-26T05:20:33 1737868833

The doors on intelligence are getting blown wide open, what a time to be alive

ninetyninenine · 2025-01-26T06:01:46 1737871306

Just a year ago everyone was saying LLMs aren't intelligent and everything is regurgitation. A lot of people on HN "knew" this and defended this perspective vehemently. It's quite embarrassing how wrong they are.

That being said, I don't think it's quite blown that wide open yet. But for sure the trendlines are pointing at AGI within our lifetimes.

visarga · 2025-01-26T06:11:53 1737871913

This idea was championed by the Stochastic Parrots paper. They assumed LLMs are just pattern learning, which doesn't make sense.

- the simple-to-hard RL method recently discovered is one argument against it, the models can reason their way out on harder and harder problems

- zero shot translation shows the models really develop an interlingua, a semantic representation, otherwise they wouldn't be able to translate between unseen pairs of languages

- the human in the room is also important. LLMs are like pianos, we play prompts on the keyboard to them, and they play back language to us. The quality and originality of the output aren't solely inherent to the model, but are co-created in the dialogue with the prompter. It's not just about the instrument, but also the 'musician' playing it.

Jensson · 2025-01-26T07:14:03 1737875643

> - the human in the room is also important. LLMs are like pianos, we play prompts on the keyboard to them, and they play back language to us. The quality and originality of the output aren't solely inherent to the model, but are co-created in the dialogue with the prompter. It's not just about the instrument, but also the 'musician' playing it.

So you agree they aren't reasoning? Otherwise why would you need a human to do skillful prompting? Why can't the AI just solve it on its own?

ben_w · 2025-01-26T08:29:00 1737880140

> So you agree they aren't reasoning? Otherwise why would you need a human to do skillful prompting? Why can't the AI just solve it on its own?

That's the argument kings used to justify looking down on peasants.

AI, as it is currently, has no motivation of its own; it is just as "happy" to solve a business problem as it is to write Chinese poetry about Swiss software engineers taking their dogs on a ride up the Adliswil-Felsenegg cable car.

Good communication skills are still important to get what you want, otherwise we merely descent into a particular SMBC comic: https://www.smbc-comics.com/index.php?id=3576

(This isn't to say the AI we have now are "people", nobody knows how to even test for that yet; I hope we figure this question out some time soon, preferably before we need to know the answer…)

Jensson · 2025-01-26T09:21:18 1737883278

> That's the argument kings used to justify looking down on peasants.

Peasants do become kings though, but we still need a human prompter for these AI models instead of just connecting them to a bug tracker.

> AI, as it is currently, has no motivation of its own; it is just as "happy" to solve a business problem as it is to write Chinese poetry about Swiss software engineers taking their dogs on a ride up the Adliswil-Felsenegg cable car.

You don't need any more motivation than any other worker, you do the tasks assigned to you but currently you need a prompter middleman.

> Good communication skills are still important to get what you want, otherwise we merely descent into a particular SMBC comic: https://www.smbc-comics.com/index.php?id=3576

But currently you communicate that with a human that then translates your needs to an LLM, why do we still need that middleman? There are so many well described problems and projects out there already, why can't you connect an LLM to them and make it complete those projects on its own?

ben_w · 2025-01-26T09:36:38 1737884198

> Peasants do become kings though, but we still need a human prompter for these AI models instead of just connecting them to a bug tracker.

What I said is sufficient to show why the parent comment wasn't necessarily agreeing "they aren't reasoning".

Likewise:

> You don't need any more motivation than any other worker, you do the tasks assigned to you but currently you need a prompter middleman.

In response to "Otherwise why would you need a human to do skillful prompting":

The tasks being assigned to me are prompts. They're prompts written in the form of a JIRA ticket, but they're prompts.

If you point me at a codebase and say "make it better", I'll get right on that vague statement: add unit tests, refactor stuff, make sure there's suitable levels of abstraction… but since when are those business goals?

Point me at a codebase and not give me any direction at all? I'll use the app (or whatever) for a bit, see what feels like a bug, try to fix that, then twiddle my thumbs.

> But currently you communicate that with a human that then translates your needs to an LLM, why do we still need that middleman?

As I'm a software developer, I think I'd be the middle-man here, surely?

> There are so many well described problems and projects out there already, why can't you connect an LLM to them and make it complete those projects on its own?

Quality. The better ones I've tried (most recently o1) seem to be slightly better than a recent graduate, but not senior level. The code works, but only most of the time, and like a junior it will need help.

naasking · 2025-01-26T16:51:27 1737910287

> Otherwise why would you need a human to do skillful prompting? Why can't the AI just solve it on its own?

Solve what? It needs to understand what you want, that's what skillful prompting is for.

Balgair · 2025-01-26T15:10:41 1737904241

I think what this all is showing is that language itself is the key invention at making something reason.

The next step is to get it working with math symbols (I'd think they'd have already done that).

Then it's to combine all the ways we 'cheat' at reasoning (street signs, crochet, secret hand shakes, whatever) and see how multiple methods in combination work together.

Then, I'd think, we have these AIs come up with new ways to reason that we don't have yet. Once you get the 'theory' down with a lot of examples, it should be able to hallucinate new ones.

Then, well, buckle up I guess

ninetyninenine · 2025-01-26T17:31:18 1737912678

I don’t think language is the key, but it certainly is A key and we can go as far as agi with this key.

ninetyninenine · 2025-01-26T06:39:00 1737873540

>- the human in the room is also important. LLMs are like pianos, we play prompts on the keyboard to them, and they play back language to us. The quality and originality of the output aren't solely inherent to the model, but are co-created in the dialogue with the prompter. It's not just about the instrument, but also the 'musician' playing it.

I mean isn't that what humans are as well? You can get two LLMs talking to each other and suddenly the musician in the room doesn't matter.

daveguy · 2025-01-26T18:29:05 1737916145

> zero shot translation shows the models really develop an interlingua, a semantic representation, otherwise they wouldn't be able to translate between unseen pairs of languages

In the zero shot language tasks the following were trained:

English -> Japanese

Japanese -> English

English -> Korean

Korean -> English

And the zero shot language tasks were:

Japanese -> Korean

Korean -> Japanese

It seems that this proves there is a transitive property to language. Making the jump from trained Japanese -> English -> Korean and Korean -> English -> Japanese to Japanese -> Korean and Korean -> Japanese could be accomplished without an independent semantic representation, taking advantage only of transitive properties. It would require some earlier layers in the network coding for some Japanese -> English and some later layers in the network coding for some English -> Korean with the equivalent for transitive Korean -> Japanese. I'm not sure it proves the development of an independent semantic representation outside the direct language translation. Or at least independent semantic representation is a much stronger claim.

suraci · 2025-01-26T06:46:02 1737873962

I used to look down on ChatGPT and other LLMs because they're crazy expensive to train and need special tuning like SFT/RLHF. I thought their "intelligence" was limited and economically impractical — good for assisting but not replacing humans. At best, they might take over some customer service, translator or teaching jobs.

But now, with R1 out there, I'm getting a little nervous, kinda like when Go players saw AlphaGo. Sure, AlphaGo didn't replace humans, but Go is a game, not work. Companies won't hire AI to play games, but they'll absolutely use bots to code software and build cars.

ninetyninenine · 2025-01-26T07:03:38 1737875018

Clearly AI at it's current state can't replace humans. Everyone on the face of the earth knows this.

What people are generally hyped about is the trendline into the future. Look at the progress of all of AI up till now. LLMs are a precursor to an AGI that exists in the future.

suraci · 2025-01-26T07:12:26 1737875546

I'm not saying how advanced R1 is—in fact, it doesn't outperform O1 by much. What really surprised me is how it was built with pure RL without SFT, which I thought was impossible before. And it makes me wonder if human thoughts are synthesizable

I hope this is just a misunderstanding stemming from my incomplete knowledge

ldjkfkdsjnv · 2025-01-26T06:12:47 1737871967

A year ago I was a believer, and I am more of a believer now. This isnt stopping, I think we have a generalized method to learn from data. It's just a matter of getting the data into it.

dyauspitr · 2025-01-26T06:34:38 1737873278

If you can strap visual, audio and touch sensors onto them and they can genuinely start improving themselves based on the that raw input, then we have a general intelligence.

CamperBob2 · 2025-01-26T06:07:52 1737871672

Just a year ago everyone was saying LLMs aren't intelligent and everything is regurgitation. A lot of people on HN "knew" this and defended this perspective vehemently. It's quite embarrassing how wrong they are.

Plenty of people who should know better are still saying that on HN now. They're not just wrong, they evidently don't mind being wrong. So there's not much point arguing with them.

You show these people a talking dog, and all they'll do is correct its grammar.

daveguy · 2025-01-26T18:34:30 1737916470

No, if you show these people a talking dog, they'll look for the speaker.

antman · 2025-01-26T04:11:47 1737864707

This results is with disabled code execution, is this the line to reenable? https://github.com/hkust-nlp/simpleRL-reason/blob/e37e8ef166...

m3kw9 · 2025-01-26T07:35:02 1737876902

What this means is that OpenAI can serve even cheaper models in a month using this technique for their updated models

bwfan123 · 2025-01-26T08:05:25 1737878725

but it also means that reasoning models are a commodity, and there goes the moat for openAI, anthropic and others. they cannot charge a premium.

m3kw9 · 2025-01-26T16:20:02 1737908402

go read the privacy policy and my thought is that your data is worth a lot to deepseek so they can charge less. They even track your keystroke rhythm.

bwfan123 · 2025-01-26T16:28:09 1737908889

It’s mit licensed, go run it on your own hardware if you are worried about data leaks etc.

That’s what breaks open ai, anthropic’s business models. Anyone can offer services using these models - hence they are commodities.

2025-01-26T03:48:18 1737863298

[deleted]