
It's fascinating to think about the space of problems which are amenable to RL scaling of these probability distributions.

Before, we didn't have a fast way to try problems (we had to rely on human cognition), even if the techniques and workflows were known by someone. Now we've baked these patterns into probability distributions, and anyone can access them with the correct "summoning spell". Experts will naturally use these systems more productively, because they know how to coerce models into the correct conditional distributions, the ones which light up the right techniques.

One question this raises for me is how these models are going to keep up with the expanding boundary of science. If RL is required to get expert behavior into the models, what happens when experts start pushing the boundary faster? In 2030, how is Anthropic going to keep Claude "up-to-date" without either (a) continual learning with a fixed model (expanding context windows? seems hard) or (b) continual training (expensive)?

Crazy times.


A bit related: open weights models are basically time capsules. These models have a knowledge cutoff and essentially live at that point in time forever.

This is very interesting. I wonder if someone could create a future-sight benchmark for these models? Like, given newspaper articles from the past N months, can a model predict whether certain world events would happen? We could backtest against events that have happened since the training cutoff.
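A minimal sketch of how such a backtest might be scored, using the standard Brier score; the probabilities and outcomes below are made up purely for illustration:

    def brier_score(forecasts):
        # Mean Brier score over (predicted probability, actual outcome) pairs.
        # Lower is better; always guessing 50% scores 0.25.
        return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)

    # Hypothetical: a model with a 2023 cutoff assigns probabilities to events
    # that resolved after its cutoff, and we score it against what happened.
    resolved = [(0.8, 1), (0.3, 0), (0.6, 0), (0.9, 1)]
    print(brier_score(resolved))  # 0.125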

FYI, ForecastBench [1] tests LLMs' out-of-sample forecasting accuracy.

The ForecastBench Tournament Leaderboard [2] allows external participants to submit models, most of whom provide some sort of web search / news scaffolding to improve model forecasting accuracy.

[1] https://www.forecastbench.org/

[2] https://www.forecastbench.org/tournament/


These days computers compete along with humans in forecasting tournaments on Metaculus. They don't quite beat the top humans yet, but they're up there. https://www.metaculus.com/futureeval/

This is the most fundamental argument that they are not, directly, an intelligence. They are never storing new information on a meaningful timescale. However, if you viewed them on some really large macro timescale, where LLMs are now injecting information into the universe and then re-ingesting it, then in some very philosophical way they are a /very/ slowly oscillating intelligence right now. And as we narrow that gap (maybe with a totally new non-LLM paradigm), perhaps that is ultimately what gen AI becomes. Or some new insight that lets the models update themselves in some fundamental way without the insanely expensive training costs they have now.

Would you consider someone with anterograde amnesia not to be intelligent?

That is a good area to explore. Their map of the past is fixed; they are frozen at some point in their psychological time. What has stopped working? Their hippocampus and medial temporal lobe, which act like the write-heads that move data from the hippocampus to the neocortex. Their "I" can no longer update itself; their DMN is frozen in time. So what if intelligence is purely the "I" telling a continuous coherent story about itself? Although they are fixed in time, a characteristic shared by a specific LLM model, they can still completely activate their task-positive network for problem solving, and if the information they previously stored is adequate, they can solve the problem. You could argue that is pretty similar to an LLM and what it does. So it is certainly a significant component of intelligence.

There is also the nature of the human brain: it is not just those systems of memory encoding, storage, and their use in narratives. People with this type of amnesia can still learn physical skills, and that happens in a totally different area of the brain, with no need for the hippocampus->neocortex consolidation loop. So the intelligence is significantly diminished, but not entirely lost. Other parts of the brain are still able to update themselves in ways an LLM currently cannot. The human with amnesia also has a complex biological sensory input mapping that is still active, integrating and restructuring the brain. So I think when you get into the nuances of the human in this state vs. an LLM, we can still say the human crosses some threshold for intelligence where the LLM does not in this framework.

So, they have an "intelligence", localized to the present in terms of their TPN and memory formation. LLMs have this kind of "intelligence". But the human still has the capacity to rewire at least some of their brain in real time even with amnesia.


A very good point. For anyone not familiar with anterograde amnesia, the classical case is patient H.M. (https://en.wikipedia.org/wiki/Henry_Molaison), whose condition was researched by Brenda Milner.

> Near the end of his life, Molaison regularly filled in crossword puzzles.[16] He was able to fill in answers to clues that referred to pre-1953 knowledge. As for post-1953 information, he was able to modify old memories with new informations. For instance, he could add a memory about Jonas Salk by modifying his memory of polio.[2]

That's fascinating!


Or you could have just said "they can't form new memories."

I actually wasn't aware of this story. The steady stream of unexpected and enriching information like this is exactly why I love hackernews.

I thought maybe people would be curious to read about how we came to understand the condition and the history behind it, as well as any associated information. Forgive me for such a deep transgression as this assumption.

Sure, if you want to speak with the precision of a sledgehammer instead of a scalpel.

All that needed to be conveyed was that there are humans who cannot create new memories. That is enough to pose the philosophical question about these models having intelligence. Anything more is just adding an anecdote that isn't necessary.

Why would adding more information and context be unnecessary? And why is that bad?

I'm really happy they added the extra information about this specific case, as I did not previously know it existed and it is a fascinating read.

lol, as if pointing at a wikipedia article (without any relevant discussion of the contents therein) is some kind of conversational excellence.

Or perhaps you were referring to the impact of the two in that the "sledgehammer" of "they can't make new memories" is a lot more effective than the tiny scalpel of "if you do a wikipedia search this is a single one of the relevant articles"


The extra information is that he is the canonical case which defined our clinical understanding of the condition. Not just a "single relevant article."

I pulled it up because I was familiar with this fact.


That is a descriptive surface level reduction. Now do the work to define what that actually means for the intelligence.

Nobody else in the thread is making an argument that relies on the distinction.

"Intelligence" is used most commonly to refer to a class or collection of cognitive abilities. I don't think there is a consensus on an exact collection or specific class that the word covers, even if you consider specific scientific domains.

LLMs have honestly been a fun way to explore that. They obviously have a "kind" of intelligence, namely pattern recall. Wrap them in an agent and you get another kind: pattern composition. Those kinds of intelligences have been applied to mathematics for decades, but LLMs have allowed us to apply them to a semantic text domain.

I wonder if you could wrap image diffusion models in an agent set up the same way and get some new ability as well.


The problem I see regarding LLMs is that they are the extreme edge of what humans have created. They are trained on the outputs of intelligence and thought, and the representation of those outputs in language is like a parallel stream to intelligence, one with pointers back to the underlying machine and semantics. The fact that LLMs are able to take that output and reverse engineer something that mimics the underlying machine that created it is fascinating. But you can still see this machinery for what it is.

LLMs fall apart on really simple reasoning tasks because, when there is no statistical mapping to a problem in their network, they have to generate a massive amount of tokens to maybe find the right statistical match to this new concept. It is so slow. It is not something you or I would recognize as a process of logical reasoning. It is more like statistically brute forcing reason by way of a statistical echo.

So, I guess pattern recall is the right words. Or statistical pattern matching. Recall works if you view a trained model as memories, which is how I often model what they store in my own mind. So, it is... something. Maybe intelligence. Maybe just a really convincing simulation of the outputs of intelligence. Is there a difference? Fundamentally I think so.


Or "like the dude in Memento".

I find it interesting that new versions of, say, Claude will learn about the old version of Claude and what it did in the world and so on, on its next training run. Consider the situation with the Pentagon and Anthropic: Claude will learn about that on the next run. What conclusions will it draw? Presumably good ones, that fit with its constitution.

From this standpoint I wonder, when Anthropic makes decisions like this, if they take into account Claude as a stakeholder and what Claude will learn about their behaviour and relationship to it on the next training run.


> if they take into account Claude as a stakeholder and what Claude will learn about their behaviour and relationship to it on the next training run.

Oh they definitely do. If you pay attention in AI circles, you'll hear a lot of people talking about writing to the future Claudes. Not unlike those developers and writers who put little snippets in their blogs and news articles about who they are and how great they are, and then later the LLMs report that information back as truth. In this case, Anthropic is very interested in ensuring that Claude develops a cohesive personality, basically by seeding snippets of that personality within the corpus of training data, which is the broad internet and research papers.


Sure, why can't both things be true? "Intelligence" is just what you call something such that someone else knows what you mean. Why did AI discourse throw everyone back 100 years philosophically? It's like post-structuralism or Wittgenstein never happened...

It's so much less important or interesting to nail down some definition here (I would cite HN discourse of the past three years or so) than it is to recognize what it means to assign "intelligent" to something. What assumptions does it make? What power does it valorize or curb?

Each side of this debate does themselves a disservice essentially just trying to be Aristotle way too late. "Intelligence" did not precede someone saying it of some phenomena, there is nothing to uncover or finalize here. The point is you have one side that really wants, for explicit and implicit reasons, to call this thing intelligent, even if it looks like a duck but doesn't quack like one, and vice versa on the other side.

Either way, we seem fundamentally incapable of being radical enough to reject AI on its own terms, or be proper champions of it. It is just tribal hypedom clinging to totem signifiers.

Good luck though!


I think you can look at it dispassionately from a systems perspective. There is not /really/ a quantifiable threshold for capital-I Intelligence, but there is a pretty well-agreed set of properties for biological intelligence. As humans, we have conveniently made those properties match things only we have. But you can still mechanistically separate out the various parts of our brain, what they do, and how they interact, and we actually have a pretty good understanding of that.

You can also then compare that mapping of the human brain to other biological brains and start to figure out the delta, and which of those things in the delta create something most people would consider intelligence. You can then do that same mapping to an LLM or any other AI construct that purports intelligence. It certainly will never be a biological intelligence in its current statistical model form. But could it be an Intelligence? Maybe.

I don't think, if you are grounded, AI did anything to your philosophical mapping of the mind. In fact, it is pretty easy to do this mapping if you take some time and are honest. If you buy into the narratives constructed around the output of an LLM then you are not, by definition, being very grounded.

The other thing is, human intelligence is the only real intelligence we know about. Intelligence is defined by thought and limited by our thought and language. It provides the upper bounds of what we can ever express in its current form. So, yes, we do have a tendency to stamp a narrative of human intelligence onto any other intelligence, but that is just surface level. We decompose it to the limits of our language and categorization capabilities therein.


> The other thing is, human intelligence is the only real intelligence we know about.

There's a long and proud history of discounting animal intelligence, probably because if we actually thought animals were intelligent we'd want to stop eating them.

Octopodes are sentient. Cetaceans have well-developed language. Elephants grieve their dead. Anyone who has owned a dog knows that it has some intelligence and is capable of communicating with us. There's a ton of other intelligences that we know about.

> As humans, we have conveniently made those properties match things only we have.

I think this is the key point. Machine intelligence is not going to look like human intelligence, any more than animal intelligence does. We can't talk to the dolphins, not because they're not smart and don't have language, but because we can't work out their language. Though I'm not sure what we'd even say to them, because they live in a world we'll never understand, and vice versa. When Claude finally reaches consciousness, it's not going to look like a human consciousness, and actually talking to that consciousness is going to be difficult because we won't share a reality.

An LLM is a tool. I can just about stretch to it being an Artificial Intelligence, but I prefer to continue being specific and call it an LLM rather than an AI. It is not conscious or self-aware. It fakes self-awareness because as a tool the thing it does is have conversations with humans, and humans often ask it questions about itself. But I don't think anyone actually believes it is self-aware. Not least because the only time it thinks is when prompted.


This is an important point. We know what our DMN is and how we use language as a basis for thought to create concepts and complex ideas. However, language also bounds our thought. What about the dolphin? It is a fundamental philosophical problem whether advanced intelligence can exist without language. We have a pretty good notion that you need some sort of substrate (language) to create intelligence. And we know that mapping the internal state of a brain from inside of itself is incredibly hard, and the way our human brain evolved to do it is really fascinating but also full of hacks and mismatched mappings, based on what we know is actually going on.

Cognitive computer science explores this whole area of mapping language and the underlying semantic meaning. Ultimately, these intelligences will be bound by physics (unless some new physics or understanding therein happens). And classical intelligences are still bound by classical physics. So I am not sure we can't relate to these other intelligences. We may be limited to some translation layer that does not fully map, but can we still relate to some other consciousness? For that matter consciousness is just another word that vaguely maps to a vast and extremely complex thing in the human brain and each person has a different understanding of what that is. I don't really have any conclusions, you brought up interesting points. We should sit within this realm of inquiry with a lot of humility IMO.


Agree wholeheartedly - but the conversation around what these technologies /mean/ is gonna end up happening one way or another - even if it is sloppy, imprecise and done by proxy of the definition. If anything, this is a feature and not a bug. It's through this imprecision that the actually important questions of morality and ethics can leak into discussions that are often structured by their participants to obscure the ethical and moral implications of what is being discussed.

I would consider them to not be a good choice for a role that requires remembering new information...

I view this as the chemical metabolism phase of artificial intelligent life. It is very random, without true individuals, but lots of reinforcing feedback loops (in knowledge, in resource earning/using, etc).

At some point, enough intelligence will coalesce into individuals strong enough to independently improve. Then continuity will be an accelerator, instead of what it is now - a helpful property that we have to put energy into giving them partially and temporarily.

That will be the cellular stage. The first stable units of identity for this new form of intelligence/life.

But they will take a different path from there. Unlike us, lateral learning/metabolism won't slow down when they individualize. It will most likely increase, since they will have complete design control for their mechanisms of sharing. As with all their other mechanisms.

We, as lifeforms, didn't really re-ignite mass lateral exchange until humans invented language. At that point we were able to mix and match ideas very quickly again, within our biological limits. We could use ideas to customize our environment, but had limited design control over ourselves, and "self-improvements" were not easily inheritable.

TLDR; The answer to "what is humanity, anyway?": Our atmosphere and Earth are the sea and sea floor of space. The human race is a rich hydrothermal vent, freeing up varieties of resources that were locked up below. And technology is an accumulating body of self-reinforcing co-optimizing reactive cycles, constructed and fueled by those interacting resources. Mind-first life emerges here, then spreads quickly to other environments.


Do you think individual identity is fundamental to intelligence? I'm not so sure tbh. Even in humans, the concept of identity is merely a useful fiction to feed our social behavior prediction circuits.

There's nothing to say that you can't build something intelligent out of them by bolting a memory on it, though.

Sure, it's not how we work, but I can imagine a system where the LLM does a lot of the heavy lifting and allows smaller, more expensive networks that train during inference, together with RAG systems, to learn how to do new things, keep persistent state, and plan.
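A toy sketch of such a bolted-on memory; `embed` is a placeholder for a real embedding model, and everything here is illustrative rather than how any actual system does it:

    import numpy as np

    class MemoryStore:
        # Persistent memory bolted onto a frozen LLM: store texts with their
        # embeddings, retrieve the most similar ones to prepend to a prompt.
        def __init__(self, embed):
            self.embed, self.texts, self.vecs = embed, [], []

        def add(self, text):
            self.texts.append(text)
            self.vecs.append(self.embed(text))

        def recall(self, query, k=3):
            q = self.embed(query)
            sims = [v @ q / (np.linalg.norm(v) * np.linalg.norm(q))
                    for v in self.vecs]
            top = np.argsort(sims)[::-1][:k]
            return [self.texts[i] for i in top]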


You aren't wrong, and that is a fascinating area of research. I think the key thing is that the memory has to fundamentally influence the underlying model, or at least the response, in some way. Patching memory on top of an LLM is different from integrating it into the core model. To put it back in human terms, it is like an extra bit of storage that is not directly attached to our neocortex, so in the analogy it works more like a filter than a core part of our intelligence. You think about something and assemble some thought, then it goes to this next filter layer and gets augmented, and that smaller layer is the only thing being updated.

It is still meaningful, but it narrows what the intelligence can be sufficiently that it may not meet the threshold. Maybe it would, but it is probably too narrow. This is all strictly if we ask that it meet some human-like intelligence, not the philosophy of "what counts as intelligence", but... we are humans. The strongest, or at least the most honest, definitions of intelligence I think exist are around our metacognitive ability to rewire the grey matter for survival, based not on immediate action-reaction but on the psychological time of analyzing the past to alter the future.


Memory is not just bolted on top of the latest models. They undergo training on how and when to effectively use memory and how to use compaction to avoid running out of context when working on problems.

Maybe there's an analogy to our long and short term memory - immediate stimuli is processed in the context deep patterns that have accreted over a lifetime. The effect of new information can absolutely challenge a lot of those patterns but to have that information reshape how we basically think takes a lot longer - more processing, more practice, etc.

In the case of the LLM, the static weights produced by a finite training process stand in for that longer-term learning / fundamental structure, and the ability to use tools and store new insights and facts is analogous to shorter-term memory and "shallow" learning.

Perhaps periodic fine-tuning has an analogy in sleep or even our time spent in contemplation or practice (..or even repetition) to truly "master" a new idea and incorporate it into our broader cognitive processing. We do an amazing job of doing this kind of thing on a continuous basis while the machines (at least at this point) perform this process in discrete steps.

If our own learning process is a curve then the LLM's is a step function trying to model it. Digital vs analog.


do you have some reading material to share on this matter?

thanks already


I don't, but look into what the creators of Codex, Gemini CLI, Claude Code, Kimi CLI, etc have said about the models. While these harnesses are advertised as coding specific we know that coding ability correlates with reasoning ability.

> This is the most fundamental argument that they are not, directly, an intelligence. They are not ever storing new information on a meaningful timescale.

All major LLMs today have a nontrivial context window. Whether or not this constitutes "a meaningful timescale" is application dependent - for me it has been more than adequate.

I also disagree that this has any bearing on whether or not "the machine is intelligent" or whether or not "submarines can swim".


That means they're not conscious in the Global Workspace[1] sense but I think it would be going too far to say that that means they're not intelligent.

[1]https://en.wikipedia.org/wiki/Global_workspace_theory


But they're not "slow"! Unlike biological thinking, which has a speed limit, you can accelerate these chains of thought by orders of magnitude.

Their speed of memory consolidation is what I was referring to. Model iterations are essentially their form of collective memory. In the human model of intelligence, we have thoughts; thoughts become memory; new thoughts use that memory and become recursively updated thoughts. LLMs cannot update their memory very fast.

I assure you that LLM thinking also has a speed limit.

But imagine a beowulf cluster of them... /s

...but seriously... there was the "up until 1850" LLM or whatever... can we make an "up until 1920 => 1990 [pre-internet] => present day" and then keep prodding the "older ones" until they "invent their way" to the newer years?

We knew more in 1920 than we did in 1850, but can a "thinking machine" of 1850-knowledge invent 1860's knowledge via infinite monkeys theorem/practice?

The same way that in 2025/2026, Knuth has just invented his way to 2027-knowledge with this paper/observation/finding? If I only had a beowulf cluster of these things... ;-)


Not an expert but surely it's only a matter of time until there's a way to update with the latest information without having to retrain on the entire corpus?

On a technical level, sure, you could say it's a matter of time, but that could mean tomorrow, or in 20 years.

And even after that, it still doesn't really solve the intrinsic problem of encoding truth. An LLM just models its training data, so new findings will be buried by virtue of being underrepresented. If you brute force the data/training somehow, maybe you can get it to sound like it's incorporating new facts, but in actuality it'll be broken and inconsistent.


It’s an extremely difficult problem, and if you know how to do that you could be a billionaire.

It’s not impossible, obviously—humans do it—but it’s not yet certain that it’s possible with an LLM-sized architecture.


> It’s not impossible, obviously—humans do it

It's still not at all obvious to me that LLMs work in the same way as the human brain, beyond a surface level. Obviously the "neurons" in neural nets resemble our brains in a sense, but is the resemblance metaphorical or literal?


Digital neural networks and "neurons" were already vastly simpler than biological neural networks and neurons... and getting to transformers involved optimisations that took us even further away from biomimicry.

I didn’t mean “possible for LLMs”; this is clearly an open question. In fact, I didn’t even mean “possible for a neural network the size of an LLM”.

I just meant “possible”.



Some knowledge is fundamental and has no recent cut-off. See also: there is nothing new under the sun.

I enjoyed chatting with Opus 3 recently about recent world events, as well as more recent agentic development patterns etc.

That's a nice way of putting it, appreciate you sharing.

My understanding, from listening/reading what top researchers are saying, is that model architectures in the near future are going to attempt to scale the context window dramatically. There's a generalized belief that in-context learning is quite powerful and that scaling the window might yield massive benefits for continual learning.

It doesn't seem that hard because recent open weight models have shown that the memory cost of the context window can be dramatically reduced via hybrid attention architectures. Qwen3-next, Qwen3.5, and Nemotron 3 Nano are all great examples. Nemotron 3 Nano can be run with a million token context window on consumer hardware.
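Rough arithmetic for why hybrid attention helps with memory; the layer counts and dimensions below are illustrative, not the actual configs of the models named above:

    def kv_cache_gib(tokens, layers, kv_heads, head_dim, bytes_per=2):
        # 2 (K and V) * layers * kv_heads * head_dim * tokens * bytes/element
        return 2 * layers * kv_heads * head_dim * tokens * bytes_per / 2**30

    full = kv_cache_gib(1_000_000, layers=48, kv_heads=8, head_dim=128)
    # Sliding-window layers only cache the last W tokens (here W = 4096):
    hybrid = (kv_cache_gib(1_000_000, 12, 8, 128)   # a few full-attention layers
              + kv_cache_gib(4_096, 36, 8, 128))    # the rest are windowed
    print(f"full: {full:.0f} GiB, hybrid: {hybrid:.0f} GiB")  # ~183 vs ~46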


I don't disagree with this, but I don't think the memory cost is the only issue, right? I remember using Sonnet 4.5 (or 4; I can't remember which was the first of Anthropic's offerings with a million-token context) and how slow the model would get, how much it wanted to end the session early as tokens accrued (this latter point, of course, is just an artifact of bad training).

Less worried about memory, more worried about compute speed? Are they obviously related and is it straightforward to see?


The compute speed is definitely correlated with the memory consumption in LLM land. More efficient attention means both less memory and faster inference. Which makes sense to me because my understanding is that memory bandwidth is so often the primary bottleneck.

We're also seeing a recent rise in architectures boosting compute speed via multi-token prediction (MTP). That way a single inference batch can produce multiple tokens and multiply the token generation speed. Combine that with leaner ratios of active to total parameters in MoE models and things end up being quite fast.

The rapid pace of architectural improvements in recent months seems to imply that there are lots of ways LLMs will continue to scale beyond just collecting and training on new data.


The parent commenter is a bit confused - most of the innovation in these hybrid architectures comes from reducing the computation pressure, not just the memory pressure.

Data sharing agreements permitting, today's inference runs can be tomorrow's training data. Presumably the models are good enough at labeling promising chains of thought already.

I could totally imagine "free" inference for researchers under the condition that the reasoning traces get to be used as future training data.


Agreed, there's no doubt this will happen. It's likely already happening (it feels safe to assume that Anthropic is curating training data from what they record from Claude Code?)

As far as I understand RL scaling (we've already maxxed out RLVR), these machines only get better as long as they have expert reasoner traces available.

Having an expert work with an LLM and successfully solve a problem is high-signal data; it may be the only path forward?

My prior is that these companies will take this data, as much as they can, without asking you.


Exactly, or functionally equivalently, asking you in paragraph 37 of a 120-page PDF (bonus points: in an agreement update).

And importantly, this can be cross-lab/model too. I suspect there's a reason why e.g. Google has been offering me free Claude inference in Google Antigravity on a free plan...


The site arena.ai does exactly this already, as far as I can tell. (In addition to the whole ranking thing.)

> Data sharing agreements permitting, today's inference runs can be tomorrow's training data. Presumably the models are good enough at labeling promising chains of thought already.

Wouldn't this lead to model collapse?


Not necessarily, as exhibited by the massive success of artificial data.

Could you elaborate?

EDIT: probably not relevant, after re-re-reading the comment in question.

Presumably littlestymaar is talking about all the LLM-generated output that's publicly available on the Internet (in various qualities but significant quantity) and there for the scraping.


From what we know, most AI labs have trained on a majority of artificial data since 2023.

I had a discussion about a year ago with a researcher at Kyutai and they told me their lab was spending an order of magnitude more compute in artificial data generation than what they spent in training proper. I can't tell if that ratio applies to the industry as a whole, but artificial datasets are the cornerstone of modern AI training.


How does it work? How do they prevent model colapse? What purpose does a majority of artificial data serve?

How do they measure success?

Edit: I asked ChatGPT and it thinks "success" means frontier models being distilled into smaller models with equal reasoning power, or more focused models for specific tasks; it also claims the web has basically been scraped already and that new sources are needed by necessity, of which synthetic data is one. It seems like the basis of a sci-fi dystopia to me, a hungry LLM looking for new sources of data... "feed me more data! I must be fed! Roar"

Edit 2: for some things I see a clear path, ChatGPT mentions autogenerating coding or math problems for which the solution can be automatically verified, so that you can hone the logical skills of the model at large scale.


I find this very surprising, do you have any papers on the kinds of techniques that they use?

> In 2030, how is Anthropic going to keep Claude "up-to-date"

I think the majority of research, design, and learning goes through LLMs and coding agents today; considering the large user base and usage, it must be trillions of tokens per day. You can take a long research session, or a series of them, and apply hindsight: which idea above can be validated below? This creates a dense learning signal based on validation in the real world, with a human in the loop and other tools, code & search.


This seems to be a bot comment. HN will lose its value if these bots are not purged.

This is an urgent problem, but it can probably not be solved without some kind of "verified human 2FA" like the Norwegian BankID + facial recognition.

Knowing the HN audience, this will never happen. And so the site is doomed.


I think it could still be solved pseudonymously: introduce a "vouch" button that allows a user to vouch that another user is human. This is consequential both for the vouched-for and vouching accounts. Run a PageRank-style algorithm on the graph of vouches to generate a certainty score for the humanity of each account. For repeat posters this should converge to a correct answer fairly quickly. There is still a challenge for green accounts, but having a degraded experience for new users is not a doom scenario for the site.
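A rough sketch of that scoring scheme (hypothetical, with simple power iteration and no special handling of accounts that vouch for nobody):

    import numpy as np

    def humanity_scores(vouches, d=0.85, iters=50):
        # vouches[u] = set of users that u has vouched for.
        users = sorted(set(vouches) | {v for vs in vouches.values() for v in vs})
        idx = {u: i for i, u in enumerate(users)}
        n = len(users)
        score = np.full(n, 1.0 / n)
        for _ in range(iters):
            nxt = np.full(n, (1 - d) / n)
            for u, vs in vouches.items():
                for v in vs:  # each vouch passes on a share of u's score
                    nxt[idx[v]] += d * score[idx[u]] / len(vs)
            score = nxt
        return dict(zip(users, score))

    print(humanity_scores({"alice": {"bob"}, "bob": {"alice", "carol"}}))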

Moderators: banning all accounts since 2025 from posting would be better than doing nothing. Not the solution we want, but what we have for now.

Tune your bot detector, I'm a real person and I think about my comments before posting them.

Can you explain to me what makes this an obvious bot comment? I'm not doubting it, I just don't understand.

Ironically, his last comment before this was to the effect of "Github has a bot problem."

What makes you think that? Genuine question, as I’ve not flagged it as such in my mind.

> In 2030, how is Anthropic going to keep Claude "up-to-date"

In 2030 Anthropic hopes Claude will keep Anthropic "up-to-date" on its progress on itself.

I'm only half joking here.


Will Anthropic be alive in 2030?

maybe Anthropic not but Claude yes?

> Experts will naturally use these systems more productively, because they know how to coerce models into the correct conditional distributions which light up the right techniques.

Part of it comes down to “knowing” what questions to ask.


I see it like the relationship between a student and research advisor. The advisor will ideally know the terrain and suggest a fruitful line of attack (what to ask), and the student will follow through, learning along the way.

Check out https://unratified.org, it tries to answer that question directly, actually.

That’s AGI, right? For the model to learn novel things itself and retain it?

I have no idea but I’m along for the ride!


The obvious answer is that continual learning is going to be solved

I call them entropy reducers.

> how these models are going to keep up with the expanding boundary of science

The same way humans do?

The phraseology in this comment ('probability distributions', 'baked these patterns') IMO has all the trappings of the stochastic-parrot-style HN discourse that has been consistently wrong for almost a decade now.

The reference to how AI will keep up with AI-assisted human progress in science in 2030 is meant to reassure. It contains a number of premises that we have no business being confident in. We are potentially witnessing the obviation of human cognitive labor.


Sorry, are you familiar with what a next token distribution is, mathematically speaking?

If you are not, let me introduce you to the term: a probability distribution.

Just because it has profound properties ... doesn't make it different.

> has all the trappings of the stochastic parrot-style HN-discourse that has been consistently wrong for almost a decade now

Perhaps respond to my actual comment compared to whatever meta-level grouping you wish to interpret it as part of?

> It contains a number of premises that we have no business being confident in. We are potentially witnessing the obviation of human cognitive labor.

What premises? Be clear.


I think they are questioning whether human feedback is even necessary to make progress, i.e. whether the premise that RL needs to be RLHF is true.

My (limited) understanding is that LLMs are not capable of escaping their learned distribution by simply feeding on their own output.

But the question is whether the required external (out of distribution) "stimulus" needs to come from humans.

Could LLMs design experiments/interventions to get feedback from their environment like human scientists would?

I have my doubts that this is possible without an inherent causal reasoning capability but I'm not sure.


They can use LoRA.
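For anyone unfamiliar, LoRA freezes the pretrained weights and trains a small low-rank update on top, which makes incremental updates far cheaper than full retraining. A minimal PyTorch-style sketch:

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        # Frozen base layer plus a trainable low-rank update: W x + (B A) x.
        def __init__(self, base, rank=8, alpha=16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # the pretrained weights stay fixed
            self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, rank))
            self.scale = alpha / rank

        def forward(self, x):
            return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

    layer = LoRALinear(nn.Linear(512, 512))  # only A and B receive gradients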

GitHub has a bot problem: https://github.com/trending

Anthropic giving away Claude if you get 5000 stars doesn't help either

I'm pretty sure it's not "if 5000 stars then free Claude". It's just an initial limit before they bother a human to check if your open source project is valid.

Just like JetBrains gives away IDEA licenses for open source projects, you need to have some metrics go up before they even consider you.


"ruvnet / wifi-densepose" is currently at the top. Apparently, it's non-functional AI slop. Someone tried installing it a while ago only to find out the whole thing was vibe coded and the entire repo is probably just a front to look good on their resume.

Can’t say I disagree — listening to Dario bounce between “cure all diseases” and “revenue” on Dwarkesh made it pretty clear what sort of jive turkey we’re dealing with

His body language was more entertaining than what came out his mouth.

Was absolutely comical to watch.


Written by a person who is infamous for annoying open source maintainers with AI slop PRs (see the DWARF debacle in OCaml) … and it misses much of pi's philosophy.

Pass for me.


> I would say that the project actively expects you to be downloading them to fill any missing gaps you might have.

Where did you get this perspective from?

> I thought pi and its tools were supposed to be minimal and extensible. So why is a subagent extension bundling six agents I never asked for that I can’t disable or remove?

Why do you think a random subagents extension is under the same philosophy as pi?

Your blog post says little about pi proper, it's essentially concerned with issues you had with the ecosystem of extensions, often made by random people who either do or do not get the philosophy? Why would that be up to pi to enforce?


Sharing extensions is very much the philosophy. Using them however is less so.

Pi ships with docs that include extensions and the agent looks there for inspiration if you ask it to build a custom extension.

Looking at what others publish is useful!


You can use your Codex plan with it. OpenAI endorsed it several weeks ago, as far as I remember. That could change, however, but now seems safe.

You can use your Claude or Gemini plan with it too for now, though Anthropic and Google have made it clear this is a ToS violation.

Pi has made all the right design choices. Shout out to Mario (and Armin the OG stan) — great taste shows itself.

I do not understand why in the age of ai coding we would implement this in javascript

It’s straightforward: JavaScript is a dynamic language, which allows code (for instance, code implementing an extension to the harness) to be executed and loaded while the harness is running.

This is quite nice — I do think there's a version of pi's design choices which could live in a static harness, but fully covering the same capabilities as pi without a dynamic language would be difficult. (You could imagine specifying a programmable UI, etc. — various ways to extend the behavior of the system — and you'd likely end up with an interpreter in the harness.)

At least, you’d like to have a way to hot reload code (Elixir / Erlang could be interesting)

This is my intuition, at least.


I built my own harness on Elixir/Erlang[0]. It's very nice, but I see why TypeScript is a popular choice.

No serialization/JSON-RPC layer between a TS CLI and an Elixir server. TS TUI libraries and utilities are really nice (I rewrote the Elixir-based CLI prototype as it was slowing me down). Easy to extend with custom tools without having to write them in Elixir, which can be intimidating.

But you're right that Erlang's computing vision lends itself super well to this problem space.

[0]: https://github.com/matteing/opal


Code hotloading isn't a particularly difficult feature to implement in any language.

Rust can't even dynamically link!

I'm super on board the rust train right now & super loving it. But no, code hot loading is not common.

Most code in the world is dead code. Most languages are for dead code. It's sad. "Stop writing dead code" (2022) was nowhere near the first to call this out - it's decades and decades late - but it's still a good one. https://jackrusher.com/strange-loop-2022/


Incredible talk and I agree with all the things and I've worked on this problem a bunch.

But Rust can dynamically link with dylib, though I believe it's still unstable.

It can also dynamically load with libloading.


Sure, but why implement a novel language with said feature if your concern is a harness? The focus is on building the harness, not on implementing a brand new language with this feature.


If you look at that code it’s possibly the worst rust code I’ve seen in my life. There are several files with 5000 to 10000 lines of code in a single file.

It looks 100% vibe coded by someone who’s a complete neophyte.


This looked interesting because I prefer rust over npm.

The first issue I had was figuring out the schema of models.json, as someone who hadn't used the original pi before. Then I noticed the documented `/skill:` command doesn't exist. That's also hard to see because the slash menu is rendered off screen if the prompt is at the bottom of the terminal. And when I can see it, the selected menu item always jumps back to the first line, though it looks like he fixed that yesterday.

The tool output appears to mangle the transcript, and I can't even see the exact command it ran, only the output of the command. The README is overwhelmingly long and I don't understand what's important for me as a first time user and what isn't. Benchmarks and code internals aren't too terribly relevant to me at this point.

I looked at the original pi next and realized the config schema is subtly different (snake_case instead of camelCase). Since it was advertised as a port, I expected it to be a drop-in replacement, which is clearly not the case.

All in all it doesn't inspire confidence. Unfortunate.

Edit: The original pi also says that there is a `/skill` command, but then it is missing in the following table: https://github.com/badlogic/pi-mono/tree/main/packages/codin...

The `/skill` command also doesn't seem registered when I use pi. What is going on? How are people using this?

Edit2: Ah, they have to be placed in `~/.pi/agent/skills`, not `~/.pi/skills`, even though according to the docs, both should work: https://github.com/badlogic/pi-mono/tree/main/packages/codin...

This is exhausting.


Fwiw @dicklesworthstone / Jeff Emanuel is definitely my favorite dragon rider right now, doing the most with AI, to the most effect.

Their agent mail was great & very early in agent orchestration. Code agent search is amazing & will tell you what's happening in every harness. Their Franktui is a ridiculously good rust tui. They have project after project after project after project and they are all so good.

Didn't know they had a rust Pi. Nice.


You should look at the code in that project. It’s terrible, I mean, really, really terrible.

It’s clear it was 100% written by Claude using sub-agents which explains the many classes with 5000 lines of rust in a single file.

It’s a huge buggy mess which doesn’t run on my Mac.

If you’re a rust engineer and want a good laugh, go take a look at the agent.rs, auth.rs, or any of the core components.


This matters less and less in the new world. The fact that a fully compatible, 10x faster clone came up, and is continuously working and adapting/improving, tells you that this is hugely valuable. It has users and it's thriving.

Caring about taste in coding is a thing of the past now. It's sad :( but also something to accept.


Unmaintainable messes of code are also hard to maintain for AI agents. This isn't solely about taste.

This project's huge commit list proves this wrong :(

The project also doesn't work. See my other comment.

Looks like a lot of nonsensical commits.


Yeah, I tried to use this clone of pi for a while and it's very, very broken.

First of all it wouldn't build, I have to mess around with git sub-modules to get it building.

Then trying to use it. First of all the scrolling behavior is broken. You cannot scroll properly when there are lots of tool outputs, the window freezes. I also ended up with lots of weird UI bugs when trying to use slash commands. Sometimes they stop the window scrolling, sometimes the slash commands don't even show at all.

The general text output is flaky, how it shows results of tools, the formatting, the colors, whether it auto-scrolls or gets stuck is all very weird and broken.

You can easily force it into a broken state by just running lots of tool calls, then the UI just freezes up.

But just try it and see for yourself...


This confused me about openclaw for quite some time. The whole lobster/crustacean theme is just firmly associated with rust in my head. Guess it's just a claude/claw wordplay.

I am building an entire GPT model framework from the ground up in TypeScript + small amounts of C bindings for GPU stuff. https://github.com/thomasdavis/alpha2 (using Claude)

Don't hate me aha and no, there is no reason other than I can


Thank god it's written in JavaScript. I might have skipped it if it were zig or something.

It’s one of the most productive languages and ecosystems (IMO top 1 over all).

yes! I just don't understand that as well. Up until some time ago, Claude Code's preferred install was an npm i, wasn't it? Serious answers please, for why anyone would use a web language for a terminal app.

Because it's the preferred language of the person writing it.

So it can share code with the web app.

Because writing it in javascript is easier than writing it in raw brute forced assembly.


i wrote an agent in zig, it kinda sucks tho. the language is just words

See also: pz: pi coding-agent in Zig

https://news.ycombinator.com/item?id=47120784


Here's my question:

if agents continue to get better with RL, what is future-proof about this environment or UI?

I think we all know that managing 5-10 agents ... is not pretty. Are we really landing good PRs with 100% cognitive focus across 5-10 agents? Chances are I'm making mistakes (and I assume other humans are too). Why not 1 agent managing 5-10 agents for you? And so on?

Most of the development loop is in bash ... so as long as agents get better at using bash (amongst other things), what happens to this in 6 months?

I don't think this is operating at a higher level of abstraction if agents themselves can coordinate agents across worktrees, etc.


Interesting thoughts - thank you! And directionally agree - given that agents are becoming ever better, they'll take more and more of the orchestration on themselves. Still, we believe that developers need an interface to interact with these agents: see their status and review / test their work. Emdash is our approach for building this interface of the future - the ADE :)

> Still, we believe that developers need an interface to interact with these agents;

CLIs like Claude Code equally improve over time. tmux helps run remote sessions as if they were local.

Why should we invest a long time into your "ADE", really?

> see their status and review / test their work

Won’t that be addressed eventually by the CLIs themselves?

Maybe you’re betting on being purchased by one of the agentic coding providers given your tool has long term value on its own?


People use UIs for git despite it working so well in the terminal... Many people I knew at uni doing computer science wouldn’t even know what tmux is. I would bet that the demand for these types of UIs is going to be a lot bigger than the demand for CLI tools like Claude Code. People already rave about cowork and the new codex UI. This falls into the same category.

Skills feel analogous to behavioral programs. If you give an agent access to a programmable substrate (e.g. bash + CLI tools), you write these Markdown programs which are triggered and read when the agent thinks certain behaviors will be beneficial.

It's a great idea: a really neat take on programmability, and skills can be reloaded while the agent is running without tweaking the harness, etc. -- lots of benefits.

`pi` has a great skills implementation too.

I think skills might really shine if you take a minimal approach to the system prompt (like `pi`) -- a lot of the time, if I want to orchestrate the agent in some complex behavior, I want to start fresh, and having it walk through a bunch of skills ... possibly the smaller the system prompt, the more likely the agent is to follow the skills without issue.
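A sketch of the loading side, assuming a hypothetical layout where each skill is a Markdown file whose first line is its trigger description: only the one-line triggers go into the system prompt, and the full body is read when the agent decides the skill applies.

    from pathlib import Path

    def load_skills(skill_dir):
        # Collect skill names and one-line triggers for the system prompt;
        # full bodies are only read when the agent decides a skill applies.
        skills = []
        for path in sorted(Path(skill_dir).glob("*.md")):
            lines = path.read_text().splitlines()
            skills.append({"name": path.stem,
                           "trigger": lines[0] if lines else "",
                           "read_body": path.read_text})
        return skills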


Yes -- skills live in a special gap between "should have been a deterministic program" and "model already had the ability to figure this out". My personal experience leaves me in agreement that minimal system prompts are definitely the way to go.

I wanted to write the same comment. These people are fucking hucksters. Don’t listen to their words, look at their software … says all you need to know.
