Thanks for your comment! Maybe it's time for picol2.c: the same number of lines of code, but with floats and [expr]. I'm going to use a few hours today to get there. Such small changes would make it more practical for real use cases where one wants to retain understandability.
It depends on the use case. For instance, you open a socket, write any value you have to it without serialization, read it on the other side, and the data transfer is done.
The convenience of not having to marshal data over a network is certainly a use case. But I'll admit that two of the worst programs I ever saw were written in Perl and TCL. Somehow just a big jumble of inscrutable regexes.
When "everything is a string" then you have no choice but to literally treat everything as a string. Painful. Very cool project though.
The objections to tcl I see most often seem to reflect an outdated or incomplete understanding of the language and its capabilities.
As with many interpreted languages, maybe it's just too easy to iterate interactively with code until it "just works", rather than taking the time to design the code before writing it, leading to the perception that it is a "write-only" language.
However, despite its reputation, even Perl can be written in a way that is human-readable. I agree with you that it is more a reflection of the programmer or the work environment than of the language itself.
> When "everything is a string" then you have no choice but to literally treat everything as a string.
As someone who has been developing in tcl almost daily for more than 30 years, writing both object-oriented and imperative code, I have not found it necessary to think this way.
Can you explain what leads you to this conclusion?
It is absolutely possible and not even that hard. If you use Redis Vector Sets you will easily see 20k-50k queries per second (depending on hardware) with tens of millions of entries, and the results don't get much worse if you scale further. Of course, all of that assumes serving data from memory, as Vector Sets do. Note: I'm not talking about the RedisSearch vector store, but the new "vector set" data type I introduced a few months ago. The HNSW implementation of vector sets (AGPL) is quite self-contained and easy to read if you want to check how to achieve similar results.
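For reference, driving a vector set from Python via redis-py looks roughly like the sketch below; the VADD/VSIM argument order is written from memory, so treat it as an assumption and double-check it against the docs:

```python
# Rough sketch: adding vectors to a Redis vector set and querying it.
# Assumes Redis 8+ with vector sets; the exact VADD/VSIM syntax should be
# verified against the official documentation.
import redis

r = redis.Redis(decode_responses=True)

# VADD key VALUES <dim> <components...> <element>
r.execute_command("VADD", "vs", "VALUES", 3, 0.1, 0.2, 0.3, "item:1")
r.execute_command("VADD", "vs", "VALUES", 3, 0.9, 0.1, 0.4, "item:2")

# VSIM key VALUES <dim> <components...> COUNT <n>  ->  most similar elements
print(r.execute_command("VSIM", "vs", "VALUES", 3, 0.1, 0.2, 0.25, "COUNT", 2))
```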
Author here. Thanks for sharing—always great to see different approaches in the space. A quick note on QPS: throughput numbers alone can be misleading without context on recall, dataset size, distribution, hardware, distance metric, and other relevant factors. For example, if we relaxed recall constraints, our QPS would also jump significantly. In the VectorDBBench results we shared, we made sure to maintain (or exceed) the recall of the previous leader while running on comparable hardware—which is why doubling their throughput at 8K QPS is meaningful in that specific setting.
You're absolutely right that a basic HNSW implementation is relatively straightforward. But achieving this level of performance required going beyond the usual techniques.
Yep, you are right, and also: quantization is a big factor here. For instance int8 quantization has a minimal effect on recall, but makes the dot product between vectors much faster, and speeds things up a lot. The number of components in the vectors also makes a huge difference. Another thing I didn't mention is that the Redis implementation (vector sets), for instance, is threaded, so the numbers I reported are not about a single core. Btw I agree with your comment, thank you. What I wanted to say is simply that the results you get, and the results I get, are not "out of this world", and are very credible. Have a nice day :)
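To make the quantization point concrete, here is a minimal numpy sketch (not the actual vector sets code, which is written in C) of symmetric int8 quantization and the approximate dot product computed on the quantized values; in a real implementation the speedup comes from SIMD-friendly int8 math and lower memory traffic:

```python
# Minimal illustration of int8 quantization for dot products (numpy sketch,
# not the Redis vector sets implementation).
import numpy as np

def quantize_int8(v):
    """Map a float32 vector to int8 plus a per-vector scale factor."""
    scale = np.abs(v).max() / 127.0
    return np.round(v / scale).astype(np.int8), scale

rng = np.random.default_rng(42)
a = rng.standard_normal(768).astype(np.float32)
b = rng.standard_normal(768).astype(np.float32)

qa, sa = quantize_int8(a)
qb, sb = quantize_int8(b)

exact = float(np.dot(a, b))
# Accumulate in int32 to avoid int8 overflow, then undo the per-vector scaling.
approx = int(np.dot(qa.astype(np.int32), qb.astype(np.int32))) * sa * sb
print(f"exact={exact:.4f} approx={approx:.4f}")  # the two values stay close
```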
Appreciate the thoughtful breakdown—you're absolutely right that quantization, dimensionality, and threading all play a big role in performance numbers. Thanks for the kind words and for engaging in the discussion. Wishing you a happy Year of the Horse—新春快乐,马年大吉!
Things are a bit more complicated. Actually Redis the company (Redis Labs, and previously Garantia Data) offered to hire me since the start, but I was at VMware, later at Pivotal, and just didn't care; I wanted to stay "super partes" out of idealism. But Pivotal and Redis Labs actually shared the same VC, so it made a lot more sense to move to Redis Labs and work there with the same level of independence, and that's what happened. Once I moved to Redis Labs a lot of good things happened that made Redis mature much faster: we had a core team all working on the core, and I was no longer alone when there were serious bugs, improvements to make, and so forth. During those years many good things happened, including Streams, ACLs, memory reduction work, modules, and in general things that made Redis more solid.

To be maintained at scale, open source software needs money, so in the past we tried hard to avoid moving away from BSD. But in the new hyperscaler situation it was impossible to avoid, I guess. I was no longer with the company at that point; I believe the bad call was going SSPL, a license very similar to AGPL but not accepted by the community. Now we are back to AGPL, and I believe that in the current situation this is a good call. Redis never stopped: 1. providing the source on Github and continuing the development; 2. releasing it under a source-available license (not OSI approved but practically very similar to AGPL); 3. looking for a different way to do it... and indeed Redis returned to AGPL a few months after I was back, maybe because I helped a bit, but inside the company there was since the start a big slice that didn't accept the change. So Redis is still open source software and still maintained. I can't see a parallel here.
The search for speed is vain. Often Claude Code Opus 4.6, on hard enough problems, can give the impression of moving fast without really making progress because it lacks focus on what matters. Then you spin up the much slower GPT 5.3-Codex and it fixes everything in 3 minutes of doing the right thing.
What codex often does for this is write a small Python script and execute it to bulk rename things, for example.
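Something along these lines (the identifier names and the src/ path are made up for illustration):

```python
#!/usr/bin/env python3
# Illustrative bulk-rename script of the kind an agent might generate on the
# fly; the identifiers and the src/ directory are hypothetical examples.
import pathlib
import re

OLD, NEW = "old_name", "new_name"  # hypothetical identifiers to swap
pattern = re.compile(rf"\b{re.escape(OLD)}\b")

for path in pathlib.Path("src").rglob("*.py"):
    text = path.read_text()
    updated, count = pattern.subn(NEW, text)
    if count:
        path.write_text(updated)
        print(f"{path}: {count} replacement(s)")
```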
I agree that there is a use for fast "simpler" models; there are many tasks where the regular codex-5.3 is not necessary, but I think it's rarely worth the extra friction of switching from regular 5.3 to 5.3-spark.
I will always take more speed. My use of LLMs always comes back to doing something manually, from reviewing code to testing it to changing direction. The faster I can get the LLM part of the back-and-forth to complete, the more I can stay focused on my part.
disagree. while intelligence is important, speed is especially important when productionizing AI. it’s difficult to formalize the increase in user experience per increase in TPS but it most definitely exists.
Hi! This model is great, but it is too big for local inference. Whisper medium (the "base" IMHO is not usable for most things, and "large" is too large) is a better deal for many environments, even if the transcription quality is noticeably lower (and even if it does not have a real online mode). But... it's time for me to check the new Qwen 0.6 transcription model. If it works as well as their benchmarks claim, that could be the target for very serious optimizations and a no-deps inference chain designed from the start for CPU execution, not just for MPS, since many times you want to install such transcription systems on servers rented online via Hetzner and other similar vendors. So I'm going to handle it next, and if it delivers, it's really time for big optimizations covering specifically the Intel, AMD, and ARM instruction sets, potentially also thinking about 8-bit quants if the performance remains good.
Same experience here with Whisper: medium is often not good enough. The large-turbo model, however, is pretty decent and, on Apple silicon, fast enough for real-time conversations. The addition of the prompt parameter can also help with transcription quality, especially when using domain-specific vocabulary. In general Whisper.cpp is better at transcribing full phrases than at streaming.
And let's not forget: for many use cases more than just English is needed. Unfortunately, right now most STT/ASR and TTS systems focus on English plus 0-10 other languages. Thus, being able to add more languages or domain-specific vocabulary with reasonable effort would be a huge plus for any STT or TTS.
1. In the real world, for a similar task, there is little reason not to give the agent access to all the papers about optimization, the ISA PDFs, and MIT-licensed compilers of all kinds. It will perform much better, and that is evidence that the "uncompressing GCC" idea is just a claim (but point 2 shows this even more).
2. Of all the tasks, the assembler is the part where memorization would help the most. Instead the LLM can't perform without the ISA documentation that it saw repeated countless times during pre-training. Guess what?
3. Rust is a bad language for the test, as a first target. If you want an LLM-coded C compiler in Rust and you have LLM experience, you would go C compiler first -> Rust port. Rust is hard when there are mutable data structures with tons of references around, and a C compiler is exactly that. Composing complexity from different layers is an LLM anti-pattern that anyone who has worked a lot with automatic programming knows very well.
4. In the real world, you don't do a task like that without steering. And steering will do wonders. That's not to say the experiment was ill conceived. The fact is that the experimenter was trying to make a different point than the one the Internet took away (as usual).
> the experimenter was trying to make a different point than the one the Internet took away (as usual)
All of your points are important, but I think this is the most important one.
Having written compilers myself, I'd say $20k in tokens to get to a foundation for a new compiler with the feature set of this one is a bargain. Now, the $20k excludes the time needed to set up the harness, so the total cost would be significantly higher, but still.
The big point here is that the researchers in question demonstrated that a complex task such as this could be achieved shockingly cheaply, even when the agents were intentionally forced to work under unrealistically harsh conditions, with instructions to include features (e.g. SSA form) that significantly complicated the task but made the problem closer to producing the foundation for a "proper" compiler rather than a toy compiler, even if the outcome isn't a finished, production-ready, multi-arch C compiler.
1. Take long pauses: 1h of work, then stop for 30 minutes or more. The productivity gain should leave you more time to rest. Alternatively, work just 50% of the time, 2h in the morning and 2h in the evening instead of 8 hours, while still trying to deliver more than before.
2. Don't mix N activities. Work in a very focused way on a single project, making meaningful progress.
3. Don't be too open-ended in the changes you make just because you can now do them quickly. Do what really matters.
4. When you are away, put an agent on the right rails to iterate and potentially produce some very good results in terms of code quality, security, speed, testing, ... This increases productivity without stressing you. When you get back, inspect the results, discard everything that is trash, and keep the gems, if any.
5. Be minimalistic even if you no longer write the code. Prompt the agent (and write your AGENT.md file) to be focused, to avoid adding useless dependencies or complexity, to keep the line count low, and to accept an improvement only when the complexity cost/gain ratio is adequate.
6. Turn your flow into specification writing. Stop and write your specifications, even for a long time, without interruptions. This will greatly improve the output of the coding agents. And it is a moment of calm, focused work for you.
(1) is not something the typical employee can do, in my experience. They're expected to work eight hours a day. Though I suppose the breaks could be replaced with low-effort, low-brainpower work to implement a version of that.
Work for a smaller company with more reasonable expectations of a knowledge worker.
You're an engineer, not a manager, or a chef, or anything else. Nothing you do needs to be done Monday-Friday between the hours of 8 and 5 (except for meetings). Sometimes it's better if you don't do that, actually. If your work doesn't understand that, they suck and you should leave.
1) Is this for founders? Because employees surely can't do this. With new AI surveillance tech, companies are looking over our shoulders even more than before.