More

theptip · 2026-04-17T01:15:17 1776388517

I normally agree with this, but they objectively did lower the default effort level, and this caused people to get worse performance unexpectedly.

And it does seem likely to me that there were intermittent bugs in adaptive reasoning, based on posts here by Boris.

So all told, in this case it seems correct to say that Opus has been very flaky in its reasoning performance.

I think both of these changes were good faith and in isolation reasonable, ie most users don’t need high effort reasoning. But for the users that do need high effort, they really notice the difference.

theptip · 2026-04-16T15:14:36 1776352476

11% further along the particular bell curve of SWE-bench. Not really easy to extrapolate to real world, especially given that eg the Chinese models tend to heavily train on the benchmarks. But a 10% bump with the same model should equate to “feels noticeably smarter”.

A more quantifiable eval would be METR’s task time - it’s the duration of tasks that the model can complete on average 50% of the time, we’ll have to wait to see where 4.7 lands on this one.

theptip · 2026-04-16T02:11:01 1776305461

It’s a good thing to keep in mind, but LLM + scaffolding is clearly superior. So if you just use vanilla LLMs you will always be behind.

I think the important thing is to avoid over-optimizing. Your scaffold, not avoid building one altogether.

fragmede · 2026-04-16T02:35:09 1776306909

It's wild to me that a paragraph or 7 of plain English that amounts to "be good at things" is enough to make a material difference in the LLM's performance.

l33tman · 2026-04-16T06:49:30 1776322170

As the base is an auto-regressive model that is capable of generating more or less any kind of text, it kind of makes sense though. It always has the capabilities, but you might want it to emulate a stupid analysis as well. So you're leading in with a text that describes what the rest of the text will be in a pretty real sense.

jtbayly · 2026-04-16T16:20:40 1776356440

I read once (so no idea if it is true) that in voice lessons, one of the most effective things you can do to improve people's technique is to tell them to pretend to be an opera singer.

chrisjj · 2026-04-16T10:52:22 1776336742

There will always be bosses who/which think telling workers to work well works well.

AlexCoventry · 2026-04-16T03:47:22 1776311242

They have no values of their own, so you have to direct their attention that way.

theptip · 2026-04-12T00:37:43 1775954263

It’s $20k for all the vulns found in the sweep, not just that one.

And last security audit I paid for (on a smaller codebase than OpenBSD) was substantially more than $20k, so it’s cheaper than the going price for this quality of audit.

theptip · 2026-04-11T05:37:02 1775885822

I think them stating this very simple policy should also be read as them explicitly not making a more restrictive policy, as some kernel maintainers were proposing.

Applejinx · 2026-04-11T11:39:18 1775907558

From everything I'm seeing in the industry (I'm basically a noncoder choosing to not use AI in the stuff that I make, and privy to the private work experience of coders and creators also in that field because of human social contacts), I feel like I can shed a bit of light.

It looks to me like a more restrictive policy will be flat-out impossible.

Even people I trust are going along with this stuff, akin to CAD replacing drafting. Code is logic as language, and starting with web code and rapidly metastasizing to C++ (due to complexity and the sheer size of the extant codebase, good and bad) the AI has turned slop-coding to a 'solved problem'. If you don't mean to do the best possible thing or a new thing there is no excuse for existing as a coder in the world of AI.

If you do expect to do a new thing or a best thing, in theory you're required to put out the novel information as AI cannot reach it until you've entered it into the corpus of existing code the AI's built on. However, if you're simply recombining existing aspects of the code language in a novel way, that might be more reachable… that's probably where 'AI escape velocity' will come from should it occur.

In practice, everybody I know is relegating the busywork of coding to AI. I don't feel social pressure to do the same but I'm not a coder. I'm something else that produces MIT-licensed codebases for accomplishing things that aren't represented in code AS code, rather it's for accomplishing things that are specific and experiential. I write code to make specific noises I'm not hearing elsewhere, and not hearing out of the mainstream of 'sound-making code artifacts'.

Therefore, it's impractical for Linux to take any position forbidding AI-assisted code. People will just lie and claim they did it. Is primitive tab-complete also AI? Where's the line? What about when coding tools uniformly begin to tab-complete with extensive reasoning and code prototyping? I already see this in the JetBrains Rider editor I use for Godot hacking, even though I've turned off everything I can related to AI. It'll still try to tab-complete patterns it thinks it recognizes, rarely with what I intend.

And so the choice is to enforce responsibility. I think this is appropriate because that's where the choices will matter. Additions and alterations will be the responsibility of specific human people, which won't handle everything negative that's happening but will allow for some pressures and expectations that are useful.

I don't think you can be a collaborative software project right now and not deal with this in some way. I get out of it because I'm read-only: I'm writing stuff on a codebase that lives on an antique laptop without internet access that couldn't run AI if it tried. Very likely the only web browsers it can run are similarly unable to handle 2026 web pages, though I've not checked in years. You've only got my word for that, though, and your estimation of my veracity based on how plausible it seems (I code publically on livestreams, and am not at all an impressive coder when I do that). Linux can't do what I do, so it's going to do what Linux does, and this seems the best option.

alfiedotwtf · 2026-04-11T13:50:56 1775915456

You can refuse to use AI personally, but why would you not help yourself when you can?

… my dad is 86 and only after I signed him up to Claude could he write Arduino code without a phone call to me after 5 minutes of trying himself. So now, he’s spending 4+ hours at a time focused writing code and building circuits of things he only dreamt about creating for decades.

Unless you’re doing something for the personal love of the craft and sharpening your tools, use every advantage you can get in order to do the job.

But… as above, if you’re doing it for the love of it, sure - hand crafted code does taste better and you know all the ingredients are organic

Applejinx · 2026-04-11T21:54:50 1775944490

Nah. I'm only interested in the bits it doesn't know. Why would someone else's regurgitated whatever be what I wanted, why would that be help in any way?

ghighi7878 · 2026-04-16T18:03:48 1776362628

Everything is a copy of a copy of a copy

alfiedotwtf · 2026-04-12T02:03:28 1775959408

My dad isn’t a programmer but he’s done ho by electronics for his whole life. It’s now helping him reach way beyond what he’s ever been able to do himself. And he’s never been so excited about anything for at least the past 20 years!

vips7L · 2026-04-11T14:45:16 1775918716

Or just let people do the job the way they want.

theptip · 2026-04-08T17:39:08 1775669948

Thats not what a moat means in business. It doesn’t mean impregnable, it just means expensive or difficult to cross.

It is absolutely a moat if only $1T companies can afford the capex to compete.

6thbit · 2026-04-08T19:53:32 1775678012

non-zero already existing companies can afford to compete, so its not a great preliminary line of defense against your main enemies.

If the existing competition's turn around time to out-benchmark you is in the order of weeks, its an absolutely terrible moat.

theptip · 2026-04-07T23:20:51 1775604051

Public APIs get distilled, this is why Deepseek and Qwen are so competitive.

I am very confident that frontier models won’t be public at strong AGI levels, and certainly not at superhuman levels.

theptip · 2026-04-07T04:06:39 1775534799

I’ve been playing with

    /loop 5m check if you have any actionable tasks

for this scenario.

giancarlostoro · 2026-04-07T13:21:59 1775568119

Is that baked in?

theptip · 2026-04-07T18:19:40 1775585980

Yeah, new feature dropped a couple weeks ago.

https://code.claude.com/docs/en/scheduled-tasks

theptip · 2026-04-05T19:24:08 1775417048

IIUC the operating theory is that a short burst of acute inflammatory stimulus clears out the system to below the prior baseline.

theptip · 2026-04-01T19:31:36 1775071896

They will certainly make human soldiers unviable. (I draw mostly dystopian conclusions from that prediction.)

throwaway85825 · 2026-04-01T19:37:51 1775072271

Or it will just lead to lopsided massacres like the maxim gun did.