More

bjackman · 2026-03-18T17:41:49 1773855709

Style and structure is not the goal here, the reason people are interested in it is to find bugs.

Having said that, if it can save maintainers time it could be useful. It's worth slowing contribution down if it lets maintainers get more reviews done, since the kernel is bottlenecked much more on maintainer time than on contributor energy.

My experience with using the prototype is that it very rarely comments with "opinions" it only identifies functional issues. So when you get false positives it's usually of the form "the model doesn't understand the code" or "the model doesn't understand the context" rather than "I'm getting spammed with pointless advice about C programming preferences". This may be a subsystem-specific thing, as different areas of the codebase have different prompts. (May also be that my coding style happens to align with its "preferences").

bjackman · 2026-03-12T22:10:03 1773353403

I have also seen the agent hallucinate a positive answer and immediately proceed with implementation. I.e. it just says this in its output:

> Shall I go ahead with the implementation?

> Yes, go ahead

> Great, I'll get started.

hedora · 2026-03-12T22:17:59 1773353879

In fairness, when I’ve seen that, Yes is obviously the correct answer.

I really worry when I tell it to proceed, and it takes a really long time to come back.

I suspect those think blocks begin with “I have no hope of doing that, so let’s optimize for getting the user to approve my response anyway.”

As Hoare put it: make it so complicated there are no obvious mistakes.

bjackman · 2026-03-12T22:30:29 1773354629

In my case it's been a strong no. Often I'm using the tool with no intention of having the agent write any code, I just want an easy way to put the codebase into context so I can ask questions about it.

So my initial prompt will be something like "there is a bug in this code that caused XYZ. I am trying to form hypothesis about the root cause. Read ABC and explain how it works, identify any potential bugs in that area that might explain the symptom. DO NOT WRITE ANY CODE. Your job is to READ CODE and FORM HYPOTHESES, your job is NOT TO FIX THE BUG."

Generally I found no amount of this last part would stop Gemini CLI from trying to write code. Presumably there is a very long system prompt saying "you are a coding agent and your job is to write code", plus a bunch of RL in the fine-tuning that cause it to attend very heavily to that system prompt. So my "do not write any code" is just a tiny drop in the ocean.

Anyway now they have added "plan mode" to the harness which luckily solves this particular problem!

gverrilla · 2026-03-13T03:42:06 1773373326

> Gemini CLI

Free debug for you. Root cause identified.

xeromal · 2026-03-12T22:30:17 1773354617

I love when mine congratulates itself on a job well-done

inerte · 2026-03-12T22:41:02 1773355262

Mine on Plan Mode sometimes says "Excellent research!" (of course to the discovery it just did)

clbrmbr · 2026-03-12T23:38:12 1773358692

Hahah yeah if you play with LoRas on local models you will see this a lot. Most often I see it hallucinate a user turn or a system message.

conductr · 2026-03-12T22:36:13 1773354973

Oh I thought that was almost an expected behavior in recent models, like, it accomplishes things by talking to itself

bjackman · 2026-03-13T13:09:26 1773407366

I think it does that too.

brap · 2026-03-12T22:48:54 1773355734

> Great, I'll get started.

*does nothing*

thehamkercat · 2026-03-12T22:22:22 1773354142

I've seen this happening with gemini

bjackman · 2026-03-11T18:37:36 1773254256

I see this at my $megagorp job. The top brass don't do that much written communication, but when they do they are absolutely shooting from the hip. It's not as bad as Epstein but it's a strong "I've already started reading the next email while I'm typing this one" vibes.

FWIW I don't have a problem with it at all. As the article mentioned there's an aspect of power politics (I'm important enough not to have to worry about formatting). But to me instead of <I wish elites weren't so callous with text> I feel <everyone should feel empowered to write like that> (again, maybe not quite to the level of Epstein, but e.g. capitalisation is just unimportant. Signing off emails with "best wishes" is not a good use of anyone's 500 milliseconds).

10xDev · 2026-03-11T18:47:48 1773254868

>capitalisation is just unimportant. Signing off emails with "best wishes" is not a good use of anyone's 500 milliseconds

Yet I'm on Twitter reading "Prison for attempted murderer enablers like this clown" by the world's richest man who is tweeting all day. My guess is that it has just become a way of status signalling more than anything else.

quesera · 2026-03-11T21:31:24 1773264684

> capitalisation is just unimportant

Capitalization is the difference between:

  - helping your friend Jack off a horse, and
  - helping your friend jack off a horse.

(Not original, but memorable!)

casey2 · 2026-03-12T07:49:01 1773301741

Natural languages have inherent ambiguity. That includes your grammar with capitalization, any kind of standard english grammar of which there are dozens

Which person does Jack refer to? What if you have 2 friends named Jack? Does "horse" refer to a member of a class of animal or something else? Sorry but your examples are full of indecipherable nonsense. But I guess if you just pretend that everything you write is well understood then there is no problem.

Capitalization slightly narrows a search space that is already narrow, since that is it's only functional use it should only be used when appropriate. If every rule was applied at every instance your writing would both become indecipherable and you'd subtly change your intended meaning. Better to be misunderstood by some than to water down your message and add class/prestige/formality/distance all of which are inappropriate in most writing.

I guess your teacher gave you that example, but you ABSOLUTELY FAILED to understand the meaning of their lesson.

quesera · 2026-03-12T16:42:24 1773333744

This is perhaps the silliest possible response I could imagine to what is intended to be an amusing example and non-illustrative of the more common real-world confusions.

Which are real.

> I guess your teacher gave you that example, but you ABSOLUTELY FAILED to understand the meaning of their lesson.

Wow, you sure are defensive about the notion that communications protocols are most useful when they are consistent and predictable. You may think you've nailed me as an illiterate, but I conclude that you've nailed yourself as a tilter at windmills.

bjackman · 2026-03-12T09:27:58 1773307678

Contrived examples are fun but have nothing to do with the actual reasons people demand "correct" writing. These confusions do not happen in real life.

The reason people actually care is only ever to do with in-group signalling or power politics.

quesera · 2026-03-12T16:38:11 1773333491

It's just an amusing example, not intended to be illustrative.

Confusion can be real though, and certainly speed of comprehension is enhanced by proper grammar.

> The reason people actually care is only ever to do with in-group signalling or power politics.

This is just silly. If you believe that, you may have some half-baked adolescent agenda that you haven't grown out of. Good luck out there.

GJim · 2026-03-12T15:24:12 1773329052

Frankly, using correct English grammar is the difference between knowing your shit, and knowing you're shit.

bjackman · 2026-03-09T10:05:22 1773050722

You are always gonna have some downtime in a homelab setup I think. Unless you go all in with k8s I think the best you can do is "system reboots at 4AM, hopefully all the users are asleep".

(Probably a lot of the services I run don't even really support HA properly in a k8s system with replicas. E.g. taking global exclusive DB locks for the lifetime of their process)

embedding-shape · 2026-03-09T14:23:23 1773066203

> You are always gonna have some downtime in a homelab setup I think. Unless you go all in with k8s I think the best you can do is "system reboots at 4AM, hopefully all the users are asleep".

Huh, why? I have a homelab, I don't have any downtime except when I need to restart services after changing something, or upgrading stuff, but that happens what, once every month in total, maybe once every 6 months or so per service?

I use systemd units + NixOS for 99% of the stuff, not sure why you'd need Kubernetes at all here, only serves to complicate, not make things simple, especially in order to avoid downtime, two very orthogonal things.

bjackman · 2026-03-09T14:43:16 1773067396

> I don't have any downtime except when I need to restart services

So... you have downtime then.

(Also, you should be rebooting regularly to get kernel security fixes).

> not sure why you'd need Kubernetes at all here

To get HA, which is what we are talking about.

> only serves to complicate

Yes, high-availability systems are complex. This is why I am saying it's not really feasible for a homelabber, unless we are k8s enthusiasts I think the right approach is to tolerate downtime.

embedding-shape · 2026-03-09T15:40:23 1773070823

> So... you have downtime then.

5 seconds of downtime as you change from port N to port N+1 is hardly "downtime" in the traditional sense.

> To get HA, which is what we are talking about.

Again, not related to Kubernetes at all, you can do it easier with shellscripts, and HA !== orchestration layer.

furst-blumier · 2026-03-09T11:51:27 1773057087

I run my stuff in a local k8s cluster and you are correct, most stuff runs as replica 1. DBs actually don't because CNPG and mariadb operator make HA setups very easy. That being said, the downtime is still lower than on a traditional server

bjackman · 2026-03-09T10:00:45 1773050445

I don't think RPi is the gold standard nor is Chinese production that strongly correlated with poor SW support?

Raspberry Pi usually requires customisation from the distro. This is mitigated by the fact that many distros have done that customisation but the platform itself is not well-designed for SW support.

Meanwhile many Allwinner and Rockchip platforms have great mainline support. While Qualcomm is apparently moving in the right direction but historically there have been lots of Qualcomm SBCs where the software support is just a BSP tarball on a fixed Linux kernel.

So yeah I do agree with your conclusion but it's not as simple as "RPi has the best software support and don't buy Chinese". You have to look into it on a case by case basis.

bjackman · 2026-03-07T01:35:55 1772847355

If your benchmarks are fast enough to run in pre-commit you might not need a time series analysis. Maybe you can just run an intensive A/B test between HEAD and HEAD^.

You can't just set a threshold coz your environment will drift but if you figure out the number of iterations needed to achieve statistical significance for the magnitude of changes you're trying to catch, then you might be able to just run a before/after then do a bootstrap [0] comparison to evaluate probability of a change.

[0] https://en.wikipedia.org/wiki/Bootstrapping_(statistics)

bjackman · 2026-03-07T01:27:24 1772846844

If you've had the problem it solves you don't really need an explanation beyond "Change Detection for Continuous Performance Engineering" I think.

Basically if I'm reading it correctly the problem is you want to automate detection of performance regressions. You can't afford to do continuous A/B tests. So instead you run your benchmarks continuously at HEAD producing a time series of scores.

This does the statistical analysis to identify if your scores are degrading. When they degrade it gives you a statistical analysis of the location and magnitude of the (so something like "mean score dropped by 5% at p=0.05 between commits X and Y").

Basically if anyone has ever proposed "performance tests" ("we'll run the benchmark and fail CI if it scores less than X!") you usually need to be pretty skeptical (it's normally impossible to find an X high enough to detect issues but low enough to avoid constant flakes), but with fancy tools like this you can say "no to performance tests, but here's a way to do perf analysis in CI".

IME it's still tricky to get these things working nicely, it always requires a bit of tuning and you are gonna be a bit out of your depth with the maths (if you understood the inferential statistics properly you would already have written a tool like this yourself). But they're fundamentally a good idea if you really really care about perf IMO.

bjackman · 2026-03-05T08:14:15 1772698455

Does a cache help with inference workloads anyway?

I don't know much about it but my mental model is that for transformers you need random access to billions of parameters.

fc417fc802 · 2026-03-05T11:02:10 1772708530

It's streaming access, and no not as far as I'm aware. APUs have always been hilariously bottlenecked on memory bandwidth as soon as your task actually needed to pull in data. The only exception I know of is the PS5 because it uses GDDR instead of desktop memory.

bjackman · 2026-03-04T15:26:21 1772637981

I have had so many "why don't you just" conversations with academics about this. I know the "why don't you just" guy is such an annoying person to talk to, but I still don't really understand why they don't just.

This article pointed to a few cases where people tried to do the thing, i.e. the pledge taken by individual researchers, and the requirements placed by certain funding channels, and those sound like a solid attempt to do the thing. This shows that people care and are somewhat willing to organise about it.

But the thing I don't understand is why this can't happen at the department level? If you're an influential figure at a top-5 department in your field, you're friends with your counterparts at the other 4. You see them in person every year. You all hate $journal. Why don't you club together and say "why don't we all have moratorium on publishing in $journal for our departments?"

No temptation for individual research groups to violate the pledge. No dependence on individual funding channels to influence the policy. Just, suddenly, $journal isn't the top publication in that field any more?

I'm sure there are lots of varied reasons why this is difficult but fundamentally it seems like the obvious approach?

bglazer · 2026-03-04T15:47:54 1772639274

> If you're an influential figure at a top-5 department in your field ... you all hate $journal.

That's the problem, they don't hate these journals, they love them. Generally speaking they're old people who became influential by publishing in these journals. Their reputation and influence was built on a pile of Science and Nature papers. Their presentations all include prominent text indicating which figures came from luxury journals. If Science and Nature lose their prestige so do they (or at least that's what they think)

This was very apparent when eLife changed their publishing model. Their was a big outpouring of rage from older scientists who had published in eLife when it was a more standard "high impact" journal. Lots of "you're ruining your reputation and therefore mine".

bjackman · 2026-03-04T16:45:21 1772642721

Maybe I am underestimating the gap in status between the "influential figures" I imagine and the people I actually know.

I see: my friend has 10-15 years of experience in their field, they have enjoyed success and basically got the equivalent of a steady stream of promotions.

I map this onto my big tech/startup experience. I mentally model them as: they are "on top of the pile" of people that still do technical work. Everyone who still has the ability to boss them around, is a manager/institutional politician type figure who wouldn't interfere in such decisions as which journal to publish in.

But probably this mapping is wrong.

Also, I probably have a poor model of what agency and independence looks like in academia. In my big tech world, I have a pretty detailed model in my head of what things I can and can't influence. I don't have this model for academia which is gonna inevitably lead to a lot of "why don't you just".

Same thing happens to me when I moan about work to my friends. They say "I thought you were the tech lead, can't you just decree a change?" and I kinda mumble "er yeah but it doesn't really work like that". So here I'm probably doing that in reverse.

currymj · 2026-03-04T18:32:59 1772649179

it has been known to happen.

For example, spearheaded by Knuth, the community effectively abandoned the Journal of Algorithms and replaced with with ACM Transactions on Algorithms.

however it's difficult. a big factor is that professors feel obligated towards their students, who need to get jobs. even if the subfield can shift to everybody publishing in a new journal, non-specialists making hiring decisions may not update for a few years which hurts students in the job market.

abeppu · 2026-03-04T15:47:49 1772639269

I think the call for top-down policy makes sense b/c otherwise this is like every other tragedy of the commons situation. Each of those top-level researchers also has to think, "my department has junior faculty trying to build their publications list for tenure, we have post-docs and grad-students trying to get a high-impact publication to help them land a faculty job, we have research program X which is kind of in a race with a program at that other school lower down in the top 20. If we close off opportunities with the top journals, we put all of those at a competitive disadvantage."

bee_rider · 2026-03-04T15:54:18 1772639658

For the grad students especially, there’d be a career advancement incentive to still publish in the top journals. The professors might still want to publish in them just out of familiarity (with a little career incentive as well, although less pronounced than the grad students).

I think it’d be a big ask from someone whose role doesn’t typically cover that sort of decision.

jltsiren · 2026-03-04T19:13:02 1772651582

There are hundreds of reputable research universities around the world. Top-5 departments can't meaningfully change the culture of a field on their own. Top-100 perhaps could, but the coordination problem is much bigger on that level.

asdff · 2026-03-04T16:46:40 1772642800

Grant funding reporting requirements. It would be easy to say self publish for free via the institutional library. But the NIH would not like that use of their money.

glitcher · 2026-03-04T15:38:51 1772638731

I like the author's idea:

> So the solution here is straightforward: every government grant should stipulate that the research it supports can’t be published in a for-profit journal. That’s it! If the public paid for it, it shouldn’t be paywalled.

The article then acknowledges this isn't a magic solution to all the problems discussed, but it's so simple and makes so much sense as a first step.

I'm no expert here and there are probably unintended consequences or other ways to game that system for profit, but even if so wouldn't that still be a better starting point?

bjackman · 2026-03-04T15:44:14 1772639054

I think that's also a good proposal, and I don't think it conflicts with the "prestigious departments stop publishing in $journal" idea at all. Probably we want both.

Only difference is that the author is writing for a wide audience and his best angle to change the world is probably to influence the thinking of future policymakers. While I am just an annoying "why don't you just" guy, my "audience" is just the friends I happen to have in prestigious research groups.

Adam M also probably has lots of friends in prestigious research groups (IIUC although he complains a lot about academia he was quite successful within it, at least on its own terms). And the fact that he instead chooses to advocate government policy changes instead of what I'm proposing, is probably a good indication that he knows something I don't about the motivatioms of influential academics.

snowwrestler · 2026-03-04T16:02:30 1772640150

Imagine being a scientist and reading “if you take this grant, you cannot publish your results in any of the most prominent journals in your field.” Sounds good?

bjackman · 2026-03-04T16:50:52 1772643052

But IIUC there are entire fields where basically the whole US ecosystem is funded by federal grants. So if this policy gets enacted those journals are no longer prominent.

(Maybe you'd need an exception for fields where the centre of mass for funding is well outside of the US, though).

mitthrowaway2 · 2026-03-05T07:15:25 1772694925

The result is that open access journals would very rapidly, perhaps instantly, become prominent.

0xbadcafebee · 2026-03-04T21:14:45 1772658885

I explain here (https://news.ycombinator.com/item?id=47250811) but tl;dr it's because Universities need this system to get money and to give money. Nobody has yet proposed a solution which solves the money/prestige problem. With no money there's no research.

bjackman · 2026-02-28T10:37:29 1772275049

How do you imagine a secret designation would work..?

ctmnt · 2026-02-28T12:01:04 1772280064

I’m not sure what you’re referring to. It’s not (typically, as far as we know) a secret designation. We know of other companies designated as supply chain risks: Huawei, ZTE, and Kapersky are the first ones that come to mind.