Octoth0rpe's comments

> An over-engineered solution (complete with CLI, storage backend, documentation, unit tests) for a trivial problem which that person would've solved by an elegant bash one-liner only 3 years ago.

Importantly, I think AI companies are motivated towards the overengineered solutions as they increase the buyer's token spend. I'm not sure how we can create incentives that optimize for finding the 'right' solution, which may be the cheapest (the bash one-liner). Perhaps a widely recognized but not overly optimized for benchmark for this class of problems?


> Importantly, I think AI companies are motivated towards the overengineered solutions as they increase the buyer's token spend.

Yes, that, and also: the more complicated the solution, the more likely it is that no one reads or reviews it too carefully, and instead depends on an LLM to ‘read’ and ‘review’ it.

Even ignoring token costs, there’s a high incentive for LLMs to generate complex solutions, because those solutions generate demand for further LLM use. (You don’t really want to review that 30,000 line pull request by hand, do you?)


This reminds me of the famous quote by Tony Hoare:

    "There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies."

I think the model space is too competitive. People will switch if another model is significantly better.

There are only a few frontier models, and aren’t they all operating under the same incentives?

Open source models, maybe not necessarily, since they can (in theory) be self-hosted.

I think right now the incentive for open source Chinese model developers is to provide good (comparable to SotA) and cheap models so the space isn't captured by a few private American companies, because they've seen how hard it is to compete in the space once that happens.


Pretty damning. Would also be interesting to see the number of commits overlaid. The graph tells a great story about the correlation with MS's takeover, but I wonder whether, at the same time that uptime went to shit, MS was shifting large numbers of enterprise contracts over to github. That would be a more complete story IMO.

None of which excuses this. Can you imagine someone's reaction in 2017 if you told them that github would be below 90% uptime in 2026? It would be unimaginable.


> That is, a feature can largely be written in one file, rather than bits and pieces all over the codebase.

This seems to be at odds with the goal of token minimization. Lots of small files that are narrowly scoped means less has to be loaded into context when making a change, right?

Throwing out another idea: I wonder if we could see some kind of equivalent of c header files for more modern languages so that an llm just has to read the equivalent of a .h file to start using a library.
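A minimal sketch of that idea for Python (assuming the "header" is just the public function signatures plus docstrings, generated with the stdlib `ast` module; the module source here is a made-up example):

```python
# Hypothetical ".h"-style stub generator: parse a module and emit only the
# public API surface, so a tool (or LLM) never has to read the bodies.
import ast
import textwrap

source = textwrap.dedent("""
    def add(a: int, b: int) -> int:
        '''Return the sum of a and b.'''
        return a + b

    def _private_helper():
        pass
""")

tree = ast.parse(source)
stub_lines = []
for node in tree.body:
    # Keep only top-level public functions; skip underscore-prefixed names.
    if isinstance(node, ast.FunctionDef) and not node.name.startswith("_"):
        ret = ast.unparse(node.returns) if node.returns else "None"
        doc = ast.get_docstring(node)
        if doc:
            stub_lines.append(f"# {doc}")
        stub_lines.append(f"def {node.name}({ast.unparse(node.args)}) -> {ret}: ...")

print("\n".join(stub_lines))
```

Running a generator like this over a package would give something close to a `.pyi` stub, which is arguably the existing "header file equivalent" for Python.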


> This seems to be at odds with the goal of token minimization. Lots of small files that are narrowly scoped means less has to be loaded into context when making a change, right?

my solution (as someone who's building something tangential) is to use granular levels of scope - there should be an implicit single file that gets generated from a package at a certain phase of the static tool processing. But the package is still split into files for flexibility and DevEx (developer experience); file/folder organization is super useful for humans. For tooling, the package can be collected together and treated as a single unit, but still decomposed based on things like namespaces and top-level definitions (classes, specifications, etc.). That way the tooling has control over how much context to pass in.


I think AST aware code reading is criminally underused by agents - you don't need a header file if you can see a listing of all the functions in a library.

Similarly, I don't read the whole file a function is in while editing it in an IDE, why should a coding agent get the whole file polluting its context by default?
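As one possible sketch of that (using Python's stdlib `ast` and a made-up source string), an agent could be handed just the span of the function it is editing rather than the whole file:

```python
# Extract only the source span of one function instead of the whole file.
# ast.get_source_segment uses the node's location info to slice the source.
import ast

source = """\
import os

def keep_me(x):
    return x * 2

def irrelevant():
    return os.getcwd()
"""

tree = ast.parse(source)
target = next(
    n for n in ast.walk(tree)
    if isinstance(n, ast.FunctionDef) and n.name == "keep_me"
)
snippet = ast.get_source_segment(source, target)
print(snippet)
```

With the rest of the file summarized as signatures only, the agent's context holds one full function body plus an outline, instead of every line in the module.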


Check out Ataraxy-Labs/weave for AST-aware git merges.

But, I wonder, do AST-aware tools cleave to the LLM training manifold the way coding-tutorial slop does?


Why would you need "header files" when a LSP server can give you just the outline of some file?

On the other hand, I don't think I know any millennials who don't have an extremely overbearing HOA that forbids anything other than a grass lawn.

I looked it up, and a couple of states have laws preventing HOAs from forcing you to have a grass lawn. Alternatives can include native plants, drought-tolerant plants, xeriscaping, and vegetable gardens, depending on the state. The states I've found are California, Colorado, Florida, Texas, Maryland, and Nevada.

There are general users of the average SaaS, and there are claude code users. There's no doubt in my mind that our expectations should be somewhat higher for CC users re: memory. I'm personally not completely convinced that cache eviction should be part of their thought process while using CC, but it's not _that_ much of a stretch.

Personally I've never thought about cache eviction as it pertains to CC. It's just not something that I ever needed to think about. Maybe I'm just not a power user but I just use the product the way I want to and it just works.

Anthropic literally advertises long sessions, 1M context, high reasoning etc.

And then their vibe-coders tell us that we are to blame for using the product exactly as advertised: https://x.com/lydiahallie/status/2039800718371307603 while silently changing how the product works.

Please stop defending hapless innocent corporations.


This oversells how obfuscated it is. I'm far from a power user, and the opposite of a vibe coder. Yet I noticed the effect on my own just from general usage. If I can do it, anyone can do it.

Here's Anthropic's own Boris Cherny and others telling how great everything is with long sessions and contexts: https://news.ycombinator.com/item?id=47886087

Listen, no one cares if you think you’re smart for seeing through the lies of their marketing team. You’re being intentionally obtuse.

My point is the opposite. I don't think my observation was smart, and I'm surprised that so many people here, a venue with a lot of people who use this stuff far more than I do, think it wasn't an easy thing to grok.

You’re still intentionally missing the point. Everyone knows they are lying. It doesn’t excuse the lies!

I’m not. Why would anyone believe marketing speak for any product? One should always assume that at best they’re fluffing their product up and more likely that they’re telling straight up lies

1. False advertisement is a thing, to the point there are laws against it

2. They were caught blatantly lying, and you're literally telling everyone it's the users' fault for not digging into the black box that is Claude Code (and more so Anthropic's servers) and figuring out its behavior for themselves. A behavior that suddenly changed on a March day [1] and which previously very few people ever needed to investigate.

[1] https://x.com/levelsio/status/2029307862493618290


I'm not saying this is a great state of affairs. But I'm saying that it's so pervasive in daily life that yes, at least part of the blame lies on users for not taking this into account. As a developer it's important to at least try to understand the tools and libraries on which one relies. Relying on magic black boxes is not a good plan on the user's part, and they need to be defensive about this. Too many developers have been more than happy to hand the keys over to the AI assistants and hope for the best.

Also it wasn't completely undocumented, rather it was hiding in not-quite-plain sight. Which itself is a bit duplicitous, but again something that's far from unique on the part of Anthropic.


> Torvalds jokingly named it "git" after the slang term, later defining it as "the stupid content tracker".

I think the better Torvalds quote was when he said "I name all my projects after myself"


Yep


> the US economy really only cares about profit

Which would be OK if we were able to more effectively include externalities in companies' overhead, instead of constantly subsidizing them.


Starlink's maritime, roaming, airplane, and military options are all much more than $100/mo/user. Not sure how much that closes the gap, but it's _something_.

Source: https://starlink.com/business/aviation ($250->$10k/mo)

https://starlink.com/business/maritime ($250/mo)

https://starlink.com/business/mobility ($65->$540/mo)


But its actual revenue was $10b in 2025 on 9m customers, so he's pretty much correct.

The point I have more issue with is that a 60 or 100 PE ratio only makes sense in a high-growth scenario. Telecoms are valued at 9x by comparison. 60 or 100 only makes sense if you expect it to grow by 10x from here, and face no competition and keep prices this high.

And that seems like a bit of a reach. The richest people on the planet live in urban environments in US/EU/Asia, with fast and widespread 5G.

Yes, rich people on boats in the Pacific, hikers on remote mountains, and researchers in Antarctica exist, but they're not a market of 200 million people. And even if you get there, that's still just a 120b valuation, not 380b.
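A quick sanity check on the arithmetic, taking the figures in the thread ($10B revenue, 9M customers, a $380B valuation, telecoms at ~9x revenue) as assumptions rather than audited numbers:

```python
# Back-of-the-envelope check using the figures cited in the thread.
revenue = 10e9       # claimed 2025 revenue
customers = 9e6      # claimed customer count
valuation = 380e9    # claimed valuation
telecom_multiple = 9 # typical telecom revenue multiple, per the comment

monthly_arpu = revenue / customers / 12
print(f"average revenue per user: ${monthly_arpu:.0f}/mo")   # ~ $93/mo

revenue_multiple = valuation / revenue
print(f"valuation / revenue: {revenue_multiple:.0f}x")       # 38x

growth_needed = revenue_multiple / telecom_multiple
print(f"revenue growth to justify a telecom multiple: {growth_needed:.1f}x")
```

So the ~$93/mo average lines up with the "$100/mo/user" framing, and the 38x revenue multiple is roughly 4x what a telecom would trade at, which is the gap the comment is pointing to.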


Does it make sense to value Starship Commercial Launch at $170B _and_ Falcon 9/Heavy at $100B? I would expect that if Starship achieves its operational goals, it should quickly supersede nearly all uses of Falcon, the exceptions being national security launches that require a validated launcher, or Dragon launches for similar reasons. Even those categories are likely on a countdown the moment Starship is rapidly reusable.

