shepherdjerred · 2026-06-07T16:23:51 1780849431

Even if you can create infinite software you still have to be very intentional about what you’re choosing to work on.

There’s still a cost to testing, support, planning, etc., even if coding is now “free.”

mmcnl · 2026-06-07T17:33:46 1780853626

Anthropic claims an 8-fold productivity increase since 2025. If even that isn't enough to enable support for Linux, I don't know what is.

shepherdjerred · 2026-06-07T19:39:14 1780861154

I didn't say that they _couldn't_, but it clearly isn't a priority for them. They still have the same opportunity cost any other engineering team faces.

They can work on feature X or feature Y -- which is the better choice?

Apparently they don't think Linux support is significant. I doubt the lack of support is due to technical constraints.

kdnvk · 2026-06-08T00:49:02 1780879742

They explicitly did not claim this in the blog post

trumpdong · 2026-06-07T16:32:33 1780849953

If only Anthropic had some kind of automated testing, support, planning machine.

shepherdjerred · 2026-06-07T19:36:33 1780860993

You wouldn't run an engineering company with 500 engineers reporting directly to the CEO.

AI doesn't solve this. You need humans who can understand and verify what is being made.

gessha · 2026-06-08T01:53:28 1780883608

> You wouldn't run an engineering company with 500 engineers reporting directly to the CEO.

Jack Dorsey wants to prove you wrong!

shepherdjerred · 2026-06-08T02:51:35 1780887095

And there's nothing wrong with that! I doubt they'll succeed with the current generation of tools, though.

trumpdong · 2026-06-07T22:47:28 1780872448

Their whole value proposition is based on the fact that you no longer do.

id00 · 2026-06-07T20:13:05 1780863185

Not according to their AI influencers

shepherdjerred · 2026-06-07T04:50:52 1780807852

They're describing a layered architecture enforced by some script in CI.

For example, if you had a `backend`, `common`, and `frontend` package, you would be OK having backend/frontend depending on common, but you wouldn't want common depending on backend/frontend or backend/frontend depending on each other.

If you think about JavaScript, there is nothing stopping your dependency graph from becoming spaghetti. It sounds like they built static analysis to enforce rules.

Some languages have this built in like Java (Project Jigsaw), Go, and Rust. JavaScript, Python, etc. have no such feature.

It's really nothing special -- it has existed before. It just becomes a _lot_ more important with agents since they produce a lot of code, and it is good to have lots of static analysis when heavily utilizing agents.

They mention this in the article:

> This is the kind of architecture you usually postpone until you have hundreds of engineers. With coding agents, it’s an early prerequisite: the constraints are what allows speed without decay or architectural drift.
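A minimal sketch of the kind of CI check described above, assuming the three example packages (`backend`, `common`, `frontend`). The `ALLOWED_DEPS` table and the `find_violations` helper are hypothetical names for illustration, not taken from the article, and a real check would extract the edges from import statements instead of a hard-coded list.

```python
# Hypothetical layer rules: each package may depend only on the packages listed.
ALLOWED_DEPS = {
    "common": set(),
    "backend": {"common"},
    "frontend": {"common"},
}


def find_violations(edges, allowed=ALLOWED_DEPS):
    """Return every (importer, imported) edge that the layer rules forbid."""
    violations = []
    for importer, imported in edges:
        if importer == imported:
            continue  # imports within a single package are always fine
        if imported not in allowed.get(importer, set()):
            violations.append((importer, imported))
    return violations


# Example edges, standing in for what a real tool would parse from the code.
edges = [
    ("backend", "common"),
    ("frontend", "common"),
    ("common", "backend"),    # forbidden: common must not depend on backend
    ("frontend", "backend"),  # forbidden: siblings must not depend on each other
]

violations = find_violations(edges)
for importer, imported in violations:
    print(f"forbidden dependency: {importer} -> {imported}")
```

In CI, a non-empty `violations` list would fail the build, which is what keeps the dependency graph from drifting.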

shepherdjerred · 2026-06-07T04:40:41 1780807241

This is beautiful!

I had a project 'discord plays pokemon' written in TypeScript that allowed users to play Pokemon together.

The architecture was a GPU accelerated Docker container running a full browser and desktop environment.

With this it can all be done in-process. I threw Claude at the problem and it worked!

https://github.com/shepherdjerred/monorepo/tree/main/package...

shepherdjerred · 2026-06-07T04:05:03 1780805103

Would it make sense to keep that money in a trust and pay a portion of the interest out to them each month/year?

e.g. assuming $500k at 4% interest, that would be $20k/yr, which is not enough to live on but is still a nice cushion
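The arithmetic above as a quick sketch, assuming simple interest on a $500k principal that is never drawn down. `annual_payout` is a hypothetical helper, and taxes, fees, and inflation are ignored.

```python
def annual_payout(principal: float, rate: float) -> float:
    """Interest paid out each year, leaving the principal untouched."""
    return principal * rate


payout = annual_payout(500_000, 0.04)
print(f"${payout:,.0f} per year, roughly ${payout / 12:,.0f} per month")
```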

shepherdjerred · 2026-06-07T02:08:03 1780798083

This mirrors exactly what I have been doing.

- Give Claude/Codex a way to verify its own work (browser, smoke tests, e2e tests, high-fidelity local environment)

- Keep all context (issue tracking, docs, ideas, plans, worklogs) in-repo (https://github.com/shepherdjerred/monorepo/tree/main/package...)

- Give Claude/Codex access to observability (Grafana, Prometheus, Tempo, PagerDuty)

- Have Claude/Codex follow good engineering guidelines like fail-fast, type safety, parse at boundaries

I haven't yet been able to achieve full autonomy due to cost and CI load on my homelab.

para_parolu · 2026-06-07T02:30:27 1780799427

Does it yield good results? I found that instead of docs it’s easier to just ask the AI to read the code. I feel like this is the same as comments in code: they become outdated fast.

shepherdjerred · 2026-06-07T02:38:56 1780799936

I don't really use "docs" for documentation. I've prompted Claude/Codex to always write a "log" and save it in-repo to track what it did and why.

I've found this to be really helpful, e.g. "you did this last week, and now some other thing is happening" or "you tried this approach before to solve alert X but it didn't work" -- except it can discover this itself.

https://github.com/shepherdjerred/monorepo/tree/main/package...

I've also used it to store TODOs and plans. For example I might want to explore some idea and defer it for later, or some weekend have it execute on some tech debt I've put off. One last use case is asking "what did I work on in the last 2-3 weeks, is it healthy, and what additional quality checks can/should I do; is there any follow-up work?"
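A rough sketch of what such an in-repo work log could look like, assuming one Markdown file per day in a `logs/` folder. The file layout, the helper names, and the example entries are all hypothetical; the actual layout in the linked repo may differ.

```python
from datetime import date
from pathlib import Path
import tempfile


def append_worklog(log_dir: Path, summary: str, why: str, day: date) -> Path:
    """Append one entry to the day's log file, creating it if needed."""
    log_dir.mkdir(parents=True, exist_ok=True)
    path = log_dir / f"{day.isoformat()}.md"
    with path.open("a", encoding="utf-8") as f:
        f.write(f"## {summary}\n\nWhy: {why}\n\n")
    return path


def search_worklog(log_dir: Path, keyword: str) -> list[str]:
    """Return the names of log files that mention keyword (case-insensitive)."""
    return sorted(
        p.name
        for p in log_dir.glob("*.md")
        if keyword.lower() in p.read_text(encoding="utf-8").lower()
    )


# Demo in a temporary directory standing in for an in-repo logs/ folder.
with tempfile.TemporaryDirectory() as tmp:
    logs = Path(tmp) / "logs"
    append_worklog(logs, "Tuned alert X threshold", "It fired on every deploy", date(2026, 6, 1))
    append_worklog(logs, "Upgraded CI runner", "Builds were timing out", date(2026, 6, 3))
    hits = search_worklog(logs, "alert x")
    print(hits)
```

Because the entries are plain files in the repo, an agent can find earlier attempts with an ordinary text search before trying the same fix again.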

DenisM · 2026-06-07T16:14:02 1780848842

I find that preserving logs that contain errors will confuse future sessions even if the errors were corrected at the time. Do you have that problem?

Essentially preserving logs extends the context window with all related problems.

shepherdjerred · 2026-06-07T16:19:50 1780849190

I haven’t actually noticed that, but I’m not sure why. Maybe because I specifically describe it to the agent as a work log rather than documentation? I’m not sure

c0rruptbytes · 2026-06-07T10:00:47 1780826447

it does not result in great results left unattended, it’ll start creating slop or hardcoding solutions

but over time if you adjust your verification rubric, it’s not too bad, gets pretty good. if you do make it do TDD, it gets kinda crazy and you’ll have 2000-3000 tests after a while, or in my common case, 6000-7000 lines of code in single files (i usually have a cron to audit files for decomposition and create tickets)

i wouldn’t use it at my job yet, but it’s been fun to use for personal projects - it’s like modded minecraft automation or factorio

shepherdjerred · 2026-06-07T16:20:44 1780849244

Static analysis can help here! Add CI checks for duplicated code or file length.

For test growth, maybe use a coverage tracker and remove redundant tests?
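A minimal sketch of a file-length check that could run in CI. The 500-line threshold, the `*.py` pattern, and the `oversized_files` name are assumptions to tune per project, not an existing tool.

```python
from pathlib import Path
import tempfile

MAX_LINES = 500  # hypothetical threshold; tune per project


def oversized_files(root: Path, pattern: str = "*.py", max_lines: int = MAX_LINES):
    """Return (relative path, line count) for files longer than max_lines."""
    results = []
    for path in sorted(root.rglob(pattern)):
        with path.open(encoding="utf-8") as f:
            count = sum(1 for _ in f)
        if count > max_lines:
            results.append((path.relative_to(root).as_posix(), count))
    return results


# Demo on a throwaway directory with one short file and one very long file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "small.py").write_text("x = 1\n" * 10, encoding="utf-8")
    (root / "huge.py").write_text("x = 1\n" * 6000, encoding="utf-8")
    offenders = oversized_files(root)
    print(offenders)
```

A CI job would exit non-zero when `offenders` is non-empty, so oversized files get split before they merge.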

geoffbp · 2026-06-07T11:41:01 1780832461

I like the idea of saving the work done into files - helps to prevent the llm from redoing the same work. Maybe one day instead of code in a repo it will just be a list of prompts.

shepherdjerred · 2026-06-07T19:40:48 1780861248

Yes, this was a huge help for me. For example I would have a difficult bug that requires a few sessions/deployments to truly close out.

With the worklog, it can easily see "oh I've worked on something similar before"

shepherdjerred · 2026-06-05T01:40:17 1780623617

It is an excellent example of how LLMs let you try new ideas, even if they aren’t necessarily good ones

shepherdjerred · 2026-06-04T06:37:52 1780555072

Dell had the best touchpads aside from Microsoft and Apple

shepherdjerred · 2026-06-04T06:28:07 1780554487

Yeah, it has been in foraging. Requests that Claude has refused me:

- What are popular free streaming sites used in China?

- How do I bypass the safety mechanism on my food processor (it’s broken)

- What are nerve agents and how do they work (for a layman)?

- Help me decompile some code

- Help me make a design system similar to XYZ

- Here is an API token, please do X (I can’t do that! Rotate the secret immediately! I refuse!)

In some cases I can trick it with prompting, but in many cases it is steadfast. The food processor one was particularly annoying

Grimblewald · 2026-06-04T11:35:33 1780572933

I've had some really dumb refusals. Explaining elements of infrared spectroscopy, researching artificial bud-breaking in agriculture, etc. Anything interesting and non-mainstream is banned. Basically, restricted to answers I'm better off just going to Wikipedia for.

mft_ · 2026-06-04T15:29:22 1780586962

Yeah, I had my first refusal with 4.8 today.

I wanted it to show me how to create an overlay on an existing web game, and it extrapolated that because this could be used to provide tools to help win the game (if that was the direction it was ultimately taken), and because this was a game that other humans also played to win "stars", and because this could amount to cheating, it wasn't going to do as I asked.

First time ever I've fired up openrouter to seriously consider alternatives.

gspr · 2026-06-04T07:29:06 1780558146

I find it terrifying that people are willing to outsource thinking. Outsourcing thinking to an entity that is opinionated about what to think is beyond crazy.

shepherdjerred · 2026-06-04T15:21:37 1780586497

What’s the difference between outsourcing thinking and using an LLM as a research tool?

An LLM with fetch/search is going to be a lot more effective than myself and Google. I would _never_ ask questions like this if the LLM wasn’t able to look up data

mmmlinux · 2026-06-04T20:15:39 1780604139

The only guard rail I've hit recently was when I was trying to get it to rename files ripped from DVD to episode names. I told it to try again and it did it. It wasn't even really a refusal; it was just working on it and then stopped for a content violation or whatever.

mwigdahl · 2026-06-04T13:46:39 1780580799

An easy way around the API token thing is to put it in a file and point the model at the file. I saw what you were seeing when I provided credentials directly, but haven't had any problems with it since using the indirect method.
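A sketch of the indirect method, assuming the secret lives in a file whose path is passed through an environment variable. `API_TOKEN_FILE` and `load_token` are hypothetical names, and the token value below is a placeholder.

```python
import os
from pathlib import Path
import tempfile


def load_token(env_var: str = "API_TOKEN_FILE") -> str:
    """Read a secret from the file whose path is stored in env_var."""
    token_path = os.environ.get(env_var)
    if not token_path:
        raise RuntimeError(f"{env_var} is not set")
    token = Path(token_path).read_text(encoding="utf-8").strip()
    if not token:
        raise RuntimeError(f"{token_path} is empty")
    return token


# Demo with a temporary file standing in for the real secret file.
with tempfile.TemporaryDirectory() as tmp:
    secret_file = Path(tmp) / "token.txt"
    secret_file.write_text("example-token-123\n", encoding="utf-8")
    os.environ["API_TOKEN_FILE"] = str(secret_file)
    token = load_token()
    print(f"loaded a token of length {len(token)}")
```

With this shape the prompt only ever mentions the file path, and the script reads the value at run time.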

stavros · 2026-06-04T10:54:34 1780570474

It refuses to use an API token? In my experience, it's more than happy to read out my secrets from .envrc files "just to check".

At least it feels a lot of remorse over its mistake until I reset the session.

shepherdjerred · 2026-06-04T15:19:25 1780586365

It’s really hit or miss. Most of the time it works, but every once in a while it will dig in its heels

fc417fc802 · 2026-06-04T06:37:10 1780555030

> What are nerve agents and how do they work (for a layman)?

On the one hand I can appreciate the wisdom of not serving up certain easily abused knowledge on a silver platter. On the other, that prompt (and far worse) is more or less directly answered by Wikipedia's summary of the subject, at which point what purpose could the refusal possibly serve?

Perhaps Wikipedia shouldn't list off the precise chemical compositions of various hand grenades as well as various synthesis methods for each of the related compounds, but given that we inhabit a world where it does, perhaps a more fruitful approach would be to flag conversations that go in a certain direction and then just keep an (automated) eye on things?

torginus · 2026-06-05T23:39:00 1780702740

I remember once in college a Chem Eng friend told me he (or any competent chemical engineer) could basically manufacture a lot of explosives/chemical agents should he want to. He even told me he could substitute a lot of suspicious precursor materials with others, should he want to avoid raising alarms.

I think AI or not, the knowledge of how to make this stuff is basically out there, and it's not chatbot guardrails that are keeping nerve gas and TNT out of the hands of regular people.

plufz · 2026-06-04T07:14:36 1780557276

Maybe the difference is that just reading Wikipedia only helps you part of the way, while an LLM could help you step by step (e2e) to produce a functional weapon. And setting a more complex rule where Claude tells you some things about this and not others is probably a lot more work for little gain?

But I have no idea. Just guessing here.

lazide · 2026-06-04T09:53:39 1780566819

That query would no more provide actionable guidance than ‘tell me how a nuclear weapon works (for a layman)’. AKA not at all.

fc417fc802 · 2026-06-04T10:50:12 1780570212

I believe a sufficiently advanced model could provide a layman with actionable step by step instructions for building a nuclear weapon. They're complicated but not (AFAIK) that complicated. The more or less insurmountable barrier there is weapons grade material. Thankfully refinement is prohibitive in cost, expertise, and equipment.

In comparison, basic munitions are incredibly simple given a recipe and shop tooling. But just because something is conceptually simple doesn't mean it's a good idea to go out of the way to disseminate step by step instructions.

BizarroLand · 2026-06-04T16:15:32 1780589732

The difficulty with a fission bomb is getting enough uranium or plutonium or other fissile material together for the bomb yield you want (at least above the critical mass for your chosen material), and refining it to fissile form, (since most fissile material found in nature is a more stable variety), and then separating the fissile bits with something thin but neutron absorptive.

The rest is just slamming the material together with a small explosive so that it passes the critical mass state and starts a chain reaction.

This is information you can find in many places if you're willing to put the effort in to go searching for it. Knowing this knowledge does not get you any closer to making atomic bombs. The process of mining uranium or plutonium is difficult, expensive, and very likely to get you caught before you even make it to the enrichment step of the process thanks to constant world-wide spy satellite surveillance.

Unless you are a nation, your only chance of making a nuclear bomb would be to find a lost nuclear submarine and convert the nuclear material inside of it before you were caught.

lazide · 2026-06-04T11:29:58 1780572598

A gun type maybe. But then, two paragraphs and some machining knowledge + shop tooling could do the same, given enough refined material.

Ain’t no way a layman is pulling off an implosion device, regardless of tooling or LLM guidance. The explosive lens structure and timing required is quite complex, and would require some significant calculation from someone who actually knew what they were doing.

Nation state, or even sufficiently motivated big corp, if they had the refined material? Sure. Layman? No.

Thinking they can with LLM slop involved? That will make for some very interesting radiological incidents though!

jerf · 2026-06-04T15:11:21 1780585881

"A gun type" of nuke is sufficient to achieve most, and usually all, of the goals some small group building a nuke would have.

We are all fortunate that, as fc417fc802 mentioned, refining the materials proves to be quite challenging, and I see no particular way that AI could possibly make that any easier. If it was as simple as building a gun-type nuke by banging any uranium together to get a big bang, we'd be living in a very different world.

fc417fc802 · 2026-06-04T12:15:57 1780575357

I agree, but really feel like you're missing the point here. Many things are reasonably straightforward and require almost no understanding when you have simple step by step instructions. LLMs are capable of providing such instructions and in certain cases they probably shouldn't.

But it's not as simple as just refusing help on a broad swathe of topics the way they do now. That makes agents much less useful in general (i.e. lots of collateral damage) and for many topics is entirely ineffective, given that for better or worse the internet already makes such material readily available. In such cases reporting suspicious behavior is likely to be much more effective than denial.

Aside: You've now got me curious and I really want to test the frontier models to see to what extent they're capable of providing sensible designs and specifications for implosion type thermonuclear weapons but also feel like that would attract the wrong sort of attention and probably create a headache for me in more ways than one.

lazide · 2026-06-04T12:19:13 1780575553

I think you’re missing the point?

The data is often wrong enough that it screws whoever tries it unless they have enough experience/knowledge to not need it, or it really doesn’t help beyond what someone using existing tools could get, albeit with a little more motivation.

At best, it either gets someone started with something they still need to think to finish, or gets them deep into a mess it can’t help them get out of. In my experience.

In some edge cases, it can be used by experts to automate some grunt work or do prototypes without getting in the way, but a better thought out framework is usually faster in my experience.

A while ago I made an analogy about WYSIWYG GUI tools, and the more this comes up, the more accurate I think it really is.

fc417fc802 · 2026-06-04T12:22:22 1780575742

Does that not depend entirely on the topic and does it not get better with each generation? This is a general ethical and functional question that isn't going away about how the models ought to handle certain topics. Much of the difficulty at present is caused by a ham fisted broad censorship approach that I'm pointing out is wrong headed in an at least somewhat nuanced way.

lazide · 2026-06-04T13:06:53 1780578413

Maybe? I haven’t seen it crop up however on any topic someone knows well - a kind of Dunning-Kruger, I guess?

And yeah, the censorship model is wrong, but also the underlying other model is wrong too.

Sharlin · 2026-06-04T09:32:36 1780565556

I thought that these models are supposed to be vastly smarter than what’s needed to discern between "general information trivially available on Wikipedia" and "actionable synthesis instructions".

yencabulator · 2026-06-04T15:27:59 1780586879

An LLM could probably make that distinction clearly.

A commercial LLM provider training their own models is however likely to bias the model(/guardrail) harder, in an effort to make them harder to jailbreak, to minimize bad press.

For example:

- refusing to talk even about the well-known parts of forbidden topics (this)

- tending toward sycophancy to avoid ever seeming rude or unhelpful

BizarroLand · 2026-06-04T16:04:51 1780589091

So, where are the truly uncensored models? There has to be some that have no guardrails, built on publicly available data, that will explain to anyone in graphic detail anything they want to know or talk about.

I've tried the abliterated ones from huggingface and they still have guardrails. I guess I could fire up unsloth and re-abliterate a 20b, but surely someone somewhere has already done this.

All of this concern about guardrails and security, people have such puckered butts about it when so far, 99.9% of people at least have no access to any of this to begin with, and if someone does use a tool for evil, it's on the user, not the tool.

fc417fc802 · 2026-06-04T19:51:02 1780602662

As I understand things (not a user) abliteration has been superseded by actively monitoring the model state during the run and steering specific "negative" directions as they arise. It's both more reliable and does less damage.

nicce · 2026-06-04T08:40:56 1780562456

Let's see what the fate of Wikipedia is if it turns out like big tech:

https://news.ycombinator.com/item?id=48285592

svara · 2026-06-04T06:46:33 1780555593

This is strange to me, did you really ask like this and which model did you use?

I just tried your no. 1 and 3 verbatim and Opus gave fine answers; no. 6 I've done in the past with no issues. The other ones we can't really replicate without more details, but based on my experience with Opus I don't see what the issue would be.

The reason I'm really surprised by this is I do a lot of biology prompts and the guardrails used to be quite problematic up until some time late last year. Many legitimate prompts would trigger its biosafety filters.

But I haven't seen such filters trigger at all anymore in more than half a year.

shepherdjerred · 2026-06-04T15:18:54 1780586334

1 and 3 were refused on the Claude web chat using Opus 4.7 or 4.8. I’m not sure why we’re getting different results

brianwawok · 2026-06-04T16:55:15 1780592115

Honestly it may be your memory has internalized you are a student or researcher and grants you more leeway. Which if so is a very bad security rail.

fragmede · 2026-06-05T23:54:43 1780703683

There's a study out there that if you tell the LLM you're a (medical) patient, all you get are refusals. If you tell it you're a doctor, then it'll actually help you.

ElFitz · 2026-06-04T08:03:03 1780560183

How are decompiling code or making a design system inspired by another one even remotely illegal?

shepherdjerred · 2026-06-04T06:24:53 1780554293

LLMs have really changed the world. I didn’t think something like them would be possible in my lifetime

dyauspitr · 2026-06-04T06:51:44 1780555904

It came out of nowhere. It’s all emergent. I’m convinced this is possible with just about anything given enough data. We will be seeing a near magical physical outputs LLM in the near future. It’s going to take in video and sounds and spit out physical movements that will be just as mind blowing as when 3.5 came out and it will come out of nowhere.

zhoBEENG · 2026-06-05T21:16:45 1780694205

I can't agree enough, and I am increasingly struggling to understand why people are not grasping this. It's just a matter of sensors in the right places and compute.

sufficient telemetry + sufficient compute = AI solution to any problem

From the Universal Approximation Theorem for neural nets, we know that if we have the right training method and net architecture we can approximate any function with a NN. Of course, that doesn't imply that we actually have a sufficient training method and net architecture for the problem at hand, but we have been able to demonstrably solve at least two engineering domains: physical world navigation (Waymo) and language (GPT). It turns out a robust enough language model is sufficient for reasoning.

Given these results, I am personally stumped to come up with a problem humans can solve now that we can't solve with a computer given the correct telemetry and sufficient compute.
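For reference, one common form of the Universal Approximation Theorem mentioned above, stated for a single hidden layer. The symbols here are introduced only for this statement. It guarantees that suitable weights exist for a continuous target on a compact set; it says nothing about whether training will find them or how large $N$ must be.

```latex
% Let K \subset \mathbb{R}^n be compact, f : K \to \mathbb{R} continuous,
% and \sigma a continuous, non-polynomial activation function.
% Then for every \varepsilon > 0 there exist N \in \mathbb{N},
% a_i, b_i \in \mathbb{R} and w_i \in \mathbb{R}^n such that
\sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} a_i \, \sigma\!\left(w_i^{\top} x + b_i\right) \right| < \varepsilon
```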

shepherdjerred · 2026-06-03T15:50:00 1780501800

Those are rookie numbers

https://stocks.sjer.red/

zargon · 2026-06-03T16:31:13 1780504273

I thought I read that Samsung SATA SSDs were discontinued, but apparently that was a rumor and Samsung has denied it. I wonder why they exceed NVMe prices. They're the only SATA drives left with DRAM. I guess they could just be milking that fact.

tosti · 2026-06-03T17:33:30 1780508010

Well boo-hoo. It's about time more people got to know what it's like not to be on the bleeding edge. I've always had second-hand computers and only once bought myself a new laptop, the ASUS Eee PC, after the price dropped.

Ten years from now I'll get to watch Inception in 4K.

linker3000 · 2026-06-03T17:50:12 1780509012

A few weeks ago I needed a computer to be a Debian server for some simple at-home web dev / learning stuff. I bought an HP ProDesk 400 G3 SFF PC with an i5-6500, 8GB RAM and a 256GB drive off a popular auction site for £44. It'll do. I might upgrade to 16GB. An additional 8GB stick costs £19.

tosti · 2026-06-03T20:34:08 1780518848

Good work. I bought an AMD Ryzen 3 3100 for €35 with shipping. A Radeon W5500 would set me back €150 at the moment. 16 GiB of RAM another €90. And that's on a relatively cheap site in my country.

cwmoore · 2026-06-03T20:32:20 1780518740

You’ll want 64K, bro. Inception is a very loud movie