My favourite part of A Canticle for Leibowitz is the manual autoregressive model the old monk uses to recover damaged books. I remember reading the GPT-2 paper and thinking hang on...
As far as I can tell from scanning forums, to the extent humans contribute anything to the centaur setup, it is entirely in hardware provisioning and allocating enough server time before matches for the chess engines to do precomputation, rather than anything actually chess-related, but I am unsure on this point.
I have heard anecdotally from non-serious players (and therefore I cannot be certain that this reflects sentiment at the highest levels, although the ICCF results seem to back this up) that the only ways to lose in centaur chess at this point are to deviate from what the computer tells you to do, either intentionally or unintentionally by submitting the wrong move, or simply to be at a compute disadvantage.
I've got several previous comments on this because this is a topic that interests me a lot, but the two most topical here are the previous one and https://news.ycombinator.com/item?id=33022581.
The last public ranking of chess centaurs was in 2014, after which it is generally held to be meaningless, as the ranking of a centaur is just the same as the ranking of the engine. Magnus Carlsen’s peak Elo of 2882 is by far the highest any human has ever achieved. Stockfish 18 is estimated to be in excess of 4000 Elo. Which is to say, the difference between it and the strongest human player ever is about the same as the difference between a strong club player and a grandmaster. It’s not going to benefit meaningfully from anything a human player might bring to the partnership.
Magnus himself said in 2015 that we’ve known for a long time that engines are much stronger than humans, so the engine is not an opponent.
OK, but who is saying that to the llm? Another llm?
We got feedback in this thread from someone who supposedly knows Rust about common anti-patterns, and someone from the company came back with 'yeah that's a problem, we'll have agents fix it' [0].
Agents are obviously still too stupid to have the metacognition needed to decide when to refactor, even at $1,000 per day per person. So we still need the butts in seats. So we're back at the idea of centaurs. Then you have to make the case that paying an AI more than a programmer is worth it.[1]
[0] which has been my exact experience with multi-agent code bases I've burned money on.
[1] which in my experience isn't when you know how to edit text and send API requests from your text editor.
Building Attractor
Supply the following prompt to a modern coding agent
(Claude Code, Codex, OpenCode, Amp, Cursor, etc):
codeagent> Implement Attractor as described by
https://factory.strongdm.ai/
Canadian girlfriend coding is now a business model.
Amusingly, it appears the README (that would be code, right?) has hallucinated the existence of a docker image - someone filed an issue at https://github.com/strongdm/cxdb/issues/1
In-house employees don't read code or do code reviews, so presumably they don't raise issues either. I guess the issue was picked up by an astute HN reader.
I've looked at their code for a few minutes in a few files, and while I don't know what they're trying to do well enough to say for sure anything is definitely a bug, I've already spotted several things that seem likely to be, and several others that I'd class as anti-patterns in rust. Don't get me wrong, as an experiment this is really cool, but I do not think they've succeeded in getting the "dark factory" concept to work where every other prominent attempt has fallen short.
To pick a few (from the server crate, because that's where I looked):
- The StoreError type is stringly typed and generally badly thought out. Depending on what they actually want to do, they should either add more variants to StoreError for the different failure cases, replace the strings with sub-types (probably enums) to do the same, or write a type-erased error similar to (or wrapping) the ones provided by anyhow, eyre, etc., but with a status code attached. They definitely shouldn't be checking for substrings in their own error type for control flow (see the sketch at the end of this comment).
- So many calls to String::clone [0]. Several of the ones I saw were actually only necessary because the function took a parameter by reference even though it could have (and I would argue should have) taken it by value (If I had to guess, I'd say the agent first tried to do it without the clone, got an error, and implemented a local fix without considering the broader context).
- A lot of errors are just ignored with Result::unwrap_or_default or the like. Sometimes that's the right choice, but from what I can see they're allowing legitimate errors to pass silently. They also treat the values they get in the error case differently, rather than e.g. storing a Result or Option.
- Their HTTP handler has an 800-line closure which they immediately call, apparently as a substitute for the still unstable try_blocks feature. I would strongly recommend moving that into its own full function instead.
- Several ifs which should have been match.
- Lots of calls to Result::unwrap and Option::unwrap. IMO in production code you should always at minimum use expect instead, forcing you to explain what went wrong/why the Err/None case is impossible.
It wouldn't catch all/most of these (and from what I've seen might even induce some if agents continue to pursue the most local fix rather than removing the underlying cause), but I would strongly recommend turning on most of clippy's lints if you want to learn rust.
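To make the StoreError point concrete, here is roughly the shape I mean. A minimal sketch only: the variant names and status codes are my own guesses at the kind of structure, not anything taken from the CXDB source.

    // Illustrative only: variants and status codes are guesses, not CXDB's.
    use std::fmt;

    #[derive(Debug)]
    enum StoreError {
        NotFound { key: String },
        Conflict { key: String },
        Corrupt { detail: String },
        Io(std::io::Error),
    }

    impl StoreError {
        // Control flow keys off the variant, not off substring matching.
        fn status_code(&self) -> u16 {
            match self {
                StoreError::NotFound { .. } => 404,
                StoreError::Conflict { .. } => 409,
                StoreError::Corrupt { .. } | StoreError::Io(_) => 500,
            }
        }
    }

    impl fmt::Display for StoreError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                StoreError::NotFound { key } => write!(f, "key not found: {key}"),
                StoreError::Conflict { key } => write!(f, "write conflict on: {key}"),
                StoreError::Corrupt { detail } => write!(f, "corrupt record: {detail}"),
                StoreError::Io(e) => write!(f, "io error: {e}"),
            }
        }
    }

    impl std::error::Error for StoreError {
        fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
            match self {
                StoreError::Io(e) => Some(e),
                _ => None,
            }
        }
    }

    fn main() {
        let err = StoreError::NotFound { key: "turn/42".into() };
        // And the unwrap point: expect() documents why failure is impossible.
        let port: u16 = "8080".parse().expect("hard-coded literal is a valid port");
        println!("{err} -> HTTP {} (listening on {port})", err.status_code());
    }

The point isn't these exact variants; it's that the HTTP layer can then map errors to responses with a match instead of grepping its own error strings.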
This is great feedback, appreciate you taking the time to post it. I will set some agents loose on optimization / purification passes over CXDB and see which of these gaps they are able to discover and address.
We only chose to open source this over the past few days so it hasn't received the full potential of technical optimization and correction. Human expertise can currently beat the models in general, though the gap seems to be shrinking with each new provider release.
This is why I think AI-generated code is going nowhere. There are actual conceptual differences that the stochastic parrot cannot understand; it can only copy patterns. And there's no distinction between good and bad code (IRL) except for that understanding.
For those of us working on building factories, this is pretty obvious, because you immediately need shared context across agents / sessions and an improved ID + permissions system to keep track of who is doing what.
I was about to say the same thing! Yet another blog post with heaps of navel gazing and zero to actually show for it.
The worst part is they got simonw (perhaps unwittingly, or through social engineering) to vouch and stealth-market for them.
And $1000/day/engineer in token costs at current market rates? It's a bold strategy, Cotton.
But we all know what they're going for here. They want to make themselves look amazing to convince the boards of the Great Houses to acquire them. Because why else would investors invest in them and not in the Great Houses directly.
You don't see why a company would gain by inviting bloggers that will happily write positively about them? Talk about a conflict of interest; the FTC should ban companies from doing this.
We’ve been working on this since July, and we shared the techniques and principles that have been working for us because we thought others might find them useful. We’ve also open-sourced the nlspec so people can build their own versions of the software factory.
We’re not selling a product or service here. This also isn’t about positioning for an acquisition: we’ve already been in a definitive agreement to be acquired since last month.
It’s completely fair to have opinions and to not like what we’re putting out, but your comment reads as snarky without adding anything to the conversation.
Why will you be destitute? Consider this: how do billionaires make most of their money?
I’ll answer you: people buy their stuff.
What happens if nobody has jobs? Oh, that’s right! Nobody’s buying stuff.
Then what happens? Oh yeah! Billionaires get poorer.
There’s a very rational, self-interested reason sama has been running UBI pilots and Elon is also talking about UBI - the only way they keep more money flowing into their pockets is if the largest number of people have disposable income.
> What happens if nobody has jobs? Oh, that’s right! Nobody’s buying stuff.
> Then what happens? Oh yeah! Billionaires get poorer.
Or they pivot to businesses that don't depend on consumers buying stuff.
Or pivot away from business entirely, into a realm of pure power independent of the market and conventional economics.
> There’s a very rational, self-interested reason sama has been running UBI pilots and Elon is also talking about UBI - the only way they keep more money flowing into their pockets is if the largest number of people have disposable income.
There's another very rational, self-interested reason for those people to pursue UBI: as a temporary sop to the masses, to keep them passive until they lack the power to resist.
Can you give an example? His writing seems pretty grounded to me. He's not out there going on podcasts claiming that LLMs are going to cure cancer, afaik.
So I was on a webcast where people were talking about this. They are from https://docs.boundaryml.com/guide/introduction/what-is-baml and humanlayer.dev. Mostly they were talking about spec-driven development. Smart people. Here is what I understood from them about spec-driven development, which is not far from this AFAIU.
Let's start with the `/research -> /plan -> /implement` (RPI) loop. When you are building a complex system for teams you _need_ humans in the loop and you want them to focus on design decisions. Having structured workflows around agents provides a better UX for the humans making those design decisions. This is necessary for controlling drift, pollution of context and general mayhem in the code base. _This_ is the starting thesis of spec-driven development.
How many times have you, as a newbie, copied a slash command, pressed /research then /plan then /implement, only to find after several iterations that it is inconsistent and you have to go back and fix it? Many people still go back and forth with ChatGPT, copying their Jira docs in and out and answering people's questions on PRD documents. This is _not_ a defence; it is the user experience of working with AI for many.
One very understandable path to solve this is to _surface_ to the humans structured information extracted from your plan docs, for example:
In this very toy version of spec-driven development the idea is that each step in the RPI loop is broken down and made very deterministic with humans in the loop. This is a system designed by humans (Chief AI Officer, no kidding) for teams that follow fairly _customized_ processes for working fast with AI, without it turning into a giant pile of slop. And the whole point of reading code or QA is this: you stop the clock on development and take a beat to see the high-signal information. Testers want to read tests and QAers want to test behavior, because, well written, they can tell a lot about whether the software works. If you have ever written an integration test on brownfield code with poor test coverage, and made it dependable after several days in the dark, you know what it feels like... Taking that step out is what all the VCs say is the last game in town... the final game in town.
This StrongDM stuff is a step beyond what I can understand: "no humans should write code", "no humans should read code", really..? But the thing that puzzles me even more is that spec-driven development as I understand it is, to use borrowed words, like parents raising a kid: once you are a parent you want to raise your own kid, not someone else's. Because it's just such a human-in-the-loop process. Every company, tech or not, wants to make its own process that its engineers like to work with. So I am not sure they even have a product here...
This is the part that feels right to me because agents are idiots.
I built a tool that writes (non shit) reports from unstructured data to be used internally by analysts at a trading firm.
It cost between $500 and $5000 per day per seat to run.
It could have cost a lot more but latency matters in market reports in a way it doesn't for software. I imagine they are burning $1000 per day per seat because they can't afford more.
They are idiots, but getting better. Ex: wrote an agent skill to do some read only stuff on a container filesystem. Stupid I know, it’s like a maintainer script that can make recommendations, whatever.
Another skill called skill-improver, which tries to reduce skill token usage by finding deterministic patterns in another skill that can be scripted, and writes and packages the script.
Putting them together, the container-maintenance thingy improves itself every iteration, validated with automatic testing. It works perfectly about 3/4 of the time, another half of the time it kinda works, and fails spectacularly the rest.
It’s only going to get better, and this fit within my Max plan usage while coding other stuff.
LLMs are idiots and they will never get better because they have quadratic attention and a limited context window.
If the tokens that need to attend to each other are on opposite ends of the code base the only way to do that is by reading in the whole code base and hoping for the best.
If you're very lucky you can chunk the code base in such a way that the chunks pairwise fit in your context window and you can extract the relevant tokens hierarchically.
If you're not? Well, get reading, monkey.
Agents, md files, etc. are bandaids to hide this fact. They work great until they don't.
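To spell out what that hierarchical workaround looks like in practice, here is a rough sketch. llm_extract is a stub standing in for a real model call, and the names and the context budget are made up for illustration, not any particular tool's API.

    // Rough sketch of hierarchical extraction over a code base that does not
    // fit in one context window. `llm_extract` is a stand-in for a model call;
    // here it just truncates so the example runs.
    const CONTEXT_BUDGET: usize = 4_000; // characters, as a crude token proxy

    fn llm_extract(query: &str, text: &str) -> String {
        // Placeholder: a real implementation would send `query` + `text`
        // to a model and return the relevant excerpts.
        let _ = query;
        text.chars().take(200).collect()
    }

    fn extract_hierarchically(query: &str, files: &[String]) -> String {
        // Pass 1: extract from each chunk that fits the budget on its own.
        let mut summaries: Vec<String> = files
            .iter()
            .map(|f| llm_extract(query, f))
            .collect();

        // Pass 2..n: fold pairs of summaries together until everything
        // fits in a single window, then do one final extraction.
        while summaries.iter().map(|s| s.len()).sum::<usize>() > CONTEXT_BUDGET {
            let folded: Vec<String> = summaries
                .chunks(2)
                .map(|pair| llm_extract(query, &pair.join("\n")))
                .collect();
            summaries = folded;
        }
        llm_extract(query, &summaries.join("\n"))
    }

    fn main() {
        let files = vec!["fn a() {}".repeat(500), "fn b() {}".repeat(500)];
        let answer = extract_hierarchically("where is the retry logic?", &files);
        println!("{answer}");
    }

Which is exactly the problem: every fold is lossy, and whether the relevant tokens survive to the top depends entirely on how lucky your chunking was.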
Can you say more? Guile's the only scheme I've tried (attempts at packaging for Guix). Debugging has been difficult, but I figured it was me struggling with new tools and API. Does racket have better facilities for introspection or discovery at the REPL?
Sortition is the only system that ensures high quality universal education. If anyone can become president for a year then everyone needs to be able to be president for a year.
I would like to see sortition implemented in one house of a bicameral legislature. Executive office is not where I would want to see it tested first (and I think it’s ill suited even in theory).
I've been building systems like what the OP is using since gpt3 came out.
This is the honeymoon phase. You're learning the ins and outs of the specific model you're using and becoming more productive. It's magical. Nothing can stop you. Then you might not be improving as fast as you did at the start, but things are getting better every day. Or maybe every week. But it's heaps better than doing it by hand because you have so much mental capacity left.
Then a new release comes out. An arbitrary fraction of your hard-earned intuition is not only useless but actively harmful to getting good results with the new models. Worse, you will never know which part it is without unlearning everything you learned and starting over again.
I've had to learn the quirks of three generations of frontier families now. It's not worth the hassle. I've gone back to managing the context window in Emacs because I can't be bothered to learn how to deal with another model family that will be thrown out in six months. Copy and paste is the universal interface and being able to do surgery on the chat history is still better than whatever tooling is out there.
Unironically learning vim or Emacs and the standard Unix code tools is still the best thing you can do to level up your llm usage.
First off, appreciate you sharing your perspective. I just have a few questions.
> I've gone back to managing the context window in Emacs because I can't be bothered to learn how to deal with another model family that will be thrown out in six months.
Can you expand more on what you mean by that? I'm a bit of a noob on llm enabled dev work. Do you mean that you will kick off new sessions and provide a context that you manage yourself instead of relying on a longer running session to keep relevant information?
> Unironically learning vim or Emacs and the standard Unix code tools is still the best thing you can do to level up your llm usage.
I appreciate your insight but I'm failing to understand how exactly knowing these tools increases performance of llms. Is it because you can more precisely direct them via prompts?
LLMs work on text and nothing else. There isn't any magic there. Just a limited context window on which the model will keep predicting the next token until it decides that it has predicted enough and stops.
All the tooling is there to manage that context for you. It works, to a degree, then stops working. Your intuition is there to decide when it stops working. This intuition gets outdated with each new release of the frontier model and changes in the tooling.
The stateless API with a human deciding what to feed it is much more efficient in both cost and time as long as you're only running a single agent. I've yet to see anyone use multiple agents to generate code successfully (but I have used agent swarms for unstructured knowledge retrieval).
The Unix tools are there for you to progra-manually search and edit the code base and copy/paste into the context that you will send. Outside of Emacs (and possibly vim), with the ability to have dozens of ephemeral buffers open to modify their output, I don't imagine they will be very useful.
Or to quote the SICP lectures: The magic is that there is no magic.
I can't speak for parent, but I use gptel, and it sounds like they do as well. It has a number of features, but primarily it just gives you a chat buffer you can freely edit at any time. That gives you 100% control over the context, you just quickly remove the parts of the conversation where the LLM went off the rails and keep it clean. You can replace or compress the context so far any way you like.
While I also use LLMs in other ways, this is my core workflow. I quickly get frustrated when I can't _quickly_ modify the context.
If you have some mastery over your editor, you can just run commands and post relevant output and make suggested changes to get an agent like experience, at a speed not too different from having the agent call tools. But you retain 100% control over the context, and use a tiny fraction of the tokens OpenCode and other agents systems would use.
It's not the only or best way to use LLMs, but I find it incredibly powerful, and it certainly has its place.
A very nice positive effect I noticed personally is that as opposed to using agents, I actually retain an understanding of the code automatically, I don't have to go in and review the work, I review and adjust on the fly.
One thing to keep in mind is that the core of an LLM is basically a (non-deterministic) stateless function that takes text as input, and gives text as output.
The chat and session interfaces obscure this, making it look more stateful than it is. But they mainly just send the whole chat so far back to the LLM to get the next response. That's why the context window grows as a chat/session continues. It's also why the answers tend to get worse with longer context windows – you're giving the LLM a lot more to sift through.
You can manage the context window manually instead. You'll potentially lose some efficiencies from prompt caching, but you can also keep your requests much smaller and more relevant, likely spending fewer tokens.
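If it helps to see the shape of it, here is a bare-bones sketch of what a chat wrapper is doing under the hood. complete is a stub standing in for a real API call; none of these names come from an actual client library.

    // Bare-bones sketch: the "session" is just a growing transcript that
    // gets resent in full every turn. `complete` is a stub standing in
    // for a real model call.
    struct Message {
        role: &'static str, // "user" or "assistant"
        content: String,
    }

    fn complete(transcript: &[Message]) -> String {
        // Placeholder: a real client would serialize every role/content
        // pair and send the lot to the API, every single time.
        let last = transcript.last().map(|m| m.content.as_str()).unwrap_or("");
        format!("(assistant reply to: {last})")
    }

    fn main() {
        let mut transcript: Vec<Message> = Vec::new();

        for user_turn in ["explain this function", "now add a test"] {
            transcript.push(Message { role: "user", content: user_turn.to_string() });
            let reply = complete(&transcript); // the model sees the whole history
            transcript.push(Message { role: "assistant", content: reply });
        }

        // Manual context management is just editing this Vec before the
        // next call: drop dead ends, keep only the turns that still matter.
        transcript.retain(|m| !m.content.contains("dead end"));

        for m in &transcript {
            println!("{}: {}", m.role, m.content);
        }
    }

That retain line is the whole trick: whether you do it in a Vec, a gptel buffer, or by hand with copy/paste, you are just choosing what text the next stateless call gets to see.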
I'll wait for OP to move their workflow to Claude 7.0 and see if they still feel as bullish on AI tools.
People who are learning a new AI tool for the first time don't realize that they are just learning the quirks of the tool and the underlying model, not skills that generalize. It's not until you've done it a few times that you realize you've wasted more than 80% of your time on a model that is completely useless and will be sunset in 6 months.
5 was a great option for ML work last year since the colo we rented didn't come with a 10kW cable. With RAM, SSD and GPU prices the way they are now, I have no idea what you'd need to do.
Thank goodness we did all the capex before the OpenAI ram deal and expensive nvidia gpus were the worst we had to deal with.