Hacker News | vidarh's comments

I cofounded an ISP in the 90s, and our first bank of 16 modems was mounted to a panel on the wall by clamps attached to their serial cables.

Who is suggesting "AI is dead without (these specific) people"? People are wondering what it means specifically for the Qwen model family.

There's no fundamental difference, but in practice the difference is one of frequency.

E.g. I sometimes have 10+ agents kicking off changes to the same small project at the same time, all likely to want to merge within the next 15-30 minutes. At that rate, merge conflicts happen very frequently. The agents can mostly resolve them themselves, but it wastes tokens and time.


This is exactly the use case, thanks for explaining.

I'm running agents doing merges right now, and yes and no. They can resolve merges, but it often takes multiple extra rounds. If you can avoid that more often it will definitely save both time and money.

Thanks for the great explanation again.

That was far crazier than I expected going into it... To the point that I've seen Hollywood movies people would dismiss as unrealistic with far more believable plots.

I just noticed the Wikipedia article has a very relevant and interesting link: https://en.wikipedia.org/wiki/Coffin_ship_(insurance)

We care about this one well-known author's opinion if we're someone who sees it as a loss that he, and people like him, will be less available to us.

We also care about it because it's an indicator of the rise of a new societal problem of signal being even further drowned out.


AI can trivially replicate average-ish and somewhat-above-average human writing if given a tone sample to copy. Getting it to replicate the quality of output of a decent author, probably not without a lot of effort, but the threshold here is to sound like a plausible e-mail from a book club or similar group, not to write the Great American Novel.



The author needs to avoid a sufficient number of false positives for the time investment not to be prohibitive, and that is what he is arguing is becoming a hard problem. I have no problem believing that given some of the e-mail I receive, and no problem trusting Scalzi on this.



Or you have someone running an AI bot that does research on your target automatically. Ironically, one of the regular features of the spam I'm getting semi-regularly now is for "marketing services" that provide OpenClaw instances to research and individualise messaging.

Author states it's a volume problem. Examples show no individualisation beyond the book content. My suggestions would be quite effective.

This isn't a new phenomenon in the author world; it's been plagued by 'you have won 1st prize in a poetry competition' spam for decades.


> Author states it's a volume problem.

How is that relevant? The entire point of using a bot is to be able to achieve volume while including customisation and the ability to hold a conversation if you reply. There are major providers offering to do this at volume for pennies.

> Examples show no individualisation beyond the book content.

The post has no examples. Where have you found examples? And why do you presume adding hurdles will stop things for more than a day or two before the campaigns adapt and become even more effective?

EDIT: If the examples you're referring to are on his post from November, consider that he didn't change anything despite those. It is logical to assume things have gotten worse since. And it's not unreasonable to think part of this is related to at least 3 mass-email providers having launched AI bot-driven campaigns in the last couple of months, including at least one integrating OpenClaw. I'm sure there are many more; those are just the ones I've casually spotted.

> My suggestions would be quite effective.

Your suggestions are based on presuming an outdated spray-and-pray approach that is increasingly being replaced by far more sophisticated campaigns. I'm not Scalzi, but I regularly get approaches that include far more customization than what you propose, including e.g. full-on web pages customized for my business.

> This isn't a new phenomenon in the author world, it's been plagued by 'you have won 1st prize in a poetry competition' for decades.

I know. It's been decades since I got my first one. That is also a good reason to think that something has changed for Scalzi to decide this now, after years of being above-averagely accessible.


I don't know what to tell you. It still works. Personally, I get about 30 job applicants a day, about 2 of which are from real people, which are consistently filtered into my inbox by the subject line keyword I state clearly in the online job description. The spam content is ever improving in terms of relevance. But its success rate at beating this filter hasn't changed at all.

AI bots are designed to spray and pray, essentially. Customizing the message doesn't change that. They're not designed to scan a job spec or article, look for a secret keyword, and insert it into the subject line. They could do that, of course, but they don't. Not in March 2026, at least.

If the author wanted to only receive emails from real people, I would bet a lot of money that this trick would be very effective. Whether those real humans are also in on the scam is a different question, but they would be humans who had taken the time to read the blog, that's for sure. And those, based on the evidence presented, are not the problem.

So yeah, I don't really care if you ignore me or believe me or not, but if anyone is suffering from a similar problem, a simple request to insert a secret into the subject is very effective at proving you are human. I'd recommend giving it a try.
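For anyone wanting to try it, a minimal sketch of such a subject-line filter (the keyword and messages here are made up purely for illustration):

```python
# Minimal sketch of the secret-keyword subject filter described above.
# KEYWORD is an illustrative placeholder, not any real posting's secret.
import email
from email.message import Message

KEYWORD = "bluefinch"

def route(msg: Message) -> str:
    """Humans who actually read the posting put the keyword in the subject."""
    subject = msg.get("Subject", "") or ""
    return "inbox" if KEYWORD.lower() in subject.lower() else "spam"

human = email.message_from_string("Subject: Application (bluefinch)\n\nHi!")
bot = email.message_from_string("Subject: Perfect candidate for you\n\nHi!")
print(route(human), route(bot))  # prints "inbox spam"
```

In practice you'd wire this into procmail, a Sieve rule, or whatever your mail stack supports; the logic is the same one-line check.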

anyway, got to go read a book!


I have the AI integrate their results themselves. That's if anything one of the things they do best. I also have them do reviews and test their own work first before I check it, and that usually makes the remaining verification fairly quick and painless.

Yes, it's often faster if you sit around waiting. What I will do instead is prompt the AI to create various plans, do other stuff while they do, review and approve the plans, do other stuff while multiple plans are being implemented, and then review and revise the output.

And I have the AI deal with "knowing how to do it" as well. Often it's slower to have it do enough research to know how to do it, but my time is more expensive than Claude's time, and so as long as I'm not sitting around waiting it's a net win.


I do this too, but then you need some method to handle it, because now you have to read and test and verify multiple work streams. It can become overwhelming. In the past week I had the following problems from parallel agents:

Gemini running a benchmark: everything ran smoothly for an hour, but on verification it turned out it had hallucinated the model used for judging, invalidating the whole run.

Another task used Opus and I manually specified the model to use. It still used the wrong model.

This type of hallucination has happened to me at least 4-5 times in the past fortnight using opus 4.6 and gemini-3.1-pro. GLM-5 does not seem to hallucinate so much.

So if you are not actively monitoring your agent and making the corrections, you need something else that is.


You need a harness, yes, and you need quality gates the agent can't mess with, and that just kicks the work back with a stern message to fix the problems. Otherwise you're wasting your time reviewing incomplete work.

Here is an example where the prompt was only a few hundred tokens and the output reasoning chain was correct, but the actual function call was wrong https://x.com/xundecidability/status/2005647216741105962?s=2...

Your point being? A proper harness will mostly catch things like that. Even a low-end model can be employed to write test plans and do consistency checks that mostly weed out stuff like that. Hence: you need a harness, or you'll spend your time worrying about dumb stuff like this.

Glancing at what it's doing is part of your multitasking rounds.

Also, instead of just prompting, it helps to have the AI first write a quick plan of exactly what it will do, including class names, branch names, file locations, specific tests, etc., before I hit go, since the plan outline is smaller and quicker to correct than the code.

That takes more wall clock time per agent, but gets better results, so fewer redo steps.
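As a sketch, the kind of plan worth asking for before hitting go might look roughly like this (the field names and example values are just illustrative, not any particular tool's format):

```python
# Hypothetical shape for the pre-implementation plan described above.
from dataclasses import dataclass, field

@dataclass
class ChangePlan:
    goal: str
    branch: str
    files: list[str] = field(default_factory=list)
    classes: list[str] = field(default_factory=list)
    tests: list[str] = field(default_factory=list)

    def summary(self) -> str:
        """Short outline that is quicker to review and correct than code."""
        return (f"{self.goal} on {self.branch}: "
                f"{len(self.files)} file(s), {len(self.tests)} test(s)")

plan = ChangePlan(goal="add retry logic", branch="feat/retry",
                  files=["client.py"], tests=["test_retry_backoff"])
print(plan.summary())  # add retry logic on feat/retry: 1 file(s), 1 test(s)
```

Reviewing a structure like this takes seconds; reviewing the diff it turns into takes much longer, which is where the saved redo steps come from.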


> Here is an example where the prompt was only a few hundred tokens and the output reasoning chain was correct, but the actual function call was wrong https://x.com/xundecidability/status/2005647216741105962?s=2...

I as a human have typos too - and sometimes they're the hardest thing to catch in code review because you know what you meant.

Hopefully there is some sort of lint process to catch my human hallucinations and typos.


>And I have the AI deal with "knowing how to do it" as well. Often it's slower to have it do enough research to know how to do it

This is exactly the sort of future I'm afraid of: where the people who are ostensibly hired to know how stuff works outsource that understanding to their LLMs. If you don't know how the system works while building it, what are you going to do when it breaks? Continue to throw your LLM at it? At what point do you just outsource your entire brain?


There are many layers to "knowing how stuff works". What does your manager do when your code breaks?

> Continue to throw your LLM at it?

Increasingly, yes. If you have objective acceptance criteria, just putting the LLM in a loop with a quality gate tends to have it converge on a fix itself, the same way a human would. Not always, and not always optimally, but more and more often, and with cheaper and cheaper models.

I also tend to throw in an analysis stage where it will look at what went wrong and use that to add additional criteria for the next run.
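A minimal sketch of that loop-with-a-gate pattern, with `run_agent` and `run_gate` as hypothetical stand-ins for a real agent invocation and an objective acceptance check (tests, lint, build):

```python
# Hypothetical sketch of "LLM in a loop with a quality gate".
# run_agent and run_gate are placeholders for a real agent call and
# an objective check the agent cannot tamper with.
def fix_loop(run_agent, run_gate, task: str, max_rounds: int = 5) -> bool:
    prompt = task
    for _ in range(max_rounds):
        run_agent(prompt)        # agent edits the working tree
        ok, log = run_gate()     # e.g. run tests/lint/build out-of-band
        if ok:
            return True
        # Kick the work back with the failure log for the next attempt.
        prompt = f"{task}\n\nThe quality gate failed:\n{log}\nFix it."
    return False
```

With cheap models, bumping `max_rounds` is often cheaper than intervening by hand; the analysis stage mentioned above would slot in right before the prompt is rebuilt.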


This sounds like one recipe for burnout, much like Adderall was making everyone code faster until their brain couldn’t keep up with its own backlog.

If anything, it's the opposite. With a proper harness you stop having to spend so much energy reviewing every little intermediate step, and can focus on the higher level.

I'm actually working on a project now where the biggest problem I need to solve is that the verifier that reviews the test harness is too strict.


I keep being told that a proper harness makes agents better, but no one has shown me exactly what it is that gives them such amazing results.

Yesterday Gemini burned 40 minutes trying to diagnose a failed Expo build, going into loops of changing the Podfile and re-running the build, when the issue was that Xcode needed updating (a quick Google search found it).

But my comment on burnout stands. The lack of downtime and dynamic thinking modes (admin, planning, review, actual coding) seems like it would conspire to make you either cram out more work or disconnect from it. Both of these become dangerous, after a while.

(Information workers were productive 4–6 hours a day, and the economy did just fine.)


They really don't, if you actually bother prompting them. Give them a voice sample, and tell them to match the tone, and you already get something 10x better. Have them revise with a list of common writing problems - not just common LLM patterns, but guidelines for writing better - and you get rid of more.

Properly prompted, an LLM writes far better than most people.


Without weighing in on whether this is true, I'll point out that LLMs could both be better writers than most people and also be bad writers.

Writing is a difficult skill that many (most?) educational systems do not effectively teach. Most people are terrible writers.


Yes, but for most uses that is irrelevant. Most of the complaints are not about them not being top-level writers, but that they stand out negatively from human writing by relying on a bunch of bad tropes and stereotypical language use.

Maybe we shouldn't use it to write novels if we can't push it well beyond average, but you don't need to get it to produce anything more than pretty much average or a little bit better for it to be good enough in competition with average humans.


That is precisely the problem. When writing technical documentation, such as the landing page for an FPGA inference engine, a model should not need to be prompted to use proper voice and to avoid marketing language. There should be enough context in the text of the prompt itself.

I don't think any of this indicates a fundamental property of the tech itself. AI companies post-train their models to sound like what people prefer to read. There's a reason engagement farmers have converged on the tone these LLMs imitate: it's something people prefer. Maybe not you, but it's the same thing that gives us YouTube face on thumbnails, etc.

It takes some prompting to nudge the model out of that default voice because post-training reinforced it. They will likely shift it once these AI-isms are widely known and recognized. I'd assume the next-gen models under training now will get negative feedback from human evaluators for sounding too AI-like, and then there will be new AI smells to calibrate to.


I'm not sure this invalidates anything I'm saying. The tools currently produce terrible-quality output unless actively prompted to stop producing terrible-quality output. To me, that's a bug, and I don't think post-training and popular preference excuses the tool's behavior. There's no value in normalizing slop if it's so easy to fix.

Should Youtube "fix" the proliferation of exaggerated faces in thumbnails?

People prefer the slop, at least until they collectively notice the AI smell, at which point the post training will likely train it out of models and slop will have new characteristics that take a while for the mainstream to detect.


This is like saying people shouldn't need to be trained for a job.

There's no reason to expect a general purpose model to know what you want when you've not given it any training in what to do for your specific case.

And in this case, the models do far better than humans: most humans can't just switch to copying an arbitrary tone from being given a page's worth of text. We don't even need to actually train/fine-tune these models further; we just need to fully specify the task we give them to get them to write well.

