
My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.

i.e. agents for knowledge workers who are not software engineers

A few thoughts and questions:

1. I expect that this set of products will be extremely disruptive to many software businesses. It's like when a new VP joins a company: they often rip and replace some of the software vendors with their personal favorites. Well, most software was designed for human users. Now, people's agents will use software for them. Agents have different needs for software than humans do. Some software they'll need more of; much they'll no longer need at all. What will this result in? It feels like a much swifter and more significant version of Google taking excerpts/summaries from webpages, putting them at the top of search results, and taking away visits and ad revenue from sites.

2. I've tried dozens of products in this space. For most, onboarding is confusing, then the user gets dropped into a blank space, usage limits are uncompetitive compared to the subsidized tokens offered by OpenAI/Anthropic, etc. It's a tough space to compete in, but also clearly going to be a massive market. I'm expecting big investment from Microsoft, Google etc in this segment.

3. How will startups in this space compete against labs who can train models to fit their products?

4. Eventually will the UI/interface be generated/personalized for the user, by the model? Presumably. Harnesses get eaten by model-generated harnesses?

A few more thoughts collected here: https://chrisbarber.co/professional-agents/

Products I've tried: ai browsers like dia, comet, claude for chrome, atlas, and dex; claw products like openclaw, kimi claw, klaus, viktor, duet, atris; automation things like tasklet and lindy; code agents like devin, claude code, cursor, codex; desktop automation tools like vercept, nox, liminary, logical, and raycast; and email products like shortwave, cora and jace. And of course, Claude Cowork, Codex cli and app, and Claude Code cli and app.


I'm finding the "adaptive thinking" thing very confusing, especially having written code against the previous thinking budget / thinking effort / etc modes: https://platform.claude.com/docs/en/build-with-claude/adapti...

Also notable: 4.7 now defaults to NOT including a human-readable reasoning token summary in the output, you have to add "display": "summarized" to get that: https://platform.claude.com/docs/en/build-with-claude/adapti...

(Still trying to get a decent pelican out of this one but the new thinking stuff is tripping me up.)


I've been running this on my laptop with the Unsloth 20.9GB GGUF in LM Studio: https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF/blob/mai...

It drew a better pelican riding a bicycle than Opus 4.7 did! https://simonwillison.net/2026/Apr/16/qwen-beats-opus/


This actually looks very useful. Cloudflare seems to be bringing together a great set of tools. Not to mention, D2 is literally the only sqlite-as-a-service solution out there whose reliability is great and whose free-tier limits are generous.

Related. We have several third party web apps in use. These apps don't expose a public api, but they are all single page web apps. We wanted to connect claude code to these web apps for our limited use case.

We opened Chrome, navigated the entire website, then downloaded the network tab as a HAR file. Then we asked Claude to analyze and document the APIs as an OpenAPI JSON spec. It worked amazingly well.

Next step - we wrote a small Python script. On one side, this script implements a stdio MCP server. On the other side, it calls the internal APIs exposed by the 3rd-party app. The only thing missing is the auth headers...

This is the best part. When Claude connects to the MCP server, the server launches a Playwright-controlled browser and opens the target web application. It detects whether the user is logged in, then extracts the auth credentials using Playwright, saves them to a local cache file, and closes the browser. Then it accesses the APIs directly - no browser needed thereafter.

In about an hour's worth of tokens with Claude, we get an MCP server that works locally with each user's credentials in a fairly reliable manner. We have been able to get this working in otherwise locked-down corporate environments.


Very cool! I saw a similar product recently that I liked but I much prefer your approach to theirs[1]

[1] https://github.com/cordwainersmith/Claudoscope


There’s something quietly impressive about getting modern AI ideas to run on old hardware (like OP's project or running LLM inference on Windows 3.1 machines). It’s easy to think all the progress is just bigger GPUs and more compute, but moments like that remind you how much of it is just more clever math and algorithms squeezing signal out of limited resources. Feels closer to the spirit of early computing than the current “throw hardware at it” narrative.

Here are the articles in this series that got significant HN discussion (in chronological order for a change):

ML promises to be profoundly weird* - https://news.ycombinator.com/item?id=47689648 - April 2026 (602 comments)

The Future of Everything Is Lies, I Guess: Part 3 – Culture - https://news.ycombinator.com/item?id=47703528 - April 2026 (106 comments)

The future of everything is lies, I guess – Part 5: Annoyances - https://news.ycombinator.com/item?id=47730981 - April 2026 (169 comments)

The Future of Everything Is Lies, I Guess: Safety - https://news.ycombinator.com/item?id=47754379 - April 2026 (180 comments)

The future of everything is lies, I guess: Work - https://news.ycombinator.com/item?id=47766550 - April 2026 (217 comments)

The Future of Everything Is Lies, I Guess: New Jobs - https://news.ycombinator.com/item?id=47778758 - April 2026 (178 comments)

* (That first title was different because of https://news.ycombinator.com/item?id=47695064 - as you can see, I gave up.)

p.s. Normally we downweight subsequent articles in a series because avoiding repetition of any kind is the main thing that keeps HN interesting. But we made an exception in this case. Please don't draw conclusions from that since we'll probably get less series-ey, not more, after this! Better to bundle into one longer article.


Ooh, this looks great!

The usage costs are rather high compared to S3 - PUT/POST requests cost about 30x more. It looks like batching operations is going to be vital.


The hardware-attested privacy path is the interesting part of this, but the economic side has a quieter risk the thread has not named: the load tax per request. MiniMax M2.5 239B from your catalog still has to load all 239B weights even though only 11B are active — that is roughly 120GB at Q4_K_M, and cold load from SSD on Apple Silicon is measurable in tens of seconds. Even the Qwen3.5 122B MoE lands around 65GB cold. If the coordinator routes request number two to a different idle Mac than request number one, or if the owner's machine spun the model out to free memory in between, each request pays that cold load before the first token. Keeping the model resident 24/7 solves the latency but eats into the power budget the operator is trying to amortize in the first place. How does the coordinator decide which provider to keep warm for which model? A 16GB or 32GB home Mac cannot host Qwen3.5 122B MoE at all, and the Mac Studios that can are a much smaller slice of the 100M machine estimate.

After reading the article, for some reason I am finding the following fact profoundly distressing. Surely there are more than 1000 active airlines worldwide‽

> Every airline has a 3-digit IATA numeric code. 098 = Air India. British Airways is 125. IndiGo is 526. These codes predate the familiar 2-letter IATA codes (AI, BA, 6E): they were used when teletypes could not reliably transmit letters and numbers interchangeably.


"Again, we are not doing this because we want this to be the future. It is not because we want to expand to chain AI-run retail stores across the world. It is not for economic opportunity.

We’re doing this because we believe this future is coming regardless, and we’d rather be the ones running it first while monitoring every interaction, analyzing the traces, benchmarking how much autonomy an AI can responsibly hold."

I always enjoy how these AI companies try to take the moral high ground. When someone doesn't want something to be the future, their instinct is usually not to try to be the first person doing that exact thing. If you don't want this to be the future, then why not spend your time building a future you do want? Supporting people who want more AI regulation to stop this? Literally anything else.

Just be honest: you think this is the future, and you do in fact want to be first doing it so you're in a position to make a lot of money. Do you think people don't know what an ad is when they see one?


I'm not sure why this announcement has generated so much irritation in the comments-- Cloudflare has been transitioning from "DDoS protection" to "AWS competitor" for many years now, and this is just their alternative to AWS SES.

It's an email sender that you can access through an API, or directly through Workers. For those who haven't been keeping up over the years, Workers is their product for running code on Cloudflare's platform directly (an AWS Lambda competitor, more or less) and they've been trying to make it the centerpiece of an ecosystem where you deploy your code to their platform and get access to a variety of tools: databases, storage, streaming, AI, and now email sending. All of this is stuff that AWS has had for years, but some people like Cloudflare more (I certainly do).

One thing that surprised me is the price-- Cloudflare's cloud offerings are usually much cheaper, and I've saved plenty of money by migrating from AWS S3 to Cloudflare's R2. This new offering is 3x the AWS price, though. Weird. Anyway, most small companies don't send enough email for it to matter.

But getting back to the consensus in the comments here: I'm not sure why people think that they'll be worse about policing spam than AWS SES, Azure Email, etc.


And still, in the year of our lord 2026, GitHub does not support IPv6.

https://github.com/orgs/community/discussions/10539


Paper Computing (great name!) is something I've been thinking about a lot to help my kids benefit from tech without exposing them to the brain melting addiction of screens. I sacrificed a few crazy nights of sleep to try to build a Paper Computer Agent prototype for a recent Gemini hackathon (only to disappointingly have submission issues right before the actual deadline) which my kids loved and keep asking me to set up permanently for them.

It's essentially a poor man's hacked-up DynamicLand - projector, camera, live agent. There are so many things you could do if you had a strong working baseline for this. My kids used it to create stories, learn how to draw various things, and watch safe videos they could hold in their hand.

There's something weirdly compelling and delightfully physical about holding a piece of paper that shows a live rocket launch, with the flames streaming down the page. It could also project targeted pieces of text, such as inline homework advice, or graphs next to data. It doesn't take long to imagine any other number of fun use cases, and it feels a lot more freeing and inspiring than keeping everything bound to a screen.

Github - https://github.com/Pugio/Orly (hacky minimal prototype that did the thing)

Video Pitch - https://youtu.be/-9l1x7GnmxU (filmed an hour before the deadline on an old phone with no sleep)


I had a truly good “hacking” session with Codex. It's not hacking, I wasn't breaking anything, just jumping over the fences TP-Link put up for me: owning the router, inside the network, knowing the admin password. But TP-Link really tried everything so you cannot access the router you own via API. They really tried to be smart with a very, very broken custom auth and encryption scheme. It took about half a day with Codex, but in the end I have a pretty Python API to access my router, tested, reliable, and exporting beautiful Prometheus metrics.

I’m sure there is some overeager product manager sitting in these companies, trying to split markets into consumer and enterprise segments just by making APIs unusable by humans and adding 200% useless “security by obscurity”.


I got a human being at Google to look into my problem and take action after sending a police report to Google's legal department certified mail return receipt along with a letter describing how someone was impersonating me and my business using a Gmail address in an attempt to commit fraud.

Yes, it was a pain to take all of these steps and it probably took about 3 hours but it was absolutely necessary considering there was no avenue for me to shut down this person otherwise.


>> Test it yourself, GPT 120B OSS is cheap and available. BTW, this is why with this bug, the stronger the model you pick (but not enough to discover the true bug), the less likely it is it will claim there is a bug.

I guess this is the crux of the debate. All the claims are comparing models that are available freely with a model that is available only to limited customers (Mythos). The problem here is with the phrase "better model". Better how? Is it trained specifically on cybersecurity? Is it simply a large model with a higher token/thinking budget? Is it a better harness/scaffold? Is it simply a better prompt?

I don't doubt that some models are stronger than other models (a Gemini Pro or a Claude Opus has more parameters and higher context sizes, and was probably trained for longer and on more data than its smaller counterpart, Flash or Sonnet respectively).

Unless we know the exact experimental setup (which in this case is impossible because Mythos is completely closed off and not even accessible via API), all of this is hand wavy. Anthropic is definitely not going to reveal their setup because whether or not there is any secret sauce, there is more value to letting people's imaginations fly and the marketing machine work. Anthropic must be jumping with joy at all the free publicity they are getting.


It would be interesting to do a write-up like this on "modern microcontrollers." Some of the content is similar (some µc cores look relatively similar to µp cores with a 10-20 year lag), but there's differences too. Things that would come to mind for me:

1) Strategic pipeline lengths -- long pipelines drive throughput, short pipelines drive interrupt responsiveness. 5-stage pipelines are still popular for realtime cores.

2) Heterogeneous cores -- a mix of short- and long-pipeline cores on a single chip, with some optimized for responsiveness and some optimized for throughput. (This could actually be added to the µp article as well, discussing big.LITTLE-style heterogeneity with some cores optimized for total throughput and some optimized for power efficiency.) Unlike in the µp case, this is paired with a general assumption that cores are usually developer-managed (asymmetric multiprocessing) rather than magically managed by a scheduler (symmetric multiprocessing). (Dedicated cores for low power come up in µcs too.)

3) Fast memories; some very fast memories. Everything fits in SRAM on chip. Some SRAM is tightly coupled to a specific core (tightly coupled memory), which gives single-cycle access; some hangs off an AXI bus to allow sharing between cores, but that adds a few cycles (and possible collisions) per access, making caches still relevant (which has not always been true for µcs). The µp developer approach to performance of "memory accesses rule everything" is not nearly as true on µcs.

4) Peripherals and accelerators dominate silicon area, and dominate system performance. (This can also be said of µps these days.) Proper use of DMA engines can completely change the solution to problems. Smart peripherals unload huge amounts of work from the core, making the core less important -- in some cases, it's really just there to configure the DMA engine and the peripherals. (This sounds an awful lot like cores on a µp just feeding GPUs these days.)

5) Topology awareness. Multiple AXI busses and peripheral busses; software needs to be aware of what peripheral or SRAM chunk hangs off what bus to maximize performance, minimize collisions, or even just to allow the peripheral to be used at all from a given core in a given power state. This has some similarities to NUMA awareness in µp development, but as with AMP vs SMP it's generally more visible to developers.

I could keep going... there's an article here.


Key detail:

> Immigration authorities say the move is aimed at preventing cases in which foreign workers obtain visas under one category, but then engage in unrelated or lower-skilled work.

The claim appears to be that people were using up visa slots for things like interpreters or other jobs where clearly you'd need good language skills to actually do the job, including in Japanese, with the intent all along of doing some other job instead. An up-front test should let through almost all of the legitimate claimants of these visas, and stop almost all the fraudsters. Probably a lot cheaper than a similarly-effective level of after-the-fact auditing, or more-extensive checks into applicants' work situation.

[EDIT] I mean, in the framing provided by the government, the above appears to be what's going on. Governments may lie, of course.


When the early adopters start pushing neural implants they'll be ad-free. Not long after your boss insists that everybody needs neural implants for the sake of productivity, they'll be ad-supported but moneyed developers will be able to opt out. The terms of the ad-free service will continue shifting, so nothing is ever really ad-free for long, and ads for better neural implants are promotions not ads right? But y'all are working on neural implants because if you don't, somebody else will, aren't you

So Opus 4.7 is measurably worse at long-context retrieval compared to Opus 4.6. Opus 4.6 scores 91.9% and Opus 4.7 scores 59.2%. At least they're transparent about the model degradation. They traded long-context retrieval for better software engineering and math scores.

Oh wow, I used to work on Excel Add-Ins about 10 years ago. Even got a patent for it. I'd be curious to see how they implemented the calls.

We came up with what I still consider a pretty cool batch-RPC mechanism under the hood so that you wouldn't have to cross the process boundary on every OM call (which is especially costly on Excel Web). I remember fighting so hard to have it be called `context.sync()` instead of `context.executeAsync()`...

That being said, done poorly it can be slow as the round-trip time on web can be on the order of seconds (at least back then).


Addressing the usual few complaints folks always bring up:

* This is from the separate independent team that works on Thunderbird, not Firefox, so there isn't any resource contention happening there

* Thunderbird is revenue positive, and this potentially gives that team another revenue stream to be even more self-sustaining through charging companies

* Businesses definitely want to control the AI they're using (especially with RAGs of their own data) instead of just throwing it at their LLM vendor and hoping for the best

People on HN are fond of asserting that their own POV is the only one. Imagine that there is such a thing as a person in charge of choosing technologies for organizations, and that you're such a person. That's who this is for.


> We had a budget alert (€80) and a cost anomaly alert, both of which triggered with a delay of a few hours

> By the time we reacted, costs were already around €28,000

> The final amount settled at €54,000+ due to delayed cost reporting

So much for the folks defending these three companies that refused to provide a hard spending cap ("but you can set the budget", "you are doing it wrong if you worry about billing", "a hard cap is technically impossible", etc.)


Little bit of extra detail about static closures in PHP for anyone interested: https://www.php.net/manual/en/functions.anonymous.php#functi...

The article heavily quotes the "AI Security Institute" as a third-party analysis. It was the first I heard of them, so I looked up their about page, and it appears to be primarily people from the AI industry (former Deepmind/OpenAI staff, etc.), with no folks from the security industry mentioned. So while the security landscape is clearly evolving (cf. also Big Sleep and Project Zero), the conclusion of "to harden a system we need to spend more tokens" sounds like yet more AI boosting from a different angle. It raises the question of why no other alternatives (like formal verification) are mentioned in the article or the AISI report.

I wouldn't be surprised if NVIDIA picked up this talking point to sell more GPUs.


I wonder why Windows Defender has the privilege to alter the system files. Read them for analysis? Sure! Reset (as in, call some windows API to have it replaced with the original), why not? But being able to write sounds like a bad idea.

However, I don't know what I'm talking about so take it with a grain of salt!


New Orleans drops mic... I'm from River Ridge in the Jefferson Parish suburbs (within the city area) and if I meet a stranger somewhere else in the country they, after one or two sentences, usually think I'm from New York. But slipping into one of the many dialects we have here is never far away, depending on who you are conversing with. Only locals will understand, but my wife used to tell me that after 2 sentences my dad and I would start talking like we were "from Kenner" and she couldn't follow the conversation. To non-locals, Kenner is directly next to River Ridge.
