Hacker News | new | past | comments | ask | show | jobs | submit | killcoder's comments | login

I don't buy the "any constraints cause lower performance by being out of distribution" idea. Sure, if you ask the model to output 'reasoning' in JSON steps, that's a completely different 'channel' from its trained 'reasoning' output. For real tasks, though, I think it's more about picking the _right_ context-free grammar to enforce format correctness. You can enforce an in-distribution format and get the best of both worlds. I don't think the industry should settle so hard on JSON-for-everything.
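To make the idea concrete, here's a minimal sketch (hypothetical names, not any particular library's API) of how grammar-constrained decoding works mechanically: at each step, mask every vocabulary token that cannot legally extend the output under the chosen grammar, so the sampler only ever sees in-grammar continuations.

```typescript
// A token survives only if appending it keeps the output a valid
// prefix of some string in the grammar's language.
type PrefixChecker = (s: string) => boolean;

function allowedTokens(prefix: string, vocab: string[], isValidPrefix: PrefixChecker): string[] {
  return vocab.filter((tok) => isValidPrefix(prefix + tok));
}

// Toy "grammar": the output must be a prefix of a quoted digit string, e.g. "123"
const digitString: PrefixChecker = (s) =>
  /^"?\d*"?$/.test(s) && (s.match(/"/g) ?? []).length <= 2;

// After emitting `"12`, the sampler may emit more digits or close the
// quote, but a letter token is masked out.
const next = allowedTokens('"12', ['3', '4', 'a', '"'], digitString);
// next → ['3', '4', '"']
```

The same masking machinery works for any format; the comment's point is that the grammar you pick should match something the model already emits naturally, rather than forcing everything through JSON.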


I was working on a speculative decoding optimisation and its accompanying blog post. Explaining the more basic concepts filled so much of the post that I decided to pull them out, forming this article.

I had a bit too much fun with the tokenisation diagrams / animations. The raw text is provided to an Astro component, which tokenises it and forms the individual DOM elements for the tokens. I find it really hard to read 'tokenised' text, so I figured some consistent colouring would help. The 'Probabilities' component is a trivial grid, but all the other components support 'word wrap'.

I ended up writing a 'responsive design aware graph colouring solver'.

Multiple screen widths ('desktop' and 'mobile') are simulated, forming an adjacency graph of tokens that touch. Colours are then greedily allocated and optimised per page over a few hundred iterations, swapping allocations to enforce a minimum hue distance between touching tokens at those common screen sizes. The optimiser's value function prioritises an even distribution of colours, because it looks nicer than maximal hue difference.
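The greedy allocation step above can be sketched roughly like this (hypothetical names, not the post's actual code): walk the token adjacency graph and give each token the palette hue farthest, on the colour wheel, from its already-coloured neighbours.

```typescript
// Circular distance on a 0-360 degree hue wheel.
function hueDistance(a: number, b: number): number {
  const d = Math.abs(a - b) % 360;
  return Math.min(d, 360 - d);
}

// adjacency[i] lists the tokens that visually touch token i at some
// simulated screen width; palette is the set of available hues.
function greedyColour(adjacency: number[][], palette: number[]): number[] {
  const hues: number[] = new Array(adjacency.length).fill(-1);
  for (let node = 0; node < adjacency.length; node++) {
    let best = palette[0];
    let bestScore = -1;
    for (const hue of palette) {
      // Score a candidate hue by its minimum distance to any
      // already-coloured neighbour (360 if none are coloured yet).
      const score = Math.min(
        360,
        ...adjacency[node]
          .filter((n) => hues[n] >= 0)
          .map((n) => hueDistance(hue, hues[n])),
      );
      if (score > bestScore) {
        best = hue;
        bestScore = score;
      }
    }
    hues[node] = best;
  }
  return hues;
}
```

The post's optimiser then iterates on this initial allocation, swapping hues to improve the distribution; the greedy pass just gives it a legal starting point.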

Originally I naively output the palette styles per component, but found the CSS post-processing optimisers didn't handle that as well as I'd have thought. So I wrote a little 'CSS compiler' that takes the high-level palette and timing concepts of the animations and optimally merges rule declarations.

The start of the post really relies on the animation occurring while fully in view, so I set up some IntersectionObservers that do the 'please scroll' text.

I tried my best to have it all work when JS is disabled on the client. I tried to make the 'hovering' CSS-only, but found the JS solution much more performant.

The DAG diagrams are formed with the neat Needleman-Wunsch algorithm from the bioinformatics field. The Astro component accepts several 'examples', then aligns common subsequences, producing the CSS grid and the 'basic SVG' on the server. The responsive nature meant I had to move the final 'arrow' generation to the client.
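For anyone unfamiliar, the Needleman-Wunsch core is a small dynamic program. A sketch over token sequences (toy scoring constants; the post's component aligns several examples at once, but the pairwise DP looks like this):

```typescript
// Global alignment of two token sequences; gaps are represented as null.
function align(a: string[], b: string[]): Array<[string | null, string | null]> {
  const GAP = -1, MATCH = 1, MISMATCH = -1;
  // score[i][j] = best score aligning a[0..i) with b[0..j)
  const score = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0),
  );
  for (let i = 1; i <= a.length; i++) score[i][0] = i * GAP;
  for (let j = 1; j <= b.length; j++) score[0][j] = j * GAP;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      score[i][j] = Math.max(
        score[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? MATCH : MISMATCH),
        score[i - 1][j] + GAP,
        score[i][j - 1] + GAP,
      );
    }
  }
  // Trace back from the bottom-right corner to recover the alignment.
  const pairs: Array<[string | null, string | null]> = [];
  let i = a.length, j = b.length;
  while (i > 0 || j > 0) {
    if (
      i > 0 && j > 0 &&
      score[i][j] === score[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? MATCH : MISMATCH)
    ) {
      pairs.unshift([a[--i], b[--j]]);
    } else if (i > 0 && score[i][j] === score[i - 1][j] + GAP) {
      pairs.unshift([a[--i], null]);
    } else {
      pairs.unshift([null, b[--j]]);
    }
  }
  return pairs;
}

// align(['the','cat','sat'], ['the','sat']) lines up the shared tokens
// and opens a gap opposite 'cat', which maps naturally onto grid columns.
```

Each aligned column then becomes a CSS grid column, with gaps left empty; that's what lets several example sequences share one diagram.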

Some browsers seem to intermittently throttle the token animations, but I haven't figured out what causes it. This is my first time leaning hard on CSS variables.


In South Australia an algal bloom started in mid-March of this year. It's a major ecological disaster, probably the worst non-bushfire disaster in living memory, affecting roughly 30% of SA's coastline and many people's livelihoods.

The joint state and federal government relief and cleanup package is worth AUD $102.5 million.

I hope the public is reminded of that comparison at every opportunity.

The old website was frankly excellent; the only problem was that it didn't have HTTPS support. I would have happily upgraded that part of the system for the cost of a cup of coffee if I'd had an opportunity to submit for the tender!

The new website is significantly more difficult to navigate (for me, a seasoned tech user). The primary thing dads everywhere use it for (the weather radar) now requires scrolling to the _bottom_ of the page and zooming in from the 'map of Australia' to the region you live in. It used to be a single click from the home page to the state weather radar, with all the info you needed.

https://www.abc.net.au/news/2025-11-23/bureau-of-meteorology...

If you want to read our local news about it.

> [BOM] said the cost breakdown included $4.1 million for the redesign, $79.8 million for the website build, and the site's launch and security testing cost $12.6 million.

Absolutely stupid; even broken down, those numbers are outrageous. They say it's part of some 'larger upgrade package' prompted by a cyber attack in 2015.

https://www.abc.net.au/news/2015-12-02/china-blamed-for-cybe...

But politicians over here love to blame cyber attacks when technical blunders happen. We had a census a couple of years ago where the website fell over due to 'unprecedented load', or maybe it was a 'DDoS attack'? The news at the time couldn't decide who to blame!

Welp, I hope this gets as much world-wide attention as possible so they can be embarrassed and do better.


(Hello, fellow South Australian!)

The pain point for me has been the loss of information density. 99% of my use of the old BoM site was the 7-day forecast showing rain and cloud: the former for working outside, the latter for photography jobs. Now, at about 800px or narrower, the 7-day forecast loses the rain estimate, and all they manage to fit in is the day, icon, min and max. The day name could be abbreviated, and the other elements are typically 30px wide. Having to expand each day (or all days) to look for the rain estimate is thoroughly tedious.

Among the highlights of vertical space wastage: 130px for a cookie warning, 50px for "No warnings for this location", and then another 110px for heading a table with "7 day forecast" and "expand all". On a large phone screen, that leaves only about a third of the vertical space for actual content; the rest is site header and browser chrome!


Funnily enough, I recently stumbled upon an Australian comedy show called Utopia that looks more and more like a documentary now.

https://www.youtube.com/watch?v=_otJbx-PVOw


The least realistic thing is how many Aussies are employed in that government department.


I don't understand how those kinds of numbers get accepted, approved and paid! We built a fairly complex web application for a customer. The total cost, including design, development, QA, data migration from a legacy platform and an independent third-party security audit/pentest, was less than $0.5M!

Even if I accounted for the additional capacity needed to serve a nation of users, I can't imagine the cost being more than $5M.


I would have settled for HTTPS redirecting to HTTP. Instead, it redirected to a generic page telling you they don't support HTTPS, with no way to get to the actual content.


In some ways, poor project management is like an algal bloom or wildfire: costs expand, feeding on other costs, unless a huge active effort is made to keep them under control.

And it ends up being a disaster for the public.


Another way to think about the price: it's slightly less than we spend per day on the NDIS (~$126 million).


Cleaning up algae doesn’t buy votes


A seasoned tech user, and changing the link location stumps you? BOOKMARK IT. Things change.

Sorry, fellow Aussie here, and every Tom, Dick & Harry has had their say on this website during the likely 1000s of committee meetings.

I'd charge $96m to the BOM too to upgrade their old POS website.


It would be nice if users of the codex-cli who are just using API keys as a way to handle rate limits and billing could receive these new models at the same time. I appreciate the reasoning behind the delayed 'actual API' release, but I've found the rate limiting to be quite annoying, and my own API keys don't have this limitation.


Re: rate limits, I'm not sure they can yet, on capacity. See Jensen's comment today about their cloud GPUs being sold out. Capacity increases await the ongoing data center build-out.


> 30% more token-efficient at the same reasoning level across many tasks

But they're claiming it's more token-efficient, so switching my usage to the new model should _free up_ capacity.


Apart from the actual model release, this is the second set of models from OpenAI that uses the Harmony response format. I don't suppose anyone knows whether OpenAI uses the Harmony format internally for GPT-5 as well?

https://cookbook.openai.com/articles/openai-harmony


Renderers can access Node APIs via the ‘node integration’ setting or via a preload script.


Aren't these just IPC calls disguised as normal function calls, though? IIRC only the main Node process does anything Node; renderers can call "node functions" that really happen in the main process.


Not at all: in a renderer, the Node and Chromium event loops are bound together; they're part of the same V8 isolate, with no IPC shenanigans.

The main process really shouldn't be used for anything except setup. Since it controls GPU paints, amongst other things, blocking on it will cause visible stuttering and a bad user experience.

https://www.electronjs.org/blog/electron-internals-node-inte...


You were correct. Electron lets you expose specific Node.js APIs via the preload script, or everything via the 'nodeIntegration' setting:

https://www.electronjs.org/docs/latest/api/structures/web-pr...

Separately, the IPC lets you do zero-copy in some circumstances via Transferable objects such as ArrayBuffers. Structured cloning is efficient but not zero-copy, and JSON serialisation shouldn't be used (since structured cloning is easily available).
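The clone-vs-transfer distinction can be demonstrated with plain Node message ports (not Electron itself, but the same structured-clone and transfer semantics): posting an ArrayBuffer in the transfer list moves it zero-copy and detaches it from the sender.

```typescript
import { MessageChannel } from 'node:worker_threads';

const { port1, port2 } = new MessageChannel();

// Structured clone: the sender keeps a usable copy of the buffer.
const cloned = new ArrayBuffer(1024);
port1.postMessage(cloned);
const clonedStillUsable = cloned.byteLength; // still 1024

// Zero-copy transfer: the buffer is detached on the sending side and
// now lives only on the receiving side.
const transferred = new ArrayBuffer(1024);
port1.postMessage(transferred, [transferred]);
const detached = transferred.byteLength; // 0 once detached

port1.close();
port2.close();
```

The detachment is what makes the transfer safe without copying: only one side can ever touch the memory at a time.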


Thanks for adding this context! Guess I was misled by the Electron documentation talking about multiple processes and IPC. Appreciate the clarification!


Within a renderer you can access Node.js APIs directly. The main process shouldn't be used for any significant computation, as that will block GPU paints and cross-process synchronisation.

The other main difference is that Electron bundles a known set of APIs, given the known Chromium version. There's such huge variance in supported features across the embedded web views.


Yes, this is the best benefit of Electron: you don't have to troubleshoot tens of OS webview versions and their incremental support, especially with macOS.

But it is right that the UI for Electron has to use an IPC layer to reach a Node backend. However, Chrome is moving a lot of things like the FileSystem API into browsers, so there may be a day where Node.js is dropped in favor of a sandboxed Chromium.


You don't need IPC: you can either use a preload script to expose particular Node APIs in a secure manner, or set 'nodeIntegration' to 'true' to expose everything.

Source: https://www.electronjs.org/docs/latest/api/structures/web-pr...


Conversely, the last blog post we wrote was 8,000+ words and took months of testing, yet the average 'read' time is under two minutes. I'm convinced there's a correlation between interested technical users and the blocking of analytics scripts, but if I were to naively look at the data, I'd also conclude that 'lower effort' was the better return on investment. I wonder if these tech journalism establishments are following their analytics and A/B testing themselves into oblivion.


It's like meat and potatoes, though. Yes, you can fill a website with low-effort filler content that keeps your viewers engaged and visiting, but in the long run you also need some solid, meaty stuff.

A lot of that sort of content moved over to YouTube because it was easier to monetise. I think a hybrid of the two is the nicest (reading charts from YouTube videos sucks).


It's a weird trap. With no analytics, it'd be difficult to attribute any conversions to a particular user type, so I'd wager that, if the hypothesis that lower-tech users don't block ads/analytics holds up, the metrics skew that way. We can't make any realistic assertions without data for that user group. Shrug.


Factorio is a good example of a modern game that works this way.

