The context creep point really resonates. It's easy to keep adding 'just one more thing' to a prompt without realizing each run is getting heavier. I hatched three lobsters and sent them on random errands, and $250 vanished. One thing that helps is setting a hard rule upfront: define the maximum input size before you start building, not after. Treating it like a budget from day one makes it easier to stick to. The idea of model routing as a seatbelt rather than an optimization is a good way to frame the bigger picture too. I have since switched from Opus to Sonnet, which is 1/5 the price, and they just sped Sonnet up.
Nice to see a real-world example of Codex handling a port like this. The browser-first approach makes sense for TTS; being able to run it locally without a server is a big deal for privacy. Curious how the output quality compares to the original?
This has been building for a while. The gap between 'AI safety' as a company value and real-world government contracts was always going to create friction. Still, it's hard to believe the US government's own AI capabilities are apparently behind what's publicly available. Although I know Opus 4.6 isn't the best model Anthropic is sitting on.
Really appreciate that this bakes in PII redaction and a review step before anything is published. A lot of tools that push data somewhere skip that entirely. The 'export locally first, review, then publish' flow seems like the right approach.
The part that gets me is 24,000 fake accounts and 16 million API calls. That's not a side project, that's a coordinated operation. At some point "distillation" stops being a technical term and starts being a very polite word for something else.
Hard to see how legislation fixes this though. If the models are good enough to be worth stealing, they're accessible enough to be stolen. That's not a China problem, that's a business model problem.
Maybe that's why Anthropic cut off the Max Plan from API calls. It was a no-no until last week, when it became impossible outright. Needless to say, my API costs are more than $200 a month now. Glad they ramped Sonnet up to 4.6 at 1/5 the cost of Opus.
Running local LLMs on Apple Silicon has gotten surprisingly capable — the M-series chips handle models that used to require expensive GPU setups, so tools like this that actually speak to that hardware are welcome.
The quantization comparison is the feature I'd use most. It's one of those things that sounds simple but in practice nobody wants to dig through benchmarks just to figure out whether Q4 or Q8 is worth the extra memory on their specific machine.
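The back-of-the-envelope math behind that comparison is simple enough to sketch. A minimal version, assuming GGUF-style average bits per weight and a rough overhead factor (both are my assumptions, not numbers from the tool):

```python
def est_memory_gb(params_b: float, bits_per_weight: float,
                  overhead: float = 1.2) -> float:
    """Rough weight-memory estimate for a quantized model.

    params_b: parameter count in billions
    bits_per_weight: ~4.5 for a Q4 variant, ~8.5 for Q8 (GGUF averages)
    overhead: slack for KV cache / runtime buffers -- a rough assumption
    """
    return params_b * 1e9 * bits_per_weight / 8 / 1e9 * overhead

# An 8B model on a 16 GB machine: Q4 vs Q8
print(round(est_memory_gb(8, 4.5), 1))  # 5.4 GB
print(round(est_memory_gb(8, 8.5), 1))  # 10.2 GB
```

Both fit on 16 GB in this toy example, but Q8 leaves far less headroom for context and everything else running, which is exactly the trade-off a comparison view would surface.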
Does it factor in what else is running in the background when estimating how much your machine can handle? That number can shift a lot depending on what else has memory tied up.
No. The tool uses total hardware resources, not currently available ones.
It displays current usage (via progress bars and "used" values), but the LLM model recommendations are based on raw hardware totals, not what's actually free after background processes.
This is a reasonable design choice for a "system specs" tool—it's showing what the machine has, not what it has available right now. But you're right that it could misleadingly suggest a 70B model fits on a 24GB GPU when other apps are already using 4GB.
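The difference between the two behaviors is easy to illustrate. A toy sketch (the function and numbers are hypothetical, just mirroring the 24 GB GPU example above):

```python
def fits(model_gb: float, total_gb: float, used_gb: float,
         count_used: bool) -> bool:
    """Decide whether a model fits in memory.

    count_used=False mirrors the tool's behavior: compare against
    raw hardware totals, ignoring what other apps already hold.
    """
    budget = total_gb - used_gb if count_used else total_gb
    return model_gb <= budget

# 22 GB quantized model, 24 GB GPU, 4 GB already in use by other apps:
print(fits(22, 24, 4, count_used=False))  # True  (raw totals say it fits)
print(fits(22, 24, 4, count_used=True))   # False (only 20 GB actually free)
```

Reasonable either way, as long as the UI makes clear which question it's answering.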
The architecture here is right: local execution, remote window. Running agents on your own machine means your filesystem, tools, and env stay intact, and you're not paying cloud rates for compute your hardware can already handle.
The "survive interruptions" piece is underrated. Anyone who has had a long agentic run get killed by a dropped connection knows the pain. Session persistence isn't glamorous but it changes how you actually use these tools day to day.
Curious how it handles multiple local sessions — if you're running separate agents on the same machine for different projects, can you switch between them remotely, or are you limited to remote-controlling one at a time?
I'm a technology proponent, but tech doesn't replace a human teacher when it's 24 students, 1 teacher, and the lesson plan is "open your i-Ready modules."
My 4 kids all use i-Ready. It has real value for reinforcing lessons — but some teachers lean on it as a one-size-fits-all substitute rather than a supplement. That's where it falls apart.
The honest version of AI in education isn't here yet: truly dynamic, personalized lesson paths that adapt to each student in real time. When that exists, the multiple-choice quiz model i-Ready is built on won't be able to compete. The question is just — when does every student get an AI that functions as their own individual teacher?
The buried insight is right: if random keystrokes produce playable games, the input is basically noise and the system is doing all the work. We've evolved past the point where intent matters. That's either the most exciting or the most terrifying thing about where this is all heading. But I'm glad I'm sitting in the front row watching it all happen, especially a dog vibe coding!
First, because there's intent in the very verbose initial prompt.
Second, because you have to factor in the quality of the output. I don't want to be a killjoy, but past the (admittedly fun!) art-experiment angle, these are not quality games. Maybe some could compete with Flappy Bird (remember it? It seems like ages ago!), but good indie games are in a different league. Intent does matter.