Small models are getting good, but I don't think they are quite there yet for this use case. For OK results you're looking at 12-14 GB of VRAM committed to models to make this happen. My MacBook with 24 GB of total RAM runs fine with a 14B model loaded, but I don't think most people have quite enough RAM yet. Still, I think it's something we are going to need.
We are also going to want the opposite. A way for an LLM to request tool calls so that it can drive an arbitrary application. MCP exists, but it expects you to preregister all your MCP servers. I am not sure how well preregistering would work at the scale of every application on your PC.
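To make that concrete, here's a minimal sketch of what one app exposing its actions over MCP could look like, assuming the official Python MCP SDK and its FastMCP helper; the app name and tools are made up for illustration. Multiply this by every application on your PC and the preregistration problem becomes obvious.

```python
# Minimal sketch: a hypothetical desktop app exposing a couple of actions
# as MCP tools over stdio (assumes the official `mcp` Python SDK).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("music-player")  # hypothetical application

@mcp.tool()
def play(track: str) -> str:
    """Start playing the named track."""
    # ... call into the real application here ...
    return f"playing {track}"

@mcp.tool()
def pause() -> str:
    """Pause playback."""
    return "paused"

if __name__ == "__main__":
    # stdio transport; the client still has to be told this server exists,
    # which is exactly the preregistration problem described above.
    mcp.run()
```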
AI coding improved a lot over 2025. In early 2025 LLMs still struggled with counting; now they are capable of tool calling, so they can just use a calculator. Frankly, I'd say AI coding may as well not have existed before mid-2025. The output wasn't really that good. Sure, you could generate code, but you couldn't rely on a coding agent to make a two-line edit to a 1,000-line file.
I don't doubt that they have improved a lot this year, but the same claims were being made last year as well. And the year before that. I still haven't seen anything that proves to me that people are truly that much more productive. They certainly _feel_ more productive, though.
Hell, the GP spent more than $50,000 this year on API calls alone and the results are... what again? Where is the innovation? Where are the tools that wouldn't have been possible to build pre-ChatGPT?
I'm constantly reminded of the Feynman quote: "The first principle is that you must not fool yourself, and you are the easiest person to fool."
LLMs writing test cases, LLMs writing Selenium tests, LLMs doing exploratory testing, LLMs used for canary deployments. All that testing that people didn't do before because it was too hard and took too long? LLMs will be used to do it.
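For what it's worth, this is roughly the shape of Selenium test an LLM will happily generate today; the URL and element IDs below are made up for illustration, not from any real app.

```python
# Illustrative LLM-style Selenium test; URL and selectors are hypothetical.
from selenium import webdriver
from selenium.webdriver.common.by import By


def test_login_shows_dashboard():
    driver = webdriver.Chrome()
    try:
        driver.get("https://staging.example.com/login")
        driver.find_element(By.ID, "email").send_keys("qa@example.com")
        driver.find_element(By.ID, "password").send_keys("hunter2")
        driver.find_element(By.ID, "submit").click()
        assert "Dashboard" in driver.title
    finally:
        driver.quit()
```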
Pushed a new plugin, elevator-notifications, to the repo/marketplace. I'm seeing notifications in Notification Center (had to turn Mac notifications back on to test). Looks like I could fine-tune the actual notification content a bit more, but it's working on my machine.
Can you set up automated integration/end-to-end tests and find a way to feed the results back to your AI agents before a human looks at them? Either via an MCP server or just a comment on the pull request, if the AI has access to PR comments. Not only is your lack of an integration testing pipeline slowing you down, it's also slowing your AI agents down.
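As a rough sketch of the "just a comment on the pull request" option: run the suite in CI and post the output where the agent can read it. The repo name, PR number variable, and token below are placeholders, not anything standard.

```python
# Run the integration suite and post the result as a PR comment via
# GitHub's REST API. Repo, PR number, and token env vars are placeholders.
import os
import subprocess

import requests

REPO = "your-org/your-repo"          # placeholder
PR_NUMBER = os.environ["PR_NUMBER"]  # e.g. set by your CI system
TOKEN = os.environ["GITHUB_TOKEN"]

result = subprocess.run(
    ["pytest", "tests/integration", "-q", "--tb=short"],
    capture_output=True, text=True,
)
status = "passed" if result.returncode == 0 else "FAILED"
body = f"Integration tests {status}.\n\n{result.stdout[-6000:]}"

# PR comments go through the issues endpoint on GitHub's REST API.
requests.post(
    f"https://api.github.com/repos/{REPO}/issues/{PR_NUMBER}/comments",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Accept": "application/vnd.github+json",
    },
    json={"body": body},
    timeout=30,
)
```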
"AFAICT, there’s no service that lets me"... Just make that service!
We do integration testing in a preview/staging env (and locally). We can also do it via docker compose with some GitHub workflow magic, and used to do it that way, but the setup really slowed us down.
What I want is a remote dev env that comes up when I create a new agent and is just like local. I could build that service, but right now it isn't the priority (as much as I would enjoy building it; I personally love making dev tooling).
Claude Code latency sits at the unfortunate point where the wait is long enough for me to go on Twitter, but not long enough to do anything really valuable. I'd be more productive if it either took minutes or came back in under 5-10 seconds.
If AI is good enough to write formal verification, why wouldn't it be good enough to do QA? Why not just have AI do a full manual test sweep after every change?
I guess I am luddite-ish in that I think people still need to decide what must always be true in a system. Tests should exist to check those rules.
AI can help write test code and suggest edge cases, but it shouldn’t be trusted to decide whether behavior is correct.
When software is hard to test, that’s usually a sign the design is too tightly coupled or full of side effects, or that the architecture is unnecessarily complicated. Not that the testing tools are bad.
You get confidence in things by doing them. If you don't have experience doing something, you aren't going to be confident at it. Try vibe coding a few small projects. See how it works out. Try different ways of structuring your instructions to the 'agents'.
Are there public examples of "good instructions" and an iteration process? I have tried and have not been very successful at getting Claude Code to generate correct code for medium-sized projects or features.
I had Claude write a piano webapp (https://webpiano.jcurcioconsulting.com) as a "let's see how this thing works" project. I was pleasantly surprised by the ease of it.