dimitri-vs's comments | Hacker News

The turn-key option is ostris ai-toolkit which has good tutorials on YT and can be run completely locally or via RunPod. Claude Code can set everything up for you (speaking from experience) and can even SSH into RunPod.

I wouldn't say the US gov has a good track record regarding its "job" in healthcare.


Hell of a lot better than without the government though. Read more history.


In theory, yes. In practice, the shit data you're working with (descriptions that are one or two words, or the same word with a ref id) really benefits from a) an agent that understands who you are and what you're likely spending money on, and b) access to tool calls to dig deeper into what `01-03 PAYPAL TX REFL6RHB6O` actually is by cross-referencing an export of PayPal transactions.

I think the smarter play is having an agent take the first crack at it, and build up a high confidence regex rule set. And then from there handle things that don't match and do periodic spot checks to maintain the rule set.
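To make the idea concrete, here's a rough sketch of the rule-set-plus-fallback flow in Python - the names (RULES, categorize, ask_agent) and the sample patterns are illustrative, not anyone's actual setup:

    import re

    # Rules the agent builds up over time: high-confidence pattern -> category.
    RULES = [
        (re.compile(r"PAYPAL\s+TX", re.I), "paypal"),
        (re.compile(r"UBER\s*EATS|DOORDASH", re.I), "food_delivery"),
    ]

    def categorize(description: str):
        """Return a category if any high-confidence rule matches, else None."""
        for pattern, category in RULES:
            if pattern.search(description):
                return category
        return None

    def categorize_with_fallback(description: str, ask_agent):
        """Cheap regex rules first; only escalate the misses to the agent.

        ask_agent is a stand-in for the LLM/tool-calling step (e.g. cross-
        referencing a PayPal export); its answers get reviewed during spot
        checks and, if they hold up, promoted into RULES.
        """
        return categorize(description) or ask_agent(description)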


> "...then from there handle things that don't match..."

Curious, what are the inputs for an agent when handling your dataset? What can you feed the agent so it can later "learn" from your _manual labeling_?


Reality is we went from LLMs as chatbots editing a couple of files per request with decent results, to running multiple coding agents in parallel implementing major features based on a spec document and some clarifying questions - in a year.

Even IF LLMs don't get any better, there is a mountain of lemons left to squeeze in their current state.


I'm in Claude Code 30+ hr/wk and always have at least three tabs of CC agents open in my terminal.

Agree with the other comments: pretty much running vanilla everything and only the Playwright MCP (IMO way better than the native chrome integration) and ccstatusline (for fun). Subagents can be as simple as saying "do X task(s) with subagent(s)". Skills are just self @-ing markdown files.

Two of the most important things are 1) maintaining a short (<250 lines) CLAUDE.md and 2) having a /scratch directory where the agent can write one-off scripts to do whatever it needs to.
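For flavor, a CLAUDE.md section along these lines is all it takes (the exact wording below is illustrative, not a prescribed format):

    ## Scratch space
    - Put one-off debugging scripts, experiments, and data dumps in /scratch
    - /scratch is git-ignored; never import from it in production code
    - Check /scratch for earlier scripts you can reuse before writing new ones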


I also specifically instruct Claude how to use a globally git ignored scratch folder “tmp” in each repo. Curious what your approach is


You store your project context in an ignored tmp folder? Share more plz - what does it look like? What do you store?


Not memory, I just instruct it to freely experiment with temporary scripts and artifacts in a specific folder.

This helps it organize the temporary things it does, like debugging scripts, and lets it (or me) reference/build on them later without filling the context window. Nothing fancy, just a bit of organization that collects in the repo (git-ignored).
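For the "globally git ignored" part, one way to set it up (standard git, nothing Claude-specific) is a global excludes file:

    git config --global core.excludesFile ~/.gitignore_global
    echo 'tmp/' >> ~/.gitignore_global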


How can you - or any human - review that much code?


When I'm coding I have about six instances of VSCode on the go at once, each with its own worktree, and the terminal is a dangerous cc in Docker. Most of the time they are sitting waiting for me. Generally a few are doing spec work/reporting for me to understand something - sometimes with issue context; these are used to plan or redirect my attention if I might've missed something. A few will be just hacking on issues with little to no oversight - I just want them to iterate tests+code+screenshots to come up with a way to do a thing / fix a thing; I'll likely not use the code they generate directly. Then one or two are actually doing work that I'll end up PR'ing, or if I'm reviewing they'll be helping me do the review - either mechanically (hey claude, give me a script to launch n instances with a configuration that would show X ... ok, launch them ... ok, change to this ... grab X from the db ... etc.) or insight-based (hey claude, check issue X against code Y - does the code reflect their comments; look up the docs for A and compare to the usage in B, give me references).

I've TL'd and PM'd as well as IC'd. Now my IC work feels a lot more like a cross between being a TL and being a senior with a handful of exuberant and reasonably competent juniors. Lots of reviewing, but still having to get into the weeds quickly and then get out of their way.


>I've TL'd and PM'd as well as IC'd. Now my IC work feels a lot more like a cross between being a TL

Interesting... I've been in management for a few years now and recently doing some AI coding work. I've found my skills as a manager/TL are far more adaptable to getting the best out of AI agents than my skills as a coder.


Same, I was a very average dev coming out of CS, and a PM before this. I find that my product training has been more useful, especially with prototypes, but I do leave nearly all of the hard system, infra, and backend work to my much much more competent engineering teammates.


TBH I'm not building "production grade" apps depended on by hundreds of thousands of users - our clients want to get to a live MVP as fast as possible and love the ability to iterate quickly.

That said, it's well known that Anthropic uses CC for production. You just slow things down a bit, spend more time on the spec/planning stage, and manually approve each change. IMO the main hurdle to broader Claude Code adoption isn't a code quality one, it's mostly getting over the "that's not how I would have written it" mindset.


From personal experience, most of my time in Claude Code is spent experimenting, iterating, and refining approaches. The amount of code it produces relative to the time spent working on it tends to be pretty logarithmic in practice.


They don’t, they just push garbage, someone else quickly looks over it (or asks another llm to review for him), and merges.


This is me. Was a huge Cursor fan, tried Claude Code, didn't get it, tried it again a year ago and it finally clicked. A week later I cancelled my Cursor sub, and now I'm using VS Code.

I don't even like using the CLI, in fact I hate it, but I don't use the CLI - Claude does it for me. Using it for everything: my Obsidian vault, working on Home Assistant, editing GSheets, and so much more.


Chandra


Yes, all the major CLIs (Claude Code, Codex, etc.) and many agentic applications use a large-model main agent with task delegation to small-model sub-agents. For example, in CC using Opus 4.5, it will delegate an Explore task to a Haiku/Sonnet subagent, or to multiple subagents.
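The pattern itself is simple enough to sketch - this is not Claude Code's actual internals, just the general shape, with llm(model, prompt) standing in for whatever completion call you'd use:

    def explore(question: str, llm) -> str:
        # The cheap/fast model burns the tokens reading files and summarizing.
        return llm("small-model", f"Explore the repo and answer: {question}")

    def run_task(task: str, llm) -> str:
        # The expensive model plans, delegates the context-heavy digging,
        # and only ever sees the compressed findings.
        plan = llm("big-model", f"List what you need to know to do: {task}")
        findings = [explore(q, llm) for q in plan.splitlines() if q.strip()]
        return llm("big-model", f"Task: {task}\nFindings:\n" + "\n".join(findings))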


The agent interfaces are for human interaction. Some tasks can be fully unattended though. For those, I find smaller models more capable due to their speed.

Think beyond interfaces. I'm talking about rapid-firing hundreds of small agents and having zero human interaction with them. The feedback is deterministic (non agentic) and automated too.


As someone that's currently building accounting (and many many other) tools for myself: yes, it can.

But with a big fat asterisk that you: 1. need to make it aware of all relevant business logic, 2. give it all the necessary tools to iterate and debug, and 3. have significant experience with the strengths and weaknesses of coding agents.

To be clear, I'm talking about CLI agents like Claude Code, which IMO is apples and oranges vs ChatGPT (and even Cursor).


FWIW most LLMs are pretty terrible at estimating complexity. If you've used Claude Code for any length of time you might be familiar with its plan "timelines", which always span many days but for medium-size projects get implemented in about an hour.

I've had CC build semi-complex Tauri, PyQt6, Rust, and SvelteKit apps for me without me ever having touched those languages. Is the code quality good? Probably not. But all those apps were local-only tools or had fewer than 10 users, so it doesn't matter.


That's fair, I've had similar experiences working in other stacks with it. And with some niche stacks, it seems to struggle more. Definitely agree that the narrower the context/problem statement, the higher the chance of success.

For this project, it described its reasoning well, and knowing my own skillset and having only surface-level info on how one would start this, it made many good points that showed the project wasn't realistic for me.


Disagree - the timelines are completely reasonable for an actual software project, and that's what the training data is based on, not projects written with LLMs.


Yes, this is my experience as well.

