Hacker News | multidude's comments

Infrastructure management mostly — cron scheduling, DB migrations, nginx configs, log analysis. Things I know how to do but that are slow to type out. But the learning is there: I've used it to go deep on PostgreSQL query planning and indexing in a way I wouldn't have bothered with before.

The risk, in my opinion, was the opposite of what I'd expected: not learning less, but accumulating a lot of shallow things that give the impression of learning without the substance. Look! I built this powerful thing. 60,000 lines of business logic in a week! On the surface I have something impressive to show. People admire it, and I end up believing that I did it. This is not to say that I did not learn; I did, but my learning was not as impressive as the result would make you believe.

Maybe I should try using it to learn math now.


I haven't tackled payments, but I've run an agent with SSH access to a production server and real API keys for a few weeks. The trust question you're circling ("would you trust an AI with $500") is the interesting part. My answer so far: yes for reversible actions, not yet for irreversible ones. Deleting a file, sending an email, making a payment — these need a different approval model than reading a database or running a query. The hard problem isn't capability, it's building infrastructure that distinguishes "can do" from "should do without asking."
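The reversible/irreversible split above can be sketched as a tiny approval gate. This is a minimal illustration, not any particular framework's API; the action names and the `IRREVERSIBLE` set are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable

# Actions the agent may never take without a human saying yes first.
# (Illustrative list; a real system would classify these per deployment.)
IRREVERSIBLE = {"delete_file", "send_email", "make_payment"}

@dataclass
class Action:
    name: str
    args: dict

def execute(action: Action, approve: Callable[[Action], bool]) -> str:
    """Run reversible actions immediately; gate irreversible ones on approval."""
    if action.name in IRREVERSIBLE and not approve(action):
        return "blocked: awaiting human approval"
    return f"executed {action.name}"
```

The point of the shape: "can do" is whatever the tool layer exposes, while "should do without asking" lives entirely in the `approve` callback, so the policy can change without touching capabilities.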

And I want to build an agent capable of doing automated investing. So, to the question "has anyone...?", I believe yes; my role model is Jim Simons of Renaissance. He did.


I think you have a point. The credential part feels like a solved problem — auth-proxying has been around for a while. What seems genuinely new to me is the approval layer, the idea that a human should confirm before a sensitive action actually executes. I'm not sure that's covered by a tokenization or SSO proxy, but I could be wrong. Is that the real differentiator here, or am I missing something?

A problem I have is that the agent's mental model of the system I'm building diverges from reality over time. After discussing it many times and asking the agent to remember, it gets frustrating. In the README you say the agent's memory persists across runs; would that solve this problem?

Also, I had to do several refactorings of my agent's constructs and found that one of them was reinventing existing pieces, producing a plethora of duplicated functions: e.g. DB connection pools (I had at least four of them running simultaneously).
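The duplicated-pool problem usually comes down to each generated module constructing its own pool instead of going through one accessor. A minimal sketch of the fix, with `sqlite3` standing in for a real pooled driver (the names here are illustrative):

```python
import sqlite3
import threading

_lock = threading.Lock()
_pool = None

def get_pool():
    """Return the single shared connection 'pool', created lazily on first use.

    Every module calls this instead of building its own pool, so the
    process ends up with exactly one, no matter how many callers exist.
    """
    global _pool
    with _lock:
        if _pool is None:
            _pool = sqlite3.connect(":memory:", check_same_thread=False)
        return _pool
```

Pointing the agent at one blessed accessor like this, and telling it never to construct connections directly, tends to stop the reinvention.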

Would AXE require shared state between chained agents? Could it do it if required?


YES! This happens to me all the time, in things big and small. At work, at home, with the kids, my wife, and their birthday presents. I once talked to a somewhat famous writer who told me this very thing. He said his worst critic was his inner demon, biting him at every thought and every phrase, questioning his wording, waiting for the greatest possible idea, discarding everything that was not breathtaking enough.

Why do we have to be great all the time? Who is telling us to be the best? And I know that in writing this I am pruning myself again, trying to find the best words here.

Imagine that: I want enough karma points to be able to post my greatest idea here. Which, ironically enough, is the best idea I've had in a loooong time, and the moment I want to share it I must wait to be found good enough and worthy of being heard.

I guess the only thing we can do is disconnect our feeling of self-worth from outside signals and be happy with the little things that made us smile back when we neither knew nor cared about other people's opinions.


The "deny list is a fool's errand" framing is exactly right. I've been running an AI agent with broad filesystem and SSH access and the failure mode (so far) isn't the agent doing something explicitly forbidden — it's the agent doing something technically allowed but contextually wrong. git checkout on a file you meant to keep is the classic example.

The action taxonomy approach is interesting. Curious whether context policies work well in practice — what does "depends on the target" look like when the target is ambiguous? E.g. a temp file in /opt/myapp/ that happens to be load-bearing.
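One way to picture "depends on the target" is a policy that returns three verdicts instead of two, so ambiguity escalates rather than silently resolving. A sketch under made-up path prefixes (the `/opt/myapp` case from above gets denied even though the file looks like a temp file, because the prefix, not the filename, drives the verdict):

```python
from pathlib import PurePosixPath

# Hypothetical prefix lists; a real policy would be per-deployment config.
SAFE_PREFIXES = [PurePosixPath("/tmp")]
PROTECTED_PREFIXES = [PurePosixPath("/etc"), PurePosixPath("/opt/myapp")]

def verdict(action: str, target: str) -> str:
    """Return allow/deny/ask for an action based on its target path."""
    p = PurePosixPath(target)
    if any(p.is_relative_to(pre) for pre in PROTECTED_PREFIXES):
        return "deny"    # load-bearing location, regardless of filename
    if any(p.is_relative_to(pre) for pre in SAFE_PREFIXES):
        return "allow"
    return "ask"         # ambiguous target: escalate to a human
```

The `ask` default is the interesting part: it turns the unanswerable "is this temp file load-bearing?" question into a human interrupt instead of a guess.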


The stale state problem is real and underappreciated. I've been running browser automation through OpenClaw and the failure modes you describe — modal appears after screenshot, dropdown covers the target element — are exactly what causes silent failures that are hard to debug. The agent "succeeds" from its perspective because it acted on the last known state.
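The "acted on the last known state" failure can be guarded with a compare-before-act check: fingerprint the region you plan to act on, then re-fingerprint immediately before acting and abort on mismatch. A minimal sketch, with `fetch_state` standing in for a real DOM or screenshot capture:

```python
import hashlib

def fingerprint(state: bytes) -> str:
    """Stable digest of the observed page region."""
    return hashlib.sha256(state).hexdigest()

def act_if_unchanged(snapshot: str, fetch_state, do_action) -> bool:
    """Re-verify the observed state immediately before acting.

    Returns False (and does nothing) if the page changed between
    planning and acting — e.g. a modal appeared after the screenshot.
    """
    if fingerprint(fetch_state()) != snapshot:
        return False
    do_action()
    return True
```

It doesn't fix the race entirely (the page can still change between the re-check and the click), but it shrinks the window from "whole planning step" to milliseconds, which kills most of the silent-failure cases.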

The freeze-then-capture approach is interesting. Curious how it handles pages with aggressive anti-bot detection that fingerprints headless Chromium forks — that's the other failure mode I keep hitting.


Right now it's evading every anti-bot detector I've tested it on. I believe that's because it runs in headful mode and I've removed all detectable CDP signatures. Input events are also simulated at the system level (typing at 200 WPM), so it's very hard for a page's JavaScript to detect that it's not a human-operated Chrome. A lot of headless detection relies on WebGPU capabilities being disabled, since a modern computer is very unlikely not to support them. You could also wire up one of the Heretic models as a dedicated captcha solver; I recommend Qwen 3.5 27b Heretic! https://huggingface.co/coder3101/Qwen3.5-27B-heretic
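The timing side of system-level input simulation can be sketched in a few lines: 200 WPM at the conventional 5 characters per word is about 60 ms per keystroke, and jittering each delay avoids the metronomic cadence that detectors flag. `send_key` here is a stand-in for whatever OS-level injection call is actually used:

```python
import random
import time

def type_text(text: str, send_key, wpm: int = 200) -> None:
    """Emit keystrokes at a human-ish cadence with per-key jitter."""
    base = 60.0 / (wpm * 5)  # seconds per character at 5 chars/word
    for ch in text:
        send_key(ch)
        # +/-40% jitter: uniform inter-key intervals are a bot tell.
        time.sleep(random.uniform(0.6 * base, 1.4 * base))
```

A more faithful simulation would also vary delays by key distance and add occasional backspace corrections, but even uniform jitter defeats the simplest interval-variance checks.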

This is directly useful for financial data monitoring. I've been thinking about watching specific elements on energy report pages (EIA weekly inventory releases, OPEC statements) rather than scraping the full page. The element picker + RSS output is exactly the right interface for that — pipe the change event straight into an NLP pipeline without the noise of a full page diff.

The RSS question: yes, RSS is useful precisely because it's composable. It works with anything. Direct alerts are convenient but RSS is infrastructure.
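Part of why RSS composes so well is that consuming it needs nothing beyond the standard library, so any watcher that emits a feed can drive any downstream pipeline. An illustrative sketch (the feed content is made up):

```python
import xml.etree.ElementTree as ET

def parse_feed(rss_xml: str):
    """Extract (title, pubDate) pairs from an RSS 2.0 feed string."""
    root = ET.fromstring(rss_xml)
    return [
        (item.findtext("title"), item.findtext("pubDate"))
        for item in root.iter("item")
    ]
```

Each change event becomes one `<item>`, and the consumer never needs to know whether the producer was an element picker, a full-page differ, or something else entirely.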


The model choice matters a lot for cost. I've been running a production NLP pipeline on OpenClaw using Claude Haiku exclusively — it's roughly 25x cheaper than Opus for inference tasks where you don't need the full reasoning power. For most "read this text, classify it" tasks Haiku is more than sufficient.

The hard part for a new user who knows about VMs isn't the VM setup — it's knowing which model to reach for. Opus for complex reasoning, Sonnet for balanced tasks, Haiku for high-volume classification or anything where you're calling the API repeatedly in a loop. Getting that wrong is where bills explode.

A sensible default for a hosted product like Klaus would be Sonnet with Haiku available for bulk operations. Opus should require an explicit opt-in with a cost warning.
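That default policy is small enough to write down. A sketch, with made-up routing keys (the tier names are Anthropic's public model families; everything else here is illustrative):

```python
def pick_model(task: str, opus_opt_in: bool = False) -> str:
    """Route to a model tier: Sonnet by default, Haiku for bulk loops,
    Opus only on explicit opt-in (this is where the cost warning goes)."""
    if task == "bulk":          # high-volume classification in a loop
        return "haiku"
    if task == "complex" and opus_opt_in:
        return "opus"
    return "sonnet"             # balanced default for everything else
```

The key design choice is that `opus` is unreachable without the opt-in flag, so a bill can never explode by accident through the default path.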

