Some background: I had been running into the same issue over and over again where my Ai coding agents sucked at testing edge cases, performing long horizontal tasks, and testing the functionality of its own code. My agents, especially claude, would frequently hit context anxiety, run into issues where they stated they were "done" when in fact they had only hit 50% completion on a feature implementation, and then they would consistently lie to me and say, "Nuh uh, I did implement and test it".
After doing some digging into other peoples approaches to avoid these problems I realized that an Ai harness was necessary to wrangle the clanker bastard in in order to perform my tasks big or small with increasing efficiency. I implemented a harness solution for my company where I work at and the results were good. Really good.
Never before had I had so many of my PR's merged so quickly without being told "hey go check this out", or "this needs to change". It was incredible. It got to the point where I just gave claude unlimited access to my linear tasks from my project manager and had it run the request through the /forge skillset that is the core of the pipeline. I soon had no need to check in on how my little sweat shop coding agent was performing and finally had time to work on other stuff.
With all the new time I had on my hands I realized that I wanted this not just in my work repo but in my personal ones as well so I created forge-cli. A cli tool that allows anyone anywhere with access to the repo to initialize an Agent harness that matches to an existing repo or helps you plan long horizontal tasks for a new project you are making, and sets up the core skills and agent files that are needed to start any good harness to reel in your defiant robot slave.
Since every project is different the implementation should respect how your codebase and skills grew and what you already have and so the forge pipeline respects your new additions to SKILLS, CLAUDE.md, and more and formats the files it creates to match your repo.
One of the standout additions of this forge-cli is implementing karpathy/autoresearch ideas. Basically a loop in the CLI called "forge refine" that helps you write out what you wanted a task to do, the implementation approach of that task, and then the refinement on if it completed or not. Only completions get merged into principle changes in the code to refine the process. You can apply this idea to skill files, workflows, and more.
This means the more projects you tackle, the more iterations you run, the better your system gets over time. I experienced this first hand when running the forge CLI for the first time. It SUCKED to say the least but with this approach it now runs really cleanly and helped me refine my ideas and they will only be getting better. The main breakthrough is how this tool has allowed me to keep asking the question "what am I missing and what could be better?" without the massive mental research to answer those questions in a tight-ish loop.
Please feel free to check out the repo, try it out for yourself, give me your critiques or praise on if it hurt or helped your process, and collaborate with me to jump in and make it better! This is my first time making something of this nature so if it is poorly made then I ask the great devs out there: I would love your feedback! Please also let me know your implementations on how you solved similar problems!
I think the difference is largely due to the fact that bringing a bit of extra weight isn't as problematic in a submarine as it is in a spacecraft. For example, you always have a water source that can give you oxygen and you don't have to worry about discarding the byproducts.
Determine the cost of owning the ice cream maker per year. For some people, owning something costs nothing and in fact provides value, they find comfort in owning things, used or not. For some people, owning things is a burden, a drain, and owning something unused is painful.
An ice cream maker costs maybe $200? How would you feel if you disposed of the ice cream maker and then a week later realized you wanted it?
If you want to soften the blow, don’t throw things away: give them away to someone who will use them.
I hate owning things, owning an ice cream maker that I never use would weigh on me and I would much rather spend $200 on a new ice cream maker every 5 years (that I give away after a month) than have an unused ice cream maker for 5 years.
This is the second major npm supply chain attack this year and the playbook is identical every time: hijack a maintainer account, publish via CLI to bypass CI/CD, inject a dependency nobody's heard of.
The fix isn't better scanning (though Socket catching it in 6 minutes is impressive). The fix is npm making Trusted Publishers mandatory for packages above a download threshold. If axios can only be published through GitHub Actions OIDC, a stolen password is useless.
We run a fleet of AI agents that depend on npm packages. First thing we did tonight was audit every lockfile. Clean — but only because we aggressively minimise dependencies. The real victims here are the thousands of teams who npm install with ^ ranges and never check what changed.
I've spent 25 years building data infrastructure in financial services — Goldman Sachs, Bridgewater, Deutsche Bank, Freddie Mac. Today I'm open-sourcing Datris, a data platform built around MCP (Model Context Protocol) from day one.
The idea: As far as I know, this is the first data platform where MCP is the primary interface, not an afterthought. We built the MCP server first and made everything accessible through it. The platform has 30+ MCP tools — any agent (Claude, Cursor, your own framework) can create pipelines, ingest data, validate with plain English rules, transform, query databases, search vector stores, and monitor jobs. The API and UI use the same pipeline engine, but MCP is the native interface.
What an agent can do: - Create a complete pipeline from sample data in one call (schema auto-detected) - Upload and process CSV, JSON, XML, Excel, PDFs, Word docs - Validate and transform data using natural language instructions - Query PostgreSQL and MongoDB - Semantic search across 5 vector databases (Qdrant, Weaviate, Milvus, Chroma, pgvector) - Ask questions in natural language — SQL generated and executed automatically - Full RAG pipeline — extract, chunk, embed, and search documents - Profile data quality, diagnose errors, explore metadata
There's also a CLI: bash datris ingest sales.csv --dest postgres datris query "SELECT * FROM public.sales" datris query "top 5 stocks by volume" --table trades datris search "return policy" --store pgvector --collection docs
A prompt tweak, model swap, or context change can silently shift behavior — wrong tone, dropped policy details, different tool routing. Standard unit tests pass. You / users notice later.
Agentura adds behavioral evals to your CI pipeline. On every PR, it runs your agent against expected outputs, compares scores to your main branch baseline, and shows you exactly which cases regressed before you merge. 100% Free, Open Source.
Try it locally:
npx agentura@latest init
npx agentura@latest run --local
The article is reflecting on the observed reality that US Navy operations in this war are taking Iran’s littoral combat power into account by operating its ships further from the Iranian coast…why can’t you imagine that they are operating this way under Trump?
The token efficiency argument only holds for teams paying per API call. Cursor, Copilot, and most tools developers actually use are flat subscription. On a flat monthly fee, there's no economic pressure toward brevity in generated code.
This is the real questionable part of the graphic. It seems that no-data pre 2018 was just considered 100% uptime (which is hardly historically accurate).
The main server runs 3x RTX PRO 6000 (288 GB VRAM combined), power limited to 280W each (can crank it up as temps are fine but about to add some more fans first as the cards are stacked).
The second server is 2x Radeon RX 7900 XTX (48 GB VRAM combined). It's a fairly recent gaming PC that's being repurposed. Idea is to power limit those cards too and run some overnight stuff w small/medium sized models.
Intel just released some 32 GB VRAM cards, but sounds like support across AI tooling is a bit rough atm.
Kubernetes-native orchestration for autonomous AI agent fleets. For founders who want to go off-cloud.
Clawbernetes turns AI agent deployment into a kubectl apply workflow. Declare your agents, policies, and observability stack as Custom Resources, the operator handles the rest.
Cool project, the on-device STT angle is solid. I've been working on something adjacent, LiveSuggest, which focuses more on real-time suggestions during the call rather than transcription, same no-bot approach via system audio capture. Different use case but the "no bot joining" thing resonates, people really don't want that. livesuggest.ai if curious.