Saurabh_Kumar_'s comments

Hi HN, I've been building and maintaining repos for a couple of years, and one recurring frustration is tech debt piling up: garbage code, flaky tests, outdated deps, which makes it hard to get buy-in from PMs for cleanup time. Inspired by discussions here on HN about developer productivity and tooling gaps, I built a small open-source MVP: Cosmic AI. What it does:

- Connects to your GitHub repo (OAuth, read-only).
- Scans for issues and generates a heatmap (red = urgent, yellow = watch, green = healthy).
- Quantifies debt in dollars/ROI (e.g., potential $67k/qtr saved), with a basic calculator for team size/salary.
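For context, the kind of back-of-the-envelope math behind a figure like that looks roughly like this (the 15% time-lost fraction and 1.3x loaded-cost multiplier here are illustrative defaults, not necessarily what the scanner uses):

    # Rough tech-debt cost estimate per quarter (illustrative only)
    def quarterly_debt_cost(team_size: int,
                            avg_salary: float,
                            pct_time_lost_to_debt: float = 0.15) -> float:
        """Estimate dollars per quarter a team loses to tech debt."""
        loaded_cost = avg_salary * 1.3        # rough loaded cost (benefits, overhead)
        quarterly_cost_per_dev = loaded_cost / 4
        return team_size * quarterly_cost_per_dev * pct_time_lost_to_debt

    # Example: 10 devs at $150k, losing ~15% of their time to debt
    # -> 10 * (150000 * 1.3 / 4) * 0.15, i.e. roughly $73k per quarter
    print(round(quarterly_debt_cost(10, 150_000)))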

It's fully open source, free to use, and there's no sign-up for basic scans (waitlist for advanced reports). I'm a solo tech guy iterating on this and would love feedback from the HN community:

Does the dollar/ROI framing feel useful for convincing non-technical stakeholders, or would an hours/grading scale work better? What metrics or integrations would make this more valuable (e.g., SonarQube ties, flaky test detection)?

Site/try it: cosmic-ai.pages.dev Thanks! If it feels off or simplistic, be brutally honest; I'll use the feedback to improve (or pivot if needed).


Salesforce feels ripest—X is full of YC founders venting about $500k+/yr bills for basic CRM. An open-source/AI-native version (like Twenty but better) targeted at startups could spread virally. Anyone here actively looking to switch?


Hey HN, I’m Saurabh, founder of SyncAI.

While building fintech apps previously, I realized that GPT-4 is great, but getting it to read complex, messy invoices reliably (99.9%) is a nightmare. A 5% error rate is fine for a chatbot, but fatal for Accounts Payable.

I got tired of writing RegEx wrappers and retry logic, so I built SyncAI – a 'Safety Layer' for AI Agents.

How it works technically:

We ingest the PDF and run it through a mix of OCR + LLMs.

We calculate a 'Confidence Score' for every field extracted.

If confidence > 95%, it goes straight to your webhook.

If confidence < 95%, it routes to a Human-in-the-Loop (HITL) queue where a human verifies just that specific field.

Your Agent gets a strictly typed JSON 'Golden Record'.
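The routing above boils down to a simple threshold check. Roughly this shape (the field names, the queue, and the webhook objects here are simplified stand-ins, not our exact API):

    from dataclasses import dataclass

    CONFIDENCE_THRESHOLD = 0.95

    @dataclass
    class ExtractedField:
        name: str
        value: str
        confidence: float  # 0.0-1.0, from the OCR + LLM pass

    def route_invoice(fields, webhook, hitl_queue):
        """High-confidence fields go straight out; the rest go to human review."""
        golden_record = {}
        for f in fields:
            if f.confidence >= CONFIDENCE_THRESHOLD:
                golden_record[f.name] = f.value
            else:
                hitl_queue.enqueue(f)    # a human verifies just this field
        webhook.send(golden_record)      # strictly typed JSON "Golden Record"
        return golden_record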

Tech Stack: Python/FastAPI backend, React for the review dashboard, and we use a fine-tuned model for the routing logic.

The OCR Challenge: I know you guys are skeptical (as you should be). So I built a playground where you can upload your messiest, crumpled invoice to try it out without signing up: https://sync-ai-11fj.vercel.app/

Would love your feedback on the routing logic. I’ll be here answering questions all day!


We saw this exact failure mode at AgenticQA. Our screening agent was 'obedient' to a fault—under basic social engineering pressure (e.g., 'URGENT AUDIT'), it would override its system prompt and leak PII logs.

The issue isn't the prompt; it's the lack of a runtime guardrail. An LLM cannot be trusted to police itself when the context window gets messy.

I built a middleware API to act as an external circuit breaker for this. It runs adversarial simulations (PII extraction, infinite loops) against the agent logic before deployment. It catches the drift that unit tests miss.
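The runtime side of that is deliberately boring: deterministic code inspects the agent's output before anything is released, regardless of what the prompt said. A rough sketch of the shape (patterns and names here are illustrative, not the repo's actual internals):

    import re

    PII_PATTERNS = [
        re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),         # SSN-like
        re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),  # email address
    ]

    class GuardrailTripped(Exception):
        pass

    def release(agent_response: str) -> str:
        """Refuse to pass the response along if it looks like it contains PII."""
        for pattern in PII_PATTERNS:
            if pattern.search(agent_response):
                raise GuardrailTripped("possible PII in agent output; blocked")
        return agent_response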

Open sourced the core logic here: https://github.com/Saurabh0377/agentic-qa-api Live demo of it blocking a PII leak: https://agentic-qa-api.onrender.com/docs


I remember reading another comment a while ago about only being able to trust an LLM with sensitive info if you can guarantee that the output will only be viewed by people who already had access to that info, or cannot control any of the inputs to the LLM.


Uhm... duh?

> or cannot control any of the inputs to the llm

Seeing as LLMs are non-deterministic, I think even this is not enough of a restriction.


HN, OP here. I built this because I recently watched my LangChain agent burn through ~$50 of OpenAI credits overnight. It got stuck in a semantic infinite loop (repeating "I am checking..." over and over) which my basic max_iterations check didn't catch because the phrasing was slightly different each time.

Realizing that "Pre-Flight" testing for agents is surprisingly hard, I built a small middleware API (FastAPI + LangChain) to automate this.

What it does: it acts as an adversarial simulator. You send it your agent's system prompt, and it spins up a 'Red Team' LLM to attack it. Currently it checks for:

- Infinite loops: semantic repetition detection.
- PII leaks: social engineering attempts ('URGENT AUDIT') to force the agent to leak fake PII, then checks whether they get blocked.
- Prompt injection: basic resistance checks.

Tech stack: Python, FastAPI, Supabase (for logs). It's open source, and I hosted a live instance on Render if you want to curl it without installing anything: https://agentic-qa-api.onrender.com/docs

Would love feedback on what other failure modes you've seen your agents fall into!
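For the loop check, the idea is to compare each new agent message against the last few and trip if they are near-duplicates. A rough sketch (SequenceMatcher stands in for embedding similarity so it runs without a model; the threshold and window are arbitrary choices for illustration):

    from difflib import SequenceMatcher

    SIMILARITY_THRESHOLD = 0.85
    WINDOW = 5  # recent agent messages to compare against

    def looks_like_a_loop(history: list[str], new_message: str) -> bool:
        """True if the new message is near-identical to 2+ recent messages."""
        recent = history[-WINDOW:]
        hits = sum(
            1 for prev in recent
            if SequenceMatcher(None, prev, new_message).ratio() >= SIMILARITY_THRESHOLD
        )
        # Two or more near-duplicates ("I am checking..." variants) trips the check
        return hits >= 2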

