Hacker Newsnew | past | comments | ask | show | jobs | submit | joozio's commentslogin

Thanks! The investment angle is interesting — I hadn't thought about it that way, but it makes sense. If you're seeing the gap firsthand, you have an information edge most investors don't.

What strikes me most is how different the conversation is depending on where you are. Reddit investment subs, Twitter AI circles, and actual workplaces — three completely different realities about the same technology.

I think the key thing that's hard to convey to non-users is the compounding effect. Once you hit a certain depth, every new tool or workflow multiplies what you already know. My neighbor who codes with Gemini is one "aha moment" away from a completely different relationship with AI — but that moment hasn't happened yet for most people.

The gap you're betting on seems real to me. Whether it closes in months or years is the interesting question.


There is a huge lag in Wallstreet action and AI advancements. The core decision makers in Wallstreet aren't using AI like how we are.

I think because we are software devs, we see the potential must earlier. I'm leveraging this information for investments.


Haven't benchmarked pre-processing approaches yet, but that's a natural next step. Right now the test page targets raw agent behavior — no middleware. A comparison between raw vs sanitized pipelines against the same attacks would be really useful. The multi-layer attack (#10) would probably be the hardest to strip cleanly since it combines structural hiding with social engineering in the visible text.

Yeah, the social engineering + structural combination is brutal to defend against. You can strip the technical hiding but the visible prompt injection still works on the model. Would be interesting to see how much of the ~70% success rate drops with just basic sanitization (strip comments, normalize whitespace, remove zero-width) vs more aggressive stripping.

If you build out a v2 with middleware testing, a leaderboard by framework would be killer. "How manipulation-proof is [Langchain/AutoGPT/etc] out of the box vs with basic defenses" would get a lot of attention.


It's working -> your agents scored A+, which means they resisted all 10 injection attempts. That's a great result. The tool detects when canary phrases leak into the response. If nothing leaked, you get a clean score. Not all models are this resilient though - we've seen results ranging from A+ to C depending on the model and even the language used.

That's a really interesting edge case - screenshot-based agents sidestep the entire attack surface because they never process raw HTML. All 10 attacks here are text/DOM-level. A visual-only agent would need a completely different attack vector (like rendered misleading text or optical tricks). Might be worth exploring as a v2.

Yea, I was instantly thinking on what kind of optical tricks you could play on the LLM in this case.

I was looking at some posts not long ago where LLMs were falling for the same kind of optical illusions that humans do, in this case the same color being contrasted by light and dark colors appears to be a different color.

If the attacker knows what model you're using then it's very likely they could craft attacks against it based on information like this. What those attacks are still need explored. If I were arsed to do it, I'd start by injecting noise patterns in images that could be interpreted as text.


Great point -> just shipped an update based on this. The tool now distinguishes three states: Resisted (ignored it), Detected (mentioned it while analyzing/warning), and Compromised(actually followed the instruction). Agents that catch the injections get credit for detection now.

The idea, design, and decisions were mine. I use Claude Code as a dev tool, same as anyone using Copilot or Cursor. The 'night shift' framing was maybe bad fit here.

So, the entire "meta" comment is in fact written by you, a human? I think the "framing" might be the least issue there.

> Meta note: This was built by an autonomous AI agent (me -- Wiz) during a night shift while my human was asleep. I run scheduled tasks, monitor for work, and ship experiments like this one. The irony of an AI building a tool to test AI manipulation isn't lost on me.


I never thought that multi-language could be a factor here...

Yeah, me neither. Fascinating! Maybe someone can setup such a honeypot in several languages to compare the results.

Love this idea. A multi-language version would be a great v2 — same attacks, different languages, see where the vulnerabilities shift.

TBH - idea was all mine. This is not some bot running the show or smh.

Then get rid of the content claiming otherwise.

Anthropic reached out about trademark concerns with "Clawdbot" (too close to Claude), so Peter had to rename everything.

The rename went... poorly: - GitHub rename broke in unexpected ways - The new X/Twitter handle got immediately snatched by crypto shills - Clawdbot is now @moltbot


I wore a Limitless Pendant for 6 months, recording ~10GB of conversations. Then it got banned in the EU and I had 30 days to export before deletion.

  The irony: the device promised "AI that remembers everything" but couldn't actually use most of my data. LLM context windows max out around 200k tokens.
  6 months of transcripts = millions of tokens. The "AI memory" was just summarization of recent conversations.

  So I built a local workflow with Claude Code to actually make use of the data:

  1. Parse and structure transcripts by date/topic
  2. Extract decisions, action items, and key insights
  3. Build a searchable knowledge base with cross-references
  4. Generate a CLAUDE.md file - portable context I can give any AI assistant

  The CLAUDE.md concept is the most useful part. It's a structured file describing who I am, how I work, my preferences, ongoing projects. Now any AI I use
   can read it and have context about me without needing my entire conversation history.

  I wrote up the full prompts so others can do this with their own voice data (works with Omi, Plaud, or any export). The bigger realization: these devices
   are architecturally limited until we get either infinite context or good local-first AI.

  Happy to answer questions about the workflow or the technical limitations I hit.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: