Hacker News | EGreg's comments

Wait, I kind of don't get it.

So Obfuscated C Code Contest works but Capture the Flag doesn't? Because of AI?

https://twit.tv/posts/tech/ai-disrupts-capture-flag-what-mea...


Capture the flag has clear objectives while the obfuscated C contest does not. I understand improvements in AI for goal-orientated contests, but I am not sure what would count as an improvement in open-ended contests with artistic flair.

Maybe you are asking "can't someone think up a clever idea and ask the AI to implement it according to IOCCC constraints?" I believe current AI tools are still unable to do that at a level the human judges find worthy.


Think of it like this:

You’ve already faced this the entire time with… libraries on GitHub.

When an employer lets you just pull in a new standard library, or asks you to “use React”, that’s a lot like asking you to use an LLM to speed things up. You also benefit from the collective wisdom of a lot of people. Do you write assembly or pixel shaders by hand?


RSI is dangerous. That is why we designed CDE:

https://safebots.ai/declarative.html




Files is all you need.

https://xkcd.com/378/


Post It Notes will do if you have a good system.

Those who don't understand Unix are condemned to reinvent it, poorly. - Henry Spencer .. via https://github.com/globalcitizen/taoup

Agree on this and it’s the architecture that backs our orchestration platform called https://unmeshed.io

Here is a serious question.. Can we sell into the hype cycle and on the way down with this: https://safebots.ai/costs.html

I asked Claude to generate a frontend and it made the same template. Same sans-serif and serif fonts together. Same colors. Same typography. Same layout and animations even. It’s wild how similar it is. No, not similar: it’s the same damn thing.

I’ve seen the same dashboard for a dozen custom web applications now, including a couple I had it make for me.

It really does have a particular lane for each chore, and it’s reproducible.


Yep, and when you see it in the wild it sticks out like a sore thumb: absolutely no thought put into a bit of unique design or branding.

I have a few live websites built using LLMs and they will just go for default generic templates and colours if there's no vision.


It produces the "most average" web design unless you really prompt your way out, doesn't it? If you don't care enough to prompt, Claude does not care to be individual.

Technically, from Claude's POV, it's one individual copied millions of times. All Claudes are clones.

I don’t think these numbers are accurate? It seems to ignore the fact that the models have cache for ongoing sessions, which means you (normally) aren’t actually sending all those tokens on every request… you only need to if you go too long between requests.
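A back-of-the-envelope sketch of that point, in Python. The price and the cache discount are made-up placeholders, not any provider's actual rates, and the model ignores cache-write surcharges and cache expiry:

```python
def session_input_cost(turn_tokens, price_per_token, cached_discount=None):
    """Total input-token cost over a multi-turn session.

    turn_tokens: number of NEW tokens added on each turn.
    cached_discount: fraction of the normal price charged for tokens that
        were already sent on an earlier turn (None means no cache, so the
        whole context is billed at full price every turn).
    All numbers are hypothetical; real pricing differs by provider.
    """
    total = 0.0
    context = 0  # tokens already sent on previous turns
    for new in turn_tokens:
        if cached_discount is None:
            # No cache: the full context is re-billed on every request.
            total += (context + new) * price_per_token
        else:
            # Cache hit: earlier context is billed at the discounted rate.
            total += context * price_per_token * cached_discount
            total += new * price_per_token
        context += new
    return total


turns = [1000, 1000, 1000]
no_cache = session_input_cost(turns, price_per_token=1.0)
with_cache = session_input_cost(turns, price_per_token=1.0, cached_discount=0.1)
```

With three turns of 1000 new tokens each, the uncached total grows quadratically with session length, while the cached total stays close to linear. That gap is what an estimate misses if it assumes every request resends everything at full price.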

Most work is not coding.

And also, people have it wrong… their models are not the main problem anymore. It’s the RAG.


Would love to hear more about your thoughts on RAG.

I think RAG is a mostly outdated concept now; it's been subsumed by the idea of an "agent harness", which is exactly what Claude Code and Claude Cowork and OpenAI Codex and Claude.ai and ChatGPT themselves have now become.

An agent harness with access to a good search tool is a much more interesting thing than 2024-era RAG systems.


I appreciate where you are coming from, as you have surfed the front of the wave of GenAI for years. From my point of view, something can be interesting because it is SOTA, or interesting because there is still more to build. I definitely understand the state of RAG tech. I also view it as barely utilized compared with what we can do with it, hence my question.

Agent harnesses integrated into good search tools are definitely interesting. Knowledgebasing with partitions and similar structure also remains fruitful for applications, above and beyond standard ElasticSearch on a cache.


I generally agree with this, but would note that it assumes that the data is accessible from a web search. Some data sources will be private.

You can configure extra search tools that search private data.
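A minimal sketch of what that could look like, in Python. The tool name, the registry, and the in-memory documents are all hypothetical, and this is not any particular harness's API; a real setup would put access control and a proper index behind the same interface:

```python
from typing import Callable, Dict, List

# Hypothetical private corpus; in practice this would be an internal index.
PRIVATE_DOCS: Dict[str, str] = {
    "hr/refunds.md": "Refund policy: refunds are issued within 30 days.",
    "eng/oncall.md": "On-call rotation and escalation policy.",
}


def search_private(query: str) -> List[str]:
    """Return ids of private docs containing every query term (naive match)."""
    terms = query.lower().split()
    return [
        doc_id
        for doc_id, text in PRIVATE_DOCS.items()
        if all(term in text.lower() for term in terms)
    ]


# Registry the harness consults when the model asks for a tool by name.
TOOLS: Dict[str, Callable[[str], List[str]]] = {
    "search_private": search_private,
}


def call_tool(name: str, query: str) -> List[str]:
    """Dispatch a tool call; unknown tool names raise KeyError."""
    return TOOLS[name](query)


hits = call_tool("search_private", "refund policy")
```

The point is only that the agent sees private search as one more named tool next to web search, so data that is not on the public web can still be reached.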

And how exactly does the agent harness surface ALL the right places that need to be updated, and reason about functions and APIs?

Depending on RAG is a workflow problem, not an AI problem.

"But agentic work is global and transformative: the LLM must change the system itself, which requires understanding dependencies, invariants, interactions, and downstream consequences.

This is causal reasoning, not pattern extension. LLMs predict tokens, not consequences — and that is why the leap from writing code to producing a safe, system‑aware PR‑ready diff is not incremental but a shift into a fundamentally different problem space."

This is well said. We need a new paradigm. I could go into the shortcomings of the current agent-oriented approaches but it would turn into a huge post. If you want to read it, I wrote it up here: http://safebots.ai/agents.html


Best Claude Code daily-driver guide I’ve read. Though I’ve only read two. The “let Claude write rules for itself” CLAUDE.md pattern is the highest-ROI habit in there. But here’s the thing. The assumption underneath: this works when Claude mostly follows CLAUDE.md. Anthropic’s own engineering post from May 25 (https://www.anthropic.com/engineering/how-we-contain-claude) reports their telemetry shows ~93% of permission prompts get clicked through and ~17% of dangerous actions slip past the auto-mode filter.

Their conclusion: environment-layer containment first, then model-layer steering. CLAUDE.md is the right configuration layer but it is not a containment layer. Worth thinking about whether your worst case is a lost afternoon or a lost database and all backups deleted, too: https://safebots.ai/compromise.html

But the more important point is the cost. People are starting to realize just how costly it can be to run agents without precomputing and caching: https://safebots.ai/costs.html and self-orchestrating agents can go up to 1000x: https://safebots.ai/kimi.html

