Capture the flag has clear objectives while obfuscated C contest does not. I understand improvements in AI for goal-orientated contests, I am not sure what would be considered improvements in open-ended contests with artistic flair.
Maybe you are asking "can't someone think up a clever idea and ask the AI to implement it according to IOCCC constraints?" And I believe current AI tools are still unable do that at a level that the human judges find worthy.
You’ve already faced this the entire time with… libraries on github.
If employers knew how much you can just use a new standard library, or ask you to “use React”, that’s a lot like asking you to use an LLM to speed things up. You also benefit from the collective wisdom of a lot of people. Do you write assembly or pixel shaders by hand?
I asked claude to generate a frontend and it made the same template. Same san serif and serif fonts together. Same colors. Same typography. Same layout and animations even. It’s wild how similar it is. No not similar it’s the same damn thing.
It produces the "most average" web design unless you really prompt your way out, isn't it? If you don't care enough to prompt, Claude does not care to be individual.
I don’t think these numbers are accurate? It seems to ignore the fact that the models have cache for ongoing sessions, which means you (normally) aren’t actually sending all those tokens on every request… you only need to if you go too long between requests.
I think RAG is a mostly outdated concept now, it's been subsumed by the idea of a "agent harness" which is exactly what Claude Code and Claude Cowork and OpenAI Codex and Claude.ai and ChatGPT themselves have now become.
An agent harness with access to a good search tool is a much more interesting thing than 2024-era RAG systems.
I appreciate where you are coming from, as you have surfed the front of the wave of GenAI for years. From my point of view, there is interesting because something is SOTA, and there is interesting because there is still more to build. I definitely understand state of RAG tech. I also view it as barely utilized versus what we can do with it, hence my question.
Agent harnesses integrated into good search tools are definitely interesting. Knowledgebasing with partitions and similar structure also remains fruitful for applications, above and beyond standard ElasticSearch on a cache.
"But agentic work is global and transformative: the LLM must change the system itself, which requires understanding dependencies, invariants, interactions, and downstream consequences.
This is causal reasoning, not pattern extension. LLMs predict tokens, not consequences — and that is why the leap from writing code to producing a safe, system‑aware PR‑ready diff is not incremental but a shift into a fundamentally different problem space."
This is well said. We need a new paradigm. I could go into the shortcomings of the current agent-oriented approaches but it would turn into a huge post. If you want to read it, I wrote it up here: http://safebots.ai/agents.html
Best Claude Code daily-driver guide I’ve read. Though I’ve only read two. The “let Claude write rules for itself” CLAUDE.md pattern is the highest-ROI habit in there. Buth here’s the thing. The assumption underneath: this works when Claude mostly follows CLAUDE.md. Anthropic’s own engineering post from May 25 (https://www.anthropic.com/engineering/how-we-contain-claude) reports their telemetry shows ~93% of permission prompts get clicked through and ~17% of dangerous actions slip past the auto-mode filter.
Their conclusion: environment-layer containment first, then model-layer steering.
CLAUDE.md is the right configuration layer but it is not a containment layer. Worth thinking about whether your worst case is a lost afternoon or a lost database and all backups deleted, too: https://safebots.ai/compromise.html
But the more important point are the costs. People are starting to realize just how costly it can be to run agents without precomputing and caching: https://safebots.ai/costs.html and self-orchestrating agents can go up to 1000x: https://safebots.ai/kimi.html
So Obfuscated C Code Contest works but Capture the Flag doesn't? Because of AI?
https://twit.tv/posts/tech/ai-disrupts-capture-flag-what-mea...
reply