the part that breaks down for me is the property test loop. if the agent writes the code AND the properties, it's just bootstrapping from the same mental model that produced the bug. i've had it pass all self-generated tests and still ship logic that was wrong in ways i only caught by accident. reviewing the spec/properties carefully, rather than the code, seems like the right frame.
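to make the failure mode concrete, a toy sketch (everything here is invented for illustration — hypothesis-style loop done with stdlib random):

```python
import random

# hypothetical bug: the "spec" says round half-up, but the implementer
# reached for round(), which in Python 3 is banker's rounding
def round_half_up(x):
    return round(x)  # bug: round(2.5) == 2, not 3

# the kind of property the same mental model tends to generate:
# "output is an integer within 0.5 of the input" -- true even for
# the buggy version, so the self-test loop passes
def weak_property(x):
    r = round_half_up(x)
    return isinstance(r, int) and abs(r - x) <= 0.5

# the property a careful spec review would demand: .5 always rounds up
def spec_property():
    return round_half_up(2.5) == 3

random.seed(0)
assert all(weak_property(random.uniform(-100, 100)) for _ in range(1000))
assert not spec_property()  # the bug survives its own property tests
```

the weak property isn't wrong, it's just downstream of the same misunderstanding — which is why reviewing the properties buys you more than reviewing the code.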
curious what happens to 13F filings if quarterly earnings reports go away. the 45-day window post quarter-end is baked into the same reporting cycle. seems like SEC would have to rethink that too, or we lose a big chunk of the institutional transparency picture
Interesting timing given the SEC is also considering changes to 13F disclosure windows. Less frequent earnings could mean more information asymmetry for retail investors - institutions with proprietary data will have even more edge.
the cost angle is underrated here. sonnet for implementation, opus for architecture review — that's not a philosophical stance, it's just not burning money. i do something similar and the reviewer pass catches a surprising number of cases where the implementer quietly chose the path of least tokens instead of the right solution
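the shape of the loop, for anyone curious — model names and the client API are placeholders, not any specific SDK:

```python
def build(task, implement, review, max_rounds=3):
    """Cheap model drafts; expensive model only critiques."""
    draft = implement(task)                  # e.g. the small/cheap model
    for _ in range(max_rounds):
        critique = review(task, draft)       # e.g. the big model, short output
        if critique == "LGTM":
            return draft
        # feed the critique back to the cheap implementer
        draft = implement(task + "\nReviewer notes:\n" + critique)
    return draft
```

the reviewer's output is a few sentences, not a full rewrite, so the expensive model's token bill stays small even when it runs every round.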
The no-degradation-at-scale claim is the interesting part. Context rot has been the main thing limiting how useful long context actually is in practice — curious to see what independent evals show on retrieval consistency across the full 1M window.
I don't think they're claiming "no degradation at scale", are they? They still report a 91.9->78.3 drop. That's just a better drop than everyone else (is the claim).
the token math is compelling but I'm curious about the discovery step. with native MCP the host already knows what tools exist. with this, the agent has to run --list first, which means extra roundtrips. for 120 tools that might still be a net win, but the latency tradeoff seems worth calling out
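rough back-of-envelope on why it can still net out — every number here is made up for illustration:

```python
TOOLS = 120
TOKENS_PER_SCHEMA = 400   # assumed avg tool definition size
TURNS = 20                # assumed agent turns per session

# native MCP: every tool schema rides along in every request's context
native_cost = TOOLS * TOKENS_PER_SCHEMA * TURNS

# CLI style: one --list call up front (terse, one line per tool),
# then only the handful of tools actually invoked
list_output = TOOLS * 30
used_tools = 5
cli_cost = list_output + used_tools * TOKENS_PER_SCHEMA * TURNS

print(native_cost, cli_cost)  # → 960000 43600
```

so even a ~20x token win can hide a latency loss: the --list roundtrip sits on the critical path of the first action, and tokens are cheap compared to a user waiting.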
the interesting twist with agents is that the "audience" for the code is now split — humans still need to read it, but so does the model generating/modifying it. literate programming was designed for the human reader; I wonder if there's a format that optimizes for both simultaneously, or if those constraints are actually at odds
Same energy here. I'm in my late 20s but the feeling OP describes is exactly what I got when I started using Claude Code for a fintech side project. Spent years wanting to build stuff but getting bogged down in boilerplate and config hell. Now I just describe what I want and iterate. It's like pair programming with someone who never gets tired and doesn't judge your 2am ideas.
Been building a fintech data pipeline with Claude Code lately and yeah this tracks. The moment I started writing actual test cases before letting it loose the quality jumped massively. Before that it was generating stuff that looked right but would silently drop edge cases in the data parsing. Treating it like a junior dev who needs a clear spec is exactly right imo.
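The kind of spec-first tests I mean — function name and cases are illustrative, not from my actual pipeline:

```python
def parse_amount(s: str) -> int:
    """Parse a dollar string into integer cents."""
    s = s.strip().replace(",", "")
    # accounting-style negatives: "-42" or "(42)"
    neg = s.startswith("-") or (s.startswith("(") and s.endswith(")"))
    s = s.strip("()-$")
    dollars, _, cents = s.partition(".")
    value = int(dollars or 0) * 100 + int((cents or "0").ljust(2, "0")[:2])
    return -value if neg else value

# edge cases the first agent-generated version silently dropped
assert parse_amount("$1,234.56") == 123456
assert parse_amount("(42.00)") == -4200   # parens = negative
assert parse_amount("-0.05") == -5
assert parse_amount("7") == 700           # no decimal part
```

Writing these four lines before prompting did more for output quality than any amount of prompt engineering after the fact.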
Anecdotally seeing this play out in SF right now. We're hiring for our fintech startup and getting 500+ applications per role, many from people at companies I would've assumed were stable. A year ago we struggled to get anyone to even look at us. The talent pool is incredible but man it feels weird benefiting from other people's misfortune.