Hacker News | writingdna's comments

The Yegge claim about manual code review being outdated conflates two things: reviewing for correctness vs. reviewing for design coherence. Agents are getting decent at the first (does this function do what the spec says?) but remain weak at the second (does this abstraction fit the existing architecture? will this pattern scale when the next feature lands?).

What actually works for me is treating agents less like autonomous developers and more like very fast typists who need clear architectural guardrails. The heavy lifting is writing the context documents -- architecture decision records, module boundary descriptions, naming conventions -- that constrain the generation. Ironically, the better your documentation, the less you need an orchestrator, because a single agent with good context produces coherent code on the first pass.

The git worktree pattern multiple people mention is underrated. Having each agent work on an isolated branch with automated test gates before merge catches the drift problem at the integration point rather than trying to prevent it during generation.
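A minimal sketch of that worktree-per-agent pattern, using a throwaway scratch repo; the branch name is invented and a `git grep` stands in for a real test suite:

```shell
set -e
# Scratch repo so the sketch is self-contained.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.email agent@example.com
git config user.name agent
echo "tests: ok" > app.txt
git add app.txt && git commit -qm "init"

# Each agent gets an isolated worktree on its own branch:
wt="$repo-feature-x"
git worktree add -q -b agent/feature-x "$wt"

# Agent edits land only in its worktree, not on main:
echo "feature x" >> "$wt/app.txt"
git -C "$wt" commit -aqm "agent: feature x"

# Test gate at the integration point: merge only if the worktree's
# checks pass (grep is a stand-in for running the test suite).
if git -C "$wt" grep -q "tests: ok" -- app.txt; then
    git -C "$repo" merge -q agent/feature-x
fi
```

The gate lives at the merge, so a drifting agent fails loudly at integration time instead of silently polluting main.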


The article frames this as "semantic ablation" but the underlying mechanism is more specific: it is distributional averaging. RLHF and DPO reward policies optimize for the modal response given a prompt distribution. That is not a bug in the training process; it is the objective function working as designed. The model learns to produce the response that the median annotator would rate highest, and that response is, almost by definition, the least distinctive one.
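A toy illustration of the argmax-over-expected-reward point (the candidate labels and annotator scores are invented): a polarizing response can have passionate fans and still lose to the safe one on average.

```python
# Annotator ratings (1-5) for two candidate responses to one prompt.
candidates = {
    "distinctive": [5, 1, 2, 5, 1],  # polarizing: some raters love it
    "median":      [4, 4, 3, 4, 4],  # safe: nobody objects
}

# A reward-maximizing policy picks the highest expected rating.
expected = {resp: sum(s) / len(s) for resp, s in candidates.items()}
best = max(expected, key=expected.get)  # -> "median" (3.8 vs 2.8)
```

Averaging over raters is exactly the sanding operation: every point of expected reward the distinctive response loses to one annotator's distaste counts against it, no matter how much another annotator liked it.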

What is underappreciated is how much stylistic signal lives in what information retrieval people call "burstiness" -- the tendency for distinctive words to cluster rather than distribute evenly. Hemingway's short declarative stacking, DFW's recursive parentheticals, legal writing's formulaic precision -- these are all bursty patterns that a model trained to maximize expected reward will sand down. You can partially recover it with few-shot prompting, but the model is fighting its own reward gradient the entire time.
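Burstiness is easy to make concrete. One common proxy is the index of dispersion (variance-to-mean ratio) of per-window counts: roughly 1 for Poisson-like spread, above 1 when occurrences cluster. A toy sketch (the window size and corpora are invented):

```python
from statistics import mean, pvariance

def burstiness(tokens, word, window=20):
    """Index of dispersion of per-window counts for one word.
    ~1 means Poisson-like spread; >1 means the word clusters (bursty)."""
    counts = [tokens[i:i + window].count(word)
              for i in range(0, len(tokens), window)]
    m = mean(counts)
    return pvariance(counts) / m if m else 0.0

# Same overall frequency of "x", very different clustering:
even = (["x"] + ["w"] * 9) * 10    # one "x" per 10 tokens
clustered = ["x"] * 10 + ["w"] * 90  # all "x" up front
print(burstiness(even, "x"), burstiness(clustered, "x"))  # 0.0 vs 8.0
```

Note that both corpora have identical unigram frequencies, which is exactly why frequency-matched generation can still flatten style: the reward gradient pushes toward the evenly spread version.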

The practical question is whether you can encode a style prior that survives the decoding process. The research on authorship attribution (stylometry) suggests the feature set is well-understood -- function word frequencies, sentence length distributions, type-token ratios, syntactic complexity metrics. But nobody has built a production system that uses those features as a constraint during generation rather than just detection.
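That stylometric feature set is simple enough to sketch. A minimal toy extractor, with a ten-word function-word list standing in for the hundreds a real attribution system would use:

```python
import re
from collections import Counter

# Tiny stand-in list; real stylometry uses hundreds of function words.
FUNCTION_WORDS = {"the", "of", "and", "to", "a", "in", "that", "it", "is", "was"}

def style_vector(text):
    """Toy stylometric profile: function-word rates, mean sentence
    length, and type-token ratio (classic attribution features)."""
    sents = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens)
    counts = Counter(tokens)
    return {
        "func_rates": {w: counts[w] / n for w in FUNCTION_WORDS},
        "mean_sent_len": n / len(sents),
        "type_token_ratio": len(counts) / n,
    }

v = style_vector("The cat sat. The cat sat on the mat, and it purred.")
# v["mean_sent_len"] == 6.0; v["func_rates"]["the"] == 0.25
```

Using these as detection features is the easy half; the open problem the comment points at is wiring a vector like this into the decoder as a soft constraint, e.g. reweighting token logits toward a target function-word distribution.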

