> whether the final result is actually better, or whether it is just a more polished hallucination
Agents sampled from the same base model agreeing with each other isn't validation, it's correlation. Cheaper orchestration mostly amplifies whatever bias the model already has. Neat hack though.
MFN clauses are common in retail; what's different about Amazon's is the enforcement. A manufacturer's MAP policy threatens "we won't ship you more," which you can recover from (unlike a listing demotion).
>Generation became cheap, validation didn't
this is basically the whole post imo. It also maps to why productivity hasn't really moved despite 93% adoption: the oversight bandwidth eats the generation gains.
KL(P||Q) penalizes Q heavily when it assigns low probability to things P considers likely, but barely cares when Q wastes probability on rare events. That's why KL regularization in RLHF pushes models toward typical, average-sounding outputs.
I think you probably meant this, but as used in RL it's usually KL(π || π_ref), which is large when the in-training policy π produces output that's unlikely under the reference. But as you noted, this also means there's no penalty when π simply _fails to produce_ outputs that π_ref would, which leads to a form of mode collapse.
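The asymmetry both comments describe is easy to check numerically. A minimal sketch (toy two-outcome distributions, not an RLHF setup): P covers two modes, Q collapses onto one. Forward KL(P||Q) punishes the missing mode hard; reverse KL(Q||P), the direction used against a reference policy, barely notices it.

```python
import math

def kl(p, q):
    """KL(P || Q) = sum_x P(x) * log(P(x) / Q(x)), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# P spreads mass over two modes; Q has collapsed onto the first.
p = [0.5, 0.5]
q = [0.99, 0.01]

forward = kl(p, q)  # large: Q assigns ~0 to a mode that P considers likely
reverse = kl(q, p)  # small: Q dropping one of P's modes costs almost nothing

print(f"KL(P||Q) = {forward:.2f}")  # ≈ 1.61
print(f"KL(Q||P) = {reverse:.2f}")  # ≈ 0.64
```

Reading q as the trained policy π and p as π_ref, the cheap `reverse` direction is exactly why a KL(π || π_ref) penalty tolerates the policy abandoning modes of the reference.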
This collapse in variety matches some studies I've seen suggesting that "sloppification" is not present in the base model and is only introduced during the RL phase.
Auto-switching across model providers basically concedes the model layer is commodity, which I think is right (1)
tbd whether the skill registry develops network effects or just stays a flat directory. Portable skills as APIs tracks with the broader pattern of agent stacks decomposing into specialized swappable layers, where the defensible asset is whatever process knowledge orgs encode, not the deployment infra.
I agree on the commodity point, that's why I went multi-model from start.
The registry question is the one I'm thinking about the most. Right now it's flat. I plan to integrate usage data (success rates, cost, trust scores). So the registry tells you which skills actually work well, and that's valuable.
It's already been beaten into acceptance that I have to use the Ticketmaster app (shockingly awful) or Dice app (not quite as bad but still sucks) to get into a lot of music venues in Boston.
But at one club they wanted me to install yet another app just to check my coat. I elected to hide it under some furniture instead lol
Looking at the commit dates (which seem to be derived from the original publication dates), the history seems quite sparse/incomplete(?). I mean, there have only been 26 commits since 2000.
It's related to commits actually having a parent-child structure (forming a graph), with the timestamps (commit/author) being just metadata. So commits 1->2->3->4 could be modified to have timestamps 1->3->2->4. I know GitHub prefers sorting by author date over commit date, but I don't know how topology is handled.
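A quick sketch of that 1->2->3->4 example, with author dates swapped so date order disagrees with the graph. The commit ids, dates, and both helper functions here are hypothetical, just to show that the parent links alone pin down one order while the timestamp metadata can say something else:

```python
from datetime import datetime

# Hypothetical linear history 1 -> 2 -> 3 -> 4 (parent -> child),
# with author dates edited so commit 3 "predates" commit 2.
commits = {
    1: {"parent": None, "author_date": datetime(2024, 1, 1)},
    2: {"parent": 1,    "author_date": datetime(2024, 1, 3)},
    3: {"parent": 2,    "author_date": datetime(2024, 1, 2)},
    4: {"parent": 3,    "author_date": datetime(2024, 1, 4)},
}

def topo_order(commits, tip=4):
    """Walk parent links back from the tip: the graph fixes this order."""
    order, node = [], tip
    while node is not None:
        order.append(node)
        node = commits[node]["parent"]
    return list(reversed(order))

def date_order(commits):
    """Naive sort by author-date metadata, ignoring the graph."""
    return sorted(commits, key=lambda c: commits[c]["author_date"])

print(topo_order(commits))  # [1, 2, 3, 4]
print(date_order(commits))  # [1, 3, 2, 4]
```

(For real repos, `git log` exposes the same distinction via `--topo-order` vs `--date-order`/`--author-date-order`, the latter of which still never shows a child before its parent.)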
> It's related to commits actually having a parent-child structure (forming a graph) and timestamps (commit/author) being metadata.
Yeah, I think everyone is aware. It's just that the last couple dozen commits looked, to me, like they had been created in chronological order, so that topological order == chronological order.
> I know GitHub prefers sorting with author over commit date, but don't know how topology is handled.
Amendment 34 (by Markéta Gregorová, Greens/EFA) to the Sippel Report A10-0040/2026 significantly restricts the ePrivacy derogation (the chat-control extension until 2027).
It replaces Art. 3 para. 1 lit. a of Regulation (EU) 2021/1232: Processing (scanning) may only be
- strictly necessary for technologies that detect/remove known CSAM (hash matching, no unknown content),
- proportionate,
- limited to necessary technologies and content data.