If comprehension and spelling were decoupled but still equally weighted, then the aggregate score would have been significantly higher. Let's split the difference and say that 20-30% on the set of [all answers simultaneously spelled correctly AND comprehended] becomes 25% on the "spells things correctly" axis. If we weight that equally against the "comprehension" axis, then you get:
let R = 25 # reading
let C = ? # comprehension
let M = 40 # reported mean
(R + C) / 2 == M
(25 + C) / 2 == 40
(25 + C) == 80
C == 55 # implied comprehension grade from a 40% average
If comprehension were any higher than 55% - which this anecdote certainly suggests it was - then the 40% aggregate score awarded is an ineffective assessment of the holistic learning picture.
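The derivation above can be sanity-checked with a few lines of Python (variable names mirror the pseudocode; the 50/50 weighting is the assumption stated earlier):

```python
R = 25  # "spells things correctly" axis
M = 40  # reported mean grade

# (R + C) / 2 == M  rearranges to  C == 2 * M - R
C = 2 * M - R
print(C)  # 55: the implied comprehension grade
```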
This is the grade school equivalent of tossing out resumes over a single copy editing mistake.
Code is not the only sort of thing with an optimal chunk size. Languages and APIs (such as sets of library or system calls) run up against the same sorts of human cognitive constraints that produce Hatton's U-curve.
Accordingly, Unix programmers have learned to think very hard about two other properties when designing APIs, command sets, protocols, and other ways to make computers do tricks: compactness and orthogonality.
I am willing to bet that Ed handles all of those activities as needed in modes 1 and 3. Breaking down "prep" and "review" into detailed bits doesn't really challenge his assertion that the core activity of actual programming regularly requires autonomous isolation.
If you're interested in well-thought-out criticisms of Taylorism (as well as Schumpeterism and other status-quo models), check out Organization Theory by Kevin Carson.
One interesting point in the book is that Taylorism suffers from a garbage-in, garbage-out problem. Large firms are islands of calculational chaos because they suffer from the economic calculation problem pointed out by Mises and Hayek. Like centrally planned economies, large firms cannot intelligently allocate resources or make other managerial decisions because there are heavy distortions in incentives and price signals.
He further points out that these economic distortions occur not because of socialism per se (large capitalist firms suffer the same problem) but because of long hierarchies. He also explores modular and co-operative organizational models in the book.
One helpful thing is to separate the power aspect from the measurement aspect. Taylorism is horrific because it's used by people with power, a managerial class, to micromanage and exploit workers. But measurement on its own isn't problematic: think of the quantified self movement, or athletes who use detailed self-study to improve their own performance.
I've been on teams that did a lot of self-measurement, and it has generally been fine. The problems I recall are when people with managerialist inclinations seize upon something measured and use it to try to sound smart or exert control. E.g., the time a CEO, on one of his occasional visits, noticed our project's lines-of-code (LoC) metric. We all knew the dangers of that number and treated it very lightly. But he kept trying to do MBA math with it (e.g., $/LoC), and I ended up having to tell him that if he didn't knock it off, we'd stop displaying the metric.
While I agree with you, I think it's also important to see that this sort of analysis can be used two ways:
1) Taken at face value, it might lead managers to try to directly manipulate the positive metrics. For example, since face-to-face communication is shown to be superior to conference calls, ban remote work and mandate that all meetings be in person. That probably wouldn't make people too happy.
2) On the other hand, understanding what is really driving the metrics can improve workplaces for everyone. For example, the article points out that instituting team-wide coffee breaks at a call center increased not only productivity but also employee satisfaction.
As with all research like this, some companies will make good use of it, and others will not.