A while ago I wrote a small program that measures the effort needed to type a passage with different keyboard layouts.
It accounts for reach and alternation. I just added Colemak in response to this discussion, and based on the texts I include in the distribution (some of them public domain, from archive.org), it looks like Colemak is better on reach, Dvorak is better on alternation, and they both spank QWERTY for substantive texts.
It's just a quick experiment, and I'd love to hear input on the methodology.
The scores are a measure of effort. For reach, it accumulates the effort of reaching away from the home row. For alternation, it counts each keystroke typed with the same hand as the one before it. Raw is the sum of both.
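To make the scoring concrete, here is a rough Python sketch of the idea. It is not the actual program: the function names, the per-row costs (home row 0, other rows 1), the left/right split at the fifth column, and the choice to reset the hand tracking on spaces and other unlisted characters are all assumptions made for illustration.

```python
# Three letter rows per layout, left to right (top, home, bottom).
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl;", "zxcvbnm,./"]
DVORAK_ROWS = ["',.pyfgcrl", "aoeuidhtns", ";qjkxbmwvz"]

# Assumed effort per row: the home row is free, the other rows cost 1.
ROW_EFFORT = [1, 0, 1]


def build_tables(rows):
    """Map each key to its reach cost and to the hand that types it."""
    effort, hand = {}, {}
    for row, cost in zip(rows, ROW_EFFORT):
        for col, key in enumerate(row):
            effort[key] = cost
            # Assumption: the five leftmost columns belong to the left hand.
            hand[key] = "L" if col < 5 else "R"
    return effort, hand


def score(text, rows):
    """Return reach, alternation, and raw (their sum) for a passage."""
    effort, hand = build_tables(rows)
    reach = 0
    alternation = 0
    prev_hand = None
    for ch in text.lower():
        if ch not in effort:
            # Assumption: spaces and unknown characters break the sequence.
            prev_hand = None
            continue
        reach += effort[ch]
        if hand[ch] == prev_hand:
            alternation += 1  # same hand as the previous keystroke
        prev_hand = hand[ch]
    return {"reach": reach, "alternation": alternation, "raw": reach + alternation}


if __name__ == "__main__":
    for name, rows in [("QWERTY", QWERTY_ROWS), ("Dvorak", DVORAK_ROWS)]:
        print(name, score("the", rows))
```

Under these assumptions, "the" costs two reaches and no same-hand repeats on QWERTY, and no reaches plus one same-hand repeat on Dvorak. A single word says nothing about the layouts overall; it only shows how the two counters move.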
So yes, lower numbers are better. However, I do not have the research to be able to say that a lower raw number is always better. It's entirely possible that alternation matters much, much more than reach (or vice versa), such that a truly meaningful "aggregate score" would involve a multiplier instead of just summing the two.
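As a small illustration of why the multiplier matters, the sketch below computes a weighted aggregate in Python. The `aggregate` function, its weights, and the two sets of scores are all hypothetical; the numbers are made up purely to show the effect and are not measurements of any real layout.

```python
def aggregate(reach, alternation, reach_weight=1.0, alternation_weight=1.0):
    """Weighted sum of the two components; equal weights give the raw score."""
    return reach_weight * reach + alternation_weight * alternation


# Made-up scores for two imaginary layouts (not real results).
layout_a = {"reach": 10, "alternation": 30}
layout_b = {"reach": 25, "alternation": 12}

if __name__ == "__main__":
    for w in (1.0, 3.0):
        a = aggregate(layout_a["reach"], layout_a["alternation"], reach_weight=w)
        b = aggregate(layout_b["reach"], layout_b["alternation"], reach_weight=w)
        better = "A" if a < b else "B"
        print(f"reach_weight={w}: A={a}, B={b}, lower is {better}")
```

With equal weights the imaginary layout B comes out lower, but tripling the weight on reach flips the ranking to A. Without research on the right weights, the plain sum cannot settle which layout is better overall.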
The best we can say is that, assuming less reach and more alternation are good, Colemak is better on reach, and Dvorak is better on alternation.