So is it "lower is better" for all scores? Is the raw_score a combination of the two previous scores? By that measure, Dvorak is better on all nontrivial texts (so excluding the quick brown fox).

The scores are a measure of effort. The reach score accumulates the effort of reaching away from the home row. The alternation score counts consecutive keystrokes made with the same hand. The raw score is the sum of both.
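To make those definitions concrete, here is a rough Python sketch of how scores like these could be computed. It is not the actual scoring code: the layout tables, the "one point per row away from the home row" reach cost, the "columns 0-4 are the left hand" rule, and the names `LAYOUTS`, `key_positions`, and `score` are all assumptions made up for illustration.

```python
# Hypothetical layout tables: three rows of ten keys, home row in the middle.
LAYOUTS = {
    "qwerty": ["qwertyuiop", "asdfghjkl;", "zxcvbnm,./"],
    "dvorak": ["',.pyfgcrl", "aoeuidhtns", ";qjkxbmwvz"],
}

def key_positions(rows):
    """Map each character to (row, hand); columns 0-4 are assumed to be the left hand."""
    return {
        ch: (r, "L" if c < 5 else "R")
        for r, row in enumerate(rows)
        for c, ch in enumerate(row)
    }

def score(text, rows):
    """Return reach, alternation, and raw (their sum) for text on a layout."""
    pos = key_positions(rows)
    # Simplification: characters not in the table (spaces, digits) are skipped.
    keys = [pos[ch] for ch in text.lower() if ch in pos]
    # Reach: assumed cost of 1 per row of distance from the home row (row 1).
    reach = sum(abs(row - 1) for row, _ in keys)
    # Alternation: one point for each consecutive pair typed with the same hand.
    alternation = sum(1 for (_, h1), (_, h2) in zip(keys, keys[1:]) if h1 == h2)
    return {"reach": reach, "alternation": alternation, "raw": reach + alternation}

print(score("the", LAYOUTS["qwerty"]))
print(score("the", LAYOUTS["dvorak"]))
```

In this toy model, "the" costs 2 reach and 0 alternation on QWERTY, versus 0 reach and 1 alternation on Dvorak, so lower is better on every column.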

So yes, lower numbers are better. However, I do not have the research to say that a lower raw number is always better. It's entirely possible that alternation matters much, much more than reach (or vice versa), such that a truly meaningful "aggregate score" would weight the two with a multiplier instead of just summing them.
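As a sketch of what such a multiplier would do, the snippet below compares a plain sum against a weighted sum. The score values are invented purely for illustration (they are not measured results), and `weighted_score`, `best_layout`, `w_reach`, and `w_alt` are hypothetical names, since I don't know what the right weights would be.

```python
def weighted_score(reach, alternation, w_reach=1.0, w_alt=1.0):
    """Aggregate score; with both weights at 1.0 this is the plain raw sum."""
    return w_reach * reach + w_alt * alternation

# Invented numbers for illustration only, NOT real measurements.
example_scores = {
    "colemak": {"reach": 10, "alternation": 30},
    "dvorak": {"reach": 20, "alternation": 15},
}

def best_layout(scores, w_reach=1.0, w_alt=1.0):
    """Return the layout name with the lowest weighted score."""
    return min(
        scores,
        key=lambda name: weighted_score(**scores[name], w_reach=w_reach, w_alt=w_alt),
    )

print(best_layout(example_scores))               # plain sum: 40 vs 35
print(best_layout(example_scores, w_reach=3.0))  # reach weighted 3x: 60 vs 75
```

With these made-up inputs, the plain sum favors Dvorak, but weighting reach three times as heavily flips the ranking to Colemak, which is why the raw score alone can't settle the question.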

The best we can say is that, assuming less reach and more alternation are good, Colemak is better on reach, and Dvorak is better on alternation.

