Not ideal, but the reason is that people have gotten used to larger bars meaning better performance on bar charts that evaluate LLMs. Some were even confused by the older version of this very benchmark.
After a bout of confusion, reading further, including those charts, is what made me realize that my initial reading of the first chart was wrong.
IMHO, it's still the wrong choice. If one feels their audience doesn't understand "Lower=Better", then I think the solution is to plot the inverse or the difference (I'm not familiar with this score). Breaking the x-axis convention invites confusion, especially alongside the "Lower=Better" disclaimer (again, IMHO).