And obviated by the judges having no easy way to learn and isolate which "part" of the app they're testing, and how such compartmentalization goes strongly against human psychology.
I don't pretend it'll be "perfect", but it should let decisions fall (noisily) around what we think of as "fair" (since we're all running that assessment through the same hardware) which is really what we're looking for.