Also, that seems like a silly test. Couldn't you just measure the differential after mixing? TFA mentions it too, but noise is a factor as well. For the water-heater example, the limited number of probes introduces a known amount of potential variance.
Rather than a straight % efficiency, a more useful number to report would include the confidence interval: it's 99% ± 3%. Consumers can't do stats, but at least it's more honest. TFA touches on this but doesn't draw any conclusions other than "we need to look at the source data and do our own analysis".
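To make the "report the interval" idea concrete, here's a rough sketch of how you'd get a mean ± half-width out of a handful of probe readings. The readings are made up, `mean_with_ci` is just a name I picked, and it assumes the readings are independent and roughly normal, which real probe data may well not be. The lookup table holds standard two-sided 95% Student-t critical values.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom (n - 1).
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def mean_with_ci(readings):
    """Return (mean, half_width) of a 95% confidence interval for the mean.

    Assumes independent, roughly normal readings (an assumption of this
    sketch, not something the article establishes).
    """
    n = len(readings)
    if n < 2 or (n - 1) not in T_CRIT_95:
        raise ValueError("this sketch only handles 2 to 11 readings")
    mean = statistics.mean(readings)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(readings) / math.sqrt(n)
    return mean, T_CRIT_95[n - 1] * sem


# Hypothetical efficiency readings in percent, invented for illustration.
probe_efficiencies = [96.0, 98.0, 97.0, 99.0, 95.0]
mean, half = mean_with_ci(probe_efficiencies)
print(f"{mean:.1f}% ± {half:.1f}%")
```

With only five probes the t multiplier is about 2.8 rather than the ~2 you'd get with lots of samples, which is exactly the "limited number of probes" penalty showing up in the reported number.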
The industry has made a bit of progress, surprisingly unprompted by regulations - female and child dummies came into circulation before they were required in tests. But overall, testing is still run against a tiny handful of body types which move 'realistically' in only a few regulation-guided respects.
The main reason the DOT standard is so bad is that it's mired in bureaucracy and managed by a severely underfunded organization.
Though I'm not aware of a law stating that for any given principle there is an existing eponymous law.
... oh, wait: https://en.wikipedia.org/wiki/Stigler%27s_law_of_eponymy