The article reminds me of a story from a friend that was an engineer at a major ...

ryanianian · on June 30, 2020

Isn't this a symptom of the exact testing procedure being a white box that rarely changes? Many students would score 100% on the ACT if the entire question and answer set were widely published.

Also that seems like a silly test. Couldn't you just measure the differential after mixing? TFA also mentions it, but noise is also a factor. For the water-heater example, the limited number of probes introduces a known amount of potential variance.

Rather than a straight % efficiency, a more useful number to report would include the confidence interval: it's 99% ± 3%. Consumers can't do stats but at least it's more honest. TFA touches on this but doesn't draw any conclusions other than "we need to look at the source data and do our own analysis".
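
As a rough illustration of reporting a figure with its interval, here is a minimal sketch. The probe readings are invented, the function name is hypothetical, and the normal approximation is an assumption that is crude for so few probes (a t-distribution would give a wider interval).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_with_interval(readings, confidence=0.95):
    """Return (mean, half_width) using a normal approximation."""
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings")
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(readings) / sqrt(n)
    return mean(readings), half_width


# Hypothetical probe temperatures in degrees C (not real test data).
probes = [58.0, 60.0, 62.0, 60.0]
centre, half_width = mean_with_interval(probes)
print(f"{centre:.1f} ± {half_width:.1f}")
```

With only a handful of probes the half-width stays large, which is the point about the limited number of probes introducing variance.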

dharmab · on June 30, 2020

This phenomenon also appears in the testing of crash helmets. The DOT (US government) helmet standard is easy to game since it's very strict on how and where the helmets are tested. The Snell Foundation standard is a bit better since the humans testing the helmets are allowed to look for weak points to target for anvil drops. The new FIM standard adds more variability to the tests to better simulate the varying angles of crashes.

https://youtu.be/uLj9WfoWPSQ?t=413

Bartweiss · on June 30, 2020

Crash test dummies have basically this problem also. They're designed for realism in certain very narrow ways, and then the very small number of approved dummies are used for testing car safety.

The industry has made a bit of progress, surprisingly unprompted by regulations - female and child dummies came into circulation before they were required in tests. But overall, testing is still run against a tiny handful of body types which move 'realistically' in only a few regulation-guided respects.

craftinator · on July 1, 2020

I think some of this falls into the simulation paradox: the more accurate the simulation, the closer the simulation is to the thing being modelled. But it's a quadratic relationship in most cases, so at some point meaningful increases in simulation accuracy cease to be economically viable.

dharmab · on July 1, 2020

Yeah, but in the words of RyanF9, "The US government can afford a BB gun", so there's no reason that DOT can't test helmet visors.

The main reason the DOT standard is so bad is because it's mired in bureaucracy and managed by a severely underfunded organization.

chickenpotpie · on June 30, 2020

Also https://en.wikipedia.org/wiki/Goodhart%27s_law

bananamerica · on June 30, 2020

How did I know this linked to some kind of general principle with a cool short name even before reading or opening the link?

dredmorbius · on June 30, 2020

Well, there are several such:

https://en.wikipedia.org/wiki/List_of_eponymous_laws

Though I'm not aware of a law stating that for any given principle there is an existing eponymous law.

esja · on June 30, 2020

Haven’t you heard of DredMorbius’s Law of Eponymity?

dredmorbius · on June 30, 2020

But bananamerica really discovered it.

... oh, wait: https://en.wikipedia.org/wiki/Stigler%27s_law_of_eponymy

AnthonyMouse · on June 30, 2020

The patent system in a nutshell, ladies and gentlemen.

bananamerica · on July 1, 2020

Something tells me there’s an XKCD for that, or at least there should be.

colechristensen · on June 30, 2020

It means the test procedure has vulnerabilities which need to be patched.

cybrjoe · on June 30, 2020

Serious question, how is this different from what VW did? There was a lot of talk about complicit engineers vs. shady management in that discussion. Did this ever make it to production?

henryfjordan · on June 30, 2020

VW actively detected if the car was on a treadmill testing setup and electronically changed the engine parameters to make it run cleaner.

The water heater was designed with the test in mind but functions the same regardless.

VW was clearly cheating whereas the water heater is taking advantage of known deficiencies in the test (is that cheating?).

alistairSH · on June 30, 2020

I'm not sure that's an accurate description of VW's cheat.

VW knew the parameters of the test (stationary car, no steering input, prescribed throttle inputs, etc). And they configured the car to pass the test.

Water-heater engineer knew the parameters of the test (number and location of probes). And he configured the water-heater to pass the test.

Same-same. In both cases, an engineering team willfully committed fraud to improve their sales figures.

The only differences are a bit pedantic. VW's "configuration" was more elaborate. But, both groups gamed the test.

amscanne · on June 30, 2020

You might misunderstand what VW did.

Apparently it is common for cars to have a "test mode" bit, because they run on a dyno (only one set of wheels spin), and the car may disable certain traction control systems, etc.

VW changed the way other systems (the engine itself) function in this mode. So even if everything is exactly the same on the road (e.g. 30mph in a straight line for hours), the car will perform differently and have different emissions. You don't drive around on a dyno.

Optimizing for the test may go up to the line, but VW crossed it.

nwallin · on June 30, 2020

The differences are not pedantic at all. The differences are extremely significant.

The VW engineers designed the car to do different things while being tested vs used by consumers. While being tested, the emissions system was on. While being used by consumers, the emission system was entirely disabled.

The water heater engineers designed the water heater to do the same thing while being tested vs used by consumers.

If the water heater engineers designed the water heater to never turn on and keep the water at room temperature while being tested, ("works good boss, water heater efficiency is infinity percent") then yes, it would be reasonable to claim they did essentially the same thing.

esrauch · on July 1, 2020

> If the water heater engineers designed the water heater to never turn on and keep the water at room temperature while being tested

I think the difference between what they did and this is only a difference in degree and not kind though, right?

titzer · on July 1, 2020

It is a difference in kind: VW dynamically detected and adapted behavior to the test. It would never operate in that way under normal conditions. The water heater example was completely static: it always behaved the same way, under test or not.
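
The static versus dynamic distinction can be sketched in code. Everything here is hypothetical: the class, the function names, and the bench-detection rule are illustrative assumptions, not VW's actual logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conditions:
    """Hypothetical inputs a controller could observe."""
    steering_input: bool
    all_wheels_turning: bool


def static_design(conditions: Conditions) -> str:
    # Tuned with the test in mind, but ignores its surroundings entirely.
    return "tuned layout"


def detecting_design(conditions: Conditions) -> str:
    # Guesses that it is on a test bench and switches behaviour.
    on_bench = not conditions.steering_input and not conditions.all_wheels_turning
    return "clean mode" if on_bench else "road mode"


def behaves_the_same(device, scenarios) -> bool:
    """True if the device gives one behaviour across every scenario."""
    return len({device(s) for s in scenarios}) == 1


bench = Conditions(steering_input=False, all_wheels_turning=False)
road = Conditions(steering_input=True, all_wheels_turning=True)

print(behaves_the_same(static_design, [bench, road]))
print(behaves_the_same(detecting_design, [bench, road]))
```

The static design may still be tuned to flatter the test, but whatever is measured on the bench is the same thing that runs at home. The detecting design makes the bench result say nothing about road behaviour.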

esrauch · on July 1, 2020

People obviously disagree since I was downvoted but I don't really see the black-and-white difference between the two.

In both cases there was an intentional design change to deliberately mislead a measurement, and the final product does not match the intended thing. Both adversarially directly cause the consumer's purchase to not be the intended item; no one gets a car that has emissions measured and no one gets a heater that has the efficiency measured.

Nasrudith · on June 30, 2020

One added difference is with the water heater: the intent may be deceptive, but they provided what was asked for and what they claimed. It is almost malicious compliance. Dickish gaming and probably worth suing over, but not sure if it is criminal fraud per se.

It is like the joke about Soviet factory metrics for some item like nails. They set the quota on numbers and got useless ones better described as needles. They wised up a bit and set the quota by weight next month and got one useless massive nail instead. It isn't even too far from the truth given actual management involved things like a train line circularly shipping coal between depots instead of retrieving from source or distributing to end users to boost their "metric tons transported kilometers" metric.

VW meanwhile had a covert illegal configuration while claiming mileage and lack of maintenance urea. The claims are outright false - that it provides all three benefits instead of two of three.

viraptor · on June 30, 2020

That kind of mentality was the joke in many eastern eu movies in the 80s. Pretty much the standard comedy in Polish movies from that time.

kelnos · on June 30, 2020

It's not the same at all, and I don't think the differences are pedantic.

The water heater runs the same regardless of whether it's being tested or is running normally in someone's home. Really, this test "cheat" exposed that the test itself was not measuring what it thought it was measuring; and hell, a manufacturer could accidentally cheat with their design.

The car would run differently if it detected it was in a test situation. In the real world, with a real driver, it would run in a way that would give different test results (if you were in a position to run the test while it's being driven).

ivanbakel · on June 30, 2020

What was more heinous about the VW case IIRC is that the cleanliness gains the cars made on the treadmill setup were enough to push them into compliance with regulations, which the engines normally didn't meet.

"Cheating" a better efficiency rating is significantly less reprehensible than cheating a legal obligation which was legislated for public health and environmental reasons.

ebg13 · on June 30, 2020

But the efficiency ratings also exist for public health and environmental reasons. Why does it matter to you that one is a mandate? They're both intentionally deceiving regulators and consumers (vs accidentally acing the test without cheating). Both are fraud.

Bartweiss · on June 30, 2020

I do think that manipulating a purely instructive measure is less extreme than manipulating a compliance test; consumers can seek alternate tests and reviews, but the state emissions test has special status even if a dozen other tests give a different result. That said, I believe Energy Star ratings affect tax rebates and electric bills, and they're required to be printed on products - so that's not really an arbitrary test.

There are other differences here too, I think. The water heater trick is passive manipulation that stays in place at all times, which limits how far from "real" performance it can get. And per the story, it seems more like "teaching to the test" than "cheating". That is, Volkswagen consciously moved away from the mandate outside of testing. The water heater was (potentially) as energy-efficient as they could design, with the test score manipulated on top of that.

None of that makes it harmless - if "as good as you can make" doesn't hit standards without manipulating them, that's still a problem. But I do find it less galling than "intentionally worsens emissions outside the test bench".

Gibbon1 · on June 30, 2020

The flip side of the water heater test is, you could game the test the other way too. Making your water heater look worse than it is. Would you do that? No.

The difference between the water heater and VW is the water heater manufacturer is providing a representative sample. And VW was not. It'd also be dubious to say that the water heater company is acting in bad faith. Where VW's bad faith rose to the level of criminal. On the other hand Volvo appears to be acting in strictly good faith.

Bad faith for a crash test would be crafting a silver plate model for testing. Reminds me that's what my uncle said the power supply manufacturer he worked for did.

kelnos · on June 30, 2020

The difference is that the water heater test itself was flawed in that it depended on arbitrary design decisions that have nothing to do with efficiency. Two completely innocent manufacturers could build water heaters with the exact same real-world efficiency, but score fairly differently on the efficiency ratings just due to how they're designed.

While I agree that this particular water heater manufacturer was doing something shady in order to get the best score, at least they weren't selling a product that did something differently while under test conditions vs. in real-world usage. They merely realized that the test itself had wide error bars, and designed their heater to "err" in the positive side of those.

VW, in contrast, sold a product that lied to the testers about its emissions in order to pass certifications, while in real-world driving would behave in a way that would not pass muster.

And to me I think that's the key: VW's cars intentionally behaved differently depending on if they were being tested or if they were being driven in normal real-world usage. This water heater behaved the same regardless of whether it was being tested or was heating water in someone's home.

In a way I think of this in academic terms. The water heater manufacturer studied the SAT to learn what kind of questions were going to be asked. VW stole the answer key to the test and memorized it.
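
The claim that two heaters with the same real-world efficiency can score differently is easy to sketch numerically. All numbers below are invented, the probe positions are hypothetical, and the model assumes equal-volume layers so that the mean layer temperature stands in for stored thermal energy.

```python
from statistics import mean

# Hypothetical layer temperatures in degrees C, listed bottom to top.
heater_a = [40, 40, 40, 40, 40, 60, 60, 60, 60, 60]
heater_b = [40, 40, 40, 45, 50, 50, 55, 60, 60, 60]

# Assumed fixed probe positions (layer indices) used by the test.
PROBE_LAYERS = [2, 5, 8]


def true_mean(layers):
    """Average over every layer: what the tank actually holds."""
    return mean(layers)


def probe_estimate(layers, probe_layers=PROBE_LAYERS):
    """Average over the probed layers only: what the test reports."""
    return mean(layers[i] for i in probe_layers)


for name, layers in (("A", heater_a), ("B", heater_b)):
    print(name, true_mean(layers), round(probe_estimate(layers), 1))
```

Both tanks hold the same average temperature, yet heater A reads hotter at the probes simply because its hot layer happens to sit where two of the three probes are.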

ReactiveJelly · on June 30, 2020

One is polluting at the tailpipe, the other is only polluting at a power plant.

im3w1l · on June 30, 2020

Many important differences.

1. Actively detecting test and behaving differently. It's like stealing a test vs teaching to the test.

2. Lower stakes. Health issues are much more serious than inefficiency.

3. It affects the buyer. It's more acceptable for the buyer to be cheated than everyone around them.

4. People could have created these layers by accident. Favouring those who got lucky is unfair.

Honestly I think basically all my gadgets exaggerate how energy efficient they are, by tuning parameters for tests that don't correspond to the real world. My dishwasher has an energy efficient mode, the manual literally says it's just for compliance and recommends other modes. It's just a fact of life.

Bartweiss · on June 30, 2020

This is omnipresent even where regulators aren't involved: every graphics card benchmark out there is 'manipulated' relative to real world performance. At this point it's so universal that I don't think anyone is even fighting it - as long as everyone games benchmarks roughly the same amount, the relative scores stay usable.

Your point about fairness and passive design is the one that makes me view these cases differently also. In the anecdote, the product being tested was the same one being sold, and there's no sign the heater was worsened to improve test performance. The designers just picked the best-scoring option among some reasonable configurations. (Frankly, once they noticed that issue, what were they supposed to do? Pick the worst-scoring, or pick the spec out of a hat?)

In the VW story, the test-bench vehicle was fundamentally different from the market vehicle, and the road version was designed to behave worse on the metrics to get other gains. I happen to know someone who bought a diesel Jetta specifically because it was more eco-friendly than other options, and I think he'd draw a clear line between tuning for test metrics and VW consciously lying to their buyers.

orclev · on June 30, 2020

It's interesting you mention graphics cards because that very behavior has led to the gaming community favoring benchmarks derived from a handful of current gen games' min/max/avg FPS over so-called synthetic benchmarks. It only took a handful of instances of companies baking in "benchmark" modes that get triggered when certain benchmarks are detected for people to start discounting those benchmarks in favor of more organic measurements.

Marsymars · on June 30, 2020

> Honestly I think basically all my gadgets exaggerate how energy efficient they are, by tuning parameters for tests that don't correspond to the real world.

Having measured all my gadgets with a Kill a Watt meter, that's not my experience. It seems that many gadget-makers realize that people don't really care about power draw, so they just slap the maximum draw onto the specs.

kube-system · on June 30, 2020

The big difference with VW is that they put in a mechanism that detects the test and completely changed the behavior of the car for the test.

ebg13 · on June 30, 2020

So they _intentionally_ set themselves up to lie to regulatory agencies and consumers about real world efficiency. That honestly sounds basically the same to me. In both cases the tests are poor approximations, and in both cases someone could accidentally optimize the test, and in both cases someone did it intentionally to deceive people.

henryfjordan · on June 30, 2020

Imagine the water heater company didn't know they were gaming the test. They first design a water heater and it gets a B- on the test. Being overachievers, they work hard and submit a second design that gets an A+. They might not realize that both heaters are basically the same with the only difference being some heating element spacing that works better for the test. Both times they submitted a legit design that was the same they'd provide to consumers. Sure, we know the engineers knew what was happening, but we can see how one might innocently arrive in the same scenario. I think it's safe to say the test is flawed.

The VW test is not like that. There's no way to innocently arrive in the scenario they did. They did not game a bad test, they literally lied to the test administrators. The car ran in "clean mode" only if it was in the test environment. If the car ran like it did on the road, they'd have failed (which is how they were caught, with a mobile testing setup).

One of the points in the article is that regulating for safety based on known testing conditions is going to result in over-fitting for the test. The water heater company is guilty of intentionally over-fitting. VW just straight up lied. I don't think those 2 actions are equal, VW is worse, but I agree that both are dishonest to a degree.

ebg13 · on June 30, 2020

> Imagine the water heater company didn't know they were gaming the test.

We don't have to. In this case we have clear admission of intent. The intent to deceive is what makes it fraud and not just being wrong.

henryfjordan · on June 30, 2020

My point isn't that what the water heater designer did was perfectly ethical, just that it's clearly distinct from what VW did.

ebg13 · on June 30, 2020

It's distinct in technical detail, but not in the broken ethics rule against intent to deceive.

chongli · on June 30, 2020

I think the ethics are still different. In the case of the water heater, they deceived the test to make their heater seem more efficient than it really was. However, the heater still does its job, it just costs a bit more to run (and emits some extra CO2).

What VW did was to take a clean-burning car and disable the pollution controls under normal driving conditions, for performance reasons. So while to the consumer it seems deceptive that VWs get better performance than they should, given how clean they’re supposed to be, in reality the cars are illegal and spewing toxins they were supposed to be removing from the exhaust. This makes the cars not only a pollution source but a health hazard to people living nearby.

To get on the same level of VW, the water heater would have to be doing something like emitting low levels of carbon monoxide into the home while having a feature that avoids doing that in the laboratory. In other words, reckless and willful disregard for human health and life.

dmurray · on July 1, 2020

There's no intent to deceive necessary in the water heater example. The water heater company could have sent it in with a note to the regulator saying "we moved the second element up, because we believe it will perform better on your test" and the regulator would likely just accept it instead of redesigning the test.

Also, for the water heating one, there's a plausible reason for the regulator to care about the discrete measurements rather than the total amount of thermal energy in the water. Hot water at the top of the tank is more valuable, because it's used first and less likely to be wasted, so you could weight it more heavily in a test. There's no parallel for the VW test cheating. No indication that's what happened here, of course.

Spooky23 · on July 1, 2020

There is no deception and no ethical issue. The nature of the test is known.

What placement of components would be ethical? Should engineers be required to be separated from the test parameters by a Chinese wall? Do they need to build the system for the worst result? Some middle ground? If the engineers are unethical, where is the line?

The obvious answer to optimizations like this is for the testing body to tweak the test procedure based on what manufacturers do over time. That provides an incentive to be more conservative or accurate.

cortesoft · on June 30, 2020

The key legal difference is that VW literally behaved differently if a test was running. If they had simply designed a system that tested better than it actually performed (by optimizing the factors tested for), they would not have gotten in trouble.

If the water heater manufacturer had special heating elements that only ran during the test, it would be equivalent.

Bartweiss · on June 30, 2020

> in both cases someone could accidentally optimize the test

I think this is what I disagree with.

The water heater story is about a viable-for-market design which also optimized for the test. The equivalent for a car emissions test might be optimizing the transmission to reduce emissions at the specific speeds which will be tested. Those speeds could be sweet spots of the engine curve by accident, or they could be planned that way. I don't think that's necessarily right, but it's within the bounds of "natural" design for the product.

Instead of doing that, VW submitted something for testing which was fundamentally different from what went to market. Rather than being misleading, the test results were fundamentally irrelevant. Creating two completely different modes of behavior isn't something you could do by chance, and it means there's no real limit on how badly they could cheat.

SkyBelow · on June 30, 2020

There is a difference between memorizing enough information to ace a test and sneaking in notes that aren't allowed to ace a test. And people would also say it is wrong to steal a copy of a test and then memorize the answers to it to ace a test. But what if a professor uses the same test every year (maybe changing a few numbers but in a way that only impacts the calculations, not the way to solve it) and people study just the information needed to answer the test. Is that cheating?

tialaramex · on June 30, 2020

If you cheat in most such tests it just means you miss out on actually learning what you were supposed to. If it wasn't your intention to learn anyway I guess that's fine.

Rarely the purpose of tests is to assure the public of your fitness (e.g. a driving test) and cheating those might be a problem, but if you cheat my CS 101 course and then struggle because you needed remedial classes but the cheated test means you don't get them that's your problem.

bravoetch · on June 30, 2020

Another aspect is the incentives. Most discussion here is about the cheating itself, and not the reasons for it. I may not learn much from just writing about a degree I don't really have on my resume, or roles I never worked at, and experience I don't have. But I can get paid a lot more by doing so.

SkyBelow · on June 30, 2020

There are a few other issues with cheating, such as devaluing a degree for all others who didn't cheat to earn it.

kube-system · on June 30, 2020

Morally, they both seem to fall in the same category. Legally, it might be a complicated question.

coding123 · on June 30, 2020

That sounds exactly like the third sentence of the article.

> Sun managed to increase its score on 179.art (a sub-benchmark of specfp) by 12x with a compiler tweak that essentially re-wrote the benchmark kernel.

nwallin · on June 30, 2020

Yes, but you're talking about what VW did vs what Sun did, but the person you're replying to is talking about what VW did vs what a company that makes a water heater did.

I agree that what Sun did is very similar to what VW did, with the exception that VW's increased emissions (statistically speaking) killed people, and what Sun did likely had no health impact on anybody except a few hurt paychecks.

kube-system · on June 30, 2020

Sort of, except that emissions testing is a regulatory requirement.

duxup · on June 30, 2020

VW's vehicles did not meet the standards required when used every day. They only met the standard during testing.

The water heaters don't sound like they'd fail any given test.

malandrew · on July 1, 2020

With the following question, I'm not absolving VW from criticism. With that in mind:

Why are we not holding those doing the measurement accountable as well?

If you produce a test that can be gamed and your job is to test things to meet consumer expectations, you've failed at your job.

After all is said and done, what is a better outcome: a) VW is punished for gaming the test b) the test is significantly harder to game

With (a), we have only one less manufacturer gaming the tests, VW. With (b) we have tests that none of the manufacturers can game any longer or at least will take time to game. The testers should be expected to always be two steps ahead.

This is not unlike whitehat/blackhat security engineering. We should pay bug bounties to teams that successfully exploit the tests and we should be actively running red team drills.

https://en.wikipedia.org/wiki/Red_team

dndvr · on July 1, 2020

What VW did is similar to Uber's Greyball system in that they give a different experience to the regulator rather than giving everyone an experience that is tuned to what a regulator might hope to see.

apcragg · on June 30, 2020

Is that not outright fraud? Just because you can hack something doesn't make it right or legal.

bravoetch · on June 30, 2020

The system incentives are based on outcomes, not inputs. People are not benefiting from pointing out flaws in the system. They are benefiting from exploiting the flaws.

balfirevic · on June 30, 2020

How far out of optimal (for testing) position would you require their water temperature layers to be?

AmericanChopper · on June 30, 2020

Reminds me of the Ferrari fuel injection controversy in F1 last season. The gist of it is the theory that Ferrari were cheating the maximum fuel injection rates, by tuning their injection system to inject more fuel when the injection sensor wasn’t taking a reading.

https://www.motorsport.com/f1/news/analysis-fia-settlement-f...
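
The theory as described amounts to exploiting a sensor that only samples at fixed instants. A minimal sketch of that idea follows, with invented numbers and a hypothetical flow function; it makes no claim about how the real sensor or any team's injection system works.

```python
from statistics import mean

LIMIT = 100.0          # hypothetical permitted flow rate, arbitrary units
SAMPLE_PERIOD_MS = 10  # assumed interval between sensor readings


def flow_rate(t_ms, boost=10.0):
    """Hypothetical flow: at the limit on sample instants, above it otherwise."""
    return LIMIT if t_ms % SAMPLE_PERIOD_MS == 0 else LIMIT + boost


# What the sensor sees: one reading every SAMPLE_PERIOD_MS.
sampled = [flow_rate(t) for t in range(0, 1000, SAMPLE_PERIOD_MS)]
# What actually flows: every millisecond over the same second.
actual = [flow_rate(t) for t in range(1000)]

print(max(sampled), mean(actual))
```

Every reading the sensor takes is at the limit, while the average flow over the full second is above it.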

sangfroid_bio · on June 30, 2020

There was a scandal a few years back when a Kaggle team did something similar with ML. They were treating the competition as a black box and optimising for the unknown dataset instead of actually building better AI algorithms.

vngzs · on June 30, 2020

This?

https://www.kaggle.com/c/petfinder-adoption-prediction/discu...

sangfroid_bio · on June 30, 2020

It goes back way further than that, the incidents that I am recalling involved academic teams and I think ImageNet. It happened quite a few times.

t_serpico · on June 30, 2020

Out of curiosity, why would the water form layers of different temperatures?

Gibbon1 · on June 30, 2020

I think it's because, with a modern water heater (especially electric models), the heat loss is so low that convection often stops completely. So the water in the tank doesn't mix. I think you can get situations too where the water at the level of the heating elements gets hotter and hotter and the water away from it gets cold. My water heater strongly suggests adding a mixing valve for that reason.

jodrellblank · on June 30, 2020

> "water at the level of the heating elements gets hotter and hotter and the water away from it gets cold."

Eh? How come that temperature differential doesn't cause convection? The colder water being more dense and sinking down, the warmer water being less dense and rising up.

If the heating elements aren't at the bottom, specifically for this reason, how come they aren't?

dmurray · on July 1, 2020

> If the heating elements aren't at the bottom, specifically for this reason, how come they aren't?

There are typically two heating elements, one near the bottom and one further up. You can use the higher one if you only need a half-tank of hot water.

I'm not sure if this is the complete answer to your other question, or if sometimes conditions are such that convection occurs extremely slowly despite a substantial temperature difference.

Gibbon1 · on July 1, 2020

I think perversely it's not the density profile in the vertical dimension that drives convection but the horizontal dimension. A hotter thus lower density but perfectly flat homogeneous layer can't displace the colder denser layer above it because the pressure between the two layers is uniform.

jodrellblank · on July 1, 2020

How can it become a perfectly flat homogeneous layer, from a heating element which looks like this[1]?

It's your claim "the water at the level of the heating elements gets hotter and hotter and the water away from it gets cold" which is surprising me; water being quite a good conductor of heat, and "efficient boiler" suggesting that the tank will be insulated so not much heat is lost to the outside world, so the water away from the element should warm up by conduction from the other water faster than it gets cold from losing heat through the insulation (in my head). OK maybe I can imagine that in a fixed pressure with no room for expansion, the warm water cannot be less dense, so there can't be much convection - but then won't the heat radiate and conduct outwards in a "sphere" from the heating element including up and down, and not make any layers?

And here[2] is a video of Thunderf00t putting an infra-red thermal camera on a closed water bottle with a peltier cooler attached to the side of it, and showing that the cold water does sink, there is some convection happening with a cold "heating element" halfway up the "tank" on one side. (But there is compressible air in the top).

Maybe it's that you don't want to run a boiler long enough for all the water to get to a uniform temperature before you can get warm water out of it, and if you don't then you risk getting super hot and cold water unmixed from "impatience" more than anything else?

[1] https://ae01.alicdn.com/kf/HTB1_U_RSpXXXXbYapXXq6xXFXXXy/DC-...

[2] https://youtu.be/p9i1mhNsYXQ?t=903

Gibbon1 · on July 1, 2020

> but then won't the heat radiate and conduct outwards in a "sphere" from the heating element including up and down, and not make any layers?

Suspicion of mine is part of the answer is that the viscosity of water drops with temperature. Rising hot water in a warm layer hits the cold layer above it and spreads out then turns down. Also water is a middle of the road non metallic solid when it comes to heat conduction. But it has high heat capacity and is a liquid.

Complicating things, stratification, at least in a well-designed water heater, happens rarely. So it's easy enough to set up conditions where it doesn't happen. But of course it's a known problem. And not a totally solved one either. Suspect that simplified models probably don't display the behavior.

iamgopal · on July 1, 2020

I've maintained that water heater is the only equipment that is 100 percent efficient. Glad to know that is not the case.