
Goodhart's Law: When a measure becomes a target, it ceases to be a good measure - chrisshroba
https://en.wikipedia.org/wiki/Goodhart%27s_law
======
davidmanheim
This is probably where I should mention my two longer articles that discuss
this in depth:
[https://www.ribbonfarm.com/2016/06/09/goodharts-law-and-why-measurement-is-hard/](https://www.ribbonfarm.com/2016/06/09/goodharts-law-and-why-measurement-is-hard/)
[https://www.ribbonfarm.com/2016/09/29/soft-bias-of-underspecified-goals/](https://www.ribbonfarm.com/2016/09/29/soft-bias-of-underspecified-goals/)

As well as my more recent paper categorizing the different ways it occurs:
[https://arxiv.org/abs/1803.04585](https://arxiv.org/abs/1803.04585)

~~~
waltherg
Since you’re here, what does the following property stand for in your second
paper?

s ∈ AifM(s)/gec

~~~
defen
Not OP but that should probably render as: s ∈ A _if_ M(s) >= c

So: s (system state) is a member of A (permissible region) _if_ M(s) (a proxy
measure of s, used because we can't observe s directly) is greater than or
equal to c (the threshold).
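A toy sketch in Python of what that rule implies. Everything here (`true_value`, `proxy`, `permissible`, the 0..10 state space, the threshold c = 8) is hypothetical and not taken from the paper; the point is only that an optimizer picking the state with the highest M(s) can land inside A while scoring badly on the quantity M was supposed to track.

```python
# Hypothetical example: states are integers 0..10. The true quality V(s) is
# unobservable; the proxy M(s) tracks it at first but can be inflated by "gaming".

def true_value(s: int) -> float:
    # Assumed: genuine quality peaks at s = 5 and falls off on either side.
    return 10.0 - abs(s - 5) * 2

def proxy(s: int) -> float:
    # Assumed: the proxy keeps rising with s, because past s = 5 it rewards gaming.
    return float(s * 2)

def permissible(s: int, c: float) -> bool:
    # The membership rule: s is in A if M(s) >= c.
    return proxy(s) >= c

states = range(11)
c = 8.0
best_by_proxy = max(states, key=proxy)        # what an optimizer of M picks
best_by_truth = max(states, key=true_value)   # what we actually wanted

print(best_by_proxy, permissible(best_by_proxy, c), true_value(best_by_proxy))  # 10 True 0.0
print(best_by_truth, permissible(best_by_truth, c), true_value(best_by_truth))  # 5 True 10.0
```

Both states pass the threshold, so the rule alone cannot tell them apart; only the unobservable V(s) can.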

------
nikanj
"You get what you measure" is the bane of KPIs everywhere. Measure deals
closed, and you get deals with no regard for profitability. Measure
profitability, and you will burn through your customer base by jacking prices
and cutting quality. Etc etc.

~~~
minimaxir
When I interviewed for a data analyst position at a known startup, I was asked
to define a target KPI for a controlled pseudo-A/B experiment, and I suggested
total revenue. The interviewer replied "that's a dumb answer because revenue
can be gamed; you should have said # of sales."

I was too taken aback to follow up asking why # of sales can't be gamed too.

~~~
abiox
> interviewer replied "that's a dumb answer

a bit off-topic, but was that the actual response?

personally, stuff like this would have me ending the interview early.

~~~
dredmorbius
And what is your gentle phrasing for that action?

~~~
breakingcups
It's an interview, not a competition. The interviewer would've been much
better off asking: "Why?"

~~~
dredmorbius
I meant for excusing yourself.

I'd tend to go to "I'm sorry, but I'm afraid I'm wasting your time," for
example.

------
inanutshellus
And as soon as a measure becomes known, it becomes the target.

IMO, the law ought to be "As soon as a measure becomes known, it ceases to be
a good measure."

"I'm fat because I can't touch my toes? Hang on, I'll be back in three weeks."

"What's that? Your measure for gas consumption and co2 emissions has you go
exactly 35 MPH for 10 minutes? Hmm....."

~~~
niftich
Gamification increases as understanding improves.

Information is exploited.

~~~
stcredzero
[Targeting Intensifies]

------
njarboe
The failure of this law to hold when one uses profit as the measure of a
successful company or business is why I think a well-set-up market system
actually works. People and companies can set profit as their target and it is
still a good measure of success, because it is actually what you want.

The key for a good society is regulating the places where maximizing profit
can induce bad behavior and leaving alone the places where it does not. It's a
hard thing to get right, but requiring sellers to list the ingredients of food
for sale is an example of a good regulation, I think.

~~~
nkoren
Even disregarding the potential for negative externalities etc., profit alone
can still be a terrible KPI.

I worked for a large engineering consultancy that was very profit motivated.
They figured out a brilliant way to maximise profit: get rid of business
development! Only pay for work when clients are paying you to do it. This was
GREAT for quarterly profit margins, at the expense of the project pipeline.

As the pipeline dried up, they figured out a new way to maximize profit: get
rid of engineers! The customers are big, stupid organisations, and it'll take
them a long time to notice. You can _totally_ maximise quarterly profit
margins until they do.

And that's how you can simultaneously be highly profitable and in a corporate
death spiral.

Moral of the story: reality is a fundamentally complex place. You need
multiple KPIs to get a handle on it -- and ideally those KPIs should be
mutually contradictory enough to force you to engage with that complexity
head-on, rather than boiling things down to game-able metrics which ultimately
will defeat themselves because they are such a poor proxy for reality.

~~~
njarboe
I would say the complexity is in balancing profit over time. Here is your
contradictory force. In the end that large engineering consultancy was making
zero profit and was worth nothing. Whether running a business into the ground
will give you more overall profit than just selling it and putting the money
into government bonds is something one can consider, but in your case was the
consultancy owned by the people making these short-term profit maximizing
decisions? The agent/principal problem for companies has not been solved.

It is probably not a coincidence that the top 6 US companies by market cap
(Apple, Amazon, Alphabet, Microsoft, Facebook, Berkshire Hathaway) are still
run by their founders or were run by their founders for most of the companies'
existence before a recent step-down (Microsoft) or death (Apple). Most public
companies aren't, and they struggle to grow or even survive over time.

~~~
nkoren
> in your case was the consultancy owned by the people making these short-term
> profit maximizing decisions[?]

It was a publicly-listed company, so that'd be a yes.

Over time, profit is obviously a _necessary_ condition for a sustainable
business. But it's definitely not a _sufficient_ condition. I think a huge
number of people don't understand this.

E.g., when I see people clamoring for TSLA to make a profit... I think they're
wrong. Or, at least, mostly wrong. TSLA has been unprofitable in the service
of dramatically (and rather effectively) expanding both capacity and market
share. Those are excellent reasons to be unprofitable. And TSLA still
obviously has a lot more room to grow in both respects, so in this situation,
trading profitability for growth is a good decision. (exhibit A: Amazon)

~~~
njarboe
>> in your case was the consultancy owned by the people making these short-term
>> profit maximizing decisions[?]

> It was a publicly-listed company, so that'd be a yes.

I'm a bit confused. I stated that public companies are not often owned by the
people who manage them, and that is a real problem. You are asserting that all
publicly-listed companies are managed by their owners. That is just wrong. I
own both TSLA and AMZN and of course there are hundreds of things that need to
be done to make a sustainable business. Just talking past each other at this
point, I think.

~~~
nkoren
Ah, sorry, you're correct. The company was owned by people who _wanted_ short-
term profit maximising decisions (its shareholders). The people actually
_making_ the decisions also owned shares, but nowhere near a controlling
interest.

Anyhow, I agree with your point about companies that are controlled by
founders (whether public or not). A (sufficiently pragmatic) visionary with a
controlling interest can achieve things that no amount of shareholder-focused
governance can. Of course they can also drive a company into the ground faster
than anyone -- but the heights are reserved for them.

------
pwagland
Indeed this is the whole idea behind OKR. The Objectives are _supposed_ to be
grand, audacious, and only achievable under extreme stretching. Furthermore,
they are _supposed_ to not be used as a KPI for performance reviews.

------
kolbe
I wish more people knew and understood this concept.

It's important to know the impact that you have on a system, especially at
large scale. Decades after Goodhart formulated this law, we still have shoddy
credit risk measures that put tens of trillions of dollars at risk in the US
alone. Hell, the last recession was mostly caused by the government's choice
of credit worthiness metrics.

~~~
velp
Can you explain how the recession was caused by "government's choice of credit
worthiness metrics", instead of rollbacks of regulation?

~~~
kolbe
I'm not sure why you think these things are mutually exclusive. And as far as
Goodhart's Law is concerned in this case, they're intertwined concepts.

The government's choice to allow securities rated by S&P/Moody's to be treated
as gospel allowed/encouraged banks, insurance companies, GSEs and pension
funds to take far more risk than was systemically reasonable.

------
logicallee
Aren't there exceptions? For example, a measure of health might be life
expectancy. If this becomes known, is it no longer a good measure? I think
it's still a good measure. I mean, how can you get to a life expectancy of 100
without a healthy population?

~~~
triviatise
Life expectancy is actually a really great example of the opposite. People
routinely claim our health system is bad compared to Europe because our
average life expectancy is lower.

It turns out African Americans skew our metrics because they have:

1) high infant mortality
2) high murder rates of young men

The high infant mortality is irrespective of wealth and has a disproportionate
impact on average life expectancy, because the death of an infant (age 0) has
a huge impact on the average. When you look at the rest of the population, our
whites are on par with Europe and our Asians are on par with Asian countries.
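The arithmetic behind the "huge impact" point: a death at age 0 removes a whole lifetime from the mean, while a death in adulthood removes only the missing years. A minimal sketch with a hypothetical cohort (the numbers are illustrative, not real data; real life expectancy comes from period life tables, not a plain mean of ages at death, so this only shows the averaging effect).

```python
# Hypothetical cohort of 100 people who would otherwise all live to 80.
def mean_age_at_death(ages: list[float]) -> float:
    return sum(ages) / len(ages)

baseline = [80.0] * 100
one_infant_death = [80.0] * 99 + [0.0]         # one death at age 0
one_early_adult_death = [80.0] * 99 + [70.0]   # one death ten years early

print(mean_age_at_death(baseline))               # 80.0
print(mean_age_at_death(one_infant_death))       # 79.2
print(mean_age_at_death(one_early_adult_death))  # 79.9
```

One infant death moves the average eight times as much as one adult dying a decade early.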

One way this is gamed is that some countries have different standards for
reporting infant mortality.

<<The infant mortality rate is defined as the number of deaths of children
under one year of age, expressed per 1 000 live births. Some of the
international variation in infant mortality rates is due to variations among
countries in registering practices for premature infants. The United States
and Canada are two countries which register a much higher proportion of babies
weighing less than 500g, with low odds of survival, resulting in higher
reported infant mortality. In Europe, several countries apply a minimum
gestational age of 22 weeks (or a birth weight threshold of 500g) for babies
to be registered as live births. This indicator is measured in terms of deaths
per 1 000 live births.>>

In the US it is gamed in reverse, in the sense that the average hides that the
true problem lies with subpopulations and not with the US medical system as a
whole. Our (gamed) poor performance on this statistic is used to support
moving to a substantially different system.

~~~
scarmig
It's not like Europe doesn't have marginalized populations. Your argument is
basically "if you exclude the parts of the US population that are worst
treated by our healthcare system, then ours is the best!" There is something
meaningful in comparing the "median" or "modal" health experience in different
countries, but "we treat black people badly compared to white people in the
USA so you have to only look at white statistics" is a terrible defense of the
system.

Your point about infant mortality statistics being gamed is on point, though
(and I pointed it out in a sibling comment).

------
dfee
Another article on the front page decries Harvard’s holistic approach to
admission - with arguments that admission should be based on a single measure
of academic performance (like standardized test scores, GPA, etc.).

~~~
lostcolony
Yes, but holistic approaches have -also- become a target. Think of the over-
achieving high schoolers doing as many extra-curriculars as possible just for
their college application.

------
stcredzero
During the Vietnam War, it was body count. That turned out to be a great way
of incentivizing the most murderous troops.

~~~
AcerbicZero
Probably more of an incentive for NCOs and Officers to overclaim the results,
so they look good to their superiors.

~~~
stcredzero
That seems to happen in all wars. Aircraft downed and tanks destroyed in WWII,
for example.

There were occasions in the Vietnam War where regions were declared to be free
fire zones, and everyone within was to be killed, down to the last child and
granny.

~~~
sonnyblarney
I'm sorry but this is a serious (and arguably offensive) falsehood.

A 'free fire' zone is an area wherein there is no requirement for soldiers to
coordinate heavy arms fire with other units via HQ etc. This is a very
reasonable policy in jungle fights where units can come under ambush at any
time, i.e. "We're in an ambush and we're about to die, can you ring up HQ to
see if we can shoot back?"

But the notion that US forces would just designate an area where everyone
including civilians are to be killed is an outright and offensive lie.

Though there are some situations in which Army Units did actively and
knowingly fire on citizens, these are known and well documented tragedies of
soldiers acting with fury in the moment against villagers who were supporting
insurgent forces - this was not and never was any kind of US policy, and the
perpetrators faced Court Martial.

The Vietnam war is so completely misinterpreted in pop culture; even to this
day, it's disturbing.

~~~
stcredzero
_I'm sorry but this is a serious (and arguably offensive) falsehood._

I'm sorry, but I will show you to be mistaken below.

 _But the notion that US forces would just designate an area where everyone
including civilians are to be killed is an outright and offensive lie._

Then you should have an issue with the Ken Burns documentary. I think I got
the terminology wrong; "free fire zone" means something else. However, the
practice I am referring to -- that of designating everyone in a certain place as
an enemy, then killing everyone within -- did happen.

[https://en.wikipedia.org/wiki/My_Lai_Massacre](https://en.wikipedia.org/wiki/My_Lai_Massacre)

From Ken Burns's "The Vietnam War" Part 5 narration: (00:44:43,393 -->
00:44:47,428)

_In the summer of 1967, Tiger Force was sent to the fertile Song Ve Valley.
The entire population had already been herded from their homes and crowded
into a refugee camp. But some had come back to resume the farming they had
always done. The valley had officially been declared a free-fire zone, and
Tiger Force's officers took that literally. "There are no friendlies," one
lieutenant told his men. "Shoot anything that moves." Over a seven-month
period, they killed scores of unarmed civilians. Among their victims were two
blind brothers; an elderly Buddhist monk; women, children, and old people
hiding in underground shelters; and three farmers trying to plant rice. All
were reported as "enemy... killed in action."_

~~~
sonnyblarney
No, you have misinterpreted.

The US does not declare that US forces should just go and kill anyone, rather,
'anyone' can be considered a target, which is perfectly reasonable policy
while fighting an insurgency.

In normal rules of engagement, the women and children running out of the
village are obviously out of bounds. They are civilians. You can't fire at
them for any reason.

In a 'free fire' zone, anyone can be considered an enemy combatant if the
situation indicates as such. The woman running out of the village with an
AK-47 - is 'in bounds' as an enemy combatant in an area where villagers are
considered on the side of the insurgents. Obviously, random civilians are not.

The misinterpretation of 'free fire' or the 'loose application' of it are
considered 'war crimes' by the US and it was absolutely never intended that
the policy be used by soldiers to just shoot up people - your implication that
it was used this way is completely false.

The My Lai massacre was a war crime, not an intentional act sanctioned by US
forces.

Also: war is dirty. There are no wars without war crimes. Once the bad genie
comes out ... it spills all around, the best we can do is contain it.

FYI that's one of the hardest things to do, i.e. make those kinds of calls in
the field - often it's not obvious.

Go and watch one of the documentaries that delve into US forces wrangling over
a decision to drop munitions, and consider how tricky those decisions are ...
so many factors.

~~~
stcredzero
> The US does not declare that US forces should just go and kill anyone,
> rather, 'anyone' can be considered a target

_"There are no friendlies," one lieutenant told his men. "Shoot anything
that moves." Over a seven-month period, they killed scores of unarmed
civilians. Among their victims were two blind brothers; an elderly Buddhist
monk; women, children, and old people hiding in underground shelters; and
three farmers trying to plant rice. All were reported as "enemy... killed in
action."_

> In normal rules of engagement, the women and children running out of the
> village are obviously out of bounds. They are civilians. You can't fire at
> them for any reason.

During the Korean War, my grandmother had her life saved by a wad of cash
(large, because of wartime inflation) hidden in a secret pocket in her skirt.
It stopped a piece of shrapnel produced by strafing from a US Douglas A-1
Skyraider. Was she "in bounds?"

> The misinterpretation of 'free fire' or the 'loose application' of it are
> considered 'war crimes' by the US and it was absolutely never intended that
> the policy be used by soldiers to just shoot up people

Can you see how using Body Count as a metric would tend to produce "targeting"
of the metric which would result in the targeting of people to turn into
corpses? That is my whole point here.

~~~
sonnyblarney
"It stopped a piece of shrapnel produced by strafing from a US Douglas A-1
Skyraider. Was she "in bounds?""

Civilian casualties are generally avoided if possible. I'm happy that your
grandmother was saved, but it's not relevant.

"Can you see how using Body Count as a metric would tend to produce
"targeting" of the metric which would result in the targeting of people to
turn into corpses?"

No. 'Body count' is used in every war and has been since the dawn of time, and
we continue to use it today as one of many metrics.

Using this metric will not encourage professional soldiers to arbitrarily
murder civilians.

------
triviatise
We hold people accountable for actions instead of outcomes, since outcomes
(even profit) can be gamed at the overall detriment of the organization.

We talk about the outcome we want, look at the levers that will get the
outcome, and then agree to the actions that we are going to take.

If the actions don't result in the forecasted outcome, then the underlying
assumptions are wrong. The assumptions are wrong if 1) people aren't doing the
tasks properly, or 2) the relationship between the task and the outcome is
different than we thought. If the issue is 1, we provide training. If the
issue is 2, we use the revised assumption for future forecasts.

We don't use the word target anymore; we use the word forecast.
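A minimal sketch of that forecast-review loop in Python. The `Forecast` type, the single "outcome per completed action" assumption, and treating "people aren't doing the tasks properly" as "fewer actions completed than planned" are all simplifications invented for illustration, not the commenter's actual process.

```python
from dataclasses import dataclass

@dataclass
class Forecast:
    actions_planned: int
    assumed_yield: float  # hypothetical: outcome produced per completed action

    def expected_outcome(self) -> float:
        return self.actions_planned * self.assumed_yield

def review(f: Forecast, actions_done: int, actual_outcome: float) -> tuple[str, Forecast]:
    """Return a diagnosis and the forecast to use next time."""
    if actual_outcome >= f.expected_outcome():
        return "on forecast", f
    if actions_done < f.actions_planned:
        # Case 1: the agreed actions weren't carried out, so keep the assumption
        # and fix execution (e.g. training).
        return "training", f
    # Case 2: the actions were done, so the action -> outcome relationship was
    # wrong; revise the yield from what actually happened.
    revised = Forecast(f.actions_planned, actual_outcome / actions_done)
    return "revise assumption", revised

f = Forecast(actions_planned=20, assumed_yield=5.0)  # forecasts 100
print(review(f, 12, 60.0)[0])  # training
print(review(f, 20, 60.0))     # ('revise assumption', Forecast(actions_planned=20, assumed_yield=3.0))
```

Note that nobody is judged on hitting 100; a miss only changes either the execution or the assumption.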

------
dqpb
This is one reason why the grading system in schools is absurd and
incompetent.

------
biastoact
Without dates and metrics you can’t measure progress. Like mile markers along
the road (back before GPS / Smartphones). But without continuous, high trust,
collaborative conversation you can’t actually make progress. That’s the
gasoline powering you down the road. The goal should be Disneyland (shouldn’t
it always be Disneyland?), not the next mile marker.

------
fallingfrog
That's why you know your company is in trouble when the shareholder becomes
the customer and the stock price becomes the product.

------
tlarkworthy
This is also demonstrated quite well in machine learning with the test set.
The target is test error, but you don't let algorithms optimize that directly
and instead they optimize against training error. Goodhart's law is a
manifestation of over-fitting in social systems.
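The test-set point can be shown with a toy simulation in Python: if you choose among many models by their test-set score, that score stops measuring how good the chosen model is. The "models" here are hypothetical pure random guessers on synthetic labels, not any real training setup; predictions and labels are packed into integers (one bit per example) only to keep it fast with the standard library.

```python
import random

def accuracy_bits(preds: int, labels: int, n: int) -> float:
    # XOR marks the examples where prediction and label differ.
    mistakes = bin(preds ^ labels).count("1")
    return 1 - mistakes / n

def select_on_test(n_models: int = 5000, n: int = 1000, seed: int = 0) -> tuple[float, float]:
    rng = random.Random(seed)
    test_labels = rng.getrandbits(n)
    fresh_labels = rng.getrandbits(n)  # a holdout nobody optimized against
    best_test = -1.0
    best_model: tuple[int, int] = (0, 0)
    for _ in range(n_models):
        # Each "model" guesses at random on both sets: it has no real signal.
        model = (rng.getrandbits(n), rng.getrandbits(n))
        acc = accuracy_bits(model[0], test_labels, n)
        if acc > best_test:  # selecting directly on the test score
            best_test, best_model = acc, model
    fresh_acc = accuracy_bits(best_model[1], fresh_labels, n)
    return best_test, fresh_acc

best_test, fresh = select_on_test()
print(f"test accuracy of the selected model: {best_test:.3f}")
print(f"same model on fresh data:            {fresh:.3f}")
```

The selected model looks better than chance on the test set it was picked with, then falls back to about 0.5 on data it was never selected against.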

------
gojomo
And thus for performance measures, "security through obscurity" can make
sense.

------
stcredzero
The same principle applies to attributes and qualities which mark successful
entrepreneurs. The same principle applies to attributes and qualities which
mark good developers. The same principle applies to attributes and qualities
which mark good employees. Good people. Good politicians. Activists. Good
pundits. Intellectuals. Top tier school degrees.

Hacker News karma.

Shouldn't we all be looking out for the "targeting" and be trying to find the
next measure?

------
em3rgent0rdr
Reminds me of all those computer architecture papers that optimize for IPC to
the detriment of overall performance. Or the CPU frequency wars which led to
ever-deeper pipelines whose gains were completely obliterated by branch
mispredicts. Or the current trend of increasing core count without actually
being able to do any useful work on them...
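A back-of-the-envelope sketch of the clock-speed trade-off, using the standard "time = CPI x cycle time" relation with a branch-mispredict term added to the CPI. All numbers (clock rates, branch frequency, mispredict rate, flush penalties) are made up for illustration; with a larger clock advantage the deeper design would win, so the point is only that frequency alone doesn't settle it.

```python
def ns_per_instruction(freq_ghz: float, base_cpi: float, branch_frac: float,
                       mispredict_rate: float, penalty_cycles: float) -> float:
    # Each mispredicted branch flushes the pipeline, costing roughly as many
    # cycles as the pipeline is deep.
    cpi = base_cpi + branch_frac * mispredict_rate * penalty_cycles
    return cpi / freq_ghz  # cycles / (cycles per ns) = ns

# Hypothetical designs: "deep" clocks 20% higher but flushes a much deeper pipeline.
shallow = ns_per_instruction(2.0, 1.0, 0.2, 0.1, 12)
deep = ns_per_instruction(2.4, 1.0, 0.2, 0.1, 30)
print(f"shallow: {shallow:.3f} ns/instr, deep: {deep:.3f} ns/instr")
# shallow: 0.620 ns/instr, deep: 0.667 ns/instr
```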

------
RelaxSelf
yep, seen several idiots whose work habits are destructive be promoted for
gaming stats

seen stats make people discourage/delay income/work to avoid anomalies/spikes
in stats because if there's a spike it makes it look like you are slacking the
next month, or makes a manager think you should be able to constantly hit the
spike

------
mlthoughts2018
So we should minimize the fraction of measures which have become targets then?

~~~
smallnamespace
You shouldn't optimize at all but instead pick multiple, 'distantly related'
measures and satisfice over them.

That's sort of what humans do to have a robust objective function. Sufficient
food, water, physical safety, love, social interaction, meaningful work,
spiritual fulfillment—fulfill each of them and humans generally move on to the
next 'unfulfilled' goal on the list rather than endlessly maximizing something
that ceases to improve their life quality.
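A minimal sketch of satisficing in Python: instead of maximizing one combined score, keep a "good enough" threshold per measure and work on whichever measure is still short. The measure names, the 0-10 scale, and the thresholds are hypothetical; priority here is simply the dictionary's insertion order.

```python
from typing import Optional

# Hypothetical measures on a 0-10 scale, each with a "good enough" threshold.
THRESHOLDS = {"food": 6.0, "safety": 7.0, "social": 5.0, "meaningful_work": 5.0}

def next_goal(levels: dict[str, float], thresholds: dict[str, float]) -> Optional[str]:
    """Return the first measure still below its threshold, or None if all are met.

    Pushing a measure past its threshold earns nothing here, so there is no
    incentive to over-optimize any single number.
    """
    for name, minimum in thresholds.items():
        if levels.get(name, 0.0) < minimum:
            return name
    return None

print(next_goal({"food": 9, "safety": 8, "social": 3, "meaningful_work": 2}, THRESHOLDS))  # social
print(next_goal({"food": 6, "safety": 7, "social": 5, "meaningful_work": 5}, THRESHOLDS))  # None
```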

~~~
maxxxxx
Exactly. Don't reduce everything down to one factor.

In business you need profit AND customer retention AND ongoing development AND
some level of luck AND a lot of other factors.

Same for health. You need enough sleep AND good food AND a good mental state
AND good genes AND a lot of other things.

Focusing on only one thing while neglecting others doesn't work.

~~~
mlthoughts2018
But then your "one factor" just becomes a combination of various other
factors. I don't think the point of Goodhart's Law is related to "over-
optimizing for just one thing," rather it's a statement about whether _any_
metric (even a broad, diverse metric that takes into account multiple
different things) can retain informativeness once it is set up as an
optimization target.

~~~
maxxxxx
I think you could set up an optimization target for a company that says "be
profitable and grow for the next 30 years". You would see quickly that this
target is a complex beast with a lot of different factors at play that need to
be balanced against each other. I don't think Goodhart's law would apply here.

------
Angostura
Serious question - it may no longer be a good measure, but does that
_necessarily_ mean it is a poor target?

Assuming targets are necessary, do they not have to be based on some measure?

------
mathattack
I’ve seen this a lot. Even people who intellectually know that correlation
doesn’t equal causality make this mistake all the time.

~~~
smallnamespace
It's not even that correlation doesn't equal causality, it's that the
correlation may also completely reverse.

------
joker3
People respond more or less rationally to their incentives. Who knew?

------
mitko
I've seen similar topics come up on HN front page every once in a while.

Here are some of my thoughts on the topic, through the prism of metrics for a
company.

[http://dimitarsimeonov.com/2018/03/23/the-vanity-metric-paradox](http://dimitarsimeonov.com/2018/03/23/the-vanity-metric-paradox)

TLDR of my post: The vanity metric paradox - every metric, sufficiently
optimized, becomes a vanity metric, as it stops being the make-or-break
qualifier for a company.

------
notEvenOnce
(cough! cough! "scrum")

(cough! cough! "agile")

(cough! cough! "daily stand-ups")

~~~
jingleheimer
I always felt that agile processes were a means to give non-answers to
performance measurement in order to allay the consequences of that measurement
and allow developers to get on with the business of making software.

How big is the task; 10 story points

How long will that take; depends on our velocity

Can it be done by this date; we're not allowed to plan that far ahead. You can
put it in the 'icebox'. We might get to it eventually.
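For what it's worth, "depends on our velocity" does reduce to simple arithmetic once a team has a velocity history. A sketch in Python, assuming velocity means story points completed per sprint; the function name and the numbers are hypothetical.

```python
import math

def sprints_needed(story_points: float, velocity_per_sprint: float) -> int:
    # Velocity = story points the team has historically completed per sprint.
    # This is a forecast, only as good as that history, not a commitment.
    if velocity_per_sprint <= 0:
        raise ValueError("velocity must be positive")
    return math.ceil(story_points / velocity_per_sprint)

print(sprints_needed(10, 4))  # 3
```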

~~~
pmwhite
Yes, definitely true.

If (1) everyone is highly effective in getting tasks done, and (2) working on
tasks that are reasonably well prioritized by both technical risks and
business value, why exactly do you need to know more?

There are legitimate reasons that a business may need a date. But making sure
people are working harder than hard by beating them over the head with a date
coughed out by a Gantt chart is not one of the legitimate reasons.

