
How Measurement Fails Doctors and Teachers - e15ctr0n
http://www.nytimes.com/2016/01/17/opinion/sunday/how-measurement-fails-doctors-and-teachers.html
======
brohoolio
My wife has a great doctor. We lost our first kid and this doctor took tons of
extra time helping us in the second pregnancy. From fitting us into her
already full schedule to taking 10 extra minutes in an appointment to calm my
wife down.

There is nothing about the experience that could be captured in these metrics.
But I can tell you that she kept my wife afloat during the bad days and
probably kept her from needing to see another counselor. Ultimately we had a
baby girl the second time around. The kindness and, as the article puts it,
love that the doctor showed us really, really helped.

~~~
randycupertino
Performance metrics sound like they're empirically great, but as we all know,
it's more complicated than that.

Personally, I've seen important programs, such as primary care-based
depression screening and management, that are proven to work get turned into
checklists and handed to teams that put the checklist first and are less
focused on mission/impact/caring. And I've seen patients respond that they are
not helped. The therapeutic value of perceived empathy and attentiveness is
unknown, but it should not be underestimated.

------
danieltillett
The thing that always amazes me about people who develop metrics is that they
never seem to stop and think about how the metric will be gamed, and then take
action to prevent it. Part of developing any new metric should be having an
"evil" team work out all the ways the metric can be gamed so those ways can be
blocked.

For example, it should be impossible to "teach the test" other than by
broadly teaching the underlying concepts being tested. A well-designed test
will touch on so many concepts that only a broad education will work.
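A toy sketch of the kind of gaming an "evil" team might look for (all numbers hypothetical): if a school is scored on its average test result, quietly excluding the weakest students inflates the metric without any improvement in teaching.

```python
# Toy example: gaming an "average score" metric by excluding weak
# students. All numbers are hypothetical.
scores = [45, 55, 60, 70, 80, 90]

honest_avg = round(sum(scores) / len(scores), 2)

# "Gamed" version: the two weakest students are reclassified so they
# are not counted (e.g. pushed into a non-tested track).
gamed = [s for s in scores if s >= 60]
gamed_avg = round(sum(gamed) / len(gamed), 2)

print(honest_avg)  # 66.67
print(gamed_avg)   # 75.0 -- the metric improved, the teaching did not
```

An adversarial review pass would try exactly these moves before the metric goes live, then constrain the metric (e.g. score the full enrolled cohort) to block them.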

~~~
acbart
Why do you assume they don't stop to think about that? People who develop
assessments usually think quite rigorously about exactly such problems,
assuming they're following proper methods. Is it so surprising to you that
this is a very difficult problem?

I would compare this to asking, "Why don't software developers think about
bugs?" They do, and they attempt to handle them, but it's almost impossible to
solve all such problems in programs (tests) of non-trivial size.

~~~
danieltillett
If they did, then it would not be possible to “teach the test” other than by
teaching the full subject. If there are any shortcuts a teacher can take with
the class, then the test is a failure.

I used to write tests all the time in my subjects, and my tests could not be
gamed. Half were multiple choice questions drawn from every area we covered in
the semester, and half were essay questions where the students had to
integrate what they learned into a coherent answer. My students would ask me
what they had to learn for the final exam and I would always say everything -
if it wasn’t going to be examined there was no point teaching it, because the
students won’t bother learning it :)
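A minimal sketch of this exam-assembly idea, with hypothetical topic names and question IDs: drawing questions from every area covered means no topic can be safely skipped.

```python
import random

# Sketch: assemble an exam by sampling multiple-choice questions from
# *every* topic covered, so "teaching the test" requires teaching the
# whole subject. Topics and question IDs are hypothetical.
question_bank = {
    "enzymes":      ["q1", "q2", "q3", "q4"],
    "metabolism":   ["q5", "q6", "q7"],
    "genetics":     ["q8", "q9", "q10", "q11"],
    "cell_biology": ["q12", "q13", "q14"],
}

def assemble_exam(bank, per_topic=2, seed=None):
    rng = random.Random(seed)
    exam = []
    for topic, questions in bank.items():
        # Every topic contributes questions -- none can be skipped.
        exam.extend(rng.sample(questions, min(per_topic, len(questions))))
    return exam

exam = assemble_exam(question_bank, per_topic=2, seed=42)
print(len(exam))  # 8 questions, 2 from each of the 4 topics
```

Because the draw is random per sitting, a teacher cannot predict which questions appear, only which topics, which is the point.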

------
WalterBright
The irony is that in every organization, everyone knows who the performers are
and who the slackers are. But nobody wants to bring that up when deciding who
gets promoted and who gets set aside.

A manager once told me that before the employee performance review cycle he
had already decided who was going to get what raise. He'd start from that and
work backwards to derive the required "metrics" to support it. The right
people got the raises, and the HR department was satisfied with the scientific
process :-)

~~~
danieltillett
I don't disagree with you that this is how things work, but my god this
process is prone to corruption, both deliberate and unconscious. As a manager
it is very hard to not reward your allies and punish your enemies and if you
have a process that enables this then this is exactly what happens.

~~~
WalterBright
1. I wouldn't consider working for a boss who regarded me as an enemy.

2. The manager also gets evaluated, and word will get around if he's promoting
cronies, and if his dept gets poor results as a consequence.

3. I mentioned that everyone knows who is naughty and who is nice. A rating
system can be as simple as "who do the patients prefer" and "which teachers do
the parents want for their kids". I expect that would be pretty accurate and
pretty hard to game.

~~~
danieltillett
Walter, nobody wants to work for a boss that considers them an enemy, which is
why there is so much politics in the workplace. Almost nobody is honest with
their boss about what they think of them.

Word might get around about the boss, but when your job is on the line most
people toe the line - as the Japanese say, the nail that sticks out gets
hammered down.

The problem with using metrics of what patients and parents think is that this
is not a good measure of what makes a good doctor or teacher. You can be an
incompetent doctor with a fantastic bedside manner, or a teacher all the
parents relate to but who can't teach a fish to swim. The metric here is not
the one you really want optimised.

On this topic, I personally prefer doctors who have a very poor bedside manner,
because I assume they must have optimised competence over personal relations.

~~~
WalterBright
> The problem with using metrics of what patients and parents think is that
> this is not a good measure of what makes a good doctor or teacher.

I think it's natural to assume this is not a good measure. Let's take
teachers. I don't believe a teacher can consistently fool parents into
believing they've taught the kid a lot when the kid is obviously not learning.
And bluntly, the teacher is supposed to be working for the parent. Why
shouldn't the parents decide who they want teaching their kids?

------
darawk
This article is idiotic. If you have a problem with tests disincentivizing
physical fitness, add a test for physical fitness. What's the problem?

And in medicine the problem is even more extreme, and comes down even more
obviously in favor of _more_, not less, measurement. Here's an example:

[http://www.nejm.org/doi/full/10.1056/NEJMsa0810119](http://www.nejm.org/doi/full/10.1056/NEJMsa0810119)

~~~
pak
You found a single NEJM paper by the celebrated "checklist champion" Atul
Gawande, for a method that applies to only one specialty, surgery. How can you
be sure that this generalizes to all of medicine?

Given that doctors in most other fields spend a ridiculous ratio of time (by
some metrics, 7:1) doing paperwork or data entry vs. seeing the patient [1], I
would suggest that there is serious opportunity cost in adding more self-
measurements and checklists to most doctors' workloads. Surgeons are likely
outliers who perhaps benefit most from being regressed to the mean.

[1]: [http://well.blogs.nytimes.com/2013/05/30/for-new-doctors-8-m...](http://well.blogs.nytimes.com/2013/05/30/for-new-doctors-8-minutes-per-patient/)

~~~
darawk
I can be sure that it generalizes to all medicine because it's obvious. It
generalizes to all fields. It's why we write unit tests. Nobody likes writing
tests, but we do it anyway because they _work_. Like, really, really well. The
idea that doctors would object to being asked to do the same when people's
lives hang in the balance is crazy to me.

Now, it's certainly possible they're doing a whole bunch of useless paperwork.
And it's certainly possible that there is too much in some areas - but that
isn't a rejection of metrics and objective criteria. It's a rejection of the
particular implementations of those things in particular places.

EDIT: To be clear, I think what is actually needed is an overhaul of the
technology used to collect these metrics and fill out this paperwork. It's
absolutely insane to me that doctors still use paper _at all_, or that they
have to enter the same data multiple times in multiple places, constantly.
That needs to be fixed. And they need to start collecting 100x more data on
their performance with 100x less collection friction, by upgrading their
technology in this regard from what was available in 1970.

~~~
pak
I can totally understand your sentiment. I would just note that writing a unit
test for a class and designing a metric that unambiguously measures good
clinical care are on very different levels of difficulty. In our research
group, we've tried to do some of the latter regarding infection control, and
there is definitely a tradeoff between making metrics explainable in plain
language (allowing mental buy-in by healthcare workers) and measuring
attributable differences in performance, which usually requires a heap of
tricky statistical corrections that few people can grok.

It would be a very interesting future where healthcare workers have the reams
of stats that professional athletes do, and where everybody in the space is
literate enough to intuitively understand each number and its caveats, like
hardcore sports fans. For example, an ICU nurse is bound to encounter more
hospital-acquired infections (HAIs) per week than a nurse in pre-op
assessment, so averaged cases of HAI per primary unit is the proper context
for that stat, much like you'd weigh an NFL running back's recent rushing
yards against the strength of the defenses faced. Then, managers could
optimize their teams to each person's strengths and weaknesses and/or supply
training in the right areas, much like coaches do in sports. This would be a
sea change in medical culture, though, and it would encounter staunch
resistance, because some of the metrics already being pushed on doctors (#
pts/day) arguably incentivize worse care.
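A rough sketch of that per-unit normalization (all figures hypothetical): infection counts are typically reported as a rate per 1,000 patient-days, so units with very different exposure can be compared fairly.

```python
# Sketch: raw HAI counts mislead across units with different exposure,
# so report infections per 1,000 patient-days. Figures are hypothetical.
units = {
    # unit: (hai_cases, patient_days)
    "ICU":    (12, 3000),
    "pre-op": (2, 2000),
}

def hai_rate_per_1000(cases, patient_days):
    """Incidence rate normalized to 1,000 patient-days of exposure."""
    return 1000 * cases / patient_days

for unit, (cases, days) in units.items():
    print(unit, hai_rate_per_1000(cases, days))
# ICU: 4.0 vs pre-op: 1.0 per 1,000 patient-days --
# the raw counts (12 vs 2) exaggerate the gap.
```

The same denominator trick is what lets a stat-literate reader compare an ICU nurse to a pre-op nurse at all, which is the "context" point above.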

To your last point, part of my research now is analyzing electronic medical
record (EMR) data, and coming from any other field you'd be shocked at how
messy it can be. In a way, this is to be expected, because the "vocabularies"
for medical data are huge, and even widely used ontologies have bizarre
properties [1]. I really hope that we can solve the data collection problems
with better technology, but as of now, the incentives for EMR vendors to
overhaul the tech are pretty low: most US hospitals have already picked a
system, and those vendors have everything to lose and little to gain from
overhauling their user interfaces.

[1]: [http://www.healthcaredive.com/news/the-16-most-absurd-icd-10...](http://www.healthcaredive.com/news/the-16-most-absurd-icd-10-codes/285737/)

------
Outdoorsman
> Whatever we do, we have to ask our clinicians and teachers whether
> measurement is working, and truly listen when they tell us that it isn’t.
> Today, that is precisely what they’re saying.

The simplest solution, which very often turns out to be the most effective and
least expensive one, is to ask those personally involved in a system to
improve it themselves...if they have trouble getting started, offer incentives
for them to do so...

That (first) step is all too often skipped in our "age of impatience"...

~~~
doppioandante
I'm not sure I've understood your point, but it has some problems. My
parents are high school teachers in Italy. Right now the government is
requiring a basic evaluation system for teachers. Tests here are hardly ever
standardized and can vary a lot; tests for subjects like Latin, which is still
taught the way it was 100 years ago, would be impossible to standardize
anyway, so objectively assessing student performance is impossible. The
government could not come up with metrics, for whatever reason, and now
teachers have to decide these criteria themselves. Of course nobody agrees on
what such criteria should be, and those I've heard about from my parents are
hardly useful for improving teaching quality.

~~~
Outdoorsman
I think I understand the difficulties you identify...maybe I should have done
a better job expanding upon, or explaining, what I'm advocating, which is
simply the idea of "continuous improvement"...

In short, I'm in favor of involving those closest to what is perceived as a
"problem" (those in direct contact, those in the actual environment) in
devising a solution, or solutions...versus having an outside entity recommend
improvements that might, or might not, improve outcomes...

Improvement is often hard, and when it comes it comes in
increments...sometimes those increments are incredibly small, but even
marginal improvements contribute to the success of the whole...even without
"metrics" I think most teachers know, or sense, the difference in student
outcomes their approaches have led to...

I'll bet (although reaching a consensus is difficult, especially across
academic disciplines) that each teacher, at the end of a course, has in mind
some way that their efforts could have been at least marginally more
productive and could quickly answer the question, "What should I do
differently next time?" ...

The best teachers do this automatically...and it makes all the difference...

------
Spooky23
Bean counting is less effective if you don't know what a bean is.

------
scottshepard
Being held accountable to metrics will incentivize some behaviors over others.
It's not surprising that when Arts and Phys Ed are not on the test, they are
dropped from the curriculum. I think student, teacher, doctor and patient
happiness should be one of the key metrics that is measured and reported along
with test scores and mortality rates. That would incentivize schools and
hospitals to take a more balanced approach.

~~~
crpatino
> I think student, ... happiness should be one of the key metrics

I have seen it first hand. Students are always happiest when a charming
instructor provides bland infotainment that allows them to sustain a pretense
of high achievement without trying very hard. The second happiest students are
those of mediocre teachers who give good grades to everyone, so no one is very
motivated to rock the boat. As a matter of fact, a teacher has to descend
pretty low in terms of incompetence before the students will actually be less
happy than with a lecturer who knows their subject and expects them to put
actual effort into learning.

Of course, all other things being equal, stern lecturers are more disliked
than relaxed instructors. That is not what I am talking about.

------
james1071
Management often likes to use 'performance metrics' and other statistics as a
means of providing cover for their own decisions.

They hide behind the numbers, which they have rigged, to avoid taking any
blame.

------
alberte
I think in all these discussions of bean counters, the underlying assumption
non-bean-counters make is that the bean counters are trying to improve things.
That, in my experience, is seldom the case. Mostly, the act of bean counting
exists to create positions for bean counters and to insert them into the
process. There is no end game here; they will happily hold meetings to discuss
processes and improve metrics, because to them, bean counting is all there is.
Without bean counting they would be unemployed. Once you let them in, you're
screwed, because they're impossible to get out.

Edit: Why the down vote? Perhaps some argument? (Oh, of course, it was
probably a bean counter)

I posted this as a very real comment - once you accept that what you are doing
is an item measurable by parties who aren't professionals in your field (bean
counters), then any qualitative aspects of the profession are lost. Only the
quantifiable is left, and because the bean counter doesn't communicate in the
language of the profession, you are forced to learn theirs: a language of
quantifiables and reduction to line items. The rise of the professional
manager, who necessarily has no expertise in the field they are managing, must
of necessity reduce all non-quantifiables to zero value. Thus you end up in
the pickle that educators and physicians find themselves in. In the past,
administration was performed by members of the profession, so they could
communicate using the same language.

