
High-achieving teacher sues state over evaluation labeling her ‘ineffective’ - ColinWright
http://www.washingtonpost.com/blogs/answer-sheet/wp/2014/10/31/high-achieving-teacher-sues-state-over-evaluation-labeling-her-ineffective/?tid=sm_fb
======
gabemart
> If it sounds as if it doesn’t make a lot of sense, that’s because it
> doesn’t.

To me, it sounds like it makes absolute sense. If you want to evaluate how
good a teacher is, one of the things you need to measure is how well their
students performed vs. how well those same students would have performed under
the hypothetical average teacher.

The particular VAM system used to estimate this metric may be flawed or even
completely broken. But at the point in the article this pull quote is located,
that argument had not been made.

> VAMs are generally based on standardized test scores and do not directly
> measure potential teacher contributions toward other student outcomes.

> VAMs typically measure correlation, not causation: Effects – positive or
> negative – attributed to a teacher may actually be caused by other factors
> that are not captured in the model.

These are criticisms of the VAM system, but I imagine they are equally valid
criticisms of other methods of teacher evaluation. I find it hard to imagine a
practical system of evaluation that accounts for all background factors.

> The lawsuit shows that Lederman’s students traditionally perform much higher
> on math and English Language Arts standardized tests than average fourth-
> grade classes in the state.

This argument uses the same standardized test scores and confounding
background variables that the VAM system is criticized for! Further, it seems
obvious that a set of students doing well does not, in isolation, indicate
their teacher is a good teacher. Students in a top set or from a great school
or from a wealthy neighborhood might be expected to outperform state averages
regardless of teacher quality.

My biggest problem with the article is that it doesn't describe alternative
methods of teacher evaluation. VAM may be flawed, but how does it compare to
the other methods? If we accept that teacher evaluation based on student
performance is necessary (is it? I have no idea!), what's a better way to do
it?

~~~
pdonis
_> If you want to evaluate how good a teacher is, one of the things you need
to measure is how well their students performed vs. how well those same
students would have performed under the hypothetical average teacher._

But you can't measure that.

 _> it seems obvious that a set of students doing well does not, in isolation,
indicate their teacher is a good teacher._

True. But in this case, at least, we're not talking about "in isolation".
We're talking about a 17-year track record of a teacher's students doing well.

 _> what's a better way to do it?_

When I went to school, the people who made these judgments were my parents.
You can't make these judgments by formula, and you can't make them if you
don't know the details of each individual case. To me, the fact that so many
schools are fixated on "data-driven" student evaluations means that parents
are not engaged.

~~~
nickff
> _" When I went to school, the people who made these judgments were my
> parents. You can't make these judgments by formula, and you can't make them
> if you don't know the details of each individual case. To me, the fact that
> so many schools are fixated on "data-driven" student evaluations means that
> parents are not engaged."_

Many parents know their children are receiving a substandard (or even
damaging) 'education'; the problem they often face is that they are powerless
to fire the teachers, or switch schools. Being 'engaged' does basically
nothing to fix the schools in these cases. Rich parents exercise school choice
by moving to affluent communities with good schools, but the poor often do not
have this option.

The VAMs aim to reliably give the school administrators access to the
knowledge the parents (and often principals) usually already have, as well as
give cause for discipline, incentive, or firing.

~~~
moogleii
I could be interpreting your comment wrong, but it seems to put the onus
heavily on the teacher, which seems a bit naive. My friend just started
teaching in New York actually. And she specifically wanted to teach at a Title
1 school (i.e. poor) in order to reach students with less income and
opportunity (she teaches an East Asian language in a predominantly non-Asian
minority community). That was the idealized plan anyway. Yet, the school won't
give her basic supplies. There is no lounge. No fridge. Ok, fine, that's
tolerable (odd to me since I too was raised in an affluent community). But the
teachers need paper. There is a locked-up supply room full of supplies, but
the teachers are told they can't have access to it. Why? Nobody knows.
Printing rights are curbed.

And on top of all that, the NYC DoE requires a master's. That equals debt, if
you went to a "good" school. But the DoE pays crap. Oh, and for some reason at one
point, the DoE lowered requirements to be a principal. 3 years of part time
teaching, and you could be a principal. Her principal taught dancing for 3
years, and is now purportedly fit to analyze the effectiveness of foreign
language teaching methods. But anyway, that's beside the point, that's just
describing the environment.

Her program is new. But instead of asking students/parents and filtering for
those who might be interested in an East Asian language, the administrators
decided to randomly force students into the class, regardless of their prior
language history or their year. You know what you get with that? Hostile
students. The kind that scream fuck you in your face. OK, fine, just
"standard" difficult students. But like any other job, if your "manager" has
your back, you can usually deal. But this school doesn't believe in detention.
Okay... The "dream" is that if a student is causing trouble, the deans will
talk to said student. Uh oh, what wasn't accounted for was the saturation of
the deans' time due to troubled students. Now you've got deans telling you
that, sorry, they can't deal with disruptive students telling you to fuck off,
because they're overloaded. So now you have no fire support and you're in the
trenches alone. I'm not even going to get into the gay teacher's story. Once
the students caught onto that...

This doesn't even include off-the-clock work that is required, which I won't
get into. I'm sorry, but as someone in the tech field, or any privatized
field, the shit that teachers have to put up with is insane.

I know teachers have been demonized, but the turnover rate in NYC for teachers
is apparently extremely high (I have another friend working at the DoE
itself). I'll have to get a source, but I seem to recall her saying it was
around 70% after 2 years. And after hearing all the ridiculous war stories,
I'm not surprised. Ha, another one of my teacher friends was moved to an empty
classroom. Upon asking for desks, the administrators told her "we don't know
where they are" and left it at that. So she had to essentially salvage desks
marked for discard.

I'm not exactly sure how VAM works, but I'm skeptical that an algorithm can
model something so complex.

~~~
michaelchisari
> I'm not exactly sure how VAM works, but I'm skeptical that an algorithm can
> model something so complex.

And even if it could, should an algorithm like this be obscured from view?
Should we rely on "black box" algorithms like this, or should we at least
insist that, if not the code, the research behind the code be released openly
so that it can be held to proper scrutiny?

------
randyrand
> The lawsuit shows that Lederman’s students traditionally perform much higher
> on math and English Language Arts standardized tests than average fourth-
> grade classes in the state. In 2012-13, 68.75 percent of her students met or
> exceeded state standards in both English and math.

Wow. WaPo is critical of VAMs and then they use a state-to-classroom
comparison to show that she is effective? Doing better or worse than the state
average is just about the _worst_ measure of a teacher's performance. Entire
school districts tend to perform well mainly as a measure of how well-to-do
that school district is.

This is exactly the comparison VAMs are trying to prevent: measuring value
_added_ as opposed to the value that was already there. That WaPo thinks you
can measure teacher performance in such a naive state-vs-classroom way really
detracts from the article.

~~~
modzu
very interesting issues, but indeed the wapo article is garbage..

for those asking: [http://en.wikipedia.org/wiki/Value-added_modeling](http://en.wikipedia.org/wiki/Value-added_modeling)

~~~
graycat
Good. See also my related <rant> below.

------
aidenn0
VAMs are an issue for the reasons mentioned in the article, but just because a
teacher reliably produces the highest performing students does not mean s/he
is a good teacher.

There was a teacher who was the only teacher of the highest track Algebra II
class at a local HS and who had such a terrible reputation among the students
that some would drop down one track in math to avoid having her. Numerous
students (and their parents) complained to the administration, and the
official reply was: "Our top math students all came out of her class" -- a
rather specious argument, since all of the top math students also went into
her class.

Unofficially she was close to retirement age, had seniority in the department,
and nobody wanted to poke the beehive of forcefully reassigning her classes.

~~~
th0br0
Shouldn't, on the other hand, the fact that "all the top math students also
went into her class" be counted as positive for her? I mean, based upon your
description, those who attended her classes excelled. That is good. So I guess
parents didn't want to put their children into her classes because that
wouldn't have worked out; but maybe the reason for that is that their children
were simply unfit for the level?

~~~
tjradcliffe
Read the description carefully:

> the only teacher of highest track Algebra II class at a local HS

Being the only teacher of the highest track Algebra II class in the school
means that anyone who wanted to take the highest track Algebra II class--which
plausibly contains many of the best math students--would have to take it from
this teacher.

The whole point of the kind of value added modelling that is the centre of
this case is that it attempts to factor out things like the quality of the
student to estimate the quality of the teacher, precisely so that bad teachers
who by dint of circumstances are associated with high-performing students
don't get high ratings.

The problem is that if student background accounts for the greater part of
performance, even a good teacher may have difficulty scoring highly if they
happen to get a "good" class (one that scores highly on student quality).

On a larger scale, the anecdote we are discussing here suggests that teachers
as individuals may not make that much difference to students' performance,
since this bad teacher was still able to turn out the best-performing students
thanks (one is supposed to presume) to selection effects alone.

------
SavvyGuard
Reading through the report from the ASA (that doesn't really "slam" the VAM
statistic but rightly points out the flaws inherent to any attempt to use
statistics in areas with many confounding factors), it appears as if the VAM
is usually derived thusly:

1. Calculate a regression model for a student's expected standardized test
scores based off of background variables (like previous scores, socioeconomic
status, etc.). This includes having teachers as variables.

2. Use the coefficient for each teacher, as determined by the model, as that
teacher's "Value Added" metric.
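The two-step recipe can be sketched in a few lines. This is a toy example
with made-up numbers, not any state's actual model; `true_effect` is invented
purely so we can check whether the regression recovers it:

```python
import numpy as np

# Toy sketch of the two-step VAM recipe (all numbers are invented).
rng = np.random.default_rng(0)
n_students, n_teachers = 300, 3

teacher = rng.integers(0, n_teachers, size=n_students)
prior = rng.normal(70, 10, size=n_students)       # last year's score (background)
true_effect = np.array([0.0, 2.0, -2.0])          # hypothetical "value added"
score = 0.9 * prior + true_effect[teacher] + rng.normal(0, 3, size=n_students)

# Step 1: regress scores on background variables plus teacher indicator columns.
X = np.column_stack([
    np.ones(n_students),                           # intercept
    prior,                                         # background variable
    (teacher[:, None] == np.arange(1, n_teachers)).astype(float),  # dummies vs. teacher 0
])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)

# Step 2: each dummy coefficient is that teacher's estimated "value added"
# relative to the baseline teacher.
print(beta[2:])   # roughly [2, -2], matching true_effect relative to teacher 0
```

The standard errors on those dummy coefficients are exactly where the ASA's
precision criticism bites: with a realistic class size instead of hundreds of
students per teacher, the estimates get much noisier.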

The weaknesses in such an approach are also spelled out in the report: namely,
missing background variables, lack of precision, and a lack of time to test
for the effectiveness of the statistics themselves.

What's interesting is that the teacher in question was rated as "effective"
the year before. The question is whether that rating was also based on her VAM
score for that year, and what the standard error was on her regression
coefficient. Unfortunately, the article doesn't mention any of that.

~~~
x0x0
The problem with regression models is that, in skilled hands, it's easy to
manipulate the results. And that's without even opening up the rat's nest that
is causality.

For instance, want to raise the R^2, a value foolishly used to characterize
how much the model explains? Add more variables. R^2 is monotonically
non-decreasing in the number of variables. So, for example, add the first
letter of the teachers' middle names as an explanatory variable. R^2 will
probably increase a bit.
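That property is easy to demonstrate with simulated data (everything here is
invented; the extra column plays the role of the middle-name variable):

```python
import numpy as np

# Adding a pure-noise regressor can never lower OLS R^2.
rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)

def r_squared(X, y):
    """OLS R^2 for a design matrix X that includes an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    tss = (y - y.mean()) @ (y - y.mean())
    return 1 - (resid @ resid) / tss

X1 = np.column_stack([np.ones(n), x])             # sensible model
X2 = np.column_stack([X1, rng.normal(size=n)])    # plus an irrelevant variable

r1, r2 = r_squared(X1, y), r_squared(X2, y)
print(r1, r2)   # r2 >= r1, even though the added variable is noise
```

This is why adjusted R^2 or held-out prediction error, not raw R^2, is the
honest way to compare models of different sizes.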

Is there heteroskedasticity? How much? What did they do to correct for it?

What observations are considered outliers and dropped, and who makes that
determination?

Or, want to tank a teacher's score? Assuming teachers are added as something
like indicator variables, there are lots of techniques to make the standard
error increase, allowing you to say that 0 is within the CI of B_{teacher}.

If they are using GLMMs -- as they probably ought to be -- there's even more
room for a skilled statistician to pick outcomes, as more and more of the
setup is a judgement call.

Finally, there's an open question of how well the exams were designed and
whether they accurately measured the students' pre- and post-teaching
performance; there's a whole field -- psychometrics -- devoted to testing
alone.

------
btown
Perhaps I'm naive, but it seems like a model used for decision-making should
be one that can show _predictive performance_: one that can predict, based on
historical data about a set of students and a specific teacher, how well a
teacher would do teaching that set of students. If it can't be accurate in
that, how is it possible to know that it's capturing enough of the variables?
And it seems that VAMs are decidedly _not_ such a model.

~~~
Periodic
It's hard to know what the system really does because the article really
doesn't explain it.

I think what you're describing is cross-validation[1]. It would work if they
are predicting performance, but it sounds like the VAM system might be trying
to figure out what a hypothetical "average" teacher would have achieved with
the same students and comparing that to the actual teacher's performance. This
is basically trying to predict how the students will do independent of the
teacher, but without such a teacher there is no real way to validate the
model. Perhaps if they examined students across all teachers.

The system may ultimately be more about comparing teachers to each other and
not about actually determining the value provided by an individual teacher.
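For what it's worth, the held-out check might look something like this on
simulated data (every number below is made up; this is a sketch of k-fold
cross-validation, not of the actual VAM):

```python
import numpy as np

# Predict student scores from background alone, scored on held-out students.
rng = np.random.default_rng(2)
n, k = 200, 5
prior = rng.normal(70, 10, size=n)
score = 0.9 * prior + rng.normal(0, 5, size=n)    # noise sd 5 -> variance 25

folds = np.array_split(rng.permutation(n), k)
errors = []
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(n), test_idx)
    # Fit the background-only model on the training students...
    X_train = np.column_stack([np.ones(train_idx.size), prior[train_idx]])
    beta, *_ = np.linalg.lstsq(X_train, score[train_idx], rcond=None)
    # ...and score it on students the model never saw.
    X_test = np.column_stack([np.ones(test_idx.size), prior[test_idx]])
    errors.append(np.mean((X_test @ beta - score[test_idx]) ** 2))

print(np.mean(errors))   # held-out MSE; should sit near the noise variance (25)
```

If a state's model can't demonstrate decent held-out error on this kind of
prediction task, that's a warning sign before any teacher coefficient is
trusted.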

1: [http://en.wikipedia.org/wiki/Cross-validation_(statistics)](http://en.wikipedia.org/wiki/Cross-validation_\(statistics\))

------
afarrell
A fundamental problem in evaluating teachers is that the value they add
matures over many years.

------
pessimizer
Many of the reasons why every value-added scoring system in use is terrible:
[http://garyrubinstein.teachforus.org/2012/02/26/analyzing-released-nyc-value-added-data-part-1/](http://garyrubinstein.teachforus.org/2012/02/26/analyzing-released-nyc-value-added-data-part-1/)

~~~
tgb
Thank you for pointing to a much better argument against this than the
original link provides.

------
dthal
> In 2012-13, 68.75 percent of her students met or exceeded state standards in
> both English and math. She was labeled “effective” that year. In 2013-14,
> her students’ test results were very similar but she was rated
> “ineffective.”

That sure makes it sound like the measure is unstable. If it is, then, at a
minimum, output for a single year should not be used by itself, but only in a
rolling average with other years. It seems unlikely that the effectiveness of
a veteran teacher would change that much from year to year. Given that there
was little change in outcomes (no big drop in test scores), the hypothesis
that the measure is unstable seems more likely.
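The rolling-average idea is simple to sketch (the yearly scores below are
invented, with one anomalous year thrown in):

```python
import numpy as np

# Smooth a noisy per-year effectiveness score with a 3-year rolling average
# before acting on it (scores are made up).
yearly = np.array([0.8, 0.75, 0.82, 0.2, 0.78])   # one anomalous year

window = 3
rolling = np.convolve(yearly, np.ones(window) / window, mode="valid")
print(rolling)   # the single bad year is damped rather than decisive
```

A single-year blip still pulls the average down, but it can no longer flip a
17-year veteran from "effective" to "ineffective" on its own.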

~~~
tgb
Well, this is one teacher in the state of New York. Even a very stable
measurement might give a few funny results every now and then.

(Of course, she's probably not the only teacher in New York with a similar
story. But I don't know that there are very many.)
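To put a rough number on "a few funny results" (both figures below are
invented for illustration, not New York's actual counts or error rates):

```python
# Back-of-envelope: even a small per-teacher error rate produces many
# misclassified teachers statewide, every single year.
n_teachers = 70_000    # hypothetical statewide teacher count
error_rate = 0.01      # hypothetical chance of a badly wrong rating per year
expected_wrong = n_teachers * error_rate
print(expected_wrong)  # roughly 700 teachers with a "funny result" per year
```

So even a measurement that is right 99% of the time would generate lawsuits'
worth of wrong labels annually.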

------
analog31
It always cracks me up when every high level education administrator is
referred to as a "reformer." It's like referring to members of the Chinese
government as "revolutionaries."

------
cynusx
looks like the data scientist behind this has some work to do figuring out why
this case was badly classified (if it was).

~~~
manicdee
You're making an assumption that has no supporting evidence :)

------
sphink
Fitting to a statistical model superficially makes sense. But I think the
details kill it.

The outcome you are measuring is the change in test score from before having a
teacher and after. VAM attempts to statistically estimate the teacher's
contribution to that change.

Presumably, the test is of something that theoretically the students will not
know beforehand. Which means the teachers don't want students who study on
their own (or participate in activities where that knowledge might be useful).
And they don't want students who aren't going to learn it -- whoops, that was
a leap, I meant to say who aren't going to test higher at the end. So you
don't really want the top tier nor bottom tier coming into your class.

Not specific to VAM, but a result of standardized test results being used for
anything meaningful to the teacher (salary, tenure, etc.) is that anything not
on the test has an opportunity cost, and so will be omitted in favor of test
prep. The more statistical validity VAM has, the stronger this effect
will be. If the teacher shows the students how to incorporate their new
knowledge into a broader perspective, it may make the school's scores improve
but it will screw over the next teacher in line (because the before test will
be higher). So there's some peer pressure to make sure the students learn
_nothing_ that they're "supposed" to learn later.

If you consider a subject like math, what happens is that at some point many
students fall behind. This makes the later topics much, much harder, because
they build on what they never quite understood. A perfect teacher would figure
out what balance of old and new material to give each individual student. That
perfect teacher would score poorly on VAM compared to a teacher who crammed in
test-specific mechanics and regurgitation, relying on dismal beginning test
scores to make poor but not awful ending test scores look good. The system
would gradually optimize for squeezing incremental gains out of improperly
taught students.

And don't forget that the outcome is what's measured, and what's measured is
crap. In football, you can look at a score (or just who won). Here, the
structure is tuned to produce students who can do well on year-end tests and
nothing else; certainly not to apply their knowledge to situations unlikely to
show up on a test.

Ok, this became more of a rant against standardized testing, but it just
bothers me that adding statistical power magnifies the problems. You'd be
better off throwing in a large random component, so that teachers' innate
desires to teach well have a chance at winning out over gaming the system.
Because even if your population of teachers is really conscientious, you're
actively selecting for those most willing to play the game. And selection
always wins in the end.

~~~
Retric
You're assuming the delta is based on just the prior test scores vs. this one,
i.e. that old 10 to new 15 and old 80 to new 85 count as the same improvement.
However, statistically there is a tendency to regress toward the mean, making
simply staying at 80 end up as statistical progress. I suspect they're using a
flawed model that ignores the tendency for school districts to stack
high-performing teachers on top of other high-performing teachers. To correct
for this you need to look at what happens when someone moves from one district
to another.
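Regression toward the mean is easy to see in a toy simulation (all numbers
invented, and the teacher effect is zero by construction):

```python
import numpy as np

# Students who score high one year tend to score lower the next even with
# zero teacher effect, so a class that merely "stays at 80" has already
# beaten the statistical expectation.
rng = np.random.default_rng(3)
n = 100_000
ability = rng.normal(70, 8, size=n)           # stable student ability
year1 = ability + rng.normal(0, 6, size=n)    # score = ability + test-day noise
year2 = ability + rng.normal(0, 6, size=n)    # second year, no teacher effect

high = year1 > 80                             # classes selected for high scores
print(year1[high].mean(), year2[high].mean())
# the year-2 mean falls back toward 70 with no teacher involved at all
```

A VAM that doesn't model this will systematically penalize teachers who
inherit high-scoring classes, which is exactly the pattern in the lawsuit.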

PS: There is a fair amount of momentum in many subjects, so teachers can
impact not just this year's test results but next year's as well. In the end
it's really difficult to come up with a high-quality model, and my guess is
they simply did not bother.

------
seesomesense
There is a simple and easy solution. Private schools.

------
seesomesense
Unionised teachers always complain about evaluation.

------
graycat
Warning: <rant>

Controversial opinions. Long road with sharp curves ahead. Author spent too
much time in school, as both a student and a professor. YMMV.

My guess is that the article deliberately omitted the main point: As good as
the teacher was and as good as the performance of her students was, the
evidence from the testing of that teacher on her 4th grade students on 4th
grade work was that that teacher _added_ relatively little to what her
students already had when they entered her class. Or, in a sense, with the way
their teacher evaluation system works, the _good_ teachers of her students
were in K and grades 1, 2, 3 so that by the start of grade 4 the 4th grade
students were already doing really well at grade 4 work and, with the way the
measurement of amount _added_ by that 4th grade teacher was done, that teacher
was seen to have _added_ relatively little. So, _adding_ relatively little,
she was evaluated as not doing well.

 _Added_? Why the emphasis on _added_? Well, suppose you are teaching 4th
grade, your students come into your class struggling with 1st grade work (it
can happen), you work hard, and in the one year get your students good at 1st
grade work, 2nd grade work, and 3rd grade work, whew, three years of work in
one year, but, still don't have the students good at 4th grade work. So, such
a teacher, taught three years of work in one year, should, still, be rated
very effective. So, likely such situations are the source of the interest in
measuring what was _added_.

Now measuring the _added_ amount is likely, in some cases, tricky both for
both testing and statistically. And, likely the severity and rigidity of the
system kept the 4th grade teacher from moving her 4th grade students on to 5th
and 6th grade work and, instead, kept them grinding away at just the 4th grade
work they already knew so well that the teacher had little to _add_. And if
she had moved her students ahead to 5th and 6th grade work, then the test, of
just 4th grade work, would not show that progress and the teacher again would
be measured as not _adding_ much.

So, net, for a 4th grade teacher, a really good 3rd grade teacher is a tough
act to follow!

It's sad to see such stress and struggle with K-12 teaching. We are so wound
up with the goal of "no child left behind" and, to this end, coming up with
systems to beat up on teachers that don't get us to that goal, that we have
really poor systems of evaluation and, with high irony, fail at some basic
academic tasks and have far too many _false alarms_. Bummer.

Or, if you can't do statistics well, then don't do statistics at all. We're
better off with no statistics than with bad statistics. Have I seen some
really bad, we're talking brain dead, statistics in K-12 education, up close,
and personal? Yup!

Net, I wouldn't trust the Statistics and Evaluation Branch of the New York
State Department of Education to get 2 + 2 = 4 correct. Why? I've seen just
way too much fumbling with statistics. Or, bluntly, effective application of
statistics to important real situations is mostly quite far beyond the
abilities of ordinary organizations -- it's just too hard for them; they just
can't get it right; they make messes and do harm.

Here's one way to slow down nearly any application of statistics: Go to some
statistics texts, get the assumptions, and then demand that the assumptions be
justified. One assumption? Sure, _independence_ -- that assumption is so
powerful that, in anything much closer to daily reality than quantum mechanics
or dice rolling, it is essentially impossible to justify.

Looks like the goal of "no child left behind" has generated a massive _bozo
explosion_.

My dad was a fantastic educator, and his description of the ideal in education
was a student sitting at one end of a log and a good teacher sitting at the
other end. Try to characterize this educational environment with _statistics_?
As they say in New York, f'get about it.

But, really, no worries, mate: There is a safety valve -- the main source of
_education_ anyway, the home. E.g., I had a friend who went to a NYC school
where most of the students knew only two words, and they could say those two
with a wide variety of variations. The abbreviation of those two words was
just "MF". That's all they knew. Not very articulate, but, then, usually they
did get their meaning across, but, then, their meaning was not very
_advanced_, either.

My friend? In the third grade, he was sick at home for a week with the flu,
and his mother was shocked to discover that he didn't know how to read. So, in
that week, she taught him. Then he knew how to read. In school? Maybe he also
learned different ways to say MF.

Education? He did quite well: Got PBK at SUNY, Ph.D. at Courant, and was a
Member of the Institute for Advanced Study at Princeton. His education was (1)
in K-12 or (2) at home? Three guesses, the first two don't count! Or, four
years, K-3, the schools couldn't teach him to read, and his mother did it in a
week.

Yes, maybe by grade 12 he knew the binomial theorem, and maybe his mother
didn't teach him that at home, but, really, still, his _education_ , the real
key to his education, was at home.

My dad told me about a basic book on education, Dewey, _Democracy and
Education_. So, since I was spending so much time in school, I wanted to know
why and read the book. At one point Dewey defined _education_ -- passing down
from one generation to the next, where he was quite clear that what gets
_passed_ is both the good and the bad, not just all good. Well, net, most of
that _passing down_ happens at home, and there's next to nothing K-12 can do
about it.

Actually, a lot of people understand this basic fact and, thus, want
_education_ to start at birth, that is, have the government provide the basic
_home_ parenting in, shall we say, _at risk situations_. I believe you will
find that our current President is in favor of this! In other words, he sees
the _at risk situations_ as so hopeless that for a solution it is necessary to
replace the home itself. Maybe he's correct.

Sorry 'bout K-12: I trust that it really can do babysitting, that is, keep the
kids off the streets and, thus, mostly out of crime and drugs, keep the sixth
and seventh grade girls from getting pregnant, etc. For much more, well, in
some of the _at risk situations_, tough to have much more; or watch the PBS
documentary "The Education of Michelle Rhee":

[http://www.pbs.org/wgbh/pages/frontline/education-of-michelle-rhee/](http://www.pbs.org/wgbh/pages/frontline/education-of-michelle-rhee/)

She tried. She was good. She tried
hard, really hard. In the end, she accomplished basically zip, zilch, zero and
nichts, nil, nada. The teachers themselves commonly believed that the goals
were just hopeless. Or, she was unable to make the K-12 schools make up for
poor homes. Sorry 'bout that. But didn't we know that already?

Well, maybe George Bush believed that education happened in K-12 and that,
thus, we could solve the problem of poor education by a program like No Child
Left Behind in K-12. Well, W also believed that "The Iraqis are perfectly
capable of governing themselves", i.e., have the country stay together, as one
country, a democracy, and not split apart into Shiites, Sunnis, Kurds, and
Others, with fighting, torture, atrocities, civil war, little problems like
those. So, just help the Iraqis write a constitution, hold elections, and all
will come together singing "Why Can't We Be Friends?". Where did W get that
really strong funny stuff he'd been smoking?

Bush 41 was smart enough to stay the heck out of Baghdad. Bush 43 was not.
Maybe W was not the brightest bulb on the tree. "No child left behind"? I
understand: W, if your father had only had that goal in mind!

"More educational statistics, Ma!" Then, for the _poor performers_, "off with
their heads!" Might not make the situation much worse!

YMMV.

</rant>

~~~
mkoryak
YTMND.

------
paulhauggis
I feel like teachers don't ever want to be evaluated on their performance, yet
want raises every year. Without a system in place to get rid of bad teachers
and reward good ones, education will stay broken in this country.

Nearly all professions are evaluated and you can be fired. Why should teaching
be any different?

Unions are one of the main reasons things haven't gotten any better. As soon
as you try to evaluate, the unions step in and stop it.

------
Myrmornis
Meh. Teachers moaning about evaluation. Hold the front page.

