

Cheating in Atlanta: A Teachable Moment - testrun
http://www.wsj.com/articles/cheating-in-atlanta-a-teachable-moment-1428521500

======
bsder
Standard WSJ hit piece. It's all the fault of unions.

In spite of the fact that out of 178 people the _superintendent_ and _34_
principals were in on it. People very much _NOT_ in the teachers' union and
generally hostile to it.

But, hey, an actual conversation about education, socioeconomics, and
achievement? Can't have that--people might actually figure out that testing is
a gigantic Republican boondoggle.

Just blame the unions--it's what our readership wants.

~~~
AKrumbach
Okay, let's have that "actual conversation" -- if testing is just a giant
boondoggle, what method(s) do you favor for measuring education and
achievement in a socioeconomic-neutral fashion?

~~~
bsder
The problem isn't testing. Testing is fine and is actually necessary. Even
testing with socioeconomic bias is okay since you can norm to historic values
to see trends.

The problem occurs at two points:

1) when you then use the results of those tests more broadly than applicable.

Comparing results in the same school year on year is generally valid (although
you have to watch out for events like a big employer closing). Comparing two
different schools close in location requires a bit of finesse. Comparing
schools in completely different locations with completely different
demographics is generally useless--yet NCLB effectively tries to lump all
schools together with the same tests.

2) when you apply those tests to provide punitive measures

Why don't people get that doing this is going to distort the system? Software
people wail continuously about "you can't judge my craft with objective
measures." Practically any measure in software I can dream up: number of bugs
fixed, number of bugs found, number of commits, number of lines of code
written, etc. will be met with cries of "that doesn't measure my ability"
_AND_ "if you promote on those measures, people will start gaming the
measures". And they will be right.

Why can't people see that the _exact_ same thing will occur in teaching?

The standard complaint is that only programmers can judge programmers and that
non-programmer managers can't judge correctly. Yet, with respect to teaching,
we magically think that any moron can judge teaching ability and that non-
teaching managers are well equipped to judge teachers.

This is one of my hot buttons. Tech folks are so very quick to defend their
own area with "you can't objectively judge and punish me" and so quick to
apply "but I can objectively judge and punish you."

And, even when a school is doing things right, it's _really_ hard to keep it
going. Positive results take years to show up unequivocally while negative
results can appear almost immediately.

See: [http://www.thisamericanlife.org/radio-
archives/episode/275/t...](http://www.thisamericanlife.org/radio-
archives/episode/275/transcript)

The bit about how quickly things dropped one year when they didn't have
parents come in 3-times-a-year to review report cards is particularly
enlightening. Achievement is getting a _lot_ of little things right, and only
the teachers can see when they're right and when they're wrong.

By contrast, it took 10 _years_ to move from 15% passing reading and math to
66%. That's roughly 5% a year, and I suspect the first years were slower than
that. 5% is statistical noise to a test--which means that all of the things
that were occurring in the first years would have been filtered out by a test.
Not the result you wanted.
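
The arithmetic above can be checked in a few lines. This is only a minimal sketch: the 15%, 66%, and 10-year figures come from the episode as quoted here, while the cohort sizes and the simple binomial noise model are assumptions added for illustration, not anything reported about the school.

```python
import math

# Figures quoted in the comment above.
start_pct, end_pct, years = 15.0, 66.0, 10

# Average yearly gain in percentage points.
avg_gain = (end_pct - start_pct) / years


def margin_95(pass_rate, cohort_size):
    """Approximate 95% margin, in percentage points, for a pass rate.

    Assumption: each tested cohort is treated as a random draw, so the
    pass rate has binomial standard error sqrt(p * (1 - p) / n).
    """
    se = math.sqrt(pass_rate * (1.0 - pass_rate) / cohort_size)
    return 1.96 * se * 100.0


print(f"average gain: {avg_gain:.1f} points/year")
for n in (100, 400):  # hypothetical cohort sizes
    print(f"cohort of {n}: +/- {margin_95(0.15, n):.1f} points")
```

Under these assumptions, a 100-student cohort has a margin wider than the average yearly gain, while a 400-student cohort does not, so whether 5 points is noise depends heavily on how many students are tested.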

So, given that achievement at that school is now dropping, what does the test
tell us to do? Where does the test say: "Whoops. The teachers are the same, so
A) either the children got worse or B) we need to unroll every single change
made by administrators in the last year." Where is _THAT_ application of the
test?

Now do you see the problem with testing?

~~~
AKrumbach
> Practically any measure in software I can dream up ... will be met with
> cries of "that doesn't measure my ability" AND "if you promote on those
> measures, people will start gaming the measures". And they will be right.

Sure, any measurement system will fail, but there are means to make them more
robust. One way is to make it so that what is measured is what is desired:
gaming the system results in more of the desired behavior. Consider a support
desk measured not by time-to-resolution or call volume, but post-resolution
surveys: in order to 'game' the measurement, the agents must actively seek to
promote customer satisfaction.

Another, more controversial, means for more robust measurements might be an
overt, active gamification. Using your example of LoC: instead of unilaterally
praising large (or small) work efforts, LoC is used as the measurement in a
bid war. I'm not envisioning a particular bid process (silent auction? Liar's
dice style round-robin?) but anybody can call BS on the latest bid ... which
then holds that bidder to implement. Beat your own estimate for praise, but
for punishment you would have to exceed the prior (non-BS) wager. Couple in a
second bid value (a deadline, for example) and actively "gaming the system"
becomes nigh on impossible.

Between this article about the Atlanta cheating scandal and the NPR story you
linked, I wholeheartedly believe that the "standardized test" regimen is
currently broken. However, I was hopeful that you might have some suggestions
on alternative measurements or ways we could better apply the current ones.

~~~
bsder
Using tests for an "objective measure" is doomed on its face. That's a gigantic
sign that the people involved are more interested in blame transfer than in
progress.

You can use tests for trends. That's a good use of tests. You can use tests
for feedback to the individuals directly involved. That's also a good use of
tests.

This corresponds to my "software metrics" points. If I suddenly see an uptick
in average lines of code committed, I should take a look. If I start seeing a
big increase in bugs filed, I should take a look and see what just happened.
Most of the time, there's a good explanation. Sometimes, someone is doing
something stupid and I need to go fix that. And, rarely, someone is doing
something malicious and needs to be fired.

Tests should be used as an indicator to a human to "look here!" However, test
results are simply an indicator to a human to exercise judgment--not an
objective measure to remove humans from the loop.
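
A rough sketch of what "look here!" could mean for a software metric. The function name, window, threshold, and sample numbers are all hypothetical; the point is only that the output is a list of places for a human to look, not a verdict.

```python
from statistics import mean, stdev


def flag_for_review(values, window=4, threshold=2.0):
    """Return indices whose value sits far from the trailing average.

    A point is flagged when it differs from the mean of the previous
    `window` points by more than `threshold` sample standard deviations.
    Flagged points are prompts for human judgment, nothing more.
    """
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all is worth a look.
            if values[i] != mu:
                flagged.append(i)
        elif abs(values[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


# Hypothetical weekly counts of bugs filed.
weekly_bugs = [10, 12, 11, 9, 10, 30, 11]
print(flag_for_review(weekly_bugs))  # index 5 (the week with 30 bugs)
```

What happens after a flag (good explanation, something stupid, something malicious) is still decided by a person who goes and looks.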

The problem with education is that people really don't _want_ to use the tests
as "indicators". As in the article I pointed out, that particular school now
is having a _decrease_ in results--the tests are indicating with a big, red
klaxon that something is going wrong. People should be examining the situation
with a magnifying glass to figure out the issues and fix them.

The problem is that "something going wrong" is very clear--a new principal and
new directives from the head bureaucrats. Do you think that they will look at
the test results and say "Oh, we need to undo these policies and get a new
principal?" Unlikely.

Would you like a metric that only ever gets used against you but that your
management gets to ignore? Of course not. It's why the complaints by
programmers about management are legion.

And that's why teachers hate all this stuff. Somehow, it only ever gets used
against the teachers. When the evidence points at the administrators, school
board or politicians, it gets ignored.

This isn't even limited to teachers. Physicians oppose the collection of this
kind of data about outcomes for the same reasons--it will very quickly get
turned into a ranking system that gets used against doctors rather than for
improving outcomes.

------
_petronius
>> If the environment that produced this horrific behavior in Atlanta is
“toxic,” blame the people who control that environment, not the testing regime
that attempts to hold those people accountable. The teachers unions that run
our schools view public education, first and foremost, as a jobs program for
adults. That is why the unions fight so hard to keep open failing schools and
want seniority—rather than teaching ability—to determine layoffs.

I don't think the author of this piece manages to back up his argument very
well. The teachers' union says that the testing regime is flawed, and that the
toxic environment is created by it; the author says that the testing regime is
"flawed" but not "fatally" so, without substantively addressing that concern.
The "people in charge" of this environment are in turn governed by No Child
Left Behind and state/federal regulation and laws -- there's a chain of
authority here, and the brokenness of the system can rightly be argued to flow
down from the highest policy levels all the way to local middle-management.

>> Would you want your child taught by someone who flunked the certification
test five times, let alone 10?

This is a silly argument. How many people take multiple tries to get a driving
license? The test either certifies you or it doesn't.

>> And would that instructor be more or less likely to resort to changing
student test scores to hide his own incompetence?

Absolutely no logical connection, other than a wild assertion by an author
clearly hostile to the teachers' union, and it seems to me, teachers in
general.

Like it or not, public school teachers in the US are vastly underpaid and
overworked (especially in urban school districts with low budgets), and have
their ability to teach severely hampered by testing requirements that may or
may not be beneficial to their students. The whole system is broken, but
demonizing the people that are attempting to educate in these conditions
without trying to understand the pressures at play isn't very constructive.

Further reading: [http://www.thisamericanlife.org/radio-
archives/episode/275/t...](http://www.thisamericanlife.org/radio-
archives/episode/275/transcript)

~~~
Shivetya
First, the environment was toxic because the school administration was corrupt
at the top and the school system had too much influence over the testing. To
prevent further such abuse all testing should be done by accredited third
party groups.

Second, get off this silly mantra that they are underpaid. In some areas yes,
but in Atlanta that is not the case. A simple query of various counties here
produces results, and for the number of days worked their pay is actually quite
good. That, and anecdotally, I am friends with a 10-year teacher who makes good
money and who berates the system (which includes the union) for the problems.
They cannot fix what they are not allowed to.

~~~
neumann
> all testing should be done by accredited third party groups.

These 'accredited third party groups' would need to be a) accredited and b)
paid. And it's already happening here in the US[0]. What you have now is an
overhead of bureaucracy that 'accredits' and outsources the testing to these
private companies, who pay call center wages to mark high school exams based
on shoddy metrics that are aimed at maximising profits. All the markers are
temps/contract, and this accreditation system is essentially a cartel of
companies that lobby to ensure their business remains. I cannot see how this
can end up benefiting the students.

Secondly, teachers being underpaid is not a mantra - it is a fact [1]. In the
US, the worth society places on a job is strongly correlated to the pay. The
accreditation and pay, resources and social respect that teachers (do not)
receive I wager is directly correlated with the quality of the education
students receive. The system is so broken that it relies on standardized
testing to further punish 'under-performing' schools by limiting access to
funding based on scores.

[0] [http://www.theatlantic.com/features/archive/2014/07/why-
poor...](http://www.theatlantic.com/features/archive/2014/07/why-poor-schools-
cant-win-at-standardized-testing/374287/)

[1] [http://www.nea.org/home/2012-2013-average-starting-
teacher-s...](http://www.nea.org/home/2012-2013-average-starting-teacher-
salary.html)

~~~
Shivetya
Neither source you give is reliable, both are heavily biased.

I prefer actual charts from counties that pay, given 190 days of work
[http://www.dekalb.k12.ga.us/human-resources/teacher-
salary-s...](http://www.dekalb.k12.ga.us/human-resources/teacher-salary-
schedule) in a respectable county in Georgia this isn't what I would call
underpaid. This is before benefits. It's not hard to find these charts for most
counties.

The area that needs attention is all the money spent on administrative people
other than teachers. The US has spent increasing amounts of money on education
with near zero results.

------
RansomTime
This article is behind a paywall. Previous HN commenters have pointed out that
you can get the full text of the article if Google is your referrer (i.e., you
google the article title and click on the result).

------
mhuffman
This is not new, nor is it unexpected[1]. If you incentivize an outcome,
people will try to get to that outcome. The problem with No Child Left Behind
is that its authors (foolishly) presumed that everyone would take the most
difficult path to that outcome.

If they wanted to stay with the testing metric framework, they would have been
much better off with external testers. However, that still would have teachers
"teaching to the test" instead of teaching the subject.

[1] [http://www.nydailynews.com/new-york/report-shows-cheating-
te...](http://www.nydailynews.com/new-york/report-shows-cheating-teachers-
article-1.1249570)

