

Reading Incomprehension - bootload
http://www.nytimes.com/2009/09/28/opinion/28farley.html

======
bbg
I've been in this columnist's shoes, and I can confirm what he says: the score
depends on the grader.

At the center where I graded, only 1 in 14 essays was read by a second scorer
for verification. "Calibration" consisted of a half-hour of reading pre-scored
essays and taking a test that required a passing rate (being within some
margin of the "true" score) of only 80%.

There were ridiculous formulas about the number of adjectives, etc., required
for a certain score. Imaginative writing suffered the most; cookie-cutter
writing was rewarded the most.

Scorers worked eight-hour shifts. Sometime after my morning coffee I was the
most generous, and in my afternoon lull, when I could hardly keep my eyes
open, I'm pretty sure I graded the hardest.

I remember one of my colleagues had, sadly, suffered a brain injury in a car
wreck. His scores were all over the place, as I could tell when he asked my
opinion on the essays he was scoring. Others had modest educational
attainments -- people who had maybe gotten a bachelor's degree twenty years
before, but had not really been engaged in mentally rigorous work in the
meantime. Many of them couldn't reliably spot an adjective. They were amazed that
I had an "accuracy rate" of 97%, the highest at my table by a good 15
percentage points.

We were grading fourth-graders' tests from Arizona. The previous cohort at my
center (that I didn't belong to) had graded high school tests from, if memory
serves, Minnesota. (This was the late '90s.) The tests were "high stakes" --
failure on the test kept the student from high school graduation. It later
came to light that some large number of students who had failed the test and
been denied graduation later challenged their scores and successfully
shown that their tests had been severely mis-graded. They were allowed to
graduate, half a year too late.

After the grading period ended, I stayed on a mailing list to hear about job
openings the next summer in that city. However, some time during the spring I
got a letter saying that there wouldn't be a grading center in our city that
year. Wonder why.

------
mquander
I think it's much ado about nothing. Some smart students couldn't really give
two shits about their grade, and then they'll write essays about _Debbie Does
Dallas_ (I've been there). If you're bright and a good writer and you care
about how many points you get, then you'll write a technically strong, boring
essay about vanilla subject matter and you'll get a high score no matter how
dim the grader is.

~~~
hs
i wrote about masturbation back in high school. i got a 4/10 mark with a big red
X across the page corners :D

~~~
sketerpot
Victory, of a sort!

------
ShabbyDoo
This is yet another reason that I worry about one day sending my
preschool-aged sons to public school. Conform, conform, conform. My experience
with large companies tells me that such training might serve them well, although
similar experience seemingly had little effect upon me (I hope).

Last year, my wife and I toured the local Montessori school, and I asked
whether students would be required to take the Ohio state-wide standardized
tests. The teacher, thinking that I was concerned that their "performance"
would not be measured, said that they could optionally take them. When I
explained my relief that their time would not be wasted, she chimed in about
how much she hated the tests as well.

American society is so obsessed with measurement that it ignores the nuances
of what it seeks to optimize upon. By definition, any outstanding achievement
lacks a convenient standard of measurement.

------
pavel_lishin
Some of the problem could be mitigated by having multiple reviewers score
answers and averaging their scores.

Of course this would increase the cost of grading such tests...
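
A minimal sketch of that averaging idea, assuming each essay is scored
independently by several reviewers. The essay names, the scores, and the
`average_scores` helper are hypothetical, made up only for illustration:

```python
from statistics import mean


def average_scores(scores_by_essay):
    """Return the mean of all reviewers' scores for each essay."""
    return {essay: mean(scores) for essay, scores in scores_by_essay.items()}


# Hypothetical data: three reviewers score each essay.
scores = {
    "essay_a": [4, 5, 3],
    "essay_b": [2, 2, 5],  # one reviewer disagrees sharply with the other two
}

averaged = average_scores(scores)
print(averaged)
```

With three reviewers per essay, a single unusually harsh or generous score
moves the result by only a third of its distance from the others, at roughly
three times the grading cost.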

~~~
dfranke
A long time ago, in a faraway and mythical country, which we'll call China,
everyone wanted to know how long the Emperor's nose was. Of course to be seen
even trying to look closely at the Emperor's visage - let alone to hold up a
(different!) ruler to it - would have invited instant, or I should say, far
from instant, death. But so many people were curious, that a group of sages
got together to look for a method of finding the answer, and this is what they
came up with.

Questionnaires were printed and sent out in bundles to cooperating village
chiefs, who distributed them to the peasants. Literacy was at a sufficient
level that most were able to complete the single question, which was, of
course: "How long do you think the Emperor's nose is?"

When the forms were collected, mathematicians added up all the values, and
divided by the number of forms. Thus it was known that the length of the
Emperor's nose was 6.734602 cm. The complete set of data was of course
preserved, and many years later, with advances in statistical understanding,
more advanced mathematicians pointed out that fringe values - obviously the
product of deranged minds - were distorting the honest opinions of the rest,
and by eliminating them and using the very latest numerical modelling
techniques, the mathematicians corrected this value to 4.980403 cm. To this
day, no-one has produced a better estimate.

~~~
hs
your china emperor story only suggests that the parent post's averaging
scheme consists of reviewers who dream about a 'test-taker' and then give a
score (repeated for as many test takers as there are) ... if they have access
to the takers' answers and keys, then averaging could be done much more
objectively

the china emperor: that's maybe because there's no photograph of the emperor
circulating. if there is, one can make comparisons to eyes, age, coin,
ground-legal-measure etc and clean out the outlier data. averaging the rest
can give a good enough estimate, but not a super accurate one.
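
a rough sketch of "clean out the outlier data, then average the rest", written
as a trimmed mean. the guesses and the `trimmed_mean` helper are hypothetical,
and trimming a fixed fraction from each end is just one possible way to drop
outliers. it only removes extreme values; it cannot fix guesses that were
never based on seeing the nose in the first place.

```python
from statistics import mean


def trimmed_mean(values, trim_fraction=0.1):
    """Drop the lowest and highest trim_fraction of values, then average."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # how many to drop from each end
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)


# Hypothetical guesses in cm; the last one is an obvious outlier.
guesses = [4.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 6.0, 55.0]

plain = mean(guesses)
trimmed = trimmed_mean(guesses)
print(plain, trimmed)
```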

