
More States Opting to 'Robo-Grade' Student Essays by Computer - happy-go-lucky
https://www.npr.org/2018/06/30/624373367/more-states-opting-to-robo-grade-student-essays-by-computer
======
thomasahle
> Developers of so-called “robo-graders” say they understand why many students
> and teachers would be skeptical of the idea. But they insist, with computers
> already doing jobs as complicated and as fraught as driving cars, detecting
> cancer, and carrying on conversations, they can certainly handle grading
> students’ essays.

> One year, she says, a student who wrote a whole page of the letter “b” ended
> up with a good score. Other students have figured out that they could do
> well writing one really good paragraph and just copying that four times to
> make a five-paragraph essay that scores well. Others have pulled one over on
> the computer by padding their essays with long quotes from the text they’re
> supposed to analyze, or from the question they’re supposed to answer.

The science is just not there yet. We can barely determine the sentiment of
IMDB reviews; how are we supposed to grade an entire essay?
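
To make the comparison concrete, here is a minimal sketch of the kind of bag-of-words Naive Bayes classifier that sentiment benchmarks like IMDB are typically attacked with. The training data below is a made-up toy corpus, not the real reviews; the point is only how shallow the modelling is: the classifier sees word counts, nothing about meaning or structure.

```python
from collections import Counter
import math

def train(docs):
    """docs: list of (text, label); count word frequencies per class."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in docs:
        counts[label].update(text.lower().split())
    return counts

def classify(counts, text):
    """Add-one smoothed log-likelihood per class (priors omitted for brevity)."""
    vocab = set(counts["pos"]) | set(counts["neg"])
    scores = {}
    for label, c in counts.items():
        total = sum(c.values())
        scores[label] = sum(
            math.log((c[w] + 1) / (total + len(vocab)))
            for w in text.lower().split()
        )
    return max(scores, key=scores.get)

# Toy corpus (invented for illustration).
toy = [
    ("a wonderful heartfelt film", "pos"),
    ("brilliant acting and a great script", "pos"),
    ("a dull boring mess", "neg"),
    ("terrible script and awful acting", "neg"),
]
model = train(toy)
print(classify(model, "a brilliant wonderful script"))  # → pos
```

A model like this has no notion of argument, evidence, or coherence, which is exactly what essay grading would require.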

Working in NLP, it really bugs me when people overstate what can be done like
this. It creates massive expectations that are bound to fail, and then the
entire field suffers.

~~~
JustSomeNobody
> Working in NLP, it really bugs me when people overstate what can be done
> like this.

But they have to. Otherwise funding goes away.

> It creates massive expectations that are bound to fail, and then the entire
> field suffers.

Yes. But by then the people overstating now will have made their money and
moved on.

I agree with you, but honestly companies and investors are so shallow it isn't
hard to see what is going on.

~~~
crispyambulance

> Yes. But by then the people overstating now will have made their money and
> moved on.

Yep, and those people will likely be sending THEIR kids to elite private
schools with 5:1 student-teacher ratios and 20-40K/year tuition for elementary
and high school. These schools will DEFINITELY NOT use "AI" for grading.

~~~
nickserv
It's weird that what is luxury today was once the norm: organic produce, a
stay at home parent, natural (as opposed to synthetic) clothing, etc.

I won't be surprised if human teachers / caregivers end up being reserved for
the upper classes.

------
stakhanov
If you apply to study at the University of Cambridge (UK, not Boston), an
actual faculty member (not a teaching assistant or anything like that) of the
University of Cambridge (not some lesser institution) will take the time to
sit down with you for an extended period of time for an interview, even for
undergraduates (for graduates, it goes without saying). That courtesy is
extended to a boatload of applicants who make that final stage, and the stages
before that certainly don't involve robo-grading.

In the U.S., you take the SAT or GRE or whatever which gets graded by a
mechanical turk or even an actual machine. And even the good colleges don't
have the good sense to ignore the thing (the exception being MIT, at least
back when I applied there, where stating your GRE score was optional when you
applied as a grad). So a less than astronomical score will immediately make it
impossible to get into any decent college. That system is f*cked up.

For me that meant that the only university that would have me also happened to
be the university that routinely scores first or second place in international
university rankings, that university being Cambridge. And I didn't get a
single offer from the US because of a bad GRE score.

After getting my doctorate in NLP there, I can attest to how fucked up the
idea actually is, that you can score essays using NLP.

~~~
crankylinuxuser
The sicker thing is that this whole business of "we'll use machines to do
human work, and badly, because it saves a significant amount of money" is
endemic to the US at large. And sure, it makes a lot of money right now, which
is what the stock markets and VCs care about.

I can highlight individual companies (Google, Facebook, Amazon... etc.) and
they all use various forms of "AI" (cough) to do all sorts of human-related
tasks. And they do it somewhat OK for the standard case, but fail horrendously
all around the edges. But the failing around the edges is an intended side
effect. It costs too much to do it right, so "right now" is what's selected
for.

It's also why HN, Twitter, and Facebook are the final customer support
solution. That's because their customer support portals are echo chambers of
"we're very sorry, but we really don't care".

------
albntomat0
This seems easy to game, especially if you have any insight into the model
behind the grading. I doubt that those running it will release any specifics,
but with a large industry[0] behind test prep, general guidance on how to beat
the system will be available for those who can pay for it.

That said, the SAT writing section was equally silly when graded by humans.
All of the examples of what a good score looked like (per when I took it ~10
years ago) were highly formulaic, rewarding those who invested in playing the
game.

[0]: [https://www.quora.com/How-big-is-the-SAT-test-prep-market](https://www.quora.com/How-big-is-the-SAT-test-prep-market)

~~~
zawerf
If you read the article it actually goes into a lot of fun details on how it
is _already_ being gamed. GAN techniques where you just generate garbage to
get a high score:

> Called the Babel ("Basic Automatic B.S. Essay Language") Generator, it works
> like a computerized Mad Libs, creating essays that make zero sense, but earn
> top scores from robo-graders.

    "History by mimic has not, and presumably never will be precipitously but blithely ensconced. Society will always encompass imaginativeness; many of scrutinizations but a few for an amanuensis. The perjured imaginativeness lies in the area of theory of knowledge but also the field of literature. Instead of enthralling the analysis, grounds constitutes both a disparaging quip and a diligent explanation."

Along with human examples:

> One year, she says, a student who wrote a whole page of the letter "b" ended
> up with a good score. Other students have figured out that they could do
> well writing one really good paragraph and just copying that four times to
> make a five-paragraph essay that scores well. Others have pulled one over on
> the computer by padding their essays with long quotes from the text they're
> supposed to analyze, or from the question they're supposed to answer.

This whole thing is a nice case study for adversarial machine learning.
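
To see why Babel-style garbage can outscore honest prose, here is a deliberately simplified, hypothetical scoring function of my own. It is not the real e-rater model, just a sketch of the failure mode: if a grader rewards surface features like length and "sophisticated" vocabulary, meaningless strings of long words win.

```python
def robo_score(essay: str) -> float:
    """Toy grader: scores on word count and average word length only.
    Real engines use more features, but the gaming incentive is the same."""
    words = essay.split()
    if not words:
        return 0.0
    length_score = min(len(words) / 50, 1.0)  # longer is "better", capped
    avg_word_len = sum(len(w) for w in words) / len(words)
    vocab_score = min(avg_word_len / 8, 1.0)  # long words read as "vocabulary"
    return round(5 * (0.5 * length_score + 0.5 * vocab_score), 2)

honest = "The author argues that testing narrows what schools teach."
# Babel-style padding: long words strung together with no meaning.
garbage = " ".join(["perjured imaginativeness blithely ensconced amanuensis"] * 12)

print(robo_score(honest), robo_score(garbage))  # → 2.19 5.0
```

Against this grader, the one-sentence honest answer loses badly to sixty words of nonsense, which is the adversarial dynamic the article describes.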

~~~
tomtimtall
Just because the essays are auto-graded doesn’t mean no human will ever see
them. It takes far, far longer to grade an essay than it does to skim three
sentences at random, confirm that it’s bullshit, and get the student expelled
for attempting to cheat.

~~~
Drakim
That will stop people from just copy-pasting one statement over and over
again. But it will still create a meta-game of trying to write texts that the
machine learning model likes, as opposed to good texts.

~~~
macNchz
It was already a bit of a game in 2005 when I took the ‘new’ SAT which
included an essay component. Research from the same professor Perelman quoted
in this article had shown that longer essays were graded more favorably (by
overworked humans I believe), so I made sure to fill the whole space at the
expense of quality writing, and got a perfect score.

[https://mobile.nytimes.com/2005/05/04/education/sat-essay-te...](https://mobile.nytimes.com/2005/05/04/education/sat-essay-test-rewards-length-and-ignores-errors.html)

------
lozenge
I'm surprised they didn't even mention the morale issues. How will it feel to
know your answer isn't even going to be read by a person?

~~~
tomtimtall
Math homework can be auto-graded as well. I wouldn’t shed a tear that no one
reads whether I got the arithmetic questions right or wrong. The point of all
the work is to improve your abilities. Your first short story written at the
age of 7 isn’t going to be a blockbuster; it’s just trash you write to improve
your ability to write. Using robo-grading means more absolutely horrible
essays that have no value on their own can be written and graded without a
teacher having to spend the huge amounts of time it takes to grade them.

~~~
sct202
I think that robo-grading is missing the ability to give cohesive feedback on
someone's performance. A teacher who reads a student's papers over the course
of a year (or years) will be able to provide guidance to the student on how to
improve. The robo-graders might be able to spit out a semi-accurate grade, but
they won't be able to tell the student what they could improve and how.

~~~
ycombobreaker
What's more, with black box ML techniques it may be the case that "advice"
exists, but that it cannot be effectively interpreted.

------
jarmitage
If the teachers are using automatic grading, the students should be allowed to
use automatic answering.

In fact, why not stay at home and all interaction can take place via Google's
Duplex!

------
frostburg
Stating that multiple choice tests are free from bias is already highly
suspect.

The decision-makers on this seem to be both tech-illiterate (or to have hidden
agendas) and hilariously arrogant, to the point that they seem to think
epistemology is something that happens to other people.

------
skywhopper
“they insist, with computers already doing jobs as complicated and as fraught
as driving cars, detecting cancer, and carrying on conversations, they can
certainly handle grading students' essays.”

Except that computers can do none of those things, and those three things are
all much simpler tasks than critically grading a student’s essay.

Also, of course the companies selling these products will say they work great.
That’s the very very last person to trust in the conversation.

------
dogma1138
Honestly it might be better than the current system.

I graded papers for national high school exams for a few years (non-US) and
there was a strict grading guide that you had to follow, which took any and
all subjectivity out of the grading.

The essay section grading guide was essentially a template that you had to
follow, and any deviation would result in penalties.

You had about 30-45 min to grade exams that could take students 4-6 hours to
complete, and none of the questions were multiple choice; they were all long-
form questions. With ~30 questions per exam, that time included saving the
answers in the electronic grading pad and, in the later years, in the shitty
VB application they provided. (Which I ended up hacking to add a spreadsheet
import function; this got me kicked out of exam grading because I tampered
with the application...)

We are already robo-grading papers, just with mechanical Turks and in a
process which is much less accurate, which is why we use 2-3 graders per exam.

Until we find a better overall solution for evaluating student performance I
don’t see anything wrong with robo-grading exams and student work; if anything
it might give teachers some more free time to actually teach.

------
abuckenheimer
I think the reality of this is fairly well captured in the 4th to last
paragraph

> Even human readers, who may have two minutes to read each essay, would not
> take the time to fact check those kind of details, he says. "But if the goal
> of the assessment is to test whether you are a good English writer, then the
> facts are secondary."

I would go a step further and say the goal of standardized tests is probably
not to test whether you're a good English writer but rather to see whether
you're good at that test. AI has tremendous ability to streamline the grading
of standardized tests, which are a narrowly focused measure, and I think
that's fine if it comes with proper checks. As many people mention here, I
don't think AI will be able to objectively measure good general writing, and I
doubt we'd fully cede artistic evaluation to a computer as a society, but I
agree with being wary of this.

Standardized tests in general, though, seem to me like one of the greatest
examples of a measure becoming a target and losing a lot of its effectiveness
(Goodhart's law). I can't help but be reminded of Paul Graham's essay on
nerds[1] by this article, because the effective point of school from an
evaluation standpoint is these tests. But now we've built a computer that
grades these tests nearly as well as its human counterparts, and it's
glaringly obvious that this evaluation is tremendously game-able. Which
resonates with Paul's argument that: "The problem with most schools is, they
have no purpose."

[1] [http://paulgraham.com/nerds.html](http://paulgraham.com/nerds.html)

------
nlawalker
>> _But Nitin Madnani, senior research scientist at Educational Testing
Service (ETS), the company that makes the GRE's automated scoring program,
says [...] "If someone is smart enough to pay attention to all the things that
an automated system pays attention to, and to incorporate them in their
writing, that's no longer gaming, that's good writing," he says. "So you kind
of do want to give them a good grade."_

The presence of this quote in the article makes my day; doubly so given its
source. It's a picturesque example of a statement that could conceivably get a
10/10 from a robo-grader based on the language and structure, but the thinking
it represents is completely backwards.

~~~
JoeAltmaier
Write to your audience! That's always the best rule, and a reasonable metric
for 'good writing'. If the audience is a robot...

------
ourmandave
I heard a recent radio ad pushing a "Work from home as a Medical Coder"
training seminar. Sounds dubious.

I wonder if you could create a legitimate service that provides "Work from
home as an Essay Grader."

~~~
Donzo
I stumbled upon this company at ISTE.

They outsource essay grading, similar to what you are suggesting.

[https://www.thegraidenetwork.com/](https://www.thegraidenetwork.com/)

------
westurner
edX can automate short essay grading with edx/edx-ora2 "Open Response
Assessment Suite" [1] and edx/ease "Enhanced AI scoring engine" [2].

1: [https://github.com/edx/edx-ora2](https://github.com/edx/edx-ora2)
2: [https://github.com/edx/ease](https://github.com/edx/ease)

... I believe there's also a tool for peer feedback.

~~~
ghaff
Peer feedback/grading on MOOCs is pretty bad in my experience. There’s too
much diversity of skills, language ability, etc. And too many people who bring
their own biases and mostly ignore any grading instructions.

Peer discussion and feedback are useful in things like college classes. Much
less so with MOOCs.

------
LarryL
> Marder and Henderson worry robo-graders will just encourage the worst kind
> of formulaic writing.

> they will quickly learn they can fool the algorithm by using lots of big
> words, complex sentences, and some key phrases - that make some English
> teachers cringe.

Spot on! That's already the case with real teachers; with a computer it'll be
hundreds of times easier (and horribly counterproductive for the students'
education).

Back in the day, when I was a student, we already did a LOT of padding in our
essays: writing pages upon pages (length mattered enormously!) of loooong
sentences designed to -basically- make the discourse look much more clever
(and accurate) than it really was. I'm still puzzled why our teachers did not
notice the obvious B.S. that it was! Seriously, I managed to pass some classes
(university) with almost no real meat in the text. I just managed to write a
coherent & logical argument, with a minimum of (vaguely remembered) facts, and
that was enough to get a passing grade (but not a great grade of course).
Ridiculous.

\---

Another fundamental problem is that even teachers don't have the same views
on what makes a good essay. For the worst essays, I think they will agree: bad
grammar, typos, no paragraphs, no logical structure, etc. But once you enter
the average (or more) quality it becomes much more difficult to agree on which
text is better! There is a LOT of subjectivity and personal taste, no matter
what directives were given from the educational board, so your results would
vary a lot depending on your current teacher.

Some teachers, for instance, are very narrow-minded and expect students to
"regurgitate" their lessons almost word for word; others appreciate a
student's attempts at insight & originality / independence of thought. I've
been in the French education system for a long time (including 6 years in 3
different universities), and it was shocking to see how much of a grade would
depend on the teacher & their perception of you (because that's also a factor,
even if the teachers will swear it's not the case).

Even for technical subjects it's not obvious AT ALL that the teacher is right
and rewarding the best answers. Now that I have many years of
programming/project management behind me, I plainly see the mistakes & B.S.
that some teachers fed us. And I've had coworkers who were still part-time
students explain some of the incredible misconceptions of their teachers; it
was almost beyond belief! Those teachers really lived in another world, they
had no clue.

Well, at least, if you are graded by a program, you know that you will only
have to learn that program's "tastes/patterns", instead of facing the unknown
of being graded by a random teacher (who may like your style or not). That's a
small consolation.

