
Flawed Algorithms Are Grading Millions of Students’ Essays - elorant
https://www.vice.com/en_us/article/pa7dj9/flawed-algorithms-are-grading-millions-of-students-essays
======
dahart
> Utah has been using AI as the primary scorer on its standardized tests for
> several years. “It was a major cost to our state to hand score, in addition
> to very time consuming,” said Cydnee Carter, the state’s assessment
> development coordinator. The automated process also allowed the state to
> give immediate feedback to students and teachers, she said.

Yes, education takes time and costs money. Yes, not educating is both cheaper
and faster. Note how the rationalizing ignores the needs of the students and
the quality of the education.

I live in Utah and my children have been subjected to this automated essay
scoring here. One night I came home from work and my son and wife were both in
tears, frustrated with each other and frustrated with the essay scoring which
refused to give a high enough score to meet what the teacher said was
required, no matter how good the essay was. My wife wrote versions herself
from scratch and couldn’t get the required score. When I got involved, I did
the same with the same results.

Turns out the instructions said the essay would be scored on verbal
efficiency: getting the point across clearly with the fewest words. I started
playing around and realized that the more words I added, the higher the score,
whether they were relevant or grammatical or not. Random unrelated sentences
pasted in the middle would increase the score. We found a petition letter
online calling for a ban on automated scoring for grades or student
evaluation of any kind. It was very long, so it got a perfect score. I
encouraged my son to submit it, and he did. Later I visited his teacher to
explain and to urge her not to use automated scoring. She listened and then
told me about how much time it saves and how fast students get feedback. :/
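A toy sketch of the failure mode described here (the features and weights are invented, not the actual engine Utah uses): a scorer built purely on surface statistics rewards padding, whether it is relevant or not.

```python
def naive_score(essay: str) -> float:
    """Hypothetical scorer for illustration: rewards word count, vocabulary
    size, and average word length -- surface features only, no meaning."""
    words = essay.lower().split()
    if not words:
        return 0.0
    vocab_size = len(set(words))
    avg_word_len = sum(len(w) for w in words) / len(words)
    return 0.1 * len(words) + 0.5 * vocab_size + avg_word_len

base = "Clear concise writing gets the point across."
padded = base + " Purple elephants frequently harmonize quixotic thermodynamics."
# Padding with irrelevant words raises the score.
assert naive_score(padded) > naive_score(base)
```

Under any scorer of this shape, the concise essay loses to the padded one, which matches the behavior described above.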

~~~
piokoch
Frankly, I can't believe what I am reading. The idea that some "AI" grades
essays automatically is idiotic and has nothing to do with education. Where is
the place for discussion? Where is the place for the confrontation of ideas?
Where is the place for developing a writing style? How is this AI supposed to
grade things like repetition (which can be either a good rhetorical tool or a
mistake, depending on context), etc.?

Who the hell came up with such an idea? I would hesitate to use "AI" even for
automatic spell checking, as it is enough to give some character an unusual
name and it will be marked as an error.

My guess is that sooner or later people will learn how to game that AI. I
wouldn't be surprised if there were software that generates essays that the
Utah "AI" likes.

~~~
dagw
_My guess is that sooner or later people will learn how to game that AI._

Already been done. [http://lesperelman.com/writing-assessment-robo-grading/babel-generator/](http://lesperelman.com/writing-assessment-robo-grading/babel-generator/)

Here's a sample essay that is complete nonsense and got a perfect score on the
GRE.

[http://lesperelman.com/wp-content/uploads/2015/12/6-6_ScoreItNow_2015_Feb20.pdf](http://lesperelman.com/wp-content/uploads/2015/12/6-6_ScoreItNow_2015_Feb20.pdf)

~~~
thombat
The final paragraph from that example is steaming gibberish that nobody could
mistake for English:

"Calling has not, and undoubtedly never will be aggravating in the way we
encounter mortification but delineate the reprimand that should be
inclination. Nonetheless, armed with the knowledge that the analysis augurs
stealth with propagandists, almost all of the utterances on my authorization
journey. Since sanctions are performed at knowledge, a quantity of vocation
can be more gaudily inspected. Knowledge will always be a part of
society.Vocation is the most presumptuously perilous assassination of
mankind."

Yet the robo-scoring acclaims it as:

* articulates a clear and insightful position on the issue in accordance with the assigned task

* develops the position fully with compelling reasons and/or persuasive examples

* sustains a well-focused, well-organized analysis, connecting ideas logically

* conveys ideas fluently and precisely, using effective vocabulary and sentence variety

* demonstrates superior facility with the conventions of standard written English (i.e., grammar, usage, and mechanics) but may have minor errors

Any teacher faced with the requirement to use such tools would be better
placed instructing their class on civil disobedience.

~~~
crankylinuxuser
Then let me posit another idea...

There are two ways of finding out about these artifacts of AI essay grading:
pure luck, and being able to afford extensive test prep (being rich).

The luck one can't be accounted for. So I'm led to believe that the purpose
of these essays and their AI grading is to find and escalate rich people.

~~~
JustSomeNobody
> So I'm led to believe that the purpose of these essays and their AI
> grading is to find and escalate rich people.

Well, of course. How many poor people are allowed to decide what is good for
children's education?

~~~
crankylinuxuser
The standard US response is:

"There's a reason why they're poor. Better pull themselves up by their
bootstraps."

School funding is largely tied to local property wealth, so poverty-stricken
neighborhoods end up with poorly funded school systems. And those students
obviously won't have the money or the access to get the test prep needed to
"succeed".

It's all too neatly laid out to be accidental.

------
VikingCoder
My mother worked grading standardized tests. It was a hellish job for many
reasons (limited breaks, etc.)

One question she had to grade was essentially, "What's something you want your
teacher to know about you?"

It was an essay answer, and she was supposed to grade it on grammar, etc. Just
the mechanical aspects of writing. (The real question explained the details
more, but that was the core of the question.)

She saw answers that would make you weep.

"My daddy touches me."

"I haven't eaten today. I don't know when I'm going to eat again."

Stuff like that.

And my mother was going to be the only human who ever saw their responses.
Their teacher had no chance to see their responses, just my mom.

So she goes to her supervisor and asks, "What can we do to help these kids?"

The supervisor said there was nothing they could do. Just grade the answers.

~~~
harry8
Some of these will be 100% true as well. But don't make the mistake of
assuming there are no kids who go for shock value or are wantonly manipulative
when they know it can't come back to them.

So how many are true and how many false? I have no clue. Literally none. And
no it doesn't make me feel any better about the screams of existential agony
even if that were a low percentage. Could be high too.

~~~
dmoy
For the not eating, it's pretty easy to get data. Something like 1 in 5
children in the US live in food-insecure households, and maybe 1 in 20 of
those are severely insecure, so not eating before the school-provided lunch is
common enough that if you're grading tons of papers you'll run into kids like
that.

~~~
stochastic_monk
It could also be a student suffering from anorexia nervosa, which the
confessional aspects of the essay would fit well with.

~~~
JustSomeNobody
I'm confident that your example would account for a smaller percentage than
the cases mentioned in dmoy's comment.

------
drngdds
This is my first time learning that AI-graded essays are a thing. Am I the
only one who thinks that's insane? I feel like you'd probably have to have an
AGI to meaningfully evaluate an essay.

~~~
Spivak
In a forum of CS people I'm surprised this is one of the top opinions. Our
field is full of super surprising results like this -- that you don't have to
actually understand a text beyond basic grammatical structure to reasonably
accurately predict the score a human would give it.

Like, this kind of thing should be _cool_, not insane. I mean, wasn't it cool
in your AI class when you learned that DFS could play Mario if you structured
the search space right?

~~~
cmroanirgo
I came first in English at my school, many moons ago. Leading up to the
finals, I regularly finished ahead of the hardcore English-essay people,
generally to my amusement. My exam essay responses were generally half the
length (sometimes even shorter) of the prodigious writers'. Although I have an
OK vocabulary, I always made sure I chose the right word to hit a specific
meaning, rather than choosing words with a high syllable count.

I'd find it highly interesting to see what kind of result I'd get using an
automated system.

Why?

Because I once asked a teacher (also an examiner) why I got better grades than
the others, and the answer surprised me: my answers were generally unique and
refreshingly different, to the point, not too long, and easy to read.

I suspect with this new system, I'd be an average student. It'd also be
interesting to find out, several years down the road, if the automated system
could be gamed at all -- I suspect it could, and teachers would help students
'maximise' their scores as a result of that.

~~~
rocqua
It seems plausible that, under this system, you would eventually have learned
to write longer essays. To my mind, that would be a school teaching you to be
worse.

In fact, throughout the article I kept being surprised by the idea that long
is good. When writing, I tend to prefer being brief.

------
RcouF1uZ4gsC
Unlike a multiple choice test where the primary audience is automated graders,
the primary audience for an essay is other humans. If even Google and Facebook
with their billions of dollars and billions of posts worth of data, still
cannot always understand the intent and purpose of written content, what hope
do these algorithms have?

If it is cost-prohibitive for every essay to be graded by humans, then they
should be dropped from the tests. Otherwise, we are missing the whole point of
essays which is to communicate effectively with another human, not just match
certain text patterns.

~~~
anigbrowl
If it is cost-prohibitive, then maybe we should adjust the economic model, not
abandon the measurement.

~~~
rocqua
Sure, have fewer essay questions, and start grading them for content, not
form.

If you want to grade on form to test the ability to write correct rather than
coherent sentences, make those separate questions, and mark them so.

------
jakear
“In most machine scoring states, any of the randomly selected essays with wide
discrepancies between human and machine scores are referred to another human
for review”.

And “between 5 to 20 percent” of essays are randomly selected for human
review.

So the takeaway is that if you're one of the 80-95% of (typically black or
female) students whom the machine scored dramatically lower, but who are not
selected for human review, your educational future is systematically fucked
and you have no knowledge of why or how to change it.

Absolutely reprehensible. Anyone involved in the creation or adoption of these
systems should be ashamed.

~~~
kazinator
The thing is, you could be similarly screwed by a biased human whose grading
is not checked by a less biased human.

At least the machines offer the following hope: even if unbiased humans are
rare among paper-grading teachers, those humans can be used to train the
machines, so then bias-free or lower-bias grading becomes more ubiquitous.

Basically, the system has the potential for systematically identifying and
reducing systematic bias. A computer program can be retrained much more
readily than a nationwide army of humans. Humans can be given a lecture on
bias, and then they will just return to their ways.

~~~
gibolt
AI has a lot more potential for bias than humans. It depends on the input
data, which is likely heavily biased, judging by results in other domains like
face detection. It will only amplify any small bias present in the data.

~~~
Spivak
It's amazing to see how the general opinion of CS people has _completely
shifted_ in the last few years from "algorithmic scoring is important in
removing the bias from human graders" to the exact opposite.

~~~
kazinator
If we can quantify the bias in the machine, that gives us an opportunity to
close the feedback loop and control the bias.

The bias comes from the human-generated training data in the first place; the
machine isn't introducing its own. For instance, the machine has no inherent
concept of disparaging someone's language because it's from an identifiable
inner city dialect. If it picks up that bias, at least it will apply it
consistently. When we investigate the machine, the machine will not know that
it's being investigated and will not try to conceal its bias from us.

On the other hand, eliminating bias from humans basically means this:
producing a new litter of small humans and teaching them better than their
predecessors.

~~~
gibolt
If...

------
rynomad
Personal anecdote:

I remember taking a standardized test, can't remember if it was SAT or CSAT
(Colorado pre-SAT test). This was at a time when I'm confident that humans
were the graders.

I started with an intro that would be appropriate for a standard 5 paragraph
essay; i.e. the thing you write when you don't know what you're talking about
and you're just following a format.

In the third paragraph I took a leaf from Family Guy and just interjected
"WAFFLES, NICE CRISPY WAFFLES, WITH LOTS OF SYRUP." For the next page and a
half, I berated the very foundation of the essay prompt, insulting it the way
only an angst-ridden early teen can.

... I got a 98% on the essay.

Fast forward several years. I wrote an essay for an introductory college
course final. My paper was returned to me with a coffee stain and a "94% -
good work!" note scribbled on the top. That note was scribbled by a TA who
would turn out to be my girlfriend for 2 years. One night in bed, she tilts
her laptop toward me, showing an article that I had used as the central theme
of the above essay: "Can you believe this?"

"Are you joking? Of course I can believe this, it was the subject of the essay
you gave me an A on 2 years ago"

She admitted she didn't read past the first paragraph of anything she graded,
and just based grades on intuition about how articulate the essays were at the
outset.

...

The point I'm making:

Does AI suck at judging the amount of informative content in a student essay?
YES

Do humans suck at judging the amount of informative content in a student
essay? ALSO YES

------
dlkf
This is a great example of why it's grossly irresponsible for members of the
ML community to talk about how AGI is just around the corner. In addition to
the fact that we have no idea whether this is true, it primes a naive public
to believe that technologies like this are worth the tradeoff.

"People worry that computers will get too smart and take over the world, but
the real problem is that they're too stupid and they've already taken over the
world."

------
empath75
I imagine that any student that experimented with the form of the essay or
wrote an exceptionally well argued piece in simple language would not have
their test graded appropriately either.

Any essay writing test which could be adequately graded by a machine is not
testing anything of value.

Edit: I’ll further add that as soon as people’s careers depend on a metric,
the metric becomes useless as a metric, because it will be gamed and
manipulated by everyone involved. Almost nobody involved is incentivized to
accurately measure students’ writing ability.

~~~
HarryHirsch
_Almost nobody involved is incentivized to accurately measure students’
writing ability_

It's the same reason you see keyword posters in math education. "Together"
means "plus", that kind of thing. It's completely worthless, except for one-
step problems, and even then it doesn't always work. What is happening is
collusion between teachers and testmakers. You can't teach understanding, but
you can teach test-passing techniques because the way the test is set permits
this.

You see the same thing here, in English you can get away with not teaching
quality writing if you teach techniques to score well.

~~~
Spivak
I feel like the mistake is assuming that essay writing is about the content.
It's just a thing to give the student something barely non-trivial to write
about.

When your essays are graded, they're marked down for mechanical and wording
problems. There's really no point in trying to grade 'good ideas' on a subject
piece you had maybe 10 minutes to skim.

~~~
HarryHirsch
_a subject piece you had maybe 10 minutes to skim_

That's a travesty, and you know it because when the kids are in college and
they have as much time as they like to write their assignments they all use
the wrong words and then misapply them.

------
anm89
To me this brings up the absurdity of having essays on standardized tests.
What about an essay is standardized? It's a totally nonsensical premise.

This always gets made into some kind of techluminati conspiracy for the
machines to ingrain structural racism, whereas it's pretty clear that all the
algorithms fail to do is improve an already bad situation stemming from a
flawed premise.

~~~
nitwit005
A number of states found out their schools were graduating students who
genuinely could not read or write effectively. If you want to quantify that,
you're forced to test it somehow. How would you test writing ability without
asking them to write something?

~~~
anm89
Reading comprehension with simple factual questions.

------
jedberg
Any state that relies on the AI as the primary grader does not understand the
current state of AI.

It would make sense to use the AI as a first pass, and then not _randomly_
grade the essays with a human, but specifically choose all the essays that are
on the cusp of the pass/fail line. Then use all those human-generated scores
to update the model, especially if someone moves from pass to fail or fail to
pass. Then maybe throw in a few of the really high and really low outliers to
make sure _those_ are right, and throw away your entire model if the human
scores are drastically different (and obviously don't tell the humans what the
computer score was so they have no idea if they're reading a "cusp" essay or
an outlier essay).
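The triage described above can be sketched as a simple selection rule. The thresholds here are invented for illustration; a real system would tune them against the actual score distribution.

```python
def select_for_human_review(machine_scores, pass_threshold=60, cusp_margin=5,
                            outlier_low=20, outlier_high=95):
    """Flag essays near the pass/fail line, plus extreme outliers,
    for human grading (hypothetical policy thresholds)."""
    flagged = []
    for essay_id, score in machine_scores.items():
        on_cusp = abs(score - pass_threshold) <= cusp_margin
        outlier = score <= outlier_low or score >= outlier_high
        if on_cusp or outlier:
            flagged.append(essay_id)
    return flagged

scores = {"a": 62, "b": 80, "c": 58, "d": 10, "e": 97}
# "a" and "c" sit on the cusp; "d" and "e" are outliers; "b" is untouched.
assert select_for_human_review(scores) == ["a", "c", "d", "e"]
```

The human scores for the flagged essays would then feed back into retraining, as described.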

But putting the educational fate (and therefore future earnings) in the hands
of an AI is unconscionable.

~~~
C1sc0cat
But I bet the company took the decision makers to a really nice restaurant
_nudge_ _nudge_

------
bendbro
I think machine learned grading of papers is insane, but at the same time I
don't think we should be training or encouraging students to speak in AAVE (as
the article suggests).

I think the right approach for machine learned systems is to automatically
"whitelist" essays rather than "blacklisting" them. Students in the middle of
the distribution of essays aren't really interesting, so whitelist them, give
them a pass. Those at the extremes can be either exceptional or terrible, but
usually terrible. The judgement of those at the extremes should be decided by
a human, not a machine. You wouldn't want to blacklist the Einstein of essays
because he did something genius that is indistinguishable from insanity.

However, I think there are some essays that can automatically be blacklisted.
For example, those with:

1. Plagiarism (perhaps human moderated)

2. Extremely low word count

3. Extremely high count of fake words

And at the end of the day, these essay assignments aren't there to judge
whether a student is the next writing sensation; they are given to judge
whether the student can write legible sentences and words, to ensure they are
prepared for the future. So perhaps it is at least possible to automatically
blacklist on sentence structure and spelling (you should just lose points for
invalid structure or invalid words, you shouldn't gain points for big words or
complicated sentences). To make this fair, the student should be informed of
this requirement. If they are informed and still fail, then they need to be
remediated. If we discover that a disproportionate number of minorities are
getting blacklisted, then we should investigate why the school is failing to
teach them proper sentence structure and spelling, not pretend we can change
the world to make AAVE an acceptable dialect of English in the workplace.
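A minimal sketch of the automatic-rejection rules above (rules 2 and 3; plagiarism detection is a separate problem). The thresholds and the dictionary-lookup approach are assumptions for illustration only.

```python
def auto_blacklist(essay: str, dictionary: set[str],
                   min_words: int = 50, max_fake_ratio: float = 0.2) -> bool:
    """Return True if the essay should be automatically rejected:
    too short, or too many words not found in the dictionary."""
    words = [w.strip(".,;:!?\"'()").lower() for w in essay.split()]
    words = [w for w in words if w]
    if len(words) < min_words:
        return True  # rule 2: extremely low word count
    fake = sum(1 for w in words if w not in dictionary)
    return fake / len(words) > max_fake_ratio  # rule 3: too many fake words
```

Anything not rejected by these mechanical rules would be whitelisted or passed to a human, per the scheme above.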

------
ironSkillet
The underlying problem is that reading essays with a careful critical eye is
_not_ scalable. But another issue this highlights is the complete misalignment
of incentives of the people who greenlit the adoption of this technology.
Because educational outcomes are much harder to evaluate over the course of a
bureaucrat's tenure than budget sizes (longer time horizon and many exogenous
variables), there is a natural inclination to make decisions that reduce costs
as long as they don't have any _obvious_ (to them or their superiors) adverse
outcome for students. This is a pretty low bar, especially so given that most
bureaucrats do not have the background necessary to evaluate technical
solutions.

------
userbinator
I've heard stories from others in the industry of companies using tools like
this on their _human-facing_ documentation and requiring a certain score from
them. Imagine using Microsoft Word's spelling and grammar checker, not being
able to add or override its decisions (without following an extremely lengthy
and bureaucratic process), and being required to have less than X "defects"
per 100 words. Naturally, this results in documentation that is perfectly
grammatical and free of spelling errors, but verbose, full of unusual
phrasing, and next to useless for its actual purpose of informing a human.

Grading students' code using a machine is not such a bad idea in contrast,
because in that case [1] no exceptions are possible in a programming
language, [2] the machine (compiler) has to understand it anyway, and [3] it
does save time verifying correctness. But communication in a human language
really needs to be assessed by humans. Anyone who thinks "AI" can accurately
assess human language is either severely delusional, or trying to make $$$
from it.

------
robinwassen
I am working on reducing the time teachers spend on exams and assessments. I
have access to a cleaned and manually scored dataset of 550k essays that is
growing exponentially. I looked at creating a model based on this dataset to
automatically score essays with NLP features such as grammar, structure,
spelling, word complexity, sentiment, relative text length, etc.
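The surface features of that kind can be sketched as follows (a toy illustration only; grammar and sentiment scoring need real NLP tooling and are omitted here):

```python
import re

def essay_features(essay: str) -> dict:
    """Extract simple surface statistics of the kind an essay-scoring
    model might use: length, sentence length, word length, vocabulary."""
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    words = re.findall(r"[A-Za-z']+", essay.lower())
    return {
        "word_count": len(words),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "avg_word_length": sum(map(len, words)) / max(len(words), 1),
        "type_token_ratio": len(set(words)) / max(len(words), 1),  # vocab richness
    }
```

As the article makes obvious, a model built only on features like these can be gamed, which is exactly the design problem described next.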

The problem that I encountered was actually how to apply it in a useful way,
since the problems mentioned in the article are quite obvious when you design
the model.

Options that I saw:

1. Use it as autonomous grading with optional review by the teacher; see the
linked article for the problems with this.

2. Use it as a sanity check on the teacher's manual scoring, but that would
not reduce the workload and would probably just undermine the teacher.

Do you have any suggestions for how such a model could be applied in a
practical and ethical way?

Had some thoughts on how to measure actual knowledge about a subject, but that
would require a massive knowledge graph which would introduce a huge amount of
complexity just to see if it would be a feasible approach.

~~~
na_ka_na
Here are some thoughts:

1. Instead of grading, maybe you can use it for training and tutoring. If a
student is learning to write essays, I'm assuming it's hard for them to get
any feedback.

2. But then there's probably not enough money to be earned there.

One trick might be to write an independent AI to summarize the essay back and
see how closely it matches the essay title. This might weed out gibberish
essays with sound English sentences.
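As a crude stand-in for that summarize-and-compare idea (no actual summarization here; this just measures word overlap between essay and prompt via cosine similarity), something like:

```python
import math
from collections import Counter

def topical_overlap(essay: str, prompt: str) -> float:
    """Cosine similarity between word-count vectors of essay and prompt.
    A real system would summarize the essay first; this is only a sketch."""
    a, b = Counter(essay.lower().split()), Counter(prompt.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

on_topic = topical_overlap("school funding affects education quality",
                           "education funding")
off_topic = topical_overlap("purple elephants harmonize quixotically",
                            "education funding")
assert on_topic > off_topic  # gibberish shares no vocabulary with the prompt
```

A check like this might weed out fluent gibberish, though it would still miss coherent essays that paraphrase rather than repeat the prompt's vocabulary.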

------
choeger
Such a stupid application of technology. It looks as if learning is completely
out of fashion nowadays.

First of all, complaining about minorities getting lower grades because their
English is not as sophisticated as that of others is an inversion of the idea
of teaching. That feedback is actually great. We have machines that can give
that feedback (e.g., Grammarly)? Then use them to make everyone's writing
better. Grades are just a measure of the success of learning, after all. I
never got why one would not allow a student to repeat a particular test as
often as they like, tbh.

Second, grading essays this way is a clear violation of the idea of teaching.
_What_ do you want the students to learn? Structure? Knowledge transfer?
Grammar? Writing an essay is such a complex task that it is really too broad a
goal. And then naturally grading becomes quite difficult.

------
amirmasoudabdol
While this is already terrible, I'm aware of a few projects that are trying to
do the same with scientific literature. Basically they are trying to train
models for scoring papers based on their quality, novelty, and whatnot. At the
current rate and state of AI, I cannot imagine this ever working.

It was a few weeks ago that someone shared "The Dark Age of AI" on HN [1]. I
think we are promising far beyond what Drew McDermott warned us not to
promise. It has gone to the extent that we are applying AI to assessing art,
creativity, and even the quality and novelty of science -- something that in a
way we don't even understand (or try to understand) ourselves at the time we
publish it.

[1]:
[https://news.ycombinator.com/item?id=20546503](https://news.ycombinator.com/item?id=20546503)

------
amatecha
Grading... algorithms... for essays? How/why is that even a thing? That's
absolutely insane. You can't grade someone's writing skills using algorithms.
That is totally counter to providing a proper education. My mind is officially
boggled.

------
colechristensen
Quality of education is proportional to quality of evaluation.

Evaluation of how well someone follows arbitrary language conventions is worse
than useless.

I only got as far as university English 101, outside of some technical
writing in the engineering department, but I have to say none of my education
in writing was worth anything past elementary school. It is perhaps one of the
most difficult things to teach and evaluate, to be fair, but I feel like I am
missing a huge chunk of my education and general ability because of it. I
can't write or form an argument particularly well; rambling on HN and the like
is the closest thing to education I have had.

Prescriptive language rules are not entirely useless. That is the best you can
say about them.

------
waynecochran
I would like to see how it scored on essays by great writers. “Sorry Mr
Tolkien, I’m afraid you have to go to community college first.”

~~~
analog31
In my state, it's going to be "Sorry Mr Tolkien, but we eliminated all of the
departments that are not STEM enough."

------
rkagerer
I'm normally pretty open minded but this is just stupid. AI is nowhere near
literate enough for this task. What kind of world is it when humans create
merely for the consumption of machines? The product of our creativity deserves
better.

I would support any student who refuses to consent to their work being used in
this fashion.

------
lopmotr
I wish this machine bias wasn't always presented in such divisive terms as
race and "disadvantaged groups". It can affect anybody. If you happened to
develop a writing style that looks like typical bad essay writers' style, then
you could be hurt by bias in the grading.

~~~
fzeroracer
If an image processing algorithm fails to recognize black people or worse,
profiles them, how else should this be described but in terms of race?

If you don't talk about the actual problem, how can you possibly expect to
solve it?

~~~
lopmotr
There are many classes of people who have problems of discrimination. Short,
ugly, ginger, etc. The intersections of all those classes are so numerous that
everybody will have some disadvantage. But it won't be apparent unless you
define their class and measure it.

~~~
crooked-v
That's just substituting in smaller or harder-to-define minority groups,
though.

------
Meekro
From the article: " _All essays scored by E-rater are also graded by a human_
and discrepancies are sent to a second human for a final grade. Because of
that system, ETS does not believe any students have been adversely affected by
the bias detected in E-rater."

~~~
shkkmo
Also from the article:

> Of those 21 states, three said every essay is also graded by a human. But in
> the remaining 18 states, only a small percentage of students’ essays—it
> varies between 5 to 20 percent—will be randomly selected for a human grader
> to double check the machine’s work.

So that applies only in a minority of cases.

~~~
Meekro
Oops, my mistake! That's worse than I thought!

------
inlined
> the engines also focus heavily on metrics like sentence length, vocabulary,
> spelling, and subject-verb agreement... The systems are also unable to judge
> more nuanced aspects of writing, like creativity.

This reminds me of a wonderful essay/speech by Stephen Fry on the harm done by
pedantry. I also feel that schools focus so much on a single structure of
essay writing and similarly take the joy out of language.

[https://youtu.be/J7E-aoXLZGY](https://youtu.be/J7E-aoXLZGY)

------
danharaj
This is a natural development of industrialized education. Treating children
as individual thinkers would require far more resources and manpower than our
system would like to provide.

------
rdtwo
Bringing SEO mentality to standardized testing: what could go wrong?

------
readme
Absolute garbage. Kids would be better educated by reading and posting on HN
than they would by attending English classes in one of the states that uses
these tools.

------
MisterBastahrd
Here's a thought: if classwork and homework is getting so overwhelming that
teachers can't possibly grade all of it, then it's overwhelming for the
STUDENTS too, and they shouldn't freaking be assigning so much busywork. You
don't need a 5 page essay to determine whether a kid has read a book. You can
figure that out really quickly in a classroom discussion without anyone having
to lift a pencil.

~~~
dragonwriter
> Here's a thought: if classwork and homework is getting so overwhelming that
> teachers can't possibly grade all of it, then it's overwhelming for the
> STUDENTS too

There's no necessary connection there, especially if one of the reasons that
teachers are being overwhelmed is that the teacher/student ratio is
increasing.

> You don't need a 5 page essay to determine whether a kid has read a book.

No, you need it to determine whether a student has (1) read and understood a
book well enough to apply structured thought to the contents and (2) has
developed the writing skills to write a 5-page essay.

Determining whether a student read a book is rarely, on its own, of
significant interest in school.

------
mrarjen
Reminds me of the plagiarism checker they had at my partner's university. It
would check for identical words on specific subjects... meaning any word in
any order, so naturally there is a high % of overlap, not only with quotes but
also with the words used regarding the subject. The teacher would take this
literally as "you did not write this yourself" if 10% of the words were
similar.

Don't think anyone passed that class.

------
auggierose
I can't believe that anyone would try to automatically grade essays. This is
either deeply cynical or astonishingly dumb.

------
nyxtom
Good lord, what a terrible design. Rather than determine whether the writer
has a coherent understanding of a complex prompt, the system grades based on
writing patterns. This is actually my biggest fear of AI: deploying wide-scale
systems like this that have very clear flaws.

------
lmilcin
I live in Poland and it is the first time I hear about it.

I am absolutely appalled.

Not even at the idea of grading by algorithm, but by the fact that many, many
people had to cooperate to make this happen.

------
wedn3sday
You say "flawed algorithm," I say "easily exploitable by intelligent
students."

------
Smithalicious
I don't even think _I_ would be qualified to grade essays, let alone an
algorithm!

------
pauljurczak
Teachers talk back and even may unionize! Crappy AI is cheap and can't
unionize.

------
nostrademons
It seems like these accumulated errors in the educational system and filters
needed to get through it would create a market inefficiency that could be
exploited by a firm willing to ignore degrees, grades, and test scores and
judge for themselves whether a candidate can do the job they're being hired
for.

------
gerbilly
Why are we even bothering to discuss this on this site?

Wouldn't it be better and less biased if we each wrote our own AI systems and
had them discuss with each other instead?

(And we should publish our training data as well, of course)

------
kwhitefoot
Why are algorithms grading essays in the first place?

------
40acres
The sooner we get it out of our heads that this education system of ours is a
meritocracy, the closer we'll get to actually creating a quality universal
system.

------
bayesian_horse
What are teachers but flawed Algorithms?

------
nyxtom
It is becoming increasingly evident that the hubris for implementing AI is
what is going to ruin everything.

------
crispcarb
Because a (likely unsophisticated) algorithm is grading the essays, there's
probably a deterministic method to score well.

This seems like a terrible idea.

It's not a stretch to imagine the opportunity for nefarious behavior this
allows - think of the recent college admission scandals, and how happy they'd
be to have a guise of 'algorithmic indifference'.

If used long-term, it could offer a big advantage to the wealthy in other
avenues. Another hypothetical, probably not far from reality: the algorithm
becomes solved (almost or completely) by some premier 'tutoring' company. Said
company can charge a pretty penny given its stellar track record, offering yet
another hidden advantage to the wealthy/elite.

~~~
aidenn0
Surely there's a deterministic method to score well on the math questions?

~~~
crooked-v
An essay is to a math problem as a proof is to a grammar problem.

~~~
aidenn0
There's definitely a deterministic way to score well on HS level proofs. Also,
I think you are overestimating the requirements for an essay on a standardized
test.

