
AI Grading Application Gradescope Shortens Grading Times - ibrahima
https://blogs.nvidia.com/blog/2016/09/02/gradescope-brings-ai-to-grading/
======
akrolsmir
As a former TA at Berkeley (where Gradescope was founded), I can't stress
enough how integral Gradescope was to our exam process. It streamlines the
process of scanning exams, allows us to grade them online, and then to easily
respond to regrade requests.

It seems like they've put a lot of work towards eliminating the remaining
bottleneck: actually assigning grades (which was done by a dozen profs/TAs in
a 10-hour marathon). Anything that automates busywork and allows teachers to
focus on actually teaching sounds amazing!

~~~
rosstex
As a current TA at Berkeley, those marathon parties are indeed a big part of
the job. But Gradescope has made things much easier overall, and has allowed
us to scale to the >1700 students in our intro CS61A class!

------
achou
An investor in gradescope here... As a former TA at Berkeley, I can recall how
painful it was to grade exams, especially for lower division classes where you
could have hundreds of students. I love the idea of using machine learning to
help group common answers. It leaves the judgment in human hands, but helps
reduce the grunt work of identifying where that grading applies. A good
example of man and machine working together to produce a better result than
either could do alone.

------
inputcoffee
This story runs counter to the other story that was on the HN front page at
the same time, which I enjoyed:

[https://news.ycombinator.com/item?id=12414746](https://news.ycombinator.com/item?id=12414746)

[https://www.insidehighered.com/news/2016/09/02/massachusetts...](https://www.insidehighered.com/news/2016/09/02/massachusetts-institute-technology-experiments-instructor-grading-massive-open)

~~~
johncip
Hey, Gradescope dev here. What detaro said is on the money -- we're able to
group identical short-answer responses so that they can be graded in one shot.
It's not necessary to analyze the answer content for this.

Many (though certainly not all) of the instructors using Gradescope are
teaching CS or Math courses with heavy enrollment. So each exam will have many
submissions (even 1000+), and each submission will have a lot of short
answers. Marking each one on its own is tedious, but until recently it was the
state of the art for paper exams.

Instructors can and do grade essays on Gradescope, and are able to save time.
But in that case the savings comes from being able to create rubrics on the
fly, to change point values without re-adjusting every single marked paper, to
grade across questions rather than across exams, to publish grades without
having to type them all in, and so on.

There's a lot of grunt work that goes into grading, and it doesn't need to be
that way :)

~~~
inputcoffee
I may have misread it. Does that still count as AI?

Also, they have a robot grading the GMAT essays since 1999
([http://www.800score.com/content/essay.html](http://www.800score.com/content/essay.html))

~~~
mesozoic
The classification of answers together is the AI

------
bluenose69
It took me until the end of the video to understand that by "ribble",
"rupert", "rippy", etc., he meant "rubric". This does not reflect on the
video, which is otherwise clear and to-the-point. I imagine the software is
reasonable.

------
thr0waway1239
In other words, the student who arrives at the correct answer by using the
wrong calculation will still get 100% on the question when using GradeScope
because the algorithm says so?

Although my guess is the same thing would have happened with the human TA
also.

------
bozoUser
Not taking away any credit from the GradeScope folks, but being new to AI, I
fail to understand how the underlying system is labelled AI. It seems like
they have created a rule-based rubric for grading, and human input can
add/delete these rules..?

~~~
dshields1
> The AI isn’t used to directly grade the papers; rather, it turns grading
> into an automated, highly repeatable exercise by learning to identify and
> group answers, and thus treat them as batches.

It seems like the AI is identifying equivalent answers among respondents. So
if you mark an answer correct on one test, every other test with the same
answer will be marked correct. I worked for a small competitor of this product
in college and we had a lot of trouble with this problem, especially with
answers that were prone to spelling mistakes, or could be written in many
ways. Kudos to them for doing this well.
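The grouping idea described above can be sketched in a few lines. This is only an illustration, not Gradescope's (or the competitor's) actual method: the normalization here (lowercasing, collapsing whitespace) is an assumption, and matching handwritten or free-form answers in practice is far harder than exact string matching.

```python
from collections import defaultdict

def group_answers(responses):
    """Group (student, answer) pairs by a normalized form of the answer,
    so a single grading decision can be applied to the whole group."""
    groups = defaultdict(list)
    for student, answer in responses:
        # Naive normalization: lowercase and collapse whitespace.
        key = " ".join(answer.lower().split())
        groups[key].append(student)
    return dict(groups)

responses = [
    ("alice", "9.8 m/s^2"),
    ("bob",   "9.8  M/S^2"),
    ("carol", "10 m/s^2"),
]
print(group_answers(responses))
```

With this normalization, alice's and bob's answers land in one group and can be marked in one shot, while carol's stays separate; the hard part in a real system is deciding when two differently spelled or differently phrased answers belong in the same group.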

Another fun bit of AI in this space is in identifying where the answer key
might have made a mistake. We developed some algorithms for determining the
most likely answer to a problem given the responses. We never released it but
I worked on a tool that would grade tests without an answer key at all. Using
50 question tests in a few freshman physics classes I was able to get the
right answer a little over 97% of the time.
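The comment doesn't describe the actual algorithm, but the simplest version of keyless grading is a per-question majority vote: assume the most common response to each question is the correct one. The 97% figure presumably came from something more sophisticated; this is just a hypothetical baseline sketch.

```python
from collections import Counter

def infer_answer_key(response_sheets):
    """Guess an answer key from the responses alone by taking the most
    common answer to each question across all sheets."""
    num_questions = len(response_sheets[0])
    key = []
    for q in range(num_questions):
        counts = Counter(sheet[q] for sheet in response_sheets)
        key.append(counts.most_common(1)[0][0])
    return key

sheets = [
    ["A", "C", "B"],
    ["A", "D", "B"],
    ["A", "C", "B"],
    ["B", "C", "B"],
]
print(infer_answer_key(sheets))  # majority answer per question
```

This works only when most students get most questions right; a stronger approach would weight each student's vote by their agreement with the emerging consensus.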

~~~
analog31
This makes a lot of sense, and follows typical methods that were done by hand
long ago. When I TA'd freshman physics at a big university in the 90s, the
grading session was run like a two-pass assembler. First pass, we simply identified the
possible answers. Second pass, we marked them. This gave us much more
consistency, and it was quicker overall.

Amusingly, the exams always asked for a numerical answer, from which we could
guess which mistake they made, then we would find that mistake in their
calculation and mark it. Without that trick, identifying the specific mistake
in each answer was a pretty tedious process.

------
ghshephard
Better than what my calculus labs did - Assigned 30 questions and then
randomly marked five of them (same five for everybody) on everyone's paper.
Thing is - you never knew _which_ five they were going to look at.

~~~
Turing_Machine
One of my undergrad math profs went one better on that. She assigned homework
due every class period. At the start of class she'd flip a coin to determine
whether or not to collect the homework. If she did collect it, she'd flip a
coin for each problem to determine whether or not that problem would be
graded. The result was that the students had to do all the homework (unless
they felt like gambling), but she had to do only 25% of the grading.
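The 25% figure follows from the two independent coin flips: collect the homework with probability 1/2, then grade each collected problem with probability 1/2, for an expected 1/2 × 1/2 = 1/4 of all problems graded. A quick simulation (parameters are illustrative) confirms it:

```python
import random

random.seed(0)
trials = 100_000  # one trial per (class period, problem) pair
graded = 0
for _ in range(trials):
    if random.random() < 0.5:      # prof flips: collect homework today?
        if random.random() < 0.5:  # prof flips: grade this problem?
            graded += 1
print(graded / trials)  # close to 0.25
```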

------
abarrak
> "Karayev said the AI feature attempts to address three challenges ... and,
> perhaps the toughest of the three, recognizing handwriting".

And it's still one of the most complex problems in automated pattern
recognition.

------
jccalhoun
Seems like a nice system. But to get any traction they are going to need to be
able to integrate into Blackboard and other CMS.

Sadly as terrible as Blackboard is, it has a large market share (it must be
good on the backend stuff or something because it is terrible to use as an
instructor - especially if you teach multiple sections of the same class)

------
pepijndevos
The last time my university experimented with automated grading, every student
hated it. (PerusAll)

This approach looks much more sane, provided they avoid any false positives
slipping into a group.

