AI Grading Application Gradescope Shortens Grading Times (nvidia.com)
112 points by ibrahima on Sept 2, 2016 | 18 comments



As a former TA at Berkeley (where Gradescope was founded), I can't stress enough how integral Gradescope was to our exam process. It streamlines scanning exams, lets us grade them online, and makes it easy to respond to regrade requests.

It seems like they've put a lot of work into eliminating the remaining bottleneck: actually assigning grades (which was done by a dozen profs/TAs in a 10-hour marathon). Anything that automates busywork and allows teachers to focus on actually teaching sounds amazing!


As a current TA at Berkeley, I can confirm those marathon parties are indeed a big part of the job. But Gradescope has made things much easier overall, and has let us scale to >1700 students in our intro CS61A class!


An investor in Gradescope here... As a former TA at Berkeley, I can recall how painful it was to grade exams, especially for lower division classes where you could have hundreds of students. I love the idea of using machine learning to help group common answers. It leaves the judgment in human hands, but helps reduce the grunt work of identifying where that grading applies. A good example of man and machine working together to produce a better result than either could do alone.


This story runs counter to the other story that's on the HN front page at the same time, which I enjoy:

https://news.ycombinator.com/item?id=12414746

https://www.insidehighered.com/news/2016/09/02/massachusetts...


Hey, Gradescope dev here. What detaro said is on the money -- we're able to group identical short-answer responses so that they can be graded in one shot. It's not necessary to analyze the answer content for this.

Many (though certainly not all) of the instructors using Gradescope are teaching CS or Math courses with heavy enrollment. So each exam will have many submissions (even 1000+), and each submission will have a lot of short answers. Marking each one on its own is tedious, but until recently it was the state of the art for paper exams.

Instructors can and do grade essays on Gradescope, and are able to save time. But in that case the savings comes from being able to create rubrics on the fly, to change point values without re-adjusting every single marked paper, to grade across questions rather than across exams, to publish grades without having to type them all in, and so on.
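
To make the point-value part concrete, here's a toy sketch (not our actual data model): marks reference rubric items rather than raw scores, so adjusting an item's value re-scores every submission at once.

    # Toy sketch: marks point at rubric items, so a point-value change
    # propagates to every submission without touching the marks.
    rubric = {"R1": -2.0, "R2": -0.5}                 # deduction per item
    marks = {"alice": ["R1"], "bob": ["R1", "R2"], "carol": []}

    def score(student, max_points=10.0):
        return max_points + sum(rubric[item] for item in marks[student])

    rubric["R1"] = -1.0                               # soften a deduction
    print({s: score(s) for s in marks})               # all papers re-scored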

There's a lot of grunt work that goes into grading, and it doesn't need to be that way :)


I may have misread it. Does that still count as AI?

Also, they've had a robot grading the GMAT essays since 1999 (http://www.800score.com/content/essay.html)


The classification of answers into groups is the AI.


Does it? Essay questions are a special case, and this story doesn't claim that they can solve them. But they make improved tools, based (from what I've seen) on MOOC grading tools, available for paper exams. That would, for example, let graders spend more time manually grading an essay instead of wasting time checking the simpler questions, which in many cases make up most of an exam.


It took me until the end of the video to understand that by "ribble", "rupert", "rippy", etc., he meant "rubric". This does not reflect on the video, which is otherwise clear and to-the-point. I imagine the software is reasonable.


In other words, the student who arrives at the correct answer by using the wrong calculation will still get 100% on the question when using Gradescope, because the algorithm says so?

Although my guess is the same thing would have happened with a human TA, too.


Not taking away any credit from the Gradescope folks, but being new to AI, I fail to understand how the underlying system is labelled AI. It seems like they have created a rule-based rubric for grading, and human input can add/delete these rules?


> The AI isn’t used to directly grade the papers; rather, it turns grading into an automated, highly repeatable exercise by learning to identify and group answers, and thus treat them as batches.

It seems like the AI is identifying equivalent answers among respondents. So if you mark an answer correct on one test, every other test with the same answer will be marked correct. I worked for a small competitor of this product in college and we had a lot of trouble with this problem, especially with answers that were prone to spelling mistakes, or could be written in many ways. Kudos to them for doing this well.
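
For concreteness, a toy version of that batching idea (nobody's real algorithm, and the threshold is made up): normalize each answer, then attach it to an existing group when it's similar enough, which also absorbs minor spelling variations.

    # Toy answer-grouping: normalize, then cluster by a similarity threshold.
    from difflib import SequenceMatcher

    def normalize(ans):
        return " ".join(ans.lower().split())

    def group_answers(answers, threshold=0.9):
        groups = []                                  # each group: raw answers
        for raw in answers:
            for g in groups:
                if SequenceMatcher(None, normalize(raw),
                                   normalize(g[0])).ratio() >= threshold:
                    g.append(raw)                    # close enough: same batch
                    break
            else:
                groups.append([raw])                 # new distinct answer
        return groups

    print(group_answers(["O(n log n)", "o(n logn)", "O(n^2)"]))
    # [['O(n log n)', 'o(n logn)'], ['O(n^2)']]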

Another fun bit of AI in this space is in identifying where the answer key might have made a mistake. We developed some algorithms for determining the most likely answer to a problem given the responses. We never released it but I worked on a tool that would grade tests without an answer key at all. Using 50 question tests in a few freshman physics classes I was able to get the right answer a little over 97% of the time.
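
The simplest version of that is a per-question majority vote; a toy sketch (our real approach was more involved than this):

    # Keyless grading, toy version: take the modal response per question as
    # the presumed key, then grade everyone against it. Works when most
    # students get each question right.
    from collections import Counter

    def infer_key(responses):                 # responses: one row per student
        return [Counter(col).most_common(1)[0][0] for col in zip(*responses)]

    responses = [["B", "C", "A", "D"],
                 ["B", "C", "A", "A"],
                 ["B", "D", "A", "D"]]
    key = infer_key(responses)
    scores = [sum(r == k for r, k in zip(row, key)) for row in responses]
    print(key, scores)                        # ['B','C','A','D'] [4, 3, 3]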


This makes a lot of sense, and follows typical methods that were done by hand long ago. When I TA'd freshman physics at a big university in the 90s, the grading session was a two-pass assembler. First pass, we simply identified the possible answers. Second pass, we marked them. This gave us much more consistency, and it was quicker overall.

Amusingly, the exams always asked for a numerical answer, from which we could usually guess which mistake a student had made; then we'd find that mistake in their calculation and mark it. Without that trick, identifying the specific mistake in each answer was a pretty tedious process.
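
That trick mechanizes nicely, too; a toy version (the mistake table is invented for illustration):

    # Pass 1: collect the distinct numerical answers.
    answers = {"alice": 19.6, "bob": 9.8, "carol": 19.6, "dave": 4.9}
    distinct = sorted(set(answers.values()))        # [4.9, 9.8, 19.6]

    # Pass 2: decide each distinct answer once, then apply it everywhere.
    rubric = {19.6: ("correct", 5),
              9.8: ("forgot the factor of 2", 3),
              4.9: ("used g/2", 1)}
    marks = {name: rubric[val] for name, val in answers.items()}
    print(marks)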


Better than what my calculus labs did: assign 30 questions and then grade five of them (the same five for everybody) on everyone's paper. The thing is, you never knew which five they were going to look at.


One of my undergrad math profs went one better on that. She assigned homework due every class period. At the start of class she'd flip a coin to determine whether or not to collect the homework. If she did collect it, she'd flip a coin for each problem to determine whether or not that problem would be graded. The result was that the students had to do all the homework (unless they felt like gambling), but she had to do only 25% of the grading.
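
(Checking the arithmetic: in expectation that's a 1/2 chance of collection × a 1/2 chance per problem = 1/4 of the problems graded, hence the 25%.)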


> "Karayev said the AI feature attempts to address three challenges ... and, perhaps the toughest of the three, recognizing handwriting".

And it's still one of the most complex problems in automated pattern recognition.


Seems like a nice system. But to get any traction, they're going to need to integrate with Blackboard and other CMSes.

Sadly, as terrible as Blackboard is, it has a large market share. (It must be good at the back-end stuff or something, because it is terrible to use as an instructor, especially if you teach multiple sections of the same class.)


The last time my university experimented with automated grading, every student hated it. (PerusAll)

This approach looks much more sane, provided they keep false positives from slipping into a group.



