

Automatic Grading of Code Submissions Isn't Perfect Either - gu
http://gregorulm.com/automatic-grading-of-code-submissions-in-moocs-isnt-perfect-either/

======
mistercow
I really think the concerns about bad code are overblown. I had a friend in
college who, for a CS class's final project, wrote an entire game in Java
within a single, enormous function body. I still don't know how he even
managed to do it, but it basically worked and he passed the class. He sort of
understood how functions worked, but he found them confusing, so he didn't use
them.

This wasn't at some community college or anything. This was at Georgia Tech.

That's an extreme case, and I certainly am not saying that just because it
happened in a respected engineering school, that makes it acceptable. But my
point is that in entry level courses (like the ones where you'd be
implementing a clip function), even professors grade on getting the job done.
Code quality just doesn't enter the picture at that level.

The thing is, trying to teach good code directly is pointless. Your less
bright students will accept the dogma and never actually understand how to
apply it usefully. Your brightest students will see it as a bunch of useless
bullshit that's holding them back.

If you want to teach good code, here's how you do it: Make a student write and
maintain a large project. Make them _keep it running_ for two years, while you
make them add more and more features. Keep checking it against an automated
test suite which they do not have access to, and grade them on its
correctness. Give them the resources to learn about best practices, but never
tell them they have to use them.

Then, at the end of two years, let them rewrite it from scratch. _Then_ you
will see a student who has learned the value of good coding practices.

------
trekkin
The author completely misses the mark here - probably due to limited exposure
to third-party code in real-life production systems.

Code auto-grading, at least at Coursera, is usually done by running
comprehensive unit tests, which extensively test border cases as well. These
test suites are often 5-10 times larger than the actual submitted code, and it
is difficult to imagine anybody outside of this type of environment spending
so much extra time designing (and testing!) test suites with 100% coverage.
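For the clip assignment mentioned elsewhere in the thread, a border-case-heavy suite might look something like this (a hypothetical Python sketch, not taken from any actual Coursera grader):

```python
# Hypothetical sketch of a border-case-heavy test suite for a
# clip(x, lo, hi) assignment; not from any actual grader.

def clip(x, lo, hi):
    """Reference solution: constrain x to the range [lo, hi]."""
    return max(lo, min(x, hi))

def test_clip():
    assert clip(5, 0, 10) == 5      # typical value inside the range
    assert clip(0, 0, 10) == 0      # exactly on the lower border
    assert clip(10, 0, 10) == 10    # exactly on the upper border
    assert clip(-1, 0, 10) == 0     # just below the range
    assert clip(11, 0, 10) == 10    # just above the range
    assert clip(7, 3, 3) == 3       # degenerate range: lo == hi

test_clip()
```

Note the ratio: six checks for a one-line solution, which is exactly the 5-10x size figure above.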

Moreover, code submissions have to comply with (or implement, in the case of
Java) predefined interfaces. And some courses (e.g. Scala) take style checker
output into account (20% of the grade is decided by the style checker in the
Scala course).

In summary, well-thought-out test suites and interface specifications demand
well-designed code submissions; in real life, poor comments or sloppy
expressions are a very minor nuisance compared to poorly designed interfaces
and forgotten border cases.

~~~
gu
I wasn't talking about insufficient test cases. Your remark about extensive
unit tests is therefore irrelevant in this context, as I don't question at all
that the unit tests take border cases into account.

I am mostly concerned with "soft" aspects. Just consider the case where a
student has to define variables, but picks variable names in a language other
than English, or where control flow in a submission is more convoluted than it
would have to be. Those are the cases I discuss in the article.
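To make that concrete, here is a hypothetical pair of Python submissions: both pass identical tests, so an autograder can't tell them apart, but the control flow in the first is far more convoluted than it needs to be:

```python
# Two hypothetical submissions with identical behavior.

def clip_convoluted(x, lo, hi):
    # Convoluted control flow: a flag variable and nested branches.
    done = False
    result = x
    if not done:
        if x < lo:
            result = lo
            done = True
    if not done:
        if x > hi:
            result = hi
    return result

def clip_clean(x, lo, hi):
    # The same behavior, expressed directly.
    return max(lo, min(x, hi))
```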

Moments ago, someone left a very fitting comment on my blog:

"I am taking the edX CS169.1 course and I find that I will consistently have a
"less than elegant" solution that the auto grader accepts but that I feel is
sub-par. The irony is this class has a large BDD/TDD aspect and is teaching
RED-GREEN-REFACTOR, but with an auto grader once its green there is little
reason to go back and refactor."

~~~
chrisaycock
That comment on your blog is exactly what professional programmers do in the
_real world_ : pass the test suite and move on. After all, the goal of
software engineering isn't to write elegant code; it's to deliver software
that solves the customer's needs. And the customer's needs are tracked via the
spec, not the style guide.

~~~
Evbn
This is a short-term attitude that is incompatible with building a system that
grows more powerful over a decade. That may be OK or not, depending on your
horizon and sunset plans.

~~~
alexkus
IME it's not that binary.

In our corporate environment we do enough to pass the tests, with one extra
'test' being a peer review, which should take into account a list of criteria
that aren't easy to check automatically: house code style, test code coverage,
future maintainability, g11n/i18n-ness, etc.

We often only go as far as 'just good enough' but the standard to which that
is assessed is pretty high.

------
CookWithMe
The automatic grading has a huge advantage: It is nearly real-time, and
improving the solution and re-submitting improves your score.

Having been a teaching assistant who corrected programming assignments (and
also a student), I always wondered how many of the students would read my
comments, go back to their solution, and actually improve it. Probably none.
When I (as a student) received a comment about a solution I had submitted two
weeks earlier, I often didn't instantly know what the corrector was talking
about. I had to go back and look at my code, and I'm not sure I always did
that when I was busy. Additionally, I think even if I acknowledged the
comment, I wouldn't actually go ahead and fix my solution.

I'm taking the Scala course right now, and when I submit a solution and
something is flagged, my thoughts are right in the code, I still have all the
files open in vim, sbt running... so I can instantly go and fix it. And there
is a real incentive to do that, because my score will improve.

~~~
bheklilr
I agree wholeheartedly with this. I find that I learn more and engage more
with my online courses than in a traditional lecture because I am able to
watch a few minutes of instruction, then work an example and have it
validated. It breaks the lecture up into small incremental building blocks,
whereas in a classroom you normally get the entire lecture, then work the
homework several days later. By having real time feedback, you learn better
and more quickly. Also, since most programmers learn by example, if the
instructor uses well structured code, the students likely will, too.

------
runn1ng
This is nothing compared to the "peer review" in the humanities courses.

I know that there is no easy answer for doing a MOOC (massive open online
course) in the humanities, but, according to the web, Coursera's solution is
not working very well and, what is more striking to me, Coursera doesn't seem
to respond.

But again, I have no easy solution for grading essays in MOOC.

More information here:

[http://courserafantasy.blogspot.cz/2012/09/done-more-or-
less...](http://courserafantasy.blogspot.cz/2012/09/done-more-or-less.html)

[http://www.insidehighered.com/blogs/hack-higher-
education/pr...](http://www.insidehighered.com/blogs/hack-higher-
education/problems-peer-grading-coursera)

[http://gregorulm.com/a-critical-view-on-courseras-peer-
revie...](http://gregorulm.com/a-critical-view-on-courseras-peer-review-
process/)

------
zachgalant
I'm making <http://codehs.com> to teach beginners how to code. We're focusing
on high schoolers and promoting good style and good practices.

We have a mixture of an autograder for functionality and human grading for
style.

It's really important to get both. Our class uses a mastery model rather than
grades, so you shouldn't move on until you've mastered an exercise, and
mastery does not just stop at functionality. Style is included.

Making your code readable to other people is really important, and it can and
should be taught and stressed even on small exercises.

At Stanford, code quality is half your grade in the first two intro classes,
because it's just as important that someone else can understand your code as
it is to make it work.

------
mark_l_watson
I disagree with the article in general because I think the secret sauce for
these online classes is involving students with non-graded questions during
the lectures, graded tests, and homework.

I think the comprehensive grading of programs submitted for homework is good,
but even if it is not perfect, in the 5 classes I have taken, the assignments
helped me dig into the material.

I also like the model of letting students take graded quizzes more than once.
I find that the time spent between the first and second time taking a quiz is
very productive for improving my understanding of the material.

These classes are fundamentally superior to just reading through a good
textbook.

~~~
gu
But aren't those entirely different issues altogether?

Don't get me wrong, I do agree with you that MOOCs are a boon. A well-
structured course may be able to provide a better experience than working
through a textbook on your own. Still, this doesn't mean that those courses
live up to the hype.

------
tszming
What the article is saying isn't specific to MOOCs; think about continuous
integration vs. code review - they are not contradictory.

MOOCs are not going to replace formal education, and I think the "limitations"
mentioned are perfectly acceptable given the costs and incentives involved.
E.g., in Coursera's Scala course there are more than 10K weekly assignment
submissions, so you need a scalable assessment method. (The grader is in fact
not bad, i.e. it knows about cyclomatic complexity, warns if you use mutable
collections, etc.)
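For illustration only, here is a rough sketch of the cyclomatic-complexity part of such a check, done in Python with the standard ast module (the actual Scala grader works differently; everything here is an assumption about the general technique):

```python
import ast

# Node types that add a decision point. This is a rough approximation
# of McCabe's cyclomatic complexity; chained boolean operators are
# undercounted in this sketch.
BRANCH_NODES = (ast.If, ast.IfExp, ast.For, ast.While,
                ast.ExceptHandler, ast.And, ast.Or)

def cyclomatic_complexity(source):
    """Return 1 + the number of decision points in the source."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCH_NODES)
                   for node in ast.walk(tree))
```

Straight-line code scores 1, and each if/loop raises the score by one; a grader could simply warn above some threshold.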

------
intellegacy
I'm taking 6.00x and Udacity CS101 currently, and I'd have to disagree with
the OP.

The code checkers give you immediate feedback with test suites that are more
comprehensive than what students would (or could, in most cases) design
themselves.

Sure, there's no professorial feedback on your code, but 90% of the time the
comments you receive back on your printed-out code will go unread. Not to
mention the lead time from submission to feedback, often as long as two weeks,
which frequently makes the comments worthless.

As for style, my uni's Intro to CS courses didn't check my style either. I
find 6.00x and CS101 to be vastly superior in almost every respect.

Finally, 6.00x and CS101 actually provide you with the "correct" answers after
you've passed their tests with an adequate solution. A few times I've found
myself hitting my head and thinking, "Why didn't I think of that! That's more
elegant than my solution," and going back and attempting to implement their
solution. Try finding that in anything other than an online course.

------
benmmurphy
The Scala Coursera course does a style check which will catch some style
issues. I think it uses this: <http://www.scalastyle.org/rules-0.1.0.html>,
but it wouldn't have caught the clip problem discussed on the blog.

~~~
CookWithMe
I found the style checker to be pretty good and helpful.

Because for some problems they give a hint, e.g. "this is solvable with a one-
liner", students can figure out for themselves whether they are on the right
track or not.

Line length could also be checked by implementing something similar to a style
checker. It could also check whether methods that are supposed to be used
(e.g. min/max) are actually being used.
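A minimal Python sketch of both ideas (line length plus required-helper detection); the function name and the 80-character limit are my own assumptions, not anything a course actually ships:

```python
import ast

MAX_LINE_LENGTH = 80  # assumed limit, not from any real course

def check_submission(source):
    """Return a list of style findings for a submitted source string."""
    findings = []
    # Line-length check.
    for lineno, line in enumerate(source.splitlines(), start=1):
        if len(line) > MAX_LINE_LENGTH:
            findings.append(f"line {lineno}: longer than {MAX_LINE_LENGTH} chars")
    # Check that the expected helpers (here min/max) are actually called.
    called = {node.func.id
              for node in ast.walk(ast.parse(source))
              if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)}
    for required in ("min", "max"):
        if required not in called:
            findings.append(f"expected a call to {required}()")
    return findings
```

A clip submission built on max(lo, min(x, hi)) comes back clean, while one that reimplements the clamping with branches gets flagged.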

I guess the author is correct that automatic grading is not perfect and is
never going to be as good as talking with someone more experienced... but it
can go pretty far. Having corrected programming assignments as a teaching
assistant myself, I have to say that it can be a really tough job, and an
automatic grading system may give you more help. When I corrected an
assignment and there were a lot of issues with it, I would point out the most
important ones, but an exhaustive list is really tough because there is time
pressure. Also, I think it could be too demotivating for a student to get a
list of 30+ issues from the corrector; I'd rather have him acknowledge the
three most important ones.

------
67726e
As a part of the build system at my work, we run various passes over the code
aside from just "does this compile". I'm sure these MOOCs could find software
to:

1\. Check the code's style

2\. Run a comprehensive suite of unit tests

3\. Perform static analysis on the code

Together, these tools can catch most problems of bad formatting, fragile code
(cannot handle edge cases, errors, etc.), and structural errors. Additionally,
you could take the code's performance into account - does it solve the problem
in a reasonable amount of time?

By using standard industry tools, one could build a good grading system that
is entirely automated.
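A toy sketch of what combining those passes into one automated grade might look like; the weights, function names, and in-process exec are all my own assumptions (a real grader would sandbox submissions and shell out to actual linters and test runners):

```python
import time

def grade(source, tests, style_check, time_limit=1.0):
    """Combine style, correctness, and performance into a 0-100 score."""
    score = 0
    # 1. Style pass (stand-in for a real linter): worth 20 points.
    if not style_check(source):           # empty findings = clean style
        score += 20
    # 2. Correctness pass: the tests share 70 points equally.
    namespace = {}
    exec(source, namespace)               # NB: a real grader would sandbox this
    passed = sum(1 for t in tests if t(namespace))
    score += round(70 * passed / len(tests))
    # 3. Performance pass: 10 points if all tests finish within the limit.
    start = time.perf_counter()
    for t in tests:
        t(namespace)
    if time.perf_counter() - start <= time_limit:
        score += 10
    return score
```

A submission that is clean, correct, and fast scores 100; a correct but over-long or slow one loses the corresponding share.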

~~~
brunov
Most of the CS courses I've taken from Coursera already do all those things.
The Algorithms course from Prof. Sedgewick in particular had an excellent test
suite for its programming assignments.

You were graded on complying with an interface, code style, correctness,
performance, and memory usage, with strict requirements on the latter two. You
couldn't get away with, say, implementing a brute force solution and calling
it a day; you had to solve the problem optimally.

