
The Hidden Perils of Automated Assessment - nemoniac
http://blog.brownplt.org/2018/07/26/perils-of-automated-assessment.html
======
gregmac
> We routinely rely on automated assessment to evaluate our students’ work on
> programming assignments.

Wait, why would you do that? Algorithms are going to be terrible at judging
the hardest and most important part of writing code: how maintainable it is.

Garbage code, through brute force, can eventually pass all test cases -- I
hope that is not what schools are teaching.

Using test cases isn't a bad way to form _part_ of an evaluation, provided
the code being worked on has a defined interface spec. It can also be done
the other way around: run code that produces various bad results to exercise
the student's test cases.
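
A rough sketch of that second approach, in Python: run the student's test
suite against one known-good implementation and a few deliberately broken
ones, then check that it accepts the former and rejects the latter. The
median assignment, the function names, and the two bugs are all made up for
illustration; they are not taken from the article or from any real course.

```python
# Hypothetical assignment: median(xs) for a non-empty list of numbers.

def correct_median(xs):
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

def buggy_no_sort(xs):
    # Bug: forgets to sort before picking the middle element.
    mid = len(xs) // 2
    return xs[mid] if len(xs) % 2 else (xs[mid - 1] + xs[mid]) / 2

def buggy_even(xs):
    # Bug: does not average the two middle elements for even lengths.
    s = sorted(xs)
    return s[len(s) // 2]

def student_suite(median):
    # A student's test suite: returns True if `median` passes every check.
    # Note that it never tries an even-length input.
    return median([1, 2, 3]) == 2 and median([3, 1, 2]) == 2

def grade_suite(suite, correct, buggy):
    """Return (accepts the correct impl?, names of buggy impls it rejects)."""
    accepts_correct = suite(correct)
    caught = [name for name, impl in buggy.items() if not suite(impl)]
    return accepts_correct, caught

accepts_correct, caught = grade_suite(
    student_suite,
    correct_median,
    {"no_sort": buggy_no_sort, "even_length": buggy_even},
)
print(accepts_correct, caught)
```

Here the suite accepts the correct implementation and catches the unsorted
bug, but lets the even-length bug through, which is the kind of gap this
style of grading is meant to expose.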

However, more often than not, specs don't exist at this level in the real
world, and imposing them leads people toward certain implementations (rather
than showing whether the student can come up with something appropriate on
their own). In my experience in school, they also often force a compromise
design because the imposed interface has shortcomings (e.g., functions that
use several output parameters rather than returning an object, or a forced
functional composition that is entirely different from what I would do and
how I think).

~~~
mattnewton
When I was a TA as an undergrad, an initial filter of “does this compile and
produce output” probably cut out 25% of the submissions on a regular basis.
If a submission didn't get that far, it wasn’t worth my time to grade. I
think that’s a real-world enough stance.

~~~
gowld
That's an awful attitude for a _teacher_. Why are you even there if you only
want to spend time on students who don't need your services?

~~~
icebraining
At my uni, the test suite was public; you were supposed to run it and, before
the deadline, talk to the TA about the tests you couldn't pass. If you didn't
do that, and just submitted broken code, then you'd fail the test. Seems fair
to me.

------
bitL
I remember once taking part in a nation-wide programming competition as a
high schooler. The instructions said something like: we expect this output:
"...". I assumed I needed to output everything including the quotes, so I
tested my algorithms and happily submitted once I was sure they worked fine.
I was pretty shocked when I got a very bad score back. The issue, obviously,
was that I took it too literally and they didn't actually expect quotes.
After the competition, when I talked to the organizers, they re-ran the
modified programs, and I would have ended up in 4th place... Well, tough luck.
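
To make the failure mode concrete, here is a small Python sketch of how an
exact-match output checker would reject quoted output, next to a more
forgiving one. How that competition's grader actually worked is unknown;
both functions and the sample strings are assumptions for illustration only.

```python
def strict_match(expected: str, actual: str) -> bool:
    # Byte-for-byte comparison: any extra character fails the test.
    return expected == actual

def lenient_match(expected: str, actual: str) -> bool:
    # Ignore surrounding whitespace and one layer of surrounding double quotes.
    def normalize(s: str) -> str:
        return s.strip().strip('"').strip()
    return normalize(expected) == normalize(actual)

expected = "42"
submitted = '"42"\n'  # the program printed the answer wrapped in quotes

print(strict_match(expected, submitted))   # False
print(lenient_match(expected, submitted))  # True
```

A lenient checker has its own risks (it can accept output that really is
wrong), so stating the exact expected format in the instructions is probably
the better fix.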

------
kevinpet
I read the article. I can usually follow statistics in things like social
science papers etc. I had no idea what it was trying to convey. You seem to be
using an unusual visualization and don't label the axes or explain what it is
saying, except in very general terms.

This is a pity because we're currently experimenting with using automated
assessment as part of our hiring process and understanding pitfalls would be
very useful.

~~~
skrishnamurthi
The post ends with a link to the actual paper. The paper's graphs do have
labeled axes. While I agree the blog post should have labeled them too (and
we'll fix that), if you really want to understand what's going on, you should
read the paper anyway — you're not going to get out of a few paras of
appetizer what took several pages of research paper to explain.

For your convenience, here's the paper link again:

[https://cs.brown.edu/~sk/Publications/Papers/Published/wkf-w...](https://cs.brown.edu/~sk/Publications/Papers/Published/wkf-who-tests-testers/)

I don't think the visualization is all that "unusual": it's a standard
presentation called a KDE
([https://en.wikipedia.org/wiki/Kernel_density_estimation](https://en.wikipedia.org/wiki/Kernel_density_estimation)).

If you just want to understand the graphs, the x-axis is Rate and y-axis is
Density, because it's a KDE (as the text says just above the first graph).
Think histogram (which it also says): How many students had this particular
true-positive/true-negative rate?
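
For anyone who hasn't met a KDE before, here is a minimal Python sketch of a
Gaussian one: each sample contributes a small bell curve, and the density at
a point is their normalized sum. The rates and the bandwidth below are made
up for illustration; they are not the paper's data or its plotting settings.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density at x with Gaussian kernels."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical per-student rates (NOT from the paper).
rates = [0.55, 0.60, 0.62, 0.90, 0.95]
density = gaussian_kde(rates, bandwidth=0.05)

# Density is higher where samples cluster and lower in the gap between them.
print(density(0.60) > density(0.75))
```

The y-axis value is a density rather than a count: the area under the curve
is 1, so the height tells you how concentrated the students are around a
given rate.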

But I think you really will understand this better by reading the paper.

------
Ninn
This post should get an F for not labeling the y-axis.

~~~
skrishnamurthi
Sure. We should fix that. But the post ends with a link to the actual paper,
which very much has labeled axes.

Anyway, the label is "density" because it's a KDE, which it says just above
the graph: "Visualized as kernel density estimation plots (akin to smoothed
histograms)".

Can we get our grade back now?

