

Visualizing 40,000 student code submissions - rsobers
http://www.stanford.edu/~jhuang11/research/pubs/moocshop13/codeweb.html

======
Shizka
Quite cool when you think about it. Each cluster probably represents a
different method for solving the problem. Awesome that it's possible to
classify the solutions like this. I think this might be usable for better
feedback on Coursera. Cool!

~~~
informatimago
Yes, and used in reverse, starting from a red cluster, you can derive a
working program. Now let's just find a way to find those clusters from problem
statements ;-)

~~~
Shizka
Ahh yes, I didn't think about that. I wonder if it would be possible to build
the best possible solution from this data in some way?

------
rube
Interesting that the outer edges basically have fewer failed
answers. I guess that means that there is a positive correlation between
thinking outside the box and success? ;)

------
cdman
Interesting choice of colors - red signifying that all the unit tests pass :-)
(that is usually considered "green")

~~~
yaddayadda
The authors say that the colors correspond to similar implementations that
result in similar behavior, with red specifically indicating that all tests
pass. (I totally agree with you that green would have been a much more
logical choice.) That only leaves green and blue. I'm curious what the
distinction is between those implementations (e.g., blue passed some of the
tests, green didn't pass any tests).

------
chrismorgan
Abstract art? Yes. Of the best variety!

A couple of years ago, I made some abstract art of the inheritance structure
of a large project written in a language with (extensively used) multiple
inheritance, there being around 1800 classes. No one was game to produce a
15m-wide, 30cm-high wallpaper (the traditional type) of it, so I just removed
the class names, leaving the classes as just dots, and made it my computer's
wallpaper. It's got quite a few comments. Still, it was nowhere near as pretty
as this.

~~~
akjetma
Do you still have the image? I've created a few myself and they're really fun
to look at. It's interesting to see the symmetry and orderliness of a project
in its early stages as compared to the Frankenstein's monster it eventually
becomes. I'll post mine if I can find or re-run them.

~~~
chrismorgan
Matter of fact, while I'm still officially employed by that company (part-time
casual), I haven't worked for them this year at all, having been focusing on
the final year of my Uni degree. And the images are at work. So I won't be
able to access it for at least a month and a half. I believe the number of
classes would now be in excess of 2,500. Certain things have been going on in
the past few years which have led to significant growth in both the
development team and the number of classes!

The language used is an in-house language, developed in the late 1980s and
early 1990s, and one that has aged surprisingly well (with comparatively few
modifications to the language since then), though there are now better options
available.

------
iMark
Looks like a load of Pollocks :)

------
khawkins
I don't exactly see the value in this visualization. Clustering measures and
feature analysis would provide far more insight into what's going on here. In
fact, it's not even clear how large the dominant clusters are or what all of
those speckles mean.

~~~
tlarkworthy
?

Clustering is putting similar things near similar things. Tree edit distance
is quite a natural measure of distance for tree-like things like programs.

You can't avoid some warping when putting high-dimensional manifolds on a
low-dimensional one. You can see a lot of their data does cluster properly but
there are some long-range red arcs (in the embedding space) which are side
effects of warping (they are near in data space).

You can see a cluster of green which is clearly of interest ... why did so
many students get the wrong answer in the same way?

I see lots of value in that picture.
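
For anyone wondering what tree edit distance looks like in practice, here is a
minimal sketch. It is not the authors' pipeline: it assumes Python's ast
module as a stand-in parser, labels nodes by type only (so variable names and
constants are ignored), charges a cost of 1 per insert, delete or relabel, and
uses a plain memoized recursion rather than an optimized algorithm. The names
to_tree, forest_distance, tree_distance and source_distance are made up for
illustration.

```python
import ast
from functools import lru_cache


def to_tree(node):
    """Turn an ast node into a hashable (label, children) tuple.

    Assumption: the label is only the node type, e.g. 'Assign' or 'BinOp'.
    """
    children = tuple(to_tree(c) for c in ast.iter_child_nodes(node))
    return (type(node).__name__, children)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0
    if not f:
        _, g_kids = g[-1]
        # Insert the rightmost root of g; its children take its place.
        return forest_distance(f, g[:-1] + g_kids) + 1
    if not g:
        _, f_kids = f[-1]
        # Delete the rightmost root of f; its children take its place.
        return forest_distance(f[:-1] + f_kids, g) + 1
    (f_label, f_kids), (g_label, g_kids) = f[-1], g[-1]
    return min(
        forest_distance(f[:-1] + f_kids, g) + 1,  # delete root in f
        forest_distance(f, g[:-1] + g_kids) + 1,  # insert root from g
        # Match the two rightmost roots, relabeling if they differ.
        forest_distance(f_kids, g_kids)
        + forest_distance(f[:-1], g[:-1])
        + (f_label != g_label),
    )


def tree_distance(a, b):
    """Edit distance between two single trees."""
    return forest_distance((a,), (b,))


def source_distance(src_a, src_b):
    """Edit distance between the ASTs of two Python source strings."""
    return tree_distance(to_tree(ast.parse(src_a)), to_tree(ast.parse(src_b)))
```

With these assumptions, source_distance("x = 1 + 2", "y = 3 + 4") is 0 because
only node types are compared, while swapping the + for a - costs a single
relabel. This recursion is only practical for small trees; anything at the
scale of 40,000 submissions would need a faster algorithm.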

------
mrcactu5
I keep meaning to take the machine learning course.

This is a great way of using metadata to search for patterns in student
assignments. This could detect different "approaches" or "strategies".

------
RVijay007
Probably also allows them to more easily detect cheating on coding
assignments.

~~~
mcherm
No, comparison of text rather than comparison of ASTs is better for that
purpose. There are many good reasons for ASTs to be equivalent and few good
reasons for text to match.
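
A rough sketch of that difference, assuming Python submissions and using only
the standard library. The two sample programs and the helper names
text_similarity and ast_shape are invented for illustration, and the "shape"
here is just a crude fingerprint (node types in breadth-first order), not a
full AST comparison.

```python
import ast
import difflib


def text_similarity(src_a, src_b):
    """Character-level similarity of two source strings, from 0.0 to 1.0."""
    return difflib.SequenceMatcher(None, src_a, src_b).ratio()


def ast_shape(src):
    """Crude structural fingerprint: node type names in breadth-first order."""
    return [type(node).__name__ for node in ast.walk(ast.parse(src))]


# Two hypothetical submissions that solve the same exercise the obvious way.
submission_a = (
    "def total(xs):\n"
    "    s = 0\n"
    "    for x in xs:\n"
    "        s += x\n"
    "    return s\n"
)
submission_b = (
    "def sum_list(values):\n"
    "    acc = 0\n"
    "    for v in values:  # add each item\n"
    "        acc += v\n"
    "    return acc\n"
)

same_shape = ast_shape(submission_a) == ast_shape(submission_b)
similarity = text_similarity(submission_a, submission_b)
```

Here same_shape is True even though the two students plausibly never saw each
other's work, while the text similarity stays below 1.0. Matching structure
alone is weak evidence of copying; near-identical text, including names and
comments, is much harder to explain innocently.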

------
mkelley82
Very cool visualization. I'd like to see how this sort of technique could be
applied to other real-world problems.

------
yeukhon
And probably a way to find who is cheating and who isn't :)

