Hacker News
Code Webs – Visualizing 40,000 student code submissions (stanford.edu)
39 points by ohjeez 1421 days ago | 16 comments

This was posted a couple of weeks ago.


Yep. Seems like the software should recognize that a lone "#" appended to the URL (and similar variations) still points to a duplicate page?

Very fascinating. I'm excited to see where they go with these data. The final paragraph is where the money is.

In particular, I think it'd be interesting to track students over time. Do some clusters have more difficulty picking up later concepts? Do their submissions, while correct, show some systematic error in their mental model of the language or topic?

It'd be very cool to give qualitative feedback in addition to the quantitative unit tests based on these clusters. E.g., "Your code, while correct, is demonstrating characteristics that may be less maintainable than other submissions. In addition, we recommend a review of [some topic]; using those concepts would simplify your code."

Is this also what is used to enforce academic standards (i.e. each student doing individual work)? I've heard CS lodges the most official complaints of any department.

From what I read, I don't believe it is. In my opinion it should also never be used for this purpose.

While you could probably catch a lot of cheaters this way, there is the possibility of a large false positive rate. If so, I would especially advise against deploying this type of software at a traditional university, since academic dishonesty policies can often cause significant and undue harm to an innocent student.

Good comment, but I wouldn't say never.

As an instructor of programming on a university level, I like to think that I have enough sense to know that particularly for "trivial" assignments, some similarity is expected. However, as I've encountered, a great deal of similarity over multiple assignments (and exams) between two students of the same nationality who sit together in class provides additional evidence of plagiarism.

So, yes, I agree a single data point of similarity is insufficient, but a history of similarity, particularly in complex projects, becomes more damning.

I got flagged as a freshman for "55% similarity" (whatever that meant) to another student's submission in a "learn how to write shit in C++" type assignment. As far as I could tell, the only thing that triggered the software was the fact that both I and the other kid used do-while loops, while nobody else in the course did. The rest of the programs were semi-similar, just a few lines of cout/cin/<</>>/... to ask your name and echo it back.

So basically what I'm saying here is that I think "for "trivial" assignments, some similarity is expected" isn't always widely understood, to the detriment of students.

I think these sorts of systems become most valuable when used to check work against submissions from previous years, to bust frat-house collections of answers, though varying the questions from year to year probably helps even more in that regard. Similarity between complex projects in the class sizes that were typical at my university (in classes advanced enough to have complex answers) was pretty easy to spot manually. Maybe edit-distance software is useful there to put some weight behind accusations?
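For anyone curious what these similarity scores usually measure: a common approach (not necessarily what any particular university's tool does) is Jaccard similarity over token n-grams, which is exactly the kind of metric that can over-fire on short "trivial" assignments. A minimal sketch, with illustrative function names:

```python
def token_ngrams(source, n=4):
    """Split source into whitespace tokens and collect all n-grams."""
    tokens = source.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def similarity(a, b, n=4):
    """Jaccard similarity between the n-gram sets of two submissions."""
    grams_a, grams_b = token_ngrams(a, n), token_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)

# Two short C++-style submissions that differ only in a variable name
# score well above zero, even though both are the "obvious" answer.
sub1 = "int main() { int x = 0; do { x++; } while (x < 10); return 0; }"
sub2 = "int main() { int y = 0; do { y++; } while (y < 10); return 0; }"
print(round(similarity(sub1, sub2), 2))
```

On a tiny assignment almost every n-gram is boilerplate, so two honest students converge on high scores; on a multi-file project, shared n-grams are far more telling.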

I'd prefer that to a Jackson Pollock.

Has anyone managed to compute the value of the Hausdorff-Besicovitch dimension?

Now it's a very long time since I did fractal geometry, but isn't the Hausdorff dimension of a simple countable set 0 [zero]?

"Now it's a very long time since I did fractal geometry..." Yep, same here I'm afraid. But a Pollock at the fundamental level is just atoms, which I would think is just another countable set. Still, research has been done into the fractal nature of his paintings. Apparently as he matured, the Hausdorff dimension increased.
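As I recall, the published Pollock analyses estimated the dimension by box counting: cover the image with boxes of side s, count how many boxes N(s) contain paint, and fit the slope of log N(s) against log(1/s). A toy sketch of that procedure (illustrative only, not the actual methodology or code from those papers):

```python
import numpy as np

def box_counting_dimension(image, sizes=(1, 2, 4, 8, 16)):
    """Estimate the box-counting dimension of a binary 2D array:
    count occupied boxes N(s) at each box size s, then fit the slope
    of log N(s) against log(1/s)."""
    counts = []
    h, w = image.shape
    for s in sizes:
        # Count boxes of side s that contain at least one filled pixel.
        n = sum(
            image[i:i + s, j:j + s].any()
            for i in range(0, h, s)
            for j in range(0, w, s)
        )
        counts.append(n)
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope

# Sanity check: a completely filled square should come out near 2.
square = np.ones((64, 64), dtype=bool)
print(box_counting_dimension(square))
```

A fractal drip pattern would land somewhere strictly between 1 and 2, which is what makes the "dimension increased as he matured" claim measurable at all.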

I'm thinking of real phenomena exhibiting a 'partial' fractal nature. I think you are thinking in the pure maths realm.

BTW I still don't like his paintings generally. Manet and Holbein are more my thing, or Morandi on certain occasions.

I've been working in a public facing creative arts role for a while now and my appreciation of the likes of Pollock has become somewhat more favourable over that time.

Hadn't heard of Morandi, not sure his stuff means much to me, however I like this photographic interpretation of his work - http://static.dezeen.com/uploads/2009/06/dc03_ins.jpg (to sell a dinner service). Thanks for the pointer.

Isn't comparing code on edit distance a bit too simplistic? If I pull some functionality into a separate function, the syntax tree already becomes quite different.

Read the article; it's not text edit distance, it's AST edit distance. If the extracted function were the same as the code inline, I think this would have little effect, though that might depend on the AST parsing.
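To illustrate why the AST view matters: once you parse and normalize away surface details like identifiers and formatting, textually different submissions can have identical trees. A toy sketch in Python (not the article's actual pipeline), using the standard-library ast module:

```python
import ast

def ast_shape(source):
    """Parse source and strip identifiers and constant values, so only
    the tree structure remains for comparison."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            node.id = "_"
        elif isinstance(node, ast.arg):
            node.arg = "_"
        elif isinstance(node, ast.FunctionDef):
            node.name = "_"
        elif isinstance(node, ast.Constant):
            node.value = 0
    return ast.dump(tree)

# Same algorithm, every name changed: identical shapes.
a = "def total(xs):\n    s = 0\n    for x in xs:\n        s = s + x\n    return s"
b = "def acc(values):\n    r = 0\n    for v in values:\n        r = r + v\n    return r"
print(ast_shape(a) == ast_shape(b))
```

A real tree edit distance would then count the insertions, deletions, and relabelings needed to turn one normalized tree into the other, so refactorings like extracting a function register as a structural change rather than a wall of textual diffs.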

I'd love to see something like this visualizing the evolution of a complex software project, such as the Linux kernel's git history.

That is pretty amazing.
