

Ask HN: University checks for cheaters when delivering code assignments. How? - redxblood

My university checks for possible cheaters when we deliver programming assignments by feeding them to a program that returns the probability that two programs contain code copied from each other. How does the program work in essence?
======
eksith
It may use something like Kolmogorov complexity
[https://en.wikipedia.org/wiki/Kolmogorov_complexity#Compress...](https://en.wikipedia.org/wiki/Kolmogorov_complexity#Compression)

...where symbol matching is used along with variable names after removing all
extra white space.

Although I'm guessing your university uses PMD:
[http://pmd.sourceforge.net](http://pmd.sourceforge.net) It's a fairly popular
tool for detecting code issues (including copied segments).
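
For what it's worth, the compression idea can be sketched in a few lines. This is only a rough illustration, not what any particular university tool does: it uses zlib as a stand-in for Kolmogorov complexity and computes the normalized compression distance between two sources after collapsing extra whitespace. The function names and the sample snippets are made up for the example.

```python
import re
import zlib


def normalize(source: str) -> bytes:
    """Collapse every run of whitespace to a single space."""
    return re.sub(r"\s+", " ", source).strip().encode("utf-8")


def compressed_size(data: bytes) -> int:
    """Compressed length, used here as a crude complexity estimate."""
    return len(zlib.compress(data, 9))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance: lower means more shared content."""
    x, y = normalize(a), normalize(b)
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical submissions, invented for this sketch.
ORIGINAL = """
def total(values):
    result = 0
    for value in values:
        result += value
    return result
"""

# Same code with the whitespace stretched out.
RESPACED = ORIGINAL.replace(" ", "   ").replace("\n", "\n\n")

UNRELATED = """
class Stack:
    def __init__(self):
        self.items = []
    def push(self, item):
        self.items.append(item)
    def pop(self):
        return self.items.pop()
"""

if __name__ == "__main__":
    print("respaced copy:", ncd(ORIGINAL, RESPACED))
    print("unrelated:    ", ncd(ORIGINAL, UNRELATED))
```

The respaced copy should score much closer to 0 than the unrelated file. On very short files the compressor's fixed overhead dominates, so the numbers only become meaningful on assignments of some length.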

------
alt_f4
Take a look at this:

[http://theory.stanford.edu/~aiken/moss/](http://theory.stanford.edu/~aiken/moss/)

Specifically:

[http://theory.stanford.edu/~aiken/publications/papers/sigmod...](http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf)
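
The paper linked above describes winnowing: hash every k-character substring, slide a window over the hashes, and keep the minimum from each window as a fingerprint. Below is a minimal sketch of that idea, not MOSS itself. The values of `k` and `w`, the use of CRC32 as the hash, and the Jaccard score at the end are all assumptions picked for illustration.

```python
import zlib


def kgram_hashes(text: str, k: int) -> list[int]:
    """Hash every k-character substring after dropping whitespace and case."""
    cleaned = "".join(ch.lower() for ch in text if not ch.isspace())
    return [
        zlib.crc32(cleaned[i:i + k].encode("utf-8"))
        for i in range(len(cleaned) - k + 1)
    ]


def fingerprints(text: str, k: int = 5, w: int = 4) -> set[int]:
    """Keep the minimum hash of each window of w consecutive k-gram hashes."""
    hashes = kgram_hashes(text, k)
    if not hashes:
        return set()
    selected = set()
    # If there are fewer than w hashes, treat them as one short window.
    for start in range(max(1, len(hashes) - w + 1)):
        window = hashes[start:start + w]
        selected.add(min(window))
    return selected


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two fingerprint sets (an assumed scoring choice)."""
    fa, fb = fingerprints(a), fingerprints(b)
    if not fa and not fb:
        return 1.0
    return len(fa & fb) / len(fa | fb)


if __name__ == "__main__":
    shared = "for i in range(len(xs)): total += xs[i] * weights[i]"
    first = "alpha = 1\n" + shared
    second = "zeta = [2, 3]\n" + shared + "\nprint(total)"
    print(similarity(first, second))
```

Any shared run of at least `w + k - 1` characters (after cleaning) fills a whole window in both files, so both files pick the same minimum from it and end up with at least one fingerprint in common. A real tool would also record positions so it can show the matching regions.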

------
borplk
I'm no expert in this, but I suppose it works just like how they check for
plagiarism in essays.

They usually compare submissions with other submissions, past submissions and
also against similar references on the internet in case a match is found on
some website.

~~~
redxblood
I guess... but there has to be something else. You can plagiarize an essay, and
finding that is kinda easy because you copy-paste word by word and so, it's
obvious. But two programmers can have identical pieces of code without that
meaning they copied each other, don't you think?

~~~
sold
Not really; discounting extremely simple programs like print "Hello world",
there are many possible stylistic differences and people will write things
differently. One person will write for, another while; one i=0;j=0, another
j=0;i=0; another will name variables differently; another will take two lines
and make them into a procedure; one will write "if x then return y else return
z", another "if x then return y; return z;", another "if not x then return z
else return y" another "return x ? y : z" etc. If your assignment is at least
50 lines of code, compare it with someone else. You will see tons of
differences.

------
RougeFemme
Are you asking out of intellectual curiosity or a desire to beat the system?

~~~
redxblood
Curiosity. If I were capable of beating the system, then I might as well do the
assignment!

------
seiji
It's probably a combination of runtime analysis and code construction
analysis.

Submissions with similar runtimes are probably using the same algorithm. If most
students use the proper way, but a group of five students all, by "chance,"
pick an obscure dumb way (or a really obscure fast way), then they may be
copying. Manual investigation would be triggered.

Look into how JavaScript minimizers/obfuscators work. They could be parsing
your code down to a form where your variable names and comments don't matter
at all. They're just analyzing and looking for coincident structure of
programs.

Now, your administration is smart enough to not flag code where there's one
way to do it (write "find a member of a list"), but if 10 people copy the same
wrong way to do it, it would be a flag to investigate further.
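
To make the "variable names and comments don't matter" idea concrete, here is a rough sketch using Python's standard `tokenize` module. It is only a guess at the general approach, not a description of any real checker: identifiers become `ID`, literals become `NUM` or `STR`, comments and blank lines are dropped, and the remaining token streams are compared. The function name `structure` and the sample programs are invented for the example.

```python
import io
import keyword
import tokenize


def structure(source: str) -> list[str]:
    """Reduce Python source to a token stream that ignores names and comments."""
    stream = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER):
            continue  # comments and blank lines carry no structure
        if tok.type == tokenize.NAME:
            # Keywords stay as they are; every other name collapses to "ID".
            stream.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type == tokenize.NUMBER:
            stream.append("NUM")
        elif tok.type == tokenize.STRING:
            stream.append("STR")
        elif tok.type == tokenize.OP:
            stream.append(tok.string)
        else:
            # NEWLINE, INDENT, DEDENT and anything else: keep only the kind.
            stream.append(tokenize.tok_name[tok.type])
    return stream


# Hypothetical submissions: B is A with renamed variables and a moved comment.
A = (
    "def total(xs):\n"
    "    # sum the list\n"
    "    s = 0\n"
    "    for x in xs:\n"
    "        s += x\n"
    "    return s\n"
)
B = (
    "def add_all(values):\n"
    "    acc = 0  # accumulator\n"
    "    for v in values:\n"
    "        acc += v\n"
    "    return acc\n"
)
# C solves the same task with a different loop shape.
C = (
    "def total(xs):\n"
    "    s = 0\n"
    "    i = 0\n"
    "    while i < len(xs):\n"
    "        s += xs[i]\n"
    "        i += 1\n"
    "    return s\n"
)

if __name__ == "__main__":
    print(structure(A) == structure(B))
    print(structure(A) == structure(C))
```

Renaming and recommenting leave the stream unchanged, while switching from `for` to `while` does not. A checker would presumably run some fuzzy comparison over these streams rather than strict equality.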

