

Ask HN: If you did a CS related bachelors, what was your final solo project? - nstart

Also interesting to note would be what happened to your final project/dissertation once you completed your degree.
======
nstart
I'll go first. I made a prototype/proof of concept tool that could detect if
multiple authors had contributed to a project dissertation that should have
been written by a single author. I got the idea when I met a final year
student who was far off from finishing his project with just a few days to go.
He explained to me that someone else (a friend) was writing about four
chapters of his dissertation for him.

Even if lecturers read and realised the obvious stylistic differences between
the chapters written by the student and those written by his friend, a simple
"gut feeling" would not suffice as proof to accuse said student of plagiarism.
Thus I built a tool that would statistically analyse the writing styles of the
students and spit out a probability/certainty report that said which chapters
were almost certainly written by someone else.
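
A rough sketch of the general idea, not the actual tool: one common stylometric feature is how often an author uses function words. The word list, the function names and the distance measure below are all assumptions chosen for illustration, and the score is a raw distance, not a calibrated probability.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny function-word list; real stylometry uses far more features.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "is", "was"]

def profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def distance(p, q):
    """Euclidean distance between two style profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def outlier_scores(chapters):
    """For each chapter, distance from the mean profile of all other chapters.

    Needs at least two chapters. A chapter with a much higher score than the
    rest is a candidate for having a different author.
    """
    profiles = [profile(c) for c in chapters]
    scores = []
    for i, p in enumerate(profiles):
        others = [q for j, q in enumerate(profiles) if j != i]
        mean = [sum(col) / len(others) for col in zip(*others)]
        scores.append(distance(p, mean))
    return scores
```

Calling `outlier_scores` on a list of chapter strings gives one score per chapter; short texts give very noisy profiles, which fits the point that this only works for long-form writing.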

It worked at a very basic level for long form texts. I never ended up making
it an actual product simply because I've lacked the time and determination to
collect and organise (especially organise) the material required to train a
system into being a real usable product. I had big dreams for it, including
use in forensics and courts of law (answering questions such as: was the
witness/accused/victim forced to write the letter? Did they actually write it
themselves?). Ah well. Someday maybe.

~~~
namelezz
I am currently interested in stylistics and stylometry. Do you know of good
resources on this topic?

~~~
nstart
Honestly, there was a TON of research on this topic. I'll try and dig out the
paper that was the foundation of my dissertation. Till then I can tell you
what I did for my own reading. I simply googled for stylometry research
papers and then, like following links in a Wikipedia article, looked up each
paper referenced by the one I was reading. Stylometry is a very vast topic,
so I might be able to help further if you could say how you want to apply it.
Plagiarism? Long form document authorship verification? Email verification?
Writing assistance?

------
EnderMB
A bit of back-story; during my second year of university I picked up C# after
a summer internship at a local company. I was purely taught Java at
university, but was keen to learn other languages as I wanted to expand my
knowledge past what Java would offer me. Sure, Java to C# isn't a huge
paradigm shift, but after playing with LINQ I was sold on .NET.

For my final-year project, I built a search engine from scratch using C# that
would rank websites based on "how they looked". This involved looking at the
various data structures required to store and retrieve this information,
alongside figuring out how to rank pages on both content and their looks. To
handle the looks, I took a screenshot of each page, reduced the size to around
100x100 and scanned the colours of each pixel to see if text would contrast.
Alongside this I scanned the CSS of the page to check the text colour and see
if it matched the pixels; if it did, I did a comparison against this. I can't
remember the exact algorithms I used, but if you used dark text on a dark
background you'd get ranked down, and vice versa, alongside some other
metrics.
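
Since the exact algorithms aren't recalled above, here is a stand-in and not the original method: the WCAG contrast ratio between a text colour and a background colour, turned into a ranking penalty. The `contrast_penalty` function and its 4.5 threshold (borrowed from the WCAG AA guideline for normal text) are assumptions for illustration.

```python
def relative_luminance(rgb):
    """WCAG relative luminance of an (r, g, b) colour with 0-255 channels."""
    def linearise(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearise(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """WCAG contrast ratio, from 1 (identical colours) to 21 (black on white)."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)

def contrast_penalty(fg, bg, threshold=4.5):
    """Hypothetical ranking penalty: 0 when readable, 1 when fg equals bg."""
    ratio = contrast_ratio(fg, bg)
    if ratio >= threshold:
        return 0.0
    return (threshold - ratio) / (threshold - 1)
```

Dark grey text on a slightly lighter dark grey background, e.g. `contrast_penalty((20, 20, 20), (40, 40, 40))`, gets a penalty close to 1, while black on white gets none.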

The result was very crude, and looking back on it now the code was terrible. I
had implemented the data structures myself, had a semi-decent crawler set up,
and could sit all the tools alongside each other to crawl the Internet. Most
importantly, it worked so I ended up getting decent marks! While the code
makes me cringe, my write-up was fairly impressive, and I felt that writing
40k words over a year helped my writing ability considerably.

------
UnoriginalGuy
I made an electronic voting system. Nothing too fancy, it was 85% documentation
and 15% code (because very little of the final grade was about code anyway).

It was an attempt at reproducing proctors in an electronic system. Namely,
interested parties could sign up to receive the votes in real time. It had two
distinct kinds of databases/receivers (vote tallies, and a list of people who
had voted).

The two databases were designed to be auditable, but I went to pains so that
the two couldn't trivially be merged later, so discovering who someone voted
for was non-trivial. This was done in part by delivering the votes in real
time (no caching) while delivering the list of people who voted in big dumps
and randomising them continuously (so that the order in which they voted
couldn't be recovered). The voter database didn't even store which terminal
they used, only which location (since that is needed to check it against the
paper list later).
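
A minimal sketch of that separation, under my own assumptions rather than the original design: every class and method name here is hypothetical, and the real system used networked receivers, not in-memory objects. Votes go to the tally immediately with no voter identity, while voter records are held back, shuffled, and delivered in batches.

```python
import random

class TallyReceiver:
    """Receives votes one at a time; stores choices only, never who voted."""
    def __init__(self):
        self.counts = {}

    def receive_vote(self, choice):
        self.counts[choice] = self.counts.get(choice, 0) + 1

class VoterReceiver:
    """Receives who voted in batches; stores location but no terminal."""
    def __init__(self):
        self.voters = []

    def receive_batch(self, batch):
        self.voters.extend(batch)
        random.shuffle(self.voters)  # keep re-randomising the stored order

class VotingStation:
    def __init__(self, location, tally, roll, batch_size=3):
        self.location = location
        self.tally = tally
        self.roll = roll
        self.batch_size = batch_size
        self.pending = []

    def cast(self, voter_id, choice):
        self.tally.receive_vote(choice)  # sent in real time, no caching
        self.pending.append((voter_id, self.location))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        random.shuffle(self.pending)  # hide the order in which people voted
        self.roll.receive_batch(self.pending)
        self.pending = []
```

Because the tally never sees a voter ID and the voter list never sees a choice or an arrival order, joining the two afterwards would need outside information such as precise timestamps, which is why a batch size of one would defeat the purpose.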

Essentially it was four pieces of software: a voting station, a vote
receiver/database, a voter receiver/database, and a "vote designer" which
just set up the voting station's config (which was XML).

It was originally meant to have a piece of software so that people could
register themselves to become receivers/proctors but it was never implemented
and I had more than enough to talk about with the design and what was
implemented (and it also left me something to talk about in my areas of
improvement section).

After I graduated it was never touched again and I don't even know if I still
have the code...

------
snickmy
Mine was composed of a dissertation and a project. I went the extra mile; it
wasn't required. In my dissertation I "leisurely" verified that the general
problems of collaborative filtering also appeared in a specific music dataset
(which for the sake of secrecy I'm not going to reveal, but you can consider
it a music dataset as big as the one available in the iTunes Store 5 years
ago). My point was: if you like A and B, and another user likes A, B and C,
it is not necessarily true that you have to like or be exposed to C as well.
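
To make the A/B/C point concrete, here is a small sketch of plain user-based collaborative filtering, which is the baseline being criticised and not the later solution. The Jaccard similarity and the function names are assumptions for illustration.

```python
def jaccard(a, b):
    """Overlap between two sets of liked items, from 0 to 1."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target, likes):
    """Score items the target has not liked by the similarity of users who did.

    `likes` maps each user to the set of items they like. Returns a list of
    (item, score) pairs, best first.
    """
    mine = likes[target]
    scores = {}
    for user, items in likes.items():
        if user == target:
            continue
        sim = jaccard(mine, items)
        if sim == 0:
            continue  # users with nothing in common contribute nothing
        for item in items - mine:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

With `likes = {"you": {"A", "B"}, "other": {"A", "B", "C"}}`, `recommend("you", likes)` puts C at the top purely because of the overlap on A and B, whether or not C suits you.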

Later on I decided to try out and implement a better solution. In short it was
on paper a behavior driven solution for matching preferences, with an
underlying layer of audio analysis and a collaborative fallback. If I look
back at it, honestly speaking, the implementation was a mess and probably
biased towards the dataset.

I did a Master's in ML + AI applied to Multimedia with a focus on audio
signals. I went through the same idea, this time with a more scientific
approach. Graduated. Patented part of the algorithm, sold to a company, hired
by the same company, integrated in their system, and moved on :)

In other words, pick something that you are passionate about; it may turn
into a shortcut into the working world and give you a starting position much
higher than other grads.

------
DLion
For my High School final solo project I made a few computer vision
applications, using OpenCV and C. Using the webcam you can "play", you can
draw, etc. using the color tracking method; I made a simple head tracker and
motion detector too. You can find it here:
[https://github.com/dlion/ExamProject](https://github.com/dlion/ExamProject)
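
For anyone curious what colour tracking boils down to, here is a minimal pure-Python sketch. It is not the OpenCV/C code from the repository; the frame format (a nested list of RGB tuples), the function name and the tolerance value are assumptions for illustration.

```python
def track_colour(frame, target, tolerance=40):
    """Return the (x, y) centroid of pixels close to the target colour.

    `frame` is a list of rows, each row a list of (r, g, b) tuples.
    Returns None when no pixel is within `tolerance` on every channel.
    """
    sum_x = sum_y = count = 0
    for y, row in enumerate(frame):
        for x, pixel in enumerate(row):
            if all(abs(p - t) <= tolerance for p, t in zip(pixel, target)):
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return (sum_x / count, sum_y / count)
```

Running this on each webcam frame and joining successive centroids with lines is the basic idea behind drawing with a coloured object.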

