

Show HN: A client-side Bayes classifier for Hacker News - rogerbraun
http://rogerbraun.net/a-client-side-bayes-classifier-for-hacker-new

======
moconnor
I tried to train Bayesian (and other) classifiers to reliably pick the same
stories to read as I would. Despite looking at a variety of things (title,
poster, domain, corpus from the article, corpus of the comments), I found their
accuracy was never really better than 60%.

Then I tried rating the same set of articles myself several times. My accuracy
was only around 60% too.

Figures.
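
For anyone who wants to repeat the experiment, here is a minimal sketch of a title-only naive Bayes classifier, plus a helper that measures how often two rating passes over the same stories agree. It is not the code from the linked post or from the experiment above. The class name, the "read"/"skip" labels, and the training titles are all made up for illustration, and it uses only the title rather than the poster, domain, or comment text.

```python
import math
import re
from collections import Counter


def tokenize(title):
    """Lowercase a title and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token counts
        self.doc_counts = Counter()  # label -> number of training titles
        self.vocab = set()           # every token seen during training

    def train(self, title, label):
        tokens = tokenize(title)
        self.word_counts.setdefault(label, Counter()).update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def log_scores(self, title):
        """Return an unnormalised log-probability for each known label."""
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokenize(title):
                score += math.log((counts[tok] + 1) / denom)
            scores[label] = score
        return scores

    def classify(self, title):
        scores = self.log_scores(title)
        return max(scores, key=scores.get)


def agreement(first_pass, second_pass):
    """Fraction of items given the same label in two rating passes."""
    same = sum(a == b for a, b in zip(first_pass, second_pass))
    return same / len(first_pass)


# Hypothetical training titles, invented for this example.
clf = NaiveBayes()
clf.train("Show HN: a Bayes classifier in the browser", "read")
clf.train("Machine learning for text classification", "read")
clf.train("10 habits of productive founders", "skip")
clf.train("Yet another JavaScript framework released", "skip")

label = clf.classify("A text classifier for the browser")
```

Running `agreement` on two of your own rating passes gives a rough ceiling for the classifier: if you only agree with yourself about 60% of the time, a model trained on those labels is unlikely to do much better.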

~~~
duck
Yeah, I think it shows how hard it is to classify such a diverse set of
stories. Each week for my Hacker Newsletter project I have to come up with a
short list of links to share. I don't ever want to make it "automated", but at
the same time I need to narrow down what to pick from. I have tried several
things in the past, but what has worked best for me is using a combination of:
what I voted up, votes, # of comments, if <user> commented, time on front
page, and finally a lot of regex filters. Tying all that together with a
simple interface/tool allows me to find a list of articles that I think my
subscribers will enjoy.
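
A rough sketch of how signals like these could be combined into a shortlist is below. It is not the actual Hacker Newsletter tool. The `Story` fields, the weights, and the regex patterns are hypothetical placeholders chosen only to show the shape of the approach: regex filters drop stories outright, and everything else gets a weighted score for a human to review.

```python
import re
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    points: int
    comments: int
    hours_on_front_page: float
    voted_up_by_me: bool = False
    watched_user_commented: bool = False


# Hypothetical exclusion patterns; a real list would be much longer.
EXCLUDE_PATTERNS = [
    re.compile(r"\bhiring\b", re.IGNORECASE),
    re.compile(r"^ask hn: who", re.IGNORECASE),
]

# Hypothetical weights; these would need tuning by hand.
WEIGHTS = {
    "points": 1.0,
    "comments": 0.5,
    "hours_on_front_page": 2.0,
    "voted_up_by_me": 50.0,
    "watched_user_commented": 25.0,
}


def score(story):
    """Return a weighted score, or None if a regex filter excludes the story."""
    if any(p.search(story.title) for p in EXCLUDE_PATTERNS):
        return None
    total = WEIGHTS["points"] * story.points
    total += WEIGHTS["comments"] * story.comments
    total += WEIGHTS["hours_on_front_page"] * story.hours_on_front_page
    if story.voted_up_by_me:
        total += WEIGHTS["voted_up_by_me"]
    if story.watched_user_commented:
        total += WEIGHTS["watched_user_commented"]
    return total


def shortlist(stories, limit=10):
    """Return up to `limit` non-excluded stories, highest score first."""
    scored = [(score(s), s) for s in stories]
    kept = [(sc, s) for sc, s in scored if sc is not None]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in kept[:limit]]


# Made-up stories for illustration.
stories = [
    Story("Small post", 10, 2, 1.0, voted_up_by_me=True),
    Story("Acme is hiring engineers", 500, 300, 12.0),
    Story("Show HN: Thing", 100, 40, 5.0),
]
picked = [s.title for s in shortlist(stories)]
```

The output is only a candidate list, so the final pick stays manual, which matches the goal of narrowing things down without automating the choice.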

------
polyfractal
Very cool! I've been hacking around with modifying HN's interface via JS a lot
recently - this will be a welcome tool in my experiments.

One comment: The up/down votes are really "strong" visually. Perhaps make them
smaller and/or lighter in color?

------
gauravk92
Maybe it's easier simply to classify things you wouldn't want to read and hide
those as less interesting. Because of the variety of topics, training
something to figure out what you like seems to restrict the flow much more.

E.g. if you rarely read things with ".js" (stupid amounts of JS library posts
here), it'll be easier to say "this is uninteresting to me" than to classify
everything else as interesting so the algorithm has to infer that you find JS
libraries uninteresting.

Although I'm pretty interested in Node, but not necessarily in JS libraries
for APIs. Tough problem indeed.
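
A minimal sketch of the hide-only idea, assuming plain regex rules on the title instead of a trained model. The `HIDE` and `KEEP` patterns and the sample titles are hypothetical. The `KEEP` list is one possible way to handle the Node case: a title that matches a keep pattern stays visible even if a hide pattern also matches.

```python
import re

# Hypothetical rules: hide ".js" library posts, but never hide Node posts.
HIDE = [re.compile(r"\.js\b", re.IGNORECASE)]
KEEP = [re.compile(r"\bnode\b", re.IGNORECASE)]


def is_hidden(title):
    """True if a hide pattern matches and no keep pattern overrides it."""
    if any(p.search(title) for p in KEEP):
        return False
    return any(p.search(title) for p in HIDE)


def visible(titles):
    """Return the titles that are not hidden, in their original order."""
    return [t for t in titles if not is_hidden(t)]


# Made-up titles for illustration.
titles = [
    "Backbone.js 0.9 released",
    "Node.js streams explained",
    "A faster JSON parser",
]
shown = visible(titles)
```

Everything that matches no rule stays visible by default, so new topics are never filtered out just because they are unfamiliar.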

------
Gring
As an alternative, just trust the HN home page algorithm.

Stories seem to move up to a relevant max rank position, stay there and then
move back down. Big stories stay in the top 5 for 20+ hours.

Here's what I do: If I only have time to look at 5 stories per day, I visit
once per day at any point in time and look at the first 5 stories. If I have
time to look at 20, I look at the first 20.

Set yourself a timeout, start reading at the top, stop when the time is up,
repeat after 12 or 24 hours. Works very well for me, I get the best stories,
and feel pretty well informed.

~~~
achompas
HN's algorithm is great at pushing linkbait, self-help articles, and
frameworks associated with popular languages to the top.

None of those are interesting to me, however, so a personalized HN classifier
is awesome.

~~~
Gring
Don't you think that submissions are motivated by getting to the home page? So
with time, submissions become similar to what's on the home page.

If the home page is so bad according to you, wouldn't you agree that the
actual submissions would still be a sub-par source for your filter, and that a
different source would be preferable?

------
growt
Nice work. I hope it gets more attention in the next few hours. Seems like an
interesting starting point for all kinds of experiments.

~~~
noelwelsh
Agreed. I love the idea of implementing it in the browser. So simple and
elegant, and obvious in hindsight. Why didn't I think of it!?

