
Enough Machine Learning to Make Hacker News Readable Again [video] - Wingman4l7
http://pyvideo.org/video/2612/enough-machine-learning-to-make-hacker-news-reada
======
swalsh
This guy right here made watching the video worth it:
[http://i.imgur.com/twr2j8Y.png](http://i.imgur.com/twr2j8Y.png)

~~~
platz
Why is it that genetic algorithms never seem to be mentioned any more? Are
they sub-standard, or just "higher-level" than what is typically talked about,
i.e. you must implement them yourself?

~~~
Russell91
Genetic algorithms are not really an off-the-shelf black box that you can just
plug your data into and get results. They take a domain expert to use
efficiently, and even then they aren't guaranteed to perform that well. The
area that I've encountered where they are most effective is in approximation
heuristics for NP-hard problems where you slowly assemble a solution from
smaller pieces.
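
A minimal sketch of what that looks like, assuming 0/1 knapsack as the NP-hard
problem. The function name, the parameters, and the selection/crossover/mutation
choices are all illustrative, not taken from the talk; picking them well is
where the domain expertise comes in.

```python
import random


def knapsack_ga(items, capacity, pop_size=30, generations=60, seed=0):
    """Toy genetic algorithm for 0/1 knapsack; items are (value, weight) pairs."""
    rng = random.Random(seed)
    n = len(items)  # assumes at least two items

    def fitness(bits):
        value = sum(v for (v, w), bit in zip(items, bits) if bit)
        weight = sum(w for (v, w), bit in zip(items, bits) if bit)
        return value if weight <= capacity else 0  # overweight solutions score zero

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = [0] * n  # the empty knapsack is always feasible

    def remember_best():
        nonlocal best
        top = max(population, key=fitness)
        if fitness(top) > fitness(best):
            best = top[:]

    for _ in range(generations):
        remember_best()
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = mother[:cut] + father[cut:]
            if rng.random() < 0.2:  # occasional single-bit mutation
                i = rng.randrange(n)
                child[i] = 1 - child[i]
            children.append(child)
        population = parents + children
    remember_best()
    return best, fitness(best)


# Made-up items: (value, weight)
items = [(60, 10), (100, 20), (120, 30), (30, 5), (40, 15), (70, 25)]
best_bits, best_value = knapsack_ga(items, capacity=50)
```

Nothing here guarantees the optimum; the only promise is that the returned
selection fits within the capacity.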

~~~
mendicantB
+1 I'd also add that genetic algorithms are for optimization, and can't really
be compared with most of the algorithms in that chart. It'd be a sub-level
where different optimization techniques for finding model weights are
compared for each type of approach (classification, clustering, etc.).

~~~
simonster
Most (all?) of the algorithms on the chart iteratively optimize an objective.
However, most of the objectives are convex or otherwise admit an optimization
strategy that performs better than a genetic algorithm.
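
As a rough illustration of that point, here is a sketch of plain gradient
descent on a convex objective (mean squared error for a straight-line fit).
The function name, learning rate, step count, and data are made up for the
example; the point is only that a convex objective lets you follow the
gradient straight to the minimum, with no population or mutation needed.

```python
def fit_line(xs, ys, lr=0.01, steps=5000):
    """Gradient descent on mean squared error for y = w*x + b."""
    w = b = 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1
w, b = fit_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
```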

~~~
mendicantB
I believe you are repeating what I said (?). All of the algorithms have
different methods of arriving at an objective function and leveraging its
results. Yet, most share the same problem in terms of optimizing it, and yes,
most choose other routes.

------
nostromo
Cool project!

But HN to me is a way to keep current on what people in tech are talking
about. I don't want to live in a bubble. I want to discover new things that
smart people think are cool.

~~~
tannerc
Something that keeps bringing me back to HN specifically (over the likes of
Reddit, Twitter, etc.) is the sheer intelligence of conversations.

More often than not I'll find myself skimming through the discussion here
before exploring the linked material. The reasoning behind why people feel
something deserves to be "front paged" and the insights that domain experts
offer in the discussion are what (I feel) make HN valuable. Taking away the
brainy aspect behind how the community works would be an interesting
experiment, alas one I wouldn't want to see _replace_ what we have today.

"Things smart people think are cool" is nearly an understatement.

~~~
gabemart
This is _exactly_ what people on reddit used to say about reddit six or seven
years ago. I have to believe that the decline in reddit's general quality, and
in the quality of its front page most of all, could happen in some form to HN
without vigilance on the part of users and mods.

~~~
krapp
Ironically, dang himself just got accused of posting something "not worthy" of
HN:
[https://news.ycombinator.com/item?id=7712692](https://news.ycombinator.com/item?id=7712692)

There seems to be a remarkable difference between what people feel HN should
be, and what it actually is. "Quality" seems to be more of a sliding scale
which correlates to personal bias, and perhaps, a feeling of alienation
brought about by a diverse community, than anything objective.

------
pygy_
Here's the result of the presentation: [http://hn.njl.us/](http://hn.njl.us/)

The classifier rejects this very submission... not sure what to think of it.

Maybe the "[video]" label killed it? The fact that it references HN?

~~~
ColinWright
More likely the retrieved text from the page itself contains very little of
technical interest. If there were a transcript I expect it would fare better.

Great reminder - material in a video is undiscoverable.

~~~
egwor
Just my 2c; I can often get data faster by reading than waiting for someone to
explain it verbally, so I usually prefer not to watch videos.

If someone could automatically transcribe videos with key panes from the
video... (google??) ... then that would be cool.

~~~
001sky
^^^agree with this

------
philsnow
Idea: browser extension that notices two things: when you follow links from
the HN front page, and whether you upvote that story. If you read an article
and don't upvote it, it labels that article "dreck". If you do upvote it, it
labels that article non-"dreck". Maybe it has some subtle reminder that you
should remember to upvote the article once you're done.

People who use this extension make HN better for themselves (because they're
classifying articles according to their tastes as they go along) and they're
also making HN better for others (by incentivizing people to upvote good
material when they may otherwise have not upvoted).

If you have enough HN karma to downvote, maybe only downvotes count as dreck.
Then you're still improving both your own and others' experience.

Yo dawg, I heard you like HN, so I proposed a browser extension that lets you
improve HN while you improve HN.
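
The labeling rule itself would be tiny. A rough sketch, where the function
name and arguments are hypothetical and the extension would have to supply
the visited/upvoted/downvoted signals somehow:

```python
def label_story(visited, upvoted, downvoted=False, can_downvote=False):
    """Return "dreck", "not dreck", or None when there is no signal."""
    if not visited:
        return None  # never followed the link, so nothing to learn
    if upvoted:
        return "not dreck"
    if can_downvote:
        # With downvote rights, only an explicit downvote counts as dreck
        return "dreck" if downvoted else None
    return "dreck"  # read it but did not upvote
```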

~~~
dredmorbius
Two general problems with this, and they're common to many content-
recommendation / filtering systems.

• Explicit rating actions are only a small part of interactions with a site.
Other implicit actions are often far richer in quantity and quality -- time
spent on an article, interactions and discussion, the _quality_ of that
discussion (see pg's hierarchy of disagreement, for example), and other
measures. As Robert Pirsig noted, defining quality is hard.

• Whose ratings you consider matters. The problem of topic and quality drift
happens as general interests tend to subvert the initial focus of a site or
venue. Those which can retain their initial focus will preserve their nature
for a longer period of time, but even that is difficult. Increasingly, my
sense is that you want to be somewhat judicious in who you provide an
effective moderating voice to, but those who get that voice should be
encouraged to use it copiously. Policing the moderators (to avoid collusion
and other abuse) becomes a growing concern (see reddit and its recent downvote
brigades against /r/technology and /r/worldnews).

~~~
philsnow
regarding the first part, granted.

regarding the second part, the proposed scheme uses hn's built in control of
making users earn a bunch of karma before letting them downvote. I agree that
topic drift happens, witness all of the bitcoin related discussion over the
past year or so.

~~~
ColinWright

> ... hn's built in control of making users earn a bunch of karma before
> letting them downvote.

Since all of this is talking about the classification of submissions, this is
irrelevant, because you can't downvote submissions, only comments.

At least, I don't yet have enough karma to downvote submissions.

~~~
dredmorbius
Submissions can be flagged.

I'd argue they should be downvotable as well, though you're right, they're
not.

Incidentally, comments can also be flagged (on the comment link view only, not
in the forum view).

------
ff_
And the most beautiful thing here is that his classification algorithm marks
this entry as "Probably I shouldn't read this"

I love it: [http://hn.njl.us/](http://hn.njl.us/)

------
harrystone
Are people really serious when they talk about the fabled Hacker News of old?
What could it possibly have been like? I'm imagining something like zombo.com,
only with lower contrast text.

~~~
drblast
The way I remember it, it was like this site but more frequently updated and
focused mostly on Haskell:

[http://lambda-the-ultimate.org/](http://lambda-the-ultimate.org/)

If you read a few rows down on that page, you'll see this:

"For the debate about MS being evil, you can head directly to HN where you'll
also find an explanation of what bootstrapping a compiler means."

And that about sums it up. For a while I didn't even create an account because
I didn't think I could add anything without sounding stupid compared to
everyone else. Now I try to refrain from commenting for...different reasons.

~~~
saraid216
> And that about sums it up. For a while I didn't even create an account
> because I didn't think I could add anything without sounding stupid compared
> to everyone else. Now I try to refrain from commenting for...different
> reasons.

Same here. Though I refrain less.

At some point, I'd like to go and find my first comment on here just to see
what got me to make an account.

~~~
walrus
[https://hn.algolia.com/#!/all/forever/prefix/0/by%3Asaraid21...](https://hn.algolia.com/#!/all/forever/prefix/0/by%3Asaraid216%20date%3C1274328000)

~~~
saraid216
That is the most disappointing thing I have seen all day. Oh well.

------
dkarapetyan
I actually did something like this at some point. I took all the high ranking
items, tokenized them to extract features, and ran them through a bayesian
classifier to do some filtering. I was just using whatever information was
available on the front page and did not do any further analysis with the
actual content.

The results were ok. Maybe with a bit more power it could be more useful but
the results were still hit and miss and I didn't have a long term strategy for
not filtering myself into a bubble other than continuously re-training the
model.
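
For anyone curious, the general shape was something like the sketch below.
This is a reconstruction under assumptions, not the original code: a
multinomial naive Bayes with add-one smoothing over title tokens, with
invented class names and made-up training titles.

```python
import math
import re
from collections import Counter


def tokenize(title):
    """Lowercase the title and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


class TitleClassifier:
    """Multinomial naive Bayes with add-one smoothing over title tokens."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of tokens seen
        self.doc_counts = Counter()  # label -> number of training titles
        self.vocab = set()

    def train(self, title, label):
        tokens = tokenize(title)
        self.word_counts.setdefault(label, Counter()).update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, title):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)  # log prior
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokenize(title):
                score += math.log((counts[tok] + 1) / denom)  # smoothed likelihood
            scores[label] = score
        return max(scores, key=scores.get)


clf = TitleClassifier()
# Made-up training titles, purely for illustration
clf.train("Show HN: a tiny Lisp compiler", "good")
clf.train("Writing a garbage collector in C", "good")
clf.train("Startup raises funding round", "dreck")
clf.train("Celebrity founder raises more funding", "dreck")
```

Re-training just means calling `train` again with newer labeled titles, which
is also the only defense against the bubble problem in this sketch.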

------
zmk_
As an econometrician I cannot believe how many times he said 'magic'. There is
something very wrong when you put things in your model 'because, who knows, it
might be helpful' (like he did with host names). Variable selection is a very
hard problem and using 'magic' is asking for problems. It is so disappointing
to see machine learning, statistics, econometrics deal with similar problems
and fail to learn from one another.

~~~
asdfologist
It's a completely harmless toy project, so who cares if he chose his variables
non-scientifically? He's not creating a cancer diagnosis tool here.

~~~
zmk_
I understand this is a toy project, but he is in a position where he is
teaching people how to use these methods, and he gives the wrong impression.
The next guy might use this flawed logic while creating a tool for disease
prevalence prediction.

------
soundoflight
Interestingly enough... the greens are all ones I clicked on earlier today and
read. A few false negatives but not bad!

------
randyrand
Just because I think it's worth mentioning, I find it ironic that the link to
this video got marked as bad using his algorithm =)

Saw this at the link he provided.

------
vixin
Excellent, first-rate presentation. Those giving technical talks might want to
take note.

------
mjfl
Very black-boxy ("and it does a whole bunch of math and voila!")

~~~
ColinWright
Yes. So are compilers. And web frameworks. And editors. And memory-management
tools. Progress is made by no longer re-implementing and re-inventing the
things that many, many people have invented and implemented in the past, and
building on their work.

This doesn't mean that there is no value in learning about these things for
yourself, but the packaging of knowledge in reusable tools is the only way
programming progresses.

~~~
Dewie
Nicely encapsulated doesn't have to imply a black-box implementation, though.
I for one would like it if compilers were less black-boxy; ideally, I want to
find out why my compiler does a particular thing by investigating its output,
querying the API, going through the compilation steps, etc., rather than
having to google some StackOverflow answer.

