A cure for Hacker News overload (jmillerinc.com)
121 points by epi0Bauqu on July 26, 2010 | 38 comments

Privacy issues aside, I'd love a collaborative-filtering approach to HN based on co-occurring votes. The problem here is that a fixed vote threshold is a coarse estimate of quality and relevance.

Imagine a system where each HN vote is weighted according to your similarity to that voter. That way a vote by people with whom I have very little in common would also be worth very little to me.

I'd love to view an HN where I don't see the highly-voted Gruber/Apple/Facebook posts but I still see the stuff about Clojure, Steve Blank, and patio11.

You can implement what you want with simple keyword and URL filtering.
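A minimal sketch of that kind of filter, assuming each feed item has a title and a URL (the blocklists and item data here are invented for illustration):

```python
# Hypothetical keyword/URL filter for feed items.
BLOCKED_KEYWORDS = {"apple", "iphone", "ipad"}
BLOCKED_DOMAINS = {"daringfireball.net"}

def keep_item(title: str, url: str) -> bool:
    """Return True if the item passes both the keyword and URL filters."""
    title_words = set(title.lower().split())
    if title_words & BLOCKED_KEYWORDS:
        return False
    if any(domain in url for domain in BLOCKED_DOMAINS):
        return False
    return True

items = [
    ("Clojure 1.2 released", "http://clojure.org/news"),
    ("New iPad rumors", "http://example.com/ipad"),
]
filtered = [title for title, url in items if keep_item(title, url)]
```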

No, I'm sorry, but I can't use keyword filtering for what I'm describing. Let me explain:

What I'm talking about here is uncovering "latent" communities, if you will. As in, make a giant matrix with users as the columns and posts as the rows, then use the leading singular vectors to make recommendations (see SVD: http://en.wikipedia.org/wiki/Singular_value_decomposition).

The benefit of this approach is that I no longer have to be conscious of the topics I am filtering in or out. Even keyword based filtering is, again, a coarse estimation of relevance. I may be very interested in clojure, but I'm certainly not interested in every article that contains 'clojure' in the title.

An SVD (or similar) approach would filter my interests loosely based on the co-occurrence of votes. That is, a vote from someone whose voting history overlaps heavily with mine is worth more to me than a vote from someone who has never voted the same direction as me on the same post.
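A toy sketch of the idea, with posts as rows and users as columns as described above (all vote data here is invented, since real HN co-voting data isn't public):

```python
import numpy as np

# Toy post x user vote matrix (1 = upvote, 0 = no recorded vote).
votes = np.array([
    [1, 1, 0],  # post 1: upvoted by users A and B
    [1, 1, 0],  # post 2: upvoted by users A and B
    [0, 1, 1],  # post 3: upvoted by users B and C
    [0, 0, 1],  # post 4: upvoted by user C only
])

# Truncated SVD: keep k latent "communities" and reconstruct a low-rank
# approximation. Entries where no vote was recorded become predicted
# affinities, weighted by how similar that user's voting is to others'.
k = 2
U, s, Vt = np.linalg.svd(votes, full_matrices=False)
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Predicted affinity of user A (column 0) for post 3 (row 2),
# which A never voted on.
score = approx[2, 0]
```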

I question whether SVD would yield good recommendations.

In any case, co-voting data is not scrape-able from the public HN site, so I think using keywords and urls is really the only realistic filtering option at this point.

You can use people's comments as a (loose) proxy for their interest in a post; people who comment on something are more likely to have upvoted it (or at least to consider it worthwhile to talk about, even if they never really vote on things). You could perhaps even use sentiment analysis, treating negative root-level comments as downvotes (and pruning any branch below a negative comment, because it's probably an argument).
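A toy version of that comment-as-vote proxy, using a crude negative-word lexicon in place of real sentiment analysis (the lexicon and sample comments are invented for illustration):

```python
# Hypothetical lexicon: any root-level comment containing one of these
# words counts as a downvote; everything else counts as an upvote.
NEGATIVE_WORDS = {"wrong", "terrible", "disagree", "awful"}

def comment_vote(text: str) -> int:
    """Map a root-level comment to a proxy vote: +1 or -1."""
    words = set(text.lower().split())
    return -1 if words & NEGATIVE_WORDS else 1

root_comments = [
    "great writeup, very useful",
    "this is just wrong and terrible",
]
# Net proxy score for the post from its root-level comments.
score = sum(comment_vote(c) for c in root_comments)
```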

Speaking of this, does anyone know of a good RSS filter? By that I mean a service to which you give a link to an RSS feed and provide certain filters, and they will provide a link to a modified version of the feed that they host themselves.

Yahoo Pipes is the closest thing I know to that.


Here's a screenshot of one of mine http://imgur.com/NLOkM.png

Awesome, this is exactly what I wanted! By filtering out all the Apple/iPhone/iPad-related crap, I'll probably be able to cut the number of new items from tech blogs in half.

And if it's a Gawker Media site, they let you filter their feeds by changing the URL. For example, http://gizmodo.com/tag/not:apple/not:iphone/not:ipad/index.x...

I wish more sites had that kind of filtering.

Oh, that's even better. I don't subscribe to Gizmodo, but I do subscribe to Lifehacker.

Everyone forgets about postrank.com (formerly aiderss.com). It's actually what I use to filter HN... but who knows, I may switch over to the HN50 or HN100 described above.

I don't think that achieves the same thing. First, it requires the user to manually filter out new stuff they aren't interested in, whereas a machine learning approach will evolve as the content space evolves.

Think of it like email spam. You can set up manual filters to catch spam, but that's a constant, never-ending stream of work for you. A simple Bayesian filter like the one pg has described requires far less work and gives far better results.
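The spam analogy can be sketched in a few lines: a minimal naive-Bayes relevance score over story titles, with add-one smoothing (the training titles are invented examples, not real data):

```python
import math
from collections import Counter

# Toy training sets: titles the user did and didn't find interesting.
interesting = ["clojure macros explained", "steve blank on customer development"]
boring = ["new iphone case review", "apple ipad rumor roundup"]

def word_counts(titles):
    counts = Counter()
    for title in titles:
        counts.update(title.lower().split())
    return counts

good, bad = word_counts(interesting), word_counts(boring)

def p_interesting(title: str) -> float:
    """Log-odds that a title is interesting; > 0 means 'show it'."""
    score = 0.0
    for w in title.lower().split():
        p_good = (good[w] + 1) / (sum(good.values()) + 2)
        p_bad = (bad[w] + 1) / (sum(bad.values()) + 2)
        score += math.log(p_good / p_bad)
    return score
```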

In this case, a machine learning approach is even better because it can bring up stories that a user will be very interested in even though the story would never make it to the current homepage.

I'd like a Best of Hacker News that, like Best of Reddit, links to exceptionally interesting comments. The comments on HN are usually far more interesting than the stories, and often interesting comments are attached to uninteresting stories.

Well, there is a Best of Hacker News page, but it shows the best submissions of the past few days rather than the best comments.


There is also a best comments page at http://news.ycombinator.com/bestcomments.

Great solution, and glad to see how you expanded it to other thresholds. I think this might be the best automated way to track the best of Hacker News (though of course I think my http://www.hackernewsletter.com is another great way, with more of a non-automated feel + other content).

Very good project. I would love to see the full article in the RSS feed (the HN text or the page the story links to), so we don't have to leave Google Reader to actually read the article.

Example : http://toadjaw.com/article?url=http://steveblank.com/2010/07...

Hacker News reader for mobiles, created by toadjaw, uses a scraping script that nicely extracts the article from the linked webpage.

I prefer http://www.daemonology.net/hn-daily/: the top 10 articles of the previous day. Reading them a day late is not too bad for me, and it helps me procrastinate less: it has put an end to my "let's see if there's something new" habit.

Still this is a very good alternative, which I'll probably end up trying :)

If you're not bound to the absolute latest, also check out the new http://hackerbra.in - it shades down the snapshots you've already seen in its history.

A noprocrast of 3 days works great for me!

This may actually prompt me to keep a Twitter client open again (for the first time in a year). Thanks!

Dapper + Yahoo Pipes = Same result, allowing customizable score filter, customizable sorting, customizable date thresholds, etc. In about 30 minutes.

If a single person replies to this comment, I'll implement it.

EDIT: T+46: As promised, but without the sorting or date threshold. Sorry.

Change the parameters to meet your needs. Supports min/max comments/points. Sorts by points DESC. Go play with the pipe if you want :)

Apologies that it doesn't tweet you.


Sure, why not. Not sure what Dapper is required for; couldn't you do this all in Pipes?

I use Pipes for a number of my applications; great stuff, Yahoo.

Part of why I like HN is that I don't have a big queue of things to read, unlike my RSS feeds. I can skim without feeling that I'm missing something.

This is awesome, I love it.

However, is anyone else concerned that the driver's seat of HN is essentially being handed over to new users as more advanced users switch to services like this, or "HN daily", or even the "/best" page? It seems like the people who don't know anything exists beyond the front page will become the only ones left to do the work of curating content.

(Just speculating; nothing against new users, I haven't been here that long myself...)

As long as only a minority of users use this, great. Otherwise, if everyone's reading through things like this, WTF is voting the stories up in the first place?

Part of what makes HN good is that most people hit the site and help with the voting. The more people who move to reading HN remotely or through feeds, the worse the voting situation gets.

Agreed, but I don't think there's much to worry about as it stands. PG said newsyc currently gets 60,000 unique visitors per day, and the # of ppl following these twitter feeds is barely 1% of that.

> PG said newsyc currently gets 60,000 unique visitors per day, and the # of ppl following these twitter feeds is barely 1% of that.

Though that doesn't say much about what percentage of the 60,000 are even users or logged in... I suspect it's not a high percentage.

Thanks a lot, this was bugging me the other day and I was starting to brainstorm how I could filter things. This is perfect!

Thanks, I added the 50 score filter feed to Google reader and it works well for me so far.

Diff it!

Say we snapshot the page frequently, either centrally or locally. Then it would be much easier to diff the current HN page throughout each session against the last central or local cache, and highlight what has changed significantly.
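A minimal sketch of that diff, assuming each snapshot is stored as a mapping from story ID to title (the IDs and titles here are invented):

```python
# Two hypothetical front-page snapshots: story ID -> title.
previous = {101: "Clojure 1.2 released", 102: "Steve Blank on pivots"}
current = {102: "Steve Blank on pivots", 103: "patio11 on A/B testing"}

# Set operations on the dict key views give the changes directly.
new_ids = current.keys() - previous.keys()    # stories that appeared
gone_ids = previous.keys() - current.keys()   # stories that dropped off

# Titles to highlight as new since the last snapshot.
new_stories = [current[i] for i in sorted(new_ids)]
```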

I was thinking of adding that as a feature to http://hackerbra.in. Currently, news is every 2hrs, ask 5hrs, best 11hrs. The site tells you if you've already viewed the current snapshot, but actually highlighting the new items since the last snapshot would be an interesting feature. Also, when stored, having a tweet go out with the number of new items for that snapshot could be useful.

Many thanks! I've added the feeds to Pulse on my iPad and it's working like a charm!

@newsyc20 includes links to the story itself and the Hacker News comment page for that story.

Sold! Not having links to the comments when following @newsycombinator always perturbed me.

Kudos for adding the user's Twitter username. Really impressive!

Still holding out for a version that uses Readability to include the full linked article directly in the feed :)
