

Ask HN: Building a search engine using the Hacker News archive - paramaggarwal

Hacker News has the most brilliant submitted links from all across the web, and they are already indexed by popularity.

If I were to let a search engine index these links, I would have a very useful vertical search engine for anything and everything that interests people like us.

I don't know how this is done; otherwise I would have done so already.

Any ideas why this hasn't been done yet?
======
paramaggarwal
For example, a search for 'mongodb tutorial' on HNSearch should give me more
useful results than searching on Google. But it doesn't, because the results
aren't presented well.

[http://www.hnsearch.com/search#request/all&q=mongodb+tut...](http://www.hnsearch.com/search#request/all&q=mongodb+tutorial&start=0)

------
friggeri
How about <http://www.hnsearch.com/> ?

~~~
paramaggarwal
Oh, yes of course, the default engine. But it is more of just an index: it
only shows the link titles, and it links purely to what was posted on Hacker
News.

What I am dreaming of is something like Google bot with the Hacker News links
as the seed to start indexing... Also displaying the results with snippets and
rating the results.

What say?

~~~
noahc
What I think you want to do is:

1. Take all URLs that have made it to the homepage.

2. Weight them by the points they've been assigned.

3. Crawl out only n links deep (say, three) by following all the links.

4. Let this be your corpus of HN relevancy, and rank the returned results by
the points assigned to the original link, with a decay factor so that by the
time you reach the deepest links the score has decayed appropriately.
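The scoring idea in the steps above could be sketched roughly as follows. This is a minimal, hypothetical illustration (the function name, the in-memory link graph, and the decay value of 0.5 are all my assumptions, not anything from the thread): each page found by crawling out from an HN seed link inherits that seed's points, multiplied by a decay factor for every hop of depth.

```python
from collections import deque

def rank_corpus(seed_links, link_graph, max_depth=3, decay=0.5):
    """Breadth-first crawl from HN front-page seeds.

    seed_links: {url: points} for links that made the homepage.
    link_graph: {url: [outbound urls]} standing in for a real crawler.
    Each discovered page scores points * decay**depth; if a page is
    reachable from several seeds, keep its best score.
    """
    scores = {}
    for seed_url, points in seed_links.items():
        queue = deque([(seed_url, 0)])
        seen = {seed_url}
        while queue:
            page, depth = queue.popleft()
            score = points * (decay ** depth)
            if score > scores.get(page, 0.0):
                scores[page] = score
            if depth < max_depth:
                for nxt in link_graph.get(page, []):
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append((nxt, depth + 1))
    return scores

# Toy example: one 100-point seed, two hops of outbound links.
seeds = {"hn-story": 100}
graph = {"hn-story": ["linked-page"], "linked-page": ["deep-page"]}
ranked = rank_corpus(seeds, graph)
# hn-story keeps 100; linked-page decays to 50; deep-page to 25.
```

A real version would fetch and parse pages instead of using a dict, but the decay-weighted ranking logic would be the same.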

I have 204 MB of HN data scraped by user. I don't think this would be useful
to you, but if anyone else is interested, or is looking for a half-started fun
project related to linguistics, you can contact me at noah@noahc.net

