

Ask HN: Anyone interested in building a Hacker News with tags? - tucson

Could anyone help me build a Hacker News with tags?
I am only asking people who would like to have this themselves, because my budget only covers the hosting.

The point is to be able to search through the whole archive using tags/keywords.

Examples of tags:

'security'

'crm'

'a/b testing'

'optimization'

'http', 'ssl', 'domain name'

'scala', 'c++', 'php', etc.

'lua'

'sql'

'marketing'

'website'

'landing page'

=> get all posts that relate to each tag (and combinations of tags) *sorted by points of individual posts/comments*.

To-do list:
1. Import the whole Hacker News database.
2. Insert tags for all posts/comments into the database, using an algorithm similar to the Kaggle Keyword Extraction algorithm (https://www.kaggle.com/c/facebook-recruiting-iii-keyword-extraction), which will need to be refined.
3. Create a great user interface to the new database.

-------
Or, if no one has the time, could anyone advise me on how to download the whole Hacker News database?
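
For illustration, the lookup described above (all posts for a tag or a combination of tags, sorted by points) could be sketched as a small in-memory inverted index. This is only a sketch under assumptions: the post records, their field names (`id`, `points`, `tags`) and the sample data are hypothetical, and the tags are assumed to have been extracted already.

```python
from collections import defaultdict


def build_tag_index(posts):
    """Map each lowercased tag to the set of post ids that carry it."""
    index = defaultdict(set)
    for post in posts:
        for tag in post["tags"]:
            index[tag.lower()].add(post["id"])
    return index


def search(posts, index, tags):
    """Return the posts that carry ALL of the given tags, highest points first."""
    if not tags:
        return []
    # One set of ids per requested tag; unknown tags contribute an empty set.
    id_sets = [index.get(tag.lower(), set()) for tag in tags]
    matching_ids = set.intersection(*id_sets)
    by_id = {post["id"]: post for post in posts}
    return sorted(
        (by_id[post_id] for post_id in matching_ids),
        key=lambda post: post["points"],
        reverse=True,
    )


# Hypothetical sample data, not real Hacker News posts.
posts = [
    {"id": 1, "points": 120, "tags": ["security", "ssl"]},
    {"id": 2, "points": 45, "tags": ["security", "php"]},
    {"id": 3, "points": 300, "tags": ["sql", "php"]},
    {"id": 4, "points": 80, "tags": ["ssl", "security", "http"]},
]
index = build_tag_index(posts)

if __name__ == "__main__":
    for post in search(posts, index, ["security", "ssl"]):
        print(post["id"], post["points"])
```

A real version would more likely keep the tags in a database table keyed by (tag, post id), but the query shape (intersect by tag, then order by points) stays the same.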
======
captn3m0
1\. You can download the dataset using
[http://hn.algolia.com/api](http://hn.algolia.com/api). Mind the rate-limits,
though.

2\. This has already been done quite a few times by various apps, most
prominently here:
[http://algorithmia.com/demo/hn](http://algorithmia.com/demo/hn)
([http://blog.algorithmia.com/post/86295023534/algorithmic-
tag...](http://blog.algorithmia.com/post/86295023534/algorithmic-tagging-of-
hackernews-or-any-other-site))

~~~
tucson
[http://hn.algolia.com/api](http://hn.algolia.com/api) Thanks. How many
requests would that require?

[http://algorithmia.com/demo/hn](http://algorithmia.com/demo/hn) does not work
for me.

~~~
captn3m0
The first link states at the bottom: "We are limiting the number of API
requests from a single IP to 10,000 per hour."
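
To put the quoted limit in numbers: 10,000 requests per hour works out to at most one request every 0.36 seconds. Below is a rough sketch of that arithmetic plus a throttled download loop. `fetch_page` is a hypothetical callable standing in for whatever function actually calls the API, and the item count and page size in the usage lines are made-up figures, not measurements of the HN archive.

```python
import math
import time

REQUESTS_PER_HOUR = 10_000  # per-IP limit quoted from the API page


def min_interval_seconds(requests_per_hour=REQUESTS_PER_HOUR):
    """Smallest delay between requests that stays within the hourly limit."""
    return 3600 / requests_per_hour


def requests_needed(total_items, items_per_request):
    """Number of requests needed to page through total_items."""
    return math.ceil(total_items / items_per_request)


def fetch_all(fetch_page, page_count, interval=None, sleep=time.sleep):
    """Call fetch_page(0..page_count-1), pausing between calls.

    fetch_page is a hypothetical callable returning a list of items for a
    page number; sleep is injectable so the loop can be tested without waiting.
    """
    if interval is None:
        interval = min_interval_seconds()
    items = []
    for page in range(page_count):
        if page > 0:
            sleep(interval)  # throttle every call after the first
        items.extend(fetch_page(page))
    return items


if __name__ == "__main__":
    # Hypothetical figures: 2,000,000 items fetched 1,000 at a time.
    needed = requests_needed(2_000_000, 1_000)
    print("requests:", needed)
    print("hours at the limit:", needed / REQUESTS_PER_HOUR)
```

So the answer to "how many requests" depends mostly on how many items one request can return; the rate limit then only determines how long the whole download takes.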

The hn/tag demo works one time out of five refreshes for me, so keep trying.
Here's a screenshot in case that doesn't work:
[http://imgur.com/yPF0hkn](http://imgur.com/yPF0hkn)

~~~
tucson
Thank you. I appreciate the screenshot. Interesting; I get the feeling the tag
algorithm is crucial to make it work. Not sure it's such a clear no-brainer as
I first imagined it...

------
wanghq
Can't you just search for the keywords? I wonder how useful it would be, given
that the information (tech articles such as rails2.1, best features in
jQuery1.0,...) will go out of date as time goes by.

I think what stays useful is the various tools, as long as they are still
alive. That's why I want to build a toolbox which collects all the useful
tools.

[https://news.ycombinator.com/item?id=8413016](https://news.ycombinator.com/item?id=8413016)

~~~
tucson
Your idea might be better. I could help if you need support.

~~~
wanghq
Your contact info? Check my profile for my Twitter ID.

~~~
tucson
hi, you can email me at: timothee . henry @ gmail.com

