Hacker News search
7 points by ymn_ayk 496 days ago | comments
I'm starting 'Introduction to CS' from Udacity (http://www.udacity.com/overview/Course/cs101/CourseRev/apr2012). I'm planning to build a Hacker News search app over the course. What do you think, is that a good idea or not? I know there is something similar. This is my plan: the search will run over the domains that Hacker News stories link to. Thank you

rikacomet 496 days ago | link

speaking of this, there should be one. And an "Archive" as well.

only a handful of topics make it to the top every day, but the majority of content is high-quality stuff which, if applied in the proper place, would help many. What say?


slash-dot 496 days ago | link

Keep in mind that if you crawl this site your IP will be banned. At least mine was when I was playing around with a web crawler I built.


dbaupp 496 days ago | link

There is an unofficial API: http://www.hnsearch.com/api (Provided by the very search engine referred to in the OP, haha!)


pyre 496 days ago | link

Unfortunately there is no API for getting access to personal information on HN (i.e. comments I have made, or stories I've upvoted). You're relegated to scraping if you want that information.


unholygoat 489 days ago | link

now there is :

here is how you can pull a specific username's submissions and you can add filters: http://api.thriftdb.com/api.hnsearch.com/items/_search?filte...

And then here is how you can pull the comments for a specific thread/discussion/id: http://api.thriftdb.com/api.hnsearch.com/items/_search?filte...

you can now grab a lot of data.. and they enlarged the site's RSS feed in hopes of slowing a few of the scrapers..

there are a few items missing, but they added a lot: http://www.hnsearch.com/api

btw that includes a user bio now, as well as things you've upvoted... etc.. it's all just done via filters..

they also boosted the RSS feed to help slow down the scrapers


beatgammit 496 days ago | link

You can always get around this by throttling your web crawler. It will take a much longer time, but at least you'll be able to read HN in the meantime.
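
A minimal sketch of what throttling could look like in Python, assuming a fixed minimum delay between requests. The `Throttle` class and the 30-second interval are hypothetical, chosen only for illustration; nothing in this thread says what rate HN actually tolerates.

```python
import time


class Throttle:
    """Enforce a minimum delay between successive calls to wait().

    The clock and sleep functions are injectable so the behaviour can be
    checked without actually sleeping.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, None before the first

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

You would call `throttle.wait()` right before each page fetch, e.g. with `throttle = Throttle(30.0)`. A slower interval is the safer guess when you don't know the site's limit.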


alphast0rm 496 days ago | link

The tricky thing when doing this is knowing what rate to stop at without getting permanently banned. I built an Android Market crawler two summers ago, and luckily Google only temp bans (from my experience), so that might be an easier project without any risk.


quesera 496 days ago | link

Respecting robots.txt is probably the best plan.
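
For what it's worth, Python's standard library can do the check for you. A rough sketch using `urllib.robotparser`; the robots.txt content, the `my-crawler` user agent and the example.com URLs below are made up for illustration and are not HN's actual rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, NOT copied from any real site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 30
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Ask before fetching: is this path allowed for our (made-up) user agent?
allowed_news = parser.can_fetch("my-crawler", "http://example.com/news")
allowed_private = parser.can_fetch("my-crawler", "http://example.com/private/x")

# Crawl-delay, if the site declares one (None otherwise).
delay = parser.crawl_delay("my-crawler")
```

Against a live site you would use `set_url()` and `read()` on the parser instead of passing the text in, then sleep for `delay` seconds between requests when it is set.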


ig1 496 days ago | link

Use disposable IPs.


ymn_ayk 496 days ago | link

Why would you be banned? Do you know the reason?


jgeralnik 496 days ago | link

To prevent people from (unintentionally) DDoSing the site.


unholygoat 489 days ago | link

there's a bunch of similar ones but I don't think any have been updated with the newly added API additions... go for it! http://www.hnsearch.com/api

here are most of the other apps still up http://news.ycombinator.com/item?id=2672826


arikrak 496 days ago | link

I think a practical way to implement a simple version would be through Google CSE, but you would have more control if you rolled your own search.

