Hacker News search
7 points by ymn_ayk 496 days ago | comments
I'm starting 'Introduction to CS' from Udacity (http://www.udacity.com/overview/Course/cs101/CourseRev/apr2012), and I'm planning to build a Hacker News search app as I work through the course. Do you think that's a good idea? I know something similar exists. My plan is for the search to run over the domains that Hacker News stories link to. Thank you.
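
As a rough illustration of the first step in that plan, here is a minimal sketch that collects the set of domains linked from a list of story URLs. The function names `story_domain` and `collect_domains` are hypothetical, and the input list is assumed to come from whatever source of story links ends up being used.

```python
from urllib.parse import urlparse


def story_domain(url):
    """Return the host of a story URL without a leading 'www.', or None."""
    host = urlparse(url).hostname  # None when the string has no network location
    if host is None:
        return None
    return host[4:] if host.startswith("www.") else host


def collect_domains(urls):
    """Return the sorted, de-duplicated domains found in an iterable of URLs."""
    return sorted({d for d in map(story_domain, urls) if d})


# Example input: links that appear in this thread, plus one malformed entry.
links = [
    "http://www.udacity.com/overview/Course/cs101/CourseRev/apr2012",
    "http://www.hnsearch.com/api",
    "http://news.ycombinator.com/item?id=2672826",
    "not a url",
]
domains = collect_domains(links)
```

The resulting list is what a crawler or a site-restricted search would then be pointed at.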


rikacomet 496 days ago | link

Speaking of this, there should be one. And an archive as well.

Only a handful of topics make it to the top every day, but the majority of the content is high-quality stuff that would help many people if it were applied in the right place. What say?

-----

slash-dot 496 days ago | link

Keep in mind that if you crawl this site your IP will be banned. At least mine was when I was playing around with a web crawler I built.

-----

dbaupp 496 days ago | link

There is an unofficial API: http://www.hnsearch.com/api (Provided by the very search engine referred to in the OP, haha!)

-----

pyre 496 days ago | link

Unfortunately there is no API for getting access to personal information on HN (i.e. comments I have made, or stories I've upvoted). You're relegated to scraping if you want that information.

-----

unholygoat 489 days ago | link

Now there is:

Here is how you can pull a specific username's submissions, and you can add filters: http://api.thriftdb.com/api.hnsearch.com/items/_search?filte...

And here is how you can pull the comments for a specific thread/discussion/id: http://api.thriftdb.com/api.hnsearch.com/items/_search?filte...

You can now grab a lot of data. They also enlarged the site's RSS feed in hopes of slowing down a few of the scrapers.

There are a few items missing, but they added a lot: http://www.hnsearch.com/api

BTW, that now includes a user bio, as well as things you've upvoted, etc. It's all just done via filters.

They also boosted the RSS feed to help slow down the scrapers.

-----

beatgammit 496 days ago | link

You can always get around this by throttling your web crawler. It will take a much longer time, but at least you'll be able to read HN in the meantime.
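
A minimal sketch of what throttling could look like, assuming a fixed minimum delay between requests. The `Throttle` class is hypothetical, and the right interval for HN is not stated anywhere in this thread, so the value in the usage comment is only a placeholder.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self._clock = clock               # injectable so the class can be tested
        self._sleep = sleep
        self._last = None                 # time of the previous request, if any

    def wait(self):
        """Sleep just long enough that min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage (placeholder interval, fetch() is whatever download function you use):
#   throttle = Throttle(min_interval=30.0)
#   for url in urls:
#       throttle.wait()
#       fetch(url)
```

Calling `wait()` before every request keeps the crawl rate bounded no matter how fast the individual downloads are.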

-----

alphast0rm 496 days ago | link

The tricky thing when doing this is knowing what rate to stop at without getting permanently banned. I built an Android Market crawler two summers ago, and luckily Google only temp bans (from my experience), so that might be an easier project without any risk.

-----

quesera 496 days ago | link

Respecting robots.txt is probably the best plan.
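
A minimal sketch of checking robots.txt rules with Python's standard `urllib.robotparser`. The sample rules, the `my-crawler` user agent, and the example.com URLs are all made up for illustration; they are not HN's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here instead of a network fetch.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 30
"""


def build_parser(robots_txt):
    """Parse robots.txt text and return a ready-to-query RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Ask whether a given user agent may fetch a given URL.
allowed = parser.can_fetch("my-crawler", "http://example.com/item?id=1")
blocked = parser.can_fetch("my-crawler", "http://example.com/private/page")

# Crawl-delay is a non-standard directive; crawl_delay() returns None if absent.
delay = parser.crawl_delay("my-crawler")
```

Against a live site, `set_url()` followed by `read()` would load the real file instead of the sample text, and any reported crawl delay is a natural value for the wait between requests.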

-----

ig1 496 days ago | link

Use disposable IPs.

-----

ymn_ayk 496 days ago | link

Why would you be banned? Do you know the reason?

-----

jgeralnik 496 days ago | link

To prevent people from (unintentionally) DDoSing the site.

-----

unholygoat 489 days ago | link

There are a bunch of similar ones, but I don't think any have been updated for the newly added API features. Go for it! http://www.hnsearch.com/api

Here are most of the other apps that are still up: http://news.ycombinator.com/item?id=2672826

-----

arikrak 496 days ago | link

I think a practical way to implement a simple version would be through Google CSE, but you would have more control if you rolled your own search.

-----



