Please review my API for HackerNews (ihackernews.com)
104 points by ronnier 2201 days ago | 52 comments



Not sure if you have some other strategies for the URL scheme, but I'd probably use RESTful paths... like:

/users/{username}/posts instead of /by/{username}

or

/posts/{id}/comments instead of /comments/{id}

And how come threads are indexed by userid but posts are indexed by username? These kinds of things are the things that slip by developers and give them headaches. I can easily imagine not noticing the username/userid switch and being like "WTF!? 404?!" for a while.

I would expect standard REST paths to a) make it easier to guess the paths, and b) allow for simpler URL generation in client apps (you can generate the URL for a user and then just tag on /comments or /posts to get the URL for those things).
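To illustrate point b), URL generation under such a scheme reduces to string concatenation. A sketch with a hypothetical base URL and helper names (not the actual API's):

```python
# Hypothetical base URL for illustration; the real API lives elsewhere.
BASE = "http://api.example.com"

def user_url(username):
    return f"{BASE}/users/{username}"

def user_posts_url(username):
    # Tag "/posts" onto the user URL instead of remembering a separate /by/ path.
    return user_url(username) + "/posts"

def user_comments_url(username):
    return user_url(username) + "/comments"

print(user_posts_url("pg"))  # http://api.example.com/users/pg/posts
```

The point is that a client only needs one rule ("append the collection name") rather than a lookup table of special-cased paths.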


I'm sorry, but that isn't what RESTful means at all. Structured URIs like you mention can be pretty, but are completely unrelated to REST. In the real world they are often used in an anti-REST way, specified in advance instead of linked via hypermedia. If the client needs to use foreknowledge to construct URI strings, that goes against everything REST stands for.

REST means Hypermedia As The Engine Of Application State — the default routing scheme of ActionController::Routing::Routes has nothing to do with it.

A design that uses only opaque UUIDs as names for resources and reveals them to the client via links in the responses is perfect REST. Clean-looking URIs are a distraction, except that they tend to be easier to preserve across software rewrites.
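To make the contrast concrete, here is a toy sketch of a hypermedia-style client: it treats identifiers as opaque and only follows links the server embeds in responses. The response shape is invented for illustration, not taken from any real API:

```python
# A hypothetical response: opaque id, navigation via embedded links only.
entry_point = {
    "id": "b2f6a9e1",  # opaque identifier; the client never parses it
    "title": "Please review my API",
    "links": {
        "comments": "http://api.example.com/3f9c1a",
        "author":   "http://api.example.com/77d0e2",
    },
}

def follow(resource, rel):
    """Return the URI for a named link relation, or None if absent."""
    return resource.get("links", {}).get(rel)

comments_uri = follow(entry_point, "comments")
```

The client needs zero foreknowledge of URI structure; if the server reshuffles its URLs tomorrow, nothing breaks.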


Great advice, I agree with you. I'll work on changing it and the docs, but leave the existing paths for sometime. I've learned something, this was all worth it :)


Looks clean and simple. Well done. I'd add in the documentation comments about caching (on your side and the requester's side) and rate limiting.

For those wanting debugging tools for JSON APIs (a common request for the APIs I operate):

* JSONView, an addon for Firefox that prints nice-looking JSON from the URLs of the API. https://addons.mozilla.org/en-US/firefox/addon/10869/

* Tidy JSON, a command line tool. http://www.raboof.com/Projects/TidyJson/


An API that is subject to change should have a version number as a namespace somewhere in the URL. That way you can run different API versions side by side, and it makes going forward less painful.
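A minimal sketch of the versioned-namespace idea, with hypothetical paths (this API's real routes may differ):

```python
# All routes share a version prefix, so /v1/ and /v2/ handlers
# can be served side by side during a migration.
API_VERSION = "v1"

def versioned(path):
    """Prefix a route with the current API version segment."""
    return f"/{API_VERSION}/{path.lstrip('/')}"

versioned("page")        # "/v1/page"
versioned("/threads/pg") # "/v1/threads/pg"
```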


How timely. I just started writing a library to scrape data from Hacker News because I wanted to put the posts I'd upvoted in the sidebar of my blog.

Link: http://blog.edparcell.com/how-i-added-my-hacker-news-saved-s...

Your API has advantages and disadvantages against this approach: On the upside, it provides a uniform way for all languages to access content from HN, which is really cool.

On the downside, all requests through your API have to flow through your server. This makes me uneasy for two reasons: first, that you could switch off your servers, especially if take-up is high and you are not being compensated sufficiently for running them; and second, that I'm uncomfortable authenticating to an intermediary.


Similarly, http://github.com/seven1m/hackernews if you want to hit HN directly with Ruby.


If PG isn't opposed to an API, maybe somebody could hack the HN code to add it natively?


For anyone interested, I have created a Java library wrapping the JSON APIs exposed by ronnier.

http://github.com/anoopengineer/jhackernews

Currently it supports only fetching of news pages: top, new, and Ask HN. Support for comments and voting will be added soon.

Licensed under Apache 2.0 license.


Very cool idea. I will definitely start using it on my iPhone.

Regarding security: you are proxying login credentials through your server, is that correct? I'd suggest putting up information about your privacy policy, whether you store any credential information, and the security of your server(s).


That's a good idea. I'll put that up tonight.

FYI, I don't store any data at all. The username and password are required to get an auth token from HN, which is only needed for voting and commenting. The token is what's stored in the cookie that HN issues.
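As a sketch of the token-only approach described here: the proxy keeps just the session cookie HN sets after login, never the password. The cookie name `user` below is an assumption about HN's session cookie, not something confirmed in this thread:

```python
from http.cookies import SimpleCookie

def extract_auth_token(set_cookie_header):
    """Pull the session token out of a Set-Cookie header.

    The cookie name "user" is an assumption about HN's session
    cookie; adjust if the real name differs. Only this token would
    be retained; the credentials themselves are discarded.
    """
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    morsel = jar.get("user")
    return morsel.value if morsel else None

token = extract_auth_token("user=pg&abcdef123456; Path=/; HttpOnly")
# token -> "pg&abcdef123456"
```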


How about a way to retrieve comments or post ID for a given story URL? I often save articles and read them days or weeks later, and it would be nice if there were a simple way to find the associated HN discussion without risking upvote/story submission using the bookmarklet.


I could probably build something like that on top of ronnier's API; I'm not sure, because I haven't had a chance to look at it, and the API page is not loading for me at the moment.


I'm unable to do that because there's not really a way to do that on HN now. I don't store any data so I have nothing to query against.


I second this. It would be awesome if you could search by url.


Looks really awesome, congrats. So it scrapes hackernews and then exposes the data as a JSON api?


Yes, and caches the data for a couple of minutes.


If the data you get is old then I'd suggest caching it much longer to increase the chance of a hit; it probably will not change anyway.

That would lessen the load on the HN servers considerably, especially if your service becomes more popular.


How often is it hitting the server and how many pages deep does it go?


It only hits HN when asked, and caches each request for 200 seconds. Additional requests return the cached version instead of re-scraping HN. It mimics what is on HN, so requesting /page only returns the first page. If you want to go deeper, you need to pass in the next page ID, which is returned when you request /page.
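The caching behavior described above amounts to a small TTL cache. A sketch in Python (the `scrape` parameter is a stand-in for the real scraper, and the clock is injectable for testing):

```python
import time

CACHE_TTL = 200  # seconds, matching the 200-second window described above
_cache = {}      # path -> (fetched_at, payload)

def fetch(path, scrape, now=time.time):
    """Return a cached payload for `path`, re-scraping only when the
    cached entry is older than CACHE_TTL."""
    entry = _cache.get(path)
    if entry and now() - entry[0] < CACHE_TTL:
        return entry[1]           # cache hit: no request to HN
    payload = scrape(path)        # cache miss: scrape once, then store
    _cache[path] = (now(), payload)
    return payload
```

Every request inside the 200-second window is served from memory, so N clients cost HN at most one scrape per path per window.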


Good work, man!


Very awesome, needs a search, but yes, I'm sure it'll get use.


It sounds like a plea to be smacked with a banhammer, more than anything else.

It seems to support automating things that are almost certainly better off left un-automated: posting, voting, commenting.

It doesn't support anything that might be interesting to automate - say, asynchronous notification on replies to me, posting or commenting on my url, mention of my name, mention of keywords I care about, etc.

It asks for HN credentials.

Nit, but still a little lame: it lifts the HN favicon.


ihackernews is currently by far the best browser for Hacker News on mobile (it's far better than the "app" in the Android market).

This API just exposes what he already built for ihackernews and lets others use it. I would be very, very surprised if pg banned it; if it causes any issues, they can pretty surely be sorted out.


I think the original commenter was hinting at the opportunity for the author to provide a service on top of HN data, like monitoring posts, replies, karma levels, or providing more categorized feeds based on preferences.

It could violate the terms of service, since at its most basic level it scrapes Hacker News for this information, but there hasn't been any issue with it so far.


"ihackernews is currently by far the best browser for Hacker News on mobile"

Not really relevant to anything I said.

"this api is just extracting what he already built for ihackernews and allowing others to use it"

There are actually significant downsides to 'others using it'. One is that it becomes a single point of failure for anyone using it. It's essentially a proxy so one abusive user could make HN ban the whole thing and everyone else with it. Same goes for downtime, etc.

Similarly, it introduces a third party in the authentication process for relatively little value and significant risk.

I could well be missing something but I just haven't come up with very many reasons such a service is a good idea to counter the many obvious ways in which it is a bad one.


"Not really relevant to anything I said."

By ignoring the fact that this was built for a practical purpose, and mentioning only ways it could be abused, you implied that the author had bad intentions when writing it. I was clarifying that for everyone else; he should be thanked for ihackernews at the least.

I didn't say there were no downsides to people using it, but people can figure that out for themselves. There are also significant advantages: 1. not writing the code yourself, and 2. caching and sharing the load on this domain; we already know that this site has stability issues and that bots can quite easily affect the load. Even if all people do with this API is create new UIs for Hacker News, it would be worth it. I'm pretty surprised that is the only useful thing you can see it being used for; there are obviously a lot of use cases.


"By ignoring the fact that this was built for a practical purpose, and only mentioning ways that it could be abused implied that the author had bad intentions when writing it"

No, it didn't imply anything of the sort. And whether something is built for a practical purpose or not is, in fact, not relevant to whether it's stupid or not. My point was that I think having this as a public web service is stupid, and I explained why. The practical purposes you speak of could have been achieved just as easily by releasing the code, so people interested in such functionality could use it as a library or host the service themselves.


This looks really handy for getting hold of my raw data. One thing -- parentID for comments I fetched with http://api.ihackernews.com/threads/Robin_Message is blank -- is it meant to be or am I missing something? I'd expect it to be the id of the parent comment, and possibly for there to be an "On" field that takes me up to the top level.


Thanks, I'll look at this tonight and get it fixed.


    "children":[{"postedBy":"ronnier","postedAgo":"21 hours ago",
    "comment":"\u003cfont \u003eThanks, I\u0027ll look at this
    tonight and get it fixed.\u003c/font\u003e","id":1694452,
    "points":1,"parentId":1694340,"postId":1694049,
    "cachedOn":"\/Date(-62135575200000)\/","children":[]}]}
Thanks!
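Given the nested shape above (each comment carrying `id`, `parentId`, and `children`), a client-side sketch for resolving a comment's parent could look like this. The field names are taken from the snippet; the traversal itself is hypothetical:

```python
def index_comments(comments, by_id=None):
    """Flatten a nested comment list (each comment having "id" and
    "children") into an id -> comment dict, recursing into replies."""
    if by_id is None:
        by_id = {}
    for c in comments:
        by_id[c["id"]] = c
        index_comments(c.get("children", []), by_id)
    return by_id

# Toy thread shaped like the JSON above (ids from the snippet).
thread = [{"id": 1694340, "parentId": None, "children": [
    {"id": 1694452, "parentId": 1694340, "children": []}]}]

by_id = index_comments(thread)
parent = by_id[by_id[1694452]["parentId"]]  # the comment with id 1694340
```

So a non-blank parentId is enough to walk up the tree, which is why a blank one breaks navigation.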


Very nice!

Two questions, curious as I'm as well in the process of indexing HN, and your API may help me avoid this:

- How much content are you actually indexing? Do you keep every single post, or only the ones that make it to the home page or Ask HN? How far back in time did you go?

- Do you have some way to implement a full-text search (e.g. posts that contain a specific word, to be accurate)?


I don't store or save any data, other than an in memory cache. I just scrape, process, and output the data. Since I'm not storing data, I have nothing to search.


I may very well use this to launch a Hacker News reader for iOS. Is there space for this (would you want it)?


I couldn't wait for JSONP, so I whipped up a simple wrapper proxy (couldn't believe this didn't exist already).

http://jsonpify.appspot.com/?url=http://api.ihackernews.com/...
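For anyone curious what such a proxy does, the core transformation is tiny: serialize the payload and wrap it in the caller-supplied function name. A minimal sketch (how the jsonpify service itself works is assumed, not verified):

```python
import json

def jsonp_wrap(payload, callback="callback"):
    """Wrap a JSON-serializable payload in a JSONP function call,
    so a <script> tag on another domain can consume it."""
    return f"{callback}({json.dumps(payload)})"

jsonp_wrap({"points": 104}, "handle")  # 'handle({"points": 104})'
```

A real proxy would also validate the callback name and set a JavaScript content type, but the wrapping step is all there is to JSONP.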


This is awesome, I was looking for something exactly like this last weekend! No more scraping for me =)


IIRC there is some Internet Explorer issue involving the "application/json" content type which makes it safer to just use "text/plain". Worth looking up...


You usually access an API programmatically rather than via the browser, so this shouldn't be an issue for most use cases.


Unless it supports CORS for cross-domain access from the browser: http://www.w3.org/TR/cors/

Which it probably doesn't.
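For reference, the server-side change CORS would require is small: emitting an Access-Control-Allow-Origin header on responses. A toy sketch of that one step (a real deployment would restrict origins and handle preflight OPTIONS requests):

```python
def with_cors(headers, origin="*"):
    """Return response headers (as (name, value) pairs) with a CORS
    allow-origin header appended, permitting browser cross-domain reads."""
    return headers + [("Access-Control-Allow-Origin", origin)]

with_cors([("Content-Type", "application/json")])
```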


It says in the first paragraph that he intends to support jsonp in the future. It will matter then.


Which language/framework have you used to create it?


It's all C#, ASP.NET MVC. I'm just exposing functionality that I had to build in for http://ihackernews.com.


Awesome. I'm hoping that you already have a library of HN entities and are using the JsonSerializer? Would it be possible to share that C# library or some client code?

Of course I could just do it myself, but it seemed like silly work going from C# > REST > C# :)


Thanks, but I hadn't read your resume first. After that I thought that this was the answer. :)


Now you should write a client for the API using RestSharp ;)


I thought PG frowned on this kind of stuff?


Can you link me? If he does, I'll bring it down.


The etiology of previous frowns suggests frustration due to load placed on the server. In this case you should actually be alleviating load on the server, since so many people love to hack on HN for fun.


PG frowns on clever hacking?


At some point this API will fail if it gets popular to the point of triggering HN's abuse protector.


OpenID login? Everyone seems to forget about this.


Thank you!



