Hacker News API (ycombinator.com)
1714 points by kevin on Oct 7, 2014 | 298 comments



Oh man you guys, patio11 has generated massive amounts of content: https://hacker-news.firebaseio.com/v0/user/patio11.json?prin...

I count 8,483 submissions. I'm sure there's something interesting to be done with all of this data. A word frequency chart?

---

Edit: So apparently there's a Ruby gem that takes a body of text and generates pseudo-random phrases based on it.

I present to you the patio11 impersonator: https://gist.github.com/christiangenco/e8d085e47479be0131e1

One of my favorites:

    A nice set of challenges -- kitty at a school with tens of thousands of bucks a year or less immediately.
---

Also, a word count on patio11's submissions: 1,052,351. For comparison, all 7 Harry Potter books total 1,084,170 words. patio11 has written nearly the entire Harry Potter series' worth of content on HN. Just... wow.
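For anyone curious how a phrase generator like the one mentioned in the edit can work: the gem isn't named here, so the following is only a rough sketch of the general idea. It assumes a word-level Markov chain, which is a common approach but not necessarily what that gem does; `build_chain` and `generate` are made-up names.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain


def generate(chain, length=20, seed=None):
    """Random-walk the chain, stopping early if a state has no successor."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            break
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out)
```

Feeding it a million words of comments instead of a toy string is what makes the output sound like a particular author.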


In February, I did an analysis on all Hacker News stories: http://minimaxir.com/2014/02/hacking-hacker-news/

Yesterday, I published an analysis on all Hacker News comments: http://minimaxir.com/2014/10/hn-comments-about-comments/

There are a lot of interesting trends in the data. Let me know if you want to know anything in particular and I'll get back to you. :)


It would be interesting to redo the top karma list by comment karma only. It would also be interesting to look at liberal vs conservative political sentiment over time. I feel like the site has gotten more liberal over the years, especially after all the Chelsea Manning stuff, but not sure if that is accurate.


I actually have the list of top users by comment karma: https://docs.google.com/spreadsheets/d/1Ws2VK-UB7IQpVUZgPZ9M...

I also included the number of thread wins (user made comment with highest number of points in a given submission thread) to see if there was any unusual relationship. There wasn't: correlation is 0.89. I ended up not including it in the article because it's long enough as-is. :P
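For reference, a correlation figure like that is presumably a Pearson coefficient (an assumption; the comment doesn't say which measure was used). A minimal sketch of the calculation, with made-up numbers rather than anything from the spreadsheet:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up numbers for illustration only, not taken from the spreadsheet.
comment_karma = [100, 250, 400, 800]
thread_wins = [2, 5, 9, 15]
r = pearson(comment_karma, thread_wins)
```

A value close to 1 means the two columns rank users almost identically, which is why thread wins added little beyond comment karma.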


Thanks! I would have thought that I'd have been higher ranked when looking at only comments, although actually my rank is almost the same. Interesting to see that many of the other top users are using the site in more or less the same way.


Is that top overall karma users, but then sorted by comment karma, or is that actually top comment karma users?


Top comment karma users.

SQL query:

    SELECT
      author,
      SUM(num_points) AS total_comment_karma,
      MAX(created_at) AS last_comment
    FROM hn_comments
    GROUP BY author
    ORDER BY total_comment_karma DESC
    LIMIT 1000
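To try the query without the full dataset, here is a small sketch using Python's built-in sqlite3. The table and column names come from the query; the column types and the three rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only the column names are taken from the query.
conn.execute(
    "CREATE TABLE hn_comments (author TEXT, num_points INTEGER, created_at TEXT)"
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO hn_comments VALUES (?, ?, ?)",
    [
        ("alice", 10, "2014-10-01"),
        ("bob", 3, "2014-10-02"),
        ("alice", 5, "2014-10-03"),
    ],
)

QUERY = """
SELECT
  author,
  SUM(num_points) AS total_comment_karma,
  MAX(created_at) AS last_comment
FROM hn_comments
GROUP BY author
ORDER BY total_comment_karma DESC
LIMIT 1000
"""
rows = conn.execute(QUERY).fetchall()
```

Because the grouping is on `author` alone, this ranks by comment karma directly rather than re-sorting an overall karma list.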


It'd be interesting to see if including a URL in the body of a post has any effect on votes received.


I would be interested to know the average hourly cost of HN in lost productivity :)


I'd like to know the average hourly amount earned from HN. I know personally HN has helped me greatly in the projects I do at work.


Yes this is an interesting question too, but one that is much harder to answer. I love HN, but it sure gets a lot of activity during work hours :)


We seem to think that time spent learning is lost labor, rather than that time spent laboring is lost time for learning, although we know that both learning and laboring are required for productivity, and that learning is capex.


40 wpm x total words x (hmm need to look up productivity stats) - but we could do it :-)


That's some pretty impressive analysis, nice visualizations too. Thank you for doing all that hard work!


Thanks for that. Great stuff.


Thanks, I had been curious about that number for a while. The last time I checked it was 500k or so.

For folks who want to do interesting things with the API but don't want to be abusive to Firebase's servers, I whipped up a quick ruby script to cache a particular user's comments/submissions on disk: https://gist.github.com/patio11/1550cad3a02edd175049

It tries to rate limit itself by putting 200ms of sleep between requests, so downloading all of my comments would take ~30 minutes.

"I release this work unto the public domain." -- feel free to adapt it to your needs.

Usage is "ruby slurper.rb $USERNAME $MAX_COMMENTS_TO_FETCH."
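For those who'd rather not read Ruby, the same idea (cache on disk, sleep between requests) might look roughly like this in Python. This is a sketch and not a port of the gist: `slurp` and the `fetch` callable are made-up names, and `fetch` is assumed to wrap a GET of the item's .json URL.

```python
import json
import os
import time


def slurp(item_ids, fetch, cache_dir, delay=0.2, sleep=time.sleep):
    """Fetch each item once, caching the JSON on disk and pausing between requests."""
    os.makedirs(cache_dir, exist_ok=True)
    items = []
    for item_id in item_ids:
        path = os.path.join(cache_dir, f"{item_id}.json")
        if os.path.exists(path):
            # Cache hit: no network request, so no sleep either.
            with open(path) as f:
                items.append(json.load(f))
            continue
        item = fetch(item_id)
        with open(path, "w") as f:
            json.dump(item, f)
        items.append(item)
        sleep(delay)
    return items
```

At 200ms per request, 8,483 items works out to about 28 minutes of sleeping, which lines up with the ~30 minute estimate.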


Interesting - you rate limited your requests, and I multithreaded mine :p (with x10 I downloaded your comments in <3 minutes).

From https://www.firebase.com/pricing.html it looks like the top plan supports 10k concurrent connections, so I suspect the impact is negligible.
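The multithreaded approach could be sketched like this in Python, assuming a thread pool of 10 workers to mirror the "x10" above. `fetch_all` and `fetch` are made-up names; `fetch` is whatever function downloads one item.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(item_ids, fetch, workers=10):
    """Fetch items concurrently; results come back in the same order as item_ids."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, item_ids))
```

Threads suit this job because each request spends nearly all of its time waiting on the network.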

Thanks for being an outrageously good resource and beacon of inspiration! You've unknowingly been one of the most influential role models in my career/life: I just relaunched one of my side projects as SaaS last month and it's succeeded beyond my wildest expectations (already at ~$8k YRR). Hopefully I can follow your trajectory and never have to actually work another day in my life :)


Congratulations on the success. Nothing in business makes me as happy as folks telling me that what I wrote/did/etc helped them out.

Though I don't know if I'd describe my lifestyle as "never having to actually work another day in my life." It feels less like work some days and more like work others. For example, it is 1:30 AM and while I could be snug in my bed I am instead clearing out the AR support inbox. (Poor planning earlier today, but still.)


"Actual work" meaning Japanese salaryman/having-a-boss-that-tells-you-what-to-do-and-when-to-do-it. I think I'm still in the honeymoon phase of customer support: I still get little rushes of adrenaline even for angry complaint emails ("something that I've created has provided so much utility for someone that they're angry when it doesn't").

It's certainly difficult at times, but when you can set your own hours, work wherever you want on whatever you want, and take as much time off as you want for any reason, it's difficult to justify using the W word.


tptacek is similarly impressive, with 30,051 posts.

https://hacker-news.firebaseio.com/v0/user/tptacek.json?prin...


"Impressive" is one word for it, I guess.


LOL. My first thought was, "That's kind of a dick comment." My second thought was, "Oh, it must be tptacek himself." :-)


That was exactly my reaction as well.

I wasn't going to comment, but "mhartl" looked familiar. I just wanted to say "thank you" for your Rails Tutorial; I don't think I would have ever learned to program without it. It literally changed my life.


Awesome, glad to hear it!


I had very literally this exact same reaction.


We're all eating the elephant one byte at a time. With 193,000+ karma his average item gets about six upvotes/resubmissions.


You can get a 'prettier' version by removing .json entirely:

https://hacker-news.firebaseio.com/v0/user/tptacek


Another frequent contributor, edw519, has published an ebook of his or her comments: http://hn.my/edw519


> his or her comments

The book is copyrighted by Ed Weissman, so although it's possible "Ed" is short for "Edna," the higher probability can be assigned to "his" in this case.


I didn't look closely, 'his or her' is just a habit.


I'm speechless. Honestly, the content is so expansive and valuable. I love the internet, hackernews, physics, code, knowledge, and the tiny things in between :)


It checks out, I put in TempleOS and got out Genesis: "We have dreamed a dream, and there is no interpreter of it."


Here are some line charts of his posting history: http://hnuser.herokuapp.com/user/patio11/

Click on the line chart to do an hnsearch for the time period.

Update: Site should be back up. It crashes occasionally (that's part of why I hadn't posted it yet).


Oops, Application Error. You getting the HN effect?


How about feeding the top 20 into Bingo Card Creator?


off the top of my head: highest value word patio11 writes is "more" or "raise"...


This... is cool, but also kinda sucks for me. I've invested dozens of hours into writing an extremely complicated scraper for my Android version of HN.

https://play.google.com/store/apps/details?id=com.airlocksof...

The newest version (still under development, probably a month or two from release) adds support for displaying polls, linking to subthreads, and full write support (voting, commenting, submitting, etc). I'm fine with switching to a new API (Square's Retrofit will make it super easy to switch), but without submitting, commenting, and upvote support I have to disable a bunch of features I worked really hard on. Also it would've been cool to know this was coming about 3 months ago so I didn't waste my time.

Anyways, quick question on how it works -- when I query for the list of top stories

https://hacker-news.firebaseio.com/v0/topstories.json?print=...

it just returns a list of ids. Do I have to make a separate request for each story

https://hacker-news.firebaseio.com/v0/item/8863.json?print=p...)

to assemble them into a list for the front page, or am I missing something?


I'm sorry you just invested a lot of time in scraping. I know from experience what a pain that is. We said several times that the API was coming, and I've made it clear to anyone who asked, but there's just no way to reach everybody. All: in the future, please get answers to questions like this by emailing hn@ycombinator.com.

Re write access and logged-in access, if that turns out to be how people want to use the API, that's the direction we'll go. But we think it's important to launch an initial release and develop it based on feedback. There are many other use cases for this data besides building a full-featured client: analyzing history, providing notifications, and so on. It will be fascinating to see what people build!


I'm not blaming you. It just feels bad, you know? I'll definitely email you in the future about stuff like this. And don't get me wrong, it will be great to be able to throw out the cruft that comes along with parsing the current layout. The app is engineered to be able to drop in a new API pretty quick since I thought something like this would happen eventually.

It would help me out a lot if the current front end would live on under something like oldnews.ycombinator.com until the new API has write access, though. I think it's pretty cool to be able to be reading an article somewhere else, click "Share" in Android and have "Submit to HN" pop up in the results.


I second that request. Having a subdomain point to the current layout for a little longer is definitely going to help the transition, especially for write access and platforms without Firebase SDKs.


> This... is cool, but also kinda sucks for me. I've invested dozens of hours into writing an extremely complicated scraper for my Android version of HN.

This definitely does suck. I feel your pain. But it's also part of the package of scraping websites. You go in knowing that it could break at any time.


Oh, I'm well aware. I've had to push many quick fixes when some field gets renamed, etc. It's really not the API change that bothers me, more the lack of features. But hopefully they can add those things soon and I can re-enable them down the road.


Yes. While with HTTP pipelining you can request them all over a single TCP connection using a single SSL session, you will need to make an HTTP request for each item you want.

If you're on a supported platform, the Firebase SDKs handle all this efficiently and can even provide real-time change notifications.
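Without an SDK, the request-per-item pattern over plain REST might look roughly like this in Python. The URLs follow the ones quoted in this thread; `get_json` and `front_page` are made-up names, and this simple version opens a new connection for each request rather than pipelining.

```python
import json
from urllib.request import urlopen

BASE = "https://hacker-news.firebaseio.com/v0"


def get_json(path):
    """GET one JSON document from the REST API (one HTTP request per call)."""
    with urlopen(f"{BASE}/{path}.json") as resp:
        return json.load(resp)


def front_page(limit=30, get=get_json):
    """Return the first `limit` top stories as full item dicts."""
    story_ids = get("topstories")[:limit]
    return [get(f"item/{story_id}") for story_id in story_ids]
```

A 30-story front page therefore costs 31 requests: one for the ID list and one per story.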


[Firebase Employee]

If you use the SDKs, we handle the connection and all of the data is sent over a reused full duplex socket rather than individual requests. https://www.firebase.com/docs/android/, https://www.firebase.com/docs/ios/, https://www.firebase.com/docs/web/


I'm trying to attach a ChildEventListener to the "item" Firebase and I'm getting a "permission denied" error. My guess is that I am doing something wrong, but on the off chance that adding event listeners is not (yet) enabled, it would be nice to know. Any clues as to what I might be doing wrong?

I've never used the Firebase API itself before. It's very clean!

Edit: I reached the same (now obvious) conclusion as mentioned in the reply below. Now my quick hack is working perfectly. Thank you so much for this!


[Firebase Dev Advocate] Glad you're enjoying Firebase! Attaching a listener to the "items" Firebase is disabled. This is because it would send every item from HN to your computer. You'll need to attach a listener to the individual item instead. The "permission denied" error is coming from the security rules on the HN Firebase (https://www.firebase.com/docs/security/quickstart.html). If you're trying to find out what the latest updates are, they're kept in the /updates node (https://github.com/HackerNews/API#changed-items-and-profiles).
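For REST users, a polling equivalent could be sketched as follows in Python. It assumes /updates returns an object with an `items` list of changed IDs, and that `get` is a helper (made-up here) that fetches one path under /v0 as JSON; `refresh_changed` is also a made-up name.

```python
def refresh_changed(get, cache):
    """Re-fetch every item listed under /updates and store it in `cache`."""
    updates = get("updates")
    for item_id in updates.get("items", []):
        # Each changed item still needs its own request.
        cache[item_id] = get(f"item/{item_id}")
    return cache
```

Calling this on a timer keeps a local copy reasonably fresh without listening on the whole item tree, which the security rules block anyway.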


I'm also currently writing a scraper[1] for the HN frontpage (for my WIP Hacker News redesign), and while there's a limited Algolia API available, it doesn't do much good if users can't post comments, upvote etc. Same goes for the official one now.

So, @anyone involved with the API project, can you give us an estimate of when the OAuth-based user-specific API will be rolled out? I'm fine with pausing my efforts until then, if it's going to be soon, in order to take a less complex and less error-prone path.

[1]: https://github.com/geomaster/hnop/blob/master/backend/src/hn...


It's not going to be soon :(


[Firebase Dev Advocate] @airlocksoftware - Yes, you should make separate requests for each story. You can attach a listener to the topstories node (https://www.firebase.com/docs/web/guide/retrieving-data.html...) and when that’s triggered, you can make a request for the data on each story. Using the Firebase SDK, each request will get made using the same connection. I'd recommend using our SDK instead of the REST API so you don't have to worry about managing your own connections and retries.


Here's an example showing all topstories and updating in realtime. Obviously, in JS, but the other Firebase SDKs are similar: http://jsfiddle.net/firebase/96voj1xh/


Just wanted to drop a comment on the awesomeness of your app. Hacker News 2 is by far the best Hacker News app, not just on Android, but on all mobile platforms I've tried (so, iOS, Android and Windows Phone). Awesome work you are doing.


I did use your app for learning purposes - I studied the code quite a lot when learning Android. Thanks for the good job!


Yeah, I like it a lot, but I've put tons of time into my scraper for Reader YC (https://github.com/krruzic/Reader-YC). I support everything but polls currently. This api is nice but my scraper actually supports more... No option to get Show HN, Ask HN or New afaik. Still glad this is out!


Exactly. Is this really the case, or is it just not documented? I've sent an email to api@ycombinator.com about that and hopefully I will be able to shed some light on this later. I will write as soon as I get a response (assuming someone responds).


I've got a reply from YC.

Yep, there are no API methods specifically for getting the Show, Ask, New, Comments etc. lists, yet.

They will probably add such, though.


just wanted to let you know that I love your application!


Thanks! I think you'll like the new version whenever it gets finished :)


Big fan and daily user of your app as well. Looking forward to the new version!


It's OT but I couldn't resist! Could you also please add some Material Design love? Thanks for making this awesome app open source.

Cheers


I, for one, was just thinking about writing a scraper...

Thanks very much guys!


This is a big question for me too. It sounds like you need to fetch every id from the REST API. I need to test the iOS (and you Android) SDK.


I remember it was announced a few months ago.


Nice app! Is login broken though?


Oh, thanks for the reminder. I fixed it a few days ago and did a staged rollout but forgot to push it to everyone. I've done that so it should update for you soon.


It works now, thanks! One side question: why is your hamburger (icon) a Double-Double (http://www.in-n-out.com/mobile/double-double.aspx)?


Ha! A) Because I love Double-Doubles. B) Because it's more than 2 years old (before there really was a hamburger icon on Android). It's completely redesigned in the next version, though.


Can't wait to upgrade to it! Thanks for the wonderful app!


Thanks!


So why, in the first place, would I want another mobile app rather than just opening the fully functional website (which is pretty simple & basic already) on my mobile browser?


Because it can be better designed, use common design / navigation patterns of your mobile OS, notify you when you get a reply, change the text size, change the theme, have richer animations, and allow you to automatically share content from other applications directly to HN?


Yeah. I'm just a bit averse to apps scraping data for the reasons you mentioned. It should have been the work of the mobile website, not an app. Speaking purely from the user's point of view (not the developer - I realize this is a community full of app developers) - one can't just keep installing apps for every website which is not mobile efficient yet. You all must have seen a lot of websites showing messages like "Welcome, we have an app, press OK to install that, or Cancel to continue". Most of those websites don't do anything which a mobile website couldn't.


I agree with you in theory, but most mobile websites are poorly thought out and implemented - if at all. I definitely don't download apps for every site I use, but for the ones I use daily, I generally find I need to. Native OS interactions seem to be difficult to get right in the browser.

HN is definitely an example of a site that isn't ideal in a mobile browser. For instance, if you have the ability to downvote, it's incredibly easy to mistakenly downvote when you mean to upvote because of how close and small the buttons are. There's other added functionality, like tracking who I've upvoted / downvoted in the past as well as tracking un/read comments when returning to a thread. In the browser, I use a Chrome extension for this, and on my phone, I use airlocksoftware's app. (side-note, I wish said state carried between the extension and the app)

The developers of HN are surely capable of creating a mobile website that could work just as well as, or even better than, a mobile app. But currently, it's not ideal. And for that reason, I completely appreciate airlocksoftware (and the devs of other HN apps) for their efforts.


Edit (Typo): Most of those apps don't do anything which a mobile website couldn't.


Because the website looks terrible on a phone/small screen. I visit ihackernews.com more often than I actually visit news.ycombinator.com


Ideally those in charge of HN would have simply employed someone - they had offers starting at free I gather from previous threads - to make HN mobile friendly. Failing that apps are just patching the original site. I too decry this form of progress (replacing web access with apps that only fix the borked site) but it's not hard to see why people should want that.


Mobile websites work poorly on my Moto G. Apps are smooth and pleasant to use.


Potentially offline access to comments.


The API can be used to create a mobile friendly site, since the official one is not for some odd reason.


I've been working on a Hacker News client for Windows Phone over the past several weeks and am very close to an initial release, so I feel somewhat ambivalent about this.

On the one hand, of course it's great that HN is finally getting a proper API and also modernizing its markup (which is a mess even if you ignore all the tables – for example, the first paragraph in a comment usually isn't wrapped in <p> tags), but on the other hand this current v0 version is very lacking and impractical for a regular client application.

Since the top stories (limited to 100) and child comments are only available as a list of IDs a client app would have to make a separate HTTP request for every single item, which is obviously not something you'd want to do especially in a mobile environment. Other lists apart from the top stories (new, show, ask, best, active etc.) don't seem to be available at all right now.
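To make the cost concrete, here is a rough Python sketch of loading a whole comment thread over REST. It assumes each item lists its children's IDs in a `kids` field, and that `get` is a helper (made-up here) that fetches one path as JSON and returns None for a missing item; `walk_thread` is a made-up name.

```python
def walk_thread(item_id, get):
    """Yield an item and all of its descendants, depth first, one fetch per item."""
    item = get(f"item/{item_id}")
    if item is None:
        return
    yield item
    for kid_id in item.get("kids", []):
        yield from walk_thread(kid_id, get)
```

Rendering a story with 298 comments this way takes 299 sequential requests, which is exactly the problem on a mobile connection.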

Of course this is just the first version, and the documentation promises improvements over time – which I don't doubt at all – but there's no clear indication that the API will be at feature-parity with the current website, even excluding anything that requires authentication, by October 28. So this means that I – and other developers of client apps or unofficial APIs – will probably have to write new scraping code once the new rendering engine (which I assume refers to the website) arrives instead of being able to switch to the new API immediately.

Now I guess I might just be needlessly worried, especially since the blog post explicitly says that the new API "should hopefully making switching your apps fairly painless", but then why not wait until it's actually ready for that before making the announcement? Putting a half-baked API out there a few days/weeks (?) in advance before it's fully fleshed out doesn't seem all that helpful, at least to me.


Use the Firebase libraries rather than the REST one to efficiently handle requests. I believe it uses a websocket internally. "It does all the work for you and is awesome." to quote Nick.


Well unfortunately there's no Firebase SDK for Windows Phone/C#/WinRT, and the WebSocket API is undocumented.

EDIT: After having played with the JS SDK a bit I'd like to add that it is indeed incredibly awesome.


[Firebase founder] There is a Firebase C# SDK on the way. Some other things that we've been working on for the last year are shipping in the next few weeks, and those have taken priority. After that, we'll be shifting focus to new SDKs (they're a little complicated and take a bit of time to build).


I'm really glad to hear this. I have been loving Firebase for the app I'm making but one of the components has to run in C# and talking to Firebase from the C# app was much more painful than the other Firebase portions of the application.


This would be awesome. Right now I'm experimenting using raw HTTP requests and Newtonsoft.Json as the JSON [de]serializer. I presume that you will make your C# SDK a portable class library so we can use it in iOS and Android apps as well via Xamarin?


"C# SDK" doesn't say much. SilverLight can use C#, WinRT can use C#, .NET can use C#, but that doesn't mean your "C# library" will run everywhere. C# is just the syntax.


Also, while they don't have a Windows Phone API, I previously wrote a COM Windows Scripting API to use it via Chakra from an ASP.Net app.

I think it's even easier now that JS is a supported app language.


Are you referring to the WebSocket API that the Firebase SDKs use internally? It doesn't seem to be documented anywhere so I guess it's only slightly better than scraping HTML ;)


I set things up so I could actually run their whole JS SDK and talk to it from C#.


Thanks for the tip, I actually just figured that out myself a few minutes ago. Should be good enough until a proper SDK arrives.

With access to a Firebase SDK the only major additions the API needs to become a viable replacement for existing read-only client apps would be support for all the other lists apart from top stories (new, show, show new, ask, jobs, best, active) and more than 100 items for each. For apps that need write access I'd suggest keeping the current website on a separate subdomain until that is implemented into the API.


Didn't they remove support for calling COM APIs (i.e. ActiveX) in Chakra? (At least in IE 10 and later, I think - the versions with proper EcmaScript 5 support.)


> (which is a mess even if you ignore all the tables – for example, the first paragraph in a comment usually isn't wrapped in <p> tags)

Oh, that's in the actual API.

https://hacker-news.firebaseio.com/v0/item/8422922.json?prin...

I strongly hope they add a plain-text field for returned comments.
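Until such a field exists, a client can approximate one. A lossy Python sketch, assuming the text uses bare `<p>` as a paragraph separator plus HTML entities, as described above; `comment_to_text` is a made-up name, and link targets and code formatting are simply dropped.

```python
import html
import re


def comment_to_text(raw):
    """Convert HN-style comment HTML to plain text with blank lines between paragraphs."""
    text = re.sub(r"</?p>", "\n\n", raw)   # <p> only marks paragraph breaks
    text = re.sub(r"<[^>]+>", "", text)    # drop remaining tags, keep their inner text
    text = html.unescape(text)             # decode entities such as &amp; and &#x27;
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

Entities are decoded only after tags are stripped, so an escaped `&lt;b&gt;` in someone's comment survives as literal text.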


What you're getting there is how the comment text happens to be stored (and presumably always has been). We've talked about changing that, because it would allow us to do some implementation improvements like... well, I forget just now. Might have had to do with caching. Anyhow, if enough people want it, we'll bump up the priority.


I would love it. Markdown would be the bee's knees there, if you can reverse it.


See if this helps: http://filr.io/channel/hackernews/stream

Query params:

- pageSize (number of items to return in one go)

- before (takes an epoch timestamp and returns results before that timestamp)

This is a stream of HN stories that have made it to the front page, in order of the time that they made it to the front page.


Or use my Hacker News client for Windows Phone:

http://www.windowsphone.com/en-us/store/app/hacker-news/a527...


I've tried your app, it's functional but obviously I wasn't satisfied with any of the existing options which is why I wrote my own ;) It should be coming out this week, just a few finishing touches now...


Great you're here. I'm using this app, but I'm sorry to say this... It's pretty bad.

Here's a list of issues that I encounter very often:

- It often crashes during startup

- There are massive encoding issues, I see question mark symbols pretty often.

- Comments often won't load, without any error message whatsoever.

I uninstalled the app multiple times for these reasons, but unfortunately there isn't anything that's better.


Would you be willing to beta test my application? It's pretty much finished already and doesn't have any of the issues you listed (in addition to having a, in my opinion, much nicer design), I'd just be looking for some general feedback regarding the discoverability of some features. I could submit a beta today or tomorrow.


I'd like to beta test your app as well! meng.tan [at] gmail. Thanks in advance!


I would love that. Invite me at leoncullens @ live . nl


I'd love an invite as well, if possible.


Sure, just need your Microsoft account email so I can add you to the beta list.


I wrote a stupid simple wrapper and pushed it to PyPI. My excuse is that I needed to learn how to use setuptools today.

    pip install hackernews-python
Usage:

    >>> from hackernews import HackerNews
    >>> hn = HackerNews()
    >>> hn.top_stories()
    [8422599, 8422087, 8422928, 8422581, 8423825...
    
    >>> hn.user('pg')
    {'delay': 2, 'id': 'pg', 'submitted': [7494555, 7494520, 749411...

    >>> hn.item(7494555)['title']
    Hacker News API

    >>> hn.max_item()
    8424314

    >>> hn.updates()
    {'items': [8423690, 8424315, 8424299...], 'profiles': ['exampleuser',...]}

https://github.com/abrinsmead/hackernews-python


Can't get it to work, e.g., for `hn.top_stories()` I get:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "hackernews.py", line 23, in top_stories
        return r.json()
    TypeError: 'list' object is not callable
Tried in both python 2.7.3 and python 3.2.3

EDIT: You need a relatively new version of requests for this to work. The version packaged with Debian Wheezy is too old. Use pip.


Thanks for the feedback!

Prior to version 1.0.0 response.json() was a property, and was not callable. That's probably what caused your error.
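If pinning a minimum version isn't an option, a small shim can accept both styles. This is a sketch; `response_json` is a made-up helper name and not part of requests or of this package.

```python
def response_json(response):
    """Return parsed JSON whether `response.json` is a method or a property."""
    data = response.json
    # New style: bound method to call. Old style: already the parsed list/dict.
    return data() if callable(data) else data
```

It works because parsed JSON values (lists, dicts, strings, numbers) are never callable.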

I'll add a minimum version to requirements.txt for the folks that install this package manually.


Decided to recreate the Hacker News homepage using Ember and the new API. I was really pleased with how easy it was! https://realtimehackernews.firebaseapp.com/


Very cool. Would be neat if changes were more pronounced when they happen. Will the posts change order in this demo if they change on the homepage?


Sadly no, but I'm looking into how to make that happen! It's a result of my particular implementation, not a fault of the API.


Very cool! Congrats on the quick work!

(As I find myself pondering the idea of standing something up like this on a dual-stacked server purely so that I could access HN from my IPv6-only test network... hmmm...)


Definitely digging the live-update on the scores. Haven't seen top stories switch places yet, but I'm guessing that happens also.

Good work.


Holy crap, that was fast. Impressive!


Very cool, but why not update the design a bit, while you're at it?


The API was just announced today. Maybe give him until tomorrow to reinvent Hacker News. ;-)


dstaley nice work. Would be nicer still if firebase had SPDY and QUIC lit up. I don't think it would be a problem with Apache already in front of Varnish.

But more importantly, why no favicon? ;)


that is really cool seeing the numbers change (occasionally)


To everyone asking about logged-in access and write access: this is just a first release! Where it goes from here will depend, in good iterative fashion, on what people want.


I think part of the angst centers around the bit where the markup is going to change. Lots of apps for HN that scrape the UI rely upon the current site to enable things like voting and other signed-in-only actions, for which there is (currently) no first-class API way to do these things. Even if the endpoints for voting change, getting other information (i.e. which items you voted on, so the vote arrows hide) is still markup-dependent.


How does this differ from the Algolia HN API in terms of data access? (https://hn.algolia.com/api) I was able to download all HN data recently with ease using that endpoint. Authentication?

EDIT: After looking at the documentation there are two new aspects of the Firebase API not in the Algolia API:

1) Ability to see deleted/dead stories.

2) Endpoint for user data.

Question to kogir/dang: Has the "delay" field (Delay in minutes between a comment's creation and its visibility to other users) always been there?


The 'delay' field has been there for years. I vaguely remember pg announcing it here when he added it, so we could probably dig up the exact date.


I'm also curious whether it removes some of the limitations of the Algolia version; I wanted to download my content for some statistical analysis (notes at http://www.gwern.net/HN ) but I discovered that there seem to be some hard limits on how much of my data I can reach: https://github.com/algolia/hn-search/pull/36


There's a cheat to work around that limit: use the created_at_i parameter.

Example: https://github.com/minimaxir/hacker-news-download-all-storie...


If you want to get data for a single user through Algolia's API via the commandline, you could also use https://github.com/jaredsohn/hnuserdownload. It uses the same technique as minimaxir's code (a post of his was the inspiration.)


I don't know Python, so I'm not sure what your source code is doing. At a guess, you've hacked together some sort of repeated queries thing with a time-window?


> Get 1000 entries and process them.

> Take the timestamp of the last entry.

> Requery the API, asking for articles made before that timestamp.

Repeat.
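In Python-free terms that loop is short; for those who do read Python, a sketch of it follows. `search` stands in for one API query and is an assumption here: it should return up to `limit` entries older than `before`, newest first, each carrying `created_at_i`. `fetch_all_before` is a made-up name and this is not the code from the linked repo.

```python
def fetch_all_before(search, start_before, page_size=1000):
    """Page backwards through time until a query returns no entries."""
    before = start_before
    while True:
        page = search(before=before, limit=page_size)
        if not page:
            return
        yield from page
        # Next query asks only for entries older than the last one seen.
        before = page[-1]["created_at_i"]
```

One caveat: entries sharing the exact timestamp of a page's last entry can be skipped, so a careful version would overlap the boundary and de-duplicate by ID.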


> "Delay" field post by pg on 2008-06-29 https://news.ycombinator.com/item?id=231024


Algolia scrapes HN HTML and provides their own API from what it looks like.


They don't scrape it.

From the hnsearch about page:

"Every 2 minutes, the last 1000 HN items (stories, comments, polls) are sent to Algolia's indexing API. Items from the last 48 hours are refreshed every hour."


Correct, and we're planning to move to the new official API ASAP for the indexing (instead of the legacy one, which was not a web crawler but was far from perfect).

Regarding the REST APIs, let's keep both for now :)


[Firebase founder here] This is pretty exciting for us, we're glad kogir, dang, kevin and sctb chose to expose HN's data through Firebase. We've seen quite a few startups (and big companies like Nest) do this, since building, maintaining, and documenting a public API often isn't an easy task.


How does this API work with Firebase?

Is HN data already in Firebase (as its primary data store) or is content from HN's DB getting 'mirrored/cloned' on-demand to Firebase for the API?


This makes it really easy to add average karma (karma divided by the number of submitted items) to the comment section for every user. For instance, you can paste the below into the console, and it should add average karma data for each user.

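    // For every username link on the page, load that user's profile via JSONP
    // and append karma per submitted item next to the name.
    Array.prototype.forEach.call(document.querySelectorAll('a[href^="user"]'),
        function(v, k) {
            // Register the callback before the script tag can load.
            window['ud_' + k] = function(user_data) {
                var count = (user_data && user_data.submitted || []).length;
                if (!count) return; // no submissions: avoid dividing by zero
                var avg_karma = user_data.karma / count;
                v.textContent += ' (' + avg_karma.toFixed(1) + ')';
            };
            var s = document.createElement("script");
            s.src = '//hacker-news.firebaseio.com/v0/user/' + encodeURIComponent(v.textContent) + '.json?callback=ud_' + k;
            document.head.appendChild(s);
        }
    );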


Here is something I built with the Algolia API awhile back and just haven't gotten around to cleaning it up to post here.

It lets you download all comments/stories for a user as a JSON or CSV file, breaks down karma between comments and stories, and plots comment/story counts, karma, etc. over time on a line chart (clicking will show you the details via an hnsearch).

Also I built some npm modules so you can get this information via the commandline.

http://hnuser.herokuapp.com/.

Example: http://hnuser.herokuapp.com/user/tptacek/

The Chrome extension hasn't been updated for awhile (it just superimposes a small amount of this information on the user page).


I really appreciate giving a 3 week heads up before moving to a new frontend structure. It's a nice gesture, but I have this horrible feeling that there's only about a 10% chance that my Hacker News app gets updated in time.

I know you can't not iterate because people are scraping, but it does stink. At least this will make everything more future-proof going forward.

However, it may be nice to give a bit more heads up than 3 weeks. I know a lot of apps can take ~2 weeks to get through the review process for iOS.


Some sites actively punish scrapers by constantly, purposely changing their markup. So giving them an API and 3 week head start is leaps and bounds above and beyond what can be expected. When you operate a scraper, you are always on the defensive when it comes to site updates suddenly breaking your app. They should be so lucky to get this 3 week notice. It is all on them if they can't turn it around in time.


As someone who poured countless hours into meticulously scraping the HN markup and faces the prospect of having to port all my code with dread, I'll probably be pleading for an extension alongside you.


A preview release / staging version would help those of us with scrapers update them, without so much downtime / scrambling when it's finally released.


Maybe they can put the new renderer on a different sub domain for a few weeks after their 3 week deadline. One to beta test the interface and two to give devs a bit more time to convert.


We'll probably start with logged out home to a percentage of traffic and work our way to updating the rest of the site and handling the full traffic over a few weeks.

If people need help converting their apps, I believe the Firebase guys have offered to help. Contact us at api@ycombinator.com and we'll connect you.


Love the staged roll out idea. You could probably also use user agents to recognize scrapers from real users (could also be helpful for slowly rolling out to older browsers).


Might be easier/quicker to use the API to generate your own HN clone with HTML in its current format, then point your app at that.


I've been waiting forever for an API from HN, but unfortunately I will not be using it for my app (https://github.com/bennyguitar/News-YC---iPhone).

I've built a library for iOS (https://github.com/bennyguitar/libHN) that handles scraping, commenting, submitting, voting, etc pretty well and allows me to make as few web calls as necessary to use HN. It looks like I'd have to drop functionality and completely change the networking scheme to match this API - something I'm not willing to do yet.

Correct me if I'm wrong here, but to get every comment on a post, I'd have to recursively get each item for each child. Instead, right now, I can make one network request and get all comments for a story. Granted, I have to parse the HTML (which I hate), but it's a much cleaner solution than going through every item, checking the children and then getting those items ad infinitum. Again, I just glanced over the documentation, but that seems untenable to me.
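
To make the cost concrete, here is a minimal sketch of what that traversal looks like over plain REST, assuming each item exposes its direct replies as a `kids` list of IDs. `walk_comments` and `fetch_item_http` are names made up for this example; the walk makes one request per item, which is exactly the overhead being described.

```python
import json
from typing import Any, Callable, Dict, Iterator, Optional, Tuple
from urllib.request import urlopen

Item = Dict[str, Any]


def fetch_item_http(item_id: int) -> Optional[Item]:
    """One REST round trip per item (defined for reference, not called below)."""
    url = f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json"
    with urlopen(url) as resp:
        return json.load(resp)


def walk_comments(story_id: int,
                  fetch_item: Callable[[int], Optional[Item]]
                  ) -> Iterator[Tuple[int, Item]]:
    """Yield (depth, comment) for every descendant of a story, in display order."""
    story = fetch_item(story_id) or {}
    # Explicit stack instead of recursion, so deep threads can't hit the recursion limit.
    stack = [(kid, 1) for kid in reversed(story.get("kids", []))]
    while stack:
        item_id, depth = stack.pop()
        item = fetch_item(item_id)
        if item is None:
            continue  # missing item: skip it and keep going
        yield depth, item
        stack.extend((kid, depth + 1) for kid in reversed(item.get("kids", [])))


# Offline stand-in for the API: story 1 with replies 2 and 3; comment 2 has reply 4.
FAKE_ITEMS = {
    1: {"id": 1, "kids": [2, 3]},
    2: {"id": 2, "kids": [4]},
    3: {"id": 3},
    4: {"id": 4},
}

thread = [(depth, item["id"]) for depth, item in walk_comments(1, FAKE_ITEMS.get)]
print(thread)
```

The length of the resulting list is also the comment count, which is the only way I can see to get that number from the item data as documented.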


If you use the Firebase iOS library they handle it all and use an efficient bidirectional streaming protocol.


That's good to know - wasn't super clear on my first pass of the documentation.


Check the article btw

"Most importantly, the reason we released an API is so that we can start modernizing the markup on Hacker News. Because there are a lot of apps and projects out there that rely on scraping the site to access the data inside it, we decided it would be best to release a proper API and give everyone time to convert their code before we launch any new HTML."


Yeah, the only problem is that I don't want to cut major functionality just to use their API. Things I do, that it doesn't look like the API handles:

    - More than 100 stories
    - Best, Top, Ask, ShowHN, Jobs, User Submission Posts
    - User Management (logging in/out)
    - Commenting
    - Submitting
    - Voting

My app doesn't just function as a reader, which is what this API seems geared towards with the v0 release; it functions as about as full-fledged an HN client as you could get. There are a couple of things that I haven't built in yet, like changing your about-me text, but those were on the roadmap.

I'm actually thinking about storing the configuration of how my app scrapes online such that if the HTML markup changes, I won't have to push huge sweeping changes to the App Store to get my app online again. I just deploy to Heroku and the app will handle that configuration and scrape correctly sans pushing to Apple.


When everyone === lucky to have a supported platform.


I welcome the idea, but this barely qualifies as an API. The most useful part is the "current top stories" - but what timeframe exactly? Seems to be over 3 days at least and can't be customized. And even my test parsing of the 100 top stories took a good minute.

And that returns only the ids, nothing else. To get basic information like the score, title or url you have to lookup the ids individually. And even the story items do not contain such basic information as the number of comments. And you can't calculate it yourself since only the top comments are even returned (as ids of course). So you'll have to recursively dig through the comments to get the number.

This is even more curious as there is a very solid Algolia API where you can filter for submission time, story score, number of comments and even return a greater number of results + access page numbers to get even more.

To get the information from a single Algolia API call, you will need hundreds or thousands (in the case of nested comments) of "official" API calls. Hoping for updates.


If up/down vote data were included in the API, much needed experimentation on collaborative filtering would be made possible! This is Hacker News after all.

Right now one team, Ycombinator, is trying to fix important issues in the ranking and moderation of posts and comments. Many of us are frustrated by the increasing domination of popularity (and hatred) over quality and relevance. A lot of good submissions and comments are simply buried, never to be found. There is too much muck to have to wade through. The timing of posts and comments plays a much larger role than quality. I could go on and on.

Imagine a Netflix Prize-like flowering of experiments and collaboration, leveraging the hacker community's collective smarts and enthusiasm. Many of us have ideas, but right now are unable to test them. What a shame if a great idea dies on a notepad.

There are two possible issues with opening up voting data: gaming and privacy. If having vote data allows someone to game the front page, then only include it with some delay (2 days?) so that it couldn't be used to game the front page. This will still allow experimentation with collaborative filtering algorithms and the like.

My take on the privacy issue is that anonymity isn’t that important for a site like Hacker News:

1. Startup culture is about straight talk, putting your money where your mouth is, and open critical feedback, both in the giving and receiving. There are precedents for exposing voting data (e.g. Quora, Facebook, Stack Exchange).

2. HN is not aimed at political discussions or other topics where anonymity can be paramount.

3. Pseudonymity is sufficient for those who don’t want their votes and comments tied back to their actual identity.

Thoughts?

I would love to hear from others who yearn to experiment with alternate algorithms and strategies for improving Hacker News.


There are many legitimate views on this, but FWIW mine differs from yours. I believe that anonymity actually is important for a site like Hacker News, and the odds of us ever publishing the vote data—even pseudo-anonymized—are small. Sorry to disappoint.


Daniel, I understand. Do you or any others at Y-Combinator have any thoughts on how the hacker community could experiment in the areas I mention above, or whether you guys even think such experimentation would be valuable?


I built a scraper around 3 years ago (been through a few usernames since then), and I've had to change it once 3 months ago because the HTML output added quotes around HTML attributes.

Even though it's read-only, I'll continue to use my scraper rather than the API simply because it's one request, whereas the API would require one request for the top IDs and then one call per story, so it would be 31 calls instead of just 1.

Unless I'm missing something, it seems fairly poorly designed for top stories, and non existent for new stories.

------

EDIT: Looks like I missed the text about updating to a new rendering system in 3 weeks' time, to iterate on designs faster and allow mobile-friendly theming. Looks like I WILL be updating to use the API.


Yeah, I have the same problem here... and basically the same question someone mentioned below: new stories through the API? Do we have to get the max ID, then fetch everything below it and check whether it's a story? Any other ideas?
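
For what it's worth, the "walk down from the max ID" idea would look roughly like the sketch below. The assumptions: the largest item ID is available from somewhere (passed in here as `max_id`), and each item carries a `type` field and possibly a `deleted` flag. `newest_stories` and its `max_lookups` safety cap are made up for this example.

```python
from typing import Any, Callable, Dict, List, Optional

Item = Dict[str, Any]


def newest_stories(max_id: int,
                   fetch_item: Callable[[int], Optional[Item]],
                   count: int = 30,
                   max_lookups: int = 1000) -> List[Item]:
    """Scan item IDs downward from max_id, keeping only live stories."""
    stories: List[Item] = []
    item_id = max_id
    lookups = 0
    while item_id > 0 and len(stories) < count and lookups < max_lookups:
        item = fetch_item(item_id)  # one API call per ID, story or not
        lookups += 1
        if item and item.get("type") == "story" and not item.get("deleted"):
            stories.append(item)
        item_id -= 1
    return stories


# Offline stand-in: IDs 1..12, every third one is a story, and story 9 is deleted.
FAKE_ITEMS = {i: {"id": i, "type": "story" if i % 3 == 0 else "comment"}
              for i in range(1, 13)}
FAKE_ITEMS[9]["deleted"] = True

latest = newest_stories(12, FAKE_ITEMS.get, count=2)
print([s["id"] for s in latest])
```

Since most items are comments, this burns many lookups per story found, so it only really works as a stopgap until there is a proper endpoint for new stories.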


Yay! I've been wanting something like this to come out. I've been playing around with some new tech stacks and built a css replacer of hacker news, but always wanted an actual api to make it easier.

http://jmaat.me/hn

There's a bunch of CSS pages that come out for Hacker News, but I couldn't find anything that aggregates them. This will make it a lot easier to extend and customize the site.

I'm not seeing any api's for the jobs or show sections though? Hopefully this might come in the future?


Why do you need an API to replace only CSS? You can do that with an extension like Stylish/Greasemonkey.


Well personally, I didn't want to install any browser add-ons. I also had some other ideas, like aggregating reddit and hackernews posts, but would need to scrape for that (unless there's an external api I'm missing).


Just pushed the ability to subscribe to all changes and new items: https://github.com/HackerNews/API/blob/master/README.md#live...


The Firebase JavaScript library makes this impressively straightforward to use. I built a clone using React.js and Firebase's library. Because v0 of the API requires a request for each news story, it's not possible to use Firebase's React mixin yet.

https://github.com/ssorallen/hackernews-react


Here's another React version which also does comments (and allows you to fold them - I needed to write a userscript for that before!). It's using react-router to switch between top stories, comments, individual comment and user profile pages.

http://insin.github.io/react-hn

I've just gone for it with Firebase's React mixin, binding everything as an object, since their devs in this thread don't seem concerned about rate limiting. The mixin seems to throw an error every time I try to unbind, which I'm just catching and logging for now.

Edit: I just watched this comment pop up live in my version - pretty neat :)


Nicely done! Binding every HN story as an object would have been the simplest approach, that makes sense.

Is the source available somewhere?


Original source is here until I set up a proper repo (after modularising and setting up a build process):

https://github.com/insin/insin.github.com/blob/master/react-...


[Firebase Dev Advocate] This is great, thank you for sharing!


I'm definitely excited about the API and the future possibilities with it. Looks like a great start. I do have a few questions and suggestions, though.

Is there any chance of getting more than just the top 100 stories returned? I think it will be a lot more useful for api consumers if you can use a query parameter to set the limit (within reason, usually 1,000) and a number of results to skip. For now, scraping is still more desirable to me since I can retrieve any number of results in their current order.

Better yet, but more complex: a number to skip and a certain timestamp so I don't see the same article on two pages due to natural upvoting, downvoting, or rank decay.

Also, if there's any flexibility still with property names, I'd suggest these changes for clearer semantics:

    - "deleted" -> "hidden" (since they're obviously not deleted)
    - "by" -> "author" (for more clarity)
    - "kids" -> "children" (the common convention)


Please do allow other sites to use HN logins. Then the community could develop useful sister services.

For example, a site where HN members can upvote and rate different development tools, libraries, IDEs, management tools, etc. All with backlinks to HN discussions. It's a great community and there are many ways we could share knowledge and experience.


Any kind of rate limitations we should be made aware of?


[Firebase Dev Advocate here] We don't currently have rate limits. I'd recommend using our SDKs, they handle connections more efficiently than dealing with the REST API. You can also run your own server process using our Node (https://www.npmjs.org/package/firebase) or Java libraries (https://www.firebase.com/docs/android/quickstart.html).


Thanks for answering.

I'd rather use the REST API directly, for what I need is rather simple and not downloading, installing and maintaining an SDK is more appealing. (My app was developed a while ago and was doing HTML scraping, but the 30-second limit on HN killed it, because of testing -- I don't need to query more often than twice per minute, but while testing I ran the thing a little too often).

So, what are the limits on the REST API, and how do limits work? (A max number of requests per hour would be better than per minute for example).


Any kind of rate limitations with the REST API we should be made aware of?


You really want to use the Firebase SDKs.

https://news.ycombinator.com/item?id=8423055


There are currently no rate limits on the REST API.


This implies that your SDKs don't use your own REST API. What do they use instead?


We establish a websocket connection and fall back to long polling when appropriate. This SO question has more details: http://stackoverflow.com/questions/12591011/firebase-with-mo...


Rate limit is up to Firebase. We'll see if we can get them to answer.


That doesn't worry you at all? Were any alternatives considered?


Trust me, their rate limit will be much higher than anything we'd easily be able to provide.


Oh and if you guys build anything cool with the API, please let us know at api@ycombinator.com.


Here's a simple example that displays the top story and votes using the Firebase JS SDK (and updates in realtime): http://jsfiddle.net/firebase/cm8ne9nh/


and one showing all top stories updating in realtime. Obviously, in JS, but the other Firebase SDKs are similar: http://jsfiddle.net/firebase/96voj1xh/


Suggestions for improving the API, to make it more valuable for data mining and analytics. This assumes more historical data is available.

1. Provide a way to bulk download the data (that's a click, instead of scraping the API)

2. Add a field for the maximum position a story reached on the front page

3. Add the numerical score for the comment (at least on comments that are N days old, which won't interfere with the reason to hide the scores on the main comments)

Some other changes that would be awesome (but are less realistic) include:

4. A historical event log of votes (even better would be relating those votes back to users, but I imagine that's not going to happen for privacy reasons. An intermediate possibility would be a vote log connected back to anonymized user ids, assuming the anonymous id -> real user id mapping is difficult)

5. A historical event log of display position changes for stories & comments

6. An event log of pageviews with as much metadata as possible to release without infringing on privacy


> the reason we released an API is so that we can start modernizing the markup on Hacker News

This is bigger news. No more tables!


I think quality of the v0 API is not so good for the following reasons.

1. Major functions are dropped (best, new, job, ask,...)

2. Useless response schema of the top stories API (should learn from Netflix's internal/public API design)

3. A short transition period

It's very hard to provide same responses to apps.


Oh this will be cool, and I really look forward to being able to read HN on a phone or tablet in the future without all the zooming and scrolling!

It will be interesting to see if it has an impact on site traffic, how much of that traffic is scrapers today?


Would it be possible to cache the number of comments a story has? Or am I wrong in my understanding that the only way to find the number of comments a story has is to walk the tree of child items and maintain my own count?


+1


Glad to see that HN will still be written in a lisp after these major upgrades.


I regard that as part of the site's DNA. It permeates so many things about its design, both internal and external.


Great news! Will require a fair bit of retooling, but, ya know -- omelets, eggs, etc!

I see the topstories query to replace scraping /news: https://hacker-news.firebaseio.com/v0/topstories.json?print=...

And there must be a query to get the newest items instead of the current /newest.

Are there also new equivalents for /active, /best, /classic, /show, and /shownew?

I'll be happy to replace dodgy XPATH parsing with a proper API. Hope we won't lose these other views though!


So, does this mean I can get the top stories, and then get a top story item with all the comments expanded? I mean, at first it looks like it just sends me the IDs and I need to fetch the details for each of them. Again, this is just looking at the REST API, not the iOS SDK for example.

I'll need to "convert" SwiftHN (https://github.com/Dimillian/SwiftHN) either to this new API or adapt my scraping engine to the new site layout.


Please do convert it, I am working on a project with some of your code as a base!


You're using the app, or the scraping engine (Hacker Swifter)?

The nice thing is that HackerSwifter public API is already in a quite finished state for the available functions, even if I switch to the API, the method calls will be the same.


I am actually using the app because I am doing a mashup of HackerNews and another piece of software. If I decide to end up changing the UI a lot (which looks like it is trending towards), I was thinking of switching to the scraping engine. It's awesome that I won't have to change a lot if you do switch to the API. I'm following on GitHub, keep up the great work!


This is so awesome – albeit a bit overdue ;)

I'm going to start diving into the API to build a simple, powerful "Google Alerts for HN" app on Assembly, and I'd love help from anyone who's interested: https://assembly.com/hn-monitor

There are some products like this out there, but they had to rely on scrapers and the HNSearch API, so I've always found them to be spotty. I think we can make something better.


Chiming in here and presenting my try at a Hacker News website using Angular JS and Bootstrap. https://github.com/lekoaf/HackerNews

My only problem with the API is that you need to do an awful lot of Ajax calls just to get something out of it. The topstories endpoint just gives you an array of IDs and then you need to do one Ajax call for each ID to get the story.

Oh well, the site is a work in progress. Not done yet.
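
Until there is a batch endpoint, one way to soften the "one call per ID" cost over plain REST is to issue the item requests concurrently. A minimal sketch, assuming the `topstories` and `item/<id>` paths under the `v0` base URL; `top_stories`, `get_json`, and the worker count are choices made for this example, not anything the API prescribes.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List
from urllib.request import urlopen

BASE = "https://hacker-news.firebaseio.com/v0"


def get_json(path: str) -> Any:
    """Fetch one JSON document from the REST API."""
    with urlopen(f"{BASE}/{path}.json") as resp:
        return json.load(resp)


def top_stories(limit: int = 30,
                get: Callable[[str], Any] = get_json,
                workers: int = 10) -> List[Dict[str, Any]]:
    """Fetch the top story IDs, then fetch the stories themselves in parallel."""
    ids = get("topstories")[:limit]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the ranking is preserved.
        return list(pool.map(lambda item_id: get(f"item/{item_id}"), ids))


# Offline stand-in for the API so the sketch runs without a network connection.
FAKE = {
    "topstories": [30, 10, 20],
    "item/10": {"id": 10, "title": "second"},
    "item/20": {"id": 20, "title": "third"},
    "item/30": {"id": 30, "title": "first"},
}

front_page = top_stories(limit=2, get=FAKE.__getitem__)
print([story["title"] for story in front_page])
```

It is still 31 requests for a 30-story page, just overlapped in time; the Firebase SDKs mentioned elsewhere in the thread avoid the per-request overhead altogether.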


I spent about 20 minutes throwing this together, so it's VERY rough, but maybe it will turn into something useful. I'm primarily a java programmer but I've been wanting to teach myself more ruby, so here it goes.

https://github.com/stevenspasbo/Hackernews

EDIT: I should say, I started on this to create word clouds, but if anyone has any ideas, contribute!


For my twitter bot https://twitter.com/hn_bot_top1 I use http://api.ihackernews.com/ at the moment. This works but the site (and the API) is quite often not available. So I'll probably switch to the new API as soon as I have some spare time...


Smart move using Firebase. This instantly gives developers traction with the API and as a big API client guy, I love making clients but also very happy we'll have all the tools we need to get started using the API right off the bat. Considering Firebase doesn't have rate-limiting too, this means that the things people can build with this API are limitless.


"Let there be apps".

Anyone with a Firefox phone that wants to work together on a HN client for it? FFOS has been lacking a good HN app.


I am excited about this because now I can finally build a way to query my own posts and comments. I often times come across products/services/cool hacks on here that I vaguely remember, but cannot always locate. Using built-in search is kludgy, and I'd like to be able to do something more complex. Thanks for doing this!


It's interesting that you mention the phenomenon of "I remember seeing it (or maybe even saying it) but I cannot find it". This has happened to me a lot. It is intensely frustrating to me because it's blocking - I don't get much done until I re-find whatever it was.

I wonder how many other people have this? And what their techniques are?


I have the same problem, and I think the best solution would be to index my web browser search history and be able to search through that index. However that would probably generate a lot of junk. My current solution is to bookmark anything that's vaguely interesting. And when I remember something, I search through the bookmarks. I never look at them, they're just for keeping the stack of stuff to search through.


I am using the same approach too. However, most of the old stories I had liked on HN were not bookmarked, so I had to struggle with writing a scraper for them. It would be very cool to be able to authenticate and grab all my saved stories and the comments for archiving and fun!


Awesome!

On the top of my wishlist: Look up HN story by URL.

(From the comments, I learned about a different API that will suffice for now: https://hn.algolia.com/api/v1/search?query=google.com&restri... )


I added upvoting of stories and comments to my iOS/Android app a few weeks ago, and I was thinking about adding commenting.

http://hn.premii.com/about

Now I will have to remove that. Please keep current version of the YC site on a different subdomain.


I use your app, it's awesome! Also agree with your request, as I said elsewhere in this thread...

By the way, animations (the slide between list and comments) have been jerky since a recent update. I'm using the Android app on an HTC One M7 with the latest stock ROM (4.4.3 and Sense 6.0). A few other people have mentioned this in Play Store reviews. Is this a known issue?


If someone would be interested in contributing to an OSS project to build an iOS HN client, please have a look at https://github.com/bonzoq/hniosreader.


OSS under what license? I don't see anything in a quick look over the repo.


MIT. Added it to readme now. Thanks.


I would love a personal API too where I can download my own data (i.e: saved articles).


Does anyone have a good solution on how to get around making a separate request for each Item? Is there somewhere we can pass an array of Item Ids? Is this planned for the future?

Thanks HN!

EDIT: I just read the post about using the Firebase SDK to do this efficiently.


Nice! I'll be changing my Mac system tray app HackerBar over to use this instead of some Objective-C scraping magic that it uses now: http://hackerbarapp.com/


Wow I've never heard of this but that is a really cool app, I'll definitely check it out. Do you know of anything similar for reddit?


There's way too many reddit things already out there... I was trying to tap a market that doesn't have lots of enthusiastic software developers.

oh.... wait...


Give me a HN trend chart for the words "pivot" and "mvp" please:)


If you guys are already using Mashape, it has been released as a community API at https://www.mashape.com/community/hacker-news


Why could you not have released it 1 day sooner? :) I just wrote a scraper: https://github.com/calebmadrigal/hn-tracker



Great news, I've avoided touching my iPad reader because of the whole scraping issue to get at some of the data. Now I can justify updating for iOS8 + the new API. (just hope it gets approved in time)


I'm just wondering if you'd consider releasing a regular data dump, possibly via bittorrent.

As I'd like to do some analysis, but don't really want to thrash the API, downloading all data.


I managed to take a picture when this got 1337 upvotes.

https://hostr.co/file/cHCoBCgwsjEc/2.PNG


Very cool, I'm glad I prepped my android app for this :D

One question: why the choice of returning everything as an ID? On mobile, this will require a lot of very small network requests.


If you use the Firebase SDKs the request is sent over a reused full duplex socket rather than creating new/individual connections.


Yeah, unfortunately, I'm trying to not use any closed source stuff in the app. Going to write some service layer stuff for it



Perhaps keep the old site still running on a different URL so scrapers who can't get their act together in 3 weeks can just change the URL they're scraping from.


Finally, an AlienBlue caliber app for HN is now possible. Soon.


news:yc is the best I have found. Just wish it had collapsible comments.


Disappointing they are still going to use Arc even with the update to modern HTML. I despise how with Arc if I leave the page open during the day and try to click a link later the link has expired.

They should really update to a modern web framework at the same time. Big modern frameworks like Rails are making ridiculously awesome improvements like replacing page loads with XHR (quicker loading since JS/styles/etc is all loaded already, no screen flash, etc.) in a progressive enhancements manner.

So Arc can't even generate fully functional links, let alone keep up with modern web advancements.


I guess you don't use HN very often. We fixed that in nearly all cases a few months ago :)


What is it about Arc that causes the expired link issue? I know very little about Arc. But I understand it to be a language, whereas the issue you describe appears to be the result of some implementation decision. Can you please elaborate?


(I'll take a crack at explaining this momentarily...)

Good grief, that took a long time. Here you go: http://pastebin.com/bSW5dfRQ [1]. I'd better stop neglecting my duties now!

Edit: One thing I forgot to put in there: one reason the closure technique is powerful is that you're leveraging the programming language and runtime to do most of the book-keeping for you. Whatever data is handy, you just reference. The system will remember all the references. That's why using things like query strings and hidden form fields is more complicated: you have to handle all those details yourself (not to mention serialize and deserialize them if you're passing through any other format than what your program keeps in memory). That is tedious, and when your app has many kinds of request, the complexity quickly piles up.

Of course there are other abstractions you can build over this, but closures are an elegant one—especially in cases where programming simplicity is more important than scalability, which is most cases.

Edit 2: A few people thought this should be its own post, so I made https://news.ycombinator.com/item?id=8425011.

[1] Originally http://pastebin.com/dETyYtpX, but I added the above bit etc.
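
For readers who want to see the shape of the closure technique in code, here is a toy sketch in Python. It is not Arc and not how HN is actually implemented: `ClosureLinks`, the `/x?fnid=` URL form, and the one-hour default lifetime are all illustrative assumptions. The point is that the link handler is an ordinary closure that simply references whatever data was in scope when the page was rendered, and that such links expire because the server cannot keep every closure forever.

```python
import secrets
import time
from typing import Callable, Dict, Tuple


class ClosureLinks:
    """Maps random link IDs to closures held in server memory."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._handlers: Dict[str, Tuple[float, Callable[[], str]]] = {}

    def link_for(self, handler: Callable[[], str]) -> str:
        """Store the closure and return a URL that will invoke it later."""
        fnid = secrets.token_urlsafe(8)
        self._handlers[fnid] = (self._clock() + self._ttl, handler)
        return f"/x?fnid={fnid}"

    def follow(self, url: str) -> str:
        """Run the closure behind a link, unless it is unknown or too old."""
        fnid = url.split("fnid=", 1)[-1]
        entry = self._handlers.get(fnid)
        if entry is None or entry[0] < self._clock():
            self._handlers.pop(fnid, None)  # forget expired closures
            return "Unknown or expired link."
        return entry[1]()


def render_vote_link(links: ClosureLinks, user: str, story: dict) -> str:
    # The closure just references `user` and `story`; nothing is serialized
    # into query strings or hidden form fields.
    def upvote() -> str:
        story["score"] += 1
        return f"{user} upvoted {story['title']!r}"
    return links.link_for(upvote)


now = [0.0]  # fake clock so the expiry can be demonstrated without waiting
links = ClosureLinks(ttl_seconds=60, clock=lambda: now[0])
story = {"title": "Hacker News API", "score": 1}
url = render_vote_link(links, "alice", story)

result_fresh = links.follow(url)   # clicked right away
now[0] = 61.0
result_stale = links.follow(url)   # clicked after the closure has expired
print(result_fresh, "|", result_stale)
```

The trade-off described above is visible here: the handler needs no parsing or validation of request parameters, but every rendered link costs server memory until it expires.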


I started a small package for golang here: https://github.com/cryptix/gohn


HN is getting a mobile-optimized site?

Best news I've heard all day!


http://jsfiddle.net/chrismccoy/cuz7ugL8/

quick fiddle to dump the json


Wrote a simple ruby wrapper for this.

https://github.com/infinitus/hnrb


Great news! Can we add endpoints for "Show HN" and "Ask HN" too though? I don't see an easy way to get them.


This is very good news. Made my day. Thank you.


Are there API methods for New, Show, Ask, Jobs and Comments page items?

Or should I be picking them up from "/v0/topstories"?


This is excellent news, and allows me to make an app for YCombinator using our social app platform!

This should really energize the HN community.


On a related tangent, since you mention modernizing the markup, can we please have markdown support in comments?


Until now I've been using the RSS feed to write a Maildir archive of stories.

It will be nice to write that directly via the API :)


Glad to see there is an API. What about the ability to delete accounts? Why do I have to beg to delete an account?


Yes!!! Now I can make my HN app! I don't like the ones that are on the market now (on Android platform)


I have been working on a toy project that uses the Algolia API. Maybe I'll switch it to this now.


Any plan on having another field on the items route that can be queried besides id, e.g. url?


What about search? Would be great to have an ability to search by URL or title fragments


Does this encoding: &#x27;

have a name ? I can't actually find anything useful on the 'net.



Thanks. I'm trying to get cURL to decode, but it doesn't seem to natively handle. Now I'm digging into .../escape.c

I feel like I must be missing something... :/


You can't simply decode each character without losing information. For example, &#x3c; means a literal < character to be shown on the page, as opposed to a < in the stream which starts an HTML tag.

If you're just planning on displaying the text in a browser, no decoding is needed. If you want to parse the text to do some sort of textual analysis, an HTML parser library might be best.
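
As a concrete illustration in Python, using only the standard library: `html.unescape` does the plain character-by-character decoding, while a small `HTMLParser` subclass strips the real tags and decodes the escaped ones. `TextExtractor` and `to_plain_text` are names invented for this sketch, the sample string is made up, and treating `<p>` as a paragraph break is an assumption about how comment text is marked up.

```python
import html
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, dropping real tags but decoding character references."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.parts.append("\n\n")  # assumed paragraph separator

    def handle_data(self, data):
        self.parts.append(data)


def to_plain_text(fragment: str) -> str:
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)


raw = "I can&#x27;t use &#x3c;b&#x3e; here.<p>Second <i>paragraph</i>."

# Plain decoding: the escaped <b> now looks just like the real <p> and <i> tags.
print(html.unescape(raw))

# Parsing: real tags are removed, and the escaped <b> survives as literal text.
print(to_plain_text(raw))
```

For output to a terminal, the second form is probably what you want; for display in a browser, the raw string can be used as-is.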


I understand what you're talking about re: &#x3c; and '<' -- the json -looks- page (terminal in my case) displayable, barring the &#xhhhh; encoding. cURL has facilities for decoding %20 (for example), but not what we're getting back w/ this json.

You've given me an idea though, so back to vi for me.

Thx.


not sure if you figured something out already, but just saw your comment and remembered that this exists in PHP:

http://us1.php.net/manual/en/function.get-html-translation-t...

absent another source, you could dump it out for your usage elsewhere.

    % php -r 'print_r(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES|ENT_HTML5));'

or

    % php -r 'print json_encode(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES|ENT_HTML5));' | jq .

edit: just found http://dev.w3.org/html5/html-author/charref (but might be harder to parse..)

