We went as far as building a browser-based IDE-like environment for generating these, and a language called parsley for expressing the scrapes. If you're interested in this, you could check out some of our related open source libraries:
Edit: I just open-sourced the scraping wiki site we created here: https://github.com/fizx/parselets_com
> cat hn.let
{
  "title": ".title a",
  "link": ".title a @href",
  "comments": "match(.subtext a:nth-child(3), '\\d+')",
  "user": ".subtext a:nth-child(2)",
  "score": "match(.subtext span, '\\d+')",
  "time": "match(.subtext, '\\d+\\s+\\w+\\s+ago')"
}
> csvget --directory-prefix=./data -A "/x" -w 5 --parselet=hn.let http://news.ycombinator.com/
> head data/headlines.csv
4,Simpson's paradox: why mistrust seemingly simple statistics,2 hours ago,http://en.wikipedia.org/wiki/Simpson%27s_paradox,41,waldrews
67,America's unjust sex laws,2 hours ago,http://www.economist.com/opinion/displaystory.cfm?story_id=14165460,59,MikeCapone
23,Buy somebody lunch,3 hours ago,http://www.whattofix.com/blog/archives/2009/08/buy-somebody-lu.php,58,DanielBMarkham
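If it helps to see the moving parts without csvget, here is a rough Python sketch of what that parselet amounts to, using requests + BeautifulSoup rather than the parsley library itself; the selectors are copied from hn.let and may not match HN's current markup, and pairing each title with the following subtext row is a simplification:

    # Rough approximation of hn.let, not the parsley library itself.
    # Selectors come straight from the parselet; pairing each ".title a"
    # link with the following ".subtext" row is a simplification.
    import re
    import requests
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(requests.get("https://news.ycombinator.com/").text,
                         "html.parser")

    rows = []
    for title, subtext in zip(soup.select(".title a"), soup.select(".subtext")):
        score = re.search(r"\d+", subtext.get_text())
        when = re.search(r"\d+\s+\w+\s+ago", subtext.get_text())
        rows.append({
            "title": title.get_text(),
            "link": title.get("href"),
            "score": score.group() if score else None,
            "time": when.group() if when else None,
        })

    print(rows[:3])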
Also, if you're in or near SF, I'd be happy to get coffee sometime.
Hacker News' Newest links are also available here:
The other APIs never expire (the expire feature still hasn't been pushed).
So as nice as this is, it simply won't work here for the many people who would like to use near-live data on HN.
That's not to say you should do it.
Hacker News needs a real API.
It's not a lack of functionality: it had it. The problem is that it not only required a username/password, but it also got caught in HN's safety net, which prevents multiple accounts from the same IP from doing a lot of things.
So it can work as a library, but not as a server-side API. That's why he removed it.
For another project of mine, I used the Hacker News Search API, which is really consistent and really powerful, and is maintained by the YC company that does ThriftDB.
Can you add support for taking an existing JSON API (rather than scraping HTML)? This is useful for APIs that are accessible with neither CORS nor JSONP, i.e. APIs provided by incompetent mental midgets who don't answer emails or participate in their Google Group (cough MBTA cough).
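For what it's worth, the server side of that feature doesn't need much. Here is a minimal sketch (Flask here purely as an assumption about how one might wire it up, not anything apify actually exposes) that fetches an upstream JSON API and re-serves it with a CORS header:

    # Minimal sketch of a JSON pass-through with CORS, not apify's API.
    # A real deployment would whitelist upstream hosts instead of
    # proxying arbitrary URLs.
    import requests
    from flask import Flask, Response, request

    app = Flask(__name__)

    @app.route("/proxy")
    def proxy():
        upstream = request.args.get("url", "")
        resp = requests.get(upstream, timeout=10)
        return Response(
            resp.content,
            status=resp.status_code,
            mimetype="application/json",
            headers={"Access-Control-Allow-Origin": "*"},
        )

    if __name__ == "__main__":
        app.run()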
So, it's basically a web scraper, but with a JSON API. The API input is limited to a single parameter that indexes the record to be scraped. The API output is taken from that indexed record: a set of scraped elements within that record, presented as JSON, with attributes named as the user specified.
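In other words, something along these lines (the config format and selectors below are made up to illustrate the mechanism, not apify's actual internals):

    # Illustration of the mechanism: one index parameter picks a record,
    # CSS selectors pull elements out of it, and the output is JSON with
    # user-chosen attribute names. The config and selectors are assumptions.
    import json
    import requests
    from bs4 import BeautifulSoup

    CONFIG = {
        "record": "tr.athing",               # assumed: one row per story
        "fields": {"title": ".titleline a"}  # user-chosen name -> selector
    }

    def get_record(url, index):
        soup = BeautifulSoup(requests.get(url).text, "html.parser")
        record = soup.select(CONFIG["record"])[index]
        return {name: record.select_one(sel).get_text()
                for name, sel in CONFIG["fields"].items()}

    print(json.dumps(get_record("https://news.ycombinator.com/", 0)))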
Although this is limited to a list of renamed records, it could be extended (if needed), and I really like the concept and UI implementation.

Feedback: As someone who has never used CSS, I found it very tricky to even duplicate the tutorial:

* selectors are sensitive to leading and trailing spaces;
* the selectors given in the tute aren't what's needed (and see BTW below);
* I often got "API call failed: Internal Server Error", which indicates a problem with the selector but not what it is, and ATM the service is often "unavailable" :);
* it's slow switching back and forth between "edit" and "test" (why not include testing on the same page, like HN comment edits: textarea + rendered result?);
* when an attribute is removed, it remains in the JSON (e.g. http://apify.heroku.com/resources/4fcb26d7a06a160001000024);
* it takes a long time (30s to 1min) to get a result.

I hate to say it, but it's like my experience with Ruby: it takes so much time and effort to get the tool to basically work that I've used up all my enthusiasm/gumption and have none left for the project I had in mind. But much of this is because of the current traffic spike, my ignorance of CSS, and minor polishing/bugs that can be fixed in version 1.1 - as I said, I really like the idea and UI.
But a deeper question: why a service, instead of a library? It's cross-language, but it has an extra dependency (the service), an extra network hop, and processing from many users converging at one point. It's interesting to me, because the world seems to be moving towards services, and this would logically include components that formerly would have been libraries. Will this happen? What are the pros and cons? Will Amazon etc. provide free computation for users of open-source components, analogous to open-source libraries? Interesting.
BTW: minor typo/bug in active URLs in the tute (http://apify.heroku.com/tutorial/create): an extra "s" in "episodess":
The service is just an extension of this library:
The intent of the service is to enable mobile apps without a backend/DB, like Parse but for read-only APIs.
There's also a good API which powers my favorite Android HN app over here: http://hndroidapi.appspot.com/
"API call failed: Internal Server Error"
It won't work for #! URLs. Twitter has a nice streaming API for search if that's what you're looking for.
* I guess #! URLs could be transformed into _escaped_fragment_ URLs (a rough sketch of that rewrite follows this list)
* I know Twitter has an API. It was just an example. Maybe this example would be more relevant (the content could also be fetched with an _escaped_fragment_ URL).
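The #! rewrite mentioned above is mechanical; here is a small sketch of it, following Google's AJAX crawling scheme (the example URL is made up):

    # Rewrite http://example.com/page#!/state into
    # http://example.com/page?_escaped_fragment_=%2Fstate
    # (Google AJAX crawling scheme); the example URL is made up.
    from urllib.parse import quote

    def escaped_fragment_url(url):
        if "#!" not in url:
            return url
        base, fragment = url.split("#!", 1)
        sep = "&" if "?" in base else "?"
        return base + sep + "_escaped_fragment_=" + quote(fragment, safe="")

    print(escaped_fragment_url("http://example.com/page#!/state?id=1"))
    # http://example.com/page?_escaped_fragment_=%2Fstate%3Fid%3D1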
I tried to fix your example but couldn't get it to work (I tried //span@data-time (XPath) - what is the unique index of a tweet?)
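One likely culprit there is the XPath syntax: selecting an attribute needs an explicit attribute step (//span/@data-time), not //span@data-time. A quick check with lxml (the HTML snippet is made up):

    # XPath attribute selection needs the /@ step; the snippet is made up.
    from lxml import html

    doc = html.fromstring('<div><span data-time="1346000000">2:13 PM</span></div>')
    print(doc.xpath("//span/@data-time"))   # ['1346000000']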
But it's just HTTP, so it's basically already an API.
The advantage of the app over the library is caching and automatic expiry
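A minimal sketch of that caching-with-expiry idea (not the app's actual implementation; the TTL value is an assumption):

    # Keep a fetched payload for TTL_SECONDS, then refetch on the next request.
    # Not the app's actual cache; just the shape of the idea.
    import time

    CACHE = {}          # url -> (fetched_at, payload)
    TTL_SECONDS = 300   # assumed expiry window

    def cached_fetch(url, fetch):
        now = time.time()
        hit = CACHE.get(url)
        if hit and now - hit[0] < TTL_SECONDS:
            return hit[1]
        payload = fetch(url)
        CACHE[url] = (now, payload)
        return payload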