
Like Instapaper, but for Developers - mmackh
http://api.thequeue.org/v1/
======
nikcub
I tested a lot of these services and libraries a while ago as part of
developing a product that required extracting article text and metadata from a
URL.

The best service, by some margin, was Diffbot (www.diffbot.com). I ran
comparisons between approximately 20 different services and libraries. It uses
machine learning rather than regular expressions or
per-site filters, and the engine has been extensively trained (I threw a lot
of edge cases at it, which improved it). There seem to be a lot of similar
services that do well with common cases but completely fall apart when applied
broadly.

So to the author of this service - what features or examples do you have that
distinguish your implementation from others? What is the technique being used
here?

~~~
mmackh
I set out to build my own interpretation, and that's what I did. It has an
automatic extraction pattern, but it also uses per-site rules where available.
I'd say the main distinguishing factor is the price point: free.

~~~
kolektiv
Which, if you want people to use it with confidence, probably isn't a selling
point:

<http://blog.pinboard.in/2011/12/don_t_be_a_free_user/>

~~~
mmackh
I cannot, in good conscience, charge for content that is merely scraped. I've
mentioned it before, but once I complete testing, I'll release the full source
on GitHub.

~~~
mrleinad
I used to work for an online publisher in Spain that did exactly that:
charge for scraped content. You can't imagine how many customers there are for
this kind of thing.

------
Concours
Looks nice; we should talk. I run a service that does the same (and more):
<http://www.feedsapi.com>. Where in Switzerland are you based? I was in
Bienne a couple of months ago; I'm based in Germany. I will drop you an email
shortly.

------
andysinclair
Any chance you can expand this as a "real" service, i.e. one with a guaranteed
service level for a monthly fee?

I would love to use this in an iPhone app I am building, but I am obviously
wary as it may disappear/go offline at any point.

I would gladly pay a monthly subscription to use it.

~~~
mmackh
I'm using it in <http://readapp.net> & my upcoming HN News App, so it isn't
scheduled to disappear any time soon. Send me an email if you'd like to discuss
this further.

~~~
eli
Interesting service. You've got several typos and some awkward phrasing in the
text under <http://readapp.net/pub.html> though.

~~~
mmackh
Thanks, will rewrite this today

------
dabeeeenster
What text extractor engine are you using?

~~~
mmackh
I'm building this on top of a PHP port of readability.

~~~
mmorey
Are you using this port <http://code.fivefilters.org/p/php-readability/> ?
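Readability-style extractors generally work by scoring candidate containers rather than matching per-site patterns. The sketch below is a heavily simplified, hypothetical version of that idea, loosely modeled on the original arc90 Readability heuristics (a point per paragraph, a point per comma, up to three points for length, discounted by link density). It is not code from the PHP port discussed here; `Block`, `paragraph_score`, `block_score`, and `best_block` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """A candidate container (e.g. a <div>) and the paragraph text inside it."""
    name: str
    paragraphs: List[str] = field(default_factory=list)
    link_chars: int = 0  # characters of text that sit inside <a> elements


def paragraph_score(text: str) -> int:
    # Very short paragraphs (captions, share buttons) contribute nothing.
    if len(text) < 25:
        return 0
    # One point for the paragraph, one per comma, up to 3 for length.
    return 1 + text.count(",") + min(len(text) // 100, 3)


def block_score(block: Block) -> float:
    raw = sum(paragraph_score(p) for p in block.paragraphs)
    total_chars = sum(len(p) for p in block.paragraphs)
    if total_chars == 0:
        return 0.0
    # Navigation menus and link lists are mostly anchor text, so discount them.
    link_density = min(block.link_chars / total_chars, 1.0)
    return raw * (1.0 - link_density)


def best_block(blocks: List[Block]) -> Block:
    """Pick the container most likely to hold the article body."""
    return max(blocks, key=block_score)
```

Pure heuristics like this handle typical article layouts well, which is consistent with nikcub's observation that many services do fine on common cases but struggle on unusual pages.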

------
JoshTriplett
I tried this on <https://www.xkcd.com/386/> , but
[http://api.thequeue.org/v1/clear?url=https://www.xkcd.com/38...](http://api.thequeue.org/v1/clear?url=https://www.xkcd.com/386/)
just extracted the content disclaimer and Creative Commons license notice at
the bottom of the page: "Warning: this comic contains [...] This work is
licensed under [...]".

~~~
mmackh
Should be all fixed now, please let me know if it works for you.

~~~
JoshTriplett
Somewhat better now, but it still grabs a lot of the boilerplate too.

(Also, out of curiosity, what did you change to fix it?)

~~~
mmackh
I added a set of rules for this page. Unfortunately the DOM isn't easily
broken down, thus the extra clutter.
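mmackh describes the extractor as automatic extraction plus per-site rules where available, and the xkcd fix here is one such rule. A minimal sketch of that dispatch, assuming rules are keyed by hostname: `SITE_RULES`, its `#comic` selector, `rule_for`, and `strategy` are hypothetical, not the service's actual configuration.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# Hypothetical per-site rules: hostname -> CSS selector for the content node.
SITE_RULES: Dict[str, str] = {
    "xkcd.com": "#comic",
}


def rule_for(url: str) -> Optional[str]:
    """Return the per-site rule for url, or None to use automatic extraction."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]  # treat www.example.com like example.com
    return SITE_RULES.get(host)


def strategy(url: str) -> str:
    """Describe which extraction path a URL would take."""
    rule = rule_for(url)
    return f"rule:{rule}" if rule is not None else "automatic"
```

Checking the rule table first keeps hand-tuned sites precise while everything else falls through to the generic extractor; as this exchange shows, a rule only helps as much as the page's DOM allows.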

------
lowglow
Sweet. We should talk. I run a similar project at <http://www.rtcool.com/>

------
johncoltrane
Thanks. You should avoid underlined text for non-links, though.

------
neiljohnson
It would be really great if, for shortened links, it also provided the final
URL.

~~~
mmackh
Try now

~~~
neiljohnson
Great stuff, thank you
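Resolving a shortened link means following its HTTP redirects until a non-redirect response comes back. A minimal sketch, assuming HEAD requests are good enough (some servers handle HEAD poorly, in which case a GET would be needed). `head_request` and `final_url` are hypothetical helpers, not part of the API; the fetch function is injectable so the redirect logic can be exercised without network access.

```python
import http.client
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin, urlsplit

# A fetcher returns (status code, Location header or None) for one URL.
Fetch = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = (301, 302, 303, 307, 308)


def head_request(url: str) -> Tuple[int, Optional[str]]:
    """Issue a single HEAD request without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def final_url(url: str, fetch: Fetch = head_request, max_hops: int = 10) -> str:
    """Follow redirects from url and return the first non-redirect URL."""
    for _ in range(max_hops):
        status, location = fetch(url)
        if status in REDIRECT_CODES and location:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
        else:
            return url
    raise RuntimeError("too many redirects")
```

The hop limit guards against redirect loops, which would otherwise keep the resolver spinning forever.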

------
endlessvoid94
This is great. I made a personal periodical for myself using readability and
it worked, but was a pain in the ass. This is exactly what I should've built
first.

------
digamber_kamat
Thanks a lot. It needs a bit of improvement, I guess, but it's a great
beginning; I've always wanted such an API.

------
einarlove
Just what I've been looking for! I'll definitely use it sooner or later.

------
sidolin
You might want to stop it from opening local files.

~~~
mmackh
Could you please elaborate?

~~~
randallsquared
They're probably talking about the implications of errors such as the ones in
the result of
[http://api.thequeue.org/v1/clear?url=http://news.ycombinator...](http://api.thequeue.org/v1/clear?url=http://news.ycombinator.com/item?id=3646627):

    
    
        Warning: file_put_contents(db/aHR0cDovL25ld3MueWNvbWJpbmF0b3IuY29tL2l0ZW0/aWQ9MzY0NjYyNw==.clr) [function.file-put-contents]: failed to open stream: No such file or directory in /home/mackh_vps/api.thequeue.org/v1/clear.php on line 27
    
        Warning: file_get_contents(db/aHR0cDovL25ld3MueWNvbWJpbmF0b3IuY29tL2l0ZW0/aWQ9MzY0NjYyNw==.clr) [function.file-get-contents]: failed to open stream: No such file or directory in /home/mackh_vps/api.thequeue.org/v1/clear.php on line 31
    
        Warning: Cannot modify header information - headers already sent by (output started at /home/mackh_vps/api.thequeue.org/v1/clear.php:27) in /home/mackh_vps/api.thequeue.org/v1/clear.php on line 55
        aHR0cDovL25ld3MueWNvbWJpbmF0b3IuY29tL2l0ZW0/aWQ9MzY0NjYyNw==Invalid URL
    

Also, maybe turn off display_errors and turn on log_errors in your php.ini.

~~~
mmackh
Thanks, the ? character was causing issues
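The warnings above show cache files being written under `db/` with names derived from a base64-style encoding of the requested URL, and a `?` from the URL ended up in the filename. The thread doesn't say exactly how the name was built or fixed, so this is a sketch of one common alternative, not the actual fix: hash the full URL so the filename only ever contains hex digits and has a fixed length. `CACHE_DIR` and `cache_path` are hypothetical names; `db/` and `.clr` are taken from the warnings.

```python
import hashlib
import os

CACHE_DIR = "db"  # directory name seen in the warning messages


def cache_path(url: str) -> str:
    """Map any URL to a filesystem-safe, fixed-length cache filename."""
    # SHA-256 hex digests use only 0-9 and a-f, so characters like "?" or "/"
    # from the URL can never leak into the path.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return os.path.join(CACHE_DIR, digest + ".clr")
```

Standard base64 can also emit `/`, which a filesystem treats as a directory separator; the URL-safe base64 variant avoids that, but long URLs can still exceed typical filename length limits, whereas a hash stays 64 characters.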

------
gillesguillemin
Thanks man, you quite likely made my day!

------
n8ji
Any chance you'll add JSON support?

~~~
mmackh
Try adding &format=json and let me know if you run into any bugs
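On the client side, the article URL should be percent-encoded before it goes into the `url` parameter; otherwise an `&` in the article's own query string would be read as a separate parameter of the API call. A minimal sketch using the `clear` endpoint and `format=json` parameter from this thread, assuming the service reads `url` as an ordinary query parameter (PHP decodes `$_GET` values automatically); `clear_request_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

API_BASE = "http://api.thequeue.org/v1/clear"  # endpoint as used in this thread


def clear_request_url(article_url: str, fmt: str = "json") -> str:
    """Build an API request URL with the article URL safely percent-encoded."""
    # urlencode escapes ":", "/", "?", "=" and "&" in the article URL, so they
    # cannot be confused with the API call's own query syntax.
    return API_BASE + "?" + urlencode({"url": article_url, "format": fmt})
```

For example, `clear_request_url("http://news.ycombinator.com/item?id=3646627")` yields `http://api.thequeue.org/v1/clear?url=http%3A%2F%2Fnews.ycombinator.com%2Fitem%3Fid%3D3646627&format=json`.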

~~~
wseymour
How's about sum Accept headers up in there?

~~~
mmackh
Could you point me in the right direction?

~~~
lysol
Accept: application/json

~~~
mmackh
I'm currently using header('Content-type: application/json'); Source:
[http://stackoverflow.com/questions/267546/correct-http-heade...](http://stackoverflow.com/questions/267546/correct-http-header-for-json-file)

~~~
Robin_Message
I think they want it so that if they send an Accept header in the _request_
that asks for json, you reply with json, instead of using a query parameter to
specify the format.

<https://developer.mozilla.org/en/HTTP/Content_negotiation> has more details
about the accept header and its use.
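Robin_Message's suggestion amounts to server-side content negotiation: inspect the request's Accept header and pick the response format from it, while still honouring an explicit `format` parameter. A simplified sketch that handles `q` values but treats wildcards such as `*/*` as "use the default"; `negotiate_format`, `SUPPORTED`, and the XML default are assumptions, not the service's actual behaviour.

```python
from typing import Optional

# Media types this hypothetical endpoint can produce.
SUPPORTED = {
    "application/json": "json",
    "application/xml": "xml",
    "text/xml": "xml",
}


def negotiate_format(accept: Optional[str],
                     format_param: Optional[str] = None,
                     default: str = "xml") -> str:
    """Choose "json" or "xml" from an explicit parameter or the Accept header."""
    # An explicit ?format= parameter wins over the header.
    if format_param in ("json", "xml"):
        return format_param
    best, best_q = default, 0.0
    for item in (accept or "").split(","):
        media, *params = [p.strip() for p in item.split(";")]
        q = 1.0  # quality defaults to 1 when no q= parameter is given
        for p in params:
            if p.startswith("q="):
                try:
                    q = float(p[2:])
                except ValueError:
                    q = 0.0
        fmt = SUPPORTED.get(media.lower())
        # q=0 means "not acceptable"; on ties, the earlier entry is kept.
        if fmt and q > best_q:
            best, best_q = fmt, q
    return best
```

Whatever format is chosen, the response should carry the matching Content-Type (as mmackh already does for JSON), and adding `Vary: Accept` tells caches that the body depends on that request header.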

------
ale55andro
Too awesome! I like it, and it couldn't have come at a better time. Made my day
as well :)

------
dragosstancu
Very cool. I could totally use it in a future iPhone app. Pinterest fever
alert! :)

------
balsamiq
Hey thanks for using my post as an example! ;)

------
TamDenholm
I'm glad there is JSON support. Does anyone else think XML should die a
painful death?

------
robmcm
Very cool, good work :D

