

Ask HN: Is there an API that allows me to extract the text from an article? - ppjim

 I wonder if there is any API that does the same thing as Instapaper or Readability. In particular, you can select any web page and get just the text, with the navigation menus and advertising removed. I'm on a project that needs to analyze several Internet news sites and extract their contents. The problem is that each site has a different structure, which makes it difficult to add a new one.

Greetings.
======
polyfractal
Viewtext [1] provides an API that gives you clean(er) HTML. It still contains
some markup but is vastly simplified. You can also roll your own with tools
like HtmlCleaner [2] or lxml [3]

[1] <http://viewtext.org/>

[2] <http://htmlcleaner.sourceforge.net/>

[3] <http://lxml.de/>
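
For the roll-your-own route, here is a minimal sketch of the idea. It uses Python's standard-library `html.parser` instead of HtmlCleaner or lxml, so it runs without extra installs. The `SKIP_TAGS` and `BLOCK_TAGS` sets and the `extract_text` helper are assumptions made up for illustration, not part of any of the tools above, and real news pages will likely need smarter heuristics than a fixed tag list.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as boilerplate (an assumed list, tune per site).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}
# Tags that should produce a line break in the extracted text (also assumed).
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "article", "section",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collects text that is not nested inside any SKIP_TAGS element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS and self.skip_depth == 0:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            if self.skip_depth > 0:
                self.skip_depth -= 1
        elif tag in BLOCK_TAGS and self.skip_depth == 0:
            self.chunks.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def extract_text(html):
    """Return the visible text of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = (
        "<html><body><nav><a href='/'>Home</a></nav>"
        "<h1>Title</h1><p>Hello <b>world</b>.</p>"
        "<script>var x = 1;</script><footer>Ads</footer></body></html>"
    )
    print(extract_text(sample))
```

This only strips elements by tag name, so ads or menus wrapped in plain `div`s will still come through. Handling those usually means scoring blocks by text density or link ratio, which is roughly what the hosted services do for you.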

~~~
ppjim
Thanks, I'm going to check it out. I googled for text extraction software and
found AlchemyAPI.

------
astrofinch
<http://www.diffbot.com/docs/api/article>

