
List of resources: Article text extraction from HTML documents - necrodome
http://tomazkovacic.com/blog/56/list-of-resources-article-text-extraction-from-html-documents/
======
juiceandjuice
For a while now, I've aliased a version of wget as 'wcat'
(alias wcat="wget -qO- -U NoSuchBrowser/1.0") to dump pages directly to my
terminal so I could quickly search through them and use less, sed, and all
sorts of other stuff.
Integrating text extraction into that would be pretty useful.
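
A rough sketch of what that integration could look like, assuming a POSIX
shell with tr and sed. The names strip_tags and wtext are made up for
illustration, and this only strips markup crudely: it does not do real article
extraction (no boilerplate removal, no handling of script/style contents or
HTML entities).

```shell
# Hypothetical helper: crude HTML tag stripper, reads HTML on stdin.
strip_tags() {
    # Join lines first so tags that span several lines are matched,
    # replace every <...> with a space, squeeze runs of spaces,
    # then trim the leading and trailing space.
    tr '\n' ' ' |
        sed -e 's/<[^>]*>/ /g' |
        tr -s ' ' |
        sed -e 's/^ //' -e 's/ $//'
}

# Hypothetical wrapper: same wget flags as the wcat alias above,
# with the output piped through the tag stripper.
wtext() {
    wget -qO- -U NoSuchBrowser/1.0 "$1" | strip_tags
}
```

Something like "wtext http://example.com/ | less" would then page through the
stripped text. A proper extractor from the linked list could be dropped in
place of strip_tags, as long as it reads HTML on stdin.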

