

Generating Images or Extracting Text from Documents? Docsplit. - jashkenas

http://documentcloud.github.com/docsplit/

It's always been a pain to make documents (PDFs, Word, Excel, etc.) displayable and searchable on the web by extracting images and plain text. Docsplit is a command-line utility and Ruby API to help make it a little easier. It wraps the excellent PDFBox, GraphicsMagick, and JODConverter libraries so that you can do things like this:

    docsplit images docs/*.pdf --size 700x,50x50 --format gif

    docsplit text expenses.doc

    docsplit title presentation.ppt
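If you want to drive the command line from a script, here is a rough sketch in Ruby that assembles the `docsplit images` invocation shown above. It only uses the `--size` and `--format` flags from that example; the helper name `docsplit_images_command` and its keyword arguments are made up for illustration and are not part of Docsplit itself.

```ruby
require "shellwords"

# Hypothetical helper (not part of Docsplit): builds the argument list for
# `docsplit images`, using only the flags shown in the example above.
def docsplit_images_command(files, sizes: [], format: nil)
  cmd = ["docsplit", "images", *files]
  # Multiple sizes are passed as one comma-separated value, as in the post.
  cmd += ["--size", sizes.join(",")] unless sizes.empty?
  cmd += ["--format", format.to_s] if format
  cmd
end

cmd = docsplit_images_command(Dir.glob("docs/*.pdf"),
                              sizes: %w[700x 50x50], format: :gif)

# Print the command for inspection. To actually run it (assuming docsplit is
# installed and on your PATH), call: system(*cmd)
puts Shellwords.join(cmd)
```

Passing the arguments as an array to `system` avoids shell-quoting problems with file names that contain spaces.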
======
jashkenas
Clickable links:

Documentation: <http://documentcloud.github.com/docsplit/>

Source Code: <http://github.com/documentcloud/docsplit/>

Blog Announcement: [http://www.documentcloud.org/blog/2009/12/07/announcing-docs...](http://www.documentcloud.org/blog/2009/12/07/announcing-docsplit/)

