
Show HN: DocRipper – Scrape .doc|.docx|.pdf|.txt|.sketch with 1 command - pzaich
https://github.com/pzaich/doc_ripper
======
Meekro
Interesting! Apache Tika is the "classic" way of doing this, and it supports
many dozens of formats. Is this meant to address certain needs that Tika
doesn't?

~~~
pzaich
Interesting! I started this project a long while ago and gradually added
features and formats after running into performance issues with other file
parsers (not Tika). Tika looks like a great solution if you don't mind the
Java dependency.

Here's a JRuby wrapper for Tika:
[https://github.com/ricn/rika](https://github.com/ricn/rika)

