
Ask HN: What's the best document parsing tool/SDK that you've heard of? - voiceclonr
I am looking to parse various documents (docx, ppt, pdf, pst, etc.) and extract metadata, text, etc. for search. I'm looking into Apache Tika, but my gut tells me a native Windows tool may be better long term. Can anyone recommend tools/SDKs they've used or heard to be successful?
======
mindcrime
Tika is what we use. It's not perfect, but it works pretty well for our
purposes.

