

Ask HN: APIs to reliably convert any document to HTML? - grease

Hi HN,

I've been racking my brain over this and haven't found a reliable way to programmatically convert documents (doc, docx, pdf, etc.) to HTML. The only option seems to be running OpenOffice as a server, but it keeps crashing (at least once a day). I'd like something that can process thousands of documents per day without crashing. Has anyone here faced this problem, or does anyone know a solution?

[PS: In case you're wondering why, we run a web app for recruiting (recruiterbox.com) that needs to convert resumes to HTML.]
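One common way to contain crashes like the ones described above is to drop the long-lived office server and instead start a fresh headless converter for each document, with a timeout and a few retries. This is an editorial sketch, not something proposed in the thread. It assumes a LibreOffice install whose `soffice` binary is on PATH and supports `--headless --convert-to html --outdir`; the function names `build_command` and `convert_to_html` are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumption: LibreOffice is installed and its binary is on PATH as "soffice".
SOFFICE = "soffice"


def build_command(src: str, outdir: str) -> list[str]:
    """Build a one-shot headless conversion command for a single document."""
    return [SOFFICE, "--headless", "--convert-to", "html", "--outdir", outdir, src]


def convert_to_html(src, outdir, attempts=3, timeout=60, run=subprocess.run):
    """Convert one document to HTML, retrying if the converter fails or hangs.

    Each attempt is a fresh process, so a crash only affects this document.
    Returns the path of the produced HTML file, or None if every attempt failed.
    The `run` parameter exists so the process launcher can be swapped out in tests.
    """
    expected = Path(outdir) / (Path(src).stem + ".html")
    for _ in range(attempts):
        try:
            result = run(build_command(src, outdir), timeout=timeout, capture_output=True)
        except subprocess.TimeoutExpired:
            # subprocess.run kills the hung child before raising; try again.
            continue
        # Require both a clean exit and an output file before declaring success.
        if result.returncode == 0 and expected.exists():
            return expected
    return None
```

Because each call is its own process, a crash or hang costs one attempt on one document instead of taking the whole service down. The trade-off is paying LibreOffice's startup cost on every file.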
======
OneWhoFrogs
I've never used it, but the Google Docs API seems to fit your requirements:

<http://code.google.com/apis/documents/>

It accepts doc, docx, and pdf, and it can export to HTML. I'm not sure what
the API rate limit is, though. The FAQs suggest that it can be raised with
a premier account.

~~~
grease
Looks like every document that needs to be converted has to be stored in Google
Docs ... I could not find anything about rate limits ... One mischievous
way to use this would be to upload a doc to Google Docs, export it in the
format I want, delete the original doc, and repeat with the next one. I think they'd
ban me for abuse pretty quickly, though, considering the volume I expect to churn through.

~~~
callumjones
You also need to request that it be converted to Google Docs format; otherwise
you can only retrieve it in its original format.

------
dinedal
Document conversion is a tricky space for a startup. All the rules are defined
by companies who would very much like to see you fail, and code-wise it's
almost the most boring task I can think of.

