
Ask HN: Service to convert between different document formats - casvc
Hey HN guys and girls,

We are working on a SaaS API to convert between different document formats (e.g. Excel/CAD/Email/whatever to PDF/TXT). I am wondering: which conversion pairs (source format / destination format) are of interest to you, and why?

Thank you!
======
jay888
How is your service going to be different from existing file conversion
services like zamzar.com, convertfiles.com, etc.?

~~~
casvc
Thanks for asking. The main difference is a focus on depth instead of breadth:
instead of a multitude of possible output formats, we support only a few
(PDF/HTML/TXT/IMG), but with some added features. Just a few examples:

- bulk search and autoredaction (marking / blacking out parts of documents
  that match certain queries)
- signature and handwriting detection
- tokenization (for TXT output)
- language detection (for TXT/PDF output)
- named entity detection (for TXT/PDF output)
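
To make the autoredaction idea concrete, here is a rough sketch of what
query-based redaction could look like on TXT output. This is only an
illustration, not our actual API: the function name `redact`, its parameters,
and the choice of mask character are all assumptions made up for this example,
and it only does plain case-insensitive substring matching.

```python
import re


def redact(text, queries, mask="█"):
    """Black out every case-insensitive match of any query in plain text.

    Hypothetical helper for illustration only; not part of any real service.
    """
    # Ignore empty queries, which would otherwise match at every position.
    queries = [q for q in queries if q]
    if not queries:
        return text
    # Try longer queries first so overlapping alternatives prefer the longer match.
    ordered = sorted(queries, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(q) for q in ordered), re.IGNORECASE)
    # Replace each match with a run of mask characters of the same length,
    # so the layout of the surrounding text is preserved.
    return pattern.sub(lambda m: mask * len(m.group(0)), text)


if __name__ == "__main__":
    sample = "Contact Jane Doe at jane@example.com"
    print(redact(sample, ["jane doe", "jane@example.com"]))
```

Redacting PDF or image output is a different problem (the matched regions
have to be located and covered on the page), which this sketch does not
attempt.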

Potential customers are people developing systems for GDPR (data protection),
fraud detection, eDiscovery and content management.

~~~
PaulHoule
If you are doing some kind of intense annotation, probably the most important
thing is having an output format that supports the annotation you want to do,
not necessarily supporting just any format.

I have been thinking about universal annotation and the formats that I find
the most interesting are PDF (because so much content exists in PDF) and HTML
(open, easy to work with.)

~~~
casvc
You are absolutely right - we are thinking along the same lines. The only
reason why we are offering TXT/IMG as output formats next to PDF/HTML is the
fact that some people will have their own composite document formats and they
can build those out of TXT/IMG.

------
brudgers
I am curious who the customers for this service are and what document formats
they deal with.

