
Ask HN: Any Need for Office to PDF Conversion API? - jcnnghm
For a project I was working on, I needed the ability to convert Microsoft Office files (Word, Excel) into PDF files so that I could work with them more easily. I created a REST API that makes this conversion simple: POST the Office file, and once the conversion is complete, receive a POST with the converted PDF. I was thinking that this service could be useful for other people's projects. Before I go through the effort of creating a website and documentation, does anyone else have a need for this sort of service?
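
As a rough illustration of the flow described above (POST the file, later receive a POST containing the PDF), a client could look something like the sketch below. The endpoint URL, the `X-Callback-Url` header, and both function names are hypothetical; nothing here is taken from the actual API.

```python
import urllib.request

# Hypothetical endpoint; the post does not give the real URL.
API_URL = "https://api.example.com/convert"


def build_conversion_request(office_bytes, callback_url, api_url=API_URL):
    """Build (but do not send) the POST that uploads the Office file.

    The callback URL is passed in an assumed X-Callback-Url header so the
    service knows where to POST the finished PDF.
    """
    return urllib.request.Request(
        api_url,
        data=office_bytes,
        method="POST",
        headers={
            "Content-Type": "application/octet-stream",
            "X-Callback-Url": callback_url,
        },
    )


def handle_callback(pdf_bytes, out_path):
    """What the callback endpoint might do with the POSTed body."""
    # Every PDF file starts with the bytes "%PDF-".
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("callback body does not look like a PDF")
    with open(out_path, "wb") as f:
        f.write(pdf_bytes)
    return out_path
```

Sending the request would be a call to `urllib.request.urlopen(req)`; the callback side would sit behind whatever web framework the receiving app already uses.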
======
pytxab
Before you build it, you might be interested to know that Scribd (and Docstoc,
and others) have long had APIs that offer this as a free service. We have
seen pretty good usage at Scribd, but never enough to think that we could
charge for it.

On the other hand, APIs for document-sharing sites are more complex and offer
a lot of other functionality - this may scare off users who just want the
transformation. I could see a market opportunity for a Twilio-style API that
was very targeted at this functionality. I still don't think you'd make a
great deal of money charging for it, but I could see it getting some use.

~~~
jcnnghm
I had no idea that Scribd could do this. I think I'd rather just use the
Scribd API than do it myself, and I can probably use some of the other
functionality as well. Thanks for the heads up.

------
vijaydev
Have a look at JODConverter -
<http://www.artofsolving.com/opensource/jodconverter>

I have used it in the past from my Java applications in the same way you
intend to. Here is the list of supported conversion formats:
<http://www.artofsolving.com/opensource/jodconverter/guide/supportedformats>

~~~
pytxab
JODConverter is definitely a reasonable solution for most applications. You
can also write a Python script that leverages OpenOffice's Python UNO API to
do the same thing - a basic script that turns Office files into PDFs is only
~20 lines of Python.
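
A minimal sketch of that kind of script, assuming an OpenOffice instance is already running and listening on localhost port 2002 (for example started with `soffice --headless --accept="socket,host=localhost,port=2002;urp;"`; older builds take the same options with a single dash). The helper names are made up for illustration, and the `uno` module comes from the Python bundled with OpenOffice rather than from PyPI.

```python
from pathlib import Path

# Connection string for an office instance listening on port 2002 (assumed).
UNO_URL = "uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext"

# PDF export filter names differ per document type.
PDF_FILTERS = {
    ".doc": "writer_pdf_Export", ".docx": "writer_pdf_Export",
    ".xls": "calc_pdf_Export", ".xlsx": "calc_pdf_Export",
    ".ppt": "impress_pdf_Export", ".pptx": "impress_pdf_Export",
}


def file_url(path):
    """UNO wants file:// URLs rather than plain paths."""
    return Path(path).resolve().as_uri()


def pdf_filter_for(path):
    """Pick the export filter from the extension, defaulting to Writer."""
    return PDF_FILTERS.get(Path(path).suffix.lower(), "writer_pdf_Export")


def convert_to_pdf(src, dst):
    # Imported lazily so the helpers above work without OpenOffice installed.
    import uno
    from com.sun.star.beans import PropertyValue

    def prop(name, value):
        p = PropertyValue()
        p.Name, p.Value = name, value
        return p

    local_ctx = uno.getComponentContext()
    resolver = local_ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", local_ctx)
    ctx = resolver.resolve(UNO_URL)
    desktop = ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.frame.Desktop", ctx)
    # Load the document hidden, export it through the PDF filter, then close.
    doc = desktop.loadComponentFromURL(
        file_url(src), "_blank", 0, (prop("Hidden", True),))
    try:
        doc.storeToURL(file_url(dst), (prop("FilterName", pdf_filter_for(src)),))
    finally:
        doc.close(True)
```

Usage would be something like `convert_to_pdf("report.docx", "report.pdf")` with the office process already up.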

For really serious applications, though, you will eventually run into the limits
of OpenOffice's support for MS Office formats. If you are very picky (or have
a very picky client) about the quality of the conversions, you'll end up
having to use Microsoft Office to do the conversions. This is a lot less
pleasant to set up, so I wouldn't do it unless you have to.

------
braindead_in
I was looking for one and couldn't find one that was simple enough, so I ended
up installing unoconv on my server. It does the job.

------
jonafato
I've seen this done (and used it on a few occasions) as a standalone service for
end users. Usually this is ad-supported. I can think of a few applications I
might use this for, such as accepting school assignments without having to
trust all of the students to submit proper file types. Outside of advertising,
how might you monetize such a service? Or would it be a public service?

~~~
jcnnghm
I was thinking of a Twilio-style model. You sign up, and the account gets
credited with a few dollars for testing. Then there would be a small fee per
file; I was thinking $0.05 to $0.10 per file, or a penny a page.
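
To put numbers on how those two options compare, here is a quick back-of-the-envelope sketch. Only the prices come from the figures above; the function names are made up for illustration.

```python
from decimal import Decimal

PER_PAGE = Decimal("0.01")  # "a penny a page"


def per_page_cost(pages):
    """Cost of one file under per-page pricing."""
    return PER_PAGE * pages


def breakeven_pages(per_file_fee):
    """Page count at which per-page pricing equals the flat per-file fee."""
    return int(per_file_fee / PER_PAGE)


low = breakeven_pages(Decimal("0.05"))   # flat fee at the low end
high = breakeven_pages(Decimal("0.10"))  # flat fee at the high end
```

So files shorter than 5 pages are cheaper per page than under either flat fee, and files longer than 10 pages are cheaper under a flat fee.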

Haven't really gotten that far with it.

Edit: I should have also mentioned that I don't think this could be
ad-supported, because the end user will likely never see anything but the output
file, as it's an API and not a user-facing service.

------
guiseppecalzone
It depends on how much it costs. I needed this for a while. My startup sends
faxes from the internet (<http://www.hellofax.com>). We need to convert files
from 40-some different file types to PDF before we send them to the fax server.
There were other converters out there, but they were limited in the number of
file types that they convert. So, we are now paying $30+ a month for a
Windows server at Rackspace to do it. So, if you can convert a ton of
different file types, we would have definitely been a customer.

------
smokey_the_bear
I used to work on Google's applicant tracking system. We used some software
that did that to convert all the resumes. It didn't work very well for
formatting.

~~~
jcnnghm
Interesting. I was able to solve the quality issues that plague many of the
other solutions for this problem. Do they still use something like that?

~~~
smokey_the_bear
I don't know, I worked on that in 2005 and have since left Google. But I'd
imagine a lot of large companies do something similar with resumes.

~~~
carterschonwald
When I was doing the application in late fall, early winter, it was Acrobat PDF
based, which sadly is broken on Macs with a case-sensitive file system.

~~~
smokey_the_bear
It looks like they've built an online application system since I was on that
project. 6 years ago, you just emailed your resume to an email address that
mapped to a job requisition, and we tried to process whatever attachments came
in that way.

------
quizbiz
I know I am dealing with a client that wants to turn web pages into good
looking PDFs ready to print and ready for review by a boss.

~~~
pytxab
The best way by far is wkhtmltopdf, because it uses WebKit to render the page.
Most other open source projects use a toy HTML renderer, which is not going to
work for in-the-wild webpages.
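
For reference, a minimal sketch of driving it from Python, assuming the `wkhtmltopdf` binary is installed and on the PATH. The wrapper function names are made up for illustration; `--page-size` and `--print-media-type` are standard wkhtmltopdf options.

```python
import shutil
import subprocess


def wkhtmltopdf_command(url, out_pdf, page_size="A4"):
    """Build the argument list: options first, then input URL, then output file."""
    return [
        "wkhtmltopdf",
        "--page-size", page_size,
        "--print-media-type",  # apply the page's print stylesheet, if it has one
        url,
        out_pdf,
    ]


def render(url, out_pdf):
    """Render a web page to PDF, failing loudly if the tool is missing."""
    if shutil.which("wkhtmltopdf") is None:
        raise RuntimeError("wkhtmltopdf not found on PATH")
    subprocess.run(wkhtmltopdf_command(url, out_pdf), check=True)
```

Something like `render("http://example.com/report", "report.pdf")` would then produce a print-ready file.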

~~~
yesimahuman
Thanks for that link. Right now I'm doing something similar using a modified
version of webkit2png.py that outputs into PDF instead. I'll have to check
this out.

------
freikwcs
There has to be some need for this in the legal industry. Good luck!

------
rnugent
We take PDFs and convert them to XML, CSV or put them in a DB and provide an
access API. Can anyone think of a use case for this type of system?

------
whatwhatwhat
This idea as a web app is probably MORE viable if you could add in some other
conversion services... but it is all kind of muddy waters.

------
anigbrowl
I don't get it. Why not just 'print' to pdf with Adobe's driver?

~~~
jcnnghm
Because you can't do that programmatically from a system that isn't running
Windows, like a Linux web server processing files for display.

~~~
lsb
Sure you can! Webkit html to pdf: <http://code.google.com/p/wkhtmltopdf/>

~~~
Ralith
Since when does "HTML" include "Office?"

------
underdesign
It's been done. I use <http://www.fastpdf.com/>

If you monetize it and make it easy to submit jobs, you have a business model.

