

Show HN: Docverter: convert plaintext to PDF, Docx, or ePub. Now in open beta. - zrail
http://www.docverter.com/learn.html

======
pestaa
What does this add to pandoc? In other words, why would I pay for this
service? It obviously runs in the cloud, but I haven't found much else.

In fact, my observation is that the API documentation was merely copy-pasted
from the original. Example:

Pandoc docs[1]: <http://dl.dropbox.com/u/144454/hn/from.png>

Docverter API[2]: <http://dl.dropbox.com/u/144454/hn/to.png>

[1] [http://johnmacfarlane.net/pandoc/README.html#header-
identifi...](http://johnmacfarlane.net/pandoc/README.html#header-identifiers-
in-html-latex-and-context)

[2] <http://www.docverter.com/api.html#toc_425>

However, the author went the extra mile to rename sections, making it sound
like the Markdown extensions are in fact Docverter's.

Sorry to be so negative, but this almost seems like acting in bad faith and
selling GPL-licensed software as a service under a new name.

~~~
alexchamberlain
Whilst I agree that people should be open about the software they run, let us
remember that selling access to GPL software running on your machine is
perfectly legal.

~~~
rodw
Also, for what it's worth the author was very open and explicit about the
provenance. In the footer:

"This is a copy of the Pandoc README file, modified to suit Docverter's
manifest format."

I'm confused though. In another comment [1] zrail says this (the HTML-to-PDF
in particular) is built on a Java library. Is Flying Saucer based on Pandoc?
Do you use one sometimes and the other other times?

[1] <http://news.ycombinator.com/item?id=4645497>

~~~
zrail
Docverter is a collection of a few pieces of software that get used at various
times. For example, if you do Markdown to docx you'll just be using Pandoc. If
you convert to PDF you'll be going through Flying Saucer. If you go Markdown
to PDF you'll go through both. MOBI conversions go through Pandoc to get an
ePub and then through Calibre to get the MOBI.

The point is that you don't have to worry about those pieces, though, since
Docverter abstracts over them with a simple API.
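
A minimal sketch of the routing described above, for illustration only. The
function name `plan_conversion`, the format strings, and the tool labels are
all hypothetical and are not taken from Docverter's code or API. The sketch
only records which tools a conversion would pass through and does not invoke
any of them.

```python
# Hypothetical illustration of the tool routing described in the comment above.
# Nothing here is Docverter's real code; names and labels are assumptions.

def plan_conversion(source, target):
    """Return the ordered list of tools a source -> target conversion uses."""
    if target == "mobi":
        # MOBI: Pandoc produces an ePub, then Calibre turns it into MOBI.
        return ["pandoc", "calibre"]
    if target == "pdf":
        steps = []
        if source != "html":
            # Non-HTML input (e.g. Markdown) goes through Pandoc first.
            steps.append("pandoc")
        # Every PDF conversion ends in Flying Saucer.
        steps.append("flying-saucer")
        return steps
    # Everything else (e.g. Markdown to docx) is handled by Pandoc alone.
    return ["pandoc"]


if __name__ == "__main__":
    for src, dst in [("markdown", "docx"), ("html", "pdf"),
                     ("markdown", "pdf"), ("markdown", "mobi")]:
        print(src, "->", dst, ":", " -> ".join(plan_conversion(src, dst)))
```

The caller only names a source and a target format; which tools run in between
is decided in one place, which is the abstraction the comment describes.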

------
memset
This is super cool. One thing that would be useful is an "examples" page. That
is, have some sample .txt or .html files, and show us what the output .doc,
.docx, .pdf, .epub, etc files look like. Just a list of static files, really.

You could also do something more fully-featured, like a sandbox where you can
upload your own files. Perhaps it could be an "evaluation plan" which has a
maximum of 10 conversions per month. (Then again, $5 isn't really that much to
pay to evaluate a service.) Or maybe unlimited conversions with the evaluation
plan, but the output files have a watermark?

I had no idea this was based on pandoc until I read the HN comments. So, cool!

~~~
zrail
Thank you for the kind words. There are a few examples on the API page in the
Advanced section but you're right, they should be featured more prominently.
Also, the free dev plan will give you full access but will indeed insert
watermarks.

------
alexchamberlain
I'm impressed someone has had the balls to launch a service without a free
tier. Note that there is a free developer's access plan though, so no
moaning!

~~~
endlessvoid94
Same here. It's amazing what having customers vs. users can do for a business,
right from the get-go.

~~~
hntester123
+1

------
nlh
This looks great (and is something I'll very likely use on a project I'm
working on now!)

My quick "dumb" question -- what's the pitch for using this vs. what I would
call more traditional conversion tools? My project will need some HTML -> PDF
goodness and I was planning on researching and running some sort of local /
server-side package (which I presume exists, though I haven't researched them
yet).

Either way, congrats on the launch - this makes a lot of sense and sounds like
a great utility.

~~~
zrail
Thanks! The pitch is that HTML to PDF conversion tools, as a rule, are not
very good. Docverter is actually on its third iteration of that particular
conversion because the first two didn't provide even close to satisfactory
results.

~~~
highace
That's interesting. So you've built your own HTML to PDF converter? Can you
provide an example (a screenshot, maybe) where your version excels against an
existing solution?

~~~
zrail
It's a small web service that wraps around a Java library named Flying Saucer,
so there isn't anything to look at really.

Flying Saucer excels against the alternatives I looked at in a few ways.
First, Pandoc's built-in PDF writer uses a LaTeX intermediary which doesn't
support anything that web writers have come to expect. Second, the other tools
were WebKit-based, which variously didn't support the page media spec, didn't
support embedding fonts, or both. Others were custom one-off or desktop tools
that wouldn't work the way I need for Docverter.

~~~
sciboy
Just so you are aware, Flying Saucer, while nice when you first use it, has
tonnes of bugs and isn't really being developed these days. You'll find
yourself diving into the code more and more because the output is substandard.
We used it for years and have now moved away because we couldn't stand
customising it for every little edge case any more, plus it doesn't support
HTML5/CSS3, which is essential nowadays. Take a look at the codebase - you
won't want to be adding to that! Additionally, it expects documents to be
completely in memory, which means it will take down your Java server
sometimes.

~~~
zrail
Out of curiosity, what did you move to?

~~~
sciboy
We've been using PhantomJS, but it has its issues too. I'm not sure there is
a good solution anywhere in the open source world unfortunately - and tools
like Prince are expensive.

------
zrail
Hi HN,

I posted last week about Docverter, my plain text to rich text conversion
tool. It's actually ready for people to start using now. I'll be here all day
to answer questions.

Edit to add: I added docverter PDF conversions to my blog last night and it
took all of an hour. Check out the Download PDF link toward the bottom here:
<http://bugsplat.info/2012-08-11-task-oriented-dotfiles.html>

Code is here:
[https://github.com/peterkeen/bugsplat.rb/blob/master/app.rb#...](https://github.com/peterkeen/bugsplat.rb/blob/master/app.rb#L122)

------
liamk
Looks great. If you added docx and pptx to the inputs then you could easily
compete with some big names (your prices are very competitive).

~~~
zrail
Thanks. I added docx and pptx inputs to the todo list. I'll have to look into
how to parse them into HTML.

~~~
eric_bullington
You're in for a world of pain when you start parsing docx and pptx. On the
bright side, if you can figure out a good solution, you'll likely have a solid
business model. I would imagine that there would likely be significant demand
for converting docx and pptx files into html or markdown, as a service. If you
do come up with a nice, well-documented API for all of this, I'd certainly
recommend your service. If you come up with an outstanding docx parser, then
I'd use your service myself (I am using my own somewhat primitive solution for
a current project involving the conversion of docx files).

Here are a few projects to look at, if you haven't already:

1\. <http://www.docx4java.org/trac/docx4j>

2\. <https://github.com/mikemaccana/python-docx> There are some interesting
and more active forks, but this is the original python-docx.

~~~
zrail
Thanks for the links! I'll dig into them later.

Having never looked at it myself I'm not really sure why it would be so
painful. Are the formats just super wacky?

------
pknight
Are you going to add a way to convert URLs to PDFs? It would be awesome to
create a PDF of a webpage at the push of a button.

~~~
josephlord
You might need to be careful that you don't end up liable for any copyright
infringement. I would speak to a lawyer before pulling content from a URL and
transferring it to an unrelated destination.

Maybe T&C can protect you in this scenario but maybe not.

~~~
pknight
The use case I was thinking of was using it on my own website though, for
things like printing out a nicely formatted invoice or a printable mockup of a
webpage.

Recently I designed some flexible forms meant for printing, but I used
php/html/css to generate it. I discovered that it's really hard to get a good
quality print out of a webpage. If you use screenshots you get poor
resolutions, and direct to print/pdf conversion tools didn't render the CSS
all that well.

Don't see how T&C couldn't cover such scenarios in which the user has explicit
permission to generate a copy of a webpage.

------
evolve2k
Can this service handle more complex structures like converting a table in
Textile or Markdown into a Table in docx? Can it handle word styling?

I'm building software where the output must be in docx, wondering how far I
can go in not having to deal with word automation to get the output I want
into a Word Doc.

~~~
zrail
It can definitely handle word styling if you provide it with a reference docx
to copy the styles from. I haven't tried tables. If you'd like, feel free to
create a dev account and give it a spin.

------
eblade
This is likely a service I'd use once in a while. Say I want to convert an
HTML file to MOBI once per month; the listed pricing does not fit my use case.
That means one conversion for 5 bucks.

I know it isn't easy to set up pricing, but would there be a "pay per use"
option for people like me?

------
lrem
Is there any benefit to this over simply running Pandoc?

~~~
zrail
If you're running your app on Heroku or another free PaaS, it's a pain to get
pandoc running and stay within the slug size limits. Additionally, if you want
to convert HTML to PDF and have reasonable results (i.e. CSS support of any
sort) you need to run a secondary conversion process which is also not trivial
to set up.

------
don_draper
Can you watermark and password-protect PDF files? If so, this would be a great
service for selling ebooks.

~~~
zrail
Not at the moment. I'm not really sure how to do that, but I'll look into it.

~~~
zrail
Edit: Actually it looks pretty easy to add. I'll work on that tonight and
update the @docverter twitter account when it's ready.

------
sebnukem2
I want to evaluate the service before paying anything for it.

~~~
zrail
There's a dev plan link at the bottom of the pricing page, specifically for
evaluating before you purchase. It watermarks your documents but other than
that it's the real deal.

