
Apache PDFBox – A Java PDF Library - rkda
http://pdfbox.apache.org/
======
plq
I always found SVG (plus the Inkscape CLI) to be the best interface for
generating PDF documents.

I'm sure there are arguments in favour of libraries like this, such as
performance or better control, but when all you need is to generate a simple
invoice with a couple of dynamic fields, just use Inkscape to draw your
template and your favourite XML library to fill in the values from your code.
When you need the PDF, use inkscape --export-pdf to get your PDF document.
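A minimal sketch of that workflow in Python, using only the standard library's `xml.etree.ElementTree`. The element ids (such as `total`) and the file paths are hypothetical placeholders for whatever you name the objects in your Inkscape template. The `--export-pdf=` flag is the older Inkscape 0.x syntax mentioned above; Inkscape 1.x uses `--export-filename=` instead, so check your version.

```python
import subprocess
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Keep SVG as the default namespace so the output has no "ns0:" prefixes.
ET.register_namespace("", SVG_NS)


def fill_svg_template(svg_source, values: dict) -> str:
    """Replace the text of each element whose id is a key in `values`.

    `svg_source` may be str or bytes (bytes is safest for files that carry
    an XML encoding declaration).
    """
    root = ET.fromstring(svg_source)
    for element in root.iter():
        element_id = element.get("id")
        if element_id in values:
            # Inkscape usually wraps text in a <tspan>, so put the id there.
            element.text = str(values[element_id])
    return ET.tostring(root, encoding="unicode")


def inkscape_command(svg_path: str, pdf_path: str) -> list:
    # Inkscape 0.x syntax; assumption: 1.x needs --export-filename= instead.
    return ["inkscape", svg_path, f"--export-pdf={pdf_path}"]


def render_invoice(template_path: str, values: dict, svg_path: str, pdf_path: str) -> None:
    """Fill a template (hypothetical paths) and convert it with Inkscape."""
    with open(template_path, "rb") as f:
        filled = fill_svg_template(f.read(), values)
    with open(svg_path, "w", encoding="utf-8") as f:
        f.write(filled)
    # check=True raises CalledProcessError if Inkscape exits non-zero.
    subprocess.run(inkscape_command(svg_path, pdf_path), check=True)
```

Because the filled SVG is ordinary markup, the same string can be served to the browser for the preview before `render_invoice` is ever called.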

SVG also makes it quite easy to show an in-browser preview of the generated
document before performing the actual conversion, so most of the time you can
skip the costly PDF generation.

~~~
vram22
Interesting. So do you run inkscape manually from a command line or shell out
to it from a program? And I guess that would not scale well if you want to
generate many PDFs, in the sense that running inkscape each time would be a
perf hit?

~~~
tracker1
It seems that it would scale just fine with RabbitMQ and worker queues. It
may not be maximum performance per node... but that's not the same as
scalability.
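As a rough illustration of the worker-queue idea, here is a sketch that fans conversions out to a bounded pool of workers. It swaps RabbitMQ for Python's in-process `concurrent.futures.ThreadPoolExecutor`, which only scales within one machine; a real broker is what lets you add more worker nodes. `convert` is a hypothetical callable, for example one that shells out to Inkscape for a single file.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def render_all(jobs: Iterable[str], convert: Callable[[str], str], workers: int = 4) -> List[str]:
    """Run `convert` on every job using a bounded pool of worker threads.

    Threads are adequate here because each job mostly waits on an external
    process, so the GIL is not the bottleneck.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order; if a job raised, iterating
        # re-raises that exception when its result is reached.
        return list(pool.map(convert, jobs))
```

Per-job cost stays the same; throughput grows with `workers` (and, with a broker, with the number of machines), which is the distinction between per-node performance and scalability.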

~~~
vram22
I guess if you have enough hardware resources, and the delta is low, it may
not matter.

After all, we run Unix commands tens or hundreds of times a day, alone or in
pipelines, to get things done, vs. putting all the needed functionality into
one program and calling it.

Various pros and cons exist for both approaches ...

------
sigmaml
I used PDFBox a few weeks ago to dynamically annotate a set of PDFs. The
library works well, but it is severely under-documented. Also, the API is
asymmetric w.r.t. _get_ / _set_ in several places. After several hours of
frustration, I had to look up the PDF specification for the constants to set
for the default appearance, _etc_., and then set them directly in the
underlying _COSObject_ dictionary.
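For reference, the PDF specification defines a field's default appearance (/DA) as a small content-stream fragment: a font resource name, a size and the `Tf` operator, followed by a colour operator (`g` for grey, `rg` for RGB); a size of 0 means auto-size. The sketch below only builds that string; the font name `Helv` is an assumption and must exist in the form's /DR font resources. Storing it under the field's /DA key is library-specific; in PDFBox it goes into the field's underlying COS dictionary, as described above.

```python
from typing import Optional, Tuple


def default_appearance(font: str, size: float,
                       rgb: Optional[Tuple[float, float, float]] = None,
                       gray: float = 0.0) -> str:
    """Build a /DA string such as '/Helv 12 Tf 0 g'.

    `font` is the resource name without the leading slash; size 0 asks the
    viewer to auto-size the text. Colour components range from 0 to 1.
    """
    if not font or font.startswith("/"):
        raise ValueError("pass the font resource name without a leading '/'")
    components = rgb if rgb is not None else (gray,)
    if any(not 0.0 <= c <= 1.0 for c in components):
        raise ValueError("colour components must be between 0 and 1")
    colour_op = "rg" if rgb is not None else "g"
    colour = " ".join(f"{c:g}" for c in components)
    return f"/{font} {size:g} Tf {colour} {colour_op}"
```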

Despite a few problems, the library is good. I will certainly use it again.

Edit: I used 2.0.0-RC1.

------
ilamparithi
I used to use a Play framework module to create simple PDFs. You just do your
formatting in HTML and render it as a PDF:
[https://www.playframework.com/modules/pdf](https://www.playframework.com/modules/pdf)
(only for the 1.x series). There seems to be a version for the 2.x series, but I
have never used it
([https://github.com/innoveit/play2-pdf](https://github.com/innoveit/play2-pdf)).

~~~
hatchoo
I do the same but using the flying-saucer library. Just pass the HTML text
(with CSS) and it gives back a PDF. For a while, the main limitation was
getting the header row of multipage tables to appear on every page. Once they
implemented support for CSS 2.1, it worked like a charm. Headers and
footers (with page numbers) work too.

~~~
joshmarinacci
What? People still use Flying Saucer?!

------
mtrycz
I see that they have released 2.0 lately, but I can't find any dates or a list
of improvements over 1.x. Is this the news? A press release would probably be
better. It's not a [Show HN]; it's an established Apache project, so I don't
get what the news is here.

~~~
alricb
Their "Migration Guide" [1] contains a list of improvements. They should
probably add a date to their pages, but judging from their mailing list, the
release is at most a few days old.

[1]:
[http://pdfbox.apache.org/2.0/migration.html](http://pdfbox.apache.org/2.0/migration.html)

~~~
laumars
The download pages do have dates, albeit you have to hunt to find them:

[http://mirror.vorboss.net/apache/pdfbox/](http://mirror.vorboss.net/apache/pdfbox/)

      [DIR] 1.8.11/                 2016-01-17 21:55    -
      [DIR] 2.0.0/                  2016-03-18 12:02    -

------
tmd83
Does anyone have a solution for making PDFs from HTML that's hosted? The Java
library flyingsaucer is stuck at CSS 2.1 and not well maintained. I have been
meaning to check out wkhtmltopdf, which requires launching a process but might
be the best bet for such work. Has anyone had better experience doing
something similar?

~~~
aviraldg
wkhtmltopdf works quite well, I've used it before. A bit of a hassle to get it
compiling though.

~~~
rpedela
I have also had great success with wkhtmltopdf. There are pre-built binaries
available too:
[http://wkhtmltopdf.org/downloads.html](http://wkhtmltopdf.org/downloads.html)
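Launching wkhtmltopdf as an external process, as suggested above, can be sketched with Python's `subprocess` and a temporary directory. The basic `wkhtmltopdf <input> <output>` invocation is the tool's standard usage; the `--page-size` value and the timeout are assumptions to adjust for your setup. A missing binary surfaces as `FileNotFoundError`.

```python
import os
import subprocess
import tempfile


def wkhtmltopdf_command(html_path: str, pdf_path: str, page_size: str = "A4") -> list:
    return ["wkhtmltopdf", "--page-size", page_size, html_path, pdf_path]


def html_to_pdf(html: str, page_size: str = "A4", timeout: float = 60.0) -> bytes:
    """Render an HTML string to PDF bytes by shelling out to wkhtmltopdf."""
    with tempfile.TemporaryDirectory() as tmp:
        html_path = os.path.join(tmp, "input.html")
        pdf_path = os.path.join(tmp, "output.pdf")
        with open(html_path, "w", encoding="utf-8") as f:
            f.write(html)
        # check=True raises CalledProcessError on a non-zero exit status;
        # timeout kills a hung conversion and raises TimeoutExpired.
        subprocess.run(
            wkhtmltopdf_command(html_path, pdf_path, page_size),
            check=True,
            timeout=timeout,
            capture_output=True,
        )
        with open(pdf_path, "rb") as f:
            return f.read()
```

Keeping each conversion in its own temporary directory means several calls can run concurrently without clobbering each other's files.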

------
andrewstuart2
I'm surprised more people aren't simply trying to use TeX or LaTeX to generate
dynamic PDFs. Most of the libraries I've looked at historically were comically
incapable, limited to basics like "place a single line of text at these
coordinates," with no provisions for line wrapping, paragraphs, tables, etc.

LaTeX (or any TeX variety), on the other hand, gives you so many different
powerful features to generate any arbitrary document, and not only in PDF.
There are, of course, a few associated risks that would need to be mitigated,
as anywhere user input gets compiled in some Turing-complete system, but
I'd imagine it would be fairly straightforward to make sure input gets
properly escaped so an attacker couldn't run arbitrary code.
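A minimal sketch of the escaping step in Python. It maps each of LaTeX's ten special characters to a literal form in a single pass, so inserted replacements are never escaped twice. Escaping alone is not a complete defence; compiling with shell escape disabled (for example `pdflatex -no-shell-escape`) is a sensible second layer. The template and the `@@NAME@@` placeholder are hypothetical.

```python
# LaTeX's special characters and replacements that typeset them literally.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def latex_escape(text: str) -> str:
    """Escape user input so LaTeX prints it instead of interpreting it."""
    # Translate character by character; chained str.replace() calls would
    # re-escape the braces inserted for the backslash.
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)


# Hypothetical template with a placeholder token (avoids %- or {}-formatting,
# both of which clash with LaTeX syntax).
TEMPLATE = r"""\documentclass{article}
\begin{document}
Dear @@NAME@@,
\end{document}
"""


def render_letter(name: str) -> str:
    return TEMPLATE.replace("@@NAME@@", latex_escape(name))
```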

~~~
topogios
PDFBox is quite good at extracting text from PDFs. When I use PDFBox, it is
for extraction.

Maybe this has changed with newer versions of PDFBox, but 5+ years ago, the
internet wisdom was to use PDFBox for extraction and something else, like a
version of iText that suited your license needs, for generation.

As much as I like LaTeX, if you have made no prior time investment in
typesetting with it, it is not trivial to produce custom good-looking output
with it.

Have you tried using (La)TeX in a real-world project? It would be cool to hear
from someone about whether compilation time is an issue. Some TeX packages have
a quite severe impact on performance.

------
rodionos
It would be nice to see a feature diff with iText.

~~~
sgt
iText can be somewhat frustrating to work with, so it would be great if this
is a viable alternative.

~~~
ternaryoperator
Really? I've used iText for years. It is extensively documented both online
and in the 600-page book from Manning, and the guys who wrote the library
answer questions on SO, most of them in less than 24 hours. In
addition, iText has both a high-level API and a low-level interface, so
anything you can do in a PDF you can do with iText. I'd be hard-pressed to
think what could be frustrating about it. That doesn't match my experience at
all.

------
t_tsonev
And if you're looking for commercial software that produces PDFs right in the
browser, see [http://demos.telerik.com/kendo-ui/pdf-export/page-layout](http://demos.telerik.com/kendo-ui/pdf-export/page-layout)

Disclaimer: I'm part of the team. Also see [http://docs.telerik.com/kendo-ui/framework/drawing/drawing-d...](http://docs.telerik.com/kendo-ui/framework/drawing/drawing-dom#known-limitations)

------
donretag
Hopefully PDFBox is more stable than Apache Tika. Tika has become bloated with
all the different formats that it supports.

The latest version of the Tika parsers artifact has 44 compile-time dependencies:
[http://mvnrepository.com/artifact/org.apache.tika/tika-parse...](http://mvnrepository.com/artifact/org.apache.tika/tika-parsers/1.12)

~~~
sigmaml
The scope of Tika is different and wide. For the PDF subset, Tika uses PDFBox
to parse and extract data from the document.

~~~
donretag
I have never noticed that. Perhaps it is because I never used Tika directly,
but as a dependency from other projects such as Nutch.

I realize the scope is different, which is why I mentioned that Tika supports
more formats.

------
jimjimjim
Just in case there are any posts about the complexity of using any PDF
library, you need to understand that the PDF spec itself is an almost
unimaginable monster of complexity and frustration.

------
techaddict009
Can we use PDFBox to convert Microsoft formats like .doc, .docx, .ppt, .pptx,
etc. to PDF?

I am struggling to find a good solution for this.

Their website doesn't have a proper answer to this.

~~~
voltagex_
It doesn't seem to have that feature, no.

Windows 10 and OS X have built in PDF printers now.

~~~
PinguTS
Is the Windows 10 PDF printer better than the MS Office integration?

I ask because PDFs from the MS Office PDF integration are shit when I open
them in Preview on Mac OS X and then want to print them.

------
cdnsteve
Any feedback on how this compares to Reportlab in Python or Prawn in Ruby?

~~~
alricb
PDFBox can read and render (to an image) pdfs in addition to generating them.
I don't know how feature-complete it is.

------
murkle
Can it be compiled with GWT?

~~~
Freak_NL
PDFBox does not seem to depend on anything but the JDK (1.6) and
commons-logging, so it might work if you manage to find a working and matching
version of commons-logging for GWT, and if PDFBox only uses the JDK parts
supported by GWT. That is a pretty big if, though. I doubt anyone has bothered
to port this to GWT, but who knows, you may be in luck.

If you are trying to generate PDFs in a web-application, I would delegate this
task to the back-end.

