
PDFium: Chrome’s PDF rendering engine is now open-source - andybons
https://code.google.com/p/pdfium/
======
atesti
I found it interesting that it seems to use Antigrain by Maxim Shemanarev in
[https://pdfium.googlesource.com/pdfium/+/master/core/src/fxg...](https://pdfium.googlesource.com/pdfium/+/master/core/src/fxge/agg/agg23/agg_array.h)
(Chrome uses Skia). Unfortunately the author of Antigrain died:
[http://beta.slashdot.org/submission/3154635/rip-maxim-sheman...](http://beta.slashdot.org/submission/3154635/rip-maxim-shemanarev)
It's nice to see the fascinating Antigrain code being used for PDF viewing
every day!

~~~
dman
Did not know about this. As an agg user this made me sad.

------
reedlaw
This is good news because it renders PDFs a lot faster and better than pdf.js
that Firefox uses. Also, I would have to install this binary blob to get
Chromium to render PDFs. It seems Chromium could easily adopt this, but I'm
not sure about Firefox.

~~~
azakai
Well, it wouldn't make sense for Firefox to adopt this:

1\. It is tied to v8 (PDFs can run JS, this PDF viewer uses v8 to do so - see
CJS_Context::RunScript etc), so it would mean bundling 2 JS engines, with all
the security downsides of that.

2\. This is written in C++. You can sandbox C++ in various ways, but that
would still increase the surface area of the browser, compared to pdf.js which
only uses things normal web content would use.

3\. pdf.js is not just meant to render pdfs, it's also a useful project to
push forward the web platform. Areas where pdf.js was slow turned out to be
things that were worth optimizing anyhow. This doesn't benefit people viewing
pdfs directly, of course, but it's still an interesting aspect of the project.

~~~
comex
Well, Chrome could make this viewer not increase the surface area of the
browser just by changing it to a NaCl plugin. After all, it already exposes
the ability to run native code inside NaCl to the web. ;p

I like the concept of pdf.js, but it's still significantly slower, and thus
provides a worse experience to the user, than native viewers.

~~~
zobzu
I don't know, on recent computers, more often than not I don't really see a diff
between pdf.js and others as a user.

It seems to only be an issue on really heavy PDFs, which are pretty rare.

~~~
Jyaif
Think mobile.

~~~
Ygg2
On mobile (Android) both Fx and Chrome start downloading pdfs.

------
yincrash
It's really interesting to see that there are foxit employees on the list of
committers. I assume that means that it was initially a fork of the foxit PDF
reader?

~~~
barbs
Foxit isn't open-source, so I don't think it would be legal to release a
derivative under the BSD license (IANAL).

~~~
awda
It's perfectly legal for Foxit to license their own software to Google to be
released under the terms of the BSD license. This would be with Foxit's
consent, of course.

~~~
tfinniga
Right, and probably some money changing hands.

------
ksec
Why did they close down something like Google Reader and not Google Code
Hosting? Does anyone actually use it?

I wish they could either make Google Code decent or simply kill it and use
GitHub instead.

Is this a new implementation? Or did Foxit release it as open source?

~~~
iamsalman
It shows that the original code was Foxit's so would that mean that Foxit
released the code or did Google purchase code rights and then decided to open
source? Anyways, why would they host it on Google code?

~~~
ksec
That is the point. If Foxit opened it up then they deserve some credit.
Otherwise the web is going to publish this as another Google PR without
acknowledging their work.

------
scrollaway
I thought Chromium's PDF engine was based on Foxit; did they change that?

EDIT: I see there are Foxit employees in the commit list. Well, that explains
that!

Anyway this is great news. Kudos Google.

By the way, for those confused: the source is not on svn, which Google Code
fails to communicate, but on
[https://pdfium.googlesource.com/](https://pdfium.googlesource.com/).

------
zx2c4
This is great because it is now the best open-source PDF rendering library.
GhostScript, Poppler, XPdf, pdf.js -- they all sort of work alright, but are
pathetic compared to FoxIt, on which this source code is based. What we now
have with this source is a high-performance, highly compliant, clean codebase of
C++ PDF rendering code. Excellent news. Expect lots of future PDF innovations
to be based on this.

~~~
TillE
I've had by far the best performance with Sumatra, which is open source but
unfortunately Windows-only. I try Chrome's PDF reader once every few months,
hoping it'll improve, but I always disable it when I see it's still
disappointingly sluggish.

~~~
gsnedders
Sumatra just uses muPDF, FWIW.

~~~
johnx123-up
Not really true.
[https://news.ycombinator.com/item?id=7788074](https://news.ycombinator.com/item?id=7788074)

------
ferongr
It's interesting that this came, seemingly out of the blue, a little after it
was made widely known (from the mozhacks article [1]) that Opera developers
were working towards integrating pdf.js into their Chromium fork.

[1] [https://hacks.mozilla.org/2014/05/how-fast-is-pdf-js/](https://hacks.mozilla.org/2014/05/how-fast-is-pdf-js/)

~~~
gsnedders
And Opera announced their move to the Chromium Content API and WebKit around a
fortnight before Blink was announced.

And note that isn't the first time there's been Opera interest in pdf.js — I
spent the majority of summer 2012 working on trying to get pdf.js running well
in Presto, as was relatively well known among pdf.js contributors.

------
async5
It looks like they did it in a hurry (on the next day) just after
[https://hacks.mozilla.org/2014/05/how-fast-is-pdf-js/](https://hacks.mozilla.org/2014/05/how-fast-is-pdf-js/) was published?
Competition FTW!

------
e98cuenc
In case the authors are lurking here, what are the main differences between
this and poppler?

~~~
Ologn
While the main poppler developers, who IIRC are three guys from Spain, have
made a heroic effort, poppler is really not that good. Poppler was created by
ripping out code from Xpdf and making it into a library.

If you look at the code, it is not really well architected. Here is a file I
found a problem in -
[http://cgit.freedesktop.org/poppler/poppler/tree/poppler/Tex...](http://cgit.freedesktop.org/poppler/poppler/tree/poppler/TextOutputDev.cc)
. Take a look at that file and judge for yourself if it follows "Code
Complete" type suggestions.

One reason I looked in that file is poppler does not deal well at all with
many map PDFs like
[http://web.mta.info/nyct/maps/busqns.pdf](http://web.mta.info/nyct/maps/busqns.pdf)
or some others I have on my hard drive. They take forever to load.

Some PDFs have caused the applications using poppler to crash, although some
of those have been patched. It's not as bad as it used to be, but still. My
patch to speed up the bus map PDFs was not accepted. Then there are features
like being able to enter data into PDFs and such. Compare and contrast Adobe's
official Acrobat app for Linux and a PDF reader based on poppler like evince.

So the answer is a standard one - code architecture, bugs and features. The
answer would be to take the PDFs that Adobe Acrobat handles but which poppler
doesn't in terms of bugs and features, and see how pdfium handles them.

Of course, it's possible pdfium will handle those but fail on an entirely
different class of PDFs and their pdfium specific bugs.

The PDF standard is a fairly large one. What features does pdfium handle which
poppler doesn't? What percentages of PDFs crash the viewing application, or
don't render correctly compared to poppler? And so forth.

I should also add that poppler usually depends on cairo for vector graphics.
So once in a while the failure on a PDF is in cairo, not poppler. I have seen
some of those fixed, some not.

~~~
frik
I compared the speed (plain text extraction) of xpdf, poppler and mupdf on
100k PDFs. mupdf is in 95% cases the fastest, then comes xpdf and then poppler
(the latter two crashed on a few files). SumatraPDF viewer went from poppler-
only to two engines (poppler & mupdf) to mupdf+patches. At the moment from my
experience SumatraPDF has the fastest and most reliable PDF engine, that is
open source. So it will be interesting how this Chrome PDF open source engine
(based on Foxit?) performs outside of Chrome as standalone library/commandline
tool.
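
A comparison like this can be reproduced with a small timing harness. The sketch below is not the script used for the numbers above; it only shows one plausible shape for it. Every name in it (`ToolResult`, `time_command`, `compare`) is made up for illustration, and the two command lines at the bottom are assumptions about how the locally installed tools are invoked. A run counts as a failure if the tool exits non-zero, times out, is killed by a signal, or is not installed.

```python
import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class ToolResult:
    fastest: int = 0            # files on which this tool had the lowest wall time
    failures: int = 0           # non-zero exit, timeout, crash, or missing binary
    total_seconds: float = 0.0  # wall time summed over successful runs


def time_command(argv, timeout=60.0):
    """Run one command; return elapsed seconds, or None if it failed."""
    start = time.perf_counter()
    try:
        proc = subprocess.run(argv, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        return None
    elapsed = time.perf_counter() - start
    # A crash shows up as a negative (signal) or other non-zero return code.
    return elapsed if proc.returncode == 0 else None


def compare(tools, pdf_paths, timeout=60.0):
    """tools maps a label to a function that builds argv from a PDF path."""
    results = {name: ToolResult() for name in tools}
    for path in pdf_paths:
        timings = {}
        for name, build_argv in tools.items():
            elapsed = time_command(build_argv(path), timeout)
            if elapsed is None:
                results[name].failures += 1
            else:
                timings[name] = elapsed
                results[name].total_seconds += elapsed
        if timings:
            # Credit the tool with the lowest wall time on this file.
            winner = min(timings, key=timings.get)
            results[winner].fastest += 1
    return results


if __name__ == "__main__":
    # Assumed command lines; adjust them to whatever is installed locally.
    tools = {
        "pdftotext": lambda p: ["pdftotext", p, "-"],
        "mutool": lambda p: ["mutool", "draw", "-F", "txt", p],
    }
    for name, result in compare(tools, sys.argv[1:]).items():
        print(name, result)
```

Pass the PDF paths as command-line arguments. Wall time per process includes start-up cost, so on very small files the numbers say more about the binary's launch time than about its parser.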

------
wooptoo
I believe this is the source of the PPAPI plugin, and not something built into
Chrome.

Anyway, this is great news for Chromium, as the PDF plugin can now be shipped
to distro repos.

~~~
dchest
Chrome's PDF Viewer _is_ a plugin. You can see it by opening
chrome://plugins/.

------
nppc
For someone using PDF.js (which works great both on Chrome & Firefox) for my
company's enterprise app - does this matter much?

~~~
emn13
Unlikely, given that this is native code, and pdf.js is web-app friendly js.

------
steipete
Sadly no tests, no documentation (except the documentation from FoxIt). Not
even source code documentation.

------
MrBuddyCasino
So it seems the SDK has the basic plumbing to make a commandline tool out of
it:
[https://pdfium.googlesource.com/pdfium/+/master/fpdfsdk/src/...](https://pdfium.googlesource.com/pdfium/+/master/fpdfsdk/src/fpdfview.cpp)

Anyone interested?

------
gkya
Just glanced at the source code, and, isn't it bad to “#include
"../../../sth.h"”? Wouldn't it be better to set the include path while
compiling and just “#include "sth.h"”?

~~~
frabcus
Not really, having an explicit path seems more useful and clearer to me.
Easier to read the code and find the file. Things like "gf" in vim will
definitely work on it to open it up too.

------
gue5t
In typical corporate code-dump style, no README and no clear instructions on
how to build or what form the output takes. I installed gyp to try it and get
a variety of errors depending on what I try (the furthest I got was complaints
about v8.gyp being missing; does this have to build within the Chromium source
tree?). Does any Google insider want to explain their internal build practices
so a mere mortal can try to compile this code?

~~~
async5
Per
[https://code.google.com/p/pdfium/issues/detail?id=1](https://code.google.com/p/pdfium/issues/detail?id=1)
: "Looks like the standalone build system is not yet present."

------
runn1ng
So, apparently _somebody_ still uses Google Code.

edit: .... just not for the actual code.

------
ahmett
Seriously, releasing on Google Code Project Hosting instead of GitHub? Even
CodePlex is better than that.

------
rushi_agrawal
Isn't it annoying to see code.google.com as the medium of sharing code? I've
got so used to Github that google code seems like an old 20th century thing..

