
An even slimmer pdf.js - AndrewDucker
https://blog.mozilla.org/nnethercote/2014/06/16/an-even-slimmer-pdf-js/
======
gabemart
The reduction in memory usage seems impressive for large PDFs, but peak usage
of 695MiB to display a 14-page pdf [1] still seems like _loads_ of memory. On
my machine, a native reader (Foxit) [2] takes ~30MiB.

[1]
[http://cdn.mozilla.net/pdfjs/tracemonkey.pdf](http://cdn.mozilla.net/pdfjs/tracemonkey.pdf)

[2]
[http://www.foxitsoftware.com/Secure_PDF_Reader/](http://www.foxitsoftware.com/Secure_PDF_Reader/)

~~~
GHFigs
<fanboy>With MuPDF it peaks at 15MiB.
[http://www.mupdf.com/](http://www.mupdf.com/) </fanboy>

~~~
TheLoneWolfling
How's compatibility?

~~~
GHFigs
All I can speak to is that I've never had a problem reading. I didn't even
realize the recent versions support forms until this thread.

~~~
TheLoneWolfling
Good to know! I'll try it for a while, see how I like it.

------
annnnd
I don't want to sound negative and I certainly don't want to start a war here.
I also really appreciate what Mozilla is doing and I am an otherwise happy
user of FF on both desktop and mobile.

That said, FF memory usage (not just pdf.js) is sometimes just insane.
Granted, I only have 2GB on this Linux machine, but FF invariably eats most of
it
(with just 10-20 tabs) if left running for long periods. Interestingly enough,
once I restart FF and reopen the same tabs, memory usage is far lower. Memory
leak? Inefficient caching? Who knows...

Every now and then there is a version of FF that claims to be using less
memory, but in my experience the differences have been negligible. So as far as
I am concerned, memory-friendly FF is something like fast Java... I am still
waiting to meet one of these beasts in the wild. :)

~~~
acdha
Uninstall your extensions, particularly Ad Block Plus, and see if the problem
recurs:

[https://blog.mozilla.org/nnethercote/2014/05/14/adblock-plus...](https://blog.mozilla.org/nnethercote/2014/05/14/adblock-pluss-effect-on-firefoxs-memory-usage/)

~~~
com2kid
At which point, I have no reason to be using FF!

FF's plugin system is one of its huge advantages. With IE11 at a "good enough"
state now, if I'm not installing ABP I'm not going to bother switching away
from IE on my machines!

~~~
acdha
I'm not saying uninstall them forever, simply to do so before making claims
about Firefox's performance, stability, memory usage, etc. – basic Debugging
101: eliminate variables.

That said, extensions are at the bottom of my list of reasons to use Firefox:
support for the latest web standards, performance, security, text rendering
quality, usability of the core UI, quality of the developer tools, etc.

------
userbinator
_pdf.js uses HTML canvas elements to render each PDF page_

If it's doing this for text or vector graphics content, then I'm of the strong
opinion that they're doing it _very_ wrong - HTML already has facilities to
render text, and browsers support SVG for vector graphics. The canvas elements
should be used for bitmap images only, since that's what they were designed
for.

Edit: this thought occurred to me because the Chinese site Baidu has an online
document viewer which basically converts uploaded PDFs into HTML and does it
_without_ needing to canvas everything; here's an example:

[http://wenku.baidu.com/view/544340cea1c7aa00b52acb38.html](http://wenku.baidu.com/view/544340cea1c7aa00b52acb38.html)

In my Chrome it takes ~60M sitting on the first page, complete with all the
other pieces of the page (including ads). Scrolling through all the pages of
the document, it ends up at ~130M.

Here is the original document:

[http://www.promelec.ru/pdf/MBI5030%20Preliminary.pdf](http://www.promelec.ru/pdf/MBI5030%20Preliminary.pdf)

When it is opened in PDF.js (v1.0.277) to the first page, I see ~95M and
scrolling through all the pages makes it peak at ~270M; and this is just for
the PDF viewer only, so there is clearly much room for improvement.

~~~
hosay123
Sorry, but this is definitely a case where code speaks louder than words.
PDF.js was a miracle for its time, and the guys behind it definitely aren't
amateurs.

The PDF standard is far richer than the SVG object model, not to mention that
SVG _performance_ in just about every browser except Chrome has always sucked.

Using canvas seems a solid trade-off: pay upfront in RAM in order to render
and cache each page once and consistently, rather than jerry-rigging some
nasty translation to SVG that's painfully slow to zoom or scroll and no longer
renders consistently across browsers.

~~~
userbinator
I'm not doubting that PDF.js is an amazing feat of engineering - they have
essentially replicated an entire PDF rendering engine - but I am suggesting
that a different approach, with a different set of tradeoffs, could yield a
solution requiring far less memory while still being practical.

When considering memory, keep in mind that using too much - combined with
every other process the user could be running - means a higher chance of
needing to use the swapfile, and then any performance advantage otherwise
gained disappears rather quickly. It should be noted that browsers already
cache rendered page images (from HTML) in memory and are very good at that,
along with rendering HTML.

------
jeroen
The current release is v1.0.68; v1.0.277 is a pre-release, so this will
probably be in the next release or the one after that. (
[https://github.com/mozilla/pdf.js/releases](https://github.com/mozilla/pdf.js/releases)
)

I have recently switched my project from v0.8.1013 (from feb 2014) to v1.0.68
and rendering times have improved enormously. Good to see that there are more
improvements coming.

------
pjc50
Useful reminder: you can set "pdfjs.disabled" in about:config to turn it off
and redirect PDFs to a native app of your choice.
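
For the record, the same switch can be flipped from a user.js file in the
Firefox profile; the pref name is the one mentioned above:

```js
// In your Firefox profile's user.js (or toggle pdfjs.disabled in about:config):
user_pref("pdfjs.disabled", true);
```

Setting it back to false (or deleting the line) re-enables the built-in viewer.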

The concept of rendering PDFs in the browser isn't fundamentally bad, but
large files crashing the browser is a disaster. This happened with Adobe as a
plugin and it continues to happen.

Now if someone can provide a means of un-embedding streaming video as well,
that would be very useful.

~~~
TheLoneWolfling
> you can set "pdfjs.disabled" in about:config to turn it off

Already do that. Bonus: I'm on a laptop intermittently without internet
access, and can open up PDFs to read offline without having them be lost when
my browser crashes.

> Now if someone can provide a means of un-embedding streaming video as well,
> that would be very useful.

Really hard to do, unfortunately. For HTML5 video, maybe (and I'd love it if
someone came up with something of the sort), but for flash video the problem
is that there isn't any standard for flash video players, so you can't parse
the flash file to figure out which video to play without running it.

That being said, maybe some sort of local proxy that recognizes when a video
file is requested by flash and reports a 404 to it, with a popup "do you wish
to open this link in a local player". But that way lies madness.

~~~
ffreire
This could, perhaps, initially be implemented as a plugin that
inspected/listened for link clicks and forwarded them to your application of
choice. I believe VLC is capable of opening streams without too much hassle
(not sure about other players).

Sounds like a fun weekend project, at any rate!

~~~
TheLoneWolfling
VLC's stream handling isn't the best, though.

(VLC uses a fixed-length buffer, and requires the buffer to be full before it
starts playing. Not the best for an iffy connection.)

That being said, worth a shot! (Not to mention that a plugin could redirect to
an arbitrary application without too much hassle.)

------
hbogert
So... the author is focusing on one metric - memory usage. Basically, what's
being done is releasing cached pages sooner. But what if a user's usage
scenario is such that he repeatedly switches between two pages that have 10
pages in between? That would cause a lot more re-rendering than before, making
CPU usage go up. I'm not sure I want to trade memory for CPU when running on
battery.

~~~
nnethercote
Software is full of trade-offs. If slightly pessimizing an unusual case is the
cost of drastically improving an extremely common case, then I'm happy to do
that.

~~~
pessimizer
Switching between two non-consecutive pages in a pdf is not an unusual case.

~~~
nnethercote
Switching by scrolling, or by jumping directly to the pages using a page
selector? The latter case will be handled fine by a 10 page cache.

Anyway, I think this fear is overblown. The CPU cost of rendering a single
page isn't that high.

------
kuntau
Maybe the Chrome team should have this MemShrink meeting every once in a
while. In its current state it will eat all the memory available, be it an 8GB
or a 16GB system.

------
_quasimodo
I wonder if it is feasible to compress cached canvases, either as part of
pdf.js or natively in firefox.

------
Goopplesoft
Out of curiosity about the process: why wouldn't something like this show up
in an earlier release?

~~~
zz1
Likely because it follows the highly organized release cycle Firefox adopted a
couple of years ago. The new pdf.js is included right now in Nightly, which is
necessary in order to verify its integration in the browser and its
interaction with other components, and to ensure a long enough test phase on
rendering before it lands on the release channel. So let us do the math:

* Firefox (release channel) is now 30
* Beta is 31
* Aurora is 32
* Nightly is 33

That's why.

~~~
nnethercote
Yep. The calendar is here:
[https://wiki.mozilla.org/RapidRelease/Calendar](https://wiki.mozilla.org/RapidRelease/Calendar)

------
Pacabel
This incident should serve as a very good lesson to those in the Firefox
community who often claim that memory and performance problems don't exist,
even after numerous users report experiencing such problems.

Such memory and CPU consumption problems are often blamed on vague
"third-party extensions", even when they happen with fresh installations on systems
that have never before had Firefox installed. Or they're otherwise blamed on
the user somehow, even when the user is engaging in a perfectly reasonable
workflow.

And even if the problems don't happen on one system, they very well could be
happening on another system, as this very incident shows quite well.

This part of the article is particularly relevant: "Shortly after that, while
on a plane I tried pdf.js on my Mac laptop. I was horrified to find that
memory usage was much higher than on the Linux desktop machine on which I had
done my optimizations. The first document I measured made Firefox’s RSS
(resident set size, a.k.a. physical memory usage) peak at over 1,000 MiB. The
corresponding number on my Linux desktop had been 400 MiB!"

This matches very well with what so many Firefox users describe as happening
to them. The memory consumption ends up skyrocketing to well above what it
reasonably should be. Gigabytes of memory are unjustifiably consumed.

Perhaps now, instead of ridiculing or ignoring people who report such issues,
those in the Firefox community will do the responsible thing and take them
seriously. It should be very obvious now that memory consumption problems can
happen on one system while not happening on another.

~~~
nnethercote
Hi, Pacabel!

Can you give specific examples of this ridicule? While I don't claim that
everything is perfect, my experience is that for the past few years Mozilla
has taken Firefox performance issues extremely seriously.

For example, I started a project called MemShrink exactly three years ago to
reduce Firefox's memory consumption. In fact, I even wrote a blog post today
that discussed the major improvements from the past year, and what areas we
still fall short on:
[https://blog.mozilla.org/nnethercote/2014/06/16/memshrinks-3...](https://blog.mozilla.org/nnethercote/2014/06/16/memshrinks-3rd-birthday/).
And if you want more detail about this particular effort, you can
read the 70+ status reports I've written in those three years here:
[https://wiki.mozilla.org/Performance/MemShrink](https://wiki.mozilla.org/Performance/MemShrink).

And since you mention extensions, you could also read about how we solved the
vast majority of the memory leaks that involved extensions here:
[https://blog.mozilla.org/nnethercote/2012/07/19/firefox-15-p...](https://blog.mozilla.org/nnethercote/2012/07/19/firefox-15-plugs-the-add-on-leaks/).
That was almost two years ago.

As for pdf.js, here's the bug I filed last year about it using too much
memory:
[https://bugzilla.mozilla.org/show_bug.cgi?id=881974](https://bugzilla.mozilla.org/show_bug.cgi?id=881974).
The title of the bug is "pdf.js uses too much memory". No ridicule or ignoring
the problem there.

MemShrink is just one of numerous performance-related projects that have been
undertaken at Mozilla over the past few years. I'm sure with a little Googling
you could find out about some of the others.

