1. It is tied to V8 (PDFs can run JS, and this PDF viewer uses V8 to do so; see CJS_Context::RunScript etc.), so shipping it would mean bundling two JS engines, with all the security downsides that entails.
2. This is written in C++. You can sandbox C++ in various ways, but that would still increase the surface area of the browser, compared to pdf.js, which only uses things normal web content would use.
3. pdf.js is not just meant to render PDFs; it's also a useful project for pushing the web platform forward. Areas where pdf.js was slow turned out to be things that were worth optimizing anyhow. This doesn't benefit people viewing PDFs directly, of course, but it's still an interesting aspect of the project.
1. It's not tied to V8.
It uses V8 currently, but:
- The vast majority of the code has nothing to do with JS.
- Almost all of the JS-related PDF code is independent of V8.
- The use of V8 is abstracted out sufficiently via IFXJS_Runtime etc that with a bit more work, a different JS runtime could be easily integrated.
2. You note that C/C++ can be sandboxed, but then claim that it "still increases the surface area of the browser".
This is nonsense: Firefox's 11 million lines of code are also written in C/C++, and should be sandboxed. You don't increase the surface area by also sandboxing the PDF plugin -- either the sandboxing mechanism works or it doesn't.
3. If anything, pdf.js's continued poor performance and operation demonstrates why forcing everyone to operate inside of a JS runtime is incredibly harmful to the web platform's progress.
Adding more C++ to the browser certainly does increase the attack surface. Sandboxes that expose enough functionality to run a modern browser engine commonly end up with holes here and there (e.g. the Pwnium vulnerabilities), and it's best not to use them as the only layer of defense; moreover, the sandbox does not fully enforce all of the security properties the Web platform demands (e.g. cross-origin iframes).
> 3. If anything, pdf.js's continued poor performance and operation demonstrates why forcing everyone to operate inside of a JS runtime is incredibly harmful to the web platform's progress.
As I mentioned before, pdf.js is not as JS-bound as you might think. Furthermore, it was written before asm.js existed and doesn't use it; if pdf.js were JS-bound, asm.js would be a very powerful option for improving performance that would not involve dropping to unsafe native code.
Creating yet another sandbox seems silly, and NaCl hasn't been hit by Pwnium; it's only been a stepping stone to the renderer (I'll let comex dive into the details here!)
It's just an OS sandbox currently. pdfium previously worked with NaCl, with a non-V8 JS VM (work done by Bill Budge). V8-on-NaCl used to work too (I think it may have bitrotted since then); it used NaCl's dyncode-modify API to do PIC. The GC moves code as well, so extra page permissions need to be changed when that happens, but I think that's the extent of the code modification that needs to be handled for a JS JIT to work on NaCl (on top of the sandboxing).
I don't like the DRM sandbox anyhow; it's unfortunate that DRM was added to the Web at all, forcing browsers to ship a DRM module (speaking for myself, not my employer).
I understand the feeling about DRM, but given that sandboxed DRM is going to happen I'd hope that the best efforts possible are put in to make users safe. Good sandboxing seems the right way to go. I'm not any kind of a security expert, but jschuh seems to think the current sandbox isn't sufficient:
I hope the right improvements go into tightening the DRM sandbox :)
I like the concept of pdf.js, but it's still significantly slower, and thus provides a worse experience to the user, than native viewers.
It's been a significant effort on our part, and we'll be contributing it back to the PDF.js code base. Opera also has a similar coalescing effort underway by Christian Krebs.
Will V8 run inside NaCl? As I understand it, the NaCl JIT functionality is pretty slow for use cases like polymorphic inline caching.
> I like the concept of pdf.js, but it's still significantly slower, and thus provides a worse experience to the user, than native viewers.
Would that matter for PDFs? I thought JS in PDFs is mostly used for form validation, which isn't very compute-heavy.
It seems to only be an issue on really heavy PDFs, which are pretty rare.
Firefox also seems to register two separate MIME types for PDF, only giving an option to use pdf.js for one of them. I've yet to dig into Firefox and fix this.
AIUI from the NaCl guys, it already does.
In any case, I use a native viewer, which provides the best overall user experience.
In particular, I enjoy its superior font rendering (compared to the Chrome implementation). I really don't get why Chrome (on Windows, anyhow) has fairly fuzzy fonts while rendering PDFs; noticeably worse than pdf.js or Acrobat.
It's dead certain that PDF.js has plenty of room to improve but that requires solid benchmarking, not anecdata. I would hope Mozilla is collecting telemetry data about common bottlenecks from millions of users and triaging to see which problems are core or artifacts from local system configuration, graphics drivers, etc.
Frankly, it's something I'll gladly put up with in most cases just to avoid bad font rendering - and that's exactly what I do.
BTW, I'm pretty sure that this kind of stuff is pretty platform- (and GFX-driver-) dependent; e.g. FF on Mac OS performs less well, IIRC.
On the other hand, there might be an initial compilation pause when starting to load the PDF.
Maybe Opera should adopt this.
I've been working on coalescing the elements and you can see the fruits of my labour by dropping a PDF into https://web.notablepdf.com which is an annotation app based on PDF.js.
Are there statistics about what other sorts of PDFs people read? How else are PDFs made? I always assumed that people who use word processors would exchange files in their word processor's format, but maybe some people export to PDF? (I'm not familiar with the habits of word processor users.)
So far, coalescing works, but we've had to make substantial changes to a lot of places. The hard part is still ensuring it is compliant with PDF specs, which we're in the process of working through before we submit it as a patch to the PDF.js team.
In most situations I don't feel the need for a native reader anymore.
"Original code copyright 2014 Foxit Software Inc. http://www.foxitsoftware.com"
* Copyright (c) 1998-2000, Microsoft Corp. All Rights Reserved.
* Module Name:
* GDI+ Native C++ public header file
I saw no evidence in the project that this file is under a BSD-style license. And since it is part of the Windows SDK, it is very unlikely to be BSD-licensed.
Maybe it was included from Foxit's code or another codebase, but given the license issue it would be better to put it into a third-party directory.
Thanks for pointing out my misconception about Windows SDK.
Agreed, it is always nice to have these things in a third-party directory; the larger Chromium project actually does appear to have all of pdfium in third_party, which helps keep that clear.
I wish they could either make Google Code decent or simply kill it and use GitHub instead.
Is this a new implementation? Or did Foxit release it as open source?
EDIT: I see there are Foxit employees in the commit list. Well, that explains that!
Anyway this is great news. Kudos Google.
By the way, for those confused: the source is not on SVN (which Google Code fails to communicate) but at https://pdfium.googlesource.com/.
And note that this isn't the first time there's been Opera interest in pdf.js: I spent the majority of summer 2012 working on getting pdf.js running well in Presto, as was relatively well known among pdf.js contributors.
If you look at the code, it is not really well architected. Here is a file I found a problem in: http://cgit.freedesktop.org/poppler/poppler/tree/poppler/Tex... . Take a look at that file and judge for yourself whether it follows "Code Complete"-type suggestions.
One reason I looked in that file is poppler does not deal well at all with many map PDFs like http://web.mta.info/nyct/maps/busqns.pdf or some others I have on my hard drive. They take forever to load.
Some PDFs have caused the applications using poppler to crash, although some of those have been patched. It's not as bad as it used to be, but still. My patch to speed up the bus map PDFs was not accepted. Then there are features like being able to enter data into PDFs and such. Compare and contrast Adobe's official Acrobat app for Linux and a PDF reader based on poppler like evince.
So the answer is a standard one - code architecture, bugs and features. The answer would be to take the PDFs that Adobe Acrobat handles but which poppler doesn't in terms of bugs and features, and see how pdfium handles them.
Of course, it's possible pdfium will handle those but fail on an entirely different class of PDFs and their pdfium specific bugs.
The PDF standard is a fairly large one. What features does pdfium handle which poppler doesn't? What percentages of PDFs crash the viewing application, or don't render correctly compared to poppler? And so forth.
I should also add that poppler usually depends on cairo for vector graphics, so once in a while the failure for a PDF is in cairo, not poppler. I have seen some of those fixed, some not.
Anyway, this is great news for Chromium, as the PDF plugin can now be shipped to distro repos.
Nothing in the wiki either https://code.google.com/p/pdfium/w/list
edit: .... just not for the actual code.