
PDF Reader in JavaScript - swah
https://github.com/andreasgal/pdf.js
======
rdamico
At Crocodoc we tested out a similar approach using the Canvas element to
render pages, but ultimately opted to go the @font-face route for a number of
reasons, including:

\- Near-instant client-side rendering (albeit a one-time ~5 second wait during
server-side processing)

\- Native font rendering (Canvas rasterizes all text and doesn't benefit from
technologies like ClearType)

\- Native text selection (which is important for overall UX and our annotation
tools such as highlighting)

\- Better performance on mobile devices (this is an ongoing project)

Here's the test file micheljansen links to (thanks!) compared with the same
PDF in Crocodoc:

\- Canvas rendering: <http://bit.ly/kU0mlW>

\- Crocodoc rendering: <http://bit.ly/je2wcv>

Edit: By the way, we're really impressed by this canvas implementation :-)

~~~
swah
I had never heard of crocodoc and was very impressed with this rendering. What
browsers do you guys support?

~~~
peterlai
IE7 - IE9, Firefox, Chrome, and Safari

~~~
swah
So why hasn't Google bought them?

------
micheljansen
I couldn't find a demo online and was too lazy to deal with Chrome Local
Policy madness, so I temporarily put one up here:
<http://pumpkin.micheljansen.org/~dawuss/pdf.js/test.html>

It has some obvious flaws, but it already works surprisingly well!

~~~
knowtheory
Christ on a stick. I submitted this _and_ an example 3 hrs ago. sigh. oh well.

<http://news.ycombinator.com/item?id=2657412>

~~~
joshuacc
You may wish to edit your comment to be less offensive to Christians and other
religious readers. We may not be particularly vocal around here, but we do
exist.

~~~
knowtheory
No, i don't think i'll edit it (nor do i wish to), but i will apologize for
offense i have caused, as that was not my intent.

Edit: I just want to be clear, you see things like "goddamnit" and other ppl
using the words "christ" or "god" (in what others may see as taking them in
vain) all over the place including on HN. I don't see what the big deal is
there either. So really, i do mean it when i say i intended no offense.

~~~
joshuacc
Thank you for the apology.

The distinction I would make between your comment and "goddamnit" is that
while I don't particularly like either one, "goddamnit" doesn't actively mock
God/people's religious beliefs about God. On the other hand "Christ on a
stick" certainly seems like going out of one's way to mock the
crucifixion/people with religious attachment to the crucifixion.

Does that help explain where I'm coming from?

~~~
GetOberIt
Go blog about your persecution?

~~~
swah
A throwaway account for _this_ comment?

------
tiddchristopher
Typographic ligatures (character combinations such as ff and fi) are being
displayed as gray rectangles in Firefox 4 on Windows 7.

~~~
lloeki
Same with Fx4 on OS X. Chrome 12 skips ligatures entirely (so _difficult_
reads _difcult_ ), while Safari 5 just renders a big boxed nothing in lieu of
the document. There also seems to be some trouble with curly braces around the
emails in the header.
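
To illustrate the Chrome symptom, here is a rough Python sketch. It assumes the glyph in question corresponds to the Unicode "fi" ligature (U+FB01), which may not be how this particular PDF encodes it; `drop_ligatures` and `expand_ligatures` are hypothetical helper names, not part of pdf.js.

```python
import unicodedata

# Unicode "Alphabetic Presentation Forms" Latin ligatures: U+FB00 (ff) .. U+FB06 (st).
LIGATURE_FIRST = 0xFB00
LIGATURE_LAST = 0xFB06


def drop_ligatures(text):
    """Mimic a renderer that silently skips ligature code points."""
    return "".join(
        ch for ch in text if not LIGATURE_FIRST <= ord(ch) <= LIGATURE_LAST
    )


def expand_ligatures(text):
    """Expand ligatures to plain letters via NFKC normalization.

    Note: NFKC also rewrites other compatibility characters, so this is
    only a sketch, not a targeted ligature fix.
    """
    return unicodedata.normalize("NFKC", text)


word = "dif\ufb01cult"           # "dif" + fi ligature + "cult"
skipped = drop_ligatures(word)    # what a renderer that skips ligatures shows
expanded = expand_ligatures(word)  # the intended word
```

Dropping the ligature gives the _difcult_ reading described above, while expanding it recovers _difficult_.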

------
Gerdus
Sweet. Scribd has an HTML5 document viewer so a PDF viewer is certainly
doable. Maybe this will be included someday as a minimal PDF viewer in
Firefox.

Now all we need is a proper JavaScript printing API and almost all business
applications can be done in the browser.

~~~
windsurfer
But this is pure client-side. Scribd does server-side generation of the HTML5
page.

~~~
skimbrel
Yeah. If this gets more robust and productized, it just might eat Scribd's
lunch.

~~~
webXL
And Adobe's

~~~
dualogy
...not sure. They're not exactly charging for Adobe _Reader_. If end users or
site owners switch to a less bloated viewer, they still keep using Adobe's PDF
format.

------
bnewbold
I've been having very good results with the Inkscape SVG renderer (inkscape -z
input.pdf -l output.svg). Not pixel-perfect, but handles images and vector
diagrams pretty well, and text is selectable/searchable. This would be a
server-side solution, not client-side.
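
For batch use on a server, the command above could be wrapped in a small script. A minimal sketch in Python, assuming an `inkscape` binary on the PATH that accepts `-z` and `-l` exactly as quoted (other Inkscape versions may want different options); `build_inkscape_command` and `convert_pdf_to_svg` are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def build_inkscape_command(pdf_path, svg_path):
    """Return the argument list for the command quoted in the comment."""
    # -z and -l are copied verbatim from the comment; they are not
    # checked against any particular Inkscape version here.
    return ["inkscape", "-z", str(pdf_path), "-l", str(svg_path)]


def convert_pdf_to_svg(pdf_path, svg_path=None):
    """Convert one PDF to SVG; raises CalledProcessError if Inkscape fails."""
    pdf_path = Path(pdf_path)
    if svg_path is None:
        # Default: same name next to the input, with an .svg extension.
        svg_path = pdf_path.with_suffix(".svg")
    subprocess.run(build_inkscape_command(pdf_path, svg_path), check=True)
    return Path(svg_path)
```

Calling `convert_pdf_to_svg("input.pdf")` would run the same command as above and write `input.svg` next to the source file.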

------
mbrubeck
Here's a blog post by the authors (members of Mozilla's graphics and
JavaScript teams) explaining the motivation and direction of the project:

<http://andreasgal.com/2011/06/15/pdf-js/>

------
ChrisArchitect
nice _technology demo_ \-- I like using google quick view for pdf
reading...this is a nice alternative... dreaming up uses for it - immediate
display and download of pdf receipts that sort of thing?

------
bennytheshap
I think it's not just the Chrome local policy that's getting in the way, but
that the code does an XHR request and I don't think any browser passes that
through straight to the filesystem.

~~~
pcwalton
Firefox does, if the calling web page is itself a local file and the XHR
requests a file in the same directory as or in a subdirectory of that page.
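
The rule as described can be sketched as a path check. This is only an illustration in Python of the "same directory or a subdirectory" condition, not Firefox's actual implementation; `local_xhr_allowed` and the example paths are made up.

```python
import posixpath
from pathlib import PurePosixPath


def local_xhr_allowed(page_path, target_path):
    """True if target_path lies in the page's directory or below it.

    Both arguments are absolute POSIX-style file paths. ".." segments are
    collapsed first so they cannot be used to escape the page's directory.
    """
    page_dir = PurePosixPath(posixpath.normpath(page_path)).parent
    target = PurePosixPath(posixpath.normpath(target_path))
    # The target qualifies when the page's directory is one of its ancestors.
    return page_dir in target.parents


page = "/home/me/pdf.js/test.html"
same_dir = local_xhr_allowed(page, "/home/me/pdf.js/doc.pdf")
sub_dir = local_xhr_allowed(page, "/home/me/pdf.js/files/doc.pdf")
parent_dir = local_xhr_allowed(page, "/home/me/doc.pdf")
```

Under this sketch the first two requests would be allowed and the third, which reaches into the parent directory, would not.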

------
tomp
Example rendered document: <http://devongovett.github.com/pdf.js/test.html>

------
neovive
Very interesting. Is there a performance benefit to using "const" instead of
"var" to declare variables in JS?

------
singingfish
Does this library provide the potential to extract the text out of the PDF for
further processing?

------
albb0920
Can't find a link to a working demo, and I'm just too lazy to git clone.

~~~
devongovett
<http://devongovett.github.com/pdf.js/test.html>

------
ChrisArchitect
speaking of google ..... so much for Chrome's built in pdf viewer..heh.

~~~
Lennie
Feels a bit like déjà vu from Google Native Client, as it has been shown that
many, many things are already fast in JavaScript.

~~~
starwed
The author points out in a blog post that this, being pure .js, doesn't need
any sandboxing.

Also, it'll actually be used by FF for rendering PDFs in the future,
apparently.

------
johnx123
Correction: It's not "reader", but "writer" (rendering)

There are other libs out there:

\- WPS: PostScript for the Web <http://logand.com/sw/wps/index.html>

\- jspdf <http://code.google.com/p/jspdf/>

~~~
georgemcbay
It is a "reader" from the context of the user -- it is software the user uses
to read a PDF. In this context a "writer" would be software you use to author
a new PDF file. I think it is safe to say this is the more widely accepted
terminology on this sort of thing considering Adobe's own PDF viewer is of
course called "Adobe Reader".

------
swah
I also hoped this submission could trigger a discussion about the code style.

This guy is a Mozilla contributor, probably what other posts mean when they
talk about "great programmers", right? Yet, for now, his implementation is a
single 3000 line file.

[edited for alternate ending]

So there is this guy who writes a PDF viewer in a 3000 line file, and the guy
who writes another simple web app neatly organized in 42 files... Which one
would you want on your team?

I wonder how jslinux source code is organized.

~~~
drivebyacct2
What are you asking me to agree about? I would never get caught dead working
on a 3000 line file, and honestly, that seems like a terrible way to have to
add new features.

~~~
barrkel
Some perspective:

    $ wc -l expr.c decl.c
    26638 expr.c
    34860 decl.c

~~~
kingkilr
I'm assuming those are from GCC?

~~~
barrkel
No, the Delphi compiler.

~~~
kingkilr
Ah, this morning someone told me that if I wanted to get serious about my
constant folding I should be aware the GCC impl was XXXX LOC (in the same
neighborhood as what you posted) so I assumed :)

