
How Scribd's HTML5 conversion works ... even when it shouldn't - matthiaskramm
http://coding.scribd.com/2010/08/26/plan-b-font-fallbacks/
======
gregstoll
My respect for how hard it is to convert from .pdf to HTML just went waaaay
up. Scribd must have really thought displaying in HTML5 was worth a lot of
trouble if they went to all this effort to be precise!

~~~
points
Whilst technically impressive, it reminds me of the 'NASA spent millions
developing a pen that writes in space while the Russians used a pencil' story.

Google's competing PDF reader just renders PDFs into images that also have
selectable text.

~~~
prs
As for the NASA story, you might find this link interesting:
<http://www.snopes.com/business/genius/spacepen.asp>

------
danilocampos
I don't have any particular use for Scribd but I love them anyway.

Any company that can kick Adobe in the nuts by publicly reconfiguring their
business to shun Flash is all right with me. And what technical prowess these
guys have. Incredibly smart people there.

~~~
count
I've put a few PDFs of presentation slides on Scribd, and then used their
widget to embed those slides directly into a blog post talking about the
presentation. Linking to a PDF on another page (and asking many people to a)
open Acrobat, b) download a multi-MB file from my host) and/or hosting that
PDF yourself is sub-optimal. Let Scribd host it (you can provide a link to
download the actual content you uploaded, rather than their web widget), and
pay for the bandwidth. I never saw the use for it, until I started giving
talks and wanting to embed them. It's YouTube for PPT :)

------
Samuel_Michon
Weirdness. They have a section titled "Detecting the font family", where
they're supposedly comparing Trebuchet and Courier, but in the example they
show Trebuchet and _Myriad_.

------
fragmede
Idle thought - would it be possible using layers to do selectable text in an
image - top layer is 100% transparent text, 2nd layer is the image. The font
face is preserved (though it still won't wrap), but the text is still
selectable for copy and paste.

~~~
matthiaskramm
Browsers are "smart": they won't let you select text that's 100% transparent.
99% transparency works... but also looks kind of weird if the texts don't
overlap perfectly.

You can roll your own text selection in JavaScript (on top of the bitmap) if
you know the glyph positions, though. That's what e.g. Google Books does. It's
a valid option if you don't care about zoomability.
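
A minimal sketch of the "roll your own selection" idea, not taken from Scribd or Google Books: it assumes each glyph comes with a bounding box in bitmap pixel coordinates (the `{ch, x, y, w, h}` record and both function names are hypothetical) and treats a mouse drag as a plain rectangle. A real reader would more likely select along the text flow, so this only shows the hit-testing step.

```javascript
// Hypothetical glyph record: { ch, x, y, w, h } in bitmap pixel coordinates.
// Returns the glyphs whose boxes overlap the rectangle dragged from
// `start` to `end` (each a { x, y } point), in their original order.
function selectGlyphs(glyphs, start, end) {
  // Normalize the drag so it works in any direction.
  const left = Math.min(start.x, end.x);
  const right = Math.max(start.x, end.x);
  const top = Math.min(start.y, end.y);
  const bottom = Math.max(start.y, end.y);

  // Standard axis-aligned rectangle overlap test.
  return glyphs.filter(
    (g) => g.x < right && g.x + g.w > left && g.y < bottom && g.y + g.h > top
  );
}

// Joins the selected glyphs into the string to put on the clipboard.
function selectedText(glyphs, start, end) {
  return selectGlyphs(glyphs, start, end)
    .map((g) => g.ch)
    .join("");
}

// Example: three 10px-wide glyphs on one line.
const glyphs = [
  { ch: "H", x: 0, y: 0, w: 10, h: 12 },
  { ch: "i", x: 10, y: 0, w: 10, h: 12 },
  { ch: "!", x: 20, y: 0, w: 10, h: 12 },
];
console.log(selectedText(glyphs, { x: 2, y: 3 }, { x: 15, y: 8 })); // prints "Hi"
```

The glyph boxes are tied to one bitmap resolution, so they would have to be rescaled whenever the page image is, which is one way to read the zoomability caveat above.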

~~~
bd
Are you sure? I just made a quick test and it works ok in FF, Opera, Chrome,
Safari, IE9 (Windows 7, Ubuntu 9.10):

    
    
    <body style="background:url(image.png)">
      <span style="color:rgba(0,0,0,0)">Hello world</span>
    </body>

------
lazyjeff
I don't find being able to view the mangled pdf on screen worth the time saved
downloading the actual pdf. With video and audio I can understand the benefit
of in-browser viewing, but why do we need this service for pdfs anyway?

~~~
d2viant
It's horribly confusing to someone who doesn't understand that it's not being
rendered by the browser itself. PDFs (especially when rendered inside the
browser) break the browsing experience.

Your browser controls stop working as expected, history gets bent, links don't
work as expected. All of a sudden you're now working within an Adobe Reader
application or Foxit Reader (albeit embedded in the browser) without even
realizing you're in an entirely different context outside of an HTML page.

~~~
ugh
Links and history work as expected (at least in the browsers I know with
native PDF support) and I’m not sure why it is bad that some users might not
understand that a PDF document is not an HTML page.

I also don’t know how Scribd helps users understand that better. Seems
horribly confusing to me if you don’t follow them closely. (Wait what? The PDF
is suddenly a webpage? But sometimes Flash? I can still download the PDF? Why
doesn’t it look exactly like the webpage? What’s going on?) It’s perfectly
usable, even without a deep grasp of the concepts, but so are PDF viewers
inside browsers.

~~~
DougBTX
The problem historically with plug-in PDF readers was that the Adobe one was
very slow. That seems to be fixed now - though I can't tell if the software is
better or I just use faster hardware.

------
yarone
Matthias At Work

