
Scribd CTO: “We Are Scrapping Flash And Betting The Company On HTML5” - jasonlbaptiste
http://techcrunch.com/2010/05/05/scribd-html5/
======
johnyzee
I've worked for a big Scribd competitor (bigger than Scribd by some measures)
and I am blown away that they would move in this direction. Those saying that
they are not taking a great risk are missing the point in a serious way. In
this business, the viewer _is_ the company. It is your whole competitive
advantage. Taking apart a PDF is nothing - putting it back together in Flash
with the right appearance, cropping, individual character metainfo,
hyperlinks, caching, progressive loading, etc., etc., is science. To do that
you need to be very long on Flash, and they just threw away all that inventory
and expertise.

Which is, of course, a killer move if they can actually pull it off as they
will be the first and only company to do it (again, comparing with Google's
Javascript PDF viewer misses the point, as that 'just' renders images). I am
just dumbstruck to see a company do such a 180 degree turn and they deserve
all the credit in the world for having the guts.

~~~
briansmith
Won't everybody using Flash be able to do the same thing with Flash CS5's
<canvas> + FXG support?

Isn't the point of this move to support the iPad, the iPhone, and other non-
Flash environments?

------
jacoblyles
Great. Now I might not cringe every time I see a Scribd link.

A flash PDF reader is worse than useless. I can't use flash on my iPhone, and
the Scribd reader used to break back when I used an Ubuntu machine. And it
certainly never ran great on my Mac. When you consider that I already had
perfectly fine, and free!, PDF readers on all three devices, Scribd was
actively harmful.

But an HTML 5 PDF reader could actually be useful! It may be more lightweight
than downloading a whole PDF. It will certainly be no worse than what I can
currently use to render and read a PDF.

This is good. Scribd may change their image as a widely hated startup.

~~~
jacoblyles
I get it, we'll ignore the fact that Scribd makes the web a worse place when
the founders are in the thread.

~~~
BrandonM
Well that's a pretty ridiculous thing to say. How can a website make the web a
worse place? A website either provides something you can't get anywhere else,
which is intrinsically valuable, or you can just ignore the site.

~~~
jacoblyles
>How can a website make the web a worse place?

It's easy. Someone sends you a Scribd link. Scribd is written in Flash so
maybe you can't read the damn thing. Before Scribd, people would just send you
PDF links. Everybody can read PDFs.

Scribd harms the web by reducing the number of people who can view its
content. That's what I call "providing negative value".

------
zweben
I'll be interested to see how they will deal with the licensing issues
regarding font embedding. Right now, there aren't a whole lot of non-free
fonts that allow @font-face embedding, because embedding a font with
@font-face makes the font downloadable to anyone with a little digging through
CSS files for its URL.

I hope a solution is found for this that doesn't require specific allowance of
CSS embedding in the license. If it does, some fonts will never be legal to
embed.

~~~
petercooper
The fonts are what interest me too. Not just the licensing, though (that never
stopped YouTube with its videos - at least not for a few years!) but with
having to convert a myriad of fonts embedded in documents into SVG fonts
(otherwise they won't work on the iPad)... that's no mean feat.

------
wmeredith
Thank god. I like their product, but it's such a resource hog. Here's hoping
that HTML5 Scribd will be a little more lightweight.

------
thiele
"Betting the company"? Sounds a little dramatic. It's HTML5 not Silverlight.
How much risk is there really?

~~~
FlorinAndrei
My thoughts exactly. HTML5 is going to be (or is already) everywhere, so
where's the big risk?

~~~
matthiaskramm
Investment risk. Scribd is a small company, and putting a whole battalion of
developers on a PDF to HTML5 conversion for half a year without even knowing
whether that'll turn out to be possible (as in, good enough for 10,000,000
documents) is a scary move. That being said, check out our upcoming
engineering blog for technical details about how we convert to HTML5 now!

~~~
axod
"putting a whole battalion of developers on a PDF to HTML5 conversion for half
a year without even knowing whether that'll turn out to be possible"

The Google pdf viewer seems good enough to me. It'll be interesting to see
what you can improve on that.

~~~
bobbyi
Actual pdf readers like Preview seem good enough to me. I don't understand why
I would need to take a PDF and view it as HTML.

I do see the value for formats like .pptx where I may not have a reader, but
PDF is already a "lowest common denominator" like HTML.

~~~
sesqu
I've occasionally been thankful for Google's PDF conversion. PDF isn't a
lowest common denominator - there are computers without that capability
(kiosks and terminals, for example). What's more, PDF viewers can be pretty
heavy, especially for fancy documents, and their browser integration tends to
suck even when it exists.

Sure, I like reading my PDFs with evince, but sometimes HTML is just
preferable.

------
qeorge
How big of a role did SEO play in this decision?

~~~
snowmaker
Good question, but none, really. It won't affect our SEO - we did this to
improve the user experience on the site.

~~~
tamersalama
Actually - I heard it did

~~~
jacquesm
You work for Scribd too?

~~~
benologist
Do you really have to work for scribd to see the benefit of millions of pages
of text?

Edit: it turns out they already present the text at least some of the time,
but markup and text portability still adds some value (see my comment above).

------
thacker
Just in case anyone's wondering -- it's not just converting each page to an
image. It's all HTML5 text, graphics, and images where appropriate.

~~~
wmf
Is there anything specific to HTML _5_ there?

~~~
thacker
The new viewer doesn't use the full spectrum of HTML 5 features, to maintain
compatibility with older browsers, but it would not be possible before HTML 5.

~~~
zmmmmm
Can anybody name a single feature exclusive to HTML5 that it is actually
using? TFA says:

"Friedman estimates that 97 percent of browsers will be able to read Scribd’s
HTML5 documents"

That pretty much counts IE6 into the picture, so I'm really wondering exactly
what "HTML5" features IE6 supports!

~~~
bphogan
Using the HTML5 doctype lets you use HTML5 tags and custom data attributes and
have a valid document. New HTML5 form fields, custom attributes, and markup
elements are usable in IE6 mainly because it just doesn't really bother to
explode when it encounters them. Form fields just show up as text boxes,
custom data attributes are only used in JS anyway, and new structural elements
are usable and styleable in IE6 just by adding JS that does a
document.createElement().

HTML5 isn't something that just came around. It's been in the works by browser
makers for quite a while, which is refreshing. Rather than it being a spec
made up in a purely academic environment (XHTML 2), it's something that's made
up of technologies that have already been used by one or more browser makers
(and often, developers on real sites.)

Also, using the HTML5 doctype in IE6 causes IE6 to go into standards mode,
which is just pure luck.

You can do a lot of good for users if you start using some of the HTML5
features right now, even if it's not apparent. If you use type="email" for
your form fields when you ask for an email, the iPod and iPad will bring up
the Email keyboard layout. That alone is kinda cool.
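
For anyone who hasn't seen the document.createElement() trick mentioned above, here is a minimal sketch of it. The function name and the element list are made up for illustration, not taken from any particular library. The idea is simply that old IE treats a tag as known (and therefore styleable) once it has been created at least once from script.

```javascript
// Hypothetical list of new structural elements to register; adjust to taste.
var HTML5_ELEMENTS = ["article", "aside", "footer", "header", "nav", "section"];

// Calls createElement once per tag name so that old IE recognizes the tag.
// `doc` is passed in (rather than using the global) so the helper is easy to test.
function registerHtml5Elements(doc, names) {
  var registered = [];
  for (var i = 0; i < names.length; i++) {
    // The created element is thrown away; the call itself is what matters.
    doc.createElement(names[i]);
    registered.push(names[i]);
  }
  return registered;
}

// In a browser, run this from <head>, before the body is parsed.
if (typeof document !== "undefined") {
  registerHtml5Elements(document, HTML5_ELEMENTS);
}
```

You still need a CSS rule giving these elements display: block, since browsers that don't know them treat them as inline.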

------
whalesalad
No offense to Scribd, but what is wrong with viewing a document in your
browser using something like Safari? I realize that not all machines are
capable of this, but they could pretty easily. Someone asked the other day,
"Why doesn't Windows have a native PDF reader?". Surely it's possible for all
browsers to quickly and properly render a PDF, with easy controls to navigate.

I understand the added benefits of being able to comment, discuss, share,
etc. your documents with Scribd, but honestly why the need for HTML5 or
anything at all? PDFs are viewable just fine in a simple PDF viewer.

~~~
netcan
Well, they don't. Not currently anyway.

------
jasonlbaptiste
this is great news. flash was just really overbearing and felt heavy. I'm sure
it was also a bitch to deal with on the back end. Can't wait to play around
with it.

ps- i win newsyc bingo: YC company, techcrunch article, HTML5.

~~~
jorgeortiz85
Bzzzt! You're missing: Apple, Facebook. Try again.

~~~
whatusername
He got Adobe Flash and HTML 5. That's tangential enough to almost include
Apple.

Needs:
* Facebook integration (or even better for HN cred - removing facebook connect)
* "Now works on iPhone/iPad"

~~~
jasonlbaptiste
Find out how a 30 year old lean startup, Apple, made billions by opposing
facebook privacy issues and furthering HTML5 on the iPad by acquiring Kiko (YC
S05). (techcrunch.com)

~~~
whatusername
Almost perfect. I just realized we missed Erlang.

~~~
krishna2
Doesn't Scribd use Erlang?

------
jacquesm
If this doesn't rely on HTML5 features, why call it HTML5?

DHTML or javascript would have been good enough then, wouldn't it?

~~~
daleharvey
if they are using html5 features and degrading gracefully, as most html5 does
by default anyway, why not say you are using html5?

for one it might help push a few people into using more html5 capable
browsers.

~~~
jacquesm
They _only_ use features of HTML5 that are also supported by older browsers:

From the article:

"Friedman estimates that 97 percent of browsers will be able to read Scribd’s
HTML5 documents because those parts of the standard are older and more widely
adopted."

I don't read that as 'graceful degradation', but as a subset based on older
tech.

~~~
tyler
We create basically the same experience across browsers, but use the latest
tech that the browser supports. For instance, things will render much faster
in Chrome than in IE6.

However, the documents will basically look the same across all of our
supported browsers.

------
MartinCron
I don't normally like the "betting the company" metaphor as I feel it's
overused and over-dramatic. There's also a sense of motivating staff with
fear, in that if you don't work hard enough or if you make any mistakes, we'll
lose this bet and the company will fail and everyone here will be out of work.
No pressure, or anything, though :)

When I met Microsoft's Chief Software Architect at PDC last year, he kindly
thanked me for "betting on Azure" and I thought to myself "I'm just
experimenting with a new technology that may make my life easier, I try my
best not to ever _bet_ on anything". Not wanting to be a smart-ass jerk, I
just talked about lolcats and 4chan instead.

Reading about what Scribd is doing, it seems that the metaphor is pretty
appropriate.

------
codexon
I hope they succeed and HTML5 replaces Flash, but this seems like a very risky
move.

~~~
joubert
Google already does a great job of showing PDF files using HTML and
JavaScript. Don't think it is a huge risk.

Plus the upside is their stuff will then work on mobile devices too.

~~~
thacker
Google rasterizes the PDF and streams it to you as an image. Scribd will be
converting documents to HTML and CSS while maintaining a near perfect
facsimile of the original document.

~~~
axod
Google does a lot more than that. For example copy and paste works if you
select some text and copy it out. That's non trivial.

~~~
jamesjyu
That is true, their conversion understands text regions and various other
things. However, what makes Scribd's viewer more sophisticated is that it will
actually use _structured_ HTML to render the document content. This is more
than just putting on a layer that specifies regions in the document, it will
actually just be a normal HTML document, made of divs, text, images, etc.

Plus, it will maintain the fidelity of the document -- meaning that even PDFs
with complicated layouts will be rendered properly in HTML. No trivial task.
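
The thread doesn't say how Scribd's converter actually works, so the following is only an illustrative sketch of what "a normal HTML document, made of divs, text, images" could look like. It assumes a hypothetical input shape (a page with a width, a height, and a list of text runs carrying x, y, size, and text) and emits one absolutely positioned div per run. All names here are invented for the example.

```javascript
// Escape text so that characters from the source document cannot break the markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical page shape: { width, height, runs: [{ x, y, size, text }] }.
// Returns an HTML string with one positioned div per text run.
function renderPage(page) {
  var parts = [];
  parts.push('<div class="page" style="position:relative;width:' +
    page.width + 'px;height:' + page.height + 'px">');
  for (var i = 0; i < page.runs.length; i++) {
    var run = page.runs[i];
    parts.push('<div style="position:absolute;left:' + run.x + 'px;top:' +
      run.y + 'px;font-size:' + run.size + 'px">' + escapeHtml(run.text) + '</div>');
  }
  parts.push('</div>');
  return parts.join("\n");
}
```

Because the text ends up as real text nodes rather than pixels, selection, copy and paste, and search come from the browser for free. Getting fonts and spacing to match the original is the hard part, and this sketch does not attempt it.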

~~~
axod
What will be the main advantage?

It's a great technical challenge, but will users notice the difference?

~~~
hazzen
Users will be able to easily view PDFs on the web from any device. I primarily
read HN on my phone. Any time I see a [Scribd] link, I forget about the story
because I can't read it. I have also had several occasions where I _needed_ to
read a PDF on the go. I had to email it to a friend and then call them,
dictating the pertinent bits over the phone.

Once this is live, Scribd will gain at least myself as a user, and I suspect
many more.

(Nesting is too deep to reply to axod, so: the fact that Google is doing this
should be reason enough. Scribd exists as a place to publish material. That
material should be reachable by as many users as they can manage.)

~~~
axod
http://docs.google.com/gview?url=http://infolab.stanford.edu/pub/papers/google.pdf

Works fine for me :/

------
TrevorBurnham
If they really can display PDFs accurately using HTML5, that's great, and
plenty of other companies (including CrocoDoc) would benefit from following
suit. But PDF still does a lot of things that HTML doesn't, like first-class
font embedding (@font-face won't cut it). I'm not sure whether this is the
right time to make the switch, though me and my iPad wish them luck.

------
benologist
This is a good move. HTML is a better and more natural feeling format for long
documents than Flash (and PDF readers) ever were.

------
vineet
If they are moving from flash to html5 then how are they drawing?

You can get a lot of drawing flexibility if you use Canvas, but performance of
any Canvas-bridge for IE makes it not worth using.

~~~
qhoxie
I might be missing your point, but Scribd does not do any drawing. Any non-
text elements are represented as images.

~~~
vineet
Hmmm.... ok, that makes sense. Thanks!

------
wallflower
Seems like a route to a possible acquisition by Adobe or Google. The technical
constraints of HTML5 make it interesting.

------
jarek
"Now any document can become a Web page."

Now if only I didn't have to log in to read it.

~~~
hackworth
you have to log in to read documents on Scribd? i've never had to.

~~~
jarek
Was it to view them as the source PDF rather than their Flash approximations,
then? I can't remember, but they /did/ get on my nerves more than once
requiring an account to get at the hosted content.

~~~
s3graham
Yes, it was for the pdf. Clutter the web with SE spam, and then make you log
in to get the real document. Gee, thanks.

------
thewileyone
I don't get it. It's a shift of technology from one platform to the next.
Content, such as ads, is going to be the same. So one gee-whiz thing to the
next is just going to result in the next generation gee-whiz ad-blocker.

------
mrvir
HTML5 really makes sense for the online viewing. Guess PDF will stay as
popular offline reading format. At least until we get some universal cross-
platform/cross-browser webarchive format. Could be an idea for a start-up ...

------
cageface
I'm not happy with the way Apple's been throwing its weight around on this
issue but all the same I'd much rather see this kind of thing implemented in
HTML than in Flash.

------
jackfoxy
Ironically the page is causing me a stack overflow in IE8.

------
greyfade
Cool. Maybe now I'll actually be able to log in and view a document?

Because I never have been able to before.

------
dmoney
It looks like it's still flash. Why announce this before it's done?

~~~
gojomo
To get at least two waves of attention: (1) announcement; (2) delivery.

------
jaekwon
Nice. Thank you, it's about time.

