

A practical, scalable, distributed data store - smanek
http://www.scribd.com/vacuum?url=http://www.hpl.hp.com/techreports/2007/HPL-2007-193.pdf

======
wdr1
Is it just me or is scribd.com the worst thing to happen to the Internet since
MIME-based email?

As far as I can tell, all it does is strip me of basic functionality (e.g.,
saving the thing so I can read it offline) and introduce confusing
functionality (_TWO_ scrollbars on the left!? WTF?). And all for what? So some
engineers could do some Flash-based masturbation & feel Web 2.0?

Scribd.com developers: you are not Web 2.0. You are not 1.0. You are Web -0.5.
You are what people did with bullshit internal doc apps, _before_ HTML.

Now please stop pissing off the Internet & go under already.

~~~
bayareaguy
I suspect that Scribd may be good for precisely the kinds of proprietary
format documents you don't see as YC news items very often (Word, Excel,
PowerPoint etc).

For PDF, however, Scribd really isn't a good fit, since there are already so
many good PDF viewers out there. In particular, the PDF support my OS X laptop
comes with blows Scribd out of the water, so a Scribd link to a PDF is
effectively a downgrade for me.

Too bad there isn't some info in the HTTP protocol Scribd could use to decide
to deliver me the PDF instead. Oh wait...
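(The mechanism being alluded to is the HTTP `Accept` request header, which lets a client state which media types it prefers. A minimal sketch of the server-side decision in Python — function names are mine, not Scribd's actual code:)

```python
def quality(accept_header, media_type):
    """Return the q-value the Accept header assigns to media_type (0.0 if absent)."""
    best = 0.0
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        mtype = fields[0]
        q = 1.0  # q defaults to 1.0 when no ;q= parameter is given
        for f in fields[1:]:
            if f.startswith("q="):
                try:
                    q = float(f[2:])
                except ValueError:
                    q = 0.0
        # exact match, full wildcard, or type wildcard like "application/*"
        if mtype == media_type or mtype == "*/*" or \
           (mtype.endswith("/*") and media_type.startswith(mtype[:-1])):
            best = max(best, q)
    return best

# A client with a native PDF viewer might send:
accept = "application/pdf,text/html;q=0.9,*/*;q=0.1"
serve_raw_pdf = quality(accept, "application/pdf") > quality(accept, "text/html")
# serve_raw_pdf -> True: hand over the PDF, skip the Flash viewer
```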

------
smanek
Here's the real PDF: <http://www.hpl.hp.com/techreports/2007/HPL-2007-193.pdf>

------
thaumaturgy
I read the paper, and I must be missing something. (I'm not a computer
scientist, so there's probably something important here that's flying right
over my head.)

The paper mentions "fault tolerance" a number of times, and I'm thinking,
"fantastic! They've got some magic in there so that if a node goes down, there
are automatically some number of nodes that can instantly take over its
place." Except they don't ... their fault tolerance sounds like they just
expect to use servers with really good data recovery. The paper does mention a
primary-backup scheme where a backup server shadows each data server, but now
we're talking about an effective 2n nodes. It also doesn't sound like it would
be all that fault tolerant in the event of network problems (a broken pipe
between nodes, a denial-of-service attack, etc.), or if the root node goes
down.
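(What the paper seems to describe is classic primary-backup replication: every write goes synchronously to both copies, and the backup takes over on failure. A toy sketch under that assumption — the class and method names are mine, not the paper's:)

```python
# Toy primary-backup pair: writes are applied to the primary and
# synchronously shadowed to the backup, so either copy can serve reads.
class Replica:
    def __init__(self):
        self.data = {}
        self.alive = True

class Pair:
    def __init__(self):
        self.primary, self.backup = Replica(), Replica()

    def write(self, key, value):
        # Doubling every data server is where the "effective 2n nodes" comes from.
        for replica in (self.primary, self.backup):
            if replica.alive:
                replica.data[key] = value

    def read(self, key):
        # Failover: fall back to the backup when the primary is down.
        replica = self.primary if self.primary.alive else self.backup
        return replica.data.get(key)

pair = Pair()
pair.write("x", 1)
pair.primary.alive = False  # primary fails
# pair.read("x") still returns 1, served from the backup
```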

Since they're only storing actual data in leaf nodes, they're not
outperforming a classic balanced binary tree in search time.
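(For context: a tree that keeps records only in its leaves, B+-tree style, still descends one level per step, so lookups are O(log n) either way — the same asymptotic cost as a balanced binary search tree. A toy sketch with two-way internal nodes, which is my simplification and not the paper's actual layout:)

```python
# Toy leaf-only search tree: internal nodes hold only routing keys,
# leaves hold the actual (key, value) records.
class Internal:
    def __init__(self, key, left, right):
        # Keys strictly less than `key` route left; all others route right.
        self.key, self.left, self.right = key, left, right

class Leaf:
    def __init__(self, records):
        self.records = dict(records)  # actual data lives only here

def lookup(node, key):
    """Descend one level per step: O(height) = O(log n), same as a balanced BST."""
    while isinstance(node, Internal):
        node = node.left if key < node.key else node.right
    return node.records.get(key)

# Tiny example: two leaves under one routing node.
root = Internal(3, Leaf({1: "a", 2: "b"}), Leaf({3: "c", 4: "d"}))
# lookup(root, 2) -> "b"; lookup(root, 4) -> "d"; lookup(root, 5) -> None
```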

It does sound like they've done some good work in making sure that they keep
data integrity in the event of client-server communication problems, but
that's also been solved by a number of database and file systems already.

So ... can someone explain the novelty here?

