

SSD Streaming at Last.fm - russss
http://blog.last.fm/2009/12/14/launching-xbox-part-2-ssd-streaming

======
sadiq
I'm a little confused as to the actual problem being solved here.

For SSDs there is essentially no seek penalty, so random read and sequential
read performance are essentially the same. Spinning hard drives, because of
the seek penalty, have pretty poor random read performance but fairly good
sequential read.

Surely, for a streaming application, it should be sequential read that is the
key requirement?

You can predict with a very high degree of accuracy what piece of data you'll
be requiring in the near future, at which point the seek can be scheduled well
in advance.

This, of course, assumes your data is laid out in a sane fashion and you're
not blindly jumping all over the disks.

Strikes me that last.fm could potentially do with looking at approaches like
Facebook's Haystack, where they put considerable time into minimising the
number of IOPS required to pull an image off the disk.

~~~
russss
Last.fm does progressive streaming, not HTTP downloads.

i.e. it reads 16KB of one track, sends it to one user, then 16KB of the next
track, sends it to the next user, etc. Then after 1 second it goes through and
does the whole thing again.

This is a massively seek-heavy operation.
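A toy sketch (not Last.fm's actual code) of the scheduling loop described above: each pass reads one small chunk of every active listener's track, so on a spinning disk every chunk lands on a different file and costs roughly one seek. The 16KB chunk size is the figure from the comment; everything else is illustrative.

```python
import io

CHUNK = 16 * 1024  # 16KB per listener per pass, as described above

def stream_pass(streams):
    """One scheduling pass: read the next chunk of every active stream.

    Each read hits a different file, so with N listeners this is
    roughly N seeks per pass, no matter how sequentially any single
    track is laid out on disk.
    """
    seeks = 0
    for track, sink in streams:
        data = track.read(CHUNK)  # jumps to a different file every time
        if data:
            sink.write(data)
            seeks += 1  # each per-track read is a potential seek
    return seeks

# Toy demo: three concurrent listeners, each on a 4-chunk "track".
streams = [(io.BytesIO(b"x" * CHUNK * 4), io.BytesIO()) for _ in range(3)]
print(stream_pass(streams))  # prints 3: one read (one seek) per listener
```

With thousands of concurrent listeners this becomes thousands of scattered reads per second, which is exactly the workload spinning disks are worst at.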

~~~
ajross
Only when implemented naively. Why not transfer the whole 5-10MB file at once
into a buffer somewhere (on-box, in a proxy, whatever) and then stream from
there? DRAM is vastly cheaper than trying to get a seek-heavy architecture to
scale on seek-limited devices.

But then, moving to flash is sort of a complicated version of the same
solution...
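A minimal sketch of the buffering idea above, under the assumption that a whole track is pulled into memory with one sequential read and all subsequent chunks are served from DRAM (class and names are hypothetical, not from any real proxy):

```python
import io

CHUNK = 16 * 1024  # same 16KB chunk size as the streaming loop

class BufferedTrack:
    """Whole-file buffer: one sequential read (roughly one seek) up
    front, then every chunk is served from memory with no disk I/O."""

    def __init__(self, fileobj):
        self.buf = fileobj.read()  # single sequential read of the track
        self.pos = 0

    def next_chunk(self):
        chunk = self.buf[self.pos:self.pos + CHUNK]
        self.pos += len(chunk)
        return chunk

# Toy demo: a 4-chunk "track" costs one disk read, then 4 RAM reads.
track = BufferedTrack(io.BytesIO(b"x" * CHUNK * 4))
chunks = 0
while track.next_chunk():
    chunks += 1
print(chunks)  # prints 4: all chunks served from memory
```

The trade-off being argued here is where that buffer lives: DRAM on the streaming box or proxy versus flash, which holds far more per dollar but adds a new storage tier.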

~~~
russss
It's not a complicated version of the same solution - it's a cheaper version
of the same solution. As expensive as flash is, it's still cheaper than the
equivalent amount of RAM.

~~~
ajross
Cheaper to buy as a part, not cheaper to implement. Given that last.fm has a
bunch of boxes with the data already on them, replacing them all with SSD
versions sounds like a _more_ expensive solution than a bunch of buffering
proxies to me.

And in any case, I said "complicated", not "cheaper", and I'll stand by that.
Complexity has costs all by itself.

~~~
russss
Run-of-the-mill caching proxies are no good in front of a distributed
progressive streaming system such as this, because they would have to be
modified to stream progressively and also to authenticate requests, which
would increase the complexity far more than the comparatively minor changes
Last.fm had to make to MogileFS.

So I say it's cheaper and less complex.

~~~
ajross
Huh? I'm all but certain that a default squid install would work fine.
"Progressive streaming" at the server side generally doesn't require any
configuration at all in TCP -- the client reads what it wants, the buffers
fill up, and the transmission stalls on a missing ACK. And the authentication
layer is almost certainly downstream of the backend storage anyway, so I'm not
sure how that would matter.
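The flow-control point above can be seen with a toy sketch: if the receiver stops reading, the kernel's buffers fill and `send()` stops accepting data, so the server needs no pacing logic of its own. This uses a local socket pair rather than real TCP, but the kernel-buffer backpressure it demonstrates is the same mechanism.

```python
import socket

# A pair of connected sockets; the receiver (b) never reads.
a, b = socket.socketpair()
a.setblocking(False)

sent = 0
try:
    while True:
        sent += a.send(b"x" * 4096)  # fills the kernel buffers
except BlockingIOError:
    # Buffers full: transmission stalls until the peer drains them,
    # which is exactly how a fast server gets paced by a slow client.
    pass

print(sent > 0)  # prints True: some data was accepted before stalling
a.close()
b.close()
```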

Swapping hardware is a _really_ expensive change in the IT world,
significantly more so than deploying new hardware on an existing
infrastructure. That you somehow think otherwise is surprising to me.

------
pmorici
Has anyone tried this with a local caching file system like OpenAFS? I've
always wanted to, but AFS is a PITA to get set up.

------
lallysingh
This is the type of stuff I love to read about. Thanks for posting it.

