
Distributed file storage: MogileFS - bootload
http://www.uncov.com/2007/4/4/distributed-file-storage-with-mogilefs
======
staunch
Another example of Brad Fitzpatrick showing what a hacker armed with Perl (and
occasionally C) can do. He creates tools over weekends that make expensive
equipment obsolete.

Perlbal = Who needs a $200k "intelligent" hardware load balancer?

MogileFS = RAID my bank account? No thanks, I'll use commodity disks on the
cheap.

Memcached = A hundred DB servers? How about we just utilize the extra memory
on the network and cache a lot, thanks.
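
The "cache a lot" idea above is usually done as a cache-aside lookup: check the cache first, and only hit the database on a miss. Below is a minimal sketch of that pattern. Everything in it is an assumption for illustration: `TinyCache` is an in-process stand-in rather than a real memcached client, and `slow_db_lookup`, `get_user`, and the 60-second TTL are made-up names and values.

```python
import time


class TinyCache:
    """In-process stand-in for a memcached client (hypothetical, illustration only)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        # Treat an expired entry as a miss and drop it.
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else time.monotonic() + ttl
        self._store[key] = (value, expires_at)


# Counts how often the (pretend) database is actually queried.
stats = {"db_queries": 0}


def slow_db_lookup(user_id):
    """Hypothetical expensive query; a real app would talk to its DB here."""
    stats["db_queries"] += 1
    return {"id": user_id, "name": f"user{user_id}"}


cache = TinyCache()


def get_user(user_id):
    """Cache-aside read: try the cache, fall back to the DB, then populate the cache."""
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is None:
        user = slow_db_lookup(user_id)
        cache.set(key, user, ttl=60)  # assumed TTL, tune per data type
    return user
```

Calling `get_user(1)` repeatedly within the TTL should only increment `stats["db_queries"]` once; the remaining reads are served from memory.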

~~~
bootload
Gotta agree. Best thing MT & 6Apart did was hiring this guy. I can't help
thinking that if some of these tools were intelligently applied at Twitter (or
MySpace), they might be doing better than '11,000 s' ~
<http://news.ycombinator.com/comments?id=12441>

The reason I added this is that I was looking for topics on scaling. Google
(lots of reads) and LiveJournal (lots of writes & reads) must be doing
something right in scaling, so they are worth a look. One point Paul Buchheit
[0] made at Startup School 2007 was the difference b/w types of data: big data
& small data. The latter you should use DRAM for. Wonder if Twitter uses
'memcached'?

Reference

[0] Startup School 2007 Wiki, 'Paul Buchheit creator of GMAIL at YCombinator
Startup School'

<http://wiki.startupschool.org/doku.php?id=notes>

~~~
staunch
Six Apart didn't hire Brad -- they acquired LiveJournal from him and named him
"Chief Architect" or something. He's an amazing hacker and a reluctantly-
badass entrepreneur.

I'm sure Twitter will use memcached, like Facebook/Digg both do. Dealing with
11k rps doesn't really tell us much though -- how many are dynamic and how
many are cachable/static?

From the "5 questions" interview, their developer does not seem to be
amazingly qualified to do what he's doing -- definitely no Fitzpatrick.

~~~
bootload
_'... Six Apart didn't hire Brad -- they acquired LiveJournal from him and
named him "Chief Architect" or something ...'_

Same result.

_'... their developer does not seem to be amazingly qualified to do what he's
doing ...'_

The thing that strikes me is that the system is not layered enough. The APIs
the app developers call should shield them from having to deal with these
types of problems. nostrademons [0] summarized Flickr's approach to
optimisation. [1] So is the lack of a scaling infrastructure where Twitter is
failing?

 _'... how many are dynamic and how many are cachable/static ...'_

One thing I notice with Twitter is how often the system updates: every 2
minutes. For most users 5-10 minutes would probably be ample. I often wonder
why they don't say "right, you want RT, well here's the monthly subscription".

As for dynamic vs. cacheable, the main hits appear to be reads of the RSS
public timeline. [2] RT creation allows little or no caching, as the RSS would
be built on the fly. Couple that with Rails' inability to talk to multiple
DBs [3] and you get bottlenecks. Makes you wonder why they don't switch
certain layers to Perl?
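
The trade-off above (build the RSS on every request vs. refresh it every few minutes) can be sketched as a time-boxed cache around the render step. This is only a rough illustration under stated assumptions: `cached`, `render_public_timeline_rss`, and the 120-second interval are hypothetical names and values, not anything Twitter is known to run, and the fake clock is there so the behaviour can be checked without waiting.

```python
import time


def cached(render, ttl_seconds, clock=time.monotonic):
    """Wrap render() so its result is reused until ttl_seconds have passed."""
    state = {"value": None, "expires_at": None}

    def wrapper():
        now = clock()
        # Rebuild only on first call or once the cached copy has expired.
        if state["expires_at"] is None or now >= state["expires_at"]:
            state["value"] = render()
            state["expires_at"] = now + ttl_seconds
        return state["value"]

    return wrapper


# Counts how many times the feed is actually rebuilt.
renders = {"count": 0}


def render_public_timeline_rss():
    """Hypothetical expensive step: query recent updates and build the feed."""
    renders["count"] += 1
    return f"<rss><!-- build #{renders['count']} --></rss>"


# Fake clock (assumption, for demonstration) so time can be advanced by hand.
fake_now = {"t": 0.0}
feed = cached(render_public_timeline_rss, ttl_seconds=120,
              clock=lambda: fake_now["t"])
```

With this wrapper, any number of `feed()` calls inside the 120-second window reuse one build; raising `ttl_seconds` to 300-600 would correspond to the 5-10 minute refresh suggested above.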

_'... definitely no Fitzpatrick ...'_

Rare as hen's teeth.

Reference

[0] nostrademons, 'news.yc user'

<http://news.ycombinator.com/user?id=nostrademons>

[1] nostrademons, 'comments in Startup founders, what books did you find most
helpful?'

<http://news.ycombinator.com/comments?id=5715>

[2] Google Groups, Twitter Development Talk, Alex Payne _'we don't guarantee
that you'll be able to collect contiguous sets of data from the public
timeline API method. It's our most-requested method, so right now it's
optimized for performance, not archival'_

<http://tinyurl.com/3xay7v>

[3] Twitter trouble, 'there's no facility in Rails to talk to more than one
database at a time', Ibid.

------
mattjaynes
I messed around with getting this set up a couple of weeks ago. It was a long
process, and I moved on to something else before I ever finished.

Does anyone else have this up and running? Could you share your thoughts and
experience with MogileFS? Thanks!

