

GFS: Evolution on Fast-forward - joeshaw
http://queue.acm.org/detail.cfm?id=1594206

======
vicaya
I think I independently came up with the same idea to solve the distributed
gfs master problem: use a separate bigtable/gfs1 cluster for master metadata
for gfs2.

I'm glad that I'm not crazy :)

~~~
seymourz
Also, does gfs1 still have a single master, with gfs2 using multiple
Bigtable tablets as its distributed masters? Is this the cause for "In fact,
it just makes the bottleneck limitations of the system’s single-master design
more apparent than would otherwise be the case.", as stated in the article?

~~~
vicaya
gfs1 still has a single master, but the workload is much simpler in this
case: it serves the gfs2 master Bigtable cluster exclusively. Most of the
documented gfs master failures are due to misbehaving map-reduce clients. Also,
the gfs1 master can be down for an extended period of time without affecting
master operations, due to the nature of the cluster (you're unlikely to create
a million files per second, which is what it would take to cause many
compactions and splits in the metadata tablets).
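
To make the idea concrete, here is a toy sketch of file metadata kept in a range-partitioned table, so that lookups are routed to one of several tablets instead of a single master. It is only an illustration under my own assumptions: `MetadataTable`, the split keys, and the paths are all hypothetical, and nothing here reflects how Google actually lays out its metadata.

```python
import bisect


class MetadataTable:
    """Toy stand-in for a range-partitioned metadata table (hypothetical)."""

    def __init__(self, split_keys):
        # split_keys are the sorted boundaries between tablets;
        # N split keys give N + 1 tablets.
        self.split_keys = sorted(split_keys)
        self.tablets = [dict() for _ in range(len(self.split_keys) + 1)]

    def tablet_index(self, path):
        # Each tablet owns a contiguous key range, so routing a path
        # to its tablet is a binary search over the split keys.
        return bisect.bisect_right(self.split_keys, path)

    def put(self, path, chunk_ids):
        # Store the (made-up) chunk list for a file in the owning tablet.
        self.tablets[self.tablet_index(path)][path] = list(chunk_ids)

    def get(self, path):
        # Returns None when the path has no metadata entry.
        return self.tablets[self.tablet_index(path)].get(path)


table = MetadataTable(split_keys=["/logs", "/users"])
table.put("/crawl/page-0001", ["c1", "c2"])
table.put("/logs/2009-08-01", ["c3"])
table.put("/web/index", ["c4"])
```

Each `put` or `get` touches exactly one tablet, so metadata load spreads across whichever servers host those tablets. The underlying store only gets involved when tablets are written out, compacted, or split.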

The quote you mentioned actually means that if you use Bigtable on top of
gfs1, a single master failure is more apparent because of the low-latency
requirements of the applications that use Bigtable.

~~~
seymourz
Is this vicaya related to the vicaya of Hypertable? :-)

------
pbhjpbhj
Google file system, nothing to do with evolution.

------
bravura
Why has Google not open-sourced GFS?

~~~
daeken
Even though there's a good bit of info out there about it, it's still a
significant advantage for them. It could happen eventually, but I doubt it'll
be any time soon.

~~~
gwern
That, and who but their rivals would use it? It's highly tuned to their uses
and well-maintained in-house, so there's no percentage to opening it.

~~~
russss
I don't agree; as the article mentions, GFS is used for hundreds of different
tasks within Google which encompass a massive range of use patterns. It's a
general-purpose system, even if it didn't start as such.

It's no coincidence that the entire Google stack
(GFS/MapReduce/Chubby/BigTable) has been replicated as open source: it's
because it's a broadly useful set of tools for doing large-scale work with
data. It would be useful for thousands of companies if it was open-sourced.

I'd wager that the reason Google isn't open-sourcing GFS is because it's a key
part of the "secret sauce" which powers a lot of Google innovation: the sauce
that allows a single engineer to slap together a massively scalable, useful
application in his spare time. I think that separates Google from their
competition more than any specific app does.

~~~
gwern
We seem to be in violent agreement. :)

Thousands of companies might use it if it were open-sourced, but I rather
think many of the serious users are competing with Google, or could be
competing in the future. It being part of the secret sauce is exactly why
there's no percentage.

