Conversely, you could have a NoSQL layer below Postgres, where PG stores and indexes metadata which tells it which, of many, small NoSQL dbs to find the actual data in. These data dbs then can be sharded/replicated across physical systems as you like. You loose some raw speed on reads, but avoid a global write lock and the system scales quite well. I've started playing around with such a system with https://github.com/cloudflare/SortaSQL
CloudFlare (www.cloudflare.com) in San Francisco (H1B)
We have built a global network to help make every website faster and more secure. We're looking for the most talented engineers who want to tackle some of the web's hardest problems, see their work positively affect hundreds of millions of people every day, and be a part of a fast-growing, San Francisco-based startup.
Tens of thousands of sites worldwide (from Laughing Squid to CrunchGear to Metallica to the Government of Turkey to the IRS of Pakistan) are already using CloudFlare. More than 200 million people will experience a faster, safer Internet because of CloudFlare this month -- and that is only 9 months since our public launch!
CloudFlare is an engineering-driven organization. The best ideas win here. We're a small (20) but rapidly growing team. We're looking for talented engineers who get excited about the challenges of working at Internet scale. We are currently actively seeking:
Check out some of the jobs we're currently looking to fill at:
What filesystem did you end up using? Ever looked at GlusterFS (http://www.gluster.org/)? From an (admittedly unscientific) comparison of the various free/open-source distributed filesystems, it seems the best to me.
Ceph is still in beta, and the fact that Lustre only supports a single Metadata Server is really scary to me.