

Ceph: open source petabyte scale distributed storage - marcua
http://ceph.newdream.net/

======
twohey
From <http://ceph.newdream.net/roadmap/>

"We hope to have the system usable (for non-critical applications) by the end
of 2009 in a single-mds configuration. Even then, however, we would not
recommend going without backups for any important data, or deploying in
situations where availability is critical."

It looks interesting, but not ready for prime time. Anyone using this in the
wild?

~~~
m_eiman
From the latest Dreamhost newsletter:

 _In fact, Ceph is practically what you'd call "stable" at this point (which
is not to say it's "production-ready"!), and we've actually begun testing it
as a backup/replacement for our poor backup.dreamhost.com server (who's been
having a terrible time for months)!

If YOU would like to give Ceph a try, please, download away... it's free!

Also, we're going to be setting up a "playground" test-bed where anybody can
try out a Ceph installation we set up and maintain in our data center. If
you're interested, just email beta@ceph.newdream.net, and we'll send you an
invitation when it's ready!_

[edit: added the bit about beta testing]

------
va_coder
How is this different from Hadoop distributed file system?

~~~
marcua
Whereas HDFS is meant to be used programmatically or through a shell, it
appears (though the documentation is sparse) that Ceph was designed to be
mountable like most traditional unix FSs. There's the MountableHDFS [1]
project for Hadoop, so the two could end up being equivalent interface-wise.
At that point it all comes down to how they implement
create/append/delete/seek/replication semantics, which Hadoop documents far
more thoroughly than Ceph.

The Ceph docs also imply that they have designed it so that it's easy to
snapshot directories---I'm not sure whether HDFS has facilities for this.

[1] <http://wiki.apache.org/hadoop/MountableHDFS>
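
To make the interface difference concrete, here's a minimal sketch in Python
(the mount point, paths, and file names are all made up). Once Ceph is
mounted, plain POSIX I/O works; HDFS without MountableHDFS goes through its
own client instead. The .snap directory bit is how the Ceph docs describe
snapshots; I haven't tried it myself.

    import os
    import subprocess
    
    # Ceph mounted at /mnt/ceph (hypothetical): ordinary POSIX I/O.
    with open("/mnt/ceph/logs/app.log", "a") as f:
        f.write("appended through the mounted filesystem\n")
    
    # Snapshots are exposed as operations on a hidden .snap directory,
    # per the Ceph docs (hypothetical path):
    os.mkdir("/mnt/ceph/logs/.snap/before-cleanup")
    
    # HDFS without a mount: shell out to the hadoop CLI (or use the
    # Java API); there is no plain open()/write() against an HDFS path.
    subprocess.check_call(
        ["hadoop", "fs", "-put", "local.log", "/logs/app.log"])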

------
patrickgzill
How is this different from Lustre (lustre.org) ?

~~~
anotherjesse
We are using Lustre on a project. The differences that jump out at me from
<http://ceph.newdream.net/about/>:

1) auto-balancing when new storage nodes are added

2) copies of objects stored on multiple storage nodes (there's a toy sketch
of both ideas after the next list)

Running HA Lustre requires a LOT of work.

1) Lustre requires tricks like copying files around to rebalance

2) Lustre keeps only one copy of any given section, so each OST is a SPOF.
Sun recommends deploying OSTs in pairs with DRBD and Heartbeat (cutting your
space in half and complicating the deployment) so that a failed box won't
break your FS. That still doesn't handle network partitions, since chunks
remain local to a single rack.
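
To illustrate points 1 and 2 from the Ceph list above, here's a toy placement
function in Python. This is not Ceph's actual CRUSH algorithm, just the
general idea it builds on: map each object deterministically to its top-N
nodes, so every object has replicas on several nodes, and adding a node moves
only the objects that now rank it highest instead of requiring wholesale
copying.

    import hashlib
    
    def placement(obj, nodes, replicas=2):
        # Rendezvous hashing: score every node against the object and
        # keep the `replicas` highest.  Deterministic, no central table.
        score = lambda n: hashlib.md5((n + ":" + obj).encode()).hexdigest()
        return sorted(nodes, key=score, reverse=True)[:replicas]
    
    nodes = ["osd1", "osd2", "osd3"]
    print(placement("chunk-0042", nodes))
    # Adding osd4 only re-homes the objects that now score it highest;
    # everything else stays put.
    print(placement("chunk-0042", nodes + ["osd4"]))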

For a sense of how complicated Lustre is to run in production, check out the
talks given by Sun employees at the last user group meeting ("Lessons Learned
& Best Practices", "Managing High Availability on a Shine-Equipped Lustre
Cluster"):

<http://wiki.lustre.org/index.php/Lustre_User_Group_2009>

I've not tested Ceph in production, but many people I talk to about our
Lustre woes recommend Ethan Miller's work - <http://users.soe.ucsc.edu/~elm/>
\- which leads to Ceph ;)

Even if you aren't worried about availability, Lustre has usability issues:

Lustre requires older kernels, which means compatibility with modern non-RHEL
distros becomes a headache when you need the latest version of KVM (or
Python).

Lustre also tries to be a POSIX filesystem, but it breaks horribly when you
use features like O_APPEND from multiple nodes (file corruption!). We are
still tracking down Lustre breaking when we read files in a certain order
(rsync is fine, but skipping around within a file during reads can leave the
kernel unresponsive).
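
For context, the access pattern that corrupted files for us is nothing
exotic. Run something like this (path is hypothetical) concurrently from two
or more client nodes and compare the results:

    import os
    
    # POSIX O_APPEND: each write should land atomically at end-of-file.
    # Running this loop from several Lustre clients at once is where we
    # saw interleaved/corrupted records.
    fd = os.open("/lustre/shared/events.log",
                 os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    for i in range(1000):
        os.write(fd, ("node-A record %d\n" % i).encode())
    os.close(fd)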

After months of dealing with Lustre issues while trying to support POSIX, I
have come to love the non-POSIX (S3-like API) approach to distributed
filesystems.
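
Concretely, the S3-style model trades seek and append for whole-object
put/get, which removes the shared-offset problems above entirely. A minimal
sketch using boto (the bucket, key name, and credentials are placeholders):

    from boto.s3.connection import S3Connection
    from boto.s3.key import Key
    
    # Whole objects in, whole objects out: no shared file offsets and
    # no O_APPEND, so concurrent clients have nothing to corrupt.
    conn = S3Connection("ACCESS_KEY", "SECRET_KEY")
    bucket = conn.get_bucket("my-bucket")      # placeholder bucket
    
    k = Key(bucket)
    k.key = "events/node-A/0001.log"           # one object per record batch
    k.set_contents_from_string("node-A record 1\n")
    print(k.get_contents_as_string())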

