
A lot of companies using Ceph at scale are facing huge issues (OVH, etc.), so he is not wrong. Why take the risk of going with a solution that is known to cause issues?

Has nothing to do with EMC.

Thanks for that!

I've talked to a lot of large-ish commercial Ceph customers, and they seem to spend a lot of time building kludge-arounds for support. They tend to live terrified that the whole clumsy edifice will come crashing down and cost them their jobs.

Also, Ceph is block, object, and file. Block is OK up to a point, object is dubious, and file is utterly untrustworthy, at least at any kind of real scale. Three servers in a rack aren't "scale".

Why must someone who isn't a Ceph fan (and I fail to see why storage systems are a "fan" activity) live in the evil pockets of EMC? I know people who've smoked for years and don't have any sign of lung cancer either.

Ceph is all K/V objects underneath. Saying "block is OK" but "object is dubious" is silly.
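
To make the point concrete: Ceph's block layer (RBD) stripes an image across fixed-size objects in the underlying object store, so block I/O is just object I/O with an offset translation. Here is a minimal sketch of that mapping; the 4 MiB object size matches Ceph's default, but the function name and structure are illustrative, not Ceph API calls.

```python
# Sketch of how block storage built on an object store (RBD-style) maps
# a byte offset in the block image to an object. 4 MiB is Ceph's default
# object size for RBD images; everything else here is illustrative.
OBJECT_SIZE = 4 * 1024 * 1024  # 4 MiB

def block_to_object(byte_offset: int) -> tuple[int, int]:
    """Return (object index, offset within that object) for a block offset."""
    return byte_offset // OBJECT_SIZE, byte_offset % OBJECT_SIZE

# A write at 10 MiB lands 2 MiB into the third object (index 2).
idx, off = block_to_object(10 * 1024 * 1024)
```

So if objects were unreliable, the blocks striped across them would be too; the two layers can't have different durability.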

Care to share what "kludge-arounds" there are for support? Red Hat offers commercial support in case you need to phone a friend.

OVH isn't exactly a shining example of a quality engineering organization. Simple web searches show how they have misused things and caused large outages.

Ceph is very reliable and durable. We've actually gone out of our way to try to corrupt data, and we failed every time. It always repaired the data correctly and brought things back into a good working state.

CERN and Yahoo run very large Ceph clusters, too.

I believe OVH uses OpenStack and not Ceph.

You can use Ceph together with OpenStack. They used Ceph for their cloud services but had huge problems. If I'm not mistaken, they have completely thrown out Ceph by now.

http://travaux.ovh.net/?do=details&id=20636 http://status.ovh.net/?do=details&id=14139 http://travaux.ovh.com/?do=details&id=20490 http://travaux.ovh.net/?do=details&id=26382 and so on...

They took months to figure out what was going on.

Any idea what the underlying issues with Ceph were?

My story is a bit dated, but we went from gluster to ceph to moosefs at one startup. Gluster had odd performance problems (slow metadata operations - scatter/gather rpcs and whatnot I would guess) and it was hard to know from the logs what was going on. Ceph was very very early at this point, but part of it ran as a kernel module and the first time it oops'd, I deleted that with fire. MooseFS ran all in userspace, had good tools for observability into the state of the cluster, and the source code was simple and clean. It didn't have a good story around multi-master at that time, but I think that is improved now.

Ceph is extraordinarily complicated to run correctly. The docs aren't great and commercial support is pretty mediocre.

It's an amazing piece of software, but takes a great deal of engineering to get right. Most folks won't invest that much engineering into their storage.

This is why providers like EMC and NetApp can extract 10x the cost of the raw storage from enterprises.

The Red Hat Ceph docs are great and open to everyone for free.

The Red Hat commercial support has been pretty good for us. We presented them with 2 bugs, and they addressed both. One took a few weeks, but the other took only a few hours to get a hotfix started.

EMC storage is absolute trash post-Dell merger. Pure 100% dumpster fire. Their customers know their systems better than they do. It's pathetic.

zzzcpan - no, the metadata was in RAM, which made things like directory listings, which were very slow in Gluster, very fast in MooseFS.
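
To illustrate the difference (this is a toy sketch, not MooseFS code): with the whole namespace held in the master's memory, a directory listing is a single in-memory lookup, instead of one scatter/gather RPC per entry as in a distributed-metadata design.

```python
# Toy illustration (not MooseFS code) of why in-RAM metadata makes
# directory listings fast: the namespace is an in-memory map, so a
# listing is one dict lookup rather than a network round trip per entry.
metadata: dict[str, list[str]] = {
    "/": ["etc", "home"],
    "/home": ["alice", "bob"],
}

def list_dir(path: str) -> list[str]:
    # O(1) in-memory lookup; no per-entry RPCs to remote storage nodes.
    return metadata.get(path, [])
```

The trade-off, of course, is that the whole namespace must fit in the master's RAM, which is why MooseFS's multi-master/failover story mattered.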

No clue what the underlying issue was, but when reading:

"We have about 200 harddisk in this cluster... 1 of the disks was broken and we removed it. For some reasons, Ceph stopped to working : 17 objectfs are missed. It should not."

I think the underlying issue is simply "Ceph" ;-)

User error

I can't find it now; was MooseFS the one that used MySQL for metadata?

MooseFS has a highly optimized service that loads all the metadata into memory (similar to Redis).
