

Why GlusterFS should not be integrated with OpenStack - grk
https://shellycloud.com/blog/2013/09/why-glusterfs-should-not-be-implemented-with-openstack

======
notacoward
GlusterFS developer here. The OP is extremely misleading, so I'll try to set
the record straight.

(1) Granted, snapshots (volume or file level) aren't implemented yet. OTOH,
there are two projects for file-level snapshots that are far enough along to
have patches in either the main review queue or the community forge. Volume-
level snapshots are a little further behind. Unsurprisingly, snapshots in a
distributed filesystem are hard, and we're determined to get them right before
we foist some half-baked result on users and risk losing their data.

(2) The author seems very confused about the relationship between bricks
(storage units) and servers used for mounting. The mount server is used _once_
to fetch a configuration, then the client connects directly to the bricks.
There is no need to specify all of the bricks on the mount command; one need
only specify enough servers - two or three - to handle one being down _at
mount time_. RRDNS can also help here.
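
For illustration, a mount that tolerates one mount server being down looks
something like this (hostnames are hypothetical, and the exact option name
has varied between releases):

    # server1 is only used to fetch the volume configuration;
    # if it's down at mount time, server2 is tried instead
    mount -t glusterfs -o backupvolfile-server=server2 server1:/myvol /mnt/myvol

After that initial fetch, I/O goes directly to the bricks no matter which
server handed out the configuration.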

(3) Lack of support for login/password authentication. This has never been
true of the I/O path; it only affects the CLI, which should only be run from
the servers themselves (or similarly secure hosts) anyway, never from
arbitrary hosts. Adding full SSL-based auth is already an
accepted feature for GlusterFS 3.5 and some of the patches are already in
progress. Other management interfaces already have stronger auth.

(4) Volumes can be mounted R/W from many locations. This is actually a
strength, since volumes are files. Unlike some alternatives, GlusterFS
provides true multi-protocol access - not just different silos for different
interfaces within the same infrastructure but the _same data_ accessible via
(deep breath) native protocol, NFS, SMB, Swift, Cinder, Hadoop FileSystem API,
or raw C API. It's up to the cloud infrastructure (e.g. Nova) not to mount the
same block-storage device from multiple locations, _just as with every
alternative_.
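
To make the multi-protocol point concrete, here's a sketch of reaching the
same volume two ways (hostnames and volume name are hypothetical; the NFS
options assume Gluster's built-in NFSv3 server):

    # native FUSE mount
    mount -t glusterfs server1:/myvol /mnt/native

    # the same data over NFS, served by the built-in Gluster NFS server
    mount -t nfs -o vers=3,mountproto=tcp server1:/myvol /mnt/nfs

Files written through one mount show up through the other, modulo normal
client caching.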

(5) What's even more damning than what the author says is what the author
doesn't say. There are benefits to having full POSIX semantics so that
hundreds of thousands of programs and scripts that don't speak other storage
APIs can use the data. There are benefits to having the same data available
through many protocols. There are benefits to having data that's shared at a
granularity finer than whole-object GET and PUT, with familiar permissions and
ACLs. There are benefits to having a system where any new feature - e.g.
georeplication, erasure coding, deduplication - immediately becomes available
across all access protocols. Every performance comparison I've seen vs.
obvious alternatives has either favored GlusterFS or revealed cheating (e.g.
buffering locally or throwing away O_SYNC) by the competitor. Or both. Of
course, the OP has already made up his mind so he doesn't mention any of this.

It's perfectly fine that the author prefers something else. He mentions Ceph.
I love Ceph. I also love XtreemFS, which hardly anybody seems to know about
and that's a shame. We're all on the same side, promoting open-source
horizontally scalable filesystems vs. worse alternatives - proprietary
storage, non-scalable storage, storage that can't be mounted and used in
familiar ways by normal users. When we've won that battle we can fight over
the spoils. ;) The point is that _even for a Cinder use case_ the author's
preferences might not apply to anyone else, and they certainly don't apply to
many of the more general use cases that all of these systems are designed to
support.

~~~
mgalkiewicz
1) It is great that snapshots are on their way; I am looking forward to using
them. Still, you cannot benefit from them in Cinder right now.

2) I don't claim that all bricks must be specified in the mount command. I
just point out that with, say, 4 bricks it is impossible to mount the volume
by specifying only 2 servers if both of them are down, even though the
remaining 2 still work.

3) As I wrote, it only concerns the CLI.

4) I totally agree with you. Mounting a volume from many locations is one of
its advantages. It is not supported by OpenStack, and I don't blame GlusterFS
for that.

5) My intention was not to describe GlusterFS's cool features but the current
state of its integration with OpenStack (including a preview of the Havana
implementation).

~~~
notacoward
(1) You can in some configurations. If you use qemu there's a block-device
driver in qemu and another on the GlusterFS back end (as of 3.4), which both
allow snapshots via methods external to us. I meant what I said about it being
a hard problem. We're determined to deliver a _general_ snapshot function.
That's much harder than delivering snapshots that rely on an uncommon and/or
unstable base technology, so it's taking a while.
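
For example, with a gluster-enabled qemu (1.3 or later, talking to GlusterFS
3.4) you can manage images over libgfapi and use qcow2's own snapshot
machinery, no FUSE mount required (names are hypothetical):

    # create a qcow2 image directly on the volume via libgfapi
    qemu-img create -f qcow2 gluster://server1/myvol/vm1.qcow2 10G

    # take and list internal qcow2 snapshots of that image
    qemu-img snapshot -c before-upgrade gluster://server1/myvol/vm1.qcow2
    qemu-img snapshot -l gluster://server1/myvol/vm1.qcow2

Note that it's qcow2 doing the snapshotting there, not GlusterFS, which is
exactly why we still want a general solution.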

(2) Yes, if you want to survive N concurrent failures you need N+1 mount
servers, and currently released code only supports N=1. However,
[http://review.gluster.org/#/c/5400/](http://review.gluster.org/#/c/5400/) has
already been merged and will be available in the next release.
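
Once that's released, you should be able to list several fallback volfile
servers in a single mount, along these lines (option name as proposed in the
patch; check the release notes for the final spelling):

    mount -t glusterfs -o backup-volfile-servers=server2:server3 server1:/myvol /mnt/myvol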

(3) IMO you should also have mentioned that the problem only manifests in a
specific deprecated use of the CLI (from machines other than the servers).
Nonetheless, this is a known deficiency which I've been personally pushing to
fix.

(5) The current state includes many of these "cool features" (thanks!) without
the need for any specific OpenStack integration. That's kind of the point. Unlike
some, we don't need to re-implement features for every access method or use
case. 90% of that functionality would be available e.g. to CloudStack or
OpenNebula _today_. IMO making a big deal of snapshots as a differentiator in
one direction without mentioning myriad differentiators in the other doesn't
leave people with the information they need to make progress toward their own
decisions.

~~~
mgalkiewicz
1) Nobody says it is easy. I keep my fingers crossed for this feature.

2) Exactly. I have pointed out this ticket. When do you think it will be
available and in which version?

3) Unfortunately, I have not found any document describing how to set up
authentication other than IP-based. I would like to set it both through
OpenStack and by hand. Is that possible?

5) Of course it is great that most features do not require any specific
OpenStack integration. I just wanted to describe those that do; it mostly
shows what needs to be done in OpenStack.

~~~
notacoward
(2) It'll automatically be in 3.5, due around the end of this year. It will
probably also be backported to the next 3.4.x, but I can't comment on
schedules either for that or for the "downstream" Red Hat Storage releases.

(3) I don't know of any such instructions off the top of my head. Basically
you'll have to edit /etc/glusterfs/glusterd.vol and add by hand the same auth
options that are available for volumes via the CLI. I think there are
(possibly hidden) CLI options that you'd need to specify username/password to
work with that. If you try it and things don't seem fairly obvious, feel
free to ping me via email (jdarcy@myemployer.com) or Freenode IRC.
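
To give a rough idea of the shape of it, the relevant volfile options look
something like this (option names from memory; treat it as an illustrative
sketch, not a recipe):

    volume myvol-server
        type protocol/server
        # allow only the named user to connect to this brick ...
        option auth.login./data/brick1.allow bob
        # ... and set that user's password
        option auth.login.bob.password s3cret
        # (other server options omitted)
    end-volume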

~~~
mgalkiewicz
(3) I would definitely like to avoid modifying files by hand. I knew about
this method, but it just does not work for me: I create many volumes
automatically with the CLI, and modifying them in a configuration file is not
convenient. I guess it is also not feasible for Cinder devs to implement
reasonably well.

~~~
notacoward
You're opposed to manipulating config files by hand, but you recommend Ceph
over GlusterFS? Interesting.

~~~
mgalkiewicz
Well, I am not opposed to manipulating files by hand; I just don't want to do
that when I have a CLI for everything else. Keeping the file's contents
identical on all servers is more toilsome than a simple command. Of course
there are tools like Chef, but that is not the point.

------
j_s
Apparently there are nearly 20 supported storage backends for OpenStack; this
article discusses the shortcomings of just one of them. Not sure why GlusterFS
is singled out.

[https://wiki.openstack.org/wiki/CinderSupportMatrix](https://wiki.openstack.org/wiki/CinderSupportMatrix)

------
epistasis
If I understand this correctly, the complaints are:

- Terminology -- Seriously? It's not a very strong complaint.

- Snapshotting -- have to use qcow2 for this rather than native file-system
support for snapshotting an individual file

- Have to use Layer 2 separation for security -- but this should be done
anyway, shouldn't it? There's no reason to trust this to application-level
security, and if there's any need at all for this type of security, L2 is the
only way to go.

Personally, I think Ceph is the future, and I also have personal reasons for
wanting Ceph to succeed. Having dealt a bit with both communities, I think
it's clear that Ceph is going to be the standard go-to distributed file system
soon, and I hope to switch our gluster filesystems to it soon (come on, POSIX
FS layer!). So I'm admittedly biased toward Ceph.

However, I don't see these complaints as very strong. I'm only a dabbler with
OpenStack, but fairly experienced with Gluster and its warts.

~~~
mgalkiewicz
Terminology is not a problem :) It is just a little bit misleading when you
start implementing Cinder with GlusterFS.

The complaints are mostly about the integration of the two tools; I don't
intend to discredit OpenStack or GlusterFS in particular.

------
viraptor
> Compute node downloads such image, puts it on a local disk and boots a VM.
> This method makes it impossible to use the highly desired live migration

That's not true. Live migration is possible both with glance images and cinder
volumes.

~~~
mgalkiewicz
Could you point out some docs about it?

~~~
viraptor
It works by default. I don't think you need to do anything crazy about it.
Base images will be downloaded by Nova during migration and libvirt/qemu will
copy the differences (assuming you use libvirt; I don't know about other
hypervisors).

See nova.virt.libvirt.driver:LibvirtDriver.pre_live_migration - the code path
for "if not is_shared_storage".

~~~
mgalkiewicz
Are you sure that you are talking about true live migration? Is your instance
available during the migration?

~~~
vishvananda
This is definitely possible; it is called block migration. Here is an example
of someone showing usage (note that the bug he mentioned has been fixed):

[http://www.sebastien-han.fr/blog/2012/07/12/openstack-block-migration/](http://www.sebastien-han.fr/blog/2012/07/12/openstack-block-migration/)

Note that there are some reliability issues with versions prior to qemu 1.4
and libvirt 1.0.2.

To enable "true" block migration where the server remains live the whole time
instead of being paused, you need to modify a config option:

    block_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_NON_SHARED_INC,VIR_MIGRATE_LIVE

(this adds VIR_MIGRATE_LIVE to the default flags)
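
With that in place, a block migration is started the same way as a regular
live migration, just with the extra flag (instance and host names are
placeholders; older novaclient versions may spell the flag --block_migrate):

    nova live-migration --block-migrate <instance-uuid> <target-host>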

Also, keep in mind that the same caveats as regular live migration apply with
this flag: there are edge cases where the I/O in the guest is so heavy that
the migration will never complete.

~~~
mgalkiewicz
You are right. Block migration is definitely the way to go. Unfortunately it
is poorly described in the OpenStack docs for KVM:
[http://docs.openstack.org/trunk/openstack-compute/admin/content/configuring-migrations.html](http://docs.openstack.org/trunk/openstack-compute/admin/content/configuring-migrations.html)

