
A Gluster developer's thoughts on Torus - wmf
http://pl.atyp.us/2016-06-torus.html
======
spotman
This is a tough one. It's easy to be the expert and point out what will not
work.

The crux of it is, as he states, the false advertising, and how many people
will see this and start to use it without knowing what they are doing.
However, it's hard to separate false advertising from overconfidence. Maybe
they don't even have to be different things in this context.

The author is correct: this is incredibly hard to get right. Without some
experts heavily contributing to and steering a project like this (and by
experts, I mean distributed filesystem experts, not just people with general
distributed application knowledge), it is going to be a long uphill battle,
fraught with terror.

But who's to say that they won't find that type of collaboration? They may
just do that, and they may just pull off making a decent project in a few
years.

But is it hard? Hell yes. Is it going to happen as fast as they claim?
Probably not. Is it a silly idea to build this rather than contribute more to
prior art? Probably. Is someone a little pissed off about this and taking to
the internet to moan about it a bit? Probably.

At the end of the day, file systems are hard, and if your job is to deploy,
manage, scale, or recover them, you won't just be blindly throwing your
petabytes at a brand-new project anyway; and if you are, you should be
removed from your current position.

The type of folks that will be early adopters and contributors won't be
putting their banking transactions on it.

------
dice
> Single-threaded sequential 1KB writes for a total of 4GB, without even
> oflag=sync? Bonnie++? Sorry, but these are not "good benchmarks to run" at
> all. They're garbage. People who know storage would never suggest these.

As a non-storage person: what should I be using instead of dd and bonnie++?

~~~
bcantrill
bonnie++ is, as Jeff says, well-known as a canonical example of a benchmark
gone horribly wrong; see Brendan Gregg's excellent post on active benchmarking
and bonnie++[1] for the gory details.

Beyond bonnie++, storage benchmarks are fraught with peril; years ago, I
dismembered SPEC SFS as being similarly unsafe at any speed when benchmarking
storage systems, albeit for much more subtle reasons than the glaring
mechanical flaws in bonnie++.[2] And as for dd, I actually think it's okay as
long as you explain clearly what it is (and isn't); Jeff's complaint is that
they seem to be treating this single dd invocation as "write performance",
when in fact the truth is subtler -- and things like block size and
synchronicity matter a great deal.
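
To make that concrete, here are purely illustrative invocations (the target
path is made up), all writing the same 4GB:

        # Buffered 1KB writes: mostly measures the page cache,
        # not the storage underneath.
        dd if=/dev/zero of=/mnt/test/file bs=1K count=4194304

        # The same 4GB written synchronously: usually dramatically
        # slower, and a very different claim about the system.
        dd if=/dev/zero of=/mnt/test/file bs=1K count=4194304 oflag=sync

        # The same 4GB again with a larger block size: yet another
        # (equally valid) answer.
        dd if=/dev/zero of=/mnt/test/file bs=1M count=4096 oflag=sync

None of these numbers is "write performance" by itself; each answers a
different question, which is why the flags need to be reported alongside the
result.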

More generally, anyone interested in storage benchmarks would be wise to read
essentially everything that Brendan Gregg has written on the subject, starting
with his five-part (!!) series on file system latency.[3][4][5][6][7]

[1] http://www.brendangregg.com/ActiveBenchmarking/bonnie++.html

[2] http://dtrace.org/blogs/bmc/2009/02/02/eulogy-for-a-benchmark/

[3] http://dtrace.org/blogs/brendan/2011/05/11/file-system-latency-part-1/

[4] http://dtrace.org/blogs/brendan/2011/05/13/file-system-latency-part-2/

[5] http://dtrace.org/blogs/brendan/2011/05/18/file-system-latency-part-3/

[6] http://dtrace.org/blogs/brendan/2011/05/24/file-system-latency-part-4/

[7] http://dtrace.org/blogs/brendan/2011/06/03/file-system-latency-part-5/

~~~
notacoward
+1 for anything Brendan writes or says. Seriously, to those of us who do this
stuff, he's idol material.

~~~
SEJeff
Ditto for Mr. Cantrill, one of the fathers of DTrace, FWIW.

------
merb

        It's not true for Gluster. It's not true for Ceph. It's not true for Lustre, 
        OrangeFS, and so on. It's not even true for Sheepdog, which Torus very strongly
        resembles. None of these systems were designed for small clusters.
    

That's true. There is no system that is easy to administer, starts with one
node, and can then scale to 3, 5, 7, etc.

No system addresses this (whether because they don't want to or because it's
too hard).

------
sshykes
It starts off so nice and friendly, and then reads so hostile at the end.
Sort of like a shit sandwich, except with the bottom slice of bread missing.

I am curious to understand how Torus is similar to Sheepdog [0].

From the Sheepdog website:

        Sheepdog is a distributed object storage system for
        volume and container services and manages the disks
        and nodes intelligently. Sheepdog features ease of use,
        simplicity of code and can scale out to thousands of
        nodes.
    
        The block level volume abstraction can be attached to
        QEMU virtual machines and Linux SCSI Target and supports
        advanced volume management features such as snapshot,
        cloning, and thin provisioning.
    
        The object level container abstraction is designed to
        be Openstack Swift and Amazon S3 API compatible and can
        be used to store and retrieve any amount of data with a
        simple web services interface.
    

[0] https://sheepdog.github.io/sheepdog/

~~~
notacoward
Try this link to see the similarity a bit more clearly.

https://github.com/sheepdog/sheepdog/wiki/Sheepdog-Design

They're both basically block storage, with similar approaches to sharding and
replication. Sheepdog seems to be using the term "object" more than they used
to, but it's important to note that sheepdog objects have semantics closer to
files or virtual disks than to S3/Swift style objects. The two also use
related approaches (consensus vs. virtual synchrony) for coordination. Most of
the differences are related to the fact that Sheepdog has already evolved over
several years to have many of the features that are still on Torus's nascent
road map. Ceph's RADOS/RBD is only a bit further from either one than they are
from each other. None of them are _identical_, of course, and I never said
they were, but from a purely technical perspective Torus's stated goals could
have been achieved more quickly by contributing to Sheepdog than by starting a
new project.
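
To illustrate what "similar approaches to sharding and replication" means in
practice, here is a toy sketch in Go (Torus's language). This is not either
project's actual code, just the general hash-ring placement idea: a block ID
is hashed onto a ring of nodes, and the next R nodes clockwise hold its
replicas.

        package main

        import (
                "fmt"
                "hash/fnv"
                "sort"
        )

        // ring is a toy consistent-hash ring: each node occupies one
        // point on the ring, determined by hashing its name.
        type ring struct {
                points []uint32
                nodes  map[uint32]string
        }

        func hash32(s string) uint32 {
                h := fnv.New32a()
                h.Write([]byte(s))
                return h.Sum32()
        }

        func newRing(names []string) *ring {
                r := &ring{nodes: map[uint32]string{}}
                for _, n := range names {
                        p := hash32(n)
                        r.points = append(r.points, p)
                        r.nodes[p] = n
                }
                sort.Slice(r.points, func(i, j int) bool {
                        return r.points[i] < r.points[j]
                })
                return r
        }

        // replicas returns the R nodes responsible for a block: the
        // first R ring points at or after the block's hash, wrapping.
        func (r *ring) replicas(blockID string, R int) []string {
                h := hash32(blockID)
                start := sort.Search(len(r.points), func(i int) bool {
                        return r.points[i] >= h
                })
                var out []string
                for i := 0; i < R && i < len(r.points); i++ {
                        p := r.points[(start+i)%len(r.points)]
                        out = append(out, r.nodes[p])
                }
                return out
        }

        func main() {
                r := newRing([]string{"node-a", "node-b", "node-c", "node-d"})
                // Two replicas per block; adding or removing a node only
                // moves the blocks adjacent to it on the ring.
                fmt.Println(r.replicas("volume1:block42", 2))
        }

The hard part, and the reason this takes years, is everything the sketch
ignores: rebalancing, failure detection, and staying consistent through
membership changes.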

------
the_common_man
> Anybody who would suggest these is not a storage professional, and should
> not be making any claims about how long it might take to implement
> filesystem semantics on top of what Torus already has.

The entire post reeks of condescension and arrogance.

I also don't like the potshots at marketing. I think Torus is off to a very
good start. It's a young project, after all, and they are making claims to
get their vision across. They didn't "lie" about things being here already;
they said it's going to happen in the near future. What's wrong with that?
Because "storage experts" think it takes years to build? Sorry, visionaries
don't listen to "experts"; they set out and do things.

~~~
rdtsc
> visionaries don't listen to "experts"; they set out and do things.

I am stealing that quote. It is my new favorite sarcastic quote.

~~~
timv
It turns out that my 2-year-old is a "visionary" - who knew?

