
Torus development has been stopped at CoreOS - Perceptes
https://github.com/coreos/torus/commit/4df3e5633f466f90047071b995a41fca3ecca5ad
======
chuhnk
"But we didn't achieve the development velocity over the 8 months that we had
hoped for when we started out, and as such we didn't achieve the depth of
community engagement we had hoped for either."

Open source is tough, even as a successful VC funded company. Gotta give
credit to CoreOS though, rather than beating a dead horse they're
acknowledging there's little external interest and their time would be better
spent focused elsewhere. Seeing as they've discontinued Fleet as well, it's
likely they're doubling down on the commercial Tectonic product built on
kubernetes. There's also likely pressure from their investors to start making
money, especially as they're well into using their series B and might want to
look to raise more money.

Distributed file storage is a tough market in itself. Developers get excited
about technology but how many need this as opposed to a highly available
database? I love the building blocks of distributed systems and understand
that Google's technology is built on a layering of tech (Colossus, Spanner,
etc) but it seems the world is not yet ready. Everyone is already struggling
to understand the complexities of this new ecosystem and how the pieces all
fit together.

Again, good move by CoreOS, wish them luck with their commercial strategy.

~~~
parenthephobia
_> Everyone is already struggling to understand the complexities of this new
ecosystem and how the pieces all fit together_

I believe Google's (and Amazon's) secret sauce is having large sysops and
devops teams filled with subject matter experts - and the software's original
developers.

Efforts like Torus are trying to build zero-to-low-maintenance turn-key
solutions, whilst Google is happy to have tens of full-time distributed
storage engineers.

~~~
fh973
For Google at least, the SRE teams for infrastructure components are not
large, either in total or relative to the huge infrastructure they manage.

You could call this operational scalability; it is made possible by decoupling
service quality from individual pieces of hardware. The key ingredients are
redundancy across relatively large failure domains and automated handling of
all foreseeable events. Everything is built around this concept: compute on a
fault-tolerant container scheduler, storage on a fault-tolerant file system,
checksumming everywhere, ...
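The "checksumming everywhere" idea can be made concrete with a toy sketch
(purely illustrative, not any real system's on-disk format): every stored
chunk carries its own CRC, so a corrupted read is detected and the caller can
retry against another replica.

```python
import zlib

def pack_chunk(data: bytes) -> bytes:
    """Prefix a chunk with its CRC32 before it goes to disk."""
    return zlib.crc32(data).to_bytes(4, "big") + data

def unpack_chunk(blob: bytes) -> bytes:
    """Verify the stored CRC32 on read; a mismatch means bit rot."""
    stored, data = int.from_bytes(blob[:4], "big"), blob[4:]
    if zlib.crc32(data) != stored:
        # in a real system: flag the bad replica and re-read elsewhere
        raise IOError("checksum mismatch, read another replica")
    return data
```

Redundancy plus a check like this at every layer is what lets a small team
treat individual disk failures as routine instead of incidents.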

Another enabler is probably keeping things simple at the component level.

------
joshbaptiste
Bcantrill on how hard such a feat would be to accomplish.

[https://news.ycombinator.com/item?id=11817387](https://news.ycombinator.com/item?id=11817387)

[https://news.ycombinator.com/item?id=11818081](https://news.ycombinator.com/item?id=11818081)

~~~
bassamtabbara
disclaimer: I work on project Rook.

Yes, building a whole new data path is complicated and takes many years to get
right. It's also somewhat of a moving target, with new storage technologies
appearing on the scene (like SMR drives, NVMe, and persistent memory).

It would be much more effective for the community to coalesce around a common
data path, just like we coalesce around kernels. Something like Ceph is a great
start: it's battle-tested and has storage vendors (like Intel, Samsung, SanDisk
and others) updating/optimizing it for new kinds of storage.

The focus on simplicity and integration into cloud-native environments is
critical however and Torus' vision was spot on. Kudos to the CoreOS team for
raising the bar on this.

------
KaiserPro
I have karma to burn on this, so here goes:

I worked for several years in VFX/HPC. 30k+ cpus and 15pbs of storage.

Firstly, with storage it's very rare that people want actual block storage
(unless you are hosting VMs, but that's so 2007.....) Yes, I know, OpenStack,
but that's just fucking horrific; seriously, just use netboot and be done with
it. I've seen people do it inside new clustering systems, but it's really not
fun to do, especially if your consumer is prone to disappearing without
warning. (FSCK is a terrible mechanism for fast recovery.)

Most apps, unless they have bought into the "shove everything over HTTP and
pay the penalty", want a posix file system to store anything of importance.
(yes, yes database, but where is that writing the data to?)

Now, there are three ways you can do this:

o Use a clustered file system

o Use NFS (with or without a clustered filesystem underneath)

o Fuck about with iSCSI/SAS/FC and dynamically map block devices.

Using a clustered filesystem spread over many clients is begging for trouble,
mainly because one client can fuck it up for everyone. Some FSs are dynamic
and sexy, but they have a habit of fucking up in new and interesting ways that
even the authors can't figure out.

The common ground is having storage nodes attached directly to a pack of big
fat disks (for streaming IO) or NVMe/SSDs for random IO. They then serve out
NFS traffic. Now, you can either have a clustered file system underneath, or
not. (Having standalone servers can be advantageous, if you can map your
filesystem out hierarchically.)

Now, unless you have a storage area network, the last option is just begging
for shit performance. You really don't want IO traffic fighting with network
traffic. However, if you want raw throughput, this is the way to go; but be
warned, you won't get any friendly help if you accidentally disconnect a disk.

Basically, kubernetes/HPC and storage is a solved problem _ducks_ no really,
just map in NFS shares and be done with it. If it's exotic, it's probably going
to fail hard, and in ungoogleable ways. More importantly, only a few people are
going to be able to help, and they may or may not still be employed at your
company.

~~~
icebraining
Why isn't a fourth option being explored: local storage with async
replication? Seems like it'd be fairly simple and fast, and no worse than non-
clustered NFS regarding data integrity.

I'm just talking from ignorance, so am I missing something?

~~~
KaiserPro
Oh yes, sorry I assumed that.

The simplest storage is a bunch of dumb servers (well, beefy dumb servers)
with some application-aware scripts to move/copy the dataset.

A place I worked at had a wrapper around rsync that would split up the
directory and spawn multiple rsyncs to do a parallel copy.

The directory structure was effectively copy-on-write, so backup to the
nearline was <15 minutes.
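A wrapper like the one described could be sketched roughly like this (all
names invented for illustration; the real script is not public): deal the
directory's top-level entries into N disjoint chunks, then run one rsync per
chunk in parallel.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

def build_rsync_jobs(src, dest, workers=4):
    """Deal src's top-level entries into one rsync command per worker."""
    entries = sorted(os.listdir(src))
    chunks = [entries[i::workers] for i in range(workers)]
    return [
        ["rsync", "-a"] + [os.path.join(src, e) for e in chunk] + [dest]
        for chunk in chunks if chunk
    ]

def parallel_copy(src, dest, workers=4):
    """Run the rsyncs concurrently; each covers a disjoint subset of src."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # check=True makes any failed rsync raise instead of being ignored
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True),
                      build_rsync_jobs(src, dest, workers)))
```

Splitting at the top of the tree is crude (one huge subdirectory skews the
load), but it is simple and is often enough to saturate the network.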

~~~
icebraining
I wonder if you could have something better, closer to streaming replication
of databases. A few weeks ago I found zrep, which sounds more like what I had
in mind:
[http://www.bolthole.com/solaris/zrep/](http://www.bolthole.com/solaris/zrep/)

------
dankohn1
In addition to Rook [https://rook.io/](https://rook.io/) , which CoreOS
mentions and we need to add, please take a look at the other cloud-native
storage options listed on the CNCF cloud native landscape:
[https://github.com/cncf/landscape](https://github.com/cncf/landscape)

Disclosure: I'm executive director of CNCF, and co-author of the landscape.

~~~
chrissnell
Do you have any details about running Rook on Kubernetes? The Rook docs link
to an outdated document about running Kubernetes on CoreOS.

~~~
josephjacks
The folks behind Rook at Quantum have put together an operator (custom K8s
controller and TPR) for Rook:
[https://github.com/rook/rook/tree/master/demo/kubernetes](https://github.com/rook/rook/tree/master/demo/kubernetes)

------
sysexit
I called it here, right at the initial announcement:

[https://news.ycombinator.com/item?id=11816951](https://news.ycombinator.com/item?id=11816951)

This is just too hard of a problem to solve frivolously. Kudos to CoreOS for
trying, and coming to the inevitable conclusion sooner rather than later.

------
123jfeichabc
This is good news - CoreOS needs to focus on what's most important to their
core business to be successful.

Being chock full of bright, relatively young and enthusiastic engineers drunk
on the Golang kool-aid, there's a very real risk of getting distracted by
reimplementing everything under the sun in their favorite shiny new language.

Even if Torus is a good idea, CoreOS has to prioritize, commit, and execute.
They can't afford too many diversions. This is a competitive space, their
opportunity window and runway are both limited, as usual.

------
SEJeff
I kind of wonder if there will ever be a kubernetes operator built for Ceph
(not Rook on top of Ceph). Besides it being a bit of a PITA to maintain, Ceph
is about as good as exists regarding OSS distributed object storage currently.
If they could kill some of the operational overhead via an operator that did
much of it, they might have a serious winner on their hands. Note that I'm
just referring to the radosgw bits for the S3 style storage API, not the posix
filesystem bits.

~~~
hunter_n
There is some discussion on this in the ceph-docker project -
[https://github.com/ceph/ceph-docker/issues/472](https://github.com/ceph/ceph-docker/issues/472)

Interestingly, there is an unannounced project by CoreOS for a storage Operator
that will handle Ceph, Gluster, etc. I'm sure we'll hear more about that now
that Torus has been retired.

~~~
SEJeff
Source for the unannounced project, or just overheard in person from someone?
I don't see anything obvious on their github but it could be private.

~~~
hunter_n
Some more info here:
[https://docs.google.com/document/d/1Nm3ZQXtojd7Ruw8gQ-8xNo0v...](https://docs.google.com/document/d/1Nm3ZQXtojd7Ruw8gQ-8xNo0v1rReHl7v-KSd19UJO4I/edit)

------
marknadal
Dangit, I trust the CoreOS team more than a lot of people in the space.
Torus would have been so useful.

At the other end of the spectrum though, maybe this is reasonable? As a
developer, my first thoughts for "I want my own S3" is not etcd (strong
consistency) but projects like
[https://github.com/minio/minio](https://github.com/minio/minio) , or even
using eventually consistent SQLite replication / synchronization tools
[https://github.com/gundb/sqlite](https://github.com/gundb/sqlite) .

So that makes me ask about rook.io too, what layer of the "stack" is it trying
to fit into? Obviously pretty low, but that also seems unnecessary (and part
of why I suspect Torus is stopping).

~~~
jacques_chester
At a glance, Torus was intended to be a distributed file system. As I
understand it, distributed file systems are easier than distributed block
systems, but harder than distributed blob systems.

A blob system is all-or-nothing. You create or replace the entire blob at
once. This makes bookkeeping and replication much easier for the implementer.

A filesystem supports much richer semantics, including the ability to seek
parts of files and modify small regions of files. You need a lot more
mechanics to maintain consistency across a network.

A block store is difficult because you're trying to work at very high speed on
very small units of state wooshing back and forth willy-nilly. You don't get
to rely on any of the higher semantics provided by a filesystem or blobstore,
since you're pretending to be a magical harddrive.

I am often wrong in these matters, as an interested outsider, so I'd be happy
to receive correction.

~~~
umamukkara
Disclosure: I work for OpenEBS project

Torus was intending to build distributed block storage that is container
native. Metadata management using a key-value store (etcd) increases
complexity and is not new; Ceph tried it.

OpenEBS uses a novel approach: Linux sparse files for managing the blocks of a
volume. It's a fork of Rancher Longhorn. The issue of managing large-scale
distributed block storage metadata is solved easily through the management of
files (not blocks).
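The sparse-file idea can be sketched with a toy volume (illustrative only, not
OpenEBS's actual code): back the whole logical block device with one ordinary
file, so unwritten blocks remain holes and the filesystem does the metadata
bookkeeping.

```python
import os

BLOCK_SIZE = 4096

class SparseVolume:
    """A toy block volume backed by a single sparse file."""
    def __init__(self, path, num_blocks):
        self.path = path
        with open(path, "wb") as f:
            # truncate() sets the logical size; on most filesystems no
            # data blocks are allocated until something is written
            f.truncate(num_blocks * BLOCK_SIZE)

    def write_block(self, index, data: bytes):
        assert len(data) == BLOCK_SIZE
        with open(self.path, "r+b") as f:
            f.seek(index * BLOCK_SIZE)
            f.write(data)

    def read_block(self, index) -> bytes:
        # holes read back as zeros, like an unwritten disk block
        with open(self.path, "rb") as f:
            f.seek(index * BLOCK_SIZE)
            return f.read(BLOCK_SIZE)
```

Block-to-offset mapping becomes trivial arithmetic, and operations like copy,
snapshot, and replication reduce to ordinary file management.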

[https://blog.openebs.io/torus-from-coreos-steps-aside-as-cloud-native-storage-platform-what-now-2375e7f5b145#.fslj3rwsg](https://blog.openebs.io/torus-from-coreos-steps-aside-as-cloud-native-storage-platform-what-now-2375e7f5b145#.fslj3rwsg)

~~~
fh973
What I don't get about the new efforts around container-native storage: if
you decide to build a new container-native storage system, why would you aim
for block storage and not file storage?

Block storage is not exactly a great fit for containers, as you can't access
its file systems from multiple hosts and fail-over is a hassle (forced
remount, fsck).

~~~
umamukkara
Container-native storage has two aspects: one, the storage that the container
itself uses (like Docker's Device Mapper, Overlay2, etc.), and two, the
persistent storage that the applications inside containers need.

Both of them need to be truly container native. Portworx is attempting, with
LCFS, to provide container-native storage for containers themselves:
[https://github.com/portworx/lcfs](https://github.com/portworx/lcfs). So, you
are right: you would need a container-native storage (file system for Docker)
for running containers.

OpenEBS is targeting containerized storage (persistent+block storage) for
applications in containers. OpenEBS builds a storage volume as a container and
presents the volume-container as part of the K8s pod. This way the storage
persistence problems are resolved by the K8s orchestration intelligence
written for application pods.

Of course, OpenEBS containers will use LCFS when it is ready.

------
geku
Does anyone have experience with [https://rook.io/](https://rook.io/) which is
mentioned in the message?

------
aerioux
Could anyone more familiar with the situation give context around the decision
and what is happening moving forward?

Thanks

~~~
Zelmor
Nothing of value was lost, don't worry about it. People reinventing the wheel,
realised it takes a lot more than they are capable of. Same old, same old.

~~~
parenthephobia
I can't speak to the value of Torus, but in this case the best wheels are
proprietary and their design is secret. Reinventing them is reasonable.

------
epowell2017
Anyone looking at openEBS.io? This is open source scale out block for
containers.

~~~
umamukkara
[https://blog.openebs.io/torus-from-coreos-steps-aside-as-cloud-native-storage-platform-what-now-2375e7f5b145#.fslj3rwsg](https://blog.openebs.io/torus-from-coreos-steps-aside-as-cloud-native-storage-platform-what-now-2375e7f5b145#.fslj3rwsg)

------
usgroup
Those of us in the game for some time ultimately read "beta" to mean "30%
chance of survival", so this doesn't come as a surprise, but the same probably
isn't true for our more optimistic colleagues.

I think more tempered marketing would have really helped.

~~~
usgroup
For reference: [https://coreos.com/blog/torus-distributed-storage-by-coreos.html](https://coreos.com/blog/torus-distributed-storage-by-coreos.html)

"Releasing today's initial version of Torus is just the beginning of our
effort to build a world-class cloud-native distributed storage system..."
2016-06

"Torus development has been stopped at CoreOS" 2017-02

Simply pointing out what is the case.

------
alrs
I understand they don't hand out Internet points for this sort of thing
anymore:
[https://news.ycombinator.com/item?id=11816821](https://news.ycombinator.com/item?id=11816821)

~~~
wmf
Note that CoreOS has not given up on the concept of distributed storage; they
just gave up on writing their own. So they haven't proved you right.

I realize reliable block/file storage isn't "cloud native" but legacy apps
require it and they are willing to spend billions to have it.

~~~
epowell2017
What does "cloud native" mean here? To me it suggests purpose-built, much as
Torus was/is - though the OpenEBS engineers assert there are really only two
"container native" storage solutions going: their open source project and
Portworx. [https://medium.com/@kiranmova/persistent-storage-for-containers-alternatives-to-torus-2375e7f5b145](https://medium.com/@kiranmova/persistent-storage-for-containers-alternatives-to-torus-2375e7f5b145)

~~~
wmf
As with all buzzwords, it means what people want it to mean. Many people say
that cloud-native apps should use only ephemeral storage (probably because
they don't provide reliable storage).

------
fidget
Well that seems eminently reasonable

------
smlacy
What's Torus and why should I care?

------
alrs
The next question that needs to be answered at CoreOS: "Why, exactly, are we
maintaining our own Linux distro when the Go binaries that we're writing can
mostly ignore userspace?"

~~~
el_isma
For one, CoreOS auto-updates smartly, so you can install and forget.

~~~
hagbarddenstore
Ah.... Hahah... Hahhahahahhahahahahahahahah.

No.

During the one year I ran CoreOS in production, updates were turned off,
because they caused all sorts of issues.

The only reliable way of doing updates in CoreOS is to replace the machine
and reconfigure it. But then you need to automate joining etcd, which itself
is a major pain in the ass.

~~~
politician
You don't deserve the downvotes. When Docker arbitrarily changes something
important and pushes those changes to Docker Hub, CoreOS is dragged along for
the ride.

We had our auto-updating servers move to Docker 1.10 over a weekend. Of
course, this brought down our CI/CD process because that version of Docker
changed something important. Our staging environment was totally horked, but
our production environment survived due to an unexplained reboot lock. We were
lucky.

Turn off auto updates.

