
Intel Shuts Down Lustre File System Business - arcanus
https://www.nextplatform.com/2017/04/20/intel-shuts-lustre-file-system-business/
======
sanguy
Lustre is a pure-bread racehorse. It is not something the general masses
should ever take a ride on; or even get up close and personal with. It will
bite you; kick you; piss on you; crap on you; stomp on you; and generally try
to kill you. It is cranky. It is not user friendly. It doesn't play nice with
others.

But if you do know how to ride it; you have a track to ride it on; and it
decides to let you ride; you are in for a real treat. It is a rush that can't
be beat, even though after that ride your body will ache from the beating it's
been given.

For the right workloads Lustre can't be beat. I've seen it completely transform
large HPC clusters from dogs to Kentucky Derby winners. But as many have
mentioned, it is a complex beast that needs the right people; right hardware;
and right workloads to make it run.

We used it for many years for large geospatial big data workloads. Huge images
served to many thousands of HPC nodes. Worked great. But it was highly
sensitive to the technical staff running it, and ultimately it became too
"cranky" to use in production mode without Gandalf and his wizard army to keep
it from eating itself.

So we now run on NFS over IB and happily give up some performance for 24*365
uptime.

~~~
lamby
> pure-bread

OT: I think you meant pure-bred. :) Reminds me of this section from Orwell's
«Politics and the English Language»:

> Some metaphors now current have been twisted out of their original meaning
> without those who use them even being aware of the fact. For example, toe
> the line is sometimes written as tow the line. Another example is the hammer
> and the anvil, now always used with the implication that the anvil gets the
> worst of it. In real life it is always the anvil that breaks the hammer,
> never the other way about: a writer who stopped to think what he was saying
> would avoid perverting the original phrase.

~~~
kevin_thibedeau
It's not uncommon to see anvils with their horns broken off from too much
hammering.

~~~
programmer_dude
Is it possible multiple hammers were involved?

------
markhahn
Ironic to see comments here criticizing Lustre from people who don't seem to
understand it at all.

Lustre is for large HPC clusters. It's not for your desktop; it's not for your
video editing suite. It is the only way to provide a filesystem that scales to
support, for instance, a compute job of 100k ranks, all writing checkpoints
and snapshots periodically. No, Hadoop doesn't do it. Ceph is for performance-
and-scaling-insensitive cloud installs.

Lustre is for when you expect to saturate 100 100Gb IB links to storage. It
works remarkably well for its use case (though even on HPC clusters, MDS
performance can be a problem).
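
For a sense of the access pattern involved, here is a minimal sketch of
per-rank checkpointing into a striped Lustre directory. It assumes mpi4py and
the stock `lfs` tool; the path and stripe settings are made up for
illustration:

    # Minimal sketch: file-per-rank checkpointing on a Lustre scratch dir.
    # Assumes mpi4py is installed and `lfs` is on PATH; stripe numbers
    # below are illustrative, not a tuning recommendation.
    import os
    import subprocess
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    ckpt_dir = "/lustre/scratch/job123/ckpt"  # hypothetical path

    if rank == 0:
        os.makedirs(ckpt_dir, exist_ok=True)
        # Stripe files created here across 4 OSTs with 1 MiB stripes.
        subprocess.run(["lfs", "setstripe", "-c", "4", "-S", "1m", ckpt_dir],
                       check=True)
    comm.Barrier()  # wait until the striped directory exists

    # Each rank writes its own file: no shared-file lock traffic.
    state = b"...serialized rank-local state..."  # placeholder payload
    with open(os.path.join(ckpt_dir, "rank%06d.ckpt" % rank), "wb") as f:
        f.write(state)

    comm.Barrier()  # checkpoint set is complete once all ranks return

File-per-rank sidesteps shared-file lock contention, which is part of why the
pattern scales.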

~~~
hpcjoe
Actually, Lustre isn't the only game in town for this. BeeGFS
([http://beegfs.com](http://beegfs.com)) does a very good job at this as well:
it has better small-to-large scaling, error messages understandable by mere
mortals, and it doesn't require a specific (ancient) kernel or distro for the
server ...

So does GPFS ... er ... Spectrum Scale by IBM.

There are a few others that fit in this, but Lustre is not alone here.

~~~
frozenport
To my knowledge GPFS is IBM only.

~~~
Infernal
It's an IBM commercial product, if that's what you mean by "IBM only". You can
install it on any hardware, though; we currently run it on a combination of
SuperMicro, Dell, and DDN.

------
epistasis
This entire space is littered with hard-to-use and/or flawed products. It's
extremely difficult to get right. And even things like HDFS, which redefine
the problem into something much, much more manageable, with better semantics
for distributed computing, have had their own issues.

Take, for example, my favorite storage system, Ceph. As I understand it, it
was originally going to be CephFS, with multiple metadata servers and lots of
distributed POSIX goodness. However, in the 10+ years it's been in
development, the parts that have gotten tons of traction and widespread use
are seemingly one-off side projects on top of the underlying storage system:
object storage and the RBD block device interface. Only in the past 12 months
has CephFS become production ready, and only with a single metadata server;
multiple metadata servers are still being debugged.

With Ceph, part of the timing issue is that the market for object stores and
network-based block devices dwarfs the market for distributed POSIX. But I
bring it up to point out that distributed POSIX is also just a really, really
hard problem, with limited use cases. It's super convenient for getting an
existing Unix executable to run on lots of machines at once. But that
convenience may not be worth the challenges it imposes on the infrastructure.

~~~
zzzcpan
"This entire space is littered with hard to use and/or flawed products."

None of the internet-facing services actually need a strongly consistent POSIX
filesystem that scales across multiple datacenters, and a huge chunk of them
won't put up with the corresponding latencies of something like that for mere
convenience. So the products are not really flawed; they just don't need to do
those things.

~~~
convolvatron
This is right. POSIX for POSIX's sake shouldn't really be relevant anymore.
Look at the success of write-once S3. POSIX semantics were designed to address
issues of concurrent access to the same bytes.

Also, the only standard distributed protocol for POSIX, NFS, has always had a
lot of design and implementation issues. v4 is complex and tries to address
some of the consistency issues, but I don't know how successful it is in
practice given the limited usage.

Treating blobs as immutable addressable objects and using other systems for
coordination avoids a lot of the pain with caching, consistency, metadata
performance, etc. You can layer those things on top... it's a good cut for
large-scale distributed systems.
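
To make the layering concrete, here's a toy sketch of the write-once,
content-addressed idea: hash the bytes, use the digest as the key, never
mutate. A dict stands in for S3 or any object store; all names are made up:

    # Toy content-addressed blob store: immutable, addressable objects.
    import hashlib

    store = {}  # stand-in for S3 or any write-once object store

    def put(blob: bytes) -> str:
        """Store a blob under its own digest; writes are idempotent."""
        key = hashlib.sha256(blob).hexdigest()
        store.setdefault(key, blob)  # same key always means same bytes
        return key

    def get(key: str) -> bytes:
        return store[key]

    key = put(b"some large immutable object")
    assert get(key) == b"some large immutable object"
    # "Mutation" is writing a new blob and updating a pointer in a
    # separate coordination system, which is the layering described above.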

~~~
jabl
Well, NFS isn't really cache coherent, and hence not POSIX compliant. Lustre
is, but pays for it with an amazingly complicated architecture.

I have (somewhat esoteric corner cases, but still) benchmarks that will cause
failures within seconds when run against NFS.
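
For illustration, here is the shape of one classic corner case of this kind
(not necessarily the parent's benchmark): concurrent O_APPEND writers, which
POSIX keeps atomic but NFS cannot:

    # Run one copy on each of two NFS clients against the same file.
    # On a coherent POSIX filesystem every record survives intact;
    # over NFS, lost or torn records typically appear within seconds
    # because the client cannot perform the append atomically.
    import os
    import sys

    path = sys.argv[1]          # e.g. a file on the NFS mount
    tag = sys.argv[2].encode()  # unique per client, e.g. "A" or "B"

    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    for i in range(100_000):
        rec = tag + b"%08d\n" % i
        assert os.write(fd, rec) == len(rec)
    os.close(fd)

Afterwards, counting and checking the records on either client shows the
divergence.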

------
heisenbit
Intel seems to be cleaning house, shedding a number of long-running efforts.
The most high-profile was IDF, then OpenStack, and this one fits the pattern
as well. It looks like at the beginning of the year they did some hard
thinking about where the company should be heading and decided to change
direction. And it looks like they mean it.

The real question, however, is not where Intel does not see its business today
but where it sees its future business.

~~~
dogma1138
They've also spun off Intel Security back into McAfee as a separate company in
which Intel only has a stake.

Intel seems to be going back to being a hardware-only company.

~~~
coredog64
As a counter-example, they bought back Itseez, which had spun off to develop
OpenCV.

------
reality_czech
A lot of commenters here are treating this announcement like it is the end of
Lustre. It is not. Lustre has a LONG history and a lot of deployments. A
partial history:

1999: Lustre is started as a research project at Carnegie Mellon University

2001: Cluster File Systems, Inc. founded to commercialize Lustre

2007: Sun acquires Cluster File Systems

2010: Oracle buys Sun Microsystems, acquires Lustre assets

2010: Eric Barton leaves Oracle and founds Whamcloud, continues Lustre dev there

2012: Intel buys Whamcloud for its Lustre assets

2013: Oracle sells Lustre-related assets to Xyratex

2013: Intel tries to extend Lustre to Hadoop use-cases

2014: Seagate buys Xyratex

2017: Intel discontinues its Lustre filesystem business

I seem to remember IBM and HP being involved at some stage, but I'm having
trouble finding it online now.

The only serious open-source competitors to Lustre are Ceph and glusterfs. But
Ceph is too unstable, and glusterfs 3.0 is based on distributed hash tables
and so is not strongly consistent.

~~~
Infernal
Whatever happened to parallel NFS/pNFS? I know Panasas basically uses this
plus some of their own special sauce, but it hasn't been clear to me whether
there is an open-source implementation of pNFS that could approximate the
performance of Panasas on the right non-proprietary hardware.

------
bigjimslade
IBM operates an internal company-wide storage system called GSA or Global
Storage Architecture, which is GPFS on the servers, with various and sundry
methods of access through automounted NFS, Samba, and a basic web interface. I
wonder if such a system could be constructed with Lustre, one that wouldn't
require a Lustre wizard with a Unix beard to keep it happy.

------
anon263626
We used Lustre, pNFS and GPFS at Stanford on HPC gear (DDN, Panasas and some
enterprise COTS). Lustre has a lot of moving parts and config. Most folks tend
to use Puppet/Chef and/or the Rocks distro to deploy clusters in a somewhat
sane manner. (Sometimes AFS too, but not much.)

These days, Ceph/Gluster might work, but Lustre's proven.

------
pinewurst
It's been amusing to see all the apologia and simple ass-covering following
this.

Lustre is a monstrosity - badly designed, poorly implemented, very hard to
configure and keep running or even get adequate performance under other than a
single limited use case.

Good riddance to bad rubbish.

~~~
arcanus
> or even get adequate performance under other than a single limited use case.

Hmm?

From the article: " In the core HPC market, Lustre has 75 of the Top 100
systems on the Top 500 rankings, and is used on nine of the top ten systems."

That is a strong argument for it being performant. That 'single limited use
case' you mention is... high performance computing.

~~~
pinewurst
That's an artifact of the limited choices available. Plus, the core HPC market
is both notoriously inbred and tolerant of a level of system unreliability and
delicacy that the enterprise wouldn't let in the door.

And no, the "single limited use case" I mean is a relatively small number of
very large files that need to be sequentially streamed out/in as fast as
possible. That's a small subset of HPC.

GPFS is the gold standard but it's expensive.

Lustre is more a byproduct of the HPC vendors trying to get more margin by
using open source. They've drunk from the poisoned chalice in that the amount
of time, money and effort required to get Lustre to acceptability (limited as
that may be) has been far more than the GPFS license/support cost. Even from
the brain-eating zombie that is today's IBM.

~~~
lo_stronzo
> GPFS is the gold standard but it's expensive.

Confirming! And going on a small tangent (ha!).

Our previous configuration was Lustre and XFS/NFS; the former was the scratch
file system for HPC applications and the latter for home directories and what
not.

Lustre was definitely a beast (in a good sense), but we'd occasionally get bit
by a workload high in metadata operations, which would bring Lustre to its
knees due to latency.

XFS/NFS was great for its purpose (no HPC workloads), but we'd also get bit by
a user or users reading seemingly tiny, innocuous files (inadvertently in
parallel), which would cause load averages to spike; surprisingly, the latency
wasn't as bad as with the Lustre workload mentioned above.

Not to drink the GPFS Kool-Aid here, but it has definitely solved both
problems above. It has its issues, but it definitely handles the common I/O
patterns seen on our cluster.

~~~
pinewurst
If I walked you through the Lustre metadata state machine you wouldn't be
surprised by Lustre latency any longer. It's a veritable Rube Goldberg machine
without the amusement.

~~~
lo_stronzo
I believe it.

At least the Lustre developers (pre-Intel) had the foresight to enable
extremely good debugging - you could simply enable a few procfs settings and
easily find the offenders.

It's still amusing to me, however, that the biggest offenders in Lustre
slowdowns were single-core processes. I'd check the RPC counts per node, find
the violator, and then check per-PID statistics; it was always (95%+) a
single-core application performing thousands of calls.
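
In that spirit, a rough sketch of the triage: tally per-operation counts from
the client-side stats files and sort. The paths and line layout here assume
the old procfs interface described above; they vary across Lustre versions
(newer releases moved them under debugfs), so treat this as approximate:

    # Rough triage: sum client-side Lustre op counts to spot RPC-heavy
    # workloads. Paths/format are version-dependent; adjust as needed.
    import glob
    from collections import Counter

    ops = Counter()
    for path in glob.glob("/proc/fs/lustre/llite/*/stats"):
        with open(path) as f:
            for line in f:
                fields = line.split()
                # Typical line: "<op> <count> samples [<units>] ..."
                if len(fields) >= 3 and fields[2] == "samples":
                    ops[fields[0]] += int(fields[1])

    for op, count in ops.most_common(10):
        print("%-20s %d" % (op, count))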

We never did experiment with the 2.x branch of the software. I recall one of
our co-workers stating that even the developers did not believe the dual-MDS
setup was production ready at that time.

------
stargrazer
Open vStorage and Sheepdog probably don't come close to the scale of Ceph or
Lustre?

------
me_again
So now they lack lustre?

I'll get my coat.

