
A New Age in Cluster Interconnects Is Dawning - Katydid
http://www.nextplatform.com/2015/11/22/a-new-age-in-cluster-interconnects-dawns/
======
espeed
Seastar (https://github.com/scylladb/seastar/wiki) is a kernel-bypass
networking stack that you can use in the cloud, developed by the team behind
the KVM hypervisor (http://www.scylladb.com/technology/network/).
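
To give a feel for the programming model, a minimal Seastar program looks
roughly like the sketch below (based on the project's tutorial; header paths
and option handling have shifted between releases, so treat it as
illustrative):

    // Minimal Seastar "hello world": the framework owns the event loop and
    // the (optionally kernel-bypass) networking; application code runs as
    // continuations on futures.
    #include <seastar/core/app-template.hh>
    #include <seastar/core/future.hh>
    #include <iostream>

    int main(int argc, char** argv) {
        seastar::app_template app;
        // run() parses Seastar's options, starts a reactor per core, and
        // invokes the lambda on core 0; its return value is the exit status.
        return app.run(argc, argv, [] {
            std::cout << "Hello from the Seastar reactor\n";
            return seastar::make_ready_future<>();
        });
    }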

~~~
greglindahl
The HPC community has been using OS bypass networking for close to 2 decades.

------
jpgvm
Infiniband is really great stuff. Unfortunately, kernel support has been
spotty at times, especially for the SCSI RDMA protocol (SRP) initiator and
IPoIB.

That said, if you have those issues sorted, it's insanely fast and reliable,
and it has great diagnostic and performance tools available.

I have long wished for it to displace Ethernet in the datacenter but I have
basically given up hope on that for now.

Infiniband is likely to live on not as an interconnect but rather as an L3/L4
protocol suite for implementing RDMA over lossless Ethernet (RoCE).
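
For what it's worth, the verbs API is the same whether the transport
underneath is Infiniband or lossless Ethernet, which is part of why the
protocol layer can outlive the wire. A rough sketch of enumerating RDMA
devices with libibverbs (link with -libverbs; error handling trimmed):

    // Enumerate RDMA-capable devices (IB HCAs or RoCE NICs) via libibverbs.
    #include <infiniband/verbs.h>
    #include <cstdio>

    int main() {
        int num = 0;
        ibv_device** devices = ibv_get_device_list(&num);
        if (!devices) {
            std::perror("ibv_get_device_list");
            return 1;
        }
        for (int i = 0; i < num; ++i) {
            ibv_context* ctx = ibv_open_device(devices[i]);
            if (!ctx) continue;
            ibv_device_attr attr{};
            if (ibv_query_device(ctx, &attr) == 0) {
                // Same query regardless of whether the port speaks IB or RoCE.
                std::printf("%s: %d port(s), max_qp=%d\n",
                            ibv_get_device_name(devices[i]),
                            attr.phys_port_cnt, attr.max_qp);
            }
            ibv_close_device(ctx);
        }
        ibv_free_device_list(devices);
        return 0;
    }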

~~~
greglindahl
The Infiniband market is changing -- Intel basically stopped trying to be
compatible with it in its new 100 Gb/s fabric (Omni-Path), which is on-package
on the new Xeon Phi. In practice, there were only two vendors, so it's no
surprise that they're headed in opposite directions.

One of IPoIB's problems was that it was made extremely complicated (by adding
connected mode) when only one of the two implementations benefited from that
complication. Meanwhile, the mild complication that worked great on the other
implementation never became part of the standard.
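
(Aside: on Linux the mode an IPoIB interface is running in is exposed through
sysfs, so a quick check looks something like the sketch below -- assuming an
interface named ib0.)

    // Report whether an IPoIB interface runs in "datagram" or "connected"
    // mode. Assumes an interface named ib0; the "mode" attribute is exposed
    // by the ipoib kernel driver under sysfs.
    #include <fstream>
    #include <iostream>
    #include <string>

    int main() {
        std::ifstream mode_file("/sys/class/net/ib0/mode");
        std::string mode;
        if (mode_file >> mode) {
            std::cout << "ib0 IPoIB mode: " << mode << "\n";
        } else {
            std::cerr << "ib0 is not an IPoIB interface (or ipoib not loaded)\n";
        }
        return 0;
    }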

~~~
vonmoltke
My impression, having worked with Infiniband for several years a few years
ago, is that the drive was always to make it a supercomputing fabric rather
than a general-purpose cluster fabric. The goal was to unseat Myrinet and
other low-level technologies for MPI jobs where the job wouldn't be using TCP
or IP for the communication. IPoIB got tacked on to placate the market segment
that wanted to use tools that communicated with TCP/IP but wanted something
faster than gigabit ethernet.

Unfortunately for the few players in the space, the cluster ecosystem is
moving away from those tightly-coupled systems to more flexible systems that
can tolerate higher communication overheads.

~~~
greglindahl
Actually, IPoIB was a customer requirement. In the Myrinet era, people
frequently built clusters with a very poor Ethernet and used IP over Myrinet
to access storage, etc.

I don't agree with you about the way the ecosystem is headed. Tightly-coupled
problems are tightly-coupled, period. It's true that there is a lot of
interest in more loosely-connected clusters, but that doesn't get rid of the
existing market.

~~~
vonmoltke
> Actually, IPoIB was a customer requirement. In the Myrinet era, people
> frequently built clusters with a very poor ethernet, and used IPoverMyrinet
> to access storage etc.

Interesting. I know we had a general-purpose cluster wired together with IPoIB
just because it was the fastest option at the time. The production systems we
were building would have IB for main communications using RDMA primitives in
our middleware and a gigabit ethernet maintenance network for coordination and
monitoring.

> I don't agree with you about the way the ecosystem is headed. Tightly-
> coupled problems are tightly-coupled, period. It's true that there is a lot
> of interest in more loosely-connected clusters, but that doesn't get rid of
> the existing market.

My point was more that MPI forces a greater degree of coupling to the cluster
architecture than some problems require, simply because of the way MPI was
designed. Maybe I am overestimating the portion, but some share of the
problems solved with MPI before more modern MapReduce frameworks existed were
handled that way only because the problem needed the throughput of a cluster
and MPI was the only way to get it without writing your own network
middleware. That portion of the market is going away.
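
To make the coupling concrete, here is a generic MPI sketch (not from any
particular system): every process has a fixed rank, and collectives like
MPI_Allreduce assume the full set of ranks participates in lockstep, which is
exactly what makes elastic, loosely-coupled scheduling awkward:

    // Generic MPI example: fixed ranks and a blocking collective.
    // Build with mpicxx; launch with mpirun -np <N>.
    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);

        int rank = 0, size = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        // Every rank contributes a value; MPI_Allreduce blocks until all
        // ranks in the communicator participate -- losing or adding a node
        // mid-job is simply not part of the model.
        double local = static_cast<double>(rank + 1);
        double total = 0.0;
        MPI_Allreduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0) {
            std::printf("sum over %d ranks = %.1f\n", size, total);
        }

        MPI_Finalize();
        return 0;
    }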

~~~
greglindahl
In my HPC career, I never met such a customer, other than some oil and gas
folks doing the usual prestack depth migration algorithm, which is not a
tightly-coupled problem.

