
Network topologies for large-scale compute centers: It's the diameter (2016) - blopeur
http://htor.inf.ethz.ch/publications/index.php?pub=251
======
SilasX
PSA: in graph theory, the diameter of a network is the length of the longest
shortest path over all pairs of nodes (i.e., compute the shortest path between
every pair of nodes, then take the longest of those).

It’s a natural generalization of the definition for circles: if you modeled a
disk as arbitrarily many nodes, each connected to its immediate neighbors, the
two definitions would be equivalent.

Not sure if this is too well known to post.
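
For what it’s worth, a quick illustrative sketch of that definition (not from
the talk): compute the diameter of an unweighted graph as the longest BFS
eccentricity over all nodes.

    from collections import deque

    def diameter(adj):
        # adj: {node: [neighbors]} for an unweighted, connected graph.
        # Diameter = the longest of all pairwise shortest paths.
        def eccentricity(src):
            dist = {src: 0}
            q = deque([src])
            while q:
                u = q.popleft()
                for v in adj[u]:
                    if v not in dist:
                        dist[v] = dist[u] + 1
                        q.append(v)
            return max(dist.values())   # distance to the farthest node
        return max(eccentricity(n) for n in adj)

    # 6-node ring: the farthest pair is 3 hops apart, so diameter == 3
    ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
    print(diameter(ring))  # 3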

------
espeed
For some reason the above URL doesn't include a link to the paper [1], just
the embedded video [2] of Torsten's talk that MSR posted a few days ago and a
link to the slides; this URL, however, includes everything (slides, video, and
paper):
[https://htor.inf.ethz.ch/publications/index.php?pub=187](https://htor.inf.ethz.ch/publications/index.php?pub=187)

[1] Slim Fly: A Cost Effective Low-Diameter Network Topology (2014) [pdf]
[https://htor.inf.ethz.ch/publications/img/sf_sc_2014.pdf](https://htor.inf.ethz.ch/publications/img/sf_sc_2014.pdf)

[2] Network Topologies for Large-scale Datacenters: It's the Diameter, Stupid!
(2016) [video]
[https://www.youtube.com/watch?v=F8F0JN6X0fE](https://www.youtube.com/watch?v=F8F0JN6X0fE)

------
francoisLabonte
What is really missing here is that you can build modular switches out of
multiple chips far more cost effectively than connecting single-chip boxes,
mostly because the internal links in a modular chassis run over connectors and
are a lot cheaper than any cable or optics. This makes a Clos topology, with
Top of Rack switches on copper cables and spines built from modular switches
(up to 576x100G ports in today's technology), the winner every time for
datacenters. The supercomputing world is still stuck on requirements for
extreme low latency and hence hitched to Infiniband or other specialty
networks built from devices with low port counts.
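
To put rough numbers on that (the parameters below are illustrative
assumptions, not figures from the talk or the paper), a two-tier leaf-spine
Clos with 576-port modular spines and copper-cabled ToRs sizes out roughly
like this:

    # Illustrative leaf-spine sizing; every parameter here is an assumption.
    spine_ports = 576          # 100G ports on one modular spine chassis
    num_spines = 4             # each ToR runs one uplink to every spine
    tor_host_ports = 48        # in-rack copper ports per ToR

    max_tors = spine_ports                 # one spine port per ToR, per spine
    hosts = max_tors * tor_host_ports      # 27,648 host ports
    fabric_links = max_tors * num_spines   # 2,304 leaf<->spine links

    print(max_tors, hosts, fabric_links)   # 576 27648 2304

The point being that only the 2.3K fabric links need cables/optics between
racks; the host-facing links all stay on cheap in-rack copper.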

~~~
rnxrx
Working from the 10K end-host number in the paper, with a radix of 43 and 722
total router nodes, a given switch connects ~15 end-hosts. Fifteen downlinks
to 43 uplinks seems pretty wasteful in its own right.

The numbers you cite are closer to realistic: a modified Clos fabric
consisting of a four-way spine of 8-slot chassis with 36x100G blades,
connecting a full complement of 288 leaves, each with 4x100G up and 40x10G
down (no oversubscription, at least nominally), leaves us with 11.5K hosts
connected by a total of 292 switches. Even if the modular switches cost 10X as
much as the ToRs, this is still markedly cheaper than 700+ ToRs (and that
doesn’t include actual power/cooling costs or the opportunity cost of space
lost).

This also doesn’t include cabling/transceiver cost differences: 722 * 43 * 2
(~62K) vs 288 * 4 * 2 (~2.3K). That’s well over an order of magnitude of
difference.

Even if the design were approached using single-speed connections (e.g.,
48x10G up spread over 16 spines, 48x10G down locally) there’s still a pretty
compelling numerical advantage to Clos: ~210 96x10G ToRs and 16 16-slot
chassis for 10K hosts is 226 devices and ~20K transceivers. If we assume the
spines cost 10X the leaves, then 370X is still about half of 722X.
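
For reference, the arithmetic above as a quick script (every figure is lifted
from the comparison above, nothing new):

    # Slim Fly config from the paper: 722 routers, radix 43, ~10K hosts
    sf_transceivers = 722 * 43 * 2          # 62,092  (~62K)

    # Modified Clos: 4 spines (8-slot, 36x100G blades) + 288 leaves,
    # each leaf 4x100G up / 40x10G down
    clos_hosts = 288 * 40                   # 11,520  (~11.5K)
    clos_devices = 4 + 288                  # 292
    clos_transceivers = 288 * 4 * 2         # 2,304   (~2.3K)

    # Single-speed variant: ~210 96x10G ToRs + 16 chassis spines,
    # with a spine assumed to cost 10x a ToR
    devices = 210 + 16                      # 226
    relative_cost = 210 + 16 * 10           # 370 (vs 722 ToR-equivalents)
    uplink_transceivers = 210 * 48 * 2      # 20,160 (~20K)

    print(sf_transceivers, clos_transceivers, relative_cost)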

~~~
dgaudet
more Clos advantages:

- simplified operations: you can service/remove/add ToRs without affecting
global routing or the forwarding capacity of the fabric.

- ToR uplink flexibility: if a rack has higher bandwidth needs, then double up
the links (4 vs. 8 in your example).

- ToR location mobility: it's a lot easier to manage 4 fiber runs per ToR than
it is to manage 43 different fiber runs... backhaul to 4 spine blocks vs. a
complex web of interconnect spreading all over the datacenter floor. With the
fly network it's unlikely you can move a rack once it has been placed, at
least not until you're ready to tear down the whole fabric and build something
new, so you'd better get your rack density and layout just right. With Clos
you're stuck with your spine locations, but everything else can be moved
around.

The fly advantages for homogeneous supercomputers built and decom'd N years
later are clear... but for datacenters that grow and evolve with heterogeneous
devices, fly doesn't seem to hold up well compared to Clos.

------
jabl
AFAIU one major advantage of Clos/fat tree networks is that you'll do quite
fine even with relatively dumb static routing protocols.

Slim Fly, Dragonfly, and other fancy network topologies tend to require
adaptive non-minimal routing to handle adversarial traffic patterns, which
Infiniband (or Ethernet, for that matter) doesn't support.
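
As a rough illustration of what "dumb static routing" buys you (a sketch, not
any real switch's implementation): on a Clos, every uplink reaches every
destination at equal cost, so a leaf can hash each flow onto an uplink once
and never revisit the decision.

    import hashlib

    def ecmp_uplink(flow, num_uplinks):
        # Static ECMP: hash the 5-tuple, pick one of the equal-cost uplinks.
        # No per-packet or congestion-aware (non-minimal) decision is needed.
        key = "|".join(map(str, flow)).encode()
        digest = hashlib.sha256(key).digest()
        return int.from_bytes(digest[:4], "big") % num_uplinks

    flow = ("10.0.1.7", "10.0.9.3", "tcp", 40312, 443)
    print(ecmp_uplink(flow, 4))   # the same flow always picks the same uplink

The low-diameter topologies have very few minimal paths between a given pair
of switches, which is why adversarial patterns force the adaptive, non-minimal
(Valiant/UGAL-style) routing mentioned above.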

------
323454
I'll just leave this here
[https://en.m.wikipedia.org/wiki/Butterfly_network](https://en.m.wikipedia.org/wiki/Butterfly_network)

~~~
wmf
Not that useful of a comment. The paper/video describes a new topology called
Slim Fly that is more efficient than the butterfly network.

