
Fabric, the next-generation Facebook data center network - jamesgpearce
https://code.facebook.com/posts/360346274145943/introducing-data-center-fabric-the-next-generation-facebook-data-center-network
======
epistasis
This general spine-leaf construction, and the super-spine built on top of it,
come up frequently in recent networking conference talks. Using ECMP on top of
OSPF/BGP is a very well-established way to build super-switches that scale to
very large fabrics.
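
To illustrate what ECMP gives you in that construction (a toy sketch, not from
the article; the names and hash choice are purely illustrative): each leaf
learns several equal-cost routes to a destination via BGP/OSPF and pins each
flow to one spine uplink by hashing the flow 5-tuple, so traffic spreads
across the fabric without reordering packets within a flow.

    import hashlib

    def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, proto, uplinks):
        # Hash the flow 5-tuple so every packet of a flow takes the same
        # equal-cost path, while different flows spread across the uplinks.
        key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
        digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
        return uplinks[digest % len(uplinks)]

    # Hypothetical leaf with four spine-facing uplinks:
    spines = ["spine-1", "spine-2", "spine-3", "spine-4"]
    print(ecmp_next_hop("10.0.1.5", "10.0.9.7", 42344, 443, 6, spines))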

I'd be really interested in the specifics that they don't describe very well
regarding cable layouts and automated configuration of pods.

Also, for anybody stuck in the old paradigm of super-expensive inflexible
switches from the traditional network vendors, be sure to check out the
commodity stuff that was mentioned previously in this HN thread:

[https://news.ycombinator.com/item?id=8400953](https://news.ycombinator.com/item?id=8400953)

~~~
dfox
These topologies (both the logical and physical arrangement) were quite
popular for non-Ethernet networks in the '90s (in both supercomputers and
phone switches), as they are a reasonable way to get a large fabric with
reasonable performance out of relatively small switching elements, with
reasonable cable routing between them.

------
georgyo
> What’s different is the much smaller size of our new unit – each pod has
> only 48 server racks

48 racks seems pretty darn large by itself, and that is the smallest unit they
deal with. At only 20 servers per rack, that's 960 servers in their smallest
unit. And they make it seem like there are hundreds of these pods in a single
datacenter...
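
Back-of-envelope on that (20 servers per rack is the figure above; 200 pods
for "hundreds" is my own purely illustrative guess; only the 48 racks per pod
comes from the article):

    servers_per_rack = 20        # figure from above, not from the article
    racks_per_pod = 48           # from the article
    pods_per_datacenter = 200    # "hundreds", illustrative only

    servers_per_pod = servers_per_rack * racks_per_pod        # 960
    servers_per_dc = servers_per_pod * pods_per_datacenter    # 192,000
    print(servers_per_pod, servers_per_dc)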

A single pod is bigger than the vast majority of the top 500 supercomputers...

~~~
nbm
The previous smallest unit was a "cluster" - imagine, for the sake of example,
that it is the same number of racks as 3 pods. Some time ago, clusters were
somewhat arbitrarily limited in size by a few things - human understanding was
definitely one, along with management software and visualization, network
layout and port density issues, and so forth. However, each cluster had a
bunch of overhead associated with it that outweighed the benefits, including
the primary one - the failure domain. If we only needed one more pod's worth
of servers, we would have to add a cluster with 3 pods' worth of racks.
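
As a purely illustrative sketch of that granularity problem, using the
hypothetical 3-pods-per-cluster figure above:

    racks_per_pod = 48       # from the article
    pods_per_cluster = 3     # hypothetical figure from above

    needed_racks = 1 * racks_per_pod                  # need one pod: 48 racks
    added_racks = pods_per_cluster * racks_per_pod    # must add a cluster: 144 racks
    print(needed_racks, added_racks)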

I don't know the actual strategy (I work on a nearby team, but my focus is
mostly on load balancing and CDN infrastructure), but one could imagine that
in the future it may be more normal to augment existing clusters/failure
domains (say, add one pod) rather than build whole new ones.

------
trhway
Their datacenter schematic reminds me of the schematic of a big server from 15
years ago. Server racks instead of CPU boards. "The datacenter is the
computer."

