
Fastpass: A Centralized “Zero-Queue” Datacenter Network - jonbaer
http://fastpass.mit.edu/
======
deadgrey19
As context, I am a researcher in datacenter networks at a world renowned
university. I've read the full research paper in detail, several times (not
just the media summary) and have reviewed it carefully with other researchers
in this area in our weekly reading group.

The first thing to note is that this is not "done and dusted" or "accepted as
gospel truth". The work may have been accepted for publication at a top-tier
conference (and that's fantastic!), but this means that only a small fraction
of researchers have actually read the paper. Research publications are part
of a conversation in the research community, an argument, and they should be
read critically.

In that light, reading the work carefully, several things should be noted:

First the good stuff:

1) This work is incredibly well written. It's easy to read, it's sexy and it
sells. This makes it easy for reviewers who often get a lot of poorly written
work. A job well done by the writers.

2) In my (qualified) opinion, the reason this work has been accepted for
publication is that many in the community, including myself, would not have
believed that a centralized arbiter could be built AT ALL. This is quite a
novel thing. The idea is mad, and it seems to have worked.

Now the not so good stuff:

1) This paper is carefully crafted to be slippery. You should be very careful
about the claims the authors actually make, as opposed to what the language
leads you to assume.

2) Fastpass is not "zero queue". It simply moves queuing into other places.
First, into the end host: when a host wants to transmit data, it must queue
those packets, send a message to the arbiter to get a slot, wait for a
response, and then wait for the slot. Second, into the arbiter itself, which
must be able to keep up. This is easy at low load, but these waits grow much
longer at high load and in larger systems (see the first sketch after this
list).

3) The authors never measure or demonstrate the impact of this extra queuing
at high load (or at any load) on end-to-end latency. Using "zero queue" in
the title implies that latency in the system is better, and indeed the MIT
sound bite assumes this, but it's not measured or demonstrated anywhere. My
guess is that the results just aren't better. So they have focused on the
things that look good, and not the dirty details. This seems disingenuous to
me.

4) The Facebook implementation uses only 1 rack (at most about 100 machines,
probably more like 40), which means that there is only ever 1 path that
packets can take. This means that a significant part of their algorithm
(calculating the right path) is never run, reducing the cost.

5) Despite this, the Facebook implementation shows almost no benefit. They
manage to reduce the number of TCP retransmits from 4 per second down to 2 per
second. They never discuss or demonstrate that this has any useful benefit,
and, frankly, I'd be surprised if it did.

6) The headline number of reducing latency by thousands of percent comes only
from a contrived experiment with ping and iperf, extreme ends of the
latency-throughput spectrum with little relation to the real world. The same
result could have easily been achieved by simply setting ping to a high
network priority (see the second sketch after this list).

7) There is no mention of tail latencies for realistic workloads, which is
the real problem the solution is pitched at but never demonstrated against.

8) Scalability is a serious issue for this work; the authors acknowledge it,
but it will limit deployability.

9) Ultimately, this work is sexy and interesting, but the paper never
demonstrates any tangible benefit.
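
To make point 2 concrete, here is a toy, single-process sketch of where the
waiting goes in a Fastpass-style design. Everything here (the names, the
constants, the single-path arbiter) is invented for illustration; it is not
the authors' implementation:

    import time
    from collections import deque

    # Toy sketch of where queuing moves in a Fastpass-style design.
    # All names and numbers are invented; not the authors' code.

    ARBITER_RTT = 0.0001  # assumed round trip to the arbiter, seconds
    TIMESLOT = 0.00001    # assumed length of one transmission timeslot

    class Arbiter:
        """Grants each request the next free timeslots on a single path."""
        def __init__(self):
            self.next_free = time.monotonic()

        def request(self, n_packets):
            # Queuing delay #2: under load, next_free drifts into the
            # future, so senders wait longer and longer for their grants.
            grant = max(self.next_free, time.monotonic())
            self.next_free = grant + n_packets * TIMESLOT
            time.sleep(ARBITER_RTT)  # the request/response round trip
            return [grant + i * TIMESLOT for i in range(n_packets)]

    def send(arbiter, packets):
        backlog = deque(packets)            # queuing delay #1: the end host
        slots = arbiter.request(len(backlog))
        for slot in slots:
            while time.monotonic() < slot:  # queuing delay #3: the slot wait
                pass
            backlog.popleft()               # "transmit" in the granted slot

    send(Arbiter(), [b"data"] * 8)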
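
And for point 6: on Linux an application can get its probe traffic
prioritized without any arbiter by setting the IP TOS/DSCP bits on a socket,
which is roughly all that "setting ping to a high network priority" amounts
to. A sketch (whether switches honor the marking depends on their
configuration; the address and port are placeholders):

    import socket

    # Mark a socket's packets with DSCP EF (Expedited Forwarding, 46);
    # the DSCP value occupies the top six bits of the old TOS byte.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, 46 << 2)
    s.sendto(b"probe", ("192.0.2.1", 9))  # placeholder destination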

~~~
hueving
Great response but you didn't need to start with the chest thumping ("world
renowned university", "(qualified) opinion"). I almost completely skipped
reading it because it makes you sound like a blowhard.

~~~
deadgrey19
Thanks. I guess I'm trying to give some sort of indication that I know what
I'm talking about. Too many people have opinions without any factual basis
for them. I'll try to tone it down in the future.

------
wtallis
A little bit of relevant discussion:

https://lists.bufferbloat.net/pipermail/cerowrt-devel/2014-July/003247.html

------
dekhn
The failure modes of the master-to-secondary arbiter failover process need to
be analyzed a bit more. Especially if you have a packet of doom that takes out
both the master and the secondary: what happens to all the network traffic
when both are gone? Does it degrade to normal TCP? (It didn't look like it.)

------
jpgvm
Cool, but if you really want to do this sort of stuff (and don't mind being
tied to the concept of some sort of arbitrator or route server) then you
should be looking at circuit switched networks like Infiniband, Myrinet,
DolphinLink, NUMALink etc.

HPC has been doing this stuff for decades; how to achieve "zero queuing" is a
fairly well-understood problem.

It's worth mentioning that Infiniband is incredibly affordable for datacenter
networking and has excellent tooling on both Linux and Windows these days.

~~~
greglindahl
None of the networks you mention are circuit switched. They are all packet
switched.

Fibre Channel is about the last example of a circuit-switched network, and
it's pretty much dying.

~~~
jpgvm
Technically correct: Infiniband is actually a variant of virtual cut-through
switching. If one wants to be even more pedantic, one can also point out that
there are implementations of Ethernet that employ the same technology at the
switching level.

What really makes the big difference, though, is the presence of the subnet
manager, which pre-programs the routing information into each switch at
fabric bring-up time. This is what causes Infiniband to act like a
circuit-switched network despite, of course, being VCT at the PHY layer.

~~~
greglindahl
Uh, no. You aren't using those words the way other people do.

------
hexleo
When I saw "Zero-Queue" it scared me. Is it really zero? What kind of
transport layer protocol does this network framework use (D2TCP or ...)?

~~~
wmf
The queues are still there, but scheduling ensures that they don't fill up.

~~~
deadgrey19
Incorrect. The scheduling only ensures that the queues in the network (from
the NIC onwards) don't fill up. However, the queues are still there in the
host and in the arbiter. The authors never measure or demonstrate that these
queues are any shorter, that tail latencies are improved for any real
workload, or that there is any actual benefit to the approach in a real-world
scenario.
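
To make the disagreement concrete: per timeslot, the arbiter admits at most
one packet per source and per destination (the paper frames this as a form of
maximal matching), which is why the in-network queues stay empty. Anything
not admitted keeps waiting at the sender. A toy sketch, not the paper's
pipelined implementation:

    # Admit at most one packet per source and per destination in each
    # timeslot (a greedy maximal matching). Demand that doesn't fit
    # carries over, i.e. it keeps queuing at the sender.
    def allocate_timeslot(demands):
        used_src, used_dst = set(), set()
        admitted, leftover = [], []
        for src, dst in demands:
            if src not in used_src and dst not in used_dst:
                used_src.add(src)
                used_dst.add(dst)
                admitted.append((src, dst))
            else:
                leftover.append((src, dst))  # queued for a later timeslot
        return admitted, leftover

    demands = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")]
    while demands:
        sent, demands = allocate_timeslot(demands)
        print("timeslot:", sent)  # two timeslots, each contention-free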

------
nullspace
This may be a dumb question - but the experiments show latencies of the order
of milliseconds. How would it work when your median latencies are of the order
of 100 - 200 microseconds? At that scale, the effect of the arbiter would be
more pronounced right?

Am I missing something here, or is this not meant for that use case?

~~~
wmf
It should still reduce your tail latency.

~~~
deadgrey19
Dubious. The question is what the tail response time of the arbiter is at
full load. The key measurement (which is conspicuously absent from the paper)
is the impact on end-to-end delay at varying load. A distribution graph of
this would answer the question immediately. My suspicion is that it is no
better, because essentially the same amount of "scheduling work" is being
done regardless of where it is done.
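
The missing figure is cheap to describe: sample end-to-end latencies at
several offered loads and report the full distribution, not just a mean. A
sketch of that analysis (the exponential samples below are synthetic
placeholders; real numbers would have to come from the testbed):

    import random

    # Latency percentiles vs. offered load, on synthetic placeholder data.
    def percentiles(samples, points=(50, 99, 99.9)):
        xs = sorted(samples)
        return {p: round(xs[min(len(xs) - 1, int(len(xs) * p / 100))], 1)
                for p in points}

    for load in (0.1, 0.5, 0.9):          # fraction of line rate offered
        rtts = [random.expovariate(1.0 / (100 * (1 + 5 * load)))
                for _ in range(100_000)]  # placeholder RTTs, microseconds
        print(f"load {load}: {percentiles(rtts)} us")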

------
anko
I don't normally vote articles up but this looks really cool.

It would be good if they had more info about their testing methodology and
also something like a haproxy implementation.

Also I don't see any mention of failure if the arbiter falls over.

~~~
dekhn
Yes, the paper talks about having a secondary arbiter that does watchdog
pings; if the primary dies, the secondary waits for the queues to flush and
then takes over, statelessly. It's a huge hole in the paper.

------
duskwuff
Is it just me, or don't the graphs at the bottom left appear to indicate that
Fastpass achieves "improved fairness" through considerably diminished
performance, especially under low contention?

~~~
ghayes
It took me a while, but I believe this might be one of the few times that a
stacked line chart would better serve the purpose. Total throughput is the
sum of the 5 "per-connection throughputs." Thus, during the peak, we see
~2 Gbps for each of the 5 servers, totaling ~10 Gbps. In the top graph, we
see 1-3 Gbps for the 5 servers, totaling ~10 Gbps. It would be clearer if a
"total throughput" line were added or the author used a stacked line chart,
since we're looking at both total throughput and fairness.

~~~
duskwuff
Right, but I mean the start and end of those graphs. They show 6+ Gbps for
one flow without Fastpass, and around 4 Gbps with it.

~~~
wmf
Yeah, that's fairly poor. It should be 10 Gbps for the first 30 seconds.

------
ssw1n
Just curious: is Amy Ousterhout in any way related to John Ousterhout of
Stanford?

------
wfunction
Is there no queue for the arbiter?

~~~
deadgrey19
Fastpass is not "zero queue". It simply moves queuing into other places.
First, into the end host: when a host wants to transmit data, it must queue
those packets, send a message to the arbiter to get a slot, wait for a
response, and then wait for the slot. Second, into the arbiter itself, which
must be able to keep up. This is easy at low load, but these waits grow much
longer at high load and in larger systems.

