
Where do you get the idea that Google doesn't run its links to saturation? That would be crazy, because leaving that much capacity idle would cost an enormous amount of money.

The B4 paper states multiple times that Google runs links at almost 100% saturation, versus the standard 30-40%. That's accomplished through the use of SDN technology and, even before that, through strict application of QoS.

https://web.stanford.edu/class/cs244/papers/b4-sigcomm2013.p...
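
One way to read the "strict application of QoS" point: latency-sensitive traffic is always served first, and elastic bulk copies soak up whatever capacity is left over, which is what lets a link run hot without hurting the traffic that cares. A minimal Python sketch of strict-priority scheduling (the class names here are mine, not from the paper):

    from collections import deque

    # Illustrative strict-priority link: latency-sensitive packets always go
    # first; elastic bulk traffic is only served when nothing else is waiting,
    # so it can safely be used to push utilization toward 100%.
    class StrictPriorityLink:
        def __init__(self):
            self.queues = {"latency_sensitive": deque(), "bulk": deque()}

        def enqueue(self, pkt, klass):
            self.queues[klass].append(pkt)

        def dequeue(self):
            for klass in ("latency_sensitive", "bulk"):
                if self.queues[klass]:
                    return self.queues[klass].popleft()
            return None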

A few more details about strategies here:

https://research.google.com/pubs/archive/45385.pdf

Then there's a whole bunch of other host-side optimizations, including the use of new congestion control algorithms.

http://queue.acm.org/detail.cfm?id=3022184

You might recognize the name of the last author...
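
If you want to try the algorithm described there (BBR) yourself, it can be enabled per TCP socket on Linux. A minimal Python sketch, assuming a kernel that ships the tcp_bbr module:

    import socket

    # Opt a single socket into BBR congestion control (Linux only; falls
    # back to the system default if the option or module isn't available).
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"bbr")
        print("socket will use BBR")
    except (AttributeError, OSError):
        print("TCP_CONGESTION/bbr not available here; using the default")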




No, it would be crazy for them to run things at saturation under normal circumstances, because that leaves no headroom for abnormal circumstances. The opportunity cost of not using the capacity 100% of the time is offset by the value of increased stability/predictability in the face of changing conditions.

Though you do need to define "saturation": are you referring to raw bandwidth or some other measure of throughput/goodput? Saturating a link in terms of raw bandwidth can reduce useful throughput because of the latency it induces.
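
To make that last point concrete, here's a toy Python calculation using a textbook M/M/1 queueing model (my illustration, not anything from the papers): average delay grows roughly as 1 / (1 - utilization), so a link pushed toward raw-bandwidth saturation can see delays explode, which in turn hurts TCP goodput.

    # Toy M/M/1 model: mean time in system = service_time / (1 - utilization).
    def mean_delay(utilization, service_time=1.0):
        if utilization >= 1.0:
            return float("inf")
        return service_time / (1.0 - utilization)

    for rho in (0.3, 0.5, 0.8, 0.9, 0.99):
        print(f"{rho:.0%} utilized -> mean delay ~{mean_delay(rho):.1f}x service time")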


What I mean is that they do not run their links to saturation in the same way an ordinary ISP would. And because their traffic patterns are very different from an ordinary ISP's, and much, much more geographically distributed, they can do all sorts of fun software tricks. The end result is the same: low/no jitter and no packet loss.

As contrasted with what would happen if you had a theoretical hosting operation behind 2 x 10 Gbps transit connections to two upstreams, and tried to run both circuits at 8 to 9 Gbps outbound 24x7.


For clarity, do you mean that Google can, for example, run at 99% saturation all the time, whereas a typical ISP might average 30-40%, with peaks to full saturation that cause high latency/packet loss when they occur?


Yes, that's about right. Since they control both sides of the link, they can manage the flow from higher up the [software] stack. Basically, if the link is getting saturated, the distributed system simply throttles requests upstream by diverting traffic away from the places that would send it over that link. (Of course this requires a very complex control plane, but it's doable, and with proper [secondary] controls it probably stays understandable and manageable, and doesn't go haywire when shit hits the fan.)
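
A hypothetical Python sketch of that control loop (all names, numbers, and the shed-largest-first policy are my own, not from Google): when offered load on a link approaches capacity, the controller reassigns some flows to an alternate path instead of letting the link drop traffic indiscriminately.

    LINK_CAPACITY_GBPS = 100
    TARGET_UTILIZATION = 0.95

    def rebalance(flows, alternate_link):
        """flows: list of (flow_id, demand_gbps) currently on the hot link."""
        total = sum(demand for _, demand in flows)
        budget = LINK_CAPACITY_GBPS * TARGET_UTILIZATION
        moved = []
        # Shed the largest flows first until the link is back under its target.
        for flow_id, demand in sorted(flows, key=lambda f: -f[1]):
            if total <= budget:
                break
            moved.append((flow_id, alternate_link))
            total -= demand
        return moved  # list of (flow_id, new_link) reassignments

    print(rebalance([("copy-job", 60), ("serving", 30), ("logs", 20)], "link-b"))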


So I wonder if that means they can do TCP flow control without dropping packets.


I guess they do drop packets (it's the best - easiest/cheapest/cleanest - way to propagate pressure back upstream, a.k.a. backpressure), but they watch for drops a lot more vigilantly. Also, as I understand it, they try to separate long-lived connections (between DCs) from internal short-lived traffic. Different teams, different patterns, different control structures.
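
To illustrate the drop-as-backpressure idea (illustrative only, not their actual queueing discipline), here's the simplest possible Python version: a tail-drop queue that discards new packets once its buffer is full; with TCP, that loss is exactly the signal that makes senders upstream slow down.

    from collections import deque

    class TailDropQueue:
        def __init__(self, max_packets=100):
            self.buf = deque()
            self.max_packets = max_packets
            self.drops = 0

        def enqueue(self, pkt):
            if len(self.buf) >= self.max_packets:
                self.drops += 1  # the loss a TCP sender reacts to by backing off
                return False
            self.buf.append(pkt)
            return True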



