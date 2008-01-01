This kind of scale boggles my mind, though I have found I can no longer keep up with all the terminology popping up every day. Posts like these are my only way to learn about the massive scaling that makes modern networks work.
"We leverage our large-scale computing infrastructure and signals from the application itself to learn how individual flows are performing, as determined by the end user’s perception of quality."
Is this implying they are using machine learning to improve their own version of a content delivery network?
a: dark fiber IRUs between cities/metro areas
b: N x 10 and 100 Gbps wavelengths as L2 transport services from city to city, from a major carrier such as Level 3 or Zayo
c: some combination of A and B
and they use that to build backbone links between their own network equipment that they have full control over. Google is its own AS and operates its own transport network around the US 48 states and around the world.
the exact design of what they're doing within their own AS at layers 1 and 2 is pretty opaque unless you happen to be a carrier partner that is willing to violate a whole raft of NDAs. But basically they've built their own backbone to a very massive scale yet without the huge capital expense of actually laying their own fiber between cities.
their network has incredibly low jitter because they don't run their links to saturation, and know EXACTLY what the latency is supposed to be from router interface to router interface between the pairs of core routers that are installed in each major city. Down to five decimal places, most likely. When you have your own dark fiber IRUs and operate your own WDM transport platforms you are in possession of things like OTDR traces for your dark fiber that tell you down to four decimal places the km length of your fiber path.
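To make the "known latency from known fiber length" point concrete, here is a minimal sketch that turns a measured path length into an expected one-way propagation delay. The group index of 1.468 is an assumed typical value for single-mode fiber, and the path length is made up for illustration; neither comes from any real Google span.

```python
# Speed of light in vacuum in km/s (exact, by definition of the metre).
C_VACUUM_KM_S = 299_792.458

# Assumed group index for standard single-mode fiber near 1550 nm.
# Real values vary by fiber type; treat this as a placeholder.
ASSUMED_GROUP_INDEX = 1.468


def one_way_latency_ms(fiber_km: float, group_index: float = ASSUMED_GROUP_INDEX) -> float:
    """Return the propagation delay in milliseconds over fiber_km of fiber."""
    if fiber_km < 0:
        raise ValueError("fiber length must be non-negative")
    speed_km_s = C_VACUUM_KM_S / group_index
    return fiber_km / speed_km_s * 1000.0


if __name__ == "__main__":
    # Hypothetical OTDR-measured path length, used only as an example.
    path_km = 1234.5678
    print(f"{one_way_latency_ms(path_km):.5f} ms one way")
```

This covers propagation delay only. Transponder, switching, and queueing delay would come on top, which is why an operator that keeps queues short can predict the total so tightly.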
It also helps that the sort of people who have 'enable' on the AS15169 routers and core network gear are recruited from the top tier of network engineers and appropriately compensated. If they weren't working for Google they would be working for another major global player like NTT, DT, France Telecom/Orange, SingTel or Softbank.
The B4 paper states multiple times that Google runs links at almost 100% saturation, versus the standard 30-40%. That's accomplished through the use of SDN technology and, even before that, through strict application of QoS.
https://web.stanford.edu/class/cs244/papers/b4-sigcomm2013.p...
A few more details about strategies here:
https://research.google.com/pubs/archive/45385.pdf
Then there's a whole bunch of other host-side optimizations, including the use of new congestion control algorithms.
http://queue.acm.org/detail.cfm?id=3022184
You might recognize the name of the last author...
Though you do need to define "saturation". Are you referring to bulk bandwidth or some other measure of throughput/goodput? Saturating in terms of raw bandwidth can reduce useful throughput due to latency issues.
As contrasted with what would happen if you had a theoretical hosting operation behind 2 x 10 Gbps transit connections to two upstreams, and tried to run both circuits at 8 to 9 Gbps outbound 24x7.
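A rough way to see why running two plain transit circuits at 80 to 90% hurts is a textbook M/M/1 queue. This is a simplifying assumption (real traffic is burstier and real routers have finite buffers), but it shows the shape of the curve: mean delay grows as 1 / (1 - utilization).

```python
def mm1_delay_multiplier(utilization: float) -> float:
    """Mean time in system divided by bare service time, for an M/M/1 queue."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 / (1.0 - utilization)


if __name__ == "__main__":
    for rho in (0.35, 0.80, 0.90, 0.99):
        mult = mm1_delay_multiplier(rho)
        print(f"{rho:.0%} utilized -> {mult:.1f}x the unloaded delay")
```

Under this model a link at 90% sees about ten times the unloaded delay, versus roughly 1.5 times at 35%. Running hot without that penalty needs something beyond a plain FIFO link, such as the traffic classification and central scheduling the B4 paper describes.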
Wow.
edit: for a list of the geographical (OSI layer 1/2) locations where AS15169/google peers, see the following: https://www.peeringdb.com/asn/15169
The metros can be seen in e.g. the 1e100.net hosts in a traceroute. They're usually the closest airport code, so e.g. lhr for London.
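A small sketch of pulling the metro code out of such a hostname. The pattern (three letters followed by digits at the start of the first label) is an assumption based on commonly observed names, not a documented format, and the example hostname is illustrative.

```python
import re
from typing import Optional

# Assumed shape: a three-letter airport-style code, then at least one digit.
_METRO_RE = re.compile(r"^([a-z]{3})\d")


def metro_code(hostname: str) -> Optional[str]:
    """Return the leading three-letter code of a 1e100.net hostname, if any."""
    name = hostname.strip().rstrip(".").lower()
    if not name.endswith(".1e100.net"):
        return None
    first_label = name.split(".", 1)[0]
    match = _METRO_RE.match(first_label)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Illustrative hostname of the kind seen in traceroute output.
    print(metro_code("lhr25s10-in-f14.1e100.net"))
```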
Somebody in China reverse engineered the metro/POP naming and addressing for latency reasons. You can see that, for example, there are at least three POPs in Sydney:
https://docs.google.com/spreadsheets/d/1a5HI0lkc1TycJdwJnCVD...
https://github.com/lennylxx/ipv6-hosts
For example one can determine that EGLL is in Western Europe, UK without having to look up tables to determine that it is Heathrow.
example agg router:
agg1.nyc01.ny.us.ASNUMBER.net
and then individual interfaces and subinterfaces would be defined hierarchically under agg1.
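As a sketch of how that hierarchy can be read mechanically, the parser below splits a name of the form interface.device.site.region.country.domain. The field names and the interface label in the example are hypothetical; only the agg1.nyc01.ny.us.ASNUMBER.net shape comes from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouterName:
    interface: str  # empty string when the name refers to the router itself
    device: str
    site: str
    region: str
    country: str
    domain: str


def parse_router_name(fqdn: str) -> RouterName:
    """Split a hierarchical router FQDN into its assumed components."""
    labels = fqdn.strip().rstrip(".").split(".")
    if len(labels) < 6:
        raise ValueError("expected at least device.site.region.country.domain.tld")
    return RouterName(
        interface=".".join(labels[:-6]),  # anything left of the device label
        device=labels[-6],
        site=labels[-5],
        region=labels[-4],
        country=labels[-3],
        domain=".".join(labels[-2:]),
    )


if __name__ == "__main__":
    print(parse_router_name("agg1.nyc01.ny.us.ASNUMBER.net"))
    # Hypothetical interface label placed under the router name.
    print(parse_router_name("xe-0-0-1.agg1.nyc01.ny.us.ASNUMBER.net"))
```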
For example, if you look at the PeeringDB entry for AS15169 (https://www.peeringdb.com/asn/15169), for the London "metro" there's public peering available on LINX at 3 different POPs, and private peering available at Digital Realty, 3 different Equinix POPs, and 2 Telehouse POPs.
Oh no. I bet this reuse of a name has gone unnoticed internally until now.
This just shows that Android is treated as the ugly step child even within Google (not that this is not already obvious given the state of the Android API).
Espresso delivers two key pieces of innovation. First, it allows us to dynamically choose from where to serve individual users based on measurements of how end-to-end network connections are performing in real time.
Second, we separate the logic and control of traffic management from the confines of individual router “boxes.”
Quick search is showing the 2015 keynote that Amin gave, haven't found the 2017 one yet...
[1] - 2015 ONS Keynote https://www.youtube.com/watch?v=FaAZAII2x0w
And B4: https://www.youtube.com/watch?v=tVNlXg0iN-g
Not really. The OSI model doesn't say anything about where I run my routing algorithm and BGP application vs. where my actual switches are.
"Classical" networking is an artifact of viewing routers/switches as monolithic blocks that embed all of their functionality in one black box. I said BGP application above because that's what it is, an application for distributing/communicating state. The same can be said for many other parts of networking traditionally embedded in the monolithic blob we often call a router.
Label switched fabrics provide inherent NFV, security functions, and allow you to influence paths (i.e. traffic engineer) from applications that are equipped to make decisions based on your priorities, not some rigid vendor implementation.
You will see more of this.
Not every application can put enough context in the request to make it work the way Google is working. Sort of app request context based routing.
Also, they have the advantage that a lot of their apps are like search, in that no consistency is needed. Five consecutive searches for "some query" can return different results each time, with no adverse effects.
That creates a lot of flexibility in routing requests to destinations.
Espresso is making decisions on how to use the application layer, but the underlying layers stay the same.
If you're $BIGASN and you set up an intra building singlemode crossconnect at $BIGCITY to establish settlement free peering (let's say for example a 4 x 10 Gbps bonded 802.3ad circuit) with $OTHERBIGASN, they most assuredly are going to notice if your BGP session and router are not directly on the other end of that cable.
Because they are going to be expecting sub-1ms latency to your router, and not "we're taking this session and stuffing it in some sort of tunnel or encapsulation and sending it somewhere else, to where the thing that actually speaks BGP is located". It's bad juju to practice deceptive peering.
Why should they care?
> It's bad juju to practice deceptive peering.
I don't understand applying moral judgment to a technical design choice.
Two: it's not moral judgment, it's a technical best practice to actually put routers in the city in which you set up new edge BGP sessions. Pretty basic ISP stuff in fact.
In fact, a good system would have a couple of systems handling BGP, with physical location fairly irrelevant, but acting as if they are local to the peer they are talking to.
cough cough what? One of the major challenges for a CDN is predicated upon OSI layers 1 and 2: You need to establish POPs with routers and caching servers geographically distributed near major IX points (L2 peering fabrics, and crucial buildings that host the same IX points, where you can run intra-building fiber crossconnects for network-to-network interfaces to provide settlement free peering to major ISPs). The internet is physically built out of a great deal of equipment at layer 1.
In the case of Google, you need to have a team of people who care about things like cost-effectively building intra-datacenter 100GbE layer 2 connections between Google and large content-sink (eyeball) ISPs such as Charter/TWTC or similar.
Hand waving around and saying "we've built some new software to improve how we efficiently deliver BGP sessions to edge peers" is cool and all, but don't mistake it for some radical change. It is all still built on top of things like 2 megawatt diesel generators, massive battery plants, DWDM line terminals, dark fiber, etc.
that allows all the advertising to draw in ad spend as well as it does.
https://www.youtube.com/watch?v=TLbzvbfWmfY interesting stuff starts around 7 minutes
There is no useful information in here to advance the state of the art, no new ideas, no publicly available implementations (closed or open source). It's just a very high-level architectural view of a large network given by people who are incentivized to present it in the most favorable light. And due to the lack of any concrete details, it's free from critical analysis.
>Espresso delivers two key pieces of innovation. First, it allows us to dynamically choose from where to serve individual users based on measurements of how end-to-end network connections are performing in real time. Second, we separate the logic and control of traffic management from the confines of individual router “boxes.”
The first has been done before at many levels of the network:
* BGP anycast
* DNS responses based on querier
* Done in load balancer
* IGP protocols to handle traffic internally while taking into account link congestion
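Of the techniques listed, "DNS responses based on querier" is the simplest to sketch: look up the resolver's address in a prefix table and answer with the matching POP. The table, POP names, and default below are hypothetical, and the prefixes are RFC 5737 documentation ranges.

```python
import ipaddress
from typing import Dict

# Hypothetical resolver-prefix to POP mapping (documentation prefixes only).
PREFIX_TO_POP: Dict[str, str] = {
    "192.0.2.0/24": "pop-london",
    "198.51.100.0/24": "pop-sydney",
    "198.51.100.128/25": "pop-melbourne",
}


def pop_for_resolver(
    resolver_ip: str,
    table: Dict[str, str] = PREFIX_TO_POP,
    default: str = "pop-default",
) -> str:
    """Pick a POP by longest-prefix match on the querying resolver's address."""
    addr = ipaddress.ip_address(resolver_ip)
    best_len = -1
    best_pop = default
    for prefix, pop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr.version != net.version or addr not in net:
            continue
        if net.prefixlen > best_len:
            best_len, best_pop = net.prefixlen, pop
    return best_pop


if __name__ == "__main__":
    print(pop_for_resolver("198.51.100.200"))
```

The mapping is static and keyed on the resolver rather than the end user, so it cannot react to how connections are actually performing.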
I assume their framework gives them much nicer primitives to work with than the above, which would be an advancement in the field if we could actually see an API or something.
The second is very far from "innovation". This is the essence of SDN and this has been the hottest thing since sliced bread in the networking world since 2008 at a minimum [1] and even earlier if you look at things like the Aruba wireless controller.
1. http://archive.openflow.org/documents/openflow-wp-latest.pdf
[1] https://research.google.com/pubs/Networking.html
For example, it would be cool if it were possible to move shared-client/server-secret checking (e.g. for an HTTP API) out to the edge of Google's network, such that a DDoS attack with invalid packets (secrets) never even reaches the application VM/cluster. DDoS attacks, which force applications offline (by making the app scale up to an unsustainable cost level), could be prevented this way.
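A minimal sketch of what such an edge check could look like, assuming requests carry an HMAC-SHA256 signature of the body computed with the shared secret. This is an illustration of the idea, not an existing Google feature; the function names are hypothetical.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the request body under the shared secret."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def edge_accepts(secret: bytes, body: bytes, presented_sig: str) -> bool:
    """Return True only if the presented signature matches the expected one.

    Both sides are compared as bytes in constant time, so arbitrary
    attacker-supplied strings are rejected without raising.
    """
    expected = sign(secret, body).encode("ascii")
    return hmac.compare_digest(expected, presented_sig.encode("utf-8"))


if __name__ == "__main__":
    secret = b"example-shared-secret"  # placeholder, never hard-code real secrets
    body = b'{"action": "ping"}'
    good = sign(secret, body)
    print(edge_accepts(secret, body, good))      # valid request is forwarded
    print(edge_accepts(secret, body, "0" * 64))  # invalid request is dropped at the edge
```

Requests that fail the check would be dropped before they reach the application, so they cost the app nothing. The edge still has to absorb the raw packet volume.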
Could be a 'smart people problem' -- easier to 'innovate' a new wheel than visualize how existing tools can be remixed to solve the problem.
For an example of remixing, it was apparent 15 years ago that traditional routing was actively harmful for live and VOD video streaming, so we cobbled a mix of the techniques you listed plus a couple more[1] to connect users to edges in real time based on actual real-time end-to-end conditions.
We did it consciously using these two ideas, plus content aware caching, plus one more "magic" but super trivial idea I still haven't seen elsewhere in the wild.
The low tech rethink worked well, handily outperformed proprietary solutions according to the industry perf and SLO metrics companies, and it's been an ongoing surprise to me that it's taken so long for others to rethink or remix the same way.
We filed no patents, as to your point, this all could be argued evident to anyone 'skilled in the art'.
---
1. A couple custom bits: we also had to write an edge server shim as no media server at the time could handle sessions flapping in real time, not to mention the content awareness crucial for giant media files.
> Rather than pick a static point to connect users simply based on their IP address (or worse, the IP address of their DNS resolver), we dynamically choose the best point and rebalance our traffic based on actual performance data. Similarly, we are able to react in real-time to failures and congestion both within our network and in the public Internet.
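The idea in that quote can be sketched as a selector that keeps a smoothed latency estimate per client prefix and POP, picks the current best, and skips POPs marked as failed. This is a toy under stated assumptions (EWMA smoothing, RTT as the only metric, made-up class and POP names), not Espresso's actual algorithm, which is not public.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


class ServingPointSelector:
    """Pick the POP with the lowest smoothed RTT for each client prefix."""

    def __init__(self, alpha: float = 0.2) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self._alpha = alpha
        self._ewma_ms: Dict[str, Dict[str, float]] = defaultdict(dict)
        self._down: Set[str] = set()

    def record(self, client_prefix: str, pop: str, rtt_ms: float) -> None:
        """Fold one RTT measurement into the running average."""
        prev = self._ewma_ms[client_prefix].get(pop)
        if prev is None:
            new = rtt_ms
        else:
            new = self._alpha * rtt_ms + (1.0 - self._alpha) * prev
        self._ewma_ms[client_prefix][pop] = new

    def mark_down(self, pop: str) -> None:
        self._down.add(pop)

    def mark_up(self, pop: str) -> None:
        self._down.discard(pop)

    def best(self, client_prefix: str) -> Optional[str]:
        """Return the healthy POP with the lowest estimate, or None."""
        estimates = self._ewma_ms.get(client_prefix, {})
        healthy = {p: v for p, v in estimates.items() if p not in self._down}
        if not healthy:
            return None
        return min(healthy, key=lambda p: healthy[p])


if __name__ == "__main__":
    sel = ServingPointSelector(alpha=0.5)
    sel.record("203.0.113.0/24", "pop-a", 20.0)
    sel.record("203.0.113.0/24", "pop-b", 30.0)
    print(sel.best("203.0.113.0/24"))
    sel.record("203.0.113.0/24", "pop-a", 60.0)  # congestion toward pop-a
    print(sel.best("203.0.113.0/24"))
```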
https://youtu.be/j_K1YoMHpbk?t=7472
As someone who recently entered this field professionally, I find it amusing that most SDN solutions out there are just permutations of each other playing over marketing buzzwords.
Not too different from "cloud computing" from a few years ago.
PS. "The ability to master complexity is not the same as the ability to extract simplicity" is a good takeaway.
PPS. This is a part of EE 122: https://inst.eecs.berkeley.edu/~ee122/fa12/class.html
PPPS. PDF for SDN lecture: https://inst.eecs.berkeley.edu/~ee122/fa12/notes/25-SDN.pdf
[0] https://tools.ietf.org/html/rfc3258
For a company like Google, you would most likely have an edge towards your servers as well as an edge towards your peering partners. The peering edge is therefore the part of the network that is used to connect to BGP peering partners.
