The idea of a CDN is that you don't route everything through Google's data centers, you just pretend to and fetch from a nearby mirror. This really does save you bandwidth - a lot of bandwidth.
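
Roughly, the mirror-selection step looks like the sketch below. The mirror names and RTT figures are made up, and real CDNs steer clients with DNS or anycast rather than an explicit lookup like this; it's only meant to show the "closest copy wins, otherwise fall back to the origin" idea.

    # A minimal sketch of "fetch from a nearby mirror", assuming a made-up
    # table of mirrors and round-trip times.
    mirrors_rtt_ms = {
        "neighborhood-cache": 0.5,
        "regional-pop": 8.0,
        "origin-datacenter": 40.0,
    }

    def pick_mirror(locations_with_object):
        """Return the closest location that actually holds the object."""
        candidates = {m: rtt for m, rtt in mirrors_rtt_ms.items()
                      if m in locations_with_object}
        # Fall back to the origin if nothing nearby has a copy.
        return min(candidates, key=candidates.get) if candidates else "origin-datacenter"

    print(pick_mirror({"neighborhood-cache", "origin-datacenter"}))  # neighborhood-cache
    print(pick_mirror({"origin-datacenter"}))                        # origin-datacenter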

Having some of the mirrors you use for this be in people's homes is an interesting twist that I had not thought of.




Why stick the mirror in the customer's home, though, where you have to mirror it once per customer? It basically becomes the same as a browser cache then. The logical place for it would be the local switching office (or whatever the equivalent is for fiber technology) - no idea if that's how it actually works, but that's where my naive knows-little-about-networking-besides-the-TCP/IP-stack mind would put it.


The idea is that one customer's request could be served from a mirror sitting at another customer's home.

The saving compared with putting it at a local switching office is that you don't need to buy a set of special machines and hard drives to sit at that office. Instead you're leveraging underutilized machines that people have already paid you for, sitting at their houses.

As far as the rest of the network is concerned, the traffic never needs to touch it, so everyone upstream is happy.


That seems highly inefficient, though. Now the request has to go up to the switching office and back down to another customer's home, instead of just up to the switching office. Why not just stick the cache at the switching office and eliminate one of those hops, cutting out some latency too? Disk and even memory are cheap...hell, as Patrick's preso pointed out, one of the main reasons for this project is that bandwidth performance is not increasing nearly as fast as CPU, disk, and memory.


It is less efficient on time. But the speed-of-light delays back and forth over your local network are still much smaller than the time it takes to go anywhere interesting, like a data center. So it is still a win for the consumer. (Albeit less of one than having a separate set of equipment at the switching office just for caching stuff.)
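
For a rough sense of the scale involved: assuming light in fiber covers about 200 km per millisecond and ignoring queueing and per-hop delays, the propagation numbers come out like this (distances are illustrative).

    # Back-of-the-envelope propagation delay; light in fiber covers roughly
    # 200 km per millisecond (about two-thirds of c).
    FIBER_KM_PER_MS = 200.0

    def rtt_ms(distance_km):
        """Round-trip propagation time over a given one-way distance."""
        return 2 * distance_km / FIBER_KM_PER_MS

    print(rtt_ms(2))     # neighbor a couple of km away: ~0.02 ms
    print(rtt_ms(1500))  # distant data center: ~15 ms, before any per-hop delays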


But that's not how fiber works. At some point, it still has to go back to a central office. You can't just connect to your neighbors directly; it all ultimately goes through a switch/router somewhere.


The point is that it doesn't save overall bandwidth used, it saves bandwidth on shared/contended resources. If you have, e.g., a switch with 10 1Gb ports[1] and one 1Gb uplink, and 4 of those ports are doing something intensive enough to saturate that uplink, then someone who requests, say, a full download of the Gmail client could be served strictly across the switch from one of the other 6 local Google boxes that has it cached, at lower latency and with less impact on other people than going through the uplink to the nearest cache.
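
To put rough numbers on that, using the same illustrative figures (not real hardware):

    # Rough arithmetic for the 10-port / 1 Gb uplink example above.
    UPLINK_GBPS = 1.0
    PORT_GBPS = 1.0

    def fair_share_gbps(flows_on_uplink):
        """Naive equal split of the uplink among competing flows."""
        return UPLINK_GBPS / flows_on_uplink if flows_on_uplink else UPLINK_GBPS

    print(fair_share_gbps(4))  # 4 flows contend for the uplink: ~0.25 Gb/s each
    # Serve one of them from a neighbor's cached copy across the switch instead:
    print(PORT_GBPS)           # the cache hit gets the full port speed
    print(fair_share_gbps(3))  # and the remaining 3 uplink flows get ~0.33 Gb/s each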

Now, you are right that this could also be done in the switch closet. However, since traffic to such a cache would either have to go through the uplink, or every switch would need a port dedicated to a cache network/box, it would start getting expensive at switching points. Each would start looking like a mini (micro? nano?) data center. At that point, you could just eat that cost, or ask "what are alternatives that cost the same or less in capex and opex?" Perhaps with Google's network-fu, they have solved similar problems in data centers already, and said "we can use our caching/routing stuff here, put a small capex increase into each customer box, which we also need no matter what, and decrease switching-point capex, and since it is a simpler network, reduce opex too".

Essentially, it is a similar problem to the one bittorrent solves, just at a different scale/locality. It also starts to look like the solutions some vendors/ISPs looked into at one point for bittorrent: instead of stopping bittorrent, keep a map of local peers seeding segments and reroute requests for those segments to the local network rather than across the uplink (rough sketch below the footnote).

[1] assume a decent switch with a full mesh backplane. Also assume real switches will be used with real numbers, not my example ones - the analysis will be the same, but the numbers will of course be different.
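
A hypothetical sketch of that segment-map rerouting, with every name in it invented for illustration:

    # Keep a map of which local boxes hold which segments, and answer
    # requests from the local network when possible.
    segment_holders = {
        "chunk-0017": ["home-box-12", "home-box-41"],  # cached locally
        "chunk-0018": [],                              # no local copy
    }

    def route_request(segment_id):
        holders = segment_holders.get(segment_id, [])
        if holders:
            return "serve from %s over the local network" % holders[0]
        return "fetch through the uplink from the remote cache"

    print(route_request("chunk-0017"))
    print(route_request("chunk-0018"))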


And with an uplink speed that equals your downlink, this peer-to-peer CDN becomes even more practical.


The thing is, the route to any "mirror" that sat in a person's home would be through Google's network. This wouldn't save them any money.


It does save them money, which is why every large ISP does it.

The fact that makes it work is that not all routes through Google's network are created equal.

Routes that go to and from data centers go a longer distance, through more pieces of equipment, and include busy backbones that you do not want to get overloaded. Routes that stay in a local neighborhood go a short distance and put load on one router which should be able to take it, and totally skip the critical backbone.

From the point of view of the network operator, going to a data center is slow and expensive. Keeping traffic inside a local neighborhood is fast and cheap. Thus they want as much traffic as possible to go the fast and cheap route.

CDNs cache data on local mirrors and route traffic to them whenever possible, because that is faster and cheaper than going all the way to a data center. Every large ISP does this, and it would be shocking if Google didn't follow suit.

But actually caching data on hardware that is sitting at customers' houses is an interesting twist.
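
To make the cost asymmetry concrete, here's a toy comparison; the hop counts and the backbone weighting are invented, and only the ordering matters:

    # A toy cost model: backbone routes cross more equipment and more
    # contended links than routes that stay in the neighborhood.
    routes = {
        "stay-in-neighborhood": {"hops": 1,  "crosses_backbone": False},
        "regional-cache":       {"hops": 5,  "crosses_backbone": False},
        "data-center":          {"hops": 12, "crosses_backbone": True},
    }

    def relative_cost(name):
        r = routes[name]
        # Weight backbone use heavily, since that's the resource the
        # operator least wants to overload.
        return r["hops"] + (10 if r["crosses_backbone"] else 0)

    for name in routes:
        print(name, relative_cost(name))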


Yes, and putting a CDN node where the fiber terminates seems much much simpler (they might already do it for TV feeds anyway).



