
Google reveals details about its datacenters - toddh
http://highscalability.com/blog/2015/8/10/how-google-invented-an-amazing-datacenter-network-only-they.html/
======
lorenzhs
The key takeaway for algorithms research seems to be that "[w]e don’t know how
to build big networks that deliver lots of bandwidth". This is exactly what S.
Borkar argued in his IPDPS'13 keynote [1]. An exascale cluster can't be cost-
efficient unless its bisection bandwidth is highly sublinear in the cluster's
computing power.

We need new algorithms that:

- require communication volume and latency significantly sublinear in the
  local input size (ideally polylogarithmic)
- don't depend on randomly distributed input data (most older work does)

It's really too bad that many in the theoretical computer science community
think that distributed algorithms were solved in the 90s. They weren't.

[1]
[http://www.ipdps.org/ipdps2013/SBorkar_IPDPS_May_2013.pdf](http://www.ipdps.org/ipdps2013/SBorkar_IPDPS_May_2013.pdf)
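To make the first requirement concrete, here is a minimal Python sketch (a toy
example of my own, not taken from the keynote): a hypercube-style all-reduce
computes a global sum while each rank communicates only O(log p) words, no
matter how large its local input is.

```python
def allreduce_sum(local_values):
    """Simulate a hypercube all-reduce over p (power-of-two) ranks.

    Each rank first reduces its local data, then exchanges partial
    sums with log2(p) neighbours. Communication per rank is O(log p)
    words -- independent of the local input size, i.e. the kind of
    sublinear communication volume described above.
    """
    p = len(local_values)
    assert p & (p - 1) == 0, "p must be a power of two"
    sums = [sum(v) for v in local_values]  # local pass, no communication
    words_sent = [0] * p
    step = 1
    while step < p:
        new = sums[:]
        for rank in range(p):
            partner = rank ^ step          # hypercube neighbour
            new[rank] = sums[rank] + sums[partner]
            words_sent[rank] += 1          # one word sent per round
        sums = new
        step <<= 1
    return sums, words_sent

# 8 ranks with 1000 local elements each: every rank still sends
# only log2(8) = 3 words to learn the global sum.
locals_ = [[1] * 1000 for _ in range(8)]
totals, sent = allreduce_sum(locals_)
```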

------
GauntletWizard
One of the biggest things I've had to unlearn as an SRE leaving Google is
this: RPC traffic is not free, fast, and reliable (inside Google it is, so
long as you don't go cross-datacenter). For most companies it is expensive and
slow. Facebook's networks are still designed for early-2000s-era topologies,
and their newer topologies won't fix that; they've still got way too little
top-of-rack bandwidth to the other racks nearby.

Microsoft hasn't even caught on yet, and is still designing for bigger and
bigger monolithic servers. I can't tell what Amazon is doing, but they seem to
have the idea with ELBs at multiple layers.

~~~
InclinedPlane
This is what continues to shock me about the tech industry. Google started
revolutionizing the way data centers work more than 15 years ago, and it's a
huge reason why they are able to execute on ambitious projects so well and
serve up content with a ton of compute behind it incredibly fast yet also
incredibly cheaply. That cuts right to the bottom line, yet nobody else has
copied Google yet. I think it's a huge indication of just how immature the
entire industry truly is. It's difficult enough to get anything done these
days, let alone get something exceptional done well. We're only just a little
bit out of the deep depths of the "software crisis", when the ability to run a
successful software project was hit or miss. But we're still in the software
crisis shallows, and it'll be a long time before we make landfall.

~~~
threeseed
> nobody else has copied google yet

Why would they?

Almost nobody is operating at Google's scale, and interconnects like
InfiniBand are rarely saturated to the point that exotic solutions are needed.
Fast RPC is nice and all, but (a) not that many companies are using
microservices and (b) not that many are scaling out to billions of requests
per second.

And the 20 or so companies who are operating at Google's scale are hardly
floundering around blind. Facebook has done very interesting work in the
infrastructure space, LinkedIn is amazing on the core software side and
Netflix has pioneered the use of the cloud.

~~~
dekhn
The two examples I think matter are Amazon and Microsoft. I don't think they
copied Google - I think they independently realized, when they decided to
enter the cloud services business, that they had to scale their data centers
up and out tremendously. And when you do that, you quickly learn that
deploying more hosts is easier than deploying the ports to interconnect those
hosts with a globally nonblocking fabric.

Amazon and MSFT have both published nice papers showing that they "got it"
about 3-5 years ago. I think they operate within an order of magnitude of
Google in terms of total external and internal traffic, but that's just a
guess because none of these companies publish enough detail to say for sure.
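To put rough numbers on the port problem, here's a sketch using the textbook
3-tier k-ary fat-tree formulas (the classic Clos construction; these are
generic numbers, not anything these companies have published about their
actual fabrics):

```python
def fat_tree(k):
    """Host/switch/port counts for a 3-tier fat-tree built from
    k-port switches: k^3/4 hosts, 5k^2/4 switches, 5k^3/4 switch
    ports. Even this efficient nonblocking design needs ~5 switch
    ports (and the cabling behind them) per host, across thousands
    of switches at scale."""
    hosts = k ** 3 // 4
    edge = agg = k ** 2 // 2
    core = k ** 2 // 4
    switches = edge + agg + core
    switch_ports = switches * k
    return hosts, switches, switch_ports

# Doubling the switch radix grows the fabric 8x:
for k in (8, 16, 32, 64):
    h, s, p = fat_tree(k)
    print(f"k={k:3d}: {h:6d} hosts, {s:5d} switches, {p:6d} switch ports")
```

Note that growing beyond what one radix supports means a bigger radix or a
whole extra tier, which is why ports are the hard part.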

~~~
Shebanator
As a Googler, I would agree with this bucketing. Amazon and Microsoft are in
the same ballpark scale-wise, but companies like LinkedIn and Netflix have
lower scaling needs (this isn't a knock on them, just a technical reality).
Netflix would likely be in the same ballpark if they didn't leverage Amazon.
Facebook is also hovering near the line, and has probably crossed over.

------
ucaetano
I once saw the co-founder of Cloudera say that Google exists in a time warp
5-10 years in the future, and every now and then it gives the rest of us a
glimpse of what the future looks like.

Felt exaggerated at the time, but it often seems like the truth.

~~~
threeseed
I thought you were exaggerating, but 5-10 years is actually very accurate in
some areas.

GFS only translated into HDFS after five years; the same goes for MapReduce,
BigTable, etc.

~~~
Xorlev
And still doesn't work nearly as well.

~~~
east2west
I have always suspected so, since Google still does not open source its
MapReduce, but I haven't found any actual evidence directly comparing Hadoop
MapReduce with Google's. There could also be technical issues preventing
Google from open sourcing its implementation, although I don't know of any.
Now that we are moving on to more sophisticated models and implementations, I
am hoping more details will emerge.

~~~
GeneralMayhem
I don't know anything about MR in particular, but the open-sourcing process at
Google often gets bogged down by how interconnected the codebase and ecosystem
are. Releasing an open source version of a component means reimplementing a
lot of stuff that the component depends on but that Google _isn't_ ready to
release yet. Look at Bazel/Blaze - it took them a long time to get it out, and
the open version doesn't have feature parity. Also take a look at their
governance plan going forward, which hints at the difficulties. [1]

That said, I suspect a large part of the reason is that Google's MapReduce is
highly optimized for running in a Google datacenter, and they're generally
pretty tight-lipped about the real secret sauce in their infrastructure.

[1] [http://bazel.io/governance.html](http://bazel.io/governance.html)

~~~
chris_va
[https://github.com/chrisvana/repobuild/wiki/Motivation](https://github.com/chrisvana/repobuild/wiki/Motivation)

(Pre bazel)

------
thrownaway2424
Previously:
[https://news.ycombinator.com/item?id=9977414](https://news.ycombinator.com/item?id=9977414)

------
mkj
So every cluster machine has 40Gbit Ethernet(?) - does anyone else do that?

Looking at Table 2 in
[http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p183.pdf](http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p183.pdf)

~~~
vonmoltke
IBM has had 4x DDR (20Gb) and QDR (40Gb) InfiniBand as an option on their
blades and frame nodes for about seven years.

I used to work on a program that used IBM blades with 4x DDR as an MPI
processing cluster. The cluster was significantly smaller than what Google is
discussing, though.

~~~
matheweis
Yes, this. InfiniBand has scaled up significantly since then as well; I
believe you can do 4x25 (100Gb) today.

Having this background knowledge gives you a much better idea of why
interconnects are going to be such a big problem going forward... When I was
working with QDR IB 5 years ago, it didn't matter if they had faster cards -
the PCI bus at the time couldn't support anything faster. So you were
literally riding the edge of the I/O capability all the way through the stack.
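A quick back-of-the-envelope check of that claim (generic line-rate figures;
exact HCA and slot configurations varied): around 2010, a 4x QDR InfiniBand
card and a PCIe 2.0 x8 slot delivered essentially identical usable bandwidth,
so a faster NIC would have gained nothing.

```python
def effective_gbps(lanes, signal_gbps, encoding=8 / 10):
    """Usable data rate after 8b/10b line encoding, which both QDR
    InfiniBand and PCIe 2.0 used (10 bits on the wire per 8 bits of
    data)."""
    return lanes * signal_gbps * encoding

qdr_ib   = effective_gbps(lanes=4, signal_gbps=10)  # 4x QDR IB: 40 Gb/s raw
pcie2_x8 = effective_gbps(lanes=8, signal_gbps=5)   # PCIe 2.0 x8: 40 Gb/s raw

print(f"4x QDR IB  : {qdr_ib:.0f} Gb/s usable")
print(f"PCIe 2.0 x8: {pcie2_x8:.0f} Gb/s usable")
```

Both come out to 32 Gb/s usable - the NIC and the bus were saturated together.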

------
fauigerzigerk
Ironically, if you look at the data center as a computer, this looks very much
like scaling up, not scaling out.

I wonder if one day we will find that sending all data to a data center for
processing doesn't scale. I think that's already a given for some realtime-ish
types of applications, and it could become more important.

Obviously, the success of decentralised computing depends a lot on the kinds
of connected devices and whether or not data makes sense without combining it
with data from other devices and users.

With small mobile devices you always have battery issues. With cars, factory
equipment or buildings, not so much. But management issues could still make
everyone prefer centralisation.

~~~
duaneb
> I wonder if one day we will find that sending all data to a data center for
> processing doesn't scale.

Of course it scales. Some problems demand more scale. Throw more datacenters
at it. Tweak the algorithms to distribute data more conservatively.

~~~
fauigerzigerk
What could happen is that both processing capacities and sensor capacities
grow faster than networking capacities for years. That could make latencies
untenable for large classes of applications.

~~~
duaneb
Latency is bounded by the speed of energy transfer (i.e. at best the speed of
light). Even if bandwidth IS a big problem - as it is - latency will always be
a sticking point for distributed systems. As someone else pointed out, we need
to start analyzing distributed algorithms with a) geographical locality and b)
complexity in round trips, which have a constant cost determined by latency.
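For a sense of that constant cost, a sketch of the physics floor on round-trip
time through optical fiber (light in glass travels at roughly c/1.47; the
distances are approximate great-circle figures, used only for illustration):

```python
C = 299_792.458          # speed of light in vacuum, km/s
FIBER_SPEED = C / 1.47   # ~204,000 km/s in fiber (refractive index ~1.47)

def min_rtt_ms(distance_km):
    """Lower bound on round-trip time; real networks add routing,
    queueing, and serialization on top of this."""
    return 2 * distance_km / FIBER_SPEED * 1000

for name, km in [("within a datacenter", 0.5),
                 ("SF to NY            ", 4_130),
                 ("SF to Sydney        ", 11_940)]:
    print(f"{name}: {min_rtt_ms(km):8.3f} ms minimum RTT")
```

No algorithmic cleverness gets under that ~40 ms coast-to-coast floor, which
is why counting round trips matters.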

~~~
fauigerzigerk
Speed of light determines the lower bound of roundtrip latency, but the
sticking point is the upper bound, and that depends on how congested networks
are.

If sensors produce more data and data centers can process more data but
network capacities can't keep up, then the analysis you are suggesting will
show that we need more decentralised processing, not more data centers.

~~~
duaneb
Ahh, I see.

I don't think we were disagreeing; I was just pointing out that data centers
scale very well for what data centers were built for. It's not that surprising
that we are outgrowing them.

------
kordless
> The amount of bandwidth that needs to be delivered to Google’s servers is
> outpacing Moore’s Law.

Which means, roughly, that compute and storage continue to track with Moore's
Law but bandwidth doesn't. I keep wondering if this isn't some sort of
universal limitation on this reality that will force high decentralization.

~~~
duaneb
> I keep wondering if this isn't some sort of universal limitation on this
> reality that will force high decentralization.

To be fair, bandwidth has a high ceiling. Not all problems will need to be
decentralized.

~~~
kordless
> Not all problems will need to be decentralized.

Our use of bandwidth is growing without bounds. That means _most_ problems
will need to be decentralized, which is akin to saying we're going to need to
decentralize. :)

~~~
duaneb
Well, I have full faith humans can adapt. We just haven't needed to yet,
except with e.g. memory latency improving more slowly than CPU speed.

------
andrewstuart2
> The I/O gap is huge. Amin says it has to get solved, if it doesn’t then
> we’ll stop innovating.

I imagine you can solve the throughput problem with relative ease, but the
speed of light limits latency at a fundamental level, so proximity will always
win there.

I tend to think that storage speed/density tech rather than networking is
where the true innovations will eventually need to happen for datacenters. You
can treat a datacenter as a computer, but you can't ignore the fact that light
takes longer to travel from one end of a DC to another than it would from one
end of a microchip.
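Putting rough numbers on that comparison (the sizes are illustrative guesses,
and signals in fiber or copper travel 30-50% slower than the vacuum figure
used here):

```python
C = 3.0e8  # speed of light in vacuum, m/s

# Minimum one-way propagation time at various scales of "computer":
for name, meters in [("across a chip (~2 cm)  ", 0.02),
                     ("across a rack (~2 m)   ", 2.0),
                     ("across a DC   (~300 m) ", 300.0)]:
    ns = meters / C * 1e9
    print(f"{name}: {ns:10.2f} ns minimum one-way")
```

Crossing a datacenter costs on the order of a microsecond before any
switching or serialization - four orders of magnitude more than crossing a
chip.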

------
Mettalknight
> The end of Moore’s Law means how programs are built is changing.

I feel like this is just not true. Yes, we have slowed down significantly over
the past couple of years, but I do believe that once something big happens (my
bet is the replacement of silicon in chips, or much stronger batteries),
Moore's law will pick up right where it left off.

~~~
ska
Betting that technological advances we haven't thought of yet will happen is
pretty safe - but betting on them solving a particular current engineering
challenge is a good way to fail.

