
Google's fiber leeching caper - damian2000
http://www.dodgycoder.net/2013/02/googles-fiber-leeching-caper.html
======
ghshephard
Everybody who pays @95th dreams of doing something like this the very first
time that 95th billing is explained to them. I mean _Everybody_.

What the article fails to explain is that there is both a port cost and a
minimum commit. For much less than what it would have cost them for the port
and minimum commit on a 1 Gig connection, they could have simply mailed their
data via USPS and had it there in three days.

Also, if we do the math - and presume that they only ran 1 gigabit for 5% of
the time - 1 gigabit/second for 36 hours (5% of a month) works out to about 16
terabytes, which barely makes it into the "10s of terabytes" category, and is
well within "just mail an AIT-3 tape" territory. [Edit - cheaper yet, just load
up the hard drives that you are going to install anyway at your destination.]

Edit 2:

Doh - <http://www.mkomo.com/cost-per-gigabyte> shows that the largest hard
drives back then were 20 Gigabytes. We're talking a minimum of 1000 hard
drives for "10s" of Terabytes.

Maybe that 1 Gig circuit hack wasn't such a bad idea after all.

~~~
mctx
You'd trust your hard drives with USPS?

Edit: I'm not implying that USPS is necessarily unreliable, I'm curious to
hear if this is common practice.

~~~
ghshephard
Presumably I'm not sending the originals. Also, powered down, hard drives are
pretty robust - their read/write heads are locked up. Remember - those drives
have been bouncing around a LOT to get to you through retail/wholesale
channels.

[edit] Also - they obviously had to buy hard drives for the destination,
right? Presumably, early-stage Google was all about the open-chassis-on-
plywood, slam-the-drives-in approach, and wouldn't have been averse to loading up
30 or 40 drives, mailing them out, and simply diffing and resyncing as needed
if 1 or 2 drives had issues, which would have been unlikely.

Also, presuming you needed to have one of your people go to the remote data
center, there is always the old, _move the datacenter with hard drives in my
luggage_ trick. Even in 2000 you could put a lot of data into carry-on.

------
uvdiv
Reminds me of a loophole Microsoft found in their utility contract:

[http://www.nytimes.com/2012/09/24/technology/data-centers-
in...](http://www.nytimes.com/2012/09/24/technology/data-centers-in-rural-
washington-state-gobble-power.html?pagewanted=all)

Threatened to waste $70,000 of electricity to circumvent a $210,000 contract
penalty (for not buying enough electricity), and thus negotiated their way out
of it.

(Note this is MSFT not GOOG, so be sure to interpret this as an act of
_disreputable evil_ and not a clever funny hack).

~~~
GhotiFish
I think what's important here is that it's funny.

Google exploiting good faith rules to shirk payment? Funny.

Microsoft blowing $70K in power to meet a contract? Also Funny.

------
JoshTriplett
From the article:

> At the end of the month, the top 5% of usage information was discarded, to
> eliminate spikes, which were assumed to be measurement errors.

Not actually true. The bandwidth measurements don't have accuracy problems;
providers use 95th percentile billing to make connections "burstable",
allowing servers to handle a small unexpected traffic spike without a huge
bill. And on the flip side, many sites have a traffic pattern of highs during
a subset of the day (customers' waking hours or business hours or non-business
hours, depending on the site) and lows outside that period (sleeping, etc),
for which 95th percentile billing produces a higher bill than just charging
for bytes transferred.

<https://en.wikipedia.org/wiki/Burstable_billing>

Very few customers have the kind of traffic pattern that would allow for using
bandwidth during only 5% of the month. And in this particular case, Google
could just as easily have saturated a 50Mbps line rather than using 5% of a
gigabit line, which would have massively reduced (though not completely
eliminated) the bill.
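
For concreteness, here's a minimal sketch (in Python, with made-up sample data
and a made-up per-Mbps price) of how 95th-percentile billing is typically
computed from 5-minute usage samples:

    import random

    # One throughput sample (Mbps) per 5-minute interval;
    # a 30-day month has 30 * 24 * 12 = 8640 intervals.
    samples = [random.uniform(5, 50) for _ in range(8640)]
    # A burst: full gigabit for ~36 hours (432 intervals, 5% of the month).
    samples[:432] = [1000.0] * 432

    # Sort the samples, discard the top 5%, bill on the highest remaining one.
    samples.sort()
    cutoff = int(len(samples) * 0.95)
    billable_mbps = samples[cutoff - 1]

    rate_per_mbps = 100.0                        # illustrative $/Mbps/month
    print(billable_mbps, billable_mbps * rate_per_mbps)

With that traffic pattern, the 36-hour gigabit burst falls entirely within the
discarded top 5%, so the bill only reflects the ~50 Mbps background traffic -
which is exactly the loophole described in the article.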

~~~
damian2000
Thanks, didn't know that, I will update the post to mention Burstable billing.

------
SoftwareMaven
Even for a scrappy startup, I consider this to be exceedingly unethical.
Somebody paid for that bandwidth, somebody who essentially invested in Google
with no opportunity to see a return on the investment.

I can't picture the world where this would not be considered "evil".

~~~
Narkov
The rules were set by the supplier and they played within the rules. Don't
play the game if you don't like the outcome.

~~~
rtpg
Ethics aren't about the "rules".

~~~
jchavannes
You will always have bad people. The more people that exploit the system, the
faster the system will change to be more 'fair'; otherwise you're just
handicapping yourself. Look at all the patent lawsuits. As they said in my
business ethics class: ethics schmethics.

~~~
tedsanders
(1) It's not clear that acting unethically will lead to better rules and more
ethical outcomes, in general [and actually, in some cases the opposite will
happen - if you're corrupt it makes it easier for others to be corrupt]

(2) Handicapping yourself is the whole idea behind ethics and morality.
Sometimes it's more important to do the right thing than the selfish thing.

Attitudes like yours are so disheartening to read. I totally disagree with
your sentiment.

~~~
jchavannes
Moral != Ethical. I'm guessing you think Foxconn replacing workers with robots
is bad too?

~~~
tedsanders
How is moral different from ethical? In my mind, they are pretty synonymous.

And no, I don't think Foxconn is bad for replacing workers with robots. Why
would I?

------
mrb
This "95th percentile" measuring method is still widely used by bandwidth
providers, colocation facilities, etc. I am not sure why the article says it
was a common billing practice "of the time". More info:
<http://en.wikipedia.org/wiki/Burstable_billing>

~~~
jlgaddis
I'd be curious who still uses this. None of my contracts are based upon the
95th percentile. We don't bill using this method either.

~~~
ghshephard
I've never met an ISP that sells Racks in a colo that didn't bill transit at
95th percentile (if you chose to use theirs). Are you hosting bare metal, or
is your colo providing the physical server?

If you deal with a Datacenter that is just providing you with a Cage + Power +
Cooling + Physical Security, you are almost always going to pay for transit
@95th percentile.

------
ars
I'm not sure if I'm supposed to be impressed at the ingenuity, or disgusted at
the dishonesty.

~~~
rdl
Early-Google really fucked moron salespeople at most of their vendors (well,
technically the salespeople also got paid, so the only morons were the
investors). People sold space with power/cooling included at ratios which were
unsustainable, and Google's people were more than willing to sign contracts
with the facilities at those prices. Then Google "fully utilized" what they
had in the contract, which really screwed their colo providers.

OTOH, this was in the colo/bandwidth/fiber nuclear winter of 2001-2004, so
maybe some vendors were happy to have some revenue, but there were cases where
the Google stuff was at negative overall margin on variable costs (i.e. the
vendor would have been better off shutting down the power plant vs. selling to
Google).

~~~
jonknee
Using what you paid for is not unethical. Selling what you can't provide is,
so if the colo can't keep up it's not Google's fault.

~~~
rdl
It's like renting a hotel room and then running the water/power/etc. 24x7 (and
running a cable outdoors to power a factory during your 24h stay).

~~~
jonknee
Sure... If hotels advertised unlimited water and power. I similarly don't feel
bad for a restaurant that loses money on an all you can eat buffet.

~~~
Evbn
There is "unlimited" and there is "reasonable and customary so let's not waste
energy metering it".

------
miles_matthias
That is a pretty awesome hack, but I wonder why they didn't just use hard
drives and ship them instead of wasting months. The cost/benefit seems off there.

~~~
joshmlewis
As mentioned above, around that time the cost of hard drive storage was about
$20/GB, so that would come out to around $240k. It's crazy how things have
changed and we can buy 16 GB flash drives now for $5-10. I remember when the
first flash drives came out and it was like 128 MB for $20.
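
A rough check of that ballpark (a tiny sketch; the exact terabyte count is a
bit fuzzy in the thread, so a few candidate sizes are shown against the
~$20/GB figure):

    cost_per_gb = 20.0                   # approximate $/GB of disk circa 2000
    for terabytes in (9, 12, 16, 20):    # 9, 16 and 20 TB appear in the thread;
        gigabytes = terabytes * 1000     # 12 TB reproduces the ~$240k figure
        print(terabytes, gigabytes * cost_per_gb)
    # -> $180k, $240k, $320k, $400k respectively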

~~~
Evbn
More like 4MB for $20. I have some in my desk.

------
anigbrowl
I find this a little hard to believe; it has the whiff of an urban legend.
Besides the fact of minimum monthlies, and the probability that the bandwidth
provider is not so stupid as to not notice that they're moving lots of
terabytes for 30 hours a month and nothing at all the rest of the time
(especially over a period of months), where's the time saving? You could just
do all the copying in the data center, repack the drives and fly them to the
east coast.
This seems a lot more efficient than trying to do it in chunks of <72
minutes/day.

------
css771
That's insanely brilliant if you ask me. And on a site called Hacker News,
most people should agree!

------
dredmorbius
A fair number of bandwidth providers charge little or nothing for inbound
datacenter traffic (it's not what they're constrained on), which makes
services for which inbound traffic is high, like web crawling, very cost-
effective. Google likely benefitted from this as well, though they're hardly
the only ones.

------
BryantD
I completely believe this story, because I've done similar stuff. In 1998 I
was working for a company that was using a lot more incoming bandwidth than
outgoing bandwidth, which is the reverse of the usual pattern. So when we
outgrew our data shack and had to move to a real data center, I badgered my
sales guy into showing me the overall bandwidth usage charts for the proposed
location.

Unsurprisingly, there was way more data flowing out of the facility than there
was flowing in, so I talked him into a substantial discount on inbound
traffic. "Hey, nothing else is using that half of the duplex..."

~~~
Evbn
That's optimizing for mutual benefit, not the same as stuffing.

------
damian2000
I've just made some corrections to the post. I was going off memory when I
originally wrote it. After re-reading the relevant passage from the book again
I've corrected the number of terabytes that Google needed to transmit - it was
more like 9 TB. And they transmitted the whole lot over two nights in one
month, not three nights. Sorry for the confusion.

It was on page 187-188 of Steven Levy's book "In the Plex" and the two people
he's quoting are Urs Hölzle and Jim Reese, who were both involved in Google's
infrastructure.

------
DanWaterworth
They had a lot of flexibility around when they used the bandwidth. I don't
understand why they couldn't just negotiate a deal whereby they would only
send data during times of low demand.

------
smutticus
I call BS. I'm supposed to believe that Google took months to do something
that would take Fedex 2 days. I think Google is smarter than this.

~~~
ghshephard
If, by "10s" of terabytes, they mean at least 20 terabytes, then, in 2000, a
"Large" hard drive was 20 Gigabytes. That comes to 1,000 Hard drives. If they
had some sneaky way of getting onto a GigE connection, then perhaps this might
have made sense...

~~~
yardie
They still had to purchase HDs at the other end. It's not like data just lives
in the ether. At some point it's got to be written to someone's spinning rust.

How is this any cheaper than copying the HDs, shipping to the Northeast and
installing the HDs into waiting servers there?

~~~
ghshephard
The logistical challenge of swapping 1000+ drives into/out of a small number
of chassis, while keeping track of what data you've copied, would have been
painful (but not impossible if stretched over a week).

I think the key issues here would be whether moving the data was at all time
sensitive (is a two-month latency acceptable?), and whether they really had a
"free" 1 Gig port they could sneak under the 5% window with. If it really was
free (or very, very cheap) - then maybe it would have been worth it.

------
dan1234
Maybe I'm overly skeptical in my old age, but I have a difficult time
believing this.

Surely there'd be some sort of rental cost for the line in addition to the
bandwidth cost (or perhaps a minimum spend commitment)? Wouldn't it have been
cheaper, simpler and possibly quicker just to get a smaller line and saturate
that?

Has anyone read the book? Does the author provide a credible source?

Smells like urban legend.

~~~
damian2000
I was surprised to read this in the book, and even more surprised that it
seemed no one had discussed it online yet either, as far as I could tell. It's
on page 187 of the book. The author Steven Levy's source for this story is
Urs Hölzle, who was the first VP of Engineering at Google. His title now is
Senior Vice President for Technical Infrastructure.

------
ef4
This kind of traffic optimization happens more frequently than you might
think. Particularly among CDNs.

CDNs don't optimize solely for performance. They optimize for a balance of
performance and cost. That means saturating a given 95/5 link for 5% of the
month, and then mapping the traffic away right before it starts to get
expensive.
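
A hedged sketch of what that cost-aware decision might look like (the function,
the fixed commit, and the simple interval counter are all illustrative, not how
any particular CDN actually implements it):

    # Each 95/5 link gets a "free burst budget": the top 5% of the month's
    # 5-minute samples are discarded by billing, so bursts that fit inside
    # that budget don't raise the bill.
    INTERVALS_PER_MONTH = 30 * 24 * 12                  # 8640 5-minute samples
    FREE_BURST_INTERVALS = INTERVALS_PER_MONTH // 20    # 432 discarded samples

    def keep_on_cheap_link(burst_intervals_used, committed_mbps, demand_mbps):
        # Stay on the 95/5 link while demand fits under the committed rate,
        # or while there is still discarded-sample budget to absorb the burst.
        if demand_mbps <= committed_mbps:
            return True
        return burst_intervals_used < FREE_BURST_INTERVALS

    print(keep_on_cheap_link(400, 100.0, 900.0))  # True  - burst is still free
    print(keep_on_cheap_link(432, 100.0, 900.0))  # False - map traffic away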

------
cantbecool
I don't understand the "didn't use the connection at all outside that time,
they should be able to get some free bandwidth." How or why would Google shut
down a data center for that long just to pump some data across the lines?
Wouldn't that mean down time? I don't get it.

~~~
damian2000
The purpose of this data transfer was internal - transferring search index
data between west and east coast - and they had a supplier contract just for
this purpose.

~~~
cantbecool
That makes sense. Silly me.

------
NeutronBoy
It's common for internet providers who cap to run the 'capping' process in
batches, so it's possible to get 'free' bandwidth by exploiting the time
period (often up to 24 hours) between reaching your limit and when your
provider caps you.

------
trustfundbaby
So is it that the index didn't change that much in the months the transfer was
happening? Or were the changes cleverly isolated so they could be copied over
last and the switch flipped immediately after?

------
outside1234
"a good deed never goes unpunished" (95% billing to prevent bursting
megabills)

"don't be evil" (except for 3000 exceptions)

------
ricksta
why not save it on an SSD and fedex it over?

~~~
polymatter
The first 3 words of the (short) article are "Back in 2000".

------
OGinparadise
_At the time, Google was not hugely profitable like today, and were very
conscious of costs._

Meanwhile their hosting company was making a trillion in profit /s.

Google plays fast and loose with ethics and definitions, from (shall we call
it) penalizing competitors and then releasing a clone, to working with Visa and
MasterCard, à la WikiLeaks, to cut off the money to "piracy sites"
[http://www.gizmodo.co.uk/2013/02/google-wants-to-stave-
pirac...](http://www.gizmodo.co.uk/2013/02/google-wants-to-stave-piracy-sites-
to-death/). Their PR hasn't worked on me for a few years, and there's even been
a boomerang effect thanks to it.

