
How I attacked myself using Google and I ramped up a $1000 bandwidth bill - Panos
http://www.behind-the-enemy-lines.com/2012/04/google-attack-how-i-self-attacked.html
======
OzzyB
+1 For Amazon for kindly reimbursing the overage charge.

-1 For Google for creating what is the biggest threat to content providers by enabling easy-to-use DDOS attacks across the entire interwebs.

</hyperbole>

Seriously, is this what we have to look forward to when Google Spreadsheets,
and God knows what else, become ever-more popular?

Think about all the additional onerous costs that would be incurred by content
providers as more and more Google Spreadsheet users _hotlink_ images, mp3s,
videos...

This has to be a bad design decision by Google; there's no need to re-download
assets by the hour, on the hour, regardless of whether the user's spreadsheet
is open or not.

Is it time to go back to the days of putting your web assets behind
$HTTP_REFERER?
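
For reference, a minimal sketch of that kind of Referer check using only
Python's standard library (the host and file names are made up, and a Referer
header is trivially spoofable, so this only deters casual hotlinking):

    # Serve an asset only when the Referer header names your own site.
    # "example.com" and "asset.png" are placeholders.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    ALLOWED_REFERERS = ("http://example.com/", "https://example.com/")

    class AssetHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            referer = self.headers.get("Referer", "")
            if not referer.startswith(ALLOWED_REFERERS):
                self.send_error(403, "Hotlinking not allowed")
                return
            self.send_response(200)
            self.send_header("Content-Type", "image/png")
            self.end_headers()
            with open("asset.png", "rb") as f:  # the protected asset
                self.wfile.write(f.read())

    if __name__ == "__main__":
        HTTPServer(("", 8000), AssetHandler).serve_forever()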

~~~
hm8
As the author notes, I don't think the problem was with Google. It was no
fault of theirs. The reason: it's a fine line between maintaining privacy
policies and managing such events. If Google were storing/caching these links,
there would have been an outcry from those worried about user privacy and
stuff.

About the by-the-hour downloads, well there again is a trade-off between
providing data quickly and doing a lazy evaluation.

I think, as the author notes, it was just an unfortunate event: good design
decisions that went bad under the circumstances and wreaked havoc for the
author.

It was certainly nice of Amazon to have made that refund. The resources (read
bandwidth) were used after all.

~~~
ricardobeat
I don't understand the privacy concern - a publicly accessible URL doesn't
offer any privacy. It's the same as a transparent proxy.

~~~
andrewreds
A password works because there is a piece of text that only you know; when you
give that password to a server, the server knows who you are.

Can't you also view a URL as a password? (If only I know the URL, then only I
can download the file.)

I am able to give out a URL to someone else so they can access the file;
likewise, I can give out my file server's username and password, and whoever
has it can also access my files.

~~~
gizzlon
You have a point, but the difference is that URLs are not usually considered
secret and are therefore not treated the same way a password would be. For
example, in the browser...

------
damoncali
Reminds me of the email cannon we built on Gmail, not exactly on purpose. We
needed a full Gmail account to do some testing on our company's email backup
system. At the time that meant 8GB of email. And you couldn't just send a
bunch of huge attachments, as it had to be like regular email. So we took a
gmail account and signed it up for dozens of active linux related email lists
(because they were easy to find).

The result was an email account that got an email message every second or so
in a variety of languages.

We quickly realized that you could have some fun by forwarding that address to
_someone else's email account_.

Fast forward a few months, and Gmail smartly requires a confirmation before
allowing you to forward.

~~~
creamyhorror
Once in middle school in the late '90s I created an email loop between two
free email accounts that forwarded an extra copy of the email every iteration.
It was an experiment out of curiosity and mischief. I don't think the server
actually went down when I started the loop (at least, I can't remember if it
did), since the mailbox would have filled up and messages would have failed to
be delivered.

I think I was hoping to produce an "email cannon" similar to yours - an email
spammer to be directed at whoever I liked.

------
richardlblair
It's really awesome that Amazon was reasonable and refunded the charges
because they were accidental. I mean, technically it was still your fault, so
it would have been easy for them to be jerks about it.

~~~
sohn
They wouldn't have been jerks if they had asked for the charges; you still
generated the traffic and they had to pay for it.

~~~
EvilTerran
Just as "legal" does not mean "ethical", "contractually permitted" does not
mean "not a jerk move".

~~~
jmj42
Just like asking you to pay for the services you used is not a jerk move.

Amazon did a nice thing, but the services were used. Asking the user to pay for
those services (even if it was a mistake) would not have been a "jerk move".

~~~
Karunamon
Bandwidth pricing is a funny thing, in that it's only metered because it's
convenient to do so. You haven't consumed any kind of finite resource by
moving 1GB or 1000GB. There isn't any "use".

And it makes good business sense besides. They can let the guy off and eat the
(probably less than a hundred dollars of) bandwidth this guy actually cost them
with their upstream providers, get a good write-up, and look better as a
result...

...or kill his account, take him to collections, etc., etc. (which would
probably cost them more than $1K anyway), and lose a customer and get a PR
black eye while they're at it.

So yeah, it would be a jerk move, and a pretty dumb one at that.

~~~
jff
In the end, we _are_ talking about a finite resource, because the Internet can
only handle so much data transfer at one time. One way or another, your
packets have to travel over physical infrastructure which, just like your home
network, has a maximum capacity. Use of this infrastructure costs money;
charging by bandwidth should help prevent users from "clogging the tubes"
willy-nilly because there is a cost associated with excessive use.

~~~
Karunamon
Except most providers don't charge by time and demand (which would actually
make sense), but by a fixed cap or fixed cost per byte. If the problem is
congestion as you suggest, surely it would make more sense to charge different
amounts based on time of day, length of session, etc?

Which would give Amazon more trouble? 1,000 TB spread out over a month, or
1,000 TB spread out over a day? Their current pricing model assumes both of
these are equal, which they are plainly not.

Aside: Caps make the same mistake. If you have a 250GB cap, the _ahem_ ISP
charges the same whether you burn through that cap in a month or in a day.

~~~
p9idf
It is more complicated than that. Customer convenience must be considered. For
instance, I and many others like me will not buy data transfer from someone
when figuring the cost requires me to account for the time of day and phase of
the moon.

~~~
Karunamon
Customers already deal with such billing from other companies and it's not
difficult.

Ever had a cell phone? Calls during the day count against an allotment; after a
certain time at night, calls are free. For a home internet connection it would
probably be reversed, as peak usage is right after work lets out.

Some power companies also do the same thing.

------
gojomo
But why re-download every hour?

Does merely having the spreadsheet passively open in a browser trigger that,
or was some other process re-loading the spreadsheet every hour? (If the
former, I wouldn't be as forgiving of Google. I understand the desire not to
cache possibly-private data, but proper URL design and conditional GETs should
be able to prevent the entire download on an automatic hourly schedule. And
even if the latter – the author had chosen to reload the spreadsheet each hour
– I'd want Google's design to allow browser-side caching to work for such
embedded and/or generated images.)

~~~
jonknee
About an hour is the standard time for external data in Google Spreadsheet to
be refreshed. I've come across this with JSON data.

~~~
gojomo
Even if no one has it open in a browser? Or they do have it open, but haven't
interacted with it?

In either case, this seems odd, unless the URL is especially noted as
'volatile', and/or there are other parts of the spreadsheet that might trigger
conditional calculations/notifications based on that URL's contents. (And
don't S3 resources have last-modified-dates or etags for conditional GETs?)
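
For illustration, a rough sketch of such a conditional GET using Python's
requests library (the bucket URL is made up). S3 does send both ETag and
Last-Modified, so a well-behaved hourly re-fetcher could revalidate and get an
empty 304 instead of the full file:

    # Sketch of a conditional GET against an S3-hosted object.
    import requests

    url = "https://mybucket.s3.amazonaws.com/large-image.png"

    first = requests.get(url)
    etag = first.headers.get("ETag")
    last_modified = first.headers.get("Last-Modified")

    # An hour later: revalidate instead of re-downloading the whole object.
    second = requests.get(url, headers={
        "If-None-Match": etag,
        "If-Modified-Since": last_modified,
    })
    print(second.status_code)  # 304 if unchanged, and no body is transferred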

~~~
jonknee
I don't remember, but it wouldn't surprise me: that way it updates correctly
for offline mode, and if you have conditional logic it can work on changed
data. It also improves page load time, which is very important for Google in
general but doubly so for an online office suite (Excel opens quickly; so
should a Google spreadsheet).

------
tocomment
This really underscores Amazon's glaring omission of a billing cutoff on
Amazon Web Services. How hard would it be for them to let me, say, cut off my
services at $100/month?

This is the main reason I'd never use AWS to host anything public.

~~~
dredmorbius
It's not that simple.

First, the problem isn't inherent to virtual hosting services; you could just
as easily get hit by this on a bare-metal site, though the interplay of S3 and
Google Docs is an added dimension.

Cutting off all services opens the door for a class of DoS attacks. Simply
direct enough traffic at a single account's assets, and you'll knock them
offline for a given billing cycle. If the attack is cheap to launch (botnet,
URL referral network, etc.) it's a cheap attack. Different entities would have
different cut-off and degradation policies.

Better would be to identify the parameters of a specific anomalous traffic
pattern, but this can be hard.

A more general solution is to set asset (server side) and client (remote side)
caps in tiers. You'd want generous (but not unlimited) rates for legitimate
crawlers, your own infrastructure, and major clients. The rest of the Net
generally gets a lower service level. Such rules are not trivial to set up,
and assistance through AWS or other cloud hosting providers would be very
useful.

~~~
nknight
> _you'll knock them offline for a given billing cycle_

Er... Because caps can never be raised mid-cycle?

~~~
dredmorbius
Sure, they could be, but it's still a pretty blunt alerting system.

Depending on the organization, though, you'd have to get purchase approval on
the overage, etc. That will depend on specifics. Makes for sticky questions to
answer there as well, which is another argument for putting better management
tools at the cloud level.

~~~
nknight
You're making this _way_ more complicated than it needs to be. It's not an
alerting system, it's a knob, like a thermostat. I turn the knob where I want
it, the system follows the "run until this limit is hit" instruction. If I
want to change my mind later, even after the limit is hit and the system shuts
down, I can make that decision at that time. I don't want the system or
anybody else to make it for me.

As for the "purchase approval" nonsense -- if the knob isn't right for _your
company_ , then don't use it, but there are a _huge_ number of companies where
the guy turning the knob is the one and only person with _any say_ over money
being spent at all.

------
vladd
I can't help noticing that Hetzner offers 5000 GB/month AND a full dedicated
server for 39 EUR (51 USD) [1], so at that rate his traffic would have cost him
a total of about 100 USD if he had used a dedicated server instead of Amazon.

(Before mentioning Amazon's scalability, consider that Hacker News is run on a
single dedicated server, and the moral of the story seems to be how not to
scale, especially when you don't want to.)

[1] - <http://www.hetzner.de/en/hosting/produkte_rootserver/x3>

~~~
robryan
Of course, if you want to do away with any kind of built-in redundancy and take
over the sysadmin duties involved, you can get the raw space and transfer
cheaper.

~~~
krelian
I have a VPS on buyvm which includes 2TB/month for $6. Each extra TB is
another $2.50. This is not a special case; you can find other providers with
very good pricing for data transfer. I don't understand how the difference in
bandwidth pricing can be so large. I'm also certain that Amazon doesn't pay as
much for bandwidth as the guy from buyvm.

~~~
true_religion
buyvm is overselling their bandwidth to you. Real uplink pricing is not 2
dollars per terabyte. If you actually tried to use, say, 13 terabytes from a
single VM, it wouldn't work: either rate limiting on your uplink would stop
you, or they would cut you off.

~~~
lsc
well, real uplinks are charged on the 95th percentile, mostly. (sometimes
capped, and Cogent does 90th percentile)

I'm signing a deal (if all goes well) tomorrow for $0.65 per Mbit/sec: a 5
gigabit commit on a 10-gigabit port ($1.15/megabit overage charge, billed on
the 90th percentile), from Cogent, probably the cheapest provider that claims
'tier 1' status. (There's a lot of argument about Cogent's "tier 1" status...
which is kind of funny; if you only have one provider, a good tier 2 is going
to be more reliable than a tier 1 anyhow.) But it is a real uplink and I really
can use 5 gigabits of that.

If you ran one megabit full out for a month, assuming 2,629,743 seconds in a
month and assuming Seagate (decimal) gigabytes, divide by 8 and you get
328,717 megabytes. So, uh, yeah, two dollars a terabyte? Assuming they are
larger than I am and their BGP mix is the low-end stuff (Cogent and HE.net or
the like), they aren't losing money.
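
Sanity-checking that arithmetic (a quick calculation, decimal units as above):

    # 1 Mbps run flat out for an average month, in decimal ("seagate") units:
    seconds_per_month = 2_629_743
    megabytes = seconds_per_month * 1 / 8        # ~328,718 MB
    terabytes = megabytes / 1_000_000            # ~0.329 TB
    cost_per_mbps = 0.65                         # the $0.65/Mbit deal above
    print(cost_per_mbps / terabytes)             # ~$1.98 per TB delivered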

~~~
X-Istence
Prices have gone down since the last time I had to deal with all of that. I
remember the company I was working with paying $1.67 per Mbit/sec, and I've
seen offers for 1 gigabit/sec for $1000, so roughly $1.00 per Mbit/sec (using
1000, not 1024), but I didn't know they would go down to $0.65 per Mbit/sec!

Guess the more you commit to, the better pricing you get!

~~~
lsc
Bandwidth pricing falls dramatically with bulk. Cogent offered me $0.75/meg
for a 3g commit, $0.65 for a 5g commit, and $0.50 for a 10G commit. I know a
guy that is getting $4.00/meg on a 100M commit on a 1000M pipe, also from
Cogent. (and he chose that over going through me for $650/month for a capped
full 1000M pipe) Bandwidth is also a "negotiated good" without a standard
price, so what I pay may be rather different from what you pay.

On top of that, bandwidth costs fall... dramatically[1] in competitive
markets, and I'm in silicon valley, probably one of the most competitive
transit markets. According to the graphs I've seen[1] I'm paying below
average, but it won't be very long before what I'm paying is average. I was
explaining this to the real-estate guy who owns my data center and was wanting
to get in on the bandwidth business (he wanted to get people in for cheap and
crank up the price on renewal, like you do with real-estate and data-center
space). His response? "But then why would anyone want to be in the bandwidth
business?"

I mean, the fiber in the ground? That's like real-estate. The prices go up,
the prices go down, eh, whatever. But the amount of traffic you can push over
two strands of fiber? That goes up all the time; I have some old Cisco 15540
ESPx DWDM units that can do 16 10Gb/sec waves over one pair of fiber. They
were awesome back in the day. Modern DWDM equipment? You can get 80 100Gb/sec
waves on one pair of fiber.

It's irritating, though, as like everything in this industry, you have to
negotiate for months to get the "real price" - I asked Cogent for a single,
capped 1 gigabit port for $1000/month north of 6 months ago. "Call me back
when you can get me a buck a meg" They kept calling me back "how about $3 a
meg? how about $1.75 a meg?" I mean, even now, they beat the buck a meg price
point, but I had to buy a lot more than I needed (I'm splitting it half and
half with another company, at cost, so while the transaction cost was huge,
once you factor in the discounted setup fees, well, I am still paying more
than a grand a month, but it's still a pretty comfortable fee for me.) I
imagine Cogent has spent several thousand dollars of salesperson time, and I
/know/ I have spent several thousand dollars of my time on this, and they are
charging me rather less than if I had wasted almost none of their time. They
even dropped the setup charge down to almost nothing.

And now I've gotta do the same thing all over again with a second provider
(most likely he.net). What a waste of time and effort all around. I mean, a
little bit? it's kinda fun, I mean, sales people are always ridiculously
overdressed extroverts, and in this industry, most of them can pick up on my
personality and act in a way that is tolerable or even fun for short periods
of time, but I really am an introvert. I mean, it can be fun for a while? but
man, I have had like 5 meetings the last two days, between dealing with Cogent
and dealing with the people that are buying half the pipe from me. It's
exhausting, and I guess I have a hard time seeing how this is the most
efficient way to sell bandwidth. I mean, I guess some people throw up their
hands and pay the $3/meg asking price, and if they can break even on me, god
damn, you wouldn't need many of the $3/meg customers to get really rich.

But then that leads to the question: why bother with me? I mean, I'm going to
turn around and sell transit very near this cost in a very public way, which
will mean that more of those $3/meg customers are going to turn around and ask
Cogent for a discount.

I guess they can rely on the fact that, well, I'm a scruffy introvert, and no
large corporation will do business with me. Still, I mean, Cogent will let me
drop 1G (capped, sadly) ports into datacenters where I don't have equipment
for $650/month, and I've been running ads saying I would be willing to sell
those at cost (and make my profit off of the difference between the list setup
fee, $2500, and what they are actually charging me.) - This was mostly a way
to get the higher commit pricing without actually paying for it all.

[1] <http://drpeering.net/white-papers/Internet-Transit-Pricing-Historical-And-Projected.php>

~~~
X-Istence
At the time we were looking at a 2 Gb commit...

$4 a meg sounds really expensive; at that point it is almost worth going up a
tier, having room for future expansion, and paying less for it, or doing what
you do and splitting the bandwidth and cost.

------
TwoBit
That such a simple thing on your part could result in such an extreme and
expensive outcome implies something is wrong with the design of the system
you're using.

~~~
ma2rten
I was going to post exactly the same thing. It's very lenient of him to blame
himself instead of Google for this.

------
rachelbythebay
Feedfetcher strikes again, huh? I never did find out why they were so
interested in one of my images.

<http://rachelbythebay.com/w/2011/10/27/wtfgoog/>

------
K2h
I loved your reference to the huge Russian bomb, the Tsar Bomba. I for one had
never heard of it, and it made a great metaphor.

[1] <http://en.wikipedia.org/wiki/Tsar_Bomba>

~~~
DanBC
Check out the Nuclear Effects Calculator; you can see what effect various
historical bombs would have. (They include Tsar Bomba.)

(<http://nuclearsecrecy.com/blog/2012/02/03/presenting-nukemap/>)

(<http://news.ycombinator.com/item?id=3624714>)

------
ltcoleman
This was an extremely interesting article. I hope Google rectifies this type
of behavior. This is actually pretty scary.

~~~
typpo
This isn't Google's fault. I could get a couple machines and do the same thing
if I had a list of 250 gigs of files. And I wouldn't be limited to once every
hour.

~~~
bri3d
But those machines wouldn't be Google's, and you would probably either be
committing a crime (building an exploited botnet) or using someone's money (be
it your employer, university, or yourself) to run them.

This is pretty clearly a different league of attack: rather than attacking
systems or spending money, you'd be exploiting a Google feature to use
Google's resources for free to incur a giant amount of data transfer.

~~~
tantalor
It would almost certainly be criminal regardless of the method (Google, EC2,
botnet).

------
RandallBrown
If I wanted to launch an attack on something like Instagram all I would need
to do is put a bunch of images (hosted on instagram) into a Google
Spreadsheet? Then the google crawler will come through and download them all
once an hour?

~~~
BryanB55
That's what I'm wondering too. I don't see how this would work on a normal
website, since a normal website probably wouldn't be whitelisted by Google like
AWS is... What am I missing here?

~~~
RandallBrown
I don't think you'd be able to take down a website with this strategy. The
OP's site never went down; it just cost him a lot of money. Even pretty small
webservers should be able to serve up static image files pretty easily.

But you could do a denial of service by making the service too expensive, and
all of your work would be hidden behind the anonymity of the Google bot.

------
igorsyl
Try <http://cloudability.com> to keep track of your cloud costs. Note: I am
not affiliated with them.

~~~
ltcoleman
Thanks for the link!!

------
stretchwithme
Kudos to Amazon for running time backward and letting you pull your foot out
of the way.

I can't help but think: if benign decisions lead to disasters like this in the
cloud, how much destruction could robots wreak in the future due to similar
benign choices?

~~~
Panos
That was exactly what I was thinking after reflecting on this issue...

------
Yarnage
I'm pretty surprised Google didn't have the client download the images
instead. Wouldn't that be a better solution, or am I missing something here?

Pretty interesting though, and if this becomes a big enough story you can bet
Google will be changing something; the last thing they need is someone using
Google Docs to DoS websites.

~~~
bibinou
Google Docs is served over HTTPS, so they need to proxy the assets like GitHub
does:
<https://github.com/blog/743-sidejack-prevention-phase-3-ssl-proxied-assets>

~~~
Yarnage
Interesting. I never thought of that. It seems almost silly that you're
bringing insecure content over a secure channel like that but then again it is
only images.

~~~
EvilTerran
_"only images"_

As long as they don't do WMFs, with their code-injection-by-design
functionality...

<http://en.wikipedia.org/wiki/Windows_Metafile_vulnerability>

------
RobertKohr
Can you limit bandwidth with AWS?

Also, why would the spreadsheet be calling these images every hour? Did you
have the spreadsheet open? Does Google do this call even when no one is
viewing the spreadsheet?

~~~
justincormack
You can put a robots.txt in the bucket.

~~~
Panos
Which will be ignored by Feedfetcher :-)

Plus, you _cannot_ put a robots.txt at s3.amazonaws.com, so if the URL is
accessed through the <https://s3.amazonaws.com/...> form, the robots.txt will
not work.

~~~
simonw
You could put robots.txt in the bucket if you address it using the
<http://mybucket.s3.amazonaws.com/> alternative URL scheme - a robots.txt in
the root of the bucket would then be available at
<http://mybucket.s3.amazonaws.com/robots.txt>
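
For example, a rough modern sketch with boto3 (the bucket name is a
placeholder; back then you would have used boto or the S3 console, and the
object needs a public-read ACL for bots to be able to fetch it):

    # Place a robots.txt at the root of the bucket so it is served from
    # http://mybucket.s3.amazonaws.com/robots.txt (virtual-hosted style).
    import boto3

    s3 = boto3.client("s3")
    s3.put_object(
        Bucket="mybucket",                      # placeholder bucket name
        Key="robots.txt",
        Body=b"User-agent: *\nDisallow: /\n",   # block well-behaved bots
        ContentType="text/plain",
        ACL="public-read",                      # so crawlers can read it
    )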

~~~
Panos
Yes, that would solve the issue of not being able to have your own robots.txt
file and I did not know about that. On the other hand, Feedfetcher would still
ignore the robots.txt

~~~
justincormack
Google's justification for ignoring this is very weak.

~~~
icebraining
I disagree. Feedfetcher is no different than a browser: it fetches the URL the
user inserted, nothing more (unlike a spider, which discovers URLs by itself).

~~~
lubujackson
Not true. It fetches the URL every single hour, not just when the user
requests it. So Google is claiming they can ignore robots.txt because it was
an action performed by a user (true) but they're unleashing a huge problem
with this background refreshing. Google is wasting gobs of their own money,
too. What if I made a bot that generated 1000s of Google accounts with 1000s
of spreadsheets hotlinking 1000s of big files stored on S3? This one guy's one
file did TERABYTES of transfers over a week. The underlying problem is that
Google is relying on the domain name to indicate the company size, and thus
the bandwidth allocation for this service.

~~~
icebraining
Background refreshing is a common feature in client applications, like RSS
readers. I think their reasoning makes sense.

I do think they should change their process (making it lazy-load instead), but
that's a different issue to robots.txt.

------
Freaky
Amusing thing about that page - it's full of '\$100', and uses Javascript to
strip the \'s out, replacing them with empty <span> elements. Not sure I
really want to know why...

~~~
Panos
Mathjax :-)

~~~
Freaky
Ah, that's a good excuse. Thanks :)

------
lubujackson
Here's the big money question, since I haven't used S3 - do you have any
throttling control? Can you simply block the feedfetcher bot, or block
repeated hits like this at all? It seems like S3 is a nightmare $$$ hole if
there aren't some really robust tools to manage this sort of problem.

------
jyothi
Sometimes a 509 Bandwidth limit exceeded helps!

AWS could actually add a setting for max bandwidth per hour or so, and alert
early if there is suspicious activity.

------
SagelyGuru
Scary indeed. Many thanks for the warning. I was contemplating starting to use
Amazon Cloud and some Google tools, but I definitely will not touch any of it
now. Who knows how many other traps like this are lying in wait for the
unwary?

All this automation is very well, but this illustrates the dangers of running
stuff on others' machines of unknown complexity and out of your own control,
while having to pay for whatever may happen. Not for me, thanks.

~~~
dasil003
I don't think the OA would want you to take this as FUD. These are incredibly
powerful tools that give you more control than the alternatives (AWS anyway),
and yes unintended consequences are possible, but this really is a freak
occurrence.

------
matthieupiguet
Seeing how popular the story is, Amazon could not have made a better $1000 PR
campaign than being classy and reimbursing him!

------
devs1010
"What I find fascinating in this setting is that Google becomes such a
powerful weapon due to a series of perfectly legitimate design decisions."

I call into question that these are "perfectly legitimate design decisions".
Basically, if Google thinks that the data is private or too sensitive to
cache, then it shouldn't be this easy to have it automatically keep hitting a
site like this. Google should have realized the potential for abuse here. I'm
guessing it truly is an oversight on their part, as I can't imagine they would
want to waste all this bandwidth either; however, it's something they should
figure out a solution for.

------
arunoda
This is the same kind of attack I've demonstrated here, but against users of a
popular analytics service: <http://news.ycombinator.com/item?id=3873774>

------
otterley
I'm curious to know whether the objects were stored with Cache-Control: or
Expires: headers. Does having such headers make a difference?

Clearly the client's not presenting If-Modified-Since: pragmas as I believe S3
honors those.
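
For what it's worth, S3 will serve whatever Cache-Control you store on an
object. A hedged boto3 sketch (bucket and key are placeholders) of retrofitting
such a header onto an already-uploaded object by copying it over itself:

    # Add Cache-Control to an existing S3 object. MetadataDirective must be
    # REPLACE, and Content-Type should be restated or it falls back to the
    # default.
    import boto3

    s3 = boto3.client("s3")
    s3.copy_object(
        Bucket="mybucket",                          # placeholder
        Key="large-image.png",                      # placeholder
        CopySource={"Bucket": "mybucket", "Key": "large-image.png"},
        MetadataDirective="REPLACE",
        CacheControl="public, max-age=86400",       # let clients cache a day
        ContentType="image/png",
    )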

------
anthemcg
Quite an interesting article. I didn't even think of it as a potential DDoS
tool until he mentioned it. It makes sense, but it seems like Google should
have built a fail-safe for a situation like that.

------
nfonrose
You can use <http://cloudcost.teevity.com> to protect yourself from such
situations (disclaimer : I'm the founder and CEO)

------
cabalamat
I have a VPS. When I read stories like this, I think there isn't any point in
going over to GAE or AWS. But lots of people do use these services, so what am
I missing?

------
_k
He's getting 100+ requests per second.

You could rate limit the IPs. The question is how many IPs Feedfetcher is
using.
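
A toy sketch of what per-IP rate limiting could look like (a token bucket,
illustrative only; in practice you'd do this at the web server or CDN layer
rather than in application code):

    # Allow RATE requests/second per IP, with a small burst allowance.
    import time
    from collections import defaultdict

    RATE = 1.0      # sustained requests per second per IP
    BURST = 10.0    # burst allowance

    buckets = defaultdict(lambda: {"tokens": BURST, "last": time.monotonic()})

    def allow(ip: str) -> bool:
        b = buckets[ip]
        now = time.monotonic()
        b["tokens"] = min(BURST, b["tokens"] + (now - b["last"]) * RATE)
        b["last"] = now
        if b["tokens"] >= 1.0:
            b["tokens"] -= 1.0
            return True
        return False

    # A bot hammering 100 requests back-to-back gets mostly rejected.
    print(sum(allow("203.0.113.7") for _ in range(100)))  # roughly 10 allowed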

------
rorrr
Kudos to Amazon for refunding the guy. I don't think any other hosting company
would just take the bullet for the customer.

------
drivebyacct2
>"But how come did Google download the images again and again?"

"But how come did" indeed.

~~~
drivebyacct2
Communication skills matter, sorry.

~~~
arunoda
But you get the idea what he trying to say no? We all here are not speaking
English as the mother language. Please excuse :)

------
timwang
Very interesting findings, good read.

