
How I attacked myself with Google Spreadsheets (2012) - aliakhtar
http://www.behind-the-enemy-lines.com/2012/04/google-attack-how-i-self-attacked.html
======
tristanj
Previous discussion (763 points, 143 comments). For some reason this doesn't
show up in the "past" search results but shows up if you search for the
domain.

[https://news.ycombinator.com/item?id=3890328](https://news.ycombinator.com/item?id=3890328)

~~~
gus_massa
New title: "How I attacked myself with Google Spreadsheets (2012)" (I assume
someone added the 2012 later.)

Old title: "How I attacked myself using Google and I ramped up a $1000
bandwidth bill"

I think the problem is that Algolia looks for results containing _all_ the
words; the new title has the word "Spreadsheets", which is not in the old title.

I think Algolia has an option to allow variations and typos in words, but not
an option to match most of the words (like 4 of 5 words).
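The difference between those two matching policies can be sketched like this (a toy illustration, not Algolia's actual algorithm; the titles are shortened for clarity):

```python
# Toy illustration of "match all query words" vs. "match most of them".
# This is not Algolia's real matching logic, just the idea.

def matches_all(title_words, query_words):
    # Every query word must appear in the title.
    return query_words <= title_words

def matches_most(title_words, query_words, min_frac=0.8):
    # Enough of the query words must appear (e.g. 4 of 5).
    hit = len(query_words & title_words)
    return hit / len(query_words) >= min_frac

old_title = set("how i attacked myself using google".split())
query = set("how i attacked myself google spreadsheets".split())

print(matches_all(old_title, query))   # False: "spreadsheets" is missing
print(matches_most(old_title, query))  # True: 5 of 6 query words match
```

Requiring every word fails as soon as one new word ("spreadsheets") is absent from the old title, while a most-words threshold still matches.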

~~~
monochromatic
The URL is the same, though. That ought to be detected.

------
aresant
That is just such a fantastic headline. So close to linkbait but really just a
brilliant summary of what actually happened. So good I actually remember it
from last time (and clearly others did too based on first comment!). I'm
always amazed by what enters our collective consciousness based on exceptional
wordsmithery.

------
thoman23
Apparently this is a repost, but I for one missed it the first time around.

I'll just say I found it to be a highly entertaining and well-written account
of a nightmare scenario I think many of us here can relate to: The unexpected
and unexplained exploding AWS bill.

------
wodenokoto
I know Google has cheaper bandwidth than most, but it's still amazing that
they are willing to pull 250 GB every hour of every day for a single, free
spreadsheet.
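For scale, 250 GB every hour of every day works out to roughly:

```python
# Rough monthly volume implied by 250 GB/hour, sustained around the clock.
gb_per_hour = 250
gb_per_month = gb_per_hour * 24 * 30   # hours/day * days/month

print(f"{gb_per_month:,} GB/month")        # 180,000 GB/month
print(f"~{gb_per_month / 1000:.0f} TB")    # ~180 TB
```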

~~~
subway
Welcome to the wonderful world of network peering.

I suspect Google is very transmit heavy on bandwidth usage. Peering agreements
tend to sweeten as your rx/tx ratio approaches 1, so increasing rx on the
network makes it easier to establish a peering arrangement, avoiding the need
to purchase transit.
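The ratio argument can be sketched with made-up counters (the byte counts are invented, not real Google figures):

```python
# Illustrative only: how pulling extra inbound traffic moves a network's
# rx/tx ratio toward 1, which makes it a more attractive peering partner.

def rx_tx_ratio(rx_bytes, tx_bytes):
    return rx_bytes / tx_bytes

tx = 9_000_000_000_000   # transmit-heavy: mostly serving content outward
rx = 1_000_000_000_000

before = rx_tx_ratio(rx, tx)
after = rx_tx_ratio(rx + 2_000_000_000_000, tx)  # extra fetching raises rx

print(f"before: {before:.2f}, after: {after:.2f}")  # before: 0.11, after: 0.33
```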

~~~
voltagex_
Is that why download traffic (incoming data) is free on EC2?

~~~
kbuck
Almost certainly. Many datacenters offer free incoming bandwidth either to
improve their ratio or just because it doesn't actually cost them anything
(e.g. they're paying for a 10 Gbit internet exchange port but their traffic is
highly asymmetrical on the outgoing side).

------
scintill76
> What I find fascinating in this setting is that Google becomes such a
> powerful weapon due to a series of perfectly legitimate design decisions.

It does have a certain "perfect storm of good intentions" quality, but no:
"prefetching" hundreds of gigabytes' worth of images that the user is not
looking at right now*, that will not be cached for the next time the user
views them, and that the user did not indicate would change frequently or had
recently changed, and doing it every hour on the hour (according to timestamps
in a screenshot), is not a "perfectly legitimate" design. Calling it that
implies, IMO, that there is nothing Google should change about this (maybe the
author does not mean that).

Maybe the author and I are missing something here -- why did Google think it
was necessary to fetch something that will not be immediately shown to the
user, nor cached for later? I can understand the no-caching
decision, but then why fetch at all if it's not needed _now_? Why is 1 hour
supposedly short enough for some hypothetical user that wants their
spreadsheet's embedded images to update automatically, but long enough to not
cause damage (wasn't long enough in this case)? And I hinted at "on the hour"
above because it seems like some sort of staggered refreshing would be better
on the CPUs and networks involved, though it wouldn't make a difference to the
author.

Even if for some reason they think fetching this aggressively and wastefully
is good, it seems like it's in Google's own interest to have some kind of
safety valve (bandwidth restriction, hard abort, something in between) after a
few hundred megabytes on one spreadsheet's refresh cycle. If nothing else,
that omission means it probably wasn't a "legitimate" design decision.

Wild theory: the author was accidentally causing the refresh somehow (or maybe
purposely automated but forgotten.) Somehow it seems more likely than Google
setting it up this way on purpose...

* I'm kind of assuming here, but the author doesn't mention anything like he was actively viewing the spreadsheet while the attack was happening. Even if he had it open (and with all the image-linked cells in view!) for hours on end, I stand by my other points that it's strange and not a perfect design for Google to auto-refresh in this fashion.
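The kind of safety valve being suggested is simple to sketch. Everything here (the budget value, the function shape) is hypothetical, not anything Google actually runs; `fetch(url) -> bytes` is injected so the policy can be shown without a network:

```python
# Hypothetical safety valve: abort a spreadsheet's refresh cycle once it
# has pulled more than a fixed byte budget, rather than fetching every
# image regardless of the running total.

DEFAULT_BUDGET = 500 * 1024 * 1024  # "a few hundred megabytes"

def refresh_with_budget(urls, fetch, budget=DEFAULT_BUDGET):
    spent = 0
    fetched = []
    for url in urls:
        body = fetch(url)
        if spent + len(body) > budget:
            # Hard abort: better to show a stale image than run up a bill.
            return fetched, spent, True
        spent += len(body)
        fetched.append(url)
    return fetched, spent, False
```

With a fake fetcher returning 100-byte bodies and a 250-byte budget, the third fetch trips the valve and the cycle stops.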

~~~
coreyja
I'm gonna test it out on a much smaller scale, and slightly different setup. I
just grabbed 3 images from a rarely used server hosted with DigitalOcean, with
a custom domain, and put them in a new Google Sheet using the same =image(url)
technique the author mentioned. The access logs show Feedfetcher-Google
([http://www.google.com/feedfetcher.html](http://www.google.com/feedfetcher.html))
grabbed each of the images once immediately, which makes sense. I'll check back
in a few hours and see if there are any other requests.

Edit: I also grabbed 3 different images and put them in a separate sheet. I'm
gonna leave one open on my desktop and not open the other and see if that
changes the requests.

One Hour Later: I think that Google is probably grabbing images on demand now.
There are 12 total requests on my 6 images. The first 6 are sporadic, which
correspond to when I added the images to the sheets. Then the next 3 are in
the same second, which come from opening the sheet on my desktop. And the last
3 also came in the same second, again from opening that sheet on my desktop. I
kept one open and closed the other, and neither has had the images requested
since.
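The check described above can be scripted against a combined-format access log. The sample log lines below are fabricated to show the shape, not taken from a real server:

```python
import re
from collections import Counter

# Count Feedfetcher requests per image path in combined-format log lines.
SAMPLE_LOG = [
    '1.2.3.4 - - [10/Apr/2016:12:00:01 +0000] "GET /img/a.png HTTP/1.1" '
    '200 52431 "-" "Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)"',
    '5.6.7.8 - - [10/Apr/2016:12:00:05 +0000] "GET /img/a.png HTTP/1.1" '
    '200 52431 "-" "curl/7.47.0"',
]

def feedfetcher_hits(lines):
    hits = Counter()
    for line in lines:
        if "Feedfetcher-Google" in line:
            match = re.search(r'"GET (\S+) HTTP', line)
            if match:
                hits[match.group(1)] += 1
    return hits

print(feedfetcher_hits(SAMPLE_LOG))  # Counter({'/img/a.png': 1})
```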

~~~
onli
I'm pretty sure they fixed it. I don't remember whether that was already in
the original discussion of this story or a little bit later, but I remember
the comment that one can't use it anymore as an attack tool since the
feedfetcher behaviour changed.

------
spdionis
A lot of people complain about Amazon lacking a cap on spending every time
such a story appears. Invariably, though, every time I've read about something
like this happening, Amazon dropped the bill if the usage was not intentional.

Honestly, from what I've seen, this policy of Amazon's is really nice, and if
they did otherwise they'd constantly get a lot of bad PR. Cases like this
probably happen often, but not everyone writes about it. A lot more people
would write rant blog posts if Amazon didn't drop such bills.

------
x1798DE
Does anyone know if Google ended up changing their behavior on this?

I'm struggling to see why this is a legitimate design decision on their part -
how is downloading a new copy every hour different from maintaining a
persistent cache wherever they are storing it after download?

~~~
_gopz
> Since these URLs are private, Google does not want to store them anywhere
> permanently in the Google servers.

~~~
x1798DE
Yeah, I read that part; I just don't understand it. They are obviously storing
the contents of the image somewhere (in memory?) for an hour, otherwise
there's no point prefetching it in the first place. At the end of that hour,
they download it again. If the image isn't different, they would be better off
not throwing away the old copy.
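What's being described is essentially what HTTP conditional requests already provide: keep the copy, revalidate with `If-None-Match`, and only re-download on change. A minimal sketch of that decision logic, stripped of the networking:

```python
# Cache-revalidation logic in miniature: keep the stored copy and ask the
# server whether it changed, instead of unconditionally refetching.

def revalidation_headers(cached_etag):
    # An If-None-Match header turns a plain GET into a conditional one.
    return {"If-None-Match": cached_etag} if cached_etag else {}

def apply_response(status, body, cached_body):
    if status == 304:        # Not Modified: the server sent no body at all
        return cached_body   # the hour-old copy is still good
    return body              # 200: the image really changed; replace it
```

On a 304 the server transfers only headers, so the hourly cycle would cost almost nothing when images are unchanged.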

------
vortico
Why do people choose to use hosting services with no cap, and then complain
that their bills are arbitrarily high? You agreed to that in the Terms and
Conditions.

Just use a service with a fixed monthly rate for a fixed capacity, and
up/downgrade as needed. Of course you don't want your service to be shut down
after reaching a limit, but you should be watching the resources as you would
with AWS, only the consequences are much less bizarre than a surprise $1,700
bill.

~~~
akshatpradhan
People aren't as self-compliant with watching their resources as you think.

------
scintill76
If content negotiation[1] had a standard way to express it, Google's client
could have told the server it only needed a thumbnail of size N, and a smart
server could have served fewer bytes.

[1] [https://developer.mozilla.org/en-US/docs/Web/HTTP/Content_negotiation](https://developer.mozilla.org/en-US/docs/Web/HTTP/Content_negotiation)
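Classic content negotiation has no standard size hint; the later HTTP Client Hints work (`Width`/`Sec-CH-Width`) comes closest. A hypothetical server-side sketch of the idea, with an invented renditions table:

```python
# Hypothetical: a server honoring a client's width hint by serving the
# smallest rendition that is still wide enough. Widths and byte sizes
# here are made up for illustration.

RENDITIONS = {320: 18_000, 1024: 210_000, 4000: 2_500_000}  # width -> bytes

def pick_rendition(hinted_width):
    for width in sorted(RENDITIONS):
        if width >= hinted_width:
            return width, RENDITIONS[width]
    widest = max(RENDITIONS)     # no rendition wide enough: send the largest
    return widest, RENDITIONS[widest]

print(pick_rendition(300))   # (320, 18000): a thumbnail, not megabytes
```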

------
leni536
Is it a good idea to use AWS through a prepaid virtual card to avoid such
cases? I'm planning to set up a personal site, and I could afford the site
going down instead of paying $1000. The guy got a refund in the end, but I
would rather not go through this hassle.

~~~
abricot
Well... even though they wouldn't be able to charge the money to your card,
you would still owe them.

~~~
sisl
Yep.

Even if Amazon claimed the money you owe in real time, they probably
wouldn't shut you down when your prepaid hits zero.

I use a prepaid for my Kindle purchases and Amazon processes the order even if
the payment gets rejected, following up with increasingly insistent reminder
emails.

------
bearzoo
I REALLY REALLY want someone to do this with a huge number of Google image
thumbnails so that the Google crawlers just start hitting Google servers.
Would it be considered malicious to do such a thing?

~~~
danso
> _Would it be considered malicious to do such a thing?_

Maliciousness is usually described in the context of intent. So your first
sentence should provide the answer to your own question. Is there a way to do
what you describe such that the result is gratifying and yet doesn't cause
significant duress on Google's servers and employees?

~~~
jon-wood
Just doing it the way described wouldn't cause significant duress on Google's
servers. You'd be requesting that Google transfer data from a service designed
to handle massive volumes of traffic, across their own network, to another
service designed to handle massive volumes. Honestly, I'd be surprised if it
even tripped any monitoring alarms.

