
Google bots follow SQL-injected URLs from third party sites - cubictwo
http://blog.sucuri.net/2013/11/google-bots-doing-sql-injection-attacks.html
======
mrtksn
So, this is basically using the Google bot as a proxy. I don't see what Google
can do about it, or whether Google has to do anything at all.

Just secure your website as if the attacker is not going to use the Google bot
as a proxy, because well, nobody can guarantee that the attacker will use the
Google bot.

Google bot is nothing more than a computer sending HTTP GET requests to your
server, so if your site is vulnerable to HTTP GET requests, why write as if
there is something special about Google bot?
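
A minimal sketch of what "secure your website" means here, whoever sends the
GET request. It assumes Python's built-in sqlite3 module; the `products` table
and the `find_product` helper are hypothetical and not taken from the article.

```python
import sqlite3

def make_db():
    # Hypothetical in-memory table, purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO products (id, name) VALUES (?, ?)",
        [(1, "widget"), (2, "gadget")],
    )
    return conn

def find_product(conn, raw_name):
    # The value taken from the URL is bound as a parameter and never spliced
    # into the SQL text, so it does not matter whether a browser, a script,
    # or a crawler sent the request.
    cur = conn.execute(
        "SELECT id, name FROM products WHERE name = ?", (raw_name,)
    )
    return cur.fetchall()

conn = make_db()
print(find_product(conn, "widget"))             # normal lookup
print(find_product(conn, "widget' OR '1'='1"))  # injection attempt matches nothing
```

With the parameter bound this way, the injected string is compared as a plain
value, so the second call finds no rows instead of returning the whole table.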

~~~
djjaxe
Well, it would be nice if Google could check the URL it is about to visit and,
if there is any sqli in it, not send the request (though this could potentially
slow their crawling...)

~~~
geofft
How is Google supposed to check for what is/isn't "sqli"? The proposal reminds
me of Yahoo! Mail's old "medireview" problem, where it filtered emails
containing the string "eval":

[http://en.wikipedia.org/wiki/Medireview#Blocked_emails](http://en.wikipedia.org/wiki/Medireview#Blocked_emails)

Even if you look for somewhat complete SQL strings, if I want to host
[http://try-sql-in-your-browser.io/?sql=select+foo+from+bar](http://try-sql-in-your-browser.io/?sql=select+foo+from+bar),
I'd want Google to index it.
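
To make the false-positive problem concrete, here is a rough sketch of the
kind of check being proposed. The `SQL_LIKE` pattern, the `looks_like_sqli`
function, and the example.com URLs are all hypothetical; nothing here reflects
how Google's crawler actually works.

```python
import re
from urllib.parse import urlparse, parse_qsl

# Deliberately naive, hypothetical pattern: flags anything that looks like SQL.
SQL_LIKE = re.compile(r"\b(select|union|insert|drop)\b", re.IGNORECASE)

def looks_like_sqli(url):
    # parse_qsl decodes '+' and %XX escapes in the query string.
    params = parse_qsl(urlparse(url).query, keep_blank_values=True)
    return any(SQL_LIKE.search(value) for _, value in params)

attack = "http://example.com/item.php?id=1+union+select+password+from+users"
legit = "http://try-sql-in-your-browser.io/?sql=select+foo+from+bar"
plain = "http://example.com/search?q=medieval+history"

print(looks_like_sqli(attack))  # flagged
print(looks_like_sqli(legit))   # also flagged, although it is a legitimate page
print(looks_like_sqli(plain))   # not flagged
```

The filter cannot tell the attack URL from the legitimate one, because the
difference is in what the receiving site does with the parameter, not in the
URL itself.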

~~~
mikeash
For a hilarious variant trying to protect against "attacks" on humans, use
your favorite search engine to search for "buttbuttination".

~~~
hobs
Awesome, I had seen a few things in the wild like this, but this ended up in
some fun reading. Thanks!

------
ChuckMcM
This is sort of sad, now I have to go dig through crawler code to see what we
(Blekko) do when we crawl a site. We do avoid known 'traps' (which are often
inadvertent) that result in loops but this is a bit more nefarious.

On a related note, if you would like to be the person looking through the
crawler code and designing a defense for this, and are willing to work in the
Bay Area, send me an email :-)

~~~
jsmeaton
I don't believe this is a problem a crawler should be solving. The site left
open SQL injection vulnerabilities, and they're complaining that someone is
attacking them. Whether the attack originates from a crawler or not is beside
the point.

~~~
garethadams
The article is about an attacker using Google's behaviour as a form of
indirection. The attacker can't be traced directly by the vulnerable site if
he carried out the attack simply by triggering Google's crawler.

~~~
jsmeaton
I agree that it mentions indirection, but I don't agree that it is the purpose
of the article at all. The article finishes with this line:

> We are contacting Google about it, but it is always something important to
> keep in the back of your mind. _You can’t just whitelist their IP, and
> allow through without any type of inspection_.

I wouldn't be surprised if the authors knew of the sqli vector and whitelisted
appropriate clients. The problem is the sqli, not WHO can hit it (or who was
ultimately responsible).

Edit:

I just went and looked at the company behind this article. I was too quick to
judge apparently. Their business is protecting developers from stupid mistakes
like sqli at the firewall.

Therefore, the article really IS about indirection. Sorry.

------
iLoch
I think the real problem here is that the author is assuming the Google Bot
somehow gets special treatment. Why are you concerned that the bot is sending
SQL injection attacks? The same thing could be done by a regular user. Are you
defending against regular users? If so, then you're defending against Google
bots too. Now if the problem is that your firewall is blocking Google bots,
well then you're going to have to let them through if you want Google traffic.
It's unfeasible for Google to fix this on their end. Your application should
be secure from SQL injection without the need for a firewall anyway, so it
shouldn't be a technical problem having to make an exception for Google bots.

~~~
bryanlarsen
The problem is not that the Google Bot gets special treatment, but that
attackers get special treatment. It's common to blacklist or slow responses to
the IPs of potential attackers. A slow response to a false positive is
usually acceptable as long as you have a low rate of false positives, but it
can seriously affect your Google ranking if you are slowing their bots...

------
lazyjones
This could also be Google probing for vulnerable software on your website.

They're doing it for various Typo3 versions (I know, because we got some false
positives in the past - Google Webmaster tools warned us about it, we saw the
requests in logs, our fault for replying with Status 200 for some invalid URLs
where we just showed our main page), they might be doing it for other software
where the only way to check for vulnerabilities is to try an actual (harmless)
SQL injection.

------
badman_ting
But why would the attacker want this, how can it benefit them? I don't
understand how a third party could make Googlebot hitting a SQLi benefit
themselves. I get the cleverness of hiding behind Googlebot, I just don't see
how you can "steer" the attack, so to speak, to your own benefit.

~~~
bazzargh
There's a long history of being able to search google for sites with
vulnerabilities or infections if you know what to look for. With a black hat
on, alter a busy forum's code to generate a link spreading the infection via
google to every domain that commenters mention; a tiny fraction of those will
be infected, those will show up later in a search.

That's a bit weak, and might not pick out decent targets. However, visit the
forum as a user, mention some _specific_ sites, and hey presto - google
infects them for you, while you disappear in the crowd.

Seems a bit airport thriller though. More likely the leaked sqli and google's
use of it were accidental.

------
asm89
Wondering how effective this is for getting a competitor website to block for
example the google crawler. Maybe that's the motive.

------
vezzy-fnord
Reminds me of an anecdote I read on Stack Overflow where a web developer
inserted an unprotected database killswitch parameter in his code which ended
up being triggered by a Googlebot.

This is a pretty interesting attack vector, though. The old Google bomb being
used for more nefarious purposes than spoofing page rankings.

------
tylerkahn
Spoofing IP headers (specifically the source address) is possible.

~~~
jamescun
This would only work on connectionless protocols, such as UDP. UDP offers no
message reliability and relies on the source address to respond. This is how
DNS amplification attacks work, as the main protocol of DNS is UDP.

HTTP, on the other hand, goes over TCP, which is a connection-oriented
protocol. TCP offers reliable delivery by going back and forth between client
and server to verify that each message was received successfully. Without a
valid source address, the 3-way TCP handshake used to establish the connection
cannot succeed.
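
A toy model of why the handshake fails with a spoofed source address. This is
a simplified simulation, not real networking code: `ToyServer`, `handshake`,
and the two documentation-range addresses are made up for illustration.

```python
import secrets

class ToyServer:
    """Toy model of the server side of a TCP three-way handshake."""

    def __init__(self):
        # claimed source address -> the server's initial sequence number
        self.pending = {}

    def on_syn(self, claimed_src):
        # The SYN-ACK carries a random sequence number and is sent to the
        # *claimed* source address, wherever the SYN really came from.
        isn = secrets.randbits(32)
        self.pending[claimed_src] = isn
        return claimed_src, isn  # (destination of the SYN-ACK, its sequence number)

    def on_ack(self, claimed_src, ack_number):
        # The connection is established only if the ACK echoes isn + 1.
        expected = self.pending.get(claimed_src)
        return expected is not None and ack_number == (expected + 1) % 2**32

def handshake(server, real_addr, claimed_addr):
    dest, isn = server.on_syn(claimed_addr)
    # Only the host that actually owns `dest` ever sees the SYN-ACK.
    if dest != real_addr:
        # Nothing to acknowledge; a real attacker would have to guess
        # a 32-bit number blindly.
        return False
    return server.on_ack(claimed_addr, (isn + 1) % 2**32)

me = "192.0.2.10"        # hypothetical attacker address
crawler = "198.51.100.7" # hypothetical address standing in for a crawler

print(handshake(ToyServer(), me, me))       # honest client completes the handshake
print(handshake(ToyServer(), me, crawler))  # spoofed source never completes it
```

Since no connection is established, no HTTP request is ever delivered, which
is why the attack in the article relies on getting the real crawler to make
the request instead.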

~~~
hnha
thank you, I always wondered but never asked.

~~~
AYBABTME
Coursera had a great Computer Network course.

I think you can still register and download the videos:

[https://class.coursera.org/comnetworks-002/](https://class.coursera.org/comnetworks-002/)

------
rshm
I think the cause here is google following the injected urls from third party
sites.

~~~
axblount
That's exactly what the article says.

