
How to contact Google SRE by dropping a shell in Cloud SQL - fanf2
https://offensi.com/2020/08/18/how-to-contact-google-sre-dropping-a-shell-in-cloud-sql/
======
antoncohen
> One of our interesting findings was the iptables rules, since when you
> enable Private IP access (Which cannot be disabled afterwards), access to
> the MySQL port is not only added for the IP addresses of the specified VPC
> network, but instead added for the full 10.0.0.0/8 IP range, which includes
> other Cloud SQL instances.

> Therefore, if a customer ever enabled Private IP access to their instance,
> they could be targeted by an attacker-controlled Cloud SQL instance. This
> could go wrong very quickly if the customer solely relied on the instance
> being isolated from the external world, and didn’t protect it with a proper
> password.

I'm not convinced by this; I'm not sure it is vulnerable in the way the author
suggests ("they could be targeted by an attacker-controlled Cloud SQL
instance").

First of all, GCE has firewall rules outside of iptables. But the main thing
is that the way Cloud SQL does Private IP is via VPC peering. Google creates a
VPC on their side, runs MySQL in it, and peers that VPC with your VPC. You
actually tell Google what CIDR range to use in their VPC (the Cloud SQL
VPC).

I don't think it is fair to assume that all customers are in the same VPC and
same subnets, with routes between them and no GCE firewall rules blocking
them.

~~~
epereiralopez
We found every Cloud SQL instance runs in a Google-owned project called
"speckle-umbrella-<num>", with <num> being a number between 1 and 80. Each
speckle-umbrella-* project contains several Cloud SQL instances belonging to
different customers, and they do seem to be on the same network and without
proper firewalling, because we ran zmap on 10.0.0.0/8 and could see several
IPs with the MySQL port open (we did not try to connect to any of them,
though).

This problem would probably have been avoided if Cloud SQL used a different
tenant project per customer (something most other GCP services do), but for
some reason it doesn't do that.

~~~
antoncohen
That is interesting. There is some magic networking going on if Google allows
every customer to allocate an IP range of their choice, and the customer can
use all the IPs in that range, and Google runs multiple customers on the same
network (same VPC and subnet).

A project can contain multiple VPCs. And a VPC can contain multiple subnets,
but not with overlapping ranges.

[https://cloud.google.com/sql/docs/mysql/configure-private-services-access#configure-access](https://cloud.google.com/sql/docs/mysql/configure-private-services-access#configure-access)

------
emptyparadise
The lengths you have to go to to talk to a real person at Google!

~~~
9dev
We had an issue with our GKE cluster once, which first threw an unknown error
during a (much anticipated) bug fix release and was subsequently stuck in
some kind of loop. No other deployments could be created, three notifications
about an unknown error were spawned per second, and the audit log was
overflowing. We tried to reach someone at Google; no chance. The situation
fixed itself after a few days, presumably because some kind of timeout was
reached.

~~~
mcintyre1994
This is Google Cloud GKE? You can't speak to support there within a few days
on a paid account when their systems completely break?

~~~
9dev
Unless you pay for a separate support subscription, no. Not even if it’s their
systems at fault.

~~~
Ansil849
I mean, this seems fair to me. If you want support, you have to pay for it.

~~~
danudey
If I'm paying for a service, I expect to be able to speak to someone when that
service breaks. If Google Cloud breaks my GKE cluster, it's unreasonable that
I can't get anyone to go and fix it without having to pay extra money.

If I break it, sure, fine, no problem. If it's their fault, I expect to be
able to hold them to account.

~~~
donor20
That's absolutely not the agreement. If the service breaks you will be
eligible for SLA credits if the service downtime exceeded the SLA target.

The GKE SLO is something like no more than 3.5 hours of downtime per month.

"I expect to hold them to account".

No - if you are not paying for support you will just be able to claim the
credits. You do this by filling out a form within 30 days. That's it.

~~~
ttymck
I, and I'm sure others, appreciate this response. But I think if someone is
under the belief that "I expect to hold them to account", then the only
appropriate response is: "Then GCP is not for you"

~~~
donor20
Not entirely true. If you have a business need that demands this level of
holding another big company to account / getting a response, then you pay
for it.

I think Google's higher-end service level starts with a $150K/year base fee +
a cut of spending. That's actually a pretty good deal (roughly one FTE) to
have your back much more covered when there are issues - I think they work
towards 15-minute response times there. Plus they can help you avoid screwing
up your own redundancy planning through reviews of your setup.

What wasn't clear from the parent was whether they expect to hold Google
($100B+/year) to account while spending $2,000/month - that isn't going to
happen at all, as Google has already outlined how they will compensate you
for downtime.

Finally - for really large deals you can negotiate with their sales folks.

~~~
ttymck
An important distinction, absolutely.

------
blasdel
The /mysql/tmp/greetings.txt trick was cute

but do kids these days not know about
[https://linux.die.net/man/1/wall](https://linux.die.net/man/1/wall) ?

~~~
jeffbee
These systems are stripped down to the bare minimum. There's no reason to
believe that every "standard" program, and certainly not setgid programs
like wall or write, would be present.

~~~
peterwwillis
All you need is write access to the pty fd (or in the case of a reverse shell,
just the fd of the tcp socket). The SREs could talk to the hackers and the
hackers could just echo stuff in their terminal which the SREs could read.
Writing a file to disk is less l33t, but more straightforward :)

Edit: I think I was wrong; you can't manipulate network socket fds this way,
you'd have to use ptrace() on the process. If it were a real shell with a pty
I believe what I suggest could work, but reverse shells don't open ptys.

~~~
mkj
The "# cat greetings.txt" has a # suggesting they sorted out a real pty
somehow. Or it was faked later :)

~~~
marcan_42
The usual trick to get a pty is `script /dev/null`, by the way, if that
command is available.
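When `script` isn't there either, a Python interpreter often is; the commonly used fallback is `python -c 'import pty; pty.spawn("/bin/sh")'`. Here is a small sketch (my own illustration, not from the article) of why the pty matters: a shell whose stdio sits on a pty fd sees a terminal and prints a prompt (hence the `#`), while a shell on a plain pipe or socket does not.

```python
import os
import pty

# Allocate a pseudo-terminal pair; a shell attached to the slave end
# behaves interactively because isatty() is true on its stdio.
master_fd, slave_fd = pty.openpty()
print(os.isatty(slave_fd))  # True

# A plain pipe (like the fds behind a basic reverse shell) is not a tty,
# so the shell stays in non-interactive mode and prints no prompt.
read_fd, write_fd = os.pipe()
print(os.isatty(write_fd))  # False
```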

------
gwittel
Nice work and writeup. All stemming from very basic mistakes -- SQL and
command injection.

Worrying that the CloudSQL internals (like the private IP range) aren't
strongly walled off. It will be interesting to see how this changes in
response to the researchers' work.

~~~
trhway
looks like a shortcut:

> the container was configured with the Docker host networking driver
> (–network=host).

~~~
waheoo
unreal. lol.

~~~
lima
Probably wasn't meant to be a security boundary.

------
spicyramen
This is the type of write-up that used to appear on _hackernews_. Great work!
~~~
enneff
It is evidently the type of write up that is still posted to HN.

~~~
spicyramen
Yep, we see this less and less.

------
Teppich
You have to realize something, and it's true for Azure, AWS, and Google
(practical experience):

You have to pay for your support, and it's not cheap.

It's a good thing on one side: you get yourself a cheaper self-service
solution and you can use a ton of stuff until you really need support. It's a
bad thing if you haven't thought about this, because support costs money.

Big company with a support contract with Google: it's awesome. Srsly. You get
an answer within the next 4h, they will experiment, they will talk to the
product teams, they will keep you posted, etc.

~~~
Insanity
Yes, I have experienced both the "quasi direct line" and the "just a startup"
side. Things are different. And it does make sense that things are this way.
:-)

------
ransom1538
Thank you!

CloudSQL (an expensive wrapper around MySQL) restricts you from changing
wait_timeout (which is really terrible when using Cloud Functions) or
innodb_flush_log_at_trx_commit. Now, with my own reverse shell, I can finally
edit the CloudSQL variables and contact an SRE.

[https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_wait_timeout](https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_wait_timeout)

~~~
gouggoug
Google cloud SQL does not prevent you from changing wait_timeout
([https://cloud.google.com/sql/docs/mysql/flags](https://cloud.google.com/sql/docs/mysql/flags))

However, it seems like innodb_flush_log_at_trx_commit is not supported indeed.

------
29athrowaway
The write-up is excellent, but here is a tl;dr:

The Google Cloud console offers a way to export data into cloud storage based
on a SQL query expression.

By exploiting a SQL injection vulnerability related to that SQL query
expression field, the attacker learned that they had file write access to
/mysql/tmp.

The attacker found a second vulnerability affecting the API endpoint used to
export data. This endpoint invokes mysqldump behind the scenes.

Then, the attacker created a database with a malicious plugin embedded in a
BLOB. This plugin is a C program that creates a shell process whose standard
input and output file descriptors are aliases for a socket file descriptor,
creating a "reverse shell" through which the attacker can execute commands
remotely.
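That fd-aliasing step can be sketched in a few lines. This is a hedged illustration in Python rather than the C the actual plugin used, and it attaches the shell to a local socketpair instead of a connection back to an attacker-controlled host; the function name is my own.

```python
import socket
import subprocess

def spawn_shell_on(sock: socket.socket) -> subprocess.Popen:
    """Attach /bin/sh's stdin/stdout/stderr to a connected socket.

    This is the core reverse-shell trick: the socket's file descriptor
    is duplicated over fds 0/1/2, so everything the shell reads and
    writes travels over the connection instead of a terminal.
    """
    fd = sock.fileno()
    return subprocess.Popen(["/bin/sh"], stdin=fd, stdout=fd, stderr=fd)

if __name__ == "__main__":
    # Demonstrate locally: "ours" plays the attacker's end of the
    # connection, "theirs" is the end handed to the shell.
    ours, theirs = socket.socketpair()
    shell = spawn_shell_on(theirs)
    theirs.close()
    ours.sendall(b"echo hello from the shell\n")
    ours.shutdown(socket.SHUT_WR)  # EOF makes the shell exit
    output = b""
    while chunk := ours.recv(4096):
        output += chunk
    shell.wait()
    print(output.decode().strip())  # hello from the shell
```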

Then, this malicious database was imported via mysqldump, and the malicious
plugin was written to /mysql/tmp, then loaded, thus executing the malicious
plugin.

Using the reverse shell created by the malicious plugin, the attacker found
that the process was running in a Docker container using host networking,
which could be used by the attacker to monitor network traffic in the VM host
machine.

Using traffic monitoring, the attacker found traffic related to the Google
Guest Agent, and then used a TCP connection hijack attack to hijack
connections to the Google Guest Agent.

The connection hijacking was used to trick the Google Guest Agent into
authorizing a new SSH user, which was then used to escape the Docker
container onto the host VM.

~~~
biddlesby
Thank you! The article lost me after the reverse shell, but your explanation
helped me to understand.

------
YarickR2
The Google CTF event is this weekend; I believe it's not too late to gather a
team, register it, and take part. Such events are a very special kind of fun.
I participated in a CTF recently and quickly understood the limits of my
knowledge, as well as the multitude of ways to enrich the given context and
exploit it.

------
avmich
Just remembered - a long time ago, in a galaxy far, far away, to drop into
debugger - and get superuser permissions - on BESM-6 one could launch a clone
of "Colossal cave" on a terminal, get to a particular place inside and wave a
magic wand...

~~~
bloopernova
Wow, thanks for the nostalgia trip!

I think I might alias "XYZZY" to "sudo -i" :)

------
leblancfg
You had to scroll all the way down the article, but:

> Therefore, if a customer ever enabled Private IP access to their instance,
> they could be targeted by an attacker-controlled Cloud SQL instance. This
> could go wrong very quickly if the customer solely relied on the instance
> being isolated from the external world, and didn’t protect it with a proper
> password.

Anyone care to expand on that? Would that be common practice?

~~~
sleepydog
They're making a reasonable assumption for a bare-metal network, but their
conclusion doesn't hold for Google Cloud (and likely Azure/AWS). A customer's
Cloud SQL instances do not live in the same VPC as another customer's, and in
Google Cloud, VPCs are fully isolated -- 10.0.0.0/8 in VPC A is a distinct
network from 10.0.0.0/8 in VPC B, and the two cannot communicate unless there
is a Cloud VPN, VPC peering, or a multi-NIC instance acting as a bridge
between the two. The broad firewall rule is harmless; I must admit I'm not
even sure why the iptables rules are there at all. Probably something about
security and onions.

Source: I work in GCP support.

~~~
ThePowerOfFuet
If you work in GCP support, have you ever looked at the Cloud Spanner
project(s) in Pantheon? There are thousands and thousands of Cloud SQL VMs
sitting in one project, so it's absolutely plausible that there might be some
traversal possibilities.

------
rootsudo
That must've been such an exciting moment to have someone say hello.

Chilling too.

~~~
londons_explore
More importantly, it means those same employees probably have unaudited access
to the data in your cloud SQL database.

If SRE can write a file to the filesystem, they can totally copy the database
files out too.

I completely support SRE being able to log into instances for debugging, but
them being able to do that without leaving an audit trail visible to me, the
data owner, isn't up to modern standards IMO.

~~~
delroth
As far as I can tell, there is nothing in the writeup that suggests the file
wasn't there from the beginning and present on all Cloud SQL instances. I know
this was a standard practice in a few of the products I worked on where we
expected the first few layers of security/obfuscation would be peeled off by
curious outsiders.

This seems much more likely than OP's exploration being detected in realtime,
then someone SSHing into the container manually to put a message.

Disclaimer: I work at Google, not on Cloud SQL, and don't know anything
specific about that greetings.txt.

~~~
speckler
I work on Cloud SQL, we definitely don't add a file like that.

~~~
wtm_offensi
It could be the case that SRE recognized our accounts, since we are hunting on
a daily basis.

In rare cases SREs reach out to ISEs, who contact us on their behalf. In this
case a file was added. I personally don't see any reason why this would be a
bad thing; it's a managed service and we tried to attack it, after all. No
data was removed, nor is there any indication that there are no audit trails.
I hold both the ISEs and SREs in extremely high regard. (If you try to hunt
for bugs in GCP you will find out why.)

------
mkj
In the tcpdump screenshot the hostname is something.bugbounty.internal. Was
the author running against specific bug bounty machines or something?

~~~
wtm_offensi
The tcpdump screenshot was created after the bug was fixed. It demonstrates
the interception of traffic generated by the Google Accounts Daemon on an
instance running within my personal project.

------
terom
Curious about the container capabilities that enabled them to attack the host
network: per the docs [1], containers do not get `CAP_NET_ADMIN` by default,
but they do get `CAP_NET_RAW`. I assume that's what allowed them to
inspect/inject network traffic and thus spoof the HTTP response.

So `docker run --net=host --cap-drop=NET_RAW` seems like it might be a good
idea. I wonder if it's still needed for `ping` and such in modern Linux?

[1] [https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities](https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities)
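One quick way to check what capabilities a process (e.g. inside a container) actually holds is to decode the CapEff bitmask from /proc/self/status. A small sketch, Linux-only, with bit numbers taken from linux/capability.h:

```python
def effective_caps() -> int:
    """Return the effective capability bitmask of this process."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("CapEff:"):
                return int(line.split()[1], 16)
    raise RuntimeError("no CapEff line found")

# Bit numbers from linux/capability.h.
CAP_NET_ADMIN = 12
CAP_NET_RAW = 13

caps = effective_caps()
print("CAP_NET_ADMIN:", bool(caps >> CAP_NET_ADMIN & 1))
print("CAP_NET_RAW:", bool(caps >> CAP_NET_RAW & 1))
```

Run inside a default (non-host-network) Docker container, this would typically show CAP_NET_RAW set and CAP_NET_ADMIN clear, matching the docs linked above.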

~~~
justincormack
Mostly it's not needed for ping, but the config is not yet great in all
distros. I talked about this in
[https://docker.events.cube365.net/docker/dockercon/content/V...](https://docker.events.cube365.net/docker/dockercon/content/Videos/5xr3jskfFKk5jm6pL)

------
moon2
Hey, I recognize Offensi from the Liveoverflow video! [1]

I've been reading some Google VRP writeups [2] to inspire me in my bug bounty
journey. There are a few by Ezequiel Pereira and Offensi. There's some really
cool stuff that goes beyond XSS.

[1]
[https://www.youtube.com/watch?v=E-P9USG6kLs](https://www.youtube.com/watch?v=E-P9USG6kLs)
[2] [https://github.com/xdavidhu/awesome-google-vrp-writeups](https://github.com/xdavidhu/awesome-google-vrp-writeups)

------
grep_it
I had never heard of a reverse shell before. That is a very neat and simple
little trick!

~~~
hnick
You might also like
[https://en.wikipedia.org/wiki/Shellcode](https://en.wikipedia.org/wiki/Shellcode)
if you haven't heard of that either.

Shellcodes are binary strings that you write to memory or execute another way
(like this plugin situation) through an exploit to actually initiate a shell
under that process' user id. This can be local on a setuid process for
escalation or remote (there are 2 other types in that article in addition to
the reverse shell).

~~~
grep_it
Thanks for the share!

------
fnord77
I was able to follow this until the point of escaping the container.

How did they get the user wouter created on the host and how did it have sudo
access?

~~~
LethargicStud
GCP supports remotely loading public ssh keys onto a box. They do this using
the metadata endpoint - this is (in theory) a trusted API endpoint available
to instances @ 169.254.169.254. IAM actually uses this - when you call other
services, client libs reach out to the metadata endpoint and get IAM creds to
send with each request.

Anyway, they have a local process that polls the metadata endpoint and adds
authorized keys on the host. So you can e.g. upload your public key in the web
UI, their metadata endpoint will serve it up on your instance, the guest agent
will poll the metadata endpoint and add your key to the authorized_keys file.
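As a rough sketch of what the guest agent consumes: the value served under the `ssh-keys` metadata key is lines of `username:<openssh public key>`, and the agent creates any user it hasn't seen before appending the key. A minimal parser for that format (my own illustration; usernames and key material below are made up, and the real agent does more validation):

```python
def parse_ssh_keys_metadata(value: str) -> dict:
    """Parse a GCP-style `ssh-keys` metadata value into {user: [keys]}.

    Each non-empty line has the form "username:<openssh public key>".
    The guest agent creates the account if needed and appends the key
    to that user's ~/.ssh/authorized_keys.
    """
    users = {}
    for line in value.strip().splitlines():
        username, sep, key = line.partition(":")
        if sep and username and key:
            users.setdefault(username, []).append(key)
    return users

# A spoofed metadata response only needs to add one extra line to this
# value for the agent to create a brand-new user with the attacker's key.
metadata_value = (
    "alice:ssh-ed25519 AAAAC3Nz...legit alice\n"
    "wouter:ssh-rsa AAAAB3Nz...attacker wouter"
)
print(parse_ssh_keys_metadata(metadata_value))
```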

These folks spoofed a response from the metadata endpoint. They used
[https://github.com/kpcyrd/rshijack](https://github.com/kpcyrd/rshijack) to
inject their own hand-crafted public key, which the guest agent happily added
to authorized_keys (and created the wouter user).

They then ssh'd using their key:

> ssh -i id_rsa -o StrictHostKeyChecking=no wouter@localhost

> Once we accomplished that, we had full access to the host VM (Being able to
> execute commands as root through sudo).

Looks like they had passwordless sudo as well.

~~~
hnick
What's a mitigation for the spoofed packet? TLS or something of the sort?

~~~
paulfurtado
If the Docker container had been running with something other than
--net=host, this could have been avoided easily with standard networking
concepts (route tables with reverse path filtering, or iptables rules): even
if the attacker somehow managed to get CAP_NET_ADMIN in the container, the
host network namespace would still refuse the packets. Although, even with
--net=host you could add iptables rules that match based on the cgroup and
limit the IPs/ports allowed.

It'd also be possible to filter the container's syscalls with seccomp. I'm
not entirely sure why the container had CAP_NET_ADMIN at all, which is
required for tcpdump and the man-in-the-middle. Also, using user namespaces
would have limited the attacker's abilities even if they had root in the
container. A lot of defense-in-depth techniques are possible here.

There's also: simply not leaving the gcp login backdoor open. We run our gcp
instances similar to ec2: on first boot we take the ssh keys from the metadata
service and lay them down and do not run the GCP agent, we have standard
config management + ldap for login after the first boot. This means that a
hacker gaining access to your GCP credentials can't gain a shell on an
existing instance trivially.

------
vbernat
> Most of the time, we were able to type fast enough to get a successful SSH
> login :)

I thought it would have been automated by a script. Either they are super fast
typers or the metadata server is quite slow!

~~~
paulannesley
Looks like the metadata service uses HTTP long polling (“This request also
includes a timeout (timeout_sec=<TIME>)”), so I think they had a
<TIME>-second head start in racing against the real metadata response.

------
bobbydreamer
All this is an interesting read, knowing that so many people are still using
MySQL 5.7. No announcement this week on new version support.

Will you ever be on MySQL 8?

------
RMPR
This is excellent, I wonder how to reach such a level of understanding.

