Remote Code Execution on a Facebook server (scrt.ch)
877 points by phwd 6 months ago | 192 comments

Hey, developer of Sentry here. We're submitting a patch that prevents showing any settings on this DEBUG page.

I'd like to mention that we never suggest running Sentry in DEBUG mode, nor do we document how to do this. Sentry does use Django, so it's pretty easy to put pieces together and figure out how to do it.

So while we can help obfuscate this specific issue, running in DEBUG mode always has the potential to surface insecure information due to the nature of it.

Our patch to suppress this information entirely can be found here: https://github.com/getsentry/sentry/pull/9516

This is why you (looking at frameworks) should never use a format that may contain code to store data, especially when the client has control over that data (even if signed). The same vulnerability has occurred in almost every language/framework that does this, including Rails and Java-based ones. Just use something like JSON, which completely avoids code execution vulnerabilities like this. Except of course for the early JavaScript JSON parsers that just used eval for parsing...

Yes, Django realized this 7 years ago [0], and JSON has been the default for the past 5 years [1]. It's just that Facebook was using an old version of Django, and probably not the default.

[0] https://groups.google.com/d/msg/django-developers/YwlZ9m9k1b...

[1] https://github.com/django/django/commit/b0ce6fe656873825271b...

Whoa, really? Django 1.6 has been out of extended support since April 2015, if I read https://www.djangoproject.com/download/ correctly...

We maintain our own patches. We want to upgrade, but because the migration framework we use does not support newer versions of Django, finding a good upgrade path has been suboptimal. It's likely we're going to just make the migration framework compatible with newer Djangos (work is under way).

Even without the pickle-related vulnerability, exposing your secret key that is securing cookies seems pretty bad and likely to lead to other vulnerabilities, although they'd take longer to find.

And if the secret key were secure, the pickle use would not be vulnerable.

Still, multiple layers of security, yadda yadda, sure.

But this is beyond the pickle issue. I'm not sure I'm completely convinced you should not use pickle for browser cookies that are appropriately cryptographically verified. (Although FWIW I believe Rails changed its default cookie serialization to use JSON instead of the Ruby equivalent to pickle, which suffered from the same issues.)

Isn’t it still an escalation of privileges? You’ve gone from having a proper session of system A (limited damage potentially) into code execution on machine Z.

yup, sounds right.

PHP has this problem too - never use PHP unserialize() to unserialize hostile (i.e. user-accessible) data. It won't end well for you. I think it's true for any sufficiently powerful serialization format...

Had, https://wiki.php.net/rfc/secure_unserialize

(but yes you still need to implement a safe unserialize white list)
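For comparison, the Python counterpart to an unserialize white list is overriding `find_class` on `pickle.Unpickler`, along the lines of the "Restricting Globals" pattern in the pickle docs. A minimal sketch: the allowed set and the `Sneaky` class are made up for illustration, and this only narrows what a pickle can reference; it does not make pickle a good choice for hostile input.

```python
import builtins
import io
import pickle

# Hypothetical allow list: only these builtins may be referenced by a pickle.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks for a global (class or function).
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads(), but refuses globals outside the allow list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Sneaky:
    # On a plain pickle.loads() this would call print(); any callable works here.
    def __reduce__(self):
        return (print, ("this would run during unpickling",))


ok = restricted_loads(pickle.dumps([1, 2, range(3)]))

try:
    restricted_loads(pickle.dumps(Sneaky()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

Plain containers and the listed builtins load normally; the `Sneaky` payload is rejected before its callable is ever invoked.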

I also don't understand why frameworks insist on returning debug data as part of the response. If you are working on something in a development cycle and need debug information, surely you have access to the server that's running in your console. I've always printed debug information there and then sent the response. That prevents these kinds of leaks if the server is deployed with debug turned on.

1. Pretty formatting of stack traces and other info, including making use of CSS and JavaScript. For an extreme example, consider Werkzeug which allows stack frames to be inspected interactively in the rendered stack trace.

2. There are many situations where the person who we want to see the stack trace and debug output does not have access to the console output. E.g. hosting the service internally for a QA team.

In general, it is an error often made by people who focus on security to think that security considerations always trump convenience. In fact, it is a trade off like any other.

And "something like JSON" could mean YAML, which had its own share of RCE bugs. It's still better than the object deserialization bugs one can find in Java or Python's pickle, which seems to be even more permissive.

Python Pickle RCE is hardly even a bug - because it is meant to deserialize objects (including functions!) for a language that is almost completely dynamic. Indeed, "discovering" that your unpickling is vulnerable to remote code execution is not too far from discovering RCE in a plan to compress data to the Kolmogorov limit by allowing users to send you arbitrary x86 binaries.

This is a great concrete example why you should never run debug mode on a public server. Django can only do so much for redacting private info. This is also a great example of how insecure pickle is!

Luckily, Django provides checks to avoid this kind of leakage before hitting production with https://docs.djangoproject.com/en/2.1/howto/deployment/check...

And to be fair, the docs are littered with warnings about not using debug mode in production. Debug mode is open for... debugging, not production.

wow thanks - just ran this on some production servers and caught some interesting stuff! had no clue this command existed.

They were running django 1.6 ....

I know this is going to get some jeering, but that's one nice thing about .Net's machine.config <deployment retail="true" />, you designate the machine itself as a non-development environment and tracing, debug output, and so on are disabled for all .Net/ASP.Net applications.

That might not work for all edge cases, but broadly there's a lot of machines which are only for non-development/production code, and a system-wide setting makes you a lot safer.

Funnily enough, it is mostly .NET applications running in production that I see stack-traces from these days.

If you even know something is a .Net app then the developer is likely terrible, so it is a self-selecting sample. You don't know most of the .Net MVC apps you interact with because there's no reason to know that.

Boy, getting hammered for poor phrasing. What's meant is if the stack (whatever it is) bleeds into the user experience, the dev has done a poor job.

This is really discriminatory and arrogant.

I used to feel the same way. It’s worth changing.

I think they mean if you can tell that it's a .NET app (e.g. because they left the X-Powered-By header on), not if it's a .NET app in general.

Facepalm. I've filled my dumb comment quota for the week.

Well, at least it was clarified. It got a couple upvotes, so I guess others misinterpreted it too. Sorry.

That's true for basically every stack

> If you even know something is a .Net app then the developer is likely terrible

For sake of argument, this is pretty easy to do just by looking at job postings.

Ever heard of Jon Skeet?

You can use env vars to do similarly for all the big frameworks (django, rails, etc)

Just a thought. This might actually be trivial to implement in a unix environment from the administration side of things. I'm not 100% sure but all child processes inherit the env vars from the parent correct? So setting `environment=production` high up in the process tree should make it available to all processes.

There's still a chance of this getting overridden down the line, and all apps have to conform to one style, but at least it's possible?
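To make the inheritance question concrete, here is a small sketch using only the standard library. The variable name `APP_ENV` is hypothetical (no framework reads it by default). It shows a child process seeing the parent's value, and a launcher that filters the environment making it disappear again.

```python
import os
import subprocess
import sys

# Tiny child program: report what it sees for the (hypothetical) APP_ENV variable.
CHILD = "import os; print(os.environ.get('APP_ENV', 'unset'))"


def child_sees(env=None):
    """Run a child Python process and return what it reports for APP_ENV."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,  # None means: inherit the parent's environment unchanged
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


os.environ["APP_ENV"] = "production"

# Default behaviour: the child inherits the parent's environment.
inherited = child_sees()

# A launcher further down the tree can drop or override the variable.
filtered_env = {k: v for k, v in os.environ.items() if k != "APP_ENV"}
scrubbed = child_sees(env=filtered_env)
```

So the setting does propagate by default, but only as long as nothing between the top of the process tree and the app rewrites the environment.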

Absolutely. But none of the major players are looking for it, and they all have unique ways of designating the machine that way.

There are no specific technological challenges; this is entirely political: getting half a dozen or more different projects with different priorities to check the same variable for the same purpose.

Plenty of production apps in my line of work are executed with `env -i` for consistency.

I don't know what the standard way is to deploy django or .net apps, but nodejs apps are often packaged into a docker container, or cloud function, that run independently, not on any particular machine that you designate as 'production'. So 'system-wide' settings don't really exist because there is no concept of a machine.

With nodejs in a docker container for example you specify whether the server is designated for production or not during build time, not when you run it.

I think you're a little over-confident in the Node community; I've seen a lot of Node apps deployed the same as any other (just tar'ed up and deployed via some kind of script or other deployment tool).

Seems a bit of the HN echo chamber; people here seem to think, because of the high quality stories and comments, that the entire world uses best practices for everything; containers, code reviews, versioning, test coverage, CI, automated deployments, etc. While in reality I think most places are not doing that. This is anecdotal of course but most (large and small) companies I have worked with (as clients or partners) are in fact doing none of these things.

Quite true. I hardly see any of the best practices that keep being discussed here.

Small to middle size companies whose main business is totally unrelated to selling software just care that their stuff works somehow. And everything that IT does is a cost center.

I guess that's .NET Framework only - is there something similar for .NET Core?

Exactly. I think you often see a warning about pickle when it comes to safety. For example, the current documentation has a big, red warning right at the top: https://docs.python.org/3/library/pickle.html

The Django project also has a lot of warnings about the pickle serializer: https://docs.djangoproject.com/en/2.1/topics/http/sessions/

I think maybe frameworks should stop calling it debug mode and start calling it "danger mode". Because clearly people aren't paying attention to what it actually does.

Facebook joins Patreon in the "why somebody should make sure our python web framework debug mode isn't enabled in prod" club.

Patreon's screw-up was a lot more embarrassing though - they apparently left an actual Python shell exposed to the web for at least a week after someone warned them about it, and their entire user database was exfiltrated and posted on the net as a result.

Yep. Every company will face security issues; it's unavoidable. But what happened to Patreon should make people seriously question trusting them with your personal information or money. (They probably use a 3rd party payment processor who has much better security practices, but still. Also, that 3rd party doesn't do you much good if an attacker with control of your production web/application servers or CDNs is intercepting credit card form data before it's sent off.)

Facebook has had vulnerabilities and exposures, but nothing like that.

Some surprising takeaways for me; Facebook uses a Django app in their infrastructure. That app was in debug mode, revealing server secrets. That app was also configured to use the Pickle based session storage, leading to one of the few serious RCE vulnerabilities in Django.

I'll have to remember this next time I think an exploit scenario is too unlikely.

Facebook runs the largest deployment of Django in the world -- Instagram.


Nice job! I also really appreciate the lack of memes and very concise format of this blog post

Also the title. Clear and concise without being click-bait (such as Facebook RCE for fun and profit, all your Facebook belong to us, etc...)

Those kinds of titles are really tired, but clickbait is unfair. "X For Fun and Profit" can be found in g-files back to the late '80s, a time well before clicking.


The point is, "for fun and profit" is such an overused and utterly boring cliché. Meaningful titles are pleasant to read and show that the writer has put some effort into bringing clarity to what they're trying to convey. (As someone who sits on a major open source conference talk panel, I cringe when I see one of these clichés slapped into the title without much thought. I politely suggest rephrasing to convey more "signal" in the title.)

I'm in absolute agreement. Nerd-culture has been on autopilot for decades and needs to find its next level.

This is a really interesting topic to me, because I've felt a pull toward things like this myself and seen it for decades. For example, the top of the README in my Zephyros project[1] was very playfully done, and very well received too.

People are naturally more lighthearted and organic than the sterile software and documentation we tend to write, and we're more social too, wanting to connect with others even if it means through a simple in-joke or catch phrase.

[1] https://github.com/sdegutis/zephyros

I am not saying to not be human and take all the zest out of the writing.

FWIW, I suggest the book On Writing Well. William Zinsser has valuable advice on how we can still retain humanity in our writing without being dull and lackluster when dealing with dry and "sterile" topics like software.

Part of nerd culture finding its next level will be we stop saying "next level".

Memes are the next level, it seems

An army of trolls that launch a fringe reactionary political movement which transcends the national boundary with postmodern "dank memes" seems to be the level we are at now. I wonder what's coming next...

As someone who attends open source conferences, thank you for your efforts to keep talk titles meaningful :)

(My pet peeve being "$thing 2: electric boogaloo", which just seems to be filling up space with nonsense words - at least "for fun and profit" makes grammatical sense...)

Don’t forget “How I learned to stop worrying and love [thing]”.

Just stop with these. They aren’t funny or original.

shish2k: electric boogaloo

Perhaps you should start a panel -

MODCON: Revenge of the Boring Moderator

You should write an article titled “For Fun and Profit Considered Harmful.”

The internet IS serious business. Combining it so much with popular culture just dilutes understanding and destroys standards.

Dijkstra framed it best:


Lack of memes - I knew I was missing something.

Completely agree - engaging and concise

totally agreed. my only pet peeve was the DD.MM.YYYY date format at the end of the post. Who does that?

Outside of the US, very few places use MM/DD/YY. https://en.wikipedia.org/wiki/Date_format_by_country

I'm in the US - and I exclusively use YYYY.MM.DD since it's the only thing that sorts properly in a filesystem
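A quick way to check the sorting claim is to sort the same dates as strings in both formats. The three dates below are arbitrary sample values chosen for illustration.

```python
from datetime import date

# Arbitrary sample dates, deliberately not in chronological order.
dates = [date(2018, 8, 9), date(2017, 12, 31), date(2018, 7, 30)]
chronological = sorted(dates)

# Plain string sort, which is what a file listing effectively does.
ymd_sorted = sorted(d.strftime("%Y.%m.%d") for d in dates)
dmy_sorted = sorted(d.strftime("%d.%m.%Y") for d in dates)

# YYYY.MM.DD strings sort into chronological order; DD.MM.YYYY strings do not.
ymd_is_chronological = ymd_sorted == [d.strftime("%Y.%m.%d") for d in chronological]
dmy_is_chronological = dmy_sorted == [d.strftime("%d.%m.%Y") for d in chronological]
```

This works because zero-padded fields ordered from largest unit to smallest make lexicographic order coincide with date order.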

At least Switzerland/Germany/Austria. I think most of Europe actually. The domain is .ch, so that will be the Swiss locale.

In my opinion there are two date formats that make sense: DMY (decreasing granularity) or YMD (increasing granularity). I tend to favor the latter because of ISO. MDY definitely doesn't make any sense at all.

> Quoting the Sentry documentation, system.secret-key is “a secret key used for session signing. If this becomes compromised it’s important to regenerate it as otherwise its much easier to hijack user sessions.“; wow, it looks like it’s a sort of Django SECRET-KEY override!

One wonders why that is even there. Was Django's own session code not good enough?

`system.secret-key` is a value from the options system that gets propagated to different parts. It actually just sets the `SECRET_KEY` value in the settings file for all intents and purposes, which has different consumers.

This abstraction exists because some options can be set from the admin UI.

So, summarily we have:

1. Enabling debug mode in production

2. Running publicly-accessible app with publicly-accessible crash screens without any monitoring system noticing it's happening

3. Relying on auto-cleaning in debug facilities to sanitize security information (never works)

4. Using over-powered serialization protocol which allows for code execution for storing user-accessible data

5. Thinking that merely signing content prevents abuse of (4)

Not bad.

Note that 5 is true though. Using a secret to sign or encrypt a cookie does normally work, and it's a common practice. Usually the impact of the secret leaking is that you can impersonate anyone, not that you can run arbitrary code, but the practice of using a session secret is common and not a bad practice nor broken inherently.
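To illustrate the distinction, here is a rough sketch of a signed cookie whose payload is JSON, using only the standard library. It is not Django's signing implementation; the function names and the secret are invented for the example. With a leaked secret an attacker can mint any payload they like (impersonation), but the server still only ever parses JSON.

```python
import base64
import hashlib
import hmac
import json


def sign(payload: dict, secret: bytes) -> str:
    """Serialize payload as JSON and append an HMAC-SHA256 tag."""
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    tag = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{tag}"


def verify(cookie: str, secret: bytes) -> dict:
    """Check the tag first; only parse the body if the signature matches."""
    body, _, tag = cookie.rpartition(".")
    expected = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad signature")
    return json.loads(base64.urlsafe_b64decode(body))


secret = b"hypothetical-secret-key"  # placeholder value for the example
cookie = sign({"user": "alice"}, secret)
session = verify(cookie, secret)

# Without the secret, a forged cookie is rejected.
forged = sign({"user": "admin"}, b"wrong-guess")
try:
    verify(forged, secret)
    forgery_rejected = False
except ValueError:
    forgery_rejected = True

# With a leaked secret, forgery succeeds: impersonation, but still just data.
leaked = verify(sign({"user": "admin"}, secret), secret)
```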

3 as well I think is unfair. That isn't something facebook implemented or is relying on; it's just the default behavior of django's debug stack. That's entirely on django to do that and lull people into a false sense of security in some cases (though it also probably helps in many cases too, so it might be okay).

The real issues are 1 (leaving debug mode on by accident), 2 (not noticing 1), and 4 (which is a django issue I think, not a facebook one).

Yes, and for #4, Django switched to JSON instead of pickle by default 5 years ago.

> Using a secret to sign or encrypt a cookie does normally work

If the secret key is not compromised. So you have to ask yourself - why do you send the user info that is so sensitive it needs signing? Why not just keep this info to yourself and send an opaque ID instead? Yes, I know there are issues with it too, but at least this issue is not there.

> 3 as well I think is unfair. That isn't something facebook implemented

I didn't say it's Facebook's fault - though ultimately, of course, it is, inasmuch as if you run certain software on your servers and do not configure it properly, it's your fault. So there's a fail in having a security key in a place that's so easily accessible that debug mode dumps it without even asking. Not necessarily a direct Facebook fail, but a fail.

See JWT. You can make stateless apps easier without worrying about a trip to the database to grab session info


If you want disposability with the ID method you need some sort of datastore or cache to contain session info
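For contrast with signed or JWT-style cookies, the opaque-ID approach looks roughly like this. A minimal in-memory sketch: the `SessionStore` class is invented for illustration, and a real deployment would back it with a database or cache, which is exactly the extra moving part mentioned above.

```python
import secrets


class SessionStore:
    """Toy server-side session store keyed by an unguessable random ID."""

    def __init__(self):
        self._sessions = {}

    def create(self, data: dict) -> str:
        # The client only ever receives this random token, never the data.
        sid = secrets.token_urlsafe(32)
        self._sessions[sid] = dict(data)
        return sid

    def load(self, sid: str):
        # Unknown or revoked IDs simply yield None.
        return self._sessions.get(sid)

    def revoke(self, sid: str) -> None:
        # Disposability: dropping the entry invalidates the cookie at once.
        self._sessions.pop(sid, None)


store = SessionStore()
sid = store.create({"user": "alice"})
before = store.load(sid)
store.revoke(sid)
after = store.load(sid)
```

The trade-off is the one described in the thread: nothing sensitive or parseable goes to the client and sessions can be revoked, at the cost of a lookup on every request.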

This is a good example why you need regular pentests in big companies. Everyone (should) know that using pickle is insecure and everyone (should) know that django debug should be False in production. Still, if the numbers get large enough someone will miss something.

Facebook does pentests all the time, but they don't find everything. This is why you should also run a bug bounty program.

The use of Pickle isn't uncommon for session cookies in Python apps, from what I've seen. Pickle isn't really a problem unless you end up unserializing untrusted data... which a sign+encrypt scheme is supposed to ensure doesn't happen. You just can't leak the secret key or you're in trouble.

Though, there's no excuse for leaving Django debug on in production.

I'd say it's a bad idea anyway - why would you need to trust the user with anything that needs pickle (as opposed to a much more primitive format) to unserialize? If you ever have a reason for non-opaque-id cookies at all, it should be very simple. If you stuff very complex objects that require native serialization into user-side storage, it's probably a bad idea regardless of security implications.

Great article. Big frameworks like Django are nice to get an app started quickly, but if you use them long-term it really pays to study how they work under the hood, read some source code, etc. Although "don't run debug mode on production" is probably in the first page of the tutorial :)

Congrats on the bounty payment! That has got to be one of the most concise blog posts too.

The fact that the machine has a hostname "*.thefacebook.com" doesn't imply that it also runs software of the "Facebook" social media software. So not sure how much impact this exploit would have had.

thefacebook.com was their original domain from way way back in the day. The fact this system has a legacy DNS entry might indicate its age.

Here's mail.thefacebook.com:


I don't think anyone will ever know how much impact, but it implies that Facebook is not good at security.

No one talks about the working parts of security. Like, in this case, having the application on a separate box, and having that box separated by vlans from important things.

I don't think the solution is to put things you don't care about behind vlans and having applications on separate boxes. This box is/was an attack vector. It was holding secrets. Secrets provide other attack vectors.

It is - it is defense in depth. The application should be secure, and there should be steps to secure it. Especially because a secure application can protect a less secure network setup.

But on the other hand, the server or the network should not trust the application to be secure at all. The infrastructural setup should assume the application to be an <exec($_POST['do_me']);>. And that's why the application should be isolated on a system level, on a network level, and as much as possible. That's the good part I mean - the part that worked.

I have bad news for you. No one is good at security.

This is great news for me, I work in security :)

But the reason why nobody is good at security might be that security is always last on the budget list.

He got $5k for an arbitrary remote execution bug? What a rip-off.

He received $5,000 for remote code execution on a server in a separate VLAN and containing no user data. Barring another vulnerability to commit something like server-side request forgery across the VLAN, that significantly reduces the actual severity of the vulnerability. You wouldn't be able to chain this execution to impact user accounts or other Facebook servers. You also wouldn't be able to exfiltrate sensitive user data. Realistically, what you'd have is a very privileged server for phishing attacks.

The severity of security vulnerabilities should be judged on their context, not on their classification or category.

I remember the guy who broke into a Facebook server then used the information stored on that server to get onto a more important server with user data. He either didn't receive a bounty or got a reduced bounty because the server he accessed was deemed unimportant and chaining hacks is against the rules or something.

Think it was this one: http://exfiltrated.com/research-Instagram-RCE.php (actually it's even worse than I thought with Facebook threatening legal action against him)

That's not at all the whole story.

Do you know the rest? That seems pretty damning. The researcher followed the bug bounty rules to the letter and Facebook applied a different set of rules, then intimidated his employer. It's also duplicitous for them to claim his bug as inconsequential, and his access as a huge privacy violation.

If Facebook wanted to discourage pivoting access, they should have clearly stated so as Google and Microsoft have.

The researcher did not follow the rules to the letter. They found an in-scope RCE vulnerability, then dumped and saved the filesystem contents and used them, after reporting the bug, to compromise Facebook's AWS configuration. You would get in trouble for doing that even on a contracted penetration test, let alone a bug bounty. To make matters worse, the researcher did all this in a fit of pique about not being awarded a higher bounty.

There's a whole thread on HN about this.

Here's Alex Stamos' writeup:


The researcher's blog contains a reasonable rebuttal of Alex's write-up:

>In the case of Facebook, the rules can be seen at https://www.facebook.com/whitehat. There is no rule which states what to do when a vulnerability is discovered, but there are several which imply that my testing was valid. These include: Report a bug that could ... enable access to a system within our infrastructure Remote Code Execution Privilege Escalation Provisioning Errors

We both agree the initial RCE was in-scope. The researcher reported the RCE immediately, then reported the privilege escalation by weak user passwords, then reported the API key escalation.

Moreover, you're factually incorrect about the "fit of pique." The researcher stopped poking around immediately after receiving the email; he simply continued before Facebook contacted him. When Alex said,

Please be mindful that taking additional action after locating a bug violates our bounty policy.

There was no reference whatsoever to that policy in the official Bug Bounty guidelines. Alex fibbed. Indeed, how is a canonically "in-scope" privilege escalation supposed to work if researchers are to stop at the first bug?

Lastly, if Facebook's idea of defense-in-depth is a master API key to all Instagram S3 buckets, accessible from a simple diagnostic panel, any bug bounty program is merely window-dressing.

Look, this is Hacker News, I know you can come up with 1,000 different random arguments for why Facebook was wrong and this guy was right. What I can tell you with confidence is that if you did what this researcher did as a contract pentester, you'd get your firm permanently fired from Facebook (or virtually any other major site you might have been testing), and would probably be immediately fired yourself. What he did was very much outside the norm for testing sites.

Remember, he didn't simply pivot. He back-pocketed credentials, didn't tell anyone he had them, and used them later to hit out-of-scope systems. Nobody is OK with that.

Makes sense; I don't know the pentesting industry too well, nor the norms of bug bounty programs. Certainly his sort of behavior would give me the heebie-jeebies if I ran security at a firm like that. Thinking more on it, I think he was edging black hat and used the rules lawyering as an excuse for pushing past an ethical limit.

You've changed my view.

I think this might be like the 3rd or 4th time I've managed to accomplish that the whole time I've been yelling at clouds on this site. Thanks!

So you basically admit that pentesting is useless, because they have to follow the rules which the blackhats won't even think twice about.

>He back-pocketed credentials, didn't tell anyone he had them

It should be assumed that any data on the pwned server is now accessible to attacker, just like in any real world scenario.

> So you basically admit that pentesting is useless, because they have to follow the rules which the blackhats won't even think twice about.

It's pentesting not penthieving. That's like saying military training is useless because they don't actually kill people.

> So you basically admit that pentesting is useless, because they have to follow the rules which the blackhats won't even think twice about.

He didn't say that at all. Your thesis here is that preventative discovery has no utility if it does not perfectly simulate real world conditions. That's a pretty extreme position; I don't think you'll sell many people on it.

> It should be assumed that any data on the pwned server is now accessible to attacker, just like in any real world scenario.

I think you'll have a hard time finding companies who are okay with security professionals taking sensitive data for themselves just because they're reporting a vulnerability.

Not to mention the part where he engaged the researcher's boss, who was completely not a part of this research.

If you read Stamos' post more carefully, he explains exactly why he did that, and given that his alternative was to engage Facebook legal directly, the researcher should probably be glad that's what ended up happening.

I agree he deserves a higher payout, but it was on a segmented server that seems like it didn't have any customer data or other important data. If he were a real attacker, who knows where else he could've pivoted from this server (perhaps it wasn't quite as segmented as Facebook thought). But assuming it truly was pretty isolated, compromising it probably wouldn't have caused any damage.

Regardless, I feel like he deserves at least $15,000 for this, since it is full RCE.

Where are you pulling that number out of?

Honestly, just a gut feeling. (So, my ass.) I just feel like RCE should warrant somewhere around that much by default, even if the overall impact isn't high in a particular case. Some combination of potential impact and nature of the vulnerability/exposure should go into reward calculations, in my opinion.

But is RCE worth 15 because he got 5? If he'd gotten 1, would you say he deserved 5? Or 3?

I think I would've said something close to $15k either way. $5k isn't insulting (in my opinion) given the context. $1k would definitely be insulting.

Probably the same rather random place where $5000 got pulled out of in the first place

Please explain how you would go about valuing this bug.

He should have gone to the black market, better yet sat on it. How long did it take Facebook to come forward with its user privacy violations?

What would you guess the going rate on the black market would be for an RCE on some error-log collection box that happens to be in use by Facebook? My guess is "less than $5000".

The only code they executed was "sleep(100)". If they started dumping env variables/snooping around they would have done a lot more than $5000 worth of damage.

How are you quantifying that?

He deserves a higher payout, but 1) telling him to be unethical rather than reporting it responsibly and accepting a not-insignificant reward is stupid, and 2) there's almost no chance this would've garnered more than $5,000 on any black market. This is not something that would grant a cybercriminal access to sensitive or profitable information, unless Facebook is wrong in their assessment of how segmented the server is... and even then, much more work would be required for the attacker (more than an average vuln buyer is probably capable of).

Not to mention this way he can actually report it on his taxes and get the money plopped straight into his bank account, whereas black market is a hell of a lot more uncertain if you even get paid.

You didn't read the article.

"09.08.2018 20:10 CEST : a 5000$ bounty is awarded – the server was in a separate VLAN with no users’ specific data."

From the HN guidelines:

Please don't insinuate that someone hasn't read an article.

To be fair, it's a hard rule to follow sometimes when someone makes it painfully clear that they didn't actually read the article.

It's not a hard rule to follow. You can just say 'The article says X'.

Not applicable when it's obvious that the poster hasn't read the article, like in this case.

It's applicable precisely in that case.

> scanning an IP range that belongs to Facebook (

ping -4 facebook.com

results in a different address. Maybe the author used some other way to get those IPs. Can anyone shed some light on this?

Facebook, like many large internet companies, buys their IP blocks outright, so they show up under their AS number [0]. Facebook seems to have 3 AS numbers [1,2,3] and that IP appears in [3]

[0] https://en.wikipedia.org/wiki/Autonomous_system_%28Internet%...

[1] https://bgp.he.net/AS32934

[2] https://bgp.he.net/AS63293

[3] https://bgp.he.net/AS54115

You're going to get different results for the same queries depending on where you are in terms of DNS queries. This is not a FB thing, either. Lots of places will "steer" you towards one frontend, POP, datacenter, or whatever, in order to get you off the public Internet and onto their fabric sooner.

Run the same query from a bunch of sites with different connectivity in the same town and you might get different answers depending on who's got peering agreements with who. Then spin that query again using other places in the world and you can get even more variety.

That is, assuming the site in question has worked out a way to vary the A/AAAA records for their actual domain (as opposed to the hosts within it, like "www"). Some of them might just point it at a single POP/frontend/whatever, and when that one goes down for whatever reason... things get interesting. Not that I would know anything about that.

Using whois and BGP tools, I suspect.

Edit: https://bgp.he.net/search?search%5Bsearch%5D=facebook&commit...

Also see https://bgp.he.net/net/

They're in the ARIN database. Just search for FACEBOOK-CORP


A side note. By scanning, he probably just means a Shodan search:



Plenty of organisations (especially one of Facebook's size) tend to have their own autonomous system numbers; it's pretty trivial to get the ranges from BGP announcements for any given ASN.
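Once you have the prefixes announced by an ASN, checking whether a given address falls inside them is simple. A minimal sketch, assuming you have already pulled the prefix list from a BGP lookup tool; the prefixes below are placeholder documentation ranges (RFC 5737 / RFC 3849), not Facebook's, and `in_announced_ranges` is a made-up helper name:

```python
import ipaddress

# Placeholder prefixes from the reserved documentation ranges.
# Substitute the prefixes actually announced by the ASN you are looking at.
ANNOUNCED_PREFIXES = ["192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32"]


def in_announced_ranges(ip, prefixes=ANNOUNCED_PREFIXES):
    """Return True if `ip` falls inside any of the given CIDR prefixes."""
    addr = ipaddress.ip_address(ip)
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix)
        # Only compare IPv4 with IPv4 and IPv6 with IPv6.
        if addr.version == net.version and addr in net:
            return True
    return False
```

For example, `in_announced_ranges("192.0.2.55")` is true with the placeholder list, while an address outside those ranges is not.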

Yes, I'd like to know this as well.

Wow, a fix in <24 hours, that's pretty impressive.

I mean the fix is toggling a single environment variable from True to False, on a system that isn't normally accessed by customers, so the risk is really small in rolling out the change.

Sometimes you’re lucky if a company even reads your report in this time, but of course I would expect, and we generally see, much better from the likes of Facebook.

You're right, being able to read, triage and act on something in such a massive system is quite the accomplishment.

Facebook deploys updates to prod 10-20 times a day, or even more.

They could deploy continuously, but deploying to PROD doesn't include minutes/hours/days/weeks/months of investigation, testing, documentation, verification...

Hopefully they also changed the compromised key.

It took them more than 10 days actually... They just shut down the instance until they could find a solution.

30.07.2018 00:00 CEST : initial disclosure with every details.

09.08.2018 18:10 CEST : patch in place.

Great article...

One suggestion: You use "However" quite a bit. Not sure if you intended to show your thought process as it evolved, but that is the feeling I got.

So they were pickling EXECUTABLE objects in the session and storing them in the user's browser cookie? Interesting. Nice find.

There is no such thing as non-executable pickle. Pickle is not safe and must not be used for anything, ever.

Wow. I have to look into it. That sounds wildly unsafe.

> Pickle is a Python module used to serialize data, but contrarily to JSON or YAML, it allows to serialize objects properties but also methods. In most of the cases, this is not a problem, but one can also serialize an object with code in the __reduce__() method, which is called when the object is unpickled.

Why does it run __reduce__?

If it exists, __reduce__ is run when pickling the object and returns a callable plus arguments that will be invoked when unpickling. That lets you completely customize how the object is re-created on the other side, which might be needed e.g. when the type is defined in a native extension (at least the docs name this use case).
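To make that concrete, here is a small sketch of the mechanism. It is deliberately harmless: the class name `Innocent` is invented, and the callable is `eval` on a fixed arithmetic string rather than anything an attacker would actually pick. The point is only that the unpickling side runs whatever callable the pickling side named, whereas a JSON parser just hands back data.

```python
import json
import pickle


class Innocent:
    """Looks like plain data, but dictates how it gets rebuilt."""

    def __reduce__(self):
        # Return (callable, args). Pickle stores a reference to the callable
        # and calls callable(*args) when the payload is loaded.
        return (eval, ("1 + 1",))


payload = pickle.dumps(Innocent())  # __reduce__ is called here
result = pickle.loads(payload)      # eval("1 + 1") runs here, so result is 2

# The same string round-tripped through JSON stays an inert string.
data = json.loads('{"expr": "1 + 1"}')
```

Note that `result` is not an `Innocent` instance at all; it is whatever the callable returned. Swap `eval` for something like `os.system` and you have the kind of payload the article describes, which is why nobody should unpickle data a client can influence.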

Other than fixing the Django source code, are there any OS-level mitigation techniques that can detect and prevent such security vulnerabilities?

I am thinking of something like SELinux, Docker or chroot, a bit like an internal firewall for Django (or any other webapp).

Any suggestions for links to the latest best practices?

Of course. You can lock the process down so that it can't make unexpected system calls. If you deploy in a modern container environment, you can also use container networking to drastically limit what the application environment can talk to on the network. Though it's a less potent mitigation than seccomp and container isolation (and one you get for free once you deploy in a container), you can also limit filesystem access. If you run the application under a non-privileged uid, these mitigations combined can raise the bar somewhat for privilege escalation and pivoting after compromise.

Of course, in a sense, Facebook appears to have accomplished something simpler simply by sacrificing an instance to this application and putting it on a lonely isolated VLAN.

The other responses to your question are pretty weird, since it's obvious that there are things you can do to mitigate the possibility of your Django program literally calling execve or whatever.

It seems like several people are discussing whether you can use the OS to prevent the sentry instance from disclosing its own data. You're talking about using the OS to prevent pivoting from that sentry instance to compromise something else.

That said, I'd think it's pretty obvious that you can only address the second part, so it's what's interesting. If your application speaks HTTP, then the OS can't do that much to keep it from doing stupid things over HTTP.

Ah, if that's the case, this thread makes more sense. Thanks!

> You can lock the process down so that it can't make unexpected system calls

Huh, this isn't something I've ever come across before.

Off the top of my head, I guess it would be possible on Windows using a kernel mode driver, but that's pretty hardcore, and really easy to get wrong.

I know you can easily audit syscalls on Linux with auditd, but haven't seen preventing them before.

Is this an option on both Linux and Windows, and is it commonly used? Interested to know more!

The Google search you're looking for is [seccomp docker].

(Docker is the most typical way this gets deployed but not the only way.)
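For a rough idea of what that looks like: Docker accepts a JSON seccomp profile via `--security-opt seccomp=<file>`. The sketch below generates a tiny denylist-style profile. The choice of syscalls is purely illustrative and `build_seccomp_profile` is a made-up helper; Docker's own default profile is an allowlist and considerably stricter, so treat this as a picture of the file format rather than a recommended policy.

```python
import json


def build_seccomp_profile(denied_syscalls):
    """Build a denylist-style seccomp profile in Docker's JSON format."""
    return {
        # Anything not listed below is allowed.
        "defaultAction": "SCMP_ACT_ALLOW",
        "syscalls": [
            {
                # Listed syscalls fail with an error instead of running.
                "names": sorted(denied_syscalls),
                "action": "SCMP_ACT_ERRNO",
            }
        ],
    }


# Illustrative picks only: syscalls a typical web app has no reason to make.
profile = build_seccomp_profile({"ptrace", "mount", "kexec_load"})
profile_json = json.dumps(profile, indent=2)
```

You would write `profile_json` to a file and pass it with `docker run --security-opt seccomp=profile.json ...`. Blocking `execve` this way is not practical for the container's entrypoint itself, which is one reason real profiles need more thought than this.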

Have a read of OpenBSD's pledge and unveil.

There's nothing to be done at the OS level, because the OS doesn't know which things are secret and which things aren't. You can sandbox the Django process (and to a certain extent Facebook were doing this at an even higher level, by having the whole server on a separate VLAN that was (supposedly) away from the important stuff) but that only does so much because the process needs to have enough access to do whatever it was intended to do.

Ultimately the only thing you can do is not run debug mode in production, not use insecure serialization formats and not leak secret keys. If you want to be a bit more pre-emptive about it, avoid language-specific serialization formats especially in dynamic languages ("serialization" that executes arbitrary code seems more common in those) and use taint checking or, better, a type system that can avoid leaking secrets (you likely need rank-2 types to be able to have values that you can use in certain contexts but never access outside them, a la http://okmij.org/ftp/Computation/resource-aware-prog/region-... ). It sounds like Django already has some level of taint-like functionality but this plugin/addon (sentry) then didn't use it properly. So, multiple failures interacting to create this vulnerability - I guess we should take that as a sign of progress compared to the days of basic buffer overflows?

I am hoping for a security subsystem that detects calls to external programs (such as sleep in this case), logs them, and raises an alarm if they are not in a whitelist.

ThreatStack definitely does this, utilizing the OS-exposed auditing frameworks. There are probably open source alternatives as well, but I rely extensively on TS for server-level threat detection.

You don't need to make calls to external programs to make good use of an RCE vulnerability like this.

It seems unlikely. The application has a feature to use a secret key to secure everything. The same application then prints out the secret key to anyone that asks. Your OS can't do much about that.

Maybe if you tell your OS "never let any data containing this string leave the machine" you can kind of mitigate this, but it's unlikely to work. The HTTP response is probably compressed. Someone probably base64-encoded the thing and stuck it inside some JSON, which is then base64-encoded again.

Ultimately, it's about managing complexity. Django and the extension punted: Django says it will print anything and everything it has access to, and the extension has a documentation caveat about how bad that would be. One could imagine an API that tries to be resistant to this sort of thing; when the extension is initialized, it deletes the key from the environment dictionary (only partially possible), stores it in a private attribute, and only provides public EncryptAndSignCookie and DecryptAndVerifyCookie methods. This will be better than a documentation caveat, maybe, but the truly ambitious debug mode will probably get the key and print it out.
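Here is roughly what such an API could look like, as a sketch only. The class name `CookieSigner`, the `COOKIE_SECRET` setting name and the plain-dict settings object are all invented for illustration, and this version only signs (HMAC-SHA256 over a JSON payload); it does not encrypt, since that needs more than the standard library.

```python
import base64
import hashlib
import hmac
import json


class CookieSigner:
    """Holds the key privately; exposes only sign() and verify()."""

    def __init__(self, settings, key_name="COOKIE_SECRET"):
        # Remove the key from the settings dict so a generic settings dump
        # no longer contains it.
        self.__key = settings.pop(key_name).encode("utf-8")

    def __repr__(self):
        return "<CookieSigner key=[redacted]>"

    def _mac(self, body):
        return hmac.new(self.__key, body, hashlib.sha256).hexdigest()

    def sign(self, data):
        """Serialize `data` as JSON and append an HMAC-SHA256 tag."""
        raw = json.dumps(data, sort_keys=True).encode("utf-8")
        body = base64.urlsafe_b64encode(raw)
        return body.decode("ascii") + "." + self._mac(body)

    def verify(self, cookie):
        """Return the payload if the tag matches, else raise ValueError."""
        body, _, mac = cookie.rpartition(".")
        expected = self._mac(body.encode("utf-8"))
        if not hmac.compare_digest(mac.encode("utf-8"), expected.encode("utf-8")):
            raise ValueError("bad signature")
        return json.loads(base64.urlsafe_b64decode(body))
```

As the comment above predicts, this is only a speed bump: Python's name mangling leaves the key reachable as `_CookieSigner__key`, so a debug page that walks every object it can find would still print it.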

(I would also point out that I'm not a fan of storing state in encrypted+signed cookies, if only because there is no way to revoke a stolen cookie without revoking every cookie ever. If you have the state on the server to store a revocation list, you might as well just store everything there and never have this problem.)

To be clear, this isn't an issue with the Django source so much as a misconfigured server: having debug mode on allowed the stack trace to be leaked, which yielded the secret key. Debug mode shouldn't be used in production, and Django probably shouldn't be responsible for snipping every possible value out of debug traces.

Depends on your threat model. For all we know chroot, docker, and SELinux could have been in active usage on this machine.

But Facebook may view a compromise on their edge network and potentially one of their trusted servers as serious even if actual damage on that server itself is limited.

Doesn't seem to be a Django problem, as the author says

> wow, it looks like it’s a sort of Django SECRET-KEY override!

Sentry uses its "own" secret key, so Django doesn't know it should be stripped from the stacktrace.

A simple `DEBUG = False` in `settings.py` would fix the bug. You don't have Django run in debug mode in production.
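One common way to make that hard to get wrong is to have `settings.py` default to off and only enable debug when an environment variable explicitly says so. A sketch, assuming a variable named `DJANGO_DEBUG` (that name and the `env_flag` helper are my own, not something Django or Sentry defines):

```python
import os


def env_flag(name, default=False):
    """Read a boolean flag from the environment, defaulting to off."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Debug stays off unless the (hypothetical) DJANGO_DEBUG variable is set.
DEBUG = env_flag("DJANGO_DEBUG", default=False)
```

With this in place a production box with no such variable set simply never enters debug mode.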

> I found a Sentry service hosted on

I don't remember off the top of my head, but I think there are a few scanning tools that find all the running apps on a remote machine. If you know of any, please share.

I've used nmap to check for open ports. Not 1:1 for running applications, but an easy way to check for mistakes when making a machine public.

I've also seen plenty of references to shodan.io, but I have no experience with it.
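If you just want the gist of what a basic TCP port check does, it is a connect attempt per port. This is a toy sketch and not how nmap works internally (nmap has many scan types and far better timing); `open_ports` is a made-up helper, and you should only point it at machines you are allowed to test.

```python
import socket


def open_ports(host, ports, timeout=0.5):
    """Return the ports on `host` that accept a TCP connection."""
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising on failure.
            if sock.connect_ex((host, port)) == 0:
                found.append(port)
    return found
```

Something like `open_ports("127.0.0.1", [22, 80, 443, 8000])` is a quick sanity check before exposing a machine, though it tells you nothing about what is actually listening behind each port.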

Shoot. It did not strike me. I thought it was good only for web apps and not for things running in the background like Pickle (a binary protocol).

Great hunt!

Great, concise article! But the real question is how much did they pay you for the bounty??

Article has a timeline specifying $5000.

I can get remote code execution on an Amazon Server (AWS). Do I get a cookie?

So, this was simply taking advantage of a crash-prone webapp running on a debug-enabled Django instance using Pickle session serialization, and more specifically this was only possible because _Django didn't redact the stored secret key used to sign serialized inputs out of the crashdump information!_

Did the author tell Django about this yet, or is this a (possibly unintentional) 0-day?

Besides the above interestingness, the morals of this story I get are

- Stay persistent and leave your scanners running; you never know what new things will turn up.

- Crashdumps _are_ interesting

- Yay, $5,000!

- Middleware and frameworks will always clash in useful and interesting ways?

Django is pretty serious about warning you of the risks of this [1]. I think in this case the key was in some third-party options variable, so the bug is there. The Sentry thing sounds like an internal bug at Facebook rather than a Django issue.

[1] https://docs.djangoproject.com/en/2.1/topics/http/sessions/#...

Maybe I misunderstood the article, but I thought it said that Django does strip this information, and the Sentry app went out of its way to store the secret key in the SENTRY_OPTIONS payload. This custom, non-Django code effectively circumvents Django’s protections, making the bug the responsibility of Sentry, not Django.

Ah. You're right, I completely got this bit wrong. Thanks.

Wait, so that means Sentry kind of has a vulnerability.

Sentry does secure `DEBUG` mode. It's for development only and must not be set by customers. That said, we will try to make it less easy to set by accident: https://github.com/getsentry/sentry/pull/9516

If I remember correctly, Django only shows this information if left in debug mode. Needless to say, this should never be used in production.

It was a Sentry specific "secret-key" that was overriding Django's.

In contrast, I submitted a bad vulnerability in Facebook’s password reset feature yesterday that lets attackers send password reset PINs to email addresses the user doesn’t necessarily control. The security team said it works as designed, so they’re not going to fix it.

Basically if someone requests a password reset on your account then the PIN number gets sent to all email addresses associated with your account, not only the primary one. This is an issue because many people have one locked down email address for things like registering accounts, but others they use to talk with people, delegate to their staff, use with CRM apps, etc. (But you still need your everyday email addresses linked to your account so that people can find you by email, see your email on your profile, etc.)

The FB security team just says that delegating your email address isn’t secure so it’s not their problem. Like no shit, that’s why it’s a vulnerability. But for some reason the FB security team thinks it’s a good idea to let anyone immediately bypass 2FA and hijack your account.

This does sound like a reasonable feature request to me: the "contact information" and "account recovery" use cases are different, and it's not obvious that an email listed for one should be automatically used for the other. They could have you choose for each email whether it's allowed to be used for account recovery.

As an analogous example, what if you did the "forgot my password" flow and they sent a recovery code over SMS to any listed phone number on your profile, sent a twitter DM to your listed twitter account, and sent postal mail to any postal address on your profile? (All at once, without waiting or confirmation.) That would expand the attack surface significantly, and it would be pretty easy to steal someone's account by stealing their postal mail. This case is similar: a secure email for account recovery and a less secure email for contact info, but Facebook forces you to use both for account recovery.

On the other hand, some people intentionally want multiple account recovery emails so they're less likely to lose access to their account. I imagine Facebook's hesitation with this feature is that it's hard to clearly communicate the distinction between the two use cases, and they want to bias toward simplicity.

I tend to agree w/ the FB security team here. Don't list email addresses owned by an adversary in your account. :-/

> Don't list email addresses owned by an adversary in your account.

I mean if you're delegating your email address, your staff aren't adversaries. But they shouldn't be able to, say, drain all your retirement accounts either.

Just because I want people who search for alex.krupp@gmail.com to be able to find my Facebook account doesn't mean I want password reset requests sent there. It wouldn't be at all unreasonable to send them there if that was explained in the UI, but just immediately sending a password reset pin to a non-primary email address without any warning is crazy. At least wait a few days if the user doesn't take any action after it's sent to their primary address.

Even if they changed it so that you could select where to send the password reset, rather than having it go by default to all accounts, that still means anyone with delegate access to the email address could go and request a reset on your behalf.

> that still means anyone with delegate access to the email address could go and request a reset on your behalf.

That's how it should work. You shouldn't have to remember which email address you signed up with in order to request a password reset. But that's fine as long as the pin number only gets sent to an account that's locked down to a degree that's appropriate relative to the assets under protection.

I would agree with Facebook's assessment that this isn't a vulnerability. If a user intentionally adds an untrusted email address as an official email address for their account, Facebook can't really detect or prevent that.

Using a delegated email address to sign up for things sounds like the vulnerability then, not Facebook's handling of it.

That scenario obviously wouldn't be Facebook's fault, because any email address you sign up with is always going to be your primary email address unless you change it later. What I'm talking about is signing up with a non-delegated email address, and then adding a secondary email address for the purpose of either displaying publicly on my profile or so that my friends can find me via email in search.

In other words, cases where I had no intention of ever allowing a password reset request to be sent there, knew fully well that it wouldn't be safe, but did so anyway because Facebook provided zero indication that they would do this and it's not at all intuitive that they would.

I side with Facebook on this one. It isn't like anyone can arbitrarily enter an email address association, or re-enter an associated email during the reset process.

So, in order to get in there, you would need to know the account you're trying to get into, and have access to an associated account in order to lift the PIN from there.

It isn't to say that it cannot be done, but it does sound like a reasonable action.

It does emphasize how your (Facebook account's) security is only as strong as the weakest link - there was a story not too long ago about someone's accounts getting hacked because they had an old college e-mail address still set as recovery e-mail address, allowing an attacker to circumvent gmail's 2FA and everything.

If you have two-factor authentication enabled, this will allow you to reset the password, but won't they still need the other factor? SMS? Presuming the other factor is not also an email. So the primary exploit might be a stolen phone, where the SMS and email go to the same device?
