I think this is one of the best sets of responses to a security incident I've seen:
1. Disclose the incident ASAP, even before all facts are known. The disclosure doesn't need to have any action items, and in this case it didn't.
2. Add more details as the investigation proceeds, even before it fully finishes, to help clarify scope.
The proactive communication and transparency could have downsides (causing undue panic), but I think these posts have presented a sense that they have it mostly under control. Of course, this is only possible because they, unlike some other companies, probably do have a good security team who caught this early.
I expect the next (or perhaps the 4th) post will be a fuller post-mortem once the incident is wrapped up. This series of disclosures has given me more confidence in Stack Overflow than I had before!
I've had the pleasure of working in an org led by Mary Ferguson (she's on the byline of these posts) at a previous employer. She's an excellent, honest, and caring leader who knows how to deal with people and improve the interpersonal structures and systems around them. Bravo to her for continuing to kick ass in her current role.
I’m also impressed by the response. It helps that the number of affected users is small, though. Imagine the same reporting, but with 1M affected users. In other words, maybe good reporting can only get you so far.
One major advantage SO has stems from its userbase: we are much less likely to panic than the typical user of a system not focused on software development.
I wish I could agree with you but the details provided in the post do not help us understand what happened.
We only get very superficial information from one of the rare companies that could really contribute and help the community by sharing what went wrong.
Right now, I'm in a situation that forces me to speculate (in addition to reading all the speculative comments below) on whether or not I could make the same mistake SO did, and that terribly saddens me.
At the bottom of the post it says: "We will provide more public information after our investigation cycle concludes."
It might not be fair to judge the overall response yet. If full details of the problem were expected in the initial disclosure, prompt disclosure wouldn't be possible.
Can't help but wonder - how obvious did the bug have to be, if it was deployed on May 5th and an attacker gained access the very same day? That's pretty crazy - and there was only one attacker, who poked around for 6 days without the bug being noticed by anyone else? If I were to speculate, it seems like the work of an insider or someone close to the project.
Of course, this is based on nothing but wild guesswork and it could be just coincidence. But in their shoes, I would really want to make sure no one else gained access and did anything more to hide their tracks, since it seemed like such a quickly found exploit.
> The intrusion originated on May 5 when a build deployed to the development tier for stackoverflow.com contained a bug, which allowed an attacker to log in to our development tier as well as escalate their access on the production version of stackoverflow.com.
> Between May 5 and May 11, the intruder contained their activities to exploration
To me, that reads "we deployed a bug on May 5th that allowed intrusion, and it was exploited the same day." If that's not what they meant - I certainly recant my criticism - but "on May 5th when a build deployed contained a bug" is difficult for me to interpret differently.
Exactly. If it wasn't deployed on May 5th, instead of writing, "when a build deployed to the development tier", I'd expect something along the lines of, "The intrusion originated on May 5 in the development tier".
That was my thought as well. The attacker may have been probing the server for that particular bug (and probably many others) once or twice a day and so detected it almost immediately.
I am impressed with the way Stack Overflow is handling this, but I would appreciate details about how a bug in the development tier gave an attacker the ability to escalate their access in production. Does the development tier have production keys? Did the attacker learn about some other bug through access to the development tier? Is the development tier sharing a privileged private network with the production tier? Hopefully the security community can learn from this incident to improve best practices.
This is obviously speculation, but a pivot to production doesn't require direct access from dev. CI systems might be reachable from dev, and I often see CI systems that are less well secured than prod. Pivoting from an owned CI to a prod system is often comparatively easy: inject code into the next build, the artifact is implicitly trusted (hey, it's from a known-good source) and bang! Prod owned. Or developers expose their SSH keys via agents to dev systems, the same key works for prod, and you might be in.
Which is why best practice is now to start with CI in production - it needs limited access to dev, and the access from development environments into it can be literally just enough to collect the latest artifact.
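For illustration, here is a minimal sketch (Python, with made-up paths and a placeholder pinned digest - a hypothetical mitigation, not anything Stack Overflow actually does) of a promote-to-prod step that refuses to implicitly trust whatever dev/CI produced and checks the artifact against a digest recorded by the trusted build step first:

```python
import hashlib
import sys

# Hypothetical values -- purely illustrative, not anyone's real pipeline.
ARTIFACT_PATH = "build/output/app.tar.gz"
# Digest recorded out-of-band by the trusted build step, e.g. in a signed
# manifest or a write-once store that dev machines cannot rewrite.
EXPECTED_SHA256 = "0" * 64  # placeholder

def sha256_of(path: str) -> str:
    """Stream the file through SHA-256 so large artifacts don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def main() -> int:
    actual = sha256_of(ARTIFACT_PATH)
    if actual != EXPECTED_SHA256:
        # Refuse to deploy an artifact that doesn't match what the build recorded.
        print(f"refusing to deploy: digest mismatch ({actual})", file=sys.stderr)
        return 1
    print("artifact verified; proceeding with deploy")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

The hashing is the boring part; what matters is where the expected digest lives. If an attacker who owns dev or CI can also rewrite the pinned digest, the check buys you nothing.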
It's not clear to me that "development tier" means "a development environment." I interpreted it as the standard StackOverflow.com infrastructure that developers use -- this is contextualized a few paragraphs later when they mention "we maintain separate infrastructure and networks for clients of our Teams, Business, and Enterprise products", which implies that the production environment that was compromised was something other than those three.
(Even if I'm right, though, I definitely agree that the wording is very easy to misinterpret and they should clarify the post to explain what the "development tier" is.)
I find it concerning that access to the dev tier made it possible to escalate into production. Should dev not be sandboxed from production?
"The intrusion originated on May 5 when a build deployed to the development tier for stackoverflow.com contained a bug, which allowed an attacker to log in to our development tier as well as escalate their access on the production version of stackoverflow.com.
"
> contained a bug, which allowed an attacker to log in to our development tier as well as escalate their access on the production version of stackoverflow.com.
Why are the credentials for production granted to development systems?
> As part of our security procedures to protect sensitive customer data, we maintain separate infrastructure and networks for clients of our Teams, Business, and Enterprise products and we have found no evidence that those systems or customer data were accessed.
I sure wish more companies did this. Such peace of mind.
Could also be that the attacker made a set of random API calls that returned private data, e.g. to test that their permissions were working, and received whatever the API returned at the time without targeting any specific users (or caring about that specific data).
As it happens, this incident was discovered by a curious user who had a script watching for new accounts with staff privileges, and who brought the attacker's account to the company's attention (in chat) because it looked unusual.
Stack Overflow seem to be following a very responsible incident response procedure, perhaps instituted by their new VP of Engineering (the author of the OP). It is nice to see.
Wow! Note to self: maybe having a slack bot or something that yells really loudly whenever someone gains or loses admin privileges would be a good idea.
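Something like this rough sketch, maybe - assuming the public Stack Exchange API's /users/moderators route; the actual user's script isn't public, so this is just one plausible shape, and the alert() stand-in is where a Slack webhook would go:

```python
import time
import requests  # assumes the third-party `requests` package is installed

API = "https://api.stackexchange.com/2.3/users/moderators"
SITE = "stackoverflow"
POLL_SECONDS = 600  # unauthenticated API quota is small, so don't poll too hard

def fetch_moderators() -> dict[int, dict]:
    """Return {user_id: user} for accounts the API currently lists as moderators."""
    mods: dict[int, dict] = {}
    page = 1
    while True:
        resp = requests.get(API, params={"site": SITE, "page": page, "pagesize": 100}, timeout=30)
        resp.raise_for_status()
        data = resp.json()
        for user in data.get("items", []):
            mods[user["user_id"]] = user
        if not data.get("has_more"):
            return mods
        page += 1

def alert(message: str) -> None:
    # Stand-in for "yells really loudly" -- swap in a Slack webhook, email, pager, etc.
    print(f"ALERT: {message}")

def main() -> None:
    known = fetch_moderators()
    while True:
        time.sleep(POLL_SECONDS)
        current = fetch_moderators()
        for user_id, user in current.items():
            if user_id not in known:
                alert(f"new privileged account: {user.get('display_name')} "
                      f"(id {user_id}, rep {user.get('reputation')})")
        for user_id in known:
            if user_id not in current:
                alert(f"account lost privileges: id {user_id}")
        known = current

if __name__ == "__main__":
    main()
```

Note this only sees diamond moderators; as the replies below point out, "staff"/is_employee is a separate flag from moderator status.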
Stack Exchange Data Explorer (SEDE) has been around for quite a while. The user table contains flags for this. Who the mods are has never been hidden. The user in question drew attention by having 1 rep and having CM-level privilege on 173 sites. They also engaged in a pattern of behavior that didn't fit with that level of privilege.
Sorry, by "staff" did you mean company staff (employee) or did you mean site moderator?
Also, what User table are you looking at? The one I'm looking at only has these fields: Reputation, CreationDate, DisplayName, LastAccessDate, WebsiteUrl, Location, AboutMe, Views, UpVotes, DownVotes, ProfileImageUrl, EmailHash, AccountId
I know that would likely be the main entry point, and who found it. I reviewed the chat logs, which are public, and they just said they found it. As far as I know this isn't exactly hard information to find, as you could just parse the moderator listings if you needed to.
This was their post for context: "API reports is_employee as False, but the user is not on mod lists, so... ?"
I see. The moderator listings are updated once a day I think. This seemed to happen more quickly than that, so unless it was just plain luck on when it was updated, I don't think it's that.
I updated my post with their comment; they may not be going through SEDE directly, as there is an API that can be used. I know there is a bot called charcoal, for example, that sniffs out bad posts etc. and auto-reports them using the API.
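For example, a check like the one quoted above ("API reports is_employee as False, but the user is not on mod lists") maps pretty directly onto the API - rough sketch, assuming the /users/{id} route and the is_employee field on the user object (whether a field shows up depends on the filter, so treat that part as an assumption):

```python
import requests  # assumes the third-party `requests` package is installed

def classify(user_id: int, site: str = "stackoverflow") -> str:
    """Report what the public API says about a user's employee/moderator status."""
    resp = requests.get(
        f"https://api.stackexchange.com/2.3/users/{user_id}",
        params={"site": site},
        timeout=30,
    )
    resp.raise_for_status()
    items = resp.json().get("items", [])
    if not items:
        return "no such user"
    user = items[0]
    is_employee = user.get("is_employee", False)          # the flag quoted above
    is_moderator = user.get("user_type") == "moderator"   # diamond mod on this site
    rep = user.get("reputation", 0)
    if not is_employee and not is_moderator and rep <= 1:
        # The suspicious combination in this incident: elevated privileges seen
        # elsewhere, but no employee/mod status and essentially no reputation.
        return "flag for review: low-rep, non-employee, non-moderator"
    return f"employee={is_employee}, moderator={is_moderator}, rep={rep}"

# e.g. print(classify(12345))  # hypothetical user id
```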
There are, IIRC, a few minor exceptions, but those are for smaller communities just out of Area 51, and they still apply only to users who have shown extensive involvement in those communities. That said, those users still have more than 1 rep in all cases AFAIK.