Any privacy conscious company would not allow any random data, especially auth d...

icedchai · on March 21, 2019

How do you stop it and still have an effective development org? Services need to be debugged, so requests and responses need to be logged...

g45y45 · on March 21, 2019

Its pretty easy, you configure your logging library NOT to log the attribute, key/value pair, whatever containing the credential. If you can't modify it on the server side (which you can lazy bones), you tell your central logging system to mask it out before it is written to disk.

This isn't difficult or non-standard. If you are logging all client request/responses full take including auth creds, credit cards, SSN, etc, you are likely doing it wrong, and possibly violating some industry regulations.

laurent123456 · on March 21, 2019

At a company I worked for, if we logged any production data, we had to confirm there was no PII in there and no passwords or tokens, and very few people had access to these logs.

There's many layers of wrong if what FB did: carelessly logging production data, letting thousands of employee accessing these logs, and of all these people apparently none of them cared to mention there was a problem here, or if they did it was ignored by management. They don't have any excuse here.

dymk · on March 21, 2019

You implement your logging infrastructure with awareness of PII and other sensitive information. Whitelisting fields to log would nearly fix it, blacklisting fields would be a bare minimum.

crdrost · on March 21, 2019

This is a good question and I have been at multiple shops which had this bug induced by a "let's just log everything" accident somewhere in the codebase. It's very nice to think of logging as a sort of "aspect" or a middleware that gets deployed across the whole stack, but it's a bit of a mistake.

You have something like four options for fixing it:

1. Every request is responsible for its own logging. This is actually not a bad approach because state-altering requests really need to be logged whereas state-viewing requests are much more optional, they help to try and guess “what were they doing when they ran into this bug they’ve reported?” but mostly they just occupy database rows. The risk is that someone is in a rush and commits something which does not discharge its logging obligation. You can build a system which forces this if you want, “the router will dispatch to your function and one of your function's arguments will be a locally-stateful logger, and once you are finished I will check whether the logger has handled anything, and if not I will log an error. So you should always `$logger->noLoggingNecessary()` somewhere explicitly in the codebase and then if this is wrong it gets caught in code review more consistently.”

2. The sensitive data is used to generate a bearer token and this flow is outsourced to its own un-logged server. You explicitly use the bearer token to construct everything important about the user account in a step before the logging begins, then delete the bearer token from the rest of the request. This flow can actually get really slick: the bearer token can contain the user data, optionally encrypted, with a message authentication code to ensure the user didn't tamper with it: you can then hit a near-empty Redis instance (or a near-empty table) looking for revoked bearer tokens super-fast, since you probably don’t see too much session revocation. So, user data lookup actually becomes unbelievably cheap because it's mostly CPU bound with (check empty key/value store, MAC-or-decrypt, parse body, pass to the handler function).

3. The logging service becomes controller-aware: each controller specifies whether it is supposed to be logged and the logging service just respects that flag and is otherwise global. So it might log that the login controller was accessed, but it doesn't log anything else about the controller.

4. The logging service becomes message-model-aware. This one is actually kind of slick, too, it means that you describe declaratively what sorts of data types are present in the messages that are transmitted to and from the server: and the first thing you do when you get a request is to validate the request against the model you have declared for messages to that request's namespace. So you will have a `validate($model, $value)` function that takes some arbitrary JSON data and a model and returns a normalized version of that data; a natural extension to this traversal that you're already doing (either by returning two normalized results or calling the function with an extra `options={removeSensitiveData: true}` type of argument) will allow you to define in the message-model itself whether the property is sensitive and should never be logged.

throwawaymath · on March 21, 2019

I don't mean to engage in whataboutism here, but unless you're under the impression that no tech company is privacy conscious, what you're saying isn't true (though that'd be a reasonable impression, to be fair).

It's absolutely a security failure and it's not acceptable. But it's not a matter of "allowing" it to happen so much as, "Which vulnerability will we be caught by?" And it actually is a pretty normal vulnerability. For example, Apple[1], GitHub[2] and Twitter[3] have been vulnerable to this exact issue in recent memory.

I also don't mean to be defeatist. This kind of problem is preventable. But it's merely one dumb mistake in a universe of dumb mistakes that leads to serious security failures, all of which are easy to make. The most sophisticated and well-funded information security teams in the world - usually the FAANG teams - still miss things which look pretty silly in isolation.

At this scale being privacy conscious is necessary but insufficient. You can't realistically conclude anything about a company's dedication to privacy based on whether or not it was impacted by this kind of vulnerability. Making a corporate policy to hash passwords in the the database instead of storing them in plaintext is easy to codify, easy to implement and easy to verify. A corporate policy to never log authentication credentials is not nearly as well-defined, even if it's equally as important. That means more mental overhead, disagreement and uncertainty in preventing it. Ultimately, it also means more mistakes can be - and are - made.

________________________

1. https://darthnull.org/security/2014/03/10/cve-2014-1279-touc...

2. https://www.zdnet.com/article/github-says-bug-exposed-accoun...

3. https://arstechnica.com/information-technology/2018/05/twitt...