Is there no penalty for basically lying to everyone in their initial announcement? Doing it once could be explained by their sincerely believing the original number was correct, but they have a history of doing this, and at this point there's no chance I believe they aren't doing it on purpose.
I can totally imagine someone on the Instagram photo pinch-to-zoom team saying "Whoa - if someone re-logs in while a photo is partially zoomed, our metrics capture the password. That's bad!" They find it's super rare and only impacts 10k users.
Later, other logfiles are grepped for passwords (possibly even by doing a fulltext search across all fields and blobs for common passwords), and more and more are found.
Then someone says "but some of our data is in base64-encoded fields - did you try grepping for base64-encoded plaintext?", and even more are found, etc.
This is painfully accurate. I've encountered exactly this kind of thing, including having to repeat search efforts.
You learn about an issue, do some investigation, get an estimate, then inform the public. If Facebook had waited months to complete the investigation and get these numbers, do you think people would have accepted that? Probably not. So they publish what they know once they think they've got it figured out. But that probably also spins up a team that goes looking for other bad logging, and a while later they find more errors and the numbers get revised up.
This seems expected.
What I was trying to say is that I can understand companies making this mistake once. Maybe even twice. But once a company has an established pattern of doing this, it loses the benefit of the doubt and we have to assume it's deliberate. Even more so if they do scummy things like trying to bury their "it's 1000x worse" update with other high-profile news.
Facebook has a long history of this kind of thing now; every new privacy violation of theirs must be met with the expectation that it's actually 1000x worse than they say. I'm not aware of any other company with a history of doing this multiple times.
Not that I'm making excuses for Facebook, but maybe something was overlooked before the acquisition. Then again, Facebook is breaching security protocols left and right.
1) As a software engineer, I can't imagine how such errors could have entered production code accidentally, especially after code review. If precise details of these errors are released, I'm open to having my mind changed.
2) Even if they did, I can't see how it would take Facebook's tens of thousands of engineers 7 years to find this "bug".
They'd also have to re-run it after every code update, and since every update may introduce a new nook or cranny, that would slow down development too much.
I wouldn’t even try doing that, but instead have a two-pronged defense:
- code review of every single log statement in the code base by individuals whose _only_ job it is to prevent such problems.
- continuous checking of every single line logged against a thousand or so common passwords.
At Facebook’s scale, the second probably has lots of false positives, so I would do that at time of logging, when the location doing that logging is known, so that an alarm only gets triggered if a single log statement repeatedly logs passwords.
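A minimal sketch of that second prong, assuming Python's logging module; the password list, threshold, and alerting hook are all illustrative placeholders:

    import logging
    from collections import Counter

    # Illustrative stand-in for the ~1000 most common passwords,
    # minus strings that legitimately show up in test traffic.
    COMMON_PASSWORDS = {"123456", "password", "qwerty", "letmein"}
    ALERT_THRESHOLD = 3  # repeats from one call site before alerting

    class PasswordCanaryFilter(logging.Filter):
        """Check each log line for common passwords, keyed by call site,
        so a one-off false positive stays quiet but a leaky log statement
        that fires repeatedly raises an alarm."""

        def __init__(self):
            super().__init__()
            self.hits = Counter()

        def filter(self, record):
            line = record.getMessage()
            if any(pw in line for pw in COMMON_PASSWORDS):
                site = (record.pathname, record.lineno)
                self.hits[site] += 1
                if self.hits[site] >= ALERT_THRESHOLD:
                    print(f"ALERT: {site} logged a common password "
                          f"{self.hits[site]} times")  # stand-in for paging
            return True  # observe only; never drop the record

Keying the counter on (file, line) is what makes the per-log-statement alarm possible.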
Someone who works as a dev at Facebook should know that HTTP requests can contain sensitive data: session cookies, passwords, credit card information, etc.
But they had some code that did this:
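(The original snippet didn't survive the thread. A minimal sketch of the kind of thing being described, assuming Python; the field names are my guesses, and only the excludeSensitive name comes from the next line:)

    import logging

    logger = logging.getLogger("requests")

    # Guessed field names; a blacklist like this only catches fields
    # it already knows about.
    SENSITIVE_FIELDS = {"password", "password_confirm", "session_token"}

    def excludeSensitive(params):
        """Drop known-sensitive fields before the request is logged."""
        return {k: v for k, v in params.items() if k not in SENSITIVE_FIELDS}

    def log_request(path, params):
        logger.info("POST %s params=%s", path, excludeSensitive(params))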
But then someone on a different team changed how passwords were sent so excludeSensitive was broken.
1. You log full requests somewhere so engineers can debug production issues.
2. You create a system that detects high entropy content before it gets logged (see the sketch after this list).
3. You don't want to drop all high entropy content, so you create some rules about where in requests to look for high entropy content.
4. Something about the request structure changes, breaking your log filtering.
5. There is nothing that notices the drop in the amount of content filtered out of logs.
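For steps 2 and 3, a minimal sketch of what such a detector might look like; the threshold and length cutoff are made-up tuning values:

    import math
    from collections import Counter

    def shannon_entropy(s):
        """Estimated bits of entropy per character, from character frequencies."""
        counts = Counter(s)
        n = len(s)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    def redact_high_entropy(value, threshold=3.5):
        # Random tokens and decent passwords score high; English words
        # score low. The length cutoff is step 3's "rules about where
        # to look" reduced to one line.
        if len(value) >= 8 and shannon_entropy(value) > threshold:
            return "[REDACTED]"
        return value

Step 5 then amounts to counting how often redact_high_entropy fires and alerting when that rate drops unexpectedly, which is exactly the monitoring that was missing.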
There are oodles of ways this could happen. I'd wager that more than half of all businesses with a website that handles passwords have logged passwords in plaintext somewhere.
Unfortunately, as numerous password breaches have shown, most passwords aren't that high entropy.
Like, I'd love to agree, but frankly, I'm not surprised. I feel like I've seen crazier things happen in production environments...
The logs contained user credentials, and they hadn't noticed. I pointed it out to the CTO.
My engineer: "Should we log the body of api requests?"
Me: "Yes, of course!....... waaaait a second. No."
The result was manually whitelisting the fields that SHOULD be logged on each endpoint. It's a pain compared to "just log everything" or even a blacklist, but it's far safer.
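A minimal sketch of that approach, with endpoint and field names invented for illustration:

    import logging

    logger = logging.getLogger("api")

    # Only fields listed here ever reach the logs; everything else is
    # dropped by default, so a new sensitive field stays unlogged until
    # someone explicitly whitelists it.
    LOGGABLE_FIELDS = {
        "/login": {"username", "remember_me"},
        "/checkout": {"order_id", "item_count"},
    }

    def log_request_body(endpoint, body):
        allowed = LOGGABLE_FIELDS.get(endpoint, set())
        logger.info("%s body=%s", endpoint,
                    {k: v for k, v in body.items() if k in allowed})

The default-deny direction is the whole point: forgetting to whitelist a field costs you some debugging detail, while forgetting to blacklist one leaks it.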
You can't imagine? Really? An error occurs, and the request that caused the error gets logged.
One thing I can think of is to have some production test accounts that are regularly used and have a unique password, then an automated task that periodically greps the production logs for that password to see if you have a leak.
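A minimal sketch of that periodic check; the log path and canary value are placeholders:

    import glob

    CANARY_PASSWORD = "canary-7f3a9c"  # unique to the production test accounts

    def find_leaky_logs(pattern="/var/log/app/*.log"):
        """Return log files containing the canary password in plaintext."""
        leaky = []
        for path in glob.glob(pattern):
            with open(path, errors="ignore") as f:
                if any(CANARY_PASSWORD in line for line in f):
                    leaky.append(path)
        return leaky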
Any other approaches?
Real users will find routes through your app that automated tests never consider. For example, what if the user's one-year login cookie expires while they're on step 3 of a 4-step "change your avatar" process? That's highly unlikely, and probably not even tested. It might well still work (thanks to good modular design), yet also log inappropriate data.
Better hope you have an exhaustive list of all the production logs.
It also checks for base64-encoded versions of the data (at all three possible alignments; sketched after the list below). There is also an alert if data is unscannable (due to compression or encryption).
The check is done at log ingestion points, but also on outgoing HTTP requests from webdriver automated tests (since some third-party scripts might be shipping the data off to someone else's server).
The scanned-for words are:
* The top 100 passwords, excluding things used as test strings.
* A few company-specific passwords.
* A few testing passwords.
* A few random strings that are also deliberately inserted into source code files in places that should never (by design) pass between client and server.
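The "various alignments" above matter because the base64 encoding of a string depends on its byte offset modulo 3 within the surrounding data. A sketch of how the needle set might be generated (my reconstruction, not the poster's actual code):

    import base64

    def b64_needles(secret):
        """Every base64 substring an embedded secret can produce, one per
        3-byte alignment, trimmed of characters that mix in bits from
        neighbouring bytes."""
        needles = set()
        for offset in range(3):
            enc = base64.b64encode(b"\x00" * offset + secret).decode().rstrip("=")
            lead = (offset * 8 + 5) // 6  # chars tainted by the pad bytes
            tail = 1 if (offset + len(secret)) % 3 else 0  # char tainted at the end
            needles.add(enc[lead:len(enc) - tail])
        return needles

Scanning then just means searching each log line for the plaintext plus all three needles.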
After all, that's what they interview on. Best Practices. Ethics.
Doesn't Facebook have blogs on things like efficient servers, algorithms, and such? They're obviously at the forefront of technology (or at least know about it). Why aren't they implementing these things?
It seems to me that the more your company is worth, the more rules you get to bend, all in the chase for more money. HR and lawyers will take care of the rest.
Should you store plaintext passwords on the server side?
Facebook announced this before the Mueller report was released.
The Facebook security post was updated April 18, 2019, at 7 AM PT.
The Mueller report was released "Thursday morning, shortly after 11 am [EDT]".
Disclosure: I work for a big tech company but not Facebook. All opinions are my own.
It's probably "safer" for Facebook to make the announcement with the guaranteed disruption of the Mueller report release coming afterwards than to release it after the report is out, since they can't predict how long people will stay focused on the report. I can imagine a conversation along those lines happening at Facebook, and that's a really disturbing level of coordination to hide a serious incident.
If the client-side hash is strong enough, a plaintext password leak becomes merely a hashed-password leak, which at least protects users who reuse the password on other sites.
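A minimal sketch of that idea, assuming Python on both ends; the domain string and scrypt parameters are illustrative:

    import hashlib

    def client_prehash(username, password):
        """Run client-side: the server only ever sees the derived value."""
        # Domain-separated salt: the same password produces different
        # hashes on different sites, so a leak here can't be replayed
        # elsewhere even if the user reuses the password.
        salt = hashlib.sha256(f"example.com|{username}".encode()).digest()
        return hashlib.scrypt(password.encode(), salt=salt,
                              n=2**14, r=8, p=1, dklen=32).hex()

The server then treats the received hex string as the "password" and stores its own slow hash of it, so neither an accidental log of the request nor a database leak exposes anything reusable.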
If you are concerned about transmitting passwords, use SRP (https://en.wikipedia.org/wiki/Secure_Remote_Password_protoco...) instead of inventing your own bad crypto.
What do you mean by "once" and "twice"? Are you comparing splitting the hashing cost across client and server with adding extra hashing cost on the client?