"Due to a bug, passwords were written to an internal log before completing the hashing process. We found this error ourselves, removed the passwords, and are implementing plans to prevent this bug from happening again."
Genuine question—how would this bug be produced in the first place?
My (limited) experience makes me think that cleartext passwords are somehow hard coded to be logged, perhaps through error logging or a feature that’s intended for testing during development.
I personally would not code a backend that allows passwords (or any sensitive strings) to be logged in any shape or form in production, so it seems a little weird to me that this mistake is considered a “bug” instead of a very careless mistake. Am I missing something?
Let's say you log requests and the POST body parameters that are sent along with them. Oops, forgot to explicitly blank out any fields known to contain passwords. Now they're saved in cleartext in the logs every time the user logs in.
We made this mistake - the trick is determining which fields are sensitive, which are sensitive enough that they should be censored but still included in the log, and which are just the rest of the crud.
It turns out that this is non-trivial - when censoring, how do you indicate that something was changed while keeping the output to a minimum? Blank/"null" was rejected because it would mask other problems, and "* THIS FIELD HAS BEEN REDACTED DUE TO SENSITIVE INFORMATION *" was rejected for being "too long". Currently we use "XXXXX", which has caused some intern head scratching but is otherwise fine.
Easy: have a framework that validates & sanitizes all your parameters, disallow any non-declared parameter, make something like "can_be_logged" a mandatory attribute, and then only log those & audit them.
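A rough sketch of that idea, assuming a tiny hand-rolled helper rather than any particular framework (PARAM_SPEC, log_request and the "XXXXX" placeholder are all made up for illustration):

    import logging

    logger = logging.getLogger("requests")

    # Every accepted parameter must be declared, and can_be_logged is mandatory.
    PARAM_SPEC = {
        "username": {"can_be_logged": True},
        "password": {"can_be_logged": False},
        "remember": {"can_be_logged": True},
    }

    def log_request(path, params):
        unknown = set(params) - set(PARAM_SPEC)
        if unknown:
            # Reject undeclared parameters instead of guessing whether they're safe.
            raise ValueError(f"undeclared parameters: {unknown}")
        safe = {
            name: (value if PARAM_SPEC[name]["can_be_logged"] else "XXXXX")
            for name, value in params.items()
        }
        logger.info("POST %s %s", path, safe)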
Wouldn't that make it easier for someone who has access to hashed passwords in the case of a database leak? They would just have to submit the username and the hashed password (which they now have).
In this case the client side would have our algorithm (e.g. in JavaScript) plus the key we use to hash the password. If that's the case, I can't see any difference between giving a hacker the password or giving them the hashed password along with the algorithm and key.
So sure you don't want to log everything in Prod, but maybe you do in Dev. In that case, a bug would be to push the dev logging configuration to Prod. Oops.
If you have the cleartext password at any point in your codebase, then there is no fool-proof way to prevent logging it unintentionally as the result of a bug. You just have to be extra careful (code review, a minimal amount of code manipulating it, a prod-like testing environment with a log scanner, ...)
Not exactly log files, but I once noticed a C coredump contained raw passwords in strings that had been free'd but not explicitly overwritten. Similar to how Facebook "deletes" files by merely marking them as deleted, free() works the same way in C: the memory isn't actually overwritten until something else writes over it.
Aren't coredumps static copies of the memory state at time of termination - usually unplanned? So not really the same thing as having ongoing access to a program's memory; I can't really see a debugging process that would involve viewing memory in a dynamic way, whereas it's somewhat of a concern if coredumps (an important debugging tool) reveal plaintext passwords.
In the past, I've seen logs monitored for high-entropy strings that could be API keys or passwords. However, in a NoSQL/UUID-using environment, this could be really hard to implement.
Perhaps implement some kind of "password canary" - a test account (or accounts) with a known high-entropy password.
Have an automated system send periodic login requests (or any other requests which contain sensitive information that shouldn’t be logged) for this account, and have another system which searches log files for the password.
If it’s ever found, you know something is leaking.
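Something like this, as a sketch - the URL, log path and alerting are placeholders, and a real setup would run the two halves as separate scheduled jobs:

    import pathlib
    import requests

    CANARY_USER = "canary-account"
    CANARY_PASSWORD = "kk7#Qz9...known-high-entropy-test-value"  # never a real user's password
    LOG_DIR = pathlib.Path("/var/log/myapp")

    def send_canary_login():
        # Periodically exercise the login flow with the canary credentials.
        requests.post("https://example.com/login",
                      data={"username": CANARY_USER, "password": CANARY_PASSWORD},
                      timeout=10)

    def scan_logs_for_canary():
        # If the known password ever shows up in a log file, something is leaking.
        leaks = [p for p in LOG_DIR.glob("*.log")
                 if CANARY_PASSWORD in p.read_text(errors="ignore")]
        if leaks:
            print(f"ALERT: canary password found in {leaks}")  # hook up real alerting here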
I think the joke is that both Github and Twitter are famous for being built on Rails (although Twitter just-as-famously required a move off of Rails in order to scale)
There was a great keynote about this at this year's RailsConf
The argument was essentially that if Twitter had instead chosen a language more natively inclined toward scalability, they would necessarily have hired 10x as many engineers, and they would not have succeeded at building the product that people use to simply tell each other what bar they're at. That braindead-simple thing (which you can probably scale just fine in any language) is ultimately what drove their success... it wasn't any great technological feat that made Twitter successful, it was pretty much just the "Bro" app that people loved.
(The talk was called "Rails Doesn't Scale" and will be available soon, but RailsConf2018 evidently hasn't posted any of the videos yet.)
Where is it showing a password there? I assume this has been fixed because I can't duplicate it on my machine and the screenshot posted on that article doesn't seem to show any plaintext passwords.
Which makes me wonder, is this really a bug or did someone make it look like a "bug"?
Also they say they found no evidence of anyone stealing these passwords, but I wouldn't be surprised if some companies decide not to look too hard just so they can later say "they found no evidence of such an act."
So best practice would be that the cleartext password is never sent to the server, so they could never log it even accidentally. That means the hashing needs to be done client side, probably with JavaScript. Is there any safe way to do that?
nah, that just makes the "hashed password" the equivalent of the cleartext password. Whatever it is your client sends to the server for auth is the thing that needs to be protected. If the client sends a "hashed password", that's just... the password. Which now needs to be protected. Since if someone has it, they can just send it to the server for auth.
But you can do fancy cryptographic things where the server never sees the password and it's still secure. like the entire field of public key cryptography, diffie-hellman key exchange, etc.
You can store things as follows. Store the salted hashed password with its salt server side. When the user wants to log in, send them the salt and a random salt. Client side hashes the password + salt, then hashes that hash with the random value. What am I missing? Probably something, since this is something I rolled my own version of when I was a teenager, but it's not immediately obvious to me.
Server stores hashed-password, hash-salt, and random-salt.
Server sends hash-salt, and random-salt to client.
Client uses user password and hash-salt to generate hashed-password.
Client hashes hashed-password using random-salt.
Client sends hashed-hashed-password to server.
Server grabs the stored hashed-password, hashes it using the stored random-salt, and checks for a match against the client's hashed-hashed-password.
--
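To make the data flow concrete, here are the steps above as a sketch (sha256 stands in for whatever hash is intended; this is purely illustration, and the flaw is spelled out just below):

    import hashlib

    def H(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    # registration: the server only ever keeps this value and the two salts
    hash_salt = b"per-user-salt"
    stored_hashed_password = H(b"hunter2" + hash_salt)

    # login: server sends hash_salt plus a fresh random_salt
    random_salt = b"fresh-per-login-value"

    # client: re-derive the stored hash from the typed password, then salt it again
    client_response = H(H(b"hunter2" + hash_salt) + random_salt)

    # server: it can only verify by hashing the *stored* value the same way, which
    # means anyone who steals stored_hashed_password can produce client_response
    # without ever knowing the real password.
    assert client_response == H(stored_hashed_password + random_salt)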
So the only thing this actually does is not share the text of the password that the user typed to the server. But at a technical level, now the hashed-password is the new "password".
Let's say the database is compromised. The attacker has the hashed-password. They make a login request to fetch the random-salt, hash their stolen hashed-password with it and send that to the server. Owned.
Along with being more complicated with no real gain, this also takes the hashing away from the server-side, which is a big negative, as the time that it takes to hash a password is a control method used to mitigate attacks.
Just send the plain-text password over HTTPS and hash it the moment it hits the server. There's no issue with this technique (as long as it's not logged!)
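For comparison, the standard approach is tiny - hash immediately on receipt and never let the plaintext go anywhere else. A sketch using the bcrypt package, with a dict standing in for real storage:

    import bcrypt

    _password_hashes = {}  # stands in for the real user store

    def set_password(user_id: str, plaintext: str) -> None:
        # Hash the moment it arrives; the plaintext is never stored or logged.
        _password_hashes[user_id] = bcrypt.hashpw(plaintext.encode("utf-8"), bcrypt.gensalt())

    def check_password(user_id: str, plaintext: str) -> bool:
        return bcrypt.checkpw(plaintext.encode("utf-8"), _password_hashes[user_id])

    set_password("alice", "correct horse battery staple")
    print(check_password("alice", "correct horse battery staple"))  # True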
This is true. It does prevent an attacker from reusing a password they recover from your logs. But as others have pointed out a DB breach means all your users are compromised. Thank you.
No, random-salt is not stored permanently but generated at random by the server every time a client is about to authenticate. Alternatively a timestamp would be just as good.
The random-salt has to be stored, at least for the length of the authentication request, because the server needs to generate the same hashed-hashed-password as the client to be able to match and authenticate.
> Alternatively a timestamp would be just as good.
I don't see how that would work at all.
I also don't see the need to go any further in detail about how this scheme will not be better than the current best practices.
A timestamp would work the same way it works in (e.g.) Google Authenticator.
Incidentally, I really resent how it's impossible to have a discussion of anything at all related to cryptography on HN without somebody bringing up the "never roll your own crypto" dogma.
If the ideas being proposed are bad, please point out why, don't just imply that everyone except you is too stupid to understand.
Edit:
I just reread your comment above and you did a perfectly good job of explaining why it's a bad idea, I must have misunderstood first time round: it's a bad idea because now the login credentials get compromised in a database leak instead of a MITM, which is both more common in practice and affects more users at once.
Sorry for saying you didn't explain why it is a bad idea.
The problem with this scheme is that if the database storing the salted hashed passwords is compromised, then an attacker can easily log in as any user. In a more standard setup, the attacker needs to send a valid password to log in, which is hard to reverse from the salted hashed password stored server-side. In this scheme, the attacker no longer needs to know the password, as they can just make a client that sends the compromised server hash salted with the random salt requested by the server.
> Store the salted hashed password with its salt server side.
So now _this_ is effectively just "the password", that needs to be protected, even though you're storing it server side.
If an attacker has it, they can go through the protocol and auth -- I think, right? So you prob shouldn't be storing it in the db.
All you're doing is shuffling around what "the password" that needs to be protected is, still just a variation of the original attempt in top comment in this thread.
The reason we store hashed passwords in the db instead of the password itself is of course because the hashed password itself is not enough to successfully complete the "auth protocol", without knowing the original password. So it means a database breach does not actually expose info that could be used to successfully complete an auth. (unless they can reverse the hash).
I _think_ in your "protocol" the "original" password actually becomes irrelevant; the "salted hashed password with its salt" is all you need, so now _this_ is the thing you've got to protect. But you're storing it in the db, so we lose the very benefit that hashing passwords in the db was supposed to give us in the first place!
I guess your protocol protects against an eavesdropper better, but we generally just count on https/ssl for that, that's not what password hashing is for in the first place of course. Which is what the OP is about, that _plaintext_ rather than hashed passwords ended up stored and visible, when they never should have been either.
Crypto protocols are hard. We're unlikely to come up with a successful new one.
But now your password is effectively just HMAC(user_salt, pwd), and the server has to store it in plaintext to be able to verify. Since plaintext passwords in the db are bad, this solution doesn't sound too attractive, unless you were suggesting something else.
"Since if someone has it, they can just send it to the server for auth" unless it's only good for a few moments (the form you type it into constantly polling for a new nonce).
Not really... it's not that simple. You could use the time of day as a seed for the hash, for example. There are tradeoffs to be made, which is partly why they don't do it, but the story isn't as simple as "the hash becomes the password".
And how do you propose to do that when the clocks aren't synchronized? Clock drift is exceptionally common. Not everyone runs NTP or PTP - probably even fewer use PTP. On desktop/laptop clients it's typically configurable whether or not to attempt clock sync, and I've never seen the level of synchronization documented for PCs. High-precision PTP usually requires very expensive hardware, not something to be expected of home users, or even of a startup depending on the industry.
The point was you could do similarly here. Just have a margin of like 30 seconds (or whatever). I never said you have to do this to nanosecond precision.
Only if the server only keeps around the hash -- which is why I said there are trade-offs to be made. The point I was making was that the mere fact that you're sending a hash does not trigger the "hash-becomes-password" issue; that's a result of secondary constraints imposed on the problem.
Makes sense, and then you're getting into something akin to SSH key pairs, and I know from experience that many users can't manage that especially across multiple client devices.
There are probably ways to make it reasonable UX, but they probably require built-in browser (or other client) support.
Someone in another part of this thread mentioned the "Web Authentication API" for browsers, which I'm not familiar with, but is possibly trying to approach this?
It ties in with the Credential Management API (a way to have the browser store login credentials for a site - a much less heuristic-based approach than autocomplete on forms). The basic principle is: generate a key pair and pass the public key back to be sent to the server during registration; on login, the server generates a challenge value for the client to sign. IIRC the JS code never sees the private key - only the browser does.
I believe you could use a construction like HMAC to make it so that during authentication (not password setting events) you don't actually send the token. But if someone is already able to MITM your requests, what are the odds they can't just poison the JavaScript to send it in plaintext back to them?
I think their goal is to still use https, but stop anything important from leaking if a sloppy server-side developer logs the full requests after TLS decryption (as Twitter did here).
No, there fundamentally isn't, because you can't trust the client to actually be hashing a password. If all the server sees is a hash, the hash effectively is the password. If it's stolen, a hacker can alter their client to send the stolen hash to the server.
If a hash is salted with the domain it won't be usable on other websites. You should additionally hash the hash on the server, and if you store the client hashes, you can update the salts on next sign-in. A better question is why clients should be sending unhashed passwords to servers in the first place.
https://medium.com/the-coming-golden-age/internet-www-securi...
This discussion is only relevant with an attacker that can break tls. A hash that such an attacker couldn't reverse might be slow on old phones so there is a tradeoff.
Also, hashed passwords shouldn't be logged either.
>That means the hashing needs to be done client side, probably with JavaScript. Is there any safe way to do that?
No [0,1...n]. Note that these articles are about encryption, but the arguments against javascript encryption apply to hashing as well.
Also consider that no one logs this stuff accidentally to begin with. If the entity controlling the server and writing the code wants to log the passwords, they can rewrite their own javascript just as well as they can whatever is on the backend. There's nothing to be done about people undermining their own code.
> consider that no one logs this stuff accidentally to begin with
It's possible. You create an object called Foo (possibly a serialized data like a protobuf, but any object), and you recursively dump the whole thing to the debug log. Then you realize, oh, when I access a Foo, sometimes I need this one field out of the User object (like their first name), so I'll just add a copy of User within Foo. You don't consider that the User object also contains the password as one of its members. Boom, you are now accidentally logging passwords.
Any user object on the server should only ever have the password when it is going through the process of setting or checking the password, and this should be coming from the client and not stored. So, your case of logging the user would only be bad at one of those times. Otherwise like in the case of a stored user you should just have a hashed password and a salt in the user object.
Creating a User object that holds a password (much less a password in plaintext) seems next level stupid to begin with, but fair enough, I guess it could happen.
> Also consider that no one logs this stuff accidentally to begin with.
It can happen if requests are logged in middleware, and the endpoint developer doesn't know about it. It's still an extremely rookie mistake though, regardless of whether it was done accidentally or on purpose.
As others have stated, you'd just be changing the secret from <password> to H(<password>). The better solution is using asymmetric cryptography to perform a challenge-response test. E.g. the user sets a public key on sign up and to login they must decrypt a nonce encrypted to them.
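A sketch of that challenge-response idea, using signatures rather than decrypting an encrypted nonce (the more common modern construction); it relies on the cryptography package, and storage/transport are left out:

    import os
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric import ed25519

    # sign-up: the client generates a key pair and registers only the public key
    client_key = ed25519.Ed25519PrivateKey.generate()
    registered_public_key = client_key.public_key()

    # login: the server issues a fresh nonce, the client signs it, the server verifies
    nonce = os.urandom(32)
    signature = client_key.sign(nonce)

    try:
        registered_public_key.verify(signature, nonce)
        print("authenticated")
    except InvalidSignature:
        print("rejected")

Because the nonce is fresh every time, nothing the server sees (or accidentally logs) can be replayed later to authenticate as the user.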
That would transfer it from something you know (a password) to something you have (a device with SSL cert installed) which are meant to protect against different problems.
Hmm, why should passwords (hashed or not) be stored in logs at all? I don't see a reason for doing that. You could unset them (and/or other sensitive data) before dumping requests into logs.
Wouldn't it be better to never even send the password to the server, but instead performing a challenge-response procedure? Does HTTP have no such feature built in?
Being simplistic, perhaps an automated test with a known password on account creation or login (e.g. "dolphins") and then a search for the value on all generated logs.
For people that know more about web security than I: Is there a reason it isn't good practice to hash the password client side so that the backends only ever see the hashed password and there is no chance for such mistakes?
Realize the point of hashing the password is to make sure the thing users send to you is different from the thing you store. You'll still have to hash the hashes again on your end, otherwise anyone who gets access to your stored passwords could use them to log in.
In particular, the point is to make it so that the thing you store can't actually be used to authenticate -- only to verify. So if you're doing it right, the client can't just send the hash, because that wouldn't actually authenticate them.
But at least, with salt, it wouldn't be applicable to other sites, just one. Better to just never reuse a password though. Honestly sites should just standardize on a password changing protocol, that will go a long way towards making passwords actually disposable.
I don't think a password changing protocol would help make passwords disposable. Making people change passwords often will result in people reusing more passwords.
No, the point is for password managers. The password manager would regularly reset all the passwords... until someone accesses your password manager and locks you out of everything!
Ultimately, what the client sends to the server to get a successful authentication _is_ effectively the password (whether that's reflected in the UI or not). So if you hash the password on the client side but not on the server, it's almost as bad as saving clear text passwords on the server.
You could hash it twice (once on the server once on the client) I suppose, but I'm not entirely sure what the benefit of that would be.
A benefit would be that under the sort of circumstance in the OP, assuming sites salted the input passwords, the hashes would need reversing in order to acquire the password and so reuse issues could be obviated. But I don't think that's really worth it when password managers are around.
I'm imagining we have a system where a client signs, and timestamps, a hash that's sent meaning old hashes wouldn't be accepted and reducing hash "replay" possibilities ... but now I'm amateurishly trying to design a crypto scheme ... never a good idea.
Assuming that you are referring to browsers as the client here: one simple reason is that client-side data can always be manipulated, so it does not really make any difference. It might just give a false sense of safety but does not change much.
In the case of multi-tier applications, where LDAP or AD is probably used to store the credentials, the back end is the one responsible for doing the hashing.
I can't think of a good reason not to hash on the client side (in addition to doing a further hash on the server side -- you don't want the hash stored on the server to be able to be used to log in, in case the database of hashed passwords is leaked). The only thing a bit trickier is configuring the work factor so that it can be done in a reasonable amount of time on all devices that the user is likely to use.
Ideally all users would change their passwords to something completely different in the event of a leak. But realistically this just doesn't happen -- some users refuse to change their passwords, and others just change one character. If only the client-side hash is leaked rather than the raw password, you can greatly mitigate the damage by just changing the salt at the next login.
If you don’t have control over the client, it’s a bad idea: your suggestion means the password would be the hash itself, and it wouldn’t be necessary for an attacker to know the original password.
For one, you expose your hashing strategy. Not that security by obscurity is the goal; but there's no real benefit. Not logging the password is the better mitigation strategy.
We schedule log reviews just like we schedule backup tests. (Similar stuff gets caught during normal troubleshooting, but reviews are more comprehensive.)
It only takes one debug statement leaking to prod - it has to be a process, not an event.
Create a user with an extremely unusual password and create a script that logs them in once an hour. Use another script to grep the logs for this unusual password, and if it appears fire an alert.
Security reviews are important but we should be able to automate detection of basic security failures like this.
It would also be a good idea to search for the hashed version of that user’s password. It’s really bad to leak the unencrypted password when it comes in as a param, but it’s only marginally better to leak the hashed version.
This only works if you automate every possible code path. If you're logging passwords during some obscure error in the login flow then an automated login very likely won't catch it.
Log review is done for every single project at my workplace too (Walmart Labs). So I don't think this is a novel idea. And it does not stop there. Our workplace has a security risk and compliance review process which includes reviewing configuration files, data on disk, data flowing between nodes, log files, GitHub repositories, and many other artifacts to ensure that no sensitive data is being leaked anywhere.
Any company that deals with credit card data has to be very very sure that no sensitive data is written in clear anywhere. Even while in memory, the data needs to be hashed and the cleartext data erased as soon as possible. Per what I have heard from friends and colleagues, the other popular companies like Amazon, Twitter, Netflix, etc. also have similar processes.
It's novel to me; I've never worked anywhere that required high-level PCI compliance or that scheduled log reviews. Ad-hoc log review, sure. I think it's a fantastic idea regardless of PCI compliance obligations.
We just realised the software I'm working on has written RSA private keys to the logs for years. Granted, it was at debug level and only when using a rarely-used functionality, but still.
We also do log reviews, but 99% of the time they simply complain about the volume rather than the contents.
Do you enable debug logging in production? In our setup we log at info and above by default, but then have a config setting that lets us switch to debug logging on the fly (without a service restart).
This keeps our log volume down, while letting us troubleshoot when we need it. This also gives us an isolated time of increased logging that can be specifically audited for sensitive information.
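A minimal version of that toggle with Python's standard logging (how the config flag is delivered and watched is up to you):

    import logging

    logger = logging.getLogger("myapp")
    logging.basicConfig(level=logging.INFO)

    def apply_log_config(debug_enabled: bool) -> None:
        # Flip between INFO and DEBUG at runtime, no service restart needed.
        logger.setLevel(logging.DEBUG if debug_enabled else logging.INFO)

    apply_log_config(False)  # normal production volume
    apply_log_config(True)   # temporary, audited window of verbose logging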
(We caught ourselves doing it 4-5 months back, and went through _everything_ checking... It was only a random accident that brought it to the attention of anyone who bothered to question it, too... Two separate instances, by different devs, of 'if (DEBUG_LEVEL = 3){ }' instead of == 3 - both missed by code reviews as well...)
I know it's irrational but I really dislike Yoda notation. Every time I encounter one while reading code I have to take a small pause to understand them, I don't know why. My brain just doesn't like them. I don't think I'm the only one either, I've seen a few coding styles in the wild that explicitly disallow them.
Furthermore any decent modern compiler will warn you and ask to add an extra set of parens around assignments in conditions so I don't really think it's worth it anymore. And of course it won't save you if you're comparing two variables (while the warning will).
I don't think "Yoda notation" is good advice. How do you prevent mistakes like the following with Yoda notation?
if ( level = DEBUGLEVEL )
When both sides of the equality sign are variables, the assignment will succeed. Following Yoda notation provides a false sense of security in this case.
As an experienced programmer I have written if-statements so many times in life that I never ever, even by mistake, type:
if (a = b)
I always type:
if (a == b)
by muscle memory. It has become second nature. Unless of course I really mean the assignment.
One way to not write any bugs is to not write any code.
If you must write code, errors follow, and “defence in depth” is applicable. Use an editor that serves you well, use compiler flags, use your linter, and consider Yoda Notation, which catches classes of errors, but yes, not every error.
These kinds of issues are excellent commercials for why the strictness of a language like F# (or OCaml, Haskell, etc), is such a powerful tool for correctness:
1) Outside of initialization, assignment is done with the `<-` operator, so you're only potentially confused in the opposite direction (assignments incorrectly being boolean comparisons).
2) Return types and inputs are strictly validated, so an accidental assignment (returning a `void`/`unit` type) would not compile, as the function expects a bool.
3) Immutable by default, so even if #1 and #2 were somehow compromised the compiler would still halt and complain that you were trying to write to something unwriteable
Any of the above would terminally prevent compilation, much less hitting a code review or getting into production... Correctness from the ground up prevents whole categories of bugs at the cost of enforcing coding discipline :)
Note these are only constant pointers. Your data is still mutable if the underlying data structure is mutable, (e.g. HashMap). Haven't used Java in a few years, but I made religious use of final, even in locals and params.
I don't really see how unless you've never actually read imperative code before; either way you need to read both sides of the comparison to gauge what is being compared. I'm dyslexic and don't write my comparisons that way and still found it easy enough to read those examples at a glance.
But ultimately, even if you do find it harder to parse (for whatever reason(s)) that would only be a training thing. After a few days / weeks of writing your comparisons like that I'm sure you'll find is more jarring to read it the other way around. Like all arguments regarding coding styles, what makes the most difference is simply what you're used to reading and writing rather than actual code layout. (I say this as someone who's programmed in well over a dozen different languages over something like 30 years - you just get used to reading different coding styles after a few weeks of using it)
Often when I glance over code to understand what it is doing I don't really care about values. When scanning from left to right it is easier when the left side contains the variable names.
Also I just find it unnatural if I read it out loud. It is called Yoda for a reason.
But again, none of the problems you've described are insurmountable. Source code itself doesn't read like how one would structure a paragraph for human consumption, but we programmers learn to parse source code because we read and write it frequently enough - just like how one might learn a human language by living in a country that speaks it.
If you've ever spent more than 5 minutes listening to arguments and counterarguments regarding Python whitespace vs C-style braces - or whether the C-style brace should be appended to your statement or sit on its own line - then you'd quickly see that all these arguments about coding styles are really just personal preference based on what that particular developer is most used to (or pure aesthetics about what looks prettiest, but that's just a different angle of the same debate). Ultimately you were trained to read
if (variable == value)
and thus equally you can train yourself to read
if (value == variable)
All the reasons in the world you can't or shouldn't are just excuses to avoid retraining yourself. That's not to say I think everyone should write Yoda-style code - that's purely a matter of personal preference. But my point is arguing your preference as some tangible issue about legibility is dishonest to yourself and every other programmer.
There are always assumptions being made, no matter what you do. But "uppercase -> constant" is such a generic and cross-platform convention that it should always be followed. This code should never have passed code review for this glitch alone.
Rust has an interesting take on that: it's not a syntax error to write "if a = 1 { ... }", but it'll fail to compile because the expression "a = 1" returns nothing while "if" expects a boolean, so it generates a type-check error.
Of course a consequence of that is that you can't chain assignments (a = b = c), but it's probably a good compromise.
Well in Rust you couldn't have = return the new value in general anyway, because it's been moved. So even if chained assignment did work, it'd only work for Copy types, and really, a feature that only saves half a dozen characters, on a fraction of the assignments you do, and that can only be used a fraction of the time, doesn't seem worth it at all.
Unless you follow a zero warning policy they are almost useless. If you have a warning that should be ignored add a pragma disable to that file. Or disable that type of warning if it's too spammy for your project.
I've been working on authentication stuff for the last two weeks and the answer is "more than you'd like".
But luckily it's something we cover in code reviews and the logging mechanism has a "sensitive data filter" that defaults to on (and has an alert on it being off in production.)
From the email I received:
"During the course of regular auditing, GitHub discovered that a recently introduced bug exposed a small number of users’ passwords to our internal logging system, including yours. We have corrected this, but you'll need to reset your password to regain access to your account."
"[We] are implementing plans to prevent this bug from happening again" sure makes it sound like this bug is still happening. Should we wait a couple of days before changing passwords? Will it end up in this log right now, just like the old one?
That sounds more like "We're adding a more thorough testing and code-review process for our password systems to prevent developers from accidentally logging unhashed passwords in the future".
No, it sounds like a reasonable bugfixing strategy. Identify the bug, identify the fastest way to resolve it, then once it's fixed figure out how to ensure it never happens again, and what to do if it does.
I think you read "prevent this bug from happening again" to mean "prevent this particular problem from happening one more time", while the blogpost probably means something like "prevent this class of bug from occurring in the future"
A couple of steps you can take to reduce the chances of accidentally putting sensitive information in a log.
1. Make a list of all sensitive information that the test users in your test environment will be giving to your application.
As part of your test procedure, search all logs for that information. This can be as simple as having a text file with all the sensitive information, and doing a 'grep -F -f sensitive.txt' on all your log files.
2. If you are working in some sort of object oriented language, make separate classes for sensitive data, such as a class Password to hold passwords, a class CardNum to hold credit card numbers, and so on.
Make the method that your logging framework uses to convert objects to strings return a placeholder string for these objects, such as "Password", "Credit Card Number", and so on.
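A minimal sketch of suggestion 2 in Python - the class name and placeholder text are arbitrary, the point is that accidental logging can only ever print the placeholder:

    class Password:
        __slots__ = ("_value",)

        def __init__(self, value: str):
            self._value = value

        def reveal(self) -> str:
            # The only deliberate way to get the plaintext back (e.g. to hash it).
            return self._value

        def __str__(self) -> str:
            return "<Password: redacted>"

        __repr__ = __str__

    pw = Password("hunter2")
    print(f"login attempt with {pw}")  # -> login attempt with <Password: redacted>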
Probably better than 2 is to use actual privilege separation between sensitive data and as much of your service as possible. Handle passwords and password changes in a separate microservice which exchanges it quickly for a session cookie, so that the bulk of your application logic doesn't have passwords in memory at all. For credit card numbers, do something like what Stripe does where one API endpoint exchanges the card number for a token and every other endpoint just needs the token, which means that Stripe can route that one URL to services they pay extra attention to.
This assumes that passwords and credit card numbers are more sensitive than session cookies or tokens, which is usually true because people tend to share passwords across sites and definitely share credit card numbers across sites. So the risk of logging cookies/tokens is much lower. Also, you can revoke all cookies and token much more easily than you can force everyone to change passwords or credit card numbers.
While good advice, your proposal is not necessarily better than #2 because #2 is something that happens automatically once the password hits your object model.
If rather than a naked string you have a Password class with literally no way to extract the plain text password then you can be positive that any code that uses it will never accidentally log the password.
In contrast, if you rely on microservice tokenization you can still accidentally log your password before your tokenization happens, just like people are accidentally logging passwords before they are bcrypted.
Both proposals have a problem of logging raw requests or raw service calls (outside of your object model).
Where plausible it would probably be best for microservice calls to only take already bcrypted passwords, and error out if it detects one that isn't, so there is zero chance of accidentally logging a plain text password when calling a microservice.
Again, an actual object (e.g. HashedPassword) shines here, because your code can automatically detect bad values the instant it hits your object model, and refuse to properly log or give access to anything that looks like it isn't already hashed.
You have to get the password into the object model first, though—and your object model can simply not contain a Password type in this process (and only in the password-handling one). You shouldn't pass the password on as a naked string and then drop it—you should prevent the password from getting past the auth service at all.
I think the risk of logging raw passwords in the auth service model is lower because logging in a password-specific microservice is an intuitively dangerous thing to do, so both code authors and code reviewers will pay heightened attention to it. Meanwhile, "log all requests" is a common thing to want to do and will raise fewer alarms in a primarily-business-logic service. (In fact, another usually reasonable thing to do is "log all requests that don't parse properly and return 500...")
Observation: security really does happen in low-level code quality like this.
This is a hard notion to accept because it has to be built into the organization. No silver bullet, no consultant, no five-step methodology. Just rigorous software engineering up and down the stack, and care, and attention to detail.
You could also grep logs as part of automated testing and system monitoring. Checking for known passwords for test accounts known to be freshly created, logging in, etc.
Do you know of an elegant way to do this when working with protobufs? Ideally, mark a field 'password', and the generated class' __str__ equivalent returns ""
We do this extensively. A field can be marked as "redacted", and then interceptors can do things like:
1. (most pertinent in this discussion) The logging framework can redact fields before emitting log entries.
2. Endpoints can redact the data unless the client explicitly requests (and is allowed to receive) unredacted data.
3. Serialization mechanisms (e.g. Gson) can be configured to redact data before serializing. (Again, probably can't always do this, but can make that the default for safety.)
It's also very straightforward to hook up as a Java annotation that does the same things.
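Not the exact setup described above, but a rough Python sketch of the same idea - here the "redacted" marking is approximated with a plain set of sensitive field names instead of a custom proto option, and only scalar string fields are masked:

    SENSITIVE_FIELDS = {"password", "credit_card_number"}  # illustrative names

    def _redact_in_place(msg):
        for field, _value in msg.ListFields():
            if field.message_type is not None and field.label != field.LABEL_REPEATED:
                _redact_in_place(getattr(msg, field.name))  # recurse into nested messages
            elif field.name in SENSITIVE_FIELDS:
                setattr(msg, field.name, "XXXXX")  # mask but keep the field present

    def redacted_copy(msg):
        clone = type(msg)()
        clone.CopyFrom(msg)
        _redact_in_place(clone)
        return clone

    # logger.debug("request: %s", redacted_copy(request_proto))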
I highly recommend using a password manager. I finally bit the bullet and started using 1Password a few weeks ago, and I haven't looked back since. It's just so much better than having to remember a thousand different passwords.
Besides securely managing passwords, you can also use a password manager to secure your digital legacy. 1Password has a feature where you can print out "emergency kit" sheets that has the information required to access your password vault. I printed out two of these sheets and gave them to trusted family members in sealed envelopes. In the event that I become incapacitated, they will be able to access my accounts.
Same thing here. I ditched the online password managers a few years back for a similar setup and it's been just about as good as lastpass was, with the added benefit of being stored locally.
Does anyone have a recommendation for a good keepass client for iOS? Is MiniKeePass still the best option? I've been wanting to switch to KeePassXC + something for iOS for a while but I'm not sure what the best way to go is.
Not GP, but KeePass user: I store my KeePass database on a small thumb drive (SanDisk Cruzer Fit), together with a copy of the KeePass executable. If I absolutely need to decrypt my password database on someone else's machine I can take the "secure" software from the USB and hope for the best. The USB also stores a copy of Truecrypt and a large Truecrypt container with backups of my encrypted private keys (PGP, SSH).
Yeah, totally with you — don't trust devices that you (or your employer) don't own. I'm borderline in that I still sometimes trust my employer's devices with my personal passwords, but even that seems a bit iffy.
I'm going to plug https://bitwarden.com/ since nobody else has yet (I'm not affiliated). Open source, clients for everything, free for personal use, imports from other managers. I had been using a GPG encrypted text file for a long time, then later KeePass (and variants) on dropbox. I switched to Bitwarden a while ago and have been very happy with the whole thing.
Everyone should be using a password manager. You can't really trust the average joe to be able to make secure passwords for the potentially dozens or hundreds of sites and services, and even if they do, they probably use just one secure password for everything.
I just wish there was more seamless support for apps to use 1Password to paste in passwords. There are still sites that prevent pasting into password fields!
It's not that they don't allow the use, it's that they don't have a convenient 1Password icon next to the password field. I've noticed some apps have that. Not sure if it requires some specific integration or some open protocol.
The password manager integration is a public standard. More annoyingly, though, apps can also find out if you've pasted into a field and then immediately clear it (some rinkydink banking apps do this.) So it's a 2-pronged problem.
+1 from me as well. Great for storing literally anything sensitive, syncs flawlessly across devices (I currently use dropbox to sync the vault, but it supports other options & they have their own syncing/account system (that is not required to use 1password)).
It is not open source but the vault format is open.
I really wish websites would support use of client side TLS certificates as part of the authentication process. Combining that with a username and password would give you two-factor authentication.
Client side TLS certificates get sent in the clear before you authenticate the server. (You can send them in a renegotiation, but renegotiation has been a historic source of both implementation and protocol security bugs because it does complicated things to TLS state.) So you don't want a client-side certificate that includes your name; that's a huge privacy leak.
You could imagine a scheme where you give a user a certificate with a random subject, and you have a server-side map of random string to user account (so leaking that map isn't the end of the world like leaking passwords; it merely reintroduces the privacy leak above). I recall some proposals for that several years ago. Today, Web Authentication does something effectively equivalent, as does U2F, although they don't involve TLS client certificates specifically.
In TLS 1.3 client certs are sent over an encrypted link, and a reasonable client can and should wait for Finished from the server to arrive, at which point they're entirely sure of who their recipient is too.
Another nice thing is that TLS 1.3 servers can send a CertificateRequest asking for a particular _type_ of certificate, so (if that's ever used in anger) it lets us have clients that don't need to waste the user's time when they don't actually have a suitable certificate anyway. In earlier versions servers could only hint about which CAs they trust, not anything else.
> So you don't want a client-side certificate that includes your name; that's a huge privacy leak.
If it matches the username I have on a website like reddit or HN, then is it really a privacy issue? Anyone, regardless of whether they're logged in or not, can see posts I've made under my username. Though what you say can be an issue for websites where privacy from other users is expected (e.g. banks).
> Today, Web Authentication effectively does something effectively equivalent, as does U2F
Both of those seem to rely on HTTP, while TLS could work with any application level protocol.
They can't see that the posts are coming from your IP address, though. That's one of the things TLS protects—I can post from a coffee shop and nobody at the coffee shop can know (except perhaps by traffic analysis) that the person at the table next to them is the person with this username.
I looked into this and the user experience involved is very poor, in particular the browser interfaces. It sounds like a chicken-and-egg problem. On top of this, is the clear text sending of the client certificate as another poster mentioned.
>It's just so much better than having to remember a thousand different passwords.
Login by email should really become a thing. There's just no reason to store passwords for most sites where you can just stay logged in indefinitely. On rare occasion you need your login cookie refreshed, just send a new link to your email. The burden of remembering a thousand secure and unique passwords dissolves immediately.
I keep all my passwords in a text file. I can't imagine remembering them all. I suppose I should keep that file encrypted and synced to multiple devices with rsync or so. Would a password manager give me any advantage over this scheme?
A password manager will have an integrated password generator where you can configure the spec (include special chars, brackets, custom characters, etc. or not). And you can keep password spec "favorites". So you can quickly generate a 20-char password with special chars and accents, or an 8-char, letters-and-numbers-only one for those websites that require that (a rough sketch of this kind of generator is below).
It will allow you to organize the passwords in a hierarchical way with folders (banks, administration, forums, whatever), and set icons.
It will also keep the date of the last time you modified it. Sometimes this can be useful to know if you are impacted by a breach revealed after the fact. You can also make passwords expire if you like.
You can also add extra data in a way that doesn't clutter the main view. This can be interesting when credentials are more than login/password. For example you could add a PIN there. For my car radio there is a code to enter to make it work after the battery dies, I added the entire procedure to the extra data as I always forget it and it's not intuitive.
I just checked, I have 957 passwords in my KeePass.
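For what it's worth, the generator part is only a few lines with Python's secrets module - the "spec" flags here are just illustrative:

    import secrets
    import string

    def generate_password(length: int = 20, use_digits: bool = True, use_special: bool = True) -> str:
        alphabet = string.ascii_letters
        if use_digits:
            alphabet += string.digits
        if use_special:
            alphabet += "!@#$%^&*()-_=+"
        return "".join(secrets.choice(alphabet) for _ in range(length))

    print(generate_password())                      # 20 chars, everything allowed
    print(generate_password(8, use_special=False))  # for sites with restrictive rules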
Yes, a password manager is just an encrypted database for your passwords. 1Password synchronizes all of your passwords across devices and makes sure everything is secure. You only need to remember a single "master password", which is never sent outside of your local device. In the event that you lose or forget your master password, the password vault is completely unrecoverable.
1Password can also store other information besides passwords such as credit cards, software license numbers, passport numbers, etc. There is also a secure notes feature for storing arbitrary text.
The other password manager that I tried before 1Password is Lastpass. I ended up choosing 1Password since I think it's better designed and overall feels slicker. The /r/lastpass subreddit is littered with complaints about broken updates and bugs...
Sync, browser integration, password generation, audits on password age and duplicates, validation against pwned passwords, shared vaults — nothing that you can't do yourself on top of a text file, if you've got the time and energy for that. TOTP, ACL, secure notes and files — these can't easily be done with a text file, but don't need to be part of a single password management system just because the commercial vendors have added these.
Yes. Among the many features a manager app like 1Password would provide is a way for easily pasting in a password to a login field with a simple keystroke.
You don't need to trust others. For example, build the password using some simple algorithm that uses the TLD name. This way you only need to remember the algorithm.
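A sketch of one (stronger) variant of that idea, using a memorized master secret and an HMAC over the domain - closer to the LessPass approach mentioned a couple of comments below, and the caveats in the replies still apply:

    import base64
    import hashlib
    import hmac

    def site_password(master_secret: str, domain: str, length: int = 16) -> str:
        # One memorized secret + the site's domain deterministically yields a per-site password.
        digest = hmac.new(master_secret.encode(), domain.encode(), hashlib.sha256).digest()
        return base64.b64encode(digest).decode()[:length]

    print(site_password("something only you remember", "example.com"))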
I used to use this system but moved away from it. The reason is twofold. First, if it's a simple enough algorithm there will be enough 'hash collisions' that if someone gets their hands on one of your passwords and your email address, there's a non-negligible possibility that they will be able to find another domain that has the same password.
Second, sometimes sites mandate that you change your password. Or have rules that are incompatible with that algorithm. And then you need to start remembering exceptions to your algorithm, at which point you're back where you started.
There's also https://lesspass.com/ which is stronger than what the parent mentioned and I used a system like that for several years.
I gave up on it for the same reason. Having to remember exceptions, plus when you change your password, you have to change it everywhere, which is annoying because you can't remember every active account.
Do you only use Apple devices? Despite spending most of my time on a laptop with macOS, I also have a gaming PC with Windows, a home server with Debian, a mobile device with Android, and a tablet with iOS. It's nice to have a bit of flexibility available.
If you use an alternative browser such as Firefox you lose access to the built-in integration.
I think their SaaS offering has vault sharing for friends and family, which isn't available through iCloud Keychain.
They provide additional security audit features, such as vulnerability tracking. Quite relevant: I just opened the app and Watchtower had a vulnerability alert notifying me to update my password on Twitter.
It supports One-Time Password, which can occasionally be convenient.
Other kinds of item are supported as well, such as credit cards, bank accounts, software licenses, identities, and secure notes. No more having to grab for my wallet when I need to input my credit card or driver's license info. No more having to search for a checkbook to find my bank account number.
Not particularly, if you don't need the 1Password features and use Safari on macOS. The password generation is integrated into the browser UI which is arguably better for non-nerd and lazy-nerd users.
I use 1Password to store everything. Security answers so I can use completely bogus ones, social security numbers for family members, software license keys, membership info, etc.
iCloud Keychain is definitely well-integrated, but I've run into a few edge cases where it doesn't behave the way I need it to. In these cases, 1Password is better since it actually lets me dig in and edit some of the low-level details in a quality UI (versus digging a couple levels deep in system settings/Safari preferences to find/edit the password in question).
I'm skeptical whether it is really worth it to trust yet another party that can potentially be bribed by intelligence agencies and what not; or even hacked. No. I sit down once a year and think of CorrectHorseBatteryStaple-like passwords [1] for each important service, where each password is a relatively complex function (involving deletions, insertions, swaps, associations, numbers and special characters) of details of my life, the current year, the service in question and the username. That way I have a unique password for each service and I can easily reconstruct it based on that sort of easily recallable information.
How long have you done this, for how many sites, do you rotate passwords (when sites are breached, and/or on a schedule), and have you had to access sites in a mentally compromised state (distracted, sleep-deprived, post-concussion)?
Every once in a while I hear someone explain their system for this (and I used to use a simpler scheme), and I can think of arguments about why it won't work for long, but I'd be happy to update my internal monolog from actual evidence.
I've got about 1K passwords in 1Password. I generally rotate them when they're 1-3 years old, depending on the threat model, cost of compromise, and on what I'm using password review to procrastinate.
My question with these schemes is how do they deal with sites which have weird password requirements which don’t match the scheme.
Typically they don’t remind you when you are logging in that "we made you use 8-character passwords but you can’t use some special characters" or whatever. So you have to have some way of remembering what crazy password rules they had 3 years ago when you registered...
"We are sharing this information to help people make an informed decision about their account security. We didn’t have to, but believe it’s the right thing to do."
The "we didn't have to" is a little jarring given the scale of this.
I suspect that many more employees at Twitter have access to the logs than have access to a supercomputer and password hashes.
I know I wouldn't trust my password with the number of people that have easy access to logs at other large(ish) tech companies.
I really can't imagine why "we didn't have to" was included in that tweet, at all. What other flaps like this have occurred that exposed my creds or personal data to large numbers of employees, that they didn't have and didn't choose to tell us about?
More employees at virtually every major web company have access to instances (and thus instance memory) than have access to supercomputer clusters, too. Every mainstream popular web application is fed a constant high-volume feed of plaintext passwords, right there in memory (or, in typical TLS termination environments, on the wire) to be read by a persistent attacker.
That's true for nearly every single internet facing service, no? A compromise resulting in point-in-time access to traffic is a bit different than a bug that creates a persisted historical record of every single user who signed in for a period.
Maybe I miss the point behind this comparison? I guess I'd understand more if I thought the number of folks with node access and log access were in the same magnitude at Twitter, or if the TLS stack persisted data over time.
> Last year a contractor deleted the president’s account.
The fact that they undeleted it is strong evidence that he didn't have discretion in how he performed his job, and thus was actually an employee and not a contractor.
Indeed. I deleted my Twitter account recently, there was a message that data is retained for 30 days to facilitate un-deletion. I assume their internal process is the same.
So Twitter found out they had a bug that caused them to store passwords in one of their databases in plaintext. Their response is just a generic 'hey, maybe you want to change your password'.
Compare that to Github, who just yesterday went through the exact same problem, except they're requiring users to change their password. Their CTO didn't make some 'hey you should THANK US' claim.
Yup. Affected users were emailed 24 hours ago. Only affected people who initiated a password reset previously (I assume during a certain time frame?)
> During the course of regular auditing, GitHub discovered that a recently introduced bug exposed a small number of users’ passwords to our internal logging system, including yours. We have corrected this, but you'll need to reset your password to regain access to your account.
> GitHub stores user passwords with secure cryptographic hashes (bcrypt). However, this recently introduced bug resulted in our secure internal logs recording plaintext user passwords when users initiated a password reset. Rest assured, these passwords were not accessible to the public or other GitHub users at any time. Additionally, they were not accessible to the majority of GitHub staff and we have determined that it is very unlikely that any GitHub staff accessed these logs. GitHub does not intentionally store passwords in plaintext format. Instead, we use modern cryptographic methods to ensure passwords are stored securely in production. To note, GitHub has not been hacked or compromised in any way.
I think it's part of the problem of writing a statement to be read by a large number of people - it could be read as "There is nothing legally forcing us to do this, but we are doing it anyway as it is in the best interests of our users" (which is positive) OR it could be read as "We didn't have to do this; we did you a favour, but there might be situations where we don't disclose this" (which is fairly negative). FWIW, I think the former is the intent.
Personally, I'd probably just say "We felt that we had to disclose this to protect the interests of our users" (and not acknowledge that they might not, or that they had to make a decision to do so), or just say what they _are_ doing ("We are disclosing this in order to protect our users") and avoid the possibility that it is misinterpreted. I don't think there is anything to be gained by saying that they might not have done this.
Regardless, it seems very defensive. A company that looks after their users "because they choose to" is a lot more suspicious than one where customer care is simply assumed to be inherent to the operation.
Back in my younger, more naive days when I first started using a password manager and such a practice was not as widespread, I had the same question I think most other people have: "If my computer gets hacked or infected, won't the attacker instantly grab all the passwords from the manager?"
I chose to use a manager anyway, with the logic that if my system was compromised I was owned either way; memorizing passwords provided very little, if any, "softening" of the damage.
It's funny looking back on that fear now. Password managers have become more mainstream, and we, collectively as users, have learned something interesting. One of the assumptions of that hacking fear was that it was more likely for a user's machine to become compromised than it was for the service's the user uses to be compromised.
Well, as it turns out, the opposite was true and the fear was unfounded. The vast majority of password leaks we've seen over the past decade have been due not to malware but rather server compromises.
Perhaps that makes sense in retrospect. But it wasn't obvious a decade ago. Back then viruses and malware were rampant on the internet. In many ways they still are, but not like it was back then. Those were the days of pandemics like the Blaster worm. So it made sense to be more afraid of your own machine being compromised.
But the tides have turned. The landscape of OS security has improved dramatically. The single points of failure offered by servers are far more valuable now than the unwashed masses of networked user computers.
Just a funny retrospective that I thought I'd share.
My statement is based not on frequency of news, but rather on my understanding of the provenance of passwords in password lists. The vast majority of passwords in password lists are sourced from service hacks; not from user malware. And that makes sense. Malware events net maybe thousands of user passwords? On a good day maybe 100k. But hacks of major services like LinkedIn ... those yielded hundreds of millions of passwords. The two just don't compare in magnitude. Remember that HIBP's list is sitting at something like half a billion password combos right now.
One thing to note is that a remotely-compromised computer gets you access to the actual browser session and not solely to the password. In fact, most of the time the user has already logged in, and so the session is accessible but the password isn't (users without password managers have no persistent store, and password managers try to keep the secrets encrypted except when the master password has been recently typed).
So it's not often useful to take a password from an individual user and put it in a password list, but it is often useful to maintain persistence on their machine. Tools for doing that (RATs) are definitely common, both from random internet attackers and from e.g. angry exes with physical access to your device. But those aren't the attacks that e.g. HIBP is interested in, so if you're looking at data from people interested in password compromises, the effect is that you'll undercount client-side compromises.
At the height of gold farming, when hacking WoW accounts was very profitable, half our guild was individually hacked over a six-month period. Normally it's much harder to detect passwords being leaked, but it was really noticeable, and hardware/cellphone tokens made a huge difference.
So, I suspect the reverse is true, with standalone PCs being more likely to be compromised; what makes this less noticeable is that it's harder to automate extracting value from those hacked accounts beyond sending gmail spam etc.
PS: This is also why cryptocoin software on users machines is basically a non starter.
> Malware events net maybe thousands of user passwords? On a good day maybe 100k.
You are underestimating how much malware there is out there, both on desktop and mobile. There are probably botnets that consist of >100k compromised machines.
It amuses me to compare what a virus laden desktop pc looked like 10 years ago, vs what almost every major commercial/media website looks like now. They’re practically the same with all the ads and popups.
I wonder if it has to do with the changing motivations of malware authors. In the pre-internet and early internet days, it was mostly young folks doing it for shits and giggles, as a power trip. It wasn't targeted; they just wanted it widely distributed to make a big splash, and it often didn't communicate with the creator in any way.
Now, malware is a criminal industry. People are making big money off of it. It doesn't make a lot of sense to target individuals when you could go after big, wealthy institutions. Same goes for government malware--they're looking to infiltrate and disrupt rival institutions and industries, not inconvenience private citizens.
> Well, as it turns out, the opposite was true and the fear was unfounded. The vast majority of password leaks we've seen over the past decade have been due not to malware but rather server compromises.
That's absurd. I can count on one hand how many accounts I need to keep extremely secure. Accounts that will actually affect me if they're hacked:
- Work
- Bank
- Paypal
- Google
- AWS
All have different, secure passwords.
The remaining accounts that I care about, but that won't affect me much if they were hacked, use variations of a single password.
The accounts where I don't care use a simple password.
If any of the first group is hacked or leaked, well, I was already screwed I guess; the service was hacked... I just hope they take care of it.
If I get hacked, I just hope that I catch it before I log into one of these accounts, which, funnily enough, doesn't happen that much (except for work, but that's mostly at work so not my responsibility).
If any other password gets leaked? Well, not too bad; I request a new password and that's it.
> The single points of failure offered by servers are far more valuable now than the unwashed masses of networked user computers.
It's true up until it's no longer the case. So many people use cloud-backed password managers too... your point currently applies to them as well.
You see the big leaks, but you don't see the people who get their passwords stolen from their computers; it's not newsworthy.
The W3C, IETF, and other similarly clever folks really like security stuff and do lots of clever things to make us safer. So why couldn't we create an HTTP browser/server authentication method that has something closer to a nonce-based challenge/response mechanism? If it were standardized, the browsers could even do some clever hashing of some peer addresses or other things that we think should be static. All of the browsers could still present a "username/password" field that looks remarkably similar to existing forms but never reveals the plaintext password to the peer.
As long as I could type my password into other browsers on other computers and still get the same result, it seems like it's moving things forward. It would be opt-in, traditional username/password forms would continue to live on for decades to follow.
So what am I missing? Presumably something like this was considered and ruled out?
EDIT: yes, of course, I forgot about the existing HTTP authentication mechanism(s) in RFC 2069. I don't know why it never caught on but the fact that the browser uses modal dialogues for these means a significantly different context/user experience.
These challenge response mechanisms still require a shared secret. This means the server still needs to know either your password or a hashed version of it.
TLS covers the problems a challenge-response method is supposed to solve. That is, TLS prevents replay attacks because the shared secret is sent under encryption.
Really, the solution to exposing passwords to the endpoint is to do key-derivation client-side, with a server-provided salt.
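A minimal sketch of that last point in Python, assuming the salt is fetched from the server before login; the function name and iteration count are illustrative only:

    import hashlib

    def derive_login_token(password: str, server_salt: bytes) -> str:
        # Runs on the client; only the derived value ever leaves the machine.
        return hashlib.pbkdf2_hmac("sha256", password.encode(),
                                   server_salt, 200_000).hex()

    # Hypothetical salt delivered by the server alongside the login form.
    token = derive_login_token("hunter2", b"salt-from-server")

The server should still run bcrypt or similar over whatever it receives, since the derived token is now password-equivalent from its point of view.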
Mainly because the modal dialog can't be styled for security reasons, while the UI designers and marketing tools will want to style the dialog box like everyone else's.
In the end, user experience vs. security is a straight tradeoff.
Secondly, the digest authentication only supports MD5 anyway.
1. User provides username/password.
2. An exception occurs somewhere.
3. The stack trace from the exception is logged.
4. The stack trace includes the credentials.
5. The exception ends up in a ticketing system (Trac, JIRA, etc.)
6. Nobody notices for years.
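A toy Python version of steps 1 through 4, with made-up values, just to show how the credentials ride along inside the exception message:

    import logging

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("auth")

    def authenticate(form):
        # Hypothetical validation that blows up and embeds the whole form
        # -- password included -- in the exception message.
        raise ValueError(f"unexpected login payload: {form!r}")

    try:
        authenticate({"username": "alice", "password": "hunter2"})
    except Exception:
        # The logged traceback now contains the plaintext password, and from
        # here it gets copy-pasted into the ticketing system.
        log.exception("login failed")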
Or all the environment variables containing tons of passwords and tokens, or, in JavaScript libraries, all the headers, including cookies that could be used to take over a session. Error/exception tracking software is quite dangerous if it's written with a "let's store everything, because it could help find the problem" mindset.
When I was doing some consulting via RDP, I made sure to inform the client that I was about to expose things. "I am about to run strace on this web server instance. This is going to show what all the system calls are doing and how long they're taking. It is going to make the performance of this one instance very poor. It is possible that this will print private information like passwords and credit card numbers in clear text. Are you okay with this?"
I'm not sure they fully understood what they were acknowledging, but I heard back later that they were impressed with my professionalism.
These comments range from eliminating passwords (good luck) to "Congress should fix it with a law" (lol).
Having been in the identity space for a while and seen how various companies think about it, these kinds of problems will keep coming up... mostly at companies for whom identity is a commodity. These companies will always give account security the minimum requisite attention. No one at Twitter gets excited about working on the login form.
I know OpenID was a bust and everyone hates Facebook Connect, but as an industry we need to figure out how platforms that view account security as a necessary evil can vendor that to people who take it seriously. Trying legal avenues to get people to take it seriously or finding alternative methods is what we’ve been trying for the last 15 years and it hasn’t worked.
Security Keys are often used as a second factor, but the original vision was that they could also be a primary factor in lieu of a password. Humans are bad at passwords, but they carry around house keys every day. Why not also carry around computer keys on their keychain?
For the average person (not worried about a megacorp or nation state attacking them), TouchID is also a possible "password replacement" for the primary authentication factor, although it comes with the disadvantage of not being rotatable.
Alternatives to passwords have "kind of worked" with bitcoin and cryptocurrencies: when they became valuable and exchange breaches became costly, people started using 2FA (exchanges made it mandatory) and hardware wallets like Ledger en masse. We should be making these kinds of technologies cheaper and dumb-proof so that everybody can use them. Some alternatives proposed in the past, like OpenID, were almost comically complicated for Joe User.
At work, one of our most commonly used libraries prints its connection string (including the plaintext password, username, and database) in the log files at debug level (which I often see as the configured level).
When I pointed it out, they told me it was intentional, and that attackers wouldn't go to the log files anyway if they could access the system.
I gave up on the discussion at that point
There's another argument that might work in GP's case. A fair share of successful attacks were successful not because the attackers actually broke something on their well-guarded target. But because they used credentials obtained from other poorly defended systems.
fair point. my thoughts were that spending your time building $10 widgets and getting paid $5 by someone who is negligent with their use of people's passwords is akin to working for a company that pollutes public waters: some of your paycheck is "tainted" by the dangers you put others into, and you are smart enough to understand it. in that sense, quitting seems like a non-act, because you stop acting unethically.
I'm guessing you're referring to someone's ability to actually fix it -- in the case of logs, you can make a pretty simple regex to strip out all kinds of PII, and there really are a lot of arguments (e.g. proactively reducing cost of security audits -- if someone is reviewing your logs to figure out what happened, they might not want to see customer data).
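For instance, a bare-bones scrubber along these lines (the field names are just examples, and a real one would also need to handle form-encoded bodies):

    import re

    # Hypothetical list of sensitive JSON field names.
    SENSITIVE = re.compile(
        r'("(?:password|passwd|token|secret|api_key)"\s*:\s*")[^"]*(")',
        re.IGNORECASE,
    )

    def scrub(line: str) -> str:
        # Keep the key so the log still shows the field was present,
        # but blank out its value.
        return SENSITIVE.sub(r"\1[REDACTED]\2", line)

    print(scrub('{"username": "alice", "password": "hunter2"}'))
    # {"username": "alice", "password": "[REDACTED]"}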
Twitter's CTO had an odd tweet about this disclosure (emphasis mine):
> We are sharing this information to help people make an informed decision about their account security. We didn’t have to, but believe it’s the right thing to do.
Pretty much. Apparently github did something similar and now people are curious about the library/framework. Why is it so obvious that they both used the same library?
They don't say what the timeframe for this issue is. Have passwords been logged for the last 6 months? Last 3 years? Was this a bug found and fixed last year, and only now are they reporting it?
This seems like the sort of problem that typed programming can vastly reduce. In a language where string representation is controlled by the data type itself, one could protect against emitting passwords in plaintext accidentally. You'd still be vulnerable until deserialization happens, but that's a lot less surface area to worry about.
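A small Python sketch of that idea; the class and method names are invented for illustration:

    import logging

    class Password:
        """Holds a secret but never reveals it in string contexts."""
        __slots__ = ("_value",)

        def __init__(self, value: str) -> None:
            self._value = value

        def __repr__(self) -> str:   # what %s/%r formatting and debuggers see
            return "Password(<redacted>)"

        __str__ = __repr__

        def reveal(self) -> str:
            # The one greppable place where the plaintext escapes the type.
            return self._value

    logging.basicConfig(level=logging.INFO)
    logging.info("login attempt: %s", Password("hunter2"))  # logs Password(<redacted>)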
Twitter hasn't figured out how to properly handle passwords after over a decade of its existence? No, I'm not changing my password, I'm deleting my Twitter account for good.
I'm tired of big-shot Internet companies getting away with such blatant disregard for basic security and privacy rules.
It would be fine, but people who claim to be pros in software and are being paid a premium refuse to learn from mistakes, neither from their own nor from others'. They just mitigate the fallout by saying things like "It was a mistake, sorry about that, it happens, software is hard".
Brain surgery is hard. Mistakes happen. But after a few mistakes you probably should stop doing brain surgery altogether. At least the patients will get a higher chance to survive your surgeries by avoiding you.
In terms of security, handling passwords should be considered analogous to brain surgery. A single mistake undermines the whole thing. If you can't handle that, stop doing it, and let people do it who can handle it better.
I understand your frustration, but it's coming from a flawed argument. According to cancer.org, there is a 50% chance you survive brain tumor surgery. By that account, half the surgeons should stop working.
Imagine Twitter's traffic and how much code they are dealing with. This kind of mistake can and will happen. Someone was surely held accountable, which they do not need to disclose.
No, it doesn't. And that's not the issue. It's OK to make mistakes if you learn from them to prevent them from happening twice or thrice. But apparently, even when paid a premium, they don't. There were soooo many password DB hacks and cracks in recent years, I can't believe people still defend bad software engineering as if mistakes are a natural occurrence that cannot be prevented. You can't prevent singular mistakes, but you damn well can make them hard to exploit!
If you keep letting bad developers work in software security even after they repeatedly make the same mistakes (their own and others'), there won't be any security left. So what's the point?
Would you relax? If what Twitter says is true (and there's no reason to think it's not), these were passwords which were logged to plaintext logs, which only people internal to the company can read.
We're not talking about a massive password breach, a bunch of script kiddies who found a database of plaintext credit cards by going to /admin.php and logging in with "admin / admin", or anything like that. We're talking about a mistake Github themselves made (and if you think Github doesn't know what they're doing in terms of security, I question your judgement).
Furthermore, when was the last time there was a major security breach at Twitter? You're claiming they're "keeping" bad developers and not learning from their mistakes as if this were a regular occurrence for them.
And coming from me, I don't usually defend security breaches and malpractice. This doesn't really qualify. They made an official announcement, notified all users, even unaffected ones, both by email and on first login; that's more than you can ask them to do.
What bothers me about reactionary posts like yours is they give negative feedback to companies who actually do right by their breaches, which as is well known in the security field, is a matter of when, not if.
> It's 2018, no sane person would reuse their passwords across multiple sites which can and do get hacked.
All of my family does this :(. I think it's really common outside of tech savvy people. I've tried pushing them to use a password manager on their phone, or writing them down so they can use multiple passwords, but they'd rather just use a single one.
Years of bad advice didn't help either:
- Never write your password down
- Make sure you use at least one symbol!
- Make sure you use one capital letter!
Changed mine, just logged back in to this warning:
Keeping your account secure
When you set a password for your Twitter account, we use technology that masks it so no one at the company can see it. We recently identified a bug that stored passwords unmasked in an internal log. We have fixed the bug, and our investigation shows no indication of breach or misuse by anyone.
Out of an abundance of caution, we ask that you consider changing your password on all services where you’ve used this password. Learn more
> The glitch was related to Twitter’s use of a technology known as “hashing” that masks passwords as a user enters them by replacing them with numbers and letters, according to the blog.
Sigh. They appear to have confused hashing with asterisks.
> A bug caused the passwords to be written on an internal computer log before the hashing process was completed, the blog said.
So "related" in almost no way whatsoever, then? The state of technology reporting in the mainstream press really makes me despair sometimes.
> The state of technology reporting in the mainstream press really makes me despair sometimes.
The problem is that the mainstream press writes for mainstream users, and the state of science/technology understanding in the general public is the real problem.
I hear what you're saying, but I think — in this case — the article's just giving too much irrelevant information. I think most mainstream users I know would understand it if it were written something like:
> A bug caused passwords to be written on an internal computer log, the blog said.
I'm not sure many users would understand the issue from just that - a perfectly reasonable (albeit inaccurate) reaction to that would be "Of course Twitter store my passwords - otherwise how can I log in?".
Understand what, exactly? I don't believe most users understand that passwords aren't necessarily stored in plaintext (on a system log, or in a database).
> Sigh. They appear to have confused hashing with asterisks.
No they didn't. Or else the excerpt you quoted would have said literally that, i.e.
> The glitch was related to Twitter’s use of a technology known as “hashing” that masks passwords as a user enters them by replacing them with asterisks.
While the description of "hashing" is tortured, it seems to be technically correct, if you don't make specific assumptions about the phrase "that masks passwords as a user enters them by replacing them". But yes, the normal interpretation of the present tense would be that the characters are being hashed as the user types in the password, rather than after the form submission.
This is the phrasing from Twitter's actual blog post[0]:
> We mask passwords through a process called hashing using a function known as bcrypt, which replaces the actual password with a random set of numbers and letters that are stored in Twitter’s system.
Why isn't the password hashed and salted client-side, with the salt being sent from server to client?
I understand that hashing on the client side would only substitute the user's password for a new one. This alone sounds like a win to me, as it at least contains the damage somewhat in case of leakage, especially if salted.
EDIT: Sorry, I just found the exact same discussion in this thread.
It would be nice if they forced a password reset for all affected users and submitted the old passwords to the Pwned Passwords list.
I have trouble believing no passwords have been misused by insiders - sure, there wasn't any large-scale misuse but I am sure someone poaching a few passwords here and there for later mischief (once everything has settled down) would have gone unnoticed.
If the previous breaches are of any indication, it's that users don't give a shit - many major websites have leaked passwords (https://haveibeenpwned.com/PwnedWebsites) and they're still alive and kicking; for the ones that have gone down the drain (Yahoo!) it was more because the service itself faded into irrelevance.
Based on that I'd say it would be pretty safe to disclose a breach and reset all passwords; if your service is relevant your users will stay with you, and if not then not disclosing a breach will only buy you time before the inevitable happens anyway.
I suspect by "Losing active users", GP meant losing people who get confused or stuck throughout the password reset phase. Twitter is used by a lot of computer novices.
Twitter did disclose this, through email and on first login. Anyone they'd lose because of the breach is long gone and I also think it's probably next to nobody.
I once had configured a mail server to log passwords while I was figuring out how to set it up. Then, I turned on logwatch and started using it “for real”. Sometime later, while reading through the logs that logwatch was sending to gmail, I discovered that I had been emailing myself login credentials via plain-text emails.
Can't we reduce the risk of these accidental disclosures by using public/private key encryption? Ie:
- server publishes a public key on a fixed url
- your JS framework (react/angular/form handling/whatever) is modified so that, whenever you query the value of an input[type=password], it returns a value encrypted with the server's public key (as close to the reading of the field as possible, except for e.g. 'repeat password' handling)
- your server-side code decrypts the password as close to its 'VerifyPassword' and 'UpdatePassword' implementations as possible, and fails if it receives an unencrypted password.
This would make it very hard for intermediate steps (middleware, RPC/REST/JSON decoding) to accidentally spill the password, even if they fully log the traffic, and is something frameworks should be able to help you enforce.
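A Python sketch of that flow, assuming the third-party `cryptography` package; the key pair is generated inline just to keep the example self-contained, and the "client" step is simulated in the same process rather than in browser JS:

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import padding, rsa

    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)

    server_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_key = server_key.public_key()      # what the fixed URL would serve

    # "Client" side: encrypt as close to the password field read as possible.
    ciphertext = public_key.encrypt(b"hunter2", oaep)

    # Middleware, request logs, RPC layers etc. only ever see `ciphertext`;
    # decryption happens right next to the password-verification code.
    plaintext = server_key.decrypt(ciphertext, oaep)
    assert plaintext == b"hunter2"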
I don't understand why servers don't just store passwords as a public/private key scheme. A private key is algorithmically generated from a password, with the public key stored on the server. When logging in, the server sends a one-use-only 'challenge' as an encrypted blob to the person logging in. Locally, they use the algorithmic private key generated from their password to decrypt the blob and send the response back. And you're logged in.
Seems like a pretty simple system and absolutely nothing that has to be secured is stored server side. Is there some clever reason I'm missing that this system fails?
----
Even better, you can also salt the password->key scheme with something like the username, making table-based attacks infeasible.
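Something in that spirit can be sketched with a deterministic Ed25519 key, using signatures rather than decryption for the challenge step (the same proof of key possession); this assumes the `cryptography` package, and the scrypt parameters are only illustrative:

    import hashlib, os
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    def keypair_from_password(username: str, password: str):
        # Deterministic 32-byte seed; the username doubles as the salt
        # mentioned above, defeating precomputed tables.
        seed = hashlib.scrypt(password.encode(), salt=username.encode(),
                              n=2**14, r=8, p=1, dklen=32)
        private_key = Ed25519PrivateKey.from_private_bytes(seed)
        return private_key, private_key.public_key()

    # Registration: only the public key is ever sent to the server.
    priv, pub = keypair_from_password("alice", "hunter2")

    # Login: server issues a random challenge, client signs it, server verifies.
    challenge = os.urandom(32)
    signature = priv.sign(challenge)
    pub.verify(signature, challenge)   # raises InvalidSignature on a wrong key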
Naive question - wouldn't a lot of these issues be much less severe if passwords were (on top of server-side processing) salted and hashed client-side? Then, in principle, you couldn't do these cross-site attacks where people reuse their passwords.
I haven't received anything from github, but I use two-factor and oauth tokens, and I don't remember the last time I logged in, let alone changed my password.
Might just be a weird coincidence, but at 2:14 PM today I got an e-mail from Twitter with the subject "Security alert: new or unusual Twitter login". I hadn't heard anything about the Twitter bug at that point (was it even public at 2:14?) but I changed my password as a response to that e-mail.
Now I'm wondering if I need to change it again... does anyone know what time the bug was patched?
I think asymmetric encryption would solve all problems. The sever only ever stores the users public key and uses challenge-response to prevent replay attacks.
This is a bad idea, but if your users are required to have passwords, you could use their password to seed an elliptic-curve private key, creating the user key on the fly.
So you go to a site, you enter your password, and then JavaScript creates a private key and public key based on the userid and password (which is a unique tuple). The key will always be the same for a given userid and password, regardless of the machine.
When you first set your password, it sends the public key to the server, which is then stored. If that key was intercepted, it's not a problem.
Then the server challenges by sending a random code, which the client encodes using the private key and sends back; the server then decodes it with the public key. That guarantees that the private key is known by the client, and thus the password is known, but the private key never leaves the machine.
The second time you go to the site, you get the challenge, and respond. Neither private nor public key is transferred.
I don't think there's a solution that "solves all problems". What if you want to log in using a dumb terminal or not execute untrusted code on your host?
No amount of entropy will protect against plain security goofs. "It would take a computer about 115 OCTILLION YEARS to crack your password" some analysis tells me. Guess how many engineers it takes to downgrade any password's strength to "123456".
I have two-factor authentication with Twitter and wherever else I can using authy. Isn't that enough for me to not change my password? It truly gets bothersome. Maybe I'll need to break down one day and finally get a password manager. :(
Sounds like you don't value twitter that much, so it wouldn't be the end of the world if it did. Unless you reuse that password for anything, but I'm sure that's not the case. You could always delete the account if you're no longer using it, although your use of the word "actively" makes me think you might be using it for oauth or keeping it for other purposes.
If this happens in a month's time, will they have to pay out $96,000,000 via the GDPR? This came up at my place of work the other day: are data leaks as a result of bugs breaches of the act?
IANAL: No, there is a process of stuff before you will get fined. A human error does not suddenly lead to heavy fines if you can prove that you did everything you could to prevent it from happening, i.e. followed all the processes the GDPR defines, like data processing agreements with third-party processors, technical and organizational measures to protect against privacy violations, documentation of personal data stored with the type of processing, type of consent, and so on. What example would a data privacy agency set by fining a company that is fully compliant? I imagine you could even get a court to side with you on not having to pay a fine.
Data leaks, of course, are always a breach and must be reported. That doesn't mean you get a fine, though.
Specifically, you'd look to Article 83[1]. It being an unintentional act (83(2)(b)), otherwise following best practices (83(2)(d)), taking steps to mitigate the damage (83(2)(c)), and that Twitter announced the breach (83(2)(h)) weigh heavily in favour of the fine being minimal, or there being no fine at all. Article 83(1) also notes that fines must be "proportionate and dissuasive" -- there is nothing proportionate about imposing the maximum fine for a simple error, nor dissuasive about fining a company that is otherwise compliant and following best practices.
It would be highly unlikely that any action would be taken at all. The issue was handled responsibly and was unintentional. Fining companies who disclose problems in this manner would just cause others to try and cover them up, which would go against the intent of the GDPR (protecting users).
The announcement is underplaying the situation. Absolutely change the password... anyone could have seen it. Just because there is no evidence of log reading does not mean anything.
The partial answer to this is to dump detailed logs after a few days or weeks. That way, even if disaster strikes like this, it can be somewhat contained.
If you hash the password on the client, the hash made on the client is now the password. It's easy enough to send the hash directly, and the server is none the wiser.
That's not to say it's a bad idea, though. I have used client-side hashing in the past to allow passwords of arbitrary length while using finite network resources.
That's not true if you ask the client to hash both the password and a one-time token you just send.
Knowing both the password/key (which was sent during registration) and the token, you can calculate the hash. Otherwise you can't. The password/key is never transmitted.
You can also use asymmetric encryption. I always cringe at amateur devs sending plaintext.
It does. You can do the same with the salted hash of the password rather than the password the user entered. It's still worse than real authentication protocols where you don't store the shared secret at all, but better than the DIY schemes described earlier.
Also, you can't get away from sending the shared secret during registration. Even with asymmetric cryptography, you have to rely on TLS to make sure you exchange the real, untampered public keys.
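A compact sketch of the token scheme described above (the server stores only a salted derivative of the password, and the client proves knowledge of that same derivative by MACing a fresh nonce with it); the parameters are illustrative, and as noted, the stored value is still password-equivalent to anyone who steals the database:

    import hashlib, hmac, os

    def derive(password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)

    # Registration: server keeps only the salt and the salted derivative.
    salt = os.urandom(16)
    stored = derive("hunter2", salt)

    # Login: server sends (salt, nonce); the client recomputes the derivative
    # and MACs the nonce with it, so neither the password nor the stored
    # value crosses the wire again.
    nonce = os.urandom(16)
    client_proof = hmac.new(derive("hunter2", salt), nonce, "sha256").digest()
    server_proof = hmac.new(stored, nonce, "sha256").digest()
    assert hmac.compare_digest(client_proof, server_proof)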
We need a regulatory rulebook, codified in law by Congress, that fines companies that make these "mistakes". Enough of a fine will force companies to take these "mistakes" seriously.
In Yahoo's case, that might have forced Marissa to actually keep a cybersecurity team and not cut it when she knew the systems were in danger of being compromised. We aren't getting any jail time, but hefty fines that don't stifle growth, that punish negligence and carelessness, that are codified, and that don't need long court hearings to pass are a must.
Why? I have a small app with a few thousand users that generates almost no money but contains sensitive data - if I were to be fined because of a leak, I would be dead financially. Where do you draw the line between the companies that should be fined and those that shouldn't?
No matter how advanced our technology is or what security measures we take, any system connected to the internet will somehow have a leak or an intrusion or something that compromises security.
Well, we need to be realistic about risk, security, and responsibility. And how legislation can have unintended consequences that work against our higher goals.
For example, a common refrain on HN is how centralized the internet is becoming. Do punitive damages for mistakes prevent mistakes? How likely is it to create an environment where the only organizations that exist are those that can afford mistakes? Is that worth it, especially in the context of some Twitter passwords that were logged internally?
Also, one of the most important awakenings that need to happen in light of recent events is the personal responsibility of who you share your information with. It's just as important as the question of what an organization does with your information.
If there were no trade-offs, then we could fix everything with legislation.
I'm not arguing for a fine, but companies should be legally obligated to tell their users when the personal data of their users might have been compromised.
For example, I can imagine kneejerk legislation that would simply ensure that next time this happens, Twitter just keeps their lips zipped. Not exactly an improvement.
Can you pitch an example of what this legislation would actually look like? And how it would differentiate between something like the Experian leak and some (oh no) Twitter passwords getting logged internally.
I agree. But I'm not sure this sentiment applies here; Twitter probably has an amazing security team, and it's clear that they were using good practices. They found a bug and disclosed the issue to their users.
Those are great examples of regulations that actually benefit the user. When the regulation however addresses the wrong problem, then it becomes burdensome and dumb. If the user uses a simplistic password, his account will be hacked even if it's never leaked. Studies show that bad passwords and phishing are more destructive than leaks.
... because Yahoo had an interest in leaking passwords?
Security is expensive, breaches have to be made more expensive. That's the difference between a company's management thinking about security as a checkbox they have to fill as opposed to an ongoing investment meant to reduce risk.
It's really hard to avoid logging passwords when there's a parse error on structured data that includes a password, for example if there's malformed JSON in the authentication HTTP request. A weird error like that is exactly when verbose logs are essential, but it's also a time when your system doesn't know what it's logging. Amazon has a machine-learning based service to try to deal with this.
How exactly does one manage in a major company to set up an app in production that spits passwords, or even less sensitive customer data, to a logfile?
Is that the state of infrastructure engineering? The cynic in me wonders if it's related to the trend to take developers with some systems knowledge and turn them into Ops/SA's.
To defend against what? That some guy at Twitter who saw the logs can log in and change somebody's status?
Anyway, I guess you're right, I simplified. It is kind of a valid point, although if I were this paranoid then I would never leave the house. I just wanted to say that if the web hadn't been built on a giant piece of s*, for example password authentication, none of this would have been necessary.
The "right" way to handle this is to revoke access to all logged-in devices, revoke all previous passwords, send an email notifying users of the revocations, and force a password change upon next login. But Twitter won't do that because a non-trivial percentage of their user base would never jump through the hoops to get things up and running again and Twitter's MAU numbers would sink.
Over-zealous developers who think it's appropriate to log all function calls with parameters at trace level, or a framework with the same opinion that automatically applies such tracing and logging over the whole code-base.
No one notices, because no one uses trace-level logging, until one day another developer is tearing their hair out because they can't reproduce a bug that is only occurring on live. It's an urgent bug that needs resolution asap. So this developer turns on the trace-level logging and eventually finds and resolves their bug.
Being the careful person they are, they turn off the logging and go away happy.
Meanwhile they've unknowingly produced a few gigabytes of log outputs which happen to include plaintext passwords.
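A sketch of the kind of trace decorator involved, with made-up names; the redaction of known-sensitive keyword arguments is the part that tends to be forgotten:

    import functools, logging

    log = logging.getLogger("trace")
    SENSITIVE_ARGS = {"password", "new_password", "token"}   # hypothetical list

    def trace(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            shown = {k: ("<redacted>" if k in SENSITIVE_ARGS else v)
                     for k, v in kwargs.items()}
            log.debug("%s(args=%r, kwargs=%r)", func.__name__, args, shown)
            return func(*args, **kwargs)
        return wrapper

    @trace
    def login(username, *, password):
        ...   # positional args are logged raw, so a password passed positionally elsewhere would still leak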
That's just one of many different scenarios where people acting in 'good faith' can still lead to bad outcomes. That is why a "PUNISH THEM!" attitude to this kind of incident is not helpful.
HTTPS form submissions should be encrypted while the data travels between the user's computer and the server, but the server will still need to decrypt them to perform the hashing. It's possible, and probably even common, for inexperienced or forgetful developers to add request logging for debugging or diagnosing service outages without adding extra logic to scrub sensitive fields.
seriously? pretty easily. somebody probably left a debug log message in place or something. guaranteed that this happens all the time and most people don't report it.
I doubt anyone left something that logged the plaintext password. No reasonable architecture necessitates holding onto a plaintext password for more than one line of code.
One possibility is an HTTP server on the request path after TLS termination. But then why is an HTTP server logging the request body?
My guess would be some sort of instrumentation process was blindly reading data in memory without distinguishing what the data was, but produced logs that incidentally included passwords.
In my experience, I've seen both of the following scenarios:
POST request comes in from the client. Full URL and request body is logged. Sometimes for simply troubleshooting, sometimes for security reasons (e.g., wanting to know all data coming in so that it's possible to identify security holes after they've been exploited).
POST request comes in from client. Frontend server makes a GET request to a backend server, and the password ends up in the standard request logs. In one case, I've seen this happen because the developer thought path variables were cool, so every API they wrote looked like /a/b/c/d/e. Sigh.
As developer, I can tell you this happens more often than I'd like to admit.
Debug logs are that necessary evil you need to troubleshoot pesky bugs. Unfortunately, some of these debug tools need to be turned on in a live environment to capture those logs for debugging. But also unfortunately, we are human, and we concentrate on fixing the bug and forget to turn off logging, or we log unnecessary data.
Indeed. This is probably a good reminder for every developer to just go and check through their logs to see what is there. It can be quite a shock sometimes to find how much can get dumped there..
I'm curious if anyone has details on using bcrypt/scrypt at scale. Specifically one way I could see this happening is something like login requests go to a load balancer that puts the requests on a queue to be picked up and validated by some hasher service, and the queue ends up writing the requests to logs to recover from certain kinds of failures.
From the password compromise side, not really, you're just pushing the cost of hashing to your users (and it will impact mobile users more). There's a similar technique on needing proof-of-work on the client to combat DDoS.
From the authorization side, there is a threat, because if your table storing hashes is compromised, attackers just have to supply the stored hash to the auth endpoint and they get to login as anyone.
A combination of hashing on the client side (or immediately once the pw hits the endpoint) with something cheaper followed by a more intense bcrypt/scrypt afterwards might help a bit with the tradeoffs.
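A rough sketch of that combination, assuming the third-party `bcrypt` package; the cheap step could equally run in the browser:

    import hashlib
    import bcrypt

    def prehash(password: str) -> bytes:
        # Cheap step, runnable at the client or at the very edge of the endpoint;
        # hex keeps the input ASCII and under bcrypt's 72-byte limit.
        return hashlib.sha256(password.encode()).hexdigest().encode()

    # Registration
    stored = bcrypt.hashpw(prehash("hunter2"), bcrypt.gensalt())

    # Login: anything logged between the edge and this point is the digest,
    # not the plaintext password.
    assert bcrypt.checkpw(prehash("hunter2"), stored)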
Does it buy you anything at that point? A server-side issue, such as this, would still log the thing you need to log in, and a client-side issue would just intercept the hashed form or could derive the hashing mechanism from analysing the client.
At most, it would seem to prevent weak passwords from being passed directly to bcrypt, but salting should solve that in a similar way anyway, and anyone brute-forcing a copy of the database can incorporate the same weak hashing logic.
Another comment in this thread mentioned an idea of seeding the hash with a quickly expiring nonce fetched from the server. I think that’s a quite clever approach, similar to CSRF tokens in a sense.
That would effectively create a one time “password” for transmission from browser to database. In a case like this one, where sensitive text transmitted from the client leaked into logs, it would be a non-issue. The sensitive string in the logs is a temporary hash that would be useless shortly after discovery, since it was derived from an expired nonce.
It effectively becomes a real time scrubbing system with 100% coverage, because the passwords are “scrubbed” by design, and do not depend on explicit detection code in some scrubbing mechanism.
That's exactly what it is: incompetence, rank incompetence. Something like nine out of ten people getting paid today as professional software "engineers" should be let go. Dr. Margaret Hamilton figured out most of what we need to do to develop reliable software during and after the Apollo 11 mission. She coined the term "software engineering". Unfortunately, her work suffered from bad languaging and languished.
You'll notice you've been downvoted to hell and the comments in reply to yours are apologists and excuses. Not a coincidence.
> Due to a bug, passwords were written to an internal log before completing the hashing process
This isn't a bug, it's incompetence. Remember folks, never, ever use the same password for two different websites. Assume all websites store your password in plain text even if they hash it at some point. With logging and tracking on steroids these days, I expect "bugs" like this to be commonplace.
This smells like an attempt by Twitter to get as many users as possible to log in to Twitter so they can report XXXX% increase in monthly active users. With the added bonus of perhaps converting some who gave up on Twitter back into actual active users going forward.
"Due to a bug, passwords were written to an internal log before completing the hashing process. We found this error ourselves, removed the passwords, and are implementing plans to prevent this bug from happening again."
Exact same thing that github did just recently.