Slack was hacked (slackhq.com)
857 points by trustfundbaby on March 27, 2015 | 497 comments

I hate to be the negative guy (and they were hashing passwords better than 90% of sites), but it would be SO easy to completely neutralize password leakage when the attacker has access only to the database.


tl;dr: Hardcode a second salt in your application code or in an environment variable. Then a database dump alone is no longer enough to mount any kind of brute-force attack.

It's simple, free and you can retroactively apply it.
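Concretely, the proposal looks something like this (a minimal Python sketch; the variable names and the `PEPPER` env-var name are illustrative, and the rest of the thread debates whether this exact construction is safe):

```python
import hashlib
import os

# Per-user random salt, stored in the DB next to the hash.
salt = os.urandom(16)

# Site-wide secret "pepper" kept out of the database, e.g. in an
# environment variable on the app server (name is illustrative).
pepper = os.environ.get("PEPPER", "example-pepper-value").encode()

# Mix the pepper into the salt, so a DB dump alone can't be brute-forced.
hashed = hashlib.scrypt(b"hunter2", salt=salt + pepper, n=2**14, r=8, p=1)
```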

EDIT: I addressed some of the points raised in this thread here https://blog.filippo.io/salt-and-pepper/#editedtoaddanoteonr...

ircmaxell makes a good point in that comment -- not that the concept of a pepper doesn't work, but that a two-way encryption function is a better choice than a hash function for applying the pepper.

Your pepper will be a long, random key that is known to your app server but not your database server. If you store passwords as:

    bcrypt(bcrypt(password, salt), pepper)
then you spend a lot of cycles bcrypting your long, random key, which is pointless (it should already be long enough to be un-brute-forceable), and you lose the ability to rotate keys or to ever stop using this scheme in the future. There's also (in theory) some risk that nested bcrypt() doesn't behave predictably.

If you store passwords instead as:

    encrypt(bcrypt(password, salt), pepper)
then you can routinely rotate peppers through a simple one-time decrypt-and-encrypt step on your app server, instead of endlessly nesting peppers as suggested in the filippo.io blog post.
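A sketch of that second scheme in Python (assuming the third-party `cryptography` package for the symmetric layer, with `hashlib.scrypt` standing in for bcrypt; all names are illustrative):

```python
import hashlib
import os

from cryptography.fernet import Fernet  # third-party: pip install cryptography

# Slow, salted per-user hash first (scrypt here; the comment uses bcrypt).
salt = os.urandom(16)
digest = hashlib.scrypt(b"hunter2", salt=salt, n=2**14, r=8, p=1)

# Then encrypt the digest with the pepper key known only to the app server.
pepper = Fernet.generate_key()
stored = Fernet(pepper).encrypt(digest)

# Rotating the pepper is a one-time decrypt-and-re-encrypt, with no extra
# layers of hashing piling up over the years.
new_pepper = Fernet.generate_key()
stored = Fernet(new_pepper).encrypt(Fernet(pepper).decrypt(stored))

assert Fernet(new_pepper).decrypt(stored) == digest
```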

The rest of it is more of a qualitative question -- what's the risk that someone gains access to your DB but not your app server, vs. the risk that in implementing the pepper, you somehow screw up and store something easily crackable?

Once you use encryption, it's no longer a pepper; it's encryption with weak key storage (a hard-coded key). At that point, genuinely, why not just follow best practices and store the key securely (e.g. using an HSM)?

Certainly storing your key in an HSM is superior and protects against a slightly larger class of attacks, but in terms of getting the same protection salt+pepper was designed for (limiting the damage of a DB-only compromise), encryption, even with weak key storage, is superior. Adding an HSM is just one way to improve that even further (and such improvements aren't really feasible with the salt+pepper idea).

Because nobody knows how to do that and it is likely extremely expensive by every metric.

And, no, I am not being facetious by saying nobody knows how to do that. I am being quite literal. Have you ever done that? Do you know how? Do you even know what you would google to figure out how?

I've yet to see my library of choice's documentation cover using an HSM. How do you do that in e.g. PHP with MySQL? MVC with MS SQL? Java with Oracle?

It's worth it. And yes, I have done that; it's not a mystery. They all provide an API/ABI for interacting with them[1], such as OpenSSL's Engine API or PKCS#11[2].

Edit: There's even a project to literally do this directly from PHP[3]

[1] http://stackoverflow.com/questions/10796485/interfacing-with...

[2] http://en.wikipedia.org/wiki/PKCS_11

[3] http://stackoverflow.com/questions/3231293/how-to-interface-...

Eh, I had a Luna CA3 sitting on my desk for a while. HSMs aren't that exotic. I'd be more concerned about the HA aspects which could require extra code. I suppose you could just shove a USB HSM in several of your servers and encrypt every password with at least 2 of them.

Christ, you weren't kidding about it being expensive. Amazon's CloudHSM is $5k upfront and $1.80/hr thereafter, which is completely infeasible for the agencies I've worked at. A shame; I'd love to do the right thing there :(

That's why AWS also started offering Key Management Service: https://aws.amazon.com/kms/

You don't get your own HSM but it's MUCH cheaper ($1/key/month) and more scalable and available than an HSM.

That's brilliant. I'm definitely going to use that, thanks for posting it!

I think AWS provides both Key Management Service and CloudHSM.

I have not used either myself, but I would imagine the documentation is quite good.

I've done it using a $100 Cryptostick (Now Nitrokey). Here's a guide from Mozilla: https://blog.mozilla.org/security/2013/02/13/using-cryptosti...


Joining the chorus; I've done it. Expensive on some levels, but very cheap on others. Dealing with keys as physical objects is a lot less stressful.

> what's the risk that someone gains access to your DB but not your app server, vs. the risk that in implementing the pepper, you somehow screw up and store something easily crackable?

The question is irrelevant anyway, because if someone gets access to your app server, they get the secret in both cases (pepper and encryption). So the first risk there is present in both cases, which just makes this a static "what's the risk that you screw up in implementing the pepper?"

The question is not irrelevant. It's blindingly obvious that salt+pepper protects against DB-only attacks. Injection that does a DB dump, or vulnerabilities in the DB server that don't exist in the app.

So yes, S+P protection won't save you if the app server is completely compromised, but it does protect you when only the DB is.

And the ability to S+P seems pretty simple to implement and document. Why is everyone panicking that it's hard to do?

I've already seen one person in this thread taking "S+P" literally, and suggesting `bcrypt(cost, salt + pepper, password)`, which has a very obvious and critical problem. Other "peppering" schemes in the comments here also have significant weaknesses. So yes, even a simple idea like this is easy to botch in implementation -- even if the implementation is just a one-line change!

What is the problem? Does bcrypt truncate "salt+pepper"?

Because the salt is in the hash!


$2a means it's using bcrypt version 2a

$12 means 12 rounds

GhvMmNVjRW29ulnudl.Lbu is the salt

AnUtN/LRfe1JsBm1Xu6LE3059z5Tr8m is the checksum
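Putting those components back together, the stored string can be split mechanically (a quick Python illustration, using an example hash assembled from the parts listed above):

```python
# Example bcrypt output in modular-crypt format, assembled from the
# version, cost, salt, and checksum components listed above.
h = "$2a$12$GhvMmNVjRW29ulnudl.LbuAnUtN/LRfe1JsBm1Xu6LE3059z5Tr8m"

_, version, cost, rest = h.split("$")
salt, checksum = rest[:22], rest[22:]  # the salt is always 22 base64 chars

assert version == "2a" and cost == "12"
assert salt == "GhvMmNVjRW29ulnudl.Lbu"
```

This is exactly why concatenating a pepper into the salt leaks it: the "salt" field travels in the clear inside every stored hash.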

I read your question as comparing the originally-proposed pepper approach vs the encryption approach, and I said the question was irrelevant because both approaches have the identical risk that the app server is compromised, but only the originally-proposed pepper approach has the risk that it's not actually secure (e.g. because naïvely composing hash operations may be bad, and adding static bits to passwords may also be bad).

But upon reflection I believe you were instead using the term "pepper" to include the encryption approach as well and merely trying to question whether the added security of requiring an app server compromise is worth the risk that you still screw it up somehow. And to that I'd say that it's not difficult to apply an existing block cipher algorithm when storing/retrieving password hashes so I think the risk there is low.

> Why is everyone panicking that it's hard to do?

Everybody's panicking because the originally proposed pepper implementation is a really bad idea. That approach has not been researched for security implications, and there are many reasons to believe that composing hash operations without using a specially-defined operation like HMAC is bad, and adding static bits to the salt or password (e.g. if you use 'pepper.salt' for the scrypt salt or 'pepper.password' for the password) is also bad.

However, I believe the approach of using a block cipher to encrypt your hashes with an app-wide password is reasonable. It's not composing operations badly (encrypting a 256-bit string, or whatever scrypt emits, is perfectly reasonable given a secure key) or otherwise providing an attack vector on the hash algorithm.

The biggest risk I can see with this approach is you have to make sure the pepper is stored securely on your app server, never visible to the database server (and never accidentally committed to an open-source repo, if you're using an open-source server). Not just that, but you have to make sure that you don't accidentally lose it either (or you'll have instantly lost all your accounts). But this is a solvable problem.

Because it's harder to implement and easier to mess up than using standard encryption, and has unknown characteristics due to the fact that it's rolling your own crypto. Why would you spend the time and effort to implement it when you can do something that is objectively better in every way?

You never know. Some developers might not be smart enough to use an SSL connection from their app server to their DB server. The influx of newb devs out there is growing, and security is usually not high on the learning list or on new devs' priority lists. If so, hackers could possibly get access to the DB without even needing the app server in the first place. Get your certs, folks!

I don't see how this is relevant. There is no reason to believe that the hacker needs to get at your app server in order to access your DB server in the first place.

That sounds like fuzzy scare-mongering to me.

1) You should not invent your own algorithm. That's a given. That's why you use bcrypt/scrypt.

2) It's not abusing the algorithm, it's using a longer salt (in the concatenation case).

3) There's nothing wrong with nesting algorithms (just remember to use hex/base64 encodings, not binary). For example Facebook passes passwords through half a dozen algorithms. They call it the "onion". And it includes a pepper.

4) As for being effective, I think the SQL injection case speaks for itself.

5) As for rotation - just don't do it. Your pepper gets compromised? Who cares; add a new one on top of the old one.

Also, I'm confused at how the proposed alternative would be harder to get wrong:

> Encrypt The Output Hash Prior To Storage

Correct me if I'm wrong, but an algorithm is a set of steps, which is what this looks like to me:

    salt = urandom(16)
    pepper = "oFMLjbFr2Bb3XR)aKKst@kBF}tHD9q"   # or getenv('PEPPER')
    hashed_password = scrypt(password, salt + pepper)
    store(hashed_password, salt)
That is an algorithm, which composes scrypt with a pepper.

The idea of forgoing key rotation alone is insane, but let's just focus on your last point:

> Also, I'm confused at how the proposed alternative would be harder to get wrong
Really? AES literally has hardware support, can be done in a single call, and has been studied for years. How can that reasonably be considered "harder to get wrong" than something proposed by some random guy on the interwebs?

Outside of peer review, what reason would anyone have to use the pepper scheme? As others have posted, there are several community members whose opinions do matter, due to their extensive research and body of work.

Your second point might be dangerous - your salt values are no longer random but heavily biased and knowing that all salt values share some common bits might provide a new attack vector.

Is this true? I would think that static bits are no more dangerous than not having the bits at all.

Here is for example an attack recovering a 384 bit ECDSA key [1] by knowing the five least significant bits of the nonce (obtained by a side channel attack) for 4000 signatures. Now hashes and signatures are obviously very different things but I would not bet on the fact that a bias in the salt does not matter.

[1] https://eprint.iacr.org/2013/346.pdf

ECDSA is a completely different beast. I'm not aware of a modern password hashing function that would be broken if it was given a non-uniformly random salt. If such function were to be submitted to PHC (https://password-hashing.net/), I would consider it to be a disqualifying factor.

For password hashing purposes, salt doesn't need to be uniformly random, the only requirement for salt is to be unique and unpredictable to the attacker (see http://crypto.stackexchange.com/questions/6119/hashing-passw...). Most password hashing functions use a cryptographic hash on salt.

This particular function, scrypt, uses one-round PBKDF2-HMAC-SHA256 to mix password and salt; PBKDF2, in turn, essentially feeds the salt into SHA256 via HMAC.
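That mixing step can be reproduced with the standard library (illustrative only; scrypt performs this internally before its memory-hard phase, with an output length determined by its r and p parameters):

```python
import hashlib

# scrypt's pre-hash: one round of PBKDF2-HMAC-SHA256 over password and
# salt, producing the initial working buffer for the memory-hard mixing.
block = hashlib.pbkdf2_hmac("sha256", b"password", b"per-user-salt", 1, dklen=128)

assert len(block) == 128
```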


If there was a known attack it would obviously be a non-starter. But why take the risk that someone will come up with something comparable, in the broadest sense, to, say, the differential attack on MD5, allowing known bits in the salt to be exploited, when there is no need to?

Interesting. Thanks for the link!

> 2) It's not abusing the algorithm, it's using a longer salt (in the concatenation case).

But PBKDFs like bcrypt and scrypt are not designed to keep the salt parameter secret; in fact they assume the attacker knows the salt. And so if they happen to reveal the salt to the attacker, this is not considered a bug in the algorithm and won't have been flagged or fixed by cryptographers.

(And more importantly in practice, the implementations of these algorithms aren't designed to keep the salts secret.)

The "concatenation case" completely defeats the purpose of using a pepper and leads me to believe that you're not qualified to be giving this kind of advice.

In what way it defeats the purpose of using a pepper?

Why combine hashes? Just use XOR. If bcrypt can protect "not_my_password", it can certainly protect "KKQ{H]zTDWVSJVA", which is the former XOR'd by "%$". XOR doesn't decrease the keyspace (or change it in any interesting way), so any attack on XOR is an attack on bcrypt; XOR is fast enough to evade any sort of timing attacks, too.

Careful. If an attacker is ever able to observe the output for a very short password - one byte, say - then only one byte of your secret salt is used, and the attacker could begin brute-forcing the secret salt starting with the first byte.

XOR could also result in NULL bytes anywhere in the hash input, which could drastically weaken passwords. For example, bcrypt ignores any password characters after the first NULL byte. This is especially bad if the attacker can supply their own passwords, and doubly so if they can observe the output, since they can then easily brute-force individual bytes of the secret and use that knowledge to intentionally create NULL bytes in the hash input.

> XOR doesn't decrease the keyspace (or change it in any interesting way), so any attack on XOR is an attack on bcrypt

I wouldn't make any statement like this unless you've actually gone through the steps to prove it.

> Careful. If an attacker is ever able to observe the output for a very short password - one byte, say - then only one byte of your secret salt is used, and the attacker could begin brute-forcing the secret salt starting with the first byte.

Can you be kind enough to explain with a very simple example?

I would appreciate understanding your point - Thank you

Using the scheme above, a password of "a" would result in a hash H('a' ^ x) where x is the first byte of the secret salt. The attacker can simply test all possible values of x (there are only 256) to determine the value of that byte. Knowing the first byte, the attacker can then look at a two-byte password and repeat the process to brute-force the second byte.

(It's also possible to brute-force more than one byte at a time; it would just take longer. For example if the shortest password the attacker can observe is 6 bytes then they would need to try 2^48 possibilities.)
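The byte-at-a-time attack is easy to demonstrate (a Python sketch; SHA-256 stands in for bcrypt purely to keep the demo fast, and the repeating-key XOR mirrors the scheme under discussion):

```python
import hashlib
import itertools
import os

def stored_hash(password: bytes, secret: bytes) -> bytes:
    # XOR the password with the repeating secret, then hash the result.
    mixed = bytes(p ^ s for p, s in zip(password, itertools.cycle(secret)))
    return hashlib.sha256(mixed).digest()

secret = os.urandom(16)

# The attacker observes the stored hash for a known one-byte password...
observed = stored_hash(b"a", secret)

# ...and recovers the first secret byte in at most 256 guesses.
recovered = next(
    x for x in range(256)
    if hashlib.sha256(bytes([ord("a") ^ x])).digest() == observed
)
assert recovered == secret[0]
```

Each additional secret byte falls the same way once a password one byte longer is observed.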

That's a rather comprehensive answer from @ircmaxell over there, hm. Interesting, thanks for sharing.

So the downsides are "it's not maintainable" and "don't roll your own crypto". I think they are negligible compared to the upsides.

Not at all. "don't roll your own crypto" is a downside that can lead to things completely falling apart or weakening the system overall.

The real downside is that there's a better, proven way to do the same effective thing, which is make a database-only compromise require additional work, without rolling your own crypto. It also supports doing things retroactively for real (not some of the hacks being discussed in this thread) and key-rotation. All the upsides, with none of the downsides.

> "don't roll your own crypto". I think they are negligible

Please do not ever consider "rolling your own crypto" a walk in the park. Unless you have a serious security background and some actual cryptography education and research never, never, NEVER do this. It is not negligible, and it is not safe.

The fact that the pepper can't be changed/rotated far outweighs any upsides

You can change your pepper by double-peppering your existing password database:

  scrypt(scrypt(scrypt(scrypt(password, salt), pepper2013), pepper2014), pepper2015)

Sounds like a maintenance nightmare. You'd need to be very careful when partitioning or restoring databases, migrating/splitting to new apps, etc.

You consider "not maintainable" to be an acceptable downside to _anything_ you're doing with your software? What?

I wouldn't say negligible, but rather not slam-dunk convincing

Is there any significant evidence that peppering passwords helps? I've seen arguments for and against peppering out on the big bad internet. Everyone has opinions but there are few people's opinions about crypto that I actually trust.

The best article I've seen against this technique is by ircmaxell [0]. Nicely summed up in this sentence "It is far better to use standard, proven algorithms then to create your own to incorporate a pepper."

Anyone have source material (academic paper, Bruce "The Crypto God" Schneier blog post) that sheds some light on peppering passwords?

I'd be much more interested in how many iterations of bcrypt Slack was using. That has a much bigger bearing on events for me. Anyone at Slack know/want to answer that question?

[0] http://blog.ircmaxell.com/2012/04/properly-salting-passwords...

A properly implemented, simple pepper can only help password security and can't hurt it. Obviously you first must be using a good, slow algorithm (bcrypt, scrypt, or PBKDF2 with high work factor), but a pepper will only help you. (Let's assume the pepper is an AES key which all hashes are encrypted with.)

Yes, many times a dedicated attacker who has read access to your database will also have read access to your source code or config files, but many times they won't. And if they don't, then they won't be able to crack a single one of your passwords, while even with a modern and proper hashing algorithm they still may be able to crack passwords.

Take the scenario of a relatively intelligent hacking or hacktivist group, of which there've been several in the past 5 years. Let's say they're targeting someone they dislike for whatever reason, and find out that person is registered on some forum and decide to compromise the forum. (This tactic of lifting a whole haystack to find a single needle is very common for motivated attackers.) They don't care about any of the other users, they just want to try and crack the hash of one single member and have a full GPU cluster with which to do it. They're also willing to spend weeks trying to crack that one hash.

If the user's password isn't particularly strong, it's going to fall no matter what algorithm they used.

But if the forum is peppering all of their hashes, and those same attackers can only manage to gain access to the forum's database and not its local filesystem, then their chance of cracking that password goes to 0.

This scenario is a bit contrived because odds are motivated and intelligent attackers like these will end up gaining access to the filesystem and reading the pepper with enough time and effort, but the pepper is still an additional defense and means SQL injection alone won't be enough to crack passwords.

Hey thanks for the long response. I totally get the premise of peppering I think my problem is with this sentiment "A properly implemented, simple pepper can only help password security and can't hurt it".

From all the advice I've read on security and crypto, they don't work like that. The assumption is the other way around: a properly implemented, simple pepper can only hurt password security until proven otherwise by rigorous testing and analysis.

Time and time again we read stories of a tiny implementation detail that created a sly and subtle vulnerability that somehow leaks information about the original plaintext by interrogating the ciphertext.

bcrypt with a large work factor and a per-user salt is a PROVEN method to prevent attackers learning the plaintext. Until I see evidence from a trusted cryptanalyst, I'm not going to roll my own by adding in a pepper the designers didn't plan on being there.

EDIT: Sorry, let me make my point a little clearer. In the event that the hacker can access the filesystem or memory -- wherever you store your pepper -- could the hacker use the pepper and an implementation detail in the peppering technique to learn information about the plaintext or the salt? That question is what needs to be answered by qualified cryptanalysts before developers start using peppers widespread, in my opinion.

I would argue that while crypto most certainly works that way (use only when definitively proven), security in general is a bit more lax in terms of requirements.

Despite hashes being cryptographic primitives, user password hashing is less about cryptographic principles (preventing first and second preimage attacks) and more about increasing the amount of work an attacker must muster to find an input which hashes to the hash value.

Attributes like collision resistance generally mean almost nothing when on the scale of strings under 100 characters in length, which most user passwords are. Practically, you are never going to run into collision issues if you're using MD5 or later. Your goal is merely to increase the amount of CPU time it takes to find a hash's original input.

Because of this, even if AES encrypting a hash with a random pepper somehow reduced the collision resistance of a hash (I'm 99.8% sure it doesn't), it wouldn't at all affect the speed at which the hash is cracked.

>a sly and subtle vulnerability that simehow leaks information about original plain text by interrogating the cipher text.

Hashes are not ciphertext. For all intents and purposes they can be viewed as CSPRNG output. There is nothing you could do to them to leak info about the plaintext as long as your hash algorithm isn't pathological. There are things you could do to reduce collision resistance, but I addressed that above.

Password protection in web apps is not about encryption or decryption.

> A properly implemented, simple pepper can only help password security and can't hurt it.

Well, yes. But what is the definition of "properly"? There are definitely constructions of "pepper" that look simple, but drastically hurt overall security:

    bcrypt(hmac(password, key), salt)
If hmac returns raw bytes, you're in real trouble: http://blog.ircmaxell.com/2015/03/security-issue-combining-b...
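The failure mode, and the usual fix (encode the HMAC output before handing it to bcrypt), can be shown in a few lines of Python:

```python
import hashlib
import hmac
import os

key = os.urandom(32)

# Raw HMAC output is binary and can contain NULL bytes. bcrypt treats its
# input as a C string, ignoring everything after the first NULL, so raw
# bytes can silently truncate the effective password.
raw = hmac.new(key, b"some password", hashlib.sha256).digest()

# Hex-encoding first keeps the input in a NULL-free alphabet (0-9a-f).
safe = hmac.new(key, b"some password", hashlib.sha256).hexdigest().encode()

assert b"\x00" not in safe and len(safe) == 64
```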

It's sort of like the difference between birth control and counting based contraceptive methods (Standard Days Method). Executed perfectly, they are equally as effective. But with a slight error, one stays roughly as effective (losing maybe 5 to 10% effectiveness overall) while the other drops drastically (down to 10 to 20% effectiveness).

Considering using encryption is as effective as using a pepper, and it's less prone to weakening the core password hash, I suggest using encryption instead of peppers.

I am well aware of that misuse, as I've exploited it during a CTF before. :)

I would consider using the raw byte-output version of a function a very blatant example of "improper implementation".

Also, I agree regarding encryption. In my example I was actually referring to the random AES key as a pepper, even though it'd probably be better called an "application secret".

It can help in the following scenarios:

1. Hacker steals db but does not compromise web servers (because the hmac pepper key lives on the web servers and not in the db)

2. Hacker can run SQL Injection via web server, but cannot otherwise access web server memory/process

3. HMAC key is stored in a hardware security module and hacker cannot gain physical access

All of the above cases are also helped by just by regular symmetric encryption. Why make things more complicated than necessary?

How is HMAC more complicated than encryption?

Recommendations related to security should start with the words "I'm {name}, known expert in IT security; my work can be found here: {url}".

Otherwise such advice should be ignored.

Could not agree more. In fact, here is Bruce Schneier's quote[1]:

    Anyone can invent a security system that he himself
    cannot break. I've said this so often that Cory Doctorow 
    has named it "Schneier's Law": When someone hands you a  
    security system and says, "I believe this is secure," the  
    first thing you have to ask is, "Who the hell are you?" 
    Show me what you've broken to demonstrate that your
    assertion of the system's security means something.
[1] https://www.schneier.com/blog/archives/2011/04/schneiers_law...

Could not disagree more. Perhaps one should use their own reasoning and the ability to peer-review in conjunction with 'appeal to experience' rather than shut out discussion.

I might get down-voted for this, but I'm going to clarify my position a little.

I'm not advocating for shutting-down anyone's discussion. What I, and probably others, are advocating for, is only putting in trust in proven crypto.

The parent comment to all of this basically has the form: "Wow, you could totally solve your password hashing (that isn't broken) by using this scheme I came up with. No one else has looked at it, but boy, it looks difficult to crack to me."

There's a complete difference between that comment, and many of the others: "how about if we do X", and I believe the second is completely valid for discussion; As LONG as you include relevant experts in that discussion.

Joe and Bob talking about encryption isn't very useful unless Joe &| Bob are trained in cryptography and have experience in applying that crypto in the real world.

Down-vote all you want, but that's my view-point at least.

Is your reasoning that of an experienced, competent cryptographer? Because mine isn't. I'm not a cryptographer's peer, either, and I'll bet you aren't either. Step one--nobody said step all, but step one--is establishing your bona fides to determine whether it is worth burning cycles on your idea, because proving or disproving cryptography is very hard and very time-consuming. It is a heuristic that, generally speaking, works pretty well.

No, his reasoning probably isn't. But without putting words in his mouth, there is this quasi-religious "thou shalt not talk about cryptography" attitude among programmers like crypto is literally voodoo magic. The appeal to experience is an incredibly frustrating part of this. It's like people are willfully ignorant and forcing those of us who may not be experts but also want to have an intelligent discussion to pretend to be idiots along with them.

It's not "thou shalt not talk about cryptography."

It's "cite your sources."

If you're pointing out a gaping flaw, you don't need a source. If you're suggesting a standard security measure, you don't need a source.

To insist on something novel, yes you want a source.

I would agree, except "standard security measures" aren't and you lead to travishamockeries like the pepper nonsense. Which is why I default to, "cite your sources."

We have an approach to this which doesn't modify the password hashing at all, so it can't possibly reduce the strength: we store users and password hashes (currently bcrypt*) in two separate tables, and the key used to look up a given user's password hash is an encrypted version of their user id. The encryption key for this transformation is loaded dynamically at app startup and has some extra security precautions around storing it, so it's not baked into either the source code or a config file.

The upshot is that if you get a db dump and are able to brute force some bcrypt hashes, you won't know what usernames they go with. If you get a db dump and our source code, you're still out of luck. If you get ahold of an old server hard drive, you're out of luck. If you root a running server and inspect the process memory, you can obtain this key.

This scheme also allows the mapping key to be rolled, which would immediately invalidate all passwords in the system.

*we also version our password hash records so we can migrate from bcrypt to a new scheme fairly painlessly if it's warranted in the future.
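A sketch of the mapping (illustrative Python; HMAC serves as the keyed transform here, whereas the comment says "encrypted", so treat the choice of primitive as an assumption):

```python
import hashlib
import hmac

# Loaded dynamically at app startup in the scheme described above;
# a literal here only to keep the sketch self-contained.
MAPPING_KEY = b"not-in-source-or-config"

def password_row_key(user_id: int) -> str:
    # Deterministic keyed transform of the user id. The password-hash row
    # is stored under this value, so a dump of both tables alone can't
    # pair hashes with usernames.
    return hmac.new(MAPPING_KEY, str(user_id).encode(), hashlib.sha256).hexdigest()

assert password_row_key(42) == password_row_key(42)  # stable lookup
assert password_row_key(42) != password_row_key(43)  # no cross-user reuse
```

Rolling MAPPING_KEY changes every row key at once, which is what invalidates all passwords in the system.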

Well, let's imagine an attack scenario.

As an attacker, I get SQL access to your DB (meaning no access to the encryption key). I then download the user names, and the hashes. I then attack the hashes offline. I recover only the weakest few percent (since you're using bcrypt). But since the weakest few are those most likely to be re-used (both by different users and by a single user across sites), they are going to be both more valuable to me and easier for the next steps:

Then, I take the highest frequency passwords and the user table, and I start validating them online in your system. Now if I do that too quickly, you'll notice and I'll be shut down. And if I do that all from the same IP, I'll be shut down.

But what if I had a botnet that I could distribute the load across. What if I kept my request rate small enough to stay under the radar of even a moderate scale system.

I would expect to start seeing meaningful results within days.

If you had 1000 users, then I could surmise that you don't have much traffic, and hence keep the request rate down to perhaps 100 per day. In 10 days I'd have at least a few u/p combinations that I know for a fact worked.

If you had 1000000 users, I could ramp it up quite a bit higher, to perhaps 1000 or 10000 per day.

And since they all came from separate IP addresses, it could be rather difficult for you to tell an attack was going on unless you were looking specifically for it.

Does that mean you should stop immediately? No. It's not that bad of a scheme. But be aware that it doesn't give you (or your users) the level of protection that it may look like on the surface.

So after a breach, you have our (currently) 2 million hashes, and let's say you recover only the weakest few percent of the passwords, which is 60000 known good passwords. Instead of owning 60000 accounts now, you have 60000 passwords, each of which is going to require on average one million attempts before you guess the correct username. Is this not self-evidently better?

Well, let's look at it realistically: http://arstechnica.com/security/2015/01/yes-123456-is-the-mo...

The #1 password out of 3.3 million was 123456, which was used 20,000 times.

So extrapolating that for your 2 million hashes, we'd expect the top password to appear roughly 12,000 times.

Running those numbers, we'd expect each guess to have a 1/12000 chance of matching. Or more specifically, a 1988000/2000000 of not matching.

With some quick running of those numbers, we'd expect a 50% chance of finding a match after trying just 115 random usernames.
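The arithmetic above checks out (a quick Python verification of those numbers):

```python
import math

users = 2_000_000
top_password_count = 12_000  # extrapolated frequency of the top password

p_miss = (users - top_password_count) / users   # one random username misses
attempts_for_half = math.log(0.5) / math.log(p_miss)

assert 114 < attempts_for_half < 116   # ~115 tries for a 50% hit chance
```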

I'm not saying it isn't an interesting approach, I just don't think it's nearly as effective as if you encrypt the hash directly (which has no attack vector unless you can get the key).

You forget the cases where username = password (or username+CleverP3rmutation = password).

> you won't know what usernames they go with

But if you've got a list of all usernames (probably a relatively small number) and access to a running system, isn't it easy to just try each password against each user until you find a match?

The common practice of limiting logins from a single username wouldn't help with that either.

Yes, but we currently have 2.1 million users, so that's still no small burden. And don't forget that's only after you brute forced the passwords.

It's not the security of the passwords I'm thinking about, it's whether you've really enhanced it much by obfuscating the relationship between password and username. I'm sure you've thought about this more than I have, but I'm a bit skeptical: if you have a lot of users, you've probably had to build systems to make the login process extremely efficient. If all of your users wanted to log into your system within a 24-hour period, could they? Maybe a week? If they could, then an attacker can attempt a login with each username over the same period of time.

So you basically increased the cracker's workload by six orders of magnitude, which is equivalent to increasing bcrypt's work factor by 20. Cool!
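The equivalence holds because bcrypt's cost parameter is a base-2 exponent (illustrative check):

```python
import math

# Six orders of magnitude = a 1,000,000x increase in attacker work.
# bcrypt doubles its work per +1 cost, so the equivalent bump is:
equiv = math.log2(1_000_000)
print(equiv)   # roughly +20 on the bcrypt cost factor
```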

If you increase bcrypt's workfactor then legitimate requests take longer.

Is there some reason that you don't just encrypt the whole column or table, though? It does everything you described, except it also makes even the bcrypted passwords unavailable in the event of a db dump. It just seems like a half-measure where a whole-measure is just as easy to implement.

Hmm, interesting suggestion. We wouldn't want to encrypt the entire row since we want to be able to query on specific columns (version in particular). My only excuse otherwise is that the goal was "disassociate the hashes from the users" not "encrypt the hashes." :) You're right that we could have done more, although whether that makes meeting the original goal a "half measure" depends on your perspective. :)

That's a really interesting idea. I was thinking it might have performance implications but I think that will only apply to the password validation. How do you make sure the mapping key doesn't cause collisions when you roll it to reset everyone's passwords?

That's a great point about rolling the mapping key. We would likely be dropping all the rows in the password hash table for such an extreme event anyway, though, since otherwise there's no way to garbage collect the orphaned ones.

I generally disagree with hardcoded salts, you should assume everything is compromised in a successful attack. But I'm actually commenting here because I don't see how you can retroactively apply the second salt to a hashed string. Could you please elaborate or share a link?

Later edit: I'm referring to your example in your link:

    salt = urandom(16)                         # per-user, stored in the DB
    pepper = "oFMLjbFr2Bb3XR)aKKst@kBF}tHD9q"  # app-wide, in code or an environment variable
    hashed_password = scrypt(password, salt + pepper)
    store(hashed_password, salt)
How do you retroactively apply this?

In incident response you assume the worst, but in system design you try to minimize impact of common attacks like SQL injection.

There's nothing wrong with nesting algorithms (see the Facebook hash onion), so you can use the following scheme:

    bcrypt(bcrypt(password, salt), pepper)
And do a pass on all your database entries like

    bcrypt(old_hash, pepper)
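A minimal sketch of that migration, with Python's stdlib scrypt standing in for bcrypt (names and parameters are illustrative). Note the thread's later caveat: the hash function must not encode its salt into the output, or the pepper leaks; stdlib scrypt returns raw bytes, so it doesn't.

```python
import hashlib, os

def phs(data, salt):
    # scrypt standing in for bcrypt; any salted password hash works here
    return hashlib.scrypt(data, salt=salt, n=2**14, r=8, p=1, maxmem=2**26)

pepper = os.urandom(32)   # kept in app config, never in the database

# Existing rows already hold phs(password, salt). The one-time
# migration wraps each stored hash -- no passwords needed:
def migrate(old_hash):
    return phs(old_hash, pepper)

# Login check against a migrated row:
def verify(password, salt, stored):
    return phs(phs(password, salt), pepper) == stored

salt = os.urandom(16)
stored = migrate(phs(b"hunter2", salt))
assert verify(b"hunter2", salt, stored)
assert not verify(b"wrong", salt, stored)
```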

In your website's example, you have

    bcrypt(password, salt+pepper)
Observe the difference between that and the rehash you just posted

    bcrypt(bcrypt(password, salt), pepper)

> bcrypt(password, salt+pepper)

I hope it's obvious that no one should ever do this, since the output would contain the "salt+pepper" bits in cleartext alongside the hash, defeating the entire point of the "pepper".


In fact, this is a perfect illustration of why it's bad to put secret bits into a crypto function in a place that's not designed to take secret bits. Bcrypt does not treat the salt parameter as a cryptographic secret, and other algorithms might not either. And they might leak it in more subtle ways.

One would think it's obvious, yet this is what the person in the linked article suggests doing :/

> There's nothing wrong with nesting algorithms

This is really vague. What kind of algorithm? With itself, or just anything inside anything else? Passing the raw output of any common cryptographic hash (SHA-x) to bcrypt, for example, completely destroys its security, as bcrypt input is null-terminated.

(What happens when you nest DES in A*, anyway?)
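The NUL-termination hazard is easy to demonstrate. A raw SHA-512 digest is 64 arbitrary bytes, and any implementation that treats its input as a C string silently truncates at the first zero byte (sketch, not tied to any particular bcrypt library):

```python
import hashlib

# Count how often a raw SHA-512 digest contains a zero byte, which a
# NUL-terminating API would treat as end-of-input:
hits = 0
for i in range(1000):
    digest = hashlib.sha512(str(i).encode()).digest()
    if b"\x00" in digest:
        hits += 1

# Expected fraction: 1 - (255/256)**64, about 22% of all inputs.
print(hits / 1000)
```

The standard fix is to hex- or base64-encode the digest before passing it on, so the intermediate value can never contain a zero byte.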

Obviously you have to use type safety. You can't cast a binary string to a null-terminated string, you have to convert. That's a problem unrelated to hashing.

I would avoid using any particular symmetric algorithm twice. Otherwise if you have an example of algorithm chaining that can weaken security beyond the weakest link, I would love to see it. (Not that I think nesting is a great idea.)

> That's a problem unrelated to hashing.

It (poor implementation) is definitely related to implementing pepper on top of a secure password hash, though, which everybody is already doing differently.

> I would avoid using any particular symmetric algorithm twice. Otherwise if you have an example of algorithm chaining that can weaken security beyond the weakest link, I would love to see it. (Not that I think nesting is a great idea.)

“algorithm” is, again, really vague. (So is “nesting”.) But for something contrived and not snarky, here:

  h = sha512_hex(password)
  sha512_hex(bcrypt(h, gen_salt()) +
             bcrypt(h, gen_salt()) +
             bcrypt(h, gen_salt()) +
             bcrypt(h, gen_salt()))
The weakest link here is 374 bits (4 bcrypts), but the output is 288.

Oh sorry I missed this post for a while.

>“algorithm” is, again, really vague

Something that you can use to hash passwords. What you gave works if you assume gen_salt is seeded per user.

>The weakest link here is 374 bits (4 bcrypts), but the output is 288.

I'm afraid I don't follow. Your bit numbers confuse me, and I don't see how this results in an algorithm that is weaker than either sha512 or bcrypt.

> I'm afraid I don't follow. Your bit numbers confuse me

They’re bits of entropy (not counting the password itself) (I think). SHA-512(M): 512 bits; SHA-512(SHA-256(M)): 256 bits, for example.

> I don't see how this results in an algorithm that is weaker than either sha512 or bcrypt.

That’s my point. It’s not easy to get this kind of thing right, so just don’t bother with pepper.

Bits of entropy I understand, how you got "374" and "288" I don't.

>That’s my point. It’s not easy to get this kind of thing right, so just don’t bother with pepper.

What? Your point is that you haven't demonstrated that it's weaker than the weakest link, therefore you win?

Edit: Okay I figured out where you got 288. Still confused by the 374. Anyway you need to make truncations explicit. You didn't pass all of the sha output to bcrypt. You're taking advantage of an implementation API bug.

I'm not asking for evidence that shoving together functions from google without understanding them can go wrong. That's trivially true.

I want an example where combining hash algorithms is inherently wrong. Like using a block cypher twice can pop out your plaintext, but probably not as extreme.

Edit 2: Oh, 384!

    passwordHash = bcrypt(salt + password)
    encryptedHash = encrypt(passwordHash, pepper)
This way you can rotate your pepper by doing:

    decryptedHash = decrypt(encryptedHash, oldpepper)
    encryptedHash = encrypt(decryptedHash, newpepper)
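A runnable sketch of that rotation flow. The XOR stream derived from SHAKE-256 is a toy stand-in for a real cipher (use AES-GCM or similar in practice), and scrypt stands in for bcrypt since it's in Python's stdlib:

```python
import hashlib, os

def encrypt(data, pepper):
    # Toy stream cipher keyed by the pepper -- illustration only.
    nonce = os.urandom(16)
    stream = hashlib.shake_256(pepper + nonce).digest(len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))

def decrypt(blob, pepper):
    nonce, body = blob[:16], blob[16:]
    stream = hashlib.shake_256(pepper + nonce).digest(len(body))
    return bytes(a ^ b for a, b in zip(body, stream))

old_pepper, new_pepper = os.urandom(32), os.urandom(32)

password_hash = hashlib.scrypt(b"hunter2", salt=os.urandom(16),
                               n=2**14, r=8, p=1, maxmem=2**26)
stored = encrypt(password_hash, old_pepper)

# Rotation: one pass over the table, no passwords needed.
stored = encrypt(decrypt(stored, old_pepper), new_pepper)
assert decrypt(stored, new_pepper) == password_hash
```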

Excuse my ignorance, but you probably shouldn't be able to reverse an irreversible hash.

The point of hashing passwords is that the true password is not revealable.

The point of salting password hashes is to prevent identical cleartext passwords from being stored as identical hashes in the database. Salts are often stored in the database, as well.

The point of peppering is to keep a database dump from being at all useful for recovering passwords. It makes sure that a component of the cleartext -> DB entry process is not in the database at all, requiring something from the app as well.

Why does encryption work here? Because you've already done a one way function on the cleartext -> salted hash. At that point, there is still no way to reverse the process all the way to get the cleartext. By using a two-way encryption function for the pepper portion, you keep the ability to rotate 'peppers' periodically, in case it is leaked, for example.

Thanks for the informative post

The pepper is being used as an encryption key in that example rather than using hashing.

In pacofvf's example, you are correct.

But, the original example is hashing the password. No encryption involved. So what makes anyone think that they can reverse a hash?

The problem is that pacofvf's answer assumes you don't need to do this retroactively.

This would work if you were building a new system today, but if you had a DB full of one way hashes you're not going to be able to retroactively modify the pepper.

And more importantly, slack straight up stated they salt the password and use bcrypt. It's all one way hashes, no encrypting/decrypting going on.

It's reversing the encryption, not the hash.

Where in the original example was encryption involved? The salt and pepper were only ever used in a hashing algorithm.

The pepper is the key to the encryption, not a hashing algorithm.

Thanks for just ignoring my question - THERE IS NO ENCRYPTION IN THE ORIGINAL ARTICLE, so WHY are you restating that encryption is involved at all?

Straight from the original article:

  hashed_password = scrypt(password, salt + pepper)

  hashed_password = scrypt(scrypt(password, salt), pepper)
Absolutely zero encryption going on here. Therefore, no ability to "decrypt" the pepper result. The pepper IS NOT a key for encryption. Period.

You append the salt to the hash, and then re-hash it. Not exactly pretty, but it works.

His example doesn't rehash. His example adds the pepper to the initial salt. This is an honest question, how does one apply it retroactively?

I think you are spot on - you can't apply it retroactively in this case. If you set it up as H(H(pwd, salt), pepper), you could apply pepper retroactively, but that's not what's proposed here nor in the SO question linked above. Also if you do H(H(pwd, salt), pepper), you can successively apply new peppers and make it a bit more maintainable, but I am unclear that it is secure to do so.

Update the hash when the user logs in because you have the password. You can tell if it's been applied by having another column, or prepending/suffixing the output with something that scrypt can't output.
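A sketch of that lazy, login-time upgrade. A version tag marks upgraded rows; since the hash is stored as hex, '$' can never occur in legitimate output. scrypt stands in for the thread's bcrypt, and the pepper value is illustrative:

```python
import hashlib, os

def h(pw, salt):
    # Hex output for storage; scrypt standing in for bcrypt.
    return hashlib.scrypt(pw, salt=salt, n=2**14, r=8, p=1,
                          maxmem=2**26).hex()

pepper = b"app-side-secret"   # illustrative; keep out of the database

def login(pw, salt, stored):
    if stored.startswith("v2$"):
        return h(pw, salt + pepper) == stored[3:], stored
    ok = h(pw, salt) == stored
    if ok:   # we have the cleartext password, so upgrade the row now
        stored = "v2$" + h(pw, salt + pepper)
    return ok, stored

salt = os.urandom(16)
row = h(b"hunter2", salt)                # legacy, un-peppered row
ok, row = login(b"hunter2", salt, row)   # a successful login upgrades it
assert ok and row.startswith("v2$")
ok, _ = login(b"hunter2", salt, row)     # later logins use the pepper
assert ok
```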

I don't understand your issue with hardcoded salts. It essentially works the same way as an HMAC: it is some secret material that further complicates the attacker's job. It doesn't mean they react to a successful attack in a different way.

After hashing the password he is storing the hash along with the user's random salt, not retroactively applying the salt a second time.

Your question depends on your definition of retroactively.

I think the article's context is if you don't currently use a pepper, you can easily add one and update all of your password hashes in the database.

From your blog:

   hashed_password = scrypt(password, salt + pepper)  
   store(hashed_password, salt)  
Why aren't you storing the (N, r, p) parameters too? Is it because your library's "scrypt" function automatically encodes all the parameters it uses in the returned string? (This is not hypothetical; it's exactly what many scrypt implementations do.) If so, congratulations: You just stored the pepper bits in your database, because they're part of the "hashed_password" value.
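The leak is easy to see concretely. This sketch mimics the self-describing strings many scrypt/bcrypt libraries return (the format and values here are made up for illustration):

```python
import base64, os

salt, pepper = os.urandom(16), b"secret-pepper"
digest = os.urandom(32)   # stands in for the raw scrypt output

# Parameters AND salt are embedded alongside the hash:
encoded = "$scrypt$ln=14,r=8,p=1${}${}".format(
    base64.b64encode(salt + pepper).decode(),
    base64.b64encode(digest).decode())

# If salt+pepper was passed as the "salt" parameter, the stored string
# now contains the pepper, merely base64-encoded:
assert b"secret-pepper" in base64.b64decode(encoded.split("$")[3])
```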

If you care about security then just use an HSM, they aren't that expensive, and eliminate the possibility of a database dump.

The best summary I've read recently on peppering was posted to the PHC list by Thomas Pornin last week. It's worth quoting in its entirety;

  Adding an additional secret key can be added generically, in (at least)
  four ways, to any password hashing function:

  1. Store: salt + HMAC_K(PHS(pass, salt))
  2. Store: salt + PHS(HMAC_K(pass), salt)
  3. Store: salt + AES_K(PHS(pass, salt))
  4. Store: salt + PHS(AES_K(pass), salt)

  I have used here "HMAC" to mean "some appropriate MAC function" and
  "AES" to mean "some symmetric encryption scheme".

  These methods are not completely equivalent:

  -- With method 1, you forfeit any offline work factor extension that
  the PHS may offer (i.e. you can no longer raise the work factor of a
  hash without knowing the password). With methods 2 and 4 such work
  factor extension can be done easily (if the PHS supports it, of
  course). With method 3, you can do it but you need the key.

  -- With methods 2 and 4, you must either encode the output of HMAC
  or AES with Base64 or equivalent; or the PHS must support arbitrary
  binary input (all candidates should support arbitrary binary input
  anyway, it was part of the CfP).

  -- Method 4 requires some form of symmetric encryption that is either
  deterministic, or can be made deterministic (e.g. an extra IV is
  stored). ECB mode, for all its shortcomings, would work.

  -- Method 3 can be rather simple if you configure PHS to output exactly
  128 bits, in which case you can do "raw" single-block encryption.

  -- Methods 1 and 3 require obtaining the "raw" PHS output, not a
  composite string that encodes the output and the salt. In that sense,
  they can be a bit cumbersome to retrofit on, say, an existing bcrypt
  deployment.

  The important points (in my opinion) to take into account are:

  1. This key strengthening (some people have coined the expression
  "peppering" as a bad pun on "salting") can be done generically; the
  underlying PHS needs not be modified or even made aware of it.

  2. Keys imply key management, always a tricky thing. Key should be
  generated appropriately (that's not hard but it can be botched in
  horrible ways), and stored with care. Sometimes the OS or programming
  framework can help (e.g. DPAPI on Windows). Sometimes it makes things
  more difficult. You need backups (a lost key implies losing all the
  stored passwords), but stolen backups are a classical source of
  password hashes leakage, so if you do not take enough care of the
  security of your backups then the advantage offered by the key can go
  to naught.

  3. For some historical reasons, many people feel the need to change
  keys regularly. This is rather misguided: key rotation makes sense in
  an army or spy network where there are many keys, and partial
  compromissions are the normal and expected situation, so a spy network
  must, by necessity, be in permanent self-cleansing recovery mode; when
  there is a single key and the normal situation is that the key is NOT
  compromised, changing it brings no tangible advantage. Nevertheless,
  people insist on it, and this is difficult. The "method 3" above
  (encryption of the PHS result) is the one that makes key rotation
  easiest since you can process all stored hashes in one go, as a
  night-time administrative procedure.

  4. Key strengthening makes sense only insofar as you can keep the key
  secret even when the attacker can see the hashes. In a classical
  Web-server-verifies-user-passwords context, the hashes are in the
  database; one can argue that database contents can be dumped through a
  SQL injection attack, but a key stored outside the database might evade
  this partial breach. But if the key is in the database, or the breach
  is a stolen whole-server backup, then the key does not bring any
  additional security.

  5. If you _can_ store a key that attackers won't steal, even if they
  get all the hashes, then you can forget all this PHS nonsense and just
  use HMAC_K(pass) (or HMAC_K(user+pass)). The key must thus be
  envisioned as an additional protection, a third layer (first layer is:
  don't let outsiders read your hashes; second layer is: make it so that
  your hashes are expensive to compute, in case the first layer was
  broken through).
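Method 1 from the list is the simplest to implement generically. A sketch with scrypt standing in for the PHS (the key K must live outside the database):

```python
import hashlib, hmac, os

K = os.urandom(32)   # app-side key, never stored with the hashes

def store(password):
    # Store: salt + HMAC_K(PHS(pass, salt))
    salt = os.urandom(16)
    inner = hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1,
                           maxmem=2**26)
    return salt, hmac.new(K, inner, hashlib.sha256).digest()

def verify(password, salt, tag):
    inner = hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1,
                           maxmem=2**26)
    return hmac.compare_digest(
        hmac.new(K, inner, hashlib.sha256).digest(), tag)

salt, tag = store(b"hunter2")
assert verify(b"hunter2", salt, tag)
assert not verify(b"wrong", salt, tag)
```

As the quote notes, this variant forfeits offline work-factor extension: you can't re-stretch a stored hash without the password.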
P.S. As the inventor of blind hashing (which would have prevented this breach entirely) I have a serious horse in this race. We launch publicly at RSA 2015 in San Francisco. Hope to see you there!

> 3. For some historical reasons, many people feel the need to change keys regularly. This is rather misguided: key rotation makes sense in an army or spy network where there are many keys, and partial compromissions are the normal and expected situation, so a spy network must, by necessity, be in permanent self-cleansing recovery mode; when there is a single key and the normal situation is that the key is NOT compromised, changing it brings no tangible advantage.

This is really strange advice from Thomas Pornin. People rotate keys because not doing so weakens most symmetric encryption schemes. For example, while using AES-GCM with 96-bit nonces one needs to rotate keys after encrypting roughly 2^32 ~ 4 billion messages; otherwise the IV collision probability rises above 2^(-32), which is already too high in most large-scale systems (and really bad things happen when the IV is repeated).
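That figure follows from the birthday bound (a quick check of the numbers):

```python
import math

# With random 96-bit nonces, the probability of at least one collision
# after q messages is roughly q*(q-1) / 2^97.
q = 2**32
p_collision = q * (q - 1) / 2**97
print(math.log2(p_collision))   # about -33, i.e. roughly 2^-33
```

So at q = 2^32 messages the collision probability is just under 2^-32, which is why that message count is the usual rotation threshold.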

Given a salted hash is being encrypted, who needs a nonce? The salt's already taken care of that, right?

Also, if you have 4 billion hashes stored, and you rotate the key, and you still have 4 billion hashes stored... What's changed? You would need a key ring or derivative keys I guess but I think this is actually a case where ECB does the job.

But I guess we've now proven the point that even a pepper is non-trivial.

I don't see that as a negative suggestion: that's a fantastic idea, and for all we know, a Slack employee will read your post, and make their hashing even better. :-)

By not implementing the suggestion, presumably.

This is rolling your own crypto, which is universally bad. To paraphrase Bruce Schneier, anyone can write a crypto algorithm they themselves can't break. Peppering a password hash destroys any future maintainability.

In a general sense, concatenating a pepper with your salt is NOT reinventing anything at all. (You can rephrase that as "Store pieces of your salt in separate places" which doesn't change or violate any traditional, rigorously tested application of salt in cryptography)

There are issues with using it as implemented in some posts here, with nested bcrypts, to be sure, but I think the concept is still fairly sound, though there are certainly implementation pros and cons.


As for maintainability:

I'm also familiar with crypt(3)-style password hashes, where a prefix uniquely specifies the algorithm (and subvariant) used.[1]

Why wouldn't this be fitting here? You can then easily detect, and deal with, passwords that have been tagged with "previous" peppers, such as forcing returning users to change password if a previous pepper was compromised, etc.

[1] https://www.freebsd.org/cgi/man.cgi?query=crypt%283%29 or http://man7.org/linux/man-pages/man3/crypt.3.html; for some reason I can't find a link to a more comprehensive list at the moment

By that logic, every extra character you concat onto a salt is also "rolling your own crypto."

For example, if my salt was CryptoRandom(10) and you increase it to CryptoRandom(15), you've just "rolled your own crypto" according to you.

If that is not the case, then explain the difference between CryptoRandom(15) and CryptoRandom(10) + CryptoRandom(5) (longer salt vs. salt+pepper).

There's a lot of people spreading FUD ("it is unknown!!!") and nonsense (concat two strings is literally rolling your own crypto!) in this thread.

I don't know if peppers are worth the dev' time, deployment issues, and additional maintenance (e.g. rotation). However I do know that the people arguing against it here aren't making rational counter-arguments that hold up under basic scrutiny.

I'm no security expert, but there are some obvious problems with concatting salt + pepper. For example, most hash libraries include the salt in the output so that you only have to store a single string (also encoded in the output would be the hash algorithm and a few input parameters). So now you've likely revealed your pepper in the DB. Oops! So much for extra security.

But that's not all! Hash algorithms are written assuming the salt is random, and now I have millions of hash outputs in which the last X bytes of the salt are shared. Have you proven that this doesn't increase the attack surface? It certainly sounds like it might. This is exactly the type of side-channel attack that tends to break crypto, and you're giving it away for free.

PBKDFs like bcrypt assume the salt is a random value that is not re-used and is not required to be secret. You are re-using part of it, and require that it is secret. That's the worrying part, and it's one of the reasons that several concrete proposals posted here have actual security flaws.

Maybe. But that only makes sense if you reduce the length of the salt to add the pepper, which nobody is suggesting.

If the salt remains the same length and you add the pepper on top, it won't make the final hash less secure/strong, due to the way hashing algorithms are folded.

At worst case scenario you've literally added no security at all with the pepper. There's no rational scenario where it reduces the security when all other things remain equal (i.e. you aren't replacing the salt with a pepper, or reducing the salt's length/complexity for the pepper, etc).

> At worst case scenario you've literally added no security at all with the pepper.

Yes, I think this is the most likely failure mode (though not necessarily the only one - crypto can fail in very surprising ways!).

But even this is harmful, since you are potentially making changes to security-critical code for no benefit. At best you get more complexity and more chances to introduce bugs, plus a false sense of security.

Ciao Filippo!

That's a nice trick, I've read about it elsewhere but never used, will do for sure in the future!

Are you sure that the intruder did not have server access? I mean, the statement "We were recently able to confirm that there was unauthorized access to a Slack database storing user profile information." is not enough to deduce that this was an SQL injection (although it might very well be).

Stay strong :-)

I still wonder what value a password leak has. I changed my password; my old password was: DV1wn3yHk6W-8m9lZNo_ and now you all know it, so what? I don't care, and I believe you don't care either. On the other hand, if they were after valuable data, they had access to the database and they got what they wanted. So the password is much less valuable than the other stuff they might have wanted, like chat logs, which might contain credentials to other services.

Right, but many people use the same password for their email account as they do for many other services.

Because I'm sure everyone uses a long, complex, and unique password on Slack. So nothing to worry about, everyone.

If Slack data isn't important, then it would have been fine to use one's default low-value password, right? So now some attacker has thousands of copies of the word "dragon".

> No financial or payment information was accessed or compromised in this attack.

This wouldn't be my first concern. It would be all of the confidential communication that happens within slack.

>"If you have not been explicitly informed by us in a separate communication that we detected suspicious activity involving your Slack account, we are very confident that there was no unauthorized access to any of your team data (such as messages or files)."

Under their FAQ on the post. It could be inferred that there was some unauthorized access to certain users' communication logs?

The post notes that the breached database is the user table, which would not contain chat history. I agree that making this abundantly clear makes sense.

This makes it sound like other data was compromised for some specific users. Since they didn't go into how they know it was only for only these users, I'm not very confident about this.

> As part of our investigation we detected suspicious activity affecting a very small number of Slack accounts. We have notified the individual users and team owners who we believe were impacted and are sharing details with their security teams. Unless you have been contacted by us directly about a password reset or been advised of suspicious activity in your team’s account, all the information you need is in this blog post.

This is actually an interesting point. A compromised user table could conceivably be used for all sorts of nefarious purposes. If the attackers "having access" to the information in that table includes the ability to modify that table, then it is pretty much open season on Slack. For example, an attacker could replace a target user's password-hash with a hash that the attacker knows the plaintext of. Depending on the implementation of the random salt, the attacker may have to replace the salt as well. Then, the attacker logs in as the user, downloads the desired chat history, logs out, and sets the password hash to the original. Not enough information was really given in the blog post, but by the sounds of it, some teams experienced more targeted attacks.

I would suspect things like "being used from a completely new country" or something similar. Could be those are the accounts with weak passwords that the attacker tried the top 10,000 passwords against.

If you get the user table, you can log in as (some) users. And if you can do that, you can see (some) chat history.

edit: you can log in if and when you crack some of the hashes.

Incorrect. You can't log in with a password hash, you need a password.

If you get the user table, you can crack the password hashes offline, at your leisure.

While technically true, this seems like it would be computationally infeasible, or at least impractical, given that they were not just hashing but also salting the passwords.

Of course, I barely know anything about computer security, but at least it should prevent attacks using rainbow tables I think?

No, a simple password like "slack123" should be easy to crack with any usable password storage method.

Not necessarily easy. If they're using a decently high cost for their use of bcrypt, we're talking hours to days (or more) per user, even when only considering weak passwords like that.

True, I guess it's possible to crack a password for a single user, especially one with a weak password. I was more thinking that it's unlikely they'll be able to crack the passwords of everyone who was in their database, and given that Slack has so many users it's unlikely for any single person that his/her password will be cracked.

Of course, even if they can't steal everyone's passwords, maybe the hackers will try to crack the passwords of higher profile targets.

GPUs are fast enough to crack a very large percentage of passwords in a short time by brute force, if a simple algorithm was used, even with salt.

With a separate salt for each password not even the NSA can crack that (that we know of). With a single salt for all of them, maybe.

Sure they can. Anyone can. It just takes a long time per password to crack (that time is a function of the cost/# of rounds of the hashing function).

No kidding. That's why I put (some) users. Because brute-forcing the hashes will give you some password plain texts.

I guess that I missed a step in the explanation where you attack the hashes.

However I see that they say that they are using some best practices (bcrypt, "salt per-password") so this attack will be largely mitigated.

Depends on the nuances of the system. If you can pass-the-hash, you can get in.

Agreed. The content of the chats would be potentially much more important in my mind.

Which leads to the question of whether Slack encrypts the chat data in the database.

That would make implementing search quite hard so I'd say - it's pretty likely they don't encrypt it.

If anyone from Slack is reading this, the encryption should be an option, even if it means disabling or substantially slowing the search feature.

If they encrypted it, Slack would have to hold the key, so that all users in an org can then read existing messages.

No, it could be a private key shared among users.

That's not right. There is no need to store the text body in order to index it. Furthermore, you can implement an index of token hashes, rather than an index of tokens.

It would remove a lot of nice search features, however. If you just index tokens without positional information, you have a much harder time performing phrase matching. If you include positional information, you can probably crack the encryption because some tokens are statistically more likely to appear next to each other than others.

If you index shingles (phrase chunks) instead, you lose out on sloppy phrases...you can only match exact phrases. I imagine you can perform a similar statistical attack too.

Hell, just getting the term dictionary would probably allow you to reverse engineer the tokens, since written language follows a very predictable power law.

Hashing also removes the ability to highlight search results, which significantly degrades search functionality for an end user.

Basically, yes, you can do search with encrypted tokens...but it will be a very poor search experience.
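A toy version of the hashed-token index described above (the key and documents are made up). It shows the core idea, along with the limitation: you can match exact tokens, but the index holds nothing you could highlight or phrase-match against:

```python
import hashlib, hmac
from collections import defaultdict

key = b"index-key"   # illustrative per-deployment secret

def tok(word):
    # The index only ever sees HMACs of terms, not the terms themselves.
    return hmac.new(key, word.lower().encode(), hashlib.sha256).digest()

index = defaultdict(set)

def add_doc(doc_id, text):
    for w in text.split():
        index[tok(w)].add(doc_id)

def search(word):
    return index.get(tok(word), set())

add_doc(1, "deploy the new build")
add_doc(2, "build failed on CI")
assert search("build") == {1, 2}
assert search("deploy") == {1}
assert search("missing") == set()
```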

If they don't encrypt storage they are highly negligent. Index and search are done in RAM, which is slightly harder to steal than disk data.

This reminds me of the plot of Silicon Valley

Is there a good reason to keep chat data longer than it takes to deliver it to the recipient?

They archive chat messages so that you can search through them later.

That alone would be a great reason not to use them.

It's also a great reason to use them, isn't it? Your searchable chat history basically becomes the knowledge base of your company.

And a great target for discovery in any sort of lawsuit.

As is email.

To me, that is something that you should keep internal, on internal systems with vetted free software.


It's configurable for paid accounts, and can be set as low as one day. However, one of the best features of slack (and products like slack) is message history and search. Otherwise, IRC isn't all that different (WRT messaging).

It's why I love Slack. If I remember a conversation about something two months ago I go to the room, search and find exactly what I needed.

Maybe it's time for Slack to adopt the Axolotl ratchet, too.

I'd love for them to do that, but there's a couple of problems that they'd have to overcome first.

First: Slackbot. This is a Slack-run bot that's in every channel; team owners can customize it to do various things, like scan messages for keywords and give out canned responses. Even if Slack adopted some variant of encrypted chat, each message would still need to be readable by Slackbot, so Slack would still have the means to collect every message.

Second: channel history. When I join a channel, I can see the messages in that channel from before I joined. This means that Slack (the server) must be able to give me those historical messages. In an encrypted group chat, the messages are encrypted only with the keys of the participants at that time, which means newcomers can't read them.

I'm sure there are other features in conflict with end-to-end encryption, too; these are just off the top of my head.

The first could be solved by having the activation part of the bot run on the clients themselves, and only send those messages in a readable way to the server.

As for the second, the server could ask one of the clients to re-encrypt the channel history with the newcomer's key. It would only fail if nobody was online the moment you joined the channel (and you still could get it later).

My concern is the usernames, emails, and phone numbers that were probably not encrypted.

Ultimately, passwords can be changed; internal chat messages containing personal and confidential data cannot be taken back.

User metadata can be used for social engineering, and people are typically the weakest link.


Encrypting user data should be a common practice like hashing passwords.

> Exactly!!!
>
> Encrypting user data should be a common practice like hashing passwords.

I get the feeling that you've never done this before and don't understand the technical challenges and implications of the added complexity you propose here for an essentially free-to-low-price all-in-one online communication service.

Slack is not the NSA, encryption is not the answer to every security problem out there.

Third party authentication should be the norm. Leaving authentication to providers that absolutely know their shit, just like we leave payments to third party services.

Of course, that requires a decent protocol, and Mozilla is doing the world a disservice in not marketing Persona better seeing as it's the right solution....

Major privacy issues, single point of failure etc etc. We leave payments to third party services because nobody wants to deal with the compliance nightmare that PCI-DSS is, not for security reasons. Payment is also mostly less sensitive to availability and latency issues than authentication.

So in a world where PCI-DSS isn't a thing, you're fine entering your credit card data directly on the forms available on random websites?

Why's a password so different, seeing as most people reuse those passwords? Why do we essentially allow (and yes, I am excluding those that use password managers in this statement, I'm one of those) access to our webmail and other critical services to random websites on the internet? What makes this right?

> Payment is also mostly less sensitive to availability and latency issues than authentication.

That's patently untrue. Latency issues are nonexistent in both areas, and availability issues are critical in both areas.

Yes, I have no problem entering my credit card data directly on the forms available on random websites.

Credit card payments online are so ludicrously insecure that it baffles me it's even legal. I only use them when dealing with the US (although some of the major retailers like Apple have finally started accepting 21st century payment methods), and I simply assume my credit card info has been leaking all over the place for ages.

The whole basic premise of credit cards is "we know it's totally broken, we'll just refund you the money because it's cheaper than fixing the problem".

> So in a world where PCI-DSS isn't a thing, you're fine entering your credit card data directly on the forms available on random websites?

Yes. It might be a hassle should someone misuse it, but the status quo effectively means that if I didn't make the purchase, I'm not responsible for it.

More importantly, this was proven before PCI-DSS was a thing.

You mean like how Authy specialised in two-factor authentication, but still managed to have basic string concatenation bugs that rendered their entire 2FA system bypassable?

Huh? This is the first I've heard about this, and searching for "Authy concatenation bug" isn't turning up anything useful.

Here's the write-up from Homakov. The guy is a pen-testing genius: http://sakurity.com/blog/2015/03/15/authy_bypass.html

But if you just want the money shot: http://sakurity.com/img/smsauthy.png

Yes. Typing '../sms' in the field bypassed the 2nd factor. Just, wow.

Huh. Well now I know. Thanks!

Amazing what you can do with improperly-implemented input sanitization :)

This probably could've been prevented by disallowing non-number inputs, no?

"In fact the root of the problem was default Sinatra dependency 'rack-protection'".

They were doing the input sanitization, but it wasn't the very first thing in the processing pipeline, since the "best practice" was to pipe everything through 'rack-protection' first.

Homakov was the first to state that this was really a black-swan-type bug, the kind that 99.9% of the time makes it into production. Apparently, they were doing the "right thing" and still got burned.

The parent meant "This probably could've been prevented by disallowing non-number inputs" in the SDK libraries. Yes, if the SDK cast everything to digits, it wouldn't be possible. It's also fairly obvious defense-in-depth for a 2FA API. Now they do it.


Or even just input validation on the form itself before passing on to the API, which is more of what I was getting at. I don't know about the details of Authy's setup, but client-side validation (HTML5 input types, for example) supports enforcement of specific value types in text fields.

Basically, the form itself could have (and maybe even should have) required numeric-only values, seeing as Authy's codes are either 6 or 7 digits long and contain no alphabetical or special characters.
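For what it's worth, the server-side version of this check is a one-liner too. A sketch (the 6-7 digit format is Authy's; the function name is mine):

```python
import re

def valid_token(raw: str) -> bool:
    # Authy codes are 6 or 7 digits; anything else is rejected before the
    # value ever reaches the API layer or a URL path.
    return re.fullmatch(r"\d{6,7}", raw) is not None

assert valid_token("1234567")
assert not valid_token("../sms")     # Homakov's bypass payload
assert not valid_token("123456\n1")  # fullmatch also defeats trailing-newline tricks
```

Note that `re.fullmatch` (rather than `re.match` with `$`) matters here: `$` happily matches before a trailing newline.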

:-( Sorry, typo. And HN won't let me edit now, grrr!

Hey, as an Authy user that causes some immediate concern for me. Could you share any reference to the incident you mentioned?

... no? no I don't mean like Authy.

You never decrypt a password however. You only compare the hashed version of the claimed one to the stored hashed version, a one-way operation.

What could you do with a one-way encrypted phone number? I'm not able to enter a phone hash to make a call.

Encryption isn't the same as hashing. Encryption is two-way.

The previous comment did make the encryption / hashing distinction - though I can totally understand how his post might have been misread as recommending the same mechanism for both sets of data.

OK, so slack stores a username, name and email address for each user. This is visible to everyone else in the same Slack team at minimum. You also need it for e.g. password resets, perhaps billing.

We can assume they aren't total idiots and there's an Internet-facing application server that connects to an internal-only database server that has this data. Also, assume SQL injection is not the attack vector.

How would you apply encryption to protect the username, name and email from an attacker that has gained access to the application server? I've gained some shell on the server and have 24 hours to extract data. I can see all the files on the server, though perhaps not as root, just as the user that runs the application. How can you, as a security-sensitive application developer, stop me if I've gotten this far?

I wouldn't. I don't agree with his point either (see my response to him: https://news.ycombinator.com/item?id=9277659).

Why? Encrypting e-mail addresses would break password reset features, and phone numbers are generally public anyway (yes, you can go ex-directory, but the real issue here is why these services require a valid phone number to begin with)

Why would encrypting email addresses break password reset? You can encrypt the database at rest such that the application has a private key that can decode it. That way both the application and the database server need to be breached to obtain anything usable.
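A sketch of the idea: the key lives only on the app server, so the database stores opaque blobs, and the application can still decrypt the address when it needs to send a reset email. The XOR stream cipher below is a toy for illustration only; a real deployment would use a vetted library (libsodium, Fernet, etc.):

```python
import hashlib
import os

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: hash the key, nonce and a counter. Illustration only --
    # do not hand-roll ciphers in production.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt_email(key: bytes, email: str) -> bytes:
    nonce = os.urandom(16)
    data = email.encode()
    ct = bytes(a ^ b for a, b in zip(data, _keystream(key, nonce, len(data))))
    return nonce + ct  # this is what the database sees

def decrypt_email(key: bytes, blob: bytes) -> str:
    nonce, ct = blob[:16], blob[16:]
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, nonce, len(ct)))).decode()

APP_KEY = os.urandom(32)  # held by the app server, never stored in the database

stored = encrypt_email(APP_KEY, "bob@example.com")
assert decrypt_email(APP_KEY, stored) == "bob@example.com"  # reset email still works
```

The point is the architecture, not the cipher: a database dump alone yields nothing readable.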

It's often a bug in the application that exposes the database, so the same bugs might also be used to expose the private key.

It's also worth noting that it wouldn't just be the web servers that require your private key; it would also be any mail servers you use for sending your newsletters and the like (assuming these aren't run on your web servers - which often isn't the case). Then there's your telephone support staff, who may also need to know your e-mail address to do their job effectively. And any other operators that might compile data extracts, eg for 3rd parties where users have given permission for their details to be used / sold.

Quickly you're in a situation where your private key is more widely available across your infrastructure than the e-mail addresses would have been had they not been encrypted to begin with.

Now let's look at the cost of such a system. There's an obvious electricity / hardware cost from the CPU time required to encrypt / decrypt this data (after all, CPU time is the general measure for the strength of encryption) and a staffing cost from the time wasted jumping through those extra hoops. The development time, code complexity, etc - it all has a cost to the company.

So what are the benefits for any company doing this? They don't gain any extra security. This is really more of a privacy policy for their users; and users who are that paranoid about their e-mail address being leaked should either use a disposable e-mail account or shouldn't be using a cloud-based proprietary messaging network to begin with. What's more, the chat history might well contain your e-mail address anyway (eg "hi dave, I'm heading into a meeting shortly, but e-mail me at bob@example.com and I'll have a look tonight")

Don't get me wrong, I'm all for hashing / encrypting sensitive data. But pragmatically we need to consider:

1) are e-mail addresses really that sensitive? Or instead should we be encouraging better security for our web-mail et al accounts (eg 2 factor authentication) to prevent our addresses being abused. Given that we give out e-mail addresses to anyone who needs to contact us, I think the latter option (securing our email accounts) is the smarter one

2) instead of encrypting phone numbers and postal addresses, should we instead be challenging the requirement for online services to store them to begin with? If they have my email address, why do they also need my phone number? Postal address I can forgive a little more if there's a product that needs shipping or payments that need to be made.

Or just the application. Generally, it's much easier to convince apps to give you the data instead.

It's still worth mentioning, even if it's not your "first concern".

Assuming (no evidence, it's just very common) that this was a SQL Injection, here are some ways to protect yourself:

* Use database activity monitoring (http://en.wikipedia.org/wiki/Database_activity_monitoring). If you don't list users on your site and you get a query that would return more than one user record, it's a hacker

* Add some honeytokens (http://en.wikipedia.org/wiki/Honeytoken) to your user table, and sound the alarm if they leave your db

* Use Row-Level Security

* Database server runs on own box in own network zone

* Send logs via write-only account to machine in different network zone. Monitor logs automatically, and have alerts.

* Pepper your passwords: HMAC them with a key kept in an HSM on the web server, then bcrypt. Don't store the key in the db. https://blog.mozilla.org/webdev/2012/06/08/lets-talk-about-p...

* Use a WAF that looks for SQL injections

* [Use real database authentication, per user. Not one username for everyone connecting to db. Yes, this is bad for connection pooling]
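The pepper bullet above can be sketched in a few lines. PBKDF2 stands in for bcrypt here only because it ships with Python's standard library; the structure (HMAC with a key the database never sees, then a slow salted hash) is the same:

```python
import hashlib
import hmac
import os

# In production the pepper comes from an HSM or env var on the app server,
# never from the database; os.urandom here is just for the demo.
PEPPER = os.urandom(32)

def hash_password(password, salt=None):
    salt = salt or os.urandom(16)
    # Step 1: HMAC the password with the pepper, as the Mozilla post suggests...
    peppered = hmac.new(PEPPER, password.encode(), hashlib.sha256).digest()
    # Step 2: ...then a slow salted hash (PBKDF2 as a stand-in for bcrypt).
    digest = hashlib.pbkdf2_hmac("sha256", peppered, salt, 100_000)
    return salt, digest

def verify(password, salt, expected):
    # Constant-time comparison to avoid timing side channels.
    return hmac.compare_digest(hash_password(password, salt)[1], expected)
```

With this scheme a database dump alone (salt + digest, no pepper) isn't enough to start a dictionary attack.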

Why would you assume that? There are plenty of ways to hack into stuff without sql injections.

It's the most common vulnerability. https://www.owasp.org/index.php/Top_10_2013-A1-Injection

It's the most common vulnerability on the web. It's certainly not the most common vulnerability in projects built under popular non-PHP frameworks. Under that model, it's harder to create a situation where a SQL injection is possible than not.

Edit: Slack's in PHP, I thought it was in RoR for some reason. Oops.

Slack is a web service written in PHP, so I'd say elchief's assumption is reasonable.

Additionally, Slack has had SQLi attacks found against it in the past, which is proof that they aren't defending against it systematically.

I think it depends on the structure of the system. If you separate authentication from profile information, then you can run each on separate systems. They were already using bcrypt, which is a fairly strong salted hashing scheme. As for restricting access: if all access to the database goes through API servers that enforce the limitations you mention, you get a similar level of control without the complexity of managing per-user database logins. With per-user database logins you're subject to whatever system the DBMS uses, and if that system has loose, less fine-grained controls, you can end up even more limited.

The most important thing to stop SQL injection is to validate your parameters on the server side.

Yes, but:

1. Not all SQL statements are parameterizable (dynamic identifiers vs literals)

2. Stopping SQL injection doesn't stop Insecure Direct Object References

3. Developers make mistakes

4. Plugins are a risk (example: http://www.zdnet.com/article/over-1-million-wordpress-websit...)

For parameterization to work you need to be perfect, always. My suggestions are for when someone else fucks up.
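For the record, parameterization itself is straightforward; it's point 1 above that bites. A sketch in Python's sqlite3 (any DB-API driver looks much the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("bob@example.com",))

# Attacker-controlled input is passed as a bound parameter, never spliced
# into the SQL string, so it is treated as a literal value.
user_input = "' OR '1'='1"
rows = conn.execute("SELECT id FROM users WHERE email = ?", (user_input,)).fetchall()
assert rows == []  # the injection string matches no email

# Point 1 in action: identifiers (table/column names) can't be bound as
# parameters, so dynamic identifiers must be whitelisted instead.
sort_column = "email"
assert sort_column in {"id", "email"}  # whitelist check before interpolation
rows = conn.execute(f"SELECT id FROM users ORDER BY {sort_column}").fetchall()
```

The whitelist-then-interpolate pattern for identifiers is exactly where "you need to be perfect, always" comes in.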

I am an application security professional, and I created this account in order to make this post after reading many of the comments on this thread.

Many of the comments have great suggestions. However, very few talk about the most important part of creating mitigations and designing key management/crypto. What is the security target?

Before throwing new designs at a problem, the attackers and attack vectors must be defined. If you don't know who you are guarding against and what they will do (and what data they will steal), then how can you possibly say what is a good mitigation??

One might argue that the threat is obvious, but I'll guarantee you that there are dozens of threats here. List them. Prioritize them. Then mitigate them. It is helpful to fully understand the problem/solution space before jumping in with peppers, salts, extra databases, and other solutions.

It's refreshing to 1) see a breach notification including the actual password hashing algorithm, 2) see they're using a strong one like bcrypt (presumably with a reasonable cost factor).

Regardless, this is an example of why cloud communication (and ticketing and database off-loading [see MongoHQ] and...) systems probably won't ever become commonplace in most of the government space and the finance and health sectors.

I think this just goes to show exactly why these systems will become more commonplace. There are only so many security experts to go around. Having all the very best concentrated on a smaller set of services seems like it makes more sense than trying to get a security expert for every service.

I have a friend who works security, previously for the government, and now for Visa. In his opinion: if you haven't been hacked, you just aren't an interesting enough target for the right people.

I don't know how common this line of thought is in security. But if it is common, then if you're a small company, aren't you better off not hosting your stuff at these large companies, because it puts the information you collect with someone who is more likely to be "interesting" to the right people?

I think a more correct assessment would be:

If you think you haven't been hacked, you probably have (or you are so small that you may have only been probed by bots).

If you haven't actually been hacked yet, it is only a matter of time. Ideally you start designing security layers now, before you are compromised.

As the famous quote goes: there are two types of companies - those who have been hacked and those who will be. Actually, there is a third kind: those that will be hacked again.

You have to look at it from a regulatory and compliance standpoint. While I agree from a technical standpoint that the average company's data is probably going to be safer at Slack than in some internal system, the accountability risk is just too high.

You can't prove your cloud provider is using security best practices, while you theoretically can prove (or disprove) the same internally. Few companies do proper auditing, reviews, and pentests, but they have the capability to do so.

> You can't prove your cloud provider is using security best practices

But you don't have to. They do. Look at Amazon. They publish their security audit each year, and now every company that uses them doesn't have to do their own audit. They can point their auditor at Amazon's report and say "see our datacenter passes".

Also, do you do a security review on your power company, or do you assume they've done it?

I agree, but can't deny that self-hosting means your security lapses see less fanfare, which has value to the biggest and most lumbering risk-averse organizations

not to mention: you can host a chat server in your company network, somewhat protected from random people on the internet, and your ops people should already be securing that from external intrusions anyway.

I'm mixed on this. It's undeniably true if everyone is in the same building but in my experience this is rarely actually true as people need to work from home, use mobile devices, other offices open, partnerships or acquisitions happen, etc. That tends to lead to people requesting holes in firewalls or using VPNs as a sort of aftermarket spray-on security measure, which inevitably makes things much worse because now you have exposed services[1] which were setup and operated by people whose threat-model was “Sheltered internal service accessed by trusted staff”. It's much better to start with the assumption that your services are exposed to the internet and secure them accordingly.

1. VPNs are almost always terrible for security because people tend to get them for one need (i.e. email, one line of business app, etc.) but when their computer gets compromised the attacker now has access to the superset of everything which every VPN user has ever needed to access and in all likelihood a bunch of other things which were never intended to be exposed but were never adequately firewalled internally.

> Regardless, this is an example of why cloud communication (and ticketing and database off-loading [see MongoHQ] and...) systems probably won't ever become commonplace in most of the government space and the finance and health sectors.

I agree. We might not like rolling out our own instances, but it prevents hackers from being able to grab ALL THE DATA in one fell swoop. It really amazes me that some EHR systems have gone the cloud route.

It's heartening to me. I've seen small practices with atrocious IT security. No WAY is self-hosted (for the thousands of small practices with maybe a couple of clueless help-desk types) even a billionth as secure as a professionally secured cloud service.

Also, "cloud" for services like this means "your own private instance of the software running in a private VM in our datacenter" not "your own customer_id in a shared database."

OTOH, if you're small, you are not as interesting a target as a huge cloud provider that hosts everyone. Which means that, while a small practice's security should be good, it doesn't actually have to be as good as the big cloud behemoth's.

It's why your Gmail account is more likely to get hacked than my piddly self-hosted IMAP server. Google's network security is unarguably better than mine, but you are never going to social-engineer your way into changing my password, which is actually doable with Gmail (it happened to my sister-in-law).

Also, it is far easier to harden one or two hosts than an entire farm of different devices.

If you had a vulnerability in an EHR that was run locally at many different hospitals a hacker would still have to target every single hospital that uses it and wade their way through a bunch of different custom configurations. It's not as juicy a target as a cloud-based system where a single vulnerability can get ALL the data of ALL the hospitals EVER in one location. (Like the Anthem hack.) I agree that most locally run systems are more vulnerable than the professional cloud based services. But cloud services are more exposed to attack and are a more profitable target for hackers due to their size.

I think you have to assume that you're going to be hacked if you're a big enough target. You don't know what you don't know about your vulnerabilities. The better question is how you're going to design your data and platform to minimize the damage a major hack can do.

If you're small, cloud may be better, but if you're large it often isn't.

It's true that large will have more resources to do security right, but also they become a bigger target. If a small company self-hosts, they are less likely to be targeted than if they are a customer of a big cloud service where hackers might incidentally steal their data because it's there with thousands of other accounts.

I guess what I'm saying is that regardless of who you are, there is no easily discernible best practice playbook, just a sea of tradeoffs generally made by people with a woefully inadequate grasp of the risks involved. Heck, even the best security people are at a disadvantage in the asymmetrical battle of infosec.

Large does not mean you have better security, as I understand. (see Sony)

Yes, but you at least have the money and resources available to have better security. If you choose to squander those resources and not dedicate a large enough budget to your security department, that's your fault.

Small companies typically can't afford competent and professional security analysts, engineers, penetration testers, and auditors.

bcrypt is only strong if the cost / work factor is set correctly

The default cost for most libraries and languages is between 10 and 12, which is considered too low for 2015 but still pretty good. As long as they're at the default or above it, I wouldn't be too concerned about an attack against the whole DB.

Targeted cracking attempts against specific hashes are definitely still an issue though.

If I set bcrypt cost to 11, hashing takes 0.1 seconds. At 12, it takes 1 second roughly. Setting it to anything higher leaves my service open to Denial-of-Service attacks, so I'm very hesitant to increase the cost factor.

Do you have a credible source for the "10..12 is too low for 2015" claim?

HHVM 3.6 on a small Ubuntu server

You have either a very slow server or a very bad bcrypt implementation. Running bcrypt in python on my 5 year old server has these results:

>>> timeit.timeit("bcrypt.hashpw('this is a password', bcrypt.gensalt(11))", setup="import bcrypt", number=5) / 5


>>> timeit.timeit("bcrypt.hashpw('this is a password', bcrypt.gensalt(12))", setup="import bcrypt", number=5) / 5


>>> timeit.timeit("bcrypt.hashpw('this is a password', bcrypt.gensalt(13))", setup="import bcrypt", number=5) / 5


>>> timeit.timeit("bcrypt.hashpw('this is a password', bcrypt.gensalt(14))", setup="import bcrypt", number=5) / 5


>>> timeit.timeit("bcrypt.hashpw('this is a password', bcrypt.gensalt(15))", setup="import bcrypt", number=5) / 5


That's five repetitions of a bcrypt hash with the work factor passed in bcrypt.gensalt(). The resulting units are seconds.

You are right, my times are apparently somewhat dated. HHVM 3.6 actually gives me 1.88 seconds with costs of 15.

Good thing you made me re-measure :) That makes 13 my new bcrypt default.

Considered too low by...?

Exactly this. If they used ten rounds, it's dire, and just saying "bcrypt" doesn't say much unless you also specify the number of rounds.

It says a lot more than your passwords are safely stored behind unsalted MD5 :)

10 rounds of bcrypt is "dire"?

Yes? 16 rounds take 1ms on my (old) machine. In Python, no less.

10 is obviously the log rounds number. It's not even a power of two! Nor has any implementation of bcrypt even supported such a low number.

How is it "obvious" when the unit is rounds? Ten was an example, round to 16 if you like. The point is still the same, using few rounds is a risk.

Because the set of values that make sense as linear round counts doesn't overlap with the set that makes sense as log base two work factors. Every implementation takes the log number; it's the only number people ever discuss.

And do they call it "rounds"? I've only heard it called work factor.

As a shorthand for work factor? Sure. It may be technically inaccurate, much like talking about centrifugal force, but you'll see "10 rounds" far more frequently than you'll see "1024 rounds". There's another thread on this post that refers to it as rounds as well.

I hate to suggest that your observation is wrong, but 16 rounds should take orders of magnitude more time than 1ms. 16 rounds using Mindrot's Java implementation of BCrypt on my admittedly old 2009-vintage i7 consumes 6.3 seconds to hash a 10-character password.

That's because you're conflating "rounds" with "work factor". "Work factor" is actually 2^rounds, you're using 65536 rounds. Try 4.

Thank you for pointing that out. I suspect many of us in this thread are referring simply to the single parameter to BCrypt.gensalt as the "work factor" or "number of rounds" interchangeably. And you're right, the work factor is what is actually provided to gensalt.
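The terminology confusion is easy to demonstrate: the parameter everyone passes is a log2 cost, so the actual iteration counts grow exponentially. A quick illustration (not tied to any particular bcrypt library):

```python
# bcrypt's cost parameter is logarithmic: the key setup is run 2**cost
# times, so the colloquial "10 rounds" is really 1024 iterations, and the
# "16 rounds" mentioned above is 65536.
for cost in (4, 10, 12, 16):
    print(f"cost={cost} -> {2 ** cost} iterations")
```

This is why a "rounds" number like 10 only makes sense as a log value; 10 linear rounds of bcrypt isn't something any implementation exposes.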

Nevertheless, in all implementations I am aware of, the default for that parameter is 10. And earlier, you wrote:

> If they used ten rounds, it's dire, and just saying "bcrypt" doesn't say much unless you also specify the number of rounds.

tedunangst and I both assumed you were referring to the default 10 work factor of BCrypt and were calling it "rounds" as many of us are doing.

The obvious question that tedunangst is asking (and others in this thread) is whether a work factor of 10 is considered too low.

No, a work factor of 10 is usually fine. I generally use PBKDF2, which takes a parameter for the actual iteration count, and set that to about 20k. But don't think in terms of rounds: work out the most authentications per second you need to handle, time your servers, and pick a parameter that gets you there. 200ms per hash is usually okay for most applications.
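That sizing approach can be sketched with PBKDF2, since its parameter is a linear iteration count (the function name and starting point are mine):

```python
import hashlib
import os
import time

def calibrate(target_seconds=0.2, start_iterations=10_000):
    """Double PBKDF2 iterations until a single hash takes ~target_seconds."""
    salt = os.urandom(16)
    iterations = start_iterations
    while True:
        t0 = time.perf_counter()
        hashlib.pbkdf2_hmac("sha256", b"benchmark password", salt, iterations)
        if time.perf_counter() - t0 >= target_seconds:
            return iterations

        iterations *= 2

# Prints the first iteration count that crosses the time budget on this machine.
print(calibrate())
```

Re-running this periodically (and bumping stored hashes on next login) keeps the parameter tracking hardware speed rather than a fixed "rounds" folklore number.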

If you haven't been paying attention, the US Govt (including the DoD) is moving into the commercial cloud space in a big way.

How does one discover that they were hacked? The post states that the breach occurred during February, and this is the end of March... did it just take them a long time to react and write a post about it, or did they likely discover after the fact? If so, how?

At a place I worked, we discovered it in a couple of ways. One was a routine scan done by our hosting provider, looking for malicious files meant to do things like create a web shell. After they found the malicious files, they had logs to determine the time frame of the attack(s).

And in another instance the hacker emailed us asking for ransom.

That's a really good question and one that probably has a different answer for every breach. In this case it's also probably a question that only Slack could answer for you. In regards to the second half of your question, being that they only recently went public about it, I suspect that they most likely did discover it after the fact.

True story:

Log into server. Why is server slow? Run `top`. Hmm, `./exploit` is consuming 99% CPU...

./exploit ...? Seriously??? LOL

Step 1) Discover a hole in your code, Step 2) go back to logs and see if anyone ever used that hole, Step 3) panic.

Logging every API request through every layer into ElasticSearch/Logstash or something similar for starters.

Usually, if the attacker doesn't dump your info or otherwise blatantly advertise themselves, the FBI tells you.

The wording of the article suggests that they released 2-factor auth -after- the breach happened. This is purely speculation, but one possibility is that they wanted to get their ducks in a row (i.e. have some enhanced security options in place) before announcing the breach. Mitigate the PR damage.

logs -- perhaps they hired a security firm to do periodic audits.

I recommend "The Cuckoo's Egg". It's a great book about tracking down a hacker and explains one of the ways it was done.

