Hacker News new | comments | ask | show | jobs | submit login
Is my developer's home-brew password security right or wrong, and why? (stackexchange.com)
64 points by eranation on Dec 19, 2012 | hide | past | web | favorite | 86 comments

I expected daily-WTF like idiocy, but Dave's code isn't insane. It's a pretty standard combination of a (sufficiently random) salt and a SHA1 based hash. SHA1 was until recently regarded "strong enough". Dave's code is unnecessarily obfuscated, but whatever. I don't see how this can be considered "homebrew password security" given that all the heavy lifting is done by SHA1. Anyway, Dave's solution is as good as a SHA1 based solution gets. That's already better than 95% of the websites out there (citation needed), but of course worse than the bcrypt approach suggested by the OP.

The OP pretends to innocently ask "Is my developer's home-brew password security right or wrong, and why?". I'm going to call BS on this. To me it's pretty clear the OP is trying to passive-aggressively bully/ridicule/shame Dave in order to get his way. And sure enough, the top ranked StackOverflow has somebody going through the code line-by-line going WTF at every opportunity. The OP isn't trying to figure out whether he or Dave is right, he's clearly already convinced he's right and wants the StackOverflow community to collectively pat him on the back. Kind of childish.

To me it's pretty clear the OP is trying to passive-aggressively bully/ridicule/shame Dave...

I had a different interpretation; OP is "Dave." It's sort of a time-honored tradition, on-line and off, to seek advice "for my friend."

Nearly impossible. Every single sentence in the post provides evidence against that theory. (Dave "Insists on using a home-brew script" / use of "Homebrew" vs "Standard solution" / Dave's "primary hang-up" became clear after he "immediately balked" / Ridicule of Dave's code is "sincerely appreciated", etc). It's so over the top that the entire post may just be an elaborate troll.

Nearly impossible. Every single sentence in the post provides evidence against that theory.

On the internet, nobody knows you're a Dave.

That's not a salt, it's just broken. A salt can (and should) change from user to user, but not from minute to minute for the same user. That's why salted hashes have the salt stored as a substring of the hashed result: so that you can recalculate the hash using the same salt that was used the first time. The mistake in the code here is that the "salt" is being hashed along with the password. Oops.

Just to be sure, I ran the code twice, a few seconds apart, using "rob" and "password" for the parameters. Here's the results:

It doesn't work.

Presumably the salt is stored separately in the database.

That would be fair, but I didn't assume so since that wasn't mentioned in the original post.

He does seem to imply that Dave's solution works, and if you read the last line:

$hash = hash_it($password, $crypt);

So it is fairly clear that $hash and $crypt would be stored in the DB. $crypt is a reasonable (but not great) salt:


$user is obviously known, $time is the time when the password was created - or maximum ~2000 possible values for most systems. The main part of the hash is $rand, which is 31 bits of pseudo-random entropy. So about 40 bits of entropy if I'm not mistaken? That should be enough to make it infeasible to create rainbow tables in advance.

EDIT: I'm wrong on $time - it is "mdYHis", so much more than 2000 possible values and is actually pretty decent in the salt, as it's unlikely an attacker would know which user registered when. Unless you publish the registration date as some forums do...

EDIT2: While user is known for each user, it does add extra required input for a rainbow table - if there are 1000 users and you know all of their usernames, you would still need to generate 1000x more rainbow tables (1 per known username) if you wanted to crack all 1000 passwords. Also slows down brute forcing, as you can't use the result of one 'trial' on all users at once. (Correct?)

The real main issue is the speed of brute forcing MD5/SHA1 - it's probably possible to brute force these passwords very quickly if the database is stolen. Use of bcrypt is therefore obviously preferred to slow it down - but it is also very likely that FPGAs can brute force bcrypt just as fast, so I'm not sure how much I agree with the common mantra of 'sha1 is useless, use bcrypt'. bcrypt does, after all, have a speed impact on your login requests and puts more load on your servers if you are handling thousands of logins per second...

I guess large scale FPGAs are probably only in use by NSA and some syndicates, while far more hackers have access to mass GPUs?

There are a few things in your comment here that I think are wrong or irrelevant, but I know very little about this stuff, so I'm reluctant to be adamant about it.

> ...rainbow tables...

That's not really relevant anymore. Rainbow tables aren't used, and haven't been for a while; once you start using salts of any kind, there's no point to rainbow tables.

> Also slows down brute forcing, as you can't use the result of one 'trial' on all users at once. (Correct?)

Correct, but any random salt will do. I think you'd usually be better off reading from /dev/urandom though, but for the purposes here I don't think it matters.

> The real main issue is the speed of brute forcing MD5/SHA1 - it's probably possible to brute force these passwords very quickly if the database is stolen.

Exactly. Ordinarily, what happens is a database gets ripped via an SQL injection or some other thing, and then a list of the usernames & passwords gets posted to a forum somewhere where people take turns brute-forcing them in chunks.

MD5 and SHA1 both give recognizable hash values -- you can glance at either one and usually tell which it is.

So someone with a big list of usernames and known salts and MD5s just proceeds to generate MD5s of a list of common passwords + the salt, and then compares that to the list of stolen hashes, and ends up with a pile of passwords that work.

But, there is a challenge here in GoofHash(): if only the database was stolen, and if the methods used to generate the hashes aren't known, then the hashes would be useless. Attackers could tell at a glance that they had SHA1 values, but even assuming that the "salt" was clearly labeled as such in the stolen database, brute-forcing salt + common passwords wouldn't give them the SHA1 values they're looking for.

So I'm a little circumspect about how bad this approach is. I still wouldn't hesitate to say that it's not as good as using known approaches, but if the end goal is the protection of users' passwords, this might be sort of OK -- as long as nobody knows what they're dealing with. Still though, "security by obscurity" is generally not considered to be a smart approach by people like Schneier.

edit: Oh, I meant to respond to this,

> ...but it is also very likely that FPGAs can brute force bcrypt just as fast...

I don't think so. bcrypt (or, better yet in this regard, scrypt) depend upon stretching -- hashing the hash over and over again. Standard SHA1 doesn't do that, so you're pretty much comparing "one iteration of SHA1 versus thousands of iterations of hashF()"; assuming that hashF() isn't found to be broken in some horrible way, thousands of iterations of that will always be slower, on all hardware, than a single iteration of anything else.

>> So someone with a big list of usernames and known salts and MD5s just proceeds to generate MD5s of a list of common passwords + the salt, and then compares that to the list of stolen hashes, and ends up with a pile of passwords that work.

You mentioned earlier that rainbow tables aren't used anymore - but this is exactly what rainbow tables are. Precomputed lists of passwords + salt. That's why I was talking about the entropy in the salt earlier - the more randomness in the salt, the more infeasible it is that anybody would have your rainbow table. And you're right, /dev/urandom of a fairly large size is perfect, but the crypt($user.$time.$rand) is not all that bad either as it does have $rand in there.

I think you were agreeing with me here, and if so I'm just confirming your confirmation. ;)

>> thousands of iterations of that will always be slower, on all hardware, than a single iteration of anything else.

If your FPGA array can brute force 10 characters of bcrypt in 8 seconds and 10 characters of SHA1 in 1 second - does it really matter? If you have pushed the time to crack down into feasible time, the exact number of seconds becomes irrelevant. The hacker may have to step out for a coffee if you use bcrypt instead of getting the results right away. Hardly more secure?

> ...but this is exactly what rainbow tables are. Precomputed lists of passwords + salt.

Rainbow tables are precomputed lists of hashes: md5('password1'), md5('password2'), etc.

Rainbow tables are not precomputed lists of hashes with salts: md5('password1'.salt), md5('password2'.salt).

At the very least, salts will vary from database to database, so there's no reason to "precompute" anything that's salted. It doesn't gain you anything.

What we're talking about here is different, the common approach to dealing with salted hashes. You have a big list of commonly-used passwords -- "abc123", "logmein", etc. -- and you have a list of hashes from your stolen database, and a list of salts from the same. You feed all that in to a program that just computes the hash using your password list + the known salt.

If the salt varies by user, the attacker is in for a long day, but can still expect to break a lot of passwords if the database is using MD5 or SHA1.

If the salt varies by user, and the database is using bcrypt (or better yet, scrypt), then the attacker is in for a long couple of years. bcrypt with a decent work factor should take in the neighborhood of 1 cpu-second on non-specialized hardware to compute a hash. If you have a database of 15,000 passwords, you're looking at 4 cpu-hours per hash.

Compare that to GoofHash(): on my sorta-decent laptop, GoofHash() runs in 0.035s. That means that I can compare my entire password list against each user in the database in 8.75 minutes.

That's a huge difference.

Pretty good calculations of the kind of expected time: http://security.stackexchange.com/questions/12994/whats-the-...

These are with MD5/SHA1 style hashes, which shows that your password is pretty safe regardless of hash method (as long as it has salt) if it has enough entropy in it. Pretty interesting.

Nice link, thanks for sharing that.

So, this also is a slightly different situation. If you are an end-user, yes, you can protect your individual password against stupid hashing schemes by using passwords with lots of entropy. (I use KeePassX for this.)

However, if you are a service provider, you have to protect your users from the stupid passwords that around 30% of them are likely to use and re-use elsewhere, and you do that by using a smart hashing scheme (like bcrypt or scrypt) that makes brute-forcing the passwords an annoying and time-consuming endeavor.

The goal of a salt is to make it so that each password must be targeted individually. The salt need not be random, only unique and sufficiently long. It is impractical to precompute the hashes for (that is, have rainbow tables for) every possible salt, since rainbow tables take up a huge amount of space and take time to generate. Instead, you have to choose a specific password you want to attack and generate the table just for that password. Somebody is not just going to "have the rainbow table" for the salt G$F#V2.

Your calculations are off by several orders of magnitude. Bcrypt can be customized to take more time, but a reasonable amount of time might be 0.1 seconds per password, or 10 passwords per second. Most users are already logged in, so logging in should not be a very common operation.

By comparison, a $500 graphics card can hash billions of SHA1 passwords per second. So what takes 1 second to brute force with SHA1 will take 100 million times as long, or over 3 years, to brute force using bcrypt.

Note that a 10 character all lowercase password has 26^10 possibilities. If we are just targeting this password it will take 19 hours at 2 billion SHA1 hashes per second (well, on average half of that, so around 10 hours), and over 447000 years at 10 bcrypt hashes per second(again, half that on average).

sha1 has the same features that make md5 a bad password hash: It's very memory and computation-efficient. It's relatively easy to implement in hardware. That makes it susceptible to brute-force attacks since you can test far more passwords than with any other algorithm. That doesn't mean that sha1 is a bad hash, it's just for a different use-case. You can stretch sha1 and make it more cpu-intensive by hashing multiple rounds - then you've invented pbkdf2.

As something of an amateur cryptographer, that is DailyWTF-level idiocy to me. One of the fundamental tenants of cryptography is Kerckhoffs's principle, that we should be able to publish the designs of cryptosystems for public scrutiny and they should hold up to that scrutiny (otherwise known as "security by obscurity is bad")

It seems like we have a few problems with Dave as well: NIH combined with a lack of understanding about security. These together are a great way to produce insecure systems. I don't know how you fix people like Dave and would personally find them frustrating, and perhaps that frustration is leaking out in the post.

It's clear the OP had something of a clue and thought bcrypt would be a good approach. You can call it public ridicule if you want to, but let's give him the benefit of the doubt: perhaps he simply didn't know enough about cryptography (which is very, very hard) and wanted to get a second opinion.

Personally I wouldn't fault anyone for asking advice from a public forum about some homebrew crypto code like this. It's certainly better than a lousy password hash like that going into production in place of bcrypt.

Dave's claims and refusal to change are bad, but that has nothing to do with the code itself. The code is obfuscated sha1-with-lousy-salt. That is nowhere near DailyWTF. It's still relatively secure.

Relatively compared to what? MD5 and SHA1 can be computed very very quickly and can easily be brute forced on GPU clusters.

Compared to PBKDF2, bcrypt, and scrypt, this system is incredibly insecure and susceptible to brute force attacks.

My opinion of this code is it's full of nonsensical voodoo and this guy has no business writing cryptographic code of any kind.

You realize how many places use a fast hash for passwords, right? It is nowhere near DailyWTF levels.

The fact that 95% of sites get it wrong doesn't detract from the fact that this implementation is broken, and trivially susceptible to brute force.

If 95% of people believed in demonic possession, we'd all bemoan how crazy, dangerous, and fucked-up the situation is. But when a developer knows the solution recommended by real live cryptographers and chooses to discard it for his own broken version, we shrug our shoulders because, hey, everyone else gets it wrong too.

Fuck that. Bad code is bad. Bad developer attitudes are worse.

Totally agree. The tone of this question and its answers really bothered me.

     $time = date('mdYHis'); // Probably in the database somewhere too
     $rand = mt_rand().'\n'; // Said to be a 31 bit number
     $crypt = crypt($user.$time.$rand); // DES crypt is 64-bits output
Since "$crypt" is time-varying, he's got to be storing $crypt in the database as the seed along with the password hash. This system is no better than a weakly-generated random 64-bit seed.

It's reasonable to assume the attacker knows the username. If the attacker knows the time the user was created, he can easily brute force the 31 bits of $rand. PHP mt_rand is not seeded with more than 32 bits of entropy, so with knowlege of just two (username, time, seed) entries the attacker learns the values of all other mt_rand numbers (past and future) generated by this PHP process. This may have significant implications for the security of other parts of the system.

    function hash_it($string1, $string2) {
        // Equivalent complexity to:
        return sha1($crypt, md5($password)).

    $hash = hash_it($password, $crypt);
So the resulting hash is 160 bits long, by interposing MD5 it is guaranteed to not contain more than 128 bits of entropy. If nothing else, this is a waste of space.

In my estimation, this scheme is about 0.5 bits stronger than plain SHA-1 like Linkedin was using. 90% the Linkedin passwords have been cracked. http://arstechnica.com/security/2012/12/25-gpu-cluster-crack...

This is as strong as SHA-1 plus a salt, which is much stronger than SHA-1 without a salt(but still not a good way to store passwords). The salt does not need to be random, merely unique. Since salts are stored in plaintext in the database next to the hashed password, the assumption is that an attacker has those as well.

> This is as strong as SHA-1 plus a salt, which is much stronger than SHA-1 without a salt

No, not really. Salts are used for two main reasons:

1. So if two users have the same password, it's not obvious in the password file.

2. To thwart precomputation attacks, e.g., "rainbow tables".

Modern password cracking has all-but obsoleted both of these reasons. For (1), if two users can "choose" the same password, then it's almost certainly a weak password that the attacker can guess as well. For (2), GPUs have gotten so fast now that password crackers don't even bother much with precomputed rainbow tables.

I'm not arguing against salting, it's still good practice. I'm saying it doesn't, in reality, add much time at all to the time needed to crack a database of passwords. It's just that at the end of the day, it all comes down to the entropy content of the user's password as experinced by the attacker.

I have a database of 1 million users. Regardless of what hashing algorithm I am using, if there is no salt I just need to hash the entire password space once, where the password space could be a dictionary attack or a brute force of passwords up to length 10 of letters, digits, and symbols. If there is a salt, it means that after hashing the entire key space I only have the password for 1 user rather than all of them. It means attackers must use targeted attacks and slows them down by a factor of # of users. Rather than having all my passwords in a day, an attacker gets access to one user's password per day (over the course of a million days). (Still not a good thing to happen)

> I have a database of 1 million users.

Similar to Linkedin with 6.5m, we have a real-world case study.

> Regardless of what hashing algorithm I am using, if there is no salt I just need to hash the entire password space once, where the password space could be a dictionary attack or a brute force of passwords up to length 10 of letters, digits, and symbols.

When you say "entire password space" you're implying that there's a hard upper limit on the complexity of a user's password. Many sites do indeed restrict passwords to 8 or 10 characters, but that's horrifically broken.

Any not-completely-broken system has an password space so much larger than an attacker could search that it's effectively infinite. Note that even the silly system proposed here uses MD5 which accepts password longer than would even fit into available memory.

> If there is a salt, it means that after hashing the entire key space I only have the password for 1 user rather than all of them.

As described, the "key space" is effectively infinite, but luckily for the attacker some parts of it are far more richer to mine than others. This is where the easy-to-guess passwords lie.

So the attacker won't do anything as dumb as "hashing the entire key space". The attacker will search the most rewarding parts of the keyspace first.

> It means attackers must use targeted attacks and slows them down by a factor of # of users. Rather than having all my passwords in a day, an attacker gets access to one user's password per day (over the course of a million days). (Still not a good thing to happen)

For example, recent data breaches suggest that 50% of users will choose one of the 10,000 most-common passwords. So the attacker tries this set on all 1M of your users. That's 10,000,000,000 hash operations. A typcial rate for SHA-1 is 3.8 billion per second, per GPU. http://hashcat.net/oclhashcat-lite/

So our expectation is that the easiest 500,000 of your 1M accounts will be cracked in the first 5.2 seconds by a single GPU attacker. And this is approximately what happened with Linkedin, ISTR 60% of the passwords were reported cracked that first day.

But the rate of success slows down for the attacker and the remaining passwords begin to take exponentially longer. As of today, 90% of the Linkedin database is reported to have been cracked: http://arstechnica.com/security/2012/12/25-gpu-cluster-crack... So there are really only 65,000 hashes posing any difficulty at this point.

When a fast hash function is mis-used for password hashing, salt or no salt, there's not a case where "an attacker gets access to one user's password per day". The attacker cracks the majority of passwords in the first few seconds yet a small minority will never be cracked.

Salting doesn't save the database as a whole. Salting doesn't help the users with weak passwords. Salting doesn't even help the few users with very strong passwords, they wouldn't need salt in the first place. But for the users in the 60 - 90 percentile, it might buy them a few hours or days.

> Similar to Linkedin with 6.5m, we have a real-world case study.

Except that Linkedin's passwords weren't salted.

> When you say "entire password space" you're implying that there's a hard upper limit on the complexity of a user's password

That's why I said "where the password space could be a dictionary attack or a brute force of passwords up to length 10 of letters, digits, and symbols". I don't actually mean the whole key space, just whatever the attacker chooses to target.

> So our expectation is that the easiest 500,000 accounts will be cracked in the first 5.2 seconds by an attacker with a single GPU

OK, I concede that point that in this case the salt is not helping so much, at least for users using the 10000 most common passwords. For the few hundred thousand users users whose password lies in a larger key space (but still in the range of bruteforce), the salt still protects them.

Getting back to your original point

> Modern password cracking has all-but obsoleted both of these reasons [for having a salt]

Let's look at a better hash function - bcrypt. In this case hashing passwords is still slow enough (by design - you can slow it down to whatever you want) that rainbow tables would again be useful - to compute the hash of 10000 passwords with a difficulty such that each hash takes 1 cpu-second to compute will take almost 3 hours. Adding salts makes this a per user operation instead of a global one.

> Except that Linkedin's passwords weren't salted.

My point is that salting a fast hash function is like issuing lifevests to the passengers of the Titanic. It sounds better than nothing, but in practice it only just keeps a handful of them from drowning so they can die of hypothermia a few minutes later.

> That's why I said "where the password space could be a dictionary attack or a brute force of passwords up to length 10 of letters, digits, and symbols". I don't actually mean the whole key space, just whatever the attacker chooses to target.

Sounds like you think you get to decide what the attacker will target first and when he will give up.

> Let's look at a better hash function - bcrypt.

Agreed, bcrypt is categorically better. But note that bcrypt includes salting as an intrinsic part of its implementation. So it's not something any web programmer could say "I will/will not salt my Bcrypt thusly".

As tptacek says, "if you're typing the letters A E S or 'salting your hashes', you're doing it wrong."

A meme post, by a StackExchange mod, got upvotes?


Yes, and by the same mod: "Deleted long discussion on preimage attacks, collisions, hashing strength etc – Rory Alsop ♦ 8 hours ago"

Aye - we delete ALL long winded discussion in comments, as Stack Exchange isn't a discussion forum.

That may be the rule, but it seems perverse to delete useful comments, which add to the conversation, while adding meme pics, which add absolutely nothing useful to the conversation.

P.S.: Meme pictures originated on discussion forums.

But it is a meme joke forum?

NB: I thought it was funny.

Wow. Wow. I thought their goal was to create a searchable repository of objective answers, not spam it up with useless bullshit.

In terms of nerd culture, I guess image macros are the new version of endlessly quoting Monty Python, except the internet memes weren't even funny the first time.

That meme post is (currently) the 6th answer down the list. If you managed to read through the first four detailed answers and are still looking for more, then I think a meme is fair enough.

Surprised me too, as I had popped it in there for a joke, mostly :-)

Bcrypt is published. That's its strength, not its weakness. That's how cryptography is. The more people try to break it the better it gets.

Dave's primary problem is that his scheme is based on MD5 which is still well known, but can be brute forced quickly. His additional lightweight obfuscation will provide nearly zero extra protection.

Use Bcrypt with salt. End of story.


Published crypto is stronger because it's had many eyes on it, not weaker. This is pre-101 level crypto knowledge, it's like the first line in the idiot's guide. Or it should be!

I write GPU-based password bruteforcers. The main flaw of this algorithm is that it is not iterated: it makes a single call to the MD5 then SHA1 compression function. As a result, I estimate it can be bruteforced way too fast, around 1 to 1.5 billion password/second on a AMD Radeon HD 7970 ($400).

(On this GPU, MD5 alone can be bruteforced at 8 billion pw/sec, and SHA1 alone at 2 billion pw/sec.)

This one flaw, by itself, makes the algorithm very bad. The developer should have used bcrypt, or scrypt, or SHA2-based Linux crypt. These are iterated by making multiple calls to a compression function to significantly slow down bruteforcing, from billions/sec down to thousands/sec.

The main problem here IMO is that this thing will be used to store passwords for a longer period of time than anyone will spend bothering to break it now. So you store your entire password database this way, then two years from now you get breached, and both your database and your code gets exposed.

Then, you might benefit from nobody wanting to bother to spend the energy to reverse-engineer your one-off GoofHash(). But, then again, maybe not. Maybe the fact that all of the passwords have been effectively hashed using single iterations of fast trivial algorithms like SHA1 and MD5 will mean that a ton of your users' passwords could be calculated in a short period of time.

In that scenario, you're gambling instead of relying on battle-tested approaches. Maybe the gamble will pay off, probably it won't, but when the battle-tested approaches have been examined by a lot of really smart people and found to be solid, why not use them instead?

Or, put another way: you're going to have to play a game of dice against Death. You have a choice between playing with a set of dice that thousands of people have rolled over and over again and found to be fair so far, or a set of dice that some dude just rolled for the first time a minute ago and declared to be fair. Which do you choose?

edit: I am an idiot. If someone gets the code and DB, they won't have to reverse-engineer anything; this is just a fast MD5/SHA1 combo, a bunch of passwords could be brute-forced in a short period of time.

Does anyone know of a decent introductory, authoritative text on cryptography? I haven't needed to know this stuff before beyond "use bcrypt because it's good, MD5 is rubbish", but have recently been given the task of generating license keys for our product. I would like to be able to test that what I've created isn't trivially crackable.

When I worked with people who wrote hardware crypto for a living, the book they recommended was Brush Schneier's Applied Cryptography [1]. It's __excellent__. It's the best technical book I've ever read. It's more readable than the Perl cookbook.

The first third (half?) of the book is devoted to explaining (not with code) the various complex interactions between parties who need to trust one another -- lots of stuff on key exchange, and then only later on the different types of ciphers (block vs stream ciphers). The examples are clear and well-written, and VERY memorable. Bruce explains very well what the pitfalls are in each scenario, and all the ways in which malicious attackers can try to break your trust.

The second half of the book is implementation of most of the algorithms in C.

Other books may cover the topic be better, but I haven't read them. (Sorry.) I like that Applied Cryptography gives a good noob-friendly introduction, and builds from there, yet also has depth and source code.

1: http://www.amazon.com/Applied-Cryptography-Protocols-Algorit...

You're going to have to define "trivially crackable". Sure if you use a public key they can't create a keygenerator. After that, crypto has little use to you.

A user can just open your binary and flip the jump that checks if the license is good or not. Do you really want to start wasting time making your software vastly more complicated, in order to try to lock out crackers? If the software is remotely desirable or common, it'll get cracked.

Someone posted a challenge on Reddit a while back for people to crack his license key generator. He gave them the source code to a test program which would verify keys and gave people a sample key that worked. He seemed to think it was uncrackable.

1. People quickly figured out that the key generator just verified RSA signatures.

2. Since he implemented RSA by hand (I guess calling OpenSSL's RSA_INIT() would be too "obvious") it had bugs. Specifically, it didn't check that the signature in the key was strictly below the key modulus, meaning that given signature X, you could add the modulus to get a new valid signature.

3. Since the license keys can't be ridiculous long pieces of text, the key was not long enough. Someone went to a website online and pasted the key into a Java prime factorization applet and cracked it, posting a key generator.

There is always a tradeoff between security and usability. Unfortunately, the tradeoff here is prohibitive: if you use a 1024 bit key and base 32 encoding, your users will have to type in a minimum of 205 characters to enter the product key. Base 32 is the densest reasonable encoding for human input, or close to it.

So the lesson is that even the simple requirement "can't create a keygenerator" is an open problem.

Well with ECC-192 you could get a 77 character code, right? At least it's a well defined problem space and has known tradeoffs.

Well, when you go to ship an actual product and the product people say "no way, that's too long to even fit on the sticker let alone for any user to type correctly" then it doesn't matter how well-defined your tradeoffs are. It's not a solution to the real problem.

For me, trivially crackable here refers to having someone who has read a basic text picking the license key out of the registry and going "hmm, that looks like a such-and-such key, I wonder what happens when I run it through this program here. Oh look it works, I think I'll just extend the date of the license to forever and tell all my friends."

If someone wants to go and disassemble the app and read all that machine code to find the bit to flip then there's not much I can do about that I think.

Step 1: Define what attack you are trying to prevent.

Are you just trying to keep honest people honest? Then just about anything will work. Are you trying to prevent hard-core hackers from cracking your work? A whole bunch of very good people have been working on that problem for a long time and failed.

It is likely that the most dangerous licensing system you can produce is one that routinely false-positives a real user as a pirate. They will cease to be real users in the future. In general, pirates weren't going to be real users anyhow. (There are some exceptions, mostly in the games world, but in general it seems to be true.)

If you can swing it, the most bang-for-the-buck you can get for protecting your work is to not actually ship all of it; provide some sort of critical element that lives on a server somewhere under your control. This can be tricky in some cases, because you really ought to be able to say with a straight face that there's some sort of value-add resulting from that. And that has availability and growth consequences as well. But it works, it's simple (relatively speaking), easy-to-understand, and extremely difficult to bypass. (Still technically not impossible, in some sense, but you've raised the bar beyond what all but the most dedicated attacker will bother with. Cracking is fun; reimplementing a backend service by examining only the data going in and out, well... I can't say that's not fun, but it's fun for a much more select group of people, most of whom are probably using those skills to make real money somewhere.)

"Read all that machine code" - I think you highly overestimate the difficulty of that.

Take a simple checking function. Check the code, if it is wrong, show a message box saying "wrong code". Well, in that case, all you gotta do is set a breakpoint and go back. Programs like OllyDbg or IDA Pro make this kind of reverse engineering absolutely trivial, and something you can accomplish after "reading a basic text".

Most likely if someone's going to "run a program", they can run a crack, too. If you can get away with a license file, then you can do RSA properly. Otherwise, end up with short RSA and hope no one really looks.

It's almost certainly the case that you're far better off improving some other aspect of your application than investing any time trying to defend against pirates.

Thanks for the pointers to OllyDbg and IDA Pro - I'd never heard of them. Embarassingly, it's been quite a few years since I tried actually hacking a program; especially since I'm posting on "Hacker News".

Here's a tutorial I wrote in 2004 which is still a decent start these days. If anything, it's easier these days, because IDA Pro has HexRays which can decompile code into C and makes it very easy to navigate a compiled module.


To make license keys unforgeable, you have to go public key. Typically some sort of digital signature. This won't make your application uncrackable, but will make it impossible to fake legit licenses.

A good reference on how to get these things right is Schneier et al's 'Cryptography Engineering'. The standard advice line here, however, is "get professional help".

Cryptography Engineering: Design Principles and Practical Applications.

gonna be fun when bcrypt will be "rubbish" and people won't understand that due to simplifications. Certainly is a good idea to inform yourself instead (despite the so called experts that will tell you not to even look at it)

I always liked www.amazon.com/Code-Book-Science-Secrecy-Cryptography/dp/0385495323/ as beginner introduction, before going on to more advanced books. it gives you all the basics, and the basics are imo the most important to get right (that's why they're basics), plus its a fun read

Short of an extremely significant change in the way we do computing, my understanding is that you can just push up the work factor on bcrypt and be fine.

We find logical issues that reduce the strenght of the algorithms every year or so, til now. You're saying that bcrypt doesn't suffer this and you can just add more rounds.

That's sort of like adding more rounds to DES.

Applied Cryptography is pretty much the standard.

Cryptography I by Dan Boneh (Stanford University): https://www.coursera.org/course/crypto

The videos are probably available for download somewhere else, so you don't have to wait. Someone posted a site here on HN that saved all the coursera videos, but I can't remember the name.

Each video lasts ~20 min if I remember correctly, but they are very intensive. I never wanted to watch more than one or two per day, my mind would have blown.

Is there something shorter? I mean something consisting only of 5 pages or so? My current knowledge (for web dev purposes) is this: sha1(md5(pw + salt)) is pretty safe. And I heard that doing the sha1 recursively is better. On the other hand my boss is convinced that is no good.

Of course I did a bit of research (ok, only 30-60 minutes on Google ;)) and didn't find anything presenting a sweet and simple solution.

http://codahale.com/how-to-safely-store-a-password/ or PBKDF2 is also a good solution. the tl;dr version is: sha1(md5(pw + salt)) is too fast to be good, crackers can run millions of attempts per second if they have access to the hash.

TLDR 2; Stop inventing your own cryptography. This doesn't just mean "cryptographic cipher". If you're passing data into cryptographic functions and the parameter names don't conceptually match what you're putting in them, you're probably doing voodoo cryptography.

I did this as well, awesome course. The drop-out rate was astounding from the feb-may run though, from 70K signups there were only 1-2K that completed IIRC.

Wow. I'm even happier to have completed it then!

Agreed, it was a great course. Looking forward to the sequel in early April. https://www.coursera.org/course/crypto2

Thanks fr the reminder, must get involved in that, it's fascinating stuff.

I did it, it's excellent as an introduction. It gives you some base points but no more

Thanks for this - signed up!

When your machine is breached and the list of hashed passwords is stolen along with the code the attackers will be able to brute force at very high speed all of your customers' passwords, then use these to get into the customers' gmail/yahoo/hotmail/paypal accounts (because they used the same or similar password) and from there do real financial damage to your customers. And it will all be your fault.

The point of bcrypt is to make it so they can't brute force at high speed. (If you choose your number well enough.)

The published algorithms are prune to dictionary attacks. If you can have even a not too good cryto-algorithms, whose logic is kept really secret you are much secure.

Let's say a hacker, some how got into some non-secure services with good amount of user. And found out the username/password are not encrypted or hashed at all. He can use that list a dictionary. He can also have MD5, SHA1 hashed calculated. The same hacker got it into your secured service, who already have a dictionary to try on. He can get password from MD5 or SHA1 or any known hash, in O(1) using hash-table lookup. Normally, hackers now-a-days have the dictionary to try on, it is just the SALT and hash-algorithm is unknown to them. So, the security measures can be:

1. Protect password Hashes from theft. 2. Protect salt from theft. 3. Protect hash function from theft.

Try to protect them all. All of them can stolen, but probability of 3 theft is less than 2 theft. I agree with Dave, only if he has plan to keep hash function secret. I should not shipped inside java script, so hacker don't need to do any effort.

This is more-or-less totally incorrect. The only thing you got right was 'use salt to help against precomputed tables'.

Cooking up goofy hashes doesn't give you any extra security in the worst case scenario (the hacker gets access to your web server and has the source code for your goofy hash). A goofy sha1/md5 based algorithm with salt can still have millions upon millions of attacks generated per second by anyone with a half-decent GPU. Your only hope once you're in this scenario is to use a big slow hash that the attacker can't run very fast -- y'know, something like bcrypt.

And if you're right that they're more likely to get the db without the source (very dubious -- web servers are a big attack surface)? Then the attacker is still completely screwed trying to break your too slow to attack bcrypt+salt passwords. You've got much better safety in every scenario, not just the "well they didn't manage to see how my goofy hash worked" scenario.

(And that's assuming you, as a non cryptographer, didn't so thoroughly screw up your implementation of goofy hash that it ended up with less entropy than even md5. But how would you know when you've kept it super-secret for l33t security?)

For a short password, does it help to first calculate a hash using MD5 or SHA1 and then run it through Bcrypt? As far as I have understood it's better that the password to encrypt is long enough which the first simple hashing takes care of, or am I wrong?

My gut tells me no. If your password process is to take a string, get the MD5, then call bctypt on the MD5, then you are essentially doing the same as calling bcrypt on the string.

If the USER takes a short string, and MD5's it then sets the MD5 as their password then it is more secure, but that is only because it is a longer password to begin with.

If I'm wrong I'd be interested in knowing why.

Doesn't help nor hurt. (If for one reason or another bcrypt would be unsafe for short input strings then every implementation would already automatically apply the needed preprocessing step)

Re-implementing crypto functions is generally a terrible idea, unless you really know what you are doing (and most likely you don't). There are so many different relevant factors to consider, that single individuals or organizations are really unlikely to be able to cover reasonably close to completely. That's why standards and "well known" algorithms are a good choice.

Also, "security through obscurity" is well known as a worst practice for crypto (probability that the algorithm will be reverse engineered or disclosed is all but negligible), so definitely not a good reason to implement any new algorithm.

This is actually quite tame - I've encountered developers who justified writing their own web server (in VB6!) because it would be more secure than "standard" web servers - this was for quite sensitive commercial data...

The md5 and shuffling of the md5 hash before doing a sha1 is entirely pointless.

The salt generation looks acceptable though.

The op stated that he already created a system using bcrypt. The fact that op doesn't know what is wrong with Dave's code, and needs others to explain why Dave's code is bad, makes me lean towards op stfu and allow Dave to continue.

Op is unlikely to be any more competent compared to the somewhat incompetent Dave.

I agree that it is likely that op and Dave are one and the same, and think this mess of code is some sort of genius.

The only instance in which homebrew password security isn't wrong is if you are actively involved in research and experimentation, destined for publishing and peer review, into better security methods.

We don't publish security-related algorithms to make them stronger. We do it to weed out the weak, and it is remarkably effective at doing that. If you're not confident that your algorithm could survive in the open, then chances are it can't.

Dave's mentality about having hashing that is unpublished is also known as "security through obscurity". Relying on the fact that nobody can see your source code. That's generally known to be a poor strategy.

Ironically, if he would just substitute bcrypt for sha1 in his function, it would be a great crypt function!

This is silly, true, but "you are not a cryptographer, use bcrypt" is not helpful. You could design an equally silly password hashing scheme using pbkdf2 and bcrypt, the recommended alternatives:

    persist( sillysalt($user)+":"+$outer_pbkdf2_salt+":"+
A scheme like this is designed to thwart offline attacks on the password verifier (for example, an attacker acquires a dump of your database).

His user+date+mersenne salt accomplishes the goal of (mildly) increasing the cost of precomputed attacks. He should use a CSRNG instead.

He uses existing cryptographic hash algorithms, SHA1 and MD5.

His shuffling does not add or detract from the security of the scheme.

He does not perform any significant key strengthening. Weak (most) passwords will be very cheap to brute force.

> His user+date+mersenne salt accomplishes the goal of (mildly) increasing the cost of precomputed attacks.

Most password crackers these days are using GPUs and don't even bother with precomputed rainbow tables. So the salt is not increasing their cost at all.

You didn't actually disagree with what I wrote. What you wrote is also true.

Can you, please, explain why passwords will be cheap to brute force? He is bcrypting them and the attacker will have to bcrypt them as well, or not? How should it be done (in the context of your example) done properly?

Sorry that was not clear. I was referring to his SHA1/MD5 version.

My version is not cheap to brute force, but it is silly.

tl;dr - don't break the first rule of cryptography (http://meta.security.stackexchange.com/a/915/12776)

> "Anyone can invent an encryption algorithm they themselves can't break; it's much harder to invent one that no one else can break" - Bruce Schneier

A primary tenet of cryptography is to assume that the mechanism of an encryption is known. This is a case of someone, honestly, not being educated about the topic and therefore I would definitely not want someone like that making security decisions.

Applications are open for YC Summer 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact