How to Safely Store Your Users' Passwords in 2016 (paragonie.com)
479 points by antitamper 583 days ago | 301 comments



I called my bank the other day and they asked over the phone for my password. This isn't a bank I use often; I currently only have a loan through them, so I've never used the login on the website. I said I didn't remember setting a password. They gave me a hint about the characters in the password and I was able to remember it based on their hint. I read the password out character by character and they confirmed it. This is an example of how not to handle passwords in 2016.


Same thing happened to me: my local credit union emailed me my password. They assured me that they use "bank-level encryption". Of course I didn't get into the difference between one- and two-way encryption with the teller, or the fact that email isn't secure.

We live in an age where this should be unacceptable. Why aren't there financial security laws yet?


It's so infuriating because I've worked at companies doing digital commerce before, and PCI compliance and certification are quite onerous. But at the end of the day credit card numbers still aren't as sensitive as bank logins, yet there are no security standards on bank logins! It's crazy.


PCI compliance is pretty interesting. It's been many years since I worked in an e-commerce shop, but I seem to remember that it even described physical security layers, e.g., dictating the placement of door hinges on server-room doors.


Goes to show that when money is at stake for the stakeholders, things get done.

I don't think banks actually care about individual user login security too much. Credit Cards reallllly suffer from security breaches though


> I don't think banks actually care about individual user login security too much. Credit Cards reallllly suffer from security breaches though

But the only reason that banks care so much about credit-card security breaches is that the law forces them to do so. If the law didn't make credit card fraud the bank's responsibility, then they'd be just as lackluster about preventing it as they currently are about securing login credentials.


I don't know if that's true. Say we lived in a world where those regulations didn't exist and the fraud risk was on par with what it is today. If one bank introduced their own fraud protection program, wouldn't they basically capture 90% of the market overnight?


This is a good argument, but I'm not sure that I buy it. There are plenty of industries (I think of the cable and cell-phone industries, but I'm sure there are others) where it's just a given that the service will be crappy, even though one company that started to respect its customers could seemingly corner the market. (I'm not sure what the results of T-Mobile's exercises in respecting its customers have been. Their unfortunate net-neutrality stance with Binge On means that going with them isn't an unalloyed win.)


I get what you're saying, but credit cards are unique: the advantage credit-card companies have over those other industries is that there is essentially no lock-in, and your old cards continue to work while you're in the process of switching things like your autopays over. It's a very switch-friendly industry.

Remember when you couldn't take your cell phone number with you and so pretty much nobody switched carriers? It was a massive pain. Now it's easier than ever to switch, except most people are locked into multi-year contracts. Switching friction = high, but not impossible. As you said, TMO is trying to compete here.

Cable has monopolies on towns, so there's 0 incentive. People couldn't switch even if they wanted to. I suppose there's satellite, but you'll still be paying the cable company for internet -- they get their pound of flesh no matter what. Switching friction = impossible.


These are good points. My remark was off the cuff; it's clear that you've thought (or at least are thinking) about it much more deeply.


What law are you talking about? PCI-DSS is required by the card companies and run by an organization called the "Payment Card Industry Security Standards Council". It's essentially self-governed. It's not federal law.


I had always assumed that it was a matter of law, rather than self governance. I stand corrected.


Well, there's your problem. When dealing with passwords you don't want encryption. You want hashing.


Hashing is considered one-way encryption, no?


No, it's one-way cryptography, but it's not a form of encryption.

https://paragonie.com/blog/2015/08/you-wouldnt-base64-a-pass...


I hear credit unions are not regulated as well as banks, so even if there are laws for banks, the credit unions don't necessarily have to comply. Credit union security will probably always lag behind banks. :(


I don't think that is true. Source?

Furthermore, credit unions can't make risky bets that could put them under. The money you deposit goes out as loans to other people.

I avoid banks like the plague. Too shady for me, never again.


I once had to call my bank a few years ago to disable my debit card. The reason was that I inserted it in an ATM and the ATM had chosen that exact moment to malfunction and shut down!

The operator on the line asked me a ton of questions starting from my user-id (but not password), date of birth, full name, father's name, place of birth, type of account and many others that I don't remember now. Only after I correctly answered all these questions, did he start acting on my instructions.


My bank always asks me for my birthday, my address, and the date and amount of the last payment I received. The first two details can be easy to find (in my country there is a website that shows this for most people, who don't know how to stop it), and the third can be easy if you stalk me for a day or two.

But if I tell them not to make it that easy, I can't manage my shit over the phone anymore :/


I have a telephonic pin for my account that I use.


Simpler and much more reasonable.


I also hate the stupid security questions used to identify you which they always claim "add security". In almost all cases they decrease security.

Where did you spend your honeymoon? What was the name of your first pet? What is the name of the street where you grew up?

For any given person, a LOT of people know the answer to these kind of questions.

Also, I hate it when people use date of birth to verify identity. Medical people love doing this. Um, just check the person's Facebook and see when everyone wishes them a happy birthday, then go access their medical records?


"Medical people love doing this"

Tangential to the actual issue, but in that field it's to prevent patient mixups, not to defend against malicious attackers.


I once saw a report about two people with the same name, same birthday, and same birthplace. It really fucks up a lot of administrative databases.


I loathe security questions too, but those aren't even that bad! My credit union's default security question is "What was your first musical instrument?" I wonder how many guesses you get?? I always answer these questions with a long string of random characters.


LOL, check their Facebook pictures and see if they are playing an instrument in any of them. Most people don't play more than 1 instrument, so it's probably their first.


How many common "first instruments" are there for kids? Maybe 5? There's a pretty good chance you'll get it right without doing any research whatsoever.

It's not quite as bad as asking "what species was your first pet?" but not much better.

This is the real problem with security questions; the answer space is often so very small, and can be narrowed down even further with a little research.

Questions like "what was the first name of (your maternal grandmother, your first best friend, etc.)" are very common -- well, there are stats on most popular first names of given generations in different places. If you know what country the person is in, you can make a good guess at these.


It's not even that tough. Just guess "piano" and you'll be right 40% of the time. If you get a second guess try "violin".


I'm not saying I disagree, but how would you verify identity over the phone?


By allowing people to set their own questions and answers instead of a) using a pre-defined list of questions b) forming questions from information about the customer that friends/acquaintances usually know or information that can be found via a Google search.


You can still have the question. But my answer is a random 32 character string of alphanumerics :)
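
If you want to mint such an answer, here's a minimal sketch using Python's standard-library secrets module (the 32-character alphanumeric format is just my habit):

    import secrets
    import string

    # A random 32-character alphanumeric "answer" for the security question
    alphabet = string.ascii_letters + string.digits
    answer = "".join(secrets.choice(alphabet) for _ in range(32))
    print(answer)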


I algorithmically generate the answers to the security questions with:

     answer = PBKDF2(hmacsha1, password + question, "", 100000, 16)
This is also incidentally the basis for how I generate unique passwords for every service except banks, communication, and other sensitive things. I want a different password on every website and don't want to trust any password-remembering software I didn't write. The same function works fine for generating answers to secret questions.
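
For the curious, a runnable version of that formula using only the Python standard library (the function name and hex output encoding are my choices, not part of the scheme):

    import hashlib

    def security_answer(password: str, question: str) -> str:
        # PBKDF2(hmacsha1, password + question, "", 100000, 16), as above
        dk = hashlib.pbkdf2_hmac(
            "sha1",
            (password + question).encode("utf-8"),
            b"",        # empty salt, per the formula
            100_000,    # iterations
            dklen=16,   # 16-byte derived key
        )
        return dk.hex()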


This is morally equivalent to using a password manager to encrypt your passwords with a "master password". :)


Not really. I don't oppose using a master password, which I don't use anywhere directly or store on disk anywhere. I just don't want to trust closed-source code to manage passwords, and want to be able to generate the password to anything from anywhere without having to carry around an encrypted table of stored passwords. In this case, I implement it myself, with the help of some common open-source Python libraries.


Have a look at pass [1]; it's a minimalist tool in Bash that is so simple you can easily make adjustments to it yourself. The codebase is very small, so it is easy to audit. The principle is that your passwords are encrypted with your public key. You can then use git to keep copies of your encrypted passwords on many devices.

[1] - https://www.passwordstore.org/


Thanks! This is interesting.


I did say "morally equivalent" rather than "technologically equivalent".

By that, I mean the overall security of your password scheme is analogous to what people get out of a password manager.


Password managers are more secure. Here you just need the master password; with password managers you need the master password and the database file.

Still, a lot better than password re-use.


> I just don't want to trust closed-source code to manage passwords

KeePass? It's great, and open source.


Name and shame.


[deleted]


I don't think you should out them publicly. Are you interested in having them improve their security model [1], or do you want what they have done out in the open so that others can try to social-engineer them, and their customers might suffer the consequences?

[1] If so, tell them about this in a way that will get their attention without causing their customers or them any harm.


You're correct. I'll contact them.


Always have a reasonable response deadline. For a security issue, anything over one month is unreasonable for a small deployment, over three months for a huge deployment, in my opinion.


Bear in mind that giving the password over the phone has a different threat model to sending the password over a TLS-secured connection from your browser to a bank-run web server. Specifically there is a human in the call centre who is transcribing what you say. Using a partial password (give me letters X, Y and Z) is a way of mitigating the risk of call centre staff being able to harvest meaningful amounts of security credentials. This does mean that you need to be able to check subsets of the characters in the password, which rules out hashing the whole password in this case.


> This does mean that you need to be able to check subsets of the characters in the password, which rules out hashing the whole password in this case.

As you implicitly point out, however, it doesn't require any portion of the password ever to be visible to the call-centre employee; one can just supplement an individual hash by a collection of hashes of appropriate character subsets, and then (say) randomly pick among the available subsets.


Note, though, that this means you have a collection of hashes for the password that are each 3 characters (or whatever) long, which can be brute-forced in essentially no time at all. Crack 'em all, lay them out according to which letters each hash is for, put it all together, and you are done. Even just having one subset reduces your password's security by that many letters, if not more (you can filter out dictionary guesses that don't match those letters, and such).


Good point. Would salting them obviate the problem?


Nope. Salt is good for eliminating rainbow tables and similar vectors. At this level, let's just say you go after uppercase, lowercase, digits and 20 different symbols (a total of 82 letters): you'll wind up with 551,368 possible combinations. The only way to make it "safe" would be to use a hash method that takes multiple seconds to run on good hardware. (As a comparison, I crack raw NTLM at something on the order of billions of hashes per second.)
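
To put numbers on it, a rough sketch (hypothetical 82-character alphabet; plain SHA-256 stands in for whatever fast hash is under attack):

    import hashlib
    import string
    from itertools import product

    # 26 upper + 26 lower + 10 digits + 20 symbols = 82 characters
    alphabet = string.ascii_letters + string.digits + "!@#$%^&*()-_=+[]{};:"
    print(len(alphabet) ** 3)  # 551368 possible 3-character subsets

    # Brute-forcing one 3-character subset hash takes on the order of a second
    target = hashlib.sha256(b"aB3").hexdigest()
    for guess in map("".join, product(alphabet, repeat=3)):
        if hashlib.sha256(guess.encode()).hexdigest() == target:
            print("cracked:", guess)
            break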


Back in 2009-2010, I worked as a web developer on an e-commerce site. A month or so in, I discovered that they kept all of the user passwords unencrypted in a database.

I went to my boss and explained that we can't do that; it's inviting exploitation. He responded that we had to keep them in plain text in the database so that we could send them to users who forgot them. If they can't log in, they won't order product.

I have heard similar stories from other IT professionals. It's amazing that these operations aren't getting pwn3d twice a week.


Unhashed passwords don't get you pwned. They only become a problem after you've been pwned.


I experienced this too many times with UK/US providers (e.g. Hostgator and Fasthosts). They often ask for server root password or email/password combo for the client portal, even after you verify account ownership.

Seems that they don't have any other way to grant the technicians access to the server/my account, which is absolutely ridiculous.


I don't know much about what sort of security compliance banks must implement, but surely that's in violation of something? What country?


United States


Plaintextoffenders!


It's a bad idea to use bcrypt.hashSync in Node.js. I hate that so many tutorials use that one instead of the correct bcrypt.hash with a callback. This is Node.js: for the ~200 ms you're hashing that password, nothing else runs; no requests, everything stops. Here is the correct way to use bcrypt in Node.js:

    bcrypt.genSalt(10, function(err, salt) {
      if (err) return; // handle the error
      bcrypt.hash(clearPassword, salt, function(err, hash) {
        if (err) return; // handle the error
        // Store hash in your password DB.
      });
    });


Note that bcrypt.hash just calls bcrypt.hashSync anyway[0], so it's still going to stall exactly the same amount.

edit: Looks like it depends on which version you use.

'npm install bcrypt'[1] gives a version which supports true async usage (via V8 async callbacks in native code)

'npm install bcrypt-nodejs'[2] gives a pure JS version, which is the one I linked to, and which is the top search result for 'nodejs bcrypt'[3].

[0] - https://github.com/shaneGirish/bcrypt-nodejs/blob/master/bCr...

[1] - https://www.npmjs.com/package/bcrypt

[2] - https://www.npmjs.com/package/bcrypt-nodejs

[3] - https://www.google.com/search?q=nodejs+bcrypt


And 'npm install bcrypt' is the package linked to from the article.


The project from the article is https://github.com/ncb000gt/node.bcrypt.js

There, the hashing is handled asynchronously; the event loop is freed while the calculations take place in a native extension.


BUT NODE.JS IS NON BLOCKING?!?@?!?!?!

Kidding.


Node is only non-blocking for IO-bound work; this is CPU-bound. I guess they could make a version that yields and resumes several times, though!


lol at pure js KDF's


Wow, that's some serious downvotes. Why? A KDF that takes 1 second on a GPU will take upwards of a minute in JavaScript...

...and conversely, a KDF tuned to take 1 second in JavaScript will be trivial to brute-force. There is no point in it.


I just measured it, and native bcrypt seems to be about 5× as fast as bcryptjs. As long as the time for one hash is more than 0.0001 s, you’re doing pretty well as far as password storage goes, so a pure JavaScript bcrypt is bad mostly because it’s blocking on Node and not because it’s too slow to be useful.

Please don’t pull numbers out of nowhere.


When I was figuring out password security I did some tests. On my phone (a Galaxy S3) I got about a 40x slowdown of PBKDF2 versus an x86 server running a compiled version (C, I assume). I suppose running the JS version on a server might give a better runtime, but I'd think the point of a JS KDF is to run on the client, not the server.

edit: and I'd assume a GPU version would be faster.....


The speeds you were seeing might have had less to do with the fact that it was JS and were more about the mobile device being much slower than the server, then. This isn’t really avoidable, but a 40x slowdown isn’t the end of the world for bcrypt.

Important to note that the only advantage of hashing or prehashing passwords on the client is offloading work from the server. It doesn’t improve security (on a website).

Re: GPUs:

- One of bcrypt’s advantages is that its memory requirements make using a GPU provide much smaller returns compared to, say, iterated MD5: you can’t parallelize it as much.

- The point of using GPUs to generate password hashes is generally only to break them, because GPUs can do lots of hashes in parallel. Your web service probably doesn’t need to generate that many hashes at the same time, and it probably isn’t otherwise running on a server with a computey GPU. Clients’ GPUs? Even less parallel, they’d generate on the order of one hash per year.


> but I'd think the point of a js KDF is to run on the client, not the server.

bcrypt is used on the server (node.js) to hash a user's password before storing it in the database. Then later, when a login happens, the password from the login is checked against the stored hash to see if they match (and by implication, that the original passwords are the same). So the usage is mostly on the server, as expecting a client such as a mobile device to do large rounds of CPU-intensive work just to log in isn't going to make for happy customers.
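
In rough Python terms (the PyPI bcrypt package stands in here; the flow is the same in Node):

    import bcrypt

    # Registration: hash once, store only the hash
    stored = bcrypt.hashpw(b"hunter2", bcrypt.gensalt())

    # Login: check the submitted password against the stored hash
    assert bcrypt.checkpw(b"hunter2", stored)
    assert not bcrypt.checkpw(b"wrong guess", stored)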


Right, but if you're running bcrypt on a server, why run a pure JavaScript version? Why not the C-compiled version?

So to me the logic behind a pure JS version is that it can run on the client too. That actually sounds interesting, as it could improve password security! But it's just too slow.

In my case, I ended up SHA-512'ing the password client-side and using that as the "password" sent to the server. If there were a native KDF for all browsers, I'd use that instead.


In our case, we use the pure-JS version during development.

Some Windows developers have issues compiling native modules, so it was easier to try the native module and, if that fails, fall back to the pure-JS version with an identical API during development. That way we get the ease of development of pure JS and the production speed of the native module.

There is also the ability to run client-side (mesh networks with WebRTC), and the ability to run on architectures other than x86 where the native version won't compile.


Where are you getting those numbers from?


My own testing. Maybe not a 1-second-to-1-minute factor, though. See my other reply in this thread from a few seconds ago.



More specifically, any functions that end in Sync should be avoided.

Unfortunately a lot of people think that these functions are a simple way to "not have to deal with callbacks", which is not correct...


It's not correct?

Is there any reason not to use the Sync functions if you're not writing a web server and don't particularly care for performance?


It is indeed not correct. They are not functionally equivalent to their asynchronous equivalents at all - your entire event loop will be blocked for the full duration of the operation.

This may sound like "not a big deal", but even an operation that takes 1ms will cause noticeable lag at ~50 operations per second, and block everything approaching a thousand.

For emphasis: they block the entire event loop. Not a single request, not a single client, not a single operation or event. The entire application for every user everywhere.


If it's your own thing, do whatever you want. But the only thing the sync functions should be used for is exec'ing shell scripts in node.js. Sync functions are bad otherwise in any kind of production environment.



I forget my passwords all the time, so clicking on "forgot password" is already my primary way to log in on many sites I don't use often.


That works as long as the website has this functionality. One day you'll miss it... and end up writing to Bram Moolenaar, explaining why you forgot your password (http://www.vim.org/account/forgot_password.php)


    "explain why you forgot your password"
I don't know why but that made me laugh.


The idea that this is even remotely acceptable in 2016 is astounding.


> Why is it telling me I created a new account instead of logging me into my existing account?

> Email addresses on Medium are case-sensitive.

Wow. Is there ever any possible benefit to this? (Serious question)


The local mailbox portion (bit before the "@") of email addresses is case sensitive.

In practice, most email providers don't actually honor that, and so in practice it's a bad move. It is technically correct though.


Hmmm...right you are. Thanks.

I guess I never thought to look that up. Likely due to the common way being so pervasive.

> The local-part of a mailbox MUST BE treated as case sensitive. [0]

[0] https://www.ietf.org/rfc/rfc2821.txt


As for handling this in the real world, the approach I like best is to store email addresses the way they are entered, but create an index on the lowercase version for lookup and uniqueness checks.
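
A minimal demonstration with Python's sqlite3 (table and column names are made up; needs a SQLite new enough for expression indexes):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
    # Uniqueness and lookups go through lower(email); storage keeps the original casing
    db.execute("CREATE UNIQUE INDEX users_email_ci ON users (lower(email))")

    db.execute("INSERT INTO users (email) VALUES (?)", ("Alice@Example.com",))
    row = db.execute(
        "SELECT email FROM users WHERE lower(email) = lower(?)",
        ("alice@example.COM",),
    ).fetchone()
    print(row)  # ('Alice@Example.com',)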


A lot of people don't really know their own email address, then. When based on their name, they might write both with capitalization and without depending on mood...

We lower-case all e-mails in our systems. I'll let you know when we hit a system that actually treats it case-sensitive.


Note that the local mailbox portion MUST be treated as case-sensitive in the context of the SMTP protocol and such, but it is up to the mail host to actually define the semantics of the local part. Case-insensitive mailboxes are not prohibited anywhere in the spec; what you are referring to is there to prevent the local portion of the address from being changed when passed through a relay.

"the local-part MUST be interpreted and assigned semantics only by the host specified in the domain part of the address."


The only problem I see with that is when people change email addresses (maybe they were using their work email and change jobs).


That's an interesting point indeed. When you are fired from your job, you often don't have time to log in to all the accounts tied to your email address to change it. I just checked their FAQ[1], but this does not seem to be covered. Does anyone have a clue?

[1] at the bottom of this page: https://medium.com/the-story/signing-in-to-medium-by-email-a...


I asked Medium about this. They replied, "if you don't have your Twitter/Facebook account connected to your Medium account, you won't be able to sign in at all. You'll need to contact us to regain access to your account. If you don't want this to happen, please do not use an email address you think you won't be using in the future."


I can't figure out what Medium's process is when they are contacted to regain access to an account.

In the near future, we could have an "Argh, I lost my email address" link replacing the now very common "I lost my password". Nevertheless, I have no clue what a standard process to recover an account after losing an email address would look like; maybe an SMS-based process plus personal questions like "what was your first pet's name?"


Maybe this isn't practical for some people, but I think it's bad practice to use your work credentials for something you plan to "take with you" when you leave. Best to keep a strict separation of work and personal credentials.


Yes and no. Yes, obviously, for your bank account or anything that is very personal.

But as an IT manager, you could require people at your company to use their work email address for accounts that are directly linked to company activity. For instance, we use an online tracker[1] for use-case management. When someone leaves the company, they no longer have access to their email account; consequently, they would no longer have access to our tracker. This can be a rather convenient scheme.

[1] Pivotal Tracker (which requires a password to connect).


Don't use your work/school/ISP/whatever email. Period.


What about occasional delays in receiving the login email?


This is really awesome! I am going to use it for my next projects...


So there's no password option? That's horrible; it means you can't use it in incognito mode (because you'd have to receive an email each time).


You can use the link in incognito. Why wouldn't you?


Because every time you open the site you'd have to receive an email, vs. just letting 1Password, the browser, etc. enter the password.


> That's horrible, means you can't use it with incognito mode

No it doesn't.


This is how Craigslist has done it for more than a decade. You can "make an account" with a username/password if you want, but it isn't needed.

No issues with incognito mode.


Serious question: What about using a Public/Private key encryption to store the password?

- Private key is stored in a secure place. Offline for all I care; printed on a piece of paper; memorized and swallowed.

- When user creates the account - password is padded with salt, then a public key is used to encrypt it. The resulting encrypted form is stored, along with the salt.

- When user attempts to authenticate - the password that is provided is padded with the stored salt, encrypted with the public key and compared to the stored password.

Private key is never used when comparing passwords. Never available to the system doing authentication, etc.

The only purpose of using a reversible encryption - is to be able to switch to a different authentication provider completely transparently to the user.

I've implemented this functionality anticipating that we may need to switch over to Active Directory (or some other directory) in place of storing passwords in the database, but I never used it, fearful that I would be committing some cardinal crime against proper security practices.

Thoughts?


Where do you store the public key?

If you store it on the server alongside the encrypted password and salt, you're effectively using a bigger salt with a weird hashing algorithm that hasn't been vetted by any expert for the specific purpose of password storage.

If you store it on the client side, the user needs to supply the public key whenever he tries to authenticate. This has the effect of making the salt much bigger, but you don't need to use public keys to achieve the same effect. If you can trust the user to store a public key, you can trust him to store any large chunk of random bits.

So apart from the virtually useless fact that you might be able to decrypt the password with the private key (why would you even want to do that, instead of just resetting it?) your method seems to offer no real benefit compared to simply using larger salts with argon2/scrypt/bcrypt/whatever.


I'm not sure I understand your reply at the moment, but I will definitely try to.

We are using RSA. The public key is on the server. We are simply using RSA as a "hashing algo". A randomly generated salt is added to the password sent by the user to make the password (much) longer, as well as to strengthen it, before the encryption takes place.

I have explained the reasons for storing the passwords (for the time being, hopefully temporarily) in other replies in this comment branch:

quote:

There's no argument from me regarding undesirability of keeping the passwords. But we also have a different concern - we don't want to be in the business of authentication at all - our goal is to have a third party service or appliance performing authentication and forwarding us already authenticated sessions with the user name as a header, for instance. Our desired end-state - is when we do not have any knowledge of or access to credentials used by the end-user.

To enable us to make this transition possible - for the time being we store the passwords, since as i explained in another reply, we must make this transition transparent to the users and we have legitimate users that login once a year (think credit report, just an example, not our business), so we can't intercept the credentials within a reasonable time period.

/quote


The more standard way to migrate passwords/authentication schemes is to do it on login. So if you want to switch to scrypt from some other format, wait for users to log in, check their password via the old method, then rehash the password with the new method, and store in the database which method was used.

If you're switching to a non-password based system like OAuth, make users link their accounts after logging in with their password. If you are going to completely drop the password, then send each user an identifiable link where they can link their account.
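
The rehash-on-login flow, roughly, in Python (the user model and the legacy SHA-256 scheme here are hypothetical; assumes the PyPI bcrypt package):

    import hashlib
    import bcrypt

    def verify_and_upgrade(user, password: str) -> bool:
        pw = password.encode("utf-8")
        if user.hash_scheme == "legacy-sha256":
            if hashlib.sha256(pw).hexdigest() != user.password_hash:
                return False
            # Correct password, and we hold the plaintext right now: rehash it
            user.password_hash = bcrypt.hashpw(pw, bcrypt.gensalt()).decode()
            user.hash_scheme = "bcrypt"
            user.save()
            return True
        return bcrypt.checkpw(pw, user.password_hash.encode())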


Thank you for your reply. Interesting that you mention intercepting users' passwords.

Our system already went through a similar transition years ago - when we had to migrate away from NDS (Novell), since we abandoned that technology.

In NDS we did not have access to hashes. We ended up having to "MITM" the passwords in our web app, intercepting them before they were sent to NDS and, after a successful login, storing them in the database. Transitioning our entire user base took a very long time (more than a year), since we had no way to compel users to log in and had to wait until they logged in of their own volition.

Ever since, we've been reluctant to lose the ability to recover the passwords. A transition to Active Directory has been on our agenda for quite some time, and having the ability to decrypt the passwords for this purpose means we could make that happen in short order (AD hashes the passwords, so it will be a one-way operation, of course).

But the question still stands: is asymmetric encryption strong and secure enough to take the place of hashing, assuming the private key is secured? I have not been able to find any relevant sources to answer this.


> But the question still stands - is asymmetric encryption strong and secure enough to take place of hashing, assuming the private key is secured?

It should be (though it's a big "assuming"). Being able to quickly move away from an older less secure hashing scheme rather than waiting for users to log in could be a security advantage too.


There's a couple of us here watching the replies to this - as this has been our concern for a while and this is a great opportunity for us to get some feedback. Thank you for your reply.

Interesting point about it being an advantage. I am actually going to make a note of this.

One way of handling the private key that I proposed was to use a privileged-access vault like Thycotic. Or just have it on a USB key in a lockbox.


Because it would leak information about the length of the password (and would require length limitations on passwords if the encrypted password is stored in a database). More importantly, I don't see what benefit you get from encrypting your password (read: slow and causes the above issues) rather than using hashing algorithms. If you use the crypt standard for storing your hashes, you can also update the hashes on login when migrating to a new algorithm. Unix systems have solved the "storing passwords" problem. We just need to keep iterating on "what hashing algorithm is trending on HN this week".


Hi, I explained our reasons in replies to other posts; I think they are, if not good, at least worth considering.

quote:

There's no argument from me regarding undesirability of keeping the passwords. But we also have a different concern - we don't want to be in the business of authentication at all - our goal is to have a third party service or appliance performing authentication and forwarding us already authenticated sessions with the user name as a header, for instance. Our desired end-state - is when we do not have any knowledge of or access to credentials used by the end-user.

To enable us to make this transition possible - for the time being we store the passwords, since as i explained in another reply, we must make this transition transparent to the users and we have legitimate users that login once a year (think credit report, just an example, not our business), so we can't intercept the credentials within a reasonable time period.

/quote

The approach I used applies a salt. It can also pad the password to some standard length, if desired, though it currently doesn't do that.

The slowness of the algorithm is perhaps a benefit in a way? In my testing it is still fast enough for our use (we are not Facebook).


This only works for deterministic encryption like "textbook RSA", not for any semantically secure encryption, which would produce different ciphertext each time you run it.

I would worry about trying to dual-purpose the encryption like this. If you feel like you must have a way to get the plaintext password back (which, BTW, is a liability you really, really do NOT want), the safer choice would be to use a semantically secure encryption of the password, and then ALSO hash the password using current best practices. Use the hash for now to authenticate users, and keep the ciphertext if you must, but you'll probably regret it.


Thanks for your reply. It is indeed RSA that I used (from the .NET framework).

There's no argument from me regarding undesirability of keeping the passwords. But we also have a different concern - we don't want to be in the business of authentication at all - our goal is to have a third party service or appliance performing authentication and forwarding us already authenticated sessions with the user name as a header, for instance. Our desired end-state - is when we do not have any knowledge of or access to credentials used by the end-user.

To enable us to make this transition possible - for the time being we store the passwords, since as i explained in another reply, we must make this transition transparent to the users and we have legitimate users that login once a year (think credit report, just an example, not our business), so we can't intercept the credentials within a reasonable time period.

What I am trying to get out of the replies is whether there's anything immediately broken about using asymmetric encryption that endangers us, aside from the understandable but not immediate concern about liability.

P.S.: Earlier I mentioned "intercepting credentials" in our code, and I wanted to highlight that the ease of doing that is precisely the reason why we really want to separate ourselves from handling credentials in any form.


I'd been considering something similar, never really made a PoC though.

How about this: the user provides their public key when they sign up. To authenticate, the server produces a nonce and sends it to the client; the client signs the nonce with the private key and sends back the result; the server verifies the signature with the public key.
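
A sketch of that idea with Ed25519 via the `cryptography` package (key distribution and transport are hand-waved away here):

    import os
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # Signup: client generates a keypair, server stores the public key
    client_key = Ed25519PrivateKey.generate()
    stored_public_key = client_key.public_key()

    # Login: server sends a fresh random nonce, client signs it
    nonce = os.urandom(32)
    signature = client_key.sign(nonce)

    # Server checks the signature against the stored public key
    try:
        stored_public_key.verify(signature, nonce)
        print("authenticated")
    except InvalidSignature:
        print("rejected")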


Interesting thought; I need to think about it. I will, and I'll write a reply when I have a little more time than I do at present.

What are your reasons for holding on to the password?


Upon further consideration, this pretty much describes an authentication method based on a client certificate. Unless I am missing something, I am not sure how to apply this to our problem space. We have traditional users who are used to username/password authentication, and the solutions we consider have to build on that as a fundamental model.


If you're using node.js and you use these hashing methods, your entire server is going to pause for 0.5 seconds on a login because it runs on a single thread. Goodbye to all of your server performance.

You can create a worker system, or use a child process to solve this problem, but most of these articles never mention it


I think this says more about node.js than it does about the proper way to secure a password.


Wasn't Node's big draw early on (aside from the fact that it's JavaScript) the fact that everything is supposed to be asynchronous? I'm actually surprised this is a problem.


Everything is asynchronous but the linked article uses the synchronous versions of the functions for some reason... async versions exist and are recommended[1]

[1] https://github.com/ncb000gt/node.bcrypt.js#async-recommended


Do the async methods actually execute on a thread pool though? Or do they use enough async methods internally to periodically release the main thread enough to keep it responsive?

From a quick glance at the source, the answer to the first is no.


This one is truly async; it uses a libuv-managed thread pool.


Yep you're right - disregard what I said above. :)


asynchronous doesn't magically make CPU-intensive tasks faster.


It does mean other tasks (e.g. serving pages to visitors) can still run at the same time as the CPU-intensive task, though, which is what most languages give you when it comes to hashing passwords and logging people in.


Correct, but it sure does stop your entire single-threaded process from blocking to the detriment of all other requests.


It says both. The hashing is computationally expensive, and Node runs in a single process (if we ignore workers). They could have made an implementation that yields and resumes after several ms, or you can spawn a worker and send it jobs.


Wouldn't this happen with pretty much anything that is not threaded by default? Clearly with PHP you don't have to care, but I assume it's no different with Ruby, Python, Go, ASP or anything else. You just usually run them threaded.


PHP is usually deployed in a FastCGI multiprocess setup, so the server will keep handling requests on the other processes while the hashing process is blocked.

I wonder whether threading really saves you, given that there are only so many cores in a system and password hashing is CPU heavy. Naively it seems a DoS would involve sending as many parallel requests as there are cores, which is not a lot and can easily be done from a single machine.


The blog post that HN is reading is served by PHP-FPM + nginx, without any extra special features to handle the load of an unexpected "oh hey we made it to the front page of HN".

It's also hosted on a relatively cheap VPS.


Considering the other article from HN today about Stack Overflow being able to handle their top-100 global Alexa rank traffic with one physical web front end, I'm impressed but not that impressed. PHP has been popular for a long time and lots of smart people have figured out how to make it pretty quick.

For things like blog posts, I'm sad that Movable Type fell out of favor. The pattern of using dynamic server pages to edit blog posts and generate static files for your viewers just makes so much sense to me. Then again, these days we can host our single-page JS front-ends on S3 + CloudFront and hook them up to API gateway and Lambda / Beanstalk talking to RDS and "not have any servers" :)


I know, that's what I meant by "usually you don't have to care": I don't know of any way you could run PHP single-threaded on a server (I mean, there surely is one). But it's something you commonly learn when you program in the other languages I mentioned.

I don't think it could become a real issue in a real environment, unless someone really wants to fuck with you. But then some fail2ban on too-fast requests should fix it easily, as the hashes don't actually take that long.


> i dont know any way you could run PHP single threaded on a server

Using `php -S 0.0.0.0:80 index.php` would do it!


Even with mod_php you get one process per fork with the prefork MPM, so this wouldn't even stop some Apache 2.2 box with PHP 4 configured 10 years ago from accepting new connections.


Yes. That's why nearly every web framework is threaded by default. That includes everything you'll find in Ruby, Python, Go, or ASP.

Not so they can serialize IO, but because stopping every worker just because one of them has some hard work to do is not a sane working model for web backends.


It's pretty new that Rails is arguably 'threaded by default', and many/most typical deploy setups for Rails still don't run with multi-threaded request dispatch.

But yes, I agree with you that that is why every web app _should_ be threaded.


That's why you spawn worker processes and servers. Evented IO is way faster than threads, as nginx and Node have shown.


It definitely does, and I think that's the GP's point.


Or you could just use the async versions.... https://www.npmjs.com/package/bcrypt-nodejs


Does that library do the calculations in a thread pool, or is it just async on the main thread? If the latter (which seems mostly likely), then it won't matter.


It looks like that one is pure JS, using native node crypt functionality for randomness.

The one recommended in the article[0] actually includes a blowfish C++ binding, with both synchronous and asynchronous methods for comparing and hashing. That binding uses NAN[1] to queue AsyncWorkers. NAN uses libuv[2] behind the scenes to launch a thread[3].

0. https://www.npmjs.com/package/bcrypt

1. https://github.com/nodejs/nan

2. https://github.com/libuv/libuv

3. https://blog.scottfrees.com/building-an-asynchronous-c-addon...


It does it on the main thread, pseudo-async, so I agree with you, you should not use this library. Look for a native module instead.


> You can create a worker system, or use a child process to solve this problem, but most of these articles never mention it

Or you can do the bcrypt() inside your (PostgreSQL) database and add an extra layer of security by not directly exposing the hash (or even the whole password table) to the web server process (i.e. write a password check SQL function and allow only calling it, no direct table/column access).
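
A rough sketch of that setup, assuming the pgcrypto extension and psycopg2 (check_password() and the schema are made-up names; as the replies below note, make sure query logging doesn't capture the plaintext parameter):

    # One-time setup, run as a privileged role:
    #
    #   CREATE EXTENSION IF NOT EXISTS pgcrypto;
    #   CREATE FUNCTION check_password(uname text, pass text)
    #   RETURNS boolean AS $$
    #     SELECT password_hash = crypt(pass, password_hash)
    #     FROM users WHERE username = uname;
    #   $$ LANGUAGE sql SECURITY DEFINER;
    #
    # The web role gets EXECUTE on check_password() and no access to "users".
    import psycopg2

    conn = psycopg2.connect("dbname=app user=webapp")
    with conn.cursor() as cur:
        cur.execute("SELECT check_password(%s, %s)", ("alice", "hunter2"))
        ok = cur.fetchone()[0]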


Congratulations, now there are plaintext passwords in your server logs...

I did exactly what you are suggesting on a project some time ago - deferred to the database to do the password hashing and comparison to the stored hash. Unfortunately, the server query logs contained the query parameters. So did some of the database logs when running at elevated log levels. We quickly decided that was unacceptable, and moved the hashing process into the web server.


I don't see how moving it to the web server would prevent that if you were logging the request body or query parameters, since those arrive in plaintext anyway (hopefully over an HTTPS connection).


“query parameters” here are the parameters passed to PostgreSQL. That is, "SELECT * FROM users WHERE password_hash = get_hash('P4ssw0rd')" is being logged, not "?password=P4ssw0rd".


Exactly so.

The webserver framework I was using at the time logged all its queries to the database at the normal log level. It couldn't log an actual query string because we used prepared statements, but it reconstructed the query and substituted the parameters in.

Enormously helpful for debugging an issue, when you can just copy and paste a query out of the server logs. Although I would probably have preferred that it not do that at the normal log level.

Curiously, it never logged the http request string/parameters. When I wanted that I had to add my own code to the request handler.


I'm not sure, but I'm assuming this has more to do with Postgres's WAL than with filtering logs. I've never used this side of Postgres, though, so it's just a guess.


I find the original bcrypt node module hard to build consistently; I often have trouble getting it built on various systems. I switched to bcryptJS, which is a drop-in pure-JS implementation.

Does anybody here know whether any of these is a valid replacement? https://nodejs.org/api/crypto.html


Incorrect; the first hashing method recommended uses a native C++ binding which launches a thread to execute both hashing and comparing.

Unfortunately, the example used for the bcrypt node module is the sync form of hashing. That library exports both sync and async forms of hash/compare.


The particular module being used in the article has async versions of those methods. He uses the synchronous versions in the example presumably for clarity/brevity.


Have any examples of how to properly handle it with node? Isn't it just the typical callback pattern?


> base64_encode(hash('sha384', $password, true))

> ...

> The above construction may invite theoretical concerns about entropy reduction (i.e. 72 characters of raw binary without any NUL bytes comes out to about 573 bits of possible entropy, but SHA-384 hash outputs are clearly limited to 384 bits).

Given BCrypt hashes are a mere 184 bits, I don't see how this is a meaningful concern even in principle. If you're brute-forcing search spaces this big you're no longer looking to recover a password, but find a collision.
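
(For reference, the same pre-hash construction in Python, using the PyPI bcrypt package: base64(SHA-384(password)) is 64 bytes, so it fits under bcrypt's 72-byte limit and can never contain a NUL byte.)

    import base64
    import hashlib
    import bcrypt

    def prehash(password: bytes) -> bytes:
        # 48-byte SHA-384 digest -> 64 base64 characters, no NUL bytes
        return base64.b64encode(hashlib.sha384(password).digest())

    hashed = bcrypt.hashpw(prehash(b"correct horse battery staple"), bcrypt.gensalt(12))
    assert bcrypt.checkpw(prehash(b"correct horse battery staple"), hashed)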


> Given BCrypt hashes are a mere 184 bits, I don't see how this is a meaningful concern even in principle.

This was added in response to a point that a couple people (or perhaps a convincing sockpuppeteer) raised and tried to use to decry the entire article. You're lucky to get 60 bits of information entropy in any given user's password, as is. The "theoretical weakening" here isn't a practical concern: "2^192 security" is still boring crypto.


I had no idea that PBKDF2 had fallen so much in recent years. I still remember the 1Password team extolling its virtues five years ago:

https://blog.agilebits.com/2011/05/05/defending-against-crac...


Questions like these are why I don't love the Password Hashing Competition.

In reality, PBKDF2 is fine. It's not the best thing you can use, in the same sense as AES is probably not the absolute best round-for-round, cycle-for-cycle cipher you can use, but it gets the job done.

The answer for "what password hash should I use" can accurately be summed as "put bcrypt, scrypt, Argon2, and PBKDF2 on a dartboard, and then throw a dart".

In 1PW's case, PBKDF2 is even more reasonable, because 1PW actually needed a KDF, and bcrypt is not an especially good KDF.


Correction: scrypt actually uses PBKDF2 internally. (I previously said "scrypt is based on PBKDF2", but that's a loaded statement.)

PBKDF2 is an improvement over PBKDF1 (and other naive iterated hash constructions), but attacks got better, and better defenses are called for.


> Scrypt is based on PBKDF2.

No, it really isn't.


I could have sworn it used PBKDF2-SHA256 and Salsa20/8 internally, which is what I meant by "based on".


It also uses xor internally, but I wouldn't say that scrypt is based on xor. scrypt does not use PBKDF2 for any PBKDF properties; it's just a convenient arbitrary-length-output hash function. I would have used a sponge if they had been widely available when I created scrypt.


Okay, thanks for the clarification.


It would be great if, in 2020 (or sooner, but probably later), the answer were: "don't use passwords any more; they are deprecated components of society."


I think there's always going to be a need for an authentication mechanism that relies on user memory only


Can you expand on this idea? What type of authentication method would replace it? (One that can't be directly linked to the person obviously).


Look at what FIDO is doing [1]: they have new standards called UAF (Universal Authentication Framework) and U2F (Universal Second Factor). The idea is that you use some kind of local authenticator and then sign a challenge from the server with a private key that is embedded in the authenticator. The authenticator can be a fingerprint sensor, an external device that you authenticate with via a password, or something hooked into the OS-level login.

New Samsung phones and Lenovo laptops already support this on the fingerprint sensor, but the true benefit is that you can have competition among local authenticators without the server having to change anything.

For legacy devices where you do not have any kind of local authenticator, there could be local software that lets you enter passwords. It's already part of the standard that the server has a way to allow only some of the authenticators, based on certification. This would allow a bank to specify only devices manufactured by a specific company, or only devices that have passed some certification process.

Now some people of course say that fingerprint sensors are not very secure. That is true. It still increases security, because the local authenticator will only sign the challenge if it is sent from the correct website/software (App ID) and, in most cases, the TLS Channel ID. UAF (and/or U2F) prevents far more common attacks, at the cost of less security if an attacker actually steals your phone. You can combine UAF with U2F for additional security even if your phone/laptop is stolen.

This has been submitted to the W3C, and it's hoped FIDO 2.0 will become an official standard [2] [3].

Google, Github and Dropbox already support U2F. PayPal supports UAF (works on mobile). Both of these can also work with NFC and Bluetooth LE.

[1] https://fidoalliance.org/specifications/overview/

[2] https://fidoalliance.org/fido-alliance-announces-fido-authen...

[3] https://www.w3.org/Submission/2015/SUBM-fido-web-api-2015112...


I'm optimistic for the use of physical tokens instead of passwords. In my work environment, a lot of authentication is based on physical token possession plus a password just as a guard against lost devices (password validated by the token, not by the service).

These kinds of systems aren't technically very complex but there hasn't been a lot of traction for them outside of corporate environments. The result is that they tend to be hyper-expensive, unfortunately. Standards like FIDO and the Yubikey are good first steps at pushing this into the consumer space, although they don't offer on-device PIN validation yet.


I would love a replacement for passwords, but I don't think hardware-based solutions are practical enough yet. It's one more gadget I need to take with me, and I'd also have to make sure it connects with every device I have.

The worst part for me would be losing the thing; in the end I would need alternative login methods anyway to be sure I don't lock myself out.

Classic Authy on a smartwatch would be the simplest method I could live with that comes to mind.


It's not one more gadget that you have to carry around. New Lenovo laptops and Samsung phones already have UAF-enabled fingerprint sensors that could be used for all your web authentication needs. The Yubikey is either the size of a USB slot or the size of a key on your keychain, so even your second factor is pretty minimal in size.

Alternative login is always a problem; there is always a tradeoff between security and usability. You could easily print out some backup access codes. You can continue to use your email as an anchor: you get an email and then you're allowed to register a new token. That leaves the question of how your email provider secures its login. I think Google is a good example of the options that are possible.


I may be confused but how do you validate the password using the token if you've lost it?


I remember an article about a month ago called "Google's Plans to Kill the Password." In any case, this is not a new problem and many smart people are trying to solve it because passwords suck.

I don't have a solution myself, but I hope there is one some day.


Disappointed that the first solution isn't: Let somebody else do it.

I know this doesn't apply to banking, etc., but 99% of the websites that "require" me to create an account and log in don't need to store primary credentials for me. Please pick a secure implementation of OAuth2 and let people store their credentials wherever the hell they want to.

I'm bored of getting hits from "Have I been pwned?"


This requires your users to trust whichever OAuth providers you decide to integrate with. Sometimes, the set of "trusted OAuth providers" for your users is {}. What then?

> 99% of the websites that "require" me to create an account and log in don't need to store primary credentials for me

Why are you giving them valuable credentials? Give them a throw-away password (password managers are great for this).


I meant OpenID. I literally couldn't see I was saying the wrong thing. Everything you say about OAuth is true. I'm an idiot. I'm sorry for getting so blue in the face.

A hybrid between the two (common OAuth-style endpoints and any OpenID endpoint) is the best solution for everybody.


You don't integrate with a provider. You implement the protocol and let your users supply a URL. Layering on popular alternatives (Facebook, Google, etc.) helps, but use the Stack Exchange model: let users do what they want to do.

That way users can be their own OAuth providers if they want.


My question was: "What if your users don't trust any of the existing providers on Earth?"

It's hard to make a blanket recommendation like that, even for "only 99%" of websites. Neither you nor the person building the website has any insight into whom the website's users trust.

Offer OAuth2 as an alternative to passwords: Great move.

Only offer OAuth2 and don't let people create an account: Questionable.


I did answer.

They can host their own.

I don't understand why they would trust <crappy forum owner> over a dedicated authentication storage place, but that's their choice. And yes, there is also every possibility to offer direct credentials, per the Stack Exchange model (they host their own OAuth server and allow simple registrations).


> I don't understand why they would trust <crappy forum owner> over a dedicated authentication storage place but that's their choice.

What if <crappy forum owner> happens to be a security engineer, and <crappy forum> happens to be Silk Road 13?

The trust decisions people make are situational and nuanced. OAuth is great if that's where people invest their trust. Otherwise, you're outsourcing it for the user to a company they might fear.


Again. The user picks who they authenticate with. You (the site owner) get no say in the matter. You aren't outsourcing it to any one company.


No, you're saying "which of this limited set of companies are you going to authenticate with" instead. If you don't want to be guilty of taking users' agency away from their own trust decisions, you need to do one of two things:

1. Let every website on the Internet potentially be an OAuth provider.

2. Make OAuth optional.

If you follow option #2, then this article is still relevant because you need to handle passwords securely.


Your first paragraph is like saying using email is forcing somebody to use one of a "limited set of companies". It's nonsense. Again, if they don't like what's on offer, they can host their own, just like email! They can hire a company like yours to host their credentials with as many layers of security as they want. The user has ultimate choice.

Secondly, every website on the Internet is potentially an OAuth provider.

Not to mention that I have —on multiple occasions here— suggested that websites that consume OAuth should also provide it (like Stack Exchange).


I thought that Stack Exchange was using OpenID?


Anyone have a good explanation of why, in the Python example, they recommend `hmac.compare_digest` instead of `==` for comparison?

Is there something obvious I'm missing here?


== in Python will stop comparing after the first character mismatch. You can use that fact to test the value byte by byte, knowing that the more correct characters you have, the longer the comparison will take. This is called a timing attack.

hmac.compare_digest is a constant-time compare: whether or not there is a match, it takes the same amount of time.
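
A quick illustration (the hash values here are arbitrary):

    import hmac

    stored = b"5f4dcc3b5aa765d61d8327deb882cf99"
    candidate = b"5f4dcc3b5aa765d61d8327deb882cf00"

    print(stored == candidate)                     # short-circuits at the first mismatch
    print(hmac.compare_digest(stored, candidate))  # examines every byte, constant time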


Timing attacks aren't useful against password hashes, but avoiding them is a good habit to be in.


This is "mostly true" but with an edge case...

If the attacker knows which algorithm and work factor you're utilising, and your system doesn't use randomly generated per-user salts (or an unknown pepper), then theoretically an attacker could use a hash timing attack, combined with a rainbow table, to massively reduce the scope of a user's potential password.

For example, say your rainbow table has 20 million password-hash combinations and your hashes are 33 characters long: for every character of the target hash I learn, I can drop literally millions of hashes I know it ISN'T. With just the first four characters I could drop almost 60% of the table (although my maths here might be completely wrong; it is actually pretty complicated to determine). And there's no real limit on how many characters of the hash a timing attack could leak; you could turn someone's password into a 1-in-62 chance.

The TL;DR: use a per-user salt. But a timing-immune comparison definitely is "defense in depth" in case there is a bug elsewhere that breaks salts.
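
To make that concrete, a toy sketch in Python (wordlist and leak length invented) of how a leaked prefix of an unsalted MD5 hash shrinks the candidate set:

    import hashlib

    def md5_hex(pw: str) -> str:
        return hashlib.md5(pw.encode()).hexdigest()

    wordlist = ["password", "letmein", "hunter2", "correcthorse"]

    # Pretend a byte-by-byte timing attack leaked the first four hex
    # characters of the victim's unsalted MD5 hash.
    leaked_prefix = md5_hex("hunter2")[:4]

    # Every candidate whose hash doesn't share that prefix can be dropped.
    survivors = [pw for pw in wordlist if md5_hex(pw).startswith(leaked_prefix)]
    print(survivors)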


I chose the words "password hash" carefully. If you're using a secure password hash, you're not worrying about salts, because they take care of randomization for you. If you are worrying about salts, you probably have bigger problems than memcmp timing.


I literally don't understand what it is you're getting at.

Password hashing has existed since at least the 1970s and until the 2000s salting wasn't common. While some hashing libraries do insist on you supplying a salt, it is still ultimately up to the application developer to generate and store the salt for later usage.

Therefore it is still common for an application developer to "worry about salts" even if just for storage and generation reasons.

Anyone using 3DES, MD5, or similar is likely vulnerable to timing attacks, and they're definitely still in the realm of a "password hash." Plus some wonderful developers hard-code the salt (salt = "secret"), which could also leave them vulnerable to timing attacks if an attacker knew which hashing algorithm and work factor (e.g. the default) was in use.


> I literally don't understand what it is you're getting at.

I truly hope this will help then: https://paragonie.com/blog/2015/08/you-wouldnt-base64-a-pass...

"Password hashing" is its own compound noun. The acceptable algorithms (as defined in the blog post this HN thread is about) take care of this for you.

> Anyone using 3DES, MD5, or similar is likely vulnerable to timing attacks, and they're definitely still in the realm of a "password hash."

3DES is a block cipher. MD5 is a cryptographic hash. Neither of them are password hashes.

> Plus some wonderful developers hard-code the salt (salt = "secret"), which could also leave them vulnerable to timing attacks if an attacker knew which hashing algorithm and work factor (e.g. the default) was in use.

What you're describing is closer to a "pepper".

http://blog.ircmaxell.com/2012/04/properly-salting-passwords...


> 3DES is a block cipher. MD5 is a cryptographic hash.

3DES is both a block cipher and a cryptographic hash. At least UNIX thought so in the 1990s as many MANY people were storing UNIX passwords in 3DES, DES, MD5, and similar.

> Neither of them are password hashes.

25 years of computing history would disagree with you. MD5 was the defacto standard for password hashing for almost fifteen years.

But no doubt you're playing silly word games, going with your own definition of "password hash" that includes or excludes different hashing algorithms as it is convenient for you. I won't get drawn into that.

> What you're describing is closer to a "pepper".

What you're doing is called being "condescending." You know full well from my posts above that I am familiar with salt/peppering/hashing, and the different technologies involved. So linking to 101 tutorials and definitions of basic terms is only intended to aggravate.


[deleted]


I think everyone is getting a little too emotionally invested here. It might be a good time to take a step back and detach a bit. Apologies for interrupting your conversation; I just don't like seeing everyone going for each other's throats on HN.

"Be excellent to each other."


"You know the law: two men enter, one man leaves."


> This is a really dumb semantic argument. You're arguing over the meaning of what I said upthread. I will spell it out for you:

No, I'm not. And never was.

Someone replied to me splitting hairs over the term. I never even brought it up.

Did you reply to the right post?


To be honest I'd probably interpret them as being antagonistic if the same statements were directed at me (and quoting never helps either) but I think if you take a charitable interpretation it doesn't read that way. I think it's just a direct/technical challenge/debate sort of reply and it's really easy to read into those when you're on the receiving end of them.


> While some hashing libraries do insist on you supplying a salt, it is still ultimately up to the application developer to generate and store the salt for later usage.

If your password storage mechanism requires you, the programmer, to generate a salt, you may well be using the wrong password storage mechanism, or using it in the wrong way.


> If … your system doesn't use randomly generated per-user salts

… then you’ve already lost, constant-time comparison or not.

Also, rainbow tables are obsolete.


Precisely what Thomas said.

Using == instead of hmac.compare_digest is unlikely to be a source of vulnerabilities in your application, but it's a good habit to get into whenever you touch cryptography.

See also: https://news.ycombinator.com/item?id=10345965 (pg. 33-36, 42-43 of the PDF)

    Again consider timing leaks.
    Many interesting questions:
    How do secrets affect timings?
    How can attacker see timings?
    How can attacker choose inputs to influence how secrets affect timings?
    Et cetera.
    
    The boring-crypto alternative: crypto software is built from instructions 
    that have no data flow from inputs to timings.
    Obviously constant time.


Would it still let you sneak in if the hash was rubbish? Say it was just a simple MD5 of the password. If you could do a timing attack, you could use rainbow tables of the hashes and whittle down the possible set of passwords really quickly, right?

So if the timing attack has told you that the first letter of the hashed version is 'A' then you find another password from the table that hashes to AAxxx, ABxxx, etc.

Obviously that depends on you being able to precompute all the candidate passwords with the same hash function being used.

Along the same lines - could you use a timing attack to figure out a salt? I guess it's near impossible?


You could potentially leak the first N bytes of a valid unsalted trash hash (e.g. MD5) and then use this information to optimize an offline brute-force attack. The more bytes of the hash you leak, the more you can narrow down your offline attempts and the fewer subsequent requests you need to fire.

I was going to develop this into an exploit tool, called TARDIS (backronym for Timing Attack to Remotely Dispel the Illusion of Security) against, e.g. Piwik, Oxwall, and other products that still use MD5 passwords. The main reason I didn't was: No free time to build it and tune it against the internals of various programming languages' == implementations.


I don't think that's an issue when you're comparing salted passwords. AKA F('Password') = Hash('Password' xor 'some value') = "123456"

F("value") = "123455", which is close, but that does not let you make a 'better' guess.

PS: Assuming the salt is hidden, and the hash is secure.


hmac.compare_digest is constant time whereas == will return as soon as a mismatch is found. The difference in return time can be measured. The key phrase is a Timing Attack[0].

[0]https://en.wikipedia.org/wiki/Timing_attack


I'm not a Python person, but it is probably a constant-time comparison. If you compare things in such a way that the time it takes to complete increases with an increasing prefix match, then timing attacks are possible to recover the secret.


For some reason the article completely fails to link to libsodium: https://download.libsodium.org/doc/


> For some reason the article completely fails to link to libsodium: https://download.libsodium.org/doc/

The "bindings for most programming languages" link goes to the libsodium documentation, but I'll add a link in more contexts.


The article offers specific advice for Java-without-libsodium, but it offers no such specific advice for Java-with-libsodium.

There would appear to be three distinct Java bindings of libsodium.


(I'm not ignoring your comment, I'm currently debating whether to update it to include libsodium example code or to write a separate post for that.)


Write a separate post and then link to that from your current post. Best of both worlds :D


I have always been impressed by the way Django does it.

https://docs.djangoproject.com/en/1.9/topics/auth/passwords/


Assuming this is legit (seems like it), mega bonus points for walking through examples on the various platforms/languages.


From previous discussions about this topic, I had noted down the following best practices:

Passwords should be scrypt'ed on the client, and then the server should generate a SHA256 hash of the scrypt output and store that in the DB.

- Running CPU- and memory-heavy scrypt hashing on the client side will allow us to use bigger hashing workloads.

- EDIT: Removing the MITM point, because as many said, that's the job of TLS anyway.

- External brute-force attackers will have to take the burden of heavy hashing. No DoSing the server through scrypt.

- Storing a SHA256 hash instead of the scrypt hash in the DB means even if the DB is stolen, attackers can't use stolen scrypt hashes to authenticate as any client.

I would love to get others' feedback on this. EDIT: Found the reference: https://news.ycombinator.com/item?id=9305504
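
If I'm reading the scheme right, a minimal sketch in Python (parameter choices, names, and storage details are illustrative only, not a vetted implementation):

    import hashlib

    def client_derive(password: str, salt: bytes) -> bytes:
        # The expensive KDF runs on the client, so the work factor can
        # be cranked higher than a busy server could afford.
        return hashlib.scrypt(password.encode(), salt=salt,
                              n=2**14, r=8, p=1, dklen=32)

    def server_store(client_hash: bytes) -> bytes:
        # The server stores only a fast hash of the client's scrypt
        # output, so a stolen DB row can't be replayed as a credential.
        return hashlib.sha256(client_hash).digest()

Login would recompute server_store() over the submitted value and compare it against the stored digest in constant time.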


I learned very early on never to trust the client. An infected machine could simply skip the scrypt step and send the plaintext password, and now you have a database where infected clients' entries are just SHA256 of the plaintext. That is, of course, easily broken with brute force or even rainbow tables.


An infected machine would more likely steal the password, then still run the scrypt. Why make it easier for someone else to steal it too?


How are you going to calculate the scrypt hash, client-side?

Doesn't that scrypt hash then become the password, from the server-side application's perspective?

How are you storing the salt for the user if your server only knows about a SHA-256 hash?

> MITM attacks won't get access to unencrypted fields.

That's TLS's job. If you, for example, are building a web app and you're delivering Javascript to perform the scrypt calculation, a MitM can replace the code to exfiltrate the user's plaintext password. It doesn't make sense for the threat model.


Very valid questions. Let me try to answer them and let's see if those answers are valid or not.

Yes, the scrypt hash just becomes the password for the server. However, this password is going to be unique and almost impossible to guess. Salting this password on server side doesn't give us any additional benefits. The server can store the salt that the client used, though.

Yes, the MITM point was minor. Just another minor security benefit on top of TLS (which takes the majority of the burden of securing against it). So, maybe I shouldn't even mention this point.


>Yes, the scrypt hash just becomes the password for the server. However, this password is going to be unique and almost impossible to guess. Salting this password on server side doesn't give us any additional benefits.

I don't even understand your threat model.

Can you walk us through how you see this working?

Client takes 'crappypassword' as input, sends p = scrypt('crappypassword')

The server stores SHA256(p)?

EDIT: I misread what you said. Is this correct?


Yes, now it's correct.


And still invalid. You're assuming that collisions against SHA256(bcrypt(p)) are harder to find than against SHA256(p), right? Any math to prove that?

Otherwise, you're relying on the user side to properly safeguard the bcrypted password. That is often wrong. You cannot store any kind of transformed password anywhere, or it's no longer a password but a token.

Yes, those "Remember me" things transform passwords into tokens. Tokens can be stolen.


Note: I'm not endorsing the client-side scrypt/bcrypt approach, but I do think it's interesting.

I'm going to refer to bcrypt because that's what your comment used, but the parent post used scrypt.

> You're assuming that collisions against SHA256(bcrypt(p)) are harder to find than against SHA256(p), right?

I don't think it is. I think it's assuming 2 things:

1. That SHA256 collisions are rare enough, and SHA256 attacks are hard enough, that they can be ignored.

2. That the output space of bcrypt is large enough to overcome the risk that salting is usually designed to overcome.

On the first point: This approach simply isn't worried about SHA256 collisions between passwords. Yes, it's theoretically possible that 2 users with different passwords and/or different salts might end up with the same hash. If that happened often then it would be an issue - if you had n distinct users in your system but only n/2 distinct hash values, then if an attacker had a copy of your password store, it would effectively double the pay-out each time they successfully cracked a user's password (that is, each cracked password would allow them to authenticate as 2 users).

But in practical terms, the number of collisions is going to be tiny, and you're really worrying about the case where n distinct users have n-1 or n-2 distinct hashes. That's not going to meaningfully change your exposure.

What is more of a risk (and this might be the point you were making) is that if any of your hashes happen to collide with the hash of a known input, then you're screwed, and while that's unlikely, "unlikely" isn't an ideal protection.

When you're storing { SALT , HASH( SALT || INPUT) } you're reasonably protected against such collisions because they would only be effective if the known input happened to start with the salt.

What the bcrypt approach can offer is that it knows that the input to SHA256 needs to be the output of bcrypt, so you can refuse to accept anything that isn't 186 bits long (or 31 base64 characters, depending on your approach). That constraint might well be stronger protection than a salt, although I haven't run any numbers.

Probably adding a salt is safer; if I were going to attempt serious analysis of this scheme, I'd certainly want to test whether that was true or not.

On the second point, salting is usually designed to work around the scenario where 2 users have the same password and therefore (absent a salt) would have the same hash. It is not specifically intended for the scenario (described above) where two users have different passwords that happen to hash to the same thing.

In the approach discussed here, the input to the hash is the output from bcrypt. That value has already had a salt applied, so 2 users with the same original passwords would be providing different inputs to our hash function, so we would be storing different outputs.

> you're relying on the user side to properly safeguard the bcrypted password

I think that's a genuine issue. You need to do a not-insignificant level of client-side processing on the user's password. You're relying on the browser disposing of that data securely. It's "just bits in RAM", but bits in RAM leak, and there's no way for the browser to know that the bits you were working with were sensitive.


I think the simplest thing is for the server to generate a salt for every user and add a step for the client to request the salt for a given user. If you want to protect against account enumeration, then you also need to be able to generate consistent salts for accounts that don't exist.
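
For example (a sketch; the HMAC-of-username trick is one common way to get consistent fake salts, and every name here is hypothetical):

    import hashlib
    import hmac

    # Server-side secret, stored outside the user database.
    SERVER_SECRET = b"long-random-value-not-in-the-user-db"

    def salt_for(username: str, user_db: dict) -> bytes:
        user = user_db.get(username)
        if user is not None:
            return user["salt"]
        # Unknown account: derive a stable, plausible-looking salt so
        # the response is indistinguishable from a real user's.
        return hmac.new(SERVER_SECRET, username.encode(),
                        hashlib.sha256).digest()[:16]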


I don't hate the idea of putting this piece:

    base64_encode(hash('sha384', $password, true))
in client-side JavaScript. I've seen my own passwords scroll in front of my eyes when debugging servers and reading POST variables. It's a minor level of shoulder-surf protection before it hits the proper hash on the server.
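
For what it's worth, the same pre-hash idea sketched server-side in Python (using the third-party bcrypt package; my adaptation, not the article's code):

    import base64
    import hashlib

    import bcrypt  # pip install bcrypt

    def prehash(password: str) -> bytes:
        # base64 keeps the digest free of NUL bytes, which bcrypt
        # implementations tend to truncate on.
        return base64.b64encode(hashlib.sha384(password.encode()).digest())

    stored = bcrypt.hashpw(prehash("correct horse battery staple"),
                           bcrypt.gensalt())
    assert bcrypt.checkpw(prehash("correct horse battery staple"), stored)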


"shoulder surf protection"?


"Shoulder surfing" refers to people extracting personal information, such as credentials, from your computer by just looking over your shoulder while you're typing them in or displaying them.

Precisely the reason why password input fields usually display replacement characters instead of actual characters (or nothing at all if you're on a Unix terminal).


How would transforming it before sending it over the wire help here?


Not at all. If I want to impersonate someone, I just have to send what they send. The only real protection is 2FA.


I think the idea is that the scrypt hash does become the password, but it's a better password. A password with a lot more entropy (kinda). The increased kinda-sorta entropy makes the use of a slow key derivation function on the server side unnecessary, so SHA-256 becomes sufficient.

Not sure how I feel about it.


I feel that this is needlessly dangerous, personally.

How are the salts managed? This detail is important. SHA256(scrypt(password, constant_value_instead_of_salt)) is going to produce collisions in the stored hash.
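
A toy illustration of why (parameters hypothetical): with a site-wide constant in place of a per-user salt, nothing user-specific enters the computation, so two users with the same password produce identical rows:

    import hashlib

    CONSTANT = b"site-wide-constant"  # same value for every user: the bug

    def stored_hash(password: str) -> str:
        k = hashlib.scrypt(password.encode(), salt=CONSTANT,
                           n=2**14, r=8, p=1, dklen=32)
        return hashlib.sha256(k).hexdigest()

    # Alice and Bob both chose "hunter2"; their rows now match, which
    # leaks shared passwords and makes one crack pay out twice.
    assert stored_hash("hunter2") == stored_hash("hunter2")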


The salt could be created from the username (possibly with a normalization step for case insensitivity, and maybe also including a site-wide salt).

On mobile devices, client-side hashing could affect battery life. If done with JavaScript, login won't work without JavaScript enabled. Also, a work factor that doesn't slow down the slowest devices a user might use could provide much less protection than desired, although the server could also do some of the work.


I think he's going for SHA256(scrypt(password, salt, work_factor)) and saying the bare SHA256 is sufficient. As opposed to SHA256(salt+scrypt(password, salt, work_factor)).

I think it's dangerous because of the dependence on the client, and useless because scrypt(password,salt,work_factor) on the server is plenty hard to attack.


> scrypt(password,salt,work_factor) on the server is plenty hard to attack

For a large enough work_factor. In practice "large enough" is usually interpreted to mean "something that seems safe without being so large that I need to buy lots more servers"

The argument (which I'm interested in, but not yet sold on) is that moving the scrypt to the client allows you to pump up the work_factor even higher than you would have been willing to do on the server.


After thinking about it more the concept is interesting, but the main fear I have is that doing it in a web browser with JS is next to impossible to secure. Maybe if vendors got on board...

In general though, pushing auth down to clients in Javascript makes my skin crawl. You're one XSS away from having an attacker no-op your scrypt and return SHA256("secret-attacker-password") on registration. The logical follow-up to that: 'have the server run scrypt the first time' -- but then you've just moved the tough work to user registration, which seems just as exploitable.

I dunno. Safe crypto is hard enough already; I'm not sure pushing it into Javascript on the browser makes it any easier.


The proposals I've seen have always had the initial scrypt done on the server.

The DoS risk on registration can be mitigated more easily than the login one: rate-limiting new registrations will often be more palatable than limiting logins, or you can require "email validation" before you set the first password. And not every application allows self-registration.


> useless because scrypt(password,salt,work_factor) on the server is plenty hard to attack.

It enables DoS attacks, though, right?


Since the hash is derived purely from the password, it demonstrably does not have any more entropy than the password.

Before you mention the salt: the salt only counts as entropy if it's unavailable to the attacker. The client needs to know the salt in order to perform the calculation, therefore your attacker will also know the salt, so it doesn't count as entropy in the hash, either.


I'm not sure how I feel about it either. That's why I decided to ask for feedback. As for responding to sarciszewski's comment, there will still be a salt on the client side. We obviously don't want to use a fixed salt for all users.


> MITM attacks won't get access to unencrypted fields.

A MITM would let you hijack the JS that controls scrypt/SHA-256, so you're already at game over. You've got to deliver that to the user in some fashion (TLS!); this isn't a real win for your approach.

> External brute-force attackers will have to take the burden of heavy hashing.

If the attacker is trying targeted access to a site the only thing that's relevant is the time they have to expend - the hash is opaque. If they have the hash from, say, a DB theft, they're already going to have to take that burden. Your approach doesn't seem to add anything.

> Storing a SHA256 hash instead of the scrypt hash in the DB means even if the DB is stolen, attackers can't use stolen scrypt hashes to authenticate as any client.

I'm not sure what that means. If you steal my scrypt hash for example.net, how do you use that to authenticate me to example.net?

Your approach pushes out complexity to the clients and doesn't seem to win us anything.


You're putting a lot of faith in the client.


Very interesting. Is there a reference implementation or discussion you can link to?


I'll try to find the discussion. EDIT: Here's the discussion - https://news.ycombinator.com/item?id=9305504


Wish there was a site that lists algorithms in a table, with the ability to compare against x years ago:

    algorithm       | fairly safe difficulty (all variables) | very safe difficulty without incurring too much performance cost
    pbkdf2 + sha1   | completely unsafe                      | completely unsafe
    pbkdf2 + sha2   | 100000                                 | ...
    pbkdf2 + sha256 |                                        |
    bcrypt          |                                        |
    scrypt          |                                        |
    argon2          |                                        |


PBKDF2-SHA1 is safe.

That's the problem with a chart like this. The gradation will go from "completely unsafe" salted hashes to "very much safe enough" with only marginal changes after that.

Another problem is that these functions are all parameterized, so the chart needs to capture the safety level at specific parameters.


My goal was to create a chart of:

algorithm | a safe set of parameters | a VERY safe set of parameters that needs strong hardware

Maybe a 4th column of "unsafe if difficulty is below"

This could be something updated yearly or whatever to help people figure out what things they need to move towards. If they see that their currently used settings are below the unsafe line, they will know it's time to upgrade.

Example: I use PBKDF2-SHA1 at a difficulty of 11k iterations. Is that number still within the "probably okay" range, or is it in the "I can crack any password on your list in 15 minutes" category?


It would make for a good interactive "chart", perhaps: as long as your sliders aren't all the way to the left, the chart is mostly green.


I'm part of the Paragon Initiative Enterprises team and have access to edit the blog. If you have any questions (AMA) or would like to suggest any additions, please let me know.


We really appreciate the post, thanks! Not a suggestion for this topic, but would love a future post on request signing in 2016 (think HMAC). In particular there are so many JSON REST APIs these days going over HTTPS that it's hard to determine best practices and what's overkill.
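
For context, the rough shape I have in mind (a hypothetical sketch, not a vetted design; real schemes also sign a timestamp or nonce to prevent replay):

    import hashlib
    import hmac

    SECRET = b"shared-api-secret"  # hypothetical per-client key

    def sign(method: str, path: str, body: bytes) -> str:
        # Sign everything the server must not let an attacker tamper with.
        msg = b"\n".join([method.encode(), path.encode(), body])
        return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

    def verify(method: str, path: str, body: bytes, signature: str) -> bool:
        # Recompute and compare in constant time.
        return hmac.compare_digest(sign(method, path, body), signature)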


Thanks for the suggestion, I'll put it on the list. :)


Perl programmers should try this module: Crypt::ScryptKDF. I have been using it for a while now.


And with DBIx::Class it is almost easier to do it right[1] than wrong.

[1] https://blog.afoolishmanifesto.com/posts/do-passwords-right/


What I find frustrating is the lack of availability of most of these algorithms for the most common platforms (.NET, PHP, Java). The author's recommendation seems to be driven by availability, not the algorithms' own merits.


If you're using any of: PBKDF2, Bcrypt, Scrypt, Argon2, then you're fine. Our recommendation is:

    1. Use the best option available, but
    2. We provided example code in multiple languages for the
       best one that's widely available


It wasn't long ago that trying to use bcrypt in PHP was an exercise in futility.


I believe the Ruby example is incorrect: When checking a password's validity, you must use a constant-time comparison or else you are exposing a vulnerability to side-channel timing attacks.

There is an open issue in Coda Hale's bcrypt repo about this: https://github.com/codahale/bcrypt-ruby/pull/119

My stance on posting "best practice" articles is: you must follow all best practices in them.

<Edited for clarity>


As mentioned elsewhere in the thread, this isn't a "you must", this is a "you might as well".

Timing attacks depend on an attacker having control over the hash being compared (e.g. they have an HMAC in a cookie they're sending you, and they can adjust it character by character); with randomised secret salts and server-side hashing, this isn't the case.

The SCrypt example is the one I'm more concerned with. The defaults there are 1MB of RAM and 64-bit salts, both of which could do with increasing. I have an open issue on this: https://github.com/pbhogan/scrypt/issues/25

As it is, the example would probably be better as this:

    password = SCrypt::Password.create(usersPassword, salt_size: 32, max_mem: 16*1024*1024)
You'll be pleased to know SCrypt::Password#== is at least constant-time.


> I believe the Ruby example is incorrect

Unfortunately, there's nothing I can do about that, unless someone can point me to an alternative that uses a constant-time comparison.

I've left a comment on the pull request so that, hopefully, it can be merged.

> (and likely others)

Which others?


After re-reading the others in depth, all other examples appear to use appropriate comparison methods (knowing nothing of the underlying implementations). I've updated my comment to clarify.

I see your team is actively posting on the bcrypt-ruby issue #119 as we speak, so I guess I'd say wait for the PR to merge, or manually implement the approach they've outlined to secure compare: https://github.com/codahale/bcrypt-ruby/pull/119/files
