Mozilla's secure coding guidelines for web developers (wiki.mozilla.org)
250 points by girishmony on Oct 1, 2011 | 65 comments

    Invalid login attempts (for any reason) should return the generic error message:
    The username or password you entered is not valid
In practice, on any non-trivial website, it doesn't make a difference for security.

Registration forms will show a specific error when you try to register a username that is already taken. Password reminder forms will show an error when you request a reminder for an unknown e-mail. Some websites even have AJAX APIs for checking the validity of usernames/emails!

Because of that, it's easy for an attacker to check whether a username is valid. Vague error messages only make things harder for the legitimate user.

Why not emphasise user privacy more by hiding emails? E.g. "An email has been sent to the address above. For user privacy, we cannot confirm whether it exists."

I wonder how many email-scraping runs have been done against Forgot Password pages.

This could work, but there are still vulnerabilities. How about a timing attack? The page with that message probably responds faster if the email address doesn't exist. (Sure, a timing attack can be fixed if you can make the page run in constant time; but is it really cost-effective to spend time developing that rather than actual features?)

1) Validate email address. 5-10ms

2) Send a non-blocking request with a flag, eg "Good" "Bad".

3) Return a message to the user that an email has been sent, but that you cannot confirm whether the address exists, for privacy reasons.

While yes, it can come down to a timing attack, the trouble is that this vector can be used against sessions, logins, etc. It could be a standard for Mozilla to adopt.
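The steps above can be sketched as a handler that hands the address to a background worker and returns a generic message immediately, so response time doesn't depend on whether the address exists. This is a minimal illustration; the lookup and mailer helpers are stubs, not part of any real framework.

```python
import queue
import threading

email_jobs = queue.Queue()

def lookup_account(address):
    """Stub: replace with your real account lookup."""
    return None

def send_reset_email(account):
    """Stub: replace with your real mailer."""
    pass

def mail_worker():
    """Background worker: does the account lookup and (maybe) sends mail
    off the request path, so the response time leaks nothing."""
    while True:
        address = email_jobs.get()
        if address is None:
            break
        account = lookup_account(address)   # may be None; timing hidden here
        if account is not None:
            send_reset_email(account)
        email_jobs.task_done()

def handle_forgot_password(address):
    """Request handler: constant work regardless of whether the address exists."""
    email_jobs.put(address)  # non-blocking hand-off to the worker
    return ("If that address belongs to an account, a reset email has been "
            "sent. For privacy, we can't confirm whether it exists.")
```

A real deployment would run `mail_worker` in a thread or a job queue; the point is only that the slow, data-dependent work happens after the response is sent.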

This is a tricky issue, for the exact reasons you mention: an already-taken username inadvertently verifies its own existence, as can password reminders that use e-mail addresses as input. The latter can be handled by giving an ambiguous response, which comes with a slight (but negligible) cost of annoyance for forgetful users. The only method that comes to mind for avoiding username mining is forcing the user to pick from a fixed number of available usernames based on e.g. a part of the supplied e-mail address. This is a bit of "userland villainy", though.

sorry, but that's stupid.

A correct site (and I have corrected dozens of them; they work just fine) does not give a specific error message for password reminders either, or for any other function.

So the generic message should be returned from every such function, and I'm pretty sure they talk about password recovery as well. (Again: ANY such function should return a generic message.)

Mind you, it's much easier to compromise a site when you can check the username and just have to crack the password. You can automate it easily as well.

PS: oh, look, the next paragraph after what you pasted: "The following message should be returned to the user regardless if the username or email address is valid:" for recovery. Pretty sure you've seen it and voluntarily ignored it :-( mean mean mean.

How would you deal with a registration form that takes a username as a parameter, without leaking whether or not a username is already taken?

I would probably have a CAPTCHA on the form already, to prevent automated signups. Preventing username leakage is a side benefit. An attacker would need to hire a CAPTCHA farm to harvest any significant number of usernames.

If you don't want to use a CAPTCHA for regular signups, you can add one to the page dynamically when you see multiple registrations from the same IP address.
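The escalation described above might be sketched as a per-IP sliding-window counter; the threshold and window here are arbitrary illustrations, not recommended values.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 3600   # illustrative: look at the last hour
THRESHOLD = 3           # illustrative: show a CAPTCHA after this many attempts

_attempts = defaultdict(deque)  # ip -> timestamps of recent registrations

def needs_captcha(ip, now=None):
    """Record a registration attempt from `ip` and report whether the
    signup form should now include a CAPTCHA."""
    now = now if now is not None else time.time()
    recent = _attempts[ip]
    while recent and now - recent[0] > WINDOW_SECONDS:
        recent.popleft()        # drop attempts outside the window
    recent.append(now)
    return len(recent) > THRESHOLD
```

A production version would keep the counters in shared storage (e.g. Redis) rather than process memory, but the logic is the same.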

Does this mean that you wait until the user submits the whole login form to display an error message if the username is already taken? Do you display a new captcha for each attempt?

What about sites that let the user know his username is already taken using AJAX? Should this be avoided too?

I like zobzu's idea of using an email as login, though.

In the response, thank the user for initiating registration and send her an email with a link (and a token) to continue with the process.

Of course, the downside is that you're slowing the user down. It's acceptable for the sites that choose to require valid email addresses: if you're going to go there you might as well get it done sooner.

You'd also be spamming potential victims, but that may not be that bad, as you'd also be alerting them.

You could set a cookie after a successful login. If a later attempt fails because of a munged username or password, you could tell the user exactly what the problem was, provided you know they have logged in successfully before and the username is correct or "close" to the real one (not sure how close is "close enough").

EDIT: oops meant this to be a reply to the GP.

But then people would complain that you were tracking them after they've logged out of the website...

Email as login, plus emailing the user for verification, and a CAPTCHA against automated spam.

And if that's too annoying to implement, use browserid.org.

This is a great resource, but some of the input validation stuff doesn't sit well with me, for example:

> Examples of Good Input Validation Approaches... Firstname: Letters, single apostrophe, 1 to 30 characters

First, I'm not sure if I should interpret letters as [A-Za-z] or something more inclusive of non-Latin characters. But anyway, why restrict this so much? What about spaces, as in Mary Ellen; dots, as in P.J.? Heck, why can't I use a hyphen or a number? Just because you might not try to name your kid Brfxxccxxmnpcccclllmmnprxvclmnckssqlbb11116 doesn't mean nobody else will (http://en.wikipedia.org/wiki/Naming_law_in_Sweden#Protest_na...).

Perhaps I'm not seeing the forest for the trees here, but when it comes to restricting input, it always seems there's a risk of "We can not accept that last name" behavior (http://www.cooper.com/journal/2009/09/we_cannot_accept_that....). If you're properly sanitizing/escaping on the way out, why be so harsh on the way in?

One interesting/cool suggestion that I think is worth noting specifically: the use of HMAC+bcrypt instead of just bcrypt for secure password storage.


- The nonce for the hmac value is designed to be stored on the file system and not in the databases storing the password hashes. In the event of a compromise of hash values due to SQL injection, the nonce will still be an unknown value since it would not be compromised from the file system. This significantly increases the complexity of brute forcing the compromised hashes considering both bcrypt and a large unknown nonce value

- The hmac operation is simply used as a secondary defense in the event there is a design weakness with bcrypt that could leak information about the password or aid an attacker

I thought this was interesting as well. The second bullet is a bit misleading though. The benefit is not just in case there is a design weakness in bcrypt. The real benefit is briefly summarized in the first bullet: It's effectively forcing the attacker to compromise not only the database but also the file system to be able to make any offline password guesses.

With bcrypt alone, compromise of the password hashes still allows brute-force offline dictionary attacks. bcrypt means that each guess might take, e.g., milliseconds instead of microseconds, but an attacker has all of the information he needs to make offline guesses and check them against the compromised hash.

The hmac step means that an attacker who has the password hashes but not the hmac nonce effectively doesn't get any offline guesses. Assuming the nonce is e.g., 128+ bits, it'll be computationally infeasible for the attacker to guess the nonce itself, without which he can't verify any offline password guesses.
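As a sketch of that construction (not Mozilla's actual code): HMAC the password with a secret nonce kept on the filesystem, then feed the result to a slow hash. The stdlib's PBKDF2 stands in here for bcrypt, which is a third-party package; the structure is the same either way.

```python
import hashlib
import hmac
import os

# Illustrative value only: in practice the nonce ("pepper") is read from a
# file outside the database, and the per-user salt is stored with each hash.
SERVER_NONCE = b"read-me-from-the-filesystem-not-the-db"

def hash_password(password, salt=None):
    if salt is None:
        salt = os.urandom(16)                    # per-user salt (DB-stored)
    # Step 1: HMAC with the filesystem-held nonce.
    mac = hmac.new(SERVER_NONCE, password.encode(), hashlib.sha256).digest()
    # Step 2: slow hash (PBKDF2 standing in for bcrypt).
    digest = hashlib.pbkdf2_hmac("sha256", mac, salt, 200_000)
    return salt, digest

def verify_password(password, salt, expected):
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Without `SERVER_NONCE`, an attacker holding only the database can't even begin an offline dictionary attack, which is exactly the point made above.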

What I've been doing for quite a while is fairly similar. Instead of simply using a salt for each password I also have a random "site salt" (file system nonce), as follows:

  hash(password_plaintext + salt + site_salt)
Assuming an attacker can't access my site salt, is this less secure than using HMAC+bcrypt? (my hash function is fast)

Ironically, immediately after reading these guidelines, I checked my email and had just received an email from Mozilla's mailing list service that contained my password in plaintext. Oops. (To be fair, it looks like they're just using Mailman http://www.list.org/)

What's the point of this?

> Email verification links should not provide the user with an authenticated session.

It always bugs me. The "forgot password" link only allows me to choose a new password, but does not log me in, adding an extra step.

The point is obvious: it would give anyone who intercepts the (plain text) email full access.

However, the price for not doing this is pretty high in terms of conversion, so as far as I'm concerned it's not a black and white issue.

If there's nothing particularly sensitive to be compromised (and that usually isn't the case at this stage), simple measures like rapidly expiring the verification URL and allowing it to be used only once is "good enough" for most sites.

There are no absolutes in security.
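The "expire quickly, use once" measure mentioned above can be sketched in a few lines; the in-memory dict and TTL are illustrative, and a real site would persist tokens server-side.

```python
import secrets
import time

TOKEN_TTL_SECONDS = 15 * 60   # illustrative: 15-minute lifetime
_pending = {}                 # token -> (user_id, issued_at)

def issue_token(user_id):
    token = secrets.token_urlsafe(32)     # unguessable, URL-safe
    _pending[token] = (user_id, time.time())
    return token

def consume_token(token):
    """Return the user id once, or None if unknown, expired, or reused."""
    entry = _pending.pop(token, None)     # pop => single use
    if entry is None:
        return None
    user_id, issued_at = entry
    if time.time() - issued_at > TOKEN_TTL_SECONDS:
        return None
    return user_id
```

The `pop` is what makes the link single-use: replaying the same URL finds nothing to consume.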

If the link allows you to pick a new password without knowing your current password, anyone who intercepts the email with the link would also get full access using the new password.

It's talking about email verification, not password reset systems. In other words, those types of URLs should only establish a connection between an account and an email address: they shouldn't act as a means of authentication.

Ah ok.

Aren't both of these equivalent, though?

(I would think the point may be that a compromised email doesn't provide access later. But aren't both scenarios equally vulnerable in that case?)

It would also be to make sure an attacker can't just iterate through or guess at the emailed URLs and get valid, logged-in sessions without needing to properly authenticate.

Tagged.com allows authentication through most of the emails they send to you. Frightening from a security perspective.

Great from a usability perspective!

The only thing I can think of is that it allows the user to realise their account has been compromised, because the password has been changed. But I agree with you.

  Ensure that a robust escaping routine is in place to prevent the user
  from adding additional characters that can be executed by the OS (
  e.g. user appends | to the malicious data and then executes another OS
  command). Remember to use a positive approach when constructing
  escaping routines.
It surprises me that they consider sending client content to the OS at all. What is wrong with parameterized execution using functions like os.spawn*, which place arguments straight into the called function's argv list?
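The parameterized approach the parent is describing can be shown with the `subprocess` module: when arguments are passed as a list, no shell ever parses them, so metacharacters in user input are inert.

```python
import subprocess

# Hostile-looking input: every character here is shell-significant.
user_input = "foo; rm -rf / | cat /etc/passwd"

# Safe: the list form goes straight into argv, so the whole string is a
# single literal argument to echo. Nothing is interpreted by a shell.
result = subprocess.run(["echo", user_input], capture_output=True, text=True)

# Dangerous (don't do this): the string would be handed to a shell, which
# would happily execute the "; rm -rf /" part.
# subprocess.run("echo " + user_input, shell=True)
```

As the reply below notes, this only protects against the shell itself; if the called program re-interprets its own arguments (the ssh ProxyCommand case), you still need to validate.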

There's more than just Python, you know. You have to take care of this regardless of the engine behind it.

Besides, even parameterized, you have to be careful. I'll remind you of the 2004 Safari handler exploit. Apple fixed the first exploit by using a parameterized argument list. Woot. The next day it was exploited again, because some programs call execve on their argument list; for example, ssh -o'ProxyCommand=exploit here' is two arguments (-o and the complete ProxyCommand line, including the exploit).

This is valid for websites as well.

I think they are also talking about file/directory names constructed from user input.

Ensure the "tweet this" or "like this" button does not generate a request to the 3rd party site simply by loading the Mozilla webpage the button is on (e.g. no requests to third party site without user's intent via clicking on the button).

Thank you for this.

I'm not sure you can do this without voiding the agreement with twitter/facebook unfortunately.

I'd suggest the use of Ghostery (works on all major browsers) on the client side, and on the server side, well, you know, not using any like button :>

Heise Online worked out a solution[0]: a two-click Like button. The first click replaces a placeholder with the official Facebook Like button. Facebook objected at first, but only because Heise tried to make their placeholder look official. A quick design change allowed them to keep it.

[0]: http://news.ycombinator.com/item?id=2957119

Yeah, that's why I'm not sure. It could be attacked depending on the country, I guess, and if you don't use a Facebook-like icon, it's hard to tell what you're going to like. Debatable, I guess.

Twitter allows custom links/buttons. These don't require loading JavaScript or making any calls to their services.

Most GET requests to third-party sites are fine; you need to be more precise.

A GET request for an embedded resource exposes the user's cookies for that domain and identifies a user of one site as a user of another. This is fine when it happens at the user's explicit request, but when paired with certain sites known not to delete all cookies on logout, it is nefarious and should not be done.

A lot of people want to get into web development. One thing they have to understand is that while the barrier to entry is low, there are a ton of nuances that separate a mediocre web developer from a great one.

These guidelines are a good example of what web developers have to deal with on a daily basis. Certainly not trivial.

I see myself as quite competent, but I still wouldn't trust myself to catch all the nuances. And even if I could it would be too much work for every new project.

I think this is a good argument for the existence of a suite/library that manages things like password storage, recovery, validation, etc. Integrating everything from using good salts and hashes to captchas and retry delays.

Passwords must be 8 characters or greater

Half of the top 50 cracked Gawker passwords were 8 characters (and longer passwords were not exposed, due to the nature of the vulnerability). Since 8-character passwords are vulnerable to a known common weakness (the DES-based crypt scheme truncates at 8 characters), this should be revised to:

Passwords must be 9 characters or greater

This will prevent your users from using passwords that are vulnerable to the DES attack if they reuse them on other sites.

Let's think about that for a minute. DES-based crypt truncates at 8 characters, so you're right that if a database of DES-crypted passwords leaks and is brute-forced, an attacker will only get the first 8 characters of a password.

But what if my password is the word "biological"? By knowing the first 8 characters, the attacker has drastically reduced the number of guesses that need to be made (assuming a priori knowledge that the password is shared between sites).

Also consider MD5(PASSWORD) and SHA1(PASSWORD). Those are both fairly common constructions for "secure password hashing" [note: they're not really secure] in web applications and both of those would yield up the entire plaintext password if an attacker used a brute-force or rainbow table attack.

If you're designing a secure web application, you can't make your goal to secure all the other websites on the Internet. Bumping the minimum number of characters to 9 wouldn't significantly impact the security of your users. If you're really worried about a situation where a user's password is disclosed, you should consider offering two-factor authentication options for your users.

All good points, but allowing 8 characters still allows '12345678' and 'password', two of the most egregious examples of weak passwords. Granted, weak passwords will always scale to the next minimum ('123456789' or 'passwords' for 9 characters), but 8-character passwords are already among the lowest-hanging fruit, so including them in the minimum is misguided.

Don't forget 'password1' ;-)

And the guidelines specifically say "Blacklisted passwords should be implemented (contact infrasec for the list)" which indicates to me that known common passwords like '12345678' and 'password' will be disallowed (although we don't have access to the list).

My opinion (and we may have to agree to disagree on this point) is that adding one character to the minimum is not going to make a significant difference in application security. I don't believe it mitigates the danger of a leak of DES-encrypted passwords. If you're concerned about a scenario where a user's shared password on another site is compromised, your application can use two-factor authentication or mandate the use of strong pass-phrases instead of traditional passwords.

OT but scary: http://michaelkimsal.com/blog/wp-content/uploads/2011/06/Scr...

This is a financial institution.

I have a question about the password policy.

    All sites should have the following base password policy:

    Passwords must be 8 characters or greater
    Passwords must require letters and numbers
    Blacklisted passwords should be implemented (contact infrasec for the list)
Is it the responsibility of the website to make sure that passwords are strong for the general user? Isn't it the user's responsibility to create a good password? I would think the site should inform the user about best practices, but ultimately it should be up to the user whether to follow them.

> Is it the responsibility of the website to make sure that passwords are strong for the general user?

Yes. If a password is too weak and it results in a user's account getting compromised, the website gets the blame. Thus, the website mandates the password policy.

And depending on the site, weak passwords and compromised accounts directly impact other users. Imagine if eBay had no password policy and some of their largest sellers had their accounts compromised as a result; what impact would that have on buyers?

Enforcing password strength makes sense for a site which stores sensitive data. That said, some simple math suggests that length, not character complexity, should be required.
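The "simple math" is just entropy: bits ≈ length × log2(alphabet size), which shows why adding length beats adding character classes.

```python
import math

def entropy_bits(length, alphabet_size):
    """Back-of-the-envelope strength of a random password:
    length * log2(alphabet size) bits."""
    return length * math.log2(alphabet_size)

# 8 chars drawn from all 94 printable ASCII characters:
complex_8 = entropy_bits(8, 94)    # ~52.4 bits
# 12 chars drawn from lowercase letters only:
long_12 = entropy_bits(12, 26)     # ~56.4 bits
```

A 12-character lowercase password out-scores an 8-character "complex" one, despite being far easier to type and remember (assuming both are chosen randomly; human-chosen passwords have much less entropy than this).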


I wonder if there are any sites that simply forbid passwords that are known to exist in rainbow tables.

I'd assume that's what they're talking about with the 'use blacklist'. It'd be easy enough to occasionally repopulate it with "obvious" or known-compromised passwords that turn up.

Likewise, I assume they're keeping that list semi-secure to avoid black-hats/kiddies getting their hands on a list of really good passwords to throw into their cracking engine ruleset.

Great feedback. I'm glad to see this guide was helpful and I've made a few enhancements/updates based on these thoughts.

-Michael (@_mwc)

  Example A field accepts a username. A good regex would
  be to verify that the data consists of the following:
  [0-9a-zA-Z]{3,10}. The data is rejected if it doesn't match.
I guess then that pg won't be able to sign up for your service... Nor will donfernandovillaverde79.

In the example given? No, the bounds are restricted. The actual text above though is the important bit:

    The variations of attacks are enormous. Use regular expressions 
    to define what is good and then deny the input if anything else is received. 
    In other words, we want to use the approach "Accept Known Good" instead of 
    "Reject Known Bad"

On a related note, one of the web sites I frequently use stripped all spaces from my password without notice before storing it on the server. So when I registered with a password like 'I am Sam', I found I could only log in using 'IamSam'. Any attempt to use the original password caused an error. I reported this to the site admin, and the solution they came up with was to silently strip the spaces from the password as it is typed into the form. Now I can type in 'I am Sam', but 'IamSam' is the actual password sent to the server. File this under '2 wrongs don't make a right'.

A while ago, Tracfone's web site had a rather disappointing bug. Both the account registration and login flows did password validation, and used different validation functions. The result was that I could create an account with a password containing special characters that the authentication system would reject as containing invalid characters (and therefore not even try to verify against my stored password/hash).

Moral: account registration and authentication must use the same password normalization functions, and if you validate at auth time (which is pointless, but hey), the validation function must be the same as the registration one.

Better moral: just don't do silly things with passwords. Hash them and store them, accepting whatever the user wants to send you that's sufficiently long/high-entropy.

The example is there for the guideline about "accept known good".

That said, it could confuse people. I'd suggest filing a bug at bugzilla.mozilla.org to add a broader example, or to specify that you might want different characters in the known-good list, especially for people who write in non-Latin scripts, because yeah, many would just copy-paste it without thinking further.

This also keeps out users of non-Latin alphabets. Like half the world's population.

It's difficult to accept non-latin characters in usernames without making username spoofing trivial. For example, I could sign up for an account as `аw3c2', pretending to be you. What looks like `a' there is actually `а', a Cyrillic letter with its own Unicode code point. Assuming a website where user impersonation is an issue (say, an auction site where a homoglyph attack could be used to scam a seller into sending the goods to a different address), one would need to blacklist all likely homoglyphs and duplicate characters, or just stick to ASCII. (I'm sure users are accustomed to using ASCII usernames, though it's not ideal.)
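The spoof described above is easy to demonstrate: Latin 'a' and Cyrillic 'а' render identically but are distinct code points, so the two usernames compare unequal while looking the same on screen. The crude ASCII-only check is the policy the comment suggests.

```python
import unicodedata

latin = "aw3c2"
spoof = "\u0430w3c2"   # first letter is CYRILLIC SMALL LETTER A

assert latin != spoof  # different strings, indistinguishable when rendered
assert unicodedata.name(spoof[0]) == "CYRILLIC SMALL LETTER A"

def has_non_ascii(name):
    """Crude homoglyph defence: reject anything outside ASCII."""
    return any(ord(ch) > 127 for ch in name)
```

A fuller defence would use a confusables/skeleton mapping rather than banning all non-ASCII, but that's considerably more work.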

It also makes it hard for users to log in using a dumber input device than usual.

Then there is unicode normalization. OSX might decide to encode "e-with-an-accent" as two code points, while Windows will combine them into a single code point. Users will not be impressed if they can't log in because their OS doesn't use the right encoding and the web site forgot to normalize.

In this case you should use Unicode normalisation to turn all input into one canonical form, either with or without combining diacritics.
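The é example above, concretely: the precomposed and decomposed spellings compare unequal until both are normalized to one canonical form (NFC here; NFD would work equally well as long as you pick one).

```python
import unicodedata

precomposed = "caf\u00e9"      # é as a single code point (U+00E9)
decomposed = "cafe\u0301"      # e + COMBINING ACUTE ACCENT (U+0301)

# A naive comparison fails even though both render as "café":
assert precomposed != decomposed

# Normalizing both to NFC makes them compare equal:
assert (unicodedata.normalize("NFC", precomposed)
        == unicodedata.normalize("NFC", decomposed))
```

Normalize once at the input boundary (registration and login), and store only the canonical form.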

This is brilliant. I am wondering: are there other secure coding guidelines for web devs? I usually refer to Stack Overflow for questions about security, but I've often wondered if there was a set standard.


I was actually wondering how the Mozilla guidance is different.

Mozilla's websecurity lead is also chairman of OWASP. 'Nuff said :-)

Ruby on Rails also provides guidelines for web security: http://guides.rubyonrails.org/security.html

I love the standardisation of generic answers. First thing coming to my mind as a non-native speaker: is there a way to provide translated versions of it inside the wiki?

Good security practices and ease-of-use are often at direct odds with each other.
