Web Developer Security Checklist (sensedeep.com)
548 points by aeronautic on May 16, 2017 | 244 comments

While it means well, I think some of this advice is pretty bad, or at least unbalanced. From the very first section:

> Encrypt all data at rest in the database

ALL data? How are you supposed to query against it then? What does "at rest" even mean in the context of an always-on database? An encrypted partition or something?

I'm not even sure what this is supposed to mean and it is certainly not common practise.

> Use secondary encryption for data identifying users and any sensitive data like access tokens, email addresses or billing details

So two layers of encryption!? Again, how is one supposed to look up an access token or email address if it is encrypted? And it strikes me that if you don't trust the first level of encryption, the solution is to fix that, not add another one.

> Fully prevent SQL injection by only using SQL prepared statements and stored procedures

That is an extraordinarily inefficient and costly way to develop. The correct way to protect against SQL injection is to use a framework/driver which guarantees escaping and to be cautious about what you allow into queries.

I could go on. The costs of implementing these recommendations would be staggering for the questionable benefits they would provide. It is absolutely possible to have excellent security without implementing any of these ideas, and I don't think they're helpful at all - and certainly not "simple".

Totally with you on the DB encryption but I'm not sure about your comments on prepared statements

>That is an extraordinarily inefficient and costly way to develop. The correct way to protect against SQL injection is to use a framework/driver which guarantees escaping and to be cautious about what you allow into queries.

I've always found using only prepared statements is feasible and justifiable. What is it about using them that adds to cost and inefficiency?

Prepared statements are not a silver bullet. Some things, like LIMIT clauses in MySQL, are very hard to do with prepared statements, but for the most part it's a good rule of thumb.
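
For a driver that can't bind a LIMIT value, one common workaround is to coerce the value to an integer and clamp it before it gets anywhere near the query string. A minimal sketch, assuming a users table; `safe_limit`, `build_page_query`, the default of 20 and the cap of 100 are all made up for illustration:

```python
def safe_limit(raw, default=20, maximum=100):
    """Return an int in [1, maximum] that is safe to interpolate after LIMIT."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        # Anything that is not a plain integer falls back to the default.
        return default
    return max(1, min(value, maximum))


def build_page_query(raw_limit):
    # Only the validated integer is formatted into the SQL text;
    # every other user-supplied value should still be a bound parameter.
    return f"SELECT id, name FROM users ORDER BY id LIMIT {safe_limit(raw_limit)}"
```

Something like `safe_limit("10; DROP TABLE users")` simply falls back to the default, because the string never parses as an integer.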

My apologies - my brain focussed on "stored procedures" and that's what my rant was addressing. Prepared statements are great and I'd use them wherever possible - totally agreed.

Stored procedures offer no inherent protection.

Often they will make the problem worse if you're doing string concatenation to create dynamic queries in a stored proc.

Nothing wrong with stored procedures. Like anything else, they might not be right for every application.

IMO you should either (A) avoid customizing the database and treat it as a relatively commodity data store or (B) treat it as its own service with its own developers (DBAs) that just happens to use an SQL connection as its network interface.

Some of the most horrible systems are when people just went half-way, and you've got logic randomly scattered between two systems with two programming languages etc.

The image of sprocs continues to suffer from "store all application logic in the database" syndrome, which was all the rage in the early and mid 00s before APIs started to take off. There's nothing necessarily inherently wrong with that approach, but most people have gravitated away from it. DBA/dev split probably had something to do with it.

I'd venture the real reason is many devs know too little of normalisation, data structure and structured data.

Somehow a mess of structures/data objects paired with functions working on these objects - with only ad-hoc ways of enforcing foreign key relationships are seen as "easier", and "better for maintenance" than leveraging an actual (relational) database management system.

Sure it's more fun writing functions than thinking about data integrity - but the vast number of applications are concerned with managing structured information: from scenes in a game that could be interlocking state-machines to the more mundane user/message pairs or similar.

Design is eschewed for prototyping - but then the prototype is kept as the production system, rather than as a basis for design of a better (simpler) system.

If document databases like MongoDB have taught us anything, it's that developers hate any sort of data organization, which is ridiculous and ironic since most of their day-to-day is about manipulating data. Mongo is popular because it allows the developer to stuff a value without thinking about its structure and role, generally leading to utter disaster further down the road.

I do think the biggest problem is that it brought everything into the DBA's turf, and made it too difficult for devs to do their jobs without meddling. As long as there is a dev/DBA organizational dichotomy, there will have to be a structural dichotomy in the software (cf. Conway's Law).

I wonder how many developers would consider using the filesystem as a document store bad practice, but at the same time be happy using mongodb.

It's odd that with the rise of solid, free rdbms' like postgresql (and the current generation of mysql forks) - the general reaction among developers hasn't been to realize that 99% of systems are/should be data-first/data centric - but rather a stubborn struggle to pretend their language supports transaction-guaranteed persisted runtime/memory -- without actually using a real object database like zoodb (for python) or gemstone.

Most (good) ORMs/drivers will be transparently using prepared statements anyway. It explicitly informs the DB "this is unsafe user input" instead of trying to approximate that message with escapes.

Some of this advice seems to come out of a cargo cult development handbook. That may sound a little harsh, but it's better to have a tendency to take any article that states "Always do X" or "Never do Y" with a handful of salt.

If you don't know enough about your own system's requirements, lists like this are going to have you doing work that you don't understand or that doesn't need doing (or worse, is detrimental); and if you do, you probably don't need to be using checklists like this to do your job.

Thanks for this, I'm planning on developing a small web application over the summer for one of my gaming interests and as I was reading this I was wondering how practical any of it would be for my project. Don't get me wrong, I plan on building in security, but I'm planning on building a Rails app where the most sensitive data contained is an API key; it would hardly seem practical to build Fort Knox.

I'd recommend you start with the items at the very end of the check list. Make a list of the threats and plan who you need to defend against.

That will then allow you to cull the list down.

Totally agree about the encryption. I've set up databases to run on encrypted disks (LUKS or nowadays on AWS you can get encrypted EBS), but I feel like that is really just to tick the "encrypted at rest" compliance checkbox, because if the machine is turned on, anyone can read the data (subject to normal file permissions of course). As far as I can tell the only threat this is protecting against is someone physically pulling the drive and walking off with it.

Also this business of two layers of encryption: yep, anything encrypted can't be indexed or searched (basically). At best you can encrypt the query input too and compare for strict equality---sort of like hashing passwords. (Is that what he means?) I've never seen anyone hash email addresses though. And encrypting billing details . . . okay, I guess, although if the app has the decryption key, then what security are you really adding?

We use disk encryption to protect against a specific attack that had occurred in the past: the attacker hacks the hosting provider's admin panel and uses it to reboot our server into a rescue system, in which he has access to the raw disk.

Interesting, what hosting provider/system are you talking about here?

It seems to me having admin panel access is an even higher level privilege than having root on the box itself, for any VPS host environment I've personally used before. Linode for instance lets you open up a root shell to your running box, which doesn't even use SSH. I'm surprised it isn't total game-over if your admin panel access is compromised.

The provider that got hacked was Linode. That was a real bummer: we secured the server to the teeth and then we got hacked through a channel that we had no control over. :(

But there are many providers out there where through the admin panel you can gain access to a system. Many providers provide access to the server's terminal. Even if you can't login, you can reboot the server, and at the boot loader stage you can boot the OS in rescue mode, during which you can mount the hard disk.

Good example! So many vulnerabilities come from a lack of imagination. :-)

If the hacker gets access to the database due to a network config error, then they will be able to query the database, but only get records with some sensitive fields encrypted.

Given the number of email addresses that have been hacked and stolen (3.75 billion in Troy Hunt's haveibeenpwned alone), I believe doing one little extra bit of encryption on email addresses is worthwhile. We push that into our ORM layer and can easily flag any field in the database to be encrypted.

On very, very high volume accessed tables, you may not want to do that. But for user logon details -- for us it was a no brainer.

Psst, I really hate to break it to you but you are "stealing" and "hacking" your users' email addresses every time you send them an email!


Wait, so you encrypt the user's email before it's inserted into the table?

Are you using a different field for 'username' on logins, or do you have to jump through some hoops to find an encrypted email to do a login check against the pw hash?

Can't you encrypt the entered email and search with that?

Password hashes are handled separately using bcrypt - another story.

For the User.email field, we crypt/decrypt in the ORM/DB layer. So we search with the encrypted value that is stored in the db. App code provides plain-text email, ORM encrypts and searches with that value. Symmetric encryption of short strings is pretty fast and this is not a high volume API.
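
A rough sketch of the "app provides plain text, the DB layer searches on a derived value" idea. Note the swap: this uses a keyed HMAC as a lookup column (sometimes called a blind index) rather than the reversible symmetric encryption described above, because Python's standard library has no AES; a real setup along these lines would presumably store the ciphertext next to it. `INDEX_KEY`, `email_index`, `find_user_id` and the table layout are all hypothetical:

```python
import hashlib
import hmac
import sqlite3

# Hypothetical key; in practice it would come from a secrets manager, not source code.
INDEX_KEY = b"example-only-key-do-not-use"


def email_index(email: str) -> str:
    """Deterministic keyed digest of a normalized email, usable for equality lookups."""
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(INDEX_KEY, normalized, hashlib.sha256).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email_idx TEXT UNIQUE)")
conn.execute("INSERT INTO users (email_idx) VALUES (?)", (email_index("Alice@Example.com"),))


def find_user_id(email: str):
    # The app passes plain text; only the digest ever reaches the database.
    row = conn.execute(
        "SELECT id FROM users WHERE email_idx = ?", (email_index(email),)
    ).fetchone()
    return row[0] if row else None
```

Equality lookups work on such a column; LIKE, range scans and per-domain counts do not.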

We use rate limiting on the API to protect against DOS on this API.
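
For anyone wondering what rate limiting can look like in code, here is a minimal token-bucket sketch. It is only an illustration of the general technique, not the setup described above; the class name, the parameters and the injectable `clock` are assumptions made for the example:

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

You would typically keep one bucket per client (IP address or API key) and answer with HTTP 429 whenever `allow()` returns False.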

If it's symmetric, then doesn't an app vulnerability make it possible for you to leak your key? Then the app also contains access credentials to the DB, so vulnerabilities in the app will still lead to real access to the DB, right?

Yes, we're not protecting against the app being attacked, the key being jacked and then the attacker getting access to the db. Rather, we're protecting the database against being accessed directly. That could happen due to an error in configuring network access to the database or another service could be compromised that has access to the same database -- as has happened to many mongo databases over this year.

So your solution to fixing a basic ops problem 'software package A was exposed to the internet with(out|effectively no) auth' is: encrypt shit in an ORM layer and destroy your ability to do anything beyond absolute string comparison queries.

No thanks.

No, we don't encrypt indiscriminately. We selectively pick fields to encrypt - fields that are highly sensitive.

And you are right, we do this because ops can easily make a mistake sometime in the future and probably will one day. We all make mistakes and defense in depth is all about that.

It does seem like emails are a good fit for encryption. I can't see wanting to do anything more than simple equality checks.

I could see a use case to see which emails are from what TLD.

For instance, all @gmail, all @yahoo, all @aol.

Maybe you want to do the cool "hey we noticed your email on haveibeenpwned, you should change your password here just in case". In which case, anything other than plain text could prevent that from happening.

Hashing an email in that sense gets much more difficult, no?

If you have enough emails that you can't SELECT * FROM users and do that query in memory, you're probably in a spot where you should not be picking security advice from a blog like this.

(That's not a negative as to this blog, it's really good and I've recommended it to multiple clients already, but that level of acumen should already be assumed at that scale. If you don't have it, this blog post is insufficient.)

This is an interesting solution, thanks for sharing. This would be a cool feature to build into open source ORMs.

This is a good idea, but only for fields which can never be used in range based index lookups.

I see. Makes sense. Thanks for sharing this approach.

Never mind that email addresses are sent in the clear all over the internet every day in normal SMTP traffic?

I am wondering if you are clear on exactly what database encryption protects you against.

Hint: not much at all.

Maybe it's useful to mention security procedures for VPN privacy services. You accept credit/debit card payments, and need to retain records for dealing with fraud, satisfy VAT requirements, etc. Your exit servers need to check account status, check for existing connections to limit concurrency, and so on. But in case adversaries have your exit servers seized, you want them to be clean. So you run them read-only, with no persistent storage, and you do authentication etc with remote servers, using hashed account names and passwords.

Countermail, for example, does the same for its email servers. For Tor relays, there's tor-ramdisk.[0]

0) https://lists.torproject.org/pipermail/tor-talk/2017-Februar...

> As far as I can tell the only threat this is protecting against is someone physically pulling the drive and walking off with it.

Are you saying that that threat isn't worth protecting against?

It also makes it easier to move servers, recycle hard drives etc - safe in the knowledge that the whole disk was encrypted - without worrying about where the used disk ends up.

The only real alternative is to manage your own servers in house and degauss/drill through old drives. And this might be more of a hassle (at scale) with ssds.

Consider a faulty disk that is replaced. It might be possible to recover data by using advanced techniques - but might not be possible to remotely wipe it. Now you have to hope your provider manages to safely dispose of the drive.

No, and I've set it up many times. It's just . . . whenever I talk to people about this I get the impression they think it secures them against more. Non-technical people especially (but not just them) worry about someone "hacking into the system". Full-disk encryption is almost more like physical security than digital security.

Every potential threat should be considered, but weighed against likelihood when prioritizing.

Sometimes you make cost/benefit calculations. 100% airtight would be great, but 100% airtight doesn't actually exist. Most organizations with finite resources have to make risk/reward decisions when there are as many threats as there are with web development.

The cost for encryption at rest is single digit percentages these days.

I push for encryption at rest on ANY cloud provider storage for one main reason: I have no control over their disk disposal or reuse mechanisms. They can claim they wipe the data, but I have no way to test that reliably.

As for disk encryption plus FS encryption, keep in mind that in AWS or Azure it's possible for a misconfigured IAM or SPN to leak access to the disk blob. If it was encrypted with a separate key, the risk is mitigated. Again, it's just too easy to implement on almost every cloud provider these days.

The cost of full encryption is minimal on platforms like AWS Aurora. If the platform supports it - this is a good way to go. The incremental cost of encryption with Aurora is very low (< 3%). At rest, means that the data on disk is encrypted if the DB itself is compromised.

The 2nd layer is if the server-app is compromised. In this case, the data has already been decrypted by AWS. The 2nd layer means that credentials, emails etc are safe in the server-app tier.

You can lookup tokens by decrypting first. I'm not sure I'm understanding your question properly.

Regarding prepared statements, the mysql2 node driver is equivalent to the non-prepared driver in speed and infinitely safer.

I agree you don't necessarily have to do all these things, but at least think about them. We do all these items in our software and have not found it onerous. Happy to provide more details about this if you would like.

> At rest, means that the data on disk is encrypted if the DB itself is compromised

Right, so basically block-level encryption of the actual DB data file. While this would be much easier than encrypting the record data at the row level, it only protects against literal copying of the data file, which in my experience is not a common threat vector.

You're right though, it is simple enough to do this with something like aurora - although outside of AWS it may be very troublesome, with possible performance ramifications.

I guess my point is that this article is presented as a basic checklist for all web apps, with insufficient consideration of the ROI for implementing these solutions.

> You can lookup tokens by decrypting first

A user attempts to access your API, presenting an access token. How do you query your DB for that access token if all your tokens are encrypted at the row level? You could encrypt it at the app level and search for the cyphertext, sure, but now your server needs to have the secret, and it needs to be global, seemingly removing the point. Every other aspect of your application (notifications engine, admin portal, anything else that needs the DB) must re-implement this scheme.

Again, it might be useful, but the ROI is questionable, IMO.

> Regarding prepared statements

I misspoke here really - my reaction was against the stored procedures, not prepared statements, which are great and should be used wherever possible.

My main point is that security, and all development really, is about trade-offs. The options here are presented without due consideration of the benefits and costs of the tradeoffs they represent - and the article doesn't ask you to just "think" about them, it's a checklist. If you are a bank with 500 developers and unlimited time, sure, do all of these. If you're a startup - questionable at best.

Having a system encrypted at rest is a huge reduction in legal liability under HIPAA. It may be similar in other industries. On the other hand I might not care if someone stole my recipe collection.

Good point. I did say:

"This checklist is simple, and by no means complete. It is a list of some of the more important issues you should consider when creating a web application."

But I'll try to make that clearer. I'm limited by space and people's attention span. Having links to implementation notes will address your concern I think.

> I'm limited by space and people's attention span

Might I suggest a simple "tag" for each suggestion indicating the appropriate market for that point. Eg good - your basic SAAS/casual web app. Great - apps dealing with moderately sensitive data. Extreme - you're building a bank, seeking PCI compliance, storing medical data, etc.

You could break it out at the top, and then it doesn't take much space in the body of the post, but still provides some context for which projects should be adopting this level of paranoia?

A tag like this would be a great idea. I'm playing around with a very basic casual web app that will have no sensitive data besides a hashed username/password. If someone wants to hack the database and give themselves nearly unlimited in a game with no leaderboards and share it with their friends I could care less (I'd actually be happy because that means someone actually cares about the app). I'll be using a BaaS to store data so I don't think I have to worry about things like sql injection but I have no experience with web work so correct me if I'm wrong.

>> Fully prevent SQL injection by only using SQL prepared statements and stored procedures

> That is an extraordinarily inefficient and costly way to develop.

Why is this inefficient? Can you elaborate?

As mentioned beneath, the comment was secretly fixated on the 'stored procedures' side and not the 'prepared statements' side. Stored procedures have many costs associated to them; for example, they don't get version-controlled by default and they don't lend themselves easily to a development/production schism; both of these need to be solved by writing out some sort of database-synchronizer-script which will handle fallover. In addition usually you cannot issue breaking changes to an SPROC without scheduled downtime, so you have to intentionally have a flow of "create new SPROC, make code use the new SPROC, deploy across the codebase and audit for the old SPROC's use, finally remember to drop the old SPROC so that old code doesn't get recycled by someone who is looking for the first thing that does what they want." Compare that to just "commit a new patch, then upgrade servers as soon as possible".

Until you deal with important data, like records indicating money in a bank account. Then it's more like "tough shit you can't just slap an ORM on the direct table data".

Most people don't deal with important records though, and generally not people here (though a shout out to people from Stripe and other payments companies!). So I wouldn't recommend stored procedures for this crowd either.

I do recommend prepared statements, but they do have a cost since it is two network calls for one query instead of one. Totally worth it too.

In my development style every database DDL change is done with a version-controlled SQL file piped through psql and/or apgdiff. So I guess these points do not apply to me, since there is no difference between how normal code and database code are handled.

I am not the original poster, but some queries can't be moved to stored procedures (depends on SQL software and library abilities). For example:

IN clauses can get tricky if prepared statements don't support arrays / lists of columns.

Adhoc queries get tricky where user could specify a parameter or not.
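
Both cases can usually be handled without giving up bound parameters: generate one placeholder per list element, and assemble the WHERE clause from fixed fragments. A sketch using Python's sqlite3 module, so the `?` placeholder style is that driver's; the `users` table and `find_users` are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO users (id, name, country) VALUES (?, ?, ?)",
    [(1, "Ann", "NZ"), (2, "Bob", "US"), (3, "Cy", "NZ")],
)


def find_users(ids=None, country=None):
    """Build the WHERE clause from fixed fragments; all values stay bound parameters."""
    clauses, params = [], []
    if ids is not None:
        if not ids:
            return []  # an empty IN list is not portable SQL, and nothing can match
        # One "?" per element: the SQL text depends only on the list length.
        placeholders = ", ".join("?" for _ in ids)
        clauses.append(f"id IN ({placeholders})")
        params.extend(ids)
    if country is not None:
        clauses.append("country = ?")
        params.append(country)
    sql = "SELECT name FROM users"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]
```

The query text still varies from call to call, but only in ways the code controls; no user-supplied value is ever formatted into it.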

This answer is more about stored procedures than prepared statements.

That is right. The key point is to reduce exposure to SQL injection by not formatting queries. Prepared statements help solve a whole class of bugs at a lower level.

>>ALL data? How are you supposed to query against it then? What does "at rest" even mean in the context of an always-on database? An encrypted partition or something?

Most database encryption is "transparent", meaning the applications won't even be aware that their database is encrypted. Here is an example:


This can probably explain more succinctly than I can.


Database encryption at rest is a normal practice for PII data.

MySQL example: https://www.mysql.com/products/enterprise/tde.html

Percona article: https://www.percona.com/blog/2016/04/08/mysql-data-at-rest-e...

I think the author is right, and you might not be.

"Data at rest" means the data that you have no intention of querying soon. Which implies "hot" / "cold" data partitioning, which is usually a good idea. Can be complicated, but commonly encountered in financial backends.

Two layers of encryption can be better than one (if logically and temporally separated). Encryption algorithms are usually not a problem; credentials compromise or out-of-band vulnerabilities CAN be a problem.

Finally, ANY type of input sanitization is the wrong way to do security, and should be employed only as an absolute last resort. Prepared statements provide strong semantic separation guarantees, and you can use ';drop table users; _safely_ as your query parameters. Otherwise you'll be stuck between the requirements corner cases (do you know that legal names can include quotes and apostrophes?) and security requirements, and it is a war that is impossible to win.
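
To make the "semantic separation" point concrete, here is a small sketch with Python's sqlite3 module. The table and the two sample names are made up; the only claim is that a bound parameter is stored verbatim, whatever quotes it contains:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

hostile = "'; DROP TABLE users; --"
legal = "Miles O'Brien"

# Values travel separately from the SQL text, so quotes need no escaping.
for name in (hostile, legal):
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))

stored = [row[0] for row in conn.execute("SELECT name FROM users ORDER BY id")]
```

Both strings come back exactly as they went in, and the table is still there afterwards.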

All significant modules, whether facing the Internet or internal to a large system, should validate their inputs to prevent propagation of BadThings(TM).

You can choose how much to validate, which is probably more in the former case, but it should be done.

I can't tell you the number of horrors I'm sure we've stopped in (for example) investment banking pricing by avoiding trying to do calculations on junk when the junk was easy to spot. (We also added one or two, eg when one set of interest rates went above 100%, but that's an after-dinner story!)

...and this is why Gerard 't Hooft, the famed theoretical physicist, can't register with his real name nearly anywhere. :(


That sounds like BAD validation that needs fixing, not an excuse not to do any.

Some US sites still won't let me in because (a) they insist that I must have a middle name or (b) screw up the hyphen in my surname.

Many sites and payroll systems assume that everyone has a surname. Never mind its character set and ordering.

Even if you don't make these assumptions you don't need to accept a binary gzipped GB of zeros or worse in a name input field.
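
A sketch of what "basic sanity checks without assumptions about name structure" might look like. The 200-character cap and the function name are arbitrary choices for illustration, not a standard:

```python
import unicodedata

MAX_NAME_LENGTH = 200  # arbitrary illustrative cap, not a standard


def name_problem(value):
    """Return a reason to reject `value`, or None if it passes basic sanity checks."""
    if not isinstance(value, str):
        return "not text"
    if not value.strip():
        return "empty"
    if len(value) > MAX_NAME_LENGTH:
        return "too long"
    # Reject control characters (category "Cc"), e.g. NUL bytes or raw newlines.
    if any(unicodedata.category(ch) == "Cc" for ch in value):
        return "control character"
    return None
```

Apostrophes, hyphens, single-word names and non-Latin scripts all pass; a megabyte of junk or a run of NUL bytes does not. None of this is meant as an injection defence, it only keeps obvious garbage out.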

Designing a _good_ validation system (to reliably discriminate between "good" strings and "bad" strings, including all corner cases) is really hard, error-prone, not future-proof (standards change) and might be even theoretically impossible in some cases.

Some basic sanity checks are useful, of course. But it is not a good idea to use input sanitization as the only (or even main) method of injection attacks prevention.

> But it is not a good idea to use input sanitization as the only (or even main) method of injection attacks prevention

Can you define what you mean by "input sanitization"? Because in my mind, a prepared statement is doing just that. You say what part is your SQL, what part is your user input, and you let the DB adapter sanitize the input to build the final statement. You aren't writing sanitization code yourself, you're leaving it up to a library, but that's still what's happening. Or do you have a different term for what's going on when a prepared statement routine converts user input into SQL-safe strings?

I never said:

1) That it was easy to do right: that's why we get paid.

2) That it was an excuse not to do other checks elsewhere: defence in depth is key.

Anyhow, I hope that we're really furiously agreeing.

I work in health care, where "data at rest" does not mean you won't query it soon, but instead that it isn't either in transit (datastore to app, app to app, app to customer, etc; all of these have transport encryption) or being worked with in memory currently. If the data is in a store, it's almost certainly "at rest".

OK, for anything that is queried on demand it is not a good idea, until we get homomorphic encryption.

> Finally, ANY type of input sanitization is the wrong way to do security, and should be employed only as an absolute last resort.

Ok, that's just plain wrong and absolutely reckless advice. Everything from software development 101 classes to OWASP data validation can call you on that. If you don't understand why you're wrong, please, please, please stop developing software now until you can understand it.

He's correct in the sense of trying to catch SQL injection via input validation - that's a losing game. He isn't saying "don't validate your data at all", that's a different issue. Ultimately, your OWASP issues (XSS, SQL Injection) related to input are going to be prevented by appropriate escaping and data handling across your entire stack by default (key word is default, "trust the devs" is not the right answer). Input validation isn't the ticket.
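
For the XSS half of that, "escaping at the point of use" can be sketched with Python's standard `html.escape`. The `render_comment` helper is hypothetical; templating engines with auto-escaping turned on do the equivalent for you, which is the "by default" part:

```python
import html


def render_comment(author, body):
    """Escape untrusted values at the point they are placed into HTML."""
    return "<p><b>{}</b>: {}</p>".format(html.escape(author), html.escape(body))
```

The stored data stays untouched (apostrophes and all); only the HTML rendering of it is encoded.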

This is a prime example of the Robustness Principle. https://en.wikipedia.org/wiki/Robustness_principle

Would you be so kind as to explain to me the attack vector if the user input can never be treated as part of the code?

What I came up with is this: user name is stored in the database, and some new junior developer in a large team reads it in the backend code, and immediately plugs into another SQL query using string concatenation. BOOM!

But on the other hand, the very same junior developer can forget to sanitize the inputs before storing them (or do it incorrectly), so there.
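
That scenario (often called second-order injection) is easy to reproduce. A sketch with Python's sqlite3 module and invented `users` and `orders` tables: the hostile name is stored safely, then does its damage when it is read back and concatenated into a second query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.executemany("INSERT INTO orders (customer) VALUES (?)", [("ann",), ("bob",)])

# Step 1: the hostile name is stored safely through a bound parameter.
conn.execute("INSERT INTO users (name) VALUES (?)", ("x' OR '1'='1",))
name = conn.execute("SELECT name FROM users WHERE id = 1").fetchone()[0]

# Step 2 (the bug): the stored value is trusted and concatenated into new SQL.
unsafe_sql = "SELECT COUNT(*) FROM orders WHERE customer = '" + name + "'"
unsafe_count = conn.execute(unsafe_sql).fetchone()[0]

# Step 2 (the fix): bind it again, exactly like fresh user input.
safe_count = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE customer = ?", (name,)
).fetchone()[0]
```

The concatenated query matches every order; the bound one matches none. Data coming out of your own database deserves the same treatment as data coming off the wire.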

> ALL data? How are you supposed to query against it then?

This is actually possible, it's just very easy to make over-complicated: https://paragonie.com/white-paper/2015-secure-php-data-encry...

I wouldn't recommend encrypting all data. I also wouldn't recommend using unauthenticated encryption, which the article doesn't address.

> That is an extraordinarily inefficient and costly way to develop. The correct way to protect against SQL injection is to use a framework/driver which guarantees escaping and to be cautious about what you allow into queries.

I agree with you, but I don't think that's what the author meant. Nowadays, when most people talk about "prepared statements" and "stored procedures", they are just conflating those things with escaping. I think they're trying to say, use something with an API that prevents injection bugs, rather than actually use database prepared statements.

I find that 9 out of 10 times, when people talk about "prepared statements" they are referring to something like the PHP bind_param thing (https://www.w3schools.com/php/php_mysql_prepared_statements....), rather than this sort of stuff: https://www.postgresql.org/docs/9.3/static/sql-prepare.html

History has many examples of "safe escaping" that didn't turn out to be so safe after all.

I'd rather use proper prepared statements than rely on string escaping.

I looked at the two linked pieces of documentation, and to my naive eyes they look like they perform the same task:

PHP: "Bound parameters minimize bandwidth to the server as you need send only the parameters each time, and not the whole query", meaning that the statement is prepared server-side and parameters are sent separately.

Postgres: "A prepared statement is a server-side object that can be used to optimize performance. When the PREPARE statement is executed, the specified statement is parsed, analyzed, and rewritten. When an EXECUTE command is subsequently issued, the prepared statement is planned and executed.... Prepared statements only last for the duration of the current database session. When the session ends, the prepared statement is forgotten, so it must be recreated before being used again."

Is there a difference between the two, and could you elaborate?

ActiveRecord uses actual, PostgreSQL prepared statements in many cases. Eg if you execute `User.where(email: "foo@bar.com")`, it will prepare, bind and execute the query separately, with "foo@bar.com" being sent to PG as a bound parameter. You can see this if you flip the proper flags in the PG config and tail its log.

In other cases, like `User.where("email like ?", "#{params[:email_like]}%")`, AR will escape the provided value itself and no bind parameter will be used. However, it will still use a prepared statement, which at least means that only one database query will execute; no matter what AR does or fails to do with its escaping, you won't execute both the `SELECT` and a `DROP TABLE` because a prepared statement can only be one statement.
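
The "one statement per prepared statement" property is not specific to PostgreSQL or ActiveRecord. As an analogy only (this is SQLite through Python's sqlite3 module, not what AR does), `execute()` refuses a string containing stacked statements:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

# What a botched escape might produce: a SELECT with a DROP stacked behind it.
stacked = "SELECT * FROM users WHERE email = ''; DROP TABLE users; --'"

try:
    conn.execute(stacked)
    rejected = False
except (sqlite3.Warning, sqlite3.ProgrammingError):
    # The exception type differs between Python versions.
    rejected = True

table_survived = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE name = 'users'"
).fetchone()[0] == 1
```

This limits the damage from a failed escape to whatever a single statement can do, which is still plenty, so it is a backstop rather than a defence.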

> Encrypt all data at rest in the database

This is funny, I've actually inherited a project where the original developer had this idea, he used the same function to encrypt everything: even the news posts available on the front page were encrypted. The passwords were using the same encryption functions and needless to say not using a one way hash so fully decryptable...
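
For contrast, a sketch of one-way password storage. bcrypt (mentioned elsewhere in the thread) is not in Python's standard library, so this swaps in PBKDF2 from `hashlib`; the iteration count and function names are illustrative assumptions, not recommendations:

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor; tune for your hardware


def hash_password(password, salt=None):
    """Return (salt, digest); the digest cannot be decrypted back to the password."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected):
    _, digest = hash_password(password, salt)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(digest, expected)
```

There is no key to steal here: checking a login means re-deriving the digest from the submitted password and the stored salt, never decrypting anything.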

Sorry that is not quite what was intended. I've revised the text to say:

If your database supports low cost encryption at rest (like AWS Aurora), then enable that to secure data on disk. Make sure all backups are stored encrypted as well.

i.e. this kind of encryption costs very, very little and gives you physical security if you need it.

I take your point that this could be explained a lot better. In the next revision when I fold in the feedback, I'll make this clearer. One suggestion that I'll adopt is to pair each checklist item with an implementation note that explains it better. Thanks for pointing out the flaws.

Ah, I didn't realise you were the author!

Sorry for coming on a little strong. I should wait a few minutes before tearing into things...

The article is a great beginning, and as we have seen, definitely a conversation-starter! So thanks for creating it.

What I'm scared of, though, is non-technical people finding articles like this and, without a thorough understanding of the tradeoffs involved, presenting it as a list of requirements for their team. That's probably not what you intended - but I've seen this kind of thing happen again and again.

No problem, all feedback is good.

Lucky I've got my asbestos pants on ;-)

"ALL data? How are you supposed to query against it then?"

There is an area of active research called homomorphic encryption that would allow for operations against encrypted data. Right now it is too slow to be practical, but maybe in the future.

Right, but the existence of homomorphic encryption isn't something that should prompt developers to start implementing it as a "best practice" now.

(Even then, my understanding of homomorphic encryption is that it would not be great for generalized queries against the data like you would expect in a database with SQL)

I'm fairly certain, and I believe most experts of the field are too, that "general purpose" homomorphic encryption will never happen.

At rest basically means on disk. People might not think about this but AWS actually has a physical disk somewhere which someone could yank from the data center and read from. Not that likely but also not hard to protect yourself from.

If someone is yanking and reading disks at AWS then the game was over a long time ago. Physical access always wins.

IMO, if you're on AWS (or similar) then at rest encryption is a wholly unnecessary expense, unless you need to tick some kind of regulatory checkbox. I can see it for smaller on premise racks to prevent a "smash and grab" problem, but in a secure datacenter? Nah...

Disk encryption also prevents disposal issues from affecting you, which is a separate problem from physical access.

Amazon has employees as well, yes? Employees with access to data centers? Employees that may be convinced to make some "mistakes" in the disposal of old disks combined with the early replacement of a few specific drives?

Of course this is very hypothetical and it requires the attacker to know what disk in what rack to target, I'm not saying it's the most likely scenario, I'm saying it can be avoided by flipping a switch and paying a few extra dollars so I'll keep it enabled.

Yes, and no. Most block storage services provided by cloud hosts shard data across tons of physical media.

Some of them (e.g. Google Cloud) encrypt everything at rest by default too.

Personally, I'd rather have people have their little apps with too much security than major apps with too little.

Also, having people trained in good practices would make hiring easier.

> ALL data? How are you supposed to query against it then? What does "at rest" even mean in the context of an always-on database? An encrypted partition or something?

Data at rest means data on the disk (so basically LUKS or a similar partition / FDE). In virtualised environments it may not be obvious, but the disk you're running on can still be accessed in various ways. Here are some examples (not a complete list):

- VM breakout allows someone to read others' volumes

- logic issue allows others to snapshot your volumes

- disk is marked as bad, then discarded without full erase / physical destruction, then found in the dump

> So two layers of encryption!? Again, how is one supposed to look up an access token or email address if it is encrypted?

You don't have to encrypt the things you index. It's more to do with PII, addresses, etc. which you don't want to accidentally expose, and don't have to access too often outside of systems like billing.

It may also prevent you from storing a copy by accident. For example if you serialise a whole record to the output due to type/variable mistake. Or by revealing it in an error message. Of course these shouldn't happen anyway, but it's a good protection in practice.

> And it strikes me that if you don't trust the first level of encryption, the solution is to fix that, not add another one.

Nobody writes ideal code. There will be bugs. You prepare for that using defence in depth. This may mean separate minimal services handing out the sensitive data with extra audit steps. It may mean encrypting the data with a password not stored in the database, so that a trivial dump doesn't give access to them. "fix that" is not simple, otherwise we'd just "fix" all software :)
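One way to square "a trivial dump shouldn't give access" with "I still need to look the thing up" is a keyed hash. To be clear, this is not encryption and not necessarily what the checklist meant; it is a related technique, sketched here with the standard library. The table layout and function names are hypothetical, and the key is assumed to come from the environment or a secrets manager, never from the database.

```python
import hashlib
import hmac
import sqlite3


def token_fingerprint(token, key):
    # HMAC-SHA256 keyed with a secret that is NOT stored in the database.
    return hmac.new(key, token.encode("utf-8"), hashlib.sha256).hexdigest()


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE access_tokens (user_id INTEGER, fingerprint TEXT PRIMARY KEY)"
    )
    return conn


def store_token(conn, user_id, token, key):
    # Only the fingerprint is persisted; the raw token never hits the table.
    conn.execute(
        "INSERT INTO access_tokens (user_id, fingerprint) VALUES (?, ?)",
        (user_id, token_fingerprint(token, key)),
    )


def find_user_for_token(conn, token, key):
    # Lookup works by recomputing the fingerprint, so the column can be indexed.
    row = conn.execute(
        "SELECT user_id FROM access_tokens WHERE fingerprint = ?",
        (token_fingerprint(token, key),),
    ).fetchone()
    return row[0] if row else None
```

Someone holding only a database dump sees fingerprints, and without the key they cannot check guesses against them or mint a token that matches.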

> That is an extraordinarily inefficient and costly way to develop.

It's the most trivial way to avoid all SQL injections. It's used in lots of projects. As long as you don't need to use parameters where they're not allowed in your database (for example table names), it's not a bad idea at all.
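For the table-name case, the usual workaround is to pick the identifier from a fixed allow-list and keep binding the values. A small sketch with Python's sqlite3 module; the table names and the function are made up for the example.

```python
import sqlite3

ALLOWED_TABLES = {"users", "orders"}  # hypothetical table names


def count_rows_since(conn, table, min_id):
    # Identifiers cannot be bound as parameters, so only accept names from
    # a fixed set that the application itself defines.
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table name: {table!r}")
    # The table name now comes from our own constant set; the value is bound.
    sql = f"SELECT COUNT(*) FROM {table} WHERE id >= ?"
    return conn.execute(sql, (min_id,)).fetchone()[0]
```

Anything not on the list is rejected before any SQL is built, so user input never reaches the statement text.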

But if you have an ORM and want to use it - great. Most likely your ORM, or the framework you use, already takes advantage of prepared statements. If it doesn't, you end up like Drupal recently (please correct me if I got the framework wrong), which did their own escaping because of prefixed table names and was broken at the framework level.

> be cautious about what you allow into queries.

This is what's extraordinarily inefficient. People make mistakes. Given a big enough project, if you allow text queries, or try to be clever with your own escaping systems, someone will make a mistake one day. Prepared queries (and good, tested ORMs) get rid of the whole class of issues here and are a much more common solution than you suggest.

You should see our ORM. It tries to highly discourage text queries:


I'm going to go the other way on the SQL advice, even though the author has clarified below that he was talking about prepared statements.

It doesn't cost me much, if any, developer time to work with stored procedures. I realize most full stack devs don't have as much in-depth experience with SQL. That was literally all I did for the first several years of my career. Before it ever occurred to me to learn other programming languages, it was pure SQL all day every day.

If you have the resources to implement even half of this checklist, you probably have the resources to go whole hog on stored procs.

I sometimes do this, and sometimes don't. Partly, it's because I need to stay current with ORMs for the sake of potential job opportunities. And partially because it can be more convenient to keep my head in one language for fleshing out an idea. But for serious projects, I'll go the stored proc route.

In that kind of scenario, I think of the database as a type of microservice that's completely separate from the application layer. You write in the most performant language for that service, even if it's different from the main language the app is written in. And you reap the benefits of the database optimizing the procs as well as the database handling transactions instead of trying to manage that in your application.

How much of a burden this is boils down to whether you personally have the chops (or someone on the team does), and if you get your shit together when it comes to deployment. Stored Proc code should be saved in text files under version control, and your deploy process should make updates to the procs automatically.

If you do things this way, it's not any more painful than versioning your database with migrations, which I do on every project, no matter how small. Using stored procs doesn't have to mean you're working with a mess of unmanageable database code. You can use SQLAlchemy to create models and relationships and Alembic for migrations (just speaking to the Python world here), and use a thin wrapper around executing the stored proc, and it's really not that much different from writing prepared statements in SQLAlchemy.

In principle, it's really not different from writing a web app that provides an API that your front end hits (a Python app that presents REST for Angular to call or something else equally common). In this case, I'm just pushing the API down one layer in the stack.

People can argue that the logic that applies to the data should be kept with the logic that defines the data--i.e., that if your model defines the shape of the data, your code that defines its behavior should reside there as well: in the model definition. So the ORM is the correct place for that. Methods on the Class object. I'm sympathetic to that argument, but I also disagree with it, sort of.

I think the rules that govern the behavior of data should exist where the data exists rather than any abstraction layer. Otherwise you have to rewrite them for every new thing that wants to access the data.

And every useful application that people use involving data is going to serve that data to more than the first application you designed to work with it. Keeping these rules in sync across even a very minimal web app that consists only of the app itself, an API, and an analytics platform is already some overhead.

And every data-driven application already has two means of interface, whether you intend that or not: the application itself and direct database manipulation. The latter is going to happen sometimes, whether you want it to or not and however bad a practice it is. If there are rules about how things should behave (there are always rules about how things should behave) you have to enforce them at the database level.

If you're doing that correctly already, there's really not that much overhead in using stored procedures. Going that route means that every connection to the database for a certain dataset sees a single source of truth for that dataset. Execute procedure x to get data y. That's all any developer on any service needs to know.

It also makes the idea of minimum privilege easier to manage. You don't have to grant select on a table for the API user, for example. You can grant execute to the set of procs that user needs, and only those procs are accessible. Combine that with some well-thought-out views, and you can greatly limit your attack surface. If someone breaches part of your system and manages to obtain some source code, they don't get any real information about the structure of your database. They only get conn.execute('check_valid_user') or whatever.

The obvious problem here is versioning APIs. Which you must always do. So you have to version your stored procs when there are breaking changes. But the overhead in adding a _v1 or _v1.1 to the end of an .sql file for a proc really isn't that big of a deal compared to all the other stuff you have to do when maintaining an API.
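To give a feel for how thin that wrapper can be, here is a rough sketch. It assumes PostgreSQL functions that can be called via SELECT and a DB-API driver with %s-style placeholders (psycopg2 works that way). The proc registry, the integer version suffix and the helper names are all hypothetical.

```python
# Hypothetical registry of (procedure name, version) pairs the app may call.
PROCS = {("check_valid_user", 1), ("check_valid_user", 2)}


def build_proc_call(name, version, args):
    # Only known name/version pairs are allowed, so nothing user-supplied
    # ever ends up in the SQL text.
    if (name, version) not in PROCS:
        raise ValueError(f"unknown procedure: {name}_v{version}")
    placeholders = ", ".join(["%s"] * len(args))
    # Arguments stay as bound parameters; only the proc name is interpolated.
    return f"SELECT * FROM {name}_v{version}({placeholders})", tuple(args)


def call_proc(cursor, name, version, *args):
    # `cursor` is any DB-API cursor; the driver handles parameter binding.
    sql, params = build_proc_call(name, version, args)
    cursor.execute(sql, params)
    return cursor.fetchall()
```

A breaking change then means registering `("check_valid_user", 2)` next to version 1 and moving callers over at their own pace.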

Again, I'm sympathetic to objections to this model. It is not perfect, and I've never once tried to inflict it on coworkers who want to do things differently. I'll do this in my own projects, and when I've gone into a company that's already working with that mindset. I don't hold up the train based on philosophical differences.

There's a lot of criticism I agree with that SOA is not a good place to start. It carries a lot of overhead, and you shouldn't go there until you need to. But to me, the reality is that any web app that comprises a data store, a controller layer, and a front end is already by definition SOA--not the monolith that you want to think of it as. And therefore you should treat it as such and give the various components the attention they deserve.

If you're working on a project you have a reasonable expectation will be used enough to require some scaling, this is a totally acceptable way to do things. Let the database do what it's best at, let the controller layer do what it's best at, and then use the front end that's best for your use case.

Because this response isn't long enough already, I'll just add this. I don't think the MVC model is really appropriate for web apps. Even the modified MVVC model is still weird to me. It seems like a model we shoehorned onto what a web app really is. To me, the web app model is data, logic, presentation. The data layer comprises both the data and the rules about the data. The logic layer dictates what happens when an event is triggered by the presentation layer. And it should be relatively thin, particularly when working with one of the slower languages like Python or Ruby. The presentation layer simply defines the user interface and sends messages to the logic layer.

These things really are decoupled by nature in ways they aren't necessarily in native or desktop apps or mobile apps. I think that web developers need to understand that this model really doesn't fit very well. And behave accordingly.

There are obviously exceptions to everything I'm saying here. But the exceptions come at very large scales. Most apps that start out as MVC web apps on whatever stack are not going to become Facebook or Reddit and are not going to have the specific problems that come with that level of scale.

Even so, I would encourage people to rethink what web apps are, and what the MVC/MVVC model really means. I really don't think it's the right model, and the decisions we make about architecture are generally not good ones.

Finally, since this is so long now, I want to point out that I didn't generate these ideas all on my own. I've been heavily influenced by Lex de Haan and Toon Koppelaars in their book, Applied Mathematics for Database Professionals.

Data, logic and presentation sounds a lot like model, controller and view. I feel like you suggested most logic should be in how data is handled? But let's be serious here, there's no right way to make a web app. There are certainly wrong ways and inefficient ways. If we focus all our efforts on doing things perfectly well then we would miss the point. Essentially, you're just theory crafting how people should code while ignoring the practical issues of people and coding.

Yes, okay. Every part of my real life experience developing web apps at the full stack is just theorycrafting.

I've clearly never done this in the real world, and I am simply thinking about my idea of a way to create a web app. And I've never worked with, thought about, or managed a team.

Do I actually need a sarcasm tag for this? I think I do, sadly. /s

I'm being a sarcastic fuckhead. Jesus.

Agree 100%. I've written longer responses like this on various web forums and email lists, but these days don't have the energy to keep hammering on this nail. Thanks for taking the time.

Thank you for the agreement. It's worth it to me to keep writing these things. My ideas aren't perfect, and I understand that.

But there are good reasons to keep telling people that this is a reasonable way to do things.

It's clearly not the only way to do things, but I'm tired of this idea that it's an awful, terrible way to do things.

There are ways to do this that don't suck, and there are legitimate benefits.

It's not always the best way to do things.

But I often feel like we're dealing with people who haven't properly considered the options when we get into these kinds of threads.

OWASP has published a release candidate of the OWASP "top-ten" available as a PDF at https://github.com/OWASP/Top10/raw/master/2017/OWASP%20Top%2.... They also support the creation and maintenance of quite a few OSS tools to help secure systems.

To those who are claiming some of these steps are too onerous or would cause performance issues: Are you sure? For an individual system, maybe some of these checkboxes are indeed overkill. Or perhaps your system isn't as secure as you believe? To be fair, there are systems that don't require as much security because the data simply doesn't matter, but if those systems contain email address / password hash combinations, they should still be treated very carefully.

I think the biggest mistake I've seen made time and time again is that teams go the cheap route and put the database on an Internet connected machine. Don't build a whole architecture until you've got customers to support it but don't be quite that cheap at the beginning.

For systems that require higher than average security, I've been part of a team that went to MUCH further extremes.

Can you elaborate a bit more detail around a MVP-style architecture where the database isn't on the public internet?

I assume you're talking about something like a public facing load balancer that connects into a private network where the web servers and database server live, and you'd need a bastion host (or just the load balancer) to connect into the private network and access any of the machines.

Yes ... that's about the simplest form. If you want to get really fancy you can use a further cloistered machine with hardware encryption and a key that's only inserted when the machine is booted but that's pretty excessive. The main thing is that you don't want your database server's port exposed to the Internet and you don't want your database on your web servers (the most likely part of your infrastructure to be compromised).

There are 168 comments on this thread all earnestly discussing what is pretty clearly a marketing document written by someone without a firm grip on most of the bullets they've written.

Is there that much of a need for another "security checklist", that we'll dive in this deep on a really bad one? Seriously asking!

Finally: if you're worried about "APTification" or whatever it is this company is talking about, and you're deploying in AWS or GCP, do what Ryan's team at Slack did and get auditd monitoring on all your servers, and perhaps get osquery instrumentation set up as well. Attack detection systems like the product this post sells are pretty far down the list of things you should be considering.

As the author, I should clarify that I am a developer - full time and have been for years. If my English seems to imply a lack of depth of understanding - I'm sorry.

The purpose of the checklist is to get people thinking about items they may have forgotten to address during their dev. In the push to ship new products quickly, that happens all too often.

I agree with you that there are many more important and basic things to do first when securing your app - than worrying about APTs. I did not think the checklist gave that impression?

English is not the problem.

You are missing fundamentals like the Seven Deadly Sins of Web Security. Also missing is ninja threat model.

These are the things that will wipe you out.

This list could just as well be called "List of security stuff I learned about when making my product". Some of the stuff is just Draconian, certainly not applicable for any arbitrary web developer.

There are rarely any checklists applicable everywhere in the same way. Why not treat it as: list of ideas to evaluate and prioritise/ignore into your own checklist for future/current project? It's not like it's all invalid because it's not fully applicable.

No, I think it's valuable as a work log and bucket of ideas on how to improve security. It's just how it's framed that irks me.

Hope the moderators edit the post title to include that this is a 'security' checklist, and not just a 'steps I need to develop a simple website' checklist. I am thinking quite a few people skipped past this article not realising the intended audience.

Agreed. The original title of "Web Developer Security Checklist" is much better.

My biggest worry with this checklist is that while it helps already security-minded people and web developer professionals remember what they should already be doing, it doesn't really help security novices (who may search for something like this) make their app more secure. Why?

1. The checklist tells me what I need to do, but not how to do it right.

I could imagine many security novices reading one of these items, implementing the first solution they find on StackOverflow, and checking it off in a "fixed it, boss" kind of manner. That may lead them to think their app is more secure than it is, which is then a detriment to the security of their app.

2. The checklist doesn't really help me decide in what order to do things, and what perceived increase in security I'll receive

Storing sensitive data in the database using bcrypt is pretty easy to implement (many times baked into web frameworks), and provides a good amount of security for the time it takes. Compare that to something like implementing CSP however, which may involve moving a ton of files around your app, adding nonces/hashes, etc, and it gets me an A+ on secureheaders.io, but I'm not sure if all the pain was worth it for my basic application.

3. The checklist makes things way harder for a developer to think about.

Anyone wanting to build a web app looking at this list would probably be too overloaded with information to want to implement much of this, when in reality, they could start off with something like Heroku where they just push it up and it works, and many of the security concerns have been taken care of for them.
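To make the CSP item from point 2 a bit more concrete: CSP is the Content-Security-Policy response header, and the "nonces" are per-response random values that mark which script tags are allowed to run. A minimal sketch with the Python standard library follows; the directive set is an assumption for illustration and the helper names are made up, so treat it as a starting point rather than a complete policy.

```python
import html
import secrets


def new_nonce():
    # A fresh, unpredictable value must be generated for every response.
    return secrets.token_urlsafe(16)


def build_csp_header(nonce):
    # No 'unsafe-inline' or 'unsafe-eval': scripts run only if they are
    # served from our own origin or carry this response's nonce.
    directives = [
        "default-src 'self'",
        f"script-src 'self' 'nonce-{nonce}'",
        "object-src 'none'",
        "base-uri 'self'",
    ]
    return "Content-Security-Policy", "; ".join(directives)


def script_tag(nonce, src):
    # The same nonce goes on each script tag the page legitimately includes.
    return f'<script nonce="{nonce}" src="{html.escape(src, quote=True)}"></script>'
```

The pain described above comes from the other half of the job: every inline script and style in the app has to either move to a file or pick up the nonce.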

Thanks for your well structured comments.

The purpose of the checklist was to get people to think. It is really hard to do much more without going very long.

A number of people have suggested that I link implementation background off each item. I think that can work and layer the info as well.

I know your pain -- things like this get long fast, and it's hard to include everything all at once without going on forever. I would definitely appreciate implementing some background off of each item!

Another thing that might help is knowing about services or open source solutions that can bundle a lot of the checklist together. Heroku might be a paid one, for example, but there might be things like Ansible scripts out there from security professionals that do a lot of this. I'd love to know how to package a lot of these checklist items together more easily.

Coding is easy, writing is just darn hard!!

Thanks for the ideas. I'll check those out.

"Hack yourself"

May I suggest the opposite, and say if you're going to pen test your infrastructure, don't have the same people maintaining the infrastructure trying to hack the infrastructure.

Do both. I'm a big believer in having dev learn and own a part of the security process. You learn an enormous amount by hacking yourself. But you are right, you definitely need other eyes to pen test as well.

>Do both. I'm a big believer in having dev learn and own a part of the security process.

Devs already do enough, take some bloody ownership of security outside and inside of code.

When devs start having to whiteboard security issues in interviews, then you can make us responsible for it because we'll appropriately charge you for being decent at two things instead of one.

This is...shortsighted. Security is part of your development work. It always has been, always will be. You need to understand application and server-level security--the former is always your responsibility and while you may have specialists for the latter they do not replace you understanding the fundamentals of it. The server is part of your "stack", even if the "full-stack" people want you to believe it ends at the language runtime.

If you are a web developer and in your interviews you aren't demonstrating that you can think in a security-conscious way, your interviewers need to reflect.

>Security is part of your development work

So is everything else, apparently: databases, algorithms, data structures, operating systems, cloud infrastructure, front end design, business logic features, etc., etc., etc. -- the list just keeps fucking growing and you people are turning devs into skill black holes where they're never going to master anything. All of it apparently matters at one point or another and you're not a good developer if you're not prepared to be the best at everything!

As a web developer, your security knowledge is rarely assessed and tested. If you're not focusing on security 100% of the time, you're not going to be as good as a professional and the bad guys.

You need people focused on security doing the security work. Having someone do business features and security work at the same time is an indication you don't take security seriously because you do not have dedicated people for it. If Bob who's mostly front-end but some back-end says it's secure, I'm not going to believe him. That's the same false confidence that companies parrot to us when they say that "security is a priority", but they still have XP machines on their network and code vulnerable to SQL injection. Don't tell me this doesn't exist because I woke up this morning and read that exact email. You can't half-ass security and a line of business developer is not a security professional.

This doesn't mean you get to be lazy and continue to concat SQL strings even after someone has told you it's bad in a code review.

> So is everything else, apparently

Yes. Sorry (not snarky) that it sounds like you work in a lousy place, but if your "security people" aren't cutting it, you are the line of defense for your users. You need to understand this stuff to be able to do the right thing. Being at least competent--and we're not talking A-plus or best-in-the-business, we're talking a solid C-plus to B, able to consistently make things better rather than leave them static or regress--in these things is your responsibility when you take a job working on stuff that touches these fields, because somebody has to and you're a somebody. "Security people" (and I'm not one, I'm a software developer--though, somehow, [this part is snarky] none of databases, algorithms, operating systems, cloud infrastructure, front end design, business logic features, or application security are beyond me; weird, that) are there to help you, not excuse you from your responsibilities.

There's a lot of stuff to understand! This is why you are paid the medium bucks. Remove "not your job" from your vocabulary and you'll be better at not just these things, but the things you think are your job, too. Because each bit informs the rest.

>Remove "not your job" from your vocabulary and you'll be better at not just these things

No one else is removing that from their vocabulary. I don't see the incentive for me to do that.

This is the problem. You haven't realized that everyone is saying "not my job" and pushing it all onto developers.

> I don't see the incentive for me to do that.

Oh, I don't know, to not be bad at what you do for a living so you can call yourself a professional without it being a farce?

I don't care what other people are doing and you shouldn't either. I care about doing the right thing and building good systems that work for people rather than expose them to risk and you should too--and that means understanding the breadth of your profession. The people who are pushing responsibilities onto you are the people that software is automating out of existence and are of no account except that you get to feel good by "pushing back" in ways that just make everything worse.

>I don't care what other people are doing and you shouldn't either.

Until their decisions affect your work, and you don't get to say anything about it. Because the people pushing around responsibilities are usually your bosses or equals, so you can't really do anything about that. If you get to work in a silo where you're responsible for everything and you understand everyone and do it well, great, you're a master of the universe. You should be a multi-millionaire by retirement at 45.

But most developers won't ever approach that level. Putting your best people in the best slots is a practical approach for teams > 1. And there's no reason you should be mixing responsibilities and watering down everyone's chance at becoming good at ~literally everything in development~

Let me restate, so you can catch it: I'm not saying be good at everything. I am saying be bad at nothing relevant to your work.

Security is without exception and in all circumstances critically relevant to web development. You cannot be an adequate web developer if you cannot look at a system and break down its security impact and how to mitigate it. You can be a bad one, but you can't be even an adequate one.

Less excuses, more practice. It's what you signed up for.

>Do both

Actually yes, this. Good call out.

It is shocking to me to see how many developers are giving this checklist a hard time. Every single item is solid advice and what I would have presumed sane people considered common sense/best practice.

Just goes to show how bad a job we have done in the InfoSec world of educating developers.

I got downvoted in another reply, but "security by checklist" was one of the biggest complaints that SANS and other security firms had about enterprise and government IT security policies.

Not that it's a bad checklist, but most "web developers" will not have the background to understand and implement all of these things properly, even if they think they do. Security is not a checklist -- "OK, all boxes ticked, we're done" -- it is also an ongoing, reactive and proactive set of processes and constantly re-verifying that everything you think is so, is actually so. And if you rely on "web developers" to get all of this right you will at some point be disappointed.

I think we can all agree that developers can get better educated about security and can participate building security into the product from the very start. It is hard to engineer security in via a sec-team at a later stage. Education is the key.

How is something like "Use CSP without allowing unsafe-* backdoors" in any way educational? If I'm a newbie web developer, even coming over from embedded systems, how do I know what CSP is? What do I use CSP for? How do I start with CSP? What do I do to configure CSP? What does CSP even stand for? I don't know, it wasn't even defined!

Basically, this is a useless listicle. If you know anything about web security you get nothing from it and if you don't know anything about web security you still get nothing from it.

You are right: checklist is not for education. If you don't know how to implement one of those items, you need to go learn. The checklist itself is still valuable, even to a seasoned security developer.

A checklist will not teach a pilot how to fly and land a plane, but its value is not zero.

I don't need to get started and I don't need that link; I, personally, know how to develop secure webapps. I am criticizing your listicle for being useless because it is. Your "educational" resource is not educational for anyone.

A largely incomplete list. One of the most important items would be: handle errors correctly and make sure errors do not result in any sort of resource leak or sensitive information disclosure.

For example, this code was from a guy in Stack Overflow:

    ... function (req, res, next) {
        if (err) return console.log(err)

What will that piece of code do? Leak the request object, the response object, including the underlying client connection... keep the connection open, leak memory... and at scale, make your server run out of sockets and memory. Then, when memory is low, swapping kicks in, overloading your CPU as well. In short, it kills your machine with a single one-line mistake.

There are less trivial ways of running into the same situation, but the lesson is that node.js is not a babyproofed technology and needs to be used with care.

This piece of advice is one of the most important, and the most overlooked/ignored in the node community.

Well, I've avoided the node hype because of people warning about bad debugging experience. With that, I'm going to try to avoid touching it ever.

I've used many web stacks over the years and not a single one defaults to blowing up like that. They all follow proper modularization... The framework cleans up objects that the framework creates, you deal with yours.

In any other framework forgetting to end the request would result in the request automatically ending when your code finishes. This includes Vert.x and Undertow which are just as asynchronous as Node (but also multithreaded).

Also how the hell does that leak memory? Is JS reference counting that bad? I've done some really stupid things in Java and C# but never leaked memory enough that it mattered.

It's not a problem with the garbage collector, just how the objects are referenced. Basically you need to ensure the request reaches a terminal state where you either close the connection (ServerResponse.end) or send a response (ServerResponse.send and similar). All the cleanup happens after you do that.

Is there not an OnDispose or some kind of hook to detect when requests can end? To end a request in every framework I've used you just return from your code back to the framework. You shouldn't have to worry about a framework object like request state.

There are ways to detect this, but since express declares itself as a "minimalistic, non-opinionated framework" it's up to you to do it and do it correctly, or have correctly implemented logic that does not run into this issue.

Ah, node. That is a whole post in itself and a really good idea. Perhaps I'll tackle that next. Coupled with async/await issues, there is plenty to talk about there.

But good point. How would you describe this item on error handling? How would you state it for the checklist?

Make sure error conditions are handled gracefully, without leaking resources, and producing errors that do not contain any sensitive information.

Thanks, perfect.

> make your server run out of sockets and memory

Do you mind explaining why the code causes this?

Because when you handle a request, you need to either respond to it or close it. If you don't close it (e.g. res.end, res.send, ...) the connection remains open. Also all the memory allocated for that request stays around without being collected.

Agree with the point, but remember that normal HTTP/1.1 and HTTP/2 keep-alive should close that connection in 15-30 seconds for all modern browsers. It will give rise to a DOS vulnerability as you point out.

But all the objects remain leaked even if the request connection ends. And if the request stream got piped into another stream the problem can become more severe.

Usually error events propagate through piped streams, but that largely depends on the order in which the event handlers were defined.

Are you sure the objects remain leaked when a connection disconnect comes through? I don't have hard data on that.

Depending on the node web framework (express, ...) they may handle that differently and may have timeouts to clean up.

I'd love to get a firm answer on this one.

I'd love a firm answer too, but I'm too lazy to write up code that puts request objects in a WeakSet and checks for when the set gets smaller. I hope someone else does it. :-)

Yes, I am certain of that. I've seen it in production environments and verified it by looking at heap dumps.

Prepared statements are amazing. Found out about them very soon after I started programming (incidentally my first language was PHP), and switched to PDO right away. This was years ago, I don't understand how people are still using the deprecated and insecure mysql_* functions, you can still find them all over SO and my university "web" class was teaching them as well...

You'll be glad to hear that they were finally removed in PHP 7.0, so slowly but surely they're being consigned to history.

And there seems to be much better awareness about the perils of mysql_* functions amongst PHP devs these days.

However, 5.6 is still in security support until 31st December 2018, so all that bad advice on SO and elsewhere is still relevant and lurking, waiting to be found by inexperienced devs.

As other people have noted, prepared statements aren't a security panacea, and string concatenation you do in the query can be vulnerable.
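A minimal sketch of that caveat: values go into placeholders, but identifiers such as an ORDER BY column cannot be bound as parameters, so they need an explicit allowlist before being concatenated. The `users` table, its columns, `SORTABLE` and `buildUserQuery` are all hypothetical, and the `$1` placeholder style assumes a PostgreSQL-style driver that accepts a `{ text, values }` pair.

```javascript
// Hypothetical allowlist of columns a caller may sort by.
const SORTABLE = new Set(['created_at', 'email']);

function buildUserQuery(email, sortBy) {
  // Identifiers cannot be passed as bound parameters, so reject
  // anything that is not on the allowlist before concatenating.
  if (!SORTABLE.has(sortBy)) {
    throw new Error('unsupported sort column');
  }
  return {
    // The user-supplied value never appears in the SQL text itself.
    text: `SELECT id, email FROM users WHERE email = $1 ORDER BY ${sortBy}`,
    values: [email],
  };
}
```

The returned object would then be handed to whatever driver is in use; the email value travels separately from the SQL text, and the only concatenated piece is one of two known strings.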

When do we get a SaaS which just comes with all this built-in? I want to pick a framework and let the service add all these things, including automatic updates.

What you want, and what is effective and feasible, are at odds. Trying to build a framework for this to work in the general case will--not may, will--result in something that doesn't work in that general case for anybody.

You can't framework away security. Parts can be abstracted, but that abstraction is for ease-of-use, not correctness; you need to understand what's going on and why. It's your job to. (Or pay someone else to. But we're expensive.)

I agree with you, but I understand his wish though.

Security sometimes is just hard and it is unrealistic to hope that all developers, everywhere, all the time will get it right.

The more the platform can do, the better.

I understand it too, but it's, to be honest, horseshit. Well-meaning horseshit, but horseshit despite it.

Security is hard. It is irreducibly hard when you add the constraint that arbitrary do-whatever code and applications must be supported. Having your platform do things is great--to make you faster. You still have to understand what it's doing because it's very easy to step outside the guarantees of that platform and suddenly no longer benefit from those security features. Sometimes you might even have to do that for business reasons. And then you must know how to safely compensate for it.

It's a rare web developer who isn't safeguarding somebody else's personal information. (Yes, even just name + email. Don't make it easier for other people to be phished.) The onus is on us as a development community to take that seriously and to treat the security of our code and our systems with the caution it mandates.

No argument on that (except the horseshit ;-)


Not simple by any means, this one is very thorough and one of the better security checklists I've come across in a while! Sensedeep looks very intriguing and I often wondered if anyone has created an alternative to Atomic Secured Linux. Could this be it? Or is it like an IDS/IPS with a GUI?

We're just in beta. SenseDeep is part host-IDS and part cloud-side. The key is real-time. We want to detect attacks in real-time on the server or on the cloud-side. It has a GUI on top to unify and give a cloud complete view. The focus is DevOps and not companies with a security team. Not sure if this helps or confuses things.

Does your product help satisfy any areas of PCI compliance?

I don't want to take the focus off the post and discussion, which was about the web checklist. You could DM me on Twitter @SenseDeepSec or email mob. But briefly, we hit some of the PCI compliance objectives but not all. We expect to cover more ground in this regard quickly.

Implementation discussions for each of the checks would be handy for those of us not familiar with all of these practices.

Also those most likely to need such a list.

Great idea for a follow-up post. I'll do that.

> Store and distribute secrets using a key store designed for the purpose. Don’t hard code in your applications.

Curious: Is there a widely-used off the shelf solution/pattern for this? Or an "idiot's guide to writing one"? It's always seemed to me like super bad practice to hard-code (for example) an AWS secret into your app. However if you set up a basic web service to deliver the AWS secret to the app, wouldn't your app need to authenticate with that service with... a hardcoded secret?

I think the idea is mainly that you separate your config from your code (https://12factor.net/config).

The main security benefits I could see would be:

- By having different config files (different files holding all your ENV variables), you could allow different levels of access. Imagine a junior developer only getting a staging api key vs getting the production api key for S3, for example. With hardcoded ENV variables, you'd probably put the highest level key possible, which would be something like "superuser" access.

- By separating out your ENV variables from your code, you make it more difficult for your entire app to be compromised than if they were bundled together. So if your Github repo got hacked, you aren't worrying about whether everything else has been hacked as well.

In the end though, it's turtles all the way down. You still need your ENV variables to be exposed at some point, so those will inevitably be in some file that lists everything.

My question with ENV files is: how are people sharing them? Over Slack? Through Dropbox? On a USB drive? I feel like you may want some sort of permissions-based access to them, but have never quite seen a service that does this.

> Curious: Is there a widely-used off the shelf solution/pattern for this?

Credstash, Sneaker, etc. are fine in AWS.

> wouldn't your app need to authenticate with that service with... a hardcoded secret?

Trusted third parties can provision your initialization secret, i.e. AWS IAM instance profiles providing role credentials automatically to EC2 instances. (Set up a policy that can read secret keys for specific encryption contexts and be done with it.)

Sure, classic chicken and egg problem. At some point you need an unencrypted secret.

A better way to phrase it is that at some point you need trust. AWS IAM instance profiles are a great example of this.

I always use environment variables, you can just use ENV['AWS_S3_KEY'] or whatever in your application code. Keep the keys in your local environment, and add separate sets of keys to the staging / production environments. These files probably live in your project (.env or similar) in development but are gitignored. On production, they can be in your web-server or application configuration wherever appropriate, as long as they get loaded. If you are using a tool to manage deployment, you probably just need a step to verify on deploy the files / lines containing keys exist.
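In Node the equivalent lookup is `process.env.AWS_S3_KEY`. A rough sketch of the "verify the keys exist" step, run at startup instead of at deploy time, could look like the following; `requireEnv` and `loadConfig` are hypothetical helper names, and the variable name is just the one from the comment above.

```javascript
// Hypothetical helper: read a variable and fail fast if it is absent.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (value === undefined || value === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Collect everything the app needs in one place, so a missing key
// is reported at startup rather than on the first S3 call.
function loadConfig(env = process.env) {
  return {
    s3Key: requireEnv('AWS_S3_KEY', env),
  };
}
```

Passing `env` as a parameter is only there to make the function easy to exercise without touching the real environment.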

I can't think of a widely-used programming language that doesn't support environment variables, though support may be less than exemplary in your language of choice. In Ruby land, I use https://github.com/bkeepers/dotenv

Not sure how storing the secret in the user's environment helps. At the end of the day, you're still distributing a secret to an untrusted end user.

This is great!

Unfortunately this old list hasn't been updated in forever:

* Web Developer Checklist || http://webdevchecklist.com/

Besides the bad SQL tips most of it is okay. Sounds like the author has little experience with ORMs or SQL abstraction layers with parameterization like ADO.NET, SQLAlchemy, or JDBC.

Besides the SQL stuff mentioned elsewhere, Regex is rarely a safe whitelist. It's better to use specific escapes for HTML or URI or whatever. Most of XSS is finding bad input that isn't filtered. Hardly anyone is stupid enough to not filter input at all, but few filter it enough to prevent all XSS.

Also keeping port 22 closed is just silly. If you have secure credentials no amount of portscanning will hurt you. If you get tired of the logs just move the port and set up fail2ban. This point is controversial so whatever I guess.

Thanks for the tips. Can you say which SQL tip is bad? Someone has already picked up the stored procedure comment; I really meant prepared statements.

Regarding regexp. You can do very precise regexp for many patterns. I agree some are harder, but I wouldn't say "rarely" in our experience.

The point about port 22 is that if you have it open, many people tend to use it more than they should. Effective automation should eliminate / greatly reduce the need for it.

If using AWS/cloud, then you can apply a security group to open it when you need, but otherwise keep it closed at the network level at least. I agree it does seem to get people all hot and bothered.

A lot of Regex is a hint that you shouldn't be using regex. Regex is a bad parser and poor sanitizer.

I would wager that more than 50% of XSS vulnerabilities are due to bad regex when you should be using HTML or URI escaping.
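To make "use a specific escape" concrete, here is a small sketch of context-specific output escaping. `escapeHtml` and `HTML_ESCAPES` are hypothetical names; the function covers only text placed in HTML element content or quoted attribute values, and in practice a template engine's built-in escaping would normally do this for you. For URL components the built-in `encodeURIComponent` is used.

```javascript
// Characters that are significant in HTML text and quoted attributes.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Escape untrusted text for HTML element content or a quoted attribute.
function escapeHtml(input) {
  return String(input).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Untrusted data placed in a query string gets URI escaping instead.
function searchLink(term) {
  return `/search?q=${encodeURIComponent(term)}`;
}
```

The regex here only selects the five characters to replace; it is not trying to recognise "bad" input, which is the pattern the comments above warn against.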

Hi, about port 22, you can use ssh keys and disable password access, see: https://aws.amazon.com/articles/1233/

How about extending the same with examples for popular web frameworks like Django and RoR?

I know node better and would be a bit light on other frameworks. Do you have suggestions?

Throw across your node work. I will try to replicate the same for RoR. :-)

When I post the implementation notes for Node, feel free to speak up with the RoR speak for that item and I'll add it in.

Thank you.

I'd be happy to try and do a Python version of this. I'm most familiar with Pyramid, but Flask and Django would be a learning opportunity for me. Much of this advice I've already implemented in Pyramid.

> Create immutable hosts instead of long-lived servers that you patch and upgrade. (See Immutable Infrastructure Can Be More Secure).

An interesting idea, I've not come across before. Anyone know where I can find some case studies on this?

Netflix is notable for doing this with all their infrastructure. You could probably find some presentations by them about it. Martin Fowler wrote about it, so that would be a good place to start.


"Finally, have a plan"

I'd argue that you're more secure and your life is easier if the plan (threat model), or at least the planning, comes first. The greater your understanding of your data, users, platform, and attackers, the more likely you are to make good (e.g. secure, economical) choices about infrastructure, design, implementation, testing, incident response, etc.

"Plans are nothing; planning is everything." - Dwight D. Eisenhower

Is it good to redirect to HTTPS when a user hits an API over HTTP? I have heard somewhere that doing so is bad.

Nope, it isn't good. Secure API endpoints shouldn't be exposed in plaintext; an error should be raised when a developer or app tries to connect via HTTP rather than HTTPS.

Is 404 sufficient?

Depends on your choice. Personally I would choose between 410 and 501, but whatever you choose, just don't allow an implicit redirect with any of the 301/302 codes.
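A rough sketch of what that could look like for a listener bound to the plain-HTTP port: it answers every request with an error and never sets a `Location` header. `rejectPlainHttp` is a hypothetical name, the default of 410 is just one of the codes suggested above, and the JSON body is an assumption about what an API client would find useful.

```javascript
// Hypothetical handler for the plain-HTTP listener of an API.
// It never redirects, so a misconfigured client fails loudly
// instead of silently continuing to send requests in cleartext.
function rejectPlainHttp(req, res, status = 410) {
  res.statusCode = status;
  res.setHeader('Content-Type', 'application/json');
  res.end(JSON.stringify({ error: 'This API is only available over HTTPS.' }));
}
```

With Node's built-in `http` module this could be passed straight to `http.createServer(rejectPlainHttp)`. Note that by the time the server answers, the client has already sent its request in plaintext, so the error protects future calls rather than the current one.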

Correct me if I am wrong, but even with an HTTPS redirect a POST request will fail.

General guidance mostly, nothing too deep, seems suited for C-level since there is no discussion on securing the infrastructure (as in hardware, not in software), deployment and way too heavy on looking at the world "from the clouds".

Lots of random mostly unnecessary advice that will surely take a simple project and turn it into a big half-baked unmaintainable mess.

Sorry for the harsh words; but good advice needs to be both practical and cost-efficient.

Could you highlight what you think are the important items in a web security checklist?

Forgot one:

   [ ] Don't rely on security-by-checklist.

Thanks everyone for some great comments and discussion. Really appreciate your time and feedback on the article.

I'll fold in the feedback and the ideas and go forward with it.

Thanks all

OP: Michael O'Brien

Simple Web Developer Checklist:

#1 make sure the css stylesheets are loading

I like this better

I would add one more regarding DOS protection:

Sign the session tokens that you issue.

This way you don't need to do any I/O to verify these tokens at the router level.

I'm not sure I fully understand what you are saying. Can you elaborate?

I came up with this technique when designing our platform (https://qbix.com/platform) so I don't know if it's widely used.

Many apps include session id tokens in requests, to identify the logged-in user. The session id is usually a bearer token (like in a cookie) which identifies the session on the server.

To mitigate against DDOS attacks, as you said, all publicly available resources can be cached on CloudFlare. (Personally I look forward to content-addressable protocols like IPFS supplanting HTTP.)

However, the non-public resources are usually dynamic and should not be cached. These resources are typically for users who have logged in, or at least have a session.

So our design basically encourages this:

1) if you want to serve non-cacheable resources, require that the request include a session id

2) the session id is generated on our server and signed with an HMAC so the app can verify this signature without doing any I/O.

This is because I/O is expensive and hard to parallelize, whereas statelessly checking whether a session token has been issued by our app is easy to implement, even at the external router level. Simply examine the packets coming in, decrypt, look at the request, and verify the HMAC on the session id. If it is wrong then the session id is bogus so we don't expend any more resources within the network on this request.
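The two steps above can be sketched with Node's built-in `crypto` module. This is not the Qbix implementation, only an illustration of the idea; `issueToken`, `verifyToken`, the `id.mac` token layout and the choice of HMAC-SHA256 with hex encoding are all assumptions.

```javascript
const crypto = require('crypto');

// HMAC-SHA256 over the session id, hex encoded.
function sign(id, key) {
  return crypto.createHmac('sha256', key).update(id).digest('hex');
}

// Issue a random session id with its signature appended: "<id>.<mac>".
function issueToken(key) {
  const id = crypto.randomBytes(16).toString('hex');
  return `${id}.${sign(id, key)}`;
}

// Stateless check: recompute the HMAC and compare in constant time.
// No database or cache lookup is needed to reject a forged token.
function verifyToken(token, key) {
  const [id, mac] = String(token).split('.');
  if (!id || !mac) return false;
  const expected = Buffer.from(sign(id, key), 'hex');
  const given = Buffer.from(mac, 'hex');
  return given.length === expected.length &&
    crypto.timingSafeEqual(given, expected);
}
```

A passing check only shows that the server issued the token at some point; whether the session is still valid (not expired or logged out) would still need state or an expiry field inside the signed part.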

You can make sessions expensive to start. For example, a user might have to log in with a valid account to get a session.

The big question is, how can we ensure that users don't get too many accounts, to prevent Sybil attacks? Any ideas?

Isn't best practice when it comes to passwords to actually choose a good one, use a password safe, and _not_ rotate?

Both. You want to choose a good password and then not let it get too stale. A very old password (say 1 year) has a higher chance of being subverted purely because there is more elapsed time wherein attackers could have gained access.

Choose good passwords, long, special chars, preferably random and generated by a password generator / manager. And then change periodically. That period depends on your application.

We change our cloud passwords and keys every 90 days.

"A very old password (say 1 year) has a higher chance of being subverted purely because there is more elapsed time wherein attackers could have gained access."

Any actual evidence for this? My counter-hypothesis is if your password lasts 6 months of attempted hacks it'll last >6 years (unless social engineering attempts succeed).

I ask because rotating goes against the current NIST password guidance. In fact, for your recommendation "Implement simple but adequate password rules that encourage users to have long, random passwords", I'd recommend pointing people in that direction.

Sorry, I should be clearer (late here).

With time passing, the chance of you or anyone with access to the password being socially engineered, or some other human error, or a hack on your PC desktop systems, increases linearly with time. The password may last a decade of brute force cracking, but we humans .... continue to make mistakes far more frequently.

So rotating passwords protects against the accumulation of human mistakes and insider threats.

If you are using proper hashing, then your passwords should be safe even if the hashes are compromised.
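As a rough illustration of "proper hashing", here is a sketch using scrypt from Node's built-in `crypto` module, chosen only because it needs no extra dependency; bcrypt and argon2, mentioned elsewhere in this thread, serve the same purpose. `hashPassword`, `verifyPassword` and the `salt:hash` storage format are hypothetical, and the default scrypt cost parameters are used without tuning.

```javascript
const crypto = require('crypto');

// Derive a 64-byte key from the password with a fresh random salt.
// Stored format (an assumption for this sketch): "<saltHex>:<hashHex>".
function hashPassword(password) {
  const salt = crypto.randomBytes(16);
  const hash = crypto.scryptSync(password, salt, 64);
  return `${salt.toString('hex')}:${hash.toString('hex')}`;
}

// Recompute with the stored salt and compare in constant time.
function verifyPassword(password, stored) {
  const [saltHex, hashHex] = stored.split(':');
  const expected = Buffer.from(hashHex, 'hex');
  const actual = crypto.scryptSync(password, Buffer.from(saltHex, 'hex'), expected.length);
  return crypto.timingSafeEqual(actual, expected);
}
```

The per-password salt and the deliberately slow derivation are what make a leaked hash table expensive to crack; weak passwords can still be guessed, just more slowly.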

Could you please point to the NIST recommendation you mention? I thought they said that you should NOT force customers to change passwords. But that is different from you rotating your own critical passwords at a time of your choosing and on your policy.

Rotating passwords will only help in a very specific situation: When the password has been leaked, but you have not yet been hacked.

If someone has already gained access to the system, changing passwords is not sufficient.

If no one has gained access to the system, rotating passwords does not protect you against social engineering.

Nicely said.

The one mod I'd suggest is:

If someone has gained access to the passwords but has not used them yet, or was not interested in using them directly and instead sold them on, there is a window of opportunity in which rotation helps.

For example: you may be on one of the password lists being sold on the dark web. The owner of the list isn't hacking you, but those purchasing the list will some time soon.

So more specifically, you could be compromised by malware on a PC holding the password and that password may be extracted, sold and may not be used against you for months. Rotation helps in this case which is more common than we care to admit.

You are wrong, at least if you listen to what people like Bruce Schneier and other experts say about password security.

How much of this is built into various frameworks such as Rails, Django, EmberJS (for frontend) etc?

Django actually has pretty good out of the box security settings and support for XSS, CSRF and SQL injection protection, support for bcrypt and argon2, setting password rules/validation through settings, etc:


EmberJS handles all the aspects that purely browser-based application code can - protecting from XSS attacks, setting up your Content Security Policy, and client side input validation.

Not much at all.

For RoR many of the app-level points are - CSRF, SQL escaping, bcrypt by default, etc. That's a big reason to use such a framework in the first place.

Yeah, what kind of shitty frameworks is this person using if the answer to this question is "not much at all?"

@aeronautic, there's a typo on your website front page : "acccount" with 3 c.

Security is a process and not a product of some checklist.

Apparently written by somebody who isn't responsible for the actual web development themselves, just upfixed web devs' work for enterprise pen testing.

Sorry, not true. Spent many years doing full stack web dev and security. Please offer some constructive criticism - gladly received.

You are the OP HN needs but not the one we deserve...

Come on man, 4 submissions in 5 days all leading to your startup's website? And a profile created 5 days ago with zero comments on anything you didn't post?

You could at least try to make it look like you're doing more than promoting your startup.

Been an avid HN reader for ages, but been seriously heads down doing the startup thing for about a year with zero time for blog or posting. Insane hours. Started surfacing a month ago so I can do more than just code and wanted to share a bit.

Not trying to hide the company, in fact pretty proud of it and the new things we're doing. But we are very early stage and still in beta.

I do want to add value through the posts and discussion like this regardless.

I never heard any rules about these things.

If this character is coming up with worthy content then who cares? The vast majority of blog posting is indirect (or direct) marketing.

I don't think he's done anything wrong.

OP's submissions appear to comply with all of the site guidelines. What does it matter if they also happen to send traffic to his startup?


It's time to stop posting unsubstantively like this; we ban accounts that won't.


All of my supposedly unsubstantiated posts have spawned productive conversations. That they tend to be short skeptical responses is simply my service to foil the groupthink that forms around the saccharine content marketing and growth-hacked-to-death affiliate-linked nonsense that we're all staring at to fill compilation lulls.

Sometimes something short and pithy is more impactful than a big long explainer with citations. Here's a clever little comic that helpfully explains my rationale:


I'm not in it for the internet points. I'm in it to express points of view that people like to forget about on here.

Why lol?

Not being a douche - I'm genuinely curious why that is a laughable suggestion.

Being a security-focused post, it probably refers to https://news.ycombinator.com/item?id=13718752

Cloudflare MITMs your secure connections. If you get the cheaper Cloudflare options, it's really insecure.

We offer free origin certificates on any plan level (yes, including FREE). It's not 'really insecure' and you seem to imply that encryption costs more with Cloudflare. That's not true.


Your data is in the clear within Cloudflare, and may even be in the clear between Cloudflare and the real host if you choose that option. You're trusting Cloudflare's security and Cloudflare's internal certificate authority. Hundreds or thousands of sites would be compromised if Cloudflare had a security breach. Like the one they had three months ago.[1]

[1] https://techcrunch.com/2017/02/23/major-cloudflare-bug-leake...

Data is only 'in the clear' inside a machine. All machine to machine communication in Cloudflare is encrypted with mutually authenticated TLS. If a user chooses to not encrypt the back haul from Cloudflare to their origin then, sure, that's not encrypted, but we offer free certificates for origin machines so there's no reason to use that option. If you don't like Cloudflare's Origin CA then use Let's Encrypt on the origin server.

Even so, there's nothing preventing a LE or court order from compromising the confidentiality of your customers, no matter how hard you work on minimizing the scope of your cleartext domains.

I know that you, Prince, rdl, and others are serious about security and privacy, but let's be honest here: If the Feds come a-knocking, you will comply.

It's not that we don't trust you or your competence. You're just not immune to the jackboot threat model.

Actually, we'd fight like crazy legally — as we've demonstrated repeatedly and successfully — and have implemented our technical systems to make it difficult to reveal anything even if we were ordered to. Moreover, we've included warrant canaries in our Transparency Policy so you can know if anything has changed:


See section "Some things we've never done."

Have you ever received a legal order not to change your warrant canaries?

Have you ever received a legal order to not disclose that you have been ordered to not change your warrant canaries?

And so on.

Thanks so much for clarifying. We use your service - would recommend in a heartbeat.

> Your data is in the clear within Cloudflare

Just a heads up you're telling Cloudflare's CTO how Cloudflare works

Just a heads up - it's the inventor of Nagle's algorithm telling Cloudflare's CTO how Cloudflare works. This could be very interesting ;) (also stuff like that happens quite a lot on HN, I recommend using some user tagger extension)

Awesome, I love this site

Yeah that would be super useful, so far I've been going by remembering usernames

You never know... I might learn something :-)

Oh for sure - I wasn't saying it to shut the conversation down, more to point out you're speaking with the knowledge of how the innards work :)

I could have worded it to say that better but was dashing out of the house

Yes it does that and that will rule it out for some apps.

An option for many sites is to configure CloudFlare in pass-through mode (no MITM) and then just switch it on when you are being DOS'd.

But then the attackers know your origin IP from before you turned on MITM and can just DDOS it directly.

Is that even an option? Where is that setting?

That is the cloud icon. Make it gray and it is just a DNS. i.e. DOS protection armed and ready, but not active until you need it. That is how we use it.

Ah I see, I thought there might be a setting I'd missed that just forwards your traffic through without cache etc

I do not like this list.

> [ ] Use minimal privilege for the database access user account. Don’t use the database root account.

This advice seems outdated. In general, every significant security breach will get the attacker root access. Playing games with database accounts gets you no security at all, while introducing lots of friction and headache.

Sorry, you don't throw away mitigation techniques because they aren't foolproof. This is still excellent advice. Stop using sa and root accounts for your apps.

I agree with Spydum. The reason I agree is - Layers of security are required. You should not use a root account for your web application. You should also use escaping, properly formed queries, and prepared statements. You use best practices in an orderly manner and you will be much safer than just picking one "silver bullet". Not every attack provides root access to the database or server. Why make it easier for folks? Least privilege is a viable tool in your bag.

A significant breach in the context of web development is usually a logic error that leads to unauthorized application-level access, leaking cookies, service unavailability etc.

This is however not the most likely thing to go wrong, in general. Topping the charts is as always, the one and only, User Error!

Restricting db access is the last line of defense against accidentally DROP-ing TABLE in production.
