Why NoSQL Equals NoSecurity (informationweek.com)
27 points by jordhy on Apr 8, 2012 | 34 comments

We use Hadoop, Cassandra, and Redis in production. I've never found Information Week to be a high quality source of information, and unfortunately this is no exception. Data systems almost always rely on being in segmented and protected portions of a network with access being controlled by applications and firewalls.

Overall this article struck me as having an axe to grind. I wouldn't put SQL Server, Oracle, MySQL or any other database on the web with open access, just like we don't do that with any of our NoSQL solutions. Generally, I've found that security granularity is not as robust in the newer Big Data tools, but that was the same argument against MySQL and Postgres a decade ago.

* Edit: Since we are playing on names, apparently Information Week = Information Weak.

This article makes it sound like relational databases are the pinnacle of security. Sure, they have usernames and passwords and other controls, but the data is still usually stored unencrypted.

I don't think putting a few username/password credentials onto the datastore will make it much more secure, especially if I can put those same access controls into a database access layer.

In the end you need defense in depth just like always, and that includes putting in access controls, network controls, OS controls and everything else.
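To make the "access controls in a database access layer" idea above concrete, here is a minimal sketch. Everything in it is hypothetical (the `AccessLayer` class, the grant table, the app and collection names), and a plain dict stands in for the datastore; it only illustrates the shape of the idea, not how any particular product does it.

```python
class AccessLayer:
    """Wraps a key-value store and checks a per-application grant table."""

    def __init__(self, store, grants):
        self._store = store    # hypothetical: {collection name: {key: value}}
        self._grants = grants  # hypothetical: {app name: set of (collection, action)}

    def _require(self, app, collection, action):
        # Deny by default: an app with no entry gets an empty grant set.
        if (collection, action) not in self._grants.get(app, set()):
            raise PermissionError(f"{app} may not {action} {collection}")

    def read(self, app, collection, key):
        self._require(app, collection, "read")
        return self._store[collection][key]

    def write(self, app, collection, key, value):
        self._require(app, collection, "write")
        self._store[collection][key] = value


# A dict stands in for the real datastore in this sketch.
store = {"payments": {"42": "paid"}, "sessions": {}}
grants = {
    "billing": {("payments", "read"), ("payments", "write")},
    "frontend": {("sessions", "read"), ("sessions", "write")},
}
layer = AccessLayer(store, grants)
```

With this in place, `layer.read("frontend", "payments", "42")` raises `PermissionError`, while the same call for `"billing"` succeeds. The check only holds if every client actually goes through the layer, which is why it complements network and OS controls rather than replacing them.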

I haven't tried it myself, but Redis has support for simple authentication at least. http://redis.io/topics/security
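For anyone wondering what that simple authentication amounts to on the wire, here is a rough sketch, assuming the standard Redis protocol framing (an array of bulk strings). The server side is a `requirepass` line in redis.conf; the client then sends `AUTH <password>` before other commands. The helper name `encode_command` and the password are made up for illustration, and a real client library does this for you.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a Redis protocol array of bulk strings."""
    parts = [b"*" + str(len(args)).encode() + b"\r\n"]
    for arg in args:
        data = arg.encode("utf-8")
        # Each argument is sent as $<byte length>\r\n<bytes>\r\n
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


# Hypothetical password; this is the frame a client would send first.
auth_frame = encode_command("AUTH", "example-password")
```

Note that the password sits in the frame as plain bytes, so without a tunnel or a trusted network anyone who can sniff the connection can read it.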

Just in case you want to open up your Redis instance to the internet?

Sometimes I think people like antirez add these features just so an article isn't published proclaiming Redis security to be broken because of no password authentication.

We use the authentication in our apps. It's simple, and as Zephyr suggested I wouldn't recommend using that as your only security - it's not our only security, just part of the overall approach.

WTF? If you can't trust your network, OS, and user level access, how are you going to trust a DB ACL? I'll put everything up against Information Week using a caching layer (Re: memcached), which has unfettered access without security from an external layer. Non-NoSQL databases don't magically make security a non-issue; what a joke. In all cases, equally, you have to be aware and diligent about securing your network.

Honestly, I don't think the ACL really provides much in terms of security. I mean, your db username and passwords are stored (in plain text) in your settings.py, config.php, database.yaml, etc. I've seen a few times where master passwords have shown up in a user's bash history due to carelessness. Also consider that your backup script contains read access credentials for all your data.
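As an aside on the plain-text config files mentioned above, one common mitigation is to read credentials from the process environment so they stay out of settings files and version control. This is only a sketch under assumptions: the variable names `APP_DB_USER` and `APP_DB_PASSWORD` and the function `load_db_credentials` are hypothetical, and it does nothing for the bash history or backup script cases.

```python
import os


def load_db_credentials(environ=os.environ):
    """Return (user, password) from the environment, failing loudly if absent."""
    try:
        # Hypothetical variable names, set by whatever launches the app.
        return environ["APP_DB_USER"], environ["APP_DB_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(
            f"missing database credential: {missing.args[0]}"
        ) from None
```

The secret still exists in plain text somewhere (the deploy tooling, the process environment), so this narrows where it can leak rather than removing the problem.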

IMO the main benefit of having an ACL is to compartmentalize your database so that a SQL injection against one app (maybe) won't spill over to a different app. That's not something that makes me feel good at all.

> WTF? If you can't trust your network, OS, and user level access ...

You can't trust them all equally, and allow for a failure in any one of them to result in a total failure of all security. That's the M&M security model -- crunchy on the outside.

This is one (of several) reasons I wouldn't use memcached without authentication if it could lead to a privilege increase: It completely breaks defense-in-depth. Compromise of just one system can lead to total system compromise.

The article's author/etc simply has an axe to grind with NoSQL. Are we going to throw out our *nix flavors that have us disable root SSH? Our FSes that don't have or disable-by-default block encryption? Security should definitely be championed, full stop, but I'm not convinced by this article that NoSQL or relational DBs have any reason to be implementing the secure layers on my systems.

A big draw to NoSQL is not only the ease of use. One advantage is the separation of working parts. Not jack of all trades, master of none architectures.

The article doesn't make it clear what sort of security features the author would like to see in NoSQL databases beyond SSL support, which they then admit is offered by MongoDB's "commercial offering". Never mind that the commercial version is, I'm assuming, the one that is used by all of the financial institutions the author is worried about.

I'm curious though -- what security features (if any) does HN think NoSQL DBs are missing?

Exactly -- for his example of a financial services system, just buy the commercial version, or tunnel using ssh.

You can also encrypt the underlying file system using your OS's tools.

I can't speak to how financial institutions use, say, Oracle's authentication and authorization. Most web apps & services use their own logic to manage access control. I'd love to hear what someone from that industry has to say. (An "architect" preferably. Not a mere "developer", who are the cause of this whole security problem ;) ).

RDBMSs put a large focus on data integrity and access within the datastore layer itself (to differing degrees); NoSQL systems more or less offload that responsibility into the application layer. There is still a responsibility of running it within a trusted network.

This is a decision that may not be appropriate for everyone, but that does not make it wrong. Many people who used MySQL to power web applications did use just one credential to access the database. In that scenario the authentication overhead is pretty much just a waste of time.

I would tend to think financial/banking/<insert would-be secure application here> would be insecure no matter what data store they use. It seems to just be part of the game - ask anyone that's developed for one.

One would assume that by adopting NoSQL that would mean a refresh of code/architecture/security but only a small part of me actually believes that.

I also do not really understand the point of this article, as it did not provide me with anything useful.

I'll play. I worked at a financial institution for several years. On the one hand you'd think we were secure because we used a relational database, but we also had a heavily segmented network with all of the databases in their own VLANs with highly restrictive permissions. On top of that each application was locked down to exactly which data it could use and what could be done with that data.

So if you buy the premise of the article, it wasn't NoSQL, so it must have been secure. In reality we were very secure from external breaches, at least as secure as you can reasonably be.

Personal information was secured to the maximum extent possible. Encrypted, backed up to multiple independent tapes stored in separate locations etc. On the other hand, the overall complexity of the system was very high and if someone breached our network there would have been a very large surface area to attack. I didn't feel like the article addressed this aspect of security in the least. Security just isn't one of those things that are easy or are ever done. It takes constant vigilance and continuous adjustment and improvement.

Did your institution go the seldom traveled route of encrypting sensitive information? Typically all the effort is spent trying to avoid breaches instead of protecting against them when they occur.

Initially it was not encrypted and was stored in plain text in the database. Then we started encrypting sensitive information at rest, both the database files and the backups. We also encrypted certain sensitive information with a secondary encryption. The next massive undertaking was to find all of the processes that use the data, how they use it - both internally and externally, and then ensure the data was protected throughout the entire process. The company was still working through that process when I left. They had identified many places that were problematic and were actively working on fixing them. The team I managed was the payment processing and cash management team. There were a number of legacy systems written in Access, FoxPro, DTS, Excel even. Many of those were the most painful, and frankly the most problematic.

Many people were put in a situation where the requirements for getting something implemented through the IT teams would take months, yet they were still held accountable for getting things done, so they would build an Excel process to get the job done. They would build some fantastic process, and get promoted, then that Excel system would get dumped on either someone in IT or someone replacing them.

I'm getting off track, but looking at it superficially the data was encrypted. If, however, you watch the data through its entire value stream, there are many places it is vulnerable. Given the state of affairs in most companies and how few of them think in terms of managing value streams, I expect my observations are fairly widely applicable.

How much information from highly secured RDBMSs is leaked and lost on a daily basis? How much of that leakage is prevented by the database's security mechanism, vs. that of the application connecting to it?

And to pick on certain things - do SSL connections REALLY matter? If you're running over an untrusted network (bad idea, in general), why not just use IPsec?

I'm the author of a (very) new NoSQL datastore called Artifact (http://zv.github.com/artifact/), that, admittedly, is very prototypical and is not representative of MongoDB, Couch, Riak or any of the other major players. I'm also a vulnerability researcher who makes his way in the world finding and dissecting security bugs and holes in network infrastructure.

I feel this article misrepresents a lot of facts about security in the same way that articles discussing the relative security of passwords misrepresent security.

Here's what I mean

Question: What's the most obvious way to attack a system? Answer: Guess Passwords.

Because this is the most obvious way to attack computer systems, an undue amount of attention is paid to the topic because it's the only way we're aware of. In reality the vast majority of system compromises happen because of memory corruption vulnerabilities, or in recent years, (predominantly) web bugs like SQLi and LFI. Explaining the dynamics of taking control of a program through smashing the stack and getting EIP is tough, at least in contrast to explaining how guessing passwords works. So people worry that "their passwords aren't secure" despite the fact that any real black hat deleted D8.dic off their hard drive years ago. Thus, there's a million articles and sites on how secure your password is, but not that many to check if you've got address randomization enabled.

Are Blackhats trying to read your NoSQL stores after staging an internal attack? Possibly, but not nearly as likely as the fact that you've got xp_cmdshell enabled. Attacking MongoDB is tractable, and provides numerous opportunities for a wide nop sled w/ memory mapped files and more than enough opportunity for use-after-free attacks in objects, but the same (and then some!) can be said of any modern SQL database.

The fact that query strings are safe in every major NoSQL datastore is such a massive advantage over SQL DBs that, even if you're using prepared statements, NoSQL is almost assuredly more secure in a day to day context, except against zero-days (Which we can reasonably say are less frequent in established datastores, irrespective of their design decisions).

His quasi-attack on stateful firewall security in the article is pretty much baseless; clever TCP bugs are long since worn out, and all the things that mainstream developers have thought clever, like firewalk(8), haven't been effective since I was 11. It's not unreasonable to trust major commercial firewalls for anything you'd trust SSH for. It's not reasonable to trust C/C++ applications in general, as even the most valorous efforts at secure software have turned up short. (The beautiful irony here may be that SSH, as trusted as it is, has had numerous pre-auth exploits in it, as it is written in C.)

Finally, let's be real, as an article aside -- SSL does absolutely nothing. Every single year there are multiple complete breaks in what SSL is supposed to do, both practical and theoretical. These complete breaks are regularly released with executables, so don't give me that "We're only raising the ba-" Nope, ASLR and W^X raise the bar; you could argue SSL widens your attack surface on anything but HTTP transactions. Furthermore, if MITM is the biggest thing you're worried about then I'd encourage you to download any of the latest XML parsing libraries or PDF readers to rectify that -- it's tragically only a minor exaggeration to say attacking these complex, loosely interacting components such as parsers to stage a more complex internal attack is organizational computer hacking in 2012.

All firewalk did was port scan through firewalls. It wasn't some magic trick that got you root on them.

I don't know what you're trying to say with regards to SSL, or how it can be reasonably compared to runtime hardening.

Most firewalls can be configured to block packets with a TTL greater than the route hop count back or simply never allow traffic through even if the TTL is the hop count + X (where X is distance to internal host, presumably 1)

I didn't mean to imply TTL incrementation was magic remote root.

As far as SSL, you're right in that it is a bit tangential. I'm just trying to illustrate the huge security flaws that, in my eyes, stand in front of "can someone sniff my traffic".

What huge security flaws in TLS are you referring to?

(One of my business partners is the author of firewalk).

I'm a mere pup in San Francisco, you're Thomas Ptacek. You probably have a wikipedia page and personal islands. I could suffer an untimely fate if you so much as looked at me wrong.

What I mean to say is that your authority is better than mine on this - so if you think SSL is secure, then I'll start telling people SSL is secure. I just think the numerous vulnerabilities unearthed so far, ranging from the eminently practical (Compromise a CA!) to the blindingly obvious (SSLStrip) constitute a poor record.

Not the response I was going for; just wondering if you meant policy stuff like CA's or protocol stuff.

Serious question -- was your post generated by a Markov chain? While the sentences and paragraphs are basically grammatical, they make no sense at all.

I'm sorry my green ideas are not to your furious sleep specifications. Forgive me Strunk and White.

I found the post very clear and interesting. Maybe it was written like a rant that you'd normally say out loud rather than write down, but it was good nevertheless. What did you find confusing?

What's your takeaway from it?

Multi-layer security is an interesting issue where each service can be exploited in a couple of ways. There are app-logic layer problems like not escaping input, as well as the app-internals layer, where the app's language matters to some extent. Java does have reasonable protection from buffer overflows... unless it's a bug in the JRE, where all apps suffer. On the other hand we only have some unfortunate exceptions like unusable but safe qmail.

So while it's not the end of your worries of course, from the developer's point of view if you can make sure you're running on a system handling ASLR properly, using a language with managed-memory, designing a system to not require escaping in the first place, so there's nothing to forget about... you can eliminate several classes of attacks without even starting to think about specific scenarios and business logic itself. (not all follows from the post itself, but this was on my mind after reading - so matches your question hopefully)

Made sense to me.

> The fact that query strings are safe in every major NoSQL datastore is such a massive advantage over SQL DBs that, even if you're using prepared statements, NoSQL is almost assuredly more secure in a day to day context, except against zero-days

In what way is my SQL DB (which I use with prepared statements and placeholders) more vulnerable than any NoSQL thing?
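For readers following along, here is a small sketch of the difference being argued about, using Python's built-in `sqlite3` module as a stand-in for any SQL database. The table, rows and hostile input are invented for illustration; the point is only that a bound placeholder treats the input as a value, while string concatenation lets it rewrite the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "user")],
)

# Hypothetical attacker-controlled input.
hostile = "nobody' OR '1'='1"

# String concatenation: the quote in the input closes the literal and the
# OR clause becomes part of the SQL, so the WHERE condition is always true.
concatenated = conn.execute(
    "SELECT name FROM users WHERE name = '" + hostile + "'"
).fetchall()

# Placeholder: the whole input is bound as a single string value and is
# compared literally against the name column.
bound = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()
```

Here `concatenated` contains every row in the table while `bound` is empty, which is the sense in which consistently used placeholders close this particular hole.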

> It's not unreasonable to trust major commercial firewalls for anything you'd trust SSH for. It's not reasonable to trust C/C++ applications in general

So the major commercial firewalls aren't written in C or C++? I find that mildly surprising, since I know only of netfilter and pf, and both are written in C

(off topic: curious if you evaluated riak-core, and if yes, why did you pass on it?)

Being the author of the report, a vulnerability researcher, and having been in the industry for years, I too think your post misses the point. The article wasn't written for someone who knows the ins and outs of properly securing an environment; it was written for the corporate IT professional or corporate IT security professional who gets thrown into a meeting where someone mentions the new app they are deploying uses "NoSQL" and they don't know what that means or how they can even start securing it.

While the editors may have sensationalized the title, the key here is that the majority of the users of NoSQL databases are not security researchers or even developers that understand proper defense in depth. Most are using these "database technologies" to solve a specific problem or because they are "cool" so when they go from dev to production, very few controls are implemented anywhere in the stack (network, OS, or app).

Furthermore, if the solution to securing a NoSQL DB is to "secure everything around it" (e.g., the references in the posts to hardening the OS, network, segmentation, etc.) then I can tell you it isn't happening and the deployers of NoSQL don't know that.

The financial institutions that are using NoSQL, that I have assessed, aren't using commercial versions mostly because a developer throws something together, proves it works, and moves it into production rather quickly. I think they will move to commercial versions though as more support is needed.

I don't have an axe to grind, and if you look at my previous database security reports I don't think the big SQL DBs are great either BUT they do provide more options for controls to be put in place at the database level. Of course the rest of the stack should be secured too (duh) but when it comes to defense in depth you want to have as granular controls as possible as close to the data as possible.

I did not make the claim you shouldn't use these technologies, rather I make the claim you can't assume these technologies support the same controls as other relational databases have and call it a day. You have to be much more creative to properly control and audit access.

Lastly, it is difficult to provide actionable technical advice in an article format such as that used by InformationWeek because of the wide variety of readers. Rather, we make recommendations that readers can use to do additional research and find the proper answers, which is what we did provide in the actual research report (which is longer and more detailed than the articles).

Oh, and I think Tom had the right idea. Can you explain your firewall and SSL comments? Also, xp_cmdshell has been disabled by default since SQL 2005, so that isn't a valid example of "insecure by default" for a new deployment (even though an attacker can re-enable it if they are 'sa'). If you are deploying SQL 2000 into an environment in 2012 you have other issues.

Sybase and Oracle do have pretty good security solutions when you dive deep into them, but I gotta question the whole premise of the article. I have always believed if an outsider can get to your database / datastore then you already have a ton of problems.

Seems like NoSQL's target was to solve a small part of your Big-Data problem. This is changing because NoSQL can replace RDBMS in many fields, and security is becoming a must-have too.

AFAIK OrientDB is the only NoSQL database that supports security: http://code.google.com/p/orient/wiki/Security.
