
I agree with the author that hstore is very interesting, but the data structures are not the key selling point in the NoSQL space in my opinion. The most overlooked advantage of things like Cassandra and Riak is the fact that you have no single point of failure in the system. If an individual node fails, there is no operational impact.

Postgres does have (finally!) a nice replication story, so you have data protection on a second server, but the failover mechanics are much more complicated and you still have a single point of failure at the master/writer node. The story gets even more operationally complex when you talk about sharding and the fact that you now have to have a replica per master -- and it really needs to be the same size as the master if you want to be able to trust that you have enough capacity to fail over to it. Suddenly you need to have half of your database capacity sitting idle.
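
To put rough numbers on that last point, here is a back-of-the-envelope sketch in Python. The function name and the shard counts are made up for illustration, and it assumes every replica is the same size as its master and serves no traffic until a failover.

```python
def idle_fraction(masters: int, replicas_per_master: int = 1) -> float:
    """Fraction of provisioned nodes that sit idle as standbys.

    Hypothetical helper: assumes each replica matches its master in size
    and takes no read or write traffic until it is promoted.
    """
    if masters < 1 or replicas_per_master < 0:
        raise ValueError("need at least one master and a non-negative replica count")
    total_nodes = masters * (1 + replicas_per_master)
    standby_nodes = masters * replicas_per_master
    return standby_nodes / total_nodes


if __name__ == "__main__":
    # Illustrative shard counts only, not measurements from a real cluster.
    for shards in (1, 4, 16):
        nodes = shards * 2
        print(f"{shards} shards -> {nodes} nodes, {idle_fraction(shards):.0%} idle")
```

With one standby per master the idle share stays at one half no matter how many shards you add, which is the "half of your database capacity" above. If the replicas also serve reads, the assumption breaks and the number comes down.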

Now, don't get me wrong, I think Postgres is a wonderful database server. For the vast majority of applications it is the correct default choice. Very few applications ever get to the point where they need to scale further than they can get vertically on a single node and most can tolerate a few minutes of downtime to fail over to a backup server. Hand-wavy "sharding is easy" comments, however, ignore a lot of operational reality and that's dangerous.

Understand your use case. Understand the failure modes. Decide how available your datastore needs to be and how much data you need to support. Know the consequences of the choices you make.

Agreed, but of course the massive write parallelism and fault tolerance of a DBMS like Cassandra come at the cost of dropping ACID, which may cause a lot of complexity elsewhere in the system. They also come at the cost of limiting the types of queries you can perform without resorting to procedural code (at least in the case of column family based architectures). In other words, they come at the cost of productivity.

So, even if sharding is not easy, you can do quite a lot of it in the time you save by not having to code around the limitations of most noSQL DBMS.

The day will come when RDBMS are the secret productivity weapon of smaller companies that don't feel they have to act like they were Google.

That said, there are very good reasons not to use RDBMS in cases where the data model or specific access patterns just don't fit. But in most of those cases, I find that using in memory data structures combined with file system storage or something like BerkeleyDB is a much better fit than any server based DBMS.

> comes at the cost of dropping ACID

Do remember though that, as discussed here recently, most RDBMSs do not act in a fully ACID compliant way by default. IIRC, providing a complete isolation guarantee often also carries a hefty performance hit, so compromises are made in this area unless you explicitly tell the database to be as careful as it can. I imagine this can cause quite a nightmare for master<->master replication.

There are a lot of people using "noSQL" options for the wrong reasons (such as to be buzzword compliant, or because they don't understand SQL), but there are issues with traditional RDBMSs that stick-in-the-muds like me (who cringe at the phrase "eventual consistency") should be more aware of than we generally are.

Making the right choices about your data storage can be hard.

> That said, there are very good reasons not to use RDBMS in cases where the data model or specific access patterns just don't fit. But in most of those cases, I find that using in memory data structures combined with file system storage or something like BerkeleyDB is a much better fit than any server based DBMS.

I want to go into this just a little. The issue here is just that there are tradeoffs. Understanding these tradeoffs is what good design is about.

RDBMS's excel at one thing: presenting stored data for multiple purposes. This is what relational algebra gets you and while SQL is not relational algebra it is an attempt to bridge relational algebra with a programming language.

An RDBMS will never be as fast performance-wise as a basic object storage system. However, what it buys you is flexibility for that part of the data that needs to be presented for multiple uses.
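
A small sketch of what "multiple uses" can look like, using Python's built-in sqlite3 module purely as a stand-in for an RDBMS. The tables, names, and amounts are invented for illustration; the key-value side is a plain dict, not any particular NoSQL product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        product TEXT NOT NULL,
        amount INTEGER NOT NULL
    );
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES
        (1, 1, 'widget', 30),
        (2, 1, 'gadget', 50),
        (3, 2, 'widget', 20);
""")

# Use 1: total spend per customer.
spend_per_customer = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()

# Use 2: revenue per product, from the same rows, with no schema change.
revenue_per_product = conn.execute("""
    SELECT product, SUM(amount) FROM orders
    GROUP BY product ORDER BY product
""").fetchall()

# A store laid out for one access path: orders looked up by customer.
orders_by_customer = {
    "alice": [("widget", 30), ("gadget", 50)],
    "bob": [("widget", 20)],
}

# The second question now needs procedural code that walks every key.
revenue_scan = {}
for items in orders_by_customer.values():
    for product, amount in items:
        revenue_scan[product] = revenue_scan.get(product, 0) + amount

print(spend_per_customer)
print(revenue_per_product)
print(sorted(revenue_scan.items()))
```

The dict lookup by customer is hard to beat for its one job. The second question is where the relational layout pays off: it is another query rather than another loop over the whole store.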

So the sort of access pattern that doesn't fit is something like an LDAP server. Here you have a well-defined method for integrating the software, and the use cases for ad hoc reporting usually aren't there.

On the other hand, you have something like an ERP that allows file attachments to ERP objects. Even though the data model doesn't fit exactly, you can't do this gracefully with a NoSQL solution.

So I suggest people think about the need for flexibility in reporting. The more flexibility required, the more important an RDBMS becomes.

Additionally, if you have multiple applications hitting the same database, an RDBMS is pretty hard to replace with any other solution.

Codd emphasised the relational model as being able to change the underlying storage representation without breaking apps.

Will this eventually be a problem for NoSQL? Or, is the scalability worth the sacrifice?

Or, does NoSQL typically have only one (main) app, so making it work with a specific storage representation is not a big deal? The relational use case was many different apps, with different versions, needing different access paths to the data. But if you just have one known set of access paths (like a REST URI), you can just design the DB for that. Hierarchical databases worked well when there was just one DB and one app; they just weren't very flexible.

Hierarchical databases are fast and simple but inflexible as the relationship is restricted to one-to-many, only allowing for one parent segment per child. http://it.toolbox.com/wiki/index.php/Hierarchical_Database
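
A toy illustration of that one-parent restriction, in Python. The part names and helper functions are hypothetical, and the dict and set are only stand-ins for the two storage styles, not how any real hierarchical database is implemented.

```python
# Hierarchical layout: each child records exactly one parent.
parent_of = {"engine": "car", "wheel": "car", "piston": "engine"}


def path_to_root(node, parents):
    """Follow the single parent link upward until a root is reached."""
    path = [node]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return path


# Relational-style layout: (child, parent) pairs, so a child may
# belong to any number of parents.
used_in = {("wheel", "car"), ("wheel", "truck"), ("engine", "car")}


def parents_of(child, pairs):
    """Return every parent recorded for the given child, sorted by name."""
    return sorted(parent for c, parent in pairs if c == child)


print(path_to_root("piston", parent_of))
print(parents_of("wheel", used_in))

# The dict cannot hold both parents: a second assignment overwrites the first.
parent_of["wheel"] = "truck"
print(parent_of["wheel"])
```

Walking up the tree is a chain of direct lookups, which is the "fast and simple" part. The moment a wheel belongs to both a car and a truck, the single-parent layout has to duplicate the record or drop a relationship.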

>Cassandra comes at the cost of dropping ACID

Suddenly I want to spend my next Saturday night setting up Cassandra at a party...

Edit: Come on, "dropping acid"! I thought it was funny...

There's a project I have recently come across called Postgres-XC which would solve the problem you describe quite nicely. Basically it's Teradata-style clustering based on PostgreSQL. It isn't full-featured yet (version 0.9.6 is the current version, and it doesn't support things like windowing functions), but it looks extremely interesting in this space.

I totally agree with what you said; this post is actually meant to help people know their choices. I hate it when people make lame excuses for using a NoSQL store when they don't need one and don't know the consequences!
