The key value store we ignored (Postgresql) (creapptives.com)
255 points by maxpert on Dec 11, 2011 | 63 comments

I agree with the author that hstore is very interesting, but the data structures are not the key selling point in the NoSQL space in my opinion. The most overlooked advantage to things like Cassandra and Riak are the fact that you have no single point of failure in the system. If an individual node fails, there is no operational impact.

Postgres does have (finally!) a nice replication story, so you have data protection on a second server, but the failover mechanics are much more complicated and you still have a single point of failure at the master/writer node. The story gets even more operationally complex when you talk about sharding and the fact that you now have to have a replica per master -- and it really needs to be the same size as the master if you want to be able to trust that you have enough capacity to fail over to it. Suddenly you need to have half of your database capacity sitting idle.

Now, don't get me wrong, I think Postgres is a wonderful database server. For the vast majority of applications it is the correct default choice. Very few applications ever get to the point where they need to scale further than they can get vertically on a single node and most can tolerate a few minutes of downtime to fail over to a backup server. Hand-wavy "sharding is easy" comments, however, ignore a lot of operational reality and that's dangerous.

Understand your use case. Understand the failure modes. Decide how available your datastore needs to be and how much data you need to support. Know the consequences of the choices you make.

Agreed, but of course the massive write parallelism and fault tolerance of DBMS like Cassandra comes at the cost of dropping ACID, which may cause a lot of complexity elsewhere in the system. It also comes at the cost of limiting the types of queries you can perform without resorting to procedural code (at least in the case of column family based architectures). In other words, it comes at the cost of productivity.

So, even if sharding is not easy, you can do quite a lot of it in the time you save by not having to code around the limitations of most noSQL DBMS.

The day will come when RDBMSs are the secret productivity weapon of smaller companies that don't feel they have to act like they're Google.

That said, there are very good reasons not to use RDBMS in cases where the data model or specific access patterns just don't fit. But in most of those cases, I find that using in memory data structures combined with file system storage or something like BerkeleyDB is a much better fit than any server based DBMS.

> comes at the cost of dropping ACID

Do remember though that, as discussed here recently, most RDBMSs do not act in a fully ACID-compliant way by default. IIRC, providing a complete isolation guarantee often carries a hefty performance hit, so compromises are made in this area unless you explicitly tell the database to be as careful as it can. I imagine this can cause quite a nightmare for master<->master replication.
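For what it's worth, PostgreSQL illustrates the point: its default is READ COMMITTED rather than full SERIALIZABLE, and the strongest guarantee has to be requested explicitly. A sketch (the accounts table here is hypothetical):

```sql
-- PostgreSQL's default isolation level is READ COMMITTED:
SHOW default_transaction_isolation;

-- The strongest guarantee must be asked for explicitly, and it costs
-- more: serialization conflicts abort with SQLSTATE 40001 and the
-- transaction has to be retried by the application.
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
```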

There are a lot of people using "noSQL" options for the wrong reasons (such as to be buzzword compliant, or because they don't understand SQL), but there are issues that traditional RDBMSs have that stick-in-the-muds like me (who cringe at the phrase "eventual consistency") should be more aware of than we generally are.

Making the right choices about your data storage can be hard.

That said, there are very good reasons not to use RDBMS in cases where the data model or specific access patterns just don't fit. But in most of those cases, I find that using in memory data structures combined with file system storage or something like BerkeleyDB is a much better fit than any server based DBMS.

I want to go into this just a little. The issue here is just that there are tradeoffs. Understanding these tradeoffs is what good design is about.

RDBMSs excel at one thing: presenting stored data for multiple purposes. This is what relational algebra gets you, and while SQL is not relational algebra, it is an attempt to bridge relational algebra with a programming language.

An RDBMS will never be as fast performance-wise as a basic object storage system. However, what it buys you is flexibility for that part of the data that needs to be presented for multiple uses.

So the sort of access pattern that doesn't fit is something like an LDAP server. Here you have a well-defined method for integrating the software, and the use cases for ad hoc reporting usually aren't there.

On the other hand, you have something like an ERP that allows file attachments to ERP objects. Even though the data model doesn't fit exactly, you can't do this gracefully with a NoSQL solution.

So I suggest people think about the need for flexibility in reporting. The more flexibility required, the more important an RDBMS becomes.

Additionally if you have multiple applications hitting the same database, an RDBMS is pretty hard to replace with any other possible solution.

Codd emphasised the relational model as being able to change the underlying storage representation without breaking apps.

Will this eventually be a problem for NoSQL? Or, is the scalability worth the sacrifice?

Or, does NoSQL typically have only one (main) app, so making it work with a specific storage representation is not a big deal? The relational use-case was many different apps, with different versions, needing different access paths to the data. But if you just have one known set of access paths (like a REST URI), you can just design the DB for that. Hierarchical databases worked well when just one DB, one app; they just weren't very flexible.

Hierarchical databases are fast and simple but inflexible as the relationship is restricted to one-to-many, only allowing for one parent segment per child. http://it.toolbox.com/wiki/index.php/Hierarchical_Database

>Cassandra comes at the cost of dropping ACID

Suddenly I want to spend my next Saturday night setting up Cassandra at a party...

Edit: Come on, "dropping acid"! I thought it was funny...

There's a project I have recently come across called Postgres-XC which would solve the problem you describe quite nicely. Basically it's Teradata-style clustering based on PostgreSQL. It isn't full-featured yet (0.9.6 is the current version, and it doesn't support things like windowing functions), but it looks extremely interesting in this space.

I totally agree with what you said; this post is actually meant to help people know their choices. I hate it when people make a lame excuse for using a NoSQL store when they don't need it and don't know the consequences!

The main selling point of the various NoSQL products out there today isn't the schemaless storage, instead it's the ability to grow beyond a single server that's compelling.

228MB of data is nothing, it fits in RAM of any machine. What would the examples in this blog post look like if it was 228GB of data spread across 10 servers instead? How would you grow/shrink such a cluster? How would you perform a query that cuts across servers and aggregates data from some of them? How would you replicate data across multiple servers?

> 228MB of data is nothing, it fits in RAM of any machine. What would the examples in this blog post look like if it was 228GB

You bring up an interesting point. At what data size do the companies using the various NoSQL DBs feel they have to move beyond a traditional RDBMS? I work with some traditional RDBMS stores now that are >500GB in size with tables that add 20M-30M rows/month and querying still isn't an issue. Admittedly it takes expertise to optimize the system and make it work efficiently, but that's going to be the case with any datastore.

I can sustain this claim. I work with a system implemented almost entirely in Oracle PL/SQL. Some tables in the system are nearing 800-900 columns, their size often exceeds 600 GB per table (not many of such large tables though). Querying isn't a problem at all. Large schema changes are also mostly painless. The only point at which one has to be really careful is when a schema change requires actual calculations based on historic data with an additional write on each record.

A lot of criticisms I read about the immutable limits of RDBMSes turn out, upon closer inspection, to be criticisms of MySQL. Oracle sort of sails past these limitations like a superliner: expensively and gracefully.

Not that I particularly like Oracle as a programmer. I have to check my calendar every time I hit the 32-character limit for names or once again have to write

    some_field_expressing_truth  varchar2(1),
    constraint y_or_n check (some_field_expressing_truth in ('Y','N'));
But relational databases already scale to petabytes and millions of transactions per minute. Just not in the open-source world ... not yet, anyhow.

Well PostgreSQL does scale to millions of simple read-only transactions per minute, at least in some benchmarks. And in PostgreSQL 9.2 it will scale much better, in the synthetic benchmark it reached about 13 million per minute at high concurrency (compared to about 2.5 million for 9.1).


Hence "not yet".

It's been amazing watching the performance surge in postgres these past few years. I wonder if Red Hat or similar will try sponsoring a tilt at the TPC-C crown in future.

Back before Sun bought MySQL they started doing a lot of performance work on Postgres. They didn't go after TPC, but they did show it was only(?) 12% slower than Oracle for SpecjAppServer: http://www.informationweek.com/news/201001901

If you want something to play with that could scale to that size (although maybe not ready for production), see Postgres-XC (http://postgres-xc.sourceforge.net/). This looks promising.

The Wisconsin Courts system has servers running PostgreSQL that they say scale up to millions of transactions per day on 16 cores w/128MB RAM.

The trick is, you wouldn't necessarily. If your workload can be handled with one beefed-up lots-of-RAM-and-solid-state-drives server, you could spend your money on two of these instead of having 10 smaller servers, and be perfectly happy with it.

I don't care about what sells NoSQL products to enterprise users - there are lots of workloads where the data fits in RAM (for some reasonable quantity of RAM that money can buy) and you still prefer the durability and consistency that comes with standard SQL databases, even if you denormalize and/or use schemaless (XML or JSON) storage.

Any database worth its salt will use as much memory as you allow it to in order to avoid physical IO.

Your point is valid of course. I just wanted to point it out for people that think each action in a RDBMS results in a costly IO operation.

> various NoSQL products it's the ability to grow beyond a single server that's compelling.

You make it sound as if these products scale "magically". This is definitely not the case. There's quite a bit of massaging needed to make riak, hbase et al scale beyond certain thresholds - and you better pay close attention to their respective peculiarities.

> How would you perform a query that cuts across servers and aggregates data from some of them?

That's actually not very hard to implement once you understand how it needs to be done. And you do need this understanding with any k/v-store, otherwise you'll be very sad when the magic stops working (as documented time after time for MongoDB).

Starting out with a known-good, mature store such as PostgreSQL can make a lot of sense when the trade-offs match your usage pattern.

You can buy a 2u server with 256GB of ram. Think about this.

> The main selling point of the various NoSQL products out there today isn't the schemaless storage, instead it's the ability to grow beyond a single server that's compelling.

That's just not true. Some NoSQL products (HDFS, Cassandra etc) sell on the ability to easily scale out. Other (CouchDB, MongoDB etc) focus on other features. CouchDB (for example) doesn't have a real scale-out story at all (beyond "manually shard your data", or try http://tilgovi.github.com/couchdb-lounge/), but that isn't really a problem because it has other features that sell themselves.

There's now Couchbase which does the sharding for you.

I think for schema-less storage systems we know 2 major competitors in the market: MongoDB and CouchDB.

- CouchDB by default has no ability to scale out except master-master replication. The solution? Sharding for distribution and replication for reliability. Or you use BigCouch with prayers that it won't trash your data.

- MongoDB is known to stand on its sharding server, mongos, and you have to issue sharding commands whenever you scale out and rebalance. So it's again sharding for distribution and replication for reliability!

- Postgres -> more reliable in storage than both, if you do the same sharding and replication yourself :).

I am in no way saying 228MB is enough data! I would rather hate MongoDB for being RAM-hungry and storing the same 115k tweets in gigabytes of memory (256MB is just like a starter of a 100-course meal for MongoDB). Again, Facebook ultimately prefers its data to go to MySQL, and they have their largest deployment at 600 shards for a reason!

> I think for schema-less storage systems we know 2 major competitors in market. MongoDB and CouchDB.

Um, what about Riak, Cassandra, Voldemort, and Hbase? (I'm sure there's a bunch more I'm forgetting)

Cassandra, HBase -> column-oriented, not schemaless! Riak -> actually a key/value store with link walking; you can just write map-reduce for that document-oriented feel. Again, I won't bother writing a map-reduce job just to fetch documents with particular values. Voldemort -> distributed key/value store.

Again, you are missing the point of maturity and a proven user base, and it's comparing apples with bananas! Try putting the same joins and relations into the NoSQL stores you are bragging about and see how quickly they stop scaling! Want an example? Neo4j!

> Actually a key/value store with link walking, you can just write map reduce for that document oriented feel! Again I won't bother writing a map-reduce job just to fetch out document with particular values

I thought CouchDB required this as well?

Schemaless doesn't necessarily mean "document-oriented." Sparse-column / ColumnFamily databases like Cassandra are a lot closer to "schemaless" than they are to "traditional rdbms schema."

What about HandlerSocket for MySQL? You can have the joins and relations there if you need them.

> Riak -> Actually a key/value store with link walking, you can just write map reduce for that document oriented feel! Again I won't bother writing a map-reduce job just to fetch out document with particular values.


Quite apt.

Personally I find writing MapReduce jobs (in JavaScript no less) to be unbelievably clean and easy when your stack is Riak + Node.

Of course if you've been using SQL for years then this probably sounds difficult in comparison. Except what if you're a JS guy with zero SQL experience?

"Ok, it's a database. How do I query it?"

"You learn this completely new language and dynamically compile your questions down to it, but you have to be really careful because the process is notorious for being a first-class attack vector."

"Did you just tell me to go fuck myself?"

I'm not trying to say anything about the merits of SQL. I'm just pointing out that it's a matter of perspective.

I'm not a big fan of NoSQL, but sometimes I want to write a query and sometimes I just want to write some code, and I could see the appeal of doing it in JS, and especially some of the languages that target JS.

The thing about SQL as an attack vector is frustrating because it (usually) doesn't need to be an attack vector at all: use prepared statements and let the driver handle value substitution for you. It's quicker and easier than escaping everything.
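The same idea can even be written out in plain PostgreSQL syntax, without a driver in the picture. A sketch, with a hypothetical users table:

```sql
-- The parameter is bound as a value, never parsed as SQL, so the
-- classic quote-breaking injection has nothing to attack.
PREPARE find_user (text) AS
    SELECT id, name FROM users WHERE name = $1;

-- Even a hostile-looking input is just a string to compare against
-- (the doubled '' is SQL's escape for a literal single quote):
EXECUTE find_user('Robert''); DROP TABLE users; --');

DEALLOCATE find_user;
```

In practice your driver does this for you whenever you pass values separately from the query text.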

I've been using a lot of MySQL and some NoSQL solutions over the years. I don't need most of what NoSQL is supposed to offer and uhm, a couple of years ago I tried PGSQL.

I've never found the same level of finish, performance, etc. It's also very customizable and very dependable. You can get extremely good performance out of it.

Finally... the devs are plain AWESOME. Every time I had an issue, question, etc, not even necessarily related to PGSQL but to SQL and performance in general they would reply quickly, and accurately (I owe one of them many, many beers).

To give you an example, a company I worked for was doing a 10M-to-10M comparison (for very specific needs). All the businesses we found were using custom NoSQL solutions, or even Oracle, with huge clusters, costing a lot, and taking ~24h for a result.

So we implemented ours. After a year of testing, coding, etc., it was settled on PGSQL (on FBSD). With the help of the said dev, we were doing the same comparison in 3 hours, on a single quad-core system (with heaps of RAM).

Sure, maybe the other businesses never cared that much for performance, but it's still astonishing. Most of the optimizations made with the PGSQL devs' help sped up the process by thousands of times.

Disclaimer: I can't detail the comparison (it's still under NDA), but it included very specific stuff, in case you wonder about the time taken, etc.

> Finally... the devs are plain AWESOME.

A few years ago I got a dreaded call from a customer. A point of sale system was taking 45 seconds suddenly to post invoices. This is a big deal for a cash register role, as you can imagine.

So I went on site, ran some diagnostics, isolated the problem query, etc. At this point they were into the slow part of the afternoon so I asked about it on the -perform list along with query plans and everything else I could think of.

Within an hour I had received an email from Tom Lane explaining:

1) What the planner was getting wrong

2) Why the planner was making an error there, and why this was a special planner case.

3) Giving me suggestions as to how to fix it.

First rate support, let me tell you. (Also, planner behavior was changed in a release soon after to accommodate my use case.)

> Finally... the devs are plain AWESOME.

About 10 years ago I had cause to look at the source and I have to say I don't think I've ever seen better written C. The changes we were considering were vetoed, but I marveled at how easy it was to identify the place I would have needed to make the changes and verify that they would not have unintended consequences.

I don't know if it still does, but the code I saw back then read like a book.

In my opinion, Postgresql + hstore + PLv8js (Javascript as a language for stored procedures, see http://code.google.com/p/plv8js/) could really rock the world if it became an accepted standard and got some people behind it.

It's not so great if you start using something and one year later see that the original authors lost interest and move on, leaving you with 10k lines of code that depend on a pray-for-no-bugs unsupported piece of software.

Totally agree. I've been building out a new system with hstore and it's been great, planning to do some benchmarking with plv8js as well. If you've got a dataset that you can conceivably scale vertically to accommodate, postgres is a solid and feature-packed database.

> ...database grew to a size of 239691780 bytes (Just 228MB)

> ...gives me 14689 rows just under 360ms on average

Uh, 360ms seems like an awfully long time to query such a small dataset.

The point of the article wasn't how fast the query ran, nor is the hardware specified. Without that information any assessment of speed is meaningless.

I think that includes the time to transfer the data.

I sure hope not. That's usually not the way you benchmark database queries.

Then it's a good thing that line in the article had nothing to do with benchmarking a database query.

reddit used (and still for the most part uses) postgres as a schemaless storage (although they don't use hstore).

Also, I'm pretty sure Heroku uses hstore, because they're the ones that taught me about it.

So not everyone forgot about it.

Just read the comments around here and you will see the dogma people face on a daily basis, despite the huge databases of reddit!

dogma... what do you mean?

Documentation about hstore in postgresql http://www.postgresql.org/docs/9.1/static/hstore.html
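The linked docs boil down to something like the following sketch (the table and keys here are made up, loosely echoing the article's tweet dataset):

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- A hypothetical table of tweets with free-form key/value metadata:
CREATE TABLE tweets (
    id   bigserial PRIMARY KEY,
    data hstore
);

INSERT INTO tweets (data)
VALUES ('user => maxpert, lang => en, retweets => "42"');

-- A GIN index makes key-existence and containment lookups fast:
CREATE INDEX tweets_data_idx ON tweets USING gin (data);

-- Pull one value out of rows whose hstore contains a given pair:
SELECT data -> 'user' AS who
FROM tweets
WHERE data @> 'lang => en';
```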

Yes -- but not with that module. We're using it to tag documents (individual rows in Postgres and Solr) with user-defined metadata.


... which can then be used to power custom indexes of particular collections of documents, like this one:


The fun bit being that, in both Postgres and Solr, you can still run a precise search (including joins for access control, sharing, and boolean full-text clauses) with a single query, because the hstore keys and values are indexed.

How did you build it? Are you using hstore SQL directly? Do you just have migrations with raw SQL in them?

Let's dig through it a bit. Our hstore column isn't directly on the document row, but rather in a joined "docdata" table. There was a semi-raw migration to add it:


The "Docdata" model handles serializing JSON hashes of keys -> string values into the hstore format, and parsing 'em back out. This is a little loose, but works for most of the simple things you'd like for tagging documents (senator: Kerry, statecode: WV, recordid: 019281)


... the resulting effect is that from the Rails side, working with "document.data" directly is like working with a lazy-loaded hash.
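The serializing described above targets hstore's text format, which looks roughly like this (reusing the parent's example keys):

```sql
-- hstore's text format is comma-separated "key => value" pairs,
-- with all values stored as text:
SELECT 'senator => "Kerry", statecode => "WV", recordid => "019281"'::hstore;

-- each() explodes an hstore into key/value rows, which is handy
-- when parsing values back out on the application side:
SELECT * FROM each('senator => "Kerry", statecode => "WV"'::hstore);
```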

The more interesting bit is the search side of things, here's the query parser and query object (which generates SQL and Solr queries):



... for any given query, we detect if Solr is required (if full-text is involved), or if the search can be satisfied purely in Postgres. So there's one way of generating the key:value query for each back end.

Hope that answers the question.

(Edit:) Forgot to mention that we add a little extra on top of vanilla hstore "key:value" queries. You can write "dob: *" to show documents with any date of birth tagged, and "dob: !" to show documents that haven't been tagged with a date of birth yet.
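In hstore terms those two extras presumably map onto the key-existence operator; a sketch against the "docdata" table mentioned upthread (the document_id column is an assumption):

```sql
-- "dob: *"  -> rows where the key exists at all:
SELECT document_id FROM docdata WHERE data ? 'dob';

-- "dob: !"  -> rows where the key is absent:
SELECT document_id FROM docdata WHERE NOT data ? 'dob';
```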

This is awesome, thank you (didn't think to check if you were in Github, but the explanations are great too).

I built this gem for ActiveRecord hstore serialization. Uses ActiveRecord 3.1+ custom serializer hooks.


Thoughtbot posted a similar Postgres article recently:



Did you click the link? Unless I'm missing something, it's completely different content. One submission focuses on Postgres's hstore data type, and the other focuses on its XML data type.

I tried hstore on a project for many of the reasons mentioned. I just hated having to always represent data as strings.

Isn't this effectively what Redis does too?

There are whole Salvatore posts about doing binary data structures in Redis string values.


Strings are the most basic kind of Redis value. Redis Strings are binary safe, this means that a Redis string can contain any kind of data, for instance a JPEG image or a serialized Ruby object.

I am confused by this comment. In the PostgreSQL protocol, everything is a string if you use the text protocol, but you can also use the binary protocol, in which case there is a reasonable binary storage format for hstore. It is even more confusing, as whichever protocol you use seems like an implementation detail of your driver: at the level of an application you should only be working with dictionaries.

how does this compare to MySQL's HandlerSocket?

I haven't done benchmarks myself yet, but if you are looking for HandlerSocket benchmarks, see http://blog.creapptives.com/post/3329352663/the-nosql-dogma

Those benchmarks are very flawed as HandlerSocket only works in InnoDB. This is a better post on the subject: http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-a...

@chrismealy.. http://croak.eu/rTBVKX

Quit spamming this bullshit here. HN is a text-based discussion forum.
