

Why Key-Value stores are like C, and why you might want to use one anyways - smanek
http://blog.postabon.com/why-key-value-stores-are-like-c-and-why-you-m

======
fauigerzigerk
I think you're spot on when you talk about a tradeoff between performance and
productivity. But what I would take issue with (just a tiny bit) is that your
test scenario doesn't really show the extent of lost productivity.

Your test case is ideally suited to what key/value stores are good at. Once
you need joins, the whole picture is going to look very different. Thankfully
you do acknowledge that in your post.
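To make the join point concrete, here is a minimal sketch (in Python, which is not what the post used) of what a join turns into when the store only offers get/put by key. Plain dicts stand in for the key/value store, the `users`/`orders` data is hypothetical, and `sqlite3` provides the relational version for comparison.

```python
import sqlite3

# Hypothetical data; dicts stand in for a key/value store.
users = {1: {"name": "alice"}, 2: {"name": "bob"}}
orders = {
    10: {"user_id": 1, "total": 30},
    11: {"user_id": 2, "total": 5},
    12: {"user_id": 1, "total": 12},
}

def kv_join(users, orders):
    """Hand-written inner join: one key lookup per order."""
    result = []
    for order_id in sorted(orders):  # fix the output order explicitly
        order = orders[order_id]
        user = users.get(order["user_id"])
        if user is not None:  # inner-join semantics: skip orphaned orders
            result.append((user["name"], order["total"]))
    return result

def sql_join(users, orders):
    """The same query, letting SQLite do the work."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total INTEGER)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(k, v["name"]) for k, v in users.items()])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(k, v["user_id"], v["total"]) for k, v in orders.items()])
    rows = conn.execute(
        "SELECT u.name, o.total FROM orders o "
        "JOIN users u ON u.id = o.user_id ORDER BY o.id"
    ).fetchall()
    conn.close()
    return rows
```

With two collections the hand-written version is still short; each extra join, filter, or sort order adds another loop like this to write and keep correct.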

There is another issue as well. You can get good performance from key/value
stores only if you know upfront what kinds of queries you're going to need.
For ad hoc queries you need a query optimizer that takes statistics into
account in order to make use of indexes in the best possible way. RDBMS have
very sophisticated query optimizers.
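A rough illustration of the "know your queries upfront" point, assuming a hypothetical store that maps an integer key to a record dict: without a planned index, an ad hoc query by `city` has to scan every record, and the index that avoids the scan has to be maintained by hand on every write.

```python
store = {}        # hypothetical primary store: key -> record
city_index = {}   # hand-maintained secondary index: city -> set of keys

def put(key, record):
    old = store.get(key)
    if old is not None:
        # The record may be changing city, so drop the stale index entry.
        city_index[old["city"]].discard(key)
    store[key] = record
    city_index.setdefault(record["city"], set()).add(key)

def by_city_scan(city):
    # Ad hoc query with no index: touches every record in the store.
    return sorted(k for k, r in store.items() if r["city"] == city)

def by_city_indexed(city):
    # Fast, but only because this index was planned before the data arrived.
    return sorted(city_index.get(city, ()))

put(1, {"city": "Boston"})
put(2, {"city": "NYC"})
put(3, {"city": "Boston"})
put(2, {"city": "Boston"})  # record 2 moves; the index must follow it
```

Forgetting the cleanup step in `put` is exactly the kind of bookkeeping bug that lets a hand-rolled index drift out of sync with the data; an RDBMS maintains its indexes, and picks which one to use, for you.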

One other difficulty arises when you try to make this architecture work in a
multi-process scenario. Berkeley DB does support that in principle, but it's
very difficult to make transactions, shared caches and recovery work reliably
with processes that are started by Apache modules.

------
jimbokun
I like that you present the trade-offs involved in choosing a relational DB or
a key-value store. Key-value stores can go faster because they are lower
level, like C compared to higher-level programming languages. But this also
means they give you fewer features: you need to write more lines of code to
get the same functionality, and if you're not careful you are more likely to
end up with bugs or data corruption.

Most of the other things I have read on this topic have taken a position of
"Relational good, Key-value bad" or the opposite. Your take is much more
useful, with some back-of-the-envelope data to back it up.

~~~
newhouseb
"Most of the other things I have read on this topic have taken a position of
"Relational good, Key-value bad" or the opposite."

This is part of a rift between the people writing the posts and the people
spending their time building Facebooks, Googles, etc. Almost everyone I've
talked to who operates (not just plans to) at HUGE scale relies on a
combination of key-value stores (most commonly memcached) and relational
databases. That way you get all the speed benefits of something like BDB,
without having to re-invent alternatives to SQL queries in [your language of
choice] wrapping BDB.
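The memcached-plus-RDBMS combination is usually wired up as a cache-aside pattern. A minimal sketch, assuming a plain dict in place of a real memcached client (no expiry, no network) and an in-memory SQLite database in place of the relational server; the `users` table and the function names are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stands in for the relational database
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'alice')")

cache = {}  # stands in for memcached (assumption: no expiry, no eviction)
stats = {"hits": 0, "misses": 0}

def get_user_name(user_id):
    key = f"user:{user_id}"
    if key in cache:              # fast path: served from the key-value cache
        stats["hits"] += 1
        return cache[key]
    stats["misses"] += 1          # slow path: ask the relational database
    row = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    name = row[0] if row else None
    if name is not None:
        cache[key] = name         # populate the cache for the next reader
    return name

def rename_user(user_id, name):
    with db:  # the connection context manager commits the write on success
        db.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))
    cache.pop(f"user:{user_id}", None)  # invalidate so the next read refetches
```

Queries stay in SQL, and the key-value layer only ever sees simple get/set/delete by key, which is the split the comment describes.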

------
uggedal
Besides making it easier to achieve high performance and scalability,
key-value stores also make it easier to create highly available applications.
Replication, failover, and multi-master support are abundant in the popular
key-value offerings. This was the reason I eventually decided to use Tokyo
Tyrant for <http://wasitup.com>
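At its simplest, failover means a client that falls back to another replica when one is unreachable. The sketch below is hypothetical and heavily simplified: replicas are in-memory objects, replication is a naive synchronous write to every live node, and it does not model how Tokyo Tyrant or any other product actually replicates.

```python
class ReplicaDown(Exception):
    """Raised when a replica (or every replica) cannot serve a request."""

class Replica:
    """Hypothetical in-memory replica; real ones sit behind a network connection."""
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True

    def get(self, key):
        if not self.up:
            raise ReplicaDown(self.name)
        return self.data.get(key)

def replicated_put(replicas, key, value):
    # Naive synchronous replication: write to every replica that is up.
    for r in replicas:
        if r.up:
            r.data[key] = value

def failover_get(replicas, key):
    # Try replicas in order and move on to the next one if a replica is down.
    for r in replicas:
        try:
            return r.get(key)
        except ReplicaDown:
            continue
    raise ReplicaDown("all replicas down")
```

Real systems also have to handle replicas that missed writes while they were down, which is where most of the complexity lives.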

------
papaf
I was interested to see that the simple MySQL reads in the benchmark are
slightly faster than the Berkeley DB ones. Has Oracle broken Berkeley DB that
badly, or is MySQL particularly fast?

~~~
smanek
I suspect there are a few issues at play. First, with BDB I have to
deserialize the entire object to get the UID, while I tell MySQL I'm only
interested in one column. Second, with the simple query I'm literally just
traversing a B-tree, which is exactly what MySQL does since I told it to
build an index on the relevant column (except that with BDB I do it in Lisp
instead of raw C).
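The first point can be sketched as follows, assuming a hypothetical deal record with a large `body` field. In the key-value version the value is an opaque pickled blob, so reading `uid` means deserializing everything. The SQL version returns only the requested column to the application. Python's `pickle` stands in here for whatever serialization Elephant actually uses.

```python
import pickle
import sqlite3

# Hypothetical record: one small field we want, one large field we don't.
record = {"uid": 42, "title": "deal", "body": "x" * 10_000}

# Key-value style: the value is an opaque blob, so reading `uid`
# means unpickling the whole record, body included.
kv = {"deal:1": pickle.dumps(record)}
uid_from_kv = pickle.loads(kv["deal:1"])["uid"]

# Relational style: ask only for the column you need.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deals (id INTEGER PRIMARY KEY, uid INTEGER, title TEXT, body TEXT)"
)
conn.execute("INSERT INTO deals VALUES (1, ?, ?, ?)",
             (record["uid"], record["title"], record["body"]))
(uid_from_sql,) = conn.execute("SELECT uid FROM deals WHERE id = 1").fetchone()
conn.close()
```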

~~~
gaius
I think a more representative test would be BDB vs SQLite, since they are
architecturally more similar (e.g. there's no SQLite "daemon"; it's a C
library).
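The architectural similarity is easy to see from Python's standard library, where both kinds of store are libraries opened inside the calling process, with no server to start. `dbm.dumb` is used here only because it is always available; it is not Berkeley DB, but it is embedded in the same way.

```python
import dbm.dumb
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as d:
    # Key-value: an embedded store that just reads and writes local files.
    with dbm.dumb.open(os.path.join(d, "kv"), "c") as kv:
        kv[b"user:1"] = b"alice"
        kv_value = kv[b"user:1"]

    # Relational: SQLite also runs in this process; "connecting" opens a file.
    conn = sqlite3.connect(os.path.join(d, "rel.db"))
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'alice')")
    conn.commit()
    sql_value = conn.execute("SELECT name FROM users WHERE id = 1").fetchone()[0]
    conn.close()
```

A MySQL benchmark, by contrast, also pays for a client/server round trip on every query, which is one reason the comparison is not apples to apples.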

------
smanek
Hello!

Part 3 of a series of articles I've been writing about the technology behind
<http://postabon.com>. This is about the persistence layer (Elephant) I'm
using - but I tried to make it a little more general so it could help non-Lisp
programmers.

------
billswift
Forgive my ignorance, but I never programmed much, and not at all in the last
several years, so I have been reading Learning Perl as a refresher. In what way(s) is
a Key-Value database different from a hash, or set of hashes? According to the
book, a hash is a set of key-value pairs. I assume the database is normally
several hashes with the same keys for finding values, and the software for
associating the values from multiple hashes, but is there more to it?

~~~
lucifer
The difference is persistence provided by a "store", not the semantics of the
data space (which is a mapping from keys to values). Further, a hash is one of
the alternative implementation strategies for a key-value space; your typical
in-memory "database" will likely opt for hashes, while issues regarding disk
access latencies favor tree structures.
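In Python terms (Perl's tied hashes play a similar role), the difference looks like this: a `dict` disappears with the process, while `shelve` offers the same mapping interface backed by a file. This is a minimal sketch using a temporary directory.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "store")

    in_memory = {"greeting": "hello"}  # an ordinary hash: gone when the process exits

    with shelve.open(path) as db:      # same key -> value interface, stored on disk
        db["greeting"] = "hello"

    # Reopening later (it could be another process) finds the data still there.
    with shelve.open(path) as db:
        restored = db["greeting"]
```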

~~~
billswift
Thanks, that makes sense.

------
billswift
I read a lot of the NoSQL posts that were on HN a few months back, but they
were either nothing but programming details I couldn't really follow, or not
much more than adverts and "wow, isn't this great" stuff. Does anyone know of
a source, preferably a book but the web will do, that describes the principles
of the newer databases as opposed to relational ones?

------
bitdiddle
Good post. Another aspect of these simple key-value stores is that the lack of
a schema is sort of like late binding for design. More work has to be done in
the client layer, but the trade-off is maximum flexibility in evolving the
application. The O-R mapping problem disappears.
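One way to picture the "late binding" idea, using hypothetical records: older entries simply lack a field that newer code expects, and the client fills in a default when it reads them instead of running a schema migration.

```python
# Hypothetical store: deal:1 was written before the app knew about "tags".
store = {
    "deal:1": {"title": "half-price pizza"},
    "deal:2": {"title": "free coffee", "tags": ["food", "morning"]},
}

def load_deal(key):
    """Client-side 'late binding': supply defaults for fields older records lack."""
    record = dict(store[key])      # copy, so the stored record is left untouched
    record.setdefault("tags", [])  # records written before tags existed get []
    return record
```

The cost is that every reader has to know about every historical shape of a record, which is the extra client-layer work the comment mentions.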

