

LevelDB: A Fast Persistent Key-Value Store - skanuj
http://google-opensource.blogspot.com/2011/07/leveldb-fast-persistent-key-value-store.html
It was written by Jeff Dean and Sanjay Ghemawat.
More here: https://plus.google.com/118227548810368513262/posts/1UEtSkKp1vv
======
gojomo
An interesting development a while back that I'm surprised hasn't received
more attention was Oracle's release of a SQLite-based interface to BDB:

[http://www.oracle.com/technetwork/database/berkeleydb/overvi...](http://www.oracle.com/technetwork/database/berkeleydb/overview/sql-160887.html)

It's essentially drop-in compatible with SQLite, but with added concurrency
and speed for most operations. (The concurrency addresses a major issue
usually keeping SQLite as a prototyping/single-user-only option in web
development.)

With LevelDB as a BSD-licensed alternative to BDB, I wonder:

(1) How would the LevelDB-vs-SQLite benchmarks change against SQLite+BDB
backend?

(2) Could a SQLite fork with a LevelDB backend get a performance boost?

~~~
est
> SQLite-based interface to BDB

One thing I didn't get about the SQL API for BDB: how does something like

    select * from users where name != 'tom'

work?

~~~
gojomo
You really don't know or care that you're using BDB; it works (to the user)
just like SQLite. (Behind the scenes, it's using BDB for the tables/indexes,
and so would do various full or partial table scans, much like SQLite's
native on-disk format.)
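
As a rough illustration of the scan described above (a hypothetical sketch, not BDB's or SQLite's actual code), a `!=` predicate over rows kept in a key-value table can be answered by walking every row and filtering. The `users` table and the `select_where_name_not` helper are made-up names for this example:

```python
# Hypothetical sketch: a table stored as primary-key -> row in a key-value map.
users = {
    1: {"name": "tom", "age": 30},
    2: {"name": "ann", "age": 25},
    3: {"name": "bob", "age": 41},
}


def select_where_name_not(table, excluded_name):
    """Full table scan: visit every row in key order, keep rows whose name differs."""
    return [row for _, row in sorted(table.items()) if row["name"] != excluded_name]


rows = select_where_name_not(users, "tom")
```

An index on `name` would not help much here, since nearly every entry matches a `!=` predicate, which is why a scan is the likely plan.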

------
dchest
See also:

Previous discussion - <http://news.ycombinator.com/item?id=2526032>

Benchmarks vs Kyoto TreeDB and SQLite3 -
<http://leveldb.googlecode.com/svn/trunk/doc/benchmark.html> (discussion -
<http://news.ycombinator.com/item?id=2813061>)

Benchmarks vs InnoDB - <http://blog.basho.com/2011/07/01/Leveling-the-Field/>

------
gaborcselle
Hi there! I'm a YC alum (reMail W09) and helped Jeff and Sanjay with LevelDB.
Let me know if you have any questions about LevelDB and I'll see if I can
help.

~~~
aashay
How does this compare to other persistent key-value stores such as Membase?

~~~
dlsspy
Membase is a clustered data storage service your application uses.

LevelDB is a persistence library.

That makes LevelDB the kind of thing you plug into membase to get the unique
properties it has to offer (or at least for fun).

~~~
pjscott
In fact, the Riak guys are planning on doing exactly that: offering LevelDB as
one of the storage back-ends, perhaps even the default.

<http://blog.basho.com/2011/07/01/Leveling-the-Field/>

------
stonemetal
Sorry if this is a bit off topic, but it seems to me that most of Google's
open source projects are more source-available than open source. Do they
actually take contributions from the community, or are they all like Android,
with source made available when it's "ready for public consumption"?

LevelDB sounds like something I would like to contribute to, but if the
reception is going to be chilly I won't bother; maybe I'll pick up MongoDB or
Redis instead.

------
gleb
The synchronous writes benchmark is interesting. This is normally bound by the
number of seeks your disk can do per second, which is mostly a function of
rotational speed. With a 7200 RPM drive you get 7200/60 = 120 of these a
second. So the 100 and 110 numbers for competitors make sense. 2,400 for
LevelDB does not.

Is LevelDB batching writes or is there something more interesting going on?
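
The arithmetic above can be written out as a small sketch. It assumes the simple model from this comment (each synchronous write waits for one full platter rotation); the function names are made up, and the result is only a ratio, not a claim about what LevelDB does internally:

```python
def max_sync_writes_per_second(rpm: float) -> float:
    """Upper bound on writes/sec if every write waits for one full rotation."""
    return rpm / 60.0


def writes_per_rotation(observed_writes_per_sec: float, rpm: float) -> float:
    """How many writes would have to land per rotation to reach the observed rate."""
    return observed_writes_per_sec / max_sync_writes_per_second(rpm)


bound = max_sync_writes_per_second(7200)   # 7200 / 60
ratio = writes_per_rotation(2400, 7200)    # 2400 / 120
```

Under this model, 2,400 writes a second on a 7200 RPM drive works out to about 20 writes per rotation, which is why the number looks like batching or sequential appends rather than one seek per write.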

~~~
agazso
If you are writing sequentially, then you can write more than the number of
seeks.

And that is exactly what LevelDB is doing: writing a log (sequential), and
when the in-memory chunk is full, writing it to disk sorted (this is also
sequential).
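
A toy sketch of that write path, to make the two sequential steps concrete. This is not LevelDB's code: the class name, file names, and tab-separated record format are all assumptions for illustration, and it skips log truncation, recovery, reads, and compaction entirely.

```python
import os
import tempfile


class ToyLogStructuredStore:
    """Toy model: append-only log plus an in-memory table flushed as a sorted file."""

    def __init__(self, directory, memtable_limit=4):
        self.directory = directory
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.sorted_files = []
        self.log = open(os.path.join(directory, "write.log"), "a", encoding="utf-8")

    def put(self, key, value):
        # Step 1: sequential append to the log, forced to disk (the "synchronous" part).
        self.log.write(f"{key}\t{value}\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        # Step 2: update the in-memory chunk; flush it once it is full.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush_memtable()

    def _flush_memtable(self):
        # One sequential write of the whole chunk, in sorted key order.
        name = f"sorted-{len(self.sorted_files):04d}.txt"
        path = os.path.join(self.directory, name)
        with open(path, "w", encoding="utf-8") as out:
            for key in sorted(self.memtable):
                out.write(f"{key}\t{self.memtable[key]}\n")
        self.sorted_files.append(path)
        self.memtable.clear()

    def close(self):
        self.log.close()


def flushed_keys_demo():
    """Write three keys out of order and return the keys found in the flushed file."""
    with tempfile.TemporaryDirectory() as directory:
        store = ToyLogStructuredStore(directory, memtable_limit=3)
        for key in ["c", "a", "b"]:
            store.put(key, key.upper())
        store.close()
        with open(store.sorted_files[0], encoding="utf-8") as flushed:
            return [line.split("\t")[0] for line in flushed]
```

Keys arrive in arbitrary order, but both disk writes are appends: one record at a time to the log, then the whole chunk at once in sorted order.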

~~~
leif
Flushing the log in an LSM is only kinda sequential, sadly.

------
jchrisa
Bindings for Node.js if anyone is interested:
<https://github.com/creationix/node-leveldb>

------
stephth
From the announcement: _it has already been ported to a variety of Unix based
systems, Mac OS X, Windows, and Android._

It's worth noting that the makefile includes options to build for iOS. I've
successfully done it and my next iOS app will include LevelDB. Also worth
noting: thanks to the iOS devices' SSDs, it's much faster than on traditional
HDD machines.

------
taylorbuley
_Upcoming versions of the Chrome browser include an implementation of the
IndexedDB HTML5 API that is built on top of LevelDB_

Really excited about seeing IndexedDB run atop this.

------
rektide
Open sourced as of March 18th, 2011:
[http://code.google.com/p/leveldb/source/browse/trunk/LICENSE...](http://code.google.com/p/leveldb/source/browse/trunk/LICENSE?spec=svn2&r=2)

That initial checkin: <http://code.google.com/p/leveldb/source/detail?r=2>

~~~
gaborcselle
Yes, we put up the Google Code site in incognito mode back then, but have
since added a number of bugfixes and optimizations, so we're actually
comfortable announcing the project now.

------
swah
Interesting how, like in the open-sourced protobuf, there are no commits by
Jeff or Sanjay...

~~~
shadowmatter
Jeff and Sanjay wrote the original protocol buffer implementation. The project
was taken over by Kenton Varda, who rewrote the C++ and Java parts; this is
what was open sourced. See <http://temporal.fateofio.org/files/resume>

~~~
swah
Which is what I pointed out as being interesting.

------
newhouseb
How is this different than BDB?

~~~
davidhollander
BDB is a key/value store for unordered data more similar to Tokyo Cabinet hash
databases. Tokyo Cabinet hash databases are a much faster option than BDB if
you only need unordered data.

LevelDB is for if you need ordered data, and a more appropriate comparison
would be against a B+ tree database.
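
To make the ordered/unordered distinction concrete, here is a minimal sketch (not any of these libraries' APIs; `OrderedIndex` and its methods are made-up names). A hash map answers point lookups, while keeping keys sorted additionally allows range and prefix scans:

```python
import bisect


class OrderedIndex:
    """Keeps keys sorted so that range scans are possible, unlike a plain hash map."""

    def __init__(self, items):
        self._items = dict(items)
        self._keys = sorted(self._items)

    def get(self, key):
        # Point lookup: a hash map alone is enough for this.
        return self._items.get(key)

    def range(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(key, self._items[key]) for key in self._keys[lo:hi]]


index = OrderedIndex({"user:2": "bob", "post:9": "hello", "user:1": "ann", "user:3": "cy"})
# All keys with the "user:" prefix: ';' is the character right after ':' in ASCII.
user_rows = index.range("user:", "user;")
```

With only a hash map, the prefix query would need a pass over every key; with sorted keys it is two binary searches and a contiguous slice.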

~~~
stephth
_LevelDB is for if you need ordered data_

LevelDB is slower with random reads, but that doesn't mean you shouldn't use
it for unordered data - it's still quite fast.

~~~
davidhollander
> _LevelDB is slower with random reads, but that doesn't mean you shouldn't
> use it for unordered data - it's still quite fast._

In a positive analysis (should rather than shouldn't), assuming no default
choice, it seems rational to use Tokyo Cabinet or CDB _hashmaps_ for unordered
data, and LevelDB for ordered data, from a datastructure and performance
standpoint. To assert more would probably need a specific use case for
context.

~~~
stephth
_In a positive analysis (should rather than shouldn't), assuming no default
choice, it seems rational to use Tokyo Cabinet or CDB _hashmaps_ for unordered
data, and LevelDB for ordered data, from a datastructure and performance
standpoint._

It's as rational as doing optimizations. If the specific performance is
extremely critical, yes, it definitely makes sense. But LevelDB does well with
random reads [1]. With all its features and its permissive license, I think
it's a strong contender as developer's go-to embedded key-value db, like
SQLite for relational data.

Don't get me wrong, I do think your comparison is valuable, but I'm afraid it
could be misleading; I especially found the wording _LevelDB is for if you
need ordered data_ misleading. Someone could read it and assume that LevelDB
doesn't do well with unordered data.

[1] Compare today's benchmarks with these
<http://fallabs.com/tokyocabinet/benchmark.pdf>, it looks like random reads in
LevelDB are quite a bit faster than BDB.

~~~
davidhollander
> _and its permissive license_

Other variables such as licensing are context dependent. In my context, I
generally use embedded databases in shared library form to store arbitrary
serialized Lua (much faster to reload than JSON, no need to compile protocol
buffers) on my own servers, so in my development context LGPL vs BSD is
irrelevant. But perhaps not for an iPhone developer. TokyoCabinet already has
excellent bindings for nearly every language I've used, but that's irrelevant
to a developer whose application is also written in C.

> _Compare today's benchmarks with
> these <http://fallabs.com/tokyocabinet/benchmark.pdf>, it looks like random
> reads in LevelDB are quite a bit faster than BDB._

I'd bet BDB is a snail, BUT the benchmark you just linked tests 0 databases in
common with the one released by Google. What can be observed is that using a
hashmap (TC) was over 7X faster than randomly accessing the fastest ordered
data structure by the same author (TC-BT-RND) :)

------
jcapote
It would be cool to make a LevelDB-backed fork of Redis.

~~~
stanleydrew
Pardon my ignorance but what's backing redis currently?

~~~
stephth
Redis. :) But more importantly, it runs as a server. I think what @jcapote
meant is being able to use Redis operations without a server, like LevelDB or
SQLite. I would love to see that. There's already some effort in that
direction [1], but using a Google-backed project instead of building a full
library from the ground up could be a saner approach.

[1] <https://github.com/seppo0010/redislite>

------
swah
I love the insight about how fast compression (Snappy) is like having faster
hard drives.
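
A back-of-the-envelope sketch of that idea. It uses `zlib` from the Python standard library purely as a stand-in (Snappy is not in the standard library and makes a different speed/ratio trade-off), and it assumes decompression is fast enough that the disk stays the bottleneck. The function names and the 100 MB/s figure are made up for illustration:

```python
import zlib


def compression_ratio(data: bytes) -> float:
    """Uncompressed size divided by compressed size (zlib as a stand-in codec)."""
    return len(data) / len(zlib.compress(data))


def effective_read_mb_per_sec(disk_mb_per_sec: float, ratio: float) -> float:
    """Logical MB/s delivered when every MB read from disk expands by `ratio`.

    Assumes the CPU decompresses faster than the disk can read.
    """
    return disk_mb_per_sec * ratio


sample = b"key-0001=value;" * 1000           # highly repetitive toy payload
ratio = compression_ratio(sample)
doubled = effective_read_mb_per_sec(100.0, 2.0)
```

A 2x ratio on a 100 MB/s disk behaves like a 200 MB/s disk for logical data, as long as the codec is cheap enough not to become the new bottleneck.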

------
timr
[http://www.hnsearch.com/search#request/all&q=leveldb](http://www.hnsearch.com/search#request/all&q=leveldb)

~~~
swah
Your point?

~~~
timr
That leveldb has been discussed several times on HN in the last two months. I
just didn't break out the links from the search UI.

Downvoters: links to previous context are generally considered a good thing
here.

~~~
swah
Oh, then it's easier to understand that in this query:
[http://www.hnsearch.com/search#request/all&q=leveldb&...](http://www.hnsearch.com/search#request/all&q=leveldb&sortby=create_ts+asc)

~~~
timr
Yeah, the search results when I linked them had those results. The index was
subsequently updated, so the top hits are now this thread, and this article.
Oops.

------
mumrah
Anyone know how LevelDB compares to Voldemort? From a cursory glance, they are
identical in their simple API (get, put, delete)

~~~
strlen
Voldemort developer here--

Voldemort to LevelDB is what MySQL is to InnoDB: Voldemort is a distributed
system that allows multiple engines to be plugged in. Most commonly,
companies use either BerkeleyDB or MySQL as a storage engine. LinkedIn,
Mendeley, eBay and others also use the read-only storage engine, where the
data is pre-built in Hadoop and loaded into Voldemort.

I am really excited about LevelDB: while there are higher priority projects on
my plate right now, we'd very much like to see a LevelDB storage engine. If
anyone is interested in contributing one, they're welcome.

The steps are:

1) Creating JNI bindings to LevelDB (or creating a .so version of LevelDB and
creating JNA bindings)

2) Implementing the StorageEngine interface with the bindings, including
passing in any configuration.
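
For readers unfamiliar with the pluggable-engine pattern behind step 2, here is a language-neutral sketch in Python. It is hypothetical throughout: `KeyValueEngine`, `InMemoryEngine`, and `Store` are made-up names and are NOT Voldemort's actual Java StorageEngine interface, which has its own signatures and configuration handling.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class KeyValueEngine(ABC):
    """Hypothetical engine contract; not Voldemort's real StorageEngine interface."""

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]:
        """Return the stored value, or None if the key is absent."""

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None:
        """Store or overwrite the value for key."""

    @abstractmethod
    def delete(self, key: bytes) -> None:
        """Remove key if present."""


class InMemoryEngine(KeyValueEngine):
    """Stand-in for a real engine, e.g. one wrapping native bindings."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)


class Store:
    """Front end that only talks to the engine contract, so engines can be swapped."""

    def __init__(self, engine: KeyValueEngine) -> None:
        self._engine = engine

    def get(self, key: bytes) -> Optional[bytes]:
        return self._engine.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        self._engine.put(key, value)

    def delete(self, key: bytes) -> None:
        self._engine.delete(key)
```

A LevelDB-backed engine would be another subclass whose methods call into the native bindings from step 1; the front end would not need to change.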

Here is an example of a third-party InnoDB/HailDB storage engine for
Voldemort:

<https://github.com/sunnygleason/v-storage-haildb>

------
skanuj
Note to self: search before you post. Apologies, I only checked the new and
front pages!

~~~
dchest
You were right to post it -- it's a new blog post announcing the non-beta
release and benchmarks.

------
newman314
How well does LevelDB work for a mobile device? This might be a nice use case.

------
overred
LSM-Tree is good!

------
swah
So that's what Jeff does!

------
trungonnews
How is this different from Membase?

~~~
eis
Uhm, shouldn't that be obvious from reading the high-level descriptions of
each? They are for completely different use cases. Membase is a distributed
key/value server and LevelDB is a key/value library.

~~~
thomas11
I see the difference between a server and a library, but both can often be
used for the same use case. Just recently I evaluated a few data stores for a
project and I didn't care all that much about the distinction. For the servers
you're gonna use an API for your programming language anyway, so the
programming model isn't that different.

