

Cdb: a fast, reliable, simple package for creating and reading constant databases - networked
http://cr.yp.to/cdb.html

======
davidu
We know that a number of very high-traffic, high-volume startups in Silicon
Valley have gone great distances with CDB... OpenDNS included. :-)

------
jedisct1
Unless you are just storing a couple entries that rarely change, you should
really consider [http://symas.com/mdb/](http://symas.com/mdb/) nowadays.

------
krenoten
CDB is awesome for its use case: slow-changing, read-heavy workloads that are
tolerant of stale data. One limitation of the original implementation is the
use of 32-bit offsets for addressing, which limits the addressable size to 4 GB.
There are 64-bit modifications, but I have not used them. Does anyone have an
opinion on any of the 64-bit implementations?
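
To make the limit concrete, here is a minimal sketch, assuming the layout in the cdb format description, where positions and lengths are stored as 4-byte little-endian unsigned integers. The helper names `pack_pointer` and `fits_in_cdb` are hypothetical and exist only for illustration.

```python
import struct

MAX_OFFSET = 2**32 - 1  # largest value a 32-bit unsigned field can hold


def pack_pointer(position, length):
    """Pack a (position, length) pair as two little-endian 32-bit unsigned ints."""
    return struct.pack("<II", position, length)


def fits_in_cdb(position):
    """Return True if a byte offset can be represented in a 32-bit field."""
    try:
        struct.pack("<I", position)
        return True
    except struct.error:
        # struct refuses values outside 0 .. 2**32 - 1
        return False
```

Any record that would start at or beyond byte 2**32 simply has no representable position, which is where the 4 GB ceiling comes from.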

~~~
joshu
Supposedly
[https://github.com/gstrauss/mcdb](https://github.com/gstrauss/mcdb) is quite
good.

~~~
jrossi
This. Glenn knows his stuff, and the testing used by mcdb is great.

------
asb
See also tinycdb, a public domain reimplementation
[http://www.corpit.ru/mjt/tinycdb.html](http://www.corpit.ru/mjt/tinycdb.html)

~~~
dne
The original code is also in the public domain, since 2009:
[http://cr.yp.to/distributors.html](http://cr.yp.to/distributors.html)

------
ctdean
CDB is one of my favorite data structures. When a student wants to learn about
databases, I get them to implement cdb.

It's easy to implement and really demonstrates some good system engineering
tradeoffs.

~~~
zura
Any links to theory behind cdb?

~~~
twic
I'd be particularly interested to know why there are 256 hashtables, rather
than one, or some other number. I don't think I've seen this hybrid
trie-hashtable before.

I wonder if there's any mileage in using a perfect hash function to build a
database like this. It seems suited to the operating model of being slow to
build but fast to access.

~~~
geogriffin
I second that; if anyone knows why there are 256 hashtables rather than just
1, please speak up. My only guess is that it might be a way to prevent 32-bit
int overflow in the C code.
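
For reference, the lookup described in the cdb format notes can be sketched in a few lines. This does not explain why 256 was chosen; it only shows how the low 8 bits of the hash select one of the 256 tables and the remaining bits select a starting slot for linear probing. The names `cdb_hash` and `locate` are hypothetical, not taken from the cdb source.

```python
def cdb_hash(key: bytes) -> int:
    """cdb hash: start at 5381, then h = ((h << 5) + h) ^ c, kept to 32 bits."""
    h = 5381
    for c in key:
        h = (((h << 5) + h) ^ c) & 0xFFFFFFFF
    return h


def locate(key: bytes, slots_in_table: int):
    """Return (table number, starting slot) for a key.

    slots_in_table is the length of the selected table, which a real
    reader would take from the 256-entry header at the start of the file.
    """
    h = cdb_hash(key)
    table = h & 0xFF  # hash modulo 256 picks the table
    if slots_in_table == 0:
        return table, None  # empty table: the key cannot be present
    slot = (h >> 8) % slots_in_table  # hash / 256, modulo table length
    return table, slot
```

A reader would then probe that slot, the next one, and so on, until it finds the key or hits an empty slot.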

------
derefr
I'm put in mind of all the things that take up the most space on a minimal
Ubuntu installation: charmaps, mime types, tzdata, geoip, etc. Seems like all
of these could do with being packed into a tight, read-only K/V store.

------
ludamad
What advantages does this have over SQLite?

~~~
zrail
It's _extremely_ fast and it's easy to work with for reading, but it's just a
key-value store. Also, it's sort of weird to work with on the write side. You
have to do atomic replaces of the read-only data files.

~~~
jcrites
Regarding writes, most likely the expectation is that you'll update your
entire data set periodically in a batch job, or export and cache another
authoritative data store. I understood the atomic replace feature as referring
to a convenient way to flip to a new version of the database after its
production & distribution by a batch job. It sounds like the "cdbmake" tool is
designed to facilitate this.
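
The flip itself is the usual write-then-rename pattern. A minimal sketch, assuming a POSIX-style filesystem where a rename within one directory is atomic; `publish_atomically` and its `build` callback are hypothetical names, and this is not what cdbmake does internally, only the same idea:

```python
import os
import tempfile


def publish_atomically(build, final_path):
    """Write a new data file next to final_path, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(final_path))
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            build(f)              # caller writes the complete new data set
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_path, final_path)  # readers see the old file or the new one
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Readers that already have the old file open keep reading the old version until they reopen it, which is what makes the stale-data tolerance mentioned elsewhere in the thread a requirement.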

A lot of data doesn't change very often; and a lot of data can tolerate
propagation delays on change. When you have an extremely high read volume
against such data sets, it can be economical to cache the entire data set and
distribute it periodically to the fleets of machines that require access, as
opposed to servicing individual reads over a network. This provides lower cost
and latency, while supporting a higher volume of reads and higher availability,
at the expense of engineering cost and propagation delay.

------
PuercoPop
If you're interested, Common Lisp's Quicklisp implements and uses cdb:
[https://github.com/quicklisp/quicklisp-client/blob/master/qu...](https://github.com/quicklisp/quicklisp-client/blob/master/quicklisp/cdb.lisp)

------
sophacles
How does this compare to LMDB
([http://symas.com/mdb/](http://symas.com/mdb/))?

------
wcummings
Weird side note: when using CDBs from Perl, do not use tie. It's painfully
slow (I realized this the hard way).

~~~
tantalor
Indeed the documentation notes this.

[https://metacpan.org/pod/CDB_File#PERFORMANCE](https://metacpan.org/pod/CDB_File#PERFORMANCE)

~~~
wcummings
I was handed code that used tie, took me a while to realize what was killing
my perf..

------
visarga
I'd like to see a more up-to-date CDB with the 4 GB limit removed. Writing
could be implemented by using an additional smaller CDB or text file. It would
be an interesting problem to find at what size a flat text file should be
converted into CDB to maximize speed on the whole.
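
A rough sketch of that layering idea, with plain dicts standing in for both the large read-only CDB and the small write log. `LayeredStore` and everything in it are hypothetical; a real version would read the base from a cdb file and rebuild it on compaction.

```python
_DELETED = object()  # sentinel marking a key removed since the last rebuild


class LayeredStore:
    def __init__(self, base, delta=None):
        self.base = base  # stand-in for the big constant database
        self.delta = {} if delta is None else delta  # small set of recent writes

    def get(self, key, default=None):
        # Recent writes shadow the base, so check the delta first.
        if key in self.delta:
            value = self.delta[key]
            return default if value is _DELETED else value
        return self.base.get(key, default)

    def put(self, key, value):
        self.delta[key] = value

    def delete(self, key):
        self.delta[key] = _DELETED

    def compact(self):
        """Fold the delta into a fresh base, as a periodic rebuild would."""
        new_base = dict(self.base)
        for key, value in self.delta.items():
            if value is _DELETED:
                new_base.pop(key, None)
            else:
                new_base[key] = value
        self.base = new_base
        self.delta = {}
```

The open question in the comment is when to call `compact()`: every lookup pays for the delta check, so the threshold depends on how the delta is stored and how expensive a rebuild is.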

------
xvilka
There is also sdb, a simple string-based key-value database with support for
arrays and JSON, based on CDB:
[https://github.com/radare/sdb](https://github.com/radare/sdb)

------
malkia
Can anyone tell me what prevents it from running on Windows?

~~~
malkia
Found out why (or maybe there is more) - it's using mmap() which is not
directly available on Windows (but there is MapViewOfFile and
CreateFileMapping).
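
For comparison, a binding that goes through a portable mapping layer avoids the problem. A minimal sketch in Python, whose `mmap` module works on both Unix and Windows; it assumes the cdb layout of a 2048-byte header holding 256 (position, length) pairs, and `read_table_pointers` is a hypothetical helper name:

```python
import mmap
import struct


def read_table_pointers(path):
    """Memory-map a file and decode its first 2048 bytes as 256 (position, length) pairs."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ is accepted on Unix and Windows.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            if len(m) < 2048:
                raise ValueError("file too small to contain a cdb header")
            return list(struct.iter_unpack("<II", m[:2048]))
```

This only reads the header; it is a sketch of the mapping step, not a cdb reader.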

------
krajzeg
_No random limits: cdb can handle any database up to 4 gigabytes._

Am I the only one who finds it hilarious that the first thing after the "no
random limits" heading is a random limitation?

~~~
self
When you have 32-bit pointers, 2^32 isn't a random limit.

------
cordite
That's cool. But I feel like I'll totally forget about it and lose reference
to this (in case I have future interest).

Where's a github mirror?

A Google search turns up some implementations in other languages.

Go: [https://github.com/jbarham/go-cdb](https://github.com/jbarham/go-cdb)
Java: [https://github.com/malyn/sg-cdb](https://github.com/malyn/sg-cdb)
Haskell: [https://github.com/adamsmasher/hs-cdb](https://github.com/adamsmasher/hs-cdb)

~~~
dsl
Most open source development takes place outside of github. It is a very
valley-centric thing.

You'll find a ton of life changing stuff on SourceForge and random FTP sites.

~~~
icebraining
Notice cordite said _a_ Github mirror, not _the_ Github mirror. You don't need
to have your main development happen on Github to have a mirror there, just
someone mirroring from the main server.

See, for example, the Linux kernel.

