Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

djb's cdb is an excellent use case for this, as updates are infrequent and stale data is tolerable. It is an extremely fast constant database that when hot will sit snugly in page cache.


We used sparkey for hammerspace, which is very similar to cdb. cdb was actually the first thing we evaluated, and it validated parts of our approach.


Even if your data doesn't all fit in page cache with cdb, the 8 bytes per entry (hashcode & offset) of overhead in the main hashtable absolutely will, so even if you're too big for memory, you get single-seek-per-lookup.

The only problem with it is the 32bit offsets mean a 4GB max. Not too hard to fork it for 64bit offsets, though.


We ended up using sparkey as the first backend for hammerspace, but hammerspace was written to support multiple backends. We benchmarked both cdb and sparkey, and their performance was very similar for our use case. At under 100mb of data, we weren't concerned about the 4G limitation or about the data not fitting in cache. I don't think sparkey has the 4G limitation, and someone has already forked cdb to support 64 bit offsets: https://github.com/pcarrier/cdb64




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: