Hacker News new | past | comments | ask | show | jobs | submit login

Hey Ben, what (refs/books/lectures) would you recommend for someone who is interested to learn theory behind these kind of DBs and implement it from scratch?



I originally started Bolt to learn about the implementation details of low-level data stores. It started as a pet project but became more serious as I filled it out and added additional layers of testing.

I'd honestly suggest reading through the source code of some database projects. That's the best way to learn the structures and understand the engineering tradeoffs. Some are more approachable than others though. Bolt and LMDB are 2KLOC and 8KLOC, respectively. LevelDB is 20KLOC and RocksDB is 100KLOC, IIRC.

If you want to go further then I'd suggest porting one of those over to a language you know well. It doesn't have to be code you release to the world or use in production but just porting the code really helps to cement the concepts in my head.


To get the theory, you could just start on Wikipedia. http://en.wikipedia.org/wiki/B+_tree

To understand why the LMDB implementation does what it does will require much more practical knowledge, not theory. The references in my MDB papers (and the paper itself of course) will give you more of that. http://symas.com/mdb/#pubs

B+trees are trivially simple, in theory. Writing one that is actually efficient, safe, and supports high concurrency requires more than theory. It requires understanding of computer system architecture: CPU architecture/cache hierarchies, filesystem and I/O architecture, memory, etc.

It does not require 100KLOC though. RocksDB is just a pile of gratuitous complexity.


Thanks for the good points! Bolt and LMDB seem approachable, and I'm especially interested how ACID compliance is implemented and how to check/prove the correctness of it...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: