
Ask HN: Which books should I read to become a storage engine designer? - majidazimi
I want to focus on the database storage layer: different data structures, concurrency techniques, file layouts, ... I really need some concrete examples on these topics. Many books say to use a B-tree for indexing, but how do you store it? How do you increase concurrency? How do you merge the memory buffer with on-disk files while the DBMS is running? How do you do crash recovery?
======
sprocketonline
Not a book, but Ayende's blog series on Database Building 101 is a great
starting point covering a lot of the issues and opinions from an active storage
engine designer. [https://ayende.com/blog/posts/series/175041/database-
buildin...](https://ayende.com/blog/posts/series/175041/database-building-101)

In another series he also covers Voron, his alternative to Microsoft's Esent
engine. [https://ayende.com/blog/posts/series/175074/voron-
internals](https://ayende.com/blog/posts/series/175074/voron-internals)

------
brudgers
Not really a direct answer, but perhaps reading source code for existing
storage engines might be another direction from which to approach the goal.
One advantage is that the technology in an active popular project is by
definition current. Another is that it provides an opportunity to become
involved in a community and ask questions and potentially apply knowledge, for
example by fixing bugs.

Good luck.

~~~
majidazimi
My knowledge is at a very basic level. Current projects are extremely modern and
exploit very sophisticated techniques. I'm in the "Hello, World" step. I need
some basic text or papers to get some ideas.

~~~
brudgers
Oh, I think I misunderstood. My go-to introduction is _Database Systems: The
Complete Book_ by Hector Garcia-Molina, Jeff Ullman, and Jennifer Widom. The
first edition can often be picked up online in used but reasonable condition for
very little money.

On the other hand, another way to approach storage is from the bottom up using
file systems. My understanding is that this (files on disk) is what Hacker
News has traditionally used. There are 'state of the art' systems that use
this approach, notably Hadoop and the Hadoop Distributed File System [HDFS].
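
To make the files-on-disk idea concrete, here is a minimal Python sketch, assuming a hypothetical one-file-per-key layout. `FileStore`, `put`, and `get` are illustrative names only; this is not how Hacker News or HDFS is actually implemented. It shows one common trick: write to a temporary file, sync it, then rename it over the target so a crash leaves either the old or the new value.

```python
import os
import tempfile


class FileStore:
    """Toy key-value store: one file per key in a directory (hypothetical layout)."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        # Assumption: keys are already safe file names.
        # A real store would escape or hash them.
        return os.path.join(self.root, key)

    def put(self, key, value):
        # Write the bytes to a temp file in the same directory,
        # force them to disk, then rename over the target.
        fd, tmp = tempfile.mkstemp(dir=self.root)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(value)
                f.flush()
                os.fsync(f.fileno())
            # os.replace swaps the file in atomically on POSIX systems.
            # This sketch does not fsync the directory itself.
            os.replace(tmp, self._path(key))
        except BaseException:
            if os.path.exists(tmp):
                os.remove(tmp)
            raise

    def get(self, key):
        # Return the stored bytes, or None if the key was never written.
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = FileStore(d)
        store.put("greeting", b"hello")
        print(store.get("greeting"))
```

There is no index and no concurrency control here; the file system does the lookup, which is the point of the bottom-up approach.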

That brings me to my last point, which is 'skating to where the puck is going'.
By that I mean that much of what matters in data storage today is more
related to distributed systems and less related to how data is stored on disk,
because storage of files on disk is best treated as a solved problem in almost
every case: partly because getting block level storage correct is hard, and
partly because storage is cheap and fast relative to engineering for high
efficiency.

Lastly, my direct advice is that the best 'Hello World' for running
Database Management Systems [DBMS] is to install and run several DBMS's. On
Linux, it is highly practical to run production quality systems on a
laptop, even for a hobbyist. And that reminds me that running Linux is probably
a valuable first step if it is not your current operating system, because Linux
is the most common operating system for running DBMS's in production and
competency in 'the next layer down' is going to be useful.

~~~
majidazimi
I think I didn't explain my problem well. What I really need is a book (or a
set of papers) that describes and compares, for example, LSM trees, ISAM, and
other techniques (many of which I'm not aware of). I'm not a beginner as a
developer in the database world. I want to dig into it and understand how it
actually works (this is where my knowledge is basic).

~~~
brudgers
Ok, then I'd start with a bibliographic search in _The Red Book_, though for
something like ISAM, previous editions might also be relevant.

[http://www.redbook.io/](http://www.redbook.io/)

