
Let’s Build a Simple Database (2017) - AlexeyBrin
https://cstack.github.io/db_tutorial/
======
ccleve
I have spent quite a lot of time building a database in Java. It's not a small
undertaking.

I've gone through as many books and papers as I can, and there is only one,
_one_, that gives an overview and actual examples of building a system that
processes SQL. Everything else is theoretical or so general that you don't get
enough direction to get started.

It's "Database Design and Implementation" by Sciore. It's a little dated, but
not by much. It is an underappreciated resource.

~~~
fgonzag
You could have read the source to the processor of one of the many open source
SQL databases though (off the top of my head: MySQL, PostgreSQL, SQLite,
Derby, H2, HSQLDB). Many of them have very clean code; SQLite would be my
choice if you know C, otherwise one of the Java databases (though I've never
actually read their source).

~~~
afarrell
For some, reading a body of source code without any guidance about what path
to walk or where to focus your attention is really quite a hard way to learn.

~~~
fgonzag
It is an essential skill for our job, though: being able to dive into a new
code base, form a mental model of it, and modify it, whether for new feature
requirements or simply bug fixing.

And it is a skill. I also had a lot of trouble with it initially, but diving
into several open source projects was the perfect way to practice it, and it
helped me tremendously in my professional development.

~~~
PretzelFisch
Reading the source doesn't tell you why they chose that route or what the
tradeoffs are.

~~~
ldng
Well, depends on the quality of the documentation I guess.

------
ignoramous
Folks interested in this sort of thing might like Architecture of a Database
System by Stonebraker et al, which I think is the most concise treatment on
the subject:
[http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf](http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf)

Also, see the Red Book (Readings in Database Systems) for the latest ensemble
of research ideas in databases. [http://www.redbook.io](http://www.redbook.io)

~~~
alexchamberlain
The Red Book looks like an awesome resource. I think it's a real shame that
finding resources at the intermediate level of database system design is so
hard; it feels very much like the "how to draw an owl" meme.

~~~
mathnmusic
> it's a real shame that finding resources at the intermediate level of
> database system design is so hard

This is a common problem across subjects. Learning resources need to be
curated topic-wise, format-wise, & difficulty-wise. Some of us have started
doing this over here: [https://github.com/learn-awesome/learn-
awesome](https://github.com/learn-awesome/learn-awesome)

Join in and help curate the database systems topic? [https://github.com/learn-
awesome/learn-awesome/blob/master/d...](https://github.com/learn-
awesome/learn-awesome/blob/master/databases.md)

------
tyingq
Another exercise that's fun is playing with Sqlite virtual tables and/or the
Sqlite VFS layer.

Virtual tables:
[https://www.sqlite.org/draft/vtab.html](https://www.sqlite.org/draft/vtab.html)

A simple example of using VFS to add a useful feature:
[https://github.com/Metalnem/sqlite-
vfsdemo](https://github.com/Metalnem/sqlite-vfsdemo)

~~~
titanix2
Thanks a lot. I planned to implement virtual SQL tables for a project and
wasn't aware of existing products. This will probably save me a lot of time.

~~~
mappu
You might also be interested in [https://github.com/src-d/go-mysql-
server](https://github.com/src-d/go-mysql-server) , a Go library for putting a
MySQL-compatible frontend over a custom virtual data source.

~~~
titanix2
I know nothing about Go but the project is indeed interesting. Thanks.

------
lmilcin
Writing a database is a good exercise I could recommend to everybody.

Writing a general purpose database is very difficult. On the other hand, if
you know what kind of load you can expect and when you have an idea of
performance and consistency requirements it may become much simpler.

I rolled my own transactional database for a commercial project. This was a
decade ago; the device had at most 1MB of memory budget for compiled code,
stack, heap, and database storage. The database was to be stored on
non-wear-leveled flash, requiring precise control over which bytes were erased
and how frequently. We required the ability to store small objects (tens to a
few hundred bytes at most) of varying size, and to record changes to multiple
objects at the same time, transactionally (think modifying two accounts). The
requirement was that the database stay consistent no matter what happened,
regardless of any application error or power cycles. These devices were
operated by hundreds of thousands of customers and power cycled liberally
whenever anything didn't work.

The database ran in constant memory. It allowed the application to register
hooks to calculate deltas between two versions of a record and to apply a
delta to a base version to produce the result version. The database didn't
care how the data was encoded; that was an application detail.

The database was stored as a transaction log. During operation, all writes
were directed to the end of the log. When the log reached half of the
available memory, all live records would be coalesced and written to the other
half of the space, and the original space would be erased.

The read algorithmic complexity was horrible, with some algorithms being
cubic(!). This was absolutely OK, since there could never be enough records to
make it a problem. Having extremely simple algorithms allowed the entire code
to be very simple and compact. Everything fit in about 200 lines of code.

I really enjoyed the project.

~~~
fizwhiz
How long did it take to implement, and how large was the team?

~~~
lmilcin
I did this alone. The database itself was only about 200 lines of code but
took about three weeks of effort, which included writing the initial
implementation, refactoring it, writing performance tests, optimizing the
implementation for the intended load, and integrating it into the application.

The application was much larger, about 40k LOC and took about two and a half
years to complete, certify and deploy to production.

~~~
Doubl
A database in 200 lines of code is incredible. I'd love to see that, even
better if it was annotated.

~~~
lmilcin
As I said, there is nothing spectacular about this.

This notion that a database must be a difficult and complex thing, best left
to experts, just turns people off from exploring and learning.

When it doesn't need to be general purpose, has only one thread accessing it,
stores at most 1MB of data, has only KV access (so basically a persistent
hashmap), and doesn't have to be blazing fast, the resulting code can be very
simple.

------
chirau
I believe the simplest database is this:

    
    
        #!/bin/bash
        db_set () {
             echo "$1,$2" >> database
        }
        db_get () {
             grep "^$1," database | sed -e "s/^$1,//" | tail -n 1
        }
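A quick sanity check of those two functions, run in a throwaway directory (the
keys and values here are invented):

```shell
#!/bin/bash
db_set () {
     echo "$1,$2" >> database
}
db_get () {
     grep "^$1," database | sed -e "s/^$1,//" | tail -n 1
}

cd "$(mktemp -d)"     # keep the demo "database" file out of the way
db_set name alice
db_set name bob
db_get name           # last write wins: prints bob
```

Updates never overwrite anything; db_get's `tail -n 1` is what makes the most
recent append win.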

~~~
pmccarren
Let's optimize!

    
    
        #!/bin/bash
        db_set () {
             echo "$1,$2" >> database
        }
        db_get () {
             # read database line by line in reverse, stopping at first match
             tac database | grep -m 1 "^$1," | sed -e "s/^$1,//"
        }
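One more step in the same spirit: since every db_set appends, the file grows
forever. A compaction pass can rewrite the log keeping only the newest entry
per key. This db_compact is my own sketch, not from the thread, and it assumes
keys contain no commas:

```shell
#!/bin/bash
db_set () {
     echo "$1,$2" >> database
}
db_get () {
     tac database | grep -m 1 "^$1," | sed -e "s/^$1,//"
}
db_compact () {
     # tac + awk '!seen[key]++' keeps the first occurrence in reversed
     # order, i.e. the most recent write; the second tac restores order.
     tac database | awk -F, '!seen[$1]++' | tac > database.tmp \
         && mv database.tmp database
}

cd "$(mktemp -d)"
db_set user alice
db_set user bob
db_set color blue
db_compact            # log shrinks from 3 lines to 2
db_get user           # prints bob
```

Running db_compact periodically mirrors, in miniature, the coalescing pass
lmilcin describes above.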

------
stevoski
Awesome stuff.

Java programmers: if you are curious about how an SQL database works, take a
look at H2's source
([https://github.com/h2database/h2database](https://github.com/h2database/h2database)).
You can even try to fix an outstanding issue. It is illuminating - and much
easier than you would have thought.

~~~
thomascgalvin
For a while now, I've used H2 as the in-memory database for a bunch of
Selenium and Spring Boot integration tests. The primary advantages are 1.
speed and 2. simplicity.

Simplicity is the big thing here; I can configure Spring to launch an in-
memory H2 database and not have to worry about state, clearing and reseeding,
or any of that stuff. The integration tests can just run.

I've been so happy with it that I'm pushing a customer to use H2 in
production. I would normally use Postgres, but my customer doesn't really have
an IT staff, and an embedded database will be a lot easier for them to keep up
and running.

~~~
esfandia
Wouldn't your customer lose all their data if the app crashes?

~~~
geeio
H2 has a disk persistence mode.

Though I’d probably use SQLite for this use case

------
mschaef
Really glad to see this... this has often seemed to me to be a nice sort of
capstone project for an undergraduate degree program. (It covers real
implementation of everything from networking to data structures to language
parsing.)

------
MuffinFlavored
I can't tell if this is "finished" or not.

It'd be cool if somebody benchmarked this to show just how "fast" SQLite is
comparatively, even if they are by no means equal functionality wise.

~~~
bshipp
I tend to use postgresql in my daily work now but I will always carry a flame
for sqlite, my first DB experience. I still use it for rapidly parsing and
normalizing huge text files, as well as the backend for numerous one-off
webscrapers.

I contend there are few faster ACID-compliant database options for high-speed
input, without moving toward a complex multi-machine distributed solution.
Combined with a high-speed SSD or an in-memory database, that single write-
worker can often put any high-concurrency single-machine database to shame, as
long as some thought is put into the code.

There is something deeply satisfying about writing a short python script and
then watching a 5gb text file or a huge directory of log files get chewed up
and spit out into a tidy little perfectly normalized and portable database.
Bliss.
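A hedged sketch of that workflow using the sqlite3 CLI (file, table, and
column names are all invented; assumes sqlite3 is installed):

```shell
#!/bin/sh
cd "$(mktemp -d)"
# A stand-in for a "huge text file" of comma-separated records.
printf 'alice,42\nbob,7\n' > events.csv

# .mode csv + .import parses and bulk-loads the file; batching the
# inserts like this, rather than one transaction per row, is most
# of the speed trick.
sqlite3 events.db <<'EOF'
CREATE TABLE events(user TEXT, value INTEGER);
.mode csv
.import events.csv events
EOF

sqlite3 events.db 'SELECT COUNT(*) FROM events;'
```

The result is exactly the "tidy little portable database" described: one file
you can query, ship, or archive.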

Just between you and me, sometimes I download a hundred thousand files to
process just for fun. It's not right.

~~~
MuffinFlavored
Do you know how Elasticsearch/Splunk are all the rage these days? I feel like
those are super heavy. What are your thoughts on
[https://github.com/brandonros/log-indexer](https://github.com/brandonros/log-indexer)?

~~~
bshipp
As with anything, it depends on what you want to do. One of my SQLite
databases has a text column that's 500 million rows (it's an 80GB file). Even
with a standard COLLATE NOCASE b-tree index, searching out a word in the
middle of a tuple takes a ridiculously long time (although finding that word
at the beginning of the tuple is amazingly quick).

A solution like Elasticsearch, while indeed heavier, is designed to deal with
that kind of issue so, in my humble opinion, makes sense when you actually
need a broad search capability.

As for the solution in
[https://github.com/brandonros/log-indexer](https://github.com/brandonros/log-indexer),
it seems like a tidy little answer to the problem of having to parse log files
later. Not only does it provide a rapid way of looking up information when
you're trying to solve a problem, but SQLite also acts as a compression
archive of sorts.

I didn't look all that closely at the internals of the index.js script in that
repo, but if it's still storing any of the records as JSON you can still
access the internal data held inside the log record with SQLite's own JSON1
extension
([https://www.sqlite.org/json1.html](https://www.sqlite.org/json1.html)). That
would save a pile of time compared to a broad-based directory search for text
file contents. A few lines for a SQL query and instantly you'd have every
occurrence of that error since the program started logging. I'd think that's a
very useful tool!
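As a sketch of what that could look like (the table layout and JSON field
names here are invented, not taken from the repo; assumes a sqlite3 build with
the JSON1 functions, which modern builds include):

```shell
#!/bin/sh
cd "$(mktemp -d)"
sqlite3 logs.db <<'EOF'
CREATE TABLE logs(line TEXT);  -- one JSON log record per row
INSERT INTO logs VALUES ('{"level":"error","msg":"disk full"}');
INSERT INTO logs VALUES ('{"level":"info","msg":"started"}');

-- every error since the program started logging, in one query
SELECT json_extract(line, '$.msg')
  FROM logs
 WHERE json_extract(line, '$.level') = 'error';
EOF
```

The SELECT prints `disk full`: json_extract reaches inside the stored JSON
without any re-parsing on your side.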

~~~
MuffinFlavored
> Even with a standard COLLATE NOCASE b-tree index, searching out a word in
> the middle of a tuple takes a ridiculously long time (although finding that
> word at the beginning of the tuple is amazingly quick).

> A solution like Elasticsearch, while indeed heavier, is designed to deal
> with that kind of issue so, in my humble opinion, makes sense when you
> actually need a broad search capability.

Hmm... at a high level, how does Elasticsearch a) persist huge amounts of data
to disk and b) make searching 80GB (and way, way up) of logs fast?

~~~
bshipp
Elasticsearch uses Lucene at the core of its search capability.
[https://lucene.apache.org/core/](https://lucene.apache.org/core/)

Basically it examines each record you want to index and generates a set of
features and metadata to aid in rapidly finding what you're looking for. Some
of the features listed on that site are:

    
    
      ranked searching -- best results returned first
      many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
      fielded searching (e.g. title, author, contents)
      sorting by any field
      multiple-index searching with merged results
      allows simultaneous update and searching
      flexible faceting, highlighting, joins and result grouping
      fast, memory-efficient and typo-tolerant suggesters
      pluggable ranking models, including the Vector Space Model and Okapi BM25
      configurable storage engine (codecs)
    

Basically, a standard non-indexed database search (i.e. using LIKE) is fairly
stupid and generally defaults to a full scan of each row in the table. A
b-tree index
([https://en.wikipedia.org/wiki/B-tree](https://en.wikipedia.org/wiki/B-tree)),
for columns with only one word, dramatically reduces the scope of that search
and makes such searches almost instantaneous. However, it doesn't help with
multi-word text fields, and the database engine has to go back to the very
slow full-table scan and check each record individually.

Lucene notes the position of each word in a text string and has a number of
techniques to figure out which records have words similar to the ones you're
looking for, at relatively similar positions (i.e. close to each other, in the
same order, etc.), and narrows the search scope to those records. In short,
it's not searching the individual records; it's searching its own index of
what it already knows about the contents of each record.

EDIT: however, for a log search a b-tree would still be fine, because every
log entry generally would have a similar structure. For example, if you're
looking for a specific error message, that message is not going to
dramatically change from one moment to the next. So having that table/column
indexed with a b-tree would allow you to search for that specific error string
and quickly pull up all the results, regardless of size.

Just make sure you set up your SQL query to have a line like:

    
    
      WHERE column.errorvalue LIKE 'Generic Error Code 1%'
    

instead of:

    
    
      WHERE column.errorvalue LIKE '%Code 1%'
    

As soon as you put that first '%' sign in front of the first letter, SQLite
ignores any index you have for that table and does a full table scan, which is
very slow.
([https://www.sqlite.org/optoverview.html#the_like_optimizatio...](https://www.sqlite.org/optoverview.html#the_like_optimization))
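One way to verify this on your own schema is EXPLAIN QUERY PLAN. A sketch
(table and column names borrowed from the example above; assumes the sqlite3
CLI):

```shell
#!/bin/sh
sqlite3 :memory: <<'EOF'
CREATE TABLE logs(errorvalue TEXT);
-- LIKE is case-insensitive by default in SQLite, so the index needs
-- COLLATE NOCASE for the LIKE optimization to apply.
CREATE INDEX idx_err ON logs(errorvalue COLLATE NOCASE);
.print --- prefix pattern:
EXPLAIN QUERY PLAN SELECT * FROM logs WHERE errorvalue LIKE 'Generic Error Code 1%';
.print --- leading wildcard:
EXPLAIN QUERY PLAN SELECT * FROM logs WHERE errorvalue LIKE '%Code 1%';
EOF
```

The first plan should report a SEARCH using the index; the second falls back
to a SCAN, which is the full-table behavior described above.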

That said, if you think you're going to get to 80GB you might want to look at
an alternative to SQLite or, at the very least, version your databases by
month and then use the ATTACH DATABASE command if you need to mine historical
data (or even write a small script to search each database individually).
SQLite isn't really designed to spread tables across different disks, and it's
not fun to regularly have to back up 80GB files.

~~~
bshipp
@snowflakeonice

What a time to be alive! I've just started populating the sqlite virtual fts
table. I will report back with my findings!

~~~
mosselman
Any news? I am very interested.

~~~
bshipp
Okay it finally (30 seconds ago!) finished indexing.

The reason it ran out of disk space is that I included 3 columns to index on
(in this case: name, path, filename) and it ballooned my 66GB db to 185GB!

However, every single query afterward was instantaneous. Literally
milliseconds to pull 90K results from three full-text columns across 500
million rows. And the search words were anywhere in the column, not just the
beginning. Incredible. I'm simply blown away.

All I did was this single command: CREATE VIRTUAL TABLE fts USING fts5(hash
UNINDEXED, title, path, filename);

and then wrote a normal INSERT statement to populate it like I would a regular
table. It was so painless.

Just be aware that each column appears to drastically increase the size of the
DB.

I'm so excited! I have so many other databases to try this out on!

EDIT: now I'm going to move it off the SSD to a mechanical drive and see if it
still holds the same performance.
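For anyone who wants to reproduce this at toy scale, a sketch with the same
CREATE VIRTUAL TABLE shape (the rows here are invented; assumes a sqlite3
build with FTS5 compiled in, which most are):

```shell
#!/bin/sh
cd "$(mktemp -d)"
sqlite3 files.db <<'EOF'
-- one UNINDEXED id column plus three full-text columns, as above
CREATE VIRTUAL TABLE fts USING fts5(hash UNINDEXED, title, path, filename);
INSERT INTO fts VALUES ('h1', 'annual report', '/docs/2019', 'report.pdf');
INSERT INTO fts VALUES ('h2', 'holiday photos', '/pics/summer', 'beach.jpg');

-- MATCH finds the term anywhere in any indexed column
SELECT hash FROM fts WHERE fts MATCH 'report';
EOF
```

The MATCH query returns h1, hitting the word in both the title and the
filename; no LIKE, no leading-wildcard penalty.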

~~~
mosselman
Wow, thanks for posting your results. This all sounds very promising. I am
thinking about gathering a few pieces of data from custom sensors around the
house in order to determine what I can do to cut down on energy costs (well,
that is the excuse ;)), and instead of using Kibana and the like I'd rather
use a lightweight SQLite DB; your stats make me hopeful about using it for
this.

Also it is all just very interesting.

------
alecco
CMU Andy Pavlo's lectures on YouTube are great and also show the state of the
art. Start there.

~~~
zerr
Not sure; I've watched several lectures and I got the impression that they are
too theoretical. As I remember, they don't even implement a database system,
just some very specific part of an existing system.

~~~
charlysl
But can you really build a db system without knowing some theory first (at the
very least, key concepts and implementations), learning from past mistakes and
hits, and learning db history to know where and why things might be going?

~~~
zerr
Of course, I see your point. Nevertheless, there are theoretical and
practical/engineering approaches to education. The above-mentioned lectures
lacked the latter, at least that's my impression.

~~~
charlysl
But those are just the lectures. The course also includes several substantial
labs that consist of extending sqlite ([1], under Projects; starting code in
[2]). This course gives you both. I don't think you have a chance of doing
these labs without knowing some non-trivial theory first, never mind creating
a production-ready db that adds any value (and this includes algorithm and OS
theory/systems programming, and maybe distributed systems too, prereqs for any
db course, and also in practice).

My motto:

If you skip the theory, you will start sooner.

If you know the theory, you will finish sooner (or at all).

[1]
[https://15445.courses.cs.cmu.edu/fall2018/assignments.html](https://15445.courses.cs.cmu.edu/fall2018/assignments.html)

[2] [https://github.com/Nov11/15-445](https://github.com/Nov11/15-445)

~~~
zerr
But is the lab/programming part of the course available as lecture videos?
Such material is usually reserved as an on-site student privilege. On the
other hand, I think Sciore's book does a good job of intermingling theory and
practice, so the reader stays engaged the whole time.

~~~
charlysl
Fair enough, thank you for pointing this out, I agree that the Sciore book
might be a better learning approach to the lab side of things.

How different is this book from his class notes, titled "Database Management:
A Systems Approach Using Java"?

I wonder if his SimpleDB [1] is the same SimpleDB that is used in the MIT and
Berkeley db intro courses, that would be a great match.

Do you happen to know of any other books, not about databases, that do a
similar job for other complex systems? For instance, for OSs there is xv6 and
its companion book [2].

EDIT:

Just one observation: the first lab in the CMU course consists of implementing
a Buffer Pool Manager. This basically means bypassing the OS virtual memory
management, which is too general and far from optimal for a disk-storage db,
and implementing your own, in this case as a sqlite extension in C++ (the idea
seems to me sort of similar to implementing a bespoke, optimized malloc,
though not exactly, of course). Now, Sciore's SimpleDB is written in Java, so
I suppose this can't be done; it's a much simpler system, and it passes over
what is a key issue in a production-class system.

This is also relevant for one more reason: although this CMU intro course
doesn't really focus on memory caching issues (disk access trumps the cache),
Pavlo's follow-up course, Advanced DBs [3], is about in-memory db systems,
where the memory cache is very much a central issue. I doubt, again, that a db
written in Java can take this into account; I suspect you need a systems
programming language to handle it. So this might be a reason to favour the CMU
course. I am not sure, I am new to the game.

[1]
[http://www.cs.bc.edu/~sciore/papers/SIGCSE07.pdf](http://www.cs.bc.edu/~sciore/papers/SIGCSE07.pdf)

[2]
[https://pdos.csail.mit.edu/6.828/2018/xv6.html](https://pdos.csail.mit.edu/6.828/2018/xv6.html)

[3]
[https://15721.courses.cs.cmu.edu/spring2019/](https://15721.courses.cs.cmu.edu/spring2019/)

~~~
charlysl
Well, for what it's worth, I've just had a very quick look at SimpleDB, and it
does have a buffer manager after all (in simpledb.buffer). I am glad about
this; it is looking more and more interesting to me, and I am more and more
inclined to follow this approach. Also, the course notes I alluded to above
are looking like a good substitute for the book, because they also seem to
discuss the source code in great detail, as mentioned in other comments.

The MIT OCW SimpleDB seems different from Sciore's, though: simpler, but also
in Java. I wonder whether they adapted it or whether it is an entirely
different system.

Sciore's old db course might help figure out how much material to pack into a
term's worth of learning, but the notes aren't there; you will find them on
the internet if you know where to look :)
[https://web.archive.org/web/20081028011620/http://www.cs.bc....](https://web.archive.org/web/20081028011620/http://www.cs.bc.edu/~sciore/courses/cs357/)

------
eunos
Very interesting! However, I wonder whether it's possible to do it in OOP
languages such as Java.

~~~
bunderbunder
Why wouldn't it be?

------
mathgladiator
I love this kind of stuff, since I started my own database from a different
angle: namely, I code-generate everything and do all indexing in memory. It's
been fun, and rock solid for a single-host monolith web app.

------
gigatexal
This is so cool! I’ve been wanting to do just this!

------
kakaoscott
Hey thanks a lot for making an education program for a simpleton like myself.

------
parthdesai
does anyone know if there is a similar tutorial in Rust?

~~~
nkozyra
Of course [1], but this tutorial isn't about the language per se; you should
be able to translate it pretty trivially.

These sorts of tutorials are more about the inner workings of database
operation; I recall doing something like this in grad school, but you're
quickly reminded how much more complex databases are than you realize when all
you're doing is using them. And hell, I'm not even that good at using them.

[1] [http://nikhilism.com/post/2016/writing-simple-database-in-
ru...](http://nikhilism.com/post/2016/writing-simple-database-in-rust-part-1/)

~~~
parthdesai
Yeah, looks like translating it is a great way to learn both rust and how a db
actually works internally.

~~~
chalst
I'd say that if you want to learn Rust, then this is an excellent project. I'm
not convinced that if you want to learn how databases work, Rust is a better
language than C: it's more complex, in that the checker makes you think about
lots of non-db things, and simpler in a maybe non-helpful way, in that the
language provides lots of high-level facilities that it might be good to work
out explicitly.

~~~
sten
Could I trouble you to give me an example of the second one? Something rust
provides that might be better to learn the hard way?

~~~
chalst
I think that dynamically sized types make life easier in places where it might
be good to figure out how to manipulate the concrete data.

[https://doc.rust-lang.org/reference/dynamically-sized-
types....](https://doc.rust-lang.org/reference/dynamically-sized-types.html)

~~~
sten
Thanks

