Hacker News new | past | comments | ask | show | jobs | submit login
Let’s Build a Simple Database (2017) (cstack.github.io)
783 points by AlexeyBrin 50 days ago | hide | past | web | favorite | 88 comments



I have spent quite a lot of time building a database in Java. It's not a small undertaking.

I've gone through as many books and papers as I can, and there is only one, one, that gives an overview and actual examples of building a system that processes SQL. Everything else is theoretical or so general that you don't get enough direction to get started.

It's "Database Design and Implementation" by Sciore. It's a little dated, but not by much. It is an underappreciated resource.


You may find MIT's Database Systems course helpful, the labs consist of building several db components in Java (and the prof is Stonebraker, of Postgres fame):

https://ocw.mit.edu/courses/electrical-engineering-and-compu...

https://youtu.be/F3XGUPll6Qs

http://blog.marcua.net/post/117671929


Stonebraker is the Donald Knuth of databases.


You could have read the source to the processor of one of the many open source SQL databases though (off the top of my head: MySQL, PostgreSQL, SQLite, Derby, H2, HSQLDB). Many of them have very clean code, SQLite would be my choice if you know C otherwise one of the java databases (though I've never actually read their source)


For some, reading a body of source code without any guidence about what path to walk or where to focus your attention is really quite a hard way to learn.


SQLite is quite easy to read; MySQL on the other hand I find very annoying. I used to run a very large (millions of accounts) shared hosting service for which I did a lot of MySQL customizing to remove limitations or remove features that would compromise our systems if abused etc and it was a nightmare to work with that codebase. I had to and learned a lot, but I would not do that again if possible.


It is an essential skill for our job though, being able to dive into a new code base, form a mental model of it, and modify it either because of new feature requirements or simply bug fixing.

And it is a skill, I also had a lot of trouble with it initially, but diving into several open source projects was the perfect way to practice it, and it helped me tremendously in my professional development.


Reading the source doesn't tell you why they chose that route or what the tradeoffs are.


Well, depends on the quality of the documentation I guess.


I agree with you, in that it's an essential skill for our job. However, from my experience at least, I've found that going head first into the source code of something you don't really understand how it works, is not the best course of action.

For example, I would make sure to read about B+ trees first, before reading the implementation of indexes in SQLite or PostgreSQL.


Agreed. A good tip for this is to try to find the earliest version/commit of the project you are interested into.


Most code bases are business logic and not nearly as complex or formal as a db. DBs, OSs and compilers are systems that should be studied to be effective in the code.


The thing about it that I find hard is knowing how to judge the important bits to read and knowing where to direct my attention as I read. Any tips on learning that?

I find that without knowing where to direct my attention, it is really hard to avoid getting distracted.


This.

Here's an anecdote: one my friends worked for a startup that at the height of NoSQL hype (circa 2012) decided to build their product on Cassandra. The data they dealt with was highly relational with hard requirements on consistency. He told me working on a NoSQL database taught him a lot about relational databases, because he scoured the Postgres source code to learn how to implement consistent and durable commits across distributed/sharded nodes.


OK I'll say it... Your friend chose wrong. HORRIBLY wrong. I have a lot of experience with both Postgres and Cassandra and the fact that your friend chose Cassandra despite their data model being relational AND having high consistency requirements shows that they chose Cassandra for the wrong reason(s).

Before you come back saying something about scalability, I've run both databases at way above average scale so what you choose should come down to what you need, which is not what was done in this case (your friend)


I think his comment was mostly about the fact that you can learn a lot about relational databases by reading Postgresql's source. He did say at the height of NoSQL hype. To me, that choice of words gives subtle hints as to the historical context and a disclaimer about the pertinence of that decision. Thank you for your input though, it provides food for thought for the next person mulling over a similar decision.


It sounds like his friend didn't choose anything, his bosses did.


>Your friend chose wrong. HORRIBLY wrong.

Yeah, obviously. Also, he didn't choose anything. The tech stack was mandated on him.


Sometimes the best way to learn the value of something is to try really hard not to use it.


Agree! I've accidentally found this gem. It is really written from the actual hands-on engineering/programming perspective which covers all aspects, from the front-end to the actual storage implementation.


Sciore is a great teacher! I had him at BC 20 years ago for both database systems and object oriented programming. I didn't know he put out a few books, I'll have to check them out.


I've been looking for this type of book. Thanks!


Folks interested in this sort of thing might like Anatomy of a Database System by Stonebreaker et al, which I think is the most concise treatment on the subject: http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf

Also, see Redbook for the latest ensemble of research ideas in databases. http://www.redbook.io


The Redbook looks like an awesome resource. I think it's a real shame that find resources at the intermediate level of database system design is so hard; feels very much like the "how to draw an owl" meme


> it's a real shame that find resources at the intermediate level of database system design is so hard

This is a common problem across subjects. Learning resources need to be curated topic-wise, format-wise, & difficulty-wise. Some of us have started doing this over here: https://github.com/learn-awesome/learn-awesome

Join in and help curate the database systems topic? https://github.com/learn-awesome/learn-awesome/blob/master/d...


Also, Andy Pavlo's CMU Database Group courses on Youtube.


Writing a database is a good exercise I could recommend to everybody.

Writing a general purpose database is very difficult. On the other hand, if you know what kind of load you can expect and when you have an idea of performance and consistency requirements it may become much simpler.

I have rolled out my own transactional database for a commercial project. This was a decade ago, the device had at most 1MB of memory budget for compiled code, stack, heap, and database storage. The database was to be stored on a non-wear-leveled flash requiring precise control of which bytes and how frequently were erased. We required ability to store small objects (tens to few hundreds of bytes at most) of varying size and be able to record changes to multiple objects at the same time, transactionally (think modifying two accounts). The requirement was that the database stayed consistent no matter what happened, regardless of any application error or power cycles. These devices were operated by hundreds of thousands of customers and power cycled liberally whenever anything didn't work.

The database run in constant memory. It allowed the application to register hooks to calculate deltas between two versions of records and to apply delta to base version to produce result version. The database didn't care how the data was encoded, this was application detail.

The database was stored as transaction log. During operation, all writes were directed to the end of the log. When the log reached half of the available memory, all live records would be coalesced and written to the other half of the space and the original space would be erased.

The read algorithmic complexity was horrible with some algorithms being cubic(!) complexity. This was absolutely ok, since there could never be enough records that would make it a problem. Having extremely simple algorithms allowed for the entire code to be very simple and compact. Everything fit about 200 lines or so lines of code.

I really enjoyed the project.


How long did it take to implement, and how large was the team?


Not the GP, but I'll share my story, as well.

In early 2001, for the project I was working on then, we needed something like a SQLite-lite. (SQLite existed, but was less than a year old and was still way bigger than our needs.) One of the other engineers and I paired to build something that worked well enough to run with over the course of a weekend, and then polished and improved it along the way, until the project was abandoned a few months later, because of the dot-com bomb.


I did this alone. The database itself was only about 200 lines of code but took about three weeks of effort which included writing the initial implementation, refactoring it, writing performance test, optimizing the implementation for the intended load and integrating in the application.

The application was much larger, about 40k LOC and took about two and a half years to complete, certify and deploy to production.


A database in 200 lines of code is incredible. I'd love to see that, even better if it was annotated.


As I said, there is nothing spectacular about this.

This notion that a database must be a difficult and complex thing and best left for experts is just turning people off from exploring and learning.

When it doesn't need to be general purpose, has only one thread accessing it, stores at most 1MB of data, has only KV access (so basically persistent hashmap) and doesn't have to be blazing fast, the resulting code may be very simple.


Another exercise that's fun is playing with Sqlite virtual tables and/or the Sqlite VFS layer.

Virtual tables: https://www.sqlite.org/draft/vtab.html

A simple example of using VFS to add a useful feature: https://github.com/Metalnem/sqlite-vfsdemo



Thanks a lot. I planned to implement virtual SQL tables for a project and wasn’t aware of existing product. This probably will save me a lot of time.


Don't forget about the 'ATTACH DATABASE' option as well. It is great for linking up individual SQLite databases when you need to do a broad-based query across multiple sources.

https://www.sqlite.org/lang_attach.html

If you use a GUI front-end like SQLiteStudio (https://sqlitestudio.pl/), attaching databases is ridiculously easy to do. Just add the database name from the left column in front of the table or table.column label for the database you want to reference. Most SQLite GUI's will do the attach command for you internally.

I add the database name to my queries instinctively now in SQLiteStudio, so that no matter which DB is currently in focus/opened the query runs on the one I intended it to run on.


You might also be interested in https://github.com/src-d/go-mysql-server , a Go library for putting a MySQL-compatible frontend over a custom virtual data source.


I know nothing about Go but the project is indeed interesting. Thanks.


I believe the simplest database is this:

    #!/bin/bash
    db_set () {
         echo "$1,$2" >> database
    }
    db_get () {
         grep "^$1," database | sed -e "s/^$1,//" | tail -n 1
    }


Let's optimize!

    #!/bin/bash
    db_set () {
         echo "$1,$2" >> database
    }
    db_get () {
         # read database line by line in reverse, stopping at first match
         tac database | grep -m 1 "^$1," | sed -e "s/^$1,//"
    }


Awesome stuff.

Java programmers: if you are curious about how an SQL database works, take a look at H2's source (https://github.com/h2database/h2database). You can even try to fix an outstanding issue. It is illuminating - and much easier than you would have thought.


For a while now, I've used H2 as the in-memory database for a bunch of Selenium and Spring Boot integration tests. The primary advantages are 1. speed and 2. simplicity.

Simplicity is the big thing here; I can configure Spring to launch an in-memory H2 database and not have to worry about state, clearing and reseeding, or any of that stuff. The integration tests can just run.

I've been so happy with it that I'm pushing a customer to use H2 in production. I would normally use Postgres, but my customer doesn't really have an IT staff, and an embedded database will be a lot easier for them to keep up and running.


Yeah, H2 is amazingly good. I've used it for things that most Java developers insist it can't be used for. :)

Just understand that any use that even vaguely resembles production will usually get pushback from people who think it is only a dev-level in-memory-only toy that causes constant slowdowns. This will be true even if it is only used for a few hundred rows worth of data.


Wouldn't your customer lose all their data if the app crashes?


H2 has a disk persistence mode.

Though I’d probably use SQLite for this use case


Hey @stevoski can you recommend one of the very basic issues to look at for someone to ramp up on the codebase ? Thanks


Really glad to see this... this has often seemed to me to be a nice sort of capstone project for an undergraduate degree program. (Covers real implementation of everything from networking, datastructures, language parsing, etc.)


I can't tell if this is "finished" or not.

It'd be cool if somebody benchmarked this to show just how "fast" SQLite is comparatively, even if they are by no means equal functionality wise.


I tend to use postgresql in my daily work now but I will always carry a flame for sqlite, my first DB experience. I still use it for rapidly parsing and normalizing huge text files, as well as the backend for numerous one-off webscrapers.

I contend there are few faster ACID-compliant database options for high-speed input, without moving toward a complex multi-machine distributed solution. Combined with a high-speed SSD or an in-memory database, that single write-worker can often put any high-concurrency single-machine database to shame, as long as some thought is put into the code.

There is something deeply satisfying about writing a short python script and then watching a 5gb text file or a huge directory of log files get chewed up and spit out into a tidy little perfectly normalized and portable database. Bliss.

Just between you and me, sometimes I download a hundred thousand files to process just for fun. It's not right.


Do you know how Elastisearch/Splunk are all the rage these days? I feel like those are super heavy. What are your thoughts on https://github.com/brandonros/log-indexer?


As with anything, it depends on what you want to do. One of my SQLite databases has a text column that's 500 million rows (it's an 80GB file). Even with a standard COLLATE NOCASE b-tree index, searching out a word in the middle of a tuple takes a ridiculously long time (although finding that word at the beginning of the tuple is amazingly quick).

A solution like Elasticsearch, while indeed heavier, is designed to deal with that kind of issue so, in my humble opinion, makes sense when you actually need a broad search capability.

As for the solution in https://github.com/brandonros/log-indexer, it seems like a tidy little solution to the problem of having to parse log files later. Not only does it provide a rapid way of looking up information when you're trying to solve a problem, but SQLite also acts as a compression archive of sorts as well.

I didn't look all that closely at the internals of the index.js script in that repo, but if it's still storing any of the records as JSON you can still access the internal data held inside the log record with SQLite's own JSON1 extension (https://www.sqlite.org/json1.html). That would save a pile of time compared to a broad-based directory search for text file contents. A few lines for a SQL query and instantly you'd have every occurrence of that error since the program started logging. I'd think that's a very useful tool!


> Even with a standard COLLATE NOCASE b-tree index, searching out a word in the middle of a tuple takes a ridiculously long time (although finding that word at the beginning of the tuple is amazingly quick).

> A solution like Elastisearch, while indeed heavier, is designed to deal with that kind of issue so, in my humble opinion, makes sense when you actually need a broad search capability.

Hmm... at a high level, how do Elasisearch a) consist huge amounts of data to disk and b) making searching 80gb and way way up of logs fast?


ElasticSearch indexing is really quite slow (everything running over HTTP probably does not necessarily help, though you are using a somewhat botchy batch API anyway) -- think 5k items per second for something like e.g. HN posts --, it really is tuned for retrieval performance. Even on really quite mediocre hardware you can do full-text queries across ten(s) of GB and they'll rarely take longer than, say, 50 ms.


Elasticsearch uses Lucene at the core of it's search capability. https://lucene.apache.org/core/

Basically it examines each record you want to index and generates a set of features and metadata to aid in rapidly finding what you're looking for. Some of the features listed on that site are:

  ranked searching -- best results returned first
  many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
  fielded searching (e.g. title, author, contents)
  sorting by any field
  multiple-index searching with merged results
  allows simultaneous update and searching
  flexible faceting, highlighting, joins and result grouping
  fast, memory-efficient and typo-tolerant suggesters
  pluggable ranking models, including the Vector Space Model and Okapi BM25
  configurable storage engine (codecs)
Basically, a standard non-index database search (i.e. using LIKE) is fairly stupid and generally defaults to a full-text scan of each row in the table. A b-tree index (https://en.wikipedia.org/wiki/B-tree), for columns with only one word, dramatically reduces the scope of that search field and makes such searches almost instantaneous. However, it doesn't help with multi-word text fields and the database engine has to go back to the very slow full-table scan and check each record individually.

Lucene notes the position of each word in a text string and has a number of techniques to figure out which records have words similar to the ones you're looking for, at relatively similar positions (i.e. close to each other, in the same order, etc) and narrows the search scope to those records. In short, it's not searching the individual records, it's searching it's own database for what it already knows about the contents of that record.

EDIT: however, for a log search a b-tree would still be fine, because every log entry generally would have a similar structure. For example, if you're looking for a specific error message, that message is not going to dramatically change from one moment to the next. So having that table/column indexed with a b-tree would allow you to search for that specific error string and quickly pull up all the results, regardless of size.

Just make sure you set up your SQL query to have a line like:

  WHERE column.errorvalue LIKE 'Generic Error Code 1%'
instead of:

  WHERE column.errorvalue LIKE '%Code 1%'
As soon as you put that first '%' sign in front of the first letter SQLite ignores any index you have for that table and does a full table scan, which is very slow. (https://www.sqlite.org/optoverview.html#the_like_optimizatio...)

That said, if you think you're going to get to 80GB you might want to look at an alternatively solution to SQLite or, at the very least, version your databases by month and then use the ATTACH DATABASE command if you need to mine historical data (or even write a small script to search each database individually). SQLite isn't really designed to separate tables across different disks and it's not fun to regularly have to back up 80GB files.


Would any of the various full-text search plugins (e.g. https://www.sqlite.org/fts5.html) make searching your 80GB database reasonable?


@snowflakeonice

What a time to be alive! I've just started populating the sqlite virtual fts table. I will report back with my findings!


Any news. I am very interested


Okay it finally (30 seconds ago!) finished indexing.

The reason it ran out of disk space is that I included 3 columns to index on (in this case: name, path, filename) and it ballooned my 66GB db to 185GB!

However, every single query afterward was instantaneous. Literally milliseconds to pull 90K results from three full-text columns across 500 million rows. And the search words were anywhere in the column, not just the beginning. Incredible. I'm simply blown away.

All I did was this single command: CREATE VIRTUAL TABLE fts USING fts5(hash UNINDEXED, title, path, filename);

and then wrote a normal INSERT statement to populate it like I would a regular table. It was so painless.

Just be aware that each column appears to drastically increase the size of the DB.

I'm so excited! I have so many other databases to try this out on!

EDIT: now I'm going to move it off the SSD to a mechanical drive and see if it still holds the same performance.


Wow, thanks for posting your results. This all sounds very promising. I am thinking about gathering a few pieces of data from custom sensors around the house in order to determine what I can do to cut down on energy costs (well that is the excuse ;)) and instead of using kibana and the like I’d rather use a lightweight SQLite dB, your stats make me hopeful of using it for this.

Also it is all just very interesting.


Final Comment on this: after moving it to a slow mechanical drive the query speed dropped dramatically, as expected. What was almost instantaneous on the SSD took anywhere from 40 to 120 seconds on the slower drive. However, previously the dumb full table scan took anywhere from 120-240 seconds on the SSD and I never even bothered trying on the slow drive!


Hahaha, it's still running. I ended up running out of drive space on the SSD I started it on, which began a lengthy process of shuffling things around, cleaning indexes and VACUUMing two very large databases (60 and 80GB, which ended up running all night), and finally (as of this morning) restarting the process of populating the FTS table. I will respond when it's finally done, I promise!


The end of the last published chapter:

> The next step should be splitting internal nodes. Until then!


CMU Andy Pavlo’s lectures on YouTube are great and also shows state of the art. Start there.


Not sure, I've watched several lectures and I've got an impression that they are too theoretic. As I remember they even don't implement the database system, but some very specific part in some existing system.


But can you really build a db system without knowing some theory first (at the very least, key concepts and implementations), and learning from past mistakes and hits, and learning db history and know where and why things might be going?


Of course I see your point. Nevertheless, there are theoretic and practical/engineering approaches to education. The above mentioned lectures lacked the latter one, at least that's my impression.


But those are just the lectures. The course also includes several substantial labs that consist of extending sqlite ([1], under Projects; starting code in [2]). This course gives you both. I don't think you have a chance of doing these labs without knowing some non-trivial theory first, never mind creating a production ready db that adds any value (and this includes algorithm and OS theory/systems programming, and maybe distributed systems too, prereqs for any db course, and also in practice).

My motto:

If you skip the theory, you will start sooner.

If you know the theory, you will finish sooner (or at all).

[1] https://15445.courses.cs.cmu.edu/fall2018/assignments.html

[2] https://github.com/Nov11/15-445


But are the lab/programming part of the course available as lecture videos? Such stuff are usually left as onsite students privilege. On the other hand, I think Sciore's book does a good job of intermingling between theory and practice, thus the reader stays engaged all the time.


Fair enough, thank you for pointing this out, I agree that the Sciore book might be a better learning approach to the lab side of things.

How different is this book from his class notes, titled "Database Management: A Systems Approach Using Java"?

I wonder if his SimpleDB [1] is the same SimpleDB that is used in the MIT and Berkeley db intro courses, that would be a great match.

Do you happen to know of any other books, not about databases, that do a similar job for other complex systems? For instance, for OSs there is xv6 and its companion book [2].

EDIT:

Just one observation: the first lab in the CMU course consists of implementing a Buffer Pool Manager. This basically means bypassing the OS virtual memory management, which is too general and far from optimal for a disk storage db, and implementing you own, in this case a sqlite extension in C++ (the idea seems to me sort of similar to implementing a bespoke, optimized malloc, not exactly, of course). Now, Sciore's SimpleDB is written in Java, so I suppose that this can't be done, so it's a much simpler system, and passes over what is a key issue in a production class system.

This is also relevant for one more reason, which is that, although this CMU intro course doesn't really focus on memory caching issues (disk access trumps the cache), Pavlo's follow up course, Advanced DBs [3], is about in-memory db systems, and here the memory cache is very much a central issue, and I doubt, again, that a db written in Java can take this into account, I suspect that you need a systems programming language to handle this. So, this might be a reason to favour the CMU course, I am not sure, I am new to the game.

[1] http://www.cs.bc.edu/~sciore/papers/SIGCSE07.pdf

[2] https://pdos.csail.mit.edu/6.828/2018/xv6.html

[3] https://15721.courses.cs.cmu.edu/spring2019/


Well, for what it's worth, I've just had a very quick look at SimpleDB, and it does have a buffer manager after all (in simpledb.buffer). I am glad about this, it is looking more and more interesting to me, and I am more and more inclined to following this approach. Also, the course notes I alluded to above are looking like a good substitute for the book, because they also seem to discuss the source code in great detail, as mentioned in other comments.

The MIT OCW SimpleDB seems different from Sciore's though, simpler, but is also in Java, I wonder whether they have adapted it or whether it is entirely different system.

Sciore's old db course, might help figure out how much material to pack into a terms-worth of learning, but the notes aren't there, you will find then in the internet if you know where to look :) https://web.archive.org/web/20081028011620/http://www.cs.bc....


Yes, xv6 book is on my reading list :) As for other books - Appel's Modern Compiler Implementation in ML and Cooper/Torczon's Engineering a Compiler come to mind.


Very interesting! However I wonder whether it's possible to do it on OOP languages such as Java.


Why wouldn't it be?


It's already almost the case. Juste rename structure X in class X and make every X_xxx functions a xxx method of class X and you're almost set.



The good news is that an OOP language is a superset of imperative languages like C.

The more general answer is that all Turing-complete languages are functionally equivalent. The misery level varies, as does the applicability to any given project, but anything you can do in C you can also do in Java, Python, Lisp, Erlang, Forth, ad nauseum.


I love this kind of stuff since I started my own database from a different angle. Namely, I code generate everything do all indexing in memory. It's been fun, and rock solid for single host monolith web app.


This is so cool! I’ve been wanting to do just this!


Hey thanks a lot for making an education program for a simpleton like myself.


does anyone know if there is a similar tutorial in Rust?


Of course[1], but this tutorial isn't about the language, per se, you should be able to translate it pretty trivially.

These sorts of tutorials are more about the inner workings of database operation; I recall doing something like this in grad school but you're reminded quickly how more complex databases are than you think about when all you're doing is using them. And hell, I'm not even that good at using them.

[1] http://nikhilism.com/post/2016/writing-simple-database-in-ru...


Yeah, looks like translating it is a great way to learn both rust and how a db actually works internally.


I'd say that if you want to learn Rust, then this is an excellent project. I'm not convinced that if you want to learn how databases work that Rust is a better language than C: it's more complex in that the checker makes you think about lots of non-db things, and simpler in maybe a non-helpful way, in that the language provides lots of high-level facilities that it might be good to work out explicitly.


Could I trouble you to give me an example of the second one? Something rust provides that might be better to learn the hard way?


I think that dynamically sized types make life easier in places where it might be good to figure out how to manipulate the concrete data.

https://doc.rust-lang.org/reference/dynamically-sized-types....


Thanks


Considering that i'll have to learn C for this as well, i think going with Rust just looks more interesting.


And anything in Go?




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: