
Writing a SQL database from scratch in Go - v3gas
https://notes.eatonphil.com/database-basics.html
======
eatonphil
Unexpected, but welcome, to see the first part in the series on here right
now. I just published the second part [0] today, featuring binary expressions
and WHERE filtering. It also updates the REPL to use a prettier table-printing
library and a readline implementation.

The repo [1] has some additional bare notes on architecture and links to
similar, more mature projects (primarily go-mysql-server and ramsql).

[0] [https://notes.eatonphil.com/database-basics-expressions-
and-...](https://notes.eatonphil.com/database-basics-expressions-and-
where.html)

[1] [https://github.com/eatonphil/gosql](https://github.com/eatonphil/gosql)

~~~
bogomipz
This is great. Do you plan on continuing this blog series? If so I hope you
post it here. Cheers.

~~~
eatonphil
Definitely! I typically do post here. There are also links for various ways to
subscribe on the site itself (e.g. twitter, email).

------
cube2222
Great blog post!

I'd just like to add for the curious, that usually you'd use goyacc for
parsing SQL.

And most serious SQL projects in Go have started with the SQL parser from
vitess and adapted it to their use case (which is just funny trivia, but for
anything big, I recommend it, did the same for OctoSQL [0]).

[0]:
[https://github.com/cube2222/octosql](https://github.com/cube2222/octosql)

~~~
tbrock
Why did all of the Golang SQL parsers come from Vitess? I would love to know
more about this history.

Was it because they were the first and people just started using it or is it
the best for some reason?

~~~
eatonphil
SQL is a humongous spec. Any serious project would rather piggy-back off an
existing parser. Most of the interesting parts for most people is implementing
backends against in-memory, disk, S3, HDFS, etc.

~~~
eatonphil
Also, a contributor to dolthub shared on Reddit last time this post came up
that they originally wrote their own SQL frontend but gave up because it was
so much to maintain and get correct. They ended up going with go-mysql-server
which uses vitess.

[https://www.reddit.com/r/golang/comments/fgwwlx/database_bas...](https://www.reddit.com/r/golang/comments/fgwwlx/database_basics_writing_a_sql_database_from/fk84dq9/?context=3)

------
zoom6628
Probably the first article Ive ever read about lexical parsing with code that
i have actually understood - and I dont even program in golang. Great job.

~~~
throwlaplace
Lexical analysis using these bespoke methods (writing the finite state
machine) is so tedious and error prone. I don't have that much experience but
I just went through crafting interpreters and replaced this same module with
[https://github.com/J-F-Liu/pom](https://github.com/J-F-Liu/pom), which is a
parser combinator library, and it was way easier.

~~~
papaf
_which is a parser combinator library, and it was way easier_

I like parser combinators but a word of warning.

A lot of parser combinators (not all) have problems with recursive grammar
such as Json and SQL [1].

For instance, a JSON map can contain a JSON map and this can lead to stack
overflows when defining the grammar.

[1] [https://fsharpforfunandprofit.com/posts/understanding-
parser...](https://fsharpforfunandprofit.com/posts/understanding-parser-
combinators-4/#5-parsing-array)

~~~
throwlaplace
thanks for the link will take a look. like i said i don't have that much
experience.

------
alperakgun
Curious - would Rust be more appropriate than Go for such a task?

~~~
cultofmetatron
rust would be more appropriate if your intention was to use this in
production. Databases ad GC don't mix well. (its done but it makes tuning a
nightmare)

As much as I love rust, its learning curve is high and I'm sure Op doesn't
want to spend half his article teaching all the intricacies of types and the
borrow checker. Go is easy to learn over a weekend so its probably a better
medium for illustrating the concepts as everything is laid out simply.

~~~
matttproud
A garbage collector is not inherently incompatible with a low-latency
database. I was generally very happy with the performance of Go's garbage
collector enough to have built Prometheus, the time series database, on it
back in 2012, when the collector was considerably more naive.

[https://blog.golang.org/ismmkeynote](https://blog.golang.org/ismmkeynote)

------
userbinator
_selectKeyword keyword = "select"_

I don't work with Go, so this may be a requirement of the language that I
don't know, but whenever I see lines like this, it automatically brings up the
question _why?_ \--- do you really expect to need to rename the SELECT
keyword? Especially when it's named "selectKeyword". Why not just use the
string constant? Ditto for the others like "leftparenSymbol" \--- I see
there's explicit character constants in some of the other code too... it
reminds me of the classic anti-pattern like "int five = 5;".

Also, you may find the full SQL grammars interesting to look through --- they
are quite a bit more complex than the subset presented in the article:
[https://ronsavage.github.io/SQL/](https://ronsavage.github.io/SQL/)

~~~
ycnewsreader
Another reason is that by providing an identifier for this string literal,
misspellings of it can be detected by the compiler whereas there is nothing
tying separate string literals together.

~~~
userbinator
I can hardly imagine a case where you would misspell "select" and not notice
it at some point, nor use it in more than one place (the keyword detector) in
the parser.

~~~
Townley
The pattern (which I employ only sometimes) is to have almost all literals
defined in this way. Perhaps SELECT isn’t likely to be the word you misspell,
but I’m a careless typist and make 10 typos a minute. Taking this added step
helps your editor save you from yourself.

~~~
viraptor
Having standards like that and keeping them helps a lot. Next time you have a
different keyword, you don't have to think "does it deserve a constant?" \-
all of them do.

Similar to how linters stop you from overthinking indentation in specific
cases, or some naming standards, and later rename/reformat-wars.

~~~
userbinator
_all of them do._

That's what leads to dogmatic cargo-culting. Good software is written by
thinking about the circumstances and doing what makes the most sense, not by
mindless rule-following that don't always make sense.

I don't know why someone would be so worried about typos and introduce _more_
verbosity and redundancy in the process; but then again, I don't use an IDE
and I've never had this problem.

~~~
viraptor
I'm not sure you're really making the argument you think here...

Good software may mean thinking about circumstances like "we're dealing with
lots of text parsing and have seen bugs from typos in common keywords" and
doing what makes most sense: "let's prevent those in the future, but using
constants for keywords". It's only cargo culting if you don't know why you're
doing something.

Setting a rule so you don't have to debate something later is a valid solution
and may still offset some redundancies introduced this way.

------
rochak
Do you know about a tutorial or a book that helps write Database from scratch
either in Java or C?

~~~
v3gas
Check out this (in C):
[https://cstack.github.io/db_tutorial/](https://cstack.github.io/db_tutorial/)
also a list here [https://github.com/danistefanovic/build-your-
own-x](https://github.com/danistefanovic/build-your-own-x)

~~~
rochak
Thanks!

------
alexanderhorl
Are you going to add persistence in an upcoming part?

~~~
eatonphil
Probably! That's the most interesting and most difficult part so it may take
me a while.

