While I’m not in the target audience for this, I skimmed through the first chapter and this seems really cool - great job! I will definitely keep this one in mind if it ever becomes more relevant to me.
One small comment: the /about page only has the books listed and nothing else. Some people will probably be interested in knowing a thing or two about the author before getting the books (and there are probably a million people with the same name, so difficult to Google)
This is like wanting someone authoring a PR on GitHub to have their about section filled out. Maybe it's nice for the reviewer to have, but some authors prefer to let the content speak for itself. I think that's fair.
Well, one's a decently comprehensive book and the other's SaaS aimed at embezzling your employer's learning budget - wider lang support but lower quality.
We have so many database products out there. But if this book leads to more of them, I'm all for it. I think databases, while as old as IT itself, still have room to evolve, especially in the distributed realm, and especially in the multi-master configuration, where I do not see many products being offered compared to one-writer configurations.
The book suggests building a database on top of a KV store. This is precisely what I did with my project. I was initially building a product that I hoped would be able to replace existing file systems, and I needed an architecture that made it easy to create metadata tags for every file and then find every file that had certain tags very quickly. I implemented the tags using a set of novel key-value stores that I invented.
Once I had it working, I realized the KV stores I used for tags were just like columns in a relational table built within a columnar database. Querying for files based off their tags was very much just like SQL queries for table rows. So I tried using them to create relational tables.
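A toy sketch of the idea (illustrative only; the names and key layout here are invented, not the actual implementation): a "tag" column stored as a sorted KV store, where the key is `tag + separator + fileID` and finding every file with a tag is a prefix scan, much like an index scan over a column.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// tagIndex keeps keys sorted, standing in for an ordered KV store.
type tagIndex struct{ keys []string }

// put inserts the composite key "tag\x00fileID" in sorted position.
func (t *tagIndex) put(tag, fileID string) {
	k := tag + "\x00" + fileID
	i := sort.SearchStrings(t.keys, k)
	if i == len(t.keys) || t.keys[i] != k {
		t.keys = append(t.keys[:i], append([]string{k}, t.keys[i:]...)...)
	}
}

// filesWith is the "SELECT file FROM tags WHERE tag = ?" of this toy model:
// a prefix scan over the sorted keys.
func (t *tagIndex) filesWith(tag string) []string {
	prefix := tag + "\x00"
	var out []string
	i := sort.SearchStrings(t.keys, prefix)
	for ; i < len(t.keys) && strings.HasPrefix(t.keys[i], prefix); i++ {
		out = append(out, strings.TrimPrefix(t.keys[i], prefix))
	}
	return out
}

func main() {
	idx := &tagIndex{}
	idx.put("invoice", "a.pdf")
	idx.put("invoice", "b.pdf")
	idx.put("photo", "c.jpg")
	fmt.Println(idx.filesWith("invoice")) // [a.pdf b.pdf]
}
```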
They turned out to be incredibly fast at a variety of queries (the bread and butter of databases) without needing to create separate indexes in order to get optimal performance. I thought database experts would be intrigued when I showed how much faster my system was than other conventional RDBMS setups on the same hardware. I guess I was surprised when almost no one was even curious how it did it.
I think the problem is that every relational query can be implemented on a KV store, but usually the trade-offs and bugs in your nascent query engine are boring, and we can already use a bunch of hella fast KV stores out there if you don't care about ACID or are willing to give up the decades of implementation detail in your SQL engine of choice.
I believe the aim of the book is more to promote a basic understanding of how relational databases work internally, by way of implementing a simple one oneself, an understanding which is generally helpful when using databases, and not so much to cause new database products to be created.
As someone who lacks a formal CS education and wants to know more about how databases work, I have been eagerly awaiting this book. I also want some practical golang projects to work on so this is perfect! I'm so excited!
I agree it could have been mentioned. Section 0.2 of the short introduction page [0], however, provides the information:
> The book uses Golang for sample code, but the topics are language agnostic. Readers are advised to code their own version of a database rather than just read the text.
Built a college newspaper website back in '99. Got tired of maintaining it by hand. Discovered PHP when it was new. Wanted to build a content management system. SQL sounded hard, so I wrote my own database. Worked great for the years I was at the college.
Indeed. The Redis book was C/C++, so I was hopeful that this one would be as well. Given that database essentials are so closely tied to system calls, I would have hoped for a language that doesn't abstract them away. At least at the first fsync() call, the author does explicitly mention that `fp.Sync()` ends up invoking the `fsync` system call, but as someone who has no intention of returning to Golang, I'd rather not add complexity by having to build and maintain a mental map of Golang calls to syscalls (the worst kind of abstraction layer IMHO: leaky and unnecessary).
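For reference, the mapping is at least thin here. A minimal sketch of a durable append in Go (the file name is hypothetical); on Unix, `(*os.File).Sync()` is a direct wrapper over the fsync(2) syscall:

```go
package main

import (
	"log"
	"os"
)

func main() {
	// Append a record, then explicitly flush it to stable storage.
	fp, err := os.OpenFile("data.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		log.Fatal(err)
	}
	defer fp.Close()

	if _, err := fp.Write([]byte("record\n")); err != nil {
		log.Fatal(err)
	}
	// fp.Sync() invokes fsync(2) on Unix: the write is not durable
	// until this call returns successfully.
	if err := fp.Sync(); err != nil {
		log.Fatal(err)
	}
}
```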
Never understood the issue with Go error handling. As soon as you drop exceptions as a valid error-forwarding mechanism (which seems like a totally acceptable design decision), you end up having to manually check every call you make that can raise an error, on each line.
I don't really see any alternative. It also makes you carefully think about how you plan on managing errors in your codebase, which also seems like a very sane thing to enforce.
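Concretely, the pattern being described looks like this (a hypothetical fragment; the function and its names are invented): every fallible call is checked explicitly on the next line, rather than an exception propagating invisibly up the stack.

```go
package db // hypothetical package

import (
	"io"
	"os"
)

// loadPage reads a fixed-size page at the given offset. Each call that
// can fail is followed by an explicit error check.
func loadPage(f *os.File, off int64, buf []byte) error {
	if _, err := f.Seek(off, io.SeekStart); err != nil {
		return err
	}
	if _, err := io.ReadFull(f, buf); err != nil {
		return err
	}
	return nil
}
```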
You don't have to trap exceptions separately on every line. You trap a whole block of code and match on the exception type to determine how to handle the problem. It isn't perfect, nothing is, but I prefer systems languages where typical failures cannot easily be ignored and yet you are not burdened with constantly thinking about it.
> I prefer systems languages where typical failures cannot easily be ignored and yet you are not burdened with constantly thinking about it
This is self-contradictory. In particular, the only way you can have reliable error handling is if you are forced to think about each possible failure.
I assume by "cannot easily be ignored" you mean the way exceptions blow up at runtime? I don't find that an acceptable default for any non-scripting language.
I'd get annoyed too, because I know there are much simpler alternatives. Language design should encourage doing things right by making it more convenient than the shortcuts.
In Rust that entire check can be a single "?" symbol.
How much syntactic sugar is too much is a matter of preference, but I personally think that properly handling all errors without syntactic sugar turns into an unreadable mess, because there are just a lot of things that can go wrong.
> I think just forwarding all low-level errors is a really bad habit
Why exactly is that a bad habit? In almost all situations where I return an error I already have enough context; I'm just wondering what else I'd add to that.
Most errors you encounter are with I/O, and they're stupid: "can't read", "can't write", or "can't serialize".
In a network environment (which is what Go was originally made for) you often need to add tracing information, business-level identifiers, or processing information related to your state, etc.
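A minimal sketch of what that can look like in Go (all names here are invented), using `%w` so the original error stays inspectable via `errors.Is`/`errors.As`:

```go
package db // hypothetical package

import "fmt"

type Store struct{ /* fields elided */ }

func (s *Store) writePage(pageNo int) error { return nil } // elided

// flushPage wraps the low-level error with the identifiers a caller
// (or a log line) will need, while preserving the original error.
func (s *Store) flushPage(txID uint64, pageNo int) error {
	if err := s.writePage(pageNo); err != nil {
		return fmt.Errorf("tx %d: flush page %d: %w", txID, pageNo, err)
	}
	return nil
}
```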
I'm currently writing a fairly complex API in Go, and to be honest this really hasn't bothered me once.
Not to say it doesn't exist, but over time I've become very suspicious of people's complaints about Go. Most of the time those complaints come from people who didn't realize they missed an opportunity to write a much, much more elegant solution to their problem.
I never said it is, but to handle an error you have to pass it to the caller that can actually do something with it, whether it's trying alternatives, repeating the step, just logging it, or whatever else - a lot of functions just need to pass it on a couple of times and making this verbose in every single case doesn't make sense to me.
Interesting choice. I'd think that as the context is "As many of today’s (2023+) coders do not have a formal CS/SE education" and the goal is education, a more popular language like JavaScript or Python (or, heck, even PHP[1] /s) would be used instead.
I've taken a look at Go, and while it does seem pretty approachable, it's definitely not nearly as common as Python/JS, and it's always significantly harder for me to learn a new concept when the examples are also in a language I'm unfamiliar with. Maybe that's just me, though.
I think Go is a pretty good choice for the purpose. It balances high-level ease of use and learning curve with decent access to the system-y parts of coding that are so important to databases. What you learn to do in Go will translate reasonably well to a true systems language if you want to take database engine design to the next level.
Languages like Python or JavaScript are so far removed from the system-y side of programming that the way you would implement the concepts in those languages would not translate to the way you would actually build a "real" database, which I think is the purpose of the book. The objective isn't to teach the abstract concepts but to show how those concepts are expressed in real systems.
I'm a data engineer and I only know Python. It appears Golang hits a sweet spot on many metrics such as performance, parallelism, and ease of use, and since 2016 there have been a lot of new data products and tools written in Golang. So it makes sense to me that the book would use a popular language for the domain.
Databases really do push runtimes in such a way that I think it makes sense to urge folks to use a system level language, or something close to it. In particular it'd be hard to cover concurrency (and parallelism) properly using vanilla CPython or JS, and I think that would impinge on the lessons learned.
That said, it'd be an interesting read on how to make a DB in pure Python.
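As a sketch of what's at stake (illustrative only, not from the book): the shared-state, multi-core parallelism a storage engine needs is only a few lines in Go, with primitives that map closely to what systems languages offer.

```go
package main

import (
	"fmt"
	"sync"
)

// kv is a minimal shared store: many readers, exclusive writers,
// real parallelism across OS threads.
type kv struct {
	mu   sync.RWMutex
	data map[string]string
}

func (s *kv) get(k string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.data[k]
	return v, ok
}

func (s *kv) set(k, v string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[k] = v
}

func main() {
	s := &kv{data: map[string]string{}}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) { // goroutines may run in parallel on multiple cores
			defer wg.Done()
			s.set(fmt.Sprintf("k%d", i), "v")
		}(i)
	}
	wg.Wait()
	fmt.Println(len(s.data)) // 8
}
```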
In this context I meant that the parallelism primitives are different than what you'll get in a systems level language, which might make teaching those parts unnecessarily awkward and probably the concepts less translatable to other runtimes.
For sure they've got different performance profiles.
Very impressive what your group was able to do with Node, and continuing on with Zig. It's got me interested in learning more.
I could be wrong, but my perception is that Go is so opinionated that you'll either write idiomatic Go or use another language. So, from that perspective there's some goodness in using Go as a learner's systems language.
It's opinionated, but it's not _that_ opinionated IMO. You can write awkward Java, shiny C, or whatever idiomatic Go is. Most people I worked with were from .net/Java worlds, and learned just about enough Go to be able to subject others to their coffee bean ideologies.
There are some things the compiler will fail on, like unused variables and the like, but for the most part you need added static analysis and style checking -- some of which ships with the Go toolchain.
Part 1: Modify the KV type
Part 2: Add the Read-Only Transaction Type
Part 3: Add the Read-Write Transaction Type
12.4 The Free List
12.5 Closing Remarks
Good question. I've no idea, but if I were the author I wouldn't bother with concurrency; I'd assume single-user only, always. A database without concurrency is still a perfectly good database, so it's hardly making the book title a lie.
Turbo Pascal came with a public-domain sample spreadsheet implementation (CALC.PAS aka MicroCalc) since version 1.0 (from 1983, 40 years ago!). Here is the version from Turbo Pascal 3 on GitHub: https://github.com/hindermath/MicroCalc
Maybe I'm missing something, but after a cursory glance, I can't find any place where that library does an fsync() call. How does it handle durability?
Your comment made me laugh.
I never thought about entering a codebase by searching for fsync, but in the case of a DB that's probably the best place to start :))
Memory mapping is a poor way to build a database[0]: not all storage can be memory-mapped, and mmap by itself doesn't provide any of the functionality you expect of a database, like indexing and concurrency control.
"correct" is not an objective measure, it's a function of use case
mmap is not a panacea; it improves specific access patterns by incurring specific costs. It's definitely not true that mmap is the right choice for all databases.
Yes, I understood, but "mmap a file" as a database is not "easy", because now you have to deal with segfaults and other weird things that are outside the normal playbook.
Of course, depending on the other items on the list, this may or may not be a major issue. It's more about how combining several ideas leads to an easy or a complex implementation.
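To make the failure mode concrete, here's a minimal Unix-only sketch (everything in it, including the file name, is invented for illustration, and it assumes a non-empty file):

```go
//go:build unix

package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	f, err := os.Open("data.db") // hypothetical database file
	if err != nil {
		panic(err)
	}
	defer f.Close()
	st, err := f.Stat()
	if err != nil {
		panic(err)
	}
	// Map the whole file read-only into the address space.
	buf, err := syscall.Mmap(int(f.Fd()), 0, int(st.Size()),
		syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		panic(err)
	}
	defer syscall.Munmap(buf)

	// Reads are plain memory accesses: no read() call, no error return.
	// An I/O failure here arrives as SIGBUS rather than an err value,
	// which is exactly the "outside the normal playbook" failure mode.
	fmt.Println(buf[0])
}
```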
I worked on a commercial product that did just this, except it also had a "transaction log" component. All access through the server was logged, and the log could be used for replay/recovery. While it worked for them for thousands of simultaneous clients, it was not a general purpose solution. There were no real indexes: the "indexes" (hash tables, really) were built in memory at startup.
I suppose I should've said "persistent" indexes. There was also only a single index type supported: the hash table. And it was hard-coded to 2 fields. You had to change the code (which was C) to change the database.
This system was rarely restarted, so in practice it didn't matter what it did at startup, as long as it didn't take more than a few minutes, but it did place some limitations on data size. (This was a 32-bit system and everything was memory mapped.)
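The startup pattern described above, as a hypothetical sketch (the record format and names are invented): the only durable structure is an append-only log, and the in-memory "indexes" are rebuilt by replaying it.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// replay scans the transaction log and rebuilds the in-memory index.
// One record per line: "SET key value"; last write wins on replay.
func replay(path string) (map[string]string, error) {
	idx := make(map[string]string)
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		parts := strings.SplitN(sc.Text(), " ", 3)
		if len(parts) == 3 && parts[0] == "SET" {
			idx[parts[1]] = parts[2]
		}
	}
	return idx, sc.Err()
}

func main() {
	idx, err := replay("tx.log") // hypothetical log file
	if err != nil {
		fmt.Println("replay:", err)
		return
	}
	fmt.Println(len(idx), "keys recovered")
}
```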