
What I learned from programming databases - otoolep
http://www.philipotoole.com/what-i-learned-from-programming-a-database/
======
btrask
This article seems to be from the perspective of _using_ databases, whereas I
expected from the title it would be from someone who _created_ database
software.

Since learning how databases work from the ground up, my understanding of all
software has changed a lot. When people argue about a database server being
good or bad, or which ORM is best, I can only roll my eyes. Almost all of the
popular advice around them is cargo-cult wisdom from an echo chamber.

It only took me around 6 months of studying, using, and building databases to
learn the fundamentals. Database technology may seem boring at first, but it's
actually a microcosm of computer science. If you want to learn more, dive in!

~~~
otoolep
OP here. I did create database software. I spent almost 2 years as part of the
core team at InfluxDB, as outlined in the article.

~~~
mpbm
Are there any heuristics for when you should switch from easy data storage,
like parsing flat files, to a simple database, then to a complex database?
Like, "if it's breaking" seems obvious. And "it depends" doesn't clarify what
it depends on and how much. Comparing databases, setting them up, and learning
what's what has a cost. When is the benefit worth it?

~~~
otoolep
Sheer scale is a key reason. The simple, easy-to-understand format of a text
file won't work well when you've got a very large number of records to read,
write, and locate quickly.

~~~
mpbm
Yeah, that seems to fall under the "if it breaks" category. But even then it
seems like you could just split the monolithic file into different files with
smaller scope. Like all the keys that start with "x" go into the "x's" file.
You could probably get a lot of mileage out of just splitting the files by
index whenever they reach a certain number of records. It doesn't really
matter how many files you have if the naming convention is simple: if the key
starts with "xyz", it's in the "xyz's" file.

Intuitively it seems like that solution should scale pretty much as far as you
need. As long as some other feature, like simultaneous access, didn't become
important you'd be fine.
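For what it's worth, that prefix-splitting scheme fits in a few lines of Python. This is just a hypothetical sketch of what I mean; the function names and the tab-separated record format are made up for illustration, not anything from a real system:

```python
import os

def shard_path(key, data_dir="data", prefix_len=1):
    """Map a key to its shard file by its first prefix_len characters.

    With prefix_len=1, a record for "xylophone" lands in data/x.records;
    after a split you might move to prefix_len=3 and data/xyl.records.
    """
    return os.path.join(data_dir, key[:prefix_len] + ".records")

def append_record(key, value, data_dir="data", prefix_len=1):
    """Append one tab-separated key/value line to the key's shard file."""
    os.makedirs(data_dir, exist_ok=True)
    with open(shard_path(key, data_dir, prefix_len), "a") as f:
        f.write(f"{key}\t{value}\n")

def lookup(key, data_dir="data", prefix_len=1):
    """Scan only the key's shard file; return the value or None."""
    path = shard_path(key, data_dir, prefix_len)
    if not os.path.exists(path):
        return None
    with open(path) as f:
        for line in f:
            k, _, v = line.rstrip("\n").partition("\t")
            if k == key:
                return v
    return None
```

The nice property is that a lookup only ever scans one shard file, so doubling your data while also lengthening the prefix keeps each scan roughly constant. But every shard scan is still linear, which is exactly the kind of thing an index in a real database fixes.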

I guess that's basically just using the existing functions in the OS.
Hmmm...okay, so that implies a heuristic for investing in a database could be
if the product needs high performance in a function the OS doesn't already
have. In that case you'd have to implement it yourself anyway, and a database
would probably do it better. That would also help constrain the search for an
appropriate database since it would make sense to focus on that function.

OSes already provide the function of allowing multiple users to access the
same information; that's what the file system is. But something like
simultaneous access is different. I think they generally just lock the file.
So I'd have to write a middleman function to work with the file in memory
when multiple users need to read/write simultaneously. That could easily get
complicated fast, and would clearly justify a database.
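Even the "just lock the file" version takes some care. Here's a hypothetical minimal sketch using POSIX advisory locking via Python's `fcntl` module (so it assumes a POSIX system; the `locked_append` name is my own):

```python
import fcntl

def locked_append(path, line):
    """Append a line under an exclusive advisory lock.

    fcntl.flock with LOCK_EX blocks until no other process holds the
    lock, so concurrent writers can't interleave partial lines. Note
    this is advisory: it only protects against writers that also
    take the lock.
    """
    with open(path, "a") as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)
        try:
            f.write(line + "\n")
            f.flush()
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)
```

And that only serializes whole-file appends; once readers and writers need finer granularity (row-level locks, readers not blocking writers), you're rebuilding a database's concurrency control by hand.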

~~~
otoolep
In some ways, that is what a database is: complex files, each with its own
structure.

