
Never Write Your Own Database (2017) - ahiknsr
https://medium.com/@terrycrowley/never-write-your-own-database-736f704c780
======
tlb
Akin's rules for spacecraft design [0] include: "Any exploration program which
"just happens" to include a new launch vehicle is, de facto, a launch vehicle
program."

By analogy, any software project that includes writing a database is, de
facto, a database project.

[0]
[http://spacecraft.ssl.umd.edu/akins_laws.html](http://spacecraft.ssl.umd.edu/akins_laws.html)

~~~
Strom
Another common version of this is: a video game project turning into a video
game engine project, usually without ever reaching the actual content phase.

~~~
k__
But weren't such projects rather common?

~~~
sametmax
20 years ago, when much less was expected from engines and graphics alike.

It's like a web framework. Coding one when CGI came out was way less work than
now. If you want to match what django or rails does now, it takes years.

I read somewhere that it would cost millions if you tried to hire devs to
recode django at today's market rates.

~~~
jcalabro
While these tools such as django and rails or Unreal and Unity are excellent
achievements, we'll never get even better tooling if really smart people
aren't working on these sorts of projects.

I follow a project called Handmade Hero[0] by a longtime game and game tooling
programmer where he codes a complete game from scratch with absolutely zero
dependencies except Windows. It's an excellent educational resource and I've
learned a lot about how games work, including game engines such as Unreal or
Unity. This has enhanced my knowledge of how I might better use those engines
and I feel that it's been a valuable use of my time.

[0] [https://handmadehero.org/](https://handmadehero.org/)

------
MichaelGG
Sometimes it's a huge advantage. I wrote a network search engine. On a single
1TB spinning disk, I could handle 5TB of traffic, stored and indexed, per day.
That's around 2 billion packets indexed. The key was a log/merge
system with only a couple of bits of overhead per entry, and compressed storage
of chunks of packets for the actual data. (This was before LevelDB and
Elasticsearch.)

In practice the index overhead per packet was only 2-3 bits. This was
accomplished by lossy indexes, using hashes of just the right size to minimise
false hits. The trade-off being that an occasional extra lookup is worth the
vastly reduced size of compressed indexes.
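
The lossy-index trade-off can be sketched in a few lines. This is an illustrative toy, not the poster's actual design; the fingerprint width and chunk layout are my own assumptions:

```python
import hashlib

FP_BITS = 8  # fingerprint width; false-hit rate per entry is ~2**-FP_BITS

def fingerprint(key: bytes) -> int:
    # Truncate a strong hash down to a few bits. Collisions are tolerated:
    # they only cost an occasional wasted verification, which is the
    # trade-off for a vastly smaller index.
    digest = hashlib.blake2b(key, digest_size=2).digest()
    return digest[0] & ((1 << FP_BITS) - 1)

class LossyChunkIndex:
    """One FP_BITS-bit entry per record; full records live in a chunk."""

    def __init__(self):
        self.fps = []    # compact fingerprints, one per record
        self.chunk = []  # the full records (compressed, in a real system)

    def add(self, key: bytes, record) -> None:
        self.fps.append(fingerprint(key))
        self.chunk.append((key, record))

    def lookup(self, key: bytes):
        fp = fingerprint(key)
        # Scan the compact fingerprint array, then verify candidates
        # against the full record to discard false hits.
        return [rec for f, (k, rec) in zip(self.fps, self.chunk)
                if f == fp and k == key]
```

The verification step against the stored record is what makes the lossiness safe: a fingerprint collision costs a wasted chunk read, never a wrong answer.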

To this day, I don't know of general-purpose, lossy, write-once hashtables
that get close to such low overhead.

Competitors would use MySQL and insert per packet. The row overhead was more
than my entire index. But it worked out: just toss 50k of hardware at it.

But... It does take over a lot of engineering time writing such bespoke
software. Just compressing the hashes (a common info retrieval problem) is a
huge area, now with SIMD optimised algorithms and everything.

~~~
CyberDildonics
I'm going to go out on a limb and guess that it would have been cheaper to
upgrade the hardware.

~~~
MichaelGG
Depends on scale. At the higher end, it was near impossible to scale when you're
e.g. inserting a MySQL row per packet. But maybe good enough for a viable
business. I would probably try to take it as far as possible on Elastic if I
were to write it today.

Same thing if you read the Dremel paper. Worrying about bits helps when
scaling.

~~~
fizx
Because Lucene wasn't good at near-realtime in 2009 or so, Twitter's original
(acquired via Summize) search was written in MySQL. It might have even been a
row for every token, not quite sure.

IIRC, when we moved to a highly-customized-Lucene-based system in 2011, we
dropped the server count on the cluster from around 400 nodes to around 15.

------
lmilcin
I had to roll my own transactional database for an embedded product working in
a very constrained environment (2MHz ARM, 2MB total memory). I did very
extensive search and I found no product with the combination of
characteristics I was looking for:

- working in very small but, more importantly, constant memory (predictability
is a must for a reliable embedded app),

- provide provable transactional characteristics, with a guarantee to roll back
to the previous state if a transaction is interrupted, and still be consistent
and able to accept new transactions,

- minimum storage overhead -- the amount of overhead translated directly into
the competitiveness of the device at the time, as multiple vendors tried to
provide their own solutions for the same preexisting platform,

- a storage write pattern that would provide wear leveling for the underlying
flash.

I ended up writing my own database (less than 1k LOC total, with
encoding/decoding data using BER-TLV) that met all the characteristics,
took a few tens of bytes of stack, and added a few bytes of overhead per
write. The database would vacuum itself and coalesce record state
automatically. It had some O(n^3) algorithms but THAT'S OK since the amount of
data could never grow large enough to pose any problems.

The project took 2 years to complete. I spent maybe a month designing,
implementing and perfecting the database. I wouldn't say that Akin's law of
spacecraft design applies here. I would probably have spent more than that
integrating an existing product, and ended up with an inferior product anyway.
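
A toy sketch of one design that satisfies these constraints (my own illustration, not the poster's actual code): an append-only log where each record carries a length and a checksum, so an interrupted write is detected and truncated on recovery, which is exactly "roll back to the previous state". Purely sequential appends also double as crude wear leveling on flash:

```python
import struct
import zlib

# Each committed write is [u32 length][payload][u32 crc32], appended in
# order. An interrupted write leaves a torn or corrupt tail; recovery
# truncates it, restoring the last consistent state.

def append_record(log: bytearray, payload: bytes) -> None:
    crc = zlib.crc32(payload)
    log += struct.pack("<I", len(payload)) + payload + struct.pack("<I", crc)

def recover(log: bytes) -> list:
    records, pos = [], 0
    while pos + 4 <= len(log):
        (length,) = struct.unpack_from("<I", log, pos)
        end = pos + 4 + length + 4
        if end > len(log):
            break  # torn write: drop the tail
        payload = log[pos + 4:pos + 4 + length]
        (crc,) = struct.unpack_from("<I", log, pos + 4 + length)
        if crc != zlib.crc32(payload):
            break  # corrupt tail: roll back here
        records.append(payload)
        pos = end
    return records
```

Recovery runs in constant memory and the per-write overhead is 8 bytes here; a real implementation would shrink that further with variable-length encoding.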

~~~
faitswulff
The title is misleading - it's actually about how and why they did end up
writing their own db. From the article:

> There’s an old adage in software engineering — “never write your own
> database”. So why did the OneNote team go ahead and write one as part of
> upgrading OneNote’s local cache implementation?

~~~
lmilcin
What I am really objecting to is those hard "rules". "Never optimize early".
"Never roll your own database", etc.

All those rules work for most projects, but not all. It's the same as saying
"You shall always obey traffic rules". Maybe I should, but sometimes I may not
want to brake on a yellow light when I have a clearly impatient driver
tailgating me.

As we gain experience we learn the world is not black and white. Akin's law #8
says:

"In nature, the optimum is almost always in the middle somewhere. Distrust
assertions that the optimum is at an extreme point."

~~~
ksk
But your objection is rather silly! Those rules are meant for beginners. If
you're not a beginner you already know when/if/how you can break the rules. It
is correct to teach kids that they should tell the truth in every scenario,
even if as adults we don't.

~~~
hutzlibu
"It is correct to teach kids that they should tell the truth in every
scenario, even if as adults we don't."

Not correct. For example, a stranger asking: "are you home alone?" or "what is
the keycode for the door?" etc.

~~~
ksk
I think it's still correct. Sorry, I don't find your examples a reason to
teach children that truth is optional.

~~~
klibertp
It would be much better to teach children that the truth doesn't exist, that
everything we call "reality" is a lie on some level, and that the best you can
do is to stay honest to yourself and your intentions. Probably much harder to
teach, but much closer to what they will fight with for the rest of their
lives. I think it's the lack of such preparation in childhood which makes
people give up, either never trying to figure out what the "truth" even is or
relying on an imaginary friend in the sky as a source of one and only truth.

Some simplifications are useful in some contexts, but the world is an
unimaginably complex thing and trying to dumb it down can only take you so
far.

~~~
ibeckermayer
"the truth doesn't exist, that everything we call "reality" is a lie on some
level"

Wait so does this truth exist, or is it also a lie on some level?

~~~
klibertp
> does this truth

It's not truth at all. Just an observation. It happens to fit with my
perceptions. That's it.

You can artificially define truth by tying it to a particular frame of
reference, but that's not "the truth", as that frame of reference is not 100%
transferable to others anyway. The idea of umwelt, as I understand it, seems
to work here. Still, it's just an observation, an impression.

I'm not really saying anything new here. Descartes was saying something
similar quite a long time ago. Then again, he could have meant something else
entirely and there's no way for me to know which is it. I just assume my
understanding is close enough to the intended meaning. I may well be
completely wrong on this.

Basically, there's nothing you can be really sure about, including the fact
that you can't be really sure about anything. There are only things that
appear to work for you and, possibly, others. You can use them. Just don't
believe them unquestioningly.

------
mpweiher
Possibly relevant are these 2 great papers co-authored by Michael
Stonebraker[1]:

 _“One Size Fits All”: An Idea Whose Time Has Come and Gone_ [2]

 _The End of an Architectural Era (It’s Time for a Complete Rewrite)_ [3]

They basically show that classical RDBMSs are inefficient by a factor of at
least 10 in every conceivable application. I'd gladly trade a little of that
for the kinds of dramatically simpler mechanisms discussed in TFA.

[1]
[https://en.wikipedia.org/wiki/Michael_Stonebraker](https://en.wikipedia.org/wiki/Michael_Stonebraker)

[2]
[https://cs.brown.edu/~ugur/fits_all.pdf](https://cs.brown.edu/~ugur/fits_all.pdf)

[3] [http://cs-www.cs.yale.edu/homes/dna/papers/vldb07hstore.pdf](http://cs-www.cs.yale.edu/homes/dna/papers/vldb07hstore.pdf)

~~~
everybodyknows
Do they identify any cases for which classical RDBMS are inefficient by a
super-constant factor?

~~~
mmalone
ACID semantics subtly and indirectly require locks. Any scenario where lock
contention might be a problem would be a problem for an RDBMS (assuming the
traditional definition of an RDBMS as ACID compliant). Those scenarios can
easily be “super-constant”, especially if you have to distribute across
multiple nodes: that requires locks held for the duration of an
“external call” over the network, which is A Bad Idea and could definitely be
“super-constant”.

------
commandlinefan
"Never write/roll your own" is the counterpoint to the NIH "Not Invented Here"
syndrome... however, the older I get, the more reasonable NIH cases I see,
such as the one in the article. People will say, for example, "why would you
write your own ORM when there are so many available for free? Why reinvent the
wheel?" Yet when I start looking at the wheels that are available, they take
up a lot of space, a lot of time, take quite a while to learn (while the boss
is asking me "is it done yet? is it done yet?"), have their own bugs, and need
to be regularly upgraded and maintained. This isn't necessarily a flaw
- these "wheels" (ORMs, file storage solutions, web application frameworks
and, dare I say it? security solutions!) are designed to solve generic
problems rather than the specific targeted problem that I'm focused on.

~~~
jmalicki
If your problem is that your boss is asking "is it done yet?" writing your own
new ORM does not seem like the most productive antidote to learning an
existing hard-to-understand ORM....

~~~
commandlinefan
Not in my experience. If I'm coding, and I can demonstrate that I'm checking
stuff into Git (or SVN or CVS or Clearcase), and I have something to demo
every couple of days, they're happy, even if there's theoretically a faster
way to get from point A to point B. On the other hand, if my only progress
(for a whole day!) is "researching" some software, they go into full-blown
panic mode. I've found this to be absolutely consistent over a career of now
25 years and 10 different employers.

------
DrScump
If there are use cases even _now_ where a roll-your-own solution is best,
imagine the 1980s.

My first programming roles were in a mainframe shop with a major defense
contractor, which had some _brilliant_ mainframe system programmers over the
years. Their major unclassified systems (payroll, shop order control, etc.)
were internetworked with a homegrown real-time system, and they all had a
home-grown disk access system (random and sequential) that was surreal in its
speed and reliability, all coded in 370 assembler. On the business
applications side, they had a thorough API that was callable from even COBOL
programs.

By the mid-1980s, upper management decided they had to "standardize" and began
developing replacement systems using IBM's IMS. Performance was _unusably_
bad. I left around that time to join the UNIX RDBMS world, so I don't know if
they ever found a solution that could actually be rolled out.

Working with the older stuff was actually _fun_ because things just _worked_ ,
and the customers of our legacy systems were really happy.

------
randomsearch
The major cost factor in software development is developer wages.

It’s all about how you spend the time you have. Don’t build stuff that you can
get off shelf. You’re not going to write a better database, even if you think
you can. You may get great performance for the current system on day one
(after spending a huge amount of time and effort on developing it, perhaps
with zero value created) but over the lifetime of the database you’ll incur
huge costs that you probably can’t even fully foresee. The details here are
not clear enough, but working from first principles (“I want to build an
Evernote-like app”) I can’t imagine an experienced developer suggesting you
should write a database (cache/file system).

The best software development is mundane: glue together what’s there, buy the
resources you need to get sufficient performance. Switch a component if you
need something to be faster, re-architect key parts, and buy the new
components off the shelf too. It’s fun to build new stuff with custom
algorithms you work out, but instead of that you can go home at 5pm, try
building a database in your spare time and learn why you shouldn’t, and still
have space to relax.

As Sam Altman wrote recently, the productivity tip that most people are
astoundingly ignorant of is: choose what you do carefully.

It’s telling that this article does not begin: “we had a bottleneck that could
not be solved.”

~~~
dman
On the flip side - building just enough software to solve the problem at hand
can sometimes run circles around assembling things using off the shelf
components. This is especially true if something like speed / memory /
correctness / optimality is important.

~~~
randomsearch
Could you give a concrete example?

~~~
dman
a. High frequency trading

b. Embedded software

c. High performance computing

d. Robotics

e. Military / Aviation software

In each of the above fields there are categories of software where you can run
circles around general solutions by writing something custom and yes some of
those involve writing your own database.

~~~
Latteland
and f. databases.

ha, just kidding.

~~~
dman
As someone who works professionally on databases this made me chuckle.

~~~
Latteland
I work on database software too, so I know it should be there. plus fuck
databases, they are such a pain :-)

------
rdtsc
I have written my own database. In 3 months. Shipped it, installed it at
customers' sites, and it's been working great for many years now. So do write
your own database, but know exactly how kernel, libc, language libraries,
sockets work :-)

~~~
rzzzt
Alternatively, write your own kernel!

~~~
rdtsc
A coworker did but for an embedded device. It worked well but we did spend a
lot of time debugging the networking.

~~~
tabtab
Often such things "work" technically, but if you leave the project, some poor
schmuck has to figure out what you did. It may be obvious to you, but not to
others. Remember that maintenance is a bigger cost than development on
average. Write systems for other developers, not for yourself.

~~~
rdtsc
Well it works and the poor schmuck is the person I hired and trained and they
are still there. The db is still being sold and making money. Since code is
read more times than it's written, I try to write it so that a junior version
of me from 10 years ago could understand it, after waking up in the morning
before the 1st cup of coffee :-)

The best advice is really to keep it simple. Build the system out of small,
isolated units with clear APIs between them. The APIs and components should be
isolated at runtime, so they can be restarted or fail separately, as well as
in the code -- separate modules / files / libraries. That way it's easier to
see how everything fits together.

The simple part wasn't just a platitude or a generalizing statement. A db is
really a beast that can easily turn into a giant ball of spaghetti with a ton
of features, settings and tweaks, and a never-ending list of bugs. A lot of work
and thinking I did on it (including a rewrite after an initial prototype) was
with the goal of making it simple by cutting unnecessary features.

------
projektfu
It seems to me that Microsoft frequently writes new embedded databases,
leaving them poorly documented and accessible from weird tools. For example,
the registry, Windows installer, structured storage.

~~~
coldacid
Structured storage isn't embedded database tech. It's actually FAT (or close
enough) inside a file, with a bunch of helper COM objects.

------
Animats
There are two old, commonly used programs that have their own database: BIND
and Sendmail. Both have problems because their databases are crappy.

~~~
Something1234
Doesn't OpenLDAP implement its own database, or does it use BDB? I remember it
corrupting itself if the server lost power.

~~~
manigandham
OpenLDAP uses LMDB now, one of the fastest and most reliable key/value stores
around, which they wrote precisely because BerkeleyDB sucked.

~~~
Something1234
This was early 2016 from the centos 7 repos, so this must be relatively
recent. It didn't even have to be writing to it. I just had to reboot and it
was left in a dirty state.

~~~
Mononokay
CentOS repos often don't use current software, so not necessarily.

------
notacoward
The looser your requirements are, the easier it is to write your own so-called
database. Nobody should be writing a full ACID/SQL/whatever database as part
of another project. On the other hand, if you want to implement a "database"
in the same sense that many implement a "filesystem" which is just an HTTP
object layer on top of a real filesystem, knock yourself out. Just don't think
you're doing the same thing as the people writing more featureful general-
purpose databases.

~~~
a1369209993
If you're willing to accept that durability is a lie, and also to compromise
(possibly a lot) on performance and trust that the CPU and RAM aren't actively
helping the disk to sabotage you, you can make a fairly full-featured ACI
database in a weekend using Merkle trees and a single-threaded rendezvous
serializer. The hard part is getting those guarantees while _also_ having
latency and throughput vaguely comparable to naive file writes. (I'd add
something about a better API than assembly language, but database people
decided to imitate COBOL, so I can't really criticise assembly with a straight
face.)

~~~
mmalone
Log structured merge is the last piece of your architecture. It solves the
latency part. It can also solve your durability issues, if you fsync when
necessary and you understand your hardware.
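
The fsync point can be illustrated with a minimal write-ahead log (a sketch of the general technique, not a production recipe):

```python
import json
import os

class WAL:
    """Acknowledge a transaction only after its log record is fsync'd,
    so anything acknowledged can be replayed after a crash."""

    def __init__(self, path: str):
        self.f = open(path, "ab")

    def commit(self, txn: dict) -> None:
        self.f.write((json.dumps(txn) + "\n").encode())
        self.f.flush()
        os.fsync(self.f.fileno())  # durable on this disk before we ack

def replay(path: str) -> list:
    with open(path) as f:
        return [json.loads(line) for line in f]
```

An LSM store then turns these sequential, fsync'd appends into its memtable feed, which is how it gets both the latency and the durability properties.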

~~~
a1369209993
Yeah, sorry, I assumed log/journal plus reapply-on-recovery went without
saying. You can't have guaranteed durability (short of blockchain-style
distributed stores) because it's always possible for your database server to
suffer an unrecoverable hardware failure between transmitting "transaction
success" and taping out its weekly backup. If all else fails, your datacenter
catching fire and burning down would suffice. So it's only ever a matter of
reducing (not eliminating) the _probability_ of having to roll back committed
transactions. (Tl;dr: fsync can't help if your hard drive melted.)

~~~
mmalone
Oh, I assumed we were talking distributed. Fsync to more than one disk. My
biases :).

~~~
a1369209993
Fsync to more than one disk is a great way to reduce the probability of forced
rollback, but simultaneous failure is still possible.

Then again, ACI does assume you aren't having CPU/memory/etc failures, so to
the extent that you count anything as guaranteed, it is probably possible to
'guarantee' durability. My original point was more that "minimally adequate"
isn't that hard compared to going from there to a high performance, general
purpose system (and also an offhand snipe at SQL).

------
amluto
I wrote my own database once. It’s an “event sourcing” database, but I wrote
it way before event sourcing was cool. It can handle over 1M events/sec on one
core. I don’t know any off-the-shelf product that could replace it.

I also helped write a log-structured OLTP-ish database that sits in front of
MySQL.

I have only two regrets about these systems. First, MySQL was, in retrospect,
the wrong backing store. Second, I used Thrift serialization. Thrift is not so
great.

~~~
dwyerm
Did using MySQL allow you to ship a product on-time and under budget? Or, to
put this the other way, if you had used The Right Backing Store would you
still have delivered the product in a timely manner at the right price-point?

~~~
amluto
Unclear. I think the right backing store these days would be PostgreSQL. At
the time, both were equally easy to get started with. PostgreSQL didn’t have
fully serializable transactions yet (IIRC), and I was more familiar with
MySQL. OTOH, MySQL has sucked in myriad ways. Replication is very easy to
screw up. The query optimizer and advanced SQL feature support are not so good.
In general, there have been a lot of rough edges.

------
perl4ever
There's also the (anti)pattern of making your own database in a lesser sense -
implementing a set of tables on a SQL database for generic data. I think this
is called the entity-attribute-value pattern. My impression is this is often
considered deplorable, yet practically every application built on a database
that I've encountered at work uses it to some extent.
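
For readers who haven't met it, the entity-attribute-value pattern in its plainest form looks like this (a generic sketch using SQLite; table and column names are made up):

```python
import sqlite3

# One row per (entity, attribute) pair instead of one column per
# attribute: flexible, but the schema no longer documents the data.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE eav (
    entity_id INTEGER, attribute TEXT, value TEXT,
    PRIMARY KEY (entity_id, attribute))""")
con.executemany("INSERT INTO eav VALUES (?, ?, ?)", [
    (1, "name", "Widget"), (1, "color", "red"),
    (2, "name", "Gadget")])

# Reassembling an entity means pivoting rows back into "columns":
rows = con.execute(
    "SELECT attribute, value FROM eav WHERE entity_id = ?", (1,)).fetchall()
entity = dict(rows)
```

The pivot step is why EAV gets its bad reputation: every query that would have been a simple column read becomes a join or an aggregation.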

~~~
perlgeek
As soon as you allow some kind of user-defined meta data, you basically end up
with the entity-attribute-value pattern.

Every sufficiently advanced system does that, be it a CRM that allows custom
fields, or a project management or ticket system, a workflow system etc. And
basically all enterprise software contains elements of these systems.

~~~
nostalgeek
A good alternative is to use JSON column types in an RDBMS in combination with
classic field types. EAV is really just free-form data, and JSON is easier to
query in MySQL or Postgres.

~~~
perlgeek
Yes, but that's a fairly recent development, often without good ORM support,
and I don't know how cross-database compatible the support is.

For a new application that only supports one database, it's probably a better
choice.

------
Walkman
"As alluded to above, this does introduce a separate challenge with multi-file
consistency. The team took an interesting approach here. The index file points
to the location of the root of the page graph and free space map in each page
cache file. When persisting a multi-file transaction, all page cache files are
flushed to disk before the index file."

That sounds familiar: [https://pthree.org/2012/12/14/zfs-administration-part-ix-copy-on-write/](https://pthree.org/2012/12/14/zfs-administration-part-ix-copy-on-write/)

------
krylon
I used to work at a company where the "database" was literally a raw dump of
the in-memory array of structs defined in a C program. Had some fun with that
because struct padding depended on compiler flags, so different varieties of
the program that were built with different compiler flags could not read each
other's "database".

I never got over it, really. On the plus side, it was super fast, that I
cannot deny. But it always seemed super gross to me.
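
The padding problem is easy to demonstrate without a C compiler: Python's struct module exposes the same native-versus-packed distinction (the sizes in the comment are typical for x86-64, not guaranteed everywhere):

```python
import struct

# A raw memory dump is only readable if every reader agrees on struct
# layout. "@" uses native alignment (the compiler-flag-dependent
# layout), "=" packs fields with no padding.
LAYOUT = "cif"  # char, int, float: the kind of record a C struct might hold

record_native = struct.pack("@" + LAYOUT, b"A", 42, 3.5)
record_packed = struct.pack("=" + LAYOUT, b"A", 42, 3.5)

# On a typical x86-64 build the native record is 12 bytes (3 bytes of
# padding after the char) while the packed one is 9; a program built
# with different alignment rules cannot parse the other's dump.
print(len(record_native), len(record_packed))
```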

~~~
kalleboo
Lots of software used to do this, back when computers were really slow and low
on RAM. The old MS Office and Adobe Photoshop formats are notorious for it.

~~~
krylon
To be fair, the software had been first written in the early 1990s, so I can
imagine it was a rational decision back then.

I was just always amazed that there never were any problems with corrupt
files.

------
mastax
I was not expecting this article to be about OneNote, but I'm glad that it is.
OneNote is a surprisingly good product, but it takes so excruciatingly long to
sync what must be 10KB of text. I look forward to what these changes enable.

------
jchanimal
If you’re writing your own database and you are serious about making it a
business, reach out to me via my profile here. I've been there, done that,
happy to help.

------
unilynx
We wrote our own database 15 years ago - needed something embeddable into the
application on both windows and linux, i386 and alpha, and the ability to do
transactional schema modifications without global locking. If I recall
correctly, postgresql was at 7.3 at the time and would happily dump core if
you tried something like 'drop table x;rollback'.

And we didn't need full sql semantics, because we partially implemented it
into our scripting language anyway and used only cursor APIs. postgresql's
mvcc was still a great inspiration for how to design the actual row storage.

Served us well, but 15 years later, a few too many custom features make it
hard to switch to a standard database.

------
progx
If everybody never writes a database, who does it?

~~~
themacguffinman
Domain experts for whom the database is the main project rather than a tacked
on requirement.

------
dicroce
I wrote a database for my app and it was the best decision we made.

~~~
naspinski
Without a comparison of how good the other decisions were, this means nothing
:)

~~~
zupa-hu
that comment made my day xD

------
gerdesj
_The advantage of rolling your own is that you get to optimize and control
things at a level that is just not possible when working through a complex
external component_

Then you stick it on a filesystem that does not work the way you think it does
or even worse you stick it on a remote filesystem (eg CIFS/SMB) that does not
work the way you think it does and is now backed by an FS that also does not
work the way you think it does.

Your DB _may_ work fine but make sure you onboard the lessons that all the
others have learnt through bitter experience.

------
Lxr
OneNote is fantastic. Of all the Microsoft things I have installed on my Mac,
it’s light years ahead of the rest in terms of stability, usability, speed. It
doesn’t get enough recognition!

------
mpweiher
TL;DR: Don't build your own database -- here's how and why we (Microsoft
OneNote) did anyhow and it worked out really well for us.

~~~
petra
Did it? Last I tried OneNote, it was a buggy piece of crap; it couldn't even
finish the import (on Win 7). So IDK, databases?

~~~
disordinary
I'd say that since this article was written well after Win7, it's in response
to your experience and other experiences like yours.

~~~
petra
I tried the import recently.

------
ne01
At SunSed we have created our own DB system for the exact same reason:

WordPress average page generation: 1 sec

SunSed CMS: 0.03 sec

The database is almost always the reason for slow applications.

Edit: if you do caching right, WordPress becomes as fast.

------
tschellenbach
It took me a long time to feel comfortable with the idea of building our own
database. For Stream it worked out well though:
[https://stackshare.io/stream/stream-and-go-news-feeds-for-over-300-million-end-users](https://stackshare.io/stream/stream-and-go-news-feeds-for-over-300-million-end-users)
Tools like RocksDB and Raft give you really flexible building blocks.

------
amelius
We would have never had NoSQL ...

~~~
gaius
Yeah you would. IBM was doing it in the 1960s pre-SQL.

------
quotemstr
This article is timely. I have just written a database to fill a gaping hole
in the ecosystem. It even understands SQL.

------
nannePOPI
The thing with rules is that you can't know when it's right to break them if
you don't know them.

------
_o_
Well, it depends on the case. I needed fast IP-to-country resolution, a really
fast one. I wrote my own database in C for it, and it was 120 times (yes!)
faster than a fully optimized Postgres table, not to mention the memory, disk
and CPU footprint.

A custom-made database for one specific problem can be a lot faster than a
generic one. But it depends on your knowledge.
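
The core of such a lookup can be as simple as a sorted array of range starts plus binary search (a toy sketch with made-up ranges, not the poster's C implementation):

```python
import bisect
import socket
import struct

# O(log n) lookups, no database server, tiny footprint. A real table
# would be loaded from a GeoIP dump; these ranges are illustrative.
starts, countries = [], []
for range_start, country in [("1.0.0.0", "AU"),
                             ("2.0.0.0", "FR"),
                             ("3.0.0.0", "US")]:
    starts.append(struct.unpack("!I", socket.inet_aton(range_start))[0])
    countries.append(country)

def lookup(ip: str) -> str:
    # Assumes ip falls at or above the first range start.
    n = struct.unpack("!I", socket.inet_aton(ip))[0]
    return countries[bisect.bisect_right(starts, n) - 1]
```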

~~~
jacquesm
That's just a lookup table, not a database.

~~~
_o_
"A database is an organized collection of data."

By definition -- the first sentence on Wikipedia. I don't know what your
definitions are.

~~~
jacquesm
If you want to call a lookup table a database I'm ok with that.

~~~
hinkley
But that means I'm writing my own databases at least a couple times a year.
I'm not sure I'm okay with that.

------
marknadal
I'm pretty sure the mantra is "Never Write Your Own Crypto", not "never write
your own database".

I know tons of people who roll their own flat-file storage engine and are
perfectly happy with it, even scaling up to their moderate couple of tens-of-
thousands of users. Nothing fancy.

Personally, I had to write my own database (just like what the author of the
article wound up doing), and had a delightful time learning all sorts of
things, and now it is one of the most popular databases out there
([https://github.com/amark/gun](https://github.com/amark/gun)) and I encourage
others to try (if they have time) building one themselves!

~~~
ovao
The mantra tends to be “just use SQL” or “just use SQL, unless...”. Similar to
the advice “just use an int, unless...”.

The wisdom in that is usually pretty sage. Reach for a SQL database unless
there are reasonable and concrete reasons not to. Then, evaluate whether
there’s a need to roll your own, or if there’s something you can grab right
off the shelf.

I’d argue there are plenty of perfectly valid reasons to roll your own, and
it’s not _particularly_ difficult if you build on an already-proven storage
foundation (file system, LMDB, etc.)

