Never Write Your Own Database (2017) (medium.com)
445 points by ahiknsr 11 months ago | 154 comments

Akin's rules for spacecraft design [0] include: "Any exploration program which "just happens" to include a new launch vehicle is, de facto, a launch vehicle program."

By analogy, any software project that includes writing a database is, de facto, a database project.

[0] http://spacecraft.ssl.umd.edu/akins_laws.html

Since this obviously inspired https://news.ycombinator.com/item?id=16996306, I'm going to move the spacecraft replies to that thread. The database replies can stay here. Beautiful analogy tlb.

Another common version of this is: video game project turning into a video game engine project, usually not even reaching the actual content phase.

But weren't such projects rather common?

20 years ago, when much less was expected from engines and graphics alike.

It's like a web framework. Coding one when CGI came out was way less work than now. If you want to match what django or rails does now, it takes years.

I read somewhere that it would cost millions if you tried to hire developers to recode django at today's market rates.

While these tools such as django and rails or Unreal and Unity are excellent achievements, we'll never get even better tooling if really smart people aren't working on these sorts of projects.

I follow a project called Handmade Hero[0] by a longtime game and game tooling programmer where he codes a complete game from scratch with absolutely zero dependencies except Windows. It's an excellent educational resource and I've learned a lot about how games work, including game engines such as Unreal or Unity. This has enhanced my knowledge of how I might better use those engines and I feel that it's been a valuable use of my time.

[0] https://handmadehero.org/

That is definitely a common thing in the gamedev world.

Any software project that needs a custom database most likely is a database project anyway. In such cases, it's often more realistic to write a new database than it is to hack an existing one. And these days you have many good options for low-level storage.

And OneNote would happily qualify as a database project even without the truism. The whole point of the thing is to quickly and consistently synchronize multimedia notes across computers.

The analogy is amazing

Thank you for this.

Sometimes it's a huge advantage. I wrote a network search engine. On a single 1TB spinning disk, I could handle 5TB of traffic, stored and indexed, per day. That's around 2 billion packets indexed. The key was having a log/merge system with only a couple of bits of overhead per entry, and compressed storage of chunks of packets for the actual data. (This was before LevelDB and Elasticsearch.)

In practice the index overhead per packet was only 2-3 bits. This was accomplished with lossy indexes, using hashes of just the right size to minimise false hits. The trade-off: an occasional extra lookup is worth the vastly reduced size of the compressed indexes.

To this day, I'm not aware of general-purpose, lossy, write-once hashtables that get close to such low overhead.
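
To make the idea concrete, here is a hypothetical sketch (my own, not the poster's code) of a lossy index built from truncated hashes: store only a short fingerprint of each key, so a lookup returns every true match plus the occasional false hit, which the caller filters by reading the actual record. A plain dict is used here for clarity; getting down to a few bits per entry would require packing the sorted fingerprints into a compressed array.

```python
import hashlib

def fingerprint(key: bytes, bits: int = 16) -> int:
    """Truncate a strong hash to `bits` bits; collisions become false hits."""
    h = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return h >> (64 - bits)

class LossyIndex:
    """Write-once index mapping truncated fingerprints to record offsets.

    A lookup returns *candidate* offsets: every true match is included,
    but an unrelated record may occasionally appear too (a false hit),
    which the caller filters by reading the actual record.
    """
    def __init__(self, bits: int = 16):
        self.bits = bits
        self.buckets: dict[int, list[int]] = {}

    def add(self, key: bytes, offset: int) -> None:
        self.buckets.setdefault(fingerprint(key, self.bits), []).append(offset)

    def candidates(self, key: bytes) -> list[int]:
        return self.buckets.get(fingerprint(key, self.bits), [])
```

Shrinking the fingerprint trades index size against false-hit rate, which matches the trade-off described above.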

Competitors would use MySQL and insert per packet. The row overhead was more than my entire index. But it worked out: just toss 50k of hardware at it.

But... It does take over a lot of engineering time writing such bespoke software. Just compressing the hashes (a common info retrieval problem) is a huge area, now with SIMD optimised algorithms and everything.

That’s fascinating. I guess you probably can’t open source it, but I’d love to read a blog post about it.

Here's a version https://github.com/michaelgg/cidb -- Just some of the raw integer k-v storage part. It assumes you already have the hashed entries (you truncate them and the compression takes it from there). It is really what you should expect more from a college course IR project but since I never went to school... oh well.

I used this same library to encode telephone porting (LNP) instructions. That is a database of about 600M entries, mapping one phone number to another. With a bit of manipulation when creating the file, you go from 12GB+ for a naive encoding as strings (one client was using nearly 50GB after expanding it to a hashtable) to under a GB. Still better than any RDBMS can do, and small enough to easily toss in-RAM on every routing box.
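
The post doesn't spell out the encoding, but one standard trick for shrinking a large sorted set of numeric keys (like phone numbers) is delta encoding plus varints: store the gap between consecutive keys, and spend bytes proportional to the gap's magnitude rather than the key's. A hedged sketch:

```python
def encode_varint(n: int) -> bytes:
    """LEB128-style variable-length encoding of a non-negative integer."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # continuation bit set
        else:
            out.append(byte)
            return bytes(out)

def encode_sorted_keys(keys):
    """Delta-encode a sorted list of integers, then varint each gap."""
    out, prev = bytearray(), 0
    for k in keys:
        out += encode_varint(k - prev)
        prev = k
    return bytes(out)

def decode_sorted_keys(buf):
    """Inverse of encode_sorted_keys: accumulate varint-decoded gaps."""
    keys, prev, n, shift = [], 0, 0, 0
    for b in buf:
        n |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:              # last byte of this varint
            prev += n
            keys.append(prev)
            n = shift = 0
    return keys
```

Dense runs of 10-digit numbers cost one or two bytes per entry this way instead of eight, which is the kind of win that turns a 12GB+ string encoding into something that fits in RAM.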

Some day I'd like to write it in Rust and implement vectorized encoding and more compression schemes. Like an optimized SSTable just for integers.

I'm going to go out on a limb and guess that it would have been cheaper to upgrade the hardware.

Depends on scale. At higher end, it was near impossible to scale when you're e.g. inserting a MySQL row per packet. But maybe good enough for a viable business. I would probably try to take it as far as possible on Elastic if I were to write it today.

Same thing if you read the Dremel paper. Worrying about bits helps when scaling.

Because Lucene wasn't good at near-realtime in 2009 or so, Twitter's original (acquired via summize) search was written in mysql. It might have even been a row for every token, not quite sure.

IIRC, when we moved to a highly-customized-Lucene-based system in 2011, we dropped the server count on the cluster from around 400 nodes to around 15.

You can only upgrade hardware so much. If by doing a lot of low-level optimizations, you can remove (or delay) the need to build a complex distributed system, then the optimizations end up paying big dividends beyond just the cost of the machines.

I think it's also important to know when this occurred. I've found there's a general tendency among software engineers to (surprise!) believe that it's easier/cheaper to solve the problem of scale in software rather than hardware, and it's often fueled by the misconception that the alternative to doing so is a complex, distributed system.

This is a false dichotomy.

Maybe during the days of the dot-com boom, it was true enough, because scaling a single server "vertically" became cost prohibitive very quickly, especially since truly large machines came only from brand-name vendors. That was, however, a very long time ago.

A naive interpretation of Moore's law implies CPU performance today is in the high hundreds of times as fast as back then. Even I/O throughput has improved by something like a factor in the mid tens, IIRC. More importantly, cost has come down, too.

The purchase price premium for getting the highest-performance CPUs (and mobo they need) in a server over the lowest cost per performance option is about 3x. Considering that this is, necessarily, a single [1] server, the base for that premium isn't exactly tremendous. The total cost would seem to be on the same order of magnitude as a team of programmers.

Of course, in the instant example, the database was particularly specialized, including what strikes me as a unique feature, a lossy index. I'd expect data integrity to be one of the huge challenges of databases, which, if relaxed, makes writing a custom one a more reasonable proposition.

[1] Or a modest number, on the order of a dozen, for something like read slaves, rather than the multiple dozens, if not hundreds, of the distributed system.

It's not an either or situation. Often there is a ~10-1,000x performance gains to be had in software from the initial production version to the optimized version. Similarly you can often get ~10-1,000x speed bump from better hardware.

But, the gains become more expensive as you move up the scale. So, at least starting down the software path is often very cheap, with many large gains to be had. Similarly, at least looking at the software before you scale to the next level of hardware tends to be a great investment.

It's not about always looking at software; it's a question of regularly going back to software, as it's better to do so regularly rather than as a one-time push.

> It's not an either or situation.

I'm a bit confused.. are you agreeing or disagreeing? My point was to call out a false dichotomy and offer a third option.

> It's not about always looking at software

Yet that's exactly what happens. Software engineers completely dominate the field, including management, so they always look at software and only software.

Moore's law is about transistor count, not performance

I'm well aware, which is why I specified a naive interpretation. Still, are you actually saying that transistor count increase in the range of a multiple of 1024 hasn't been matched by comparable CPU performance improvements during that time?

I had to roll my own transactional database for an embedded product working in a very constrained environment (2MHz ARM, 2MB total memory). I did a very extensive search and I found no product with the combination of characteristics I was looking for:

- working in very small but more importantly, constant memory (predictability is a must for reliable embedded app),

- provide provable transactional characteristics with guarantee to roll back to previous state in case transaction is interrupted and still be consistent and able to accept new transactions,

- minimum storage overhead -- the amount of overhead was directly translating to competitiveness of the device at the time as multiple vendors tried to provide their own solutions for the same preexisting platform

- storage write pattern that would provide wear leveling for the underlying flash

I ended up writing my own database (less than 1k LOC total, with encoding/decoding data using BER-TLV) that would meet all the characteristics, take a few tens of bytes of stack, and add a few bytes of overhead per write. The database would vacuum itself and coalesce record state automatically. It had some O(n^3) algorithms but THAT'S OK since the amount of data could never be so large that it could pose any problems.

The project took 2 years to complete. I spent maybe a month designing, implementing and perfecting the database. I wouldn't say that Akin's law of spacecraft design applies here. I would probably have spent more than that integrating an existing product, and ended up with an inferior product anyway.
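
The BER-TLV encoding mentioned above is simple enough to sketch. This is a simplified, hypothetical version (single-byte tags, BER short/long-form lengths); real BER-TLV also allows multi-byte tags and constructed objects.

```python
def tlv_encode(tag: int, value: bytes) -> bytes:
    """Encode a (tag, value) pair as a simplified BER-TLV object."""
    if len(value) < 0x80:
        length = bytes([len(value)])                  # short form
    else:
        n = (len(value).bit_length() + 7) // 8
        length = bytes([0x80 | n]) + len(value).to_bytes(n, "big")  # long form
    return bytes([tag]) + length + value

def tlv_decode(buf: bytes, pos: int = 0):
    """Decode one TLV object; return (tag, value, position after object)."""
    tag = buf[pos]; pos += 1
    l = buf[pos]; pos += 1
    if l & 0x80:                                      # long-form length
        n = l & 0x7F
        l = int.from_bytes(buf[pos:pos + n], "big"); pos += n
    return tag, buf[pos:pos + l], pos + l
```

Because the `pos` returned by `tlv_decode` points past the object just read, a flat buffer of concatenated TLV objects can be walked with a simple loop, which fits the constant-memory traversal described above.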

The title is misleading - it's actually about how and why they did end up writing their own db. From the article:

> There’s an old adage in software engineering — “never write your own database”. So why did the OneNote team go ahead and write one as part of upgrading OneNote’s local cache implementation?

What I am really objecting to is those hard "rules". "Never optimize early". "Never roll your own database", etc.

All those rules work for most but not all projects. It's the same as saying "You shall always obey traffic rules". Maybe I should, but sometimes I may not want to brake on a yellow light when I have a clearly impatient driver tailgating me.

As we gain experience we learn the world is not black and white. Akin's law #8 says:

"In nature, the optimum is almost always in the middle somewhere. Distrust assertions that the optimum is at an extreme point."

The article deliberately contradicts the title, it's literally stating the same thing you're saying but with a real world example (One Note) and why they needed to write their own.

I'm fine with those rules. Only once you understand why a rule exists (on a technical level) can you break it. Because anyone who knows all the edge cases, filesystem issues, contention issues, deadlocking and integrity guarantees with databases will search hard for an OOB solution that fits his requirements, and will consider rolling his own a last resort.

But your objection is rather silly!! Those rules are meant for beginners. If you're not a beginner you already know when/if/how you can break the rules. It is correct to teach kids that they should tell the truth in every scenario, even if as adults we don't.

Exactly. Fully understand why you should never do x, so that you fully understand when you actually have to do x.

values > principles > practices

You are free to violate any of those on the right in the interest of the ones to the left.

"It is correct to teach kids that they should tell the truth in every scenario, even if as adults we don't."

Not correct. Example, a stranger: "are you alone at home?" or "what is the keycode for the door?" etc.

I don't agree that these are counterexamples; there's a difference between telling the truth and telling the truth someone else wants to hear.

“I am not going to answer that” is a true, correct, and appropriate response to both of those questions.

But to a potentially hostile stranger it is better to lie than to tell the truth. More extreme examples come to mind, with an active aggressor wanting to know something, where it is smarter to fool the enemy than to tell the truth: "I don't know the code" instead of "I am not allowed to tell".

But besides, most kids learn to lie pretty well to circumvent various restrictions - learning from the adults who tell them not to lie, but do it themselves.

I am in favor of leading by example. If I am lying, how can I possibly demand that kids do otherwise? When they find out, they lose confidence in me. But when I tell them to lie only in extreme situations, to "enemies", and act accordingly, they are much more likely to become truthful by default as well.

I think it's still correct. Sorry, I don't find your examples to be a reason to teach children that truth is optional.

It would be much better to teach children that the truth doesn't exist, that everything we call "reality" is a lie on some level, and that the best you can do is to stay honest to yourself and your intentions. Probably much harder to teach, but much closer to what they will fight with for the rest of their lives. I think it's the lack of such preparation in childhood which makes people give up, either never trying to figure out what the "truth" even is or relying on an imaginary friend in the sky as a source of one and only truth.

Some simplifications are useful in some contexts, but the world is an unimaginably complex thing and trying to dumb it down can only take you so far.

"the truth doesn't exist, that everything we call "reality" is a lie on some level"

Wait so does this truth exist, or is it also a lie on some level?

> does this truth

It's not truth at all. Just an observation. It happens to fit with my perceptions. That's it.

You can artificially define truth by tying it to a particular frame of reference, but that's not "the truth", as that frame of reference is not 100% transferable to others anyway. The idea of umwelt, as I understand it, seems to work here. Still, it's just an observation, an impression.

I'm not really saying anything new here. Descartes was saying something similar quite a long time ago. Then again, he could have meant something else entirely and there's no way for me to know which is it. I just assume my understanding is close enough to the intended meaning. I may well be completely wrong on this.

Basically, there's nothing you can be really sure about, including the fact that you can't be really sure about anything. There are only things that appear to work for you and, possibly, others. You can use them. Just don't believe them unquestioningly.

Never said that. Default is truth. But there are rare exceptions.

If you didn't then I don't even know what you're saying, or why you're disagreeing.

Me: It is correct to teach kids that they should tell the truth in every scenario, even if as adults we don't.

You: Not correct. [...]

Exceptions exist in every situation for every single thing you say or do or think. Pointing out exceptions doesn't get you anywhere. Every adult knows this. A child's mind doesn't, and depending on the maturity level, cannot comprehend them. There is an established, successful method of instruction: start with simple rules/laws/examples, and then layer on complexity later. All of our systems of education are based on this. Nobody tells a child to also look up for a possible meteor or a skyscraper collapsing, or a vulture about to attack them, before crossing the street. We just teach them to look both ways. Programmers are taught to take for granted that a bit in memory retains its value once set. When you're learning how to program you don't need to account for CPU bugs or cosmic radiation.

"If you didn't then I don't even know what you're saying, or why you're disagreeing. Me: It is correct to teach kids that they should tell the truth in every scenario, even if as adults we don't."

"...teach children that truth is optional."

Optional means maybe. I said the default is truth. And when I say every, I mean every. But there are exceptions, in the case of enemies. So if they lie to someone, it means this one is a (temporary) enemy. Which is a serious implication. They do understand that, usually.

They also understand the concept of friend and enemy very early, I bet you agree. (Not that they can always correctly sort it out, but we adults can't always do that either.)

So they very intuitively understand the exception in the case of a meeting with a potentially dangerous person and, I bet, instinctively act accordingly and not tell him where others are hiding, for example. (Or break down and cry, also a valid strategy.)

Sorry, I don't agree with your comment. No point in going round and round. Good luck!

But it is much more likely for a kid to be approached by strangers who ask them if they are alone than to be hit by a meteor. So I say it does make sense to explain to them the concept of a complex world and different rules for different situations, as early as they can get into situations where they have to look out for themselves.

"Never roll your own authentication" seems to hold up pretty well though...

“Never use absolutes.” - Brightball’s Law

"Never use absolutes." is absolute in itself

That's the joke.

What a smart observation

Exactly. Exceptions prove the rule. But this applies to everything, not just databases. Never write your own OS, unless you are faced with an exception where you have to.

If you have a special case where an RDBMS can't fill your needs, then you obviously have to build your own. But these cases are so rare that they prove the rule.

It's "funny" how you (and many others too) didn't read the article and just commented based on the title.

It's as if RTFA was a real thing.

Is this something publicly available? I am currently hunting around for an embedded database which has amazingly similar requirements, and have found nothing. I started my own db, but i'd love to be able to take a look at what you ended up with, if possible.

It is proprietary, unfortunately, but you can pm me to discuss it.

Once you understand the problem it isn't really difficult to implement it. The trick is to decide what kind of properties you really need and only implement what is absolutely necessary to achieve it.

I did this in ANSI C using some of the stuff I had already implemented for the project. For example, I already had a BER-TLV parser/serializer with very specific properties (managing collections within a buffer of specified size, etc.), so I reused it for the file format and then again on the application layer for the record format.

The basic database is a KV store kept in the form of an append-only log file. The entries are records of modifications performed to the database. The keys and the values are binary and the structure is managed by the application. The application supplies callbacks to perform some operations (for example, given a base version of a record and a patch, calculate the patched version of the record).

The transactions were basically an identifier and a flag (is this entry the end of a transaction?).

All algorithms are very simple and focused on constant use of memory.

For example: the coalescing operation was basically reading the old file and writing live records to a new file. I would traverse the old file entirely for each record (O(n^3)) and, using callbacks supplied by the application, write the end result of all committed transactions to the new file. After this was done for all records I would switch the files.

Fetching any record meant doing a linear search through all database entries to find the entries related to the record, taking the base version and then successively applying all patches until the entire database file had been traversed. This is inefficient as hell, but the use case was that the database was very rarely actually read. The typical use was to just write data to the database; it was only accessed after a failure or in some very specific background tasks.
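
A toy version of the design described above might look like this (my own sketch, not the original code; an in-memory list stands in for the on-flash log file, and `apply_patch` is the application-supplied callback):

```python
class TxnLog:
    """Append-only log of (txn_id, key, patch, is_commit) entries.

    Reads and recovery replay only entries whose transaction reached its
    commit marker, so an interrupted transaction simply disappears.
    """
    def __init__(self, apply_patch):
        self.entries = []              # stands in for the on-flash log file
        self.apply_patch = apply_patch

    def append(self, txn_id, key, patch, is_commit=False):
        self.entries.append((txn_id, key, patch, is_commit))

    def committed_txns(self):
        return {t for (t, _, _, end) in self.entries if end}

    def get(self, key, base=None):
        """Linear scan: apply every committed patch for `key`, in order."""
        done = self.committed_txns()
        value = base
        for (t, k, patch, _) in self.entries:
            if k == key and t in done:
                value = self.apply_patch(value, patch)
        return value

    def coalesce(self, keys):
        """Rewrite live, fully-applied records to a fresh log ('vacuum')."""
        new = TxnLog(self.apply_patch)
        for k in keys:
            v = self.get(k)
            if v is not None:
                new.append(("compact", k), k, v, is_commit=True)
        return new
```

Note how crash safety falls out of the structure: an interrupted transaction never gets its commit flag, so `get` ignores its entries and the database stays consistent without any explicit rollback step.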

wonder if sqlite would work for something like that, if you can get away from sql, i.e., strings.

It can. Just make sure you prepare() your statements. This turns them into a compiled set of VM instructions.
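
In the C API this is `sqlite3_prepare_v2()`; from Python's stdlib `sqlite3` module you get the same effect with parameterized queries, which let SQLite compile the statement once and just rebind values (a small illustrative sketch):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v BLOB)")

# The SQL is compiled once into SQLite VM bytecode; each row only
# rebinds the ? parameters instead of re-parsing a SQL string.
conn.executemany("INSERT INTO kv (k, v) VALUES (?, ?)",
                 [("a", b"\x01"), ("b", b"\x02")])

row = conn.execute("SELECT v FROM kv WHERE k = ?", ("a",)).fetchone()
```

Binding parameters rather than splicing values into SQL strings is also what avoids per-call parsing overhead (and injection bugs) in the first place.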

The issue being sqlite did not even remotely meet the criteria:

- it requires an operating system,

- its memory footprint, even in the best case, is way larger than the total memory available on the system,

- can't work in constant amount of memory (no dynamic allocations after application startup, compile-time verifiable stack usage)

- can't provably continue from any power cycle -- there is no guaranteed automated recovery method in case of an unfortunate power cycle

- data storage overhead was unacceptable for the application

You are no doubt correct about points (2) and (5). But just to set the record straight, points (1), (3), and (4) are mistaken.

The standard build of SQLite uses an OS, but there is a compile-time option to omit the OS dependency. It then falls to the developer to implement about a dozen methods on an object that will read/write from whatever storage system is used by the device. People do this. We know it works. We once had a customer use SQLite as the filesystem on their tiny little machine.

Likewise, the use of malloc() is enabled by default but can be disabled at compile-time. Without malloc(), your application has to provide SQLite a chunk of memory to use at startup. But that is all the memory that SQLite will ever use, guaranteed. Internally, SQLite subdivides and allocates the big chunk of memory, but we have mathematical proof that this can be done without ever encountering a memory allocation error. (Details are too long for this reply, but are covered in the SQLite documentation.)

Finally, we do have proof that SQLite can continue after a power cycle - assuming certain semantics provided by the storage layer. Hence, the proof depends on your underlying hardware and those methods you write to access the hardware for (1) above. But assuming those all behave as advertised, SQLite is proof against data loss following an unexpected power cut. We have demonstrated this by both code analysis, and experimentally.

So probably you were correct to write your own database in this case. My point is that SQLite did not miss your requirements by quite as big a margin as you suppose. If you had had a bigger hardware budget (SQLite needs about 0.5MB of code space and a similar amount of RAM, though the more RAM you feed it the faster it runs) then you might have been able to save yourself about two years of development effort.

Thank you for your explanation.

SQLite is fantastic product, it is just not aimed at extremely constrained platform like the one I was working on.

The device was a platform the company had already invested in a lot (a few thousand units, committed to buy another few tens of thousands) and not cheap (a few hundred dollars per unit). The application was a bid to extend the lifetime of that hardware by cramming new features into the existing platform. In the end, after considering many products, it was clear to us that every byte saved was a byte available for other features, and memory was our limiting factor. So it was not even a question whether we could fit a particular database but how much space we would be left with for the really important features.

I don't remember object (*.o) sizes, but the entire database was 1k LOC of ANSI C while the entire application was about 70k LOC of heavily optimized code. The memory requirements were a few tens of bytes of stack (not really important, as other parts of the application used more stack) and a hundred bytes of statically allocated memory. It even came down to dumbing down algorithms to keep object sizes small. I learned a lot on that project.

I admit I did not do much research on the provable characteristics of SQLite back then (a decade ago) once it was clear it could never fit our application. The research was mostly aimed at proving we needed our own database. The management did not agree ("Never roll your own database...") so I just ended up doing it as a skunk works project. It worked, the product is still in use, and it has never failed a single transaction (out of tens of billions processed).

I may even write my own blog post in the spirit of the one in the title of this thread; it just never occurred to me that it would be interesting to the general public.

sounds cool, I would be interested in a blog post. databases for highly constrained systems sounds interesting.

That's a great and informative answer; my only minor comment is that I think the OP said the project was 2 years but the database portion was 1 month of that 2 years.

Still I agree better not to roll your own stuff unless it is absolutely critical.

Possibly relevant are these 2 great papers co-authored by Michael Stonebraker[1]:

“One Size Fits All”: An Idea Whose Time Has Come and Gone[2]

The End of an Architectural Era (It’s Time for a Complete Rewrite)[3]

They basically show that classical RDBMSes are inefficient by a factor of at least 10 in every conceivable application. I tend to trade a little of that away for the kinds of dramatically simpler mechanisms discussed in TFA.

[1] https://en.wikipedia.org/wiki/Michael_Stonebraker

[2] https://cs.brown.edu/~ugur/fits_all.pdf

[3] http://cs-www.cs.yale.edu/homes/dna/papers/vldb07hstore.pdf

Do they identify any cases for which classical RDBMS are inefficient by a super-constant factor?

ACID semantics subtly and indirectly require locks. Any scenario where lock contention might be a problem would be a problem for an RDBMS (assuming the traditional definition of an RDBMS as ACID compliant). Those scenarios can easily be “super-constant”, especially if you have to distribute across multiple nodes. That requires locks that are held for the duration of an “external call” over the network, which is A Bad Idea and could definitely be “super-constant”.

"Never write/roll your own" is the counterpoint to the NIH "Not Invented Here" syndrome... however, the older I get, the more reasonable NIH cases I see, such as the one in the article. People will say, for example, "why would you write your own ORM when there are so many available for free? Why reinvent the wheel?" Yet when I start looking at the wheels that are available, they take up a lot of space, a lot of time, take quite a while to learn (while the boss is asking me "is it done yet? is it done yet?"), have their own bugs, and need to be regularly upgraded and maintained against. This isn't necessarily a flaw - these "wheels" (ORMs, file storage solutions, web application frameworks and, dare I say it? security solutions!) are designed to solve generic problems rather than the specific targeted problem that I'm focused on.

If your problem is that your boss is asking "is it done yet?" writing your own new ORM does not seem like the most productive antidote to learning an existing hard-to-understand ORM....

Not in my experience. If I'm coding, and I can demonstrate that I'm checking stuff into Git (or SVN or CVS or Clearcase), and I have something to demo every couple of days, they're happy, even if there's theoretically a faster way to get from point A to point B. On the other hand, if my only progress (for a whole day!) is "researching" some software, they go into full-blown panic mode. I've found this to be absolutely consistent over a career of now 25 years and 10 different employers.

It might be easier to sell to your boss when there is measurable output from your work.

Over time I have gone back and forth on this. When I was much younger I liked to write my own versions of libraries and such to have more control.

I got older and realized I now had to maintain 8 different projects, including my in house versions of things. Giving up control of a project to another person made me realize I had to eventually trust someone else to implement things correctly.

Nowadays I am back to implementing my own solutions since I am more worried about efficiency. It would be nice if the compiler could inline the library implementation WELL (it usually sucks at this with static linking, even with -ffunction-sections, etc.). I think there is a need for a language-level facility for controlling what aspects of a third-party library wind up in your final binary. Or more modular types of libraries.

I just wish design decisions made by others weren't so baked in, like datatypes used or tag bits and what not.

The other dimension rarely mentioned in these ruminations is that in the NIH scenario, you're robbing your less senior team members of an opportunity to build transferable skills.

I'm not sure whether that doesn't rank very high for people, doesn't even occur to them to be a problem, or they just don't give a shit about anybody else.

Taking a long time to learn is, in my experience, becoming less common. Library authors are more aware of ergonomics than they were, say, 10 years ago. For every huge ORM, there's an intentionally small alternative built out of frustration.

Having a variety of high-quality, widely-used libraries to choose from is vital, though. That's why I don't use brand new stacks at work, even if they are "better" in some way.

I would love to write my own version of many things. I am confident that my version would be less bloated because it would be focused on my company’s particular use case (i.e., the MVC framework doesn’t need a table of non-standard HTTP methods unless we’re actually using them, and the ORM doesn’t need to be compatible with Oracle 11g DBs unless we’re using those). It would also have less surface area for vulnerabilities. It also wouldn’t get random updates that break backward compatibility in subtle ways. But I’ll never do it, because it’s more code we have to maintain in-house. Companies naturally accrete huge, overburdened tech stacks, and anything I can do to cut back or slow the progress of that effect is a win.

Yeah but the stuff you roll yourself grows over time, has its own bugs, and gets retargeted for more projects over time.

Sure, sometimes. But most software developed in-house dies fairly quickly (<1 yr) and instead is rewritten with new requirements versus retooling old stuff. Lots of software dies on the vine or becomes legacy after 6 months.

I don't know where that's true.

If there are use cases even now where a roll-your-own solution is best, imagine the 1980s.

My first programming roles were in a mainframe shop with a major defense contractor, which had some brilliant mainframe system programmers over the years. Their major unclassified systems (payroll, shop order control, etc.) were internetworked with a homegrown real-time system, and they all had a home-grown disk access system (random and sequential) that was surreal in its speed and reliability, all coded in 370 assembler. On the business applications side, they had a thorough API that was callable from even COBOL programs.

By the mid-1980s, upper management decided they had to "standardize" and began developing replacement systems using IBM's IMS. Performance was unusably bad. I left around that time to join the UNIX RDBMS world, so I don't know if they ever found a solution that could actually be rolled out.

Working with the older stuff was actually fun because things just worked, and the customers of our legacy systems were really happy.

The major cost factor in software development is developer wages.

It’s all about how you spend the time you have. Don’t build stuff that you can get off shelf. You’re not going to write a better database, even if you think you can. You may get great performance for the current system on day one (after spending a huge amount of time and effort on developing it, perhaps with zero value created) but over the lifetime of the database you’ll incur huge costs that you probably can’t even fully foresee. The details here are not clear enough, but working from first principles (“I want to build an Evernote-like app”) I can’t imagine an experienced developer suggesting you should write a database (cache/file system).

The best software development is mundane: glue together what’s there, buy the resources you need to get sufficient performance. Switch a component if you need something to be faster, re-architect key parts, and buy the new components off the shelf too. It’s fun to build new stuff with custom algorithms you work out, but instead of that you can go home at 5pm, try building a database in your spare time and learn why you shouldn’t, and still have space to relax.

As Sam Altman wrote recently, the productivity tip that most people are astoundingly ignorant of is: choose what you do carefully.

It’s telling that this article does not begin: “we had a bottleneck that could not be solved.”

The story people should take away from the article isn't "Microsoft wrote their own database for their note-taking app, so should I for mine!", it's that "Microsoft launched and made wildly popular a note-taking app using standard tools, then learned their problem space including subtle requirements about the deployment environment, and THEN wrote their own database to solve real pain-points they had in practice."

On the flip side - building just enough software to solve the problem at hand can sometimes run circles around assembling things using off the shelf components. This is especially true if something like speed / memory / correctness / optimality is important.

Could you give a concrete example?

a. High frequency trading

b. Embedded software

c. High performance computing

d. Robotics

e. Military / Aviation software

In each of the above fields there are categories of software where you can run circles around general solutions by writing something custom and yes some of those involve writing your own database.

Also just about any software that handles business workflow. A common tradeoff in development is flexibility vs. efficiency. For example, in a database, if you can define columns for properties you know about, you will have a much faster database than if you created a bunch of "Thing" tables with "ThingProperty" relationships.


Storing it doesn't take too much of a noticeable hit, but any queries on "Thing" tables get expensive real fast.
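That cost is easy to demonstrate with a toy sketch (illustrative table and column names, SQLite in memory):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Dedicated column: one table, the attribute is just a typed, indexable column.
cur.execute("CREATE TABLE widget (id INTEGER PRIMARY KEY, color TEXT)")
cur.execute("INSERT INTO widget VALUES (1, 'red'), (2, 'blue')")

# EAV: generic Thing / ThingProperty tables; every attribute read is a join.
cur.execute("CREATE TABLE thing (id INTEGER PRIMARY KEY)")
cur.execute("CREATE TABLE thing_property (thing_id INTEGER, name TEXT, value TEXT)")
cur.execute("INSERT INTO thing VALUES (1), (2)")
cur.execute("INSERT INTO thing_property VALUES "
            "(1, 'color', 'red'), (2, 'color', 'blue')")

# Same question, two shapes. The EAV version needs a join, and a
# self-join per additional attribute you filter on; that is where
# the query cost shows up.
direct = cur.execute("SELECT id FROM widget WHERE color = 'red'").fetchall()
eav = cur.execute(
    "SELECT t.id FROM thing t "
    "JOIN thing_property p ON p.thing_id = t.id "
    "WHERE p.name = 'color' AND p.value = 'red'").fetchall()
print(direct, eav)  # [(1,)] [(1,)]
```

Both return the same row, but on the EAV side the planner is joining untyped strings, and filtering on N properties means N joins against the same property table.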

I’d take exception to embedded software and robotics, where reuse and safety usually take precedence over performance, although I’d agree that a minority of such applications may be good examples.

Apart from that, I agree with your list but I’d consider them to be very specialised domains that are not like the vast majority of software development, especially building conventional apps like OneNote (a document database).

and f. databases.

ha, just kidding.

As someone who works professionally on databases this made me chuckle.

I work on database software too, so I know it should be there. plus fuck databases, they are such a pain :-)

Thoroughly agree. There are very rare times where you need to write your own low level, high complexity system for a high level application, but ... they’re really, really rare nowadays. Almost always it’s developers practicing resume driven development.

I’ve never worked with anyone who tried to implement their own database, but I have worked with people who implemented their own network protocols, JS SPA frameworks and service discovery layers, and they were all really bad ideas.

It gets even worse when said person leaves the company, their system is still critically important and full of bugs, and has the inevitable little-to-no documentation. Other devs have to maintain this mess, and nobody truly understands it.

I have written my own database. In 3 months. Shipped it and installed in customers' sites, and it's been working great for many years now. So do write your own database, but know exactly how kernel, libc, language libraries, sockets work :-)

Alternatively, write your own kernel!

A coworker did but for an embedded device. It worked well but we did spend a lot of time debugging the networking.

Often such things "work" technically, but if you leave the project, some poor schmuck has to figure out what you did. It may be obvious to you, but not to others. Remember that maintenance is a bigger cost than development on average. Write systems for other developers, not for yourself.

Well it works and the poor schmuck is the person I hired and trained and they are still there. The db is still being sold and making money. Since code will be read more times than be written, I try to write it so that a junior version of me from 10 years ago could understand it, after waking up in the morning before the 1st cup of coffee :-)

The best advice is really to keep it simple. Build the system out of small, isolated units with clear APIs between them. The components should be isolated both at runtime, so they can be restarted or fail separately, and in the code - separate modules / files / libraries. That way it's much easier to understand how everything fits together.

The simple part wasn't just a platitude or a generalizing statement. A db is really a beast that can easily turn into a giant ball of spaghetti with a ton of features, settings and tweaks and never-ending list of bugs. A lot of work and thinking I did on it (including a rewrite after an initial prototype) was with the goal of making it simple by cutting unnecessary features.

It seems to me that Microsoft frequently writes new embedded databases, leaving them poorly documented and accessible from weird tools. For example, the registry, Windows installer, structured storage.

Structured storage isn't embedded database tech. It's actually FAT (or close enough) inside a file, with a bunch of helper COM objects.

There are two old commonly used programs that have their own database - BIND and Sendmail. Both have problems because their databases are crappy.

Can you elaborate on the problems that stem from their crappy databases?

Doesn't OpenLDAP implement its own database, or use bdb? I remember it corrupting itself if the server lost power.

OpenLDAP uses LMDB now, one of the fastest and most reliable key/value stores around, which they wrote precisely because berkeleydb sucked.

Also because Oracle was kind of a dick (as usual) with the license after BDB 6.0.12.

A lot of projects have switched away from BDB for that reason; others still rely on BDB 5.3.

Also, I kind of agree that BDB is a bit of a nightmare to work with.

On this topic - we tried, really really tried, not to write our own database. We've used BerkeleyDB in OpenLDAP since around 2000. We'd sunk a ton of energy into learning how to use it optimally. And it was ~2 years, from when we started looking for better approaches in 2009, until we finally bit the bullet and started writing LMDB in 2011. Frankly I think "never roll your own" is good advice in 98% of cases, but if after you've examined all your options, you don't find what you're looking for, then there's no other choice.

This was early 2016 from the centos 7 repos, so this must be relatively recent. It didn't even have to be writing to it. I just had to reboot and it was left in a dirty state.

CentOS repos often don't use current software, so not necessarily.

The looser your requirements are, the easier it is to write your own so-called database. Nobody should be writing a full ACID/SQL/whatever database as part of another project. On the other hand, if you want to implement a "database" in the same sense that many implement a "filesystem" which is just an HTTP object layer on top of a real filesystem, knock yourself out. Just don't think you're doing the same thing as the people writing more featureful general-purpose databases.

If you're willing to accept that durability is a lie, and also to compromise (possibly a lot) on performance and trust that the CPU and RAM aren't actively helping the disk to sabotage you, you can make a fairly full-featured ACI database in a weekend using Merkle trees and a single-threaded rendezvous serializer. The hard part is getting those guarantees while also having latency and throughput vaguely comparable to naive file writes. (I'd add something about a better API than assembly language, but database people decided to imitate COBOL, so I can't really criticise assembly with a straight face.)

Log structured merge is the last piece of your architecture. It solves the latency part. It can also solve your durability issues, if you fsync when necessary and you understand your hardware.

Yeah, sorry, I assumed log/journal plus reapply-on-recovery went without saying. You can't have guaranteed durability (short of blockchain-style distributed stores) because it's always possible for your database server to suffer an unrecoverable hardware failure between transmitting "transaction success" and taping out its weekly backup. Failing all else, your datacenter catching fire and burning down would suffice. So it's only ever a matter of reducing (not eliminating) the probability of having to roll back committed transactions. (TL;DR: fsync can't help if your hard drive melted.)
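A minimal sketch of that journal-plus-replay idea (hypothetical file name and record format; real systems add checksums, torn-write detection, and log compaction):

```python
import json
import os

LOG = "wal.log"  # hypothetical log file name

# Start fresh for the demo.
if os.path.exists(LOG):
    os.remove(LOG)

def commit(op):
    """Append one operation and force it to disk before acknowledging."""
    with open(LOG, "a") as f:
        f.write(json.dumps(op) + "\n")
        f.flush()
        os.fsync(f.fileno())  # durable-ish: still can't survive the disk melting

def recover():
    """Rebuild in-memory state by replaying the log from the beginning."""
    state = {}
    if os.path.exists(LOG):
        with open(LOG) as f:
            for line in f:
                op = json.loads(line)
                state[op["k"]] = op["v"]
    return state

commit({"k": "a", "v": 1})
commit({"k": "a", "v": 2})
print(recover())  # {'a': 2}
```

The point of the sketch is only the contract: nothing is acknowledged before the fsync returns, and after a crash the state is whatever replaying the log produces.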

Oh, I assumed we were talking distributed. Fsync to more than one disk. My biases :).

Fsync to more than one disk is a great way to reduce the probability of forced rollback, but simultaneous failure is still possible.

Then again, ACI does assume you aren't having CPU/memory/etc failures, so to the extent that you count anything as guaranteed, it is probably possible to 'guarantee' durability. My original point was more that "minimally adequate" isn't that hard compared to going from there to a high performance, general purpose system (and also an offhand snipe at SQL).

I wrote my own database once. It’s an “event sourcing” database, but I wrote it way before event sourcing was cool. It can handle over 1M events/sec on one core. I don’t know any off-the-shelf product that could replace it.

I also helped write a log-structured OLTP-ish database that sits in front of MySQL.

I have only two regrets about these systems. First, MySQL was, in retrospect, the wrong backing store. Second, I used Thrift serialization. Thrift is not so great.

Did using MySQL allow you to ship a product on-time and under budget? Or, to put this the other way, if you had used The Right Backing Store would you still have delivered the product in a timely manner at the right price-point?

Unclear. I think the right backing store these days would be PostgreSQL. At the time, both were equally easy to get started with. PostgreSQL didn’t have fully serializable transactions yet (IIRC), and I was more familiar with MySQL. OTOH, MySQL has sucked in myriad ways. Replication is very easy to screw up. The query optimizer and SQL advanced feature support is not so good. In general, there have been a lot of rough edges.

There's also the (anti)pattern of making your own database in a lesser sense - implementing a set of tables on a SQL database for generic data. I think this is called the entity-attribute-value pattern. My impression is this is often considered deplorable, yet practically every application built on a database that I've encountered at work uses it to some extent.

As soon as you allow some kind of user-defined meta data, you basically end up with the entity-attribute-value pattern.

Every sufficiently advanced system does that, be it a CRM that allows custom fields, or a project management or ticket system, a workflow system etc. And basically all enterprise software contains elements of these systems.

I always felt like the separation between DDL and DML in standard databases might be fundamentally contributing to that though. I'm not smart enough to come up with a better paradigm. But sometimes it seems like the most basic principles of how SQL databases work are at war with how they are used.

A good alternative would be to use JSON types in an RDBMS in combination with classic field types. EAV is really just free-form data, and JSON is easier to query in MySQL or Postgres.
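A hedged sketch of that hybrid approach (illustrative schema; SQLite's json_extract shown here, Postgres would use its -> / ->> operators instead):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Known attributes stay as typed, indexable columns; free-form
# user-defined metadata goes into a JSON column instead of an
# EAV side table.
cur.execute("CREATE TABLE ticket (id INTEGER PRIMARY KEY, status TEXT, meta TEXT)")
cur.execute("""INSERT INTO ticket VALUES
    (1, 'open',   '{"severity": "high", "customer": "acme"}'),
    (2, 'closed', '{"severity": "low"}')""")

# Filter on a typed column and inside the JSON in one query - no joins.
rows = cur.execute(
    "SELECT id FROM ticket "
    "WHERE status = 'open' AND json_extract(meta, '$.severity') = 'high'"
).fetchall()
print(rows)  # [(1,)]
```

Compared with EAV, filtering on several custom fields is still a single table scan (or an expression-index lookup) rather than a join per attribute.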

Yes, but that's a fairly recent development, often without good ORM support, and I don't know how cross-database compatible the support is.

For a new application that only supports one database, it's probably a better choice.

In some cases, you can also expose the database schema to the user in some fashion through a custom-made ORM.

The result is also a nightmare: it makes for really difficult upgrades, and a complex product for the customer, as they have to have some database knowledge and can shoot themselves in the foot quite easily.

I still don't know which approach is the worst between EAV and exposing the data schema...

Aren't you describing GraphQL Here?

To me it sounds more like the way Salesforce allows you to add custom fields to objects.

"As alluded to above, this does introduce a separate challenge with multi-file consistency. The team took an interesting approach here. The index file points to the location of the root of the page graph and free space map in each page cache file. When persisting a multi-file transaction, all page cache files are flushed to disk before the index file."

That sounds familiar: https://pthree.org/2012/12/14/zfs-administration-part-ix-cop...

I used to work at a company where the "database" was literally a raw dump of the in-memory array of structs defined in a C program. Had some fun with that because struct padding depended on compiler flags, so different varieties of the program that were built with different compiler flags could not read each other's "database".

I never got over it, really. On the plus side, it was super fast, that I cannot deny. But it always seemed super gross to me.

Lots of software used to do this, back when computers were really slow and low on RAM. The old MS Office, Adobe Photoshop formats are notorious for it.

To be fair, the software had been first written in the early 1990s, so I can imagine it was a rational decision back then.

I was just always amazed that there never were any problems with corrupt files.
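The padding trap is easy to reproduce. In Python's struct module, native alignment ('@') inserts padding between members the way a C compiler does, while an explicitly packed format does not, so two builds that disagree on alignment produce incompatible on-disk layouts:

```python
import struct

# A record roughly like: struct { char tag; double value; int count; }
native = struct.calcsize("@cdi")  # native alignment: compiler-style padding
packed = struct.calcsize("<cdi")  # packed little-endian: no padding at all

print(native, packed)  # e.g. 20 vs 13 on x86-64
```

Raw bytes dumped with one of these layouts simply cannot be read back with the other; every field after the first padded member lands at the wrong offset, which is exactly what happens when two builds of the same C program use different packing flags.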

I was not expecting this article to be about OneNote, but I'm glad that it is. OneNote is a surprisingly good product, but it takes so excruciatingly long to sync what must be 10KB of text. I look forward to what these changes enable.

If you’re writing your own database and you are serious about making it a business, reach out to me via my profile here. Ive been there done that, happy to help.

We wrote our own database 15 years ago - needed something embeddable into the application on both windows and linux, i386 and alpha, and the ability to do transactional schema modifications without global locking. If I recall correctly, postgresql was at 7.3 at the time and would happily dump core if you tried something like 'drop table x;rollback'.

And we didn't need full sql semantics, because we partially implemented it into our scripting language anyway and used only cursor APIs. postgresql's mvcc was still a great inspiration for how to design the actual row storage.

Served us well, but 15 years later, a bit too many custom features make it hard to switch to a standard database.

If everybody never writes a database, who does it?

Domain experts for whom the database is the main project rather than a tacked on requirement.

The same people who write cryptography software.

I wrote a database for my app and it was the best decision we made.

Without a comparison of how good the other decisions were, this means nothing :)

that comment made my day xD

The advantage of rolling your own is that you get to optimize and control things at a level that is just not possible when working through a complex external component

Then you stick it on a filesystem that does not work the way you think it does, or even worse, you stick it on a remote filesystem (e.g. CIFS/SMB) that does not work the way you think it does and is now backed by an FS that also does not work the way you think it does.

Your DB may work fine but make sure you onboard the lessons that all the others have learnt through bitter experience.

OneNote is fantastic. Of all the Microsoft things I have installed on my Mac, it’s light years ahead of the rest in terms of stability, usability, speed. It doesn’t get enough recognition!

TL;DR: Don't build your own database -- here's how and why we (Microsoft OneNote) did anyhow and it worked out really well for us.

Did it? Last time I tried OneNote it was a buggy piece of crap, couldn't even finish the import (on Win 7). So IDK, databases?

I'd say that since this article was written well after Win7 that it's in response to your experience and other experiences like yours.

I tried the import recently.

At SunSed we have created our own DB system for the exact same reason:

WordPress average page generation: 1 sec

SunSed CMS: 0.03 sec

The database is almost always the reason for slow applications.

Edit: if you do caching right, WordPress becomes as fast.

It took me a long time to feel comfortable with the idea of building our own database. For Stream it worked out well though: https://stackshare.io/stream/stream-and-go-news-feeds-for-ov... Tools like RocksDB and Raft give you really flexible building blocks.

The thing with rules is that you can't know when it's right to break them if you don't know them.

We would have never had NoSQL ...

Yeah you would. IBM was doing it in the 1960s pre-SQL.

This article is timely. I have just written a database to fill a gaping hole in the ecosystem. It even understands SQL.

Well, it depends on the case. I needed fast IP-to-country resolution, a really fast one. I wrote my own database in C for it and it was 120 times (yes!) faster than a fully optimized Postgres table, not to mention the memory, disk and CPU footprint.

A custom-made database for a specific problem can be a lot faster than a generic one. But it depends on your knowledge.

That's just a lookup table, not a database.

At least to be fair he is comparing with Postgres.

I'm still unsure whether a database would benefit me, or if keeping everything flat is easier for ~50GB on a single machine. I'm leaning towards yes, as does the article, but my log files do not need much maintenance so I do not make persistent transformations.

"A database is an organized collection of data."

By definition. First sentence in wiki. I don't know what are your definitions.

I could be misinterpreting but I believe a lookup table is more of a suggestion than anything.

It would be simpler than, for instance, a hash table; and in networking a deterministic solution is more common.

I'm obviously not a systems programmer though so I can't provide much else.

If you want to call a lookup table a database I'm ok with that.

But that means I'm writing my own databases at least a couple times a year. I'm not sure I'm okay with that.

yeah, just a hash table would suffice

It was actually simpler: I packed the from and to of each range into a uint64, added a data offset, ordered them at generation of the db file, and searched using bisection. I used the fact that an IP is just a uint32, so a range fits into one register on a 64-bit machine. Fixed-size ranges (+data) let me calculate a record's position in the file directly, so bisection was simple.

I'm pretty sure the mantra is "Never Write Your Own Crypto", not "never write your own database".

I know tons of people who roll their own flat-file storage engine and are perfectly happy with it, even scaling up to their moderate couple of tens-of-thousands of users. Nothing fancy.

Personally, I had to write my own database (just like what the author of the article wound up doing), and had a delightful time learning all sorts of things, and now it is one of the most popular databases out there (https://github.com/amark/gun) and I encourage others to try (if they have time) building one themselves!

The mantra tends to be “just use SQL” or “just use SQL, unless...”. Similar to the advice “just use an int, unless...”.

The wisdom in that is usually pretty sage. Reach for a SQL database unless there are reasonable and concrete reasons not to. Then, evaluate whether there’s a need to roll your own, or if there’s something you can grab right off the shelf.

I’d argue there are plenty of perfectly valid reasons to roll your own, and it’s not particularly difficult if you build on an already-proven storage foundation (file system, LMDB, etc.)
