
Architecture of a Database System - adamnemecek
http://blog.acolyer.org/2015/01/20/architecture-of-a-database-system/
======
jandrewrogers
Due to the number of years required to design and implement a solid database
architecture, database design principles in current systems always tend to be
a bit different than what you would build if you were starting today (whenever
"today" actually is). Database designs are always fighting the last war, based
on assumptions about resource balance, hardware behaviors, and system
architectures that may not be strictly true anymore.

Some related comments, with respect to the article:

\- Modern database designs run one process per physical core, regardless of
how many sessions, queries, connections, etc. there are. Each core owns the
part of the data it works with. This has several advantages on modern
computing architectures. It also makes for a pretty elegant implementation.

\- Related to the first point, new database kernels are increasingly "shared
nothing" even within a single server. As in, physical resources will be cut up
between cores / processes and rarely shared. It is like running a cluster
within a machine. Again, this has some significant performance advantages on
modern machines.

\- The key advantage of SSDs, which was not true for spinning disk, is that
you can usually guarantee effective disk I/O bandwidth is always significantly
greater than the full-duplex network bandwidth to the server no matter what
the workload. This has an interesting technical implication: if the I/O
scheduler is correctly designed and implemented, an in-memory database engine
should never be faster than a disk-backed database. In-memory was only an
optimization that made sense for spinning disk; with SSD, if you aren't
compute-bound, you can always saturate the network (and if you are
compute-bound, in-memory does not help).
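
To make the first two points concrete, here is a minimal sketch of the "each core owns its data" idea, assuming a simple hash partitioning scheme. The names `Shard`, `Router`, and `owner_of` are hypothetical and chosen for illustration only; a real engine would run each shard as a process or thread pinned to a core and pass requests as messages rather than make direct method calls.

```python
import zlib


class Shard:
    """State owned by exactly one core; nothing in here is shared."""

    def __init__(self, shard_id):
        self.shard_id = shard_id
        self.rows = {}  # private to this shard, so no locks are needed

    def handle(self, op, key, value=None):
        # In a real system this would be the body of the core's event loop.
        if op == "put":
            self.rows[key] = value
            return None
        if op == "get":
            return self.rows.get(key)
        raise ValueError(f"unknown op: {op}")


def owner_of(key, n_shards):
    """Map a key to the shard that owns it, using a stable hash (assumption)."""
    return zlib.crc32(key.encode("utf-8")) % n_shards


class Router:
    """Forwards each request to the single shard that owns the key."""

    def __init__(self, n_shards):
        self.shards = [Shard(i) for i in range(n_shards)]

    def submit(self, op, key, value=None):
        shard = self.shards[owner_of(key, len(self.shards))]
        return shard.handle(op, key, value)
```

Because every key has exactly one owner, the number of shards is fixed by the core count rather than by the number of sessions or connections, which is the "cluster within a machine" shape described above.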

~~~
jamii
Some of the more exciting ideas I've seen recently:

[http://arxiv.org/abs/1310.3314](http://arxiv.org/abs/1310.3314) \-
Understanding the core difficulty in answering relational queries and examples
of problems for which current query optimisers _always_ produce plans which
are asymptotically suboptimal

[http://arxiv.org/abs/1404.0703](http://arxiv.org/abs/1404.0703) \- The first
theoretical analysis which can relate the choice of indexes to worst-case
bounds. Presents a single join algorithm which is asymptotically optimal on
_every_ problem without even using cardinality estimates.

[http://arxiv.org/abs/1210.0481](http://arxiv.org/abs/1210.0481) \- A join
algorithm that meets some of the bounds of the above paper and is also fast in
practice and can be incrementally maintained.

[https://infosys.uni-saarland.de/projects/octopusdb.php](https://infosys.uni-saarland.de/projects/octopusdb.php)
\- Treating index creation, query optimising, view materialisation,
incremental maintenance etc as one large optimisation problem.

[http://www.vldb.org/pvldb/vol4/p539-neumann.pdf](http://www.vldb.org/pvldb/vol4/p539-neumann.pdf)
\- Compiling query plans to LLVM because the query plans / indexes are good
enough to make the plan CPU-bound instead of memory-bound

[http://hyper-db.de/HyperTechReport.pdf](http://hyper-db.de/HyperTechReport.pdf)
\- Running OLAP and OLTP workloads on the same database without interference

It seems possible that in the future, far from having a Cambrian explosion of
specialised databases, we will be able to store everything in a single db and
treat questions of data layout, partitioning, indexing etc as a direct
optimisation problem.
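
For readers new to the join papers linked above, here is a toy sketch of the attribute-at-a-time style of join they study, applied to the triangle query Q(a, b, c) = R(a, b), S(b, c), T(a, c). It is an illustration under stated assumptions, not the algorithm from any particular paper: the function names `index` and `triangles` are made up here, relations are plain lists of pairs, and Python set intersection stands in for the carefully bounded intersections the real algorithms rely on.

```python
from collections import defaultdict


def index(pairs):
    """Map each left value to the set of right values it is paired with."""
    idx = defaultdict(set)
    for x, y in pairs:
        idx[x].add(y)
    return idx


def triangles(R, S, T):
    """Yield (a, b, c) with (a, b) in R, (b, c) in S and (a, c) in T.

    Instead of joining two relations and then filtering with the third,
    each variable is bound in turn by intersecting every relation that
    mentions it.
    """
    r, s, t = index(R), index(S), index(T)
    for a in r.keys() & t.keys():      # a must appear in both R and T
        for b in s.keys() & r[a]:      # b must follow a in R and start a pair in S
            for c in s[b] & t[a]:      # c must follow b in S and a in T
                yield (a, b, c)
```

A pairwise plan can materialise an intermediate result much larger than the final answer on inputs like this; binding one variable at a time avoids building that intermediate at all.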

~~~
adamnemecek
Those look awesome, thanks for posting them. Quick question, how do you learn
about all these papers? They all seem very recent.

~~~
jamii
I'm working on join algorithms at the moment, so I spent the last few months
getting up to speed on the latest research.

The rest is just from general interest. I spend around ten hours a week
reading papers or textbooks. Whenever I find something really mind-blowing I
follow up citations, set up Google Scholar alerts for the authors, subscribe
to their RSS feed, etc.

A lot of my favourite papers are linked on the OP site - it looks like a good
place to start.

------
dman
Great post.

