
Notes on the Amazon Aurora Paper - aptxkid
https://blog.the-pans.com/amazon-aurora/
======
matharmin
Separating the storage layer from the database and query layer seems to be the
future of databases - there's no reason for every database to re-implement the
storage and replication layer just to provide a different query language or
data model. This is also what makes FoundationDB so attractive for me. Are
there any other large projects doing this?

~~~
jandrewrogers
People often greatly underestimate the performance loss incurred by separating
the storage layer from the execution layer. The difference is an integer
factor. In a high-performance database design, the execution scheduling
behavior _must_ be tightly and intricately coupled to the I/O operation
scheduling behavior, or you lose many low-hanging throughput optimizations.

That said, this can be solved by changing the level of abstraction. If you
tightly couple the execution and storage engines into a single module that is
not intrinsically tied to a particular data model, it _would_ allow
high-performance multi-use within reasonable constraints, albeit with a very
different API than a storage engine. But that is not a thing that exists in
open source AFAIK.

The loss from separating storage and execution may not matter for databases
where operational efficiency is not paramount (e.g. because the scale is too
small for it to matter), but many database engine designers are not going to
accept that use-case-limiting performance hit, because the CapEx/OpEx
implications are large.

~~~
ralusek
I think the point is that now that we're in the era of
distributed/cloud/partitioned databases, the query logic cannot necessarily be
colocated with the storage, so we might as well get the advantages of
decoupling.

The performance gains at the distributed level seem to rely on either
parallelized compute (à la MapReduce) or over-indexing (à la
Elasticsearch/GIN) rather than the sort of efficiency gains that were once
made by optimizing across CPU/cache/memory/disk. I think modern problems in
distributed databases are about keeping time complexity from exploding as the
size of the data does.

~~~
jandrewrogers
The query logic/execution is pushed down to the storage in most reasonable
distributed database architectures, along with enough metadata that the
individual storage engines are fully aware of (their piece of) the execution
plan and orchestrate it directly with other nodes, e.g. for joins. The reason
for local storage is that this will be almost completely bandwidth-bound. When
you look at growing markets like sensor analytics for edge computing this is
the only model that works at scale. Most new parallel databases aren't using a
classic MapReduce model or really anything that requires centralized
orchestration.
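
A toy sketch of the pushdown idea (illustrative Python, not any real engine's
code): each storage node evaluates the predicate locally, so only matching
rows ever cross the network.

```python
# Toy illustration of predicate pushdown (hypothetical, not any real engine).
# Each "storage node" holds a shard of rows; with pushdown, the filter runs
# on the node, so only matching rows cross the (simulated) network.

SHARDS = [
    [{"id": 1, "temp": 10}, {"id": 2, "temp": 95}],
    [{"id": 3, "temp": 97}, {"id": 4, "temp": 20}],
]

def scan_without_pushdown(predicate):
    # Coordinator pulls every row over the network, then filters centrally.
    transferred = [row for shard in SHARDS for row in shard]
    return len(transferred), [r for r in transferred if predicate(r)]

def scan_with_pushdown(predicate):
    # Each node filters locally; only matches are transferred.
    transferred = [r for shard in SHARDS for r in shard if predicate(r)]
    return len(transferred), transferred

hot = lambda r: r["temp"] > 90
sent_all, result_a = scan_without_pushdown(hot)
sent_pushed, result_b = scan_with_pushdown(hot)
assert result_a == result_b          # same answer either way
print(sent_all, sent_pushed)         # 4 2 -- rows shipped, with vs. without
```

The same principle extends from simple predicates to whole plan fragments
(joins, aggregations) when nodes carry enough metadata to self-orchestrate.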

The distributed databases in open source are a bit of a biased sample as they
tend to copy architectures from each other and are a trailing indicator of the
state-of-the-art. For example, you virtually never see facilities for
individual nodes to self-orchestrate parallel query execution (this is almost
always centralized), which entirely changes how you would go about scaling out
e.g. join operations or mixed workloads. Building this facility typically
requires a tightly coupled storage and execution engine, but open source
systems are strongly biased toward decoupling these things for expediency.
Open source database architecture is trapped in a local minimum.

~~~
cozos
In other words, are you saying that distributed databases with local storage
can greatly reduce network I/O with things like predicate pushdown, which is
not possible when the database is built on top of a distributed file system?

------
wgjordan
> But here in the paper, it compares Aurora to a MySQL synchronous mirroring
> setup, which is really unfair, IMO. Why the active primary has to replicate
> everything over the wire to the active standby? Doesn't MySQL's replication
> support Statement-Based Replication?

The comparison in the paper is against the synchronous-mirroring pattern
underlying RDS Multi-AZ [1], which is the existing fully-managed, high-
availability MySQL solution Amazon offers, so the direct comparison is
appropriate for existing RDS Multi-AZ users considering a migration to Aurora.

[1] https://aws.amazon.com/blogs/database/amazon-rds-under-the-hood-multi-az/

~~~
aptxkid
Thanks! That makes much more sense!

------
benmmurphy
One of the neat things about Aurora is that there is very little slave lag
between the primary and the read replicas. If you only have a single thread
doing replication, you can easily have a situation where a multi-CPU master
gets ahead of a slave that is only using a single thread. Lazily materializing
the pages gives Aurora 'parallel replication' for free. I think MySQL supports
parallel replication now, but I believe it is limited when transactions
conflict [https://mariadb.com/kb/en/library/parallel-replication/].
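
A rough model of why lazy page materialization sidesteps the single
apply-thread bottleneck (a toy sketch of my mental model, not Aurora's actual
design): the replica just buffers redo records per page, and only replays them
when a reader actually touches that page.

```python
# Toy model of lazy page materialization (illustrative, not Aurora's code).
# Instead of one thread eagerly applying every redo record in order, the
# replica buffers records per page; a page is only materialized (its records
# replayed) when a reader actually touches it.

from collections import defaultdict

class LazyReplica:
    def __init__(self):
        self.base = defaultdict(int)      # last materialized value per page
        self.pending = defaultdict(list)  # unapplied redo records per page

    def receive_redo(self, page_id, delta):
        # Cheap: just buffer the record; no replay on the critical path.
        self.pending[page_id].append(delta)

    def read_page(self, page_id):
        # Materialize on demand: replay only this page's pending records.
        for delta in self.pending.pop(page_id, []):
            self.base[page_id] += delta
        return self.base[page_id]

replica = LazyReplica()
for page, delta in [(1, 5), (2, 7), (1, 3)]:
    replica.receive_redo(page, delta)
print(replica.read_page(1))  # 8 -- page 2's records are still unapplied
```

Because records for different pages are independent until read time, replay
parallelizes naturally, without the conflict-ordering constraints a
statement-level parallel applier has to respect.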

Another thing I noticed with Aurora is that its incremental cost of storage is
extremely cheap compared to the cost of storage in EBS, or the cost of
instance storage in EC2. Aurora storage is $0.10/GB-month, but it is
replicated six ways, which is more replication than EBS provides, yet EBS
costs the same $0.10/GB. This might also be why no one else is going to build
a similar system to Aurora: it would be hard to sell to cloud users, because
the storage costs would be more than what EC2 charges.
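
The back-of-the-envelope math (using the ~$0.10/GB-month figure above, in
integer cents to avoid float noise; check current regional pricing):

```python
# Storage economics sketch: Aurora bills the customer for one logical copy at
# roughly EBS rates, while internally storing six physical copies. Anyone
# rebuilding Aurora on top of EBS would pay EBS rates for all six.
PRICE_CENTS_PER_GB = 10   # ~$0.10 per GB-month (approximate, from the comment)
AURORA_COPIES = 6         # six copies across three AZs

db_size_gb = 1000
customer_bill = db_size_gb * PRICE_CENTS_PER_GB    # what the user is billed
raw_capacity_gb = db_size_gb * AURORA_COPIES       # what Aurora actually stores

# A third party building "Aurora on EBS" pays EBS rates for every copy:
ebs_bill_same_durability = raw_capacity_gb * PRICE_CENTS_PER_GB

print(customer_bill / 100, ebs_bill_same_durability / 100)  # 100.0 600.0
```

So a competitor paying list price for the underlying storage starts ~6x behind
on cost before writing any code.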

------
ralusek
Given this architecture, do we know why Aurora caps at 64TB max size?

------
eecc
Funny how the CQRS circle closes: it is an architectural style that lifts some
concepts of database design (the redo log, views) into the application layer
(for some, to the extreme of reducing the database to a simple log). And now
the same separation is appearing inside a DBMS design :)

~~~
tirumaraiselvan
As per my understanding, CQRS fits here if you consider the database log as
the "event source". But wouldn't it be terrible performance-wise to create the
"view" from the event source rather than using the storage layer?

~~~
ricardobeat
I think there’s a parallel with how apps keep a trailing materialized view of
the data instead of re-creating it from the log, unless necessary.
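
A toy sketch of what "trailing materialized view" means here (illustrative
only, not any specific CQRS framework): consume the log incrementally and keep
running state, instead of replaying the whole log per query.

```python
# Trailing materialized view over an event log (toy sketch).
# Queries read the maintained state; only new events are replayed on catch-up.

events = [
    ("deposit", "alice", 100),
    ("deposit", "bob", 50),
    ("withdraw", "alice", 30),
]

class BalanceView:
    """Keeps a running view; each query reads state, not the whole log."""
    def __init__(self):
        self.balances = {}
        self.offset = 0  # how far into the log we have consumed

    def catch_up(self, log):
        for kind, who, amount in log[self.offset:]:
            sign = 1 if kind == "deposit" else -1
            self.balances[who] = self.balances.get(who, 0) + sign * amount
        self.offset = len(log)

view = BalanceView()
view.catch_up(events)
print(view.balances["alice"])  # 70

events.append(("deposit", "alice", 5))
view.catch_up(events)          # only the one new event is applied
print(view.balances["alice"])  # 75
```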

------
kakwa_
I'm wondering how Aurora would behave with queries mixing reads and writes.

For example, with something like "INSERT INTO my_table(field1, field2, field3,
...) SELECT f1, f2, f3... from root_table WHERE ...", I'm wondering how it
will spread the load across nodes.

------
danielecook
I don’t understand how Aurora achieves the speed it does with a log-based
approach. Can someone please clarify?

~~~
dmlittle
Last time we benchmarked Aurora (~2 years ago), its write speed was pretty
slow compared to RDS (Postgres on RDS achieved 3x the write throughput).

~~~
PetahNZ
Anecdotally, after migrating from RDS MySQL to serverless Aurora there was a
noticeable slowdown in our dashboard and reporting tools. Our typical
workloads (ecommerce transactions) are slightly slower on average, but the
peaks seem down.

~~~
sciurus
Serverless Aurora likely has different performance characteristics than normal
Aurora.

------
ddorian43
For something open source like this, look at TiKV (a transactional key-value
store, in Rust) plus TiDB (the query layer, in Go).

------
anentropic
dumb question which maybe has a well-known answer: has anyone done the same
thing but for postgres instead of mysql?

~~~
a-robinson
Amazon - https://aws.amazon.com/rds/aurora/details/postgresql-details/

~~~
anentropic
aha cheers!

LOL

~~~
mcheshier
Be careful though - we use Aurora PG and it's great for what it does, but they
do not support managed upgrades across major PG versions yet! We're stuck on
9.6.x because the time to dump and restore our large DB is a non-starter with
the rest of the business.

~~~
foxylion
Did you try upgrading with the AWS Database Migration Service?

------
ddorian43
Everything is faster when you move your data to EBS \s.

Though Google has done this for all their databases (at least the internal
ones), even Bigtable. And some companies say they can do it for NVMe & RAM.

You just need really good networking, some kind of ASIC, and removing the
kernel from the I/O path.

