
Amazon Aurora: Avoiding Distributed Consensus for I/Os, Commits, Membership - bookofjoe
https://blog.acolyer.org/2019/03/27/amazon-aurora:-on-avoiding-distributed-consensus-for-i-os,-commits,-and-membership-changes/
======
crdrost
This is just one of those solutions that is beautiful and simple and makes you
wonder why we didn't think of it. The goal of distributed consensus can be
viewed as creating a transitive order A < B < C < D out of individual ordering
judgments: I might know A < B and C < D, someone else knows B < C, and
eventually everybody can know the full sequence of items. Making this sequence
carry metaïnformation about the entities that it contains has been a
well-known trick for just about forever, usually by using that metadata to
elect a “leader” who handles the writes personally, for speed's sake.
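A toy illustration of that "stitching ordering judgments together" view, assuming each participant contributes only pairwise before/after facts. It is just a topological sort using Python's standard `graphlib` (3.9+), not how Aurora or any real consensus protocol works; `merge_orderings` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

def merge_orderings(judgments):
    """Stitch pairwise (before, after) judgments into one sequence.

    Returns (order, is_total): order is a topological sort of every item seen,
    and is_total says whether the judgments force that order uniquely.
    Raises graphlib.CycleError if the judgments contradict each other.
    """
    edges = set(judgments)
    graph = {}  # item -> set of items known to come before it
    for before, after in edges:
        graph.setdefault(before, set())
        graph.setdefault(after, set()).add(before)
    order = list(TopologicalSorter(graph).static_order())
    # A DAG's topological order is unique exactly when every adjacent pair
    # in it is directly constrained by some judgment.
    is_total = all((a, b) in edges for a, b in zip(order, order[1:]))
    return order, is_total

# I know A < B and C < D; someone else knows B < C.
mine = [("A", "B"), ("C", "D")]
theirs = [("B", "C")]
print(merge_orderings(mine)[1])        # False: not yet a total order
print(merge_orderings(mine + theirs))  # (['A', 'B', 'C', 'D'], True)
```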

But the idea of switching from a single leader to a quorum and _then using the
metadata to commit to overlapping quorum groups_ in order to preserve this
transitive property between orderings was just really really slick.
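As I read the write-up, the overlapping-groups trick goes roughly like this when a segment F is being replaced by a fresh segment G: while the change is in flight, a write must satisfy the 4/6 rule in both the old membership (ABCDEF) and the new one (ABCDEG), whereas a read may use 3/6 from either. A minimal sketch assuming six-member groups with those thresholds; `Config`, `write_ok` and `read_ok` are hypothetical names, not Aurora's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    """One replica membership with its write/read quorum sizes (hypothetical)."""
    members: frozenset
    write_need: int = 4
    read_need: int = 3

def write_ok(configs, acks):
    # While a membership change is in flight, a write must reach a write
    # quorum in EVERY active configuration, old and new alike.
    return all(len(c.members & acks) >= c.write_need for c in configs)

def read_ok(configs, acks):
    # Every write overlaps a write quorum of each configuration, so a read
    # quorum from ANY single configuration is enough to intersect it.
    return any(len(c.members & acks) >= c.read_need for c in configs)

# Replacing suspected-dead segment F with a new segment G.
old = Config(frozenset("ABCDEF"))
new = Config(frozenset("ABCDEG"))
print(write_ok([old, new], {"A", "B", "C", "D"}))  # True: 4 of each group
print(write_ok([old, new], {"A", "B", "C", "F"}))  # False: only 3 of the new group
print(read_ok([old, new], {"D", "E", "G"}))        # True: 3 of the new group
```

Once G has caught up, the old configuration can be dropped and writes go back to checking a single group, e.g. `write_ok([new], acks)`. No single leader has to be elected to perform the swap.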

About three years ago there was an early presentation about the backend
[https://www.youtube.com/watch?v=-TbRxwcux3c](https://www.youtube.com/watch?v=-TbRxwcux3c)
which made it seem like the 4/6 thing was just an ordinary replicated-database
setup: a fixed topology tied to their availability zones. I didn't realize
that they were also dealing with some consensus overhead to enable swapping
nodes in or out.
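For reference, the "4/6 thing" is the quorum layout from the Aurora design paper: six copies of each segment, two in each of three availability zones, with writes needing 4 of 6 and reads 3 of 6. A small arithmetic check of why those numbers work; the zone labels and helper names are made up for illustration.

```python
from itertools import combinations

V, V_W, V_R = 6, 4, 3  # copies, write quorum, read quorum
# Two copies in each of three availability zones (zone names are made up).
COPIES = [(az, i) for az in ("az1", "az2", "az3") for i in range(2)]

def overlaps(size_a, size_b):
    # True if every size_a-subset of COPIES shares a copy with every size_b-subset.
    return all(set(a) & set(b)
               for a in combinations(COPIES, size_a)
               for b in combinations(COPIES, size_b))

def remaining(lost):
    return len([c for c in COPIES if c not in lost])

print(overlaps(V_W, V_R))  # True: every read quorum sees the latest write (4 + 3 > 6)
print(overlaps(V_W, V_W))  # True: two conflicting writes can't both succeed (4 + 4 > 6)

lost_az = [c for c in COPIES if c[0] == "az1"]
print(remaining(lost_az) >= V_W)                 # True: lose an AZ, can still write
print(remaining(lost_az + [("az2", 0)]) >= V_R)  # True: lose AZ+1, can still read
print(remaining(lost_az + [("az2", 0)]) >= V_W)  # False: but can no longer write
```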

There was also a somewhat marketing-y deep-dive presentation about this
technology at
[https://www.youtube.com/watch?v=U42mC_iKSBg](https://www.youtube.com/watch?v=U42mC_iKSBg)
which I confess kinda went over my head at the time, so these two blog
articles are a wonderful condensation of the heart of the service for me. None
of the companies I've worked for has made much use of Amazon, but it's a really
clever idea that certainly deserves to kick around in the back of folks'
heads.

------
namibj
Another related paper, which explains quite accessibly why this split between
consensus-requiring and consensus-free computation/data-writing is the right
one, is the CALM paper:
[https://blog.acolyer.org/2019/03/06/keeping-calm-when-distri...](https://blog.acolyer.org/2019/03/06/keeping-calm-when-distributed-consistency-is-easy/)
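The CALM intuition in miniature, as a hedged toy sketch rather than an example from the paper: a monotone computation such as growing a set by union gives the same result however updates are interleaved, so replicas can apply them without consensus, while a negated question ("has X not arrived?") can be contradicted by later data and therefore needs coordination. The update values and `merge` helper are made up.

```python
from functools import reduce
from itertools import permutations

# Updates that different replicas might receive in different orders.
updates = [{"a"}, {"b"}, {"a", "c"}]

def merge(state, update):
    # Set union is monotone: the state only ever grows.
    return state | update

# Every delivery order converges to the same state, with no coordination.
finals = {frozenset(reduce(merge, order, set())) for order in permutations(updates)}
print(len(finals))  # 1

# A negated question ("has 'c' NOT arrived?") is non-monotone: an answer
# given early can be contradicted by later data, so it needs coordination.
partial = merge(set(), updates[0])
print("c" not in partial)                        # True
print("c" not in reduce(merge, updates, set()))  # False
```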

~~~
twotwotwo
Yes, a single coordinator using other machines to do a lot of the work does
make some otherwise very difficult problems seem more tractable.

An interesting unit for this kind of design outside the cloud would be a
single rack of servers (or however much you can hook up to one switch), which
would ensure pretty low latency to your remote RAM/disk.

------
m0zg
Whenever I see someone tout the magical properties of their distributed
systems, I always want to see a report from Jepsen, which takes their system
apart like a rotisserie chicken.

~~~
zzzcpan
Jepsen is about consistency and doesn't address most of the things the paper
is focused on, real world production things.

~~~
m0zg
In a distributed system consistency _is_ "real world production things",
unless you want to lose customer data.

~~~
zzzcpan
If only that were true. Consistency is a design constraint, not a production
problem. Making things work well in the real world, in production, is the hard
part; this is what distributed systems are actually about. And I like that
about Aurora: it focuses on the hard part.

~~~
sethev
Jepsen has done a good job of showing that a lot of systems that focused on
the hard part failed to do the "easy" part.

------
sbmthakur
Adrian had also reviewed a paper on design considerations of Amazon Aurora:
[https://blog.acolyer.org/2019/03/25/amazon-aurora-design-con...](https://blog.acolyer.org/2019/03/25/amazon-aurora-design-considerations-for-high-throughput-cloud-native-relational-databases/)

------
DroidX86
I like how it has almost become the norm to post a link to Adrian Colyer's
blog about any paper on distributed systems rather than a link to the paper
itself. :)

(I'm not complaining, I love his blog)

~~~
ignoramous
Re: distsys: You might also like
[https://muratbuffalo.blogspot.com/](https://muratbuffalo.blogspot.com/)

------
eternalban
Surprised there wasn't any mention of [1] in the paper's citations. This
notion of articulating the internal sub-systems of a DB as first-class
elements of a distributed realization has been emerging (from various
quarters) for the past few years. I would also include Apache Pulsar in this
category. Aurora itself seems very interesting from an architectural point of
view and is definitely worth studying.

[1]: [https://www.confluent.io/blog/turning-the-database-inside-ou...](https://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza/)

~~~
zzzcpan
This talk is completely irrelevant to the paper in question.

