
Application-Level Consensus [pdf] - hugothefrog
http://weareadaptive.com/wp-content/uploads/2017/03/Application-Level-Consensus.pdf
======
ergl
Jane Street uses the same approach to build their exchange [0]. Like the doc
says, it can be great to replay some sequence of messages in dev to reproduce
issues, and to give fault-tolerance to the system.

One downside is that, if all your nodes are using the same application code,
simply replaying the log might not help as all nodes might hit exactly the
same bug with the same sequence of transitions.

[0] There's an overview of their infrastructure here:
[https://youtube.com/watch?v=b1e4t2k2KJY](https://youtube.com/watch?v=b1e4t2k2KJY)
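To make the replay idea concrete, here's a minimal Python sketch (purely illustrative; `apply`, `replay`, and the message shapes are made up for the example, not anyone's actual engine code):

```python
# Deterministic-replay sketch: every replica applies the same ordered log of
# messages to the same pure transition function, so replaying the log in dev
# reproduces the exact state -- including any bug, on every node.

def apply(state, msg):
    """Pure, deterministic transition: new state from old state + message."""
    kind, payload = msg
    if kind == "deposit":
        return {**state, "balance": state["balance"] + payload}
    if kind == "withdraw":
        return {**state, "balance": state["balance"] - payload}
    return state

def replay(log, initial=None):
    state = initial or {"balance": 0}
    for msg in log:
        state = apply(state, msg)
    return state

log = [("deposit", 100), ("withdraw", 30), ("deposit", 5)]

# Two "replicas" replaying the same log converge on identical state...
assert replay(log) == replay(log) == {"balance": 75}
```

Because `apply` is a pure function of state and message, every replica consuming the same log lands in the same state; that's also exactly why a deterministic bug reproduces identically on all of them.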

~~~
odeheurles
Thanks for sharing the video; great talk, btw. Brian, the speaker, actually
asks the audience (around minute 20 in the video) if anybody uses Paxos for the
matching engine. What I'm talking about in the article is exactly that: we're
just using another consensus algorithm (Raft) which is significantly simpler
to implement than Paxos.

LMAX uses synchronous replication in their exchange:
[https://www.infoq.com/presentations/LMAX](https://www.infoq.com/presentations/LMAX)

~~~
sourcedelica
What kind of latency does the consensus add? We are looking at adding fault
tolerance to our matching engine but can only afford 10-15 micros.

~~~
sourcedelica
Related to the latency question, I just watched the Jane Street video (very
nice!) and he mentioned that they use operator-initiated failover and he
didn't know of anyone using a consensus-based approach because it adds an
extra hop. Does your Raft-based failover solution do automatic failover?

------
gawi
This is very interesting. I have no doubt that not having to deal with fault
tolerance at the application level compensates for the effort of putting this
architecture in place. And yes, in my opinion, "application-level consensus" is
the perfect term to designate this architecture.

~~~
mr_luc
I agree. One place where application-level consensus is fairly common is in
Elixir applications, mostly thanks to the CRDT implementation that's nicely
wrapped up by Phoenix.Tracker in the phoenix_pubsub library.

This is used by the Phoenix project's Presence module to provide a distributed
notion of which users are 'present', but it's also used by others to do service
location using hash rings, or implement a DHT, etc. I've used it for master
election and failover on a few projects for little services.
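For the hash-ring style of service location mentioned above, here's a rough sketch in plain Python (illustrative only; Phoenix.Tracker's actual implementation is in Elixir and considerably more involved):

```python
# Service location with a consistent hash ring: each node owns a point on a
# ring, and a key is routed to the first node clockwise from the key's hash.

import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        # (hash, node) pairs sorted by position on the ring
        self._points = sorted((_hash(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        ring = [h for h, _ in self._points]
        i = bisect.bisect(ring, _hash(key)) % len(self._points)
        return self._points[i][1]

ring = HashRing(["svc-a", "svc-b", "svc-c"])
owner = ring.node_for("user:42")  # deterministic on every node
assert owner in {"svc-a", "svc-b", "svc-c"}
```

Every node that knows the member list computes the same ring, so they all route a given key to the same owner without coordinating. Production rings usually add several virtual points per member to even out the key distribution.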

------
silviatorres
Hi there!! looks like there was some minimal mistakes on the text and the
document was updated: [http://weareadaptive.com/wp-
content/uploads/2017/04/Applicat...](http://weareadaptive.com/wp-
content/uploads/2017/04/Application-Level-Consensus.pdf)

------
eternalban
Try 'Edge-Coherence'.

