
Master-Less Distributed Queue with PG Paxos - ahachete
https://www.citusdata.com/blog/14-marco/411-master-less-distributed-queue-postgres-and-pg-paxos
======
teraflop
How does Paxos replication interact with Postgres transactions? This doesn't
seem to be explained in the article or docs, and the examples only show the
behavior with autocommit.

In particular, suppose I do the following sequence of operations on an empty
table:

    
    
        BEGIN;
        INSERT INTO table VALUES ('foobar', 123);
        SELECT * FROM table;
        ROLLBACK;
    

Would the INSERT be submitted to the Paxos log immediately, causing it to be
applied on other replicas even though the transaction never committed? Or
would it be deferred until commit time, causing the SELECT to return an empty
result? Or is there something more sophisticated going on?

~~~
mslot
Once a query has been logged there's no turning back. The problem with
transactions is mainly that Paxos is fundamentally incompatible with read
committed mode. It is technically possible to log a multi-statement
transaction as a single string, which makes it serializable.
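
A minimal sketch of the "whole transaction as one log entry" idea, not pg_paxos code. It assumes an in-memory SQLite database as a stand-in for each replica, and the names `txn_log`, `submit`, `replay_on_fresh_replica` and the `items` table are hypothetical. In a real system a consensus round would decide the order of the entries; here the log is just a Python list.

```python
import sqlite3

# Hypothetical replicated log: each entry is one complete transaction as a string.
txn_log = []

def submit(transaction_sql):
    """Append a complete multi-statement transaction as a single log entry."""
    txn_log.append(transaction_sql)

def replay_on_fresh_replica(entries):
    """Apply every entry, in log order, to a new replica and return its rows."""
    # isolation_level=None puts sqlite3 in autocommit mode, so the explicit
    # BEGIN/COMMIT below are the only transaction boundaries.
    replica = sqlite3.connect(":memory:", isolation_level=None)
    replica.execute("CREATE TABLE items (name TEXT, value INTEGER)")
    for entry in entries:
        # Each entry runs as one transaction, one after the other (serially).
        replica.executescript("BEGIN;\n" + entry + "\nCOMMIT;")
    return replica.execute("SELECT name, value FROM items ORDER BY name").fetchall()

# A client that rolls back locally never submits anything, so nothing is logged.
submit("INSERT INTO items VALUES ('foobar', 123); "
       "UPDATE items SET value = value + 1 WHERE name = 'foobar';")
submit("INSERT INTO items VALUES ('baz', 7);")

replica_a = replay_on_fresh_replica(txn_log)
replica_b = replay_on_fresh_replica(txn_log)
print(replica_a)  # [('baz', 7), ('foobar', 124)]
```

Because every replica applies the same entries in the same order, and each entry is applied as a unit, the replicas end up identical and the transactions behave as if executed one at a time.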

------
toolz
What are the benefits of a system like this? More available reads at the
expense of terribly slow write locks?

~~~
utternerd
From their own documentation the use case is reliable replication, and even
reads would be horribly slow:

"The drawbacks are high latency in both reads and writes and low throughput.
Pg_paxos cannot be used for high performance transactional systems. But it can
serve very well for low-bandwidth, reliable replication use cases."

~~~
dijit
Better to wait for PostgreSQL 9.6, which will have synchronous replication: it
adds latency to writes but not to reads.

[http://michael.otacoo.com/postgresql-2/postgres-9-6-feature-...](http://michael.otacoo.com/postgresql-2/postgres-9-6-feature-highlight-multi-sync-rep/)

~~~
ahachete
PostgreSQL has supported synchronous replication since 9.1. What 9.6 will add
is support for more than one synchronously replicated server.

In any case, synchronous replication means that _all_ of the participating
servers have to participate in the replication process. If one of them slows
down or hangs, replication (and your transaction) does not proceed.

Paxos, on the contrary, can proceed when N/2+1 of the nodes are available.
That's a huge difference, and it's irrespective of the latency and
performance. In other words: while 9.6's synchronous replication is a really
welcome addition, a single misbehaving node will halt transactions on the
cluster, while pg_paxos will continue operating without problems. Both are
meant for different use cases.
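
To make the N/2+1 arithmetic concrete, here is a small sketch comparing a wait-for-all rule (synchronous replication as described in this comment) with a majority rule. The function names are made up for illustration and model only availability, not latency or any real replication protocol.

```python
def majority(n):
    """Smallest number of nodes that forms a majority of n, i.e. N/2+1."""
    return n // 2 + 1

def can_proceed_wait_for_all(n, available):
    """Wait-for-all rule: every one of the n nodes must respond."""
    return available == n

def can_proceed_majority(n, available):
    """Majority rule: any N/2+1 of the n nodes is enough."""
    return available >= majority(n)

def tolerated_failures(n):
    """How many nodes may be down while a majority is still reachable."""
    return n - majority(n)

for n in (3, 5, 7):
    print(n, "nodes: majority =", majority(n),
          "tolerates", tolerated_failures(n), "failures")
```

With 5 nodes and one of them hanging, `can_proceed_wait_for_all(5, 4)` is `False` while `can_proceed_majority(5, 4)` is `True`.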

~~~
teraflop
This isn't quite true. As described in that blog post, you can configure
Postgres to synchronously replicate to N servers but only wait for M
responses. With M=N/2+1, you get the same availability as Paxos.

The difference is that with Postgres' replication, when the master fails,
write operations can't be executed until a new master is promoted. This has to
be done carefully, because you want to make sure that no in-flight operations
are still happening on the old master (aka STONITH), and that the most
up-to-date slave becomes the new master.

Paxos avoids the need for manual (or very delicately-automated) failover, at
the cost of extra network round-trips and disk syncs on every operation.
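
As a rough illustration of the "most up-to-date slave" step only, here is a sketch that compares standbys by their replayed WAL position. It assumes each standby reports an LSN in PostgreSQL's textual `high/low` hexadecimal form; the standby names and LSN values are invented, and fencing the old master is deliberately left out.

```python
def parse_lsn(text):
    """Convert an LSN such as '0/30001F0' into a comparable integer.

    The text form is two hexadecimal numbers: the high and low 32 bits.
    """
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def most_up_to_date(standbys):
    """Return the name of the standby with the highest replayed LSN."""
    return max(standbys, key=lambda name: parse_lsn(standbys[name]))

# Hypothetical replay positions reported by three standbys.
standbys = {
    "standby1": "0/3000148",
    "standby2": "0/30001F0",
    "standby3": "0/2FFFFA8",
}

candidate = most_up_to_date(standbys)
print(candidate)  # standby2
```

Comparing the strings directly would be wrong (for example `"1/0"` is later than `"0/FFFFFFFF"`), which is why the sketch converts to integers first.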

~~~
ahachete
You are right. If you do configure it to wait for only M responses, you get
the same availability.

But there are more differences between both setups:

- Paxos is master-less, so you can write to any node (there's no need for a
master).

- Failover is very tough to get right. Indeed, other than consensus, there
are no other bullet-proof solutions to achieve it under any circumstance, so
relying on a master is a significant difference.

Regarding the extra round-trips and syncs, they can be pipelined if wanted
too. I wouldn't conclude this is necessarily slower (it of course depends on
the Paxos implementation) until properly benchmarked.

------
koolba
How does using paxos compare to standard 2PC/XA transactions? Does this only
work for append only unique data structures (ex: immutable log style)?

~~~
mslot
Paxos provides strong consistency and can proceed even if some nodes fail. 2PC
has intermediate states in which a transaction is only partially committed,
and all nodes need to be available to perform writes. The downside is that
Paxos' write throughput is bounded by network latency and it requires network
round-trips on both reads and writes. 2PC is more suitable when you require
low read latency or high write throughput.
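
A toy sketch of the 2PC behaviour described above, with a hypothetical `Participant` class and no real networking or durability. It shows the intermediate "prepared" state and that a single unavailable participant prevents the write.

```python
class Participant:
    """Hypothetical 2PC participant that tracks only its transaction state."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.state = "idle"

    def prepare(self):
        """Phase 1: vote yes only if reachable; a yes vote leaves us 'prepared'."""
        if not self.available:
            return False
        self.state = "prepared"  # intermediate state: neither committed nor aborted
        return True

    def commit(self):
        self.state = "committed"

    def abort(self):
        if self.available:
            self.state = "aborted"

def two_phase_commit(participants):
    """Commit only if every participant votes yes in the prepare phase."""
    votes = [p.prepare() for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False

healthy = [Participant("a"), Participant("b"), Participant("c")]
degraded = [Participant("a"), Participant("b"), Participant("c", available=False)]

print(two_phase_commit(healthy))   # True
print(two_phase_commit(degraded))  # False
```

A majority-based protocol would still make progress in the `degraded` case, since two of three nodes are reachable.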

> Does this only work for append only unique data structures (ex: immutable
> log style)?

Paxos is based on a technique called state machine replication. It replicates
an append-only log of changes to an initial state, which allows you to
replicate arbitrary data structures. For example, pg_paxos logs SQL commands
on a table (e.g. UPDATE).
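
A minimal sketch of state machine replication in general, not of pg_paxos itself. The command format `(op, key, value)` and the names `apply_command` and `replay_log` are assumptions made for the example; the agreed-upon log is just a list here.

```python
def apply_command(state, command):
    """Apply one logged command to a state and return the resulting state."""
    op, key, value = command
    new_state = dict(state)
    if op == "set":
        new_state[key] = value
    elif op == "add":
        new_state[key] = new_state.get(key, 0) + value
    elif op == "delete":
        new_state.pop(key, None)
    else:
        raise ValueError("unknown command: " + op)
    return new_state

def replay_log(commands, initial=None):
    """Replay commands in order, starting from the given initial state."""
    state = dict(initial or {})
    for command in commands:
        state = apply_command(state, command)
    return state

# The log is append-only, but the commands themselves can update or delete.
command_log = [
    ("set", "jobs", 1),
    ("add", "jobs", 2),
    ("set", "tmp", 9),
    ("delete", "tmp", None),
]

replica_a = replay_log(command_log)
lagging = replay_log(command_log[:1])               # has seen only the first entry
caught_up = replay_log(command_log[1:], lagging)    # applies the rest later
print(replica_a)  # {'jobs': 3}
```

The data structure being replicated is mutable; only the log of changes is append-only, which is why the approach is not limited to immutable, log-style data.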

------
hardwaresofton
Aren't writes going to be (potentially) crazy slow? It seems like every
transaction has to achieve a quorum.

