
Bidirectional Replication is coming to PostgreSQL 9.6 - iamd3vil
http://blog.2ndquadrant.com/bdr-is-coming-to-postgresql-9-6/
======
ukj
Holy crap, I am scared!

Please, please, please read the fine print and ensure you understand the
design tradeoffs as well as your application's requirements before blindly
using this.

The moment I heard multi-master I thought Paxos, Raft or maybe virtual
synchrony. Hmm, nothing in the documentation. Maybe a new consensus protocol
was written from scratch then? That should be interesting!

No, none of that either - this implementation completely disregards
consistency and makes write conflicts the developer's problem.

From [http://bdr-project.org/docs/stable/weak-coupled-multimaster.html](http://bdr-project.org/docs/stable/weak-coupled-multimaster.html)

* Applications using BDR are free to write to any node so long as they are careful to prevent or cope with conflicts

* There is no complex election of a new master if a node goes down or network problems arise. There is no wait for failover. Each node is always a master and always directly writeable.

* Applications can be partition-tolerant: the application can keep working even if it loses communication with some or all other nodes, then re-sync automatically when connectivity is restored. Loss of a critical VPN tunnel or WAN won't bring the entire store or satellite office to a halt.

Basically:

* Transactions are a lie

* Consistent reads are a lie

* Datasets will diverge during network partitioning

* Convergence is not guaranteed without a mechanism for resolving write conflicts

I am sure there are use-cases where the risk of this design is acceptable (or
necessary), but ensure you have a plan for dealing with data inconsistencies!
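To make the conflict problem concrete: BDR's documentation describes last-update-wins as a resolution strategy. Here's a toy Python sketch of what that policy does to divergent writes (the row shape and timestamps are hypothetical, not BDR's actual format):

```python
# Toy sketch of a "last-update-wins" conflict resolution policy, the
# kind of strategy an asynchronous multi-master system applies when two
# nodes update the same row during a partition. The row shape and
# timestamps here are hypothetical, not BDR's actual format.

def resolve_last_update_wins(local_row, remote_row):
    """Keep whichever version carries the newer update timestamp."""
    if remote_row["updated_at"] > local_row["updated_at"]:
        return remote_row
    return local_row

# Two nodes diverged while partitioned:
node_a = {"id": 1, "email": "old@example.com", "updated_at": 100}
node_b = {"id": 1, "email": "new@example.com", "updated_at": 105}

winner = resolve_last_update_wins(node_a, node_b)
print(winner["email"])  # new@example.com -- node A's write is
                        # silently discarded, which is exactly why you
                        # need a plan for data inconsistencies
```

Note that one of the two writes simply vanishes; the database converges, but not losslessly.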

~~~
jeffdavis
Do you know of any relational products which offer high-throughput, low-
latency, high-availability transaction processing using perfectly synchronous
multi-master replication?

~~~
nine_k
Doesn't Oracle offer such an option?

~~~
vidarh
You _can't_ guarantee low latency and consistency without setting
geographical boundaries on the distribution of the servers, unless you have
very generous definitions of "low latency", as the speed of light (and more
realistically: the speed you can transmit a signal via the internet, which is
substantially lower) between the servers will place lower bounds on latency.

This is the case as you can't guarantee consistency without coordination
between the servers (or you won't know what order to make operations visible
in the case of multi-master, or whether a transaction has even been
successfully applied by a slave when replicating master-slave), which at a
bare minimum involves one round-trip (sending a transaction, and waiting for
confirmation that it has been applied; assuming no conflicts requiring
additional effort to reconcile).

You can pipeline some stuff to partially avoid actual delays, but you can't
avoid the signalling, and for many applications it has a disastrous effect on
throughput when certain tables/rows are highly contended.
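To put rough numbers on the physical floor described above (a back-of-the-envelope sketch; the distance and the fiber slowdown factor are assumptions, not measurements):

```python
# Back-of-the-envelope check of the parent's point: one coordination
# round trip between distant datacenters has a hard physical lower
# bound. Signals in optical fiber travel at roughly 2/3 the speed of
# light in vacuum; the distance figure below is an approximation.

C_VACUUM_KM_S = 299_792   # speed of light in vacuum, km/s
FIBER_FACTOR = 2 / 3      # typical slowdown due to fiber's refractive index

def min_rtt_ms(distance_km):
    """Lower bound on round-trip time over fiber, in milliseconds."""
    one_way_s = distance_km / (C_VACUUM_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000

# New York <-> London is roughly 5,600 km great-circle:
print(round(min_rtt_ms(5600), 1))  # ~56 ms, before any routing,
                                   # queuing, or processing delay
```

So a synchronous commit spanning the Atlantic can never complete in under ~56 ms, no matter how fast the databases are; real internet paths are substantially worse.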

~~~
nine_k
Yes, I meant a strictly one-cluster solution for machines within the same
datacenter and preferably the same rack.

Synchronous replication across such a cluster can give much more read
performance with write consistency and durability guarantees even when
hardware failures occur. I don't know if any potential write performance
increase would be worth the increased complexity, compared to a standard
single-master setup.

------
aembleton
Some info from 2nd Quadrant on what BDR is:
[https://2ndquadrant.com/en/resources/bdr/](https://2ndquadrant.com/en/resources/bdr/)

Bi-Directional Replication for PostgreSQL (Postgres-BDR, or BDR) is the first
open source multi-master replication system for PostgreSQL to reach full
production status, developed by 2ndQuadrant and assisted by a keen user
community. BDR is specifically designed for use in geographically distributed
clusters, using highly efficient asynchronous logical replication, supporting
anything from 2 to more than 48 nodes in a distributed database.

~~~
iLoch
> anything from 2 to more than 48 nodes in a distributed database

Why specify a range if you're going to leave it open-ended?

~~~
pgaddict
Yes, it means it was tested with up to 48 nodes.

There's no hard limit on the number of nodes, but at the moment BDR uses full
mesh topology (each node has connections to all other nodes), which becomes an
issue as the number of nodes increases.
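The scaling issue is simple arithmetic: in a full mesh every node connects to every other node, so the link count grows quadratically. A quick sketch:

```python
# Why full mesh stops scaling: every node holds a replication
# connection to every other node, so total link count grows
# quadratically with cluster size.

def mesh_connections(n):
    """Number of bidirectional links in a full mesh of n nodes."""
    return n * (n - 1) // 2

for n in (2, 8, 48, 100):
    print(n, mesh_connections(n))
# At 48 nodes that is already 1,128 links (47 per node), and doubling
# the cluster roughly quadruples the total.
```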

------
craigkerstiens
While indeed very exciting, it's important to note that this makes the BDR
extension from 2ndQuadrant compatible with stock Postgres. It does not mean
BDR ships with core Postgres.

Continued improvement of the core code and extension APIs will make more and
more extensions feasible, which means more can plug in and add value without
having to be committed to core. That said, this one has a good chance of
actually landing in core in time, much like pg_logical.

~~~
pgaddict
Not exactly. It means that enough infrastructure was moved into PostgreSQL
9.6, making it possible to run BDR on unmodified PostgreSQL. Before 9.6 it was
necessary to use patched PostgreSQL packages.

------
_Codemonkeyism
The title is misleading: the replication is not coming to stock PostgreSQL
9.6.

A replication extension got the patches it needs to run merged into PostgreSQL
9.6, so you can use the extension without patching Postgres.

~~~
simon2Q
The purpose of extensions is that they allow you to run stuff without
including it in the core database. What language should be used in this case
to avoid confusion in the future?

~~~
_Codemonkeyism
"Bidirectional Replication extension no longer needs patches in PostgreSQL
9.6" ?

------
oliwarner
As the developer who also manages the servers we deploy on, and not a full
time PgDBA, things like multi-master replication scare the hell out of me.
They really make me worry about what happens after downtime. And latency.

Could anyone here recommend good reading material for scaling out your first
database on to multiple servers? How do I know which scheme is the best for
me?

~~~
pgaddict
The answer really depends on what you mean by "multi-master" - particularly
whether you're looking for a synchronous or asynchronous solution, what
consistency model you need (a strongly consistent cluster, or nodes consistent
independently), and what your goals are (write scalability, read scalability,
disaster recovery, ...).

BDR is meant to be asynchronous multi-master, i.e. a collection of nodes that
are strongly consistent on their own, but the changes between the nodes are
replicated asynchronously. Great for geographically distributed databases
(users access their local node), for example.

~~~
oliwarner
We're dealing with a booking system, so sync is important. But so are
redundancy and throughput.

------
Zaheer
Some more information for anyone trying to understand this better:
[http://bdr-project.org/docs/stable/overview.html](http://bdr-project.org/docs/stable/overview.html)

Sourcecode:
[https://github.com/2ndQuadrant/bdr](https://github.com/2ndQuadrant/bdr)

------
booleanbetrayal
Would love to see this land in Amazon RDS's list of supported extensions!

------
anthony_franco
Looking forward to playing around with this. Native master-master replication
is the only thing keeping me on MySQL.

~~~
dcosson
Just curious, what Postgres features are you missing on MySQL?

I had only used MySQL until a year or two ago, and wondered what I was missing
since Postgres seems to get more love/hype from the developer community for
whatever reason.

Now using Postgres in production, there are few if any features that I notice
our team using which don't exist in MySQL (maybe Json landed in Postgres first
is one big one?). One thing I have noticed is I find the user/permissions
model for Pg less intuitive. It's as if it's designed for use in a computer
lab or something where there's one human who is the owner/dba and some things
can only be done by them, which doesn't map well to a web app trying to follow
"principle of least privilege".

This combined with the fact that we're on RDS where MySQL/Aurora is the clear
first class citizen makes me wish we were using MySQL.

~~~
allan_s
transactional DDL => if your application often has schema updates and you use
a tool like Doctrine for PHP / Alembic for Python: when an upgrade or
downgrade fails halfway on MySQL because the index you're creating was already
taken by someone who "hot-fixed" the database, you're left in an inconsistent
state and have to clean things up by hand. You will regret not being on
PostgreSQL, where the transaction would simply have been rolled back, leaving
you in a consistent state: you fix the migration, run the command again, and
go home

hstore/jsonb/array/composite types
[https://www.postgresql.org/docs/8.1/static/rowtypes.html](https://www.postgresql.org/docs/8.1/static/rowtypes.html)
: an array is often a good option for implementing a tag system

partial index: imagine you have a lot of "soft deleted" rows (i.e. with a
deleted flag set to true); you can create an index that ignores them

index on expression: you often run queries like "where date = today", but you
store timestamps precise to the second? And you don't want to run
date_trunc('day', your_column) (also a function not present in MySQL...) every
time, nor do you want to create a dedicated column only for the sake of
performance? An index on an expression lets you do exactly that.

integrated full text search: you have a small team and don't feel like
maintaining one more service for indexing and keeping your search engine in
sync? Here you are (of course it's not perfect, but it's better than the
option provided by MySQL)

table inheritance for partitioning: you create one table "orders", and you can
easily partition it into "orders_2016", "orders_2015", etc. while still simply
selecting things out of "orders"

constraints: your column "event_start" must be before "event_end"? You can
enforce that at the table level in PostgreSQL

text columns: in PostgreSQL, don't worry about varchar(XXX) with XXX being the
subject of flamewars (256? 500? 1000?); the "text" type in PostgreSQL holds up
to 1 GB and is as efficient as varchar()

uuid support: PostgreSQL supports uuid natively as a primary key type (without
needing to resort to a varchar, of course...)

And I've only talked about the advantages of PostgreSQL, not the strange
defects of MySQL: for example, that you can only have one column with a
default timestamp, that your auto-increments will overflow silently, wrapping
back to previous ids, and that a lot of things are only "warnings" (value not
in an enum? fine, I'll insert null) that you will not see in your application
code unless you really look hard for it.
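The table-level constraints point above is easy to show runnably. The CHECK clause below is standard SQL and works the same way in PostgreSQL; Python's bundled sqlite3 is used here only so the snippet runs without a database server:

```python
# Demonstrates a table-level CHECK constraint (event_start must precede
# event_end). The CHECK syntax is standard SQL; sqlite3 stands in for
# PostgreSQL purely so this runs standalone.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        id          INTEGER PRIMARY KEY,
        event_start INTEGER NOT NULL,
        event_end   INTEGER NOT NULL,
        CHECK (event_start < event_end)
    )
""")

conn.execute("INSERT INTO events VALUES (1, 100, 200)")  # accepted

try:
    conn.execute("INSERT INTO events VALUES (2, 300, 250)")  # end < start
except sqlite3.IntegrityError as e:
    # The database itself rejects the bad row -- no application-side
    # validation required.
    print("rejected:", e)
```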

Edit: I've used MySQL extensively and have only been using PostgreSQL for two
years now. Though I have more knowledge of MySQL optimization and internals,
and I don't consider it "bad", it's just so-so: you will definitely be able to
do whatever you want with it and it will not betray you hard, but PostgreSQL
is just in another league and will actively help you.

~~~
evanelias
A few things from this list are out of date. For example, MySQL 5.6 (GA
release 3.5 years ago) added support for multiple columns with default
timestamp. MySQL 5.7 (GA release 1 year ago) added generated virtual columns,
which can be indexed, basically supporting index on expression. And for an
extreme example, table partitioning was added in MySQL 5.1, 8 years ago.

The default SQL mode also changed to be strict in 5.7, preventing silent
overflows and other oft-complained-about write behaviors. Using a strict SQL
mode has been recommended best practice for quite a long time anyway.

MySQL certainly has its flaws, and there are a number of useful features that
are present in pg but missing in MySQL. But the gap is smaller than many
people realize.

------
elevensies
Here is a rationale for multi-master from James Hamilton's "On Designing and
Deploying Internet-Scale Services".

 _Designing for automation, however, involves significant service-model
constraints. For example, some of the large services today depend upon
database systems with asynchronous replication to a secondary, back-up server.
Failing over to the secondary after the primary isn't able to service
requests loses some customer data due to replicating asynchronously. However,
not failing over to the secondary leads to service downtime for those users
whose data is stored on the failed database server. Automating the decision to
fail over is hard in this case since it's dependent upon human judgment and
accurately estimating the amount of data loss compared to the likely length of
the down time. A system designed for automation pays the latency and
throughput cost of synchronous replication. And, having done that, failover
becomes a simple decision: if the primary is down, route requests to the
secondary. This approach is much more amenable to automation and is
considerably less error prone._

------
no1youknowz
I look forward to when this lands on PostgreSQL 9.7 without the need for an
extension. But more so when I can also include the Citus DB extension.

Running CitusDB with just 1 master made me nervous. They did talk about having
multi-master replication as a belt and braces solution, but I don't know how
far they got.

Thinking about this: might using both give you a fully fault-tolerant
solution?

~~~
atsaloli
After 9.6, the next version will be 10.0.

[https://www.postgresql.org/message-id/flat/CABUevEzT3RqJZR2i...](https://www.postgresql.org/message-id/flat/CABUevEzT3RqJZR2ioSePD7JQ_datTLNgS_v2GAwQMRWODc02jg%40mail.gmail.com#CABUevEzT3RqJZR2ioSePD7JQ_datTLNgS_v2GAwQMRWODc02jg@mail.gmail.com)

------
asdf742
Please make sure you understand log replication and go through fire drills for
the list of things that can go wrong with bi-directional replication. The last
thing you'll want to do is deploy this into production and wing operations as
you go.

------
idorosen
BDR is a nice building block for multi-master PostgreSQL. I'm looking forward
to parallel aggregates in 9.6 being added to core. Using the agg[0] extension
for something as core as using more than one core per (aggregate function)
query felt strange. (I wonder if the time has come to decouple connections
from processes/threads in postgres, as well...)

[0]: [http://www.cybertec.at/en/products/agg-parallel-aggregations...](http://www.cybertec.at/en/products/agg-parallel-aggregations-postgresql/)

------
StreamBright
Now that is something very interesting. I would love to use this ASAP! :)

------
imaginenore
Can someone explain to me the point of BDR? Since the writes must happen on
all servers anyway, why not just have a master-slave?

~~~
jsmthrowaway
It means you can write to any master, so your application need not be aware of
"master" or anything. That's master/master anything, really. It just makes
replication strategy transparent to applications and is far simpler to reason
about. It's also far harder to implement on the server side, which is why most
software you see that handles master/master (especially cross-DC
master/master) comes with severe caveats, probably this included. Distributed
systems are extremely difficult and come with lots of corner cases.

There's no reason you can't hang slaves off such a setup for various purposes
either, I would assume, though I haven't used BDR and I'm not sure if that's
supported in this software. The best replication strategy I've played with _in
general_ is a master/read slave setup in each facility with master/master
between each facility. A lot of stuff is built that way, but most people never
worry about datacenter failover so it's not the sort of thing you find on
StackOverflow.

If I give you the knowledge that your Gmail inbox "lives" in one datacenter,
imagine how you'd architect a backup for when that facility fails. That's
where stuff like master/master starts to come in handy, because then you start
thinking about things like "why would we build a backup facility and never use
it? The user's latency to the backup is lower today."

------
rgacote
Do all nodes need to be up 100% of the time? If not, how long can a node be
down without replicating (perhaps because a server is under maintenance)?

Does BDR have rules for primary key insertion conflicts? I have a (perhaps
odd) situation where identical data is already being written to multiple
servers. Currently handling with a custom replication mechanism.

~~~
pgaddict
The nodes don't need to be up 100% of the time. Thanks to replication slots,
the WAL on the other nodes will be kept until the node connects again and
catches up. So it really depends on how much disk space you have on the other
nodes.

Regarding the PK conflicts - I'm not sure I understand the question, but it's
possible to use global sequences (which is one of the parts that did not make
it into core yet). Otherwise it'll generate conflicts, and you'll have to
resolve them somehow (e.g. it's possible to implement a custom conflict
handler).

See [http://bdr-project.org/docs/stable/conflicts.html](http://bdr-project.org/docs/stable/conflicts.html)
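For a sense of how conflict-free key generation can work, here's a sketch of the classic step/offset trick. Note this is an illustration, not BDR's mechanism: BDR's global sequences allocate chunks of values to each node rather than interleaving like this.

```python
# Classic step/offset scheme for avoiding primary-key insert conflicts
# across multi-master nodes: every node shares a common step (the node
# count) and a unique offset, so the id streams interleave without ever
# colliding. Illustration only; BDR's global sequences work by chunk
# allocation instead.
import itertools

def node_sequence(node_id, num_nodes, start=0):
    """Yield node_id, node_id + num_nodes, node_id + 2*num_nodes, ..."""
    value = start + node_id
    while True:
        yield value
        value += num_nodes

# Three-node cluster; nodes 1 and 2 generate ids independently:
a = node_sequence(node_id=1, num_nodes=3)
b = node_sequence(node_id=2, num_nodes=3)
print(list(itertools.islice(a, 4)))  # [1, 4, 7, 10]
print(list(itertools.islice(b, 4)))  # [2, 5, 8, 11]
```

MySQL users may recognize this as the auto_increment_increment / auto_increment_offset pattern; the trade-off is that the node count must be fixed up front.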

------
sargun
Is there some sort of consensus mechanism at work here, or is it closer to
circular replication a la MySQL?

------
jtchang
I have not used 2ndQuadrant's BDR extension. Can anyone comment on how easy it
is to set up?

------
ex3ndr
Does anyone know a good replication setup for Postgres in dynamic environments
like Kubernetes?

