
Bi-Directional Replication for PostgreSQL v1.0 - eloycoto
http://blog.2ndquadrant.com/bdr-1-0/
======
sinatra
As the linked page doesn't describe what it is: Bi-Directional Replication for
PostgreSQL (Postgres-BDR, or BDR) is an asynchronous multi-master replication
system for PostgreSQL, specifically designed to allow geographically
distributed clusters. Supporting more than 48 nodes, BDR is a low overhead,
low maintenance technology for distributed databases. [0]

[0]: https://2ndquadrant.com/en/resources/bdr/

~~~
rattray
> Supporting more than 48 nodes

Hmm, on the mailing list announcement they claim 2-48 nodes.

EDIT: perhaps they're referring to support for read-replicas that can go
beyond the 48 primaries.

https://www.postgresql.org/message-id/CAH%2BGA0qO0q9NJhKxaoo8CV-QR9Cuw6LtDGDb33pq16rU88F_dA%40mail.gmail.com

~~~
pgaddict
AFAIK there's no explicit limit per se - the BDR developers tested it with 48
nodes, but it will work with more nodes.

But there are things you'll probably run into - for example this only supports
full mesh topology, i.e. each node speaks to every other node. So the more
nodes you have, the higher the number of management connections.

------
baq
It's not immediately clear to me when I would use BDR and when I would choose
XL.
Can somebody do a quick comparison? I like this quote in the announcement
email:

---

IS Postgres-BDR RIGHT FOR YOU?

BDR is well suited for databases where:

- Data is distributed globally

- Majority of data is written to from only one node at a time (For example,
the US node mostly writes changes to US customers, each branch office writes
mostly to branch-office-specific data, and so on.)

- There is a need for SELECTs over complete data set (lookups and
consolidation)

- There are OLTP workloads with many smaller transactions

- Transactions mostly touching non overlapping sets of data

- There is partition and latency tolerance

However, this is not a comprehensive list and use cases for BDR can vary based
on database type and functionality.

In addition, BDR aids business continuity by providing increased availability
during network faults. Applications can be closer to the data and more
responsive for users, allowing for a much more satisfying end-user experience.

---

But I can't seem to find something similar for XL and especially a diff
between the two.

~~~
brian_cloutier
I'm by no means authoritative, but XL lets you distribute PostgreSQL and keep
transactionality. That's great if you're a bank, but it's necessarily slow and
not Available (in the CAP sense), so running it over a WAN might be a bad
idea.

BDR lets you distribute your database but throws transactionality out the
window. You have two independent databases which stream changes to each other.
A lot faster, and well suited for WANs, but not Consistent.

~~~
icebraining
Offtopic, but I find it funny how we use banks as the paramount example of a
need for consistency, when they're actually not :)

~~~
rantanplan
I would be interested if you could elaborate a bit.

~~~
DrJokepu
Banks basically don't use database transactions to record financial
transactions. Instead, failed operations are rolled back with an inverse
operation. It's done this way to ensure that there is an audit trail of every
operation, rolled back or not.
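
The pattern described above can be sketched as an append-only ledger where
"rollback" means posting a compensating entry. This is a toy illustration of
the idea, not any real bank's system; all names here are hypothetical.

```python
# Toy append-only ledger: failed operations are never deleted. Instead a
# compensating (inverse) entry is appended, preserving the audit trail.

class Ledger:
    def __init__(self):
        self.entries = []  # append-only list of (entry_id, account, amount)

    def post(self, account, amount):
        entry_id = len(self.entries)
        self.entries.append((entry_id, account, amount))
        return entry_id

    def reverse(self, entry_id):
        # Compensate by posting the inverse amount; the original stays visible.
        _, account, amount = self.entries[entry_id]
        return self.post(account, -amount)

    def balance(self, account):
        return sum(a for _, acct, a in self.entries if acct == account)

ledger = Ledger()
e = ledger.post("alice", 100)   # credit 100
ledger.reverse(e)               # "roll back" via an inverse entry
print(ledger.balance("alice"))  # 0, but both entries remain in the log
print(len(ledger.entries))      # 2
```

Nothing is ever removed from `entries`, so an auditor can reconstruct every
operation, including the failed ones.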

~~~
djsumdog
Huh. So banks really couldn't use event sourcing, could they? Or maybe they
could, so long as every failure was a new event.

~~~
tmd83
I would assume that banks could be the best enterprise-profile users of event
sourcing. Traditional applications use DB rollback for failure mitigation,
while banks are explicitly recording the compensating operation. The event is
already there, so they are essentially using the same concept as event
sourcing, if I'm not mistaken.

And I think it's much easier to make it work. There is a lot of associated
domain logic, but the core of banking is a series of transactions (I've never
worked in banking, so I could be completely wrong): source, destination,
amount. It looks like a poster child for event sourcing / log-based
architecture.

------
Klathmon
So this might not be the right place for this, but I'm curious.

How do people deal with "eventual consistency"?

In my head once a transaction is done, it's everywhere and everyone has access
to it.

What happens if 2 nodes try to modify the same data at the same time? Or what
happens if you insert on one node, then query on another before it propagates?
And if the answers to those questions are what I think they are (that bad
stuff happens), how do you set up your application to avoid doing it?

~~~
sanderjd
This is a really big question that quickly gets into distributed systems
theory in general, but in part, the idea is to recognize that a lot of times
the "bad stuff" is not so bad. For instance, if someone updates their profile
picture on Facebook, and I make a request that should include their profile
picture, but their update hasn't propagated to the node I'm reading from, I
just get the old profile picture, and that's a-ok. There are definitely nasty
things (like simultaneous updates) that you have to be aware of, but for a lot
of applications, there are a lot of cases where out of date data makes little
difference.

~~~
Klathmon
I guess the part I have trouble with is how do you separate what can't be
delayed from what can?

Is it just a matter of keeping the information on 2 separate systems, or are
there tools that let you for instance mark one query as "take your time and
make sure everything is 100% up to date before committing this" and another as
"return whatever you got, it's not important"?

~~~
gchpaco
Depends on the database, but Cassandra, for example, has a quorum mode for
writes, which requires that a majority of the cluster members acknowledge the
write. This can be enabled on a per-query basis, and also for reads.

The other way of doing it is things like CRDTs
(https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type) which have
a join operation for any two data values.

You have to keep it in the back of your mind that it's a thing, but working
without consistency can be done.
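
A minimal sketch of the join operation mentioned above, using a grow-only
counter (G-Counter), one of the simplest CRDTs. This is an illustrative toy,
not BDR's or Cassandra's implementation.

```python
# Minimal G-Counter CRDT: each node increments only its own slot, and
# merge takes the element-wise maximum. Because max is commutative,
# associative, and idempotent, replicas converge regardless of the order
# in which they exchange state.

class GCounter:
    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node_id -> count

    def increment(self, n=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def merge(self, other):
        # The "join" operation for any two counter values.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        return sum(self.counts.values())

a, b = GCounter("a"), GCounter("b")
a.increment(3)   # writes on node a
b.increment(2)   # concurrent writes on node b
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 5 5 (both replicas converge)
```

Merging in any order, any number of times, yields the same result, which is
what makes working without coordination safe for this data type.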

~~~
im_down_w_otp
Quorum and CRDTs deal with completely different problems.

CRDTs do one thing... they mitigate the issue of "lost updates". All
acknowledged writes will be represented in the results of a query and no
"winner" strategy is involved that would cause some acknowledged writes to be
possibly incorrectly dominated by others and thus lost.

Quorum (strict) just provides a very, very, very weak form of consistency in
the case of concurrent readers/writers (RYW) and just very, very weak
consistency in the case of serialized readers/writers (RR).

My personal opinion is that any eventually consistent distributed database
that doesn't have built-in CRDTs, or the necessary facilities to build
application-level CRDTs, is a fundamentally dangerous and broken database
because unless all you ever write to it is immutable or idempotent data,
you're going to have your database silently dropping data during operation.
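
The "lost update" failure mode described above can be shown in a few lines:
under a last-writer-wins (LWW) register, one of two concurrently acknowledged
writes is silently dominated. A hypothetical toy, not any real database's
conflict resolution code.

```python
# Last-writer-wins merge over (timestamp, value) pairs: the higher
# timestamp "wins" and the other acknowledged write is silently lost.

def lww_merge(a, b):
    return a if a[0] >= b[0] else b

# Two nodes accept concurrent writes to the same key and ack them both.
write_on_node1 = (1001, "alice@example.com")
write_on_node2 = (1002, "alice@example.org")

merged = lww_merge(write_on_node1, write_on_node2)
print(merged[1])  # "alice@example.org": node1's acknowledged write is gone
```

A CRDT avoids this by defining a merge that represents both writes in the
result instead of picking a winner.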

------
jimktrains2
> I’m pleased to say that we’ve just released Postgres-BDR 1.0, based on
> PostgreSQL 9.4.9.

While I'm still excited to play with it, I can't wait until pglogical comes
into mainline.

~~~
continuations
How does pglogical differ from PostgreSQL's built-in streaming or log-shipping
replication?

~~~
pinaraf
PostgreSQL streaming: sends disk diffs over the network as raw binary;
absolutely unusable if you don't have the same PostgreSQL version on the other
side.

Logical: sends data diffs over the network. It can be used for replication,
but also for audit or for feeding different databases...

------
koolba
Anybody in the HN crowd have experience using this? How does it perform on a
WAN?

~~~
eloycoto
We've used it for a few months now, and it works. We have 3 datacenters with
~10ms of latency between them and no problems.

Over a WAN it will work, but I'm not sure how reliable it will be. Regards.

~~~
baq
Have you tried PostgresXL and if yes, can you make a high level comparison?

~~~
eloycoto
No, I haven't tried it, so I can't compare. I looked at the
contributors/support from Postgres, and I chose BDR.

~~~
pgaddict
Well, we're also Postgres-XL contributors, so ... ;-)

------
DelaneyM
I wonder if it's plausible for AWS to release a customized variant of this
targeted at their data centers and designed for ease of multi-zone deployment?
Something like Azure (which optimizes and facilitates MySQL).

I realize that redshift already loosely meets that definition, but it doesn't
quite work as a globally distributed regionally clustered web service back
end. This is ideal.

~~~
neuronexmachina
Minor point: Did you mean to say "Aurora" (AWS's MySQL-ish DB) instead of
"Azure" (Microsoft's cloud service)?

~~~
DelaneyM
Yes. That is what I meant. :P

Thank you.

------
looneysquash
What I want to know is the timeframe for supporting PostgreSQL 9.5 and/or 9.6.
(Though 9.6 is still in beta.)

Also, my understanding is that they're feeding patches back to postgres, and
want it to eventually run on stock postgres. But it's not clear to me how
progress on that is going.

I was also surprised that UDR was removed, I didn't even realize it was
deprecated.

I'm not actually using the product at all right now, but I've been watching on
the website, because I want to use it eventually.

I'm kind of hoping it works with stock postgres before I jump in. But if not
that, I think I at least want to wait for 9.6 support.

~~~
martinmarques
UDR has been deprecated in favor of pglogical, which is based on better code
already available in PostgreSQL (and production ready... we are actually using
it with customers on production systems).

There will not be any BDR release for 9.5. That has been commented on many
times on the pgsql-general list.

There will most likely be a 9.6 version, but it won't run on stock postgres
(you'll still need to have a patched version of postgres).

If the rest of the features BDR relies on get in for the PG 10 release, then
there will be only a BDR extension which you'd have to install (no patching
postgres).

~~~
bonesmoses
That may be the case, but they need to update the documentation to reflect
that. This is what it currently says:

"Note: All the new features required have been submitted for inclusion in
PostgreSQL 9.5. Many have already been accepted and included. If all the
functionality BDR requires is added to PostgreSQL 9.5 then the need for a
modified PostgreSQL will go away in the next version."

Well, PostgreSQL 9.5 has been out for several months now, and there's no note
at all for 9.6. For a 1.0 announcement, that's a pretty strange omission.

So we get something that only really works with a patched version of 9.4 when
9.6 is going to be released soon, and talks about 9.5 like it isn't out yet?
No thanks.

------
nubela
Any reasons why I should not use this? Any other (Postgres-esqe) alternatives
for such a solution?

~~~
mason55
> _Any reasons why I should not use this?_

If your use case can't handle eventually consistent data or you have no way to
successfully resolve conflicts/never cause conflicts then this product is not
for you.

They've chosen availability over consistency, so when the network between your
DB servers partitions your applications will still function. But partition A
won't see what's happening on partition B and vice versa and when the
partition heals you might have conflicts to resolve.

One great use case I can think of for this is a case where you have
geographically clustered clients that need to read globally but only write
locally. As long as you model your data correctly you could set up a master in
each region so that your writes were fully ACID compliant then have the
masters replicate to each other so that data from other regions was available
for reading. You'd get the benefit of distributed writes, your local clusters
could continue working even if there was a partition or one of the other
clusters went down, and you'd assure yourself that you wouldn't have
conflicts.
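
The write-local / read-global pattern above can be sketched as a small routing
layer: each region writes only to rows it "owns", so cross-node conflicts
cannot occur, while reads may use any (possibly stale) local replica. All
names here (regions, connection targets) are hypothetical.

```python
# Hypothetical region-ownership routing for a multi-master setup.
REGION_OF_CUSTOMER = {"cust-us-1": "us", "cust-eu-1": "eu"}
LOCAL_REGION = "us"

MASTERS = {"us": "db-us.example.internal", "eu": "db-eu.example.internal"}

def choose_node_for_write(customer_id):
    # Writes must go to the master that owns this customer's rows;
    # refusing foreign writes is what guarantees conflict-freedom.
    owner = REGION_OF_CUSTOMER[customer_id]
    if owner != LOCAL_REGION:
        raise PermissionError(f"{customer_id} is owned by region {owner}")
    return MASTERS[owner]

def choose_node_for_read(_customer_id):
    # Reads always use the nearby replica; data from other regions
    # arrives asynchronously and may be slightly stale.
    return MASTERS[LOCAL_REGION]

print(choose_node_for_write("cust-us-1"))  # db-us.example.internal
print(choose_node_for_read("cust-eu-1"))   # db-us.example.internal
```

Writes stay fully ACID against the local master, and replication between
masters only ever carries non-conflicting changes.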

------
zzzeek
um, LICENSE? Am I missing something?

[https://github.com/2ndQuadrant/bdr](https://github.com/2ndQuadrant/bdr)

~~~
kej
The link at the top of the repo goes to the product page, which says:

>Postgres-BDR, an extension to PostgreSQL is free and open source licensed
under the [same terms] as PostgreSQL.

So it's under the PostgreSQL license, which is similar to MIT.

------
rattray
They claim performance comparable to Hot Standby:

https://2ndquadrant.com/en/resources/bdr/bdr-performance/

and, in some cases, ~1.5x over Slony.

------
onderkalaci
Is there any documentation on "How BDR works?" or something similar?

~~~
srparish
http://bdr-project.org/docs/stable/index.html

------
brightball
Can't wait to try this out.

I wonder how long it will take Heroku, RDS or another PG provider to make this
available?

~~~
Artemis2
Surprisingly I haven't found a lot of HA-ready (actual HA, no failover) hosted
PostgreSQL. There's Citus Data but that's a fork and they're expensive.

~~~
Judson
Just wanted to point out that Citus Data is no longer a fork of Postgres and
is now open source[0]. (Though, I'm sure their managed service is 'expensive'
compared to unmanaged).

[0]: https://www.citusdata.com/blog/2016/03/24/citus-unforks-goes-open-source/

------
therealmarv
Reading this, I wonder how you do OS or Postgres updates with one normal
Postgres (or a typical master+slave setup) without downtime and without using
Postgres-BDR. Does somebody know?

~~~
colanderman
OS upgrades, sure, just use two systems with one of the existing async rep
solutions between them. You'll need to make the master read-only for a brief
period before failover as the slave "catches up" with the master.

Online PG upgrades weren't possible until now because the binary log format
changes. But pglogical (which underpins BDR) solves this issue:
https://2ndquadrant.com/en/resources/pglogical/
The process would be the same as for an OS upgrade (a brief period of
read-only availability required), but you can do it on one machine.

------
rattray
It seems like this is very much a (FOSS) product of 2nd Quadrant. Does anyone
here have experience with them? What is their reputation like?

~~~
Tostino
I haven't worked with them directly, but I have followed the pgsql-hackers
mailing list; they have a bunch of contributions back to core Postgres and
have some core maintainers on staff.

They seem very open with everything they do, as opposed to EDB, which does
contribute back but also has a bunch of closed-source tools and its own
closed-source Postgres fork.

------
therealmarv
Wouldn't this be an ideal Docker database? But I could not find a
well-maintained Docker image on the hub:
https://hub.docker.com/search/?isAutomated=0&isOfficial=0&page=1&pullCount=0&q=postgres+bdr&starCount=0

~~~
cookiecaper
Docker and database do not mix.

~~~
thom_nic
Source? I have not seen a strong argument against running Postgres or other
DBs in a container. The data directory should be mapped to a persistent
volume, of course.

~~~
cookiecaper
Source is a basic understanding of containers and a basic understanding of
databases.

a) Docker and other container systems are still _very_ young. They're missing
a lot of badly-desired features and the community is still coalescing on best
practices and safe approaches. It's certain there are complex and significant
bugs. This is triply true if you're using an orchestration framework over the
top like Kubernetes. This isn't a reason why DBs are a bad fit in itself, but
it's a reason why running production data in Docker is a bad idea at present.

b) Docker is meant for stuff that's portable and can be isolated from its
hardware. It's meant to make it easy to run many applications on one machine
without the resource overhead of virtual machines. DBs are distinctly not
well-suited to this environment. DBs require a good deal of tuning and
hardware awareness to run properly. DBs want to run forever and keep
frequently-used data in memory; often, restarting a DB server is a big deal
because it makes the caches go cold and the server is slow until they warm
back up. DBs don't take kindly to sharing a box with 30 other applications. On
a VM at least you know you have a dedicated chunk of memory. Not the case on a
Docker host. As I discuss in (c), Docker is basically the antithesis of
long-running; there are many ways to make your container suddenly disappear or
stop.

c) There are many non-obvious gotchas involved in Docker usage; it's a very
non-intuitive interface. For example, using `docker attach` to connect to a
running container will often place the user in control of the process. A
simple ^C, which most sysadmins would interpret as "OK, I'm done with this
log", closes the container's process and cleans up your container. That's just
one example of a risk among many brought to you by the counter-intuitive and
unfriendly Docker UX. Not the kind of fragility we want for something as
mission-critical as a database system. Again, Docker is meant for processes
that are consequence-free if they're cleaned up. Their UX is obviously
designed that way too. You're supposed to run 8 containers from the same image
and if one goes away the LB detects it and it's nbd. Databases don't work like
that.

d) Docker sometimes becomes a zombie and fails to respond to commands. You
can't connect to any of your running containers. You can't issue start or stop
commands. You can't get output from `docker ps`. This has happened to me on
multiple occasions. Do we really want a production DB running in that context?

e) You mentioned it, but storage. Even persistent volumes can be a PITA to
configure properly with Docker, and if you fail to do so, you are looking down
a black hole backed by the slow AUFS virtual filesystem. By default, all data
written to a Docker container goes into its own differenced image, and when
you stop the container, it "goes away" (though it can usually be recovered by
calling up the specific container ID instead of restarting from the image).
These AUFS volumes have a habit of consuming a lot of disk space on the host,
which sometimes causes point (d) to occur if you get to 0 free space. Volumes
cannot be mapped at build time and must be defined at runtime. There are many
bugs with data volumes, including data loss bugs and filesystem feature bugs.

The long and short of it is that Docker is oriented toward applications that
aren't close to the metal and can have their log and write output easily
redirected to more durable systems. This fits the bill for many web apps (the
application part, not the database parts). It absolutely does not fit the bill
for long-running, close-to-the-metal, mandatory-firm-and-reliable-storage
systems like databases.

I use Docker to run local development databases (as well as applications). I
wouldn't use it for any DB more important than that. Docker is a cool set of
abstractions around jails, but it is _not_ a universal solution, just as cloud
isn't.

~~~
therealmarv
Container virtualization on Linux is actually quite old. It first appeared in
Linux around 2001:
http://rhelblog.redhat.com/2015/08/28/the-history-of-containers/
and the concept, realized in practice in commercial products, is even older;
it has roots back to 1979!
https://dzone.com/articles/evolution-of-linux-containers-future

Docker is only a philosophy (one application in one container) and a toolset
around these containers. And Docker itself is quite new in comparison.

"This isn't a reason why DBs are a bad fit in itself, but it's a reason why
running production data in Docker is a bad idea at present."

You cannot say this in general. As with everything in the world, it really
depends on what you are trying to achieve with your system.

~~~
cookiecaper
Yeah, I understand that the concept of process isolation has existed for a
long time. We used to know them as "chroot jails". cgroups obviously is a
feature that has been dormant in the Linux kernel for a long time (until
Docker came along and convinced everyone they had to use them, which actually
isn't really a point in favor of containerization either: "we depend on this
lightly tested feature of the kernel!"). Those things on their own are
substantially different from what are today known as containers, a mix of
concepts that involves custom daemons, complex network routing layers, bolt-on
orchestration components like Swarm/Kube, and all sorts of other voodoo that
makes today's understanding of "containers" distinct from historical uses
either of the term or the concepts implemented via jails/zones/whatever.

------
petepete
Title makes it sound like people are still writing extensions for PostgreSQL
1.0.

