

MySQL::Replication - peer-to-peer based, multi-master replication for MySQL - alfiejohn_
http://lists.mysql.com/replication/2114

======
jermy
I like Slide 251: 'solve it at the application layer' :)

I'm part of a team responsible for cloud-based video editing software. We use
multi-master replication (perhaps also known as optimistic replication) with
our own tools throughout, but it does require careful design - keep as much
data as immutable as possible, every piece of data that might be updated by
different machines at the same time should have its own row, GUIDs on each
row.

Each machine can generate its own local IDs, which look a lot like a timestamp
with some unique stuff on the end. Each row gets a GUID and a 'version' ID
column, and we only update relayed database updates if the incoming version is
newer. This is largely last-timestamp-wins for the case of conflicts (rare
because of design decisions), but there is some Lamport timestamp behaviour in
there too for updating a existing row.

The main downside is still that all machines need to handle every write, but
with batching up incoming processing into larger transactions, we've had no
problems with quite a number of database updates on a dozen commodity
machines. Obviously filtering into different shards would be an easy solution.

I'm looking forward to seeing what other people are doing with multi-master
replication.

~~~
IgorPartola
At TransLoc we use a multi-master setup. At its heart it's two nodes
replicating from each other. Our data is divided into several databases. Each
database "belongs" to only one master at a time. For example if we have nodes
A and B, and databases w, x, y and z, we would have A be responsible for
writes to w and x, while B would be responsible for y and z.

If B fails, we have a monitoring system in place to tell A that it is
responsible for w, x, y and z at once. The monitor sets read-write permissions
for databases y and z on A (through user permissions), and then notifies all
of our application servers that things have shifted. The applications for
their part include a piece of common code that monitors for changes in the
cluster and allows the application code to cope with these changes. For web
requests, if failover happened in the middle of a transaction, the request
fails. For long running processes, the process will have to go to the top of
the event loop and request a new database connection, etc.

So far it's worked fairly well. We are able to achieve high availability with
it, since our master nodes are in two different data centers. There are
definitely issues with this approach in general, but it works for our work
load.

~~~
alfiejohn_
You're kind of in an active-passive multi-master setup which is find for now
but as replication queries increase between the two nodes, you might start to
see load issues. You're going to have to drop in a third machine and a
decision is going to have to be made on how to replicate your data.

~~~
IgorPartola
Yes. The other big problem with this setup is that a single node must be able
to handle all queries if the other node fails. Fortunately, we are nowhere
near saturating the capacity of our nodes under normal operation (we are only
doing about 1,200 queries/second). We are also able to offload read-only
queries to slaves that are replicating off of the masters, which should help
with the read capacity, which is our biggest demand.

------
ww520
Couple years ago I was looking at building highly available database (MySQL in
particular), and looked into the multi-master setup. While sounds good on
paper, its benefits don't warrant the high development and operation cost.

\- The tables need to be changed and the application layer needs to be changed
to support it, which is a big hassle and very fragile. It's easy to introduce
update conflict. It's a nightmare when dealing with group of updates in a
transaction. You can't really roll back a transaction at the replicated nodes.

\- Whenever a node fails, the replication ring is broken and updates pile up
at the previous node, while subsequent nodes' data become stale. It requires
immediate human attention to fix it, which defeats the purpose of a HA
cluster.

\- Related to above. It's very difficult to add a new master node without
stopping the cluster. The "catchup" process is very manual and fragile.

\- Data in different node becomes stale under high replication load. Clients
reading different masters would get stale data. They are supposed to be
masters and got stale data?!

\- Multi-master doesn't help write scalability as all; all nodes need to
handle all writes. MySQL's single thread update in replication doesn't help.
For read scalability, master-slave is better.

I abandoned the design after a while and chose a different approach. I ended
up using a disk-based replication, like DRDB. A two-machine cluster forms the
master cluster, one active and one standby. Writes are replicated to both
machines at disk level synchronously. When the active node fails, the standby
becomes active within seconds automatically with the complete data on disk.

The beauty of this approach is the simple design and setup. The data are
always in sync, no stale data. Failover is immediate and automatic. The failed
node can automatically catch up when back online. The database application
doesn't need any change and all the SQL semantics are preserved. The cluster
has one IP so the clients don't need special connection logic. They just need
to retry when connection fails.

For disaster recovery, I built another two-machine cluster in another
datacenter acting as the slave, which did async replication from the master
cluster. When the two-machine master cluster completely failed (as in the
datacenter got burnt down), the slave cluster can become master via a manual
process within 30 minutes. The 30 minutes SLA is for someone got paged, look
at the situation and decide to fail over. There are too many uncertainties
across datacenters to fail over automatically.

Added bonus, slaves can still hang off the master cluster for read
scalability. And it works with any disk-based databases, not just MySQL.

------
chuhnk
I'm always in favour of technology which solves the replication issues in
mysql. It's been one of the most painful parts of managing databases for
myself.

I would like to throw in the tungsten replicator into this discussion
<http://code.google.com/p/tungsten-replicator/>. I have been researching it
for the past few weeks as a replacement to mysql's built in replication hoping
to solve a lot of the current pitfalls especially in 5.0.x. There is a very
thorough guide here [http://tungsten.sourceforge.net/docs/Tungsten-Replicator-
Gui...](http://tungsten.sourceforge.net/docs/Tungsten-Replicator-
Guide/Tungsten-Replicator-Guide.html)

~~~
alfiejohn_
I can't remember specifically, but I think we evaluated Tungsten Replicator
and at the time it didn't do multi-master replication.

------
brunoqc
I don't know much about replication but what happens when a master dies just
after the database is updated. Will the changes not be replicated? If the user
see that the second master does have the latest changes and he repeat his
latest actions, what will happen when the first master goes back online? Will
the data be duplicated or merged?

~~~
jermy
It depends on your use case - either you accept there is a write hole that the
update wont appear until that host comes back online and other hosts can relay
the change, or you add application-level checking to not return to the user in
the first instance until the change hits at least another host.

In your scenario, yes. The two changes would be a conflict, but hopefully (if
the user makes the same change) your conflict-resolution code would realise it
is the same and merge them correctly. I fear I don't know how
MySQL::Replication handles this, but it has to be application-level logic, so
I assume it will allow you to override a default of, say, last-modification-
wins.

------
Snoddas
"what happens with collisions? when two databases update the same record It's
a race condition solution? solve it at the application layer "

I suspect this is a deal breaker for many. It certainly is for me.

------
alfiejohn_
Ok. It's now on GitHub if anyone is interesting in following:

<https://github.com/alfie/MySQL--Replication>

------
jriddycuz
I'm confused: doesn't the use of a single local relay to which all clients
connect create a single, non-redundant point of failure?

~~~
alfiejohn_
There's nothing stopping you from having standby relays, active relays etc.
You are free to create a topology of your choosing with minimal restrictions.

