

How database replication helps me sleep at night - martian
http://engineering.thumbtack.com/2011/04/06/how-database-replication-helps-me-sleep-at-night/

======
pkteison
Cons lists are massively oversimplified. What about the case reddit had
recently, where their masters and slaves got out of sync (presumably because
master didn't fully flush a commit to disk, e.g. write buffering was on
somewhere)? Hot standby is a hard problem, and has far more cons than just
"you can't recover from drop table". When the fortune 500 company I used to
work at would try replication with Oracle, it was often responsible for poor
write performance, and sometimes we'd end up with two points of failure (a
problem with either of 2 dbs could now make a system effectively unavailable)
where before there had been only one. Maybe it's gotten better, maybe postgres
has a great implementation, maybe our DBAs were just horrible... but five
years ago I developed the opinion that the only way to get it right with
traditional databases is to teach the application layer to handle failure, and
sacrifice some part of ACID - and I haven't seen anything to convince me to
change my mind since.

~~~
birken
I agree this is certainly a simplification, but I don't think one grossly so.

A lot of the points you mention do not apply to our setup:

\- Our standby database is guaranteed to be consistent (barring some massive
hardware failure), because we disable write caching on both the drives and the
RAID controller, and the only data being exchanged is the write-ahead log. If
the standby receives corrupt data, it will not apply it and will break off the
replication and maintain its consistent (and stale) state. The underlying file
system of the master and standby are completely distinct, thus any failure on
either machine has no effect on the other.

\- Streaming replication has no effect on write performance on the master, as
the replication is best-effort and asynchronous. Of course this is a
redundancy trade-off, but one that is fine for our needs. If you need to
commit every transaction to multiple machines, that is a whole different and
more complicated problem.

\- Increasing availability was not the main goal for us. If our master
database crashes, our website will break. However, our database state should
be intact (either on the master or standby), and that was our goal. Having
master-master replication or something along those lines is significantly more
complicated and invites a host of problems that involve a lot of trade-offs.
We accept that in case of master database hardware failure, a human is going
to have to fix it. However, we believe with our current system, we can greatly
reduce the time it will take.

------
midas
Aren't there other ways to do database replication?

~~~
birken
(post author)

Yes, there are a lot of third-party replication solutions for Postgres
([http://wiki.postgresql.org/wiki/Replication,_Clustering,_and...](http://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling)),
which offer various features (some of which offer even more complex and
redundent replication options than the built-in replication). However, since
they are all third-party solutions, there are such varying levels of
complexity and support I think you would prefer to use the built-in versions
of replication.

And as discussed in the post, I believe that streaming replication is a very
nice replication solution, and probably good enough for most applications.

~~~
jswinghammer
How do you like Postgres? I'm doing a lot of work with MySQL lately and
curious about Postgres.

~~~
birken
I can't say I have a ton of experience with Mysql, but I am a huge fan of
Postgres. I think especially for people who just want their database to work,
having to worry about which of the storage engines you should be using for
MySQL would be a headache. With Postgres, they focused on one storage engine
and making it work as broadly as possible, and I think that is the right way
to go about it. In addition, administration + configuration is really nice,
the community and mailing lists are active and helpful, and the performance
has been great for our usage. We also use it heavily for geospacial stuff
(using PostGIS), and I think Postgres is ahead of MySQL for that.

I'm sure there are places online that do a much better job comparing them, but
from my experiences with Postgres I would have no reason not to use it for any
relational database needs.

