
Migrating Our Django App from MariaDB Galera to PostgreSQL and Patroni - _rami_
https://behind.pretix.eu/2018/03/11/mysql-to-postgres/
======
manigandham
These traditional relational databases are fundamentally single-master
systems. They have all gotten better at replication and failover, but multi-
master is not something that can just be bolted on, and doing so will never go well.

Most companies don't really need multi-master either; the performance of
modern servers is so high that failover is usually good enough. If it's really
needed, it's better to look at options like CockroachDB or TiDB, which are
built from the ground up to be multi-master.

Also, it's unclear from the article: why would the followers be so out of
sync? Is there a really bad network? Especially since, if they set quorum
writes on the master, at least one follower should always be up to date.
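In Postgres 10 terms that's a one-line setting, something like (the standby
names are placeholders):

    # postgresql.conf: commit waits for at least one of the two standbys
    synchronous_standby_names = 'ANY 1 (node2, node3)'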

~~~
_rami_
> Also, it's unclear from the article: why would the followers be so out of
> sync? Is there a really bad network? Especially since, if they set quorum
> writes on the master, at least one follower should always be up to date.

Yeah, I should have gone into more detail there. I'm talking about cases like
after an outage when there is a lot to catch up with, or worse, if there are
conflicting timelines (e.g. after a power outage) and synchronization stops
working completely -- the broken follower will still answer queries and I've
not yet found a simple way to monitor this condition.
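Plain replay lag, at least, is easy to watch on a standby with something like
the query below; it's the silently-broken case that's the hard part:

    -- on a standby: time since the last replayed transaction
    SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;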

------
elnygren
Why do you want SQLite for development? Running Postgres on your dev machines
is a docker one-liner that you can put in the README.md for your devs to
copy-paste.
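Something like (image tag and password are just examples):

    docker run -d --name dev-postgres -p 5432:5432 \
        -e POSTGRES_PASSWORD=secret postgres:10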

Most sane CI services also offer Postgres.

~~~
_rami_
Our development setup currently does not need Docker at all. In fact, it does
not need a lot more than git and a recent Python version:
[https://docs.pretix.eu/en/latest/development/setup.html](https://docs.pretix.eu/en/latest/development/setup.html)

I agree that this is an opinionated setup and I do see the value of a
development setup that mirrors production closely.

(1) During testing, in-RAM SQLite databases are really fast and quite capable
of running tests in parallel (see the settings sketch below). We have a test
suite of ~2000 tests that currently needs ~4min to run on a modern notebook
with four threads, so this matters.

(2) pretix is an open source project. If we only had a small number of
developers on an in-house team, I wouldn't care at all. However, there are
many people contributing small and large features and fixes to pretix, and
many of them are drive-by contributions: people fixing a problem they just
experienced and then leaving again. I want to make it easy for these people,
even if they use operating systems where installing Docker first might be a
hoop they do not want to jump through.
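Regarding (1): with Django's SQLite backend this needs no extra configuration
-- the test runner already defaults to an in-memory database. A minimal
settings sketch:

    # settings.py sketch: with the sqlite3 engine and no explicit
    # TEST name, Django's test runner uses an in-memory database
    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": "db.sqlite3",
        }
    }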

That said, deprecating MySQL at some point would also force all self-hosting
users of pretix to go through all the steps we did here, so this wouldn't be
viable in the short term.

~~~
fovc
2000 tests in 4 minutes actually sounds really slow. We use Postgres and our
Django tests run at roughly 9000/min on 3-year-old laptops.

Start your pg container once and use Django's -k flag when running tests to
avoid start-up times. We also take care to separate DB access from business
logic and to unit-test those separately.
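For example (assuming -k here means --keepdb, which preserves the test
database between runs):

    # reuse the test database instead of recreating it, and
    # split the suite across four processes
    ./manage.py test --keepdb --parallel=4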

~~~
_rami_
Sure, the main reason for this is that nearly all of our tests touch the
database and operate on a comparatively high level; we have way more things in
a fuzzy category between functional and integration tests than actual unit
tests. This is of course a flaw in our test writing, but it's what we have
right now ;)

~~~
bpicolo
Top-down integration tests might be slow, but they're a really top-notch way
to actually know your code works in dynamic-language, MVC-style webapps. Can't
blame you there.

------
zzzeek
> We talked to a few database experts at the side of one or two conferences
> and this made us lose any trust that Galera is able to perform even remotely
> well in terms of transaction isolation, constraint enforcement, etc., even
> though we did not observe such problems in practice ourselves.

That's a statement really short on specifics. Galera has nothing to do with
constraint enforcement; that is a function of the underlying MySQL/MariaDB
database, and the major constraint-enforcement issue there is CHECK
constraints, which only MariaDB 10.2 actually supports, and that's a very
recent thing. Similarly with transaction isolation: that's also a function of
the MySQL/MariaDB engine, with the exception of Galera nodes accepting write-
sets from other nodes. While those introduce the concept of your whole
transaction being rejected, they don't introduce any other negative isolation
effects, except if you were hoping to have your transactions talk to each
other with dirty reads.
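To make the CHECK constraint point concrete (a sketch; exact behavior depends
on the version):

    CREATE TABLE t (price INT, CHECK (price > 0));
    INSERT INTO t VALUES (-1);
    -- MySQL and MariaDB before 10.2: the CHECK clause is parsed but
    -- silently ignored, so the row is inserted
    -- MariaDB 10.2+: the insert fails with a constraint violation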

Like a lot of these "we migrated from X to Y" stories, all the negatives they
refer to here that are Galera-specific are resolvable. The "MySQL sucks" part
of it, e.g. the issue with the query planner, sure. Multi-master PostgreSQL
will be extremely useful if a product as easy to use as Galera is produced.

------
zimbatm
Here is how to do it with zero downtime:

Each customer has a completely separate data set, except maybe for login
information. This makes it a good candidate for sharding. Sharding is also a
good idea because eventually one big ticket sale event is going to hog the
whole system, and having a way to isolate it allows the rest of the customers
to keep working.

1) Add a shard ID to the customer table. If it's blank, use the existing
database (see the router sketch after this list).

2) Create additional master-slave pairs with the same DB schema

3) Create a tool to migrate customer data between databases

4) Migrate customers individually, with minimal downtime during each
transition. With streaming replication this can be turned into zero downtime.
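On the Django side, the routing from step 1 could look roughly like this (all
names here are hypothetical):

    # Hypothetical Django database router: send a customer's queries to
    # their shard, falling back to the existing "default" database while
    # shard_id is still blank.
    class CustomerShardRouter:
        def db_for_read(self, model, **hints):
            instance = hints.get("instance")
            customer = getattr(instance, "customer", None)
            if customer is not None:
                return customer.shard_id or "default"
            return None  # no hint: let Django fall through

        def db_for_write(self, model, **hints):
            return self.db_for_read(model, **hints)

    # settings.py:
    # DATABASES = {"default": {...}, "shard1": {...}, "shard2": {...}}
    # DATABASE_ROUTERS = ["myapp.routers.CustomerShardRouter"]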

WordPress has similar scaling issues and has talked a lot about how they
handle them.

------
shlomi-noach
I'm curious whether pretix attempted using MariaDB/Galera with single-write
mode, such that the three servers still run synchronous replication, but only
one of them gets the writes.

I suspect (disclaimer: not running Galera myself) that "WSREP has not yet
prepared node for application use" errors and the deadlocks would be
mitigated, and I am curious to learn whether that was indeed the case.

~~~
_rami_
No, we haven't. You are absolutely right that this would probably mitigate the
deadlocks. However:

(1) the deadlocks/WSREP errors were a problem annoying us, while the query
performance was a problem leading to real customer complaints,

(2) as I see it right now, we would have needed to take complicated steps to
ensure that all application nodes always connect to the same database node;
just a hardcoded priority would probably not be enough,

(3) a mechanism for (2) would likely make us lose more of the advantages of
Galera, e.g. short failover downtimes.

~~~
shlomi-noach
Since you already use HAProxy, you should get that almost for free. Your app
would talk to the database through HAProxy, which would direct traffic to the
single writer; HAProxy's health checks would detect which node is the current
single writer and would adapt in the event of failover.
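Roughly like this (IPs and ports are placeholders; the health check is
typically a small clustercheck-style HTTP service on each node):

    # haproxy.cfg sketch: all writes go to one Galera node, the
    # others stand by as backups; the check on port 9200 reports
    # whether a node is currently fit to take writes
    listen galera-writer
        bind *:3307
        option httpchk
        server node1 10.0.0.1:3306 check port 9200
        server node2 10.0.0.2:3306 check port 9200 backup
        server node3 10.0.0.3:3306 check port 9200 backup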

------
clon
>MariaDB Galera is really easy to set up and maintain.

I take exception to that. When you log into all of your nodes after a network
snafu and discover that every single node has attempted to perform a full
state transfer, involving the destruction of the data directory...

    1. rm /var/mysql
    2. Attempt to do FST
    3. Crap happens

After a while, all nodes were left with no data. Off to backups :)

I am sure there is a way to prevent that, but Galera is still not easy
ops-wise.

~~~
tetha
Maybe it's my small-scale experience, but so far, multi-master relational
databases with automated failover seem like a lot of risk for just a little
payoff. If you can handle the risk and the operational/development cost, and
you need the payoff, go for it. I'm not in that spot.

If I have a master01 with master02 replicating as a standby, I can switch
between these two masters within 5-15 minutes, depending on my setup and at
what infrastructural level I do the switch -- I could reconfigure the
application, switch a DNS entry, or use a load balancer like MaxScale. It's
downtime, but it's a low-risk recovery with well-known impacts.

With a multi-master setup, I have to do at least two things. First, I must
ensure my application's transaction safety: read-write splits with an
application with bad transaction management are "fun", and write splitting
will end up even more of a mess. Then I need to set up and operate the
multi-master setup itself, which is a non-trivial decision and selection, imo.
Just look at the number of possible solutions for Postgres.

This in turn allows the system to fail over automatically in case of trouble
-- something my current infrastructure would have had to do 3 times over two
years, and it would have helped 2 times at most. Except if the failover itself
fails and ends up harder to handle than the database failure would have been,
like in your case or in other really scary Postgres failover horror stories.

And interestingly enough, in our B2B context, our customers actually prefer a
well-known, low-risk failure plan, even if it means 30 minutes of outage.

~~~
clon
This is very much our experience as well. As complexity increases, the
marginal utility goes down, while operating concerns skyrocket.

You really need pretty special operational needs to run multi-master
relational databases.

It is an engineering tradeoff between complexity and fragility.

In our case, Galera/PXC ended up with a significantly worse availability
record than our previous simple master-slave failover system. See my
other comment for more details.

------
abkfenris
Patroni has now implemented a sync endpoint that you can check with HAProxy:
[https://github.com/zalando/patroni/pull/578](https://github.com/zalando/patroni/pull/578)
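A rough HAProxy sketch against Patroni's REST API (assuming the default port
8008, with /sync returning 200 only on a synchronous standby):

    # route read traffic to the in-sync standby
    listen postgres-sync
        bind *:5433
        option httpchk GET /sync
        http-check expect status 200
        server node1 10.0.0.1:5432 check port 8008
        server node2 10.0.0.2:5432 check port 8008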

------
tbarbugli
Am I the only one thinking this is a terrible buy-versus-build decision?

You can get HTTP load balancing, PostgreSQL with automatic failover, and
queuing up and running in a matter of minutes on AWS or GCE.

~~~
jimktrains2
Perhaps they don't want to be locked into a vendor's ecosystem or want newer
versions of postgres or a different configuration?

------
nh2
Interesting article.

> Your browser is usually clever enough to use the working IP address if the
> other one is not working.

I found this not to work well. Have you tested it? For me it took 1-2 minutes
for Chromium to switch over to the second entry after it had loaded the page
from the first entry once (with a white page and a spinning spinner during
that time); this is usually not acceptable for site visitors, as they will get
impatient and leave after just a couple of seconds.

What I'm using instead is multiple A records _in combination_ with active
removal of down IP addresses from the response, using e.g. Route53 health
checks and 10-second TTLs.
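A Route53 record in that setup looks roughly like this (IDs and IPs are
placeholders):

    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "www.example.com.",
        "Type": "A",
        "SetIdentifier": "server-1",
        "MultiValueAnswer": true,
        "TTL": 10,
        "HealthCheckId": "<health-check-id>",
        "ResourceRecords": [{"Value": "192.0.2.1"}]
      }
    }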

------
innocentoldguy
While I think, overall, the article provides good information, I would not
recommend developing with SQLite if you're using some other database in
production. I have run into too many cases where doing this leads to
hard-to-find bugs.

~~~
_rami_
Our CI runs our test suite against all of SQLite, MySQL and PostgreSQL, which
has worked well enough that I can only remember one or at most two bugs caused
by this. But yes, wise words.
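On a Travis-style CI this is just an env matrix, roughly (the DATABASE
variable is whatever your settings module reads):

    # .travis.yml sketch: run the same suite once per backend
    env:
      - DATABASE=sqlite
      - DATABASE=mysql
      - DATABASE=postgresql
    services:
      - mysql
      - postgresql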

------
aargh_aargh
Thanks for the detailed write-up, especially the dead-ends/failed approaches.

------
codedokode
This link doesn't open in either Chromium 46
(ERR_SPDY_INADEQUATE_TRANSPORT_SECURITY) or Firefox 45 (it shows just a blank
page). Is it only me? I think there is something wrong with their server setup.

~~~
ubernostrum
Worked for me because my ad-blockers stopped their analytics script. It loads
over HTTPS, but seems to include code that will load some other URL over HTTP.
That would be an HSTS violation and should cause your browser to bail out.

~~~
_rami_
Hi! :) Do you explicitly see that HTTP URL or is it just a guess? I'm
struggling to find it right now :(

~~~
ubernostrum
In line 43 of piwik.js I see an HTTP (not HTTPS) URL being constructed. I
don't have the time to try to de-minify it and see what it _does_ with that
URL, but it definitely seems to be an HTTP URL.

~~~
_rami_
That's weird indeed (although I don't see any connections to an http: URL in
the dev console). If someone else reports a similar problem, I'll just remove
Piwik altogether.

------
tpetry
Did you evaluate Citus?

~~~
_rami_
We haven't in depth, as they mainly advertise their sharding and distributed
query features and less so their HA features (if those are even in Citus
Community).

Sharding data is something we won't do before we absolutely need it, since
disk capacity is not at all our constraint right now. Our scaling problem
isn't that it's hard to serve many tenants at once; rather, we may need to
serve one tenant with very high concurrency the minute their tickets go on
sale (which would only hit one shard in a usual sharding setup).

Are you using Citus for HA? Would love to hear your thoughts!

------
agnivade
> MariaDB has a hard limit of using one index per table in a single query.

Wait, seriously? What is the rationale behind something like this?

~~~
tetha
This does not seem to be generally true. MySQL > 5.5 and MariaDB > (something
less than) 5.3 support index_merge in their query plans. Skimming the MySQL
manual, this allows MySQL to evaluate different parts of the WHERE clause on
different indexes and merge the results later on. Most examples on the MySQL
manual page also use just a single table in their queries.
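For example, given separate indexes idx_a(a) and idx_b(b):

    EXPLAIN SELECT * FROM t WHERE a = 1 OR b = 2;
    -- type: index_merge
    -- Extra: Using union(idx_a,idx_b); Using where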

