
Migrating 1200 databases from MySQL to Postgres - yupyup
http://psyenix.blogspot.com/2017/08/migrating-1200-db-from-mysql-to-postgres.html
======
seanharr11
I wrote an open source tool that migrates between any 2 relational databases.

[https://github.com/seanharr11/etlalchemy](https://github.com/seanharr11/etlalchemy)

This includes automatic migration of the schema, constraints, indexes and
obviously data. It leverages fast bulk loading tools like mysqlimport, and
PostgreSQL's COPY FROM. It doesn't need any config. Just plug and play.

The tool uses SQLAlchemy to abstract column types, and handles various column
type conversions between any flavor of SQL.

The one thing it needs is tests, plus scalability support for tables bigger
than a few GB!
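
For a sense of the mechanics - this is a toy sketch with stdlib sqlite3 standing in for both source and target, not etlalchemy's actual code - a schema-plus-data copy between two databases looks like:

```python
import sqlite3

# Two in-memory sqlite databases stand in for the MySQL source and the
# Postgres target of a real migration.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")

src.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
src.executemany("INSERT INTO users (name) VALUES (?)", [("ada",), ("grace",)])

# 1. Schema: read the source's DDL and replay it on the target. A real tool
#    translates column types between dialects here instead of copying DDL.
for (ddl,) in src.execute("SELECT sql FROM sqlite_master WHERE type='table'"):
    dst.execute(ddl)

# 2. Data: stream rows across in bulk (real tools reach for mysqlimport or
#    Postgres's COPY FROM instead of row-by-row INSERTs).
rows = src.execute("SELECT id, name FROM users").fetchall()
dst.executemany("INSERT INTO users VALUES (?, ?)", rows)

print(dst.execute("SELECT name FROM users ORDER BY id").fetchall())
```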

~~~
icelancer
Thanks much - going to test it between MSSQL and MySQL in a few days here!

~~~
seanharr11
Let me know how it goes...please open issues if you find them!

------
willcodeforfoo
I'm working on migrating an app right now (just one MySQL database) but can
highly recommend [pgloader](http://pgloader.io/). It has a bunch of built-in
and configurable rules to cast different column types to PostgreSQL, and the
developer has been very responsive in helping me make weird column types
work, too.

~~~
erjiang
Pgloader can be very powerful. Despite never having used Common Lisp, I found
it easy to add custom column conversion functions without modifying pgloader
itself. You can write additional Lisp files with custom procedures and load
them at runtime.

------
sigi45
I like the approach and think something like this is fun, but I don't
understand the business case at all.

I would have migrated MySQL 5.5 to 5.6 first, then cleaned the shit out of it,
normalized everything, added a few indexes, and suddenly realized that there
is not much left to do.

I like to work with PostgreSQL, but if you have 1.2k DBs which are more or
less not cleaned up and crappy, what are they used for anyway?

~~~
richdougherty
I agree. It sounds like they decided to do a cleanup ahead of an investment,
without much of a real business case. Your migration path seems much, much
simpler - and that's assuming the cleanup is even needed in the first place.

------
SOLAR_FIELDS
One thing the author did not touch on: since the tables were denormalized-ish
in the original MySQL DB, did his application lose significant performance by
having to perform the joins for every single query in the renormalized
Postgres instance? Or were the DBs small enough and indexed well enough that
it didn't really matter?

Might have been worth testing this to see whether it was worth materializing
certain views and refreshing them every 15-30 minutes or so.

~~~
Amezarak
At least in my experience, joins in 100s of GB databases are not a performance
problem. I've always been somewhat baffled by the claims I see pretty
regularly that joins are a major performance concern. Was this a big deal in
the past?

~~~
gaius
_Was this a big deal in the past?_

In MySQL, yes. It lacked the concept of foreign keys.

~~~
kondro
Lack of foreign keys does not affect performance.

In fact, having foreign keys is a net-negative performance-wise because it
means that each foreign key must be checked for every insert/update/delete.

~~~
gaius
Errm, we're talking about performance of joins

~~~
kondro
They don't help with that either.

~~~
Sacho
Perhaps parent is conflating indexes with constraints? I believe InnoDB
creates indexes automatically with foreign keys.

------
berns
One thing that Postgres lacks is accent-insensitive collations. Since I have
clients with databases in Spanish, this is one of the reasons I wouldn't
consider migrating to Postgres. I know I can use the unaccent extension, but I
consider it a poor substitute for proper collations. I guess this isn't a
problem for most people, because it's never mentioned.
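
For reference, the effect of unaccent-style stripping can be sketched with Python's stdlib (an illustration of the idea, not how the Postgres extension is implemented):

```python
import unicodedata

def strip_accents(s: str) -> str:
    # NFKD decomposition separates base characters from combining accent
    # marks; dropping the combining marks approximates unaccent()'s output.
    return "".join(ch for ch in unicodedata.normalize("NFKD", s)
                   if not unicodedata.combining(ch))

print(strip_accents("canción"))  # cancion
```

This only helps for lookups where both sides get stripped; a real accent-insensitive collation would also affect ordering and uniqueness, which is why unaccent feels like a workaround.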

~~~
claar
It appears work is active (May 2017) in this area, but accent-insensitive
collations are still a ways off: [https://blog.2ndquadrant.com/icu-support-
postgresql-10/](https://blog.2ndquadrant.com/icu-support-postgresql-10/)

edit: fulafel corrects me below that you can get accent-insensitive collations
working using PostgreSQL 10's new ICU collation support. Note that PostgreSQL
10 is currently pre-release.

~~~
fulafel
That post says that they currently work, but there is a new OS-independent
collation method coming.

Edit re your edit: the already-working collations are in PG 9.x - the
OS-independent collations are coming in 10.x. (Hence the post title, "More
robust collations with ICU support in PostgreSQL 10")

~~~
fauigerzigerk
But do these "already working" collations include accent-insensitive variants?
I didn't think so, and I don't think they are coming in PostgreSQL 10 either.
Do you have other information on this?

~~~
petergeoghegan
They do. The problem is that equality is still strict binary equality for
text, for obscure technical reasons. So, while you can get extremely flexible
customizations to collation with icu [1], you can't have case insensitive
equality, or equality that ignores accents. This is recognized as a problem,
and is probably going to be fixed in v11.

[1] [https://www.postgresql.org/message-
id/CAH2-Wz=bcgmk97YaZ3rs4...](https://www.postgresql.org/message-
id/CAH2-Wz=bcgmk97YaZ3rs4OoCdn1nOco1HCfRGBCOOty0ztnCnA@mail.gmail.com)

~~~
anarazel
> The problem is that equality is still strict binary equality for text, for
> obscure technical reasons

AKA performance.

------
medius
If you are migrating to AWS RDS, I recommend AWS Data Migration service. I
migrated my live database (~50GB) from Mysql to Postgres (both RDS) with zero
downtime.

I used AWS Schema Conversion Tool for initial PG schema. I customized the
generated schema for my specific needs.

------
seanharr11
A few of these steps could be solved with this tool, including schema/index
migrations, and even initial (fast) data transfer:
[https://github.com/seanharr11/etlalchemy](https://github.com/seanharr11/etlalchemy)

------
tkyjonathan
I'm not against the idea of using Postgres, but I have a sneaking suspicion
that you could have gotten that 30% speed benefit from MySQL if you did some
performance work on it. In other words, I don't feel that migrating from one
relational database to another similar relational database was worth that
much trouble.

~~~
collyw
Thats interesting, as my dev experience would allow me to tune queries better
in Postgres (the visual query planner makes it easier and there are more join
types), while tuning them through configuration on either database would
involve a lot more trail and error reading. I would more likely know what to
do by migrating.
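
For what it's worth, planner output is easy to poke at even without Postgres running; as a rough stdlib analogue of Postgres's EXPLAIN, sqlite exposes EXPLAIN QUERY PLAN (a sketch of the idea, not Postgres's planner):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")

# Indexed lookup: the plan reports a SEARCH using the primary key index.
print(con.execute("EXPLAIN QUERY PLAN SELECT * FROM t WHERE id = 1").fetchall())
# Unindexed predicate: the plan degrades to a full-table SCAN.
print(con.execute("EXPLAIN QUERY PLAN SELECT * FROM t WHERE v = 'x'").fetchall())
```

Spotting SEARCH vs SCAN (or, in Postgres, Index Scan vs Seq Scan) is most of day-to-day query tuning.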

------
aidos
I went through a similar process a few years ago (4k dbs into a single db, all
mysql). You'll get something working fairly quickly and then spend days
debugging data anomalies that have crept in over the years.

We ended up with a Frankenstein mixture of bash scripts, a sequence of SQL
transformations, some Python and some PHP to decode some of the data that had
been stored in pickled / PHP-encoded packages.

It's not something I want to repeat any time soon :-)

~~~
fermuch
It truly sounds like a nightmare. How long did it take to move everything to
psql, and how long until it was stable enough not to need keeping an eye on it?

~~~
aidos
To clarify, it all stayed on mysql — just pulled everything into a single db.

I can't remember exactly, but the whole project was done in a real hurry -
about 4 weeks, I think. It included a rewrite of the application / website
(about 20k lines of backend code, HTML and JS / CSS), migration of all the
data, creation of new transcoders, and migration of a lot of asset/video data
from another server.

No idea how we got through it all in retrospect, but I had to. A few days
after we finished I got married and went off on my honeymoon, so I guess I
decided it was stable enough the day I released it...! :-)

------
JelteF
One of the main things I ran into when migrating from MySQL to Postgres was
that the default text columns (TEXT, VARCHAR, etc.) behave differently when
searching. MySQL matches text case-insensitively, while Postgres matches it
case-sensitively. This resulted in searches for content suddenly not
returning data. Luckily there's an official CITEXT extension for Postgres [1],
which matches text case-insensitively and adds the correct indices for doing
so efficiently.

[1]
[https://www.postgresql.org/docs/9.1/static/citext.html](https://www.postgresql.org/docs/9.1/static/citext.html)
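
The difference is easy to reproduce with sqlite's NOCASE collation as a stand-in for MySQL's *_ci collations / Postgres's CITEXT (a sketch of the behaviour, not CITEXT itself):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (name TEXT, name_ci TEXT COLLATE NOCASE)")
con.execute("INSERT INTO t VALUES ('Alice', 'Alice')")

# Default comparison is case-sensitive, like Postgres TEXT: no match.
print(con.execute("SELECT count(*) FROM t WHERE name = 'alice'").fetchone()[0])
# NOCASE comparison matches, like MySQL *_ci collations or Postgres CITEXT.
print(con.execute("SELECT count(*) FROM t WHERE name_ci = 'alice'").fetchone()[0])
```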

~~~
pasta
_" MySQL matches text case insensitive"_

This is of course only when you set a case insensitive collation (_ci).

But I agree this can be a problem, because case-insensitive collations are
used a lot in MySQL.

Also, other collations can be trouble. For example, the collation that
returns results for both 'ß' and 'ss' (German).
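
Python's stdlib casefolding encodes the same 'ß' / 'ss' equivalence such a collation uses for comparisons (an illustration, not MySQL's collation code):

```python
# Unicode full casefolding maps 'ß' to 'ss', so a casefolded comparison
# treats the two spellings as equal - like a German *_ci-style collation.
print("Straße".casefold())                           # strasse
print("Straße".casefold() == "strasse".casefold())   # True
```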

------
ehxcaet
I'm not entirely sure what the technical reason for this change was. Of
course, cleaning up your DBs and whatnot is a good thing. But why did they
move to Postgres instead of simply cleaning up?

------
noir_lord
Would love to do this to the monstrosity I inherited in my new job, but it's
a Sisyphean task.

One day.

~~~
cntlzw
One time I saw a database, with a single table, with a single text column that
contained the old database.

~~~
shrikant
This sounds similar to what I saw at a client a few years back. All XML
responses from a REST API were inserted into a single VARCHAR(MAX) column (one
row per response, but a response could have multiple "records") in a SQL
Server database.

All interaction with this was through a byzantine web of XPath expressions
using the severely neutered dialect of XQuery that shipped with SQL Server
2008.

Apologies for the mini-rant, your comment appears to have triggered some
painful memories.

------
TekMol
Is it a universally accepted truth now that it's generally better to use
Postgres than MySQL?

Would anybody here use MySQL for a new project? If so, why?

~~~
falcolas
Yes, I would use MySQL. I'm very familiar with it for one, and the extra
storage engine options turn a sure thing into a no-brainer decision.

Ultimately, I think familiarity is and will always be the biggest driver for
experienced devs who choose a DB.

~~~
jeffdavis
What storage engines and options do you use?

~~~
forgot-my-pw
XtraDB if you can, InnoDB if you can't.

~~~
cookiecaper
FWIW, MariaDB 10.2 no longer defaults to XtraDB, now considering InnoDB mostly
on-par. [0]

[0] [https://mariadb.com/kb/en/mariadb/why-does-
mariadb-102-use-i...](https://mariadb.com/kb/en/mariadb/why-does-
mariadb-102-use-innodb-instead-of-xtradb/)

------
nrmitchi
> Now is when one of the main reasons to take Postgres as our new database
> server. Postgres allows the use of Views(include link) supporting INSERT,
> UPDATE and DELETE meanwhile the relation between views and tables are one to
> one, ....isn’t it awesome?

Just a heads up, I think you missed a link in there

------
ibejoeb
When talking about replaying live activity on the test system:

>We took the third option because the other two options were quite intrusive
and they might interfere at the normal use of our app because in both cases
they were going to be in between our users request and our app

One of those two rejected options was goreplay. It is passive, like tcpdump.
Right from the docs:

"Gor is not a proxy: you do not need to put 3-rd party tool to your critical
path. Instead Gor just silently analyzes the traffic of your application and
does not affect it anyhow."

------
StreamBright
Now that is what I call a proper migration. Great content! I was wondering
how somebody could end up with corrupt data - I guess it was probably due to
MySQL not strictly enforcing types.
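
sqlite's type affinity shows the same failure mode in miniature (an analogue of MySQL's non-strict sql_mode, not MySQL itself):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE m (n INTEGER)")
# A lenient engine accepts a value that doesn't fit the declared column
# type instead of rejecting it - this is how "corrupt" data creeps in and
# later breaks a migration to a strictly typed target.
con.execute("INSERT INTO m VALUES ('not a number')")
print(con.execute("SELECT n, typeof(n) FROM m").fetchone())
```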

------
ris
[https://github.com/the4thdoctor/pg_chameleon](https://github.com/the4thdoctor/pg_chameleon)

------
adrianlmm
I tried to migrate from Firebird to Postgres once and failed; Postgres needs
more administration than Firebird.

------
ZeusNuts
What was the reason to migrate first and then repair the data? Repair scripts
could've been run against the old setup too, and you would've had less stuff
to migrate.

