
Why we're ending support for MySQL - bjoko
https://about.gitlab.com/2019/06/27/removing-mysql-support/
======
shanemhansen
MySQL is a decent nosql option with mature replication. If you care about your
data, expect MySQL to be your foe rather than your friend.

Do you expect your triggers to fire? You are SOL
[https://bugs.mysql.com/bug.php?id=11472](https://bugs.mysql.com/bug.php?id=11472)

Do you expect to be able to insert utf8 into utf8 columns? Don't be naive.
Read the docs and you'll discover utf8mb4 or as I like to refer to it:
mysql_real_utf8.
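
As a rough illustration of the utf8 complaint: MySQL's legacy `utf8` charset is an alias for `utf8mb3`, which stores at most 3 bytes per character, while real UTF-8 needs up to 4. The sketch below only checks byte lengths in Python and does not talk to a database; the helper name `fits_in_utf8mb3` is made up for this example.

```python
def fits_in_utf8mb3(text: str) -> bool:
    """Return True if every character encodes to at most 3 UTF-8 bytes."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


# Illustrative samples; the last one contains a 4-byte character (an emoji).
samples = ["plain ascii", "naïve café", "日本語", "thumbs up 👍"]
for s in samples:
    print(f"{s!r}: fits in 3-byte utf8 = {fits_in_utf8mb3(s)}")
```

Anything that fails this check needs a `utf8mb4` column to round-trip.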

How about transactions? Surely a real SQL database supports transactions
right? Or at the very least if it doesn't it would complain loudly when you
use a transaction where not supported right? Once again this behavior is
helpfully documented
[https://dev.mysql.com/doc/refman/8.0/en/commit.html](https://dev.mysql.com/doc/refman/8.0/en/commit.html)

Do you expect to be able to put a value into MySQL and get that same value
back out? You are SOL. But it's documented.

I can honestly say that the only appropriate time to use MySQL is when you
can't use postgres and you are willing to move from a RDBMS to something that
requires significantly more work to prevent data corruption.

I respect the hell out of MySQL engineers in terms of raw performance and the
engineering of Innodb. I'm sure they aren't happy there's a million broken
applications relying on defaults they can't change. I'm sure they aren't happy
that fundamental limits in how pluggable storage engines work make ACID
transactions not possible in the general case.

Postgres, for all of its issues, feels like a database that has your back.

~~~
brighter2morrow
I mainly use MySQL because of phpMyAdmin. Is there anything comparable for
Postgres?

~~~
aargh_aargh
[https://www.adminer.org/](https://www.adminer.org/) for both

------
deftnerd
I mean, I'm not too upset that they're focusing on one DB engine, but their
reasons are a bit facetious.

> There are lots of great use cases for MySQL, our specific needs just didn't
> seem to be a good fit. Our use of MySQL had a number of limitations. In most
> cases, it wasn't as simple as adding support to MySQL, but that by bending
> MySQL we typically broke PostgreSQL. To name a few limitations:

> We can't support nested groups with MySQL in a performant way

All they had to do was implement a nested set pattern for their groups [1]
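
For readers who haven't met the pattern, here is a minimal sketch of a nested set table, run against SQLite through Python's stdlib purely so it is executable. The table name `groups_ns`, the column names, and the sample tree are assumptions for illustration and have nothing to do with GitLab's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE groups_ns (name TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER)"
)
# Tree: root > (team-a > (backend, frontend), team-b).
# Each node stores the left/right numbers from a depth-first walk.
conn.executemany(
    "INSERT INTO groups_ns VALUES (?, ?, ?)",
    [
        ("root", 1, 10),
        ("team-a", 2, 7),
        ("backend", 3, 4),
        ("frontend", 5, 6),
        ("team-b", 8, 9),
    ],
)


def descendants(conn, name):
    """All groups nested under `name`: their interval lies inside the parent's."""
    sql = """
        SELECT child.name
        FROM groups_ns AS parent
        JOIN groups_ns AS child
          ON child.lft > parent.lft AND child.rgt < parent.rgt
        WHERE parent.name = ?
        ORDER BY child.lft
    """
    return [row[0] for row in conn.execute(sql, (name,))]


def ancestors(conn, name):
    """All groups containing `name`: their interval encloses the child's."""
    sql = """
        SELECT parent.name
        FROM groups_ns AS child
        JOIN groups_ns AS parent
          ON parent.lft < child.lft AND parent.rgt > child.rgt
        WHERE child.name = ?
        ORDER BY parent.lft
    """
    return [row[0] for row in conn.execute(sql, (name,))]


print(descendants(conn, "team-a"))
print(ancestors(conn, "backend"))
```

The trade-off is on the write side: reads need no recursion, but inserting or moving a group means renumbering `lft`/`rgt` for part of the tree.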

> We have to use hacks to increase limits on columns and this can lead to
> MySQL refusing to store data

A hack? Their DB creation schema specified a TEXT column when it should have
been a LONGTEXT column. Using LONGTEXT is not a hack, it's a choice when your
data is more than 65,535 bytes, and they made the wrong choice out of
ignorance.

> MySQL can't add TEXT type column without length specified

That's just incorrect. What they MEANT to say is that they had a column to
store filenames that was a varchar(255) column and people were running out of
space with long directory paths and filenames. They could have moved to a TEXT
column, but didn't because they thought it couldn't be indexed without
specifying a length... But they were wrong, you CAN index a TEXT column
without specifying the TEXT column length, you just have to specify the length
of the substring you want to index.

Alternatively, since this is filepaths and filenames, they could have used a
nested set pattern again and gotten 255 characters for each component of the
path and a lot more feature options for their search system!

> MySQL doesn't support partial indexes

This is true, but is it really a show stopper?
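
For context, a partial index covers only the rows that match a predicate, which keeps it small when most rows are irrelevant to the hot query. PostgreSQL supports `CREATE INDEX ... WHERE ...`; the sketch below uses SQLite, which accepts the same form, only because it runs from Python's stdlib. The `issues` table and index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE issues (id INTEGER PRIMARY KEY, project_id INTEGER, state TEXT)"
)
conn.executemany(
    "INSERT INTO issues (project_id, state) VALUES (?, ?)",
    [(1, "open"), (1, "closed"), (1, "closed"), (2, "open"), (2, "closed")],
)

# Partial index: only rows with state = 'open' are indexed.
conn.execute(
    "CREATE INDEX idx_open_issues ON issues (project_id) WHERE state = 'open'"
)

# A query whose WHERE clause repeats the index predicate is eligible to use it.
open_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM issues WHERE project_id = ? AND state = 'open' ORDER BY id",
        (1,),
    )
]

# The stored definition shows the predicate is part of the index itself.
index_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'idx_open_issues'"
).fetchone()[0]

print(open_ids)
print(index_sql)
```

Without partial indexes, the usual workaround is a full composite index on `(state, project_id)`, which also indexes every closed row.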

> As a side note – we don't use MySQL ourselves

I think this is the real reason. They just didn't have the necessary talent to
implement the features correctly. Wrong schema specifications and not knowing
to implement nested set patterns is a sign that they don't have a
knowledgeable DBA on staff.

[1]
[https://en.wikipedia.org/wiki/Nested_set_model](https://en.wikipedia.org/wiki/Nested_set_model)

~~~
ken
> All they had to do was implement a nested set pattern for their groups

I've been told a million times that SQL is declarative, meaning you say what
you want, not how to get it.

I've also been told a million times that I need to write my queries in a
specific way, or use a specific data structure, or add hints for the query
optimizer. Otherwise it'll pick the wrong "how" for your "what".

Is there any practical difference between a declarative language with a
Sufficiently Dumb Compiler, and an imperative language with weak features and
an awkward syntax? However I'm forced to write it, we all know what I'm really
trying to do is get it to LOOP first over these items here, and not those
there.

What's the point of being declarative if common tasks require us to bend over
backwards to design our schema/queries/indices/hints in exactly the right way,
in order for it to be performant on two popular SQL databases (when that's
even possible)?

Even if the vaguely-English-like syntax that's completely unlike any other
computer language weren't problematic for those reasons, it seems that the
lack of abstraction is a complete buzzkill. These two databases require
different implementations for "nested groups", but there's no (remotely
portable) way to define a CREATED NESTEDGROUP to allow for a similar
interface.

Of all the languages I have to use, SQL dialects are the ones I hate most. From decades
of writing SQL, I can say GitLab got one thing absolutely right: it's easiest
to just treat it as a family of incompatible proprietary languages. It's
easier for me to target JS and the JVM from the same program than two
different SQL databases from the same queries.

~~~
manigandham
This is why decoupling RDBMS from the application is rarely ever useful, and
why no app really changes databases anyway unless there's a massive cost or
technical problem.

~~~
wyoung2
The app I’ve been working on for the past 24 years is currently on its third
DBMS, and I would be unwilling to bet on whether it’s on its last DBMS.

There is _zero_ SQL code in our app above a very low level. All high-level
code calls through a DB API, and all SQL code is behind that abstraction
layer.
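
A minimal sketch of that kind of layering, assuming a made-up `UserStore` class and SQLite as the backing store (the comment does not say which DBMS or language the real app uses):

```python
import sqlite3


class UserStore:
    """Hypothetical DB layer: the only place in the app that contains SQL."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS users "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def add_user(self, name: str) -> int:
        cur = self._conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        self._conn.commit()
        return cur.lastrowid

    def find_user(self, user_id: int):
        row = self._conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


def greet(store: UserStore, user_id: int) -> str:
    """High-level code: no SQL here, only calls through the layer."""
    name = store.find_user(user_id)
    return f"Hello, {name}!" if name else "Hello, stranger!"


store = UserStore(sqlite3.connect(":memory:"))
uid = store.add_user("ada")
print(greet(store, uid))
```

Porting to another DBMS would then mean rewriting `UserStore` and leaving `greet` and its callers alone.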

That said, each one of these transitions was a “pull up stakes and move” kind
of thing. We never tried to support multiple back-ends at once. So on those
grounds, I think this is probably a good move for GitLab.

~~~
manigandham
If you're using basic ANSI/92 SQL syntax then it'll work pretty much anywhere,
especially with a decent ORM, but as soon as you start using any of the
interesting features of a particular database, then you're locked in.

Supporting multiple databases at this point is usually not economically
feasible.

~~~
wyoung2
That's the thing: it's _super_ tempting to use nonstandard features. On our
last SQL transition, we were most bitten by use of data types the new DBMS
didn't support and by changes in the way "standard" text data types behaved
between the two. (Just as GitLab was, on that last one.)

That's why I don't regret putting all of our SQL behind that DB layer: to port
to the new DBMS, we just had to rewrite parts of that layer, rather than go on
a hunt through the whole application for one-off SQL calls, as would probably
occur if we didn't enforce this discipline.

~~~
manigandham
But why prioritize portability over better functionality at all? Why did you
move databases so often?

~~~
wyoung2
> But why prioritize portability over better functionality at all?

It's not just about portability, or even primarily. It's simply good
application layering to put the DB code behind a layer. Intermingling SQL code
in your application logic is just as bad as intermingling GUI code in your
core app logic.

But even that aside, we haven't felt the lack from putting such code a single
function call layer away. I suppose it may give us a slight amount of pause
when we consider whether it's worth creating another function in that helper
library, but that's probably a good thing. If we decide we can get our work
done by calling one of the existing functions, then why write yet another DB
access function?

Example: if a table is known to have at most 5 rows, and the query that would
allow the DB to return just the one we actually want is complicated, and it's
almost never called, why not call the existing "SELECT all" one and iterate
over the rows? The performance hit of being inefficient will be immeasurable.
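
That trade-off might look like the following sketch, where an existing "select everything" helper is reused and the filtering happens in application code. The `tax_bands` table and both function names are invented for the example, and it assumes a non-negative income.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tax_bands (name TEXT, lower_bound INTEGER)")
conn.executemany(
    "INSERT INTO tax_bands VALUES (?, ?)",
    [("low", 0), ("mid", 20000), ("high", 60000)],
)


def all_bands(conn):
    """The existing 'SELECT all' helper; the table is known to stay tiny."""
    return conn.execute("SELECT name, lower_bound FROM tax_bands").fetchall()


def band_for(conn, income):
    """Pick the matching band in Python instead of adding a new query."""
    eligible = [band for band in all_bands(conn) if band[1] <= income]
    # The band with the highest lower bound not exceeding the income wins.
    return max(eligible, key=lambda band: band[1])[0]


print(band_for(conn, 25000))
```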

> Why did you move databases so often?

That's once every 8 years on average. A _lot_ changes in the computing world
over that span.

The first change did coincidentally happen about 8 years after the company was
founded, and it was because much better DBMS tech became available to us,
within the constraints we had on choosing.

That DBMS lasted us for the next 14 years. By that time, enough had changed in
the computing world that it was worth swapping again. The new one runs about
twice as fast, takes far less code space, and is simpler to use all around
than the prior one.

We'd have to switch again this year in order to maintain the mean of once
every 8 years. My sense is that this is more likely a nonlinearly increasing
growth curve, so that the next switch will be closer to 20 years than 14, if
it happens at all.

~~~
manigandham
We might be talking about different things. Of course using ORMs and query
constructors to programmatically generate SQL is the best option instead of
handwritten code everywhere, but that's a different issue.

What we're discussing isn't the SQL code, it's SQL syntax and features of a
particular database over another. JSONB, partial indexing, text search,
extensions, etc. in Postgres are not available in MySQL, and not using those
features just to have the ability to switch or run on multiple databases is
almost never worth the effort.

Simple CRUD apps with an ORM can probably do so, but otherwise most business
apps shouldn't avoid using functionality for some "what-if" scenario. Having
to switch databases after a decade is just a normal development item if it
needs to happen, and usually coincides with heavy rewrites anyway, so I don't
see much value in pre-planning for that at the architectural level.

------
noneeeed
All their more general points in the post are perfectly valid arguments for
focusing on a single database that I've heard from other projects, but naming
specific technical issues like that actually makes it seem more nit-picky and
less valid.

I'm not sure many people would have found anything to argue with them about if
they had simply said "Targeting multiple DBs requires more resources to
develop and test, limits our ability to make the most of what Postgres offers,
and increases the cost of support. The proportion of users using MySQL is
decreasing, so we have decided to focus on Postgres to give the maximum
benefit to the largest group. We understand this may cause issues for some
users so here is a comprehensive migration guide/toolset".

~~~
em-bee
right, but by naming all the issues they had, they also opened up the
discussion for learning how to solve those problems as the posts here
demonstrate.

i therefore approve of the way the announcement was made, because it brought
more benefits to the readers.

sure, some readers will walk away with the impression that mysql is somehow
limited, but i think that's a reasonable price to pay for everyone else who
gets to learn from the ensuing discussions.

------
rhacker
I remember like 10 years ago it was all the rage to support 45 different
databases. I'm so glad that time has passed. It wasn't feasible. Yes you can
orm the crap out of things so you never see a line of SQL but you end up with
index tuning on multiple database platforms. I noticed the trend thankfully
just sort of walked itself away about the time non-traditional databases
entered the picture and well it was pretty much impossible to support Mongo
and Mysql (reasonably). I'm assuming more things are going back to
Mysql/Postgres/Sybase/DB2 again, but I hope more companies and OSS projects
stick to just one this time?

The bottom line argument is this: When you make the decision to support
multiple databases think of the 10-20 years of changes your product will exist
through (and possibly a lot longer) and realize how much extra work that is
going to be even if you use ORM. Then if you come to the decision Gitlab is -
are you going to want to write the porting software for your users or just let
them be shit-out-of-luck?

------
mutt2016
Unpopular opinion, and anecdotal data point: I stopped using Solaris and MySQL
when oracle took ownership.

Linux and postgres is my new standard.

~~~
petepete
I'm not sure that's an unpopular opinion in these parts.

~~~
chessturk
> these parts.

The keywords here. Even though every dev on my team knows postgres equally or
better than mysql, management chose mysql.

I'm in the "equally" camp, and don't possess a strong personal preference,
but I was surprised that it was the favorite of the devs and not trusted by
the business.

~~~
petepete
If it makes you feel any better, in the GOV.UK department I'm contracting in,
just about every project is using PostgreSQL.

It might take some time but the tide is flowing in PostgreSQL's direction.

------
lifeisstillgood
They just decided to use boring technology
([http://boringtechnology.club/](http://boringtechnology.club/)). Given they
have limited resources, spreading them across two databases seems a miss, and
taking that time and attention and putting it back into git-ty stuff looks
like a win.

~~~
mattbessey
Does this really count as using boring technology? They're choosing the
younger of the two DBs and using some of its latest features, e.g. lateral
joins.

~~~
dijit
I think when the difference in age is less than 5% of the age of the project
then you can’t claim one to be more trendy based on age.

Postgres Initial release: 8 July 1996; 23 years ago

MySQL Initial release 23 May 1995; 24 years ago

1yr2mo difference. Not exactly earth shattering for a pair of projects two and
a half decades old.
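
Checking the arithmetic with the two release dates quoted above: the gap comes to 412 days, a little under 5% of MySQL's age, which still supports the point. The `as_of` date is an assumption taken from the date in the linked GitLab post's URL.

```python
from datetime import date

mysql_release = date(1995, 5, 23)
postgresql_release = date(1996, 7, 8)
as_of = date(2019, 6, 27)  # assumption: date in the linked post's URL

gap_days = (postgresql_release - mysql_release).days
mysql_age_days = (as_of - mysql_release).days
gap_share = gap_days / mysql_age_days

print(f"gap: {gap_days} days, {gap_share:.1%} of MySQL's age")
```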

~~~
mrami
Funny, since you mention Postgres: 1996 is only PostgreSQL's first release.
POSTGRES goes back even further [0]:

> POSTGRES has undergone several major releases since then. The first
> "demoware" system became operational in 1987 and was shown at the 1988 ACM-
> SIGMOD Conference. Version 1, described in The implementation of POSTGRES ,
> was released to a few external users in June 1989. In response to a critique
> of the first rule system ( A commentary on the POSTGRES rules system ), the
> rule system was redesigned ( On Rules, Procedures, Caching and Views in
> Database Systems ), and Version 2 was released in June 1990 with the new
> rule system. Version 3 appeared in 1991 and added support for multiple
> storage managers, an improved query executor, and a rewritten rule system.
> For the most part, subsequent releases until Postgres95 (see below) focused
> on portability and reliability.

Your point stands, nonetheless. There are no whippersnappers present.

[0]
[https://www.postgresql.org/docs/9.3/history.html](https://www.postgresql.org/docs/9.3/history.html)

------
nickcw
> MySQL can't add TEXT type column without length specified

Mysql does have a TEXT type with a max limit of 65,535 bytes (you can have
MEDIUMTEXT or LONGTEXT if you want more).

You don't supply a length when you use TEXT

[https://dev.mysql.com/doc/refman/8.0/en/blob.html](https://dev.mysql.com/doc/refman/8.0/en/blob.html)
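
To make the size classes concrete, here is a small sketch that picks the smallest member of MySQL's TEXT family for a given string. The limits are in bytes, so multi-byte characters reduce how many characters fit. The helper name `smallest_text_type` is made up for this example.

```python
# Maximum storage, in bytes, for each member of MySQL's TEXT family.
TEXT_TYPE_LIMITS = [
    ("TINYTEXT", 2**8 - 1),     # 255
    ("TEXT", 2**16 - 1),        # 65,535
    ("MEDIUMTEXT", 2**24 - 1),  # 16,777,215
    ("LONGTEXT", 2**32 - 1),    # 4,294,967,295
]


def smallest_text_type(value: str, encoding: str = "utf-8") -> str:
    """Return the smallest TEXT variant whose byte limit fits `value`."""
    size = len(value.encode(encoding))
    for name, limit in TEXT_TYPE_LIMITS:
        if size <= limit:
            return name
    raise ValueError(f"{size} bytes exceeds the LONGTEXT limit")


print(smallest_text_type("a" * 65535))  # exactly at the TEXT limit
print(smallest_text_type("é" * 40000))  # 40,000 chars but 80,000 bytes
```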

~~~
pizza234
Their statement is somewhat ambiguous. They may have tried to express that
"you can't add a column of the TEXT _class_ without specifying the exact
_type_ ".

So they could be "right", but expressed the concept in an imprecise form.

In a way, I think this shows their lack of expertise with MySQL - I reckon
that somebody with a long experience would have phrased the concept more
precisely.

~~~
striking
It doesn't matter, because they're looking for a different thing entirely.
PostgreSQL TEXT is just without a limit. Simple as that.

~~~
sbov
Not true, from:
[https://www.postgresql.org/docs/current/datatype-character.html](https://www.postgresql.org/docs/current/datatype-character.html)

> In any case, the longest possible character string that can be stored is
> about 1 GB.

------
oarabbus_
Good, PostgreSQL is superior to MySQL in almost every way.

~~~
HatchedLake721
Are you sure?
[https://eng.uber.com/mysql-migration/](https://eng.uber.com/mysql-migration/)

~~~
dahdum
They are using mysql innodb as a NoSQL store with fast replication instead of
as a conventional database. They even call it "Schemaless".

------
aeyes
Why is such a major change implemented in a minor version?

~~~
sytse
It was announced in the major version of 12.0

I think we wanted to make it easy to upgrade to 12 before making the switch.

~~~
valtism
That isn't how semver is supposed to work though.

------
ShakataGaNai
Good for them. Both MySQL and PostgreSQL are Good^. But supporting both
doesn't work and it totally makes sense to focus on one. Personally, I don't
generally use PGSQL but my GitLab instance does. Why? It's easy to run a
dedicated PGSQL instance for GitLab using Docker. Or... Using AWS RDS. Or...
Any number of ways.

(^ Good is relative. People have their opinions on which is subjectively
better but at the end of the day, lots of big sites and applications use both
options to great success.)

~~~
why-el
It also creates huge discrepancy between what a developer using an ORM like
ActiveRecord writes and how the engines end up optimizing, meaning that should
you want to write a performant one, you have to learn both, or, short of that,
notify your DBAs of what you did, and in both cases huge resources are wasted.
I never understood combining both from a business perspective anyway.

------
tbarbugli
If you never used or heard of LATERAL [JOIN] I suggest checking that out
because it is pretty awesome and handy.

[https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-LATERAL](https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-LATERAL)
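
For anyone who hasn't used it: a LATERAL subquery may refer to columns of the tables that precede it in FROM, so it behaves like a per-row loop. The sketch below emulates that in plain Python for a "latest N issues per project" query. The tables, columns, and data are hypothetical, and the SQL in the comment is only meant to show the shape of such a query.

```python
# Roughly what this PostgreSQL query asks for (hypothetical tables):
#   SELECT p.name, recent.title
#   FROM projects AS p
#   CROSS JOIN LATERAL (
#       SELECT i.title FROM issues AS i
#       WHERE i.project_id = p.id
#       ORDER BY i.created_at DESC
#       LIMIT 2
#   ) AS recent;

projects = [{"id": 1, "name": "api"}, {"id": 2, "name": "web"}]
issues = [
    {"project_id": 1, "title": "fix login", "created_at": "2019-06-01"},
    {"project_id": 1, "title": "add cache", "created_at": "2019-06-10"},
    {"project_id": 1, "title": "bump deps", "created_at": "2019-06-20"},
    {"project_id": 2, "title": "new theme", "created_at": "2019-06-05"},
]


def latest_issues_per_project(projects, issues, limit=2):
    """For each project row, run the 'subquery' using that row's id."""
    result = []
    for project in projects:  # the left-hand side of the LATERAL join
        matching = sorted(
            (i for i in issues if i["project_id"] == project["id"]),
            key=lambda i: i["created_at"],
            reverse=True,
        )
        for issue in matching[:limit]:  # ORDER BY ... DESC LIMIT 2
            result.append((project["name"], issue["title"]))
    return result


print(latest_issues_per_project(projects, issues))
```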

------
xenator
I think this is a good place to share why I never use MySQL in my projects.
I once worked at a company as the architect of a new project. We had a few
gigabytes of data from a few sources, with plans to gather much more in the
near future. I came up with a database structure for the first draft, at an
early stage of development. It was neither big nor complicated by my
standards, compared to other projects I had worked on before. Since I'm a
Python expert, I chose SQLAlchemy for the ORM layer and used its features to
create the initial database schema. Postgres was a natural choice for me and
fit almost all of my requirements. I was worried about future performance but
decided that we could always optimize once we had something working. As I
said, it was an early stage and we were still doing research.

Everything went smoothly development-wise, but the database was pretty slow
since we mostly used desktop-grade computers for prototyping. Three weeks
before the first internal release, the owners asked me to change the database
to MySQL. I was against this step: we didn't have the time or resources for
experiments. But their point was that the company had MySQL experts with more
than 10 years of optimization experience. I simply had no choice. They forced
me to do this against my will.

Since I wasn't sure that we would use Postgres in production, I hadn't allowed
the developers to use any dialect-specific database capabilities. So switching
the backend from Postgres to MySQL was literally equal to a connection string
change.

MySQL wasn't ready to handle 40-60 tables and queries with 5-10 simple joins.
On our data, every request just hung the server (they quickly provided a
server with a lot of memory and storage for this). A few days later we found
that MySQL was actually working, but the same simple queries ran for 4 or more
hours. The so-called "experts with 10 years of optimization" spent a week
trying to fix indexes and other things, but it never got fixed.

After the release, I left the company because of the toxic atmosphere, but
that is another story. Since then, I have lost all my faith in MySQL. Maybe it
is an option for others, but never for me.

~~~
liveoneggs
So you created an app without considering the current staff or stack, made a
zero effort change to satisfy the actual requirements, and then abandoned your
work at the first performance problem you encountered?

I am not sure MySQL was to blame here :)

~~~
Akinato
Haha right? How did that technology stack make it past approval? Was there no
oversight?

------
avitzurel
IMHO, this comes down to one thing. They lack the talent (and/or the
will/resources to recruit it) to operate MySQL.

I can TOTALLY get the reasoning behind supporting only a single engine, but
the reasons they give there are either wrong or misleading.

~~~
avitzurel
I realize my comment hit a sensitive spot but it's really not meant to be
negative.

It is 100% legit to give that up, it is 100% legit to go with the clear winner
in your mind (it is the same in my mind too), but the technical reasoning is
weak IMHO.

If you _wanted_ to support both, you could, but you clearly don't (and that's
ok too).

~~~
benatkin
Or they want to use that talent elsewhere.

It's great that the simplicity argument doesn't really apply. It's a testament
to developers getting better at programming in the large over the years. The
database layer can be neatly abstracted away from the business rules.

------
just_myles
I have used postgres for so long that I missed a lot of the baked-in functions
and features that were super convenient for me.

I recall wanting to change a data type in my query and having to actually use
the cast function. Also it was mentioned but when I was using it CTEs were not
implemented yet. However, they weren't with postgres until 9.x as I recall
either.

Regarding performance, I mean, that is as much design as it is rdbms choice.
Also i'm no DBA.

------
js4ever
Just look how slow gitlab is on an 8cpu instance with 32gb of ram... It's
crazy! I'm not sure which part is because of Rails and which is the way gitlab
is implemented. The issue is not about MySQL but more about the lack of
knowledge / talent at gitlab

------
bdcravens
Doesn’t removing support for something violate breadth over depth?

[https://about.gitlab.com/company/strategy/#breadth-over-depth](https://about.gitlab.com/company/strategy/#breadth-over-depth)

~~~
perlgeek
I think that motto refers to features more than underlying technologies.

~~~
bdcravens
I agree, but if you’re a MySQL shop that’s a feature.

~~~
mgbmtl
I guess I'm in a MySQL shop, but when it comes to Gitlab, I just want it to
run its regular upgrades without me having to do anything (and it does! I love
Gitlab). I never considered running Gitlab on MySQL.

We rarely have to fiddle in the database itself (it may have happened once or
twice during our migration from Redmine), for the rest we use the Gitlab API.

If you are at the level where you need strong control on your Gitlab install,
then you're a Gitlab shop, and you should either use the recommended
dependencies, or assume that you will be helping maintain the niche features.

------
msla
OK, but do they support MariaDB?

------
xvilka
Next step would be to rewrite everything from Ruby to Go (they already did for
some parts of GitLab infrastructure).

~~~
rswail
All of the MySQL "they just shoulda" and "they just coulda" arguments here are
irrelevant.

There's no distinct advantage of MySQL over Postgres, either in smaller or
larger environments for the gitlab use case and with their development team.

My personal experience is that Postgres is much closer to operating commercial
RDBMSs which makes the experience of DBAs much more portable and relevant.
Moving someone from the Oracle or MSSQL world to Postgres is relatively much
easier than moving them from the MySQL world.

~~~
xvilka
It wasn't sarcasm. I totally support their decision to drop MySQL. I prefer
PostgreSQL in my projects too, especially with JSON/JSONB queries, it is
amazing.

