

Dammit, MySQL - stephen
http://www.draconianoverlord.com/2010/07/22/dammit-mysql.html

======
drblast
The design philosophy differences between Postgres and MySQL should be a case
study.

When I first started using SQL databases, Postgres had a reputation for being
a fully ACID compliant database that paid a performance penalty for it, and
was hard to use. They made it "correct" as a top priority.

MySQL had a reputation for being blazingly fast, even outperforming expensive
commercial DB's in some areas. However, things like foreign keys were missing.
There were debates in the MySQL community as to whether foreign keys were even
necessary, since you could just do all that stuff in the application layer.

That seemed reasonable to me at the time and MySQL was trivially easy to
install at a time when Postgres was a huge pain, so I picked MySQL. I almost
immediately got bitten by the lack of foreign keys, and although I didn't know
it at the time, spent most of my time coding application logic that was a
workaround for MySQL's lack of views. So I switched to Postgres; it solved
many of my problems, and I've tried to use it whenever possible since.

With every release the MySQL people get closer to Postgres's features, but
there's always something missing, and as of a few years ago at least, all of
these features depended on which backend table type you were using, which
became especially annoying if you didn't have control over the installation
(cheap web hosting).

The funny thing is, MySQL's performance on microbenchmarks really meant
nothing in more complex applications. A few years ago I was doing something
more than a simple search that involved a few joins and intersections with
MySQL; the query optimizer in MySQL seemed determined to pick the slowest
possible path to return a result (intersection of a small set of rows with a
huge one somehow would cause MySQL to check every row of the large set for a
match). It ended up being 50 times faster to implement the logic in PHP and
use two queries instead of one.

So I switched to a host with Postgres access, and the identical query that
took forever in MySQL was no problem for Postgres.

After a few experiences like that, you begin to see the virtue in doing things
right the first time and optimizing later. You also see the virtue in using a
database system that takes the onus off of the application developer to
implement logic that should be in the DB.

~~~
Lewisham
MySQL's lack of views for a _long_ time was just killer. I'm truly surprised
MySQL has done as well as it has for as long as it has. I thought that
its popularity would have tailed off when people realized just how much it
was missing (I felt a little bit unwell when I found it didn't have views),
but it seems that it just hit critical mass.

~~~
derefr
I have a feeling that the basic "need" for ORMs was originally driven purely
by the unwieldiness of querying a MySQL database. When everything is in the
application layer, you have to write abstractions on top of it to keep it
comprehensible.

When you have triggers and views, on the other hand, you can just "ask the
right questions" of the database, and get your information; the SQL becomes
self-documenting, rather than a tangled mess.

~~~
pvg
http://en.wikipedia.org/wiki/Object-relational_impedance_mismatch

Whatever their merits, ORM layers and the problems they try to address existed
long before, and quite independently of, the particulars and limitations of
MySQL.

~~~
mmt
I've always felt that begs the question, by assuming that a mismatch is
somehow fundamental, rather than an implementation artifact.

Just as real RDBMSes don't strictly adhere to the formal math relational
model, object-oriented languages vary in their, for lack of a better word,
orientation.

Looked at another way, what about a relational-object model instead? Making a
useful one for MySQL doesn't seem as plausible, given its chronic feature
absence.

------
samstokes
The last MySQL limitation to bite me was that time columns are truncated to
whole-second granularity - no milliseconds or microseconds.

<http://bugs.mysql.com/bug.php?id=8523>

If you want sub-second granularity, the advice is to store milliseconds-since-
epoch as a BIGINT and have your application convert it to a time:

http://feedblog.org/2007/05/26/why-doesnt-mysql-support-millisecond-datetime-resolution/
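The suggested workaround looks something like this (a sketch; the table and
column names are invented, and the conversion is often done in the
application instead):

```sql
-- MySQL of this era truncates DATETIME/TIMESTAMP to whole seconds,
-- so store milliseconds-since-epoch in a BIGINT instead.
CREATE TABLE events (
    id         BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    created_ms BIGINT NOT NULL  -- e.g. 1279800000123
);

-- Converting back to a (second-granularity) readable time in a query:
SELECT id, FROM_UNIXTIME(created_ms DIV 1000) AS created_at
FROM events;
```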

~~~
pbh
I had the same problem (and, really, 5 years?), but "solved" it at the time
with ISO8601 strings. They sort, they're standard, they support microseconds,
they're fairly common in web services. Slower than a reasonable native type
though, obviously.

------
mmt
I only wish I were so lucky as to have been spoiled by Postgres for so long.

My favorite quote from an experienced DBA is "MySQL's biggest feature is that
it's retarded." Sadly, as someone who actually _likes_ technology and working
with it, I can't possibly see that as a feature.

 _DDL Doesn’t Respect Transactions_

My Oracle DBA friend tells me Oracle doesn't, either, not that this is any
kind of excuse.

 _Maybe with the first-class replication in the 9.0 release, it will happen
sooner rather than later._

I thought RDS wasn't so great on MySQL replication, either, or is that only if
one wishes to do it off-AWS?

~~~
briansmith
MS SQL Server and Oracle use CREATE SCHEMA and ALTER SCHEMA for DDL
transactions. I think it is the ANSI/ISO standard way to do it. But, also "The
CREATE SCHEMA command does not support Oracle extensions to the ANSI CREATE
TABLE and CREATE VIEW commands (for example, the STORAGE clause)."

~~~
mmt
I'm certainly no expert on standard SQL, but I suspect that's academic.

Since my knowledge of transactional DDL is from PostgreSQL[1], which supports
effectively everything, with no onerous locking, I now have the question:
what's the difference? Was my DBA friend in error, or merely out of date?

[1] first-hand knowledge, anyway. My 3rd-hand knowledge of MSSQL is that it's
more or less Sybase, which does fully support DDL inside a transaction.
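For reference, PostgreSQL's transactional DDL looks like this (a minimal
sketch; the table is invented):

```sql
BEGIN;
CREATE TABLE accounts (id serial PRIMARY KEY, name text NOT NULL);
ALTER TABLE accounts ADD COLUMN balance numeric NOT NULL DEFAULT 0;
-- Changed your mind? The table and the ALTER vanish atomically:
ROLLBACK;
-- Or make the whole schema change visible in one step with COMMIT.
```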

~~~
pradocchia
Yes, MSSQL does support transactional DDL, via locking rather than versioning.
Not sure how PostgreSQL implements it. Anyone? Row versioning would be more
concurrency-friendly, and arguably more "right".

I'd be interested if anyone took advantage of versioned transactional DDL to
persist arbitrary data structures on the fly.

~~~
mmt
Postgres does it with versioning, as that's how everything is stored on disk.

As I've seen DDL outside of a transaction take much longer than inside one, it
may use locking there.

------
sgt
I see that many of you are discussing PostgreSQL. I have to admit that I'm
already using PostgreSQL 9.0beta3 in production, and it's been rock solid so
far.

------
Calamitous
Does Postgres have reliable, tolerably-easy-to-install-and-configure
master/slave replication yet? That's the only thing that kept us off postgres
last time we were looking, we definitely need a fast, reliable, read-only
slave.

~~~
stephen
I think what you're looking for is the marquee feature of 9.0:

http://developer.postgresql.org/pgdocs/postgres/release-9-0.html

~~~
Calamitous
Oh, man, sweet. :) Now to get the Ops guys on board...

~~~
mmt
Any "Ops guys" who don't find something like Slony _tolerably_ easy to install
might not be suited to your environment.

I've found the "feature" that MySQL replication is integrated[1] to be a huge
problem, especially when faced with messes made by those who
installed/implemented it because understanding something more complex (like
the realities of database replication) would have been too high a barrier to
entry. It also, of course, meant that competing solutions were rare, something
that, in the case of MySQL, I _do_ consider a benefit.

[1] I hesitate to use the term "built in," since there's a giant piece
missing: initial synchronization. They want me to do _what_?! A full dump and
restore, before I even _turn on_ the replication? That, as well as the "tip"
of just copying files over always struck me as amazingly manual.

~~~
chrisbolt
The way we do initial synchronization with MyISAM:

* Everything is on LVM

* Run FLUSH TABLES WITH READ LOCK

* Take an LVM snapshot, then record the binary log filename and position

* Run UNLOCK TABLES

* Mount snapshot, rsync to new server and bring up replication with recorded file and position

Very fast and only locks tables for a short time.

~~~
LiveTheDream
Beware of FLUSH TABLES WITH READ LOCK potentially causing an exclusive lock.
The percona guys give an explanation:
http://www.mysqlperformanceblog.com/2010/04/24/how-fast-is-flush-tables-with-read-lock/

------
MichaelGG
MySQL: Visual Basic's "On Error Resume Next" for databases.

------
dlsspy
This didn't mention that check constraints are _completely_ ignored. That one
bit me pretty hard the last time I tried to use MySQL.

For whatever reason, app developers kept getting longitude and latitude
backwards. I wrote a check constraint that prevented that from being possible
(in many cases), so if an app was doing it backwards, the DB would reject it.

I ended up losing data (in dev -- which really meant I lost 45 minutes in
having to restore it) in testing this. Then I read the docs. I too frequently
expect things to do what I tell them.
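The kind of constraint in question, sketched (table and column names are made
up; MySQL of this vintage parses the CHECK clause but silently discards it):

```sql
CREATE TABLE locations (
    id  INT PRIMARY KEY,
    lat DOUBLE NOT NULL,
    lon DOUBLE NOT NULL,
    -- Latitude only spans -90..90, so a swapped lat/lon pair with
    -- |lat| > 90 should be rejected. PostgreSQL enforces this;
    -- MySQL accepts the syntax and ignores the constraint entirely.
    CHECK (lat BETWEEN -90 AND 90 AND lon BETWEEN -180 AND 180)
);

-- Succeeds in MySQL, fails (as it should) in PostgreSQL:
INSERT INTO locations VALUES (1, 122.41, 37.77);
```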

------
kingofspain
Only the auto-updating timestamp thing has really got in my way in the past.
Much as I liked Postgres when I used it, I have such easy access to MySQL and
I so rarely encounter any problems that I kinda ignore the toy jibes.

Maybe I'm not pushing it enough but if that's the case, why should I jump
through hoops to set up something I won't be making good use of?

~~~
Lewisham
It depends how valuable your data is to you. For many/most companies, data is
incredibly valuable, and as the company grows, it gets more valuable over
time.

If you find the usability of MySQL better than Postgres, and you don't put a
lot of value on your data, then choosing MySQL is fine. The author is right to
point out that a lot of the complaints are really just "this doesn't have to
be this way." You should really expect your database to enforce non-null,
especially if it lets you specify it and doesn't warn you it's not going to do
a damn thing about it.

As I mentioned in my comment above, you can fix buggy applications, but you
can't fix incorrect data.

~~~
ars
> You should really expect your database to enforce non-null

It does. The author of the article was wrong.

~~~
gxti
You're technically correct (the best kind of correct!) -- it's not inserting
NULL. But it is inserting a value it pulled out of its ass, which I posit is
actually worse. For text types it uses the empty string, for numbers it uses
0, etc. It seems to flag this as a "warning" but said warning does not appear
in my terminal or in the mysqld logs, so I have no idea what it says.

~~~
ars
Type "show warnings" to see them.

I really don't understand why people have such a problem with this (and boy oh
boy am I losing a lot of karma for this opinion).

It's not like it forces this on you. First, you can turn it off, and second,
it only happens if you leave out columns from the insert; don't leave out the
columns and you have nothing to worry about.

I personally find it quite useful. I set column defaults as needed for real
data, but when testing I let MySQL put in empty values - it saves having to
type each and every column that I don't care about.
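The behavior being argued over, sketched (table and column names invented;
exact warning and error text varies by version):

```sql
CREATE TABLE users (
    id   INT PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    age  INT NOT NULL
);

-- With strict mode off, this succeeds: name becomes '' and age becomes 0,
-- and a warning is recorded (visible via SHOW WARNINGS):
INSERT INTO users (id) VALUES (1);

-- With strict mode on, the same insert is rejected outright:
SET sql_mode = 'STRICT_ALL_TABLES';
INSERT INTO users (id) VALUES (2);
-- ERROR 1364: Field 'name' doesn't have a default value
```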

~~~
gfodor
When you add "NOT NULL" to a schema, you should be thinking: "make sure the
user specifies a value for this column" not "this column should be defaulted
to zero." This is insane.

If you want to specify a default, well, _specify a default_.

------
cosmok
I wrote about how I ran into his complaint #1 here:
http://www.trk7.com/blog/programming/innodb-mysql-implicit-autocommit/
MySQL is quite annoying, but it seems to do the job for most of the projects
that I have worked on.

------
ww520
The only feature that tipped the scales in MySQL's favor was easy
replication. Now that Postgres has it too, there's no more reason to use
MySQL.

------
naturalethic
Postgres has no unsigned 64 bit integer. :(

~~~
anthonyb
No, but you can define your own:

http://bytes.com/topic/postgresql/answers/423780-numeric-type-problems

Also note the issues with MySQL and its large integers :) Although they might
have fixed things since that post.
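The linked approach, roughly (a sketch using a CHECKed NUMERIC domain; the
domain name is invented):

```sql
-- PostgreSQL has no native unsigned 64-bit type, but a constrained
-- NUMERIC domain covers the same range (0 .. 2^64 - 1), at some cost
-- in speed and storage relative to a native bigint:
CREATE DOMAIN uint64 AS numeric(20, 0)
    CHECK (VALUE >= 0 AND VALUE <= 18446744073709551615);

CREATE TABLE counters (
    id   serial PRIMARY KEY,
    hits uint64 NOT NULL DEFAULT 0
);
```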

~~~
naturalethic
Thanks a ton for this.

------
c00p3r
MySQL is the same thing as PHP: the illusion of simplicity, and as a result a
vast and active community and availability in every distro and on all the
cheapest hosting, while the quality of the code and the design decisions came
second.

Hype and community - that is why it was sold for a billion dollars.

------
ergo98
What's the beef with the timestamp? I would expect, by default, that a
timestamp reflects the last change to the row. Most systems use timestamps for
optimistic locking/replication purposes.

~~~
mbreese
Um... no. Shouldn't a timestamp reflect a specific point in time? And it
shouldn't change unless there is an ON UPDATE condition set. That would be the
correct thing to do according to the principle of least surprise. You could
then set a column to auto-update, but it shouldn't be the default action.

I'm fairly shocked that I haven't been hit with this "feature". Thankfully, I
usually set a DEFAULT on timestamp columns.

~~~
randallsquared
In MySQL (and SQL Server, apparently), it reflects the specific point in time
when this row was last updated, and neither database will let you have more
than one such field in a table. The data type everyone here seems to want is
"datetime", which behaves like you want a "timestamp" to behave.

~~~
JoachimSchipper
> neither [MySQL nor SQL server] ... will let you have more than one
> [timestamp] field in a table.

Wait, what?

~~~
gxti
There can be only one because it exists solely to be the "last updated" field.
There's no point in having more than one. In MySQL, it simply only updates the
first -- so you get a _sometimes_ magic field.

~~~
JoachimSchipper
I would think it useful to have 'created' and 'updated' fields, for instance.
And silently updating only the first...

~~~
randallsquared
But you can have this: the "created" field is a datetime field, and the
"updated" field would be "timestamp". This is far less problematic, in my
opinion, than things like requiring a table with exactly one row to do simple
selects, or pretending the empty string and null are the same (both of these
behaviors are Oracle, by the way, not postgresql).
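That pattern, sketched (a hypothetical table; note that MySQL of this era
won't let a DATETIME column default to CURRENT_TIMESTAMP, so the application
sets "created" itself):

```sql
CREATE TABLE posts (
    id      INT PRIMARY KEY,
    body    TEXT,
    created DATETIME NOT NULL,  -- set once by the application at insert time
    -- TIMESTAMP's implicit MySQL behavior: refreshed on every UPDATE
    updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
```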

------
jbooth
Every one of his complaints is a practice I've constantly yelled at everyone
around me not to use -- for my whole career.

Application logic belongs in the application. Don't rely on a database to do
it for you, or you're going to wind up with a giant ball of mud driven by
side-effects.

~~~
kingkilr
Don't rely on the database to not put null in a column marked not nulls?
That's like saying don't rely on C to not put a struct in a char*. It's got
nothing to do with application logic.

~~~
ars
He's wrong about that BTW.

It doesn't put a NULL there, it puts zero for a number field, or a blank
(zero-length) string for a string field. Date fields get the zero date
('0000-00-00').

Edit: I get a downmod for this? The group think here is amazing. I suppose if
I bash MySQL I'll get some upmods even if the bashing is incorrect?

~~~
dagheti
Inserting a default value is even worse than inserting NULL. Now you can't
tell what data was wrongly inserted if you tried to clean up the mess later.

~~~
ars
What?

Have you ever programmed SQL? Because what you wrote doesn't make sense.

No data is inserted wrongly - you are simply leaving out a field. There is no
mess. Just an unused field.

Are you thinking it's like csv where if you leave out a column all the others
are shifted? It's not like that.

The standard says if you leave out a column that does not have a default the
SQL should return an error. Instead MySQL puts in a default (but only if you
tell it to in the configuration). Putting a NULL in a non-NULL field would be
much worse.

~~~
dagheti
Oh I agree they are both very bad. Putting NULL in a non-NULL field is totally
wrong.

However I really do think that inserting an unexpected default value is worse
than inserting NULL into a NON-NULL field. The NULLs will cause problems, but
they are problems you can see and resolve.

The default values are silent errors that will corrupt your data and be very
difficult to recover from in the future. You can only guess which data was
wrongly inserted.

~~~
ars
I still don't see how it's possible to insert wrong data, or corrupt it??

There IS no data. How do you corrupt something that doesn't exist?

And NULL doesn't help either. NULL is valid data, NULL is not a replacement
for programming errors (which is what this is).

This argument is pointless. People love to bash on MySQL, they look for the
silliest things. The more popular something is the more people bash on it.

I understand that, but at least bash on real problems? Like the transaction
DDL - that's a real problem. This? This is nonsense. (It's actually a very
useful - and optional - feature BTW.)

~~~
Lewisham
I'm not sure you understand the use of NULL.

NULL is not "default value" or "I don't care", NULL signifies "this might have
a value, I just don't know what it is".

There is a very significant difference between a payroll record which states
your pay is "0" vs. NULL. If the database is putting in default values, you
have no way of knowing whether the employee really did have a salary, but it
was incorrectly inserted as NULL, or whether the employee is unpaid.

~~~
ars
I understand NULL very well, and that's not the only use for a NULL.

NULL also means "value does not exist", not just "value is unknown". For
example if a student is not in a class, put NULL in the class id.

NULL is perfectly valid data, and is not a replacement for a programming bug.

And with mysql if your salary field is defined as accepting NULL then you will
get a NULL in there.

And to use your example if the field accepts NULL, you would also have no way
of knowing if the salary was not negotiated vs a programming bug.

If you want to argue the insert should fail, then fine, no problem. (And MySQL
can do that.)

But arguing that putting in NULL is better (in a field that does not accept
NULL), is simply wrong. I'll say it again: NULL is not a replacement for a
programming bug - NULL is valid data, and should not be used to find
programming errors.

~~~
anthonyb
> For example if a student is not in a class, put NULL in the class id.

I think you mean "don't insert a row in the student_class table, which is a
many-to-many join between student and class".

As a general rule of thumb, if your data schema requires NULLs for things like
that, then your schema is wrong, for most of the reasons that people are
trying to point out. NULLs are the absence of data, and should really only be
used for exceptional circumstances - hence the reason that silently inserting
NULLs into NOT NULL fields is a Bad Thing(tm).
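The schema being described, sketched (table and column names invented, and
assuming existing student and class tables):

```sql
-- Rather than a nullable class_id on the student table, model enrollment
-- as a join table; "not in a class" is simply the absence of a row:
CREATE TABLE student_class (
    student_id INT NOT NULL REFERENCES student(id),
    class_id   INT NOT NULL REFERENCES class(id),
    PRIMARY KEY (student_id, class_id)
);
```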

------
mmaunder
You're creating the wrong impression here by calling MySQL a toy and citing a
handful of user preference issues you have with arcane features.

<http://www.mysql.com/customers/>

MySQL is used by the largest and most successful companies in the world. It is
incredibly performant and stable in the most brutal environments.

Who on earth expects transactions to be available when you're creating a
schema? "drop table if exists" is a fine way to back out if your deployment
dies.

Deferred key constraints? You should be spanked for using those in the first
place.

~~~
kls
_MySQL is used by the largest and most successful companies in the world. It
is incredibly performant and stable in the most brutal environments._

I hate that line. You know what most of these companies have in common? They
were started by guys with only a few years in the field and a vision, some of
them not the best developers in the world, just guys with a great idea for a
product and a passion. The other thing they have in common is that they have
spent millions if not more on rolling their own solutions to compensate for a
technology decision they made a long time ago in a garage.

This is the way it works, the new guys come in, they say well company X uses
this, they build something on it because X is using it and they hit growing
pains at a certain point.

Twitter is experiencing those growing pains now, fortunately they seem to be
throwing out the bad and trying new things. A good deal of companies become
dogmatic about their choice and given their success just throw money at it to
build compensating workarounds.

MySQL was not chosen by those companies because of its merits, it was chosen
because those companies started in a garage, and the technical selection was
"what are the other guys using in their garage".

~~~
alexsolo
Or just because MySQL is really easy to get started with when using Rails and
other frameworks, and because when you're starting a business, you think "if
I actually ever run into MySQL scaling issues, that's a good problem to have".

~~~
blueberry
Name one database that won't have scaling issues for sure. Scaling DB-wise is
not only a good problem, it is inevitable if you constantly grow.

~~~
mmt
_Name one database that won't have scaling issues for sure_

You are, in effect, asking someone to prove a negative.

I don't think there's any contention that scaling will always be an _issue_ ,
but, rather, where along the scale curve debilitating issues occur and how
painful the resolutions to those issues are.

For example, with MySQL, without sacrificing ACID properties[1], one cannot
even separate a table's indexes onto a separate spindle from its data. For
rotating storage, this makes creating a new index astonishingly painful. Even
with SSDs, there can be a performance improvement from more sequential/bulk
I/O combined with eliminating contention on the paths to the drive.

Then there's query optimization. Is there even anything remotely close in
MySQL to the EXPLAIN ANALYZE functionality in PostgreSQL?

I posit that scaling issues with MySQL come early and, once they do, require
sweeping and _expensive_ changes to address. One can't merely exercise a
little cleverness and cheap hardware to tide one over for another factor of
2-4 growth.

[1] i.e. InnoDB, not MyISAM
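For the curious, the PostgreSQL facility being referred to looks like this
(the query and tables are invented; EXPLAIN ANALYZE actually executes the
query and reports real per-node row counts and timings):

```sql
EXPLAIN ANALYZE
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > now() - interval '7 days';
-- The output shows, for each plan node, estimated vs. actual rows and
-- actual time spent, which makes optimizer missteps visible.
```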

