When I first started using SQL databases, Postgres had a reputation for being a fully ACID-compliant database that paid a performance penalty for it, and was hard to use. Correctness was its top priority.
MySQL had a reputation for being blazingly fast, even outperforming expensive commercial DBs in some areas. However, things like foreign keys were missing. There were debates in the MySQL community as to whether foreign keys were even necessary, since you could just do all that stuff in the application layer.
That seemed reasonable to me at the time, and MySQL was trivially easy to install at a time when Postgres was a huge pain, so I picked MySQL. I almost immediately got bitten by the lack of foreign keys, and although I didn't know it at the time, I spent most of my time coding application logic that was a workaround for MySQL's lack of views. So I switched to Postgres; it solved many of my problems, and I've tried to use it whenever possible.
Every release, the MySQL people get closer to Postgres's features, but there's always something missing, and as of a few years ago at least, all of these features depended on which backend table type you were using, which became especially annoying if you didn't have control over the installation (cheap web hosting).
The funny thing is, MySQL's performance on microbenchmarks really meant nothing in more complex applications. A few years ago I was doing something more than a simple search that involved a few joins and intersections with MySQL; the query optimizer in MySQL seemed determined to pick the slowest possible path to return a result (intersection of a small set of rows with a huge one somehow would cause MySQL to check every row of the large set for a match). It ended up being 50 times faster to implement the logic in PHP and use two queries instead of one.
So I switched to a host with Postgres access, and the identical query that took forever in MySQL was no problem for Postgres.
After a few experiences like that, you begin to see the virtue in doing things right the first time and optimizing later. You also see the virtue in using a database system that takes the onus off of the application developer to implement logic that should be in the DB.
From my perspective, there has been absolutely no reason to choose MySQL over Postgres for a relational DB for a long time now. If you just wanted a fast data store with replication and without the ACID features, you probably really wanted Berkeley DB rather than MySQL.
The reason this is an interesting study is because MySQL has been so popular for so long. They did the right things at the right time to become popular.
And the timing WAS perfect: an easy-to-use, easy-to-install database in 1999, before the .com bubble burst, when so many people (like me) were using a database for the first time and didn't know the first thing about RDBMSes.
When you have triggers and views, on the other hand, you can just "ask the right questions" of the database, and get your information; the SQL becomes self-documenting, rather than a tangled mess.
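As an illustrative sketch of that point (using SQLite via Python here purely for convenience; the same idea applies to Postgres or any RDBMS with views, and the table and view names are made up):

```python
import sqlite3

# A view encapsulates the query logic in the database itself, so the
# application just "asks the right question" instead of reimplementing it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
    INSERT INTO orders (customer, total)
        VALUES ('alice', 10.0), ('alice', 5.0), ('bob', 7.5);
    -- The "right question" lives in the database, self-documenting:
    CREATE VIEW customer_totals AS
        SELECT customer, SUM(total) AS lifetime_total
        FROM orders
        GROUP BY customer;
""")
rows = conn.execute(
    "SELECT customer, lifetime_total FROM customer_totals ORDER BY customer"
).fetchall()
print(rows)  # [('alice', 15.0), ('bob', 7.5)]
```

Application code never needs to know how `lifetime_total` is computed; change the view and every caller gets the fix.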
Whatever their merits, ORM layers and the problems they try to address existed long before, and quite independently of, the particulars and limitations of MySQL.
Just as real RDBMSes don't strictly adhere to the formal mathematical relational model, object-oriented languages vary in their, for lack of a better word, orientation.
Looked at another way, what about a relational-object model instead? Making a useful one for MySQL doesn't seem as plausible, given its chronic feature absence.
If you ignore the hype ("OMG! Slashdot uses it!!") MySQL occupies a dead ground between SQLite and Postgres... There's just no reason for it to exist.
You know what they say about premature optimization being the mother of all fuckups...
If you want sub-second granularity, the advice is to store milliseconds-since-epoch as a BIGINT and have your application convert it to a time:
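A minimal sketch of that workaround, under the stated assumption that the column is a `BIGINT` of milliseconds since the Unix epoch (SQLite stands in for MySQL here, and the table/column names are hypothetical):

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
# BIGINT column holding milliseconds since the Unix epoch.
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_ms BIGINT NOT NULL)")

def to_ms(dt):
    """Application-side conversion: aware datetime -> milliseconds since epoch."""
    return int(dt.timestamp() * 1000)

def from_ms(ms):
    """And back: milliseconds since epoch -> aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

dt = datetime(2010, 6, 1, 12, 30, 0, 250000, tzinfo=timezone.utc)
conn.execute("INSERT INTO events (created_ms) VALUES (?)", (to_ms(dt),))
(ms,) = conn.execute("SELECT created_ms FROM events").fetchone()
print(from_ms(ms))  # the 250 ms sub-second part survives the round trip
```

The obvious cost is that every consumer of the column must know about (and repeat) the conversion, which is exactly the kind of logic the database's own datatype should be handling.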
My favorite quote from an experienced DBA is "MySQL's biggest feature is that it's retarded." Sadly, as someone who actually likes technology and working with it, I can't possibly see that as a feature.
DDL Doesn’t Respect Transactions
My Oracle DBA friend tells me Oracle doesn't, either, not that this is any kind of excuse.
Maybe with the first-class replication in the 9.0 release, it will happen sooner rather than later.
I thought RDS wasn't so great on MySQL replication, either, or is that only if one wishes to do it off-AWS?
Since my knowledge of transactional DDL is PostgreSQL, which supports effectively everything, with no onerous locking, I now have the question: what's the difference? Was my DBA friend in error, or merely out of date?
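A sketch of what transactional DDL buys you (SQLite, like PostgreSQL, supports rolling back DDL, so it stands in here; MySQL's implicit commit on DDL statements is precisely what breaks this pattern):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions explicitly

conn.execute("BEGIN")
conn.execute("CREATE TABLE migration_demo (id INTEGER PRIMARY KEY)")
# Something goes wrong mid-migration; roll the whole thing back.
conn.execute("ROLLBACK")

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # [] -- with transactional DDL, the table never existed
```

With implicit-commit DDL, the `CREATE TABLE` would have committed on its own and the failed migration would leave the schema half-changed.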
No first-hand knowledge, anyway. My 3rd-hand knowledge of MSSQL is that it's more or less Sybase, which does fully support DDL inside a transaction.
I'd be interested if anyone took advantage of versioned transactional DDL to persist arbitrary data structures on the fly.
As I've seen DDL outside of a transaction take much longer than inside one, it may use locking there.
I don't think it exists.
CREATE TABLE a (b NUMBER); -- one transaction;
INSERT INTO a (b) VALUES (1);
CREATE TABLE c (d NUMBER); -- another transaction, commits the insert.
That said, it's a great feature. It allows you to deploy your stored procs transactionally, instead of the locking nightmare that Oracle's model can result in.
I've found the "feature" of MySQL's integrated replication to be a huge problem, especially when faced with messes made by those who installed/implemented it because understanding something more complex (like the realities of database replication) would have been too high a barrier to entry. It also, of course, meant that competing solutions were rare, something that, in the case of MySQL, I do consider a benefit.
I hesitate to use the term "built in," since there's a giant piece missing: initial synchronization. They want me to do what?! A full dump and restore, before I even turn on the replication? That, as well as the "tip" of just copying files over, always struck me as amazingly manual.
* Everything is on LVM
* Run FLUSH TABLES WITH READ LOCK
* Take an LVM snapshot, then record the binary log filename and position
* Run UNLOCK TABLES
* Mount snapshot, rsync to new server and bring up replication with recorded file and position
Very fast and only locks tables for a short time.
An excellent idea, which I, too, have advocated. It is, of course, external and not integrated with MySQL.
I have seen benchmark numbers from the MySQL Performance Blog people.
Of course, my startup does.
About time. =)
For whatever reason, app developers kept getting longitude and latitude backwards. I wrote a check constraint that prevented that from being possible (in many cases), so if an app was doing it backwards, the DB would reject it.
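A hedged sketch of that kind of guard (SQLite syntax via Python here; the actual constraint was presumably written for Postgres, and the table/column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Latitude is bounded by +/-90, longitude by +/-180, so a swapped pair
# whose "latitude" exceeds 90 violates the check and the DB rejects it.
conn.execute("""
    CREATE TABLE places (
        name      TEXT,
        latitude  REAL CHECK (latitude  BETWEEN -90  AND 90),
        longitude REAL CHECK (longitude BETWEEN -180 AND 180)
    )
""")
conn.execute("INSERT INTO places VALUES ('Sydney', -33.87, 151.21)")  # correct order
swapped_rejected = False
try:
    conn.execute("INSERT INTO places VALUES ('Sydney', 151.21, -33.87)")  # swapped
except sqlite3.IntegrityError:
    swapped_rejected = True
print(swapped_rejected)  # True: the swapped insert never reaches the table
```

As noted, this only catches the many cases where the swapped longitude falls outside the latitude range; coordinates with both values under 90 would still slip through.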
I ended up losing data (in dev -- which really meant I lost 45 minutes having to restore it) while testing this. Then I read the docs. I too often expect things to do what I tell them to.
Maybe I'm not pushing it enough but if that's the case, why should I jump through hoops to set up something I won't be making good use of?
If you find the usability of MySQL better than Postgres, and you don't put a lot of value on your data, then choosing MySQL is fine. The author is right to point out that a lot of the complaints are really just "this doesn't have to be this way." You should really expect your database to enforce NOT NULL, especially if it lets you specify it and doesn't warn you that it's not going to do a damn thing about it.
As I mentioned in my comment above, you can fix buggy applications, but you can't fix incorrect data.
Funny (and I use the term loosely) you should mention that.
I've just recently witnessed a situation where a BI analyst was brought in around the same time I was and realized that, not only was there likely data missing, but, worse, much of the data was wrong. Said analyst has been spending a remarkable amount of effort merely convincing everyone else that the reporting logic/infrastructure wasn't broken so much as the data.
I'm not convinced MySQL itself is to blame, but it's Awfully Suspicious, considering the use of MyISAM and a kludgy, if not exotic, replication setup.
 Business Intelligence, aka reporting. I'm not sure how common the term is. Presumably a small company that hires someone for this role considers the data to be extremely valuable.
It does. The author of the article was wrong.
I really don't understand why people have such a problem with this (and boy oh boy am I losing a lot of karma for this opinion).
It's not like it forces this on you. First, you can turn it off; second, it only happens if you leave columns out of the insert. Don't leave out the columns and you have nothing to worry about.
I personally find it quite useful. I set column defaults as needed for real data, but when testing I let MySQL put in empty values - it saves having to type each and every column that I don't care about.
If you want to specify a default, well, specify a default.
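For contrast, this is what strict behavior looks like (SQLite via Python stands in here for a strictly configured database; MySQL in its permissive mode would silently insert '' or 0 for the omitted column instead of erroring):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL,                -- no default: must be supplied
        plan  TEXT NOT NULL DEFAULT 'free'  -- explicit default: may be omitted
    )
""")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")  # plan -> 'free'
omission_rejected = False
try:
    conn.execute("INSERT INTO users (id) VALUES (2)")  # email omitted, no default
except sqlite3.IntegrityError:
    omission_rejected = True
row = conn.execute("SELECT email, plan FROM users").fetchone()
print(row, omission_rejected)  # ('a@example.com', 'free') True
```

The column with a declared default can be omitted safely; the one without a default produces an error instead of a silently invented value.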
If my data integrity was so very important, I'd pass the buck to someone who knew their beans and they could find & maintain me a safe db. I'm a competent amateur (hence MySQL), not a DBA :)
Hype and community - that is why it was sold for a billion dollars.
Also note the issues with MySQL and its large integers :) Although they might have fixed things since that post.
I'm fairly shocked that I haven't been hit with this "feature". Thankfully, I usually set a DEFAULT on timestamp columns.
1) a datatype (date with time)
2) a pseudo-version number for concurrency control
Both are equally as valid, and depending on your background, when you see the word "timestamp" with zero context you may automatically refer in your head to one definition or another.
I don't see what the problem here is if you expect #1 but got #2 when you used MySQL. This seems like the type of problem that bites you the very first time you encounter it, and then you read the documentation and say "Oh, I understand, 'timestamp' has a different definition here; what I really need is the 'foo' datatype. Isn't it interesting that one word can have more than one meaning!"
I haven't been hit with this bug either, but this was by accident (or design in the GUI interface I was using). I find it quite shocking.
Application logic belongs in the application. Don't rely on a database to do it for you, or you're going to wind up with a giant ball of mud driven by side-effects.
Putting in sane integrity checks to a database should be a priority. You can fix broken applications, but you can't fix broken data (especially if you collected it from users, sensors).
It doesn't put a NULL there; it puts zero for a number field, or a blank (zero-length) string for a string field. Date fields also get a zero date ('0000-00-00').
Edit: I get a downmod for this? The group think here is amazing. I suppose if I bash MySQL I'll get some upmods even if the bashing is incorrect?
Have you ever programmed SQL? Because what you wrote doesn't make sense.
No data is inserted wrongly - you are simply leaving out a field. There is no mess. Just an unused field.
Are you thinking it's like csv where if you leave out a column all the others are shifted? It's not like that.
The standard says that if you leave out a column that does not have a default, the SQL should return an error. Instead MySQL puts in a default (but only if you tell it to in the configuration). Putting a NULL in a non-NULL field would be much worse.
However I really do think that inserting an unexpected default value is worse than inserting NULL into a NON-NULL field. The NULLs will cause problems, but they are problems you can see and resolve.
The default values are silent errors that will corrupt your data and be very difficult to recover from in the future. You can only guess which data was wrongly inserted.
There IS no data. How do you corrupt something that doesn't exist?
And NULL doesn't help either. NULL is valid data, NULL is not a replacement for programming errors (which is what this is).
This argument is pointless. People love to bash on MySQL, they look for the silliest things. The more popular something is the more people bash on it.
I understand that, but at least bash on real problems? Like the transaction DDL - that's a real problem. This? This is nonsense. (It's actually a very useful - and optional - feature BTW.)
NULL is not "default value" or "I don't care", NULL signifies "this might have a value, I just don't know what it is".
There is a very significant difference between a payroll record which states your pay is "0" vs. NULL. If the database is putting in default values, you have no way of knowing whether the employee really does have a salary that was simply never inserted, or whether the employee is genuinely unpaid.
NULL also means "value does not exist", not just "value is unknown". For example if a student is not in a class, put NULL in the class id.
NULL is perfectly valid data, and is not a replacement for a programming bug.
And with mysql if your salary field is defined as accepting NULL then you will get a NULL in there.
And to use your example: if the field accepts NULL, you would also have no way of knowing whether the salary was not negotiated vs. a programming bug.
If you want to argue the insert should fail, then fine, no problem. (And MySQL can do that.)
But arguing that putting in NULL is better (in a field that does not accept NULL), is simply wrong. I'll say it again: NULL is not a replacement for a programming bug - NULL is valid data, and should not be used to find programming errors.
I think you mean "don't insert a row in the student_class table, which is a many-to-many join between student and class".
As a general rule of thumb, if your data schema requires NULLs for things like that, then your schema is wrong, for most of the reasons that people are trying to point out. NULLs are the absence of data, and should really only be used for exceptional circumstances - hence the reason that silently inserting NULLs into NOT NULL fields is a Bad Thing(tm).
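The parent's point as a schema sketch (names are illustrative; SQLite via Python for convenience): enrollment is modeled by the presence or absence of a row in a join table, so no NULL class id is ever needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE class   (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Many-to-many join table: "not enrolled" is the absence of a row,
    -- not a NULL class id on the student.
    CREATE TABLE student_class (
        student_id INTEGER NOT NULL REFERENCES student(id),
        class_id   INTEGER NOT NULL REFERENCES class(id),
        PRIMARY KEY (student_id, class_id)
    );
    INSERT INTO student VALUES (1, 'Ada'), (2, 'Linus');
    INSERT INTO class   VALUES (10, 'Databases');
    INSERT INTO student_class VALUES (1, 10);  -- only Ada is enrolled
""")
enrolled = conn.execute("""
    SELECT s.name FROM student s
    JOIN student_class sc ON sc.student_id = s.id
    WHERE sc.class_id = 10
""").fetchall()
print(enrolled)  # [('Ada',)]
```

Linus simply has no row in `student_class`; nothing in the schema ever needs to hold a NULL to express "not in a class."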
The easier you make it for him to avoid horribly corrupting the data by forgetting to update that additional table that depends on whatever he is updating, the better life will be.
Triggers, foreign keys and constraints are all excellent ways to do just that.
My point is not to throw the baby out with the bathwater. If you need massive scale then of course you need fresh approaches, but you are sacrificing a lot by abandoning a centralized DB.
If your data is moderately sized, then you will be trading a lot of data integrity, queryability and flexibility by giving up an SQL databases. Certainly consumer web services of the type that are en vogue in silicon valley need new approaches. However the majority of applications out there are probably still best served primarily by a relational DB, especially when you consider the value of a byte of corporate data vs the value of a byte of facebook data.
I've argued for years that this is what the "sharding" hype (and to some degree the current NoSQL hype) was mostly about.
I'm not suggesting a custom DBMS can't possibly be the right answer, but it seems silly to layer it on top of something as heavyweight as a relational database.
I implied, though foolishly didn't state outright, that I believe a custom DBMS is only very rarely the best answer.
MySQL is used by the largest and most successful companies in the world. It is incredibly performant and stable in the most brutal environments.
Who on earth expects transactions to be available when you're creating a schema? "drop table if exists" is a fine way to back-out if your deployment dies.
Deferred key constraints? You should be spanked for using those in the first place.
The appeal to authority doesn't add any weight to the argument.
I hate that line. You know what most of these companies have in common? They were started by guys with only a few years in the field and a vision. Some of them were not the best developers in the world, just guys with a great idea for a product and a passion. The other thing they have in common is that they have spent millions, if not more, rolling their own solutions to compensate for a technology decision they made a long time ago in a garage.
This is the way it works, the new guys come in, they say well company X uses this, they build something on it because X is using it and they hit growing pains at a certain point.
Twitter is experiencing those growing pains now, fortunately they seem to be throwing out the bad and trying new things. A good deal of companies become dogmatic about their choice and given their success just throw money at it to build compensating workarounds.
MySQL was not chosen by those companies because of its merits, it was chosen because those companies started in a garage, and the technical selection was "what are the other guys using in their garage".
You are, in effect, asking someone to prove a negative.
I don't think there's any contention that scaling will always be an issue, but, rather, where along the scale curve debilitating issues occur and how painful the resolutions to those issues are.
For example, with MySQL, without sacrificing ACID properties, one cannot even separate a table's indexes onto a separate spindle from its data. For rotating storage, this makes creating a new index astonishingly painful. Even with SSDs, more sequential/bulk I/O, combined with eliminating contention for the paths to the drive, can yield a performance improvement.
Then there's query optimization. Is there even anything remotely close in MySQL to the EXPLAIN ANALYZE functionality in PostgreSQL?
I posit that scaling issues with MySQL come early and, once they do, require sweeping and expensive changes to address. One can't merely exercise a little cleverness and cheap hardware to tide one over for another factor of 2-4 growth.
 i.e. InnoDB, not MyISAM
Once you start doing something more advanced, though, you start seeing its flaws. I've had a pretty miserable time with its query optimizer and sprocs.