When MySQL first started to take off (circa 1998) it had a lower memory requirement. So did PHP.
Memory was the most precious resource at the time, and so MySQL and PHP expanded rapidly.
They are both still going strong from that initial popularity spurt.
.
2. .:: Novice coders do not think about DB corruption ::.
:: If you are new to coding, and have not been classically trained in a computing degree, it seems you don't care about transactions or consistency.
MySQL was/is worse than Postgres on both these fronts, but the users do not care. At all. And if they do care, it may be because they heard that it was important, rather than actually feeling ill without transactions.
.
:: I value my users and would never want to inflict data loss on them if I could avoid it. I also know SQL well from my Computer Engineering degree and industry experience with Oracle / PSQL / DB2. So I use Postgres and have since 1996.
Most people using MySQL don't realize they could cause data loss to their customers. Even if they do, they may still use MySQL and work around failures by having adequate backup strategies that cover them even if MySQL failed at a bad point.
One factor has to be that for a long time the only way to install PostgreSQL on Windows was to build it within Cygwin, whereas MySQL could be run on Win98 with a standard installer.
Though postgresql is currently much easier than mysql for Windows installations. There are all kinds of "gotchas" on Windows if you're installing mysql. For example, the normal mysql 5.1 on Windows doesn't have InnoDB, so if you choose "transactional database" during install, mysql dies in the middle of installation and refuses to start up. By default mysql installs into "Program Files", but since that directory name has a space in it, mysql has issues. Uninstalling it is also a pain; sometimes it doesn't fully remove itself and you have to fire up Regedit to remove the mysql key from the registry.
I experienced all of these while trying to help a friend install mysql on Windows. I honestly can't believe we spent 3 hours trying to install it. There's a 100-post thread in the mysql forums about installation troubles on Windows, so I'm sure I wasn't the only one. On the Mac it was as easy as "port install mysql5" and on my Debian machine it was just "apt-get install mysql-server".
Sadly, while postgresql was that easy to install in Cygwin, Cygwin doesn't have a mysql server equivalent.
Based on my experience with other developers, they can't figure out how to log in to the database and create users, databases, etc. initially. IMO that's mainly because the default config does ident-only authentication and doesn't allow passwords to be used until you set up pg_hba.conf first. I think that gets into sysadmin territory for the mass of developers who just want to install, run a few commands, and go.
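For what it's worth, the fix is a couple of lines in pg_hba.conf, but you have to know to look there. Roughly (exact method names vary a bit between versions and packages), the change looks like this:

    # default on many packages: only OS-user ("ident") logins are allowed
    local   all   all                  ident
    # switch to md5 so ordinary password logins work, then reload the server
    local   all   all                  md5
    host    all   all   127.0.0.1/32   md5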
And even once you've got password logins working, permissions work really strangely in postgres. Every database has a single "owner" (based on who created it) who has full permissions. If you want to give another user more permissions, you have to grant them manually, on a table-by-table basis. Want to grant SELECT on every table in a large database? Sucks to be you!
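The usual workaround (before Postgres grew a bulk GRANT ... ON ALL TABLES IN SCHEMA statement) is to generate the GRANTs from the catalog and feed the output back into psql; something like this, with a made-up reporting_user role:

    SELECT 'GRANT SELECT ON ' || schemaname || '.' || tablename
           || ' TO reporting_user;'
      FROM pg_tables
     WHERE schemaname = 'public';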
Compare this to MySQL, where there's just a single magic table of permissions, and you'll begin to see why a lot of system administrators prefer MySQL to Postgres as well. :(
My Rails app runs on MySQL because my blog runs on MySQL and, since they're all black boxes for me anyhow, why bother installing another service.
At work, our Rails apps run on Postgres because our Big Freaking Consultingware runs on Postgres and, since they're all black boxes for me anyhow...
File that as instance #572 of the Power of Defaults. Now, go take a look at the popular OSS and commercial packages for deployment at web hosts, and you're going to see a preponderance of MySQL. (Wordpress, VBulletin, etc, etc.) That is a self-fulfilling prophecy.
A lot of it has to do with historical reasons which have reinforced themselves over time.
Here are some of the early differences:
* MySQL had an ugly way to allow users to see only their own tables. While it wasn't great, it was better than PostgreSQL, which didn't really have a way to hide other databases.
* MySQL could create and destroy connections quickly. PostgreSQL made more robust connections, but they took longer to set up.
* MySQL ran simple queries much quicker under a light load.
* PostgreSQL database maintenance tasks often required the database to be offline.
While these differences didn't matter much in a business environment, PostgreSQL would have been the better option there, because a business would typically:
* Use connection pools and not care about connection setup time.
* Have a high load and care more about how the database ran under load.
* Have trusted users and not care that they could see other databases.
* Have maintenance periods.
That being said, in the business world users ran MS SQL or Oracle. Where MySQL took off was shared hosting, which:
* Had light usage.
* Had limited resources, which meant providers often banned connection pooling.
* Had multiple untrusted users.
* Could handle a slowdown but not downtime.
In this situation MySQL was the better option.
This then caused a network effect: most software was written for MySQL, so more companies offered it (remember, at this time lots of companies only offered static hosting). Since most companies offered it, more people wrote software that supported it.
Programs that support PostgreSQL often also support MS SQL, MySQL and a number of other databases, BUT there are a number of programs that only support MySQL. This means that if you want to run several programs, it is easier to run MySQL rather than PostgreSQL, since you'll have to run MySQL anyway.
So while PostgreSQL has largely fixed the problems it had on shared hosting, MySQL still has the market share.
Surprised that nobody has mentioned replication. MySQL replication is a known quantity. PostgreSQL has replication solutions, but they're third-party and less proven/battle-hardened.
That definitely is important. MySQL has fairly turn-key and easy to understand replication and has had it for a while.
The Postgres crew hides behind the "replication means different things to different people, so it would be quite presumptuous of us to build it!" mantra. It's quite annoying.
You can hack together crude replication in Postgres with Write-Ahead Log (WAL) shipping; the config has some hooks to automate the process. But that doesn't allow you to do circular and/or master-master replication; the receiving node has to be in recovery mode continuously.
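To make that concrete, the crude warm-standby setup is only a few config lines (the archive directory below is made up, and in practice people use a waiting wrapper like pg_standby as the restore_command so the standby keeps waiting for the next segment instead of finishing recovery):

    # master, postgresql.conf: copy each completed WAL segment to a shared location
    archive_mode    = on
    archive_command = 'cp %p /wal_archive/%f'

    # standby, recovery.conf: replay segments as they arrive (node stays in recovery)
    restore_command = 'cp /wal_archive/%f %p'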
I wouldn't say slony is less proven or battle-hardened. Being one of the very few postgres replication solutions that actually work, it is currently in use pretty much everywhere that replication is required.
I think slony is less popular than mysql replication because it is a royal pain in the ass to set up and maintain. In mysql you flip a switch and have replication (sketched below). It has a few known issues but is "mostly" reliable and understood.
In postgres/slony you enter the wonderful world of triggers and several layers of magic.
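For comparison, the mysql "switch" really is about this small once server-id and log-bin are set in each server's my.cnf (host and credentials below are obviously made up):

    -- on the slave:
    CHANGE MASTER TO
        MASTER_HOST     = '192.0.2.10',
        MASTER_USER     = 'repl',
        MASTER_PASSWORD = 'secret',
        MASTER_LOG_FILE = 'mysql-bin.000001',
        MASTER_LOG_POS  = 4;
    START SLAVE;
    SHOW SLAVE STATUS\G   -- check Slave_IO_Running / Slave_SQL_Running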
Well here's the thing: your typical web-oriented developer doesn't like SQL much, probably because thinking in sets is very different from OO or whatever they're into. Most of them try to hide from it behind Rails/ActiveRecord or Hibernate or whatever. MySQL has a very basic, crude SQL dialect and so doesn't intimidate them, whereas using PostgreSQL to its fullest requires really getting into set theory and relational thinking.
Plus if you look under the covers of MySQL each "table" is just a file on the disk. Unsophisticated database users feel comfortable with that too. "Sharding", another thing they like, is another attempt to evade using advanced features (not that partitioning is actually "advanced" these days, nor is a semi-decent query optimizer) that other databases take for granted.
Not only was I totally clueless that MySQL's innards are laid out on a one-file-is-one-table basis (I'm an unsophisticated database user precisely because I don't want to know how my database works on the inside), I'm also just sophisticated enough to know that any attempt to exploit the knowledge that the users table corresponds to a single file will result in my dog being assassinated by data-corruption SQL ninjas.
You don't need to sacrifice a single system image to scale horizontally, and you haven't for years. Sharding was invented by IBM in the 80s and all the major vendors had abandoned it by the 90s.
database newbie here - I always thought sharding was the only thing to do after your DB starts choking on the volume. I did not know there was any other way to go about it.
Care to share any information on how you would scale DB (PGSql maybe) - any google keywords would be welcome as well
So the MySQL camp would have you believe. But a) a real database running a balanced workload scales far further on the same kit than MySQL anyway, and b) then you go with active/active clustering (e.g. Oracle RAC). I personally work on a 30TB database like this and I've seen people take it to >100TB.
A problem with shared-everything clusters like RAC is that they tend to scale poorly under write-heavy loads. Other problems would be the significant complexity, low predictability (esp. in terms of latency) and of course the Oracle tax.
It boils down to the old question of right-tool-for-the-job and a RAC cluster is not the right tool for most webapp scenarios.
I would say the bulk of it is ease of administration and, as others have mentioned, the ability to get it up and running quickly and simply while having only a hazy understanding of how an RDBMS works underneath.
The other major difference is that MySQL AB put a lot more effort into having an extremely concise, easily navigable and user-friendly online documentation repository and associated support community. Postgres has since made good strides in this area, but a lot of the documentation still reads like something intended for a fairly specialised audience that more or less knows what it wants; the ignoramus-friendly parts of MySQL's documentation are a lot friendlier.
For this and its administrative simplicity, it just got to be known as the quick and easy database, and Postgres as the rocket science database. (In reality, this is not true; only Oracle is the rocket science database. :-)
Also, MySQL was/is more appealing to corporate adopters since an Actual Company(TM) is Behind(R) the project. Postgres has a commercial footprint in the form of various third-party consultancies like CommandPrompt, but the core of the project is a Debian-like anarchic band of hackers. Nothing turns corporate America off more than a bunch of long-haired GNU hippies when it comes to big-ticket stuff, though they begrudgingly put up with it for Linux by now, Linux having become somewhat "legitimised" by the backing lent to it by IBM, the existence of Redhat, etc.
There are a few pain points with postgres that have been holding it back. The lack of Windows support has been mentioned (that's in the past now, fortunately). Incredibly cumbersome (you could say broken) Unicode support makes postgres difficult to use in environments that aren't exclusively English-speaking. And mysql is faster for simple things.
On the other hand postgres is way faster for complex queries and its support for SQL features is second to none.
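As one concrete example of the SQL-features point: window functions. The per-customer running total below (over a made-up orders table) is a single declarative query in postgres, and there's no comparably direct way to write it in mysql:

    SELECT customer_id,
           order_date,
           SUM(amount) OVER (PARTITION BY customer_id
                             ORDER BY order_date) AS running_total
      FROM orders;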
I think you are mistaken about Unicode. It is much more mature on postgres. UTF8 was already offered on Postgres 7.3 while MySQL only managed to get it working in version 4.1.
Please see my other comment in this thread. There's the LIKE + indexes issue and there's the fact that postgres supports only one character set for the entire database, whereas mysql lets you choose on a per column basis.
Is that true about the broken Unicode support? I was going to try Postgres asap, but this would be a dealbreaker. I can hardly imagine it to be the case, though?
The problem is that for locales other than the "C" locale, the LIKE operator will not use indexes unless you explicitly tell postgres to use text_pattern_ops. However, if you do that, the other comparison operators will not use that index. Effectively, what you have to do is create two indexes on the same column, one using text_pattern_ops and one using the default operator class (sketched below).
I know this is not a bug so it's debatable whether it should be called broken. Let me call it a broken design.
[Edit] And there's another workaround that allows you to avoid using two indexes. You can use text_pattern_ops and use regular expressions for all comparisons, even for equality. This solution may have other performance drawbacks. I'm not sure.
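To spell out the two-index workaround (table and column names made up):

    -- serves LIKE 'prefix%' scans under a non-C locale
    CREATE INDEX users_name_like_idx ON users (name text_pattern_ops);
    -- serves the ordinary =, <, > comparisons and ORDER BY
    CREATE INDEX users_name_idx ON users (name);

    SELECT * FROM users WHERE name LIKE 'smi%';  -- can use the first index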
Trying to understand the issue: I suppose text_pattern_ops is a special kind of index optimized for LIKE searches? Then why should other queries use that index? Serious question, since you call the design broken - but what would be the proper way to do it in your opinion?
I don't really know what that comment was about. I've been using Postgres with a thoroughly-Unicode application stack (Django) for quite some time and never had a problem...
Which locale did you use to initialise your environment? It may well be that you haven't run into the LIKE + indexes issue or didn't notice it because the table didn't have a lot of data in it.
By the way, do you realise that Django's ORM does not support optimistic locking in a transactionally safe way?
"It may well be that you haven't run into the LIKE + indexes issue or didn't notice it because the table didn't have a lot of data in it."
I dunno, the backups (pg_dump -> bzip), last time I looked at one, were over 6GB, so I'd say there's some data in there. I've just never seen Unicode-related issues.
"By the way, do you realise that Django's ORM does not support optimistic locking in a transactionally safe way?"
It doesn't really expose locking, period; consult the many threads on the dev list to find out why.
Shared-hosting providers generally offer MySQL as part of their low-end packages, but not PostgreSQL (I assume this is because MySQL is easier to administer in a shared environment). So if you want your web app to use PostgreSQL, you have to find a PostgreSQL-specific provider or take responsibility for administering your own box.