I'm forced to use MySQL at work, because it's much easier to work with for our operations teams.
That said, my perception is that PostgreSQL is catching up to MySQL in terms of operational overhead and replication strategies faster than MySQL is catching up to PostgreSQL on the end-user side of things.
To an end-user like me, what would you point out as some current advantages that MySQL has over PostgreSQL, and what do you see the MySQL project doing to help close the growing gulf in feature parity?
Having said that, I think expectations on what is the minimal functionality have evolved, and we have responded by adding functionality like JSON in MySQL 5.7, and CTEs and Window Functions in 8.0. In terms of the specific issues you raise:
* utf8 vs utf8mb4 is "problem #1" http://mysqlserverteam.com/sushi-beer-an-introduction-of-utf... - we have switched the default in 8.0, and will deprecate utf8mb3 to reduce confusion.
* Implicit truncation and automatic type casting are no longer the default (strict mode was enabled for new installs in 5.6 (2013) and for all installs in 5.7 (2015)). That is, unless the standard specifies they should happen (there are some weird cases).
* 5.7 has virtual columns + indexes. This allows for a functional index.
Edit: Missed a word, added computed columns
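As a rough sketch of the generated-column trick described above (the table and column names here are my own, not from the thread), a virtual column plus a secondary index gives you the effect of a functional index in 5.7:

```sql
-- MySQL 5.7+: a VIRTUAL generated column with an index on it.
CREATE TABLE users (
  id          INT PRIMARY KEY,
  email       VARCHAR(255),
  email_lower VARCHAR(255) AS (LOWER(email)) VIRTUAL,
  KEY idx_email_lower (email_lower)
);

-- The optimizer can use idx_email_lower when the predicate
-- matches the generated column's expression:
SELECT id FROM users WHERE LOWER(email) = 'someone@example.com';
```

The virtual column takes no row storage; only the index materializes the computed values.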
In the Java world, the Spring framework is about to drop a new major release revolving around asynchronous or "reactive" programming. Many other programming languages and frameworks are already moving in this direction.
This model pulls people toward alternative databases such as Cassandra, MongoDB, etc., because you lose many of the advantages of asynchronous processing in the application tier if your data access tier still relies on synchronous blocking APIs (such as Java's JDBC and JPA).
Recently I was researching whether any relational database vendors were adding support for asynchronous access, and stumbled across MySQL's new "X Protocol" basically by accident. The documentation is still rather new and thin, and I was puzzled that it seems to revolve 100% around the use case of MySQL as a document store to compete with MongoDB.
Are there any plans to push the X Protocol further toward the spotlight, beef up the documentation, and emphasize use cases around asynchronous access as a relational database rather than document store?
There are plans to continue developing the X Protocol. We have also chatted about doing translation from the classic protocol to X Protocol in the MySQL Router. I agree with you that async is important. Stay tuned :-)
Could you elaborate more on MySQL being the Ikea of databases? What exactly does that mean, and how do those things differentiate it from PostgreSQL currently (and how will they in the future)?
If it's mostly just around new-developer friendliness, I worry that MySQL has more of an (admittedly somewhat exaggerated) reputation as the "PHP of databases" right now: it's easy to use out of the box, but in a way that doesn't discourage poor practice and results in traps and land-mines for future development. How do you guys plan on achieving such a (laudable) goal without bringing about the kind of baggage that stereotypically has come with it? FWIW, I suspect the strictness changes you mentioned will go a long way towards addressing that, but I'm curious if there's more you have in mind.
PS. Apologies if I've set up a bit of a straw man in my last paragraph. I'm trying to predict what your answer to the prior paragraph might be and pose additional questions based on that answer to avoid an extra round-trip.
In every major release we have a pseudo-quota for incompatibility (5.5 changed the default storage engine, 5.6 and 5.7 refactored a lot of the optimizer and turned on defaults like strict mode, and 8.0 makes large changes to the data dictionary).
w.r.t Ikea of databases - I have a large Ikea desk that I sit some very heavy computers on. It is simple in design, but that does not make it limited.
It also doesn't allow idiotic setups, which later turn out to be very bad decisions. We had one cluster that was two machines, a master and a slave; over time, a decision was made to write certain data only to the slave, while other kinds of data were written only to the master.
Then, since the slave contained more data, we started backing up the slave. At some point replication between master and slave broke, and that was only discovered a week later, once all the old logs had been purged. Some interesting acrobatics were needed to restore it to working condition.
Postgres would not have let us make any changes on the slave, and its replication slots make sure no logs are purged while replication is down.
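A hedged sketch of the replication-slot behaviour mentioned above (the slot name is my own; the standby-side setting varies by Postgres version):

```sql
-- On the primary: create a physical replication slot. The primary will
-- now retain WAL until this slot's consumer has replayed it, so a
-- stalled standby cannot silently fall behind a purged log.
SELECT pg_create_physical_replication_slot('standby_1');

-- The standby points at the slot via primary_slot_name
-- (recovery.conf before PG 12, postgresql.conf afterwards).

-- Monitor slot state and retained WAL position:
SELECT slot_name, active, restart_lsn FROM pg_replication_slots;
```

The trade-off is that an abandoned slot retains WAL forever, so unused slots need to be dropped.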
It's still a bit unfair to compare, because we have more MySQL than Postgres nodes, but Postgres just seems to work, so currently it requires no attention. Barman doesn't just make daily backups; it continuously archives every transaction. It can be integrated with nagios/icinga and performs much more thorough checks.
I think your operations team is resisting because they're scared of the unknown, but it would be very beneficial to them too.
How does the landscape look today? I am under the vague impression that the Postgres developers have worked hard on improving performance (and adding more features), but the MySQL developers probably have not been sitting on their hands all those years - what are MySQL's strong points today? (I remember the documentation being very good, and I count that a strong point!)
* Performance_schema means that any time MySQL is allocating memory, performing IO, or waiting on locks, there is a really easy SQL interface to debug issues. I always say that most users are not getting the performance they are entitled to, because they don't have the visibility.
* Logical replication means that it is very easy to do rolling version upgrades. It is also easier to have remote replicas and not have schema changes re-send the whole table across the wire.
* Group Replication (new) - Built-in active/active HA.
* The InnoDB Storage engine is very good. It uses an update in place w/REDO model, that has a lot of nice performance characteristics for short-medium sized transactions. The IO and CPU scalability is also very good these days, and we have a number of contributors to thank for that (Percona, Facebook, Google). InnoDB supports native aio, direct io, and can read/write in multiple threads. The change buffering feature that it has (aka insert buffer) is very good at reducing IO on a number of workloads. Its compression feature is also important for reducing space on SSDs.
* I actually think our bug workflow is very good if you are a production DBA. No new features in a stable release, and only the docs team closes bugs. This has a good way of forcing the release notes to be very accurate.
* The tunables and overrides for DBAs, and the tooling is very stable. MySQL 5.7 supports server-side query rewrite based on a pattern where I can insert a query hint if required.
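The server-side query rewrite mentioned in the last bullet can be sketched roughly like this (the `query_rewrite` schema and `flush_rewrite_rules()` procedure are the 5.7 Rewriter plugin's real interface; the database, table, and hint in the rule are made-up examples):

```sql
-- MySQL 5.7 Rewriter plugin: add a pattern-based rewrite rule that
-- injects an optimizer hint without touching application code.
INSERT INTO query_rewrite.rewrite_rules (pattern, replacement)
VALUES (
  'SELECT * FROM shop.orders WHERE id = ?',
  'SELECT /*+ MAX_EXECUTION_TIME(1000) */ * FROM shop.orders WHERE id = ?'
);

-- Load the new rule into the running server:
CALL query_rewrite.flush_rewrite_rules();
```

From then on, any matching statement is rewritten before the optimizer sees it, which is how a DBA can apply a hint to queries they cannot edit.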
(The performance part, too, but so far I've been fortunate enough not to run into problems where the database was really the bottleneck, and I hope it stays that way. ;-) )
EDIT: Now that I think of it, at work we do have a MySQL instance running for a few applications that require it. I am kind of embarrassed to admit that I pay it almost no attention beyond regular backups, but I'm happy to report that it has not given me any trouble at all (which is why I rarely think of it - that is kind of the Nirvana for a sysadmin like me!)
I'll work on a new version for 8.0 at some point. In the mean time:
My own benchmarks:
There will always be use-cases for each.
When tables are small and fit into memory, tokudb and innodb are damn close performance-wise. But innodb has crap compression, so it stops fitting into memory a lot sooner, and innodb performance drops off a cliff once it no longer fits, whereas tokudb just sails on awesomely. It's easy to have multi-TB tokudb tables. Just a shame that the next jump upwards is forgotten-ware like ShardQuery. MariaDB is doing Spider and Tokudb ...
Yeap. That tokudb doesn't support foreign keys is not a data-structure thing; it's a feature they cut from the mysql shim that sits on top of the fractal tree library.
I've got a list as long as my arm about missing features, features that don't work well together, blatant opportunities that the optimizer misses, etc.
In all seriousness, I'd be all over postgres instead if it just had fractal trees ;) Most of the problems are not the engine, they are the (R?)DBMS on top.
As a product manager, I do watch Aurora (along with SQL Server, Postgres, MongoDB, MariaDB etc). I'd rather answer questions about our products if you don't mind :-)
But back to your question - Yes. We are working on improving use cases like insert throughput, and changing the file format so we can support an instant DDL. Having our new data dictionary in 8.0 provides a strong basis for this.
The larger vision is a 4 step mission, described in our Keynote from last year:
Actual mission slide starts at 57:30. Step 4 is to introduce write scale out with sharding.
(Hi btw!, I'm also an Australian living in Toronto.)
* uuid type
* datetime with timezone
* storing (and retrieving) the view definition as it was defined with comments, formatting, etc
* using the same temporary table multiple times in the same query
This is not an extensive list, but are things that bite me every day with mysql.
Even though it's not query cache related, I wonder why such basic features are still missing and what the plans are to include them? They sound much simpler and more important than adding a nosql protocol to a sql database.
* For UUID, we've added helper functions to store it in insert-friendly order:
* For Datetime + Timezone, this is something we are looking into currently.
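As a hedged sketch of the insert-friendly UUID storage from the first bullet (MySQL 8.0; the table name is my own), the helpers are `UUID_TO_BIN`/`BIN_TO_UUID` with a swap flag:

```sql
-- MySQL 8.0: store UUIDs compactly as BINARY(16).
CREATE TABLE t (id BINARY(16) PRIMARY KEY);

-- The second argument (1) swaps the time-low and time-high parts,
-- so sequentially generated UUIDs sort - and therefore insert -
-- in roughly ascending order, which is friendlier to the B-tree.
INSERT INTO t VALUES (UUID_TO_BIN(UUID(), 1));

-- Read it back in the canonical textual form:
SELECT BIN_TO_UUID(id, 1) FROM t;
```

Note that the same swap flag must be used consistently on both conversions.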
Datatypes are actually not simple to add in MySQL. While STRICT mode is the default, we still support the upgrade case where it is disabled, which leaves us with a number of implicit conversions to handle. I wish it weren't the case, but it's not for lack of demand on our side :-) We intend to schedule refactoring work to make this easier in the future.
* For storing/retrieving view definitions
I hear you on that one. There is a documented reason though:
> The advantage of storing a view definition in canonical form is that changes made later to the value of sql_mode will not affect the results from the view. However an additional consequence is that comments prior to SELECT are stripped from the definition by the server.
* Re-using the same temporary table can now be worked around with a CTE (8.0), which is preferred.
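A minimal sketch of the CTE workaround (MySQL 8.0; table and column names are illustrative, not from the thread):

```sql
-- A CTE can be referenced more than once in the same query,
-- which a temporary table cannot.
WITH totals AS (
  SELECT customer_id, SUM(amount) AS total
  FROM orders
  GROUP BY customer_id
)
SELECT t.customer_id, t.total
FROM totals t
WHERE t.total > (SELECT AVG(total) FROM totals);  -- second reference
```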
It is not just a case of prioritization, but also resourcing. We have a large team, and different people work on data types than on protocol work :-)
I have a list @ http://www.thecompletelistoffeatures.com/
- MySQL 5.6 (2013)
- MySQL 5.7 (2015)
- MySQL 8.0 (in development)
In terms of some of the most recent work, I think the utf8mb4 performance improvements will have a big return for users: http://mysqlserverteam.com/mysql-8-0-when-to-use-utf8mb3-ove...
Clearly a hard and costly problem to crack, but I always wondered, with MySQL's pluggable engine tech, whether something couldn't be done in this area (or even if that might have been part of the goal of the pluggable tech?).
I am not quite the right person to say whether the storage engine API can handle this use case. It is a slightly different problem, in that you need to push down a lot more conditions into the engine. In some ways we already do this with our MySQL Cluster product.
Isn't MySQL dependent upon the query cache for caching subqueries? Are there plans to introduce materialized views or improve subquery performance?
No current plans to add materialized views.
Our performance team feels like ON is still the better default, as it applies to more workloads than not. Improvements were also made in 5.7 to partition the hash.
Extending it to transactional DDL is something I'd like to see in the future, but it is not in scope for 8.0.
Yes, you could architect a separate system to do this (e.g. a separate table, possibly with a stored procedure firing for changes on any of the tables), but that would necessarily involve more complexity.
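One hedged way to sketch such a separate invalidation system (all names here are hypothetical, and triggers rather than stored procedures carry the per-row work):

```sql
-- Keep a version counter per table that application-side caches key on.
CREATE TABLE cache_versions (
  tbl     VARCHAR(64) PRIMARY KEY,
  version BIGINT NOT NULL DEFAULT 0
);
INSERT INTO cache_versions VALUES ('orders', 0);

-- One trigger per write path (INSERT shown; UPDATE and DELETE
-- would each need their own trigger):
CREATE TRIGGER orders_ai AFTER INSERT ON orders
FOR EACH ROW
  UPDATE cache_versions SET version = version + 1 WHERE tbl = 'orders';
```

The application then includes the current version in its cache keys, so any write to `orders` naturally invalidates cached results derived from it, at the cost of extra write overhead and moving parts.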
This is actually, directly and indirectly, the core issue of query caches.
The one mentioned is the direct problem; the indirect one is that for each record change in any table [present in the query cache], the entire query cache must be scanned and invalidated, which causes locking.
DataGrip has that Java kind of non-nativeness (weird scrolling and the looks), and it's never easy to read/write triggers and relations, export data to CSV, etc.
I did find Postico decent last time I checked, but what a limiting factor the tooling is.
lots of benefits when renaming tables/cols - it updates my java references too.
Saying Sequel Pro is the best is rather controversial; have you ever tried pgAdmin, http://www.postgresqlstudio.org/, or SQL Server Management Studio?
Oh my god this 1000%
I will use the "worse" DB every time if it allows me to be more efficient in my work.
Makes me wonder how many simplistic client side caches check for this. Pretty sure I've seen PHP shopping apps that blindly hash the query.
Author of ProxySQL here.
Maybe the PHP apps you've seen take the same approach?
Really? That surprises me. Do you have any information about that?
My understanding is that there's no "ambient" caching. Any caching is in the ActiveRecord::Relation objects themselves. So if you had `users = User.whatever`, then `users` might have some logic for deferring to a cache in it, but a separate `users2 = User.whatever` wouldn't be subject to that cache. I'm quite sure this is the case for associations: with `u = User.find(123); u.posts; u.posts`, the first statement will query the DB, as will the second, but the third won't, since the fact that `u.posts` has been executed is cached in the `u` object. But if you did `u2 = User.find(123)`, that would hit the DB again.
I haven't done any rails 5, so maybe it's new there.
I'm not sure when this was added. v3 or v4?
I haven't done a big analysis, but I wouldn't be surprised at some issues with NOW(), CURDATE(), or maybe UUID(), etc., especially for things happening at midnight, or at a month boundary, etc.
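A hedged illustration of why blindly hashing the query text is dangerous (table names are made up):

```sql
-- Two executions of the same query text can return different results,
-- so a cache keyed on the raw SQL string may serve stale data.
SELECT * FROM orders WHERE created_at >= CURDATE();  -- answer changes at midnight
SELECT id, NOW() AS fetched_at FROM users;           -- changes on every call
```

This is exactly why the server-side query cache refuses to cache statements containing non-deterministic functions, while a naive client-side cache may not.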
On my production server, "SHOW STATUS LIKE 'Qcache%'" currently gives me:
Anybody here who is more "in the know" than me and can tell me what I can conclude from these numbers and/or which other stats to look at?
- You can see a hit, but what was the cost of a miss? If it was a point lookup, it is very low cost.
- Comparing hits to inserts does not show the cost added to every other query that searches the cache (Com_select) but was never added to it after execution, having been judged non-deterministic.
- Inserting results into the query cache can cause stalls (mentioned on Rene's post)
- Performance should be judged on 99th+ percentile. Even if it makes some queries faster, has it improved your p99?
If you wanted to take a more holistic approach, it is good to measure this from the application with something like New Relic.
Yeah, no idea.
Regarding "Com_select", it seems to be 0:
SHOW STATUS LIKE "Com_select"
As for the holistic approach .. well, the server just chugs along nicely. I never experience any lag and all users are happy with the speed. So there is no strong drive to investigate. Computers are so fast these days. A cheap $15/month VPS can easily handle tens of thousands of users per day.
In my experience, anyone who enabled the QC was deeply mistaken about its behaviour.
The MySQL query cache was nothing less than a _trap_ for unsophisticated DBAs.
The easy solution is to just enable full page caching. On single server setups, something like WP Super Cache works well. On distributed setups like WordPress.com, this is more appropriate: https://github.com/Automattic/batcache
It used to persist objects to the filesystem, but that actually ended up being _slower_ in some configurations.
The problem is WP just makes a ton of simple queries - in any complex install the bottleneck is probably going to be network latency to the database.
I have some hypothetical numbers illustrating the impact of network latency:
The simple queries are basically free once they get to MySQL. Query cache does not help.
Maybe this should be an option with false as the default value; it could help with the transition.
> The query cache has been disabled-by-default since MySQL 5.6 (2013)
What they should have done was add a notification in the error log when you startup-- "Query Cache is enabled. Are you really sure you want to do that? Read http://whatever/qcisawful.html for more information."
And this should have been added to the final releases of every version all the way back to 5.0.
For people saying "please, it's been off since 5.6 in 2013": my company's primary product runs on Percona 5.5 (2010), and our secondary product is actively developing the _next_ version on Percona 5.6. That isn't particularly unusual.