
MySQL 8.0: Retiring Support for the Query Cache - aleksi
http://mysqlserverteam.com/mysql-8-0-retiring-support-for-the-query-cache/
======
morgo
Product Manager for the MySQL Server here (and post author). Happy to answer
any questions...

~~~
stouset
I'm a developer who drastically prefers PostgreSQL due to things like window
functions, a more predictable query planner, a variety of index types and the
ability to index over calculated fields, improved strictness out of the box
(e.g., UTF-8 being UTF-8, no implicit truncation of long strings, no silent
and lossy automatic typecasting), and so on.

I'm forced to use MySQL at work, because it's much easier to work with for our
operations teams.

That said, my perception is that PostgreSQL is catching up to MySQL in terms
of operational overhead and replication strategies faster than MySQL is
catching up to PostgreSQL on the end-user side of things.

To an end-user like me, what would you point out are some current advantages
that MySQL has over PostgreSQL, and what do you see the MySQL project doing to
help it catch up to the growing gulf in feature parity?

~~~
morgo
I would maybe start off by saying feature parity was/is never the goal. The
original goal of MySQL was to be the "Ikea of databases" (both come from
Sweden).

Having said that, I think expectations on what is the minimal functionality
have evolved, and we have responded by adding functionality like JSON in MySQL
5.7, and CTEs and Window Functions in 8.0. In terms of the specific issues you
raise:

* utf8 vs utf8mb4 is "problem #1" [http://mysqlserverteam.com/sushi-beer-an-introduction-of-utf...](http://mysqlserverteam.com/sushi-beer-an-introduction-of-utf8-support-in-mysql-8-0/) \- we have switched the default in 8.0, and will deprecate utf8mb3 to reduce confusion: [http://mysqlserverteam.com/mysql-8-0-when-to-use-utf8mb3-ove...](http://mysqlserverteam.com/mysql-8-0-when-to-use-utf8mb3-over-utf8mb4/)

* Implicit truncation and automatic type casting is no longer the default (strict was enabled for new installs in 5.6 (2013) and all installs in 5.7 (2015)). That is, unless the standard specifies it should (there are some weird cases).

* 5.7 has virtual columns + indexes. This allows for a functional index.

Edit: Missed a word, added computed columns

~~~
StevePerkins
> adding functionality like JSON in MySQL 5.7

In the Java world, the Spring framework is about to drop a new major release
revolving around asynchronous or "reactive" programming. Many other
programming languages and frameworks are already moving in this direction.

This model pulls people toward alternative databases such as Cassandra,
MongoDB, etc. Because you lose many of the advantages of asynchronous
processing in the application tier, if your data access tier still relies on
synchronous blocking API's (such as Java's JDBC and JPA).

Recently I was researching whether any relational database vendors were adding
support for asynchronous access, and stumbled across MySQL's new "X Protocol"
basically by accident. The documentation is still rather new and thin, and I
was puzzled that it seems to revolve 100% around the use case of MySQL as a
document store to compete with MongoDB.

Are there any plans to push the X Protocol further toward the spotlight, beef
up the documentation, and emphasize use cases around asynchronous access as a
relational database rather than document store?

~~~
morgo
The "X" comes from it being a crossover/hybrid protocol between SQL and a CRUD
API. It has a lot of modern protocol features, and uses protobufs, which makes
it easy to add new driver support. The CRUD API has overlap with Document
Stores like MongoDB, but the X Protocol really goes beyond that.

There are plans to continue developing the X Protocol. We have also chatted
about doing translation from the classic protocol to X Protocol in the MySQL
Router. I agree with you that async is important. Stay tuned :-)

------
rubiquity
These days it is much easier to implement a query cache specific to your
application's needs in other layers of your infrastructure. These other layers
also tend to scale horizontally much more easily than a relational database
does. I like it.

~~~
larkost
In a former job I had an application that made excellent use of the Query
Cache. A number of computers (in clusters across a large campus) that all
needed to check on the same computed value that was derived from a query that
spanned 5 tables. The computers each checked in up to every 5 minutes, and
really only needed to know when the computer value changed, but without
setting up some sort of push system (which would have been engineering
expensive), we just fell back on the Query Cache. Worked beautifully,
especially since we would only see an extra recompute a few times a day.

Yes, you could architect a separate system to do this (e.g. a separate table,
possibly with a stored procedure firing for changes on any of the tables), but
that would necessarily involve more complexity.

~~~
pizza234
This is a quite lucky edge case; in order for this to work, you need to have
the underlying tables (5 in this case) change infrequently.

This is actually, directly and indirectly, the core issue of query caches.

The one mentioned is the direct problem; the indirect one is that for each
record change in any table [present in the query cache], the entire query
cache must scanned and invalidated, which causes locking.

------
h1d
It's good that MySQL gets better because I'm tied to it since Sequel Pro is
the best database browser there is and it only supports MySQL.

Datagrip has that Java kind of non-nativeness (weird scrolling and the looks)
and it's never easy to read/write triggers, relations and export data into csv
etc.

I did find Postico being decent last time I checked but what a limiting factor
on the tooling.

~~~
manigandham
[http://dbeaver.jkiss.org/](http://dbeaver.jkiss.org/) is the best I've used
for a universal SQL (and nosql) client.

~~~
beefsack
I started using it since it was last mentioned on HN and it's really
impressive, I've been converted.

------
tyingq
>>Use of non-deterministic features will result in the query not being cached
(including temporary tables, user variables, RAND(), NOW() and UDFs.)

Makes me wonder how many simplistic client side caches check for this. Pretty
sure I've seen PHP shopping apps that blindly hash the query.

~~~
johannes1234321
How often would that actually be a problem? RAND() and udfs aren't often used
in queries, I would say. And NOW() is used for something like "items from last
week" where some time difference doesn't matter too much (you need some cache
invalidation as new data comes anyways)

~~~
morgo
I don't think it's common per-se, but it leads into the next point as well.
Because of MVCC, what you are supposed to see (while returning non-stale
results) gets complicated.

------
raarts
This PostgreSQL diehard had a usecase that required a cluster of 3 masters to
stay up if two of them died. Turned out that was really easy with MySQL multi-
master and multi-source where each master is a slave to the other two.

------
lathiat
Though this is probably a good move, I think it's going to catch out a lot of
bad web apps based on my experience of trying to disable it at a web hosting
company :-)

------
TekMol
I always assumed that it makes sense to have the query cache active. It just
makes sense logically.

On my production server, "SHOW STATUS LIKE 'Qcache%'" currently gives me:

    
    
        Qcache_free_blocks	2184
        Qcache_free_memory	7629288
        Qcache_hits	328939349
        Qcache_inserts	14440714
        Qcache_lowmem_prunes	7814306
        Qcache_not_cached	290027559
        Qcache_queries_in_cache	7497
        Qcache_total_blocks	17225
    

About 20 hits per insert. Sounds good to me.

Anybody here who is more "in the know" then me and can tell me what I can
conclude from these numbers and/or which other stats to look at?

~~~
morgo
It's not easily possible to tell from these numbers.

For example:

\- You can see a hit, but what was the cost of a miss? If it was a point
lookup, it is very low cost.

\- Comparing hits to inserts, does not show the cost added to every other
query to search the cache (Com_select) that may have not been added to the
cache after execution due to judged to be non-deterministic.

\- Inserting results into the query cache can cause stalls (mentioned on
Rene's post)

\- Performance should be judged on 99th+ percentile. Even if it makes some
queries faster, has it improved your p99?

If you wanted to take a more wholistic approach, it is good to measure this
from the Application with something like NewRelic.

~~~
TekMol
> what was the cost of a miss

Yeah, no idea.

Regarding "Com_select", it seems to be 0:

    
    
        SHOW STATUS LIKE "Com_select"
    
        Variable_name       Value
        Com_select          0
    

What does that mean?

As for the wholistic approach .. well, the server tucks away nicely. I never
experience any lag and all users are happy with the speed. So there is no
strong drive to investigate. Computers are so fast these days. A cheap
$15/month VPS can easily handle tens of thousands of users per day.

------
Markxr42
It is definitely high time, the QC is an abomination, it's totally at odds
with the idea of MVCC.

In my experience, anyone who enabled the QC was deeply mistaken about its
behaviour.

~~~
hendersoon
Exactly! This should have been done a long time ago.

The MySQL query cache was nothing less than a _trap_ for unsophisticated DBAs.

------
z3t4
when optimizing old school server rendered web apps turning on the query cache
always gave the most bang for the buck. hit rates around 90% or more.

------
jbergstroem
Out of curiosity: how does people using Wordpress [at somewhat scale] handle
layers of cache? Last I used it, relying on the Wordpress cache layer just
didn't cover enough cases. The MySQL one "saved" me seeing how Wordpress likes
to ask the same question. A lot.

~~~
Viper007Bond
If you enable WordPress's object cache, it should get most page views down to
a couple or even zero queries. It's poorly written themes and plugins where
problems can arise.

The easy solution is to just enable full page caching. On single server
setups, something like WP Super Cache works well. On distributed setups like
WordPress.com, this is more appropriate:
[https://github.com/Automattic/batcache](https://github.com/Automattic/batcache)

~~~
tyingq
The object cache doesn't persist across HTTP requests. You have to install
some 3rd party plugin for that.[1]

[1][https://codex.wordpress.org/Class_Reference/WP_Object_Cache](https://codex.wordpress.org/Class_Reference/WP_Object_Cache)

~~~
Viper007Bond
Correct, which is what I meant by enabling it. Apologies for not being
clearer.

It used to persistent via writing objects to the filesystem but it actually
ended up being _slower_ in some configurations.

------
tiennou74
I am trying [http://www.heimdalldata.com/](http://www.heimdalldata.com/) as a
replacement.

------
jarnix
I'm not sure that it will "only affect a small number of users".

Maybe this should be an option with false as default value, it could help with
the transition.

~~~
reipahb
It has already been disabled by default for a while. From the article:

> The query cache has been disabled-by-default since MySQL 5.6 (2013)

~~~
hendersoon
Yep.

What they should have done was add a notification in the error log when you
startup-- "Query Cache is enabled. Are you really sure you want to do that?
Read [http://whatever/qcisawful.html](http://whatever/qcisawful.html) for more
information."

And this should have been added to the final releases of every version all the
way back to 5.0.

For people saying "please, it's been off since 5.6 in 2013", my company's
primary product runs on Percona 5.5 (2010), and our secondary product, is
actively developing the _next_ version on Percona 5.6. That isn't particularly
unusual.

------
tiennou74
I'm working with a company called
[http://www.heimdalldata.com/](http://www.heimdalldata.com/) they do SQL auto-
caching with no code changes. A perfect replacement for Query Cache if you
have a read-heavy application. They also offer a nice dashboard that helps
identify database inefficiencies. I would try them out to see if you get a
performance boost.

~~~
munchai
How is this different from MySQL query caching?

~~~
tiennou74
Unlike other solutions, the software auto-caches and auto-invalidates SQL; it
takes away a developer having to manually configure the cache. Heimdall works
with both MySQL and Postgres

