

MySQL 5.7: over 1M queries/sec on a single machine - continuations
http://dimitrik.free.fr/blog/archives/11-01-2013_11-30-2013.html#2013-11-22

======
eonil
Read-only performance for unknown kind of query with unknown size of dataset
using memory-cache plugin.

Hmmm… I don't see any meaningful information. Should I be impressed? Am I
missing something?

~~~
corresation
Yes, you should be impressed.

Historically the overhead of SQL put a low ceiling on the potential
performance of simple queries. Which is exactly why KV databases appeared in
the first place, because a scalar value get in most RDBMS systems barely hit
1000 results per second, and that is being optimistic. That overhead was never
optimized away because the classic use of the RDBMS was query conversations
where each interaction was very expensive (e.g. large, complex queries), so
optimizing it wasn't important.

But, thanks to competition, most database systems have gotten fantastically
more performant. Even using the TDS interface, I can get close to Redis
performance with SQL Server now, while it was beaten by one if not two orders
of magnitude a few versions ago. And in this case MySQL added the ability to
use a much simpler, lighter memcached API in from of MySQL, for obvious reason
-- the underlying database system is fast enough that it is of value.

I should add that almost everyone who has commented thus far completely
misunderstands what this is taking about. It is the Memcached API atop MySQL.
Meaning you can talk to your database server through that API from any client
programmed to talk to a memcached server. And that same data that you access
via memcached (reading or writing) can be accessed in your normal SQL. Which
is pretty cool.

~~~
eonil
When I see explicit mention of _API_ , it implies there's no running memcache
daemon.

I dug MySQL 5.7 manual several times, but I couldn't find any proof that this
is just an _memcache API_ atop MySQL.

But I found some proof that says there's a _real memcache daemon_ atop MySQL.

> Running memcached in the same process space as the MySQL server avoids the
> network overhead of passing requests back and forth.

MySQL 5.7 manual clarifies that they're running memcache daemon inside the
database engine in-process mode. So actually real memcache is serving the data
atop the engine.

So, if dataset is read-only and small enough, it's likely to happen that whole
the data is being served by memcache. In that case, it cannot be slower than
memcache.

That's why I couldn't be impressed. The only thing MySQL doing is automatic
handling of memcache daemon. That should be convenient, but not impressive.

Maybe I am wrong. If you think I am wrong, please show me some reference that
explicitly mentions this is just API, and there's no running memcache
instance.

Also, regardless of whatever is really happening under hood, this benchmark is
still meaningless. Because database exists to _store_ data and all the hard
stuffs are only happening when storing data. Writing transactions usually
degrade performance a lot. That's why writing transactions must be included in
benchmark to show the _real world_ performance.

~~~
corresation
>Maybe I am wrong. If you think I am wrong, please show me some reference that
explicitly mentions this is just API, and there's no running memcache
instance.

[https://dev.mysql.com/doc/refman/5.6/en/innodb-memcached-
ben...](https://dev.mysql.com/doc/refman/5.6/en/innodb-memcached-
benefits.html)

Yes, there is a memcached "daemon" in the sense that they run a memcached host
within mysql, which they obviously have to do. That memcached "instance" reads
and writes from InnoDB and the shared buffer pool also used by the SQL engine:
It does not maintain a unique memory instance, and values read or written are
atomically consistent with the mySQL instance.

The single and only reason this product exists is that the memcached API is
much simpler and network/parsing efficient, allowing for much better round-
trip-times when doing simple queries.

~~~
eonil
OK. You're right. MySQL seems using InnoDB buffer pool. I found the mention.

> You do not need to manually load data into memcached at startup. As
> particular keys are requested by an application, the values are retrieved
> from the database automatically, and cached in memory using the InnoDB
> buffer pool.

So memory-cache is provided the engine itself, not by external plugin.

------
jeffdavis
For comparison, here's a scalability benchmark for postgres (also read-only)
from a year and a half ago:

[http://rhaas.blogspot.com/2012/04/did-i-say-32-cores-how-
abo...](http://rhaas.blogspot.com/2012/04/did-i-say-32-cores-how-
about-64.html)

It's hard for me to tell what the MySQL benchmark was really doing, but the
postgres one seems a bit more "normal" (ordinary SQL queries from postgres's
ordinary simple benchmark tool).

So, not apples-to-apples, but still interesting.

~~~
jeltz
I would have loved if they had benchmarked it against doing them same thing in
MySQL with SQL so we could see how much overhead is added by the SQL layer for
simple selects.

PostgreSQL manages to reach a third of their QPS in Haas's benchmark.

------
lelf
InnoDB API on top of memcached. _Read-only_. On the same “biggest HW box we
have in the LAB” machine.

over 1,000,000 Query/sec

Cool. Carry on guys.

~~~
martinml
Well, the performance under the same conditions was 500KQPS before, so
definitely there was an improvement!

~~~
hahainternet
Right but if someone showed a similar improvement by adding enough RAM to
prevent swapping, would you be impressed? This is what is going on here. They
have very much successfully proven that RAM is faster than HDD.

~~~
nailk
They have proven that SQL-based storage is faster than NOSQL databases.

The poster of the top comment doesn't understand what this topic is about.

This is actually a benchmark of InnoDB table access. You can query it using
usual SQL-commands. But if you want faster access, you can use memcache API.

"Read-only" is also misleading here. This is just a benchmark of get command.

~~~
Xorlev
"They have proven that SQL-based storage is faster than NOSQL databases."

Too broad of a statement. 'NoSQL' is an umbrella term, and thus likely has
several contenders for a raw QPS metric.

The additional step of parsing the query is obviated by statement caching.
You're right though, it's simply a benchmark of how fast the query path is to
the InnoDB buffer pool. The work is to optimize internal contention.

Very impressive results though.

------
morgo
For context, the first NoSQL interface for MySQL was:
[http://yoshinorimatsunobu.blogspot.ca/2010/10/using-mysql-
as...](http://yoshinorimatsunobu.blogspot.ca/2010/10/using-mysql-as-nosql-
story-for.html)

(It's appeared on HN a few times).

The MySQL team decided to run with a more standard memcached interface, and
MySQL actually embeds memcached to do this (with MySQL as backend storage).

Despite being read-only, the over 1M number highlights a lot work thats been
done for multi-core scalability. This test was on a 48 score machine.. that
was a distant dream a few years ago on MySQL 5.1.

------
chris_wot
A benchmark is of absolutely no use whatsoever unless you give others a chance
to reproduce it.

------
buremba
1M queries doesn't make sense if most of them actually use Memcached as
backend. You can run more than 1M queries like "select 1", it does the same
thing. The magic is Memcached.

~~~
baq
true but offtopic. from the docs:

==== 8< ====

MySQL 5.7 includes a NoSQL interface, using an integrated memcached daemon
that can automatically store data and retrieve it from InnoDB tables, turning
the MySQL server into a fast “key-value store” for single-row insert, update,
or delete operations. You can still also access the same tables through SQL
for convenience, complex queries, bulk operations, application compatibility,
and other strengths of traditional database software.

With this NoSQL interface, you use the familiar memcached API to speed up
database operations, letting InnoDB handle memory caching using its buffer
pool mechanism. Data modified through memcached operations such as ADD, SET,
INCR are stored to disk, using the familiar InnoDB mechanisms such as change
buffering, the doublewrite buffer, and crash recovery.

==== 8< ====

this is what is being benchmarked here, not pure memcached.

~~~
NickNameNick
Are these in memory innoDB tables durable?

~~~
jeltz
As far as I can tell this is just a a memcached interface to ordinary InnoDB
tables, so yes. 1000k queries per second also sounds plausible since stock
PostgreSQL can reach almost 400k queries per second with normal SQL queries.

The storage engines of relational databases are this fast (for read-only
queries to datasets which fit in memory).

------
CSDude
1M queries of what? Group by, 50 table joins or simple select count(*)?

~~~
riffraff
this is the memcache compatible interface, so they should be

`select y where key = x limit 1`

~~~
CSDude
So memcache should be faster than 1M queries, since only thing that MySQL does
here to add overhead?

~~~
cpleppert
Not necessarily, InnoDB has optimizations for in-memory tables and there isn't
any write overhead that would make the greater complexity of the
transaction/disk layer come into play.

~~~
wcummings
This benchmark is read-only so write-overhead doesn't matter

------
Mahn
> Read-Only

> Memcached

Well duh.

~~~
jeltz
It is read-only, InnoDB, and memcached. InnoDB is the storage engine used here
for memcached.

------
JohnTHaller
But is it web scale?

~~~
wooptoo
No, everybody knows Mongo is web scale.

~~~
eonil
Everybody also knows Mongo has no transaction. Not a comparable product. Ah
maybe comparable in this case, because this is read-only test.

~~~
MichaelGG
Transactions apply to read-only workloads. Getting a consistent read across
multiple records can be very important.

~~~
eonil
That's only true when we have any write operation. If all dataset are purely
read-only, it's always consistent. And I believe this article is the case.

------
shocks
<sarcasm> Pfft, GenericNoSQLDB has always been fast and totally mature and
production ready, honest. </sarcasm>

------
smoyer
I think I can safely claim that (with the right team) I can scale any database
to match that rate. The real problem with scaling is that until you hit a
limit, you don't generally know where the limit is ... and at the limit, you
need the system to do something reasonable (in this case, I'd say drop read
and/or write requests that couldn't be done consistently). I also wonder if
you couldn't perform the same feat with only memcached (wink wink).

Systems that behave erratically when you hit a limit should be avoided, and I
often wonder if people publish articles like this because they themselves are
amazed it was possible. For a well-behaved system, there's no need for fanfare
as you grow, but when you hit a limit you have to engineer a reasonable
solution.

As an aside, it's pretty amazing (to some who's first computer had 2K of RAM)
how big you can scale an individual server these days. The competition I'd
like to see is when another team accomplishes the same feat but with half the
memory, or half the number of cores or a result set that's twice as large -
and then publishes how they've done it. A little friendly competition might
result in some amazing discoveries.

