
Using MySQL as a NoSQL - A story for exceeding 750,000 qps on a commodity server - davecardwell
http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-as-nosql-story-for.html
======
mike-cardwell
Neat. I didn't know you could do anything like that with MySQL. Does Postgres
have an equivalent? I wouldn't move from an SQL DB to a NoSQL key/value store
without a significant performance improvement. Seems like you can tune MySQL
far more than I realised...

~~~
quassnoi
InnoDB was initially designed as a KVS. MySQL only added the SQL frontend to
it.

Initially, MySQL also supported BerkeleyDB, another KVS, but as for now they
dropped the support.

From InnoDB's point of view, MySQL is just a client, as is HandlerSocket the
author is using.

PostgreSQL's storage engine is much more closely integrated into the system
and has no low-level API easily available from the outside.

~~~
morgo
Yes, InnoDB was standalone. But it had it's own SQL parser and things like
Stored Procedures as well ;) It was rudimentary, but I don't know if KVS is
the best description.

How MySQL and InnoDB talk is very much a K<->V / row based relationship.

------
jtdowney
Interesting that bypassing the SQL part of MySQL would increase performance by
that much. In the comments he says that prepared statements would still be
slower but it would be useful to see some numbers on his hardware to compare
with.

~~~
mike-cardwell
I agree that the figures would be useful. His response in the comments was
reasonable though:

"Prepared Statement is still expensive in MySQL. It avoids SQL parsing, but it
still has to open/lock/unlock/close tables, make query execution plans, etc."

~~~
jamaicahest
The whole point of the article is to measure QPS, so I think it's fair to want
numbers to back up that statement instead of speculating.

------
mhd
So, if you're bypassing most of the bits of an RDBMS, what do you actually
gain? Is the InnoDB storage engine that much superior compared to other
products? Or is it about the management and admin features, e.g. sharding,
replication?

~~~
quassnoi
_So, if you're bypassing most of the bits of an RDBMS, what do you actually
gain?_

Speed.

 _Is the InnoDB storage engine that much superior compared to other products?_

InnoDB is faster than MySQL over InnoDB.

~~~
mhd
Sorry about the awkwardly worded question. What I meant is, what do you
actually get from using MySQL? I can understand that a lot of people aren't
willing to make a switch to some dedicated KVS, but going through all that
effort would seem to require the same amount of time, if not more. So there
has to be something about using MySQL that makes it worth using even in this
stripped down configuration.

~~~
mike-cardwell
I think the idea is that you store your data in a relational model and access
it using SQL. But for things that need to be looked up a lot, and very
quickly, rather than tagging on a KV store in front of MySQL, use the method
he describes instead.

This way you don't end up duplicating the cache, it is shared between the two
different types of lookups. You also don't end up with data inconsistancies.
And yet you get the speed that a dedicated KV store would have got you anyway.

------
robinhouston
There was a little discussion when this article was posted here three weeks
ago: <http://news.ycombinator.com/item?id=1807527>

(OT: under what circumstances are duplicate submissions allowed? There's
nothing in the FAQ or the Guidelines.)

~~~
davecardwell
Interesting… I didn't bother searching for a previous discussion as I was
under the impression that when you submit a URL that was already submitted you
were simply taken to that page and upvoted it.

~~~
51Cards
I think this is what should happen, or at least the duplication should be
noted to the poster before they commit their new entry.

------
tmountain
Pretty cool. It seems this kind of hack would also integrate well into the
Drizzle project as it's designed for light weight pluggable storage. I'm
curious if this has any impact on thread buffers and the number of clients
MySQL can accept. It's great to be able to perform 750k QPS, but if you're
still limited to a few thousand clients, other solutions may be preferable.

------
bluesnowmonkey
Cool project. It's great to see people doing things like this. On the other
hand, I don't know that it would be useful for many people. Note this caveat
near the end.

> No benefit for HDD bound workloads

Which is most workloads.

The author cites the expense of memory as the drawback to memcache. It's true,
a large pool of dedicated memcache machines can be quite expensive. However,
the given technology only works when the database fits in memory. If the
database fits, then a cached subset of it certainly will fit as well. So
memcache will only use a single server. One server is not a major expense
compared to overhauling an application to marry it to a custom MySQL
extension.

Also consider what happens if the database ever does grow beyond memory
capacity. If you're using memcache, you might just need to add more cache
servers. With HandlerSocket you'd have to... switch to memcache. Then add more
cache servers.

~~~
quassnoi
"HDD bound" here means that not all data fits into the cache.

Given the average RAM size on even an entry-level server (some 16 or maybe
even 8 GB) and the average size of a database (same 8GB on
<http://musicbrainz.org/>, the largest online music database), I wouldn't
claim that it's "most workloads".

~~~
icey
If we're comparing anecdotes, the _smallest_ database I deal with on any given
day is 50GB. Enterprise applications trap lots of data. I don't know that most
of our queries are HDD bound, but a significant portion of them are.

~~~
quassnoi
OK, OK. Most of _your_ workloads are HDD bound.

~~~
icey
Isn't that the definition of anecdote?

