
How MySQL memory table saved the day - webstartupper
http://www.domcop.com/blog/how-mysql-memory-table-saved-the-day/?rid=hn
======
jumby
8 million records (at 7GB!) being slow means there is something seriously
wrong with your schema. That table would fit entirely in the InnoDB buffer
pool on any modern hardware.

I want to see your slow query log.

~~~
webstartupper
The domains table currently has 84 fields. We collect metrics from various
sources, so reducing the number of fields is not feasible. All the field types
are the smallest we could use - e.g. tinyint(4) instead of an int, etc.

Since there are so many fields with data from multiple sources, we have
queries searching on individual fields. Because of this we need many
indexes - 4GB of the 8GB is the index alone.

~~~
gngeal
_The domains table currently has 84 fields._

Are you sure you've read up on your C. J. Date? I've seen this once before:
someone complaining that "queries take too much time" with a paltry single-
digit-GB database. When I asked about the specifics, the only recurring reply
was "we can't tell you". You don't mention anything of value, but querying a
few million records can't possibly take a few minutes on the aging desktop
computer I bought seven years ago, much less on a modern server.

~~~
webstartupper
I presume the reason it was slow was that the domains table was write-heavy.
There are multiple crons running in the background selecting data from the
domains table, accessing external APIs and updating individual records.

Selects per hour: 37K. Updates per hour: 170K.

While the speed of selects or updates by the background crons was and is not
important, the speed of selects run by the users on the same table was
important. The easy solution was to cache the data so that the users could
search domains at a good speed. The memory table just worked brilliantly as a
cache.

(I'm no MySQL guru and it's my first project where MyISAM did not work for me,
so I know I could definitely do a better job of optimizing the InnoDB table
and InnoDB settings in my.cnf)

~~~
jumby
You claim you're not a MySQL guru, yet you're shooting down the majority
opinion here that something major is wrong with the schema design. That's
fine - use your in-memory table and then claim you need a NoSQL solution when
your "big data" hits 10GB.

How's the disk I/O on that Linode VPS?

~~~
webstartupper
My apologies if I came across as if I was shooting down the opinions here.

------
jonaldomo
Just a heads up: [http://dev.mysql.com/doc/refman/5.1/en/memory-storage-
engine...](http://dev.mysql.com/doc/refman/5.1/en/memory-storage-engine.html)

The MEMORY storage engine (formerly known as HEAP) creates special-purpose
tables with contents that are stored in memory. Because the data is vulnerable
to crashes, hardware issues, or power outages, only use these tables as
temporary work areas or read-only caches for data pulled from other tables.
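
A minimal sketch of that cache pattern (table and column names here are
invented for illustration):

```sql
-- Hypothetical read-only cache using the MEMORY engine.
-- Contents vanish on restart, so treat the table as disposable.
CREATE TABLE domains_cache (
    domain VARCHAR(255) NOT NULL,
    da     TINYINT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (domain),
    KEY idx_da (da)
) ENGINE=MEMORY;

-- Rebuild from the durable InnoDB table whenever needed.
INSERT INTO domains_cache SELECT domain, da FROM domains;
```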

------
Tomdarkness
Or you could actually use something designed for indexing data and searches,
like Elasticsearch or Solr.

Either solution would have no problem indexing all their data, rather than
having to limit it to a subset to fit in an in-memory table.

~~~
herge
Do Elasticsearch or Solr work with tabular data? Can you search across
multiple columns?

~~~
arethuza
Lucene models documents as a collection of fields, each with a textual value.

[http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/d...](http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/document/Document.html)

At search time you can use the default field or specify the fields to be
searched:

[http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#F...](http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Fields)

------
elbac
A better solution is just to increase your InnoDB buffer pool size; you will
get virtually the same performance as the 'memory' table once all the data is
in memory. Plus, all the data will be persisted.

This is an old but still very useful script that helps suggest which settings
to tweak: [https://github.com/major/MySQLTuner-
perl](https://github.com/major/MySQLTuner-perl)

[http://dev.mysql.com/doc/refman/5.5/en/innodb-buffer-
pool.ht...](http://dev.mysql.com/doc/refman/5.5/en/innodb-buffer-pool.html)
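
For reference, the relevant knob is `innodb_buffer_pool_size` in my.cnf (the
value below is only an illustration - size it to your data and available RAM):

```ini
[mysqld]
# Rule of thumb: 50-80% of RAM on a dedicated DB server.
innodb_buffer_pool_size = 1G
```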

~~~
webstartupper
Thanks for the link. Unfortunately, since there are 30 different columns that
can be searched on, there are many indexes, and the index size for the domains
table alone is 4GB. I run this off a 2GB Linode, so unless I add a lot more
RAM, InnoDB is not going to match the memory table's speed.

~~~
jeffdavis
I'm a little confused... the InnoDB table and memory table had the same data,
but the InnoDB table was larger (at least twice as large, I presume)?

~~~
webstartupper
The InnoDB table right now has about 9 million records and is 8GB in size (4GB
of which is index). The memory table has a subset of the same data - 1.2
million records - and is 276MB in size.

~~~
jeffdavis
It makes me curious what an apples-to-apples comparison would look like. What
if you put the same subset in a separate InnoDB table and tuned the memory
settings so it's likely to stay resident?

~~~
webstartupper
We tried this before switching to the memory table. I can't remember the
exact details, but the resulting InnoDB table was still too large - maybe
around 1GB - and we did not want to add additional RAM.

Another issue with using a second InnoDB table would be the load time. Every 3
hours we empty the memory table and load the subset from the InnoDB table (we
actually use a MyISAM table in the middle... but that's not important).
Loading the table takes less than 10 seconds. If we had used an InnoDB table
instead of a memory table, this load time would be significant.
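
A reload along those lines might look like this (table names, columns, and the
filter are hypothetical):

```sql
-- Periodic refresh of the in-memory cache from the durable table.
TRUNCATE TABLE domains_cache;

INSERT INTO domains_cache
SELECT domain, da, pa     -- only the columns users actually search
FROM domains
WHERE da >= 10;           -- only the subset of rows worth caching
```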

------
willvarfar
I am always cautious of memory tables. They don't support transactions, for
example, and don't work well multi-user.

Really, the first stop is to try the TokuDB backend in MySQL. If it's still
slow, and you have a small subset that fits in RAM, just put that straight
into a hash table in app space.

~~~
mtdewcmu
As I understood it, the memory tables are read-only. They're like a cache. So
transactions aren't needed.

~~~
willvarfar
Except for that awkward time when the tables are refreshed?

~~~
taf2
Could use two memory DBs and rotate them
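
MySQL's RENAME TABLE swaps several tables in one atomic operation, so a
rotation could be sketched like this (table names invented here):

```sql
-- Load the standby copy while readers keep hitting domains_cache.
TRUNCATE TABLE domains_cache_standby;
INSERT INTO domains_cache_standby SELECT * FROM domains;

-- Atomic swap: readers never see an empty table.
RENAME TABLE domains_cache         TO domains_cache_old,
             domains_cache_standby TO domains_cache,
             domains_cache_old     TO domains_cache_standby;
```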

------
saintfiends
We did something similar at work. We had to poll for changes in a table, so
instead of polling the tables directly we added triggers that insert events
into a MEMORY table and polled that instead. It performs well enough for us.
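
That trigger-plus-queue pattern might be sketched like this (the watched table
and all names are hypothetical):

```sql
-- MEMORY table acting as a lightweight change queue.
CREATE TABLE change_events (
    id     INT UNSIGNED NOT NULL AUTO_INCREMENT,
    row_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (id)
) ENGINE=MEMORY;

DELIMITER //
CREATE TRIGGER watched_after_update
AFTER UPDATE ON watched_table
FOR EACH ROW
BEGIN
    -- Record which row changed; the poller drains this queue
    -- instead of scanning watched_table for changes.
    INSERT INTO change_events (row_id) VALUES (NEW.id);
END//
DELIMITER ;
```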

------
Zr40
> Varchars take up the space for all the chars defined.

This only applies to memory tables. For non-memory tables, the size of varchar
columns depends on the actual string size.

(edited. Thanks for correcting!)

~~~
webstartupper
"MEMORY tables use a fixed-length row-storage format. Variable-length types
such as VARCHAR are stored using a fixed length."

From - [http://dev.mysql.com/doc/refman/5.0/en/memory-storage-
engine...](http://dev.mysql.com/doc/refman/5.0/en/memory-storage-engine.html)

I had no idea about this either....

------
ww520
What do some typical queries look like? Several minutes to search 7M records
doesn't sound right. Are the columns indexed properly?

~~~
mtdewcmu
I'm thinking this could be a fairly easy problem that the RDBMS may be making
harder. This is probably a tricky indexing situation, so a lot of queries
might be effectively unindexed. I'd like to see what the performance would be
using really simple methods, like dumping it to a TSV file and doing regex
searches with grep or awk. It might be surprisingly fast.
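
As a rough illustration of the dump-and-grep idea (file name, columns, and
values invented here):

```shell
# Build a tiny TSV (domain <tab> score) standing in for a dumped table.
printf 'example.com\t42\nfoo.org\t7\nbar.net\t42\n' > /tmp/domains.tsv

# grep: a raw regex scan over every line.
grep 'example' /tmp/domains.tsv

# awk: a proper per-column comparison - print domains whose score is 42.
awk -F'\t' '$2 == 42 { print $1 }' /tmp/domains.tsv
```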

------
iamthephpguy
Haha. The Jon Snow meme got me in splits.

~~~
Flimm
You're getting downvoted because this is not Reddit. Not that we dislike
memes - we just like them on Reddit, and off HN.

