Ask HN: Which persistent in-memory database to use?
28 points by paraschopra on May 26, 2009 | 24 comments
My particular task involves a lot of reads/writes to the database (>100 per second, perhaps >1000 per second). I haven't tested it, but I doubt the db I am using (MySQL with MyISAM as the storage engine) would scale to that level without a lot of additional resources (slaves, memory, faster disks).

So I was considering replacing it with a (distributed) memory-based key-value store or a normal RDBMS. The problem is that I want data to be persistent and to be able to dump the least-used data to disk periodically. Ideally it would be like memcache, except instead of deleting expired key-value pairs, it would dump them to a persistent database.

I have explored Memcache, Memcachedb, Redis, Tokyo Cabinet, in-memory SQLite and some other solutions, but none is close to what I have in mind. Can anybody suggest a solution? Perhaps tweaking the settings of an existing solution would do the job?

PS: I am using PHP as the language, but practically any language would do if I get what I am looking for.




You threw me off by asking about a "persistent in-memory" database... kinda like an "invisible multi-color" car...

I get what you are saying; it seems like you want a write-through cache. You're in the right ballpark for a solution, but you may have to write a bit of application code layered on top of those parts to do what you want.
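
A rough sketch of what that layer could look like in PHP, assuming the stock Memcache extension and mysqli (the table name, credentials, and TTL here are all made up for illustration):

    <?php
    // Write-through cache sketch: reads try memcache first and fall back
    // to MySQL on a miss; writes persist to MySQL first, then update the
    // cache, so the cache never serves stale data.
    $cache = new Memcache();
    $cache->connect('127.0.0.1', 11211);
    $db = new mysqli('127.0.0.1', 'user', 'pass', 'mydb');

    function cached_get($cache, $db, $key) {
        $value = $cache->get($key);
        if ($value !== false) {
            return $value; // cache hit
        }
        // Cache miss: read from MySQL and repopulate the cache.
        $stmt = $db->prepare('SELECT v FROM kv WHERE k = ?');
        $stmt->bind_param('s', $key);
        $stmt->execute();
        $stmt->bind_result($value);
        $found = $stmt->fetch();
        $stmt->close();
        if ($found) {
            $cache->set($key, $value, 0, 3600); // re-cache for an hour
            return $value;
        }
        return null;
    }

    function cached_set($cache, $db, $key, $value) {
        // Persist first, then write through to the cache.
        $stmt = $db->prepare('INSERT INTO kv (k, v) VALUES (?, ?)
                              ON DUPLICATE KEY UPDATE v = VALUES(v)');
        $stmt->bind_param('ss', $key, $value);
        $stmt->execute();
        $stmt->close();
        $cache->set($key, $value, 0, 3600);
    }

One caveat with this sketch: Memcache::get returns false on a miss, so don't store boolean false as a value.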

It would probably help to break up the read/write estimates. Also, are those writes mostly inserts or updates? There are different types of solutions depending on what you expect your profile to look like.

Also, are your reads just a key lookup, or are you doing some sort of aggregation/searching across all records?

It seems like you may be prematurely optimizing your solution and underestimating the capabilities of database engines. Most of them can probably handle the load you are thinking of with correct usage, sufficient tuning, and adequate hardware. For example, I believe Twitter uses MySQL:

http://highscalability.com/scaling-twitter-making-twitter-10...


> It would probably help to break up the read/write estimates. Also, are those writes mostly inserts or updates? There are different types of solutions depending on what you expect your profile to look like.

> Also, are your reads just a key lookup, or are you doing some sort of aggregation/searching across all records?

Please also estimate your overall size demands. Manipulating 2GB of data is different than 200GB. :)


Hey, thanks for the pointers.

I estimate reads to be 10x more frequent than writes, and reads would mostly be a key lookup, with occasional aggregation.

Yes, I was also wondering if anybody has demonstrated scalability with MySQL at this level, as then I won't have to fiddle around with the code too much.


That sounds rather normal (reads 10x writes, at 100s/1000s per second). As far as I understand, MySQL can definitely scale to that. It does take considerable effort to configure and tune it (but any database solution will perform horribly out of the box). Twitter uses MySQL. I doubt you will hit their scale (though perhaps you are hoping to ;).

I think you should read up on using Memcached with MySQL.

Here's a good starting point: http://highscalability.com/bunch-great-strategies-using-memc...

At this point in the game (it sounds like you're early in your development process), you aren't painting yourself into a corner (the only thing you should be worried about here, IMHO) if you go with these two, but you will probably have to refine things later on to get your scalability (again, that's true of any solution you pick).

Also realize that Twitter probably has a larger hardware budget than you do, and real scalability costs real money in both hardware and effort. It's been widely reported that many of their engineers have spent considerable effort keeping things stable and improving scalability. Another performance-related mantra is "to go faster, do less". This is definitely true here... there may be things you think you need to store, but don't. Removing that effort could have a profound impact on performance.

I suppose it's worth mentioning that you may want to check out AWS and their storage solutions (and push the performance problems off to them). This decision also involves other business factors (like whether you can deal with potential outages, potential security problems, cost, etc.).


> Though I haven't tested...

Let's stop there. "I haven't tested" translates to "premature optimization". Stop whatever you're doing and measure your task.
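
Measuring the write path against your existing table is only a few lines of PHP; a crude sketch (table name and credentials are made up):

    <?php
    // Crude throughput check: how many single-row upserts per second
    // does the current MySQL/MyISAM setup actually sustain?
    $db = new mysqli('127.0.0.1', 'user', 'pass', 'mydb');
    $stmt = $db->prepare('INSERT INTO kv (k, v) VALUES (?, ?)
                          ON DUPLICATE KEY UPDATE v = VALUES(v)');

    $n = 10000;
    $start = microtime(true);
    for ($i = 0; $i < $n; $i++) {
        $k = "bench:$i";
        $v = str_repeat('x', 100); // 100-byte dummy payload
        $stmt->bind_param('ss', $k, $v);
        $stmt->execute();
    }
    $elapsed = microtime(true) - $start;
    printf("%d writes in %.2fs = %.0f writes/sec\n", $n, $elapsed, $n / $elapsed);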

> I have explored ... but none is close to what I have in mind.

It's not surprising that you can't find a perfect solution to your imaginary problem. Find out exactly what your issue is first, then optimize. And don't expect to perfectly engineer a system from the ground up. It's OK to layer in new systems to optimize real bottlenecks in your application.


The only thing that really matters here is the write load and the distribution of your reads. The read load can easily be handled by a memcache-like caching layer unless it is completely random and you don't have enough RAM. The write load will determine how fast keys get expired out of the cache layer (plus the associated read load this creates) and what sort of I/O throughput your persistence engine will require.

To be completely honest, if memcache, memcachedb, redis, and tokyo cabinet are not getting the job done then you are probably doing things very, very wrong...


>To be completely honest, if memcache, memcachedb, redis, and tokyo cabinet are not getting the job done then you are probably doing things very, very wrong...

I am not saying that they are not getting the job done. They would get the job done, but with significant code changes. I was just exploring whether there was a better alternative. If I don't find one, I would most probably resort to memcache.


Check out BerkeleyDB, either in its original incarnation or the 'Java Edition'. On one machine, it would seem to fit the bill: it uses a RAM cache as much as possible, persists to disk as necessary, and has various tweakable options to either minimize I/O and seeks or maximize consistency and safety.
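
If you're staying in PHP, the dba extension can talk to BerkeleyDB directly; a minimal sketch, assuming the db4 handler is compiled in (path and keys are illustrative):

    <?php
    // Minimal BerkeleyDB usage through PHP's dba extension.
    $db = dba_open('/var/data/store.db', 'c', 'db4'); // 'c' = create if missing
    if ($db === false) {
        die("could not open database\n");
    }
    dba_replace('user:42', serialize(array('name' => 'alice')), $db); // upsert
    $raw = dba_fetch('user:42', $db);
    if ($raw !== false) {
        $user = unserialize($raw);
    }
    dba_close($db);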


Redis 0.100 was released 2 hours ago. You may want to take a look (http://code.google.com/p/redis). If the new features we implemented are still not enough, we would love to know what things you don't like; this is very valuable for us! Thanks


Why don't you like Redis? It's doing exactly what [I think] you want: an in-memory cache with periodic background saving.


If I were to guess, I would assume that the indeterminate nature of the durability is the killer.


You can select how frequently to save based both on time and on the number of updates, but if that's not enough, this is where Redis replication comes into play. Just sync another server and you are sure the dataset is durable.

Note that Redis 0.100 adds non-blocking replication, so you can connect a slave at runtime while the master continues to happily serve queries. We also have a SLAVEOF command now that makes it possible to control replication at runtime: to make a running server a replica of some other server, to stop replication and turn a slave into a master, and so on.

Redis replication is trivial to set up.
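
For reference, the relevant redis.conf lines look like this (values are illustrative):

    # Snapshot if at least 1 key changed in 900s, or 10000 keys in 60s
    save 900 1
    save 60 10000

    # Make this instance a replica of a running master; the same thing
    # can be done at runtime with "SLAVEOF host port", and "SLAVEOF NO ONE"
    # turns a slave back into a master.
    slaveof 192.168.0.10 6379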


From the redis FAQ:

> You may try to load a dataset larger than your memory in Redis and see what happens

This is what threw me off when I evaluated redis.

When choosing a database, I really don't want to "try and see what happens". I want defined and documented behaviour, please.


When your program is using more RAM than is available, it all depends on the memory usage pattern. If your dataset is 4x your RAM, but you happen to use only the latest 10% of keys inserted, then most pages will be swapped to disk and rarely touched, and it will work; otherwise it won't.

Anyway, this is an edge case; even if it works, it's not a good idea to have datasets bigger than available RAM.

Before 1.0 stable we are introducing a 'maxmemory' config option. If the dataset starts to get bigger than 'maxmemory', Redis will try to free RAM by removing keys with a timeout set (starting from the older ones), cached objects in free lists, and so on. If it is still out of memory, it will reply only to read-only operations and will issue a "-ERR using more usage bigger than maxmemory parameter" error if you try to write more data.
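
In config terms that would be something like the following (illustrative value; as noted the option is only landing before 1.0, so details may change):

    # Cap the dataset at 2GB (value in bytes); past this, Redis frees keys
    # with a timeout set (older first), then rejects writes with an error.
    maxmemory 2147483648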


And what happens when my dataset is bigger than RAM + swap?


You should use maxmemory to avoid this, but before reaching that condition Redis will start to be very slow, of course; it's hard to reach this condition without noticing it.

Btw, the way to go is maxmemory, which will be in the next tar.gz and in Git within days. maxmemory also allows you to use Redis with memcached semantics, where volatile keys expire to make room for new ones when memory is low.


With these constraints I'd say Redis is more a persistent cache than a database, don't you think?


Have you looked at Project Voldemort? http://project-voldemort.com/

Voldemort combines in-memory caching with the storage system so that a separate caching tier is not required (instead, the storage system itself is just fast).


Really consider your case. Stuff like web user sessions is easy to put into a key-value storage (memcache, or Tokyo Cabinet for persistence). If you need more, look into Tokyo Cabinet's table type. The master-to-master replication looks like a win.
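
For instance, with the pecl memcache extension (which ships a session save handler), moving PHP sessions into memcache is just a php.ini change (host/port illustrative):

    ; Store PHP sessions in memcache instead of local files
    session.save_handler = memcache
    session.save_path = "tcp://127.0.0.1:11211"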


How many reads and writes? MySQL is adequate if it's 1 write and 1000 reads.

Or you could look at Oracle Coherence/TimesTen or [kx] http://kx.com/.


A paper entitled "Memory-Centric Data Management" by Curt A. Monash is worth reading:

http://www.monash.com/MCDM.pdf


eXtremeDB, the in-memory database system (IMDS) from McObject, may be a fit. Tagged, the social media site, chose the 64-bit edition of eXtremeDB for what sounds like a similar purpose, and others use it in that way as well. See:

http://www.mcobject.com/November10/2008


Another avenue to consider is a column-oriented database like Metakit, MonetDB, or FastBit.


What about Kickfire? http://www.kickfire.com/



