

How to use MongoDB as a pure in-memory DB - areski
http://edgystuff.tumblr.com/post/49304254688/how-to-use-mongodb-as-a-pure-in-memory-db-redis-style

======
old-gregg
I don't like to sound negative, but it's hard to ignore the obvious. This
technique is absolutely pointless because MongoDB and tmpfs actually _are
doing the same thing_, i.e. nothing. The kernel does all the work. Both of
them are mmapped regions, only MongoDB maps to its own files, while tmpfs maps
to the system swap.

Basically you're wasting RAM and introducing yet another moving part. What
happens then is: on each "write" MongoDB touches a page in RAM which maps to a
block in a tmpfs file which maps back to the same RAM page which maps to a
block in swap.

Why? If you have enough RAM to hold your working set, just use MongoDB
directly with async writes and you won't be blocked by disk I/O. You _will_
get blocked by the global DB lock, but that's another story.

EDIT: edited the sequence of mappings.
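
To make the "it's all just mmapped regions" point concrete, here is a minimal Python sketch of a write going through a memory-mapped file. It only illustrates the general mechanism and is not MongoDB's actual storage code; the helper name `write_through_mapping` is made up for this example, and a Unix-like system is assumed.

```python
import mmap
import os
import tempfile


def write_through_mapping(payload: bytes) -> bytes:
    """Write payload into a memory-mapped temp file and read it back.

    Hypothetical helper for illustration only. The store into `region`
    is a plain memory write; the kernel decides when the dirty page
    reaches the backing file.
    """
    if len(payload) > mmap.PAGESIZE:
        raise ValueError("payload must fit in one page for this sketch")
    fd, path = tempfile.mkstemp()
    try:
        # The file must be at least as large as the region we map.
        os.ftruncate(fd, mmap.PAGESIZE)
        with mmap.mmap(fd, mmap.PAGESIZE) as region:
            region[:len(payload)] = payload  # touches a page in RAM
            region.flush()                   # ask the kernel to write it back
        with open(path, "rb") as f:
            return f.read(len(payload))
    finally:
        os.close(fd)
        os.remove(path)


print(write_through_mapping(b"hello"))
```

If the backing file lives on tmpfs instead of a disk filesystem, the same code runs unchanged, which is the point above: the application does nothing different, only the kernel's backing store changes.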

~~~
bkanber
From the article:

> The reason is that Linux is smart and it does not duplicate the pages
> between tmpfs and its cache

From you:

> MongoDB touches a page in RAM which maps to a block in tmpfs

These two statements sound contradictory to me.

~~~
old-gregg
Not really: I said "maps", which means that it's the same page, there is no
duplication. But the mere presence of tmpfs as yet another moving part adds
overhead and memory consumption.

The OP did mention the security angle, i.e. the database is never written to a
disk, so perhaps there's some convenience there.

~~~
reeses
The security angle is a bit of a canard, though. Most compromises involve
compromising running applications, not OS-level filesystem exploits.

This would improve security and reduce the potential attack surface if this
were used for ephemeral mongodb instances, but only in that it would prevent
poor programming practices from lasting beyond a working session.

------
dinedal
I'm not sure why this technique is MongoDB specific, since you can pretty much
follow all the same steps with _any_ database that only writes to disk, and
create an 'in memory' version.

It should be noted that if you really want a database that exists in RAM
without messing with the filesystem mounts, SQLite supports this out of the
box: <http://www.sqlite.org/inmemorydb.html>
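
As a rough sketch of what that looks like in practice, Python's standard `sqlite3` module accepts the special name `:memory:` described on that page. The table name `kv` and the helper `build_in_memory_db` are made up for this example.

```python
import sqlite3


def build_in_memory_db():
    """Create a throwaway SQLite database that lives only in RAM.

    Hypothetical helper; the `kv` table is an illustrative schema.
    """
    # ":memory:" asks SQLite for a private in-memory database;
    # nothing is written to a file and it vanishes when closed.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
    conn.executemany(
        "INSERT INTO kv (key, value) VALUES (?, ?)",
        [("a", "1"), ("b", "2")],
    )
    conn.commit()
    return conn


conn = build_in_memory_db()
rows = conn.execute("SELECT key, value FROM kv ORDER BY key").fetchall()
print(rows)
```

Each connection to `:memory:` gets its own separate database, so this suits tests and scratch data rather than anything shared between processes.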

------
cheald
I use this technique for my test DB. This is particularly nice since I end up
clearing data between tests, so not having to go to a platter per write ends
up noticeably speeding up my test suite.

------
ajross
I don't really see the value in this. Note that a tmpfs can still be swapped,
so unless you disable swap it's certainly not a "pure in-memory DB". You're
not really changing the performance characteristics here, just the way the
backing store is managed. It really shouldn't be any faster or slower to run
in tmpfs vs. a real filesystem for a typical case. There are edge cases like
what kind of VM pressure can eject a clean page, but really nothing that would
change the architecture. This is really just a different way of tuning the
installation.
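
One way to check the swap caveat on a Linux box is sketched below in Python. It assumes the usual `/proc/mounts` and `/proc/meminfo` text formats; the function names, the `/ramdata` mount point, and the sample text are all made up for illustration, and escaped characters in mount paths are not handled.

```python
def fs_type_for(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path."""
    best_mp, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same point override an earlier one.
            if len(mount_point) >= len(best_mp):
                best_mp, best_type = mount_point, fs_type
    return best_type


def swap_total_kb(meminfo_text):
    """Return the SwapTotal value in kB, or 0 if the line is missing."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    return 0


def is_ram_only(path, mounts_text, meminfo_text):
    """True only if path is on tmpfs AND no swap is configured."""
    on_tmpfs = fs_type_for(path, mounts_text) == "tmpfs"
    return on_tmpfs and swap_total_kb(meminfo_text) == 0


# Made-up sample data in the shape of /proc/mounts and /proc/meminfo.
SAMPLE_MOUNTS = (
    "/dev/sda1 / ext4 rw 0 0\n"
    "tmpfs /ramdata tmpfs rw,size=16000M 0 0\n"
)
SAMPLE_MEMINFO = "MemTotal: 32000000 kB\nSwapTotal: 8388604 kB\n"

print(is_ram_only("/ramdata/db", SAMPLE_MOUNTS, SAMPLE_MEMINFO))
```

On a real machine you would pass in the contents of `/proc/mounts` and `/proc/meminfo` instead of the sample strings. With the sample above the check fails because swap is configured, even though the directory is on tmpfs.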

~~~
cpleppert
I think this idea is kind of pointless on an architectural level anyway. If
you are going to use an in-memory database you might as well use one that is
tuned to that kind of use case. MongoDB organizes its data assuming everything
will be flushed to disk at some point and doesn't make any of the
optimizations available to a main-memory database.

Not to mention the hoops you have to go through to administer this setup.
Mongodb is already essentially tuned for working sets that fit in memory. If
your disk is a bottleneck after turning off data durability (journaling,
fsync, etc.), you are doing something wrong.

Redis at the very least would be far better for this use case.

------
reeses
The claim about PCI compliance is false. The requirements for DSS refer to
"data at rest", which this absolutely is. You might squeak past your internal
auditor or a really busy/lazy/cheap external auditor, but it's not going to
prevent you from having to accept all liability due to a data breach. It's not
as if PCI DSS is trying to prevent someone from stealing your physical media.

Please never use this as an excuse to store credit card numbers or track two
data in the clear.

------
izendejas
I'm not the biggest mongodb fan, but if you're stuck with it for whatever
reason (too invested to change code anytime soon, say), then for mostly read-
only collections this doesn't seem like a bad idea, actually.

Why? Mongodb does offer secondary indices among other useful features.

That being said, I'm keeping an eye on HyperDex (<http://hyperdex.org>).

------
nasalgoat
I've considered doing similar experiments, but even with this hack, MongoDB is
still orders of magnitude slower than Redis.

I actually don't understand why it's so slow - running it on a server with
256GB of RAM, writes are still disk-bound. Why aren't the disk writes running
in a subtask? Yes, I'm running with write confirm off.

Anyway, if you need the speed of Redis, use Redis.

~~~
dualogy
Have you posted this as an issue report to 10gen somewhere? I'd be
interested in a link so I can see how they explain, resolve or otherwise
follow up on it.

~~~
nasalgoat
10gen and my team are on a first-name basis at this point. Their most common
response is "we've never seen that before" whenever I report any problems.

------
mattzito
So, this solution has been around forever - heck, even Oracle had a
solution where you could use tmpfs space as extra buffer cache.

However, the performance was not great, as it had to go through the VFS layer,
everything was in blocks and emulated a filesystem.

------
est
Or consider using this?

<https://github.com/Softmotions/ejdb>

