
LMDB – Lightning Memory-Mapped Database Manager - hoov
http://www.lmdb.tech/doc/
======
corysama
I’ve used LMDB as a simpler alternative to SQLite as “an alternative to
fopen”. The goal was simply robust file writes in the face of unpredictable
server reboots, for a tiny Python program writing data to be processed later
by a tiny C++ program.

That’s harder than it sounds to roll by hand with fopen. SQLite with write-
ahead logging is pretty much as good as it gets for reliability, but SQL was
overkill for the task. LMDB is a close second, and its memory-mapped key-
value interface is much simpler. Would write again.

[https://lwn.net/Articles/457667/](https://lwn.net/Articles/457667/)
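For anyone curious why "robust writes by hand with fopen" is harder than it sounds: the usual hand-rolled approach is the temp-file + fsync + rename dance. A minimal Python sketch (the helper name `atomic_write` is mine, not from any library):

```python
import os
import tempfile

def atomic_write(path, data):
    """Write data so a crash mid-write never leaves a torn file:
    write to a temp file, fsync it, then atomically rename into place."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # data must be durable *before* the rename
        os.replace(tmp, path)      # atomic replacement on POSIX
        # fsync the directory so the rename itself survives a crash
        dfd = os.open(dirname, os.O_RDONLY)
        try:
            os.fsync(dfd)
        finally:
            os.close(dfd)
    except BaseException:
        try:
            os.unlink(tmp)
        except OSError:
            pass
        raise
```

Even this sketch skips details (directory fsync portability, partial-rename filesystems), which is exactly why reaching for LMDB or SQLite instead is attractive.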

~~~
isaachier
LMDB is a storage engine whereas SQLite is a small database. There is even a
version of SQLite that used LMDB as the underlying storage engine:
[https://github.com/LMDB/sqlightning](https://github.com/LMDB/sqlightning).

------
ddorian43
I like LMDB, but why do ~most SQL/NoSQL databases use LSM/RocksDB instead, at
least the ones going for read speed? Is it because of the missing WAL?

There is also a fork that claims to be better and more featureful than LMDB:
[https://github.com/leo-yuriev/libmdbx](https://github.com/leo-yuriev/libmdbx)

~~~
espeed
To understand the tradeoffs between LSMs, B-Trees, and Fractal Trees, see the
references in this previous post on TokuDB and Bε trees...

* BetrFS: An in-kernel file system that uses Bε trees to organize on-disk storage [https://news.ycombinator.com/item?id=18202935](https://news.ycombinator.com/item?id=18202935)

Memory-model considerations and storage architecture design get even more
interesting now that NVMe has become a thing. For example, in addition to
LMDB, how much more interesting have things become for Redis on NVMe?

* Caching Beyond RAM: The Case for NVMe [https://news.ycombinator.com/item?id=17315494](https://news.ycombinator.com/item?id=17315494)

* Intel Optane DC Persistent Memory is officially in Google Cloud [https://news.ycombinator.com/item?id=1834816](https://news.ycombinator.com/item?id=1834816)

And there are a few new forward-thinking DB architectures emerging on the
scene, some that have been in the works for more than 10 years. Look at the
work being done by the Berkeley RISELab team and the architecture behind
Fluent DB.

* Ground: A Data Context Service (2017) [pdf] (berkeley.edu), [https://news.ycombinator.com/item?id=18415456](https://news.ycombinator.com/item?id=18415456)

What might have been conventional wisdom in the realm of DBs years ago will
not be the best practices of today. Architectures have changed too much.

And this is not just true for storage, it's true for compute too. The
availability of CPU/GPU/TPU accelerators in the data centers is driving a
rethink in compute toward parallel algorithms in the form of
Vector/Matrix/Tensor multiplication. The best way to store and index these
arrays is something to consider too.

~~~
ddorian43
Can these be used with the async filesystem access (AIO) that the Seastar
framework does? (I don't think so for now.)

At least it doesn't support mmap, which rules out LMDB.

------
jimis
LMDB is a very good choice for many well-known reasons. I don't need to expand
here, the advantages are well documented, and more and more projects are
choosing LMDB.

However, LMDB does not solve all problems, and it can be a bad choice for
some, which I couldn't find documented anywhere. Specifically: write-intensive
workloads. Why?

\- LMDB by default provides full ACID semantics, which means that after every
committed key-value write it needs to sync to disk. If this happens tens of
times per second, your system performance will suffer.

\- LMDB provides a super-fast asynchronous mode (`MDB_NOSYNC`), and this is
the one most often benchmarked; writes are very fast with it. But a little-
known fact is that you lose all of ACID, meaning that a system crash can cause
total loss of the database. Only use `MDB_NOSYNC` if your data is expendable.

In short, I would advise against LMDB if you expect more than a couple of
independent writes per second. In that case, consider a database that syncs to
disk only occasionally, offering just ACI semantics (without Durability: a
system crash can lose only the last few seconds of data).
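The per-commit sync cost is easy to see without LMDB at all. A stdlib-only sketch (plain file appends, not LMDB's B+-tree writes, so only the fsync overhead is representative) timing N small writes with and without an fsync per write:

```python
import os
import tempfile
import time

def timed_appends(n, fsync_each):
    """Append n small records to a fresh file, optionally fsyncing after
    every write -- the durability that LMDB's default sync mode pays for."""
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for i in range(n):
            os.write(fd, b"key=%d\n" % i)
            if fsync_each:
                os.fsync(fd)   # force this write to stable storage now
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)

if __name__ == "__main__":
    n = 200
    print("fsync each write:", timed_appends(n, True))
    print("no fsync:        ", timed_appends(n, False))
```

On a rotating disk the fsync-per-write run is bounded by IOPS (roughly ~120/s), which is the point being made; on NVMe the gap shrinks dramatically, which is the reply below.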

~~~
hyc_symas
Your advice made sense in the age of rotating platter HDDs, limited to a max
of ~120 IOPS. Today's world of NVMe SSDs makes your considerations obsolete.

~~~
JulianMorrison
There's an even older technology, battery backed RAM cached HDDs, that gives
you everything an SSD can, except the thing you aren't actually using here,
fast random-access read performance.

------
zawerf
Is there a design doc or talk about the internals?

In particular are there any good resources about the details of using memory
mapping?

I know how to implement persistent data structures (and it seems like LMDB is
just a persistent B+-tree), but I don't know how to make one persist to disk.
Is it as simple as using a memory-mapped file for all memory allocations? Can
all data structures be turned into a "database" this way? If your workload
fits in memory, is there any performance difference versus in-memory data
structures? When do writes actually flush? What happens if multiple processes
use the same file? Etc.
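On the "is it as simple as a memory-mapped file" question: the core mechanics can be sketched with Python's stdlib `mmap`. This is a toy fixed-slot layout of my own invention, not LMDB's actual format, but it shows the essentials: the mapped file *is* the data structure, `flush()` (msync) is when changes become durable, and any process mapping the same file sees the same pages:

```python
import mmap
import os
import struct

REC = struct.Struct("<q")   # one 8-byte little-endian int per slot
NSLOTS = 16                 # toy store: a fixed array of slots

def open_store(path):
    """Create/map a fixed-size file; the OS pages it like ordinary memory."""
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(b"\x00" * (REC.size * NSLOTS))
    f = open(path, "r+b")
    return f, mmap.mmap(f.fileno(), 0)

def put(m, slot, value):
    REC.pack_into(m, slot * REC.size, value)  # write straight into the map
    m.flush()                                 # msync: now it's on disk

def get(m, slot):
    return REC.unpack_from(m, slot * REC.size)[0]
```

Reopening the file after a close shows the persistence; multiple processes mapping the same file share the page cache coherently but need their own locking. LMDB layers a copy-on-write B+-tree, MVCC, and reader/writer coordination on top of exactly this mechanism.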

~~~
espeed
See these two talks by 'hyc...

LMDB talk at DEVOXX (2013) [video]
[https://youtu.be/Rx1-in-a1Xc](https://youtu.be/Rx1-in-a1Xc)

LMDB CMU Databaseology Lecture (2015) [video]
[https://youtu.be/tEa5sAh-kVk](https://youtu.be/tEa5sAh-kVk)

------
amelius
> Data pages use a copy-on-write strategy so no active data pages are ever
> overwritten, which also provides resistance to corruption and eliminates the
> need of any special recovery procedures after a system crash.

But I imagine this is somewhat slower than keeping a log (and rewinding it if
necessary)?

~~~
hyc_symas
Don't imagine. Test and verify.

[http://lmdb.tech/bench/inmem/](http://lmdb.tech/bench/inmem/)

[http://lmdb.tech/bench/ondisk/](http://lmdb.tech/bench/ondisk/)

------
trasz
The web page seems to suggest that robust POSIX semaphores are Linux-specific,
while they've been available in FreeBSD for quite some time. I wonder whether
they detect it properly, or whether there's some actual problem in FreeBSD's
implementation?

