That’s harder than it sounds to roll by hand with fopen. SQLite with write ahead logging is pretty much as good as it gets for reliablity, but SQL at all was overkill for the task. LMDB is a close second and it’s memory mapped key-value interface is much simpler. . Would write again.
There is also a fork? who claims is better/more-features than LMDB: https://github.com/leo-yuriev/libmdbx
* BetrFS: An in-kernel file system that uses Bε trees to organize on-disk storage https://news.ycombinator.com/item?id=18202935
Memory model considerations and storage architecture design gets even more interesting now that NVMe has become a thing. For example, in addition to LMDB, how much more interesting have things become for Redis on NVMe?
* Caching Beyond RAM: The Case for NVMe https://news.ycombinator.com/item?id=17315494
* Intel Optane DC Persistent Memory is officially in Google Cloud https://news.ycombinator.com/item?id=1834816
And there are a few new forward-thinking DB architectures emerging on the scene, some that have been in the works for more than 10 years. Look at the work being done by the Berkeley RISELab team and the architecture behind Fluent DB.
* Ground: A Data Context Service (2017) [pdf] (berkeley.edu), https://news.ycombinator.com/item?id=18415456
What might have been conventional wisdom in the realm of DBs years ago will not be the best practices of today. Architectures have changed too much.
And this is not just true for storage, it's true for compute too. The availability of CPU/GPU/TPU accelerators in the data centers is driving a rethink in compute toward parallel algorithms in the form of Vector/Matrix/Tensor multiplication. The best way to store and index these arrays is something to consider too.
At least it doesn't support mmap so removes lmdb.
* If your workload is random-writes heavy, choose lsm
* If your workload is serial-writes heavy, both are similar
* If your workload is read-heavy (random or not) go for lmdb
Also, if your writes are mostly smaller than ~1/2 a page, you can reduce your B+tree pagesize and regain performance.
However LMDB does not solve all problems, and can be a bad choice for some, and I couldn't find this documented anywhere. Specifically write-intensive workload. Why?
- LMDB by default provides full ACID semantics, which means that after every key-value write committed, it needs to sync to disk. Apparently if this happens tens of times per second, your system performance will suffer.
- LMDB provides a super-fast asynchronous mode (`MDB_NOSYNC`), and this is the one most often benchmarked. Writes are super-fast with this. But a little known fact is that you lose all of ACID, meaning that a system crash can cause total loss of the database. Only use `MDB_NOSYNC` if your data is expendable.
In short, I would advise against LMDB if you are expecting to have more than a couple of independent writes per second. In this case, consider choosing a database that syncs to disk only occasionally, offering just ACI semantics (without Durability, which means that a system crash can cause loss of only the last seconds of data).
Last I looked into LMDB, this was only the case if the filesystem doesn't respect write ordering, which depends on the filesystem. Otherwise you get everything but durability (i.e. ACI) If I recall, writes are ordered by default on Ext3.
In particular are there any good resources about the details of using memory mapping?
I know how to implement persistent data structures (and it seems like lmdb is just a persistent b+-tree). But I don't know how to make it persist to disk. Is it as simple as using a memory mapped file for all memory allocations? Can all data structures be turned into a "database" in this way? If your workload fits in memory is there any performance difference between in-memory data structures? When do writes actually flush? What happens if multiple processes use the same file? etc
LMDB talk at DEVOXX (2013) [video] https://youtu.be/Rx1-in-a1Xc
LMDB CMU Databaseology Lecture (2015) [video] https://youtu.be/tEa5sAh-kVk
But I imagine this is somewhat slower than keeping a log (and rewinding it if necessary)?