Ask HN: What is the SQLite of nosql databases?
58 points by rhim 47 days ago | 48 comments
I am looking for a simple, file-based nosql database. So basically the sqlite of nosql databases.

As others have mentioned you have lots of options: LMDB, LevelDB/RocksDB, BerkeleyDB. For what it's worth, I spent a long time looking for an embedded key-value store for my current native project since I didn't need the full complexity of SQL. In the end I chose... SQLite.

All of these embedded NoSQL databases seem to be missing critical features. One such feature for my use case is database compaction. Last I checked, an LMDB database file can never shrink. Full compaction of LevelDB is slow and complicated (as I understand it, it essentially defeats the leveled structure that is the whole point of the design). SQLite, meanwhile, supports fast incremental vacuum, which can be triggered manually or automatically.
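For reference, a minimal sketch of what that looks like from Python's built-in sqlite3 module (file and table names hypothetical):

```python
import sqlite3

conn = sqlite3.connect("app.db")  # hypothetical file name

# auto_vacuum must be set before the first table is created,
# or be followed by a full VACUUM to rebuild the file.
conn.execute("PRAGMA auto_vacuum = INCREMENTAL")
conn.execute("VACUUM")

conn.execute("CREATE TABLE IF NOT EXISTS kv (k BLOB PRIMARY KEY, v BLOB)")
conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)",
             (b"key", b"x" * 100_000))
conn.execute("DELETE FROM kv")
conn.commit()

# Reclaim up to 100 freelist pages, truncating the file,
# without rewriting the whole database.
conn.execute("PRAGMA incremental_vacuum(100)")
conn.commit()
```

The incremental_vacuum pragma can also run automatically if you leave auto_vacuum in FULL mode instead.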

SQLite just has everything. Plus the reliability is unmatched. Even if you just need a single table that maps blob keys to blob values, I would still recommend SQLite over any NoSQL database today.

Also, there are high-quality libraries for every language.

The SQLite of NoSQL is still SQLite: https://www.sqlite.org/json1.html
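For instance, a quick sketch using Python's built-in sqlite3 (the JSON functions are compiled into most modern SQLite builds, and are built in unconditionally since 3.38):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (body TEXT)")  # JSON stored as ordinary text
conn.execute("INSERT INTO docs VALUES (?)",
             ('{"name": "alice", "tags": ["admin", "dev"]}',))

# Reach into the document with json_extract - no schema required.
name = conn.execute(
    "SELECT json_extract(body, '$.name') FROM docs").fetchone()[0]
print(name)  # alice
```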


Linux has VFS cache which is very robust and efficient.

Remember that it is typical for programs to read /etc/nsswitch.conf, /etc/resolv.conf and tons of others at startup time - the filesystem is the data source in the Unix tradition, so the machinery is very well optimized.

The problem with this is that if your records/documents are small, you're wasting huge amounts of space because each file uses a full filesystem block. If you have, say, ten thousand records where each is 200 bytes, a decent database would store that in a bit over 2MB. Storing these as individual files on a filesystem with 4kB blocks will take up at least 40MB. This is a huge amount of wasted space, not to mention slow. (Some filesystems do support tail packing but that won't fully solve the problem.)
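The arithmetic is easy to check:

```python
records = 10_000
record_size = 200      # bytes per record
block_size = 4096      # typical filesystem block size

packed = records * record_size        # stored contiguously by a database
one_file_each = records * block_size  # one full block per tiny file

print(packed)         # 2000000 bytes - about 2 MB
print(one_file_each)  # 40960000 bytes - roughly 40 MB
```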

Not to mention all the other problems with this. The filesystem has a complete lack of higher-level features: no transactions, no snapshots, no indexing beyond filenames, no easy robustness guarantees (doing fsync() properly is a lot more complicated than it appears.) Honestly for modern apps the filesystem is just terrible at storing any internal mutable app data.

Once you start writing code to store auxiliary indices, synchronize writes, or pack multiple records per file, well at that point you're just implementing your own database. This might make sense if, say, you have a special way of compressing your data (like git). But generally you're better off using a real embedded database.

Some filesystems can store small data directly in the inode. The limit for ext4 is 160 bytes, though.



btrfs has this too. It'd be cool if dirent_t could expose this so one could quickly iterate through these inline data.


For simple stuff I've had success keeping an in-memory data structure as the source of truth, and just persisting the whole thing to a file (JSON or otherwise) via a debounced function. Assuming you only have one process (or at least one main process), you only have to read the file on startup and can be pretty relaxed about your write strategy.
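A minimal sketch of that pattern (class and file names hypothetical; it uses a threading.Timer for the debounce and an atomic rename so a crash mid-write can't corrupt the file):

```python
import json, os, tempfile, threading

class DebouncedStore:
    """In-memory dict as the source of truth, flushed to a JSON file
    at most once per `delay` seconds after the last write."""

    def __init__(self, path, delay=1.0):
        self.path, self.delay = path, delay
        self._timer = None
        self._lock = threading.Lock()
        try:
            with open(path) as f:       # read the file once on startup
                self.data = json.load(f)
        except FileNotFoundError:
            self.data = {}

    def set(self, key, value):
        with self._lock:
            self.data[key] = value
            if self._timer:
                self._timer.cancel()    # debounce: restart the countdown
            self._timer = threading.Timer(self.delay, self.flush)
            self._timer.start()

    def flush(self):
        with self._lock:
            # write to a temp file, then atomically replace the old one
            fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self.path) or ".")
            with os.fdopen(fd, "w") as f:
                json.dump(self.data, f)
            os.replace(tmp, self.path)
```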

You'll only be wasting space proportional to the number of objects, and the overhead is much smaller on ext4 (default fs on most linux distros) like the sibling comment explains. Most databases are quite small, and most of the rest are less than huge, so in most cases you won't be wasting huge amounts of space.

Overhead seems to be about 4KiB, at least using default ext4 parameters:

  # zfs create -V 100G -b 4096 tank/test && mkfs.ext4 -v /dev/zvol/tank/test && mount /dev/zvol/tank/test /mnt && cd /mnt
  # df -k .
  Filesystem     1K-blocks  Used Available Use% Mounted on
  /dev/zd16      102626232    24  97366944   1% /mnt
  # for x in `seq 1000000`; do echo $x >$x; done  # create 1M tiny files
  # df -k .
  Filesystem     1K-blocks    Used Available Use% Mounted on
  /dev/zd16      102626232 4022348  93344620   5% /mnt
  # bc -l
  (97366944-93344620)*1024/1000000  # free space diff per file
Plus you're wasting an inode per record - a limited resource in ext4, and increasing the inode count requires reformatting. You'd probably run out of inodes much sooner than out of space.

Interesting, indeed the inline data option is not enabled even by the latest e2fsprogs, even though the feature has been there a long time.

Re inodes, this is a good point too. These limits definitely reduce the size of database that a filesystem works nicely for.

This has a large overhead - easily a few kilobytes per file, depending on the filesystem. Poor locality, too - your data gets scattered across the entire disk, and even on an SSD this matters to some extent. It also starts to break down in some ways already at O(100k) files, e.g. globs on the command line stop working. You can work around that by splitting files into different directories, but at that point it's just easier to use a normal database.

Isn't the main feature of nosql supposed to be easy horizontal scalability, the exact opposite of storing everything in a single file?

If you just need a r/w store for some JSON documents in a single file, why not SQLite? You can put arbitrary-length blobs into it. Some SQL will be involved, but you can hide it in a wrapper class tailored to your application with a few dozen lines of code or so.
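To illustrate, a sketch of such a wrapper (names hypothetical) on top of Python's built-in sqlite3:

```python
import json
import sqlite3

class JsonStore:
    """Minimal document store: JSON bodies keyed by a string id."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS docs (id TEXT PRIMARY KEY, body TEXT)")

    def put(self, doc_id, doc):
        # INSERT OR REPLACE gives upsert semantics on the primary key
        self.conn.execute(
            "INSERT OR REPLACE INTO docs VALUES (?, ?)",
            (doc_id, json.dumps(doc)))
        self.conn.commit()

    def get(self, doc_id):
        row = self.conn.execute(
            "SELECT body FROM docs WHERE id = ?", (doc_id,)).fetchone()
        return json.loads(row[0]) if row else None
```

Application code then never sees the SQL: `store.put("user:1", {...})`, `store.get("user:1")`.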

> Isn't the main feature of nosql supposed to be easy horizontal scalability

There's no strict definition of nosql so everyone can choose their own. My personal take (in broad terms) follows:

No, that's not a feature of nosql. Nosql means not relational, which in turn means no guarantees about the relationship between two objects, i.e. no atomicity of access across multiple objects (for either read or write operations). A consequence of this lack of atomicity is that it's easy to store different objects in different places, thus opening up opportunities for horizontal scalability. Caveat: those opportunities can be taken away by other choices you make. If you decide to offer and enforce transactions, you are bringing atomicity back into the system, and thus making horizontal scalability hard again. Or you may decide you want a nosql-in-a-file.

> You can put arbitrary-length blobs into it.

And it's quite suitable for this purpose: https://www.sqlite.org/fasterthanfs.html

Practically, NoSQL also seems to just mean "Not-SQL" as stuff like Redis is often lumped in it, which is about the opposite of easy horizontal scalability.

SQLite has a backend which is well suited as a key-value store.

Here is a NoSql database based on the SQLite backend: https://github.com/rochus-keller/Udb.

I use it in many of my apps, e.g. https://github.com/rochus-keller/CrossLine. It's lean and fast, and supports objects, indices, hierarchical "globals" like ANSI-M and transactions.

NoSQL databases have many different data models, e.g. object, document, graph, and key/value DBs. In a lot of cases you should probably just use something on top of SQLite, but you should say more about your requirements.

An interesting one I ran into recently is Datalevin, a Datalog DB on top of LMDB for Clojure: https://github.com/juji-io/datalevin

I'd add Crux to this... it's document-oriented, but still generates attribute-level indices, and is also Datalog based. It can be used on top of RocksDB, LMDB, or many other (more scalable) backends (Kafka being the canonical one).

It's really awesome, and the team behind it are super responsive and helpful.

One of those I've tried is LiteDB - https://github.com/mbdavid/LiteDB. I liked it.

It's small yet capable. If you are familiar with MongoDB, you will feel right at home.

It's great for .NET developers as it's written in C#, but since it's .NET Standard 1.3 compatible, you can presumably run it under Ubuntu or macOS or wherever else the new .NET 5 runtime works. I got a C# app running on ARM64 the other day - just saying.

I wrote about my experience playing with LiteDB here - https://tomaskohl.com/code/2020-04-07/trying-out-litedb/. It's not an in-depth look at all, just a few notes from the field, so to speak.

Probably something like LMDB [0] or Tkrzw [1], though NoSQL is a bit more diverse in a way SQL is not, so it is hard to give a clear answer.

[0]: https://en.m.wikipedia.org/wiki/Lightning_Memory-Mapped_Data... [1]: https://dbmx.net/tkrzw/

+1 for LMDB, ended up there researching simple local key-value stores for my own use.

GDBM and BerkeleyDB are the "grey beard" references, but their 80's & 90's heritage shows.

Take a look at EJDB2: https://ejdb.org

Take a look at UnQLite: https://unqlite.org/

BerkeleyDB is an older, simpler one; LevelDB / RocksDB are more modern, maintained, and better for SSD workloads.

BerkeleyDB is rather complex and has a dual license (AGPL and commercial).

I like sled, a nice embedded key-value store written in Rust: https://sled.rs/

However, it is still in heavy development and a bit of a moving target even if the developers are currently heading toward stabilization of the file format.

Check out LiteStore; as a bonus, you have an API included.



I think you want Mongita. It was featured on HN a while ago: "Mongita is to MongoDB as SQLite is to SQL (github.com/scottrogowski)" https://news.ycombinator.com/item?id=26881915

I ended up writing a small wrapper on top of SQLite based on this: https://dgl.cx/2020/06/sqlite-json-support

With proper concurrency control, it can work very well even for multi process applications.

An object reference :-)

In memory, high performance, no schema. Get the object to journal to disk and you are almost there!

tkrzw is basically the modern dbm / berkeleydb

I second this. For those who don't know, tkrzw is the less-memorably-named successor of Tokyo Cabinet and Kyoto Cabinet.

How is one supposed to pronounce “tkrzw”? (^_^)

Tango Kilo?

It's the same as SQLite. You can get more info in this SQLite tutorial.


Hey sureshdasari, this is probably against the rules for advertising.

JSON on the filesystem

You mean an embedded NoSQL database? Because all databases whether SQL or NoSQL are file-based with just different formats and structures.

There are also persistent key-value stores like dbm, part of the standard library in Python, and several third-party implementations like bitcask for Go.
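For reference, the stdlib dbm module really is a one-liner to use (file name hypothetical):

```python
import dbm

# dbm.open picks whichever backend is available (gdbm, ndbm, or the
# pure-Python dbm.dumb fallback); "c" creates the file if it's missing.
with dbm.open("cache", "c") as db:
    db["greeting"] = "hello"    # keys and values are stored as bytes
    print(db["greeting"])       # b'hello'
```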

Personally, I like leveldb/rocksdb very much. They're very fast and solid.

My guess is that although you basically describe BerkeleyDB you probably want Redis.

I'd nominate TDB (the trivial database). It's small, robust, and effective.

Mostly, but actually not totally, kidding: *.INI files.

RavenDb would be what I would use.

For now, see Postgres as one


