
LowDB – A flat JSON file database - ca98am79
https://github.com/typicode/lowdb
======
DogeDogeDoge
This has zero fault tolerance, and by zero I mean none. It doesn't even
flush to disk... it has about the persistence of an in-memory hash and is
completely unsafe under concurrent access. As a toy to show off, yeah, it could
work, but... at the end of the day, you can downvote me, but this is crap.

~~~
blablabla123
I think the idea is cool. Actually, I once worked on something similar in Go,
with a very small code base. And adding flushing, caching, and thread-safety is
no black art. The charming thing here is that once severe data errors happen,
one can easily recover from them using a simple editor.

Imagine how to fix low-level data corruption on your <insert your DBMS name>
server.

~~~
couchand
_Once severe data errors happen, one can easily recover from them using a
simple editor._

Unfortunately, in this case, you can't necessarily. One of the issues here is
that holding on to handles to multiple collections doesn't do what it seems
to, and also doesn't fail. Try running this:

        var low = require('lowdb');
    
        var foo = low('foo');
        low('bar');
    
        low.on('add', function(collection) {
            console.log("this should be foo:", collection);
        });
    
        foo.insert({ name: 'foo' });

~~~
typicode
Actually, in my code I always start with low():

      low('foo').insert({ name: 'foo' })
      low('bar').insert({ name: 'bar' })
      low('foo').insert({ name: 'foo' })

So I never really considered this way of writing, and never hit this bug.

But I can see that it's a flaw and unexpected behavior. I'm adding this to the
enhancements for the next version.

Thanks for spotting it :)

~~~
typicode
Just a little update. I've released LowDB 0.4.0 which is a complete rewrite of
the project.

It fixes some initial flaws/bugs and includes some new features. Also, writing
is now fault-tolerant and fully asynchronous.

------
ThePhysicist
If you're looking for something similar for Python, check out BlitzDB:

[https://github.com/adewes/blitzdb](https://github.com/adewes/blitzdb)

It's a serverless, flat-file, document-oriented database which supports
indexing and transactions and comes with a built-in ORM layer and a rich query
language modeled after MongoDB.

~~~
jamesmoss
On a similar tip for PHP, check out my library Flywheel:

[https://github.com/jamesmoss/flywheel](https://github.com/jamesmoss/flywheel)

It's nowhere near as powerful as BlitzDB, but I'm slowly adding features to it.

------
andrewchoi
Unsure if OP is the maintainer, but a quick point: benchmarks without any
indication of the underlying hardware don't convey much information.

~~~
rakoo
I think the most important thing is to compare with other known dbs on the
same hardware, though.

~~~
kasey_junk
Not really. Different hardware constraints can make serious differences in
performance metrics. Some systems perform better with plenty of memory and
slow disks; for others it's the opposite.

------
couchand
Please don't use this for anything significant. Shared global state and
synchronous file I/O pretty much guarantee that it will blow up in any non-
trivial use.

~~~
pizza
What demarcates trivial and non-trivial applications? # of machines,
guaranteeing xyz?

~~~
couchand
When I say trivial I mean one machine, one user, no errors.

The shared global state (see here:
[https://github.com/typicode/lowdb/blob/481cf43d6b0a52c1cb996...](https://github.com/typicode/lowdb/blob/481cf43d6b0a52c1cb996644f53763e630ad7da8/src/index.coffee#L59))
means that attempting to use more than one collection doesn't do what you
expect, but fails silently rather than raising an error. The synchronous file
I/O will grind your process to a halt.

If you are serving a mock API for the purpose of local development you _might
be_ fine. If you're trying to build any web application at all, this isn't the
right choice.

~~~
TheZenPsycho
Do you think _in theory_ that it is possible to create a flat file based
system that _is_ multi-thread/multiuser safe?

My somewhat ignorant inclination is to suggest that maildir fits this
description.

~~~
couchand
Certainly. SQLite is a great example of one. It handles the tradeoff
elegantly: reads can be done in parallel but writes have to be sequential.

I don't believe maildir would be described as a flat file.
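
That "parallel reads, sequential writes" policy can be illustrated with a small readers-writer lock. This is only an illustration of the tradeoff, not SQLite's actual locking code: any number of readers may hold the lock together, while a writer waits for exclusive access, and later requests queue FIFO behind it.

```javascript
// Illustrative readers-writer lock: many concurrent readers, one writer at a
// time. Requests are granted strictly in FIFO order, which also prevents a
// steady stream of readers from starving a waiting writer.
class RWLock {
  constructor() {
    this.readers = 0;      // readers currently holding the lock
    this.writing = false;  // whether a writer holds the lock
    this.waiting = [];     // queued { write, resolve } requests, FIFO
  }
  _pump() {
    while (this.waiting.length) {
      const next = this.waiting[0];
      if (next.write) {
        if (this.readers > 0 || this.writing) break; // writer needs exclusivity
        this.writing = true;
      } else {
        if (this.writing) break; // readers only wait on an active writer
        this.readers++;
      }
      this.waiting.shift();
      next.resolve();
    }
  }
  acquire(write) {
    return new Promise((resolve) => {
      this.waiting.push({ write, resolve });
      this._pump();
    });
  }
  release(write) {
    if (write) this.writing = false;
    else this.readers--;
    this._pump();
  }
}

// Usage: two reads overlap, the write waits for both, the last read waits
// for the write to finish.
async function demo() {
  const lock = new RWLock();
  const log = [];
  const read = async (id) => {
    await lock.acquire(false);
    log.push(`r${id} start`);
    await new Promise((r) => setImmediate(r)); // simulate I/O
    log.push(`r${id} end`);
    lock.release(false);
  };
  const write = async () => {
    await lock.acquire(true);
    log.push('w start');
    log.push('w end');
    lock.release(true);
  };
  await Promise.all([read(1), read(2), write(), read(3)]);
  return log;
}
demo().then((log) => console.log(log));
```

The log shows both reads running together, the write running alone, and the final read held back until the write releases.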

~~~
TheZenPsycho
Well, I suppose then maybe I have the wrong idea of what "flat file database"
means. To you, anyway. I would not actually describe SQLite as a flat-file
database system, since the data cannot be read safely directly from the file
using a text editor or, since a flat file could be binary data, any arbitrary
program that just knows the format. And "flat" implies there is no special
data structure involved; I would think that's not the case for SQLite, given
how well it performs, it must use some kind of binary tree or clever indexing
system of some sort.

You must go through the actual SQLite library.

On the other hand, maildir, composed of multiple ASCII-compatible text files,
can be read by any email client, provided it follows the protocol for reading
and writing files in the maildir spec.

~~~
couchand
The term "flat file database" conventionally refers to a database that is
serialized to a single file, whether it is plain text or a binary. The
emphasis here is on the single file aspect: it's really easy to transfer the
entire database between machines.

~~~
mappu
MyISAM serializes each table to a single file on disk (well, three files: one
each for the definition, data, and indexes).

Would you consider MySQL to be a flat file database?

~~~
couchand
I wouldn't, but I'm not really an expert here. If every table is three files,
that implies a non-trivial database would be dozens of files.

I suppose the other use of "flat file" I've heard is a flat directory with no
sub-directories, but I've never heard that applied to a "flat file database".

------
jdp
Related is tiny[0], an in-process document store that supports Mongo-style
queries, as well as a style similar to CouchDB views. You can dump its
contents to a JSON file.

Also interesting is PouchDB[1], another document storage library. It can be
used with Node or in the browser through various backends (like IndexedDB),
and can even replicate to CouchDB.

[0]: [https://github.com/chjj/tiny](https://github.com/chjj/tiny) [1]:
[http://pouchdb.com/](http://pouchdb.com/)

------
typicode
Hi everyone,

First, thanks for all the interest, it’s quite sudden and unexpected.

Actually, LowDB is an extract from JSON Server, a mock REST server based on
plain JSON
([https://github.com/typicode/json-server](https://github.com/typicode/json-server)).

So, basically, it's not meant to be used in critical / intensive applications.

Instead, it's much more a convenient new way to store data in simple use
cases.

Regarding file writing, if your database is small or if you don't run a
cluster of Node processes, you should be fine.

Regarding the benchmark, as someone pointed out, it's mainly there to show
that storing to a JSON file is fast enough and to compare operation speeds. I
agree that it says nothing about other databases, and LowDB doesn't try to be
the fastest either, just fast enough. By the way, 'npm run benchmark' lets you
run it on your machine.

However, keep in mind that LowDB's official release is quite recent, so it
should improve over time.

Anyway, thanks for all the feedback, and I hope you’ll have fun with it :)

------
bikamonki
I do have a 'production' use case: I have a static site that uses Backbone to
load a static JSON collection of pages. The JSON file itself is very small
(less than 100k) and will not grow much over time. I built a small CMS to let
me edit pages; the data for the CMS is dynamic (MongoDB), and I update the
static JSON file after each edit. I could instead use a CMS backed by LowDB to
edit the JSON file directly. Keep in mind that this is one user making one or
two edits per day at most. If you realize that the vast majority of small
sites/blogs out there have similar requirements, then this tool makes sense.
No?

------
Sir_Cmpwn
I have a hard time believing that this will be performant for anything of
significance.

------
mkoryak
Here is another flat-file db that I've been eyeing:
[https://github.com/sergeyksv/tingodb](https://github.com/sergeyksv/tingodb)

It's upward compatible with MongoDB.

I've also used NeDB, which is OK, but it tends to delete the datastore when it
runs out of disk space :P

~~~
louischatriot
NeDB creator here. I'd like to know more about the issue you're describing,
I've never heard about it and considering the test coverage I just don't see
how it could happen.

Could you post an issue on
[https://github.com/louischatriot/nedb](https://github.com/louischatriot/nedb)
with your environment details and how to reproduce this?

~~~
mkoryak
Sure. For some reason I thought this was a known issue; I saw it mentioned
somewhere else on GitHub, but it can't be that well known since you don't know
about it. I'll open an issue this afternoon.

------
keyle
Also, I've used nedb and highly recommend it. It's fast and comprehensive.

[https://github.com/louischatriot/nedb](https://github.com/louischatriot/nedb)

------
kolodny
I always wondered in this situation:

        var topFiveSongs = low('songs')
          .where({published: true})
          .sortBy('views')
          .first(5)
          .value();

Do any db engines figure out, when the dataset is really large and the limit
is really small (no idea what the cutoff would be), that instead of sorting
everything and returning the first five, they should just look for the 5
largest/smallest? Anyone have any idea on this?

~~~
Erwin
PostgreSQL uses a "top-N heapsort" for this. You can see it in effect when
running "EXPLAIN ANALYZE" on such a query. Without an index you cannot avoid a
full table scan, but the top-N heapsort avoids allocating memory for ALL the
rows, since you only care about the top 5.

For example, I had a 6-million-entry table sitting around and asked for the
top 5 rows by an unindexed column. With LIMIT 5 applied, it told me:

              Sort Method: top-N heapsort  Memory: 95kB

Without:

               Sort Method: quicksort  Memory: 577595kB

So if Postgres had to hold all 6 million rows sorted, it would need 577 MB of
work memory. If work_mem (see SHOW work_mem) were set below that, it would
instead write a lot of temporary files to disk:

               Sort Method: external merge  Disk: 144072kB

Note how Postgres is more wasteful with memory than with disk writes for temp
storage.

Generally, though, queries that require a full table scan on a large table
should be used sparingly.
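
The bounded-memory idea behind that top-N approach can be sketched in a few lines of JavaScript (a toy illustration, not Postgres's implementation): scan every row, but never hold more than N candidates.

```javascript
// Toy illustration of the bounded-memory idea behind a top-N heapsort (not
// Postgres's actual implementation): scan every row, but keep only the N best
// candidates seen so far, so working memory is O(N) instead of O(rows).
// A real engine would use a heap; for tiny N a sorted array is fine.
function topN(rows, n, key) {
  const best = []; // sorted descending by key, length never exceeds n
  for (const row of rows) {
    if (best.length < n || key(row) > key(best[best.length - 1])) {
      best.push(row);
      best.sort((a, b) => key(b) - key(a));
      if (best.length > n) best.pop();
    }
  }
  return best;
}

// 100,000 synthetic "rows", but at most 5 are ever held outside the input.
const rows = Array.from({ length: 100000 }, (_, i) => ({ views: (i * 48271) % 99991 }));
const top5 = topN(rows, 5, (r) => r.views);
console.log(top5.map((r) => r.views));
```

The result matches a full sort followed by `.first(5)`, without ever materializing the sorted dataset.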

------
scotty79
Do you know of any database that keeps its data in multiline human-readable
text files (like JSON, YAML, or even nicely formatted XML) but also provides
some robustness, concurrent access by multiple users, and indices to search
the data fast?

My use case would be to keep a website's data in it and keep the data files
themselves in a repository, so they can be backed up there, monitored, and
possibly merged.

~~~
nitrogen
I don't know about indices, but at least at one point news.yc used flat files
for its storage. IIRC an older version is included with the Arc language
examples.

------
chippy
Reminds me of the Ruby standard library Hash-based datastore PStore:
[http://ruby-doc.org/stdlib-1.9.2/libdoc/pstore/rdoc/PStore.html](http://ruby-doc.org/stdlib-1.9.2/libdoc/pstore/rdoc/PStore.html)
... but without the query wrappers. Nice to see basic things done well.

~~~
andyl
And Ruby has YamlStore. I use PStore and YamlStore all the time; extremely
simple and helpful. But I wouldn't necessarily call any of these tools
'databases', just persistent object stores.

------
jonny_eh
Are all operations synchronous? If so, that kind of makes it a no-go for most
uses with Node.js.

------
eim_kasp
Interesting concept; used the right way, it can save you a number of queries
and add speed to your application. It could also be a great fit where you need
to save reasonable amounts of data, like contests or prelaunch signups, etc.

------
misiti3780
Serious question - why should I use this over MongoDB or any other NOSQL
solution?

~~~
mkoryak
You shouldn't. You should use this:
[https://github.com/sergeyksv/tingodb](https://github.com/sergeyksv/tingodb)
so that when you realize you do need MongoDB after all, you can just drop it
in.

~~~
louischatriot
Or NeDB, which solves the same constraints :)

------
josteink
Sounds about as useful as the "XML-file only" databases some people decided
were the smartest thing ever at the turn of the century.

You don't see those around anymore.

------
econic
Anyone know if there is something similar to this for Go?

~~~
fiatmoney
[https://github.com/HouzuoGuo/tiedot](https://github.com/HouzuoGuo/tiedot)

------
Guillaume86
Nice. I use NeDB in one of my projects; always good to have alternatives.

------
krazykringle
Does it need to suck the entire database into memory to answer a query?

------
kpennell
Is this similar to firebase?

~~~
couchand
No, the storage backend here is the local filesystem.

------
korzun
Benchmarking over 1000 iterations is pretty silly. My guess is that it will
start to choke as soon as you have concurrency at higher load.

~~~
jakejake
I think showing a small benchmark like that somewhat indicates the target use
case. I would guess it's something to use for single-user apps, perhaps
storing user preferences or the like; most likely an app that is storing data
locally.

I think it's fairly safe to assume that this isn't going to be a wise choice
for a server back-end.

------
derpalittlederp
JSON files are my favorite NoSQL :D

