

WhiteDB – Lightweight NoSQL database written in C, operating in main memory - majidazimi
http://whitedb.org/
WhiteDB is a lightweight NoSQL database library written in C, operating fully in main memory. There is no server process. Data is read and written directly from/to shared memory, no sockets are used between WhiteDB and the application program.
======
tammet
One of the authors here. A few answers quickly. It does write to disk: you can
either dump memory or write all changes to log (turn it on/off yourself). Sure
it has a global read/write lock, with several locking strategies to select
from (task-fair atomic spinlock queue or a reader-preference or a writer-
preference spinlock). It is definitely meant to be a simple library. We
strove to document it carefully to make usage as easy as possible. Yes, you
can very easily form lists, trees or any other pointer structures. Happy to
see it on Hacker News, we never really expected that :)

~~~
bch
Congratulations on your project and the attention it's getting!

Can you explain-more/rationalize GPLv3 licensing w/ the conditional alternate
commercial license?

Why not just BSD, MIT, or (at least) LGPL ?

GPL is more understandable on "higher level software" (ie: complete
applications), but I don't understand your intent licensing a library this
way.

~~~
tammet
Making a clean cut between free-as-in-speech on one hand and free-as-in-beer
on the other.

~~~
mbreese
I appreciate the sentiment, but I was looking forward to using this until I
saw that part. I (like I assume many others) cannot use a GPL3 library (and
I'm in academia). If you want any sort of traction for a library, GPL3 is not
the way to go.

This is why the LGPL was created, so that you can have modifications done on
your library be free-as-in-speech, but still make the library as a whole
useable for a wide variety of other projects, including closed-source
versions.

Having a separate requirement to email you for a free-as-in-beer license is
just overly complicated for this. The more hurdles you put up for people, the
fewer will adopt the library. I think that licensing is one of those
cases where it doesn't pay to be clever. Plus, what happens when you decide to
stop maintaining the code? Do you want to keep getting emails for licenses
years from now?

Edit: in last paragraph, I said free-as-in-speech, but meant beer (see comment
below).

~~~
tammet
The default GPL is free-as-in-speech. You do not have to email for GPL. You
have to email for free-as-in-beer. I assume that in case free-as-in-speech is
not OK, it is also not a major hurdle to email for the free-as-in-beer
version. In case emailing is a major hurdle, maybe you do not really need the
free beer part.

Should we stop maintaining the code or get bored mailing free beer licences,
we'll very likely change the licence to LGPL or MIT. Until then beer comes via
email.

~~~
bch
In the case of changing licenses, make sure over the course of your project
maintainership that you have the right to relicense all the code, including
patches/contributions from others.

 _I_ wish it was simply licensed MIT or BSD, but congratulations on your
software and sticking to your convictions.

:)

------
endgame
Good on you for going GPLv3 as your free software license. I think it's really
funny to see people here going "b-but you should use my favourite license"
instead of "that's a cool thing you've built there".

~~~
bch
> "b-but you should use my favourite license" instead of "that's a cool thing
> you've built there".

First of all, "that's cool" and "license issues" aren't mutually exclusive.

Second of all, it's a case of picking an applicable license for the place the
software fits into a project. This is a low-level library (cool or not); if an
author wishes to "protect" their code, LGPL was built explicitly for this
purpose.

To confuse matters, on the project's own page[1] they effectively waive all the
rights of the GPL3 except for a very specific corner case.

People asking for license clarification or change are looking to simplify even
_beginning_ to use the library.

[1] [http://whitedb.org/licence.html](http://whitedb.org/licence.html)

~~~
nknighthb
What disgusting arrogance. There is no objective standard of "applicability"
of any particular license to any particular piece of software. The LGPL is not
"built explicitly for this purpose". The LGPL is a compromise license that
exists for strategic reasons. There is no clear reason it should be applied
here.

That some people may not want or be able to use this library because of its
license choice may matter to you, but there's no reason to believe it matters
to the authors, and you are not in a position to tell them what should matter
to them.

Nothing on that page constitutes a waiver of GPLv3. It is merely an offer of
an alternative license to a subset of potential users.

~~~
jafaku
He's just letting the author know that the library is unusable with this
license. So either people won't use it, or they will use it and violate the
license.

Would you seriously consider using (or even trying) a new library that has not
even proved to be better than others yet, if you had to comply with GPLv3? I
wouldn't.

~~~
nknighthb
> _the library is unusable with this license_

No, it isn't. I can use it. So can millions of others. That _you_ can't is
most likely the fault of shitty lawyers. Not the authors' problem.

~~~
jafaku
Odds are you don't really understand GPLv3. See what people like Linus (you
might have heard of him) think about it. And he made quite some contributions
to the open source world.

~~~
nknighthb
I understand it just fine, thank you. GPLv3 is hardly the only thing I
disagree with Linus on. Childish appeal to irrelevant authority is not an
argument.

~~~
jafaku
You were claiming that it was a problem with my "shitty lawyers", I showed a
perfect example of someone who worked on open source his entire life, who
can't use GPLv3.

Childish assumptions and irrelevant reference to the argument of authority
fallacy are not an argument. Talk about "arrogance"...

~~~
nknighthb
I claimed it was "most likely" shitty lawyers, because 100% of the people I've
heard from who genuinely _can't_ use GPLv3 code anywhere in their work can't
do so because a corporate lawyer-drone is in the way.

The rest either _won't_ because they don't like the license, or can't by
virtue of their own choices. Linus is one of these people. He has made his
choices, which is his right. I (mostly) do not agree with the reasons for
those choices, and would not have made the same ones, as is my right.

The identity and stature of the person who makes a choice is irrelevant, and
dragging it out as if it makes my opinion invalid is absurd.

~~~
jafaku
If a lawyer is the only thing stopping someone from using the library, then
that person was clearly going to violate the license. Which proves my point:
Those who want to steal the code will do it anyway. And those who wanted to
give it a legitimate use won't even touch it. The same kids that are slapping
GPLv3 to anything they build, are probably the ones breaking other people's
licenses because they don't understand it. This is their typical response when
you call them out: "It's open source!". As if MIT/X11 open source was the same
as GPLv3.

I insist: No professional programmer is going to use WhiteDB. Not for open
source, not for anything. If there ever is someone willing to comply with
GPLv3 just to be able to use WhiteDB, he won't even find out about the
library, because nobody is using it.

~~~
nknighthb
> _If a lawyer is the only thing stopping someone from using the library, then
> that person was clearly going to violate the license._

Just by saying this you prove you've never dealt with corporate lawyers. I
know multiple companies where all GPLv3 software has been banned by the legal
department entirely. Not just for use as part of a product. It's literally not
allowed on the company's computers at all, because the shitty lawyers who
couldn't make it in the real world have decided that if the company touches
GPLv3 software, all company source code is immediately GPLv3.

They are that stupid.

By the way, if your faith in corporate lawyers is so strong, why bother with
courts? We can just have corporate lawyers decide everything, since they'll
always get it right. Which is why there are no lawsuits where one side wins
and the other side loses.

> _I insist: No professional programmer is going to use WhiteDB._

Which particular term of the GPLv3 would prevent me from using WhiteDB in a
web application? I'm aware of none whatsoever. Note that this is GPLv3, not
AGPLv3, which does have terms which can pose a problem to web applications.

By the way, Red Hat and Canonical make GPLv3 software, contribute to GPLv3
software, and include GPLv3 software in their Linux distributions. Are you
accusing their programmers of being unprofessional?

------
yid
Looking at the benchmarks, I'm trying to rack my brain about how it
outperforms redis consistently on every single benchmark for a simple
associative map.

I don't know if this is likely, but it looks like redis doesn't lock memory
[1], which means that the benchmarks could be explained by swapping. Depending
on the type of shared memory used by whitedb, it could be that its pages are
locked and immune to swapping.

[1]
[https://github.com/antirez/redis/issues/1177](https://github.com/antirez/redis/issues/1177)

~~~
josephg
I think the difference is entirely explained by WhiteDB's lack of networking
overhead, both in write() calls to the OS[1] and creating & parsing network
messages. A fairer comparison would be with leveldb and lmdb[2].

[1] [http://highscalability.com/blog/2013/6/19/paper-megapipe-a-n...](http://highscalability.com/blog/2013/6/19/paper-megapipe-a-new-programming-interface-for-scalable-netw.html)

[2] [http://symas.com/mdb/microbench/](http://symas.com/mdb/microbench/)
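
A toy way to see the overhead being described, using only Python's standard library rather than WhiteDB or Redis: the same lookup is done once as a plain in-process call and once as a request/response round trip over a local socket pair. The `store`, the line-based protocol and the helper names are all made up for illustration, and the timings will vary by machine, so treat this as a sketch of the argument and not as a benchmark.

```python
import socket
import threading
import time

store = {"answer": b"42"}  # stand-in for the database contents

def serve(conn):
    # Toy server: one key per request line, one value per response line.
    with conn, conn.makefile("rb") as requests, conn.makefile("wb") as replies:
        for line in requests:
            replies.write(store.get(line.strip().decode(), b"") + b"\n")
            replies.flush()

client, server = socket.socketpair()
threading.Thread(target=serve, args=(server,), daemon=True).start()
to_server = client.makefile("wb")
from_server = client.makefile("rb")

def remote_get(key):
    # Build a message, hand it to the OS, wait for the reply, parse it.
    to_server.write(key.encode() + b"\n")
    to_server.flush()
    return from_server.readline().rstrip(b"\n")

def local_get(key):
    # Direct memory access: no message, no system call.
    return store[key]

def timed(fn, n=2000):
    start = time.perf_counter()
    for _ in range(n):
        value = fn("answer")
    return value, time.perf_counter() - start

local_value, local_secs = timed(local_get)
remote_value, remote_secs = timed(remote_get)
print(f"in-process: {local_secs:.4f}s  over socket: {remote_secs:.4f}s")

to_server.close()
from_server.close()
client.close()
```

Both paths return the same value; only the route differs, which is why an embedded store such as leveldb or lmdb would be the closer comparison.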

------
bch
Why, oh why GPLv3 for a linkable library?

Request something more permissive (LGPL, MIT, BSD)?

~~~
kodablah
Closed-source licenses are provided per request and the only restriction seems
to be that you don't sell a DB system backed by the lib[1]. If you really want
to prevent closed-source abuse of your library, a GPL-based dual licensing
setup is the only way I can think of.

1 - [http://whitedb.org/licence.html](http://whitedb.org/licence.html)

~~~
bch
Ah -- ok -- I got my info from the COPYING file in the git repo.

Regardless: "those which are distributed and marketed as database systems to
be used by other developers" looks to be _full_ of wiggle room within what I
suspect[1] is the authors' intent.

[1] "I/somebody will/might make this a higher-order database tool for
developers and I don't want anyone to compete with anybody else without
forcing everybody involved (with this code) to open their entire codebase"
(??)

------
SandB0x
Python bindings! Finally I can stop using dictionaries like a common peasant.

~~~
pekk
What is the advantage of this over using dictionaries?

~~~
Goopplesoft
Accessible across multiple python processes (shared memory).
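
A minimal sketch of that difference, assuming Python 3.8+ and using the standard library's `multiprocessing.shared_memory` rather than the WhiteDB bindings (whose API is not shown in this thread). The second handle below stands in for a second process: any process that knows the block's name could attach the same way. The names `writer` and `reader` and the size of 16 bytes are arbitrary.

```python
from multiprocessing import shared_memory

# "writer" creates a named block of shared memory (size chosen arbitrarily).
writer = shared_memory.SharedMemory(create=True, size=16)
writer.buf[:5] = b"hello"

# "reader" attaches to the same block by name. Another process could do
# exactly this given the name; here it is a second handle in one process.
reader = shared_memory.SharedMemory(name=writer.name)
seen = bytes(reader.buf[:5])
print(seen)

# Every handle is closed; the creator also removes the block.
reader.close()
writer.close()
writer.unlink()
```

A plain dict lives on one interpreter's heap, so a second process would only ever see its own copy of it.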

------
sakai
Does anyone mind sharing what they see as the advantage of this vis-a-vis a
Redis, LevelDB, Mongo, etc. (realizing those are all very different)? Is it
principally read throughput, write throughput, interfacing, space efficiency,
scalability to very large datasets, or something else?

I ask as friends and I are doing research into very space-efficient (read:
probabilistic) key-value stores. We have a few scientific use cases, but I'm
curious if others would find the ability to scale to very large key- and value
spaces (~1e9+ key-value pairs) in a space efficient way practically useful.
Or, if interest is principally academic.

Apologies for the (pseudo-)hijack.

Ps. Looks very interesting, I have it on my to-dos to install and check it out
more deeply.

------
throwaway420
Is it accurate to describe this as a faster Redis that doesn't have all of the
useful data structures like lists, sets, etc?

~~~
meowface
Yes, somewhat.

It's also different from Redis because Redis is intended to be run as a server
(it stands for Remote Dictionary Server). This is run entirely as a process
and communicates via IPC; other machines can't reach the database, only the
local machine. This is a big reason why it's very fast. However, it also
means you can't distribute the database across multiple servers.

You could think of it like a very fast NoSQL SQLite, I guess.

~~~
StavrosK
I keep wondering why there isn't a libredis you can just link into your
program without needing to run a server and all that. It sounds very useful,
but I'm not sure how hard it would be to develop.

~~~
meowface
For the same reason you can't do that with MySQL or Postgres or any other
database server. Redis is intended to be run as a server listening for TCP
connections, and communicating over a TCP/IP network. If you're running it on
localhost, the communication should generally be pretty fast, but you're right
that overhead will still be incurred.

It would require adding a lot more code to allow for typical interprocess
communication.

------
symisc_devel
The project seems to be a fork of the more mature UnQLite project
([http://unqlite.org](http://unqlite.org)) except that UnQLite supports
pluggable run-time interchangeable storage engines (B+Tree, LH, R+Tree) and
has support for on-disk databases as well as in-memory operations.

------
otterley
How does performance compare to Kyoto Cabinet?

[http://fallabs.com/kyotocabinet/](http://fallabs.com/kyotocabinet/)

------
elacey
Maji, nice work. Enabling graph-like functionality with linking records is an
interesting addition. Any idea what performance looks like going 2 or 3 levels
deep with a cross-linked graph of say 100k vertices and 500k edges?

------
joshguthrie
Ruby bindings are here:
[https://rubygems.org/gems/whitedb](https://rubygems.org/gems/whitedb)

What has been done so far:

* Create a database

* Create a record

* Set field to a string

If you feel like testing it:

    
    
        require "whitedb"
    
        db = WhiteDB::Database.new("foo", 20000)
        rec1 = db.create_record(5)
        rec2 = db.create_record(4)
        5.times{|x| rec1.set_field(x, "Rec1 #{x}") }
        4.times{|x| rec2.set_field(x, "Rec2 #{x}") }
    

Then use the wgdb binary (`wgdb foo select 20`) to see your data in your
database.

I'm still new to ruby C extensions so the API is quite ugly (no blocks yet, no
error checks,...) and I'm currently adding features as I'm following the
tutorial.

------
ateeqs
But why exactly is an in-memory NoSQL database necessary? This is a
"non-sequitur." The fundamental reason to use a NoSQL database is that your
data spans multi-terabytes (where "multi" > 4) and you need replication and
slicing. A main-memory NoSQL database seems unnecessary, doesn't it?

Also, when you say "main memory" NoSQL database, are you saying "never store
in page file" but always reside/lock in main memory? If it will go to the page
file, it's not really main-memory, is it?

~~~
aryastark
Pretty much what I was trying to figure out. I can't see a use case here,
since we already have things like Berkeley DB and LevelDB. But main memory?
Your use case would have to be such that your entire dataset can fit in memory
_and_ for some reason you need extreme performance from that dataset, and from
multiple processes running on a single machine. I just don't see it. Is the
caching performance of other databases so horrible that this is really
necessary?

------
jonny_eh
Does it write to disk ever? I could just assume that it does because it's a
DB, and would be nearly worthless if it didn't, but it keeps talking about
being only in memory.

~~~
malkia
It's using memory mapped files, so that's why I guess. Not sure if these are
portable between architectures.

------
sigzero
So "NoSQLite"? :)

------
n00j
How would this compare with something like an in-memory SQLite database? I guess
the big difference is SQL vs NoSQL but would be curious about the performance
difference.

------
gwu78
Will it compile on BSD/Solaris?

------
a8da6b0c91d
Am I missing something or is this just an associative map allocated in shared
memory? Seems to be an awful lot of text for something so simple.

~~~
xfax
I'm convinced that one of these days we're going to see a new "DB" which will
be a simple hashmap with collision handling sacrificed for "speed and
performance"

~~~
adamnemecek
But it will be WEBSCALE!!1!

/s

~~~
n00j
It is webscale...

"Locking: We use a database level lock implemented via a task-fair atomic
spinlock queue for concurrency control ..."

Fantastic, database level locking!

~~~
JulianMorrison
If the task is read-heavy (example: shared cache), it's mostly harmless.
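
To make that concrete, here is a rough sketch of a single database-wide lock with reader preference, written with Python's `threading` module. It is the textbook readers-writer scheme, not WhiteDB's spinlock code, and the class and method names are invented for the example. Readers share the lock with each other, so a read-heavy workload rarely waits; only a writer needs everyone else out.

```python
import threading

class ReaderPreferenceLock:
    """One lock for the whole store: many readers or a single writer."""

    def __init__(self):
        self._readers = 0
        self._readers_guard = threading.Lock()  # protects the reader count
        self._resource = threading.Lock()       # held by a writer or by the reader group

    def acquire_read(self):
        with self._readers_guard:
            self._readers += 1
            if self._readers == 1:
                self._resource.acquire()  # first reader locks writers out

    def release_read(self):
        with self._readers_guard:
            self._readers -= 1
            if self._readers == 0:
                self._resource.release()  # last reader lets writers back in

    def try_acquire_write(self):
        # Non-blocking attempt; returns False while any reader or writer holds the lock.
        return self._resource.acquire(blocking=False)

    def release_write(self):
        self._resource.release()

lock = ReaderPreferenceLock()

# Two readers hold the lock at the same time without blocking each other.
lock.acquire_read()
lock.acquire_read()
writer_got_in_during_reads = lock.try_acquire_write()

lock.release_read()
lock.release_read()

# With no readers left, a writer can take the lock.
writer_got_in_after_reads = lock.try_acquire_write()
lock.release_write()
```

The cost shows up on the write side: with reader preference, a steady stream of readers can keep a writer waiting, which is presumably why a task-fair and a writer-preference variant are offered as well.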

