
Show HN: SummitDB – In-Memory NoSQL DB - tidwall
https://github.com/tidwall/summitdb
======
qwertyuiop924
Can somebody get us an _on disk_, small, fast, in-process, low-footprint
NoSQL DB? Thus far, I've been pressing SQLite into use for a lot of monotyped
data that would have been better served by a NoSQL DB, but I couldn't find one
suitable.

Apparently, in-memory DBs are in now. They're fast and all, that's nice, but I'm
still running on a machine with 8 Gigs of ram, and when the dataset starts to
exceed that, I want an option other than "grab some more DDR3." I remember the
fate of TinyMUD: on-disk operation must at least be an option for any DB I'd
use for datasets of a non-fixed size.

~~~
misframer
There are several key-value stores you can choose from. RocksDB, LevelDB,
lmdb, Berkeley DB, Tokyo Cabinet, sophia, and others. What about those? You
can always build abstractions on top.

~~~
qwertyuiop924
Thanks for the suggestions. Some of those might work. The only problem with
k/v stores is the serialization/deserialization cost, as most of them can only
store strings.
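
To make the cost concrete, here is a minimal sketch of the two usual ways to squeeze a record into a bytes-only store. The record layout (a uint32 id and a float64 score), the key names, and the plain dict standing in for the k/v store are all assumptions for illustration, not any particular library's API.

```python
import json
import struct

# Hypothetical fixed-layout record: (user_id: uint32, score: float64).
RECORD = struct.Struct("<Id")

def encode_json(user_id, score):
    # Text encoding: easy to inspect, but parsed again on every read.
    return json.dumps({"user_id": user_id, "score": score}).encode("utf-8")

def decode_json(blob):
    obj = json.loads(blob.decode("utf-8"))
    return obj["user_id"], obj["score"]

def encode_packed(user_id, score):
    # Fixed-width binary encoding: 12 bytes, read back with a single unpack.
    return RECORD.pack(user_id, score)

def decode_packed(blob):
    return RECORD.unpack(blob)

store = {}  # stand-in for a byte-oriented key-value store
store[b"user:1:json"] = encode_json(1, 9.5)
store[b"user:1:packed"] = encode_packed(1, 9.5)
```

Either way every read pays for a decode step; the packed form just makes that step cheaper and the value smaller.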

~~~
ddorian43
lmdb has some special cases where you can store capnproto/flat-buffers objects
and read without copying/deserializing!

~~~
qwertyuiop924
Really? That could be useful...

~~~
ddorian43
LMDB can hand you back a pointer to your object in the memory map, and we can
easily use that pointer with Cap'n Proto without any calls to malloc().
source:
[https://news.ycombinator.com/item?id=12394385](https://news.ycombinator.com/item?id=12394385)
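
The same idea can be sketched with nothing but the standard library. This is an analogy, not LMDB or Cap'n Proto: an anonymous `mmap` stands in for the mapped database file, and the header layout (uint32 id, uint16 length) is made up for the example. The point is that the returned body is a view onto the mapped memory rather than a copy.

```python
import mmap
import struct

HEADER = struct.Struct("<IH")  # hypothetical layout: uint32 id, uint16 length

# Anonymous memory map standing in for a database file mapped into memory.
buf = mmap.mmap(-1, 4096)
payload = b"hello"
HEADER.pack_into(buf, 0, 42, len(payload))
buf[HEADER.size:HEADER.size + len(payload)] = payload

def read_record(mapped, offset=0):
    # memoryview slices reference the mapped pages; the body is not copied.
    view = memoryview(mapped)
    rec_id, length = HEADER.unpack_from(view, offset)
    start = offset + HEADER.size
    return rec_id, view[start:start + length]

rec_id, body = read_record(buf)
```

Because `body` aliases the map, a later write to those bytes in `buf` is visible through `body` as well, which is exactly the property that makes the zero-copy read possible.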

------
SteveNuts
Someone should invent a package manager for databases, the same way we have
npm, composer, etc.

It's getting difficult to manage all my different databases.

~~~
arielweisberg
Totally! I can just sudo apt-get install low-latency-strong-consistency-multi-
dc-with-declarative-query-language-db as a virtual package and get the right
one for my distro.

~~~
manigandham
Great options already exist:

MemSQL for blazing fast distributed full SQL database with cross-datacenter
replication and in-memory rowstore + disk-based columnstore.

ScyllaDB for Cassandra rewritten in C++ for blazing fast dynamo-style multi-
master multi-datacenter disk-based wide-column database.

I'll also throw in AMPS (by 60East) as the best messaging platform that
supports innovative SQL and real-time state-of-the-world queries on its
message streams (instead of using that rabbitmq or kafka crap).

~~~
akbar501
[https://github.com/Netflix/dynomite](https://github.com/Netflix/dynomite)

Dynomite for production proven, limitless scale for Redis.

Provides the ability to use Redis for both in-memory and on-disk workloads.

Dynamo-inspired linearly scalable, shared nothing multi-master architecture.
Pairs well with Cassandra.

Supports pluggable backends (ex. RocksDB).

Been in production 2+ years. One of the larger clusters handles over 3.6
million sustained ops/sec in production, every day.

~~~
ddorian43
Redis is only the API/driver, not the features, right? Meaning it's just CRUD
and not the many features/data-types that redis provides, correct?

~~~
akbar501
It's the full Redis API and protocol. See
[https://github.com/Netflix/dynomite/blob/dev/notes/redis.md](https://github.com/Netflix/dynomite/blob/dev/notes/redis.md)
for a list of all supported Redis features.

The benefit of Dynomite's support for the Redis API + protocol is a.) you can
use any Redis client and b.) the same code can be used for standalone Redis on
your laptop and on a distributed Dynomite cluster.

------
fiatjaf
This is exactly like what I wanted Redis to be. Secondary indexes solve all
the problems.

However, I wanted a feature that allowed custom indexes with Javascript, like
CouchDB map functions.

~~~
dvirsky
I'm working on a secondary index implementation in redis using modules, that
should be out in a few days. It indexes redis hashes similar to how you index
tables.

~~~
ddorian43
Check out redis-search module which I think does that (only for keys for now)
[https://github.com/RedisLabsModules/RediSearch/commit/86eb1c...](https://github.com/RedisLabsModules/RediSearch/commit/86eb1c4a76dddd172fc208f3caa162bdc7b0692c)

~~~
dvirsky
I wrote that module as well :)

The secondary index module will be more focused on automating more traditional
indexing of numbers and simple string values. It has an SQL-ish language for
WHERE expressions, and then the result of that is piped to any redis command
you want it to.
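
A rough sketch of the general idea, assuming nothing about the module's actual syntax or internals: a sorted list of (value, key) pairs acts as the index for one hash field, a range lookup plays the role of the WHERE expression, and the matching keys are then handed to whatever comes next. `NumericIndex`, the `hashes` dict, and the field names are all hypothetical.

```python
import bisect

class NumericIndex:
    """Toy secondary index: sorted (value, key) pairs for one hash field."""

    def __init__(self, field):
        self.field = field
        self.entries = []  # kept sorted by (value, key)

    def add(self, key, record):
        bisect.insort(self.entries, (record[self.field], key))

    def range(self, low, high):
        # Keys whose indexed value v satisfies low <= v <= high.
        start = bisect.bisect_left(self.entries, (low,))
        out = []
        for value, key in self.entries[start:]:
            if value > high:
                break
            out.append(key)
        return out

hashes = {
    "user:1": {"name": "ann", "age": 31},
    "user:2": {"name": "bob", "age": 25},
    "user:3": {"name": "cy", "age": 40},
}
by_age = NumericIndex("age")
for key, record in hashes.items():
    by_age.add(key, record)

# Roughly "WHERE age >= 30 AND age <= 45", then use each key in a follow-up step.
matches = by_age.range(30, 45)
names = [hashes[key]["name"] for key in matches]
```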

If you're interested, here's a draft of the syntax (it's changed a bit since
but you'll get the idea).
[https://gist.github.com/dvirsky/3ef73143a6d8212f2b50096a8eb6...](https://gist.github.com/dvirsky/3ef73143a6d8212f2b50096a8eb68018)

~~~
ddorian43
Oh, I should've seen the usernames. I'm not interested in the secondary
indexing, mostly in the full-text search. It could grow into something big
(compared to the benchmarks on redislabs
[https://redislabs.com/blog/adding-a-search-engine-to-redis-a...](https://redislabs.com/blog/adding-a-search-engine-to-redis-adventures-in-module-land))

~~~
dvirsky
Yeah, that blog post is also me :) hehe. Anyway glad to hear you find it
interesting. If you want to help out ping me. There's tons of stuff on the
roadmap, and currently I'm almost entirely focused on the secondary index.

~~~
ddorian43
I still see some issues with it:

1. Redis is single-threaded, whereas inverted indexes are fairly easy to
share across cores (I think, based on what I've read). This makes hotspots
more likely, since the limit is a single core rather than a single machine.

2. Sharing of data (terms) between multiple indexes on 1 server (in
elasticsearch you use _routing, but everything is in 1 index, though some
people like a separate index per user, like dropbox does).

3. Cluster is not fully nice yet (losing writes).

4. No option to merge results from different nodes (or even a redis API to do
so, as far as I know).

~~~
dvirsky
Salvatore just added support for asynchronous operations in side threads in
modules. This would make merging results from the cluster possible, and I
want to get to it soon. I'm now implementing it in the client side and it
works great, but I want it to be as transparent as in elastic.

The recent additions would also allow more parallelization of the indexes for
reads. I could create RW mutexes in the engine and allow multiple clients to
work on the same term index. It won't be trivial but it's not super hard
either.
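
For anyone wondering what the client-side merge amounts to, here is a minimal sketch under stated assumptions: each node returns its hits as (score, doc_id) pairs already sorted by descending score, and the client wants the global top k. The function name, the tuple shape, and the sample data are made up; this is not the module's client code.

```python
import heapq
from itertools import islice

def merge_top_k(per_node_results, k):
    """Merge per-node hit lists, each already sorted by descending score."""
    merged = heapq.merge(*per_node_results, key=lambda hit: hit[0], reverse=True)
    # heapq.merge is lazy, so only the first k hits are ever pulled.
    return list(islice(merged, k))

node_a = [(0.9, "doc:7"), (0.4, "doc:2")]
node_b = [(0.8, "doc:5"), (0.7, "doc:9"), (0.1, "doc:1")]
top = merge_top_k([node_a, node_b], 3)
```

Doing this inside the server instead of the client is what would make it transparent to callers.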

------
skrebbel
I can't find from the docs at all whether this persists to disk.

If not, what is the use case? Why would I need all those ACID-y guarantees if
my server can fail at any time and all data is gone?

~~~
pasta
In-Memory could be used for processing data. Or as temp tables.

And by the way: a lot of existing databases support this.

------
deforciant
Looks great, I really like BuntDB, already using it in a project :) Glad to see
this new SummitDB, will definitely try it out!

------
ddorian43
Looks like it has no sharding, unfortunately. Do you have any info/ETA/idea on
this, OP?

~~~
tidwall
Yes it is something that I certainly want soon, but there's no ETA at the
moment. I haven't yet fully fleshed out the strategy for sharding the key
space.
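
For context only, one common way to split a key space is to hash each key and take the result modulo the shard count. The sketch below illustrates that generic technique; it is not SummitDB's design (none has been described here), and `shard_for` and the shard count are hypothetical.

```python
import zlib

def shard_for(key, num_shards):
    # Stable hash of the key bytes, reduced to a shard number in [0, num_shards).
    return zlib.crc32(key.encode("utf-8")) % num_shards

shard = shard_for("user:1", 4)
```

Plain modulo hashing moves most keys whenever the shard count changes, which is one reason the strategy needs more thought than the hashing itself.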

