

zBase – A high-performance, elastic, distributed key-value store - slynux
http://code.zynga.com/2013/08/zbase-a-high-performance-elastic-distributed-key-value-store/

======
noelwelsh
I don't understand what this offers that isn't offered by, say, Riak. Riak
ticks all the same big feature boxes as zBase (distributed, elastic KV-store
that persists to disk) and has the advantage that Someone Else (i.e. Basho) is
paying for development.

~~~
shin_lao
Riak is extremely slow; perhaps zBase delivers better performance?

~~~
e12e
opendomain: you're dead. Not sure why (I had a look through the comment
history, and couldn't find anything that stood out). FYI.

------
slynux
If you look at the key features, these are the attractive points about
zBase:

\- LRU based or random eviction based cache management.

\- Support for multiple disks and thereby IO parallelism.

\- Incremental backup and restore (you can pack 5x to 10x the size of RAM into
zBase and use incremental backups for node failover).

\- Incremental backup also allows blob-level restore at hourly, daily and
weekly granularities.

\- Cluster manager - zBase partitions the entire data set into virtual buckets
(vbuckets), and servers act as containers that hold them. This provides a
scalable way to increase or decrease the number of servers in a cluster.
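
To make the vbucket idea concrete, here is a minimal Python sketch, not
zBase's actual code. The vbucket count of 1024, the CRC32 hash, the
round-robin placement and the names `vbucket_for`, `assign_vbuckets` and
`rebalance` are all assumptions for illustration; the thread does not say what
zBase really uses.

```python
import zlib
from typing import Dict, List

NUM_VBUCKETS = 1024  # assumed count; not taken from zBase


def vbucket_for(key: str) -> int:
    """Hash a key to a fixed vbucket id (CRC32 is an arbitrary choice)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def assign_vbuckets(servers: List[str]) -> Dict[int, str]:
    """Initial placement: spread vbuckets over servers round-robin."""
    return {vb: servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)}


def rebalance(vbucket_map: Dict[int, str], servers: List[str]) -> Dict[int, str]:
    """Build a map for a new server list while moving as few vbuckets as possible."""
    quota = -(-NUM_VBUCKETS // len(servers))  # ceiling of the fair share
    load = {s: 0 for s in servers}
    new_map: Dict[int, str] = {}
    homeless: List[int] = []
    for vb in range(NUM_VBUCKETS):
        owner = vbucket_map.get(vb)
        if owner in load and load[owner] < quota:
            new_map[vb] = owner          # stays where it is
            load[owner] += 1
        else:
            homeless.append(vb)          # owner is gone or over quota
    for vb in homeless:
        target = min(servers, key=lambda s: load[s])
        new_map[vb] = target
        load[target] += 1
    return new_map
```

Because a key always hashes to the same vbucket, growing or shrinking the
cluster only changes which server owns some vbuckets; the key-to-vbucket
mapping itself never changes.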

------
RyanZAG
That dynamic resharding looks very nice. The big issue I see for using this as
a real datastore is the apparent lack of queries and indexes on the data.
Keeps it a lot simpler I guess, but so many workloads require the use of
queries. I guess you'd load the data into some other system for querying and
just use this for storage? Or would you use another database for storing the
data, and load it into zBase for quick access to buckets?

~~~
nieksand
It's a distributed key-value store with durability. If you're looking for
something to do ad-hoc queries against, this is not for you.

Think of it as memcache + disk persistence. (So rather than erasing things by
purging the cache when a memory slab fills, you just evict them from memory
and read them from disk if they're needed again.)
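
A toy Python model of that "memcache + disk persistence" behaviour might look
like the following. It is only a sketch: the class name `PersistentLRU` is
made up, a plain dict stands in for the disk, and writes are persisted
synchronously for simplicity, which is not necessarily how zBase does it.

```python
from collections import OrderedDict


class PersistentLRU:
    """Bounded in-memory LRU in front of a 'disk' that keeps every item."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = OrderedDict()  # hot items, least recently used first
        self.disk = {}               # stand-in for durable storage (assumption)

    def _admit(self, key, value):
        """Put an item in RAM and evict the coldest one if over capacity."""
        self.memory[key] = value
        self.memory.move_to_end(key)
        if len(self.memory) > self.capacity:
            self.memory.popitem(last=False)  # dropped from RAM only

    def set(self, key, value):
        self.disk[key] = value  # persisted, so eviction never loses data
        self._admit(key, value)

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)  # refresh recency on a hit
            return self.memory[key]
        value = self.disk[key]  # RAM miss: read back from disk (KeyError if absent)
        self._admit(key, value)
        return value
```

The difference from a plain cache is in `get`: a key that was evicted from
memory is still readable, it just costs a disk read.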

~~~
RyanZAG
I get that - but the usual implementation would be to have a set of databases
with indexes (maybe mysql or mongodb) where you could store all the data and
run ad-hoc queries against. You'd then put memcache in front of that for fast
access to repeated queries where you already know which data you want. If the
data isn't in the memcache, it would fall through to the underlying DB that is
already on disk.

zBase would have its own full copy of the data already on distributed disks,
so it wouldn't need to fall through to some other database. That seems to be
the entire point there - but surely you'd still need to store the data in some
place you could run ad-hoc queries on it? That means that the data is
duplicated in two places that would need to be kept in sync. If a
transaction fails on one of the data stores, don't you have inconsistent data
now?

~~~
slynux
Currently zBase does not have any indexing capabilities. But the design makes
it possible to use the incremental replication protocol to build indexing
outside of zBase.

zBase is used as a highly available key-value store for writes and reads. It
offers a few fancy operations like get-lock as well.
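
As a rough illustration of indexing outside the store, the Python sketch below
applies a stream of mutations to a secondary index. The event shape
(`op`, `key`, `doc`) and the class `SecondaryIndex` are hypothetical; they are
not zBase's replication protocol, which is not described in this thread.

```python
from collections import defaultdict


class SecondaryIndex:
    """Maps one field's value to the set of keys whose document has it."""

    def __init__(self, field):
        self.field = field
        self.by_value = defaultdict(set)
        self.current = {}  # key -> value currently indexed, to undo old entries

    def apply(self, op, key, doc=None):
        """Apply one mutation; op is 'set' or 'delete' (assumed event types)."""
        if key in self.current:
            old = self.current.pop(key)
            self.by_value[old].discard(key)
            if not self.by_value[old]:
                del self.by_value[old]
        if op == "set" and doc is not None and self.field in doc:
            value = doc[self.field]
            self.by_value[value].add(key)
            self.current[key] = value

    def lookup(self, value):
        """Return a copy of the keys whose indexed field equals value."""
        return set(self.by_value.get(value, ()))


# Usage: replay mutations in the order the store emitted them.
index = SecondaryIndex("level")
index.apply("set", "user:1", {"level": 3})
index.apply("set", "user:2", {"level": 3})
index.apply("set", "user:1", {"level": 4})
index.apply("delete", "user:2")
```

The index is only as fresh as the stream feeding it, so lookups can lag behind
the store.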

------
danmaz74
Anybody care to compare this to Redis?

EDIT: To clarify, I know Redis, I'm interested in learning how this differs
beyond its distributed nature.

~~~
hbbio
Redis clustering and elasticity are still a work in progress. That said, when
sharded or for single nodes, Redis is an outstanding tool, and their roadmap
to distribution is very promising:
[http://redis.io/topics/cluster-spec](http://redis.io/topics/cluster-spec)

Note also that Redis Sentinel
[http://redis.io/topics/sentinel](http://redis.io/topics/sentinel) provides
high availability.

~~~
danmaz74
Thanks hbbio - I was mostly interested in learning the differences from the
point of view of zBase, as I'm a user of redis.

~~~
hbbio
Sorry for not answering earlier. I don't know zBase either, just Redis...

~~~
danmaz74
No problem hbbio - I appreciate it :)

------
maniktaneja
Note that Zynga's workload is typically very write heavy and zBase has been
designed to support just that. In fact, it's one of the largest NoSQL database
deployments, with over 6000 nodes in production.

~~~
ddorian43
Are all of them in one cluster?

~~~
slynux
No. Many smaller clusters.

------
continuations
This sounds somewhat like RethinkDB, although I don't believe RethinkDB has
dynamic resharding.

Other than the dynamic resharding part, how do zBase and RethinkDB compare to
each other?

------
lowglow
Hey look at all that hard work those people that got fired put in! Yay Zynga!

------
JeremyMorgan
Looks neat and all, but I have a hard time getting behind anything Zynga is
doing...

