

Keyspace - a distributed key value store - joubert
http://scalien.com/keyspace/

======
stephenjudkins
Although this could technically be called a "distributed" key-value store, I
think based on many people's assumptions of what "distributed" means, they
might misunderstand what this offers. According to the FAQ, "Keyspace is a
replicated system, meaning all nodes store the same data", so it's distributed
in the same way as MySQL is a distributed SQL database. That is, reads scale
horizontally but writes do not.

That said, this looks like an interesting project. It appears to use
Paxos (<http://en.wikipedia.org/wiki/Paxos_algorithm>) to achieve high
availability, which is a good deal more sophisticated than naive replication
like MySQL's. Also, reads can be specified as "dirty" or not; dirty reads are
more performant and tolerant of failure or network partition, but also
possibly inconsistent. It appears to largely sacrifice the "partition-
tolerant" part of the CAP triangle.

Also interesting are ranged get, atomic increment, and compare-and-swap
operations.
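
For anyone unfamiliar with those three operations, here is a minimal single-process sketch of their semantics. The class and method names (ToyStore, range_get, increment, compare_and_swap) are hypothetical and are not taken from Keyspace's client API; a lock stands in for whatever the real system does to make the operations atomic.

```python
import threading
from bisect import bisect_left


class ToyStore:
    """In-memory stand-in for a key-value store; NOT Keyspace's real API."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def range_get(self, start, count):
        """Return up to `count` (key, value) pairs with key >= start, in key order."""
        with self._lock:
            keys = sorted(self._data)
            i = bisect_left(keys, start)
            return [(k, self._data[k]) for k in keys[i:i + count]]

    def increment(self, key, delta=1):
        """Atomically add `delta` to an integer value; a missing key counts as 0."""
        with self._lock:
            new = int(self._data.get(key, 0)) + delta
            self._data[key] = new
            return new

    def compare_and_swap(self, key, expected, new):
        """Write `new` only if the current value equals `expected`."""
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True
```

The point of increment and compare-and-swap is that the read and the write happen as one step, so two clients cannot both read the old value and overwrite each other.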

------
mrg
To get critical mass, a new project has to present a plug-n'-play install.
First impressions are everything. That is not the case here. I've read the
docs and an initial install looks to need several different
languages/compilers to get going. And then some.

Even Tokyo Cabinet/TT are more aligned and straightforward than this project.
It will be interesting to see what happens here. But please, oh please, you
Scaliens, give some more realistic and complete benchmarks to whet one's
appetite.

------
fsiefken
How does this compare to, say, Cassandra, which is also a distributed key-value
store?

~~~
stephenjudkins
Keyspace is replicated; when consistent, every node has a complete copy of the
data. Cassandra distributes the keyspace across many nodes; in a reasonably-
sized cluster a node will have only a fraction of the data. A write-heavy
workload will scale horizontally with Cassandra but will probably be as slow
as the slowest server in the cluster with Keyspace.
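
A rough sketch of the difference, in terms of which nodes a single write has to land on. The function names are made up for illustration, and the hash-modulo placement is a simplification I am assuming here; it is not how Cassandra actually assigns keys to nodes.

```python
import hashlib


def replicated_targets(key, nodes):
    """Full replication: every node ends up storing every key."""
    return list(nodes)


def partitioned_targets(key, nodes, replicas=1):
    """Partitioning: a key lives on `replicas` nodes picked by hashing the key.

    Simplified modulo placement, used only to illustrate the idea.
    """
    h = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    start = h % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


nodes = ["n1", "n2", "n3", "n4", "n5", "n6"]
# Replicated: the per-write fan-out grows with the cluster.
print(len(replicated_targets("user:42", nodes)))
# Partitioned: the per-write fan-out stays at `replicas` however many nodes exist.
print(len(partitioned_targets("user:42", nodes, replicas=2)))
```

Adding nodes to the partitioned layout spreads writes over more machines; adding nodes to the replicated layout adds more copies of every write.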

Cassandra is tolerant of partitions in the cluster; if communication between
two parts of a cluster is cut off, read and write operations that don't
demand consistency will not fail. Keyspace has "dirty" reads that will
continue to return possibly inconsistent data when this occurs, but all write
operations require a majority of nodes to be up and in communication with one
another.
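
To make the majority rule concrete, here is a small sketch of what each side of a partition could still do. The function names are hypothetical, and treating non-dirty ("safe") reads as needing the same majority as writes is my assumption from the description above, not something confirmed here.

```python
def has_majority(reachable, cluster_size):
    """True if `reachable` nodes (counting this one) form a strict majority."""
    return reachable > cluster_size // 2


def allowed_operations(reachable, cluster_size):
    """Operations a node could serve given how many nodes it can reach."""
    if has_majority(reachable, cluster_size):
        # Assumption: consistent reads need the same majority as writes.
        return {"write", "safe_read", "dirty_read"}
    # Minority side: only local, possibly stale reads remain.
    return {"dirty_read"}


# A 5-node cluster split 3 / 2: only the 3-node side keeps accepting writes.
print(sorted(allowed_operations(3, 5)))
print(sorted(allowed_operations(2, 5)))
```

Note that an even split, say 2 / 2 in a 4-node cluster, leaves neither side with a majority, so under this rule neither side could write.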

Cassandra's data model is more sophisticated than Keyspace's. You can have
multiple column families (essentially, namespaces) in Cassandra that can be
ordered arbitrarily, given a plugin to compare two keys. Supercolumns are
(this is a very simplified explanation) columns with tuples (key1, key2) as
keys. Keyspace is a simple key/value store.
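
As a toy picture of the "tuples as keys" idea, using plain Python dicts rather than either system's real API (the sample keys and values are invented for illustration):

```python
from collections import defaultdict


def nest(tuple_keyed):
    """Group values keyed by (key1, key2) into a two-level dict."""
    nested = defaultdict(dict)
    for (key1, key2), value in tuple_keyed.items():
        nested[key1][key2] = value
    return dict(nested)


# Flat key/value: one opaque key maps to one value.
flat = {"user:42:city": "Budapest"}

# Supercolumn-like view: columns addressed by a (key1, key2) tuple.
columns = {
    ("address", "city"): "Budapest",
    ("address", "zip"): "1051",
    ("phone", "home"): "555-0100",
}
print(nest(columns))
```

With a flat store you can fake the second level by packing both parts into the key string, which is where ordered keys and ranged gets become useful.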

Both Cassandra and Keyspace support ranged gets; Cassandra supports multi-gets
of a list of arbitrary keys as well. Keyspace supports increment and compare-
and-swap operations that Cassandra does not.

Cassandra is used in production by a good number of places; I don't know who,
if anyone, is using Keyspace.

All my knowledge of Keyspace comes from reading the thorough and clear
whitepaper they offer at <http://scalien.com/pdf/Keyspace.pdf>.

------
sumeeta
Ha. Scalien is a great name.

How does a startup that develops database software plan to make money? Selling
support?

