
[ First off: I'm a committer on Project Voldemort, a Dynamo-style distributed data store ]

First, Riak is excellent. I have only positive things to say about it and about the folks who work on it.

Re: "store for unimportant data". I'll go beyond that. Not only should new databases be suitable for reliable storage, new databases should do things that existing databases can't. I am a bit sad that NoSQL has come to mean "replacement for an improperly tuned, ad-hoc sharded MySQL setup". To be clear, having a simple setup that provides partitioning, replication and defaults better tuned to modern hardware is a fine goal -- but why not do better? If I wanted something better than MySQL, I'd use Postgres (or properly tune my MySQL installation).

For example, Dynamo-style stores allow any replica to initiate a write (something not possible with primary copy replication), which enables highly available applications. Some systems (Voldemort, riak-core, HBase with co-processors) also allow custom code to run on the server, significantly extending the capability of a system in ways that a stored procedure can't.
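
To make the "any replica can initiate a write" point concrete, here is a minimal toy sketch. `Replica` and `quorum_write` are hypothetical names made up for illustration, not Voldemort's or Riak's API, and real systems add versioning, hinted handoff and repair on top of this.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy in-memory replica (hypothetical, for illustration only)."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)

    def store(self, key, value):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value


def quorum_write(coordinator, replicas, key, value, w):
    """Any live replica may coordinate; succeed once w replicas acknowledge."""
    if not coordinator.up:
        raise ConnectionError("coordinator is down; pick another replica")
    acks = 0
    for replica in replicas:
        try:
            replica.store(key, value)
            acks += 1
        except ConnectionError:
            pass  # a real system would hand off or repair later
    return acks >= w


a, b, c = Replica("a"), Replica("b"), Replica("c")
a.up = False  # with primary copy replication, losing the primary blocks writes
ok = quorum_write(b, [a, b, c], "k", "v", w=2)  # b coordinates instead
```

Here the write still succeeds with two of three replicas up, because no single node is special.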

It's also sad to see NoSQL-style systems repeat many mistakes that MySQL has made. MySQL in the late 90s with MyISAM is a completely different beast from MySQL today with InnoDB: far better concurrency, durability, referential integrity, better replication. BerkeleyDB JE is also a powerful beast: log-structured storage (this is why we're using it as the default storage engine in Voldemort), Paxos-based leader elections with tunable replication.

Schema-less data (or, as in Voldemort, evolvable schemas) is also a huge feature, but it's not impossible to replicate it on top of MySQL (e.g., Friendfeed's data model).
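
A rough sketch of that idea, as I understand the Friendfeed approach: entities are stored as opaque serialized blobs, and each queryable field gets its own index table. The table names, the `put`/`find_by_user` helpers, and the use of SQLite and JSON (to keep the example self-contained) are my assumptions, not their actual schema.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# One generic table holds every entity as an opaque blob.
conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
# Each queryable field gets a separate index table.
conn.execute(
    "CREATE TABLE index_user_id ("
    "user_id TEXT NOT NULL, entity_id TEXT NOT NULL, "
    "PRIMARY KEY (user_id, entity_id))"
)


def put(entity):
    """Store the entity blob, then update the index table for user_id."""
    entity_id = entity.setdefault("id", uuid.uuid4().hex)
    conn.execute(
        "INSERT OR REPLACE INTO entities VALUES (?, ?)",
        (entity_id, json.dumps(entity)),
    )
    if "user_id" in entity:
        conn.execute(
            "INSERT OR REPLACE INTO index_user_id VALUES (?, ?)",
            (entity["user_id"], entity_id),
        )
    return entity_id


def find_by_user(user_id):
    """Look up entity ids in the index, then fetch and decode the blobs."""
    rows = conn.execute(
        "SELECT e.body FROM index_user_id i "
        "JOIN entities e ON e.id = i.entity_id WHERE i.user_id = ?",
        (user_id,),
    )
    return [json.loads(body) for (body,) in rows]


put({"id": "1", "user_id": "alice", "title": "hello"})
# A new field needs no ALTER TABLE; it just appears in the blob.
put({"id": "2", "user_id": "alice", "title": "again", "tags": ["new-field"]})
```

Note that this toy version never removes stale index rows when `user_id` changes; a real implementation has to deal with that.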

Here are some things that I'd really like to see evolve in the NoSQL space:

* Support for new and interesting distribution models, allowing users to choose between eventual consistency, quorum protocols, primary copy replication and even transactional replication.

* Support for large, unstructured blob data. Riak is going the right way with Luwak. I believe Facebook has been using HBase as a front-end for Haystack; HBase would also make a great choice for Haystack's metadata store.

* Transactions across more than one value. Most NoSQL systems support transactions within the scope of a single value (or document) via the use of quorums, serializing through a single master, etc. However, it'd be nice if something like MegaStore's Entity Groups (or Tablet Groups in Microsoft Azure Cloud SQL server) were supported.

* Secondary indices, whether internal or external (by shipping a changelog) to the system.

* True multi-datacenter support (local quorums if desired, async replication to the remote site) including across unreliable, high latency WAN links (disclosure: Voldemort supports this -- https://github.com/voldemort/voldemort/wiki/Multi-datacenter... )
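
On the secondary-index point above, the "external" variant can be sketched as a consumer that replays a shipped changelog into an index. The `(key, old_value, new_value)` entry format and the `apply_changelog` helper are assumptions for illustration; no particular store's changelog format is implied.

```python
from collections import defaultdict


def apply_changelog(index, changelog, field):
    """Replay (key, old_value, new_value) entries into an index on `field`."""
    for key, old, new in changelog:
        if old is not None and field in old:
            index[old[field]].discard(key)  # drop the stale index entry
        if new is not None and field in new:
            index[new[field]].add(key)  # add the current index entry
    return index


changelog = [
    ("u1", None, {"city": "Oslo"}),              # insert
    ("u2", None, {"city": "Oslo"}),              # insert
    ("u1", {"city": "Oslo"}, {"city": "Lima"}),  # update
    ("u2", {"city": "Oslo"}, None),              # delete
]
by_city = apply_changelog(defaultdict(set), changelog, "city")
```

Because the index is built outside the store, it lags behind writes by however long the changelog takes to ship, which is the usual trade-off of this design.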
