HyperDex, a consistent, fault-tolerant key-value store, hits version 1.0.rc1 (groups.google.com)
55 points by rescrv on Jan 16, 2013 | 20 comments


Looks interesting. From reading http://hyperdex.org (the tutorial link is not useful; find the QuickStart in the regular documentation) and considering it against redis for my current project…

• In a single server configuration, HyperDex is marginally faster than redis at GET and PUT.

• SEARCH looks like it can be much faster on HyperDex. I don't search, but I do keep some ancillary mappings from secondary attributes to keys; I could drop those with a good SEARCH.

• redis has a richer set of data types and operations; in particular, I would miss the server-side atomic EXEC of Lua code¹.

• HyperDex uses a schema as opposed to the free form of redis. That's ok for me. I like to make rules about my data.

• If you plan to go beyond "in RAM" sizes, redis won't work.

• Should I be so lucky as to need multiple servers, it feels like HyperDex would scale better, but it is early days for redis clustering so that would need to be assessed at that time.

• I'm not seeing wire protocol documentation for HyperDex, I think because there must be logic on the client end for hyperspace hashing. A documented wire protocol is a beautiful thing if you live in a strange world of insane performance goals achieved with asynchronous threading: you can just toss the supplied libraries and write your own. It will probably take a full port to figure out whether there are problems with the HyperDex API in my world.

So what it comes down to is, I'm not switching now, but I like what I'm seeing and will give it a spin and be ready to move if I need to.

¹ In practice, the unprotected nature of redis means code out front anyway, and I could move the logic out there. But that is nonblocking C code for me; it is a lot easier to write the logic in Lua, which gets atomicity for free, and probably faster too, since it doesn't have to shuttle data back and forth to my process.
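For illustration, here is the kind of server-side atomic Lua the footnote is talking about, sketched with the redis-py client; the key names and the move-between-index-sets logic are hypothetical, not anything from HyperDex or the redis docs:

    import redis

    r = redis.Redis(host="localhost", port=6379)

    # Hypothetical script: atomically move a member between two index
    # sets. Redis executes the whole script atomically, so no client
    # ever observes the member in neither (or both) sets.
    MOVE_MEMBER = """
    local removed = redis.call('SREM', KEYS[1], ARGV[1])
    if removed == 1 then
        redis.call('SADD', KEYS[2], ARGV[1])
    end
    return removed
    """

    move_member = r.register_script(MOVE_MEMBER)
    moved = move_member(keys=["index:old", "index:new"], args=["user:42"])
    print("moved" if moved else "not a member")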


Thanks for your time. I'm the lead dev on HyperDex, so I'd be happy to address your points.

- A SEARCH is simply a mapping from secondary attributes to objects; it was designed for exactly your use case (see the sketch after this list).

- We're still adding server-side atomic operations. In the future we'll definitely have richer types and operations.

- Server-side exec of Lua code is definitely an interesting feature.

- We've put a lot of effort into making it easy to manage a HyperDex cluster without sacrificing scalability or fault tolerance. When you do your assessment comparing the two systems, please let us know the outcome, because we'll definitely correct any deficiencies.

- The client does indeed maintain a mapping for hashing. Our design is pretty asynchronous already (from the C API), so it'd be a good idea to optimize that further. Of course, we'll eventually document the wire protocol for those who want to go further.
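To make the SEARCH point concrete, here is a minimal sketch in the quick-start style of the Python bindings; the space name, attributes, and coordinator address are placeholders, not a definitive reference:

    import hyperdex.client

    # Coordinator address and port are placeholders.
    c = hyperdex.client.Client("127.0.0.1", 1982)

    # Store an object under its key along with secondary attributes.
    c.put("phonebook", "jsmith1", {"first": "John", "last": "Smith"})

    # Query by a secondary attribute instead of the key; this is the
    # mapping you would otherwise maintain by hand.
    for obj in c.search("phonebook", {"last": "Smith"}):
        print(obj)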

Feel free to contact me if you have questions or need a hand.


@rescrv I remember seeing some of your earlier posts about HyperDex. It continues to be an interesting project to follow.

I had several questions that I didn't see covered in the docs. Would love to see them answered somewhere:

1. If you have more than f faults and the cluster shuts down, what are the ramifications? How do you start it back up and recover? What if one or more of the faulted nodes are unrecoverable? Can you easily recover the partial data?

2. Can you perform OR logic in searches? E.g., where an attribute is 'a' OR 'b'?

3. Where is the new count method documented?

4. Is it possible to add new attributes to an existing space? Add new subspaces? Remove subspaces? What is the storage/memory cost associated with subspaces?

5. Are there plans for grouping and aggregation of any sort? I've been searching for a while for a good key/value or document store that can combine very expressive searching with grouping and aggregation, to provide an analytic platform for documents that don't fit well in a traditional SQL store due to structural complexity (nested lists, etc.) or dynamic addition of attributes.


1. In the current release it's possible to completely shut down the cluster. You cannot just cut power, but it's relatively painless to do otherwise. Check out http://hyperdex.org/doc/06.faults/#shutting-down-and-restori... for more details.

2. Currently you cannot perform OR logic in searches. It'd be relatively easy to add, though, so we'll probably do so in the future. In the meantime, it's possible to do two searches concurrently and combine the results (see the sketch after this list).

3. This is an oversight on our part. We have to document it. In the Python bindings, you call it the same way as a regular search, but instead of getting a generator that yields objects, you simply get a number.

4. Currently, things are pretty static. We've got ideas for changing that, but just haven't pursued it yet. Each subspace replicates the data again. The memory cost of a subspace is nearly nothing (just a list of replicas).

5. I think that one-level group-by might be achievable, similar to our sorted_search primitive. There's a lot of research that people have done on this topic, and it's very interesting. That being said, the simple/straightforward approach would probably be what we'd take.
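A sketch of the two-search workaround from point 2, again with the Python bindings; the space and attribute names are invented, I'm assuming each result dict includes the key attribute ('username' here) so overlapping matches can be deduplicated, and the searches run sequentially for simplicity (the asynchronous C API or threads could overlap them):

    import hyperdex.client

    c = hyperdex.client.Client("127.0.0.1", 1982)

    # Emulate "last == 'Smith' OR last == 'Jones'" by running both
    # searches and merging the results. Deduplicate on the key
    # attribute (assumed to be 'username') in case predicates overlap.
    matches = {}
    for predicate in ({"last": "Smith"}, {"last": "Jones"}):
        for obj in c.search("phonebook", predicate):
            matches[obj["username"]] = obj

    for obj in matches.values():
        print(obj)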

Thanks for following the project!


@rescrv It's my impression that Redis is well-optimized C code. How can HyperDex be "2-14x faster"?


First, HyperDex is well-optimized C++ code. And second, the Redis team takes issue with the standard benchmark we used to evaluate both systems. While we recognize that Redis may have an impedance mismatch with the benchmark, we were hesitant to create our own lest we inadvertently introduce bias toward our system. Thus the numbers reported are accurate, but the generic nature of the YCSB benchmark may not exploit all that Redis offers. We'd love to work with the Redis team on a standardized benchmark we both can get behind, but that has not yet happened (and may never happen).

In the grander scheme of things, the two systems are very different. In my opinion, Redis is very well suited for inter-process communication where extremely low latency is necessary and data is ephemeral. HyperDex, on the other hand, is great at storing persistent data. The configurable fault tolerance setting makes it very clear how safe your data is, and the strong consistency is very convenient when building applications. Further, HyperDex is built from the ground up to operate in a clustered environment and does so quite well.


I see... I googled it, and it seems like the comparison is multi-threaded (multi-process? multi-machine?) vs. single-threaded, single-process. It doesn't really seem like a fair comparison, and since you guys are obviously knowledgeable, I bet you understand as much.

Please don't let what seems like a cool project be tainted by marketing bullshit. (I really, really hate marketing bullshit :)


I bet you found the thread on their mailing list. I was the one who ran the benchmark and it was one core vs. one core. In the other benchmarks we had multiple machines, but not the Redis benchmark. We clarified after they speculated that a multi/single comparison was performed.

The main issue they had was YCSB's method of storing multi-attribute objects as maps (from attribute names to their values) and the fact that Redis cannot support search across keys so YCSB searches within a single set.


Any plans for other language bindings? I'm a C# dev; I do know C/C++ and Python, but I don't like interop. I think this is a crucial way to gain adoption. I'd be happy to write a C# binding, but I'd need the wire protocol documented.

HyperDex reminds me of RethinkDB (http://www.rethinkdb.com/). They seem to have similar goals: distributed servers and fault tolerance.


HyperDex 1.0rc1 shows a lot of promise, but calling it a release candidate is premature:

- the unit tests use an old API and schema -- they flat out don't run

- the disk storage primitive was recently replaced, in its entirety

- the network protocols are undocumented

- the schema is undocumented

Most of these items are fine in a project in development, but with the current state, I wouldn't consider it a candidate for anything but discussion as a research project.


http://hyperdex.org/extras/ makes assumptions that just aren't true these days. Who knows what the latency between two EC2 nodes is going to be, even within the same AZ? Is it not possible to run a cluster that isn't on a single LAN?


EC2 is just one platform. Although many people point to EC2 as the gold standard, it's worth noting that Google and Facebook both avoid virtualized servers. On the other hand, EC2 is about leveraging spare capacity and intentionally oversubscribing machines.


Yes, but it's a popular platform that doesn't (at first blush) appear to meet the assumptions on the page I posted. My question still stands: do I need to have all my machines in the same datacenter?


You don't need all machines in the same data center. Latency between machines does affect performance, though, so it is faster in a single data center.


Good questions. I have no answers but only this guideline…

Reality is generally more constrained than the models people make for theorems. That can help or hurt.

Back in the days when ethernet meant unswitched 10 Mbps, and it was proven that you couldn't get more than about 1 Mbps utilization and should probably go with token ring or one of the better-thought-out networks… my name was on a paper showing 9 Mbps utilization in our real application without doing anything special. Except we were using real-world hardware that was not even close to capable of sending back-to-back packets, so our data streams were naturally interleaving and collisions went to near zero.


That's a case of reality being less constrained than the theory, assuming the theory actually proved a maximum utilization of ~1 Mbps; perhaps they were assuming all nodes were saturating the line?


It's been a while. The best I recall, the model the theory used assumed a random distribution of legal interpacket timings and didn't account for the rather significant minimum interpacket gap imposed by the hardware.

It was also significant here because people forgot all the preconditions of the original paper and just applied the sound bite synopsis.


From the docs, it's interesting to see they are using LevelDB as the storage engine. Nice to see a server version of LevelDB. Also excited to see a first-class Python interface.

Does search performance depend on the fact that LevelDB keys are sorted?


We love Python and it makes for easy-to-read code snippets! Internally we do make use of the fact that LevelDB sorts objects by key.
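As an illustration of what sorted keys buy you (this is a standalone sketch with the plyvel LevelDB binding, not HyperDex's internal code; the key scheme is invented): because keys are stored in order, scanning every entry sharing a key prefix is a single sequential iteration rather than a full-table filter.

    import plyvel

    db = plyvel.DB("/tmp/example-db", create_if_missing=True)

    # Invented key scheme: encode a secondary attribute into the key
    # so entries sharing that attribute are physically adjacent.
    db.put(b"last:Smith:jsmith1", b"John Smith")
    db.put(b"last:Smith:asmith2", b"Anna Smith")
    db.put(b"last:Zhao:lzhao1", b"Li Zhao")

    # Sorted keys make a prefix match one contiguous range scan.
    for key, value in db.iterator(prefix=b"last:Smith:"):
        print(key, value)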


There are a bunch of Node examples using LevelDB as well. Quite a nice API, and an easy one to write bindings for.



