

HyperDex, a consistent, fault-tolerant key-value store, hits version 1.0.rc1 - rescrv
https://groups.google.com/d/topic/hyperdex-discuss/_ZXWENZTFXs/discussion

======
jws
Looks interesting. From reading <http://hyperdex.org> (tutorial link is not
useful, find the QuickStart in the regular documentation) and considering it
against redis for my current project…

• In a single server configuration, HyperDex is marginally faster than redis
at GET and PUT.

• SEARCH looks like it can be much faster on HyperDex. I don't search, but I
do keep some ancillary mappings from secondary attributes to keys; I could
lose those with a good SEARCH.

• redis has a richer set of data types and operations, in particular I would
miss the server side atomic EXEC of Lua code¹.

• HyperDex uses a schema as opposed to the free form of redis. That's ok for
me. I like to make rules about my data.

• If you plan to go beyond "in RAM" sizes, redis won't work.

• Should I be so lucky as to need multiple servers, it feels like HyperDex
would scale better, but it is early days for redis clustering so that would
need to be assessed at that time.

• I'm not seeing any wire protocol documentation for HyperDex, I think because
there must be logic in the client end for hyperspace hashing. The wire
protocol is a beautiful thing if you live in a strange world of insane
performance goals achieved with asynchronous threading. You can just toss the
supplied libraries and write your own. It will probably take a full port to
figure out if there are problems with the HyperDex API in my world.

So what it comes down to is, I'm not switching now, but I like what I'm seeing
and will give it a spin and be ready to move if I need to.


¹ In practice, the unprotected nature of redis means code out front anyway and
I can move the logic out there, but that is nonblocking C code for me and it
is a lot easier to write the logic in Lua that gets atomicity for free, plus
probably faster since it doesn't have to shuttle data back and forth to my
process.
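
To make the "ancillary mappings" point concrete, here is a minimal sketch of a hand-maintained secondary index over a plain key-value store. It uses in-memory Python dicts only; the names `store`, `index`, `INDEXED`, `put`, and `find_by` are hypothetical and are not Redis or HyperDex APIs.

```python
from collections import defaultdict

# Primary store: key -> object (a dict of attributes).
store = {}
# Hand-maintained secondary index: (attribute, value) -> set of keys.
index = defaultdict(set)

INDEXED = ("city",)  # attributes chosen for indexing (illustrative only)


def put(key, obj):
    """Write an object and keep the secondary index in step with it."""
    old = store.get(key)
    if old is not None:
        # Drop stale index entries before overwriting the object.
        for attr in INDEXED:
            if attr in old:
                index[(attr, old[attr])].discard(key)
    store[key] = dict(obj)
    for attr in INDEXED:
        if attr in obj:
            index[(attr, obj[attr])].add(key)


def find_by(attr, value):
    """Look up keys by a secondary attribute via the index."""
    return sorted(index.get((attr, value), set()))


put("u1", {"name": "ann", "city": "Ithaca"})
put("u2", {"name": "bob", "city": "Ithaca"})
put("u1", {"name": "ann", "city": "Boston"})  # the update moves u1 in the index
```

A server-side search over secondary attributes would make the `index` bookkeeping in `put` unnecessary, which is the saving described above.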

~~~
rescrv
Thanks for your time. I'm the lead dev on HyperDex, so I'd be happy to address
your points.

\- A SEARCH is simply a mapping from secondary attributes to objects and it
was designed for your use case.

\- We're still adding server-side atomic operations. In the future we'll
definitely have richer types and operations.

\- Server-side exec of Lua code is definitely an interesting feature.

\- We've put a lot of effort into making it easy to manage a HyperDex cluster
without sacrificing on the scalability or fault tolerance. When you do your
assessment comparing the two systems, please let us know the outcome because
we'll definitely correct any deficiencies.

\- The client does indeed maintain a mapping for hashing. Our design is pretty
asynchronous already (from the C API), so it'd be a good idea to optimize that
further. Of course, we'll eventually document the wire protocol for those who
want to go further.

Feel free to contact me if you have questions or need a hand.

~~~
DEinspanjer
@rescrv I remember seeing some of your earlier posts about HyperDex. It
continues to be an interesting project to follow.

I had several questions that I didn't see covered in the docs. Would love to
see them answered somewhere:

1\. If you have more than f faults and the cluster shuts down, what are the
ramifications? How do you start it back up and recover? What if one or more of
the faulted nodes are unrecoverable? Can you easily recover the partial data?

2\. Can you perform OR logic in searches? i.e. Where attribute is 'a' OR 'b'?

3\. Where is the new count method documented?

4\. Is it possible to add new attributes to an existing space? Add new
subspaces? Remove subspaces? What is the storage/memory cost associated with
subspaces?

5\. Are there plans for grouping and aggregation of any sort? I've been
searching for a while to find a good key/value or document store system that
can provide very expressive searching in combination with grouping and
aggregation to provide an analytic platform for storing documents that don't
fit very well in a traditional SQL store due to complexity of the structure
(nested lists, etc) or dynamic addition of attributes.

~~~
rescrv
1\. In the current release it's possible to completely shut down the cluster.
You cannot just cut power, but it's relatively painless to do otherwise.
Check out <http://hyperdex.org/doc/06.faults/#shutting-down-and-restoring-a-cluster>
for more details.

2\. Currently you cannot perform OR logic in searches. It'd be relatively easy
to add though, so we'll probably do so in the future. It's possible to do two
searches concurrently and combine the results.

3\. This is an oversight on our part. We have to document it. In the Python
bindings, you call it the same way as a regular search, but instead of getting
a generator that yields objects, you simply get a number.

4\. Currently, things are pretty static. We've got ideas for changing that,
but just haven't pursued it yet. Each subspace replicates the data again. The
memory cost of a subspace is nearly nothing (just a list of replicas).

5\. I think that one-level group-by might be achievable, similar to our
sorted_search primitive. There's a lot of research that people have done on
this topic, and it's very interesting. That being said, the
simple/straightforward approach would probably be what we'd take.

Thanks for following the project!
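
The workaround in point 2 (two searches, combined) and the call shape in point 3 (count returns a number rather than a generator) can be sketched as follows. This is a self-contained illustration over in-memory data, not the HyperDex client: `OBJECTS`, `search`, `count`, and `search_any` are hypothetical stand-ins, and the searches run one after another here rather than concurrently.

```python
# Stand-in data; a real deployment would query the cluster instead.
OBJECTS = [
    {"key": "k1", "color": "a"},
    {"key": "k2", "color": "b"},
    {"key": "k3", "color": "c"},
]


def search(predicate):
    """Hypothetical equality search: yields objects matching every attribute."""
    for obj in OBJECTS:
        if all(obj.get(attr) == value for attr, value in predicate.items()):
            yield obj


def count(predicate):
    """Same call shape as search, but returns a number, not a generator."""
    return sum(1 for _ in search(predicate))


def search_any(attr, values):
    """Emulate attr == v1 OR attr == v2 ... by merging one search per value."""
    seen = set()
    merged = []
    for value in values:
        for obj in search({attr: value}):
            if obj["key"] not in seen:  # de-duplicate across the searches
                seen.add(obj["key"])
                merged.append(obj)
    return merged


matches = search_any("color", ["a", "b"])
```

De-duplicating by key matters once the OR spans more than one attribute, since a single object could then satisfy several of the individual searches.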

------
swdunlop
HyperDex 1.0rc1 shows a lot of promise but calling it a release candidate is
premature:

\- the unit tests use an old API and schema -- they flat out don't run

\- the disk storage primitive was recently replaced, in its entirety

\- the network protocols are undocumented

\- the schema is undocumented

Most of these items are fine in a project in development, but with the current
state, I wouldn't consider it a candidate for anything but discussion as a
research project.

------
flipchart
Any plans for other language bindings? I'm a C# dev, although I do know C/C++
and Python, but don't like interop. I think that this is a crucial way to gain
adoption. I'd be happy to write a C# binding, but I'd need a wire protocol.

HyperDex reminds me of RethinkDB (<http://www.rethinkdb.com/>). Seems like
similar goals with having distributed servers and fault tolerance.

------
aidenn0
<http://hyperdex.org/extras/> makes assumptions that just aren't true these
days. Who knows what the latency between 2 EC2 nodes is going to be, even
when in the same AZ? Is it not possible to run a cluster that isn't on a
single LAN?

~~~
rescrv
EC2 is just one platform. Although many people point to EC2 as the gold
standard, it's worth noting that Google and Facebook both avoid virtualized
servers. On the other hand, EC2 is about leveraging spare capacity and
intentionally oversubscribing machines.

~~~
aidenn0
Yes, but it's a popular platform that doesn't (at first blush) appear to meet
the assumptions on the page I posted. My question still stands: "Do I need
to have all my machines in the same datacenter?"

~~~
rescrv
You don't need all machines in the same data center. Latency between machines
does have effects on performance, so it is faster in a single data center.

------
SriniK
From the docs, interesting to see they are using LevelDB as the storage engine.
Nice to see a server version of LevelDB. Also excited to see a first-class
Python interface.

Does search performance depend on the fact that LevelDB keys are also sorted?

~~~
rescrv
We love Python and it makes for easy-to-read code snippets! Internally we do
make use of the fact that LevelDB sorts objects by key.
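
As a general illustration of why sorted keys can help range-style lookups (this is not a description of HyperDex's internals, which the thread does not detail), here is a minimal sketch using Python's standard `bisect` module. The `keys` list and `range_scan` helper are hypothetical.

```python
import bisect

# Keys kept in sorted order, as a sorted key-value store would keep them.
keys = sorted(["apple", "banana", "cherry", "grape", "melon"])


def range_scan(lo, hi):
    """Return keys k with lo <= k < hi using two binary searches."""
    start = bisect.bisect_left(keys, lo)
    end = bisect.bisect_left(keys, hi)
    return keys[start:end]


result = range_scan("b", "h")
```

With sorted keys, locating both ends of the range takes a logarithmic number of comparisons, and the matching keys are contiguous, so no full scan is needed.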

