

HyperDex: A searchable distributed key-value store - cemerick
http://hyperdex.org/

======
latch
FWIW, I find your website suffers from the same problem as Riak's, which is to
say, it is entirely too focused on the product rather than on how _I_ use the
product. It's like it's written for people who'd be interested in writing
their own data store, rather than for someone, like most of us, who just wants
to CRUD in some shape or another.

I feel like I'm reading a white paper.

Redis' website is the exact opposite. Similarly, MongoDB's website, while a
far cry from Redis', is still much, much better than Riak's.

~~~
egs
Thanks, this is a valid point.

Do take a look at our basic tutorial: <http://hyperdex.org/doc/tutorial/>

Then the more advanced one showing asynchronous operations, atomic operations,
and fault tolerance: <http://hyperdex.org/doc/tutorial-advanced/>

And finally the rich DB interface that supports lists, sets, and dictionaries,
each with support for asynchronous and atomic operations:
<http://hyperdex.org/doc/tutorial-datastructure/>

to see which, if any, of the features and API might be a good fit for your
specific needs.

------
rdtsc
Was trying to figure out where it lies in the CAP trade-off space. Found this
set of slides

<http://hyperdex.org/slides/2012-04-13-HyperDex.pdf> :

---

* Consistent: linearizable. GET returns the latest PUT, always

* Available: in the presence of < f failures

* Partition-Tolerant: for partitions with < f nodes

---

So it seems they have strong C and a tunable trade-off between A and P?

Then they take a jab at the popular interpretation of the CAP theorem and
claim they have a work-around:

---

    Working around the [CAP] theorem:
      * Constrain the failure size
      * Redirect clients to majority partition
      * Profit: Retain all of C, A, P
    Realistic for a modern data center

    CAP misses the point. The real tradeoff is between C, A, and Performance.

---

Not sure what they meant here. Anyone understood this better?

~~~
rescrv
CAP, as stated, is a tautology. If you want to understand this better, please
read our description of the CAP theorem (<http://hyperdex.org/extras/>). As
formulated by Gilbert and Lynch, the CAP theorem really says, "If you must
always read the latest value, and you must have any server serve any request,
and clients are collocated with servers, then you cannot survive a partition."
This is in the same class as, "If you separate all clients from servers, the
clients cannot contact the servers."

If, instead, you make assumptions about what kinds of failures are actually
possible, then you can do more. HyperDex assumes that fewer than f nodes fail
at any given point in time. A failure could be the process crashing, the
machine being turned off, or a partition occurring within the datacenter.
So long as this assumption is true, then HyperDex is able to survive the
partition, make progress (remain available), and provide strong consistency
while doing so.
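
The "constrain the failure size" idea can be sketched in a few lines. This is
a toy model, not HyperDex code: the function name, the 2f+1 sizing, and the
redirect rule are all illustrative assumptions.

```python
# Toy model: with n >= 2f + 1 replicas, any partition that cuts off fewer
# than f nodes leaves a usable side, so clients can be redirected there and
# the system stays available while remaining consistent.

def usable_partition(replicas, partitions, f):
    """Return the side clients should be redirected to, or None when the
    failure-size assumption (fewer than f nodes cut off) is violated."""
    n = len(replicas)
    assert n >= 2 * f + 1, "need at least 2f+1 replicas"
    for side in partitions:
        if n - len(side) < f:   # fewer than f nodes unreachable from here
            return side
    return None

replicas = ["r1", "r2", "r3", "r4", "r5"]      # n = 5, tolerating f = 2
split = [["r1", "r2", "r3", "r4"], ["r5"]]     # only 1 node cut off (< f)
print(usable_partition(replicas, split, f=2))  # the 4-node side survives
```

If the partition cuts off f or more nodes, the function returns None: the
assumption was violated and no guarantee is made, which is exactly the bridge
argument below.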

By making additional assumptions, we are able to make additional guarantees.
This is not unlike what other engineers do (warning: massive simplification).
An engineer building a bridge can assume the bridge can hold only so many cars
and that each car has an upper bound on its weight. The product of these gives
the entire weight the bridge must withstand. If the engineer builds the bridge
to withstand the expected weight, plus some extra, then the bridge will work
in all anticipated scenarios. It will not, of course, withstand bumper-to-
bumper traffic consisting of 18-wheelers filled with lead bricks. But again,
it doesn't have to.

~~~
Diederich
I was guilty, for a time, of flinging 'CAP' around. Matters are, I believe,
both simpler and far more complex.

Here's the simplest way I can imagine a scenario. This is very similar to what
my company deals with all the time.

Take two network-distant datacenters. Requests can land on either
datacenter. A request can be a PUT or a GET.

We deal with phone calls, so we have to provide an exceptionally reliable and
consistent service.

The problem: a PUT lands in datacenter A. All subsequent GETs have to be
consistent with this PUT. Even if they land in datacenter B.

The rest of the problem: datacenters have to be able to function
independently. This means that we can't block a PUT while it syncs to the
other datacenter. The other datacenter might be down, or there might be some
network delay, or whatever.

In my mind, it's simply impossible to have complete consistency and complete
reliability when the redundant pieces are WAN-connected.(1) One can only
approach and approximate this, with more and more hardware, circuits and
development complexity.

Is my assertion incorrect? I'm unclear if HyperDex helps me with this problem.

By the way, I'm pretty excited about this product overall. If I wasn't in
catch-up mode at work, I'd be hammering out a proof of concept project on it
right away. It seems pretty compelling.

And thanks for this offering!

1) I believe a 'thick client' can, almost, solve this problem. Consider some
Javascript web app. It can be developed to transparently handle various
network partition situations. More or less. But many of our requests allow
absolutely no code on the client side. They are standard REST calls.

~~~
Diederich
Ok, I found these slides and I believe we are on the same page, so to speak:

<http://hyperdex.org/slides/2012-04-13-HyperDex.pdf>

Quote:

What CAP really says:

  - if you cannot constrain your faults in any way,
  - and your requests can be directed at any server,
  - and you insist on serving every request,
  - then you cannot possibly be consistent.

No shortage of systems that preemptively give up on C, A, and P, especially C

End quote.

I think that's exactly correct.

Which leaves me with a nearly impossible problem. But it sure is fun to work
on!

------
mamcx
Looks interesting. Is it like MongoDB + Redis in one?

If I understood it right, if the coordinator dies, does everything die?

Any plans to provide pub/sub?

And/or capped collections (by number (keep 1000) or TTL (keep 24h))?

~~~
egs
Great, let me answer your questions one by one:

HyperDex is a next generation NoSQL store, so it kind of resembles traditional
NoSQL stores like Mongo, Cassandra, Riak, and Dynamo, with a rich interface
similar to that of Redis, but it offers much stronger properties than all of
these systems. It differs from Mongo in that it provides much stronger
consistency and fault-tolerance guarantees. Mongo's default will gladly
pretend that an operation was committed even before it was seen by any server.
HyperDex differs from Redis in that it shards its data across a network and is
generally designed from the ground up for a networked environment. Both Redis
and Mongo can and will return stale results following a failure whereas
HyperDex will always return consistent results -- it guarantees something
called "linearizability" for key-based operations, which roughly means that
time will never go backwards from the point of view of any client. And in
spite of offering stronger properties, HyperDex is faster than both Mongo and
Redis on industry-standard benchmarks.
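
One observable consequence of the "time never goes backwards" guarantee is
that the versions a single client reads form a non-decreasing sequence. A toy
checker for that property (illustrative only; linearizability itself is a
stronger, history-wide condition than this per-client check):

```python
def never_goes_backwards(observed_versions):
    """True iff the version numbers a client read form a non-decreasing
    sequence, i.e. time never went backwards from its point of view."""
    return all(a <= b for a, b in
               zip(observed_versions, observed_versions[1:]))

print(never_goes_backwards([1, 2, 2, 3]))  # True: a consistent history
print(never_goes_backwards([1, 3, 2]))     # False: a stale read slipped in
```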

The HyperDex coordinator sounds like a singular entity, but is in fact a
redundant, replicated service. The Paxos algorithm (provided by the ConCoord
implementation here: <http://openreplica.org>) ensures that the coordinator
overall can survive failures of some of the coordinator replicas.

Building a pub/sub system on top of HyperDex would be an excellent project.

HyperDex does not support "capped collections" out of the box, but it would be
trivial to implement these with a background thread that prunes the database.
The data store supports sorted queries, so you can say "return the top-1000
objects sorted by insertion time" or sort by whatever other metric you like.
And you can delete groups of objects. These operations are implemented
efficiently.
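
The background-pruning approach can be sketched as follows. This is a toy
in-memory model, not HyperDex's API; the class, its methods, and the use of a
counter in place of real insertion timestamps are all illustrative.

```python
import itertools
import threading

class CappedStore:
    """Toy capped collection: a background thread periodically deletes all
    but the `cap` most recently inserted keys, mimicking the sorted-query-
    plus-group-delete pruning described above."""

    def __init__(self, cap, interval=0.05):
        self.cap = cap
        self._seq = itertools.count()   # stands in for insertion time
        self._data = {}                 # key -> (seq, value)
        self._lock = threading.Lock()
        pruner = threading.Thread(target=self._prune_loop,
                                  args=(interval,), daemon=True)
        pruner.start()

    def put(self, key, value):
        with self._lock:
            self._data[key] = (next(self._seq), value)

    def _prune_loop(self, interval):
        ticker = threading.Event()      # never set; daemon thread dies with us
        while not ticker.wait(interval):
            self.prune()

    def prune(self):
        with self._lock:
            if len(self._data) > self.cap:
                # "return the top-`cap` objects sorted by insertion time"
                newest = sorted(self._data,
                                key=lambda k: self._data[k][0])[-self.cap:]
                self._data = {k: self._data[k] for k in newest}

    def keys(self):
        with self._lock:
            return sorted(self._data)

store = CappedStore(cap=3)
for i in range(5):
    store.put("k%d" % i, i)
store.prune()          # force a prune rather than waiting for the thread
print(store.keys())    # only the 3 newest keys survive
```

A TTL-based cap ("keep 24h") would work the same way, with the prune step
deleting everything whose insertion time is older than the cutoff.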

Hope this helps.

------
crazygringo
This looks interesting, but can someone explain the benefits over
memcache/Redis?

Is "hyperspace hashing" storing multiple copies, so it's like storing records
on multiple shards of a database?

And "enables lookups of non-primary data attributes": how useful is this,
actually? Is this a big step forward for NoSQL?

The site doesn't seem to be giving me a real-world case where HyperDex solves
existing problems better than other things... maybe I just can't find it?

~~~
rescrv
Memcache is a caching solution that stores binary blobs. HyperDex stores data
persistently and offers a wide variety of types such as lists, sets, and maps.

Redis offers many datastructures as well, but it has a limited architecture.
If you want to run multiple Redis instances, you must run them in a master-
slave configuration with no guarantees when the master fails. HyperDex can
withstand such faults while guaranteeing linearizable semantics. Further,
we've got some changes in the pipeline that will enable us to support every
datastructure operation Redis supports (BRPOPLPUSH, I'm looking at you) with
horizontal partitioning across multiple machines.

Hyperspace hashing stores multiple copies, but this is configured by the user.
In our applications we've had at most two copies.

You can find more about looking up non-primary data attributes in the tutorial
(<http://hyperdex.org/doc/tutorial/#creating-a-new-space>). It enables you to
search over attributes of an object without having to maintain secondary
indices.
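
The idea can be sketched in miniature. This is a toy model, not how HyperDex
actually maps objects to servers; the attribute names, bucket count, and
helper functions are all made up for illustration. Each attribute hashes to
one coordinate, and the coordinate tuple picks the cell (think: server) an
object lives in, so a search that fixes some attributes only needs to contact
the cells agreeing on those coordinates.

```python
from itertools import product

DIMS = ("first", "last")   # attributes spanning the toy hyperspace
BUCKETS = 4                # coordinates per dimension

# one list of objects per cell in the BUCKETS x BUCKETS grid
cells = {c: [] for c in product(range(BUCKETS), repeat=len(DIMS))}

def put(obj):
    coords = tuple(hash(obj[d]) % BUCKETS for d in DIMS)
    cells[coords].append(obj)

def search(**query):
    # coordinates for unspecified attributes are left free (wildcards)
    fixed = {DIMS.index(k): hash(v) % BUCKETS for k, v in query.items()}
    hits = []
    for coords, objs in cells.items():
        if all(coords[i] == v for i, v in fixed.items()):  # candidate cell
            hits.extend(o for o in objs
                        if all(o[k] == v for k, v in query.items()))
    return hits

put({"first": "John", "last": "Smith"})
put({"first": "Jane", "last": "Doe"})
print(search(last="Smith"))   # finds John by a non-primary attribute
```

No secondary index is maintained: the placement function itself makes the
search prune most cells.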

We've built a sample threaded-discussion application on top of HyperDex. We're
also in talks with some astronomers who would like to use it to analyze data
from a radio telescope. The geometric nature of hyperspace hashing makes it
easy to perform queries such as, "retrieve all objects in this cone through
space." For any application where you perform secondary attribute search,
HyperDex is the way to go. For other applications where you want high
throughput and low latency with strong consistency, HyperDex is a good bet.

~~~
DennisP
I'd be very interested in your threaded-discussion sample. I couldn't find it
on your site; do you have it online somewhere?

~~~
rescrv
We have a demo of an earlier version deployed on <http://gibbr.org/>. We
currently are not releasing the source, but we may make an example application
using the same design we use in the real app.

~~~
DennisP
Cool. An article on how it's designed might be just as good.

------
snarfly
Ever thought about using some kind of locality-sensitive hashing, or maybe a
neural network, to map your properties into the table space? My intuition says
your partitioning scheme would still work, and similarity search could be
improved.

------
gizzlon
Would be more believable if you gave some examples of the tradeoffs that went
into the design, example use-cases, strengths and weaknesses, etc.

As it stands, it just reads: "It's faster!! It's better!! It's webscale!"

~~~
rescrv
We have a full 14 pages (<http://hyperdex.org/papers/hyperdex.pdf>) describing
the tradeoffs that went into the design and where the system's strengths and
weaknesses lie.

Our statements that it is faster derive directly from our observations and
evaluation in the paper.

~~~
gizzlon
That may be so, but people might dismiss everything as BS before ever getting
to the paper.

~~~
egs
A real hacker would never be dismissive of code and claims backed by an open
git repo and documentation.

That said, HyperDex outperforms previous key-value stores and provides
stronger guarantees, thanks to two architectural differences.

First is a new way to distribute data called "hyperspace hashing," whose
description requires a picture, and can be found here:
<http://hyperdex.org/about/>. This is quite different from the simple
bucketing performed by memcached, Dynamo, Cassandra, Riak, MongoDB and others.
See the latest slide set for an illustration of differences.

Second, HyperDex maintains replicas through a technique known as "value-
dependent chaining." This enables the system to keep the replicas in sync
without having to pay high overheads for coordination.

~~~
gizzlon
_"A real hacker would never be dismissive of code and claims backed by an open
git repo and documentation"_

Time is limited, the stream of new NoSQL projects is less so.

I'm sure the project is interesting, but the webpage comes off as bombastic.

~~~
egs
You asked for a technical description, and also mentioned that you were
unwilling to read a long, detailed technical description, and asked for a
"TL;DR." I hope the short summary was useful. If you have technical feedback,
we'll be very happy to engage further.

~~~
gizzlon
Thanks, I see that I came off as overly negative, sorry.

The things you link to, and a ;login: article (that I haven't read yet), put
things in another light :)

I still think your webpage needs some work though. I twice dismissed hyperdex
as "just another nosql/storage project full of s"#¤". I'm probably not the
only one.

It's difficult to put my finger on _why_ it comes off as it does, but it might
be the density of buzzwords in the first few sentences.

What I actually want to know when looking at new projects is:

  - Where does it fit in the NoSQL "space"? Is it k.v.? graph? something else?
  - Why is it different? Why do we need yet another .. [1]

(Just to be clear, I'm not asking you to answer this _here_, but IMO you
should answer it on your homepage.)

edit: [1] <http://nosql-database.org/>

~~~
rescrv
Thanks for the positive feedback. We've been working on improving the
documentation (and soon, the website). Your feedback certainly helps.

------
Ixiaus
Additionally, this looks like a repost anyway:
<http://news.ycombinator.com/item?id=3622888>

------
daemon13
Do you have examples of use in production?

I could not find any on your site. Adding such info would be beneficial.

~~~
rescrv
We have worked with LinkedIn to use HyperDex for some of their analytics work.

------
DennisP
How hard is it to add APIs for other languages? (In my case I'm thinking JVM,
since I'm getting into Clojure.)

~~~
rescrv
We have complete Java bindings
([https://github.com/rescrv/HyperDex/commit/1c4e98d88517a63bf0...](https://github.com/rescrv/HyperDex/commit/1c4e98d88517a63bf04570bbb2545c784e978d2b)).
Does this help for Clojure?

Our base bindings are in C. Any language which can wrap C can make bindings.

~~~
DennisP
Awesome, yes.

------
Ixiaus
This looks very interesting; however, I do wish they would stop doling out the
Kool-Aid on (almost) every page. Instead of telling me why they are "this much
faster than xyz", all I see are some graphs and numbers, which isn't
particularly helpful because the benchmarks could be flawed!

~~~
moe
_all I see are some graphs and numbers_

What the hell?

You are free to disagree[1] with their results or point out flaws in their
methodology. But how on earth can you complain about them providing real
numbers _and_ the code used to generate them?

This is rare enough, most other databases only provide the kool-aid, without
any numbers whatsoever (cf. MemSQL).

[1] [https://groups.google.com/d/msg/redis-db/N4oy3lCngsU/NQUwf12...](https://groups.google.com/d/msg/redis-db/N4oy3lCngsU/NQUwf12ErpgJ)

~~~
egs
Thanks for the note. We were flabbergasted to read the parent comment
complaining about "graphs and numbers."

Note that the issues raised by the Redis developer boil down to the following:

* The benchmark measures a primitive that Redis does not provide, so Redis looks slow: This may be true. The strength of a system lies in how well its primitives handle unanticipated uses. Undoubtedly, a system with a "do_benchmark()" interface would achieve the best results, but this is not a good way to build systems. For the record, HyperDex's interface is at the same level of abstraction as Redis's in this case.

* The benchmark compares "single-core Redis" to "multi-core HyperDex." It is true that HyperDex was designed from the ground up for a networked, multi-core system. Redis is not sharded and seems to work best when co-located on the same host as its clients. If your data fits on one host and clients are on the same machine, you should probably use Redis and not HyperDex. As for the complaint, we would have used something other than "single-core Redis" if such a thing existed. Our emails to Salvatore asking for an alternative binary went unanswered -- he chose to respond with blog entries instead of code.

* The benchmark is not real: The benchmark in question is the Yahoo Cloud Serving Benchmark. It's not something we made up. One can only imagine the kind of criticism we would get if we had actually constructed our own benchmarks. YCSB is an industry-standard benchmark commonly used to evaluate the performance of key-value stores.

These kinds of issues are really easy to resolve without recourse to noise on
blogs and HN: we urge everyone to test their own apps against the git repo. We
worked hard to make HyperDex the best key-value store out there with the
strongest properties, and we hope you find it useful.

~~~
rescrv
Just a further note: Although HyperDex is multithreaded by default, when
benchmarking it against Redis, we disabled all but one thread from serving
network requests.

Edit: reword for clarity.

