Riak 1.0

socratic · on Sept 28, 2011

I am increasingly interested in Riak, in part because a very vocal minority on HN seems to think it is the One True NoSQL solution.

However, I still don't quite get it. What is Riak? It seems to be some sort of Dynamo implementation (like the ironically ill-fated Cassandra), but apparently it has a workflow engine? What do people use it for? What is it best at?

Right now we're using PostgreSQL, Redis, and S3. PostgreSQL gives us ACID, Redis gives us fast in-memory access, and S3 gives us an infinite KV store. Is there some reason to use Riak? Would Riak just replace S3?

kennystone · on Sept 28, 2011

Ultimately it is an eventually consistent key-value store. They have map-reduce and a new meta-data thing that lets you query documents, but you really only want to use it if your data maps well to key/value (and lots of data does!). The really cool things are that it scales truly horizontally (a new node brings read+write+map-reduce power) and that it will help you sleep better with its dynamo-ism.

amock · on Sept 28, 2011

If your values are very small Riak is probably a good replacement for S3. If your values are large then S3 is probably better than Riak.

siculars · on Sept 28, 2011

I'm not so certain. A value written to Riak has something on the order of 450 bytes of overhead (as of version 0.14; I'm not entirely certain of the exact overhead in 1.0). Basically Riak will write your value to disk with a bunch of other data that it uses internally to do its thing. Writing a stream of integers, one integer per Riak key, would be a bad idea (tm), imho.
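
To put rough numbers on that, here is a minimal arithmetic sketch. It assumes the approximate 450-byte per-value figure quoted above for 0.14 (not a measured value), and the function names are made up for illustration. Under that assumption an 8-byte integer would occupy about 458 bytes, so roughly 98% of the stored bytes would be overhead.

```python
# Rough storage arithmetic. The 450-byte figure is the approximate
# per-value overhead quoted in the comment above for Riak 0.14; it is
# an assumption here, not a measurement.
PER_VALUE_OVERHEAD_BYTES = 450


def stored_bytes(payload_bytes: int, overhead: int = PER_VALUE_OVERHEAD_BYTES) -> int:
    """Approximate stored size of one value: payload plus fixed overhead."""
    return payload_bytes + overhead


def overhead_fraction(payload_bytes: int, overhead: int = PER_VALUE_OVERHEAD_BYTES) -> float:
    """Fraction of the stored bytes that is overhead rather than payload."""
    return overhead / stored_bytes(payload_bytes, overhead)


if __name__ == "__main__":
    # Compare a tiny integer payload with kilobyte- and 100 KB-sized values.
    for payload in (8, 1_000, 100_000):
        print(payload, stored_bytes(payload), round(overhead_fraction(payload), 3))
```

The fixed cost fades quickly as values grow, which is why one tiny integer per key is the worst case.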

dschobel · on Sept 28, 2011

Why do you say that Cassandra is ill-fated?

socratic · on Sept 28, 2011

NoSQL seems to me to be a scene as much driven by hype as logic. (For better or worse, practitioners using and programming NoSQL systems tend to have little understanding of the 30+ years of relevant RDBMS academic literature, and 20+ years of distributed RDBMS academic literature.)

Given that, NoSQL adoption often appears driven by success stories. But Cassandra seems to have the opposite: numerous failure stories. Facebook, Digg, Reddit and a number of others have all tried Cassandra in production, and have either had serious complaints or moved off to either SQL or other solutions like HBase.

Of course, these failure stories are anecdotes, and numerous unrelated factors (like bad interactions between Cassandra and Amazon's EC2) could be at fault. But I'm not sure it matters.

Has anyone on HN had a really good experience with Cassandra? (This may be the wrong thread to ask for obvious reasons.)

nosequel · on Sept 28, 2011

Riak users have had quite the opposite results, based on talks given at conferences and videos put online. Yammer, Voxer, Formspring, and Bump are all Riak customers, and all have a video out there somewhere talking about how much they like Riak. Yes, Riak and Cassandra share a little bit of technology (both being Dynamo-inspired), but most of Cassandra's woes seem to come from operational difficulties and not the technology theory. Riak has more of a focus on operational friendliness out of the box.

There are videos and talks out there from several customers listed on Basho's site, just take a look.

superjared · on Sept 27, 2011

"Riak 1.0 will be available later this month."

I can't stand pre-announcements like this.

timf · on Sept 27, 2011

Further, there are only three days left. The HN title should be changed away from "Riak 1.0".

pjscott · on Sept 27, 2011

This is actually an announcement of RC1 being released:

http://downloads.basho.com/riak/riak-1.0.0rc1/

I just did a rolling upgrade from 1.0b4, and it went smoothly. I love the LevelDB support, and it's holding up well under the considerable load that I'm throwing at it.

nosequel · on Sept 27, 2011

Riak 1.0.0 RC2 is out now, and "later this month" is only 3 more days.

nirvana · on Sept 27, 2011

I think when you've got your release candidate, you're basically done, barring any unexpected surprises. Given how stable Riak has been during the beta period, I don't think they're really jumping the gun much.

tsuraan · on Sept 27, 2011

Notice that this is a bit of a pre-announcement:

  Riak 1.0 will be available later this month. To preview 
  some of the new features, download Riak, or to inquire 
  about a commercial deployment, please visit http://www.basho.com.

Their github page still just has 1.0-rc1 tagged. I'm excited though.

jibs · on Sept 27, 2011

A bit tangential to this particular announcement, but I've been musing about using Riak, though so far I've been put off by their (seemingly) open-core, rather than open-source, implementation. Are the paid enterprise functions stuff you eventually need in most use cases? The lack of multi-site replication in particular is curious; would this mean I can replicate between nodes on the cluster, as long as they are in the same datacenter, but not across the interwebs until I hand over some $$$?

skeltoac · on Sept 27, 2011

Enterprise includes a ring replication layer designed for higher-latency connections.

There is nothing preventing you setting up a cluster that spans continents. What will deter you is the poor performance of the cluster due to the added latency between nodes.

jibs · on Sept 27, 2011

Interesting - from what I remember of the original Amazon Dynamo paper it seems the ring replication is pretty central to the thing (if we are both talking about the ring replication used for the distribution of keys across the nodes). This is sounding like crippleware :(

rzezeski · on Sept 27, 2011

Replicas (or as you put it, ring replication) are critical, and Riak very much has replicas. What it doesn't have in its open source version is multi-ring replication (cross DC), which is a separate concern.

In the Dynamo paper the ring spans DCs but they also have a very different network than most that allows them to do that. In Riak it is recommended that each ring is contained in a single DC. If you want ring-to-ring replication from Basho then you can pay for Riak EDS. You could also build it yourself as others have mentioned (Kresten Krab Thorup has done something like this in Riak Mobile [1]).

Nothing is stopping you from running a single ring (cluster) across DCs, and it might even be okay for certain apps, but it's not a choice that should be taken lightly. In general, if you don't understand the tradeoffs you're making in that regard then it's best to stick to one ring, one DC.

[1]: http://www.erlang-factory.com/upload/presentations/413/Erlan...

jibs · on Sept 27, 2011

Ah, understood - that makes sense. Thanks! re. my "ring replication", sorry that was sloppy of me, but-you-know-what-i-mean :)

skeltoac · on Sept 27, 2011

Replication of keys around the ring is free. What you pay for is their solution to the problem of significant latency between nodes: code that replicates the whole ring in several sites and coordinates the communication between the sites.
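
For readers unfamiliar with what "replication of keys around the ring" means, here is a minimal, generic Dynamo-style sketch. It is not Riak's actual partition-claim algorithm: the `Ring` class, the `preference_list` method, the node names, and the choice of SHA-1 are all assumptions made for illustration. Each key hashes to a point on the ring, and the next N distinct nodes clockwise hold its replicas.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring (SHA-1 is an illustrative choice)."""
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hash ring; every node claims several points on it."""

    def __init__(self, nodes, points_per_node=8):
        nodes = list(nodes)
        self._points = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._hashes = [h for h, _ in self._points]
        self._node_count = len(set(nodes))

    def preference_list(self, key, n=3):
        """Return up to n distinct nodes, walking clockwise from the key's hash."""
        n = min(n, self._node_count)
        if n <= 0:
            return []
        start = bisect.bisect_right(self._hashes, _hash(key))
        owners = []
        for step in range(len(self._points)):
            node = self._points[(start + step) % len(self._points)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == n:
                    break
        return owners


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d"])
    print(ring.preference_list("user:42", n=3))
```

Every write to a key goes to all nodes in its preference list, so latency between those nodes matters. That is the concern raised above about stretching one ring across datacenters.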

reiddraper · on Sept 27, 2011

The replication in the Enterprise version is replication between entirely different clusters. The ring replication you talk about is definitely open source.

technoweenie · on Sept 27, 2011

From what I understand, you could build the same functionality on your own if you wanted. Riak has post-commit hooks that you can tap into. I think the multi-dc replication uses them, though I'm not positive.

nirvana · on Sept 27, 2011

Riak is Open Source. It contains a very complete platform. Riak Core is a dynamo style distributed system platform (not database specific), Riak Pipe is workflows, Riak KV is a KV database, Riak Search is full text search over that database. And there's a lot of other stuff I'm not even mentioning (like bitcask, the logging stuff, etc.)

When you go to the Riak project on github, what you find is actually sort of a skeleton that has as dependencies all those projects I mentioned above, such as riak_kv, riak_pipe, etc.

Riak ES, the commercial offering, is a superset of Riak. It has Riak as a dependency, and adds the feature of cross datacenter replication. I think the real reason you buy Riak ES is because you're wanting to buy support.

Riak ES being a commercial product doesn't make Riak any less open source than Oracle Server being a commercial product makes Linux less open source.

Also, Basho is keen to develop users of Riak ES, and customers of Riak (who don't spend any money) still get some support from Basho. Basho has a "Riak ES for startups" program, which gives you a huge discount.

I'm building my business on Riak because Riak is open source. IF Basho goes away, I'll still have Riak. There's nothing missing from Riak that I need.

I figure if I get big enough where I want to be running out of multiple data centers, I'll be big enough to afford Riak ES, and if I can't afford Riak ES at that point, then I'll be able to build my own solution. (I don't think it would be that hard, actually.)

gizzlon · on Sept 28, 2011

I guess what he meant was this: "Open core (a.k.a. proprietary relicensing[1]) is a business model where an open source product is also made available commercially with non-open-source additions" [1]

I can't speak to Riak, but generally this model can create a conflict of interest between the "enterprise features" on the one hand and open source commitments on the other, for example if someone submits code to the open source version that duplicates or overlaps an "enterprise feature".

1: http://en.wikipedia.org/wiki/Open_core

neonkiwi · on Sept 27, 2011

> I figure if I get big enough where I want to be running out of multiple data centers, I'll be big enough to afford Riak ES

I was thinking along those exact same lines, but a big unknown was pricing on their enterprise offering. That information is unavailable on the web, and despite my skepticism in contact-us-for-the-price situations, I filled out their online form, which is a request to be contacted by a representative.

I haven't heard from them, but they did put me on a mailing list—I got an email about this 'milestone release' today! Not quite what I wanted to know, though :)

Nirvana, or someone using their Enterprise offering, perhaps you could fill us all in on the price?

mshneider718 · on Sept 28, 2011

Hey neon...work for Basho so would like to research why you were not contacted...any details are greatly appreciated

latch · on Sept 27, 2011

I haven't used Riak, but I did look into it for a project a short while back. One problem I had was that the documentation on their website is heavily focused on what Riak is, vs how to use it. It's great that you can get such a fundamental understanding of Riak as-a-dynamo-implementation, and they do a great job writing that stuff, but it's completely out of touch with what I expected/needed.

Technically, what eventually put me off, is that I couldn't figure out how to maintain a clean secondary index. If you have a: SiteId, UserId, Data, and you want data to be accessible by SiteId or SiteId+UserId, I couldn't figure out a nice atomic way to maintain the secondary index. This is pretty basic stuff. I'm glad to see 1.0 will support native secondary indexes, but I think my inability to figure it out shows that their documentation is poor (or it could be that I suck).
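
The difficulty described above can be sketched against a toy in-memory store. This is not the Riak client API: `KV`, `store_record`, `records_for_site`, and the key layout are hypothetical names chosen for illustration. The point is that a hand-rolled index needs two separate writes plus a read-modify-write, and nothing makes them atomic.

```python
class KV:
    """Toy in-memory key-value store standing in for a real cluster."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value


def store_record(kv, site_id, user_id, data):
    """Store data under SiteId+UserId and add it to a per-site index entry."""
    primary_key = f"data/{site_id}/{user_id}"
    kv.put(primary_key, data)  # write 1: the record itself

    index_key = f"index/site/{site_id}"
    members = set(kv.get(index_key, ()))  # read the current index
    members.add(primary_key)
    # Write 2: not atomic with write 1. A crash between the two writes, or a
    # concurrent writer doing the same read-modify-write, can leave the index
    # stale or missing an entry.
    kv.put(index_key, sorted(members))
    return primary_key


def records_for_site(kv, site_id):
    """Look up every record for a site through the hand-maintained index."""
    return [kv.get(key) for key in kv.get(f"index/site/{site_id}", [])]


if __name__ == "__main__":
    kv = KV()
    store_record(kv, "site1", "alice", {"score": 1})
    store_record(kv, "site1", "bob", {"score": 2})
    print(records_for_site(kv, "site1"))
    print(kv.get("data/site1/alice"))  # direct SiteId+UserId lookup
```

In an eventually consistent store the index entry itself can also diverge between replicas, which is part of why a native secondary index is attractive.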

bgentry · on Sept 28, 2011

The secondary index stuff is actually very new, so it's entirely possible that it didn't exist when you last looked at Riak.

More info here http://blog.basho.com/2011/09/14/Secondary-Indexes-in-Riak/

geoffhill · on Sept 27, 2011

LevelDB support! Would love to see how it compares to Bitcask in terms of speed.

willbmoss · on Sept 27, 2011

Bitcask can guarantee one disk seek, whereas LevelDB will do one disk seek per level, so at least from that perspective, it can't be better.

Level also has to look down the entire tree if a key is missing. This means inserts end up being more expensive than reads or updates (which are all just a hash lookup in Bitcask).
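
A minimal sketch of the Bitcask idea being compared here: an append-only log plus an in-memory hash table mapping each key to the location of its latest value. It assumes a simplified record layout and uses an in-memory buffer instead of a real file; `TinyCask` and its methods are made up for illustration and are not Bitcask's actual format or API.

```python
import io
import struct


class TinyCask:
    """Toy append-only log with an in-memory key directory."""

    HEADER = struct.Struct(">II")  # key length, value length (assumed layout)

    def __init__(self):
        self._log = io.BytesIO()  # stands in for the on-disk data file
        self._keydir = {}         # key -> (value offset, value length)

    def put(self, key: bytes, value: bytes) -> None:
        # Inserts and updates are the same operation: append, then repoint.
        self._log.seek(0, io.SEEK_END)
        start = self._log.tell()
        self._log.write(self.HEADER.pack(len(key), len(value)))
        self._log.write(key)
        self._log.write(value)
        self._keydir[key] = (start + self.HEADER.size + len(key), len(value))

    def get(self, key: bytes):
        entry = self._keydir.get(key)
        if entry is None:
            return None  # a miss is answered from memory, without touching the log
        offset, length = entry
        self._log.seek(offset)  # the single seek per read
        return self._log.read(length)


if __name__ == "__main__":
    cask = TinyCask()
    cask.put(b"trade:1", b"100 units")
    cask.put(b"trade:1", b"150 units")  # older record stays in the log as garbage
    print(cask.get(b"trade:1"), cask.get(b"missing"))
```

The cost of this design is that every key must fit in RAM, which is the tradeoff discussed further down the thread.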

fizx · on Sept 27, 2011

"Bitcask can guarantee one disk seek, whereas LevelDB will do one disk seek per level, so at least from that perspective, it can't be better."

Yep, this is a standard tradeoff. When you want your data to be iterable, you have to take the hit. In practice (I oversee a large Cassandra cluster), this hit happens about 1% of the time, which is either a lot, or a little, depending on your constraints.

"Level also has to look down the entire tree if a key is missing."

This is why Cassandra has a bloom filter on top of a very similar data store.
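
A minimal sketch of the bloom filter technique mentioned above. It is not Cassandra's implementation: the class, the sizes, and the use of SHA-256 for the hash functions are assumptions for illustration. The filter can answer "definitely absent" from memory, so a lookup for a missing key can usually skip the walk through every level.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter: no false negatives, some false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self._size = size_bits
        self._num_hashes = num_hashes  # must stay below 256 in this sketch
        self._bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive several bit positions by salting one hash function.
        for i in range(self._num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self._size

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        """False means the key was never added; True means it may have been."""
        return all(
            self._bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


if __name__ == "__main__":
    seen = BloomFilter()
    seen.add(b"trade:1")
    # Only when the filter says "maybe" does the store need to search its levels.
    for key in (b"trade:1", b"trade:2"):
        print(key, "search levels" if seen.might_contain(key) else "skip, not present")
```

The false positive rate depends on the number of bits and hash functions relative to the number of keys, so a real deployment would size the filter to its data.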

nosequel · on Sept 27, 2011

LevelDB is there as the replacement for those who are currently using Innostore as their backend and not for those who have a dataset that fits bitcask.

Ixiaus · on Sept 27, 2011

I imagine it won't be as fast? The cool thing about Bitcask is that all of the keys are in memory - I imagine that would also be beneficial with secondary indexes now supported...

LevelDB seems mostly well suited for data that becomes (in terms of key size and number of keys) bigger than your RAM...

dsl · on Sept 27, 2011

I think it will be a welcome change for anyone who runs a decent sized Riak deployment. We are currently adding machines simply to increase available RAM in the cluster.

nirvana · on Sept 27, 2011

Why not bring a node down, and then replace it with a node that has more RAM? Are you exceeding the size of a node you can supply (in terms of RAM) for your cluster?

I'd be very curious to know a bit about the character of your data, the size of your cluster, etc. (I've only run test clusters at this point, so hearing from someone doing production work would be informative.)

dsl · on Sept 28, 2011

Replacement vs. addition is a situational trade-off, but ultimately the problem remains that you need to bring more RAM to the party.

My biggest RAM consumer stores historical data for a goods trading platform. Each trade is a unique key, with all the trade data being the value. Access speed is important, but not as critical as the other goodies I get from Riak (replication and automated rebalancing). Metadata is stored separately, but I hope to change that with Riak 1.0 secondary indexes.

varworld · on Sept 27, 2011

Secondary indexes are currently only supported on levelDB.

Ixiaus · on Sept 28, 2011

Interesting - well, that would get me to choose leveldb then!

lwat · on Sept 27, 2011

Is anyone here paying for the Enterprise level Riak? I'd love to hear how they charge and whether you think it's worth it. Currently we're looking towards the Denali release later this year but Riak is looking more interesting by the day.

vdm · on Sept 27, 2011

> Interested in Riak Enterprise for your company? Contact Us Now

I don't have a company. Yet.

What I am interested in is a go-away button on this obnoxious ad bar so I can read your webpages on my vertically-challenged 11" screen.