

Basho Releases Riak v1.0.0 - roder
http://blog.basho.com/2011/09/30/Riak-1-dot-0-is-Official/

======
dsl
Just a heads up if you are looking to dive right in: The client libraries
still lag behind Riak itself, so it might be a while before you get all the
goodness unless you plan to roll your own.

~~~
tsuraan
Yup. For haskell users, I have a trivial fork of Mailrank's riak-haskell-
client that is tested to work with riak-1.0rc1; I haven't tried the secondary
indices stuff yet, but it's possible that it would work too. It's at
<https://github.com/tsuraan/riak-haskell-client> for anybody who wants to try
it out.

------
nosequel
Made it by the end of the month as promised!

------
rb2k_
I can't seem to find information about whether or not ripple (the official
ruby library) supports the new features at this point. Any heads up?

~~~
seancribbs
There will be a minor gem release (pre/beta/something) next week that supports
Riak 1.0 features, then in the next major release those will also bubble up to
the Ripple document layer. Sorry for the delay.

~~~
rb2k_
Any news on this? :)

------
nirvana
Congratulations! I hope no Basho employees were injured in the last couple
days of what, reading between the lines, sounded like a hard fought battle!

I'm about to dig in and read the release notes, upgrade etc, but I've been
keeping up with the betas, so here's my early thoughts on Riak 1.0.

With 1.0 Basho has really started to separate from the pack.

I think secondary indexes are very well implemented, and while they can be
expanded feature-wise over the years, as an 80% solution they are a no-
brainer -- they pack the double whammy of cutting developer time and, by
efficiently reducing the scope of Map/Reduce processing, boosting
performance.

I think the real sleeper hit, though, is riak_pipe. This moves Riak from just
being a "database" or even a "batch processing system" into a realtime
platform. I think in 3 years, this will be seen as the feature that put the
elbow in Riak's growth. I'm hoping to have high-level support for Riak Pipe in
Nirvana when it's released, and can't wait to start using it. Once again, I
think you've saved me a couple months of work.

I know a lot of work was done on supporting new backends, specifically,
LevelDB, and consolidating/unifying the existing ones (like merging ETS and
Cache into the new RAM backend.) I think a blog post on each of the backends
and when best to use them would be very useful (though this might be covered
in the new docs.)

And Search integration: I think this is the first NoSQL solution with built-
in, scalable, full-text search.

Before, if you'd decided to go NoSQL, you kinda had to decide which
architecture worked best for you and hope they had the features you wanted. I
chose Riak because, I believe, it has the best architecture for the class of
problems I'm solving... but now it also has a very complete set of features.
I'm not sure if any of the competition is as complete out of the box, but even
if they are, Riak should be in a lot more evaluations than it has been in the
past.

Further, everything is so elegantly engineered that you've built an
exceedingly attractive platform for us developers.

Bravo, and thank you very much!

~~~
samstokes
_riak_pipe... moves Riak from just being a "database" or even a "batch
processing system" into a realtime platform_

Interesting - could you explain more about this? I've not really grokked
Riak's map-reduce yet, and all I understood from the blog post about riak_pipe
was that it was a new layer under the hood but didn't change the programming
model for map-reduce queries. Is it simply that it's so much faster as to
permit new use cases?

~~~
nirvana
This is all personal opinion, of course. I think the key to what makes Riak
great is that it is fully distributed. Every node is a peer, which eliminates
single points of failure. But this also makes things a challenge
for organizing work. Riak was previously a fairly monolithic product[1] with a
set of features, including being a KV database and doing Map-Reduce
processing. At some point Basho, wisely, decided that making the product more
modular would allow them to be more agile in their development.

So, they split the KV database from the ring code, creating Riak_Core and
Riak_KV. Riak_Core allows you to create a ring of virtual nodes on a cluster
of physical nodes and spread work around it (essentially the Dynamo concept).
Thus, Riak_KV became an application running on the virtual nodes of Riak_Core,
providing a key-value database. At this point (i.e., post-Riak_Core split, but
pre-Riak 1.0), Riak_KV also managed the Map-Reduce functionality.

With Riak Core, you can create an application that does whatever kind of work
you want and spread it around a Dynamo-style ring. The ring is just a way of
partitioning work using a hash function so that it can be evenly distributed
across the virtual nodes (which are in turn cleverly distributed across the
physical nodes in the cluster).
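To make the ring idea concrete, here's a toy Python sketch: keys hash into a fixed 160-bit space, the space is cut into equal partitions (vnodes), and physical nodes claim partitions round-robin. The partition count and node names here are made up for illustration -- this is the concept, not Riak's actual code.

```python
import hashlib

RING_SIZE = 2 ** 160    # keys hash into a 160-bit (SHA-1) space
NUM_PARTITIONS = 64     # hypothetical ring size; a power of two

def vnode_for(bucket: str, key: str) -> int:
    """Return the index of the vnode (partition) responsible for a key."""
    h = int.from_bytes(hashlib.sha1(f"{bucket}/{key}".encode()).digest(), "big")
    return h * NUM_PARTITIONS // RING_SIZE  # which equal slice of the ring

# Physical nodes claim partitions round-robin, so consecutive slices of
# the ring land on different machines:
NODES = ["riak@node1", "riak@node2", "riak@node3"]  # invented names

def node_for(bucket: str, key: str) -> str:
    return NODES[vnode_for(bucket, key) % len(NODES)]
```

The same hash always lands on the same vnode, so any node can route a request without asking a coordinator first.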

Riak Pipe is an abstraction on top of Riak Core that lets you build a pipeline
of processing. Each stage in the pipeline is called a fitting. Each fitting
has a function (that does the work) and a function to decide which vnode to do
the work on. When the pipeline has data going through it, the vnodes that get
work create queues and worker processes to do the work. A key feature of this
is that if a queue gets full, earlier fittings in the pipeline are stopped
from adding to it, so that their queues will eventually fill up too (say, if
there's a very slow process near the end of the pipe), producing "back
pressure" that prevents work from overwhelming the cluster (or a particular
vnode).
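The back-pressure mechanism can be sketched with ordinary bounded queues -- a hypothetical Python analogy, not Riak's Erlang internals. A `put` on a full queue blocks, so a slow downstream fitting stalls everything upstream of it instead of letting work pile up unbounded.

```python
import queue
import threading

def fitting(work_fn, inbox, outbox):
    # A "fitting": pull from a bounded inbox, do the work, push downstream.
    # queue.Queue.put blocks when the queue is full -- that blocking *is*
    # the back pressure: a slow stage stalls the stages feeding it.
    while True:
        item = inbox.get()
        if item is None:               # end-of-stream marker
            outbox.put(None)
            return
        outbox.put(work_fn(item))      # blocks if downstream queue is full

q1, q2, out = queue.Queue(maxsize=4), queue.Queue(maxsize=4), queue.Queue()
threading.Thread(target=fitting, args=(str.upper, q1, q2), daemon=True).start()
threading.Thread(target=fitting, args=(lambda s: s + "!", q2, out),
                 daemon=True).start()

for word in ["riak", "pipe"]:
    q1.put(word)
q1.put(None)

results = []
while (r := out.get()) is not None:
    results.append(r)
# results is now ["RIAK!", "PIPE!"]
```

The tiny `maxsize` is what makes the stall visible; in a real cluster the queues live on the vnodes doing the work.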

So, for Riak 1.0, they re-worked their Map-Reduce implementation to run on
Riak Pipe. This will allow for more flexible map-reduce jobs in the future
(maybe even now). As an example of how the map-reduce implementation works, a
map phase might be described as a fitting that uses a word-count function (to
do the work) and uses the hash of the piece of data from Riak KV to determine
which vnode to run on. So, as you fill the pipe with documents to have their
words counted, the tasks get spread to fittings across the cluster, and then
each fitting sends its results to the appropriate vnode for the next stage in
the pipe (which might be reduce)... and here's the key point... without
having to talk to the node that started the job. Previously, the node that
started the map-reduce job (I believe) had to coordinate it across the
cluster; now it self-coordinates.
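A rough Python sketch of that routing idea (hypothetical, just to show why no central coordinator is needed): map output is sent straight to whichever "vnode" the hash of the word selects, and each vnode reduces only its own share.

```python
import hashlib
from collections import Counter

NUM_VNODES = 4  # invented for illustration

def partition(word: str) -> int:
    # the fitting's "which vnode" function: the hash of the key routes it
    return int(hashlib.sha1(word.encode()).hexdigest(), 16) % NUM_VNODES

docs = ["the quick fox", "the lazy dog", "the fox"]

# map phase: each doc's words become (word, 1) pairs, routed by partition()
reduce_inboxes = [[] for _ in range(NUM_VNODES)]
for doc in docs:
    for word in doc.split():
        reduce_inboxes[partition(word)].append((word, 1))

# reduce phase: each vnode sums only the words hashed to it -- no central
# node has to gather and re-dispatch the intermediate results
partials = [Counter() for _ in range(NUM_VNODES)]
for i, inbox in enumerate(reduce_inboxes):
    for word, n in inbox:
        partials[i][word] += n

totals = sum(partials, Counter())
# totals["the"] == 3, totals["fox"] == 2
```

Because every mapper applies the same `partition()`, all counts for a given word land on the same vnode, so the per-vnode sums are already complete.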

The great thing about Riak Pipe, though, is that it is (as I see it)
essentially a realtime processing engine. Say you had a job where you were
monitoring the Twitter firehose for mentions of your company. The task is
relatively straightforward, but you wouldn't want it all running on a single
node, right? So, you'd make the fitting's work function the code that scans
for your company name in the tweet and flags it, and the function that
determines which vnode to run on could be a random hash (so the work is
evenly distributed across all vnodes). When the firehose overwhelms your
cluster, you don't find yourself swapping, because back pressure will stop
new tweets from going into the pipe, and if you need to add capacity you just
add a new machine to the cluster.
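As a sketch (all names and shapes invented for illustration), such a fitting boils down to the two functions described above: one does the work, one picks the vnode, and here the vnode choice can simply be random since any node can scan a tweet.

```python
import random

NUM_VNODES = 64       # hypothetical ring size
COMPANY = "basho"     # hypothetical search term

def work_fn(tweet: str) -> tuple[str, bool]:
    """The fitting's work: flag tweets that mention the company."""
    return tweet, COMPANY in tweet.lower()

def partition_fn(_tweet: str) -> int:
    """The fitting's vnode chooser: any vnode will do, so spread uniformly."""
    return random.randrange(NUM_VNODES)

tweets = ["I love Basho!", "unrelated noise", "basho ships riak 1.0"]
routed = [(partition_fn(t), work_fn(t)) for t in tweets]
flagged = [t for _, (t, hit) in routed if hit]
# flagged == ["I love Basho!", "basho ships riak 1.0"]
```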

I'm still wrapping my head around some parts of Riak Pipe. I think that it
will turn out to be a really killer feature.

[1] Seems silly to call any cluster of a bunch of erlang processes
"monolithic", but a better word is escaping me at the moment.

~~~
samstokes
Great explanation, thanks!

