

My Year of Riak - rlander
http://inaka.net/blog/2011/08/25/when-to-use-riak/

======
drewr
Most people label Riak a key/value store, as does the author, but I get a more
useful mental picture thinking of it as a distributed file system. The same
kinds of things that are fast on a file system are fast with Riak (direct
access), and the same kinds of things are relatively slow on both (find(1)ing
a file based on a pattern match compared to M/R for data). They both also
excel at storing binary data assuming you understand the limits.

This model has helped me make better decisions about when & where to use it
and the cost involved. For example, if you're on AWS, just use S3 and keep
metadata in an RDBMS or search engine.
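
The file-system analogy above can be sketched with a toy in-memory store. This is only an illustration: `ToyStore`, `find`, and `keys_examined` are hypothetical names, not part of Riak's client API. It assumes a direct lookup touches one key, while a pattern match has to examine every key.

```python
import fnmatch


class ToyStore:
    """In-memory stand-in for a key/value store (hypothetical, not Riak's API)."""

    def __init__(self):
        self._data = {}
        self.keys_examined = 0  # counts keys touched by pattern scans

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        # Direct access: a single lookup by exact key.
        return self._data[key]

    def find(self, pattern):
        # Pattern match: every key has to be examined, like find(1) on a tree.
        self.keys_examined += len(self._data)
        return sorted(k for k in self._data if fnmatch.fnmatchcase(k, pattern))


store = ToyStore()
store.put("logs/a.txt", b"1")
store.put("logs/b.txt", b"2")
store.put("img/c.png", b"3")

direct = store.get("logs/a.txt")
matches = store.find("logs/*.txt")
```

The direct `get` cost stays flat as the store grows, while the cost of `find` grows with the total number of keys, which is the trade-off the analogy points at.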

------
firefoxman1
I've been looking into Riak mainly because of its key-value style, but what I'd
_really_ love to see is an exact replica of Redis which is purely disk-based.

I don't know if anyone else feels a need for this, but I'd love to be able to
run my entire DB on Redis. The problem is that RAM is really expensive to scale
compared to disk space.

~~~
DanWaterworth
You may be interested in <http://inaka.github.com/edis/> .

~~~
firefoxman1
Oh that's really cool. Its using leveldb got me thinking, how hard would it be
to create a Riak client that emulates Redis?

~~~
DanWaterworth
Well, I use Redis quite a lot and I find that the feature I most value is
sequential consistency. That's not something that can be tacked on later.

------
itay
Only at the end did I realize this was written in August of 2011, before 1.0
was released!

~~~
rubyrescue
I should (and will) update it soon; the post comes off slightly more
negative than I would like at this point. We are now really happy with
secondary indices and are happily deploying Riak on (relatively) cheap
physical servers at StormOnDemand. The more I use Riak the more I love and
recommend it...

------
lobster_johnson
One thing that concerns me is that Riak pushes two solutions for doing queries
that operate on collections of objects: Riak Search and MapReduce. Both are
implemented in a way that is pretty unacceptable for "real-time" use (e.g.,
live application use, as opposed to background maintenance jobs).

Riak Search should in theory be good stuff. It uses Lucene underneath, which
is known to be rather slow, and last I checked it could not properly
distribute documents across nodes, severely limiting its scalability. It is,
in short, known to perform pretty badly.

MapReduce is interesting, but evidently not designed for "real-time", since it
relies on traversing inter-object links, key by key. It does not scale to a
large number of links, either, since it's not possible to add or remove links
incrementally. To add or remove a link, you have to rewrite the entire object
and all its links in a single update operation.
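
A minimal sketch of why that read-modify-write pattern gets expensive, using a toy in-memory model. `ToyBucket`, `ToyObject`, `add_link`, and `links_written` are hypothetical names for illustration, not Riak's client API. The only assumption taken from the comment is that a link can't be appended on its own: the whole object, links included, is stored again on every change.

```python
from dataclasses import dataclass, field


@dataclass
class ToyObject:
    value: bytes
    links: list = field(default_factory=list)


class ToyBucket:
    """Toy store that only supports whole-object reads and writes."""

    def __init__(self):
        self._objects = {}
        self.links_written = 0  # total links serialized across all writes

    def store(self, key, obj):
        # Every store rewrites the value and the full link list.
        self._objects[key] = ToyObject(obj.value, list(obj.links))
        self.links_written += len(obj.links)

    def fetch(self, key):
        stored = self._objects[key]
        return ToyObject(stored.value, list(stored.links))


def add_link(bucket, key, target):
    # No incremental update: fetch everything, modify, write everything back.
    obj = bucket.fetch(key)
    obj.links.append(target)
    bucket.store(key, obj)


bucket = ToyBucket()
bucket.store("album", ToyObject(b"metadata"))
for track in ("track1", "track2", "track3"):
    add_link(bucket, "album", track)

final_links = bucket.fetch("album").links
```

In this model, adding n links one at a time writes 1 + 2 + ... + n links in total, so the write cost grows quadratically with the number of links on a single object.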

I am sure Basho is working on those problems, but in the meantime, I'm
surprised more people don't complain about these issues. Riak had awful, awful
performance before they released 1.0 with LevelDB support. Now the performance
is much better, but it's still "merely" equivalent to the basic CRUD support
of S3 (minus ACL support and so on).

It's not fast enough to replace Memcached or Redis as a first-level cache, and
unlike Cassandra it does not do range lookups well, and anything that needs to
scan the dataset sequentially will end up sucking, performance-wise. These
things limit its usefulness for many apps.

Right now you have something that is quite decent for implementing a
distributed file system, session store or persistent fallback cache (behind
something fast such as Memcached or Redis), but is a bad fit for many other
applications.

------
moonboots
The post mentions that the disk performance of Amazon small EC2 instances was
poor enough to justify the relatively expensive large instances. If the blog
author is on HN, were you using Amazon EBS or ephemeral storage to back Riak?
I have seen different advice regarding using ephemeral storage to increase
disk performance at the expense of data integrity in the event of instance
termination. I remember Reddit programmers saying they run Cassandra on
ephemeral storage.

~~~
rubyrescue
(I'm the author) We have used both at different points. At this point, we
deploy everything we used to deploy on EC2 to StormOnDemand physical servers,
and sometimes Storm SSDs.

~~~
rlander
Chad, us South American Erlangers are few and far between. =)

I could not believe that your (Inaka's) posts on Riak and scaling Erlang had
not yet been submitted to HN. Very useful stuff!

I also really enjoyed the edis preso, that's what got me to the blog.

------
Fluxx
I wrote a similar blog post back at my old job about our usage of Riak.

<http://devblog.seomoz.org/2011/10/using-riak-for-ranking-collection/>

Keep in mind we used Riak pre-1.0, so YMMV.

------
jbyers
(August 2011)

