

Riak Adoption - We Have Some Work To Do  - pharkmillups
http://www.themarkphillips.com/2012/04/24/riak-adoption-we-have-some-work-to-do.html

======
cagenut
I can't help but notice you're asking almost exactly the wrong people. People
on the riak-users mailing list are by definition the choir.

As someone who works somewhere that looked into riak, here was my first
impression:

    
    
      [root@brandt ~]# yum install riak
      <snip>
      No package riak available.
      Error: Nothing to do

~~~
nosequel
You should read this as an example of why many software vendors don't bother
trying to get their packages into distributions' package repositories.

<http://blog.lusis.org/blog/2012/03/16/why-you-should-stop-fighting-distro-vendors/>

------
jimparkins
Three things:

(1) Embrace the fact that a lot of people want to pair Riak with an in-memory
DB like Redis, and provide documentation, articles, and tools to make this
easy.

(2) Have a public face for Basho who shows up at events like Scala Days and
engages the community. @rit does a really good job of this for 10gen; check
out his Twitter and GitHub.

(3) Do a gap analysis against the content on sites like the Cassandra website.
I keep feeling that my week of reading Cassandra content and articles hammered
home the fundamentals of how things work under the hood more than Riak's
content did. That gives me more confidence than an "it just works" mentality.

~~~
mhluongo
Doesn't Riak have a memory-based backend? So wouldn't it make more sense to
say "here's why you don't also need Redis"?

Ref - <http://wiki.basho.com/Memory.html> , <http://wiki.basho.com/Multi.html>

Edit - added another ref link

------
ismarc
We chose MongoDB over Riak for a host of reasons. After working with Mongo,
there are times I definitely wish we had gone with Riak, but there's a laundry
list of reasons why it lost out (and places where it was super strong as
well). I took a look at the list of improvements, and one of the biggest
reasons we didn't pick Riak doesn't seem to be on it: Riak's APIs and model
make it significantly harder to use, and to mentor a mediocre developer into
working with an existing system. (I've got a whole list of details on where,
how, and why this lost out.) Standardizing the client APIs is on the list,
which is nice, but it wouldn't address a large number of the issues. On the
other hand, a lot of the issues could be resolved by wrapping existing APIs
(whether directly in Riak or in client APIs/the REST API). We decided not to
create our own wrapper because of maintenance and training/documentation
concerns.

There's really too much info to go into it here (we spent something like 4-6
months investigating different datastores with actual production data and load
levels), but Mark, if you want to sit down and talk about it, or chat via
email, feel free to drop me a line. My contact info should be in my profile.

~~~
gregholmberg
Interesting. After evaluating carefully for a few months, we didn't feel that
MongoDB would work well for our project.

Later, we reluctantly gave up on CouchDB.

Riak 1.1.x seems to meet all of our design goals so far.

------
pixelcort
One of the questions I haven't been able to answer, and that has kept me from
adopting Riak, is how efficient the caching around MapReduce queries is.

In CouchDB MapReduce queries are completely incremental; if only a little data
changes, only the relevant parts need to be rerun through the MapReduce
process.
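To make the incremental idea above concrete, here is a minimal Python sketch. It is an illustration only, not how CouchDB (or Riak) is actually implemented: the `IncrementalView` class and its method names are hypothetical, and it caches only the map output per document, recomputing the reduce step at query time.

```python
class IncrementalView:
    """Toy incremental map/reduce view (hypothetical, for illustration).

    Map output is cached per document, so changing one document
    re-runs the map function for that document only.
    """

    def __init__(self, map_fn, reduce_fn):
        self.map_fn = map_fn
        self.reduce_fn = reduce_fn
        self._mapped = {}   # doc_id -> list of (key, value) pairs
        self.map_calls = 0  # counts how often map_fn has been invoked

    def update(self, doc_id, doc):
        # Re-map just the changed document; other cached rows are untouched.
        self.map_calls += 1
        self._mapped[doc_id] = list(self.map_fn(doc))

    def delete(self, doc_id):
        self._mapped.pop(doc_id, None)

    def query(self):
        # Group cached map rows by key, then reduce each group.
        grouped = {}
        for rows in self._mapped.values():
            for key, value in rows:
                grouped.setdefault(key, []).append(value)
        return {key: self.reduce_fn(values) for key, values in grouped.items()}


def map_tags(doc):
    # Emit (tag, 1) for every tag on the document.
    for tag in doc["tags"]:
        yield tag, 1


view = IncrementalView(map_tags, sum)
view.update("a", {"tags": ["db", "riak"]})
view.update("b", {"tags": ["db"]})
view.update("b", {"tags": ["db", "couch"]})  # only "b" is re-mapped
result = view.query()
```

In this sketch the second write to document "b" triggers one extra map call rather than a full rebuild, which is the property being asked about for large datasets that change in small increments.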

Riak looks really promising as:

1\. It allows chaining MapReduce functions together

2\. It supports sharding

Looking into Riak, it looked like there was a cache for part of the MapReduce
system, but I wasn't sure how that cache worked or if it would be enough for
large datasets that only change in little increments.

Edit: formatting.

------
ahi
I tried for the second time a couple of months ago and found the documentation
to be incomplete and inconsistent. For example, this has been on the Riak Fast Track for
many months now: "There are various screencasts throughout the Riak Fast
Track. At the time they were made, Basho was using Mercurial and Bitbucket for
our version control system and development platform, respectively. We have
since switched to Git and GitHub but the screencasts do not yet reflect this.
Do not be alarmed. We will re-record these using Git/GitHub when time
permits."

~~~
s-ben
I'm the tech writer Basho just hired to work on this. Let me just say that I
feel your pain like Bill Clinton. Setting up my own system using the Riak wiki
has been less than pleasant. There's lots of good content, but also lots of
examples like the one you point out. We're overhauling the Fast Track soon.

------
robotmay
If Riak was available as an on-demand service on any cloud provider other than
Joyent then I would have jumped on it a while ago. If I ever had anything
large enough to warrant it then I'd build my own cluster, but it's just not
worth the effort/cost on a smaller build.

------
rb2k_
Last time I checked, the Ruby driver for Riak didn't properly support secondary
indexes. Looking at <https://github.com/basho/riak-ruby-client> and
<https://github.com/seancribbs/ripple>, I can't see it there today either.

I'm sure that I might be able to find it in the spec folder somewhere, but I
think it's a great feature and putting it in the readme would help a lot of
people

~~~
seancribbs
@rb2k_ Here's the relevant docs:

<http://rdoc.info/gems/riak-client/Riak/RObject#indexes-instance_method>
<http://rdoc.info/gems/riak-client/Riak/Bucket#get_index-instance_method>

------
Cloven
I think the Riak pages are not geared at the people making the selection,
e.g., architect-level technologists. The wiki is full of marketing claims
('Riak is the most powerful open-source, distributed database you'll ever put
into production') ('Riak is the most boring database you’ll ever run in
production').

But then suddenly:

    curl -v -X PUT -d '{"bar":"baz"}' -H "Content-Type: application/json" \
      -H "X-Riak-Vclock: a85hYGBgzGDKBVIszMk55zKYEhnzWBlKIniO8mUBAA==" \
      http://127.0.0.1:8091/riak/test/doc?returnbody=true

Now, I understand why that is the way that it is -- the Riak guys are
simultaneously very proud of their product, and also extremely technical --
but it's missing the middle ground. The middle ground is where you explain the
characteristics of the system in non-marketing terms, describe what it's good
at and what one could reasonably expect it to do, and describe also where it
fails horribly and what you should not try to use it for. Once that's been
outlined, _then_ bring out X-Riak-Vclock.

By comparison, e.g., redis.io has simple pages describing every command in the
system and, critically, the associated algorithmic complexity and discussion
of likely issues and problems. It describes what performance expectations are
likely to be achieved on commodity hardware. And it allows you to test out
your thoughts in real time right there on the page.

Personally, I have very little idea how, e.g., Riak compares to Redis. And I
built Erlang and Riak from source and did the tutorial. I don't have a sense
for how many ops/sec Riak can manage, what the equivalents of sinterstore and
sunion look like, or what the minimal real architecture for a production box
setup should be. And a lot of the tutorial fills me with The Fear.

Which brings me to the last point: The Fear of Riak is pretty strong, and
that's because very few people are running Erlang on purpose in production. A
lot of developers (and certainly devops people) have a hard enough time with
their existing stack, without bringing on an entirely alien software, logging,
alerting, monitoring, managing, and developing stack, and trying to understand
how to reason about it. And, even those developers who can work up the courage
to dive into Erlang will have to deal with the fact that they will be novices
for quite a long time on an extremely technical product that is designed to be
at the core of their world.

~~~
nirvana
I think the point you bring up about comparing the ops/sec between Riak and
Redis is very interesting in multiple ways:

1\. Riak has an in-memory database mode (it's one of the backends you might
choose, and you can run multiple backends simultaneously), but most people
don't know about it. I bet you were thinking of a comparison with the
disk-based backends (though I think of Redis as an in-memory database).

2\. It's a typical, expected question, but it also betrays a profound ignorance
about the nature of scalability (sorry, not calling you ignorant, just saying
the culture of technologists is kinda ignorant): Redis exists only on one
node, while Riak is distributed.

Redis on one node vs Riak on 100 nodes is going to be one hell of a comparison
in favor of Riak!

But everywhere you look people are doing benchmarks on single nodes. MongoDB
doesn't scale in a homogeneous distributed way, but people think it's faster
than Riak because its single-node performance is higher (I presume).

3\. The people who are making these decisions do not understand what they are
doing, I think. Being afraid of Erlang because you have trouble managing your
existing stack is like being afraid of a Volvo because your current car is
unsafe. Stability and manageability are Erlang's hallmarks, but most people
are kinda ignorant of this (though of course it does work in a different way
than typical software).

Not to say your points are not good, they are, but that I think there's a lot
of education that is needed, and thus there is a gap that riak has to bridge.

How do you show your superiority to people whose prejudices or
misunderstandings make them unable to recognize it?

~~~
moe
_How do you show your superiority to people whose prejudices or
misunderstandings make them unable to recognize it?_

By drawing a handful of very simple diagrams.

"This is what a database that you may know looks like". "And this is how our
database looks in comparison".

And by elaborating with an HTML <table>: "Our Database" vs "Their databases".

It's not rocket science, really.

------
DavidAbrams
Task 1: Tell us what it is.

Task 2: Tell us why it's better than Redis.

Meanwhile, at least they spell "key/value" correctly (it's not "key-value").

