

CorrugatedIron - The New .NET Client for Riak - pharkmillups
http://corrugatediron.org/

======
equark
Is there a reason connecting needs so much boilerplate compared to Redis's
Booksleeve client?

Here's Redis: <http://code.google.com/p/booksleeve/>

    
    
      using (var conn = new RedisConnection("localhost"))
      {
          conn.Open();
          conn.Set(12, "foo", "bar");
      }
    

You guys seem to suggest:

    
    
      <configSections>
        <section name="riakConfig" type="CorrugatedIron.Config.RiakClusterConfiguration, CorrugatedIron"/>
      </configSections>
      <riakConfig>
          <nodes>
              <node name="dev1" hostAddress="riak-test" pbcPort="8081" restPort="8091" poolSize="20" />
              <node name="dev2" hostAddress="riak-test" pbcPort="8082" restPort="8092" poolSize="20" />
              <node name="dev3" hostAddress="riak-test" pbcPort="8083" restPort="8093" poolSize="20" />
          </nodes>
      </riakConfig>
    

And then,

    
    
      var clusterConfig = RiakClusterConfiguration.LoadFromConfig("riakConfig");
      var cluster = new RiakCluster(clusterConfig, new RiakConnectionFactory());
      var client = cluster.CreateClient();
      var value = client.Get("my_bucket", "my_key");

~~~
OJ
Hi equark,

Thanks for the question. Yes, there are reasons for the differences between
the CorrugatedIron configuration and that of the project you've linked to. To
understand them, I think it's important to look at the differences between the
two services being used, and at the goals of the libraries that connect to
them. Please bear in mind that I'm no expert with Redis, nor with the
libraries that connect to it.

Production Riak configurations are clustered, so it makes sense to distribute
connections across the entire cluster to avoid hammering one node. Clusters
are often hidden behind some kind of proxy or load-balancer such as HAProxy.
Hiding behind a proxy is a great way of removing the need for clients to worry
about which nodes are in the cluster and about managing their lifetimes.

In the .NET world I haven't yet seen many (or any) production configurations
where a proxy such as HAProxy sits between an application and its database.
I'm sure there are many reasons for this, including the built-in clustering
for MS SQL (the most popular RDBMS choice behind .NET applications) or the
lack of a cluster full stop.

When putting together the design for CI, we thought about what setup people
are likely to have when they reach production. We came to the conclusion that
most people would be interested in reducing the number of "moving parts" and
avoiding the need to put a proxy in place just to load-balance connections
across a Riak cluster. As a result, we decided to build this into the client
library itself, removing the need to have a proxy installed. This doesn't,
however, force you to use the built-in balancing. If you do have a proxy
application which is load-balancing connections to your Riak cluster, then you
simply modify the CI configuration so that there is just one node in the
cluster and point it at the proxy.
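For example, if HAProxy already fronts the cluster, the sample configuration above collapses to a single node pointing at the proxy. This is a sketch using the same schema as the sample; the host name and ports here are placeholders for wherever your proxy listens:

```xml
<riakConfig>
    <nodes>
        <!-- "riak-proxy" is a placeholder for your HAProxy front-end;
             the proxy fans connections out to the real Riak nodes. -->
        <node name="proxy" hostAddress="riak-proxy" pbcPort="8087" restPort="8098" poolSize="20" />
    </nodes>
</riakConfig>
```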

This design allows both rich client and web applications to connect to the
whole cluster without the need of a proxy. The "problem" with this design is
that the cluster needs to be configured, and in the .NET world that is
generally a little verbose.

The specification of the configSection is something that is required in .NET
if you want your configuration to be included in app.config or web.config
files. Forcing people to have separate config files would not make sense and
would go against convention in the .NET world. However, if you do want to have
your own configuration file outside of the usual locations, we support that
too (just pass in the full path to the config file as an extra parameter to
RiakClusterConfiguration.LoadFromConfig()).
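Based on that description, loading from a custom config file would look something like the following sketch (the file path is purely illustrative):

```csharp
// Load the "riakConfig" section from a custom config file rather than
// the default app.config/web.config. The path below is illustrative only.
var clusterConfig = RiakClusterConfiguration.LoadFromConfig(
    "riakConfig",
    @"C:\my-app\riak.config");

// From here, usage is identical to the app.config case.
var cluster = new RiakCluster(clusterConfig, new RiakConnectionFactory());
var client = cluster.CreateClient();
```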

Before embarking on the development of CI, we wanted to make efforts to reduce
the amount of management that the user had to do. By management I mean
handling the lifecycle of connections, freeing up resources, figuring out
which node to connect to, etc.

The net effect of this effort is that once a cluster has been instantiated
(which, mind you, should only be done once for the entire application's
lifetime) the user can pull client instances out of the cluster without having
to think at all about connection management. Getting hold of a client becomes
as simple as:

    
    
        var client = cluster.CreateClient();
    

RiakClient instances are not disposable. Behind the scenes, RiakClient API
calls use higher-order functions to avoid the leaky abstraction that is
disposable resources, so the RiakClient object doesn't have to manage
connection lifetimes either. Get a client, use it, and don't worry about
cleaning up after yourself, because we have that covered already. This gets
rid of a lot of fluff like:

    
    
        using(var client = ....)
        {
          client.DoStuff();
        }
    

.. and turns it into:

    
    
        client.DoStuff();
    

It might be just me, but I do prefer reducing "boilerplate" throughout the
user's codebase and having just a little more in the up-front configuration
than the other way around. This also gives us the ability to load-balance
calls across the nodes in the cluster without the user having to think about
it. Again, I think this makes the API cleaner, easier to use and reduces the
chances of boilerplate leaking out into user code.

I think comparing this design to that which you have linked to, the
RedisConnection, isn't really comparing apples with apples. But for the sake
of discussion the RedisConnection, behind the scenes, obviously has some
default values (such as port number) hidden away. We do the same: everything
but the node name and the host address has defaults behind the scenes, so a
node could be reduced to something as simple as <node name="foo" host="bar" />. These options aren't
hidden in the sample configuration so that people can see the options that are
available to them. Also note how the user of this connection needs to know, in
code, about the host they're connecting to. Of course, this could be specified
via configuration too, but it has to go somewhere, and that will result in
boilerplate. You're also managing the connection yourself, and the users of
the client run the risk of leaking resources if they forget to clean up at the
right time. Lastly, the RedisConnection doesn't appear to be pooled (please
correct me if I'm wrong) and hence managing the number of connections to your
Redis instance is up to the API user. In the case of Redis, this might be ok,
but in production, it tends to be a good idea to limit the number of
connections to services as network resources can be expensive.

So, just to reiterate, the design goals of both of these clients are obviously
quite different. CI configuration is only done once up-front, and from there
client instances are created simply by calling `cluster.CreateClient()`,
management of resources is done for you, and it has built-in load balancing.
Very different to Booksleeve from what I can tell (again, happy to be
corrected).

I hope this helps clarify our stance and helps you understand why and where
the trade-offs were made.

Thanks again for the question.

OJ (TheColonial).

~~~
equark
Thanks for the detailed response. First, I'm glad you guys are doing this.
It's great; I didn't mean to be overly negative.

I'd just lean towards having sensible defaults, so that the first experience
is:

    var client = new RiakClient();

I just get nervous when I see what seems like it should be a very simple API
using ConnectionFactory and company.

~~~
OJ
I agree with the point about the ConnectionFactory. That is something I still
don't like. It's there, at the moment, to make it easy to test certain parts
of the app. We'll look to add an overload so that we don't have to specify it
in non-testing scenarios.

As a quick side-note, something as simple as:

    
    
        var client = new RiakClient();
    

is nice, but has to infer a _lot_ from the world. We could write a lot of code
to make sure that it's smart enough to pull all the details from the default
locations or we could let people tell us where to look. We went for the
latter, as it's much safer to be told than to pull config from the wrong spot.

Again, we'll look to improve this, particularly as we get more feedback from
people using the library.

Thanks!

------
peschkaj
To everyone on this thread: thanks for all of your input. We're already
working on the next release of CorrugatedIron and we're taking your feedback
into account. Seriously. We want to make sure that CorrugatedIron is as easy
for you to use as it is for us to blog about.

So, thanks already for taking a look at our code and poking at it.

