
Canonical drops CouchDB because they were unable to make it scale up - patrickaljord
http://linux.slashdot.org/story/11/11/22/171228/canonical-drops-couchdb-from-ubuntu-one
======
Argorak
I had the chance to speak to one of the Couchbase guys about Ubuntu One-like
systems at Couchconf, and gained some (limited) knowledge. Be aware that some
of this is guesswork, as I don't know Ubuntu One all too well.

First things first: I think the failure of "really large systems" does not
mean that the underlying technology is bad - most likely, it was a wrong pick
for all the specific cases that this product needs. The number of variables is
just so high that what looks like a good pick at first is bad in hindsight. As
far as I know, Ubuntu One is one of the largest CouchDB based systems. (Zynga
being the other)

As far as I learned, it is an Authorization/Authentication issue, more than a
performance issue. So, the proposed solution for such systems is to use (at
least) one CouchDB database per user - CouchDB supports authentication per
Database, so this usage is perfectly fine. So each user gets n Databases that
are named after some clever scheme (let's say "{username}/{contacts}"; CouchDB
allows "/" in database names). Actually, as far as I learned, CouchDB handles
this without major problems. The data model is also great: you can replicate
data between all the user's cell phones, desktops, etc. just by replicating the
correct database. So far, no problems.
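
A minimal sketch of the one-database-per-user scheme described above, in Python. The helper names (`user_db_name`, `db_url`, `security_doc`) and the server URL are made up for illustration and are not from Ubuntu One. The name pattern follows CouchDB's documented rule (lowercase letter first, then lowercase letters, digits, or `_$()+-/`), and the security object uses the `members` key, which older CouchDB releases called `readers`.

```python
import re
from urllib.parse import quote

# CouchDB database names: a lowercase letter first, then lowercase
# letters, digits, or any of _ $ ( ) + - /
DB_NAME_RE = re.compile(r"[a-z][a-z0-9_$()+/-]*")


def user_db_name(username: str, collection: str) -> str:
    """Build a '{username}/{collection}' database name (hypothetical scheme)."""
    name = f"{username}/{collection}"
    if not DB_NAME_RE.fullmatch(name):
        raise ValueError(f"not a valid CouchDB database name: {name!r}")
    return name


def db_url(server: str, db_name: str) -> str:
    """The '/' inside a database name has to be sent as %2F in the URL path."""
    return f"{server.rstrip('/')}/{quote(db_name, safe='')}"


def security_doc(owner: str) -> dict:
    """Body for PUT /{db}/_security that restricts the database to one user."""
    return {
        "admins": {"names": [], "roles": []},
        "members": {"names": [owner], "roles": []},
    }


if __name__ == "__main__":
    name = user_db_name("alice", "contacts")
    print(db_url("http://localhost:5984", name))
    print(security_doc("alice"))
```

The access rule lives on the database as a whole, which is exactly why this works well per user and breaks down once data has to be shared.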

The problem is sharing. So let's say I want to share my business contacts with
my co-founder. CouchDB only allows database-level authentication, so once I
give my co-founder access, he will see all of my contacts. This includes the
Replication API: once I have access, I can basically slurp the whole database
(filters cannot be enforced). So, as you can manage a whole universe of
databases, the solution here is simple: set up another database, say
"{myuser}-{mycofounder}/shared_contacts", give both of us access, and set up
filtered push-replication in my database to the other database. So, now the
source replicator can be trusted to be mine. So, suddenly, my nice "To the
Cloud? Out of the door, left line, one database each"-system turns into a
really big graph where every relationship between datasets is a database
itself, along with many processes caring for moving data along those lines.
Also, once my data is shared with my co-founder, it is basically public, as I
readily copied it to him - deletion becomes a messy topic. (As long as the
replication chain is intact, deletions are propagated, but honestly: who wants
to support such a system?)
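
The shared-database workaround above could look roughly like this. It is a sketch under assumptions: the `shared_with` field on documents, the `sharing` design document, and the helper names are invented for illustration. The parts that are CouchDB's own are the `filters` section of a design document, the `function(doc, req)` filter signature, and the `source`, `target`, `filter`, `query_params` and `continuous` fields of a `_replicate` request.

```python
import json

# JavaScript filter stored in a design document of the owner's database.
# Assumption: shared documents carry a 'shared_with' list of usernames.
# A plain deletion stub has no such field, so this filter would not
# forward it unless the field is kept on the deleted revision.
FILTER_JS = """function(doc, req) {
  return !!doc.shared_with &&
         doc.shared_with.indexOf(req.query.user) !== -1;
}"""


def shared_db_name(owner: str, partner: str, collection: str = "shared_contacts") -> str:
    """Name of the extra database that exists only for this one relationship."""
    return f"{owner}-{partner}/{collection}"


def filter_design_doc() -> dict:
    """Design document holding the replication filter."""
    return {"_id": "_design/sharing", "filters": {"shared_with": FILTER_JS}}


def push_replication_request(owner: str, partner: str) -> dict:
    """Body for POST /_replicate, run on the owner's side so the filter is trusted."""
    return {
        "source": f"{owner}/contacts",
        "target": shared_db_name(owner, partner),
        "filter": "sharing/shared_with",
        "query_params": {"user": partner},
        "continuous": True,
    }


if __name__ == "__main__":
    print(json.dumps(filter_design_doc(), indent=2))
    print(json.dumps(push_replication_request("me", "cofounder"), indent=2))
```

Every sharing relationship needs its own target database and its own long-running replication like this one, which is how the simple per-user layout grows into the graph described above.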

So, along those lines, one big problem becomes obvious: CouchDB does not
support document-level authentication. Considering the data model of CouchDB
(basically, views are aggregations of the global document store), this is also
a hard thing to do, because it means that every view has to be filtered per
user. On the upside: the Couchbase guy also said that they would really like to
support it.
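
To make the view problem concrete, here is a toy in-memory illustration in plain Python. It does not use CouchDB at all, and the `readers` field is a made-up stand-in for a per-document access list. It only shows that an aggregate built once over all documents gives a different answer than one built from the documents a given user may read, so a single shared index cannot simply be served to everyone.

```python
from collections import Counter

# Toy document store; 'readers' is a hypothetical per-document ACL.
DOCS = [
    {"_id": "c1", "type": "contact", "readers": ["me"]},
    {"_id": "c2", "type": "contact", "readers": ["me", "cofounder"]},
    {"_id": "n1", "type": "note", "readers": ["me"]},
]


def count_by_type(docs):
    """A 'view' in the CouchDB sense: one aggregate over every document."""
    return dict(Counter(doc["type"] for doc in docs))


def count_by_type_for(docs, user):
    """The same aggregate, restricted to documents the user may read."""
    visible = [doc for doc in docs if user in doc["readers"]]
    return count_by_type(visible)


if __name__ == "__main__":
    print(count_by_type(DOCS))                   # global index
    print(count_by_type_for(DOCS, "cofounder"))  # what this user may see
```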

~~~
skrebbel
Cool story, and seems to make a lot of sense.

It brings up one thing I've been wondering about for a while already - I've
always found that CouchDB's authentication feels more "bolted on" than
anything else, and this is a nice use case where it doesn't fit.

I love Couch, but I'd have loved for the authentication scheme to be an
entirely separate layer, more customizable and programmable, less "one per
database, period.".

Such a design would probably cause all kinds of other problems again, though,
but I wonder to what extent this has been thought through.

------
mbreese
Here's the comment that John Lenton from Canonical made about this on /.

[http://linux.slashdot.org/comments.pl?sid=2539244&cid=38...](http://linux.slashdot.org/comments.pl?sid=2539244&cid=38138754)

------
mcs
I'm curious if they tried to employ BigCouch, the dynamo-esque fork of
CouchDB.

~~~
mark_l_watson
+1 - good idea. BigCouch is very nice, and based on Cloudant's experiences
with it (they wrote it), it seems to scale to handle very large data
customers.

~~~
jwhitlark
In my experience, BigCouch isn't mature enough for prime time yet. Admin of
it can get very tricky.

~~~
itaborai83
I'm curious. Would you be willing to elaborate a little more?

------
j45
So.... old technologies suck because they're stable and scale, and new
technologies suck because they're fun but aren't stable when they scale?

------
mattadams
This might sound like a big deal but it shouldn't be a headliner. Whatever the
details (and we don't have many) companies use and drop technologies on a
fairly regular basis. Sometimes it's a good fit and sometimes it's not.
Obviously in this case Couch didn't do everything Canonical needed (I think
someone else actually pointed out that Canonical mentioned that their needs
were unique).

For every Canonical that drops Couch there will be 10s of other companies that
adopt it because it's a good fit there. All this should reinforce is that
every tool has a good fit and that smart implementors pick the one that jibes
best or move to a better one when the opportunity presents itself.

------
bconway
Original link: [http://www.h-online.com/open/news/item/Canonical-dropping-Co...](http://www.h-online.com/open/news/item/Canonical-dropping-CouchDB-from-Ubuntu-One-1382809.html)

------
nirvana
This article is a good example of how myths are created and engineering
ignorance is perpetuated.

CouchDB doesn't "scale"? If you're trying to "scale" with it, you don't know
what you're doing in the first place. CouchDB federates. That's a wholly
different thing. And in terms of federated databases, I challenge anyone to
come up with one as good or better than CouchDB. (And if you do, it will be
news to me, and I'll thank you profusely!)

If it's not obvious to you how to scale a federated database, then it's not
CouchDB that can't scale, it's you. (which is ok, everyone has to learn
sometime, just don't put forth your lack of knowledge as proof of a weakness
in an open source product!)

Further, rather than just saying "We've got this great new invention-- a
better technology, and we're moving to that!" the message seems to be "we are
just wanting to re-invent the wheel, so to justify it, we have to make a
negative claim about CouchDB."

Now, I expect some particular database's[1] fans to tell us, in the future,
that "couchDB doesn't scale".

Ironically, they're punting on CouchDB to use, among other possibilities,
SQLite. To claim that "Scaling" is the problem is .... bad engineering form.

CouchDB is great if you want to federate, have databases across the planet
talking to each other and keeping in sync (it's almost a turnkey CDN in a way),
want to run a noSQL DB on a mobile device, etc.

MongoDB is great if you care about SQL and single node performance and its
complex distribution mechanism works for you.

IF you want "scale" your choices are Riak or CouchDB-- for "scale" where
homogenous distributed servers are the best solution.

And of course there's Cassandra and graph databases, etc. which provide
different solutions to scalability.

IF you're serious about scalability, I strongly recommend people look at and
choose Riak. I don't think anything out there touches it-- at least for the
type of data I need. Cassandra and what I consider the "more complicated"
alternatives might fit your particular problem type well. And if you think
that it's silly of me to recommend Riak then this is probably the case for you.
But in terms of general databases, Riak seems to be pulling away from the
pack. IF you're a fan of CouchDB, then BigCouch is a dynamo/Riak like version
of it that I understand to be quite good. Plus, since it's based on CouchDB, if
the CouchDB way of doing queries (which is distinctly different from Riak)
fits your way of working, then BigCouch deserves a look.

But please, don't ever say "couchDB doesn't scale". If you do, really it's that
you don't scale, CouchDB is fine.

[1] In an earlier edit I named a database. That was a mistake: not only is it
bad form, I don't think that my characterization is appropriate at this time,
as that database's fans are not as rabid as I imply. Apologies.

~~~
PanMan
I have read quite some things on the different larger key-value stores,
especially on how they scale. And from what I have seen, I really like Riak as
well. However, we have been setting it up over the last few weeks, and so far it's
less stable than I have hoped/expected: we have had nodes crash for no
apparent reason. I hope we can resolve them, as I really like the model,
especially the horizontal scaling, but it must be stable to use...

~~~
arielweisberg
There are worse things than crashing. Like soldiering on and corrupting data.

The most unstable clustered database I have ever come across was suffering
from broken TCP drivers. Never assume a cause until you have actually tracked
it down.

I agree with itaborai83 that individual node crashes shouldn't be as big a deal
with the consistency model and redundancy offered by Riak. That is one reason
you might go with Riak over something that offers stronger consistency, but is
more picky about node crashes and recovery.

------
ecommando
Another one bites the dust, another one bites the dust, bamp bamp, another one
bites the dust.

~~~
va_coder
Mr. Ellison, you're the MS of databases. Don't rejoice too much.

~~~
ecommando
Awww.. you mad... I'm so sick of hearing about all these "new" technologies
that are so revolutionary, but can't hold a candle to postgres and memcache.

Despite what the brochures tell you, a "degree" from DeVry and 3 hours in a
Ruby book doesn't make an architect, and this is one of several "revolutionary
technologies", like Ruby, that won't scale and will wither and die.

~~~
edu
Exactly, how does a programming language scale or not-scale? Ruby might be
slower and hungrier (memory) than other languages but it's the applications
that might or might not-scale.

~~~
j45
Assuming you have the most efficient code possible, the efficiency and speed
of the interpreter can still affect the ability to scale an app.

Another thought is, the interpreter may or may not have scaling / clustering
available, be it through a built-in functionality or an external queue or
something.

Sometimes we have to code around interpreter/speed issues. I work in the JVM
languages sometimes. I have to do things differently directly in Java if a
particular JVM language I use isn't cutting it. Same goes for ".NET", which
covers over 30 languages.

