I had the chance to speak to one of the Couchbase guys about Ubuntu One-like systems at CouchConf, and gained some (limited) knowledge. Be aware that some of this is guesswork, as I don't know Ubuntu One all too well.

First things first: I think the failure of "really large systems" does not mean that the underlying technology is bad - most likely, it was the wrong pick for the specific cases this product needs to handle. The number of variables is so high that what looks like a good pick at first turns out to be a bad one in hindsight. As far as I know, Ubuntu One is one of the largest CouchDB-based systems (Zynga being the other).

As far as I learned, it is an authorization/authentication issue more than a performance issue. The proposed solution for such systems is to use (at least) one CouchDB database per user - CouchDB supports authentication per database, so this usage is perfectly fine. Each user gets n databases that are named after some clever scheme (let's say "{username}/{contacts}"; CouchDB allows "/" in database names). As far as I learned, CouchDB handles this without major problems. The data model is also great: you can replicate data between all of a user's cell phones, desktops etc. just by replicating the correct database. So far, no problems.
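
To make the one-database-per-user scheme a bit more concrete, here is a minimal sketch that only builds the database name, its URL path and a security document; it does not talk to a server. The helper names (user_db_name, db_url_path, security_doc) are made up for illustration, and the "members" key is what newer CouchDB releases use in the per-database _security object (older releases called it "readers", as far as I know).

```python
import re
from urllib.parse import quote

# CouchDB database names start with a lowercase letter and may contain
# lowercase letters, digits and the characters _ $ ( ) + - /
_DB_NAME = re.compile(r"^[a-z][a-z0-9_$()+/-]*$")


def user_db_name(username, kind):
    """Hypothetical naming scheme: one database per user and data kind."""
    name = f"{username}/{kind}"
    if not _DB_NAME.match(name):
        raise ValueError(f"not a valid CouchDB database name: {name!r}")
    return name


def db_url_path(name):
    """The "/" in a database name has to be sent as %2F in the request URL."""
    return "/" + quote(name, safe="")


def security_doc(username):
    """Body for PUT /{db}/_security: only this user may access the database."""
    return {
        "admins": {"names": [], "roles": []},
        "members": {"names": [username], "roles": []},
    }


name = user_db_name("alice", "contacts")
print(name, db_url_path(name), security_doc("alice"))
```

The point is that access is decided once per database, so the naming scheme is effectively the permission model.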

The problem is sharing. Let's say I want to share my business contacts with my co-founder. CouchDB only allows database-level authentication, so once I give my co-founder access, he will see all of my contacts. This includes the replication API: once he has access, he can basically slurp the whole database (filters cannot be enforced on the reader's side). Since you can manage a whole universe of databases, the solution here is simple: set up another database, say "{myuser}-{mycofounder}/shared_contacts", give both of us access and set up filtered push-replication from my database to the other database. That way, the source replicator can be trusted, because it is mine. So, suddenly, my nice "To the Cloud? Out of the door, left line, one database each"-system turns into a really big graph where every relationship between datasets is a database itself, along with many processes caring for moving data along those lines. Also, once my data is shared with my co-founder, it is basically public, as I readily copied it to him - deletion becomes a messy topic. (As long as the replication chain is intact, deletions are propagated, but honestly: who wants to support such a system?)
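
A rough sketch of what the filtered push-replication setup might look like, assuming each contact document carries a (hypothetical) "shared_with" list. The dictionary mirrors the body of a POST to CouchDB's /_replicate endpoint (source, target, filter, query_params, continuous); the filter name "sharing/by_peer" is invented, and in a real setup the filter would be a JavaScript function in a design document, which shared_with() only imitates here.

```python
def replication_request(owner, peer, kind="contacts"):
    """Build the body for POST /_replicate, pushing from the owner's database
    into the database shared by owner and peer."""
    return {
        "source": f"{owner}/{kind}",
        "target": f"{owner}-{peer}/shared_{kind}",
        "filter": "sharing/by_peer",      # hypothetical design doc / filter name
        "query_params": {"peer": peer},   # handed to the filter function
        "continuous": True,
    }


def shared_with(doc, peer):
    """Python stand-in for the filter: pass only documents shared with peer."""
    return peer in doc.get("shared_with", [])


def simulate_push(docs, peer):
    """What would end up in the shared database after a filtered push."""
    return [doc for doc in docs if shared_with(doc, peer)]


my_contacts = [
    {"_id": "investor", "shared_with": ["bob"]},
    {"_id": "dentist"},
]
print(replication_request("alice", "bob"))
print(simulate_push(my_contacts, "bob"))
```

Every sharing relationship needs one such target database plus one running replication, which is where the graph of databases comes from.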

So, along those lines, one big problem becomes obvious: CouchDB does not support document-level authentication. Considering the data model of CouchDB (basically, views are aggregations of the global document store), this is also a hard thing to do, because it means that every view has to be filtered per user. On the upside: the Couchbase guys also said that they would really like to support it.
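
A toy illustration of why per-user filtering of views is awkward, in plain Python rather than CouchDB's actual view engine. The "readers" field on each document is a hypothetical per-document ACL: an aggregate computed once over the whole store is wrong for any individual user, so it would have to be computed (or stored) per user.

```python
docs = [
    {"_id": "c1", "type": "contact", "readers": ["alice"]},
    {"_id": "c2", "type": "contact", "readers": ["alice", "bob"]},
    {"_id": "c3", "type": "contact", "readers": ["bob"]},
]


def count_by_type(documents):
    """A view-like aggregate over every document it is given."""
    counts = {}
    for doc in documents:
        counts[doc["type"]] = counts.get(doc["type"], 0) + 1
    return counts


def count_by_type_for(documents, user):
    """The same aggregate, restricted to what a single user may read."""
    visible = [doc for doc in documents if user in doc["readers"]]
    return count_by_type(visible)


print(count_by_type(docs))
print(count_by_type_for(docs, "alice"))
```

The global result cannot be reused for alice or bob, so document-level permissions would mean one filtered result per user and view.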

Cool story, and seems to make a lot of sense.

It brings up one thing I've been wondering about for a while already - I've always found that CouchDB's authentication feels more "bolted on" than anything else, and this is a nice use case where it doesn't fit.

I love Couch, but I'd have loved for the authentication scheme to be an entirely separate layer, more customizable and programmable, less "one per database, period".

Such a design would probably cause all kinds of other problems, though, and I wonder to what extent this has been thought through.
