
Massive CouchDB Brain Dump - rayvega
http://blog.mattwoodward.com/massive-couchdb-brain-dump
======
duck
I've never thought about posting one of the many brain dumps I put in txt
files because I figured it would only make sense to me in such a rough
draft... but after reading this I will have to start. Very helpful overview of
all aspects of CouchDB / document-based storage.

------
po
"if you need a change in your schema, it's dead simple to do--just start using
the new schema if you don't care that the old documents don't have the new
field, you don't have to worry about them"

For me the scary part is that I usually _do_ care. How do you stop caring
about this? Doesn't this mean your application code starts to get populated
with if/then/else? Is there a best-practice for backfilling data when the
(effective) schema changes?

~~~
papaf
I've seen many systems that use relational databases be patched up with if
(null)/then/else in the application where fields have been added and the
change was not important enough to justify a data migration. I wouldn't say
this practice is OK, just that it happens quite a bit in my experience.

If I were using CouchDB I would assume that all but the most fundamental fields
were optional and have the application act accordingly.
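A minimal sketch of "treat everything but the fundamentals as optional", assuming documents arrive as plain Python dicts. The field names (`type`, `tags`, `email_verified`) and the `read_user` helper are hypothetical, chosen only for illustration; the point is that the defaults live in one place instead of being scattered through if/then/else checks.

```python
from typing import Any, Dict, Tuple

# Hypothetical "fundamental" fields every document is assumed to carry.
REQUIRED_FIELDS: Tuple[str, ...] = ("_id", "type")

# Hypothetical optional fields added later, with the default used when absent.
OPTIONAL_DEFAULTS: Dict[str, Any] = {
    "tags": [],
    "email_verified": False,
}


def read_user(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of doc with required fields checked and optional ones defaulted."""
    missing = [f for f in REQUIRED_FIELDS if f not in doc]
    if missing:
        raise ValueError(f"document is missing required fields: {missing}")
    # Start from the defaults (copying lists so callers can't mutate the shared
    # default), then overlay whatever the stored document actually contains.
    view = {k: (list(v) if isinstance(v, list) else v) for k, v in OPTIONAL_DEFAULTS.items()}
    view.update(doc)
    return view


old_doc = read_user({"_id": "u1", "type": "user"})
new_doc = read_user({"_id": "u2", "type": "user", "tags": ["admin"]})
```

An old document written before `tags` existed reads back with an empty list, so downstream code never has to check for the field's presence.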

~~~
po
Sure... I've seen that a ton. I've had to resort to writing a bunch of code
like that as well.

As an example of what I mean, there are some places where you can simplify by
knowing that a relationship is 1:1 instead of 1:n because based on the current
schema it's _impossible_ for it not to be 1:1. You don't need an if/then or
any logic in your application when you access that field.

This means if you want to make it 1:n you have to change the schema and deal
with all of the code that will break due to that change. Once that is done,
then your application can input lists of items into the storage.

In the document-oriented approach, it seems like you (or any code touching the
document type) just start inputting lists. This makes changing the schema a
non-issue. It is up to the code pulling data to always assume the data could
be null, single or a list of items.
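One common way to make "null, single or a list" tolerable is to normalize at the point of reading, so the rest of the code only ever sees a list. A minimal sketch, assuming dict-shaped documents; the `address` field and the `as_list` helper are hypothetical.

```python
from typing import Any, List


def as_list(value: Any) -> List[Any]:
    """Normalize a field that may be missing (None), a single value, or a list."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


# Older documents stored one address, newer ones store a list, some have none.
docs = [
    {"_id": "a", "address": "1 Main St"},
    {"_id": "b", "address": ["2 Oak Ave", "3 Pine Rd"]},
    {"_id": "c"},
]

for doc in docs:
    for addr in as_list(doc.get("address")):
        print(doc["_id"], addr)
```

This prints one line per address (`a 1 Main St`, `b 2 Oak Ave`, `b 3 Pine Rd`) and nothing for `c`, so the 1:1-to-1:n change is absorbed by a single helper rather than every caller.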

In the CouchDB model it seems the schema is implicitly defined by the behavior
of the application. I don't really know what that means for the application.
I'm sure the tradeoff is worth it in a lot of cases, I'm just saying that this
is one of the parts where my imagination is failing me and the unknown-
unknowns are great.

It seems like here the skills of writing a great REST API are more relevant
than traditional data modeling skills.

------
jmah
I found Damien Katz's talk about quitting and risking it on CouchDB to be
really great (from resources):
<http://www.infoq.com/presentations/katz-couchdb-and-me>

------
andrewvc
I'd be interested to hear a little bit about choosing between CouchDB,
MongoDB, and others. Couch and Mongo are the two I'm most interested in, but I
haven't had the time to really look at either closely.

~~~
nicpottier
I'd definitely look at Mongo first. Couch is ridiculously simple, too much so.
It doesn't do locking because it versions every single record, so if you do
updates very frequently you are going to have to clean your DB really often.
Their queries are less flexible than Mongo's as well.
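To see why frequent updates mean frequent cleanup, here is a toy in-memory model, not CouchDB's actual storage engine. In CouchDB an update must supply the document's current `_rev` or it is rejected with HTTP 409 Conflict, and old revisions accumulate on disk until the database is compacted (`POST /{db}/_compact`). The sketch below mimics that with integer revisions and a hypothetical `compact()` that keeps only the latest body per document; real CouchDB keeps the revision tree for replication and uses string revision IDs, which this simplification ignores.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


class Conflict(Exception):
    """Raised when an update is based on a stale revision (CouchDB answers HTTP 409)."""


@dataclass
class ToyDocStore:
    # doc id -> list of (revision number, body); updates append, nothing is overwritten
    history: Dict[str, List[Tuple[int, Dict[str, Any]]]] = field(default_factory=dict)

    def put(self, doc_id: str, body: Dict[str, Any], rev: Optional[int] = None) -> int:
        revs = self.history.get(doc_id, [])
        current = revs[-1][0] if revs else None
        # Optimistic concurrency: the caller must name the revision it is replacing.
        if rev != current:
            raise Conflict(f"{doc_id}: expected rev {current}, got {rev}")
        new_rev = (current or 0) + 1
        revs.append((new_rev, dict(body)))
        self.history[doc_id] = revs
        return new_rev

    def get(self, doc_id: str) -> Tuple[int, Dict[str, Any]]:
        return self.history[doc_id][-1]

    def stored_revisions(self) -> int:
        return sum(len(revs) for revs in self.history.values())

    def compact(self) -> None:
        # Drop every superseded revision, keeping only the latest one per document.
        for doc_id, revs in self.history.items():
            self.history[doc_id] = revs[-1:]


store = ToyDocStore()
rev = store.put("counter", {"n": 0})
for i in range(1, 1001):
    rev = store.put("counter", {"n": i}, rev=rev)
before = store.stored_revisions()  # 1001 revisions for a single document
store.compact()
after = store.stored_revisions()   # back down to 1
```

A thousand updates to one document leave a thousand and one stored revisions until compaction runs, which is the cost the comment is pointing at.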

Couch is interesting in a way, but I feel like the set of problems that fit it
well is far smaller than Mongo's (and that Mongo's set is still far smaller than
SQL's, but that's another matter).

Mongo is also way way faster, though the API isn't as 'cute' by far. I feel
like the great web UI for couch, along with their heavy JavaScript/JSON usage,
is the main reason it gets so much traction.

~~~
andrewvc
Thanks for the helpful info, I do still have one reservation about mongo
however.

There is one thing that couch does better than mongo, and that's single-server
durability. You can pretty much kill -9 couch and it won't care (that's
pretty much how it stops itself).

Now, I understand mongodb's stance on why you should be running multiple
servers for your high performance app, but for a lot of apps that don't need
HA that's just overkill and a pain to configure (and extra $$$ for more
hardware). In fact, I'd go so far as to say thats the case for most apps. If I
recall they're working on it at the moment though.

~~~
lurkerperpetual
They have a syncdelay parameter which you can tweak to flush to disk as often
as you want <http://www.mongodb.org/display/DOCS/Durability+and+Repair>

~~~
andrewvc
that's all I needed to hear, I'll likely go with mongodb for my next project
now.

