
Why CouchDB? - bitdiddle
http://books.couchdb.org/relax/why-couchdb
======
chime
I've read tons of documents, articles, and how-tos on schema-free non-RDBMS
DBs like SimpleDB, CouchDB, BigTable, DBSlayer etc. by now and I agree they
seem very interesting and powerful. However, I just don't see how having each
document (e.g. invoice) have its own field structure would be a good thing.
Every database I've ever worked on is structured and every new document has
almost always the same data as every other document. That is not because of
the limitations of the database but rather the nature of the business. Every
invoice should have the same exact fields as every other invoice because
employees are trained to fill and understand the implications of the data in
each field. This has a lot more to do with business process efficiency and
nothing to do with computers. RDBMS just facilitate that business rigidity
more naturally. Of course, nobody's saying I can't do this in CouchDB but then
why use CouchDB if I don't need the fluid document structure?

Also, the biggest question I have is, how powerful are the dynamic views? I
have tables with 10m rows and often serve 100+ select queries per second with
joins/where/having/order/group-by/full-text-search functions. The business
goal is not to make group-by queries but to summarize data in real-time as the
analysts want e.g. sales by product per day or shipments by employee per week.
How fast would a view that performs an equivalent function work on CouchDB?
Can I optimize it somehow? I don't see much mention of index columns
anywhere.
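
To make the group-by question concrete, here is a minimal sketch in plain Python of how a map/reduce view could compute sales by product per day. It only simulates the idea and does not talk to CouchDB, where views are typically written as JavaScript map and reduce functions. The document fields (`product`, `date`, `amount`) and the helper `build_view` are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical invoice documents; the field names are assumptions.
invoices = [
    {"_id": "inv-1", "product": "widget", "date": "2008-11-03", "amount": 300},
    {"_id": "inv-2", "product": "widget", "date": "2008-11-03", "amount": 150},
    {"_id": "inv-3", "product": "gadget", "date": "2008-11-03", "amount": 80},
    {"_id": "inv-4", "product": "widget", "date": "2008-11-04", "amount": 20},
]


def map_invoice(doc):
    """Map step: emit one (key, value) row per document."""
    yield (doc["product"], doc["date"]), doc["amount"]


def reduce_sum(values):
    """Reduce step: collapse all values that share a key."""
    return sum(values)


def build_view(docs, map_fn, reduce_fn):
    """Run map over every document, group rows by key, reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    # Sorting by key mimics a view that is stored in key order.
    return {key: reduce_fn(vals) for key, vals in sorted(grouped.items())}


sales_by_product_per_day = build_view(invoices, map_invoice, reduce_sum)
```

The emitted key plays roughly the role an index column plays in SQL: rows are grouped and ordered by it. How this performs on 10m documents is not something the sketch can answer.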

What bugs me most is that when dealing with databases like CouchDB and
SimpleDB, the "best practice" advice I hear is to just write summary data and
search indexes on my own. That is, every time a new invoice is generated, add
+1 to the summary.count.invoice record, add +300 to the
balance.customer.acme-corp record and -5 to the inventory.widget record. And
don't forget to add every unique word in the document to the search index
table. That would be great if I knew the exact business needs ahead of time,
but in reality you first capture all the data you can and then generate
reports based on the needs of the user.
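
As a rough illustration of that "maintain it yourself" advice, the sketch below updates the summary records and a word index by hand on every new invoice. The record names follow the ones mentioned above; the invoice fields, the starting inventory of 100, and the `record_invoice` helper are assumptions, and the store is just an in-memory dict rather than any real database.

```python
from collections import Counter

# Hypothetical summary store keyed by record name; starting stock is made up.
summary = Counter({"inventory.widget": 100})
search_index = {}  # word -> set of document ids


def record_invoice(doc):
    """Apply every side effect by hand when a new invoice is saved."""
    summary["summary.count.invoice"] += 1
    summary["balance.customer." + doc["customer"]] += doc["amount"]
    summary["inventory." + doc["item"]] -= doc["quantity"]
    # Add every unique word in the document to the search index.
    for word in set(doc["notes"].lower().split()):
        search_index.setdefault(word, set()).add(doc["_id"])


record_invoice({
    "_id": "inv-1",
    "customer": "acme-corp",
    "item": "widget",
    "quantity": 5,
    "amount": 300,
    "notes": "Rush order rush delivery",
})
```

Any report that `record_invoice` did not anticipate would need a pass over all the old documents to backfill it, which is the objection being raised here.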

I fully understand all the wonderful things that CouchDB and SimpleDB provide
like statelessness, replication, caching, and scalability. I know they are not
SQL databases and will not offer drop-in replacements. However, I just don't
see how I can use them for any of the hundreds of database projects I've been
involved in over the past decade and a half as easily as any decent SQL server.

Here's an article I want to read someday. "How CouchDB will be a better
solution when you try to do X" where X is something real-world database folks
do on a regular basis and X is not a blog engine, recipe manager, or address
book. I have tens of GB of data that I'd rather store in SimpleDB than host
myself on MySQL or Postgres. However, what I take away from all of these
document-oriented databases is that they're wonderful, they will solve all of
my scalability and concurrency problems, and they will require me to
reimplement all of the necessary grouping and indexing features of a typical
RDBMS myself.

~~~
bitdiddle
Business evolves. This is why the focus is often on process rather than
objects. In reality there are no real classes of things, just instances, each
distinct. The key word you mention is "structured". Sure, if you're building
apps to support the DMV where documents and forms change once every 30 years
and people work for life sitting there doing data entry on screens that look
like those forms then it's a safe bet you can design a relational schema that
supports this. The same holds for banking and other transactional settings.
The relational approach is powerful, with an algebra to back it up and support
a query language.

So where do you put the logic, business rules, and integrity constraints that
govern this data? In stored procedures and key constraints? Or does it leak
into the application code, especially if the code is OO? Using OOP, how do the
objects map to the tables? Through some high-level abstraction like Hibernate
that presumes to make that declarative and automatic? Now what if the schema
is not so static but dynamic? How does it evolve?

I think what many have recognized is that there is considerable overlap in the
db world with app servers and web servers. The web is a much more dynamic
place and increasingly I think relational databases are used more for just
simple storage.

We've all used RDBMS for years and they've had loads of PhD theses put into
making them what they are. Sometimes a different view is helpful. CouchDB has
a dirt-simple REST-based API. It uses JSON for communication and JavaScript in
the database. It lets you store almost anything you want and it supports
replication. For a real-world scenario, think of something like a web-based
Lotus Notes that supports online and offline use and collaboration.
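
For a feel of what "REST plus JSON" means in practice, here is a small sketch that builds (but does not send) an HTTP request storing one JSON document. The host and port, the database name `invoices`, the document id `inv-1`, and the document fields are all assumptions made up for this example.

```python
import json
import urllib.request

# Assumptions: a CouchDB server on localhost:5984, a database named
# "invoices", and a document id "inv-1". All three are hypothetical.
doc = {"customer": "acme-corp", "amount": 300}

request = urllib.request.Request(
    url="http://localhost:5984/invoices/inv-1",
    data=json.dumps(doc).encode("utf-8"),
    method="PUT",
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(request) would send it; that call is left out
# so the sketch runs without a server.
```

The whole interaction is an HTTP verb, a URL that names the document, and a JSON body, so any language with an HTTP client can talk to the database.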

Another very interesting aspect of CouchDB is its choice of implementation
language, Erlang. What it inherits from OTP is readily seen in the small
amount of code needed to implement it. Moreover, when it comes to robustness,
Erlang is frightening in how rock solid it is.

For what it's worth, I think it's a keeper.

------
jchrisa
CouchDB is different from a relational DB in so many ways it's almost silly to
be doing the comparison. However, many people use and understand SQL, so we
must show them what differences to expect.

Your argument about normalization being more realistic has its place. CouchDB
models documents, which by necessity are somewhat complete, even when taken
from their original context. Document modeling is radically different from
relational modeling.

In document modeling, more emphasis is placed on the document life-cycle as
opposed to inter-record normalization (relations). And in document modeling
the client has a greater responsibility for saving data that is useful to the
user, rather than asking the relational store to reconstruct the objects.

I think there are a lot of uses for CouchDB alongside relational DBs, but I'm
especially excited to see what kinds of crazy things people figure out how to
do with p2p offline replication.

------
shaunxcode
I can see the point in terms of avoiding table locking etc., but I still don't
see it as a silver bullet (not that it is pretending to be one). Surely you
would still want to maintain your relational data structure and have it write
to CouchDB when appropriate for read access, almost like a denormalized view
onto your relational data.

~~~
janl
CouchDB is no silver bullet. I hope we never give the impression that we think
it is.

On your point: You can still manage relations in CouchDB, they are just
implicit.
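
One way to read "implicit" is sketched below: a document refers to another by id, and the client resolves the reference itself, with nothing in the store enforcing it. The `type` and `customer_id` fields, the ids, and the helper name are assumptions for illustration, and a plain dict stands in for the database.

```python
# Hypothetical documents; "type" and "customer_id" are conventions the
# application follows, not something the store checks.
docs = {
    "cust-acme": {"type": "customer", "name": "Acme Corp"},
    "inv-1": {"type": "invoice", "customer_id": "cust-acme", "amount": 300},
    "inv-2": {"type": "invoice", "customer_id": "cust-acme", "amount": 150},
    "inv-3": {"type": "invoice", "customer_id": "cust-gone", "amount": 80},
}


def invoices_with_customer(store):
    """Resolve each invoice's customer_id by hand, as a client would."""
    for doc_id, doc in sorted(store.items()):
        if doc["type"] != "invoice":
            continue
        customer = store.get(doc["customer_id"])  # None if the reference dangles
        name = customer["name"] if customer else None
        yield doc_id, name, doc["amount"]


rows = list(invoices_with_customer(docs))
```

Note that `inv-3` points at a customer that does not exist and nothing complains, which is the trade-off of leaving the relation implicit.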

------
illumen
It's not installed, and code I wrote for it a year ago does not work anymore.

~~~
illumen
To explain my comment...

1. CouchDB is not installed and available everywhere like MySQL.

2. The API is not stable (they haven't reached 1.0 yet), so code you wrote
for it a year ago does not work on versions today. Code you write for it today
will likely not work in two years' time.

This is 'why not CouchDB' for me.

~~~
janl
These are fair points, but what do you expect from a project in alpha state
not nearly half the age of MySQL? :)

~~~
Harkins
If they're trying to sell themselves as a reasonable alternative (let alone
superior), yes, I do.

~~~
nslater
Wait, what? Where does it say we're an alternative to relational databases, or
MySQL?

~~~
illumen
The whole article is a comparison to relational databases.

~~~
caudicus
I think they're trying to say how they differ from relational databases since
most people know and understand relational databases. It's like telling
someone in America how cricket is played and using baseball as a base point to
explain it.

~~~
illumen
Yes. Saying how something differs is called a comparison.

~~~
caudicus
In response to "Wait, what? Where does it say we're an alternative to
relational databases, or MySQL?" you responded: "The whole article is a
comparison to relational databases."

Sure sounded to me like that was an answer to his question, and you were
saying the author is saying CouchDB is an alternative to relational databases
by saying it is a comparison.

I obviously wasn't trying to define what a comparison is, I was just giving a
reason FOR this particular comparison.

You need not be so snide, sir.

------
rantfoil
Does anyone know of large production websites that use CouchDB extensively?

~~~
nslater
<http://wiki.apache.org/couchdb/CouchDB_in_the_wild>

~~~
rantfoil
Hm, so I guess the answer is mostly no. =/

The biggest site on there is wego.com. Great site, and I think they open
sourced much of their Rails CouchDB integration. But only 30k unique visits
per month.

~~~
nslater
It's still very early days. If you want to do a very large production setup of
CouchDB, you will be sitting on the bleeding edge of community experience.
That's not for everyone, of course.

------
newt0311
One point from the article: RDBMSs do not represent data in the same way as in
the real world.

This is a horrible misconception. In fact, CouchDB et al. _store_ data like we
do in the real world but they certainly do not represent data as it is in the
real world. That crown goes to RDBMSs. Suppose we have two physical records of
a company's logo. Then the company logo changes but only one of the records is
updated. That kind of error took place because the physical storage model (two
places) did not correspond to the data model (one entity). That is where
RDBMSs shine and CouchDB fails. RDBMSs allow data relations to be expressed
and then have the DB enforce these relations. This incurs some overhead (and,
I suspect, makes most web programmers think that RDBMSs are "old school") but
in most cases, this overhead is paid for several times over in future data
integrity.
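
The logo example can be sketched in a few lines of Python. The first half keeps a copy of the logo in each document, and the second half keeps it in one place and refers to it by key. All document names, fields, and file names are made up for illustration, and plain dicts stand in for both kinds of store.

```python
# Denormalized: each document carries its own copy of the logo.
brochure = {"_id": "brochure", "company": "acme", "logo": "logo-v1.png"}
letterhead = {"_id": "letterhead", "company": "acme", "logo": "logo-v1.png"}

brochure["logo"] = "logo-v2.png"  # only one copy is updated
copies_agree = brochure["logo"] == letterhead["logo"]

# Normalized: the logo lives in one place; documents refer to it by key.
companies = {"acme": {"logo": "logo-v1.png"}}
brochure_n = {"_id": "brochure", "company": "acme"}
letterhead_n = {"_id": "letterhead", "company": "acme"}


def logo_for(doc):
    """Look the logo up through the reference instead of a local copy."""
    return companies[doc["company"]]["logo"]


companies["acme"]["logo"] = "logo-v2.png"  # one update, seen by every reader
references_agree = logo_for(brochure_n) == logo_for(letterhead_n)
```

After the partial update the two copies disagree, while the two references cannot, since there is only one value to read.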

~~~
jchrisa
oops, commented when I meant to reply. Here goes:

CouchDB is different from a relational DB in so many ways it's almost silly to
be doing the comparison. However, many people use and understand SQL, so we
must show them what differences to expect.

Your argument about normalization being more realistic has its place. CouchDB
models documents, which by necessity are somewhat complete, even when taken
from their original context. Document modeling is radically different from
relational modeling.

In document modeling, more emphasis is placed on the document life-cycle as
opposed to inter-record normalization (relations). And in document modeling
the client has a greater responsibility for saving data that is useful to the
user, rather than asking the relational store to reconstruct the objects.

I think there are a lot of uses for CouchDB alongside relational DBs, but I'm
especially excited to see what kinds of crazy things people figure out how to
do with p2p offline replication.

~~~
bayareaguy
_I'm especially excited to see what kinds of crazy things people figure out
how to do with p2p offline replication_

<http://en.wikipedia.org/wiki/IBM_Lotus_Notes>

