

My Thoughts on NoSQL - twampss
http://www.eflorenzano.com/blog/post/my-thoughts-nosql/

======
jokull
Some shaky statements in here. CouchDB should not be compared to these.
Description of Cassandra is misleading.

He disses Tokyo Cabinet table feature which is in fact _not_ like typical
tabular data but behaves more like a second level hash (hash of hashes) —
which he raves about as an awesome feature of other storages.

"None of these really set CouchDB apart, because you could just encode JSON
data and store it in Tokyo Cabinet, you can maintain your own indexes of data
fairly easily, and you can build a simple REST API in a matter of days, if not
hours."

Except you'd be missing the querying features in Tokyo Cabinet. He also misses
the fact that Cabinet has real master-to-master replication.

Redis' batch-disk-dumping-from-memory strategy is interesting and
theoretically allows it memcached speed with persistance. I'll be looking out
for this one.

Finally the statement on using each one for a different use case is crazy.
Feature overlap between Redis, Cassandra and Tokyo C/T is too big.

~~~
ericflo
> CouchDB should not be compared to these.

Does CouchDB use SQL? No. So it may be mentioned in a post about "NoSQL".

> Description of Cassandra is misleading.

Defend your claim.

> He also misses the fact that Cabinet has real master-to-master replication.

When you start to need real master-to-master replication, the kind that Tokyo
Tyrant offer is simply insufficient. It has no versioning, no way to resolve
inconsistencies, no quorum write/read, etc. By the time you're at that scale,
you had better be looking at Cassandra, HBase, or Hypertable IMO.

> Feature overlap between Redis, Cassandra and Tokyo C/T is too big.

Maybe I can buy that Redis and Tokyo Cabinet/Tyrant have too great of feature
overlap (and note that I'm only interested in Redis as a high performance
count server), but Cassandra is in a completely different realm with
completely different sets of tradeoffs. To suggest that it is similar to the
others is simply naive.

~~~
lucifer
CouchDB is a document centric system. Its really a different beast,
altogether. Its closer to a CMS than a DB.

~~~
jbellis
No. While you could build a CMS with couchdb, it definitely isn't one out of
the box.

All "document-oriented" means in the Couch context is that the db's data model
is "you have a key and a collection of attributes associated with it."

~~~
lucifer
[http://stackoverflow.com/questions/167716/what-is-the-
differ...](http://stackoverflow.com/questions/167716/what-is-the-difference-
between-couchdb-and-lotus-notes)

~~~
jbellis
You're confusing "Notes Domino" with "a CMS" which it is not. From your link:

> CouchDB shares a basic concept of a document oriented database with views
> that Notes Domino has.

> In this model "documents" are just arbitrary collections of values that are
> stored some how. In CouchDB the documents are JSON objects of arbitrary
> complexity. In Notes the values are simple name value pairs, where the
> values can be strings, numbers, dates or arrays of those.

------
Maro
He mentions Tokyo as a successor to BDB, but I'm not sure in what respects
that is true. I've used BDB in Keyspace and looked at Tokyo as an alternative,
and they're not at all equivalent. BDB has tons more features than Tokyo. For
example, concurrency in Tokyo is limited to one writer and many readers, which
is equivalent to the so-called Concurrent Data Store (CDS) in BDB, but BDB
also has a Transactional Data Store. Roughly, Tokyo and BDB's CDS use a global
lock per db, where the TDS is more like a real database with finer-grained
locks (and it's ACID). Also, the level of documentation is ~100:1 in favor of
BDB. Nevertheless, I have many gripes with BDB which I'll write down some day
soon.

He mentions Redis, but fails to mention the biggest limitation, which is that
the entire dataset must fit into memory. To be fair, Redis seems to hit a
sweetspot in terms of user-requirements --- lots of cases where data fits into
memory. But I still wouldn't recommend it to a client, because what if his
data storage needs grow? A few GBs are easily filled up, esp. if you're a web
programmer and liberal about what you store.

CouchDB is mentioned. Having seen a recent video of Damien Katz where he
describes how CouchDB came along, I feel it's more of an "MS Access for
Web2.0", which is no surprise given DK worked on Lotus Notes. It's a
lightweight integrated app+db server for Javascript programmers. It's not
meant to be used at a Facebook scale operation.

This article, mentioning all these KV-stores together without differentiating
them is really confusing and adds to the general melting pot of "NoSQL", which
is unfortunate.

~~~
neilc
_Nevertheless, I have many gripes with BDB which I'll write down some day
soon_

I'm considering BDB as the data storage layer for a new project, so I'd be
very curious to hear your experiences with it.

------
chime
He missed my favorite: <http://www.persvr.org/>

CouchDB requires custom views for each field you want to optimally query.
Persevere (using the JavaScriptDB) does that natively and smartly for all
fields: [http://www.sitepen.com/blog/2009/04/20/javascriptdb-
persever...](http://www.sitepen.com/blog/2009/04/20/javascriptdb-perseveres-
new-high-performance-storage-engine/) (see section: Adaptive On-Demand
Concurrent Indexing)

The biggest benefit of NoSQL dbs is the ability to add properties (i.e.
columns) on the fly. Querying these properties should not be an O(n)
operation. Unless I am completely mistaken, it is in CouchDB unless you write
a custom view. Persevere doesn't require that and supports most common methods
of querying ( <http://docs.persvr.org/documentation/jsonquery> ) If you query
a field, it will index it and if you query it often it will maintain the index
and update it regularly.

I'm just starting on a project that uses Persevere and already I'm quite
impressed by it. It may not be as famous as CouchDB/Cassandra yet but it
offers a lot of very interesting features including transaction support and
nested objects with referencing.

~~~
gaius
_The biggest benefit of NoSQL dbs is the ability to add properties (i.e.
columns) on the fly_

Are you claiming that you can't add a column on the fly to a table in a
relational database? Because that my friend is complete gibberish.

~~~
req2
I think he's saying that if you have a large database in an rdb, adding the
new column 'live' can take a prohibitively long time and kill access to the
server, whereas a NoSQL db can add the new column without any large initial
expensive operations on the db.

(Disclaimer: not a db guy.)

~~~
gaius
Once again, it is shown that the NoSQL crowd's complaints about relational
databases in general are in fact only complaints about MySQL in particular.

~~~
sailormoon
Generalise much? I agree that what the parent complains about is a MySQL
problem but it's pretty disingenuous to just dismiss the whole "NoSQL" idea
based on one misinformed comment on HN.

~~~
gaius
Nah, this keeps coming up on HN, and 99% of the time, the thing that "SQL
databases" just can't do is something that DB2, Oracle, Sybase, Informix et al
have done for 15 years or more.

~~~
sailormoon
Well, I don't know what you've been reading here, but there are plenty of
things you can do with one of the systems listed in that article that you
can't do, or can't do well/easily/cheaply, with any conventional RDBMS. The
listed systems are very disparate, but here's a couple of examples:

In Redis you can read or write ~100k operations a second. You can do set
operations, add, rem, member, list .. way cool if you can stomach the memory
use. Lazy sync to disk. Hard to think of anything that could beat it for
collecting stats or totals etc for high volume data streams.

In CouchDB you can do M/R style indexed views, like word counts. You can
access the DB directly from HTTP, no special drivers - the DB is implemented
as a web server. You can authorise via OAuth (soon), secure via TLS and proxy
through squid. You can access, and replicate, from the other side of the
world.

Tokyo Cabinet is over 10 times faster than MySQL as a simple hashtable, like,
say, a session store, which is the first place most sites start having trouble
with their DB. Tiny, bare bones, drivers for all major languages, network
accessible. A no-brainer for simple storage IMO.

And all of these are completely free, unlike the cripplingly expensive
commercial alternatives you mentioned - especially important for startups.

So .. basically I completely disagree, there's a lot of good things about
these alternative DBs. Don't discount them just because you heard a few dumb
ill-informed comments on sites like this, most people involved in non-RDBMS DB
implementations know exactly what they're doing.

------
geebee
I'm going in the opposite direction from NoSQL. I'd like as much sql as
possible. I'd enjoy using a language with an in memory database, one where you
could create and discard databases as easily as a set, list, or map. To me,
some of the issue here is that people are rebelling against a language (sql)
related to a branch of math (relational algebra) because they're dissatisfied
with the way SQL persists and scales (which is debatable, but seems to have
some merit).

So in this case, I think "nosql" is a misnomer.

~~~
kragen
You can do that pretty easily with SQLite:

    
    
            conn = sqlite3.connect(':memory:')
            conn.execute('''
            create table visits (plano text,
                                 "column" text,
                                 "row" text,
                                 route text)
            ''')
            conn.execute('''
            create table square (plano text,
                                 "column" text,
                                 "row" text,
                                 x int,
                                 y int)
            ''')
    
            for line in infile:
                if line.startswith('#'):
                    continue
                plano, column, row, routes = line.split(',')
                column = column.lower()
                x, y = square_coords(plano, column, row)
                conn.execute('''insert into square (plano, "column", "row", x, y)
                                            values (?,     ?,        ?,     ?, ?)
                                            ''',
                                                  (plano, column,   row,    x, y))
    

However, the trouble is that SQL is still kind of verbose, not type-safe, etc.
I've been writing up some thoughts on how to do better at
<http://canonical.org/~kragen/binary-relations.html> but I have a long way to
go still.

~~~
geebee
That's pretty cool. The thing is, I'm actually not looking for a "conn"
(connection" of any sort). I'm thinking of database as a container class. Kind
of like...

db = New db

db.addTable(colname, list[names], primaryKey) db.addTable(joinTable,
foreignKey, colname, list[names])

Map a = db.(select col a, col b from ...)

db = nil

and so forth... in other words, we're going the opposite direction from
database as storage. Instead, we're going to database as something that is as
in-memory as a hash (probably more likely to be persisted at some point, but
not necessarily).

Sometimes, I see classes created that are bending over backward to recreate a
bit of sql, and the model and logic really only exist for one purpose - sql-
like methods on the data. People rarely write their own list, set or map
classes, but often they will essentially recreate a bug-ridden 1/2 of common
sql in their app...

Anyway, looks like you've been thinking about this for a while, I'll
definitely take a look at the link. Thanks for posting it.

~~~
kragen
It's not connected to anything. ":memory:" means "something that is as in-
memory as a hash".

~~~
geebee
That's interesting, looks like I have something I need to read about.

------
daleharvey
I dont agree with the 99.9%, in the slightest

I dont dislike RDBMS because of the problems scaling them, I dislike them
because of the impedance mismatch, because I dont like the sql language and
because I am forced into creating a data model as the starting point of my
application when I might not have a clue about what model I actually want.

I am much happier now that I have dropped mysql and using mnesia or couchdb
most of the time, Its not about "needing more than mysql/postgre, its just
about striving for something thats a better fit.

------
billswift
I don't use databases myself, but the discussions over the last few weeks
about "no-SQL" have gotten me curious and I have been nibbling at Codd's "The
Relational Model for Database Management" since I had it laying around, and it
is mostly theory I am interested in right now. He spent a great deal of time
complaining that no vendor actually produced a database that was at all close
to the relational model.

My question is whether any DB products, especially any free software, matches
the relational model any better than the ones Codd was complaining about
nearly two decades ago? Which databases, if any, are the closest to being
really relational, not just sort of?

------
va_coder
I find the new datastores interesting but I have my doubts.

Has anyone ever worked on a long project (1 year or more) using an XML storage
or an Object datastore. They can be a real nightmare.

One reason: Users will always add new requirements:

* We want another application, with different programmers, to access this data. When you have problems good luck defending your choice of datastore. And although the problems aren't related to that datastore, the programmers on the other team may place the blame on you. * We want to query tables that didn't have relationships in the beginning of this project, but do now * We want to generate reports: Max, Min, Count, etc.

------
darshan
I'm surprised not to see MongoDB mentioned; it's the document-oriented
database I find most intriguing:

<http://www.mongodb.org>

I don't know enough about its alternatives to offer meaningful comparisons,
but things I really like about MongoDB in the brief time I've played with it:

\- It's trivial to get started with.

\- Everything just makes sense.

\- No schemata to deal with.

\- Excellent Ruby support right out of the box.

For the very simple use cases I tend to deal with, I've always defaulted to
throwing my data into MySQL or SQLite, and I think MongoDB will replace them
for me.

~~~
eldenbishop
MongoDB is very nice. It's basically a bunch of tables with a binary form of
JSON they call BSON instead of "rows". You can add an index to an arbitrary
JSON "path" and it literally just works. I just wish they had broken out the
layers a bit better. You are stuck with their default persistence for example
which is very fast but not particularly safe. I'd sacrifice some speed for
safer writes on certain collections.

And I am going to repeat the parent's comment that in MongoDB everything "just
makes sense". It's just a big bucket with indexes. Everything you need,
nothing you don't. So refreshing.

------
arohner
I take a different stance as well. I want a non-SQL RDBMS. I want transactions
and declarative queries, etc, just without the brain-dead SQL syntax and
string manipulation to write a query.

~~~
mahmud
See my long winded reply to someone else in this thread. _We_ have this, it's
called CLSQL (the OSS version) or CommonSQL (the commercial version; each more
identical than the other, btw ;-)

<http://clsql.b9.com/manual/>

Transactions are in chapter 5, unbrain-dead syntax, in both functional and OO
versions are there. A killer query syntax, in Lispy prefix is there. It uses
your database backend of choice (name any 8 major RDBMSes and it has support
for them, it also ODBC for the not-so-major ones)

It's a clean beautiful API, learn it, use it, clone it for your favorite
language[1], etc.

\--

[1] Clojure guys, you _need_ this; throw a Lispy API over Java's orgy of
database interfaces at once. Get your major API defined and "standardized"
ASAP, otherwise clojure will be assimilated into Java, culturally, and a
generation of misguided programmers will be writing Java code with
parenthesis.

------
vicaya
99.9% web sites are CRUD, which don't really need full features of RDBMS,
which is both an overkill and a bottleneck for scalability.

NoSQL is ironic word in that both Hive and Pig are gonna introduce an ANSI SQL
dialect for scalable data queries :)

The NoSQL movement is really a NoBlindJoinsViaRDBMS movement, but the latter
is less catchy.

------
aceofspades19
Why can't everyone just use the best tool for the job and be done with it?

