

Some misconceptions about NoSQL - bscofield
http://www.viget.com/extend/nosql-misconceptions/

======
sophacles
So up front: I know this question sounds like a classic flame, however I'm
honest about it, and the similarity to flame means I have yet to see an answer
that I am satisfied with, as the discussions degenerate quickly. Anyway:

What is non-relational data? I have been under the impression for a long time
that datasets all have internal relationships, be they semantic or other, and
that relational databases are systems to store and query data based on the
relationships. Even a key value store defines an explicit relationship.

When I first learned about databases, the class was about Normal Forms and
Relational algebra, with rdbms thrown in as an example. These 2 key concepts
are methods of working with data in a subtle and nuanced way -- a way strongly
based on set and/or category theory. SQL always seemed to me to be a decent
extension to this, a DSL specifically designed to work on sets vs individual
pieces of data. In that I found it elegant. Now this doesn't mean SQL is
perfect, I get that; in a lot of cases it is clunky or worse. Nor does it mean
that the underlying datastore needs to model data to exactly follow the
relationships that describe it. And in light of CAP vs ACID views on data, a
lot of exploration of this space makes sense. None of this includes a concept
of relational vs non-relational data. In fact I view these systems as
potential new building blocks of relational systems.
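
To make the "a key value store defines an explicit relationship" point concrete, here is a minimal sketch that treats two key-value stores as binary relations (sets of tuples) and applies set-style operations to them. The names and data (`emails`, `teams`, the example addresses) are made up for illustration.

```python
# A key-value store viewed as a binary relation: a set of (key, value) tuples.
# All data here is hypothetical.
emails = {("alice", "alice@example.com"), ("bob", "bob@example.com")}
teams = {("alice", "db"), ("bob", "web")}

# Selection: keep only the tuples that satisfy a predicate.
selected = {(k, v) for (k, v) in teams if v == "db"}

# Projection: keep only one attribute (here, the key).
names = {k for (k, _) in emails}

# Join on the shared key, producing (key, email, team) tuples.
joined = {(k, e, t) for (k, e) in emails for (k2, t) in teams if k == k2}

print(selected)
print(names)
print(joined)
```

Each operation takes whole sets in and gives a set back, which is the property of SQL described above as elegant.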

So this has turned out longer than I originally thought, but back to the main
question: what is non-relational data?

~~~
chipsy
I think it hinges on your definition of relational and where you want to cut
off "what's reasonable."

Everything in a computer's memory is related by existing in a uniform address
space.

Key-value relationships can be ordered implicitly or explicitly, either by
describing an algorithm to sort the keys, or an algorithm that creates a new
indexing by comparing the values.
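
As a small sketch of those two orderings, assuming a plain in-memory dict stands in for the key-value store (the keys and values are invented for the example):

```python
# Hypothetical key-value data; a dict stands in for the store.
store = {"user:3": 17, "user:1": 41, "user:2": 29}

# Implicit ordering: an algorithm that sorts the keys themselves.
keys_in_order = sorted(store)

# Explicit ordering: a new index built by comparing the values.
index_by_value = sorted(store, key=store.get)

print(keys_in_order)
print(index_by_value)
```

The store itself records neither ordering; both are relationships imposed by the code that reads it.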

Caching data creates a relationship with manually defined logic.

I think the right question is "how formalized are the relationships?" and a
related one is "how reflective is the data?" Programmers have plenty of ways
to access data both formally and informally - the informal methods are often
closer to the machine, but remove context that would help to automate and
verify the process. You can also have data that doesn't say much about its
context, and data that is very heavily cross-referenced.

NoSQL, to me, seems to be mostly about relaxing the heavy formalization of SQL
- to shortcut to simpler or even incomplete descriptions of data, and deal
with the resulting fallout later, in the same way that dynamic languages
eschew static checking. Good if your needs are simple, potentially dangerous
if your data becomes complex.

~~~
dstorrs
From what I understand, some proponents of NoSQL are even more minimalist than
that--they are fine with the idea of relations, just not with storing them in
a traditional RDBMS (e.g. MySQL, Postgres, Oracle). The most convincing
argument that I see for this is that traditional RDBMSs are all row-oriented and
organized via relational theory, and that a lot of data is better stored
and/or manipulated via column-oriented, or graph theory, or object oriented
systems.

~~~
jrockway
_From what I understand, some proponents of NoSQL are even more minimalist
than that--they are fine with the idea of relations, just not with storing
them in a traditional RDBMS_

I don't know about this. I would never use the word NoSQL... but I use object
databases heavily and find them a better fit than relational databases for the
majority of my work. However, I still store the physical data in a relational
database. (Key => value.)

This is beneficial in a number of ways. You already know how to administer
MySQL, you already know how to replicate MySQL, you already know how to back
up MySQL, etc. This method also affords you the opportunity to reuse the
database machinery in your application. You can extract attributes from
objects as you store them, and later index/search them with the usual database
querying infrastructure. The database does not know everything about the
objects you are storing, it just knows a few key facts that you know you want
to search on, so you can write efficient searches, but you are not limited by
the usual relational weaknesses. (The "object-ness" is stored in a structure
opaque to the database engine, like a JSON blob or something.)

As an example, consider cleaning out dead user objects that have not confirmed
their email address after a certain number of days. If you were only using an
object database, you would maintain a set of users to potentially expire. When
they confirm their email address, you remove them from this set. Fine. The
problem is the date constraint; you don't really want to scan the entire set
of potentially expirable users to expire users, you would like to be able to
search for these objects efficiently.

That's easy to fix; when you store a user object, you can extract the
confirmation status and registration date from the object and store those in
real columns next to the opaque object data. Then you can "SELECT object_id
FROM objects WHERE registration_date < NOW - 30 days AND confirmed_email IS
NULL" and remove those objects from your system. Efficient, and it works as
you change the structure of the user object, or add subclasses, etc.
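
A minimal sketch of that layout, using SQLite from the Python standard library rather than MySQL so it runs on its own. The table and column names follow the query above; everything else (`store_user`, `expired_ids`, the index name, storing the registration date as a Unix timestamp, the sample users) is an assumption made for illustration.

```python
import json
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE objects (
        object_id         TEXT PRIMARY KEY,
        data              TEXT NOT NULL,     -- opaque JSON blob
        registration_date INTEGER NOT NULL,  -- extracted: Unix timestamp
        confirmed_email   TEXT               -- extracted: NULL until confirmed
    )
    """
)
# Index only the extracted columns we know we want to search on.
conn.execute(
    "CREATE INDEX idx_expiry ON objects (confirmed_email, registration_date)"
)


def store_user(conn, user):
    """Store the whole object as a blob and copy out the searchable attributes."""
    conn.execute(
        "INSERT OR REPLACE INTO objects VALUES (?, ?, ?, ?)",
        (
            user["id"],
            json.dumps(user),
            user["registered"],
            user.get("confirmed_email"),
        ),
    )


def expired_ids(conn, now, max_age_days=30):
    """Return ids of unconfirmed users registered more than max_age_days ago."""
    cutoff = int((now - timedelta(days=max_age_days)).timestamp())
    rows = conn.execute(
        "SELECT object_id FROM objects "
        "WHERE registration_date < ? AND confirmed_email IS NULL "
        "ORDER BY object_id",
        (cutoff,),
    )
    return [row[0] for row in rows]


def ts(year, month, day):
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


now = datetime(2010, 1, 31, tzinfo=timezone.utc)
store_user(conn, {"id": "u1", "registered": ts(2009, 12, 1), "nick": "old"})
store_user(conn, {"id": "u2", "registered": ts(2009, 12, 1),
                  "confirmed_email": "u2@example.com"})
store_user(conn, {"id": "u3", "registered": ts(2010, 1, 20), "nick": "new"})

to_expire = expired_ids(conn, now)
print(to_expire)
```

Extra fields such as `nick` live only inside the blob, so the object can change shape without a schema change as long as the two extracted attributes stay available.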

~~~
kingkongreveng_
I work with ObjectStore nearly every day and wish we were using Oracle or DB2.
I am firmly convinced that you better have a damn good reason (like a big
heavily updated graph structure) to forgo the discipline of a normalized
relational schema.

~~~
jrockway
If you can't do it with objects, what structure do you use in-memory?

~~~
kingkongreveng_
Use objects in memory. I'm just saying the relational <-> object mapping is a
necessary evil because in the long run with complex data you'll suffer for
your sin of skipping normalization.

------
antirez
Great post, it is important to make clear that NoSQL is not just about
performance. For instance, now that I'm used to Redis lists and sets, it feels
strange to use an SQL DB when I have a problem where just pushing or
popping stuff is the trivial thing to do.
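
A rough sketch of the contrast, assuming a Python `deque` as a stand-in for a Redis list (this is not the Redis client API) and an SQLite table as the SQL side. The table name `queue` and the helpers `sql_push` and `sql_pop` are hypothetical.

```python
import sqlite3
from collections import deque

# List-style store: push and pop are single operations.
q = deque()
q.appendleft("job-1")  # roughly what LPUSH does
q.appendleft("job-2")
first = q.pop()        # roughly what RPOP does: takes the oldest item

# SQL: the same queue needs a table, an ordering column, and two statements.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queue (id INTEGER PRIMARY KEY AUTOINCREMENT, item TEXT NOT NULL)"
)


def sql_push(conn, item):
    with conn:  # commits on success
        conn.execute("INSERT INTO queue (item) VALUES (?)", (item,))


def sql_pop(conn):
    """Pop the oldest row: a SELECT followed by a DELETE."""
    with conn:
        row = conn.execute(
            "SELECT id, item FROM queue ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
        return row[1]


sql_push(conn, "job-1")
sql_push(conn, "job-2")
sql_first = sql_pop(conn)

print(first, sql_first)
```

Both give the same answer; the point is only how much more ceremony the table-based version takes for a push/pop problem.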

Of course SQL databases are very useful and a great tool for many domains. It
is also not a balanced view to think that SQL databases are to be thrown away.
Sometimes the table-based data model, with the querying power of SQL, is just
the way to go for many kinds of problems, or in addition to a different kind of
DB.

------
ozten
The term NoSQL is catchy but wrong. Some databases with strong theory backing
them up fall under the NoSQL umbrella, and others which are ad hoc and a
huge step backwards do too.

I think the ultimate lesson we've re-learned is to use graph, hierarchical,
and relational databases in an appropriate manner and to make engineering
tradeoffs around consistency as needed. NoSQL is a crappy name for this
lesson.

------
bayareaguy
While the "NoSQL" moniker is catchy, misconceptions are inevitable so long as
this category is defined more by what it isn't rather than what it actually
is. What is NoSQL all about aside from scalability, performance, key/value
access or support for non-relational data? This article doesn't clearly say.

------
richcollins
Steve Dekorte and I have just released a new graph database with a
filesystem-like REST API:

VertexDB: <http://github.com/stevedekorte/vertexdb>

