

NoSQL Data Modeling Techniques - plasma
http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques

======
bni
First off I find the NoSQL term in itself very strange. How can you say
anything intelligent about "everything that is not using SQL as a query
language"? It's like talking about NoJava, instead of talking about Ruby.

Props for a well written article with lots of nice graphs but I don't agree
with much of its content.

A few examples:

"software applications are not so often interested in in-database aggregation"

In my experience this is what 99% of business support apps are doing. Doing
this aggregation in procedural application code will only give you more code
to maintain and more bugs.
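
As a minimal sketch of that point, here is the same total computed once in the
database and once in application code. The `orders` table and its data are
made up for illustration; nothing here comes from the article.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("acme", 100), ("acme", 50), ("globex", 70)],
)

# In-database aggregation: one declarative statement.
db_totals = dict(
    conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
)

# The same aggregation done procedurally in application code.
app_totals = {}
for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
    app_totals[customer] = app_totals.get(customer, 0) + amount
```

Both dictionaries end up identical; the difference is how much of the logic
the application has to own and maintain.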

"joins are often handled at design time as opposed to relational model where
joins are handled at query execution time"

I'm glad you know beforehand about your changing requirements over the next 10
years and can "design" your joins for every eventuality right now. It feels
like the exact opposite of agile.

I also agree with the very insightful comment by Voice in the wind (comment #4
below the article)

~~~
Maro
"software applications are not so often interested in in-database aggregation"

Applications that used to be desktop apps are now moving to the web, and
these "cloud applications" are prime use-cases for NoSQL. Apps like Gmail,
Google Docs, Dropbox are cases where NoSQL might be a better fit than SQL,
the same way Word.exe doesn't use SQL internally, instead it uses linked lists
and hashmaps [1].

"In my experience this is what 99% of business support apps are doing."

Business apps are the original, killer use-case for SQL, and here NoSQL makes
less sense, I agree.

[1] Some apps like Firefox started using SQLite for local storage, a technical
decision I don't agree with.

~~~
batista
_Applications that used to be desktop apps are now moving to the web, and
these "cloud applications" are prime use-cases for NoSQL. Apps like Gmail,
Google Docs, Dropbox are cases where NoSQL might be a better fit than SQL_

I don't see anything in Gmail and Google Docs data usage needs that make NoSQL
a better fit for them --only the need to scale with no sharing etc.

(OTOH Dropbox might fit the case, they only need some hierarchical store a la
filesystem, with versioning).

 _the same way Word.exe doesn't use SQL internally, instead it uses linked
lists and hashmaps_

Yes, but Word.exe internally deals with just ONE document and its data
structures, does not manage metadata for many documents and does not have to
provide aggregated info on them. That would be Windows Explorer, which manages
many .docs. And that one, MS did have a plan to make more SQL-like.

~~~
Maro
NoSQL is a better fit for Gmail because you wouldn't want one gigantic
'emails' or 'contacts' table, which is how you would model it in SQL. You also
want to cache the hell out of it, which is easier done in a NoSQL model (e.g.
if the cache layer is integrated and hence uses the same API as the persistent
stuff).

~~~
mattmanser
Huh, why wouldn't you want those tables? You seem to be confusing the need to
scale with how you would design the system.

Somewhere I worked before, we used to import 100,000s of a company's emails
into the project system to display them against the system. They were all in
an Emails table which just stored the body; I think there was an email header
table and an email contacts table which was relationally linked directly to
the Person table.

Worked fantastically, used to run an IMAP server off those tables for outlook
integration.

~~~
Maro
I wasn't talking about "email" as a use-case, I was talking about "Gmail".
Obviously you can serve 1000s of email users using an SQL database and IMAP.

------
einhverfr
This is an interesting article.

I am primarily a PostgreSQL guy who does all sorts of things like hierarchical
data representation in SQL. These things have come a long way in the past few
years. That said, the more I read about NoSQL data modelling techniques, the
more it occurs to me that some of these techniques may work well in relational
data environments where data is read-frequent/write-seldom.

In LedgerSMB (<http://www.ledgersmb.org>) we already use key-value modelling
in cases where it makes sense (system settings, and a few other things).
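
A minimal sketch of what key-value modelling inside a relational database can
look like, using SQLite. The `settings` table, its columns and the helper
functions are hypothetical and are not LedgerSMB's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per setting, looked up by its key.
conn.execute(
    "CREATE TABLE settings ("
    " setting_key TEXT PRIMARY KEY,"
    " setting_value TEXT NOT NULL)"
)


def set_setting(key, value):
    """Insert the setting, or overwrite it if the key already exists."""
    conn.execute(
        "INSERT OR REPLACE INTO settings (setting_key, setting_value) VALUES (?, ?)",
        (key, value),
    )


def get_setting(key, default=None):
    """Return the stored value for key, or default if it is not set."""
    row = conn.execute(
        "SELECT setting_value FROM settings WHERE setting_key = ?", (key,)
    ).fetchone()
    return row[0] if row else default


set_setting("default_currency", "USD")
set_setting("default_currency", "EUR")  # overwrites the earlier value
```

The rest of the schema stays fully relational; only the odd, schema-less bits
go through the key-value table.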

Currently what hierarchical stuff we are doing wouldn't benefit from the ideas
in this paper, but I wouldn't rule it out for some other things in the future.

I guess what this is reinforcing for me is that NoSQL and SQL models are not
entirely mutually exclusive.....

~~~
jandrewrogers
NoSQL is about the interface, not the implementation. NoSQL databases provide
a better impedance match out of the box for some applications.

What is often lost in the conversation is that you can do the same thing using
a competent SQL database engine if you can deal with the complexity. But you
have to use SQL, which for some applications is a poor interface, and you have
to configure the engine for your application and workload. This adds
complexity to the process. If you have great database architects and DBAs,
NoSQL does nothing that you can't do on a really good SQL engine. Most
startups have neither the people nor money for that.

The vast majority of databases, whether labeled SQL or NoSQL, implement the
same relational operator algorithms under the hood. They are not intrinsically
different in that regard. Even graph databases, which in theory cannot be
expressed in a simple relational algebra, can be and are expressed in practice
as recursive relational algebras. As long as databases are using the same
algorithms and representations they will have the same limitations.

~~~
wyuenho
Most startups also don't have complicated data models or high enough traffic
to justify having a DBA, or to justify the premature optimization of
denormalizing data right off the bat to speed up retrieval in a NoSQL DB.

As many have said before, NoSQL is a premature optimization in that all it
does is remove some restrictions in your technology stack, letting you move
some complexity, such as data validation and the ability to easily aggregate
data, up the stack.

The need for NoSQL is a rich man's problem. When you organize your data like
that using the article's techniques, you are going to have to write a lot of
very odd-looking, tightly coupled code to do even some basic reporting, e.g.
when you try to query from the many-to-one direction.
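
A rough sketch of that reverse-direction problem, using plain Python
dictionaries as a stand-in for a document store. The document layout and key
names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical documents, denormalized for one access path:
# "show an order together with its items".
orders = {
    "order:1": {"customer": "alice", "items": ["widget", "gadget"]},
    "order:2": {"customer": "bob", "items": ["widget"]},
}

# The designed-for question is a single key lookup.
items_of_order_1 = orders["order:1"]["items"]

# The reverse question ("which orders contain a widget?") has no key to look
# up, so the application scans every document and builds the answer itself.
orders_by_item = defaultdict(list)
for order_id, doc in orders.items():
    for item in doc["items"]:
        orders_by_item[item].append(order_id)
```

With a normalized order-items table the second question would be one more
query; here it is extra application code tied to the document layout.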

If you are following the Lean Startup methodology at all, you should be aware
that being able to measure things is crucial early on when you are trying to
reach a business goal. A bunch of crazy for loops and map-reduce stuff in the
application layer isn't exactly easy to write, look at or maintain.

~~~
einhverfr
The impedance mismatch is real which anyone who has seen stored procs written
by app developers can verify....

What NoSQL does is give you a light-weight object store which could have some
very cool uses. Those uses are also pretty narrow for the reasons the article
mentions (assuming the questions you have now are all the questions that are
important for example).

However, I think some of the solutions may help in relational environments
with the edge cases.

~~~
wyuenho
That's a little weak as an argument, isn't it? SQL and the relational model
have nothing to do with stored procedures and you certainly don't have to, or
need to, use stored procedures for reporting. In fact, prior to 2005, MySQL
didn't support stored procedures for the longest time. How do you think those
people did reporting? I don't know how you do reporting, but mine involves just
a ridiculously simple and ugly web app that essentially just generates some
HTML tables from a bunch of SQLAlchemy queries.

Why do I get the sense that the real reason most people use NoSQL stems from
their disdain of SQL, whatever that reason may be...

~~~
einhverfr
I am a total relational guy, BTW. But the fact is that there is a mismatch
between how you have to think when doing SQL queries (thinking in sets) and OO
programming (thinking in instructions). Anyone who has dealt with stored
procedures written by app folks understands what a mess you get when you try
to program one side in techniques aimed at the other.

While NoSQL is a good choice for some environments, namely ones where ad hoc
reporting is not likely to be needed and where other methods of interop are
preferred (LDAP being a great example of something that could benefit from a
NoSQL back-end), the fact is that this actually shows that, more often than
not, you lose more than you gain by getting rid of the mismatch....

IOW, I think it is a moderately weak case for NoSQL in some environments and a
strong case _against_ in a much larger number of environments.....

------
malingo
I like the comparison of the design themes of relational modeling and NoSQL
modeling as, respectively, "what answers do I have?" and "what questions do I
have?"

~~~
linuxhansl
To me the key point is that in relational databases you describe -
declaratively - the information you want and leave it to the database to
figure out how to retrieve it (x); whereas in most NoSQL databases you describe
how to retrieve the information.

(x) this is not the entire truth, as SQL is actually a (somewhat) unfortunate
mix between the declarative relational calculus and the procedural relational
algebra.
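
A small sketch of the contrast, assuming a made-up authors/books data set. The
SQL half states what is wanted; the key-value half (a plain dict standing in
for a K/V store, with invented key names) spells out each retrieval step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'alice');
    INSERT INTO books VALUES (10, 1, 'Book A'), (11, 1, 'Book B');
""")

# Declarative: describe the result; the engine chooses the access path.
sql_titles = [
    row[0]
    for row in conn.execute(
        "SELECT b.title FROM books b JOIN authors a ON a.id = b.author_id "
        "WHERE a.name = ? ORDER BY b.id",
        ("alice",),
    )
]

# Key-value style: the application performs every lookup in order.
kv = {
    "author_by_name:alice": 1,
    "books_by_author:1": [10, 11],
    "book:10": "Book A",
    "book:11": "Book B",
}
author_id = kv["author_by_name:alice"]
kv_titles = [kv[f"book:{book_id}"] for book_id in kv[f"books_by_author:{author_id}"]]
```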

~~~
LewisCr
I don't know enough to understand what makes relational algebra procedural. I
read the "Haskell more successful cousin" article and I thought that the
relational algebra part was what made SQL and Haskell similar.

------
Roboprog
Party like it's 1969! <http://en.wikipedia.org/wiki/Network_model>

Not that many old things don't work, well in some circumstances even. I'm just
having a hard time seeing how throwing away ACID and denormalizing data is
"post modern" rather than "back to the future".

Denormalization? How about a materialized view to support common searches to
reduce I/O from assembling data.

Aggregates? You could probably abuse and extend entity attribute values to
(physically) cluster arbitrary / sparse / repeating field values around a
common parent.

It's actually a pretty good article otherwise; I should probably leave myself
a marker to find it. Sooner or later, I'll end up having to maintain one of
these things -- long gone contractors will build a "latest and greatest" app
using a database system not unlike a 1970 mainframe. There are a number of
good work-arounds in the article for dealing with systems with poor indexing
capabilities -- a single key field to be filtered.

------
antirez
I've the feeling that when talking about Data Modeling, Redis really does not
fit into the key-value category.

~~~
eternalban
To the extent that one could talk about data modeling using the constructs of
an imperative language e.g. C then one can do the same with Redis (with the
caveat that Redis lacks a reference type). K/V only stores restrict the
semantics to that of maps e.g. map["foo"]=bar, but if they introduce richer
operations e.g. map-reduce on the K/V containers (e.g. map.apply(func)), then
it is pretty much Redis restricted to its String type.
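
To make the map semantics concrete, here is a toy in-memory sketch. The
`KVStore` class and its `apply` method are hypothetical names chosen to mirror
the `map["foo"]=bar` and `map.apply(func)` notation above; this is not the
Redis API.

```python
class KVStore:
    """Hypothetical in-memory store exposing only map semantics."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Equivalent of map["foo"] = bar.
        self._data[key] = value

    def get(self, key):
        return self._data[key]

    def apply(self, func):
        """Richer operation: run func over every stored value."""
        return {key: func(value) for key, value in self._data.items()}


store = KVStore()
store.put("foo", "bar")
store.put("baz", "quux")
lengths = store.apply(len)  # maps each key to the length of its string value
```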

I personally have a preference for unified views of systems so my bias is to
look at the entire hierarchy of storage (image) and memory model (semantics)
as a singular space, with back end disks as Ln to L1 cache on the CPU. In this
light, to me Redis is a memory manager(/cache) + DSL, serving at Lx (where x
is somewhere south of local Disk and north of a durable and consistent
distributed FS backend e.g. HDFS).

------
mark_l_watson
I had a good talk with a potential customer two days ago and we covered non-
relational databases in some detail. I blog about non-relational databases a
lot so he thought it was 'religion' for me, but this is not the case at all: I
default to wanting to use PostgreSQL (or another relational database) and
instead choose MongoDB, CouchDB, etc. depending on special requirements.

As per the article, it is required to understand the CAP theorem and specific
capabilities of different data stores.

Client library support varies a lot. Some ORM tools like Ruby's DataMapper let
you design composite storage schemes using a relational database and, for
example, MongoDB. For document oriented data stores I very much prefer writing
client apps in languages like Ruby and Clojure that have a nice syntax for
maps, etc.

