

NoSQL – Back to the Future or Yet Another DB Feature? - itcmcgrath
https://speakerdeck.com/u/mas/p/nosql-back-to-the-future-or-yet-another-db-feature

======
JPKab
I hate it when speakers talk about "NoSQL" without specifying what they are
actually bitching about, which in this case is clearly Key-Value/Document
oriented databases. Of the non-relational databases, there are actually many
which provide everything you want in a relational DB (separation of logical
and physical within the DB) as well as storing the schema as PART of the data
instead of a physically configured separate entity. These databases are called
graph databases, and they are awesome. I am convinced they would have been the
major databases of today, if not for the fact that it takes way more computing
power and fancy algorithms to work with graph data models.

~~~
jandrewrogers
The real problem with graph databases is that they do not scale because the
algorithms used parallelize poorly. Partitioning graphs is a longstanding
challenge in computer science.

It is literally equivalent to the NoSQL "we don't do joins" algorithm problem.
Unlike most NoSQL databases, the basic operation of a graph database is built
on join algorithms (edge traversal is a relational join). If you figure out
how to parallelize ad hoc joins on large clusters then graph databases become
a viable solution for non-trivial databases.
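
To make the "edge traversal is a relational join" point concrete, here is a minimal sketch using Python's built-in `sqlite3`. The `edges` table, its contents, and the `two_hop_neighbors` helper are all made up for illustration; this is not how any particular graph database stores or traverses data.

```python
import sqlite3

# Hypothetical toy graph stored as a plain edge table (src -> dst).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")],
)


def two_hop_neighbors(conn, start):
    """Return nodes exactly two hops from `start`, found with one self-join."""
    rows = conn.execute(
        """
        SELECT DISTINCT e2.dst
        FROM edges AS e1
        JOIN edges AS e2 ON e2.src = e1.dst  -- follow the second edge
        WHERE e1.src = ?
        ORDER BY e2.dst
        """,
        (start,),
    )
    return [dst for (dst,) in rows]


result = two_hop_neighbors(conn, "a")  # a -> b/c -> d
```

Each additional hop adds another self-join on the same table. If the edge table were split across machines, every one of those joins could need rows that live on a different partition, which is roughly the scaling problem described above.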

------
rb2k_
I always cringe a bit when people point to papers[0] when it comes to software
that is supposed to have real-world usage. Just because a paper described
a system that handles distributed transactions well doesn't mean that this
software exists or will be anywhere near usable in the next 5 years. Even if
they are, only very few papers actually talk about a system that is already in
production use. The few that do (bigtable, dynamo) usually are great though.

0: <http://cs-www.cs.yale.edu/homes/dna/papers/calvin-sigmod12.pdf>

~~~
wmf
AFAIK the scalable SQL work _from the 1980s_ was actually shipped by Tandem.

~~~
zeit_geist
(re that Tandem paper: it's IMS / a hierarchical DB that they describe though)

~~~
wmf
I'm not sure what you're referring to; Jim Gray's work looks pretty relational
to me. <http://research.microsoft.com/en-us/um/people/gray/papers/Benchmark_NS2_TR_89.4.pdf>
is just one example.

------
Fizzadar
I can't help but agree with this. However, I think NoSQL is an important
tool when dealing with 'big' data (billions of records). NoSQL is also awesome
for 'accelerating'/caching standard SQL (memcache to cache sql results, etc).

At the end of the day, both have important roles to play and, in many
respects, complement each other.
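
The "cache SQL results" pattern mentioned above can be sketched roughly like this. A plain dict stands in for a memcached client so the example runs on its own, and the `users` table, the key format, and `get_user_name` are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

cache = {}              # assumption: stand-in for a key-value cache such as memcached
stats = {"db_hits": 0}  # counts how often we actually run the SQL query


def get_user_name(user_id):
    """Cache-aside lookup: try the key-value cache first, fall back to SQL."""
    key = f"user:{user_id}:name"
    if key in cache:
        return cache[key]
    stats["db_hits"] += 1
    row = conn.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    name = row[0] if row else None
    cache[key] = name  # later reads with the same key skip the database
    return name


first = get_user_name(1)   # goes to SQL
second = get_user_name(1)  # served from the cache
```

A real setup would also need expiry or invalidation when the row changes, which this sketch leaves out.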

------
elchief
By the time a NoSQL "database" is ready for prime-time, it will have all the
features of a real database.

~~~
badboy
What is "unreal" about current NoSQL Databases?

~~~
elchief
I guess you're new to Hacker News?

<http://news.ycombinator.com/item?id=3202081>

<http://news.ycombinator.com/item?id=3982142>

<http://news.ycombinator.com/item?id=3954596>

Real databases have real features like transactions, and guaranteed writes to
disk, and not retarded security. Crazy shit like a commonly-used query
language and tool support.

------
nborwankar
The biggest problem with NoSQL (whatever that may mean) is that there is no
formal framework to reason about NoSQL in a uniform manner as opposed to SQL
systems and bare data structures where strong formalisms exist that can give
hard results on costs. IMHO, this is because NoSQL is a "conceptual bag" in
which anything is thrown that doesn't have a SQL interpreter in it. That has
many contradictions.

E.g. Google App Engine's GQL, Facebook's FQL and Yahoo's YQL, not to mention
Hive, all have a very SQL-like syntax. So what about all this is NoSQL? SELECT,
WHERE and FROM are OK in a NoSQL database?

Dig further and you'll find that these are all "No-Join" databases and it's
the badly scaling cost of the Join operation that gave SQL a bad name and the
NoSQL community a bad category name.

Bottom line there needs to be a unifying formalism to talk about NoSQL as a
category. Which ("Tada Boom!") gives me a segue and a pun to refer to a paper
published in the IEEE about using Category Theory to describe NoSQL.

I will critique it elsewhere, but suffice it to say that if I have to do a PhD
in Pure Math to design a database it's a non-starter. (Aside I have a "PhD" in
Pure Math - everything but thesis)

So anyhoo bottom line - Missing: a formal framework to reason about databases
that don't use SQL or Joins.

Without that it is not even possible to decide what is the membership
criterion for something to be called NoSQL, especially when NoSQL is defined
as "Not Only SQL" = SQL UNION ~SQL = the whole F __ing universe.

So NoSQL is a label that has no ability to make logical distinctions. And
NoSQL as a grab bag of technologies has no formal way to reason about it.

To make any sense this needs to be fixed, before any arguments about this are
even worth having.

[My creds: 20+ years in the DB world as developer, architect, instructor;
including 4+ yrs in the NoSQL business, including one year as emp #2 and VP of
BizDev at CouchOne, and including lead developer of a project that used
CouchDB in a National Science Foundation funded project back in 2008/2009
(slashdotted and survived) and currently using Couch, Mongo, NodeJS in various
projects]

------
caseydurfee
I'm surprised to see no mention of postrelational/multivalue databases.
Postrelational DBs played in the 80's and 90's, and are still widely used in
the world of big iron.

It's historically inaccurate to state we went from file-based databases to
relational ones and are now slipping back to file-based ones.

Rather, both the relational and non-relational DB worlds have taken many ideas
from the post-relational world (even if not intentionally.)

The SQL/NoSQL + ORM + MVC stack of today is much closer to what postrelational
DB's were like rather than what SQL apps were like back in the day. Every
generation thinks they invented sex, and high level database programming...

<http://en.wikipedia.org/wiki/MultiValue>

~~~
zeit_geist
You are right. In the talk I gave I briefly referred to hierarchical
databases, for instance. It's not in the slides because of the timeframe for
the talk.

------
davidism
While the author may be making a good point, I couldn't understand it as
presented. Instead of reading an interesting article, I stared at small
screens of half sentences, tables, and pictures, with very little explanation.
Slides without a narrator are a poor way to communicate.

~~~
zeit_geist
Yes, several slides require more context. The talk was recorded and,
hopefully, will be available online at <http://www.nosql-matters.org/> soon.

------
kylebrown
Beat a dead horse much? This has already been slashdotted to death. I like
Erik Meijer's take: it should've been named CoSQL (dual to SQL):
<http://bit.ly/feNxRE>

------
tferris
A high-level and theoretical NoSQL debate doesn't help. We know all the
strengths and limitations of both SQL and NoSQL. And we know when to use what.

Moreover, NoSQL does not equal NoSQL: there are such huge differences between
a Mongo, a Riak and a Couch. And there's much more than that beyond SQL (graph
based DBs etc.).

What matters: The experience.

I used MongoDB with Node one or two weeks ago for the first time and I was
really impressed when I got it. There are many use cases where I wouldn't
employ NoSQL or Mongo but there are as many where I would go w/ Mongo even
though I usually relied on SQL.

Why: working completely without schemas and even migrations is awesome. Just
save a record in Node/Mongo with the native interface and the system sets up
the respective table and even the DB in the same moment if they don't exist
yet. Schemaless is not always the way to go but if you want to prototype
or to get out of the door quickly, it's a mind-blowing experience (and it's
enough for many web projects). And with the JS interface it's just incredible
and totally different from other NoSQL counterparts.

------
vicaya
NoSQL is trading off ACID transactions for scalability/throughput/latency.
Both Calvin and Omid can add ACID transactions to any NoSQL for about nx
decrease in throughput.

NoSQL is about choice. Period.

~~~
paperwork
NoSQL is giving up FAR more than ACID. It is giving up the ability to consume
the stored data in a beautifully concise and expressive manner.

edit: 'the ability to'

~~~
taligent
If my data is document based then how is SQL going to be more concise and
expressive than JSON?

Say I have a domain model that involves a primary object, e.g. User, with lots
of maps and arrays. How is SQL going to be better then? Lots of tables
and joins better than a single document? I think not.

~~~
paperwork
If your data is a User object with maps and arrays, and you design a data
structure which gets you the result in exactly that format, you _are_ giving
up the ability to ask other questions of this data. You are giving up the
ability to ask _many_ different kinds of questions (expressive) in a
relatively simple way (concise).
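
A small sketch of that trade-off, with made-up data: the same two users stored once as nested documents and once as normalized tables in `sqlite3`. The field names and the "users per tag" question are hypothetical, and this says nothing about what a specific document store can or cannot query.

```python
import sqlite3

# Document shape: ideal for "give me user 1 with everything attached".
users = [
    {"id": 1, "name": "alice", "tags": ["db", "python"]},
    {"id": 2, "name": "bob", "tags": ["db"]},
]

# A different question ("which users carry each tag?") means walking
# every document and regrouping by hand.
by_tag = {}
for user in users:
    for tag in user["tags"]:
        by_tag.setdefault(tag, []).append(user["name"])

# The same data, normalized into two tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE user_tags (user_id INTEGER, tag TEXT)")
for user in users:
    conn.execute("INSERT INTO users VALUES (?, ?)", (user["id"], user["name"]))
    conn.executemany(
        "INSERT INTO user_tags VALUES (?, ?)",
        [(user["id"], tag) for tag in user["tags"]],
    )

# A new question is one declarative statement, with no change to the storage.
tag_counts = conn.execute(
    "SELECT tag, COUNT(*) FROM user_tags GROUP BY tag ORDER BY tag"
).fetchall()
```

The flip side, as the parent comment argues, is that rebuilding the original User document from the tables takes a join that the document form never needs.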

------
wissler
I don't agree with calling these databases "NoSQL", because it grants the
undeserved premise that SQL is somehow the center of the database universe.
Politically that may be so, but technically, it's not. Nothing anoints SQL
databases as being somehow intrinsically "right" or the logical starting point
or standard by which all databases should be compared. SQL is merely popular.

~~~
paperwork
I don't think the premise is undeserved. SQL IS the center of the database
universe. If we use SQL to mean the general concept of relational databases,
then it is true across even more dimensions of the universe.

Relational databases and SQL should be considered the pride and joy of
computer science AND software engineering (with logic and math being the
grandparents in this increasingly confused and mixed metaphor).

~~~
wissler
"SQL IS the center of the database universe."

Politically, yes. SQL/relational wins the popularity contest. But so what?

Technically, no, it's not the center. It's merely the most popular, for the
time being. This is a temporary state of affairs.

~~~
batista
"Merely the most popular"? LOL. It's the only formally proven, mathematically
correct representation and querying of data sets.

Not that most RDBMSs conform to relational theory 100% (or even 90%), but
everything else is merely a bad reinvention of the wheel.

As in: "Hey, let's trade ACID, security, uniform access to data by all apps
for cheap speed and ill-thought developer convenience."

~~~
wissler
"LOL. It's the only formally proven, mathematically correct representation and
querying of data sets."

Someone led you down the garden path.

There is absolutely no proof that RDBMS is objectively correct in any
_meaningful_ sense whatsoever. Did someone invent an _arbitrary_ standard to
measure it by, and then prove that it met that standard? Sure. But that's a
far cry from a claim that RDBMS's are somehow "formally proven." That's just
pure mathematical silliness and marketing propaganda.

~~~
batista
Who said anything about RDBMS? It's relational algebra we're talking about,
and the reasonings pertain to the relational algebra operators.
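
For readers who haven't met the operators in question, here is a rough sketch of three of them (selection, projection, natural join) over relations modeled as lists of dicts. The function names and sample relations are invented for illustration, and this is a toy, not a faithful implementation of the formal algebra.

```python
def select(relation, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attrs):
    """Projection: keep only the named attributes, dropping duplicate rows."""
    result = []
    for row in relation:
        reduced = {attr: row[attr] for attr in attrs}
        if reduced not in result:
            result.append(reduced)
    return result


def natural_join(left, right):
    """Natural join: pair rows that agree on every attribute they share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[attr] == r_row[attr] for attr in shared):
                joined.append({**l_row, **r_row})
    return joined


# Hypothetical sample relations.
employees = [{"emp": "ann", "dept": "eng"}, {"emp": "bo", "dept": "ops"}]
depts = [{"dept": "eng", "floor": 2}, {"dept": "ops", "floor": 1}]

# "Who works on floor 2?" as a composition of the three operators.
on_floor_2 = project(
    select(natural_join(employees, depts), lambda row: row["floor"] == 2),
    ["emp"],
)
```

Every operator takes relations and returns a relation, so they compose freely; that closure property is a large part of what the formal reasoning relies on.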

