

Codd's relational vision – has NoSQL come full circle? - lukaseder
http://www.opensourceconnections.com/2013/12/11/codds-relational-vision-has-nosql-come-full-circle

======
sowhatquestion
Excellent article. As promised, the author brings a historical perspective
that's typically lacking in this debate (at least for someone like me who has
casually followed it on HN).

Not to stray too far off topic, but this article raised a question that I've
had since starting software development in earnest... Why do we have to choose
between heavily "normalized" relational databases that structure all data &
thus allow more arbitrary queries, and unstructured databases that are fast
and flexible but often slow to query in complex ways? Why can't there be a
hybrid "smart" database that dynamically generates indexes (to use the term
loosely) based on _how_ it's queried, in order to speed up similar queries in
the future? With some kind of additional weight given to the most frequent
queries. Granted, it would need some time to "warm up" (not unlike a tracing
JIT compiler?), and the implementation might be fairly complex, but other than
that, I can't think of any downsides...?

~~~
MehdiEG
AFAIK RavenDB, which describes itself as a "2nd generation document database",
does exactly what you describe. Whenever you run a query that's missing an
index, it will automatically create a temporary "dynamic index" to service it.
If it finds that this index is used a lot, it will automatically promote it to
a permanent index. I haven't tried it myself yet, however, so I can't comment
on how well it works.

But yes, generally speaking, I'm also surprised to see that in 2013 we're
creating our indexes manually. While there are of course applications where
you really want to ensure that all the right indexes are created ahead of
time, for most applications having indexes created automatically based on
query patterns would seem like a much better solution.
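The promote-on-use idea is simple enough to sketch in a few lines. This is a toy illustration only (the `AutoIndexer` class and the threshold of 3 are invented here, and this is not RavenDB's actual mechanism), using SQLite as a stand-in store:

```python
import sqlite3
from collections import Counter

class AutoIndexer:
    """Toy sketch: count how often each column is filtered on, and
    promote a frequently used filter column to a permanent index."""

    def __init__(self, conn, threshold=3):
        self.conn = conn
        self.usage = Counter()   # (table, column) -> times filtered on
        self.threshold = threshold

    def query(self, table, column, value):
        self.usage[(table, column)] += 1
        # Once a filter pattern recurs often enough, create the index.
        if self.usage[(table, column)] == self.threshold:
            self.conn.execute(
                f"CREATE INDEX IF NOT EXISTS idx_{table}_{column} "
                f"ON {table}({column})")
        return self.conn.execute(
            f"SELECT * FROM {table} WHERE {column} = ?", (value,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(100)])
idx = AutoIndexer(conn)
for _ in range(3):
    idx.query("users", "name", "user42")
# After three identical filters, the index has been promoted:
indexes = [row[1] for row in conn.execute("PRAGMA index_list(users)")]
print(indexes)  # ['idx_users_name']
```

A real implementation would of course also have to weigh write amplification, drop indexes that fall out of use, and build them in the background rather than inline with a query.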

~~~
lukaseder
I agree that automatic creation of indexes looks like a reasonable thing to do
for a database like Oracle. Oracle could gather statistics and auto-create
indexes where it sees fit, i.e. where there is a lot of querying and little
writing going on.

In a way, this is what caches can do. Besides that, Oracle is already very
good at giving you statistics and tuning hints to help you assess where you
_could_ add an index:

[http://stackoverflow.com/a/2937047/521799](http://stackoverflow.com/a/2937047/521799)

DB2 and SQL Server probably have similar tools. As far as I know, though,
none of them goes as far as automatically _creating_ or _dropping_ any indexes.

------
coldtea
Which reminds me of Fabian Pascal, probably somewhere drinking heavily to
forget the current state of the DB industry.

(Context: for more than a decade he has tried to educate people on the
relational model, the ill-thought-out back-to-the-past ideas that NoSQL DBs
represent, and more besides. Unfortunately his old site was scrapped, it
seems, and as of 2012 he has turned it into this blog:
[http://www.dbdebunk.com/](http://www.dbdebunk.com/) )

------
m_mueller
> My point is that it’s dangerous to just grab a database off the shelf
> because its fun and trendy. You might end up creating a lot of technical
> debt for yourself. Do you know you’re handling an extremely high volume of a
> single somewhat-well-defined “thing”? Then the hierarchical model might be
> an amazing bet for you. Do you have many different kinds of records that
> link together, but don’t necessarily contain parts of the whole? A
> relational data store might be the ideal solution.

This seems to me the most important point of the article and I fully agree
here. What I'd like to add is that a DBMS like CouchDB is IMO fully suited
to the use case you describe. ArangoDB, which was also presented at that
conference, appears to be even easier for those kinds of data stores - I
haven't been able to play around with it yet, however.

There is no such thing as a perfect DBMS - and that applies to RDBMS as well.
With Couch you give up sharding, ad-hoc querying and per-document security, in
return for conditionless replication, relative foolproofness in terms of
indexing, the ability to have fully normalized relational data (compared to
other NoSQL stores) and mobile offline capabilities.

~~~
lukaseder
One thing about ArangoDB: why don't they implement SQL instead of their weird
COBOL-esque AQL language?

~~~
m_mueller
I have been asking myself the same question - they tried to explain it but I
didn't really understand the point, to be honest. From what I gather, SQL was
what they first pursued, but they gave up at some point when it became clear
that it wouldn't be ideal for their use case of a multi-paradigm database
(document as
well as graph database). In general I agree that when developing a new
product, it's good to question everything rather than blindly repeat what
everyone has done for 20+ years. I'd love to read a blog entry about this
decision though.

~~~
lukaseder
I agree that one should question a thing or two. But when you look at an example
(from
[https://www.arangodb.org/manuals/current/Aql.html](https://www.arangodb.org/manuals/current/Aql.html)):

    
    
        FOR u IN users
          FOR f IN friends
            FILTER u.active == true && f.active == true && u.id == f.userId
            RETURN u.name
    

What would keep anyone from writing this as

    
    
        SELECT u.name
        FROM users AS u
        JOIN friends AS f ON u.id = f.userId
        WHERE u.active = true AND f.active = true
    
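For what it's worth, that SQL version runs as-is on an ordinary relational engine. A quick check on SQLite (with the booleans stored as 0/1, since SQLite has no real boolean type, and with made-up sample data):

```python
import sqlite3

# Sample data: only alice is active AND has an active friend row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users   (id INTEGER, name TEXT, active INTEGER);
    CREATE TABLE friends (userId INTEGER, active INTEGER);
    INSERT INTO users   VALUES (1, 'alice', 1), (2, 'bob', 0), (3, 'carol', 1);
    INSERT INTO friends VALUES (1, 1), (2, 1), (3, 0);
""")
rows = conn.execute("""
    SELECT u.name
    FROM users AS u
    JOIN friends AS f ON u.id = f.userId
    WHERE u.active = 1 AND f.active = 1
""").fetchall()
print(rows)  # [('alice',)]
```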

Writing an entirely new language may inhibit adoption, if anything... Their
rationale is given here in the FAQ
([https://www.arangodb.org/faq](https://www.arangodb.org/faq))

> For example, think of nested list structures inside a document, graph
> traversals etc. There is no way to query such structures in standard SQL,
> and deviating from standard SQL does not make much sense.

I would have thought that PostgreSQL had solved this pretty well. So has
Oracle with its OBJECT types. There must have been a complete SQL show-stopper
in their use cases for them to invent yet another language.

I guess the language was inspired by XQuery, even if they don't mention XQuery
in the FAQ.

~~~
jsteemann
Yes, AQL was inspired by XQuery.

I think SQL is really good for querying relational databases.

Actually, we started by implementing something like SQL's SELECT clause at
the very beginning of ArangoDB. The rationale was: "why invent another
language? SQL is everywhere, so let's use it!".

We very soon found that SQL is not a good fit for working with semi-structured
data. There is no definite schema for a collection in ArangoDB, so it is
unknown which attributes (think: columns) a document (think: row) will have.
Thus using standard SQL would have introduced a lot of potential ambiguities.
Example:

    
    
        SELECT a, b, c 
        FROM c1 
        INNER JOIN c2 
        ON (c1.x = c2.y)
    

When inspecting the above query initially, the database has no idea if
attribute "a" will come from c1 or c2. Each document in both collections can
have an attribute "a", "b", "c" or none at all. So a query like the above
could throw an ambiguity error at runtime only, and not at query compile time.
Fully qualifying attributes with "table" names would have worked (e.g. "SELECT
c1.a"), but would be a deviation from standard SQL, which doesn't require
that. And then people would have asked "they claim to support SQL. But why
doesn't my SQL statement work in ArangoDB?". Probably a lot of confusion.
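The ambiguity is easy to reproduce outside any particular database. A small sketch in plain Python (dicts standing in for documents; `select_a` is a made-up resolver for illustration, not ArangoDB code):

```python
# Schemaless "collections": any document may or may not carry any attribute,
# so an unqualified "SELECT a" can only be checked at runtime, per document.
c1 = [{"x": 1, "a": "from-c1"}, {"x": 2}]
c2 = [{"y": 1, "a": "from-c2"}, {"y": 2, "b": 7}]

def select_a(doc1, doc2):
    """Resolve unqualified attribute 'a' across a joined document pair."""
    sources = [d for d in (doc1, doc2) if "a" in d]
    if len(sources) > 1:
        raise ValueError("ambiguous: attribute 'a' exists in both documents")
    return sources[0]["a"] if sources else None

# Joining on c1.x = c2.y: the first pair is ambiguous, the second is not.
pairs = [(d1, d2) for d1 in c1 for d2 in c2 if d1["x"] == d2["y"]]
results = []
for d1, d2 in pairs:
    try:
        results.append(select_a(d1, d2))
    except ValueError:
        results.append("ambiguity error")
print(results)  # ['ambiguity error', None]
```

The same query is fine for one pair of documents and an error for another, which is exactly the compile-time-vs-runtime problem described above.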

Apart from that, it is common to have multi-valued attributes in document
databases (and thus in ArangoDB). Think of an attribute which itself is a
list.

SQL really is not designed for this. Putting a horizontal list into a single
attribute/column is an anti-pattern in the relational world. Instead, you
would normalize most data into separate n:m mapping tables etc., and join them
later.

No need to do this with a document database: horizontal lists are supported,
and there is less NEED for normalization. It is up to you how to model the
data.
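To illustrate the difference (plain Python, with hypothetical data): the same record once as a document with an embedded list, and once in the normalized relational layout with an n:m mapping table that must be joined to get the document back:

```python
# Document form: the multi-valued attribute lives inside the record.
doc = {"_id": 1, "name": "alice", "tags": ["sql", "nosql", "dba"]}

# Relational form: tags get their own table, linked via an n:m mapping.
users     = [(1, "alice")]
tags      = [(10, "sql"), (11, "nosql"), (12, "dba")]
user_tags = [(1, 10), (1, 11), (1, 12)]            # n:m mapping table

# Re-assembling the document then requires a join over the mapping:
tag_names = {tid: name for tid, name in tags}
rebuilt = {
    "_id": users[0][0],
    "name": users[0][1],
    "tags": [tag_names[tid] for uid, tid in user_tags if uid == users[0][0]],
}
print(rebuilt == doc)  # True
```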

With all that in mind, we very soon switched to a language we thought would
better fit a non-relational database such as ArangoDB. We intentionally
decided against using SQL keywords, to avoid confusion.

I hope the decision to start AQL is now a little more comprehensible.

------
Delphiza
One thing that is never directly mentioned in the history of SQL is the
available memory, CPU and storage at the time. When a mainframe had 192KB of
core memory and a 300KB/s disk transfer speed, coming up with a way to access
all that data was important. The OP's example of data in a text file would be
okay(ish) today because you could load the whole thing into memory and be done
with it. Yes, we have more data, more rows, and larger records, but we are
solving
different problems with different tools and significantly more horsepower at
our disposal. This is not anti-SQL, but the SQL-as-a-solution in history needs
more context. With that context, those lessons may be less relevant.

------
SeanLuke
> This is neat because, in my Artificial Intelligence class in college, this
> kind of “reasoning” is presented as something advanced. In fact there’s a
> whole language, Prolog, that’s whole job is to take assertions about the
> world and give you extra facts. I had to learn all that stuff in AI when
> mundane-old SQL was right under my nose giving me the tools to learn
> anything I want about the world my relations represented!

[Sigh]

I guess broken worldviews like this are why Datalog never really got off the
ground in the DB community.

------
brasetvik
See also: What Goes Around Comes Around (Stonebraker and Hellerstein)

[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.113....](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.113.5640)

That one's from the time when XML databases were all the rage. Unfortunately,
its publication year is missing. (The youngest cited article is from '97.)

~~~
lukaseder
Gotta love Stonebraker!

------
alok-g
See also:

SQL and NoSQL are two sides of the same coin, mathematically

[https://news.ycombinator.com/item?id=3313234](https://news.ycombinator.com/item?id=3313234)

~~~
lukaseder
That seems more like a plea for LINQ than anything else ;-)

One of LINQ's main visions is to unify querying regardless of the data source
- and it does so quite well. At the time the author of that article conceived
LINQ, he intended to unify querying RDBMS, XML, and objects. "NoSQL" (whatever
that is) is a type of data store that was not part of LINQ's original master
plan. Thus, LINQ has to be retrofitted / enhanced to accommodate new
requirements. It would be all too nice if things were as easy as a simple
duality, especially given that the author of that article has now also
founded a company called Applied Duality Inc.

But history will teach us where these things go. I currently don't see a
second E.F. Codd to solve the complexity introduced with the new abundance of
NoSQL data stores - yet.

(As always, when Erik Meijer writes something: Very interesting read,
nonetheless)

------
yepguy
Author thinks he is being contrarian by defending SQL... huh?

~~~
andrewflnr
It's a NoSQL conference, so yes.

~~~
yepguy
Ah, I missed that bit.

