
SQL and NoSQL are two sides of the same coin, mathematically - 6ren
http://queue.acm.org/detail.cfm?id=1961297&repost
======
talaketu
> Referential integrity implies a closed-world assumption where transactions
> on the database are serialized by (conceptually) suspending the world
> synchronously, applying the required changes, and resuming the world again
> when referential integrity has been restored successfully, rolling back any
> changes otherwise. Assuming a closed world is, we claim, both a strength and
> a weakness of the relational model. [...] The closed-world assumption,
> however, is in direct contradiction with distribution and scale-out. The
> bigger the world, the harder it is to keep closed.

This is the first time I have heard the expression "Closed World Assumption"
used to refer to transactional atomicity.

For example, the Terry Halpin textbook "Information Modeling and Relational
Databases" describes the CWA thus:

> When modeling a business domain, one may take different positions with
> respect to the completeness of the information captured in the model. The
> Closed World Assumption (CWA) is the assumption that all relevant facts are
> known (i.e, appear in the model, either as asserted facts or derived facts)
> and that all the relevant business rules are known.)

I find this discrepancy in academic terminology difficult to reconcile.

~~~
stonemetal
I am not sure what you are calling a discrepancy. For a Database to guaranty
the assumption in Halpin's definition, a database must guaranty all business
rules and relevant facts are known, and followed.

Say we are going to insert a row in a database that has a foreign key, for
Halpin's assumption to hold the database must verify that the foreign key is
present and valid in the data set. If it doesn't we have violated the
assumption. Same for the transaction quote you mention.

It is an assumption you make can make because it is a guaranty your database
provides.

~~~
talaketu
The discrepancy is that Halpin's use of the phrase relates to the
correspondence between the real world and the database - to the modelling
relationship itself, whereas Meijer and Bierman's use solely relates to the
valid states of the database - separate from any meaning that might be taken.

You highlighted the element of referential integrity within their description,
whereas I highlighted atomicity. But both are just operational characteristics
of the database.

Halpin's CWA comes into play even within a database that is not being updated
and has only one relation. Eg. Consider a relation P that models the fact type
that person with NAME was born on DOB. P = { {NAME: Bob, DOB: 2/2/22} }

CWA answers the query "Was Mary born on 2/2/22?" with False.

------
einhverfr
"A more skeptical view of O/R mappers is that they are undoing the damage
caused by normalizing the original object model into multiple tables. "

Somehow inconsistent with the idea that SQL and NoSQL are two sides of the
same coin, mathematically.....

~~~
dextorious
How is it inconsistent?

That SQL/NoSQL are "two sides of the same coin" does not mean, as you seem to
imply, that they are the same SIDE. Two sides means that they are opposite
ends. Thus, each has different (actually complimentary) pros/cons.

Which is exactly what the phrase you quote implies: that normalization incurs
some damage that we have to undo (via O/R mapping in the example given), if we
want to get the same pros of a NoSQL solution from a SQL datastore.

(Also note that O/R mappers != NoSQL)

------
skrebbel
Phenomenal quote:

 _Just as Codd's discovery of relational algebra as a formal basis for SQL
shifted the database industry from a monopolistically competitive market to an
oligopoly and thus propelled a billion-dollar industry around SQL and
foreign-/primary-key stores, we believe that our categorical data-model
formalization model and monadic query language will allow the same economic
growth to occur for coSQL key-value stores._

In other words: "this article will move mountains".

Let's wait and see!

~~~
route66
Disclaimer: I did not dive into their mathematical adventures, but am rather
fascinated by their underlying economical foundation: If a market offers
choice (the "monopolistically competitive market") it will not deliver profit
in the long run. If it's a oligopoly on the other hand, said oligopoly can
retain profits for a long time to come.

What is this? Tautological? Suicide of the market? At least it's spelled out
that the user is the loser in that system.

------
Dylan16807
But what about many to many relations? How do those map to nosql?

~~~
JulianMorrison
Instead of:

    
    
        [person_id,person_name]  [team_id,team_name] [person_id, team_id]
    

You get:

    
    
        [person_id, person_name, [team_id, team_id, ...]]
        [team_id, team_name, [person_id, person_id, ...]]

------
j_s
Brief discussion earlier this year with input from antirez defending Redis:

<http://news.ycombinator.com/item?id=2416566>

------
igravious
I'd like to see a definitive response by someone without full knowledge of the
entire spectrum of NoSQL whether the mathematical insights described in this
article cover the entire space of NoSQL databases. That is: document store,
graph, and key-value store (in all their many guises).
<http://en.wikipedia.org/wiki/NoSQL> Thanks in advance!

~~~
einhverfr
I think what the author is suggesting is that you can bridge some of the gap
by a sort of non-SQL, quasi-relational data store. The major ways this is
differentiated from the pure relational model is the general hostility towards
representing information in well normalized relations and favoring a fairly
object-oriented store instead.

This is not a new idea. PostgreSQL was originally designed with this sort of
workflow in mind and now is pretty relational in outlook. In theory it is
perfectly possible to transform one model into the other, provided adequate
mappings, but at the same time one cannot reasonably _derive_ one model from
the other, which is why I disagree with the author's view that these are flip
sides of the same coin.

Now, on to what is gained and what is lost. There are cases where the sorts of
representation she is talking about are helpful (an in fact with recent
PostgreSQL instances you can do them already).

On the other hand, the article isn't really clear about what the costs of this
sort of representation are. The closest you get is the last line of his
comparison table which contrasts query optimizer with developer/pattern. So
what this implies is that such a system is optimized for developer/transaction
workload and not for flexible reporting later. In other words, the developer
is expected to know in advance all the uses the data will have so that the
database can be properly optimized for this. Add new uses and this element
becomes hard.

Similarly the ACID/BASE comparison also is telling in the sense that he is
contrasting a system which is designed to give you accurate, consistent
information when you ask for it to a system which is designed to tend towards
a state of consistency, at least for older data. The "eventually consistent"
part of his BASE definition ought to scare the pants off anyone interested in
looking at how data in such a system could be put to decision support tasks.

------
alexro
I'd rather suggest to rename 'NoSql' to 'NSQL' meaning 'New Structured Query
Language'.

CoSql doesn't stand up the test of aggregation, as KV nodes are distance apart
and sometimes partitioned too. Map/Reduce is kind of a substitute for
aggregation and this fact can be somehow reflected in NSQL.

Also, NSQL should reflect the fact that more advanced filtering queries cannot
be run against KV stores, like when referencing secondary indexes in other
nodes.

All-in-all, there is little remaining of original SQL in KV-stores and NoSql
currently makes sense. But as the products continue evolving there is a
definitive need to have a ground level logical model for data retrieval and
then some sort of NSQL will be born.

