
A Relational Model of Data for Large Shared Data Banks (1970) [pdf] - open-source-ux
https://cs.uwaterloo.ca/~david/cs848s14/codd-relational.pdf
======
ZenPsycho
relational databases are secretly supposed to be logic programming languages
like prolog, each “table”, (relation) representing a predicate function. each
row (tuple) representing a fact about the world that satisfies the predicate.
with proper structure and labelling you can ask very sophisticated questions.
somewhere along the line people forgot this and typically just dump
spreadsheets in, to satisfy the fields in a form or an app screen, then think
they need a graph database to represent relationships. it is unfortunate that
SQL based systems became accidentally synonymous with the “relational model”
as it actually mostly gets in the way of implementing the relational model. oh
well.

~~~
paulddraper
> SQL based systems became accidentally synonymous with the “relational model”
> as it actually mostly gets in the way of implementing the relational model

Please explain?

What makes SQL bad a relational databases, and what is better?

~~~
grzm
There are at least two aspects of this:

1) SQL is not entirely faithful to the underlying logical principles of the
relational model. Chris Date's "Database in Depth: Relational Theory for
Practitioners"[0] or "SQL and Relational Theory: How to Write Accurate SQL
Code"[1] are good texts that go into this in detail. There's plenty of stuff
online if you search for it. For a very brief example of criticisms, you can
take a look at Wikipedia:
([https://en.wikipedia.org/wiki/SQL#Criticisms](https://en.wikipedia.org/wiki/SQL#Criticisms)).

[0]:
[https://www.goodreads.com/book/show/573821.Database_in_Depth](https://www.goodreads.com/book/show/573821.Database_in_Depth)

[1]: [https://www.goodreads.com/book/show/6219481-sql-and-
relation...](https://www.goodreads.com/book/show/6219481-sql-and-relational-
theory)

2) Most SQL systems provide a 1-to-1 mapping of tables to underlying storage
structure which places limits on performance, while relations are _logical_
and independent of the implementation. Other than indexing strategies, there
hasn't been a lot of work in production systems to truly abstract storage from
logical representation. They've been kind of stuck in their thinking. This
second point isn't as important as the first, however, and more off-topic with
respect to your question.

~~~
paulddraper
On the first point, I don't have either of those books, but IMO the Wikipedia
complaints aren't compelling.

On the second point, many SQL databases offer indexing [1] (as you said),
clustering [2] (arranging a table according an index), and partitioning [3].

I think database designers are _very_ conscious of the difference between the
logical relational model and the implementation details. For example, this [4]
conversation on the PostgreSQL mailing list about unique indexes vs. unique
constraints. I also think that in practice there just aren't many other ways
to abstract the logical model from the physical storage that gives you
something substantial.

IMO, this is one of the successes of the relational model: the logical model
and the physical implementations don't _need_ to differ for you to have a
really good situation all around.

[1] [https://www.postgresql.org/docs/10/static/indexes-
intro.html](https://www.postgresql.org/docs/10/static/indexes-intro.html)

[2] [https://www.postgresql.org/docs/10/static/sql-
cluster.html](https://www.postgresql.org/docs/10/static/sql-cluster.html)

[3] [https://www.postgresql.org/docs/10/static/ddl-
partitioning.h...](https://www.postgresql.org/docs/10/static/ddl-
partitioning.html)

[4] [https://www.postgresql.org/message-
id/CAO8h7BJMX5V1TqzScTx2N...](https://www.postgresql.org/message-
id/CAO8h7BJMX5V1TqzScTx2Nr1jH5iUFG8A071y-g1b_kdzpu9PDw%40mail.gmail.com)

~~~
ZenPsycho
the point is though, it's a little bit like a broken calculator. "The
Relational Model" represents a logical model as a kind of pure piece of
mathematics, and presents a system that isn't just dump disk persistence but
actually a logic induction engine. However, this is not how SQL is used in
practice. It is possible to implement the relational model in SQL, however if
you assume SQL databases are supposed to be implementations of it in and of
themselves, they violate the commutative, associative and distributive laws of
the relational algebra- it's a bit like a calculator that sometimes gives you
2+2=5 (because it's faster to give wrong results sometimes). In addition, the
ergonomics of a typical SQL database system do not make mathematical/logical
soundness the easiest thing to achieve. Most developers don't know or care
about logical induction or mathematical soundness, they just want a place to
dump their objects.

That said, out of the open source SQL database systems, postgresql actually
does try to implement the relational model properly. But I don't have enough
direct experience with it to say definitively how well it does.

How many millions of dollars hss NULL != NULL cost the world?

~~~
paulddraper
> what is better?

~~~
ZenPsycho
well, obviously "worse is better". historically, SQL has been "Good enough"
and was sponsored by IBM/Oracle that nobody has bothered to try and implement
the relational model correctly, except for perhaps postgres (which used to
have its own query language before bolting SQL onto its core engine)

what I hope people get out of my comments is not about selecting the right
system for today- A better thing does not necessarily exist yet-- But at least
understand the relational model well enough to see what its utility is.
Functional programming and Logic programming are coming back in fashion, and
maybe someone here will build the better thing.

I don't think "nosql" databases or graph databases are necessarily the better
thing. (but they might be for certain usecases). Though I think relational db
systems could learn something about developer ergonomics from them.
Interfacing with a DB using a fortran inspired query language is archaic.
Constructing the queries using string concatenation has been a security
nightmare. You could build something waaayyy better off of prolog or mercury,
full programming languages with the inductive logic engine at the core of
their paradigm, with a database system integrated in, and expose a graphql api
for your isomorphic react web app cos that's how we do in 2018.

Even microsoft's linq is a massive improvement over using SQL. Language
integration, strong typing. These are things we like, right?

~~~
paulddraper
> well, obviously "worse is better".

> A better thing does not necessarily exist yet

Ah, of course.

> Interfacing with a DB using a fortran inspired query language is archaic.
> Constructing the queries using string concatenation has been a security
> nightmare.

Perhaps, but this this often for very ubiquitous technologies. See almost
every HTML authoring tool.

You can write full application servers in Oracle PL/SQL now. [1] And with
PostgreSQL's row-level security, you can easily have a GraphQL server with
PostGraphile [2]. But those aren't popular solutions.

> ou could build something waaayyy better off of prolog or mercury, full
> programming languages with the inductive logic engine at the core of their
> paradigm

That doesn't really change the equation. You still either have to (1) choose
your DBMS's query language to write your application server (like today) or
(2) use one language to construct queries in another (like today).

I won't argue that a Prolog DBMS wouldn't be nice. I will argue that in terms
of the hard parts of a DBMS, it's not going to fundamentally change things.

[1]
[https://docs.oracle.com/cd/B28359_01/appdev.111/b28424/adfns...](https://docs.oracle.com/cd/B28359_01/appdev.111/b28424/adfns_psp.htm#g2475353)

[2]
[https://www.graphile.org/postgraphile/](https://www.graphile.org/postgraphile/)

~~~
mamcx
The main problem that I see is that relational model / sql is ALWAYS coupled
with RDBMS, even with the proponents of the relational model!

The thing is, sql is much less capable then even forth. You can't do:

    
    
        query = where a = ?1
        columns = [name, id ]
        console.write(columns ?rename name as a |> query)
    

not even

    
    
        console.write
    

not even

    
    
        Window.open
    

neither store table in a field (not serializate it, store) create your btree,
make open gl game, make a osx app, create a rdbms and so on. (yes, i'm aware
some rdbms could, hacking around, kind of do it witth pain)

In short, the relational model is not see as something that can be made for a
full language WITHOUT RDBMS.

That is what I wish to change:

[http://tablam.org/](http://tablam.org/)

because I work before as foxpro developer and see how easy and logical was to
work with tables without sql.

~~~
tabtab
I agree that the integration of domain programming and database
programming/querying in FoxPro/xBase created a smoother CRUD-app shorthand
that current tools can't match. Less converting of sh8t between the layers
(and less need for layers). The newest generation must like to type and read a
lot of code, I guess. Or, it's a conspiracy from Carpel Tunnel and eye doctors
to get more revenue.

------
JunkDNA
If you’ve never read this paper, you owe it to yourself to give it a thorough
read. It truly gives you a sense for the problems relational databases were
attempting to solve with the tech of the day. Also note the strong
mathematical underpinnings of many of the core ideas. It’s no wonder these
ideas have withstood the test of time.

~~~
mathattack
I recall the first training class I had where I connected relational tables
with the theory I learned in school. Life never was the same after.

On Codd we trust.

------
EngineerBetter
The key, the whole key, and nothing but the key.

~~~
mtVessel
So help me, Codd.

------
DanielBMarkham
This changed the way people coded, stored data, and thought about data in
general. In my mind, the implications, not necessarily the details, were some
of the most profound in the history of computational papers.

For the last few years, I've been working with the premise that there is an
analogous information-minimization theory for all project information, not
just data storage. So far I'm optimistic that this is true --- it's just not
the same minimal format for different teams working on the same problem. It's
people-dependent, not data-dependent. (Obligatory link:
[https://leanpub.com/info-ops](https://leanpub.com/info-ops) ) If true, and it
looks like it is, it means that teams can have "analysis compilers" that work
inside the build pipeline alongside all the other tools. This leads to
"continuous information deployment"

~~~
kthejoker2
As someone who serves as a product owner for analytics use cases, I can attest
that the hardest problem in Agile is accurate qualitative and quantitative
measurement of progress, value, and delivery.

Definitely going to check out the book.

~~~
DanielBMarkham
To make it work, I had to propose a new working definition of what analysis
is. The rough definition for most people is that analysis is all the "work
around the work" If you're not coding you're doing analysis. I needed
something more concrete than that, yet flexible enough to allow tractability
on all the kinds of things teams do.

Once I got that definition in place, the rest of the process of information
minimization proceeded quite naturally. At that point you know what it is
you're trying to minimize, ie, you have a "key" in Codd's terms. Then the
trick is just minimize information and maximizing flow using key-based storage
and transmission. (This also lets you look at whether your processes are
working or not from a new, rational, and objective standpoint, which is a nice
side-effect)

------
redwood
Amazing that something optimized for when storage was the bottleneck and
shockingly expensive compared to people... is still the most popular model
today (when people are the expensive and rare resource, with storage abundant
and cheap).

------
lifepillar
It’s a shame that no one in academia has bothered so far to curate a critical,
annotated, edition of Codd’s collected works. His impact was immense (for good
and for bad, e.g., nulls in SQL) and he foresaw many of the important issues
of data management in the years to come (up to nowadays), including the
relevance of distributed systems and the need for different ways of looking at
data, as sets, as tables, as predicates, as graphs.

Unfortunately, people who knew or worked with him have died or are getting
old, so I’m losing hope…

------
commandlinefan
It's nearly 50 years old and still the best model for storing and querying
data. YesSQL FTW.

