Hacker News new | past | comments | ask | show | jobs | submit login
The Theory of Relational Databases (1983) (pdx.edu)
261 points by dragontamer 4 months ago | hide | past | web | favorite | 24 comments

Maier's The Theory of Relational Databases is one of the most highly cited textbooks on Relational Theory. I've given the first 2 chapters a look over, and I can see why: the writing style is clear, and its concepts are so fundamental that they are still applicable today (roughly 36 years after this textbook was first published).

I haven't read all of this yet, but I figure someone out there will find it helpful. My personal use of this textbook is for constraint-programming, which is closely related to the Relational Theory used in relational databases.

I also read the beginning of a few chapters and have to agree: the writing style is very concise.

Example (from the Introduction):

  One of the major advantages of the relational model is its uniformity. All data
  is viewed as being stored in tables, with each row in the table having the same
  format. Each row in the table summarizes some object or relationship in the
  real world. Whether the corresponding entities in the real world actually
  possess the uniformity the relational model ascribes to them is a question
  that the user of the model must answer. It is a question of the suitability of
  the model for the application at hand.
  Whether or not the relational model is appropriate for a particular set of
  data shall not concern us. There are plenty of instances where the model is
  appropriate, and we always assume we are dealing with such instances.

Worth noting that here the use of the word "model" refers not to some indefinite domain-specific data schema, but "The Relational Model" as a paradigm.

It displaced the earlier paradigm of hierarchical data model, which derived more or less from physical media (eg punch cards).

The relational model is not well suited to uncertain data, as a row in a table is generally interpreted as a true proposition. For statistical data sets, analytical processing may be better served by array/tensor models (which also exhibit uniformity).

> The relational model is not well suited to uncertain data, as a row in a table is generally interpreted as a true proposition. For statistical data sets, analytical processing may be better served by array/tensor models (which also exhibit uniformity).

Arrays/tensors and (relational) tables can be thought of as alternative ways to represent a set with this major difference:

o [Relational table] Each column is an axis. A row is point.

o [Arrays] Each dimension is an axis. A cell is a point.

This is why it is wrong to interpret a table with a two-dimensional array - their data have completely different semantics.

The success of a (data) model depends on its ability to represent and manipulate relationships in a simple and natural way.

There exist of course other uniform ways to represent data, for example, using functions (and operations with functions). But many of them have been developed under the umbrella of the relational model even though they have little to do with the relational principles.

>The relational model is not well suited to uncertain data, as a row in a table is generally interpreted as a true proposition. For statistical data sets, analytical processing may be better served by array/tensor models (which also exhibit uniformity).

An outer join is an outer product, and you have the null value. I've never seen anyone use real tensors outside of physical simulations, but then again relational databases don't use real relationships either.

Null is intended for missing information, not uncertain information. It's common to use vectors to represent a signal.

Relational databases really are based on the set theoretic concepts of relations and functional dependency.

> The relational model is not well suited to uncertain data, as a row in a table is generally interpreted as a true proposition

True, but the proposition does not necessarily indicate a certain fact about the world. It might be e.g. measurement or observation or opinion or guess or whatever. The proposition is just that the measurement, guess or opinion is this.

The power of the relational model is that we can use operators to combine relations to form new relations, which are then understood to also encode true propositions. A relational join is a logical implication. This is how SQL is able to answer complex queries with certainty from elementary certain data.

Bayesian networks are able to do similar things from uncertain data.

Anyone know the significance of the diagrams on the book's cover? They're intriguing:


The diagrams are indeed hypergraphs. They are taken from Chapter 12 of the book, and are simplifications of Figures 12.44 and 12.45. The solid lines enclose relation schemes, and the dotted line represent "objects", which here signify groups of relations it makes sense to join. There's a scan of the text at http://web.cecs.pdx.edu/~maier/TheoryBook/TRD.html if you want to look at the figure originals.

Gallo et al. expressed functional dependencies via them here https://www.sciencedirect.com/science/article/pii/0166218X93...

Good eye! They look precisely like figure 12.44 and 12.45!

He's the author of the book ;) (or someone stole his name)

Woops. I guess I should read usernames more carefully from now on.

Yup, I'm the author. As I recall, I didn't have much input on the cover. Whoever did the design must have liked the look of the figures. However, as you see, they're pretty hard to interpret with the attribute names removed.

I'm going to guess that the pictures are of hypergraphs. Hypergraphs are good representations of mathematical relations. Each vertex of the hypergraph is a domain, while a hyperedge would be a "relation", the subject of the book.


For those unfamiliar with hypergraphs, they are very similar to classical graphs. Except hyperedges may connect more than two vertices together. (While normal graph edges normally only connect two vertices together) Therefore, you have to draw regions to represent hyperedges.

Does anyone know what the hyper- prefix here signifies? These diagrams remind me of microphone response graphs, and there's a type of microphone called hypercardioid that resembles this shape.

I don't know for sure, but I generally expect "hyper" in mathematics to mean "extended into an extra dimension".

A Hypercube is a 4-dimensional cube. A hypergraph is a graph with 3 (or more) vertices per edge, instead 2-vertices. Finally, a "Hypercardioid" is a cardioid, except in 3 dimensions (instead of the typical 2-dimensions).


Set theory.

Relational model is basically Set. The diagram is illustrating the sets.

I started my IT career in the late '80s...and the buzz of the day was SQL. All the money those days (consulting, I mean) was for dBase. But the limits were becoming obvious (scale, mostly) and relational db's were moving rapidly. Yep, when Oracle was just an RDBMS. To learn SQL... well, hit the library. Lots of boring stuff on "relational calculus", but nothing I perceived as valuable. I found Maier's book...and while it had a lot of math, there was enough practical content that the lights came on for me. After that, I found books from E.F. Codd and Chris Date...wonderful! If you want to nerd out on this stuff, find the more recent works by Chris Date. I recall having many beers over null. And wacky joins. I haven't had to write SQL for pay in years...but for me, SQL and RDBMS are a wonderful diversion on the occasion I'm asked to unravel an issue. For some reason, the relational model still makes sense to me...and I don't know if it is because it's the first formal model I worked with, or of it's intrinsic value. Perhaps you never forget your first love. Anyhow, my advice to people digging into this, is to install (or get access to) a reasonably complete RDBMS, and create/populate a few tables and attempt to predict what your SQL will return. This is a key skill. I see so many devs writing queries and testing them... and watching what comes back and then (somehow) thinking they've got it right. Sure, with a few dozen rows...but try that on several million records. Nope. At scale, you must understand the relational model in general, AND you must profile the RDBMS you are running by experimentation to see exactly how it behaves... especially around nulls, default sort orders, and joins. Not all RDBMS behave the same way... and this will surprise most.

Does anyone know if they offer this as a full pdf? I have no idea why they decided to split it. I'm sure it's not hard to merge back.

Why is the book in parts? It was scanned almost 20 years ago, when multi-megabyte files were still challenging for some tools to deal with. I'm guessing as a single file, it would be 50+ MB.

(Note that the book was written before PDF existed.)

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact