Are there any other fields with a "Red Book"? I am mildly curious.
I'll show myself out...
If you ever need to design a logically sound query processor/optimizer and/or data storage layer, this is a book you should study.
It still sits on my bookshelf, waiting for the student whose interest in databases is deeper than dujourDB, so that I can pass it down to them.
It's still one of my favourite technical books.
I read it when I started looking at RDBMS after having worked in NoSQL and I clearly recall how excited I was to learn that there is a full fledged system that I could utilize which would get rid of 90% of my code.
I know this term is usually not used when it comes to technical books, but I found it to be quite a page turner!!
COUNT(case when <complex condition> then 1 end) as foo
Count occurrences of complex condition over a row set. Works because COUNT ignores null.
NVL(1/NULLIF(foo, 0), 0)
Protect against division by zero. The expression 1/X is a simple example but this generalizes well. In fact, I’ve come to think that NULL turns any SQL value into a Monadic option type like those used functional languages and Rust.
Note that I’m not saying the above cannot be done without NULL, just that NULL is a really useful tool in my experience.
For the counting case, sum(case when ... then 1 else 0 end) generally works as well, and generally att little to no cost.
Having nullability as a concept in the query language isn't very problematic though, I might even consider it be helpful on average bas long as a view or table can never contain nulls
Nullable columns in tables and queries is the problem. It boils down to how each nullable columns (in)effectively describes a schema equivalent to a table for each nullable columb, and one for the non-nullable, where the relationship between these tables are quite under constrained.
Thus to ensure data quality and correctness you will have to query all nullable columns of interest to make sure that the various combinations of nulls in those columns doesn't interfere with what you are trying to report. You will also have to track that no component of the system suddenly starts reporting nulls in columns where it didn't use to, as that is equivalent to a schema change.
This wouldn't be much of an issue if there were never more than one nullable column per table, or att the very least that only the absolute minimum of columns were nullable. This is however only very rarely the case, usually there are far too many nullable columns. To make reliable and stable reporting one then has to solve the under specification of the source system in the analysis and reporting pipeline for a class of issues that for the most part should be the responsibility of the source systems.
Best use of nullls I know of, is using them to lower cost of table scans in Google BigQuery.
create table foo(bar int)
insert foo values(1),(null),(2)
select count(*), count(bar) from foo
This is very different from Java, where a chain of operation on null values typically very quickly leads to runtime exceptions.
I know NULL causes a lot of heartache but it is a useful concept in DB design.
> Graphs allow us to build complex, intuitive relationships between data points. Unlike many traditional methods of structuring data, which focus on the format of relationships, graphs focus on the topology of relationships in the dataset.
When it comes to making database concepts more intuitive, I find visually rich representations very helpful, for example using geometric space-encoded logical relationships to aid diagrammatic reasoning.
It goes along well with many other people's efforts to improve mathematical represenations and interfaces through intuitive visualizations, ie:
It's a good idea to familiarise with the code in bottom-up order of layers.
Readme says this:
> The implementation takes heavy inspiration from Edward Sciore's SimpleDB, augmented by implementations of various exercises in his textbook Database Design and Implementation.
Some additional resources I'd recommend if you're interested:
Designing Data-Intensive Applications is a fantastic place to start if you're interested in the intersection of databases & distributed systems:
The Architecture of Open-Source Applications book has a fewer chapters on databases:
There's some fantastic documentation on Postgres and its internals:
The Classic: a basic database theory course covering the classical material would center around Parts B and C. Chapter 10 and parts of Chapter 9 of Part C are somewhat more advanced and could be skipped. If time allows, some of Chapter 12 and a selection of Part F might be covered.
Feast of Query Languages: a course on the theory of query languages would start with a quick review of the basic material on classical languages (Part B), and continue with Parts D and E. If time allows, some material on languages for complex objects and object-oriented databases (Part F) could be covered.
Gourmet Sampling of Database Theory: a course for people with theoretical appetites that emphasizes the specificity of database theory. Logicians wishing to explore the connection between finite-model theory and databases will be interested in Parts C and E. Those interested in descriptive complexity will find Part E closest to their hearts. Researchers in logic programming will prefer Part D, particularly Chapters 12, 13, and 15. People with a background in theoretical artificial intelligence will find Parts D and F of particular interest. Rule-based systems are related to Chapter 14 (see also parts of Chapter 22). Programming language people will be interested in much of the material on query languages, including Chapters 20 and 21 in Part F.
Fast-Food Database Theory: a course for applied database students that is meant to provide informal exposure to some of the basic results of database theory. This would include assorted topics, with an informal presentation emphasizing the examples, results, and intuition provided in the text, rather than the proofs. A possible syllabus would include Part B; parts of Chapters 8, 9, and 11 in Part C; Chapter 12 and parts of Chapter 15 in Part D; and selected chapters of Part F.
Numerous exercises have been included, and they are essentially of three categories. Routine exercises (unmarked) require easy manipulation of the basic concepts and results. Harder exercises are marked with a (*). Another category of exercises is meant to complement the material in the text and often contains results from related research articles. These exercises are usually on the hard side and some may constitute term projects. These exercises are marked with a ( ).
Play around with a small RDBMs like sqlite. Do a project around it. Look at the source maybe and then learn about the principles. Foundations of databases are based on discrete mathematics ( where relational comes from in RDBMs ) so a background in discrete math helps.
For enterprise level stuff ( clustering, replication, distribution, etc ), it's more practical to learn on the job where you have access to the hardware, servers, etc.
In fact I added this thread to my favorites because the book references look useful.
Here is an updated edition: https://www.amazon.com/dp/129202447X/
We banned you last year for egregiously violating the site guidelines, and told you why. When banned users create new accounts and show new signs of abusive behavior, we ban the new account too. The reason is obvious, but I'll explain anyway: when a banned user gives us reason to believe they've had a change of heart and will use HN as intended in the future, we're happy to unban them—but when they give us reason to believe the opposite, we do the opposite. Otherwise bannage would be a bit of a joke on a site when people can just create a new account in a few seconds (and many do).
We detached this comment from https://news.ycombinator.com/item?id=19735388 and marked it off-topic.
All users can enable showdead.