A Tiny Intro to Database Systems (dancrisan.com)
154 points by sandcrain on Apr 24, 2015 | 13 comments

As a non-CS grad coming fresh to databases, I found both the entity-relationship and the object-oriented models confusing. Then I read Date's [1] and Codd's [2] books and papers on the relational model, the one from the 1970s that is basically set and type theory applied to data, and found it to be a lot clearer and a more powerful abstraction for dealing with your data model.

For example, your Relational Model introduction has a discussion of various data types. But arguably, whether your integer is implemented as BIGINT or TINYINT is an implementation decision which should be separate from the model discussion (dixit Date). In other words, that attribute has a type of integer, and how that integer is stored is a separate issue that your RDBMS ought to abstract away (Postgres, I think, is pretty good at this, and MySQL quite annoying). The beauty of the latest RDBMS developments, particularly in the Postgres world, is that the implementation has gotten so good that you don't really need to worry about it like you had to just a decade ago, at least in 95% of use cases.

Again, as a non-"full time developer", it amazes me how many "experienced" developers are not aware of the relational model and do not know what a foreign key is or why referential integrity might be important.
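To make the referential integrity point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The guest and booking tables are hypothetical, made up for illustration; the only SQLite-specific detail is that foreign key enforcement has to be switched on per connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off by default; turn it on per connection.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: every booking must point at an existing guest.
conn.execute("CREATE TABLE guest (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE booking ("
    " id INTEGER PRIMARY KEY,"
    " guest_id INTEGER NOT NULL REFERENCES guest(id),"
    " room TEXT NOT NULL)"
)
conn.execute("INSERT INTO guest (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO booking (guest_id, room) VALUES (1, '101')")  # guest 1 exists


def try_orphan_booking(connection):
    """Try to book a room for a guest id that does not exist."""
    try:
        connection.execute("INSERT INTO booking (guest_id, room) VALUES (99, '102')")
        return "accepted"
    except sqlite3.IntegrityError:
        # The database itself refuses the dangling reference.
        return "rejected"


result = try_orphan_booking(conn)
print(result)
```

The point is that the constraint lives in the schema, so no application code path can create a booking for a guest that isn't there.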

I think one can teach SQL (and the relational model) to a non-developer in about 2 hours, because it is so declarative and intuitive. One day I'll go write that tutorial, as many clients need it sorely...

[1] e.g. http://www.amazon.com/SQL-Relational-Theory-Write-Accurate/d...

[2] e.g. http://www.amazon.com/The-Relational-Model-Database-Manageme... or the original paper: http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf

edit to add: on the E/RM vs the RM: http://www.dbdebunk.com/2013/09/entity-relatonship-model-not...

To really teach the relational model would take quite a bit more time. I would discuss database normalization (3NF/BCNF/4NF), query optimization, indexes, and foreign key constraints, and bridge set theory with the relational model. Optional parts would be triggers and other kinds of constraints.

Understanding the query planner is paramount to designing good schemas and requires insight into the underlying data structures (B-trees) and join methods.
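A quick way to see the planner react to an index is to ask it for its plan before and after creating one. This is only a sketch using Python's sqlite3 module and a hypothetical booking table; the exact wording of the plan text differs between SQLite versions, so treat the output as illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with enough rows to make the example meaningful.
conn.execute("CREATE TABLE booking (id INTEGER PRIMARY KEY, guest_id INTEGER, room TEXT)")
conn.executemany(
    "INSERT INTO booking (guest_id, room) VALUES (?, ?)",
    [(i % 100, str(i)) for i in range(1000)],
)


def plan(connection, sql):
    """Return the human-readable detail column of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT room FROM booking WHERE guest_id = 42"

plan_before = plan(conn, query)  # no index on guest_id yet: expect a full table scan
conn.execute("CREATE INDEX idx_booking_guest ON booking (guest_id)")
plan_after = plan(conn, query)   # the planner can now search the B-tree index

print(plan_before)
print(plan_after)
```

Same query, same data, different access path: that is the kind of thing a student should be able to predict from the schema.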

Then, for the student to get used to this way of thinking, I'd have them implement a simple project, e.g. a hotel-booking system.

This is highly confusing. For one, it's pretty out of date (tape as "tertiary storage"? 512 byte page size? the whole topic of concurrency control without one mention of MVCC?), but also, the actual explanation seems to mix up quite a few things.

For example: a "Write-Read Conflict" is just that, a conflict, which a database in some appropriate isolation mode would handle by appropriate locking in order to avoid "reading uncommitted data" (if the transactions were to happen concurrently--per the definition given, operations don't need to happen in concurrent transactions for them to be in conflict). Or, an actual DBMS with MVCC and SSI, like a modern Postgres, would simply execute the read on its snapshot and thus force the reading transaction to precede the writing transaction in any equivalent serialized order (even though the read happened after the write in real time), and only abort the transaction if that could lead to contradictions (cycles) in the dependency graph.
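To illustrate what "cycles in the dependency graph" means, here is a sketch of the textbook conflict-serializability test: build a precedence graph from a schedule and look for a cycle. This is not how Postgres implements SSI (it tracks a cheaper approximation rather than doing full cycle detection); the schedule encoding and function names are my own assumptions for the example.

```python
from collections import defaultdict


def precedence_graph(schedule):
    """Add an edge Ti -> Tj when an operation of Ti precedes a conflicting one of Tj.

    Two operations conflict if they come from different transactions, touch the
    same item, and at least one of them is a write.
    """
    edges = defaultdict(set)
    for i, (t1, op1, item1) in enumerate(schedule):
        for t2, op2, item2 in schedule[i + 1:]:
            if t1 != t2 and item1 == item2 and "W" in (op1, op2):
                edges[t1].add(t2)
    return edges


def has_cycle(edges):
    """Depth-first search with three colors; reaching a grey node means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every node starts WHITE

    def visit(node):
        color[node] = GREY
        for nxt in edges.get(node, ()):
            if color[nxt] == GREY:
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(edges))


# Each step is (transaction, operation, item); "R" = read, "W" = write.
ok_schedule = [("T1", "W", "x"), ("T2", "R", "x"), ("T2", "W", "y")]
bad_schedule = [("T1", "R", "x"), ("T2", "W", "x"), ("T2", "W", "y"), ("T1", "W", "y")]

ok_is_serializable = not has_cycle(precedence_graph(ok_schedule))
bad_is_serializable = not has_cycle(precedence_graph(bad_schedule))
print(ok_is_serializable, bad_is_serializable)
```

In the second schedule T1 must come before T2 (because of x) and T2 before T1 (because of y), so no serial order exists and one of the two has to be aborted.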

Well done. But I have to note, the chapter "Schema Refinement - Functional Dependencies" is an example of what drives many students out of CS. Even so, this is one of the better introductions to functional dependencies that I've read.
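For anyone who hasn't met the topic: a functional dependency X -> Y says that rows agreeing on X must agree on Y, and most of the chapter boils down to computing which attributes a given set determines. A minimal sketch of that attribute-closure computation, using a made-up booking relation (the attribute names are hypothetical, not from the article):

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under the given FDs.

    `fds` is a list of (lhs, rhs) pairs of attribute sets, read as lhs -> rhs.
    """
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know all of lhs, we also know all of rhs.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Hypothetical relation booking(room, date, guest, guest_email).
fds = [
    ({"room", "date"}, {"guest"}),   # a room on a given date has one guest
    ({"guest"}, {"guest_email"}),    # a guest has one email address
]

key_closure = closure({"room", "date"}, fds)
guest_closure = closure({"guest"}, fds)
print(sorted(key_closure))
print(sorted(guest_closure))
```

Here {room, date} determines every attribute, so it is a candidate key, while guest alone determines only guest_email. That second dependency is the kind of thing normalization would split into its own table.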

If they can't stand theory, they shouldn't be trying to learn theoretical database foundations.

I agree that, at introduction, material should be made as accessible to students as possible. At the same time, we shouldn't dumb down the field for more advanced students.

It's a topic that you'll encounter halfway through a graduate-level textbook on databases like Elmasri's Fundamentals of Database Systems. There's no gentle way to do it, and a student has probably been driven away, or not, well before they read about it.

We covered functional dependencies (in the same level of detail that's provided in the OP's link) in the first half of my undergraduate databases course at UIUC last year. It was painful to say the least.

I know what you mean (I think this concept put me to sleep in class) but on the other hand, it doesn't really have to be. I think with these sorts of concepts teaching technique is terribly important.

blueatlas cited the bit on functional dependencies as "an example of what drives many students out of CS", but I'd say that more than anything it's an example of a tricky topic that ought to be presented by a highly engaging, smart instructor instead of a boring one who may or may not understand the material very well. (In retrospect, my feelings and his aren't mutually exclusive.)

This is great, I really enjoyed reading your explanation of B+ trees! Could you add in a small section regarding how to delete a node?

There are a lot of typos! Run through with a spell check.

A needed read. Thanks!

Just curious: why Python 2.7 over Python 3.x?

Unless I am missing something, this is just for relational databases.

The term database can encompass pretty much everything from CSV files to in memory distributed grids.
