
Database Schema Refinement: Functional Dependencies - sandcrain
http://blog.dancrisan.com/intro-to-database-systems-schema-refinement-functional-dependencies
======
acscott
Just feedback if you want to improve the discussion.

1) The example rows use first name and last initial for Client primary key
attribute (e.g James H.). No one uses this in the real world for a name-based
primary key. In fact, in a sufficiently large enough system, you would _never_
use a name for a primary key since names are not guaranteed to be unique.

2) Functional dependencies are interesting ideas. The discussion says that
studID (student id) determines address. It would be beneficial to further
discuss how we know that. For instance, more than one student could live at
the same address. And a student could live at more than one address. So does
studID truly determine address? Does the dependent attribute have to be
unique? Does the determinant set?

3) The discussion moves from Data(studID, studName, ...) to breaking it out to
three entities and says "The following structure is considerably better." But
what criteria make it better?

4) It states that "Each student has his own address", but in the real-world
that's not _always_ true. There will be cases when a student does not have an
address, or the address is still unknown. Also, mailing address and physical
addresses can be two different attributes. I'm being picky, right? But these
things come up all of the time (so much so it gives me heartburn). So using
good examples will help prevent future bad designs I believe.

5) courseID is supposedly functionally dependent on courseName. Again,
courseNames can be non-unique in a real school. And courseID domain and range
would be important to know whether courseID is unique. For instance there are
many 101 courses out there. So is this a sequentially generated ID, a GUID?
Only then can we know if it's truly unique.

6) Now to the really good stuff. Functional dependencies are abstractions that
exist outside of the database logical and physical design and implementation
correct? What if we identified all of the functional dependencies in our
entities? Could we generate a good table design automatically from that? How
do we prevent from modeling the entire universe and how do we prove that we've
over-modeled? (I think one of the stated benefits of relation models is that
once, you get to 3NF you can expand the model as needed. So how do we use FDs
to determine whether we are 3NF)?

Interesting reduction of FDs to mathematical concepts, but to make it useful
to a practitioner, more to go.

~~~
moron4hire
I'd like to add that there should also be a consideration for names that
change over time. People get married and divorced, or courses might change
their name but not their number, or maybe there was just a data entry error
but we need to keep record that the error was in place at that certain time
(if you've ever had to deal with Sarbanes-Oxley compliant systems, you'll know
what I mean).

There are physical entities in the world and what we call them is almost
always completely arbitrary. So almost all databases that deal with real
people and things over time need to be Temporal Databases[0].

It can be a pain in the butt to have to emulate a TRDBMS on a RDBMS, but if
you get the design down right it will also _save_ your butt way down the road.

[0]
[http://en.wikipedia.org/wiki/Temporal_database](http://en.wikipedia.org/wiki/Temporal_database)

------
moron4hire
I've never heard this called "refinement" before. The term I've always used
was normalization[0]. I'm rather surprised that there is no mention of
normalization at all in the article. There are several "normal forms" that are
quite well known and understood in relational databases.

Also, I've never heard this called "functional dependencies". The term I've
always used is "relational algebra"[1]. For lay-people, the set-theory
notation might be a bit off-putting, but take the time to learn the language
and the concepts become a lot more intuitive.

[0]
[http://en.wikipedia.org/wiki/Database_normalization](http://en.wikipedia.org/wiki/Database_normalization)

[1]
[http://en.wikipedia.org/wiki/Relational_algebra](http://en.wikipedia.org/wiki/Relational_algebra)

~~~
peterfirefly
C.J. Date uses the term "functional dependencies" in the section about various
normal forms (at least he did in the second edition of "Introduction to
Database Systems" from 1977). In fact, they are part of the definition of some
of the normal forms.

~~~
moron4hire
I guess my point is that it's just a little weird to see a discussion of
relational algebra without actually using the word "relational algebra".

------
CHY872
Good description. It might be important to point out that the decision has to
be made carefully; whilst normalisation typically improves write performance,
a normalised database can absolutely munch _read_ performance, especially when
it comes to things like aggregations.

Furthermore, there are some redundancies that you can't eliminate without also
necessarily losing the ability to represent functional dependencies, which
kinda buggers that one.

------
walt_dinio
Am I missing something, but isn't this normalizing the database...

