I found this approach deeply troubling because it moves us away from semantic design towards logic design... which always runs into problems when the database itself becomes semantic content. The write-up was rather confusing.
The solution seems straightforward: a single table that captures the meaning expressed by the separate VIEWS and DOWNLOADS tables, e.g. USERACTION (USER_ID, ITEM_ID, ACTIONTYPE), where ACTIONTYPE is a value like V for view and D for download. Of course, that solution is hard to see because it's a synthesis of meanings occurring at different levels and not the product of predicate logic.
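For anyone who wants to see the single-table idea concretely, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names come from the comment above; the composite key, the CHECK constraint, and the sample rows are assumptions made only for illustration (in particular, this key records at most one view and one download per user and item).

```python
import sqlite3

# Hypothetical schema: names follow the comment above; the key choice and
# the CHECK constraint are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE USERACTION (
        USER_ID    INTEGER NOT NULL,
        ITEM_ID    INTEGER NOT NULL,
        ACTIONTYPE TEXT    NOT NULL CHECK (ACTIONTYPE IN ('V', 'D')),
        PRIMARY KEY (USER_ID, ITEM_ID, ACTIONTYPE)
    )
""")

# Made-up sample rows.
conn.executemany(
    "INSERT INTO USERACTION VALUES (?, ?, ?)",
    [(1, 10, "V"), (1, 10, "D"), (2, 10, "V")],
)

# The former VIEWS and DOWNLOADS tables become restrictions on ACTIONTYPE.
views = conn.execute(
    "SELECT USER_ID, ITEM_ID FROM USERACTION "
    "WHERE ACTIONTYPE = 'V' ORDER BY USER_ID"
).fetchall()
downloads = conn.execute(
    "SELECT USER_ID, ITEM_ID FROM USERACTION "
    "WHERE ACTIONTYPE = 'D' ORDER BY USER_ID"
).fetchall()

# An unknown action type is rejected by the DBMS, not by application code.
try:
    conn.execute("INSERT INTO USERACTION VALUES (3, 10, 'X')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Whether ACTIONTYPE should be a CHECK-constrained code like this or something richer depends on the requirements; the sketch only shows that the two original tables can be recovered as restrictions of one.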
Database design IS logic. That's the point of the RDM: to formalize and symbolize semantics such that the DBMS can enforce integrity on and manipulate data, such that logical and semantic correctness is guaranteed. Leave that to users in apps at your peril. We used to do this before the RDM and the whole shebang collapsed. And we're still doing it because practitioners know nothing beyond SQL and coding.
> We used to do this before the RDM and the whole shebang collapsed.
No, it didn't, and relational-theory purists aren't going to sell their ideas to practitioners in the real world by pretending that it did. The RDM certainly offers all kinds of abstract benefits, which practitioners often do not fully understand or leverage, and there is a very real problem when the failure to leverage them stems from a failure to understand them (rather than from weighing practical costs and benefits in the particular use case).
OTOH, the reason that things built on the relational model took off in practice wasn't that non-relational systems had reached a point of catastrophic logical failure that led to their rejection, but that the relational model mapped readily to implementations that were convenient and efficient in the technology of the day (particularly, hard disk storage), combined with some of the structural improvements over other approaches being particularly attractive for important application domains.
> And we're still doing it because practitioners know nothing beyond SQL and coding.
Yeah, look, we're probably never going to have a time when most practitioners are deep theoreticians rather than expert tool users, and if you want to sell practitioners on deeper consideration of the underlying theoretical models, you're going to need to make explanations of the practical benefits much more accessible than you have in the source article or your comments in this thread (and you're going to need to be a lot less personally abusive).
Right, there is never time to do it right and lots of time to do it over.
I do not think that your reading of the history of the field is anywhere close to reality. I do suggest that you read my comments as carefully as I write them: I did not say practitioners ought to be theoreticians, I said they should not engage in a field founded on logic without ANY intro to logic. Big difference.
In fact, the initial mapping to implementation--direct image SQL implementations--was not in the relational spirit at all and is in large part responsible for logical-physical confusion and confusion of tables with relations. And to call those initial implementations efficient in the technology of the day is from another planet. IBM would not budge on implementing the RDM until Oracle forced it.
It just so happened that even the limited relational fidelity of SQL proved superior to the rigidity, complexity and lack of soundness of hierarchic and network technologies. Have you ever seen IMS or Codasyl code?
Listen, I have done nothing but exactly making the practical implications of the theory for the last 40 years. I suggest you read my stuff and tell me exactly what is wrong with it.
The problem is lack of fundamental education which has been replaced by tool training. Practitioners are not even aware that there is something beyond experience with tools that they need to know.
> Right, there is never time to do it right and lots of time to do it over.
Purists like to sneer when they say this, but in point of fact it's often true: it's often more efficient to do things good enough for now and fix the things that turn out to need fixing later (because the real pace of change often means the things that are broken-in-theory, but good-enough-in-practice are going to need to be completely replaced because of requirements changes before they become problematic in practice).
But it's true, OTOH, that lack of knowledge of relational theory, the anomalies that it identifies that are tied to improper data models, and the practical impact of these can lead to poor analysis of the tradeoffs, and that the common cargo-cult rules of thumb (say, for degree of normalization to pursue) that are frequently used in practice are poor substitutes for deep understanding of the relational model and the problem of concern, and analysis of the real risks in the system under design of taking shortcuts.
> I do suggest that you read my comments as carefully as I write them
One of the biggest problems with your comments is that they don't appear to be written carefully -- particularly, if you hope to influence the practitioners that you treat with such condescension, you have failed to put due care into consideration of your approach to the audience, which is a central element of any communication.
> I did not say practitioners ought to be theoreticians, I said they should not engage in a field founded on logic without ANY intro to logic.
You seem to also, however, keep suggesting that either a lack of deep familiarity with relational theory or a disagreement with your interpretation of how that theory ought to shape practice are equivalent to (or can only be a result of) a complete lack of grounding in logic. Whether you are actually conflating these things or just engaging in particularly obnoxious condescension and personal abuse when you do this is less than clear, but neither is helpful or useful.
> In fact, the initial mapping to implementation--direct image SQL implementations--was not in the relational spirit at all and is in large part responsible for logical-physical confusion and confusion of tables with relations.
I'm not sure what you mean by the "relational spirit". SQL's design was clearly shaped by the relational data model, though SQL itself (in its current form as well as its early forms) is certainly not ideal from a relational perspective, even before considering the whole NULL controversy.
> And to call those initial implementations efficient in the technology of the day is from another planet.
The initial implementations weren't what became popular though; the implementations that were efficient were key to driving popularity.
> It just so happened that even the limited relational fidelity of SQL proved superior to the rigidity, complexity and lack of soundness of hierarchic and network technologies.
Sure. I just think that it's easy for theory-purists to overstate the degree to which the "lack of soundness" was the problem driving adoption, rather than "rigidity and complexity". Insofar as the linguistic and expressive features of SQL and the relational model proved attractive, simplicity and flexibility were particularly important, and while the capacity for soundness is an important improvement, it's one that's been underused since day one. It's simply not the case, as you seem to present, that we've "fallen" to a state where that is ignored from some rosier days when that aspect of RDBMS capacity was strongly embraced and effectively and rigorously used by practitioners.
> Have you ever seen IMS or Codasyl code?
I've even had to write (well, modify) some IMS code, far more recently than I'd prefer to have.
> The problem is lack of fundamental education which has been replaced by tool training.
I don't think that's true. I think that the number of people with "fundamental education" in the field is probably greater than ever before. Sure, the number of people with tool training has increased faster, but that's not tool training replacing fundamental education, it's just that with any technology, the first generation of users will all (or, at least, disproportionately) be versed in the underlying principles because they are also the builders of the technology, but over time that's going to fade as, even with more people educated in the principles, people who are just pragmatic users of the technology with a more limited focus are going to grow at a faster rate.
> Practitioners are not even aware that there is something beyond experience with tools that they need to know.
Most practitioners I've encountered seem to be aware that relational theory exists. Sure, lots of them aren't well versed in it or what light it has to shine on their craft, but abstruse descriptions without clearly explained pragmatic benefits aren't an effective way to correct that and motivate them to dig more into theory, and neither is condescension and abuse.
> Purists like to sneer when they say this, but in point of fact it's often true: it's often more efficient to do things good enough for now and fix the things that turn out to need fixing later (because the real pace of change often means the things that are broken-in-theory, but good-enough-in-practice are going to need to be completely replaced because of requirements changes before they become problematic in practice).
Do you have stats that prove your point, or is it based on the fact that this is how it is usually done because that is what is possible given the poor level of education and knowledge in the industry? If you're not part of that, don't underestimate its size. I spent 40+ years demonstrating the ignorance and its consequences.
By the way what does "impure theory" mean?
> But it's true, OTOH, that lack of knowledge of relational theory, the anomalies that it identifies that are tied to improper data models, and the practical impact of these can lead to poor analysis of the tradeoffs, and that the common cargo-cult rules of thumb (say, for degree of normalization to pursue) that are frequently used in practice are poor substitutes for deep understanding of the relational model and the problem of concern, and analysis of the real risks in the system under design of taking shortcuts.
Glad we agree on something. My claim is that there's more to that than you seem to think.
> One of the biggest problems with your comments is that they don't appear to be written carefully -- particularly, if you hope to influence the practitioners that you treat with such condescension, you have failed to put due care into consideration of your approach to the audience, which is a central element of any communication.
The title of one of my books is "for the THINKING practitioner". He is the one I try to influence and there aren't that many.
It's not entirely their fault--it's how the industry and business in general operates. I can detect very easily the difference between a thinker who is uninformed and a non-thinker and I treat them differently. It's just that there are many more of the latter than the former.
> You seem to also, however, keep suggesting that either a lack of deep familiarity with relational theory or a disagreement with your interpretation of how that theory ought to shape practice are equivalent to (or can only be a result of) a complete lack of grounding in logic. Whether you are actually conflating these things or just engaging in particularly obnoxious condescension and personal abuse when you do this is less than clear, but neither is helpful or useful.
So according to you it's not possible to detect the difference between a poor argument due to ignorance of logic and one grounded in logic? Again, I have spent 4 decades doing this and please permit me to believe that I discern quite readily who should be treated with respect and who not. You're entitled to disagree.
> I'm not sure what you mean by the "relational spirit". SQL's design was clearly shaped by the relational data model, though SQL itself (in its current form as well as its early forms) is certainly not ideal from a relational perspective, even before considering the whole NULL controversy.
The authors of SQL did not have a good grasp of the RDM, which is why SQL cannot be considered truly relational. It violates too many relational principles. This according to Codd and Date who were both at IBM when SQL was developed. The specific spirit I was referring to is physical independence, a core objective of the RDM which a direct image implementation is not in the spirit of.
> The initial implementations weren't what became popular though; the implementations that were efficient were key to driving popularity.
Yes, but much of the reason SQL was slow to be made efficient was its poor relational fidelity. I've written a few articles on that subject.
> Sure. I just think that it's easy for theory-purists to overstate the degree to which the "lack of soundness" was the problem driving adoption, rather than "rigidity and complexity". Insofar as the linguistic and expressive features of SQL and the relational model proved attractive, simplicity and flexibility were particularly important, and while the capacity for soundness is an important improvement, it's one that's been underused since day one. It's simply not the case, as you seem to present, that we've "fallen" to a state where that is ignored from some rosier days when that aspect of RDBMS capacity was strongly embraced and effectively and rigorously used by practitioners.
I did not say soundness drove the adoption. In fact, there is no way this can happen given that a vast majority of practitioners have no clue of how a formal foundation for db mgmt is different and superior to non-formal ones. This is not different than the notion that data science is science. That's precisely why I keep stressing that substituting training for education killed the capacity to appreciate the difference.
> I've even had to write (well, modify) some IMS code, far more recently than I'd prefer to have.
My sympathy.
> I don't think that's true. I think that the number of people with "fundamental education" in the field is probably greater than ever before. Sure, the number of people with tool training has increased faster, but that's not tool training replacing fundamental education, it's just that with any technology, the first generation of users will all (or, at least, disproportionately) be versed in the underlying principles because they are also the builders of the technology, but over time that's going to fade as, even with more people educated in the principles, people who are just pragmatic users of the technology with a more limited focus are going to grow at a faster rate.
Well, having spent so many years documenting the deterioration of education, I believe my evidence rather than your perceptions.
> Most practitioners I've encountered seem to be aware that relational theory exists.
Very well put. That's about the gist of it. And whatever little they know about it is wrong.
> Sure, lots of them aren't well versed in it or what light it has to shine on their craft, but abstruse descriptions without clearly explained pragmatic benefits aren't an effective way to correct that and motivate them to dig more into theory, and neither is condescension and abuse.
They should not be motivated to do it on their own. There should be a basic level of education required to be inducted into the profession by faculty that are themselves proficient in the material and not industry hires that are teaching coding and tools because that's what universities teach now to be "relevant".
I hear you on the education thing. Intuitively, the "two tables with same structure" approach is redundant, but it's helpful to have the math/logic to unambiguously define what "redundant" is.
Having 2 tuple types with 3 values seems harmless. This explodes quickly when done with many columns or many tables. Having some kind of foreign key to act as a "discriminator" scales much better to constrain the number of value types (# of tables x # of columns) that must exist.
But I only say this because of having to deal with a legacy DB at work that violates the Hell out of this :-)
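To make the discriminator point concrete, here is a rough sketch, again with Python's sqlite3. The ACTIONTYPE lookup table, its CODE and LABEL columns, and the 'S' (share) action are all hypothetical, invented for the example. The point it tries to show is that a new kind of action becomes a new row in the lookup table rather than a new table with the same columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless it is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: ACTIONTYPE(CODE, LABEL) is an assumed lookup table.
conn.executescript("""
    CREATE TABLE ACTIONTYPE (
        CODE  TEXT PRIMARY KEY,
        LABEL TEXT NOT NULL
    );
    CREATE TABLE USERACTION (
        USER_ID    INTEGER NOT NULL,
        ITEM_ID    INTEGER NOT NULL,
        ACTIONTYPE TEXT    NOT NULL REFERENCES ACTIONTYPE (CODE),
        PRIMARY KEY (USER_ID, ITEM_ID, ACTIONTYPE)
    );
    INSERT INTO ACTIONTYPE VALUES ('V', 'view'), ('D', 'download');
""")

# Adding a new kind of action is a data change, not a schema change.
conn.execute("INSERT INTO ACTIONTYPE VALUES ('S', 'share')")
conn.execute("INSERT INTO USERACTION VALUES (1, 10, 'S')")

# A code that is not in the lookup table violates the foreign key.
try:
    conn.execute("INSERT INTO USERACTION VALUES (1, 10, 'X')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The number of tables stays fixed no matter how many action types exist.
table_count = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
).fetchone()[0]
```

With one table per action type, the schema would instead grow by a table (and a copy of every column) for each new action, which is the explosion described above.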
... which brings us full circle to the tragedy of a lack of education.
Correction: I meant to say I made the practical implications of theory accessible via articles, seminars, books.
The only way to claim they are not accessible is in the absence of basic foundation knowledge.