But why? Ok, no foreign key checking, so it’s fast, great. Why not include the foreign ID in the local table? What’s the purpose behind a global “relationships” table?
>Most relations are properly modeled as N:N anyways
I'm not sure that's true. Containment is a very common type of relationship.
>Corollary: no 1000 relationship tables to name & chase around the DB metadata.
That depends on whether relationships have associated attributes. If and when they do, you have to turn them into object types and create a table for them. And it's exactly the N:M relationships that tend to have associated information.
That's the conceptual problem with this approach in my view. Quite often, there isn't really a conceptual difference between relationship types and object/thing types. So you end up with a model that allows you to treat some relationships as data but not others.
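To make that concrete, roughly (hypothetical names, not the actual schema): a generic relationships row carries nothing but the link itself, and the moment a link needs attributes of its own, say a vote with a direction, it effectively becomes a thing and wants its own table anyway:

    -- A generic "relationships" table: any link between two things,
    -- with no attributes beyond the link itself.
    create table relationships (
        from_id   bigint not null,   -- id of the "owning" thing
        to_id     bigint not null,   -- id of the related thing
        relation  text   not null,   -- e.g. 'comment_on', 'voted_on'
        primary key (from_id, relation, to_id)
    );

    -- But a relationship with attributes (a vote with a direction and
    -- a timestamp) is really an object type and gets its own table.
    create table votes (
        user_id    bigint   not null,
        post_id    bigint   not null,
        direction  smallint not null,                    -- +1 / -1
        created_at timestamptz not null default now(),
        primary key (user_id, post_id)
    );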
>Neat approach IMHO.
It's a valid compromise I guess, depending heavily on the specifics of your schema. Ultimately, database optimisation is a very dirty business if you can't find a way to completely separate logical from physical aspects, which turns out to be a very hard problem to solve.
* low friction till things go wrong, and you're not even aware they went wrong because no constraint caught the relational mistake
* aaargh, we have trillion-row tables that contain everything, how do we scale this (you don't)
* aaargh, our tables' indices don't fit into memory, what do we do (you cry)
* no at-a-glance metrics like table sizes to get an idea of which relations are growing
* oh no, we have to write complex app code to re-implement, for relationships, everything a database gave us for free
Reddit used a similar system in PostgreSQL and hit all of these issues before moving to Cassandra. Even then, they still had to deal with the constraints of the original model.
Edit: I hadn't even noticed the article is about Reddit; it pretty much confirms what I said. It took them something like 2 years iirc to migrate to Cassandra, and the site was wholly unusable during that time. There is no doubt in my mind that if another Reddit-style site had existed during that time, it would have eaten their lunch, same as Reddit ate Digg's lunch (for non-technical reasons).
Furthermore, back then iirc they only really had something like 4 tables/relations: comment, post, user, vote. The 'faster development time' for adding new relations was completely moot; they spent more time developing and dealing with a complex db query wrapper and performance issues (caching in memcache all over the place) than on actual features.
The (app+query) code and performance for their nested comments were absolutely hilarious. It's something a well-designed table would have given them for free in one recursive CTE.
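For reference, this is the kind of query I mean; a sketch against a conventional comments table (columns are my guess, not Reddit's actual schema):

    -- Assumed conventional comments table (not Reddit's actual schema):
    create table comments (
        id        bigint primary key,
        post_id   bigint not null,
        parent_id bigint references comments (id),   -- null for top-level
        body      text   not null
    );

    -- One recursive CTE pulls the whole comment tree for a post
    -- in a single query.
    with recursive thread as (
        select id, parent_id, body, 0 as depth
        from comments
        where post_id = 42 and parent_id is null     -- top-level comments
        union all
        select c.id, c.parent_id, c.body, t.depth + 1
        from comments c
        join thread t on c.parent_id = t.id
    )
    select * from thread
    order by depth, id;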
It wasn't until they moved to Cassandra that they were able to let the db do that job and work on adding the shit ton of features that exist today.
> low friction till things go wrong, and you're not even aware they went wrong because no constraint caught the relational mistake
These are people problems, and usually a dev makes this mistake exactly once after being hired. Writing robust code isn't hard if the culture supports it.
> we have trillion-row tables that contain everything, how do we scale this (you don't)
You do, by sharding the table.
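For example, Postgres (11+) can hash-partition such a table declaratively; a rough sketch, and at larger scale you'd do the same split across physical servers in the application layer instead:

    -- Hash-partition the giant relationships table so no single
    -- table (or index) has to hold everything.
    create table relationships (
        from_id  bigint not null,
        to_id    bigint not null,
        relation text   not null,
        primary key (from_id, relation, to_id)
    ) partition by hash (from_id);

    create table relationships_p0 partition of relationships
        for values with (modulus 16, remainder 0);
    -- ...repeat for remainders 1..15, or shard across servers
    -- at the application layer.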
> no at-a-glance metrics like table sizes to get an idea of which relations are growing
select count(*) from relationships where relation = 'whatever'; is pretty simple.
> we have to write complex app code to re-implement, for relationships, everything a database gave us for free
These queries are no different from standard many-to-many queries; overall, the logic is no different from anything else. You also still have to do the same number of inserts as you would for many-to-many, but now the order isn't important. You can even do a quick consistency check before committing the transaction at the framework level.
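For example, fetching everything a user is subscribed to looks about the same either way (hypothetical table and relation names):

    -- Hypothetical schemas, just to make the comparison concrete.
    create table subscriptions (user_id bigint, subreddit_id bigint,
                                primary key (user_id, subreddit_id));
    create table relationships (from_id bigint, to_id bigint, relation text,
                                primary key (from_id, relation, to_id));

    -- Dedicated join table:
    select s.subreddit_id
    from subscriptions s
    where s.user_id = 123;

    -- Generic relationships table: same shape of query,
    -- just one extra predicate on the relation type.
    select r.to_id as subreddit_id
    from relationships r
    where r.from_id = 123 and r.relation = 'subscribed_to';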
Have you ever accidentally deleted a production table before? If so, I doubt you’d do it again if you are able to learn from your mistakes. No hostilities required. Hire good people who are smart and capable of learning.
Even where I work now (which has almost no table constraints at all, because there is no way to support them at our scale), we have tooling that prevents you from making dumb mistakes, plus alerting and backups if you manage to do it anyway. Having processes and tooling in place to deal with these kinds of issues is what I meant by "people problem." No constraint will keep a determined person from deleting the wrong thing (in fact, it might be worse if it cascades).
We had someone accidentally fight the "query nanny" once, thinking they were deleting a table in their dev environment... nope, they deleted a production table with hundreds of millions of rows. It took hours for the deletion to complete, while all the devs worked to remove the feature so we could restore from backup and later revert our hacked-out code. They still work there.
My point is, these are all "people problems" and no technology will save you from them. It might mitigate the issue, or slow people down enough that they realize what they are about to do ... but that is all you can really hope for. A constraint isn't necessarily required to provide that mitigation, but constraints do help when you can use them.
Thanks for clarifying, I had misinterpreted your statement.
I can see how, at huge scale, DB constraints can be impossible to implement. But they do come (almost) for free in terms of development effort, whereas what you describe probably took a long time to implement.
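For example, a foreign key is a single line of DDL and the database then enforces it forever, whereas the tooling, alerting, and backup path has to be built and maintained by hand (hypothetical tables):

    create table posts (
        id bigint primary key
    );

    -- One line ("references posts (id)") and the database rejects
    -- orphaned comments for free, with no app code to maintain.
    create table comments (
        id      bigint primary key,
        post_id bigint not null references posts (id),
        body    text   not null
    );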
I never said it did. Just that people tend to learn from their mistakes. Are there people who will make the same mistake in different ways over and over again? Yep (I’m one of them). It’s not an issue: I learn and get better at spotting the mistake when I review others’ code, and others point it out when reviewing mine. That’s why I said “exactly once”, since once you know what it looks like, you can spot it in a review.