But why? Ok, no foreign key checking, so it’s fast, great. Why not include the foreign ID in the local table? What’s the purpose behind a global “relationships” table?
>Most relations are properly modeled as N:N anyways
I'm not sure that's true. Containment is a very common type of relationship.
>Corollary: no 1000 relationship tables to name & chase around the DB metadata.
That depends on whether relationships have associated attributes. If and when they do, you have to turn them into object types and create a table for them. And it's exactly the N:M relationships that tend to have associated information.
That's the conceptual problem with this approach in my view. Quite often, there isn't really a conceptual difference between relationship types and object/thing types. So you end up with a model that allows you to treat some relationships as data but not others.
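To make that concrete, roughly (hypothetical names, not the actual schema): a generic relationships row carries nothing but the link itself, and the moment a link needs attributes of its own, say a vote with a direction, it effectively becomes a thing and wants its own table anyway:

    -- A generic "relationships" table: any link between two things,
    -- with no attributes beyond the link itself.
    create table relationships (
        from_id   bigint not null,   -- id of the "owning" thing
        to_id     bigint not null,   -- id of the related thing
        relation  text   not null,   -- e.g. 'comment_on', 'voted_on'
        primary key (from_id, relation, to_id)
    );

    -- But a relationship with attributes (a vote with a direction and
    -- a timestamp) is really an object type and gets its own table.
    create table votes (
        user_id    bigint   not null,
        post_id    bigint   not null,
        direction  smallint not null,                    -- +1 / -1
        created_at timestamptz not null default now(),
        primary key (user_id, post_id)
    );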
>Neat approach IMHO.
It's a valid compromise I guess, depending heavily on the specifics of your schema. Ultimately, database optimisation is a very dirty business if you can't find a way to completely separate logical from physical aspects, which turns out to be a very hard problem to solve.
* low friction till things go wrong, and you're not even aware they went wrong because no constraint caught the relational mistake
* aaargh, we have trillion-row tables that contain everything, how do we scale this (you don't)
* aaargh, our tables' indices don't fit into memory, what do we do (you cry)
* no at-a-glance metrics like table sizes to get an idea of which relations are growing
* oh no, we have to write complex app code to re-implement, for relationships, everything a database gave us for free
Reddit used a similar system in PostgreSQL and hit all of these issues before moving to Cassandra. Even then, they still had to deal with the constraints of the original model.
Edit: I hadn't even noticed the article is about Reddit; it pretty much confirms what I said. It took them something like 2 years iirc to migrate to Cassandra, and the site was wholly unusable during that time. There is no doubt in my mind that if another Reddit-style site had existed during that time, it would have eaten their lunch, same as Reddit ate Digg's lunch (for non-technical reasons).
Furthermore, back then iirc they only really had something like 4 tables/relations: comment, post, user, vote. The 'faster development time' for adding new relations was completely moot; they spent more time developing and dealing with a complex db query wrapper and performance issues (caching in memcache all over the place) than on actual features.
The (app+query) code and performance for their nested comments were absolutely hilarious. It's something a well-designed table would have given them for free in one recursive CTE.
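For reference, this is the kind of query I mean; a sketch against a conventional comments table (columns are my guess, not Reddit's actual schema):

    -- Assumed conventional comments table (not Reddit's actual schema):
    create table comments (
        id        bigint primary key,
        post_id   bigint not null,
        parent_id bigint references comments (id),   -- null for top-level
        body      text   not null
    );

    -- One recursive CTE pulls the whole comment tree for a post
    -- in a single query.
    with recursive thread as (
        select id, parent_id, body, 0 as depth
        from comments
        where post_id = 42 and parent_id is null     -- top-level comments
        union all
        select c.id, c.parent_id, c.body, t.depth + 1
        from comments c
        join thread t on c.parent_id = t.id
    )
    select * from thread
    order by depth, id;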
It wasn't until they moved to Cassandra that they were able to let the db do that job and work on adding the shit ton of features that exist today.
> low friction till things go wrong, and you're not even aware they went wrong because no constraint caught the relational mistake
These are people problems, and usually a dev makes this mistake exactly once after being hired. Writing robust code isn't hard if the culture supports it.
> we have trillion-row tables that contain everything, how do we scale this (you don't)
You do, by sharding the table.
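For example, Postgres (11+) can hash-partition such a table declaratively; a rough sketch, and at larger scale you'd do the same split across physical servers in the application layer instead:

    -- Hash-partition the giant relationships table so no single
    -- table (or index) has to hold everything.
    create table relationships (
        from_id  bigint not null,
        to_id    bigint not null,
        relation text   not null,
        primary key (from_id, relation, to_id)
    ) partition by hash (from_id);

    create table relationships_p0 partition of relationships
        for values with (modulus 16, remainder 0);
    -- ...repeat for remainders 1..15, or shard across servers
    -- at the application layer.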
> no at-a-glance metrics like table sizes to get an idea of which relations are growing
select count(*) from relationships where relation = 'whatever'; is pretty simple.
> we have to write complex app code to re-implement, for relationships, everything a database gave us for free
These queries are no different from standard many-to-many queries; overall, the logic is no different from anything else. You also still have to do the same number of inserts as you would for many-to-many, but now the order isn't important. You can even do a quick consistency check before committing the transaction at the framework level.
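For example, fetching everything a user is subscribed to looks about the same either way (hypothetical table and relation names):

    -- Hypothetical schemas, just to make the comparison concrete.
    create table subscriptions (user_id bigint, subreddit_id bigint,
                                primary key (user_id, subreddit_id));
    create table relationships (from_id bigint, to_id bigint, relation text,
                                primary key (from_id, relation, to_id));

    -- Dedicated join table:
    select s.subreddit_id
    from subscriptions s
    where s.user_id = 123;

    -- Generic relationships table: same shape of query,
    -- just one extra predicate on the relation type.
    select r.to_id as subreddit_id
    from relationships r
    where r.from_id = 123 and r.relation = 'subscribed_to';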
Have you ever accidentally deleted a production table before? If so, I doubt you’d do it again if you are able to learn from your mistakes. No hostilities required. Hire good people who are smart and capable of learning.
Even where I work now (which has almost no table constraints at all, because there is no way to support them at our scale), we have tooling that prevents you from making dumb mistakes, plus alerting and backups if you manage to do it anyway. Having processes and tooling in place to deal with these kinds of issues is what I meant by "people problem." No constraint will keep a determined person from deleting the wrong thing (in fact, it might be worse if it cascades).
We had someone accidentally fight the "query nanny" once, thinking they were deleting a table in their dev environment... nope, they deleted a production table with hundreds of millions of rows. It took hours for the deletion to complete, while all the devs worked to remove the feature so we could restore from backup and later revert our hacked-out code. They still work there.
My point is, these are all "people problems" and no technology will save you from them. It might mitigate the issue, or slow people down enough that they realize what they are about to do ... but that is all you can really hope for. A constraint isn't necessarily required to provide that mitigation, but constraints do help when you can use them.
Thanks for clarifying, I had misinterpreted your statement.
I can see how, at huge scale, DB constraints can be impossible to implement. But they do come (almost) for free in terms of development effort, whereas what you describe probably took a long time to implement.
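For example, a foreign key is a single line of DDL and the database then enforces it forever, whereas the tooling, alerting, and backup path has to be built and maintained by hand (hypothetical tables):

    create table posts (
        id bigint primary key
    );

    -- One line ("references posts (id)") and the database rejects
    -- orphaned comments for free, with no app code to maintain.
    create table comments (
        id      bigint primary key,
        post_id bigint not null references posts (id),
        body    text   not null
    );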
I never said it did. Just that people tend to learn from their mistakes. Are there people who will make the same mistake in different ways over and over again? Yep (I’m one of them). It’s not an issue: I learn and get better at spotting the mistake when I review others’ code, and others point it out when reviewing mine. That’s why I said “exactly once”, since once you know what it looks like, you can spot it in a review.