It's not straightforward to do polymorphic joins: one common pattern is to have ...

roller · on June 27, 2022

A combined approach works if want to encode the exclusive constraint:

    CREATE TYPE school_type AS ENUM ('college', 'high_school');
    CREATE TABLE schools (
      id SERIAL PRIMARY KEY,
      type school_type,
      unique (id, type)
    );
    CREATE TABLE colleges (
      id INTEGER NOT NULL,
      type school_type default 'college',
      check (type='college'),
      foreign key (id, type) references school(id, type)
    );

Ya, the syntax is annoying and repetitive. It would be nice if foreign key could be a literal to remove the extra column altogether. e.g.:

    foreign key (id, 'college') references school(id, type)

vore · on June 27, 2022

Good point, I hadn't thought of that. Thanks!

BeefWellington · on June 28, 2022

As go_prodev indicated, the former just isn't how you reason about data modeled in a relational form.

The latter makes little sense. What constraints are you placing on a college that do not also apply to a highschool, given they're both schools?

> The other thing in the grandparent's comment that's a constant pain in SQL is representing an ordered list: how do you insert items into the middle of the list? Depending on your database, it can also be painful to renumber the other items.

I'm unclear on what you mean by this. If you want a list of records ordered in a certain fashion, there's an entire "ORDER BY" clause for that express purpose. If you're trying to add "extra" data into the middle of some list that is not otherwise represented in the data in the database, that's essentially business logic and you should be using some kind of custom view or procedure to do that or doing it inside your application code.

If it's just a question of how you add data into the middle of a resultset from actual data in a table based on some arbitrary ordering, you can do that too, people solved that problem ages ago by simply having an ORDER column or similar that's just an int with whatever likeliest precision you get, e.g.: default might be 1000 and then if need be you can insert 999 items between two others before needing to do a re-numbering on the column. These are dumb tricks but needing to "insert a record between two other records" is often also a dumb trick someone is trying to do in the database because they haven't designed things well elsewhere.

I'd venture the 99.9% case is querying real live data in the database and ordering it by factual things like record names, dates of update or creation, status, etc.

vore · on June 28, 2022

That's exactly it though: having to do dumb tricks is the painful part. There's plenty of things with user-defined order where an explicit index isn't surfaced, like to-do lists, playlists, etc.

BeefWellington · on June 28, 2022

A user-defined order is an explicit index.

EDIT: To expand further, I would generally model a playlist as its own tables anyways. Something like:

  CREATE TABLE playlist (
    id ID_TYPE_FOR_DB PRIMARY KEY,
    name varchar(4096), /* Or whatever storage size makes sense */
  )

  CREATE TABLE playlist_entry (
    id ID_TYPE_FOR_DB PRIMARY KEY,
    song_ref ID_TYPE_FOR_DB FOREIGN KEY REFERENCES songs(id),
    order INT, /* Or bigint or whatever you want */
  )

go_prodev · on June 27, 2022

> How can you ensure that a school is either a college or high_school but not both?

Do you have any real world scenarios where you've faced this problem?

In your example, you wouldn't model it like that. A school just needs an attribute that identifies the type of school (high school or college), and other attributes that would be common to both.

I'm sure there's lots of examples but it's late and I'm struggling to think of one that a good normalized data model couldn't handle.

vore · on June 28, 2022

I'm modeling the GP's comment, but I think a common case is something like a polymorphic user/organization entity like GitHub: there is some kind of base user that is usable in a lot of places (e.g. https://github.com/<username>) but there's also a load of distinct organization-specific fields and a load of distinct user-specific fields.

slaymaker1907 · on June 28, 2022

Not to mention actually using this schema ends up being really verbose since you have to do a bunch of joins. Additionally, besides being verbose, these joins can wreak havoc with the optimizer since join optimization is exponential. The optimizer might play nicely and just join all these on the ID column in whatever query you're doing, but that is very dependent on how the optimizer understands the input queries. Having a single table instead of 3 limits the number of ways the optimizer can think about a particular query.

layer8 · on June 28, 2022

> How can you ensure that a school is either a college or high_school but not both?

If the DBMS supports it, you can add check constraints that query the other tables. See for example here: https://stackoverflow.com/a/2588427

rq1 · on June 27, 2022

You add a xor non null check on the foreign keys?