Hacker News new | past | comments | ask | show | jobs | submit login
Ask HN: Constraint to enforce pairwise distinct values in two columns in table
1 point by prmph 44 days ago | hide | past | favorite | 4 comments
I'm kind of stumped by a seemingly simple problem in Db design, but the more I think about about it, the more it twists my mind.

Let's say I have columns C1 and C2 in a table. How do I enforce that the pairing of values of C1 and C2 is pairwise disticnt? That is, a specific composite value for C1 and C2, say X and Y, can appear any number of times in the table, but once X/Y appears, you cannot have X/Z, or W/Y.

It seems such a simple constraint to model, but several hours of research provides no clear solutions. I'm using PostgreSQL, but I guess the issue is db-agnostic. I explored partial indexes, but that feature seems too limited for this problem. Any ideas?

To put this in more context, my problem arises from trying to design a users table that maintains version history with the following columns:

- internal_id (auto-generated integer, primary key),

- external_id (varchar, not null),

- auth_id (varchar, not null, phone #, email address, etc),

- display_name (varchar),

- access_level (integer, not null),

- record_version_number (integer),

- record_version_created_by (varchar, not null),

- when_record_version_created (timestampz, not null),

- when_record_version_expired (timestampz),

- unique (external_id, record_version_number)

A record is valid from when_record_version_created to when_record_version_expired. This should basically work, but without enforcing the pairwise distinct relationship between external_id and auth_id, it would be possible to associate an active auth_id with more than one external_id, which would corrupt the data.




Perhaps doing less in the user table and adding an auth table?

Split out the internal_id and external_id from the auth_id and dates.

Of course, this assumes you are controlling the input of the value for the external_id column. In the example below, the external_id is populated by calling a function to get a cuid version 2 ID by default. You may have some other mechanism that gives you a unique external_id.

Here is an example (in Postgresql):

create table users_hn (internal_id int primary key generated always as identity, external_id text not null default cuid2(''::character varying, 12, 'hackernewsusersxmpl'::character varying), -- or whatever method to generate an external_id display_name text, access_level int not null );

create table users_hn_auth (users_hn_auth_id int generated always as identity, users_hn_internal_id int, auth_id text not null, start_date timestamptz check(start_date >= ((current_timestamp::date)::timestamp)), constraint fk_users_hn_internal_id foreign key (users_hn_internal_id) references users_hn(internal_id) );

/* Intial insert of new user. Only generate this user row once. / with insert_user_hn as ( insert into users_hn(display_name, access_level) values ('John Doe 1', 1) returning internal_id ) insert into users_hn_auth(users_hn_internal_id, auth_id, start_date) select insert_user_hn.internal_id, 'johndoe1@example.com', current_timestamp from insert_user_hn ;

/ Update user auth table with new data, for a specific user. All changes to auth_id and when that event occurs are recorded here. / insert into users_hn_auth(users_hn_internal_id, auth_id, start_date) --values (2, 'johndoe1@example.com', current_timestamp) --values (2, 'jdoe1@example.com', current_timestamp) --values (2, 'johnd1@example.com', current_timestamp) values (2, '(555) 555-1234', current_timestamp) ;

/ Get user info This query will need changes as more users are inserted into the table. A function with parameter of the internal_id to retrieve a specific user, for example.

*/ select u.internal_id, u.external_id, u.display_name, u.access_level, a.users_hn_auth_id, a.auth_id, a.start_date from users_hn u join users_hn_auth a on u.internal_id = a.users_hn_internal_id and a.start_date = (select au.start_date from users_hn_auth au group by au.users_hn_internal_id, au.start_date order by au.start_date desc limit 1 ) ;


Use before insert/update triggers? Which can do such queries and either fail or ignore if such row is found.


Thanks for the suggestion, but I'd really like to avoid triggers if at all possible. I dislike triggers a lot for various practical and philosophical reasons


You will likely end up with convoluted INSERT OR IGNORE INTO... WITH ...SELECT statement with WITH clause which first detects existing pairwise matching rows and if they are found, the SELECT generates values which purposedly violate some other unique constraint. If that's practically or philosophically more acceptable to you :)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: