E.g. The description of an item in a sales invoice is the description at the time the contract of sale is made, and must be immortalised as such as a copy of that description.
We rely strongly on this at my place of work, it really works! Essentially, you just need two tstzrange columns, representing: (1) when the row was a “valid” representation of the key, (2) when the row could have been used to conduct other “transactions”.
With a valid period and a transaction period, you have a history of the values of an object, and the ability to make non-destructive updates to that object.
It’s an essential component of any audit-worthy system, because it empowers you to trivially answer the question “what did our database think John’s 2018/01/01 address was on the day that we mailed him a check on 2017/12/15”?
Let's say that the present concept of John's address is "123 Apple St"; this was inserted into the database at 1/1/2020.
We don't have credible evidence for what John's address was prior to 1/1/2020.
Therefore, the row looks like this:
> (address:"123 Apple St", valid_period:[1/1/2020,+inf), transaction_period:[1/1/2020,+inf))
This is a case where transaction period and valid period are equal. If John sent us this information by mail, and he signed it 12/25/2019, we have credible evidence that this was John's address, at least effective 12/25/2019.
Therefore, his row would look like this: (address:"123 Apple St", valid_period:[12/25/2019,+inf), transaction_period:[1/1/2020,+inf)).
Now, the transaction period is still 1/1/2020, because this information found it's way into the database on 1/1/2020.
Now, let's continue with the second scenario. Let's say we get a second letter from John, processed on 6/1/2020, saying that he moved to "456 Orange St" on 5/1/2020. There are now two rows in the database, as below:
> (address:"123 Apple St", vp:[12/25/2019,5/1/2020), tp:[1/1/2020,+inf))
> (address:"456 Orange St", vp:[5/1/2020,+inf), tp:[6/1/2020,+inf))
Then, we receive a FINAL letter from John, processed on 7/1/2020, revealing that his previous letter contained a typo! it was "789 Orange St", not "456"! Darn. Our table now contains three rows:
> (address:"456 Orange St", vp:[5/1/2020,+inf), tp:[6/1/2020,7/1/2020))
> (address:"456 Orange St", vp:[5/1/2020,+inf), tp:[7/1/2020,+inf))
Let me know if you have any questions about this example!
- Does the validity period of the erroneous entry ever get closed?
- Or the transaction period of the one that got superseded?
- Do transaction periods close for reasons other than finding out something was wrong, or can I think of them as the period that the rest of the row (including validity) is/was believed?
Thanks for the detailed explanation. This is really interesting.
I had a typo in the last row, it should have been
> (“789 Orange St”, vp:(5/1/2020, +inf), tp:(7/1/2020, +inf))
- the valid period of the middle record is never closed, because it would be a misrepresentation of how the database’s perception of the address at that point in ”transaction time”.
- indeed, the superseded transaction period is closed.
- formally, transaction periods close (and new rows are created) whenever a column’s value changes. Think of it like an “updated at” time stamp, except the meaning is more like “canonical during”. The second half of your question is totally correct, the TP is the period during which the values representing the specified slice of valid time are/were believed.
As the third bullet point implies, one characteristic of this schema is that you have “non-destructive” updates: state is never lost, it’s just put into transaction history. This makes it possible to “roll back” to an earlier, known-good state: simply specify a point in transaction time.
There’s a whole additional rabbit hole to dive down: how to make this stuff fast and intuitive. People have mostly solved the fast part, but we’re still working on the intuition.
The intuition I'm building is that the validity period is metadata and the transaction period is metametadata. I think the similarity between them is a little deceptive, since the validity period is in some sense just another piece of data sort of controlled by (or interpreted in light of) the transaction period, which is itself a way to simulate versioning the whole database; I think of it like the internal representation for a database wrapper that lets you time travel in a database with a history-oblivious schema. In that vision, validity periods are actually part of the history-oblivious schema. You could totally have "transaction periods" for stuff with no concept of validity period, e.g. something timeless where our understanding evolves over time. If you're trying to teach people, I'd suggest introducing the two concepts independently, then showing the interplay with an example like the one you gave here.
I don't know if weird nested contexts can ever be really "intuitive". :D
1. create a "hole" in the current transaction space's valid space (through expirations of records in transaction time, valid time).
2. insert the new version at (transaction_period=(now, inf), valid_period=(effective_date, inf)).
It sounds complicated, and it sometimes is, but in practice most of the hard work (specifically around transaction periods) is performed for you by triggers. Usually you only have to think about valid periods, as a developer, which is easier to wrap your mind around.
The unfortunate thing is that there aren't a lot of very straightforward open source implementations of bitemporal triggers. The ones that exist are mostly designed to be theoretically sound and feature-complete, rather than usable.
At my company, we have our own implementation that has slowly grown over the years. It's about 500 lines of SQL triggers, and maybe 2k lines of library code in Python/Go to make the ergonomics a little bit easier on the developer.
If your InvoiceItem has no description, and only has a FK to Product to get the description you're going to have a bad time. Once you build a system around a wrong relationship like that, it's very hard to go back and fix the inevitable issue that updating a product changes old sales invoice records.
in practice, you have to support retroactive changes, so you have to have "middle" values for versions (i.e. 1.1 between 1 and 2).
People are usually interested in the timestamps associated with those versions (for which period of time is this version valid, when was this version created), so in practice it's easier to just keep track of the timestamps, and leave the versions implicit.
+1. I maintain a couple of systems that younger-me wrote 15+ years ago. If younger-me had understood this, I could have avoided much re-engineering :)