
MariaDB Temporal Data Tables - alecbenzer
https://mariadb.com/kb/en/temporal-data-tables/
======
sjwright
I've been begging for exactly this for quite some time. Because of the way I
use databases, I've always been bewildered why this wasn't a core part of SQL
from the very beginning.

From what I'm reading there's still a lot to be fleshed out to be maximally
useful to me, but even in its current state I could imagine using this.

— I'd like to have a field property that limits stored values to a single
version and thus is automatically cleared whenever the row is updated. This
would be useful for inlining change annotations, and for associating a user_id
to specific changes.

— I'd like to be able to arbitrarily select the n-1 value of fields regardless
of their time period. E.g.

    select username, previous(username)
    from users

— When viewing a specific version, I'd like to know whether a field's value
was supplied in that revision. That's distinct from if the field was changed.
I want to know if the value was supplied—even if it was identical to the
previous value.

— This might be possible already (it's hard to tell) but I'd like to be able
to query/join on any revision. For example I might want to ask the question
"show me all products that james has ever modified". That could then get more
specific, e.g. "show me all products where james changed the price".
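
For what it's worth, the n-1 case might already be approximable by combining
the versioned history with a window function — a sketch, untested, assuming a
system-versioned users table keyed on id and MariaDB's default ROW_START
column (previous_username is my own name):

    select id, username,
           lag(username) over (partition by id order by row_start)
             as previous_username
    from users for system_time all;

Though that returns one row per version rather than a single current row with
its predecessor inline, which is what I'd actually want.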

~~~
refset
> I've always been bewildered why this wasn't a core part of SQL from the very
> beginning.

It's a long and messy history (no pun intended), but essentially it was rarely
practical to consider retaining database history for the first few decades of
SQL, due to physical storage costs & limitations. Snodgrass and Jensen
proposed initial bitemporal extensions in the '90s, and a lot of research was
done
subsequently, but most vendors didn't make their move until the 2011 standard
was formed (Oracle Flashback being the most notable exception). Unfortunately
the rollout of the 2011 temporal standard has been underwhelming across the
board, as each vendor ended up implementing something subtly different, which
I think has massively hindered adoption. Since then I would guess that
"immutability" has been the largest driving force behind the resurgence of
interest.

~~~
sjwright
> it was rarely practical to consider retaining database history for the first
> few decades of SQL

That does make sense from a historical perspective and I don't doubt that's
why. But still I find it _unsatisfying_ because any competent database schema
will always retain the history that needs to be retained. If you don't have
the storage capacity, you choose to not store so much history. If you don't
have native concepts for storing history, you kludge it yourself.

Whether you have native temporal support or have to kludge a DIY solution in
the schema, the data you need to store gets stored.

My frustration is that I feel that temporal concepts should have been deeply
native to SQL right to its core. History should have been as fundamental to
database design as columns and rows. It should be a thing you turn off when
you don't want it, not a thing you turn on when you do.

~~~
refset
> temporal concepts should have been deeply native to SQL right to its core

Oh absolutely. I think the original intuition by Snodgrass et al. in TSQL2 to
model temporality outside of the actual relational structure was a more
promising direction, otherwise the complexity of composing joins across many
tables, each with independent temporal versioning, seems rather overwhelming
(note that the article doesn't discuss joins at all). Schema migration is
another hairy topic that the 2011 standard barely addresses.

Modelling temporality outside of the database is what we've been pushing ahead
with on Crux, which provides Datalog as the native query language but will
imminently also support a simple bitemporal flavour of SQL for point-in-time
querying (using Apache Calcite), where application & system time are defined
by a single "as-of" prefix at the beginning of the statement that applies
across the entire query: [https://github.com/juxt/crux/tree/master/crux-
sql#bitemporal...](https://github.com/juxt/crux/tree/master/crux-
sql#bitemporal-querying) (queries over ranges and periods are currently out-
of-scope)

------
docsapp_io
I really hope Postgres can support temporal tables out of the box. Temporal
tables can simplify development for features that need audit trails.

~~~
mildbyte
Shameless plug (I'm a co-founder) but this is basically what we've built with
Splitgraph[0]: we can add change tracking to tables using PostgreSQL's audit
triggers and let the user switch between different versions of the table /
query past versions.

[0] [https://www.splitgraph.com/product/data-
lifecycle/research](https://www.splitgraph.com/product/data-
lifecycle/research)

~~~
jacques_chester
Change tracking is not a fully bitemporal scheme, though. A bitemporal table
tracks _two_ timelines. One is about when facts in the world were true ("valid
time" or "application time"), the other is about the history of particular
records in the database ("transaction time" or "system time"). Change tracking
can only capture the second.
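
As a rough sketch of what carrying both timelines looks like (MariaDB
10.4-style syntax; the table, column and period names here are mine):

    create table prices (
      item_id    int,
      price      decimal(10,2),
      valid_from date,
      valid_to   date,
      period for valid_period (valid_from, valid_to)
    ) with system versioning;

System versioning captures transaction time automatically; the PERIOD columns
carry valid time, which the application has to supply itself.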

~~~
eyelidlessness
This was extremely confusing to me on first read but, since temporal data is
an area of interest to me for improving an existing service I maintain as well
as for future endeavors, it prompted me to go do a bunch of reading to
understand what was meant by the distinction. If others are reading this and
confused, maybe what I learned will help make the distinction clearer.

There’s two cases where valid/applicable time might be meaningful:

1. Future state known at present time (e.g. when you know now that a fact
will change at a specific future point in time).

2. Corrections to current state which should be applied to historical state
(e.g. when you know your application produced invalid state and you want to
produce a valid state both at present and in historical representations).

The first case is used more in the literature I found, but didn’t really make
the distinction clearer for me because I have limited domain use for that kind
of behavior. The correction case really drove the point home for me, because
my use cases would benefit from it considerably.

I hope this helps other readers interested in the topic but struggling to
visualize the two timelines and how they could be used.

~~~
jacques_chester
It _is_ hard to maintain the two timelines in one's mind at once. Most of the
time I maintain the dehydrated version that "you need both" and talk myself
through the details when I need them.

Typically what happens is that folks incorrectly capture parts of both
timelines. For "valid time", folks will have "created at", "deleted at",
"updated at" fields etc and assume this covers it. Unfortunately, it doesn't
really capture proper _spans_. If I have a fact that starts being true on
Tuesday, then add another one that starts being true on Thursday, can I deduce
that the first fact was true from Tuesday until Thursday?

No, I can't logically make that claim. It's possible that the first fact was
true on Tuesday only, leaving a gap on Wednesday. Without explicit valid time,
I can't find those kinds of holes, or apply constraints against them.

Similarly, folks rely on history tables, audit tables etc to try to capture
transaction time changes. These are still not enough, for the same reason. You
need the _span_ of time, the interval, to properly assert the time during
which the database _held_ a record to be true. When we discover that the
Thursday fact was wrong, we need to be able to know over what window we held
that to be true. Overwriting the record, or entering an update into a log etc,
is not _enough_ to truly reconstruct the history. You must be explicit about
the span.
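
To make the Tuesday/Wednesday example concrete: with explicit spans, the gap
is directly queryable (a sketch; the facts table and its valid_from/valid_to
columns are hypothetical):

    -- was anything true on the Wednesday in question?
    -- an empty result exposes the hole that created_at /
    -- updated_at columns cannot express
    select * from facts
    where valid_from <= '2021-06-02'
      and valid_to   >  '2021-06-02';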

The necessity of bitemporalism was an easy sell for me, because I had been
responsible
for either creating or maintaining multiple databases in which I could not
answer a wide range of questions after the march of time. I learned many of
the standard hacks (copy the price of a stocked item into the invoice line
item, or it will change beneath you! Use slowly changing dimensions! Have a
transactions table which creates a slow-as-hell copy of what the database does
internally, but not as well!). When I read Snodgrass's book it all went
_clunk_ and I've been a wild-haired proselyte ever since.

------
alecbenzer
Had no idea until recently that MariaDB supported this out of the box. Does
anyone have experience using this? How does it compare to
[https://github.com/scalegenius/pg_bitemporal](https://github.com/scalegenius/pg_bitemporal)
?

------
amluto
> mysqldump does not read historical rows from versioned tables, and so
> historical data will not be backed up. Also, a restore of the timestamps
> would not be possible as they cannot be defined by an insert/a user.

Given this caveat, this seems unusable for production systems.

~~~
crazygringo
Well, conceptually this makes sense for what mysqldump is.

I'm guessing that "backups" would actually have to be live replicas set up
from the start, and if the master fails, you convert a replica to master.

In addition, you could perform actual static backups by pausing a replica,
backing up the actual table files themselves, then resuming the replica (and
it will catch up). In case of total failure, you just dump the table files
into a fresh install of MariaDB. (Copying database files is a common technique
for migrating data, not just SQL command import/export.)

Is there any reason why these wouldn't work?

~~~
PixyMisa
Or ZFS snapshots, for example.

From the description it looks like it would be easy to do backups; it's just
that mysqldump is not currently aware of temporal tables.

Just use

    SELECT * FROM t FOR SYSTEM_TIME ALL;

and export it in an appropriate format.

~~~
crazygringo
That does, of course, export complete data, but the problem is you can't then
import it, because that breaks the entire guarantee about not being able to
edit past data.

~~~
PixyMisa
Ah. There is that.

They really need to create a workaround for that, because sooner or later you
_will_ need to migrate your data and right now it's simply impossible.

Edit: Particularly as, reading further, you cannot ALTER a temporal table.
Which does make sense, but the problem remains.

------
shivekkhurana
I’m very happy to see an open-source DB which can do something similar to
Datomic/Crux, but is not tied to Clojure. It doesn’t seem as sophisticated,
but I hope this project grows.

For anyone wondering why temporality matters and how this is different from
adding a “create_time” to each row, I would highly recommend watching Rich
Hickey’s talk titled “Value of Values”.

------
TekMol
Is there a diff tool? Like show me all differences between now and 5 minutes
ago?

Could be nice to see what magic goes on behind the scenes in some applications.

For example when you do some clicks in the backend of WordPress and wonder
what it actually did to the data.
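
I suppose something like this might already get close, if EXCEPT works
against versioned reads — totally untested:

    -- current rows not present five minutes ago
    -- (one direction of the diff)
    select * from wp_options
    except
    select * from wp_options
      for system_time as of (now() - interval 5 minute);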

------
crazygringo
This is fascinating. I've got two basic questions, however:

1) Is this always going to be performant with indices? It seems like "time" is
kind of like another index here, and when designing queries which indices are
used and in which order can be the difference between taking milliseconds and
taking an hour. It's not obvious to me whether this will have hidden gotchas
or query execution complexities, or if it's designed in a way that's so
restricted and integrated into indices themselves that query performance will
always remain within the same order of magnitude.

2) What is the advantage of building this into the database, instead of adding
your own timestamp columns e.g. 'created_timestamp' and 'expunged_timestamp'?
Not only does that seem relatively simple, but it gives you the flexibility of
creating indices across multiple columns (including them) for desired
performance, the ability to work with tools like mysqldump, and it's just
conceptually simpler to understand the database. And if the question is data
security, is there a real difference between a "security layer" that is built
around the database, versus one built into it? It would be fairly simple to
write a command-line tool to change the MariaDB data files directly, no?

~~~
baq
re 2) - it is a complex topic but in short, the queries get really complex
really fast for anything other than a simple select. see
[http://www2.cs.arizona.edu/~rts/tdbbook.pdf](http://www2.cs.arizona.edu/~rts/tdbbook.pdf).

also, DDL migrations become a nightmare.
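
a tiny illustration of how fast it grows: even a two-table as-of join has to
repeat the validity predicate once per table (hypothetical DIY schema with
created_at/expunged_at columns):

    select o.id, p.name
    from orders o
    join products p on p.id = o.product_id
    where o.created_at <= @as_of
      and (o.expunged_at is null or o.expunged_at > @as_of)
      and p.created_at <= @as_of
      and (p.expunged_at is null or p.expunged_at > @as_of);

with native support, all of that collapses into a single as-of clause.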

------
polskibus
How does this feature compare to MS SQL's Temporal Tables
[https://docs.microsoft.com/en-us/sql/relational-
databases/ta...](https://docs.microsoft.com/en-us/sql/relational-
databases/tables/temporal-tables?view=sql-server-ver15)?

This feature seems well suited to some of the cases where event
sourcing is introduced, I wonder if someone successfully applied event
sourcing with use of temporal tables to reduce the amount of work that has to
be done in the application code (Akka, etc.).

~~~
deleuze
When we looked at temporal tables in SQL Server for event sourcing, I was put
off by the fact that you have to read from multiple tables. CDC + some
external data source still seems to be the better solution here, imo.

~~~
polskibus
What do you use for your event sourcing? Do you use Akka/Akka.NET Persistence
or some other application framework?

~~~
deleuze
We use Debezium and Kafka, it's been really nice.

~~~
polskibus
Do you just process events for analytics, or do you actually rehydrate or
replicate application state?

------
Drdrdrq
I understand the benefits of this feature for audits, but how does one deal
with GDPR requirements? Is there some way to alter historical data to remove
PII, or should the affected columns be excluded?

~~~
satyrnein
Possibly the idea of "crypto-shredding" could apply, where the PII values are
encrypted and you throw away the key if you get a delete request.
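
A sketch of how that could be laid out (all the table and column names here
are made up): only ciphertext lives in the versioned table, and the keys live
in a separate table that is deliberately not versioned:

    create table users (
      id int primary key,
      email_ciphertext varbinary(256)
    ) with system versioning;

    create table pii_keys (
      user_id int primary key,
      enc_key varbinary(32)
    );  -- deliberately NOT system-versioned

    -- erasure request: dropping the key makes every historical
    -- ciphertext unrecoverable, without touching the history itself
    delete from pii_keys where user_id = 42;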

------
beckingz
MariaDB continues to be great.

Now all they need is materialized views and they'll be close to postgres.

------
Xlurker
TimescaleDB competitor?

~~~
grzm
TimescaleDB is for time series data. Temporal data tables are for “versioning”
data; for example, being able to query the state of a database as of a certain
time.

[https://en.wikipedia.org/wiki/Time_series_database](https://en.wikipedia.org/wiki/Time_series_database)

[https://en.wikipedia.org/wiki/Temporal_database](https://en.wikipedia.org/wiki/Temporal_database)

