
Ask HN: What do you know about bitemporal databases? - maest
I am in the proccess of building a system which is capable of showing data as of a past point in time. For that, I am building a set of bitemporal tables in a database (together with some adjacent binary storage systems).<p>What should I keep in mind when working with bitemporal systems?
======
jonahbenton
My personal opinion is this is difficult to do correctly and ergonomically in
SQL. One has to be clear on segregating the metamodel (the machinery that
supports as-of timekeeping) from the model (the actual data). What one winds
up wanting is a different sublanguage, an additional as-of syntax on a per
table basis, which kind of suggests that is not really the right solution
domain.

They are not SQL but Datomic[1] and Crux[2] are first class bitemporal
databases, look there for design guidelines and commentary.

1\. [https://datomic.com/](https://datomic.com/)

2\. [https://opencrux.com/](https://opencrux.com/)

~~~
j-pb
Isn't datomic 'just' a temporal DB? Through their transactor design there's
always a total order on events which removes the need for the whole "what did
you see when you created this" vs. "what would you have seen if you would have
had access to the entire distributed state of the system" shenanigans.

~~~
jonahbenton
It is both.

The transactor design ensures serialization level isolation for writes- but
semantically writes are considered "creating facts" and the time at which they
are written is the time at which they are "known" and one can ask questions
about the state of the world- the collection of facts- that were known as of a
particular time. And of course time is also a data type, so one can time based
questions on multiple levels.

The model is deeply general- facts are assertions that "this attribute of this
entity had this value at this time." One neat capability is that you can "go
back in time" and simulate applying transactions that did not occur,
supporting counterfactual and hypothetical analytics, and various kinds of
testing scenarios.

Attempting to carry through the implications of that model at the application
level in SQL is very hard.

~~~
j-pb
What you describe isn't a bitemporal database though, it's a temporal one.

Bitemporality afaik means that you get two views on your data, one that
describes state relative to individual writers (what you saw at write time in
a distributed and potentially inconsistent system) and what is actually true
considering global state (what if you had had absolute knowledge of the
distributed state at write time).

------
nobodywillobsrv
Has anyone managed to wrap CRUX from python? I want to keep all logic in
python ideally. Only thing keeping me away.

There is some other project crux that has a lot of python and this pollutes
the Google searches.

------
sharms
I have been experimenting with Crux
([https://opencrux.com](https://opencrux.com)) and it feels very easy to work
with. In past systems once they grew, I always needed to implement audit logs
/ tables / fields, and this solves that and other problems with respect to
having a transaction time, having certain data at a certain time, and being
able to update history without losing the history of what was updated.

I think many projects try to re-implement in a ad-hoc way, maybe without
knowing it, so if you can start with a bitemporal database it gives a big head
start.

------
mnm1
You may want to look into datomic or crux.

[https://github.com/juxt/crux](https://github.com/juxt/crux)
[https://www.datomic.com/](https://www.datomic.com/)

------
ripley12
This is one of my favourite topics. Most large DBs I've seen in the wild use
some form of temporal tables, but it's often a hacky hand-rolled solution. I'd
really like to see more support for bitemporal tables from vendors.

As noted above, the SQL:2011 standard supports bitemporality; here's a good
summary from SIGMOD Record:
[https://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfea...](https://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf)

This is a good summary of vendor support for temporal tables:
[https://www2.cs.arizona.edu/~rts/sql3.html](https://www2.cs.arizona.edu/~rts/sql3.html)

Oracle and DB2 both support bitemporal tables, SQL Server only supports
system-versioned tables. There is an extension for Postgres that supports
system-versioned tables, but it hasn't been updated in 2 years:
[https://github.com/arkhipov/temporal_tables](https://github.com/arkhipov/temporal_tables)

In the past, I've manually implemented system-versioning for audit purposes in
SQL Server using triggers, T4 templates and the CONTEXT_INFO variable to store
user IDs. It worked OK, but schema changes became much more difficult.

------
filmfact
Best to use a database with native bi-temporal support, a query planner can
get very confused about tables with bi-temporal columns and few constraints,
to the extent that it always fails to push the temporal conditions up through
the joins resulting in massive unnecessary materialization.

Also - start thinking about archiving now - n lg n is fast at the start - two
years later, not so much.

Good luck

~~~
dragonwriter
> Best to use a database with native bi-temporal support, a query planner can
> get very confused about tables with bi-temporal columns and few constraints

Why would they have few constraints, unless the DB lacks adequate support for
the kind of constraints that logically apply with temporal data (now,
admittedly, many databases _do_ lack that support, but Postgres via exclusion
constraints on ranges doesn't, and the functionality, while supporting bi-(or
tri- or quad-, or _N_ -, which once you start analyzing the temporal relations
that logically apply to data are much more common than one might think,
there's no magic universal model)temporal data, isn't limited to just that.

~~~
filmfact
Thanks - I hadn’t come across the exclusion ranges in postgresql before - I’d
need to take a look at how it works in practice.

------
pablobaz
For versioning numeric data complications include:

\- changing dimensions \- handling changing types e.g. A column of strings
becoming a compound of floats \- efficiently handling appends \- efficiently
handling corrections of previous data \- how to cleanup, remove old versions

The version store storage engine in Arctic provides fast and efficient
versioning for numeric data. The code is quite readable so it might be worth
having a look at its implementation too.

[https://github.com/man-
group/arctic/blob/master/README.md](https://github.com/man-
group/arctic/blob/master/README.md)

------
NovemberWhiskey
Not sure what language you're planning to use, but if you end up with Java and
want an ORM to layer on top of an enterprise-grade database, you could do a
lot worse than Reladomo:
[https://github.com/goldmansachs/reladomo](https://github.com/goldmansachs/reladomo)

The ORM has built-in support for various time-series data concepts including
bitemporality.

~~~
ripley12
That's quite cool, thanks. Documentation for the temporal chaining features in
case anyone else is curious: [https://goldmansachs.github.io/reladomo-
kata/reladomo-tour-d...](https://goldmansachs.github.io/reladomo-
kata/reladomo-tour-docs/tour-guide.html#N40556)

------
cdcarter
It sounds like you might only need temporal tables and not the whole
complication of _bi_temporal tables.

If you just need to get data at a point in time, you only need to store (so
called) application-time. If you need to handle both a “created date” and
“effective date” type scenario, then you’ll need bitemporality.

------
mrfox321
Are you trying to build a backtesting framework?

I've come up with a solution specifically for pandas dataframes. Not sure how
similar a solution would be for your use case.

~~~
touisteur
Hi, do you have a write-up somewhere or a github for this? Very interested.

------
adamnemecek
It's funny, I've never heard that term but looking it up, I find some
connections to some things I'm working on (that are not bitemporal databases).

What are some good resources?

A spreadsheet is a bitemporal database of sorts right?

~~~
rejschaap
I don't see how a spreadsheet is either bitemporal or a database. But you
could do some simple bitemporal modeling in a spreadsheet. I would imagine
querying to be extremely limited and cumbersome though.

~~~
nobodywillobsrv
A spreadsheet (with undo) is pretty much exactly a bitemporal database but it
only shows you the "latest" version. You can undo to go back in time but it is
not very convenient. THere are also DAGs and the code is data. Building this
at scale is what a lot of large orgs end up trying to do. Tricky bit is
probably the trade of between lazy eval (pull?) and streaming push through the
calcs. I think push is needed for low latency but am not 100%.

------
derision
Isn't this just temporal tables in SQL Server?

~~~
ripley12
SQL Server supports only half of the SQL:2011 temporal features – it does
system-versioned tables but not application-time tables. When I asked MS
engineers about it at Build 2018, there were no concrete plans to implement
application-time.

------
a_c
I use django for my day to day. I understand its "migration"
[https://docs.djangoproject.com/en/dev/topics/migrations](https://docs.djangoproject.com/en/dev/topics/migrations)
as a kind of bitemporal database. Would you mind providing more context about
your use case so maybe more concrete consideration can be discussed?

~~~
sixdimensional
Migration is about forward/reverse engineering of the data structure/content.
It isn't really what's meant by bitemporal databases, although I see why you
might think that. Temporal databases [1] are a specific area of study and
pattern.

[1]
[https://en.wikipedia.org/wiki/Temporal_database](https://en.wikipedia.org/wiki/Temporal_database)

------
sansnomme
[https://opencrux.com/](https://opencrux.com/)

------
babygoat
Is this different than a relational database with slowly changing dimensions?

------
mbrodersen
Check out "datomic".

------
j88439h84
There was mentat. Almost.

