
Immutability Changes Everything - ingve
http://queue.acm.org/detail.cfm?id=2884038
======
Scarbutt
At the database level, let's say I want to take this approach with
PostgreSQL: is using just one big table with no deletes or updates-in-place,
only appends, one way to go about this (assuming space is not a problem)? Or
is one better off looking at something like Datomic, which has the
immutability part already set up for you?
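
Concretely, I was picturing something like a single append-only events table
(just a sketch, names made up):

    -- Append-only event log: inserts only, no UPDATE or DELETE.
    CREATE TABLE events (
        event_id    bigserial   PRIMARY KEY,   -- total order of events
        occurred_at timestamptz NOT NULL DEFAULT now(),
        stream_id   uuid        NOT NULL,      -- which entity the event belongs to
        event_type  text        NOT NULL,      -- e.g. 'deposit', 'withdrawal'
        payload     jsonb       NOT NULL       -- event-specific data
    );

    INSERT INTO events (stream_id, event_type, payload)
    VALUES ('00000000-0000-0000-0000-000000000001', 'deposit', '{"amount": 100}');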

~~~
T-R
I've been working on a functional/event-sourced PostgreSQL database for the
past few years. The approach in general is great, but we're currently
investigating alternatives for the implementation (whether moving to
something like Datomic, or writing a DSL to generate the SQL). It's doable on
Postgres, particularly with the recent JSON support, but SQL is just too
low-level, so it's a lot of work:

\- Because you tend to be doing a lot of different aggregations over the same
or similar data sets, the things you most want to re-use, like "group by" and
"order by" clauses or a set of columns for projections or joins, are exactly
the things SQL as a language gives you no way to abstract over (there's a
sketch of this after the list).

\- A lot of what you want to do is a fold over the event stream, followed by a
map into some format, then ultimately filtering down to just the data you
want. That's an easy enough query to write, but to get reasonable performance
you need to control when the parts of that pipeline are evaluated and
materialize intermediate results: you almost never want to do an aggregation
over all of history on every read; you usually want to do at least the
aggregation on write, and maybe do the `map` lazily on read. In SQL, this
turns out to be a _significant_ amount of manual work/code for something
that, conceptually, should basically just be a strictness annotation. What
you'd really want, at the very least, are materialized views that can
automatically re-materialize on a per-row basis when their dependencies
change. The recent support for upsert makes this slightly less painful
(sketched after the list), but even so, it's a lot of work and a lot of room
for bugs and stale denormalized data.

\- For playback to be performant, your database needs to understand which
events, or parts of events, commute with each other, so it can replay them in
parallel, or at least optimize the query beyond running a volatile stored
procedure in a loop. There's no way to communicate this to Postgres, so you
have to model it explicitly yourself: you end up writing 'live' code (e.g.,
an imperative function that updates an account) that has to produce exactly
the same results as separate 'playback' code (e.g., a materializable query
that rebuilds the state of the whole accounts table from all user-related
events). Both paths are sketched after the list.
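
To make the abstraction point concrete, here's the sort of repetition you
can't factor out, assuming a generic events(stream_id, event_type, payload,
occurred_at) log (names are purely illustrative). The grouping keys and
projection list get restated in every query that needs them:

    -- Same grouping keys and projection, repeated verbatim in each report.
    SELECT stream_id,
           date_trunc('month', occurred_at) AS month,
           sum((payload->>'amount')::numeric) AS total_deposited
    FROM events
    WHERE event_type = 'deposit'
    GROUP BY stream_id, date_trunc('month', occurred_at);

    SELECT stream_id,
           date_trunc('month', occurred_at) AS month,
           count(*) AS deposit_count
    FROM events
    WHERE event_type = 'deposit'
    GROUP BY stream_id, date_trunc('month', occurred_at);

    -- A view can name one whole query, but SQL has no way to name and reuse
    -- "this GROUP BY / this column list" across different queries.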
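
For the aggregate-on-write point, the upsert workaround looks roughly like
this (again, table and column names are made up, and it assumes Postgres 9.5+
for ON CONFLICT). The catch is that you write and maintain this by hand for
every derived table:

    -- Derived state, maintained manually on every write instead of on read.
    CREATE TABLE account_balances (
        account_id uuid    PRIMARY KEY,
        balance    numeric NOT NULL
    );

    -- On each new deposit event, fold it into the running aggregate.
    INSERT INTO account_balances (account_id, balance)
    VALUES ('00000000-0000-0000-0000-000000000001', 100)
    ON CONFLICT (account_id)
    DO UPDATE SET balance = account_balances.balance + EXCLUDED.balance;

    -- Reads then only do the cheap "map"/filter part lazily:
    SELECT account_id, balance FROM account_balances WHERE balance < 0;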
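
And for the live-vs-playback point, here's a sketch of the two code paths
that have to stay in lockstep (PL/pgSQL, hypothetical names, reusing the
account_balances and events tables from the sketches above). Nothing in
Postgres checks that they agree:

    -- 'Live' path: imperative update applied as each deposit event arrives.
    CREATE FUNCTION apply_deposit(p_account uuid, p_amount numeric)
    RETURNS void AS $$
    BEGIN
        INSERT INTO account_balances (account_id, balance)
        VALUES (p_account, p_amount)
        ON CONFLICT (account_id)
        DO UPDATE SET balance = account_balances.balance + EXCLUDED.balance;
    END;
    $$ LANGUAGE plpgsql;

    -- 'Playback' path: rebuild the same state from the full event history
    -- (run against an empty account_balances). It must produce exactly the
    -- same rows as applying apply_deposit() to every event, one by one.
    INSERT INTO account_balances (account_id, balance)
    SELECT stream_id, sum((payload->>'amount')::numeric)
    FROM events
    WHERE event_type = 'deposit'
    GROUP BY stream_id;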

To generalize, there's just no way to tell the database what constraints
you're adhering to, so it can't do any of the heavy lifting for you, whether
by generating code or by optimizing queries, and it won't do anything to make
sure you don't break those constraints, either. Something that does know how
you're using it should, at least in theory, be able to do all of this work
for you and a whole lot more.

------
macintux
Pat Helland gave a talk on these ideas at the first RICON:
[https://vimeo.com/52831373](https://vimeo.com/52831373)

------
dang
[https://news.ycombinator.com/item?id=8955130](https://news.ycombinator.com/item?id=8955130)

