
Immutable Databases - adlrocha
https://adlrocha.substack.com/p/adlrocha-immutable-databases
======
yogthos
Crux [https://opencrux.com](https://opencrux.com) and Datomic
[https://www.datomic.com](https://www.datomic.com) are worth mentioning as
well. Both are able to leverage existing relational databases like Postgres
internally.

~~~
dzonga
Looked at those a while back. They seem to be available only for Clojure.

~~~
kot-behemoth
Juxt were very vocal about making Crux available beyond Clojure (can't find
the exact quote right now, unfortunately). Perhaps the last time you checked
it, any non-Clojure support was still WIP. However, since then they definitely
have a Java API available (Javadoc here:
[https://crux-doc.s3.eu-west-2.amazonaws.com/crux-javadoc/20....](https://crux-doc.s3.eu-west-2.amazonaws.com/crux-javadoc/20.05-1.8.4-alpha/crux/api/package-summary.html)),
as well as a REST API (see
[https://opencrux.com/docs#restapi](https://opencrux.com/docs#restapi)).

~~~
refset
That's right, we're certainly keen to help all JVM users who wish to embed
Crux, and for users sitting outside the JVM we have significantly more
comprehensive JSON support and SQL queries in the works too.

Our upcoming SQL module is based on Apache Calcite which does a lot of heavy
lifting and compiles SQL joins to Crux's native Datalog. For the curious:
[https://github.com/juxt/crux/blob/0f7d9c66db952a65efb4cba7e3...](https://github.com/juxt/crux/blob/0f7d9c66db952a65efb4cba7e3f797ef19db6132/crux-sql/test/crux/calcite_test.clj)

------
scottfr
I was reading BigQuery the Definitive Guide last week and I was surprised to
learn that BigQuery is actually immutable at its core even when using DML
statements.

This is surfaced in the query language and you can query the table as it
existed in the past using the syntax:

    
    
      SELECT * FROM
        table_name
      FOR SYSTEM_TIME AS OF
        TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -5 DAY)
    

The tables get rewritten periodically squashing this change history so you can
only access states for the past week.

~~~
jacques_chester
What you're seeing is an SQL:2011 thing -- system-versioned tables. It's from
a more general concept of bitemporal databases.

It's not quite the same. A cryptographic chain or Merkle tree can prove that
the history has gaps which appeared since the original recording was made. A
system-time table's safety relies on the guarantees of the implementing
database.
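
To make the difference concrete, here is a minimal Python sketch of a Merkle inclusion check. It is an illustration only, not how any particular database does it: the function names (`build_levels`, `inclusion_proof`, `verify_inclusion`), the leaf/node prefix bytes, and the rule of promoting an unpaired node are all assumptions made for the example, and it expects at least one row. A client holding a previously published root can check a row against it without trusting the database's own bookkeeping.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    # Prefix bytes keep leaf and interior hashes distinct (assumed convention).
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_levels(leaves):
    """Return every level of the tree, from leaf hashes up to the root."""
    level = [leaf_hash(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(node_hash(level[i], level[i + 1]))
            else:
                nxt.append(level[i])  # unpaired node is promoted unchanged
        level = nxt
        levels.append(level)
    return levels


def inclusion_proof(levels, index):
    """Sibling hashes needed to recompute the root from leaf `index`."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            # Each step records (sibling hash, whether the sibling is on the left).
            proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify_inclusion(data, proof, root):
    """Recompute the root from one row plus its proof and compare."""
    digest = leaf_hash(data)
    for sibling, sibling_is_left in proof:
        if sibling_is_left:
            digest = node_hash(sibling, digest)
        else:
            digest = node_hash(digest, sibling)
    return digest == root
```

The root is `build_levels(rows)[-1][0]`. If any row is edited after the root was published, the recomputed root no longer matches, which is the guarantee a plain system-versioned table cannot give on its own.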

------
philips
Relatedly: Go’s module proxy uses a similar sort of Merkle tree and rsc has an
open source sqlite backed version of that service up on GitHub:
[https://github.com/rsc/tlogdb](https://github.com/rsc/tlogdb)

I built a curl/wget like application recently that does URL binary
transparency and is backed by tlog.
[https://github.com/transparencylog/btget](https://github.com/transparencylog/btget)

It is a prototype but some folks might find it interesting.

I think the critical thing for these sorts of immutable databases is having
lots of clients with long-lived caches of the proofs to keep the DBs
accountable.

~~~
ithkuil
I made the same thing but (ab)using the Go module proxy's own transparent log
infrastructure:

[https://github.com/mkmik/getsum](https://github.com/mkmik/getsum)

~~~
philips
I built it initially on top of certificate transparency in a similar vein.
However, it really wasn't a good experience for users, and it was hard to
onboard projects since lots of folks have different formats and URL structures.

[https://github.com/merklecounty/rget](https://github.com/merklecounty/rget)

------
ashtonkem
I believe that immutable databases sans crypto have quite a large amount of
utility for _some_ use cases, since you can use such a database to give you
change history for free, separate from your domain schema.
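
As a rough illustration of "change history for free", here is a minimal Python sketch of an append-only store. Everything in it (`AppendOnlyStore`, `Fact`, the integer transaction counter) is hypothetical and made up for the example; it is not the API of Datomic or any other product. Writes only ever append, so reading "as of" an earlier transaction is just a matter of ignoring later entries.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Fact:
    tx: int      # monotonically increasing transaction number
    key: str
    value: Any


class AppendOnlyStore:
    def __init__(self):
        self._log = []  # facts are only ever appended, never updated or removed

    def put(self, key: str, value: Any) -> int:
        """Record a new value for key and return its transaction number."""
        tx = len(self._log) + 1
        self._log.append(Fact(tx, key, value))
        return tx

    def get(self, key: str, as_of: Optional[int] = None) -> Any:
        """Latest value for key, optionally as of transaction `as_of`."""
        result = None
        for fact in self._log:
            if as_of is not None and fact.tx > as_of:
                break
            if fact.key == key:
                result = fact.value
        return result

    def history(self, key: str):
        """Every (tx, value) ever recorded for key, oldest first."""
        return [(f.tx, f.value) for f in self._log if f.key == key]
```

The domain schema never needs an audit table or a `previous_email` column: `history("user:1/email")` is derived from the log itself.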

Datomic is pretty close to what I have in mind, but it’s far too Clojure
specific to gain traction outside of that language, IMHO.

~~~
krn
> Datomic is pretty close to what I have in mind, but it’s far too Clojure
> specific to gain traction outside of that language, IMHO.

I believe that Crux[1] might be the solution, especially with its open nature.

[1] [https://opencrux.com/](https://opencrux.com/)

------
peterwwillis
This is a misuse of the term immutable to latch onto its cachet as a popular
term. Immutability has nothing to do with verifiability or security or
integrity. Immutable just means 'does not change'. It's like saying read-only.

I really like the idea of a database with integrity and verifiability and
repeatability and fast recovery, but let's not muddy the definition of
immutable or people won't understand its use in other contexts.

------
jacques_chester
As a note, the idea of using chained hashes to verify sequential records
predates blockchain: it was called "hash chaining". The "block" in block chain
refers to batching these up to reduce the total number of network interactions
required.

[https://en.wikipedia.org/wiki/Hash_chain](https://en.wikipedia.org/wiki/Hash_chain)

~~~
CuriousSkeptic
Aren’t you thinking of Merkle Trees?
[https://en.m.wikipedia.org/wiki/Merkle_tree](https://en.m.wikipedia.org/wiki/Merkle_tree)

~~~
jacques_chester
I was prepared to say "yes and no", insofar as Merkle trees are more general.

But some poking around reveals that Merkle's original patent was filed in
1979, whereas Lamport's paper introducing a cryptographic hash chain (S/KEY)
was published in 1981.

For an append-only linear data structure like a log or database table
configured to be append-only, hash chaining is fine. I've used it for that
purpose. Merkle trees are much more useful for anything that needs to fan out
or (as git shows) fan in.
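
A minimal Python sketch of that kind of hash chaining for an append-only log, under some assumptions: SHA-256 as the hash, an all-zero starting value, and helper names (`link`, `build_chain`, `verify_chain`) invented for the example. Each stored digest covers the record plus the previous digest, so changing or dropping an earlier record invalidates everything after it.

```python
import hashlib

GENESIS = b"\x00" * 32  # assumed fixed starting value for the chain


def link(prev_digest: bytes, record: bytes) -> bytes:
    """Digest of one entry: covers the previous digest and the record."""
    return hashlib.sha256(prev_digest + record).digest()


def build_chain(records):
    """Return one digest per record, each chained to the one before it."""
    digests = []
    prev = GENESIS
    for record in records:
        prev = link(prev, record)
        digests.append(prev)
    return digests


def verify_chain(records, digests) -> bool:
    """Recompute the chain from the records and compare with stored digests."""
    return build_chain(records) == digests
```

In practice only the latest digest needs to be kept somewhere the log's operator cannot rewrite; the rest can be recomputed from the records.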

My real point was meant to be that these ideas predate blockchain; they don't
necessarily need new names.

[https://crypto.stackexchange.com/questions/68290/when-was-ha...](https://crypto.stackexchange.com/questions/68290/when-was-hash-chain-first-used)

------
Smaug123
I have a possibly very stupid question: how is this not "basically just Git"?
Git is a content-addressable distributed filesystem which represents its
history immutably and verifiably as a tree of committed transactions. Is the
point that you need much more data throughput than Git can handle, or
something?

~~~
carapace
I don't know yet how it will turn out but I'm making a simple prototype DB
using Prolog-syntax flat files stored in git.

(SWI Prolog provides a specialized `persistency` module that is kind of
"log-oriented" (
[https://www.swi-prolog.org/pldoc/man?section=persistency](https://www.swi-prolog.org/pldoc/man?section=persistency)
) but I want to see how well just plain ol' Prolog files work.)

~~~
throwaway_pdp09
Like this
[https://en.wikipedia.org/wiki/Datalog](https://en.wikipedia.org/wiki/Datalog)
perhaps?

~~~
carapace
Nah, just plain old Prolog.

(I've only been working with Prolog for about a year. I know there are other
things out there like Datalog, Mercury, and Kowalski's Logic Production
Systems, but I want to grok the root in fullness before I get to those.)

------
toolslive
Why is it called "immutable" instead of "persistent"?

[https://en.wikipedia.org/wiki/Persistent_data_structure](https://en.wikipedia.org/wiki/Persistent_data_structure)

~~~
convolvatron
nothing is ever deleted in an immutable store. a persistent store survives
across reboots, but you can delete elements

if you can deal with the consequences, removing deletes makes dealing with
concurrency a lot easier

~~~
amelius
Would such a database be practical when dealing with laws like GDPR where you
have to actually delete stuff sometimes?

~~~
jacobobryant
Could be an issue. For that reason, the two immutable databases with which I'm
familiar (Crux and Datomic[1]) provide operations for removing bits of data
from history.

[1]on-prem, not cloud

However even if the DB doesn't offer that operation, you could use "crypto
shredding" where you encrypt the data before putting it in the immutable db
and then store the key in some kind of mutable store. (Then delete the key
when you want to delete its corresponding data).
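
A minimal Python sketch of the crypto-shredding idea, with loud caveats: `ShreddableStore` and its methods are hypothetical names, the "immutable db" is just a list, and the cipher is a one-time pad (a random key as long as the record) chosen only to stay inside the standard library. A real system would presumably use an authenticated cipher such as AES-GCM and a proper key store.

```python
import secrets


class ShreddableStore:
    def __init__(self):
        self._immutable_log = []  # append-only: ciphertexts are never removed
        self._keys = {}           # mutable key store: entries can be deleted

    def put(self, subject_id: str, plaintext: bytes) -> int:
        """Encrypt a record, append the ciphertext, keep the key separately."""
        key = secrets.token_bytes(len(plaintext))  # one-time pad, toy stand-in
        ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
        self._immutable_log.append(ciphertext)
        record_id = len(self._immutable_log) - 1
        self._keys[(subject_id, record_id)] = key
        return record_id

    def get(self, subject_id: str, record_id: int):
        """Decrypt a record, or return None if its key has been shredded."""
        key = self._keys.get((subject_id, record_id))
        if key is None:
            return None
        ciphertext = self._immutable_log[record_id]
        return bytes(c ^ k for c, k in zip(ciphertext, key))

    def shred(self, subject_id: str) -> None:
        """Delete every key for a subject; the ciphertexts stay in the log."""
        for entry in [e for e in self._keys if e[0] == subject_id]:
            del self._keys[entry]
```

After `shred("alice")` the log is untouched, but the bytes left in it are unreadable without the deleted keys.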

There's also this post which describes some other methods of making a Datomic
cloud system GDPR compliant:
[https://vvvvalvalval.github.io/posts/2018-05-01-making-a-dat...](https://vvvvalvalval.github.io/posts/2018-05-01-making-a-datomic-system-gdpr-compliant.html)

------
tuxxy
> ... where instead of using Merkle Trees, other cryptographic primitives
> could be used to ensure tamper proofness, such as Zero Knowledge Proofs

What would your zero knowledge proof _prove_ exactly without some data
structure behind it? In many applications, the zero knowledge proof is being
used to prove that something in a data structure is correct/valid/etc. You
can't just replace these data structures with zero knowledge proofs
arbitrarily.

------
hirundo
"Art. 17 GDPR: Right to erasure ('right to be forgotten') 1. The data subject
shall have the right to obtain from the controller the erasure of personal
data concerning him or her without undue delay..."

How does this right affect a website backed by an immutable database? Is it
enough for data to be superseded by later data, such as an assertion that the
prior data is defunct? Or does it have to be actually erased? Can it be
considered erased for Article 17 purposes if the database owner can still
access it?

Is a website with an immutable database illegal under the GDPR after a court
order to delete something has been received? Datomic devs want to know.

[https://gdpr-info.eu/art-17-gdpr/](https://gdpr-info.eu/art-17-gdpr/)

~~~
vbezhenar
What does "actually erased" mean? If I delete a file, its contents are still
on disk. If an SSD remaps a block, that block might be completely inaccessible
to the operating system, but it still technically contains the data.

~~~
derefr
The spirit of "right to be forgotten" is a right to not have past events
unduly influence your _public_ image.

I would assume that—unless it's been proven that you can GDPR-takedown-request
a credit agency to erase your credit history—there's no "right to be
forgotten" as applies to never-publicized, company-internal data about you.

~~~
ealexhudson
That's not quite it. The different legal bases for processing personal data
come with different rights; one of them is the right of erasure. It doesn't
matter if the data is public or private.

This isn't a general bazooka - you'd be unlikely to erase the data that
generates a credit score, for example, but then there will be equivalent
rights (right to explanation of an automated decision, plus the right to
correct a mistaken record). All of these rest on the ability of the subject to
gain access to their record in the first place too.

Immutability isn't an automatic problem for GDPR - you're allowed to take
backups of databases too! - but it is axiomatically more difficult to be in
compliance with such an arrangement.

------
tetromino_
Data, especially old data, is very often a liability. It carries storage
costs, legal and regulatory risks, and makes a tempting target for attackers.
Outside of a few domains like source code management, wouldn't you by default
want a _mutable_ database that auto-scrubs old data based on retention rules?

------
beobab
I remember reading "Heirs of Empire" by David Weber, and it casually mentioned
that Dahak had a database that was immutable. I remember thinking that was a
good idea, and planned to think about it later, but alas I had forgotten about
it by the time I'd finished the book.

[edit: added whose database it was]

------
mD5pPxMcS6fVWKE
Well, an immutable database is just a database without the "delete" and "update"
operations. Blockchain, on the other hand, is something entirely different -
it's a distributed consensus protocol.

------
bobbiechen
> Actually, I expect to start seeing immutable databases applied to some of
> these use cases in no time:
>
> To immutably store every update to sensitive database fields (credit card or
> bank account data) of an existing application database.

Would this cause legal issues, for example, with GDPR deletion requests? I
can't imagine that regulators would accept the answer "Sorry, my database
doesn't allow me to delete data". So then you would need a way to make a "real
delete", which seems like it might erode the benefits described.

