
Unofficial Guide to Datomic Internals (2014) - grzm
https://tonsky.me/blog/unofficial-guide-to-datomic-internals/
======
JimmyRuska
We used Datomic in production at Time Inc around 2016. The idea of an
immutable database where you can track changes over time, or query the state
of the universe at any given point, sounded amazing for marketing or
compliance use cases. Unfortunately, from a dev standpoint it did not feel like
a mature system, and the performance was not where we needed it to be.

Probably the most advanced database for triple stores these days is RDFox (
[https://www.youtube.com/watch?v=-DnmuHtywFs](https://www.youtube.com/watch?v=-DnmuHtywFs)
). While Datomic uses Datalog for querying, RDFox uses Datalog for database
reasoning, and SPARQL, a W3C standard, for querying. As you add data to the
database you can infer new facts. If you want immutability, simply add data in
append-only mode with a timestamp. But the idea that you can add the business
rules/logic to the database, and have it incrementally apply that logic as you
add data, is a recent advance by Oxford AI research.
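
To make "infer new facts from rules" concrete, here is a toy sketch in Python. It is not RDFox's algorithm or API: it naively re-applies rules until nothing new appears, whereas the point above is that RDFox does this incrementally. The `infer` and `ancestor_rule` names and the sample triples are made up for illustration.

```python
# Toy illustration only: naive forward chaining over triples.
# This is NOT RDFox's algorithm or API; all names here are hypothetical.

def infer(facts, rules):
    """Apply every rule repeatedly until no new triples appear (a fixpoint)."""
    known = set(facts)
    while True:
        new = set()
        for rule in rules:
            new |= rule(known) - known
        if not new:
            return known
        known |= new


def ancestor_rule(triples):
    """parent(x, y) => ancestor(x, y)
    ancestor(x, y) and parent(y, z) => ancestor(x, z)"""
    derived = {(s, "ancestor", o) for (s, p, o) in triples if p == "parent"}
    derived |= {
        (x, "ancestor", z)
        for (x, p1, y) in triples if p1 == "ancestor"
        for (y2, p2, z) in triples if p2 == "parent" and y2 == y
    }
    return derived


facts = {("alice", "parent", "bob"), ("bob", "parent", "carol")}
closure = infer(facts, [ancestor_rule])
```

After this runs, `closure` contains the two original `parent` triples plus the derived `ancestor` triples, including `("alice", "ancestor", "carol")`. An incremental reasoner would update that set when a triple is added instead of recomputing it from scratch.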

~~~
kendallgclark
Stardog is full of features, performance, and enterprise hardening that RDFox
hasn’t even started thinking about yet.

------
jmiskovic
For the uninitiated, Nikita has created an in-memory database with a mostly
compatible API (Datalog) for ClojureScript or JavaScript. My impression is that
his variant actually has more widespread use than the original Datomic, based
on the number of open source projects that use each.

I used DataScript for a while to get familiar with graph database querying. I
was fascinated how easy it is to construct queries that mine obscure relations
between distantly related entities. I hope I get to use similar tech again.

------
twic
> This simplicity enables Datomic to do more than any relational DB or KV
> storage can ever afford

> Datomic does not manage persistence itself, instead, it outsources storage
> problems to databases implemented by other people. Data can be kept, at your
> expense, in DynamoDB, Riak, Infinispan, Couchbase or SQL database.

These things can't both be true.

~~~
tnisonoff
If the argument is Datomic can't be simpler than a relational DB because it
can utilize it for persistence, then you'd have to argue that a relational DB
can't be simpler than directly using a hard drive for your storage solution.

~~~
jayd16
"This simplicity enables a _relational DB_ to do more than any _hard drive_ can
ever afford."

That also seems debatable for the exact same reasons. I don't think this would
convince anyone that needs convincing.

------
dgb23
Additional details about Datoms:

To determine whether a Datom is being retracted or added there is a fifth
element in the tuple [0].

There are many similarities to modelling temporal data in SQL [1]. But Datoms
are simpler and more open as you can freely build relations between them
(composable), similar to a graph-db.

[0] [https://docs.datomic.com/cloud/whatis/data-model.html](https://docs.datomic.com/cloud/whatis/data-model.html)

[1]
[https://en.wikipedia.org/wiki/Temporal_database](https://en.wikipedia.org/wiki/Temporal_database)
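
A minimal Python sketch of the five-element tuple described above, assuming the usual entity/attribute/value/transaction/added layout. The `Datom` class, the `current_facts` helper, and the sample data are hypothetical and only illustrate how the fifth element separates assertions from retractions; this is not Datomic's API.

```python
from typing import Any, NamedTuple


class Datom(NamedTuple):
    e: int       # entity id
    a: str       # attribute
    v: Any       # value
    tx: int      # transaction id
    added: bool  # fifth element: True = assertion, False = retraction


def current_facts(datoms):
    """Replay datoms in transaction order; a retraction cancels an earlier assertion."""
    facts = set()
    for d in sorted(datoms, key=lambda d: d.tx):  # stable sort keeps in-tx order
        if d.added:
            facts.add((d.e, d.a, d.v))
        else:
            facts.discard((d.e, d.a, d.v))
    return facts


# Hypothetical data: an email is asserted in tx 100, then replaced in tx 101.
log = [
    Datom(1, ":user/email", "a@example.com", 100, True),
    Datom(1, ":user/email", "a@example.com", 101, False),
    Datom(1, ":user/email", "b@example.com", 101, True),
]
```

Here `current_facts(log)` leaves only the `b@example.com` fact, while the full history stays available in `log`.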

------
vlmutolo
this theme is hilarious and infuriating

EDIT: You can comment out the yellow background image in the style editor and
it becomes something reasonable

~~~
adamkl
I quite like the black-on-yellow theme. It’s different, and distinctive (I’ve
read a few things on tonsky’s site, so I know what to expect).

Even more hilarious is switching to “dark mode”.

------
archarios
I went to a Clojure meetup one time and they all went on about how using
Datomic in production is a nightmare and it's generally an over-engineered
product that isn't worth the trouble in the end. Do most people who have dealt
with Datomic in production feel this way?

~~~
dwohnitmok
Anecdotally, I know of one company which is in the same boat: they generally
regret their usage of Datomic and were trying to move away from it last I
talked with them. However, there are also people on HN like dustingetz who have
had a great time with Datomic and use it as a core component of their product.

I just wish Cognitect would allow people to run public benchmarks of Datomic
to make it easier to evaluate its tradeoffs.

~~~
zeroDivisible
What is the policy of Cognitect re: public benchmarks? I did not know that.

~~~
dwohnitmok
> The Licensee hereby agrees, without the prior written consent of Cognitect,
> which may be withheld or conditioned at Cognitect’s sole discretion, it will
> not... publicly display or communicate the results of internal performance
> testing or other benchmarking or performance evaluation of the Software

From the Datomic EULA here: [https://www.datomic.com/on-prem-eula.html](https://www.datomic.com/on-prem-eula.html)

~~~
mercer
That's just vile. Is there any /good/ defense of this kind of agreement other
than a 'think of the children' argument that people might make a mistake in
their performance reviews?

~~~
fiddlerwoaroof
It's annoying, but it's pretty standard in commercial databases: if your
competitors refuse to allow public benchmarks, all it can do is hurt you.

~~~
dwohnitmok
How standard is it? As far as I know among databases MS SQL and Oracle do this
but do other commercial databases do this as well?

~~~
fiddlerwoaroof
[https://danluu.com/anon-benchmark/](https://danluu.com/anon-benchmark/)

It’s common enough to have a name: “DeWitt clause”. It sounds like IBM is the
only major commercial rdbms vendor to allow benchmarks?

~~~
dwohnitmok
That article only lists MS and Oracle though. Apart from IBM, I don't think
CockroachDB Enterprise has such a prohibition, nor does Google Spanner (I
think?), nor does Amazon Aurora (again I think?). And of course all the open
source competitors don't have this clause.

Basically my impression is that DeWitt clauses are common enough to be
well-known, but still in the distinct minority. That's just an impression though.

------
paulgb
One thing I've never understood is why all the indexes have transaction
_last_. One of the selling points of Datomic is that it supports as-of
queries, but using the EAVT or AEVT indexes requires it to scan _all_ historic
values of that attribute, right?

In most situations this is probably fine, but if you have data that changes
frequently it seems like this could slow queries down compared to an EATV or
AETV index.

It's also likely that the people who made Datomic are both smarter about this
stuff than me and put more thought into it than I have, so I'd love to know
what the reasoning behind the choice of index is.

(PS @dang it would be nice to have (2014) in the title)

~~~
nlitened
The page mentions the Log index which is sorted by transaction id. It should
be enough to support as-of if I understand correctly.

~~~
paulgb
The log index supports as-of if you know the actual transaction ID, but if you
want to look up by entity/attribute efficiently it's not much help because you
don't know when the data point you're interested in was last modified.

~~~
nlitened
I think in this case you’d find all datoms via normal EAVT index and then sort
the results by transaction id, dropping everything after your desired
transaction.
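
A rough Python sketch of that approach, under assumptions: the index is modelled as a sorted list of `(e, a, v, tx, added)` tuples, and `as_of` is a made-up helper, not Datomic's implementation. It also shows paulgb's concern: the scan visits every historic datom for the entity/attribute pair before the transaction filter applies.

```python
from bisect import bisect_left

# Hypothetical EAVT index: (e, a, v, tx, added) tuples kept in sorted order.
eavt = sorted([
    (1, "price", 10, 100, True),
    (1, "price", 10, 105, False),
    (1, "price", 12, 105, True),
    (1, "price", 12, 110, False),
    (1, "price", 15, 110, True),
    (2, "price", 99, 101, True),
])


def as_of(index, e, a, t):
    """Values of attribute a on entity e as of transaction t."""
    # Seek to the first datom for (e, a); a shorter tuple sorts before
    # any longer tuple that shares its prefix.
    start = bisect_left(index, (e, a))
    history = []
    for d in index[start:]:
        if d[:2] != (e, a):
            break                # left the (e, a) range
        if d[3] <= t:            # drop everything after the desired tx
            history.append(d)
    # Replay in tx order; within one tx, retractions (False) come first.
    values = set()
    for _, _, v, _, added in sorted(history, key=lambda d: (d[3], d[4])):
        if added:
            values.add(v)
        else:
            values.discard(v)
    return values
```

With this sample data, `as_of(eavt, 1, "price", 107)` yields `{12}`, and the cost grows with the number of historic values for that entity/attribute pair, which is the trade-off raised at the top of this subthread.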

