
Migration from Postgres to Datomic - grzm
http://grishaev.me/en/pg-to-datomic
======
modarts
This comes up every time Datomic is mentioned, but it's tremendously sad to
see such a revolutionary technology lack any sizable adoption because of its
licensing model.

Datomic packages up a CQRS + event-sourced architecture in a really nice way.
You can stash graph/tree/relational/tabular/(insert shape here) data and query
it out with the powerful Datalog language. You get time travel (versioning)
for free. Etc, etc.

It's a crime that this hasn't taken off as one of the leading data stores;
hopefully Cognitect has some sort of roadmap to an open source release.

~~~
zcam
Stuart Halloway said "Datomic is not and will not be open source" in the
latest defn podcast, so don't hold your breath. Hopefully they change their
mind. They seem to be focused on a cloud (managed?) solution offering at the
moment.

~~~
johnwalker
That's disappointing, because Datomic is not and will not be adopted. No one
wants to revolutionize their data layer only to migrate back to Postgres when
Cognitect goes out of business. Stable solutions are found in cloud hosting
from too-big-to-fail companies, or in self-management. However, maybe they
have found a niche among people willing to take on the risk of major
technical debt.

~~~
nine_k
Exactly.

I would only make a closed-source solution a centerpiece of my business if:

* The thing is known to work for a number of other people, and uniquely solves my urgent problem, while no open solution can do anything comparable. This is probably the Datomic niche.

* The thing was around for ages, and is a cash cow of a major and reliable software maker, and is also significantly better than open software in some important area. This is how MS or IBM sell their databases; Datomic doesn't have nearly enough mindshare to compete here.

* I'm hastily building an MVP that I plan to scrap anyway when the growth hits / the startup is acquired. Datomic is likely too expensive for that.

By necessity, the first case holds for a small number of businesses.

------
Cieplak
You can achieve similar goals to Datomic using Postgres:

    
    
        -- uuid_generate_v4() requires the "uuid-ossp" extension.
        -- entity must exist before snapshot can reference it; the circular
        -- back-reference from entity to snapshot is added afterwards.
    
        create table entity (
            id uuid primary key default uuid_generate_v4(),
            snapshot_id uuid
        );
    
        create table snapshot (
            id uuid primary key default uuid_generate_v4(),
            ts timestamp without time zone default now(),
            entity_id uuid references entity(id),
            value jsonb
        );
    
        alter table entity
            add foreign key (snapshot_id) references snapshot(id);
    

Rather than mutating the value of an _entity_, you simply append a new record
to the _snapshot_ table and update the _entity_'s reference to it. This gives
you a full audit trail of the entity.
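The mechanics of the pattern can be sketched in a few lines. This is a toy
version: SQLite (from the Python standard library) stands in for Postgres,
with integer ids instead of UUIDs and a text column instead of jsonb; the
`update_entity` helper is hypothetical, but the append-then-repoint flow is
the one described above.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    create table entity (
        id integer primary key,
        snapshot_id integer
    );
    create table snapshot (
        id integer primary key,
        ts text default current_timestamp,
        entity_id integer references entity(id),
        value text  -- jsonb in Postgres
    );
""")

def update_entity(entity_id, value):
    """Append a new snapshot and repoint the entity at it; old rows are never mutated."""
    cur = db.execute(
        "insert into snapshot (entity_id, value) values (?, ?)",
        (entity_id, json.dumps(value)),
    )
    db.execute("update entity set snapshot_id = ? where id = ?",
               (cur.lastrowid, entity_id))

db.execute("insert into entity (id) values (1)")
update_entity(1, {"name": "Jim"})
update_entity(1, {"name": "James"})

# The current value comes through the entity's pointer...
current = db.execute("""
    select s.value from entity e join snapshot s on s.id = e.snapshot_id
    where e.id = 1
""").fetchone()[0]

# ...while the full history stays in the snapshot table.
history = db.execute(
    "select value from snapshot where entity_id = 1 order by id"
).fetchall()
```

The key property is that reads of old snapshots keep working forever, since
nothing is ever updated in place except the single pointer on _entity_.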

~~~
saurik
It is also an extremely trivial modification to PostgreSQL to just add that
functionality: essentially you just need to disable the vacuum, reify the
current transaction identifier, and allow queries to bypass MVCC (which is
already storing exactly the kind of information people love to play up about
Datomic: it is an append-only system that is only able to delete things due
to garbage collection and the vacuum).

I did a proof-of-concept of this years ago in this linked comment (which is
the bottom of a long thread which involved Rich Hickey; I remember it being an
interesting thread).

[https://news.ycombinator.com/item?id=4448767](https://news.ycombinator.com/item?id=4448767)

At the time I did not provide a patch (which is weird for me) to show how
"this is totally just leveraging stuff already there", but I just went and
found that folder, so here is the patch:

[http://test.saurik.com/hackernews/snapshot_txid.diff](http://test.saurik.com/hackernews/snapshot_txid.diff)

~~~
jtmarmon
Although this is definitely a part of the solution, I still see the coupling
Rich mentions in his comment in the thread you linked to as an issue.

To expand: when you have to query a remote database for data, you likely only
want to perform one query for performance reasons (and certainly not an
unbounded number of queries; maybe two or three is acceptable, but not N+1).
This means that information about what is needed in that single query (or few
queries) must be passed down the call stack, creating coupling between
unrelated layers of your application.

To make this more concrete: imagine you write a function to find users named
Jim. At first this function is for reporting, so you just return a list of
user IDs. Later, you decide to build a dashboard. You want to render all users
named Jim here, but you need each of their names for display purposes.

Given a remote database, you now need to modify the query function to be able
to return the specific attributes you need for this use case. You can imagine
if you extend the call stack this passing gets more complicated, requires
merging of the queried items, etc.

With Datomic, since your data is in memory, your original query can stay the
same: N+1 queries are irrelevant when your data is available in RAM.

I think this point is important.
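The trade-off above can be made concrete with a toy model, where every call to
`fetch` stands in for one network round trip to a remote database. All names
here (`find_jims`, `find_jims_v2`, the `USERS` data) are hypothetical
illustrations, not anyone's real schema.

```python
USERS = {1: {"name": "Jim", "email": "jim@example.com"},
         2: {"name": "Ann", "email": "ann@example.com"},
         3: {"name": "Jim", "email": "jim2@example.com"}}

round_trips = 0

def fetch(query):
    """Run a query; each call counts as one round trip to the remote store."""
    global round_trips
    round_trips += 1
    return query()

# Version 1, written for reporting: returns only ids.
def find_jims():
    return fetch(lambda: [uid for uid, u in USERS.items()
                          if u["name"] == "Jim"])

# The dashboard also needs names. The caller either pays N+1 round trips...
ids = find_jims()
names = [fetch(lambda uid=uid: USERS[uid]["name"]) for uid in ids]
assert round_trips == 1 + len(ids)  # one original query plus one per result

# ...or find_jims must grow a parameter describing which attributes to
# return, coupling the reporting-layer function to the dashboard's needs.
def find_jims_v2(attrs):
    return fetch(lambda: [{a: u[a] for a in attrs}
                          for u in USERS.values() if u["name"] == "Jim"])
```

Neither option is great: the first is slow, the second threads presentation
concerns down into the query layer, which is exactly the coupling described
above.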

~~~
elmigranto
What's stopping you from storing a Postgres database (or particular tables)
in memory if it fits? I believe that should be possible.

Also, I may not be getting what Datomic is, but why modify the original
function? You can just join the results with the users table.

~~~
jtmarmon
Also, to answer your second question: first off, there's no users table in
the example, so it's a bit confusing. But say we're just comparing against a
traditional table-per-record-type approach and asking why that approach still
retains the coupling problem.

In that scenario you could join the users table. You'd be overfetching in
many scenarios, but that's not a huge concern for most people; this is what
Active Record does. However, say you want to get more than just the columns
on the users table: then you run into the same issue. Suddenly the query
caller needs to inform the query method to include results from some
unrelated table. Because Datomic gives the caller an in-memory graph
structure, the caller can grab that extra information without modifying the
method or its call signature, obviating the need for this coupling.
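The caller-side navigation being described can be sketched with plain dicts
standing in for Datomic's entity graph (this approximates the idea, not
Datomic's actual Entity API; the data and function names are hypothetical):

```python
# A small in-memory "graph": users hold references to their related orders.
ORDERS = [{"id": 100, "total": 40}]
USERS = [
    {"name": "Jim", "orders": ORDERS},
    {"name": "Ann", "orders": []},
]

# The query function only ever returns entities; its signature never changes.
def find_jims(users):
    return [u for u in users if u["name"] == "Jim"]

jims = find_jims(USERS)

# One caller wants names, another wants order totals; since the results are
# already in memory, each caller just navigates without touching find_jims.
names = [u["name"] for u in jims]
totals = [o["total"] for u in jims for o in u["orders"]]
```

The query layer stays fixed while each consumer pulls whatever attributes it
needs, which is the decoupling claimed above.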

------
bm1362
In my experience, we often ran into strange issues with Datomic: restoring
new databases with the same name but a different logical id, excision being
hard, ingesting a large amount of read-only data without impacting
transactions, and the inability to have non-JVM clients, as well as the cost,
opaqueness, and difficulty hiring. We're actually going the opposite
direction and wrote a replication stream to Postgres using the tx log API.

~~~
dm3
It's definitely not without issues. However, I've found it very simple to work
with once you are aware of all the operational gotchas. Needless to say, it's
especially powerful in a 100% Clojure stack.

The features I like most are

1) the transaction log - I've done something like that many times using
Postgres and EventStore, but nothing beats the simplicity of just defining a
few queries in code and having immediate updates on new transactions delivered
to every peer.

2) idempotency - reasserting the same facts is a no-op. Doing the same with a
temporal table, as suggested elsewhere in this thread, is not as trivial.

3) consistent db snapshots - once you get hold of the database value, all
reads will only see the data as of the time the value was retrieved. This
makes application code much easier to reason about, as the database can be
treated as yet another immutable argument.

4) assembling a transaction value out of multiple pieces - same as the above,
pure functions can all contribute to the final transaction value without
having to mutate anything.

5) "free" caching on the peer - once you query something, it stays in the
peer's memory. Subject to memory constraints, of course.

You can definitely build something as nice as Datomic on top of Postgres, but
it will take weeks to get all the details right.
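Points 2 and 3 above can be demonstrated with a minimal in-memory sketch: a
store of (entity, attribute, value) facts where transacting is a set union
and a "database value" is an immutable snapshot. This approximates the model
only; the `FactStore` class and its method names are made up for illustration
and are nothing like Datomic's actual API.

```python
class FactStore:
    def __init__(self):
        self._facts = set()

    def assert_facts(self, facts):
        """Transact facts; re-asserting an existing fact is a no-op (set union)."""
        self._facts |= set(facts)

    def db(self):
        """Return an immutable snapshot; later transactions don't affect it."""
        return frozenset(self._facts)

store = FactStore()
store.assert_facts({("user-1", "name", "Jim")})
before = store.db()

# Idempotency: transacting the same fact again changes nothing.
store.assert_facts({("user-1", "name", "Jim")})
assert store.db() == before

# Consistent snapshots: new facts never show up in an old database value.
store.assert_facts({("user-1", "email", "jim@example.com")})
assert ("user-1", "email", "jim@example.com") not in before
assert ("user-1", "email", "jim@example.com") in store.db()
```

Getting the same two guarantees out of a temporal table in Postgres means
hand-writing dedup checks and pinning every read to a transaction snapshot,
which is part of the "weeks of details" mentioned above.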

~~~
bm1362
This doesn't really address anything I mentioned.

The database is fine, and it was nice to work with when it was just 2
engineers. For us, it didn't scale with the number of engineers or with
business needs: BI, ETLs of large amounts of healthcare data into Datomic,
and then needing to delete said data. We spent a lot of time reinventing the
wheel around schema management and building a declarative query interface as
well.

> few queries ... take weeks

Godspeed.

~~~
dm3
Hey, thanks for the comment.

To clarify - I wasn't trying to address the issues you've stumbled into, but
rather list all the points I like about it.

------
hcarvalhoalves
This is a good tutorial, but there are a few misconceptions / mistakes:

    
    
        > Avoid nils
        >
        > Datomic does not support nil values for attributes. When you do not have a value for an attribute, you should either skip it or pass an empty value: a zero, an empty string, etc. That’s why the most of expressions have (or "") at the end of threading macro.
    

Datomic doesn't support `nil` on purpose: since it's a fact database, the
correct approach is to not assert the attribute (in other words, just omit
it). Since the schema is a property of the database (not the entity), this
makes your database forward-compatible with any new fact you may need to
store, without having to be explicit about the past.

You can also use `(missing? ?entity :attribute)` on queries, which should do a
faster lookup on the EAVT index vs. checking for sentinel values.
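The modelling point generalizes beyond Datomic: represent "no value" by
omitting the attribute, then test for absence directly rather than comparing
against sentinels. A tiny sketch with plain dicts standing in for entities
(the `missing` helper is a made-up analogue of Datomic's `missing?`):

```python
users = [
    {"name": "Jim", "email": "jim@example.com"},
    {"name": "Ann"},  # no email fact asserted: no nil, no "" sentinel
]

def missing(entity, attribute):
    """True when the entity has no fact for the attribute at all."""
    return attribute not in entity

no_email = [u["name"] for u in users if missing(u, "email")]
```

With sentinels you would instead have to know, per attribute, which of `""`,
`0`, etc. means "absent", and every query would carry that convention along.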

    
    
        > JSON data
        >
        > In Datomic, there is no JSON type for attributes. I’m not sure I made a proper decision, but I just put those JSON data into a text attribute. Sure, where is no a way to access separate fields in a datalog query or apply roles to them. But at least I can restore the data then selecting a single entity:
    

You can flatten a JSON object into a namespaced map. At this point you get
what is essentially an entity with well-named attributes that you can
transact and query against. Since the schema is flexible and doesn't require
migrations, in theory you can support even arbitrary objects whose schema you
don't know in advance by inspecting the deserialized object.
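One possible shape for that flattening: walk the nested object and build a
flat map of "namespace/attribute" keys, each of which then maps naturally
onto a per-attribute fact. The separator and the `flatten` helper are
illustrative choices, not anything prescribed by Datomic.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into a single map of 'ns/attr' keys."""
    out = {}
    for key, value in obj.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out

doc = {"user": {"name": "Jim", "address": {"city": "Oslo"}}}
flat = flatten(doc)
```

Here `flat` comes out as `{"user/name": "Jim", "user/address/city": "Oslo"}`,
i.e. one queryable attribute per leaf instead of one opaque text blob.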

---

Datomic is a beast different enough to require some learning on what is
optimal/idiomatic in terms of data modelling, because the paradigm shifts from
appending/mutating tuples over multiple tables to asserting/retracting facts
on what is essentially "one big table" (entity, attribute, value,
transaction).

------
TeeWEE
Very nice article, but in practice it is almost never useful to switch a
production system from Postgres to Datomic, or to rewrite it from Python to
Clojure. But since it's a pet project, it's a nice way to learn new
technology.

~~~
JB024066
Interestingly, once you are on Datomic you have a little more freedom of
choice in underlying storage. Since Datomic uses Postgres/DynamoDB/Cassandra
as just a KV store, it becomes trivial to switch underlying storage using the
backup and restore feature. Nice if you decide to shop around or like a
particular feature of a storage offering.

------
ComNik
For anyone interested in exploring the datomic model, there is a great
ClojureScript in-memory implementation called datascript
([https://github.com/tonsky/datascript](https://github.com/tonsky/datascript))
by Nikita Prokopov.

------
ledgerdev
Does Event Sourcing make Datomic a moot point? Sure it can be your event log,
but it seems like overkill.

~~~
blain_the_train
I imagine Datomic's query language is significantly more powerful.

------
brian_herman
How does datomic compare to something like
[https://github.com/ApplauseOSS/djangoevents](https://github.com/ApplauseOSS/djangoevents)
django-eventsourced?

~~~
bjconlan
Only on a pseudo-transactional level. (I say pseudo because I can't be 100%
sure how data is represented; from the README.md of the djangoevents project
it might in fact be 1:1 with Datomic's EAVT(B)-like index.)

They do share some commonalities such as:

- history/transaction subscriptions and queries

- Providing a Django ORM/Model-like data access system via Datomic's 'Entity
API' (although this is Django)

But Datomic isn't just 'event sourcing' (although it's a way to think about
solving a problem).

It provides Prolog-style queries, functions on the database, a GraphQL-like
API, caching (Memcached), and a robust Peer library (which also makes it hard
for non-JVM languages to use Datomic, as it uses a closed protocol, although
it very likely builds on Artemis/HornetQ's internal protocol).

I am talking from a Clojure/Datomic background (so my Django experience is
old and limited), but I'm sure you could cobble together something to satisfy
all these things for your stack; Datomic just does a beautiful job for
Clojure. (I find Cognitect's stranglehold on the technology not great, but I
think they've loosened the licensing enough to make it acceptable for small
businesses and startups, which strikes a good balance.)

