Migration from Postgres to Datomic (grishaev.me)
170 points by grzm | 81 comments

This comes up every time Datomic is mentioned, but it's tremendously sad to see such a revolutionary technology lack any sizable adoption because of its licensing model.

Datomic packages up a CQRS + event-sourced architecture in a really nice way. You can stash graph/tree/relational/tabular/(insert shape here) data and query it out with the powerful Datalog language. You get time travel (versioning) for free. Etc, etc.
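For instance, the time travel really is just a function call in the peer API. A minimal sketch (assumes `datomic.api`, a connection `conn`, and some past time or tx id `t`; the attribute name is made up):

    (require '[datomic.api :as d])

    ;; the database value as of an earlier point in time
    (def old-db (d/as-of (d/db conn) t))

    ;; run any ordinary query against the historical value
    (d/q '[:find ?e :where [?e :user/name "Jim"]] old-db)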

It's a crime that this hasn't taken off as one of the leading data stores; hopefully Cognitect has some sort of roadmap to an open source release.

Stuart Halloway said "Datomic is not and will not be open source" in the latest defn podcast, so don't hold your breath. Hopefully they change their mind. They seem to be focused on a cloud (managed?) solution offering at the moment.

That's disappointing, because it means Datomic is not and will not be adopted. No one wants to revolutionize their data layer only to crawl back to Postgres when Cognitect goes out of business. Stable solutions come either from cloud hosting by too-big-to-fail companies or from self-management. However, maybe they have found a niche among people willing to take on the risk of major technical debt.


I would only make a closed-source solution a centerpiece of my business if:

* The thing is known to work for a number of other people, and uniquely solves my urgent problem, while no open solution can do anything comparable. This is probably the Datomic niche.

* The thing was around for ages, and is a cash cow of a major and reliable software maker, and is also significantly better than open software in some important area. This is how MS or IBM sell their databases; Datomic doesn't have nearly enough mindshare to compete here.

* I'm hastily building an MVP that I plan to scrap anyway when the growth hits / the startup is acquired. Datomic is likely too expensive for that.

By necessity, the first case holds for a small number of businesses.

It probably pays the bills; I guess they're happy with it if it's still a thing after 5 years (?).

It's not helping Clojure grow as much as it could, though, and it's a bit of a chicken-and-egg problem: more Clojure users could mean more Datomic users, and vice versa.

Fortunately, if Cognitect does go out of business, the Clojure language would at least survive fine. That's why Clojure with Postgres (a very common pairing) is by far the safest bet.

That's very disappointing. While I don't begrudge Cognitect money, it's obviously possible to make money with an OSS database and I wish they'd gone that route. It's my opinion that your database is the #1 most important part of your stack to be open, as your data is the most important thing.

I've been considering trying to get up to speed with Mentat[0], a Mozilla project to create a Datomic-like layer over SQLite for desktop applications, and helping out with that, or even trying to start a full open source clone of Datomic, though that probably won't happen for a good while due to real life happening.

It's got so many good ideas and it sucks to see them go to waste in a proprietary database where basically nobody gets to use them.

[0]: https://github.com/mozilla/mentat

> it's obviously possible

What if it's unobviously not possible?

Do you remember where in the episode they talk about this?

edit: found it https://soundcloud.com/defn-771544745/23-the-right-honourabl...

Where can I find more about this cloud offering?

Hell, even implementing CRUD applications with Datomic / Clojure and ClojureScript blows everything else out of the water.

No string literals, pure data. Add Transit to the mix and you get a full-stack killer combo with unparalleled productivity.

With so many parts from Cognitect filling the stack, it would be silly not to boost adoption with friendlier licensing.

"Unparalleled productivity" sounds like hyperbole, especially considering the posts below, showing how exactly the same thing can be done trivially in Postgres.

Do you have an example?

It's because Datomic doesn't have the object/relational impedance mismatch, and because immutability yields idealized caching, so a lot of hard performance problems inherent to RDBMSs go away. It's like moving from CVS to Git.

Fuck it, hyperfiddle isn't launched but let's just post it here. Hyperfiddle is JSFiddle + Datomic. http://hyperfiddle.net/ If you care about this, you should reach out, please email me!

It sounds like the site's not finished yet, but FYI it's a trainwreck on iPhone6 + Safari

I like the idea. Nice work!

> especially considering the posts below, showing how exactly the same thing can be done trivially in Postgres.

So then where is the open source version / extension to Postgres?

How about this - I'll give you $1000 if you can give me a seamless extension on top of Postgres that has most of the functionality of Datomic.

Since it is trivial it is easy money for you.

GGP talked about building a CRUD app with "unparalleled productivity". That's where the burden of proof is, not on the GP who is skeptical about that claim.

"So then where is the open source version / extension to Postgres?"

As a concrete example, I like the way the Datomic schema lets you eliminate bridge tables for many-to-many relationships: declare :db/cardinality :db.cardinality/many on the attribute, and then you store or get a vector.
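A minimal sketch of what that looks like (attribute names are made up for illustration; assumes the Datomic peer API and an existing connection `conn`):

    (require '[datomic.api :as d])

    ;; cardinality-many attribute: no bridge table needed
    (def schema
      [{:db/ident       :person/email
        :db/valueType   :db.type/string
        :db/cardinality :db.cardinality/many}])

    @(d/transact conn schema)

    ;; assert several values at once; read them back as a collection
    @(d/transact conn [{:db/id        "new-person"
                        :person/email ["jim@example.com" "jim@work.com"]}])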

There should be a way to have super-SQL middleware, compatible with any language bindings (including a bare CLI), that eats "SQL plus some cardinality" and behind the scenes implements the bridge-table manipulations. Don't repeat yourself and all that, and many-to-many tables are kinda repetitive.

There doesn't seem to be anything remotely like that out there.

You are asking for proof of a negative. That is a logical trap, and one frequently used as an argument by extremist politicians.

Where is the evidence that you do not [insert anything here]? That's a slippery slope of a challenge.

This doesn't answer your question, but I just wanted to make clear that the posts below certainly do not show, trivially or otherwise, how to use "no string literals" and "pure data" in Postgres.

Those concepts don't make any sense, so it's no wonder that a serious RDBMS like Postgres doesn't support them.

Can you explain your statement?

Can you explain what those concepts mean and why they are important? It's the first time I've read about them, or about Datomic for that matter.

I asked since I don't know much about Datomic either, or how usable it is in production. So I don't have much to share.

Unparalleled productivity maybe, but only once you've learned how to read code that looks like an AST, and to juggle parens in Emacs (learn paredit etc).

For better or worse, history has proven that doesn't appeal to many developers. Lisps just aren't popular, and that lack of mindshare matters.

There is parinfer, which turns Lisp into a whitespace-significant language; no special commands necessary.

I don't understand this line of thinking. The product is outstanding and likely will be, if it is not already, the lifeblood of Cognitect as a company (in conjunction with their consulting services I'm sure). What is wrong with a company building a worthwhile product to sell? This is a community (Hacker News that is) that should be in full support of companies building great products to sell!

I think it's fueled by a naive idea that paying for closed source software is inherently bad. Actually, it's not necessarily a naive idea, as this specific OP may have weighed the pros and cons and decided that it's not worth the risk, but I do think there are very wrongheaded cargo-cult ideas around this.

We tend to gloss over the fact that when dealing with OSS:

- Project maintainers can leave/change/stop supporting the software, and the idea that "since it's OSS, I can just pick up the slack" sounds better in one's head than it does in practice.

- Even large projects (think Angular) can move in odd and unexpected directions, which in practice amounts to the company "going out of business" for all intents and purposes.

- OSS 'support' is great, until it's not. Meaning, if there are plenty of SO questions and answers, it's great, until your specific issue isn't addressed, at which point you're either looking to a third party to pay for support, or you're rolling your sleeves up and digging into a codebase you have no experience with outside of an external API level understanding.

The dirty secret of large successful companies is that they actually pay for software, and the support of that software because they understand that time = money and vice versa.

After using datomic in production, I'd welcome my competitor to use Postgres. I'd even push them to. It's a monstrous competitive advantage for me.

There's more to paid db services than just support; look at neo4j, datastax, elastic, mongo and cockroachdb, among others.

There is a "narrow niche" strategy, where you have a cool thing and only sell it to a small group of well-paying customers.

There is an "open" strategy, where you offer the core part of your cool thing widely (cheaply / freemium model / for free), and profit from only e.g. the 10% of users that buy premium plans. You also have a fair chance to make your better way of doing things predominant, and to put it in the hands of a lot of people that won't pay you either way.

The point is that when your cool thing may have a really wide adoption, with the "open" model you may have e.g. 50x the market share of the "narrow niche" model, and thus 5x the profit. Increased mindshare also helps further adoption. This, of course, depends on the adoption rate and paying customer rate.

> I don't understand this line of thinking.

Haven't you noticed the popularity contest going on with computing?

If datomic isn't popular it will disappear.

Reminds me of Rebol. Technically as good as Lisp, but arguably better since it's batteries-included (e.g. a GUI). By the time the author made it open source it was far too late.

You might be interested in the Red language, which is an open source reboot of Rebol. It's actively developed, and has been for a while, although I think there are only a handful of people (one?) working on it.


$5k per year is almost free compared to the average compensation of a single coder.

The price is not really an issue. It could be $1 the problem would be the same.

It's essentially a black box now: you have no ability to audit it, no way to extend it, no guarantee you can pick up the pieces if Cognitect/Rich goes away. You basically sign a check and hope for the best.

I'm sure price is still the issue. I bet for the right price Cognitect would be happy to provide the source to a customer.

Note: I too wish Datomic was OSS, just disagree that price isn't the issue. It is.

Until they go out of business and you're fucked. That's when you wish you'd stuck with Postgres.

Being closed for free doesn't mean it isn't open source for those that pay.

This was a common model before FOSS took off and it is still quite common in enterprise products.

If company X goes out of business you get the latest source version you paid for.

Having said this, I don't know if Cognitect has that kind of contract.

Going off topic here, but we have an enterprise lead for our SaaS product who is asking the same. They want some sort of warranty that they can keep hosting the product themselves, should we go bust. Going open source (or open core) is currently not an option for us.

Does anyone have any advice on how to approach this? Example contracts, good experiences with escrow services maybe? Or bad experiences, of course.

I worked for a company which sold hosted services in the financial sector. Escrow was a standard part of contracts. However, it wasn't taken immensely seriously; we faithfully put all of our source code into escrow once a year (or perhaps more often, but not as often as we released), but not all of the build, deployment, and infrastructure tools and config you would actually need to turn that code into a replica of the application. We didn't make any effort to "restore from backup" to verify that what was in escrow was actually any use. Customers, and their auditors, seemed happy with that.

I do feel that in a really well-run operation, your build, deploy, and operation should be so automated that that's just another set of repositories to put in escrow. One way to look at it is that escrow is just another kind of disaster recovery; if you have safe copies of everything you would need to restore your operations if you lost all your current servers, then it's easy to put those in escrow too.

It's more meaningful for shrink-wrapped software. It's easy to archive and hand over the code repo, packages and documentation.

It's not really effective for salvaging a cloud-hosted product. A platform like that is not easily redeployable.

Thanks. Do you know what escrow service that was?

It's difficult to do for SaaS since you're not already packaging up the product.

Datomic has perpetual free licenses so if they go bankrupt, you just keep the software you already have.

If you had an on-prem setup or a hosted white label solution, then it'd be easier to license.

> Being closed for free doesn't mean it isn't open source for those that pay.

True, but...

> This was a common model before FOSS took off and it is still quite common in enterprise products.

No, it wasn't and it's not. “Shared source” for paying customers was common before Free/Open Source Software (FOSS) took off (“Free” doesn't modify “Open Source”, they are essentially synonyms that identify the same license features but which are preferred by different ideological factions that prefer those features for different reasons.) Shared source does include access to the source, and it may include some rights to create derivatives (usually with no or restricted distribution rights), but it is not open source.

Who cares about the religious meanings? What matters is having access to the code, whatever form it might be.

Also no one ever referred to it as "Shared Source" in the 80's.

Just to dig out one such product, here is the brochure of Turbo Pascal 5.5 from 1989:


" Turbo Pascal 5.5 Runtime Library Source

Modify the runtime library source code or use it as it is. You get the assembly language and Pascal source code to the System, Dos, Crt, Printer and Turbo3 units. It comes with a batch file to help with recompiling and building TURBO.TPL."

For me, for my work, that was open enough.

> Who cares about the religious meanings?

It's no religion; the terms have well-established meanings.

> What matters is having access to the code, whatever form it might be.

The legal permissions you have to make use of the code very much matter, too. And those vary considerably among non-Open Source, source available/shared source license arrangements.

Source code escrow is standard for this sort of vendor contract.

The same can happen with any system, and the fact that it is OSS won't help you much in practice. E.g. SQLite signs >10 year contracts with some of their commercial clients which guarantee at least some level of professional support. Many clients won't even touch your software, OSS or not, unless you can provide that kind of involvement.

Or until they get bought up and shut their doors. Now where have I seen that one... :-)

That's 2 programmer months where I live.

The licensing policy may be unfortunate.

The more important question is how many key patents are involved. That is, how feasible a comparable open-source project might be.

You can achieve similar goals to Datomic using Postgres:

    -- assumes: create extension if not exists "uuid-ossp";
    create table entity (
        id uuid primary key default uuid_generate_v4()
    );

    create table snapshot (
        id uuid primary key default uuid_generate_v4(),
        ts timestamp without time zone default now(),
        entity_id uuid references entity(id),
        value jsonb
    );

    -- entity points at its current snapshot; the column is added after
    -- both tables exist, to break the circular reference
    alter table entity
        add column snapshot_id uuid references snapshot(id);
Rather than mutating the value of an entity, you simply append a new record to the snapshot table and update the entity's reference to it. This gives you a full audit trail of the entity.
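A minimal sketch of that write path, assuming next.jdbc, a datasource `ds`, and the tables above (the function and key names are hypothetical):

    (require '[next.jdbc :as jdbc])

    ;; "update" = append a snapshot row, then repoint the entity at it,
    ;; all inside one transaction, preserving the audit trail
    (defn update-entity! [ds entity-id new-value-json]
      (jdbc/with-transaction [tx ds]
        (let [snap (jdbc/execute-one!
                    tx
                    ["insert into snapshot (entity_id, value)
                      values (?, ?::jsonb)
                      returning id"
                     entity-id new-value-json])]
          (jdbc/execute-one!
           tx
           ["update entity set snapshot_id = ? where id = ?"
            (:snapshot/id snap) entity-id]))))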

It is also an extremely trivial modification to PostgreSQL to just add that functionality: essentially you just need to disable the vacuum, reify the current transaction identifier, and allow queries to bypass MVCC (which is already storing exactly the kind of information people love to play up about Datomic: it is an append-only system that only is able to delete things due to garbage collection and the vacuum).

I did a proof-of-concept of this years ago in this linked comment (which is the bottom of a long thread which involved Rich Hickey; I remember it being an interesting thread).


At the time I did not provide a patch (which is weird for me) to show how "this is totally just leveraging stuff already there", but I just went and found that folder, so here is the patch:


Although this is definitely a part of the solution, I still see the coupling Rich mentions in his comment in the thread you linked to as an issue.

To expand: when you have to query a remote database for data, you likely only want to perform one query for performance reasons (and certainly not an unbounded number of queries; maybe two or three is acceptable, but not N+1). This means that information about what is needed in that single query (or few queries) must be passed down the call stack, creating coupling between unrelated layers of your application.

To make this more concrete: imagine you write a function to find users named Jim. At first this function is for reporting, so you just return a list of user IDs. Later, you decide to build a dashboard. You want to render all users named Jim here, but you need each of their names for display purposes.

Given a remote database, you now need to modify the query function to be able to return the specific attributes you need for this use case. You can imagine that if you extend the call stack, this passing gets more complicated, requires merging of the queried items, etc.

With datomic, since your data is in memory, your original query can stay the same, since N+1 queries are irrelevant when your data is available in RAM.
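A sketch of that with the peer API (attribute names assumed for illustration):

    (require '[datomic.api :as d])

    ;; returns a collection of entity ids
    (defn users-named [db name]
      (d/q '[:find [?e ...]
             :in $ ?name
             :where [?e :user/name ?name]]
           db name))

    ;; reporting: the ids are all we need
    (users-named db "Jim")

    ;; dashboard: the same function, untouched; the caller navigates
    ;; lazily from the ids, since the data is in the peer's memory
    (for [eid (users-named db "Jim")]
      (let [u (d/entity db eid)]
        {:name (:user/name u) :email (:user/email u)}))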

I think this point is important.

What's stopping you from storing a Postgres database (or particular tables) in memory if it fits? I believe that should be possible.

Also, I may not be getting what Datomic is, but why modify the original function? You can just join the results with the users table.

Also, to answer your second question: first off, there's no user table in the example, so it's a bit confusing. But let's compare against a traditional table-per-record-type approach and see why this approach still retains the coupling problem.

In that scenario you could join the user table. You'd be overfetching in many scenarios, but that's not a huge concern for most people; this is what Active Record does. However, say you want to get more than just the columns on the user table: then you run into the same issue. Suddenly the query caller needs to inform the query method to include results from some unrelated table. Because Datomic gives you an in-memory graph structure, the caller can grab that extra information without modifying the method or its call signature, obviating the need for this coupling.

The machinery for keeping peers up to date, for one. Datomic distributes all writes in real time to the peers. You could probably recreate this with NOTIFY/LISTEN in Postgres, although I'm not familiar enough with the details to know whether it would fully work.

Please please please try and get this into Postgres!

AIUI, this is more or less how Oracle's flashback queries work:


I'm a n00b and postgres vacuum was news to me. Can I ask, in a postgres cluster with high traffic, how do you manage the nodes? Do you take them offline before starting a vacuum process?

Conceptually that is fine, but there are complications (speaking from recent and ongoing experience on a project I'm working on):

1. When you have multiple entity tables that are inter-related, snapshotting one in isolation is not sufficient.

2. When only certain changes on certain fields should result in a new snapshot.

Granted, both of these are solvable with built-in Postgres mechanisms (and a bit of co-operation from the application server) but simple it is not.

In our case we also move historic records to a second 'history' table.

To add to your list: with 1., referential integrity becomes a problem. Basically you move it up the stack into the application code and lose the consistency guarantees an RDBMS can provide.

I'm curious whether Datomic supports referential integrity, and how.

Alternatively, you can automate this with triggers, where the client only needs to know about the entity table, but every update to an entity causes the previous value to be inserted into another table, entity_log. We do this and it works well for us.
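A rough sketch of that setup, issuing the DDL through next.jdbc (table, function, and trigger names are illustrative, not the poster's actual schema):

    (require '[next.jdbc :as jdbc])

    ;; assumes the `entity` table from upthread and a datasource `ds`
    (doseq [ddl ["create table entity_log (like entity)"

                 "create function log_entity() returns trigger as $$
                  begin
                    insert into entity_log select old.*;
                    return new;
                  end $$ language plpgsql"

                 "create trigger entity_audit
                    before update on entity
                    for each row execute procedure log_entity()"]]
      (jdbc/execute! ds [ddl]))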

See also the Temporal Tables extension to PostgreSQL https://github.com/arkhipov/temporal_tables

The whole point of storing history is the ability for clients to explicitly query it (and sometimes manipulate it), isn't it?

You can query a history table.

Not if "client only needs to know about the entity table"

So if you need to give your client access to the history, I guess you can put up an "entity_history" function that generates a snapshot entity table from their request and the log table.

It all depends on your needs. RDBMS schemas are custom-built to meet customers' needs; I don't believe in the proverbial Swiss Army knife tool anymore.

In my experience, we often ran into strange issues with Datomic: restoring new databases with the same name but a different logical id; excision is hard; ingesting a large amount of read-only data without impacting txs; the inability to have non-JVM clients; as well as the cost, the opaqueness, and the inability to hire. We're actually going the opposite direction and wrote a replication stream to Postgres using the tx log API.

It's definitely not without issues. However, I've found it very simple to work with once you are aware of all the operational gotchas. Needless to say, it's especially powerful in a 100% Clojure stack.

The features I like most are:

1) the transaction log - I've done something like that many times using Postgres and EventStore, but nothing beats the simplicity of just defining a few queries in code and having immediate updates on new transactions delivered to every peer.

2) idempotency - reasserting the same facts is a no-op (see the sketch after this list). Doing the same with a temporal table as suggested somewhere in this thread is not as trivial.

3) consistent db snapshots - once you get hold of the database value, all reads will only see the data as of the time the value was retrieved. This makes application code much easier to reason about, as the database can be treated as yet another immutable argument.

4) assembling a transaction value out of multiple pieces - same as the above, pure functions can all contribute to the final transaction value without having to mutate anything.

5) "free" caching on the peer - once you query something, it stays in the peer's memory. Subject to memory constraints, of course.
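A tiny sketch of points 1-3 (assumes the peer API, a connection `conn`, and some existing entity id `user-id`; attribute names are made up):

    (require '[datomic.api :as d])

    ;; point 1: a blocking queue of transaction reports on the peer
    (def txq (d/tx-report-queue conn))

    ;; point 2: re-asserting an existing fact adds no new user datoms
    @(d/transact conn [[:db/add user-id :user/name "Jim"]])
    @(d/transact conn [[:db/add user-id :user/name "Jim"]]) ; effectively a no-op

    ;; point 3: `db` is an immutable snapshot; both reads below see the
    ;; same basis, no matter what is transacted in the meantime
    (let [db (d/db conn)]
      [(d/q '[:find (count ?e) . :where [?e :user/name]] db)
       (d/q '[:find (count ?e) . :where [?e :user/name]] db)])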

You can definitely build something as nice as Datomic on top of Postgres, but it will take weeks to get all the details right.

This doesn't really address anything I mentioned.

The database is fine, and it was nice to work with when it was just 2 engineers. For us, it didn't scale with the number of engineers, or with business needs like BI and ETLs of large amounts of healthcare data into Datomic, and then needing to delete said data. We spent a lot of time reinventing the wheel around schema management and building a declarative query interface as well.

> few queries

> take weeks


Hey, thanks for the comment.

To clarify - I wasn't trying to address the issues you've stumbled into, but rather list all the points I like about it.

This doesn't surprise me. It just plain doesn't have the eyeballs nor the deployment scale to shake all of these issues out. Postgres has both.

This is a good tutorial, but there are a few misconceptions / mistakes:

    > Avoid nils
    > Datomic does not support nil values for attributes. When you do not have a value for an attribute, you should either skip it or pass an empty value: a zero, an empty string, etc. That’s why the most of expressions have (or "") at the end of threading macro.
Datomic doesn't support `nil` on purpose: since it's a fact database, the correct approach is to not assert the attribute (iow, just omit it). Since the schema is a property of the database (not the entity), this makes your database forward-compatible to any new fact that you may need to store without having to be explicit about the past.
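For example (hypothetical attributes; assumes a connection `conn` and entity id `user-id`):

    ;; instead of transacting {:user/name "Jim" :user/middle-name nil},
    ;; just don't assert the attribute you have no value for:
    @(d/transact conn [{:db/id user-id :user/name "Jim"}])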

You can also use `(missing? ?entity :attribute)` on queries, which should do a faster lookup on the EAVT index vs. checking for sentinel values.
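For example (attribute names hypothetical):

    ;; entities that have a name but no asserted email
    (d/q '[:find ?e
           :where
           [?e :user/name]
           [(missing? $ ?e :user/email)]]
         db)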

    > JSON data
    > In Datomic, there is no JSON type for attributes. I’m not sure I made a proper decision, but I just put those JSON data into a text attribute. Sure, where is no a way to access separate fields in a datalog query or apply roles to them. But at least I can restore the data then selecting a single entity:
You can flatten a JSON object into a namespaced map. At this point you get what is essentially an entity w/ well named attributes that you can transact and query against. Since the schema is flexible and doesn't require migrations, in theory you can support even arbitrary objects that you don't know the schema in advance by inspecting the deserialized object.
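A sketch of that flattening (key names illustrative; nested objects become dotted namespaces):

    (defn flatten-json
      "Turn a nested map into a flat map of namespaced attributes."
      [prefix m]
      (reduce-kv
       (fn [acc k v]
         (if (map? v)
           (merge acc (flatten-json (str prefix "." (name k)) v))
           (assoc acc (keyword prefix (name k)) v)))
       {}
       m))

    (flatten-json "doc" {:address {:city "Oslo"} :age 42})
    ;; => {:doc.address/city "Oslo", :doc/age 42}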


Datomic is a beast different enough to require some learning on what is optimal/idiomatic in terms of data modelling, because the paradigm shifts from appending/mutating tuples over multiple tables to asserting/retracting facts on what is essentially "one big table" (entity, attribute, value, transaction).

Very nice article, but in practice it's almost never useful to switch a production system from Postgres to Datomic, or to rewrite it from Python to Clojure. But since it's a pet project, it's a nice way to learn new technology.

Interestingly, once you are on Datomic you have a little more freedom of choice in the underlying storage. Since Datomic uses Postgres/DynamoDB/Cassandra as just a KV store, it becomes trivial to switch the underlying storage using the backup and restore feature. Nice if you decide to shop around or like a particular feature of a storage offering.

For anyone interested in exploring the datomic model, there is a great ClojureScript in-memory implementation called datascript (https://github.com/tonsky/datascript) by Nikita Prokopov.

Does Event Sourcing make Datomic a moot point? Sure it can be your event log, but it seems like overkill.

I imagine Datomic's query language is significantly more powerful.

How does Datomic compare to something like https://github.com/ApplauseOSS/djangoevents (django-eventsourced)?
