Fixing Bad Data in Datomic

solutionyogi · on Aug 19, 2014

That was a fantastic read. I can't count the number of hours I have wasted trying to figure out a bug (or trace a problem) in a system because someone decided to update the data silently. I think this is the kind of database that should be standard in the finance industry. I have been working for financial companies in NYC for the last 8 years and I have yet to see anyone use this product. Has anyone on HN used this on a live system?

bobbycalderwood · on Aug 19, 2014

We do have a number of financial industry customers (e.g. http://www.datomic.com/pellucids-story.html), and they like it for the reasons you mention (history/audit-ability), plus a few others:

  * first-class/annotatable transactions (which Stu mentions in the blog post)
  * read scalability
  * local-process query facilitates analytics queries on transactional data
    without need for ETL
  * pluggable back-end storage means they can run it on top of what they already have 
    and easily move to something new/better later

chimeracoder · on Aug 19, 2014

I heard Rich Hickey speak about a year ago about Datomic.

Those are exactly the sorts of clients that Datomic aims to sell to (and their pricing model reflects that). Datomic is still relatively new, though, so knowing how quickly (read: slowly) financial institutions update their infrastructure, I'd be (pleasantly) surprised if it were widely deployed in even a single financial institution yet.

I really liked Datomic from what I could see - at the time we started building our stack at my startup it wasn't mature enough (library support, etc.) to be worth it, but it could make our lives a lot easier.

danneu · on Aug 19, 2014

I ported my 7-year-old vBulletin community to a forum I wrote from scratch with Clojure + Datomic earlier this year.

Datomic was a nice choice since it makes it trivial to expose a post's edit history. It also makes it easy to query for all the actions a moderator is responsible for. And if a moderator goes rogue and deletes everything, it's easy to revert.
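To make the edit-history and revert idea concrete, here is a minimal plain-Python sketch of an append-only fact log. It is only a model of the concept, not Datomic's API or storage format: `Fact`, `FactLog`, `assert_value`, `value_as_of`, and `history` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass(frozen=True)
class Fact:
    entity: str      # e.g. "post-1"
    attribute: str   # e.g. "post/text"
    value: Any
    tx: int          # monotonically increasing transaction number
    added: bool      # True = assertion, False = retraction


class FactLog:
    """Hypothetical append-only log: facts are appended, never updated in place."""

    def __init__(self) -> None:
        self._facts: List[Fact] = []
        self._next_tx = 1

    def assert_value(self, entity: str, attribute: str, value: Any) -> int:
        """Record a new value, retracting the current one (if any) in the same transaction."""
        tx = self._next_tx
        self._next_tx += 1
        current = self.value_as_of(entity, attribute, tx)
        if current is not None:
            self._facts.append(Fact(entity, attribute, current, tx, False))
        self._facts.append(Fact(entity, attribute, value, tx, True))
        return tx

    def value_as_of(self, entity: str, attribute: str, tx: int) -> Optional[Any]:
        """Replay the log up to and including transaction `tx`."""
        value = None
        for f in self._facts:
            if f.tx > tx or (f.entity, f.attribute) != (entity, attribute):
                continue
            value = f.value if f.added else None
        return value

    def history(self, entity: str, attribute: str) -> List[Tuple[int, Any]]:
        """Every value ever asserted for this attribute, oldest first."""
        return [(f.tx, f.value) for f in self._facts
                if f.added and (f.entity, f.attribute) == (entity, attribute)]


log = FactLog()
t1 = log.assert_value("post-1", "post/text", "first draft")
t2 = log.assert_value("post-1", "post/text", "edited by a moderator")

# "Reverting" is just another assertion that restores an earlier value;
# the unwanted edit stays in the history.
t3 = log.assert_value("post-1", "post/text",
                      log.value_as_of("post-1", "post/text", t1))
print(log.history("post-1", "post/text"))
```

Because nothing is overwritten, the edit history is a read over the log, and a revert adds a fact instead of destroying one.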

solutionyogi · on Aug 19, 2014

Thanks for sharing your experience.

How much effort was it? Did you run into bugs? How is the performance for a public-facing website? How hard was it to adapt to this new mindset of an immutable database?

Ever since I started using Git, I have been wondering why we don't have a database product which works like Git. When I read through this fantastic page on their rationale (http://www.datomic.com/rationale.html), I was furiously nodding my head in agreement. Their approach makes so much sense in this day and age where hardware is cheap.

danneu · on Aug 19, 2014

Most of the effort involved just getting a workflow down, fleshing out my core `db.clj` abstractions, iterating until I understood What Not To Do, and getting a feel for Datomic's idiosyncrasies. I'd had a modicum of Datomic experience from using it to store the blockchain in my attempt at implementing a little bit of bitcoin in Clojure (https://github.com/danneu/chaingun). Here's my first attempt at writing a forum with Clojure + Datomic (https://github.com/danneu/clj-forum).

Like most of my GitHub projects, I now cringe at the two projects above even though the last commits were less than a year ago. But they worked and show how I was at least able to get started. I was also learning Clojure.

It's been half a year since I last touched Datomic. My forum is purring along without issue. Datomic's datalog queries are one of the best parts about Datomic. But it's also hard for me to grasp the performance implications of what I'm doing. I should've written a blog post back when the idiosyncrasies were fresh in my head.

My forum is deployed to Linode, and it hilariously shares a box with the Datomic transactor. My forum boots very slowly though as its Datomic peer loads data into memory. I never really dug into that. And I'm not sure how I would deploy my forum to Heroku since I'm trying to move my production projects there (I need to automate). I wanted to check out Datomic's REST api, but then I got a job.

berns · on Aug 19, 2014

This may interest you: https://github.com/mirage/irmin

sgrove · on Aug 20, 2014

I would love to see a comparison of the ideas behind Datomic and Irmin - they actually seem quite different, but a blog post comparing the two could really help clear things up and frame the tasks they're both suited for.

acrispino · on Aug 19, 2014

How are you implementing the pagination of forums/threads? I dabbled a little bit with datomic last year and from what I remember, pagination of result sets was not built into the api.

danneu · on Aug 19, 2014

Yeah, Datomic returns unordered data. I use a transaction function to bump the :post/idx on post insertion. On retrieval, I naively sort by :post/idx and use `drop` (offset) and `take` (limit).
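The sort-then-`drop`/`take` approach can be sketched in plain Python as follows. This is only an illustration of the pagination logic, assuming each post is a dict carrying a `post/idx` key; the function name `page_of_posts` and its parameters are hypothetical, and nothing here calls Datomic.

```python
from typing import Dict, List


def page_of_posts(posts: List[Dict], page: int, per_page: int) -> List[Dict]:
    """Return one page of posts ordered by 'post/idx'. `page` is 1-based."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    # The query result is unordered, so sort it first.
    ordered = sorted(posts, key=lambda p: p["post/idx"])
    offset = (page - 1) * per_page              # equivalent of `drop`
    return ordered[offset:offset + per_page]    # equivalent of `take`


# Posts as they might come back from an unordered query.
posts = [{"post/idx": i, "post/text": f"post {i}"} for i in (3, 1, 5, 2, 4)]
print([p["post/idx"] for p in page_of_posts(posts, page=2, per_page=2)])  # [3, 4]
```

Sorting the whole result set on every request is the "naive" part: the cost grows with the size of the thread, not with the size of the page.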

When I asked in #datomic on Freenode a while back, support for ordered data was on the roadmap.

dgrnbrg · on Aug 20, 2014

You can watch a talk I gave about how we use Datomic at Two Sigma to manage our cluster's state: https://www.youtube.com/watch?v=YHctJMUG8bI

It's been a great experience using it, and I'd be happy to answer any questions.

sargun · on Aug 19, 2014

I really like the fact that facts are immutable in the Datomic model. It actually makes it easier to reason about eventual consistency, because you can rely on causal consistency between facts. Eventually consistent systems can give you durability, isolation, and atomicity, but they're typically hard to reason about, because the interface given to the user is some sort of multi-value, vector-clock-tracked thing.

Datomic can manage a commit protocol under the hood, and deal with causality tracking without requiring the developer to "get dirty." Its query language is declarative, and doesn't provide the same implicit guarantees that SQL queries need.

mdavidn · on Aug 19, 2014

Datomic is not an eventually consistent system. All writes pass through a single "transactor," which serializes transactions and provides the same ACID guarantees as a traditional SQL database.

sargun · on Aug 19, 2014

But the data storage can be in an AP system, allowing better stories around data availability.

dantiberian · on Aug 20, 2014

The Datomic docs at http://docs.datomic.com/acid.html describe its ACID properties quite well.

> Another way to understand this is to consider the failure mode introduced by an eventually consistent node that is not up-to-date yet. Datomic will always see a correct log pointer, which was placed via conditional put. If some of the tree nodes are not yet visible underneath that pointer, Datomic is consistent but partially unavailable, and will become fully available when eventually happens.
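For readers unfamiliar with the term, a "conditional put" is a compare-and-set write: the new value is stored only if the current value matches what the writer expected. The sketch below is a plain-Python illustration of that primitive, not Datomic's storage protocol; `ConditionalStore`, `put_if`, the key `"log-pointer"`, and the root names are all hypothetical.

```python
import threading
from typing import Dict, Optional


class ConditionalStore:
    """Hypothetical in-memory key-value store offering a compare-and-set put."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> Optional[str]:
        with self._lock:
            return self._data.get(key)

    def put_if(self, key: str, expected: Optional[str], new: str) -> bool:
        """Store `new` only if the current value equals `expected`; report success."""
        with self._lock:
            if self._data.get(key) != expected:
                return False    # someone else moved the pointer first
            self._data[key] = new
            return True


store = ConditionalStore()
print(store.put_if("log-pointer", None, "root-1"))      # True: first write wins
print(store.put_if("log-pointer", None, "root-2"))      # False: stale expectation
print(store.put_if("log-pointer", "root-1", "root-2"))  # True: expectation matches
```

A writer holding a stale view cannot overwrite the pointer, so every reader sees a pointer that was valid when it was written, even if the data it points to is not yet fully visible.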

ds_ · on Aug 19, 2014

Are there any promising open source alternatives with similar philosophies to Datomic?