Hacker News
Datomic: Look at all the things I'm not doing (augustl.com)
138 points by icey on April 21, 2018 | 43 comments

One thing though: you still have to do data modeling. No matter how much you pretend that you don't have to do it, you still need relationships, and unique references, getting and querying ranges. Hell, even data access restrictions.

Yep. The NoSQL crowd constantly pushes the concept of no data modeling or need for a data administrator. These two claims have been proven false time and time again, often with disastrous results, yet they keep being repeated.

But quite often your first two or five models will be wrong. Usually the exact problem is not clear early on...

Delaying data architecture decisions is usually better than attempting to plan for them up front.

I wholeheartedly disagree. Data doesn't care about your application.

I guess we all have our personal experiences to go by. My 24 years have taught me that perhaps 25% of the real need is known up front. And usually there is some pattern that one's problem seems to fit into. Then run with it until more details appear. Then adapt or add...

On the other hand, I have worked with people who like to try to plan for all possibilities. That usually results in enormous data models and extra layers of abstraction that never prove their value.

On the other hand, I think that's just a misconception: the idea that because you went through the trouble of writing a schema, it's somehow worse when you're wrong.

But people don't often look at it from the opposite perspective: that when you are explicit about the data in your system, then you can respond explicitly and immediately and actionably when it turns out you're wrong. The alternative is to not know when or how you're wrong.

For example, not having to lift a finger when you're wrong about the data is like suggesting your codebase might be more unintentionally useful after removing static analysis.

It's been my experience that people who design their database around an app, or people that just dump data to fix for later, tend to have much greater issues in the near and far future.

Of course things can change, but most tables have fairly immutable designs.

For those that don't know, Datomic is a hosted database, similar to how Clojure is hosted.

Datomic can be hosted on MySQL or PostgreSQL (and maybe others?). It's basically a 2 column table, so yes, Datomic inherits transaction safety. According to people I know that use Datomic, an early lesson is indexing, which apparently is often done way later than it should be. Datomic also inherits the speed of indexing a huge table.

A counter is that many things may be covered in magic, but all on top of a fairly leaky abstraction.

An interesting point to add is that Datomic can also write to Riak. But since Riak lacks an atomic CAS, ZooKeeper is used alongside Riak.

What do you refer to when you say that Datomic inherits the speed of indexing a huge table? The query engine works by fetching blocks of data from storage and placing the blocks in a lazy immutable tree structure that makes up the indices of Datomic. They are also covering (the indices are where the actual data is stored). So it's not like the query engine runs queries on the huge table stored in MySQL or Postgres; the storage is used more like a key/value store. Case in point: you can put memcached between the Datomic client (peer) and any of the supported storage engines. So Datomic piggybacks on very little other than the actual capability to safely store data.

I don't know the full details, but I'm led to understand that indexing a Datomic database too late is time-consuming and will end up causing considerable downtime.

I'll admit that I'm guessing about how much of this depends on the database and Datomic itself, so I shouldn't be making a sweeping statement.

Ah, that makes sense. I've never added an index myself, but essentially Datomic has to rewrite the main index tree. In other words, all of the stored data, since there is no data stored outside of the covering indices. So I wouldn't be surprised if that causes some downtime if you have lots of data in your database.

I think "late indexing" may be referring to an implementation detail. The indexes aren't updated after each transaction. When you query the db, the result is a merge between the index and the transactions that haven't been indexed yet. The index is updated in batches.
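To make the "merge at query time" idea concrete, here is a minimal Python sketch. It is not Datomic's implementation; the class name `BatchedIndex`, the `batch_size` threshold, and the key/value shape are all made up for illustration. Writes only touch a small pending tail, reads consult the tail first and then the sorted index, and the index is rebuilt in batches.

```python
import bisect


class BatchedIndex:
    """Toy store: a sorted index rebuilt in batches plus a small unindexed tail."""

    def __init__(self, batch_size=3):
        self.index = []        # sorted list of (key, value), updated only in batches
        self.pending = []      # recent writes not yet merged into the index
        self.batch_size = batch_size  # hypothetical threshold, for illustration

    def transact(self, key, value):
        # Writes only append to the pending tail; the index is left untouched.
        self.pending.append((key, value))
        if len(self.pending) >= self.batch_size:
            self.reindex()

    def reindex(self):
        # Batch step: fold the pending writes into the sorted index.
        merged = dict(self.index)
        merged.update(self.pending)   # later writes win
        self.index = sorted(merged.items())
        self.pending = []

    def get(self, key):
        # Query = unindexed tail merged over the index (newest write wins).
        for k, v in reversed(self.pending):
            if k == key:
                return v
        i = bisect.bisect_left(self.index, (key,))
        if i < len(self.index) and self.index[i][0] == key:
            return self.index[i][1]
        return None
```

With `batch_size=3`, the first two writes are readable immediately even though `index` is still empty; the third write triggers the batch rebuild.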

Datomic Cloud indexes automatically and persists to DynamoDB; indexing happens in a background process

Nice. Too bad it's not Free Software.

I think this is an important point, and I know it is often derided as people asking a company to give something away for free, so I want to defend it.

I admire Cognitect's ambition in going up against big database vendors as a small team, and creating something unique. That said, as a user, it's frustrating that when I get an error I can't just pop open the source and see what's going on. It's frustrating that the console could use a few usability improvements that an interested OSS developer could add, but that don't happen. It's frustrating that the Client wire protocol hasn't been released yet, so non-JVM language clients (which are provided by the community and do not have official Cognitect support) are stuck using the deprecated and unsupported REST interface.

I am not ideologically against proprietary software, I just think Datomic would be orders of magnitude more useful with an active OSS ecosystem around it.

I am regularly alarmed by the fact that nobody has tried to make an open source clone yet..

Well, why would they? Clones would be using relatively old technology. The current bleeding-edge research here is on more-structured, more-formal knowledge representation, stuff like FQL/AQL [0] which is open-source from the first day. Maybe Datomic's not sufficiently better than Pg or SQLite to motivate anybody.

[0] http://categoricaldata.net/fql.html

That looks very interesting! Do you have a recommended point of entry for a non-academic practitioner like myself? :)

Mentat is built for embedding, not for large applications. Datascript is in-memory for browsers.

But you're right, it's not fair to say that there's not been any effort! :)

Work is being done to transition datascript into a much more database-like thing:

"datahike is a durable database with an efficient datalog query engine. This project is a port of datascript to the hitchhiker-tree."


Interesting, keep 'em coming!

Historical information is not part of datahike now, and it's not on the roadmap as far as I can tell. But perhaps one day :)

Do they respond to users' comments too?

Rich Hickey has responded to this many times. They made Clojure free and open source and have put lots of their own $$ into Clojure. If they want to make a product that costs money, that is their choice. They also need to keep the lights on. Now if they won't let users who pay for the product see source code under an NDA, that is indeed a problem in my mind, but it is pretty much just standard business practice. I have no clue if they're open in that area or not.

New DB with similar time snapshots and immutability ... https://flur.ee ... haven't fully investigated.

Whitepaper at https://flur.ee/assets/pdf/flureedb_whitepaper_v1.pdf .

I couldn't follow his arguments against event sourcing. Could someone elaborate on the differences?

Apparently Datomic creates indexes for you[1] so you don't have to write views on top of the events like you would if you just inserted events into a table in Postgres.

So for example if you want to count the number of page views for a particular user, you'd simply use EAVT Index for that particular userId.


Yup. You don't create your own views, you get a generic query engine and generic indices instead. On top of all the other benefits, like cheap as-of time travel.
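As a rough illustration of why a generic EAVT-ordered index answers the page-view question without a hand-written view, here is a small Python sketch. It only mimics the sort order; the tuples, attribute names, and the `scan_ea` helper are invented for the example and are not Datomic's API (in Datomic you would normally write a Datalog query).

```python
import bisect

# Datoms as (entity, attribute, value, tx) tuples, kept sorted in E-A-V-T order.
# All of the data below is made up for illustration.
datoms = sorted([
    ("user-1", "page/view", "/home", 100),
    ("user-1", "page/view", "/pricing", 101),
    ("user-1", "user/name", "Ada", 99),
    ("user-2", "page/view", "/home", 102),
])


def scan_ea(index, entity, attribute):
    """Return all datoms for one entity and attribute via a sorted range scan."""
    # A 2-tuple prefix sorts before every 4-tuple that starts with it,
    # so bisect lands on the first matching datom.
    lo = bisect.bisect_left(index, (entity, attribute))
    hi = lo
    while hi < len(index) and index[hi][:2] == (entity, attribute):
        hi += 1
    return index[lo:hi]


views = scan_ea(datoms, "user-1", "page/view")
print(len(views))
```

Because the index is sorted by entity first, all facts about one user sit next to each other, and counting is a contiguous range scan.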

This will be a tongue in cheek comment, but there's another thing Datomic isn't making you do either:

GDPR compliance

OP here :)

True, in that there's no solution out of the box. I kind of want to make a "look at all the things I have to do", and GDPR compliance is one of those things where there are probably many ways to do it wrong with Datomic, and a few ways of doing it right.

One thing you can do, is to have a separate database for each person/user. It is trivial to join across multiple databases in Datomic, and you can even do crazy things like joining between a database and an excel spreadsheet. And deleting an entire database is easy. So there's that.

There's also excision, but that's a super expensive operation that you shouldn't be doing as part of your day to day routines, according to the Datomic team. I'd like to know more here. For example, is it OK to excise data once a week? Maybe it's GDPR compliant if deletion requests are batched like that.

There's also crypto shredding (encrypt values with separate key for each user, throw away the key on deletion request), but I'm not sure how GDPR compliant that is, since it leaves a lot of metadata behind. And you obviously can't encrypt values that you want to query on with the query engine.
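For anyone who hasn't seen crypto shredding, a toy Python sketch of the idea follows. Big caveat: the "cipher" here is a hand-rolled SHA-256 keystream used only so the example runs on the standard library; it is a stand-in, not something to use for real data, and `ShreddableStore` and its methods are hypothetical names. The point is the shape: an append-only log of ciphertexts, a separate deletable key table, and the metadata that stays behind.

```python
import hashlib
import secrets


def _keystream(key, nonce, length):
    # Toy keystream: SHA-256 over key + nonce + counter. NOT a vetted cipher;
    # a real system would use an authenticated cipher from a crypto library.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def _xor(data, key, nonce):
    return bytes(a ^ b for a, b in zip(data, _keystream(key, nonce, len(data))))


class ShreddableStore:
    def __init__(self):
        self.keys = {}      # user id -> secret key (mutable, deletable)
        self.records = []   # append-only log of (user id, nonce, ciphertext)

    def put(self, user, plaintext):
        key = self.keys.setdefault(user, secrets.token_bytes(32))
        nonce = secrets.token_bytes(16)
        self.records.append((user, nonce, _xor(plaintext.encode("utf-8"), key, nonce)))

    def read(self, user):
        key = self.keys.get(user)
        if key is None:
            return None  # key shredded: ciphertext is still stored but unreadable
        return [_xor(c, key, n).decode("utf-8")
                for u, n, c in self.records if u == user]

    def forget(self, user):
        # Deletion request: drop the key, leave the immutable log untouched.
        self.keys.pop(user, None)
```

Note that after `forget`, the log still shows which user id wrote how many records and how long each value was, which is exactly the leftover-metadata concern.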

IANAL, but I'm pretty sure it is acceptable to delete data within 30 days of receiving a request to do so.

Forgot to add. The built-in excision API is similar to crypto shredding, in that all the metadata is retained, it's just the values themselves (in Datomic's EAVT structure) that are removed. So a lot of metadata is still retained.

For what it's worth, FaunaDB is also temporal and supports retention periods, TTL rows, and locality control, specifically for GDPR.

Document-relational model instead of Datalog.

GDPR compliance is as much about choosing not to retrieve data after a certain point as it is about not possessing that data.

It's a bit of an existential point, but if your logic allows the exclusion of data before a certain point in time, is that not effectively the same as not possessing it?

Well, they do support extinctions?

Yes. They call this excision[0] and it leaves an auditable hole in the timeline, which lets you attach metadata like why there is missing data while still removing the data.

[0] https://docs.datomic.com/on-prem/excision.html

That is certainly something an add-on or external tool could provide.

Immutable is only immutable if you don't have write access to the lowest layer of storage.

It's difficult to implement this by hand, though. What you'll have in storage are encoded blobs of chunks of some kind of sorted set tree structure. So if you are to poke the data directly like that, I would assume it is much easier to just do an offline backup and a restore, and filter the data in the restore process somehow, which I believe is possible.

Only way to experience this magic without paying $$ is to watch youtube videos? :\

I think you can adapt the concept of immutable data structures to any situation.

For key-value store an immutable hash array mapped trie is a good place to start.

I.e. each update creates a new "head" node and duplicates only those parts which are required to maintain unique versions of the tree. The hash array mapped trie is excellent for this purpose since each node has many children, and thus a unique version does not need to duplicate too many nodes.

Then, just store all the head nodes in a linear order in e.g. a list, and you can travel back in time by navigating the head nodes backwards.
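Here is a rough Python sketch of that scheme. To keep it short it uses a plain fixed-width hash trie with path copying rather than a real HAMT (no bitmap compression, and every key gets a full-depth path), so treat it as a simplification. The names `assoc`, `lookup`, `commit`, and `heads` are just for this example.

```python
BITS = 4                 # 16-way branching per level (real HAMTs commonly use 32)
WIDTH = 1 << BITS
MAX_DEPTH = 8            # 8 levels x 4 bits = 32 hash bits, then a bucket
EMPTY = (None,) * WIDTH


def assoc(node, key, value, depth=0, h=None):
    """Return a new root with key set; only nodes on the path are copied."""
    if h is None:
        h = hash(key) & 0xFFFFFFFF
    if depth == MAX_DEPTH:
        # Bottom of the trie: a small dict for keys sharing all 32 hash bits.
        bucket = dict(node or {})
        bucket[key] = value
        return bucket
    if node is None:
        node = EMPTY
    slot = (h >> (BITS * depth)) & (WIDTH - 1)
    child = assoc(node[slot], key, value, depth + 1, h)
    # Tuples are immutable, so this builds a new node and shares the
    # other 15 children with the previous version.
    return node[:slot] + (child,) + node[slot + 1:]


def lookup(node, key, default=None):
    h = hash(key) & 0xFFFFFFFF
    for depth in range(MAX_DEPTH):
        if node is None:
            return default
        node = node[(h >> (BITS * depth)) & (WIDTH - 1)]
    return default if node is None else node.get(key, default)


heads = [None]           # heads[0] is the empty map; one new head per update


def commit(key, value):
    heads.append(assoc(heads[-1], key, value))


commit("a", 1)
commit("b", 2)
commit("a", 3)
print(lookup(heads[-1], "a"), lookup(heads[1], "a"))
```

Time travel is then just indexing into `heads`: `heads[1]` still sees the first value of `"a"` and no `"b"` at all, while `heads[-1]` sees the latest state.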




We've been using the free version in production for many years now :)

You can try a mostly Datomic-compatible Datascript: https://github.com/tonsky/datascript/

There appears to be a no-cost starter edition: https://www.datomic.com/get-datomic.html
