Delaying data architecture decisions is usually better than attempting to plan for them up front.
On the other hand, I have worked with people who try to plan for every possibility. That usually results in enormous data models and extra layers of abstraction that never prove their value.
But people don't often look at it from the opposite perspective: when you are explicit about the data in your system, you can respond just as explicitly, immediately, and actionably when it turns out you're wrong. The alternative is not knowing when or how you're wrong.
For example, claiming you never have to lift a finger when you're wrong about the data is like suggesting your codebase might somehow become more useful after removing static analysis.
Of course things can change, but most tables have fairly immutable designs.
Datomic can be hosted on MySQL or PostgreSQL (and maybe others?). It's basically a two-column table, so yes, Datomic inherits transaction safety. According to people I know who use Datomic, an early lesson is indexing, which apparently is often done much later than it should be. Datomic also inherits the speed of indexing a huge table.
A counterpoint is that many things may be covered by magic, but it's all on top of a fairly leaky abstraction.
What do you mean when you say that Datomic inherits the speed of indexing a huge table? The query engine works by fetching blocks of data from storage and placing them in lazy immutable tree structures, which are the indices of Datomic. The indices are also covering (the actual data is stored in the index itself). So it's not as if the query engine runs queries against a huge table stored in MySQL or Postgres; the underlying database is used more like a key/value store. Case in point: you can put memcached between the Datomic client (peer) and any of the supported storage engines. So Datomic piggybacks on very little other than the actual capability to safely store data.
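For what it's worth, wiring in that cache layer is pure configuration on the peer side. A minimal sketch: datomic.memcachedServers is the system property Datomic's docs describe for this, usually passed as a JVM flag; the hosts below are placeholders.

    ;; Equivalent to passing the JVM flag
    ;;   -Ddatomic.memcachedServers=m1.example.com:11211,m2.example.com:11211
    ;; which must be in effect before the peer connects to storage.
    (System/setProperty "datomic.memcachedServers"
                        "m1.example.com:11211,m2.example.com:11211")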
I'll admit that I'm guessing about how much of this depends on the database and Datomic itself, so I shouldn't be making a sweeping statement.
I admire Cognitect's ambition in going up against big database vendors as a small team, and in creating something genuinely unique. That said, as a user, it's frustrating that when I get an error I can't just pop open the source and see what's going on. It's frustrating that the console could use a few usability improvements that an interested OSS developer could add, but that never happen. It's frustrating that the Client wire protocol hasn't been released yet, so support for non-JVM languages (which is provided by the community, without official Cognitect support) is stuck on the deprecated and unsupported REST interface.
I am not ideologically against proprietary software, I just think Datomic would be orders of magnitude more useful with an active OSS ecosystem around it.
But you're right, it's not fair to say that there's not been any effort! :)
"datahike is a durable database with an efficient datalog query engine. This project is a port of datascript to the hitchhiker-tree."
Historical information is not part of datahike right now, and it's not on the roadmap as far as I can tell. But perhaps one day :)
Whitepaper at https://flur.ee/assets/pdf/flureedb_whitepaper_v1.pdf .
So for example, if you want to count the number of page views for a particular user, you'd simply use the EAVT index for that particular userId.
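A minimal sketch with the peer API, assuming a hypothetical cardinality-many attribute :user/page-view and a resolvable user-id. d/datoms seeks directly into the covering EAVT index for that entity, so nothing resembling a table scan happens:

    (require '[datomic.api :as d])

    (defn page-view-count [db user-id]
      ;; Walk only the EAVT segment for this entity/attribute pair.
      (count (seq (d/datoms db :eavt user-id :user/page-view))))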
True, in that there's no solution out of the box. I kind of want to make a "look at all the things I have to do" list, and GDPR compliance is one of those things where there are probably many ways to do it wrong with Datomic, and a few ways of doing it right.
One thing you can do is to have a separate database for each person/user. It is trivial to join across multiple databases in Datomic, and you can even do crazy things like joining between a database and an Excel spreadsheet. And deleting an entire database is easy. So there's that.
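To make that concrete, here's a sketch of a cross-database join with the peer API; users-db, orders-db, the attributes, and the email are all made up for illustration:

    (require '[datomic.api :as d])

    ;; Extra :in sources ($users, $orders) are just more database values;
    ;; each :where clause names the source it runs against.
    (d/q '[:find ?order-id
           :in $users $orders ?email
           :where [$users ?u :user/email ?email]
                  [$users ?u :user/id ?uid]
                  [$orders ?o :order/user-id ?uid]
                  [$orders ?o :order/id ?order-id]]
         users-db orders-db "ada@example.com")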
There's also excision, but according to the Datomic team that's a super expensive operation that you shouldn't be doing as part of your day-to-day routine. I'd like to know more here. For example, is it OK to excise data once a week? Maybe it's GDPR compliant if deletion requests are batched like that.
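For reference, excision itself is just a transaction against the built-in :db/excise attribute; conn and user-eid here are hypothetical:

    (require '[datomic.api :as d])

    ;; Removes every datom about this entity from history; expensive,
    ;; so batch these rather than running them ad hoc.
    @(d/transact conn [{:db/excise user-eid}])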
There's also crypto shredding (encrypt each user's values with a separate key, throw away the key on a deletion request), but I'm not sure how GDPR compliant that is, since it leaves a lot of metadata behind. And you obviously can't encrypt values that you want to query on with the query engine.
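A minimal sketch of the key-per-user idea using plain javax.crypto; the key store is just an atom here, and a real system would use an authenticated cipher mode with IVs rather than bare "AES":

    (import '(javax.crypto Cipher KeyGenerator))

    ;; One AES key per user, kept in a separate store (an atom here).
    (def user-keys (atom {}))

    (defn key-for [user-id]
      (or (@user-keys user-id)
          (let [k (.generateKey (doto (KeyGenerator/getInstance "AES")
                                  (.init 256)))]
            (swap! user-keys assoc user-id k)
            k)))

    (defn encrypt [user-id ^bytes plaintext]
      (-> (doto (Cipher/getInstance "AES")
            (.init Cipher/ENCRYPT_MODE (key-for user-id)))
          (.doFinal plaintext)))

    ;; "Shredding": drop the key and every ciphertext for that user
    ;; becomes unreadable, while the datoms themselves stay put.
    (defn shred! [user-id]
      (swap! user-keys dissoc user-id))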
Document-relational model instead of Datalog.
It's a bit of an existential point, but if your logic allows the exclusion of data before a certain point in time, is that not effectively the same as not possessing it?
Immutable is only immutable if you don't have write access to the lowest layer of storage.
For a key-value store, an immutable hash array mapped trie (HAMT) is a good place to start.
That is, each update creates a new "head" node and duplicates only those parts which are required to maintain unique versions of the tree. The hash array mapped trie is excellent for this purpose since each node has many children, so a unique version does not need to duplicate very many nodes.
Then just store all the head nodes in linear order in, e.g., a list, and you can travel back in time by walking the head nodes backwards.
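Clojure's built-in persistent hash map is exactly such a HAMT, so a toy version of this time-travel scheme is just a vector of heads (the names put! and as-of are mine, not from any library):

    ;; Index 0 is the empty map; each put! conj'es a new head that
    ;; shares structure with the previous one.
    (def heads (atom [{}]))

    (defn put! [k v]
      (swap! heads #(conj % (assoc (peek %) k v))))

    (defn as-of
      "The whole map as it looked after the i-th update."
      [i]
      (nth @heads i))

    (put! :a 1)
    (put! :b 2)
    (as-of 1) ;=> {:a 1}
    (as-of 2) ;=> {:a 1, :b 2}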