I read a quote the other day where somebody suggested that the Clojure ecosystem feels like it’s filled with a bunch of tools from the year 3000, and I think that Crux is a good example of that. It looks very cool.
My personal favourite is the talk from ClojuTRE by the main architect behind Crux, which looks under the hood a lot more and discusses many of the key design trade-offs during initial implementation: https://youtu.be/YjAVsvYGbuU
However, Crux is much easier to justify with stakeholders if you have a complex problem revolving around ad-hoc graph joins or bitemporal queries & audit requirements. Or if you simply have high single-threaded transaction throughput requirements. Postgres in particular doesn't have pleasant answers to this intersection of problems.
I think the tables will inevitably turn against currently-safe choices like Postgres though, as systems like Crux are going to continue to be able to take advantage of the latest and greatest ideas, relatively unencumbered by decades of antiquated design decisions and legacy code. This is where the whole Clojure philosophy really shines in Crux, as all the layers interact through clean and composable Clojure protocols that facilitate new kinds of database modularity. These protocols allow the community to take advantage of whatever wonderful cloud services they might want to use, without any additional ceremony to interact with the core development team.
EDIT: Finally, as new generations of developers emerge the pressure for the industry to standardise on a better relational query language will build. Fingers are crossed for Datalog!
> Record the true history of your business whilst also preserving an immutable transaction record - this is the essence of bitemporality. Unlock the ability to efficiently query through time, as a developer and as an application user. Crux enables you to create retroactive corrections, simplify historical data migrations and build an integrated view of out-of-order event data.
> Apache Kafka for the primary storage of transactions and documents as semi-immutable logs.
> RocksDB or LMDB to host indexes for rich query support.
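The quoted architecture boils down to an append-only log as the source of truth, with disposable query indexes derived by replaying it. A minimal sketch of that idea (plain Python dicts and lists stand in for Kafka and RocksDB/LMDB; nothing here mirrors Crux's actual code):

```python
# Illustrative sketch: an append-only log as the golden store, with a
# key-value index derived from it. Crux uses Kafka for the log and
# RocksDB/LMDB for the indexes; here both are in-memory stand-ins.

class Node:
    def __init__(self):
        self.log = []      # stands in for the Kafka transaction/document log
        self.index = {}    # stands in for the RocksDB/LMDB query indexes

    def submit(self, doc_id, doc):
        self.log.append((doc_id, doc))  # durable, ordered write first
        self._apply(doc_id, doc)        # indexing happens asynchronously in practice

    def _apply(self, doc_id, doc):
        self.index[doc_id] = doc

    def rebuild_index(self):
        # Indexes are disposable: replay the log to reconstruct them.
        self.index = {}
        for doc_id, doc in self.log:
            self._apply(doc_id, doc)

node = Node()
node.submit(":alice", {"name": "Alice"})
node.index = {}            # simulate losing the index store entirely
node.rebuild_index()
print(node.index[":alice"])  # {'name': 'Alice'}
```

The nice property is that the index store never needs to be backed up: any node can rebuild its indexes from the log.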
Kafka is usually more than most people want or need. There was a blog post recently about a new Firebase-like Clojure web framework which uses Crux + Digital Ocean's managed Postgres service: https://findka.com/blog/migrating-to-biff/
Disclosure: working on Crux :)
How straightforward would it be to add a new JDBC-based node implementation? I'm noticing that SQL Server is apparently missing, which is a bit of a shame given how many off-the-shelf products use it (and could therefore probably integrate with Crux if Crux did so, too).
It would also be feasible for someone new to Clojure to manage it in about the same time. The hard part is the SQL, not the Clojure.
Fortunately there actually is SQL Server support already: https://github.com/juxt/crux/blob/master/crux-jdbc/src/crux/...
...and I'll make sure the full range of backends is made clearer somewhere. Good feedback!
That being said, I believe (it’s been a while since I looked at Crux) that it actually maintains a separate set of Entity-Attribute-Value indices generated from the contents of the immutable series of documents stored in Kafka/RocksDB.
This should give you a similar level of query power to something like Datomic.
Disclaimer, I haven’t had any actual hands-on experience with either Crux or Datomic, so I could be wrong.
Once documents are ingested into the EAV-like indexes you can query everything much the same as a typical graph database, because all the ID references within the documents translate into edges within a schemaless graph of entities.
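To make the ingestion step concrete, here is a toy sketch of decomposing a document into EAV-style triples. The attribute names and triple shape are purely illustrative, not Crux's internal representation:

```python
# Hypothetical sketch: flatten a document into (entity, attribute, value)
# triples. Multi-valued attributes become one triple per element, which is
# how ID references fan out into graph edges.

def doc_to_triples(doc):
    eid = doc[":crux.db/id"]
    triples = []
    for attr, value in doc.items():
        if attr == ":crux.db/id":
            continue
        values = value if isinstance(value, (list, set, tuple)) else [value]
        for v in values:
            triples.append((eid, attr, v))
    return triples

doc = {":crux.db/id": ":alice", ":name": "Alice", ":friends": [":bob", ":carol"]}
print(doc_to_triples(doc))
# [(':alice', ':name', 'Alice'), (':alice', ':friends', ':bob'),
#  (':alice', ':friends', ':carol')]
```

Each `:friends` reference is now an edge that a query can traverse, with no schema declared anywhere.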
Crux's native query language is a flavour of Datalog but we also have a SQL layer in the works using Apache Calcite, for a more traditional relational feel.
If we had native JSON support already I suspect we would have opted for "JSON Store", but we've only been able to support edn since we launched and "edn Store" isn't too helpful.
An official Lucene index is very likely going to be added for full-text search at some point. I know of two users in the community who have already integrated Lucene themselves, which is actually fairly straightforward as long as you don't need to do searches in the past or can cope with returning results from across all of time.
Incidentally they have some fairly nice materials about their own bitemporal query support and its use across industries, e.g.:
I too had that opinion once, until I started doing real DDD and event sourcing.
An opinion can change quickly ;)
It sounds interesting
That said, we've not built Crux with domain-level event sourcing in mind in particular. There is certainly overlap between Crux's bitemporal model and what I gather some people are doing with retroactive events.
This blog post on Datomic and event sourcing is interesting and relevant for Crux also (Crux has been heavily inspired by Datomic): https://vvvvalvalval.github.io/posts/2018-11-12-datomic-even...
Perhaps "DDD Quickly" is a good 66-page summary.
I help steer the roadmap for Crux, which is very openly visible on GitHub, and we have a 1.9 release coming up in the next couple of weeks which will be the most significant milestone since we launched last year.
Shortly after 1.9 we will be adding SQL query support, which uses Apache Calcite to compile SQL into Datalog on-the-fly, and end-to-end JSON APIs. Both of these together should really broaden the appeal of Crux far beyond Clojure and the Java/JVM communities. Stay tuned!
I would be very happy to answer any questions and hear feedback, as ever.
I'm working on a versioned database system in my spare time which offers similar features and benefits. The core is Java-based, whereas the server is written in Kotlin using Vert.x. A Python client exists, as well as (currently) a TypeScript-based client for the non-blocking, asynchronous HTTP server.
The data store can easily be embedded into other Java/Kotlin based projects without having to use the Server indirection.
Lately I changed a lot regarding the set-oriented query engine, which uses an XQuery derivative to process XML and JSON as well as structured data. For now I've integrated rule-based index rewriting to answer path queries, and path queries with one predicate, through the index, filtering false positives if necessary. Next, I'll add a twig-based operator and AST rewrite rules to answer multiple path queries with a single scan over smaller subtrees.
Furthermore, I want to use the new Java Foreign Memory API to memory map the storage file and change the tuple-at-a time iterator model to fetch batches of tuples at once for better SIMD support through the JVM.
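The iterator change described here is the classic move from a Volcano-style tuple-at-a-time model to a batched ("vectorized") one. A rough sketch of the shape of that change (batch size and operator interface are my own illustration, not SirixDB's code):

```python
# Sketch: tuple-at-a-time vs batched iteration. Handing the consumer a
# chunk at a time lets tight inner loops run over plain arrays, giving a
# JIT a much better chance to auto-vectorise (SIMD).

def scan(tuples):
    # Classic Volcano-style iterator: one tuple per step.
    yield from tuples

def scan_batched(tuples, batch_size=1024):
    # Batched iterator: one slice of tuples per step.
    for i in range(0, len(tuples), batch_size):
        yield tuples[i:i + batch_size]

data = list(range(10))
print(sum(scan(data)))                                  # 45
print(sum(sum(b) for b in scan_batched(data, 4)))       # 45
```

Same result either way; the win is amortising per-tuple call overhead and keeping the hot loop branch-free.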
I've also thought about replacing the Kafka backend with SirixDB itself, as SirixDB doesn't need a WAL for consistency... so SirixDB can hopefully horizontally scale at some point in the future.
Some of the features so far:
- storage engine written from scratch
- completely isolated read-only transactions and one read/write transaction concurrently with a single lock to guard the writer. Readers will never be blocked by the single read/write transaction and execute without any latches/locks.
- variable sized pages
- lightweight buffer management with a "kind of" pointer swizzling
- dropping the need for a write ahead log due to atomic switching of an UberPage
- rolling merkle hash tree of all nodes built during updates optionally
- ID-based diff-algorithm to determine differences between revisions taking the (secure) hashes optionally into account
- serialization of edit operations for changed subtree roots to make comparisons between consecutive revisions or subtrees thereof incredibly fast (no-diff computation needed at all)
- non-blocking REST-API, which also takes the hashes into account to throw an error if a subtree has been modified in the meantime concurrently during updates
- versioning through a huge persistent and durable, variable sized page tree using copy-on-write
- storing delta page-fragments using a patented sliding snapshot algorithm
- using a special trie, which is especially good for storing records with numerically dense, monotonically increasing 64-bit integer IDs. We make heavy use of bit shifting to calculate the path to fetch a record
- time- or modification-counter-based auto commit
- versioned, user-defined secondary index structures
- a versioned path summary
- indexing every revision, such that a timestamp is only stored once in a RevisionRootPage. The resources stored in SirixDB are based on a huge, persistent (functional) and durable tree
- sophisticated time travel queries
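For the trie mentioned above, the bit-shifting trick is that with dense, monotonically increasing 64-bit record IDs, the path through a fixed-fan-out page tree can be computed from the ID alone, with no comparisons. A toy sketch (the fan-out and depth are made up for the example; SirixDB's actual layout differs):

```python
# Toy sketch: derive a trie path from a record ID via bit shifting.
# Each level of the trie consumes FANOUT_BITS bits of the ID, top-down.

FANOUT_BITS = 10                # 2**10 = 1024 slots per inner page
LEVELS = 3                      # depth of this toy trie

def path_for(record_id):
    path = []
    for level in range(LEVELS - 1, -1, -1):
        slot = (record_id >> (level * FANOUT_BITS)) & ((1 << FANOUT_BITS) - 1)
        path.append(slot)
    return path

print(path_for(0))       # [0, 0, 0]
print(path_for(1025))    # [0, 1, 1]  (1025 = 1*1024 + 1)
```

Because the IDs are dense and monotonic, pages fill left-to-right and lookups are pure arithmetic.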
Kind regards Johannes
 https://sirix.io and https://github.com/sirixdb/sirix
But am I reading from this thread that documents are converted into EAVT triples for every attribute? Sounds interesting.
Alongside some less exciting changes, 1.9 will include full transaction function support, so schema-on-write is much simpler to achieve now. It took us time to figure out how to support this whilst not completely breaking bitemporality & eviction. Here are a few tests showing how to use transaction functions now: https://github.com/juxt/crux/blob/b13b4da988ed7e91dc7685e5d5...
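Conceptually, schema-on-write via a transaction function boils down to running a validating function inside the transaction, which can veto the write. A toy sketch of that idea (Crux's actual transaction functions are Clojure forms stored in the database itself; the schema, names, and dict-as-database here are all made up):

```python
# Conceptual sketch only: a validating function runs at ingest time,
# inside the transaction, and rejecting a document aborts the write.

REQUIRED_KEYS = {":crux.db/id", ":email"}   # made-up schema for the example

def put_with_schema(db, doc):
    missing = REQUIRED_KEYS - doc.keys()
    if missing:
        # Raising here aborts the transaction: nothing is indexed.
        raise ValueError(f"document rejected, missing {sorted(missing)}")
    db[doc[":crux.db/id"]] = doc

db = {}
put_with_schema(db, {":crux.db/id": ":alice", ":email": "alice@example.com"})
try:
    put_with_schema(db, {":crux.db/id": ":bob"})
except ValueError as e:
    print(e)   # document rejected, missing [':email']
```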
> documents are converted into EAVT for every attribute?
The indexes are certainly schemaless, which means every document attribute gets turned into a set of one or more triples that can be joined against without needing to declare anything about the attributes upfront. Internally, though, the indexes don't follow the same EAVT pattern you may be familiar with from Datomic etc. Check out the ClojuTRE talk (linked in these comments) for a good overview of roughly what the internal indexes look like. It's a little out of date now, but it still gives a strong impression of how things work today.
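To illustrate what "joined against without declaring anything upfront" means, here is a naive sketch of chaining two lookups over schemaless triples (list comprehensions stand in for index scans; nothing here mirrors Crux's real index layout):

```python
# Naive sketch: join schemaless triples with no attribute declarations.

triples = [
    (":alice", ":friend", ":bob"),
    (":bob",   ":friend", ":carol"),
    (":bob",   ":name",   "Bob"),
]

# "Find the names of :alice's friends" as two chained triple lookups:
friends = [v for e, a, v in triples if e == ":alice" and a == ":friend"]
names = [v for e, a, v in triples if e in friends and a == ":name"]
print(names)  # ['Bob']
```

A Datalog engine does essentially this, but with proper indexes and join ordering instead of linear scans.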
However, I still believe schema-on-read is more desirable in general, as you can simply extend a query to only care about the AV combinations which conform to some definition for that A. The Calcite SQL integration we've been building works like this. Conformance could also be processed asynchronously, with documents labelled as "conformed" as and when, to reduce the burden during general queries.
I found the public MIT lectures on "retroactive data structures" (by Erik Demaine) very helpful to clarify the mental model and properties of temporal databases. Point-in-time bitemporal queries are fairly simple compared to the kinds of things some organisations use data warehouses for (e.g. temporal range queries across N dimensions), but surprisingly few teams think about representing time so clearly within their transactional OLTP/HTAP systems, so even point-in-time queries seem more novel than they really should.
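A point-in-time bitemporal lookup is simple enough to sketch in a few lines: each version carries a valid-time (when the fact was true) and a transaction-time (when it was recorded), and a query fixes both axes. The field names and tie-breaking here are illustrative, not Crux's schema:

```python
# Minimal sketch of a bitemporal "as of" lookup over versioned records.

def as_of(versions, entity, valid_time, tx_time):
    candidates = [v for v in versions
                  if v["entity"] == entity
                  and v["valid_time"] <= valid_time   # what was true then...
                  and v["tx_time"] <= tx_time]        # ...as recorded by then
    if not candidates:
        return None
    # Latest valid-time wins; ties broken by latest transaction-time, so
    # retroactive corrections recorded later supersede earlier records.
    return max(candidates, key=lambda v: (v["valid_time"], v["tx_time"]))

versions = [
    {"entity": ":alice", "valid_time": 1, "tx_time": 1, "doc": {"city": "Oslo"}},
    # Retroactive correction recorded at tx-time 5: Alice was really in Bergen.
    {"entity": ":alice", "valid_time": 1, "tx_time": 5, "doc": {"city": "Bergen"}},
]
print(as_of(versions, ":alice", valid_time=2, tx_time=3)["doc"])  # {'city': 'Oslo'}
print(as_of(versions, ":alice", valid_time=2, tx_time=9)["doc"])  # {'city': 'Bergen'}
```

The two queries differ only in transaction-time: the first reproduces what the database believed before the correction arrived, which is exactly the audit property.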
Everything is moving in the right direction though as immutable data becomes the default.
And here are the slides for that talk, which include a few references: https://juxt.pro/hakan-raberg-the-design-and-implementation-...
Feel free to reach out to the team, or find us on Zulip/Slack, if you want to hear more or just chat :)