Datomic Implicit Partitions

hlship · on April 21, 2023

Like any database, Datomic often needs to perform queries by reading data from its indexes. In fact, in Datomic, all data is stored in indexes - there aren't rows with indexes pointing to the rows, just indexes.

Datomic uses any of a number of stores, typically something like DynamoDB, to persist this index data.

When performing a query, Datomic peers read directly from the store (or from a cache) and read an entire block of index data as a single unit - each such block effectively contains many entity/attribute/value/transaction Datoms sorted by entity, attribute, or value, depending on which index is being scanned.

In other words, when reading all the attributes for entity X, Datomic will read from storage the block containing X, and with it, unavoidably, whatever other data precedes or follows X in that block.

The point of partitioning is to ensure that related entities are stored close to each other; literally, a block of bits within the 64-bit entity id is set to the partition value. Since Datoms are sorted by entity id, entities that share a partition end up next to each other in the index, so a query can be satisfied using fewer index blocks, read from cache or storage, than if the entire entity id were arbitrary. Essentially, if entity X and entity Y are related, they can share a partition, and will most likely be stored together in a single index block.
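To make that concrete, here is a rough sketch in Python of why partition bits in the id help. It is not Datomic's actual code: the 42-bit shift, the block size of 10, and the helper names are all assumptions chosen for illustration. It just packs a partition into the high bits of an id, sorts the ids the way an entity-ordered index would, and counts how many fixed-size blocks a query for four related entities has to touch.

```python
# Illustrative only: the bit layout and block size are assumptions,
# not a statement of how Datomic encodes ids or sizes its index blocks.
PARTITION_SHIFT = 42  # assumed: partition lives above the low 42 bits


def make_entity_id(partition: int, sequence: int) -> int:
    """Pack a partition value into the high bits of an entity id."""
    return (partition << PARTITION_SHIFT) | sequence


def partition_of(entity_id: int) -> int:
    """Recover the partition value from an entity id."""
    return entity_id >> PARTITION_SHIFT


def blocks_touched(all_ids, wanted_ids, block_size):
    """Count the distinct fixed-size blocks of the sorted index
    that must be read to reach every wanted id."""
    ordered = sorted(all_ids)
    position = {eid: i for i, eid in enumerate(ordered)}
    return len({position[eid] // block_size for eid in wanted_ids})


# 100 unrelated entities spread over partitions 0..9.
unrelated = [make_entity_id(p, s) for p in range(10) for s in range(10)]

# Four related entities sharing partition 7.
colocated = [make_entity_id(7, 1000 + i) for i in range(4)]

# The same four entities, each in a different partition.
scattered = [make_entity_id(p, 1000) for p in (1, 3, 5, 7)]

colocated_blocks = blocks_touched(unrelated + colocated, colocated, block_size=10)
scattered_blocks = blocks_touched(unrelated + scattered, scattered, block_size=10)

print(colocated_blocks, scattered_blocks)
```

With these made-up numbers, the four entities that share a partition sit in one block, while the scattered ones each land in a different block. Real index blocks hold far more Datoms, but the effect is the same in kind.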

This co-location of data can make a big difference in very large Datomic databases.

unixhero · on April 22, 2023

Does Postgres have this?

hlship · on April 22, 2023

In a traditional database such as Postgres, queries are routed through a database server that handles both queries and updates, and is quite close to the storage. Reading excess data while executing a query is less of an issue as only the relevant data is eventually streamed to the client.

Datomic is quite different, in that peers (the other services that query Datomic data) execute the query engine library, and directly read the raw data from the store (or from a cache, as Datomic data is immutable). This design makes it easy to scale reads, but introduces the locality issues that the partitioning enhancement addresses.
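As a loose sketch of the "store or cache" part (again Python, again not Datomic's API; `BlockCache`, the block keys, and the dict standing in for the store are all hypothetical): because a block never changes once written, a peer can keep any block it has fetched and never needs to invalidate it.

```python
from typing import Callable, Dict


class BlockCache:
    """Hypothetical peer-side cache of immutable index blocks."""

    def __init__(self, fetch_from_store: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_store
        self._blocks: Dict[str, bytes] = {}
        self.store_reads = 0  # how many times we had to go to storage

    def get(self, block_key: str) -> bytes:
        # Blocks are immutable, so a cached copy is always still valid.
        if block_key not in self._blocks:
            self.store_reads += 1
            self._blocks[block_key] = self._fetch(block_key)
        return self._blocks[block_key]


# A plain dict stands in for the backing store (e.g. DynamoDB).
fake_store = {
    "block-a": b"datoms for X and Y",
    "block-b": b"datoms for Z",
}

cache = BlockCache(fake_store.__getitem__)
first = cache.get("block-a")   # goes to the store
second = cache.get("block-a")  # served from the cache
```

The fewer distinct blocks a query needs, the fewer of these fetches a peer has to make, which is where the locality from partitioning pays off.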

hlship · on April 22, 2023

It’s quite apples and oranges.