The core concept of all deterministic databases is simple: if your database is deterministic, multiple geographically distributed replicas can execute the same transaction log independently, and they will all reach the same state. The problem of implementing a distributed database is reduced to implementing a distributed log.
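As a toy illustration of that point (names made up, Go used just for concreteness): two replicas that independently apply the same ordered log of deterministic operations necessarily end in identical states.

    package main

    import "fmt"

    // Op is a deterministic operation: its effect depends only on the current
    // state and the log order, never on wall-clock time or randomness.
    type Op struct {
        Key   string
        Delta int
    }

    func apply(state map[string]int, log []Op) map[string]int {
        for _, op := range log {
            state[op.Key] += op.Delta
        }
        return state
    }

    func main() {
        log := []Op{{"x", 1}, {"y", 2}, {"x", 3}}
        a := apply(map[string]int{}, log) // replica A
        b := apply(map[string]int{}, log) // replica B, same log, independently
        fmt.Println(a, b)                 // map[x:4 y:2] map[x:4 y:2]
    }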
However, there is one oddity of the Waltz design: the central locking mechanism. Other deterministic databases don't have anything like this, because it's not necessary. You can keep track of locks locally on each replica, and you can rely on every replica to reach the same conclusions about which transactions succeeded and failed, because they are deterministic.
Can anyone clarify why they are managing centralized locks?
1. Assign each transaction a log position. For strict serializability instead of just serializability, make this at least as high as the highest log position ever successfully acknowledged to a client. This can be batched for throughput.
2. Have each client record its read set and write set: the objects/ranges the transaction read, and the writes/operations to be performed.
3. Have clients persist this R/W set into the log, or send it directly to the lock server if it is also the master assigning log positions. Again, use batching for throughput.
4. Run your lock server either as part of the master process that assigns log positions, or have it tail the log separately. The lock server will receive batches of transactions, take locks in the log's defined order, then commit another entry to the log with the commit/abort decision for each (see the sketch after this list).
5. Respond to the client with the commit/abort decision.
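Here's a rough sketch of what step 4's decision logic could look like, assuming the "locks" amount to optimistic conflict checks over the R/W sets (which is roughly what FoundationDB's resolver does). Every type and field name here is invented for illustration:

    package main

    import (
        "fmt"
        "sort"
    )

    type Txn struct {
        ID          int64
        ReadVersion int64    // snapshot the client read at
        CommitPos   int64    // log position assigned in step 1
        ReadSet     []string // objects/ranges read
        WriteSet    []string // objects to be written
    }

    // LockServer tracks, per object, the log position of its last committed write.
    type LockServer struct {
        lastWrite map[string]int64
    }

    // Decide processes a batch in the log's defined order, aborting any
    // transaction whose read set was overwritten after its snapshot.
    func (s *LockServer) Decide(batch []Txn) map[int64]bool {
        sort.Slice(batch, func(i, j int) bool { return batch[i].CommitPos < batch[j].CommitPos })
        decisions := make(map[int64]bool, len(batch))
        for _, t := range batch {
            commit := true
            for _, obj := range t.ReadSet {
                if s.lastWrite[obj] > t.ReadVersion {
                    commit = false // someone wrote obj after we read it
                    break
                }
            }
            if commit {
                for _, obj := range t.WriteSet {
                    s.lastWrite[obj] = t.CommitPos
                }
            }
            decisions[t.ID] = commit
        }
        return decisions
    }

    func main() {
        s := &LockServer{lastWrite: map[string]int64{}}
        fmt.Println(s.Decide([]Txn{
            {ID: 1, ReadVersion: 5, CommitPos: 10, ReadSet: []string{"x"}, WriteSet: []string{"x"}},
            {ID: 2, ReadVersion: 5, CommitPos: 11, ReadSet: []string{"x"}, WriteSet: []string{"y"}},
        })) // map[1:true 2:false]: txn 2 read x, which txn 1 rewrote at position 10 > 5
    }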
To make this easier to program, you'll probably want to include a read-your-writes cache on the client, along the lines of the sketch below.
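Something like this (hypothetical names again): reads consult the client's own in-flight writes before falling back to a possibly-stale replica.

    package main

    import "fmt"

    type Client struct {
        pending map[string]string // written but not yet visible in the log
    }

    func (c *Client) Write(key, val string) {
        c.pending[key] = val // also goes into the write set from step 2
    }

    func (c *Client) Read(key string, replica map[string]string) string {
        if v, ok := c.pending[key]; ok {
            return v // our own uncommitted write wins
        }
        return replica[key]
    }

    func main() {
        replica := map[string]string{"x": "old"}
        c := &Client{pending: map[string]string{}}
        c.Write("x", "new")
        fmt.Println(c.Read("x", replica)) // "new", even though the replica still says "old"
    }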
You can also scale out this log by having clients commit their operations to multiple logs. The only thing that needs to be serialized is the locking and the commit/abort decision-making. These other logs can be sharded by key range or just chosen randomly, as long as the commit/abort decision log records which other log the data itself was committed to.
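For example, a data log could be picked by key hash, with the single decision log recording where each payload lives (all names invented):

    package main

    import (
        "fmt"
        "hash/fnv"
    )

    // dataLogFor picks one of numLogs sharded data logs for a key.
    func dataLogFor(key string, numLogs int) int {
        h := fnv.New32a()
        h.Write([]byte(key))
        return int(h.Sum32()) % numLogs
    }

    // Decision is an entry in the single, serialized commit/abort log; it
    // points at the data log (and offset) holding the transaction's payload.
    type Decision struct {
        TxnID   int64
        Commit  bool
        DataLog int   // which sharded log holds the payload
        DataPos int64 // offset within that log
    }

    func main() {
        fmt.Println(dataLogFor("user:42", 8)) // shard index in [0, 8)
    }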
FoundationDB works roughly like this. The terminology is different (a resolver instead of a lock server, conflict ranges instead of R/W sets, version numbers instead of log positions, etc.), but this is basically how it works.
For a company like WePay, even a couple of minutes of downtime may be incredibly expensive, and a single server just isn't going to cut it.
They explained why this wasn't ideal for them in the article.
I've been using a simple file format that just writes each message out sequentially as: [event-id][event-type][event-size][event-bytes]. There's also a small TCP server that speaks the Redis protocol to support remote access. But it's not really production code; it's something I hacked together over a couple of weeks in the evenings.
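For concreteness, a sketch of that framing with assumed field widths (an 8-byte id, 2-byte type, and 4-byte size, all big-endian; the actual format may differ):

    package main

    import (
        "bytes"
        "encoding/binary"
        "fmt"
        "io"
    )

    type Event struct {
        ID    uint64
        Type  uint16
        Bytes []byte
    }

    // WriteEvent frames one event as [event-id][event-type][event-size][event-bytes].
    func WriteEvent(w io.Writer, e Event) error {
        var hdr [14]byte
        binary.BigEndian.PutUint64(hdr[0:8], e.ID)
        binary.BigEndian.PutUint16(hdr[8:10], e.Type)
        binary.BigEndian.PutUint32(hdr[10:14], uint32(len(e.Bytes)))
        if _, err := w.Write(hdr[:]); err != nil {
            return err
        }
        _, err := w.Write(e.Bytes)
        return err
    }

    // ReadEvent reads one frame back; returns io.EOF at a clean end of log.
    func ReadEvent(r io.Reader) (Event, error) {
        var hdr [14]byte
        if _, err := io.ReadFull(r, hdr[:]); err != nil {
            return Event{}, err
        }
        e := Event{
            ID:   binary.BigEndian.Uint64(hdr[0:8]),
            Type: binary.BigEndian.Uint16(hdr[8:10]),
        }
        e.Bytes = make([]byte, binary.BigEndian.Uint32(hdr[10:14]))
        _, err := io.ReadFull(r, e.Bytes)
        return e, err
    }

    func main() {
        var buf bytes.Buffer
        _ = WriteEvent(&buf, Event{ID: 1, Type: 7, Bytes: []byte("hello")})
        e, _ := ReadEvent(&buf)
        fmt.Println(e.ID, e.Type, string(e.Bytes)) // 1 7 hello
    }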
This is a project of mine that I use for some side projects for exactly that reason: I want streaming logs & pub/sub semantics. It's basically Kafka xtra-lite. It doesn't have a super simple TCP protocol (though I wouldn't say no to a PR adding one!); instead it's available over both gRPC and HTTP (JSON API).
Disclaimer: It's definitely one of my low-activity side projects and subject to change at any time.
It's fast, has consumer groups (like Kafka), and also supports individual message acknowledgement.
Also, what's Redis' multi-node failover/high-availability story these days (with streams)? Last I heard, it wasn't that great, but it's been a while.
Redis keeps the entire working set in RAM, so it'll start dropping writes or freeze if you run out of memory. This is where the simplicity and speed come from, and it is a fundamental limitation.
There's a simple replication system that works well, but failover switching is the problem: it requires a separate process or running Redis Sentinel. There's also Redis Cluster, but that just shards the keyspace, doesn't offer any scalability for a single key or stream, and is still hard to manage, with the same failover issues.
The OP asked for a single-node option, so I suggested Redis. If you need a serious messaging cluster, then I recommend Kafka or Pulsar instead.
It's frustrating that there's no obvious middle ground between this and Kafka or Pulsar, both of which are memory-hungry JVM apps with multiple external dependencies. Both require ZooKeeper; Pulsar also requires BookKeeper. None of these components are operationally simple.
I'm a fan of NATS itself, but NATS Streaming's clustering design leaves a lot to be desired. In particular, it punts on failover/HA, asking you to instead run an SQL database or shared file system that provides this. (An obvious low-maintenance option here would be CockroachDB, but NATS doesn't support it.)
If you don't care about open source, then there are plenty of other options like AMPS or Solace. The latter has a free edition.
And, as you say, fragile. I've run RMQ in production for years, and I would be very happy if I could throw it out. It's the least well-behaved component in any stack I've used it in. Even Elasticsearch (shudder) is better at not losing data. And it's not just the clustering, either. For example, even with a persistent queue, RMQ will start to chug RAM for any message that is delivered but not yet ACKed, making it dangerous for apps that want to batch large groups of messages for efficiency. (It seems to me that it was not designed for that at all, but for one-by-one consumption, which is of course much slower.)
I'm looking for a mature distributed log that is clustered and lightweight: Kafka, except written in, say, Go.
There are also proprietary systems like Solace: https://solace.com/
Other than that, there are embedded queuing/log libraries, or you can just dump the messages into a key/value store or write them out to a file yourself.
This might not be needed if the log is strictly used for FIFO event consumption, but I was thinking of trying to make a system like this support time-sliced queries.
A log gives you "transaction time", but you need to create an efficient representation of "valid time" for backfilling and corrections.
Disclosure: I work on a database for Kafka that provides point-in-time bitemporal Datalog queries https://github.com/juxt/crux
For the "valid time" primitive I was thinking of implementing something like a hybrid logical clock that CockroachDB has (but with looser guarantees, mostly just need uniqueness and monotonicity). A sequential ID would provide a slightly nicer interface for pagination but has all the problems that I previously mentioned.
What about Kafka Connect?
> Waltz is similar to existing log systems like Kafka in that it accepts/persists/propagates transaction data produced/consumed by many services. However, unlike other systems, Waltz provides a machinery that facilitates a serializable consistency in distributed applications. It detects conflicting transactions before they are committed to the log.
Kafka supports transactions, though I'm not sure about serializability.