Hi all! I'm a member of the team making the release. Note that this was a bit premature, as we haven't announced the project yet. So expect a lot more documentation around how to use AMBROSIA to appear shortly!
Nevertheless, please give it a try and help us out by filing issues for anything you run into. You can get off the ground by simply running `Scripts/run_*_ci.sh` (you'll need to set up your Azure storage connection string so it can push service metadata to Azure).
Essentially this is a language-agnostic framework for building data processing systems that are highly available, distributed, and topologically static (no dynamic scaling), and that provide exactly-once processing.
You define a message handler that will always produce the same output sequence given the same input sequence, and the framework provides delivery, serialization, buffering, durability, and transparent recovery. They even provide a nice way of wrapping nondeterministic behavior so that you can seamlessly continue even if you fail in the middle of processing a message.
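To make the recovery idea concrete, here's a rough sketch (in Python, purely illustrative; names like `record_or_replay` are made up and this is not AMBROSIA's actual API): the handler's state is rebuilt by replaying the logged input sequence, and a nondeterministic call (a timestamp here) is recorded on first execution and replayed from the log during recovery, so the replayed run reproduces the original outputs exactly.

```python
import time

class Logger:
    """Toy append-only log standing in for a durable, replicated log.
    It holds the results of any wrapped nondeterministic calls."""
    def __init__(self):
        self.records = []
        self.cursor = 0

    def record_or_replay(self, compute):
        # Live execution: run the nondeterministic call and persist its result.
        # Recovery: return the previously persisted result instead of re-running it.
        if self.cursor < len(self.records):
            value = self.records[self.cursor]
        else:
            value = compute()
            self.records.append(value)
        self.cursor += 1
        return value


class CounterService:
    """Handler state is rebuilt purely by replaying the input sequence,
    so the handler must be deterministic given that sequence."""
    def __init__(self, log):
        self.log = log
        self.total = 0

    def handle(self, amount):
        self.total += amount                          # deterministic state update
        stamp = self.log.record_or_replay(time.time)  # nondeterminism wrapped and logged
        return f"total={self.total} at {stamp:.3f}"


log = Logger()
svc = CounterService(log)
first_run = [svc.handle(5), svc.handle(7)]

# Simulated crash + recovery: replay the same inputs against the persisted log.
log.cursor = 0
recovered = CounterService(log)
second_run = [recovered.handle(5), recovered.handle(7)]
assert first_run == second_run  # identical outputs, including the original timestamps
```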
That being said, they really are sloppy with their performance numbers; the comparison to gRPC isn't really fair at all due to their dynamic batching. And the code examples in the paper have some really silly errors.
But the paper is still a great introduction to reliable stream processing and basic strategies for achieving exactly-once delivery.
Did you have a chance to read Google's Dataflow paper[1] and about their new Streaming Engine[2]?
From a layperson's perspective, it seems like they are tackling some of the same ideas (separation of state && computation, applying optimisation techniques used in the functional world, etc.).
I'd be interested in learning where and how AMBROSIA differs!
Regarding "topologically static", the system doesn't assume a fixed set of communicating endpoints (like MPI ranks). It will all you to dynamically add new participants to the network.
Why is the gRPC comparison not fair? Shouldn't they do dynamic batching too? It can be done without unduly affecting latency. (I did my PhD in stream processing studying this proposition, and Jonathan Goldstein and others have demonstrated the same thing in Trill.) In the case of AMBROSIA, our latency increase vs gRPC is not because of batching, but because of waiting for the log to persist in geo-replicated storage.
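To illustrate what I mean by batching that doesn't hurt latency (a toy Python sketch, not code from AMBROSIA or Trill): never hold messages for a timer; send immediately when the link is idle, and batch only whatever piled up while the previous send was in flight, so batch size grows with load instead of with an artificial delay.

```python
import queue
import threading
import time

def sender_loop(q, transmit, max_batch=1024):
    """Adaptive batching: block only for the first message, then drain whatever
    else is already queued. Under light load batches have size 1 (no added delay);
    under heavy load batches grow, amortizing per-send overhead."""
    while True:
        batch = [q.get()]                     # wait only if there is nothing to send
        while len(batch) < max_batch:
            try:
                batch.append(q.get_nowait())  # grab anything that piled up, don't wait
            except queue.Empty:
                break
        transmit(batch)

q = queue.Queue()
threading.Thread(target=sender_loop,
                 args=(q, lambda b: print(f"sent batch of {len(b)}")),
                 daemon=True).start()

q.put("low-load message")   # goes out alone, immediately
time.sleep(0.1)
for i in range(100):        # burst: later messages coalesce into larger batches
    q.put(i)
time.sleep(0.1)
```

Under light load each message goes out by itself with no added delay; under a burst the per-send cost gets amortized over whatever accumulated while the previous send was outstanding.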