To my understanding, I just implement the processing steps as small pieces and connect them through a common bus like Kafka. Each step is an independent binary, and to build the final processing flow I run multiple instances of each one, manually scaled depending on the task. For example, to do an aggregation / reduce I just make sure only one instance of that binary is running, and so on. Everything can be orchestrated and scaled with Kubernetes or similar. Is that how I'm supposed to design the processing? Roughly like the sketch below.
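A minimal sketch of one such stage, using the `rdkafka` and `tokio` crates; the broker address, topic names, group id, and the transform itself are all placeholders for illustration:

```rust
use std::time::Duration;

use rdkafka::config::ClientConfig;
use rdkafka::consumer::{Consumer, StreamConsumer};
use rdkafka::producer::{FutureProducer, FutureRecord};
use rdkafka::Message;

#[tokio::main]
async fn main() {
    // One independent pipeline stage: consume from `input`, transform, produce to `output`.
    // Scaling up = running more replicas in the same consumer group;
    // a reduce stage = keeping exactly one replica running.
    let consumer: StreamConsumer = ClientConfig::new()
        .set("bootstrap.servers", "localhost:9092")
        .set("group.id", "map-stage")
        .create()
        .expect("consumer creation failed");
    consumer.subscribe(&["input"]).expect("subscribe failed");

    let producer: FutureProducer = ClientConfig::new()
        .set("bootstrap.servers", "localhost:9092")
        .create()
        .expect("producer creation failed");

    loop {
        let msg = consumer.recv().await.expect("recv failed");
        if let Some(Ok(payload)) = msg.payload_view::<str>() {
            let transformed = payload.to_uppercase(); // stand-in for the real step logic
            producer
                .send(
                    FutureRecord::to("output").key("partition-key").payload(&transformed),
                    Duration::from_secs(1),
                )
                .await
                .expect("send failed");
        }
    }
}
```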
We do have a tool, Hydro Deploy, to set up and manage Hydro cluster deployments on GCP (other clouds later), but it's mainly for running experiments rather than long-running applications.
The long-term idea is that the Hydro stack will determine how to distribute and scale programs as part of the compilation process. Part of that will be rewriting single-node Hydroflow programs into multi-node ones.
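For concreteness, the kind of single-node program that rewriting would start from looks roughly like this (based on the hello-world example in the Hydroflow docs; details may vary by version):

```rust
use hydroflow::hydroflow_syntax;

fn main() {
    // A single-node dataflow graph: source -> map -> sink, all in one process.
    // The compilation step described above would partition a graph like this
    // across machines instead of leaving it on one node.
    let mut flow = hydroflow_syntax! {
        source_iter(0..10)
            -> map(|n| n * n)
            -> for_each(|n| println!("squared: {}", n));
    };
    // Run until no more work is immediately available.
    flow.run_available();
}
```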
Btw, another question. Things like Flink provide a dashboard to analyze the processing flow / graph, spot bottlenecks, etc. To my understanding Hydroflow doesn't provide that yet, but I'm curious whether you're working on something like it, or whether it exposes other kinds of metrics, perhaps something compatible with Grafana? Something I could scrape, like in the sketch below.
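To make that concrete, by "Grafana-compatible" I mean metrics in the Prometheus text format, roughly like this hand-rolled sketch using the `prometheus` crate (the metric name and the stage hook are hypothetical, not anything Hydroflow provides today):

```rust
use prometheus::{Encoder, IntCounter, Registry, TextEncoder};

fn main() {
    // Hypothetical: instrument a pipeline stage with a counter that
    // Prometheus could scrape and Grafana could chart.
    let registry = Registry::new();
    let processed = IntCounter::new("records_processed_total", "Records processed by this stage")
        .expect("counter creation failed");
    registry
        .register(Box::new(processed.clone()))
        .expect("register failed");

    // ...inside the stage's processing loop:
    processed.inc();

    // Encode everything in the Prometheus text exposition format;
    // in a real stage this would be served on an HTTP /metrics endpoint.
    let mut buf = Vec::new();
    TextEncoder::new()
        .encode(&registry.gather(), &mut buf)
        .expect("encode failed");
    println!("{}", String::from_utf8(buf).unwrap());
}
```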