I'm familiar with Java-based tools for data processing, like Spring Reactor, Apache Beam, etc., and I'm trying to figure out how I can use Hydroflow instead.

To my understanding, I just implement the processing steps as small pieces and connect them through a common bus like Kafka. All those steps are independent binaries, and to build the final processing flow I run multiple instances of them, scaled manually depending on the task. For example, to do an aggregation / reduce I just make sure only one instance of that binary is running, and so on. All of it can be orchestrated and scaled with Kubernetes or similar. Is that how I'm supposed to design the processing?
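To make it concrete, each step I have in mind would be its own small binary roughly like this (just a sketch of the Kafka glue, assuming the rdkafka crate and made-up topic names "in" / "out"; the uppercase transform is only a placeholder for whatever the step actually does):

    use rdkafka::config::ClientConfig;
    use rdkafka::consumer::{Consumer, StreamConsumer};
    use rdkafka::producer::{FutureProducer, FutureRecord};
    use rdkafka::Message;
    use std::time::Duration;

    #[tokio::main]
    async fn main() {
        // One processing step: consume from "in", transform, publish to "out".
        let consumer: StreamConsumer = ClientConfig::new()
            .set("bootstrap.servers", "localhost:9092")
            .set("group.id", "step-1")
            .create()
            .expect("consumer creation failed");
        consumer.subscribe(&["in"]).expect("subscribe failed");

        let producer: FutureProducer = ClientConfig::new()
            .set("bootstrap.servers", "localhost:9092")
            .create()
            .expect("producer creation failed");

        loop {
            let msg = consumer.recv().await.expect("kafka error");
            if let Some(Ok(payload)) = msg.payload_view::<str>() {
                // Placeholder for the actual processing done by this step.
                let transformed = payload.to_uppercase();
                producer
                    .send(
                        FutureRecord::to("out").key("").payload(&transformed),
                        Duration::from_secs(0),
                    )
                    .await
                    .expect("send failed");
            }
        }
    }

Scaling would then just be running more or fewer copies of each such binary.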




Yes, that is correct for how you would have to set up a multi-node system for now. This is in contrast with Beam, Spark, etc., where the runtime deployment management/coordination is perhaps the biggest, most important part of the product. Hydroflow aims to be a lot lower-level than that, with no opinions on coordination and networking. For example, we'd want it to be possible to implement the coordination mechanisms of one of those systems (Beam, Spark, MapReduce, Materialize) in Hydroflow.
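Concretely, each of those per-step binaries would wrap a small self-contained dataflow, something like the word-count style examples in the Hydroflow book (a sketch only; the operator names and the fold_keyed signature are from the surface-syntax docs and may differ between releases, and source_iter stands in for whatever Kafka/network input you wire up yourself):

    use hydroflow::hydroflow_syntax;

    fn main() {
        // A single-process "reduce" step; networking/coordination is up to you.
        let mut flow = hydroflow_syntax! {
            source_iter(vec!["a", "b", "a", "c", "a"])  // stand-in for a Kafka/TCP source
                -> map(|word| (word, 1))
                -> fold_keyed(|| 0, |acc: &mut i32, n| *acc += n)  // per-key aggregation
                -> for_each(|(word, count)| println!("{}: {}", word, count));
        };
        flow.run_available();
    }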

We do have a tool, Hydro Deploy, to set up and manage Hydro cluster deployments on GCP (other clouds later), but it's mainly for running experiments rather than long-running applications.

The long-term idea is that the Hydro stack will determine how to distribute and scale programs as part of the compilation process. Some of that will be rewriting single-node Hydroflow programs into multi-node ones.


Thank you. I actually like the idea that I can split the processing into independent binaries and scale them manually. And especially the fact that I can implement it in Rust.

Btw, another question. Things like Flink provide a dashboard to analyze the processing flow / graph, to see the bottlenecks, etc. To my understanding Hydroflow doesn't provide that yet, but I'm curious whether you're working on something like it, or whether it provides other kinds of metrics, perhaps something compatible with Grafana?


Yeah, nothing like that is provided, and we won't have anything like that for a while. Currently you would have to wire your own instrumentation into the dataflow graph.
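For example (a hand-rolled sketch, not a built-in feature): a pass-through map closure that bumps an atomic counter, which you could then expose to Prometheus/Grafana however you already export metrics.

    use std::sync::atomic::{AtomicU64, Ordering};
    use hydroflow::hydroflow_syntax;

    // Hypothetical hand-rolled metric; export it via whatever you already use
    // (a Prometheus exporter, a periodic log line, etc.).
    static RECORDS_SEEN: AtomicU64 = AtomicU64::new(0);

    fn main() {
        let mut flow = hydroflow_syntax! {
            source_iter(0..1000)
                -> map(|x| { RECORDS_SEEN.fetch_add(1, Ordering::Relaxed); x })  // instrumentation tap
                -> for_each(|x| { let _ = x; });  // downstream processing goes here
        };
        flow.run_available();
        println!("records seen: {}", RECORDS_SEEN.load(Ordering::Relaxed));
    }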


I guess a closer comparison would be with Project Reactor https://projectreactor.io/ which is also a low-level framework for data processing.


Ah, I see. Thank you. I will definitely try it.



