
Hey, thanks for posting, I'm one of the main devs of Hydroflow. It is not built on top of timely or differential (we have those deps for benchmarking). The design goals are a little different: Hydroflow aims to be faster and lower level, with fewer unnecessary clocks. Hydroflow is also single-node; scaling is done with explicit networking rather than through the runtime. It's the lowest level of the Hydro stack.

Hydro homepage: https://hydro.run/

For a fun easy demo check out the Hydroflow surface syntax playground: https://hydro.run/playground

More info on the Hydro project and stack, CIDR '21: https://hydro.run/papers/new-directions.pdf

Info on the "lattice flow" model in Hydroflow: https://hydro.run/papers/hydroflow-thesis.pdf

e: Also happy to answer any questions! :)

I'm familiar with Java-based tools for data processing, like Spring Reactor, Apache Beam, etc., and I'm trying to figure out how I can use Hydroflow instead.

To my understanding, I just implement small pieces of the processing steps and connect them through a common bus like Kafka. All those steps are just independent binaries. To make the final processing flow, I run multiple instances, manually scaled depending on the task. For example, for an aggregation / reduce I just make sure only one instance of that binary is running, and so on. All of it can be orchestrated and scaled using Kubernetes or similar. Is that how I'm supposed to design the processing?

Yes, that is correct for how you would have to set up a multi-node system for now. This is in contrast with Beam, Spark, etc., where the runtime deployment management/coordination is perhaps the biggest, most important part of the product. Hydroflow aims to be a lot lower-level than that, with no opinions on coordination and networking. For example, we'd want it to be possible to implement the coordination mechanisms of one of those systems (Beam, Spark, MapReduce, Materialize) in Hydroflow.
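To make the "one binary per step" setup concrete, here is a minimal sketch of a single reduce stage as a standalone Rust program. It is plain Rust, not Hydroflow's API: stdin/stdout stand in for the bus (Kafka in the question above), and the `aggregate` function and the "key value" line format are assumptions made up for illustration.

```rust
use std::collections::BTreeMap;
use std::io::{self, BufRead, Write};

/// Hypothetical reduce stage: sums integer values per key.
/// Input lines are assumed to look like "key value"; lines that
/// do not parse are skipped.
fn aggregate<R: BufRead>(input: R) -> io::Result<BTreeMap<String, i64>> {
    let mut totals = BTreeMap::new();
    for line in input.lines() {
        let line = line?;
        let mut parts = line.split_whitespace();
        if let (Some(key), Some(value)) = (parts.next(), parts.next()) {
            if let Ok(n) = value.parse::<i64>() {
                *totals.entry(key.to_string()).or_insert(0) += n;
            }
        }
    }
    Ok(totals)
}

fn main() -> io::Result<()> {
    // stdin/stdout stand in for the message bus here (an assumption).
    let stdin = io::stdin();
    let totals = aggregate(stdin.lock())?;

    let stdout = io::stdout();
    let mut out = stdout.lock();
    for (key, total) in &totals {
        writeln!(out, "{key} {total}")?;
    }
    Ok(())
}
```

Because this stage holds all the state for the reduce, you would run exactly one instance of it (or partition the input by key across instances), while stateless map/filter stages upstream can be scaled freely.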

We do have a tool, Hydro Deploy, to set up and manage Hydro cluster deployments on GCP (other clouds later), but it's mainly for running experiments and not long-running applications.

The long term idea is that the Hydro stack will determine how to distribute and scale programs as part of the compilation process. Some of that will be rewriting single-node Hydroflow programs into multi-node ones.

Thank you. I actually like the idea that I can split the processing into independent binaries and scale them manually. And especially the fact that I can implement it in Rust.

Btw, another question. Things like Flink provide a dashboard to analyze the processing flow / graph, to see the bottlenecks, etc. To my understanding Hydroflow doesn't provide that yet, but I'm curious if you're working on something like it, or if it provides other kinds of metrics, perhaps something compatible with Grafana?

Yeah, nothing like that is provided, and we won't have anything like that for a while. Currently you would have to wire in your own instrumentation in the dataflow graph.
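As a rough idea of what "wiring in your own instrumentation" could look like, here is a sketch in plain Rust that counts the items flowing past each stage of an iterator pipeline. It does not use Hydroflow's API; `StageMetrics`, `instrument`, and `run_pipeline` are hypothetical names for illustration, and exporting the counters to something like Grafana is left out.

```rust
use std::cell::Cell;

/// Hypothetical per-stage counter; not part of Hydroflow.
#[derive(Default)]
struct StageMetrics {
    items: Cell<u64>,
}

impl StageMetrics {
    fn record(&self) {
        self.items.set(self.items.get() + 1);
    }

    fn items(&self) -> u64 {
        self.items.get()
    }
}

/// Wraps a stage so that every item passing through it is counted.
fn instrument<'a, I>(stage: I, metrics: &'a StageMetrics) -> impl Iterator<Item = I::Item> + 'a
where
    I: Iterator + 'a,
{
    stage.inspect(move |_| metrics.record())
}

/// Toy pipeline: source -> keep even numbers -> multiply by 10.
/// Returns the output plus the item counts after the source and the filter.
fn run_pipeline(input: Vec<i64>) -> (Vec<i64>, u64, u64) {
    let source_metrics = StageMetrics::default();
    let filter_metrics = StageMetrics::default();

    let source = instrument(input.into_iter(), &source_metrics);
    let evens = instrument(source.filter(|x| x % 2 == 0), &filter_metrics);
    let output: Vec<i64> = evens.map(|x| x * 10).collect();

    (output, source_metrics.items(), filter_metrics.items())
}

fn main() {
    let (output, seen, kept) = run_pipeline(vec![1, 2, 3, 4, 5, 6]);
    println!("output={output:?} source_items={seen} after_filter={kept}");
}
```

Comparing the counts before and after a stage gives a crude view of where items are dropped; adding timestamps per stage would be the next step for spotting bottlenecks.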

I guess a closer comparison would be with Project Reactor https://projectreactor.io/ which is also a low-level framework for data processing.

Ah, I see. Thank you. I will definitely try it

Does it / will it support retractions?
