
How does this differ from Airflow? https://airflow.apache.org/

Looking at the docs for both, Flyte has similar functionality to Airflow, except less mature and a more functional Task specification syntax. Airflow has the same data ETL operators as well, plus a few more.






Great question, I am working on a follow-up blog post that will explain the differences in more detail. Flyte does take some inspiration from Airflow, but it has a lot of important differences:

- Flyte natively understands data flow between tasks. This is achieved using its own type system, defined in protobuf.
- Flyte tasks are first-class citizens and hence can be shared and reused, and are always associated with an interface declaration.
- Flyte is container and Kubernetes native. It is also multi-tenant.
- Flyte's cron scheduler, control plane API and the actual execution engine are decoupled. Each workflow can be independently executed on a different execution engine.
- Flyte workflows are pure specification, defined in protobuf, and so are Flyte tasks.
- Flyte provides an event stream of the execution.
- Since Flyte is aware of the data, it comes with built-in memoization and auto-cataloging.
- Like Airflow, Flyte can have plugins in Python, but it supports a richer plugin interface.
- Flyte is written in Golang and built on top of Kubernetes.

It is definitely less mature as an open-source project, so please help us make it better. But it has been battle-tested at Lyft for more than 3 years in production.
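To make the "typed interface plus memoization" point a bit more concrete, here is a rough plain-Python sketch. It is not Flyte's API: `TypedTask`, `total_fare` and the `version` field are hypothetical names made up for illustration, and the cache is just an in-process dict. The idea is only that once a task declares its input and output types, the system can check calls against that interface and key a cache on the task name, a version, and the input values.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, get_type_hints


@dataclass
class TypedTask:
    """Hypothetical stand-in for a task with a declared interface (not Flyte's API)."""

    fn: Callable[..., Any]
    version: str = "v1"  # assumed: bumping this invalidates old cache entries
    cache: Dict[str, Any] = field(default_factory=dict)
    calls: int = 0  # number of real executions, i.e. cache misses

    def __post_init__(self) -> None:
        # Derive the interface declaration from the function's type annotations.
        hints = get_type_hints(self.fn)
        self.output_type = hints.pop("return", object)
        self.input_types = hints

    def __call__(self, **inputs: Any) -> Any:
        # Reject calls that do not match the declared interface.
        if set(inputs) != set(self.input_types):
            raise TypeError(f"expected inputs {sorted(self.input_types)}")
        for name, value in inputs.items():
            if not isinstance(value, self.input_types[name]):
                raise TypeError(f"{name} must be {self.input_types[name].__name__}")

        # Memoize on task name, version, and input values.
        payload = json.dumps([self.fn.__name__, self.version, inputs], sort_keys=True)
        key = hashlib.sha256(payload.encode()).hexdigest()
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.fn(**inputs)
        return self.cache[key]


def total_fare(base: int, surge: int) -> int:
    return base * surge


fare_task = TypedTask(total_fare)
first = fare_task(base=10, surge=3)
second = fare_task(base=10, surge=3)  # same inputs, so the function is not run again
```

After the two calls, `fare_task.calls` is 1, and passing `base="10"` raises a `TypeError` before anything runs. A real system would persist the cache and handle richer types than this sketch does.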

Quite interesting to hear this; these are very much the same observations that came up while working with some customers. Airflow is a very mature and amazing tool, but it does not have good state/artifact management, which leaves users tweaking around it. Scheduling is centralised, and it is not designed for ML workflows, i.e. hyperparameter tuning, distributed runs, ... The Kubernetes support is also quite limited.

Polyaxon[0] took a similar approach to Flyte for authoring specifications: a strongly typed system in protobuf + an intuitive YAML specification + SDKs in Python/Golang/Java/... It also treats operations (tasks in Flyte) as first-class citizens and allows running them in a serverless way. Users can choose to register repetitive operations as components and share them with a description and typed inputs/outputs.

[0] https://github.com/polyaxon/polyaxon


Has Lyft migrated all of its workflows from Airflow to Flyte? Or does Airflow still play a role alongside Flyte? I was assuming Lyft is running workflows in Airflow based on this post: https://eng.lyft.com/running-apache-airflow-at-lyft-6e53bb8f...

Another great question. So Airflow is used at Lyft for ETL. I think for traditional ETL it still is a good fit. But, there is an effort to not just migrate, but rethink how we can leverage Flyte's capabilities to improve our ETL experience.

But, as it exists, we have a FlyteAirflowOperator, so that users can easily connect their Airflow pipelines with Flyte and write the new ones on Flyte alone.

Stay tuned for developments on this front :)


Here is a blog post I wrote a few weeks ago that describes how Flyte's structured workflow specification allows for using open-source workflows with no code.

https://medium.com/@flytehub/introducing-flytehub-open-sourc...


It definitely feels like there is Airflow inspiration, though poking through the site and docs, it seems that the devil is in the details with respect to the differences between the two... kinda like the relationship between Airflow and Prefect [1]. It looks like the barrier to entry is higher with Flyte, but there are benefits as well.

One example I see is that you can run Airflow without containers if desired, using just simple Python functions, whereas Flyte seems to be much more concerned with managing the execution environment for you (pros/cons to that).

Flyte also seems to be more "Kubernetes native" by default [2][3], whereas with Airflow this is more of a choice among several executors.

I'd be curious to see a performance benchmark using comparable workflows vs Airflow with the Kubernetes Executor or Kubernetes Operator.

[1]: https://www.prefect.io/

[2]: https://kccncna19.sched.com/event/UaYY/flyte-cloud-native-ma...

[3]: https://www.youtube.com/watch?v=KdUJGSP1h9U


Thanks for the link! Looking at Prefect's comparison to Airflow (https://docs.prefect.io/core/welcome/why_not_airflow.html#wh...), I think it sums up a lot of the same areas we tried to address with Flyte! Particularly the bullet-points in the 'Overview' section were all things, to the word, that were paramount for us to solve at Lyft. Now perhaps we need to publish an article on comparison to Prefect ;)

Thank you for replying. I would definitely look forward to that post.

Airflow just covers authoring, monitoring and scheduling. I don't recall it covering caching, lineage, or resource allocation/reuse.


