
Flyte: A Cloud Native Machine Learning and Data Processing Platform - mgrover
https://eng.lyft.com/introducing-flyte-cloud-native-machine-learning-and-data-processing-platform-fb2bb3046a59
======
minimaxir
How does this differ from Airflow?
[https://airflow.apache.org/](https://airflow.apache.org/)

Looking at the docs for both, Flyte has similar functionality to Airflow,
except that it is less mature and has a more functional Task specification
syntax. Airflow has the same data ETL operators as well, plus a few more.

~~~
kumare3
Great question. I am working on a follow-up blog post that will explain the
differences in more detail. Flyte does take some inspiration from Airflow, but
it has a lot of important differences:

- Flyte natively understands data flow between tasks. This is achieved using
  its own type system, defined in protobuf.
- Flyte tasks are first-class citizens and hence can be shared and reused, and
  are always associated with an interface declaration.
- Flyte is container and Kubernetes native. It is also multi-tenant.
- Flyte's cron scheduler, control plane API, and the actual execution engine
  are decoupled. Each workflow can be independently executed on a different
  execution engine.
- Flyte workflows are purely specification, defined in protobuf, and so are
  Flyte tasks.
- Flyte provides an event stream of the execution.
- Since Flyte is aware of the data, it comes with built-in memoization and
  auto-cataloging.
- Like Airflow, Flyte can have plugins in Python, but it supports a richer
  plugin interface.
- Flyte is written in Golang and runs on top of Kubernetes.

It is definitely less mature in open source, so please help us make it better.
But it has been battle tested at Lyft for more than 3 years in production.
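
As a rough illustration of reusing task outputs when the platform knows a task's inputs, here is a minimal sketch in plain Python. It is not Flyte's implementation: it assumes a cache key built from the task name, a version string, and the input values, and `cached_task`, `cache_key`, `_CACHE`, and `row_count` are all hypothetical names.

```python
import hashlib
import json
from functools import wraps

# Hypothetical in-memory store; a real system would use a persistent catalog.
_CACHE = {}


def cache_key(name, version, inputs):
    """Build a deterministic key from the task identity and its input values."""
    payload = json.dumps(
        {"name": name, "version": version, "inputs": inputs}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_task(version):
    """Hypothetical decorator: reuse a stored output when the key matches."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(**inputs):
            key = cache_key(fn.__name__, version, inputs)
            if key not in _CACHE:
                _CACHE[key] = fn(**inputs)  # only runs on a cache miss
            return _CACHE[key]
        return wrapper
    return decorator


calls = []  # records each real execution, for demonstration only


@cached_task(version="1")
def row_count(table, day):
    calls.append((table, day))
    return len(table) + len(day)  # stand-in for expensive work
```

Calling `row_count(table="rides", day="2020-01-01")` twice runs the body once. Changing an input or bumping `version` produces a new key and therefore a fresh execution.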

~~~
zerovar
Has Lyft migrated all of its workflows from Airflow to Flyte? Or does Airflow
still play a role alongside Flyte? I was assuming Lyft is running workflows in
Airflow, based on this post:
[https://eng.lyft.com/running-apache-airflow-at-lyft-6e53bb8f...](https://eng.lyft.com/running-apache-airflow-at-lyft-6e53bb8fccff)

~~~
kumare3
Another great question. Airflow is used at Lyft for ETL, and I think it is
still a good fit for traditional ETL. But there is an effort to not just
migrate, but to rethink how we can leverage Flyte's capabilities to improve our
ETL experience.

As things stand, we have a FlyteAirflowOperator, so that users can easily
connect their Airflow pipelines with Flyte and write the new ones on Flyte
alone.

Stay tuned for developments on this front :)

------
roberto
Any plans to support Python type annotations, instead of using the @inputs and
@outputs decorators?

~~~
crorella
You mean DataClasses?

~~~
roberto
No, I mean type annotations in the functions defining the tasks. If you look
at their first example, they define a function called
`get_traintest_splitdatabase`, and the input and outputs are annotated using
two decorators.
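
To make the idea concrete, here is a minimal sketch of how a decorator could derive a task's interface from function annotations using only the standard library. This is not flytekit's API: `task`, the `interface` attribute, `Split`, and `split_rows` are hypothetical names chosen for illustration.

```python
import inspect
from typing import List, NamedTuple, get_type_hints


def task(fn):
    """Hypothetical decorator: record input and output types from annotations."""
    hints = get_type_hints(fn)
    output_type = hints.pop("return", None)
    # Refuse functions whose parameters lack annotations, since the
    # interface would otherwise be incomplete.
    params = inspect.signature(fn).parameters
    missing = [name for name in params if name not in hints]
    if missing:
        raise TypeError(f"unannotated parameters: {missing}")
    fn.interface = {"inputs": hints, "output": output_type}
    return fn


class Split(NamedTuple):
    train: List[int]
    test: List[int]


@task
def split_rows(rows: List[int], test_fraction: float) -> Split:
    """Split rows into a training part and a test part."""
    cut = int(len(rows) * (1 - test_fraction))
    return Split(train=rows[:cut], test=rows[cut:])
```

Here `split_rows.interface` carries the same information that separate input and output decorators would declare, while the function signature stays the single source of truth.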

~~~
kumare3
Roberto, this is absolutely one of our goals. When we started, Python 2.7 was
still around. We would love contributions, and of course we will work with
you to adapt it.

------
crorella
"use cached versions of pre-computed artifacts" This is really nice. I wonder
if this also covers partial pre-computations, for example when the same
subquery is reused across several pipelines.

~~~
matthewphsmith
It is definitely possible to leverage Flyte's features here for sharing
partial outputs. It does, however, require structuring your pipelines in a
particular manner for this to be supported natively: create a task which
computes a view or similar, then share that task among different pipelines.
It might be a little verbose, but, in my opinion, it is preferable because it
modularizes pipelines into tasks which are individually tractable for testing
and validation, especially as the tasks and pipelines evolve in a large
organization! Further, the verbosity of authoring such pipelines can be
reduced significantly by making good use of the flytekit library.

Additionally, there is always the option of introducing a custom plugin.
Although it would take more effort up front, one can really let their
imagination run wild and introduce behaviors as needed.
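
The pattern described above, one task computing the shared intermediate result and several pipelines consuming it, can be sketched in plain Python. This is only an illustration of the structure: `functools.lru_cache` stands in for a platform-level cache, and `completed_rides`, `revenue_pipeline`, `volume_pipeline`, and the toy data are all hypothetical.

```python
from functools import lru_cache

executions = []  # tracks how often the shared task actually runs


@lru_cache(maxsize=None)
def completed_rides(day):
    """Shared task: stand-in for a subquery that several pipelines need."""
    executions.append(day)
    # Toy rows of (ride_id, fare, completed); made up for illustration.
    rides = (("a", 12.0, True), ("b", 7.5, False), ("c", 20.0, True))
    return tuple(r for r in rides if r[2])


def revenue_pipeline(day):
    """First pipeline: total fare over completed rides."""
    return sum(fare for _, fare, _ in completed_rides(day))


def volume_pipeline(day):
    """Second pipeline: number of completed rides."""
    return len(completed_rides(day))
```

Both pipelines call `completed_rides` for the same day, but the body runs only once. The cost is that the shared step has to be pulled out into its own task by hand, which is the verbosity trade-off mentioned above.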

~~~
crorella
I see, thanks. I was hoping for this to be more automatic.

