Hacker News

Any inputs on Argo Workflows vs. Kubeflow vs. MLflow? Which is better suited?



I will try to be as objective as possible in answering this question, since I am working on a project in a competing space.

Argo Workflows is a pipeline engine that is cloud- and Kubernetes-native. It solves graph-based and multi-step workflows using containers on Kubernetes, and it can be leveraged for ML pipelines as well as other use cases.
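
As a rough illustration (the workflow, step names, and images here are placeholders, not anything from this thread), a minimal Argo Workflow manifest chaining two container steps might look like:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: demo-pipeline-
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: prepare-data          # first step
            template: echo
            arguments:
              parameters: [{name: msg, value: "prepare"}]
        - - name: train-model           # runs after prepare-data
            template: echo
            arguments:
              parameters: [{name: msg, value: "train"}]
    - name: echo
      inputs:
        parameters:
          - name: msg
      container:
        image: alpine:3.7
        command: [sh, -c]
        args: ["echo {{inputs.parameters.msg}}"]
```

Each step is just a container; nesting in the `steps` list controls sequential vs. parallel execution.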

Kubeflow is a large project that has several components: training operators, serving (based on Istio and Knative), metadata (used by TensorFlow Extended, TFX), pipelines, ... and it integrates with other projects. Kubeflow Pipelines uses Argo Workflows as its workflow engine, although I think there are efforts to support other projects such as Tekton, which is also a Google project, and possibly TFX as a DSL for authoring pipelines in Python.

The main focus for MLflow, I think, is tracking ML models and providing an intuitive interface for model deployment and governance. The main strength of MLflow is that it's easy to install and use.

Polyaxon has been used mainly for fast development and experimentation. It has a tracking interface and several integrations for dashboarding, notebooks, and distributed learning. Polyaxon also has native support for some Kubeflow components, e.g. TFJob, PyTorchJob, and MPIJob for distributed learning.

The upcoming Polyaxon release will provide a larger set of integrations for dashboards: in addition to TensorBoard, notebooks, and JupyterLab, users will be able to start and share Zeppelin notebooks, Voilà, Plotly Dash, Shiny, and any custom stateless service that can consume the outputs of another operation.

The new workflow interface focuses mainly on an easy, declarative way to handle DataOps and MLOps; the main idea is to provide a very simple interface for the user to go from a data transformation to training models. Since the component abstraction is based on containers, it can be used for other operations as well, e.g. packaging models and preparing them to be served on other open source projects, cloud providers, or lambda functions. There is also support for some frameworks such as the Dask, Spark, and Flink operators, which could be used as steps in a workflow.
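
Independent of any particular engine, the core of a declarative multi-step workflow is a DAG of named operations resolved into an execution order. A stdlib-only sketch of that idea (the step names are invented for illustration, not any product's API):

```python
# Toy declarative workflow: each operation names the operations it
# depends on, and a topological sort yields a valid execution order.
from graphlib import TopologicalSorter

workflow = {
    "transform": {"extract"},       # transform depends on extract
    "train": {"transform"},
    "package-model": {"train"},     # e.g. prepare the model for serving
    "extract": set(),               # no dependencies: runs first
}

order = list(TopologicalSorter(workflow).static_order())
print(order)  # ['extract', 'transform', 'train', 'package-model']
```

A real engine schedules each resolved step as a container; the declarative spec only states the dependencies.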

For hyperparameter tuning, the platform currently has grid search, random search, Hyperband, and Bayesian optimization. One of the major changes in the next release is a new interface for people to create their own algorithms, and a mapping interface to traverse a search space provided by the user or based on the output of another operation.
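
To make the first two search strategies concrete, here is a stdlib-only sketch of grid search vs. random search over a toy objective (not Polyaxon's API, just the underlying idea; the space and objective are invented):

```python
# Grid search enumerates the full cartesian product of the space;
# random search samples a fixed budget of configurations from it.
import itertools
import random

space = {"lr": [0.001, 0.01, 0.1], "batch_size": [32, 64]}

def objective(cfg):
    # Toy stand-in for a real training/validation score (higher is better).
    return -(cfg["lr"] - 0.01) ** 2 - (cfg["batch_size"] - 64) ** 2 / 1e4

def grid_search(space):
    keys = list(space)
    candidates = [dict(zip(keys, vals))
                  for vals in itertools.product(*space.values())]
    return max(candidates, key=objective)

def random_search(space, budget=4, seed=0):
    rng = random.Random(seed)
    candidates = [{k: rng.choice(v) for k, v in space.items()}
                  for _ in range(budget)]
    return max(candidates, key=objective)

print(grid_search(space))  # {'lr': 0.01, 'batch_size': 64}
```

Hyperband and Bayesian optimization refine the same loop: the former by early-stopping poor configurations, the latter by choosing the next candidate from a model of past scores.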


Kubeflow, unless I’m missing some things, is for TensorFlow pipelines; if you’re not using TF, or it’s not the only thing you use, it’s not ideal.

I thought MLflow was a Spark thing, and we’re trying to migrate off of Spark/Databricks due to the resource inefficiencies of Spark (at our scale) and the maintenance nightmare that Python notebooks are causing us.

Argo is just a container workflow tool, not ML-specific. We’re planning on using Argo for the data engineering parts, and Polyaxon for the ML training parts because of its convenient monitoring and hyperparameter search tools.


Hi! Co-founder of Kubeflow here - definitely not TensorFlow only! You can see [1] many different repos and operators. The nice part about Argo for us is that it let us build an ML-specific DSL that was also Kubernetes-native.

[1] https://github.com/kubeflow/



