MLflow: An Open Source Machine Learning Platform

rememberlenny · on June 12, 2018

If you are looking for a dependable, scalable, closed-source option, check out http://www.comet.ml (the thing I work on).

The focus with Comet.ml is on experiment tracking and hyperparameter optimization rather than model deployment. We make it very easy to compare your experiment results, code, and hashed datasets for better reproducibility.

We have a one-line integration with your existing machine learning code and make it stupid simple to start tracking your experiments.

All you do is:

  > from comet_ml import Experiment
  > experiment = Experiment(api_key="MY_API_KEY")

_boom_

Comet.ml supports many libraries (Keras, TensorFlow, scikit-learn, custom-built code spaghetti, and everything else that makes you an ML wizard/unicorn/armored flaming hippopotamus).

  ++ It's free for public projects and academics.
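
On the "hashed datasets" point above: the comment does not say how Comet.ml hashes data, so the following is only a minimal sketch of the general idea, assuming the dataset is a single file on disk. The function name `hash_dataset` and the choice of SHA-256 are assumptions made for illustration, not Comet.ml's API. If two runs record the same digest, they read byte-identical data.

```python
import hashlib


def hash_dataset(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`.

    Hypothetical helper: reads the file in chunks so large datasets
    do not have to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Storing the digest next to each experiment's metrics is one way to notice when a result changed because the data changed rather than the code.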

tchalla · on June 5, 2018

What are the current open source alternatives to MLflow?

hcrisp · on June 5, 2018

Sacred (https://github.com/IDSIA/sacred)

FGLab (https://kaixhin.github.io/FGLab/)

Metricmachine (https://github.com/danielwaterworth/metricmachine)

Non-open source:

Neptune (http://neptune.ml)

Aetros (https://aetros.com/trainer)

sbpayne · on June 6, 2018

The open source alternatives you list seem to only provide experiment logging. MLflow seems to support more (such as model deployment).

Not to claim that the deployment processes are _good_, just that MLFlow seems more general than these open source alternatives listed here.

__bee · on June 5, 2018

How about SageMaker? Can we include it in this list? I played with SageMaker some time ago, and it helps you build a whole pipeline to host your models, in addition to hosting your notebooks and bridging the gap between data scientists and data engineers.

brylie · on June 6, 2018

Anecdotally, we considered using the hosted versions of Jupyter and Apache Zeppelin that are part of AWS SageMaker and EMR. We couldn't figure out a simple/familiar workflow for keeping the notebooks under version control. So, we agreed to run the notebooks locally, use a familiar Git-based workflow, and interact with the AWS infrastructure through the local notebook instances.

garysieling · on June 6, 2018

Does Zeppelin work naturally with git? I've been struggling to get the right setup with just Jupyter.

brylie · on June 9, 2018

Well, good question. The file format for Jupyter is not ideal for 'code craftsmanship', as pointed out by another comment. There are utilities to strip out some of the metadata from the Jupyter files, such as rendered output and run counters, but that is a trade-off to be decided by your team:

https://github.com/kynan/nbstripout
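
To make the trade-off concrete, here is a rough sketch of what stripping a notebook amounts to. It is not nbstripout's implementation; it assumes the notebook is in the nbformat 4 JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`), and the names `strip_notebook` and `strip_file` are made up for this example.

```python
import json


def strip_notebook(nb):
    """Return a copy of a notebook dict without outputs or run counters."""
    stripped = json.loads(json.dumps(nb))  # deep copy via a JSON round trip
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop rendered output
            cell["execution_count"] = None  # drop the run counter
    return stripped


def strip_file(path):
    """Rewrite the .ipynb file at `path` in place with outputs removed."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(strip_notebook(nb), f, indent=1, ensure_ascii=False)
        f.write("\n")
```

What you gain is smaller, more readable diffs; what you lose is the rendered results in the committed file, so anyone reviewing the notebook has to re-run it to see them.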

pilooch · on June 6, 2018

For deep learning, deepdetect can be useful in both the dev and prod phases.

mmq · on June 5, 2018

Polyaxon, https://github.com/polyaxon/polyaxon, is an alternative project to MLFlow that is also open source.

disclaimer: I am the author of Polyaxon.

syllogism · on June 5, 2018

I've been using Polyaxon to train models for spaCy. It uses Kubernetes, so it's easy to set it up to work very efficiently on Google Compute Engine. For instance, you can set it to use pre-emptible instances, which makes the experiments very cheap.

brylie · on June 6, 2018

Any plans to support scikit?

mmq · on June 6, 2018

You can use scikit-learn, or any other library, to train your models.

TheIronYuppie · on June 5, 2018

I would imagine Kubeflow (https://github.com/kubeflow/kubeflow) would be complementary (e.g. run MLflow on top of it) - they claim platform neutrality.

Disclosure: I work at Google on Kubeflow.

ddutta · on June 6, 2018

Yes, Kubeflow is a very promising platform for ML lifecycle management on Kubernetes. The combination of Kubernetes, Istio, and Kubeflow could enable other higher-layer workflow tools (MLflow, H2O, etc.). This space is early.

xianghonglee · on June 6, 2018

Does MLflow now support ML lifecycle management? I think we can find a way to combine Kubeflow and MLflow.

agnokapathetic · on June 5, 2018

https://www.mlflow.org is open source.

tchalla · on June 5, 2018

I understand. However, there can be more than one open source ML platform. Hence, asking for open source alternatives to MLflow doesn't imply MLflow isn't open source.

daveguy · on June 6, 2018

https://pachyderm.io is a data science infrastructure framework that has both open source and enterprise versions.

marcuniq · on June 6, 2018

RiseML seems to be another alternative: https://riseml.com/

Not open source though

allisterb · on June 6, 2018

ClassifyBot https://github.com/allisterb/ClassifyBot

Designed to automate and make repeatable different stages in classification pipelines. Written in .NET but agnostic about language or framework. Embeds a Python interpreter and can interface with Java or R.

Disclaimer: I am a walrus

shakedown1 · on June 6, 2018

A couple more alternatives:

H2o: https://www.h2o.ai/h2o/

Data Robot: https://www.datarobot.com/

phltech · on June 7, 2018

Also not open source, but this looks similar to what ParallelM does: http://www.parallelm.com

axilmar · on June 6, 2018

There is also Orange, although I am not sure it is 100% related to MLflow. Orange is a joy to use though, so even if it doesn't solve all the problems solved by MLflow, it's worth mentioning in this context.

riku_iki · on June 5, 2018

So, what can MLflow do more efficiently than TensorFlow?

nicolewhite · on June 6, 2018

I can't tell if you didn't read the article or are asking for a comparison between this and, say, TF's Experiment class.

stochastic_monk · on June 5, 2018

It isn’t a deep learning framework. It’s closer to a LIMS system.

tafycent · on June 5, 2018

FYI typo: mlflow.log_atrifact("roc.png")