Hacker News
MLflow: An Open Source Machine Learning Platform (databricks.com)
135 points by r4um on June 5, 2018 | 28 comments


If you are looking for a dependable, scalable, closed-source option, check out http://www.comet.ml (the thing I work on).

The focus with Comet.ml is on experiment tracking and hyperparameter optimization rather than model deployment. We make it very easy to compare your experiment results, code, and hashed datasets for better reproducibility.

We have a one-line integration with your existing machine learning code and make it stupid simple to start tracking your experiments.

All you do is:

  > from comet_ml import Experiment
  > experiment = Experiment(api_key="MY_API_KEY")
_boom_

Comet.ml supports many libraries (keras, tensorflow, scikit-learn, custom-built code spaghetti, and everything else that makes you an ML wizard/unicorn/armored flaming hippopotamus).

  ++ It's free for public projects and academics.
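To make the "hashed datasets" idea concrete: the general technique is to record a content digest of the data file next to each experiment, so two runs can be checked for identical inputs. The following is only a minimal sketch of that technique using the standard library. It is not Comet.ml's actual mechanism, and the function name `hash_dataset` is made up for illustration.

```python
import hashlib
from pathlib import Path


def hash_dataset(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file (hypothetical helper).

    The file is read in chunks so large datasets do not need to fit in memory.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Storing the returned digest with the run's parameters and metrics lets you tell later whether two experiments saw byte-identical data, without keeping a copy of the data itself.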


What are the current open source alternatives to MLflow?



The open source alternatives you list seem to only provide experiment logging. MLflow seems to support more (such as model deployment).

Not to claim that the deployment processes are _good_, just that MLflow seems more general than the open source alternatives listed here.


How about SageMaker? Can we include it in this list? I played with SageMaker some time ago, and it helps you build a whole pipeline to host your models, in addition to hosting your notebooks and bridging the gap between data scientists and data engineers.


Anecdotally, we considered using the hosted versions of Jupyter and Apache Zeppelin that are part of AWS SageMaker and EMR. We couldn't figure out a simple/familiar workflow for keeping the notebooks under version control. So, we agreed to run the notebooks locally, use a familiar Git-based workflow, and interact with the AWS infrastructure through the local notebook instances.


Does Zeppelin work naturally with git? I've been struggling to get the right setup with just Jupyter


Well, good question. The file format for Jupyter is not ideal for 'code craftsmanship', as pointed out by another comment. There are utilities to strip out some of the metadata from the Jupyter files, such as rendered output and run counters, but that is a trade-off to be decided by your team:

https://github.com/kynan/nbstripout
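For anyone wondering what "stripping" amounts to, here is a rough sketch of the idea, assuming the nbformat 4 JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`). It is not nbstripout's actual implementation, and `strip_notebook` is a made-up name for illustration.

```python
import json


def strip_notebook(nb):
    """Return a copy of a notebook dict without outputs or run counters.

    Hypothetical helper; assumes the nbformat 4 structure.
    """
    stripped = json.loads(json.dumps(nb))  # deep copy via a JSON round-trip
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []             # drop rendered output
            cell["execution_count"] = None   # drop the run counter
    return stripped
```

You would load the `.ipynb` file with `json.load`, pass the dict through this function, and write it back before committing. The diff then only shows source changes, at the cost of losing the rendered results in the repository.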


For deep learning, deepdetect can be useful in both the dev and prod phases.


Polyaxon, https://github.com/polyaxon/polyaxon, is an alternative project to MLflow that is also open source.

disclaimer: I am the author of Polyaxon.


I've been using Polyaxon to train models for spaCy. It uses Kubernetes, so it's easy to set it up to work very efficiently on Google Compute Engine. For instance, you can set it to use pre-emptible instances, which makes the experiments very cheap.


Any plans to support scikit?


You can use scikit-learn, or any other library, to train your models.


I would imagine Kubeflow (https://github.com/kubeflow/kubeflow) would be complementary (e.g. run MLflow on top of it) - they claim platform neutrality.

Disclosure: I work at Google on Kubeflow.


Yes, Kubeflow is a very promising platform for ML lifecycle management on Kubernetes. The combination of Kubernetes, Istio, and Kubeflow could enable other higher-layer workflow tools (MLflow, H2O, etc.). This space is early.


Does MLflow now support ML lifecycle management? I think we can find a way to combine Kubeflow and MLflow.


https://www.mlflow.org is open source.


I understand. However, there can be more than one open source ML platform. Hence, asking for open source alternatives to MLflow doesn't imply that MLflow isn't open source.


https://pachyderm.io is a data science infrastructure framework that has both open source and enterprise versions.


RiseML seems to be another alternative: https://riseml.com/

Not open source though


ClassifyBot https://github.com/allisterb/ClassifyBot

Designed to automate and make repeatable different stages in classification pipelines. Written in .NET but agnostic about language or framework. Embeds a Python interpreter and can interface with Java or R.

Disclaimer: I am a walrus


A couple more alternatives:

H2O: https://www.h2o.ai/h2o/

DataRobot: https://www.datarobot.com/


Also not open source, but looks similar to what ParallelM does http://www.parallelm.com


There is also Orange, although I am not sure it is 100% related to MLflow. Orange is a joy to use though, so even if it doesn't solve all the problems solved by MLflow, it's worth mentioning in this context.


So, what can MLflow do more efficiently than TensorFlow?


I can't tell if you didn't read the article or are asking for a comparison between this and, say, TF's Experiment class.


It isn’t a deep learning framework. It’s closer to a LIMS system.


FYI typo: mlflow.log_atrifact("roc.png")



