
Show HN: Continuous Machine Learning – CI/CD for Machine Learning Projects - rhythmvertigo
http://cml.dev
======
rhythmvertigo
Hi, I'm one of the project creators. Continuous Machine Learning (CML) is an
open source project to help ML projects use CI/CD with GitHub Actions and
GitLab CI
([https://github.com/iterative/cml](https://github.com/iterative/cml)).

CML automatically generates human-readable reports with metrics and data viz
in every pull/merge request, and helps you use storage and GPU/CPU resources
from cloud services. CML addresses three hurdles for making ML compatible with
CI:

1\. In ML, pass/fail tests aren’t enough. Understanding model performance
might require data visualizations and detailed metric reports. CML
automatically generates custom reports after every CI run with visual elements
like tables and graphs. You can even get a Tensorboard.dev link as part of
your report.

2\. Dataset changes need to trigger feedback just like source code. CML works
with DVC so dataset changes trigger automatic training and testing.

3\. Hardware for ML is an ecosystem in itself. We’ve developed use cases with
CML and Docker Machine to automatically provision and deploy cloud compute
instances (CPU & GPU) for model training.
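
To make this concrete, here's roughly what a minimal CML workflow looks like in GitHub Actions (a sketch based on our examples; `train.py` and `metrics.txt` stand in for your own training script and its output):

```yaml
name: train-my-model
on: [push]
jobs:
  run:
    runs-on: [ubuntu-latest]
    # Example image with CML preinstalled; you can use your own instead.
    container: docker://dvcorg/cml-py3:latest
    steps:
      - uses: actions/checkout@v2
      - name: cml_run
        env:
          repo_token: ${{ secrets.GITHUB_TOKEN }}
        run: |
          python train.py                 # your training script
          cat metrics.txt >> report.md    # collect metrics into a report
          cml-send-comment report.md      # post the report on the PR
```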

Our philosophy is that ML projects, and MLOps practices, should be built on
top of traditional software tools and CI systems, not as a separate
platform. Our goal is to extend DevOps’ wins from software development to ML.
Check out our project site ([https://cml.dev](https://cml.dev)) and repo, and
please let us know what you think!

~~~
doppenhe
Deployment, inference and management can participate in this as well!

Here is the missing part for a total e2e solution:
[https://github.com/marketplace/actions/algorithmia-ci-cd](https://github.com/marketplace/actions/algorithmia-ci-cd)

{disclaimer, we built this Github action}

~~~
davidortega
Hi doppenhe, we have that part already implemented using cml-send-github-check
and dvc metrics diff. You can compare the metric you prefer with dvc and
then just set the status of the GitHub check, uploading your full report. Of
course, you can also fail the workflow as your GitHub action does, but I think
it's more useful to see it as a report in the check.

disclaimer: I work with CML
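
(As a sketch, the flow described above might look roughly like this as a CI step; the command names are from the current CML and DVC releases, and `master` is just an example baseline revision:)

```shell
# Compare metrics on this branch against the baseline, appending a
# Markdown diff table to the report:
dvc metrics diff --show-md master >> report.md

# Post the report as the status of a GitHub check:
cml-send-github-check report.md
```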

------
ishcheklein
Hey! Disclaimer - I'm one of the DVC maintainers :) Super excited for the team
on this release!

For the last two years we have seen over and over again how our users take DVC
and use it inside GitLab, GitHub, etc. This product was born partially as a
result of those discussions, and partially from an initial vision for the ML
tools ecosystem - something Hashicorp-like.

Having a software engineering background, I really hope that integrating the
ML workflow into engineering tools will be the future of this space. And with
CML and other tools (e.g. [https://github.blog/2020-06-17-using-github-actions-for-mlops-data-science/](https://github.blog/2020-06-17-using-github-actions-for-mlops-data-science/))
we see this happening.

~~~
sytse
Very cool to see this!

I've added integrating CML closer into GitLab to our direction page with
[https://gitlab.com/gitlab-com/www-gitlab-com/-/merge_requests/55367/diffs](https://gitlab.com/gitlab-com/www-gitlab-com/-/merge_requests/55367/diffs)

If you want to show the results in the Merge Request itself, like we do for
things like test results and security scans, please let us know. Open an issue,
and if there is no response, my Twitter handle is @sytses.

~~~
ishcheklein
Hey Sytse! Thanks for the kind words. We're very interested in a deep
integration with GitLab :) Can you share some examples of these security scan
reports, please? Right now, we return CML reports as comments in Merge
Requests like this:
[https://gitlab.com/iterative.ai/cml-cloud-case/-/merge_requests/3](https://gitlab.com/iterative.ai/cml-cloud-case/-/merge_requests/3).
We'd appreciate any tips or suggestions.

~~~
sytse
Awesome! Other replies to your comment already gave suggestions. Consider
asking them for a call if you need more information.

~~~
ishcheklein
Thanks! We'll check it out and be in touch.

------
toisanji
Very interesting, I've been looking for something like this to add to our ML
pipeline. A few questions:

1) can we see examples of generated reports?

2) what happens if training fails?

3) what kind of metrics can it graph? can we have it track our custom metrics?

4) can we connect with external services like webhooks, Slack, and other
integrations?

5) is this a docker technology, or how does it deal with images and
dependencies?

Great work!

~~~
rhythmvertigo
Thanks, and good questions!

1\. Yes! Let me link some reports and example repos:

\- A basic classification problem with scikit learn:
[https://github.com/iterative/cml_base_case/pull/2](https://github.com/iterative/cml_base_case/pull/2)

\- CML with DVC & Vega-Lite graphs:
[https://github.com/iterative/cml_dvc_case/pull/4](https://github.com/iterative/cml_dvc_case/pull/4)

\- Neural style transfer with EC2 GPU:
[https://github.com/iterative/cml_cloud_case/pull/2](https://github.com/iterative/cml_cloud_case/pull/2)

2\. If training fails, you'll be notified that your run failed in the GitHub
Action dashboard (or GitLab CI/CD dashboard). See here for some real-life
examples of failure ;):
[https://github.com/iterative/cml_cloud_case/actions](https://github.com/iterative/cml_cloud_case/actions)

3\. CML reports are markdown documents, so you can write any kind of text to
them. If your metrics are output in a file `metrics.txt`, you can have your
runner execute `cat metrics.txt >> report.md` and then have CML pass on the
report to GitHub/GitLab. Likewise, any graphing library is supported because
you can add standard image files (.png, .jpg) to the report. So custom metrics
and custom graphs. We like DVC for managing and plotting metrics, but we're
biased because we also maintain it.
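
For example, a report-building step could look something like this (the file names are hypothetical, and the final cml-send-comment call is the part that actually posts it to your PR/MR):

```shell
# Pretend training wrote this metrics file:
echo "accuracy: 0.92" > metrics.txt

# Assemble a Markdown report from text and images:
echo "## Training metrics" > report.md
cat metrics.txt >> report.md
echo "![confusion matrix](confusion_matrix.png)" >> report.md

# Then hand the report to CML to post on the PR/MR:
# cml-send-comment report.md
```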

4\. Yep, GitHub Actions is pretty powerful and flexible. Works with whatever
external services you can connect to your Action!

5\. It's not strictly a Docker technology. We use Docker images preinstalled
with the CML library in our examples, but you can just install the library
with npm in your own image.
[https://github.com/iterative/cml#using-your-own-docker-image](https://github.com/iterative/cml#using-your-own-docker-image)
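
As a sketch, extending your own image can be a couple of Dockerfile lines (assuming the npm package name from the CML repo, and that your base image already has Node.js):

```dockerfile
FROM your-training-image:latest

# CML is distributed as an npm package; Node.js is assumed present.
RUN npm install -g @dvcorg/cml
```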

Let me know if there's anything else I can tell you about!

------
bigfoot675
I think taking the approach of "help[ing] your team make informed, data-driven
decisions" through generating reports is valuable here. In my opinion, it goes
too far if we start continuously deploying ML code like it's a SWE project. To
take an example in the case of autonomous vehicles, pushing continuous updates
to perception modules without thoroughly exploring the ramifications of an
update could be potentially catastrophic.

Obviously we can't predict every error by thinking hard, but datasets will
never serve as a full representation of what models might experience in the
real world. Continuous deployment to an ML model could affect undefined
behavior in unpredictable ways.

~~~
rhythmvertigo
Definitely- automatic model deployment would only work for a subset of
applications. We think a lot more about having models available as candidates
for deployment, which can then be inspected by domain experts on a team.

One of our motivations for building visual reports that appear as comments
in a pull request is giving teams metrics & info to discuss when deciding if
a merge is right. That way, the automated part is training and testing, but the
decision making is human (i.e., data scientists whose skills are better used
interpreting models & data than running repetitive training scripts).

------
tknaup
ML is a relatively young field, and decades behind Software Engineering in
terms of best practices for running production systems. CI/CD massively
improved the innovation cycle time and quality of production software, and I
believe it is key for building robust production ML systems as well. CML looks
like a really easy-to-use product for bringing CI/CD to ML projects.

~~~
obowersa
Just something I wanted to share which might be of interest on this side of
things.

The CD Foundation has a SIG around MLOps which is pretty active and has some
awesome folk participating.

For anyone who's interested in this space, there's some more detail here:
[https://cd.foundation/blog/2020/02/11/announcing-the-cd-foundation-mlops-sig/](https://cd.foundation/blog/2020/02/11/announcing-the-cd-foundation-mlops-sig/)

------
tnachen
Awesome to see a GitHub-native workflow for CI/CD in the ML space! This team
is the closest I've seen to a Hashicorp for ML.

~~~
rhythmvertigo
Thanks! Hashicorp is our inspiration :D

------
ayanb9440
Very exciting! I used this team's previous product (DVC) at a research lab at
Caltech. This looks like a very useful tool.

~~~
rhythmvertigo
Out of curiosity, what kind of research? I'm very interested in getting DVC
and Git more broadly into academic research circles, but finding a lot of
barriers in my home field.

------
rkaplan
Longtime DVC user here - this is going to be so helpful. We use DVC for all of
our model and data versioning, but what's been missing is the ability to
cleanly integrate that into our CI workflow. Looks like that's solved now! The
cml.yaml syntax also looks quite nice, very easy to follow. Looking forward to
trying this out.

~~~
rhythmvertigo
Really glad to hear that, rkaplan. Let us know if we can be of any help!

------
m0sth8
CML looks really awesome. I've been on one of your online meetups. Are you
planning to host more in the future? It would be great to learn real
production use cases!

~~~
m0sth8
Btw, here's a recording of the first DVC meetup:
[https://www.youtube.com/watch?v=19GMtrFykSU](https://www.youtube.com/watch?v=19GMtrFykSU)
I liked it a lot. Highly recommended!

