
Show HN: DVC 1.0 release. 5 lessons from 3 years of building an open-source ML tool - dmpetrov
Hey HN, creator of DVC here!

DVC (https://dvc.org/) is known as Git for data projects. Technically, DVC codifies your data and machine learning pipelines as text metafiles (with pointers to the actual data in S3/GCP/Azure/SSH) while you use Git for the actual versioning. DevOps folks call this approach GitOps or, more specifically in this case, DataOps or MLOps.

We’ve been working towards 1.0 since we started 3 years ago. What began as my pet project now has 100+ code contributors, 100+ documentation contributors, and thousands of users.

Our community has taught us a lot - here are some of the biggest lessons:

1. Users say the serverless and distributed nature of DVC (inherited from the underlying Git) is one of its "killer features".

2. To share ML projects within and between teams, it’s not enough to track only files and pipelines. You also need metrics, plots, and hyperparameter tracking. In DVC 1.0 we implemented hyperparameter diffs, metrics diffs, and plot diffs right from Git history.

3. In DataOps, data transfer optimization is huge: large deep learning models, millions of images in datasets, etc. We doubled down on optimizing data transfer in 1.0.

4. ML pipelines evolve faster than data engineering pipelines and need to be easy to change. In 1.0, we’ve simplified the pipeline metafile format.

5. More and more teams use DVC as a part of CI/CD for ML and together with other MLOps tools. DVC is used under the hood in the CD4ML tool that was described in the canonical post on Martin Fowler’s blog: https://martinfowler.com/articles/cd4ml.html. We built 1.0 with CI/CD users in mind.

More details at https://dvc.org/blog/dvc-1-0-release.

Happy to answer any questions here or in the DVC Discord chat: https://dvc.org/chat.
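
To make the "text metafiles with pointers to actual data" idea concrete, here is a minimal sketch of the general technique, not DVC's actual file format or code. It assumes a pointer is a small JSON file holding a SHA-256 content hash; the names `write_pointer`, `is_current`, `demo`, and the `.pointer.json` suffix are all hypothetical. The small pointer file is what you would commit to Git, while the large data file lives elsewhere.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def write_pointer(data_path: Path) -> Path:
    """Write a small text metafile that identifies data_path by content hash."""
    digest = hashlib.sha256(data_path.read_bytes()).hexdigest()
    pointer = {
        "path": data_path.name,
        "sha256": digest,
        "size": data_path.stat().st_size,
    }
    # Hypothetical naming scheme: data.csv -> data.csv.pointer.json
    pointer_path = data_path.parent / (data_path.name + ".pointer.json")
    pointer_path.write_text(json.dumps(pointer, indent=2, sort_keys=True) + "\n")
    return pointer_path


def is_current(pointer_path: Path) -> bool:
    """Return True if the data file still matches the hash in its pointer."""
    pointer = json.loads(pointer_path.read_text())
    data_path = pointer_path.parent / pointer["path"]
    digest = hashlib.sha256(data_path.read_bytes()).hexdigest()
    return digest == pointer["sha256"]


def demo() -> tuple[bool, bool]:
    """Create a data file, record a pointer, then change the data."""
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data.csv"
        data.write_text("a,b\n1,2\n")
        pointer = write_pointer(data)
        before = is_current(pointer)   # data unchanged since the pointer was written
        data.write_text("a,b\n1,2\n3,4\n")
        after = is_current(pointer)    # data changed, pointer is now stale
        return before, after
```

Because the pointer is plain text, a change to the data shows up in Git history as a one-line hash change, which is roughly the property that makes diffs "right from Git history" possible.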
======
jimmyvalmer
How do you make money? Sorry to be that guy; I wouldn't ask this if you didn't
claim "Happy to answer _any_ questions here".

~~~
dmpetrov
Good question! We build separate products (that use DVC) for monetization. No
plans to monetize DVC.

An analogy: Git is free for versioning, while GitHub and GitLab are the
monetized products built around it.

------
braza
Is it possible to use DVC with the new implementation of GitHub Actions? I
checked the website and it looks like it is supported, but I'd like to know
more about how you guys are getting ready for this new CI/CD feature.

~~~
rhythmvertigo
We have a lot in store here: we are rolling out a new tool for CI/CD soon that
works with GitHub Actions & GitLab CI. Adding run-cache to DVC 1.0 is just one
way of preparing for more CI/CD uses of DVC. (FYI, I am part of the DVC team.)

