
DVC 1.0 release: new features for MLOps - cl42
https://dvc.org/blog/dvc-1-0-release
======
dmpetrov
Hey HN, creator of DVC here!

DVC ([https://dvc.org/](https://dvc.org/)) is known as Git for data projects.
Technically, DVC codifies your data and machine learning pipelines as text
metafiles (with pointers to actual data in S3/GCP/Azure/SSH) while you use Git
for the actual versioning. DevOps folks call this approach GitOps or, more
specifically in this case, DataOps or MLOps.
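
To make the "text metafile with a pointer" idea concrete, here is a minimal Python sketch. It is an illustration only and not DVC's actual metafile format: the function name `write_pointer`, the `.pointer.json` suffix, the JSON fields, and the `s3://example-bucket/cache` URL are all hypothetical. The point is that the small text file is what goes into Git, while the large data file lives in remote storage under a content-derived key.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def write_pointer(data_path: Path, remote_url: str) -> Path:
    """Write a small text metafile that points at a large data file.

    Hypothetical format for illustration; not what DVC writes on disk.
    """
    # Content hash: identifies this exact version of the data.
    digest = hashlib.sha256(data_path.read_bytes()).hexdigest()
    pointer = {
        "path": data_path.name,
        "sha256": digest,
        "size": data_path.stat().st_size,
        # Assumed layout: remote objects are keyed by their content hash.
        "remote": f"{remote_url}/{digest[:2]}/{digest[2:]}",
    }
    meta_path = data_path.with_name(data_path.name + ".pointer.json")
    meta_path.write_text(json.dumps(pointer, indent=2, sort_keys=True) + "\n")
    return meta_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "images.bin"
        data.write_bytes(b"pretend this is a very large dataset")
        meta = write_pointer(data, "s3://example-bucket/cache")
        # Only this small text file would be committed to Git.
        print(meta.read_text())
```

Because the metafile changes whenever the data's hash changes, ordinary Git history doubles as the version history of the data.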

We’ve been working towards 1.0 since we started 3 years ago. What began as my
pet project now has 100+ code contributors, 100+ documentation contributors,
and thousands of users.

Our community has taught us a lot - here are some of the biggest lessons:

1\. Users say the serverless and distributed nature of DVC (inherited from the
underlying Git) is one of its "killer features".

2\. To share ML projects within and between teams, it’s not enough to track
only files and pipelines. You also need metric, plot, and hyperparameter
tracking. In DVC 1.0 we implemented hyperparameter, metric, and plot diffs
right from Git history.

3\. In DataOps, data transfer optimization is huge: think large deep learning
models, datasets with millions of images, etc. We doubled down on optimizing
data transfer in 1.0.

4\. ML pipelines evolve faster than data engineering pipelines and need to be
easy to change. In 1.0, we’ve simplified the pipeline metafile format.

5\. More and more teams use DVC as a part of CI/CD for ML and other MLOps
tools. DVC is used under the hood in the CD4ML tool that was described in the
canonical post on Martin Fowler’s blog:
[https://martinfowler.com/articles/cd4ml.html](https://martinfowler.com/articles/cd4ml.html).
We built 1.0 with CI/CD users in mind.
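
As a rough illustration of the hyperparameter diffs mentioned in point 2, the sketch below compares two nested parameter dictionaries, such as the contents of a params file at two different Git commits. This is not DVC's implementation; `flatten` and `diff_params` are hypothetical helper names, and the sample values are made up.

```python
from typing import Any, Dict, Tuple


def flatten(params: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested dicts into dotted keys, e.g. {"train": {"lr": 1}} -> {"train.lr": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in params.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def diff_params(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {name: (old_value, new_value)} for every parameter that differs.

    A missing parameter is reported as None on the side where it is absent.
    """
    old_flat, new_flat = flatten(old), flatten(new)
    changes: Dict[str, Tuple[Any, Any]] = {}
    for name in sorted(old_flat.keys() | new_flat.keys()):
        before, after = old_flat.get(name), new_flat.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


if __name__ == "__main__":
    # Made-up parameter sets standing in for two commits.
    old = {"train": {"lr": 0.01, "epochs": 10}, "seed": 1}
    new = {"train": {"lr": 0.001, "epochs": 10, "dropout": 0.5}, "seed": 1}
    for name, (before, after) in diff_params(old, new).items():
        print(f"{name}: {before} -> {after}")
```

Flattening to dotted keys keeps the diff readable when parameters are grouped by pipeline stage.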

More details on
[https://dvc.org/blog/dvc-1-0-release](https://dvc.org/blog/dvc-1-0-release).

Happy to answer any questions.

PS: I've already posted this in another thread, but I'm repeating it here.

