Xvc is a data and file management tool on top of Git. It allows you to track large and numerous data files next to your code and notebooks. You can also configure remote storages on S3 and compatible services (like Minio, DO Spaces, R2, Wasabi) or Rsync connections over SSH. You can send, receive and share these data files selectively. You can create signed URLs for your files on S3-compatible services that are valid for a limited time.
Xvc can also run your pipelines defined by steps (stages) that can be invalidated by various kinds of dependencies, like files, globs, URLs or regexes. If you have a directory with thousands of files that you preprocess and only one of them changes, you can define a command to run only for the changed file (or the changed lines in a CSV file). It runs steps in parallel if there are no dependencies between them.
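To make the "run only for the changed file" idea concrete, here is a minimal Python sketch. It is not Xvc's code or API: the function names, the in-memory file table and the choice of SHA-256 are all my assumptions for illustration. It records a digest per file and reports only the files whose content no longer matches the recorded digest.

```python
import hashlib


def digest(content: bytes) -> str:
    """Content digest used to detect changes (SHA-256 is an assumed choice)."""
    return hashlib.sha256(content).hexdigest()


def changed_items(recorded: dict, current: dict) -> list:
    """Return the names that are new or whose content differs from the record."""
    return sorted(
        name
        for name, content in current.items()
        if recorded.get(name) != digest(content)
    )


# First run: nothing is recorded yet, so every file needs processing.
files = {"a.csv": b"1,2\n", "b.csv": b"3,4\n"}
recorded = {}
first_run = changed_items(recorded, files)
recorded = {name: digest(content) for name, content in files.items()}

# Second run: only b.csv was edited, so only b.csv needs processing.
files["b.csv"] = b"3,5\n"
second_run = changed_items(recorded, files)

print(first_run)
print(second_run)
```

A real tool would read files from disk and probably check cheap metadata (size, modification time) before hashing; the sketch keeps everything in memory so it stays self-contained.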
It can be used for general software development as well. You can keep non-code artifacts along with the code and define dependencies. It can be used for deployment via Git. It's small enough to be used on CI. All features can be used from Python and Jupyter notebooks. User docs are in [3] and development docs are in [4].
Python bindings and example usage are in [2].
You can install for CLI with `cargo install xvc` and for Python with `pip install xvc`.
Data management features are similar to DVC, git-annex, and Git LFS, but Xvc doesn't depend on Git directly. It runs Git commands on your behalf, but this can be disabled, and if the need arises and becomes common enough, another VCS can be used. Pipeline features are similar to DVC's but have more granularity and flexibility.
I experimented with ECS (entity component system) at the core of the software. Modeling the software using entities and components attached to them is, in my experience, much more productive, flexible and maintainable than OOP. There is a state machine to run multiple, possibly interdependent processes in the pipeline. I had to write a directory walker with gitignore support and a file watcher to track file metadata at the OS level while a pipeline is running, and I even experimented with documenting the software with dialogues between a hare and a tortoise.
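For readers who haven't seen ECS outside of games, here is a tiny Python sketch of the general idea. It is not how xvc-ecs is implemented; the `Store` class, its methods and the `Path` and `Size` components are hypothetical names made up for this example. Entities are just integer IDs, components are plain data attached to them, and queries select entities by the components they carry.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Path:
    value: str


@dataclass(frozen=True)
class Size:
    n_bytes: int


class Store:
    """Hypothetical minimal entity-component store."""

    def __init__(self):
        self._ids = count()
        # component type -> {entity id: component value}
        self._components = defaultdict(dict)

    def new_entity(self) -> int:
        """Entities carry no data of their own; they are only IDs."""
        return next(self._ids)

    def attach(self, entity: int, component) -> None:
        """Attach a component to an entity, keyed by the component's type."""
        self._components[type(component)][entity] = component

    def get(self, entity: int, kind):
        """Return the component of the given type, or None if absent."""
        return self._components[kind].get(entity)

    def entities_with(self, *kinds) -> list:
        """Return the entities that have every one of the given component types."""
        sets = [set(self._components[kind]) for kind in kinds]
        return sorted(set.intersection(*sets)) if sets else []


store = Store()
a = store.new_entity()
b = store.new_entity()
store.attach(a, Path("data/a.csv"))
store.attach(a, Size(4))
store.attach(b, Path("data/b.csv"))

print(store.entities_with(Path))        # both entities have a Path
print(store.entities_with(Path, Size))  # only the first one also has a Size
```

The appeal is that adding a new kind of information about a file means adding a new component type, without touching a class hierarchy.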
The larger experiment was, of course, whether I could do this. It's an ambitious project but it's my daily habit now and development in Rust feels nice. I feel as if I'm meeting with my friend in the morning.
I started Xvc as a solo project after leaving my work on DVC's technical docs. (See [5] for a brief history. See [6] for similarities to and differences from DVC.)
I still don't think it's ready for prime time, but this is HN and I wanted to share. My inner voice says 'you shouldn't release before writing all those Python tests' but I told him it's my birthday and it's nice to release something on your birthday.
I'm open to any feedback, questions, and suggestions. I added boxes for feedback and questions to the docs. I don't accept any non-trivial code contributions at the moment. I may change the license of some parts (just the xvc-ecs or xvc-walker crates, for example) to a more "business-friendly" one and don't want to feel bad about it. In any case, it will be free and open for human individuals and it doesn't track anything about you. I don't even track the number of visitors or downloads. I only know when someone stars the repo.
I plan to add more features, like experiment tracking, annotations on binary files that can be used in file operations, a server to manage your data repositories, and maybe a GUI for all of these. I don't intend to pollute my priorities by trying to build a business around it. My priority is development, writing, and keeping my fellow hackers happy.
Cheers from Istanbul.
[0]: https://xvc.dev
[1]: https://github.com/iesahin/xvc
[2]: https://github.com/iesahin/xvc.py
[3]: https://docs.xvc.dev
[4]: https://docs.rs/xvc
[5]: https://emresahin.net/a-brief-history-of-xvc/
[6]: https://docs.xvc.dev/start/from-dvc