
Ask HN: Python package for managing science datasets - kelsolaar
I was looking for a Python package to manage science datasets, mainly description, download, extraction, on-disk handling, etc. ML frameworks tend to roll their own:

- https://github.com/nilearn/nilearn/blob/master/nilearn/datasets/utils.py
- https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/core/download/download_manager.py
- https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/datasets/base.py
- https://github.com/pytorch/vision/blob/master/torchvision/datasets/mnist.py

I was hoping that somebody had made a generic package for this purpose so that I don't roll yet another one.
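
For readers wondering what "description, download, extraction, on-disk handling" amounts to in practice, here is a rough standard-library sketch of the pattern those framework helpers tend to reimplement. It is not taken from any of the linked projects; `DatasetRecord`, `sha256_of`, `fetch`, and the cache layout are hypothetical names chosen for illustration, and it assumes the archive format can be inferred from the file extension in the URL.

```python
import hashlib
import shutil
import urllib.request
from dataclasses import dataclass
from pathlib import Path
from urllib.parse import urlparse


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical description of one remote archive."""
    name: str    # directory name used inside the local cache
    url: str     # where the archive lives (http://, https://, file://, ...)
    sha256: str  # expected hex digest of the archive


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large archives are not read into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch(record: DatasetRecord, root: Path) -> Path:
    """Download, verify, and extract `record` under `root`.

    Returns the extracted directory. An existing directory is treated
    as a cache hit and nothing is downloaded again.
    """
    target = root / record.name
    if target.is_dir():
        return target

    root.mkdir(parents=True, exist_ok=True)
    # Keep the original file name so the archive format can be inferred.
    archive = root / Path(urlparse(record.url).path).name

    with urllib.request.urlopen(record.url) as response, archive.open("wb") as out:
        shutil.copyfileobj(response, out)

    if sha256_of(archive) != record.sha256:
        archive.unlink()  # do not keep a corrupt or tampered download
        raise ValueError(f"checksum mismatch for {record.name}")

    shutil.unpack_archive(str(archive), str(target))
    return target
```

A call would look like `fetch(DatasetRecord("toy", url, digest), Path.home() / ".cache" / "datasets")`. A real package would also want resumable downloads, atomic extraction, and versioned metadata, which is presumably why each framework ends up with its own variant.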
======
amirouche
I started a list of such software: [https://github.com/awesome-data-distribution/awesome-data-di...](https://github.com/awesome-data-distribution/awesome-data-distribution#awesome-data-distribution)

~~~
kelsolaar
That is awesome (pun intended), thanks!

------
swaroop
How about [https://dvc.org/](https://dvc.org/) ?

~~~
kelsolaar
Interesting, I did not think about it this way at all. I will take a look.

