
Ask HN: Options for distributed peer-to-peer image datasets? - lovelearning
I want to start an image dataset that has wide uses and can solve some unsolved problems.

Requirements:

#1 I need others to contribute personal photos and videos.

#2 Since the number and file sizes of relevant photos and videos are likely to be quite high, I think an architecture where photos/videos and their annotations all stay on contributors' own local machines is better than expecting people to upload GBs of data to some central location.

#3 I'd like contributors to always retain access control of their photos and videos - they can revoke subsets of their files from the dataset at any point.

#4 It'll probably also require creating an annotation solution that can distribute annotation tasks to volunteers, while the photos and videos still remain on local machines; only temporary copies with limited access may get uploaded centrally until annotated, and are then deleted.

Questions:

Does some software to do all this - or some of it - already exist? If not, is something like IPFS a good enough storage solution for these requirements? Any other suggestions?

I'm not concerned about issues with distributed learning/compute for now. If the dataset is good enough, eventually some solution will probably emerge.
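
To make requirements #2 and #3 a bit more concrete, here is a minimal sketch of a contributor-side manifest. It assumes files are identified by a SHA-256 hash of their contents and that the index lives on the contributor's own machine. The class and method names (ContributorManifest, share, revoke, resolve) are hypothetical and are not taken from IPFS or any other existing tool.

```python
import hashlib


class ContributorManifest:
    """Hypothetical per-contributor index kept on the contributor's own machine."""

    def __init__(self):
        # Maps content hash -> local file path. The file bytes are never stored here.
        self._shared = {}

    def share(self, path, data):
        """Mark a local file as part of the dataset and return its content hash."""
        digest = hashlib.sha256(data).hexdigest()
        self._shared[digest] = path
        return digest

    def revoke(self, digest):
        """Withdraw a file from the dataset; unknown hashes are ignored."""
        self._shared.pop(digest, None)

    def resolve(self, digest):
        """Return the local path if the file is still shared, otherwise None."""
        return self._shared.get(digest)

    def listing(self):
        """Return the hashes currently offered to the dataset, sorted."""
        return sorted(self._shared)


manifest = ContributorManifest()
cat_id = manifest.share("photos/cat.jpg", b"placeholder bytes for cat.jpg")
dog_id = manifest.share("photos/dog.jpg", b"placeholder bytes for dog.jpg")
manifest.revoke(cat_id)
```

After the revoke call, resolve(cat_id) returns None while dog_id still resolves to its path. Revoking in this sketch only stops the contributor's machine from serving the file in the future; it does nothing about copies someone has already fetched.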
======
gus_massa
> _#3 I'd like contributors to always retain access control of their photos
> and videos - they can revoke subsets of their files from the dataset at any
> point._

I don't understand. If the dataset can be used by other people (you, the
owner, and others), then you can't force everyone to delete the photos.
They can pretend that they deleted the photos and just avoid using them
publicly from then on. This is similar to the Twitter API, which makes you
pinky promise to delete deleted tweets. But there is no delete button on the
Internet, only a hide button.

~~~
lovelearning
The access control is non-negotiable. Compute solutions will have to be
distributed in a way that works with it. For example, the first few layers
of a neural network can be computed locally on contributor machines, which
then transfer only the results of those calculations.
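
A rough sketch of that split, in plain Python with no ML library. Everything here is an assumption for illustration: the layer sizes, the random weights, and the helper names (local_forward, remote_forward, make_layer) are all made up. The arrangement is sometimes called split learning.

```python
import random


def relu(values):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]


def dense(weights, bias, inputs):
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def local_forward(pixels, local_layers):
    """Runs on the contributor's machine; the raw pixels are not transmitted."""
    hidden = pixels
    for weights, bias in local_layers:
        hidden = relu(dense(weights, bias, hidden))
    return hidden  # only these activations are sent onward


def remote_forward(activations, remote_layers):
    """Runs wherever the rest of the network lives; sees activations only."""
    hidden = activations
    for weights, bias in remote_layers:
        hidden = dense(weights, bias, hidden)
    return hidden


def make_layer(n_in, n_out, rng):
    """Random weights and zero bias, standing in for trained parameters."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
local_layers = [make_layer(16, 8, rng), make_layer(8, 4, rng)]
remote_layers = [make_layer(4, 2, rng)]

pixels = [rng.random() for _ in range(16)]           # stand-in for a tiny image
activations = local_forward(pixels, local_layers)    # 4 numbers leave the machine
scores = remote_forward(activations, remote_layers)  # 2 output scores
```

Only the 4 activations cross the network here, not the 16 input values. That alone is not a guarantee that the input can't be reconstructed from the activations, so it would have to be combined with whatever access control the contributor enforces.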

------
stefkors
Perhaps you can do something with the dat:// protocol?

~~~
lovelearning
Hadn't heard of it before, but its goals sound like it could help me with my
problem. Thank you very much for telling me about dat!

