So the core idea seems solid. Thank you for this!
For scientific endeavours, this should be considered a feature, not a bug.
Or, for some data, it would make sense to partition it into smaller chunks instead of one huge archive. That way, adding a chunk (perhaps the new year's data for a multi-year dataset) just means releasing a new torrent with the extra archive in it and a name meaningful enough to indicate the difference. Anyone with the previous set could then download just the new partition (and any modified ones).
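One way for someone holding the previous set to work out which partitions they actually need is to compare per-partition checksums. This is a minimal sketch that assumes a hypothetical manifest mapping each partition's file name to its SHA-256 digest. Nothing in the thread specifies such a format, and the file names are made up.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a partition's contents."""
    return hashlib.sha256(data).hexdigest()


def partitions_to_fetch(old_manifest: dict, new_manifest: dict) -> list:
    """Names of partitions that are new or whose digest changed.

    Both manifests are hypothetical {partition_name: sha256_hex} mappings.
    """
    return sorted(
        name
        for name, digest in new_manifest.items()
        if old_manifest.get(name) != digest  # missing before, or modified
    )


if __name__ == "__main__":
    old = {
        "weather-2010.tar": sha256_hex(b"2010 data"),
        "weather-2011.tar": sha256_hex(b"2011 data"),
    }
    new = {
        "weather-2010.tar": sha256_hex(b"2010 data"),
        "weather-2011.tar": sha256_hex(b"2011 data, corrected"),
        "weather-2012.tar": sha256_hex(b"2012 data"),
    }
    print(partitions_to_fetch(old, new))  # ['weather-2011.tar', 'weather-2012.tar']
```

Partitions dropped from the new release are not reported here. Handling them would need a separate check for names present only in the old manifest.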
Universities typically have great bandwidth and good peering, and already host much larger data repositories than this seems to be targeting (e.g. here's a 30-terabyte repository, http://gis.iu.edu/), so they should be able to provide space for your local scientific data. Complain if not!
It's meant to include companion datasets for published papers, and gives out DOIs so datasets can be cited in other works. And it's mirrored at various universities to prevent loss.
For example, until a few years ago, some of our ISPs had different caps for national vs international traffic, and there were popular forks of P2P clients that allowed you to filter based on that.
We have since moved to unlimited everything, but I wouldn't be surprised if some countries still had different caps or speeds for international traffic.
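The national-vs-international filtering those client forks offered boils down to checking each peer's address against a list of domestic IP prefixes. The sketch below assumes such a list is available. The prefixes used here are RFC 5737 documentation ranges standing in as placeholders, not any real country's allocation.

```python
import ipaddress

# Placeholder "national" prefixes (RFC 5737 documentation ranges), standing in
# for a real per-country IP allocation list.
NATIONAL_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_national(peer_ip: str, prefixes=NATIONAL_PREFIXES) -> bool:
    """True if the peer's address falls inside any national prefix."""
    addr = ipaddress.ip_address(peer_ip)
    # An IPv6 address is never "in" an IPv4 network, so mixed versions give False.
    return any(addr in net for net in prefixes)


def national_peers(peer_ips, prefixes=NATIONAL_PREFIXES):
    """Keep only peers whose traffic would count against the national cap."""
    return [ip for ip in peer_ips if is_national(ip, prefixes)]


if __name__ == "__main__":
    peers = ["192.0.2.17", "203.0.113.5", "198.51.100.200"]
    print(national_peers(peers))  # ['192.0.2.17', '198.51.100.200']
```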
But there is a need for a way to distribute large datasets that come out of non-academic projects.
For example, the DBpedia data dumps are very slow to download at the moment.
* 100 Mbps unmetered, 2x2 TB: 39 EUR/mo
* 1 Gbps unmetered, 24x2 TB: 349 EUR/mo
* 10 Gbps unmetered, 24x2 TB: 1089 EUR/mo
I'm tempted to grab the first, and open a GitTip account in case anyone wants to chip in towards the second (4 TB isn't a lot of space as far as this stuff goes). The third is unlikely to be useful; this stuff is long-tail by nature, so storage is probably more important than bandwidth.
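To put the storage-vs-bandwidth point in numbers, here is a quick cost-per-terabyte calculation over the three tiers quoted above. It assumes raw disk capacity (disks × size), ignoring any RAID or filesystem overhead.

```python
# Tiers as quoted above: (label, EUR per month, number of disks, TB per disk).
# Capacity is raw disk space, before any RAID or filesystem overhead.
TIERS = [
    ("100 Mbps", 39, 2, 2),
    ("1 Gbps", 349, 24, 2),
    ("10 Gbps", 1089, 24, 2),
]


def eur_per_tb(price_eur: float, disks: int, tb_per_disk: float) -> float:
    """Monthly price divided by raw capacity in TB."""
    return price_eur / (disks * tb_per_disk)


if __name__ == "__main__":
    for label, price, disks, tb in TIERS:
        cost = eur_per_tb(price, disks, tb)
        print(f"{label}: {disks * tb} TB raw, {cost:.2f} EUR/TB/month")
    # 100 Mbps: 4 TB raw, 9.75 EUR/TB/month
    # 1 Gbps: 48 TB raw, 7.27 EUR/TB/month
    # 10 Gbps: 48 TB raw, 22.69 EUR/TB/month
```

By this measure the 1 Gbps box is the cheapest per terabyte. The 10 Gbps tier offers the same 48 TB at roughly three times the cost per terabyte, which only pays off if bandwidth rather than storage is the bottleneck.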
Though in a world containing Google Fiber, would it still be a valuable service?
There's a university box seeding the torrent I'm grabbing (2011 weather patterns), but it still seems to be going quite slowly.
I just wish the messaging were clearer and told a story I could pass on to my friends, who ultimately are "too busy" to think about the value of this product.
Unfortunately, "We've designed a distributed system for sharing enormous datasets - for researchers, by researchers. The result is a scalable, secure, and fault-tolerant repository for data, with blazing fast download speeds." just isn't a story I can tell my buddies and get them excited.
I think it would be pretty cool to have trending datasets on the front page (I'm sure you could write a small cron job that finds the most-downloaded datasets per week, per day, etc.).
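The core of such a cron job could be a windowed download count. This is a minimal sketch assuming the site keeps a log of (dataset name, download time) pairs. That log shape and the dataset names below are hypothetical. Running it with a one-day or seven-day window gives the per-day and per-week lists.

```python
from collections import Counter
from datetime import datetime, timedelta


def trending(events, now, window, top_n=5):
    """Most-downloaded datasets within [now - window, now].

    `events` is an iterable of (dataset_name, download_time) pairs, a
    hypothetical shape for whatever the site's download log records.
    """
    cutoff = now - window
    counts = Counter(name for name, ts in events if cutoff <= ts <= now)
    return [name for name, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    now = datetime(2014, 1, 8)
    log = [
        ("weather-2011", now - timedelta(hours=2)),
        ("weather-2011", now - timedelta(days=1)),
        ("dbpedia-dump", now - timedelta(days=3)),
        ("dbpedia-dump", now - timedelta(days=30)),  # outside a 7-day window
    ]
    print(trending(log, now, timedelta(days=7)))  # ['weather-2011', 'dbpedia-dump']
```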
Also, while not a dire necessity, I think a cooler name would help this project fly farther. You should be able to make a play on "data torrents", maybe something like datastorm/samplerain/datawave/dataswell/Acadata?
Anyway, trivial stuff aside, nice implementation. Bookmarked for when I get the urge to do a data-analysis project!
Direct link: http://911datasets.org/images/911datasets.org_all_torrents_J...
I use a Coursera downloader because it's hard to keep up with Coursera's own schedule. I already have a ton of materials from different courses on my computer and would be happy to make them available to everyone, but my upload speed is terrible.