
Academic Torrents - yinghang
http://academictorrents.com/
======
cing
The team should learn from the ghost-town that is BioTorrents[1] and offer
more than just a tracker. [1]
[http://www.biotorrents.net/browse.php?incldead=1](http://www.biotorrents.net/browse.php?incldead=1)

~~~
_delirium
That's one reason I'd prefer that academics just put data into some kind of
local university archive, where possible. Many universities provide resources
to host scientific data (and have done so for decades, since the days of
ftp.dept.university.edu servers), and putting it there makes it more likely
that it'll still be there in 10 years. Torrents by comparison tend to be: 1)
slow, as you rely on random seeders rather than a university that's peered
onto Internet2 or the LambdaRail; and 2) unreliably seeded, as people drop
off. Plus the workflow of "curl -O URL" is nicer than torrenting.

Universities typically have great bandwidth and good peering, and already host
much larger data repositories than this seems to be targeting (e.g. here's a
30-terabyte repository, [http://gis.iu.edu/](http://gis.iu.edu/)), so they
should be able to provide space for your local scientific data. Complain if
not!

~~~
runarberg
Perhaps if universities would robustly seed their staffs and students
torrents?

~~~
glomph
Kind of solves a problem that doesn't exist though doesn't it? It isn't like
these universities are crying about bandwidth costs and it isn't like demand
is maxing out their upstream.

~~~
icebraining
It's not just about their bandwidth; some countries / zones / networks have
much better local connectivity than external, particularly international.

For example, until a few years ago, some of our ISPs had different caps for
national vs international traffic, and there were popular forks of P2P clients
that allowed you to filter based on that.

We have since moved to unlimited everything, but I wouldn't be surprised if
some countries still had different caps or speeds for international traffic.

------
TheBiv
This is really cool.

I simply wished that the messaging was more clear and told a story that I
could tell to my friends who ultimately are "too busy" to think about the
value of this product.

Unfortunately "We've designed a distributed system for sharing enormous
datasets - for researchers, by researchers. The result is a scalable, secure,
and fault-tolerant repository for data, with blazing fast download speeds."
Just isn't a story that I can tell to my buddies and get them excited.

~~~
henryzlo
Thanks for the comment. We've created a shorter "pitch" style presentation for
the non-technical / too-busy, which summarizes the benefits etc. in a short
several minute description.

[https://docs.google.com/presentation/d/1JC2d1g9U6HaenGSn_Xvk...](https://docs.google.com/presentation/d/1JC2d1g9U6HaenGSn_Xvkl7FXyNo5LgVpLFQqCrNgF6k/edit#slide=id.g2665fc4d4_0117)

~~~
Schwolop
Respectfully - that's an EVEN LONGER message. You need a sentence. Ten words,
tops.

------
hardwaresofton
Wow, this is pretty cool -- one of the most direct approaches to open-data
that I've seen so far (and the research world is of course in dire need of
this kind of open data/connect-the-dots enabling effort)!

I think it would be pretty cool to have trending datasets on the front page
(I'm sure you could do a small cron that would find the most-downloaded per-
week/per-day/etc)

Also, while not a dire necessity, I think a cooler name would help this
project fly farther -- You should be able to make a play on "data torrents",
maybe something like datastorm/samplerain/datawave/dataswell/Acadata?

Any way, trivial stuff aside, nice implementation -- bookmarked for when I get
the urge to do a data-analysis project!

~~~
yinghang
Thanks! But this is not my project. It is something created by a grad student
I met just a couple hours ago at a hack night discussion.

------
teddyh
So what do I do if I want to seed them all? Also, are all the data sets (and
other things) freely licensed, i.e. no “non-commercial use only” clauses or
things of that nature? Can I count on this going forward?

------
jakeogh
A few TB of FOIA information related to the September 11th attacks is
available via BT.

Direct link:
[http://911datasets.org/images/911datasets.org_all_torrents_J...](http://911datasets.org/images/911datasets.org_all_torrents_Jan_30_2014.zip)

------
dav-
Any reason passwords for user accounts are limited to 40 chars?

------
ses
Projects like this confirm my suspicion that traditional academic publishing
is going to take a nosedive in the next few years. Working in this industry as
I do, I don't see commercial publishers moving quickly enough to change.
Really love the idea of this and can't help but support the general ethos of
it, even if it / its descendants will put a lot of us out of a job.

------
kartikkumar
Brilliant idea if I understand it correctly. Just want to check that my use
case would fit. I just submitted my first and main paper for my PhD to Icarus.
I'm planning on soon uploading it to ArXiv as well. My paper is theoretical in
nature and through a suite of Monte Carlo simulations I generated a few
hundred MBs of data. Can I make use of this system as a way to deposit that
data so that it's available to anyone that wants to verify the conclusions I
reach in my paper and possibly extend the research?

------
thedudemabry
Wow! That's a snappy site. Major props to the frontend dev(s).

~~~
kirubakaran
Looks like just stock Bootstrap (not that there is anything wrong with that).

~~~
pointernil
True. I guess ppl of academia are used to "different" quality/snappiness
levels ;)

------
csense
I'm surprised they don't have the Google Books n-gram dataset [1]. Then again,
maybe they're more focused on data that doesn't have a good home already than
on mirroring.

[1]
[http://storage.googleapis.com/books/ngrams/books/datasetsv2....](http://storage.googleapis.com/books/ngrams/books/datasetsv2.html)

------
lancemjoseph
Many of the datasets that I've seen in academia are stored in static SQL
databases that tend to be about 10-20 terabytes. Where does this leave
individuals with limited resources who would like to query large databases
without having to juggle the data management side of research? Are there
softwares that make database querying P2P accessible?

------
macarthy12
The problem is the word "torrent". Too many negative connotations for many in
the traditional academic world.

~~~
teddyh
And you’re writing this on “Hacker News”…

------
shitlord
I have an idle server with 500 Mb/s upload. Now I can finally put it to good
use! :)

------
incogmind
I remember the old days of DC++ whenever I hear blazing fast speeds.

------
linux_devil
Great ! Looking forward to coursera, edx and ocw videos too

~~~
dombili
That'd be great actually. Especially for those who're not able to reach
Coursera because of the stupid laws.

~~~
sitkack
If you can get a cheap VPS in the US, you can use coursera downloader to grab
all the content and then rsync it to your home country.

~~~
dombili
I'm able to reach Coursera just fine. I don't live in any of those countries
(nor I'm from any of them). I just thought it'd be nice to make them available
to everyone, because that's the way it should be.

I use coursera downloader because it's hard to keep up with Coursera's own
schedule. I already have a ton of materials from different courses on my
computer and I would be happy to make them available to everyone, but my
upload speed sucks.

------
alagappanr
We would need a significant number of seeders in order for this to become a
successfully used product. Perhaps, universities can seed data?

------
nvdk
this seems to be very focused on US academics, at least that is what
impression I'm given by labeling ".edu" addresses. It gives a feeling that
these torrents/datasets are of better quality. I'm also missing a catalog on
this tracker, some basic taxonomy would be most welcome...

~~~
jsumrall
I didn't get that impression. Are you referring to the ".edu" address of the
creators of the site? Do you mean people with a ".edu" address, and therefore
at an American institution, give you a sense of their work being higher
quality?

~~~
nvdk
to clarify: torrents are marked "edu" if the user has a .edu address, this
makes those torrents stand out. The majority of non us universities do not
offer *.edu addresses to their staff and students.

~~~
jsumrall
Yes, you're right. Which then brings up the question about how to determine if
the data comes from an "academic" address, as was pointed out, only US
institutions or institutions which are accredited by the US Dept. of Education
can apply for an .edu top level domain— meaning nobody has it.

------
mathattack
I am no expert on torrents, but I like this conceptually. Publicly funded
academic research should be free.

------
talles
What a wonderful idea. This fits so well with the torrent protocol (maybe even
philosophically speaking).

------
huevosabio
This awesome! Thanks for sharing!

------
erikb
awesome invention! Could this be connected with the google scholar to add
keyword searching?

------
guspe
Aaron Swartz's dream come true?

~~~
skrebbel
No

------
sillysaurus2
One problem with offering a dataset as a torrent is that it's impossible to
edit it after it's released. However, it seems like that doesn't matter at all
in this case, because any scenario I can think of which could be solved by
editing the dataset (like redacting private info that was accidentally
included) wouldn't avoid the original problem: that they accidentally released
private info in the first place. Perhaps it'd be useful to edit the original
dataset in order to add to it / enhance it with more info, but in that case
they could just release a second dataset as an addendum.

So the core idea seems solid. Thank you for this!

~~~
pixelcort
BitTorrent Sync might be useful for that:

[http://www.bittorrent.com/sync](http://www.bittorrent.com/sync)

~~~
rodolphoarruda
I'd used BT Sync for a couple of weeks to sync data between my own machines.
It works neatly. One question here. When you modify some part of a big file,
does the program send out only the difference to the other authorized
machines, or entire file? Let's say a researcher exports her data to a 1GB CSV
file of my interest. I download it. In the following week the same researcher
updates her CVS with more data, now it has 1.01GB in size. How big my next
download will be?

~~~
lojack
Seems as though it supports patching so only the parts that are changed would
be synced. Of course, the download size is completely dependent on what parts
of the file were changed.

[http://forum.bittorrent.com/topic/19228-latest-sync-
build-11...](http://forum.bittorrent.com/topic/19228-latest-sync-
build-1170/page-6#entry56304)

------
jackmaney
Excellent! It's far too early to tell, but I'd like to be hopeful that this
distribution network could be another nail in the coffin of the old,
expensive, dead-tree journals.

~~~
rfoeorfisdjus
I guess you mean papers. Go back to reddit, libtard.

~~~
jackmaney
Yes, I do mean papers. I'm not on reddit, you mouth-breathing neanderthal.

------
CompleteMoron
thanks for sharing! I shall store in the vault of Hard Drives I keep here in
the desert

