
BitTorrent Turns 10: Happy Birthday  - Uncle_Sam
http://torrentfreak.com/bittorrent-turns-10-110702/
======
gruseom
An obvious use of BitTorrent would be to build a supercomputer on top of it.
The biggest barrier to ad hoc computation is getting at the data. (John
Carmack: "It pains me to have hundreds of cpu cores across the office idle,
but data transport precludes most computation of opportunity" -
<http://twitter.com/#!/ID_AA_Carmack/status/61076521640665089>). That is what
BitTorrent is good for.

Especially for problems where all you need to do is divide your data into
slices, apply some function, and assemble the results, BitTorrent already
solves the hard part of this. (Edit: because the hard part is getting at the
data in the first place, not computing over it and assembling the results.)
That's what you're doing when you download a movie; it's just that the
function is "identity" and the assembly is "concat".
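
Concretely, here's a back-of-the-envelope Python sketch of that pattern -
fetch_piece is a hypothetical stand-in for whatever actually retrieves a
piece from the swarm, and compute_over_swarm is made up too; only the
slice/apply/assemble shape matters:

    from concurrent.futures import ThreadPoolExecutor

    def fetch_piece(index):
        """Hypothetical: retrieve piece `index` from peers in the swarm."""
        raise NotImplementedError

    def compute_over_swarm(num_pieces, fn, assemble):
        # Fetch pieces in parallel, as a real client would, then apply
        # `fn` to each piece and combine the results with `assemble`.
        with ThreadPoolExecutor() as pool:
            pieces = pool.map(fetch_piece, range(num_pieces))
            return assemble(fn(p) for p in pieces)

    # An ordinary download is the degenerate case:
    #   compute_over_swarm(n, fn=lambda piece: piece, assemble=b"".join)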

So HN, how would you build this? Has it been worked on? What am I missing?

~~~
JonnieCache
What you are describing is MapReduce, and it is the foundation on which
Google's empire is built.

I think I must be misunderstanding you. Are you talking about some p2p system
for distributing the code/data to the nodes? Like a decentralised BOINC?

The bit about BT that makes it special is the tit-for-tat negotiation system,
which prevents clients from "cheating" by downloading without uploading. In an
environment where an individual node has no motive to behave differently from
the rest of the swarm for its own gain, BT's unique benefits disappear.
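
To make that concrete, the choking logic is roughly this (a toy sketch,
nothing like the real client code; the field names are made up):

    import random

    def choose_unchoked(peers, slots=4):
        # Reciprocate: unchoke the peers that upload to us fastest.
        best = sorted(peers, key=lambda p: p["upload_rate_to_us"],
                      reverse=True)[:slots]
        # Optimistic unchoke: give one random other peer a chance, so
        # newcomers with nothing to trade yet can bootstrap into the swarm.
        rest = [p for p in peers if p not in best]
        return best + ([random.choice(rest)] if rest else [])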

~~~
gruseom
I'm not talking about MapReduce but rather the back end presupposed by things
like MapReduce. Let me try an example. Imagine some large dataset. All the
stock trades ever, say. Or all the temperature readings ever. (Maybe these
particular datasets don't exist, but some do.) Suppose some Saturday afternoon
a question pops into your head. Wouldn't it be great to know, say, the
average, max, and min temperatures in that dataset, maybe broken down across
geographical buckets. How would you go about it? The hard part isn't computing
averages - the difference between computing "average" and computing "identity"
is practically negligible. The hard part is obtaining the data, arranging it
in a suitable form, and then being able to get at it. You want the data to
already exist in zillions of morsels on lots of different computers, have a
fast way of discovering which morsels your computation needs, and a fast way
of accessing those morsels. Ideally there would be many copies of each morsel,
giving you a lot of choices about whom to talk to.

In other words, you want a decentralized, redundant morsel discoverer-
replicator. To me this sounds like what BitTorrent already does with, say,
movies. Any popular movie is already split into morsels and replicated a
zillion times, so any particular download (computation, in our case) can
immediately start running. Of course that's only true for files that are
already well-seeded. But it's easy for anyone to come along with a new movie
(dataset, in our case) and plug it in, making it easy for anyone else to
download (compute with), and getting faster the more popular it becomes.

(MapReduce can do the computation easily... _if_ you've first arranged all the
data appropriately. But that "if" is what, in Carmack's words, "precludes most
computation of opportunity.")

The reason I wonder if BitTorrent would be suitable for solving the hard part
of this problem (discovering and transferring big data) is that it already
_has_ solved it, massively. What else would you need to get true ad hoc
computation on arbitrary data? A query language, an evaluator for that
language that knows how to compute with morsels and assemble the results
(there's your MapReduce)... and what else?
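
Roughly, the evaluator might look like this - lookup_pieces (the morsel
discoverer: key range -> piece ids), fetch_piece, and decode_readings are all
hypothetical:

    from functools import reduce

    def partial_stats(piece):
        # Map step over one morsel: (count, sum, min, max) of its readings.
        readings = decode_readings(piece)      # hypothetical decoder
        return (len(readings), sum(readings), min(readings), max(readings))

    def merge(a, b):
        # Reduce step: partials combine without touching raw data again.
        return (a[0] + b[0], a[1] + b[1], min(a[2], b[2]), max(a[3], b[3]))

    def query(key_range):
        piece_ids = lookup_pieces(key_range)   # hypothetical discovery step
        partials = (partial_stats(fetch_piece(i)) for i in piece_ids)
        count, total, lo, hi = reduce(merge, partials)
        return total / count, lo, hi           # average, min, max

The per-piece work is trivial and the partials merge associatively; all the
real cost is in lookup_pieces and fetch_piece, which is exactly the part
BitTorrent has already solved for movies.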

~~~
JonnieCache
Nice ideas. It sounds to me like you should be doing a PhD.

~~~
gruseom
I work in a related area, which is why the idea occurred to me. To combine the
two would be awesome, but is a long-term prospect.

One more point. You mentioned BOINC. I believe BOINC sends all input data to
the client with each request. In other words, it's only good for
computationally intensive tasks (lots of computation per unit of data). The
same is true of GPGPU. That's why I brought up "average" as an example: it's a
commonplace function that is not computationally intensive at all. The work is
almost entirely in finding the data and accessing it.
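
For instance (a trivial sketch), a running average is two arithmetic ops per
value, so virtually all the time goes into whatever produces the stream:

    def running_average(stream):
        count = total = 0
        for value in stream:   # iterating the stream is where the time goes
            count += 1
            total += value
        return total / count if count else None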

What I really want to know, though, is who else has thought about using
BitTorrent for this kind of thing, and what they think.

------
matthew-wegner
BitTorrent fascinates me because it was a thought problem. The technology to
support BitTorrent had been in place for years--the requisite bandwidth,
processing power, and storage space--but nobody had conceived of a clear-
enough implementation to execute the high concept of piecemeal transmission
before Bram.

Most unsolved problems today are probably thought problems, not technical
problems (excepting battery life and blanket super-fast wireless coverage).

~~~
InclinedPlane
It's interesting to think about the sorts of technologies that are more about
knowledge than supporting infrastructure. A perfect example would be the
telegraph, which was only developed about two centuries ago but could easily have
been built in cruder forms going back 2 or 3 thousand years. It makes one
wonder what sort of fundamental innovations will be discovered in the future
and what sorts of things we are missing out on today merely out of ignorance.

------
JonnieCache
This makes me feel old. At 23, that isn't good.

------
GvS
Thanks Bram, you made the internet a better place.

------
ujjvala
Happy Birthday BitTorrent!!!

