
Thinking about 'meta' torrent file format - mattengi
https://gist.github.com/mait/8001883
======
I've actually been thinking about this a bit as well.

I think you can avoid the torrent file completely and use a Merkle tree hash,
the way newer torrent files work, so you end up with just one torrent per
file, with peer acquisition handled through the DHT.

Directories would be simple: just create a new "file" containing the hashes
and names of the contents, like git tree objects (extending on this, you
could build a version control system like git).

A noticeable change is that each individual file is uniquely shared. I
believe this is both a feature (it avoids duplicate torrents for the same
file) and a drawback: anyone can see who is downloading a given file. One
solution would be an additional secret key under which the DHT ID is hashed
again, allowing individual darknets.

~~~
KMag
I agree that advertising single file Merkle tree roots on the DHT is a good
thing, and that one could nicely build git-like directory structures, but why
force the leaves of the tree to be singleton torrent files?

Why not instead advertise individual files on the DHT by their Merkle tree
roots, and put the Merkle tree roots in each entry of the "files" section of
the torrent file? This doesn't force re-packaging of existing torrents into
singleton torrents. Seeders can advertise single files from old torrents and
clients with new torrents can take advantage of this advertising.
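As an illustration, an extended info dict might look something like this;
the "merkle root" key name and the placeholder values are made up for the
sketch, not part of any existing BitTorrent spec:

```python
# Hypothetical extension: each entry in the info dict's "files" list also
# carries the Merkle root of that file, so seeders of any torrent containing
# the same file can be found via the DHT without repackaging old torrents.
info = {
    "piece length": 262144,
    "files": [
        {"path": ["album", "track01.flac"], "length": 31457280,
         "merkle root": b"\x12" * 32},   # placeholder 32-byte root
        {"path": ["album", "track02.flac"], "length": 29360128,
         "merkle root": b"\x34" * 32},
    ],
}

# A client would then use each file's root directly as its DHT lookup key:
dht_keys = [f["merkle root"] for f in info["files"]]
```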

I disagree with the munged-key darknet idea. If you want a darknet, run it on
a non-public DHT, with cryptographic handshakes and encrypted traffic.
Cryptographically munging the DHT keys on a public DHT only creates a "light
grey net" that's trivially circumvented and provides a false sense of privacy.

~~~
jychang
> Why not instead advertise individual files on the DHT by their Merkle tree
> roots, and put the Merkle tree roots in each entry of the "files" section of
> the torrent file?

You can't do that, because torrents aren't file-delimited, they are
block-delimited. You can't check that two files are the same across torrents
without first downloading both torrents.

~~~
KMag
You misunderstand. This has nothing to do with comparing two torrents.
Comparing two torrents solves the wrong problem.

Clients that have downloaded all of the data for a single file (but may or may
not have downloaded all of the data for the full torrent) have the data for
the file and can calculate the Merkle tree root for that file, and advertise
availability on the DHT.

Clients with new style torrent files that included Merkle tree roots in file
descriptions would then be able to download those files. This has nothing to
do with comparing torrent files.
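A minimal sketch of that, assuming the client already holds the file's byte
range within the torrent's concatenated payload (SHA-256 and 16 KiB leaves
are arbitrary choices here):

```python
import hashlib

def file_root_from_torrent(payload: bytes, offset: int, length: int,
                           piece: int = 16384) -> bytes:
    """Once a client holds all bytes of one file inside the torrent's
    concatenated payload, it can hash just that file and announce the
    resulting root on the DHT, regardless of torrent piece alignment."""
    data = payload[offset:offset + length]
    hashes = [hashlib.sha256(data[i:i + piece]).digest()
              for i in range(0, len(data), piece)] or [hashlib.sha256(b"").digest()]
    while len(hashes) > 1:
        if len(hashes) % 2:
            hashes.append(hashes[-1])
        hashes = [hashlib.sha256(hashes[i] + hashes[i + 1]).digest()
                  for i in range(0, len(hashes), 2)]
    return hashes[0]
```

The point is that the root depends only on the file's own bytes, so two
clients holding the same file in different torrents compute the same key.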

------
RamiK
[https://en.wikipedia.org/wiki/Metalink](https://en.wikipedia.org/wiki/Metalink)

It's in there somewhere...

Edit: Here's a more relevant use case:

[https://wiki.debian.org/Metalink](https://wiki.debian.org/Metalink)

------
oakwhiz
To see one method that is used to work around this sort of thing: The folks
over at [http://www.tlmc.eu/](http://www.tlmc.eu/) have been expanding the
same 1.2TB collection of files for a while, just by stopping the old torrent,
running a Python script to patch the changes, and then rechecking and starting
the new torrent from the old directory.

~~~
zeitg3ist
But doesn't this mean that all the other peers would need to manually upgrade
their copy of the torrent file?

~~~
plorkyeran
Yes. It works reasonably well in this specific case because it's such a niche
thing (you don't download 1.25 TB of Touhou music if you don't really care
about Touhou music), but it doesn't benefit from people who continue to seed
things they've long forgotten about.

~~~
enko
> 1.25 TB of Touhou music

I am going to have nightmares tonight...

------
dz0ny
Private trackers will say no. Public trackers may welcome this...

~~~
predakanga
There are many types of private tracker that would love to see this - for
instance consider gaming trackers, where you may have a single .torrent for a
large collection of ROMs, or DLC for a game. Consider TV trackers, tracking a
whole TV season with a single .torrent file, or music trackers with
discographies.

More importantly, a key concern on private trackers is swarm size - an
extension like this would have the potential to expand the available peers on
a given file, if the file exists in other swarms on the same tracker. Not a
very common use case, but one to consider nonetheless.

------
sargun
Is this basically an append-only torrent file? This could actually be
implemented without having to do many changes to the torrent format. You can
just have the client de-dupe based on file length + hash.
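A sketch of that de-duplication key, assuming SHA-256 as the hash; a client
could use it to recognise that a re-issued torrent still contains data it
already has on disk:

```python
import hashlib

def dedupe_key(path):
    """Identify a file by (length, SHA-256 digest) so an updated torrent's
    unchanged files can be matched against existing local data."""
    h = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB at a time
            h.update(chunk)
            size += len(chunk)
    return size, h.hexdigest()
```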

~~~
oakwhiz
Couldn't you also hash the root of the hash tree for the new appended data
with the root of the hash tree for the old torrent? It would be like a hash
chain of hash trees, but pointing backward in time.
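A minimal sketch of that backward-pointing chain, assuming SHA-256: each
revision's identity commits to the previous revision's root plus the root of
the newly appended data.

```python
import hashlib

def chained_root(old_root: bytes, appended_root: bytes) -> bytes:
    """New torrent identity = H(previous root || root of appended data),
    so any client holding the old torrent can verify the new one extends
    it, without re-hashing the old payload."""
    return hashlib.sha256(old_root + appended_root).digest()

# e.g. three revisions of a growing collection:
r0 = hashlib.sha256(b"genesis data").digest()
r1 = chained_root(r0, hashlib.sha256(b"first append").digest())
r2 = chained_root(r1, hashlib.sha256(b"second append").digest())
```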

------
kovalkos
Another problem with torrents is compression of files. Compressing a torrent
makes it impossible to select only 1 file from a big collection.

~~~
j_s
I would think this is a failure of the client, which should support
compression formats well enough to be able to fish around inside of the
compressed file once it got the metadata portion (zip directory or whatever).

[http://en.wikipedia.org/wiki/Zip_%28file_format%29#Design](http://en.wikipedia.org/wiki/Zip_%28file_format%29#Design):
_A directory is placed at the end of a .ZIP file. This identifies what files
are in the .ZIP and identifies where in the .ZIP that file is located. This
allows .ZIP readers to load the list of files without reading the entire .ZIP
archive._
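Python's standard zipfile module illustrates the directory-at-the-end design:
listing an archive's members only needs the central directory records, which
is what would let a client fetch the metadata portion first.

```python
import io
import zipfile

# Build a small archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("readme.txt", "hello")
    z.writestr("data/file.bin", b"\x00" * 1024)

# Reopen it: namelist() comes from the central directory at the end of the
# archive, and read() seeks straight to the one member wanted.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as z:
    names = z.namelist()
    one = z.read("readme.txt")
```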

~~~
JTon
This is a great idea! I wonder why it hasn't already been implemented.

~~~
j_s
I think it is because most torrent client developers are focused on the
protocol rather than the end-user experience.

Here is a tool that makes it possible to preview video/audio quality by
getting the first and last .rar file: [http://techzil.com/play-rar-files-without-extracting-uisng-d...](http://techzil.com/play-rar-files-without-extracting-uisng-dziobas-player/)

------
brokenparser
Perhaps we could make trackers more intelligent and have them combine peer
pools, creating something like a Venn diagram of torrents. In addition to
telling you which peers are available, the tracker would tell you what to
request from them. You already have all of the file hashes in the torrent
anyway, so any bad data would fail verification and be discarded.

~~~
predakanga
Unfortunately it's not as simple as that - when asking each other for data,
the individual peers ask for a particular 'piece' of the torrent, where that
piece isn't relative to a given file, but the torrent as a whole.

The files are concatenated into one long stream, and the piece number is an
index to that, with no guarantees about alignment.

For instance, if you have a torrent (we'll call it 'X') with three files
(the 4mb file 'a', the 3mb file 'b' and the 1mb file 'c'), and two separate
torrents ('Y' and 'Z') describing files 'b' and 'c' separately, then the
pieces would map something like this:

'Y' piece 1 -> 'X' piece 17
'Z' piece 1 -> 'X' piece 29

That's an absolute best case scenario though - in most cases, file sizes
aren't quite as perfect as that (each being a multiple of the default piece
size, 256kb). If 'b' just happened to be 1373kb, or anything else that wasn't
a multiple of 256kb, then any files after it aren't addressable from other
torrents.

~~~
TheLoneWolfling
Why not?

You just have at most two blocks of additional overhead.

You would have to have where the file begins and ends within the blocks
downloaded, but that's already in the torrent file.

~~~
predakanga
Because the hashes that are stored in the .torrent operate on that unaligned
data.

In practice, this means you can't verify that two files of the same name and
size but at different alignments within the concatenated data stream are
identical: you can't compare hashes, so you can't do anything without first
downloading them. This opens the door to mass poisoning of swarms without
even having to join them in the first place.

There are potential solutions (including providing a broader hash per-file, as
opposed to per-piece), but my statement was only that it's not that simple,
not that it's impossible.

~~~
jychang
Why do you want to be completely backwards compatible with classic torrents?
Torrent2 could drop some features of classic torrenting, like folder
structure, and mandate that each "subtorrent" is basically a single Torrent1
containing only one file and no folder structure.

