The Infinite File Download

Mojah · on July 24, 2015

Hi HN,

I didn't expect this project to make it to the frontpage. Right now, it's pulling close to 500Mbps in terms of bandwidth. It's cool, I like to see my graphs explode.

However, at some point I may have to pull the plug on this. Bandwidth isn't entirely free and there are limits to what my little uplink and server can provide.

But in the meantime, enjoy the Silicon Valley startup parody: https://ma.ttias.be/infinite-file-downloader/
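For scale, here is a back-of-the-envelope sketch of what ~500 Mbps means in data volume. It assumes, unrealistically, that the rate is held constant all day, and it uses decimal units:

```python
# Back-of-the-envelope: data volume of a sustained bit rate.
# Assumption: 500 Mbps held constant; real traffic spikes and decays.

def bytes_transferred(rate_bits_per_s: float, seconds: float) -> float:
    """Bytes moved at a constant bit rate over a time window."""
    return rate_bits_per_s / 8 * seconds

RATE = 500e6          # 500 megabits per second (decimal units)
DAY = 24 * 60 * 60    # seconds in a day

per_second = bytes_transferred(RATE, 1)
per_day = bytes_transferred(RATE, DAY)

print(f"{per_second / 1e6:.1f} MB/s")      # 62.5 MB/s
print(f"{per_day / 1e12:.2f} TB per day")  # 5.40 TB per day
```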

joosters · on July 24, 2015

Fun site! You could surprise downloaders by serving up an extremely compressed file... A gzip transfer encoding can reach ratios of 1000:1 or more, so anyone foolishly running wget will end up with a full hard drive long before your bandwidth quota gives out :)

(Yeah, I know this is impractical and won't do anything against >/dev/null )
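Here's a quick sketch of the kind of ratio joosters describes, using Python's standard `gzip` module on a long run of zero bytes. The payload size is an arbitrary choice. A single DEFLATE pass tops out at roughly 1032:1, so "or more" needs tricks like nested archives:

```python
import gzip

def gzip_ratio(payload: bytes, level: int = 9) -> float:
    """Return uncompressed size divided by gzip-compressed size."""
    compressed = gzip.compress(payload, compresslevel=level)
    return len(payload) / len(compressed)

zeros = bytes(10 * 1024 * 1024)  # 10 MiB of 0x00 bytes
ratio = gzip_ratio(zeros)
print(f"{ratio:.0f}:1")
```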

malone · on July 24, 2015

It might be fun to engineer a gzip stream that uncompresses to something huge, and then serve that instead. I wonder if you could compress it enough that the limiting factor becomes the client's disk write speed rather than the network.

It'd be amusing to see people fill their disks at a far faster rate than they expect their network connection to allow.
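A minimal sketch of malone's idea follows. It assumes the server can send pre-compressed bytes with `Content-Encoding: gzip`, and it leaves the HTTP side out. The hypothetical `gzip_zero_stream` generator feeds zeros to `zlib.compressobj` chunk by chunk, so the full decompressed payload is never held in memory:

```python
import zlib

def gzip_zero_stream(total_bytes: int, chunk_size: int = 1 << 20):
    """Yield gzip-framed chunks that decompress to `total_bytes` zero bytes.

    wbits=31 (16 + 15) makes zlib emit a gzip header and trailer
    instead of a raw zlib stream.
    """
    comp = zlib.compressobj(9, zlib.DEFLATED, 31)
    block = bytes(chunk_size)
    remaining = total_bytes
    while remaining > 0:
        n = min(chunk_size, remaining)
        out = comp.compress(block[:n])
        if out:  # zlib may buffer internally and return nothing for a chunk
            yield out
        remaining -= n
    yield comp.flush()  # finish the DEFLATE stream and write the gzip trailer

# Usage: build the wire bytes and confirm they decode back to all zeros.
payload_size = 20 * 1024 * 1024
wire = b"".join(gzip_zero_stream(payload_size))
print(len(wire))  # compressed size on the wire
assert zlib.decompress(wire, 31) == bytes(payload_size)
```

With a ratio near 1000:1, the client's disk becomes the bottleneck whenever it writes slower than roughly a thousand times the link rate.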

gtufano · on July 24, 2015

https://en.wikipedia.org/wiki/Zip_bomb (good times!)

bradhe · on July 24, 2015

Let's say this service is generating truly random bytes. If a subset of the bytes that you download happens to describe, say, the last episode of Mr. Robot encoded in MP4, does that actually constitute piracy?

slavik81 · on July 24, 2015

No. Copyright only covers copying. If two people independently create the same work, they both have rights to that work.

Though, note that what you're suggesting is basically impossible for a work of any meaningful size, and the act of searching for a specific work in a random stream probably would prevent it from being considered an independent creation.

http://www.techpatents.com/Blog/independent-creation-is-a-de...

grkvlt · on July 24, 2015

See also "Pierre Menard, Author of the Quixote" by Borges (https://en.wikipedia.org/wiki/Pierre_Menard,_Author_of_the_Q...) about the differences between Menard's and Cervantes' Don Quixote. Both are the same text, 100% identical, but one was written in the same era as its setting and the other in modern times. The modern 'copy' has a different meaning because of its author's different circumstances, despite the content being syntactically exactly the same. Really interesting concept, like all of Borges, really!

scintill76 · on July 24, 2015

This is getting even more off-topic, but I'm curious whether anyone knows if it would be illegal to distribute a copy of something copyrighted if it's "encrypted with itself": for example, an encrypted eBook with plaintext metadata that tells you how to derive a decryption key from a long sample of words in the book. In theory the copy is random bytes until you possess the book, which should mean you have the right to possess the copy for personal/backup/interoperability purposes. (The use case I have in mind, which perhaps applies best to books, is being able to turn physical media into high-fidelity, open digital formats that can be distributed lawfully, without the cooperation of the copyright holder.)
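Here is a toy sketch of the scheme, just to make it concrete. Everything in it is assumed for illustration. The hypothetical `derive_key` hashes the words at positions listed in the plaintext metadata, and `xor_with_keystream` uses SHAKE-256 output as a keystream. This is not a vetted cipher; a real design would use an authenticated cipher from a proper crypto library. The closest established idea, deriving the key from the content itself, is usually called convergent encryption:

```python
import hashlib

def derive_key(book_words: list[str], positions: list[int]) -> bytes:
    """Hypothetical: hash the words at the positions named in the metadata."""
    sample = " ".join(book_words[i] for i in positions)
    return hashlib.sha256(sample.encode("utf-8")).digest()

def xor_with_keystream(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHAKE-256(key). The same call encrypts and decrypts."""
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

# Usage: only someone holding the book's text can rebuild the key.
book = "it was the best of times it was the worst of times".split()
metadata_positions = [0, 3, 5, 9]  # stored in plaintext next to the ciphertext
key = derive_key(book, metadata_positions)
ciphertext = xor_with_keystream(key, " ".join(book).encode("utf-8"))
plaintext = xor_with_keystream(derive_key(book, metadata_positions), ciphertext)
assert plaintext.decode("utf-8") == " ".join(book)
```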

drewcrawford · on July 24, 2015

> In theory the copy is random bytes

There is not any such thing as random bytes. Here's a byte: 0xca. Is it random?

Well, if I got it from a fair die, perhaps. But if I got it by printing the hex of a Java program, of course not. Bytes cannot be random, they can only be created in a random way (or not).

In your hypothetical you are not creating the bytes in a random way, you are creating them from a book. So the bytes have nothing to do with so-called "random bytes", but are in fact a derivative work of the book, at the very least.
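drewcrawford's Java example can be taken literally. Compiled Java class files start with the magic number 0xCAFEBABE, so the first byte of every class file is 0xca. This small sketch compares that deterministic byte with one drawn from the OS random source. The loop keeps drawing until the random byte happens to be 0xca:

```python
import secrets

# Deterministic source: every compiled Java .class file starts with 0xCAFEBABE.
CLASS_FILE_MAGIC = bytes.fromhex("CAFEBABE")
from_java = CLASS_FILE_MAGIC[0]

# Random source: draw bytes from the OS CSPRNG until one happens to equal 0xca.
draws = 0
while True:
    draws += 1
    from_rng = secrets.token_bytes(1)[0]
    if from_rng == 0xCA:
        break

# Same int, same bits: the "randomness" lived in the process, not in the value.
print(hex(from_java), hex(from_rng), from_java == from_rng)  # 0xca 0xca True
```

Nothing in the resulting `int` records where it came from, which is the point being made.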

scintill76 · on July 25, 2015

I guess this is bit color, then (a similar "random" example is used in the Colour essay somebody else linked.) Thanks for mentioning "derivative work", it's a good key word to look into this further. After skimming https://en.wikipedia.org/wiki/Derivative_work I guess one question is, is applying cryptography "transformative"?

To take it in a slightly different direction, since parody has been upheld, what about a deterministic, reversible, mechanical parodic transformation? AFAIK machine-created works can't be given a copyright, but I'm not asking for that, merely that a machine can create a sufficiently transformative derivative work.

In fact, maybe it doesn't even need to be parody. In https://en.wikipedia.org/wiki/Perfect_10,_Inc._v._Amazon.com.... search engines were permitted to serve thumbnails of copyrighted images because the copies "served a different function than [the original] use – improving access to information on the Internet versus artistic expression." In the same way, an e-book could be "improving access" rather than offering "artistic expression."

I guess establishing "transformation" is a bit tricky -- presumably ASCII-encoding an English book's text is not transformative. But it seems like somehow you could come up with something interesting and legal, even if it's not quite like my original formulation. Google Books' excerpts are probably like this.

Oh, and after re-reading the Colour article, I see I am not even all that innovative: http://monolith.sourceforge.net/

kedean · on July 24, 2015

Now you're just getting into semantics. Do you have a problem with the term 'random number' too? Because that's all a byte is, a number between 0 and 255.

drewcrawford · on July 24, 2015

As a matter of fact, I do; as do many people who take cryptography seriously:

> Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin. For, as has been pointed out several times, there is no such thing as a random number — there are only methods to produce random numbers, and a strict arithmetic procedure of course is not such a method. - John Von Neumann

But to your broader point, no, this is not a question of semantics. There is an actual syntactical difference being asserted here.

That difference is this: randomness has nothing to do with numbers. 0xca--the same number--can be either "random" or "non-random" depending on how you got ahold of it.

When we talk about numbers--prime numbers, even numbers, rational numbers, etc., we are talking about properties of the number; whether they are divisible by 2 and so on. But "random numbers" have no property by which we could recognize them. It is entirely a question of where the numbers were made, not what they are or what they look like.

avereveard · on July 24, 2015

IANAL, but I think it would be illegal. Bits have flavour. Even if you generate them randomly, doing so with the intent of matching existing copyrighted bits gives them enough flavour to get you in trouble.

gknoy · on July 24, 2015

Colour of bits [0] was an eye-opening read.

0: http://ansuz.sooke.bc.ca/entry/23

robryk · on July 24, 2015

Does this also apply to sha256(a_copyrighted_work)?

Note that retrieving any "useful information" about the original is believed to be similarly impossible from sha256(a_copyrighted_work) and encrypt_{sha_256(a_copyrighted_work)}(a_copyrighted_work).

coldtea · on July 24, 2015

>In theory the copy is random bytes until you possess the book, which should mean you have the right to possess the copy for personal/backup/interoperability purposes.

In what theory? We can very well theoretically conceive that you just ask for those "sample words" from someone else who has the book, either legitimately or illegitimately, and he gives them to you.

Even if the set of sample words is different for each metadata file, the book would be the same, so you just need a person willing to share the words with someone that doesn't own the book.

scintill76 · on July 24, 2015

Yes, but in such cases you can also just borrow the entire book, photocopy it, and keep the copy. It's still legal to loan books or have photocopiers, but using them together without license is illegal. The point is to make it so that the distributors can reasonably argue they are not actually distributing copyrighted material and have made some effort to ensure the recipients have a license to the work, and that there are no reasonable physical prohibitions the law or copyright holders can make to stop this (i.e., banning the sharing of books isn't reasonable, for the moment). It's the responsibility of the decrypting party to only decrypt lawfully, just as it currently is with disc copying/ripping.

Now, an actual lawyer might obliterate this through some logic I don't know, and/or appeal to "bit color" as others are pointing out.

P.S. I do have to admit my idea reminds me of warrant canaries, a "clever hack" around the law that I ultimately believe can't really be lawful.

P.P.S. CleanFlicks et al may be relevant case law, as from what I understand they made varying efforts to "ensure" their customers held a license, but were basically still distributing copies of works they didn't own.

coldtea · on July 25, 2015

>Yes, but in such cases you can also just borrow the entire book, photocopy it, and keep the copy.

Yeah, but this way you get the electronic version that you want + the key (from some other user). In the end it comes down to those distributing the keys, and it's no different from sites offering trial versions and others giving out serials and cracks.

>The point is to make it so that the distributors can reasonably argue they are not actually distributing copyrighted material

Just because it's encrypted it doesn't mean it's not copyrighted material. If I encrypt "Star Wars" and give it for download (letting others give the key), they'll still be all over my ass.

kozhevnikov · on July 24, 2015

What if it's not random but π?

https://github.com/philipl/pifs

cbd1984 · on July 24, 2015

> What if it's not random but π?

π is known to never repeat (that is, it's irrational); it is not, however, known to be normal.

That means the digits in its decimal expansion (to pick a base) are not known to be normally distributed. This means that it is not necessarily the case that every possible sequence of decimal digits is present in π.

For example, for all we know it's the case that beyond a certain (massively huge) number of decimal places, π never contains another '7'. That would render some sequences impossible, while still preserving the proven-to-be-true property that π never repeats itself in its entirety (that is, that it's irrational).
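Here is a sketch for poking at this empirically. It streams decimal digits of π with Gibbons' unbounded spigot algorithm, counts digit frequencies, and looks up where a digit string first appears (the pifs trick). Flat-looking frequencies in a finite prefix are evidence, not proof. Normality is a statement about the limit, and that is exactly what isn't known. `first_index` is an illustrative helper, and offsets count the leading 3 as position 0:

```python
from collections import Counter
from itertools import islice

def pi_digits():
    """Gibbons' unbounded spigot: yields 3, 1, 4, 1, 5, 9, ... indefinitely."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit n is now certain: emit it and rescale the state.
            yield n
            q, r, t, k, n, l = (10 * q, 10 * (r - n * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * n, l)
        else:
            # Not enough precision yet: fold in the next term of the series.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

def first_index(digits: str, pattern: str) -> int:
    """Offset of the first occurrence of pattern in this prefix, or -1."""
    return digits.find(pattern)

N = 1000
digits = "".join(str(d) for d in islice(pi_digits(), N))  # "3141592653..."
freq = Counter(digits)
for d in "0123456789":
    print(d, freq[d] / N)
print(first_index(digits, "999999"))  # the run of six 9s (the Feynman point)
```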

leni536 · on July 24, 2015

> That means the digits in its decimal expansion (to pick a base) are not known to be normally distributed.

You mean uniformly distributed. Being a normal number states more than that: every finite block of digits of a given length appears with the same limiting frequency.

While you are right that Pi isn't proven to be normal, constructing a normal number isn't exactly hard. I think the following number is proven to be normal (binary representation, I added spaces for clarity):

0.0 1 00 01 10 11 000 001 010 011 100 101 110 111 ...

It's clear to see that this number contains all possible finite binary sequences.
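Here is a small sketch of leni536's construction: every binary string of length 1, then 2, then 3, and so on, each length in lexicographic order. It checks the property that normality in base 2 actually requires, namely that each block of length k appears with frequency close to 2^-k. The function names are illustrative, and a finite check only suggests the limiting behaviour:

```python
from collections import Counter
from itertools import product

def construction_prefix(max_len: int) -> str:
    """All binary strings of length 1..max_len, each length in lexicographic order."""
    return "".join(
        "".join(bits) for n in range(1, max_len + 1) for bits in product("01", repeat=n)
    )

def block_frequencies(s: str, k: int) -> dict[str, float]:
    """Overlapping frequency of every length-k block in s."""
    counts = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    total = len(s) - k + 1
    return {"".join(b): counts["".join(b)] / total for b in product("01", repeat=k)}

prefix = construction_prefix(14)
for block, f in block_frequencies(prefix, 3).items():
    print(block, round(f, 4))  # each should sit near 1/8 = 0.125
```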

tshaddox · on July 24, 2015

How is searching for something in a random stream equivalent to copying it?

slavik81 · on July 24, 2015

It's hard to call it "independently created" if you used a copy of the work in question as an input to your algorithm.

dredmorbius · on July 24, 2015

At some point the copy of the work you use to either compare or generate a comparison hash might be considered copying.

The maths tend to work against you as well.

danbruc · on July 24, 2015

At least for me, it is all zero bytes (0x00). The exception was the first time I downloaded the entire file: I first got zero bytes (0x00), then ASCII '0' characters (0x30) separated by pairs of Windows newlines (0x0D 0x0A twice), and it reached infinity after 544,930 bytes.

leni536 · on July 24, 2015

Mandatory "What Colour are your bits?" post.

http://ansuz.sooke.bc.ca/entry/23

sklivvz1971 · on July 24, 2015

Of course it does, that's equivalent to what torrents do. The "index" and "length" would constitute the copyright violation.

In the same way torrents are lists of hashes, which are used as keys to find byte blocks in the "cloud" of torrent clients.
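Here is roughly how the hash side works in BitTorrent v1. The .torrent metadata carries a piece length plus the SHA-1 digest of each fixed-size piece, and clients check every piece fetched from peers against those digests. This is a simplified sketch with no bencoding and no networking, and the function names are illustrative, not from any BitTorrent library:

```python
import hashlib

def piece_hashes(data: bytes, piece_length: int) -> list[bytes]:
    """SHA-1 digest of each fixed-size piece; the last piece may be shorter."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]

def verify_piece(index: int, piece: bytes, hashes: list[bytes]) -> bool:
    """Accept a piece from a peer only if it matches the published digest."""
    return hashlib.sha1(piece).digest() == hashes[index]

# Usage: 102,400 bytes of deterministic, non-repeating filler data.
content = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(3200))
PIECE = 16 * 1024
hashes = piece_hashes(content, PIECE)
print(len(hashes), verify_piece(0, content[:PIECE], hashes))  # 7 True
```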

p1mrx · on July 24, 2015

The probability of that occurring is on the order of 1 in 2^1000000000. You would exceed the computational capacity of the universe long before that point.

Mojah · on July 24, 2015

You're talking accidental piracy. That would make for a great court case!

heinrich5991 · on July 24, 2015

Unfortunately (or fortunately?) this can pretty much be claimed to be impossible. Even just generating one specific kilobyte of data using uniform randomness yields a probability of 1/2^8192 (Python tells me that 2^8192 > 10^2000, which is far bigger than the estimated number of atoms in the universe, which is somewhere around 10^80).
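The arithmetic is easy to redo because Python integers are arbitrary precision. This sketch assumes, as the comment does, a uniformly random byte stream and a single fixed offset. `odds_exponent_base10` is an illustrative helper:

```python
import math

def odds_exponent_base10(num_bytes: int) -> float:
    """log10 of 2**(8*num_bytes): the '1 in 10^x' odds of matching num_bytes exactly."""
    return 8 * num_bytes * math.log10(2)

n = 2 ** 8192                              # outcomes for one specific kilobyte (1024 bytes)
print(n > 10 ** 2000)                      # True
print(len(str(n)))                         # 2467 digits
print(round(odds_exponent_base10(1024)))   # 2466
```

The exponent scales linearly with size. A file of about 125 MB is 10^9 bits, which gives odds of 1 in 2^1000000000, the figure p1mrx gives above.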

poizan42 · on July 24, 2015

Something related: you can upload special files such as /dev/urandom to websites that expect a normal file. Interestingly, a lot of sites only check the file size after the upload has completed. This means that a lot of sites are vulnerable to a DoS: you can fill their hard drive with temporary data from e.g. /dev/urandom.
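The usual defence is to enforce the limit while reading, not after. Below is a minimal, framework-agnostic sketch. It assumes the upload arrives as a readable binary stream. `MAX_UPLOAD_BYTES`, `UploadTooLarge`, `save_upload`, and `EndlessStream` are hypothetical names, and real deployments usually also cap request bodies in the web server or proxy:

```python
import io
import os
import tempfile

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # hypothetical 10 MiB limit
CHUNK = 64 * 1024

class UploadTooLarge(Exception):
    pass

def save_upload(stream, dest, limit=MAX_UPLOAD_BYTES, chunk=CHUNK) -> int:
    """Copy stream to dest in chunks, aborting as soon as `limit` is exceeded.

    On abort, dest holds a partial file; the caller should discard it.
    """
    written = 0
    while True:
        block = stream.read(chunk)
        if not block:
            return written
        written += len(block)
        if written > limit:
            raise UploadTooLarge(f"upload exceeded {limit} bytes")
        dest.write(block)

class EndlessStream:
    """Stands in for /dev/urandom: read() never returns b""."""
    def read(self, n):
        return os.urandom(n)

ok = save_upload(io.BytesIO(b"hello"), io.BytesIO())
print(ok)  # 5

rejected = False
with tempfile.TemporaryFile() as tmp:
    try:
        save_upload(EndlessStream(), tmp, limit=1024 * 1024)
    except UploadTooLarge as exc:
        rejected = True
        print("rejected:", exc)  # rejected: upload exceeded 1048576 bytes
```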

V-2 · on July 24, 2015

You should have mentioned that the file is gluten-free

Mojah · on July 24, 2015

Hehe, good one - I've added it to our list of perks. :-)

V-2 · on July 24, 2015

Nice going :) It just feels so much better, it feels lighter

qbrass · on July 24, 2015

It should feel heavier if it's gluten-free.

thejerz · on July 24, 2015

Someone, somewhere, is paying for this bandwidth...

Mojah · on July 24, 2015

In this case, that would be my employer: https://www.nucleus.be/en

Most ISPs/hosting providers have some bandwidth to spare, to be able to cover the spikes in traffic and support their growth. As long as this Infinite Downloader doesn't consume _too_ much traffic, we'll be happy to sponsor it. Bandwidth/peering agreements are made in terms of commitments, and as long as you stay beneath your committed data traffic, additional bandwidth isn't charged.

Once it starts to saturate uplinks or pose in any way a problem to other clients, it'll get shot down. But I don't see that being the case any time soon.

irth · on July 24, 2015

> Nucleus

HBO's Silicon Valley, anyone?

kedean · on July 24, 2015

I bet this DAAS project would be much more efficient if it utilized middle-out compression in the transfer.

iotku · on July 24, 2015

Well, this would probably thrash my 5GB/month satellite internet cap in a bit under an hour.

Obviously I'm not going to try it, but it serves as a good reminder of how much data you can go through on a modern (or in my case semi-modern) connection in hardly any time at all.

Gotta love caps I suppose...
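Here is a quick sketch of how fast a cap disappears, in decimal units. The link speeds are examples only, since iotku's actual satellite speed isn't stated:

```python
def minutes_to_exhaust(cap_bytes: float, rate_bits_per_s: float) -> float:
    """Minutes of full-speed downloading needed to use up a data cap."""
    return cap_bytes * 8 / rate_bits_per_s / 60

CAP = 5e9  # 5 GB monthly cap

for mbps in (5, 12, 25):
    print(f"{mbps:>3} Mbps: {minutes_to_exhaust(CAP, mbps * 1e6):6.1f} min")
```

"A bit under an hour" for 5 GB implies a link somewhat faster than 5e9 × 8 / 3600 ≈ 11.1 Mbps.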

anon4 · on July 24, 2015

curl -o /dev/null for a complete experience

tinix · on July 24, 2015

no no no, you gotta give it back to /dev/urandom!!!

christiangenco · on July 24, 2015

I think you could implement this in javascript and have the download generate itself on the client, making this scale much easier. Email me at username at gmail if it's something you're interested in and are stuck.

gnurag · on July 24, 2015

I usually prefer wget to browser based downloads.

flipp3r · on July 24, 2015

Chrome indicates the download didn't fail, it even completed. This file isn't infinite at all, what a scam!

( I accidentally left it to download. The resulting file is 1.991.671.687 bytes. )

strictfp · on July 24, 2015

Seems as if Chrome is using 30.8913327029 bits to store the filesize :)
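Here is a quick check of the number. log2 of the final size is about 30.89, so the file stopped about 156 MB short of 2^31 bytes and still fits in a signed 32-bit integer. Whether a 32-bit limit had anything to do with Chrome stopping is pure speculation; the arithmetic alone can't tell you:

```python
import math

size = 1_991_671_687           # bytes, as reported above
print(math.log2(size))         # about 30.8913
print(size < 2**31 - 1)        # True: still fits in a signed 32-bit int
print(2**31 - size)            # 155811961 bytes short of 2**31
```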

nudpiedo · on July 24, 2015

I expected random junk but the stream is just zeroed :(

With a compression algorithm we could save 100% of the bandwidth (tending to infinity, xD).

By the way, the download speed always drops, even when using axel.

Retr0spectrum · on July 24, 2015

I wonder if this would be possible in the web browser with javascript.

anc84 · on July 24, 2015

Maybe if you leverage the blockchain in mongodb with a node.js docker image?