Hacker News
The Infinite File Download (ma.ttias.be)
36 points by tomkwok on July 24, 2015 | 50 comments


Hi HN,

I didn't expect this project to make it to the frontpage. Right now, it's pulling close to 500Mbps in terms of bandwidth. It's cool, I like to see my graphs explode.

However, at some point I may have to pull the plug on this. Bandwidth isn't entirely free, and there are limits to what my little uplink and server can provide.

In the meantime, enjoy the Silicon Valley startup parody: https://ma.ttias.be/infinite-file-downloader/


Fun site! You could surprise downloaders by serving an extremely compressed file... A gzip transfer encoding can reach ratios of roughly 1000:1, so anyone foolishly running wget will end up with a full hard drive long before your bandwidth quota gives out :)
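As a rough check of the ratio, here is a Python sketch (not the site's actual setup) that gzips 10 MiB of zero bytes with the standard library and measures the result. A single DEFLATE layer tops out at roughly 1000:1, so expect a number in that ballpark rather than anything larger.

```python
import gzip

# 10 MiB of zero bytes, similar to what the "infinite file" stream contains
raw = bytes(10 * 1024 * 1024)

# Compress at the highest level; long runs of one byte are DEFLATE's best case
compressed = gzip.compress(raw, compresslevel=9)

ratio = len(raw) / len(compressed)
print(f"{len(raw)} bytes -> {len(compressed)} bytes, ratio about {ratio:.0f}:1")

# Round-trip to confirm nothing was lost
assert gzip.decompress(compressed) == raw
```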

(Yeah, I know this is impractical and won't do anything against >/dev/null )


[deleted]


Hah.


It might be fun to engineer a gzip stream that uncompresses to something huge, and then serve that instead. I wonder if you could compress it enough that the limiting factor becomes the client's disk write speed rather than the network.

It'd be amusing to see people fill their disks at a far faster rate than they expect their network connection to allow.
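One way to build such a stream without ever holding the decompressed data in memory is to push zeros through an incremental compressor and send each compressed chunk as soon as it is produced. A minimal Python sketch, assuming the server would send the output with a Content-Encoding: gzip header; the function name gzip_zero_stream is hypothetical.

```python
import zlib


def gzip_zero_stream(total_bytes, chunk_size=1 << 20):
    """Yield a gzip stream that decompresses to total_bytes zero bytes.

    Memory use stays around chunk_size no matter how large total_bytes is.
    """
    # wbits=31 makes zlib write a gzip header/trailer instead of a raw zlib stream
    compressor = zlib.compressobj(9, zlib.DEFLATED, 31)
    zeros = memoryview(bytes(chunk_size))
    remaining = total_bytes
    while remaining > 0:
        n = min(chunk_size, remaining)
        out = compressor.compress(zeros[:n])
        if out:  # the compressor buffers internally and may return nothing yet
            yield out
        remaining -= n
    # Flush buffered data and emit the gzip trailer (CRC32 and length)
    yield compressor.flush()
```

Joining gzip_zero_stream(1 << 30) should give roughly a megabyte that expands to 1 GiB on the client, which is the kind of asymmetry the comment is after.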



Let's say this service is generating truly random bytes. If a subset of the bytes that you download happens to describe, say, the last episode of Mr. Robot encoded in MP4, does that actually constitute piracy?


No. Copyright only covers copying. If two people independently create the same work, they both have rights to that work.

Though, note that what you're suggesting is basically impossible for a work of any meaningful size, and the act of searching for a specific work in a random stream probably would prevent it from being considered an independent creation.

http://www.techpatents.com/Blog/independent-creation-is-a-de...


See also "Pierre Menard, Author of the Quixote" by Borges (https://en.wikipedia.org/wiki/Pierre_Menard,_Author_of_the_Q...) about the differences between Menard's and Cervantes' Don Quixote. The two texts are 100% identical, but one was written in the era of its setting and the other in modern times. The modern 'copy' has a different meaning because of its author's different circumstances, even though the content is syntactically exactly the same. Really interesting concept, like all of Borges!


This is getting even more off-topic, but I'm kind of curious if anyone knows whether it would be illegal to distribute a copy of something copyrighted if it's "encrypted with itself" -- for example, an encrypted eBook with plaintext metadata that tells you how to derive a decryption key from a long sample of words in the book. In theory the copy is random bytes until you possess the book, which should mean you have the right to possess the copy for personal/backup/interoperability purposes. (The use case I have in mind, which perhaps applies best to books, is being able to turn physical media into high-fidelity, open digital formats that can be distributed lawfully, without the cooperation of the copyright holder.)


> In theory the copy is random bytes

There is not any such thing as random bytes. Here's a byte: 0xca. Is it random?

Well, if I got it from a fair die, perhaps. But if I got it by printing the hex of a Java program, of course not. Bytes cannot be random, they can only be created in a random way (or not).

In your hypothetical you are not creating the bytes in a random way, you are creating them from a book. So the bytes have nothing to do with so-called "random bytes", but are in fact a derivative work of the book, at the very least.


I guess this is bit color, then (a similar "random" example is used in the Colour essay somebody else linked.) Thanks for mentioning "derivative work", it's a good key word to look into this further. After skimming https://en.wikipedia.org/wiki/Derivative_work I guess one question is, is applying cryptography "transformative"?

To take it in a slightly different direction, since parody has been upheld, what about a deterministic, reversible, mechanical parodic transformation? AFAIK machine-created works can't be given a copyright, but I'm not asking for that, merely that a machine can create a sufficiently transformative derivative work.

In fact, maybe it doesn't even need to be parody. In https://en.wikipedia.org/wiki/Perfect_10,_Inc._v._Amazon.com.... search engines were permitted to serve thumbnails of copyrighted images because the copies "served a different function than [the original] use – improving access to information on the Internet versus artistic expression." In the same way, an e-book could be "improving access" rather than offering "artistic expression."

I guess establishing "transformation" is a bit tricky -- presumably ASCII-encoding an English book's text is not transformative. But it seems like somehow you could come up with something interesting and legal, even if it's not quite like my original formulation. Google Books' excerpts are probably like this.

Oh, and after re-reading the Colour article, I see I am not even all that innovative: http://monolith.sourceforge.net/


Now you're just getting into semantics. Do you have a problem with the term 'random number' too? Because that's all a byte is: a number from 0 to 255.


As a matter of fact, I do; as do many people who take cryptography seriously:

> Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin. For, as has been pointed out several times, there is no such thing as a random number — there are only methods to produce random numbers, and a strict arithmetic procedure of course is not such a method. - John von Neumann

But to your broader point, no, this is not a question of semantics. There is an actual syntactical difference being asserted here.

That difference is this: randomness has nothing to do with numbers. 0xca--the same number--can be either "random" or "non-random" depending on how you got ahold of it.

When we talk about numbers--prime numbers, even numbers, rational numbers, etc., we are talking about properties of the number; whether they are divisible by 2 and so on. But "random numbers" have no property by which we could recognize them. It is entirely a question of where the numbers were made, not what they are or what they look like.
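The point that randomness belongs to the process rather than the number is easy to demonstrate with a seeded pseudo-random generator. In this Python sketch, two generators given the same seed produce byte-for-byte identical output that looks random, while os.urandom draws from the operating system's entropy source and cannot be replayed. The seed value 42 and the helper prng_bytes are arbitrary choices for illustration.

```python
import os
import random


def prng_bytes(seed: int, n: int) -> bytes:
    """Return n bytes from Python's Mersenne Twister, seeded deterministically."""
    return random.Random(seed).randbytes(n)  # randbytes needs Python 3.9+


a = prng_bytes(42, 16)
b = prng_bytes(42, 16)
print(a.hex(), a == b)  # same seed, same "random" bytes on every run

# OS entropy: a fresh value on every call, with no seed to replay
print(os.urandom(16).hex())
```

This is von Neumann's "state of sin" in miniature: the first output is nothing more than an arithmetic procedure applied to 42. Python's own documentation likewise warns that the random module should not be used for security purposes.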


IANAL, but I think it would be illegal. Bits have flavour. Even if you generate them randomly, the intent of matching them with existing copyrighted bits is enough to give them the flavour that gets you in trouble.


Colour of bits [0] was an eye-opening read.

0: http://ansuz.sooke.bc.ca/entry/23


Does this also apply to sha256(a_copyrighted_work)?

Note that retrieving any "useful information" about the original is believed to be similarly impossible from sha256(a_copyrighted_work) and encrypt_{sha_256(a_copyrighted_work)}(a_copyrighted_work).
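To make the notation concrete, here is a toy Python sketch of encrypt_{sha256(work)}(work): the key is the SHA-256 digest of the work itself, and the "cipher" XORs the data with a keystream made by hashing the key together with a counter. This construction is an assumption chosen for illustration, not a vetted cipher; real code would use an authenticated cipher from a proper cryptography library. The function names are hypothetical.

```python
import hashlib


def derive_key(work: bytes) -> bytes:
    # The key is the hash of the work, so only someone holding the work can derive it
    return hashlib.sha256(work).digest()


def keystream(key: bytes, length: int) -> bytes:
    # Toy counter-mode keystream: SHA-256(key || counter) for counter = 0, 1, 2, ...
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(key: bytes, data: bytes) -> bytes:
    # XOR is its own inverse, so the same function encrypts and decrypts
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


work = b"placeholder text standing in for a copyrighted work"
key = derive_key(work)
ciphertext = xor_cipher(key, work)
assert xor_cipher(key, ciphertext) == work
```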


>In theory the copy is random bytes until you possess the book, which should mean you have the right to possess the copy for personal/backup/interoperability purposes.

In what theory? We can very well theoretically conceive that you just ask for those "sample words" from someone else who has the book, legitimately or illegitimately, and he gives them to you.

Even if the set of sample words is different for each metadata file, the book would be the same, so you just need a person willing to share the words with someone that doesn't own the book.


Yes, but in such cases you can also just borrow the entire book, photocopy it, and keep the copy. It's still legal to loan books or have photocopiers, but using them together without a license is illegal. The point is to make it so that the distributors can reasonably argue they are not actually distributing copyrighted material and have made some effort to ensure the recipients have a license to the work, and that there are no reasonable physical prohibitions the law or copyright holders can make to stop this (i.e., banning the sharing of books isn't reasonable -- for the moment). It's the responsibility of the decrypting party to only decrypt lawfully, just as it currently is with disc copying/ripping.

Now, an actual lawyer might obliterate this through some logic I don't know, and/or appeal to "bit color" as others are pointing out.

P.S. I do have to admit my idea reminds me of warrant canaries, a "clever hack" around the law that I ultimately believe can't really be lawful.

P.P.S. CleanFlicks et al may be relevant case law, as from what I understand they made varying efforts to "ensure" their customers held a license, but were basically still distributing copies of works they didn't own.


>Yes, but in such cases you can also just borrow the entire book, photocopy it, and keep the copy.

Yeah, but this way you get the electronic version that you want + the key (from some other user). In the end it comes down to those distributing the keys, and it's no different from sites offering trial versions while others give out serials and cracks.

>The point is to make it so that the distributors can reasonably argue they are not actually distributing copyrighted material

Just because it's encrypted it doesn't mean it's not copyrighted material. If I encrypt "Star Wars" and give it for download (letting others give the key), they'll still be all over my ass.


What if it's not random but π?

https://github.com/philipl/pifs


> What if it's not random but π?

π is known to never repeat (that is, it's irrational); it is not, however, known to be normal.

That means the digits in its decimal expansion (to pick a base) are not known to be normally distributed. This means that it is not necessarily the case that every possible sequence of decimal digits is present in π.

For example, for all we know it's the case that beyond a certain (massively huge) number of decimal places, π never contains another '7'. That would render some sequences impossible, while still preserving the proven-to-be-true property that π never repeats itself in its entirety (that is, that it's irrational).
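For a feel of what "not known to be normal" means in practice, the Python sketch below generates decimal digits of π with Gibbons' unbounded spigot algorithm and tallies how often each digit appears. The counts in an initial segment come out roughly even, but no finite tally can establish normality, which is a statement about limiting frequencies.

```python
from collections import Counter
from itertools import islice


def pi_digits():
    """Yield the decimal digits of pi (3, 1, 4, 1, 5, ...) using Gibbons' spigot."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit n is now certain; emit it and rescale the state
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Absorb one more term of the series to tighten the estimate
            q, r, t, k, n, l = (
                q * k,
                (2 * q + r) * l,
                t * l,
                k + 1,
                (q * (7 * k + 2) + r * l) // (t * l),
                l + 2,
            )


first_1000 = list(islice(pi_digits(), 1000))
counts = Counter(first_1000)
for digit in range(10):
    print(digit, counts[digit])
```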


> That means the digits in its decimal expansion (to pick a base) are not known to be normally distributed.

You mean uniformly distributed. Being a normal number says more than that: every finite sequence of digits of a given length appears with the same limiting frequency.

While you are right that Pi isn't proven to be normal, constructing a normal number isn't exactly hard. I think the following number is proven to be normal (binary representation, I added spaces for clarity):

0.0 1 00 01 10 11 000 001 010 011 100 101 110 111 ...

It's clear to see that this number contains all possible finite binary sequences.
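The construction is easy to generate and check. This Python sketch concatenates every binary string of length 1, 2, ..., max_len in lexicographic order (the digits after "0."), then confirms that every string up to that length occurs. It also checks that 0s and 1s are exactly balanced in each complete prefix, since each length-n block contains every string of that length. Containing every sequence is weaker than normality, so this is a sanity check, not a proof; construction_prefix is a hypothetical helper name.

```python
from itertools import product


def construction_prefix(max_len: int) -> str:
    """Digits after '0.' for all binary strings of length 1..max_len, in order."""
    return "".join(
        "".join(bits)
        for n in range(1, max_len + 1)
        for bits in product("01", repeat=n)
    )


prefix = construction_prefix(10)
print(prefix[:10])  # 0100011011: the blocks 0 1 00 01 10 11 without spaces

# Every binary string of length up to 10 occurs somewhere in the prefix
assert all(
    "".join(bits) in prefix
    for n in range(1, 11)
    for bits in product("01", repeat=n)
)
# Ones make up exactly half of the digits
assert prefix.count("1") * 2 == len(prefix)
```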


How is searching for something in a random stream equivalent to copying it?


It's hard to call it "independently created" if you used a copy of the work in question as an input to your algorithm.


At some point the copy of the work you use to either compare or generate a comparison hash might be considered copying.

The maths tend to work against you as well.


At least for me it is all zeros (0x00). The exception was the first time I downloaded the entire file: I first got zeros (0x00), then ASCII '0' characters (0x30) separated by two Windows newlines (0x0D 0x0A, twice), and reached infinity after 544,930 bytes.


Mandatory "What Colour are your bits?" post.

http://ansuz.sooke.bc.ca/entry/23


Of course it does, that's equivalent to what torrents do. The "index" and "length" would constitute the copyright violation.

In the same way, torrents are lists of hashes, which are used as keys to find byte blocks in the "cloud" of torrent clients.
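A simplified Python sketch of the piece-hash idea: the data is cut into fixed-size pieces, the metadata carries one 20-byte SHA-1 digest per piece (as in BitTorrent v1), and a client can accept a block from any peer and check it against that list. The piece size and helper names are illustrative; real .torrent files store these digests, concatenated, inside a bencoded info dictionary.

```python
import hashlib


def piece_hashes(data: bytes, piece_length: int) -> list[bytes]:
    """SHA-1 digest of each fixed-size piece (the last piece may be shorter)."""
    return [
        hashlib.sha1(data[i : i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def verify_piece(index: int, block: bytes, hashes: list[bytes]) -> bool:
    """Accept a block from any peer only if it matches the published hash."""
    return hashlib.sha1(block).digest() == hashes[index]


data = bytes(range(256)) * 100  # 25,600 bytes of sample content
hashes = piece_hashes(data, 16 * 1024)
print(len(hashes))  # 2 pieces: 16,384 bytes and 9,216 bytes
```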


The probability of that occurring is on the order of 1 in 2^1000000000. You would exceed the computational capacity of the universe long before that point.


You're talking accidental piracy. That would make for a great court case!


Unfortunately (or fortunately?) this is, for all practical purposes, impossible. Even just generating one specific kilobyte of data using uniform randomness has a probability of 1/2^8192 (Python tells me that 2^8192 > 10^2000, which is far bigger than the estimated number of atoms in the universe, somewhere around 10^80).
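The arithmetic is easy to reproduce with Python's arbitrary-precision integers: a kilobyte of 1024 bytes is 8192 bits, so a uniformly random kilobyte matches one specific target with probability 1/2^8192.

```python
import math

bits = 1024 * 8          # one kilobyte, in bits
outcomes = 2 ** bits     # number of equally likely 1 KiB strings

print(outcomes > 10 ** 2000)   # True
print(len(str(outcomes)))      # 2467 decimal digits
print(bits * math.log10(2))    # about 2466.04, i.e. 2^8192 is roughly 10^2466
```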


Something related: you can upload special files such as /dev/urandom to websites that expect a normal file. Interestingly, a lot of sites only check the file size after the upload has completed. This means that many sites are vulnerable to a DoS that fills their hard drive with temporary data from e.g. /dev/urandom.
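The fix is to enforce the limit while the body is streaming in, not after it has landed on disk. A minimal Python sketch, assuming a file-like request stream with a read(n) method; UploadTooLarge, copy_limited and EndlessRandom are hypothetical names, and EndlessRandom stands in for a client uploading /dev/urandom.

```python
import io
import os


class UploadTooLarge(Exception):
    """Raised as soon as the client has sent more than the allowed number of bytes."""


def copy_limited(src, dst, max_bytes: int, chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst in chunks, aborting once max_bytes is exceeded.

    Returns the number of bytes copied. The chunk that crosses the limit is
    never written; the caller should discard whatever partial data dst holds.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)
        if total > max_bytes:
            raise UploadTooLarge(f"upload exceeded {max_bytes} bytes")
        dst.write(chunk)


class EndlessRandom:
    """Stand-in for a client streaming /dev/urandom: it never reaches EOF."""

    def read(self, n: int) -> bytes:
        return os.urandom(n)


print(copy_limited(io.BytesIO(b"x" * 1000), io.BytesIO(), max_bytes=10_000))  # 1000
try:
    copy_limited(EndlessRandom(), io.BytesIO(), max_bytes=1_000_000)
except UploadTooLarge as exc:
    print("rejected:", exc)
```

Checking a declared Content-Length up front is a cheap first filter, but the streaming check still matters because chunked uploads don't declare a length at all.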


You should have mentioned that the file is gluten-free


Hehe, good one - I've added it to our list of perks. :-)


Nice going :) It just feels so much better, it feels lighter


It should feel heavier if it's gluten-free.


Someone, somewhere, is paying for this bandwidth...


In this case, that would be my employer: https://www.nucleus.be/en

Most ISPs/hosting providers have some bandwidth to spare, to be able to cover the spikes in traffic and support their growth. As long as this Infinite Downloader doesn't consume _too_ much traffic, we'll be happy to sponsor it. Bandwidth/peering agreements are made in terms of commitments, and as long as you stay beneath your committed data traffic, additional bandwidth isn't charged.

Once it starts to saturate uplinks or poses any problem for other clients, it'll get shot down. But I don't see that happening any time soon.


> Nucleus

HBO's Silicon Valley, anyone?


I bet this DAAS project would be much more efficient if it utilized middle-out compression in the transfer.


Well, this would probably thrash my 5GB/month satellite internet cap in a bit under an hour.
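The back-of-the-envelope figure checks out for a typical satellite link. A Python sketch, assuming decimal units (5 GB = 5 × 10^9 bytes) and a hypothetical 12 Mbps downlink, since the comment doesn't state the actual speed; hours_to_exhaust is an illustrative helper.

```python
def hours_to_exhaust(cap_bytes: float, link_mbps: float) -> float:
    """Hours of continuous downloading needed to use up a data cap."""
    seconds = cap_bytes * 8 / (link_mbps * 1_000_000)
    return seconds / 3600


print(round(hours_to_exhaust(5e9, 12), 2))  # 0.93 hours, about 56 minutes

# Link speed at which the 5 GB cap lasts exactly one hour
print(round(5e9 * 8 / 3600 / 1_000_000, 1))  # 11.1 Mbps
```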

Obviously I'm not going to try it, but it serves as a good reminder of how much data you can go through on a modern (or in my case semi-modern) connection in almost no time at all.

Gotta love caps I suppose...


curl -o /dev/null for a complete experience


no no no, you gotta give it back to /dev/urandom!!!


I think you could implement this in javascript and have the download generate itself on the client, making this scale much easier. Email me at username at gmail if it's something you're interested in and are stuck.


I usually prefer wget to browser based downloads.


Chrome indicates the download didn't fail, it even completed. This file isn't infinite at all, what a scam!

(I accidentally left it downloading. The resulting file is 1,991,671,687 bytes.)


Seems as if Chrome is using 30.8913327029 bits to store the filesize :)
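The 30.89 figure is the base-2 logarithm of the final file size, i.e. how many bits it takes to count that many bytes. A quick Python check; the comparison with 2^31 (2 GiB) is added only for context, not as a claim about why Chrome stopped.

```python
import math

size = 1_991_671_687      # bytes Chrome reported for the "finished" download
bits = math.log2(size)

print(f"{bits:.4f}")      # 30.8913, matching the comment
print(math.ceil(bits))    # 31 bits are enough to hold the size
print(size < 2 ** 31)     # True: still short of 2 GiB
```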


I expected random junk but the stream is just zeroed :(

With a compression algorithm we could save 100% of the bandwidth (tending toward infinity, xD).
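The "tending toward infinity" joke has a real basis: a run-length encoding describes n identical bytes as a single (value, count) pair, whose size grows only with the number of digits in n. A toy Python sketch (illustrative only, not a format browsers understand); rle_encode and rle_decode are hypothetical helpers.

```python
def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Collapse runs of identical bytes into (byte_value, run_length) pairs."""
    runs: list[tuple[int, int]] = []
    for b in data:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return runs


def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    """Expand (byte_value, run_length) pairs back into the original bytes."""
    return b"".join(bytes([value]) * count for value, count in runs)


zeros = bytes(100_000)
encoded = rle_encode(zeros)
print(encoded)  # [(0, 100000)]: one pair, however long the run
assert rle_decode(encoded) == zeros
```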

By the way, the download speed always drops, even when using axel.


I wonder if this would be possible in the web browser with javascript.


Maybe if you leverage the blockchain in mongodb with a node.js docker image?



