
Illegal Numbers - Pwnguinz
http://en.wikipedia.org/wiki/Illegal_number
======
tokenadult
Are we sure that this Wikipedia article, kindly submitted for our discussion,
lays out the issues thoroughly? The Wikipedia article's talk page includes the
tags

"This article has been rated as Stub-Class on the project's quality scale.

"This article has been rated as Low-importance on the project's importance
scale."

What are Wikipedians doing about that? The article history

[http://en.wikipedia.org/w/index.php?title=Illegal_number&...](http://en.wikipedia.org/w/index.php?title=Illegal_number&action=history)

makes it look like this is a rather sporadically edited article, which needs a
lot of work. Are Hacker News participants willing to roll up their sleeves and
jump in on editing the article? Or is that part of the problem on Wikipedia,
that an article can be known to be lacking, but still not get fixed?

Where else would one go to find good sources on this issue? I've got no
special knowledge of this specialized issue, so I can't personally help. My
experience as a Wikipedian, fixing articles I know how to fix, is that most
Wikipedia articles on most subjects need a lot more reliable sources.

[http://en.wikipedia.org/wiki/Wikipedia:Identifying_reliable_...](http://en.wikipedia.org/wiki/Wikipedia:Identifying_reliable_sources)

------
FlukeATX
The often relevant "What Colour are your bits?" -
<http://ansuz.sooke.bc.ca/entry/23>

~~~
shasta
Summary: "Illegal number" is a term that you use to refer to protected digital
information when you also want to convey that you have no grasp of the legal
issues involved.

~~~
ewillbefull
> you have no grasp of the legal issues involved

Let's have a thread about it then. What legal issues am I misunderstanding? It
seems to me that the anti-circumvention provisions of the DMCA are rather
unprecedented.

Brandenburg v. Ohio demonstrated that free speech protects abstract advocacy
of force or law violation. The anti-circumvention provisions criminalize the
distribution of not just software that can be directly used, but anything that
serves as a tool to violate copyright. That could be a small number, a key, or
even a description of how a DRM scheme operates.

I cannot imagine a constitutional basis for those provisions of the DMCA. What
say you?

~~~
shasta
> The anti-circumvention provisions criminalize the distribution of not just
> software that can be directly used, but anything that serves as a tool to
> violate copyright.

I'm not a lawyer, but I'm going to speculate that if you upload image.bmp, a
picture you took with your camera, that happens by coincidence to contain a
key that can be used for circumvention, then you would not be prosecuted or
convicted. In other words, it's not the number that's illegal.

~~~
ewillbefull
I'm not speaking about accidental violation of the provision, I'm even talking
about intentional violation. Anti-circumvention provisions don't cover the
copyrighted work, they don't even cover a derivative work. It's just
suppression of speech that tangentially supports or encourages illegal
activities, which the Supreme Court has ruled the government has no compelling
interest in quelling.

A lot of this speech can take many forms. It can be blog posts delving into
how some encryption scheme works, it can be keys that are derived
mathematically from Sony's mistakes, and it can be software which may only
incidentally be used as a tool to circumvent DRM. This does not survive strict
scrutiny.

------
rachelbythebay
Sometimes when I'm feeling innocent and pure, I like to pretend that the world
is a place where everyone realizes that "... everything is just zeroes and
ones" and thus anything can represent anything else when viewed just the right
way. That means everything is illegal, and so nothing is illegal.

"This coke can plays the melody to Funkytown!" ...
<http://rachelbythebay.com/w/2012/07/26/encoding/>

Then I snap out of it. It's right up there with "what if each of your
fingernails contained a universe" talk: inevitable, but predictable.

------
cletus
This is an interesting problem that we, as a society, need to deal with.

HN's readership would be predominantly from a tech background with probably a
reasonable basis in mathematics and the sciences. As such we can recognize
some of the absurdities of the legal system, like how you can't patent a
mathematical formula but you can patent software, which is ultimately
indistinguishable from a formula in many (if not all?) cases.

The question is naturally asked: if I make some number produce something
illegal is that number then illegal? The engineers among us like absolute
rules and certainty as a general rule. I've had a discussion with someone else
about how a group of people could route each other's packets and then
hypothetically law enforcement couldn't prove who they came from. The same
argument comes up with "an IP address is not a person" arguments.

While I sympathize with these arguments, this isn't how the legal system
works. The law doesn't work on absolutely provable certainty. It works on
reasonable doubt, intent and (hopefully) facts.

So when it comes to numbers, one has to remember that numbers encode
information (See [1]). So for a small integer to encode, say, an obscene image
or a program that circumvents copy protection, it would require another
program that actually does that. So if the number 7 produced a DeCSS
circumvention program with a given program A, then program A is the problem,
not the number 7.

Now to turn any number into a video or a program or an image requires another
program or infrastructure. Even if it's just the raw bytes of an i386/ELF
program, you still need a kernel and a filesystem to run it. The test here, as
I imagine the law would apply (IANAL), is whether the required infrastructure
is _general purpose_ or not.

Turning the number 7 into something obscene can't be done with something
general purpose. Turning a 1400 digit number into a program can.
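To make the "general purpose" half of that concrete, here is a minimal sketch,
assuming Python 3. The function names and the payload are hypothetical
stand-ins, not anything from the article. It only shows that a generic
conversion between bytes and one large integer carries no knowledge of the
content.

```python
# Sketch: any byte string is one (possibly huge) integer, and back again.
# The payload is a harmless stand-in, not any real circumvention program.

def bytes_to_number(data: bytes) -> int:
    # Prefix a 0x01 byte so leading zero bytes survive the round trip.
    return int.from_bytes(b"\x01" + data, "big")

def number_to_bytes(n: int) -> bytes:
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:]  # drop the 0x01 marker added above

payload = b"print('hello')\n"
n = bytes_to_number(payload)
assert number_to_bytes(n) == payload
print(n.bit_length(), "bits are needed for", len(payload), "bytes")
```

Nothing in the two functions depends on what the payload is, which is the
sense in which the conversion is general purpose.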

I don't like the stance of the US on IP, particularly the Obama
administration, which has to be the most pro-IP anti-tech administration in US
history (IMHO). I also don't like how selective enforcement is here. Share a
few songs on Limewire and you're up for hundreds of thousands of dollars?
Really? At the same time, every city has a place you can go to buy pirate DVDs
or counterfeit goods (eg Canal St in NYC). Why is this, which is actual
trafficking for profit, ignored while the administration tries to elevate
file-sharing to terrorism (the original ACTA draft)?

Anyway, numbers are simply a way to represent information and that information
is the problem, not the number.

[1]: <http://en.wikipedia.org/wiki/Information_theory>

~~~
iopq
>So for a small integer to encode, say, an obscene image or a program that
circumvents copy protection, it would require another program that actually
does that.

so winzip is the problem, not the zipped file? ridiculous statement, bro

fine, you still need to play the file, so media player is the problem?
obviously, it's the source file, not the generic tools that work on everything

~~~
vidarh
He is saying that the problem is those components of the total system used to
produce the infringing output that are not generic.

So in the case of a zipped file, the zipped file and not winzip would be the
problem, because winzip is generic and has general-purpose uses, while the
zipped file is a very large specific number that is translatable by a generic
process into a specific protected work.

At the other extreme, the number "7" does not encode enough information on its
own to represent the infringing content. Even if you were to write a
"decoder" that turns the number "7" into
a movie, in this case the "decoder" is the problem, as the decoder embodies
enough specific information about the content that in effect the number you
feed in is little more than a key to unlock access to that content.
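A rough sketch of the two cases, assuming Python 3 and its standard zlib
module. WORK and rigged_decoder are hypothetical names invented for
illustration, and the "work" is a placeholder string.

```python
import zlib

# Stand-in for a protected work (hypothetical content).
WORK = b"la la la " * 50

# Generic tool: zlib knows nothing about WORK; the compressed blob carries it.
blob = zlib.compress(WORK)
assert zlib.decompress(blob) == WORK

# Special-purpose "decoder": the input 7 is only a trigger,
# because the decoder itself embeds the whole work.
def rigged_decoder(n: int) -> bytes:
    return WORK if n == 7 else b""

assert rigged_decoder(7) == WORK
print(len(blob), "bytes in the blob vs", len(WORK), "bytes in the work")
```

In the first case the specific information lives in the blob; in the second it
lives in the decoder, and the number 7 carries essentially none of it.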

------
Leszek
Let's rephrase this problem as not something that people will defend, e.g.
anti-DRM, but something people will not defend; let's go straight for the big
guns and use child pornography. Clearly, child pornography is illegal, and yet
the images are just a series of 0s and 1s; these could potentially be
reinterpreted as a single number, giving rise to the problem here. Or maybe each
triplet of bytes of the JPEG's data could be represented as an RGB colour, and
you could have a "child pornography flag" analogous to the free speech flag
here, albeit a lot longer. Does this make it ok to distribute this number or
this flag, because it's just a number/sequence of colours?

~~~
aes256
I've never really understood the child pornography thing.

Criminalizing mere possession of such images seems like a roundabout, largely
ineffective, and perhaps even counterproductive, way of tackling the real
issue; the sexual abuse of children.

It blurs the very real and very important distinction between those who are
sexually attracted to children and those who will act on that attraction to
sexually abuse a child.

It criminalizes the inquisitive, has the potential to make people inadvertent
criminals, and ultimately — to return to the original subject matter — it
makes a crime of possessing a particular sequence of 0s and 1s, which strikes
me as particularly absurd.

My personal suspicion is that sexual attraction to children (primarily
referring to teenagers here) is not the abhorrent, unnatural illness that
people speak of in public, but a perfectly natural, harmless preference that
is far more commonplace than most people would like to admit.

Ultimately, I'm for the free sharing of 0s and 1s in any order, and I'm not
the least bit swayed by the child pornography challenge.

~~~
Leszek
Fine, replace child pornography with a text file containing your name,
birthday, social security number, credit card details, email password and
mother's maiden name, the argument is the same.

~~~
lmm
There would be no crime in possessing such a file, nor should there be. It
would and should be criminal to use it to impersonate me, or to access my
emails without permission.

~~~
chii
> There would be no crime in possessing such a file, nor should there be.

exactly. There should be no crime in mere possession of information. It may be
a crime to use that information in ways you are not allowed. For example, if
somebody I didn't authorize copied the above file of personal information off
my machine and then used it to do something, I would have grounds (and the
state would have grounds) to pursue criminal charges.

Now swap that with any information that is deemed "illegal" in society these
days, and the same arguments should apply.

------
rayiner
There is a Latin term for arguments based on the reduction of software to
numbers: reductio ad absurdum.

~~~
cbr
You seem to be implying that the argument is absurd, but reductio ad absurdum
is valid. "If X were true then we would have impossibility Y, so X must be
false."

------
tshtf
See also: <http://en.wikipedia.org/wiki/Illegal_prime>

------
takluyver
I'm intrigued by how the law would handle one time pad cryptography in such a
case. You could produce two numbers of similar length, each on its own bearing
no relationship to the 'illegal' number, but which can be combined to give
that number. Then post them separately. Could either of them legally be taken
down?

~~~
derekp7
The only way to produce such a pair of numbers is by using the restricted
material as a source -- therefore the numbers would be a derivative work. This
is already covered under copyright law.

Now there is a twist to this concept -- plausible deniability. Let's say Bob
posts a non-infringing work encrypted with a one time pad (randomly
generated), and also posts the one time pad too. Both files would appear to be
random text until put together. Further, someone else, Bill, produces a
"random" number by XORing Bob's one-time pad with an infringing work, and
uses the result as a one-time pad to encrypt another non-infringing work, and
posts both of those files. The two one-time pads can recreate the infringing
work by XORing them together. But you can't prove whose "random" file was
actually randomly created, and which one was produced as a derivative work.
Who do you send the take-down notice to?
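The Bob and Bill scenario can be sketched in a few lines, assuming Python 3.
All byte strings here are hypothetical placeholders, and the xor helper is
written just for this example.

```python
import secrets

def xor(a: bytes, b: bytes) -> bytes:
    # Byte-wise XOR of two equal-length strings.
    return bytes(x ^ y for x, y in zip(a, b))

infringing = b"stand-in for work"              # hypothetical placeholder
n = len(infringing)
bob_plain = b"Bob's essay".ljust(n, b".")      # non-infringing, padded to n
bill_plain = b"Bill's poem".ljust(n, b".")     # non-infringing, padded to n

# Bob: a genuinely random pad, posted next to his ciphertext.
bob_pad = secrets.token_bytes(n)
bob_cipher = xor(bob_plain, bob_pad)

# Bill: a "random-looking" pad that is really Bob's pad XOR the work.
bill_pad = xor(bob_pad, infringing)
bill_cipher = xor(bill_plain, bill_pad)

# Each pair decrypts to something innocent...
assert xor(bob_cipher, bob_pad) == bob_plain
assert xor(bill_cipher, bill_pad) == bill_plain
# ...but the two pads together give back the work.
assert xor(bob_pad, bill_pad) == infringing
```

Both pads look uniformly random when inspected alone, which is why the four
posted files do not reveal who derived what.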

~~~
harshreality
The initial example presents the same which-part-is-infringing conundrum as
the more convoluted example you presented.

    
    
      - restricted material is X
      - generate Y randomly (using a typical cryptographically secure RNG or PRNG, zero-bias, with an entropy source unlikely to be observed by anyone else)
      - Z = X xor Y
    

There is no way to prove that it wasn't Z generated randomly, with Y = X xor
Z.

It's clear that the only way to generate _both_ Y and Z is to have the
restricted material, but you don't know which piece is tainted and which
isn't. You have to know which piece was generated first to know which one is
infringing (the non-infringing one had to be used as input for the infringing
one).
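A small sketch of that symmetry, assuming Python 3. X is a hypothetical
stand-in and the xor helper is written for this example only.

```python
import secrets

def xor(a: bytes, b: bytes) -> bytes:
    # Byte-wise XOR of two equal-length strings.
    return bytes(p ^ q for p, q in zip(a, b))

X = b"restricted bytes"  # hypothetical stand-in for the restricted material

# History 1: Y is generated randomly, Z is derived from X and Y.
Y = secrets.token_bytes(len(X))
Z = xor(X, Y)

# History 2: pretend Z was the random one and derive Y from X and Z.
Y_again = xor(X, Z)

assert Y_again == Y      # both histories end with the identical pair (Y, Z)
assert xor(Y, Z) == X    # and the pair recombines to X either way
```

Because the final pair is identical under both histories, the files
themselves carry no record of which one was generated first.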

~~~
wmf
That just encourages the powers that be to take a "shoot 'em both and let God
sort it out" approach. Also, the metadata that says Z = X xor Y will be seen
as infringing.

~~~
Dylan16807
I doubt that metadata will be infringing. It doesn't have any real information
content.

------
joesb
They accused me of threatening to kill other people when all I did was send
numbers, which happened to be interpreted by a mobile phone to show on
screen as "I'll kill your family if you don't do what I said".

Apparently some numbers are considered illegal.

------
baaksies
Whether numbers correspond to obscenity depends entirely on the software that
interprets and renders them, doesn't it? If I write software that translates
the number 5 into obscenity, would 5 become illegal?

~~~
Evbn
Not entirely, partially. If you could come up with an actual example, you may
have a point.

------
rolux
The main problem with "illegal numbers" seems to be that for any illegal
number _a_ , which represents someone's intellectual property or trade secret
_A_ , I can easily create two original pieces of intellectual property _B_ and
_C_ so that _b_ + _c_ = _a_.

~~~
Evbn
Give us an actual example of b and c that are created without knowledge of a,
and you may have a case. But information theory says it is unlikely, if a is
deeply original and not derivative of something like b and c to begin with.

~~~
aes256
I may not be on the same wavelength as rolux here, but compression would be
one example. Create B (a series of compressed files) and C (a decompressor),
both unique creations in their own right, that when combined create the
original work of another person, A.

Now I'm no expert on compression, but there are surely a vast, if not
infinite, number of possible pairs B and C that could combine to form the
original copyrighted work.

Thus, with copyright law in its current state, we are not simply granting
copyright holders the exclusive rights to a particular sequencing of 1s and
0s, but also rights over any method of creating that sequence.

If there are an infinite number of methods of creating that sequence — that is
to say, any B, if combined with a suitable C could form A — then aren't we in
effect granting rights over everything to the copyright holder? Where do we
draw the line?

The 1s and 0s that make up this post could, with suitable decompression, form
an exact copy of a hit blockbuster, but neither this post nor the decompressor
would resemble the blockbuster on their own.

Edit: Interesting idea time. If I uploaded a series of files alleging they are
encrypted copies of blockbuster films, but resolved not to release the
encryption keys to any of the files for 12 months, would the copyright holders
have the right to have the encrypted files taken down in the meantime?

They can't actually prove that the files are infringing without the encryption
key. Is the mere suggestion that a file may potentially, with some
manipulation (i.e. decryption with the appropriate key), resemble a
copyrighted work, sufficient to have it taken down?

~~~
wmf
Again, that can only work if either B or C is a derivative of A.

~~~
aes256
Apparently what I was describing here has already been put into practice in
the shape of Monolith [1]

In the case I describe, as in the case of 'munging' using Monolith, neither B
nor C bears any resemblance to A except when the two are combined.

In my view, at least, you cannot say either is a derivative of A. To do so
would bind you to declare everything a derivative of A, because any B when
combined with an appropriate C can form A.

If I take your post here as B, you will likely deny it is a derivative of any
copyrighted work, but if I make the text of the post the encryption key to
another file (C) which, when decrypted, becomes copyrighted work A, is your
post itself derivative of a copyrighted work?

What makes your post non-derivative and the encrypted file I create
derivative? They are both nothing in themselves, and yet a copyrighted work
when combined with the other.
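Here is a sketch of that construction, assuming Python 3. Using SHAKE-256 from
hashlib to stretch the post text into a keystream is my own illustrative
choice, and the names post_B, work_A and file_C are hypothetical.

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    # SHAKE-256 used as a simple key-expansion function (illustrative choice).
    return hashlib.shake_256(key).digest(n)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))

post_B = b"Again, that can only work if either B or C is a derivative of A."
work_A = b"stand-in for a copyrighted work"   # hypothetical placeholder

# C is computed from A and B; B was written without any reference to A.
file_C = xor(work_A, keystream(post_B, len(work_A)))

# B and C together give back A.
assert xor(file_C, keystream(post_B, len(work_A))) == work_A

# The same C with a different "post" as key gives unrelated bytes.
other = xor(file_C, keystream(b"some other post", len(work_A)))
assert other != work_A
```

The sketch only shows the mechanics: post_B exists independently of work_A,
while file_C could not be computed without it.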

[1] <http://monolith.sourceforge.net/>

Edit: Reading through the article on the color of bits it seems this exact
argument prompted the article in the first place. I guess I should finish
reading this!

~~~
wmf
Let me put it a different way. If A = decompress(B), then _necessarily_ B =
compress(A), so B is obviously a derivative of A. Introducing xor does not
change anything; _one_ of the parts must be a derivative of the original.
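A quick sketch of the compression case, assuming Python 3 and the standard
zlib module; A is a hypothetical stand-in. Strictly, different compressors or
levels may give different bytes for B, but every such B still decompresses to
exactly A.

```python
import zlib

A = b"stand-in for the original work " * 20   # hypothetical placeholder

for level in (1, 6, 9):
    B = zlib.compress(A, level)       # B is computed from A
    assert zlib.decompress(B) == A    # and B determines A completely

# A different original yields a B that decompresses to something else.
B_other = zlib.compress(A + b"!")
assert zlib.decompress(B_other) != A
```

Whatever the level, the compressed file fixes the original bit for bit, which
is the sense in which it is derived from A.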

~~~
aes256
Okay, let me propose an alternative procedure.

I set a series of random number generators going, and with each set of
results, I apply randomly generated XOR to create a new sequence of numbers.

I perform this process over and over. Eventually, it produces (give or take a
few bits) a copy of an MP3 file of a copyrighted work.

Now, once we've eliminated any procedure of creating B and C that includes A,
would you still say one of B or C is derivative of A?

Should my random number generator be banned? Perhaps more importantly, do I
acquire copyright to all the files it creates?

I could quite easily create every possible variation of an MP3 file of a given
length. Does that mean any musician who, using a different procedure, produces
one of these files is infringing on my copyright?

~~~
nathan_long
>> Eventually, it produces (give or take a few bits) a copy of an MP3 file of
a copyrighted work.

...which you recognize by having a copy of that work and specifically matching
for it. If I were a copyright lawyer, I'd argue that your algorithm for
plucking this value out of the stream of randomness was the infringement.

>> I could quite easily create every possible variation of an MP3 file of a
given length.

If you're prepared to pay $35 each to register the copyright on all of those,
knock yourself out. I'll enjoy not paying taxes anymore.

~~~
aes256
> ...which you recognize by having a copy of that work and specifically
> matching for it. If I were a copyright lawyer, I'd argue that your algorithm
> for plucking this value out of the stream of randomness was the
> infringement.

I don't have to recognize it myself. Say I put all the resulting files up for
download on an FTP server, and the RIAA stumble across the collection. Within,
say, a collection of every possible 30 second long MP3 file encoded at
128kbps, I'd probably be infringing on a few thousand copyrighted works.

For each infringement there'd be many, many more 'infringing' files (i.e.
every slight variation on a work that a copyright lawyer would deem
indistinguishable from the original work)

> If you're prepared to pay $35 each to register the copyright on all of
> those, knock yourself out. I'll enjoy not paying taxes anymore.

Apparently you can register copyright for music tracks in bulk. In any case,
where I live you don't have to register copyrights.

~~~
nathan_long
>> Say I put all the resulting files up for download on an FTP server

128kilobits per second * 30 seconds = (128 * 1000 * 30) = 3,840,000 bits per
file.

There are 2 to the 3,840,000 possible combinations of that many bits. Ignoring
the fact that many of those won't be valid mp3s, each of those is about a 0.46
megabyte file.
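The arithmetic, as a small Python 3 check. The variable names are mine, and
"megabyte" is read here as 2^20 bytes, which is an assumption about what the
0.46 figure means.

```python
bitrate_bps = 128 * 1000      # 128 kilobits per second
seconds = 30

bits = bitrate_bps * seconds  # bits per file
size_bytes = bits // 8
size_mib = size_bytes / 2**20 # assuming 1 MB = 2**20 bytes

print(bits, size_bytes, round(size_mib, 2))   # 3840000 480000 0.46
```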

I'm guessing you don't have enough hard drive space to put all those mp3s up.
:)

Assuming you did, the RIAA would have a tough time crawling all that content
for infringement.

It _would_ make an interesting test for the theory that "linking isn't
infringing," since the link would be the only thing distinguishing a song from
random noise.

~~~
aes256
Obviously I'd set the random generator up such that it operates within the
rules of the mp3 specification and only creates valid mp3 files; I don't think
that detracts from the experiment.

Storage space is the only major limitation here. With current computing power
I could easily have random mp3 files spat out at an alarming rate, such that
it wouldn't take too long (I'm guessing a matter of months) until I managed to
produce an infringing file this way.

I could probably speed the process up by teaching the 'random' mp3 generator
certain patterns to pursue; fade-ins and fade-outs, repetition, etc. Again, I
don't think these detract from the substance of the experiment.

It's kind of like teaching someone to play a sport; you show them the rules of
the game, and a bunch of 'patterns' that players tend to adhere to.
Eventually, they'll make a sequence of movements, lasting 30 seconds or so,
near enough identical to that performed by a famous sports star.

~~~
nathan_long
OK, I got a little help for my sorry math skills.
[http://math.stackexchange.com/questions/225155/how-can-i-qua...](http://math.stackexchange.com/questions/225155/how-can-i-quantify-the-amount-of-space-required-to-store-all-possible-128kilobit)

According to Ross Millikan over there, the number of possible 3,840,000-bit
files is a number with more than a million digits. The number of atoms in the
universe is only around 10 to the 80. So if you could use the entire universe
as your hard drive, storing a bit on every atom, you'd need many, many
universes to store those files.
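A back-of-the-envelope version in Python 3, working in logarithms so nothing
huge has to be materialised. The 10^80 atoms and one bit per atom are the
assumptions from the paragraph above; the variable names are mine.

```python
import math

bits_per_file = 3_840_000                    # 128 kbit/s * 30 s
log10_files = bits_per_file * math.log10(2)  # log10 of 2**bits_per_file
digits = math.floor(log10_files) + 1         # decimal digits in the file count

# Assumption: ~10**80 atoms per universe, one bit stored on each atom.
log10_total_bits = log10_files + math.log10(bits_per_file)
log10_universes = log10_total_bits - 80

print(f"file count has about {digits:,} decimal digits")
print(f"universes needed: about 10^{log10_universes:,.0f}")
```

The file count comes out at roughly 1.16 million decimal digits, so taking 80
off the exponent for one universe's worth of atoms barely changes it.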

You're going to have to use some _serious_ algorithmic bias to get mp3s, much
more bias to get non-static, much more to get anything resembling music and
containing any English words, etc etc.

Bluntly, you won't get copyrighted works _ever_ unless you're specifically
targeting them, for any reasonable value of _ever_. It's theoretically
possible only in the sense that it's possible for someone's DNA to
spontaneously appear at a crime scene.

This is why the "songs are just numbers" argument is misguided. Yes, they can
be represented as numbers. But you'd never discover them that way.

------
skybrian
This sounds absurd only because we have a hard time really understanding the
consequences of dealing with a vast search space. Yes, 128^140 is the space of
all possible tweets (in ascii), but just because we're able to count them
(sorta) and assign a number to each message doesn't actually give us any power
over them. After all, the omega number [1] is just another number too.

[1] [http://plus.maths.org/content/omega-and-why-maths-has-no-toe...](http://plus.maths.org/content/omega-and-why-maths-has-no-toes)
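For what "assigning a number to each message" looks like, here is a sketch
assuming Python 3, 7-bit ASCII, and NUL padding for messages shorter than 140
characters. The padding rule and the function names are my own assumptions.

```python
ALPHABET = 128   # 7-bit ASCII
LENGTH = 140     # characters per message

space = ALPHABET ** LENGTH
assert space == 2 ** 980   # since 128 == 2**7 and 7 * 140 == 980

def tweet_to_number(text: str) -> int:
    if len(text) > LENGTH:
        raise ValueError("message too long")
    data = text.encode("ascii").ljust(LENGTH, b"\x00")  # pad with NULs
    n = 0
    for byte in data:
        n = n * ALPHABET + byte    # read the message as a base-128 numeral
    return n

def number_to_tweet(n: int) -> str:
    chars = []
    for _ in range(LENGTH):
        n, r = divmod(n, ALPHABET)
        chars.append(r)
    return bytes(reversed(chars)).rstrip(b"\x00").decode("ascii")

msg = "just another number"
assert number_to_tweet(tweet_to_number(msg)) == msg
assert tweet_to_number(msg) < space
```

Being able to index every message this way is cheap; it says nothing about
finding a meaningful one among the 2^980 candidates.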

------
baddox
Not to mention the binary representation of every single copyrighted MP3,
text, software, picture, etc. in every conceivable encoding and file format.

------
KalobT
I don't speak binary very well, so correct me if I'm wrong on this subject. If
a number that represents illegal information is formed, that number is
illegal. What if there is a larger number that has the illegal number in it?
Example: 10 is illegal. 1000 has the number ten in it. The "10" can be
filtered out. How would this be handled?

~~~
Fargren
What truly matters is intent. If you are trying to convey the illegal number,
you are trying to break the law. If you accidentally transmit the bits that
form the number while transmitting something else, you are not culpable.

Obligatory IANAL.

------
pharrington
Are people trying to say the context in which something happens shouldn't have
legal bearing?

~~~
nitrogen
People are trying to say that criminalization of the possession of or
trafficking in so-called circumvention tools is absurd.

------
jorts
The first thing that comes to mind when looking at the flag are Windows 8
theme colors.

