
Whitemark: Steganographic encoding of a watermark in the whitespace of text - luu
https://github.com/rlk/whitemark
======
vkjv
Reminds me of a fun Obfuscated C Contest winner:

[https://github.com/c00kiemon5ter/ioccc-obfuscated-c-
contest/...](https://github.com/c00kiemon5ter/ioccc-obfuscated-c-
contest/blob/master/2004/sds.hint)

The author, Stephen Sykes, submitted a program that encodes a message into the
whitespace of another file. Since whitespace doesn't matter in C, he encoded
the encoder into the source of the decoder and only submitted the source for
the decoder. Neat!

------
crpatino
Extremely weak under an active attack: 'sed s/\\([^ ]\\) */\1 /g'

What you want is something that cannot be easily removed from the message
without also affecting the meaning of the cover message. I am thinking of some
sort of encoding that uses orthographic errors that are not trivial to fix
automatically (e.g. substitute words with valid homophones, instead of
changing individual characters). That would still be easy to detect, so you
probably need a very low bandwidth channel to make it indistinguishable from a
typical text with mistakes.

Doing this with multimedia is probably far easier.

------
CookieMon
...and to avoid TPP etc. leaks becoming traceable, we now need something that
routinely scans for this sort of thing and performs character and EOL
normalisation.
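
A rough sketch of what such a scan-and-normalise pass might look like, in Python. The function names are made up for illustration and this is not taken from any existing tool. It assumes that tabs, runs of spaces, trailing whitespace, and zero-width characters are not needed in the output, which is not true for every document (code, emoji sequences, and some scripts rely on them).

```python
import re
import unicodedata


def suspicious_whitespace(text: str) -> list[tuple[int, str]]:
    """Return (index, name) for each whitespace or format character
    other than a plain space or a newline. Hypothetical helper."""
    hits = []
    for i, ch in enumerate(text):
        if ch in (" ", "\n"):
            continue
        if ch.isspace() or unicodedata.category(ch) == "Cf":
            # Control characters such as tab have no Unicode name,
            # so fall back to the code point.
            hits.append((i, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return hits


def normalise(text: str) -> str:
    """Flatten whitespace so that it can no longer carry hidden bits.
    Assumption: indentation and zero-width characters are expendable."""
    # Normalise line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = []
    for line in text.split("\n"):
        # Drop zero-width and other format (Cf) characters.
        line = "".join(ch for ch in line if unicodedata.category(ch) != "Cf")
        # Collapse any run of whitespace, Unicode included, to one space.
        line = re.sub(r"\s+", " ", line)
        # Remove trailing whitespace.
        lines.append(line.rstrip())
    return "\n".join(lines)
```

This only removes what lives in whitespace and invisible characters. It would do nothing against a mark hidden in word choice or punctuation look-alikes.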

~~~
crpatino
This technique has been publicly known for well over a decade. See:
[http://www.amazon.com/Information-Techniques-
Steganography-D...](http://www.amazon.com/Information-Techniques-
Steganography-Digital-Watermarking/dp/1580530354)

It's been a long time since I read it, but if I recall correctly, the UK
government has been encoding the identity of the receiver of highly sensitive
information in this fashion for a long time. Probably it is SOP by now in most
environments where truly important secrets are exchanged.

------
cranium
In case you are wondering how it works: it converts 8-bit chunks of data into
10-character strings, with the normal space as the '1' and the non-breaking space as
the '0' (the remaining two chars are used for control and to mark begin/end of
data). The string is then put back into the message.
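
A minimal Python sketch of the bit mapping described above, not the actual Whitemark code. It covers only the 8 data characters per byte. The two control characters are left out because their layout is not spelled out here. It also assumes that "put back into the message" means substituting the single spaces between words, and that the cover text contains no non-breaking spaces of its own. All function names are hypothetical.

```python
SPACE = " "       # encodes bit 1, per the description above
NBSP = "\u00a0"   # encodes bit 0


def byte_to_ws(b: int) -> str:
    """Map one byte to 8 whitespace characters, most significant bit first."""
    return "".join(SPACE if (b >> i) & 1 else NBSP for i in range(7, -1, -1))


def ws_to_byte(ws: str) -> int:
    """Inverse of byte_to_ws for a string of exactly 8 characters."""
    value = 0
    for ch in ws:
        value = (value << 1) | (1 if ch == SPACE else 0)
    return value


def embed(cover: str, payload: bytes) -> str:
    """Assumed embedding: each space between words carries one payload bit."""
    words = cover.split(" ")
    marks = "".join(byte_to_ws(b) for b in payload)
    if len(marks) > len(words) - 1:
        raise ValueError("cover text has too few spaces for the payload")
    out = [words[0]]
    for i, word in enumerate(words[1:]):
        # Once the payload is used up, keep ordinary spaces.
        sep = marks[i] if i < len(marks) else SPACE
        out.append(sep + word)
    return "".join(out)


def extract(marked: str, n_bytes: int) -> bytes:
    """Read back n_bytes from the first 8 * n_bytes whitespace characters."""
    ws = [ch for ch in marked if ch in (SPACE, NBSP)][: n_bytes * 8]
    return bytes(ws_to_byte("".join(ws[i:i + 8])) for i in range(0, len(ws), 8))
```

Because there is no framing in this sketch, `extract` has to be told the payload length up front. That is presumably the job of the begin/end markers mentioned above.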

~~~
neilellis
There are multiple Unicode characters that are whitespace. I have a plugin for
IntelliJ just to spot these darn things. So you could actually encode more.

Also, of course, there are multiple look-alikes for '.' and ';' in Unicode, so
you could encode more data in punctuation characters like them.
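
To illustrate the "encode more" point, here is a hedged Python sketch that uses four whitespace characters instead of two, so each position carries 2 bits. The choice of characters is my own assumption and not what Whitemark does. En and em spaces are wider than a normal space, so this particular alphabet would be easier to notice in rendered text.

```python
# Hypothetical 4-symbol alphabet: space, no-break space, en space, em space.
WS_ALPHABET = [" ", "\u00a0", "\u2002", "\u2003"]


def encode(payload: bytes) -> str:
    """Emit 4 whitespace characters per byte, 2 bits each, high bits first."""
    out = []
    for b in payload:
        for shift in (6, 4, 2, 0):
            out.append(WS_ALPHABET[(b >> shift) & 0b11])
    return "".join(out)


def decode(ws: str) -> bytes:
    """Inverse of encode. Trailing symbols that do not fill a byte are ignored."""
    index = {ch: i for i, ch in enumerate(WS_ALPHABET)}
    symbols = [index[ch] for ch in ws]
    out = bytearray()
    for i in range(0, len(symbols) - len(symbols) % 4, 4):
        a, b, c, d = symbols[i:i + 4]
        out.append((a << 6) | (b << 4) | (c << 2) | d)
    return bytes(out)
```

With this alphabet a byte needs 4 whitespace positions rather than 8. A larger alphabet would shrink that further, at the cost of being easier to spot.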

~~~
GPGPU
However, that won't survive scanning and OCR.

This could be useful to see who leaked a script, for example.

------
ldite
Context for this (from the author, coming off a discussion of
[https://github.com/reinderien/mimic](https://github.com/reinderien/mimic)):

[http://www.metafilter.com/154087/Zalgo-text-would-be-
kinder#...](http://www.metafilter.com/154087/Zalgo-text-would-be-
kinder#6259125)

------
bikeshack
Reminds me of this talk from Defcon 2008:
[https://www.defcon.org/images/defcon-16/dc16-presentations/d...](https://www.defcon.org/images/defcon-16/dc16-presentations/defcon-16-kolisar.pdf)

A tool called Whitespace, used to encrypt messages.

------
neilellis
This also reminds me of the Whitespace programming language:
[https://en.wikipedia.org/wiki/Whitespace_(programming_langua...](https://en.wikipedia.org/wiki/Whitespace_\(programming_language\))

------
pyvpx
Is anyone aware of legally (US and/or British law, specifically) "strong"
steganographic techniques? Meaning a court would accept that this object is
attributed to this person because of a steganographic key embedded in the object?

~~~
JupiterMoon
I am not a lawyer. I am not your lawyer.

I suspect that the common law legal systems would approach this via case law.
Unless case law exists already I think that if you wanted to claim ownership
of a file you'd need to present your case including expert witnesses and
persuade a judge/jury of its validity.

~~~
benmcnelly
Exactly what a lawyer would say....

------
adricnet
Neat, thanks for posting!

Invisible Secrets [[http://www.east-
tec.com/invisiblesecrets/](http://www.east-tec.com/invisiblesecrets/)] has a
mode for this that works quite well on HTML source. Rendering is unchanged and
you have to be looking at the source and looking at the spacing to even see
the difference manually.

------
nerdy
Really succinct & easy-to-read code at only 118 sloc, very nice!

