Chaffing and Winnowing: Confidentiality without Encryption (1998) (csail.mit.edu)
69 points by ronancremin on July 15, 2013 | 47 comments



It's cool but basically solves a problem that no longer exists. Once you've caused enough suspicion, they can simply dig up the records of all the data you've sent, both chaff and wheat, and serve you with an order to disclose your authentication key/lawfully hack your computer and obtain it without asking/apply some lead pipe cryptanalysis and get it anyway. In the end, it's no better than regular encryption, at the cost of being at least twice as inefficient.

Still, for all the crypto export nonsense, 1998 appears to have been a more innocent time:

> "But access to authentication keys is one thing that government has long agreed that they don't want to have."


Near the bottom they mention using more than one wheat stream to achieve something like deniable encryption. If they ask you for the key, give them the one that produces innocent-looking messages.


I'm sure if law enforcement doesn't like your innocent looking messages they can just keep demanding that you give them the real key.

How easy is it to produce a stream of messages that is fake but looks real?


Depends, how good are you at creative writing? I can think of a lot of messages you might send to someone that you'd want to be private that aren't nefarious plots. Weird fan fiction. Deviant porn. Messages exchanged with a secret mistress. Depending on the situation, you might even want to give them a fake copy of your nefarious plot. Include more than one extra set of messages if you like and give them whatever keys you like in whatever order is appropriate.


>In the end, it's no better than regular encryption, at the cost of being at least twice as inefficient.

He goes on to explain how to make it more efficient: If you need every "wheat" packet to reconstruct any part of the message, you can send a finite number of chaff packets (e.g. 1000) in random locations, which would make reconstructing a message of arbitrary length infeasible for an adversary that can't separate the wheat from the chaff other than by exhaustive search.
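A rough way to size that exhaustive search, assuming the adversary's only option is to guess which packets are chaff: with w wheat packets and c chaff packets interleaved at random, there are C(w + c, c) possible placements. The packet counts below are illustrative assumptions, not figures from the paper.

```python
import math

def chaff_search_bits(wheat_packets: int, chaff_packets: int) -> float:
    """log2 of the number of ways the chaff could be placed among the wheat."""
    total = wheat_packets + chaff_packets
    return math.log2(math.comb(total, chaff_packets))

# Illustrative sizes only: 1000 chaff packets hidden in messages of varying length.
for wheat in (100, 1000, 10000):
    print(wheat, "wheat packets ->", round(chaff_search_bits(wheat, 1000)), "bits of search")
```

The chaff count stays fixed, so the relative overhead shrinks as the message grows, while the search space keeps growing.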


See also "Chaffinch: Confidentiality in the Face of Legal Threats" by Richard Clayton and George Danezis from University of Cambridge, which has some more plausible deniability.

(http://www.cl.cam.ac.uk/~rnc1/Chaffinch.html)


I've thought about writing a Chrome plugin to do something similar. While on, it would randomly chaff the low order bits of any image you upload, and would automatically add a chaff postscript to every Gmail. An adversary would have no clue which images/messages contain ciphertext, and which contain nothing but random chaff.


Was unreachable for me, here's the cached version: http://webcache.googleusercontent.com/search?q=cache:zgl1Lf2...


I'm probably misunderstanding this. The way I'm envisioning this is basically a half-dozen parallel conversations, with only one of them being the actual conversation.

Couldn't it be easily defeated with contextual analysis? I mean, if it were English sentences, the attacker could just choose a set of packets that make grammatical sense. Or in more real-world examples, you'd just choose the packets that form a valid HTTP session.

To work around this, you'd have to choose your chaff packets to flow seamlessly from one to the other, which would make chaffing a really hard problem.


In that document, he proposed two different systems.

The first transmits two tuples for every bit of the message. In pseudo-ish code:

    for (i = 0; i < message_bits.length; ++i) {
        sequence_number = i;
        // Both bit values are sent for every position; only the real one carries a valid MAC.
        send( (sequence_number, 0, (message_bits[i] == 0 ? HMAC(shared_key, sequence_number . '0') : rand())) );
        send( (sequence_number, 1, (message_bits[i] == 1 ? HMAC(shared_key, sequence_number . '1') : rand())) );
    }

Without the HMAC key, the only information exposed is the length of the message: both '0' and '1' are sent for every bit, and you can't tell which of the two is the correct one.
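For anyone who wants to poke at it, here is a minimal runnable sketch of that first scheme in Python. The choice of HMAC-SHA256, the 8-byte sequence number encoding, and the function names are my assumptions, not something the paper specifies.

```python
import hashlib
import hmac
import secrets

MAC_LEN = 32  # bytes in an HMAC-SHA256 tag

def mac(key: bytes, seq: int, bit: int) -> bytes:
    """Authenticate the pair (sequence number, bit) under the shared key."""
    msg = seq.to_bytes(8, "big") + bytes([bit])
    return hmac.new(key, msg, hashlib.sha256).digest()

def chaff(key: bytes, bits: list) -> list:
    """Emit two packets per message bit: one wheat (valid MAC), one chaff (random tag)."""
    packets = []
    for seq, bit in enumerate(bits):
        for candidate in (0, 1):
            if candidate == bit:
                tag = mac(key, seq, candidate)        # wheat
            else:
                tag = secrets.token_bytes(MAC_LEN)    # chaff: same length, random bytes
            packets.append((seq, candidate, tag))
    return packets

def winnow(key: bytes, packets: list) -> list:
    """Keep only packets whose tag verifies, ordered by sequence number."""
    wheat = {}
    for seq, bit, tag in packets:
        if hmac.compare_digest(tag, mac(key, seq, bit)):
            wheat[seq] = bit
    return [wheat[seq] for seq in sorted(wheat)]

key = secrets.token_bytes(32)
message_bits = [1, 0, 1, 1, 0, 0, 1, 0]
packets = chaff(key, message_bits)
recovered = winnow(key, packets)
```

Anyone holding the key gets the bits back; anyone without it sees two equally plausible packets per sequence number.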

The second works in a slightly different way. You take a message, and transform it into a package in such a way that one must determine the entire package contents before being able to decode any part of the package. (Rivest's proposal for such an 'All-or-Nothing Transform' can be found in [1]).

Next, you packetize that transformed package. Rivest proposes blocks of 1024 bits, but it could be anything. The point is you can't send (2^1024 - 1) 'chaff' blocks, which would be required to not leak any information about a 1024-bit message block.

But, crucially, you must determine which is the correct block for every sequence number sent. So, if one includes just one 'chaff' block for each 'wheat' block (so there is one wheat and one chaff block for each sequence number), and the message spans n sequence numbers, then an attacker who doesn't know the HMAC key would need to choose the right combination of blocks across all sequence numbers (a 1 in 2^n chance) before being able to decode your message.

If one splits an all-or-nothing transformed message into at least 256 blocks and sends one 'chaff' block with each 'wheat' block, you could be pretty sure the message remains confidential. Of course, this assumes all those pesky security assumptions about the random number generator, transform, and HMAC function hold up.
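To make the all-or-nothing part concrete, here is a sketch loosely modeled on the package transform. Big caveats: HMAC-SHA256 stands in for the block cipher, `PUBLIC_KEY` is a made-up fixed public constant, padding is omitted, and none of this has been reviewed, so treat it as an illustration of the property rather than the paper's construction.

```python
import hashlib
import hmac
import secrets

BLOCK = 32                      # bytes per block (SHA-256 output size)
PUBLIC_KEY = b"\x00" * BLOCK    # fixed, publicly known key (assumption of this sketch)

def prf(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def package(blocks: list) -> list:
    """Mask every block with a fresh random key, then append that key masked by all outputs."""
    inner_key = secrets.token_bytes(BLOCK)
    out = [xor(m, prf(inner_key, i.to_bytes(8, "big"))) for i, m in enumerate(blocks)]
    last = inner_key
    for i, c in enumerate(out):
        last = xor(last, prf(PUBLIC_KEY, c + i.to_bytes(8, "big")))
    return out + [last]

def unpackage(package_blocks: list) -> list:
    """Recover the inner key from all blocks, then unmask each message block."""
    *out, last = package_blocks
    inner_key = last
    for i, c in enumerate(out):
        inner_key = xor(inner_key, prf(PUBLIC_KEY, c + i.to_bytes(8, "big")))
    return [xor(c, prf(inner_key, i.to_bytes(8, "big"))) for i, c in enumerate(out)]

blocks = [secrets.token_bytes(BLOCK) for _ in range(4)]
packed = package(blocks)
# Swap in one wrong block, as an attacker who picked a chaff block by mistake would.
tampered = [secrets.token_bytes(BLOCK)] + packed[1:]
```

With even one wrong block the recovered inner key is wrong, so every block decodes to garbage. That is why one chaff block per sequence number forces a guess over all 2^n combinations.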

[1]: http://theory.lcs.mit.edu/~cis/pubs/rivest/fusion.ps


Keep reading: the first example is a bit misleading (for the reason you state) and the article gets more interesting.

They deal with this problem by coding the message with single-bit packets, always contrasting 0s with 1s.


Yup, that was the critical piece I was missing. Thanks!


Kind of interesting scheme that doesn't really work in 2013.

Wouldn't this be vulnerable to replay attacks, or am I missing something?


If Alice's messages could all be intercepted and manipulated prior to Bob's receiving them, then yes, they could be changed without either party knowing.

Combined with asymmetric encryption of the messages, you should be able to prevent that from happening.


Without any manipulation why wouldn't this be vulnerable to a replay attack?


No. This is just a method of "securing" messages without encryption. It still requires a shared key. Replaying messages sent this way would be no different to replaying an encrypted message.

This is not an authentication scheme.


Might be just me, but I'm thinking encryption is way easier than this.


But that's rather missing the point, isn't it? The premise is "Under circumstances where encryption is not a viable option, what secure communication methods might be possible?" so responses that ignore the premise, like "just use encryption" or "just don't get into such circumstances", aren't the most salient critiques.


Two points: First, this was from 1998 - back when "exporting encryption" from the US was punishable as exporting weapons. Secondly, this is proposing a communication scheme where there is no "encryption/decryption key" for the NSA to coerce people into handing over.


Isn't this still, in essence at least, Steganography?


Nope. You would be painting a huge red target on yourself if you tried something like this.

The purpose of steganography is not to get noticed in the first place. It's orthogonal to regular cryptography.


Unfortunately, common steganography algos used on images are easy to detect with statistical analysis.
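A toy illustration of the kind of statistic involved, assuming naive LSB embedding and a synthetic cover whose low bits are skewed. The 80/20 skew is a made-up stand-in for encoder artifacts, and real steganalysis is considerably more sophisticated than this.

```python
import math
import random

def lsb_entropy(pixels: list) -> float:
    """Shannon entropy, in bits, of the least significant bit across pixel values."""
    ones = sum(p & 1 for p in pixels) / len(pixels)
    return -sum(p * math.log2(p) for p in (ones, 1 - ones) if p > 0)

def embed_lsb(pixels: list, bits: list) -> list:
    """Naive LSB steganography: overwrite the low bit of each pixel with a payload bit."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)] + pixels[len(bits):]

rng = random.Random(0)
# Synthetic "cover": low bits are skewed, roughly 80% zeros (an assumption of this sketch).
cover = [rng.randrange(0, 256, 2) + int(rng.random() < 0.2) for _ in range(10_000)]
payload = [rng.getrandbits(1) for _ in range(len(cover))]   # ciphertext looks random
stego = embed_lsb(cover, payload)

print("cover LSB entropy:", round(lsb_entropy(cover), 3))
print("stego LSB entropy:", round(lsb_entropy(stego), 3))
```

Encrypted payloads push the low-bit plane toward a perfect coin flip, which is exactly what a skewed cover does not look like.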


Images are just the lowest hanging fruit and attract a lot of sloppy schemes. You can encode information in pretty much anything as long as you are free to choose the message.

You could even encode info in plain text by the lengths of non-whitespace characters, modulo 2. As an example, try using the so-defined scheme on the letters count of this very sentence; they're encoding, repeatingly, S.O.S.
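A quick sketch of decoding that scheme: one bit per whitespace-separated token, taken as its length (punctuation included) modulo 2. Reading 0 as a Morse dot and 1 as a dash is my assumption about the intended mapping.

```python
def token_parity_bits(text: str) -> str:
    """One bit per whitespace-separated token: its length modulo 2."""
    return "".join(str(len(token) % 2) for token in text.split())

sentence = ("As an example, try using the so-defined scheme on the letters count "
            "of this very sentence; they're encoding, repeatingly, S.O.S.")
bits = token_parity_bits(sentence)
# Assumed mapping: 0 -> dot, 1 -> dash.
morse = bits.replace("0", ".").replace("1", "-")
print(bits)    # 00011100011100011100
print(morse)   # ...---...---...---..
```

Note that the punctuation counts: strip the commas and the pattern falls apart.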

Posts of similar length as this one are enough to send a public key using ECDH and negotiate a shared secret for use with a block or stream cypher. You could then send short messages using this shared secret inconspicuously. Shannon entropy of English is about 1.5 bits per character so you could store quite a bit if you cared to compress it well before encrypting, preferably using a shared static dictionary.



Can you elaborate?


People read a book, they see a simple description of steganography, they whip up an implementation as a proof of concept, they share that code, other people think it's secure when it's not meant to be.

(http://www.ifp.illinois.edu/~ywang11/paper/CISS04_204.pdf)

(http://eprint.iacr.org/2005/305)

(http://vision.ece.ucsb.edu/publications/sullivan_ICIP06.pdf)



Steganography suffers from the warden problem: being in possession of steganography software is suspicious regardless of whether or not you use it.


A solution to this is to increase the "noise floor" by bundling steganography tools with common widely distributed software, so that obviously 99+% of people and computers with steganography software would be 'innocent'.

For example, if the default Ubuntu installation created a small (10 MB?) volume filled with random bits and installed an appropriate steganography tool designed to write/read encrypted data there, then it would enable anyone to hide some arbitrary data while having a file/software setup that's not distinguishable from millions of others in any way.


"A solution to this is to increase the "noise floor" by bundling steganography tools with common widely distributed software, so that obviously 99+% of people and computers with steganography software would be 'innocent'."

Good luck with that one. As a practical matter, this is unlikely to happen; hardly anyone requires steganography as part of their security solution (the MPAA stands out due to the use of watermarking). Email and online businesses were the killer app for public key cryptography; what killer app do you see for steganography?


I don't see a killer app for that - the whole point is not that millions need it, but that all tools needed for steganography are shipped also to millions of people who don't need it.

Someone (preferably multiple organizations) should bundle steganography just because it's desperately needed for a tiny minority - doing so would not be because of a killer app but simply a service for public good, facilitating democracy, free speech, whistleblower protection, etc.

This is aligned with the stated ideals of multiple FOSS organizations, so it is feasible to assume that someone with popular widespread software (like, say, Firefox, Ubuntu or VLC) could do that for purely idealistic reasons. The software size is tiny, so the distribution overhead would be trivial while making a serious strategic change. Do it just because it can be done.


The default installs of shells and window managers are likely to reveal whether the command has either ever or recently been run. Disabling the defaults is also "suspicious".

I don't think you can fix a social problem with a technical fix. Innocent until proven guilty (of a crime with a victim please!) has to apply to employment law and clearances. Otherwise we are building a group of criminals who can honestly be believed when they say they are willing to violate the constitution to protect executive branch interests.

The trouble with the Snowden case is that the NSA now has more power to filter its employees/contractors in order to further violate the terms of the agreement.

Even drastic action would not fix it. Impeach the entire chain up the executive branch and the next one will be more secretive and let Hoover shine as the simple misunderstood Prom Queen he wanted to be.

I just hope Obama's actions will ruin him and this nonsense about replacing the President with an outsider. If that suddenly gets you an honest system instead of a cynical President, then kissing the frog must work too.


If BackTrack were outlawed as a munition, it would spread more, briefly. http://www.backtrack-linux.org/ Download it now, whether or not you need it, because it adds noise.


> Download it now, whether or not you need it, because it adds noise.

Why would I download a security specific image?

If you want to be as secure as possible, download the smallest possible system that can bootstrap the compiler then build out from source, retaining all source and looking for variances when you recompile.

Personally, I don't much care. I am not looking for a technical solution. I am looking for a social upheaval in the form of citizens visibly exercising rights: http://www.aeinstein.org/ to finally end the cold war mentality in the US government.


Whistleblowing would be a killer app.

Imagine you wanted to leak something but don't want to attract attention to yourself. You could encrypt it (with the public key of the organization you want to leak to), hide it with steganography and then upload the result to some public place you know the organization would be monitoring.

If you had ready access to tools to do so you could do all that inconspicuously.


Is that really true? Steganography is a lot of work to set up surreptitiously (we're not all IT techs like Snowden); it also gives you a rather narrow channel to send messages through, and you still need to attach the channel to the recipient somehow. Then, afterwards, you'll want to make sure you haven't left any stego-litter that will be detected and used against you.

By contrast, a USB flash drive or micro-SD card is tiny, easy to set up surreptitiously, gives you a channel for a whole lot of data, and doesn't usually leave much evidence after you hand it over to the recipient. I'd hazard that people who care enough to strip-search you for unauthorized mass-storage devices at the door could probably also detect your steganography, if it comes down to it.

I would imagine that there are really very few circumstances related to whistle-blowing when it would make sense to choose steganography. It seems more appropriate for espionage situations where a deep-cover field agent really, really needs to receive messages through a channel that's essentially untrackable (e.g. classified ads in a newspaper).


To be obnoxiously blunt, imagine the current situation with Snowden and assume he wanted to leak directly to Wikileaks and that they were using a scheme similar to the one in my post below. This is what he would need to do:

1. Write a normal message discussing his favorite videogame on Ars Technica.

2. Encode his public key in it.

3. Use the WL public key (already available to him via the hypothetical stegano-crypto suite in common distros) to derive a shared secret.

4. Use the secret to encode and hide 20 top secret slides in his holiday family photos and upload them to his flickr account.

5. Write another post on Ars discussing some other videogame, hiding in it the URL to his flickr photos.

6. Meanwhile, WL monitors the several thousand posts per day on the most used internet forums, detects a possible public key, and tries to decrypt all the messages within the next 24 hours with the common secret that could be derived using it. One of them has a correct checksum after decryption and gives the URL to the photos.

7. WL also daily randomly visits several thousand photos on flickr, including this time the one with the sent URL. After it gets it, it uses the shared secret and gets the message.
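The cryptographic core of steps 2, 3 and 6 can be sketched with the standard library alone. Assumptions, loudly: toy finite-field Diffie-Hellman is swapped in for ECDH, the parameters are far too small to be secure, the SHA-256 keystream and truncated HMAC "checksum" are my own stand-ins, and the steganographic embedding itself is not shown.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Toy finite-field Diffie-Hellman parameters standing in for ECDH (NOT secure sizes).
P = 2**127 - 1   # a Mersenne prime
G = 3

def keypair() -> tuple:
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)

def shared_key(my_private: int, their_public: int) -> bytes:
    secret = pow(their_public, my_private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()

def keystream(key: bytes, length: int) -> bytes:
    blocks = (hashlib.sha256(key + i.to_bytes(4, "big")).digest()
              for i in range(length // 32 + 1))
    return b"".join(blocks)[:length]

def seal(key: bytes, plaintext: bytes) -> bytes:
    """XOR with a keystream and append a short MAC serving as the checksum.
    One message per key only; a real design would use a nonce and an AEAD cipher."""
    body = bytes(a ^ b for a, b in zip(plaintext, keystream(key, len(plaintext))))
    return body + hmac.new(key, body, hashlib.sha256).digest()[:8]

def try_open(key: bytes, blob: bytes) -> Optional[bytes]:
    """Return the plaintext if the checksum matches, else None."""
    body, tag = blob[:-8], blob[-8:]
    expected = hmac.new(key, body, hashlib.sha256).digest()[:8]
    if not hmac.compare_digest(tag, expected):
        return None
    return bytes(a ^ b for a, b in zip(body, keystream(key, len(body))))

wl_private, wl_public = keypair()        # WL's long-term key, published in advance
src_private, src_public = keypair()      # the source's key, hidden in the first post
url = b"https://example.invalid/photos"  # placeholder URL
blob = seal(shared_key(src_private, wl_public), url)

# WL side: try every candidate public key spotted that day against the blob.
candidates = [keypair()[1], src_public, keypair()[1]]
hits = [try_open(shared_key(wl_private, pub), blob) for pub in candidates]
```

Only the genuine public key yields a matching checksum; the other candidates are discarded without WL ever replying to anyone.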

This whole process could be accomplished without leaving the room, without transmitting any suspicious data or contacting suspicious addresses, and would be indistinguishable from his normal online activity. As long as his computer or the WL private key are not compromised it should be perfectly untraceable.

I fail to see how arranging for a microsd card to be sent over to WL would be easier to accomplish, assuming he could be tracked and recorded constantly.

If it comes to wasting 2 MB per CD on the odd chance it could aid a whistleblower of similar importance every couple of decades, I'm all for it.


without transmitting suspicious data except:

1) ars technica post with encoded public key

2) ars technica post with shared secret of some kind

3) ars technica post with hidden url

4) flickr photos of size (visible_resolution + resolution_of_hidden_images + any_salt)--way larger than they should be

This is without mentioning that in order to use this system, he has to have either already contacted wl to set it up (just moving that risk to some other time) or wl has to have indicated that messages of those kinds will be read (ensuring that the nsa knows too, and is paying attention).


Exactly. Narrow pipe, difficult to route to its destination, and unless it's very well constructed it's quite probable that it leaks information about the existence of secret messages to an adversary.

Sure, with TrueCrypt on your laptop's drive you have lots of data and you can just say "I'm just securing my hard disk against loss, there's no hidden partition" and that'd be one thing. That's fine. But if you work for the TLA and they're reverse-engineering the latest leak and they find out that you've been posting lots of JPEGs and there's statistically more entropy in the low bits of the pixels than would be anticipated given traditional JPEG encoding libraries ... then you might have some serious 'splainin to do.

A USB drive does not suffer that flaw. It can only leak the existence of a transmission to people who can physically see it. Isn't the goal of steganography hiding messages? Now you can physically hide the message...

You can even send it in the mail for at most a couple dollars' worth of stamps, without any direct way to trace it back to you. And then they have one chance to intercept it (which you can surely render tamper-evident in some manner.)


Would it surprise you to know that all U.S. postal mail is also being monitored and recorded? [1] As soon as they see something addressed to a known WL address they will trace it back, find where it was sent from, find out the serial number of the sd card, get the shop where it was bought, etc. You wouldn't want that kind of attention. And if you don't have a private channel to WL you could only use their public addresses by definition, which would be guaranteed to be heavily monitored.

This is why I consider a working public steganography protocol so important. Using a very short message you could arrange the sd card to be dropped at some random place and know that somebody would come back in a couple of days to retrieve it. Encoded with ordinary simple text, using messages of typical lengths on popular, public websites. There are just too few bits of encoded information there to be statistically significant.

I'm not so worried about statistical analysis of how natural sounding or typical or expected the text you're producing is, as it would be a very difficult problem considering it requires a good understanding of natural languages to be done well automatically. What would be really problematic is that WL may very well be infiltrated and the private key compromised. Then you would be really screwed.

[1] http://www.nytimes.com/2013/07/04/us/monitoring-of-snail-mai...


I know perfectly well that all postal-mail envelopes are recorded, yes. But I have doubts that they're intercepting all envelopes heading to Wikileaks. You can attempt to make the envelope tamper-evident, you can buy the SD card with cash at some no-name Chinatown vendor, you can drop it in the mail at any mailbox in town, and you can encrypt all the contents with the public key. This does not guarantee that your message makes it through to Wikileaks per se, but it does make it really hard to trace to you.

If you'd like, you could also identify a separate, known Wikileaks sympathizer and mail the encrypted data to them, requesting that they forward it? They could be quite brazen about it, as they are at no risk of being identified. And there are many Wikileaks sympathizers in the world; the government may be scanning their envelopes but they're not inspecting and opening all of them.


I don't think you understood the process--there is no 2) post to exchange a shared secret, as it is derived from his private key and the already known to him public key of WL.

There is also no prior need to contact WL--this is the whole point of the scheme. All this is public so NSA is supposed to be well aware of WL monitoring all these messages. That's why he would be encoding using wordlengths modulo 2 (see my post below) so they are statistically indistinguishable from normal text. NSA would not be able to detect suspicious activity without having the WL private key.

Also, there is no need to raise undue suspicion by using untypical photo sizes as he can publish as many and as different files (e.g. video, music, etc.) as he requires once the harder process of establishing a channel of communication with WL has already been accomplished.


You need steganography to store documents that you can disclose if you want, but cannot be legally forced to admit that you have them if you don't want to.


In that respect, is it any worse than being in possession of a copy of GPG?


The noise level with GPG is fairly high. Anyone with a modern Linux distro install has a copy by default.


No, but it is not any better either. There are other issues to consider as well e.g. the lack of widely deployed public key stegosystems.


That's why you should always steganographically hide your steganography software.



