Hacker News
TrueCrypt Volumes are Indistinguishable from Random Data (16s.us)
51 points by ef47d35620c1 on Jan 29, 2014 | 42 comments

It is true that any strong symmetric encryption generates data that is indistinguishable from random by any (efficiently computable) statistical test, but these particular tests are extremely weak and don't prove anything.

All but the last two only look at the distribution of the bytes, meaning that the string "\0\1\2...\255" repeated many times would give the same values as random data, even though it doesn't look random at all. The Monte Carlo computation of Pi also ignores the order (sums are commutative).
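To make the point concrete, here is a minimal Python sketch. `byte_stats` is a hypothetical helper that computes three histogram-only statistics of the kind reported by tools such as ent (entropy per byte, arithmetic mean, and a chi-square against a uniform byte distribution); it is not ent's code. A fully sorted copy of random data scores exactly the same as the original, and the repeated counter gets a textbook-perfect score.

```python
import math
import os
from collections import Counter

def byte_stats(data: bytes) -> tuple[float, float, float]:
    """Hypothetical helper: (entropy in bits/byte, arithmetic mean, chi-square vs. uniform).

    All three are computed from the byte histogram alone, so they cannot
    see the order in which the bytes appear.
    """
    n = len(data)
    counts = Counter(data)  # byte value -> number of occurrences
    # Iterate values in a fixed order so equal histograms give bit-identical results.
    entropy = -sum(counts[v] / n * math.log2(counts[v] / n) for v in range(256) if counts[v])
    mean = sum(data) / n
    expected = n / 256
    chi_square = sum((counts[v] - expected) ** 2 / expected for v in range(256))
    return entropy, mean, chi_square

random_data = os.urandom(256 * 4096)
sorted_data = bytes(sorted(random_data))   # same bytes, all ordering destroyed
counter = bytes(range(256)) * 4096         # "\0\1\2...\255" repeated

print(byte_stats(random_data))
print(byte_stats(sorted_data))   # identical to the line above
print(byte_stats(counter))       # (8.0, 127.5, 0.0): an "ideal" score
```

Strictly, a chi-square of exactly 0.0 is itself suspiciously perfect, but the entropy and mean give no hint at all that the counter is anything but random.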

The serial correlation coefficient only looks at the correlation between the sequence and itself shifted by one, ignoring correlations at larger distances, so it is almost as easy to produce very regular sequences that give very small coefficients.
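A small Python sketch of that blind spot. `serial_correlation` is a hypothetical helper computing a circular autocorrelation at a chosen lag; at lag 1 it should be algebraically the same quantity as the serial correlation coefficient ent-style tools report, assuming they wrap the last byte around to the first. The repeating pattern 0, 0, 255, 255 scores exactly zero at lag 1, while lag 2 exposes it as perfectly anti-correlated.

```python
def serial_correlation(data: bytes, lag: int = 1) -> float:
    """Hypothetical helper: circular autocorrelation of a byte sequence at `lag`."""
    n = len(data)
    mean = sum(data) / n
    dev = [b - mean for b in data]               # deviations from the mean
    den = sum(d * d for d in dev)
    if den == 0:
        raise ValueError("constant sequence has no defined correlation")
    # Pair each byte with the one `lag` positions later, wrapping at the end.
    num = sum(dev[i] * dev[(i + lag) % n] for i in range(n))
    return num / den

pattern = bytes([0, 0, 255, 255]) * 1024

print(serial_correlation(pattern, lag=1))  # 0.0: looks "uncorrelated"
print(serial_correlation(pattern, lag=2))  # -1.0: perfectly anti-correlated
```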

It's heartening that TrueCrypt encrypted volumes pass statistical randomness tests, but it's important to remember that this says essentially nothing about its cryptographic security.

Indeed. The output of a noncryptographic generator like the Mersenne Twister would also pass those same tests, and would be not secure at all. This is not how indistinguishability works.
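To illustrate, here is a Python sketch of the well-known state-recovery attack on MT19937, the Mersenne Twister behind CPython's `random` module. Its raw output passes simple tests like the ones discussed above, yet 624 consecutive 32-bit outputs are enough to rebuild the internal state and predict every later output. The helper names are ours, and the code relies on CPython's `getstate()`/`setstate()` tuple layout (version 3, 624 state words plus an index), which is an implementation detail rather than a documented format.

```python
import random

MASK32 = 0xFFFFFFFF

def _undo_xor_rshift(y: int, shift: int) -> int:
    # Inverts y ^= y >> shift, recovering bits from the top down.
    result = y
    for _ in range(32 // shift + 1):
        result = y ^ (result >> shift)
    return result & MASK32

def _undo_xor_lshift_mask(y: int, shift: int, mask: int) -> int:
    # Inverts y ^= (y << shift) & mask, recovering bits from the bottom up.
    result = y
    for _ in range(32 // shift + 1):
        result = y ^ ((result << shift) & mask)
    return result & MASK32

def untemper(y: int) -> int:
    """Invert MT19937's output tempering (steps applied in reverse order)."""
    y = _undo_xor_rshift(y, 18)
    y = _undo_xor_lshift_mask(y, 15, 0xEFC60000)
    y = _undo_xor_lshift_mask(y, 7, 0x9D2C5680)
    y = _undo_xor_rshift(y, 11)
    return y

def clone_mt(outputs: list[int]) -> random.Random:
    """Build a generator that reproduces everything after 624 observed outputs.

    Assumes the outputs are 32-bit words aligned with one full state
    regeneration (true for a freshly seeded random.Random).
    """
    if len(outputs) != 624:
        raise ValueError("need exactly 624 consecutive 32-bit outputs")
    state = [untemper(y) for y in outputs]
    clone = random.Random()
    # CPython detail: (version 3, 624 words + index, gauss_next); index 624
    # forces a state regeneration on the next call, just like the victim's.
    clone.setstate((3, tuple(state + [624]), None))
    return clone

victim = random.Random(2014)        # stand-in for any MT-based "random" source
observed = [victim.getrandbits(32) for _ in range(624)]
clone = clone_mt(observed)
print(all(clone.getrandbits(32) == victim.getrandbits(32) for _ in range(1000)))  # True
```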

The 'correct' way to make this point would be to analyze the volume format and reduce distinguishing the container to distinguishing the underlying block cipher from a random permutation. That is, distinguishing a TC volume would imply distinguishing AES (or whichever block cipher combination TC supports) from a random permutation.
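One way to write that reduction down, as a generic sketch rather than a worked-out proof for TrueCrypt's actual format: for any efficient adversary $\mathcal{A}$ that tries to tell a volume from uniformly random bytes, one would build an adversary $\mathcal{B}$ against the block cipher $E$ such that

```latex
\mathrm{Adv}^{\mathrm{dist}}_{\mathrm{TC}}(\mathcal{A})
  \;\le\;
\mathrm{Adv}^{\mathrm{prp}}_{E}(\mathcal{B}) \;+\; \epsilon
```

where $\epsilon$ is a placeholder for whatever additional loss the volume layout and mode of operation introduce (for example, terms that grow with the amount of data encrypted); its exact form is not derived here. If $E$ is a good pseudorandom permutation and $\epsilon$ is negligible, no efficient $\mathcal{A}$ can distinguish the volume; read contrapositively, distinguishing a TC volume implies distinguishing $E$.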

It's basic cryptography that the output stream should be statistically indistinguishable from random data. If TrueCrypt volumes did not look random, that would be a huge fail. Otherwise, passing such tests is just a necessary condition for security, not a sufficient one.

In volume forensics it is exceedingly rare to find a full volume of random data, and when one is found, one can be fairly certain that someone is using TrueCrypt, or had a lot of time on their hands when destroying evidence by using dd to write /dev/random to disk. Depending on the technical capacity of the forensics lab, this observation could in itself be sufficient evidence to support additional search warrants or court orders. Remember, it is absolutely possible to stand out simply by being off the normal curve, whether you are using HD encryption or anonymizing services.

The question is, do people usually keep large areas of random data on their hard disks?

When talking about normal people, the answer to that question is "No". TCHunt was written in 2007 (7 years ago) and demonstrates that random data whose size is a multiple of 512 bytes and larger than X bytes is highly unusual among the files that typically reside on the filesystems of end-user computers. Forensic IT examiners use that to find disguised TC volumes.
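A minimal Python sketch of that kind of heuristic, assuming the checks are: size is a multiple of 512, size above some minimum, no recognizable file header, and a near-uniform byte histogram. This approximates the idea described above and is not TCHunt's actual code. Since the comment's threshold X is not given, `min_size` is left as a parameter, and `KNOWN_MAGIC` and `max_chi_square` are hypothetical values chosen for illustration (for truly random bytes the chi-square over 256 bins averages about 255).

```python
from collections import Counter

# Hypothetical, deliberately short list of file signatures (ZIP, PDF, PNG, JPEG, PE, ELF).
KNOWN_MAGIC = (b"PK\x03\x04", b"%PDF", b"\x89PNG", b"\xff\xd8\xff", b"MZ", b"\x7fELF")

def chi_square(data: bytes) -> float:
    """Chi-square statistic of the byte histogram against a uniform distribution."""
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts[v] - expected) ** 2 / expected for v in range(256))

def looks_like_hidden_volume(data: bytes, min_size: int, max_chi_square: float = 400.0) -> bool:
    """Flag a file as a possible disguised encrypted container.

    min_size and max_chi_square are hypothetical parameters, not TCHunt's values.
    """
    if not data or len(data) < min_size or len(data) % 512 != 0:
        return False           # empty, too small, or not a whole number of 512-byte units
    if data.startswith(KNOWN_MAGIC):
        return False           # carries a recognizable file header
    return chi_square(data) < max_chi_square   # byte histogram is near-uniform
```

For example, `looks_like_hidden_volume(os.urandom(64 * 1024), min_size=16 * 1024)` will almost always return True, while an all-zero buffer of the same size returns False. Note that `bytes(range(256)) * 256` is also flagged, because the chi-square check only looks at the byte histogram, the same weakness pointed out at the top of the thread.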

It would be nice if the default filesystem on an unsuspicious and relatively popular operating system (Linux?) normally overwrote erased data with random data, and worked to maintain a large contiguous area at one "end" of the drive. You would need to be careful not to fill up your drive to the point that it overwrote part of your hidden volume, of course.

I think this would allow for true deniability.

You might find this a worthwhile read about such an approach implemented at the filesystem level: https://www.usenix.org/legacy/events/sec2001/full_papers/bau...

Tools such as shred that delete files by overwriting them with random data do exist, but as you imply, a low-level approach would be much more secure.
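A rough Python sketch of what overwrite-then-delete looks like at the file level. This is not shred's implementation; `overwrite_and_delete` and its parameters are hypothetical. Journaling or copy-on-write filesystems, backups, and SSD block remapping can all keep old copies of the data that this never touches.

```python
import os

def overwrite_and_delete(path: str, passes: int = 1, chunk_size: int = 1 << 20) -> None:
    """Hypothetical helper: overwrite a file in place with random bytes, then unlink it.

    Only the file's current logical contents are targeted; this is not a
    guarantee that every physical copy of the data is gone.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(os.urandom(n))
                remaining -= n
            f.flush()
            os.fsync(f.fileno())   # push this pass to the device before the next one
    os.remove(path)
```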

Another consideration is the type of storage, and then there are backups, which will still contain deleted data. SSDs are a different breed altogether: they can transparently mark a block as dead and reallocate it to reserved storage without you knowing, leaving the existing data permanently in place for forensics. Having an encrypted file system would certainly help cover such issues, and encryption within encryption could be the extra layer of plausible deniability you require.

Worse, overwriting SSDs with random data would halve their lifespan, and some SSDs don't even let you control exactly where you are writing. Better to just never write unencrypted data to the drive at all.

Doesn’t FAT do that? Although maybe a Linux user choosing to use FAT for some other reason isn’t that plausible in itself…

Don't be so quick to judge; I've got a small FAT partition on my boot drive right now as the EFI system partition. That partition also doesn't have data written to it frequently, so you could in theory hide a small encrypted volume in its free space, and as long as GRUB wasn't updated, nothing would touch that free space.

I don't think FAT even bothers to zero the data when it's allocated, so you might be able to see what was previously stored in a block by reading it straight after allocation.

OTOH this might be driver dependent.

What random bits on my drive? Oh those, those are left over from testing to see if truecrypt volumes are distinguishable from random data.

Overwrite entire disk with random data. Create new filesystem. Place encrypted volume in unallocated space. "The disk was just securely wiped prior to the current OS install"

Even the most plausible deniability fails when the information-holder is sufficiently incentivized to tell the truth, whether that is by threats of punishment, torture, etc.

Also, if the police have a record of you downloading CP (or whatever), find your hard drive with a huge random segment, and find TrueCrypt software on your machine, they're probably going to put two and two together.

I think the point of plausible deniability applies to the notion that random data on your computer is circumstantial evidence that isn't particularly useful in getting a conviction.

Circumstantial evidence is used all the time to get convictions. It's not proof, but if there's enough of it it becomes very persuasive.

They might not get a conviction for images of child sexual abuse, but they'll put you through the grinder to get you to reveal any encrypted data. The US has some protections and case law for this. The UK has RIPA, and people have gone to prison for not revealing encrypted content. Some cases carry a maximum term of 2 years, but some carry a maximum of 5 years.

This needs to be factored into the risk assessments of people using encrypted volumes.

I almost always use TrueCrypt to erase my disks completely when I'm repurposing them, or before throwing them away or giving them to someone else.

You never know where they are going to end up years later.

I just bang randomly on my keyboard when TrueCrypt asks for a password and then let it overwrite the drive. It's pretty fast, too, and works the same way on all platforms.

Certainly some types of data sets can appear very random, and radio telescopes produce a lot of data that could qualify as random.

Also, with trust issues around the entropy used for random number generation, legitimate sources of large random chunks of data start to become more appealing for certain tasks.

Nope. Grain on an image sequence? Yes.

Hypothetical scenario:

  $iv = rnd();
  $encrypted_header = byte[256];
  $checksum = sha1($iv + $encrypted_header);

  disk layout: $iv + $encrypted_header + $checksum

This would look like random data to any generic statistical test, and yet it would be easily identifiable by a specific pattern matcher that simply recomputes the hash over the visible data and checks whether it matches. Of course, this is easy to prevent, but a statistical test is insufficient to prove that it has been prevented.
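A minimal Python version of the hypothetical layout above and of the matcher that detects it. Random bytes stand in for the encrypted header, and the 16-byte IV length is an assumption (the pseudocode does not give one). Every byte of the blob looks uniformly random to the statistical tests, but the matcher recognizes it with certainty.

```python
import hashlib
import os

IV_LEN = 16        # assumed IV length; the pseudocode's rnd() gives no size
HEADER_LEN = 256   # matches byte[256] in the pseudocode
DIGEST_LEN = 20    # SHA-1 output size in bytes

def make_blob() -> bytes:
    """Lay out iv + encrypted_header + sha1(iv + encrypted_header)."""
    iv = os.urandom(IV_LEN)
    encrypted_header = os.urandom(HEADER_LEN)  # stand-in for real ciphertext
    checksum = hashlib.sha1(iv + encrypted_header).digest()
    return iv + encrypted_header + checksum

def matches_layout(blob: bytes) -> bool:
    """Pattern matcher: recompute the hash over the visible bytes and compare."""
    body = blob[:IV_LEN + HEADER_LEN]
    stored = blob[IV_LEN + HEADER_LEN:IV_LEN + HEADER_LEN + DIGEST_LEN]
    return len(stored) == DIGEST_LEN and hashlib.sha1(body).digest() == stored

print(matches_layout(make_blob()))                                   # True
print(matches_layout(os.urandom(IV_LEN + HEADER_LEN + DIGEST_LEN)))  # False, barring a 1-in-2**160 fluke
```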

That is why I created a service that will securely delete your data if you don't remind it every day. Just say that is where you keep your passwords (and maybe add it to your bookmark bar to be more convincing). It's the only service you don't have to use to be useful!

What if you end up in hospital or have a power cut?

I think the idea is just that you say you used some dead man's switch to store your real password. You could even have it actually overwrite some 1 MB random file and claim that it was the keyfile for the volume. In reality you wouldn't use it at all and would just memorize the password as normal. No one would be able to prove whether that really was a keyfile or just a decoy, and you couldn't be held in contempt of court if you can plausibly claim that you no longer have the information to open the volume.

Exactly. Out of nowhere, I ended up in the hospital for five days and this would have made things even worse.

The canary will die.

IIRC there was an article a few days ago about tools to detect which TrueCrypt encryption types are in use. The article mentioned files cached by operating systems that sometimes disclose the presence, the type, or even part of the content of TC volumes. Personally, that is the scariest part: cached data left over in the system is much more indicative than the pseudorandomness of TC data.

This has to be false, no? If there is data encoded there, that is non-random, then there is also non-randomness encoded there.

One of the primary goals of encryption is to make the result look random, such that no structure can be determined. However, you are correct: there is structure there, but it's encoded with so much randomness that the structure is effectively hidden. That's the whole point of encryption.

The point of encryption is to make non-random data appear random, and sense can only be made of it when the correct key is used to decode it.

There are a lot of people who know something that 99.9% of the population does not.

At 99.9999999% of 7.14 billion, that implies seven people (and maybe a parrot) know something.

Does this mean that TrueCrypt encryption also works as near-perfect compression?

No, quite the opposite: it takes your once-compressible data and makes it incompressible.
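A quick Python illustration, using `os.urandom` as a stand-in for ciphertext (the standard library has no block cipher): repetitive plaintext compresses well, while random-looking bytes do not shrink at all, and zlib's framing overhead makes them slightly bigger. This is why tools that want both compress before they encrypt.

```python
import os
import zlib

plaintext = b"TrueCrypt volumes are indistinguishable from random data. " * 1000
ciphertext_like = os.urandom(len(plaintext))   # stand-in for encrypted output

print(len(plaintext), len(zlib.compress(plaintext)))              # shrinks dramatically
print(len(ciphertext_like), len(zlib.compress(ciphertext_like)))  # slightly larger than the input
```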

That means you are using TrueCrypt, so now please give us the private key or...

Placing a marker on TC volumes, or some sort of easily identifiable header, would have prevented governments from thinking that /dev/urandom data may in fact be TC volumes. By trying to hide the volumes, and to claim that they "are just random data", they've increased the risk to innocent people who may indeed have random data on their systems that may be mistakenly identified as a TC volume.

I can just imagine someone getting tortured for access to a hidden volume that they can't prove does not exist.
