
Insecurity in the Jungle (disk) - cperciva
http://www.daemonology.net/blog/2011-06-03-insecurity-in-the-jungle.html
======
tptacek
Summary: Jungledisk doesn't protect the integrity of encrypted data, and
doesn't securely derive keys and is thus vulnerable to fast offline attacks.
The thing Jungledisk gets right is to use the same block cipher mode as Tarsnap
(and, incidentally, virtually every mainstream encrypted storage system).

The impact of using unauthenticated encryption to store data is that your
backup provider could end up owning your machine. Attackers can carefully
choose which data to corrupt. They can exploit the randomization of corrupted
decryption to set up conditions for memory corruption exploits, and, in more
sophisticated but totally realistic attacks, exploit guesses about known
plaintext to produce attacker-controlled nonrandom plaintexts. A backup
provider with client-authenticated crypto can't do that, because the keys that
encrypt the data also ensure its integrity.
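
A minimal sketch of the known-plaintext rewrite described above, and of how a MAC check stops it. Everything here is an assumption made for illustration: the keystream is a toy SHA-256 construction standing in for a real stream or CTR mode (it is not AES and not Jungledisk's code), and the names `seal` and `open_sealed` are hypothetical.

```python
import hashlib
import hmac
import os


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy CTR-style keystream built from SHA-256; a stand-in, NOT real AES-CTR."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # XOR stream: the same call also decrypts.
    return xor(data, keystream(key, nonce, len(data)))


def seal(enc_key: bytes, mac_key: bytes, nonce: bytes, plaintext: bytes):
    """Encrypt, then MAC the nonce and ciphertext with a separate key."""
    ct = encrypt(enc_key, nonce, plaintext)
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return ct, tag


def open_sealed(enc_key: bytes, mac_key: bytes, nonce: bytes, ct: bytes, tag: bytes) -> bytes:
    """Verify the MAC first; only decrypt data that passes."""
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("ciphertext failed authentication")
    return encrypt(enc_key, nonce, ct)


enc_key, mac_key, nonce = os.urandom(32), os.urandom(32), os.urandom(16)
plaintext = b"run: /usr/bin/backup"
ct, tag = seal(enc_key, mac_key, nonce, plaintext)

# The storage provider guesses the plaintext at offset 5 and rewrites it
# without knowing any key.
known, wanted = b"/usr/bin/backup", b"/tmp/evil/0wned"
forged = ct[:5] + xor(xor(ct[5:], known), wanted)

# Without a MAC check the client accepts attacker-chosen plaintext.
unauthenticated_result = encrypt(enc_key, nonce, forged)

# With encrypt-then-MAC the forgery is rejected before decryption.
try:
    open_sealed(enc_key, mac_key, nonce, forged, tag)
    forgery_rejected = False
except ValueError:
    forgery_rejected = True
```

The point of the sketch is the ordering: the tag is checked before any decrypted byte is handed to the rest of the client.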

The password storage issue is no different than any other password storage
problem; again, direct your attention to
<http://codahale.com/how-to-safely-store-a-password/>, mentally substituting
"derivation of AES key" for "storage of password hash".
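
As a rough illustration of that substitution, here is a sketch of deriving a 256-bit key from a password with PBKDF2 from Python's standard library. The function name `derive_aes_key`, the iteration count, and the salt length are assumptions chosen for the example, not values taken from any of the products discussed.

```python
import hashlib
import os


def derive_aes_key(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Stretch a password into a 32-byte key with PBKDF2-HMAC-SHA256.

    The iteration count is an illustrative assumption; tune it to the
    slowest delay your users will tolerate.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


# A random per-user salt is stored alongside the ciphertext; it is not secret.
salt = os.urandom(16)
key = derive_aes_key("correct horse battery staple", salt)
```

The same shape applies to bcrypt or scrypt; only the stretching function changes. The property that matters is that each password guess costs the attacker many hash evaluations instead of one.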

To my mind, the key derivation is the real problem here. A surprisingly large
number of secure encryption storage products don't ensure data integrity.
Realistic attacks against that vulnerability are feasible but difficult: you'd
have to be targeted.

If you're going to write an article about how a competitor's encryption is
inferior to yours and cast it as a vulnerability report, I'd suggest not
recommending your own encryption scheme as the replacement. The scrypt
recommendation in this article sticks out like a sore thumb. Virtually nothing
uses scrypt.

We can nerd out on CTR mode vs. CBC mode; I'm starting to come around to
Colin's take on CTR because of ciphertext indistinguishability as I see more
practical vulnerabilities that take advantage of it. I think the padding issue
is a red herring. CBC padding is easier to get right than absolute rock solid
reliable generation of CTR nonces and absolute rock solid management of CTR
counters, which are things I see people get wrong regularly.
Distinguishability is the real problem with CBC.
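
To make the nonce-management risk concrete, here is a sketch of what happens when a CTR-style nonce is reused under the same key. The keystream is a toy SHA-256 construction used only as a stand-in for a real CTR mode, and the key, nonce, and messages are made up for the example.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy CTR-style keystream built from SHA-256; a stand-in, NOT real AES-CTR."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


key = b"k" * 32
nonce = b"\x00" * 16  # the bug: the same nonce is used for both messages

p1 = b"attack at dawn!!"
p2 = b"retreat at noon!"
c1 = xor(p1, keystream(key, nonce, len(p1)))
c2 = xor(p2, keystream(key, nonce, len(p2)))

# The keystream cancels out: an observer learns p1 XOR p2 from ciphertext alone.
leak = xor(c1, c2)

# Anyone who knows or guesses p1 recovers p2 without touching the key.
recovered_p2 = xor(leak, p1)
```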

~~~
cperciva
_To my mind, the key derivation is the real problem here. A surprisingly large
number of secure encryption storage products don't ensure data integrity.
Realistic attacks against that vulnerability are feasible but difficult: you'd
have to be targeted._

I think the lack of integrity is more important than you're making it sound.
There are a lot of situations where a lack of integrity can be exploited to
create a lack of privacy too.

But the main reason I mentioned the lack of integrity first is that I needed
to mention the lack of HMAC to explain why they had the ridiculous "salted key
hash" construct.

 _If you're going to write an article about how a competitor's encryption is
inferior to yours and cast it as a vulnerability report, I'd suggest not
recommending your own encryption scheme as the replacement. The scrypt
recommendation in this article sticks out like a sore thumb. Virtually nothing
uses scrypt._

I think you're misstating what I wrote a bit. I said that scrypt is the state
of the art in the field -- which it is -- and that given that Jungle Disk was
around before I developed scrypt, they should have used PBKDF2 or bcrypt.

~~~
tptacek
I'd rather geek out about CTR v CBC than harp on the scrypt recommendation.
Consider the scrypt thing a friendly style note. You wrote an article about a
competitor's insecurities. When you do that, don't recommend they adopt your
own cryptosystem unless (like CRI had to do with DPA countermeasures) they
have to. Here, it just made you look unnecessarily petty.

What privacy attacks were you thinking of? Call some of them out.

~~~
cperciva
_Consider the scrypt thing a friendly style note._

Note taken. :-)

 _What privacy attacks were you thinking of?_

Things like replacing files with malware.

~~~
tptacek
Yeah, we were saying the same thing, I think, but you said it more clearly.

------
rarrrrrr
My understanding is that SpiderOak, Tarsnap, and Wuala all do this correctly
(using one of PBKDF2, bcrypt, or scrypt.)

Colin - Perhaps the companies in the backup space that put effort into
handling this carefully should work together and create a PSA style website
with a matrix chart of how the various providers handle "encrypted" data. Make
it a separate domain and do our best to be elaborately objective about it. Any
interest?

~~~
tptacek
What block cipher mode does SpiderOak use, and how does it verify the
authenticity of its data? Tarsnap goes through a lot of extra trouble to MAC
its data; few other providers do. You'd hate to see everyone treat key
derivation as a shorthand for "doing all of encryption right".

I looked on the SpiderOak site, saw a lot of material on how keys are derived
and not stored on SpiderOak servers (great!), but didn't see a lot of details
about the mechanics of actually encrypting and checking data.

~~~
rarrrrrr
Thanks for asking. If you're interested, would be very happy to discuss
SpiderOak's crypto strategy in depth with you the next time I'm in Chicago.
Could share source code, etc. IMO, most interesting parts are the key scoping,
which allow users to selectively publish ("share") portions of stored data by
publishing the appropriately scoped keys.

SpiderOak uses AES256 in CFB mode with authentication via HMAC. The code is
careful about unique nonce/counter usage, crypto code is confined to specific
modules that rarely change, and reviewed by cryptographers outside SpiderOak.
Client and server have minimal trust relationship.

Being paranoid about data integrity (not only because of crypto issues, but
also because bitrot happens routinely at petabyte scale) the data
authentication happens repetitively at a few different layers. From all end
user devices, we see about one bit error per 4.2tb of upload transactions.

~~~
Locke1689
Why CFB vs CTR? Is there some reason parallelization is unneeded or
impossible?

~~~
tptacek
Also... what counters?

(Regardless: happy to get together anytime in Chicago).

------
imajes
@cperciva: Thanks for this; now I'll convert my 8char ascii system password to
a 10char one. Do you have any data showing how large a password needs to be to
make it ridiculously expensive for a TLA (gency) to commit a large amount of
hardware to cracking? i.e. how much time past the 10chars does it consume?

~~~
cperciva
It depends on your KDF. MD5 is ridiculously weak; the standard MD5-crypt is
1000 times stronger; bcrypt is better yet; and scrypt is vastly stronger.

The best source for this is my scrypt paper, really.
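
A back-of-the-envelope sketch of how password length and KDF cost combine. The attacker's guess rate is a hypothetical placeholder, and the 95-character alphabet assumes uniformly random printable ASCII; the only figure taken from the comment above is that MD5-crypt costs about 1000 times as much per guess as a single MD5.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600
PRINTABLE_ASCII = 95          # assumes uniformly random printable characters
ATTACKER_RATE = 1e12          # hypothetical raw MD5 guesses per second


def keyspace_bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits of a uniformly random password."""
    return length * math.log2(alphabet_size)


def years_to_exhaust(bits: float, guesses_per_second: float, kdf_cost: float = 1.0) -> float:
    """Years to try every password when each guess costs kdf_cost hash-equivalents."""
    return (2.0 ** bits) * kdf_cost / guesses_per_second / SECONDS_PER_YEAR


for length in (8, 10):
    bits = keyspace_bits(PRINTABLE_ASCII, length)
    for name, cost in (("MD5", 1), ("MD5-crypt", 1000)):
        years = years_to_exhaust(bits, ATTACKER_RATE, cost)
        print(f"{length} chars, {name}: {bits:.1f} bits, {years:.3g} years")
```

Under these assumptions, two extra random characters multiply the work by 95^2 = 9025, while moving from MD5 to MD5-crypt multiplies it by 1000; the two factors stack.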

~~~
SoftwareMaven
What license is the scrypt code released under?

~~~
tptacek
It's BSD licensed but probably not easy to integrate on your platform. BCrypt
is an easier choice. When we see Java and .NET implementations of scrypt,
we'll start recommending it, but I'll be honest and tell you that we rarely
recommend scrypt today.

------
euroclydon
If, like me, you're wondering: "Does 1Password use all the stuff?"

<http://agilebits.com/products/1Password/user_guide>

~~~
jmtulloss
More details here:
[http://help.agilebits.com/1Password3/agile_keychain_design.h...](http://help.agilebits.com/1Password3/agile_keychain_design.html)

~~~
euroclydon
Thanks, that's the page I was on -- stupid AJAX navigation!

Anyway, the document seems *nix specific. Does Windows have a /dev/random?

------
kevindication
Stupid question: Why is the 34 character password easier to crack than the 8
character password?

(Upon re-reading I think I may have missed the assumption that the long
password only contains english text, no punctuation, numerics, etc.)

~~~
cperciva
Yes, the 34-character password is English text.
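
One way to see how a long English passphrase can be weaker than a short random password is to compare rough entropy estimates. The per-character figures for English below (1.0 to 1.5 bits) are an assumption in the spirit of Shannon-style estimates, not numbers taken from the article, and the random password is assumed to draw uniformly from 95 printable ASCII characters.

```python
import math


def random_password_bits(alphabet_size: int, length: int) -> float:
    """Entropy of a password drawn uniformly from the alphabet."""
    return length * math.log2(alphabet_size)


def english_text_bits(length: int, bits_per_char: float) -> float:
    """Entropy estimate for English text at an assumed bits-per-character rate."""
    return length * bits_per_char


random_8 = random_password_bits(95, 8)       # 8 random printable characters
english_34_low = english_text_bits(34, 1.0)  # assumed lower estimate
english_34_high = english_text_bits(34, 1.5) # assumed upper estimate

print(f"8 random chars:   {random_8:.1f} bits")
print(f"34 English chars: {english_34_low:.1f} to {english_34_high:.1f} bits")
```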

------
PonyGumbo
Given the available options, what's the best option for automated backups?

~~~
dchest
Apart from Tarsnap, maybe Duplicity <http://duplicity.nongnu.org>?

~~~
cperciva
Agreed. There are some things I dislike about duplicity (e.g., its reliance
upon GPG) but it's probably what I would use if I couldn't use Tarsnap.

~~~
click170
What makes you shy away from backup apps that rely on GPG?

~~~
tptacek
Cryptographers hate GPG. GPG is ugly as sin†. Unfortunately (and I mean that
only with a little bit of snark), GPG mostly still works, in the sense of
standing up to active, informed attackers with modern techniques.

† _For instance, look how it handles message integrity._

~~~
cperciva
Your definition of "mostly still works" is "it's secure as long as you ignore
the vulnerabilities people keep on finding"?

~~~
tptacek
This is a slippery slope argument that ends in you arguing that the best
tested cryptosystem in common use (TLS) is _also_ insecure. All cryptosystems
have vulnerabilities; the question is, how workable is the system after those
flaws are fixed.

~~~
cperciva
Well, yes. I also think SSL is too complicated for people to get right. ;-)

~~~
tptacek
For the record, I respect the critiques practitioners have of GPG.
Unfortunately, their alternatives tend to be ad-hoc. There should be a clean,
simple, GPG-like standard, perhaps based on ECC and AE cipher constructions,
to replace GPG. But until that happens, in the choice between ugly and
workable vs. simple and fragile, ugly and workable is the right choice for
most people.

As always I think you drastically underestimate how dangerous this stuff is
because you've dedicated your career to it, while normal implementors --- even
crypto enthusiasts (look at Tor and SSH) --- have little of the nuance
required to get it right.

I like the fundamentals of TLS more than you do; I don't think it's a bad or
needlessly complex protocol (except maybe session resumption). I see that
reasonable people can differ on that point. But, _very importantly_ , TLS is
also a vehicle for collecting and implementing the best known methods in
cryptography. I think you tend to overlook that.

As always, my opinions are as a software security practitioner and not as a
cryptographer, since I am not one.

~~~
sigil
It sounds like Colin is taking issue with openssl the implementation, while
you're defending TLS the protocol. In that case, I agree with you both.

(As an aside, it's great to see two of my favorite HN commenters in the
security field engaged in conversation at this level.)

~~~
tptacek
The appearance and track record† of the code in OpenSSL does the credibility
of TLS no favors, and it is totally understandable why someone who had to deal
with software security for a platform that ships and depends on OpenSSL would
become allergic to it.

But, two responses to that:

* First, what Joel Spolsky says about rewrites. Sometimes code is ugly for a reason. Clean rewrites of OpenSSL will inevitably introduce bugs. Introducing bugs in SSL†† implementations is perilous.

* Second, there are mature alternatives to OpenSSL. For instance, most? browsers don't use it.

† _In fairness, that's because OpenSSL dates back to a time when nobody was
getting C software security even close to right._

†† _I use TLS and SSL interchangeably, which is a foible I should work on
correcting, but the difference doesn't matter much here._

------
drivebyacct2
I continue to not understand how people imagine these services working
(de-dupe, block level updates, etc) without access to the unsecured version of
the data. As for the claims about what Amazon could do to your data... there
are even less sinister options. S3 is _not_ 100% safe storage. There's a chance
for bit rot and that may occur. If you don't check the file yourself, you
won't know. Again, that seems a bit inevitable, no?

edit: left out a 'not'

~~~
gst
De-dupe: Wuala encrypts the file with a key derived from the file itself. This
key is then encrypted with the user's key and both (the file and the encrypted
key) are uploaded to the cloud. Disadvantage: If the file is known to an
attacker (i.e., a copyright holder) the attacker can possibly find out which
users have access to this file. Advantage: Allows for de-duplication, but is
more secure than Dropbox.
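
A sketch of the content-derived key idea described above, showing both the de-duplication benefit and the file-confirmation disadvantage. This is not Wuala's actual code: the cipher is a toy SHA-256 keystream, and `convergent_encrypt` and the key-wrapping step are hypothetical simplifications.

```python
import hashlib
import os


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream built from SHA-256; a stand-in, NOT a real cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def convergent_encrypt(data: bytes, user_key: bytes):
    """Encrypt data under a key derived from the data, then wrap that key per user."""
    content_key = hashlib.sha256(data).digest()
    ciphertext = xor(data, keystream(content_key, b"file", len(data)))
    wrap_nonce = hashlib.sha256(ciphertext).digest()
    wrapped_key = xor(content_key, keystream(user_key, wrap_nonce, len(content_key)))
    return ciphertext, wrapped_key


song = b"bytes of a widely distributed file"
alice_ct, alice_wrapped = convergent_encrypt(song, os.urandom(32))
bob_ct, bob_wrapped = convergent_encrypt(song, os.urandom(32))

# Same file, same ciphertext: the server can store one copy for both users.
dedupe_possible = alice_ct == bob_ct

# The flip side: anyone holding the file can compute its ciphertext and
# check whether it is stored, with no knowledge of any user's key.
guess_ct, _ = convergent_encrypt(song, os.urandom(32))
attacker_confirms_file = guess_ct == alice_ct
```

Because encryption is deterministic in the file contents, equal plaintexts are always distinguishable from unequal ones, which is exactly what makes de-duplication work.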

Block-level updates: I don't see a problem with this. Partition the file into
blocks on the client (before the encryption). The server doesn't need access
to the data for this.
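
A sketch of the client-side partitioning described above: the client splits the file into blocks, compares block hashes against a locally kept index, and only the changed blocks would then be encrypted and uploaded. The tiny block size, the fixed-offset splitting, and the function names are assumptions for the example.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose so the example is easy to follow


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Partition plaintext into fixed-size blocks on the client."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def changed_blocks(old: bytes, new: bytes) -> list:
    """Return indices of blocks in `new` that differ from `old` or are new."""
    old_hashes = [hashlib.sha256(b).digest() for b in split_blocks(old)]
    changed = []
    for index, block in enumerate(split_blocks(new)):
        digest = hashlib.sha256(block).digest()
        if index >= len(old_hashes) or old_hashes[index] != digest:
            changed.append(index)
    return changed


old_version = b"aaaabbbbcccc"
new_version = b"aaaaBBBBccccdddd"

# Only these blocks need to be encrypted and sent; the server never sees plaintext.
to_upload = changed_blocks(old_version, new_version)
```

The comparison happens entirely on the client, so the server only ever receives encrypted blocks and the identifiers needed to store them.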

~~~
tptacek
As Steve Weis pointed out in an earlier thread about schemes like this,
deriving keys from the contents of files breaks semantic security. Lay
engineers reason about this problem the way you just did: "the RIAA can tell I
have Lady Gaga MP3s". But practitioners are worried about much more subtle and
devastating flaws, particularly in cases where attackers may exercise some
control over the blocks being encrypted.

Any scheme that derives passwords from file contents gives me the willies.

