Yes, this is known as the "confirmation-of-a-file attack", and there is no feasible way for a system that deduplicates content to avoid it.
The confirmation-of-a-file attack is actually the degenerate case of the "learn-the-remaining-information attack", in which the attacker already knows most of the plaintext and only some low-entropy portion is unknown.
Imagine a standard form letter that contains your credit card number. An attacker can generate every possible value of that low-entropy field, process each candidate letter the same way the system does, and check whether any result matches what is stored.
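To make the form-letter example concrete, here is a minimal sketch of the guessing loop. It is not Cryptosphere or Tahoe code: `storage_id`, `TEMPLATE`, and `recover_pin` are hypothetical names, a bare SHA-256 stands in for whatever content-derived identifier the store exposes, and a 4-digit PIN stands in for the card number so the loop finishes instantly.

```python
import hashlib

# Hypothetical form letter; only the 4-digit field is unknown to the attacker.
TEMPLATE = "Dear customer, your card PIN is {pin:04d}. Regards, Example Bank"


def storage_id(plaintext: bytes, convergence_secret: bytes = b"") -> str:
    """Stand-in for a content-derived identifier: same input, same ID."""
    return hashlib.sha256(convergence_secret + plaintext).hexdigest()


def recover_pin(observed_id: str):
    """Try every value of the low-entropy field until one ID matches."""
    for pin in range(10_000):
        if storage_id(TEMPLATE.format(pin=pin).encode()) == observed_id:
            return pin
    return None


# The victim stores a letter; the attacker only sees its identifier.
victim_id = storage_id(TEMPLATE.format(pin=4271).encode())
recovered = recover_pin(victim_id)

# With a secret the attacker does not know, the same guessing loop finds nothing.
protected_id = storage_id(TEMPLATE.format(pin=4271).encode(), b"random secret")
recovered_protected = recover_pin(protected_id)
```

The attack costs one hash per guess, so it is only as hard as the unknown field is unpredictable. Mixing in a secret the attacker lacks is what breaks the match.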
Thanks, that is extremely informative.
But what does this limitation mean for the security of Cryptosphere in its defined use cases? From the article: "If you want to store banned books or political pamphlets without attracting the attention of an oppressive government, or store pirated copies of music or movies without attracting the attention of copyright holders, then the confirmation-of-a-file attack is potentially a critical problem."
Doesn't this mean this system is DOA for its intended purposes?
No, I plan on employing the same system that Tahoe does: I will optionally incorporate a random convergence secret. This effectively disables the deduplication properties, but provides a defense against these two attacks. The convergence secret can be appended to every capability token, or omitted (in which case I use zeroes). So you have two options: allow deduplication and be susceptible to the confirmation-of-a-file and learn-the-remaining-information attacks, or get more security at the cost of storing duplicates.
Cryptographically, the convergence secret feeds into HKDF as a salt (in effect, an initialization vector) along with the entire plaintext. HKDF is then used to generate a key and IV for use with AES.
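A rough sketch of that derivation, using only the Python standard library and the extract/expand steps from RFC 5869 with SHA-256. The function names, the `info` label, the 32-byte key and 16-byte IV sizes, and the 32-byte all-zero default secret are assumptions for illustration, not the actual Cryptosphere parameters.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869): PRK = HMAC-SHA256(salt, input keying material)."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869): stretch the PRK into `length` output bytes."""
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_key_and_iv(plaintext: bytes, convergence_secret: bytes = bytes(32)):
    """Derive an AES key and IV from the plaintext, salted by the secret.

    Assumed sizes: 32-byte key, 16-byte IV. The all-zero default mirrors
    the "omitted secret" case, where deduplication still works.
    """
    prk = hkdf_extract(convergence_secret, plaintext)
    okm = hkdf_expand(prk, b"example key+iv", 48)  # hypothetical info label
    return okm[:32], okm[32:]


# Same plaintext, no secret: identical key and IV, so identical ciphertext.
shared = derive_key_and_iv(b"some file contents")
# Same plaintext, private secret: unrelated key and IV.
private = derive_key_and_iv(b"some file contents", b"\x01" * 32)
```

With the zero secret, anyone holding the same plaintext derives the same key and IV, which is what makes deduplication possible and also what makes the guessing attacks work. A random secret removes both properties at once.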
For more information see: https://tahoe-lafs.org/hacktahoelafs/drew_perttula.html