
A global decentralized encrypted datastore with anonymous publishing - adulau
https://github.com/tarcieri/cryptosphere
======
api
Unfortunately it will end up full of kidporn just like Freenet. This says a
lot about the present state of humanity, and what it says isn't good. Giordano
Bruno and the Renaissance alchemists would have killed for a system like this,
but the most interesting thing we can think of to do with it is fill it full
of pictures of child abuse so people can jack off to them.

It's nasty stuff, too. This is not "merely offensive." Kids are abducted and
murdered to make this stuff. In some cases, the act of making it causes bodily
harm. (Think of the mechanics of an adult having sex with a six year old.) In
parts of the world children are more or less raised for the purpose. It's
horrific. Picture someone getting off on photos of war atrocities. This is up
there with that, but worse... imagine that the war was held expressly for the
purpose of producing the images.

Edit: there is also, of course, rape porn depicting adults being abused
horrifically. That isn't any better, and freenets are full of that sort of
stuff too.

I'm not saying the technology is bad. I'm saying that it says something very
depressing about the users of said technology. It also makes me dubious about
running nodes on such networks, since I know that a lot of the material I'll
be storing and forwarding is abuse-porn.

~~~
kvnn
I want to solve this problem.

~~~
api
I've thought about solving this problem too.

One idea I've had is a content-type-restricted network that permits only text.
That would allow utterly un-censorable _communications_ : chat, planning
revolutions, whatever, but wouldn't be useful for CP. (Unless you like ASCII-
art CP.)

It could support ANSI. That would be neato. It would feel like the old BBS
world. Wonder if anyone would still care if a name like ViSiON-X were stolen
for it. :)

~~~
rmc
This has 2 problems:

* You can base64 encode any file, so it'll look like text. Limitations on message size might solve that.

* Sometimes, photos and videos are important. Think of the Abu Ghraib torture pictures, the Tiananmen Square Tank Man. Sometimes photos & videos are censored and should be shared with the world.

~~~
waterlesscloud
But base64 encoding is pretty easy to identify, as are most imaginable
encoding schemes. Simply disallow them.

If the encoding schemes become so obscure as to not be recognizable, then the
problem is still effectively solved.

~~~
jackowayed
How about 256 words (or sets of words) representing each byte value? It's less
dense by a factor of ~5, but it would work, be easy to decode, and be very
difficult to identify, especially if you used sets of words. You could even
cleverly generate in a way that is grammatically correct.
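
A minimal sketch of the byte-to-word idea, assuming a made-up vocabulary: the 256 "words" here are pseudo-words built from 16 onsets and 16 codas (hypothetical lists chosen only so the example is self-contained). A real scheme would presumably draw on a dictionary, or on sets of interchangeable words per byte value.

```python
# Hypothetical vocabulary: 16 onsets x 16 codas = 256 distinct pseudo-words.
ONSETS = ["ba", "de", "fi", "go", "ku", "la", "me", "ni",
          "po", "ru", "sa", "te", "vi", "wo", "zu", "ha"]
CODAS = ["b", "d", "f", "g", "k", "l", "m", "n",
         "p", "r", "s", "t", "v", "x", "z", "th"]

# WORDS[value] is the word standing for that byte value (0..255).
WORDS = [onset + coda for onset in ONSETS for coda in CODAS]
INDEX = {word: value for value, word in enumerate(WORDS)}


def encode_words(data: bytes) -> str:
    """Replace every byte with its word, separated by spaces."""
    return " ".join(WORDS[b] for b in data)


def decode_words(text: str) -> bytes:
    """Map each word back to its byte value."""
    return bytes(INDEX[word] for word in text.split())
```

Any file round-trips through `decode_words(encode_words(data))`, and the output contains nothing but letters and spaces, so a filter that looks for base64-style alphabets would not flag it.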

------
knowtheory
_However, in a direct peer-to-peer exchange the Cryptosphere does nothing to
mask the transactions a particular peer is performing, as opposed to systems like
Freenet and Tor which make an effort to obscure which host you're actually
talking to by routing them through a chain of proxies. In this regard the
anonymity guarantees of the Cryptosphere are no different from a system like
BitTorrent, aside from the plausible deniability defense that comes from the
fact all content is encrypted and peers automatically provide storage service
to other peers.

Instead, the Cryptosphere favors system robustness over guarantees on
anonymity. Participants in the system maintain a history of their activities
in the form of a long-chain certificate. You can think of this being somewhat
like the BitCoin block chain, where the longest version always wins, and its
integrity can be cryptographically verified. Every peer maintains its own long
chain certificate of all its activities, including services requested and
services completed.

Rather than verifying the integrity of a long chain based on hashes, the
Cryptosphere uses public key cryptography. Peers requesting services sign off
on both the request and delivery of a service (e.g. storing and serving a
particular chunk of a file). While in isolation the data points contained
within a particular long chain certificate are meaningless, peers can collect
several of these certificates and build a database of other peers in the
system, using tools like collaborative filtering to make intelligent decisions
about which other peers are worth interacting with._

This is interesting. If i understand correctly, this means that given a
transfer that you'd like to engage in, searching for a particular file you can
trace back the provenance of that file through a network of peers who are
making it available (if their transfer histories are accessible).

~~~
alttab
_trace back the provenance of that file through a network of peers who are
making it available_

That doesn't sound very anonymous, does it?

~~~
ewillbefull
_Instead, the Cryptosphere favors system robustness over guarantees on
anonymity._

You trace back the "provenance of that file" through cryptographic signatures.
You could make your own throwaway identity, use it to publish something, and
through the continued propagation of that data through the network its
publication would no longer require your activity.

It should be considered pseudo-anonymous publishing.

~~~
sp332
But since your throwaway identity has no reputation on the network, it is much
harder to find a node willing to carry your data.

~~~
davorak
Which would create a need for reputable publishers who would review
submissions and publish them if they were valuable.

------
s_henry_paulson
Fantastic idea. What happens though if a peer sharing data goes offline? Does
that person's data disappear as well?

Or can you build up enough "creds" to keep your data in the cloud for some
time after your node disappears?

~~~
ewillbefull
_Or can you build up enough "creds" to keep your data in the cloud for some
time after your node disappears?_

Bingo, from what I understand. More to the point, the data will persist as
long as users continue to transfer it.

------
coob
How do I not unwittingly get CP being stored on my device?

~~~
JulianMorrison
Going by the description, if you know a file's plaintext, you know both its
plaintext hash (decryption key) and its encrypted hash (storage key). The
communication to request files uses the storage key. It would be rather easy
to identify by statistical means, people who request files of a common type,
for example known CP images. Forwarding a few requests for CP or having it
stored doesn't indicate anything. Reading or writing a lot of known CP would
be probable cause for a search. This is especially easy because the majority
of CP is not new and FBI presumably has the whole damn lot on a hard disk
somewhere. They can easily pre-compute hashes and just sit monitoring.

As a result, this network would be useless for CP-mongers, they'd get caught
about as easily as using plain old FTP.

On the other hand, I do think the ability to pre-compute hashes is a flaw that
massively reduces this network's usefulness to dissidents. It is quite
effective as a "publish a manifesto network" with a secret writer and overt
readers - the original write is of a never-before-seen file, there is no hash
to monitor. It's unsafe as a "store my pirated stuff" network, writers of a
well known file can be tracked. And it would take very little statistical
monitoring to reveal the interests of a reader.

~~~
maxerickson
It would be trivial for a user to layer some encryption on top of the service,
preventing anyone on the network from doing any analysis of what they are
transmitting.

~~~
JulianMorrison
Not if it has to be decrypted as soon as it reaches the forwarding nodes - the
way to monitor it, is to run black-hat nodes.

~~~
engtech
index sites could have password encoded rar files of
AwesomeNewMovie-1080P-SceneHipsters.rar (like they already do if they're using
file lockers)

wouldn't the hash be different for every rar password used?

I guess the problem is once any copy infringing hash is found it is trivial to
search the network and find everyone who has transferred it?

edit: although if the only writer is the uploader, would you be able to tell
who read the copy infringing copy vs who just has it because they're part of the
network?

------
Torgo
I quickly read over the description of how this works. Nodes have a crypto
identity which is used to establish trust between nodes, and storing data from
another node gives you bandwidth "credit" to download other stuff.
Transactions are cryptographically signed, but presumably this is safe because
the contents of the files are unknown because they are encrypted. So how are
they encrypted and then stored, and later retrieved? Content is hashed, the
hash is used as the AES256 decryption key. Another hash is made of the
encrypted data, and this is your lookup key. Only the crypto hash is needed to
query a file, but it's useless to you unless you know the original data hash
to serve as a decryption key.
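
A rough sketch of the two-hash scheme as described above, under loud assumptions: the Python standard library has no AES, so `keystream_xor` is a stand-in stream cipher built from SHA-256 and is not what the project uses. The function names are hypothetical. What matters is the shape: both keys are derived from the content alone.

```python
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Stand-in cipher (NOT AES): XOR data with SHA-256(key || counter) blocks.
    Applying it twice with the same key returns the original data."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(a ^ b for a, b in zip(chunk, block))
    return bytes(out)


def store(plaintext: bytes):
    """Return (lookup_key, decryption_key, ciphertext) for a file."""
    decryption_key = hashlib.sha256(plaintext).digest()     # hash of the content
    ciphertext = keystream_xor(decryption_key, plaintext)   # what peers hold
    lookup_key = hashlib.sha256(ciphertext).hexdigest()     # hash of the ciphertext
    return lookup_key, decryption_key, ciphertext


def retrieve(decryption_key: bytes, ciphertext: bytes) -> bytes:
    """Decrypt a stored blob; only possible with the content hash."""
    return keystream_xor(decryption_key, ciphertext)
```

Because `store` is deterministic, anyone who already holds a copy of a file can recompute its lookup key without talking to the network.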

I see two problems with this:

1\. So, if both hashes of a file of illegal content become publicly known,
like say on a website, I don't see how you avoid liability having it on your
machine. It seems you can only avoid legal liability if someone stores stuff
on your machine that is never intended to become publicly available. In any
other case, the system has created a cryptographically provable trail between
the data and your storage, which can be used to prosecute you.

2\. The FBI can generate a SHA256 hash of every computer file of child
pornography it has ever collected, and immediately be able to identify every
node that contains this data. Presumably this gives them enough legal
authority to shut down your node, regardless if you have plausible deniability
that you are aware of the contents.

~~~
bascule
Yes, this is known as the "confirmation of file attack" and there is no
feasible way for the system to operate without it.

The confirmation of file attack is actually the degenerate case of the "learn
the remaining information attack", in which the majority of the plaintext is
known except for some low-entropy portion.

You can imagine a standard form letter that contains your credit card number.
An attacker can then generate all possible permutations of that low entropy
data and find matches where those are stored.

For more information see: <https://tahoe-lafs.org/hacktahoelafs/drew_perttula.html>
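
A toy illustration of the "learn the remaining information" attack, with everything in it assumed for the example: the letter template, the 4-digit PIN (standing in for the low-entropy portion), and `storage_key`, which models the storage key as a nested SHA-256 because in a convergent scheme it depends on the plaintext alone.

```python
import hashlib

# Hypothetical form letter; only the PIN varies between customers.
TEMPLATE = "Dear customer, your new card PIN is {pin:04d}. Keep it secret."


def storage_key(plaintext: bytes) -> str:
    """Model of a convergent storage key: a deterministic function of the plaintext."""
    content_key = hashlib.sha256(plaintext).digest()
    return hashlib.sha256(b"stored:" + content_key).hexdigest()


def recover_pin(observed_key: str):
    """Try every possible PIN and compare against the key seen on the network."""
    for pin in range(10_000):
        candidate = TEMPLATE.format(pin=pin).encode()
        if storage_key(candidate) == observed_key:
            return pin
    return None


# The victim stores their letter; the attacker only sees the storage key.
victim_letter = TEMPLATE.format(pin=4271).encode()
observed = storage_key(victim_letter)
recovered = recover_pin(observed)
```

With nothing unknown in the template, the loop shrinks to a single comparison, which is the confirmation-of-a-file case.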

~~~
Torgo
Thanks for that information, that is extremely informative.

But what does this limitation mean for the security of Cryptosphere for its
defined use cases? From the article: "If you want to store banned books or
political pamphlets without attracting the attention of an oppressive
government, or store pirated copies of music or movies without attracting the
attention of copyright holders, then the confirmation-of-a-file attack is
potentially a critical problem."

Doesn't this mean this system is DOA for its intended purposes?

~~~
bascule
No, I plan on employing the same system that Tahoe does: I will optionally
incorporate a random convergence secret. This effectively disables the
deduplication properties, but provides a defense against these two attacks.
This convergence secret can be added to the end of every capability token, or
optionally omitted (in which case I use zeroes). So you have two options:
allow deduplication but be susceptible to the confirmation of file
attack/learn the remaining information attack, or more security but with
duplication.

Cryptographically this feeds in as a salt/initialization vector to HKDF along
with the entire plaintext. HKDF is then used to generate a key and iv for use
with AES.
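
One way this derivation could look, as a sketch only: HKDF (RFC 5869) is written out with HMAC-SHA256 from the standard library, the convergence secret is used as the salt and the plaintext as the input keying material. The 32-byte secret, the empty `info` string, the 16-byte IV, and the key-then-IV output order are assumptions, not details taken from the project.

```python
import hashlib
import hmac
import os


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF extract-then-expand (RFC 5869) using HMAC-SHA256."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()       # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                 # expand
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# All zeroes when the convergence secret is omitted (deduplication allowed).
ZERO_SECRET = bytes(32)


def derive_key_and_iv(plaintext: bytes, convergence_secret: bytes = ZERO_SECRET):
    """Derive a 32-byte AES key and a 16-byte IV from content plus secret."""
    okm = hkdf_sha256(plaintext, convergence_secret, b"", 48)
    return okm[:32], okm[32:]


# Two uploads of the same file converge only if they share the secret.
shared_key, shared_iv = derive_key_and_iv(b"same file")
private_key, private_iv = derive_key_and_iv(b"same file", os.urandom(32))
```

With the zero secret, identical files produce identical keys and therefore identical ciphertexts; with a random secret, an attacker who knows the plaintext but not the secret cannot reproduce the key.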

------
peteretep
I had half a design prepped that allowed a similar thing using bitcoin wallets
- in order to buy space on the system, you have to have a bitcoin in a wallet,
and use the comment on the commit of the bitcoin to the wallet to sign the
piece of data...

------
dedward
Does this not set things up for a hash collision? They do exist, after all...
If this scales to global proportions, using SHA256 without verification for
deduplication (if I read this correctly) is a risk.

~~~
wcoenen
Assuming that there are no weaknesses in SHA256, you'd have to calculate a set
of 4.8x10^35 hashes to have a one in a million chance of seeing at least one
collision in that set[1].

If you could calculate (and store!) a trillion trillion (10^24) hashes per
second, that would take about 15000 years. Needless to say, nobody has ever
found a SHA256 collision.

[1] <http://en.wikipedia.org/wiki/Birthday_attack>
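
The figures above can be checked with the usual small-probability birthday approximation, p ≈ n² / (2 · 2^bits). The function name is made up for this sketch, and the rate of 10^24 hashes per second is the one quoted in the comment.

```python
import math


def hashes_for_collision_probability(bits: int, probability: float) -> float:
    """Approximate number of hashes needed to see a collision with the
    given (small) probability, from p ~= n^2 / (2 * 2^bits)."""
    return math.sqrt(2.0 * (2.0 ** bits) * probability)


# One-in-a-million chance of any collision among SHA256 outputs.
n = hashes_for_collision_probability(256, 1e-6)

# Time needed at a trillion trillion hashes per second.
seconds = n / 1e24
years = seconds / (365.25 * 24 * 3600)

print(f"hashes needed: {n:.2e}")
print(f"years at 1e24 hashes/s: {years:.0f}")
```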

------
mtrn
... in 410 lines of code.

~~~
judofyr
Well, he's already built Celluloid (for handling concurrency) and DCell (for
handling the distributed part).

~~~
mtrn
Just learned about those two projects, thanks.

~~~
axx
Just learned about the fact that he's the maker of Celluloid/DCell. Both
pretty awesome!

------
conductor
Tony Arcieri, be prepared for airport searches and harassing interrogations.

edit:

Well, at least that's what the CryptoCat author has got

~~~
knowtheory
Depends how he plays it. The author of CryptoCat and also Jacob Appelbaum (who
works on Tor) have been outspoken in the media about their gear and its intent
(and in Nadim Kobeissi's case, he's not an American citizen).

I'm betting that Tony could get away with keeping things on the downlow and
not getting harassed too badly. But this is an empirical question. We shall
see.

~~~
pyre
I thought that Jacob Appelbaum gets harassed due to his connections to
Wikileaks. The Tor Project was US government-spawned after all (wasn't it a
Navy project?).

------
sweis
Is this thing using raw RSA with no padding?
[https://github.com/tarcieri/cryptosphere/blob/master/lib/cry...](https://github.com/tarcieri/cryptosphere/blob/master/lib/cryptosphere/crypto/asymmetric_cipher.rb#L52)

~~~
bascule
Padding is added automatically by OpenSSL unless explicitly disabled. Also I
will be switching to Curve25519 (for DH-style key exchange) and ECDSA (for
signatures) quite soon. There will be no other use of pubkey crypto, so
actually that entire file will be gone soon.

------
lizzard
I think this may be a good solution for feminist hackers who want to name and
shame rapists or incidences of sexual assault.

------
mtgx
Could this be used for <https://projectmeshnet.org> and how?

~~~
ewillbefull
cjdns doesn't need a bartering system for handling traffic. It uses Kademlia
and some novel routing concepts; peering is assumed to be mutual because it's
already explicit.

------
lizzard
This looks like the perfect platform for feminist activists to name and shame
rapists. Great!

~~~
Zikes
Doesn't the media do a pretty good job of that already?

