
An off-grid social network - staltz
https://staltz.com/an-off-grid-social-network.html
======
georgecmu
As a historical note, there used to be quite a few very popular solutions for
supporting early social networks over intermittent connections.

UUCP
[https://en.wikipedia.org/wiki/UUCP]
used the computers' modems to dial out to other computers, establishing
temporary, point-to-point links between them. Each system in a UUCP network
has a list of neighbor systems, with phone numbers, login names and passwords,
etc.

FidoNet
[https://en.wikipedia.org/wiki/FidoNet]
was a very popular alternative to the internet in Russia as late as the
1990s. It used temporary modem connections to exchange private (email) and
public (forum) messages between the BBSes in the network.

In Russia, there was a somewhat eccentric, very outspoken enthusiast of
upgrading FidoNet to use web protocols and capabilities. Apparently, he's
still active in developing "Fido 2.0":
[https://github.com/Mithgol]

~~~
JdeBP
It's slightly misleading to refer to Fidonet only in the context of Russia. It
was popular in quite a lot of places around the world, not just Russia. Not
even _principally_ Russia, in its heyday.

These things are definitely systems to learn from, both their architectures
and their histories; and people have already been drawing parallels to Usenet
on this very page, notice.

~~~
georgecmu
When I said that Fidonet was a very popular alternative to the internet in
Russia as late as the 1990s, I didn't mean that it was limited to Russia, but
that in Russia particularly (well, the FSU) it was still popular even in the
late 90s, while elsewhere in the world it was subsumed by the internet.

------
chc4
This sounds like what I wanted from GNU Social when I first joined over a year
ago. GNU Social/Mastodon is a fun idea, but it falls apart when you realise
that you still don't own your content and it's functionally impossible to
switch nodes like it advertises, along with federation being a giant mess.

I tried to switch what server my account was on halfway through my GNU Social
life, and you just can't; all your followers are on the old server, all your
tweets, and there is no way to say "I'm still the same person". I didn't
realise I wanted cryptographic identity and accounts until I tried to actually
use the alternative.

That's also part of the interest I have in something like Urbit, which has an
identity system centered on public keys forming a web of trust, which also
lets you have a reputation system and ban spammers which you can't do easily
with a pure DHT.

~~~
rocky1138
Not being able to switch nodes pushes you to try and host your own instead.
That's what I've done. IMO we should instead be looking at packaging a self-
hosted version into a native Windows and Mac app. Run it in the background and
everything's done.

~~~
s_kilk
And then your laptop gets stolen and everything's gone.

~~~
rakoo
If I understand Scuttlebutt correctly, your stuff is broadcast to whoever it
might concern _and_ the pubs. If you still have your private key, you should
still be able to access whatever is in the ether, right? You somehow become
the recipient of your own messages. The thief, though, will also have access
to your private key, so the account has to be considered compromised.

~~~
staltz
This is a valid point, and I don't trust my computer. I would, however, trust
a Ledger wallet [http://ledgerwallet.com/], and it's theoretically and
economically feasible to have a Ledger wallet app sign every SSB message. This
would be awesome to have.
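To make the signing step concrete: an SSB feed is an append-only log where each message carries the hash of its predecessor and a signature from the feed owner's key. The sketch below shows that structure, assuming a simplified message layout; real SSB uses ed25519 signatures, so the HMAC here is only a stand-in for the signing operation a hardware wallet would perform, and all names are illustrative.

```python
import hashlib
import hmac
import json

# HMAC-SHA256 stands in for the ed25519 signature a Ledger-style device
# would produce; the secret never needs to leave the device.
SECRET_KEY = b"key-held-on-hardware-wallet"  # hypothetical

def sign(payload: bytes) -> str:
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()

def append_message(feed, content):
    # Each entry references the hash of the previous one, forming a chain.
    prev_hash = feed[-1]["hash"] if feed else None
    body = {"prev": prev_hash, "seq": len(feed), "content": content}
    encoded = json.dumps(body, sort_keys=True).encode()
    entry = {
        **body,
        "hash": hashlib.sha256(encoded).hexdigest(),
        "sig": sign(encoded),
    }
    feed.append(entry)
    return entry

feed = []
append_message(feed, {"type": "post", "text": "hello"})
append_message(feed, {"type": "post", "text": "offline-first"})

# A verifier holding only the public key can check both authorship
# (the signature) and ordering (the hash chain).
assert feed[1]["prev"] == feed[0]["hash"]
```

The point of the chain is that a stolen laptop can replay your old messages from the ether, but it cannot forge new ones without the signing key, which is why keeping that key on a separate device is attractive.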

------
the8472
> However, to get access to the DHT in the first place, you need to connect to
> a bootstrapping server, such as router.bittorrent.com:6881 or
> router.utorrent.com:6881

This is a common misunderstanding. You do not _need_ to use those nodes to
bootstrap. Most clients simply choose to because it is the most convenient way
to do so on the given substrate (the internet). DHTs are in no way limited to
specific bootstrap nodes: _any_ node that can be contacted can be used to join
the network; the protocol itself is truly distributed.

If the underlying network provides some hop-limited multicast or anycast a DHT
could easily bootstrap via such queries. In fact, bittorrent clients already
implement multicast neighbor discovery which under some circumstances can
result in joining the DHT without any hardcoded bootstrap node.
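The claim that any reachable node suffices can be shown with a toy simulation: a newcomer is handed a single arbitrary contact and then performs an iterative Kademlia-style lookup toward its own ID, learning the rest of the network as it goes. Everything below is illustrative (in-process "network", tiny routing tables), not a real wire protocol.

```python
import hashlib
import random

def node_id(name: str) -> int:
    # 160-bit ID, as in Kademlia / the bittorrent DHT.
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

class Node:
    def __init__(self, name):
        self.id = node_id(name)
        self.peers = set()  # toy routing table: a flat set of contacts

    def find_node(self, target):
        # Return the closest contacts we know to `target` (XOR metric).
        return sorted(self.peers, key=lambda p: p.id ^ target)[:3]

def join(new_node, any_contact):
    # Iterative lookup toward our own ID, starting from ONE contact.
    new_node.peers.add(any_contact)
    frontier, seen = [any_contact], set()
    while frontier:
        peer = frontier.pop()
        seen.add(peer)
        for p in peer.find_node(new_node.id):
            if p is new_node:
                continue
            new_node.peers.add(p)
            if p not in seen:
                frontier.append(p)
        peer.peers.add(new_node)  # the network learns about us too

# Build a small pre-existing network, then join via one arbitrary member.
random.seed(1)
nodes = [Node(f"node{i}") for i in range(20)]
for n in nodes:
    n.peers = set(random.sample([m for m in nodes if m is not n], 5))

joiner = Node("newcomer")
join(joiner, random.choice(nodes))
assert len(joiner.peers) > 1  # learned the network from a single contact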

~~~
glasz
To me it always sounds like approaches like DHTs are the solution, but I'm
having difficulties diving into the subject for the purpose of implementing it
for my own apps.

Are there any noteworthy resources for non-academics to get started?

~~~
jude-
Speaking as an academic who studies distributed systems, my advice is to stay
away from anything that relies on a public DHT to work correctly. They're
vulnerable to node churn, Sybil attacks, and routing attacks.

The last two are particularly devastating. Even if the peers had a key/value
whitelist and hashes (e.g. like a .torrent file), an adversary can still
insert itself into the routing tables of honest nodes and prevent peers from
ever discovering your key/value pairs. Moreover, they can easily spy on
everyone who tries to access them. It is estimated [1] that 300,000 of the
BitTorrent DHT's nodes are Sybils, for example.

[1]
[https://www.cl.cam.ac.uk/~lw525/publications/security.pdf]

~~~
the8472
In practice, none of those attacks has yet reached a level of concern that
would push bittorrent developers to deploy serious countermeasures. Torrents
generally are considered public data, _especially_ those made available
through the DHT, and clients implement peer exchange, which allows
near-complete extraction of peer lists anyway, so the DHT hardly introduces
any new privacy leaks. Although maintaining secrecy while exchanging data over
public infrastructure is desirable, that can be achieved by encrypting the
payload instead of obscuring the fact that you participated in the network at
all.

BEP42[0] has been implemented by many clients, and yet nobody has felt the
need to actually switch to enforcement mode.

All that is the result of the bittorrent DHT being a low-value target. It does
not contain any juicy information and is just one of multiple peer discovery
mechanisms, so there's some redundancy too.

[0]
[http://bittorrent.org/beps/bep_0042.html]

~~~
jude-
> Although maintaining secrecy while exchanging data over public
> infrastructure is desirable, that can be achieved by encrypting the payload
> instead of obscuring the fact that you participated in the network at all.

If I'm "in" on the sharing, then I learn the IP addresses (and ISPs and
proximate locations) of the other people downloading the shared file.
Moreover, if I control the right hash buckets in the DHT's key space, I can
learn from routing queries who's looking for the content (even if they haven't
begun to share it yet). Encryption alone does not make file-sharing a private
affair.

> BEP42[0] has been implemented by many clients and yet nobody has felt the
> need to actually switch to enforcement mode.

It also does not appear to solve the problem. The attacker only needs to get
control of hash buckets to launch routing attacks. Even with a small number of
unchanging node IDs, the attacker is still free to insert a pathological
sequence of key/value pairs to bump hash buckets from other nodes to them.

> All that is the result of the bittorrent DHT being a low-value target. It
> does not contain any juicy information and is just one of multiple peer
> discovery mechanisms, so there's some redundancy too.

Are you suggesting that high-value apps should not rely on a DHT, then?

~~~
the8472
> Encryption alone does not make file-sharing a private affair.

Someone who is "in" on encrypted content can observe the swarm anyway, and
thus gains very little from snooping on the DHT. On the other hand, a passive
DHT observer who is not "in" will be hampered by not knowing what content is
shared; he only sees participation in opaque hashes. Additionally, payload
encryption adds deniability, because anyone can transfer the ciphertext but
participants won't know whether others have the necessary keys to decrypt it.

What I'm saying is that any information leakage via the DHT (compared to
public trackers and PEX) is quite small, and this small loss can be more than
made up by adding payload encryption.

> the attacker is still free to insert a pathological sequence of key/value
> pairs to bump hash buckets from other nodes to them.

There is no bumping in Kademlia with unbounded node storage. And clients with
limited storage can make bumping very hard for others with oldest-first and
one-per-subnet policies, i.e. bumping the attackers instead of genuine keys.
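Those two storage policies can be sketched in a few lines: a node with bounded key/value storage lets existing entries keep their slots (oldest-first) and gives each /24 at most one slot (one-per-subnet), so an attacker flooding writes from one network only churns its own slot. The details below are illustrative, not any real client's eviction code.

```python
import ipaddress

class BoundedStore:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []  # (key, value, source_ip), oldest first

    def _subnet(self, ip):
        return ipaddress.ip_network(f"{ip}/24", strict=False)

    def put(self, key, value, source_ip):
        # One-per-subnet: refuse a second entry from an occupied /24.
        if any(self._subnet(ip) == self._subnet(source_ip)
               for _, _, ip in self.entries):
            return False
        # Oldest-first: when full, newcomers are refused; old genuine
        # keys are never bumped to make room.
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append((key, value, source_ip))
        return True

store = BoundedStore(capacity=2)
assert store.put("genuine", "v1", "10.0.0.5")
# A flood from the same subnet cannot displace the genuine entry:
assert not store.put("attack1", "x", "10.0.0.99")
assert not store.put("attack2", "x", "10.0.0.123")
assert [e[0] for e in store.entries] == ["genuine"]
```

The design choice is that forging many node IDs is cheap but controlling many distinct subnets is not, so tying slots to subnets raises the attacker's cost without punishing honest writers.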

> Are you suggesting that high-value apps should not rely on a DHT, then?

No, they should use DHT as a bootstrap mechanism of easy-to-replicate,
difficult-to-disrupt small bits of information (e.g. peer contacts as in
bittorrent) which then run their own content-specific gossip network for the
critical content. In some contexts it can also make sense to make reverse
lookups difficult, so attackers won't know what to disrupt unless they're
already part of some group.

~~~
jude-
> Someone who is "in" on encrypted content can observe the swarm anyway, thus
> gains very little from performing snooping on a DHT.

I can see that this thread is getting specific to Bittorrent, and away from
DHTs in general. Regardless, I'm not sure if this is the case. Please correct
me if I'm wrong:

* If I can watch requests on even a single copy of a single key/value pair in the DHT, I can learn some of the IP addresses asking for it (and when they ask for it).

* If I can watch requests on all copies of the key/value pair, then I can learn all the interested IP addresses and the times when they ask.

* If I can do this for the key/value pairs that make up a .torrent file, then I can (1) get the entire .torrent file and learn the list of file hashes, and (2) find out the IPs who are interested in the .torrent file.

* If I can then observe any of the key/value pairs for the .torrent file hashes, then I can learn which IPs are interested in and can serve the encrypted data (and the times at which they do so).

This does not strike me as "quite small," but that's semantics.

> There is no bumping in kademlia with unbounded node storage. And clients
> with limited storage can make bumping very hard for others with oldest-first
> and one-per-subnet policies, i.e. bumping the attackers instead of genuine
> keys.

Yes, the DHT nodes can employ heuristics to try to stop this, just like how
BEP42 is a heuristic to thwart Sybils. But that's not the same as solving the
problem. Applications that need to be reliable have to be aware of these
limits, and anticipate them in their design.

> No, they should use DHT as a bootstrap mechanism of easy-to-replicate,
> difficult-to-disrupt small bits of information (e.g. peer contacts as in
> bittorrent) which then run their own content-specific gossip network for the
> critical content. In some contexts it can also make sense to make reverse
> lookups difficult, so attackers won't know what to disrupt unless they're
> already part of some group.

This kind of proves my point. You're recommending that applications not rely
on DHTs, but instead use their own content-specific gossip network.

To be fair, I'm perfectly okay with using DHTs as one of a family of solutions
for addressing one-off or non-critical storage problems (like bootstrapping).
But the point I'm trying to make is that they're not good for much else, and
developers need to be aware of these limits if they want to use a DHT for
anything.

EDIT: formatting

~~~
the8472
> This does not strike me as "quite small," but that's semantics.

It is quite small because bittorrent needs to use some peer source. If you're
not using the DHT you're using a tracker. The same information that can be
obtained from the DHT can be obtained from trackers. So there's no novel
information leakage introduced by the DHT.

That's why the DHT does not really pose a big information leak.

> This kind of proves my point. You're recommending that applications not rely
> on DHTs, but instead use their own content-specific gossip network.

That's not what I said. Relying on a DHT for some parts, such as bootstrap and
discovery is still... well... relying on it, for things it is good at.

> But the point I'm trying to make is that they're not good for much else, and
> developers need to be aware of these limits if they want to use a DHT for
> anything.

Well yes, but these limits arise naturally anyway since A stores data for B on
C and you can't really incentivize C to manage anything more than small bits
of data.

> I can see that this thread is getting specific to Bittorrent

About DHTs in general: you can easily make reverse lookups difficult or
impossible by hashing the keys (bittorrent doesn't, because the inputs already
are hashes), you can obfuscate lookups by making them somewhat off-target
until they're close to the target, and you can make data lookups and
maintenance lookups indistinguishable. You can further add plausible
deniability by replaying recently-seen lookups when doing maintenance of
nearby buckets.
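The first of those measures, hashing the keys, is trivial to sketch: values are stored under hash(application key), so a node hosting entries (or a passive observer) sees only opaque digests and cannot enumerate meaningful identifiers without already knowing the keys. The table and names below are stand-ins, not a real DHT client.

```python
import hashlib

dht = {}  # stand-in for the distributed key/value table

def storage_key(app_key: str) -> str:
    # Reverse-lookup resistance: only the digest ever hits the network.
    return hashlib.sha256(app_key.encode()).hexdigest()

def put(app_key: str, value: str):
    dht[storage_key(app_key)] = value

def get(app_key: str):
    # Only someone who already knows the application key can ask for it.
    return dht.get(storage_key(app_key))

put("group:project-peers", "host-a,host-b")
assert get("group:project-peers") == "host-a,host-b"
# An observer of the raw table sees no trace of the original key:
assert "group:project-peers" not in dht
```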

~~~
jude-
> It is quite small because bittorrent needs to use some peer source. If
> you're not using the DHT you're using a tracker. The same information that
> can be obtained from the DHT can be obtained from trackers. So there's no
> novel information leakage introduced by the DHT.

Replacing a tracker with a DHT trades one server with all peer and chunk
knowledge for N servers, each with partial peer and chunk knowledge. If the
goal is to stop unwanted eavesdroppers, then the choice is between (1)
trusting that a single server that knows everything will not divulge
information, or (2) trusting that an unknown, dynamic number of servers that
anyone can run (including the unwanted eavesdroppers) will not divulge partial
information.

The paper I linked up the thread indicates that unwanted eavesdroppers can
learn a lot about the peers with choice (2) by exploiting the ways DHTs
operate. Heuristics can slow this down, but not stop it. With choice (1), it
is possible to _fully stop unwanted eavesdroppers_ if peers can trust the
tracker and communicate with it confidentially. There is no such possibility
with choice (2) if the eavesdropper can run DHT nodes.

> That's not what I said. Relying on a DHT for some parts, such as bootstrap
> and discovery is still... well... relying on it, for things it is good at.

> Well yes, but these limits arise naturally anyway since A stores data for B
> on C and you can't really incentivize C to manage anything more than small
> bits of data.

Thank you for clarifying. Would you agree that reliable bootstrapping and
reliable steady-state behavior are two separate concerns in the application?
I'm mainly concerned with the latter; I would never make an application's
steady-state behavior dependent on a DHT's ability to keep data available. In
addition, bootstrapping information like initial peers and network settings
can be obtained through other channels (e.g. DNS servers, user-given
configuration, multicasting), which further decreases the need to rely on
DHTs.

> About DHTs in general, you can easily make reverse lookups difficult or
> impossible by hashing the keys (bittorrent doesn't because the inputs
> already are hashes), you can obfuscate lookups by making them somewhat off-
> target until they're close to the target and making data-lookups and
> maintenance lookups indistinguishable. You can further add plausible
> deniability by replaying recently-seen lookups when doing maintenance of
> nearby buckets.

I'm not quite sure what you're saying here, but it sounds like you're saying
that a peer can obfuscate lookups by adding "noise" (e.g. doing additional,
unnecessary lookups). If so, then my reply would be this only increases the
number of samples an eavesdropper needs to make to unmask a peer. To truly
stop an eavesdropper, a peer needs to ensure that queries are uniformly
distributed in both space and time. This would significantly slow down the
peer's queries and consume a lot of network bandwidth, but it would stop the
eavesdropper. I don't know of any production system that does this.

~~~
the8472
> If the goal is to stop unwanted eavesdroppers, then the choice is between
> (1) trusting that a single server that knows everything will not divulge
> information

In practice trackers do divulge all the same information that can be gleaned
from the DHT and so does PEX in a bittorrent swarm. Those are far more
convenient to harvest.

> I'm not quite sure what you're saying here, but it sounds like you're saying
> that a peer can obfuscate lookups by adding "noise" (e.g. doing additional,
> unnecessary lookups).

That's only 2 of the 4 measures I listed, and I would mention encryption again
as a 5th. The others: a) opportunistically creating decoys by having _others_
repeat lookups they have recently seen as part of their routing table
maintenance; b) storing data in the DHT in a way that requires some prior
knowledge to be useful, which ideally results in information only leaking when
the listener could have obtained it anyway through that prior knowledge.

There's a lot you can do to harden DHTs. I agree that naive implementations
are trivial to attack, but to my knowledge it is possible to achieve byzantine
fault tolerance in a DHT in principle, it's just that nobody has actually
needed that level of defense yet, attacks in the wild tend to be fairly
primitive and only succeed because some implementations are very sloppy about
sanitizing things.

> To truly stop an eavesdropper, a peer needs to ensure that queries are
> uniformly distributed in both space and time.

Not quite. You only need to increase the number of samples needed beyond the
number of samples a peer is likely to generate during some lifecycle, and that
is not just done by adding more traffic.

> Would you agree that reliable bootstrapping and reliable steady-state
> behavior are two separate concerns in the application?

Certainly, but bootstrapping is a task that you do more frequently than you
think. You don't just join a global overlay once; you also (re)join many sub-
networks throughout each session, or look for specific nodes. A DHT is a bit
like DNS. You only need it once a day for a domain (assuming long TTLs), it's
not exactly the most secure protocol, and afterwards you do the heavy
authentication lifting with TLS, but DNS is still important, even if you're
not spending lots of traffic on it.

~~~
jude-
> In practice trackers do divulge all the same information that can be gleaned
> from the DHT and so does PEX in a bittorrent swarm. Those are far more
> convenient to harvest.

I'm confused. I can configure a tracker to only communicate with trusted
peers, and do so over a confidential channel. The tracker is assumed to not
leak peer information to external parties. A DHT can do neither of these.

> That's only 2 of 4 measures I have listed. And I would mention encryption
> again as a 5th. The others: a) Opportunistically creating decoys by having
> others repeat lookups they have recently seen as part of their routing table
> maintenance b) storing data in the DHT in a way that requires some prior
> knowledge to be useful, which ideally results in information only leaking
> when the listener could have obtained it anyway through that prior knowledge.

Unless the externally-observed schedule of key/value requests is statistically
random in time and space, the eavesdropper can learn with better-than-random
guessing which peers ask for which chunks. Neither (a) nor (b) address this;
they simply increase the number of samples required.

> There's a lot you can do to harden DHTs. I agree that naive implementations
> are trivial to attack, but to my knowledge it is possible to achieve
> byzantine fault tolerance in a DHT in principle, it's just that nobody has
> actually needed that level of defense yet, attacks in the wild tend to be
> fairly primitive and only succeed because some implementations are very
> sloppy about sanitizing things.

First, no system can tolerate Byzantine faults if over a third of its nodes
are hostile. If I can Sybil a DHT, then I can spin up arbitrarily many evil
nodes. Are we assuming that no more than one third of the DHT's nodes are
evil?

Second, "nobody has actually needed that level of defense yet" does not mean
that it is a sound decision for an application to use a DHT with the
expectation that the problems will never occur. So the maxim goes, "it isn't a
problem, until it is." As an application developer, I want to be prepared for
what happens when it is a problem, especially since the problems are known to
exist and feasible to exacerbate.

> Not quite. You only need to increase the number of samples needed beyond the
> number of samples a peer is likely to generate during some lifecycle, and
> that is not just done by adding more traffic.

I'm assuming that peers are arbitrarily long-lived. Real-world distributed
systems like BitTorrent and Bitcoin aspire to this.

> Certainly, but bootstrapping is a task that you do more frequently than you
> think. You don't just join a global overlay once, you also (re)join many
> sub-networks throughout each session or look for specific nodes. DHT is a
> bit like DNS. You only need it once a day for a domain (assuming long TTLs),
> and it's not exactly the most secure protocol and afterwards you do the
> heavy authentication lifting with TLS, but DNS is still important, even if
> you're not spending lots of traffic on it.

I take issue with saying that "DHTs are like DNS", because they offer
fundamentally different data consistency guarantees and availability
guarantees (even Beehive (DNS over DHTs) is vulnerable to DHT attacks that do
not affect DNS).

Regardless, I'm okay with using a DHT as one of many supported bootstrapping
mechanisms. I'm not okay with using it as the sole mechanism or even the
primary mechanism, since they're so easy to break when compared to other
mechanisms.

~~~
the8472
> I'm confused. I can configure a tracker to only communicate with trusted
> peers, and do so over a confidential channel. The tracker is assumed to not
> leak peer information to external parties. A DHT can do neither of these.

But then you are running a private tracker for personal/closed group use and
have a trust source. If you have a trust source you could also run a closed
DHT. But the bittorrent DHT is public infrastructure and best compared to
public trackers.

> I'm assuming that peers are arbitrarily long-lived. Real-world distributed
> systems like BitTorrent and Bitcoin aspire to this.

Physical machines are. Their identities (node IDs, IP addresses) and the
content they participate in at any given time don't need to be.

> If I can Sybil a DHT, then I can spin up arbitrarily many evil nodes.

This can be made costly. In the extreme case you could require a bitcoin-like
proof of work system for node identities. But that would be wasteful... unless
you're running some coin network anyway, then you can tie your ID generation
to that. In lower-value targets IP prefixes tend to be costly enough to thwart
attackers. If an attacker can muster the resources to beat that he would also
have enough unique machines at his disposal to perform a DoS on more
centralized things.

> Are we assuming that no more than one third of the DHT's nodes are evil?

Assuming is the wrong word. I think approaching BFT is simply part of what you
do to harden a DHT against attackers.

> Second, "nobody has actually needed that level of defense yet" does not mean
> that it is a sound decision for an application to use a DHT with the
> expectation that the problems will never occur.

I haven't said that. I'm saying that because this kind of defense hasn't yet
been needed, nobody has tried to build it; as simple as that. Sophisticated
security comes with implementation complexity; that's why we had HTTP for
ages before HTTPS adoption was spurred by the Snowden leaks.

> Neither (a) nor (b) address this; they simply increase the number of samples
> required.

(b) is orthogonal to sampling vs. noise.

> I'm not okay with using it as the sole mechanism or even the primary
> mechanism, since they're so easy to break when compared to other mechanisms.

What other mechanisms do you have in mind? Most that I am aware of don't offer
the same O(log n) node-state and lookup complexity in a distributed manner.
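The O(log n) node-state claim is easy to check back-of-envelope: a Kademlia node keeps roughly one k-bucket (k contacts) per bit of XOR distance, so routing state grows with log2 of the network size rather than with the network itself. The bucket size k = 8 below is illustrative.

```python
import math

k = 8  # contacts per bucket (illustrative; Kademlia's paper uses k = 20)
for n in (1_000, 1_000_000, 1_000_000_000):
    buckets = math.ceil(math.log2(n))
    print(f"{n:>13,} nodes -> ~{buckets} buckets, ~{k * buckets} contacts")
```

Going from a thousand nodes to a billion only triples the number of buckets (10 to 30), which is why a flat gossip network or a full replica cannot match a DHT's state and bootstrap costs on constrained devices.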

~~~
jude-
> But then you are running a private tracker for personal/closed group use and
> have a trust source. If you have a trust source you could also run a closed
> DHT. But the bittorrent DHT is public infrastructure and best compared to
> public trackers.

You're ignoring the fact that with a public DHT, the eavesdropper has the
power to reroute requests through networks (s)he can already watch. With a
public tracker, the eavesdropper needs vantage points in the tracker's network
to gain the same insights.

If we're going to do an apples-to-apples comparison between a public tracker
and a public DHT, then I'd argue that they are equivalent only if:

(1) the eavesdropper cannot add or remove nodes in the DHT; (2) the
eavesdropper cannot influence other nodes' routing tables in a non-random way.

> This can be made costly. In the extreme case you could require a bitcoin-
> like proof of work system for node identities. But that would be wasteful...
> unless you're running some coin network anyway, then you can tie your ID
> generation to that. In lower-value targets IP prefixes tend to be costly
> enough to thwart attackers. If an attacker can muster the resources to beat
> that he would also have enough unique machines at his disposal to perform a
> DoS on more centralized things.

Funny you should mention this. At the company I work part-time for
(blockstack.org), we thought of doing this very thing back when the system
still used a DHT for storing routing information.

We had the additional advantage of having a content whitelist: each DHT key
was the hash of its value, and each key was written to the blockchain.
Blockstack ensured that each node calculated the same whitelist. This meant
that inserting a key/value pair required a transaction, and the number of
key/value pairs could grow no faster than the blockchain.
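The whitelist check described above is mechanically simple: each DHT key is the hash of its value and must have been announced on the blockchain, so any node can reject both forged values and unannounced keys. The sketch below simulates the announced set with a plain Python set; it is illustrative, not Blockstack's actual code.

```python
import hashlib

announced = set()  # keys written to the blockchain (simulated)

def announce(value: bytes):
    # In the real system this would require a blockchain transaction,
    # which is what bounds the whitelist's growth rate.
    announced.add(hashlib.sha256(value).hexdigest())

def accept_write(key: str, value: bytes) -> bool:
    # A write is valid only if the key was announced AND the value
    # actually hashes to the key.
    return key in announced and hashlib.sha256(value).hexdigest() == key

routing_info = b"name -> storage pointer"
announce(routing_info)
key = hashlib.sha256(routing_info).hexdigest()

assert accept_write(key, routing_info)        # whitelisted and verifies
assert not accept_write(key, b"forged data")  # hash mismatch
assert not accept_write(hashlib.sha256(b"x").hexdigest(), b"x")  # unannounced
```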

This was not enough to address data availability problems. First, the attacker
would still have the power to push hash buckets onto attacker-controlled nodes
(it would just be expensive). Second, the attacker could still join the DHT
and censor individual routes by inserting itself as neighbors of the target
key/value pair replicas.

The best solution we came up with was one whereby DHT node IDs would be
derived from block headers (i.e. deterministic but unpredictable), and
registering a new DHT node would require an expensive transaction with an
ongoing proof-of-burn to keep it. In addition, our solution would have
required that every K blocks, the DHT nodes deterministically re-shuffle
their hash buckets among themselves in order to throw off any encroaching
routing attacks.

We ultimately did not do this, however, because having the set of whitelisted
keys growing at a fixed rate afforded a much more reliable solution: have each
node host a 100% replica of the routing information, and have nodes arrange
themselves into a K-regular graph where each node selects neighbors via a
random walk and replicates missing routing information in rarest-first order.
We have published details on this here:
[https://blog.blockstack.org/blockstack-core-v0-14-0-release-aad748f46d].

> Assuming is the wrong word. I think approaching BFT is simply part of what
> you do to harden a DHT against attackers.

If you go for BFT, you have to assume that no more than f of 3f+1 nodes are
faulty. Otherwise, the malicious nodes will always be able to prevent the
honest nodes from reaching agreement.

> I haven't said that. I'm saying that simply because this kind of defense was
> not yet needed, nobody tried to build it, as simple as that. Sophisticated
> security comes with implementation complexity, that's why we had HTTP for
> ages before HTTPS adoption was spurred by the Snowden leaks.

Right. HTTP's lack of security wasn't considered a problem, until it was.
Websites addressed this by rolling out HTTPS in droves. I'm saying that in the
distributed systems space, DHTs are the new HTTP.

> What other mechanisms do you have in mind? Most that I am aware of don't
> offer the same O(log n) node-state and lookup complexity in a distributed
> manner.

How about an ensemble of bootstrapping mechanisms?

* give the node a set of initial hard-coded neighbors, and maintain those neighbors yourself.

* have the node connect to an IRC channel you maintain and ask an IRC bot for some initial neighbors.

* have the node request a signed file from one of a set of mirrors that contains a list of neighbors.

* run a DNS server that lists currently known-healthy neighbors.

* maintain a global public node directory and ship it with the node download.

I'd try all of these things before using a DHT.

EDIT: formatting
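The ensemble above amounts to a fallback chain: try each bootstrap source in order and use the first that yields peers. The sketch below stubs out the sources (hardcoded list, IRC bot, signed mirror file, DNS); every name and address is illustrative.

```python
def hardcoded_peers():
    # Shipped with the node download, maintained by the operator.
    return ["192.0.2.10:4001", "192.0.2.11:4001"]

def irc_peers():
    return []  # stub: would ask an IRC bot for current neighbors

def mirror_peers():
    return []  # stub: would fetch and verify a signed peer-list file

def dns_peers():
    return []  # stub: would resolve a seed name listing healthy nodes

def bootstrap(sources):
    # Try each mechanism in order; any one of them succeeding is enough.
    for source in sources:
        peers = source()
        if peers:
            return peers
    raise RuntimeError("all bootstrap sources failed")

# DNS, IRC, and mirrors are tried first; the shipped hardcoded list is
# the last resort before giving up.
peers = bootstrap([dns_peers, irc_peers, mirror_peers, hardcoded_peers])
assert peers == ["192.0.2.10:4001", "192.0.2.11:4001"]
```

The appeal of the ensemble is that an attacker must break every mechanism at once, whereas a DHT-only bootstrap is a single point of disruption.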

~~~
the8472
> You're ignoring the fact that with a public DHT, the eavesdropper has the
> power to reroute requests through networks (s)he can already watch.

But in the context of bittorrent that is not necessary if we're still talking
about information leakage. The tracker + pex gives you the same, and more,
information than watching the DHT.

> we thought of doing this very thing back when the system still used a DHT
> for storing routing information.

The approaches you list seem quite reasonable when you have a PoW system at
your disposal.

> have each node host a 100% replica of the routing information, and have
> nodes arrange themselves into a K-regular graph

This is usually considered too expensive in the context of non-
coin/-blockchain p2p networks because you want nodes to be able to run on
embedded and other resource-constrained devices. The O(log n) node state and
bootstrap cost limits are quite important. Otherwise it would be akin to
asking every mobile phone to keep up to date with the full BGP route set.

> assume that no more than f of 3f+1 nodes are faulty. Otherwise, the
> malicious nodes will always be able to prevent the honest nodes from
> reaching agreement.

Of course, but for some applications that is more than good enough. If your
adversary can bring enough resources to bear to take over 1/3rd of your
network he might as well DoS any target he wants. So you would be facing
massive disruption _anyway_. I mean blockchains lose some of their security
guarantees too once someone manages to dominate 1/2 of the mining capacity.
Same order of magnitude. It's basically the design domain "secure, up to point
X".

> I'm saying that in the distributed systems space, DHTs are the new HTTP.

I can agree with that, but I think the S can be tacked on once people feel the
need.

> How about an ensemble of bootstrapping mechanisms?

The things you list don't really replace the purpose of a DHT. A DHT is a
key-value store for many keys plus a routing algorithm to find them in a
distributed environment. What you listed just gives you a bunch of nodes, but
no data lookup capabilities. Essentially, you're listing things that could be
used to bootstrap into a DHT, not replacements for the next-layer services a
DHT provides.

~~~
jude-
> This is usually considered too expensive in the context of non-
> coin/-blockchain p2p networks because you want nodes to be able to run on
> embedded and other resource-constrained devices. The O(log n) node state and
> bootstrap cost limits are quite important. Otherwise it would be akin to
> asking every mobile phone to keep up to date with the full BGP route set.

Funny you should mention BGP. We have been approached by researchers at
Princeton who are interested in doing something like that, using Blockstack
(but to be fair, they're more interested in giving each home router a copy of
the global BGP state).

I totally hear you regarding the costly bootstrapping. In Blockstack, for
example, we expect most nodes to sync up using a recent signed snapshot of the
node state and then use SPV headers to download the most recent transactions.
It's a difference between minutes and days for booting up.

> Of course, but for some applications that is more than good enough. If your
> adversary can bring enough resources to bear to take over 1/3rd of your
> network he might as well DoS any target he wants. So you would be facing
> massive disruption anyway.

Yes. The reason I brought this up is that in the context of public DHTs, it's
feasible for someone to run many Sybil nodes. There's some very recent work
out of MIT for achieving BFT consensus in open-membership systems, if you're
interested:
[https://arxiv.org/pdf/1607.01341.pdf](https://arxiv.org/pdf/1607.01341.pdf)

> I mean blockchains lose some of their security guarantees too once someone
> manages to dominate 1/2 of the mining capacity. Same order of magnitude.
> It's basically the design domain "secure, up to point X".

In Bitcoin specifically, the threshold for tolerating Byzantine miners is 25%
hash power. This was one of the more subtle findings from Eyal and Sirer's
selfish mining paper.

> The things you list don't really replace the purpose of a DHT. A dht is a
> key-value store for many keys and a routing algorithm to find them in a
> distributed environment. What you listed just gives you a bunch of nodes,
> but no data lookup capabilities. Essentially you're listing things that
> could be used to bootstrap into a DHT, not replacing the next layer services
> provided by a DHT.

If the p2p application's steady-state behavior is to run its own overlay
network and use the DHT only for bootstrapping, then DHT dependency can be
removed simply by using the systems that bootstrap the DHT in order to
bootstrap the application. Why use a middle-man when you don't have to?

~~~
the8472
> If the p2p application's steady-state behavior is to run its own overlay
> network and use the DHT only for bootstrapping, then DHT dependency can be
> removed simply by using the systems that bootstrap the DHT in order to
> bootstrap the application. Why use a middle-man when you don't have to?

It seems like we have quite different understandings of how DHTs are used,
probably shaped by different use-cases. Let me see if I can summarize yours
correctly: a) over time, nodes will be interested in, or will have visited, a
large proportion of the keyspace; b) it makes sense to eventually replicate
the whole dataset; c) the data mutation rate is relatively low; d) access to
the keyspace is extremely biased, i.e. there is some subset of keys that
almost all nodes will access. Is that about right?

In my case this is very different. Node turnover is high (mean life time
<24h), data is volatile (mean lifetime <2 hours), nodes are only ever
interested in a tiny fraction of the keyspace (<0.1%), nodes access random
subsets of the keyspace, so there's little overlap in their behavior. The data
would become largely obsolete before you even replicated half the DHT unless
you spent a lot of overhead on keeping up with hundreds of megabytes of churn
per hour and you would never use most of it.

So for you there's just "bootstrap dataset" and then "expend a little effort
to keep the whole replica fresh". For me there's really "bootstrap into the
dht", "maintain (tiny) routing table" and then "read/write random access to
volatile data on demand, many times a day".

This is why the solutions you propose are no solutions for a general DHT which
can also cope with high churn.

~~~
jude-
> It seems like we have a quite different understanding how DHTs are used,
> probably shaped by different use-cases. Let me see if I can summarize yours
> correctly: a) over time nodes will be interested or have visited in a large
> proportion of the keyspace b) it makes sense to eventually replicate the
> whole dataset c) the data mutation rate is relatively low d) access to the
> keyspace is extremely biased, there is some subset of keys that almost all
> nodes will access. Is that about right?

Agreed on (a), (b), and (c). In (a), the entire keyspace will be visited by
each node, since they have to index the underlying blockchain in order to
reach consensus on the state of the system (i.e. each Blockstack node is a
replicated state machine, and the blockchain encodes the sequence of state-
transitions each node must make). (d) is probably correct, but I don't have
data to back it up (e.g. because of (b), a locally-running application node
accesses its locally-hosted Blockstack data, so we don't ever see read
accesses).

> In my case this is very different. Node turnover is high (mean life time
> <24h), data is volatile (mean lifetime <2 hours), nodes are only ever
> interested in a tiny fraction of the keyspace (<0.1%), nodes access random
> subsets of the keyspace, so there's little overlap in their behavior. The
> data would become largely obsolete before you even replicated half the DHT
> unless you spent a lot of overhead on keeping up with hundreds of megabytes
> of churn per hour and you would never use most of it.

Thank you for clarifying. Can you further characterize the distribution of
reads and writes over the keyspace in your use-case? (Not sure if you're
referring to the BitTorrent DHT's behavior in your description, so apologies
if these questions are redundant.) For example:

* Are there a few keys that are really popular, or are keys equally likely to be read?

* Do nodes usually read their own keys, or do they usually read other nodes' keys?

* Is your DHT content-addressable (e.g. a key is the hash of its value)? If so, how do other nodes discover the keys they want to read?

* If your DHT is _not_ content-addressable, how do you deal with inconsistent writes during a partition? More importantly, how do you know the value given back by a remote node is the "right" value for the key?

~~~
the8472
> Not sure if you're referring to the Bittorrent DHT

I am, but that's not even that important, because storing a blockchain history
is a very special use case: you're dealing with an append-only data structure.
There are no deletes or random writes. Any DHT used for p2p chat, file sharing
or some mapping of identity -> network address will experience more
write-heavy, random-access workloads.

> Are there a few keys that are really popular, or are keys equally likely to
> be read?

Yes, some are more popular than others, but the bias is not strong compared to
the overall size of the network. 8M+ nodes. Key popularity may range from 1 to
maybe 20k. And such peaks are transient, mostly for new content.

> Do nodes usually read their own keys, or do they usually read other nodes'
> keys?

It is extremely unlikely that nodes are interested in the data for which they
provide storage.

> Is your DHT content-addressable (e.g. a key is the hash of its value)?

Yes and no, it depends on the remote procedure call used. Generic immutable
get/put operations are. Mutable ones use the hash of the pubkey. Peer address
list lookups use the hash of an external value (from the torrent).

> * If your DHT is not content-addressable, how do you deal with inconsistent
> writes during a partition? More importantly, how do you know the value given
> back by a remote node is the "right" value for the key?

For peer lists it maintains a list of different values from multiple
originators, the value is the originator's IP, so it can't be easily spoofed
(3-way handshake for writes). A store adds a single value, a get returns a
list.

For mutable stores the value -> signature -> pubkey -> dht key is checked.
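A rough sketch of the key-derivation side of this, loosely modeled on BEP 44 (the BitTorrent DHT's storage extension); the signature check for mutable items is elided here:

```python
import hashlib

def immutable_key(value: bytes) -> bytes:
    # Content-addressable: the key is the SHA-1 of the value, so any node
    # can verify that the value it received matches the key it asked for
    return hashlib.sha1(value).digest()

def mutable_key(pubkey: bytes) -> bytes:
    # Mutable items hash the originator's public key instead: the value may
    # change, but only the keyholder can sign a valid update (signature
    # verification omitted in this sketch)
    return hashlib.sha1(pubkey).digest()

def verify_immutable(key: bytes, value: bytes) -> bool:
    return immutable_key(value) == key

value = b"some chunk of data"
key = immutable_key(value)
```

This is why partitions are harmless for immutable items (a wrong value simply fails the hash check) and why mutable items need the value -> signature -> pubkey -> key chain described above.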

------
EGreg
Yes, this guy gets it. This community gets it.

Not everything needs a global singleton like a blockchain or DHT or a DNS
system. Bitcoin needs this because of the double-spend problem. But private
chats and other such activities don't.

I have been working on this problem since 2011. I can tell you that peer-to-
peer is fine for asynchronous feeds that form tree based activities, which is
quite a lot of things.

But everyday group activities usually require some central authority for that
group, at least for the ordering of messages. A "group" can be as small as a
chess game or one chat message and its replies. But we haven't solved mental
poker well for N people yet. (Correct me if I am wrong.)

The goal isn't to not trust anyone for anything. After all, you still trust
the user agent app on your device. The goal is to _control_ where your data
lives, and not have to rely on any particular _connections_ to eg the global
internet, to communicate.

Btw, ironic that the article ends _" If you liked this article, consider
sharing (tweeting) it to your followers"_. In the feudal digital world we live
in today, most people must speak a mere 140 characters to "their" followers
via a centralized social network with huge datacenters whose engineers post on
highscalability.com.

If you are interested, here I talk about it further in depth:

[https://youtu.be/WzMm7-j7yIY](https://youtu.be/WzMm7-j7yIY)

~~~
ytjohn
I have been researching along these same lines for a while now as well, ad-
hoc/mesh network messaging. My use case would be an amateur radio mesh
network. For a while, I was investigating running matrix.org servers on
raspberry pis, connected to a mesh network without internet. And that does
work, the closest I've come to a great solution.

But I had never heard of Scuttlebutt until now. This looks even more ideal. In
amateur radio, everyone self-identifies with their call sign; this follows the
same model.

For amateur radio, there is a restriction against encryption (intent to
obscure or hide the message), but the public messages would be fine. Private
messages (being encrypted for only those with the right keys) might be a legal
issue, so for a legit amateur radio deployment, the client would have to
disable that (or at least operators would have to be educated that private
messages may violate FCC rules).

------
DenisM
My friends and I have thought this through in detail a while ago, and have a
few suggestions to make. I hope you make the best of it!

 _Distributed identity_

Allow me to designate trusted friends / custodians. Store fractions of my
private key with them, so that they can rebuild the key if I lose mine. They
should also be able to issue a "revocation as of a certain date" if my key is
compromised, and vouch for my new key being a valid replacement for the old
key. So my identity becomes "Bob Smith from Seattle, friend of Jane Doe from
Portland and Sally X from Redmond". My social circle is my identity! Non-
technical users will not even need to know what a private key / public key is.
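A minimal sketch of the key-splitting idea, using an n-of-n XOR split for brevity (a real design would more likely use Shamir's k-of-n scheme, so that any threshold-sized subset of custodians can rebuild the key):

```python
import os
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def split_key(secret: bytes, n: int) -> list[bytes]:
    # n-1 purely random shares, plus one share that XORs the randomness
    # back out; all n together reconstruct the secret, while any n-1
    # shares are indistinguishable from random noise
    shares = [os.urandom(len(secret)) for _ in range(n - 1)]
    final = reduce(xor_bytes, shares, secret)
    return shares + [final]

def rebuild_key(shares: list[bytes]) -> bytes:
    return reduce(xor_bytes, shares)
```

Each custodian holds one share; none of them learns anything about the key on their own.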

 _Relays_

Introduce the notion of a "relay" server - a server where I will register my
current IP address for direct p2p connection, or pick up my "voicemail" if I
can't be reached right away. I can have multiple relays. So my list of friends
is a list of their public keys and their relays as best I know them. Whenever
I publish new content, the software will aggressively push the data to each of
my friends / subscribers. Each time my relay list is updated, it also gets
pushed to everyone. If I can't find my friend's relay, I will query our mutual
friends to see if they know where to find my lost friend.

 _Objects_

There should be a way to create handles for real-life objects and locations.
Since many people will end up creating different entries for the same object,
there should be a way for me to record in my log that guid-a and guid-b refer
to the same restaurant in my opinion. As well I could access similar opinion
records made by my friends, or their friends.
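Merging "guid-a and guid-b are the same object" opinions is essentially a union-find problem; a sketch, with hypothetical GUIDs:

```python
class HandleMerger:
    """Tracks 'these two handles refer to the same real-world object'
    assertions, collected from my own log and my friends' logs."""

    def __init__(self):
        self.parent: dict[str, str] = {}

    def find(self, guid: str) -> str:
        # Follow parent pointers to the canonical handle, with path halving
        self.parent.setdefault(guid, guid)
        while self.parent[guid] != guid:
            self.parent[guid] = self.parent[self.parent[guid]]
            guid = self.parent[guid]
        return guid

    def same_object(self, a: str, b: str) -> None:
        # Record an opinion that a and b are one object (e.g. one restaurant)
        self.parent[self.find(a)] = self.find(b)

    def is_same(self, a: str, b: str) -> bool:
        return self.find(a) == self.find(b)
```

Opinions are transitive: if I say a = b and a friend says b = c, then a, b, and c all collapse into one object from my point of view.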

 _Comments_

Each post has an identity, as does each location. My friends can comment on
those things in their own log, but I will only see these comments if I get to
access those posts / locations myself (or I go out of my way to look for
them). This way I know what my friends think of this article or this
restaurant. Bye-bye Yelp, bye-bye fake Amazon reviews.

 _Content Curation_

I will subscribe to certain bots / people who will tell me that some pieces of
news floating around will be a waste of my time or be offensive. Bye-bye
clickbait, bye-bye goatse.

 _Storage_

Allow me to designate space to store my friends' encrypted blobs for them.
They can back up their files to me, and I can back up to them.

~~~
fossuser
For the distributed identity piece is there a good reason not to rely on
keybase.io?

Also important: an initial smaller community would need to be targeted, and it
would have to succeed there. FB did this with colleges; a federated network,
in a world where FB already exists, would have an even harder time.

~~~
platz
> distributed identity piece is there a good reason not to rely on keybase.io

Depends on your definition of "distributed", I suppose

~~~
pcmonk
My impression was that keybase is distributed? Can it be used without talking
to keybase's servers?

~~~
cjbprime
The Keybase server manages giving out usernames, and recording the proof URLs
for users, and then your client hits the URLs, checks that the proofs are
signed with the appropriate key, and caches them to watch for future
discrepancies.

Keybase offers decentralized trust, in that the Keybase server can't lie to
you about someone's keys -- your Keybase client will trust their public proofs
and not the Keybase server -- but it's not a distributed/decentralized service
as a whole, because you still receive hints from the server about where proofs
live, and learn Keybase usernames from it.
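A sketch of that flow, with `fetch` and `verify` as hypothetical stand-ins for the HTTP request and the real signature verification:

```python
def check_proofs(username, pubkey, hint_urls, fetch, verify, cache):
    # hint_urls come from the server, but trust comes from the proofs
    # themselves: each must verify against the key we already associate
    # with the user. The cache watches for proofs that silently change.
    results = {}
    for url in hint_urls:
        statement = fetch(url)                 # hit the proof URL
        ok = verify(statement, pubkey)         # signed with the right key?
        cached = cache.get((username, url))
        if cached is not None and cached != statement:
            ok = False                         # discrepancy: flag it
        cache[(username, url)] = statement
        results[url] = ok
    return results
```

The server can withhold hints, but it can't forge a proof: a statement that doesn't verify against the user's key is rejected by the client.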

(I work at Keybase.)

~~~
mundo
It looks like SSB is crying out for Keybase integration. Any plans for being
able to add an SSB identity to my Keybase account?

~~~
cjbprime
Dunno! Would encourage SSB users to post the request to
[https://github.com/keybase/keybase-
issues/issues/518](https://github.com/keybase/keybase-issues/issues/518).

------
rattray
Bit of feedback: when you download the desktop application, it prompts for a
desired name, image, and description.

It's unclear whether this can be changed later, and I'm not yet sure whether I
want to use my real identity or a throwaway.

After creating an account with the default ¿randomly? generated name, I tried
to use an invite obtained from
[http://198.211.122.115/invited](http://198.211.122.115/invited) which was
linked from [https://github.com/staltz/easy-ssb-
pub](https://github.com/staltz/easy-ssb-pub).

All I got back was "An error occured (sic) while attempting to redeem invite.
could not connect to sbot"

It worked with [http://pub.locksmithdon.net/](http://pub.locksmithdon.net/),
though I feel a bit odd trusting a "locksmith" I've never heard of to stream
lots of data to my hard drive...

It's cool that anyone can host a pub – basically an instance of
FB/Twitter/Gmail, it seems – but 1) things will get expensive for them, and
it's unclear how they'll fund that, and 2) now I have to trust random people
on the internet – not only to be nice, but also to be secure.

As a "random technically aware netizen", I honestly trust fooplesoft more,
since they have a multi-billion-dollar reputation to protect. (Not that I
_trust_ fooplesoft).

~~~
substack
You don't need to trust the security of pubs. Validation of messages happens
through cryptographic signing and public messages are public anyways. You also
don't need to trust that pubs will be online much because your followers will
also help host your content.

~~~
rattray
> public messages are public anyways

Right, but someone I trust could have their message corrupted, no?

eg; some political leader intends to write "everybody vote for Alice" and it
is modified to read "everybody vote for Carol". Is this possible?

(I generally trust FB not to do this because their business would suffer if
they were caught, for example – not so with ephemeral pubs)

~~~
cel
Each message includes a signature by its author of its content, so it can't be
falsified (unless the author's key is compromised)
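To illustrate why (assuming the author's key isn't compromised): the signature binds the message content to the key, so altering the content invalidates the signature. A toy sketch with textbook-small RSA numbers; real SSB uses Ed25519 via libsodium, so this is only the shape of the idea, not the actual scheme:

```python
import hashlib

# Toy RSA parameters for illustration only -- far too small to be secure
P, Q = 61, 53
N = P * Q          # 3233
E = 17             # public exponent
D = 2753           # private exponent (E * D == 1 mod lcm(P-1, Q-1))

def digest(message: bytes) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    # Only the keyholder (who knows D) can produce this
    return pow(digest(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    # Anyone can check with the public key (E, N)
    return pow(signature, E, N) == digest(message)

msg = b"everybody vote for Alice"
sig = sign(msg)
```

A pub that alters the message body, or tampers with the signature, produces something that fails `verify` on every reader's machine.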

------
fiatjaf
Why do all "social networks" have to be a feed of news? Couldn't anyone think
of anything better than a system in which people are only encouraged to talk
about themselves and try to get other people's approval? In which having more
"friends" is always better, because you have more potential for self-
aggrandizement in your narcissistic posts?

~~~
0x445442
They did; it was called Usenet, and it was glorious.

~~~
waqf
$$$ MAKE MONEY FAST $$$

What I mean to say is, Usenet's social model hardly prevented it from drowning
in a sea of low-value content.

------
wesleytodd
Since the author didn't mention it, the original creator of the Patchwork
project is [https://github.com/pfrazee](https://github.com/pfrazee)

When I used it, which admittedly was a long time ago now, the biggest setback
was the lack of cross-device identities. So I ended up having two accounts
with two feeds, `wesAtWork` and `wes`. Maybe they have solved this by now.

ps. Does patchwork still have the little gif maker? Because that was a super
fun feature.

~~~
staltz
Cross-device identity is still an issue, but not a problem in the foundation.
It's a matter of making client apps (like Patchwork) recognize a message of
type "link this and that account together", and then your friend's app would
automatically follow both accounts and render them as if they were the same
thing. It'll be done eventually in Patchwork.

~~~
wesleytodd
Yeah that is what they were talking about when I was following the project.
Once that is done in patchwork, I might try using it again.

~~~
staltz
It will be a must once mobile is launched, which I'm working on.

~~~
mysticmumble
Is it also possible to use multiple devices without leaking from which device
each message was posted?

~~~
staltz
Well, yes and no. The log will show a different id (public key) which authored
the message. But the _device_ itself (iPhone or Google Nexus or whatever)
doesn't need to be mentioned.

~~~
mysticmumble
That could leak information a user doesn't want to be leaked, like at which
hours he is at work (using the work computer) etc. Which id belongs to which
device could probably be inferred when the service is used actively.

I understand that transparency might not be a design goal or technically
possible; I'm just raising the concern.

Can't I just share my private key across multiple devices?

~~~
staltz
Nothing stops you from copy-pasting your asymmetric keys (it's a file) to
different devices. It's feasible; the biggest issue is making sure your log
stays the same, because a log shouldn't get forked.
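A sketch of why forking is the problem: each entry in an SSB-style log names the hash of its predecessor, so two devices appending independently from the same head become detectably inconsistent (the entry format here is made up, not SSB's actual message schema):

```python
import hashlib
import json

def entry_hash(entry: dict) -> str:
    return hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()).hexdigest()

def append(log: list, content: str) -> None:
    # Each entry records its sequence number and its predecessor's hash,
    # making the log a hash chain
    prev = entry_hash(log[-1]) if log else None
    log.append({"seq": len(log) + 1, "prev": prev, "content": content})

def is_forked(a: list, b: list) -> bool:
    # Two replicas of one identity's log fork when they disagree at any
    # shared sequence number
    return any(entry_hash(x) != entry_hash(y) for x, y in zip(a, b))
```

This is why naively sharing one key across devices is risky: both devices appending while offline from each other produces a fork that peers will notice.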

------
someone7x
This excites me. I'm probably naive, but I always imagine that one day I'll
retire and spend my days trying to work on an open source mesh network (or
something similar). I want future generations to live in a world where 'the
internet' isn't a thing that authorities can grant/deny. A headless social
network is a promising omen of a headless internet.

~~~
crypt0x
> I want future generations to live in a world where 'the internet' isn't a
> thing that authorities can grant/deny.

_THIS_ is very much the spirit of SSB. :)

~~~
johanneskanybal
Did I misunderstand the article or is this only for social feeds? I want
headless internet in general, guess it's a common idea these days.

~~~
floatboth
[https://docs.meshwith.me](https://docs.meshwith.me)

The technology is here, the only thing left is to make people actually use it…

------
vhodges
I've been thinking about this very thing the past few days!

Forgive the rambling, this is the first time I've written any of this down...

My idea is to use email as a transport for 'social attachments' that would be
read using a custom mail client (it remains to be seen whether this should be
your regular email client or a dedicated 'social mail' client; if using your
regular client, users would have to ignore or filter out social mails). It
could also be done as a mimetype handler/viewer for social attachments.

Advantages of using email:

- Decentralized (can move providers)
- Email address as rendezvous point (simple for users to grasp)
- Works behind firewalls
- Can work with local (i.e. Maildir) or remote (IMAP) mailstores. If using
IMAP, this helps address the multiple-devices issue. Could also use
replication to handle it (Syncthing, Dropbox, etc.)
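A sketch of the attachment idea using Python's stdlib `email` package; the `application/x-social-post+json` mimetype is hypothetical, invented here for illustration:

```python
from email.message import EmailMessage

# Hypothetical mimetype for the 'social attachment'; a custom client (or a
# handler registered for this type) would render it, and ordinary mail
# clients could filter on it
SOCIAL_MAINTYPE = "application"
SOCIAL_SUBTYPE = "x-social-post+json"

def make_social_mail(sender: str, recipient: str,
                     post_json: bytes) -> EmailMessage:
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "social update"
    msg.set_content("This message carries a social attachment.")
    msg.add_attachment(post_json, maintype=SOCIAL_MAINTYPE,
                       subtype=SOCIAL_SUBTYPE, filename="post.json")
    return msg
```

Anything that can deliver RFC 5322 messages (SMTP, Maildir sync, sneakernet) then works as the transport.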

Scuttlebutt looks like a nice alternative though. Will be following closely.

~~~
fiatjaf
email is so complicated. There are probably 3 people in the world who know how
to properly run an email server.

~~~
_jal
That's absurd. True, it has become more complicated than it once was, but
that's every technology that isn't dead.

Granted, I have been running mail for a long time, so I got to learn the
complications as they happened, rather than all at once. But anyone who can
set up a production-quality web server/appserver/DB along with the accessories
that go along with it can handle it.

Now if email isn't important to your business and/or you just don't want to
deal with maintaining it, that's valid. But it just isn't as difficult as a
lot of people seem to want to make it out to be.

~~~
bootload
_" I have been running mail for a long time, so I got to learn the
complications as they happened, rather than all at once. But anyone who can
set up a production-quality web server/appserver/DB along with the accessories
that go along with it can handle it."_

Are you using Sendmail?

~~~
_jal
> Are you using Sendmail?

Not in a long time. I haven't found a situation in which I couldn't use
Postfix in quite a while. Although the occasional sendmail.cf flashback still
hits me.

~~~
bootload
_" the occasional sendmail.cf flashback still hits me."_

that's why I asked.

------
snackai
A hipster living on a self-steering sailing boat has 600 modules published on
NPM. I can't even. Seriously, how could this be any more funny?!

~~~
hugozap
I think of a hipster as someone who follows (non-mainstream) trends, goes to
Starbucks, loves Apple products, and cannot live without Wifi. Not a hacker
who builds their own stuff and cares about privacy. But maybe the beard
confuses people.

~~~
btym
It's hard to define the term, but I think at the most basic level, "hipster"
specifically refers to anyone who enjoys being alternative. "Goes to
starbucks, loves Apple products and cannot live without Wifi" are common
qualities of people who _want_ to be hipsters but only because they think it's
cool (and don't actually embrace an alternative lifestyle).

~~~
draw_down
I recommend dropping the term. It's meaningless now, if it ever actually even
meant anything in the first place.

------
itchyjunk
I am not much of a social networking type of person, but I have wondered how
nice it would be to network with a community like HN. For example, I see a
nice comment chain going on in some news article, but as the article dies so
does all the conversation within it.

Maybe it's just me but if I see an article is x+ hours old (15+ for example),
I don't bother commenting.

What type of social networking would HN use for non personal(not for family
and immediate friends) communication? (I've tried hnchat.com, it's mostly
inactive imho)

~~~
jumpkickhit
Yeah, I'd be happy to see an active IRC channel for HN. Though, I doubt many
would use it since IRC is way past being en vogue.

~~~
itchyjunk
I still like IRC. For example, the science-based channels on freenode.net are
pretty good: #biology, ##physics and ##math. Might not be the worst of ideas
for a mod to make one for us :s (provided they already have an IRC client
going, it wouldn't be too much added hassle).

------
rattray
So it seems there are two ways to exchange information:

1) be on the same wifi (presumably great for dissidents in countries with
heavy-handed internet control, and inconvenient for everyone else)

2) use "pubs", which can be run on any server, and connected to ¿through the
internet?

So most users would use pubs, which are described as "totally dispensable" (a
nice property). But how can users exchange information about which pub to
subscribe to? Is there a public listing of them?

It seems like the "bootstrapping server" problem (eg; reliance on
router.bittorrent.com:6881) will still exist in practice. For that matter, is
there currently an equivalent to router.bittorrent.com that would serve this
purpose?

This seems like a potentially significant project, and I'm excited by the
possibility that it might actually take off – hence the inquiry.

~~~
staltz
It depends if you want to use it like Twitter (public announcements) or like
Facebook (closed small/medium circles). If you use like Facebook, then it's
enough that one person among your circle of friends (probably the most tech-
savvy one) would host a pub and use that for their friends. You can see how
you would probably be connected to a few pubs, because you usually have
different circles of friends. If you want to use it like Twitter, then indeed
we might need a DHT, but the point there was the _resilience_ of the network.

~~~
rattray
Okay... in the FB case, though, my friend has to send me the IP/DNS address of
their pub somehow, right? eg; Signal?

What about organizing groups, which might currently use Slack? For example,
political dissidents who don't necessarily all know each other personally.
They must use some other communication channel to communicate pubs?

------
cakeface
Can I choose whose content I pass along? I am OK distributing my own feed;
that's presumably why I am joining the network. I am not OK passing along
someone else's hate speech, porn, warez, malware, spam, etc. I'd like to be
able to review the feeds available and say "Yeah, sure, I'll pass that
around." If everything in a feed is encrypted, then I'd need to decide. Also,
yeah, my brother, whose feed I follow and pass along, may upload a really
nasty bit of content, and I may relay it.

~~~
shrimp_emoji
Curating exercises of freedom of speech, eh? Sounds like decentralization
won't lead to more digital freedom after all with attitudes like this.

~~~
edraferi
Your freedom to spread filth is precisely equal to my freedom not to repeat
it.

Also, please remember that the American First Amendment limits Government
speech restrictions. Private communities and individuals can make any rules
they want about socially acceptable speech.

------
rb808
I'd like to see all my friends post updates and photos to blogs where I can
subscribe via rss. This would be the best social network for me.

~~~
cookiecaper
You can write a JavaScript component to read out the feed in the background
and transform it into an RSS feed. Current U.S. law prevents someone from
offering this commercially, especially in a SaaS package.
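The comment imagines a JavaScript component; the transformation itself is simple either way. Sketched here in Python with the stdlib, with a made-up post shape:

```python
import xml.etree.ElementTree as ET

def feed_to_rss(title: str, link: str, posts: list[dict]) -> str:
    # posts: [{"title": ..., "link": ..., "text": ...}] as scraped
    # from a friend's feed (hypothetical field names)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    for post in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "link").text = post["link"]
        ET.SubElement(item, "description").text = post["text"]
    return ET.tostring(rss, encoding="unicode")
```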

~~~
mr_spothawk
> Current U.S. law prevents someone from offering this commercially,
> especially in a SaaS package.

naturally. where can I learn more?

------
ISL
The storage requirements are tremendous, though, right?

If I want to have access to everything that's been shared with me, I have to
store it all. In the case of images, the storage burden can get large quickly.

~~~
crypt0x
Well... I've been on there for quite some time. Granted, it's not been mega
active, but here is a rundown of how much space it has taken until now: there
is the main sigchain database, which stores all the messages (follows, posts,
...) and is now 150 MB in size, and there are the blobs (binary attachments
like images), which are about 500 MB in size. YMMV depending on how many cat
pictures your friends share, ofc.

The flip side to your remark is that it's fully offline-capable, and I'm
perfectly happy with that. Also: contrast it with how much space a Thunderbird
profile takes up.

~~~
rattray
How would that change if you had, say, 5,000 friends – the fb limit, which
some people do reach – who were posting multimedia content multiple times a
day (which happens on fb)?

Is the protocol set up in such a way as to enable easy, automatic deletion of
old data from local devices, while still storing them for easy search/scroll-
based access on the Pub servers?

------
jdormit
I think I missed something. If information is exchanged when machines are on
the same network, how does the guy in New Zealand get updates from the guy in
Hawaii? Is there a server involved, or does the New Zealand guy have to wait
until he is on a network with someone who has already connected with the
Hawaii guy?

~~~
AljoschaMeyer
(Public) information is not only shared between friends, but also between
friends of friends. So as long as they have common friends connected to the
internet, the data flows without problems.

To help with this situation, the network includes so-called pubs. These are
basically bots that run 24/7 and friend people. The article mentions them very
briefly. More information here:
[https://www.scuttlebutt.nz/concepts/pub.html](https://www.scuttlebutt.nz/concepts/pub.html)

~~~
jdormit
Very helpful, thank you!

------
olleromam91
I'm not totally sure how the traffic management works, but what I would like
to know is how services like this will be able to scale? What happens when
there is a Pub with millions of users? Does it creep to a halt? Is there a
need for dedicated Pub machines? If so, Who funds/maintains them? Does this
lead to subscriptions?

Decentralized social networks seem like an inevitable progression as internet
users become more aware of their privacy and of ways they can improve online
relationships and... "social networking".

~~~
staltz
It scales by each "circle of friends" or community having its own dedicated
pub, set up by some tech-savvy person. Pubs should be easy enough to set up
for anyone who knows what a VPS is. A community or circle of friends is
usually not millions of people.

For the "social media aspect", like in Twitter, we're looking at making
alternative types of pubs. Imagine having a pub dedicated to only replicating
your content (and no one else's). Or multiple of these. So whoever wants to
follow you (if you're Elon-Musk-famous) can just follow one of your pubs.

~~~
olleromam91
I love the idea of a decentralized community. I would just be skeptical that
even 1% of the population knows what a VPS is, and fewer would have the urge
to set one up to talk to their friends. Not suggesting that you need to have a
global audience, but just something I was curious about. Thanks for your
reply.

------
j_s
Right now Mastodon might as well be off-grid, unable to add additional
accounts on the main server. Popularity has stunted its growth!

I am not sure how much thought has been given to the scalability of this
solution; it sounds like it will benefit from most of the advantages P2P
offers in this department.

Eventually something like this could organically grow into the "next
Internet", in much the same way that the current internet has morphed into
what it is today.

~~~
dublinben
But you can register an account on any node[0], and communicate with anyone
else on the entire network. That's the strength of federated protocols.

[0] [https://instances.mastodon.xyz/](https://instances.mastodon.xyz/)

~~~
j_s
My point was not what I _could_ do but rather what I _will_ do to try out some
random new social network. Having now read that migrating identities is
currently impractical I am even more certainly not going to take a chance on
some other random server or even my own!

How well has federation worked out in practice (for other federated, social
network related protocols) so far?

As far as I know, federation has only worked for ancient stuff that has
nothing to do with social networks, like email and DNS. Basically, it is a
part of core functionality and thus can't be co-opted by commercial interests
(though GMail has made quite an inroad!).

Until it has proven itself, social federation doesn't really seem like a
strength to me. It does sound good in theory! Other people with actual
experience are adding anecdotes that line up with what I'm trying to say.

~~~
algesten
> As far as I know, federation has only worked for ancient stuff that has
> nothing to do with social networks, like email and DNS. Basically, it is a
> part of core functionality and thus can't be co-opted by commercial
> interests (though GMail has made quite an inroad!).

Email only works _because_ of big players like GMail. Running your own server
spam-free and off blacklists is an endless task.

DNS is going a similar way, with more and more ISPs resorting to hijacking DNS
lookups for all sorts of nefarious reasons. The protocol seriously needs a
broadly embraced signature system to validate origin.

------
cosenal
You can make sure that the author wrote this post by copy-pasting [this
signature](https://raw.githubusercontent.com/staltz/staltz.github.io/master/signed_posts/2017-04-06-an-off-grid-social-network.md.asc)
<-- 404: Not Found. Now I am not so sure who the author is anymore...

~~~
staltz
Thanks for mentioning. Should be fixed now. I recently migrated from gh-pages
to netlify.

~~~
cosenal
I didn't know about netlify, it looks neat!

------
the_arun
Wouldn't the size of the diaries grow large (GBs and TBs over time) and make
it slow?

~~~
staltz
Yes, but not GBs. There are logs (diaries) and blobs. Images are stored in
blobs and can be garbage collected.
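
A minimal sketch of that split, with invented names: messages live in a small
append-only log and carry only a hash reference to the image bytes, so the
blob store can be garbage-collected without ever touching the log:

```python
import hashlib

log = []    # append-only feed: small messages, kept forever
blobs = {}  # content-addressed blob store, can be garbage collected

def post_image(image_bytes, caption):
    # Store the large payload in the blob store, keyed by its hash...
    blob_id = "&" + hashlib.sha256(image_bytes).hexdigest()
    blobs[blob_id] = image_bytes
    # ...and append only a small reference to the log.
    log.append({"type": "post", "text": caption, "mentions": [blob_id]})
    return blob_id

def gc_blobs(wanted_ids):
    # Blobs nobody wants anymore can be dropped; the log stays intact.
    for blob_id in list(blobs):
        if blob_id not in wanted_ids:
            del blobs[blob_id]

blob_id = post_image(b"\x89PNG...", "sunset from the boat")
gc_blobs(set())  # drop every blob
assert blob_id not in blobs and len(log) == 1  # log survives the GC
```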

------
matt_wulfeck
The post starts by introducing two people (one in a boat in the ocean and
another in the mountains in Hawaii) and states that they are communicating
with each other. I thought this post was about some new long-range wireless
protocol that synced via satellites or some such. I was disappointed to see
this:

> Every time two Scuttlebutt friends connect to the same WiFi, their computers
> will synchronize the latest messages in their diaries.

Ultimately this technology seems to be a decentralized, signed messaging
system. What problem are they solving? That facebook and twitter can delete
and alter your messages?

Meanwhile I'm in search of a long-range, wireless communication system that
can function like a network without the need of an ISP. Anyone know anything
about this?

~~~
cel
> What problem are they solving? That facebook and twitter can delete and
> alter your messages?

that using those services for one's communications places too much power in
the hands of a centralized authority. (I speak just for myself here.)

[https://www.scuttlebutt.nz/stories/design-challenge-avoid-centralization-and-singletons.html](https://www.scuttlebutt.nz/stories/design-challenge-avoid-centralization-and-singletons.html)

> Meanwhile I'm in search of a long-range, wireless communication system that
> can function like a network without the need of an ISP. Anyone know anything
> about this?

On SSB we have discussed doing gossip connections over long-range wireless
connections:
[https://viewer.scuttlebot.io/%2547H0BQQHAXvvqf8K3ngdeMtAHQdP...](https://viewer.scuttlebot.io/%2547H0BQQHAXvvqf8K3ngdeMtAHQdPYU0qH0giuL7FKFY%3D.sha256)

Of course if you are looking for a network in a more traditional sense,
something lower-level may be more appropriate.

------
good_vibes
I really like his train of thought. The future of social networking will be
very different from how it is structured today. That's a very safe bet.

------
freedaemon
> For instance, unique usernames are impossible without a centralized username
> registry.

This is Zooko's triangle, and it was squared by blockchains. Namecoin (2011),
BNS (the Blockstack Name System, 2014), and now a bunch of other fully-
decentralized naming systems can give you unique usernames. Recently, Ethereum
tried launching ENS, ran into some security issues, and will likely re-launch
soon.

~~~
freehunter
Problem is, I don't want to be assigned a username. I hate it when I get
assigned a username. I want _my_ username. If you hand me a username of
"$&OdUgr606cZ", I will never remember that, I will never share that, and I
will consequently never ever log in.

But it doesn't matter because this issue is already solved. We already have
globally unique usernames. They're called email addresses, they are unique by
their very nature, and they are (for all intents and purposes) already
decentralized.

~~~
zeveb
> But it doesn't matter because this issue is already solved. We already have
> globally unique usernames. They're called email addresses, they are unique
> by their very nature, and they are (for all intents and purposes) already
> decentralized.

No, they're not: billg@microsoft.com depends on microsoft.com, which depends
on com, which depends on the root nameservers, which are … a central
nameservice.

That's the whole point of Zooko's Triangle: of secure, decentralised and
human-readable, you can have at most two. Global-singleton approaches are
still centralised (the singleton is the centre), although they may build the
singleton in a decentralised fashion.

~~~
freehunter
I think you misunderstand what the phrase "for all intents and purposes"
means. It doesn't mean "literally, 100% true"; it means "true enough for this
argument". What network does your blockchain run on? It still relies on
Comcast to get to my house, right? Because you want it to run over the
Internet? Maybe you're using AT&T; probably L3 is in there somewhere. But
you're still relying on a centralized piece of equipment somewhere. You're
probably going to have a .com or .org to advertise it, and you might have a
Wikipedia page or a Facebook group, collaborate on development on GitHub, chat
with your team on Slack, exchange files on Dropbox, and send messages on
Gmail. And you log into all of those services with... your globally unique
email address. Possibly using a domain you own, with the mail exchange hosted
on a server you own that you set up specifically for this project.

Maybe I'm missing the point, and I would look to you to explain to me what
that is. But I guess congrats, you don't rely on ICANN anymore...

~~~
JdeBP
xyr point seems to be that your claim that e-mail addresses are decentralized
is faulty. No amount of "Well you are not decentralized in your block chain,
either." is going to rebut that. Indeed, it actually _reinforces_ the argument
that your claim was faulty, by implicitly agreeing to it with a "but neither
are you" response.

So perhaps you would like to now explain how e-mail addresses are a system
without a centre. Bear in mind that you yourself have just made the point
about ICANN being at their centre. (-:

~~~
zeveb
> xyr point

 _His_ point. I am a man.

------
yoandy
Does it normally take this long to index the database? It's been a long while
since I started the app. I thought this could be a nice tool to use in places
like Cuba, but I've now realized that once connected to a Pub it downloads
more than 1 GB, which would also be a problem in a place with little internet
bandwidth.

~~~
cel
It can take a while to index the database.

In places lacking internet bandwidth, people could run pubs in hackerspaces,
schools, offices, homes, Actual Pubs, etc. A pub in a place that people
frequent would gossip messages for the people, so they would not all need to
connect to the internet all the time. Even the pub itself doesn't have to
connect to the internet for it to be useful, as it would still help messages
spread when people connect to it. As long as someone in the network connects
to the Internet at least once in a while, people will be able to communicate
with the broader network. With this architecture we can make more efficient
cooperative use of network bandwidth.

------
XaspR8d
Basic question: since the entries form a chain and reference the previous one,
is there no way to edit or delete your old entries? (I see it "prevents
tampering", and there's something of a philosophical question here about
whether you're "tampering" with your own history when you editorialize -- I
agree with the crypto interpretation, but in the context of offline
interaction, social communication isn't burdened with such expectations of
accuracy or time-invariance.)

If so, I see that as a fairly large limitation for the common user. Even
though truly removing something from the internet is effectively impossible, I
think most non-technical folks aren't actively aware of this, and I'd at least
like the option to make it _harder_ for folks to uncover.
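
For what it's worth, the "chain" here is a hash chain: each entry embeds a
hash of the previous one, so editing an old entry invalidates every later
link. A toy sketch of that property (not the actual SSB message format):

```python
import hashlib
import json

def append(chain, content):
    # Each entry records the id (hash) of the entry before it.
    prev = chain[-1]["id"] if chain else None
    entry = {"previous": prev, "content": content}
    entry["id"] = hashlib.sha256(
        json.dumps({"previous": prev, "content": content}, sort_keys=True).encode()
    ).hexdigest()
    chain.append(entry)

def verify(chain):
    # Recompute every hash and check every back-link.
    prev = None
    for e in chain:
        body = {"previous": e["previous"], "content": e["content"]}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if e["previous"] != prev or e["id"] != digest:
            return False
        prev = e["id"]
    return True

chain = []
append(chain, "hello")
append(chain, "second entry")
assert verify(chain)
chain[0]["content"] = "edited history"  # tamper with an old entry
assert not verify(chain)                # verification now fails
```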

~~~
staltz
There is no way (as far as I know) to delete old entries, and I think this is
good because with gossip mechanics we cannot lie to ourselves: there is no way
of stopping that information from spreading.

What's possible, on the other hand, is a message type like "ignore the
previous", which client apps would interpret as hiding the earlier message;
obviously, though, a client app can be configured not to hide it.
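
As a sketch of how a client might honor such a message type (the `ignore` type
and its field names here are hypothetical, not part of the protocol):

```python
# A client-side view over an append-only feed: a later "ignore" message
# hides an earlier one, but the underlying log is never rewritten.
feed = [
    {"seq": 1, "type": "post", "text": "typo-ridden rant"},
    {"seq": 2, "type": "post", "text": "a keeper"},
    {"seq": 3, "type": "ignore", "target": 1},  # hypothetical tombstone type
]

def visible_posts(feed, honor_ignores=True):
    ignored = (
        {m["target"] for m in feed if m["type"] == "ignore"}
        if honor_ignores else set()
    )
    return [m for m in feed if m["type"] == "post" and m["seq"] not in ignored]

# A well-behaved client hides the ignored post...
assert [m["seq"] for m in visible_posts(feed)] == [2]
# ...but a client configured not to honor ignores still sees everything.
assert [m["seq"] for m in visible_posts(feed, honor_ignores=False)] == [1, 2]
```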

~~~
XaspR8d
Yes, I suppose I'm targeting the protocol a little too much when I should
think about what features clients can implement.

I'm certainly happy with the "gossip" approach; I just see it as challenging
for some people to adopt when they are coddled with the idea that they can
censor their past.

~~~
staltz
Another very important point about no-deletes is: you can't deny history. You
can't, in real life, take back something you said. The mechanics of digital
deletion in centralized systems are a big problem for _History_. I remember
Julian Assange describing a corruption scandal that was essentially erased
from History because a digital article by a large media company was deleted,
and not replicated fast enough.

~~~
JdeBP
Usenet is, once again, a lesson here. Remember cancel messages. Then remember
forged cancels. Then remember the people who decided to stop respecting cancel
messages. Then remember the discussions about signed cancel messages. And so
on. (-:

------
zeveb
I think that SPKI's name certs would be a good next step for this, so people
could associate human-readable petnames with keys.

C.f. [http://theworld.com/~cme/spki.txt](http://theworld.com/~cme/spki.txt)
and RFCs 2692 & 2693.

~~~
AljoschaMeyer
Currently, a user can simply post an `about` message in the stream. Clients
will then automatically use this name instead of the key.
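
Roughly, an `about` message assigns a petname to a key, and each client
resolves names from the messages *it* has seen, so names are subjective rather
than global. A minimal sketch of that resolution (the policy below is invented
for illustration; real clients differ):

```python
# Roughly the shape of SSB "about" messages: any key can assign a name
# to itself or to any other key.
abouts = [
    {"author": "@alice_key", "type": "about", "about": "@alice_key", "name": "alice"},
    {"author": "@mallory_key", "type": "about", "about": "@mallory_key", "name": "alice"},
    {"author": "@bob_key", "type": "about", "about": "@mallory_key", "name": "not-alice"},
]

def resolve_name(key, trusted_authors):
    # Prefer names assigned by people this client trusts...
    for msg in abouts:
        if msg["about"] == key and msg["author"] in trusted_authors:
            return msg["name"]
    # ...fall back to the key's self-assigned name...
    for msg in abouts:
        if msg["about"] == key and msg["author"] == key:
            return msg["name"]
    return key  # ...and show the raw key if no name is known.

# Bob's client and a stranger's client can disagree about the same key:
assert resolve_name("@mallory_key", {"@bob_key"}) == "not-alice"
assert resolve_name("@mallory_key", set()) == "alice"
```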

~~~
zeveb
What happens if two users post an about message with the same name?

~~~
ahdinosaur
what happens "in real life" when two people go by the same name?

Scuttlebutt works the same way: anyone can name themselves anything and anyone
can name other people anything, it's up to the client how to interpret those
messages. more on how SSB embraces subjectivity:
[https://youtu.be/P5K18XssVBg](https://youtu.be/P5K18XssVBg).

------
noffle_
We also have git over SSB!

[https://github.com/clehner/git-ssb](https://github.com/clehner/git-ssb)

[https://github.com/noffle/git-ssb-intro](https://github.com/noffle/git-ssb-intro)

~~~
staltz
This is so awesome that it deserves a separate blog post. I tried to contain
my excitement and not expand on it in this blog post, but it's really
revolutionary, and super easy to adopt.

------
rattray
Hmm, the "legacy" interface to patchwork seems much nicer to me:

[https://www.scuttlebutt.nz/applications.html#patchwork-classic](https://www.scuttlebutt.nz/applications.html#patchwork-classic)

Curious what motivated the shift.

~~~
offa
Wow, you weren't kidding, this indeed looks much nicer.

------
Andrex
This seems similar to the (now defunct) project Opera Unite[1]. Basically
Opera turned each browser into a kind of server.

Granted, Unite still used the Web, Opera accounts, and ISPs, but I believe it
could communicate locally over a router too.

[1]
[https://www.youtube.com/watch?v=5oJd9lGWbWI](https://www.youtube.com/watch?v=5oJd9lGWbWI)

------
ICRqVNmDrU8FDi
This wouldn't scale. If we replaced twitter with this right now, that's:

500 million tweets per day. 140 bytes (140 characters * 8-bit ASCII) per
tweet.

140 bytes * 500 million = 70GB

That's 70 GB per day before metadata. Use this social network for a month and
we've exceeded the 1 TB mark, twice.

Remember, this isn't just 70 GB per day on one server; this is 70 GB per day
on every user's PC.
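
The arithmetic above checks out, though a gossip client that replicates only
the feeds you follow pays far less. A quick check, with the per-user figures
below assumed purely for illustration:

```python
tweets_per_day = 500_000_000
bytes_per_tweet = 140  # 140 characters of 8-bit ASCII

# Full-network replication: the figure from the comment above.
global_daily = tweets_per_day * bytes_per_tweet
assert global_daily == 70_000_000_000   # 70 GB/day, as claimed
assert global_daily * 30 > 2 * 10**12   # past the 1 TB mark twice in a month

# But a gossip client only replicates feeds it follows (plus some
# friends-of-friends), so per-user cost scales with the social graph,
# not the whole network. Assumed figures:
follows = 2_000
avg_posts_per_day = 10
per_user_daily = follows * avg_posts_per_day * bytes_per_tweet
assert per_user_daily == 2_800_000      # ~2.8 MB/day
```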

~~~
crypt0x
How do you read 500 million tweets per day?

The idea is to only replicate the people you care about, plus a couple of
their friends.

~~~
rakoo
Except for pubs, which are helpful broadcasters of both private and public
stuff; but then it makes sense to host them on infrastructure that can ingest
70 GB a day and keep a couple of days of retention.

~~~
substack
Pubs don't follow everyone on the network. People set them up and give out
invites to their friends.

~~~
rakoo
Which means even less requirements for such "private" pubs. However if SSB is
ever to replace Twitter, I would guess there would be other, "public" pubs
that try to get all the content possible.

------
bitwize
So... it's kinna like USENET then?

~~~
vog
Not sure why this was downvoted. From a bird's eye view, the whole system
looks remarkably similar to Usenet, especially in the old times of UUCP, back
when systems were mostly offline and had relatively short timespans to
exchange information (via dial-up connection or similar).

What's different now is that we have plenty of disk space, and more than
enough computing power to perform proper cryptography.

~~~
thraway2016
Probably the same reason that "slack == irc" and anti-systemd comments get
heavily downvoted. Younger generations want badly to believe that they've
invented something completely novel and unique.

This absolutely is usenet + crypto.

~~~
Karunamon
The reason those comments get downvoted is because they are _wrong_. Anyone
writing off slack as an IRC clone exhibits a very poor understanding of either
system. Slack is similar to IRC in that they deliver text-based communication,
but that's about it. The underlying protocols, features, and so forth are
radically different.

Most anti-systemd comments are similarly poorly thought out and articulated.

This might be superficially usenet+crypto, but oftentimes things are more than
the sum of their conceptual parents.

------
ghostwreck
This is great! Exactly the type of service I would use as someone that avoids
centralized social networks.

Also wondering, can this be a replacement for Slack? Can I set up a private
group chat room? Or can I only use the private @ feature to send private
messages to multiple people?

~~~
dangerousbeans
Yeah, you can, and it works quite well, but it's limited to 5 or 7 people or
something like that. But zero metadata about who's involved in the
conversation is great; feels way more secret, shhhhh

------
macawfish
Would it be hard to port patchwork to android? I'd love something like this
for android.

~~~
crypt0x
you might want to get in contact with andre. I'm sure he could use some help..
;)

[http://viewer.scuttlebot.io/%b6nlgiAu3ZWkLqKnvkU1T/9PZCfiqSU...](http://viewer.scuttlebot.io/%b6nlgiAu3ZWkLqKnvkU1T/9PZCfiqSU/Ujg1xRmD/64=.sha256)

------
nebabyte
> Every time two Scuttlebutt friends

scuttlebuds?

------
dangerousbeans
If anyone would like an invite/pub connection, Project Entropy maintains a pub
here which can hand out invites:
[http://ssb.project-entropy.com/](http://ssb.project-entropy.com/)

------
bandrami
"Scuttlebutt was created by Dominic Tarr, a Node.js developer with more than
600 modules published on npm, who lives on a self-steering sailboat in New
Zealand."

That's roughly the level of paranoia I want now

------
camjohnson26
Is there a better guide to the service? The handbook is accurate but really
short. It would also help if it were easier to browse channels.

------
hsribei
So how do I sync two instances with nothing but a USB stick? What commands do
I use on both ends?

------
xori
A git repo, with GPG-signed ~~commits~~ message posts, and a blob store for
files. Love it.

------
hackeraccount
it seems like this is trying to solve the same problem as ipfs
[https://ipfs.io/](https://ipfs.io/)

It's pretty neat when you see two people/groups working on the same problem
independently.

~~~
dotchev
Yes, it would be interesting to compare the two. IPFS seems a more generic
approach.

------
0xdeadbeefbabe
Why do I have so many friends? Talk about not mirroring reality.

------
mattcoles
How does such a social network tackle abuse?

~~~
est
You don't follow them.

------
truedev
What a great idea. I hope it succeeds.

------
brian-armstrong
I'm really excited to see how this develops. Best of luck to the Scuttlebutt
team, what a great idea :)

------
danjoc
This sounds more like a mesh communication tool, or a git repo, than a social
network to me. "Social network" sites like Facebook aren't so utilitarian.
Facebook was the place where people post life's highlights. (I say was, I
haven't been there in years. I don't know what it is like today.) HN is also
like this, except it is a stream of project highlights.

------
f4rker
"off the grid"

and

"social network"

pick one.

~~~
abraae
Depends what you mean by "off the grid".

I've been working on something that has some similarities. It's a mesh network
of pest traps in the New Zealand bush. Battery life is very important, so each
node sleeps most of the time, then periodically wakes up and communicates with
its neighbours. It's made more complex by the devices not having real-time
clocks.

Once every node in the network is powered down most of the time, I don't think
you can consider it a grid.


