UUCP [https://en.wikipedia.org/wiki/UUCP] used the computers' modems to dial out to other computers, establishing temporary, point-to-point links between them. Each system in a UUCP network has a list of neighbor systems, with phone numbers, login names and passwords, etc.
FidoNet [https://en.wikipedia.org/wiki/FidoNet] was a very popular alternative to the internet in Russia as late as the 1990s. It used temporary modem connections to exchange private (email) and public (forum) messages between the BBSes in the network.
In Russia, there was a somewhat eccentric, very outspoken enthusiast of upgrading FidoNet to use web protocols and capabilities. Apparently, he's still active in developing "Fido 2.0": https://github.com/Mithgol
Usenet back then was spam free and you could usually end up talking to the creators of whatever you're discussing. I rather miss it.
Quite a few tech companies used private newsgroups for support, so you'd dial into those separately. As they were often techie to techie they worked rather well.
I first came across Usenet and uucp via the Amiga Developer programme. Amicron and uucp overnight all seemed a bit magic back in '87 compared to dialing into non-networked BBSes to browse, very, very slowly!
I still use usenet! It's not quite what it used to be, but you should check it out.
you need the www
For those of us raised in the Soviet Union, it was an eye-opening experience that you could freely exchange messages with people around the globe.
These things are definitely systems to learn from, both their architectures and their histories; and people have already been drawing parallels to Usenet on this very page, notice.
Your email address might be george@cmu!vax!something!mitre!foo
Which meant: route the email through foo -> mitre -> something -> vax -> cmu.
Sysadmins would often keep tables of known routes, while people would describe their own route starting from commonly known routing hosts.
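The hop-by-hop forwarding described above can be sketched as a tiny function. This is an illustration of the bang-path idea, not actual UUCP code, and the host names are just examples:

```python
# Hypothetical sketch of UUCP bang-path forwarding: each system peels
# off the first hop (a neighbor it knows how to dial) and hands the
# remainder of the path to that neighbor.

def next_hop(bang_path: str):
    """Split a bang path into (neighbor to dial, rest of the route)."""
    hops = bang_path.split("!")
    if len(hops) < 2:
        raise ValueError("already at the final destination")
    return hops[0], "!".join(hops[1:])

# A message addressed vax!something!mitre!foo!george is first handed
# to neighbor 'vax', which strips its own name and forwards the rest.
hop, rest = next_hop("vax!something!mitre!foo!george")
# hop == "vax", rest == "something!mitre!foo!george"
```

Each system only needs to know its direct neighbors, which is why sysadmins kept those route tables.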
We were so living in the cyberpunk future then.
And we liked it!
(Just guessing, but I wouldn't be surprised if it happened to be true).
I tried to switch what server my account was on halfway through my GNU Social life, and you just can't; all your followers are on the old server, all your tweets, and there is no way to say "I'm still the same person". I didn't realise I wanted cryptographic identity and accounts until I tried to actually use the alternative.
That's also part of the interest I have in something like Urbit, which has an identity system centered on public keys forming a web of trust, which also lets you have a reputation system and ban spammers which you can't do easily with a pure DHT.
The challenges that I see:
- Making it easy for any user to get up and running.
- De-authenticating old devices.
- Making it available from any mobile device.
Ideally, the end solution would be dead simple. Download the Windows app, run it, put your credit card in if you need a URL registered, and it does everything, including daily backups to a folder on your disk.
This is not a damn cloud! This is a remote computer you can access over the internet. Can we stop with this use of marketing lingo, please?
"Host" is generally less ambiguous, referring to a specific thing given the context of the discussion. It's a pronoun for machines (kinda).
The purpose of the original coinage of the word "cloud" was to obfuscate that you really meant "someone else's computer". It gives a nice warm, fuzzy decentralised impression - clouds are natural and ubiquitous! No one owns them! If it's in "the cloud" (note the definite article) then it's safe in the very fabric of the network, right?
Nope. It's in Larry and Sergey's basement. Not decentralised at all. Just somewhere else.
The proper term is "server", "datacenter", or "network", depending on what you're actually trying not to say.
What percentage of users do you think would be affected by such cases? If it's something over 0.001%, it's a huge problem for a social network.
Sites like Coinbase and Github exist because they re-centralize distributed systems — users don't trust themselves to host their own data securely.
Alternately, if this isn't a problem, why don't users simply host all their own infrastructure for existing tech problems today?
I'm sure someone capable of living in the Mojave Desert is capable of hosting their own infrastructure - is this network simply for those people, or is it also for journalists, trans people, and HR professionals?
Very few people have considered whether or not they should attempt to back up their Facebook account. Same's true for Flickr, Twitter, and Gmail.
I'm probably in the minority being so irresponsible with my own backups, but I'm not alone.
Google and Facebook have a lot on the line with regard to user trust of their reliability. Also, they can't monetize data that they've lost.
I'd rather keep my own backup copies and take responsibility myself. The scenario you evoke here would not happen if you had a proper backup strategy: two is one and one is none.
Compare that to their experience at home with personal gear. Many like the convenience and reliability of Facebook over their own technical skills or efforts. You'd have to convince those people... a shitload of people... that they should start handling IT on their own. Also note that there's many good, smart, interesting, and so on people that simply don't do tech. Anyone filtering non-technical or procrastinating people in a service will be throwing out lots of folks whose company they might otherwise enjoy.
So, these kinds of issues are worth exploring when trying to build a better social network.
Same with backups, lose your data to drive failure or theft once and suddenly having a backup strategy becomes a priority.
But as long as they have not been bitten once they don't care enough to actually do something proactive.
It seems like the system would work just as well for people who decide to turn their system off when they go to work, or are on a sailboat. Of course it's not convenient in the same way that always-on social networks are, but that seems to be specifically not the point of SSB.
None of these things are a concern on traditional social networks. They have to be solved before the world has any chance of moving to a decentralized network.
I say no. Most of them don't have to be solved first.
Feel free to convince me that 3 days of downtime on my personal messaging account is not a problem.
These are people who typed "Facebook Login" into a Google search, clicked the first result without reading, and got confused. Now tell these same users that Comcast blocked their social network or that they can't log in on their phone because their home Internet connection is down.
If you want a social network filled with just people like you and me, look at App.net or GNU Social for inspiration. If you want average users to sign in, these issues absolutely do have to be solved.
The early web (when I entered, and before) wasn't for everyone. And that's OK with me. Actually, I think it is a good way to start.
It got where it went by doing the opposite of what you're suggesting. The walled-garden smart elites were mostly working on OSI, from what old-timers tell me. TCP/IP, SMTP, etc. involved lots of hackers trying to avoid doing too much work, much like the users you prefer to filter out. Then it just went from there, getting bigger and bigger due to the low barrier to entry. Tons of economic benefits and business models followed. Now we're talking to each other on it.
If your internet connection goes down, your power goes out, or you get DDoS'd, that would hinder your ability to use any third-party online service anyway.
The data caps and restrictive ISP terms of service are a different problem, one that would be challenged and fixed if internet subscribers went the p2p self-hosting way. The commercial ISP situation is a terrible mess right now.
If you got hacked: unplug from the network, boot from recovery, restore from backup, and you're back online in less time than it takes to recover a hacked Facebook account.
You say decentralized but it seems to me you meant distributed here.
One data center * 24 backup generators * 3 MW each = 72 MW per data center. Four of those = 288 MW > 1/4 GW.
Google has more than three data centers.
Assuming that they're doubling energy consumption every year, they'd have reached 8GW in 2016. That's 8W per user if we assume 1 billion users. The energy usage of a Raspberry Pi is not insignificant relative to even this.
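The arithmetic behind these estimates can be checked quickly. Every input below is an assumption from the thread (generator counts, doubling rate, user count), not a measured figure, and the five-year doubling window is a guess at what the poster intended:

```python
# Back-of-the-envelope check of the thread's figures; all inputs are
# the thread's assumptions, not real measurements.
per_dc_mw = 24 * 3            # 24 backup generators at 3 MW each = 72 MW
four_dcs_mw = 4 * per_dc_mw   # 288 MW, just over 1/4 GW (250 MW)

power_mw = four_dcs_mw
for _ in range(5):            # doubling yearly for five years
    power_mw *= 2             # 9216 MW, i.e. the ~8 GW order of magnitude

watts_per_user = power_mw * 1_000_000 / 1_000_000_000  # ~9 W at 1e9 users
```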
Doing things at scale is vastly more efficient. And only a subset of Google services could run on a Raspberry Pi. Even if you host your own mail, are you ready to ditch the Google search index and YouTube?
And you just use it for yourself? well, ok... but at that point you could also use a fully decent system.
Sounds interesting. How can you ban spammers when they can just create a new public key/identity if their old one is banned? And also, what does "banning" comprise, in a decentralized social network? I would assume it would be sufficient to just "unfollow" that particular identity.
I'd rather be able to email everybody and have it be annoying to switch than be able to only email people on my chosen provider (and then have to make an account on every service anyway).
This is a common misunderstanding. You do not need to use those nodes to bootstrap. Most clients simply choose to because it is the most convenient way to do so on the given substrate (the internet). DHTs are in no way limited to specific bootstrap nodes, any node that can be contacted can be used to join the network, the protocol itself is truly distributed.
If the underlying network provides some hop-limited multicast or anycast a DHT could easily bootstrap via such queries. In fact, bittorrent clients already implement multicast neighbor discovery which under some circumstances can result in joining the DHT without any hardcoded bootstrap node.
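The multicast discovery mentioned here works roughly like BitTorrent's Local Service Discovery (BEP 14): clients announce an infohash on a well-known multicast group and nearby peers answer. A minimal sketch, using the BEP 14 IPv4 group/port and a placeholder infohash:

```python
# Sketch of a BEP 14-style local multicast announce. The group and
# port follow BEP 14; the infohash passed in is a placeholder.
import socket

LSD_GROUP, LSD_PORT = "239.192.152.143", 6771  # BEP 14 IPv4 multicast group

def lsd_message(infohash: str, listen_port: int) -> bytes:
    # HTTP-like announce message per BEP 14.
    return (
        "BT-SEARCH * HTTP/1.1\r\n"
        f"Host: {LSD_GROUP}:{LSD_PORT}\r\n"
        f"Port: {listen_port}\r\n"
        f"Infohash: {infohash}\r\n"
        "\r\n"
    ).encode("ascii")

def lsd_announce(infohash: str, listen_port: int) -> None:
    # Fire-and-forget: peers on the same LAN listening on the group
    # can reply directly, which may bootstrap the DHT without any
    # hardcoded node.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        s.sendto(lsd_message(infohash, listen_port), (LSD_GROUP, LSD_PORT))
```

Any peer that answers is a live contact, and a single live contact is enough to join the global DHT.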
The multicast neighbor discovery is a neat idea. I wonder what percentage of clients/connections it results in successful bootstrapping for.
You could also run your own bootstrap node on an always-up server if downtimes making the lists stale is a concern.
You can also inject contacts when starting the client, you would have to obtain them out-of-band from somewhere of course, but it still does not require anything centralized.
If you're desperate you could also just sweep allocated IPv4 blocks and DHT-ping port 6881, you'll probably find one relatively fast. Of course that doesn't work with v6.
So there is no centralization and no single point of failure.
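The brute-force sweep idea above can be sketched with a KRPC "ping" query (the bencoded message format from BEP 5). Sweeping strangers' address space is impolite at scale; this is purely illustrative:

```python
# Sketch: probe random IPv4 addresses on UDP 6881 with a BEP 5 KRPC
# "ping" until one answers like a DHT node. Illustrative only.
import os
import random
import socket

def krpc_ping(node_id: bytes, txn: bytes = b"aa") -> bytes:
    # Hand-rolled bencoding of {"a": {"id": node_id}, "q": "ping",
    # "t": txn, "y": "q"} -- bencoded dict keys must be sorted.
    assert len(node_id) == 20
    return (b"d1:ad2:id20:" + node_id + b"e1:q4:ping1:t" +
            str(len(txn)).encode() + b":" + txn + b"1:y1:qe")

def sweep_once(timeout: float = 0.2):
    # One probe of a random address; returns the address on any reply.
    addr = socket.inet_ntoa(random.randbytes(4))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(krpc_ping(os.urandom(20)), (addr, 6881))
        try:
            s.recvfrom(1500)
            return addr      # got *something* back; likely a DHT node
        except socket.timeout:
            return None
```

With millions of public DHT nodes spread across the IPv4 space, a few thousand probes would plausibly find one; as noted, this approach is hopeless in the vastly larger IPv6 space.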
> The multicast neighbor discovery is a neat idea. I wonder what percentage of clients/connections it results in successful bootstrapping for.
It could work on a college campus, some conference network or occasionally some open wifi. Additionally there are some corporate bittorrent deployments where peer discovery via multicast can make sense.
If I understand TFA correctly scuttlebutt assumes(?) roaming through wifis and LANs. Those circumstances are ideal for multicast bootstrapping, so in principle the DHT can perform just as well as scuttlebutt, probably even better because once it has bootstrapped it can use the global DHT to keep contact with the network even if there is no lan-local peer to be discovered.
There is no semantic difference between the two. The only difference is when you connect to the single-point-of-truth bootstrap, at download time (well, technically build-time) or at first startup time. And the latter probably gives you a more current, and not limited to long-lived nodes, thus better, answer.
> You could also run your own bootstrap node on an always-up server if downtimes making the lists stale is a concern.
Which itself needs to be bootstrapped. And once it is, it's equivalent to your local cache.
Possibly, which mechanisms are used varies from client to client. Usually DHT bootstrap is not a primary goal but a side-effect of other mechanisms. Things that work in some clients:
magnet -> tracker -> peer -> dht ping
torrent -> tracker -> peer -> dht ping
magnet -> contains direct peer -> peer -> dht ping
torrent or magnet -> multicast discovery -> peer -> dht ping
torrent -> contains a list of dht node ip/port pairs
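The "magnet -> contains direct peer" path above refers to the `x.pe=host:port` parameter (BEP 9), which embeds peer addresses directly in the magnet URI. A small sketch of extracting them; the URI below is a made-up example:

```python
# Sketch: pull direct peer addresses (BEP 9 "x.pe" parameters) out of
# a magnet URI. The example URI and addresses are fabricated.
from urllib.parse import parse_qs, urlparse

def direct_peers(magnet: str):
    qs = parse_qs(urlparse(magnet).query)
    peers = []
    for entry in qs.get("x.pe", []):
        host, _, port = entry.rpartition(":")
        peers.append((host, int(port)))
    return peers

uri = ("magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567"
       "&x.pe=198.51.100.7:6881")
# direct_peers(uri) -> [("198.51.100.7", 6881)]
```

Each such peer can then be DHT-pinged, completing the bootstrap chain without any central node.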
I believe this is how bitcoin works. Or at least it used to.
are there any noteworthy resources for non-academics to get started?
But a DHT is usually just a low-level building block in more complex p2p systems. As its name says it's simply a distributed hash table. A data structure on a network. It just gives you a distributed key-value pair store where the values are often required to be small. In itself it doesn't give you trust, two-way communication, discovery or anything like that. Those are often either tacked on as ad-hoc features, handled by separate protocols or require some tricky cryptography.
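The "data structure on a network" view can be made concrete with Kademlia's XOR metric: nodes and keys live in one ID space, and a key's value is stored on the nodes closest to it. A minimal sketch with no networking, trust, or replication, using illustrative node names:

```python
# Minimal sketch of the DHT placement rule under Kademlia's XOR
# metric: a key's value lives on the k nodes whose IDs are
# XOR-closest to the key's hash. Node names are illustrative.
import hashlib

def node_id(name: str) -> int:
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

def closest_nodes(key: str, nodes, k: int = 2):
    target = node_id(key)
    return sorted(nodes, key=lambda n: node_id(n) ^ target)[:k]

nodes = ["node-a", "node-b", "node-c", "node-d"]
# The k XOR-closest nodes are responsible for storing this key's value.
replicas = closest_nodes("some-key", nodes)
```

Everything beyond this placement rule (discovery, trust, two-way channels) is, as the comment says, layered on top.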
The last two are particularly devastating. Even if the peers had a key/value whitelist and hashes (e.g. like a .torrent file), an adversary can still insert itself into the routing tables of honest nodes and prevent peers from ever discovering your key/value pairs. Moreover, they can easily spy on everyone who tries to access them. It is estimated that 300,000 of the BitTorrent DHT's nodes are Sybils, for example.
BEP42 has been implemented by many clients, and yet nobody has felt the need to actually switch to enforcement mode.
All that is the result of the bittorrent DHT being a low-value target. It does not contain any juicy information and is just one of multiple peer discovery mechanisms, so there's some redundancy too.
If I'm "in" on the sharing, then I learn the IP addresses (and ISPs and proximate locations) of the other people downloading the shared file. Moreover, if I control the right hash buckets in the DHT's key space, I can learn from routing queries who's looking for the content (even if they haven't begun to share it yet). Encryption alone does not make file-sharing a private affair.
> BEP42 has been implemented by many clients, and yet nobody has felt the need to actually switch to enforcement mode.
It also does not appear to solve the problem. The attacker only needs to get control of hash buckets to launch routing attacks. Even with a small number of unchanging node IDs, the attacker is still free to insert a pathological sequence of key/value pairs to bump hash buckets from other nodes to them.
> All that is the result of the bittorrent DHT being a low-value target. It does not contain any juicy information and is just one of multiple peer discovery mechanisms, so there's some redundancy too.
Are you suggesting that high-value apps should not rely on a DHT, then?
Someone who is "in" on encrypted content can observe the swarm anyway, and thus gains very little from snooping on a DHT. On the other hand, a passive DHT observer who is not "in" will be hampered by not knowing what content is shared; he only sees participation in opaque hashes. Additionally, payload encryption adds deniability, because anyone can transfer the ciphertext but participants won't know whether others have the necessary keys to decrypt it.
What I'm saying is that any information leakage via the DHT (compared to public trackers and PEX) is quite small, and this small loss can be more than made up by adding payload encryption.
> the attacker is still free to insert a pathological sequence of key/value pairs to bump hash buckets from other nodes to them.
There is no bumping in kademlia with unbounded node storage. And clients with limited storage can make bumping very hard for others with oldest-first and one-per-subnet policies, i.e. bumping the attackers instead of genuine keys.
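The anti-bumping policies named here can be sketched as a routing bucket that never evicts live entries (oldest-first) and admits at most one node per /24, so an attacker flooding from one subnet evicts itself rather than genuine nodes. A simplified sketch; real Kademlia buckets also track liveness:

```python
# Sketch of the eviction policies mentioned: oldest-first retention
# plus a one-node-per-/24 rule. Simplified -- real implementations
# also ping entries and only evict confirmed-dead nodes.
import ipaddress

class Bucket:
    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self.entries = []          # kept oldest-first

    def _subnet(self, ip: str):
        return ipaddress.ip_network(ip + "/24", strict=False)

    def offer(self, ip: str) -> bool:
        # One node per /24: a flood from one subnet gets one slot.
        if any(self._subnet(e) == self._subnet(ip) for e in self.entries):
            return False
        # Oldest-first: a full bucket never bumps existing live nodes.
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(ip)
        return True
```

Under these rules an attacker cannot displace long-lived honest entries at all; at best it occupies the remaining free slots, one per subnet it controls.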
> Are you suggesting that high-value apps should not rely on a DHT, then?
No, they should use DHT as a bootstrap mechanism of easy-to-replicate, difficult-to-disrupt small bits of information (e.g. peer contacts as in bittorrent) which then run their own content-specific gossip network for the critical content. In some contexts it can also make sense to make reverse lookups difficult, so attackers won't know what to disrupt unless they're already part of some group.
I can see that this thread is getting specific to Bittorrent, and away from DHTs in general. Regardless, I'm not sure if this is the case. Please correct me if I'm wrong:
* If I can watch requests on even a single copy of a single key/value pair in the DHT, I can learn some of the IP addresses asking for it (and when they ask for it).
* If I can watch requests on all copies of the key/value pair, then I can learn all the interested IP addresses and the times when they ask.
* If I can do this for the key/value pairs that make up a .torrent file, then I can (1) get the entire .torrent file and learn the list of file hashes, and (2) find out the IPs who are interested in the .torrent file.
* If I can then observe any of the key/value pairs for the .torrent file hashes, then I can learn which IPs are interested in and can serve the encrypted data (and the times at which they do so).
This does not strike me as "quite small," but that's semantics.
> There is no bumping in kademlia with unbounded node storage. And clients with limited storage can make bumping very hard for others with oldest-first and one-per-subnet policies, i.e. bumping the attackers instead of genuine keys.
Yes, the DHT nodes can employ heuristics to try to stop this, just like how BEP42 is a heuristic to thwart Sybils. But that's not the same as solving the problem. Applications that need to be reliable have to be aware of these limits, and anticipate them in their design.
> No, they should use DHT as a bootstrap mechanism of easy-to-replicate, difficult-to-disrupt small bits of information (e.g. peer contacts as in bittorrent) which then run their own content-specific gossip network for the critical content. In some contexts it can also make sense to make reverse lookups difficult, so attackers won't know what to disrupt unless they're already part of some group.
This kind of proves my point. You're recommending that applications not rely on DHTs, but instead use their own content-specific gossip network.
To be fair, I'm perfectly okay with using DHTs as one of a family of solutions for addressing one-off or non-critical storage problems (like bootstrapping). But the point I'm trying to make is that they're not good for much else, and developers need to be aware of these limits if they want to use a DHT for anything.
It is quite small because bittorrent needs to use some peer source. If you're not using the DHT you're using a tracker. The same information that can be obtained from the DHT can be obtained from trackers. So there's no novel information leakage introduced by the DHT.
That's why the DHT does not really pose a big information leak.
> This kind of proves my point. You're recommending that applications not rely on DHTs, but instead use their own content-specific gossip network.
That's not what I said. Relying on a DHT for some parts, such as bootstrap and discovery is still... well... relying on it, for things it is good at.
> But the point I'm trying to make is that they're not good for much else, and developers need to be aware of these limits if they want to use a DHT for anything.
Well yes, but these limits arise naturally anyway since A stores data for B on C and you can't really incentivize C to manage anything more than small bits of data.
> I can see that this thread is getting specific to Bittorrent
About DHTs in general: you can easily make reverse lookups difficult or impossible by hashing the keys (bittorrent doesn't because the inputs already are hashes), you can obfuscate lookups by making them somewhat off-target until they're close to the target, and you can make data lookups and maintenance lookups indistinguishable. You can further add plausible deniability by replaying recently-seen lookups when doing maintenance of nearby buckets.
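The "hash the keys" hardening amounts to storing values under a digest of the original identifier, so a node holding the pair (or watching lookups for it) cannot invert the DHT key back to the content identifier. A tiny sketch, with SHA-1 as a stand-in for whatever hash a given network uses:

```python
# Sketch of reverse-lookup resistance via key hashing: the DHT only
# ever sees sha1(content_id), so knowing the stored key reveals
# nothing about content_id unless you already knew it. SHA-1 here is
# a stand-in; any preimage-resistant hash works.
import hashlib

def dht_key(content_id: bytes) -> bytes:
    return hashlib.sha1(content_id).digest()

# A reader who already knows content_id recomputes the same DHT key;
# a passive observer sees only an opaque 20-byte digest.
assert dht_key(b"swarm-xyz") == dht_key(b"swarm-xyz")
```

This only leaks information to an observer who could have derived the key (and thus the identifier) on their own, which is the point made in (b) below.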
Replacing a tracker with a DHT trades having one server with all peer and chunk knowledge with N servers with partial peer and chunk knowledge. If the goal is to stop unwanted eavesdroppers, then the choice is between (1) trusting that a single server that knows everything will not divulge information, or (2) trusting that an unknown, dynamic number of servers that anyone can run (including the unwanted eavesdroppers) will not divulge partial information.
The paper I linked up the thread indicates that unwanted eavesdroppers can learn a lot about the peers with choice (2) by exploiting the ways DHTs operate. Heuristics can slow this down, but not stop it. With choice (1), it is possible to fully stop unwanted eavesdroppers if peers can trust the tracker and communicate with it confidentially. There is no such possibility with choice (2) if the eavesdropper can run DHT nodes.
> That's not what I said. Relying on a DHT for some parts, such as bootstrap and discovery is still... well... relying on it, for things it is good at.
> Well yes, but these limits arise naturally anyway since A stores data for B on C and you can't really incentivize C to manage anything more than small bits of data.
Thank you for clarifying. Would you agree that reliable bootstrapping and reliable steady-state behavior are two separate concerns in the application? I'm mainly concerned with the latter; I would never make an application's steady-state behavior dependent on a DHT's ability to keep data available. In addition, bootstrapping information like initial peers and network settings can be obtained through other channels (e.g. DNS servers, user-given configuration, multicasting), which further decreases the need to rely on DHTs.
> About DHTs in general: you can easily make reverse lookups difficult or impossible by hashing the keys (bittorrent doesn't because the inputs already are hashes), you can obfuscate lookups by making them somewhat off-target until they're close to the target, and you can make data lookups and maintenance lookups indistinguishable. You can further add plausible deniability by replaying recently-seen lookups when doing maintenance of nearby buckets.
I'm not quite sure what you're saying here, but it sounds like you're saying that a peer can obfuscate lookups by adding "noise" (e.g. doing additional, unnecessary lookups). If so, then my reply would be that this only increases the number of samples an eavesdropper needs to make to unmask a peer. To truly stop an eavesdropper, a peer needs to ensure that queries are uniformly distributed in both space and time. This would significantly slow down the peer's queries and consume a lot of network bandwidth, but it would stop the eavesdropper. I don't know of any production system that does this.
In practice trackers do divulge all the same information that can be gleaned from the DHT and so does PEX in a bittorrent swarm. Those are far more convenient to harvest.
> I'm not quite sure what you're saying here, but it sounds like you're saying that a peer can obfuscate lookups by adding "noise" (e.g. doing additional, unnecessary lookups).
That's only 2 of the 4 measures I listed. And I would mention encryption again as a 5th. The others: a) opportunistically creating decoys by having others repeat lookups they have recently seen as part of their routing-table maintenance; b) storing data in the DHT in a way that requires some prior knowledge to be useful, which ideally means information only leaks to a listener who could have obtained it anyway.
There's a lot you can do to harden DHTs. I agree that naive implementations are trivial to attack, but to my knowledge it is possible to achieve byzantine fault tolerance in a DHT in principle, it's just that nobody has actually needed that level of defense yet, attacks in the wild tend to be fairly primitive and only succeed because some implementations are very sloppy about sanitizing things.
> To truly stop an eavesdropper, a peer needs to ensure that queries are uniformly distributed in both space and time.
Not quite. You only need to increase the number of samples needed beyond the number of samples a peer is likely to generate during some lifecycle, and that is not just done by adding more traffic.
> Would you agree that reliable bootstrapping and reliable stead-state behavior are two separate concerns in the application?
Certainly, but bootstrapping is a task that you do more frequently than you think. You don't just join a global overlay once; you also (re)join many sub-networks throughout each session, or look for specific nodes. DHT is a bit like DNS. You only need it once a day for a domain (assuming long TTLs), and it's not exactly the most secure protocol, and afterwards you do the heavy authentication lifting with TLS, but DNS is still important, even if you're not spending lots of traffic on it.
I'm confused. I can configure a tracker to only communicate with trusted peers, and do so over a confidential channel. The tracker is assumed to not leak peer information to external parties. A DHT can do neither of these.
> That's only 2 of the 4 measures I listed. And I would mention encryption again as a 5th. The others: a) opportunistically creating decoys by having others repeat lookups they have recently seen as part of their routing-table maintenance; b) storing data in the DHT in a way that requires some prior knowledge to be useful, which ideally means information only leaks to a listener who could have obtained it anyway.
Unless the externally-observed schedule of key/value requests is statistically random in time and space, the eavesdropper can learn with better-than-random guessing which peers ask for which chunks. Neither (a) nor (b) address this; they simply increase the number of samples required.
> There's a lot you can do to harden DHTs. I agree that naive implementations are trivial to attack, but to my knowledge it is possible to achieve byzantine fault tolerance in a DHT in principle, it's just that nobody has actually needed that level of defense yet, attacks in the wild tend to be fairly primitive and only succeed because some implementations are very sloppy about sanitizing things.
First, no system can tolerate Byzantine faults if over a third of its nodes are hostile. If I can Sybil a DHT, then I can spin up arbitrarily many evil nodes. Are we assuming that no more than one third of the DHT's nodes are evil?
Second, "nobody has actually needed that level of defense yet" does not mean that it is a sound decision for an application to use a DHT with the expectation that the problems will never occur. So the maxim goes, "it isn't a problem, until it is." As an application developer, I want to be prepared for what happens when it is a problem, especially since the problems are known to exist and feasible to exacerbate.
> Not quite. You only need to increase the number of samples needed beyond the number of samples a peer is likely to generate during some lifecycle, and that is not just done by adding more traffic.
I'm assuming that peers are arbitrarily long-lived. Real-world distributed systems like BitTorrent and Bitcoin aspire to this.
> Certainly, but bootstrapping is a task that you do more frequently than you think. You don't just join a global overlay once; you also (re)join many sub-networks throughout each session, or look for specific nodes. DHT is a bit like DNS. You only need it once a day for a domain (assuming long TTLs), and it's not exactly the most secure protocol, and afterwards you do the heavy authentication lifting with TLS, but DNS is still important, even if you're not spending lots of traffic on it.
I take issue with saying that "DHTs are like DNS", because they offer fundamentally different data consistency guarantees and availability guarantees (even Beehive (DNS over DHTs) is vulnerable to DHT attacks that do not affect DNS).
Regardless, I'm okay with using a DHT as one of many supported bootstrapping mechanisms. I'm not okay with using it as the sole mechanism or even the primary mechanism, since they're so easy to break when compared to other mechanisms.
But then you are running a private tracker for personal/closed group use and have a trust source. If you have a trust source you could also run a closed DHT. But the bittorrent DHT is public infrastructure and best compared to public trackers.
> I'm assuming that peers are arbitrarily long-lived. Real-world distributed systems like BitTorrent and Bitcoin aspire to this.
Physical machines are. Their identities (node IDs, IP addresses) and the content they participate in at any given time don't need to be.
> If I can Sybil a DHT, then I can spin up arbitrarily many evil nodes.
This can be made costly. In the extreme case you could require a bitcoin-like proof of work system for node identities. But that would be wasteful... unless you're running some coin network anyway, then you can tie your ID generation to that. In lower-value targets IP prefixes tend to be costly enough to thwart attackers. If an attacker can muster the resources to beat that he would also have enough unique machines at his disposal to perform a DoS on more centralized things.
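The proof-of-work idea for node identities can be sketched directly: an ID is only accepted together with a nonce whose hash clears a difficulty target, making Sybil identities costly to mint. The difficulty below is toy-sized for illustration; real deployments would demand far more work:

```python
# Sketch of proof-of-work node identities: minting an ID requires
# finding a nonce whose SHA-256 has DIFFICULTY_BITS leading zero
# bits; verification is a single hash. Difficulty is toy-sized.
import hashlib
import itertools
import os

DIFFICULTY_BITS = 16  # ~65k hashes on average; illustrative only

def _clears_target(node_id: bytes, nonce: int) -> bool:
    h = hashlib.sha256(node_id + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(h[:4], "big") >> (32 - DIFFICULTY_BITS) == 0

def mint_id():
    # Expensive for the minter: grind nonces until the target clears.
    node_id = os.urandom(20)
    for nonce in itertools.count():
        if _clears_target(node_id, nonce):
            return node_id, nonce

def valid_id(node_id: bytes, nonce: int) -> bool:
    # Cheap for everyone else: one hash to verify.
    return _clears_target(node_id, nonce)
```

The asymmetry (expensive to mint, cheap to verify) is what makes mass Sybil generation costly while leaving honest nodes unaffected.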
> Are we assuming that no more than one third of the DHT's nodes are evil?
Assuming is the wrong word. I think approaching BFT is simply part of what you do to harden a DHT against attackers.
> Second, "nobody has actually needed that level of defense yet" does not mean that it is a sound decision for an application to use a DHT with the expectation that the problems will never occur.
I haven't said that. I'm saying that simply because this kind of defense was not yet needed, nobody tried to build it; as simple as that. Sophisticated security comes with implementation complexity; that's why we had HTTP for ages before HTTPS adoption was spurred by the Snowden leaks.
> Neither (a) nor (b) address this; they simply increase the number of samples required.
(b) is orthogonal to sampling vs. noise.
> I'm not okay with using it as the sole mechanism or even the primary mechanism, since they're so easy to break when compared to other mechanisms.
What other mechanisms do you have in mind? Most that I am aware of don't offer the same O(log n) node-state and lookup complexity in a distributed manner.
You're ignoring the fact that with a public DHT, the eavesdropper has the power to reroute requests through networks (s)he can already watch. With a public tracker, the eavesdropper needs vantage points in the tracker's network to gain the same insights.
If we're going to do an apples-to-apples comparison between a public tracker and a public DHT, then I'd argue that they are equivalent only if:
(1) the eavesdropper cannot add or remove nodes in the DHT;
(2) the eavesdropper cannot influence other nodes' routing tables in a non-random way.
> This can be made costly. In the extreme case you could require a bitcoin-like proof of work system for node identities. But that would be wasteful... unless you're running some coin network anyway, then you can tie your ID generation to that. In lower-value targets IP prefixes tend to be costly enough to thwart attackers. If an attacker can muster the resources to beat that he would also have enough unique machines at his disposal to perform a DoS on more centralized things.
Funny you should mention this. At the company I work part-time for (blockstack.org), we thought of doing this very thing back when the system still used a DHT for storing routing information.
We had the additional advantage of having a content whitelist: each DHT key was the hash of its value, and each key was written to the blockchain. Blockstack ensured that each node calculated the same whitelist. This meant that inserting a key/value pair required a transaction, and the number of key/value pairs could grow no faster than the blockchain.
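Self-certifying keys of this kind are easy to check locally: any node can reject a mismatched pair before ever consulting the blockchain-derived whitelist. A sketch, with SHA-256 as a stand-in for whatever hash the system actually uses:

```python
# Sketch of a self-certifying content whitelist: a pair is accepted
# only if key == hash(value) AND the key appears in the whitelist
# every node derives from the blockchain. SHA-256 is a stand-in.
import hashlib

def accept(key: bytes, value: bytes, whitelist) -> bool:
    return hashlib.sha256(value).digest() == key and key in whitelist

value = b"routing-info"
key = hashlib.sha256(value).digest()
assert accept(key, value, {key})          # genuine pair passes
assert not accept(key, b"tampered", {key})  # forged value rejected
```

Note that, as the following paragraph explains, this kind of integrity check constrains what can be stored but does nothing for availability: an attacker can still censor routes without ever forging a value.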
This was not enough to address data availability problems. First, the attacker would still have the power to push hash buckets onto attacker-controlled nodes (it would just be expensive). Second, the attacker could still join the DHT and censor individual routes by inserting itself as neighbors of the target key/value pair replicas.
The best solution we came up with was one whereby DHT node IDs would be derived from block headers (i.e. deterministic but unpredictable), and registering a new DHT node would require an expensive transaction with an ongoing proof-of-burn to keep it. In addition, our solution would have required that every K blocks, the DHT nodes would deterministically re-shuffle their hash buckets among themselves in order to throw off any encroaching routing attacks.
We ultimately did not do this, however, because having the set of whitelisted keys growing at a fixed rate afforded a much more reliable solution: have each node host a 100% replica of the routing information, and have nodes arrange themselves into a K-regular graph where each node selects neighbors via a random walk and replicates missing routing information in rarest-first order. We have published details on this here: https://blog.blockstack.org/blockstack-core-v0-14-0-release-....
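As a toy sketch of those two pieces, random-walk neighbor selection and rarest-first replication (not the actual Blockstack code; `adjacency` and `replica_counts` are made-up inputs):

```python
import random

def random_walk_sample(adjacency, start, steps=10):
    """Pick a candidate neighbor by taking a short random walk from
    `start` over the current peer graph. `adjacency` maps each node
    to a list of its current neighbors; a long enough walk approaches
    a near-uniform sample of the graph."""
    node = start
    for _ in range(steps):
        node = random.choice(adjacency[node])
    return node

def rarest_first(missing_items, replica_counts):
    """Order the routing entries we still lack by how few peers
    advertise them, so the scarcest entries get fetched first."""
    return sorted(missing_items, key=lambda item: replica_counts.get(item, 0))
```
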
> Assuming is the wrong word. I think approaching BFT is simply part of what you do to harden a DHT against attackers.
If you go for BFT, you have to assume that no more than f of 3f+1 nodes are faulty. Otherwise, the malicious nodes will always be able to prevent the honest nodes from reaching agreement.
> I haven't said that. I'm saying that simply because this kind of defense was not yet needed nobody tried to build it, as simple as that. Sophisticated security comes with implementation complexity, that's why we had HTTP for ages before HTTPS adoption was spurred by the snowden leaks.
Right. HTTP's lack of security wasn't considered a problem, until it was. Websites addressed this by rolling out HTTPS in droves. I'm saying that in the distributed systems space, DHTs are the new HTTP.
> What other mechanisms do you have in mind? Most that I am aware of don't offer the same O(log n) node-state and lookup complexity in a distributed manner.
How about an ensemble of bootstrapping mechanisms?
* give the node a set of initial hard-coded neighbors, and maintain those neighbors yourself.
* have the node connect to an IRC channel you maintain and ask an IRC bot for some initial neighbors.
* have the node request a signed file from one of a set of mirrors that contains a list of neighbors.
* run a DNS server that lists currently known-healthy neighbors.
* maintain a global public node directory and ship it with the node download.
I'd try all of these things before using a DHT.
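For illustration, such an ensemble is just an ordered list of fallbacks; a minimal sketch, where the `source` callables stand in for the hard-coded-peer, IRC, mirror-file, and DNS mechanisms above:

```python
import random

def bootstrap(sources, want=8):
    """Try each bootstrap source in turn until we have enough initial
    neighbors. Each `source` is a callable returning a list of
    (host, port) peers, or raising on failure."""
    peers = []
    for source in sources:
        try:
            peers.extend(source())
        except Exception:
            continue          # that mechanism is down; fall through to the next
        if len(peers) >= want:
            break
    random.shuffle(peers)     # avoid everyone hammering the same first node
    return peers[:want]
```
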
But in the context of bittorrent that is not necessary if we're still talking about information leakage. The tracker + pex gives you the same, and more, information than watching the DHT.
> we thought of doing this very thing back when the system still used a DHT for storing routing information.
The approaches you list seem quite reasonable when you have a PoW system at your disposal.
> have each node host a 100% replica of the routing information, and have nodes arrange themselves into a K-regular graph
This is usually considered too expensive in the context of non-coin/-blockchain p2p networks because you want nodes to be able to run on embedded and other resource-constrained devices. The O(log n) node state and bootstrap cost limits are quite important. Otherwise it would be akin to asking every mobile phone to keep up to date with the full BGP route set.
> assume that no more than f of 3f+1 nodes are faulty. Otherwise, the malicious nodes will always be able to prevent the honest nodes from reaching agreement.
Of course, but for some applications that is more than good enough. If your adversary can bring enough resources to bear to take over 1/3rd of your network he might as well DoS any target he wants. So you would be facing massive disruption anyway. I mean blockchains lose some of their security guarantees too once someone manages to dominate 1/2 of the mining capacity. Same order of magnitude. It's basically the design domain "secure, up to point X".
> I'm saying that in the distributed systems space, DHTs are the new HTTP.
I can agree with that, but I think the S can be tacked on once people feel the need.
> How about an ensemble of bootstrapping mechanisms?
The things you list don't really replace the purpose of a DHT. A dht is a key-value store for many keys and a routing algorithm to find them in a distributed environment. What you listed just gives you a bunch of nodes, but no data lookup capabilities. Essentially you're listing things that could be used to bootstrap into a DHT, not replacing the next layer services provided by a DHT.
Funny you should mention BGP. We have been approached by researchers at Princeton who are interested in doing something like that, using Blockstack (but to be fair, they're more interested in giving each home router a copy of the global BGP state).
I totally hear you regarding the costly bootstrapping. In Blockstack, for example, we expect most nodes to sync up using a recent signed snapshot of the node state and then use SPV headers to download the most recent transactions. It's a difference between minutes and days for booting up.
> Of course, but for some applications that is more than good enough. If your adversary can bring enough resources to bear to take over 1/3rd of your network he might as well DoS any target he wants. So you would be facing massive disruption anyway.
Yes. The reason I brought this up is that in the context of public DHTs, it's feasible for someone to run many Sybil nodes. There's some very recent work out of MIT for achieving BFT consensus in open-membership systems, if you're interested: https://arxiv.org/pdf/1607.01341.pdf
> I mean blockchains lose some of their security guarantees too once someone manages to dominate 1/2 of the mining capacity. Same order of magnitude. It's basically the design domain "secure, up to point X".
In Bitcoin specifically, the threshold for tolerating Byzantine miners is 25% hash power. This was one of the more subtle findings from Eyal and Sirer's selfish mining paper.
> The things you list don't really replace the purpose of a DHT. A dht is a key-value store for many keys and a routing algorithm to find them in a distributed environment. What you listed just gives you a bunch of nodes, but no data lookup capabilities. Essentially you're listing things that could be used to bootstrap into a DHT, not replacing the next layer services provided by a DHT.
If the p2p application's steady-state behavior is to run its own overlay network and use the DHT only for bootstrapping, then DHT dependency can be removed simply by using the systems that bootstrap the DHT in order to bootstrap the application. Why use a middle-man when you don't have to?
It seems like we have quite different understandings of how DHTs are used, probably shaped by different use-cases. Let me see if I can summarize yours correctly: a) over time nodes will be interested in, or have visited, a large proportion of the keyspace; b) it makes sense to eventually replicate the whole dataset; c) the data mutation rate is relatively low; d) access to the keyspace is extremely biased, i.e. there is some subset of keys that almost all nodes will access. Is that about right?
In my case this is very different. Node turnover is high (mean life time <24h), data is volatile (mean lifetime <2 hours), nodes are only ever interested in a tiny fraction of the keyspace (<0.1%), nodes access random subsets of the keyspace, so there's little overlap in their behavior. The data would become largely obsolete before you even replicated half the DHT unless you spent a lot of overhead on keeping up with hundreds of megabytes of churn per hour and you would never use most of it.
So for you there's just "bootstrap dataset" and then "expend a little effort to keep the whole replica fresh". For me there's really "bootstrap into the dht", "maintain (tiny) routing table" and then "read/write random access to volatile data on demand, many times a day".
This is why the solutions you propose are no solutions for a general DHT which can also cope with high churn.
Agreed on (a), (b), and (c). In (a), the entire keyspace will be visited by each node, since they have to index the underlying blockchain in order to reach consensus on the state of the system (i.e. each Blockstack node is a replicated state machine, and the blockchain encodes the sequence of state-transitions each node must make). (d) is probably correct, but I don't have data to back it up (e.g. because of (b), a locally-running application node accesses its locally-hosted Blockstack data, so we don't ever see read accesses).
> In my case this is very different. Node turnover is high (mean life time <24h), data is volatile (mean lifetime <2 hours), nodes are only ever interested in a tiny fraction of the keyspace (<0.1%), nodes access random subsets of the keyspace, so there's little overlap in their behavior. The data would become largely obsolete before you even replicated half the DHT unless you spent a lot of overhead on keeping up with hundreds of megabytes of churn per hour and you would never use most of it.
Thank you for clarifying. Can you further characterize the distribution of reads and writes over the keyspace in your use-case? (Not sure if you're referring to the Bittorrent DHT behavior in your description, so apologies if these questions are redundant.) For example:
* Are there a few keys that are really popular, or are keys equally likely to be read?
* Do nodes usually read their own keys, or do they usually read other nodes' keys?
* Is your DHT content-addressable (e.g. a key is the hash of its value)? If so, how do other nodes discover the keys they want to read?
* If your DHT is not content-addressable, how do you deal with inconsistent writes during a partition? More importantly, how do you know the value given back by a remote node is the "right" value for the key?
I am, but that's not even that important, because storing a blockchain history is a very special use case: you're dealing with an append-only data structure. There are no deletes or random writes. Any DHT used for p2p chat, file sharing or some mapping of identity -> network address will experience more write-heavy, random-access workloads.
> Are there a few keys that are really popular, or are keys equally likely to be read?
Yes, some are more popular than others, but the bias is not strong compared to the overall size of the network. 8M+ nodes. Key popularity may range from 1 to maybe 20k. And such peaks are transient, mostly for new content.
> Do nodes usually read their own keys, or do they usually read other nodes' keys?
It is extremely unlikely that nodes are interested in the data for which they provide storage.
> Is your DHT content-addressable (e.g. a key is the hash of its value)?
Yes and no, it depends on the remote procedure call used. Generic immutable get/put operations are. Mutable ones use the hash of the pubkey. Peer address list lookups use the hash of an external value (from the torrent).
> * If your DHT is not content-addressable, how do you deal with inconsistent writes during a partition? More importantly, how do you know the value given back by a remote node is the "right" value for the key?
For peer lists it maintains a list of different values from multiple originators; the value is the originator's IP, so it can't be easily spoofed (3-way handshake for writes). A store adds a single value; a get returns a list.
For mutable stores the value -> signature -> pubkey -> dht key is checked.
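As a rough sketch of that check chain (a simplification of the BEP 44 rules, ignoring salts and sequence numbers; the `verify` callback stands in for a real ed25519 implementation, which the Python stdlib does not provide):

```python
import hashlib

def check_immutable(key, value):
    """Immutable put/get: the DHT key must be the SHA-1 of the value,
    so a node returning a wrong value is immediately detectable."""
    return key == hashlib.sha1(value).digest()

def check_mutable(key, value, pubkey, signature, verify):
    """Mutable put/get: the key is derived from the publisher's public
    key, and the value must carry a valid signature over it.
    `verify(pubkey, value, signature) -> bool` is supplied by the
    caller (e.g. an ed25519 library) and treated as a black box here."""
    if key != hashlib.sha1(pubkey).digest():
        return False
    return verify(pubkey, value, signature)
```
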
also note: DHT: hash table, BlockChain: linked list.
but there are a lot more datastructures than that!
There are several DHT papers that talk about bootstrapping DHTs off of social networks. They all fail to solve the Sybil problem in the same way: an adversary simply attacks the social network by pretending to be many people.
Not everything needs a global singleton like a blockchain or DHT or a DNS system. Bitcoin needs this because of the double-spend problem. But private chats and other such activities don't.
I have been working on this problem since 2011. I can tell you that peer-to-peer is fine for asynchronous feeds that form tree based activities, which is quite a lot of things.
But everyday group activities usually require some central authority for that group, at least for the ordering of messages. A "group" can be as small as a chess game or one chat message and its replies. But we haven't solved mental poker well for N people yet. (Correct me if I am wrong.)
The goal isn't to not trust anyone for anything. After all, you still trust the user agent app on your device. The goal is to control where your data lives, and not have to rely on any particular connections, e.g. to the global internet, to communicate.
Btw, ironic that the article ends "If you liked this article, consider sharing (tweeting) it to your followers". In the feudal digital world we live in today, most people must speak in a mere 140 characters to "their" followers via a centralized social network with huge datacenters whose engineers post on highscalability.com.
If you are interested, here I talk about it further in depth:
But I had never heard of Scuttlebutt until now. This looks even more ideal. In amateur radio, everyone self-identifies with their call sign, which follows the same model.
For amateur radio, there is a restriction against encryption (intent to obscure or hide the message), but the public messages would be fine. Private messages (being encrypted so that only those with the right keys can read them) might be a legal issue, so for a legit amateur radio deployment, the client would have to disable that (or at least operators would have to be educated that private messages may violate FCC rules).
(at 9m53s: https://youtu.be/WzMm7-j7yIY?t=9m53s)
How do you see this happening in such a relative short amount of time? Who (else) is going to do this? Is our culture predisposed to do this, and, if not, is there a strategy to overcome this culture factor?
edit: for clarity
Allow me to designate trusted friends / custodians. Store fractions of my private key with them, so that they can rebuild the key if I lost mine. They should also be able to issue a "revocation as of certain date" if my key is compromised, and vouch for my new key being a valid replacement of the old key. So my identity becomes "Bob Smith from Seattle, friend of Jane Doe from Portland and Sally X from Redmond". My social circle is my identity! Non-technical users will not even need to know what private key / public key is.
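The "store fractions of my private key with custodians" idea is essentially Shamir secret sharing. A toy sketch over a prime field (use an audited library in practice; this skips share authentication entirely, and assumes the secret fits in the field):

```python
from secrets import randbelow

P = 2**127 - 1  # a Mersenne prime; secrets must be integers < P

def split_secret(secret, n, k):
    """Split `secret` into n shares, any k of which reconstruct it:
    evaluate a random degree-(k-1) polynomial with the secret as its
    constant term at x = 1..n."""
    coeffs = [secret] + [randbelow(P) for _ in range(k - 1)]
    def poly(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, poly(x)) for x in range(1, n + 1)]

def recover_secret(shares):
    """Lagrange-interpolate the polynomial at x = 0 from k shares."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total
```

Each custodian holds one `(x, y)` pair; any k of them can rebuild the key, while fewer than k learn nothing about it.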
Introduce a notion of the "relay" server - a server where I will register my current IP address for direct p2p connection, or pick my "voicemail" if I can't be reach right away. I can have multiple relays. So my list of friends is a list of their public keys and their relays as best I know them. Whenever I publish new content, the software will aggressively push the data to each of my friends / subscribers. Each time my relay list is updated, it also gets pushed to everyone. If I can't find my friend's relay, I will query our mutual friends to see if they know where to find my lost friend.
There should be a way to create handles for real-life objects and locations. Since many people will end up creating different entries for the same object, there should be a way for me to record in my log that guid-a and guid-b refer to the same restaurant in my opinion. As well I could access similar opinion records made by my friends, or their friends.
Each post has an identity, as does each location. My friends can comment on those things in their own log, but I will only see these comments if I get to access those posts / locations myself (or I go out of my way to look for them). This way I know what my friends think of this article or this restaurant. Bye-bye Yelp, bye-bye fake Amazon reviews.
I will subscribe to certain bots / people who will tell me that some pieces of news floating around will be a waste of my time or be offensive. Bye-bye clickbait, bye-bye goatse.
Allow me to designate space to store my friend's encrypted blobs for them. They can back up their files to me, and I can backup to them.
a very nice person whom i like to call mix made a module for this recently: http://git.scuttlebot.io/%25XJz%2BcF9oIgd1eHYFGg3ycVwowLEseL...
The part which splits your key is now automated and part of Patchbay. I'll build the resurrection part when someone needs it
For identity, there's
Right now I'm particularly interested in https://github.com/solid/web-access-control-spec although I think it's incomplete when it comes to data portability and access control. From what I've seen on re-decentralizing the internet, access control is either non-existent, or relies on a server hosting your data to implement access control correctly.
What if, in the WAC protocol linked above, instead of ACL resources informing the server, we could have ACL resources providing clients with keys to the encrypted resource (presumably wrapped in each authorized agent's pub key). Host proof data is a necessity for decentralized social networking IMO, even if the majority of agents would happily hand their keys over to their host.
Also important that an initial smaller community would be targeted and that it would succeed there. FB did this with colleges; a federated network launching in a world where FB already exists would have an even harder time.
Depends on your definition of "distributed", I suppose
Keybase offers decentralized trust, in that the Keybase server can't lie to you about someone's keys -- your Keybase client will trust their public proofs and not the Keybase server -- but it's not a distributed/decentralized service as a whole, because you still receive hints from the server about where proofs live, and learn Keybase usernames from it.
(I work at Keybase.)
No, I don't think the tech is quite there yet. Even just handing out human-readable usernames requires blockchain-style consensus, and we don't have a blockchain being followed along by everyone's machines to adjudicate consensus requests (yet!).
The folks at Blockstack Labs are doing fine work in this area, though: https://blockstack.org/
The fact that it fails at the most basic thing, actually telling you what it is, what it does, and how, would be a good reason not to use Keybase.
It's unclear whether this can be changed later, and I'm not yet sure whether I want to use my real identity or a throwaway.
After creating an account with the default ¿randomly? generated name, I tried to use an invite obtained from http://184.108.40.206/invited which was linked from https://github.com/staltz/easy-ssb-pub.
All I got back was "An error occured (sic) while attempting to redeem invite. could not connect to sbot"
It worked with http://pub.locksmithdon.net/ though I feel a bit odd trusting a "locksmith" I've never heard of to stream lots of data to my harddrive...
It's cool that anyone can host a pub – basically, an instance of FB/Twitter/Gmail, it seems – but things 1) will get expensive for them, and it's unclear how they'll fund that – and 2) now I have to trust random people on the internet – not only to be nice, but also secure.
As a "random technically aware netizen", I honestly trust fooplesoft more, since they have a multi-billion-dollar reputation to protect. (Not that I trust fooplesoft).
FWIW, you can use pub.lua.cz:8008:@xYSW6eVu8gTS/nTSXZiH97dgKZ+wp7NkomR6WKK/PBI=.ed25519~iQ16RuvjKZqy/RhiXXmW9+6wuZNq+SBI8evG3PotxvI= if you have trouble connecting to the ones on github.
Feel free to add it to the wiki, I do plan to run it long term, but I am not a github user.
Right, but someone I trust could have their message corrupted, no?
e.g. some political leader intends to write "everybody vote for Alice" and it is modified to read "everybody vote for Carol". Is this possible?
(I generally trust FB not to do this because their business would suffer if they were caught, for example – not so with ephemeral pubs)
Your followers, their followers, and their followers (assuming everyone is using the default replication settings). These may include pubs or people you follow. If you are able to connect to a pub then most likely it is willing to replicate your feed.
The social aspect is important though because in this architecture what you see is determined by who you follow (and who they follow, etc.)
What I mean to say is, Usenet's social model hardly prevented it from drowning in a sea of low-value content.
Maybe I'm not thinking about it right or use it differently than most :P
See also Joel Spolsky on the topic: https://www.joelonsoftware.com/2003/03/03/building-communiti...
The deciding factor between what came before and facebook and twitter is the ability to broadcast to the entire social network at once, so all of the world can see your brilliance! Feeding into that narcissism is the killer feature of modern social networks.
But yes, for the majority of people, talking about themselves is exactly what they do. They talk about their vacation to the beach. They talk about the drama going on at work. They talk about their sister's date. They don't talk about advances in database design.
I was at a live event (a play) recently and was fascinated by a small group of women in their late 20s / early 30s. They spent a good 10-15 minutes before the play started just taking pictures of themselves being at the play and posting it to their social networks. They talked about the pictures, asked others to send them their copy of the picture. They took pictures from one angle and then another. They talked about who "liked" the picture they just uploaded. It went on and on and on. Not once did I overhear them talking about the play they were about to see. It seemed to be not the point at all. The play was just a hashtag for their social media posts.
Most good conversationalists are good at it because they explicitly draw the other person into talking about themselves and their interests. Whether things become narcissistic is more a factor of personality, I think. Perhaps it's more than that, though. A good conversationalist would steer the conversation to more interesting content, i.e. why the person is passionate about their hobby rather than just their accomplishments. Perhaps we need to think about social network features that model what good conversationalists do? Not sure what that looks like though.
[edit for typo]
See, for example, the indiewebcamp people, who are against "silos", as they call Facebook et al., but are recreating their same functionalities with personal blogs and a new version of Pingbacks called "webmentions".
That's what "webmentions" do.
> That way my feed was a mix of the discussions I'm having with others as well as my own stuff.
Here's something I like: everything you say is part of a public discussion, even if you're talking alone; also, comments have about the same weight as standalone posts, and outsiders can join the discussion, it isn't restricted to your current circle of friends.
Yes, but just because no one has tried to create a different social network. That's why I made my initial comment in the first place.
> What you are describing has existed for years and we called it Usenet. Or a forum. Or a mailing list.
I don't know about Usenet, but forums and mailing lists are generally oriented to narrow topics; they are not places where you'll see your school friends, or people discussing varied subjects across multiple areas.
Tumblr has some pretty good discussion about movies and books.
Twitter is not so good for discussion because of the length limit, but there are plenty of people posting concise observations and jokes rather than posting about themselves.
On both systems, people can reply to content from strangers, and there's lots of conflict arising from that.
I do think Tumblr would be improved by making it easier to have discussions that don't go to all your followers by default, for example like on Twitter where if you tag people at the start of your tweet, it doesn't go into the main feed for your followers who aren't tagged.
Or you can go all the way to partitioning a system into topics, as with Reddit. I wouldn't call that a social network though, you don't just casually start a conversation with people you've chosen to connect with, you start a conversation with a subreddit.
When I used it, which admittedly was a long time ago now, the biggest setback was lack of cross-device identities. So I ended up having two accounts with two feeds, `wesAtWork` and `wes`. Maybe they have solved this by now.
ps. Does patchwork still have the little gif maker? Because that was a super fun feature.
> forking a website so easily also makes spoofing very easy...
A fork copies the files of a site, so yeah, it certainly would be easy to spoof somebody's site. It basically is a spoof button. But doing so creates a new cryptographic identity for the site, and that will be the basis of how we authenticate.
I understand that transparency might not be a design goal or technically possible, I'm just raising the concern.
Can't I just share my private key across multiple devices?
2) it would greatly complicate the replication protocol, having to take into account forks, rather than assuming append only, where you can represent the current synced state with a single counter.
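To illustrate why append-only keeps replication simple: the synced state per feed really is a single counter, and a sync request falls straight out of comparing counters. A minimal sketch (made-up structure, not the actual ssb wire protocol):

```python
def want_list(local_state, remote_state):
    """Append-only replication: `*_state` maps feed_id -> highest
    sequence number seen. A sync handshake reduces to 'send me
    anything newer'; return the first missing sequence per feed."""
    wants = {}
    for feed, remote_seq in remote_state.items():
        if remote_seq > local_state.get(feed, 0):
            wants[feed] = local_state.get(feed, 0) + 1
    return wants
```

With forks allowed, this single number would no longer identify the synced state, which is the complication the parent comment is pointing at.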
I ended up removing the gif maker in one iteration because it was so frequently buggy. That was probably the worst call I made.
Under the hood, patchwork connects to a scuttlebot server. Scuttlebot in turn is based on secure-scuttlebutt (ssb).
Patchwork is a user interface for displaying messages from the distributed database to the user, and to allow the user to add new messages. The underlying protocol supports arbitrary message types, patchwork exposes a UI for interacting with a subset of them. Anyone could write and use other UIs while still contributing to the same database. Patchbay for example is a more developer-centric frontend.
 https://github.com/ssbc/patchbay  http://scuttlebot.io/
edit- they got unkilled.
_THIS_ is very much the spirit of SSB. :)
The technology is here, the only thing left is to make people actually use it…
But on top of this database you can build whatever, like there's a github thing and a soundcloud thing and a facebook thing
Forgive the rambling, this is the first time I've written any of this down...
My idea is to use email as a transport for 'social attachments' that would be read using a custom mail client (it remains to be seen whether that should be your regular email client or a dedicated 'social mail' client; if it's your regular client, users would have to ignore or filter out social mails). It could also be done as a mimetype handler/viewer for social attachments.
Advantages of using email:
- Decentralized (can move providers)
- email address as rendezvous point (simple for users to grasp)
- Works behind firewalls
- Can work with local (ie Maildir) or remote (imap) mailstores. If using imap, helps to address the multiple devices issue. Could also use replication to handle it too (Syncthing, dropbox, etc)
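Building such a message is straightforward with a stock MIME library; a sketch using Python's stdlib (the `application/x-social` subtype and the JSON payload shape are invented for this example):

```python
from email.message import EmailMessage
import json

def social_mail(sender, recipient, payload):
    """Wrap a 'social attachment' (here, a JSON status update) in an
    ordinary email. Any SMTP/IMAP/Maildir infrastructure can carry
    it; a custom client filters on the attachment content type."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "[social] status update"
    msg.set_content("This message was generated by a social-mail client.")
    msg.add_attachment(json.dumps(payload).encode("utf-8"),
                       maintype="application", subtype="x-social",
                       filename="update.json")
    return msg
```
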
Scuttlebutt looks like a nice alternative though. Will be following closely.
Problem is you don't have a means to publicly advertise your status and offer a way to subscribe. That would require a third-party provider. I can imagine someone fetching everyone's updates and providing a mechanism to just resend the mail via a public web repository that would act as a public registration hub.
That would be a huge data mine though. Unless you add pgp in the mix and then you have to hit the mark on the client pgp handling to easily allow close friends to give out their public key.
Wouldn't that make a fun POC project ?
I remember I was thinking about it when pownce came out.
I still believe the net would be so much more fun with the likes of pownce and w.a.s.t.e around :(.
I remember having some actual conversations on w.a.s.t.e. That's never happening with torrents.
One thing I didn't mention was that there wouldn't be any public posts, so it's definitely more social network than social media.
Anyways, just an idea at this point, though a prototype would not be that hard to put together as an experiment.
Granted, I have been running mail for a long time, so I got to learn the complications as they happened, rather than all at once. But anyone who can set up a production-quality web server/appserver/DB along with the accessories that go along with it can handle it.
Now if email isn't important to your business and/or you just don't want to deal with maintaining it, that's valid. But it just isn't as difficult as a lot of people seem to want to make it out to be.
Are you using Sendmail?
Not in a long time. I haven't found a situation in which I couldn't use Postfix in quite a while. Although the occasional sendmail.cf flashback still hits me.
that's why I asked.
I think such an approach could be interesting, but it seems there is a need for a non-profit to govern such a thing.
So your approach should work in theory within the described framework.
But seriously, this is a very very interesting project.
Maybe it's just me but if I see an article is x+ hours old (15+ for example), I don't bother commenting.
What type of social networking would HN use for non personal(not for family and immediate friends) communication? (I've tried hnchat.com, it's mostly inactive imho)
Perhaps HN could introduce email notifications (e.g., if somebody replies to one of your comments/posts, you get a notification by email).
1) be on the same wifi (presumably great for dissidents in countries with heavy-handed internet control, and inconvenient for everyone else)
2) use "pubs", which can be run on any server, and connected to ¿through the internet?
So most users would use pubs, which are described as "totally dispensable" (a nice property). But how can users exchange information about which pub to subscribe to? Is there a public listing of them?
It seems like the "bootstrapping server" problem (e.g. reliance on router.bittorrent.com:6881) will still exist in practice. For that matter, is there currently an equivalent to router.bittorrent.com that would serve this purpose?
This seems like a potentially significant project, and I'm excited by the possibility that it might actually take off – hence the inquiry.
What about organizing groups, which might currently use Slack? For example, political dissidents who don't necessarily all know each other personally. They must use some other communication channel to communicate pubs?
I know some people who work in that area, and every time one of them finds out I work in software their first question is about mesh networking. If SSB is what it seems to be (user friendly, no-frills ad hoc mesh networking) then that would be huge for emergency and disaster planners! Is it mature enough to be used in this way?
Also, please remember that the American First Amendment limits Government speech restrictions. Private communities and individuals can make any rules they want about social acceptable speech.
naturally. where can I learn more?
If I want to have access to everything that's been shared with me, I have to store it all. In the case of images, the storage burden can get large quickly.
The flip side to your remark is that it is fully offline-capable, and I'm perfectly happy with that. Also: contrast it with how much space a Thunderbird profile takes up.
Is the protocol set up in such a way as to enable easy, automatic deletion of old data from local devices, while still storing them for easy search/scroll-based access on the Pub servers?
Thank goodness Facebook isn't what I want my social feed to look like... all those GIFs and garbage updates.
Also, I suppose if you're linking out of the SocialApp and into the Web, that most of the content is just "messages".
> then it syncs up with the larger storage on my home PC later.
I can hardly wait for devices that work this way.
Sure, but I have few enough friends as is. I know literally no one who would use this, as neat as it is. Bootstrapping a social network is hard for both developers and users, but once it gets going, storage requirements would rise fast.
I observed in 2011 that HighWinds Media had not expired any non-binaries postings since 2006, and that Power Usenet had not expired a non-binaries posting for eight years ("3013+ days text retention" was in its advertising at the time). People effectively just turned non-binaries expiry in Usenet off, in the first few years of the 21st century. I did on my Usenet node, too.
I observed then that the Usenet nodes' abilities to store posts had far outstripped the size of the non-binaries portion of a full Usenet feed, which was only a tiny proportion of the full 10TiB/day feed of the time.
We are also basically betting on the size of our message logs to generally grow slower than our individual storage capacities, and it is interesting to know that that worked for Usenet too. For blobs, we will likely develop some garbage collection or expiring approaches. Since the network is radically decentralized, each participant can choose their own retention policy. You can, in fact, delete all your blobs (`rm -rf ~/.ssb/blobs`) and assuming some peers have replicated them, your client will just fetch them again as you need them.
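The delete-then-refetch behaviour described above works because blobs are content-addressed: the key is a hash of the bytes, so anything a peer still holds can be fetched again and verified. A minimal sketch in Python (illustrative class and method names, not SSB's actual API):

```python
import hashlib

class BlobStore:
    """Content-addressed blob store: the key is the hash of the content."""
    def __init__(self):
        self.blobs = {}  # hex digest -> bytes

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blobs[key] = data
        return key

    def delete_all(self):
        # Safe: keys are derived from content, so anything a peer
        # still holds can be re-fetched and re-verified later.
        self.blobs.clear()

    def fetch(self, key: str, peers) -> bytes:
        if key in self.blobs:
            return self.blobs[key]
        for peer in peers:  # ask peers that may have replicated it
            data = peer.blobs.get(key)
            if data is not None and hashlib.sha256(data).hexdigest() == key:
                self.blobs[key] = data  # verified against the key, cache it
                return data
        raise KeyError(key)

# A peer replicated the blob; we delete everything locally, then recover it.
me, peer = BlobStore(), BlobStore()
key = me.put(b"cat.jpg bytes")
peer.put(b"cat.jpg bytes")
me.delete_all()
assert me.fetch(key, [peer]) == b"cat.jpg bytes"
```

Because fetched bytes are re-hashed against the key, a peer can't substitute different content, which is what makes each participant's retention policy purely local.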
> I'm simply pointing out the error in the premise of your question.
No, you made a non-sequitur factual post about Usenet. I see no actual error pointed out. The fact that Usenet stopped expiring non-binary posts after most of their traffic fled to other services is not a valid argument against possibly using the feature in a peer-to-peer distributed social network.
If you don't see an error in your premise being pointed out, then you need to put your "posts expire and get deleted, like on Usenet" right up against "people were not expiring non-binaries posts on Usenet" until the penny drops.
Then you need to notice the point, already made by others as well, that the premise of ISL's question is erroneous, too. The storage requirements are not necessarily "tremendous", if one actually learns from the past. Again, your comparison to Usenet needs to involve considering how Usenet treated binaries and non-binaries very differently. (One can look to experience of the WWW for this, too, and consider the relative weights in HTTP traffic of the "images" that ISL talks about and the non-binary contents of the WWW. But your comparison to Usenet does teach the same thing.)
Your and ISL's whole notion, that everything is going to get tremendously big and so everything will need to be expired, rather flies in the face of what we can see from history actually happened in systems like this, such as the one that you made your comparison to. Usenet did not expire and delete non-binaries posts.
By making this comparison and then trying to pretend that it's someone else's non-sequitur you are closing your eyes to the useful lessons to actually learn from your comparison. Usenet, and the Wayback Machine, and the early WWW spiders, and Stack Exchange, and Wikipedia with all of its talk pages, and Fidonet in its later years (when hard disc sizes became large enough), all teach that in fact one can get away with keeping all of the "non-binary" stuff indefinitely, or at least for time scales on the order of decades, because that is not where the majority of the storage and transmission costs is.
People have already danced this dance, several times, and making a distinction between the binary and the non-binary stuff and not fretting overmuch about the latter when one looks at the figures is generally where it ends up.
Let's say for instance that the file you're downloading is a long text file containing a novel, but all you care about is chapter 3. Then all you need are the pieces for chapter 3 – the rest can stick around in the ether somewhere.
This is harder to do with bags of bytes obviously – how do you know which bytes belong to chapter 3? – but if the pieces are self contained messages where you don't need either the previous or the next to make sense of it, then it should be trivial to link to them and the distribution could work like this. Whether it actually works like this or not I have no idea. Sounds like an interesting project anyway!
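The "self-contained pieces" idea sketches naturally as content addressing: each chapter is stored under the hash of its bytes, and the novel is just a manifest linking names to hashes, so a reader fetches only the piece they want. (Illustrative Python; the manifest shape is made up, not any real protocol's format.)

```python
import hashlib

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Each chapter is a self-contained piece; the "novel" is just a
# manifest mapping chapter names to content hashes.
chapters = {
    "ch1": b"It was a dark and stormy night...",
    "ch2": b"Meanwhile, across town...",
    "ch3": b"The plot thickens.",
}
network = {h(data): data for data in chapters.values()}  # what peers hold
manifest = {name: h(data) for name, data in chapters.items()}

# To read only chapter 3, fetch just that one piece by its hash;
# the other pieces can stay "in the ether" on other peers.
wanted = network[manifest["ch3"]]
assert wanted == b"The plot thickens."
```

This only works because each piece makes sense on its own; with an opaque bag of bytes, nothing tells you which chunk is "chapter 3", which is exactly the difficulty the comment above points out.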
kidding aside: we just introduce a few instances that act as exchange points (also called "pubs").
I played a little with the idea of using tor hidden services to directly connect to people, so that you don't need another computer that runs all the time.
To help with this situation, the network includes so-called pubs. These are basically bots that run 24/7 and friend people. The article very briefly mentions them. More information here: https://www.scuttlebutt.nz/concepts/pub.html
Decentralized social networks seem like an inevitable progression as internet users become more aware of their privacy and of ways they can improve online relationships and... "social networking".
For the "social media aspect", like in Twitter, we're looking at making alternative types of pubs. Imagine having a pub dedicated to only replicating your content (and no one else's). Or several of these. So that whoever wants to follow you (if you're Elon Musk-level famous) can just follow one of your pubs.
I am not sure how much thought has been given to the scalability of this solution, but it sounds like it will benefit from most of the advantages offered by P2P in this department.
Eventually something like this could organically grow into the "next Internet", in much the same way that the current internet has morphed into what it is today.
How well has federation worked out in practice (for other federated, social network related protocols) so far?
As far as I know, federation has only worked for ancient stuff that has nothing to do with social networks, like email and DNS. Basically, it is a part of core functionality and thus can't be co-opted by commercial interests (though GMail has made quite an inroad!).
Until it has proven itself, social federation doesn't really seem like a strength to me. It does sound good in theory! Other people with actual experience are adding anecdotes, which line up with what I'm trying to say.
Email only works _because_ of big players like GMail. Running your own server spam-free and off the blacklists is an endless task.
DNS is going a similar way, with more and more ISPs resorting to hijacking DNS lookups for all sorts of nefarious reasons. This protocol seriously needs a broadly embraced signature system to validate origin.
> Every time two Scuttlebutt friends connect to the same WiFi, their computers will synchronize the latest messages in their diaries.
Ultimately this technology seems to be a decentralized, signed messaging system. What problem are they solving? That facebook and twitter can delete and alter your messages?
Meanwhile I'm in search of a long-range, wireless communication system that can function like a network without the need of an ISP. Anyone know anything about this?
That using those services for one's communications places too much power in the hands of a centralized authority. (I speak just for myself here.)
> Meanwhile I'm in search of a long-range, wireless communication system that can function like a network without the need of an ISP. Anyone know anything about this?
On SSB we have discussed doing gossip connections over long-range wireless connections: https://viewer.scuttlebot.io/%2547H0BQQHAXvvqf8K3ngdeMtAHQdP...
Of course if you are looking for a network in a more traditional sense, something lower-level may be more appropriate.
Mesh networks/WANETs. But you need enough adoption for the network to be considered long-range. Generally they are local-only.
This is Zooko's triangle and was squared by blockchains. Namecoin (2011), BNS (the Blockstack Name System, 2014), and now a bunch of other fully-decentralized naming systems can give you unique usernames. Recently, Ethereum tried launching ENS and ran into some security issues and will likely re-launch soon.
But it doesn't matter because this issue is already solved. We already have globally unique usernames. They're called email addresses, they are unique by their very nature, and they are (for all intents and purposes) already decentralized.
No, they're not: email@example.com depends on example.com, which depends on com, which depends on the root nameservers, which are … a central nameservice.
That's the whole point of Zooko's Triangle: of secure, decentralised and human-readable, you can have at most two. Global-singleton approaches are still centralised (the singleton is the centre), although they may build the singleton in a decentralised fashion.
Maybe I'm missing the point, and I would look to you to explain to me what that is. But I guess congrats, you don't rely on ICANN anymore...
Email addresses aren't in any way decentralised. Saying they are doesn't make it true.
> What network does your blockchain run on?
The product in question _doesn't_ rely on ICANN, or Comcast running to your house; it can work without either of those.
So perhaps you would like to now explain how e-mail addresses are a system without a centre. Bear in mind that you yourself have just made the point about ICANN being at their centre. (-:
His point. I am a man.
In places lacking internet bandwidth, people could run pubs in hackerspaces, schools, offices, homes, Actual Pubs, etc. A pub in a place that people frequent would gossip messages for the people, so they would not all need to connect to the internet all the time. Even the pub itself doesn't have to connect to the internet for it to be useful, as it would still help messages spread when people connect to it. As long as someone in the network connects to the Internet at least once in a while, people will be able to communicate with the broader network. With this architecture we can make more efficient cooperative use of network bandwidth.
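The store-and-forward behaviour described above can be sketched as append-only logs synced by sequence number: on contact, each side sends only the entries the other is missing, and a pub is simply a node that stays online. (Illustrative Python; real SSB additionally signs every entry and verifies the chain.)

```python
class Node:
    """A gossip participant holding append-only feeds, one per author."""
    def __init__(self):
        self.feeds = {}  # author_id -> list of messages in sequence order

    def append(self, author, msg):
        self.feeds.setdefault(author, []).append(msg)

    def sync(self, other):
        """Exchange messages both ways. Because feeds are append-only,
        'how many messages do you have?' fully describes what's missing."""
        for a, b in ((self, other), (other, self)):
            for author, log in a.feeds.items():
                have = len(b.feeds.get(author, []))
                for msg in log[have:]:  # send only what b is missing
                    b.append(author, msg)

# Alice never talks to Bob directly; the pub carries her message.
alice, pub, bob = Node(), Node(), Node()
alice.append("alice", "hello from the village")
alice.sync(pub)   # Alice visits the pub
pub.sync(bob)     # later, Bob connects to the same pub
assert bob.feeds["alice"] == ["hello from the village"]
```

Note that the pub needs no special code: any intermittently connected node relays messages the same way, which is what makes the sneakernet and offline scenarios above fall out for free.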
If so, I see that as a fairly large limitation for the common user. Even though truly removing something from the internet is effectively an impossibility, I think most non-technical folks aren't actively aware of this, and I'd at least like the option to make it harder for folks to uncover.
What's possible, on the other hand, is to make a message type "ignore the previous" which client apps would interpret to hide them, but obviously a client app can be configured to not hide them.
I'm certainly happy with the "gossip" approach; I just see it as challenging for some people to adopt when they are coddled with the idea that they can censor their past.
Cf. http://theworld.com/~cme/spki.txt and RFCs 2692 & 2693.
Scuttlebutt works the same way: anyone can name themselves anything and anyone can name other people anything, it's up to the client how to interpret those messages. more on how SSB embraces subjectivity: https://youtu.be/P5K18XssVBg.
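That subjective-naming idea can be sketched as a stream of "about" messages that propose names, with each client deciding whose proposals to honor. (Illustrative Python; the message field names here are made up, not SSB's actual schema.)

```python
# Identities are keys; "about" messages propose human-readable names.
# Resolution is subjective: each client picks which authors to trust.
about_messages = [
    {"author": "@key_carla", "target": "@key_carla", "name": "carla"},
    {"author": "@key_dan",   "target": "@key_carla", "name": "carla-from-work"},
]

def resolve(target, trusted_authors):
    """Pick a display name for `target` from messages by trusted authors,
    last-writer-wins; fall back to the raw key if nobody we trust named it."""
    name = target
    for msg in about_messages:
        if msg["target"] == target and msg["author"] in trusted_authors:
            name = msg["name"]
    return name

# Two clients, two trust choices, two different (equally valid) names:
assert resolve("@key_carla", {"@key_carla"}) == "carla"
assert resolve("@key_carla", {"@key_dan"}) == "carla-from-work"
```

There is no global registry to consult, which is how the design sidesteps the secure/decentralised/human-readable trade-off discussed upthread: names are human-readable and decentralised, but only locally unique.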
Curious what motivated the shift.
Granted, Unite still used the Web, Opera accounts, and ISPs, but I believe it could communicate locally over a router too.
500 million tweets per day.
140 bytes (140 ASCII characters at one byte each) per tweet.
140 bytes * 500 million = 70GB
That's 70GB per day before metadata. Use this social network for a month and we've exceeded the 1TB mark, twice.
Remember this isn't just 70GB per day on one server, this is 70GB per day on every users PC.
the idea is to only replicate who you care about, plus a couple of their friends.
The average person has 208 Twitter followers. So let's say you have 208 'friends' + a couple additional 'friends' for each of your original friends. That's 624 people total.
There are 100 million active twitter users each day and 500 million tweets per day, that's 5 tweets per person.
5 * 624 = 3120
That's 3120 posts you'll be processing per day. Multiply this by 140 bytes per post and you have 436800 bytes per day, or about 159.4 MB per year.
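The back-of-envelope numbers above check out in a few lines of Python (the 208-follower average and 2 friends-of-friends each are the thread's assumptions, not measured data):

```python
# Worst case: replicate the full Twitter firehose.
tweets_per_day = 500_000_000
active_users = 100_000_000
bytes_per_tweet = 140                         # 140 chars at 1 byte each

full_feed = tweets_per_day * bytes_per_tweet
assert full_feed == 70 * 10**9                # 70 GB per day, on every PC

# Friends-only replication: 208 friends + 2 friends-of-friends each.
people = 208 * 3                              # 624 feeds followed
posts_each = tweets_per_day // active_users   # 5 posts per person per day
per_day = people * posts_each * bytes_per_tweet
assert per_day == 436_800                     # ~0.44 MB per day
per_year_mb = per_day * 365 / 1e6
assert abs(per_year_mb - 159.4) < 0.1         # ~159.4 MB per year
```

So the gap between "replicate everything" and "replicate your social neighborhood" is roughly five orders of magnitude per year, which is the whole argument for partial replication.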
Not that this is a bad thing. There's still life in Usenet, and a fair few people still sit and discuss things in various groups (if you know where to look). The backbone concept of Usenet is still great from a decentralised point of view - someone just needs to add some crypto layers to it (as a standard), and I reckon it could rise again like a phoenix.
 I'm deliberately and totally ignoring the large elephant in the room with HDDs full of pirate software, media, and porn.
And the history of Usenet also shows that people do think that the fact that something is misused should preclude any use of it. One can look to the history of how several organizations discontinued their Usenet services as a lesson for that, too.
Do you mean to say that there are more binary messages than text messages by an order of magnitude before chunking and that none of those binaries were permitted copies under the law? Because I'm going to need to see evidence to be convinced of that.
What's different now is that we have plenty of disk space, and more than enough computing power to perform proper cryptography.
This absolutely is usenet + crypto.
Most anti-systemd comments are similarly poorly thought out and articulated.
This might be superficially usenet+crypto, but oftentimes things are more than the sum of their conceptual parents.
Also wondering, can this be a replacement for Slack? Can I set up a private group chat room? Or can I only use the private @ feature to send private messages to multiple people?
That's roughly the level of paranoia I want now
It's pretty neat when you see two people/groups working on the same problem independently.
I've been working on something that has some similarities. It's a mesh network of pest traps in the New Zealand bush. Battery life is very important, so each node sleeps most of the time, then periodically wakes up and communicates with its neighbours. Made more complex by devices not having real-time clocks.
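One way to handle the no-RTC constraint is for nodes to exchange purely relative times during a contact ("my next wake-up is in N ms"), which needs no shared epoch. A toy sketch under that assumption (not the actual trap firmware; all names are made up):

```python
# Rendezvous scheduling without a real-time clock: each node only knows
# milliseconds since its own boot, so alignment uses relative offsets.
PERIOD_MS = 60_000  # wake once a minute

class TrapNode:
    def __init__(self, boot_offset_ms):
        self.boot_offset = boot_offset_ms   # simulation only; unknown to node
        self.next_wake_local = PERIOD_MS    # in local ms-since-boot

    def local_now(self, global_ms):
        return global_ms - self.boot_offset

    def hear_neighbor(self, global_ms, neighbor):
        """Neighbor tells us 'I wake in N ms' - a relative quantity
        that is meaningful even though our clocks share no epoch."""
        in_n_ms = neighbor.next_wake_local - neighbor.local_now(global_ms)
        self.next_wake_local = self.local_now(global_ms) + in_n_ms

a = TrapNode(boot_offset_ms=0)
b = TrapNode(boot_offset_ms=17_345)   # booted later; clocks disagree
b.hear_neighbor(global_ms=30_000, neighbor=a)
# Both now wake at the same global instant despite different epochs:
assert a.next_wake_local + a.boot_offset == b.next_wake_local + b.boot_offset
```

Real deployments would also have to budget for oscillator drift between contacts, widening the listen window accordingly.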
Once every node in the network is powered down most of the time, I don't think you can consider it a grid.
"I'm all ears" said in a positive tone can be "Absolutely I agree, I'd love to hear your ideas".
"I'm all ears" in a negative tone is more like "Prove it. I don't think you have any ideas to this unsolvable problem."
All a matter of interpretation, I suppose :) (and fwiw I don't know whether the author meant it in a positive or negative tone)
If someone is rude to someone who's mistaken, that could help them unlearn something, but more likely they will become more entrenched because of the consistency principle.
Cialdini talks a lot about the consistency principle in his book "Influence: The Psychology of Persuasion".
It's not always natural and of course we all fail at it from time to time, but justifying poor behaviour because someone else started it just isn't cool.
Seems that networks of a social nature are inherently a feed of news. News from here or there, from one friend or another, etc...
What else could it be?