An off-grid social network (staltz.com)
1031 points by staltz 167 days ago | 366 comments



As a historical note, there used to be quite a few very popular solutions for supporting early social networks over intermittent protocols.

UUCP [https://en.wikipedia.org/wiki/UUCP] used the computers' modems to dial out to other computers, establishing temporary, point-to-point links between them. Each system in a UUCP network has a list of neighbor systems, with phone numbers, login names and passwords, etc.

FidoNet [https://en.wikipedia.org/wiki/FidoNet] was a very popular alternative to the internet in Russia as late as the 1990s. It used temporary modem connections to exchange private (email) and public (forum) messages between the BBSes in the network.

In Russia, there was a somewhat eccentric, very outspoken enthusiast of upgrading FidoNet to use web protocols and capabilities. Apparently, he's still active in developing "Fido 2.0": https://github.com/Mithgol


For those who weren't around, Usenet was built on UUCP in the early 80s. As messages were store-and-forward, you had to wait a good while for your messages to propagate - many servers only connected daily! Oh, and you'd better set cron to dial in often, as messages didn't stay in the spool too long!

Usenet back then was spam free and you could usually end up talking to the creators of whatever you're discussing. I rather miss it.

Quite a few tech companies used private newsgroups for support, so you'd dial into those separately. As they were often techie to techie they worked rather well.

I first came across Usenet and UUCP via the Amiga Developer programme. Amicron and UUCP overnight all seemed a bit magic back in '87 compared to dialing into non-networked BBSes to browse, very, very slowly!


> Usenet back then was spam free and you could usually end up talking to the creators of whatever you're discussing. I rather miss it.

I still use usenet! It's not quite what it used to be, but you should check it out.


Can you still access it through Google? They bought Deja News!


Yes, you can still access newsgroups through Google, just search for your favorite group, e.g. comp.lang.lisp.


Or eternal-september. Free accounts, although I don't think you get access to the binaries groups. I mostly use it for comp.risks, comp.arch.embedded, and some other things like that.


Aioe is another service still around, and is free without accounts.


Or you could use the site I built that reads HN/reddit/slashdot and any newsgroup you wish on a webpage. It's free, if a little buggy - www.sagebump.com

you need the www


I'm the one who brought FidoNet to Russia (the Soviet Union, to be precise) in 1990. I remember how hard it was to find two more guys with modems and access to an automatic international line in order to request a separate FidoNet region for the USSR. Finally we got the 2:50 region code in September 1990, and there were three of us - two guys from Novosibirsk and one from Yekaterinburg, both large cities in the Asian part of the USSR.

For those of us raised in the Soviet Union, it was an eye-opening experience that you could freely exchange messages with people around the globe.


Setting up links to the West was a very brave thing to do.


It's slightly misleading to refer to Fidonet only in the context of Russia. It was popular in quite a lot of places around the world, not just Russia. Not even principally Russia, in its heyday.

These things are definitely systems to learn from, both their architectures and their histories; and people have already been drawing parallels to Usenet on this very page, notice.


When I said that Fidonet was a very popular alternative to the internet in Russia as late as the 1990s, I didn't mean that it was limited to Russia, but that in Russia particularly (well, the FSU) it was still popular even in the late 90s, while elsewhere in the world it had been subsumed by the internet.


Indeed. My BBS was Fidonet-connected in the 90s. Many interesting discussions happened despite the day-long delay in replication.


Bang paths are still supported in email addresses too!

Your email address might be george@cmu!vax!something!mitre!foo

Which meant: route the email through foo->mitre->something->vax->cmu.

Sysadmins would often keep tables of known routes, while people would describe their route relative to commonly known routing hosts.
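
For readers who never used one, a bang path can be sketched in a few lines of Python. This is only an illustration and it assumes the classic UUCP form `host1!host2!user`, read left to right with the mailbox last, which is a different ordering from the mixed `user@...` address quoted above. The function names and the sample address are made up for the example.

```python
from typing import List, Tuple


def parse_bang_path(address: str) -> Tuple[List[str], str]:
    """Split 'host1!host2!user' into (relay hosts in order, mailbox)."""
    parts = address.split("!")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"not a bang path: {address!r}")
    return parts[:-1], parts[-1]


def forward(address: str) -> Tuple[str, str]:
    """What a single relay does: pick the next host and pass on the rest."""
    next_host, _, remainder = address.partition("!")
    return next_host, remainder


hops, mailbox = parse_bang_path("cmu!vax!mitre!george")
print(" -> ".join(hops), "then deliver to", mailbox)
print(forward("cmu!vax!mitre!george"))
```

Each relay only needs to know how to reach the next host in the list, which is why people kept tables of known routes.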

We were so living in the cyberpunk future then.


Yep, my connection back then was UUCP on an Atari ST, off a 720k floppy with no hard drive, through a 1200 baud modem, running a mix of ported GNU utilities and Atari software to get Usenet and email. My email address was a bang-path ...

And we liked it!


Computing was more fun back then.


In the very early 90s, my personal computer had a UUCP feed for email and news from a local BBS. I didn't use bang paths on it; the provider had a proper Internet connection. It worked satisfactorily well.


Early email providers in Argentina used Unix to Unix Copy Program (UUCP) to move forward your messages to some of the few internet connected servers. IIRC you would write email in Pegasus or Eudora and then upload/download your mail with a 9600bps modem to a UUCP server.


And this is why I'm curious why nobody has yet written a social network on top of NNTP.


nobody made one[1]

[1]: https://chan.bbnet.io/


Someone has, but it's only got one user.

(Just guessing, but I wouldn't be surprised if it happened to be true).


This sounds like what I wanted from GNU Social when I first joined over a year ago. GNU Social/Mastodon is a fun idea, but it falls apart when you realise that you still don't own your content and it's functionally impossible to switch nodes like it advertised, along with federation being a giant mess.

I tried to switch what server my account was on halfway through my GNU Social life, and you just can't; all your followers are on the old server, all your tweets, and there is no way to say "I'm still the same person". I didn't realise I wanted cryptographic identity and accounts until I tried to actually use the alternative.

That's also part of the interest I have in something like Urbit, which has an identity system centered on public keys forming a web of trust, which also lets you have a reputation system and ban spammers which you can't do easily with a pure DHT.


Not being able to switch nodes pushes you to try and host your own instead. That's what I've done. IMO we should instead be looking at packaging a self-hosted version into a native Windows and Mac app. Run it in the background and everything's done.


This is what I want. I've been wanting to build something like this for a long time. Something where I own my data. I can back it up and if my laptop gets stolen, I just import my data on a new machine and we're good to go.

The challenges that I see:

- Making it easy for any user to get up and running.
- De-authenticating old devices.
- Making it available from any mobile device.


I bought a VPS from CloudAtCost for a one-time fee of $35. Set it up with nginx and GNU Social and pretty much haven't looked back since. My instance is https://kwat.chat.

Ideally, the end solution would be dead simple. Download the Windows app, run it, put your credit card in if you need a URL registered, and it does everything, including daily backups to a folder on your disk.


That's basically the aim of Urbit. An instance you control that you can connect to other services to tie those external identities together.


Sounds a lot like Diaspora as well.


And then your laptop gets stolen and everything's gone.


And then you remember you set up an automated encrypted backup to the cloud and thank your past-self. Now is the time to do it if you're not doing this:

https://msol.io/blog/tech/dirt-cheap-client-encrypted-online...


Why, oh why do we have smart individuals replacing "internet" with the word for the visual analogy we used to represent the internet to people in positions of power who didn't know better?

This is not a damn cloud! This is a remote computer you can access over the internet. Can we stop using this marketing lingo, please?


Sorry buddy, that ship has already sailed. Terms become popular because they are useful. We all know cloud means "a remote computer accessed over the internet", but that is a rather cumbersome phrase. "Cloud" says it in 5 characters. Can you suggest a better term?


As a fan of both linguistic games and playing devil's advocate, how about:

"server"


6 characters, sorry ;-)


"host"... 4 characters, beats cloud...?


"Cloud" refers to a commoditized service, generally highly available / performant.

"Host" is generally less ambiguous, referring to a specific thing given the context of the discussion. It's a pronoun for machines (kinda).


Terms become popular because they're useful, but useful to whom?

The purpose of the original coinage of the word "cloud" was to obfuscate that you really meant "someone else's computer". It gives a nice warm, fuzzy decentralised impression - clouds are natural and ubiquitous! No one owns them! If it's in "the cloud" (note the definite article) then it's safe in the very fabric of the network, right?

Nope. It's in Larry and Sergey's basement. Not decentralised at all. Just somewhere else.

The proper term is "server", "datacenter", or "network", depending on what you're actually trying not to say.


If I understand Scuttlebutt correctly, your stuff is broadcast to whoever it might concern and to the pubs. If you still have your private key, you should still be able to access whatever is in the ether, right? You somehow become the recipient of your own messages. The only catch is that the thief will also have access to your private key, so the account has to be considered compromised.


This is a valid point and I don't trust my computer. I would trust, however, a Ledger Wallet http://ledgerwallet.com/ and theoretically and economically it's feasible to have a Ledger wallet app to sign every SSB message. This would be awesome to have.


So, a data breach means all your private data is irrevocably publicized.

What percentage of users do you think would be affected by such cases? If it's something over 0.001%, it's a huge problem for a social network.

Sites like Coinbase and Github exist because they re-centralize distributed systems — users don't trust themselves to host their own data securely.

Alternately, if this isn't a problem, why don't users simply host all their own infrastructure for existing tech problems today?

I'm sure someone capable of living in the Mojave Desert is capable of hosting their own infrastructure - is this network simply for those people, or is it also for journalists, trans people, and HR professionals?


Yep, that's how it works.


Make backups. How is this any different than saving photos just in case you spill water on your laptop keyboard?


99% of users either don't have backups of their precious photos, or only do because they blindly clicked through "set up iCloud" or their Google/Android or Windows equivalents.


It's different precisely because most social/cloud networks abstract away backups.

Very few people have considered whether or not they should attempt to back up their Facebook account. Same's true for Flickr, Twitter, and Gmail.


Frankly I trust Google and Facebook more than I do myself with regard to backups. I know it will eventually burn me, but I've lost or misplaced my backups, or misplaced the key to them, more than once.

I'm probably in the minority being so irresponsible with my own backups, but I'm not alone.

Google and Facebook have a lot on the line with regard to user trust of their reliability. Also, they can't monetize data that they've lost.


But goosebook and facegle make backups for their own sake. You're still tied by vendor lock-in and can be locked out of your own data on a whim and prevented from switching to another service provider. They can and do monetize data that users have been locked out of.

I'd rather have my own backup copies and take my own responsibilities. The scenario you evoke here would not happen if you had a proper backup strategy: two is one and one is none.


The 'cloud backup' part could still be abstracted away, using integrations with common providers (Dropbox, OneDrive, GDrive, etc). It would be a configuration step, but one that has pretty obvious benefits to the user, so maybe they'd be likely to supply their credentials for that.


Maybe they should?


Maybe but the person's point stands. Facebook has multiple datacenters with probably some kind of backups. It keeps things for years at a time even when it doesn't need to. It likes to because it helps the business model. Hardly anyone's pics and stuff will disappear.

Compare that to their experience at home with personal gear. Many like the convenience and reliability of Facebook over their own technical skills or efforts. You'd have to convince those people... a shitload of people... that they should start handling IT on their own. Also note that there's many good, smart, interesting, and so on people that simply don't do tech. Anyone filtering non-technical or procrastinating people in a service will be throwing out lots of folks whose company they might otherwise enjoy.

So, these kinds of issues are worth exploring when trying to build a better social network.


It only takes one time of having your account locked by facebook or google and said people are automatically convinced of the obvious advantage of being in control of your own data.

Same with backups, lose your data to drive failure or theft once and suddenly having a backup strategy becomes a priority.

But as long as they have not been bitten once they don't care enough to actually do something proactive.


They usually resolve those with access to their data. Their computers getting trashed by malware or breaking is different. It can cost money to do recovery that might give them nothing. That concern is the more common case.


My laptop keyboard has drain holes directing the liquid to the bottom without ever getting inside, so that's one way it is different.


Or you shut your PC down when you leave the house and then want to access your network from your phone.


SSB's central premise -- distributed users, making an ad-hoc network connection whenever they are physically close, or perhaps have some network connection -- bakes in an assumption that a user's ability to connect to the network is sporadic.

It seems like the system would work just as well for people who decide to turn their system off when they go to work, or are on a sailboat. Of course it's not convenient in the same way that always-on social networks are, but that seems to be specifically not the point of SSB.


How about this: you buy a physical device at Wal-Mart for $29.99, plug it in, hook it up to your wifi and leave it plugged into an outlet. It's got Mastodon or GNU Social on it and could look like this, but branded: http://thegadgetflow.com/wp-content/uploads/2015/10/SmartPlu...


And then my internet connection goes down. Power goes out. I run over my data cap for the month. Comcast shuts me down for running a home server. My home network gets DDoS'd. I miss a patch day and I get hacked.

None of these things are a concern on traditional social networks. They have to be solved before the world has any chance of moving to a decentralized network.


> They have to be solved before the world has any chance of moving to a decentralized network.

I say no. Most of them don't have to be solved first.

Feel free to convince me that 3 days of downtime on my personal messaging account is a problem.


It doesn't matter how you feel about it, take a look at people complaining when Google put a news article about Facebook at the top of the results instead of the Facebook login page:

https://www.theguardian.com/technology/blog/2010/feb/11/face...

These are people who typed "Facebook Login" into a Google search, clicked the first result without reading, and got confused. Now tell these same users that Comcast blocked their social network or that they can't log in on their phone because their home Internet connection is down.

If you want a social network filled with just people like you and me, look at App.net or GNU Social for inspiration. If you want average users to sign in, these issues absolutely do have to be solved.


Still not convinced. It doesn't need to be for everyone in the beginning.

The early web (when I entered and before) wasn't for everyone. And that's OK for me. Actually I think it is a good way to start.


"The early web (when I entered and before) wasn't for everyone. And that's OK for me. Actually I think it is a good way to start."

It got where it went by doing the opposite of what you're suggesting. The walled-garden smart elites were mostly working on OSI, from what old-timers tell me. TCP/IP, SMTP, etc. involved lots of hackers trying to avoid doing too much work, much like the users you prefer to filter. Then it just went from there, getting bigger and bigger due to the low barrier to entry. Tons of economic benefits and business models followed. Now we're talking to each other on it.


The problems were solved with centralization.


For this thing I don't care about those problems.


The early web was devoid of average users. Then the flow of newcomers and users who didn't know better surpassed the old-timers and knowledgeable users. Then we entered a race to level the web downward, tailoring it to their needs because they're the large majority.


What do you call traditional social networks? To me a traditional social network is an AFK thing.

If your internet connection goes down, the power goes out, or you get DDoS'd, that would hinder your ability to use any third-party online service anyway. The data cap and restrictive ISP terms of service are a different problem, one that would be challenged and fixed if internet subscribers went the p2p self-hosting way. The commercial ISP situation is a terrible mess right now. If you get hacked: unplug from the network, boot from recovery, restore from backup, and you're back online in less time than it takes to recover a hacked Facebook account.

You say decentralized but it seems to me you meant distributed here.


It's not that hard for the hypothetical manufacturer to set `git pull` and `apt-get update && apt-get upgrade -y` to run on a cronjob every morning at 4am.


Putting another barrier in front of it is not what's needed for people to use it. Treat it like email (or heck, early Facebook): get big clusters of users in by convincing universities and colleges to run a campus server. Businesses would also be a good idea, but a harder sell.


I do quite like this idea. And it has precedent.


Then I have yet another device permanently plugged in and running, at a time where I and frankly all of us should try to reduce our energy consumption.


I'd rather see a universal single consumer server with easy download and plugins for all of this stuff. Host my social network, my mail server, my cloud apps, etc. Basically, make social network a part of OwnCloud and sell OwnCloud boxes. Instead of a million small devices, I do one big one... and "big" can still be RaspPi.


A Raspberry Pi is very low-powered though, it's completely insignificant compared to the power used by the webservices you use daily.


If the power consumption of a RPi for each household with someone like us is a major thing then I say we have come pretty far in reducing waste of energy. :-)


People don't want another thing to plug in even if it's low-power. IoT is still nascent and I already don't have enough outlets in my house...


I disagree. Google uses 1/4GW worldwide and that's less than a watt per user.


Where does that number come from?

One data center * 24 backup generators * 3 MW each = 72 MW per data center. Four of those = 288 MW > 1/4GW.

Google has more than three data centers.


Ah yes, that number is from 2011, published by Google. Can't find the original. But it was widely reported[1].

Assuming that they're doubling energy consumption every year they'd have reached 8GW in 2016. That's 8W per user if we assume 1 billion users. Energy usage of a Raspberry is not insignificant relative to even this.

Doing things at scale is vastly more efficient. And only a subset of Google services can be relegated to a Raspberry. Even if you host your own mails, are you ready to ditch the Google search index and Youtube?

[1] http://www.nytimes.com/2011/09/09/technology/google-details-...
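
The back-of-the-envelope numbers in this subthread can be replayed in a few lines. This sketch only reuses the figures quoted in the comments (1/4 GW in 2011, yearly doubling, 1 billion users, 24 generators of 3 MW each across four data centers). None of them are verified here, and the function names are illustrative.

```python
BASE_YEAR = 2011
BASE_GW = 0.25             # "1/4GW" figure quoted for 2011
USERS = 1_000_000_000      # user count assumed in the comment


def projected_gw(year, base_gw=BASE_GW, base_year=BASE_YEAR):
    """Power draw if consumption doubles every year after base_year."""
    return base_gw * 2 ** (year - base_year)


def watts_per_user(gigawatts, users=USERS):
    """Convert a total in gigawatts to watts per user."""
    return gigawatts * 1e9 / users


# Generator-based estimate from the earlier reply: 24 generators x 3 MW x 4 sites.
generator_estimate_mw = 24 * 3 * 4

gw_2016 = projected_gw(2016)
print(generator_estimate_mw)           # 288 (MW)
print(gw_2016)                         # 8.0 (GW)
print(watts_per_user(gw_2016))         # 8.0 (W per user)
print(watts_per_user(BASE_GW))         # 0.25 (W per user in 2011)
```

The doubling assumption does almost all the work: five doublings turn 0.25 W per user into 8 W per user.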


It's always going to be running somewhere.


Ha. Good joke.


> Not being able to switch nodes pushes you to try and host your own instead.

And you just use it for yourself? Well, OK... but at that point you could also use a fully decentralized system.


You can follow and reply to any one on any node from your own system. Your feed is populated from people on any node you follow.


> [...] which also lets you have a reputation system and ban spammers which you can't do easily with a pure DHT.

Sounds interesting. How can you ban spammers when they can just create a new public key/identity if their old one is banned? And also, what does "banning" comprise, in a decentralized social network? I would assume it would be sufficient to just "unfollow" that particular identity.


It seems like you could get some of the efficiency gains of having lots of people on one node, but avoid the difficulty of moving, by setting up a new node for each person even if they are hosted on the same box. That way you can pick up the whole node with your account and move it, instead of trying to move your account from one node to another.


The Fejoa project (https://fejoa.org) actually targets this problem. It aims to make it possible for users to change their hosting server without losing their contacts.


I mean, my view on the switching-nodes thing is: it's not like you can just switch emails either. Sure, you can install a shim in there to forward everything, but there is no way to actually switch it. And that is the design GNU Social uses.


Yeah, and that's also a pain point with email.


It's a pain point with the entire design of federated services. On the other hand, the pain point with the monolithic services design of Facebook is I can't even talk to people on other services so...

I'd rather be able to email everybody and have it be annoying to switch than be able to only email people on my chosen provider (and then have to make an account on every service anyway).


> However, to get access to the DHT in the first place, you need to connect to a bootstrapping server, such as router.bittorrent.com:6881 or router.utorrent.com:6881

This is a common misunderstanding. You do not need to use those nodes to bootstrap. Most clients simply choose to because it is the most convenient way to do so on the given substrate (the internet). DHTs are in no way limited to specific bootstrap nodes, any node that can be contacted can be used to join the network, the protocol itself is truly distributed.

If the underlying network provides some hop-limited multicast or anycast a DHT could easily bootstrap via such queries. In fact, bittorrent clients already implement multicast neighbor discovery which under some circumstances can result in joining the DHT without any hardcoded bootstrap node.


I think you're being uncharitable in attributing a misunderstanding. The OP used the phrase "in the first place", and it's mostly correct that if you don't have any cached nodes (hence that phrase), the bootstrap nodes do in fact act as a single point of failure for you.

The multicast neighbor discovery is a neat idea. I wonder what percentage of clients/connections it results in successful bootstrapping for.


You can ship a client with a long list of "cached" nodes that were verified to be long-lived. I mean you need to obtain the client at some point, you can gather a fresh list of nodes along with it. From that point onward you keep your own cache fresh.

You could also run your own bootstrap node on an always-up server if downtimes making the lists stale is a concern.

You can also inject contacts when starting the client, you would have to obtain them out-of-band from somewhere of course, but it still does not require anything centralized.

If you're desperate you could also just sweep allocated IPv4 blocks and DHT-ping port 6881, you'll probably find one relatively fast. Of course that doesn't work with v6.

So there is no centralization and no single point of failure.
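
A rough sketch of that fallback order, for illustration only: the `bootstrap` function, the `probe` callback and the sample addresses (taken from documentation-only IP ranges) are assumptions made for this example and do not come from any particular client. A real `probe` would send a DHT ping and wait for a reply.

```python
def bootstrap(sources, probe):
    """Try contacts from each source in order; return the first that answers."""
    seen = set()
    for source in sources:
        for contact in source:
            if contact in seen:
                continue            # don't probe the same node twice
            seen.add(contact)
            if probe(contact):
                return contact
    return None


# Hypothetical contact lists, ordered from most to least preferred.
own_cache = [("192.0.2.10", 6881), ("192.0.2.11", 6881)]
injected = [("198.51.100.7", 6881)]            # obtained out-of-band
shipped_with_client = [("192.0.2.10", 6881), ("203.0.113.5", 6881)]
hardcoded = [("router.bittorrent.com", 6881)]

# Stand-in for a real network probe: pretend only these nodes are reachable.
reachable = {("203.0.113.5", 6881), ("router.bittorrent.com", 6881)}
first = bootstrap([own_cache, injected, shipped_with_client, hardcoded],
                  probe=lambda contact: contact in reachable)
print(first)
```

With this ordering the hardcoded router is only contacted if every other source fails, so it is a convenience rather than a requirement.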

> The multicast neighbor discovery is a neat idea. I wonder what percentage of clients/connections it results in successful bootstrapping for.

It could work on a college campus, some conference network or occasionally some open wifi. Additionally there are some corporate bittorrent deployments where peer discovery via multicast can make sense.

If I understand TFA correctly scuttlebutt assumes(?) roaming through wifis and LANs. Those circumstances are ideal for multicast bootstrapping, so in principle the DHT can perform just as well as scuttlebutt, probably even better because once it has bootstrapped it can use the global DHT to keep contact with the network even if there is no lan-local peer to be discovered.


> You can ship a client with a long list of "cached" nodes that were verified to be long-lived. I mean you need to obtain the client at some point, you can gather a fresh list of nodes along with it. From that point onward you keep your own cache fresh.

There is no semantic difference between the two. The only difference is when you connect to the single-point-of-truth bootstrap, at download time (well, technically build-time) or at first startup time. And the latter probably gives you a more current, and not limited to long-lived nodes, thus better, answer.

> You could also run your own bootstrap node on an always-up server if downtimes making the lists stale is a concern.

Which itself needs to be bootstrapped. And once it is, it's equivalent to your local cache.


These are excellent ideas. Are any of them implemented? If I download e.g. uTorrent today and firewall off the hardcoded public bootstrap nodes, will it bootstrap?


> If I download e.g. uTorrent today and firewall off the hardcoded public bootstrap nodes, will it bootstrap?

Possibly, which mechanisms are used varies from client to client. Usually DHT bootstrap is not a primary goal but a side-effect of other mechanisms. Things that work in some clients:

  magnet -> tracker -> peer -> dht ping
  torrent -> tracker -> peer -> dht ping
  magnet -> contains direct peer -> peer -> dht ping
  torrent or magnet -> multicast discovery -> peer -> dht ping
  torrent -> contains a list of dht node ip/port pairs

As you can see all but the last piggyback on regular torrent connections. But that's more because file transfers are the primary purpose and the DHT is not the raison d'être of those implementations. If DHT connectivity were considered more important clients would also try more direct approaches.
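
For the curious, the "dht ping" step at the end of each chain is a small bencoded message sent over UDP. Below is a minimal sketch that builds the ping query in the format described in BEP 5; the helper names are made up for this example, and nothing is actually sent.

```python
def bencode(value) -> bytes:
    """Encode ints, strings/bytes, lists and dicts in bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("ascii")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys must appear in sorted order.
        items = sorted(
            ((k.encode("ascii") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def ping_query(node_id: bytes, transaction_id: bytes = b"aa") -> bytes:
    """Build a KRPC 'ping' query carrying our 20-byte node ID."""
    if len(node_id) != 20:
        raise ValueError("node IDs are 20 bytes (160 bits)")
    return bencode({"t": transaction_id, "y": "q", "q": "ping",
                    "a": {"id": node_id}})


message = ping_query(b"abcdefghij0123456789")
print(message)
```

A client would send these bytes to the peer's DHT port with a UDP socket; a node that answers can then be added to the routing table.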


> You can ship a client with a long list of "cached" nodes that were verified to be long-lived. I mean you need to obtain the client at some point, you can gather a fresh list of nodes along with it. From that point onward you keep your own cache fresh.

I believe this is how bitcoin works. Or at least it used to.


To me it always sounds like approaches like DHTs are the solution, but I'm having difficulties diving into them for the purpose of implementing one for my own apps.

Are there any noteworthy resources for non-academics to get started?


Well, for an in-depth understanding you will ultimately have to read the academic papers on specific DHT algorithms, but you don't have to be an academic to read academic papers, no? Besides that there are the usual resources for higher-level overview or gleaning some details: wikipedia, protocol specifications, toy implementations on github, stack overflow, various blog posts/articles that can be found via google.

But a DHT is usually just a low-level building block in more complex p2p systems. As its name says it's simply a distributed hash table. A data structure on a network. It just gives you a distributed key-value pair store where the values are often required to be small. In itself it doesn't give you trust, two-way communication, discovery or anything like that. Those are often either tacked on as ad-hoc features, handled by separate protocols or require some tricky cryptography.
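
To make "a data structure on a network" concrete, here is a toy, single-process sketch. It assumes Kademlia-style placement, where a key lives on the nodes whose IDs are closest to the key's hash by XOR distance. The `ToyDHT` class is purely illustrative and has no networking, routing, churn handling or security.

```python
import hashlib


def to_id(data: bytes) -> int:
    """Map arbitrary bytes into a 160-bit ID space using SHA-1."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


class ToyDHT:
    def __init__(self, node_names, replicas=3):
        # node ID -> that node's local key/value store
        self.nodes = {to_id(name.encode()): {} for name in node_names}
        self.replicas = replicas

    def closest(self, target: int):
        """IDs of the nodes nearest to target by XOR distance."""
        return sorted(self.nodes, key=lambda node_id: node_id ^ target)[: self.replicas]

    def put(self, key: bytes, value: bytes) -> None:
        for node_id in self.closest(to_id(key)):
            self.nodes[node_id][key] = value

    def get(self, key: bytes):
        for node_id in self.closest(to_id(key)):
            if key in self.nodes[node_id]:
                return self.nodes[node_id][key]
        return None


dht = ToyDHT([f"node{i}" for i in range(20)])
dht.put(b"some-key", b"small value")
print(dht.get(b"some-key"))
```

Everything that makes a real DHT hard (finding the closest nodes without a global view, nodes leaving, hostile nodes) is exactly what this sketch leaves out.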


Speaking as an academic who studies distributed systems, my advice is to stay away from anything that relies on a public DHT to work correctly. They're vulnerable to node churn, Sybil attacks, and routing attacks.

The last two are particularly devastating. Even if the peers had a key/value whitelist and hashes (e.g. like a .torrent file), an adversary can still insert itself into the routing tables of honest nodes and prevent peers from ever discovering your key/value pairs. Moreover, they can easily spy on everyone who tries to access them. It is estimated [1] that 300,000 of the BitTorrent DHT's nodes are Sybils, for example.

[1] https://www.cl.cam.ac.uk/~lw525/publications/security.pdf


In practice none of those attacks have yet reached a level of concern for bittorrent developers to deploy serious countermeasures. Torrents generally are considered public data, especially those made available through the DHT, and provide peer exchange which allows near-complete extraction of peer lists anyway, so it hardly introduces any new privacy leaks. Although maintaining secrecy while exchanging data over public infrastructure is desirable, that can be achieved by encrypting the payload instead of obscuring the fact that you participated in the network at all.

BEP42[0] has been implemented by many clients and yet nobody has felt the need to actually switch to enforcement mode.

All that is the result of the bittorrent DHT being a low-value target. It does not contain any juicy information and is just one of multiple peer discovery mechanisms, so there's some redundancy too.

[0] http://bittorrent.org/beps/bep_0042.html


> Although maintaining secrecy while exchanging data over public infrastructure is desirable, that can be achieved by encrypting the payload instead of obscuring the fact that you participated in the network at all.

If I'm "in" on the sharing, then I learn the IP addresses (and ISPs and proximate locations) of the other people downloading the shared file. Moreover, if I control the right hash buckets in the DHT's key space, I can learn from routing queries who's looking for the content (even if they haven't begun to share it yet). Encryption alone does not make file-sharing a private affair.

> BEP42[0] has been implemented by many clients and yet nobody has felt the need to actually switch to enforcement mode.

It also does not appear to solve the problem. The attacker only needs to get control of hash buckets to launch routing attacks. Even with a small number of unchanging node IDs, the attacker is still free to insert a pathological sequence of key/value pairs to bump hash buckets from other nodes to them.

> All that is the result of the bittorrent DHT being a low-value target. It does not contain any juicy information and is just one of multiple peer discovery mechanisms, so there's some redundancy too.

Are you suggesting that high-value apps should not rely on a DHT, then?


> Encryption alone does not make file-sharing a private affair.

Someone who is "in" on encrypted content can observe the swarm anyway, thus gains very little from performing snooping on a DHT. On the other hand a passive DHT observer who is not "in" will be hampered by not knowing what content is shared, he only sees participation in opaque hashes. Additionally payload encryption adds deniability because anyone can transfer the ciphertext but participants won't know whether others have the necessary keys to decrypt it.

What I'm saying is that any information leakage via the DHT (compared to public trackers and PEX) is quite small, and this small loss can be more than made up by adding payload encryption.

> the attacker is still free to insert a pathological sequence of key/value pairs to bump hash buckets from other nodes to them.

There is no bumping in kademlia with unbounded node storage. And clients with limited storage can make bumping very hard for others with oldest-first and one-per-subnet policies, i.e. bumping the attackers instead of genuine keys.
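A minimal sketch of the two admission policies mentioned above, assuming a bucket capacity of 8 and treating each IPv4 /24 as one subnet. The `Bucket` class and its `offer` method are hypothetical names, not taken from any particular client.

```python
import ipaddress
from dataclasses import dataclass, field

K = 8  # assumed bucket capacity


@dataclass
class Bucket:
    # node_id -> ip; dict insertion order doubles as age order (oldest first)
    entries: dict = field(default_factory=dict)

    @staticmethod
    def _subnet(ip):
        # Illustrative choice: one "subnet" is an IPv4 /24.
        return ipaddress.ip_network(f"{ip}/24", strict=False)

    def offer(self, node_id, ip):
        """Return True if the node is (or already was) in the bucket."""
        if node_id in self.entries:
            return True  # already known; it keeps its original age
        subnet = self._subnet(ip)
        if any(self._subnet(known) == subnet for known in self.entries.values()):
            return False  # one-per-subnet: a second node from the same /24 is refused
        if len(self.entries) >= K:
            return False  # oldest-first: newcomers never bump established entries
        self.entries[node_id] = ip
        return True
```

Real clients typically also ping the oldest entry and replace it only if it fails to respond; that step is left out here.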

> Are you suggesting that high-value apps should not rely on a DHT, then?

No, they should use DHT as a bootstrap mechanism of easy-to-replicate, difficult-to-disrupt small bits of information (e.g. peer contacts as in bittorrent) which then run their own content-specific gossip network for the critical content. In some contexts it can also make sense to make reverse lookups difficult, so attackers won't know what to disrupt unless they're already part of some group.


> Someone who is "in" on encrypted content can observe the swarm anyway, thus gains very little from performing snooping on a DHT.

I can see that this thread is getting specific to Bittorrent, and away from DHTs in general. Regardless, I'm not sure if this is the case. Please correct me if I'm wrong:

* If I can watch requests on even a single copy of a single key/value pair in the DHT, I can learn some of the IP addresses asking for it (and when they ask for it).

* If I can watch requests on all copies of the key/value pair, then I can learn all the interested IP addresses and the times when they ask.

* If I can do this for the key/value pairs that make up a .torrent file, then I can (1) get the entire .torrent file and learn the list of file hashes, and (2) find out the IPs who are interested in the .torrent file.

* If I can then observe any of the key/value pairs for the .torrent file hashes, then I can learn which IPs are interested in and can serve the encrypted data (and the times at which they do so).

This does not strike me as "quite small," but that's semantics.

> There is no bumping in kademlia with unbounded node storage. And clients with limited storage can make bumping very hard for others with oldest-first and one-per-subnet policies, i.e. bumping the attackers instead of genuine keys.

Yes, the DHT nodes can employ heuristics to try to stop this, just like how BEP42 is a heuristic to thwart Sybils. But that's not the same as solving the problem. Applications that need to be reliable have to be aware of these limits, and anticipate them in their design.

> No, they should use DHT as a bootstrap mechanism of easy-to-replicate, difficult-to-disrupt small bits of information (e.g. peer contacts as in bittorrent) which then run their own content-specific gossip network for the critical content. In some contexts it can also make sense to make reverse lookups difficult, so attackers won't know what to disrupt unless they're already part of some group.

This kind of proves my point. You're recommending that applications not rely on DHTs, but instead use their own content-specific gossip network.

To be fair, I'm perfectly okay with using DHTs as one of a family of solutions for addressing one-off or non-critical storage problems (like bootstrapping). But the point I'm trying to make is that they're not good for much else, and developers need to be aware of these limits if they want to use a DHT for anything.

EDIT: formatting


> This does not strike me as "quite small," but that's semantics.

It is quite small because bittorrent needs to use some peer source. If you're not using the DHT you're using a tracker. The same information that can be obtained from the DHT can be obtained from trackers. So there's no novel information leakage introduced by the DHT.

That's why the DHT does not really pose a big information leak.

> This kind of proves my point. You're recommending that applications not rely on DHTs, but instead use their own content-specific gossip network.

That's not what I said. Relying on a DHT for some parts, such as bootstrap and discovery is still... well... relying on it, for things it is good at.

> But the point I'm trying to make is that they're not good for much else, and developers need to be aware of these limits if they want to use a DHT for anything.

Well yes, but these limits arise naturally anyway since A stores data for B on C and you can't really incentivize C to manage anything more than small bits of data.

> I can see that this thread is getting specific to Bittorrent

About DHTs in general, you can easily make reverse lookups difficult or impossible by hashing the keys (bittorrent doesn't because the inputs already are hashes). You can obfuscate lookups by making them somewhat off-target until they're close to the target, and by making data lookups and maintenance lookups indistinguishable. You can further add plausible deniability by replaying recently-seen lookups when doing maintenance of nearby buckets.
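A minimal sketch of the key-hashing idea, assuming SHA-256 and a hypothetical helper name `storage_key`. A node that stores or routes the key only ever sees the digest, so it cannot recover the application-level name unless it can already guess that name.

```python
import hashlib


def storage_key(lookup_name: bytes) -> bytes:
    """DHT key under which a value is stored: a hash of the application-level name."""
    return hashlib.sha256(lookup_name).digest()


# Anyone who already knows the name can recompute the key;
# an observer holding only the key cannot invert it.
key = storage_key(b"example-channel-name")  # hypothetical name
```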


> It is quite small because bittorrent needs to use some peer source. If you're not using the DHT you're using a tracker. The same information that can be obtained from the DHT can be obtained from trackers. So there's no novel information leakage introduced by the DHT.

Replacing a tracker with a DHT trades having one server with all peer and chunk knowledge with N servers with partial peer and chunk knowledge. If the goal is to stop unwanted eavesdroppers, then the choice is between (1) trusting that a single server that knows everything will not divulge information, or (2) trusting that an unknown, dynamic number of servers that anyone can run (including the unwanted eavesdroppers) will not divulge partial information.

The paper I linked up the thread indicates that unwanted eavesdroppers can learn a lot about the peers with choice (2) by exploiting the ways DHTs operate. Heuristics can slow this down, but not stop it. With choice (1), it is possible to fully stop unwanted eavesdroppers if peers can trust the tracker and communicate with it confidentially. There is no such possibility with choice (2) if the eavesdropper can run DHT nodes.

> That's not what I said. Relying on a DHT for some parts, such as bootstrap and discovery is still... well... relying on it, for things it is good at.

> Well yes, but these limits arise naturally anyway since A stores data for B on C and you can't really incentivize C to manage anything more than small bits of data.

Thank you for clarifying. Would you agree that reliable bootstrapping and reliable steady-state behavior are two separate concerns in the application? I'm mainly concerned with the latter; I would never make an application's steady-state behavior dependent on a DHT's ability to keep data available. In addition, bootstrapping information like initial peers and network settings can be obtained through other channels (e.g. DNS servers, user-given configuration, multicasting), which further decreases the need to rely on DHTs.

> About DHTs in general, you can easily make reverse lookups difficult or impossible by hashing the keys (bittorrent doesn't because the inputs already are hashes), you can obfuscate lookups by making them somewhat off-target until they're close to the target and making data-lookups and maintenance lookups indistinguishable. You can further add plausible deniability by replaying recently-seen lookups when doing maintenance of nearby buckets.

I'm not quite sure what you're saying here, but it sounds like you're saying that a peer can obfuscate lookups by adding "noise" (e.g. doing additional, unnecessary lookups). If so, then my reply would be this only increases the number of samples an eavesdropper needs to make to unmask a peer. To truly stop an eavesdropper, a peer needs to ensure that queries are uniformly distributed in both space and time. This would significantly slow down the peer's queries and consume a lot of network bandwidth, but it would stop the eavesdropper. I don't know of any production system that does this.


> If the goal is to stop unwanted eavesdroppers, then the choice is between (1) trusting that a single server that knows everything will not divulge information

In practice trackers do divulge all the same information that can be gleaned from the DHT and so does PEX in a bittorrent swarm. Those are far more convenient to harvest.

> I'm not quite sure what you're saying here, but it sounds like you're saying that a peer can obfuscate lookups by adding "noise" (e.g. doing additional, unnecessary lookups).

That's only 2 of 4 measures I have listed. And I would mention encryption again as a 5th. The others: a) Opportunistically creating decoys by having others repeat lookups they have recently seen as part of their routing table maintenance b) storing data in the DHT in a way that requires some prior knowledge to be useful, which will ideally result in only leaking information when the listener could obtain the information anyway if he has that prior knowledge.

There's a lot you can do to harden DHTs. I agree that naive implementations are trivial to attack, but to my knowledge it is possible to achieve byzantine fault tolerance in a DHT in principle, it's just that nobody has actually needed that level of defense yet, attacks in the wild tend to be fairly primitive and only succeed because some implementations are very sloppy about sanitizing things.

> To truly stop an eavesdropper, a peer needs to ensure that queries are uniformly distributed in both space and time.

Not quite. You only need to increase the number of samples needed beyond the number of samples a peer is likely to generate during some lifecycle, and that is not just done by adding more traffic.

> Would you agree that reliable bootstrapping and reliable steady-state behavior are two separate concerns in the application?

Certainly, but bootstrapping is a task that you do more frequently than you think. You don't just join a global overlay once, you also (re)join many sub-networks throughout each session or look for specific nodes. DHT is a bit like DNS. You only need it once a day for a domain (assuming long TTLs), and it's not exactly the most secure protocol and afterwards you do the heavy authentication lifting with TLS, but DNS is still important, even if you're not spending lots of traffic on it.


> In practice trackers do divulge all the same information that can be gleaned from the DHT and so does PEX in a bittorrent swarm. Those are far more convenient to harvest.

I'm confused. I can configure a tracker to only communicate with trusted peers, and do so over a confidential channel. The tracker is assumed to not leak peer information to external parties. A DHT can do neither of these.

> That's only 2 of 4 measures I have listed. And I would mention encryption again as a 5th. The others: a) Opportunistically creating decoys by having others repeat lookups they have recently seen as part of their routing table maintenance b) storing data in the DHT in a way that requires some prior knowledge to be useful, which will ideally result in only leaking information when the listener could obtain the information anyway if he has that prior knowledge.

Unless the externally-observed schedule of key/value requests is statistically random in time and space, the eavesdropper can learn with better-than-random guessing which peers ask for which chunks. Neither (a) nor (b) address this; they simply increase the number of samples required.

> There's a lot you can do to harden DHTs. I agree that naive implementations are trivial to attack, but to my knowledge it is possible to achieve byzantine fault tolerance in a DHT in principle, it's just that nobody has actually needed that level of defense yet, attacks in the wild tend to be fairly primitive and only succeed because some implementations are very sloppy about sanitizing things.

First, no system can tolerate Byzantine faults if over a third of its nodes are hostile. If I can Sybil a DHT, then I can spin up arbitrarily many evil nodes. Are we assuming that no more than one third of the DHT's nodes are evil?

Second, "nobody has actually needed that level of defense yet" does not mean that it is a sound decision for an application to use a DHT with the expectation that the problems will never occur. So the maxim goes, "it isn't a problem, until it is." As an application developer, I want to be prepared for what happens when it is a problem, especially since the problems are known to exist and feasible to exacerbate.

> Not quite. You only need to increase the number of samples needed beyond the number of samples a peer is likely to generate during some lifecycle, and that is not just done by adding more traffic.

I'm assuming that peers are arbitrarily long-lived. Real-world distributed systems like BitTorrent and Bitcoin aspire to this.

> Certainly, but bootstrapping is a task that you do more frequently than you think. You don't just join a global overlay once, you also (re)join many sub-networks throughout each session or look for specific nodes. DHT is a bit like DNS. You only need it once a day for a domain (assuming long TTLs), and it's not exactly the most secure protocol and afterwards you do the heavy authentication lifting with TLS, but DNS is still important, even if you're not spending lots of traffic on it.

I take issue with saying that "DHTs are like DNS", because they offer fundamentally different data consistency guarantees and availability guarantees (even Beehive (DNS over DHTs) is vulnerable to DHT attacks that do not affect DNS).

Regardless, I'm okay with using a DHT as one of many supported bootstrapping mechanisms. I'm not okay with using it as the sole mechanism or even the primary mechanism, since they're so easy to break when compared to other mechanisms.


> I'm confused. I can configure a tracker to only communicate with trusted peers, and do so over a confidential channel. The tracker is assumed to not leak peer information to external parties. A DHT can do neither of these.

But then you are running a private tracker for personal/closed group use and have a trust source. If you have a trust source you could also run a closed DHT. But the bittorrent DHT is public infrastructure and best compared to public trackers.

> I'm assuming that peers are arbitrarily long-lived. Real-world distributed systems like BitTorrent and Bitcoin aspire to this.

Physical machines are. Their identities (node IDs, IP addresses) and the content they participate in at any given time don't need to be.

> If I can Sybil a DHT, then I can spin up arbitrarily many evil nodes.

This can be made costly. In the extreme case you could require a bitcoin-like proof of work system for node identities. But that would be wasteful... unless you're running some coin network anyway, then you can tie your ID generation to that. In lower-value targets IP prefixes tend to be costly enough to thwart attackers. If an attacker can muster the resources to beat that he would also have enough unique machines at his disposal to perform a DoS on more centralized things.
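A minimal hashcash-style sketch of proof-of-work node identities, assuming SHA-256 and a deliberately tiny difficulty so it runs instantly. `DIFFICULTY_BITS`, `mint_node_id` and `verify_node_id` are hypothetical names; a real deployment would pick the difficulty and the identity encoding itself.

```python
import hashlib

DIFFICULTY_BITS = 12  # assumed, and far too low for real use


def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the start of the digest."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def _digest(pubkey: bytes, nonce: int) -> bytes:
    return hashlib.sha256(pubkey + nonce.to_bytes(8, "big")).digest()


def mint_node_id(pubkey: bytes, bits: int = DIFFICULTY_BITS):
    """Search for a nonce whose digest has enough leading zero bits (costly)."""
    nonce = 0
    while leading_zero_bits(_digest(pubkey, nonce)) < bits:
        nonce += 1
    return nonce, _digest(pubkey, nonce)


def verify_node_id(pubkey: bytes, nonce: int, node_id: bytes,
                   bits: int = DIFFICULTY_BITS) -> bool:
    """Check a claimed identity with a single hash (cheap)."""
    digest = _digest(pubkey, nonce)
    return digest == node_id and leading_zero_bits(digest) >= bits
```

Each extra difficulty bit doubles the expected minting work while verification stays at one hash, which is what makes mass-producing Sybil identities expensive.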

> Are we assuming that no more than one third of the DHT's nodes are evil?

Assuming is the wrong word. I think approaching BFT is simply part of what you do to harden a DHT against attackers.

> Second, "nobody has actually needed that level of defense yet" does not mean that it is a sound decision for an application to use a DHT with the expectation that the problems will never occur.

I haven't said that. I'm saying that simply because this kind of defense was not yet needed, nobody tried to build it, as simple as that. Sophisticated security comes with implementation complexity, that's why we had HTTP for ages before HTTPS adoption was spurred by the Snowden leaks.

> Neither (a) nor (b) address this; they simply increase the number of samples required.

(b) is orthogonal to sampling vs. noise.

> I'm not okay with using it as the sole mechanism or even the primary mechanism, since they're so easy to break when compared to other mechanisms.

What other mechanisms do you have in mind? Most that I am aware of don't offer the same O(log n) node-state and lookup complexity in a distributed manner.


> But then you are running a private tracker for personal/closed group use and have a trust source. If you have a trust source you could also run a closed DHT. But the bittorrent DHT is public infrastructure and best compared to public trackers.

You're ignoring the fact that with a public DHT, the eavesdropper has the power to reroute requests through networks (s)he can already watch. With a public tracker, the eavesdropper needs vantage points in the tracker's network to gain the same insights.

If we're going to do an apples-to-apples comparison between a public tracker and a public DHT, then I'd argue that they are equivalent only if:

(1) the eavesdropper cannot add or remove nodes in the DHT; (2) the eavesdropper cannot influence other nodes' routing tables in a non-random way.

> This can be made costly. In the extreme case you could require a bitcoin-like proof of work system for node identities. But that would be wasteful... unless you're running some coin network anyway, then you can tie your ID generation to that. In lower-value targets IP prefixes tend to be costly enough to thwart attackers. If an attacker can muster the resources to beat that he would also have enough unique machines at his disposal to perform a DoS on more centralized things.

Funny you should mention this. At the company I work part-time for (blockstack.org), we thought of doing this very thing back when the system still used a DHT for storing routing information.

We had the additional advantage of having a content whitelist: each DHT key was the hash of its value, and each key was written to the blockchain. Blockstack ensured that each node calculated the same whitelist. This meant that inserting a key/value pair required a transaction, and the number of key/value pairs could grow no faster than the blockchain.
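A minimal sketch of such a whitelist check, assuming SHA-256 hex digests as keys. `WhitelistedStore` is a hypothetical name, and the whitelist is passed in directly rather than derived from a blockchain.

```python
import hashlib


class WhitelistedStore:
    def __init__(self, whitelist):
        # Keys that were announced on-chain (supplied by the caller in this sketch).
        self.whitelist = set(whitelist)
        self.data = {}

    def put(self, key: str, value: bytes) -> bool:
        """Accept a value only if its key is whitelisted and is the hash of the value."""
        if key not in self.whitelist:
            return False  # never announced: reject
        if hashlib.sha256(value).hexdigest() != key:
            return False  # key does not match content: reject
        self.data[key] = value
        return True
```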

This was not enough to address data availability problems. First, the attacker would still have the power to push hash buckets onto attacker-controlled nodes (it would just be expensive). Second, the attacker could still join the DHT and censor individual routes by inserting itself as neighbors of the target key/value pair replicas.

The best solution we came up with was one whereby DHT node IDs would be derived from block headers (i.e. deterministic but unpredictable), and registering a new DHT node would require an expensive transaction with an ongoing proof-of-burn to keep it. In addition, our solution would have required that every K blocks, the DHT nodes would deterministically re-shuffle their hash buckets among themselves in order to throw off any encroaching routing attacks.

We ultimately did not do this, however, because having the set of whitelisted keys growing at a fixed rate afforded a much more reliable solution: have each node host a 100% replica of the routing information, and have nodes arrange themselves into a K-regular graph where each node selects neighbors via a random walk and replicates missing routing information in rarest-first order. We have published details on this here: https://blog.blockstack.org/blockstack-core-v0-14-0-release-....
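A minimal sketch of rarest-first ordering, not Blockstack's actual code. It assumes each neighbor advertises the set of keys it holds; `rarest_first` and its parameters are hypothetical names.

```python
from collections import Counter


def rarest_first(missing, neighbor_inventories):
    """Order the keys we lack so that those held by the fewest neighbors come first."""
    missing = set(missing)
    holders = Counter()
    for inventory in neighbor_inventories:
        holders.update(missing & set(inventory))
    # Keys that no neighbor holds cannot be fetched yet, so they are skipped.
    fetchable = [key for key in missing if holders[key] > 0]
    # Tie-break on the key itself so the order is deterministic.
    return sorted(fetchable, key=lambda key: (holders[key], key))
```

Fetching the rarest items first raises their replica count soonest, which is the same reasoning bittorrent clients use for piece selection.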

> Assuming is the wrong word. I think approaching BFT is simply part of what you do to harden a DHT against attackers.

If you go for BFT, you have to assume that no more than f of 3f+1 nodes are faulty. Otherwise, the malicious nodes will always be able to prevent the honest nodes from reaching agreement.
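The bound is easy to put numbers on. A minimal sketch with hypothetical function names:

```python
def max_faulty(n: int) -> int:
    """Largest f such that n >= 3f + 1."""
    return (n - 1) // 3


def tolerates(n: int, faulty: int) -> bool:
    """True if n nodes can still reach agreement with `faulty` Byzantine nodes."""
    return n >= 3 * faulty + 1


# With 10 nodes, 3 faulty nodes are tolerable and 4 are not.
print(max_faulty(10), tolerates(10, 3), tolerates(10, 4))
```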

> I haven't said that. I'm saying that simply because this kind of defense was not yet needed nobody tried to build it, as simple as that. Sophisticated security comes with implementation complexity, that's why we had HTTP for ages before HTTPS adoption was spurred by the Snowden leaks.

Right. HTTP's lack of security wasn't considered a problem, until it was. Websites addressed this by rolling out HTTPS in droves. I'm saying that in the distributed systems space, DHTs are the new HTTP.

> What other mechanisms do you have in mind? Most that I am aware of don't offer the same O(log n) node-state and lookup complexity in a distributed manner.

How about an ensemble of bootstrapping mechanisms?

* give the node a set of initial hard-coded neighbors, and maintain those neighbors yourself.

* have the node connect to an IRC channel you maintain and ask an IRC bot for some initial neighbors.

* have the node request a signed file from one of a set of mirrors that contains a list of neighbors.

* run a DNS server that lists currently known-healthy neighbors.

* maintain a global public node directory and ship it with the node download.

I'd try all of these things before using a DHT.
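A minimal sketch of such an ensemble: try each mechanism in turn and merge whatever peers come back. Each source is a zero-argument callable supplied by the application; the `bootstrap` function and the `want` parameter are hypothetical.

```python
from typing import Callable, Iterable, List


def bootstrap(sources: Iterable[Callable[[], List[str]]], want: int = 8) -> List[str]:
    """Collect up to `want` distinct peer addresses from the first sources that work."""
    peers: List[str] = []
    for source in sources:
        try:
            found = source()
        except Exception:
            continue  # this mechanism is down; fall through to the next one
        for peer in found:
            if peer not in peers:
                peers.append(peer)
        if len(peers) >= want:
            break  # enough neighbors; skip the remaining mechanisms
    return peers[:want]
```

A caller would pass its own functions, for example one returning the hard-coded neighbors, one doing the DNS query, and one fetching the signed mirror file.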

EDIT: formatting


> You're ignoring the fact that with a public DHT, the eavesdropper has the power to reroute requests through networks (s)he can already watch.

But in the context of bittorrent that is not necessary if we're still talking about information leakage. The tracker + pex gives you the same, and more, information than watching the DHT.

> we thought of doing this very thing back when the system still used a DHT for storing routing information.

The approaches you list seem quite reasonable when you have a PoW system at your disposal.

> have each node host a 100% replica of the routing information, and have nodes arrange themselves into a K-regular graph

This is usually considered too expensive in the context of non-coin/-blockchain p2p networks because you want nodes to be able to run on embedded and other resource-constrained devices. The O(log n) node state and bootstrap cost limits are quite important. Otherwise it would be akin to asking every mobile phone to keep up to date with the full BGP route set.

> assume that no more than f of 3f+1 nodes are faulty. Otherwise, the malicious nodes will always be able to prevent the honest nodes from reaching agreement.

Of course, but for some applications that is more than good enough. If your adversary can bring enough resources to bear to take over 1/3rd of your network he might as well DoS any target he wants. So you would be facing massive disruption anyway. I mean blockchains lose some of their security guarantees too once someone manages to dominate 1/2 of the mining capacity. Same order of magnitude. It's basically the design domain "secure, up to point X".

> I'm saying that in the distributed systems space, DHTs are the new HTTP.

I can agree with that, but I think the S can be tacked on once people feel the need.

> How about an ensemble of bootstrapping mechanisms?

The things you list don't really replace the purpose of a DHT. A DHT is a key-value store for many keys and a routing algorithm to find them in a distributed environment. What you listed just gives you a bunch of nodes, but no data lookup capabilities. Essentially you're listing things that could be used to bootstrap into a DHT, not replacing the next layer services provided by a DHT.
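A minimal sketch of the routing half in a Kademlia-style DHT, using small integers as IDs for readability (real IDs are 160 bits in the bittorrent DHT). The function names are hypothetical.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric between two IDs."""
    return a ^ b


def bucket_index(own_id: int, other_id: int) -> int:
    """Routing-table bucket for another node: position of the highest differing bit."""
    return (own_id ^ other_id).bit_length() - 1


def closest(nodes, target: int, k: int = 3):
    """The k known nodes nearest to a key; a lookup asks these for even closer ones."""
    return sorted(nodes, key=lambda node: node ^ target)[:k]
```

Each lookup step moves to nodes sharing at least one more prefix bit with the target, which is where the O(log n) hop count and routing-table size come from.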


> This is usually considered too expensive in the context of non-coin/-blockchain p2p networks because you want nodes to be able to run on embedded and other resource-constrained devices. The O(log n) node state and bootstrap cost limits are quite important. Otherwise it would be akin to asking every mobile phone to keep up to date with the full BGP route set.

Funny you should mention BGP. We have been approached by researchers at Princeton who are interested in doing something like that, using Blockstack (but to be fair, they're more interested in giving each home router a copy of the global BGP state).

I totally hear you regarding the costly bootstrapping. In Blockstack, for example, we expect most nodes to sync up using a recent signed snapshot of the node state and then use SPV headers to download the most recent transactions. It's a difference between minutes and days for booting up.

> Of course, but for some applications that is more than good enough. If your adversary can bring enough resources to bear to take over 1/3rd of your network he might as well DoS any target he wants. So you would be facing massive disruption anyway.

Yes. The reason I brought this up is that in the context of public DHTs, it's feasible for someone to run many Sybil nodes. There's some very recent work out of MIT for achieving BFT consensus in open-membership systems, if you're interested: https://arxiv.org/pdf/1607.01341.pdf

> I mean blockchains lose some of their security guarantees too once someone manages to dominate 1/2 of the mining capacity. Same order of magnitude. It's basically the design domain "secure, up to point X".

In Bitcoin specifically, the threshold for tolerating Byzantine miners is 25% hash power. This was one of the more subtle findings from Eyal and Sirer's selfish mining paper.

> The things you list don't really replace the purpose of a DHT. A DHT is a key-value store for many keys and a routing algorithm to find them in a distributed environment. What you listed just gives you a bunch of nodes, but no data lookup capabilities. Essentially you're listing things that could be used to bootstrap into a DHT, not replacing the next layer services provided by a DHT.

If the p2p application's steady-state behavior is to run its own overlay network and use the DHT only for bootstrapping, then DHT dependency can be removed simply by using the systems that bootstrap the DHT in order to bootstrap the application. Why use a middle-man when you don't have to?


> If the p2p application's steady-state behavior is to run its own overlay network and use the DHT only for bootstrapping, then DHT dependency can be removed simply by using the systems that bootstrap the DHT in order to bootstrap the application. Why use a middle-man when you don't have to?

It seems like we have a quite different understanding of how DHTs are used, probably shaped by different use-cases. Let me see if I can summarize yours correctly: a) over time nodes will be interested in or have visited a large proportion of the keyspace b) it makes sense to eventually replicate the whole dataset c) the data mutation rate is relatively low d) access to the keyspace is extremely biased, there is some subset of keys that almost all nodes will access. Is that about right?

In my case this is very different. Node turnover is high (mean life time <24h), data is volatile (mean lifetime <2 hours), nodes are only ever interested in a tiny fraction of the keyspace (<0.1%), nodes access random subsets of the keyspace, so there's little overlap in their behavior. The data would become largely obsolete before you even replicated half the DHT unless you spent a lot of overhead on keeping up with hundreds of megabytes of churn per hour and you would never use most of it.

So for you there's just "bootstrap dataset" and then "expend a little effort to keep the whole replica fresh". For me there's really "bootstrap into the dht", "maintain (tiny) routing table" and then "read/write random access to volatile data on demand, many times a day".

This is why the solutions you propose are no solutions for a general DHT which can also cope with high churn.


> It seems like we have a quite different understanding of how DHTs are used, probably shaped by different use-cases. Let me see if I can summarize yours correctly: a) over time nodes will be interested in or have visited a large proportion of the keyspace b) it makes sense to eventually replicate the whole dataset c) the data mutation rate is relatively low d) access to the keyspace is extremely biased, there is some subset of keys that almost all nodes will access. Is that about right?

Agreed on (a), (b), and (c). In (a), the entire keyspace will be visited by each node, since they have to index the underlying blockchain in order to reach consensus on the state of the system (i.e. each Blockstack node is a replicated state machine, and the blockchain encodes the sequence of state-transitions each node must make). (d) is probably correct, but I don't have data to back it up (e.g. because of (b), a locally-running application node accesses its locally-hosted Blockstack data, so we don't ever see read accesses).

> In my case this is very different. Node turnover is high (mean life time <24h), data is volatile (mean lifetime <2 hours), nodes are only ever interested in a tiny fraction of the keyspace (<0.1%), nodes access random subsets of the keyspace, so there's little overlap in their behavior. The data would become largely obsolete before you even replicated half the DHT unless you spent a lot of overhead on keeping up with hundreds of megabytes of churn per hour and you would never use most of it.

Thank you for clarifying. Can you further characterize the distribution of reads and writes over the keyspace in your use-case? (Not sure if you're referring to the Bittorrent DHT behavior in your description, so apologies if these questions are redundant). For example:

* Are there a few keys that are really popular, or are keys equally likely to be read?

* Do nodes usually read their own keys, or do they usually read other nodes' keys?

* Is your DHT content-addressable (e.g. a key is the hash of its value)? If so, how do other nodes discover the keys they want to read?

* If your DHT is not content-addressable, how do you deal with inconsistent writes during a partition? More importantly, how do you know the value given back by a remote node is the "right" value for the key?


> Not sure if you're referring to the Bittorrent DHT

I am, but that's not even that important because storing a blockchain history is a very special usecase because you're dealing with an append-only data structure. There are no deletes or random writes. Any DHT used for p2p chat, file sharing or some mapping of identity -> network address will experience more write-heavy, random access workloads.

> Are there a few keys that are really popular, or are keys equally likely to be read?

Yes, some are more popular than others, but the bias is not strong compared to the overall size of the network. 8M+ nodes. Key popularity may range from 1 to maybe 20k. And such peaks are transient, mostly for new content.

> Do nodes usually read their own keys, or do they usually read other nodes' keys?

It is extremely unlikely that nodes are interested in the data for which they provide storage.

> Is your DHT content-addressable (e.g. a key is the hash of its value)?

Yes and no, it depends on the remote procedure call used. Generic immutable get/put operations are. Mutable ones use the hash of the pubkey. Peer address list lookups use the hash of an external value (from the torrent).

> * If your DHT is not content-addressable, how do you deal with inconsistent writes during a partition? More importantly, how do you know the value given back by a remote node is the "right" value for the key?

For peer lists it maintains a list of different values from multiple originators, the value is the originator's IP, so it can't be easily spoofed (3-way handshake for writes). A store adds a single value, a get returns a list.

For mutable stores the value -> signature -> pubkey -> dht key is checked.
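A minimal sketch of that check chain, assuming the DHT key is the SHA-1 of the public key with no salt. The signature scheme is left abstract: `verify` is a hypothetical callable supplied by the caller, and `accept_mutable` is a hypothetical name.

```python
import hashlib
from typing import Callable


def accept_mutable(dht_key: bytes, pubkey: bytes, value: bytes, signature: bytes,
                   verify: Callable[[bytes, bytes, bytes], bool]) -> bool:
    """Accept a mutable value only if the signature and the key both check out."""
    # pubkey -> dht key: the item must be stored under the hash of its signer's key.
    if hashlib.sha1(pubkey).digest() != dht_key:
        return False
    # value -> signature -> pubkey: delegated to the caller's signature scheme.
    return verify(pubkey, value, signature)
```

A storing node or a reader can run this without trusting whoever handed over the value, since only the holder of the private key can produce a signature that passes.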


absolutely. ssb avoids using DHT for these reasons (and to prove you can build something interesting without using them)

also note: DHT: hash table, BlockChain: linked list. but there are a lot more datastructures than that!


What do you think of S/Kademlia, or the invite-based DHT Persea?


S/Kademlia does not solve this problem; it simply slows down the rate at which an adversary can attack the system by a small amount (i.e. by making node ID creation more expensive and increasing a key/value pair's number of replicas).

There are several DHT papers that talk about bootstrapping DHTs off of social networks. They all fail to solve the Sybil problem in the same way: an adversary simply attacks the social network by pretending to be many people.


Yes, this guy gets it. This community gets it.

Not everything needs a global singleton like a blockchain or DHT or a DNS system. Bitcoin needs this because of the double-spend problem. But private chats and other such activities don't.

I have been working on this problem since 2011. I can tell you that peer-to-peer is fine for asynchronous feeds that form tree based activities, which is quite a lot of things.

But everyday group activities usually require some central authority for that group, at least for the ordering of messages. A "group" can be as small as a chess game or one chat message and its replies. But we haven't solved mental poker well for N people yet. (Correct me if I am wrong.)

The goal isn't to not trust anyone for anything. After all, you still trust the user agent app on your device. The goal is to control where your data lives, and not have to rely on any particular connections to eg the global internet, to communicate.

Btw, it's ironic that the article ends with "If you liked this article, consider sharing (tweeting) it to your followers". In the feudal digital world we live in today, most people must speak a mere 140 characters to "their" followers via a centralized social network with huge datacenters whose engineers post on highscalability.com.

If you are interested, here I talk about it further in depth:

https://youtu.be/WzMm7-j7yIY


I have been researching along these same lines for a while now as well, ad-hoc/mesh network messaging. My use case would be an amateur radio mesh network. For a while, I was investigating running matrix.org servers on raspberry pis, connected to a mesh network without internet. And that does work, the closest I've come to a great solution.

But I had never heard of Scuttlebutt until now. This looks even more ideal. In amateur radio, everyone self-identifies with their call sign; this follows the same model.

For amateur radio, there is a restriction against encryption (intent to obscure or hide the message), but the public messages would be fine. Private messages (being encrypted for only those with the right keys) might be a legal issue, so for a legit amateur radio deployment, the client would have to disable that (or at least operators would have to be educated that private messages may violate FCC rules).


> And I predict, in the next 5-7 years we're going to see a lot more power to the people (...) through decentralized social networking tools / platforms that can run in new types of topologies.

(at 9m53s: https://youtu.be/WzMm7-j7yIY?t=9m53s)

How do you see this happening in such a relatively short amount of time? Who (else) is going to do this? Is our culture predisposed to do this, and, if not, is there a strategy to overcome this cultural factor?

edit: for clarity


My friends and I have thought this through in detail a while ago, and have a few suggestions to make. I hope you make the best of it!

Distributed identity

Allow me to designate trusted friends / custodians. Store fractions of my private key with them, so that they can rebuild the key if I lost mine. They should also be able to issue a "revocation as of certain date" if my key is compromised, and vouch for my new key being a valid replacement of the old key. So my identity becomes "Bob Smith from Seattle, friend of Jane Doe from Portland and Sally X from Redmond". My social circle is my identity! Non-technical users will not even need to know what private key / public key is.
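One well-known way to "store fractions of my private key" with custodians is Shamir's secret sharing, where any `threshold` of the shares can rebuild the key and fewer reveal nothing useful. The sketch below is only an illustration of that technique under my own assumptions (the prime, the function names and the integer encoding of the key are all made up here), not a description of any existing tool.

```python
import secrets

# Assumption: a Mersenne prime comfortably larger than a 256-bit private key.
PRIME = 2**521 - 1

def split_secret(secret: int, threshold: int, shares: int):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]

def recover_secret(points):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

# Five friends hold one share each; any three can rebuild the key.
key = int.from_bytes(secrets.token_bytes(32), "big")
friends = split_secret(key, threshold=3, shares=5)
assert recover_secret(friends[:3]) == key
```

The threshold is the design knob: too low and a couple of colluding friends can impersonate you, too high and losing touch with a few custodians locks you out.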

Relays

Introduce a notion of the "relay" server - a server where I will register my current IP address for direct p2p connection, or pick up my "voicemail" if I can't be reached right away. I can have multiple relays. So my list of friends is a list of their public keys and their relays as best I know them. Whenever I publish new content, the software will aggressively push the data to each of my friends / subscribers. Each time my relay list is updated, it also gets pushed to everyone. If I can't find my friend's relay, I will query our mutual friends to see if they know where to find my lost friend.

Objects

There should be a way to create handles for real-life objects and locations. Since many people will end up creating different entries for the same object, there should be a way for me to record in my log that guid-a and guid-b refer to the same restaurant in my opinion. As well I could access similar opinion records made by my friends, or their friends.

Comments

Each post has an identity, as does each location. My friends can comment on those things in their own log, but I will only see these comments if I get to access those posts / locations myself (or I go out of my way to look for them). This way I know what my friends think of this article or this restaurant. Bye-bye Yelp, bye-bye fake Amazon reviews.

Content Curation

I will subscribe to certain bots / people who will tell me that some pieces of news floating around will be a waste of my time or be offensive. Bye-bye clickbait, bye-bye goatse.

Storage

Allow me to designate space to store my friend's encrypted blobs for them. They can back up their files to me, and I can backup to them.


> Distributed identity

a very nice person whom i like to call mix made a module for this recently: http://git.scuttlebot.io/%25XJz%2BcF9oIgd1eHYFGg3ycVwowLEseL...


horcrux! also found here : https://www.npmjs.com/package/ssb-horcrux

The part which splits your key is now automated and part of Patchbay. I'll build the resurrection part when someone needs it


I think a lot of this stuff will rely on a more formal spec like https://github.com/solid/solid-spec to be useful.

For identity, there's https://github.com/solid/solid-spec/blob/master/solid-webid-...

Right now I'm particularly interested in https://github.com/solid/web-access-control-spec although I think it's incomplete when it comes to data portability and access control. From what I've seen on re-decentralizing the internet, access control is either non-existent, or relies on a server hosting your data to implement access control correctly.

What if, in the WAC protocol linked above, instead of ACL resources informing the server, we could have ACL resources providing clients with keys to the encrypted resource (presumably wrapped in each authorized agent's pub key). Host proof data is a necessity for decentralized social networking IMO, even if the majority of agents would happily hand their keys over to their host.


For relays, that is more or less what the pub servers do. I connect to a relay, and if I subscribe to a channel or follow an individual, I end up with messages from months ago. The pub servers "gossip" with each other, so any particular pub server you connect to, should be able to catch you up on all of your friends and channels "gossip".


you have had the same ideas as we did! this is roughly how secure scuttlebutt works.


For the distributed identity piece is there a good reason not to rely on keybase.io?

Also important that an initial smaller community would be targeted and that it would succeed there. FB did this with colleges; a federated one, in a world where FB already exists, would have an even harder time.


> distributed identity piece is there a good reason not to rely on keybase.io

Depends on your definition of "distributed", I suppose


My impression was that keybase is distributed? Can it be used without talking to keybase's servers?


The Keybase server manages giving out usernames, and recording the proof URLs for users, and then your client hits the URLs, checks that the proofs are signed with the appropriate key, and caches them to watch for future discrepancies.

Keybase offers decentralized trust, in that the Keybase server can't lie to you about someone's keys -- your Keybase client will trust their public proofs and not the Keybase server -- but it's not a distributed/decentralized service as a whole, because you still receive hints from the server about where proofs live, and learn Keybase usernames from it.

(I work at Keybase.)


It looks like SSB is crying out for Keybase integration. Any plans for being able to add a SSB identity to my Keybase account?


Dunno! Would encourage SSB users to post the request to https://github.com/keybase/keybase-issues/issues/518.


Are you working on a fully decentralized architecture?


(Speaking personally, not sure what an official Keybase opinion would be.)

No, I don't think the tech is quite there yet. Even just handing out human-readable usernames requires blockchain-style consensus, and we don't have a blockchain being followed along by everyone's machines to adjudicate consensus requests (yet!).

The folks at Blockstack Labs are doing fine work in this area, though: https://blockstack.org/


Does keybase still upload the private PGP key to the keybase server by default? https://github.com/keybase/keybase-issues/issues/160


We're trying to move away from PGP to a per-device key model (keys never leave the client devices): https://keybase.io/blog/keybase-new-key-model


You did not answer my question, but I guess it does not matter with the new strategy. Looks like embrace-extend-extinguish to me – has it always been a goal of Keybase to replace PGP with something keybase-specific or did something change?


What the heck is keybase ? I went to the website and it does not offer a clue about what this is or how it works. It says "download the app" but is not an app. It says it's more than a website but does not seem to be distributed in any way.

The fact that it failed at the most basic thing of actually telling what it is about, what it does and how would be good reasons to not use keybase.


Bit of feedback: when you download the desktop application, it prompts for a desired name, image, and description.

It's unclear whether this can be changed later, and I'm not yet sure whether I want to use my real identity or a throwaway.

After creating an account with the default ¿randomly? generated name, I tried to use an invite obtained from http://198.211.122.115/invited which was linked from https://github.com/staltz/easy-ssb-pub.

All I got back was "An error occured (sic) while attempting to redeem invite. could not connect to sbot"

It worked with http://pub.locksmithdon.net/ though I feel a bit odd trusting a "locksmith" I've never heard of to stream lots of data to my harddrive...

It's cool that anyone can host a pub – basically, an instance of FB/Twitter/Gmail, it seems – but things 1) will get expensive for them, and it's unclear how they'll fund that – and 2) now I have to trust random people on the internet – not only to be nice, but also secure.

As a "random technically aware netizen", I honestly trust fooplesoft more, since they have a multi-billion-dollar reputation to protect. (Not that I trust fooplesoft).


These pubs you mentioned are suffering under the large amount of traffic generated by HN and they were not designed for this load. Ideally hosting your own pub should be as easy as possible. My goal is to have it possible under a Heroku "Click to deploy" button or Zeit `now staltz/easy-ssb-pub` so that we can have more pubs. By the way, my pubs are public just because I chose to, but I may take that down if I want. No data would be destroyed, since you'd have all that locally and you can connect to any other pub and replicate through that.


All pubs on the wiki are indeed overloaded. Interestingly if one sets up their own, the other pubs eventually sync with it, only the desktop client seems to be unhappy with laggy pubs. Is that by design?

FWIW, you can use pub.lua.cz:8008:@xYSW6eVu8gTS/nTSXZiH97dgKZ+wp7NkomR6WKK/PBI=.ed25519~iQ16RuvjKZqy/RhiXXmW9+6wuZNq+SBI8evG3PotxvI= if you have trouble connecting to the ones on github.

Feel free to add it to the wiki, I do plan to run it long term, but I am not a github user.


You don't need to trust the security of pubs. Validation of messages happens through cryptographic signing and public messages are public anyways. You also don't need to trust that pubs will be online much because your followers will also help host your content.


> public messages are public anyways

Right, but someone I trust could have their message corrupted, no?

eg; some political leader intends to write "everybody vote for Alice" and it is modified to read "everybody vote for Carol". Is this possible?

(I generally trust FB not to do this because their business would suffer if they were caught, for example – not so with ephemeral pubs)


Each message includes a signature by its author of its content, so it can't be falsified (unless the author's key is compromised)


Nope, there's proper crypto in place to prevent this (each user has their messages signed with a keypair)


So, if I post a GB worth of diary entries, who ends up caching it by default? Just my followers? People I follow? Pubs I connect to?


> So, if I post a GB worth of diary entries, who ends up caching it by default? Just my followers? People I follow? Pubs I connect to?

your followers, their followers, and their followers (assuming everyone is using the default replication settings). These may include pubs or people you follow. If you are able to connect to a pub then most likely it is willing to replicate your feed.
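As a sketch of what "your followers, their followers, and their followers" means in practice, here is a hop-limited walk over a follow graph in Python. The three-hop default is read off the sentence above; the data layout and the function name are assumptions for illustration only.

```python
from collections import deque
from typing import Dict, Set

def replicators(followers: Dict[str, Set[str]], author: str,
                max_hops: int = 3) -> Dict[str, int]:
    """Return everyone within `max_hops` follower-hops of `author`,
    mapped to their distance. `followers[x]` is the set of accounts following x."""
    distance = {author: 0}
    queue = deque([author])
    while queue:
        user = queue.popleft()
        if distance[user] == max_hops:
            continue  # do not expand past the hop limit
        for follower in followers.get(user, ()):
            if follower not in distance:
                distance[follower] = distance[user] + 1
                queue.append(follower)
    del distance[author]
    return distance

graph = {"alice": {"bob"}, "bob": {"carol"}, "carol": {"dave"}, "dave": {"erin"}}
print(replicators(graph, "alice"))  # bob, carol and dave cache alice's feed; erin is too far
```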


Why do all "social networks" have to be a feed of news? Couldn't anyone think of anything better than a system in which people are only encouraged to talk about themselves and try to get other people's approval? In which having more "friends" is always better, because you have more potential for self-aggrandizement in your narcissistic posts?


With SSB, you could make a UI that renders the content in a different way, such as a HN-style thing, and use it on the existing network alongside people using other clients.

The social aspect is important though because in this architecture what you see is determined by who you follow (and who they follow, etc.)


They did, it was called Usenet and it was glorious.


$$$ MAKE MONEY FAST $$$

What I mean to say is, Usenet's social model hardly prevented it from drowning in a sea of low-value content.


If you want a "modern web" example, there's nntpchan - despite the name, it's not related to Usenet directly; it only uses the same method of federated pub/sub replication.


presenting social content in a different format would be interesting, but I've not seen any compelling options. But the issues of looking for approval and social status are social issues - not sure how a social network tool could avoid that. Isn't that what a lot of people's casual relationships are like? What do most people do at parties and social events in person - they talk about themselves and each other...


On Hacker News, for example, people don't talk about themselves, although they do like to get upvotes.


but hacker news isn't really about social content. Some of that might creep in for some users, and yes the karma thing does pollute motivations for some, but I've never thought of it as a social network. Some networking is done here, but only indirectly I think.

Maybe I'm not thinking about it right or use it differently than most :P


Yeah, it is not a "complete" social network, I was just using it as an example that there are places in which people aren't so narcissistic as in Facebook.

See also Joel Spolsky on the topic: https://www.joelonsoftware.com/2003/03/03/building-communiti...


People are narcissistic by design. Social Networks were created to harness that narcissism to make money. I argue that there are a lot of other 'social networks' on the internet in a technical sense. You had profiles and added friends and then sent things to them/chatted with them on AIM, LiveJournal, any of a number of message boards, blogs/comments, etc.

The deciding factor between what came before and facebook and twitter is the ability to broadcast to the entire social network at once, so all of the world can see your brilliance! Feeding into that narcissism is the killer feature of modern social networks.


You have a point there. "Social content". What is "social content"? Is it people talking about themselves and what they think and showing pictures of themselves? Is that what you do with your best friends? You arrive at their houses and start talking about yourself without being asked?


No, that's not what I (and presumably you and others here) do. We are nerds. We like talking about things that make us think.

But yes, for the majority of people, talking about themselves is exactly what they do. They talk about their vacation to the beach. They talk about the drama going on at work. They talk about their sister's date. They don't talk about advances in database design.

I was at a live event (a play) recently and was fascinated by a small group of women in their late 20s / early 30s. They spent a good 10-15 minutes before the play started just taking pictures of themselves being at the play and posting it to their social networks. They talked about the pictures, asked others to send them their copy of the picture. They took pictures from one angle and then another. They talked about who "liked" the picture they just uploaded. It went on and on and on. Not once did I overhear them talking about the play they were about to see. It seemed to be not the point at all. The play was just a hashtag for their social media posts.


Ok, you're probably right, but I don't want to believe it.


hmm. I think people expect their friends to do some posting about their life or interests on (our current) social networks - so they implicitly have permission to do so. We don't expect our friends to immediately start talking about themselves in-person, but I do think we expect our friends to talk about themselves and what's going on in their lives - it would be pretty shallow friendships if we only talked about the weather or the news headlines.

Most good conversationalists are good at it because they explicitly draw the other person into talking about themselves and their interests. Whether things become narcissistic is more a factor of personality, I think. Perhaps it's more than that, though. A good conversationalist would steer the conversation to more interesting content - i.e. why the person is passionate about their hobby rather than just talking about their accomplishments. Perhaps we need to think about social network features that model what good conversationalists do? Not sure what that looks like though.

[edit for typo]


This is what blogging was like, no? Pingbacks created a conversation among posts.


Blogging is about the same as posting in your own "feed" on any social network.

See, for example, the indiewebcamp[1] people, who are against "silos", as they call Facebook et al., but are recreating their same functionalities with personal blogs and a new version of Pingbacks called "webmentions".

[1]: https://indieweb.org/


But the pingback was more about making your blog post a comment on someone's post. So it was creating a conversation. I could read a post, respond to it on my blog and ping back to the original. That way my feed was a mix of the discussions I'm having with others as well as my own stuff. I suppose if you take away the "feed" element, you could call web forums and Usenet "social networking".


> But the pingback was more about making your blog post a comment on someone's post.

That's what "webmentions" do.

> That way my feed was a mix of the discussions I'm having with others as well as my own stuff.

Here's something I like: everything you say is part of a public discussion, so you're talking alone. Also, comments have about the same weight as standalone posts, and outsiders can join the discussion; it isn't restricted to your current circle of friends.


What you are describing has existed for years and we called it Usenet. Or a forum. Or a mailing list. I can accept that "social networking" is a bad term, but in popular usage it encompasses the "personal feed" almost by definition.


> I can accept that "social networking" is a bad term, but in popular usage it encompasses the "personal feed" almost by definition.

Yes, but just because no one has tried to create a different social network. That's why I made my initial comment in the first place.

> What you are describing has existed for years and we called it Usenet. Or a forum. Or a mailing list.

I don't know about Usenet, but forums and mailing lists are generally oriented to narrow topics, it is not something in which you'll see your school friends or people with multiple areas to discuss varied subjects.


It doesn't have to be. I have my blog using Pelican, no comments, no pingbacks, nothing. I write about stuff that interests me because I feel like it. If someone finds one of my articles and they find it interesting, good for them. If they don't, good for them too.


On Twitter and Tumblr you can make extra accounts to participate in discussions you're interested in, and select people to follow based on that, so the feed system is okay for talking about things other than yourself if the feeds don't include everyone you know by default.

Tumblr has some pretty good discussion about movies and books.

Twitter is not so good for discussion because of the length limit, but there's plenty of people posting concise observations and jokes rather than posting about themselves.

On both systems, people can reply to content from strangers, and there's lots of conflict arising from that.

I do think Tumblr would be improved by making it easier to have discussions that don't go to all your followers by default, for example like on Twitter where if you tag people at the start of your tweet, it doesn't go into the main feed for your followers who aren't tagged.

Or you can go all the way to partitioning a system into topics, as with Reddit. I wouldn't call that a social network though, you don't just casually start a conversation with people you've chosen to connect with, you start a conversation with a subreddit.


I'm all ears.


Social networks could be topic-oriented rather than "me-oriented". Like reddit but more personal.


sounds like G+


That's your dystopian description of the current state of social networks and not the reality.


Well, it is impossible to fit the "reality" in a text comment, so yes.


Since the author didn't mention it, the original creator of the patchwork project is https://github.com/pfrazee

When I used it, which admittedly was a long time ago now, the biggest setback was lack of cross device identities. So I ended up having two accounts with two feeds, `wesAtWork` and `wes`. Maybe they have solved this by now.

ps. Does patchwork still have the little gif maker? Because that was a super fun feature.


Also, because Paul has awesome projects, and deserves some attention when a project of his makes it to the top of HN but doesn't even mention him, he is working on a browser for the distributed web called Beaker (I am using it to write this now), and it is awesome.

https://github.com/beakerbrowser/beaker


@cowardlydragon you got downvoted to death but that's a fair assumption so I want to reply to you here

> forking a website so easily also makes spoofing very easy...

A fork copies the files of a site, so yeah, it certainly would be easy to spoof somebody's site. It basically is a spoof button. But doing so creates a new cryptographic identity for the site, and that will be the basis of how we authenticate


Cross device identity is still an issue, but not a problem in the foundation. It's a matter of making client apps (like Patchwork) recognize a message of type "link this and that account together" and then your friend's app would automatically follow both accounts and render them as if they are the same thing. It'll be done eventually in Patchwork.


Yeah that is what they were talking about when I was following the project. Once that is done in patchwork, I might try using it again.


It will be a must once mobile is launched, which I'm working on.


Is it also possible to use multiple devices without leaking from which device each message was posted?


Well, yes and no. The log will show a different id (public key) which authored the message. But the device itself (iPhone or Google Nexus or whatever) doesn't need to be mentioned.


That could leak information a user doesn't want to be leaked, like at which hours he is at work (using the work computer) etc. Which id belongs to which device could probably be inferred when the service is used actively.

I understand that transparency might not be a design goal or technically possible, I'm just raising the concern.

Can't I just share my private key across multiple devices?


Nothing stops you from copy-pasting your asymmetric keys (it's a file) to different devices. I bet it's feasible, the biggest issue is also making sure your log stays the same, because a log shouldn't get forked.


Those sound like pretty unavoidable, and often acceptable, drawbacks.


Is there a reason you can't just use the same key pair on both devices?


yes. 1) it would be significantly less secure - compromising either device would compromise both. Imagine an airplane with two engines that needs both to fly - a single engine plane is actually safer - because the chance of losing one of one is less than the chance of losing one of two (assuming the chance of engine failure is independent). Using a separate key on each device is like a two engine plane that can still fly with one engine - this is significantly safer than a single engine plane.
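The engine analogy can be put into numbers. A small Python sketch, assuming each device is compromised independently with the same probability `p` (the value 0.1 is made up purely for illustration):

```python
def risk_shared_key(p: float, devices: int = 2) -> float:
    """One key copied to every device: any single compromise leaks the identity."""
    return 1 - (1 - p) ** devices

def risk_one_device(p: float) -> float:
    """One key on one device: the 'single engine plane'."""
    return p

def risk_all_keys(p: float, devices: int = 2) -> float:
    """Separate key per device: every key is lost only if every device is compromised."""
    return p ** devices

p = 0.1  # hypothetical per-device compromise probability
print(risk_shared_key(p))  # about 0.19, worse than a single device
print(risk_one_device(p))  # 0.1
print(risk_all_keys(p))    # about 0.01, much better than a single device
```

So under the independence assumption, copying one key around roughly doubles the risk, while per-device keys cut the chance of losing everything at once.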

2) it would greatly complicate the replication protocol, having to take into account forks, rather than assuming append only, where you can represent the current synced state with a single counter.
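To illustrate why append-only feeds keep replication simple, here is a toy Python sketch in which the whole synced state of a feed is one integer. The data layout and function names are invented for the example and are not the actual wire protocol.

```python
from typing import Dict, List

Feed = List[str]  # messages of one author; index i holds sequence number i + 1

def request_for(local: Dict[str, Feed]) -> Dict[str, int]:
    """Summarise everything we hold as a single counter per feed."""
    return {author: len(messages) for author, messages in local.items()}

def reply_to(request: Dict[str, int], local: Dict[str, Feed]) -> Dict[str, Feed]:
    """Send only the messages whose sequence number exceeds the peer's counter."""
    return {
        author: local[author][have:]
        for author, have in request.items()
        if author in local and len(local[author]) > have
    }

mine = {"alice": ["a1", "a2", "a3"], "bob": ["b1"]}
theirs = {"alice": ["a1"], "bob": ["b1"], "carol": ["c1"]}
print(reply_to(request_for(theirs), mine))  # {'alice': ['a2', 'a3']}
```

If a feed could fork, a single counter would no longer identify which messages a peer has, and the exchange would need to describe a tree instead.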


I'm having trouble following this and the reply thread below. Why is identity device-specific? So every time I get a new computer I have a new public key?


You can also use the same keypair on multiple devices. This however results in another problem: You could post content from both devices simultaneously. But the underlying protocol requires each message to refer to the previous message by the same identity. So if two different devices post a message without having received the message of the other one, one of the messages is considered invalid.
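The fork problem described above can be shown with a toy hash-linked log in Python. Signatures are left out, and hashing JSON with SHA-256 is just an assumption for the sketch, not the real message encoding; all names are hypothetical.

```python
import hashlib
import json
from typing import List, Optional

def make_message(author: str, sequence: int,
                 previous: Optional[str], content: str) -> dict:
    """Build a message that points at the id of the previous message in the feed."""
    body = {"author": author, "sequence": sequence,
            "previous": previous, "content": content}
    body["id"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()
    return body

def append(log: List[dict], message: dict) -> bool:
    """Accept a message only if it extends the current tip of the log."""
    expected_previous = log[-1]["id"] if log else None
    if message["sequence"] != len(log) + 1 or message["previous"] != expected_previous:
        return False
    log.append(message)
    return True

first = make_message("@wes", 1, None, "hello")
# Two devices with the same key both post on top of `first` before syncing.
from_laptop = make_message("@wes", 2, first["id"], "posted from the laptop")
from_phone = make_message("@wes", 2, first["id"], "posted from the phone")

replica: List[dict] = []
print(append(replica, first))        # True
print(append(replica, from_laptop))  # True
print(append(replica, from_phone))   # False: sequence 2 is already taken
```

Which of the two messages wins depends on which one a given replica happens to see first, so different peers can end up disagreeing about the feed.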


My role was to provide an applications perspective. I worked on UIs and data models. Dominic's the maker of the tech.

I ended up removing the gif maker in one iteration because it was so frequently buggy. That was probably the worst call I made.


Can you (or someone) clarify the difference between Patchwork and SSB? Does SSB handle the networking and discovery and encryption and whatnot, and Patchwork just acts as front-end for displaying diaries, connecting to pubs, posting and so forth?


Patchwork is a user interface for displaying messages from the distributed database to the user, and to allow the user to add new messages. The underlying protocol supports arbitrary message types, patchwork exposes a UI for interacting with a subset of them. Anyone could write and use other UIs while still contributing to the same database. Patchbay[1] for example is a more developer-centric frontend.

Under the hood, patchwork connects to a scuttlebot[2] server. Scuttlebot in turn is based on secure-scuttlebutt (ssb).

[1] https://github.com/ssbc/patchbay [2] http://scuttlebot.io/


The downvotes on replies are baffling over here. Here's what AljoschaMeyer said, and it's all accurate:

Patchwork is a user interface for displaying messages from the distributed database to the user, and to allow the user to add new messages. The underlying protocol supports arbitrary message types, patchwork exposes a UI for interacting with a subset of them. Anyone could write and use other UIs while still contributing to the same database. Patchbay[1] for example is a more developer-centric frontend. Under the hood, patchwork connects to a scuttlebot[2] server. Scuttlebot in turn is based on secure-scuttlebutt (ssb). [1] https://github.com/ssbc/patchbay [2] http://scuttlebot.io/


Thank you, I enabled [dead] and I too am baffled at why useful responses are getting killed.

edit- they got unkilled.


correct, patchwork is the UI, ssb is the database.


This excites me. I'm probably naive, but I always imagine that one day I'll retire and spend my days trying to work on an open source mesh network (or something similar). I want future generations to live in a world where 'the internet' isn't a thing that authorities can grant/deny. A headless social network is a promising omen of a headless internet.


> I want future generations to live in a world where 'the internet' isn't a thing that authorities can grant/deny.

_THIS_ is very much the spirit of SSB. :)


Did I misunderstand the article or is this only for social feeds? I want headless internet in general, guess it's a common idea these days.


https://docs.meshwith.me

The technology is here, the only thing left is to make people actually use it…


It's kinda like a database which is chill with being incomplete, but you only get bits of this shared database from your friends (and their friends)

But on top of this database you can build whatever, like there's a github thing and a soundcloud thing and a facebook thing


I think the InterPlanetary File System is what you're looking for: https://github.com/ipfs/ipfs


I've been thinking about this very thing the past few days!

Forgive the rambling, this is the first time I've written any of this down...

My idea is to use email as a transport for 'social attachments' that would be read using a custom mail client (it remains to be seen if it should be your regular email client or have it be just your 'social mail' client. But... if using another client as regular email, users would have to ignore or filter out social mails). It could also be done as a mimetype handler/viewer for social attachments.

Advantages of using email:

- Decentralized (can move providers)
- email address as rendezvous point (simple for users to grasp)
- Works behind firewalls
- Can work with local (ie Maildir) or remote (imap) mailstores. If using imap, helps to address the multiple devices issue. Could also use replication to handle it too (Syncthing, dropbox, etc)

Scuttlebutt looks like a nice alternative though. Will be following closely.


I had been thinking about something like that too some years ago. Subject or first line of the mail should act as headers for the mail client extension parser. You could tag the social object you send out (event, picture, status update) and users could subscribe to those (the client would just filter out)(it solves the problem of being interested in an author's upcoming books and social comments but not in his comments on his family vacation). Likewise you could choose who get your updates.
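A quick proof-of-concept of the "headers plus social attachment" idea fits in a few lines of Python using only the standard library. The `X-Social-Tags` header, the `application/x-social+json` MIME type and the subject convention are all invented for this sketch.

```python
import json
from email.message import EmailMessage
from typing import Iterable, List, Set

def build_update(sender: str, recipients: List[str], kind: str,
                 tags: Iterable[str], payload: dict) -> EmailMessage:
    """Wrap a social object (status, event, picture metadata) in an ordinary email."""
    tags = list(tags)
    msg = EmailMessage()
    msg["From"] = sender
    msg["Bcc"] = ", ".join(recipients)
    msg["Subject"] = f"[social:{kind}] " + " ".join(f"#{t}" for t in tags)
    msg["X-Social-Tags"] = ",".join(tags)  # hypothetical header read by the client extension
    msg.set_content("This mail carries a social update; open it with a social mail client.")
    msg.add_attachment(json.dumps(payload).encode("utf-8"),
                       maintype="application", subtype="x-social+json",
                       filename="update.json")
    return msg

def wanted(msg: EmailMessage, subscribed: Set[str]) -> bool:
    """Client-side filter: keep the mail only if it carries a tag we subscribed to."""
    tags = {t for t in (msg.get("X-Social-Tags") or "").split(",") if t}
    return bool(tags & subscribed)

update = build_update("author@example.org", ["reader@example.org"],
                      "status", ["books"], {"text": "New novel out in May"})
print(wanted(update, {"books"}))   # True
print(wanted(update, {"family"}))  # False
```

Regular mail clients would just see a short text body with an odd attachment, which is where the "ignore or filter out social mails" concern comes from.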

Problem is you don't have a means to publicly advertise your status and offer a way to subscribe. That would be a third party provider. I can imagine someone fetching everyone's updates and providing a mechanism to just resend the mail via a public web repository that would act as a public registration hub.

That would be a huge data mine though. Unless you add pgp in the mix and then you have to hit the mark on the client pgp handling to easily allow close friends to give out their public key.

Wouldn't that make a fun POC project ?

I remember I was thinking about it when pownce came out.

I still believe the net would be so much more fun with the likes of pownce and w.a.s.t.e around :(.

I remember having some actual conversations on w.a.s.t.e. That's never happening with torrents.


The popular free email providers do not like busy transactional mail service.


Sure, but remember it's p2p and no different from me composing an email and BCC'ing my 50 friends and family.

One thing I didn't mention was that there wouldn't be any public posts, so definitely more social network than social media.

Anyways, just an idea at this point, though a prototype would not be that hard to put together as an experiment.


email is so complicated. There are probably 3 people in the world who know how to properly run an email server.


That's absurd. True, it has become more complicated than it once was, but that's every technology that isn't dead.

Granted, I have been running mail for a long time, so I got to learn the complications as they happened, rather than all at once. But anyone who can set up a production-quality web server/appserver/DB along with the accessories that go along with it can handle it.

Now if email isn't important to your business and/or you just don't want to deal with maintaining it, that's valid. But it just isn't as difficult as a lot of people seem to want to make it out to be.


"I have been running mail for a long time, so I got to learn the complications as they happened, rather than all at once. But anyone who can set up a production-quality web server/appserver/DB along with the accessories that go along with it can handle it."

Are you using Sendmail?


> Are you using Sendmail?

Not in a long time. I haven't found a situation in which I couldn't use Postfix in quite a while. Although the occasional sendmail.cf flashback still hits me.


"the occasional sendmail.cf flashback still hits me."

that's why I asked.


Who needs to run a mail server? It's a (specialized) mail client that uses your existing email account as storage and transport.


I like this idea and think it has legs. But eventually it might be worth rewriting the SMTP/IMAP daemons with a simple config file format that allows simple white/blacklisting of domains, such that one could run a server that rejects all emails from any domain other than the 6 one specifies. Further, messages without PGP encryption/signing would be discarded.
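Roughly what that acceptance rule could look like, as a sketch. The domain list and function name are hypothetical, and this only checks that a message claims to be PGP/MIME (the `multipart/signed` and `multipart/encrypted` types from RFC 3156); a real server would also have to verify the signature.

```python
from email.utils import parseaddr

# Hypothetical allowlist; in practice this would come from the config file.
ALLOWED_DOMAINS = {"example.org", "friends.example"}


def accept_message(from_header, content_type):
    """Accept only mail from allowlisted domains that is PGP/MIME wrapped."""
    _, addr = parseaddr(from_header)
    domain = addr.rpartition("@")[2].lower()
    if domain not in ALLOWED_DOMAINS:
        return False
    # Presence check only: this does NOT validate the signature itself.
    return content_type.lower().startswith(
        ("multipart/signed", "multipart/encrypted")
    )
```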

I think such an approach could be interesting, but it seems there is a need for a non-profit to govern such a thing.


scuttlebutt has sneakernet options it appears. Email is just another transport mechanism.

So your approach should work in theory within the described framework.


A hipster living on a self-steering sailing boat has 600 modules published on NPM. I can't even. Seriously how could this be even more funny?!


I think of hipster as someone who follows (non mainstream) trends, goes to starbucks, loves Apple products and cannot live without Wifi. Not a hacker who builds their own stuff and cares about privacy. But maybe the beard confuses people.


It's hard to define the term, but I think at the most basic level, "hipster" specifically refers to anyone who enjoys being alternative. "Goes to starbucks, loves Apple products and cannot live without Wifi" are common qualities of people who want to be hipsters but only because they think it's cool (and don't actually embrace an alternative lifestyle).


I recommend dropping the term. It's meaningless now, if it ever actually even meant anything in the first place.


It's pretty funny because of the stereotype. On the other hand, it makes sense that someone dedicated to improving the state of privacy and personal data ownership lives on a boat. If he decides the government becomes too authoritarian, he just leaves with all his stuff. No go-bag needed because he has a go-house. Lay low in SEA near a wifi hotspot or something. You can't escape the reach of corporations, but he (and others) are working on it in the form of scuttlebutt.


Substack is awesome. I think he's very much a modern day instance of the old hacker ethos. He's so creative, too- I've really enjoyed some of his project ideas and presentations I've been fortunate to see.


He has a friend who builds his own shackle, in Hawaii of course.

But seriously, this is a very very interesting project.


I am not much of a social networking type of person, but I have wondered how nice it would be to network with a community like HN. For example, I see a nice comment chain going on in some news article, but as the article dies so does all the conversation within it.

Maybe it's just me but if I see an article is x+ hours old (15+ for example), I don't bother commenting.

What type of social networking would HN use for non-personal (not for family and immediate friends) communication? (I've tried hnchat.com, it's mostly inactive imho)


I think HN disables commenting after X hours for every post.

Perhaps HN could introduce email notifications (e.g., if somebody replies to one of your comments/posts, you get a notification by email).


Yeah, I'd be happy to see an active IRC channel for HN. Though, I doubt many would use it since IRC is way past being en vogue.


I like IRC still. For example, the science-based channels on freenode.net are pretty good: #biology, ##physics and ##math. Might not be the worst of ideas for a mod to make one for us :s (provided they already have an IRC client going, it wouldn't be too much added hassle).


Notifications are the only feature I really miss comparing HN to Reddit.


Sublevel.net has a nice community. Why don’t you join in?


So it seems there are two ways to exchange information:

1) be on the same wifi (presumably great for dissidents in countries with heavy-handed internet control, and inconvenient for everyone else)

2) use "pubs", which can be run on any server, and connected to ¿through the internet?

So most users would use pubs, which are described as "totally dispensable" (a nice property). But how can users exchange information about which pub to subscribe to? Is there a public listing of them?

It seems like the "bootstrapping server" problem (e.g., reliance on router.bittorrent.com:6881) will still exist in practice. For that matter, is there currently an equivalent to router.bittorrent.com that would serve this purpose?

This seems like a potentially significant project, and I'm excited by the possibility that it might actually take off – hence the inquiry.


It depends if you want to use it like Twitter (public announcements) or like Facebook (closed small/medium circles). If you use it like Facebook, then it's enough that one person among your circle of friends (probably the most tech-savvy one) would host a pub and use that for their friends. You can see how you would probably be connected to a few pubs, because you usually have different circles of friends. If you want to use it like Twitter, then indeed we might need a DHT, but the point there was the resilience of the network.


Okay... in the FB case, though, my friend has to send me the IP/DNS address of their pub somehow, right? E.g., via Signal?

What about organizing groups, which might currently use Slack? For example, political dissidents who don't necessarily all know each other personally. They must use some other communication channel to communicate pubs?


Item 1 also applies to people trying to communicate in post-disaster scenarios (e.g. earthquakes).

I know some people who work in that area, and every time one of them finds out I work in software their first question is about mesh networking. If SSB is what it seems to be (user friendly, no-frills ad hoc mesh networking) then that would be huge for emergency and disaster planners! Is it mature enough to be used in this way?



The equivalent to a bootstrap node is the pub you connected to with an invite code when joining the network (unless you connect to the network in a different way, like by being followed by someone else on your LAN). When you use an invite code, you publish a message with the pub's address on your feed. When peers replicate your feed, they see those messages, and thereby find out about new pubs to connect to.
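To illustrate the discovery step described above, here is a small sketch that scans replicated feeds for pub announcements. The message shape (a `content` dict with `type` and `address` fields) is a simplified assumption for the example, not the exact format Scuttlebutt uses.

```python
def discover_pubs(feeds):
    """Collect (host, port) pairs announced in the feeds we have replicated.

    feeds: dict mapping a feed id to its list of messages (assumed shape).
    """
    pubs = set()
    for messages in feeds.values():
        for msg in messages:
            content = msg.get("content")
            # Encrypted messages are opaque strings here, so skip non-dicts.
            if isinstance(content, dict) and content.get("type") == "pub":
                address = content["address"]
                pubs.add((address["host"], address["port"]))
    return pubs
```

Every feed you replicate can therefore widen the set of pubs you know about, without any central listing.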


Can I choose whose content I pass along? I am ok distributing my own feed, that's presumably why I am joining the network. I am not OK passing along someone else's hate speech, porn, warez, malware, spam, etc. I'd like to be able to review the feeds available and say "Yeah sure I'll pass that around." If everything in a feed is encrypted then I'd need to decide. Also yeah my brother whose feed I follow and pass may upload a really nasty bit of content and I may relay it.


Your computer will only help host data that was hosted by people that you follow. If you don't want to spread content that you disagree with, don't follow people who post such content.
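In other words, the follow list doubles as the replication policy. A toy sketch of that rule; the `blocked` set is an extra assumption added for illustration, and the names are made up.

```python
def should_replicate(feed_id, following, blocked=frozenset()):
    """Host a feed only if we follow it and have not blocked it."""
    return feed_id in following and feed_id not in blocked


def select_feeds(available, following, blocked=frozenset()):
    """From the feeds a peer offers, pick the ones we agree to carry."""
    return sorted(f for f in available if should_replicate(f, following, blocked))
```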


wonder if that un/follow&mute feature will turn out to be a confounding mechanic in the future.


Curating exercises of freedom of speech, eh? Sounds like decentralization won't lead to more digital freedom after all with attitudes like this.


Your freedom to spread filth is precisely equal to my freedom not to repeat it.

Also, please remember that the American First Amendment limits Government speech restrictions. Private communities and individuals can make any rules they want about socially acceptable speech.


yes. users actively replicate data, not passively. they have inherent control over what they pass on. If a community doesn't want to replicate your data, then go to find one that does.


I'd like to see all my friends post updates and photos to blogs where I can subscribe via rss. This would be the best social network for me.


What blogging system? Who provides the infrastructure? Getting back to pull/subscriptions via RSS would make me happy too, but this doesn't solve the problem of whose platform we are all sharecroppers on.


You can write a JavaScript component to read out the feed in the background and transform it into an RSS feed. Current U.S. law prevents someone from offering this commercially, especially in a SaaS package.


> Current U.S. law prevents someone from offering this commercially, especially in a SaaS package.

naturally. where can I learn more?


Medium has RSS feeds.


That seems fairly close to how LiveJournal works.


So... Instagram?


wow, I didn't know you could subscribe with rss - I'll definitely look.


The storage requirements are tremendous, though, right?

If I want to have access to everything that's been shared with me, I have to store it all. In the case of images, the storage burden can get large quickly.


There are basically two types of storage: logs and blobs. Logs were described in the blog post, but blobs weren't. Blobs are mostly images and that type of stuff, and are stored in leveldb. They can easily get to 1 GB or more. The trick is that blobs aren't sigchained, so they could be garbage collected, and that is something that we're working on. Logs can't be garbage collected, but they grow slower than blobs do, and are usually around 100 MB or less.
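A rough sketch of why a chained log resists garbage collection: each entry commits to the hash of the one before it, so dropping an entry breaks verification of everything after it. This uses plain SHA-256 over JSON and leaves out signatures entirely, so it is a simplification for illustration and not the actual Scuttlebutt message format.

```python
import hashlib
import json


def _entry_id(body):
    """Hash the entry body deterministically (sorted keys)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append(log, content):
    """Append an entry that points at the id of the previous entry."""
    body = {
        "previous": log[-1]["id"] if log else None,
        "sequence": len(log) + 1,
        "content": content,
    }
    log.append({**body, "id": _entry_id(body)})


def verify(log):
    """Check that every entry links to its predecessor and is unmodified."""
    previous = None
    for sequence, entry in enumerate(log, start=1):
        body = {k: entry[k] for k in ("previous", "sequence", "content")}
        if (entry["previous"] != previous
                or entry["sequence"] != sequence
                or entry["id"] != _entry_id(body)):
            return False
        previous = entry["id"]
    return True
```

Blobs, by contrast, are only referenced from messages, so deleting one leaves the chain itself intact.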


Well.. I've been on there for quite some time, granted it's been not mega active but here is a rundown of how much it took until now: there is the main sigchain database, which stores all the messages (following, posts, ....) which is now 150megs in size and there are the blobs (binary attachments like images) which are about 500megs in size. YMMV depending on how many cat pictures your friends share ofc.

The flipside to your remark is, that it is fully offline capable and I'm perfectly happy with that. Also: contrast it with how much space a thunderbird profile takes up.


How would that change if you had, say, 5,000 friends – the fb limit, which some people do reach – who were posting multimedia content multiple times a day (which happens on fb)?

Is the protocol set up in such a way as to enable easy, automatic deletion of old data from local devices, while still storing them for easy search/scroll-based access on the Pub servers?


That's going to be a bigger problem on mobile than it is on desktop (I mean, not literally those amounts, but the amounts you'd get from a busier feed).


The entire stackexchange and English Wikipedia dumps including all media are less than 90 Gig. Even low end cell phones have memory expansion slots up to 128 gig. Whatever you plan to do socially, maintaining a local copy is not a storage issue. None of the cloud people will tell you that though.


But I bet Facebook is much larger. If this were to really meet my needs (i.e. people I actually know start using it regularly), I can see this becoming an issue that needs to be solved, especially on mobile. Bitcoin ran into this issue as well, the need for a client to get the whole blockchain. I can see some solutions once cross device identity is done where I get a small amount of the network, perhaps the most recent, then it syncs up with the larger storage on my home PC later.


> But I bet Facebook is much larger.

Thank goodness Facebook isn't what I want my social feed to look like... all those GIFs and garbage updates.

Also, I suppose if you're linking out of the SocialApp and into the Web, that most of the content is just "messages".

> then it syncs up with the larger storage on my home PC later.

I can hardly wait for devices that work this way.


> Thank goodness Facebook isn't what I want my social feed to look like... all those GIFs and garbage updates.

Sure, but I have few enough friends as is. I know literally no one who would use this, as neat as it is. Bootstrapping a social network is hard for both developers and users, but once it gets going, storage requirements would rise fast.


Does Scuttlebutt intend to store every post forever? Or could posts 'expire' and get deleted, like on Usenet? It would be on you to save the content you wanted to have long term. Somebody could always take on the burden of capturing an archive of the whole thing in perpetuity and provide web access to it, like archive.org does for Usenet.


Check other comments here in HN thread about "blobs" and garbage collection. But also, there is the easy possibility of just starting a new account. In fact, substack did this, we refer to his old account as "deadsubstack".


Note that after the turn of the 21st century, people were not expiring non-binaries posts on Usenet.

I observed in 2011 that HighWinds Media had not expired any non-binaries postings since 2006, and that Power Usenet had not expired a non-binaries posting for eight years ("3013+ days text retention" was in its advertising at the time). People effectively just turned non-binaries expiry in Usenet off, in the first few years of the 21st century. I did on my Usenet node, too.

I observed then that the Usenet nodes' abilities to store posts had far outstripped the size of the non-binaries portion of a full Usenet feed, which was only a tiny proportion of the full 10TiB/day feed of the time.


The distinction of binary and non-binary posts on Usenet is paralleled by the separation of messages and blobs on Scuttlebutt. As staltz explained [https://news.ycombinator.com/item?id=14051181], we can garbage collect ("expire") blobs, but not message logs (although a client could do so with the current APIs, it would have security/trust and UI implications, and I'm not aware of any clients doing so).

We are also basically betting on the size of our message logs to generally grow slower than our individual storage capacities, and it is interesting to know that that worked for Usenet too. For blobs, we will likely develop some garbage collection or expiring approaches. Since the network is radically decentralized, each participant can choose their own retention policy. You can, in fact, delete all your blobs (`rm -rf ~/.ssb/blobs`) and assuming some peers have replicated them, your client will just fetch them again as you need them.
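As a sketch of what a per-participant retention policy could look like, here is a least-recently-used eviction under a storage budget. The data layout (blob id mapped to size and last-used time) and the function name are assumptions for the example; no existing client is claimed to work this way.

```python
def blobs_to_evict(blobs, budget_bytes):
    """Pick blobs to delete so the most recently used ones fit the budget.

    blobs: dict mapping blob id -> (size_bytes, last_used_timestamp).
    Returns the ids to evict; the caller does the actual deletion.
    """
    kept_bytes = 0
    evict = []
    # Walk from most to least recently used, keeping whatever still fits.
    by_recency = sorted(blobs.items(), key=lambda item: item[1][1], reverse=True)
    for blob_id, (size, _last_used) in by_recency:
        if kept_bytes + size <= budget_bytes:
            kept_bytes += size
        else:
            evict.append(blob_id)
    return evict
```

Since evicted blobs can be fetched again from peers that still hold them, a wrong guess costs a re-download rather than data loss, assuming someone else kept a copy.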


You're comparing apples and oranges. Usenet was federated across beefy corporate servers. No single user had to walk around with the entire Usenet archive on their laptop.


It was your comparison, note. I'm simply pointing out the error in the premise of your question.


I wasn't comparing anything. I was proposing a suggestion and only mentioned usenet as a means of explanation.

> I'm simply pointing out the error in the premise of your question.

No, you made a non-sequitur factual post about Usenet. I see no actual error pointed out. The fact that Usenet stopped expiring non-binary posts after most of their traffic fled to other services is not a valid argument against possibly using the feature in a peer to peer distributed social network.


"like on Usenet" is definitely a comparison.

If you don't see an error in your premise being pointed out, then you need to put your "posts expire and get deleted, like on Usenet" right up against "people were not expiring non-binaries posts on Usenet" until the penny drops.

Then you need to notice the point, already made by others as well, that the premise of ISL's question is erroneous, too. The storage requirements are not necessarily "tremendous", if one actually learns from the past. Again, your comparison to Usenet needs to involve considering how Usenet treated binaries and non-binaries very differently. (One can look to experience of the WWW for this, too, and consider the relative weights in HTTP traffic of the "images" that ISL talks about and the non-binary contents of the WWW. But your comparison to Usenet does teach the same thing.)

Your and ISL's whole notion, that everything is going to get tremendously big and so everything will need to be expired, rather flies in the face of what we can see from history actually happened in systems like this, such as the one that you made your comparison to. Usenet did not expire and delete non-binaries posts.

By making this comparison and then trying to pretend that it's someone else's non-sequitur you are closing your eyes to the useful lessons to actually learn from your comparison. Usenet, and the Wayback Machine, and the early WWW spiders, and Stack Exchange, and Wikipedia with all of its talk pages, and Fidonet in its later years (when hard disc sizes became large enough), all teach that in fact one can get away with keeping all of the "non-binary" stuff indefinitely, or at least for time scales on the order of decades, because that is not where the majority of the storage and transmission costs is.

People have already danced this dance, several times, and making a distinction between the binary and the non-binary stuff and not fretting overmuch about the latter when one looks at the figures is generally where it ends up.


I was thinking the same thing, and I don't know enough (or anything really) about this to comment on this, but my second thought was that this probably works like bittorrent, where you don't need all of a file to make sense of the individual pieces.

Let's say for instance that the file you're downloading is a long text file containing a novel, but all you care about is chapter 3. Then all you need are the pieces for chapter 3 – the rest can stick around in the ether somewhere.

This is harder to do with bags of bytes obviously – how do you know which bytes belong to chapter 3? – but if the pieces are self contained messages where you don't need either the previous or the next to make sense of it, then it should be trivial to link to them and the distribution could work like this. Whether it actually works like this or not I have no idea. Sounds like an interesting project anyway!


Yeah, I've been working on something similar to this off and on. Functionally, the mechanics need to be similar to a distributed filesystem like Tahoe-LAFS or Freenet. Content has to be entrusted to the swarm.


They should break it up by time. For example your database only syncs the past month or so as needed and you can choose to request more if necessary. Someone mentioned 150MB right now. That thing is going to get massive eventually.
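The time-window idea could be sketched like this: sync only what falls inside the window and leave the rest to be requested later. The 30-day default and the message shape are assumptions, and as noted elsewhere in the thread, skipping parts of a message log has trust implications this sketch ignores.

```python
from datetime import datetime, timedelta, timezone


def split_by_window(messages, now, window_days=30):
    """Split messages into (sync_now, fetch_on_request) by timestamp."""
    cutoff = now - timedelta(days=window_days)
    recent = [m for m in messages if m["timestamp"] >= cutoff]
    deferred = [m for m in messages if m["timestamp"] < cutoff]
    return recent, deferred
```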


The quoted 150 MB is for all messages in one's network. On my local node there are 91773 messages, going back roughly a year and a half, taking up 147 MB - of which about 72 MB is the actual messages, and the rest are indexes. gzipped, the 72 MB of messages goes down to 29 MB.
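For a feel of the per-message cost, the figures above work out as follows. This assumes "MB" means 10^6 bytes, which the comment does not specify.

```python
messages = 91_773
total_mb, message_mb, gzipped_mb = 147, 72, 29  # figures quoted above

index_mb = total_mb - message_mb                         # space used by indexes
bytes_per_message = message_mb * 1_000_000 / messages    # average raw size
compression_ratio = gzipped_mb / message_mb              # gzipped / raw

print(f"indexes: {index_mb} MB")
print(f"average message: {bytes_per_message:.0f} bytes")
print(f"gzip ratio: {compression_ratio:.2f}")
```

So the average message is well under a kilobyte, and the indexes take up about as much space as the messages themselves.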
