Decentralization for the web (lwn.net)
177 points by psiconaut on July 30, 2015 | 41 comments



With all the messaging apps that get introduced these days, how come the only one that we all actually use is email? Because it's decentralized! It used to be that we designed protocols for the internet, and now we instead build services on HTTP. Simply because you won't make money by writing RFCs that anyone can implement. We like to think all these new apps are moving us forward, but the reality is that if I really want to be sure someone can read my message, I have to use a 40-year-old protocol.


> It used to be that we designed protocols for the internet, and now we instead build services on HTTP.

This is the best summary of What's Wrong With The Internet that I've read for some time.


And people focused on HTTP because it kept mostly working as everyone adopted NAT routers, once home networks grew beyond what your local expert was willing to support. NAT's implied firewall made security (and uselessness) the default, and that was good enough to get paid.

You can see other echoes of NAT in today's slow adoption of IPv6... Since lots of software thinks it can safely run on a LAN without you caring about it, lots of systems (even Linux environments) start up with that in mind (even if only subtly, by listening on all interfaces instead of just localhost), and that fosters the fear of dropping that NAT firewall, which is the obstacle to decentralized (and truly competitive) services.

Edit: A recent comment showing the mindset I'm talking about above: https://news.ycombinator.com/item?id=9983056
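
To make the "listening on all interfaces instead of just localhost" point concrete, a minimal Python sketch (ports are arbitrary):

    import socket

    # Listening on 0.0.0.0 exposes the service on every interface; it's only
    # "safe" because a NAT box usually sits in front of it.
    exposed = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    exposed.bind(("0.0.0.0", 8080))
    exposed.listen(5)

    # Listening on 127.0.0.1 keeps the service private to the machine itself,
    # regardless of what the network edge looks like.
    private = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    private.bind(("127.0.0.1", 8081))
    private.listen(5)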


Due in no small part to the profession of network administration deciding that "security" was in fact not about writing secure software, but configuring firewalls to block every port but 80.

Sealed by the decision of ISPs to provide a piddly 1 Mbps up even on lines with 50 Mbps down, disallow inbound connections to residential modems, etc.


XMPP has/had a chance; however, it's a bit complex, and companies that want to keep their walled gardens closed keep gutting support.


See Meredith L. Patterson's "On Port 80".

https://medium.com/@maradydd/on-port-80-d8d6d3443d9a


I find it so painful to see people rediscovering the same problems again and again. Not that I blame anyone individually, yet it holds us all back.

Way back then, when your online connection was flaky, expensive, and ~5-50 kbps, the first online-only apps were greeted with "but.. but.. what if offline?"

Next thing I know: the iPhone mandates internet access and steamrolls everything into online-first/only mode for the average user. Business models based on that make sense, so the industry follows. Privacy concerns are swept away. Control is taken away from the user and handed to the service. Data governance/ownership is turned on its head. Synchronization is hard, marketing is easier. Decentralization is written off as anarchic geek fantasy. Interoperability does not fit business needs. Open protocols become data islands.

Are we actually about to come back to "but.. but.. what if offline?" I hope so. And how about reversing some of the problems we introduced earlier while we are at it? Privacy, decentralized services, interoperability.


If you really look at the history of computing (back at least to the 1970s or so), you see that very many things go in cycles like that. Certain concepts come into vogue, become the rage, lose steam, are replaced by something else, then come back, become the rage, lose steam... lather, rinse, repeat.

Why exactly this happens is an interesting question. I don't know the answer, but I have a vague suspicion that it happens where there is a fundamental, ultimately unresolvable conflict, and the "answers" just oscillate around the central issue.


I am an author of a distributed database (http://gunDB.io/), and I thought this article was fairly good, but I have one grievance: hashes aren't the ultimate solution.

Certainly hashes will make things a lot better, but a lot of people I know (including some of the people referenced in the article) are heralding them as the second coming of the web.

I disagree because they present more problems, ones that are harder for humans and easier for machines. First off, practically speaking, in order to find the data you have to already have the data. Second, if you don't have the data, then you have to know the hash; but how do you know the hash without having had the data OR trusting some authority for the hash? Now you are back to centralization.

Third, and most important from my perspective, a hash will oftentimes point to outdated content, because by the time you find it the content might have changed (this of course is a good thing but can be problematic). I know they addressed this a little bit in the article, but it is worth reiterating. The point is that it is hard to do synchronization on hashes, and I would argue that sync is the more important problem; that is what I am trying to address in my open source database.
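
To make the staleness point concrete, a toy sketch (plain Python, not gun's API):

    import hashlib

    def address(content):
        # In content-addressed storage the lookup key is derived from the bytes themselves.
        return hashlib.sha256(content).hexdigest()

    v1 = b"temperature: 32.8"
    v2 = b"temperature: 33.1"

    print(address(v1))  # anyone holding this hash can fetch and verify v1...
    print(address(v2))  # ...but an update gets a brand-new address, so every
                        # old reference to v1 silently goes stale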


I agree that hash-based addressing creates a bit of a chicken-and-the-egg problem for content discovery, with the exception of, for example, existing links or QR codes. You'll always need some way of looking up the hash for the content you're hoping to find. But that lookup mechanism need not be a single centralized service. In fact, I would make the argument that in many cases, search engines are doing exactly this right now: you have several coexisting, competing, but centralized services, and a user is free to choose between them. It's centralization, sure, but in a much more robust way (though it admittedly could be significantly improved through, for example, a common search API). The only difference is that contemporary search engines are looking up URLs instead of hashes, but given how we're using URLs these days (ex: https://imgur.com/iauyhdf), I don't think it's a big distinction.

You're also spot-on that dynamic content is the hardest part of hash-based addressing, but the problem is a bit more nuanced IMO. On the one hand, all content, once created, is static. If a thermometer reads 32.8 at 15:40 on 30 July 2015, that data point is fixed in time, immutable. However, we humans think of things far more conceptually, and we build cognitive connections between things. All content is static, but all concepts are dynamic. So the question is then, how do you reconcile those two?

The best answer I've personally come across is bindings -- exactly analogous to binding names to values in Python. Some hash bindings would be static, some bindings dynamic. That also allows you to construct more complicated objects like buffers natively. That's how I'm doing it in the project I'm working on (https://github.com/Muterra/doc-muse), though the bindings are a relatively new addition that I haven't had time to test yet (or update the documentation, for that matter).
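
Very roughly, the idea looks like this (a toy sketch with made-up names, not the actual doc-muse API):

    import hashlib

    store = {}     # hash -> immutable content
    bindings = {}  # name -> hash (the only mutable layer)

    def put(content):
        h = hashlib.sha256(content).hexdigest()
        store[h] = content
        return h

    def bind(name, content_hash):
        bindings[name] = content_hash   # rebinding is how "dynamic" content changes

    def resolve(name):
        return store[bindings[name]]

    bind("thermometer/latest", put(b"32.8 @ 2015-07-30T15:40"))
    bind("thermometer/latest", put(b"33.1 @ 2015-07-30T15:50"))
    print(resolve("thermometer/latest"))  # the name follows the newest hash;
                                          # the older reading stays addressable by its hash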


Part of the way that Freenet addresses the staleness issue with hashes is to attach a version number to the content: a monotonic integer that goes up each time it's updated. In that case, though, the key can't be derived directly from the content; it's more like a UUID. Which puts you back at your second point, which Freenet doesn't really have a way to deal with either, so it essentially ends up with people trading URLs on message boards.
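
Roughly the shape of it (a toy sketch of the idea, not Freenet's actual key format):

    import hashlib

    def updatable_key(owner_pubkey, site_name, version):
        # The lookup key is derived from the publisher and a chosen name, not
        # from the content bytes, plus a monotonically increasing edition number.
        material = owner_pubkey + site_name.encode() + str(version).encode()
        return hashlib.sha256(material).hexdigest()

    # Readers who know (pubkey, name) can probe editions 0, 1, 2, ... for the
    # newest version, but they still have to learn (pubkey, name) out of band,
    # which is why URLs end up traded on message boards.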


Well, you're right that it requires centralization, but at least the amount of data you need to retrieve centrally is that much smaller. It would benefit the global infrastructure greatly if I could retrieve websites over the LAN when they'd already been cached by a computer there.

My wife and I visit a lot of the same sites, even see a lot of the same dynamic content (same facebook friends, etc). Surely there's some community effect that would make this a huge bandwidth win.


Careful: this gets you dangerously close to privacy violations. You may not want certain people to know that you are fetching a piece of content. "Hmm, the profile picture for this specific friend loaded suspiciously quickly.."

A similar mistake is security updates over bittorrent: a real-time list of vulnerable hosts.


Argh. Privacy really is such a double-edged sword.

You could still keep those bits off of the WAN, though. In a "privacy mode" the gateway could coordinate and throw in an average delay.
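
Something like this toy sketch (a fixed target latency; a real version would have to mimic the WAN's actual latency distribution):

    import time

    TARGET_LATENCY = 0.25  # seconds; would need tuning to resemble a real WAN fetch

    def fetch_with_cover(url, lan_cache, wan_fetch):
        # Serve from the LAN cache when possible, but pad the response time so a
        # cache hit is not distinguishable from an ordinary remote fetch.
        start = time.monotonic()
        content = lan_cache.get(url)
        if content is None:
            content = wan_fetch(url)
        remaining = TARGET_LATENCY - (time.monotonic() - start)
        if remaining > 0:
            time.sleep(remaining)
        return content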


I think hashed addressing is a lot easier now that mobile has increased the appeal of less typing. A user doesn't care whether a QR code contains a URL or a hash.


A hash-based distributed database doesn't let you benefit from the locality effect, where users access locally produced data more frequently.

There is also the data hosting cost. It seems fairer and more logical that data producers pay for the hosting of their data; otherwise there is a risk of abuse.


What about distributed metadata dictionaries/taxonomies describing the content and syncing via something like pubsubhubbub?


It's true that HTTP and HTML are completely obsolete; the proof is how much better smartphone apps perform than their web counterparts.

Honestly, I think there is a need for solid P2P libraries, so as to be able to use a DHT, NAT punch-through, or UPnP more easily. P2P is immensely hard and has as many implications as it has uses. Isn't there a P2P filesystem already? I'm sure you could distribute public data in a decentralized manner.

I don't know if libtorrent does this well, but since there are security issues in implementing software that sends data packets directly, it would be greatly welcomed.
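
For what it's worth, the routing core of a Kademlia-style DHT is tiny; here's a toy sketch (not libtorrent's API):

    import hashlib

    def node_id(data):
        # Node IDs and content keys share the same 160-bit space.
        return int.from_bytes(hashlib.sha1(data).digest(), "big")

    def xor_distance(a, b):
        return a ^ b

    def closest(key, known_nodes, k=8):
        # Each lookup step asks the k nodes nearest to the key for nodes that
        # are even nearer, converging in O(log n) hops.
        return sorted(known_nodes, key=lambda n: xor_distance(n, key))[:k]

    peers = [node_id(("peer%d" % i).encode()) for i in range(100)]
    target = node_id(b"some public document")
    print(closest(target, peers, k=3))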

Also, I don't believe Diaspora went far enough with decentralization. There should be no server at all.


I had an idea for doing exactly this in 2011 when domain names were being seized for censorship.

So far this thing: https://github.com/LukeB42/Uroko/tree/development implements a collaborative page editor. All it needs is a gevent-based implementation of Kademlia.

The difficult part is that you, as a node in this overlay network, may be requesting a document that only one other node has, and you have to trust that node. The other part is that different nodes will have cached the URL you're requesting at different times. How do you prevent lying nodes from being taken seriously?

As for sharing edits to documents, that's probably best done as a premeditated thing with people you know, which is where Merkle trees will be useful for verification.
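
A toy sketch of the verification primitive (plain Python, nothing project-specific):

    import hashlib

    def h(data):
        return hashlib.sha256(data).digest()

    def merkle_root(chunks):
        # Hash each chunk, then repeatedly hash adjacent pairs until one root remains.
        level = [h(c) for c in chunks]
        while len(level) > 1:
            if len(level) % 2:
                level.append(level[-1])  # duplicate the odd leaf out
            level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        return level[0]

    # Two peers who agree on the root can pin down exactly which chunk of a
    # shared document a lying node tampered with, without re-sending the whole thing.
    print(merkle_root([b"chunk0", b"chunk1", b"chunk2"]).hex())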


I think centralization occurs because of economies of scale, like the talk says. But open source and protocols can decentralize things again. Security is probably the hardest thing to guarantee when all the source is out there. It takes quite a while to secure against all the obvious attacks, while attackers can see all the code. I'd trust Gmail for security before I trust some small host that installed SquirrelMail.

But having said that, I think that the reason a lot of stuff becomes centralized is because SOCIAL is not decentralized today. Bitcoin decentralized money, but user accounts, profiles, connections, etc. are still done in a centralized way. That's why GitHub is centralized even though git is not. Social and security: if there were solutions to these, many people would decentralize.

And by decentralized, I mean you still have a server hosting your stuff, but it would be your choice - it could be on a local network, and you wouldn't even need the internet. You could be in the middle of rural Africa and your village could run a social network, which sometimes syncs with the outside world, but 99% of the communication wouldn't require it; it wouldn't require those drones Facebook launches.

I think our company Qbix has decentralized social, in that way. It's not decentralized like bitcoin or mental poker, but honestly I don't know why zero trust is such a big deal. Even with bitcoin, most people host their wallet with others and take risks.


Trusting closed-source applications over open source sounds odd. Security by obscurity is not desirable. What you said about open source, that everyone can read it, is a strength, not a weakness.

As long as there is programming there are bugs. We can't prove the correctness of all programs by writing purely functional code. Having more eyes on the same code is more likely to expose these bugs. The caveat is that everyone hopes someone else has checked the code. But I don't see how using a closed-source application would solve this issue.


That's exactly it. When a startup is just starting, the bugs can be exposed and exploited. Someone's got to fix them. Not every project is huge like Linux and WebKit. Yes, the mantra is that with enough eyes, all bugs are shallow, but in the meantime anything that can be exploited will be exploited if the network becomes big. The effort-to-result ratio would be small.

Security by obscurity can be better than exposing all your code to the world, where any hacker can compromise the whole network BEFORE the fix is patched.

And even with open source, would I trust a random small host to secure it better than Google? Look at all the Android vendors that don't even install the latest patches.


Fuzzing systems find exploits quite effectively in systems that are only available as binaries or APIs. Security by obscurity only really works if you're obscure in the sense that hardly anyone is using the system.


Fuzzing systems can do far less than an attacker who has the whole source code.


The primary problem is that the web is client-server and there are substantial inconvenience obstacles to hosting your own server, compared to using a cloud service.

Then there is what Schneier calls the security feudalism problem: maintaining a secure system is very hard, so people prefer to submit to a corporate feudal lord that can provide them with security, paying with their privacy as a form of feu duty.


> and there are substantial inconvenience obstacles to hosting your own server

I think one of the biggest problems is DNS.

Until there is a ubiquitous decentralized name resolution protocol, there is not much of a point in running your own server on a host that changes IPs every now and then.


Dyndns has been a thing for years. It's not particularly hard to set up, but it's just another thing you need to configure. And I suppose it's an external dependency.

(Decentralised protocols run quickly into "Zooko's triangle": https://en.wikipedia.org/wiki/Zooko%27s_triangle )


I said "decentralized name resolution protocol" for the very reason that I anticipated someone bringing up dyndns.

DNS is quite centralized with its registrars and the DNS root.


I think the reason Diaspora hasn't caught on is that micropayments aren't easy for the average user yet. If you want me to use bandwidth and keep a machine running in my house connected to some decentralized P2P Facebook thingy, sure, some might do it for free, but if everyone who used your node tossed you 1/10th of a penny or so per use, maybe it would have a lower barrier to entry.

Not to mention that it needs to be much easier for the average person to set up a server securely and have a digital wallet to store their micropayments. (I love bitcoin, but asking your average Facebook user to set up a wallet securely is still too difficult.)

This article correctly called out the advertising incentive of today's big companies as not being in the average user's best interest. I wish it had also talked more about what kinds of incentives are missing to get people to become part of P2P networks. Great article though; I like how the author tied the slowing rate of discoveries/progress into the problem.


> micropayments aren't easy for the average user yet

And never will be. The information assessment costs integral to the payment decision-making process are too high.

Bundle. Don't disaggregate.

(But allow for exclusion of high-cost items on request.)


Would someone kindly summarize the author's main idea?



So far a DHT is the latest method, and Diaspora was the previous one.

If people want to have true freedom of speech, or better privacy (or just enough), I really think there should be no server at all.

I really want to see an unmoderated decentralized forum. Even 4chan is slightly moderated.


That already exists; it's called Frost, and it runs on Freenet.


Thanks. All the author does is define the three terms?

I don't want to be lazy; I did look for some sort of thesis or argument but I'm not going to read that long paper without even knowing what it's about.


And who was perhaps the first person to describe this concept, back in 1979? https://lh3.googleusercontent.com/-7RQK6LC3gdg/VGFsmQBX0RI/A...


Well, there’s “right” in some abstract sense, and then there’s “right” as in most profitable. It's hard to make building user-centered technologies competitive with building corporate-centered technologies, especially once big money and bureaucracies have taken a field over from the dreamers and tinkerers and scientists.

But yeah, Ted was right, as were Vannevar Bush and Doug Engelbart before him, or Alan Kay after. Some more recent thingies:

http://idlewords.com/talks/internet_with_a_human_face.htm

http://idlewords.com/talks/web_design_first_100_years.htm

http://worrydream.com/TheWebOfAlexandria/

http://worrydream.com/TheWebOfAlexandria/2.html

http://hapgood.us/2015/07/21/beyond-conversation/


I believe this is the mentioned keynote (starts after around 12 minutes). https://www.youtube.com/watch?v=VRGB40srFE8


When I read the article I wondered why they didn't link to the talk they paraphrased.

Probably it was written almost live and there wasn't a recording available when the article was published. However, they didn't even add the video link afterwards. Why?!


The EuroPython site doesn't even link to the videos from what I could see. The video title isn't particularly helpful, either. I'd imagine nobody realized it was there.


I thought classic Java from 10+ years ago had immutable namespaces.



