Re-decentralizing the Web, for good this time (verborgh.org)
527 points by Schoolmeister on Jan 11, 2019 | hide | past | favorite | 281 comments

It puzzles me that the linked data future is still discussed, as if we didn't already try it, and didn't already discover that developers dislike arcane RDF standards and the academic-rooted designers of the specifications have a terrible track record of solving real-world problems. And that now they're presenting linked data as some critical component of the decentralized web while skipping out on the debates that everyone else in the space is having - like whether decentralization can be fast, or how to ensure data authenticity, or whether a 'local server / pod' can be built that doesn't get hosed by hole-punching through a home Comcast connection.

Instead, it's just 'what about old-fashioned websites, plus lots of XML schemas and long spec documents'? It just feels like a rehash of Berners-Lee's existing '5-star open data' spiel ( https://5stardata.info/en/ ) but now with the billing that it'll fix the internet. 5-star open data has been around for years now, and, well, the linked data future isn't here. When's the last time you consumed RDF in an application?

I really, really sympathize with the goals here but when I read through these proposals a few months ago I literally facepalmed. They seem about as realistic as praying for some kind of deus ex machina.

Ultimately I think there are technical solutions to making the decentralized web more attractive than the walled gardens, but at this point they will need to be ridiculously polished and shiny to even get a look, and this stuff... is not. Going forward it gets even worse: they're going to be opposed at every step by corporations with more money than most nations.

The internet was originally decentralized because the government wanted to make it that way, and I think the only way to get back there is going to require a gigantic, economically unattractive investment. There are at least a few governments that may have the capability but I can't name one that would have the motivation. Hopefully some billionaire's charity will decide saving the internet is a worthy legacy.

The internet is already decentralized. Some billionaire can't do anything to fix the situation, at least not directly, because our draconian copyright and network access laws are the only reason that walled gardens are able to exist.

The internet doesn't really tolerate serious technical barriers stopping someone from automatically multiplexing the content from various social networks into a single read-write stream, for example. The issue is that when someone attempts to do that kind of thing, they get sued and they end up owing BigTechCo millions of dollars. [0]

An open internet is _not_ a technical issue. It's a legal one.

[0] https://www.eff.org/cases/facebook-v-power-ventures

Eh. I sympathize, but if you were to implement the same as a client-side app rather than a hosted service, AFAIK there's little history or legal precedent for shutting such a thing down. Well, there was the kerfuffle between Microsoft and Google over Windows Phone access to YouTube, but other than that... Admittedly, a client-side app might not be allowed into walled-garden app stores, limiting its reach if targeting mobile operating systems. Still, mobile isn't everything.

I find it strange that after the long history of multi-protocol IM clients back when it was still called "IM", there are so few attempts to do the same for modern protocols. I think the barriers are partly technical – e.g. better encryption/obfuscation, more features and a faster pace of changes making it harder to keep up. But even that can't explain everything: many APIs are not obfuscated at all, and many services can be boiled down to pretty simple functionality. Part of it, I think, must be social – like people aren't even aware that it's a possibility, or worth spending effort on.

The walled gardens exist because the open Internet kind of sucks, really.

E-mail is pretty much the last bastion of the old open Internet, and the amount of resources needed to just deal with malicious e-mails is huge. Mindbogglingly huge. And those costs cut out a lot of organizations from being able to operate their own e-mail servers (either the costs of doing it or the costs of verifying to the big players that the e-mail you're sending isn't garbage).

And that's pretty much the story across the board. The old Internet was overwhelmed by bad actors who would ruin everything. Facebook and Twitter house a lot of awful stuff. But can you imagine how bad it would be if we were all still using USENET and IRC?

I liked Usenet and IRC. It would make the internet great again.

I still use IRC. I run my own email server (and haven't had problems so far). There are more protocols than just SMTP and HTTPS. (I even sometimes make up new ones, although sometimes the existing ones will do, even the ones that aren't so common.)

And, yes I agree, it would make the internet great again.

NNTP (USENET) is fast, scalable and very practical.

I wish it was popular instead of a random set of forums, mailing lists and reddits.

It is also overrun by people who use it to share large binary files of copyrighted material, and spammers, and so on and so forth.

One way to mitigate spam on Usenet is moderated newsgroups.

Another way might be something similar to ad-blockers for web browsers.

I find that self-regulating communities are much better at it than large platforms with lengthy EULAs written by lawyers. So yeah, if the people I cared about were on UseNet and IRC, I'd switch back to them in an instant.

   The walled gardens exist because the open Internet kind of sucks, really.
Was it that, or was it that the open internet was proving more difficult to monetize? Knock-on effects on the resources thrown at UX may be relevant.

It's not about UX, it's about how awful people are. Hacker News is an amazing place, and it's not because of the UX, which is certainly of a piece with the Old Internet. It's because there are people who put a lot of effort into making it not a giant cesspool. And even then, it can be incredibly difficult to discuss some topics here because discussion becomes overrun by bad actors.

If you want to get a sense of what the Internet would be like without an amazing amount of curation, go read the comments on basically any story run on a local newspaper or TV station, where they don't have the resources to do content moderation. Read up on the people who get PTSD having to filter content off Facebook so that an unsuspecting user doesn't run across child porn or a snuff film. The walled gardens exist because the places outside of the walls are *awful*.

And they're a lot more awful if you exist outside of the band of college-to-middle-aged white men who make up the bulk of the people who experienced the old Internet or who post here on HN. If you're in a lot of more socially vulnerable positions, Twitter is far more a fenced garden than a walled garden. But that fence is still a lot more protection than you'd get on a platform that doesn't have to do that kind of moderation to be viable to advertisers.

Monetizing the Internet has had a lot of knock-on effects, but one of those effects is that it's really really hard to monetize open neo-Nazism and a lot of other awful things, so the big players clean up their platforms. Recent years have shown that it's easy to monetize a lot of racist dog-whistling, though. It's not perfect. But an alternative to what we have now needs to start from a point of how you reduce harassment and social harms, not ignoring how the big commercial players have made things better than they were before.

>not ignoring how the big commercial players have made things better than they were before.

In my book, they're moderately worse. I entered the internet around the time of free web forums, where anyone could run one for any reason. I moderated a couple moderately sized ones, mostly oriented around computer games. Overall it wasn't too bad. Certainly nothing compared to what I've seen in larger communities. I suspect the larger the community, the worse the garbage.

But the main reason I say commercial players have made it worse is that they've also commercialized content moderation. Which is to say, they employ people to sit at a desk looking at the absolute worst humanity has to offer for 8 straight hours a day for barely better than minimum wage. That's like a job straight out of Black Mirror.

Forum moderation, by contrast, was/is a volunteer position. You were only in it for as long as you chose to be, and you could leave at any time without any effect on your livelihood.

So I'd argue the "community watch" model of amateur forum moderators was closer to the greater good than the commercial walled gardens.

They've made some things better, and some things worse. Which I think was the point.

"Mindbogglingly huge" is an incredible overstatement. It's not easy to deal with malicious e-mails, but it's also not some unattainable end-goal that only a huge corporation can achieve. A single very motivated person can do it with significant effort, or a small team (3-4 people) can do it with average effort in the setup phase. Maintenance is a 1-2 person affair, and not even full-time.

> The internet doesn't really tolerate serious technical barriers stopping someone from automatically multiplexing the content from various social networks into a single read-write stream, for example.

Is this still true when “telling who's a ‘robot’” is such a common thing to have happen? For instance, I've heard of at least one major platform both sending back quite a lot of UI telemetry and considering third-party clients a violation of their ToS; I haven't heard of strong action being taken yet. (I'm avoiding naming them both because I'm operating partly on hearsay and because I'm more interested in the general question.) Hasn't bigtech had a lot of time and motivation to advance “how to detect people who are using some weird software to talk to us”?

Simpler forms of technical barriers, like with the AIM protocol, were defeated in the past, but it seems like massively upgraded data backchannels, machine learning algorithms, and the new normality of silent automatic updates all the time might strongly favor a centralized defender. Plus IIRC the CableCARD wars didn't go so great, and there were presumably a lot of people motivated to save money on expensive TV packages, whereas risking losing access to all your friends for having slightly better control over something that's notionally “free” anyway sounds like a harder sell.

I don't think it's easy to defeat “socially required tech” + “automatic updates” + “machine learning” at all.

> The internet doesn't really tolerate serious technical barriers stopping someone from automatically multiplexing the content from various social networks into a single read-write stream

If the legal barrier went away, and someone surmounted the "unserious" (which I doubt) technical barriers to doing that, then the content providers would go out of business. That's arguably a good thing, but I suspect many people would disagree. The problem is the profit motive and financing model for what consumers want, not the degree of decentralization. Google didn't screw up the internet; people did, by preferring what Google offers.

> The internet is already decentralized. Some billionaire can't do anything to fix the situation, at least not directly, because our draconian copyright and network access laws are the only reason that walled gardens are able to exist.

The internet as used by many, many people consists of a few centralized walled gardens. Walled gardens also exist because of network effects.

> An open internet is _not_ a technical issue. It's a legal one.

Perhaps it's a social one as well.

I found it interesting that they highlight the decentralized nature of email. And yet the decentralized design meant spam was basically unsolvable. It wasn't until Gmail came along that the spam problem was largely solved.

There are so many abuse related issues on the web and I’ve seen no decentralized effort that works unfortunately. Cloudflare brought cost effective DDoS protection to the masses.

I would argue that the problem of ‘spam’ has not been solved, since that would mean an agreement on what constitutes spam—that spam has some inherent properties that subjects it to easy classification, and that spam is not evolving.

But that's beside the point that I originally wanted to make: spam and email have very close parallels to security and the Internet. In another universe, it's possible that the issues of spam and security could've been incorporated into the protocols themselves. But for some reason I'm sure is rational, those issues were moved outside, to the hosts — left to be solved with middleboxes such as firewalls and Google's spam filter.

I would argue that this was the smarter choice and that both of these involve the same problem: spam and security are ill-defined and constantly evolving.

Certificates? That's a pretty decentralized effort. It works because certs whitelist you. You don't get one until you've been whitelisted somewhere, by someone else who has been whitelisted (generally). With email we assume the sender is a good actor and then have to run around blacklisting spam. That helped with the network effects but caused untold hassle.

Gmail "solved" the spam problem not by making email centralized, they did it by making a good spam filter.

Gmail basically took over email through centralization. A lot of companies and people use it because the price is fair and the spam filter works better than almost anything else.

I have tried FastMail and I like the company, but the spam filtering is not good. But good luck even trying to replicate FastMail anti-spam if you roll your entire email stack yourself. I have run my own email server even well before Gmail was popular and it was a nightmare, I'll never do it again. If a company asks me to do it, I'll quit.

Even if you put Gmail aside, email became reputation based, which is naturally going to mean the larger centralized platforms will succeed over a few independent outliers because they will always have a better reputation and control what is allowed to go out or in. Just like how the United States still has a large role in the internet and what is seen internationally because so much is still centralized within those borders, or the centralized companies that are within those borders control it.

I disagree vehemently on the difficulty of running your own email server. I have been running a pretty standard postfix + dovecot email setup for myself and close family for almost a decade now.

It is necessary to set up proper DKIM, SPF, DMARC and TLS by default once (thanks to Let's Encrypt for the TLS), but after that the setup is pretty much hands-off.
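For reference, the setup part largely boils down to a handful of DNS TXT records. A rough sketch, where example.com and the DKIM selector "mail" are placeholders and the public key is elided:

```
; SPF: only this domain's MX hosts may send its mail
example.com.                  IN TXT "v=spf1 mx -all"

; DKIM: public key for signatures made with selector "mail"
mail._domainkey.example.com.  IN TXT "v=DKIM1; k=rsa; p=<public-key>"

; DMARC: quarantine failures, mail aggregate reports to the given address
_dmarc.example.com.           IN TXT "v=DMARC1; p=quarantine; rua=mailto:dmarc@example.com"
```

Exact policies (`-all` vs `~all`, `p=quarantine` vs `p=reject`) are a matter of taste and how strict you want receivers to be with your mail.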

Spamassassin filters my spam down to at most one mail a day, and that's usually because they have some new type of topic not caught by the previous bayesian filter.
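The Bayesian part of such a filter is conceptually simple. A toy sketch of the idea (the corpus and tokens are invented for illustration; SpamAssassin's real implementation is far more elaborate):

```python
# Toy Bayesian spam scoring: score tokens by how often they appeared
# in previously labelled spam vs. ham messages.
from collections import Counter
import math

def train(messages):
    counts = {"spam": Counter(), "ham": Counter()}
    totals = {"spam": 0, "ham": 0}
    for text, label in messages:
        for tok in text.lower().split():
            counts[label][tok] += 1
            totals[label] += 1
    return counts, totals

def spam_score(text, counts, totals):
    # Sum of log-odds with add-one smoothing; > 0 means "more spam-like".
    score = 0.0
    for tok in text.lower().split():
        p_spam = (counts["spam"][tok] + 1) / (totals["spam"] + 2)
        p_ham = (counts["ham"][tok] + 1) / (totals["ham"] + 2)
        score += math.log(p_spam / p_ham)
    return score

corpus = [
    ("cheap pills buy now", "spam"),
    ("meeting agenda for tomorrow", "ham"),
    ("buy cheap watches now", "spam"),
    ("lunch tomorrow?", "ham"),
]
counts, totals = train(corpus)
print(spam_score("buy cheap stuff now", counts, totals) > 0)  # spam-like
print(spam_score("agenda for lunch", counts, totals) > 0)     # ham-like
```

The "new type of topic" failure mode above is visible even here: tokens the filter has never seen contribute almost nothing to the score either way.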

I'm pretty sure FastMail uses Spamassassin, so why is an email company so bad at configuring that software? Their spam filtering is simply not nearly as good as Gmail, it's not even close.

I used Spamassassin back in the day and it was effective for a bit, and then pretty much all spammers figured out how to avoid it.

Also, I'm not talking about the obvious VIAGRA large caps spam, which I think is fairly easy to solve. But spam has become much more nuanced and Gmail is the best at it, and the false positive rate is still amazing.

Which was enabled in large part from their centralization of email.

Gmail's spam filter is one of the worst. About once a month some gmail user tells me that he found my mail in the spam folder. Of course it is easy to keep down spam when you don't care about false positives.

The problem with this 'Solid' project seems to be that it is vaporware.

Anybody who wants to advance the open web should focus his efforts on a P2P library with extremely good NAT traversal capabilities that is extremely reliable and simple to use and supports as many programming languages as possible - certainly not just C++ or C. It needs to be deployable under a permissive license on all major platforms - macOS, Windows, Unix, Linux, iOS, Android, and browsers - and must not transport any data or chew away bandwidth without allowing total control over this by the programmer and end user. It needs to have a dead simple, almost idiot-proof API. The resulting network on top of IP needs to be searchable, not too high latency, and able to route to any endpoint on it.

That's still the biggest hurdle for the Open Web. Everything else is secondary.

Does this almost meet your requirements? https://pypi.org/project/pyipv8/

And don't forget to add obfuscation and deniability to the lib, like Tribler and Perfect Dark did.

You'll be surprised to hear that developers like Linked Data. People starting with Linked Data development today are not burdened by the Semantic Web legacy and mistakes of the past. We've been working with front-end devs who have never seen RDF, and never will. They enjoy how Linked Data is able to cross borders and leads to more data than a centralized database could ever give you.

The confusion in your comment is that one would need RDF to do Linked Data. I've written about that misconception here: https://ruben.verborgh.org/blog/2018/12/28/designing-a-linke...

Don't get me wrong, the Semantic Web community has made mistakes and has not been developer-friendly. But we're not still stuck in the 90s. For instance, XML hasn't been a part of any of this for many years.

By RDF did you mean RDF/XML specifically? JSON-LD is still RDF, it's just serialized differently, which is fine, I like RDF, but OP may have more specific concerns than the syntax.

Well, it also contains usable ordered lists, which is not a small addition to RDF (it defines interop with RDF's version of ordered lists, but the one in JSON-LD is array-based with random-access patterns and a .length, while the pure RDF version is purely linked-list-based, without any guarantees about having only one link).
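The two shapes the parent describes can be illustrated in plain JSON (the `ex:` vocabulary URL here is made up, and no JSON-LD processor is involved - this only shows the syntax):

```python
import json

# In JSON-LD, a plain array carries no order guarantee at the RDF level,
# while "@list" maps to RDF's linked-list (rdf:first/rdf:rest) structure
# and does preserve order.
unordered = json.loads("""{
  "@context": {"ex": "http://example.org/"},
  "ex:tags": ["a", "b", "c"]
}""")

ordered = json.loads("""{
  "@context": {"ex": "http://example.org/"},
  "ex:steps": {"@list": ["first", "second", "third"]}
}""")

# As plain JSON, either way you get random access and a length:
steps = ordered["ex:steps"]["@list"]
print(steps[1], len(steps))  # second 3
```

The difference only matters once the document is interpreted as RDF triples; a JavaScript consumer treating it as plain JSON sees ordinary arrays in both cases.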

No, I mean RDF in general. As you can read in the article I linked to above, programming with Linked Data does not necessarily require RDF.

The specific concern is that it's the 100th attempt at creating metadata for everything in the world. You can't create an unambiguous, comprehensive catalog of the world.

This concern can't get less specific.

The very first paragraph on linkeddata.org says it is for "exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF."

It does. But linkeddata.org is not an authoritative source, and even though that statement is partially correct, that doesn't mean that JavaScript developers will need to be exposed to RDF. Just like JavaScript developers aren't exposed to assembly, even though V8 generates it.

It's a shame that linkeddata.org is the top Google result for "linked data" then.

I've been thinking someone with the expertise and influence should write "RDF: The Good Parts"

AKA instead of "Eval is Evil" we might instead say "XML is Evil"

You're right, there are a lot of academics who like the idea of a semantic web; it rings true to a lot of scientific principles. There are also a lot of ideas and far fewer day-to-day applications. Science ticks along a lot slower than startup culture, however, so it's not surprising that a) understanding of the issues comes slower, but also b) some "experiments" that would utilize semantic data have not yet been fully tested. Read a list of points that address "why the semantic web is dead": many of those points are precisely what science seeks, i.e. principles that promote a slow and deep understanding of a domain of knowledge.

With open-science mandates coming from governments around the world, researchers are looking for ways to share their data in meaningful ways. I can think of a significant amount of research that regularly consumes RDF, particularly in the fields of medical biology and genomics where it's used to annotate data. This is where I'd guess you'll see it take a foothold; for example, medical diagnosis codes are notoriously disparate and there is a strong appreciation for what semantics could address. Unify, exchange, and consume medical diagnoses ... profit.

Links etc. off the top of my head:

* GO - the gene ontology, used in hundreds of thousands of genomic annotations: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3944782/

* UBERON - https://genomebiology.biomedcentral.com/articles/10.1186/gb-...

* The second year of US2TS - http://us2ts.org/2019/posts/registration.html

* OBO Foundry - https://github.com/OBOFoundry/OBOFoundry.github.io

I’ve been learning about Barry Smith’s project for a scientific base ontology, Basic Formal Ontology, and it’s really fascinating stuff.

I always found that it introduced unpragmatic, counterintuitive roadblocks for actual knowledge integration. The most annoying thing is how it carves up reality into representations of different granularity, which are then made disjoint.

One of the proposed selling points of representing your data semantically is that you can infer across it, disjoint assertions are pretty important in this regard. How disjoint assertions are used (Smith) is another issue.

A related beef, though, is that with any reasonably sized dataset, ontology-based inference is computationally very difficult; you have to cut all sorts of corners and know all sorts of tricks to actually infer across your data. In other words, if semantic data are going to truly become ubiquitous we need to infer in real time across them. Inference takes everything in your dataset into account, so adding a single axiom means that, if you want to be complete, you have to compute all over again -> slow.

Even more, we're already there: US ONC Health IT and HL7 are currently on the FHIR standard, which is slowly integrating linked data principles, e.g. it currently uses JSON-LD: https://www.healthit.gov/buzz-blog/interoperability/heat-wav...

There's been a lot of effort to improve RDF ergonomics with JSON-LD, and ActivityPub is a widely used standard based on JSON-LD (though my experience with implementing it has been quite challenging).

I feel you on implementation - every time I make an attempt to try out ActivityPub I get intimidated by the combo of JSON-LD and the verbosity of Activity Streams vocabulary: https://www.w3.org/TR/activitystreams-vocabulary/

I've been working on an ActivityPub implementation and there are so many edge cases and SHOULD vs MAY recommendations, it's ridiculous. I'm planning a separate blog post on the intricacies of the standard.

I banged my head against this issue quite a bit last year. Specifically, the thing that tripped me up is how many fields can either have a single value or be a list of values, and figuring out what is meant semantically when it's a list vs. a single value.

This is exactly what frustrates me the most and what most of my logic checks for. I'm writing tests for each of these semantic cases, but it seems silly to me that a standard leaves something so ambiguous.
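One common way to contain the value-or-list ambiguity is to normalize every property to a list at the edge of the code. A small hypothetical helper (the Note objects are invented, though `to` and `cc` are real Activity Streams 2.0 properties):

```python
def as_list(value):
    """Normalize a JSON-LD value: None -> [], list -> list, anything else -> [value]."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]

note_single = {"type": "Note", "to": "https://example.org/alice"}
note_multi = {"type": "Note", "to": ["https://example.org/alice",
                                     "https://example.org/bob"]}

print(as_list(note_single.get("to")))        # one recipient, as a list
print(len(as_list(note_multi.get("to"))))    # 2
print(as_list(note_single.get("cc")))        # absent property -> []
```

With this in place, downstream code only ever handles lists, and the single/list/absent cases collapse into one.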

This is why the go-fed project uses code generation.

> skipping out on the debates that everyone else in the space is having - like whether decentralization can be fast, or how to ensure data authenticity, or whether a 'local server / pod' can be built that doesn't get hosed by hole-punching through a home Comcast connection

Where is a good place to participate in those debates, especially data authenticity and local server pods?

FreedomBox didn't go anywhere. FreeNAS with ZFS is reliable but not designed to be exposed to the public internet. Many local services are using a centralized rendezvous server for NAT hole punching.

On the shiny commercial front, MyAmberLife has $13M in funding for a home server but it's mostly controlled by a central cloud service. Do Western Digital, Synology, QNAP, Drobo, etc care about decentralization?

Some relevant efforts are dat (and Beaker Browser, the user-friendly frontend), Secure Scuttlebutt (and Patchwork, the user-friendly frontend), and IPFS. Somewhat less legit (imho) is ZeroNet, and somewhat earlier-stage or more obscure is Upspin.

Not only is Linked Data + RDF now even simpler and nicer to learn: there are currently more libraries to work with. It is also more critical than ever for many industries.

The search for solutions to several issues with electronic health records concluded in a new standard that uses RDF and linked data, which solved most of the issues of the previous standard. See FHIR: https://en.wikipedia.org/wiki/Fast_Healthcare_Interoperabili...

In fact, current linked data discussions seem to me to have become relevant again because it is clearer now that we have been misusing and overusing REST, microservices architectures, and GraphQL for some already-analyzed and solved problems.

But, of course, for a single application which doesn't require interoperability, standardized data exchange formats, or support for flexible data representation, Linked Data and RDF will clearly be unnecessary. But in time, the future of data interconnection plays on the side of Linked Data IMHO.

Until then, attempts to create some alternate infrastructure to Linked Data + RDF are more likely to create ad-hoc, informally specified, bug-ridden, slow implementations of Linked Data and RDF.

In my opinion, the linked data future is still discussed, because when Tim Berners-Lee first presented the idea of Semantic Web in 1994, he used what I would call an IoT scenario to describe it ([1], scroll to the end; I really hope he is not reading this comment). Now that the IoT is here, maybe more people are ready to listen.

And just to make sure we are on the same page here: it's not academics' job to build usable products. We will continue working on things that are novel from the academic standpoint; if people like you dismiss LD/SemWeb, those novel things will have "a terrible track record of solving real-world problems". I hope this does not come across as too personal.

[1]: https://www.w3.org/Talks/WWW94Tim/

No, your link doesn't talk about an IoT scenario. Selling and purchasing houses has absolutely nothing to do with IoT.

And using the semantic web for that is just as bad. A basic JSON API would be much more stable than parsing a document, with navigation and the like, just to get at that data.

Agreed 100%. schema.org and whatever FB's equivalent is only "succeeded" because of the incentive in their ranking models (thus centralization).

Highly unlikely someone will bother with this (in addition to all the other quirks) while making their website.

Why does it matter the last time anyone consumed RDF? If it's a good fit for the problem at hand (I don't know if it is), then there is no reason to not use it. It doesn't matter how old an idea is, just how it gets used.

I think the part ignored by so many is the need to decentralize the computers into the home. I'm not talking meshes or shared resources. For the majority of use cases, we don't need distributed storage, compute, etc. Just start making these self-hosted "servers", "data pods", etc as easy to install as desktop software and make it clear that they are inaccessible when the computer is off. People who don't have one already will gravitate towards at least one always-on machine in their house. Modern societies have reasonable upload speeds and electricity/network uptime to support it. Sure, things like ISP firewall/NAT and dynamic IPs are a bit of a barrier, but you can have volunteers help with relays.

For example, I can easily fire up a Tor onion service on my never-turns-off home desktop computer and reach my stuff from anywhere. Why can't I reach my friends' stuff the same way? Because, to use business-speak, there's nothing "turnkey". It's something I've been pondering and working on. Sure, the bigger players may have to be in DCs, have more stringent uptime requirements, and distribute their bandwidth/workload more. But for most of us, desktop software and web-of-trust style connections could go a long way so long as the front of the software has a FB feel (e.g. a feed, messages, etc). We can tackle discovery, searching, aggregation, offloading, etc later.
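For what it's worth, the onion-service half really is close to turnkey already. Assuming some service is already listening on localhost port 8080 (the paths and port are placeholders), these two real torrc directives are essentially all Tor needs:

```
# /etc/tor/torrc
HiddenServiceDir /var/lib/tor/my_service/
HiddenServicePort 80 127.0.0.1:8080
```

On restart, Tor writes the generated .onion hostname into the HiddenServiceDir; the missing "turnkey" piece is everything around this: packaging, key backup, and a friendly front-end.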

Back when mobile devices were rare, Skype used P2P and was very popular and novel. But then an increasing percentage of nodes were mobile, with limited power, limited bandwidth, and much less storage. Even being a member of a DHT (a nice way for distributed peers to keep track of each other) is prohibitive from a battery life and bandwidth perspective for phones.

So Microsoft moved Skype to a centralized service and has been trying to monetize it since.

The problem with decentralized servers isn't technical; something half as fast as your phone could easily handle distributed versions of popular websites. The ARM-based "wall warts" were plenty fast, and they are several generations old already.

How could decentralized applications/services be sustainably funded? If not advertising, how? If it is advertising, what's the benefit to users?

Most importantly, why would users care about decentralized vs centralized?

It does seem that a modest ARM-based server that's silent, potentially integrated into a wifi router, would hugely reduce the downsides of P2P networks. With free power, cheap bandwidth, and membership in a P2P network, they would avoid long startup times for applications. Users on their phones would get instant access to their data while their local node did any proof of work, DHT tracking, and the reputation-earning necessary to use bandwidth, CPU, and storage from other peers.

I don't see any technical barriers, just that users wouldn't care, and nobody would want to pay for it.

> Most importantly, why would users care about decentralized vs centralized?

Probably few would, except those actively searching for it. And, especially with decentralisation, the inevitable outcome of being too popular is that it starts to become centralised again, to make things easier.

Besides that, I thought for a moment about Apple's tech. If you and a friend have an iPhone and you're both trying to connect to the same network (and have each other in your contacts, I think), iOS will allow you to automatically share the credentials and connect the other phone too.

I reckon that you'd see a lot of value in that kind of device integration, which is essentially peer-to-peer.

I don't know; if people could buy a 'personal cloud box' that is as user-friendly as an iPhone, for less than 100 dollars, a substantial number of them might possibly do that.

A 'personal' cloud box is not enough. We need a 'social' cloud box which can talk to the equivalent boxes at the homes of our relatives and acquaintances, and which is as easy to set up as logging into a centralized social network.

Mastodon meets Dropbox, with a content navigation interface thrown on top?

Yes, that's the thing. We know how it should be used because people are already using such systems. What we need is for no part of the system to be owned by a single organization.


This is what Hubzilla can do, along with using the Zot protocol for distributed identity. The main issue really is having an easy wizard-based installation/configuration for new users wanting to host.

And the biggest issue of them all, keeping it absolutely secure while doing all that. It would be a killing blow to the venture if someone got their data stolen.

The existing centralized systems set a pretty low bar for that, to be honest.

I do not think current big centralized systems are a low bar, the attacks get more and more sophisticated, but mom&pop/volunteer/etc. systems still get exploited in the most banal ways possible.

I'm not certain we need to make this accessible to "the next billion". We like to dress up these actions in virtue but the truth is we hook people up because it's a business. Facebook devastated our social fabric. I don't think it should have been invented. But here we are with its nuclear fallout all because some young douche thought it was a good idea.

If everyone owned a personal cloud box, security practices would quickly fall off a cliff, and those boxes would become the new botnet.

Maybe leave people alone, stop sticking our fingers in all the places they might stick to money. Let the geeks take care of their little tribe.

I love my Synology NAS server :)

Though it's not really simple enough for non-technical people to set up.

I also love my Synology DS-218+. I'm hosting my own Spotify (Airsonic [1], with music tagged to perfection with Beets [2]), an IRC bouncer because Matrix is still not ready, my own Netflix (Synology's Video Station [3]), and my own mailserver (Synology's MailPlus Server [4]). I've been giving out access to this pod to close friends. It's empowering and community-building. This is the kind of social network I desire: close-knit around the warm glow of a server.

It's really self-host heaven. I don't pay for Fastmail, Spotify, or video streaming anymore.

[1] https://github.com/linuxserver/docker-airsonic

[2] https://github.com/linuxserver/docker-beets

[3] https://www.synology.com/en-global/dsm/feature/video_station

[4] https://www.synology.com/en-us/dsm/feature/mailplus

Sick, now turn this into a product that can be adopted by the masses!

I have an older Synology. It is already ready for the masses.

Why: You only need one for a family, and most of the time there's already a person in the family who does "PC stuff". And even if there isn't there's always someone who'll learn it if a friend has one.

The rest of this post is not targeted at you but rather on a whole attitude here at HN:


Anyone who can operate a web browser, has any education in IT and knows enough English to read instructions in the box should be able to set up one.

In fact I think just being able to read the quick start instructions should be enough to install one with basic features.

Setting up websites in the '90s and early 2000s was a lot harder. The same goes for using older PCs with DOS.

A major problem today seems to be learned helplessness. In our well-meant and to some degree profitable [0] effort to make sure anyone can use anything, we are creating a situation where people are more helpless.

Seriously: if app stores and walled gardens had been introduced first, the web and email would now be considered too complicated. I can imagine HN: """You mean my siblings, parents and grandparents are going to install this "e-mail" thing? Even if they were able to configure "smtp" and whatnot they'd forget the "email address" or even how to start it before tomorrow."""

[0]: if anyone doesn't catch my drift, your brightest customers might not be the ones who pay the most ;-)

Edits: a number of them :-)

> most of the time there's already a person in the family who does "PC stuff". And even if there isn't there's always someone who'll learn it if a friend has one.

That's not true at all. Confirmation bias is rough when you're technical; you keep spotting other technical people.

Was about to disagree strongly with your conclusion but you have a point. Not everyone had a web site.

But I'm not totally convinced either: email has been huge despite the configuration needed, including with people who had to take it step by step twice and make notes while doing it. Some figured it out on their own (or, more realistically, using the step-by-step instructions that came bundled with their first modem). Others had a son or a grandson who'd picked it up at school. Others got it at work.

My grandparents were the youngest group of people I can think of that didn't have access to email somehow.

And my wife's grandparents have/had access to email and used actual mail clients too, not just Hotmail or Gmail.

The issues such devices have in practice are upfront cost, compatibility, the need for port forwarding, and the lack of redundancy to keep you from losing your data (among other problems, but these come to mind). Centralization solves those problems: you can connect to a server instead of port forwarding, your data might be stored across 3 servers, the cloud storage might be as cheap as free, and applications are designed to work seamlessly with a small number of big cloud providers.

In the depths of my soul I would love to re-decentralize the web. I truly believe data centralization will cause people to suffer a lot. But decentralized tech needs to solve so many problems before alternatives to centralization become viable. Centralized approaches also improve over time and are a moving target to keep up with.

IPv6 will help by eliminating NAT (and thus port forwarding problems). IPFS might help with redundancy. But yeah the cost and complexity are major problems. I think the biggest problem is making open standards that can evolve quickly. All the big chat platforms have switched to proprietary protocols so they can iterate and roll out changes on their own.

Basically all home routers block incoming connections because so many shitty IoT devices were built to trust anything that is able to connect to them. My router doesn't even have an option to turn off the block; I can only unblock ports one by one.

I guess the route around this would be for the maker of these devices to partner with router companies. Alternatively, they could also be routers. That would potentially make quite a lot of sense...

I can see a lot of value in routers becoming mini home servers + IoT device hubs. There should also be some standard way for devices to let the router know whether they should have access to the wider internet, so things like lightbulbs only have access to the IoT hub and then the hub has access to the internet.

Doesn't UPnP already solve this? And it's enabled by default on most routers.
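Under the hood, that UPnP port mapping is just a SOAP call to the router's IGD control endpoint. A rough sketch of the AddPortMapping request body a client would send (the control URL is discovered via SSDP in a real client; service version and behavior vary by router, and many ship with UPnP off):

```python
# Sketch: build the SOAP body for a UPnP IGD AddPortMapping request.
# The control URL and service type are discovered via SSDP in a real
# client; everything here is illustrative, per the IGD v1 spec.

SERVICE = "urn:schemas-upnp-org:service:WANIPConnection:1"

def add_port_mapping_body(ext_port, int_port, int_host, proto="TCP",
                          description="home-server", lease_secs=0):
    """Return the SOAP envelope asking the router to forward
    ext_port on its public side to int_host:int_port inside."""
    return f"""<?xml version="1.0"?>
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
            s:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <s:Body>
    <u:AddPortMapping xmlns:u="{SERVICE}">
      <NewRemoteHost></NewRemoteHost>
      <NewExternalPort>{ext_port}</NewExternalPort>
      <NewProtocol>{proto}</NewProtocol>
      <NewInternalPort>{int_port}</NewInternalPort>
      <NewInternalClient>{int_host}</NewInternalClient>
      <NewEnabled>1</NewEnabled>
      <NewPortMappingDescription>{description}</NewPortMappingDescription>
      <NewLeaseDuration>{lease_secs}</NewLeaseDuration>
    </u:AddPortMapping>
  </s:Body>
</s:Envelope>"""
```

A client would POST that body to the router's control URL with a matching SOAPAction header; the catch is that any process on the LAN can make the same request, which is exactly why routers with UPnP on get abused.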

But IPv6 routers will continue to block incoming traffic.

Which NAT/PAT/port-forwarding issues are you thinking of?

Don't forget about device maintenance, which is a huge burden. Users of centralized services never have to worry about upkeep.

These products have existed for a while and don't really make a dent.

I don't think I've encountered any super simple appliance I can plug into my home network, move data onto, and then easily give online services I trust access to its data. Ever since I heard about Tim Berners-Lee's new project, the Solid POD (Personal Online Datastore), I've been thinking about how to make it friendly enough that people would actually use it in large enough numbers to gain traction. It always seemed like people would have to learn to start paying at least a little money to make privacy work without advertising dollars. I just Googled around for the latest and noticed that the MeWe social network (which emphasizes privacy, and which Tim is working with) offers a free tier on data but then charges above 8 GB. I wonder if their long game on being sustainable is to just make money on add-ons above the free tier. Here's hoping it catches on...


> How could decentralized applications/services be sustainable funded? If not advertising, how?

How any other open source is funded (e.g. corporate/individual donations, grants, crowdsourcing, support, ancillary products, etc). I don't believe, at least for an MVP, that much funding is needed compared to the scope of some of the successfully funded open source projects that exist.

> Most importantly, why would users care about decentralized vs centralized?

They wouldn't. And ideally, beyond the annoying hoops on initial versions (e.g. discovery/identity), they shouldn't. Your software needs to win on features. A self-hosted, subscribable Reddit clone w/ chat would be a very good start.

Please, show me a list of self-sustaining open source projects that cover developer time and administrative overhead just in donations.

The only one I can think of right now would be Font Awesome 5 (via Kickstarter), which already was a very popular product with big name recognition and had a professionally run Kickstarter.

Pretty much every open source project only survives because developers donate their own time, or companies allow their developers to do so.

> The only one I can think of right now would be Font Awesome 5 (via Kickstarter),...

And even there, imho Bob (iirc) from the video was right - I hate font awesome 5, and always install either version 4 or "fork awesome" in my projects.

Why so?

I must thank you for your question, and must (embarrassing as it is) admit I was wrong.

I started a reply stating that one can no longer use <i> elements and must instead include SVGs, but then I decided to double-check - and found I was wrong all this time... Some minor details have changed, but mostly for the better. Their marketing is a bit annoying too, but that's the price of having a sustainable business model.

I obviously haven't checked version 5 in projects yet, but I guess I should now. So thanks for making me realise my mistake...

Those, then, would seem to be valid ways to organize such a project. Lord knows there is more than enough volunteer time available to achieve any project - if properly organized.

There should be a marketplace for apps running on home routers. It would change the Internet as we know it. Developers should be able to create router apps and submit them to marketplaces and users should be able to install router apps of their choice with just a few taps.

Routers have a few key advantages over most other computing devices owned by the public: routers normally have a public (non-NAT) IP address, they're always on, and battery power is not a concern. If people could install a Tor implementation on their router with just a few taps, Tor usage could expand dramatically. Developers of decentralized social networks might finally get a foothold once the installation problems are gone.

IMHO the best way to re-decentralize the Internet is by creating routers that host arbitrary apps, along with a marketplace of router apps.

The way to bootstrap this idea seems simple: sell a new router with a thin margin, provide SDKs for free, let developers charge what they want for the apps, and take a 30% cut from app purchases. The plan seems so clear that I wonder why I haven't yet heard of anyone doing it. :-)

The obvious problem with in-home edge compute like that is the shittiness of router implementations, combined with the shitty software out there and the lack of any routine human interaction with the router: it just begs to open up a world of devices to botnets. These things need to be interfaced with by humans regularly.

Right. Whoever does this needs to be competent at building a safe, stable, and usable router OS.

OpenWRT/LEDE is a thing. But don't expect cheap home routers to be able to run useful services-- the cheapest hardware is truly bottom-of-the-barrel, even OpenWRT had to drop support for lots of devices.

Ubiquiti makes decent hardware, and now has an upgrade to go from their flavor of linux to debian stretch.

I wonder if Ubiquiti would consider selling a "developer" model of the EdgeRouter X or similar with more RAM.

Or maybe look for a raspberry pi clone with dual ethernet (not connected by USB). Add a 128GB microsd for $20-$25 to have some room for video, photos, email, chat logs, etc.

Please don't encourage router-makers to add more functionality that risks exposing my network traffic to the internet. I sympathize with the sentiment, but the people that make these devices have proven again and again that they are unable to execute on basic traffic security.

> routers that host arbitrary apps

Servers. You are describing servers.

Edit: upon reflection, rather a lot of people I know don't have routers, either; they use shared internet (e.g. xfinity) or only have mobile plans.

I think this is a very, very interesting idea. The only problem that makes me nervous is telecoms/ISPs capping the line or requiring some proprietary interface to it (something done these days to an extent), and other quirks - if they feel they are going to lose control or similar. Just a thought, I don't really know if it could be a real problem though.

I've found Secure Scuttlebutt (www.scuttlebutt.nz) really impressive. By using public servers called pubs as meeting points, they can ignore lots of issues like NAT punching and discovery. I don't like that it seems to be based on an immutable log, but it showcases well how much is possible with offline-first, sync-when-possible software (just like Syncthing).
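The append-only log model is also what makes the sync cheap: each identity is a numbered feed, and a peer only ever needs the tail it's missing. A toy sketch of that replication step (real SSB signs and hash-chains every entry; this ignores all of that):

```python
# Toy sketch of append-only log replication, Scuttlebutt-style.
# Each feed is a list of messages in sequence order; peers exchange
# "how far along are you?" vectors and send only the missing tails.

def want_vector(store):
    """Map each known feed id to the number of messages we hold."""
    return {feed: len(msgs) for feed, msgs in store.items()}

def missing_for(store, their_wants):
    """Messages we hold that the peer (per their want vector) lacks."""
    out = {}
    for feed, msgs in store.items():
        have = their_wants.get(feed, 0)
        if have < len(msgs):
            out[feed] = msgs[have:]
    return out

def apply_missing(store, missing):
    """Append received tails; append-only means no merge conflicts."""
    for feed, tail in missing.items():
        store.setdefault(feed, []).extend(tail)

# Two peers with partly overlapping views of the network:
alice = {"@alice": ["a1", "a2", "a3"], "@carol": ["c1"]}
bob = {"@alice": ["a1"], "@bob": ["b1", "b2"]}

# One round trip in each direction brings both peers up to date.
apply_missing(bob, missing_for(alice, want_vector(bob)))
apply_missing(alice, missing_for(bob, want_vector(alice)))
```

Because feeds only ever grow, "sync when possible" degenerates to "append whatever suffix I haven't seen", which is why it tolerates being offline indefinitely.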

I have been meaning to try SSB, I have had some success using DAT[1] to transfer large files between friends.

1: https://datproject.org/

Setting aside the fact that most people would look unfavorably on yet another box in their homes, how does any of this help you avoid the blocking and tracking of the ISP?

Your ISP will still know what you are doing, and will have the ability to block you from doing it should the need arise.

I also feel like people are proposing solutions based on technologies as they stand today. Chosen solutions to the decentralization issue need to consider realistic future cases. Just as a quick thought experiment, what happens in the near future, say, 3 to 5 years out, when a huge chunk of people are using 5g technologies as their primary connections? AT&T, or sprint, or verizon, or what have you will still get all of your traffic information in this, very plausible, future case. "It's encrypted." or "It's going through a relay." Is just not a sufficient response to privacy if you wish to have privacy from AT&T. I mean, think about it, chances are, the relay will be using AT&T too.

Which brings me to the big problem with solutions like these, ie - inevitable recentralization. Google, or Facebook, (and now because of how this new decentralization idea works, even AT&T), are in an almost unassailable position to act as the "switchboard" for all of this non-indexed data. Need to know where your aunt, who just moved, is on this new decentralized network? Are you going to ask google? or "WeAreDecentralizationIdealists.com"? Oh, you're going to ask your own node? Sorry, her new information has not propagated to your node yet. Check back tomorrow. Oh forget it, just call your aunt and ask her to give you the new information so you can input it manually.

No, she will need to be registered somewhere to seamlessly communicate changes to her node's connection information. And that "somewhere" will likely be a BigCo.

If we want to replace the behemoths, we need to come up with solutions that are just as easy to use, (easier actually), and avoid any possible blocking or tracking. That requires some very creative people to think radically. Easily blockable in home network nodes I think, are not only not really solving the problems, but are also doomed to failure usability-wise when compared with google or facebook.

> Setting aside the fact that most people would look unfavorably on yet another box in their homes, how does any of this help you avoid the blocking and tracking of the ISP?

Meh, don't need another box, just keep the family desktop on. But even without that, there is still value. The ISP tracking isn't much of an issue if you use an existing network like Tor. The software starts up, reads the locally encrypted SQLite DB for your list of "friends'" onion IDs, and connects to the gRPC services your friends are hosting on their machines as onion services. It maintains that stream and begins receiving feed data from those other servers (locally caching as you receive, which is more ideal in an ephemeral world than live retrieval, but it can be a mix of both depending on settings). All without the ISP knowing a thing except that you are connected to Tor.
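A hedged sketch of that startup path, assuming a plain (unencrypted, for illustration) SQLite friends table, Tor's default SOCKS port, and a made-up service port; the actual dial is left out since it needs a running Tor daemon:

```python
import sqlite3

# Default Tor SOCKS port; the socks5h scheme makes the proxy resolve
# .onion names itself, so the local DNS never sees them.
TOR_SOCKS = "socks5h://127.0.0.1:9050"

def load_friends(db_path):
    """Read friends' onion service IDs from the local DB.
    (A real client would decrypt this store first.)"""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS friends(name TEXT, onion TEXT)")
    return dict(con.execute("SELECT name, onion FROM friends"))

def feed_endpoints(friends, port=50051):
    """gRPC-style targets for each friend's onion service.
    The port is an assumption; peers would agree on it out of band."""
    return {name: f"{onion}:{port}" for name, onion in friends.items()}

# A real client would now open one channel per endpoint through
# TOR_SOCKS, hold the streams open, and cache whatever arrives.
```

The point of the sketch is how little state the client needs: a table of onion IDs and a SOCKS proxy address, with all the NAT traversal and rendezvous handled by Tor itself.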

> Which brings me to the big problem with solutions like these, ie - inevitable recentralization.

Yup. Can't easily get around this. People are going to gravitate towards what's easier and what they want on the outside, ignoring what's on the inside. It happens to most continually adopted standards, even if it's just a more trusted server w/ more uptime. And that's ok, I don't want to win some ideological battle at the cost of user happiness. I completely agree the software must be so easy you can't tell what's under it, but I don't think it requires that radical/creative thinking. Just user-oriented effort instead of the constant barrage of difficult-to-setup tech demos.

> Meh, don't need another box, just keep the family desktop on.

Doesn't exist anymore (often enough).

> Just user-oriented effort instead of the constant barrage of difficult-to-setup tech demos.

Things like identity management and data storage make these barriers pretty deep. “I can delete my post and it basically won't be accessible anymore” (there can be physical exceptions so long as they're legibly exceptions to the social reality), “I don't have to think about how big my images are and can just post as much as I want”, “I can lose any of my own hardware and everything will still be there because it's in the cloud”, and “I can tell who my friends are based on common knowledge within my circles of their unique name which is easy to remember and meaningful” are all things that heavily constrain what you can do “on the inside”.

Mastodon has meanwhile managed to either do something right or get lucky wrt the path dependency of building structures where prosocial hosting behavior is convenient: a whole bunch of mostly-volunteer instances have sprung up, adopters have managed to make instance choice part of identity so that the domain-part isn't just a “meaningless extra thing to remember”, and federation remains reasonably strong; meanwhile, financial support for server costs has mostly leant toward the Patreon model, allowing a fraction of generous users to help support a bunch of free riders while not having to directly participate in administration. At the same time, despite Mastodon having almost exactly copied Twitter's model in terms of available user interactions, the zeitgeist has repeatedly suggested that users getting on board for the first time often had no idea what instances even could be, and had to have the very concept explained several different ways before it got real traction. Random instance death is also a problem that's tempering the mood nowadays, because keeping the server up requires enough motivation which sometimes runs out, and some instances have started having problems with media storage requirements, which, see above (though I'm told the internal architecture could use some optimization too).

There's something deeper in here surrounding the thorough conflation of type with instance in the popular side of the digital world; I feel as though something critical to the more literate concept of this didn't make it into the default folk model, such that only centralized services are legible. I have some hope that Mastodon and related ActivityPub-based federated services absorbing waves of people fleeing the abusive behavior of major social media (such as the recent Tumblr exodus) will make a dent in this and cause the appropriate concepts to reach critical cultural density.

> Mastodon has meanwhile managed to either do something right or get lucky wrt the path dependency of building structures where prosocial hosting behavior is convenient

It seems to me that Mastodon gained traction in a way that wasn't much different from email. In both cases there was a communication network which anyone could add a node to pretty easily. Your node could have local moderation policies and you wouldn't have too many issues with getting blocked as long as you acted nice. For both email and Mastodon this enabled a few motivated administrators with some influence to onboard thousands of users quickly, and benefit from a network effect bigger than just their node.

The similarities to email end when you consider _why_ people migrated to the network in question. With Mastodon the waves have been culturally/politically motivated people running away from something (ex-Twitter users fleeing abuse, Japanese loli enthusiasts trying to avoid embarrassment, sex workers seeking an alternative to SESTA/FOSTA regulatory deplatforming). Whereas with email people were running _to_ something - the first free, global communications network of its kind.

It's also interesting to note that email nowadays has lost these advantages (running a server is much tougher now and you can't really pull it off without blessings from Google and others). Sure enough email is now on the decline.

I think email evolved when people had more digital-identity ties to their ISPs though. I remember having an email address tied to your university or your dial-up provider or whatever being pretty common in the 1990s, and pre-eternal-September the same protocols already existed among a more closed crowd (which is kind of like Facebook, now that I think of it).

Then “free email” providers on the early consumer Internet were able to compete on things like storage, because that was something it was still acceptable to expect people to pay attention to. One of GMail's original big draws was “lots and lots of storage”.

Email being in decline for social purposes feels partly related to changing feedback expectations and UI inertia, but I'd guess also some relation to a “mental association with uncool things” including both spam and stressful/boring transactional email; it's become more of a business thing. The spam side of that is related to the gradual federation lockdown, and something similar could happen to Mastodon, but at least there's some discussion about it happening in advance now.

I'm not sure what this all adds up to.

> running a server is much tougher now and you can't really pull it off without blessings from Google and others

Only true if you care about Gmail (a centralized service) getting your e-mail and delivering it to its users. You could just not care about Gmail and still interoperate with lots of mail servers - even lots of servers that do have Google's blessing.

> Meh, don't need another box, just keep the family desktop on.

I literally do not know anyone who owns a desktop computer any more.

Desktops are used by almost anyone doing gaming or serious video editing. Obviously not everyone but still common for a lot of people.

And yet, we continue to exist.

The point isn’t non-existence -it’s rarity. You cannot ‘just’ do something as a general solution when it only applies to one in a hundred.

A desktop that is already on most of the time, or an upgraded router, would work perfectly.

you've clearly not used any of the modern roaming-enabled encrypted overlay networks that people have been working on. one example is hyperboria. i connect to a handful of public relays (and one that i host myself) and no matter where i am my device is reachable at the same ip address. the address is an ipv6-encoded hash of the public key that is used for routing. its actually really neat, and you don't need to wait more than a few seconds for the routing tables to update if you change locations.
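For reference, cjdns (the routing protocol behind Hyperboria) derives the address as the first 16 bytes of a double SHA-512 of the node's public key, and only keys whose hash lands in fc00::/8 are valid, so keygen grinds until one fits. A rough sketch of the derivation (the key-grinding stand-in below is illustrative, not real curve25519 keygen):

```python
import hashlib
import ipaddress

def cjdns_address(pubkey: bytes):
    """Derive the cjdns-style IPv6 address for a public key:
    first 16 bytes of sha512(sha512(pubkey)). Only addresses in
    fc00::/8 are valid; callers must regenerate keys otherwise."""
    digest = hashlib.sha512(hashlib.sha512(pubkey).digest()).digest()
    if digest[0] != 0xFC:
        return None
    return ipaddress.IPv6Address(digest[:16])

# Grind toy "keys" until one hashes into fc00::/8 (~1-in-256 per try).
# Real nodes grind fresh curve25519 keypairs; sha256 chaining is just
# a deterministic stand-in for this sketch.
key = bytes(32)
while cjdns_address(key) is None:
    key = hashlib.sha256(key).digest()
```

The upshot is that the address is self-certifying: whoever can prove possession of the key owns the address, which is what lets the routing survive a device moving between networks.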



I haven't even heard of these things, and your description sounds like so much effort I'm not even going to be looking this tech up. If it's more than "install using an installer, not a terminal, and then it just works(tm)", it's not the kind of thing that's going to get people who weren't already on board, on board.

well, i didn't say this was a consumer-ready network. i'm just pointing out that the technical issue behind "how do i route to my aunt's network if she changes her location" is a solved problem. nobody is saying this stuff is ready for mass consumption. we are building and using these things so that they can eventually be easy enough to use that we can replace current-internet with future-internet.

Well the page actually mentions 'Just works' and 'Low barriers', but it goes off a few paragraphs later:

> This year the event will take place from 3rd to 9th August 2015 in Maribor, Slovenia

2015 ??

> Your ISP will still know what you are doing, and will have the ability to block you from doing it should the need arise.

Putting an automatically configured VPN on such a box would be extremely easy, no?

First someone has to make a server that is perfectly secure and lasts for 50 years without any updates. It needs to be like a thermostat from the 1950s - maybe it lacks the latest features but it still works just the same as new. I'll accept maybe it only needs to last 10 years, but that is the minimum time.

I have family to think about: I don't have time to update my servers every time some new zero-day is fixed. I'd rather pay someone else to deal with those details. My five-year-old is growing up; time with him is far more important than fixing security holes.

> First someone has to make a server that is perfectly secure and lasts for 50 years without any updates.

Nah, your auto-updating desktop is fine. The software itself might evolve to a hands-off, evergreen-type approach, but for now just a desktop daemon is fine. The reason keeping servers updated seems so non-trivial is that we visualize it in an ops sense, like we're at work.

> Nah, your auto-updating desktop is fine.

As confirmed by the lack of complaints about auto-update in Windows 10. I mean, it's not like updates ever break anything anyway.

I'd argue that it's the practice and not the principle of auto-updates which is unsound. Automatic security patches make a lot of sense in the current security climate, but once that infrastructure exists it is irresistible to use it for random feature creep and marketing.

> I'd argue that it's the practice and not the principle of auto-updates which is unsound.

I disagree. I've yet to encounter a system where automatic updates didn't sometimes break things. It's a nice theory, but in practice things break when you change things they depend on even when you're trying not to, and a lot of developers don't try that hard. Not to mention, sometimes you're dealing with design flaws that cause API changes, or programs relying on behavior that wasn't officially part of the stable interface. Auto-updating causes more problems than it solves.

I don’t think it would be that hard to build an update system where not everyone gets updates at once, and you wait for consensus among early adopters before forcing it on the remaining users.
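That's essentially a staged rollout: hash each device ID into a stable bucket, ship only to buckets below the current rollout percentage, and only widen the wave while the early cohort stays healthy. A minimal sketch (the bucket count, doubling schedule, and 2% failure threshold are all made-up knobs):

```python
import hashlib

def rollout_bucket(device_id: str) -> int:
    """Stable 0-99 bucket, so a device's cohort never changes
    between checks for the same update."""
    h = hashlib.sha256(device_id.encode()).digest()
    return int.from_bytes(h[:4], "big") % 100

def should_update(device_id: str, rollout_pct: int) -> bool:
    """Offer the update only to the first rollout_pct buckets."""
    return rollout_bucket(device_id) < rollout_pct

def next_rollout_pct(current_pct, failures, installs,
                     max_failure_rate=0.02):
    """Advance the wave only while the early adopters are healthy;
    halt the rollout entirely if the failure rate spikes."""
    if installs and failures / installs > max_failure_rate:
        return 0  # pull the update
    return min(100, current_pct * 2 if current_pct else 1)
```

The "consensus among early adopters" is then just the failure-rate check before each doubling, and devices that need reliability can be pinned to high buckets so they always update last.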

No amount of early adopters will ever have full coverage of what people actually have, in part because the people who need reliability are not likely to be early adopters in the first place.

You still need a company or foundation stable and competent enough to maintain it and not ruin it for 50 years...

I wish home appliances all used the same form factor for their smart-controller boards. Like a socket for a Raspberry Pi (instead of a hat).

Had to replace my dishwasher because the controls shorted out.

A motor or the display could just as easily burn out, but this article is about software problems, which you could fix with a workalike board.

Worldwide, most people access the internet via mobile devices. They don't own or use desktops. Beyond that, many don't have internet to the home. They just use their phones.

One problem is still asymmetric upload/download speeds for many (most) home connections.

Often upload is 10x slower than download.

And users expect their internet services to be fast. If every request involves leaving the backbone to go all the way into your home and then back out again, good luck with that.

I think the better solution is to leverage cheap vms. At least performance stands a chance. It's just a matter of making cloud computers accessible and usable to non technical people.

> I think the better solution is to leverage cheap vms.

So...centralize many/most popular internet services on infrastructure that provides cheap, reliable VMs. At the risk of overdrafting my snark budget: that sounds familiar.

Data caps on upload as well, not to mention many ISPs have terms that prevent you from running services on your account. If you want that from them you need a business account, which typically costs quite a bit more. At least that's how Comcast and Verizon are in my neighborhood...

Perhaps the definition of net neutrality could be extended to cover this problem. I mean, if we want an internet where everybody is treated equally, then symmetric upload/download speeds are a basic requirement.

The tricky thing is that ISPs have traditionally offered symmetrical service, but only as "business class". So they would protest that they already sell that, customers just need to pay (double, triple) to get it. That said, my (just residential) fiber connection consistently tests slightly faster upstream than down.

Try Sandstorm.

Home servers are a very difficult sell (see $500 Helm) compared to VMs running in a data center and IMO the privacy difference is mostly illusory.

/me goes to sandstorm.io, clicks install, skips past the paid version on to self-host, sees it only works on Linux, closes tab

^ That is the expected reaction from normal desktop users. I mean, literally: download an exe and up pops your feed, ready to add your friends, favorite businesses, news sites, link aggregators, etc. given their onion ID (yes, onion IDs are annoyingly large, especially v3, but discovery/identity comes later; don't let it hold up the system).

I'm not convinced you need a "home server" in the traditional sense. Just accept what you lose (uptime) if you use your laptop or phone to do the hosting. You can share between them too, given a synced private key, which is the software's job, not the user's. Still, an ephemeral self-hosted-on-desktop social network can go a long way (and again, people will let the need for uptime drive their always-on desktop decision). This stuff requires so few resources to start that a cheap Raspi with an easy install-and-reach-from-other-device setup would work just fine if they don't have a home computer and want one just for this. Large storage can come later.

I do agree the privacy difference is minimal.

I think the ideal use case for Sandstorm is either the service model or best-case the "your family computer guy runs it for a ton of people you know". If I stand up a Sandstorm server at home this year (which is likely), I'll probably allow any friends or family to use it.

The "normal desktop user" should probably not be running their own self-hosting setup, because they will fail at backups and reliability and performance.

Or, one proposes a system with redundancy built into it, so that a machine going down or getting soda poured on it doesn’t take out the whole network.

Truth. This stuff needs to be EASY. If I'm capable of setting it up but too lazy, my mom stands zero chance.

The apps on Sandstorm are super out of date. When I checked a few months ago, the GitLab version was from 2016.

Is sandstorm still around? I was a big fan of the idea and mission, but then I thought kentonv kind of wound it down when he joined Cloudflare.

Yeah unfortunately it's basically on life support now. For the first year after the company folded, I was still hacking on it frequently on weekends, but these days I mostly only push the monthly release to update dependencies. It's just that Cloudflare Workers is, well, successful, and it's... more fulfilling... to work on a successful project. :/

Could you elaborate on why it's illusory?

Data centers and (paid) cloud vendors don't look at your data and the police can get into your home as easily as your VM.

Nope, they can't. There are reasonably-strong legal protections against police entering your home, because it's the main place where you actually have a legally-acknowledged expectation of privacy.

Aren't there similarly-strong legal protections against police entering a data center and accessing servers?

Not really.

There are protections in both cases. But they are different.

For instance, in the USA, there is no requirement that you have a mechanism to provide information stored at home to law enforcement, even under warrant/court order. If you have one, they can compel you to use it, but if you don't, they have no recourse.

But a cloud provider is legally required to have that mechanism, and if they don't exercise it when ordered, they are punished. See: https://en.wikipedia.org/wiki/Stored_Communications_Act and https://en.wikipedia.org/wiki/CLOUD_Act.

Most people have no real reason to be concerned about police looking at their data, even if it does somehow happen (not in America, anyway). So as a need, it's a vanishingly small market, limited to people who actually have something illegal to hide: drug business, child porn, whatever. As a want, it's a different thing: people, convinced to be afraid of Big Brother, want to shield themselves. But that's still not a very large market when it comes to people paranoid enough to actually do something about it.

If I had data that I legit feared the government finding, I wouldn't be protecting it with a home server - I'd protect it by encrypting the crap out of it and storing it on a pretty generic cloud service, not in my home or anything easily traceable to me.

Nope, you're thinking of this backwards. The people who actually have something illegal to hide might actually try to do what you're saying here, make it untraceable somehow (at non-trivial risk and expense, of course!). But if what you're worried about is more casual, warrant-less invasion of privacy by your government (that isn't motivated by something as blatantly illegal as the examples you chose), "just store it at home" is a no-brainer.

Right, but if you had data that were just that confidential to you, that's the easy part. Just encrypt it and don't let anybody look. I don't really have that kind of data, but I do have a lot of stuff that's just private, in the sense that I don't have anything to hide, but I close the door when I'm in the toilet. So, really private data, encrypt it, store copies in many safe places, you're done.

That's not really what Solid is about. Solid is not just about me, it is about us. The stuff we do together. The sharing we do, but we share not just with anybody, but with someone we trust. It may be something really trivial: I share my grocery shopping list with my wife. It is not sensitive by any means, but it is also nobody else's business. Those are data used on my terms. Nobody should be peeking into my life to map me, as I go along with my daily business, but my daily business consists of interacting with a lot of people, and I do share and I want to share, but I want personal data control.

Now, personal data control is really the key to permissionless innovation. We're not just doing it to protect against snooping: once people control their own data, you get a level playing field where there can be competition for the best user experiences.

What if everyone had personal always-on cloud storage, like Dropbox or even at their ISP, where data is encrypted and access is managed via crypto keys exchanged directly between clients, so that plaintext never leaves the client? This gets us around devices being offline and NATs. Clients communicate via these always-on cloud stores, using them as a store-and-forward network, along the lines of http://cweb-services.com/intro.html

There are challenges like establishing initial connections, or push notifications, but these can hopefully be worked out.
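The store-and-forward idea above can be sketched in a few lines. This is only an illustrative toy, not the cweb-services design: the keystream construction (SHA-256 in counter mode) is a stand-in for a real AEAD cipher like AES-GCM, and a plain dict plays the role of the always-on cloud store, which only ever sees opaque blobs.

```python
# Toy sketch of a store-and-forward relay where plaintext never
# leaves the client. Not production crypto: the hash-based stream
# cipher below is a placeholder for a real cipher such as AES-GCM.
import hashlib
import secrets

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # SHA-256 in counter mode, truncated to the requested length.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    stream = keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))

def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    stream = keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))

# The always-on store: a dumb mailbox keyed by recipient identifier.
cloud_store: dict[str, list[bytes]] = {}

def send(recipient: str, shared_key: bytes, message: bytes) -> None:
    # Only the ciphertext ever reaches the store.
    cloud_store.setdefault(recipient, []).append(encrypt(shared_key, message))

def fetch(recipient: str, shared_key: bytes) -> list[bytes]:
    # Drain the mailbox and decrypt locally.
    return [decrypt(shared_key, blob) for blob in cloud_store.pop(recipient, [])]
```

The point of the shape is that `cloud_store` could be any untrusted always-on host: it forwards blobs, and the key exchange between clients happens out of band.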

>I think the part ignored by so many is the need to decentralize the computers into the home. I'm not talking meshes or shared resources

This same thought has popped up in my mind as well before... I think it's a probable future only if these devices are discrete plug-in and forget machines and generate revenue for the owner.

Actually, we've been chatting about this, and I think it would be great to start selling Solid servers on an OLinuXino board with a nice box around it. Not because it would make us a lot of money, but to demonstrate that you, or at least a bunch of nerds, can easily take complete control of everything, from the Open Hardware to the MIT-licensed server.

I'm running Solid on my own box, and I can't see myself doing it any other way, but it was pretty hard to set it up. We need to change that.

> People that aren't already will gravitate towards at least one always-on machine in their house.

I think you radically overestimate the desire and ability of the average computer-user to consider their devices' uptime.

There is a possibly-deprecated server software that I use daily, called Pichat, and so far it seems to have been deleted from the internet. It is a chat server/webserver/fileserver, and it comes in Linux and Windows versions; it is on horde, as well as the SDK for it. Tell me if you would like to know more. This is not my software; if you [whoever that may be] are the person or part of the team that created Pichat, please chime in.

Look at this person's "about" in their profile.


This is handy as well.


The W3C has a proven track record of producing overengineered shit when it comes to "the semantic web".

Just look at ActivityPub. It's essentially OStatus, but instead of XML we slapped namespaces on JSON, wrote a bunch of overly complex preprocessing procedures so that everyone can output things just the way they want[1], and still made half the spec ambiguous enough[2] that implementers essentially follow the one rule that matters: maintain compatibility with Mastodon.

[1]: https://www.w3.org/TR/json-ld-api/#algorithm-5

[2]: https://please-just-end.me/ap.html#block-activity-outbox (domain name relevant to content)

The Semantic Web does not require XML these days. JSON-LD (for JSON) and GRDDL/Microdata (for HTML) are widely-acknowledged standards. For simple text use, akin to a Markdown-formatted document, you can use Turtle. If you believe that JSON-LD is genuinely ambiguous, take it to the authors of that spec and contribute to getting it fixed.

Do you think a format that requires this amount of code to predictably parse is a good format?


"In order to regain freedom and control over the digital aspects of our lives"

Nothing proves his point more than:

  <script src="//www.google-analytics.com/analytics.js...

This feels like the same argument:

"Al Gore claims he's an environmentalist, yet he flies using his personal gas guzzling plane."

And yeah, it's an ad hominem: "a fallacious argumentative strategy whereby genuine discussion of the topic at hand is avoided by instead attacking the character, motive, or other attribute of the person making the argument."

You can most definitely use centralized servers to disseminate decentralization. I'm pretty sure https://ipfs.io started as a github project and on a monolithic server backed by cloudflare. That doesn't dismiss them in the least - considering they now heavily dogfood.

More specifically a Tu quoque / appeal to hypocrisy https://en.wikipedia.org/wiki/Tu_quoque :-)

Thanks for sharing this! I wasn't aware of this term and it's very relevant to another topic I often discuss. It'll be very useful to me. :}

How does running analytics on my site impair my ability to talk about what Tim Berners-Lee wants to do for freedom and control? ;-)

Yes, I track how popular what content is on my site. Motivates me to write more. Please feel free to block trackers; I do that as well.

Have you considered Piwik or a similar self hosted analytics solution?

I don't care if people run Analytics, its just when they go sharing all that data with a third party that it gets troubling.

Yes, I urgently need to migrate; same with Disqus, needs to become Solid.

PS. It is called Matomo now and at https://matomo.org

It undermines your argument, is what it does.

You can talk about it all you want, but as long as you continue to participate in the very thing you rail against, you're going to struggle to be taken seriously.

People who participate in something are often best positioned to criticise it, because they:

A) know how the sausage is made, and

B) stand to actually lose something if people believe them, i.e. they are standing by their point in spite of the negative consequences to themselves.

People on the dole who argue against (details / implementation of) social security should be taken seriously. Rich people arguing against tax breaks for the wealthy. Programmers against big tech firms. Etc.

So only murderers should judge other murderers?


technically the whole criminal judicial system is a licensed murderer

That's not at all how argumentation works :-)


'Actions speak louder than words.' ;-)

That's exactly how argumentation works...

You’re confusing argumentation with perception.

How is that not freedom? Each site owner has the freedom to use google analytics or not, no one forces them to. And each user has the ability to block requests to google analytics if they want, or avoid sites that use it.

How am I supposed to know that any particular domain is used for analytics, versus core functionality? How am I supposed to know what scripts a webpage loads, before viewing it?

> "But look, you found the notice, didn't you?" "Yes," said Arthur, "yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying 'Beware of the Leopard.'"

I am curious how this could be solved. I mean, I guess you could have your browser ask before every request generated for a page, but that seems very cumbersome.
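For the static part of the problem, a browser could at least enumerate the third-party script hosts in a page before fetching anything, and prompt once per host. A toy sketch with the standard library (the page markup and the `example.org` host are invented for the example, and dynamically injected scripts would evade it):

```python
# Rough sketch: list the third-party script hosts a page would load,
# so a browser could prompt before fetching them. Only catches static
# <script src=...> tags, not scripts injected at runtime.
from html.parser import HTMLParser
from urllib.parse import urlparse

class ScriptFinder(HTMLParser):
    def __init__(self, page_host: str):
        super().__init__()
        self.page_host = page_host
        self.third_party: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if not src:
            return
        # Scheme-relative URLs like //example.com/x.js need a scheme to parse.
        host = urlparse("https:" + src if src.startswith("//") else src).netloc
        if host and host != self.page_host:
            self.third_party.add(host)

page = ('<script src="//www.google-analytics.com/analytics.js"></script>'
        '<script src="/local/app.js"></script>')
finder = ScriptFinder("example.org")
finder.feed(page)
print(finder.third_party)  # {'www.google-analytics.com'}
```

Local scripts (`/local/app.js`) parse with an empty host and are skipped, so only genuinely cross-origin hosts would trigger a prompt.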

Apps do it on first install; why shouldn't browsers do it on a website's first launch?

> why shouldn't browsers do it on a website's first launch?

Because users would riot, and would use browsers or plugins that suppressed these notifications.

I would love it, personally! And websites would riot, and then reduce the number of permissions they requested, just like apps do/did.

It would be just like the cookie notices: now that data misuse is announced with a banner, people rage against and hate those banners, and websites haven't really removed third-party/non-essential cookies.

And Disqus comments, which are also pretty surveillance spammy.

It's like Lenin said: “The Capitalists will sell us the rope with which we will hang them.”

I skimmed this and see two big problems.

First, this idealistic idea that "we" are going to take back our data. Who is this we? Only the smart, high-agency people who have time to spare. The commercial web is increasingly tuned to the normal user, who is low-agency and easily led around. Who will win a battle of user acquisition and retention? Facebook or the rebels? Facebook of course. So any solutions proposed here are just for a tiny percentage of users who will then be isolated from the real and useful social networks. Or more realistically use both.

Or maybe if the infrastructure is built, a layer of savvy entrepreneurs can emerge to monetize it? I'm thinking of reaganemail, selling an anti-google email account to the AM radio crowd.

Second, the idea of somehow eliminating censorship. De facto censorship will always exist, even if you sugar coat it as Twitter has tried - "your content is still there, but only if someone explicitly looks for it". Any platform without censorship will just be flooded by every marketer and political zealot, for starters.

Also, I think he is conflating filter bubbles with centralization. Without centralization, wouldn't we still have filter bubbles as people self-select into their online communities?

Well, when I got on the internet in 1988, we were all "smart, high-agency people who have time to spare".

Supposing we manage to solve this problem, what's to say average people can't participate in 10 years or so, when the tech has been made easier to use?

The early web was quite decentralized already: many separate bulletin boards, and later forums. Many people writing there had an idea of how to create their own.

It didn't start centralized. Centralization happened. I might be more cynical than I should be but as a designer I struggle to see the future in which we have social dynamics that favor decentralization instead of convergence into a less self-managed system (i.e. all current centralized networks).

No it wasn't. Vast reams of content were hosted exclusively on GeoCities. In fact almost all "home pages" were on GeoCities or AOL back then. There has never really been a time in the history of the internet when a small number of providers or companies didn't have outsized dominance - DARPA early on, then Netscape, AOL and GeoCities, then Microsoft, Blogger, WordPress.

This sort of discussion looks often like rose tinted spectacles. The past wasn't so different to today.

> Without centralization, wouldn't we still have filter bubbles as people self-select into their online communities?

Perhaps, but those would be self-selected, not imposed by the provider. Big difference.

I've read a bit about Solid in the past, but never quite understood how it will handle different data models. Does it force social data to all look the same (as in, have a predefined set of fields)? If not, how do apps built on it interoperate?

Don't get me wrong, I'm all for projects like this. I think it's wonderful. I just never really got how the apps will work with the same data without being forced into a particular data model (which seems like it would limit what you could do).

Great question, because it is basically the most ignored problem in the Semantic Web community and thus the one that we are spending quite a lot of time on.

So, basically, there is one data model, RDF, but RDF does not require the same set of fields, to the contrary you are free to write your own. Obviously, you wouldn't get good interoperability if you do. So, there are several things you can do:

1) Adopt what others are using.
2) Map your "fields" (we prefer to call them vocabularies) to the stuff others are doing, and rely on apps to figure out interop using reasoners.
3) Don't care; your app will work fine for you.

I mean, 3) is fine, it is just that you'd be missing out. 2) also works, kinda, but reasoners aren't all that easy to use, so I'd mostly like to see people go for 1).

So, we need to make it really easy to find existing stuff. You could go for the big one, i.e. https://schema.org/ or you could go more in detail and look at https://lov.linkeddata.es/dataset/lov/ . The former has a lot of traction, the latter is really decentralized, so I kinda prefer that.

Then, we have to make it really easy to author new stuff when you can't find existing stuff, because that will happen. Then, we need to make it easy for others to find yours, so that they can start using it too for similar applications. And I'm thinking that it will be kind of a graduation process, where you first look for existing stuff, and when failing to find anything, you just mint your own without thinking about others, just to get something that works up and running. Once your app starts gaining traction, you tighten it up, and if something else then gets popular, you can migrate to that with little disruption.

So, we're not there yet, but we're thinking and working on it a lot.

Thanks, this is a really great and thoughtful reply, and it's good to know serious work is being done here. I can't wait to see how the project unfolds!

You can mix and match parts from different vocabularies on the Web. See it as a JSON object where you can add any key you want, except that some of those keys have meaning to some applications. Apps don't have to understand everything, they just access what they need.
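To make that concrete, here is a hedged sketch of what such a mixed-vocabulary document might look like. The `myapp` vocabulary and all URLs are invented for the example, and a real JSON-LD processor would expand terms through the `@context` rather than string-matching prefixed keys, but the reading pattern is the same: each app picks out the keys it understands.

```python
# Illustrative JSON-LD-style document mixing schema.org terms with a
# made-up app-specific vocabulary. Apps ignore keys they don't know.
import json

doc = json.loads("""
{
  "@context": {
    "schema": "https://schema.org/",
    "myapp": "https://example.org/myapp#"
  },
  "@id": "https://alice.example/profile#me",
  "schema:name": "Alice",
  "schema:knows": {"@id": "https://bob.example/profile#me"},
  "myapp:favoriteColor": "teal"
}
""")

# A contacts app only cares about the schema.org terms:
name = doc["schema:name"]
# A hypothetical theming app only cares about its own vocabulary:
color = doc.get("myapp:favoriteColor", "default")
print(name, color)  # Alice teal
```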

I think the answer to some degree is RDF and the semantic data standards. It’s hard to explain them quickly, but it solves a portion of this problem.

The web is decentralized, and we already have control of our data. The problem is, people keep giving it away.

I'm fairly confident that 98% of the population of the earth doesn't give a crap that their data is collected, or that they don't "control" it. This whole "decentralized web" thing is just privacy nerds trying to convince us that we need this, when really no regular consumer is asking for it.

Technologically, yes. But centralization takes many shapes (as I explain in the article: https://ruben.verborgh.org/articles/redecentralizing-the-web...).

I encourage you to read the article, where you'll see that I'm arguing from a permissionless innovation perspective, not so much privacy.

We can already do this - but it's a bad idea.

Plenty of services are API compatible with Amazon S3 (e.g. anyone can run their own S3 clone) so people can modify existing sites to use S3 with OAuth. Use OAuth to allow the user to delegate access to their S3 service link. No new protocols needed, no big innovations required.

But for this to work on anything other than the most rudimentary data (media files, blog posts, and serialized data) would require completely changing the way all modern applications are written. Databases would all have to change, APIs would all need to follow specific standards, and networks would need to become a hell of a lot more stable, higher bandwidth, and lower latency.

Assume you're Twitter, and you want to map-reduce all of the data of all your users to find out how many people retweeted a user, and then notify those users. Now you need to connect to every user's service provider, get their data, store it temporarily on your own servers, duplicate everything, do your processing, and then write changes back to all storage services for all users. Now do this every second. If you don't, you have to store this map-reduced data on your own service's storage, which violates the principle of only using the user's storage pod.

In fact, data would have to become more centralized to work in this model. Currently, application data exists across a range of services in a variety of networks, all of it being dynamically accessed in different ways before it is accessed by a user. There are dozens of different databases used just to open up the TV Guide on your cable company's set-top box. All of that would have to be centralized in one or two databases in order for the storage and processing to be disconnected.

Not only that, but a lot of data is useless to anyone but the original service provider or original application. Only a Facebook clone would be able to use Facebook's data, and only data relevant to Facebook's ad sales should stay on Facebook's servers, even if it contains "Peter clicked on ad X at Y time". Should there be a separation of what kind of data gets decentralized? Do we really want to go down the rabbit hole of what is my data, and what is data about me that a company has originated and created value from? (Is a picture mine because it's a picture of something I own, or is it mine if I took the picture?)

The idea that every component of every application could be completely decentralized from each other is unlikely. Now, what is more in the realm of possibility is doing a Google or Facebook, and creating features that allow exporting or importing all data. But that process is not perfect, and the procedure can take from minutes to days. And to use this data it would still all have to follow standards specific to a particular application.

And again, we already have a lot of these data standards. We have standards for most of the kinds of data that exist today, such as calendar, contacts, e-mail, instant message, voip, office documents, images, and so on. We have standards to synchronize and syndicate data feeds. We have standards to federate accounts and manage permissions. But commercial sites don't natively build these features as interoperable with each other - because, why would they?

Storage and processing of data are intimately connected with the specific applications that use them, and trying to decouple them will result in inefficiency and complication, with no clear advantages.

OK, so nobody said decentralization is easier. There's been plenty of academic papers saying pretty much the same as you do. But we have to, not for technical reasons, but for ethical and social ones. So, we're starting to tackle it head on.

Your TV Guide is a good example of things that aren't hard. They don't change very quickly, so you can just use a cache. That's easy.

Finding the number of RTs, that's also easy, apart from it being an open world of course. When they RT, they notify you. And you want to display those RTs with your tweet? Just cache those who notified you.

Stable data access standard? That is Solid itself. And the data model, that's RDF.

There are ways that you can go about doing this stuff.

Finally, we're also getting some traction around this in academia, they've been hung up in stuff that isn't helpful for too long.
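The notification-plus-cache pattern described above can be sketched as a toy. The class and method names here are mine, not Solid's, and a real pod would persist the cache and verify who sent each notification:

```python
# Toy sketch: each pod notifies the original author on retweet, and
# the author's pod caches the notifiers instead of scanning the
# whole (open-world) network.
from dataclasses import dataclass, field

@dataclass
class Pod:
    owner: str
    # tweet id -> set of users who notified us of a retweet
    rt_cache: dict[str, set[str]] = field(default_factory=dict)

    def receive_rt_notification(self, tweet_id: str, retweeter: str) -> None:
        self.rt_cache.setdefault(tweet_id, set()).add(retweeter)

    def rt_count(self, tweet_id: str) -> int:
        # Open-world caveat: this counts only those who notified us.
        return len(self.rt_cache.get(tweet_id, set()))

alice = Pod("alice")
alice.receive_rt_notification("tweet-1", "bob")
alice.receive_rt_notification("tweet-1", "carol")
alice.receive_rt_notification("tweet-1", "bob")  # duplicate, ignored
print(alice.rt_count("tweet-1"))  # 2
```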

Actually, the TV Guide example uses data that updates constantly. Every single interaction a user has is recorded and is used by other systems. The guide also changes based on user-specific views or preferences. Another example is Netflix's famous user-specific recommendations, which change constantly as the algorithm is regularly fine-tuned, and which are a strategic feature. Even just playing a single show requires a dozen different calls to authorize it, based on a number of considerations.

Finding the number of retweets is also more difficult, because there's other data that gets recorded too. Not only do you have your own data now, you now have the data of everyone else that retweeted you. Is it your data, or theirs? Who is caching it, and how long? How does refreshing the cache affect consistency of each user's views? With decentralized applications you have to choose what kind of functionality you will support.

But, yes, in theory, if you allowed only one service provider to use some given data, you could rely on caching (read: holding a copy of data indefinitely) to a good extent. But as soon as you have multiple using it, you enter the extremely hairy world of multi-master high-availability strong-consistency replication. AKA, absolute hell. But this isn't even the most difficult problem to me.

We already had some good data access standards. The question is, why weren't sites using them to allow data interoperability/mobility? Answer: they didn't want to. So even if you create a technical solution for all of this, the best you will get is the Facebooks of the world publishing a read-only calendar feed, clunky, slow export tools, and single-feature one-way application integrations. Like we have now.

I don't see an ethical or social reason to decouple the data from the services I use, and I don't think the majority of the world population does, either. The only ethical/social concern I have is with the very existence of the service, which is a different concern.

Technology won't help with this, regulation will.

We have parallels from other platforms - specifically the fixed and mobile phone networks.

There used to be monopolies in local phone service. There were new competitors, but to change provider, you had to change phone number.

Even changing cell phone provider required a number change.

This obviously had strong network effects pulling you to stay with your provider. You had to tell _everyone_ in your extended network where to find you and have them update all of their business records when you changed from one carrier to another.

Eventually, everyone figured out this was stupid, and Number Portability [1] was forced on carriers by regulation.

This problem is completely gone now. You can take your number with you.

If we allow people to take their data to new social networks, and force federation, then we will get decentralization. However, it won't happen without regulation anymore than it did with the phone companies.

[1] https://en.wikipedia.org/wiki/Local_number_portability#Histo...

If you have ever seen any attempt by any government to regulate software you would know this to be a Lovecraftian nightmare, and not a solution.

If you have ever seen any attempt by any volunteer-run FLOSS team to solve nationwide social problems with technology you would know this to be a Lovecraftian nightmare, and not a solution.

But you aren’t legally mandated to use it.

Also-- companies that aren't legally mandated to allow third parties to reverse engineer their medical devices would otherwise use every means within their power to stop you from doing that. We know this because companies outside of the medical industry use every means within their power to stop you from doing that.

And I can't see how a response of "let's get rid of the chess board so they can't play" would be an adult response to this problem.

The way this usually works, is there's an entrant into the market who decides it is cheaper to use the government to gain access to a network instead of building their own (Number Portability, Local Loop Unbundling).

The government then says "You have to allow competitors access to X, and you have to do it by date Y".

Then the companies get together and agree on how to do it because they agree that government dictated standards suck. There is usually some jostling around with someone wanting to run a centralized database for a nice per-transaction fee. Typically this is tossed out in committee, but not always.

Beaker Browser is a cool experiment showing how you can decentralize the web in a way that is both easy for end users and fun for developers, because it pushes web standards.


*disclaimer: I help develop Bunsen Browser, the mobile companion for Beaker Browser.

Could you explain to me, a newbie, what Beaker promotes and what its advantage is against browsers that use HTTP?

I'm not sure this is a coherent plan, since it doesn't talk about how privacy rules get enforced for services. Who vets the services? If you let a fun game have access to your "personal data pod" and it turns out to be Cambridge Analytica, copying everything it sees into its own database, how is that an improvement over Facebook apps?

Choosing between service providers is no more meaningful for privacy than asking Windows users to download arbitrary apps. If smart phones are any more secure than desktops, it's because Apple and Google are constantly improving OS-level security and policing their app stores for malware.

Of course app stores have well-known flaws. But if we want to do better than that, someone has to figure out a better way to choose good rules and enforce the rules better.

People don't care about decentralization or centralization. This is all a big generalization, but humans are lazy, and when it comes to making a moral choice, they're going to pick the path of least resistance and completely ignore the moral consequences. It looks like, at the moment, centralized services are what people want, and it's what they deserve.

Why is "decentralisation" even the moral choice to begin with. A lot of projects claim to be decentralised, but when you ask "ok, who has the power in this project" it turns out that a small cabal of developers has most of the same rights and powers a corporate, centrally hosted service would have. It takes very careful analysis to discover whether a thing is really meaningfully decentralised or is just claiming to be.

Shameless plug but I designed and wrote something for doing this from 2011 to 2015 because nothing like it existed or indeed exists as far as I'm aware.

It's a p2p caching proxy that also lets you edit web pages collaboratively in realtime over a LAN or the internet. It has a contacts list system and p2p chat functionality. This project effectively died due to lack of interest and I still have various security concerns about it (Should you break/reimplement Same-Origin policy or break/reimplement the TLS chain of trust?)

The main security concern is that because it decentralises HTTP in-place (existing URLs can now be looked up on an arbitrary number of overlay networks if the original URL isn't providing an OK response) it puts users at risk of malicious actors spamming overlay networks with browser exploits for popular resources like "news.ycombinator.com/".

I hope TBL and co converge on satisfying answers to these problems or constrain their design to not bother with decentralising existing URLs in-situ.

Code lives here: https://github.com/Psybernetics/Synchrony

Feel free to shoot me any questions.

Other people here have said this general idea: the large centralized services like Google and Facebook have succeeded in becoming so big through a lot of effort and a lot of cost, which is paid for by all the money they make. At a minimum they have to pay for their server use.

From what I understand, the proposal here seems not to allow for the advertising model. I don't think a service can grow and survive by making people pay, because people are too cheap.

There might be a better chance for something like this if it allows for the economics:
- Maybe the data host can provide an "advertising" profile which the user has control of. This can be exposed to the application hosts to allow for advertising.
- Maybe you also throw micropayments into the mix, along with bartering for information.

Another issue is complexity. A number of comments have talked about over-engineered solutions and protocols. This decentralized idea could be started with something small, like an open social network standard. I think I saw something similar to this on HN not too long ago:
- You have a web site, which is your profile. A provider could give you a nice editor for it.
- You have a feed, where you can put pictures, short posts, long posts, whatever. This is distributed with RSS. (The host makes this all seamless for you.)
- Identity is controlled with OAuth, used only to give an identity to visiting users. The owner can manage permissions for certain remote users (their "friends").

Such a service could be managed on your own web server, or there could be different cloud providers that make this arbitrarily easy, with arbitrary levels of functionality on the "profile" page, the "feed" and the "friend" permission management.
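The RSS piece of that proposal is already easy with nothing but the standard library. A minimal sketch (the titles and URLs are placeholders):

```python
# Quick sketch of the "your feed is just RSS" piece of the proposal,
# building a bare-bones RSS 2.0 document with the standard library.
import xml.etree.ElementTree as ET

def build_feed(title: str, link: str, posts: list) -> str:
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    for post in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "link").text = post["link"]
    return ET.tostring(rss, encoding="unicode")

feed = build_feed("Alice's feed", "https://alice.example/",
                  [{"title": "Hello", "link": "https://alice.example/1"}])
print(feed)
```

Any RSS reader can already consume output of this shape, which is the point of the comment: most of the pieces of a small open social standard exist and just need to be wired together.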

> From the above, it is clear that our primary obstacles are not technological [5]; hence Tim Berners-Lee’s call [6] to "assemble the brightest minds from business, technology, government, civil society, the arts, and academia to tackle the threats to the Web’s future". Yet at the same time, computer scientists and engineers need to deliver the technological burden of proof that decentralized personal data networks can scale globally and that they can provide people with an experience similar to that of centralized platforms.

This whole article looks like "well, the obstacles are not technological, but let me write a few pages about technology anyways".

If the obstacle are not technological, then we need non-technological solutions. So far I think GDPR is one such non-technological step towards taking back control of our personal data.

The hardest problem in my opinion is "preventing the spread of misinformation" because we essentially need a way to distinguish between malice and stupidity. Without mind-reading I do not see how this could be possible at scale.

Yes, I was asked specifically to write a chapter on Tim's work. What we have is the technology to make it work—it's a necessary, but not a sufficient condition.

I can't escape the feeling that SOLID will be at best neutral but likely will make things worse. Some of the diagrams show more companies having access to your data where it will continue to be mined, sold, etc. If you control storage but not execution it seems like you control nothing.

You control access permissions too :-)

An application with access to bits in your pod can always copy that data to its own storage right? e.g., a Facebook built on Solid can still grab all the data it needs (with the requisite perms) and store it away, build a profile per WebID, continue serving "cached" copies of changed content, etc. What are your thoughts on this?

At that point, it becomes a legal matter. Not everything can be solved with technology. (There's homomorphic encryption, but let's leave that aside for now.) The GDPR legislation in Europe sets a good precedent for demanding removal of our data.

The crucial point is that Solid will bring more choice: there will be social feed viewers that will be more invasive, and those that will be less invasive. People can choose the one they like, without consequences as to whom they can interact with. Today, we do not have a choice: if we want to interact with people who use Facebook, we have to use Facebook as well.

The problem is people. To put it charitably, not everyone is "technical" enough to figure out how to own their own data, so I think silos and walled gardens are here to stay, because they are quick and easy for people. I for one, fully support keeping my own data in (as much as possible) future-proof formats, and although I've had a blog in some form for years, I want to move away from standard social media as much as possible.

The web cannot be decentralized without putting an end to SSL. As long as certificate-issuers are the arbiters of commerce and browsers push users to trust unsecured websites more and more, malicious governments will be able to silence people by revoking their certs.

There are stronger alternatives. We need to make a push to begin using them.

I'm not sure what stronger alternatives you're referring to. Can you elaborate or share any resources?

If, instead of identifying services by some human readable name, they are identified by their public keys, then we don't need certificates - there are several encrypted and authenticated transport protocols which only require knowledge of the destination's public key upfront.

You then need an alternative name system which links a unique human readable name to a public key. This is the trickier part (see Zooko's triangle), but there are some creative solutions like Namecoin and the Blockstack Name Service.

> links a unique human readable name to a public key

Easy: use DNS, store the PGP key ID in a TXT record, and then look up the public key for that ID using a PGP key server.
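
As a sketch of how that lookup could work: assuming a hypothetical TXT record format like `openpgp-fpr=<fingerprint>` (the record name and format here are my invention, not a standard; a real client would query DNS, ideally with DNSSEC, and then fetch the full key from a key server):

```python
import re

# Stand-in for a real DNS TXT query; a real client would use a resolver
# library and validate the answer with DNSSEC.
def fetch_txt_records(domain):
    fake_zone = {
        "example.org": [
            "v=spf1 -all",
            "openpgp-fpr=0123456789ABCDEF0123456789ABCDEF01234567",
        ],
    }
    return fake_zone.get(domain, [])

def pgp_fingerprint_for(domain):
    """Return the first well-formed fingerprint advertised in TXT records."""
    for record in fetch_txt_records(domain):
        match = re.fullmatch(r"openpgp-fpr=([0-9A-Fa-f]{40})", record)
        if match:
            return match.group(1).upper()
    return None  # caller falls back to some other discovery method

print(pgp_fingerprint_for("example.org"))
```

Of course, this only pushes trust onto DNS itself, which is the objection the grandparent is making about centralized arbiters.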

Your argument might sound stronger if you didn't use the name of a dead protocol. All use of SSL has been prohibited by RFC (RFC 7568) since 2015.

I'm pretty sure there aren't better alternatives.

People tend to use "SSL" to mean "CA-based TLS" these days, as in "SSL certificate".

I went to the Decentralized Web Conference a few years ago and really liked it. In spirit, I am on board.

In practice, I am satisfied with just using my own domain for email, my web site, and self-hosted blog. For communication I like FaceTime so I can see people while I am talking with them, phone, and email.

I still use social media, very occasionally, to see what people are doing and sometimes advertise my new open source projects and updates, and any books I write. Most of the problems people talk about with Facebook/Twitter don’t bother me as long as I only use the systems infrequently. I am not tempted to cancel my accounts.

The design, typography, and diagrams in this article are wonderful. I like it when people pay this much attention to detail!

This essay is in radical need of a TL;DR. If something is this important, you owe it to the subject matter not to bury the lede under a mountain of history and flowery exposition.

Ask yourself: who is this for? People who are not already deeply passionate will stop reading unless they are engaged in a minute of reading. Note that a minute is being extremely generous; on a commercial consumer site, it's apparently an average of 7 seconds before someone will click away.

I recommend that you check out this video and reconsider how you might reframe your message as a call to action that speaks to a better future we can create together.


I even jumped you to the good part.

I definitely scrolled around, looking for some kind of summary. I wanted to figure out if I was gonna be wasting my time reading another recap of what I lived through or if there was gonna be a proposal for some way to get back to decentralization that I could evaluate and keep in mind when designing my own apps. Couldn't figure it out from scrolling so I bailed.

I’m all for the principles here, but one worry I have is the loss of efficiencies afforded by economies of scale which could dramatically increase the carbon footprint compared to the centralized versions.

This is why carbon needs to be priced so you could have facts about the magnitude of the problem (spoiler: probably pretty small) instead of trying to make qualitative tradeoffs (is it worth destroying society to save the environment?)

A decentralized web can still be downloaded and backed up by one entity. Then, you can go to that centralized entity to enjoy all the content.

If we still don't have decentralization, it's because it is not as easy.

In 2005, I worked at a startup that attempted to solve the problem of privacy and security for personal information (photos, home videos, music, personal health/finance documents, contacts etc) while also providing ability to share and collaborate.

The solution involved running a mesh network with nodes on each user's laptop or desktop and a corresponding node in the cloud. These nodes would index local data, replicate metadata across nodes, and back up the actual data to the cloud node.

A locally running web app acted as replacement for 'windows explorer'. It allowed the user to access all their files and folders across all their nodes, access them (open document, play music/videos, see contacts etc), create smart collections and share these files, folders or collections with other users in a secure authenticated and private manner.

Each user got an identity, which comprised a dedicated domain (or subdomain) and a PKI certificate tied to that domain. Each node had its own private key, and their public keys were tied together by the identity certificate.

All communication between nodes (of the same user or across users) was authenticated and encrypted using these identity/node keys and certificates. No central node existed in the system that could spy on these activities. The architecture separated the network discovery cloud nodes from your data cloud nodes, and allowed your data cloud nodes to be hosted anywhere (say, in your own cloud instances).

This is the only system I have seen that utilized zero knowledge protocols and made it accessible to common people to manage their data and share with others as well.

But unfortunately, as a business it never took off. It was acquired by EMC and merged with Mozy (the good old data backup company), and then the product died a silent death in 2010.

Maybe it was timing; maybe if this product had launched after Snowden, it would have done well.

But now, I think a more urgent and a relatively less complex problem to solve is one of distributed communication. In this era of always connected powerful devices (mobile phones, home gateways), why don't we all have our own personal email/chat servers that nobody else can spy on? Why does email and chat have to get relayed via big aggregators who mine so much data as well as metadata?

Not only do they violate privacy, they succumb to security breaches and cause serious damages.

I feel the stage is set for this disruption: crypto protocols, always-on cheap connectivity, compute power at the edge, and sensitivity to privacy/security in the general population – all of these ingredients are in place right now for this to happen.

Fascinating to read about your experience with the startup, and how the service was architected.

> unfortunately, as a business it never took off

Sounds like the timing was too early in 2005. I believe these days we're all so tired of the privacy and security situation, that the world is ready for something like this.

> [it] utilized zero knowledge protocols and made it accessible to common people to manage their data and share with others

This describes exactly what the re-decentralized web needs.

Seeing the many attempts over recent years, it looks like there are significant technical, financial/business and social challenges - but I totally agree with your conclusion, that "the stage is set for this disruption". It also feels like the tide is rising, that the solution is being worked on from numerous fronts and eventually a more evolved system will be adopted by the public.

Sounds like https://keybase.io/

The point about the decentralised web allowing the permissionless creation of centralising systems reminds me of the paradox of tolerance, where tolerant societies are thought to be taken over by intolerance if they tolerate that intolerance.

Maybe this is a lesson that we need to be less tolerant towards the creation of centralised services because those with money and power will seek to bring decentralised systems under their own control.

For the technically savvy, you can run a virtualized desktop:

  - GPU passthrough VM (gaming)
  - SATA passthrough (FreeNAS)
  - multi NIC passthrough (pfSense/OpenWRT)
  - app server/cloud/P2P Linux or FreeBSD VM(s)
http://unraid.net sells a KVM-based product. VMware ESXi and XenServer are free. Connect a Ubiquiti AC-Lite WiFi access point to a dedicated NIC on the x86 box, WAN to another NIC. Since pfSense owns the WAN NIC, it can host a VPN server for your devices, including mobile. All VMs get virtual NICs. Dell T30 with quad-core Xeon and ECC costs about $400 with 8GB RAM and 1TB disk, it can hold 4 x 3.5" drives (20 TB in RAID-1) and 2 x 2.5" SSD.

Level1Techs has intro videos on home servers: https://www.youtube.com/results?search_query=level1+home+ser...


  - Stable and boring x86 platform
  - Good performance for gaming
  - Commercially supported hardware
  - Upgradeable storage and GPU
  - Upgradeable router software

Great article Ruben! I've been following Solid's progress for a while, and I think your article very eloquently summarizes its purpose and relevance. I'm especially interested in the ability to circumvent the middle-men, and resolve the marketplace chicken-and-egg problem once and for all.

Watching your TED talk in 2013 was one of the most influential moments of my life, and discovering the semantic web was perhaps my greatest epiphany. While the vision never left my mind, I never acted on it. Until now.

I'm dedicating 2019 to linked data. I'm going all-in.

Last week, I started to build a tool to convert unstructured input to linked data. Even after recognizing canonical literals (email, phone, url, color, gender, boolean, integer, float, date, time span, money, weight, distance, language, image, geo coordinates), I couldn't accurately infer predicates and guess classes. Before trying more complicated stuff like bayesian inference, I decided to try a simpler exercise.
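
For what it's worth, the literal-recognition step you describe can be sketched as a small classifier over regex patterns; the patterns and category names below are illustrative only (a production recognizer would need far more care for international phone formats, locale-dependent dates, and so on):

```python
import re

# Each entry maps a literal category to a pattern anchored at both ends.
# Order matters: more specific patterns come before more general ones.
PATTERNS = [
    ("url",     re.compile(r"https?://\S+$")),
    ("email",   re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+$")),
    ("date",    re.compile(r"\d{4}-\d{2}-\d{2}$")),
    ("decimal", re.compile(r"[+-]?\d+\.\d+$")),
    ("integer", re.compile(r"[+-]?\d+$")),
]

def classify_literal(token):
    """Guess a canonical literal category for a raw token; default to string."""
    for category, pattern in PATTERNS:
        if pattern.match(token):
            return category
    return "string"
```

As you found, the hard part isn't this step – it's inferring the predicate (is that date a birthday or a deadline?), which the surrounding structure rarely determines.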

This time, I want to aggregate structured data from different sources and map it to some existing ontologies. For example, I want to convert some JSON about comments and links from Reddit and Hacker News to RDF using the http://schema.org vocabulary.

- Can I feed the JSON into some ML system that automatically figures out the mapping? What if I provide some annotation or feedback?

- Can I manually turn the JSON into JSON-LD and use that as the mapping information? What about complex transformations (different structures and literals)?

- Should I implement the mapping manually using my favorite programming language?

- Should I use R2RML or RML?

What's the state of the art today for semantic data integration?
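
To make the second option concrete, here is a minimal sketch of a hand-written mapping from Hacker News comment JSON to schema.org terms; the field names follow the public HN Firebase API, but the mapping table itself is my own guess at sensible schema.org equivalents:

```python
import json

# Hand-written mapping from HN API fields to schema.org property names.
# "time" would need conversion from Unix time to ISO 8601 in practice.
HN_TO_SCHEMA = {
    "by": "author",
    "text": "text",
    "time": "dateCreated",
    "parent": "parentItem",
}

def hn_comment_to_jsonld(item):
    """Wrap a raw HN comment dict as a schema.org Comment in JSON-LD."""
    doc = {"@context": "http://schema.org", "@type": "Comment"}
    for src_key, schema_key in HN_TO_SCHEMA.items():
        if src_key in item:
            doc[schema_key] = item[src_key]
    return doc

sample = {"by": "dang", "text": "Please be kind.", "time": 1547200000, "parent": 123}
print(json.dumps(hn_comment_to_jsonld(sample), indent=2))
```

The appeal of the manual approach is that the mapping doubles as documentation; the drawback, as you note, is that it doesn't handle structural transformations, which is where R2RML/RML or custom code come in.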

Maybe take a look at FRED? (Disclaimer: not used it myself)

- Homepage http://wit.istc.cnr.it/stlab-tools/fred/

- Paper https://www.researchgate.net/publication/280113533_FRED_From...

There are likely other projects and papers, google 'text to rdf nlp'

Stephen Reed (ex-Cyc engineer) also did some interesting work in this field, in his Texai project, over 10 years ago, although there are few references to it on the web now: that part of his project is no longer open source (and I know of no mirrors).

- Paper https://pdfs.semanticscholar.org/8026/107de65c5a14aa8d0d47f9...

- Homepage http://texai.org

Very much related, "Populating the Semantic Web—Combining Text and Relational Databases as RDF Graphs", Kate Byrne.

- http://homepages.inf.ed.ac.uk/kbyrne3/docs/thesisfinal.pdf

Related paper by Stephen Reed, "A (very) brief introduction to fluid construction grammar"

- https://www.researchgate.net/publication/228378264_A_very_br...

Actually, the online API for FRED seems broken, and none of it seems to be open source - and the paper is light on details.

Solid looks to be trying to reimplement what platforms like Ethereum are already building. The same ethos is there and this is very well written but I wonder if the Solid project just missed that when doing their research. Hopefully all of their efforts don't go to waste and they can extend some of their work to the broader decentralized web community.

No: blockchain technologies are about reaching decentralized agreements. Solid is about everyone being able to write their own things (so no agreement) without centralized parties.

The Ethereum project is about providing a complete decentralized web3 stack - not just a blockchain, though the database layer that provides is a critical part of it.

Not really, it is pretty much the other way around. :-) We're basically building the simplest thing that could possibly work, they are rebuilding a lot of infrastructure that they have to use, but we can use where it makes sense. So, they are kinda trying to implement the Web, which is a much bigger task than adding access control and identity... :-) There's also been quite a lot of overlap between people working on Solid and working on Blockchains in the past, so we know it well. But we're not really in competition, we'd be fine coexisting.

Just FYI, it is unclear from your comment which of these organisations you are associated with.

Is "we" Solid or Ethereum?

In kjetilk's profile...

> about: Hacker, community guy and project release manager at Inrupt, working on Solid.

