
Re-decentralizing the Web, for good this time - Schoolmeister
https://ruben.verborgh.org/articles/redecentralizing-the-web/
======
tmcw
It puzzles me that the linked data future is still discussed, as if we didn't
already try it, and didn't already discover that developers dislike arcane RDF
standards and that the academic-rooted designers of the specifications have a
terrible track record of solving real-world problems. And now they're
presenting linked data as some critical component of the decentralized web
while skipping out on the debates that everyone else in the space is having -
like whether decentralization can be fast, or how to ensure data authenticity,
or whether a 'local server / pod' can be built that doesn't get hosed by hole-
punching through a home Comcast connection.

Instead, it's just 'what about old-fashioned websites, plus lots of XML schemas
and long spec documents'? It just tastes like a rehash of Berners-Lee's
existing '5-star open data' spiel (
[https://5stardata.info/en/](https://5stardata.info/en/) ) but now with the
billing that it'll fix the internet. 5-star open data has been around for
years now, and, well, the linked data future isn't here. When's the last time
you consumed RDF in an application?

~~~
svachalek
I really, really sympathize with the goals here but when I read through these
proposals a few months ago I literally facepalmed. They seem about as
realistic as praying for some kind of deus ex machina.

Ultimately I think there are technical solutions to making the decentralized
web more attractive than the walled gardens, but at this point they will need
to be ridiculously polished and shiny to even get a look, and this stuff... is
not. Going forward it gets even worse: they're going to be opposed at every
step by corporations with more money than most nations.

The internet was originally decentralized because the government wanted to
make it that way, and I think the only way to get back there is going to
require a gigantic, economically unattractive investment. There are at least a
few governments that may have the capability but I can't name one that would
have the motivation. Hopefully some billionaire's charity will decide saving
the internet is a worthy legacy.

~~~
cookiecaper
The internet is _already_ decentralized. Some billionaire can't do anything to
fix the situation, at least not directly, because our draconian copyright and
network access laws are the only reason that walled gardens are able to exist.

There aren't really serious technical barriers stopping someone from
automatically multiplexing the content from various social networks into a
single read-write stream, for example. The issue is that when
someone attempts to do that kind of thing, they get sued and they end up owing
BigTechCo millions of dollars. [0]

An open internet is _not_ a technical issue. It's a legal one.

[0] [https://www.eff.org/cases/facebook-v-power-
ventures](https://www.eff.org/cases/facebook-v-power-ventures)

~~~
cwyers
The walled gardens exist because the open Internet kind of sucks, really.

E-mail is pretty much the last bastion of the old open Internet, and the
amount of resources needed to just deal with malicious e-mails is huge.
Mindbogglingly huge. And those costs cut out a lot of organizations from being
able to operate their own e-mail servers (either the costs of doing it or the
costs of verifying to the big players that the e-mail you're sending isn't
garbage).

And that's pretty much the story across the board. The old Internet was
overwhelmed by bad actors who would ruin everything. Facebook and Twitter
house a lot of awful stuff. But can you imagine how bad it would be if we were
all still using USENET and IRC?

~~~
scottlocklin
I liked Usenet and IRC. It would make the internet great again.

~~~
zzo38computer
I still use IRC. I run my own email server (and haven't had problems so far).
There are more protocols than just SMTP and HTTPS. (I even sometimes make up
new ones, although sometimes the existing ones will do, even the ones that
aren't so common.)

And, yes I agree, it would make the internet great again.

------
kodablah
I think the part ignored by so many is the need to decentralize the computers
into the home. I'm not talking meshes or shared resources. For the majority of
use cases, we don't need distributed storage, compute, etc. Just start making
these self-hosted "servers", "data pods", etc as easy to install as desktop
software and make it clear that they are inaccessible when the computer is
off. People who don't already have one will gravitate towards at least one
always-on machine in their house. Modern societies have reasonable upload
speeds and
electricity/network uptime to support it. Sure, things like ISP firewall/NAT
and dynamic IPs are a bit of a barrier, but you can have volunteers help with
relays.

For example, I can easily fire up a Tor onion service on my never-turns-off
home desktop computer and reach my stuff from anywhere. Why can't I reach my
friends' stuff the same way? Because, to use business-speak, there's nothing
"turnkey". It's something I've been pondering and working on. Sure, the bigger
players may have to be in DCs, have more stringent uptime requirements, and
distribute their bandwidth/workload more. But for most of us, desktop software
and web-of-trust style connections could go a long way so long as the front of
the software has a FB feel (e.g. a feed, messages, etc). We can tackle
discovery, searching, aggregation, offloading, etc later.
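
As a sketch of how little is needed for the onion-service half of this, a
stock Tor install can expose a local service with two lines of torrc (the
directory and port below are placeholders):

```
# Map the onion address to a web app listening locally on port 8080.
HiddenServiceDir /var/lib/tor/my_pod/
HiddenServicePort 80 127.0.0.1:8080
```

Tor then writes the generated onion address to the `hostname` file inside
that directory.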

~~~
wmf
Try Sandstorm.

Home servers are a very difficult sell (see $500 Helm) compared to VMs running
in a data center and IMO the privacy difference is mostly illusory.

~~~
kodablah
/me goes to sandstorm.io, click install, skip past the paid version on to
self-host, see it only works on Linux, close tab

^ That is the expected reaction from normal desktop users. I mean literally:
download an exe and up pops your feed, ready to add your friends, or
favorite businesses, news sites, link aggregators, etc given their onion ID
(yes, onion ID is annoyingly large, especially v3, but discovery/identity
comes later, don't let it hold up the system).

I'm not convinced you need a "home server" in the traditional sense. Just
accept what you lose, uptime, if you use your laptop or phone to do the
hosting. You can share between them too given a synced private key which is
the software's job, not the user's. Still, an ephemeral self-hosted-on-desktop
social network can go a long way (and again, people will let the need for
uptime drive their always-on desktop decision). This stuff requires so few
resources to start that a cheap Raspberry Pi with an install-and-reach-from-
other-device flow would work just fine if they don't have a home computer and
want one just for this. Large storage can come later.

I do agree the privacy difference is minimal.

~~~
ocdtrekkie
I think the ideal use case for Sandstorm is either the service model or best-
case the "your family computer guy runs it for a ton of people you know". If I
stand up a Sandstorm server at home this year (which is likely), I'll probably
allow any friends or family to use it.

The "normal desktop user" should probably not be running their own self-
hosting setup, because they will fail at backups and reliability and
performance.

~~~
hinkley
Or, one proposes a system with redundancy built into it, so that a machine
going down or getting soda poured on it doesn’t take out the whole network.

------
sascha_sl
The W3C has a proven track record of producing overengineered shit when it
comes to "the semantic web".

Just look at ActivityPub. It's essentially OStatus, but instead of XML we
slapped namespaces on JSON, wrote a bunch of overly complex preprocessing
procedures so that everyone can output just the way they want[1], and still
made half the spec ambiguous enough[2] that implementers essentially follow the
one rule that matters: maintain compatibility with Mastodon.

[1]: [https://www.w3.org/TR/json-ld-
api/#algorithm-5](https://www.w3.org/TR/json-ld-api/#algorithm-5)

[2]: [https://please-just-end.me/ap.html#block-activity-
outbox](https://please-just-end.me/ap.html#block-activity-outbox) (domain name
relevant to content)

~~~
zozbot123
The Semantic Web does _not_ require XML these days. JSON-LD (for JSON) and
GRDDL/Microdata (for HTML) are widely-acknowledged standards. For simple text
use, akin to a Markdown-formatted document, you can use Turtle. If you believe
that JSON-LD is genuinely ambiguous, take it to the authors of that spec and
contribute to getting it fixed.
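
To make the comparison concrete, here is a minimal illustrative sketch (not a
full JSON-LD processor) of the same statement in JSON-LD and Turtle; the IRIs
are example values:

```python
# The @context maps short keys to full vocabulary IRIs.
doc = {
    "@context": {"name": "http://schema.org/name"},
    "@id": "https://example.org/alice",
    "name": "Alice",
}

# Hand-expand the document: replace each term with the IRI from the
# context, which is (roughly) what JSON-LD's expansion algorithm does.
context = doc["@context"]
expanded = {context.get(k, k): v for k, v in doc.items() if k != "@context"}
print(expanded)

# The equivalent statement in Turtle, for comparison:
turtle = '<https://example.org/alice> <http://schema.org/name> "Alice" .'
```

The point of the context is that producers can use short, app-friendly keys
while consumers still recover unambiguous IRIs.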

~~~
sascha_sl
Do you think a format that requires this amount of code to predictably parse
is a good format?

[https://github.com/kazarena/json-
gold/blob/master/ld/api_nor...](https://github.com/kazarena/json-
gold/blob/master/ld/api_normalize.go)

------
StreamBright
"In order to regain freedom and control over the digital aspects of our lives"

Nothing proves his point more than:

      <script src="//www.google-analytics.com/analytics.js...

~~~
rubenverborgh
How does running analytics on my site impair my ability to talk about what Tim
Berners-Lee wants to do for freedom and control? ;-)

Yes, I track how popular what content is on my site. Motivates me to write
more. Please feel free to block trackers; I do that as well.

~~~
amiga-workbench
Have you considered Piwik or a similar self-hosted analytics solution?

I don't care if people run Analytics; it's just when they go sharing all that
data with a third party that it gets troubling.

~~~
rubenverborgh
Yes, I urgently need to migrate; same with Disqus, needs to become Solid.

------
crucini
I skimmed this and see two big problems.

First, this idealistic idea that "we" are going to take back our data. Who is
this we? Only the smart, high-agency people who have time to spare. The
commercial web is increasingly tuned to the normal user, who is low-agency and
easily led around. Who will win a battle of user acquisition and retention?
Facebook or the rebels? Facebook of course. So any solutions proposed here are
just for a tiny percentage of users who will then be isolated from the real
and useful social networks. Or more realistically use both.

Or maybe if the infrastructure is built, a layer of savvy entrepreneurs can
emerge to monetize it? I'm thinking of reaganemail, selling an anti-google
email account to the AM radio crowd.

Second, the idea of somehow eliminating censorship. De facto censorship will
always exist, even if you sugar coat it as Twitter has tried - "your content
is still there, but only if someone explicitly looks for it". Any platform
without censorship will just be flooded by every marketer and political
zealot, for starters.

Also, I think he is conflating filter bubbles with centralization. Without
centralization, wouldn't we still have filter bubbles as people self-select
into their online communities?

~~~
gerbilly
Well, when I got on the internet in 1988, we were all 'smart, high-agency
people who have time to spare'.

Supposing we manage to solve this problem, what's to say average people can't
participate in 10 years or so, when the tech has been made easier to use?

~~~
flixic
Early web was quite decentralized already. Many separate Bulletin Boards,
later forums. Many people writing there had an idea how to create their own.

It didn't start centralized. Centralization happened. I might be more cynical
than I should be but as a designer I struggle to see the future in which we
have _social dynamics_ that favor decentralization instead of convergence into
a less self-managed system (i.e. all current centralized networks).

~~~
repolfx
No it wasn't. Vast reams of content were hosted exclusively on GeoCities. In
fact almost all "home pages" were on GeoCities or AOL back then. There has
never really been a time in the history of the internet when a few small
providers or companies didn't have outsized dominance - DARPA early on, then
Netscape, AOL and GeoCities, then Microsoft, Blogger, WordPress.

This sort of discussion often looks like rose-tinted spectacles. The past
wasn't so different from today.

------
orthecreedence
I've read a bit about Solid in the past, but never quite understood how it
will handle different data models. Does it force social data to all look the
same (as in, have a predefined set of fields)? If not, how do apps built on it
interoperate?

Don't get me wrong, I'm all for projects like this. I think it's wonderful. I
just never really got how the apps will work with the same data without being
forced into a particular data model (which seems like it would limit what you
could do).

~~~
kjetilk
Great question, because it is basically the most ignored problem in the
Semantic Web community and thus the one that we are spending quite a lot of
time on.

So, basically, there is one _data model_, RDF, but RDF does not require the
same set of fields; on the contrary, you are free to write your own. Obviously,
you wouldn't get good interoperability if you do. So, there are several things
you can do:

1) Adopt what others are using.

2) Map your "fields" (we prefer to call them vocabularies) to the stuff others
are doing, and rely on apps to figure out interop using reasoners.

3) Don't care; your app will work fine for you.

I mean, 3) is fine, it is just that you'd be missing out. 2) also works, kinda,
but reasoners aren't all that easy to use, so I'd mostly like to see people go
for 1).

So, we need to make it really easy to find existing stuff. You could go for
the big one, i.e. [https://schema.org/](https://schema.org/) or you could go
more in detail and look at
[https://lov.linkeddata.es/dataset/lov/](https://lov.linkeddata.es/dataset/lov/)
. The former has a lot of traction; the latter is really decentralized, so I
kinda prefer that.

Then, we have to make it real easy to author new stuff when you can't find
existing stuff, because that will happen. Then, we need to make it easy for
others to find yours, so that they can start using it too for similar
applications. And, I'm thinking that it will be kind of a graduation process,
where you first look for existing stuff, and when failing to find anything,
you just mint your own without thinking about others, just to get something
that works up and running. Once your app starts gaining traction, you tighten
it up, and if something else then gets popular, you can migrate to it with
little disruption.

So, we're not there yet, but we're thinking and working on it a lot.
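
Options 1) and 2) above can be sketched in a few lines; this is an
illustrative example, where the `https://myapp.example/vocab#` IRIs and the
`MAPPING` table are made up (only the schema.org terms are real):

```python
# Option 1: adopt an existing vocabulary (here, schema.org terms).
post_shared = {
    "http://schema.org/author": "https://alice.example/profile#me",
    "http://schema.org/headline": "Hello, Solid",
}

# Option 2: mint your own vocabulary, plus an explicit mapping that an
# app (or a reasoner) can use to translate to the shared terms.
post_custom = {
    "https://myapp.example/vocab#writer": "https://alice.example/profile#me",
    "https://myapp.example/vocab#title": "Hello, Solid",
}
MAPPING = {
    "https://myapp.example/vocab#writer": "http://schema.org/author",
    "https://myapp.example/vocab#title": "http://schema.org/headline",
}

def translate(data: dict, mapping: dict) -> dict:
    """Rewrite predicate IRIs using the vocabulary mapping."""
    return {mapping.get(k, k): v for k, v in data.items()}

assert translate(post_custom, MAPPING) == post_shared
```

The "graduation" path described above is exactly this: start with the custom
vocabulary, then publish a mapping (or migrate outright) once a shared one
gains traction.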

~~~
orthecreedence
Thanks, this is a really great and thoughtful reply, and it's good to know
serious work is being done here. I can't wait to see how the project unfolds!

------
peterwwillis
The web is decentralized, and we already have control of our data. The problem
is, people keep giving it away.

I'm fairly confident that 98% of the population of the earth doesn't give a
crap that their data is collected, or that they don't "control" it. This whole
"decentralized web" thing is just privacy nerds trying to convince us that we
need this, when really no regular consumer is asking for it.

~~~
rubenverborgh
Technologically, yes. But centralization takes many shapes (as I explain in
the article: [https://ruben.verborgh.org/articles/redecentralizing-the-
web...](https://ruben.verborgh.org/articles/redecentralizing-the-web/#ashort-
history-of-de-centralization-and-the-web)).

I encourage you to read the article, where you'll see that I'm arguing from a
permissionless innovation perspective, not so much privacy.

~~~
peterwwillis
We can already do this - but it's a bad idea.

Plenty of services are API-compatible with Amazon S3 (i.e. anyone can run
their own S3 clone), so existing sites could be modified to store user data in
S3, with OAuth letting the user delegate access to their own S3 endpoint. No
new protocols needed, no big innovations required.

But for this to work on anything other than the most rudimentary data (media
files, blog posts, and serialized data) would require completely changing the
way all modern applications are written. Databases would all have to change,
APIs would all need to follow specific standards, and networks would need to
become a hell of a lot more stable, higher bandwidth, and lower latency.

Assume you're Twitter, and you want to map-reduce all of the data of all your
users to find out how many people retweeted a user, and then notify those
users. Now you need to connect to every user's service provider, get their
data, store it temporarily on your own servers, duplicate everything, do your
processing, and then write changes back to all storage services for all users.
Now do this _every second_. If you don't, you have to store this map-reduced
data on your own service's storage, which violates the principle of only using
the user's storage pod.

In fact, data would have to become _more centralized_ to work in this model.
Currently, application data exists across a range of services in a variety of
networks, all of it being dynamically accessed in different ways before it is
accessed by a user. There are dozens of different databases used just to open
up the TV Guide on your cable company's set-top box. All of that would have to
be centralized in one or two databases in order for the storage and processing
to be disconnected.

Not only that, but a lot of data is useless to anyone but the original service
provider or original application. Only a Facebook clone would be able to use
Facebook's data, and only data relevant to Facebook's ad sales should stay on
Facebook's servers, even if it contains "Peter clicked on ad X at Y time".
Should there be a separation of what _kind_ of data gets decentralized? Do we
really want to go down the rabbit hole of what is _my_ data, and what is data
_about me_ that a company has originated and created value from? (Is a picture
mine because it's a picture of something I own, or is it mine if I took the
picture?)

The idea that every component of every application could be completely
decentralized from each other is unlikely. Now, what is more in the realm of
possibility is doing a Google or Facebook, and creating features that allow
exporting or importing all data. But that process is not perfect, and the
procedure can take from minutes to days. And to use this data it would still
all have to follow standards specific to a particular application.

And again, we already have a lot of these data standards. We have standards
for most of the _kinds_ of data that exist today, such as calendar, contacts,
e-mail, instant message, voip, office documents, images, and so on. We have
standards to synchronize and syndicate data feeds. We have standards to
federate accounts and manage permissions. But commercial sites don't natively
build these features as interoperable with each other - because, why would
they?

Storage and processing of data are intimately connected with the specific
applications that use them, and trying to decouple them will result in
inefficiency and complication, with no clear advantages.

~~~
kjetilk
OK, so nobody said decentralization is easy. There have been plenty of
academic papers saying pretty much the same as you do. But we have to do it,
not for technical reasons, but for ethical and social ones. So, we're starting
to tackle it head-on.

Your TV Guide is a good example of things that aren't hard. They don't change
very quickly, so you can just use a cache. That's easy.

Finding the number of RTs, that's also easy, apart from it being an open world
of course. When they RT, they notify you. And you want to display those RTs
with your tweet? Just cache those who notified you.
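
A hypothetical sketch of that notify-and-cache flow (the names
`receive_notification` and `retweet_cache` are illustrative, not any Solid
API):

```python
from collections import defaultdict

retweet_cache = defaultdict(set)  # tweet IRI -> set of retweeter pod IRIs

def receive_notification(tweet_iri: str, retweeter_pod: str) -> None:
    """Each retweeter's pod notifies the author; the author caches it."""
    retweet_cache[tweet_iri].add(retweeter_pod)

def retweet_count(tweet_iri: str) -> int:
    # Open-world caveat: this only counts pods that actually notified us.
    return len(retweet_cache[tweet_iri])

receive_notification("https://me.example/tweets/1", "https://alice.example/")
receive_notification("https://me.example/tweets/1", "https://bob.example/")
receive_notification("https://me.example/tweets/1", "https://alice.example/")
print(retweet_count("https://me.example/tweets/1"))  # duplicate ignored: 2
```

The count is a local cache over notifications, not a global scan of every
pod, which is what keeps it cheap.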

Stable data access standard? That is Solid itself. And the data model, that's
RDF.

There are ways that you can go about doing this stuff.

Finally, we're also getting some traction around this in academia, they've
been hung up in stuff that isn't helpful for too long.

~~~
peterwwillis
Actually, the TV Guide example uses data that updates constantly. Every single
interaction a user has is recorded and is used by other systems. The guide
also changes based on user-specific views or preferences. Another example is
Netflix's famous user-specific recommendations, which change constantly, have
a regularly fine-tuned algorithm, and are a strategic feature. Even just
playing a single show requires a dozen different calls to authorize its
playback, based on a number of considerations.

Finding the number of retweets is also more difficult, because there's other
data that gets recorded too. Not only do you have your own data now, you now
have the data of everyone else that retweeted you. Is it your data, or theirs?
Who is caching it, and for how long? How does refreshing the cache affect
consistency of each user's views? With decentralized applications you have to
choose what kind of functionality you will support.

But, yes, in theory, if you allowed only _one_ service provider to use some
given data, you could rely on caching (read: holding a copy of data
indefinitely) to a good extent. But as soon as you have multiple using it, you
enter the extremely hairy world of multi-master high-availability strong-
consistency replication. AKA, absolute hell. But this isn't even the most
difficult problem to me.

We already had some good data access standards. The question is, why weren't
sites using them to allow data interoperability/mobility? Answer: they didn't
want to. So even if you create a technical solution for all of this, the best
you will get is the Facebooks of the world publishing a read-only calendar
feed, clunky, slow export tools, and single-feature one-way application
integrations. Like we have now.

I don't see an ethical or social reason to decouple the data from the services
I use, and I don't think the majority of the world population does, either.
The only ethical/social concern I have is with the very existence of the
service, which is a different concern.

------
jpollock
Technology won't help with this, regulation will.

We have parallels from other platforms - specifically the fixed and mobile
phone networks.

There used to be monopolies in local phone service. There were new
competitors, but to change provider, you had to change phone number.

Even changing cell phone provider required a number change.

This obviously had strong network effects pulling you to stay with your
provider. You had to tell _everyone_ in your extended network where to find
you and have them update all of their business records when you changed from
one carrier to another.

Eventually, everyone figured out this was stupid, and Number Portability [1]
was forced on carriers by regulation.

This problem is completely gone now. You can take your number with you.

If we allow people to take their data to new social networks, and force
federation, then we will get decentralization. However, it won't happen
without regulation any more than it did with the phone companies.

[1]
[https://en.wikipedia.org/wiki/Local_number_portability#Histo...](https://en.wikipedia.org/wiki/Local_number_portability#History)

~~~
TylerE
If you have ever seen any attempt by any government to regulate software you
would know this to be a Lovecraftian nightmare, and not a solution.

~~~
jancsika
If you have ever seen any attempt by any volunteer-run FLOSS team to solve
nationwide social problems with technology you would know this to be a
Lovecraftian nightmare, and not a solution.

~~~
TylerE
But you aren’t legally mandated to use it.

~~~
jancsika
Also-- companies that aren't legally mandated to allow third parties to
reverse engineer their medical devices would otherwise use every means within
their power to stop you from doing that. We know this because companies
outside of the medical industry use every means within their power to stop you
from doing that.

And I can't see how a response of "let's get rid of the chess board so they
can't play" would be an adult response to this problem.

------
staticvar
Beaker Browser is a cool experiment showing how you can decentralize the web
in a way that is both easy for end users and fun for developers, because it
pushes web standards.

[https://beakerbrowser.com](https://beakerbrowser.com)

*disclaimer: I help develop Bunsen Browser, the mobile companion for Beaker Browser.

~~~
MarsAscendant
Could you explain to me, a newbie, what Beaker promotes and what its advantage
is against browsers that use HTTP?

------
skybrian
I'm not sure this is a coherent plan, since it doesn't talk about how privacy
rules get enforced for services. Who vets the services? If a fun game that you
let access your "personal data pod" turns out to be Cambridge Analytica
and just copies everything it sees into its own database, how is
that an improvement over Facebook apps?

Choosing between service providers is no more meaningful for privacy than
asking Windows users to download arbitrary apps. If smart phones are any more
secure than desktops, it's because Apple and Google are constantly improving
OS-level security and policing their app stores for malware.

Of course app stores have well-known flaws. But if we want to do better than
that, someone has to figure out a better way to choose good rules and enforce
the rules better.

------
deevolution
People don't care about decentralization or centralization. This is all a
big generalization, but humans are lazy, and when it comes to making a moral
choice, they're going to pick the path of least resistance and completely
ignore all moral consequences. It looks like at the moment centralized
services are what the people want, and it's what they deserve.

~~~
repolfx
Why is "decentralisation" even the moral choice to begin with? A lot of
projects claim to be decentralised, but when you ask "ok, who has the power in
this project" it turns out that a small cabal of developers has most of the
same rights and powers a corporate, centrally hosted service would have. It
takes very careful analysis to discover whether a thing is really meaningfully
decentralised or is just claiming to be.

------
LukeB42
Shameless plug but I designed and wrote something for doing this from 2011 to
2015 because nothing like it existed or indeed exists as far as I'm aware.

It's a p2p caching proxy that also lets you edit web pages collaboratively in
realtime over a LAN or the internet. It has a contacts list system and p2p
chat functionality. This project effectively died due to lack of interest and
I still have various security concerns about it (Should you break/reimplement
Same-Origin policy or break/reimplement the TLS chain of trust?)

The main security concern is that because it decentralises HTTP in-place
(existing URLs can now be looked up on an arbitrary number of overlay networks
if the original URL isn't providing an OK response) it puts users at risk of
malicious actors spamming overlay networks with browser exploits for popular
resources like "news.ycombinator.com/".

I hope TBL and co converge on satisfying answers to these problems or
constrain their design to not bother with decentralising existing URLs in-
situ.

Code lives here:
[https://github.com/Psybernetics/Synchrony](https://github.com/Psybernetics/Synchrony)

Feel free to shoot me any questions.

------
gpsx
Other people here have made this general point: the large centralized
services like Google and Facebook have succeeded in becoming so big through a
lot of effort and a lot of cost, which was paid for by all the money they
make. At a minimum they have to pay for their server use.

From what I understand, the proposal here seems not to allow for the
advertising model. I don't think a service can grow and survive by making
people pay, because people are too cheap.

There might be a better chance for something like this if they allow for the
economics:

- Maybe the data host can provide an "advertising" profile which the user has
control of. This can be exposed to the application hosts to allow for
advertising.

- Maybe you also throw micropayments into the mix, along with bartering for
information.

Another issue is complexity. A number of comments have talked about over-
engineered solutions and protocols. This decentralized idea could be started
with something small, like an open social network standard. I think I saw
something similar to this on HN not too long ago:

- You have a web site, which is your profile. A provider could give you a nice
editor for it.

- You have a feed, where you can put pictures, short posts, long posts,
whatever. This is distributed with RSS. (The host makes this all seamless for
you.)

- Identity is controlled with OAuth, used only to give an identity to visiting
users. The owner can manage permissions for certain remote users (his
"friends").

Such a service could be managed on your own web server, or there could be
different cloud providers that make this arbitrarily easy, with arbitrary
levels of functionality on the "profile" page, the "feed", and the "friend"
permission management.
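
The feed piece described above is already a standard format; a minimal RSS 2.0
document for it might look like this (the domain and post are made-up
examples):

```
<rss version="2.0">
  <channel>
    <title>Alice's feed</title>
    <link>https://alice.example/</link>
    <description>Posts from Alice's self-hosted profile</description>
    <item>
      <title>Short post</title>
      <link>https://alice.example/posts/1</link>
      <pubDate>Mon, 01 Oct 2018 12:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
```

Any existing feed reader could consume this today, which is part of the
appeal of starting small.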

------
qznc
> From the above, it is clear that our primary obstacles are not technological
> [5]; hence Tim Berners-Lee’s call [6] to "assemble the brightest minds from
> business, technology, government, civil society, the arts, and academia to
> tackle the threats to the Web’s future". Yet at the same time, computer
> scientists and engineers need to deliver the technological burden of proof
> that decentralized personal data networks can scale globally and that they
> can provide people with an experience similar to that of centralized
> platforms.

This whole article looks like "well, the obstacles are not technological, but
let me write a few pages about technology anyways".

If the obstacles are not technological, then we need non-technological
solutions. So far I think GDPR is one such non-technological step towards
taking back control of our personal data.

The hardest problem in my opinion is "preventing the spread of misinformation"
because we essentially need a way to distinguish between malice and stupidity.
Without mind-reading I do not see how this could be possible at scale.

~~~
rubenverborgh
Yes, I was asked specifically to write a chapter on Tim's work. What we have
is the technology to make it work; it's a necessary, but not a sufficient,
condition.

------
wmf
I can't escape the feeling that SOLID will be at best neutral but likely will
make things worse. Some of the diagrams show _more_ companies having access to
your data where it will continue to be mined, sold, etc. If you control
storage but not execution it seems like you control nothing.

~~~
rubenverborgh
You control access permissions too :-)

~~~
yati
An application with access to bits in your pod can always copy that data to
its own storage right? e.g., a Facebook built on Solid can still grab all the
data it needs (with the requisite perms) and store it away, build a profile
per WebID, continue serving "cached" copies of changed content, etc. What are
your thoughts on this?

~~~
rubenverborgh
At that point, it becomes a legal matter. Not everything can be solved with
technology. (There's homomorphic encryption, but let's leave that aside for
now.) The GDPR legislation in Europe sets a good precedent for demanding
removal of our data.

The crucial point is that Solid will bring more choice: there will be social
feed viewers that will be more invasive, and those that will be less invasive.
People can choose the one they like, without consequences as to whom they can
interact with. Today, we do not have a choice: if we want to interact with
people who use Facebook, we have to use Facebook as well.

------
rickcogley
The problem is people. To put it charitably, not everyone is "technical"
enough to figure out how to own their own data, so I think silos and walled
gardens are here to stay, because they are quick and easy for people. I, for
one, fully support keeping my own data in (as much as possible) future-proof
formats, and although I've had a blog in some form for years, I want to move
away from standard social media as much as possible.

------
maxk42
The web cannot be decentralized without putting an end to SSL. As long as
certificate-issuers are the arbiters of commerce and browsers push users to
trust unsecured websites more and more, malicious governments will be able to
silence people by revoking their certs.

There are stronger alternatives. We need to make a push to begin using them.

~~~
jakelazaroff
I'm not sure what stronger alternatives you're referring to. Can you elaborate
or share any resources?

~~~
sparkie
If, instead of identifying services by some human readable name, they are
identified by their public keys, then we don't need certificates - there are
several encrypted and authenticated transport protocols which only require
knowledge of the destination's public key upfront.

You then need an alternative name system which links a unique human readable
name to a public key. This is the trickier part (see Zooko's triangle), but
there are some creative solutions like Namecoin and the Blockstack Name
Service.
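
To make the first half concrete, here's a minimal sketch (in Python, with a
made-up key) of deriving a self-authenticating service address from a public
key, in the spirit of Tor v3 onion addresses and IPFS peer IDs:

```python
import base64
import hashlib

def service_address(public_key: bytes) -> str:
    """Derive a self-certifying address from a raw public key.

    Whoever connects can ask the peer to prove possession of the key,
    then re-derive the address and check that it matches, so no
    certificate authority is involved.
    """
    digest = hashlib.sha256(public_key).digest()
    # Base32 keeps the address case-insensitive and DNS-label friendly.
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()

# With a made-up 32-byte key:
print(service_address(b"\x01" * 32))  # a 52-character base32 string
```

The name system then only has to map a human readable name to such an
address; the address itself needs no trusted third party.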

~~~
jimsmart
> links a unique human readable name to a public key

Easy: use DNS, store the PGP key ID in a TXT record, and then look up the
public key for that ID using a PGP key server.
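
A sketch of the lookup half (the `pgp-key-id=` TXT record format here is a
made-up convention, not a standard, and the keyserver endpoint is just one
keyserver's API):

```python
def keyserver_url(txt_value: str,
                  keyserver: str = "https://keys.openpgp.org") -> str:
    """Turn a hypothetical 'pgp-key-id=0x...' TXT record value into a
    keyserver lookup URL for the corresponding public key."""
    prefix = "pgp-key-id="
    if not txt_value.startswith(prefix):
        raise ValueError("not a pgp-key-id TXT record")
    key_id = txt_value[len(prefix):].removeprefix("0x")
    return f"{keyserver}/vks/v1/by-keyid/{key_id}"

# A real resolver would first fetch the TXT record, e.g. with dnspython:
#   dns.resolver.resolve("example.org", "TXT")
print(keyserver_url("pgp-key-id=0x1234ABCD"))
```

For what it's worth, DANE's OPENPGPKEY record (RFC 7929) standardizes a
variant of this idea, putting the key material in DNS itself.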

------
mark_l_watson
I went to the Decentralized Web Conference a few years ago and really liked it.
In spirit, I am onboard.

In practice, I am satisfied with just using my own domain for email, my web
site, and self-hosted blog. For communication I like FaceTime so I can see
people while I am talking with them, phone, and email.

I still use social media, very occasionally, to see what people are doing and
sometimes advertise my new open source projects and updates, and any books I
write. Most of the problems people talk about with Facebook/Twitter don’t
bother me as long as I only use the systems infrequently. I am not tempted to
cancel my accounts.

------
snazz
The design, typography, and diagrams in this article are wonderful. I like it
when people pay this much attention to detail!

------
peteforde
This essay is in radical need of a TL;DR. If something is this important, you
owe it to the subject matter not to bury the lede under a mountain of history
and flowery exposition.

Ask yourself: who is this for? People who are not already deeply passionate
will stop reading unless they are engaged within a minute. Note that a
minute is being extremely generous; on a commercial consumer site, it's
apparently an average of 7 seconds before someone clicks away.

I recommend that you check out this video and reconsider how you might reframe
your message as a call to action that speaks to a better future we can create
together.

[https://youtu.be/qp0HIF3SfI4?t=121](https://youtu.be/qp0HIF3SfI4?t=121)

I even jumped you to the good part.

~~~
sprayk
I definitely scrolled around, looking for some kind of summary. I wanted to
figure out if I was gonna be wasting my time reading another recap of what I
lived through or if there was gonna be a proposal for some way to get back to
decentralization that I could evaluate and keep in mind when designing my own
apps. Couldn't figure it out from scrolling so I bailed.

------
jshen
I’m all for the principles here, but one worry I have is the loss of
efficiencies afforded by economies of scale which could dramatically increase
the carbon footprint compared to the centralized versions.

~~~
wmf
This is why carbon needs to be priced, so you could have facts about the
magnitude of the problem (spoiler: probably pretty small) instead of trying to
make qualitative tradeoffs (is it worth destroying society to save the
environment?).

------
firefoxd
A decentralized web can be downloaded and backed up by one entity. Then, you
can go to that centralized entity to enjoy all the content.

If we still don't have decentralization, it's because it is not as easy.

------
vinay_ys
In 2005, I worked at a startup that attempted to solve the problem of privacy
and security for personal information (photos, home videos, music, personal
health/finance documents, contacts etc) while also providing ability to share
and collaborate.

The solution involved running a mesh network with nodes on the user's laptop
or desktop and a corresponding node in the cloud. These nodes would index
local data, replicate metadata across nodes, and back up the actual data to
the cloud node.

A locally running web app acted as replacement for 'windows explorer'. It
allowed the user to access all their files and folders across all their nodes,
access them (open document, play music/videos, see contacts etc), create smart
collections and share these files, folders or collections with other users in
a secure authenticated and private manner.

Users got an identity, which comprised a dedicated domain (or subdomain) and
a PKI certificate tied to that domain. Each node had its own private key, and
their public keys were tied together by the identity certificate.

All communication between nodes (of the same user or across users) was
authenticated and encrypted using these identity/node keys and certificates.
No central node existed in the system that could spy on these activities. The
architecture separated the network-discovery cloud nodes from your data cloud
nodes, and allowed your data cloud nodes to be hosted separately anywhere
(say, in your own cloud instances).

This is the only system I have seen that utilized zero knowledge protocols and
made it accessible to common people to manage their data and share with others
as well.

But unfortunately, as a business it never took off. It got acquired by EMC and
merged with Mozy (the good old data-backup company), and then the product died
a silent death in 2010.

Maybe it was timing; maybe if this product had launched after Snowden, it
would have done well.

But now, I think a more urgent and a relatively less complex problem to solve
is one of distributed communication. In this era of always connected powerful
devices (mobile phones, home gateways), why don't we all have our own personal
email/chat servers that nobody else can spy on? Why does email and chat have
to get relayed via big aggregators who mine so much data as well as metadata?

Not only do they violate privacy, they succumb to security breaches and cause
serious damages.

I feel the stage is set for this disruption: crypto protocols, always-on cheap
connectivity, compute power at the edge, and sensitivity to privacy/security
in the general population – all of these ingredients are in place right now
for this to happen.

~~~
lioeters
Fascinating to read about your experience with the startup, and how the
service was architected.

> unfortunately, as a business it never took off

Sounds like the timing was too early in 2005. I believe these days we're all
so tired of the privacy and security situation, that the world is ready for
something like this.

> [it] utilized _zero knowledge protocols_ and made it accessible to common
> people to manage their data and share with others

This describes exactly what the re-decentralized web needs.

Seeing the many attempts over recent years, it looks like there are
significant technical, financial/business and social challenges - but I
totally agree with your conclusion, that "the stage is set for this
disruption". It also feels like the tide is rising, that the solution is being
worked on from numerous fronts and eventually a more evolved system will be
adopted by the public.

------
captainbland
The point about the decentralised web allowing the permissionless creation of
centralising systems reminds me of the paradox of tolerance, where tolerant
societies are thought to be taken over by intolerance if they tolerate that
intolerance.

Maybe this is a lesson that we need to be less tolerant towards the creation
of centralised services because those with money and power will seek to bring
decentralised systems under their own control.

------
transpute
For the technically savvy, you can run a virtualized desktop:

      - GPU passthrough VM (gaming)
      - SATA passthrough (FreeNAS)
      - multi NIC passthrough (pfSense/OpenWRT)
      - app server/cloud/P2P Linux or FreeBSD VM(s)

[http://unraid.net](http://unraid.net) sells a KVM-based product. VMware ESXi
and XenServer are free. Connect a Ubiquiti AC-Lite WiFi access point to a
dedicated NIC on the x86 box, WAN to another NIC. Since pfSense owns the WAN
NIC, it can host a VPN server for your devices, including mobile. All VMs get
virtual NICs. A Dell T30 with a quad-core Xeon and ECC costs about $400 with
8GB RAM and a 1TB disk; it can hold 4 x 3.5" drives (20 TB in RAID-1) and
2 x 2.5" SSDs.

Level1Techs has intro videos on home servers:
[https://www.youtube.com/results?search_query=level1+home+ser...](https://www.youtube.com/results?search_query=level1+home+server)

Advantages:

      - Stable and boring x86 platform
      - Good performance for gaming
      - Commercially supported hardware
      - Upgradeable storage and GPU
      - Upgradeable router software

------
miguelrochefort
Great article Ruben! I've been following Solid's progress for a while, and I
think your article very eloquently summarizes its purpose and relevance. I'm
especially interested in the ability to circumvent the middlemen and resolve
the marketplace chicken-and-egg problem once and for all.

Watching your TED talk in 2013 was one of the most influential moments in my
life, and discovering the semantic web was perhaps my greatest epiphany. While
the vision never left my mind, I never acted on it. Until now.

I'm dedicating 2019 to linked data. I'm going all-in.

Last week, I started to build a tool to convert unstructured input to linked
data. Even after recognizing canonical literals (email, phone, url, color,
gender, boolean, integer, float, date, time span, money, weight, distance,
language, image, geo coordinates), I couldn't accurately infer predicates and
guess classes. Before trying more complicated stuff like bayesian inference, I
decided to try a simpler exercise.

This time, I want to aggregate structured data from different sources and map
it to some existing ontologies. For example, I want to convert some JSON about
comments and links from Reddit and Hacker News to RDF using the
[http://schema.org](http://schema.org) vocabulary.

\- Can I feed the JSON into some ML system that automatically figures out the
mapping? What if I provide some annotation or feedback?

\- Can I manually turn the JSON into JSON-LD and use that as the mapping
information? What about complex transformations (different structures and
literals)?

\- Should I implement the mapping manually using my favorite programming
language?

\- Should I use R2RML or RML?

What's the state of the art today for semantic data integration?
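
For what it's worth, the hand-written JSON-LD route can be quite small. A
sketch mapping a Hacker News API item (field names assumed from the public
Firebase API: `id`, `by`, `text`, `time`, `parent`) to a schema.org Comment:

```python
from datetime import datetime, timezone

def hn_item_to_jsonld(item: dict) -> dict:
    """Hand-written mapping from a Hacker News comment item to a
    schema.org Comment expressed as JSON-LD."""
    base = "https://news.ycombinator.com/item?id="
    return {
        "@context": "http://schema.org",
        "@type": "Comment",
        "@id": base + str(item["id"]),
        "author": {"@type": "Person", "name": item["by"]},
        "text": item.get("text", ""),
        # HN timestamps are Unix seconds; ISO 8601 for dateCreated.
        "dateCreated": datetime.fromtimestamp(
            item["time"], tz=timezone.utc).isoformat(),
        "parentItem": {"@id": base + str(item["parent"])},
    }
```

The pain point remains, though: each new source needs its own mapping, which
is exactly what R2RML/RML try to declare rather than hand-code.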

~~~
jimsmart
Maybe take a look at FRED? (Disclaimer: not used it myself)

\- Homepage [http://wit.istc.cnr.it/stlab-
tools/fred/](http://wit.istc.cnr.it/stlab-tools/fred/)

\- Paper
[https://www.researchgate.net/publication/280113533_FRED_From...](https://www.researchgate.net/publication/280113533_FRED_From_natural_language_text_to_RDF_and_OWL_in_one_click)

There are likely other projects and papers, google 'text to rdf nlp'

Stephen Reed (ex-Cyc engineer) also did some interesting work in this field,
in his Texai project, over 10 years ago, although there are few references to
it on the web now: that part of his project is no longer open source, and I
know of no mirrors.

\- Paper
[https://pdfs.semanticscholar.org/8026/107de65c5a14aa8d0d47f9...](https://pdfs.semanticscholar.org/8026/107de65c5a14aa8d0d47f9e20d9e24343cb3.pdf)

\- Homepage [http://texai.org](http://texai.org)

~~~
jimsmart
Very much related, "Populating the Semantic Web—Combining Text and Relational
Databases as RDF Graphs", Kate Byrne.

\-
[http://homepages.inf.ed.ac.uk/kbyrne3/docs/thesisfinal.pdf](http://homepages.inf.ed.ac.uk/kbyrne3/docs/thesisfinal.pdf)

------
Sargos
Solid looks to be trying to reimplement what platforms like Ethereum are
already building. The same ethos is there and this is very well written but I
wonder if the Solid project just missed that when doing their research.
Hopefully all of their efforts don't go to waste and they can extend some of
their work to the broader decentralized web community.

~~~
rubenverborgh
No: blockchain technologies are about reaching decentralized agreements. Solid
is about everyone being able to write their own things (so no agreement)
without centralized parties.

~~~
cslarson
The Ethereum project is about providing a complete decentralized web3 stack,
not just a blockchain, though the database layer the blockchain provides is a
critical part of it.

------
benlorenzetti
<i> He and many others were able to state their critical opinions because they
had the Web as an open platform, so they did not depend on anyone’s permission
to publish their words. Crucially, the Web’s hyperlinking mechanism lets blogs
point to each other, again without requiring any form of permission. This
allows for a decentralized value network between equals, where readers remain
in active and conscious control of their next move.</i>

For decentralization, the root problem has always existed: while pointing at
another resource requires no permission, receiving and hosting that resource
does. Your government has to let you receive it, and your ISP has to let you
host.

This is a much lower-level problem compared to the three challenges Berners-
Lee puts forward, which seem to have little to do with decentralization:

1\. taking back control of our personal data;

2\. preventing the spread of misinformation;

3\. realizing transparency for political advertising.

------
ilaksh
I think if you add some cryptography to Solid and use JSON-LD and pick some
schemas and not expect everyone to implement OWL and then get a usable naming
scheme for IPFS (or replace IPFS with something similar with names that work)
and then create some P2P Solid servers then this could work pretty well.

------
mikob
What's to stop a service from having a pod that stores user's data in a
mutated form? (Forgive me for the basic question).

~~~
jimsmart
Signature hashes of the files, (possibly/probably) with those signature hashes
being further signed/verified with the user's key, to establish trust. With
further chains of trust through key signing, if needed.
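
A dependency-free sketch of the idea (an HMAC stands in for a real public-key
signature such as Ed25519, so both sides share a secret here — an assumption
for the sketch, not how a deployed pod would do it):

```python
import hashlib
import hmac

def sign(secret_key: bytes, content: bytes) -> bytes:
    """Keyed digest over the content hash; any mutation changes it."""
    content_hash = hashlib.sha256(content).digest()
    return hmac.new(secret_key, content_hash, hashlib.sha256).digest()

def verify(secret_key: bytes, content: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(secret_key, content), tag)

doc = b'{"name": "Alice"}'
tag = sign(b"user-secret", doc)
print(verify(b"user-secret", doc, tag))                     # untouched data
print(verify(b"user-secret", b'{"name": "Mallory"}', tag))  # mutated data
```

With real signatures, anyone holding the user's public key can run the
verify step, which is what makes the chains of trust possible.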

------
TheMagicHorsey
Somebody should make an easy to install home server with a standard API to
access data. People will start decentralizing on their own when you can just
buy a box and do some basic configuration, and have a secure home web server.
And then developers will build on the decentralized platform because it has
users.

------
dabockster
> Since 2010, no single browser has gained more than two thirds of global
> market share anymore

What about Google Chrome?

------
kornork
Maybe I missed it somewhere, but I didn't quite follow how this is going to
gain a foothold. Solid has the same problem any new social media platform has
- before people want to use it, people have to be using it. Facebook and
Google certainly have no incentive to promote it.

~~~
kjetilk
Oh, but Solid isn't just a social network. True, social networks have really
powerful network effects, so it is a key to success, but not the only key.
We're separating data from apps, which enables permissionless innovation. That
means a lot of people can start writing cool things that they just can't now,
because they are constrained by those platform companies. We're doing that
too. And once people start doing that, every useful app that comes to Solid
will grow the platform, first probably as small communities here and there,
and then those communities get new connections, and boom, disruption! :-)

------
carlsborg
The specification (set of protocols) is here [https://github.com/solid/solid-
spec](https://github.com/solid/solid-spec)

------
ngcc_hk
This misses that about 20% of humanity is behind a walled national garden. If
you have a protocol that is individual- or home-oriented, would it even be
allowed to work?

------
EGreg
_Since 2010, no single browser has gained more than two thirds of global
market share anymore_

Pretty sure Chrome did. Or WebKit/Blink family. This is GOOD imho.

~~~
rubenverborgh
Unfortunately, "pretty sure" is not the way sound arguments are made ;-)
Evidence is: [https://netmarketshare.com/browser-market-
share.aspx?options...](https://netmarketshare.com/browser-market-
share.aspx?options=%7B%22filter%22%3A%7B%7D%2C%22dateLabel%22%3A%22Custom%22%2C%22attributes%22%3A%22share%22%2C%22group%22%3A%22browser%22%2C%22sort%22%3A%7B%22share%22%3A-1%7D%2C%22id%22%3A%22browsersDesktop%22%2C%22dateInterval%22%3A%22Monthly%22%2C%22dateStart%22%3A%222017-01%22%2C%22dateEnd%22%3A%222018-12%22%2C%22segments%22%3A%22-1000%22%7D)
and [http://gs.statcounter.com/browser-market-
share#monthly-20171...](http://gs.statcounter.com/browser-market-
share#monthly-201712-201812)

Such centralization comes with the risk of websites only working with one
browser, forcing people to choose a certain device, operating system, and
browser vendor.

------
austincheney
I don't see any real possibility for decentralization so long as HTTP(S) is
the protocol of the web.

------
zaro
This sounds like yet another technical solution to a problem that is mostly
societal.

------
sonnyblarney
All of this is very academic.

Regular people and businesses are always going to make the decision in front
of them.

'Decentralization' on its own is not something anyone directly cares
about. People care about privacy, somewhat, but there are other paths to
privacy, or at least, consumers may very well believe there are.

Decentralization will only happen with a real impetus: a product or service
that facilitates it, that people want, either for issues related to
decentralization, or, more likely for some other reason that just happens to
facilitate decentralization for some other, related reason.

------
pbalau
This is a load of ... You can't decentralize the web for 2 reasons: DNS and
SSL. And then you have the IP organisation, the name escapes me right now.

~~~
peterwwillis
Both DNS and TLS PKI (nothing uses SSL) are decentralized by design.

~~~
pbalau
Who decides I can or cannot get foobar.tld?

~~~
amiga-workbench
You are perfectly free to run your own nameserver, nobody has to use it
though.

~~~
pbalau
Yes I can run my own top level dns. Yes, people can chose to use my dns or
they can chose to use the google one. The question is why would they do
either, why would they pick mine over the google one, or the google one, over
mine.

You can argue that the google dns can sign the answer with a cert you trust,
thus you know you got the right answer. And then, when you get the address of
the service you want to talk to, you can check their cert and know for sure
you are again talking to the thing you want to talk to.

And here lies the problem: how are you going to check those certs? Are you
going to queue at a Google office and obtain a cert on a USB stick, then
import that into your environment, then do the same for all other services
you want to talk to? Or are you going to trust a central authority? What if
this central authority is messing with you? What if someone up the chain is
messing with your central authority?

Let's not talk about how your router might send your DNS request to some guy's
home server that answers to 8.8.8.8 instead of Google, and you can't do shit
about it...

Everything we know about the internet works the way it works because everybody
involved agrees to do the right thing, from IP routing, to DNS resolving, to
cert verifying. All of this is based on a set of rules set by a central
authority, that everybody chooses to follow. You can't speak about web
decentralization as long as proving who you are is a very centralized system.

------
alexashka
Huh...

> The situation becomes problematic when we are robbed of our choice, deceived
> into thinking there is only one access gate to a space that, in reality, we
> collectively own.

Robbery - the action of taking property unlawfully from a person or place by
force or threat of force. [0]

Deceit - The action or practice of deceiving someone by concealing or
misrepresenting the truth [1]

That's what those words mean. They also have nothing to do with anything that
has happened with the internet over the last 20 years.

[0]
[https://en.oxforddictionaries.com/definition/robbery](https://en.oxforddictionaries.com/definition/robbery)

[1]
[https://en.oxforddictionaries.com/definition/deceit](https://en.oxforddictionaries.com/definition/deceit)

~~~
fwip
I don't think quoting the dictionary here makes you look especially smart.

Are you railing against the use of "rob" with an intangible noun? Would you
cry foul at phrases like "robbed of their dignity?" Do you ignore alternative
definitions like "to deprive of something unjustly or injuriously?"[0]

Do you believe that nobody involved in centralization conceals or
misrepresents the truth? Does a marketer never overstate the benefits of their
hosted solution?

[0]
[https://www.dictionary.com/browse/robbed](https://www.dictionary.com/browse/robbed)

