Hacker News
Defending Internet Freedom through Decentralization: Back to the Future? [pdf] (dci.mit.edu)
143 points by Dangeranger on Aug 19, 2017 | 46 comments



(From the document:)

Risks Posed by the Centralized Web

- Facebook and Google account for 81% of all incoming traffic to online news sources in the U.S.

- Google processes 3.5 billion search queries per day, roughly ten times more than its nearest competitors

Risk 1 - Top-down, Direct Censorship

Risk 2 - Curatorial Bias / Indirect Censorship

Risk 3 - Abuse of Curatorial Power

Risk 4 - Exclusion


> Google processes 3.5 billion search queries per day, roughly ten times more than its nearest competitors

The crazy thing to me is that this has already been examined through many of their obviously anti-competitive acquisitions, and yet they always win. Namely, they use their position as the search-engine leader to spot trends in search, purchases, and advertising, and then buy companies in industries that will ultimately feed the bottom line of their ad-revenue-driven search engine, to the detriment of alternative products.

In other words, the very people and companies who already buy Google ads must now pay more for them, because Google entered the market and is competing for the same ads, driving up prices. Thanks to its search engine, Google knows exactly where it can get away with this: it knows the maximum ad cost its competitors can bear and will squeeze them at best, or drive them out of business and leave itself the sole operator in the space at worst.

I mentioned it the other day: the one thing people could do is boycott Google search. I know that sounds ridiculous, but how hard would it really be to organize a single-day Google search boycott across the web, where people agreed to use Bing or DuckDuckGo for one day? Google shareholders would throw a fit and change would be made internally in an instant, and I'd go so far as to say political pressure would come down to seriously look into the company's business practices. Most importantly, people would wake up, understand their power, and realize they own the web, not these dominant corporate entities.


> how hard would it really be to organize a single day google search boycott across the web

- DDG and probably even Bing would go under.

- It's just a slap on the wrist for Google.

I think a better idea is to create a distributed, federated crawler and search engine that does not depend on a central point of failure.

Alternatively, we could still use Google for long-tail, technical searches and use a different one for important queries such as politics and social.
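A minimal sketch of the "no central point of failure" shape this comment proposes: fan a query out to peer nodes and merge whatever answers arrive, so a dead peer only degrades results instead of taking the system down. The peer interface (a callable returning scored URLs) is a made-up assumption for illustration, not any real protocol.

```python
from concurrent.futures import ThreadPoolExecutor

def federated_search(query, peers, timeout=2.0):
    """Scatter-gather over peer search nodes.

    `peers` is a list of callables query -> list of (url, score);
    this interface is hypothetical. Failed or slow peers are simply
    skipped, so there is no single point of failure."""
    merged = {}
    with ThreadPoolExecutor(max_workers=len(peers)) as pool:
        futures = [pool.submit(peer, query) for peer in peers]
        for fut in futures:
            try:
                for url, score in fut.result(timeout=timeout):
                    # keep the best score seen for each URL
                    merged[url] = max(score, merged.get(url, 0.0))
            except Exception:
                continue  # a dead peer degrades results, not the system
    return sorted(merged, key=merged.get, reverse=True)
```

The hard parts the sketch glosses over are exactly what the rest of the thread discusses: ranking quality and trusting peers not to return spam.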


Good luck creating a distributed search engine that produces results that come even close in quality to Google's. Search engines are hard. There is a reason Google dominates, and it's not just inertia.


Do you have actual evidence this is the case, or is it just supposition? I've been using DuckDuckGo for the last week, and whenever I was unable to find something relevant I'd try the same query in Google and also come up short.

Even if Google is better overall, if you can use another search engine and get by, you are helping loosen their grip on their monopoly and give other search engines a path to improve. If you are unable to create a competing search engine yourself, that is at least one way you can help.


I also use DDG as my main search engine. For me at least, the results are much worse than Google's, not only for obvious things where Google benefits from the additional information it has about me (localization, for example), but also for ordinary queries. Quite often statcounter-style sites rank above the actual sites I want.


Same here. I often use ⌘-l g TAB to get google results. Especially when searching for programming related things.

I do appreciate the Wikipedia pages and the weather from DDG.


Don't take that advice as discouragement to start.

Google did have to engineer a giant distributed system to provide the search results we get. There are articles about how they just had motherboards connected to hard drives lying on anti-static pads, filling rooms in buildings; racks and cabinets didn't make sense for them.

Today we take software stacks like Lucene for granted. Nothing like that existed when Google moved into the space.

So yeah, 15-20 years ago internet search was a really difficult problem. Now, the sheer volume of available research specifically addressing the problem space of trusted decentralized search suggests, to me, that a solution is imminent.

You could be the person that puts all the research blocks in the right order and launches a successful project.


> giant distributed system

A politically distributed system is a whole different beast.

People have built plenty of distributed systems where all endpoints are under the control of a single organization (or cooperating organizations). That's essentially a solved problem.

But a system which is not under a single control is a whole different approach to being distributed. Not only does A not control B, they may not intend to cooperate, and A does not necessarily have any reason to trust B or vice versa. B may even be an endpoint that is outright hostile to the network itself (ie, spam, ddos).

There have been very few successful distributed systems that follow that model. The WWW is one. That's the good news: If you manage to find a viable model of decentralization, AND it succeeds in user adoption, you have literally changed the world forever.

That's how hard it is: People who achieve it go down in history.


I see it as more of an incremental process than a single achievement. The Gnutella2 protocol by Michael Stokes is the most successfully deployed emergent, distributed search service I am aware of. While G2 lacks any trust mechanisms, there are a few projects working in the problem space of trusted decentralized stacks, or "anarchitecture", that are trying to solve that for us.[1][2][3] All the building blocks seem to exist in some state, and people are experimenting by arranging them differently. The systems-trust example you gave seems solvable once we have a durable decentralized trust model that manages identities and can verify trust while keeping privacy in mind. User key management seems like it will always be a problem.

The other thing you mentioned is user adoption. Even if a decentralized Google were available, there is no marketing machine behind it. There's no reason for people to adopt it. Patchwork is a good example of what organic growth looks like for a decentralized social app.

Developers are another facet of user adoption. Developers don't think about using these tools for new service development. They get paid to work on centralized architectures. The incentives aren't there yet.

It makes sense to me that this is the natural evolution of the way in which we build services. And yeah, it is super hard and we are all gonna be famous. ;)

[1] https://www.scuttlebutt.nz/

[2] http://ceptr.org/projects/holochain

[3] https://blockstack.org/


> While G2 lacks any trust mechanisms there are a few projects working in the problem space of trusted decentralized stacks

Search engines need to solve the spam issue.

Without central control you lack any solid ground for eradicating spam, preventing fake accounts, and defending your protocols against exploits. Building trust is not easy. The only real progress we've made in the past decade is Bitcoin's proof-of-work. BitTorrent relies on either a central tracker or the DHT (which is not very secure).
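The proof-of-work idea mentioned here boils down to making messages costly to produce but cheap to verify, which is why it works as an anti-spam/anti-Sybil cost function. A toy sketch of the concept (toy parameters and encoding, not Bitcoin's actual protocol):

```python
import hashlib

def mine(payload: bytes, difficulty: int) -> int:
    """Find a nonce such that sha256(payload || nonce) starts with
    `difficulty` zero hex digits. Expected work grows as 16**difficulty."""
    nonce = 0
    target = "0" * difficulty
    while True:
        digest = hashlib.sha256(payload + str(nonce).encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

def verify(payload: bytes, nonce: int, difficulty: int) -> bool:
    """Verification is a single hash: cheap for everyone, which is the
    asymmetry that makes proof-of-work useful against spam."""
    digest = hashlib.sha256(payload + str(nonce).encode()).hexdigest()
    return digest.startswith("0" * difficulty)
```

A spammer must pay the mining cost per message, while honest receivers only pay one hash per check.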

For the past 12 years I've dedicated my academic career to building trust without any central control or central servers. It's hard. <plug> We enhanced a BitTorrent client with a web-of-trust plus distributed torrent search, and replaced tit-for-tat with a ledger. We use an incremental PageRank-like trust model. https://github.com/Tribler/tribler/wiki#current-items-under-...
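A "PageRank-like trust model" in its simplest form runs power iteration over the directed graph of who vouches for whom. The sketch below is a generic illustration of that idea only, not Tribler's actual (incremental) algorithm:

```python
def trust_scores(graph, iterations=50, damping=0.85):
    """Power-iteration PageRank over a directed trust graph given as
    {node: [nodes it vouches for]}. Nodes endorsed by well-trusted
    nodes accumulate trust; scores sum to 1."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1 - damping) / n for v in nodes}
        for v, endorsed in graph.items():
            if not endorsed:
                # dangling node: redistribute its rank uniformly
                for u in nodes:
                    new[u] += damping * rank[v] / n
            else:
                share = damping * rank[v] / len(endorsed)
                for u in endorsed:
                    new[u] += share
        rank = new
    return rank
```

The hard decentralized part, which this sketch ignores, is computing something like this incrementally when no node can see the whole graph and peers may lie about their edges.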


> Tribler is the first client which continuously tries to improve upon the basic BitTorrent implementation by addressing some of the flaws. It implements, amongst others, remote search, streaming, channels and reputation-management. All these features are implemented in a completely distributed manner, not relying on any centralized component. Still, Tribler manages to remain fully backwards compatible with BitTorrent.

> Lengthy documentation in the form of two master thesis documents is available. First is a general documentation of the tunnel and relay mechanism, Anonymous HD video streaming, .pdf 68 pages. Second is focused on encryption part, called Anonymous Internet: Anonymizing peer-to-peer traffic using applied cryptography, .pdf 85 pages. In addition, there are the specifications for the protocols for anonymous downloading and hidden seeding on this wiki.

The link that you provided is going to take up a few weeks of my spare time. Thank you!


> a trusted decentralized search, to me, means that a solution is imminent.

Yes, nothing like Lucene existed when Google Inc. started in 1998, so the community could theoretically build a new search engine on it (or other similar open-source text-indexing technology).
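The text-indexing core that Lucene-style libraries provide can be pictured as an inverted index: a map from each term to the documents containing it. A toy version (illustrative only; real engines add ranking, stemming, phrase queries, compression, and much more):

```python
from collections import defaultdict

class InvertedIndex:
    """Minimal inverted index with AND-semantics search."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> doc ids
        self.docs = {}                    # doc id -> original text

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query):
        terms = query.lower().split()
        if not terms:
            return set()
        result = self.postings[terms[0]].copy()
        for term in terms[1:]:
            result &= self.postings[term]  # docs containing every term
        return result
```

This is the "easy" part; as the rest of this comment argues, the ranking intelligence and data-center pipeline around it are where the real moat is.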

However, this overlooks the dynamic where Google will continue to innovate new algorithms for search results while the community is trying to "catch up" to Google with yesterday's technology (Lucene). Google isn't standing still.

This is similar to the dynamic where open-source Hadoop is not as good as the newer replacement for MapReduce that Google uses internally.[1] Same situation with GIMP, which still doesn't have feature parity with Photoshop even though GIMP has been updated for 21 years.

That creates a feedback loop where users continue to use Google in 2027 because the open-source federated alternative is still "catching up" with Google 2017 and delivers worse results.

I think people underestimate what it takes to replicate Google's search intelligence. Something like Lucene only covers the text indexing. A major part of the software is the data-center operations and the coordination/processing pipeline. Maybe datacenter/cloud software like OpenStack could be leveraged to help coordinate a hundred thousand processing nodes, but that is still far short of replicating what Google does.

>Even if a decentralized Google was available there is no marketing machine behind it. There's no reason for people to adopt it.

But Google search didn't have marketing behind it either. In 1999, they didn't run ads during the Super Bowl or in Wired magazine. Its fast adoption spread by word of mouth, because everybody saw right away that it had better results than the junk returned by AltaVista, Excite, AOL, etc.

If the federated community search engine is truly better, people will use it.

[1] http://www.datacenterknowledge.com/archives/2014/06/25/googl...


All of your points are valid and a bit depressing. Google will always be ahead of the game in search. This will always be a problem for FOSS decentralized projects. How to overcome this will be a problem that is perhaps handed down to the next generation.


Sweet irony would be if you could use their own AI research to help.


> distributed, federated crawler and search engine that does not depend on a central point of failure.

here you go: http://yacy.net


> how hard would it really be to organize a single day google search boycott across the web

http://www.onedaywithoutgoogle.org/


Ironically, antitrust law prevents news publishers from working together and negotiating with Google. That would be "anti-competitive".


Maybe it's time to discuss how big a company can get while remaining healthy for democracy. I think there is a case to be made that companies this big should be broken up into smaller ones. It's a tuning mechanism to keep competition alive and prevent power from getting too concentrated. It should also handle the too-big-to-fail problem we see in other domains.


The risk of a centralized web is censorship by government.

It's that simple. Other things may be a risk also, but not because of a centralized web.


Why isn't censorship by non-government interests a problem/risk?


We already have this "decentralized" web; it's just that the giants have a habit of buying out anything that becomes remotely popular.

Think end-game capitalism. Capital amalgamation.

In fact, any modern corporation will buy as many smaller parts as it can, while regularly firing 10-20% of its people every year (Cisco, etc.).

So the problem isn't the "centralized" web.


The big guys get capital for next to nothing from the public markets so they can buy out the small guys who challenge their control. They can even run the bought businesses badly and still make a profit because the money they used to buy them was so cheap.


Plus the nasty fact that being huge and global means you can afford to implement the most effective tax evasion schemes. If a significantly lower tax rate means having to open an office with a thousand employees in country X, then that is simply what's done, because it is cheaper than paying taxes.


Due to efficiencies of scale and network effects, centralization may seem inevitable. However, we can structurally offset some of this tendency using crypto-economics.


It's not inevitable. Centralization is a side effect of the CFAA and copyright laws. The reason that people aren't able to read Facebook on a user's behalf and multiplex the feed into a competing network is because Facebook will sue you and the FBI may arrest you if you do this.

As such, users are denied true competitive freedom to say "I prefer the way that NewThing does stuff". They instead have to say "I prefer NewThing, but my social graph is held hostage by Facebook."

I won't say anything in particular about network effects because it's usually taken the wrong way, but there's no reason any of this lock-in has to exist on the digital web. We've chosen it by giving extremely strong legal protections to existing tech incumbents.


Network effects will sooner be broken by a legal, than a technical solution.


I think this is the fundamentally wrong angle, balkanisation and "decentralisation" don't actually seem to help end consumers and don't seem transparent.

The logical conclusion, when scale and centralisation are inevitable, is the good old concept of a "public utility": see Bell Labs, electricity networks, and so forth. It doesn't sound as exciting because there's no "ether" or "crypto" in it, but it has served most people quite well.


For the stuff that works, i.e. electricity as an invention.

But if you need to continue innovating, fragmented ecosystem > one entity.

Prime example being: what have 7B people invented, versus what has the UN (as their one face) invented?


I don't know what to make of that analogy, but why not stick to the historical example I gave? Bell Labs invented a good deal of things, as has the military (which I guess can be thought of as a utility of sorts) and the telecommunications sector.

For fundamental and 'deep' research the public sector is doing a pretty good job. I'm skeptical about selling every chat app and toy as "innovation".


In fifty years people will look back on this thread and note that we were still debating ideas from such radical ends of the spectrum.

I don't know how it will turn out, but I have not heard much about your communist internet. I don't mean that in a bad way; it is just a useful analogy for me to think about the internet as railroads, the post office, the DMV: something we need so much that we pay taxes to make sure it is there regardless of whether or not it is viable in a business sense.

Then I realize that Bell Labs was not a public utility. It was an exceptional research project driven by exceptional people. Putting a power grid in the same category as Bell Labs is a fallacy. The nature of a public institution stifles innovation: the ossified structure provides a safety that precludes the need for survival. Yes, we need the DMV, but for fuck's sake, why do I have to wait in line for six hours once a year? How is the post office not profitable? Why are our trains so much slower than in other countries?

What is it that you are imagining as good when you see us provisioning our computing resources from a government agency?


I wonder how much centralization is due to consumer ISP and computing limitations. What structure would communication channels take in a world of phones with terabytes of local storage, days of battery life under heavy use, and reliable unlimited high-speed internet?

Non-corporate software alternatives are getting there, but the next bottleneck is operation. A simple static personal website still involves hosting "in the cloud" or setting up personal hardware, plus ongoing operations. Mass adoption requires a one-click, zero-maintenance app.


I think computing limitations are the big one, but also software limitations, in terms of how much software a company might need to develop for itself to compete with a big player. Right now there's a lot of innovation going on. Not that there won't always be innovation, but in the grand scheme of time I'd argue web tech will one day reach a state of the art that progresses more incrementally, the way fields like physics and biology tend to, with waves of small insights rather than the few decades of rapid evolution they underwent in their infancy.

It's at that point of incremental innovation in web tech that I think the little guys become competitive, because that's when they can start smaller businesses running cheap but "good enough" hardware and premade open-source software, offering services similar to the big players' at smaller margins: comparable but less comprehensive products for cheaper. These smaller businesses are where I imagine the decentralization manifesting, especially if some of them are things like "join our regional hardware co-op and contribute your unused processing power to our service in exchange for a share of the profit." That alone would be huge for decentralization: a profitable service that could bring anybody and everybody a worthwhile amount of money simply for running its web-service software during downtime.


Is there any sort of free, open-source internet search project, similar to OpenStreetMap for open map data? Searching hasn't found any.


Also, the number of open-source software libraries for search is disappointing.

Somewhat strange, because searching is one of the pillars of CS.



yacy.net


I'm sad they don't mention urbit.org. Urbit aims to solve a lot of these problems at once (storage + identity + social + platform).


This part made me laugh out loud: "and “Arvo,” a functional network OS, which is also a database." https://bitcoinmagazine.com/articles/urbit-the-bold-pitch-to...

It seems like they've had a professional story teller masticate their ideas into good descriptions.

But really, if you had to describe it in 5 buzzwords or less would it be a "federated crypto p2p lambda server"?


Curtis loves to write about political philosophy, and it's clear some of that has leaked into the Urbit project, which isn't bad in isolation. It does seem to deter the average person who is simply interested in what the project is all about, since you have to cut through the obscure language.


Started reading last night. I'm no technical expert, but I came of age after the widespread adoption of the Internet. Many people, my age and older, couldn't care less about the philosophy of decentralized platforms.

However, could you combine the usability of centralized platforms with a decentralized platform's focus on privacy, lack of ads, etc.?

(of course how to pay for it...)


Regarding payment, I think you must take payment from your users (my recommendation would be donations, like Wikipedia).

This is because an organization's primary objective is self-preservation, and that means aligning priorities with income sources (or when you're designing the organization, aligning your business model with your priorities).

As we've seen with journalism, journalism's reliance on advertising leads to journalism designed primarily to attract viewers for advertisers rather than to serve the public interest (although good journalism does still happen because some journalists care about that despite their organization's priorities).


Wikipedia is a great example of a free decentralized product.

I suppose a way without having to ask for donations is one I would be interested in, though I suspect there are "whales" who make up the bulk.

I think the issue with Wikipedia, though it has been able to scale to reach the world, is that it seems to lack the power to change how information is distributed and consumed, in the way Google and Facebook do, outside its digital border.


To increase competition via decentralization of government control of the internet.


Removal of government control is a double-edged sword: government can be a huge barrier to innovation, but it can also foster innovation by blocking monopolistic companies from being anticompetitive.


I feel like these monopolistic companies will have a lot more sway in any attempts at legislation than the disorganized, disinterested masses.



