Neocities is implementing IPFS – distributed, permanent web (ipfs.io)
632 points by stesch on Sept 8, 2015 | 235 comments

Reading this felt kind of like the first time I read a writeup on Bitcoin. There's the same sense of throwing out some old, formerly immutable rules here, the excitement of something that's going to test some boundaries and inevitably clash with some authority (how can you, for instance, comply with the EU's "right to be forgotten" when that information is scattered into a redundant mesh?). Interesting times ahead for IPFS.

Thanks for the kind words. :) Come join us! https://github.com/ipfs/

You guys are great! :-)

Interesting project addressing linkrot and central servers. Hopefully this line of work will attract more attention and developers.

Sadly the design contains some flaws that need addressing. From their whitepaper (http://static.benet.ai/t/ipfs.pdf): the architecture essentially creates one giant BitTorrent swarm where each piece may be seeded by different people (see section 3.4.4 of the whitepaper). This approach suffers from the "double coincidence of wants" problem. There will be a billion items if this takes off. Matches are rare and could take hours or days.

"Aside from BitSwap, which is a novel protocol". These types of protocols have been worked on for over 8 years. Nobody has ever been able to protect them from an Eclipse or Sybil attack. Early work: http://www.seas.upenn.edu/~cse400/CSE400_2005_2006/Wang/Writ... Deployed system from my research group, improved upon for 7 years now: http://www.pds.twi.tudelft.nl/~pouwelse/A_network_science_pe...

I haven't thought about IPFS in months. One thing I do remember thinking about months ago is aggregators and distributed indexers.

I don't know enough about those kinds of security attacks to comment.

My comment "you guys are great!" refers to their team as people not necessarily the project itself. I've worked with them before, and they are awesome folks.

Agree. What we really need is for browser vendors to get together, bring in the tremendous financial and technical resources they have, and get a decentralized protocol formalized and supported across browsers. It may take years, but it needs to be done.

That would change the internet, and the world.

Two pieces of (coming soon) good news:

- very soon, you won't need to install anything to use IPFS. it will "just work" with js on today's browsers.

- for best perf, yes, we need browser implementations. and... those have begun :)

Are those implementations extensions or do they require native code changes?

OK, as apparently .js is the de facto thing to build cool sites, I for one will be left out here.

On sites not known to me, I typically block most (read all) .js at first, just to make sure that these sites do not leak too much information about me to external tracking tools, advertisers or the like.

So using .js seems to me a bit problematic with regard to people not enabling it by default (but this is an absolute minority, if I look at the stats of the sites I analyze).

I use a custom hosts file (well, technically I run it as an internal DNS server) from Someone Who Cares to block most of the aforementioned annoyances: http://someonewhocares.org/hosts/

It works very well for me so I'd recommend it to others, if they're not already aware of it.

I've always wondered if Windows or other operating systems read the entire hosts file every time they want to resolve an address. Maybe a big hosts file is bad for network performance?

It will largely be read out of memory, since it doesn't change much and is frequently requested. Scanning 64K or so won't break a sweat compared to doing a DNS lookup across the network to a resolver that might itself need to make a request.
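To put the "won't break a sweat" point in concrete terms, here is a minimal Python sketch that parses hosts-format text held in memory and answers lookups from it. It is only an illustration: `parse_hosts`, the sample entries, and the "first entry wins" rule are assumptions made for the example, not a description of how any particular OS resolver works.

```python
def parse_hosts(text):
    """Build a hostname -> address map from hosts-file text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Assumption for this sketch: the first entry for a name wins.
            table.setdefault(name.lower(), address)
    return table


# Hypothetical blocklist-style entries, in the usual hosts-file layout.
SAMPLE = """
# local entries
127.0.0.1  localhost
0.0.0.0    ads.example.com tracker.example.net  # blocked
"""

hosts = parse_hosts(SAMPLE)
print(hosts.get("ads.example.com"))      # address the name resolves to locally
print(hosts.get("unknown.example.org"))  # None: would fall through to DNS
```

Even a file with tens of thousands of such lines is a single pass over text that is already cached in memory, with no network round trip involved.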

Well, I think you can install something to use it without js, and they were just saying js would be for if someone wished to use it by simply using a webpage, with no install?

> OK, as apparently .js is the de facto thing to build cool sites, I for one will be left out here.

I doubt they're building it in JS for giggles.

In order to make changes like this stick my guess would be that you need to reach a tipping point before interest fades, and to do that you need to take away the browser vendors' ability to say 'no' or 'maybe later'.

As you point out, people who don't run JS by default represent a tiny percentage of traffic, so would have to be a secondary concern to be supported later.

The first implementation of IPFS is written in Go, and it includes fully functional nodes, http gateways, and libraries, but they're also working on a node.js version (that's also compatible with browser environments) so that non-technical users can have ipfs sites "just work". (although it requires at least static content served from a "normal" http/html server)

One solution for people like you (and me) who care about execution of too much js could be to make the code free software. You then can allow all free js in your browser by default. See [0] for details.

[0] https://www.fsf.org/campaigns/freejs

Somebody with golang-fu, Firefox addon experience and an itch might also want to take a look at http://id-rsa.pub/post/go15-calling-go-shared-libs-from-fire...


Do you have links to browser implementations?

As far as I understand it, you must issue a "pin" command in order to relay or "host" content. So if some content is illegal, authorities can order you to "unpin" it. It's a better option than having to take down your whole node.

Right to be forgotten however applies only to search engines right now, doesn't it? The articles the results used to point to still exist. So a redundant mesh itself doesn't seem to apply here.

The "right to be forgotten" applies to any information service, not just search engines. However, search engines are often easier legal targets since they do business in the EU and thus can be made subject to EU law. If I have no business in the EU, and post my website on a non-EU hosting provider, EU law can demand I take down a page but has no jurisdiction to force me or force my provider. But e.g. Google, since they do business in many EU member states (primarily sell advertising), if they disobey EU law, EU governments/courts can take legal action against them (e.g. confiscate their EU advertising revenue).

As far as I'm aware that's simply not true. See http://ec.europa.eu/justice/data-protection/files/factsheets...

Furthermore the right to be forgotten only mandates hiding results when someone searches specifically for a name, for example. The results are still allowed to come up for unrelated search terms.

Directive 95/46/EC (the Data Protection Directive) never specifically mentions "search engines", it talks about "controllers" of personal data. The ECJ ruled (Case C‑131/12) that search engines such as Google count as controllers of personal data, and thus are subject to the "right to be forgotten" under the Directive. But clearly the Directive is applicable to many other things than search engines, so the "right to be forgotten" cannot be restricted to search engines either. (Newspapers should be exempt because article 9 of the directive has an exemption for "the processing of personal data carried out solely for journalistic purposes or the purpose of artistic or literary expression", but search engines don't fall into that category, and other web sites may or may not either.)

The court judgement - http://curia.europa.eu/juris/document/document_print.jsf?doc... - paragraph 26 says "As regards in particular the internet, the Court has already had occasion to state that the operation of loading personal data on an internet page must be considered to be such ‘processing’ within the meaning of Article 2(b) of Directive 95/46" - so by the Court's reasoning the Directive potentially applies to any web page, not just search engines.

Indeed, the Data protection directive is not about search engines: it is about privacy online in general and any web page published or plausibly controlled from within the European union is subject to it (and of course also to any IPFS page, although who controls it is harder to say). It has always applied to a lot more than search engines.

The "right to be forgotten" is a phrase used in the argument for de-listing, and later became the name for the ruling itself. This was however not a right that the court granted, but rather the court points out that the data protection directive stipulates that personal data shall be: """ (c) adequate, relevant and not excessive in relation to the purposes for which they are collected and/or further processed;

(d) accurate and, where necessary, kept up to date; every reasonable step must be taken to ensure that data which are inaccurate or incomplete, having regard to the purposes for which they were collected or for which they are further processed, are erased or rectified; """ and judges (in paragraph 93) that in particular, data that was once in compliance might not be at a later time.

It also states that the failure of the particular data in this case to be those things means the person described by them can invoke the right to object to the processing, in this case having personal data listed on a page as a response to a "search made on the basis of his name" (in paragraph 94).

It is not completely clear, but I think the query being judged upon includes only the name of the person, and adding a keyword relating to the particular year, or bankruptcy actions, would be enough to make the data "adequate, relevant and not excessive".

More relevant is that it seems safe to include the link even in name searches that also include additional keywords, since Google lost on account of judging a very old news item about him still relevant for listing in a general name query, and the court said: not so.

Sorry, hadn't read your comment, before stating the same. Hat tip to you.

>for instance, comply with the EU's "right to be forgotten"

Easy: The Net interprets censorship as damage and routes around it.

Oh, right, we're not calling this censorship yet. Funny how quick we are to put up mirrors to Tibet and Tank Man content but somehow we're on the hook to protect London businesspeople and corporations from criticism? Like Bitcoin or Bittorrent and other decentralized applications, it will not be able to follow the dictates of nation states and it will be on nation states themselves to filter appropriately. There's no one to call to remove "unwanted" content. No one to hit with fines, taxes, lawsuits, and fees.

Also, there's a discussion to have in regards to EU shakedowns of US companies. These billions of dollars in fees that go straight to the EU don't seem like consumer protection to me, but fundraising. Billions of dollars in fees for "bundling media players" or "having an Android monopoly" should give any wannabe entrepreneurs pause. It's a shame that the EU is unquestionable on sites like HN and reddit. There's a lot of questionable morality here and obvious signs of corruption. So yeah, not being able to be dictated by the EU or any nation state is a feature, not a bug.

Isn't the right to be forgotten something that relates especially to search engines? They (esp. Google) have to delete the links to this content, but the content can stay on the web forever.

So this isn't per se a problem. But what happens when I act as a "mirror" in this net and have mirrored some questionable content (whoever gets to define "questionable" is anybody's guess)? Will I get a shakedown from the authorities, as has often happened to people running Tor exit nodes?

I fear that the clash would come sooner rather than later, as a lot of people have a lot of incentives for a centralized web (money, politics, control, power).

I don't believe that law should be shaped by technology in principle. It should be shaped by what the majority think is right for society.

But the right to be forgotten is a flawed concept to me, and I think any tech that makes its implementation more problematic will do us a favour.

> It should be shaped by what the majority think is right for society.

I would kindly suggest that we consider examples from history when laws backed by the majority have resulted in horrifying, depraved violence and deprivation; then contemplate whether it is indeed moral to shape law through majority opinion.

But the problem is that it is. It's an EU law, so while I agree with you, I sense that IPFS will clash with this law. Let's wait and see, progress can't be stopped though...

Just to point out -- hypertext systems prior to the web almost universally expected permanent stable addresses (and this was, for instance, required by the various Xanadu specs). Enquire (Tim Berners-Lee's immediate predecessor to the Web) had permanent stable addresses, too.

The web didn't have permanent stable addresses because the web was a hack, intended as a way to explain Enquire to people who didn't have a background in real hypertext systems. (Let the entire existence of the Web, with its myriad flaws, be a lesson to you: your crappiest demo may well become the thing you're known for twenty-five years later, so make sure your crappiest demo doesn't suck as bad as the Web does.)

> (Let the entire existence of the Web, with its myriad flaws, be a lesson to you: your crappiest demo may well become the thing you're known for twenty-five years later, so make sure your crappiest demo doesn't suck as bad as the Web does.)

The web is amazing, and you are using it right now. It has flaws, but so does anything that achieves success at the level the Web has. How many technologies have a direct impact on billions of people's lives?

There are subtle reasons why "crappy" technology often wins. One is that worse is often better (http://www.jwz.org/doc/worse-is-better.html) and the second is that people are more prone to complain about popular technologies, which create the illusion that their alternatives would obviously be so much better.

EDIT: Note that the web is not just a technological phenomenon, it's a social one. If you care about the real world, it doesn't make sense to look at it as a technological innovation in isolation. Anyone can dream up The Perfect Protocol in a research paper, it's an entirely different beast to actually do it and make it work for billions of people.

A big (biggest?) reason is that popularity is largely independent of quality. Things get popular for lots of reasons heavily affected by the shape of social networks and early choices by a small number of people involved—often based on what they have and have not heard of more than an analysis of what is better or worse. Quality plays a small role at best.

This Nautil.us article[1] does a good job of summarizing the relevant research on the issue and references some supporting studies.

> My own research has shown that fame has much less to do with intrinsic quality than we believe it does, and much more to do with the characteristics of the people among whom fame spreads.

Put another way: the web has a direct impact on billions of people less because it's any good and more because popularity is inherently a power law and it happened to come out ahead. This doesn't invalidate any criticisms of quality!

It's perfectly possible if not likely for something popular to be crappy because most things are crappy and that's not enough to stop them from getting popular. Then they require tons of engineering effort and hacks to scale and improve again not because of quality but because of legacy. The web is what's there already, after all!

That's how things are, but that doesn't mean it's inevitable or a good state of affairs.

I really wish people wouldn't conflate popularity, quality and success. The first two operate on different axes and the last, if it doesn't, should.

[1]: http://nautil.us/issue/5/fame/homo-narrativus-and-the-troubl...

> Quality plays a small role at best.

This is simply not true in the long-term. Good technologies survive because they are useful.

Of course quality is not enough to ensure success, but it sure as hell matters for long-term survival. Homer, Shakespeare and the wheel survived because they are very good / useful. Are there lost Homers? Of course there are. Will the Web survive 100 years? I have no idea, and neither do you. A decade? Most likely.

Be careful with conflating what is with your idea of what something should be. If you think something sucks and work on making it better, that's great! Keep doing it, and maybe someday we will have something that's even better. But don't diss the popular just because it isn't ticking all of the "it-should-be-this" boxes.

Quality is definitely a small factor in success, when we're talking about network effects. When we add in capital, with all its perverse incentives, quality matters even less.

Technologies become popular primarily based on how easy or cost-effective they are to replicate. The fewer concepts you introduce, the lower the bar is for people to reproduce your intellectual inventions; the cheaper the materials and manufacturing process, the easier it is for knock-offs to be manufactured. Popularity means a plurality of knock-offs. Quality often requires subtlety and attention to detail -- two things that add effort to replication, and thus work against popularity.

The web became popular not because it was better than the alternatives but because it was stupider than the alternatives -- it had fewer features, fewer ideas, and fewer nuances. The web is no longer less featureful or nuanced than its competitors, because unplanned organic development grows complexity like a rat with a radium drip grows tumors, so as a result the very simplicity that made the web grow so quickly in the first place would prevent it from growing again, had it not become the focal point of an enormous amount of lock-in and path-dependence.

The web is a huge bureaucratic mess right now, just waiting to be disrupted by a superior hypertext system.

How does one take advantage of this situation?

There will be technologies which:

(1) despite better quality, won't gain enough traction (OS/2 v Windows),

(2) thrive just fine, while their crappy cousin continues beating them in popularity year after year (PostgreSQL v MySQL)

(3) despite being better on paper, fall short in actual use (Android v iOS)

(4) ___________________________

Without fame, how does one evaluate technologies; it's impossible to try them all. How does one proceed?

HTML/HTTP was an improvement over earlier concepts of bidirectional, centralized, reliable hypertext precisely because the links were one directional, hierarchy-free and unstable, and could be put anywhere. The servers belonging to distinct entities weren't coupled. The pages belonging to one entity weren't coupled. It ripped out a lot of politics and baked-in design decisions and let loose creative anarchy.

Great point. A good example of theory vs practice.

I still think Xanadu et al are interesting, but it's instructive to try to understand what the real trade-offs are, the most obvious one being one of shipping speed.

I think a big reason for worse-is-better is that simple "crappy" solutions very often have better evolvability characteristics:


"Elegant and beautiful" systems that are polished and perfect are often also brittle and monolithic and complex. Complex means hard to interoperate with, understand, and re-implement, and brittle means they can't change with the times.

Complexity also favors single vendors due to the difficulty of interoperability and understanding. Xanadu would likely have ended up as Xanadu, Inc. with no third party implementations.

I've been using the Web since late '92 and evaluated and selected the Web for a project in '93 where I looked at other hypertext tools.

If the Web was a hack then the world could be doing with more hacks.

So what do you call the demo that was delayed a couple of weeks to perfect before releasing? I don't know either.

Or, conversely, make sure your crappiest demo is as robust and compelling as the web.

This turned out to be not a bug, but a feature: removing the requirement for permanent, stable addresses made a large-scale hypertext system something that could actually be implemented. (Unlike Xanadu.)

Here we are, more than twenty years later, and a hypertext system with permanent, stable addresses is still so rare in the real world that creating one will get you on the front page of Hacker News. The "crappy" Web, meanwhile, is the foundation of modern commerce, culture and communications.

If that's crappy, I hope to God I can make something crappy someday.

The thing is, Xanadu was implemented -- twice, at Autodesk, prior to 1990. (It wasn't released for political reasons until 1999.) Please check your facts before saying something clearly untrue like "Xanadu couldn't be implemented".

Even if it wasn't -- being a first mover by sacrificing quality can get you greater adoption over superior competitors. Compare the Mac, released in 1984, with the Amiga 1000, released in 1985. By being delayed by a year, the Amiga was able to have all the features that the Mac team wanted but were told that they couldn't have -- color graphics, multitasking, and an actually usable amount of RAM, at about half the price of the Mac. The Mac, on the other hand, was the follow-up to the Lisa, which was technically superior feature-wise but too slow to be usable. Both the Amiga and the Lisa lie forgotten, and people come to believe that the Mac was the first consumer GUI machine or the best consumer GUI machine of its time -- or that its limitations were unavoidable. In reality, a combination of luck and good PR warped our sense of history.

The reality is that the web was not the first hypertext system, or the best, and the features that it lacked were implemented and functional in its competition. The web gained a following through a combination of being free (at a time when most hypertext systems, including Xanadu, were proprietary and commercial) and stupid enough that it could be reimplemented poorly by beginner programmers -- which, of course, was its aim.

The Web was designed as a model to teach non-programmers the concepts behind Enquire, Tim Berners-Lee's "real" hypertext system. The fact that the web escaped and Enquire didn't is a tragedy.

Lots of crappy things become the basis of culture, commerce, and communications. Quality doesn't particularly matter when it comes to being the basis of things, and nuance and complexity work against you if you want wide adoption. But, when something crappy becomes so widespread, its flaws become far more dangerous.

Lack of permanent stable addressing made possible the DMCA safe harbor provision (and thus systems like ContentID that delete your content at random on the off-chance that you were quoting somebody famous). Lack of bidirectional links caused the monetization system for the web to be based on advertising. Embedded markup created the need for complex javascript kludges to navigate deeply hierarchical DOM structures in order to perform simple transformation.

The web is a long series of poorly thought out decisions, all of which in the end led to both the grey hairs, premature heart attacks, and hefty wallets of people on this site. We're stuck with it now, because we've created a broken system and added structure on top of it to keep it from being fixed. And we, as an industry, profit from its brokenness. But, it would be intellectually dishonest to pretend that it isn't broken -- or that the ways in which it was broken were unavoidable. They were avoidable, and they were avoided; it's sheer bad luck that the web was adopted over its superior brethren.

Stable links were also part of the earliest ideas for the web; they just failed early. The URL was only intended as the 'locator', the URN was the stable link, and some way had to be found for URNs to be mapped to URLs.

I absolutely love the concept and the underlying moral reasoning for why we need IPFS!

One of the first questions that came to me: Is there any way to add a forward error correcting code, like an erasure code to IPFS? I didn't find any discussion of this in the IPFS paper.

This seems somewhat critical to be able to compete with modern centralized storage systems which are likely to use FEC extensively to provide the level of redundancy that customers expect. Modern FEC codes can provide phenomenal levels of redundancy with less than 50% overhead. IPFS seems to rely on many fully-redundant copies?
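For readers who haven't met erasure codes, the simplest possible instance is a single XOR parity block. The Python sketch below is only meant to show the principle of rebuilding a lost block from the survivors. It is not something IPFS provides, and it tolerates just one missing block; real systems use stronger codes such as Reed-Solomon. The function names and the four-block example are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stored, missing_index):
    """Rebuild the block at missing_index from all the other blocks."""
    survivors = [b for i, b in enumerate(stored) if i != missing_index]
    return xor_blocks(survivors)


data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
stored = encode(data)         # 5 blocks stored for 4 blocks of data (25% overhead)
rebuilt = recover(stored, 2)  # pretend block 2 was lost
print(rebuilt == data[2])
```

Compare that with keeping a second full copy, which costs 100% overhead to survive the same single loss.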

Interesting idea! I think Tahoe-LAFS also uses something like this?

From a practical standpoint, I'd like my local node to have a full copy so that my friends coming over don't need to reach out to the (maybe offline) internet but I don't see why we shouldn't support encoding the chunked blocks in such a way?

Based on my reading of this article, IPFS doesn't specify how you store the content - just that the content you provide back matches the requested hash. So in theory this could easily be implemented under the hood.

Though perhaps I'm misunderstanding your question: do you want multiple IPFS nodes to coordinate on storing parts of an erasure-encoded file? That would be useful too. Again I think it's possible: you could build a small distributed system to host files this way. The most interesting variant though is if IPFS could have support for something like this natively, so nodes run by different users can each pitch in a little. Conceptually it's possible, but probably the protocol doesn't support it.

Reminds me how Usenet binaries are often posted along with par2 files.

Reed-Solomon ECC are just amazing!

This sounded like many of the bitcoin projects, especially Filecoin, and upon reading further I came to know it is by the same guys.

Similar projects have been in development for the past few years, such as https://github.com/feross/webtorrent and zeronet

This has the same problems as the bitcoin infrastructure though:

1. It is unscalable. A page built on IPFS receiving huge inflow of comments would generate many diffs quickly and as soon as they spread in the network they get outdated, thereby clogging space on people's disks but more importantly clogging the bandwidth where nodes compete to download the quickly changing diffs.

2. This is not completely decentralised because it uses bittorrent trackers to identify the nodes in the network. Taking down the trackers would take down the system as well.

Webtorrent is an already working, fast alternative to IPFS but still centralised. Think of it this way, can you do a pure peer-to-peer webrtc site without needing an ICE/TURN/STUN server? Peer discovery is the centralised part of the problem.

As for your first point, you should really first learn more about how IPFS handles content addressing. If you look at section 3.5 of the paper[1], you see that an IPFS object consists of a free-form data field, and an arbitrary number of links to other IPFS objects. This is a very flexible format that allows for caching.

You talk about a page with comments. You could (this is an example) create three 'models': a Page, a PageBody and a Comment (which 'inherit' from IPFSObject). A Page would contain links to one PageBody and multiple Comment objects. Every time a comment is posted, a link is added to the Page object (so its hash changes), but the PageBody (where the majority of the page size presumably is) doesn't have to change. All peers can keep the same PageBody cached, while the latest Page contains all the desired comments. This caching mechanism will also work for unchanged comments.

Every time the Page changes, its hash would be published under some IPNS name, so peers will always retrieve the latest version.

[1] https://github.com/ipfs/papers/raw/master/ipfs-cap2pfs/ipfs-...
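The Page / PageBody / Comment example can be sketched in a few lines of Python. This is a toy model, not IPFS code: SHA-256 hex digests and JSON serialization are assumptions standing in for whatever hashing and encoding the real implementation uses, and `store` and `put` are hypothetical names.

```python
import hashlib
import json

store = {}  # hash -> serialized object; stands in for the network


def put(data, links=()):
    """Store an object made of a data field plus links, and return its hash."""
    blob = json.dumps({"data": data, "links": list(links)}, sort_keys=True).encode()
    key = hashlib.sha256(blob).hexdigest()
    store[key] = blob
    return key


body = put("A long article body ...")         # PageBody
c1 = put("First comment")                     # Comment
page_v1 = put("page", links=[body, c1])       # Page linking to both

c2 = put("Second comment")                    # a new comment arrives
page_v2 = put("page", links=[body, c1, c2])   # only the small Page object changes

print(page_v1 != page_v2)                        # the Page hash changed
print(json.loads(store[page_v2])["links"][0] == body)  # same PageBody is reused
```

A peer that already cached `body` and `c1` only has to fetch the new Page object and `c2` to show the updated page.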

I gave a poor example. What I meant was ever-changing data in a non-centrally managed context. Some nodes hold hashes for the previous version, and then they'd compete for resources.

The thing from bitcoin is the 51% attack, where a majority of nodes collude to provide the wrong data (they might not be malicious but just serving an old version of the data). This can only be prevented by time-stamping in slow scenarios.

In case of neocities, a central server is giving you a hash that the nodes would provide you the data for, but without a central management how does IPFS associate correct file to the correct hash?

If it only works for a centralised scenario then it is similar to the http://peercdn.com project where instead of a server supplying content, your previous visitor browser cache is supplying the content through p2p webrtc and you save server bandwidth.

I don't really see how your 'competing for resources' scenario would work. New peers that try to fetch the new content at some IPNS name would see a new IPFS object hash, and search the DHT for that new hash. Nodes that have an old version won't be involved (unless of course the object 'hierarchy' contains cached data, which will only speed things up). At first the only peer having the new content will probably be the creator, but this will quickly spread across the network.

As long as peers do not decide to keep the data (by pinning the hash), the old content will simply be deleted and replaced with new content. Pinning is extremely cheap, so you can imagine that you would want to auto-pin pages as you visit them. This is not on by default though, and I think that's a good thing.
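A rough way to picture pinning versus ordinary caching is the Python sketch below. It is a deliberately simplified model: the `BlockCache` class and its method names are invented for the example, and it ignores details of the real implementation such as following links from a pinned object.

```python
class BlockCache:
    """Toy local block store where only pinned blocks survive cleanup."""

    def __init__(self):
        self.blocks = {}     # hash -> data
        self.pinned = set()  # hashes the user asked to keep

    def add(self, key, data):
        self.blocks[key] = data

    def pin(self, key):
        self.pinned.add(key)

    def unpin(self, key):
        self.pinned.discard(key)

    def collect_garbage(self):
        """Drop every block that is not pinned and return the dropped hashes."""
        removed = [key for key in self.blocks if key not in self.pinned]
        for key in removed:
            del self.blocks[key]
        return removed


cache = BlockCache()
cache.add("hash-of-old-page", b"old content")
cache.add("hash-of-new-page", b"new content")
cache.pin("hash-of-new-page")      # the user chose to keep hosting this one
removed = cache.collect_garbage()  # the unpinned old version is dropped
print(removed)
```

Pinning here is just adding a hash to a set, which is why it can be described as cheap; the cost is the disk space of whatever stays pinned.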

> [B]ut without a central management how does IPFS associate correct file to the correct hash?

It does so using cryptographic identities. In IPFS every peer has a private/public keypair, and the PeerId is the hash of its public key. When publishing content under an IPNS name (which always has the PeerId at its root), you must prove that you 'own' the PeerID by signing the object with your private key. Other nodes will not 'mirror' your IPNS object unless the signature matches (which makes a nice consensus).

The paper calls this self-certified names: you can distribute the objects, and let them be provided by third parties without losing authenticity. See section 3.7.
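The "name is the hash of the key" half of that can be shown with the standard library alone. The Python sketch below only checks that a public key hashes to the PeerId embedded in a name; it does not verify any signature, which would need a real public-key library. SHA-256 hex output, the placeholder key bytes, and the helper names are all assumptions for illustration.

```python
import hashlib


def peer_id(public_key: bytes) -> str:
    """Derive a PeerId as the hash of the public key (SHA-256 assumed here)."""
    return hashlib.sha256(public_key).hexdigest()


def name_matches_key(ipns_name: str, public_key: bytes) -> bool:
    """Check that an '/ipns/<PeerId>/...' name is rooted at this key's PeerId."""
    parts = ipns_name.split("/")
    return len(parts) >= 3 and parts[1] == "ipns" and parts[2] == peer_id(public_key)


# Placeholder bytes standing in for real public keys.
alice_key = b"alice-public-key-bytes"
mallory_key = b"mallory-public-key-bytes"

name = "/ipns/" + peer_id(alice_key) + "/blog"
print(name_matches_key(name, alice_key))    # True: key hashes to the name's root
print(name_matches_key(name, mallory_key))  # False: wrong key for this name
```

Because the name commits to the key, anyone can serve the record, and a reader can still tell whether the key (and, with a signature check on top, the record) belongs to that name.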

>In case of neocities, a central server is giving you a hash that the nodes would provide you the data for, but without a central management how does IPFS associate correct file to the correct hash?

Mathematics[1]. Your hash is based on the content of the file, therefore you can guarantee that the content is the same as what the original author published. Also, IPFS is not a blockchain, there is no 51% attack.
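As a minimal Python sketch of that guarantee: the requester recomputes the hash of whatever it receives and rejects anything that doesn't match the address it asked for. SHA-256 hex digests and the "peers are plain functions" model are assumptions made for the example; `address_of` and `fetch_verified` are hypothetical names.

```python
import hashlib


def address_of(content: bytes) -> str:
    """Content address: the hash of the bytes themselves (SHA-256 assumed)."""
    return hashlib.sha256(content).hexdigest()


def fetch_verified(address, untrusted_peers):
    """Return the first response whose hash matches the requested address."""
    for peer in untrusted_peers:
        content = peer(address)
        if content is not None and address_of(content) == address:
            return content
    raise LookupError("no peer returned matching content")


original = b"<html>hello neocities</html>"
addr = address_of(original)


def stale_peer(address):
    # Serves an old version no matter what was requested.
    return b"<html>old version</html>"


def honest_peer(address):
    return original if address == addr else None


result = fetch_verified(addr, [stale_peer, honest_peer])
print(result == original)
```

The stale peer's answer is simply discarded, so it doesn't matter how many peers serve old or wrong data as long as one reachable peer has the right bytes.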


I am not a native English speaker, so perhaps I was misunderstood again. You searched for content by an author, and the network gives you multiple hashes. Which one do you trust, without a central authority telling you?

I think the fundamental misunderstanding here is that you can't search by author. You can fetch by IPFS hash or IPNS pubkey (which is tied to a hash). You can also use existing name services (such as DNS or namecoin) to tie human-meaningful names to a pubkey or hash. (but probably a pubkey)

As with my comment above: you trust the content that is ultimately (up the hierarchy) signed in an IPNS object with a valid signature.

And there must indeed be some central authority giving 'ultimate trust'. This is Zooko's triangle. By default IPNS gives you 'decentralized' and 'secure', but you can also opt to tie an IPNS name to a DNS TXT record, and lose 'decentralized' (at least for the initial lookup) but gain 'human-meaningful'.

I think there is a level where DNS -> IPNS ID is even then still better than DNS -> IP address. For DNS -> IPNS you could have a facility in there saying 'This DNS TXT record should not change, if it does something is wrong'. You could lock down the record for a period of time, such as the duration of the lease from the registrar.

Besides, the idea is that IPFS could happily support Namecoin[1], so there's the decentralized, secure, and human-meaningful DNS service. The only 'non-secure' aspect there is the unlikely event of a 51% attack.

[1] https://namecoin.info

A 51% attack will let you double spend and roll back transactions. If one wanted to steal an established name on Namecoin, they'd need to either roll back to before its registration, roll back to before its previous renewal and force an expiration, or mine past its next renewal to force expiration. The renewal limit is 36,000 blocks. For a name that consistently renews early every 18,000 blocks, a 51% attack would need to go on for months to steal it. Still possible, but very difficult.
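A rough back-of-the-envelope check of "months", as a Python sketch. The 36,000 and 18,000 block figures are from the comment above; the 10-minute target block interval is an assumption added here, so treat the day counts as approximate.

```python
MINUTES_PER_BLOCK = 10        # assumed target block interval
EXPIRY_BLOCKS = 36_000        # renewal limit quoted above
RENEW_EVERY_BLOCKS = 18_000   # how often the owner renews in the example


def blocks_to_days(blocks: int) -> float:
    """Convert a block count to days at the assumed block interval."""
    return blocks * MINUTES_PER_BLOCK / (60 * 24)


# Right after a renewal the name is safe for the full expiry window.
full_window_days = blocks_to_days(EXPIRY_BLOCKS)

# The weakest moment is just before the next planned renewal: the last renewal
# is 18,000 blocks old, so 18,000 blocks remain before the name expires.
min_blocks_to_expiry = EXPIRY_BLOCKS - RENEW_EVERY_BLOCKS
shortest_attack_days = blocks_to_days(min_blocks_to_expiry)
```

Under those assumptions an attacker would have to keep censoring renewals for roughly four months at the very least.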

Namecoin is merge-mined with Bitcoin by many miners, so if you wanted to 51% it you'd need a pretty large chunk of Bitcoin's hash power. An evil pool could do this, however it would be visible on the Bitcoin blockchain that they were doing it (although not what transactions are present/not present in the attack chain).

> uses bittorrent trackers to identify the nodes in the network

It's using a distributed hash table, not trackers. Nothing's completely decentralized; distributed hash tables are pretty good, since you can join them by talking to any participating node at all.

That is true for the data lookup part, where DHTs are in a way replacing DNS, but the peers still need to discover each other's IP addresses in some way.

I don't know what would be the term for it since it is not completely centralised, as in the nodes collectively form the system without any central coordination, but a new node joining in still needs some information to connect to the network from some source which could be denied-of-service by say a government, so not completely decentralised either.

It would be quite difficult to blacklist every bootstrap node. The thing is, every single IPFS node represents a DHT bootstrap node. In a world where everyone is using IPFS you could just turn to the stranger next to you, ask for his IP address, and bootstrap off him. You only need one single node to prise your way into the DHT network, one is enough to build out the rest of the network from.

The BitTorrent DHT's never been successfully attacked or blocked as far as I know. It doesn't seem like a big concern. There are millions of nodes and you only need to find out the IP:port of one of them.

I concur.

On principle, they could let the centralized server propagate the information that the file was updated. They could convert IPFS to a mere cache sharing system between browsers that are close by.

1. Don't use it for what's basically messaging

2. I thought torrents can work decentralized e.g. using magnet links and DHTs.

You can still use IPFS for direct messaging though, see http://gateway.ipfs.io/ipfs/QmTkzDwWqPbnAh5YiV5VwcTLnGdwSNsN...

The difference being you're addressing to IPFS node IDs, rather than IP addresses. The advantage here is that your IP address could change (moving datacenters etc.), but your node ID never needs to change. Technically you could just use IPFS for routing between clients and your standard PHP server, which would still be an improvement over the current approach and would guarantee a DNS whoopsie couldn't redirect your site to China, but you'll get better results (less bandwidth required, guaranteed uptime) by decentralizing as much as you can.

Even DHTs require a server to bootstrap the network, due to a node's unicast nature on the internet.

That's why there's the magnet links, isn't it?

I wish this article mentioned something about BitSwap; before reading the IPFS paper, I was doubtful about anyone wanting to participate without incentives.

Still, even after skimming the paper, it sounds like a lot really depends on how well BitSwap is implemented. At one point, the paper says this:

> While the notion of a barter system implies a virtual currency could be created, this would require a global ledger to track ownership and transfer of the currency. This can be implemented as a BitSwap Strategy, and will be explored in a future paper.

So it kind of sounds like they need something like bitcoin, but don't want it to be tightly coupled to bitcoin - which is probably smart. But without a single clear solution for a reliable BitSwap strategy (and I'm not sure how it could be reliable without a distributed global ledger of some kind), it's hard to see how/why the necessary resources will be contributed to the IPFS network.

Check out another protocol we (the IPFS creators) are working on: http://filecoin.io

Interesting - I'll have to check it out. Thanks for pointing me to this.

Really cool stuff by the way - I'm very excited to see people working on this, and I hope IPFS (or something like it) is successful.

Awesome writeup!

Sure, it's easier to set up your own web presence with "the cloud" today than it used to be, but this only further centralizes control of content to the big cloud providers.

Not to mention the cost of bandwidth when serving content via HTTP. Sure, you can distribute your content via BitTorrent, but what kind of user experience is that? Can my grandma use it? Probably not.

I hope IPFS sees further adoption.

Thanks! Help us make it :) -- https://github.com/ipfs/

and, you'll love what we have coming. Soon, you won't need to install ipfs at all :D

Are you talking about implementing IPFS at the browser level? Is there public code yet?

See https://github.com/ipfs/ipfs.js

It's coming along, not fully functional yet but you can watch its progress (or contribute code :D ) at https://github.com/ipfs/node-ipfs

Does this mean that browser-only users will not be able to 'host' content?

So is this primarily for static websites? I don't see dynamic websites going too well with this system.


If someone like GitHub supported this for GitHub pages it would be a great step forward for this as well.

With IPNS, which allows the creation of links whose destination can be mutated by whoever has the private key, you can implement quite a lot of interactivity. You won't ever be able to implement facebook over this platform, but you can replicate all of its functionality by using a tumblog-like architecture. Facebook's design is inherently centralized, and therefore doesn't really fit into the world of IPFS.

The idea is that every user owns their own content, not some central organization. Every user maintains a log of all of their actions, like an RSS feed. Every time they make a status update, that's an item. When they comment on someone else's status update, that's an item. Even when they just like something, it's an item. Each update of the action log would also have to send a ping to all friends, over some kind of distributed IM like tox. They already need an IM connection like that for messenger.

Given that information alone, anyone can reconstruct the activity of all their friends into a news feed, and post new content. You'd need a lot of client-side leverage, and the ability to upload to the IPFS network from the client, but all the dynamic information is there.
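A minimal Python sketch of that client-side reconstruction, assuming each friend's action log has already been fetched as a list of timestamped items. The `Action` type and its field names are hypothetical; nothing in this thread defines a log format.

```python
import heapq
from dataclasses import dataclass
from itertools import islice
from typing import Dict, List


@dataclass(frozen=True)
class Action:
    timestamp: int   # seconds since epoch, set by the author
    author: str
    kind: str        # e.g. "status", "comment", "like"
    body: str


def build_news_feed(logs: Dict[str, List[Action]], limit: int = 50) -> List[Action]:
    """Merge every friend's append-only log into one feed, newest first."""
    # Each log is append-only, so it is already ordered oldest to newest.
    # Walk each one backwards and merge lazily instead of sorting everything.
    newest_first = [reversed(log) for log in logs.values()]
    merged = heapq.merge(*newest_first, key=lambda a: a.timestamp, reverse=True)
    return list(islice(merged, limit))


logs = {
    "alice": [Action(100, "alice", "status", "hello"),
              Action(300, "alice", "like", "bob/200")],
    "bob": [Action(200, "bob", "status", "first post")],
}
feed = build_news_feed(logs)
```

Because the merge is lazy, a client only needs to read the tail of each friend's log to fill the first screen of the feed.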

And there you have a fully distributed facebook clone. More resilient and privacy-sensitive and monopoly-averse than even tools like diaspora and statusnet.

> Facebook's design is inherently centralized

Not really. There's no reason you couldn't implement Facebook's core functionality with personal servers containing a profile at each node and a pub/sub (or even just pull) system connecting each node with its friends.

Actually, this is exactly what I'm looking for. I want to be able to control my photos and other content, and share that with family and friends.

All data is encrypted and my server can be hosted anywhere (just as simple as setting up a WordPress instance).

My own "news feed" are a combination of all of the other nodes I'm subscribed to.

My problem with Diaspora is that they push the "sign up to someone else's pod" rather than "install your personal pod".

It just feels like a multi-centralised model. Rather than one Facebook with my data, now there are 311 "Facebookesque" Diaspora Pods (as of today).

As much as people hate the idea, a WordPress like (PHP) application as a "personal server" would be successful. There are lots of hosting companies out there where you can get up and running for peanuts.

My aim would be for a simple install:

- Download software and upload to hosting company

- Enter MySQL auth details

- Choose local filesystem or third party (S3, DropBox, Google Drive, MS SkyDrive) with OAuth as the backing store (files stored encrypted).

- Finish

Then I would be able to subscribe to other friends and family personal servers, just like RSS feeds. Friends can access my own personal server through OAuth and leave comments on content. Those comments are echoed back in my private feed which is displayed in their merged news feed on their own app instance.

The technology is already there. It just needs someone who is damn good with PHP!

We're heading towards this with Peergos (https://peergos.org source at https://github.com/ianopolous/Peergos). Encrypted, self-hostable file storage, sharing and social network. We're considering using IPFS under the hood for fragment storage/finding. You'll be able to either use your own disk, or enter credentials for a cloud provider. It's still very early days yet though.

The reason why they get people to sign up at other people's pods is that it's easier to get users that way. They are struggling to get users regardless and that is their biggest flaw.

That sounds a lot like the federated media hosting that GNU MediaGoblin is creating.[0]

[0] http://www.mediagoblin.org/

So, diaspora*?


Unfortunately it never really caught on.

Yes, I believe diaspora was a social network of this sort. But it's just one example, and not an indictment of decentralized social networking in general. Your address book + SMTP is a decentralized social network that caught on just fine. The only difference is that it was designed and built before corporate interests were the driving forces behind new standards (or lack thereof).

A few questions about IPNS:

- Is it possible to, say, log into my bank account, see my current balance, and set up bill pay?

- Can I order a pizza over IPNS?

- Can I do realtime chat?

Also, this article says we could use Namecoin and not rely on ICANN for DNS, however, you still need ICANN to distribute IP addresses to get connected to the internet in the first place. Or am I missing something?

1) 2) 3), yes, it is still possible to directly connect to specific peers through IPFS, see http://gateway.ipfs.io/ipfs/QmTkzDwWqPbnAh5YiV5VwcTLnGdwSNsN...

I think that it should be possible to build hybrid centralized/decentralized sites with IPFS, with static public content being fully decentralized and sensitive information going through direct tunnels. Of course, the more decentralized your system the better.

As far as the IP system, yes we are unfortunately still reliant on ICANN being sensible about IP allocations, but the DNS system can be fully deprecated. I do know that there is a project out there to redo the IP system in a decentralized fashion but I can't recall its name right now.

> project out there to redo the IP system in a decentralized fashion


That's the one. Juan mentioned it in one of his talks, said that it would be possible for IPFS to sit on top of that and use it as the network layer.

The first two cases you mention are about communicating with a specific real world organization. You can implement it using ipns, but it's inherently centralized.

The last one is a no. Well actually yes if you just poll a file, but that's not practical on a large scale. You need an external communication channel to implement that. It's possible to have that be fully distributed as well though, as in tox. You need this external communication channel to implement a lot of things anyway, so it would probably be part of the platform that forms around the idea of a distributed web.

I'm thinking about the discovery problem and how this relates to distributed technology now. File storage looks solved but creating content relationships like lists of similar files sounds much harder and still reliant on an authority that curates and filters the spam. Lost data in IPFS will be "buried treasure" in that nobody has the link to it anymore, but it still exists in a literal sense.

Interesting to think about.

There are plans for a (distributed) search engine: https://github.com/ipfs/archives/issues/8

How does friend recommendation work in a decentralized model?

You have friends. A script can check your friends, and make suggestions based on that.

You won't have the full graph that Facebook does, but does that really matter? I suspect the best predictors of who your friends are would be people already in your network.

Isn't friend recommendation mostly stuff like "people who follow you/you follow also follow X"? Maybe going one or two more follow hops, but not so far out that you need a global view of the social graph. If users of a decentralized network publish their following and follower lists, it should be pretty easy for a local crawler to check those hops.
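A minimal Python sketch of that two-hop check, assuming users publish their following lists and a local crawler has collected them into a dict. The `recommend` function and the sample graph are made up for illustration.

```python
from collections import Counter
from typing import Dict, List, Set


def recommend(me: str, following: Dict[str, Set[str]], top: int = 3) -> List[str]:
    """Suggest accounts followed by the people `me` follows (two hops)."""
    mine = following.get(me, set())
    counts: Counter = Counter()
    for friend in mine:
        for candidate in following.get(friend, set()):
            if candidate != me and candidate not in mine:
                counts[candidate] += 1
    # Rank by how many of my follows also follow the candidate; break ties by name.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top]]


following = {
    "me": {"ann", "bob"},
    "ann": {"bob", "cat", "dan"},
    "bob": {"cat", "me"},
}
suggestions = recommend("me", following)
```

The crawler only touches the lists of people already in your network, so no global view of the social graph is needed.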

IPFS is designed with mutability in mind-- think of it like git branches which advance to newer versions. (though you can indeed get rid of version history in that pointer if you want). So, the short answer is "yes, IPFS will work with mutable content just fine", though _how_ the mutability works is a harder, deeper question.

I think the best sources will be the paper, the talks on the website, and the FAQ:

- https://ipfs.io/ipfs/QmR7GSQM93Cx5eAg6a6yRzNde1FQv7uL6X1o4k7...

- https://ipfs.io/

- https://github.com/ipfs/faq/issues

They could take an approach similar to how Freenet does dynamic things. See here for details: http://draketo.de/light/english/freenet/communication-primit...

Freenet has a similar immutable datastore but chat, microblogging, etc have been built on it.

> So is this primarily for static websites?

More work would be needed here perhaps. But decentralized apps should happen one day. As someone without real technical knowledge in security and decentralized platforms, I'd love to hear from some on HN where we really stand on decentralized apps and security.

- Public data. Distributing, searching and querying is not a problem.

- Restricted data. You'd be distributing encrypted data to peers. Now, if we have a way to send an encrypted query to the peer and have it run on encrypted data without the peer being able to understand the query or the data, we'd be set perhaps.

- Obviously this would need application architectures to change. But that's solvable.

> Now, if we have a way to send an encrypted query to the peer and have it run on encrypted data without the peer being able to understand the query or the data, we'd be set perhaps.

Is that possible? It doesn't seem to be "P versus NP" but it seems similar. How can a computer find a correct answer to a question it doesn't know?

I suppose you could send off small units of processing across many nodes, such that none of them knows the full result set...

This is called "homomorphic encryption". There's a lot of research in this area, but we're far from being able to efficiently perform all computations.
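For a feel of what "computing on encrypted data" means, here is a toy Python sketch of the Paillier cryptosystem, which is only additively homomorphic (the peer can add encrypted numbers, nothing more), so it is far short of running arbitrary queries. The tiny primes are chosen here for readability and make it completely insecure. It needs Python 3.8+ for `pow(x, -1, n)`.

```python
import math
import secrets

# Toy parameters: real deployments use primes of 1024+ bits.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
G = N + 1                                            # standard simple generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)    # lcm(p-1, q-1)
MU = pow(LAM, -1, N)                                 # valid because G = N + 1


def encrypt(m: int) -> int:
    """Encrypt 0 <= m < N under the public key (N, G)."""
    while True:
        r = secrets.randbelow(N - 1) + 1             # random blinding factor
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Decrypt with the private key (LAM, MU)."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """What the untrusted peer runs: it never sees either plaintext."""
    return (c1 * c2) % N2


total = decrypt(add_encrypted(encrypt(20), encrypt(22)))
```

Multiplying the two ciphertexts yields a ciphertext of the sum, which only the key holder can decrypt.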

Since the private data and the query are from the same user, they could share some mathematical connection perhaps, which would allow a query to be done and the results sent back (results which only the user can decrypt).

Google shows up plenty of articles on the subject, but I am not an expert to know where we stand on this.

Interesting. So you'd perform operations on encrypted data, and send back an encrypted result that's useful yet different from the original encrypted data, all without ever performing decryption?

It seems possible but extremely tricky.


My understanding is that it is possible, and Turing-complete systems can be built, but they are slow and unwieldy, requiring a large amount of cyphertext to encode data (on the order of 4KB per float, I seem to recall).

It's still up for debate whether or not such techniques will eventually be practical for general distributed computing, e.g. distributing a VM instance over multiple distributed servers; or just for specific use-cases, e.g. bitcoin-esque distribution of small transactions, etc.

I don't really know about restricted data, but AlexandriaP2P is doing pretty interesting work with storing IPFS magnet links in a blockchain.

How dynamic? Resources can be updated to point to new immutable objects. For instance, as an operator of a website, I can sign the resource `/comments/10189265.json` as referencing an object with a particular hash, and then when there are additional comments, change the object it references. How fast things like this can update depends on the implementation, which I haven't read.
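A sketch of that idea in Python, with in-memory dicts standing in for the network and with the signature step left out entirely. Both are simplifications made here, and `put`, `publish` and `resolve` are hypothetical names, not IPFS commands.

```python
import hashlib
import json

objects = {}   # hash -> immutable bytes (content-addressed store)
names = {}     # mutable path -> hash of the current version


def put(data: bytes) -> str:
    """Store immutable bytes under their SHA-256 digest."""
    digest = hashlib.sha256(data).hexdigest()
    objects[digest] = data
    return digest


def publish(path: str, value) -> str:
    """Store a new immutable object and repoint the mutable name at it.

    A real system would also sign (path, digest) with the operator's key.
    """
    digest = put(json.dumps(value, sort_keys=True).encode())
    names[path] = digest
    return digest


def resolve(path: str):
    """Follow the mutable name to whatever object it currently references."""
    return json.loads(objects[names[path]])


path = "/comments/10189265.json"
v1 = publish(path, ["first comment"])
v2 = publish(path, ["first comment", "second comment"])
```

The old version stays reachable under `v1`; only the name moved.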

This is also the qualm I'm having with it... it's great for HTML files, but dynamic websites go right out the window.

I believe they have an "IPNS" protocol which is intended to handle mutations.

Any links, please?

You can read section 3.7 of their paper for an overview: http://static.benet.ai/t/ipfs.pdf

I imagine you could go through their codebase to try to see how far this has come along: https://github.com/ipfs/go-ipfs

According to this, it looks like the implementation is currently in progress: https://github.com/ipfs/go-ipfs/blob/master/dev.md

Hope something in there helps you.

I'm very excited to see what happens with IPFS. The article talks about replacing HTTP however, and this is definitely a tricky task.

Someone in this thread already asked one of my questions (So is this primarily for static websites?) but my second question is: So is this primarily for personal websites?

I'm having a hard time finding a good way for Facebook, for example, to monetize their website. Targeted ads go out the window with mostly static content. Even more so though, what about Netflix? How is DRM done? How do you make sure only the correct users can access your objects?

edit: Also, doesn't a "permanent web" have an inevitably fatal flaw that you can't free space?

One monetization strategy in general for distributed networks is that users pay for bandwidth. They're rewarded for the bandwidth they provide. They profit if they provide more bandwidth than they use. And app developers are rewarded when their apps cause bandwidth to be used. I don't know if IPFS works this way.

As far as paying a premium over raw bandwidth to access premium content (Netflix), you could pay them to unencrypt it for you to download. DRM though isn't secure in the first place and decentralization won't make it better.

Oh, of course, I'm not trying to argue that DRM is good or that decentralization would make it better or even work as well. The problem of course is that unless it works as well as it currently does, premium content providers have no incentive to make the switch (and in fact are disincentivized from doing so).

The general idea of decrypting premium content at a cost would work fairly well, however I'm wondering if it's something that can be done easily and dynamically via IPFS. For example, can I hand that data off to clients without burdening the network with a billion copies, each encrypted with one client's key? If I simply decrypt it and hand it over, it's on a distributed network and anyone can grab it.

Delivery of the unencrypted premium content would likely happen through a direct link with a different protocol. As you say, the content providers don't want that content to be accessible to everyone in the network.

The website used to navigate the catalog of premium content could be built as an IPFS page though.

You can free space, but how that happens is not explicit.

The more people a file is requested by, the more places it is stored. If the file ceases to be requested, and all the participants decide to remove it from their caches, the file is gone.

DHT records are slightly more persistent, but without a sponsor they fade as the network's churn removes the nodes that once hosted them.

It is very hard to delete things, and the most likely things to be deleted are things people are not providing effort to preserve (even one person providing such effort is sufficient to maintain the DHT records.)

Hmm, that was sort of what I figured, but that seems to me as though calling it a "permanent" web is a bit of a misnomer. There is still a place for projects like the Internet Archive.

Brewster Kahle (IA founder) has expressed a public interest in IA acting as a seeder for distributed websites, so IA definitely still has a place.

Interesting point - but why wouldn't the business model that worked for Facebook - ads - work on a decentralised web? Instead of posting content to FB alongside which they run ads and generate revenue, individual publishers could keep the ad revenue themselves, perhaps with a brokerage / commission model like Google operate. Seems to work OK for blogs, which are quite decentralised. Why shouldn't it work for status feeds?

In terms of targeting, once you're dealing with "small data" instead of centralised "big data" you need far fewer algorithms to figure out what ads might work. A FB neural net probably does a lot of deep learning to guess that a bunch of mountain-biking friends might be interested in saddles. Decentralised operators could probably figure that out themselves.

My first thought when reading the specification was that they're liars and making bold claims out of context. Yes, any torrent ever released is also permanent. Yet, if I dig through my archives and try to download 10-year-old ed2k links from ShareReactor, somehow there aren't any seeds. Wonder why? Isn't it permanent? Because the ed2k link is right for sure, it's permanent.

When a massively distributed file system[1] was introduced into the Linux kernel circa Y2K, there was what I remember as a lot of hype.

Nobody, but nobody wanted it then (or since, it's still part of the kernel), and I would be surprised if anybody wanted something similar today.

[1] https://en.wikipedia.org/wiki/Andrew_File_System

> uses a set of trusted servers

This uses a set of untrusted servers, so it's different. Also, trust use-cases come in a lot of different flavors, so the comparison of 'wants' is pretty thin.

All undergraduate accounts at Carnegie Mellon were on AFS in the 1990s. It may have been ahead of its time -- all I know is (a) it was much less reliable than NFS; (b) its security policy was different enough that people (admins included) left lots of security holes; and (c) *.struck

CERN still uses AFS extensively for all users' files. I believe they use a forked version of OpenAFS. My experience with it at CERN has been okay. It's been fine to use locally, but transatlantic over the Internet can be very slow (it does work eventually though).

I'm not sure if there are any alternatives that work any better over long distance. I suppose CERN also has CVMFS [1], which is a read-only caching filesystem that retrieves the files from upstream via HTTP. This works much better, but it is read-only so only satisfies certain use cases.

[1] http://cernvm.cern.ch/portal/filesystem

I was running IPFS as "my own pastebin for files" for a while (it's great!), but was wondering what can they do to improve adoption / popularity. This move is amazing! Useful, interesting for people who care, and visible for others.

Next, I'd really like someone to implement git backend in IPFS / IPNS. Right now there's https://github.com/whyrusleeping/git-ipfs-rehost but that's just simple hosting.

I love the person who made it. It's exactly what I needed.

thanks :) if you find any issue - let me know

I'm assuming that once ipns starts to work reliably you'll add the support for it? Or is there something that would prevent it? (I mean for single-user repo anyway - multi-user would have to share the key... somehow)

Right, that's the idea.

Nothing is technically stopping me from adding ipns except finding the time to do it.

This reminds me of a feature I sorely miss on my blog (and almost every other blog I visit!) which is for the blogging engine to automatically take a snapshot of webpages you link to and let users read the cached version at will.

If I link to a page and it goes down, or the content changes, it in a way changes the content of my blog too - in an unwanted way that is!

This! I already thought about starting a project for a local browser proxy that builds an index of <$url,now()> => $snapshotHash as I go. Maybe mix in readability and metadata like referrals.
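One way such an index could look, as a small Python sketch. The `record` and `latest_before` helpers are hypothetical, SHA-256 is an arbitrary choice of snapshot hash, and the proxy that would feed pages into it is not shown.

```python
import hashlib
import time
from typing import Dict, Optional, Tuple

snapshots: Dict[str, bytes] = {}            # snapshot hash -> page bytes
index: Dict[Tuple[str, float], str] = {}    # (url, fetched_at) -> snapshot hash


def record(url: str, body: bytes, now: Optional[float] = None) -> str:
    """Store one snapshot of `url` and index it by fetch time."""
    fetched_at = time.time() if now is None else now
    digest = hashlib.sha256(body).hexdigest()
    snapshots[digest] = body                # identical pages are stored only once
    index[(url, fetched_at)] = digest
    return digest


def latest_before(url: str, when: float) -> Optional[bytes]:
    """Return the newest snapshot of `url` taken at or before `when`."""
    times = [t for (u, t) in index if u == url and t <= when]
    return snapshots[index[(url, max(times))]] if times else None
```

Keying the page bodies by hash means an unchanged page revisited many times costs only one extra index entry per visit.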

I find it quite belittling how browser vendors treat their history features in terms of archival.

Zotero saves snapshots, although it is for scientific references (citation manager). Due to ads and whatnot, it will balloon in size surprisingly fast.

Interesting. The ads problem is what I meant by 'mixing in readability': using techniques to reduce a page to its core content. Only text paragraphs from blogs, images from flickr/imgur albums, audio/video from media sites, etc.

Check out Crestify, which supports bookmarking with archiving to archive.org and archive.today: https://www.reddit.com/r/alphaandbetausers/comments/3j702x/c...

A plugin that makes archive.org and archive.today save a snapshot would be ace. I would give (measly, sorry) $20 if someone goes to crowdfund this as a WordPress plugin.

You could use Crestify, which automatically archives copies of your bookmarks to archive.org and archive.today. Once you have a page bookmarked, you can use the archived links in your blog post.



Not a fan of third-party hosted services unless necessary but thanks.

My first thought is how does it handle SHA hash collisions? Googling for it took me back to HN: https://news.ycombinator.com/item?id=9322489 AFAICT it's not yet resolved.

256-bit hash collisions have such a low probability that you don't need to prepare for them. If one happens, it's cause for celebration.

You can cover earth surface with 8.5×10^36 oxygen atoms. Imagine that every one of them is IPFS file with 256 bit hash. Probability of collision is still less than 0.1%.
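That figure can be checked with the usual birthday approximation. A quick Python sketch, taking the 8.5×10^36 count from the comment above and assuming an ideal 256-bit hash with uniformly distributed outputs:

```python
import math

n = 8.5e36            # number of hashed files (oxygen atoms in the analogy)
space = 2.0 ** 256    # number of possible 256-bit hash values

# Birthday approximation: P(collision) ~= 1 - exp(-n^2 / (2 * space))
exponent = n * n / (2 * space)
p_collision = -math.expm1(-exponent)   # expm1 keeps precision for tiny values

print(f"{p_collision:.2e}")
```

This comes out at roughly 0.03%, consistent with the "less than 0.1%" claim.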

ipfs currently uses what we call a 'multihash', where every hash value is tagged with the hash function used to generate it. As of now, the default hash function is sha256, and if I recall correctly, a world-ending asteroid hitting earth and wiping out the planet is far more likely than a collision. If sha256 is shown to be broken, we can change our default hash function to something like sha512 without breaking any existing data.
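Roughly what that tagging looks like, as a Python sketch rather than the actual go-ipfs code. It uses the multihash layout of function code, digest length, then digest, with 0x12 for sha2-256 and 0x13 for sha2-512. Both header fields are treated as single bytes here, which is a simplification that happens to hold for these two functions.

```python
import hashlib

# Function codes from the multihash table: sha2-256 is 0x12, sha2-512 is 0x13.
CODES = {"sha2-256": (0x12, hashlib.sha256), "sha2-512": (0x13, hashlib.sha512)}
NAMES = {code: name for name, (code, _) in CODES.items()}


def multihash(data: bytes, fn: str = "sha2-256") -> bytes:
    """Return <function code><digest length><digest> as raw bytes."""
    code, hasher = CODES[fn]
    digest = hasher(data).digest()
    return bytes([code, len(digest)]) + digest


def verify(data: bytes, mh: bytes) -> bool:
    """Pick the hash function from the tag, then compare digests."""
    code, length, digest = mh[0], mh[1], mh[2:]
    if code not in NAMES or len(digest) != length:
        return False
    return multihash(data, NAMES[code]) == mh


old_link = multihash(b"hello")                  # tagged as sha2-256
new_link = multihash(b"hello", "sha2-512")      # tagged as sha2-512
```

Because the tag travels with the value, links made under the old default keep verifying after the default changes.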

Well.... while an asteroid annihilating life on the planet isn't likely, the size of the web isn't a joke either, so such a collision is more probable than in other cases...

I suppose this could be made a problem for the user, i.e. if you want to make sure that your page isn't colliding with any other, then simply check first. (Of course there is still a race condition possibility of two colliding addresses being created at around the same time...)

Here's the article I normally link to when this comes up: http://stackoverflow.com/questions/4014090/is-it-safe-to-ign...

It's generally considered safe to ignore, there are no known sha256 collisions, and most people are fairly confident they won't see one for quite some time.

the caveat of course being that SHA256 may be _broken_. That's when we'd have to upgrade to another hash function.

(We may improve how this upgrade will work down the road by allowing links to link with multiple hashes at the same time, but at this time this has not proved to be necessary, and can be added later)

In the (relatively rare) case of a SHA hash collision, couldn't you try and resolve a collision by using (the results) of other hashing functions?

While two things may produce the same SHA hash, they may differ with other hashing functions.

All you're doing then is fiddling with probabilities. You can do that just as well by changing the length of the sha unless you have a reason to think sha is broken.

However, the probability of a hash collision is already incredibly low. As a demonstration of this, here's a list of a tiny subset of bitcoin private keys: http://directory.io/

The human mind just isn't capable of understanding how large of a number the number of sha hashes is.

Right, but fiddling with probabilities is exactly what you want to do in that case right? I don't know if a hashing algorithm that has no collisions exists (I suspect the answer is no, maybe other than the identity function). If collisions start to become a problem, you can move that problem further out (much much farther out) by mixing more hashing functions in, which was my point.

It's well documented (of course also by the link you posted), the super low likelihood of hash collisions, but I think (hope) the original poster knew that -- was trying to answer assuming that extraordinary case actually happened.

> I don't know if a hashing algorithm that has no collisions exists (I suspect the answer is no, maybe other than the identity function).

There is no hash function without collisions. The set of inputs is infinite but the set of outputs is finite. (The identity function isn't a hash function--its output isn't a fixed length.)

Thanks for the correction! Somehow forgot the fixed length output requirement of hashing functions.

But the probability that sha256 is broken is much much higher than a collision, and definitely within the realm of human understanding.

this is why we use multihash -- https://github.com/jbenet/multihash

It will reduce the probability of a collision at the expense of the address being longer - because now you need to carry around hash1 + hash2. But still it doesn't eliminate the possibility, only makes it less likely.

There's no hashing construction that will "completely eliminate the possibility" (see https://en.wikipedia.org/wiki/Pigeonhole_principle), so I'm not sure what answer you're looking for. It's computationally infeasible to generate collisions (accidental or malicious) if SHA-2 is not broken. If you're interested in reading about how combining multiple hashes affects security, see this paper: https://www.iacr.org/cryptodb/archive/2004/CRYPTO/1472/1472....

Exactly, what I noted was the most likely temporary/possibly long-term solution to the problem.

What about hashing an object that contains the hashes generated by x hashing algorithms? an extra step of resolution, sure, but some way to easily demarcate that some file experienced a collision with another file, and had to have the number of hashes increased to stave off further collisions?

instead of calculating multiple hashes every time, might as well use a better hash?

We're way past my personal knowledge, but what I was thinking was that the probabilistic combination of two algorithms outweighs (in terms of probability of a collision) the likelihood of a single algo.

I.e. if you have some hashing function with a 1/2^5 chance of a collision, using that in conjunction with another hashing function with, say, 1/2^3 requires that both unlikely events occur, resulting in 1/2^8, where a "stronger" hashing function might only have 1/2^6 (technically stronger than either one of them, but still not as strong as both).

I'd really love some correction on the logic above though, I am by no means well studied in probability or hashing functions, despite some effort, and would really appreciate any corrections.

Well, I'm not 100% convinced this is going to take off, but I'm intrigued. I'm going to make an effort to get Fogbeam Labs' website up and running on IPFS shortly as well. I'm curious to see where this goes.

While IPFS can certainly complement HTTP (for example in preventing censorship in states such as China), it will not replace HTTP. Decentralization does not work with centralized services (99% of the web) such as Twitter, Google, etc. You'll need HTTP to run server code. I for one will not be hosting a node where arbitrary code can be run. Besides the obvious fact that a centralized service wants to control what serves from their backend, and the funny scenario where a 51% consensus can decide what Google's logo is.

>You'll need HTTP to run server code

It's better if your application is redesigned to be distributed on a fundamental level, but you can still send custom data through IPFS in a standard client/server relationship if you want to, see http://gateway.ipfs.io/ipfs/QmTkzDwWqPbnAh5YiV5VwcTLnGdwSNsN... for an example.

>I for one will not be hosting a node where arbitrary code can be run

This isn't a distributed computation platform like Ethereum; it's purely for transferring files, and the only files you host are those that you explicitly request. Also, IPFS is not a blockchain that a 51% majority can take over. It's an entirely different animal: the only way to redirect a site is with the correct associated public/private key (unless SHA-256 gets broken at some point, in which case IPFS migrates to a different hashing algorithm).

So let's say I'm building a decentralized chat app built on some DHT and would like to host it on IPFS.

I imagine getting the static html/js/css for the client-side app onto IPFS should be simple enough, but how would I handle bootstrapping?

Even in a decentralized service, clients need to connect to a seed server somewhere to start querying for other peers on the DHT. Would it be possible to host such a seed server on IPFS? Or would the seed server still need to live on a traditional server somewhere on the "old web"?

The way our network currently bootstraps is to dial a set of peers that are hardcoded into the default config file. We also have mdns for finding peers that are on your LAN without the need for any hard coded addresses.

Thanks for the quick reply. I was actually thinking of an app running on a separate IP-based DHT, but your reply brought to my attention that IPFS actually already runs on a DHT.

Since IPFS runs on its own DHT, are there any plans for a client-side IPFS DHT library for apps to use to connect to peers on the IPFS network? Similar to https://github.com/feross/bittorrent-dht, but with native support for IPFS peers?

Something like this would make building a client-side real-time app based on IPFS a much more streamlined experience since developers won't have to depend on an external DHT.

But what happens if mobile devices become the norm instead of PCs?

The number of hosters will be far less than the number of clients.

Bitcoin has the same issue now: lots of lightweight clients (that don't keep a full ledger), and not so many full nodes (probably even fewer once mining rewards drop to zero).

On the other side of the coin, I'd say mobile clients offer more opportunities for local p2p distribution (Bluetooth, etc)

So even more battery drain?

I mean indeed local p2p distribution sounds nice, but what's the use if the device is dead in a few hours?

A neat system might be, when the reciprocation system gets more advanced, to allow for nodes to build up a reputation/trust with other nodes while they're at home sitting on charge/wifi. Then, they can 'cash in' these brownie points when they're out and about, to download stuff without reuploading anything at the time.

There are a number of ways this reputation could be represented, perhaps one possible approach might involve exchanging some unit of cryptocurrency (Filecoin?), or perhaps a web-of-trust style system where nodes publicly 'trust' other nodes based on how much use that node was to them. Personally, I think cryptocurrency is the way to go here, but there's lots of work out there on P2P incentivization strategies.

Of course, this could work in tandem with the standard barter system that bittorrent uses, IPFS is not locked into any one reciprocation algorithm and could happily use multiple different systems depending on the situation.

The upside is less battery drain from long-distance data connections. But yes, it's not a panacea.

Mobile devices are getting increasingly powerful, so by the time this is implemented (and when/if it becomes popular) it probably won't be an issue.

Batteries and power will most certainly be an issue for mobile devices.

I have always wondered if there is some way to make a distributed network which provides content in such a way that users' freedoms are maximized. I don't particularly like handing over all control of my data and internet presence to cloud providers. IPFS looks really, really cool!

As ot asked, I am also curious about how this compares to other efforts to create decentralized networks, like Freenet and GNUnet. I definitely plan to pick one in the coming weeks and start using it, hope this catches on.

That's interesting, but I see no reference in the website or the paper to GNUnet or Freenet or other existing DHT-based file distribution networks. How does this compare?

> Organizations like the NSA (and our future robot overlords) now only have to intercept our communications at a few sources to spy on us.

If you had the hash of some content, couldn't anyone find all of the ips currently serving it?

EDIT: Found this


"ipfs dht findprovs <hash>" gives the hash of the nodes serving it. I'm not sure how to get from that to an IP.

"ipfs dht findpeer <peerID>" will search the DHT for connection info on a given peer.

- So I want to download a file over IPFS today, how do I go about it? What sort of client do I use?

- Now I want to view a whole web page. Is that possible?

- Let's say I have a site that I want to publish through this, along with a bunch of files (images, downloads). I'm serving that site now through Apache. Do I need to 're-publish' each file (using the command in the article) every time a file on there changes? Or is there some automated way to have a distributed version of the latest version of my site? I mean, it's fine and dandy that people can store cached copies, but what I want is that if there's an update to my software, that people actually get the latest version when they click 'download', not that they get an old version because some dude somewhere made a mistake at one point 'hosting' a static version of my site and not updating, and visitors just happening to use that version because that dude is closer to them (in network topology terms) than I am.

1) you either use one of the public, legacy http gateways and dial https://ipfs.io/ipfs/$myContentHash(/andDepending/maybe/Some... or you use the go implementation which is available for download at https://ipfs.io as a compiled binary.

You can use it like a unix command line tool to add and get files, but you can also run it in daemon mode. It then also opens an HTTP server locally, which allows you to use the same URLs as above, just with http://localhost:8080 as the scheme and host.

2) You can use the ipfs cli tool to add a directory with your site's HTML and other static content. The client will then give you the root content hash. Then you open /ipfs/$rootHash/index.html or /site2.html and voila! :)

Btw, this article is served from ipfs. Try 'ipfs get QmNhFJjGcMPqpuYfxL62VVB9528NXqDNMFXiqN5bgFYiZ1' and take a look in that directory on your hard drive. Also: when you now run 'ipfs daemon' your node will help with requests for this content.

3) To update content with a static name, you can use IPNS, it's also mentioned in the article. IIRC: To resolve those entries (and get the latest ipfs content hash) the node with the corresponding key for the ipns entry needs to be online. If it isn't, you get an error like 'Path Resolve error: could not resolve name.'. Maybe come to the #ipfs irc channel on freenode for more specific questions.
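In case it helps, here is a small sketch of how the URLs in 1) and 2) fit together. The gateway hosts and the article hash are the ones mentioned above; the `gateway_url` helper itself is just a made-up name for illustration and is not part of any IPFS tooling.

```python
from urllib.parse import quote

PUBLIC_GATEWAY = "https://ipfs.io"        # public HTTP gateway from the comment
LOCAL_GATEWAY = "http://localhost:8080"   # local daemon, as described above

def gateway_url(content_hash: str, path: str = "",
                gateway: str = PUBLIC_GATEWAY) -> str:
    """Build <gateway>/ipfs/<hash>[/<path>] (hypothetical helper)."""
    url = f"{gateway}/ipfs/{content_hash}"
    if path:
        # quote() leaves "/" alone, so nested paths survive
        url += "/" + quote(path.lstrip("/"))
    return url

article = "QmNhFJjGcMPqpuYfxL62VVB9528NXqDNMFXiqN5bgFYiZ1"
print(gateway_url(article))
print(gateway_url(article, "index.html", LOCAL_GATEWAY))
```

The same hash works against either host; only the scheme and host part changes.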

This is great news. Dapps [1] are coming, and DHT [2] /storage technologies like IPFS (although IPFS is more than just a DHT) are the other much-needed side of the coin - no pun intended - to make that reality happen. Exciting times ahead. [1] decentralized app [2] distributed hash table

A universal distributed file system that looks native on my system (thanks to FUSE) and caches content I access locally, while serving it securely to others, is interesting... assuming you can solve key problems like search, identity when you want it, privacy when you don't, and human memorable addressing... which afaict IPFS doesn't address.

The "permanent web" on the other hand you can create using MHT archive files and BitTorrent magnet links. I struggle to get excited about that.

Wake me up when someone figures out how to make apps that require seamless multi-user collaboration, like say distributed clones of HN, Facebook or Github.

> Wake me up when someone figures out how to make apps that require seamless multi-user collaboration, like say distributed clones of HN, Facebook or Github.

There are people working on this, so stay awake.

+1, I still try to understand how to uniquely address all the content (btw asked for it here, just one guy made a joke). Can't think of fb using this without centralization also.

Distribution is a great idea. Decentralizing is a great idea. Open Source is a great idea.

Permanent.... is a great idea and a very conflicting idea.

While the utility of permanence cannot be denied, the implications for privacy and piracy are far-reaching.

This isn't freenet where you have no control over what you store. If you have an objection to certain content, you don't have to store it. It's only permanent so long as at least one person is willing to distribute it.

eg. dodgy Snapchat pics your mate's gf sent him?

Transparency can work both ways.

I wonder what the legal implications are for hosting a node. I imagine you would have to comply with DMCA requests in the US, so just hosting a node would require you to set up a whole process for that.

IPFS is not like freenet where nodes host content -- it is more similar to bittorrent, where you broadcast the files that you have available on your node.

Copyright violations will likely be handled in a similar way to bittorrent.

Unless everyone just ignores DMCA requests and they become unenforceable.

Ignoring DMCA requests opens you up to being sued for copyright infringement though—that's the entire purpose, is to provide a copyright "safe harbor" for places that host user generated content, provided they meet certain restrictions.

Just reading back through the specification of IPFS, be wary about the claims of 'permanent Web'.

Distributed content in IPFS is eventually purged from nodes unless one of the hosting nodes has pinned it to be retained. Therefore, if no-one at Time x views certain content and pins it, then unfortunately at Time x + n that content might disappear just as thoroughly as under HTTP.

Unfortunately I fear that means that 'popular' content persists whilst niche and unique data might still fade away.
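To make the pin-versus-purge behaviour concrete, here is a toy model. It is only a sketch of the idea described above, not how the real IPFS node works internally: `ToyNode`, `add` and `gc` are invented names, and hex SHA-256 stands in for the real content hash format.

```python
import hashlib

class ToyNode:
    """Hypothetical node: cached blocks vanish on gc unless they are pinned."""

    def __init__(self):
        self.blocks = {}     # content hash -> bytes
        self.pinned = set()  # hashes that must survive garbage collection

    def add(self, data: bytes, pin: bool = False) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blocks[key] = data
        if pin:
            self.pinned.add(key)
        return key

    def gc(self) -> int:
        """Drop every unpinned block; return how many were removed."""
        doomed = [k for k in self.blocks if k not in self.pinned]
        for k in doomed:
            del self.blocks[k]
        return len(doomed)

node = ToyNode()
popular = node.add(b"popular page", pin=True)  # someone cared enough to pin it
niche = node.add(b"niche page")                # merely viewed, so only cached
removed = node.gc()
print(removed, popular in node.blocks, niche in node.blocks)
```

If every node holding the niche page behaves like this, the page is gone from the network once the last cache is collected.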

The idea is that people who care most about keeping it up will pin it, or pay to keep it pinned.

If this gets implemented in browsers (native or as an extension) I would hope that bookmarking would serve that purpose. Seems like the logical choice.

Perhaps it is also possible to tweak the caching algorithm to keep track of files that are rare on the network.

I believe incentives for seeding rare files are in the future plans for bitswap

Much like they do now by keeping their web server online?

If somebody put something online fifteen years ago, they might not care about it anymore, but I might. With the current web, I can mirror that content by downloading it and putting it on a new server. With this system, that process would be part of the basic workings of the system.

Sounds somewhat like bittorrent.... as long as there is a "seed" then the content is available, but it falls apart when the last seeder disconnects.

... and it also sounds like usenet, but here everybody is a news server and they selectively host articles.

With bittorrent you could make a call in a public forum for more seeds for a file. And if people have a copy of that particular content they can reseed.

I wonder if this would also be possible in this case?

For example, my small server might serve 200 sites but I'll back up around 500 or so older ones somewhere else.

For more details about IPFS and other potential uses I recommend listening to this Software Engineering Daily episode with the creator of IPFS Juan Benet: http://softwareengineeringdaily.com/2015/08/25/interplanetar...

We've been thinking about this problem for quite some time. My friends and I are passionate about re-decentralizing the web. The biggest challenge we've come across is this:

Many users want to do things with other users

That means there has to be a way to persist state and history (of a group activity, an evolving resource, etc.) across machines. Mental poker is still very hard to do for groups. Since most clients will not have amazing uptime, that means you still need a network of servers. Servers that manage security behind a firewall.

So, the conclusion is -- for group writable data, the best you can do is have servers be completely interchangeable, but you will still need servers and networks of servers.

The good news is, many of the challenges are orthogonal to each other. Someone's building a Mesh network, for instance. Someone else is building the protocol layer with IPFS. We're building the social layer for this coming web.

Couldn't you do group editing using Lisp-inspired linked lists? See a comment thread? CONS your comment onto the end of the comment thread and push into the ether, and notify about the new list head.

Take other patterns from clojure immutable data structures for various other kinds of group edited time travelling sites
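A minimal sketch of that cons-list idea, assuming a content-addressed store where every node is named by the hash of its contents. The in-memory `store` dict and the `put`, `cons` and `walk` helpers are invented for the example; a real version would put the nodes into IPFS instead.

```python
import hashlib
import json

store = {}  # stand-in for the network: content hash -> serialized node

def put(obj: dict) -> str:
    """Store an object under the hash of its canonical JSON form."""
    data = json.dumps(obj, sort_keys=True).encode()
    key = hashlib.sha256(data).hexdigest()
    store[key] = data
    return key

def cons(comment: str, rest):
    """New list head: this comment plus a link to the previous head (or None)."""
    return put({"comment": comment, "rest": rest})

def walk(head):
    """Yield comments from the newest head back to the first one."""
    while head is not None:
        node = json.loads(store[head])
        yield node["comment"]
        head = node["rest"]

h1 = cons("first comment", None)
h2 = cons("a reply", h1)   # the new head is what you would announce
print(list(walk(h2)))
print(list(walk(h1)))      # the old head is untouched and still valid
```

Nothing is ever modified in place. Anyone holding an old head keeps a consistent view, and the only mutable thing is the pointer to the latest head.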

It might be possible to do some sort of self contained data structure that authenticates everybody and only allows authorized access to it, which solves the Byzantine Generals problem like BitCoin does, and which resists forking and instead uses vector clocks and Operational Transformations to consolidate actions done to the tree. But even so, how would it resolve conflicts? It could be a long time until a conflict is discovered, and then do edits get dropped, or what?

Why would you want to resist forking, and incur all that other complexity when you can just have an "authoritative" source for the entry point? Similarly, if someone wants to fork the conversation off and become a new "authoritative source" why would you need to stop them?

as for conflict resolution, merging, etc, it kind of looks like a (mostly) solved problem. I don't see how conflicts could occur in a comments stream. It's append-only.

It's append only until B and C get a copy and each append their own comments, then D and E get two separate forks and append their own comments. That can work for trees to which you only append, such as comments. That does NOT work for any operations that do non-append operations on trees, such as operations which rearrange the tree into a line.

So yes, for append-only tree structures, this may work. In terms of, say, a Google Doc, that means you can never delete or insert any text, but only append text to the end of paragraphs etc. Certainly very limiting.

Unless there's something you haven't described, in which case I encourage you to provide a more in-depth comment describing a solution for actually inserting or modifying the tree.
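For the append-only case, a sketch of why forks are harmless: if each comment has a stable id and a parent id, merging two forks is just a union, and the order in which you merge doesn't matter. The dict layout and the `merge` helper are assumptions made up for the example, and it deliberately covers appends only, not inserts or rearrangements.

```python
def merge(fork_a: dict, fork_b: dict) -> dict:
    """Union of two append-only comment trees: id -> (parent_id, text)."""
    merged = dict(fork_a)
    for comment_id, entry in fork_b.items():
        if comment_id in merged and merged[comment_id] != entry:
            # Same id with different content means someone edited, not appended.
            raise ValueError(f"conflicting entries for {comment_id}")
        merged[comment_id] = entry
    return merged

base = {"root": (None, "original post")}
fork_b = {**base, "b1": ("root", "B's comment")}
fork_c = {**base, "c1": ("root", "C's comment")}

merged_bc = merge(fork_b, fork_c)
merged_cb = merge(fork_c, fork_b)
print(sorted(merged_bc))
```

As soon as an operation changes or moves an existing entry, the union trick stops working, which is the limitation pointed out above.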

Actually, Google Docs ARE append-only. They use the vaunted Google Wave tech of Operational Transforms. Every single timestamped keystroke is saved separately in the database. When two users are involved and editing simultaneously, the OT algorithms resolve the merge conflict on the spot. It kind of looks like a tree structure, or an append-only array. The document you are looking at is actually something like a flattened keyframe in a movie that's reached the end.

This looks great, congrats to everyone involved, this is really exciting stuff. I have a couple of questions about things I didn't quite understand.

1 - What prevents someone from altering a JS file and serve it to other peers?

2 - Is it possible to obtain all the versions of an object and see its history of changes (like Git)?

To 1: If you alter the contents, its hash would change, thus its address would change, if I understood correctly.

see https://github.com/ipfs/specs/tree/master/protocol#ipfs-and-... :

"authenticated: content can be hashed and verified against the link"

1) Content is verified by its hash 2) Not yet, but I believe "commit history" functionality is planned
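A tiny sketch of what "verified by its hash" means in practice. Plain hex SHA-256 is used here as a simplification of the real address format, and `address` / `verify` are hypothetical helper names.

```python
import hashlib

def address(content: bytes) -> str:
    """The content's name is derived from the content itself."""
    return hashlib.sha256(content).hexdigest()

def verify(content: bytes, expected_address: str) -> bool:
    """A peer can serve whatever it likes; the requester re-hashes and compares."""
    return address(content) == expected_address

original = b"console.log('hello');"
tampered = b"console.log('evil');"
wanted = address(original)

print(verify(original, wanted))  # the genuine file checks out
print(verify(tampered, wanted))  # the altered file does not match the link
```

So an altered JS file simply lives at a different address, and anyone following the original link will reject it.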

You might also want to take note of the browser addons for firefox[1] and chrome[2].

Those redirect requests to public gateways (http://(gateway.)ipfs.io/ipfs/$path) to the HTTP server of the local ipfs daemon. It fetches the content using the bitswap protocol and in turn makes it available to other nodes, helping with the distribution.

[1] - https://github.com/lidel/ipfs-firefox-addon [2] - https://github.com/dylanPowers/ipfs-chrome-extension

Is there any incentive to host nodes?

While running an IPFS node does supply some utility to the DHT, the primary utility you can provide to the network is providing files.

You only provide files that you have previously accessed via IPFS or added yourself. So even without any external motivations to sponsor other people's content (which are in the works), you have the motivation of sponsorship of content that you wish to increase the robustness and accessibility of. Increasing access to public utility can be itself an incentive.

yes. https://ipfs.io/ has more info, in particular, the linked pdf.

Also check out filecoin, which as I understand uses hosting data as a proof-of-work for cryptocurrency.

this PDF: http://filecoin.io/filecoin.pdf ? http://ipfs.io appears to be under heavy load...

I just couldn't digest it all after I read this sentence in the description:

--- When neocitieslogo.svg is added to my IPFS node, it gets a new name: QmXGTaGWTT1uUtfSb2sBAvArMEVLK4rQEcQg5bv7wwdzwU. That name is actually a cryptographic hash, which has been computed from the contents of that file. That hash is guaranteed by cryptography to always only represent the contents of that file. ---

And opened a reddit/eli5 question, just to understand "if we can use a hash to identify content uniquely?" here: https://redd.it/3k8g51

Can anybody elaborate and enlighten me a bit?

Pigeonhole principle: https://en.wikipedia.org/wiki/Pigeonhole_principle

I think you've correctly picked up on the difference between "uniquely" and "infeasible".

It's not possible to use a hash function to give every possible input a unique reference. But a cryptographic hash function aims to make the inputs to "hash collisions" be very different. So, for real world files the hash is unique, because the file that would produce a hash collision would be malformed and useless.

In general when you hash a message you don't want an attacker to make subtle changes and get the same hash.

"Here is my electronic payment for £100" should never have the same hash as "Here is my electronic payment for £10,000. Nice to meet you on the weekend".
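To put a number on "infeasible" for accidental collisions, here is a back-of-the-envelope sketch. It assumes an ideal hash with uniformly random outputs and uses the simple union bound (number of pairs divided by number of possible hashes); the billion-item figure is borrowed from elsewhere in the thread, and `collision_bound` is a made-up helper.

```python
from fractions import Fraction

def collision_bound(n_items: int, hash_bits: int) -> Fraction:
    """Upper bound on the chance that any two of n items share a hash."""
    pairs = n_items * (n_items - 1) // 2
    return min(Fraction(1), Fraction(pairs, 2 ** hash_bits))

tiny = collision_bound(2, 1)             # two items, a 1-bit hash
web_scale = collision_bound(10**9, 256)  # a billion items, a 256-bit hash

print(tiny)
print(float(web_scale))
```

The pigeonhole principle still guarantees that collisions exist; the bound only says you should not expect to stumble on one by accident.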

Okay, I can't explain it like you're 5... But here's as close as I can get.

You've got DNA. Everyone's DNA is unique (shhh, just go with me for a minute). In fact, let's say that we found about 140 GATC pairs, which were also unique (we don't need your whole sequence, just those 140 pairs). Meaning, if we take your 140 GATC pairs, and we take my 140 GATC pairs, we'd be guaranteed to get different results.

Now, if I want to refer uniquely to a person, I can just use those 140 GATC pairs, which looks roughly like QmXGTaGWTT1uUtfSb2sBAvArMEVLK4rQEcQg5bv7wwdzwU (when expressed with upper + lower + numbers).

But, you've been thinking to yourself, WHAT ABOUT TWINS?!?

Well, great question...

...but the cryptographic hash that we use is one where twins are EXCEEDINGLY UNLIKELY. Like, amazingly, shockingly, oh my god, unlikely. They exist, but in practice they're so uncommon that it doesn't matter.

Maybe sometime, some unfortunate guy is going to request a picture of Natalie Portman, and get an MP3 of Ted Nugent. It could happen. It's just super unlikely to ever happen in practice.

So, there you have it - a cryptographic hash is... excuse me, because this is a supremely flawed analogy... kind of like the DNA of a file.

Thanks for the response, I've just thought of that single example and couldn't digest it, but it is just me.

It's like a [guid](http://stackoverflow.com/questions/2977593/is-it-safe-to-ass...) as far as I can understand, and even bigger, so we shouldn't worry.

There is a function that will guarantee to give you a unique 'hash' for every input value, and that is the identity function. Be sure to use a differently subscripted function for every input size though, as the fixed-length property must be satisfied.

Kidding aside, the answers given on Reddit 20 minutes before your question was posted here suffice.

I participated in an REU a couple years back working with Named Data Networking. Seems pretty similar... The ahem central problem lies with ourselves: TCP/IP was designed as a communication network, not a distribution network. NDN seeks to be a ubiquitous data distribution network. Iirc, the design helps fend off DDoS.

https://en.wikipedia.org/wiki/Named_data_networking http://named-data.net/

This brings up an important issue with the web and internet we have now: the difficulty we have in supporting new protocols in clients.

"especially when bandwidth rates for small players start around $0.12 per gigabyte" - who the heck is paying $0.12/gb?!

More like 2€/TB

Video streaming is mentioned, but I don't see how this works for streaming, unless the metadata is the content. Is this right?

Sounds like the DHT lookups could be quite slow, 20 queries to find the content is going to be a lot of latency. Sounds like a better algorithm than binary search (which I assume is used) is needed. Having said that I think the immutable data / DHT stuff is a good idea in theory and I hope something like this takes off!
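For a rough feel of where a figure like 20 comes from: in a Kademlia-style DHT (an assumption here about what the lookup resembles), closeness is measured by XOR of ids, and each step roughly halves the remaining distance, so the step count grows with log2 of the network size. The helper names below are invented for the sketch.

```python
import math

def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance between two node or content ids."""
    return a ^ b

def expected_hops(n_nodes: int) -> int:
    """Rough worst-case step count if each hop halves the distance."""
    return math.ceil(math.log2(n_nodes))

print(xor_distance(0b1010, 0b0110))
print(expected_hops(2**20))   # about a million nodes
print(expected_hops(1000))
```

This is only the sequential worst case under that assumption. Lookups that query several peers in parallel, or that hit cached routing entries, would see less latency than the raw hop count suggests.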

Bears some resemblance to Maidsafe.

Will the browser be able to download different parts of the same file from different servers?

Yes, bitswap is the same basic idea as bittorrent.

Hmm, wouldn't this be a fantastic way for archive.org to host stuff? Set up servers around the world (e.g. with volunteers), copy every file you come across the web there, voila. Permanent.

Good idea. But a lot of websites also have a database on the server-side. I wonder how they deal with that.

honestly don't get the big deal or what the advantages are of this or what it solves.

bad marketing

I think this is a great trend we are seeing.

how does IPFS compare to bittorrent's project maelstrom?


In my view Maelstrom instantly disqualifies itself because it's kept closed source. I really don't understand how BitTorrent (the company) can take itself seriously preaching about "truly neutral" networks.

I've worked with IPFS, and it's really amazing. The development on it is very active, and the main developers are easily accessible for questions on Freenode#ipfs. I've been trying to build a tiny consensus protocol on top of it, but its native support for signing is still a little rough around the edges (and of course I've been distracted by a thousand other side-projects).

Maelstrom was a massive disappointment in my opinion, you appear to be locked in to static websites with no mechanism to issue updates. The entire site is packaged up into a single torrent blob meaning there is no hope of sharing swarms between different sites. And as has been said, it's closed source and Windows only. Avoid.

I love IPFS, the team has done an amazing job with the implementation. As much as i love it though, i can't help but feel https://morph.is/ 's technology of a Targetted Block is required for the web. I know that is outside of IPFS' specific scope, but is that something any of the IPFS/IPNS/IPN developers are thinking of?

Being able to push data to a target seems to be a tool that will empower real change - not just solve a technical problem (HTTP failings).


Not sure why the downvotes without explanation. I'm requesting legitimate discussion, so meaningless downvotes only harm the community imo.

Please contribute
