IPFS: The Permanent Web (sourcegraph.com)
313 points by _prometheus on July 22, 2014 | 46 comments

So, a named-data networking https://en.wikipedia.org/wiki/Named_data_networking project.

> Named data networking (also content-centric networking, content-based networking, data-oriented networking or information-centric networking) is an alternative approach to the architecture of computer networks. Its founding principle is that a communication network should allow a user to focus on the data he or she needs, rather than having to reference a specific, physical location where that data is to be retrieved from. This stems from the fact that the vast majority of current Internet usage (a "high 90% level of traffic") consists of data being disseminated from a source to a number of users.

(Another idea with a Ted Nelson pedigree, btw.) Van Jacobson's working on another, NSF-funded project in this area http://named-data.net/ at present.


Ted Nelson talks about this in this amazing and inspiring Google Talk video from 2006. He talks about the rise of packet-based networking, and how the future will be content-centric.


That's Van Jacobson, not Ted Nelson

Yeah, related! I haven't looked extensively into NDN, still need to sink into it, but good thing Van Jacobson's doing it! :)

Contrast points: IPFS is an implementation at the leaves of the network (rather than calling for routers to change initially), can be mounted as a filesystem, maps to the web (string paths as URLs), and provides a Merkle DAG data model.
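
To make the "Merkle DAG data model" and "string paths" points concrete, here is a minimal sketch in Python. It is not IPFS's actual object format: the JSON node layout, the use of SHA-256 hex digests as addresses, and the helper names `put` and `resolve` are all assumptions made for illustration.

```python
import hashlib
import json

def put(store, data, links=None):
    """Store a node and return its content hash (the node's address)."""
    node = {"data": data, "links": links or {}}
    # Canonical JSON so identical nodes always hash to the same address.
    encoded = json.dumps(node, sort_keys=True).encode("utf-8")
    address = hashlib.sha256(encoded).hexdigest()
    store[address] = node
    return address

def resolve(store, root, path):
    """Walk a string path like 'docs/readme.txt' starting from a root hash."""
    address = root
    for name in filter(None, path.split("/")):
        address = store[address]["links"][name]
    return store[address]["data"]

# Build a tiny DAG: root -> docs -> readme.txt
store = {}
readme = put(store, "hello from the leaves of the network")
docs = put(store, "", {"readme.txt": readme})
root = put(store, "", {"docs": docs})

print(resolve(store, root, "docs/readme.txt"))
```

Because each parent embeds the hashes of its children, changing the leaf changes every hash up to the root, which is what lets a single root hash name an entire tree.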

I don't see how it's a fundamentally different concept. Named-data networking is a loose category of proposals and doesn't necessarily need routers to change.

Obviously you can't tell that other guy what to do but if IPFS doesn't inform that other effort then I think they are going to be wasting time.

Anyway I hope you will very seriously look at the various different approaches to NDN (with several different names) as a loose category and consider attempting to recruit from or merge with or interface with other efforts (there are many similar systems) and possibly expanding the scope a little if necessary.

We really do need a new internet.

Also, what do you think of operational transformations? Do they relate in any way to IPFS or future IPFS capabilities?

Yeah, I'm all for consolidating efforts. I'll be looking into NDN more. Though there's only so many minutes in a day-- if you can help me figure out who's great there to talk to, drop me a line juan@ipfs.io (other than Van Jacobson, ofc. will go talk to him at some point.)

On OTs: yep! you can implement OTs on a merkle dag trivially, so you can build files from OTs. So! apps can use OTs as first class data structures and store directly onto IPFS.
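
One way to picture "build files from OTs" is an operation log in which each operation is a content-addressed node that links to its parent. The sketch below is only that storage idea: it does not implement the transformation of concurrent operations that real OT systems need, and the op format (`insert`/`delete` triples) and the helpers `put_op`, `apply_op` and `build_file` are hypothetical.

```python
import hashlib
import json

def put_op(store, op, parent=None):
    """Store one operation as a node linked to its parent by hash."""
    node = {"op": op, "parent": parent}
    encoded = json.dumps(node, sort_keys=True).encode("utf-8")
    address = hashlib.sha256(encoded).hexdigest()
    store[address] = node
    return address

def apply_op(text, op):
    """Apply a single ['insert', pos, string] or ['delete', pos, count] op."""
    kind, pos, arg = op
    if kind == "insert":
        return text[:pos] + arg + text[pos:]
    if kind == "delete":
        return text[:pos] + text[pos + arg:]
    raise ValueError(f"unknown op kind: {kind}")

def build_file(store, head):
    """Follow parent links back to the first op, then replay in order."""
    ops = []
    address = head
    while address is not None:
        node = store[address]
        ops.append(node["op"])
        address = node["parent"]
    text = ""
    for op in reversed(ops):
        text = apply_op(text, op)
    return text

ops_store = {}
h1 = put_op(ops_store, ["insert", 0, "hello world"])
h2 = put_op(ops_store, ["delete", 5, 6], parent=h1)
h3 = put_op(ops_store, ["insert", 5, ", IPFS"], parent=h2)

print(build_file(ops_store, h3))
```

Any earlier hash (`h1`, `h2`) still rebuilds the file as it was at that point, so history comes for free.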

This is similar to GNUnet[1] in many ways. GNUnet, besides offering anonymity and infinite application-building possibilities, comes with an interesting incentive mechanism for making nodes keep others' files.

There is a project to port GNUnet to the browser going on here[3]: https://github.com/amatus/gnunet-web

[1]: https://gnunet.org/

[2]: https://unhosted.org/decentralize/26/Decentralized-reputatio...

[3]: https://github.com/amatus/gnunet-web

According to a report by EMC, current global storage capacity is around 1.5 zettabytes (1.5 million petabytes) [1]. With commercial off-the-shelf hardware, storing 1 PB of data carries a fixed cost of around $100,000 USD [2]. Thus, the cost of today's storage capacity is around $150 billion USD. (About what the United States government pays every year in interest [3].) Those numbers are expected to increase by an order of magnitude by 2020. Thankfully this does not apply to the interest payments.
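
The arithmetic behind the $150 billion figure can be checked in a few lines. The inputs are the comment's own numbers; the variable names are just for illustration, and the conversion assumes decimal units (1 ZB = 10^6 PB).

```python
PETABYTES_PER_ZETTABYTE = 10**6   # decimal units: 1 ZB = 1,000,000 PB

capacity_zb = 1.5                 # global capacity, per the EMC figure cited above
cost_per_pb_usd = 100_000         # hardware cost per PB, per the Backblaze figure cited above

capacity_pb = capacity_zb * PETABYTES_PER_ZETTABYTE
total_cost_usd = capacity_pb * cost_per_pb_usd

print(f"{capacity_pb:,.0f} PB -> ${total_cost_usd / 1e9:,.0f} billion")
# prints: 1,500,000 PB -> $150 billion
```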

With numbers like that, and with cloud storage prices sitting very far away from marginal costs, protocols and incentive structures like IPFS and filecoin are going to be of great value to enterprises, governments, and consumers. I would not be surprised if, by 2020, a majority portion of the data online was stored in such a system; with exponential growth, a new majority share is created every ln(2) / rate years. In the case of the "data universe", which grows about 40% annually, doubling time is about 2 years [1].

[1] http://www.emc.com/leadership/digital-universe/2014iview/exe... [2] http://www.backblaze.com/petabytes-on-a-budget-how-to-build-... [3] https://www.cbo.gov/publication/44716
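
The "about 2 years" doubling time can be reproduced as follows. Note that `ln(2) / rate` matches the 2-year figure only if 40% is read as year-over-year growth and the rate in the formula is the equivalent continuous rate, ln(1.4); that reading is an assumption about what the comment meant.

```python
import math

annual_growth = 0.40  # 40% growth per year, from the comment above

# Year-over-year compounding: solve (1 + g)^t = 2 for t.
doubling_years = math.log(2) / math.log(1 + annual_growth)

# Plugging 0.40 straight into ln(2) / rate treats it as a continuous rate.
doubling_years_continuous = math.log(2) / annual_growth

print(round(doubling_years, 2), round(doubling_years_continuous, 2))
# prints: 2.06 1.73
```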

This sounds a lot like Freenet[0], though w/o the anonymity & plausible deniability guarantees. Freenet shares some of the same features + Limitations:

* Content keys are based on hashes of content, and don't change (unless a new encryption key is used to re-insert a file)

* Keys that are not requested frequently will fall off the network automatically.

* Keys can be signed so only the holder of the private key can update said key

[0] https://freenetproject.org/
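
The first two bullets (hash-derived keys, and rarely requested keys falling off) can be sketched with a bounded store that evicts the least recently requested block. This is an illustration of the idea only: the LRU policy, the plain SHA-256 keys (no encryption), and the `BoundedStore` class are assumptions, not Freenet's actual datastore or key format.

```python
import hashlib
from collections import OrderedDict

class BoundedStore:
    """A node with limited space that drops the least recently requested block."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # key -> content, oldest request first

    def insert(self, content: bytes) -> str:
        key = hashlib.sha256(content).hexdigest()  # key derived from content
        self.blocks[key] = content
        self.blocks.move_to_end(key)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently requested
        return key

    def request(self, key):
        if key not in self.blocks:
            return None
        self.blocks.move_to_end(key)  # a request keeps the block alive
        return self.blocks[key]

node = BoundedStore(capacity=2)
popular = node.insert(b"popular file")
rare = node.insert(b"rarely requested file")
node.request(popular)                 # only the popular file gets requested
newer = node.insert(b"new file")      # store is full, so something must go

print(node.request(rare))             # None: the unrequested key fell off
```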

yup. my thought exactly.

the good news, I guess, is that this system might get people more acquainted with content-based networks, so freenet will become a next logical step.

on the other hand, I just don't see much point in development of non-anonymous system when we have a good anonymous system already

Freenet is not really good. From what I know, it is heavy, slow, doesn't work that well in practice, and has a huge and old codebase. What is more, it includes a lot of bells and whistles but (as far as I know) doesn't cleanly separate the minimal storage backend from the frontend features that run on top of it.

There is certainly a use case for a more modern project that would try to implement cleanly the base features of Freenet, with a clear interface.

... and you are also the filecoin person, yes?

If you have some spare time, could you solve that power inverter / solar panel challenge that's on the front page currently? Thanks in advance.

hahah thanks :) I wish I understood physics + matsci well enough. danifong does though! Maybe we can convince her to make something.

So the obvious question, after reading that page, is... how do we ensure that if the data identified by X is there now, that data will still be there tomorrow? What incentive is there to host other people's data?

Great question. IPFS doesn't make any guarantees regarding durability, because ultimately this depends on people caring about the file, either because it's valuable to them, or they're deriving indirect value (being paid to care).

People that care about the data's presence will pin objects locally:

- I pin + seed my own files, or files I'm interested in keeping alive.

- I can hire a service (or multiple) to pin + seed my files for me.

- I can use things like http://filecoin.io to incentivize large groups of people to seed files for me. (this is why Filecoin + IPFS are sister protocols)
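
A rough sketch of what pinning means locally: a node keeps every block for a while, but garbage collection only spares what has been pinned. The `Node` class, its method names, and the GC policy here are hypothetical and are not the IPFS implementation.

```python
import hashlib

class Node:
    """A local block store where only pinned blocks survive garbage collection."""

    def __init__(self):
        self.blocks = {}     # content hash -> content
        self.pinned = set()  # hashes this node has promised to keep

    def add(self, content: bytes, pin=False) -> str:
        key = hashlib.sha256(content).hexdigest()
        self.blocks[key] = content
        if pin:
            self.pinned.add(key)
        return key

    def collect_garbage(self):
        """Delete every unpinned block and return the removed hashes."""
        removed = [key for key in self.blocks if key not in self.pinned]
        for key in removed:
            del self.blocks[key]
        return removed

local = Node()
mine = local.add(b"my thesis", pin=True)              # I care about this one
cached = local.add(b"something I merely browsed")     # cached, not pinned
removed = local.collect_garbage()

print(mine in local.blocks, cached in local.blocks)
# prints: True False
```

A pinning service or an incentive scheme like the one mentioned above would amount to other nodes holding the same hash in their own pinned sets.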

Is seeding strictly necessary? Borrowing from Diaspora, can't it be pods (now acting as data routers) that each node first connects to for establishing a peer-to-peer connection? Note that this is only during the connection establishment phase (assuming TCP or a higher-level protocol like SPDY/QUIC), i.e. the control plane; during the actual data transfer, the data can flow directly to the peer via a potentially secured connection (like a VPN?).

Of course, that would introduce some privileged nodes, but if anyone can create such a node as part of the protocol and be part of this data routing network, isn't that still a distributed egalitarian network?

Epeen is often enough. Make some sort of virtual practically useless reward depending on how much data you share/host.

So what happens when someone uploads illegal content (like torture videos, snuff films, child pornography, the US constitution on some campuses)? Can clients define blocklists for things they don't want to store?

Yep. Blacklists, and there's some other techniques (like blacklisting on the DHT).

This opens up the door to censorship too though, so it has to work like the web does now: individuals and groups can select whether to serve things or not depending on their own views. Same for Routing.
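
As a sketch of per-node blocklists: each node holds its own set of content hashes it refuses to store or serve, so policy stays local rather than network-wide. The `ServingNode` class and its methods are hypothetical, and DHT-level blacklisting is not modeled here.

```python
import hashlib

class ServingNode:
    """A node that refuses to store or serve hashes on its own blocklist."""

    def __init__(self, blocklist=()):
        self.blocks = {}
        self.blocklist = set(blocklist)

    def store(self, content: bytes):
        key = hashlib.sha256(content).hexdigest()
        if key in self.blocklist:
            return None  # this node declines to hold the content
        self.blocks[key] = content
        return key

    def serve(self, key):
        if key in self.blocklist:
            return None  # declined even if the block is somehow present
        return self.blocks.get(key)

unwanted = hashlib.sha256(b"unwanted content").hexdigest()
strict = ServingNode(blocklist=[unwanted])   # one operator's policy
permissive = ServingNode()                   # another operator's policy

print(strict.store(b"unwanted content"))                  # None
print(permissive.store(b"unwanted content") == unwanted)  # True
```

Since blocking works on hashes, a node can refuse content it has never seen, as long as someone it trusts publishes the hash.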

A twist on the previous question, would there be any way for me to deliberately choose to be completely ignorant of what I am serving, like running a Tor relay node (or like in Filecoin, if I understand that protocol correctly)? I would be a bit worried about the liability implications of having the ability to filter content I'm serving, but not having the time and resources to actually do any filtering.

As a non-lawyer, I would guess that you would receive the same safe harbor as ISPs do today: they can filter the content they serve, but don't until it's specifically reported to them, and aren't liable for those transmissions.

Even if, say, someone published a list of known illegally-transmitted copyrighted files, you could still make a strong argument that you don't trust the source of the list and would need to receive and review a specific report of each instance of copyright violation that went through your service, like the current DMCA model. (Again, though, not a lawyer.)

I see, thanks. Obviously any time you can delete things you can be forced to do so, leading to censorship, but I think the tradeoff is worth it. Meaning, letting people choose to not serve content they don't like will lead to more nodes hosting most content, while not overly impacting nodes that are willing to host the rest. Systems like these really need critical mass to get going.

Copyright-based businesses in the US are going to love your philosophical take on censorship.

That's how the internet works. If people are doing illegal things (like standing up to a totalitarian regime), they take on the risk of repercussion. (We can't reliably tell a priori what the information content is.)

People shouldn't accidentally store illegal things they don't mean to (hence blacklists) but you also don't want to snuff out freedom of speech of those who understand + are willing to take the risk. The issue is that routing access to it is also considered hosting it (DMCA takedowns for links on the web).

I think that the best thing is to have the default DHT include blacklists that can be updated to handle DMCA requests. Sort of like how DNS works today. Definitely something that we'll have to figure out as time goes on.

  # a mutable path
This reminds me of AFS, which is old, but seemed pretty sophisticated. I've only personally seen it in use at CMU. This project has much broader aims than AFS, of course, and the resemblance is probably superficial.

You're on point. I cite AFS in the paper (http://static.benet.ai/t/ipfs.pdf). I used AFS at Stanford.

Seems like a very important piece of work.

I didn't read very carefully but I don't see many facilities for permissions control other than having a personal folder.

Did I miss that? Do you have plans for permissions support, like groups or read/write access or ACLs etc.?

Or maybe if you just make a simple type of group so that one account could share access to its personal folder with a set of other people/accounts, that would handle most use cases for permissions.

Other than that, seems like this is a great start on solving everyone's problems.

permissions are hard in general.

I think permissions in IPFS should be implemented as encryption + capabilities (see E).

(e.g. grant people access to particular paths by giving them decryption keys. once they have the blocks they can cache them, but you could do revocation by moving the blocks. gets away from the dedup benefits of the merkle dag, but if you _really want to do revocation_ you sort of can.)

Sweet. I loved the promise of SFS and was really sad to see the development on it die.

SFS ftw! let's make it happen.

I should mention: Am hiring, so if you'd like to work on this full or part time, send me an email: juan@ipfs.io

So would this be a startup or a non-profit org? The problem this project is trying to solve is bigger than you or me or any small group of people in an org, not in technical ability per se, but along the lines of "absolute power corrupts absolutely". If a single person or entity is the primary creator, what are your thoughts on how to remain true to the original vision of a truly distributed, participatory network? The Linux model, perhaps?

The gist is a startup that makes money by running services or market protocols on top, and whose work is open source (e.g. IPFS is MIT licensed). I'm a firm believer in open source and want to build a company that can be independent of other companies, that as long as it's able to create value for its users, it can continue to evolve the internet. I should spend some time describing my vision for this because it's really hard to describe the full thing + get right. I should probably write, or maybe record a video?

I don't think a single blog post or video will be able to capture it. The audience in general may be predisposed to think and reason about it in terms of the existing underlying abstraction layer. (This is an unqualified personal judgement.)

This problem and the proposed solution involve new ideas and changes across the boundaries of those layers of abstraction, so it would be hard to digest in a single sitting. Unless of course all this was synthesized in your own head :)

My guess is you would need a more agile version of Douglas Engelbart's "Mother Of All Demos", a functional prototype that shows everything working together in synergy to create more than the sum of the parts.

The best case would be if all the components of the puzzle are independently valuable, like Musk's SpaceX model.

Looks fairly similar to PPSPP: https://datatracker.ietf.org/doc/draft-ietf-ppsp-peer-protoc...

Both look pretty cool, I'd like to see some more traction in this area.

Thanks! Hadn't seen this. `toRead.enqueue(_)`

Correct me if I'm misunderstanding, but this is only beneficial for static content, correct? For instance, if I want to load facebook.com, I still need to hit Facebook's servers directly (rather than the IPFS mesh). How does IPFS know if I'm requesting dynamic content?

If your dynamic content is static content that is dynamically updated (with the mutable path), then it would seem like it would behave as if it was a dynamic page. For example, think of a REST API that was pointed at a bunch of static json files on your hard disk. This doesn't really handle writes, but it would certainly handle shared updating.

You can publish mutable stuff at the same identifier using IPNS (the talk or paper describe it in depth), so you can publish entire websites that way, signed by your public key.
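
The split between immutable content and a mutable name can be sketched as two tables: one keyed by content hash, one keyed by name and pointing at the latest hash. This omits the signing that the comment above describes, and the `add`, `publish` and `resolve_name` helpers, the sequence number, and the `my-site` name are assumptions for illustration only.

```python
import hashlib

blocks = {}  # immutable: content hash -> content
names = {}   # mutable: name -> (sequence number, content hash)

def add(content: bytes) -> str:
    """Store content under its own hash; the hash never changes."""
    key = hashlib.sha256(content).hexdigest()
    blocks[key] = content
    return key

def publish(name: str, content_hash: str) -> int:
    """Point a name at a new hash, bumping its sequence number."""
    seq = names.get(name, (0, None))[0] + 1
    names[name] = (seq, content_hash)
    return seq

def resolve_name(name: str) -> bytes:
    """Look up the name, then fetch whatever content it currently points at."""
    _, content_hash = names[name]
    return blocks[content_hash]

v1 = add(b"<h1>my site, version 1</h1>")
publish("my-site", v1)
v2 = add(b"<h1>my site, version 2</h1>")
publish("my-site", v2)

print(resolve_name("my-site"))
# prints: b'<h1>my site, version 2</h1>'
```

The old version stays reachable by its hash (`v1`); only the name moved. In a real system the name record would be signed so that only the key holder can move it.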

Logic in webapps is not covered here; logic is totally dynamic. That's not what IPFS is for-- HTTP works well, and there's other things in mind for the future. Some attempts to look at are ethereum, go-circuit. I've ideas around an erlang-inspired global vm, but that's a whole can of worms I'm not ready to open yet :)

IPFS says: do your logic however you want, return IPFS links, and fetch data from IPFS directly.

Forgive me if this is totally naive.

Would you say that there is a way a truly distributed content-centric system could be built, sort of around this, whereby the following would be true:

Assume there is a massive, commonly shared set of content. Users have resource pools (money, storage, bandwidth, whatever) -- all apply some slice of their resource pool to the system, and then all users are contributing to the servicing of content that is globally accessed by the group as a whole.

All "static" or mundanely common assets are served from this resource pool.

The regular pinning and seeding applies to all other "niche" content.

As more users access or gain interest in that niche content, the more resource units it receives?

Using locations instead of names for data seems like what Rich Hickey means when he talks about Place Oriented Programming.

Sounds like a workable version of the Xanadu project. Is there any connection there or ideas from that being used?

I think it's inescapable to build anything resembling the Web without touching many Xanadu ideas -- either opting to do what Xanadu proposes or the opposite. I like lots of the thinking behind Xanadu, though sadly most of it never saw the light of day. Actually, Ted had expressed interest in talking about this, I'll report back if we do.

Well, this seems to have versioning and some notion of permanence, which are two big ideas in Xanadu that the current Web lacks.

This has the potential to be seriously interesting!

thanks! :)
