IPFS Project Roadmap (github.com)
242 points by robto 9 days ago | 147 comments

"2019 Goal: The most used code and binary Package Managers are powered by IPFS."

That's kind of stupid-ambitious for 2019 when another 2019 goal is "a production-ready implementation" and IPFS has been around for 3 years already.

This isn't a roadmap, it's a wishlist. And I'm someone who wants to see IPFS succeed.

In theory, there are incremental paths to achieve that objective, while serving clients all the way along.

1. Define a P2P ready data model and protocol (which they have, Merkle forest and everything)

2. Run a single server/cluster, make a very lightweight client library for that

3. Expand the server to a makeshift CDN (start simple, e.g. rsync like mirrors)

4. Federate the CDN (still in a hierarchical fashion, so you always know whom you are talking to). Also, look for peers on the local subnet (broadcast is simple).

5. Expand that to a friend-of-a-friend network, use PEX like peer finding (PEX is shockingly simple https://en.m.wikipedia.org/wiki/Peer_exchange)

6. Go full P2P, talk to strangers on the Net, use DHT or anything.

The only trick is to start with a data model that can go all the way to (6), so that backward-compatibility and installed-base issues don't stop you at the earlier stages. Think globally, act locally.
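Step (5) in particular is lighter than it sounds. A toy sketch of PEX-style gossip (illustrative names only, not the real BitTorrent PEX wire format): each peer keeps a set of known addresses and swaps a sample of them with a connected neighbour.

```python
import random

class Peer:
    def __init__(self, addr):
        self.addr = addr
        self.known = set()  # addresses of other peers we have heard about

    def connect(self, other):
        self.known.add(other.addr)
        other.known.add(self.addr)

    def exchange(self, other, sample_size=10):
        """Swap up to sample_size known addresses with a connected peer."""
        mine = random.sample(sorted(self.known), min(sample_size, len(self.known)))
        theirs = random.sample(sorted(other.known), min(sample_size, len(other.known)))
        other.known.update(a for a in mine if a != other.addr)
        self.known.update(a for a in theirs if a != self.addr)

# b knows both a and c; after one exchange, a learns about c transitively
a, b, c = Peer("10.0.0.1"), Peer("10.0.0.2"), Peer("10.0.0.3")
a.connect(b)
b.connect(c)
a.exchange(b)
```

That transitive spread is the whole mechanism; everything else (rate limits, address formats, dead-peer pruning) is bookkeeping.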

The dangerous thing is deploying an untested complex codebase to serve the most hardcore use case for live customers on Day#1. That may not work. Because even all these shockingly simple steps will turn quite long and tedious in practice. There will be "issues". Advancing one step a year is quite impressive if done under the full load.

The IPFS client is such an untunable memory hog that I turn it off whenever I'm not using it (which, of course, defeats the entire purpose). I would be ecstatic if we had something like the old uTorrent, but for IPFS. A nice UI, easy configuration, an ultralight implementation. It would be a dream come true.

Exactly this: such an ultralight, accessible implementation would make it better suited to running on embedded devices and mobile phones. And since we still live in a fairly disconnected world, this is probably an area where IPFS can accelerate.

I kind of made this https://github.com/hsanjuan/ipfs-lite

May not be ultralight (running a DHT node isn't), but it does remove a lot of cruft.

We live in a world of NATs that effectively prevent any p2p network.

What about Transmission? It seems much less bloated in comparison to uTorrent.

I mean an IPFS client, not a torrent client. There are hundreds of those.

When you say nice UI, do you mean a GUI?

Yes, to make it more accessible.

I would love to see IPFS used in projects like Nix, but in its current state it's downright impossible.

I have seen IPNS taking between 5 and 10 minutes to resolve a single address, and go-ipfs with a few pinned files taking more than 5GB of memory after running for a day.

> go-ipfs with a few pinned files taking more than 5GB

Yeah, I don't understand the hype around IPFS when it performs so badly. After reporting the issue and having a conversation with the devs, I got the impression that they are just dilettantes.

Experiencing this performance issue with ipns was a major turn off.

It's a bit sad that Dhall (a programmable configuration language for YAML & co.) used to use IPFS for its sources/packages but stopped because of reliability :( (I'm wondering if there are/were others?)

> Early on in the language history we used IPFS to distribute the Dhall Prelude, but due to reliability issues we’ve switched to using GitHub for hosting Dhall code.



I wish IPFS the best, because at least in theory this seems like a perfect use case

We conducted initial experiments with moving the nix package manager to ipfs, but also ran into too many issues.

Are these issues documented somewhere? Would love to learn more. At first glance nix and ipfs seem like a perfect match.

Sure thing. I think most of the discussions have been cross-referenced in this issue: https://github.com/NixOS/nix/issues/859

also, Fury, the new Scala build tool by Jon Pretty, uses IPFS to distribute builds/build definitions

it seems like people working on package management/distribution related things LOVE the idea of IPFS

This is not a roadmap, but rather a wishlist. There is a fundamental problem that IPFS needs to solve first: an efficient WebRTC-based DHT. To change the web, IPFS needs to become usable in browsers. Since the backbone of IPFS is a DHT, there needs to be an efficient UDP-based solution for "DHT in the web". Right now this isn't possible, and the reasons are not just technical, but political: the IPFS team would need to convince all the major players that enabling this DHT scenario is a good idea.

I actually wrote a DHT that operated over WebRTC itself with in-band signalling for my undergrad thesis, in the application/js layer. Total PITA, but a ... "good?" learning experience.

How could you possibly make such a DHT? We live in a world of NATs, especially symmetric NATs, where each mobile phone user gets assigned a random ip:port every time it makes a connection. A DHT, on the other hand, needs every node to have a persistent address that can be contacted at any time. In other words, with NATs, a DHT node cannot cache a bunch of peers and contact them later, because those peers are no longer available at those addresses; so every time a node re-joins the DHT, it needs to restart the bootstrapping process from scratch, from those initial hardcoded bootstrap servers. Effectively this makes such a DHT a fully centralized system. WebRTC cannot solve this problem.

A DHT doesn't need every node to have a persistent address reachable directly from every other node. It's OK if some have persistent addresses, while others are reachable only after NAT traversal, or only through other nodes acting as relays.
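Sketched with made-up types (nothing like the real libp2p API), the idea is roughly that a routing-table entry carries several kinds of addresses, and a dial simply works down the list:

```python
from dataclasses import dataclass, field

# Hypothetical routing-table entry: a peer may be dialable directly,
# or only through a relay node that keeps a standing connection to it.
@dataclass
class PeerRecord:
    peer_id: str
    direct: list = field(default_factory=list)   # stable public addresses
    relayed: list = field(default_factory=list)  # addresses through relay peers

    def dial_candidates(self):
        # Prefer direct addresses; fall back to relayed ones, so a node
        # behind a symmetric NAT is still reachable (via the relay).
        return self.direct + self.relayed

# Illustrative values, not real multiaddrs:
public = PeerRecord("peer-a", direct=["203.0.113.7:4001"])
behind_nat = PeerRecord("peer-b", direct=[], relayed=["relay-1/peer-b"])
```

The relay peers themselves need stable addresses, but the DHT as a whole does not require one per node.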

any links, please?

(Forgive janky thesis-ness) https://bit.ly/2P7w6cq

This paper is really well done. I'm interested in seeing what comes of this. Going to spend some more time with it. Looks like you had the initial idea in 2011, ahead of its time. Well done.

IPFS is not usable outside of browsers yet, so I guess you're too optimistic.

If you want to do package managers, your #1 priority should be Nix. Don't do something more popular where you help less, go with the thing that you can really provide the killer missing feature.

Nix + IPFS has been tried before, but what was missing is the Nix-side hierarchical content addressing of large data (not just plans). With the "intensional store" proposal, this should finally happen. Please push it along.

Data shouldn't be hashed like Nix's NARs or IPFS's UnixFS, for maximum interop. Instead, please go with git's model, for all its problems.

Thanks, hope someone can take me up on this because I'm up to my neck with other open source stuff already.

Right, but not connected to the intensional store.

The vision of an IPFS-powered web working is beautiful.

However, I would love to see a reference implementation that at minimum works without draining your computer of every last resource it has. If this is how near we are to "production-ready" status for the reference implementations, then I think that goal will never be achieved.

I see this comment often when IPFS is discussed - but the devil is in the details when it comes to replacing the underlying tech of "the web" with something else.

How does an IPFS powered website do dynamic content? User sessions? Is all the client's session data encoded in the IPFS address itself?

Even if there's no user sessions, but the page content updates, how do you continuously point clients to fetch the right updated page (e.g. how would you implement a Hacker News style aggregator that updates every minute)?

IPFS does static content just fine - CAS-es are wonderful for that - but websites are much more than static content.

Dynamic content is not the problem. I don't want dynamic content. I think HTTP and servers are the way to go on dynamic content. I just tried, for years, to use IPFS as a way to distribute static content (you know, stuff that will never ever change, even if that stuff is referenced from a dynamic location), and the problems I encountered were so many I finally gave up.

I would love to see all these problems solved and IPFS working very well in the next few years, but I'm afraid the IPFS people are very good at making press releases and presentations, but not at delivering the really good software they say they do.

Anyway, they have no obligation to deliver anything to anyone -- except maybe the people who entered the Filecoin ICO.

If you want IPFS to be a replacement technology for the web, you need dynamic content. Else, it's a useful static content distribution network, but it's not "the web", not even in the sense of what the web was in the 90s, or the "web" any more than Bittorrent is.

Now, obviously, they're under no obligation to deliver anything. But I'm trying to understand what you mean when you say:

    The vision of an IPFS-powered web working is beautiful
Only handling the static part of webhosting is something, but it's not everything.

I'm under the impression that the original comment was referring to an IPFS-powered web, rather than the web being powered by IPFS. Good ol' HTTP servers will continue to form the web as we know it, and IPFS can provide a new web of static content.

So at best IPFS is a replacement for something like S3? If I want to create an application, I can have the static content portion of it hosted on IPFS, but the brunt still on my own servers? That’s a very niche use case, I would say. And while it allows me to host my copy of the anarchist’s cookbook, it’s not going to be good for much else.

I didn't say it's everything. I don't think HTTP must die or will die. HTTP is great, it's awesome, wonderful. I would like IPFS to exist along with HTTP, that's what I said.

Actually, I think the fact that the IPFS developers are trying to replace HTTP is one of the reasons they fail so awfully at producing a good IPFS for static content. They try to integrate much more stuff into the protocol than is actually needed.

I don't want to hijack the discussion from IPFS, but Swarm has good ideas with respect to dynamic content if you're interested how that might work in a decentralized setting, for example see Swarm Feeds presented here: https://www.youtube.com/watch?v=92PtA5vRMl8

> how would you implement a Hacker News style aggregator that updates every minute

AIUI, that's the problem that IPNS is designed to solve (https://github.com/ipfs/specs/blob/master/naming/README.md#i...). HN controls its private key, making it the only one who can update the record published under its public key, and those IPNS records have nanosecond-precision expiry timestamps and TTLs, meaning they can update at the frequency of their choice.
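A rough sketch of that record shape (heavily simplified: real IPNS records are protobuf-encoded and signed with the publisher's key pair; here an HMAC over a stand-in secret plays the role of the signature, and the content hash is made up):

```python
import hashlib, hmac, json, time

SECRET = b"publisher-private-key"  # stand-in for the publisher's real private key

def publish(value, ttl_seconds):
    """Build a signed, expiring pointer from a stable name to the latest content hash."""
    record = {
        "value": value,                       # e.g. /ipfs/<hash of the latest front page>
        "sequence": 1,                        # bumped on every update; higher wins
        "expiry": time.time() + ttl_seconds,  # clients re-resolve after this
    }
    payload = json.dumps(record, sort_keys=True).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return record, sig

def verify(record, sig):
    payload = json.dumps(record, sort_keys=True).encode()
    return hmac.compare_digest(sig, hmac.new(SECRET, payload, hashlib.sha256).hexdigest())

record, sig = publish("/ipfs/QmSomeFrontPageHash", ttl_seconds=60)
```

Anyone can cache and re-serve the record, but only the key holder can mint a newer one, which is what lets a mutable name live on an immutable network.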

I agree with the sibling comments that (at least as the IPFS is currently specified) having a user session would be problematic. It's theoretically possible that _your_ HN front page would have an IPNS record of IPFS://mdaniel.news.ycombinator.com and then we're back to the aforementioned expiry semantics. Upvotes would have to travel out of the IPFS network, but in some sense, I think that's expected since one wouldn't want an upvote to be archived, but rather the resulting content to be

There are a ton of weird perspective changes when thinking about the content addressable web, but it might not be the 2050-esque far away that it seems

For dynamic data, you need to use a CRDT system like GUN ( https://github.com/amark/gun ).

For instance, see P2P Reddit ( http://notabug.io ) which:

- Is running in production with GUN.

- Handles about 42,000 monthly visitors. ( https://www.similarweb.com/website/notabug.io )

- Has done ~1TB of decentralized traffic in a single day.

You can then configure GUN to save to IPFS as the blob/storage engine (or we have filesystem, localStorage, IndexedDB, S3, etc. as options).

It is basically like having a P2P Firebase :) with IPFS plugin.

It's a shame that so many P2P advocates - or at least the ones using the stuff built with P2P tech - are very loud alt-right types.

I popped on notabug.io just now, and the chat is full of swastikas, racial slurs, and the most up-voted posts are primarily anti-gay misinformation.

I see you wrote GUN, which I'm sure was no small feat, and it looks like an impressive piece of technology. How do you feel about your tech being primarily used in this way?

A few months ago NAB didn't have this problem. :(

The guy actually built a P2P moderation tool (everyone has their own "glasses" that filter based on your policy).

notabug.io is supposed to be running a curated homepage (nab.cx is the uncurated one), but I think he said it broke a week ago when he was modularizing his code.

NAB usually isn't this bad; a lot of the people on it are anti-alt-right, but alt-righters can certainly drown them out.

But it does look like NAB has become more toxic over time. :(

Previously, I was pretty neutral "do what you want with it".

Now, I shifted to be more opinionated about what I want to see built on top.

Primarily, apps that spread Open Source economics through art (music, etc.), to draw a crowd of lovers/creators not haters/destroyers.

Thanks for your response. I hope NAB gets better again.

This is true of pretty much any tech that is anti-censorship in some way or another (anonymizing and/or decentralized).

But it's hardly surprising:

"The trouble with fighting for human freedom is that one spends most of one's time defending scoundrels. For it is against scoundrels that oppressive laws are first aimed, and oppression must be stopped at the beginning if it is to be stopped at all."

In this case, it means that you'll see people who are predominantly censored from other places already use this tech to avoid further censorship. In US right now, at least, that tends to be alt-right, white nationalists etc. For a more detailed take:


I just clicked on notabug and didn't see any swastikas or racial slurs. Are you lying?

You think somebody would do that? Just go on the internet and tell lies?

Warning: image contains racial slurs and swastikas: https://i.imgur.com/0kdxzVR.png

A little later in the chat, the developer chimes in and talks about some stuff. Then some people angrily accost them for censoring things. If the Nazis and the racists aren't censored, I don't want to think about the odious content that is.

Edit: Here's a thread from... yesterday, where a bunch of users are mad about the "censorship" of the developer hiding a swastika post from the front-page (not even deleting it or removing it from whatever their equivalent of a sub-reddit is): https://notabug.io/t/whatever/comments/509b9189ece85515671d3...

Well, to me these seem like adolescents trying to be funny and go against the mainstream. They post swastikas and use racial slurs because they know they shouldn't be doing that. They aren't really nazis, right? They don't even know what a nazi is.

You can call this "decentralized reddit" a bad place, as it really is, but you can't say it's because it's "full of right-wing people". These adolescents are not "right-wing people".

> They aren't really nazis, right?

At some point... is there a difference? If you find yourself in a group "ironically" screaming you all support X for long enough, soon you'll find that some of you actually support X. And that you enabled those people.

Well and besides, the actual content on the site seems to skew alt-right pretty heavily, whether the racial slurs and swastikas are ironic or not. Taken altogether, it paints a pretty clear picture of a voat-style forum that will alienate most other people if it stays that way.

They're there because they've been excluded from mainstream sites. It may seem extreme, but one way to assess how uncensorable something is, is by how much content it carries that you hate. Even stuff like child porn that any sane person would hate. So you just ignore it.

I get what you say. But it's arguable that most of those actual WWII Nazis didn't really know much about anything. I mean, watch Riefenstahl's "Triumph of the Will". There's stuff in there that reminds me a lot of Young Pioneer camps (not surprising, I know) and Woodstock (except for the lack of drugs). My point is that they were just belonging to something that seemed cool.

The sexual misinformation is on the front page. The racial slurs and swastikas are in the chat. Great look for a site.

I wonder what's changed for early adopters of new tech compared with the original internet in the early 90's? Early adoption was still demographically skewed towards certain groups, but I don't recall this brand of right-wing thought being so prominent.

Hey, very cool!

Somehow I forgot about it.

As you say:

> I think all censorship should be deplored. My position is that bits are not a bug.

> — Aaron Swartz (1986 - 2013)

Edit: But I like https://nab.cx/ better. Or at least, as an alternative.

That's what's got me excited - they've managed to articulate a vision for the future that I'm totally on board with: decentralized, privacy respecting, and user owned. I really want to see that vision become a reality.

it is already here! If you use a mix of SSB, GUN, DAT, IPFS, & WebTorrent.

SSB = Social-like P2P data.

GUN = Firebase-like P2P data.

SEA = End-to-end encryption. ( https://gun.eco/docs/SEA )

DAT = GIT-like P2P data.

IPFS = Images/assets.

WebTorrent = Video-like P2P data.

Are there example applications with some complexity on GUN?

Yes, D.Tube, The Internet Archive, Notabug.io (warning: it's getting spammed by alt-righters currently), etc.

Only saw your comment now, will reply on Twitter too - long time no see since #hashtheplanet !

It would be great to get case studies with some information on how they use the platform into the docs. GUN sounds reasonable from the description but it is really difficult to visualize using it in a larger app.

Case studies would be great; nobody is getting paid, though. What info are you looking for? I can try digging it up for you.

Notabug basically does everything Reddit does + more. Would that qualify as a larger app?

What in particular, what do you think would be difficult?

For the internet to be truly decentralized, it needs to be so at the physical connectivity layer as well.

Perhaps a worldwide swarm of drones creating a mesh network.

Solar powered drone planes would be nice... or maybe even a swarm of satellites will eventually be possible.

I'm picturing something more like bird-like or even moth-like drones, intelligently repositioning themselves to provide the widest coverage.

Satellites are too big of a target, and not very transparent; e.g. we wouldn't know if someone went up there and installed some snooping hardware. The same can be said about drones, but with proper swarms the chances of you connecting to a compromised drone would be lower.

I wonder how much Broadband-HamNet could scale in practice.

Would love to see Arch/Alpine Linux repo move to IPFS by default. Would also like to see better integration with Git, and an SCM platform comparable to GitHub (or GitLab). That could really get the developer community heavily involved in the project if it was sponsored by Protocol Labs.

Yeah! That's a thing we're working towards. We're currently looking into apt and npm, both efforts are coming along pretty well and driving development towards fixing the bottlenecks preventing it from "just working".

https://github.com/ipfs-shipyard/npm-on-ipfs https://github.com/ipfs-shipyard/apt-on-ipfs

In addition to apt and npm, I would like to see docker image distribution powered by IPFS. It really feels stupid to pull images from a central registry sitting on the other side of the globe when the image is already present in the next node in your kubernetes cluster.

There is a big issue: ipfs uses its own hash mechanism (a hash of the protobuf of the dag), while registries (and most other existing content-based distribution mechanisms) use sha256 hashes of the whole content.

eg see https://github.com/ipfs/notes/issues/269

So you can't simply interoperate between the two without some sort of lookup to convert between hash functions. It is hugely frustrating, as basically different content-based distribution mechanisms can't work together.

In theory docker image registries support pluggable hash functions, although it is not clear to me that the ipfs function is even very well defined outside its own code. We could start to add a second hash calculation to every registry operation, but it would be a performance hit which some users would not like.

(Tree-based content hashes that allow parallelisation are nice, but the ipfs one is very ipfs-specific and more complex than it needs to be, I think.)

If it's for your own cluster, Uber open-sourced a P2P Docker registry explicitly for use with clusters.


That would make many things much easier, and I would enthusiastically use it.

Discovery performance is the biggest issue I see. If I deliberately load the same file on a couple of peers, it can take hours (or forever) to be able to find a peer with that file to pin it. It is clumsy and difficult to explicitly connect to peers (because you can't just try to discover peers at an address, you need to include the node ID as well), and even if you manage to enter the right information, you won't necessarily succeed at connecting to the peer the first time.

It's nice to see package managers as #2, something I've been thinking about recently. I haven't looked much into this yet, but I wonder if IPNS could provide a step forward in supply-chain protection, since package signing isn't available yet in certain managers/repos, or isn't commonly utilized.

I love the idea of IPFS, but I can't think of a use case not covered by torrents.

Would someone mind enlightening me regarding what sets IPFS apart from torrents?

One practical difference is that all the parts of a "collection" are individually addressable in IPFS.

For example, unlike torrents, you can seed a collection like "My Web Show (All Seasons)" and add new files as new episodes become available. With torrents, you have to repackage them as new torrent files. IPFS also then encourages file canonicalization instead of everyone seeding their own copy of a file.

This is an oversimplification, I think. In IPFS, to add a file to a folder you need to rebuild that folder, which changes the address. You still need to get people to access the new address to see any updates. IPNS makes that fairly easy, but a similar technology could be made for bittorrent.

I think what makes IPFS interesting is that all files are like torrents and all folders are like torrents of torrents.

And since each torrent is a hash of the file underneath it, if 100 people individually add files or folders that contain identical chunks, then without explicitly doing anything they are also helping each other share those files.

In traditional torrents, the files are concatenated and only then divided into chunks. So if I take an existing torrent and add a single 16-byte file to the beginning, there is a good chance it will have no common hashes between the new and old one.
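That point is easy to demonstrate: with fixed-offset chunking of a concatenated stream, prepending 16 bytes shifts every chunk boundary, so essentially no piece hash survives. A toy sketch (illustrative, not the real BitTorrent piece format):

```python
import hashlib, os

CHUNK = 16 * 1024  # 16 KiB pieces, for illustration

def piece_hashes(data):
    # Hash fixed-offset pieces, the way classic BitTorrent hashes its piece list.
    return [hashlib.sha256(data[i:i + CHUNK]).hexdigest()
            for i in range(0, len(data), CHUNK)]

payload = os.urandom(512 * 1024)                   # the original "torrent" contents
old = piece_hashes(payload)
new = piece_hashes(b"16-byte-file!!!!" + payload)  # add a tiny file up front

shared = set(old) & set(new)  # pieces the two swarms could exchange: none
```

Every byte after the insertion lands at a different offset, so every fixed-size window hashes differently.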

Update: There is apparently a "BitTorrent v2" protocol [1][2], which hashes each file separately. It is still not implemented in major clients like libtorrent [3].

[1] http://bittorrent.org/beps/bep_0052.html

[2] https://news.ycombinator.com/item?id=14951728

[3] https://github.com/arvidn/libtorrent/issues/2197

Not quite. They would need to use the same file and the same chunker. Two people can add the same identical file and share zero hashes. Files are broken down into chunks, and those chunks are hashed into a merkle tree.
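A toy illustration of this: the same bytes chunked two different ways produce different leaves and a different root, so two nodes deduplicate only if they picked the same chunker. (The one-level hash-of-hashes below is a stand-in for IPFS's real Merkle DAG, not its actual format.)

```python
import hashlib

def toy_root(data, chunk_size):
    # Hash fixed-size chunks, then hash the concatenation of the chunk
    # hashes: a one-level stand-in for a real merkle tree.
    leaves = [hashlib.sha256(data[i:i + chunk_size]).digest()
              for i in range(0, len(data), chunk_size)]
    return hashlib.sha256(b"".join(leaves)).hexdigest()

data = b"the same identical file contents" * 20000  # ~640 KB of identical input
root_a = toy_root(data, 256 * 1024)  # one node's chunk size
root_b = toy_root(data, 128 * 1024)  # another node's chunk size
```

Same file, different chunk sizes: no leaf hash lines up, and the roots diverge.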

I was at a place that tried to distribute their files via bittorrent. I wasn't there for the initial implementation, but have dealt with it after it was used.

The data was immutable, so we didn't have that use-case. The tracker software we were using (one of the often used open source C++ ones) seemed to handle a couple hundred torrents just fine, but couldn't handle tens of thousands. Even if only a few were active. I'm not sure if it was excessive RAM or high CPU, but they built a wrapper tool to expire and re-add torrents as needed. I think technically it was limiting the number of seeds (from the central server) for different torrents.

There was also a lot of time/overhead in initiating a new download. This was exacerbated by the kludge mentioned above. Client would add the torrent, you would trigger a re-seed, then the client would wait awhile before checking again and finding the seed. Often this dance took much longer than the download itself.

Think of torrents that you can update: you have the magnet link for the one version you are downloading, but you also have the magnet link for the current version, so the uploader can update at any time and you will receive the update. And if both versions share some pieces, then people can share them across both torrents, and any other torrent that happens to have a piece with the same hash.

I would love to try that. Can you share a quick and easy breakdown of how I would publish that?

    ipfs add -r mydir/

(add a file to mydir/)

    ipfs add -r mydir/

That's it: two different hashes for different contents, but intelligently deduplicated, so you only need to download the diff if you already have the files from the former.

That's not much different than changing my files, making a new torrent file with a small blocksize and having users use that.

What is the benefit here?

That the old seeders can seed the new stuff without knowing about your new torrent.

Main feature is automatic data-sharing between distributions. With torrents, everything is siloed, and data is only exchanged between peers of that torrent. IPFS doesn't care /why/ you're getting information or the link you found it from, just that it can find it by its hash.

Say you distribute "Julie's Webcast Complete Series" and somebody else distributes "Julie's Webcast - Episode 3, with Russian subtitles," peers and seeders from both distributions can share data for the shared content. Similarly, updating a dataset only requires downloading the new data.

This is done automatically, with both per-file hashing and (optionally; I'm not sure of the current state) in-file block hashing.

> Julie's Webcast - Episode 3, with Russian subtitles

> peers and seeders from both distributions can share data for the shared content

So does IPFS have "plugins" for different archive/container formats so it can "see" that the underlying video/audio streams are identical between "Julie's Webcast - Episode 3.mp4" and "Julie's Webcast - Episode 3, with Russian subtitles.mkv"?

Otherwise container stream interleaving will play holy hell with any sort of "dumb" block hashing :(

Last I checked it was dumb. Possibly breaking block boundaries based on a rolling hash.


> go-ipfs-chunker provides the Splitter interface. IPFS splitters read data from a reader and create "chunks". These chunks are used to build the ipfs DAGs (Merkle Tree) and are the base unit to obtain the sums that ipfs uses to address content.

> The package provides a SizeSplitter which creates chunks of equal size and it is used by default in most cases, and a rabin fingerprint chunker. This chunker will attempt to split data in a way that the resulting blocks are the same when the data has repetitive patterns, thus optimizing the resulting DAGs.

I think they should use the rolling hash based chunking by default
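For the curious, here is a crude content-defined chunker in the same spirit (a toy stand-in for the Rabin chunker, not go-ipfs-chunker itself): a rolling value over recent bytes picks the cut points, so an insertion only disturbs chunks near the edit and the rest re-align.

```python
import hashlib, random

MIN_SIZE, MASK = 16, 0x3FF  # cut when the low 10 bits are zero: ~1 KiB average chunks

def cdc_chunks(data):
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF  # toy rolling value over recent bytes
        if i - start >= MIN_SIZE and (h & MASK) == 0:
            chunks.append(hashlib.sha256(data[start:i + 1]).hexdigest())
            start = i + 1
    if start < len(data):
        chunks.append(hashlib.sha256(data[start:]).hexdigest())
    return chunks

rng = random.Random(0)
payload = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
old = cdc_chunks(payload)
new = cdc_chunks(b"inserted!" + payload)  # 9 bytes prepended this time
shared = set(old) & set(new)              # almost all chunk hashes still match
```

Because the boundary test only looks at recent bytes, the cut points downstream of the edit land on the same content, unlike the fixed-offset case where one prepended file invalidates everything.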


this is an implementation detail of the DHT client. if you have enough cooperating bit torrent clients set up to seed a sparse swarm like IPFS does, you could do the same thing.

which begs the question, why fork the DHT in the first place? there are BEP drafts that cover all of the features that IPFS (and DAT for that matter) bring to the table.

my guess: there isn't a lot of money in making yet another bit torrent client.

Yeah but you can't add an episode to the torrent later and have all existing peers seed the old episodes in the new torrent automatically.

you can only do that with IPFS keys that are aliased with IPNS which is equivalent to a BEP-46 mutable DHT key.



One of the biggest challenges with IPFS in my mind is the lack of a story around how to delete content.

There may be a variety of reasons to delete things,

- Old packages that you simply don't want to version (think npm or pip)

- Content that is pirated or proprietary or offensive that needs to be removed from the system

But in its current avatar, there isn't an easy way for you to delete data from other people's IPFS hosts in case they choose to host your data. You can delete it from your own. There are solutions proposed with IPNS and pinning etc - but they don't really seem feasible to me last I looked around.

This list, as @fwip said, is great as a wishlist - but I would love to see this roadmap also address some of the things needed to make this a much more usable system.

> But in its current avatar, there isn't an easy way for you to delete data from other people's IPFS hosts in case they choose to host your data.

If you put it on IPFS, it's not "your data" any longer. If that doesn't work for you, then don't use IPFS.

Edit: I do get why people are concerned about persistence of bad stuff. But it's not at all unique to IPFS. And even IPFS forgets stuff that nobody is serving. I mean, try to find these files that I uploaded a few years ago: https://ipfs.io/ipfs/QmUDV2KHrAgs84oUc7z9zQmZ3whx1NB6YDPv8ZR... and https://ipfs.io/ipfs/QmSp8p6d3Gxxq1mCVG85jFHMax8pSBzdAyBL2jZ.... As far as I can tell, they're just gone.

If IPFS becomes prominent enough that governments have to look at it, and it allows people to bypass laws, they will try to forbid running nodes by law, the same way some countries do with Tor.

Try and succeed are different things. The larger the use case set, the tighter the integration with everything else, the stronger the reliance on it, the harder it will be to outlaw it. Often the laws drift ever-so-slightly to accommodate the new reality.

> Old packages that you simply don’t want to version

It’s important to think of IPFS as a way to share using content hashes - essentially file fingerprints - as URLs. Every bit of information added is inherently and permanently versioned.

This is a tremendous asset in many ways, for example de-duplication is free. But once a file has been added and copied to another host, any person with the fingerprint can find it again.

While IPFS systematically exacerbates the meaningful problems around deletion that you describe, they are not unique. Once information is put out in the world, it’s hard to hide it.

> It’s important to think of IPFS as a way to share using content hashes - essentially file fingerprints - as URLs.

That's not at all unique to IPFS though - in fact, this is what the ni:// (Named Information) schema is supposed to be used for https://tools.ietf.org/html/rfc6920

(Depending on whether the hashes being used are properly filed with the NI IANA Registry, some IPFS paths might already be interconvertible with the proper ni:// format, though with some caveats. sha256 hashes are definitely supported in both, though ni:// does not use the custom BASE58BTC encoding found in ipfs paths. Moreover, ni:// does not standardize support for file-level paths as found in ipfs, but does support Content-Type, which ipfs seems to leave unspecified. Files larger than 256k in IPFS are a whole other can of worms, however, as you apparently lose the ability to look up content by the sha256 hash of the whole content, and thus to properly interoperate with other mechanisms.)
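For reference, building an ni name per RFC 6920 is just the hash algorithm label followed by the base64url-encoded digest with the padding stripped; a minimal sketch:

```python
import base64, hashlib

def ni_uri(data, authority=""):
    # RFC 6920 shape: ni://<authority>/<alg>;<base64url digest, no padding>
    digest = hashlib.sha256(data).digest()
    b64 = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return f"ni://{authority}/sha-256;{b64}"
```

ni_uri(b"Hello World!") should reproduce the sha-256 example name given in the RFC itself.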

Also, nitpicking but a content hash defines a URI not merely a URL, since its use is not restricted to looking up resources over a network.

Managing universal data removal is not universally solved (or even wanted) at internet scale. So it sounds weird to demand it from a technology which is trying to solve a completely different problem.

A best-effort deletion could be beneficial for any node. It reduces storage requirements a bit.

A way to mark something as deleted, in the form of another IPFS object, could serve as a soft delete, and also as permission to actually delete the chunks of the object marked as deleted.

This, of course, is not secure deletion, and should not be.

>A best-effort deletion could be beneficial for any node. It reduces storage requirements a bit.

As far as I'm aware, any node is free to delete its own data. In fact, isn't it only storing data at the user's explicit request in the first place? It just can't do anything about what other nodes choose to keep or delete. If you're referring to a particular node wanting to do a best-effort deletion of data on other nodes, it's not clear to me why node A cares about reducing storage on nodes B, C and D (if you own those nodes, delete it yourself; if you don't... then I don't know what you're up to).

The same argument applies to self-driving cars: crashing into objects on the road is not a problem, because they are trying to solve an entirely different problem.

What you actually want is to break the universe. It is physically impossible to revoke information unless it happens by a strange coincidence. 24x7, you emit information that races away at the speed of light. You can't chase it down. Physically.

On the other hand, if you can decide which information stays, you essentially own the system.

So, given the project's mission, I guess it is a requirement that information stays online as long as someone somewhere is willing to keep it.

I doubt there will ever be a way to delete content, as every legitimate method of deletion will be commandeered for censorship. Even if they did add something, you could never really know the other nodes actually deleted it.

Sometimes, censorship is good. For a silly example, if somebody somehow filled this comment section with images of goatse, it would be nice if we could take that down.

I definitely agree from a technical level, you can't ever guarantee the deletion of files on somebody else's machine.

So the question is, how do we build systems that enable users to protect their communities, without them becoming yet another tool for abuse?

I don't see how that would be a much different problem to now.

If I put goatse in this comment, others can see it. Then mods will remove it.

If it were on IPFS, the same would happen except the old version may still be accessible if people are still distributing it.

You'd only come across it if you were deliberately looking at an older version.

Why shouldn't that be handled at the application level? Just like with git, if there's content you don't want anymore, you orphan the block hashes in whatever structure the application uses to store and display content; IPFS nodes could still store the abusive comments or media, but nobody would find it unless they were looking for it or randomly fetching blocks.

If the ipfs-based application doesn't have some means (by a group of administrators, or by community consensus) to orphan (de-link) user-contributed content that's abusive, then that app needs to be improved. It doesn't seem like an IPFS-protocol-layer problem.

You could add another layer on top of IPFS, something like IPFS-O (ownership) or IPFS-C (censorship), which used either a separate DHT or a centralized service to allow people to register never-before-seen block hashes by generating a keypair and uploading a signature of the block hash. If the content later needed to be removed, the signature could include contact information, and the signer could be appealed to (or sent a court order) to sign a removal message.

It wouldn't really work, though. Nobody would run IPFS nodes paying attention to such a meta-service. And if the service were centralized, there would be grave concern over censorship by whatever entity ran it. And people could abuse the service by registering data blocks that haven't shown up in IPFS yet, claiming ownership they don't really have. You'd have to go to the courts to resolve that; the courts could issue an order to a centralized operator, but again, nobody would run IPFS nodes respecting such a centralized censorship-enabling service, so the whole thing would be futile.

And if the censorship service were decentralized, it could be sabotaged by enough libertarian nodes refusing to store or pass along removal messages.

> For a silly example, if somebody somehow filled this comment section with images of goatse, it would be nice if we could take that down.

So maybe don't build a Hacker News clone on top of IPFS? It's not meant to be a solution to every problem.

If soft delete is ok there is no problem. But should it be part of protocol or application layer is another question.

Removing comments from a comment page is not deleting content, it's changing it.

This page would be an IPNS address under the admin's control which would point to some IPFS hash representing the current goatse-containing state of the page. The admin would then create a new page, which would get a new IPFS hash (since it's new content) and point the IPNS address to it.

As it turns out, you cannot ever really delete information, you can simply change where your "well known" pointers point to.
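A toy model of that pointer-repointing, under the (simplifying) assumption of an in-memory content store standing in for IPFS and a mutable name table standing in for IPNS:

```python
import hashlib

store: dict[str, bytes] = {}  # immutable, content-addressed blocks (like IPFS)
names: dict[str, str] = {}    # mutable name -> hash pointers (like IPNS)

def add(data: bytes) -> str:
    h = hashlib.sha256(data).hexdigest()
    store[h] = data  # content is immutable once added
    return h

def publish(name: str, content_hash: str) -> None:
    names[name] = content_hash  # only the pointer changes

v1 = add(b"comment page, now with goatse")
publish("my-page", v1)

v2 = add(b"moderated comment page")
publish("my-page", v2)  # "deletion" = repointing the well-known name

assert names["my-page"] == v2  # readers following the name see v2
assert v1 in store             # ...but v1 still exists for anyone holding its hash
```

The real IPNS record is signed by the publisher's key rather than stored in a plain dict, but the moderation story is the same: nothing is erased, the default entry point just stops leading to it.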

> For a silly example, if somebody somehow filled this comment section with images of goatse, it would be nice if we could take that down.

Or you could just not look at it.

> One of the biggest challenges with IPFS in my mind is the lack of a story around how to delete content.

"One of the biggest challenges with [HTTP|ZFS|TLS|USB|ATX|VHS|USPS] in my mind is the lack of a story around how to delete content."

If you delete your copy of some data, someone else may still have theirs, but then it's them who controls whether to delete it. It's not a challenge for Serial ATA that it doesn't have a function to delete certain data from every hard drive in the world at the same time. Most systems don't work that way, not least because it's inherently dangerous.

The inability to delete things is a feature of IPFS.

Do you want yet another way of serving content that is subject to censorship and ridiculous content takedown policies?

That's not an issue, that's a feature!

Just watch how fast warez people will start to use IPFS to host all seasons and all episodes of Friends.

Why not encrypt it? That way it's junk data to anyone who can't decrypt it.

What happens when that encryption method becomes obsolete and broken? Now this private encrypted data is accessible all over the world.

Data should never be deleted. Ever.

Start by accepting that, and everything starts to make sense.

Child porn? Video of you in the bathroom that someone took from a hidden camera? Your stolen financial information?

Sure, of course.

But the problem is that if you can delete that stuff, someone else can delete stuff that you don't want to be deleted.

If you don't want something deleted, you can always keep a copy of it yourself.

Exactly. Which means you can't delete your bathroom video from any networked system. See the Streisand effect.

That is _literally_ what we already have. If you delete (unpin) content, others are free to keep it (pinned).

The OP was suggesting a way to delete things from other peoples machines.

Yes, but only if you manage that before it's gone.

No. It's unreasonable to expect they could be deleted.

That’s just a dogmatic statement, you need to motivate it.

It's unreasonable to expect to be able to delete data that's been released publicly. Any attempt to delete arbitrary data will either fail or involve extreme authoritarian measures.

Once you accept that, you can focus on reducing the output of compromising information, rather than trying to erase it after the fact. Prevention over cure. This will inevitably lead to a society where people do more of what society wants, and less of what society doesn't. This is a good thing.

Also, I feel like we don't collect as much information as we should. Analytics is a lot less comprehensive than it should be. Other than CCTVs, real-world analytics is basically non-existent. This greatly inhibits progress in AI/ML.

How does “should never be” follow from “difficult in practice”?

I have a question for the IPFS people. I am a non-techy who really likes the IPFS idea and wants to see it succeed.

However, whenever this topic comes up here at HN, we get a bunch of people who say they tried to use it but it was basically unworkable, like too much RAM usage and various sorts of failures. And rarely does anyone respond by saying that it is working just fine for them.

So my question to the IPFS people is, when is it going to get really usable? I am asking for something reasonably specific, like 2 or 3 years, or what? And I am supposing that would mean a different promise/prediction for each main different use case. So how about some answers, not just "We are aware of those problems and are working on them"

As you seem to foresee - being "really usable" depends on the use case. The one we're focused on this year is package managers - and making IPFS work really well for that use case in particular. There is lots of room for improvement on performance and usability - setting the package managers goal gives us really a specific target to focus and deliver on. This won't solve "all the problems" (there's a lot to solve for package managers alone!) - but will help us take a big step forward in production readiness and hopefully knock out a swath of performance issues experienced by everyone.

I've been considering Swarm distributed file system because of its closeness with the Ethereum development.

It seems to do the same thing and works already but hardly gets any press. IPFS and the Protocol Lab's Filecoin sale seemed to generate a lot of marketing despite it becoming clearer later that Filecoin is for an unrelated incentivized network.

It is hard to understand the pros and cons of choosing IPFS over Swarm, or where they are in their comparative development cycles.

I know many decentralized applications that opt for IPFS for their storage component, and know of the libraries to help different software stacks with that. But I can't tell if it is right for me, versus the state of Swarm.

Swarm and IPFS together with Filecoin try to address the same problem - persistent data storage in a decentralised network.

Swarm is not at all "working already" - the incentivisation layer for nodes to store data for other users is not implemented and currently mostly theoretical and work-in-progress.

IPFS is more mature in comparison to Swarm, but the underlying architecture is rather different.

What is Swarm's intended incentivisation layer and where can I read about their plans? It seems like all documentation including plans are outdated, and I was ignored in their gitter chatroom where devs wanted to talk about dev things and outreach people seemed nonexistent.

I see things being stored on Swarm without incentives, like plain text.

Swarm documentation might not be perfect, but it is not outdated - https://swarm-guide.readthedocs.io

I believe the chapters about PSS, Swarm Feeds, ENS, Architecture, among others, are mostly up-to-date.

You can read about the incentivisation layer at https://swarm-gateways.net/bzz:/theswarm.eth/ethersphere/ora...

Currently incentivisation is not integrated or implemented in Swarm, so a user has no guarantees about what happens with their uploaded content. If the node hosting it disconnects from the network, it will be gone. The plans to address this are through the sw^3 protocols suite and/or erasure coding.

Regarding plain text - it doesn't really matter what bytes you store in Swarm - encryption is implemented and you can store non-encrypted or encrypted bytes, this has nothing to do with incentives for persistent storage.

We try to do outreach and answer community questions when possible, but the team is not big and this is currently done on a best-effort base, we could definitely improve on that front, I agree.

Does Swarm’s closeness with Ethereum mean it will be unsuitable for any tasks which do not rely on financial incentives?

For example, if in the future Debian is moved to IPFS, then many organizations are likely to run local IPFS servers with the Debian repos pinned. But if Debian is moved to Swarm, I do not think that many organizations will be incentivized - the money is insignificant in the total spending, while the engineering effort and organizational (finance) overhead is likely to be very big.

Why do you think that it would be different?

"If Debian is moved to Swarm, then many organizations are likely to run local Swarm servers with Debian repos pinned"

Because from what I understand, Swarm requires Ethereum and a constant flow of money, and the monthly amount is hard to predict. In many places I've worked, anything money-related has huge overhead from the finance department, and something more complex than “$X/mo” will require an immense amount of coordination and permissions.

What's the difference between DAT and IPFS? I'm trying to understand all these new technology with a grand aspiration to replace the current infrastructure.

https://ipfs.io/ https://datproject.org/ https://beakerbrowser.com/

So I can store a file in ipfs by its hash, but there’s no way to link to the next version of the file. I can only link to older versions?

I’m a giant advocate for decentralized architectures but so far I’ve never found a use for it that doesn’t rely on a centralized way to find out about new data

I'm not super familiar with ipfs, but I _think_ ipns is supposed to solve that problem.

> Inter-Planetary Name System (IPNS) is a system for creating and updating mutable links to IPFS content. Since objects in IPFS are content-addressed, their address changes every time their content does. That’s useful for a variety of things, but it makes it hard to get the latest version of something.


It doesn't have to be 100% decentralized to be useful though. A central server could be in charge of linking the hash to the latest ipfs directory of files. (Or they could use ipns, but let's ignore it for a moment.) But then all of the files are available from a decentralized cloud of users, from everyone running ipfs and hosting the content. And if the central server giving the latest hash ever goes down, people will still be hosting the files and people can find the latest directory hash elsewhere.

A file in ipfs is stored/addressed by a hash of its content.

If you know the hash of the next version and put that in the file, then that hash will affect the hash/address of the file itself.

But the hash of the next version depends on the hash of the _next_ next version, and so on out to infinity... and you almost certainly don't know all those hashes, so you can't compute the hash of the next version, so you can't compute the hash of the current version, if it must contain the hash of the next version.
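That circularity can be shown in a few lines (plain sha256 as a stand-in for IPFS addressing): a backward link is trivial, because the previous version's address already exists, but attempting to embed a forward link changes the object's own address and so invalidates itself.

```python
import hashlib

def addr(data: bytes) -> str:
    # Content address = digest of the bytes themselves.
    return hashlib.sha256(data).hexdigest()

# Backward links work: v2 can embed v1's address because v1 is already final.
v1 = b"version 1"
v2 = b"version 2, previous=" + addr(v1).encode()
h1 = addr(v1)

# A forward link is circular: to embed addr(v2) in v1 we must rewrite v1's
# bytes, which changes addr(v1), which changes v2, which changes addr(v2)...
v1_with_link = v1 + b", next=" + addr(v2).encode()
assert addr(v1_with_link) != h1  # the "linked" v1 is a different object entirely
```

Hence the need for a mutable layer (IPNS, or some centralized index) to point from a stable name to whatever the latest immutable hash happens to be.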

This is exciting. I would love to make it to the 1st IPFS Camp in June.

IPFS is a joke. They have a name lookup feature, but it relies on traditional DNS! What are they thinking?

Also, if IPFS's idea of working as a local server were sound, BitTorrent DNA (a browser plugin streaming video over BitTorrent) should have worked.

It seems to me they suffered from NIH syndrome and tried to reinvent the wheel. P2P file transfer over IP is already covered by BitTorrent. What we need is a nice front end which uses the BitTorrent protocol as a back end and offers the illusion of a Web site.

> IPFS is a joke. They have name lookup feature but relies on traditional DNS!

This is not a criticism. That's describing a feature. Yes, they do have that. You could implement your own name resolution in a different way if you need that.

The real innovation is making files content-addressable.

Don’t magnet links solve that for torrents?

Hate to break it to you, but Freenet has been doing that since the year 2000.

It's amazing how many things that are popular now are simply re-discovery of Freenet features.

On the top of that when you ask them about it the answer usually is:

- but this is decentralized

- it is not a bug|lack of implementation but a feature
