This is also, in my opinion, the biggest objection to IPFS - that it really doesn't necessarily lead to any kind of true decentralized hosting unless someone else has decided to pin your files.
And why would they? I understand that IPFS is supposed to prioritize connections based on seed/leech ratio like a typical torrent service, but I only torrent a few files at any given time and it's pretty trivial to set them to seed or disable manually; there's no way I'm going to make seed vs. don't seed decisions on every website I visit. So unless for some reason I specifically think to help share some webpage or other, it'll just get auto-deleted from my machine when it cycles out of the cache, as the automated stuff is what'll be keeping track of maintaining my seed/leech ratio, total disk usage for other people's content, upload limits, not leaving if you're one of the only seeders left, etc. In theory, IPFS could be even more prone to link-rot than the vanilla Web, depending on how many people try and actually depend on the decentralized hosting and end up having their files vanish once it's been a long enough time that nobody's still hosting them.
And when there's so many different webpages, as opposed to just a few torrents, why would I think to pin any one thing in particular? Torrents can work based on charity and the need to maintain a particular seed/leech ratio; but I don't have the mental energy to bother deciding whether to be charitable about every dang website I visit.
So this ends up meaning that IPFS works for popular, recent content, where there's enough people who have downloaded the content themselves recently enough that it's still in their cache to meaningfully take the load off the original host in serving that content. But you're not going to get dedicated long-term seeders of any particular site the way you do with highly-desirable files like pirate torrents. But generically, you will always need your own centralized server for any content you want to upload and make sure stays online long-term.
As I understand it, this is sort of the problem Filecoin is trying to solve, but that has its own issues (it's hard to see how paying people to host your stuff on their own machines can ever be cost-competitive with paying AWS to do it).
> it really doesn't necessarily lead to any kind of true decentralized <data storage> unless someone else has decided to <store your data>
You might have an overly romantic idea of decentralization that doesn't necessarily align with the actual definition of decentralization. I would even argue that there is no solution for the idea you're suggesting. You can't have data that isn't stored anywhere.
The permanent bit in ipfs is actually referring to something else: The data isn't guaranteed to be available at all times, but the link is guaranteed to point to the correct data. Slightly anecdotal, but a while ago there was a discussion about a pdf in #cjdns. Somebody had an ipfs link, but nobody was seeding it anymore. A few hours later somebody dug a pdf out of an old archive but wasn't sure whether it was the correct one, so we ran `ipfs add file.pdf` and the ipfs link started working again.
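To make the "link always points to the correct data" idea concrete, here is a minimal sketch of content addressing. It is a toy, not how IPFS actually computes identifiers: real IPFS links are multihash-based CIDs over chunked data, whereas this assumes a plain SHA-256 of the whole file. `content_address` and `ToyStore` are made-up names for illustration.

```python
import hashlib
from typing import Dict, Optional


def content_address(data: bytes) -> str:
    """Toy content address: hex SHA-256 of the raw bytes (assumption, not a real CID)."""
    return hashlib.sha256(data).hexdigest()


class ToyStore:
    """In-memory stand-in for 'whoever is currently seeding'."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}

    def add(self, data: bytes) -> str:
        # The address is derived from the bytes, never chosen by the uploader.
        address = content_address(data)
        self._blocks[address] = data
        return address

    def drop(self, address: str) -> None:
        self._blocks.pop(address, None)

    def get(self, address: str) -> Optional[bytes]:
        return self._blocks.get(address)


store = ToyStore()
link = store.add(b"%PDF-1.4 example bytes")

store.drop(link)                     # nobody seeds it anymore
assert store.get(link) is None       # the link is dead, but still names the same bytes

# Someone digs a candidate file out of an old archive and re-adds it.
recovered = b"%PDF-1.4 example bytes"
assert store.add(recovered) == link  # identical bytes give the identical address
assert store.get(link) == recovered  # so the old link resolves again
```

A file that differed by even one byte would get a different address, which is how re-adding also answered the "is this the correct pdf?" question.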
So the file was lost until it could be recovered from a different source, and IPFS served the function of a glorified hash database. That's something very different from what it promises.
Bittorrent has the same mechanics with magnet URNs, but you never see them promising permanent storage.
To be fair, IPFS does not "promise permanent storage" either.
I have torrent files that point to the correct data, but the data is impossible to obtain because it isn't being seeded. But just like in IPFS, if I have the correct file I can seed it again simply by adding it to the torrent job in my client.
And then someone will have to put in the money to pin/host the content anyway if you want to keep it online.
Link rot is solvable with tools such as WARC Proxies.
You think these 20 people will stay offline forever?
Torrents die all the time because nobody bothers to donate bandwidth to strangers for content they barely care about.
How is IPFS different?
I am aware that IPFS does not actually do this, but a lot of people talk about it like it does, like IPFS means you can have websites without really needing to have/pay for any server yourself at all. (Which is, yes, technically sort of true, but only if visitors to your site reliably pin its content.)
I was thinking about that content effectively "linkrotting" away if nobody visits it for a while, say it's old posts on a blog or something. I suppose how big of an issue this really is comes down to how long stuff would tend to stay in that cache in a real, practical usage scenario. I guess I'd been assuming that it'd only be a day or so, but it could be longer in practice, depending on how much space is actually allocated to the local cache and how much each individual bit of unique content takes up and how many such unique chunks of content will be downloaded per day, along with maybe some more complex factors besides first-in-first-out for deciding what to collect.
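As a back-of-the-envelope check on that retention time, here is a small sketch for the pure first-in-first-out case. The function name and every number in the example (1 GB of cache, 200 items a day, 500 KB each) are assumptions picked for illustration, not measurements of any real IPFS node.

```python
def fifo_retention_days(cache_bytes: float, items_per_day: float, avg_item_bytes: float) -> float:
    """Rough time an item survives in a FIFO cache before being pushed out.

    Assumes a steady inflow of unique content and no re-requests that would
    refresh an item's position in the queue.
    """
    daily_bytes = items_per_day * avg_item_bytes
    if daily_bytes <= 0:
        raise ValueError("daily inflow must be positive")
    return cache_bytes / daily_bytes


# Hypothetical user: 1 GB cache, 200 new items a day at 500 KB each.
days = fifo_retention_days(10**9, 200, 500_000)
print(f"content survives roughly {days:.0f} days")
```

Under those made-up numbers the answer is closer to a week or two than to a day, but it scales linearly with cache size and inversely with browsing volume, so a heavy user with a small cache really could cycle content out within a day.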
(I elaborated a little more about this below, when I thought about it kinda working like an automated torrent manager that made sure you maintained a decent seed ratio, didn't exceed an upload data cap, etc. and shuffled stuff out when it was taking up too much space or had hit a target ratio and wasn't going to be seeded anymore, with some prioritization for "is there anyone else seeding this", "is this a highly-demanded bit of content", etc.)
Plus, even if almost nothing is going to really be truly decentralized, it could make hosting a site a heck of a lot easier because the traffic you personally have to deal with may be drastically reduced, as popular content gets uploaded to the first users who can then share with later users. You only have to be the exclusive host of content that's accessed more rarely than the period it'd typically stick around in the cache of the last person to ask for it (depending on whether or not that last person is currently online).
(It's easy to imagine a case where short-term decentralized hosting from user caches could be almost entirely sufficient - say, an IPFS-based version of 4chan, where threads are inherently short-lived and temporary objects anyway, and only in the very slowest and least-populated boards would you have threads that are checked so infrequently that you couldn't rely on the currently online user caches to contain that content. You'd still need a central server to do things like spam filtering, user authentication (to make bans stick), and updating the index of which file hashes and post hashes are in which threads and in which order, and which threads are currently alive on each board.)
For my part, I think it's more romantic that individuals in the network might affirmatively choose to pin the content that is meaningful to them.
If you want to host your website on IPFS, pop it up on a domain through Cloudflare and set up a bid for storage space.
(There's been some progress on getting proof-of-replication working, but it's still early stages and I haven't seen anything on proof-of-spacetime.)
Also, the economics of FileCoin don't make a lot of sense ... https://blog.dshr.org/2018/06/the-four-most-expensive-words-...
I just dug it up and they say Proof of Replication is going great, but they don't show anything. They promise to open the code in the 'coming months', so people might see what progress has been made, if any.
No progress was mentioned for Proof of Spacetime.
Edit: The link to their update, the only substantial one since the ICO last year I think... https://filecoin.io/blog/update-2018-q1-q2/
They wrote a paper describing a plausible system that - if it existed - could do a thing. They've been much less successful actually finding a way to implement it.
You can probably do it by mixing the data with some extra data so that replicated copies actually look different. Then you have to do proof of possession on each replica, and you don't even know they are replicas.
GP suggests making each copy unique. It seems to me that the difficult part is making it cheap for uploaders, verifiers and downloaders to translate between the original data and the various unique copies, without also making it cheap for storers to do so (otherwise they could just store the original data and generate parts of the copies on the fly when challenged).
A similar problem arises in memory-hard functions used for password hashing, such as scrypt and Argon2. Those functions are designed to ensure that you have to use a large amount of memory to compute the function - or at least, to ensure that a space/time tradeoff that allows you to use a smaller amount of memory is very expensive. I wonder if techniques from memory-hard functions could be useful in proof of (unique) storage?
With ipfs://, on the other hand, what I hear most frequently is hand-waving away problems like "who will pin unpopular content" or "how far does it scale" or "who will pay for gateway bandwidth once it gets big" or even "how to make content easily update" (I don't consider IPNS a good solution to that question).
I blogged about my experience with dat a couple of months ago. Here's a link if anyone's interested. https://hannuhartikainen.fi/blog/dat-site/
Since then I've used dat for copying files a couple of times. I haven't really browsed the dat:// web and I'd guess nobody has visited my website over that protocol (but then I don't have analytics and estimate my blog to have a dozen visitors a month).
Electric cars provide the capability to power a car with renewable energy. Powering the grid with renewable energy is a separate problem.
IPFS provides the capability of distribution. Incentivizing pinning is indeed a separate problem, but providing this capability feels like a huge step forward to me.
To be fair, some of this is automated. If you browse using your own node, it does maintain a small cache, which helps with the load distribution when something gets really popular. This blog post is a good example; despite literally being the #1 item on HN, it's still up; that's the global IPFS cache at work. It's slower than the average site, sure, but it didn't outright vanish, even though the author clearly is using minimal resources to actually host it. I think that's awesome.
I thought the last 20 years on the Internet have shown us that "humanity is noble, and good" isn't true, and we should assume "evil and/or lazy".
So yes, they will "unpin any site as soon as it becomes a resource drain" unless they're feeling particularly charitable - but that's exactly how it works with torrents, too! Generally you set up rules like "seed until a maximum up/down ratio is reached for that file, then don't bother", "restrict uploads to a certain maximum speed and/or certain maximum amount of data per day", that kind of thing. I figure you'd basically have an automatically-managed cache of limited size that you'd use to hold the stuff you were liking-to-pin, and stuff that was judged to be no longer worth seeding would get kicked off, as would old/unpopular content if the cache had filled up and you were shuffling in new content.
So unpinning anything that becomes a significant resource drain is also - at least in the relatively-short-to-medium-term - fully compatible with other people being willing to "like" your site to pin it, if not perpetually, then for an extended period of time.
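The torrent-style rules described above (stop seeding past a target ratio, then shuffle out the oldest content when the cache is full) could be sketched roughly like this. This is not how any IPFS client actually manages its cache; `CachedItem`, `choose_evictions`, the ratio of 2.0 and the "least recently requested goes first" rule are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CachedItem:
    name: str
    size_bytes: int
    uploaded_bytes: int = 0
    last_requested: float = 0.0  # timestamp in seconds
    pinned: bool = False         # explicitly "liked" content is never auto-evicted

    @property
    def ratio(self) -> float:
        """Bytes uploaded to others divided by the item's own size."""
        return self.uploaded_bytes / self.size_bytes if self.size_bytes else 0.0


def choose_evictions(items: List[CachedItem], capacity_bytes: int, max_ratio: float = 2.0) -> List[str]:
    """Return names of items to drop, in the order they would be dropped."""
    evicted: List[str] = []
    kept: List[CachedItem] = []

    # Rule 1: unpinned items that already hit the target ratio are done seeding.
    for item in items:
        if not item.pinned and item.ratio >= max_ratio:
            evicted.append(item.name)
        else:
            kept.append(item)

    # Rule 2: if still over capacity, drop least recently requested unpinned items.
    used = sum(item.size_bytes for item in kept)
    unpinned = sorted((i for i in kept if not i.pinned), key=lambda i: i.last_requested)
    for item in unpinned:
        if used <= capacity_bytes:
            break
        evicted.append(item.name)
        used -= item.size_bytes

    return evicted
```

A fuller version would presumably also weigh "is anyone else seeding this" and daily upload caps, which this sketch leaves out.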
so you're disincentivized from using the service
It's a huge oversimplification to describe people in just those terms. People are complicated and so are their motivations, and it's hard to tell in advance which new things will be successful or not.
For example, I once worked for a guy who hadn't heard of open source and, when I described the concept to him, couldn't wrap his brain around why so many people would take the time to write and maintain a bunch of software and then just give it away. This is not the work product of "evil and/or lazy" people.
We know these two things happen, consistently. We have failed to account for them in much of the foundations of the Internet - it's the assumption of benevolent cooperation. Which was great for ARPAnet, but simply doesn't work in the wild. We have 20+ years of proof.
We cannot continue to put our heads in the sand and ignore that, because for better or worse, the Internet has become a major force shaping society.
The idea that naivete translates into mental health is certainly fascinating, but I believe the colloquial translation for that is "ignorance is bliss".
If most people use smartphones to consume content on metered internet connections with limited battery life, where do they pin it such that it's available to others?
(around 65% of Americans between 18 and 50 have PC/Mac laptops- see https://www.statista.com/statistics/228589/notebook-or-lapto...)
Here's a graph of average hours of computer usage per US household in 2009:
Let's assume these figures are useful as a first approximation of the number of hours of laptop usage per laptop owner in 2018. I'm not saying it's a perfect match, just that they're likely to be better than figures you or I would pull out of the air.
Taking the centre of each band, assuming "more than 10 hours" means "10 to 16 hours", and excluding the "no computers" band, gives a mean of 4.3 hours per day. So each laptop is active roughly 18% of the time.
That's more than I expected, but it still means you're going to need a lot of people to pin the content before you have a 99% chance of at least one copy being online at any given time.
Edit: As far as I can tell, with 18% uptime you need about 24 copies for 99% reliability, since 0.82^n only drops below 0.01 at n = 24 (assuming no correlation between the online times of the various copies, which is optimistic - in reality there will be strong daily and weekly cycles).
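For anyone who wants to redo the arithmetic: if each copy is online independently with probability p, the chance that at least one of n copies is online is 1 - (1 - p)^n. A quick sketch, with function names of my own choosing and the same (optimistic) independence assumption:

```python
import math


def availability(uptime: float, copies: int) -> float:
    """Probability that at least one of `copies` independent hosts is online."""
    return 1.0 - (1.0 - uptime) ** copies


def copies_needed(uptime: float, target: float) -> int:
    """Smallest number of independent copies whose availability reaches `target`."""
    if not (0.0 < uptime < 1.0 and 0.0 < target < 1.0):
        raise ValueError("uptime and target must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - uptime))


if __name__ == "__main__":
    for n in (5, 15, 24):
        print(n, "copies:", round(availability(0.18, n), 3))
    print("copies for 99%:", copies_needed(0.18, 0.99))
```

With p = 0.18 this gives 24 copies for a 99% target, while 15 copies lands at roughly 95%. Correlated sleep schedules would push the real number higher.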
Having said that, I think IPFS sounds like a really good idea -- oddly, it's a lot like what I envisioned in a science fiction novel, where the notion of data being replicated across hundreds of storage nodes is so taken for granted that the main character has trouble conceiving of the notion of data that has a "location" (that is, is stored on only one device). But to get there, it needs to be largely content-agnostic: if the data is out there on the network, then it's replicated across those hundreds of storage nodes, regardless of popularity. IPFS proves that technology is basically already here in theory -- but in practice, I'm not sure it's feasible in terms of storage costs/requirements yet.
You could still get a 'hug of death', but as the host you can just sit back and let the infrastructure work, distribute and recover, much like being the first seeder of a torrent. I predict this is unlikely in real-world situations though: for someone to share your site they'd likely have visited it first, so your content host is unlikely to be the only host. I'm an optimist, mostly because I think it's stupid where we are at concerning self-hosting from our homes.
The counterargument would be: there are cheaper ways to mine bitcoin than on aws infra. If you can incentivize that with money, you can incentivize storage with filecoin.
You also get censorship resilience (and maybe even better distribution if all goes excellently).
Perhaps not, technically, but it will turn hosting into a commodity. And it will remove any difference between centralized and decentralized from the user's point of view. Therefore, it's a huge win.
"Fast Data Transfer via Tor" has persisted for ~20 months. But "Fast Data Transfer via Tor (Methods)" was gone soon after I took the hosting IPFS node down.
Either you pay somebody to pin the files, or a public organization like "archive.org" decides that a site is important enough to pin. The typical surfer will seldom pin something but can keep low-frequency special-interest sites alive.
There is no free lunch.
With HTTP it has become hard to mirror a modern site. In IPFS it is a built in feature. Makes a better Internet.
That's a big part of Amazon's AWS offerings, hosting other people's files on their own machines. Heck, Amazon could spin up IPFS host nodes and start earning filecoin for themselves. The goal is that file hosting becomes competitive, cheaper and more available. You won't have to choose AWS or DigitalOcean, you just throw some change at it and anybody can host it.
Because they're reading the blog, and are interested in the content - i.e. because next generation IPFS-friendly browsers participate in the pinning, as they should.
I mean, think about it: there is already a copy of this web page out there, multiply redundant, in the form of our browser cache. All it really needs for IPFS to be viable is for the browser vendors to give the user the means to make those cached files part of their contribution to the Internet...
But how do you solve NAT? Most end-user devices can't expose ports, and IPs are increasingly shared. Skype was uniquely successful there, but I doubt IPFS in its current form could make use of those tricks.
The solution to NAT is IPv6.
Your node blindly accepts data, stores it and uploads it.
Some P2P network implementations did that and failed horribly in the end. A few abusers can DoS the network.
Oh, and there are illegal data and numbers. I'm a rather extreme freedom-of-speech advocate myself, but even for me there is some immoral data I will never knowingly help spread.
I view IPFS as a better distribution network, not a better host - in contrast to the current internet's structure, if many people request the same content from me, they might only need to contact my server once and then share it among themselves, thus reducing overall load.
So the point of IPFS is not to store content for eternity, but to provide efficient distribution of content.
IPFS promises nothing, nor tries to, in the way of permanent archiving. It's about reducing congestion and gaining benefits of immutability. It could archive something in the sense that something popular is difficult to remove.. but that's definitely not a guarantee.
: ignoring aggressive non-sharing downloaders of course
It's just a mistake to view IPFS as allowing for truly "decentralized" websites or as a decentralized file storage platform - unless you have the kind of content that makes other people want to follow the typical torrent model and actively long-term "seed" it by pinning, you'll still need to have your own personal central server to host the content on if and when nobody else is.
Which, yes, IPFS has explicitly never promised that - but a lot of people seem to think it does.
From the IPFS web site:
Humanity's history is deleted daily
IPFS keeps every version of your files and makes it simple to set up resilient networks for mirroring of data.
This is clearly stating that the system keeps every version of your files, and it says nothing about "pinning", or the fact that the files will, in fact, not be kept. At best the web site is misleading, at worst it is simply lying.
I don't yet know what IPFS really is, nor what it really does, but it's statements like that on the IPFS web site that make me distrustful of the hype.
So I don't really buy into the hype-drama. So many people are concerned with hype.
Anyway, to your specific points - if you understand IPFS those comments are not entirely off base. However I can understand why they would lead people astray. In reality I see those comments, i.e. human history being deleted, as a reference to the mutable web. I can find a post on Reddit and today it is meaningful, tomorrow it might be deleted. In a general immutable system, if I reference an immutable address to the content I care about I will always find exactly that content. Whether or not it exists permanently is another issue, one that I don't care about honestly - I care that what exists can't change out from under you. Just viewing data in an IPFS-like system naturally makes you own it, as you effectively download a copy of it. No one can take that from you.
Now, whether or not you decide to permanently hold onto the data you want is another story. But again, permanency is not likely to be "solved" by anyone.. and honestly, given how so much "content" can be illegal, I don't think we ever can or should solve the permanency issue.
It has not been reasonable to ignore such users for the last five years or so. Like it or not, the computing landscape has shifted to mobile devices which aren’t on all the time and have limited power and bandwidth. Perhaps BitTorrent is OK now, in its niche, but if it were to go mainstream for downloads you’d go from a small fraction of non-sharing downloaders to a high fraction.
Skype’s architecture is an interesting one in that space: it used to be distributed, with many computers all over the place nominated supernodes; but the shift to mobile made that architecture untenable, and so they had to shift to a centralised model, which generally performs worse, to keep it working at all—not for surveillance, but for scalability!
You're taking me the wrong way. I wrote that because I didn't feel like writing paragraphs going into explicit detail over the pros and cons of seed avoiders and how one might handle them, etc. It wasn't the point of the conversation I was trying to make, informing about the general design. BitTorrent works without you hosting the file, so does IPFS, that's the point. Nothing more, nothing less.
Which, I should have known better from HN, but /shrug. I guess in the future I need to be more explicit when I ignore a topic.
Pinning other people's pages that I've read seems like a reasonable way to scratch the same itch.
It's called FileCoin, and it's the answer to what the grandparent says.
I don't know how IPFS works, but I'd like to contribute by setting up and forgetting an IPFS cache on the Internet to help the network grow.
It's not a configuration option though. You would either have to build up your own system using the ipfs libraries, or monitor the events from a daemon.
Unfortunately there is a bug that is not allowing the information required to be logged correctly in the events, so the actual cid of the content is not exposed.
If it were, you could look for the `handleAddProvider` or `handleGetProvider` operation through the event logs (`ipfs log tail`) and then inspect the object (`ipfs object stat`) to determine if you wanted to pin it.
> Ideally with some kind of mechanism to store poorly seeded data.
This would be a little more difficult, but you could attempt to query the dht to determine how many peers are already providers for the data (`ipfs dht findprovs`).
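A rough sketch of that provider check, shelling out to the `ipfs dht findprovs` command mentioned above. It assumes the command prints one peer ID per line and that an `ipfs` binary with a running daemon is on the PATH; the command name may differ between releases. The function names and the threshold of 3 providers are my own placeholders.

```python
import subprocess


def count_providers(findprovs_output: str) -> int:
    """Count distinct peer IDs, assuming one peer ID per non-empty output line."""
    peers = {line.strip() for line in findprovs_output.splitlines() if line.strip()}
    return len(peers)


def is_poorly_seeded(cid: str, threshold: int = 3, timeout_s: float = 60.0) -> bool:
    """Ask the local node who provides `cid`; True if fewer than `threshold` peers do.

    Requires a local ipfs daemon. The DHT query can be slow and the result is
    only a snapshot, so treat it as a hint rather than a guarantee.
    """
    result = subprocess.run(
        ["ipfs", "dht", "findprovs", cid],
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,
    )
    return count_providers(result.stdout) < threshold
```

If `is_poorly_seeded(cid)` comes back true, the set-and-forget node could then run `ipfs pin add` on that content.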
This is why IPFS (w/ Filecoin), Storj and others use cryptoeconomic mechanism design and incentive structures to incentivise hosting / pinning data.
I think of it more like email: email doesn't work either unless you pay someone to host or manage your email situation (or set up your own). It's like email in that the protocols (multihash stuff etc.) are more like an RFC describing how the system works, and thus standardize how files are shared on the internet. Other tools (browsers etc.) can build on those primitives to have a better system of caching, and tooling doesn't have to keep re-inventing the algorithmic wheel in terms of how to do this stuff. Maybe IPFS isn't the address-files-by-the-hash-of-their-bytes protocol we end up with, but I think it's likely that we will end up with something(s) that do operate in that way and unify the disparate methods of managing that stuff we use today.
So as for who hosts the files you pin, I would figure you'd pay someone to do that in the near term in the same way you'd pay for S3. In the future these providers could compete on price by offering features like p2p load balancing to offer cheaper prices, and CDN integration for better perf. Then in a further-off future something like FileCoin can move it to a decentralized system (we'll see how it plays out, and even if we don't get there it's still better I think than the smorgasbord we deal with now).
The cool thing is between all those transitions there should be minimal disruption in how your data is stored and accessible, and it should be common between a lot of different projects.
To me the big open issue is that IPNS isn't what I would do for what I call the labeling problem; there I'd go with something more like a git type of system. You really need a way to move the content hashes out of the URLs if you want them to be human readable; then the URLs themselves (which really probably ought to be URNs) should assert their own immutability, allow namespacing and versioning.
So I could say something like set-label 'mutable://blog.johnsmith.com' -> "content-hash|other-label" but have that action recorded into a blockchain or publicly accessible git repo like thing pointed at by DNS and have the name resolution system replacement be able to query that chain automagically when making requests (the browser for example is aware of this system and leverages it). Further for mutable urls (again probably ought to use urns) previous versions can be requested trivially with @version e.g. "mutable:blog.johnsmith.com@3" and so on.
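To show what I mean by labels with recorded history, here is a toy in-memory version. It is only a sketch: `LabelLog`, `set_label` and `resolve` are made-up names, the "chain" is just a Python list per label, and nothing here touches DNS, a blockchain or a git repo.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LabelLog:
    """Append-only record of what each label has pointed at, oldest first."""

    history: Dict[str, List[str]] = field(default_factory=dict)

    def set_label(self, label: str, target: str) -> int:
        """Point `label` at a content hash or another label; returns the new version number."""
        versions = self.history.setdefault(label, [])
        versions.append(target)
        return len(versions)  # versions are numbered from 1

    def resolve(self, reference: str, max_hops: int = 8) -> str:
        """Follow 'label' or 'label@version' until something that is not a label is reached."""
        for _hop in range(max_hops):
            label, _sep, version = reference.partition("@")
            versions = self.history.get(label)
            if versions is None:
                return reference  # not a known label, so treat it as a content hash
            index = int(version) if version else len(versions)
            if not 1 <= index <= len(versions):
                raise KeyError(f"{label} has no version {index}")
            reference = versions[index - 1]
        raise ValueError("too many label hops (possible cycle)")


log = LabelLog()
log.set_label("mutable:blog.johnsmith.com", "hash-of-first-post")
log.set_label("mutable:blog.johnsmith.com", "hash-of-second-post")
print(log.resolve("mutable:blog.johnsmith.com"))    # latest version
print(log.resolve("mutable:blog.johnsmith.com@1"))  # an older version, by number
```

Because old entries are never overwritten, a link to `@1` keeps meaning the same content even after the label moves on, which is the property I'd want the real naming system to have.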
We can do all this now ad-hoc with http headers etc, but it's all disparate and not unified like it should be; for example, it doesn't tie in meaningfully with your file system. If it did, then my file system on my computer at the label level (not the inode implementation level) would also use the same scheme. I write to a folder there, it automatically updates the label tree of that folder into the naming system and it's all consistent.
1: My hunch is that AWS/Azure/etc will have economies of scale too hard to beat in terms of the actual hardware, but they may be eclipsed by, or acquire, a startup that does the user-facing implementation of this.
If I'm not mistaken there were already some attempts to connect it to a TLD hierarchy.
But imagine the normal thing is to give 1GB of cache to IPFS. Granted, if a planet's worth of 1GB caches isn't enough to save your content then it's not interesting enough.
So if IPFS aspires to be the "permanent Web", but is virtually useless for preserving these, it hardly seems to qualify. With a few exceptions (for very large content that would require a ton of bandwidth to download and host, or to very complex content that requires preserving the structure of a whole large website to stay functional instead of individual pages, or for inherently server-side content), the kind of stuff that would be popular enough for someone to pin virtually never vanishes from the Web - even if links to it do break - because that's the kind of stuff that has almost certainly already been saved locally & reposted to other websites. And the stuff that's served unpinned from user caches is, very clearly, not permanent.
So it's a "permanent Web" ... which is, at best, barely less fragile and prone to linkrot than the current web, in that if content is popular at one point and then the site layout changes or the original host goes down entirely it can still be kept up at the same link if someone was smart enough to save it locally, and specific versions of a website can be linked to specifically (even if nobody's necessarily hosting them anymore). But in all other respects, it is exactly as ephemeral as the current Web, and the fancy decentralized parts of it that are different than the current Web are among the most ephemeral parts, while the option to still have the boring old fragile centralized-server solutions where you host your own content personally are the durable ones.