Right now it does none of those things well. The client chews through CPU and memory when seemingly doing nothing. If I try to download content, it is far slower than BitTorrent unless I go through a centralized gateway. If I add content it takes ages to propagate, making it utterly unsuitable as a replacement for a web server. There is no system to keep content alive so links will still die. The name system is byzantine and I don't think anyone uses it.
Unfortunately, they are now unable to pivot because they did that asinine ICO. The right thing to do is to give up on the FileCoin nonsense and build a system that solves a problem better than anything else, but that is no longer allowed because they already sold something else to their "investors".
Point being, they didn't just bite these pieces off randomly: they see a picture of how the internet _could_ work, and they're trying to realize it. If they can get it working, then boom! You have decentralized internet, and you also have a ton of bonuses that just fall out from this being the right way to do things: resistance to censorship, better archiving, reduced influence of web megacorps, etc. But you have to have it -all- to actually be better. The sum is WAY greater than the parts.
The trouble is, this statement:
> Right now it does none of those things well.
So I get what you're saying. To build user-adoption, they need to find a way to deliver an improved experience, not just an improved model that would be better if more people used it. But I object to the idea that the solution is to choose one of those things to the exclusion of the others. The whole idea doesn't make sense if they choose one. If I were advising them, I wouldn't tell them to reduce their scope in terms of "doing all the things", but rather reduce their scope in terms of doing all the things for the entire internet. They should find some kind of sub-network or community that gets extra value out of the decentralization, and prove out the concept there. Maybe it's a big company's intranet, or a network of (paging ARPA) universities?
There is no RIGHT way to decentralize the web. I don't think IPFS is the right way to do it either.
Tim Berners-Lee's Solid (https://solid.mit.edu/) offers a much more practical path to a decentralized web. The advantages of Solid's approach over IPFS are that:
Solid doesn't throw out what we already have or recommend a new layer on top of the internet (example: ipns).
Solid handles access control, which pretty much every application needs (encryption is, btw, a poor substitute for access control).
Solid has the ability to revoke access, and delete data (very important).
It can work in browsers without extensions.
Solid is not muddied with talk of the Blockchain. It's disappointing that cryptocurrency has very nearly hijacked this space.
Solid is conceptually simple. You own a pod that has a unique address (using familiar schemes). You put your stuff on it and allow access to people; like DropBox but standards based. Companies can offer paid hosting services to run your pod - more space, bandwidth etc.
IPFS is not commercialization friendly. IPFS performance is unlikely to be great, ever.
Disclosure: I am invested in an open protocol similar to Solid, but simpler. So not entirely unbiased.
Although a reference implementation (https://github.com/webpods-org/podmaster) is kind of ready, there's no documentation yet (which will go up on webpods.org soon).
The only way to see the feature-set is to look at some of the tests. https://github.com/webpods-org/podmaster/blob/master/src/tes...
If you're interested, please email me. I'm looking for collaborators.
1. How is this meaningfully different to WebDAV?
2. Is the assumption that web apps export stuff to your pod from time to time, or actually use it as the primary storage? If the former, isn't it more or less the same idea as Google Takeout, if the latter how do apps handle the possibility of slow pods, outages or the need to use relational databases for storage? When building server side apps you do normally need tight control over storage.
Webpods is more like git than WebDAV. It allows apps/users to store data in logs, and the data can be records (strings) or files. If bob is syncing from alice, he'd pull all entries after the commit-id up to which he had previously synced.
An app will store data in a pod (which has a unique hostname) such as instagram.jeswin.someprovider.com. Each pod can have multiple logs, such as "friends", "albums", "comments" etc. Logs have permissions attached to them, which control who can read those logs. There are similarities to WebDAV here, but again it's more like how we use git.
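To make the git comparison concrete, here is a minimal sketch of an append-only log with a read permission list and a "pull everything after this commit-id" operation. The `Log` class, its method names, and the hash-chained commit ids are all assumptions for illustration; this is not the actual webpods API.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Log:
    """Hypothetical append-only log; not the real webpods interface."""
    readers: set = field(default_factory=set)    # users allowed to read this log
    entries: list = field(default_factory=list)  # (commit_id, data) pairs, oldest first

    def append(self, data: str) -> str:
        # Chain each commit id to the previous one, loosely like git.
        prev = self.entries[-1][0] if self.entries else ""
        commit_id = hashlib.sha256((prev + data).encode()).hexdigest()
        self.entries.append((commit_id, data))
        return commit_id

    def pull(self, user: str, since: Optional[str] = None) -> list:
        # Enforce the per-log read permission.
        if user not in self.readers:
            raise PermissionError(f"{user} may not read this log")
        if since is None:
            return list(self.entries)
        ids = [commit_id for commit_id, _ in self.entries]
        # Return only the entries after the last commit the caller has seen.
        return self.entries[ids.index(since) + 1:]
```

Bob would remember the last commit id he received and pass it as `since` on the next sync, so only new entries cross the network.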
> Is the assumption that web apps export stuff to your pod from time to time, or actually use it as the primary storage? If the former, isn't it more or less the same idea as Google Takeout, if the latter how do apps handle the possibility of slow pods, outages or the need to use relational databases for storage? When building server side apps you do normally need tight control over storage.
Apps are expected to be local-first, though they aren't forced to be. You'd write to the local database, and simultaneously sync with the pod. Similarly, if you're pulling data from friends, those would (most likely) be stored locally as well.
Slow pods are a problem, but I hope people would generally prefer reliable pod service providers. In the same way Dropbox gives you some guarantees of reliability. If the app is designed to be local-first, the user is not immediately prevented from using the app while the network is down; and syncing can happen once connectivity is regained.
Relational databases and schemas are not supported on Pods; it's just an immutable log. Most apps should do event-sourcing (https://martinfowler.com/eaaDev/EventSourcing.html), wherein they write to an event log. But this stream (of events) could be processed into a more easily queryable view.
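As a rough illustration of turning an event stream into a queryable view, the sketch below folds a list of events into a dictionary. The event shapes (`photo_added`, `photo_removed`) and the function name are made up for this example and are not part of any pod specification.

```python
from collections import defaultdict


def build_album_view(events):
    """Fold an append-only event list into {album: set of photos}."""
    view = defaultdict(set)
    for event in events:
        if event["type"] == "photo_added":
            view[event["album"]].add(event["photo"])
        elif event["type"] == "photo_removed":
            # The log itself is never edited; a removal is just another event.
            view[event["album"]].discard(event["photo"])
    return dict(view)


# Hypothetical events as an app might append them to a pod log.
events = [
    {"type": "photo_added", "album": "holiday", "photo": "beach.jpg"},
    {"type": "photo_added", "album": "holiday", "photo": "hotel.jpg"},
    {"type": "photo_removed", "album": "holiday", "photo": "hotel.jpg"},
    {"type": "photo_added", "album": "pets", "photo": "cat.jpg"},
]
view = build_album_view(events)
```

The view can be thrown away and rebuilt from the log at any time, which is what lets the pod stay a plain immutable log while the app keeps whatever local database it likes.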
Of course, this won't work for all kinds of apps. It works well for apps handling personal data or for collaboration tools; such as slack, project management tools, instagram, google photos, music collections etc. On the other hand, it's not a good fit for apps in which the data needs to be centralized. Such as ecommerce, banking, insurance, delivery services etc.
The specification naively says that the data is saved in interoperable formats. Sure, you can store your data in an interoperable data format, suppose it is JSON, but it is of little use if the various applications do not know how to correctly interpret and manage the information contained therein.
It's been a while, have the applications improved?
I don't think we'll be able to avoid that hurdle; we'll need to make sure that the protocol is really simple.
But having to know the data format of the app itself is to be expected. If an app "instagram-on-solid" stores data in a certain way, the alternate app will need to understand those schemas as well to be compatible. That's how interop has always worked, even in the pre-internet age when we were exchanging files on disk.
> It's been a while, have the applications improved?
I haven't looked at apps in a while - but that was indeed moving very slowly.
To me this should be a separate codebase. And I see this in other features they have been including too.
I'm using IPFS quite heavily to store content generated on https://pollinations.ai but there is no way I could run it without a centralized node that I have control over at the moment because otherwise it would just be painfully slow and unusable.
I love the idea of content addressed decentralized data. I have no real use for IPNS at the moment and many of the other features of the core IPFS stack.
Very much an actively developed project, with over 100,000 monthly active users.
The landing page is a bit of a mess
The problem is that their architecture by its nature takes all the quality related qualifiers out of those goals. Replication, but not rapid, access but not fast, access to data but not long term or actually resilient, storage but not large scale. So its only advantage is if you value decentralisation above all other characteristics.
Anyone can register a lot of fake “seeders” for a file chunk, making it hard to download that chunk.
Same problem for the name server.
If you want to use the Filecoin network as a "provider of last resort" for IPFS data, there's https://estuary.tech which will mark your data as verified, sort out the deals with storage providers, and then mirror it to IPFS.
There are also third-party tools like https://fission.codes/ , https://docs.textile.io/powergate/ , https://web3.storage/ and https://www.pinata.cloud/ for making this easier.
(Disclosure: I work at the Filecoin Foundation.)
We were looking at this at work the other day.
We noticed the storage price vs S3 saying "0.03% the cost of Amazon S3" and then someone (who's been trying to get adequate performance out of IPFS for a while) said "0.03% of the price, and even lower performance".
The modern, more economic web, wouldn't come until Netscape added form fields and cookies, at the behest of some of the original owners. And there were a ton of people at Netscape making these decisions about their vision of the future of the web. In-browser music listening wouldn't come until Macromedia, Disney and Microsoft pushed their vision for a "multi-media web"; browsers wouldn't build native support until much, much later.
So yes, we absolutely decided what the web would be about, and built technology to match that vision.
I'm as much of a HyperCard fan as (almost) anyone else, but that is almost certainly not where the term "hyperlink" comes from. Ted Nelson used the word "link" back in the mid 1960s, in the context of another coinage of his, "hypertext". The historical record is already a little unclear about whether or not he was using hyperlink that early, but by the time HyperCard came to be, the term was already differentiated from a "simple link", with some level of implication caused by the "hyper" prefix that it was most likely on another computer/server. The most HyperCard could offer was a link into a different stack.
The "hyper" prefix predated HyperCard, and its meaning in the context of information processing/retrieval/presentation meant more than the majority of links that HyperCard offered (even though they were also great). Yes, I know that the wikipedia page on the word "hyperlink" claims that HyperCard "may have been the first use", but the cited reference for that claim offers no evidence for it whatsoever.
EDIT: here's a good summary article of pre-WWW hypertext systems from the 80s https://fibery.io/blog/hypertext-tools-from-the-80s/
The early web's competition was things like FTP, Gopher, and email-driven apps (e.g., Listservs, the Usenet Oracle). Plus paper-based stuff, like department phone books, mailing documents around, etc. It was hugely better than any of those for many common uses, so adoption was rapid.
Once you have a critical mass of users, then it can make sense to add other things in. But for that first audience, we can't be vague, selling some shining future that will happen eventually.
As I recall, one of the example CGI programs from NCSA presented a form to fill out a Papa John's order, which was then sent via the email-to-fax gateway. Which, now that I think of it, was indeed more "economic".
Cookies were definitely a Netscape thing, for profit making - a shopping cart for MCI.
- A simple system, designed from scratch, sometimes works.
- Some complex systems actually work.
- A complex system that works is invariably found to have evolved from a simple system that works.
- A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over, beginning with a working simple system.
John Gall, 1978
1. The web succeeded but many things whose backers made similar comparisons failed. Knowing that one technology had a big impact doesn’t say that a given unproven technology will be the next one to go big. It’s more likely that you’re looking at the next Groove Networks or something like that.
2. The web was immediately useful for many people and you could get started easily. IPFS has some interesting but far from unique properties and trying to be a network increases the amount of adoption and maturity needed for it to be worth using for most people. This is especially true for peer-to-peer sharing where the most useful participation requires up-front risk and costs which many people aren’t going to want to accept. Without that, it’s basically just harder to use web hosting which may or may not be cheaper.
Netscape was founded in 1994.
So, if you’re comfortable with either of those as a starting point, 6 years is somewhere in the dot com boom.
IPFS on the other hand is a horrible "jack of all trades" that has mediocre performance even in the best of times, and it hasn't really gotten any better since it first launched 6 years ago.
And that's not even bringing up the cryptocurrency cohort souring the project with its stench.
I don't object to the existence of IPFS, rather I prefer more efficient and focused projects instead. Someone in the comment threads mentioned Solid, which sounds like a decent decentralized information protocol or system of sorts.
And for those that want censorship resistance... who can forget Freenet? That project has been around since 2000 and seems to do a pretty bang up job, even if the performance is not much better.
I can't see the future. I can, however, look at the world and think about what I see.
The Web was a hit because it was really obviously useful for real work from the moment of its creation. IPFS is BitTorrent with magnet: links, and the seeding problems that implies, and the rest is janky nonsense.
The web was fast (for documents on 56k) and extremely useful almost immediately. It was obvious to everyone watching that the technology was going to change everything.
14.4 baud iirc. You wouldn't call it fast.
I had a 14.4K Zoom modem, but I'm pretty sure the ISP I worked for around '95-'96 was buying lots of US Robotics 33.6 modems.
I agree that 56k came a bit later, and didn't necessarily work on any particular phone line.
Something made it feasible right then for anybody to set up a bank of modems in their apartment, to provide direct internet, and there was an explosive growth in small ISPs before they consolidated. At the time, I was kind of oblivious to the historic moment, but the one I worked for was literally a few modems in the closet of a crummy apartment downtown when I started and within months we'd moved to an office a few blocks away and were installing modems like mad.
I found this, not necessarily authoritative:
"In 1994 the National Science Foundation commissioned four private companies to build four public Internet access points to replace the government-run Internet backbone: WorldCom in Washington, Pacific Bell in San Francisco, Sprint in New Jersey, and Ameritech in Chicago. Then other telecom giants entered the market with their own Internet services, which they often subcontracted to smaller companies. By 1995 there were more than 100 commercial ISPs in the USA."
I think that was probably it - right then and there anyone could buy a pipe to the internet and connect some modems. It was around then that I heard the term "T1" which was a lot back then.
Maybe there was some connection to:
I don't know, but probably significantly better than a few years later, due to how vast an improvement 95 was.
There really, in my view, and in some of the magazine reviews I read back then, was nothing to recommend Windows 3.1(1) except if (a) you couldn't afford a Mac, or (b) you wanted to run software that wasn't available for one.
Suddenly, once 95 got traction, people promoting Macintoshes had to make excuses for the lack of memory protection or pre-emptive multitasking, on top of the high prices. And Windows just wasn't as godawful ugly any more.
But at the moment that ISPs were all sparking into existence, I don't think the wave had quite arrived. I mean, people were getting it and I do remember vaguely the initial version of IE, but 95 wasn't even the majority of PC users for a little while.
A browser for 3.1 that I kind of remember from those days, that lapsed into obscurity, was Cello:
Before that, the way to get software (for me, because I wasn't a college student) was to go to a local computer store and copy their disks containing free or shareware.
But everything changed about 1995.
"Is it about being a decentralized caching layer? Is it about permanently storing content? Is it about replacing my web server? Is it about replacing DNS? Is it about censorship resistance?"
websites at the beginning did not decide to be a caching layer - they decided to be websites. they did decide to permanently store content. they did decide to use web servers. they did not decide to replace dns. they did decide to be about censorship resistance.
now imagine you put up a website, put some time into it, and it may or may not be up at a random time in the future. not a product that's usable.
Torrent trackers solved this in a very interesting way. They created an economic system where bandwidth was the currency, incentivizing the permanent seeding of content. It was illegal to take more than you gave. I've even seen an academic paper studying their system!
Bandwidth as a currency eventually proved to be a failure. It enabled the rise of seedboxes, dedicated servers featuring terabytes of storage and connections to high capacity network links. Just like the IPFS centralized gateways you mentioned. They would eventually monopolize all seeding, removing any normal person's ability to gain currency. In some trackers, if you wanted to consume content, your only options were renting one of these seedboxes or uploading new content to the tracker. You always stood to gain at least as much bandwidth as the size of the content you uploaded. The seedboxes would monitor recent uploads and instantly download your new content from you so that they could undercut you. I suppose it was a form of market speculation.
They also failed to realize that there is no uploading without downloading. By penalizing leechers economically, they disincentivized downloading. This led to users being choosier: instead of downloading what they like, they'd download more popular stuff that's likely to provide higher bandwidth returns on their investment. Obscure content seeders would not see much business, so to speak, due to the low demand for the data. Users would stock up on popular and freeleech content so they could get any spare change they could. The more users did this, the less each individual user would get. Then seedboxes came and left them with nearly nothing.
This was eventually solved by incentivizing what was truly important: redundancy. Trackers created "bonus points" awarded to seeders of content every hour they spent seeding, regardless of how much data they actually uploaded to other users. These points can be traded for bandwidth. This incentivized users to keep data available at all times, increasing the number of redundant copies in the swarm. People will seed even the most obscure content for years and years. In some trackers, these rewards were inversely proportional to the amount of seeders: you made more when there were fewer seeders. This encouraged people to actively find these poorly seeded torrents and provide redundancy for them.
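A minimal sketch of that reward rule, assuming the simplest possible formula: a fixed hourly rate divided by the number of seeders. The function name and `base_rate` are hypothetical; real trackers used their own (usually more elaborate) formulas.

```python
def hourly_bonus(seeders: int, base_rate: float = 1.0) -> float:
    """Points earned for one hour of seeding one torrent.

    Hypothetical rule: the reward is inversely proportional to the
    number of seeders and independent of how much data was uploaded.
    """
    if seeders < 1:
        raise ValueError("the seeding user counts, so seeders must be >= 1")
    return base_rate / seeders
```

Under this rule, being the sole seeder of an obscure torrent pays four times as much per hour as being one of four seeders, which is the incentive to go looking for poorly seeded content.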
We can learn from this. People should be compensated somehow for providing data redundancy: keeping data stored on their disks, and allowing the software to copy it over the network to anyone who needs it. The data could even be encrypted, there's no reason people even need to know what it is. Perhaps a cryptocurrency could find decent application here. Isn't there a filecoin? Not sure how it works.
I'm not sure why everyone assumes that bittorrent was about currency. It wasn't, and that was its superpower. No matter how much you were able to contribute, you were able to strengthen its system, redundancy and availability.
The only problem bittorrent had in practice was a legal one that finally led to a problem of availability of discovery.
All that filecoin hype will lead to only one thing, and one thing only: the people that are popular will decide what files are worth to the system, and the masses will simply dump everything else and forget about it.
Add an inflation for every modification of any file in the whole system, and you have a perfect way to destroy real incentives.
I don't believe in most web3 projects because they always think it's about trading files with each other. It is not. Discovery and access to data should not be limited by your financial capability to buy things. The internet exploded because people got access to vast amounts of knowledge, for basically free compared to before.
Torrent trackers were and are to this day. The most successful trackers have proven to be those with a ratio economy. The late what.cd, often described as the library of alexandria of music, had the harshest ratio economy of them all.
> All that filecoin bullshit will lead to only one thing, and one thing only: the people that are popular will decide what files are worth to the system, and the masses will simply dump everything else and forget about it.
That's certainly a possibility. I don't know. I welcome discussion about this.
> Discovery and access to data should not be limited by your financial capability to buy things. The internet exploded because people got access to vast amounts of knowledge, for basically free compared to before.
Absolutely agree. Unfortunately, this is not possible with current copyright laws. This sort of utopia is currently only possible in underground networks such as private bittorrent swarms. It's been proven by history that some sort of incentive is necessary to get people to commit their personal resources -- storage and bandwidth -- to those networks. Ratio economies were created to address the leecher problem: users who simply downloaded what they wanted, without providing either bandwidth or redundancy in return.
Reminds me of maker incentives in markets. Providing liquidity is sometimes paid for.
HTTP may be inefficient for document storage, but IPFS is inefficient for almost everything else.
I like the concepts behind IPFS but it's simply not practical to use the system in its current form. I hope my issues will get resolved at some point but I can't shake the thought that it'll die a silent death like so many attempts to replace the internet before it.
Decentralized tech doesn't work well until the network effects build up.
IPFS has the interesting quality that the more popular a piece of content is, the easier it is to get ahold of. If millions of people were using IPFS, the most popular content would be being served by many thousands of people, and finding it and downloading it would be extremely fast. Then subsequent viewings would be instantaneous because you can cache it for life.
This leaves interesting incentives for monetizing pinning and caching for less popular content.
It makes sense if you ask me. If I love a piece of music so much that I'm willing to give it to others for free then everyone benefits from being able to access it easier.
Content that people care about organically becomes more resilient and nearly impossible to remove.
Content that no one cares about is slow and inefficient because it has to be hauled out of cold storage the one time a year anyone cares.
If someone thinks that content is more important than people are giving it credit for they can host it or pay for someone else to do it.
If you have a website and you have "fans" that subscribe to you and help pin all your stuff, then your stuff becomes faster and easier to get. Your "fans" can even get paid for helping to serve your content.
So, to me, it's early days for IPFS, and the way to make it better is to try to build apps that increase its usage, so the power of the network effects is felt.
That sounds like the worst of the current internet, but even worse
I remember popcorntime was as responsive as Netflix at the time it came out and it scared the shit out of the MPAA so they killed it with prejudice.
IPFS doesn’t have an excuse for sucking beyond a basic lack of engineering effort.
...then IPFS would just get even slower and use even more resources to manage the index and find content as I am pretty sure the DHT they are using doesn't scale the way you seem to think it does.
This is interesting given your description:
> IPFS has the interesting quality that the more popular a piece of content is, the easier it is to get ahold of. If millions of people were using IPFS, the most popular content would be being served by many thousands of people, and finding it and downloading it would be extremely fast.
This idea hasn’t been new since the turn of the century (BitTorrent offered exactly that in 2001) and nothing in that description explains why this is different than the many previous attempts. It’d be interesting to hear about how IPFS plans to maintain that without the problems with abuse and how it keeps competitive performance relative to non-P2P in a world where things like CDNs are much cheaper and more easily available than they were around the turn of the century. Using P2P means giving up a lot of control for the content provider and that’s a challenge both from the perspective of the types of content offered and the ability to update or otherwise support it on your schedule.
IPFS expands BitTorrent into a global filesystem.
You can mount IPFS on your filesystem and address files by pointing at local resources on your machine. So you could have an HTML file say `<img src="/ipfs/QmCoolPic" />`. You can't do that with BitTorrent.
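A path like `/ipfs/QmCoolPic` can be treated as a stable name because the address is derived from the content itself. The toy store below sketches that idea with a plain SHA-256 hex digest as the address; real IPFS identifiers use a different encoding, and the `ContentStore` class is purely illustrative.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is the hash of the bytes."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Any holder of the address can verify the bytes, so it does not
        # matter which peer or cache they came from.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data
```

Because the same bytes always map to the same address, a cached copy never goes stale; changed content simply gets a new address.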
But that's not true anymore, most internet power-users are on broadband connections, many of which are symmetric, transfer speeds up or down are no longer a limiter that pushes people towards decentralization.
So when considering a decentralized system like IPFS, the downsides of decentralization, like availability, edit control, and service support, are much more salient.
There are a lot of things that "could work if everybody uses it". You can never get there if the thing isn't desirable compared to existing alternatives.
I feel like freenet (2000) is maybe a better comparison
Except that most people outside tech would probably be using phones/tablets/crappy pcs, with upload speed that is 10% of their download speed.
So niche content is difficult to get a hold of?
Sounds like a bad idea.
The point being, obtaining the popular stuff is no longer subject to DoS because distributed caching is built into the protocol itself.
It launched in Feb 2015.
Things that have launched since then:
The idea of Donald Trump as President of the USA
Tesla Model 3
It's entirely possible that everything could change any day now. It's equally plausible that it's just a bad implementation of a decent idea, and something similar could come along and deliver on its promise.
I find this disturbing, because not only is it not true that it originated in 2015, but many people have commented on how The Simpsons predicted it in 2000...
...but the reason they predicted it had a lot to do with Trump running in 2000, and more recently, he's reportedly been saying he is such a winner he won the first time he ran.
So it reminds me of the famous photo with Stalin and the "vanishing Commissar"...but what do you know - that has been deleted from Wikipedia recently!
It was there prior to 2016 though:
Trump wasn't taken seriously as a candidate in 2015. He didn't declare his candidacy until July 2015 at which point his odds were 150/1 and it didn't get above 66/1 in 2015.
In 2000 his run wasn't taken seriously either. He had an approval rating of 7%. https://en.m.wikipedia.org/wiki/Donald_Trump_2000_presidenti...
But I suspect you had it right the first time. I sort of think "the idea" is the relevant stage. And that adds ~15 years to that particular thing.
You criticize people for having unrealistic expectations, and then you make an unrealistic claim...
Requesting an IPFS document would query a few popular repositories, then revert back to normal IPFS if it's not found.
These buffer servers would also track what's popular and shuffle around what they store accordingly.
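A rough sketch of that lookup order, assuming a buffer server that counts requests and only keeps an item if it is requested more often than the least requested item it already holds. `BufferServer`, `fetch`, and the `slow_lookup` callback (standing in for a normal IPFS lookup) are hypothetical names, not part of any IPFS implementation.

```python
from collections import Counter


class BufferServer:
    """Hypothetical popular-content cache sitting in front of IPFS."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store = {}        # content id -> bytes
        self.hits = Counter()  # content id -> number of requests seen

    def get(self, cid):
        self.hits[cid] += 1
        return self.store.get(cid)

    def offer(self, cid, data):
        # Keep the item if there is room, or if it has been requested
        # more often than the least requested item currently stored.
        if cid in self.store:
            return
        if len(self.store) < self.capacity:
            self.store[cid] = data
            return
        coldest = min(self.store, key=lambda c: self.hits[c])
        if self.hits[cid] > self.hits[coldest]:
            del self.store[coldest]
            self.store[cid] = data


def fetch(cid, buffers, slow_lookup):
    """Try the buffer servers first, then fall back to the slow path."""
    for server in buffers:
        data = server.get(cid)
        if data is not None:
            return data, "buffer"
    data = slow_lookup(cid)
    for server in buffers:
        server.offer(cid, data)
    return data, "ipfs"
```

Note that each buffer server only sees its own request counts, so its notion of "popular" is local rather than network-wide.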
The downside of this approach is that it only works with popular nodes and you'd be back to the old, centralised internet architecture for all real use cases.
I don't think you can accurately gauge what is and isn't popular in a P2P network like IPFS. You never have a view of the entire network, after all.
There's also the problem of running such a system. Who pays for the system's upkeep and do we trust them? If we'd use Cloudflare's excellent features, who says Cloudflare won't intentionally uncache a post criticising their centralisation of the internet, forcing the views they disagree with to the slow net while the views they agree with get served blazingly fast?
I don't think such a system would work well if we intend to keep the decentralised nature of IPFS alive. Explicit caching leads to centralization, that's the exact reason caching works.
Instead, the entire network needs a performance boost. I don't know where the performance challenges in IPFS lie, but I'm sure there are ways to improve the system. Getting more people to run and use IPFS would obviously work, but then you'd still be caching only popular content.
Edit: actually, I don't really want to see caching happen through popularity of the service either, because as it stands IPFS essentially shares your entire browsing history with the world by either requesting documents in plain text or even caching the documents you've just read. I wonder if that IPFS-through-Tor section on their website ever got filled in, because the last few times I've checked that was just a placeholder in their documentation.
IDK a whole lot about IPFS though. Maybe it was the metadata resolving / DHT lookup or whatever that was super slow. BitTorrent latency was always pretty high, but it didn't matter because throughput was also high
Paying for pinning sounds like something that could work but it would introduce some of the same problems that the real web suffers from back into IPFS. The idea "a web for the people, by the people" becomes problematic when you start paying people to make your content more accessible.
The thing I liked about the idea of IPFS pinning is that you are paying per byte stored, v.s. per byte accessed, as long as the p2p sharing works. I.e. hosting-via-pinning a website only you read would cost the same as hosting a website that the whole internet reads.
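A back-of-the-envelope comparison of the two billing models, with made-up prices; the function names and every number here are assumptions for illustration, not real pinning or hosting rates.

```python
def pinning_cost(size_gb: float, price_per_gb_month: float) -> float:
    """Monthly cost when billed per byte stored (independent of traffic)."""
    return size_gb * price_per_gb_month


def bandwidth_cost(size_gb: float, monthly_views: int, price_per_gb_out: float) -> float:
    """Monthly cost when billed per byte served to readers."""
    return size_gb * monthly_views * price_per_gb_out


# Hypothetical 2 GB site at invented prices.
quiet_pinned = pinning_cost(2, 0.5)             # nobody but you reads it
busy_pinned = pinning_cost(2, 0.5)              # the whole internet reads it
quiet_served = bandwidth_cost(2, 1, 0.01)       # 1 view a month
busy_served = bandwidth_cost(2, 100_000, 0.01)  # 100,000 views a month
```

The pinned cost is identical in both cases, while the per-byte-served cost grows linearly with views. This only holds as long as the p2p sharing actually carries the traffic.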
From what I could tell the performance issue was mostly located in the networking itself, getting the client to resolve the content on the right server. That's something that could be improved through all kinds of algorithms without breaking compatibility or functionality, so there's hope.
I agree that pinning comes with some interesting ways to monetize hosting without the need for targeted advertising that the web seems to have these days. Small projects like blogs, webcomics and animations could be entirely hosted and supported by the communities around a work, while right now giant data brokers need to step in and host everything for "free".
"It stays in the cache as long as it stays in the cache"
??? What on earth does this mean?
This is opposed to long-term caches like Cloudflare's that'll cache the contents of your website regardless of how many requests come in. Cloudflare will happily just refresh the contents of your website even if nobody has been to your website for weeks, and quickly serve it up when it's needed.
"Doesn't matter. the point of ipfs is that when cloudflare and google shut down their gateway, the ipfs content is still available at the same address."
Without decentralisation being supported at the protocol level, as soon as the host dies, it's gone. This is particularly problematic because centralised services slowly subsume small services/sites and this either cuts off the flow to the other small sites or eventually something changes on the big centralised site and a bunch of these little sites break.
The underlying problem is that storage and bandwidth cost money and people have been conditioned not to think about paying for what they consume so things end up either being ad supported or overwhelming volunteers.
One of the points of IPFS (and bittorrent before it) is that this is not a problem; each node that downloads the data also uploads it to other nodes, so having lots of traffic actually makes it easier to serve something (indeed, if it was already widely seeded by Google's mirror, there wouldn't be any sudden traffic).
BitTorrent as many have noted is great for popular things, even not-particularly-popular things, but absent incentives to continue seeding (i.e. private trackers' ratio requirements) even once-popular things easily become inaccessible as the majority of peers don't seed for long, or at all.
I guess what I don't quite get is what IPFS adds vs., say, a trackerless BitTorrent magnet link that uses DHT? Or is it really just a slight iteration/improvement on that system?
Beats me! I think there might be support for finding new versions of things, but I'm not sure about the details or how it prevents authors from memory-holing stuff by saying "The new version is $(cat /dev/null), bye!".
If nobody pins a link, it disappears, but there is no strong incentive to pin; it just rides on abundant space and bandwidth and wealthy Gen Xers who want to be a part of something.
The same group released Filecoin, which experiments with digital asset incentives... and venture capital.
so that's one advantage to one audience
I guess the approach would be to simply run IPFS on those servers, with the popular content in it, as a seed.
That's good enough for kicking off batch file transfers (assuming you mean P2P networks like BitTorrent), but there's no evidence that people will tolerate a slow web, and lots of evidence that they won't.
The main challenge for me with this comment is that you can't expect distributed/decentralized networks to win if you set an expectation that "things will just be slower than the normal web". Nobody is going to migrate to that.
I don't know Skynet. I first checked Google, got a Wikipedia link describing a movie, then checked the Wikipedia disambiguation page, but got nothing.
Also, why would a project duplicate the efforts of IPFS rather than contribute to it?
IPFS has chosen an architecture which fundamentally keeps it non-performant, Skynet is built from the ground up in a different way, and gets 10-100x improvements on performance for content-addressed links, and 100-1000x improvements on performance for dynamic lookups (IPNS)
ipfs cat /ipfs/QmQPeNsJPyVWPFDVHb77w8G42Fvo15z4bG2X8D2GhfbSXc/readme
DHTs just aren't a good choice for massive data systems if you need low latency.
I've just pinned a 12MiB file filled with random bytes on one of my servers (`dd if=/dev/urandom of=test.dat bs=1M count=12; ipfs add test.dat; ipfs pin add <hash that came out>`). The server has a 50Mbps uplink, so transferring the file to my laptop should take about two seconds.
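A quick sanity check of that two-second estimate, as a minimal sketch. It assumes 50Mbps means 50 * 10^6 bits per second and ignores protocol overhead and lookup latency entirely; `transfer_seconds` is a name made up for this illustration.

```python
MIB = 1024 * 1024


def transfer_seconds(size_bytes: int, link_bits_per_second: float) -> float:
    """Ideal transfer time: payload bits divided by link speed, nothing else."""
    return size_bytes * 8 / link_bits_per_second


size = 12 * MIB        # the 12MiB test file
uplink = 50_000_000    # assumption: 50Mbps taken as 50 * 10^6 bit/s

print(f"{transfer_seconds(size, uplink):.2f} s")  # prints "2.01 s"
```

Anything much beyond that figure is lookup and connection overhead rather than raw transfer time.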
Dumping this blog's contents over IPFS takes the server about 3 seconds (first time load) so the network seems to be in working order, at least when downloading data. `ipfs swarm peers` lists about 800 known peers. On the server itself, `ipfs cat /ipfs/redacted > /tmp/test.dat` runs in about a second, which is all perfectly acceptable overhead for a transfer that'll take two to three seconds anyway.
On my laptop, I tried to get the file, but I cancelled after waiting for 16 minutes. Halfway through the wait, I tried opening the file through the ipfs.io proxy, which finally gave me the file after a few minutes, but no such luck yet when I retry the ipfs command.
I don't know if it's the random file, the size, or something different, but if I'm launching a blog or publishing documents on IPFS, visitors should not be expected to wait five to ten minutes for the data to load. "After the first twenty visitors it'll get faster" is not a solution to this problem, because there won't be twenty visitors to help the network cache my content.
Maybe I'm expecting too much here; maybe the files shouldn't be expected to be available within half an hour, or before Cloudflare caches it. Maybe there's something wrong with my laptop's setup (I haven't done any port forwarding and I'm behind a firewall). Either way, if I follow the manual but can still buy a domain, set up DNS and hosting on my VPS and send a link to a friend faster than I can get the file through P2P, I don't think IPFS will ever get off the ground. Fifteen minutes is an awful lot of time for a data transfer these days!
Edit: actually, now it seems ipfs.io and cloudflare have picked up the file in their caches. Data transfer is up to normal speed now. If you want to try to replicate my experiment, I've just uploaded a new test file to /ipfs/QmbBD872kjfoutAmTKFCxTCApw9LBB9qxxRyXpEGYzsqMH.
Edit 2: I realized that by saying I downloaded the file and that the file is random, I just announced my personal IP address to the world through the IPFS hash, so I removed it. That was pretty dumb of me, and also a pretty clear problem of IPFS in my book.
Even torrents take 1-10 seconds to initiate downloads with an absolutely massive network.
Will the network be faster the more people join it?
Is centralized infra inevitable for fast things because of all assumptions and insight a centralized provider can use? BTC is also slow, I don't know of any cryptocurrency that can provide VISA transaction volumes, even if they use more power than some countries alone.
Skynet is a decentralized network with lookup times that have a p50 TTFB of under 200ms. It achieves this by looking things up directly on hosts rather than routing through a DHT. There's a bit of overhead to accomplish this (around 200kb of extra bandwidth per lookup), but for a smooth web experience that tradeoff is more than worthwhile.
Do we need all that comes with IPFS? Not just technically, but the user training and pivot of technical doers?
So many of these projects feel like programmer vanity projects, there’s really little difference between them and a guy on the corner telling me why Protestants are wrong, join his flock.
That it’s a technical project, not entirely ephemeral nonsense, doesn’t matter; solutions already exist, we just don’t implement them that way.
It's interesting how the same people promoting the "creator economy" also tend to promote the cryptocurrency space and IPFS without an ounce of self-awareness. IPFS sounds awful for creators of all kinds in the same way as BitTorrent was awful for artists. I can definitely see a use case for IPFS as a file storage for trustless systems such as smart contracts, which are designed as immutable, trustless systems.
But IPFS has the crypto problem of conflating the stuff that works now with the stuff that's hypothetical in its marketing, and not admitting that the latter is janky nonsense that doesn't bloody work.
Again, I'm not sure when it wasn't working, or when it began working (it's always worked since I've been around), but IPNS has made huge strides, I use it every day. Even https://ipfs.io is using IPNS, it's very popular.
You could use all the current web/http/DNS infrastructure, and add certified/cacheable GET results.
Anyone could run their own proxy and cache what they see fit. Seems like an easier transition as it could be fully compatible with the current web.
Honestly, I like IPFS as a tech, and blockchain isn't useless, but the entire community just makes me want to puke because it's full of so many extremely ignorant people (at best), but mostly just fraudulent liars (at worst and most common).
No, the difference is that IPFS will use the same address to fetch content from anyone who's seeding it.
If Hacker News shuts down, it will no longer be accessible at 'news.ycombinator.com'; all existing links to that address will die, or worse will start showing some unrelated content (probably domain-squatter spam). That cannot be prevented by making a copy on my hard drive (or even the Wayback Machine).
On the other hand, an IPFS version will continue to exist at the same address for as long as anyone is seeding it. All links will remain working; anyone can join in the hosting if they like; even if it eventually stops getting seeded, it may still re-appear if someone re-inserts it, e.g. if they insert the contents of an old hard drive they found in an attic (as long as the same hash algorithm is used, the address will stay the same).
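A toy illustration of that property, assuming a plain hex SHA-256 of the raw bytes stands in for a real IPFS address (actual CIDs are encoded differently). `Host` and `fetch` are hypothetical names for this sketch, not part of any IPFS API.

```python
import hashlib


def content_address(data: bytes) -> str:
    # Stand-in for a real content identifier: hex SHA-256 of the bytes.
    return hashlib.sha256(data).hexdigest()


class Host:
    """A hypothetical seeder that stores blobs keyed by their address."""

    def __init__(self):
        self.blobs = {}

    def insert(self, data: bytes) -> str:
        addr = content_address(data)
        self.blobs[addr] = data
        return addr

    def get(self, addr: str):
        return self.blobs.get(addr)


def fetch(addr: str, hosts):
    """Ask every host in turn; accept only data that hashes to the address."""
    for host in hosts:
        data = host.get(addr)
        if data is not None and content_address(data) == addr:
            return data
    return None


page = b"<html>an old thread</html>"
original = Host()
addr = original.insert(page)

attic = Host()                      # the original host is gone; nobody seeds
print(fetch(addr, [attic]))         # None: the link is dead for now
attic.insert(page)                  # someone re-inserts the same bytes
print(fetch(addr, [attic]) == page) # True: the old address works again
```

The address depends only on the bytes and the hash function, which is why it does not matter who re-inserts the content.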
I was only talking about the HTML, not the backend server code. Still, you could look on https://awesome.ipfs.io/apps for ideas.
I wonder how that would work; a naive direct translation seems impractical. An address identifies an exact piece of content, so a Hacker News article gets a new address every time a comment is added?
You couldn't host the "live" version of Hacker News (with user accounts, new comments, etc.) unless YCombinator open up their databases, and re-architect the system to work in a distributed-friendly way.
The latter is an interesting idea, but isn't required to fix HTTP issues like link rot.
This is just a guess, though.
That said, IPFS isn't quite perfect here, as IPFS hashes do not actually point to the content itself; they point to a package that contains the content, and depending on how that package was built, the hash will change.
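A simplified sketch of why the packaging matters. It assumes a two-level scheme (hash each chunk, then hash the list of chunk hashes), which is only a stand-in for how IPFS actually builds its DAG; `root_hash` and the chunk sizes are made up for illustration.

```python
import hashlib


def root_hash(data: bytes, chunk_size: int) -> str:
    """Hash each fixed-size chunk, then hash the concatenated chunk digests."""
    chunk_digests = [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]
    return hashlib.sha256(b"".join(chunk_digests)).hexdigest()


data = bytes(range(256)) * 16  # 4096 bytes of identical content

same_a = root_hash(data, 1024)
same_b = root_hash(data, 1024)
other = root_hash(data, 2048)

print(same_a == same_b)  # True: same bytes, same packaging, same address
print(same_a == other)   # False: same bytes, different chunking
```

So two people adding identical files only end up at the same address if they also use the same packaging settings.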
Wow, they have a URI scheme for Not Invented Here. Amazing!
You don't "archive" a page with IPFS, with IPFS everything you keep a copy of stays available under the same address, that either happens automatically via cache or via a manual 'pin'. That's fundamentally different than what HTTP does.
The archive copy you create of a HTTP site is your own personal copy and completely inaccessible to anybody else. Even if you put it online, people would have no clue where to find it. With IPFS the document never leaves the address space and stays accessible to everybody under the same address, no matter who decides to host it.
Another important practical difference is that IPFS has native support for directories, so you don't have to try to spider around to try to guess all the URLs, you can just grab the whole directory at once. That in turn also has the nice side effect that .zip archives essentially become irrelevant, as you can just upload the directory itself.
In a world of finite storage, nobody's going to keep up a copy of everything. Nobody will have a copy of most things. Even if IPFS worked acceptably, even if it worked as very narrowly promised, plenty of stuff would fall off the web. At best, we'd see somewhat fewer temporary disruptions of very currently popular content.
The solution for making it more efficient is that someone else should host it for you, ideally for free.
It is just an exercise in throwing around big numbers and utter ignorance to impress people, but downloaded megabytes are not magically going away.
Just like torrents: no one wants to seed or pay hosting costs, everyone wants to download. There is no protocol that is going to fix that. Why is everyone mining BTC like crazy? Because they get money for that.
A big problem in the NFT space of late is that OpenSea sells NFTs that link to an IPFS URL, then doesn't bother seeding the image after it's sold - so a pile of NFT images no longer exist anywhere on the IPFS.
You also gotta look into the smart contract code and see if baseURI is an IPFS link or someone's proprietary website.
If you want to update content, you have to point the user to a new hash. IPFS has the IPNS mechanism for that, this adds a layer of indirection, so instead of pointing to the IPFS hash directly, you point to the IPNS name which in turn points to the current hash. What an IPNS name is pointing to can be updated by the owner of that IPNS name.
Another option is to do it via plain old DNS and have the DNS record point to the current hash of the website.
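The layer of indirection described above can be sketched as a table of mutable names pointing at immutable hashes. This is a toy, not IPNS itself: real IPNS names are derived from a public key and records are signed, whereas this sketch assumes a shared secret whose hash serves as the name. `NameRegistry` and its methods are hypothetical.

```python
import hashlib


class NameRegistry:
    """Toy mutable-pointer table: stable name -> current content hash."""

    def __init__(self):
        self._records = {}

    @staticmethod
    def name_for(owner_secret: bytes) -> str:
        # Assumption: the name is the hash of the owner's secret, so only
        # someone holding the secret can publish under that name.
        return hashlib.sha256(owner_secret).hexdigest()

    def publish(self, owner_secret: bytes, content_hash: str) -> str:
        name = self.name_for(owner_secret)
        self._records[name] = content_hash
        return name

    def resolve(self, name: str):
        return self._records.get(name)


registry = NameRegistry()
name = registry.publish(b"owner-secret", "hash-of-site-v1")
print(registry.resolve(name))   # hash-of-site-v1

registry.publish(b"owner-secret", "hash-of-site-v2")
print(registry.resolve(name))   # hash-of-site-v2, same name as before
```

Links point at the stable name; each lookup costs one extra resolution step before the content itself can be fetched.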
For the rest, hosting takes money. People will not archive the entire internet for free and IPFS is not a magic wand which eliminates the need to have people like the skilled IA team. It could make their jobs easier but that’s far from “nearly complete” and no more or less pristine.
Volunteer capacity at the scale of many petabytes of online storage is unproven and the long tail of accesses is enough that you're going to have to think not just about the high replication factor needed but also the bandwidth available to serve that content in a timely manner and rebuild a missing replica before another fails.
Right, which is why I said it would have to periodically scan the registered clients to ensure a minimum number of clients has each block to ensure redundancy.
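A minimal sketch of such a periodic scan, assuming the coordinator already knows which client holds which block and which clients are currently online. The function name, data shapes, and example values are all hypothetical.

```python
def under_replicated(
    block_holders: dict[str, set[str]],
    online: set[str],
    min_copies: int,
) -> dict[str, int]:
    """Return {block_id: missing_copies} for blocks with too few live holders."""
    shortfalls = {}
    for block_id, holders in block_holders.items():
        live = len(holders & online)  # count only holders that are reachable
        if live < min_copies:
            shortfalls[block_id] = min_copies - live
    return shortfalls


holders = {
    "block-a": {"alice", "bob", "carol"},
    "block-b": {"alice", "dave"},
    "block-c": {"dave"},
}
online_now = {"alice", "bob", "carol"}  # dave is offline

print(under_replicated(holders, online_now, min_copies=2))
# {'block-b': 1, 'block-c': 2}
```

The scan itself is cheap; the costly part is deciding whether an offline holder will return and moving the data when it does not.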
> also the bandwidth available to serve that content in a timely manner and rebuild a missing replica before another fails.
I think a slow, cheap but reliable archive is better than "more expensive but lower latency", so I'm not particularly concerned with timeliness.
> Right, which is why I said it would have to periodically scan the registered clients to ensure a minimum number of clients has each block to ensure redundancy.
That's the easy problem, not the hard one I was referring to: doing what IA does requires you to be able to crawl web resources and identify everything which needs to be available for a page snapshot to be usable. IPFS only helps with that in the sense that you can tell whether you have the same URL payload without requesting it — you still need to handle dynamic behaviour and that's most of the work.
> I think a slow, cheap but reliable archive is better than "more expensive but lower latency", so I'm not particularly concerned with timeliness.
What I would be concerned with is “more expensive, higher latency, and greater risk of irrecoverable failure”. Relying on volunteers means that you need far more copies because nobody has a commitment to provide resources or even tell you if they decide to stop (“ooops, out of space. Let me clear some up — someone else must have this…”), and the network capacity isn't just a factor for user experience — although that can prevent adoption if it's too slow — but more importantly because it needs to be available enough to rebuild missing nodes before other ones also disappear.
I'm not sure what you think would be difficult exactly. You've said that archive.org has already done the programming needed to ensure dynamic resources are discovered, and now those resources are content ids rather than URLs. Nothing's really changed on this point.
> Relying on volunteers means that you need far more copies because nobody has a commitment to provide resources or even tell you if they decide to stop
Yes, but you would also have many more volunteers. Many people who wouldn't donate financially would donate CPU and storage. We saw this with SETI@home and folding@home, for instance.
> nobody has a commitment to provide resources or even tell you if they decide to stop
Why not? If you provide a client to participate as a storage node for archive.org, like SETI@home, then they would know your online/offline status and how much storage you're willing to donate. If you increase/decrease the quota, it could notify the network of this change.
The point was that it's outside of the level which IPFS can possibly help with. IA actively maintains the code which does this and any competing project would need to spend the same time on that for the same reasons.
> Yes, but you would also have many more volunteers. Many people who wouldn't donate financially would donate CPU and storage. We saw this with SETI@home and folding@home, for instance.
That's an interesting theory but do we have any evidence suggesting that it's likely? In particular, SETI@home / folding@home did not involve either substantial resource commitments or potential legal problems, both of which would be a concern for a web archiving project. There's a substantial difference between saying something can use idle CPU and a modest amount of traffic versus using large amounts of storage and network bandwidth.
SETI@home appears to have on the order of 150k participating computers. IA uses many petabytes of storage, so let's assume that each of those computers has 10TB of storage free to offer (which is far more than the average consumer system); if we assume all of them switch, that'd be 1.5EB of storage. That sounds like a lot, but you need many copies to handle unavailable nodes, and one factor controlling how many copies you need is how much bandwidth the owner can give you: it doesn't help very much if someone has 2PB of storage if they're on a common asymmetric 1000/50Mbps connection and want to make sure that archive access doesn't interfere with their household's video calls or gaming. Once you start making more than a couple of copies, that total capacity no longer looks like far more resources than IA has.
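To make that arithmetic concrete, a rough sketch. The node count and per-node storage come from the estimate above; the replication factors are hypothetical, picked only to show how quickly raw volunteer capacity shrinks, and decimal units (1TB = 10^12 bytes) are assumed.

```python
TB = 10**12
EB = 10**18

nodes = 150_000          # rough SETI@home-scale participation
per_node = 10 * TB       # generous assumption per volunteer
raw_capacity = nodes * per_node


def usable_capacity(raw_bytes: int, replication_factor: int) -> float:
    """Unique data that fits when every block is stored this many times."""
    return raw_bytes / replication_factor


for factor in (1, 3, 10):  # hypothetical replication factors
    usable = usable_capacity(raw_capacity, factor)
    print(f"{factor:>2} copies: {usable / EB:.2f} EB")
# prints:
#  1 copies: 1.50 EB
#  3 copies: 0.50 EB
# 10 copies: 0.15 EB
```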
> > nobody has a commitment to provide resources or even tell you if they decide to stop
> Why not? If you provide a client to participate as a storage node for archive.org, like SETI@home, then they would know your online/offline status and how much storage you're willing to donate. If you increase/decrease the quota, it could notify the network of this change.
All of what you're talking about is voluntary. One challenge of systems like this is that you don't know whether a node which simply disappears is going to come back or you need to create a new replica somewhere else. Did someone disappear because they had a power outage or ISP failure, tripped over the power cord for an external hard drive, temporarily killed the client to avoid network contention, just got hit with ransomware, etc. or did they decide they were bored with the project and uninstalled it?
Since you don't have an SLA, you have to take a conservative approach (lots of copies, geographically separated, etc.), which reduces the total system capacity and introduces performance considerations.
I'm not sure why they would have to compete. They're literally solving the same problem in basically the same way. I see no reason to fork this code.
> That's an interesting theory but do we have any evidence suggesting that it's likely? In particular, SETI@home / folding@home did not involve either substantial resource commitments or potential legal problems, both of which would be a concern for a web archiving project.
But this isn't a concern of a web archiving project any more if content-based addressing becomes the standard, because pervasive caching is built into the protocol itself. Publishing anything on such a network means you are already giving up some control you would otherwise have in where this content will be served from, how it's cached, how long it lasts, etc.
> All of what you're talking about is voluntary. One challenge of systems like this is that you don't know whether a node which simply disappears is going to come back or you need to create a new replica somewhere else.
Yes, you would have to be more pessimistic and plan for more redundancy than you would otherwise need. Each node in a Google-scale distributed system has a low expected failure rate, but they still see regular failures. No doubt they have a minimum redundancy calculation based on this failure rate. The same logic applies here, but the failure rate would likely have to be jacked up.
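One simple form such a redundancy calculation could take, as a sketch: assume each replica is unavailable independently with probability p, so all n replicas are unavailable at once with probability p^n, and pick the smallest n that gets under a target. The independence assumption is optimistic for volunteer nodes, and the probabilities below are hypothetical, not measurements.

```python
def min_replicas(p_unavailable: float, target_loss: float) -> int:
    """Smallest n with p_unavailable ** n <= target_loss (independent nodes)."""
    if not (0 < p_unavailable < 1) or not (0 < target_loss < 1):
        raise ValueError("both probabilities must be strictly between 0 and 1")
    n = 1
    while p_unavailable ** n > target_loss:
        n += 1
    return n


# Hypothetical numbers: a well-run datacenter node vs. a volunteer machine.
print(min_replicas(0.01, 1e-9))  # 5
print(min_replicas(0.30, 1e-9))  # 18
```

Under these made-up numbers, moving to flakier nodes more than triples the copies needed for the same target, which is the "jacked up" failure rate in practice.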
> Since you don't have an SLA, you have to take a conservative approach (lots of copies, geographically separated, etc.), which reduces the total system capacity and introduces performance considerations.
Whether there would be performance problems isn't clear. Content-based addressing is already fairly slow (at this time), but once content is resolved, fragments of content can be delivered from multiple sources concurrently, and from more spatially close sources. Higher latency, but more parallelism.
I'm not willing to invest the time needed to gather the data you're asking about to actually quantify all of the requirements, but despite the points you've raised, I still don't see any real obstacles in principle.
The point was simply that the original comment I was replying to claiming that this made it easy to replace archive.org was really only relevant to one fraction of what an archiving project would involve. If IA is going strong on their side, it's not clear why this project would get traction.
> > > That's an interesting theory but do we have any evidence suggesting that it's likely? In particular, SETI@home / folding@home did not involve either substantial resource commitments or potential legal problems, both of which would be a concern for a web archiving project.
> But this isn't a concern of a web archiving project any more if content-based addressing becomes the standard, because pervasive caching is built into the protocol itself. Publishing anything on such a network means you are already giving up some control you would otherwise have in where this content will be served from, how it's cached, how long it lasts, etc.
That's a separate problem: the two which I described are covering the commitment of storage, which is unlike the distributed computing projects in that it's only valuable if they do so for more than a short period of time, and the legal consideration. If you run SETI@Home you aren't going to get a legal threat or FBI agent inquiring why your IP address was serving content which you don't have rights to or isn't legal where you live.
> The same logic applies here, but the failure rate would likely have to be jacked up.
Yes, that's the point: running a service like this on a voluntary basis requires significantly more redundancy because you're getting fewer resources per node, have a higher risk of downtime or permanent loss of a node, and replication times are significantly greater. Yes, all of those are problems which can be addressed with careful engineering but I think they're also a good explanation for why P2P tools have been far less compelling in practice than many of us hoped. Trying to get volunteers to host things which don't personally and directly benefit them seems like more than a minor challenge unless the content is innocuous and relatively small.
> Whether there would be performance problems isn't clear. Content-based addressing is already fairly slow (at this time), but once content is resolved, fragments of content can be delivered from multiple sources concurrently, and from more spatially close sources. Higher latency, but more parallelism.
The problem is bootstrapping: until you get a lot of people those assumptions won't be true and a worse experience is one of the major impediments to getting more people. In the case of something like web archiving where the hardest part is crawling, which this doesn't help with at all, and there's a popular service which is generally well-liked, it seems like having a detailed plan for that is the most important part.
p2p no huge IP infrastructure between nodes
interest packets at narrow waist (like human attention, scarce)
sign all packets, more secure, easier to trust|not
hash for pointers, no surprises in containers
crypto apply more cryptography, less web3 baloney
broadcast radio native, no emulation of copper wire
data, closer to where you want, find faster, keep
content you want w/out intermediation, need to find "where" within IP/udp/tcp addresses
https://youtu.be/P-GN-pYfRoo?t=1825 node to node
https://youtu.be/yLGzGK4c-ws?t=4817 more application, less security hassle
http://youtu.be/uvnP-_R-RYA?t=3018 hash name the data
https://youtu.be/gqGEMQveoqg&t=3006 data integrity w/out need to trust foreign server
I was tipped off to https://twitter.com/_John_Handel/status/1443925299394134016 which I honestly think might be the approach to think about how networks in a broader sense than regular telecommunications are bootstrapped.
Whether HTTP or IPFS, you want content:
(1) hosted/stored redundantly enough for availability but not over-hosted/stored because that raises costs.
(2) hosted “near” to requesters to make efficient use of the network, which limits costs and increases speed. (Near in terms of the network infrastructure.)
With HTTP I understand how this works (the content publisher figures out a hosting solution balancing what they are willing/able to pay and what kind of availability and speed they want/need).
Not sure with IPFS though. If people are choosing what to host, wouldn’t the hosting be uneven, with some content under-hosted (or not hosted at all) and some popular content greatly over-hosted? …leading to an inefficient system? Under-hosted content would be slow to retrieve or simply unavailable. Over-hosted content is wasting resources, making the system more costly than it needs to be (somebody is paying for the storage and servers to make that storage available).
I realize this is alpha/experimental, but without a strong answer to this, I don’t see how the system can work at scale.
The distributed part is still going strong, but in my eyes distributed (or decentralization) should be a tool, not a goal.
The question is which kind of Internet we want to build with this distributed infrastructure. If it's just cloning the current Internet, then I don't see what it brings.
My personal answer is that we should build a democratic Internet. One where people are members of the Internet, instead of mere users, and decisions are made in a democratic process, like in a state. This kind of Internet can only be implemented using the distributed/decentralized tools.
Perhaps I missed it, but I don't see where they claim that IPFS can provide a solution for dynamic content; which makes up a huge portion of what is served over HTTP.
It's also brutal for battery life.
The title shows clearly that they were wrong.
Let's assume all goes well and in 2-5-20 years IPFS is the web. A random Joe has an IPFS server in his basement, because it's profitable or at least convenient for him. Most of the traffic never reaches AWS or Cloudflare. What do they do about it? They pretend to be random Joes and mirror the same setup.
Obviously Amazon will manage their servers more efficiently than an army of Joes, so nothing really changes: we end up with a decentralized protocol where 90% of traffic just happens to end up on AWS servers anyway.
I read the docs and tutorials but no luck. Felt like docs were missing some special incantation or setup step, or precondition. Or the CLI wasn't giving me feedback on some blocking error, and it was failing silently. Dunno. Gave up. Hope to revisit some day. Because in theory it would be useful tech to have in my toolbox.
I think it's so bad that so much of the Internet information is siloed in walled gardens and that pages have to be dynamically generated and routed across the globe every single time... Imagine how it's gonna be when we get to Mars, for instance. Storing this knowledge graph in IPFS files seems like a logical step to me (as an absolute layman, though)
Of interest: they provide a trustless way to store your encrypted data on centralised boxes (S3, Backblaze) if you want.
search for content = it presumes that there's a google-like indexing of content, and then a DNS-like way to retrieve it.
It seems to me that IPFS, in this analogy, is simply a different way to address a web page / document.
note: I am very familiar with IPFS. I just think that this analogy is really poor.
The thing is that the web by its nature is already decentralised. The real issue with the web today isn't really some technical, architectural flaw.
Centralisation has emerged in this already reasonably well decentralised system because of network effect driven accumulation, the market and pre-existing wealth from investors tipping the scales.
Yes IPFS is a more thorough attempt to create a distributed web, yet I doubt it is fully immune to the forces that captured and centralised the existing web, except for the fact that it is currently unpopular. Or it will remain unpopular because there's no incentive for private enterprises to invest much in a system where they can't control their own corner of it and few users like using systems built on the cheap when flashy, well funded alternatives exist, if it is truly that resilient to centralisation. Whichever way you want to slice it.
The social forces that cause centralisation in the web are the same forces that cause e.g. monopoly and extreme wealth accrual in the rest of society. And the fix has to be social, not technical.
It's a long road though, with lots of negotiations with lots of organizations. I do feel optimistic about it though.
The idea on how a decentralized web of hypertext documents should be done right isn't exactly new. AFAIK the fact that in the HTTP+HTML web stack, servers going down meant documents disappearing and links going stale was criticized even at the time. The HTTP+HTML stack winning out is probably one of the many examples of "good enough" winning over "perfect".
Well, that's the difference between a URL and a URI, right? HTTP seems URL-oriented while IPFS seems URI-oriented.
URL - Uniform Resource Locator
URN - Uniform Resource Name
URI - Uniform Resource Identifier
IPFS is a CAS (Content Addressable Storage), so an "Identifier" there is a semantic key (i.e. hash of the content).
The problem is that it isn’t unique enough from the existing experience and the problems it claims to solve aren’t something non technical people care about. Web distribution in its current form works. Even if it’s supremely shitty grandma can still post photos on Facebook.
If a new distribution format wants to win it needs to be different in a way an 8 year old and grandma can equally understand. IPFS is not that. What 8 year olds and grandma equally understand is that content is king, the layman term for client/server model.
More likely the future will purely be a focus on web technologies for application experiences and distribution. Kill the client/server model. If grandma wants to share photos with grandkids she doesn’t need Facebook at all. She only needs a network, a silent distribution/security model, and the right application interface. No server, no cloud, no third party required.
Start with a description of the solution. If you can't cover it in 1-2 paragraphs at the beginning, you are in trouble. Don't focus so much on what everyone else is doing wrong - focus on what you are doing and let others be the judge of whether this is better.
Well, we’re all dependent on the Internet which can be shut down at any time by any government, and normally almost completely dependent on FAANG companies.
I applaud the freedom activist spirit but the only exciting technologies in this regard must be P2P and not Internet carried.
A lot of website hacking could've been avoided with a better design like even having encryption as standard.
But that would be a very different web from today's, where most content comes from dynamic CMS/CMF, blog software, forum software, or other web apps. It addresses things like the old personal homepage of the mid-90s, but not much at all about the modern web.
It also doesn't address the major outages (like we've seen with cert revocations or DNS outages, etc.)
Tell that to all of the content creators, corporate or not.
Like, if I’m talking to someone over some chat setup which doesn’t have a built in “send this file directly to this person” feature, it would be nice to be able to say, give them a multihash of the file and my external ipv6 address (+ port? I’m not quite sure how routing works), and have them request the file from my computer.
Now, you might say “but how does that help with the situation where a bunch of people in a room want the same file from a distant location?”.
And, maybe it doesn’t as much? But I think if, e.g., people on a local network had their clients first check whether the local network already had it, and then check the given external address, that could work?
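A minimal sketch of that idea: try the sources in order (local network first, then the sender's external address) and accept only bytes whose hash matches what was shared in chat. It assumes a plain hex SHA-256 rather than a real multihash, and the sources are hypothetical callables standing in for actual network requests.

```python
import hashlib


def fetch_verified(expected_sha256: str, sources):
    """Try each source in order; return the first payload whose hash matches."""
    for fetch in sources:
        try:
            data = fetch()
        except OSError:
            continue  # source unreachable, try the next one
        if data is not None and hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
    return None  # nobody had a copy that matches the hash


wanted = b"holiday photos"
digest = hashlib.sha256(wanted).hexdigest()  # what gets pasted into the chat


def local_peer():
    return None  # nobody on the LAN has it yet


def tampering_relay():
    return b"something else entirely"  # rejected: hash mismatch


def remote_sender():
    return wanted


result = fetch_verified(digest, [local_peer, tampering_relay, remote_sender])
print(result == wanted)  # True
```

Because the hash is checked on receipt, the receiver does not need to trust whichever machine happened to answer.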
I’m somewhat surprised that I hadn’t heard of the ni scheme before tonight.
Is there some nice software (can be cmd only, but ideally multi-platform) to do that where both parties are on separate residential connections, and where the receiving party side software automatically checks that the received data matches the hash?
Because if so, that seems to serve exactly the purpose I’m thinking of!
Thank you for the direction
> give them a multihash
> place to request data from
> try local network first
Already baked in!
So, sounds like I ought to learn to use BitTorrent.