Sadly the design contains some flaws that need addressing.
From their whitepaper (http://static.benet.ai/t/ipfs.pdf)
The architecture essentially creates one giant BitTorrent swarm where each piece may be seeded by different people (see section 3.4.4 of the whitepaper). This approach suffers from the "double coincidence of wants" problem: if this takes off there will be a billion items, matches will be rare, and finding one could take hours or days.
"Aside from BitSwap, which is a novel protocol".
These types of protocols have been worked on for over 8 years. Nobody has ever been able to protect them from an Eclipse or Sybil attack. Early work:
Deployed system from my research group, improved upon for 7 years now: http://www.pds.twi.tudelft.nl/~pouwelse/A_network_science_pe...
I don't know enough about those kinds of security attacks to comment.
My comment "you guys are great!" refers to their team as people not necessarily the project itself. I've worked with them before, and they are awesome folks.
That would change the internet, and the world.
- very soon, you won't need to install anything to use IPFS. it will "just work" with js on today's browsers.
- for best perf, yes, we need browser implementations. and... those have begun :)
On sites not known to me, I typically block most (read all) .js at first, just to make sure that these sites do not leak too much information about me to external tracking tools, advertisers or the like.
So relying on .js seems a bit problematic to me with regard to people who don't enable it by default (though that's an absolute minority, judging by the stats of the sites I analyze).
It works very well for me so I'd recommend it to others, if they're not already aware of it.
I doubt they're building it in JS for giggles.
In order to make changes like this stick, my guess would be that you need to reach a tipping point before interest fades, and to do that you need to take away the browser vendors' ability to say 'no' or 'maybe later'.
As you point out, people who don't run JS by default represent a tiny percentage of traffic, so would have to be a secondary concern to be supported later.
Do you have links to browser implementations?
Furthermore the right to be forgotten only mandates hiding results when someone searches specifically for a name, for example. The results are still allowed to come up for unrelated search terms.
The court judgement - http://curia.europa.eu/juris/document/document_print.jsf?doc... - paragraph 26 says "As regards in particular the internet, the Court has already had occasion to state that the operation of loading personal data on an internet page must be considered to be such ‘processing’ within the meaning of Article 2(b) of Directive 95/46" - so by the Court's reasoning the Directive potentially applies to any web page, not just search engines.
The "right to be forgotten" is a phrase used in the argument for de-listing, and later became the name for the ruling itself. This was however not a right that the court granted, but rather the court points out that the data protection directive stipulates that personal data shall be:
(c) adequate, relevant and not excessive in relation to the purposes for which they are collected and/or further processed;
(d) accurate and, where necessary, kept up to date; every reasonable step must be taken to ensure that data which are inaccurate or incomplete, having regard to the purposes for which they were collected or for which they are further processed, are erased or rectified;
and judges (in paragraph 93) that in particular, data that was once in compliance might not be at a later time.
It also states that the failure of the particular data in this case to be those things means the person described by them can invoke the right to object to the processing, in this case having personal data listed on a page as a response to a "search made on the basis of his name" (in paragraph 94).
It is not completely clear, but I think the query being judged upon includes only the name of the person, and adding a keyword relating to the particular year, or bankruptcy actions, would be enough to make the data "adequate, relevant and not excessive".
Easy: The Net interprets censorship as damage and routes around it.
Oh, right, we're not calling this censorship yet. Funny how quick we are to put up mirrors to Tibet and Tank Man content but somehow we're on the hook to protect London businesspeople and corporations from criticism? Like Bitcoin or Bittorrent and other decentralized applications, it will not be able to follow the dictates of nation states and it will be on nation states themselves to filter appropriately. There's no one to call to remove "unwanted" content. No one to hit with fines, taxes, lawsuits, and fees.
Also, there's a discussion to have in regards to EU shakedowns of US companies. These billions of dollars in fees that go straight to the EU don't seem like consumer protection to me, but fundraising. Billions of dollars in fees for "bundling media players" or "having an Android monopoly" should give any wannabe entrepreneurs pause. It's a shame that the EU is unquestionable on sites like HN and reddit. There's a lot of questionable morality here and obvious signs of corruption. So yeah, not being able to be dictated by the EU or any nation state is a feature, not a bug.
So this isn't per se a problem. But what happens when I act as a "mirror" in this net and have mirrored some questionable content (whoever gets to define "questionable" is left to the reader's imagination)? Will I get a shakedown from the authorities, as has often happened to people running Tor exit nodes?
I fear that the clash would come sooner rather than later, as a lot of people have a lot of incentives for a centralized web (money, politics, control, power).
But the right to be forgotten is a flawed concept to me, and I think any tech that makes its implementation more problematic will do us a favour.
I would kindly suggest that we consider examples from history when laws backed by the majority have resulted in horrifying, depraved violence and deprivation; then contemplate whether it is indeed moral to shape law through majority opinion.
The web didn't have permanent stable addresses because the web was a hack, intended as a way to explain Enquire to people who didn't have a background in real hypertext systems. (Let the entire existence of the Web, with its myriad flaws, be a lesson to you: your crappiest demo may well become the thing you're known for twenty-five years later, so make sure your crappiest demo doesn't suck as bad as the Web does.)
The web is amazing, and you are using it right now. It has flaws, but so does anything that achieves success at the level the Web has. How many technologies have a direct impact on billions of people's lives?
There are subtle reasons why "crappy" technology often wins. One is that worse is often better (http://www.jwz.org/doc/worse-is-better.html) and the second is that people are more prone to complain about popular technologies, which creates the illusion that the alternatives would obviously be so much better.
EDIT: Note that the web is not just a technological phenomenon, it's a social one. If you care about the real world, it doesn't make sense to look at it as a technological innovation in isolation. Anyone can dream up The Perfect Protocol in a research paper; it's an entirely different beast to actually do it and make it work for billions of people.
This Nautil.us article does a good job of summarizing the relevant research on the issue and references some supporting studies.
> My own research has shown that fame has much less to do with intrinsic quality than we believe it does, and much more to do with the characteristics of the people among whom fame spreads.
Put another way: the web has a direct impact on billions of people less because it's any good and more because popularity is inherently a power law and it happened to come out ahead. This doesn't invalidate any criticisms of quality!
It's perfectly possible if not likely for something popular to be crappy because most things are crappy and that's not enough to stop them from getting popular. Then they require tons of engineering effort and hacks to scale and improve again not because of quality but because of legacy. The web is what's there already, after all!
That's how things are, but that doesn't mean it's inevitable or a good state of affairs.
I really wish people wouldn't conflate popularity, quality and success. The first two operate on different axes and the last, if it doesn't, should.
This is simply not true in the long-term. Good technologies survive because they are useful.
Of course quality is not enough to ensure success, but it sure as hell matters for long-term survival. Homer, Shakespeare and the wheel survived because they are very good / useful. Are there lost Homers? Of course there are. Will the Web survive 100 years? I have no idea, and neither do you. A decade? Most likely.
Be careful with conflating what is with your idea of what something should be. If you think something sucks and work on making it better, that's great! Keep doing it, and maybe someday we will have something that's even better. But don't diss the popular just because it isn't ticking all of the "it-should-be-this" boxes.
Technologies become popular primarily based on how easy or cost-effective they are to replicate. The fewer concepts you introduce, the lower the bar is for people to reproduce your intellectual inventions; the cheaper the materials and manufacturing process, the easier it is for knock-offs to be manufactured. Popularity means a plurality of knock-offs. Quality often requires subtlety and attention to detail -- two things that add effort to replication, and thus work against popularity.
The web became popular not because it was better than the alternatives but because it was stupider than the alternatives -- it had fewer features, fewer ideas, and fewer nuances. The web is no longer less featureful or nuanced than its competitors, because unplanned organic development grows complexity like a rat with a radium drip grows tumors, so as a result the very simplicity that made the web grow so quickly in the first place would prevent it from growing again, had it not become the focal point of an enormous amount of lock-in and path-dependence.
The web is a huge bureaucratic mess right now, just waiting to be disrupted by a superior hypertext system.
There will be technologies which:
(1) despite better quality, won't get enough traction (OS/2 v Windows),
(2) thrive just fine while their crappy cousin keeps beating them in popularity year after year (PostgreSQL v MySQL),
(3) despite being better on paper, fall short in actual use (Android v iOS).
Without fame, how does one evaluate technologies? It's impossible to try them all. How does one proceed?
I still think Xanadu et al are interesting, but it's instructive to try to understand what the real trade-offs are, the most obvious one being one of shipping speed.
"Elegant and beautiful" systems that are polished and perfect are often also brittle and monolithic and complex. Complex means hard to interoperate with, understand, and re-implement, and brittle means they can't change with the times.
Complexity also favors single vendors due to the difficulty of interoperability and understanding. Xanadu would likely have ended up as Xanadu, Inc. with no third party implementations.
If the Web was a hack, then the world could do with more hacks.
Here we are, more than twenty years later, and a hypertext system with permanent, stable addresses is still so rare in the real world that creating one will get you on the front page of Hacker News. The "crappy" Web, meanwhile, is the foundation of modern commerce, culture and communications.
If that's crappy, I hope to God I can make something crappy someday.
Even if it wasn't -- being a first mover by sacrificing quality can get you greater adoption over superior competitors. Compare the Mac, released in 1984, with the Amiga 1000, released in 1985. By being delayed by a year, the Amiga was able to have all the features that the Mac team wanted but were told that they couldn't have -- color graphics, multitasking, and an actually usable amount of RAM, at about half the price of the Mac. The Mac, on the other hand, was the follow-up to the Lisa, which was technically superior feature-wise but too slow to be usable. Both the Amiga and the Lisa lie forgotten, and people come to believe that the Mac was the first consumer GUI machine or the best consumer GUI machine of its time -- or that its limitations were unavoidable. In reality, a combination of luck and good PR warped our sense of history.
The reality is that the web was not the first hypertext system, or the best, and the features that it lacked were implemented and functional in its competition. The web gained a following through a combination of being free (at a time when most hypertext systems, including Xanadu, were proprietary and commercial) and stupid enough that it could be reimplemented poorly by beginner programmers -- which, of course, was its aim.
The Web was designed as a model to teach non-programmers the concepts behind Enquire, Tim Berners-Lee's "real" hypertext system. The fact that the web escaped and Enquire didn't is a tragedy.
Lots of crappy things become the basis of culture, commerce, and communications. Quality doesn't particularly matter when it comes to being the basis of things, and nuance and complexity work against you if you want wide adoption. But, when something crappy becomes so widespread, its flaws become far more dangerous.
The web is a long series of poorly thought out decisions, all of which in the end led to the grey hairs, premature heart attacks, and hefty wallets of people on this site. We're stuck with it now, because we've created a broken system and added structure on top of it to keep it from being fixed. And we, as an industry, profit from its brokenness. But, it would be intellectually dishonest to pretend that it isn't broken -- or that the ways in which it was broken were unavoidable. They were avoidable, and they were avoided; it's sheer bad luck that the web was adopted over its superior brethren.
One of the first questions that came to me: Is there any way to add a forward error correcting code, like an erasure code to IPFS? I didn't find any discussion of this in the IPFS paper.
This seems somewhat critical to be able to compete with modern centralized storage systems which are likely to use FEC extensively to provide the level of redundancy that customers expect. Modern FEC codes can provide phenomenal levels of redundancy with less than 50% overhead. IPFS seems to rely on many fully-redundant copies?
From a practical standpoint, I'd like my local node to have a full copy so that my friends coming over don't need to reach out to the (maybe offline) internet, but I don't see why we shouldn't support encoding the chunked blocks that way.
Though perhaps I'm misunderstanding your question - do you want multiple IPFS nodes to coordinate on storing parts of an erasure-coded file? That would be useful too. Again, I think it's possible - you could build a small distributed system to host files this way. The most interesting variant, though, is if IPFS could support something like this natively - so nodes run by different users can each pitch in a little. Conceptually it's possible, but the protocol probably doesn't support it yet.
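To make the idea concrete, here's a toy sketch (not anything IPFS does today, and all names are made up): the cheapest possible erasure code, k data chunks plus one XOR parity chunk, so any single lost chunk can be rebuilt at a storage cost of 1/k instead of a full extra copy. A real deployment would use something like Reed-Solomon, which generalises this to several parity chunks.

    from functools import reduce

    def xor_bytes(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def encode(data, k=4):
        """Split data into k equal-length chunks plus one XOR parity chunk."""
        chunk_len = -(-len(data) // k)  # ceiling division
        chunks = [data[i * chunk_len:(i + 1) * chunk_len].ljust(chunk_len, b"\0")
                  for i in range(k)]
        return chunks + [reduce(xor_bytes, chunks)]

    def repair(shards):
        """Rebuild a single missing shard (data or parity) from the rest."""
        missing = shards.index(None)
        shards[missing] = reduce(xor_bytes, [s for s in shards if s is not None])
        return shards

    shards = encode(b"file content that several IPFS nodes could each hold a piece of")
    shards[2] = None           # pretend the node holding shard 2 went offline
    shards = repair(shards)    # recovered from the others, at ~25% overhead instead of a full copy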
Reed-Solomon ECC are just amazing!
Similar projects have been in development for the past few years, such as https://github.com/feross/webtorrent and ZeroNet.
This has the same problems as the Bitcoin infrastructure though:
1. It is unscalable. A page built on IPFS receiving a huge inflow of comments would generate many diffs quickly, and as soon as they spread through the network they become outdated, clogging space on people's disks and, more importantly, clogging bandwidth as nodes compete to download the quickly changing diffs.
2. This is not completely decentralised because it uses bittorrent trackers to identify the nodes in the network. Taking down the trackers would take down the system as well.
WebTorrent is an already working, fast alternative to IPFS, but still centralised. Think of it this way: can you do a pure peer-to-peer WebRTC site without needing an ICE/TURN/STUN server? Peer discovery is the centralised part of the problem.
You talk about a page with comments. You could (this is an example) create three 'models': a Page, a PageBody and a Comment (which 'inherit' from IPFSObject). A Page would contain links to one PageBody and multiple Comment objects. Every time a comment is posted, a link is added to the Page object (so its hash changes), but the PageBody (where the majority of the page size presumably is) doesn't have to change. All peers can keep the same PageBody cached, while the latest Page contains all the desired comments. This caching mechanism will also work for unchanged comments.
Every time the Page changes, its hash would be published under some IPNS name, so peers will always retrieve the latest version.
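A rough sketch of that linking scheme, with sha256 over a JSON encoding standing in for IPFS's real merkledag objects (object contents and names here are made up for illustration):

    import hashlib, json

    store = {}  # hash -> object, standing in for the IPFS block store

    def put(obj):
        """Content-address an object: its name is the hash of its bytes."""
        data = json.dumps(obj, sort_keys=True).encode()
        h = hashlib.sha256(data).hexdigest()
        store[h] = obj
        return h

    body_hash = put({"type": "PageBody", "html": "<h1>My article</h1> ..."})
    c1 = put({"type": "Comment", "author": "alice", "text": "Nice post!"})
    page_v1 = put({"type": "Page", "body": body_hash, "comments": [c1]})

    # A new comment changes only the small Page object; the large PageBody
    # keeps its hash, so peers' cached copies of it stay valid.
    c2 = put({"type": "Comment", "author": "bob", "text": "Agreed."})
    page_v2 = put({"type": "Page", "body": body_hash, "comments": [c1, c2]})

    assert page_v1 != page_v2                                  # the Page hash changed...
    assert store[page_v1]["body"] == store[page_v2]["body"]    # ...but the body link did not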
The thing from Bitcoin is the 51% attack, where a majority of nodes collude to provide the wrong data (they might not even be malicious, just serving an old version of the data). This can only be prevented by time-stamping in slow scenarios.
In the case of Neocities, a central server is giving you a hash that the nodes will provide you the data for, but without central management how does IPFS associate the correct file with the correct hash?
If it only works in a centralised scenario then it is similar to the http://peercdn.com project, where instead of a server supplying content, a previous visitor's browser cache supplies it over p2p WebRTC and you save server bandwidth.
As long as peers do not decide to keep the data (by pinning the hash), the old content will simply be deleted and replaced with new content. Pinning is extremely cheap, so you can imagine that you would want to auto-pin pages as you visit them. This is not on by default though, and I think that's a good thing.
> [B]ut without a central management how does IPFS associate correct file to the correct hash?
It does so using cryptographic identities. In IPFS every peer has a private/public keypair, and the PeerId is the hash of its public key. When publishing content under an IPNS name (which always has the PeerId at its root), you must prove that you 'own' the PeerID by signing the object with your private key. Other nodes will not 'mirror' your IPNS object unless the signature matches (which makes a nice consensus).
The paper calls this self-certified names: you can distribute the objects, and let them be provided by third parties without losing authenticity. See section 3.7.
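As a rough sketch of the self-certification idea (I'm using PyNaCl's Ed25519 signatures purely for illustration; the actual IPFS key types and record format differ, and the content hash below is a placeholder):

    import hashlib
    from nacl.signing import SigningKey, VerifyKey

    # Publisher side: the IPNS name is derived from the public key itself.
    signing_key = SigningKey.generate()
    pubkey_bytes = signing_key.verify_key.encode()
    peer_id = hashlib.sha256(pubkey_bytes).hexdigest()   # conceptual PeerId

    record = f"{peer_id} -> /ipfs/QmSomeCurrentContentHash".encode()
    signature = signing_key.sign(record).signature

    # Any other node: verify before mirroring the record. Because the name
    # commits to the public key, no central authority is needed to check it.
    def accept(peer_id, pubkey_bytes, record, sig):
        if hashlib.sha256(pubkey_bytes).hexdigest() != peer_id:
            return False                      # key does not match the name
        try:
            VerifyKey(pubkey_bytes).verify(record, sig)
            return True
        except Exception:                     # bad signature
            return False

    assert accept(peer_id, pubkey_bytes, record, signature)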
Mathematics. Your hash is based on the content of the file, therefore you can guarantee that the content is the same as what the original author published. Also, IPFS is not a blockchain, there is no 51% attack.
And there must indeed be some central authority giving 'ultimate trust'. This is Zooko's triangle. By default IPNS gives you 'decentralized' and 'secure', but you can also opt to tie an IPNS name to a DNS TXT record, and lose 'decentralized' (at least for the initial lookup) but gain 'human-meaningful'.
Besides, the idea is that IPFS could happily support Namecoin, so there's the decentralized, secure, and human-meaningful DNS service. The only 'non-secure' aspect there is the unlikely event of a 51% attack.
Namecoin is merge-mined with Bitcoin by many miners, so if you wanted to 51% it you'd need a pretty large chunk of Bitcoin's hash power. An evil pool could do this, however it would be visible on the Bitcoin blockchain that they were doing it (although not what transactions are present/not present in the attack chain).
It's using a distributed hash table, not trackers. Nothing's completely decentralized; distributed hash tables are pretty good, since you can join them by talking to any participating node at all.
I don't know what the term for it would be. It's not completely centralised, in that the nodes collectively form the system without any central coordination, but a new node joining still needs bootstrap information from some source, which could be denied-of-service by, say, a government - so it's not completely decentralised either.
In principle, they could let the centralized server propagate the information that the file was updated. They could convert IPFS to a mere cache sharing system between browsers that are close by.
2. I thought torrents can work decentralized e.g. using magnet links and DHTs.
The difference being you're addressing IPFS node IDs rather than IP addresses. The advantage here is that your IP address could change (moving datacenters etc.), but your node ID never needs to change. Technically you could just use IPFS for routing between clients and your standard PHP server, which would still be an improvement over the current approach and would guarantee a DNS whoopsie couldn't redirect your site to China, but you'll get better results (less bandwidth required, guaranteed uptime) by decentralizing as much as you can.
Still, even after skimming the paper, it sounds like a lot really depends on how well BitSwap is implemented. At one point, the paper says this:
> While the notion of a barter system implies a virtual currency could be created, this would require a global ledger to track ownership and transfer of the currency. This can be implemented as a BitSwap Strategy, and will be explored in a future paper.
So it kind of sounds like they need something like bitcoin, but don't want it to be tightly coupled to bitcoin - which is probably smart. But without a single clear solution for a reliable BitSwap strategy (and I'm not sure how it could be reliable without a distributed global ledger of some kind), it's hard to see how/why the necessary resources will be contributed to the IPFS network.
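For what it's worth, the paper's per-peer ledgers already go part of the way without any global currency. If I'm reading section 3.4 right, the baseline strategy is roughly a debt-ratio throttle; here's my own sketch of that idea (the exact function and thresholds in a real node may differ):

    import math, random

    class Ledger:
        """Per-peer accounting, in the spirit of the whitepaper's BitSwap ledgers."""
        def __init__(self):
            self.bytes_sent = 0
            self.bytes_recv = 0

        def debt_ratio(self):
            return self.bytes_sent / (self.bytes_recv + 1)

        def should_send(self):
            # Serve eagerly to peers we owe, throttle peers that only take.
            r = self.debt_ratio()
            p_send = 1 - 1 / (1 + math.exp(6 - 3 * r))
            return random.random() < p_send

    peer = Ledger()
    peer.bytes_recv = 10_000_000        # this peer has seeded a lot to us
    peer.bytes_sent = 1_000_000
    print(peer.should_send())           # almost certainly True for a generous peer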
Really cool stuff by the way - I'm very excited to see people working on this, and I hope IPFS (or something like it) is successful.
Sure, it's easier to setup your own web presence with "the cloud" today than it used to be, but this only further centralizes control of content to the big cloud providers.
Not to mention the cost of bandwidth when serving content via HTTP. Sure, you can distribute your content via BitTorrent, but what kind of user experience is that? Can my grandma use it? Probably not.
I hope IPFS sees further adoption.
and, you'll love what we have coming. Soon, you won't need to install ipfs at all :D
It's coming along, not fully functional yet but you can watch its progress (or contribute code :D ) at https://github.com/ipfs/node-ipfs
If someone like GitHub supported this for GitHub pages it would be a great step forward for this as well.
The idea is that every user owns their own content, not some central organization. Every user maintains a log of all of their actions, like an RSS feed. Every time they make a status update, that's an item. When they comment on someone else's status update, that's an item. Even when they just like something, it's an item. Each update to the action log would also send a ping to all friends, over some kind of distributed IM like Tox. They already need an IM connection like that for messaging anyway.
Given that information alone, anyone can reconstruct the activity of all their friends into a news feed, and post new content. You'd need a lot of client-side leverage, and the ability to upload to the IPFS network from the client, but all the dynamic information is there.
And there you have a fully distributed facebook clone. More resilient and privacy-sensitive and monopoly-averse than even tools like diaspora and statusnet.
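A minimal sketch of the action-log idea (all names hypothetical; the real thing would sign each item and publish it as a content-addressed IPFS object, with the ping mechanism left out here):

    import time
    from dataclasses import dataclass, field

    @dataclass
    class Action:
        author: str
        kind: str       # "status", "comment", "like", ...
        payload: str
        ts: float       # when the author appended it to their log

    @dataclass
    class UserLog:
        owner: str
        items: list = field(default_factory=list)

        def append(self, kind, payload):
            # In the scheme above, each append would become a new signed,
            # content-addressed object plus a ping to friends.
            item = Action(self.owner, kind, payload, time.time())
            self.items.append(item)
            return item

    def news_feed(friend_logs, limit=20):
        """Rebuild a feed purely from the logs your friends publish."""
        merged = [a for log in friend_logs for a in log.items]
        merged.sort(key=lambda a: a.ts, reverse=True)
        return merged[:limit]

    alice, bob = UserLog("alice"), UserLog("bob")
    alice.append("status", "Trying out the distributed web")
    bob.append("comment", "re: alice's status -- looks promising!")
    for item in news_feed([alice, bob]):
        print(item.author, item.kind, item.payload)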
Not really. There's no reason you couldn't implement Facebook's core functionality with personal servers containing a profile at each node and a pub/sub (or even just pull) system connecting each node with its friends.
All data is encrypted and my server can be hosted anywhere (just as simple as setting up a WordPress instance).
My own "news feed" are a combination of all of the other nodes I'm subscribed to.
My problem with Diaspora is that they push "sign up to someone else's pod" rather than "install your personal pod".
It just feels like a multi-centralised model. Rather than one Facebook with my data, now there are 311 "Facebookesque" Diaspora Pods (as of today).
As much as people hate the idea, a WordPress like (PHP) application as a "personal server" would be successful. There are lots of hosting companies out there where you can get up and running for peanuts.
My aim would be for a simple install:
- Download software and upload to hosting company
- Enter MySQL auth details
- Choose local filesystem or third party (S3, DropBox, Google Drive, MS SkyDrive) with OAuth as the backing store (files stored encrypted).
Then I would be able to subscribe to other friends and family personal servers, just like RSS feeds. Friends can access my own personal server through OAuth and leave comments on content. Those comments are echoed back in my private feed which is displayed in their merged news feed on their own app instance.
The technology is already there. It just needs someone who is damn good with PHP!
Unfortunately it never really caught on.
- Is it possible to, say, log into my bank account, see my current balance, and set up bill pay?
- Can I order a pizza over IPNS?
- Can I do realtime chat?
Also, this article says we could use Namecoin and not rely on ICANN for DNS; however, you still need ICANN to distribute IP addresses to get connected to the internet in the first place. Or am I missing something?
I think that it should be possible to build hybrid centralized/decentralized sites with IPFS, with static public content being fully decentralized and sensitive information going through direct tunnels. Of course, the more decentralized your system the better.
As far as the IP system, yes we are unfortunately still reliant on ICANN being sensible about IP allocations, but the DNS system can be fully deprecated. I do know that there is a project out there to redo the IP system in a decentralized fashion but I can't recall its name right now.
The last one is a no. Well actually yes if you just poll a file, but that's not practical on a large scale. You need an external communication channel to implement that. It's possible to have that be fully distributed as well though, as in tox. You need this external communication channel to implement a lot of things anyway, so it would probably be part of the platform that forms around the idea of a distributed web.
Interesting to think about.
You won't have the full graph that Facebook does, but does that really matter? I suspect the best predictors of who your friends are would be people already in your network.
I think the best sources will be the paper, the talks on the website, and the FAQ.
Freenet has a similar immutable datastore but chat, microblogging, etc have been built on it.
More work would be needed here perhaps. But decentralized apps should happen one day. As someone without real technical knowledge in security and decentralized platforms, I'd love to hear from some on HN where we really stand on decentralized apps and security.
- Public data. Distributing, searching and querying is not a problem.
- Restricted data. You'd be distributing encrypted data to peers. Now, if we have a way to send an encrypted query to the peer and have it run on encrypted data without the peer being able to understand the query or the data, we'd be set perhaps.
- Obviously this would need application architectures to change. But that's solvable.
Is that possible? It doesn't seem to be "P versus NP" but it seems similar. How can a computer find a correct answer to a question it doesn't know?
I suppose you could send off small units of processing across many nodes, such that none of them knows the full result set...
Google turns up plenty of articles on the subject, but I'm not enough of an expert to know where we stand on this.
It seems possible but extremely tricky.
My understanding is that it is possible, and Turing-complete systems can be built, but they are slow and unwieldy, requiring a large amount of ciphertext to encode data (on the order of 4KB per float, I seem to recall).
It's still up for debate whether such techniques will eventually be suitable for general distributed computing, e.g. distributing a VM instance over multiple distributed servers, or just for specific use-cases, e.g. bitcoin-esque distribution of small transactions, etc.
I imagine you could go through their codebase to try to see how far this has come along: https://github.com/ipfs/go-ipfs
According to this, it looks like the implementation is currently in progress: https://github.com/ipfs/go-ipfs/blob/master/dev.md
Hope something in there helps you.
Someone in this thread already asked one of my questions (So is this primarily for static websites?) but my second question is: So is this primarily for personal websites?
I'm having a hard time finding a good way for Facebook, for example, to monetize their website. Targeted ads go out the window with mostly static content. Even more so though, what about Netflix? How is DRM done? How do you make sure only the correct users can access your objects?
edit: Also, doesn't a "permanent web" have an inevitably fatal flaw that you can't free space?
As far as paying a premium over raw bandwidth to access premium content (Netflix), you could pay them to decrypt it for you to download. DRM isn't secure in the first place, though, and decentralization won't make it better.
The general idea of decrypting premium content at a cost would work fairly well; however, I'm wondering if it's something that can be done easily and dynamically via IPFS. For example, can I hand that data off to each client without burdening the network with a billion copies, each encrypted with that client's key? If I simply decrypt it and hand it over, it's on a distributed network and anyone can grab it.
The website used to navigate the catalog of premium content could be built as an IPFS page though.
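One common pattern that would answer the copies question above (this is a generic key-wrapping sketch using PyNaCl, not an IPFS feature): encrypt the content once with a random content key, publish that single ciphertext to the network, and hand each paying client only a tiny per-client wrapping of the key. It still doesn't stop a paying client from re-sharing the plaintext, which is the usual DRM caveat.

    import nacl.utils
    from nacl.secret import SecretBox
    from nacl.public import PrivateKey, SealedBox

    # Encrypt the movie ONCE with a random content key; this single ciphertext
    # is what gets added to IPFS and cached by everyone.
    content_key = nacl.utils.random(SecretBox.KEY_SIZE)
    ciphertext = SecretBox(content_key).encrypt(b"...premium video bytes...")

    # Each paying client only needs the 32-byte key, wrapped to their public
    # key -- a tiny message, not another full copy of the content.
    client_private = PrivateKey.generate()              # lives on the client
    wrapped_key = SealedBox(client_private.public_key).encrypt(content_key)

    # Client side: unwrap the key, fetch the shared ciphertext, decrypt locally.
    recovered_key = SealedBox(client_private).decrypt(wrapped_key)
    plaintext = SecretBox(recovered_key).decrypt(ciphertext)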
The more people a file is requested by, the more places it is stored. If the file ceases to be requested, and all the participants decide to remove it from their caches, the file is gone.
DHT records are slightly more persistent, but without a sponsor they fade as the network's churn removes the nodes that once hosted them.
It is very hard to delete things, and the most likely things to be deleted are things people are not providing effort to preserve (even one person providing such effort is sufficient to maintain the DHT records.)
In terms of targeting, once you're dealing with "small data" instead of centralised "big data" you need a lot less algorithmic machinery to figure out what ads might work. An FB neural net probably does a lot of deep learning to guess that a bunch of mountain-biking friends might be interested in saddles. Decentralised operators could probably figure that out themselves.
Nobody, but nobody wanted it then (or since, it's still part of the kernel), and I would be surprised if anybody wanted something similar today.
This uses a set of untrusted servers, so it's different. Also, trust use-cases come in a lot of different flavors, so the comparison of 'wants' is pretty thin.
I'm not sure if there are any alternatives that work any better over long distance. I suppose CERN also has CVMFS, which is a read-only caching filesystem that retrieves the files from upstream via HTTP. This works much better, but it is read-only so only satisfies certain use cases.
Next, I'd really like someone to implement git backend in IPFS / IPNS. Right now there's https://github.com/whyrusleeping/git-ipfs-rehost but that's just simple hosting.
Nothing is technically stopping me from adding ipns except finding the time to do it.
If I link to a page and it goes down, or the content changes, it in a way changes the content of my blog too - in an unwanted way that is!
I find it quite disheartening how browser vendors treat their history features in terms of archival.
You could cover the Earth's surface with 8.5×10^36 oxygen atoms. Imagine that every one of them is an IPFS file with a 256-bit hash. The probability of a collision is still less than 0.1%.
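A quick back-of-envelope check of that figure, using the standard birthday-problem approximation:

    from decimal import Decimal, getcontext
    getcontext().prec = 50

    n = Decimal("8.5e36")          # one "file" per oxygen atom covering the Earth
    N = Decimal(2) ** 256          # possible 256-bit hashes

    # Birthday approximation: P(collision) ~= n^2 / (2N) when n << sqrt(N)
    p_collision = n * n / (2 * N)
    print(p_collision)             # ~3.1e-4, i.e. about 0.03%, comfortably under 0.1%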
I suppose this could be made a problem for the user, i.e. if you want to make sure that your page isn't colliding with any other, then simply check first. (Of course there is still a race condition possibility of two colliding addresses being created at around the same time...)
It's generally considered safe to ignore: there are no known SHA-256 collisions, and most people are fairly confident they won't see one for quite some time.
(We may improve how this upgrade will work down the road by allowing links to link with multiple hashes at the same time, but at this time this has not proved to be necessary, and can be added later)
While two things may produce the same SHA hash, they may differ with other hashing functions.
However, the probability of a hash collision is already incredibly low. As a demonstration of this, here's a list of a tiny subset of bitcoin private keys: http://directory.io/
The human mind just isn't capable of grasping how large the number of possible SHA hashes is.
The super low likelihood of hash collisions is well documented (of course also by the link you posted), but I think (hope) the original poster knew that -- I was trying to answer assuming that extraordinary case actually happened.
There is no hash function without collisions. The set of inputs is infinite but the set of outputs is finite. (The identity function isn't a hash function--its output isn't a fixed length.)
What about hashing an object that contains the hashes generated by x hashing algorithms? An extra step of resolution, sure, but it would be a way to easily mark that some file experienced a collision with another file and had to have its number of hashes increased to stave off further collisions.
I.e., if you have some hashing function with a 1/2^5 chance of a collision, using it in conjunction with another hashing function with, say, a 1/2^3 chance requires that both unlikely events occur, resulting in 1/2^8, whereas a "stronger" hashing function might only have 1/2^6 (technically stronger than either one of them, but still not as strong as both).
I'd really love some correction on the logic above though, I am by no means well studied in probability or hashing functions, despite some effort, and would really appreciate any corrections.
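For what it's worth, the arithmetic works out exactly as described, but only under the assumption that collisions in the two functions are independent events, which real hash functions don't guarantee; in practice people just reach for a single larger hash instead. A tiny check of the numbers:

    from fractions import Fraction

    p_a = Fraction(1, 2**5)    # chance of a collision in hash A
    p_b = Fraction(1, 2**3)    # chance of a collision in hash B

    # Both must collide for the combined identifier to collide
    # (assuming independence, which is the modelling leap here).
    p_both = p_a * p_b
    assert p_both == Fraction(1, 2**8)

    p_stronger_single = Fraction(1, 2**6)
    assert p_both < p_stronger_single   # the pair beats the single "stronger" hash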
It's better if your application is redesigned to be distributed on a fundamental level, but you can still send custom data through IPFS in a standard client/server relationship if you want to, see http://gateway.ipfs.io/ipfs/QmTkzDwWqPbnAh5YiV5VwcTLnGdwSNsN... for an example.
>I for one will not be hosting a node where arbitrary code can be run
This isn't a distributed computation platform like Ethereum, it's purely for transferring files, and the only files you host are those that you explicitly request. Also, IPFS is not a blockchain with a 51% attack to take over, it's an entirely different animal; the only way to redirect a site is with the correct associated public/private key (unless SHA-256 gets broken at some point, in which case IPFS migrates to a different hashing algorithm).
I imagine getting the static html/js/css for the client-side app onto IPFS should be simple enough, but how would I handle bootstrapping?
Even in a decentralized service, clients need to connect to a seed server somewhere to start querying for other peers on the DHT. Would it be possible to host such a seed server on IPFS? Or would the seed server still need to live on a traditional server somewhere on the "old web"?
Since IPFS runs on its own DHT, are there any plans for a client-side IPFS DHT library for apps to use to connect to peers on the IPFS network? Similar to https://github.com/feross/bittorrent-dht, but with native support for IPFS peers?
Something like this would make building a client-side real-time app based on IPFS a much more streamlined experience since developers won't have to depend on an external DHT.
The number of hosters will be far less than the number of clients.
The same issue exists in Bitcoin now: lots of lightweight clients (that don't keep a full ledger), and not so many full nodes (probably even fewer once mining rewards reach zero).
I mean indeed local p2p distribution sounds nice, but what's the use if the device is dead in a few hours?
There are a number of ways this reputation could be represented, perhaps one possible approach might involve exchanging some unit of cryptocurrency (Filecoin?), or perhaps a web-of-trust style system where nodes publicly 'trust' other nodes based on how much use that node was to them. Personally, I think cryptocurrency is the way to go here, but there's lots of work out there on P2P incentivization strategies.
Of course, this could work in tandem with the standard barter system that bittorrent uses, IPFS is not locked into any one reciprocation algorithm and could happily use multiple different systems depending on the situation.
As ot asked, I am also curious about how this compares to other efforts to create decentralized networks, like Freenet and GNUnet. I definitely plan to pick one in the coming weeks and start using it, hope this catches on.
If you had the hash of some content, couldn't anyone find all of the ips currently serving it?
EDIT: Found this
- Now I want to view a whole web page. Is that possible?
- Let's say I have a site that I want to publish through this, along with a bunch of files (images, downloads). I'm serving that site now through Apache. Do I need to 're-publish' each file (using the command in the article) every time a file on there changes? Or is there some automated way to have a distributed version of the latest version of my site? I mean, it's fine and dandy that people can store cached copies, but what I want is that if there's an update to my software, that people actually get the latest version when they click 'download', not that they get an old version because some dude somewhere made a mistake at one point 'hosting' a static version of my site and not updating, and visitors just happening to use that version because that dude is closer to them (in network topology terms) than I am.
You can use it like a unix command line tool to add and get files, but you can also run it in daemon mode. It then also opens an HTTP server locally, which allows you to use the same URLs as above, just with http://localhost:8080 as the scheme and host.
2) You can use the ipfs cli tool to add a directory with your site's html and other static content. The client will then give you the root content hash. Then you open /ipfs/$rootHash/index.html or /site2.html and voila! :)
Btw, this article is serving from ipfs. Try 'ipfs get QmNhFJjGcMPqpuYfxL62VVB9528NXqDNMFXiqN5bgFYiZ1' and take a look in that directory on your harddrive. Also: when you now run 'ipfs daemon' your node will help with requests for this content.
3) To update content with a static name, you can use IPNS, it's also mentioned in the article. IIRC: To resolve those entries (and get the latest ipfs content hash) the node with the corresponding key for the ipns entry needs to be online. If it isn't, you get an error like 'Path Resolve error: could not resolve name.'. Maybe come to the #ipfs irc channel on freenode for more specific questions.
The "permanent web" on the other hand you can create using MHT archive files and BitTorrent magnet links. I struggle to get excited about that.
Wake me up when someone figures out how to make apps that require seamless multi-user collaboration, like say distributed clones of HN, Facebook or Github.
There are people working on this, so stay awake.
Permanent... is a great idea, and a very conflicting one.
While the utility of permanence cannot be denied, the implications for privacy and piracy are far-reaching.
Transparency can work both ways.
Copyright violations will likely follow a similar route to BitTorrent.
Distributed content in IPFS is eventually purged from nodes unless one of the hosting nodes has pinned it to be retained. Therefore, if no-one at Time x views certain content and pins it, then unfortunately at Time x + n that content might disappear just as thoroughly as under HTTP.
Unfortunately I fear that means that 'popular' content persists whilst niche and unique data might still fade away.
Perhaps it is also possible to tweak the caching algorithm to keep track of files that are rare on the network.
... and it also sounds like usenet, but here everybody is a news server and they selectively host articles.
I wonder if this would also be possible in this case?
For example, my small server might serve 200 sites but I'll backup around 500 or so older ones somewhere else.
Many users want to do things with other users.
That means there has to be a way to persist state and history (of a group activity, an evolving resource, etc.) across machines. Mental poker is still very hard to do for groups. Since most clients will not have amazing uptime, that means you still need a network of servers. Servers that manage security behind a firewall.
So, the conclusion is -- for group writable data, the best you can do is have servers be completely interchangeable, but you will still need servers and networks of servers.
The good news is, many of the challenges are orthogonal to each other. Someone's building a Mesh network, for instance. Someone else is building the protocol layer with IPFS. We're building the social layer for this coming web.
Take other patterns from Clojure's immutable data structures for various other kinds of group-edited, time-travelling sites.
As for conflict resolution, merging, etc., it looks like a (mostly) solved problem. I don't see how conflicts could occur in a comment stream - it's append-only.
So yes, for append-only tree structures, this may work. In terms, of say, a Google Doc that means you can never delete or insert any text, but only append text to the end of paragraphs etc. Certainly very limiting.
Unless there's something you haven't described, in which case I encourage you to provide a more in-depth comment describing a solution for actually inserting or modifying the tree.
1 - What prevents someone from altering a JS file and serve it to other peers?
2 - Is it possible to obtain all the versions of an object and see its history of changes (like Git)?
"authenticated: content can be hashed and verified against the link"
Those redirect requests for public gateways (http://(gateway.)ipfs.io/ipfs/$path) to the HTTP server of the local ipfs daemon. It fetches the content using the bitswap protocol and in turn makes it available to other nodes, helping with distribution.
 - https://github.com/lidel/ipfs-firefox-addon
 - https://github.com/dylanPowers/ipfs-chrome-extension
You only provide files that you have previously accessed via IPFS or added yourself. So even without any external motivations to sponsor other people's content (which are in the works), you have the motivation of sponsorship of content that you wish to increase the robustness and accessibility of. Increasing access to public utility can be itself an incentive.
When neocitieslogo.svg is added to my IPFS node, it gets a new name: QmXGTaGWTT1uUtfSb2sBAvArMEVLK4rQEcQg5bv7wwdzwU. That name is actually a cryptographic hash, which has been computed from the contents of that file. That hash is guaranteed by cryptography to always only represent the contents of that file.
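The principle in miniature (note that the real QmXGTa... name is not a plain SHA-256 of the raw file: ipfs chunks the file into a merkledag and base58-encodes a multihash of the root node, so this sketch shows the idea, not the exact value):

    import hashlib

    logo_v1 = b"<svg>...neocities logo...</svg>"      # stand-in for the file's bytes

    # Same bytes always give the same name; different bytes give a different name.
    print(hashlib.sha256(logo_v1).hexdigest())
    print(hashlib.sha256(logo_v1 + b" ").hexdigest()) # one extra byte, completely different hash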
And I opened a reddit/eli5 question, just to understand whether "we can use a hash to identify content uniquely", here: https://redd.it/3k8g51
Can anybody elaborate and enlighten me a bit?
I think you've correctly picked up on the difference between "uniquely" and "infeasible".
It's not possible to use a hash function to give every possible input a unique reference. But a cryptographic hash function aims to make any two inputs that collide be wildly different. So for real-world files the hash is effectively unique, because a file that produced a collision would almost certainly be malformed and useless.
In general when you hash a message you don't want an attacker to make subtle changes and get the same hash.
"Here is my electronic payment for £100" should never have the same hash as "Here is my electronic payment for £10,000. Nice to meet you on the weekend".
You've got DNA. Everyone's DNA is unique (shhh, just go with me for a minute). In fact, let's say that we found about 140 GATC pairs, which were also unique (we don't need your whole sequence, just those 140 pairs). Meaning, if we take your 140 GATC pairs, and we take my 140 GATC pairs, we'd be guaranteed to get different results.
Now, if I want to refer uniquely to a person, I can just use those 140 GATC pairs, which looks roughly like QmXGTaGWTT1uUtfSb2sBAvArMEVLK4rQEcQg5bv7wwdzwU (when expressed with upper + lower + numbers).
But, you've been thinking to yourself, WHAT ABOUT TWINS?!?
Well, great question...
...but the cryptographic hash that we use is one where twins are EXCEEDINGLY UNLIKELY. Like, amazingly, shockingly, oh my god, unlikely. They exist, but in practice they're so uncommon that it doesn't matter.
Maybe sometime, some unfortunate guy is going to request a picture of Natalie Portman, and get an MP3 of Ted Nugent. It could happen. It's just super unlikely to ever happen in practice.
So, there you have it - a cryptographic hash is... excuse me, because this is a supremely flawed analogy... kind of like the DNA of a file.
It's like a [guid](http://stackoverflow.com/questions/2977593/is-it-safe-to-ass...) as far as I can understand, only even bigger, so we shouldn't worry.
Kidding aside, the answers given on Reddit 20 minutes before your question was posted here suffice.
I've worked with IPFS, and it's really amazing. The development on it is very active, and the main developers are easily accessible for questions on Freenode#ipfs. I've been trying to build a tiny consensus protocol on top of it, but its native support for signing is still a little rough around the edges (and of course I've been distracted by a thousand other side-projects).
Being able to push data to a target seems to be a tool that will empower real change - not just solve a technical problem (HTTP failings).