
An introduction to IPFS - bergie
https://medium.com/@ConsenSys/an-introduction-to-ipfs-9bba4860abd0
======
joepie91_
So, the same question and point of criticism that I've brought up in past
discussions about IPFS, and that so far has not yet been sufficiently answered
by anybody:

The claim is that IPFS could replace HTTP, the web, and so on. The only thing
I see, however, is a distributed filesystem, which is only one part of the
puzzle. Real-world applications require backend systems with access control,
mutable data, certain information being kept secret, and so on - something
that seems fundamentally at odds with the design of IPFS.

How would IPFS cover all of these cases? As it stands, it essentially looks to
me like a Tahoe-LAFS variant that's more tailored towards large networks - but
for it to "replace HTTP", it will have to not only cover _every_ existing HTTP
use case, but also do so without introducing significant complexity for
developers.

Seriously, I'd like to see an answer to this, regardless of whether it's a
practical solution or a remark along the lines of "this isn't possible". I'm
getting fairly tired of the hype around IPFS, with seemingly none of its users
understanding the limitations or how it fits (or doesn't fit) into existing
workflows. I can't really take it seriously until I see technical arguments.

~~~
natrius
Let's separate the use cases: we want a permanent, public web, and we want
applications that don't die when servers get shut down. IPFS clearly solves
the first use case, so let's talk about the second.

In this context, IPFS is part of the toolkit for building decentralized
applications. The architecture of these applications is fundamentally
different from today's apps. Users control the data, not the application
developer. Private data can live on the user's machine or some cloud service
they trust to hold the data for them. IPFS probably isn't a good fit for
private data, but public apps like a decentralized Twitter will probably use
it for all of their data. Encrypting private data and storing it with IPFS
will sometimes make sense.

To deal with mutable data, it depends what your goals are. If users don't need
to agree with each other about the value of the data, they can store it
however they want and maintain their own view of the world. If they need to
agree on data that only one person can change, then you can use IPNS to manage
mutable references controlled by one keypair. If users need to agree on data
about their interactions with each other, then they need to use a
decentralized cloud computer that can enforce rules on a historical record. We
call these enforcement clouds "blockchains," and Ethereum is a blockchain that
defines a process for defining new rules so application developers don't have
to build their own blockchains.

Building decentralized apps with IPFS and Ethereum is the most exciting work
I've ever done. It doesn't just improve the apps themselves, it fundamentally
changes the economics of our industry. Most businesses are built on network
effects, but decentralized apps don't bottle up the value of their networks:
they give them to their users.

That's more like the world I want to live in. If you want to be a part of this
change, you should join us at ConsenSys, or join the decentralization movement
in general.

~~~
droffel
I've recently been considering building an application that runs on a backend
of Ethereum+Storj. The ability to store (and retrieve) data in Storj chain
feels like a necessary component in getting a full DApp set up. Right now I'm
grappling with a concept that would (somehow) require the ability to provide
users with a single-use key to download data from Storj.

The future of cryptocurrencies is bright: there's now a large set of tools in
the toolbox for building resilient, permanent applications for people to
use. I'm really excited to see what people build in 2016. The cryptocurrency
ecosystem has been growing rapidly in the shadows, and it's finally
breaking into the mainstream.

~~~
DennisP
Also, the guys working on Ethereum's Swarm think they've figured out a good
incentive system for it, and plan to publish an "orange paper" in the near
future.

[https://www.reddit.com/r/ethereum/comments/46fz8f/ethereum_i...](https://www.reddit.com/r/ethereum/comments/46fz8f/ethereum_in_the_year_of_the_fire_monkey_can_you/)

~~~
matthewbauer
That sounds a lot like Filecoin, which was created by the IPFS guy.

------
matthewbauer
I really like that IPFS is trying to change the way we think about the
internet and HTTP. That being said, I'm very skeptical of a lot of the design
choices. It seems like it's just trying to incorporate a lot of the latest
buzzword technologies without any real consideration why. I get that
blockchain, Git, BitTorrent are all powerful but that doesn't mean that mixing
them all together into IPFS is going to be useful. Most likely it will end in
a sort of internet Frankenstein's monster: overly complicated and lacking real
benefits over traditional HTTP, FTP, and the rest.

My biggest concern is that in the end IPFS isn't even really "permanent" in
the way I understand it. Objects added to IPFS still need someone to in a
sense "seed" them for that content to be available. What advantages does that
give over just hosting the internet over static torrents?

~~~
jokoon
> overly complicated and lacking real benefits over traditional HTTP, FTP, and
> the rest.

Authenticity over P2P is complex, that's the cost of having no server. But if
such a cost can save you hardware and server hosting, it's worth it.

> What advantages does that give over just hosting the internet over static
> torrents?

No need for DNS. That is pretty huge.

~~~
geebro
>But if such a cost can save you hardware and server hosting, it's worth it.

The vast majority of this discussion is miles above my head, but this made my
Spidey Sense tingle. Someone somewhere has to host the data, and seemingly in
more than one place. There is no such thing as a free lunch.

------
kodablah
One of the things I am most looking forward to is the abstraction into
libp2p[1]. I am wanting to try out my own ideas but I don't want to hassle w/
building my own Kademlia DHT or NAT traversal.

1 -
[https://github.com/ipfs/specs/tree/master/libp2p](https://github.com/ipfs/specs/tree/master/libp2p)

------
ThrustVectoring
> It is left as an exercise to the reader to think about why it’s impossible to
> have cycles in this graph.

This was funny. Suppose you wanted to build a node that linked to itself.
You'd have to find a fixed point in the combination of functions that adds
other data to the link and hashes it. Finding a fixed point of a hashing
function is hard.
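
A minimal sketch of that argument, assuming a toy node format (JSON holding the data plus the hashes of linked nodes, hashed with SHA-256). The `node_hash` helper and the encoding are hypothetical and are not IPFS's actual object format:

```python
import hashlib
import json


def node_hash(data: str, links: list[str]) -> str:
    """Hash a toy node: its data plus the hashes of the nodes it links to."""
    encoded = json.dumps({"data": data, "links": sorted(links)}).encode()
    return hashlib.sha256(encoded).hexdigest()


# A child must exist (and have a hash) before a parent can link to it.
leaf = node_hash("Hello World", [])
parent = node_hash("directory", [leaf])

# Trying to make a node link to itself: every guess at its own hash
# changes the node's content, and therefore its hash.
guess = "0" * 64
for _ in range(5):
    actual = node_hash("self-referential", [guess])
    print(guess[:12], "->", actual[:12])
    guess = actual  # a cycle would require actual == guess
```

Because a link embeds the target's hash, links can only point at nodes that were already hashed, which is what rules out cycles short of finding such a fixed point.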

~~~
KMag
In fact, for an ideal cryptographic 256-bit hash function, modeled as a random
oracle, it takes an average of 2^128 iterations before reaching a periodic
point. The average cycle size of the reached periodic point is also 2^128.
There exists a fixed point for your set of files only if the cycle size of the
periodic point is 1.

Using the big-step, little-step cycle detection algorithm to avoid using
gigantic amounts of memory, you're then looking at an average of 1.5 * 2^129
iterations of updating your graph of 256-bit cryptographic hashes in order to
discover you've hit a periodic point.
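
The "big-step, little-step" search is presumably what is usually called Floyd's tortoise-and-hare cycle detection. Here is a rough sketch of it on a toy scale, assuming SHA-256 truncated to 16 bits as a stand-in for a random mapping (the truncation, the `BITS` constant, and the function names are illustrative choices, not anything from the article):

```python
import hashlib

BITS = 16  # toy size; the case discussed above is 256 bits


def f(x: int) -> int:
    """Truncated SHA-256 acting as a random mapping on BITS-bit integers."""
    digest = hashlib.sha256(x.to_bytes(4, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - BITS)


def floyd(start: int) -> tuple[int, int]:
    """Return (tail_length, cycle_length) of the sequence start, f(start), ..."""
    # Phase 1: the hare moves two steps per tortoise step until they meet.
    tortoise, hare = f(start), f(f(start))
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(f(hare))
    # Phase 2: restart the tortoise; they now meet at the first periodic point.
    tail, tortoise = 0, start
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(hare)
        tail += 1
    # Phase 3: walk once around the cycle to measure its length.
    cycle, hare = 1, f(tortoise)
    while tortoise != hare:
        hare = f(hare)
        cycle += 1
    return tail, cycle


for start in (0, 1, 2):
    print(start, floyd(start))
```

With 2^16 possible values, tail and cycle lengths should typically come out on the order of sqrt(2^16) = 256, which is the small-scale analogue of the 2^128 figure. A cycle length of 1 would mean the walk ended in a fixed point.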

Offhand, I don't know the probability that there's a fixed point for a given
starting point for a random mapping of 256-bit values to 256-bit values, but
my intuition is that it's vanishingly small. If anyone has an elegant
derivation of the probability, I'd love to see it.
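
One way to make that intuition concrete, offered as a sketch rather than a polished derivation and assuming a uniformly random mapping on n points: the walk ends in a fixed point exactly when the k-th newly visited value maps to itself, which happens with probability 1/n given that the first k values were distinct. Summing over k gives the function below (the name `p_reach_fixed_point` is made up for this example):

```python
import math
from fractions import Fraction


def p_reach_fixed_point(n: int) -> Fraction:
    """Exact probability that iterating a uniformly random mapping on n
    points from a given start ends in a cycle of length 1."""
    total = Fraction(0)
    distinct = Fraction(1)  # P(first k visited values are all distinct), k = 1
    for k in range(1, n + 1):
        # The k-th visited value maps onto itself with probability 1/n.
        total += distinct / n
        # Probability that the next visited value is also new.
        distinct *= Fraction(n - k, n)
    return total


for n in (2, 10, 1000):
    exact = float(p_reach_fixed_point(n))
    approx = math.sqrt(math.pi / (2 * n))  # large-n approximation
    print(n, exact, approx)

# Same approximation for n = 2**256, written to avoid float overflow.
print(math.sqrt(math.pi / 2) * 2.0 ** -128)
```

For large n the sum behaves like sqrt(pi / (2n)), so for 256-bit values the probability is roughly 1.25 * 2^-128, which agrees with "vanishingly small".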

~~~
nhaehnle
The expected number of fixed points in a random permutation is 1. This is an
application of linearity of expectation: for a random permutation f of size N
and a given input X, the probability that f(X) = X is 1/N, i.e. the
probability that X is a fix point is 1/N. There are N possible choices of X,
and so by linearity of expectation the expected number of fix points is N *
1/N = 1.

This doesn't tell you anything about concentration bounds or whatever, but
it's a neat fact nonetheless.

~~~
KMag
Great to know! Upvoted.

Unfortunately, it's a random mapping, not necessarily a random permutation. An
ideal block cipher would be modeled as a random permutation. Though, in this
particular case the domain and range are the same, so unless I'm missing
something, the expected number of fixed points comes out to 1 by the same
reasoning.

~~~
nhaehnle
Indeed, the expected number of fix points is the same for both permutations
and mappings. Quite curious when you think about it, because the distribution
of fix points is obviously different! (Consider the case n = 2 for the
simplest example: there are mappings with exactly 1 fix point, but every
permutation has either 0 or 2 fix points).

Note though that a correct block cipher is necessarily a permutation, because
it's invertible (by definition, a permutation is just an invertible mapping
with domain and range equal). A hash function on the other hand needn't be a
permutation even when you restrict the domain to inputs of the same bit length
as the hash output.
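
Both claims (same expectation, different distribution) are easy to check by brute force for small n. A quick sketch, where `fixed_point_stats` is just a helper name chosen for this example:

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product


def fixed_point_stats(functions, n):
    """Return (expected fixed points, distribution of fixed-point counts)."""
    counts = Counter(sum(f[x] == x for x in range(n)) for f in functions)
    total = sum(counts.values())
    mean = Fraction(sum(k * c for k, c in counts.items()), total)
    return mean, dict(sorted(counts.items()))


for n in (2, 3, 4):
    perms = list(permutations(range(n)))       # invertible mappings only
    maps = list(product(range(n), repeat=n))   # all n**n mappings
    print(n, fixed_point_stats(perms, n), fixed_point_stats(maps, n))
```

Each function is represented as a tuple `f` where `f[x]` is the image of `x`. The mean is 1 in every case, while the count distributions differ between permutations and general mappings.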

------
filearts
What I didn't see answered in the article was how content is discovered.

The only way we are able to productively use git is because there is a
convention to have some state in a non content-addressable location
(.git/refs, .git/HEAD, etc...).

Saying that IPFS could replace the web means either: 1) Introducing shared
mutable state; or 2) full knowledge of everything on the network.

I'm guessing that the existing web is what provides that layer right now. Is
there any work going on for novel IPFS-based content discovery mechanisms?

Another thought: Given the content-addressable, immutable nature of this
graph, how does one discover that a new version of something is available
without a central authority? How could we discover the tip of a blockchain
with IPFS alone?
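
To illustrate the split being described, here is a toy sketch with immutable, hash-addressed content on one side and a small table of mutable names on the other, in the spirit of git's refs. The `ContentStore` class and its method names are hypothetical and do not correspond to any IPFS or git API:

```python
import hashlib


class ContentStore:
    """Toy content-addressed store plus a separate table of mutable names."""

    def __init__(self):
        self.blobs = {}  # hash -> bytes, never changes once written
        self.refs = {}   # name -> hash, the only mutable state

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blobs[key] = data
        return key

    def publish(self, name: str, data: bytes) -> str:
        """Store a new version and move the named pointer to it."""
        key = self.put(data)
        self.refs[name] = key
        return key

    def resolve(self, name: str) -> bytes:
        return self.blobs[self.refs[name]]


store = ContentStore()
v1 = store.publish("blog/latest", b"first post")
v2 = store.publish("blog/latest", b"second post")
print(store.resolve("blog/latest"))  # the name follows the newest version
print(store.blobs[v1])               # the old version is still addressable
```

The open question in a decentralized setting is who gets to update `refs` and how everyone else learns about the update; in this sketch that part is simply a local dictionary.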

~~~
dsp1234
IPNS

[https://groups.google.com/forum/#!topic/ipfs-users/fr6dlQ8we...](https://groups.google.com/forum/#!topic/ipfs-users/fr6dlQ8we7Q)

------
Sami_Lehtinen
Just pointing out that GNUnet and Freenet both offer a pretty similar
feature set. I've studied both extensively, and after checking out IPFS, I
don't get what's new, except all the 'hype' around it, which is generally
something that I, as a tech nerd, dislike. Another problem with distributed
solutions is often performance: some tasks just become surprisingly expensive.

~~~
_prometheus
Have you tried using all of them and compared them for yourself?

git, hg, monotone, ..., all offer(ed) similar feature sets -- dvcs. yet
they're _very_ different tools.

------
beagle3
"Private", as opposed to the "public" IPFS, but essentially the same ideas:
[https://camlistore.org/](https://camlistore.org/) (from Brad Fitzpatrick of
LiveJournal fame)

~~~
XorNot
The problem with this and other initiatives is they don't really have a good
story for what happens with a lot of data-lint regular people have. They make
broad assumptions that we have near limitless storage resources (including
those needed for redundancy) when private users definitely don't have that and
even at enterprise levels, the story is still fairly complicated.

Immutable is an interesting idea - it's a lot less interesting when 100
different copies of the same slightly changed RAW file from my digital camera
are using up 100s of gigabytes. Or I misclick and something goes into the
public store which shouldn't - I might not be able to get rid of all of it,
but I should be able to undo it a little.

~~~
rakoo
If you look a little bit closer into camlistore, you'll see that it splits
content with content-defined chunking, i.e. when there's a change in the middle
of the file only this zone (camlistore targets 8kB) is _really_ added to
camlistore; there's an indirection between the file as an object and the
actual content. That means that hundreds (or even thousands) of slight changes
to the same big file won't take up much additional space.
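
A rough sketch of content-defined chunking in general, assuming a simple Gear-style rolling hash with roughly 64-byte chunks so the effect is visible on a small input. The `GEAR` table, the constants, and the function names are invented for this example; camlistore's actual chunker and its 8kB target are not reproduced here:

```python
import hashlib
import random

AVG_BITS = 6                  # cut when the low 6 bits are zero: ~64-byte chunks
MASK = (1 << AVG_BITS) - 1
# Hypothetical per-byte table for a Gear-style rolling hash.
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:4], "big")
        for b in range(256)]


def chunks(data: bytes) -> list[bytes]:
    """Split data wherever the rolling hash of the most recent bytes hits a pattern."""
    out, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF  # old bytes shift out of the hash
        if h & MASK == 0:
            out.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        out.append(data[start:])  # trailing bytes after the last cut point
    return out


def chunk_ids(data: bytes) -> set[str]:
    """Content addresses of all chunks; identical chunks share one id."""
    return {hashlib.sha256(c).hexdigest() for c in chunks(data)}


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(20_000))
edited = original[:10_000] + b"a small edit" + original[10_000:]

before, after = chunk_ids(original), chunk_ids(edited)
print(len(after), "chunks after the edit,", len(after - before), "of them new")
```

Because cut points depend only on the bytes just before them, the chunk boundaries realign shortly after the inserted bytes, so only the chunks around the edit get new ids and everything else is shared with the previous version.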

Regarding misclicks, for camlistore everything is private by default; in order
to make something public you have to build an authorization, which you give to
someone. You also have the possibility to remove that authorization, and the
data won't be publicly accessible anymore.

~~~
XorNot
That doesn't address the issue: any compressed content (and raw files are
losslessly compressed) tends to really break rolling hash type splitting
systems.

------
_prometheus
Thanks very much to Christian and John for writing a much needed detailed
article :)

Some more links for people to check out:

## (upcoming) IPLD "merkleized JSON" format:

\- improves upon our basic format to make it much more pleasant to build
things on top of ipfs.

\- JSON meets CBOR meets Merkle-linking

\- mini-spec:
[https://github.com/ipfs/specs/blob/master/merkledag/ipld.md](https://github.com/ipfs/specs/blob/master/merkledag/ipld.md)

## answers to some common questions i've read on this page:

\- content model / replication:
[https://github.com/ipfs/faq/issues/47](https://github.com/ipfs/faq/issues/47)

\- how resolution works:
[https://github.com/ipfs/faq/issues/48#issuecomment-152917088](https://github.com/ipfs/faq/issues/48#issuecomment-152917088)

\- how IPNS / mutable linking works:
[https://github.com/ipfs/faq/issues/16](https://github.com/ipfs/faq/issues/16)

\- this is a very poor answer, sorry, i'll write up a post or paper on it.

\- for now if interested, see the QCon slides below, specifically slides ~110
to ~130 -- the DNS, IPRS, SFS/Mazieres linking, IPNS parts.

## These repos have interesting "lab notebook" style discussions:

\-
[https://github.com/ipfs/notes/issues](https://github.com/ipfs/notes/issues)

\- [https://github.com/ipfs/apps/issues](https://github.com/ipfs/apps/issues)

## deep dive talk at stanford:

\- video:
[https://www.youtube.com/watch?v=HUVmypx9HGI](https://www.youtube.com/watch?v=HUVmypx9HGI)

\- slides: lmk if you want them, i'll pdf them up

## talk at ethereum's devcon1 covering blockchain uses

\- video:
[https://www.youtube.com/watch?v=ewpIi1y_KDc](https://www.youtube.com/watch?v=ewpIi1y_KDc)

\- slides (interesting bits start at slide ~70):
[https://ipfs.io/ipfs/QmUgRq7QfmRbPw5kXqwSs1TRtPDBXMoDNiYwJQg...](https://ipfs.io/ipfs/QmUgRq7QfmRbPw5kXqwSs1TRtPDBXMoDNiYwJQgQ1kodNY/ipfs-017.ethereum-devcon1.compressed.pdf)

## talk at qcon sf (similar to above)

\- in this talk i discuss a bunch of datastructure stuff, including using IPFS
for PKI, for arbitrary dns-like records, for name systems, for CRDTs, and so
on.

\- unfortunately the video will only be released in march:
[https://qconsf.com/video-schedule](https://qconsf.com/video-schedule)

\- slides (interesting bits start at slide 80):
[https://ipfs.io/ipfs/QmPpYmdSEKspjgXxVyGK9UMHV54fKZS8MwJjppg...](https://ipfs.io/ipfs/QmPpYmdSEKspjgXxVyGK9UMHV54fKZS8MwJjppgeyNsoE1/ipfs-018.qconsf.compressed.pdf)

------
symlinkk
Probably a dumb question but how does this compare to
[http://storj.io/](http://storj.io/)?

~~~
Qwertie
Sorta like torrents vs dropbox.

storj is for paying people to store your personal data and ipfs is more for
public data.

Of course you can use ipfs for private data too

------
nickysielicki
The graph to describe the directory is a misprint, right?

"testing 123\n" isn't anywhere, and "Hello World" (and its hash) is pictured
twice. I'm sure that the testing.txt arrow should just be pointing to a node
with a different hash and content.

