
Filecoin: Proof of Storage Systems - simonebrunozzi
https://blog.coinlist.co/deep-dive-into-filecoin/
======
filleokus
I don't really get the point of decentralised _storage_ solutions, where the
stated goal is to compete with ≈ S3 [0].

S3 storage pricing is, for most users, just a rounding error compared to e.g.
egress or compute cost. And it has proven to be tremendously reliable, both in
durability and access. No sane person would point a CDN against a Filecoin
backed origin, right?

For kinda-censorship resistant distribution of stuff like WikiLeaks dumps we
already have torrents right now, where the seeders are aware of the content
they spread.

For uses where S3 is not competitive, like storage of my media rips, spinning
rust at places like Hetzner exists, or Backblaze et al. for more casual backup
uses.

The only point I can see for decentralised "storage" of this kind is for ultra
high durability, small data size, almost never read, almost some sort of
"stakes". One could imagine something like storing contracts/land deeds,
similar to Gwern's timestamping of URLs [1]. The files should be small, the
storage model optimised for maximal durability (every node stores almost
everything?), and prices high enough to ensure durability.

[0]: Just look at this PR page
[https://filecoin.io/store/](https://filecoin.io/store/)

[1]: [https://www.gwern.net/Timestamping](https://www.gwern.net/Timestamping)

~~~
Dylan16807
I understand your overall point, but when it comes to

> S3 storage pricing is, for most users, just a rounding error compared to e.g.
> egress or compute cost. And it has proven to be tremendously reliable, both
> in durability and access.

It's not like you can separate S3 storage costs from egress costs, though. I
_wish_ I could use S3 for storage but pay for my own bandwidth.

Even if you go all the way to their network and plug a network cable in,
well... They charge $7.40 a day for a gigabit port, and $54 a day for 10gig.
That's on par with what many companies charge for transit to the actual
internet, but fair enough. Oh, wait, I neglected the per-byte cost on top. The
per byte cost that, for a mostly saturated port, is $210 or $2100 per day
respectively.

And that's the cheap option.
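
A quick back-of-the-envelope check of those figures, as a rough sketch. It assumes a fully saturated port, decimal gigabytes, and takes the daily dollar amounts straight from the comment above; the per-GB rate it prints is only what those numbers imply, not a quoted price list.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def gb_per_day(port_gbit_per_s, utilisation=1.0):
    """Decimal gigabytes a port moves in one day at the given utilisation."""
    return port_gbit_per_s * utilisation * SECONDS_PER_DAY / 8


def implied_rate_per_gb(daily_transfer_cost, port_gbit_per_s):
    """Per-GB rate implied by a daily transfer bill on a fully saturated port."""
    return daily_transfer_cost / gb_per_day(port_gbit_per_s)


# (port speed in Gbit/s, port fee per day, per-byte charges per day), as quoted above
QUOTED = [(1, 7.40, 210.0), (10, 54.0, 2100.0)]

for speed, port_fee, transfer in QUOTED:
    rate = implied_rate_per_gb(transfer, speed)
    share = port_fee / (port_fee + transfer)
    print(f"{speed:>2} Gbit/s: {gb_per_day(speed):,.0f} GB/day, "
          f"~${rate:.4f}/GB implied, port fee is {share:.1%} of the daily total")
```

Under these assumptions the port fee is a small slice of the daily bill, which is the point being made: the per-byte charge dominates.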

~~~
filleokus
Yeah, it's true, it's not like egress is a completely separate thing from
storage.

But I guess it also depends a bit on your use case? E.g. if you upload big
datasets for heavy computation and then only download small results (≈ machine
learning) or have a cheap(er) CDN in front of S3 with a favourable usage
pattern, the storage and egress would be somewhat separate.

Still, something like Filecoin is obviously not a reasonable solution for the
run of the mill "cheaper than S3 file hosting and network" problem?

(I've been on the lookout for something like that, for personal use where I
don't need all those nines and other S3 benefits. The best thing yet seems to
be Wasabi [0])

[0]: [https://wasabi.com/cloud-storage-pricing/#three-info](https://wasabi.com/cloud-storage-pricing/#three-info)

------
zepearl
General question:

When somebody pushes some "bad" files into Filecoin (anything, from docs
needed by the mafia to children being sexually abused, to fake
videos/claims/whatever about you, to any other bad things you can think of),
how can that be detected and then deleted? (I'm ignoring the question of
accountability here.)

Question based on
[https://docs.filecoin.io/introduction/why-filecoin/](https://docs.filecoin.io/introduction/why-filecoin/):

 _Filecoin resists censorship because there is no central provider that can be
coerced into deleting files or withholding service. The network is made up of
many different computers run by many different people and organizations.
Faulty or malicious actors are noticed by the network and removed
automatically._

In this context I'm not a believer in "automatically".

~~~
simonebrunozzi
I think this is a serious problem - but probably not for the exact use case
you list (mafia, sexually abused kids).

It should be solved by allowing an "expensive" (a) "vote" (b) to override, or
delete, or hide for a number of years.

(a): expensive should discourage the option in general

(b): proof-of-stake, proof-of-ownership, proof-of-work are some of the many
ways in which you can define voting rights.

Unfortunately, it doesn't work easily in practice. Example:

You created a file (say, a CV, or a newspaper article in PDF) to discredit me,
but it's a lie. To me, not having that file publicly accessible would be worth
$1,000.

You could blackmail me - give me $1,000 or I'll upload the file! But I think
you might do it anyway, so I don't negotiate (never negotiate!!). The file is
1 MB, and it costs $1/year to keep it up and running on Filecoin.

I should have an option to ask a voting pool to remove that file for $100. The
pool agrees, and the money gets burned (I shouldn't create any incentive for
someone to cash in the amount).

However, you can upload the file again; or a slightly different version, and
we are back to square one.

It then goes to the "discoverability" of the file. That part is probably going
to be decentralized too, so no court could "order" Filecoin to make a file not
discoverable.

How do we solve it? I think there will emerge a search system that provides
services to censor, obfuscate, etc, a number of files, based on reasonable
requests. Whoever builds that company, I bet it will not be poor.

~~~
zepearl
Thanks a lot for your brainstorming - some remarks are interesting.

But overall the root problem would still exist and the solutions would have to
gather a majority - e.g. asking a "pool" to remove a video would mean for the
"pool" to watch that video (to ensure that the claim is valid) and after the
first experience (if it's child porn) we would all probably have to call a
psychiatrist to help us overcome what we've seen => not feasible, it would
actually destroy the pool's mental stability which is, in this concept, needed
to evaluate the contents (videos in this example).

Edit:

Being ignored by some search engine would still be a no-go for me. Knowing
that some personal data (general files, pics, videos, whatever) exists
somewhere would make me feel absolutely NOK.

------
simonebrunozzi
> We created Filecoin because the amount of data humans generate is
> exponentially increasing, and we need more efficient ways to store and
> access it.

I don't think Filecoin can be MORE efficient than some of the larger, current
systems.

I actually think this statement is misleading. It should say that Filecoin's
goal is to offer storage options that are distributed (hence redundant),
protected from censorship, and possibly removed from absolute control of a
single large corporation.

~~~
rudolph9
There are a lot of nuanced ways a trustless distributed system can be more
efficient. The article is pretty light on details but consider the content
addressing mechanism. The data can be securely served by any device and
independently verified by any recipient. The minimum latency of reads is
theoretically much lower than the server/client infrastructure commonly used
today. It’s like edge computing with any device that can connect to one
another. Obviously a lot goes into ensuring the data exists on a low latency
connection to the distributed-client but there are a number of somewhat
nuanced theoretical efficiencies that can be achieved by making content
addressing the core of a distributed system (caching, [as mentioned]
fault-tolerance, parallel computing, etc).

~~~
acdha
> The minimum latency of reads is theoretically much lower than the
> server/client infrastructure commonly used today.

This is an oversimplification: it assumes that discovery is very low cost and
that there's a peer with a copy enough closer on the network that it's faster
than talking to server-class hardware in a data center with a high quality
network connection. Given the number of assumptions which need to be true for
that to be a net-positive I'm skeptical that it'd be easy to hit anywhere
close to the best-case theoretical scenario.

~~~
rudolph9
> Obviously a lot goes into ensuring the data exists on a low latency
> connection to the distributed-client

This is admittedly a non-trivial task but perhaps an example can shed light on
a way it can be more efficient in practice.

Consider the Apple Photos app. It's on your phone, it's on your desktop, it's
on the cloud (iCloud storage). Say you have so many photos they don't fit on
your phone. When you try to access a photo that is not on your phone it gets
it from iCloud (these are literally features that are currently offered by
Photos/iCloud). Now suppose your desktop has lots of storage and you usually
look at your photos while connected to your local network: why can't you get
the data from there?

The key to the efficiency of Filecoin is not some magic cryptographic token,
it's not even "blockchain", the key is the underlying protocols being unified
for synchronizing devices across multiple tiers of networking. Filecoin mostly
just makes sure the data doesn't disappear and gives you an optional
substitute for cloud storage. Content addressing, p2p communication by
default, flexible/future-proof standards for negotiating what data is and how
it is formatted, and modular plug-and-play utilization of many different
communication/synchronization techniques are the key to applications being
built in a more effective and efficient way than conventional client/server
architecture.

The word that comes to mind is "flexibility". Want your store to be S3 buckets?
Great! Want to host a private network only your devices can access? Great!
Want to join the public network? Great! Want to ensure persistence of data
via a token? Great! Want to just use the address mechanism? Great! Want to run
the software in the cloud, on your local machine, in a sandboxed browser tab
and have them all seamlessly forming a p2p network over a multitude of network
standards (tcp, udp, web-sockets, etc)? Great!

I worked extensively last year with various ipfs techs utilized in a private
p2p network consisting of ARM nodes capturing data from time-series
measurement devices and cloud based persistent nodes managing the pinset of
data and persisting long-term storage. There are definitely some rough edges
but again it's open-source and built in such a modular way it's almost
difficult to develop yourself into a corner.

The software coming out of Protocol Labs is overwhelmingly not new technology.
Protocol Labs largely focuses on utilization of battle-tested existing
standards and tech (many of which are relatively ancient like TCP, ssh, git,
json, etc) unified, without breaking existing standards, via a common
addressing mechanism.

tl;dr Filecoin/crypto-tokens are just a side note! The vast majority of software
coming out of protocol labs is independently useful and largely focuses on
bridging together widely-used battle-tested tech.

~~~
acdha
> Consider the Apple Photos app. It's on your phone, it's on your desktop,
> it's on the cloud (iCloud storage). Say you have so many photos they don't
> fit on your phone. When you try to access a photo that is not on your phone
> it gets it from icloud (these are literally features that are currently
> offered by photos/iCloud). Now suppose your desktop has lots of storage, you
> usually look at your photos while connected to your local network, why cant
> you get the data from there?

Yes, what Dropbox, Crashplan, etc. offered a decade ago — it's a neat sounding
idea but there are two reasons why this isn't a huge win in practice: people
are mobile so the number of times where you need an uncached file and happen
to be on the same network is relatively low and increasingly few people have
an always-on device with a ton of local storage free (phones are really good
at generating high volumes of data so this is a non-trivial problem).

Working on things like this is really interesting but it involves a lot of
work to handle unreliable clients or networks and bitrot (hashes don't solve
this if your client helpfully replicates the bad sector from your desktop over
the pristine copy on your phone). That overhead makes it a lot harder to beat
conventional services, especially in cases where the time investment is
greater than the possible savings.

~~~
rudolph9
> Working on things like this is really interesting but it involves a lot of
> work to handle unreliable clients or networks and bitrot (hashes don't solve
> this if your client helpfully replicates the bad sector from your desktop
> over the pristine copy on your phone). That overhead makes it a lot harder
> to beat conventional services, especially in cases where the time investment
> is greater than the possible savings.

The idea that ipfs/related projects are trying to beat conventional services
is a big misconception. The overwhelming majority of the tech does not preclude
usage in conventional services. For example, IPFS can 100% be configured as
server/client offering roughly the same costs/reliability many are used to.
The advantage is interoperability with many different "services", conventional
and alternative (e.g. filecoin) alike.

~~~
acdha
Can you expand that interoperability thought? It’s unclear to me how this
could change the main obstruction of vendors choosing not to encourage
interoperability - anyone who keeps their API locked down is unlikely to adopt
standard IPFS.

~~~
rudolph9
Essentially it's a separation of concerns. The content "address" (i.e. how we
specify the content that is desired) and the content "location" (i.e. the
connections which serve the content associated with an "address") are
distinct.
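
A minimal sketch of that separation, under stated assumptions: the address here is just a hex SHA-256 digest (real IPFS CIDs carry extra prefix bytes that this ignores), and `Provider` and `fetch` are hypothetical names made up for illustration, not IPFS APIs. The point it tries to show is that any location can serve the bytes, because the recipient can check them against the address.

```python
import hashlib


def address_of(data: bytes) -> str:
    """Content address: here simply the hex SHA-256 digest of the bytes."""
    return hashlib.sha256(data).hexdigest()


class Provider:
    """Hypothetical location that may hold a copy (phone, desktop, cloud node)."""

    def __init__(self, name: str):
        self.name = name
        self.blobs = {}

    def put(self, data: bytes) -> str:
        addr = address_of(data)
        self.blobs[addr] = data
        return addr

    def get(self, addr: str):
        return self.blobs.get(addr)


def fetch(addr: str, providers):
    """Try each location in turn and accept only bytes that hash back to addr."""
    for provider in providers:
        data = provider.get(addr)
        if data is not None and address_of(data) == addr:
            return data, provider.name
    raise KeyError(f"no provider returned valid data for {addr}")


desktop, cloud = Provider("desktop"), Provider("cloud")
photo = b"holiday photo bytes"
addr = cloud.put(photo)
desktop.blobs[addr] = b"corrupted copy"  # a bad replica filed under the same address

data, source = fetch(addr, [desktop, cloud])
print(source)  # the corrupted desktop copy is rejected, so this prints "cloud"
```

The caller never names a server, only the address; which location answers is a routing detail.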

An "address" is also self-validating. Hash addressing, where a hash of the
data is the address of the data, is used to accomplish this. This is useful in
many contexts and an [extra few bits][2] on the front allow an address to
represent much much more.

"Location" is by default resolved through a distributed hash table, enabling
fully distributed routing of content, which uses [Kademlia][1]. But the
software could easily be configured with fixed values for the hash table,
mapping any "address" to the same location, which happens to be a cloud
provider.

Could you elaborate on what you mean by "anyone who keeps their API locked
down is unlikely to adopt standard IPFS."?

[1]: [https://en.wikipedia.org/wiki/Kademlia](https://en.wikipedia.org/wiki/Kademlia)

[2]: [https://github.com/multiformats/cid#how-does-it-work](https://github.com/multiformats/cid#how-does-it-work)
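
For the "location" half, a rough sketch of the Kademlia idea: node IDs and content keys live in the same ID space, and a key is looked up at the nodes whose IDs have the smallest XOR distance to it. This is only the distance metric, not a DHT implementation. The node names, the use of SHA-256 for IDs, and `fixed_location` are assumptions for illustration.

```python
import hashlib


def to_id(raw: bytes) -> int:
    """Map arbitrary bytes into the shared ID space (SHA-256 here, by assumption)."""
    return int.from_bytes(hashlib.sha256(raw).digest(), "big")


def node_id(name: str) -> int:
    return to_id(name.encode())


def xor_distance(a: int, b: int) -> int:
    """Kademlia's metric: the XOR of two IDs, read as an unsigned integer."""
    return a ^ b


def closest_nodes(key: int, nodes, k=2):
    """The k nodes whose IDs are nearest to the key under XOR distance."""
    return sorted(nodes, key=lambda n: xor_distance(node_id(n), key))[:k]


def fixed_location(key: int) -> str:
    """The 'fixed values' configuration: every address maps to one location."""
    return "cloud-provider"  # hypothetical name


nodes = ["phone", "desktop", "cloud-a", "cloud-b"]
key = to_id(b"holiday photo bytes")
print(closest_nodes(key, nodes))
print(fixed_location(key))
```

Swapping `closest_nodes` for `fixed_location` changes where the data is fetched from without changing the address used to ask for it.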

------
cgb223
How is Filecoin different from Sia coin?

I remember a while ago hearing about both trying to decentralize storage but
never kept up with it enough to really suss out the difference

~~~
Sargos
Sia seems like a Dropbox/OneDrive replacement whereas Filecoin is an
incentivized storage layer infrastructure for apps to use, especially dapps.

If you wanted to build a decentralized SoundCloud you couldn't use Sia but
IPFS/Filecoin is the primary building block of such a dapp.

~~~
olah_1
Sia released a new version of their product called Skynet[1] which provides
exactly the functionality of IPFS+Filecoin today. Part of Skynet is a group of
SDKs[2] which you can use to integrate skynet storage as part of your
application. I believe that they are even adding the ability for developers to
get kickbacks if they build on Sia[3].

If you do want something like Dropbox that is built on Sia, there is
Filebase[4].

I am not associated with Sia at all. I've just been following their twitter
for a while.

[1]: [https://siasky.net/](https://siasky.net/)

[2]: [https://nebulouslabs.github.io/skynet-docs/#introduction](https://nebulouslabs.github.io/skynet-docs/#introduction)

[3]:
[https://twitter.com/SiaTechHQ/status/1291441690008592384](https://twitter.com/SiaTechHQ/status/1291441690008592384)

[4]: [https://docs.filebase.com/](https://docs.filebase.com/)

~~~
xur17
I'm curious to look at Filecoin - last time I tried to use Sia, it was still
fairly rough around the edges (minimum file size, seed based recovery isn't
automatic, etc).

------
mifeng
This article seems like it was written in 2017, not 2020.

Filecoin raised over $200m and hasn't shipped anything that I'm aware of in 3
years. Maybe it will be revolutionary, but the crypto industry has long moved
on from white whale projects that don't ship.

~~~
wmf
[https://github.com/filecoin-project/lotus](https://github.com/filecoin-project/lotus)

[https://docs.filecoin.io/how-to/install-filecoin/](https://docs.filecoin.io/how-to/install-filecoin/)

~~~
mifeng
This is what I'm talking about. It's been 3 years, and they're still doing
testnets? What a joke.

------
opportune
One thing I'd like to see decentralized storage protocols guarantee is not
only that the data exists on the backend, but that it is available enough to
be useful, with good latency and bandwidth. PoST might be able to
solve this if it enforces a strict deadline on the answer and tests
availability sufficiently frequently.
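
A toy sketch of what a deadline-bound availability check could look like. This is a simple hash challenge, not Filecoin's actual PoSt construction. It assumes the auditor precomputes a few nonce/answer pairs before handing the data over; `make_challenges`, `respond` and `audit` are hypothetical names.

```python
import hashlib
import os
import time


def respond(data: bytes, nonce: bytes) -> str:
    """Answer a challenge; only computable by someone holding the full data."""
    return hashlib.sha256(nonce + data).hexdigest()


def make_challenges(data: bytes, count: int):
    """Precompute (nonce, expected answer) pairs while the auditor still has the data."""
    nonces = [os.urandom(16) for _ in range(count)]
    return [(nonce, respond(data, nonce)) for nonce in nonces]


def audit(prover, nonce: bytes, expected: str, deadline_s: float) -> bool:
    """Pass only if the answer is correct AND arrives within the deadline."""
    start = time.monotonic()
    answer = prover(nonce)
    elapsed = time.monotonic() - start
    return answer == expected and elapsed <= deadline_s


data = b"x" * 1024
challenges = make_challenges(data, 3)


def honest(nonce):
    return respond(data, nonce)


def slow(nonce):
    time.sleep(0.05)  # stands in for data kept on slow, cold storage
    return respond(data, nonce)


nonce, expected = challenges[0]
print(audit(honest, nonce, expected, deadline_s=1.0))
print(audit(slow, nonce, expected, deadline_s=0.01))
```

Each precomputed pair can only be used once, and every audit forces the provider to read the data, which is where the cost of frequent checks comes from.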

The other is some sort of authentication. Encryption of the data itself, IMO,
is not good enough. In 10 years time I don't want a nation state or
megacorporation with enough entangled qubits to be able to pwn all of my data.
But distributed authentication as I understand it is a hard problem, because
you probably need to make your authentication proof public, which creates a
similar set of problems on the authentication itself.

~~~
foota
Two thoughts:

One is that frequently verifying availability is going to get expensive quick
with iops for coldish storage systems.

Secondly, iirc symmetric encryption isn't really vulnerable to quantum
algorithms, as far as we know (and maybe even proven not to be?). It's
asymmetric crypto that's at risk.
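
The usual rule of thumb behind that, as a tiny sketch: Grover's search gives at best a quadratic speedup on brute-forcing a key, so an n-bit symmetric key keeps roughly n/2 bits of security, whereas Shor's algorithm breaks factoring- and discrete-log-based schemes outright. The function name is made up, and the halving is an approximation that ignores the practical overhead of running Grover.

```python
def grover_effective_bits(key_bits: int) -> int:
    """Approximate security of a symmetric key against Grover's search.

    Brute force needs about 2**key_bits trials classically and about
    2**(key_bits / 2) quantum evaluations, so the effective strength halves.
    """
    return key_bits // 2


for bits in (128, 256):
    print(f"{bits}-bit key: roughly {grover_effective_bits(bits)}-bit security against Grover")
```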

------
mdaniel
That is not what matches my mental model of a "deep dive," but was interesting
enough to warrant an upvote

For example, the phrase "decentralized marketplace of storage providers and
choose the one" was just glossed over -- through what mechanism does one
choose a storage provider? Is there an API that (for example) a container
storage interface provider could consume? like that kind of "choose a
provider"?

I also would have enjoyed more hyperlinks for terms such as zk-SNARK, and "a
storage miner" which is similarly glossed over

~~~
dang
Ok, we'll swap a prefix for a suffix of the title above.

(Submitted title was "Deep Dive into Filecoin".)

~~~
simonebrunozzi
Good call. I should have thought of a better (submission) title in the first
place.

------
ironchief
This hardly meets the definition of a brief description, let alone a "Deep
Dive".

~~~
hacker_newz
That's par for the course working with filecoin.

------
mdaniel
By chance, I saw a relevant article over on r/ipfs:
[https://www.axios.com/filecoin-blockchain-delay-3b5e6b9a-bcc...](https://www.axios.com/filecoin-blockchain-delay-3b5e6b9a-bcc8-41cf-81cf-563f6cebb2c4.html)

~~~
wmf
Discussed yesterday:
[https://news.ycombinator.com/item?id=24199595](https://news.ycombinator.com/item?id=24199595)

