Opening the Filecoin Project Repos (filecoin.io)
162 points by bencevans on Feb 14, 2019 | 54 comments

I'd love to finally see a writeup of the "proof of spacetime" storage. Understanding that proof seems critical to understanding what use cases FileCoin will and won't work for.

Edit for clarity: This is the closest I was able to find, and is short on details. https://github.com/filecoin-project/specs/blob/master/proofs...

The proof of storage is even trickier.

A multi-account attack (Sybil?) can just store the file once and pretend to store it N times, inflating the cost (attacker revenue) N-fold. I can't see how this is avoidable other than being an authority with insight into most of the internet traffic structure. ...Maybe it is designed to only work within China?

Last question was a joke, but note that every single multi-account protection requires a breach of anonymity. That's why most modern distributed protocols try to completely work around any benefit of having multiple accounts. Storage, by definition, can't.

The proof of replication provides assurance that for some given data D, a unique replica R is created for it. What you’re proving with the Proof of SpaceTime isn’t that you have some data, you’re proving that you have the replicas. Generating the replica is slow, so you can’t do it on the fly. I gave a talk recently that goes over some of this briefly: https://youtu.be/GZZ2G9bPXsM
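To make "unique replica" and "slow to generate" concrete, here is a toy sketch. It is not Filecoin's actual sealing construction (the linked talk and papers cover that); `seal`, `unseal`, and the `rounds` parameter are hypothetical names chosen for illustration. The only point is that the same data D encodes to a different replica R per prover identity, and that encoding is sequential because each block's mask depends on the previous encoded block.

```python
import hashlib

BLOCK = 32  # bytes per block, matching the SHA-256 digest size


def _mask(prover_id: bytes, prev_encoded: bytes, rounds: int) -> bytes:
    """Derive a mask by repeated hashing; `rounds` is a toy knob for slowness."""
    digest = hashlib.sha256(prover_id + prev_encoded).digest()
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest


def seal(data: bytes, prover_id: bytes, rounds: int = 1000) -> bytes:
    """Encode `data` into a replica tied to `prover_id` (hypothetical scheme).

    Block i cannot be encoded until block i-1 has been encoded, so the
    work cannot be spread across parallel workers.
    """
    out = bytearray()
    prev = b""
    for i in range(0, len(data), BLOCK):
        mask = _mask(prover_id, prev, rounds)
        enc = bytes(a ^ b for a, b in zip(data[i:i + BLOCK], mask))
        out += enc
        prev = enc
    return bytes(out)


def unseal(replica: bytes, prover_id: bytes, rounds: int = 1000) -> bytes:
    """Recover the original data from a replica produced by `seal`."""
    out = bytearray()
    prev = b""
    for i in range(0, len(replica), BLOCK):
        enc = replica[i:i + BLOCK]
        mask = _mask(prover_id, prev, rounds)
        out += bytes(a ^ b for a, b in zip(enc, mask))
        prev = enc
    return bytes(out)


if __name__ == "__main__":
    d = b"the same file, stored by two miners" * 4
    r1 = seal(d, b"miner-1")
    r2 = seal(d, b"miner-2")
    print(r1 != r2, unseal(r1, b"miner-1") == d, unseal(r2, b"miner-2") == d)
```

A miner claiming two identities would need two different byte strings on disk, or would have to re-run `seal` whenever challenged.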

Thanks for linking your talk! Just started watching it now. Is that Dan Boneh in the front?

Yep! He gave the talk just before me

You talk about it for 15s in that video. I will try to read the secondary paper on this... but it doesn't look like it can protect against someone just generating hashes for multiple accounts from the same stored data.

P and (the fake) P2 can both reply to V with perfectly normalized storage.

But regardless of whether the protection works against bad actors, it definitely "works" against good actors if you take storage prices into account. E.g. P1 and P2 are two distinct good actors. They both pay cloudProviderA for storage. P1 and P2 bid on the same filecoin storage contract. They store the data independently of each other, but cloudProviderA normalizes the storage (with its own backup and consistency solutions). With this protection scheme, since the cloud provider can't normalize, the cost will eventually go up because its normalization margin is erased. Over time, the data storage cost for good actors will go up, while for bad actors (assuming they are possible) the profit will increase.
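A small sketch of that cost argument, reading "normalize" as "deduplicate" (that reading is an assumption, as are `DedupStore` and `toy_replica`, which are hypothetical stand-ins and not anything from Filecoin or a real cloud provider). A content-addressed store keeps two identical uploads once, but two per-prover encodings of the same file share nothing it can collapse.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical blobs are kept only once."""

    def __init__(self):
        self.blobs = {}

    def put(self, blob: bytes) -> str:
        key = hashlib.sha256(blob).hexdigest()
        self.blobs[key] = blob  # same content -> same key -> no extra space
        return key

    def bytes_stored(self) -> int:
        return sum(len(b) for b in self.blobs.values())


def toy_replica(data: bytes, prover_id: bytes) -> bytes:
    """Hypothetical per-prover encoding: XOR with a SHA-256 counter keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(prover_id + offset.to_bytes(8, "big")).digest()
        out += bytes(a ^ b for a, b in zip(data[offset:offset + 32], pad))
    return bytes(out)


if __name__ == "__main__":
    data = b"same file " * 1000

    raw = DedupStore()
    raw.put(data)  # uploaded by P1
    raw.put(data)  # uploaded by P2, deduplicated away

    sealed = DedupStore()
    sealed.put(toy_replica(data, b"P1"))
    sealed.put(toy_replica(data, b"P2"))

    print(raw.bytes_stored(), sealed.bytes_stored())
```

With raw copies the provider pays for one copy and bills two customers; with per-prover replicas it pays for both.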

To generate a response to the verifier, you have to have your replica of the data in question. A replica (from proof of replication) is a unique encoding of the data that is quite slow to generate. If the prover isn't storing the replica, they will have to regenerate it before they can respond to the challenge, which takes a noticeable amount of time.
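As a rough illustration of the challenge/response shape (a generic Merkle spot-check with a deadline, not Filecoin's actual PoSt, which this thread describes elsewhere as a zero-knowledge proof): the verifier keeps only a root hash of the replica, asks for a random chunk, and accepts if the chunk plus its Merkle path hash back to the root within the time limit. The function names and the `deadline_s` parameter are assumptions for this sketch.

```python
import hashlib
import secrets
import time

LEAF = 32  # bytes of replica per Merkle leaf (arbitrary toy value)


def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()


def build_tree(replica: bytes):
    """Return Merkle levels, leaves first; odd levels duplicate their last node."""
    level = [h(replica[i:i + LEAF]) for i in range(0, len(replica), LEAF)]
    levels = []
    while True:
        if len(level) > 1 and len(level) % 2:
            level = level + [level[-1]]  # toy padding rule
        levels.append(level)
        if len(level) == 1:
            return levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def prove(levels, index: int):
    """Collect the sibling hashes from leaf `index` up to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path


def verify(root: bytes, index: int, chunk: bytes, path) -> bool:
    node = h(chunk)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


def challenge_round(root, n_leaves, prover, deadline_s) -> bool:
    """Pick a random leaf, time the prover's answer, and check it."""
    index = secrets.randbelow(n_leaves)
    start = time.monotonic()
    chunk, path = prover(index)
    elapsed = time.monotonic() - start
    return elapsed <= deadline_s and verify(root, index, chunk, path)


if __name__ == "__main__":
    replica = bytes(i % 251 for i in range(LEAF * 5))
    levels = build_tree(replica)
    root = levels[-1][0]

    def honest(i):
        return replica[i * LEAF:(i + 1) * LEAF], prove(levels, i)

    print(challenge_round(root, 5, honest, deadline_s=5.0))
```

A prover that discarded the replica would have to regenerate it inside `prover(index)`, and the argument above is that this pushes `elapsed` past any reasonable deadline.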

What scale is "noticeable"? Is it slower than "we're going to pretend the network is slow and send you 1KB/s until we generate the rest"?

I wouldn't be surprised if regenerating it a few times is more expensive than simply storing it once.

1kb is much larger than most hashes.

I'm not sure what you're trying to say by that.

I'm saying that strategies like that won't work when the data required fits into a single packet.
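For a sense of scale, here is the arithmetic under one assumed response format: a Merkle inclusion proof with 32-byte leaves and 32-byte hashes. That format is an assumption for illustration, not Filecoin's actual wire format, and the 32 GiB replica size is likewise a made-up input.

```python
def merkle_proof_bytes(replica_bytes: int, leaf_bytes: int = 32,
                       hash_bytes: int = 32) -> int:
    """Size of one leaf plus one sibling hash per tree level."""
    leaves = -(-replica_bytes // leaf_bytes)  # ceiling division
    depth = (leaves - 1).bit_length()         # levels above the leaves
    return leaf_bytes + depth * hash_bytes


def seconds_to_send(n_bytes: int, bytes_per_second: float) -> float:
    return n_bytes / bytes_per_second


if __name__ == "__main__":
    size = merkle_proof_bytes(32 * 2**30)  # hypothetical 32 GiB replica
    print(size)                            # 992 bytes, under a 1500-byte Ethernet MTU
    print(seconds_to_send(size, 1000))     # about one second at 1 KB/s
```

Even throttled to 1 KB/s, a response of that size arrives in about a second, so trickling bytes buys very little time to regenerate anything.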

Take a read through Ben's PoRep paper: https://eprint.iacr.org/2018/702

Thanks, I'll take a look.

Here is an interesting paper addressing that: "Proofs of Replicated Storage Without Timing Assumptions" - https://eprint.iacr.org/2018/654

I'm interested in this as well. Failure scenarios for their system are data and financial loss.

For example, if there is a loophole in "proof of spacetime" then you might be paying for your data to be backed up 8 times and have it only backed up once. You find out that was the case only when that node disappears and your data is lost.

A brief for anyone wondering what Filecoin is:

> Filecoin is a decentralized storage network that turns the world’s unused storage into an algorithmic market, creating a permanent, decentralized future for the web. Miners earn the native protocol token (also called “filecoin”) by providing data storage and/or retrieval. Clients pay miners to store or distribute data and to retrieve it.

Filecoin is a project of Protocol Labs which also created the DAG based hypermedia protocol IPFS.

If you add compression, it is pied piper.

Except FileCoin is Hooli and Sia(Coin) is Pied Piper. They're essentially reinventing Sia with money. https://sia.tech/

How so? Sia has its own token as well.

Protocol Labs won't integrate filecoin use into IPFS?

See the Filecoin FAQs (https://filecoin.io/faqs/#what-is-the-connection-between-ipf...) for more information on how IPFS and Filecoin interoperate.

Filecoin is essentially IPFS with incentives for data retention and distribution.

Given that so many people do it all for free with very good uptime and connectivity, do you think Filecoin will be something consumers want?

The interesting bit: "Legal Compliance for Filecoin Storage Miners" https://github.com/filecoin-project/specs/issues/65

"In order to comply the with law, a miner who agreed to store the bad data must not serve it, and delete the data from its hard drive as soon as possible."

@ianjdarrow from the linked thread here. To clarify a little – the idea isn’t that Filecoin would ever tell anyone they have to remove data of any type. Filecoin is open-source software that tons of storage miners will use across diverse legal and moral systems, and it’s important that those miners can make decisions about what they store and to whom they serve it.

Of course, you have to do this using incentives to avoid breaking the protocol. This is actually one of the reasons it makes sense for storage and retrieval markets to be separate. If you’re being paid a fee to serve a file you have, you should rationally want to serve it unless there’s a compelling reason not to.

Wow, you have some really tough business decisions with this architecture:

- Is the mentioned blacklist implemented?

- Will it be enabled by default?

- Will the legal entity responsible for the blacklist be US-based?

From your link.

Storage miners can absolutely decline retrieval requests, at any time, for any reason.

Storage miners have to be able to access the data themselves for the purposes of the PoSt, but since the PoSt is a zero-knowledge proof, none of that data ever leaves the miner. The PoSt simply proves to everyone else that they actually have the data, without revealing the data (that's how we keep it compact).

I found this interesting too. Could a miner save on bandwidth costs by just hoarding a bunch of data, but never serving it up? I guess that they must have a solution for that, but is the solution decentralized?

Yes, that seems a possible attack. You can store stuff and pass the regular audits using "Proof-Of-Storage", yet refuse to upload bits when finally asked.

Hard to protect systems from irrational attacks like that (we know from building a similar system). They are quite transparent about their "unsolved problems": https://github.com/filecoin-project/specs/issues/63

Why do you call that behavior irrational? Seems like it would save some bandwidth costs, as the parent suggests.

AFAIK nodes get paid for serving data and they can set the price to be higher than their bandwidth costs, so refusing to serve data is refusing to make money.
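A back-of-the-envelope version of that argument, with every figure a hypothetical input (the thread gives no actual prices, and `monthly_profit` is a made-up helper): serving adds profit only when the retrieval price exceeds the bandwidth cost, and hoarding is simply the `served_gb = 0` case.

```python
def monthly_profit(stored_gb, served_gb, storage_fee, storage_cost,
                   retrieval_price, bandwidth_cost):
    """Toy miner profit; all prices are per GB in the same currency unit."""
    storage_margin = (storage_fee - storage_cost) * stored_gb
    retrieval_margin = (retrieval_price - bandwidth_cost) * served_gb
    return storage_margin + retrieval_margin


if __name__ == "__main__":
    # Hypothetical prices in cents per GB.
    serve = monthly_profit(1000, 200, storage_fee=5, storage_cost=3,
                           retrieval_price=4, bandwidth_cost=2)
    hoard = monthly_profit(1000, 0, storage_fee=5, storage_cost=3,
                           retrieval_price=4, bandwidth_cost=2)
    print(serve, hoard)
```

If a miner's bandwidth cost is above the going retrieval price, the second term goes negative and hoarding becomes the rational choice, which is the case raised elsewhere in the thread.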

If you are the only surviving node with the data, then it's called extortion or holding the data hostage.

Preventing that kind of problem is basically the point of Filecoin and similar systems.

see my reply to the parent; there’s a retrieval market too, which could help that issue

From what I understand, you get a reward for serving the data, so you miss out on a good portion of the bounty. You probably COULD attack the data like that, but it would cost you, which might stop large-scale attacks.

*EDIT: and not just for serving it, but for serving it quickly.

from their site:

> Are you in the middle of a dense city? Do you have access to very fast Internet pipes? Have a rack in a carrier hotel? Perfect! In Filecoin's Retrieval Market, miners get rewarded for delivering content quickly. This makes the retrieval market good for miners who have low-latency, high-bandwidth connections to lots of users (but not necessarily the most disk space).

> Could a miner save on bandwidth costs by just hoarding a bunch of data, but never serving it up? I guess that they must have a solution for that, but is the solution decentralized?

I've always suspected this will be the case with Filecoin. Whether or not it is profitable depends on the ratio between storage and bandwidth costs. If you can store stuff cheaply but your bandwidth is expensive, then it makes sense to store junk data if your hard drives are not full.

Filecoin really has two incentive mechanisms: A proof-of-storage-hardware mining scheme similar to proof of work which does not preclude storing useful data, along with a retrieval market which should make storing useful data more profitable than not in most cases.

Maybe we need a bandwidth coin as well ...

Martti Malmi (and I) are working on that:


Our testnet has already pushed ~1TB of data in a day on a P2P network that cost ~$99 to operate. (Internet Archive, D.Tube, Notabug.io, and others run us in production).

I'm hoping bandwidth + Filecoin will be the perfect combo for devs to replace AWS/Azure/gCloud/S3/etc. for about ~85% cheaper than current centralized rates. (As calculated by our 300GB / 100M 1kb per record test here, a few years ago: https://www.youtube.com/watch?v=x_WqBuEA7s8 ).

Now we just need a kickawesome CPU/GPU coin that isn't slow/bottlenecked. Anybody know one that could unify with IPFS + GUN?

I’m curious what exactly you are talking about. It sounds like you were getting an 11 megabit connection for $99? What is the product?

IIRC each proof also contains a little bit of the data, so over time the original data is transmitted.

that sounds like the exact opposite of a zero-knowledge proof: https://en.wikipedia.org/wiki/Zero-knowledge_proof

This project (I mean IPFS) started with a very interesting goal. Now it seems they have totally derailed and gone the blockchain-based virtual currency way. So sad. Instead they should have invested their energy into better integrating IPFS, e.g. into browsers.

Last I checked, FileCoin was just an incentive layer for IPFS, not a core part of the protocol. Close association between Protocol Labs and the Ethereum community doesn't inspire me with confidence, but as long as IPFS works without it, I'm happy.

Link to the Filecoin FAQs: https://filecoin.io/faqs/

What stops a filecoin miner from storing all replicas to S3? If nothing, then why use filecoin at all? Economies of scale would dictate that the most successful filecoin miners would simply be cloud storage providers by another name (but paid in tokens instead).

it's possible that some miners choose to store their data on S3 or some other large/well known cloud provider. However, miners have other options as well. For instance, they could store their data in their own data centers or with more start-up storage providers without needing all the marketing resources that Amazon, Microsoft, Google, etc. have.

One of the use cases I'm looking forward to seeing from Filecoin though is for regular folks with desktops that have extra storage capacity to just rent that capacity out. Right now the extra 500GB of storage space I have is just an unutilized resource waiting to be made use of. Additionally, it allows people to basically swap data capacity, so I'll store 500GB of data for other people, and in return I'll use that Filecoin to pay other people to store my data. This could be very useful for personal data backup where my primary requirement is that my data live somewhere else that is not on my computer.

> What stops a filecoin miner from storing all replicas to S3?

Being undercut by Amazon running Filecoin on their own?

So filecoin would just be a slower, less scalable, and more costly way to pay for S3?

Just put it on Saito and pay for it to stay on-chain for however long you need.

The fact that as a miner you have no control over whether someone stores illicit documents on your machine seems like a dealbreaker

The fact that as a maintainer of a website with user-contributed content, you have no [up-front] control over whether someone stores illicit content on, and distributes it from, your server, seems like a dealbreaker.

We have been Apache 2.0 open source from day one.


Kudos to filecoin! And choosing Apache 2.0 as the license.

Okay? You have... A thing? Probably a similar thing? If you must spam your own thing, at least tell us why we should care about your thing in contrast to the actual subject of the thread.

My fault!

Filecoin is great!

We are also building a blockchain that is based around Verifiable Delay Functions and has been open source from day one.

VDFs can be used to track time in a permissionless network in a way that works before consensus, which opens up a wealth of distributed systems optimizations that involve a synchronized clock. If that sounds interesting, check out https://github.com/solana-labs/solana
