> Today it is possible to purchase a refurbished 1 TB hard drive for about $20 — so buyers are essentially paying Amazon each month the entire cost of the hard drives used to store their data!
What a refurbished hard disk does not give you:
- Redundant storage
- Redundant power/network/etc.
- Geographic distribution
- A network API
- Any networking, or uplinks at all
- Access control
- A team ensuring the reliability and security of your data
- Global network of datacenters and pops
- Integration with other cloud services
On top of all that, Amazon et al. do not store your 1 TB of data on a single refurbished hard disk. This is beyond nonsense.
Sure, some of these costs are amortized, but I still think this is a lame comparison. If all you wanted was at-cost block storage with crappy reliability and speed, you could probably accomplish that in easier ways.
> What a refurbished hard disk does not give you: [...]
The whole point is that a decentralized storage network can add such features on top of the 1TB refurbished hard drive.
The refurbished drive may not be reliable, but if you replicate your data among many separate drives with independent failure modes (geographically distributed; different providing entities) then you can create a system with greater reliability. By using erasure coding schemes (similar to RAID5/6), it's possible to reduce the storage overhead that supports reliability to a fraction of the principal amount. Thus, providing 1 TB of storage to the network could earn slightly less than 1 TB of reliable storage in return.
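To make the overhead point concrete, here is a minimal sketch comparing plain replication with a k-of-n erasure code. The 8-of-10 parameters are an assumption chosen to resemble RAID6, not something any provider in this thread is stated to use, and the function names are hypothetical.

```python
def storage_overhead(data_shards: int, total_shards: int) -> float:
    """Raw bytes stored per byte of user data in a k-of-n scheme."""
    if not 0 < data_shards <= total_shards:
        raise ValueError("need 0 < data_shards <= total_shards")
    return total_shards / data_shards


def tolerated_failures(data_shards: int, total_shards: int) -> int:
    """How many shards can be lost while the data stays recoverable."""
    return total_shards - data_shards


# Assumed example schemes (illustrative only).
schemes = {
    "3x replication (1-of-3)": (1, 3),
    "RAID6-like erasure code (8-of-10)": (8, 10),
}

for name, (k, n) in schemes.items():
    extra = storage_overhead(k, n) - 1.0  # redundancy as a fraction of the data
    print(f"{name}: survives {tolerated_failures(k, n)} losses, "
          f"extra storage = {extra:.0%} of the data")
```

Both schemes survive two lost shards, but the erasure-coded one spends a quarter of the data size on redundancy where replication spends twice the data size.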
Comparatively, a single cloud provider has the disadvantage that, because they are a single legal entity, they can decide for whatever reason to not continue your service.
I am not saying that Sia is not viable, or that cloud storage providers have no downsides; they clearly do. I did not mean to imply that there is anything wrong with this service, just to express my annoyance with the comparison. I am not happy that my comment got so much attention, because I had no intention to overshadow the announcement with my single gripe.
The line that is annoying me and probably others is this:
> Buyers are essentially paying Amazon each month the entire cost of the hard drives used to store their data
...They're not. Because redundancy. Because electricity, CPUs, network, software development, etc.
The comparison is a meaningful setup for discussing decentralized cloud storage.
The comparison establishes the fact that Amazon's price is not coming from the physical storage HW itself; it's everything around it. Then you can move on to discussing how to get the other parts cheaper.
I think the use-case of cloud storage is mostly marketing by cloud providers to show relevance to the consumer. They are loss leaders at best; at worst, they are used to lock you in to their services and track your every move.
I think the comparison with a $20 1TB hdd is perfect. Cloud providers, and phone makers, have convinced you to have your photos, music, data, online, always. Like a bank. Among other things, it's much easier to mine your usage patterns that way and enforce DRM of other cloud services.
I have my IDE PC drive from 20 years ago. It still works. I can burn a CD if I'm that worried about failure. I don't need the redundancy of a billion user website, a networking api, access control, all that other crap you use in enterprises that comply with ISO standards.
Before cloud providers, people didn't take their photos to Apple and tell them to store them. The only reason you do it now is because of marketing and service integration.
Appreciate your response. My comparison to the cost of a hard drive today is not meant to be a direct comparison, but instead to demonstrate that traditional cloud storage is actually quite expensive. Sia is able to bring down storage costs by an order of magnitude and bandwidth costs by two orders of magnitude. While most HN readers understand the nuances of cloud storage costs, most non-technical readers are unfamiliar with the high costs of S3 storage and egress.
The issue I take is that the comparison is truly not meaningful. Yes, storage is cheap, but you are comparing the cost of a refurbished disk to a managed service.
This may lead people to think cloud storage is a waste of money, which I personally think is untrue (though Sia certainly may be able to do it for cheaper than cloud storage, too. Both things can be true.) While the cost to us is certainly more than the cost to the provider, the cost for us to replicate what a cloud provider offers is certainly greater at small scales where we benefit less from amortization and volume pricing.
A good point may be that we don’t actually need to replicate what a cloud provider does to offer a compelling alternative for various use cases. This I would probably agree with. It is certainly cheaper for me to have my own NAS and offsite backups than it would be to pay for cloud storage, though it is still surprisingly expensive, at least to the uninitiated. (Check out what a fully loaded Synology 8 bay would cost! Multiply by the number of backup sites. A lot of storage, but definitely not cheap.)
The comparison wasn't complete, but I think it was still meaningful. It's like comparing the cost of sand to glass; it should be obvious that sand is not glass, but it begs the question: how much proprietary value is there in glassmaking?
> This may lead people to think cloud storage is a waste of money, [...]
But that is exactly the argument being made -- that a trustless distributed system can substitute and obsolete cloud storage.
Then I respectfully disagree. I don't think cloud storage is always the best use case, but I do believe it is a good value for what it actually offers you. Backblaze B2 is my ideal marker for what a 'good value' in cloud storage should look like. At $5/mo per TB[1] it easily crushes the metric of monthly cost being less than equivalent non-redundant refurbished storage media.
> that a trustless distributed system can substitute and obsolete cloud storage.
OK, I have literally no qualifications to say that this isn't true. After all, what I am NOT claiming is that it can't be done for cheaper. What I AM claiming is that without a service like Sia, it is certainly not going to be cheaper for common use cases of end users. I'm talking about comparing hard drives to services.
> Buyers are essentially paying Amazon each month the entire cost of the hard drives used to store their data
I still think though, that a compelling case could be made without bad comparisons or hyperbole. Which makes this more frustrating than it needs to be imo.
Well, the actual comparison is between the cost (for ownership) of a 1 TB (refurbished) drive vs. 1 month of 1 TB of cloud storage.
The author could have framed it as the cost (for ownership) of a brand new NAS with 2x2 TB disks (which, I just checked on Tiger Direct, comes to around 350+65+65 = 480 US$) vs. 6 months of 4 TB of cloud storage (6x4x20 = 480 US$), but the overall message would have been the same.
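The arithmetic in this comparison can be written as a break-even calculation. This is only a sketch using the prices quoted in the thread; it deliberately ignores power, bandwidth, redundancy and labour, and it treats the 2x2 TB NAS as 4 TB of raw capacity (a mirrored setup would offer half that). The function name is hypothetical.

```python
def break_even_months(hardware_cost: float, capacity_tb: float,
                      cloud_price_per_tb_month: float) -> float:
    """Months of cloud fees that add up to the one-off hardware price."""
    if capacity_tb <= 0 or cloud_price_per_tb_month <= 0:
        raise ValueError("capacity and price must be positive")
    return hardware_cost / (capacity_tb * cloud_price_per_tb_month)


CLOUD_USD_PER_TB_MONTH = 20   # price quoted in the article
nas_usd = 350 + 65 + 65       # NAS enclosure plus two 2 TB disks

print(break_even_months(nas_usd, 4, CLOUD_USD_PER_TB_MONTH))  # new NAS, 4 TB raw
print(break_even_months(20, 1, CLOUD_USD_PER_TB_MONTH))       # refurbished 1 TB drive
```

The first call reproduces the six months in the comment and the second the one month in the original article.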
Imagine that you rent a car for 6 months and over those 6 months you pay rates that add up to the price of the car brand new.
Obviously you're paying for more than the drive. That doesn't make it a bad comparison. There's no great reason for the drive to be more than half the cost, so even with 3 copies amazon still comes out as making a whole lot of profit. There's a reason that multiple competitors can provide all those things at significantly lower prices.
And more importantly, if you're focused on storage then don't pay for features you don't need.
One of the main innovations of Sia is providing a low cost means to turn that 1 TB refurbished hard drive into a useful cloud product.
Sia is able to provide, at 1/10th the cost of Amazon, high throughput, high reliability, high redundancy, geographically distributed cloud storage, and it can do that without high reliability drives or datacenters.
Off topic, but: I purchased three refurb 6TB HGST He HDDs in Jan. from a reputable shop. All three failed last month. I was so happy that the shop completely refunded all of them because the warranty is generally very limited on refurb HDDs. Normal RMA is 5yr on this specific model.
There's no way refurb drives are cost effective for these environments.
That all three drives lasted this long then failed within a month is very suspect. This sounds less like random failure indicative of the quality of refurbished drives in general, and more like a defect common to all three drives.
Because of such plagues, data-centre operators often mix drives of different manufacturers, models, batches and ages. Running heterogeneous devices in a redundant array helps make the failure distribution more random, so that the safety margin is less likely to be suddenly overwhelmed by correlated failures.
Beyond a functional test, checking the SMART data for an indication of overall drive state and He containment, and then giving the drive a quick clean, it would be interesting to know the supplier's scope for 'refurbishing' an He drive.
Abuse concerns and “does it need a blockchain” cynicism aside I find all of these “crypto” cloud computing projects fascinating. I highly recommend playing around on their testnets. Storj and Filecoin are also competing in the storage space, and BitTorrent is vying to be a CDN alternative.
SONM aims to be an EC2 competitor, and interestingly requires ID verification to participate in their testnet due to the clear abuse potential of unattributable internet-connected VMs (which exist in plenty of other places anyway, since hosting providers currently have no KYC requirements).
Will the sharing economy extend to computing resources? I’ll stay tuned.
Decentralized computing is a challenge because unlike storage, it's very expensive to encrypt computation. You see something like 10,000x slowdowns.
With storage, we can encrypt and checksum everything client-side in a cryptographically secure manner, and we can have full faith that even though we are giving our data to someone else, they have no way to view the data or tamper with the data.
If techniques like homomorphic encryption gain substantial speedups, I'm sure that we'll start to see decentralized/trustless computing, but for now you really need to be doing your outsourced computing with someone you can trust and someone who is regulated, because it's possible to act maliciously.
1. Now I have to trust two providers, each of which could have problems.
2. Now I have to pay for encryption, then pay for storage, and probably have higher ingress costs, too (I can probably trust AWS/GCP/Azure/Backblaze/whatever; not sure about random-blockchain-guy #67812).
SGX has always seemed like the Netflix DRM problem to me.. eventually, pixels have to show up on a screen somewhere, right? Am I mistaken in this line of thinking?
Homomorphic encryption solves this, but it’s throwing an awful lot of performance out the window.
I think the idea behind SGX based cloud computing would be that you send encrypted data and code to the enclave, which then decrypts the data, does the computation, encrypts the result, and then returns the encrypted result.
The enclave gets to see everything that's going on, but in theory nothing can view or corrupt what's going on inside of the enclave.
I'm personally pretty skeptical that SGX is a secure platform, however if you do decide that SGX works and is trustworthy, it would be possible to outsource your computation in a mostly cost effective way to anyone who has an SGX enclave.
Yes, this is the idea. Enclave technology should improve over time, whether it's SGX or something else. And Golem is working with Intel, Texas A&M and others in creating an environment called Graphene (https://grapheneproject.io/) where applications can easily be run on SGX.
Enclave defeating technology will improve over time as well. Things like scanners and lasers and emi detectors get better and better just as technologies for hiding from them do.
This isn't what SGX is for. SGX is designed to add a layer of security to protect against malicious code on the same machine, encrypting memory at rest such that its contents aren't compromised if a vulnerability allows access to it. It's not intended to support an untrusted computing platform like homomorphic encryption.
Decentralized computing requires redundant computations for verification, which seems like a waste for most applications, not to mention the latency added to each request.
I'm sure there's use cases for this that require more sophisticated compute than what smart contract platforms like Ethereum offer but I'm not sure what those are. If someone is being prohibited from using a cloud platform and needs to resort to a decentralized hosting platform, I can see them simply using a VPN and hosting the application on a different platform that's not censoring them. SONM requiring KYC for their decentralized platform sounds like an oxymoron.
You can use storj to host your files without using blockchain. I think they accept credit cards directly. Also storj is doing a lot to make sure only reliable hosts are allowed to continue to operate on the network. If you aren't serious about having a reliable hosting node then you'll eventually get booted from the network.
Hi all, author here, just wanted to say I am thrilled to see the HN community discuss decentralized cloud storage in more detail. There are many drawbacks today and narrow use-cases, but our goal is to continue to improve Sia – and over the coming years we will prove that decentralized storage can compete directly with Amazon, Google, and Microsoft. We feel confident that the marketplace dynamics, in particular, will foster competition between hosts on the network and maintain prices that are an order of magnitude cheaper for storage and two orders of magnitude cheaper for bandwidth.
Pretty arrogant to say you're going to make cloud storage providers obsolete when you haven't even built a competing solution. You can't even share data yet...
And what about all the other features that cloud providers offer? You say "[the] Sia software is exponentially improving, and performance and featureset is quickly approaching Amazon S3", but I see absolutely no evidence of even the most basic features of cloud storage providers.
Authorization? Regulatory compliance? High-performance bandwidth? High-reliability public API/URLs? Customer support? Private cloud peering? Zero effort integration with many (most?) major data-related open source projects? Versioning? One-click, no-knowledge reliability?
How much do you need to grow to hit cloud scale? 1,000,000x? More? And you're already claiming that you're going to make S3 obsolete?
Your blog would hold more weight if you weren't so over-the-top, bullshit-level optimistic.
I often enjoy the cynicism of HN as a counterpoint to the typical breathless media coverage of new technology. However that now famous comment on dropbox is a good demonstration that cynicism isn't a short cut to truth.
I'm not saying it won't work (I have no idea - there's no reason someone can't replicate all of s3's features eventually). I'm just saying that this "we're going to run s3 out of business" optimism is pretty unfounded right now. And on a factual level, "featureset is quickly approaching Amazon S3" is pure bullshit.
Only took me a couple of minutes to get a file uploaded using the aws-cli. It’s the only decentralized object storage solution I’ve ever seen that seems to actually work.
Filebase uses a decentralized service on the backend, but the service itself is centralized. If Filebase shuts off their servers, you lose all your data.
As an analogy, email is decentralized but Gmail is not.
Interfacing with these decentralized storage networks requires cumbersome client-side software that handles things like the smart contracts and utility coin transactions.
All of the projects I’ve seen appear to be quite far away from being easily accessible by ordinary users.
Based on where the technology is currently at, if you want to compete with traditional cloud services, you need a service provider like this to build a platform on top of the underlying network.
My point is that this service seems to have achieved that. People talk about storj a lot in this space, but go and try to upload a file to storj right now. You can’t because they don’t actually have a product in the market. This is the first time I’ve seen a press release like this, gone to a web page, signed up for a service, and uploaded a file immediately. Which is a refreshing change from an industry that tends to talk big about the impact they’re going to have for months or years on end, without ever releasing a functional product to the market.
Fair enough. I agree about there being huge value in services abstracting away the complexities of Sia and making it accessible to more casual users. I'm just bothered by the fact that Filebase seems to be deliberately implying that they're a decentralized service, when they're, in fact, just as centralized as Google or Amazon.
The player that I prefer in this space is Goobox.[1] I think their goal is to build a Sia-based service similar to Mega, but they have an S3 API as well. They've been around longer than Filebase, and their marketing comes across to me as more honest.
Gotta say, the fact that the installation instructions consist primarily of getting around the "unknown / unidentified developer" restriction doesn't inspire confidence.
Thanks for checking out our service! Filebase[1] offers an S3-compatible API and we are happy to confirm that we have tested basic use cases with restic.
Using Sia on our backend enables us to offer cloud storage at very competitive rates. Currently, Backblaze charges egress fees (we don't) and Wasabi has a 90-day minimum charge for all objects (we have no minimums).
> and their prices are suspiciously similar at about $20 per TB per month. Today it is possible to purchase a refurbished 1 TB hard drive for about $20 — so buyers are essentially paying Amazon each month the entire cost of the hard drives used to store their data!
Really? That's the argument? That you could store your files cheaper on a single, refurbished disk without considering the cost of housing, electricity, network, physical security, redundancy, management etc.?
Aren't all those costs priced in? I assumed the data storage with Sia was redundant and encrypted, which should cover basically all of those things, right?
Looks like it from looking at the site:
>File segments are created using a technology called Reed-Solomon erasure coding, commonly used in CDs and DVDs. Erasure coding allows Sia to divide files in a redundant manner, where any 10 of 30 segments can fully recover a user's files.
[...]
>Before leaving a renter's computer, each file segment is encrypted. This ensures that hosts only store encrypted segments of user data.
I'd love for someone to do an analysis on the performance and availability of the data. I wonder if you could pay more for faster speeds or higher levels of redundancy? Those would be neat features.
The implication is that with Sia you can get those same benefits without having to essentially pay the entire cost of a new hard drive every month. (Sia's rates are much cheaper.)
> Hosts on Sia do not have to worry about building and maintaining enormous datacenters, employing thousands of employees, and marketing their services. They only need to worry about providing reliable storage capacity to renters on Sia.
I would expect that building large datacenters and employing people to manage them is exactly how you go about providing reliable storage capacity.
And while it's true that providers don't have to worry about marketing or branding, well... the flip side is that they can't do any marketing or branding and need to compete in a market where the product is completely commoditized.
Sia is built on a completely different assumption than most large datacenters - uptime requirements for hosts are around 95%, which equates to about 36 hours of downtime per month.
This means that you can run a much leaner database and require substantially less expertise to keep things going. If a rack goes down, you don't need someone on-site to bring it up ASAP, you can fix it the next workday without upsetting your customers.
95% uptime for hosts translates to 99.99+% uptime for Sia users, because users store data across hosts in a 10-of-30 scheme. As long as 10 out of 30 of the hosts are online, you will be able to access your data. The probability of more than 20 hosts being down at once when each of your 30 hosts has 95% uptime is exceedingly small; your practical uptime depends more on the reliability of the software than it does on the reliability of the hosts you use.
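The jump from 95% per host to 99.99+% overall can be checked with a binomial model. This is a sketch under a strong assumption that is mine, not the thread's: every host is online independently with the same probability. Correlated outages would make the real figure worse. The 720-hour month and the function names are also assumptions.

```python
from math import comb

HOURS_PER_MONTH = 720  # assumed 30-day month


def monthly_downtime_hours(uptime: float) -> float:
    """Expected offline hours per month for a single host."""
    return (1.0 - uptime) * HOURS_PER_MONTH


def unavailability(needed: int, total: int, host_uptime: float) -> float:
    """P(fewer than `needed` of `total` independent hosts are online)."""
    down = 1.0 - host_uptime
    return sum(
        comb(total, up) * host_uptime**up * down**(total - up)
        for up in range(needed)
    )


print(f"host downtime: {monthly_downtime_hours(0.95):.0f} h/month")
print(f"P(data unreachable, 10-of-30): {unavailability(10, 30, 0.95):.1e}")
```

Under the independence assumption the second figure comes out many orders of magnitude below the 1e-4 that "99.99%" allows, which is consistent with the remark that software reliability dominates in practice.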
As for commoditization: that's the goal! If we can completely commoditize data storage, prices should come down dramatically, and the market should be a lot more efficient. You won't need to be an Amazon or Google to have a stable and competitive data offering.
> This means that you can run a much leaner database and require substantially less expertise to keep things going. If a rack goes down, you don't need someone on-site to bring it up ASAP, you can fix it the next workday without upsetting your customers.
This is already the case for cluster storage systems like Ceph.
This extends to the whole site. You don't need backup power. At all. Maybe no redundant network equipment, if you get replacements on speed dial. Likely no redundant fiber, just don't take your time to arrange a replacement if someone cuts it (use a different, pre-scheduled conduit you can rent and fill with fiber in a day or two).
If you get unexpected "bus factor", you can handle replacement coming back early from holiday.
> I would expect that building large datacenters and employing people to manage them is exactly how you go about providing reliable storage capacity.
Further, I imagine that if there is money to be made providing reliable storage to Sia users, then those with experience of reliably providing low-cost, high-volume storage will corner the market pretty fast!
I tried Sia about 2 years ago. It worked "ok" for files of about half a GB. However, for 4-5 GB files my upload got stuck, and the help forum couldn't provide answers besides "wait for the next release".
There were also unclear issues back then. Are the files encrypted on my computer before being uploaded? Are they sharded between many hosts? Or are they stored on one host? How long will they be stored?
Most of these issues seem like they could be improved. They are not inherent faults of the technology or something. I like Sia's idea and tech in general; I'll give it a try again now.
Sia has come a long way in the past two years. Uploads, downloads, and repairs are all much more stable and also a lot faster. Sia can effortlessly sustain 100mbps upload and download if your connection is that fast.
Sia is happy with files that are hundreds of GBs in size today, and with a filesystem that gets up to about 20 TB. Beyond that it starts to struggle, but we're continuously working to expand its capabilities.
Data has always been encrypted client-side, even in very early releases. Data today is sharded between hosts in a 10-of-30 Reed-Solomon scheme.
I think you will find that the experience is much smoother and more complete compared to the experience from 2 years ago. And, a critical feature is now available as of the latest release: seed based file recovery. After you upload your files, it's possible to create a snapshot of your data that you can recover later on a different machine using nothing more than your wallet seed. This makes Sia a practical and cost effective solution for secondary or tertiary backup.
Long time user and follower of Sia. It's always surprising to me that it's overlooked compared to filecoin (which is similar, but seems like $250m+ vaporware to date). I use it to have tertiary backups.
Weird caveat I found: the smallest chargeable size is actually pretty large.
> Weird caveat I found: the smallest chargeable size is actually pretty large.
This and the cost to form contracts are large enough that it doesn't make economic sense until you are storing > 1 GB.
That said, they seem to slowly be working through a number of these concerns. They very recently released seed based backups, one of the biggest issues I had last time I looked.
We do not yet have the ability to share the same set of contracts across multiple machines, but it is on the roadmap. This would include using the same set of contracts on desktop and mobile devices, which will open up some intriguing use-cases.
Can someone explain the difference between Sia and Filecoin/IPFS? The latter seems to be still working on solving some fundamental crypto primitives before they can offer what Sia already claims to have solved.
Yes, Filecoin is trying to come up with a "more efficient" and possibly "elegant" implementation based on algorithms that are still being invented. Of course, with 100x the budget it's possible that they could still overtake Sia.
Given they publish their apps on all platforms they could probably add this in the future as an ssd or PCIe storage tier by doing regular disk benchmarks.
However it seems like a major limiting factor, given that random access speeds on spinning drives or even cheap ssds can drop drastically by an order or two of magnitude (i.e. from 130MB/s to 2MB/s).
Given how they're doing error correction in a distributed manner, having a fast storage tier would be a major advantage for them which would eliminate any problems from flash storage wearing out and fully gain the benefit of throughput, random access times and cheap new prices of nvme drives.
At the moment I probably can't use this for streaming my own 4k videos to myself, because an up to 300mbps network is not good enough for that; the slowest storage in it would be a bottleneck for a significant amount of the experience of watching the streamed file.
And I definitely can't use it for application storage given how important response time is to the user experience and user growth.
That all aside, those 2 are relatively extreme examples and there's a lot of usage scenarios in between them.
I do admire how far they've gotten this given all the blockchain hyped products that have failed. They seem to have a solid system here that will only improve on performance and put pressure on bringing online storage prices down.
Most 4K video is closer to 50mbps. If your home connection is that fast, Sia should be more than capable of delivering smooth 4K streaming to your home.
A single host with slow storage is not a blocking factor because Sia downloads in parallel from as many as 30 hosts at once when composing a stream of data. Even if each has an I/O bottleneck of 1 MB/s (8mbps), you'll be able to get your video out at 240mbps.
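The throughput arithmetic can be spelled out as follows. The first figure is the comment's own calculation; the second is a more conservative case that I am assuming for illustration, where only the 10 pieces needed to rebuild each segment count as useful throughput. The function name is hypothetical.

```python
BITS_PER_BYTE = 8


def aggregate_mbps(hosts: int, per_host_megabytes_per_s: float) -> float:
    """Combined rate in megabits/s when downloading from hosts in parallel."""
    return hosts * per_host_megabytes_per_s * BITS_PER_BYTE


STREAM_4K_MBPS = 50  # bitrate quoted in the comment

all_hosts = aggregate_mbps(30, 1.0)     # every host limited to 1 MB/s
conservative = aggregate_mbps(10, 1.0)  # assumption: only 10 pieces are useful

for label, rate in (("30 hosts", all_hosts), ("10 hosts", conservative)):
    print(f"{label}: {rate:.0f} mbps, headroom over 4K: {rate / STREAM_4K_MBPS:.1f}x")
```

Even the conservative case stays above the quoted 4K bitrate, so a single slow disk should not be the limiting factor under these assumptions.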
In practice today, Sia hosts are almost all network bottlenecked. Most use HDDs, which do not have trouble keeping up with network latencies (HDD seek times are 11ms, network latencies usually 50ms+) or throughputs (HDDs can usually get close to 100 MB/s, far more than the network throughput).
Can I ask, have you tried streaming 4K video from the latest version of Sia? Seek times might be 5-10 seconds, but I believe that once your stream has loaded you will see completely smooth playback. Our next release has a focus on reducing those seek times, but the code is not ready yet.
No, I have not tried it. However, based on your information, this looks ideal for something like showing auto-playing video on a splash screen using your API to pull the file on the client side.
Does that mean for every individual hoster-renter contract, of ones that were established and used, some ~9% of them failed?
That would essentially mean that to get equivalent numbers to those advertised by B2, which claims 11 9s, I'd need to buy over 10 contracts. Is this actually how bad it is at the moment?
You are correct that the failure rate for contracts is somewhere around 9%, and you are also correct that you need over 10 contracts to get 11 9's of redundancy, however this doesn't translate to needing 10x redundancy on the Sia network to get high reliability.
Data is uploaded to Sia (per-default, it's configurable through the API) in a 10-of-30 scheme using Reed-Solomon coding, which means that each piece of data is held by 30 hosts, and out of those 30 hosts any 10 of them are sufficient to recover the original data. This has a total overhead of 3x, and the algorithms behind it are in my opinion super fascinating.
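Both numbers in this exchange can be reproduced with a short calculation. This sketch assumes each contract fails independently with the ~9% rate mentioned above and ignores repair (re-uploading pieces when a host drops out), so it is an illustration of the reasoning rather than a durability guarantee. Function names are hypothetical.

```python
from math import comb


def copies_needed(failure_rate: float, nines: int) -> int:
    """Smallest number of full copies whose joint loss is <= 10**-nines."""
    target = 10.0 ** -nines
    copies = 1
    while failure_rate ** copies > target:
        copies += 1
    return copies


def loss_probability(needed: int, total: int, failure_rate: float) -> float:
    """P(more than total - needed hosts fail), i.e. the data is unrecoverable."""
    ok = 1.0 - failure_rate
    return sum(
        comb(total, failed) * failure_rate**failed * ok**(total - failed)
        for failed in range(total - needed + 1, total + 1)
    )


FAILURE_RATE = 0.09  # contract failure rate from the thread

full_copies = copies_needed(FAILURE_RATE, nines=11)
print(f"plain replication: {full_copies} copies, {full_copies}x storage")
print(f"10-of-30 coding: {30 / 10:.0f}x storage, "
      f"loss probability {loss_probability(10, 30, FAILURE_RATE):.1e}")
```

Plain replication needs more than 10 full copies to reach eleven nines, as stated, while the 10-of-30 scheme gets below the same 1e-11 threshold at 3x overhead under these assumptions.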
> Data is uploaded to Sia (per-default, it's configurable through the API) in a 10-of-30 scheme using Reed-Solomon coding, which means that each piece of data is held by 30 hosts, and out of those 30 hosts any 10 of them are sufficient to recover the original data.
This sounds good - I'm just trying to understand how this is counted - are you establishing 30 separate contracts to achieve this?
Sia maintains 50 contracts with hosts at all times, and uses 30 of them for each file segment that gets uploaded. We use state channels, so we can use the same contracts each time you upload a new file, minimizing the total amount of on-chain activity.
Can someone ELI5 why you need a blockchain for this? Another startup could make a non-blockchain version of this that uses similar syncing technology to GDrive / Dropbox file sync that just syncs other people's stuff instead of yours and pays you monthly.
Founder here. The blockchain gets you a couple of things.
The key thing in our blockchain is the storage contract, which is something like a blockchain-SLA. The host has an obligation to store some data, and the host also puts up money out of pocket as a promise that the data will be stored faithfully. The blockchain will occasionally challenge the host, and if the host cannot create a proof that the data is still being stored, the host's collateral is forfeit.
The renter also puts money into the file contract. This money is used to pay the host if the host is honest and keeps the data, and the money is destroyed if the host is dishonest or loses the data. The money gets destroyed so that the renter has no reason to interfere with the host, nothing to gain from seeing the host fail or working to make the host fail.
Thanks to the blockchain, the host has a guaranteed payment if they store the data, regardless of whether the renter sticks around for the duration of the contract or not. And also thanks to the blockchain, the renter knows that the host will not get paid until the end of the contract, and that they will actually lose some of their own money if they lose the data.
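The incentive structure described above can be modelled as a toy settlement function. This is not Sia's contract code or API: the names are hypothetical, amounts are abstract integer units, and treating the forfeited collateral as burned (rather than sent anywhere) is my assumption based on the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settlement:
    to_host: int  # coins released to the host at contract end
    burned: int   # coins destroyed, so that nobody profits from a failure


def settle(renter_payment: int, host_collateral: int, proof_ok: bool) -> Settlement:
    """Resolve a storage contract once the proof window has closed."""
    if renter_payment < 0 or host_collateral < 0:
        raise ValueError("amounts must be non-negative")
    pot = renter_payment + host_collateral
    if proof_ok:
        # Honest host: receives the payment and gets its collateral back.
        return Settlement(to_host=pot, burned=0)
    # Failed proof: payment is destroyed and collateral is forfeit (assumed burned).
    return Settlement(to_host=0, burned=pot)


def host_net_gain(renter_payment: int, host_collateral: int, proof_ok: bool) -> int:
    """Host's profit or loss relative to the collateral it locked up."""
    return settle(renter_payment, host_collateral, proof_ok).to_host - host_collateral


print(host_net_gain(100, 150, proof_ok=True))
print(host_net_gain(100, 150, proof_ok=False))
```

The renter receives nothing in either branch, which is the point made above: the renter has nothing to gain from making the host fail, while the host loses its own money if it drops the data.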
There's another great benefit to cryptocurrency though, which is that payments are super low friction. Every time you upload data to the Sia network or download data from the Sia network, you make hundreds to thousands of tiny payments to the various parties over state channels. This wouldn't be feasible using more traditional payment systems; we'd have to track everything in a centralized database, and that actually doesn't scale very well.
Sia is both low cost and high speed, but these are side benefits over the primary goal of uncompromising decentralization. Storage just happens to be an application where the decentralized version is actually highly competitive to the centralized versions.
I'm glossing over a fair amount here, if there's anything that doesn't make sense or seems off feel free to ask, I'm happy to clarify.
How do renewals work? With S3 I upload something and then pay for it + bandwidth until I delete it. Would I need to know how long I need storage for ahead of time? Estimated bandwidth?
I run a video rendering service that currently uploads previews (call it 5-10mb) to S3 for ~3 days and then deletes them. Finished renders (~50-200mb) are stored for 2 weeks. I upload 500-1000 new files a day.
Amazon just bills me for this once a month. Would I need to load up a wallet and prepay each of these contracts individually? How would I send out a download link or play a video in a browser?
You give Sia a budget and it spends from that budget on a pay-as-you-go basis. You can set how long you want the contracts to last; by default it's 3 months. That means you budget for 3 months of storage, and then you don't need to pay into Sia again for 3 months.
Anything that you paid for but didn't use will be refunded in full at the end of the 3 months.
Right now Sia does not support file sharing of any kind; however, that's on the mid-term (6-12 months) roadmap.
There are a lot of details such as finding servers with available space, paying for storage, posting bonds, collecting bonds with fraud proofs, and doing all of that in a decentralized way. You pretty much need a blockchain for the decentralization.
> that just syncs other people's stuff instead of yours and pays you monthly.
with VC money? with a fickle settlement system that requires 5 layers of financial institutions before you get a REST API? where you probably still need 50+ money services licenses and compliance department anyway? where any of those financial institutions can still cut you off for any reason? to maintain operations in a single country?
why do all that when you can just print money, or more accurately create a scarce digital asset of value to the market you create, available internationally on day one
you don't have to agree with it, but the pendulum has swung waaay in this other direction and that's the answer for "why" before you even get into the implementation that the founder wrote.
So exhausting watching people argue that trust is no longer necessary when they're simply pushing the trust around. You have to trust someone at the end of the day, and it's more practical to use the law to trust them (and enforce that trust) than cryptography.
If I want to decentralize my storage, I would replicate data between multiple cloud object stores (Backblaze, S3, Wasabi, etc., with a library to abstract PUTs and GETs) and pay them for their service. I trust them, but can verify each is holding up their end of their commercial agreement (similar to RAID block scans, but with object hash verifications). They aren't randomly going to yank their storage node off the network like a home NAS device or other consumer-level storage device, because it's their business to be available.
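The replicate-then-verify approach described above could look roughly like this. It is only a sketch: `InMemoryStore` is a hypothetical stand-in for a provider client (no real Backblaze, S3, or Wasabi SDK calls are used), and `replicate` and `audit` are illustrative names.

```python
import hashlib
from typing import Dict, List, Optional


class InMemoryStore:
    """Hypothetical stand-in for one provider's object store."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> Optional[bytes]:
        return self._objects.get(key)


def replicate(stores: List[InMemoryStore], key: str, data: bytes) -> str:
    """Write the object to every store and return its SHA-256 digest."""
    for store in stores:
        store.put(key, data)
    return hashlib.sha256(data).hexdigest()


def audit(stores: List[InMemoryStore], key: str, expected_digest: str) -> List[str]:
    """Return the names of stores whose copy is missing or does not match."""
    failing = []
    for store in stores:
        blob = store.get(key)
        if blob is None or hashlib.sha256(blob).hexdigest() != expected_digest:
            failing.append(store.name)
    return failing
```

A periodic `audit` run plays the role of the RAID-style scrub mentioned above; a real version would need to download (or ask the provider to hash) each object, which is where egress fees come back into the picture.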
A superior solution has yet to present itself.
Sidenote: The blog post is holding cloud provider data egress fees up as an example of exorbitant bandwidth pricing; this is true, but only from cloud providers. You can get bandwidth extremely cheap from non-cloud providers or through other means[1].
They're definitely less costly. I could host over 10x the amount on Sia for what I pay for B2 at the moment, and B2 is already one of the cheapest providers. Plus, that's just storage, ignoring all bandwidth costs. I'm really curious what the numbers will look like in terms of reliability here. Theoretically you can figure out how often storage contracts try to retrieve and fail and the host has to pay a fee by looking at the blockchain. Has anyone done the analysis on this yet?
But do you have the same level of durability and availability of your data? Do you trust the storage nodes to not be making copies of your data? Can you pay with dollars instead of bitcoin?
I'm paying for trust, and it is a hard sell that a distributed storage system spanning worldwide jurisdictions is inherently more trustworthy (although it is possible that it can be as durable and reliable as traditional storage systems, but the proof is in the data).
> Do you trust the storage nodes to not be making copies of your data?
No, I trust the cryptography to. I sure as hell wouldn't trust Backblaze, let alone Amazon, not to do that either. I use Restic when using such services to encrypt before sending anything to them.
> durability and availability
This is IMO the only real question. I want to see some numbers and I think their blockchain should actually be able to provide values for this theoretically.
Okay, so after some reading: Sia uses Reed-Solomon coding in a 10-of-30 configuration. This means that as long as any 10 out of the 30 hosts you upload to are available, you can download your files. There's quite a high failure rate for contracts at the moment, around 9%, as hosting is not very profitable. In order for this to be a problem, over 20 contracts would need to fail. If my math is correct, that should achieve approximately "20 9s" of reliability. Now there's one major catch with that, which is that your hosts must be chosen in a properly random way, but there's some cool tooling available for Sia to help with that, like Decentralizer.
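As a rough check of that estimate, the loss probability can be computed as a binomial tail. This sketch assumes every host fails independently with the same 9% probability, which is a strong simplification; `loss_probability` is an illustrative name. Under those assumptions the result comes out on the order of 1e-15, so closer to fifteen nines than twenty, and correlated failures would make it worse.

```python
from math import comb


def loss_probability(n: int, k: int, p_fail: float) -> float:
    """Probability that fewer than k of n hosts survive.

    Assumes independent host failures, each with probability p_fail.
    Data is lost once more than n - k hosts have failed.
    """
    max_tolerable_failures = n - k
    return sum(
        comb(n, failures) * p_fail**failures * (1 - p_fail) ** (n - failures)
        for failures in range(max_tolerable_failures + 1, n + 1)
    )


if __name__ == "__main__":
    # 10-of-30 Reed-Solomon with a 9% per-host failure rate.
    print(f"{loss_probability(30, 10, 0.09):.2e}")
```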
Overall, given this, while I'm not 100% confident in their software and cryptocurrency yet, given time I think this could prove to offer an even higher guarantee of reliability than traditional cloud storage services at much lower prices.
Depends largely on your bandwidth usage if that's beneficial, for me personally I'm talking home/small business backups where the additional reliability of B2 is great and the price actually works out better than putting that system in RAID1.
Of course, I'm talking about using the system to host personal and small business backups like I do with B2. I backup all the important content off my home RAID nightly. Definitely a good warning to put out there though.
You're not necessarily wrong but it's not so one-sided. In Sia you are definitely paying, the storage is automatically verified, and the storage providers don't get paid if they go down.
I think it's great that most of the questions here are getting answered, but I'm still a little confused as to who this is for, and why adding Bitcoin's great-great-grandchild to the project helps.
It doesn't look like it's built for large data sets, since the capacity of the network is fairly low and the providers are fairly small. So if I want to store ~5TB of data, this isn't much of a fit, and if I want to store ~5GB of data, well, there are tons of options, and if I want it replicated, I'll replicate it.
Seriously though, I don't want to play game theory games with a list of storage providers, especially for tiny sums of data, I just want it to be there.
This also cuts both ways. Your business needs a file but your customer can't connect to that node because of some political blocking of internet connections. Who can you rely on? What central service can you call to get this fixed asap?
The idea with Sia is you upload to not one, but several people, you pick the level of redundancy you want and if you can't get the file you want from any given one of them, they pay the price and you get it from someone else.
It's kinda designed to expect that; you end up paying for more redundancy over more hosts, which is a win. Yes some will / may be down, but a) they're penalized financially, b) you can (by default do) have redundancy
I was greatly skeptical of these types of systems but ipfs proved me wrong. It's fine, when combined with a CDN. I'm interested in what new use cases Sia handles when CDN + IPFS works so well. For instance my blog https://www.wuli.nu (source code https://github.com/posix4e/blog)
Can you please clarify which CDN features you are referring to on your blog (I can't find any) or anywhere else? Also, can you elaborate on what you even mean by "so well"?
Development of Sia started before IPFS was even announced, and Sia has already hosted PBs of data as a working product. On the other hand, IPFS hasn't even released its public testnet, so it would be great to understand your point.
Edit: btw by IPFS I assume you refer to their Sia equivalent product, Filecoin. At least I do, as IPFS cannot be compared to Sia alone.
How does http handle illegal content? Is there any reason that content should be blocked at a protocol level instead of punishing the person who created it through existing regulations? If Sia has a way to remove such content it loses its decentralized nature, which is the whole reason it exists in the first place.
Should it exist? Well sometimes important content is illegal, such as political content in China. It’s good to have a way to publish important illegal content, even if bad stuff slips through.
What if someone posts illegal violent or sexual content? That is unfortunate, but it means that a real person took the time to post such content on the network. Keeping them from speaking will not change their warped beliefs, it will just hide them from the world and keep people from remembering that evil people exist. The person who published the content should be punished through existing regulations and the content should stay on the network to remind us that some people are horrible. Hopefully no one accesses it.
I believe that censorship resistance is a key value proposition for decentralized file storage. So its agnostic handling of illegal content is a feature, not a bug.
In the spirit of blockchain and decentralization, I think you're right. But practically, explaining that to the authorities will probably not be a pleasant experience if they manage to find cp or other illegal data on your hard drive.
The data is encrypted by the client, so that's theoretically impossible. Neither you nor the authorities have the ability to determine what file fragments (whole files aren't stored; just fragments) are on your PC.
It's not theoretically impossible for law enforcement to discover the encryption keys to some illegal files on someone's computer. Then they could go around arresting people who host the chunks. Of course those people wouldn't have known about those files in particular, but they were knowingly running software that enables hosting illegal files. A judge would have to decide if that's OK.
Doesn't AWS also have the same problem? Someone could upload encrypted illegal content, then later law enforcement could discover the key. Why hasn't AWS gotten in trouble for that yet?
No large cloud provider allows anonymous usage. They take down content on request and assist law enforcement in catching the non-anonymous people doing illegal things on their platforms.
Okay, bad example. AWS is a paid service so obviously they have more information about their users than most. What about Dropbox or OneDrive? Both of those services allow files to be stored and shared with no requirement to identify yourself when you're creating an account, beyond providing an email address.
Both Dropbox and OneDrive scan your content to determine if it matches known illegal content (be it illegal pornography, DRM content, etc) and will remove it and remove you from their service.
If someone pre-encrypts data and then uploads that to Dropbox, it gets more complicated and AFAIK involves more of monitoring the IP addresses that things are coming from and/or being shared to.
Same argument for Bitcoin, cash etc. You have a thousand degrees of slippery slope in what constitutes "enable". Does paying your workers in cash for tips enable tax evasion?
Also, I'm not sure what scenario in which you would be able to find the encryption key, yet not have access to the wallet controlling the contracts? Even if you did, I'm not sure why hosts could not assist in cancelling contracts in a similar fashion to cloud providers taking down content on request.
The US has strong third-party protection laws. If it didn't, HN could be held accountable for illegal files that were converted to base64 and then posted in the comments.
I am not a lawyer and this is not legal advice, but as a third party you are not responsible for illegal data uploaded to your server/machine so long as you are not aware that it is illegal data. Once you are informed that there is illegal data on your machine, you have 24 hours to remove it.
I wonder about whether it's right to be renting your storage space away without knowing what it's being used for. I don't want to directly contribute toward bad things happening.
You don't have any real proof that Company XYZ aren't using your money to do bad things.
You would stop participating if it turned out the Sia network was mostly used for bad things. In the same vein, you may choose not to use Amazon because it is unfairly treating warehouse workers.
All data is fragmented and encrypted, so it's impossible to know whether a particular host is storing a fragment of illegal content or not. I suppose if you somehow _did_ find out you were hosting a fragment of illegal content you could breach the storage contract (and pay the resulting penalty) but that's very unlikely because, again, all content is encrypted.
Encryption on Sia happens on the client-side, so the host has no guarantee that the file is actually encrypted.[1]
For encrypted data, I think it will be similar to the situation with other storage providers. If law enforcement searches your computer and discovers that you've uploaded illegal content in encrypted form to S3/GCS/Mega (especially Mega because they make it so easy to upload client-side encrypted data), then law enforcement will order the provider to destroy all copies of the data.
It will be interesting to see what happens if the provider is a Sia host. Law enforcement entities have standard processes for reporting illegal content to Amazon/Google/Mega/etc, and those companies have teams responsible for handling those requests. Casual Sia hosts currently wouldn't know how to handle such a request. The outcome might be that the compliance costs are too big for casual home users, so hosting on Sia becomes a specialized task that only dedicated companies can provide.
I think someone technically skilled enough to run a Sia host would also be skilled enough to know how to delete a specific file off their hard drive. Sia could even provide a UI to make that easy. Though if law enforcement has access to the client's encryption keys, I think chances are it'd be way easier for them to just issue a delete command to the Sia network directly.
We do in fact already make it trivial to remove content from your host. Law enforcement can provide a list of hashes, which you pass as parameters to a CLI command that deletes them.
I suppose that would only work if whatever keys the pirate needed to publish to allow the data to be downloaded wouldn't _also_ give the keyholder the ability to delete that data. Otherwise the copyright holder could just issue the command to delete the file themselves. I'm not sure if Sia works that way or not; would be interesting to see.
Since they say their intent is to compete on price with S3 / CDNs, it seems possible to be able to download a file without having permissions to delete that file. If that were not the case, then Sia would be limited to personal backup only.
It's confusing, because they refer to themselves as a potential competitor to S3 several times in the linked article, but I thought I read somewhere that conceptually what they're building is actually just the data persistence layer of a service like S3?
A complete S3-like service would require a third-party tool on top of Sia. Goobox[1], for example, uses sia as a storage backend and provides an S3-compatible API[2].
In other words, right now - I think if you are interacting with Sia directly you can do whatever you want with the files you have access to. Not 100% sure about that.
Depends on how it's implemented though, does it not? Eg, I can encrypt a file and give it to 10 people and they'll be clueless of what is in it, yes. But, because of how I encrypted it it is clear that the contents are the same on those 10 people.
So if I am busted and my key/etc compromised, it is possible that those hosts are compromised as well.
This says nothing of Sia's implementation, just that encryption itself does nothing to prevent a party from being prosecuted for illegal content.
Couldn't the same thing happen with AWS? You encrypt an illegal file, store it on AWS, then get busted? Why would those 10 people in your example get in trouble but not AWS?
Well that too depends on implementation, I imagine. AWS has channels for communicating information about their part in illegal content. So does Reddit/Youtube/etc. Even early IPFS plans[1] included channels to take down content you may be hosting.
Though, I feel like you're injecting meaning into my reply. I was replying to your comment about how authorities wouldn't know if content on my computer (a Sia host) was illegal. My point was simply illustrating that encryption by itself does not prevent me from being the host of illegal or illicit content.
I find the concept of "Illegal Content" conceptually questionable. Authorities do not care much, I assume, if you possess illegal content. What they care about is if you share illegal content.
Say you have a pro-democracy pamphlet on your hard-drive in China. If they find it they would imprison you no doubt but they don't care so much to look for it, and it is hard to find if you never share it with anybody.
SIA provides private storage, so if your only crime is possessing illegal content but never sharing it with anybody, it's no big deal (I would assume). But if you share it, then SIA cannot protect you, because the exchange must happen somewhere outside of SIA.
Which is stupid. If I were a member of a foreign government I’d use sophisticated hackers to plant that kind of content on anyone I didn’t like. Automatic jail for life.
It would be an interesting case for the Supreme Court. Did you do something illegal by allowing SIA to store encrypted data on your hard-disk not knowing what it was?
It doesn't. The last real application of Sia that I heard of during the previous crypto boom basically sounded like renting your HD space out to people who would potentially use it to store illegal content.
Child porn is the emotional argument, the only possible case where most people might agree that some content can be illegal. A more representative scenario is illegal copies of music or a political pamphlet in China.
1) If childporn was not banned, how many people would consume that content anyway?
2) What about hyper-photorealistic childporn movies with fictional characters and future rendering technologies? How will you tell the video is true or not and worth to be removed?
Otherwise encrypted illegal content is stored on everyone's computers, and technically makes them potential felons. And thanks to the magic of the blockchain, it's indelibly recorded forever.
Infosec usually, the difference here is you're taking on the risk and data knowingly and willfully. "Officer, it's part of a blockchain thing called Sia" is unlikely to satiate them.
Although the cost of cloud backups is egregious compared to the cost of raw storage, I'll pay a premium for the peace of mind that comes from letting someone else be held liable for storing my data and the credentials to access it. I can easily upload/download files to Google, iCloud, or Dropbox from almost any device knowing only my email and password, which I find preferable to having to remember an arbitrary 29-word seed. With the number of exit scams in cryptocurrency, too, I just don't trust any project to continue to provide the same amount of utility that they do now, if they provide any at all.
I suppose you could use a custodian site to link an email and password to the seed, but then you enter a centralized third party to the mix.
I see value in Sia but it's just not for the average person in its current state.
You could store your Sia seed for free on Google Drive. That way you get the benefit of Google being "liable" for your data and credentials, but with the much lower storage costs of Sia. Yes, it's a centralized third party, but you already implied you have no problem with that.
Sia's seeds are ridiculous-- the 29 words provide 300 bits of entropy. 100 bits would be a sufficient security margin against brute forcing, assuming a modern memory-hard KDF like Argon2.
With a 100 bit password, assuming every flop of the 1.8 exaflops of the Top500 supercomputers tested a new password, it would still take 25,000 years to crack. Key stretching should add at least 30 bits of security by taking a billion operations--
Here's what 100 bits of security margin looks like with a more sophisticated scheme (abbrase): "Hope raised between unpleasant bellows. Devil rode sullenly, refugees waiting." => (first three letters) hopraibetunpbeldevrodsulrefwai.
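The brute-force figure above can be sanity-checked with a few lines of arithmetic. This is a back-of-the-envelope sketch that takes the comment's own assumptions (1.8 exaflops, one password guess per floating-point operation); `years_to_exhaust` is an illustrative name. Exhausting the full 100-bit space at that rate lands in the low tens of thousands of years, the same ballpark as the figure quoted, and the expected time to hit a specific password is half of that.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_exhaust(bits: int, guesses_per_second: float) -> float:
    """Years needed to try every value in a keyspace of the given size."""
    return 2**bits / guesses_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    rate = 1.8e18  # assumption from the comment: one guess per flop at 1.8 exaflops
    print(f"100 bits: {years_to_exhaust(100, rate):,.0f} years")
    # Key stretching that costs about a billion operations per guess is
    # equivalent to roughly 30 extra bits, since 2**30 is about 1.07e9.
    print(f"2**30 = {2**30:,}")
```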
I think they are saying they are removing "trusted" intermediaries. There are zero trusted intermediaries sitting between buyer and storage/bandwidth in Sia's model. Thus, value can't be extracted from providing trust/confidence.
They explain what they mean by that two sentences later:
> Hosts on Sia do not have to worry about building and maintaining enormous datacenters, employing thousands of employees, and marketing their services. They only need to worry about providing reliable storage capacity to renters on Sia.
And yes, it's essentially a decentralized storage market built on smart contracts. You can either pay to store your files (the client software handles redundancy, contracts, etc automatically) or _get_ paid to rent out space on your hard drive.
Data is verified probabilistically on the Sia network. The blockchain has access to the Merkle root of the data that the host is supposed to be storing. The blockchain will request that the host provide a 64 byte segment of the data (chosen randomly) along with a Merkle proof that the data is part of the Merkle root.
If the host can provide the data and the proof, the host is rewarded as though they've demonstrated that they have all of the data. If the host cannot provide those 64 bytes along with a proof, the host is punished as though they are not storing any of the data.
How does punishment work? What stops one bad actor from agreeing to collect infinite data from a variety of sources and tanking both the trust and profitability of data hosts?
Also what about bandwidth constraints on the host end
When a host agrees to accept data, they put up out-of-pocket money. This makes it expensive for a bad actor to accept an infinite amount of data, as each piece of data requires more collateral to be put forward by the host.
Before a renter creates a contract with a host, the renter will perform some measurements on the host and determine if the host is suitable. A renter in China will choose different hosts than a renter in the US, because the latencies and throughputs of each host will be different.
Is there somewhere I can read about the punishments in more detail? E.g. how often the quizzes are, what the penalty is for getting it wrong / not being available for the answer?
If I recall correctly, the client pre-computes hashes of random fragments of each file then "quizzes" the hosts on that information periodically. If they're not actually storing a copy of the data, they won't be able to compute the resulting hash. (I may be misremembering some of the details, but I believe that's how it works in principle.)
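The scheme recalled above (which differs from the Merkle-proof mechanism described by the Sia developer) could be sketched as follows. This illustrates the commenter's recollection only, not Sia's actual protocol; the function names, the per-quiz nonce, and the use of SHA-256 are assumptions. It expects the data to be at least one fragment long.

```python
import hashlib
import secrets
from typing import List, Tuple

Quiz = Tuple[int, bytes, bytes]  # (offset, nonce, expected digest)


def make_quizzes(data: bytes, count: int, fragment_len: int = 64) -> List[Quiz]:
    """Client side: precompute answers before handing the data to the host."""
    quizzes = []
    for _ in range(count):
        offset = secrets.randbelow(len(data) - fragment_len + 1)
        # The nonce stays secret until quiz time, so the host cannot
        # precompute the digest and then discard the data.
        nonce = secrets.token_bytes(16)
        fragment = data[offset:offset + fragment_len]
        quizzes.append((offset, nonce, hashlib.sha256(nonce + fragment).digest()))
    return quizzes


def host_answer(stored: bytes, offset: int, nonce: bytes, fragment_len: int = 64) -> bytes:
    """Host side: hash the revealed nonce together with the requested fragment."""
    return hashlib.sha256(nonce + stored[offset:offset + fragment_len]).digest()
```

Each quiz can only be used once, since revealing the nonce lets the host cache the answer, so the client has to precompute as many quizzes as it expects to need over the contract's lifetime.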
I agree that the use cases don't seem to be lining up. So is this for personal use or corporate data center replacement? Consumer companies will give you 2TB for $10 with no bandwidth costs and with guaranteed replication and sharding built in. What happens when one of these nodes goes down? I didn't see anything in the post about that. I think the thing that always gets lost in these ideas is that there is a reason that people pay these current prices. It is because it solves a real problem, and high reliability requires paying people to make sure that is true. A smart contract and collecting micropayments on the hope a cryptocurrency will rise in price have much different skin in the game.
I've always thought the most important use case around decentralized storage was more around censorship. You can post content that is harder to block in countries that firewall AWS, etc.
This is a flashback for me to 2006. I was working on storage technology and reviewing a few products in p2p storage. There's too much burden to apply p2p to get space. Storage is cheap enough for this idea to work.
Just switched to backing my data on Sia. Was previously using Nextcloud, which is great, but more expensive.
Took a little time to set up, but renting 200 GB for 10 cents/mo, in a decentralized system that encrypts my data, has great uptime, and doesn't mine my personal or "aggregated" data, is unbeatable.
I heard Sia is not great at handling large number of files, so I'm using a backup software to create automated, compressed backups first, and Sia handles those larger files.
This takes me back. I still have a few of their coins sitting around from years ago that have done nothing but tank (this was somewhat expected, but I wanted to diversify in alts and this looked like one of the cooler options).
This reads like a typical blockchain nonsense hype article. Before you even go down the road of comparing Sia to S3, what about some unbiased benchmarks? Show us the 99.999999999% durability, the pretty much unlimited bandwidth, the single-digit millisecond latencies, the practically unlimited capacity.
If you want to compare Sia to anything, compare it to Dropbox or OneDrive. S3 was made for an entirely different purpose and Sia can't even remotely compete with it.
As a personal data store? Maybe. But wait until this thing becomes more popular and hackers start making a mess. It will be interesting to see how stable this "fully decentralized" network is once it gets the same full attention from bad actors that all the cloud providers get.
You are being deceived by Amazon's PR machine at its finest. The 11 9s of durability are a "design", not a guarantee. The official PR line at the time was "S3 is designed to provide 11 9s of durability".
I agree in that they are probably comparing the wrong aspects of S3.
You're not going to be able to compete on the speed anytime soon probably, and maybe even the capacity if it really takes off. But, anyone here that works on an Amazon-based internet company knows that S3 is the most unreliable (in terms of availability, not storage) universally used product they have, next to EC2.
A good chunk of incidents can be attributed to S3 downtime in some AZ. With a decentralized storage network, the probability of actual downtime in terms of availability goes wayyyyy down. If Amazon us-east-1 has a bad deploy or a network event, you lose access to those buckets, full-stop.
With something like this, you'd need a much larger network event, and if the app is fully decentralized itself, blast radius can be contained to individual customer regions instead of where you host your servers.
Probably more useful to compare to GCS Storage honestly, since they have multi-region buckets.
> You're not going to be able to compete on the speed anytime soon probably, and maybe even the capacity if it really takes off
Sia today, running in the default configuration, gets as much as 300 Mbps upload and download, which I believe is comparable to S3. Latencies are several seconds, which is not comparable to S3; however, this is a matter of software optimization, not a fundamental limitation of the network architecture.
coming from someone who is not a fan of Siacoin from when I last used it in 2017, I think you miss the point
what I like and want is encrypted distributed data that I can store and retrieve quickly and cheaply
I also like an economic model that aligns the hosts to maintain the data, and how the proliferation of that economic model lets me invest in its growth from merely being a passive non-productive speculator, and finally I would like a clear easy to understand ROI from being a productive host: can I subsidize the cost of my hardware that I'm already interested in having.
any cryptocurrency can do that, but how the software uses the cryptocurrency is a key factor, and unfortunately we can like an idea and team but get forced to accept an economic model that has flaws
so for me, it is easy to understand why teams want to promise the sky against S3, and it really isn't that relevant to using this securely. like, you might be able to subsidize all of your hardware and provide a service without worrying about if the SIA comparison is accurate because it's really not relevant.
but finally, it turns out that unmonetized and non-cryptocurrency based decentralized cloud storage networks are good enough! so I stick with IPFS.
IPFS isn't decentralized cloud storage though. Unless you host your file yourself on your IPFS node, your file is not guaranteed to be available. Hence why Filecoin is a thing they are working on.
there are also services and nodes which will maintain pins for you, and the capacity for decentralized encrypted shards is fine.
There may be circumstances where I want the peace of mind to pay multiple nodes to store/retrieve encrypted shards of data. Neither SIA nor STORJ seems to be it.
I have a feeling this is an apples to oranges comparison.
Can you serve clients directly over HTTP using Sia in a permissioned manner? If not, then it's not usable like S3. If it's not usable like S3, then S3 pricing comparison is meaningless.
Also, how correlated is Sia pricing to the overall cryptocurrency market? Will my costs go through the roof when there's another crypto bubble?
Would it actually scale? If Sia became popular, would prices go up like Bitcoin transaction prices? Isn't the extremely cheap bandwidth due to spare capacity of average broadband plans? If so, wouldn't providers eventually cut that off? Broadband pricing relies on the fact that most users don't use all of the bandwidth most of the time.
These ideas are all the wrong direction to decentralize. We don't need more layers and more complexity and more blockchains, we need simple, reliable, accessible household appliances. For storage, for sharing, for posting.
Please stop adding layers and layers of calculations, it's a waste of energy.
From what I understand, right now these projects are working on the backbone of the systems. In the future, other developers will be able to create services on top of these platforms to resell the storage space. So maybe someone will have an online cloud hosting service with a simple drag and drop online interface. They will charge a fee to the customer and the website operator will simply buy storage contracts on SIA or similar providers to provide the actual hosting and distribution of the files.
> Please stop adding layers and layers of calculations, it's a waste of energy.
The trend in electronics is that this has become less true over time. As transistors have shrunk, the relative cost of calculation has diminished compared to the cost of transmission over distances. A cryptographic operation can be cheap compared to accessing main memory because the distances signals travel within the crypto logic block are tiny.
Complexity (as in more to understand) does not equate sensibly to higher power consumption.