
Cloud Storage for $2 per TB per month - beedrillzzzzz
https://blog.sia.tech/cloud-storage-for-2-tb-mo-8a34043e93bb
======
kmod
I worked on the design of Dropbox's exabyte-scale storage system, and from
that experience I can say that these numbers are all extremely optimistic,
even with their "you can do it cheaper if you only target 95% uptime" caveat.
Networking is much more expensive, labor is much more expensive, space is much
more expensive, depreciation is faster than they say, etc etc. I don't think
the authors have ever done any actual hardware provisioning before.

I didn't read all their math but I expect their final result to be off by a
factor of 2-5x. Hard drives are a surprisingly low percentage of the cost of a
storage system.

~~~
Taek
Author here. A lot of these numbers are drawn from experience in the mining
world, where people realized that when cost is the ultimate bottom line, a lot
of corners can be cut.

Sia systems don't need a ton of networking. I ran the networking buildout
costs by some networking people, and again it comes down to cutting corners.
If you only need 10 Gbps per rack, and you don't mind a few extra milliseconds
of latency, you can get away with very scrappy setups. The whole point is that
it's not a highly reliable facility.

~~~
arcticbull
Here's the issue. We know that due to economies of scale and domain
experience, AWS will always have the lowest cost (to Amazon) for storage,
whether that's totally-reliable storage or sorta-reliable. If there were a
demand for sorta-reliable, they'd build a sorta-reliable S3 and undercut you.
Then, blockchain adds inefficiency. Therefore, it's basically impossible for
any blockchain solution to have a lower total cost to provide storage.

~~~
throwaway9878
Everybody with more money than you can always undercut you in anything you
ever do so why bother ever trying to do anything

~~~
paxys
The point is not to do it cheaper but better.

~~~
arcticbull
... better than AWS with its SLAs, that every major company relies on to some
extent?

~~~
paxys
Yes. There's a reason companies like Digital Ocean, Heroku, OVH, ZEIT, Joyent,
Rackspace, Linode, Cloudflare and many more have been able to survive and grow
rapidly in an AWS-dominated space. None of them are competing by undercutting
Amazon in price.

~~~
folmar
OVH is certainly undercutting Amazon on price. The free bandwidth included
with each instance is already a dealbreaker for AWS if you actually use the
instance for anything other than very heavy CPU-bound loads with only a small
output to send back.

------
walrus01
I work in telecom/datacenter infrastructure and this is fanciful. The way they
take the wattage load of one machine and then hand-wave away all of the rest
of the costs of either building and running a datacenter, or paying ongoing
monthly colocation costs, is just scary. I truly don't mean to offend anyone,
but this looks like a bunch of enthusiastic dilettantes.

Generators?

UPS?

Cooling costs?

Square footage costs for the real estate itself?

Security and staffing?

At the scale they intend to accomplish they will need at minimum several
hundred kilowatts of datacenter space. Even assuming somewhere with a very low
kWh cost of electricity, that much space for bare metal things isn't cheap. Go
price a lot of square footage and 300kW of equipment load in Quincy, WA or
anywhere else comparable, the monthly recurring dollar figure will be quite
high.

And all of that is before you even start to look into network costs to build a
serious IP network and interconnect with transits and peers.

~~~
Ajedi32
They're not talking about a datacenter. Datacenters need to be reliable. Sia
storage pools don't, because security and reliability are achieved at the
global network level, not at the level of individual systems or storage pools.
95% reliability means you can be down for two whole weeks out of every year
and still be well within acceptable uptime requirements.

Generators? Who needs those? Just wait for the power to come back on. UPS? Why
bother? Square footage? Stick some wooden shelves in the cheapest building
possible. Cooling? Locate in a cold climate and buy some window fans.

This isn't anything like the sort of infrastructure you're used to dealing
with. Think Bitcoin mining farm, not Backblaze datacenter. Any corners that
_can_ be cut _will_ be.

~~~
throwaway3157
Who are the customers for such unreliable systems?

~~~
Ajedi32
Sia is _very_ reliable from the customer's perspective. It's only individual
systems and storage pools that have lax uptime requirements. Thanks to some
clever network-level redundancy mechanisms (10-of-30 redundancy), 95% uptime
at the storage pool level translates to 99.9999% uptime from the customer's
perspective. See the "Uptime Math" section in the OP for details.
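As a rough sanity check of that figure (my own sketch, not the OP's exact math, and it leans entirely on the assumption that hosts fail independently):

```python
from math import comb

def file_unavailability(n=30, k=10, host_uptime=0.95):
    """Probability that fewer than k of the n hosts holding pieces are
    online at a given moment, i.e. the file can't be reconstructed."""
    p = host_uptime
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))

# With 10-of-30 at 95% per-host uptime this lands far below the 1e-6
# implied by "99.9999%"; correlated failures are what would eat the margin.
print(file_unavailability())
```

So the binomial math has enormous headroom; the whole argument rests on the independence assumption, not on the arithmetic.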

~~~
ac29
When you say Sia _is_ reliable, do you mean it could be reliable if
hypothetical X, Y, and Z things happen?

Because according to the homepage (sia.tech) there are only 895 hosts storing
a total of 206TB right now, which is a very, very small amount. Backblaze, a
relatively small player compared to the big cloud providers, has 1.1 million
TB of raw capacity as of last year (redundancy reduces the available capacity,
but still) [0].

[0] [https://www.backblaze.com/blog/hard-drive-stats-for-2019/](https://www.backblaze.com/blog/hard-drive-stats-for-2019/)

~~~
Ajedi32
By reliable in this context I mean robust against hardware failure. (Including
the failure of entire storage sites.) The OP explains the math and associated
assumptions for how they derived that "99.9999%" figure, and acknowledges that
since the calculated chance of data loss due to hardware failure is so
infinitesimally small, other failure modes outside of what they modeled are
likely to dominate.

As for the relatively small number of hosts at present, 895 is more than
enough for 10-of-30 redundancy to work "as advertised". You really only need
30 hosts technically. The bigger issue I think is the relative immaturity of
the software. Sia is still pretty new compared to most other data storage
systems; and although I've never heard of any software bugs in Sia resulting
in data loss, that doesn't mean such a bug will never be discovered. Be
cautious, keep backups, and never rely on any single storage medium to store
your data.

------
mtlynch
In 2018, I spent about six weeks running a series of tests to measure Sia's
real world costs. At that time, storage cost ~$4.50/TB on Sia to back up large
real world files (backups of DVDs and Blu-Rays).[0] Community members have re-
run my tests every few months, most recently in October 2019, when the cost
was measured at $1.31/TB, though it's worth noting that recent tests use
synthetic data optimized to minimize Sia's cost.[1] It's also unclear how much
the market value of Sia's utility token affects these costs, as the price of
Siacoin has fallen by ~80% since I conducted my original set of tests.

The calculations in today's blog post account for the labor cost of assembling
hardware, but leave out other major labor costs:

1. You need an SRE to keep the servers online. Sia pushes out updates every
few months, and the network penalizes you if you don't upgrade to the latest
version. In addition, to optimize costs, you need to adjust your node's
pricing in response to changes in the market.

2. You need a compliance officer to handle takedown requests. Since Sia
allows anyone to upload data to your server without proving their identity,
there's nothing stopping anyone from uploading illegal data to the network. If
Sia reached the point where people are building $4k hosting rigs, then it's
safe to assume clients would also be using Sia to store illegal data. When law
enforcement identifies illegal data, they would send takedown notices to all
hosts who are storing copies of it, and those hosts would need someone
available to process those takedowns quickly.

[0] [https://blog.spaceduck.io/load-test-wrapup/](https://blog.spaceduck.io/load-test-wrapup/)

[1] [https://siastats.info/benchmarking](https://siastats.info/benchmarking)

~~~
marcinjachymiak
Another cost associated with your 2nd point is the collateral a host would
have to burn to comply with take-down notices.

------
reggieband
I'm going through Sia's website now. It seems this article is meant to bolster
the claim on their website which states "When the Sia network is fully
optimized, pricing will fall somewhere around $2/TB/month." [1]

Call me skeptical but it seems that they aren't committing to building out
this infrastructure themselves or providing a specific amount of storage at
this pricing. They seem to be outlining a potential infrastructure that some
enterprising individual (or corporation) could use to provide storage at that
price to "renters" within their marketplace.

I guess I'll just wait until someone puts their money where their mouth is.
Given that this is a marketplace, the fact that a theoretical setup could be
built to provide some service doesn't necessarily guarantee it will be built.

1. [https://support.sia.tech/article/thvymhf1ff-about-renting](https://support.sia.tech/article/thvymhf1ff-about-renting)

~~~
ericflo
It's live right now, though: the community has deployed the infrastructure [1]
and the pricing is approximately what they claim [2].

1. [https://siastats.info/hosts_network](https://siastats.info/hosts_network)

2. [https://siastats.info/storage_pricing](https://siastats.info/storage_pricing)

~~~
ADefenestrator
That's the price being paid for the storage, but is it actually covering the
cost of providing that storage? Or is it just 100-300 people who thought "Huh,
neat, I'll toss a host online and see how it goes"? I'd lean towards the
latter and assume those storage costs are heavily subsidized by a few people
satisfying curiosity.

------
jakear
> That means about 2 hours of labor per rig. We’ll call that $50

Does that seem low to anyone else? I don't really have any background in the
area, but a $25/hr cost _to the company_ would be less than $20/hr pay for the
skilled labor. Other countries are different of course, but in the US I could
make that much flipping burgers in the right area.

~~~
eyegor
It's outrageously low. They're also fancifully assuming CPU TDP = electrical
power, cooling = 0W, and another 0W for the motherboard/network cards. And
each box has 1 non-redundant, schmuck-grade $80 PSU, as well as a consumer-
grade mobo. These boxes would never get anywhere near their claimed uptime.

~~~
Taek
CPU TDP is accounted for; the CPU linked in the blog post draws 65W, and that
is used in the electricity calculations.

I did realize that I completely forgot about RAM; when I get back to a
computer I'll have to make some updates, but it won't materially move the
numbers. There's a 33% margin of error between the number in the spreadsheet
and $2 / TB / Mo.

The $80 PSU is what I could link to from Newegg. I do have experience in
industrial electronics, and I know from firsthand experience that you can buy
a 10+ year PSU at 93% efficiency for well under $80 at 300 watts. At that
level, you're also going to be able to request all the required cabling, which
means you're getting a much better price than the $7 per cable linked in the
post.

95% uptime means 18 days of downtime per year. Consumer grade PSUs and mobos
do much better than that.
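For reference, the arithmetic behind that 18-day figure:

```python
uptime_target = 0.95
hours_per_year = 365.25 * 24

# A 95% uptime target leaves a 5% downtime budget.
downtime_hours = (1 - uptime_target) * hours_per_year
print(f"{downtime_hours:.0f} hours = {downtime_hours / 24:.1f} days of allowed downtime per year")
# about 438 hours, i.e. ~18 days
```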

~~~
eyegor
You included tdp but tdp /= actual electrical power. Intel approximates it
from base clock (practically all consumer motherboards ignore the tdp/boost
spec). AMD uses voodoo to calculate tdp, it's a pure marketing number with no
basis in power usage. To make matters worse, motherboards typically go nuts
with voltage on consumer boards, and power draw can vary wildly depending on
what instructions you're using. There are a crap load of x86-64 extensions.

Yes, you forgot RAM. And network chips, motherboard power delivery losses,
motherboard power usage, cabling losses, etc. I'd guess it would total
50-100W, but feel free to current-clamp the PSU rails to get a realistic
number.

As for the 95% uptime, I agree with you. I wasn't considering how much
breathing room that actually provides, I was just going with my instincts.

------
ComputerGuru
There is way too much hand-waving and assuming going on in this article. It is
a load of BS that does not take into account real-world inefficiencies. e.g.
sometimes buying in bulk is more expensive than buying at retail, especially
when you need a consistent supply. Sure, you may need only an hour of sysadmin
time a day, but what sysadmin will let you employ them an hour a day? The
buildout did not list a CPU.

The assumptions about uptime are over-amortized: an outage given the resources
they quote may average out to 95% uptime, but their latency for getting
systems back up is going to be absolutely terrible, and I'd be surprised if
outages were shorter than a day or two on average. They aren't factoring in
cooling. They aren't factoring in the drastically reduced lifetime of drives
in their ridiculously cramped and under-ventilated cubbies. They are
completely ignoring diagnostic time, presuming they can quote only actual
repair times, which is an absolute joke given the lack of smart hardware and
enterprise DC management.

They think they can average out throughput over the number of drives without
taking into account per-channel limitations. They are not taking into account
the extra time to build and dismantle systems in their hacked-together IKEA
shelves. They are underestimating the cost of electricity at commercial rates.
I could go on and on, but suffice it to say that I would never, ever use their
network for any purpose without another backup (which they don't figure into
their costs, of course ;). I thought B2 was risky; this takes it to an
entirely different level.

~~~
kllrnohj
> The buildout did not list a CPU.

It did, or does now anyway: the Ryzen 3 1200 for $95.

EDIT: Although the better option is the 3200G so that you can actually get a
display output from the thing. Same price, so it doesn't really change
anything, but it does cut the CPU core count down a bit if that matters at
all.

That said the buildout still doesn't work because you can't actually plug the
"sata splitter" cable they linked into the motherboard. Because the splitter
was actually a 4-lane SAS SFF-8087 breakout cable, and there's no consumer
motherboard with 8x of those connectors on it. Good luck finding even 1 or 2
of those connectors on a consumer board, and it sure as hell won't be at dirt
cheap prices.

So you either need 4x the computers they calculated, or you need to budget for
add-in SATA/SAS controller cards. Which, because they aren't used in consumer
land, are not cheap. You could go used, but that's still going to increase the
bottom line (and won't be a reliable source of parts)

They also aren't factoring in assembly time nor budgeting for that. Building
these isn't going to go very quickly.

~~~
hddherman
I have gotten away with cheap PCIex1 2xSATA2 adapter cards, for roughly
10-15USD at the time of purchase. They did work, but this assumes a
motherboard with room for lots of PCIe cards.

Edit: to clarify on the CPU usage, could a potential build also get away with
a cheap AMD Athlon 3000G?

------
growt
I feel like Backblaze has already done most of this and has it in production
[1], whereas this is just a back-of-the-napkin calculation.

[1] [https://www.backblaze.com/b2/storage-pod.html](https://www.backblaze.com/b2/storage-pod.html)

~~~
TheDong
Backblaze has tried to make their datacenter as efficient as possible, and
still only ends up hitting $5/TB/mo for their B2 service, as a point of
reference.

~~~
notyourday
Backblaze is _very good_, but they are definitely not efficient in $$
utilization.

Efficient $$ utilization is bread racks, built-out data centers abandoned by
the likes of Pep Boys that landlords will part with for $3/sq ft per year, and
Google-style servers without cases, with velcro keeping the hard drives
attached.

~~~
walrus01
Abandoned Pep Boys stores don't usually have very good fiber connectivity.
Backblaze and similar hosting/storage companies move enough traffic that they
need to be topologically close to major IX points.

If all you want is cheap commercial real estate with low dollars-per-square-
foot figures, there are plenty of economically depressed areas within the
United States where you could put things. Those areas usually have very poor
fiber connectivity, fiber diversity, and choice of carriers.

I have previously explained this to a number of people who asked me,
basically, why don't all of these gigantic abandoned shopping malls get
converted into datacenter space? Two reasons: poor connectivity, and nowhere
near enough electrical grid feed capacity (as proper three-phase service) in
terms of watts per square foot. Bulldozing empty land in Quincy and putting up
a tilt-up concrete-on-slab dedicated-purpose datacenter structure is much less
costly than extensively retrofitting abandoned, 30, 40, 50 year old commercial
real estate.

~~~
notyourday
> Abandoned pepboys stores don't usually have very good fiber connectivity.
> Backblaze and similar hosting/storage companies move enough traffic that
> they need to be topologically close to major IX points.

Not stores. Data centers.

It is typically cheaper to get long-distance fiber links than metro fiber, and
midsize data centers do not consume that much power; power is plentiful
outside the major metros, especially in the old manufacturing areas.

The real reason why companies do not go there is that it is not sexy, and
non-sexy places do not get "ninja" employees who would be passing brain
teasers on a whiteboard.

> Bulldozing empty land in Quincy and putting up a tilt-up concrete on slab
> dedicated purpose datacenter structure is much less costly than extensively
> retrofitting abandoned, 30, 40, 50 year old commercial real estate.

If you are doing it in a major metro then unless you get a big fat tax break
your real estate taxes are going to kill you.

~~~
jauer
> Not stores. Data centers.

Honestly, it doesn't make much difference. Datacenters built for random
corporations often only have connectivity to the local ILEC or MSO which is
going to get you pretty poor pricing.

~~~
notyourday
All of them are on-net and all of them already have fiber to the premises.
Most of them have not just local loops but termination points for long-
distance carriers.

------
lalaland1125
One interesting point of reference is that Backblaze currently charges $5 / TB
/ Mo. Assuming they haven't changed their profit margin of 50% from 2017
([https://www.backblaze.com/blog/cost-of-cloud-storage/](https://www.backblaze.com/blog/cost-of-cloud-storage/)),
this would imply that they have a direct cost of roughly $2.50 / TB / Mo.

------
aresant
Top of Hacker News and there's nothing clickable above the fold that takes me
to the Sia website.

Content marketers and technical marketers: don't miss the opportunity on
Medium and other platforms to at the VERY LEAST link to your homepage in the
first section.

In fact, all that is at the top of this awesome piece of content marketing is
a "Sign Up" button for Medium . . .

~~~
pbhjpbhj
I ended up at
[https://github.com/NebulousLabs/Sia](https://github.com/NebulousLabs/Sia) and
there's no activity in the last two years; the latest issues are a few "you
broke my wallet with the update and my password doesn't work" reports from
2018.

~~~
really3452
Yeah, they moved to gitlab:
[https://gitlab.com/NebulousLabs/Sia](https://gitlab.com/NebulousLabs/Sia)

~~~
pbhjpbhj
Thanks

------
border43
I've been using Sia for about three months to back up some personal files.
Nothing crazy, but it seems to work well.

I'm looking forward to seeing this project mature and to seeing more layers
built on top of it going forward. I really wish the client offered
synchronization or access across multiple devices. For now you have to try
third-party layers on top of Sia to accomplish this.

~~~
asdkhadsj
> I'm looking forward to seeing this project mature and to seeing more layers
> built on top of it going forward. I really wish the client offered
> synchronization or access across multiple devices. For now you have to try
> third-party layers on top of Sia to accomplish this.

Yea I'd actually pick it up now and give it a try if it had this feature.

------
AaronFriel
Really smart people make this mistake a lot, so I'm wondering what Sia is
doing to decorrelate failure rates. If hedge fund quants can turn mortgage
tranches into a machine for massive correlated economic losses, can blockchain
quants turn storage tranches into a machine for massive correlated storage
losses?

Or if one of the major hyperscalers or datacenter operators decides to start
selling storage to Sia, it seems likely that their control plane across
datacenters could result in correlated failures. A networking outage for their
AS could result in multiple datacenters appearing offline concurrently, for
example.

------
TheDong
This analysis entirely omits the cost of a sysadmin to manage the storage
servers. Even if Sia is assumed to do almost everything, and even if we only
want 95% uptime, you still need someone to deal with software updates, hard
drive monitoring, etc etc.

The profit of $570/year/box is not enough to pay a part-time sysadmin and
still have any useful profit.

------
simias
>If we assume that the 30 hosts go offline independently

I wonder how reasonable this assumption really is. For regular CPU-bound
crypto-mining we see that it tends to centralize geographically in zones where
electricity, workforce and real-estate space to build a datacenter are cheap.

Assuming that Sia ends up following a similar distribution, it wouldn't be
surprising if several of these hosts ended up sharing a single point of
failure.

Beyond that, if only copying stuff around three times to provide tolerance is
enough to lower the costs to $2/TB/Mo, why aren't centralized commercial
offerings already offering something like that? Just pool three datacenters
with 95+% uptime around the world and you should get the same numbers without
the overhead of the decentralized solution, no? Surely the overhead of
accounting for hosts going offline and redistributing the chunks alone must be
very non-trivial. With a centralized, trusted solution it would be much
simpler to deal with.

Or is the real catch that Sia has very high latency?
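To make the comparison concrete (my own back-of-the-envelope, assuming independent sites with 95% uptime): simply pooling three full copies does not actually match the erasure-coded numbers.

```python
from math import comb

def replication_availability(copies=3, site_uptime=0.95):
    # Available if at least one complete copy is online.
    return 1 - (1 - site_uptime) ** copies

def erasure_availability(n=30, k=10, host_uptime=0.95):
    # Available if at least k of the n erasure-coded pieces are online.
    p = host_uptime
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

print(replication_availability())  # ~0.999875: three 95% sites give "four nines", not six
print(erasure_availability())      # far closer to 1, at the same 3x storage overhead
```

So spreading the 3x overhead across 30 pieces instead of 3 whole copies is where the extra nines come from, again assuming the hosts fail independently.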

~~~
WrtCdEvrydy
I'm guessing there aren't a lot of 95% datacenters without heavy generators or
UPS on site. You'd basically have to build a datacenter with lower guarantees.

------
scrooched_moose
Wait, how are they connecting 32 drives to that motherboard? They seem to be
implying they are splitting each SATA plug 4 ways, which as far as I know is
impossible.

The adapter they're linking to is SFF-8087 to 4x SATA, not SATA to 4x SATA
(which shouldn't exist). That motherboard doesn't have SFF-8087; it has 8
SATA3 connections.

Unless I've missed something big, SFF-8087 cannot be plugged into SATA3.

~~~
britmob
It cannot. I assume there’s an HBA thrown in the mix somewhere they did not
mention.

------
theamk
I don't think it is correct to say that the only options are "host failures
are truly independent" or "world war three".

The hosts are never going to be fully independent. There will be hundreds, if
not thousands, of hosts co-located in the same location, likely of the
cheapest grade, without any extras like fire alarms, halon extinguishers, or
redundant power feeds. A single fire (or flood, or broken power station) has a
chance of taking out thousands of hosts simultaneously.

And there is the management system as well. AWS has thousands of engineers
working on security. Will there be one at this super-cheap farm? What are the
chances there will be farms with default passwords and password-less VNC
connections? And since machines are likely to be cloned, any compromise
affects thousands of hosts.

... and all of those things are made worse by the fact that if you store
hundreds of thousands of files, your failure probability rises significantly.
If a data center burns down, at least a few of your files may be unlucky
enough to be lost.

------
sigstoat
at a minimum the facility will need some power conditioning and/or insurance.
you don't want a brief power surge to eat all of your capital, and lockup
fees, in one go.

> For a 32 HDD system, you expect about 5 drives to fail per year. This takes
> time to repair and you will need on-site staff (just not 24/7). To account
> for these costs, we will budget $50 per year per rig.

will you not also lose 6TB (times utilization) of your lockup every time a
drive dies?
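for scale, the implied failure rate and the annual refill traffic from the post's own numbers (my arithmetic, not theirs):

```python
drives_per_rig = 32
failures_per_year = 5  # the post's own estimate, quoted above
drive_size_tb = 6

implied_afr = failures_per_year / drives_per_rig
# capacity that has to be re-uploaded from the network each year, times utilization
refill_tb = failures_per_year * drive_size_tb

print(f"implied annualized failure rate: {implied_afr:.1%}")  # 15.6%, well above typical reported HDD rates
print(f"capacity lost per rig per year: {refill_tb} TB")
```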

> 8x 4 way SATA data splitters

you've linked to SAS breakout cables. they don't plug into SATA ports, they
plug into SFF-8087 SAS ports.

they cannot plug into the motherboard you've listed. nor have I ever seen a
motherboard listed for retail sale that has 8 SFF-8087 ports.

the cheapest way to get 8 SFF-8087 ports is with some SAS expander card, and a
SAS HBA. even scraping off eBay that's another $50 per host, and two more
components to fail.

there are also actual SATA expanders out there, but they last about 3 months
before catastrophic failure in my experience.

~~~
Legogris
Isn't a potential problem with "SATA splitters" also that all disks will share
the same channel and therefore end up with worse performance? (Though I guess
it won't make a difference for mechanical drives.)

~~~
sigstoat
any of the expander (SATA or SAS) things, yes, will be sharing bandwidth. but
as you mention, it won't be a limiting factor for mechanical drives. and
considering the latency involved in this sort of retrieval, it probably isn't
a problem regardless.

FWIW the breakout cable they've listed is splitting a connector that has 4
electrical channels onto 4 physically separate cables, so there's no problem
with it. they just don't have anywhere to plug it in.

------
johnklos
Big deal. I charge $5 per TB per month and I'm not even trying to be cheap.

The economies of scale should make this much less expensive. Colocating your
own machine in a real datacenter and hosting your own data shouldn't still be
cheaper than practically all of "the cloud" offerings, but it is. What does
that tell you about "the cloud"? It's marketing bullshit.

Sure, it's fine for occasional use, but anyone using the hell out of "the
cloud" can easily save money by using anything else.

~~~
chiefsucker
That’s not really surprising, but people tend to forget about it. In the end
somebody has to pay for ops. It’s business as usual, like it was a century
ago.

There are cases where you can indeed save money by doing more by yourself. But
how much time does it cost you and how much is your time worth?

How much time do you need to research, purchase, and eventually build your
hardware? How much time do you need to get a decent data center deal? How much
time do you need to bootstrap your setup? How much time do you need to
regularly maintain your infrastructure?

~~~
johnklos
Those are all just "cloud" talking points.

My time is worth a tremendous amount to me, which means I want to use my own
hardware. "The cloud" does not guarantee reliability.

Any company that does any project that even slightly regularly requires
compute / storage can easily justify the time to do all the things you
mentioned.

The fact that many companies have gone towards "the cloud" goes hand-in-hand
with the fact that many companies use Windows. It's clearly not the best thing
to use to get things done, but the IT people don't want to reduce their
importance, and the management people like the kickbacks and perks they get
from buying certain things from certain companies.

The savings look good on paper, but the reality is that they're based on
leaving out lots of information. I've helped several companies move from "the
cloud" back to good, local compute resources because of the amount of money
they were hemorrhaging to "cloud" providers.

For the most part, it's all marketing bullshit.

------
UI_at_80x24
From their site [1]: "Sia is a variant on the Bitcoin protocol that enables
decentralized file storage via cryptographic contracts"

[1][https://sia.tech/sia.pdf](https://sia.tech/sia.pdf)

------
krick
I don't know anything about the subject, so I have no idea if these claims are
realistic. But whatever; either they deliver or they don't.

My (or their, actually) problem is that I don't really get what they are
offering right now. There is an impressive landing page with big numbers and
pretty pictures which explains pretty much nothing. The project seems to have
been in production for at least 3 years, and there are some apps, but I don't
actually see whether I can use it to back up/store some data and how much it
costs right now. I mean, they say "1TB of files on Sia costs about $1-2 per
month" right there on the main page, but it cannot be true, right? It's just
what they promise in a hypothetical future, not the current price tag?

The only technical question I'm interested here is why they actually need
blockchain? This is always suspicious and I don't remember if I saw _any_
startup at all that actually needs it for things other than hype. It is
basically their internal money system to enable actual money exchange between
storage providers and their customers, right? So, just a billing system akin
to what telecom and ISP companies have? Is it cheaper to implement it on
blockchain than by conventional means? How so?

~~~
ericflo
> but it cannot be true, right? It's just what they promise in the
> hypothetical future, not current price-tag?

Here's the live pricing, right now:
[https://siastats.info/storage_pricing](https://siastats.info/storage_pricing)

> Is it cheaper to implement it on blockchain than by conventional means?

It's more so that _anyone_ can join the network as a host. They don't have to
have a financial or business relationship with anyone, they can just provide
their storage service and charge for it. No way to do that currently in the
world without a blockchain.

~~~
ac29
> It's more so that anyone can join the network as a host. They don't have to
> have a financial or business relationship with anyone, they can just provide
> their storage service and charge for it. No way to do that currently in the
> world without a blockchain.

Maybe I misunderstand your point, but I could certainly install MinIO (a S3
compatible object store) on a home NAS and charge people for it without using
a blockchain. I see your point about not having a financial or business
relationship with a blockchain network acting as an intermediary, but I can
assure you that the IRS and various law enforcement and regulatory agencies
would tell you that you absolutely do have a financial and business
relationship with whoever is paying you via the crypto-network whether you'd
like to or not.

~~~
ericflo
> I could certainly install MinIO (a S3 compatible object store) on a home NAS
> and charge people for it

But how would that work? You'd probably make a website or app that had users
sign up for an account, and then with that account they could associate
payment information from a payment processing company, and then you'd provide
them with credentials where they could log in to their Minio instance. Right?

Then, you have to go out and market your service, explain to people why they
should use it instead of existing alternatives, convince people that you're
trustworthy, build a reputation, and generally do sales.

In the case of Sia, you build your host, plug it in, announce it to the Sia
blockchain, and then clients from all around the world start paying to use
your storage.

Clients don't have to register for an account first, don't have to involve a
third-party payment processing company, and don't need a sales pitch because
they algorithmically test, measure, and rank hosts.

I remember at the outset of the web, a new thing was this user demand for
services to become "self-serve", as in, you would no longer need to talk to a
salesperson and establish a relationship in order to buy something — even
something custom. I see this as the next step of that, where you want to be
able to programmatically and algorithmically establish and dissolve those
kinds of service agreements.

------
standardUser
On a related topic, I've had a ton of problems finding a cloud storage system
that will reliably handle files around 100-200gb. Does anyone have a
recommendation for a service that can handle that file size with ease?

~~~
toomuchtodo
Any object storage system (S3, Backblaze B2, Azure Blob, GCP) should be able
to handle those file sizes with proper chunking into smaller parts (per-store
limits below).

S3:
[https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview....](https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html)
(Max object size: 5TB, Max single multipart size: 5GB)

Backblaze B2:
[https://www.backblaze.com/b2/docs/large_files.html](https://www.backblaze.com/b2/docs/large_files.html)
(Max file size: 10TB, Max single object size: 5GB)

Azure: [https://docs.microsoft.com/en-us/rest/api/storageservices/un...](https://docs.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs) (Max file size: 4.75 TB, Max single block size: 100MB)

Google:
[https://cloud.google.com/storage/quotas](https://cloud.google.com/storage/quotas)
(Max file size: 5TB, doesn't appear there is a lower limit for objects to be
composed into a single object, docs could be better in this regard)

@khc: Terminology updated to be more clear for S3
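
For anyone sizing uploads against those limits, the part arithmetic is simple.
A rough sketch (the 100MB part size is an arbitrary choice for illustration,
not a recommendation):

```python
import math

def plan_multipart_upload(file_size, part_size, max_part_size, max_object_size):
    """Check a file against an object store's limits and count the parts needed."""
    if file_size > max_object_size:
        raise ValueError("file exceeds the store's maximum object size")
    if part_size > max_part_size:
        raise ValueError("part size exceeds the store's maximum part size")
    return math.ceil(file_size / part_size)

GB = 10**9
# A 150GB file uploaded to S3 (5TB object cap, 5GB part cap) in 100MB parts:
print(plan_multipart_upload(150 * GB, 100 * 10**6, 5 * GB, 5000 * GB))  # 1500
```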

~~~
reaperducer
I'd drop Microsoft from that list because the request was for something
"reliable."

It's reliable enough, if you can get it to Microsoft's cloud. But for the last
six months I've struggled putting very large files into Azure, using five
different connections from five different providers in three locations. Small
files are no problem. But large ones take two, three, or four tries.

~~~
toomuchtodo
I don't disagree Azure is hot garbage, just listing for completeness. S3 and
Backblaze B2 are my go-to object stores.

------
servercobra
So: no CPU (or APU, which would avoid needing a GPU), no RAM, and those
breakout cables are actually for SAS, but no SAS card in the parts list. This
does not inspire confidence in the project at all.

------
imhoguy
Interesting article, but "black swan situations like world war three" may be
an underestimation. Software bugs are more likely, and sometimes fatal.

I wonder why transfer prices are not included. As you explain, every transfer
is paid for, so does that mean one has to pay for 10 uploads of every single
object? And as equipment ages and peers go out of business, who pays for the
data-rebalancing transfers?

------
pedrocr
It's probably feasible to reach these levels of cost. I certainly still keep
NAS boxes in two locations because even places like Hetzner don't sell
lower-powered machines with lots of disk space. But the build they specify
doesn't have a CPU or RAM, and it's using a SAS cable to connect to a SATA
motherboard. Depending on the requirements of the platform they may be able to
get away with non-ECC RAM, a simple APU to avoid needing a graphics card, and
a few cheap SATA PCIe cards to get enough connections. It will probably add
~$500, or ~10%, to the build though. I don't know if the other costs have
similar issues.

~~~
Kliment
Hetzner's super cheap "cloud" machines allow you to attach "block storage"
volumes of whatever size you like. It's not particularly well advertised
(searching for this sort of thing takes you to their dedicated storage
offerings) but it's there and it's great.

~~~
pedrocr
From what I can see on the website that block storage consists of SSDs priced
at ~50€/month/TB which is a non-starter. The cloud instances are incredibly
cheap though, I may have a look at those.

~~~
mallets
Check out their storage box if using their cloud instances.
[https://www.hetzner.com/storage/storage-box?country=ot](https://www.hetzner.com/storage/storage-box?country=ot)

€10 for 2TB and unlimited transfer within the same datacenter (or 10TB if
accessing from outside). You will get 1 Gbps or more transfer speed and many
options for mounting the storage (SCP, WebDAV, etc). I used to use them as an
intermediate backup location when I used their cloud instances and dedicated
servers.

~~~
pedrocr
That's starting to look quite interesting, thanks! 5TB of rsyncable storage
for 22€/month could definitely replace one of my NAS boxes. The limit of 10
concurrent connections is the only big one, really. Not being able to run my
own stuff is also not ideal. But it's the kind of thing I've envisioned to
stop using a NAS and just point syncer at:

[https://github.com/pedrocr/syncer/](https://github.com/pedrocr/syncer/)

------
rcar
I was expecting an ad based on the title, but it ended up being an interesting
analysis of just how much storage ends up costing them with their focused
hardware setup.

------
prophesi
Honestly, I'm a bit confused about who the target audience is for this article.
I've been running a Sia host for months now. My rig is a Raspberry Pi and a
10TB external HDD I had lying around.

~~~
Hamuko
Do you make anything from it?

~~~
prophesi
For the first few months, you'll basically have all of your Sia drained as
your contracts start coming in and your host wallet locks up collateral for
them. But yeah, my contracts only recently started to complete, and I've got
at least the amount of Sia I started with.

So... yes, but probably not very much? Particularly when accounting for the
drop in Sia's price.

Edit: If you do decide to host, it'll likely be about 5-6 months before your
contracts start completing if your host settings go by the recommended 26 week
max duration. And don't go out buying hardware unless you're in it for the
long run, which makes me now realize the point of this article.

~~~
aschatten
Can you share more info on cost breakdown and earnings? Also, how expensive is
electricity in your area?

~~~
prophesi
I wouldn't try hosting on Sia if you're trying to make a profit. I spent 20
USD on Sia currency, and I didn't go out and purchase an RPi + hard drive for
it; you likely wouldn't make a profit if you did. RPis are great for costing
next to nothing in electricity, though.

------
dnprock
I worked for a p2p startup 15 years ago. We were exploring ideas and products
in this space. We came close to partnering with a company doing distributed
cloud storage. Their idea was to allow people to rent storage space in
personal computers.

We decided to scrap the plan to do p2p storage and ended up using cloud
storage. This p2p storage idea is a tough one: people are not willing to rent
out their hard drive or CPU for a few dimes. The economic unit is too small to
work. But good luck trying this idea. I wouldn't be surprised if someone tries
again in 20 years. :)

------
mjb
The IO-operations amplification of 64-of-96 is pretty brutal, and
particularly unfavorable in a world where capacity-per-IOPS keeps trending up.
I wonder how they'll deal with that.

~~~
Taek
Each of the 64 pieces is fetched from a different host on the network, meaning
all of the IOPS are happening in parallel at network speeds. You aren't going
to be doing sub-millisecond updates for sure, but you can easily get under
100ms.

The Reed-Solomon coding also isn't the most computationally expensive part;
the expensive part is computing and verifying the Merkle roots. All parts of
the system, though, can run at >1 Gbps on standard CPUs.
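
A toy sketch of the kind of Merkle-root computation being described; this is
not Sia's actual construction (SHA-256 here is just a stand-in, and the leaf
framing is invented for illustration):

```python
import hashlib

def merkle_root(leaves):
    """Toy Merkle root: hash each leaf, then hash pairs up the tree.
    Odd levels duplicate the last node so every node has a sibling."""
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()

# Changing any one segment changes the root, which is what lets a renter
# verify a host's storage proof without downloading the whole file.
print(merkle_root([b"segment-%d" % i for i in range(8)]))
```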

~~~
wmf
It's fine from the perspective of a single request but it seems like it
reduces the overall throughput of the network. If Sia has, say, 1M hard disks
and can do 100M raw IOPS then 10-of-30 gives 10M net IOPS but 64-of-96 gives
only 1.5M net IOPS.
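
A back-of-envelope sketch of that throughput argument (the 100 IOPS/disk
figure is an assumption for illustration, not a measurement):

```python
def net_iops(raw_iops, k):
    """With k-of-n erasure coding, each object read fans out to k hosts,
    so aggregate request throughput is roughly raw IOPS divided by k."""
    return raw_iops / k

raw = 1_000_000 * 100  # 1M disks at ~100 IOPS each = 100M raw IOPS
print(net_iops(raw, 10))  # 10-of-30: 10M net IOPS
print(net_iops(raw, 64))  # 64-of-96: ~1.56M net IOPS
```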

------
ds
The most important thing not addressed here is demand. Last I checked
(granted, this was a while ago) it simply wasn't there, meaning that if you
built this rig you might only be able to rent out a small part of it.

If this has changed I would be interested in hearing about it.

One other thing I am not understanding is how this makes financial sense, even
if the demand is there. If I am buying a rig for $4,500 to get 200TB, making
"$570 a year in profit" is nowhere near exciting enough. Practically any other
use pays more: renting a dedicated server for a game, web hosting, hell, even
GPU mining makes more.

(A single 1080 Ti can do about $1 a day in gross revenue on grin/eth/etc, and
can be had used for ~$400. Or you can get a P102, the mining-card version with
no display output, for $250. Payback including power costs etc. is well below
the 10-year threshold of Siacoin.)

Now, where it might be interesting (IF there is demand) is just adding hard
drives to infrastructure already in place. So if you are a GPU miner and have
1000 rigs deployed, adding a single 4TB hard drive to each machine might not
be too bad: they go for about $50 each used and, according to this, will pay
back $8 a month with minimal extra costs.

~~~
ds
So I did a quick look, and it seems like the total usage of Siacoin is not
that large.

[https://siastats.info/hosts_network](https://siastats.info/hosts_network)
Only 710 TB is in use, or about $20k worth of hardware TOTAL for the entire
network, according to the above URL.

Also, why is this a cryptocurrency at all? Wouldn't this business be
drastically simplified by simply paying people out / letting people rent space
with either USD or bitcoin?

~~~
sho
> Also, why is this a cryptocurrency at all?

Do you even need to ask? Because they have minted a bunch of SIA coin and this
is their effort to give it value out of thin air, making them all
millionaires. Using someone else's currency, despite its obvious benefits,
means no huge pre-minted pool under their control = no lambos.

Every single ICO project is like this.

------
franciscop
The website seemed okay and even useful until:

"Both renters and hosts use Siacoin, a unique cryptocurrency built on the Sia
blockchain. Renters use Siacoin to buy storage capacity from hosts, while
hosts deposit Siacoin into each file contract as collateral."

~~~
drak0n1c
Until payment processing for regular money becomes free, peer-to-peer
micropayments can only be done economically with a cryptocurrency. The need
for a new, unique cryptocurrency for this project is debatable, but I can
understand that the people who work on this project full-time need to earn a
paycheck.

------
miohtama
See also Tardigrade

[https://tardigrade.io/](https://tardigrade.io/)

It is $10/mo/TB, but has different uptime, speed and security characteristics.

------
rob-olmos
How did they go from 8 SATA ports to 32 SATA devices? The linked cable uses a
SAS host/connector. Was a SAS card left out of the parts list?

~~~
Tepix
Using "8x 4 way SATA data splitters".

What I don't get is why they don't use 14TB HDDs: they are only 15% more
expensive per TB, and on the other hand they'd need 2.33x fewer PCs at $550
each, plus their power use.

So instead of every 7 PCs with 6TB HDDs they'd need 3 with 14TB HDDs.

PS: They could also use a mainboard with 10 SATA ports instead of 8. They are
only $15 more than the chosen board. Adding one or more PCIe 8x SATA
controller cards might also make sense, depending on the average load of a
system.
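
A quick sanity check of that arithmetic, assuming 32 drives per PC as in the
article's build:

```python
import math

DRIVES_PER_PC = 32  # per the article's build

def pcs_needed(total_tb, drive_tb):
    return math.ceil(total_tb / (drive_tb * DRIVES_PER_PC))

# Every 7 PCs full of 6TB drives hold 7 * 32 * 6 = 1344 TB,
# which 3 PCs full of 14TB drives (3 * 32 * 14 = 1344 TB) match exactly.
capacity = 7 * DRIVES_PER_PC * 6
print(capacity, pcs_needed(capacity, 14))  # 1344 3
```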

~~~
zaroth
> Using "8x 4 way SATA data splitters".

There's no such thing as a passive SATA data splitter.

~~~
tracker1
It's probably a SAS/SATA controller, with the SAS interface split to 4x SATA.

~~~
zaroth
They do link to the motherboard they are talking about, and it doesn't have
SAS.

~~~
tracker1
They mention 8 PCIe slots with 48 PCIe lanes... I'm presuming they are filling
them with SAS/SATA controllers.

~~~
hddherman
> 48 PCIe lanes

PCIe lane count depends on CPU support; for AM4 I believe it ranges from 6x
PCIe 3.0 (cheapest AMD Athlon/Ryzen CPUs with integrated GPU) to 24 (PCIe 4.0
for the latest Zen 2 based Ryzen 3000 series).

The CPU they list in the article supports 16 lanes of PCIe 3.0 connectivity +
4 lanes for chipset (storage and other IO). Nowhere near the 48 PCIe lanes you
mention, although you could argue that 20+4 lanes of PCIe 4.0 bandwidth is
equal to 48 lanes of PCIe 3.0 bandwidth, but this would require a compatible
CPU, which would increase the cost by hundreds of dollars.

------
Mave83
We at croit.io operate Ceph-based storage, including everything from
datacenter, power, switches, labor, and licenses, at a price point of 3€/TB.

No consumer hardware.

------
ianopolous
There is one big problem that I've not seen anyone else point out with systems
like this. I know because I did the calculation early on with Peergos and came
to the conclusion that it doesn't work.

The problem comes when you want to store multiple files. If the corresponding
erasure code fragments from different files are not stored on the same server
then you don't have correlated failures. Contrast this with a typical raid
scheme where a failed drive means the nth erasure fragment of every file is
gone - correlated failures. If the failures across different files are not
correlated, which is the case if you're storing each new block on a random
node, then you are basically guaranteed to lose data once you have enough
files. Depending on your scheme, this can happen at as little as 1 TiB of data
for a user. It is similar to the birthday paradox.

For erasure codes to work for a filesystem you need to have correlated
failures.

~~~
paulsutter
Totally false.

Ordinary RAID has very slow recovery because it's concentrated on a hot spot
of a new drive, plus recovery waits for a new drive to be inserted (doubly
stupid).

When fragment placement is randomized, recovery is widely distributed and can
happen in less total time, so there's a lower chance of data loss.

~~~
ianopolous
I didn't say anything about speed of recovery because it's not relevant.
Recovery can't happen at all if enough fragments aren't online. The maths says
that with uncorrelated fragment placement, and thus uncorrelated failures, and
with enough data, you are basically guaranteed to lose data. Try doing the
maths for an entire filesystem, where each file/block is individually erasure
coded.

~~~
paulsutter
We stored hundreds of petabytes on cheap SATA drives with random fragment
placement using Reed-Solomon 6+3 coding (half the space of 3 replicas but the
same durability). Never lost a byte.

Speed of recovery is crucial, because that's your window of vulnerability to
multiple failures. For example, try RAID 5 on giant drives: the chance of
losing a second drive during recovery is very high.

~~~
ianopolous
No need to be rude. EDIT: The offensive part was removed

What was the probability of failure of your drives? My guess is you just
didn't hit the threshold for your failure rate. The maths checks out (PhD
here). Seriously, do the calculation.

~~~
ianopolous
To clarify, the assumptions I'm making for the calculation are:

1) a Fixed probability of a server failing

2) a fixed erasure coding scheme used for all files

3) uncorrelated server failures

4) an erasure fragment is stored on a random server

~~~
ianopolous
It boils down to the following:

You can calculate a probability L of losing a given file.

Because we've assumed totally uncorrelated failures that means this is the
same for all files, and that the probability of losing NO files if you have T
files is (1 - L)^T

As you can see, this approaches 0, meaning Pr(losing a file) approaches 1 as T
increases.

Using the probability of file loss in Sia, which I would say is too low, but
let's ignore that: they get L = 10^-19.

This leads to T ≈ 10^19 before you expect to lose data. If you're erasure
coding at the byte level, then that's 10 exabytes.

I expect your probability of failure is much lower than that of random nodes
on a distributed global network of volunteers. So yes, ~petabyte is below the
threshold, but there is a threshold.
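
The argument is easy to make concrete; a sketch using the L = 10^-19 figure
(computed with log1p/expm1, since 1 - 10^-19 rounds to exactly 1.0 as a float
and direct exponentiation would report zero risk):

```python
import math

def p_any_loss(per_file_loss, num_files):
    """Pr(at least one of T files is lost) = 1 - (1 - L)^T.
    Uses log1p/expm1 for numerical stability when L is tiny."""
    return -math.expm1(num_files * math.log1p(-per_file_loss))

L = 1e-19
print(p_any_loss(L, 10**12))  # ~1e-7: a trillion files, loss still unlikely
print(p_any_loss(L, 10**19))  # ~0.63: once T ~ 1/L, loss is more likely than not
```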

------
londons_explore
This tech seems very cool, but with only 200TB stored I worry that it is
destined to not pay for its overheads. No big project can survive on a revenue
of $20/month!

When will the project grow some mobile apps like Dropbox or Google drive that
you can just put a credit card number into, pay a few bucks and know your data
is safe?

------
vondur
That rack setup in the article reminds me of Gilfoyle's setup for Anton in the
garage from Silicon Valley.

------
vinniejames
This is pretty incredible both from a product perspective, as well as the
potential to push the whole industry towards a race to the bottom. Equilibrium
here pushes storage, processing, and availability towards distributed nodes,
unless high availability is required for some unique business case

------
rishabhd
Offtopic question : I went through their website, and I have no idea about
blockchain. I have gone through the documentation and almost everything they
are doing is possible without it as well. Cryptoproofs for storing data, smart
contracts et al - not sure if it is different from regular encrypted
deduplication done regularly with standard per hour/ per minute billing. Also,
Siacoin for payment, not sure if it is the most optimal way. I think I am
missing something, would be glad if someone can point me in the right
direction.

------
aiisjustanif
You lost me on the homepage (sia.tech): there are only 895 hosts storing a
total of 206TB right now. That, coupled with the shady infrastructure, is a
dealbreaker.

You aren't touching my enterprise data, not even the cold storage logs.

------
people42
For people that use Plex: do you think this is a good fit? If you have several
TB, is it cheaper to run your own PC with (let's say) 2x10TB HDDs + 2x10TB
HDDs for backup, or is it better to go with online storage today? Whenever I
check the prices, online always seems more expensive.

My backup isn't synced in real time; I do a manual backup every 3 months. I
can lose some data and I'm OK with that.

------
zhte415
The website sia.tech required a Google captcha challenge for me to even load,
clicking through from the article.

So.. that turned me off in an instant.

------
johnchristopher
I wish there was an easy solution that would let me plug in an S3 bucket or a
virtual drive or whatever and mount it as a partition on my cheap 20GB VPS.

The price for an additional “drive”, like 5 bucks per month for 50GB or
something, is insane, especially compared with Dropbox or OneDrive pricing (or
even physical drives sold over the counter).

------
rwbaskette
Where are these datacenters? I live in Ohio in the area of two of the points
on the home page at [https://sia.tech/](https://sia.tech/) One appears to be a
private residence or a farm. The other dot is literally on a golf course
fairway, is a private residence, or a power substation.

------
flatiron
I have no clue how Google stores my 35TB for $12 a month. They do it at a loss,
and I guess they don't really care.

~~~
pedrocr
Where are you getting that price? Google Drive is $300+ for that, and GCP
archive storage is $140.

Edit: apparently that's the unlimited storage in G Suite for $12/month/user as
long as you have 5 users or more.

~~~
flatiron
They say you need 5 people, but it's just me and I pay $12.

------
tgsovlerkhgsel
I expected this title to be about [https://www.scaleway.com/en/c14-cold-storage/](https://www.scaleway.com/en/c14-cold-storage/), which already offers
exactly that: cloud storage for $2 per TB per month.

~~~
markc
Sia isn’t cold storage.

~~~
tgsovlerkhgsel
I think it's a lot more comparable to cold storage than to anything else. I
assume you cannot serve directly from Sia? How long does recovery take?

(Does anyone know how long Scaleway takes in practice? They claim minutes but
I haven't used them yet.)

------
zupa-hu
Does any consumer-grade motherboard have IPMI* support? When I tried to
optimize my server costs, one issue I ran into was that colocation providers
require IPMI capability, which seems to be available only in server-grade
motherboards.

* IPMI is for remote hardware management

~~~
zupa-hu
To elaborate, while I like such optimization efforts, for this to work you'd
need to run your own datacenter because the _ones I have found_ that offer
cheap bandwidth require IPMI (to lower their labor costs).

Or you need to know some providers I don't know, in which case, tell me. :)

------
kalium_xyz
I just saw IBM claiming to do 10TB at zero cost on twitter, how are you guys
gonna beat that?

------
execve12
> It also turns out that 32 HDDs only consume 200w, so the 750w PSU we picked
> is more than sufficient.

Yep, stopped reading right there. HDDs use ~15 watts when they spin up; I've
experienced this, and I never allocate less than 20 watts per hard disk.
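
For what it's worth, the disagreement is just arithmetic; a sketch using the
20 W/drive spin-up allocation above and a rough steady-state figure (both
assumptions, not measurements):

```python
drives = 32
spinup_w = 20   # the conservative per-drive spin-up allocation above
steady_w = 6    # rough steady-state draw for a 3.5" HDD
psu_w = 750

peak = drives * spinup_w     # worst case: all drives spin up at once
steady = drives * steady_w   # roughly the article's ~200w figure
print(peak, steady, peak <= psu_w)  # 640 192 True
```

So even at 20 W/drive, the simultaneous spin-up peak fits the 750w PSU, though
with little headroom left for the rest of the system.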

~~~
kllrnohj
Staggered spin-up is a thing. That motherboard most likely doesn't support it,
but they can't plug 32 HDDs into it directly anyway; they'd need an HBA card
of some kind, which is more likely to have staggered spin-up.

------
OJFord
> over the long term, storage will be cheaper than $2 / TB / Mo

OK, but what is it today?

~~~
Taek
[https://siastats.info/storage_pricing](https://siastats.info/storage_pricing)

It's less than $2 / TB / Mo today, but relies on a completely different set of
economics that don't scale beyond a few hundred PB. This article was aimed at
people who understand why Sia is so cheap today, but do not believe that Sia
will continue to be cheap as the network scales.

------
Aunche
In addition to what other people mentioned, there is also a huge cost in
managing all the metadata that you'd get from billions of files. This gets
even worse if you're using a crazy 64-32 encoding.

~~~
Mave83
Yes, but with croit.io you can manage Ceph clusters with ease, reducing labor
costs and increasing reliability.

We build clusters at low-PB scale whose TCO, with everything from labor to
hardware, from financing to electricity, from routers to cables, can be kept
below 3€/TB. For that you can store data as block (RBD, iSCSI), object (S3,
Swift) or file storage (CephFS, NFS, SMB), highly available, on Supermicro
hardware in any datacenter worldwide.

Feel free to contact us or use our free community edition to start your own
cluster.

------
irishcoffee
This makes me want to buy storage just because I know exactly how it’s spent.

------
crobertsbmw
I was excited to cancel all my storage subscriptions, but for some reason this
article didn't convince me that that was an option yet.

------
QuinnyPig
I wonder what it would take to build a frontend for this that uses other
people’s exposed S3 buckets for storage.

------
jsight
So, $48 USD/year for 2TB? That's a little less than IDrive, as they charge
~$70/year for 2TB.

------
classics2
One thing that bothers me about this system, beyond the hand-wavy math: what
would motivate me to give up control over my data's availability? If I have no
SLA and no ability to convince a bunch of down hosts to come back online with
my data, why store it that way at all?

------
dillydobby
I wouldn't even buy new hardware. I would buy retired hardware for a fraction
of the price.

~~~
entangledqubit
I would imagine that managing such a pool of dated and highly heterogeneous
hardware would incur a lot of overhead. The procurement effort may end up
being rather high as well.

~~~
mtnGoat
Yet a number of FAANGs have datacenters full of mismatched lease returns, so
it must make sense at some scale.

------
hestefisk
Looks super interesting. Can anyone who has used it share their experience?

------
riffic
Durability? Availability?

~~~
kazen44
Availability is the big one: getting high availability makes systems vastly
more complex and expensive.

It's the same reason increasing redundancy and uptime to the next factor of
nines is almost logarithmically more expensive.

~~~
galeaspablo
Increasing redundancy and uptime to the next factor of nines is exponentially*
more expensive.

------
cjohansson
This sounds interesting, thanks for sharing!

------
britmob
Great writeup from David as always. Can’t wait to see more!

If anyone hasn’t seen the work being done on the Skynet platform, I highly
recommend taking a look. Amazing stuff.

