The NixOS Foundation’s Call to Action: S3 Costs Require Community Support (nixos.org)
304 points by ghuntley 10 months ago | 310 comments



Half a petabyte in storage and 3 TB transfer a day? Shit. That's nothing, unless you're "saving money by using the cloud".

While I feel for them, they put themselves in this position. Even when someone gives you something apparently for free, you have to consider what might happen when it disappears, and it seems that nobody thought about it, or if they did, they didn't care.

People often learn how expensive "the cloud" is the hard way.


3 TB a day is roughly 36 MB/s, so yeah, it's pretty much doable. A 10 Gbps connection is already common, and it's even possible on a 1 Gbps connection if people don't mind some saturation during peak times.
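A quick sanity check on that figure (a sketch, assuming decimal units and traffic spread evenly over the day):

  # 3 TB/day over 86,400 seconds, expressed in MB/s
  echo $(( 3 * 1000**4 / 86400 / 1000**2 ))   # prints 34, i.e. roughly 35 MB/s sustained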

> People often learn how expensive "the cloud" is the hard way.

Yeah, people really go a long way around just to get back to on-premise and colocation...


> Yeah, people really go a long way around just to get back to on-premise and colocation.

Outside of pet projects I'm not sure many people are doing that, but happy to read any research. From all I've seen the numbers show an overwhelming move to the cloud, and a few grumpy cats annoyed that the pricing structure is different.



37signals enters the chat


I've read the article from DHH; he makes some decent arguments, but as I see it, that article is mostly notable because most people don't self-host. At least in my corner of the tech scene (i.e. EU startups) there are probably hundreds of companies hosting in the cloud for every company choosing to self-host, and I haven't seen a shift in that trend yet. If anything it seems to be shifting more towards the cloud.


True, the wake-up call is mostly in articles and social media, not in the trend, in the same way most people use Windows despite most technical writers using Mac or Linux.

Personally I've been self-hosting forever, and that's the only reason I could stay afloat. There is a whole category of business that is simply not profitable if you use the cloud.


I may be wrong, but I feel like “startup” no longer means “bootstrapped for the lowest amount possible” (the model that created sv) and now means “vc-backed for the lowest amount of dev-effort possible”.

In the latter case on-prem is a distraction


AWS is a public company, we can see that the above posters are wrong pretty easily.


I'm curious. What would you do, host on-prem? Even if I had the hardware, as a pure software person, I have no clue how I'd even start setting up a physical location to serve data on that scale. And even if the people on the NixOS team know, it sounds like their time would be better spent working on NixOS than building & maintaining a data center (even if small). Bare metal storage services seem to be on a similar price level as lower-cost cloud services like Cloudflare or Backblaze, but I'm very interested if you happen to know a cheaper alternative!

Plus, even if you deal with the storage itself, how do you deal with serving the traffic globally? Optimally you'd want a distributed CDN, but that doesn't really sound feasible without the cloud. Would you still use a cloud CDN, or just serve directly from your own server?


It is astonishing how marketers from the three big clouds have successfully convinced people that there is no middle ground between operating your own data center and using one of their cloud services.

Here's what you do:

1. Buy two servers equipped with a large amount of storage. There are many providers[1][2] that offer this. (You could technically do it with one, but two allows you to have high availability and prevents data loss in case of accidents; imagine `rm -rf *`.)

2. Use Minio to set up erasure-coded storage on one of the servers, and replicate objects[3] to the other server (see the command sketch after the footnotes).

3. Use Cloudflare (because of their open source sponsorship program[4]) to serve the objects from the primary server, and when performing maintenance tasks, fall back to the secondary.

You could also serve objects directly out of the primary/secondary, but using Cloudflare takes care of low-effort DDoS attacks by hiding the origin, and provides some caching which helps prevent hitting the egress limits of the server provider.

[1] https://www.netcup.eu/vserver/vstorage.php

[2] https://www.hetzner.com/dedicated-rootserver

[3] https://blog.min.io/active-active-replication/

[4] https://blog.cloudflare.com/cloudflare-new-oss-sponsorships-...
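For anyone curious what step 2 looks like in practice, here is a minimal sketch assuming recent MinIO/mc releases; the hostnames, bucket name and credentials are placeholders, and exact replication flags vary between mc versions:

  # erasure-coded storage across twelve local drives on each server
  minio server /mnt/disk{1...12}

  # register both deployments with the mc client
  mc alias set primary   https://primary.example.org   ACCESS_KEY SECRET_KEY
  mc alias set secondary https://secondary.example.org ACCESS_KEY SECRET_KEY

  # bucket replication requires versioning on both sides
  mc mb primary/nix-cache   && mc version enable primary/nix-cache
  mc mb secondary/nix-cache && mc version enable secondary/nix-cache

  # mirror new objects from the primary to the secondary
  mc replicate add primary/nix-cache \
    --remote-bucket "https://ACCESS_KEY:SECRET_KEY@secondary.example.org/nix-cache"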


Or a couple SX servers from Hetzner

https://www.hetzner.com/dedicated-rootserver/matrix-sx

(You would need 4 SX294s for half a petabyte with 2x replication)

And put ceph on them which gives you an S3-compatible API:

https://docs.ceph.com/en/quincy/radosgw/index.html

It's not trivial to set up but once you do it's much cheaper than AWS. It's even cheaper if you can find the machines in the server auction. There are some 150 TB machines on there right now for 174 euro/month.
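For a rough sense of what that bring-up involves with cephadm (a sketch only; node names and IPs are made up, and a real cluster needs more thought about networks, CRUSH rules and pool layout):

  # bootstrap the cluster on the first machine
  cephadm bootstrap --mon-ip 10.0.0.1

  # add the remaining machines and turn every empty disk into an OSD
  ceph orch host add node2 10.0.0.2
  ceph orch host add node3 10.0.0.3
  ceph orch host add node4 10.0.0.4
  ceph orch apply osd --all-available-devices

  # run two RADOS Gateway daemons for the S3-compatible endpoint
  ceph orch apply rgw nixcache --placement="2"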


While I totally agree you can easily handle this with Ceph, I would never do it with 2x replication on spinning rust. You can maybe do 2x replication with SSDs if you really know what you're doing and you have data replicated on multiple Ceph clusters.

Also, your math does not quite add up if you want to store 425TiB. You'd be filling up the Ceph cluster to 95% which would be impossible (you'll start losing OSDs way before that).

If this was me, I would definitely do this on Ceph, but I would opt for a more traditional erasure coding setup of 8+3. For that, you'd need 12x SX134s as a minimum starting point.

I suppose you could start with an EC profile of 2+2 or 4+2...
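An 8+3 profile like the one described would be created along these lines (a sketch; the pool name is made up, and RGW's bucket data pool would then need to be pointed at it):

  # 8 data chunks + 3 coding chunks, failure domain per host
  ceph osd erasure-code-profile set nix-ec k=8 m=3 crush-failure-domain=host
  ceph osd pool create nix-data erasure nix-ec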

I do have a serious question on Hetzner:

Do they offer any private networking options? That page only lists a single 1 Gbit connection. I don't think that would work well with Ceph. It's doable, sure, but I wouldn't want to support such a cluster, not with 160TB raw per box.


> Do they offer any private networking options?

Doesn't seem like it for their dedicated servers, not 100% sure though.

Their docs indicate that upgrading the 1 Gbit Ethernet link to 10GbE is an available option:

https://docs.hetzner.com/robot/dedicated-server/network/10g-...


Internal networking in the simple case is provided via VLANs on that 1 Gbit interface via their vSwitch.

Extending with additional NICs should be possible from what I've read but not used myself.


You can upgrade them to 10gb NICs.


Yes, perfectly doable, but the expertise and time needed to do it (without fatal mistakes) and keep it running 365 days a year are non-negligible. Having this done by unpaid volunteers is more than challenging.


This is, in my opinion, a fallacy perpetuated by the marketing departments of "big cloud". Don't forget that pre-2006 there was no S3, and we managed just fine. These are not challenging skills to learn; the zeitgeist has just convinced itself they are not worth learning.


It's condescending to think that someone can't reasonably choose to pay for a service they are capable of doing.

Cooking isn't hard either, but I still go to restaurants, and I don't go around laughing at the patrons there, telling them they never bothered to learn how to cook.


Do you eat every meal at a restaurant though? That would not be sustainable. To roll with your analogy it would be like the hospitality sector persuading the majority to ditch home meals entirely


The problem was your assumption which is condescending. I gave you an analogy to make it clearer. Taking the analogy in the second paragraph apart doesn't change the point in the first - it was simply to help illustrate it, but it stands on its own.


It's not condescending at all. You are saying they could reasonably pay for it, but they quite obviously cannot (hence the existence of this whole post). What I'm principally challenging is the assumption that this required effort is substantial. I'm making the point that it is not, certainly (in my view) compared to the effort of running an entire operating system project and all of the ancillaries that come with that. This is an area they have chosen to pay for that (again, in my view) they don't need to.

They have gone to great lengths to architect a system that allows deep pinning of very specific versions; it's not a stretch to think about the architecture required to deliver that. Instead they have stopped that line of thinking at [insert S3 here] rather than carry on solving. A direct consequence of that now is a $9k/m hole they need to fill in their budget. I'm being critical, sure, but not condescending.

Overall, it frustrates me seeing opinions that operating infrastructure is difficult and therefore must automatically be avoided. In this particular example, avoiding operating said infrastructure is potentially putting the project in the red and at risk, but nowhere in the NixOS call for help does it suggest they have considered operating this themselves. And therein lies the crux of my point. The zeitgeist has turned against operating infrastructure, and yet is frequently caught out by the cost of this decision.


> They are not challenging skills to learn, the zeitgeist has just convinced itself they are not worth learning

Next time you see someone paying for something you think it's easy to learn, that the "zeitgeist told them not to learn", go up to them and say what I quoted and you'll see real fast if it's condescending or not.


This is a silly thread.

The difference in situations here is literally that on the one hand large corporations have spent money to market the idea that their services are essential because the alternative is incredibly difficult. On the other hand, McDonalds, KFC, Burger King, "Big Restaurant" etc have not spent large quantities of money trying to persuade the populace that cooking at home is incredibly difficult. It's possible to sell someone a convenience without also telling that person that it's the only real option and this is what the person you're replying to is highlighting.

It's not condescending, it's the reality of the situation. Yes in the examples you've described it would be condescending but in the situation we're actually talking about there are actually large numbers of people who genuinely believe (as a result of successful marketing campaigns) that the equivalent to cooking your own meals at home is incredibly dangerous, difficult, expensive and error prone.


> On the other hand, McDonalds, KFC, Burger King, "Big Restaurant" etc have not spent large quantities of money trying to persuade the populace that cooking at home is incredibly difficult

They haven't because it's a ridiculous proposition, the same way cloud providers would be laughed out of the room if they tried to say the same thing and didn't actually have a better product, with trade-offs that are genuinely better for some use cases. The fact that you believe that everyone using cloud products is either too lazy to learn or has been fooled by marketing because they can't think for themselves, unlike you who sees the light, is the condescending part. Mind you, this is in the face of overwhelming data showing that most companies have moved, or are moving, large parts of their infrastructure to the cloud; so apparently everyone else is a dummy, either too lazy or too stupid to see what you can see.


> The fact that you believe that everyone using cloud products is either too lazy to learn or has been fooled by marketing because they can't think for themselves,

I never said anyone was too lazy, just misinformed about the benefits, and it's a more far-fetched statement to claim that marketing doesn't work than to state that it does.

As evidenced by some people in this thread claiming that not using S3 would be basically impossible (despite a large number of popular distributions literally doing exactly that) there are clearly plenty of people who have fallen for the marketing.

Moreover, you're the one implying that falling for marketing is something only people who can't think for themselves do, that's also not what I'm saying. I think everyone is susceptible to marketing, and in this case, clearly a lot of intelligent people who are otherwise capable of thinking for themselves have fallen for this blindspot.

I think you are the one being condescending here.


The original post did not come off as condescending to me, but instead just highlighted how BigCorp cloud marketing departments have an incentive to sell developers on “infrastructure is tough, just pay us”. (I say this as a consumer of AWS services and fan of their products).

“Assume good faith”[1]

[1]https://news.ycombinator.com/newsguidelines.html


Reading back over that, I'm not seeing the condescending bit?


Eating in a restaurant once in a while is fine: a large range of choices, delivered quickly. But is it sustainable? Prices are higher, the food is usually less healthy, and it's not tailored to special needs.

In any case, whoever does it, professional IT done right is costly, but cheaper than e.g. IT without offsite backups when disaster strikes.


Who managed just fine? IT-based services were a minor fraction of what they are today. And IT departments in a random company were much bigger.


> And IT departments in some random company were much bigger.

Not sure about that. There appear to be just as many people in IT, it's just that they're not that technical anymore, mostly producing policy documents and pushing tickets around. The actual work is outsourced to the cloud, SaaS products and outside consultants (for duct taping the mishmash of SaaS products together). But that may be just my lacking understanding of my strange little part of the globe.


We do this at work, albeit at a much larger scale than this, and I'm the only one who spends any time on upgrades, replacing failed disks and so on. Honestly, apart from upgrades every 3-ish months I don't spend any time at all on this; it's all chugging along without intervention.

Things don't fail as much as they used to, at least not when using something as mature as minio, and of course hardware can fail, as it does, specifically spinning rust, which is non-trivial to replace unless you have hot-swap capable servers in the first place.

I don't possess a decade's worth of expertise in either minio or running this kind of setup, so personally I believe you are exaggerating that this is even worth calling a challenge at all.

But that's just my take :)


So I’m sure you’d be more than happy to volunteer your services to NixOS and be responsible for the setup, including when there are issues (which will inevitably happen regardless of tools and technologies used), right?

It’s so easy, after all.


I'd volunteer to do it, if they paid for hardware, colo and transportation for initial install. After things settle down, it takes no more than half an hour or so every month.

Pretending to know how much work something is when it's not your area of expertise is a bit silly.

Actually, yes, it is easy, after all :)


Not for the people behind NixOS. If it is challenging then that would cause me to re-calibrate my opinion of the project as a whole.


The money that is paid in exorbitant AWS egress fees could be used to pay a competent SRE to handle that. I don't think that running the package mirror would require a full-time dedicated paid position. Initial setup, maybe some heavy restructuring operations as storage requirements grow, maybe some firefighting, but not the daily operation.


In the olden days you’d find someone to donate you a server and have a few volunteers around the world mirror it.


It still happens. You need a TB or two of storage with unlimited (or at least very high) traffic? That's not an issue. Even I could help by being a mirror from home. But I wouldn't be surprised if there were already some free tiers from cloud providers.

Half a PB is a bit more than that, and I guess it is expected to grow... in 2023 it is not realistic to throw such storage around for free (+mirrors). It will be realistic in a few years, considering that storage prices have been in free fall for the past 30 years, but that time hasn't come yet.

(As a maybe interesting comparison, Debian's mirror is around half a TB: https://people.debian.org/~koster/www/mirror/size )


The falling cost of storage is at least partly the reason hosting a few thousand packages takes up hundreds of TB. "Storage is practically free" from the perspective of each of those projects. By the time .5 PB is a trivial cost, they'll need 50 PB.


Yeah, NixOS has made some technical choices that require tons of data downloads, even if there is little modification of the content of the packages. Regularly my computer has 10 GB worth of downloads because some high-level package got upgraded, and even if it's dynamically linked, the lower-level packages still get re-downloaded from cache.nixos.org. It doesn't matter because storage is free, right?


Yeah, that too.


You do all of this and you still don't have nearly the availability or durability of data you get from S3, nor people on call to fix it once anything happens to it.

It's like comparing leasing a Mercedes-Benz with full service and all inclusive insurance to building your own bicycle. Some people don't need the car, sure, but it's not the same thing.


But they're complaining about the cost of leasing the Mercedes, so they need to lower their expectations if they want a cheaper solution.

Although having to spend some time to attain good enough availability/durability is not a big deal when the rush to launch was over long ago. Unlike in your example, you can turn the bicycle into a car.


The usecase is a cache. http://cache.nixos.org/

All it needs to do is to get you the data faster than the compilation itself - most of the time - to be useful.

I also wonder at the original thinking, given the use case. A cache usually has a high ratio of downloads to uploads and doesn't need high reliability. S3 doesn't excel at the cost of fetching data out of it, at all.


It also “caches” the source code downloads, many of which are no longer available from the URL listed in the repository, if anywhere at all.


It only seems like that to people who don't know any better.

Do you ACTUALLY think you can communicate with people at Amazon who truly know what's going on? If so, why do we see countless stories about people who have problems with Amazon, Google, et cetera, who can't communicate with them? Do you know of some special, magic way to get the skilled people to talk with you that the rest of us don't?

You act like being able to call Amazon, even if you could get someone on the line who knows something, is somehow better than having even just one good systems administrator. All of your ridiculous assumptions go right out the window when you have a sysadmin build, test and deploy your own hardware that you (via your sysadmin) have complete control over.

Security? You can't secure Amazon. You literally can't. If you wanted to, you'd have to buy the company, then you'd have to evaluate your whole S3 staff. If you hire a systems administrator, you have to vet your systems administrator and anyone else who has admin access, not a collection of tens of thousands of employees across the planet, any of whom could be stupid enough to visit the wrong web site using the wrong browser on the same computer as their keys.

Availability? My side projects have better uptime than Amazon.

Durability? Two geographically disparate locations and LTO, plus many mirrors, make things plenty durable.

When you're not a systems administrator, you can be forgiven for not understanding the big picture, but not for acting like you do.


Yes, it is very simple: you pay for support. Maybe I don't know any better, having worked professionally on top of AWS for almost 10 years and grown usage from a couple of thousand a month to half a million a month (while remaining profitable the whole time). Maybe, just maybe, people disagree with you even if they are informed.

And for the point in question, you don't need to talk to anyone; the point is that there are thousands of datacenter, networking and app-level staff all working to keep the service stable for you without you even knowing. I don't need to talk to anyone if an S3 server, rack, or even a full DC goes down.


Honestly, we are an open source project, not a company. If we have the money, of course, we would prefer to spend it there and do better and more with it.

This is seriously threatening our growth and our ability to conduct the project. I'm not sure we need all the 9s of durability of S3, given that people have already lost all their buckets on S3.

We do not have the money to store 5x or 6x the contents of the cache.

So we'd much prefer to have control, and we have enough skills to run all that shit easily; the problem is that the skilled people are in short supply and are usually working on harder problems than running that. So ultimately, this is a balance problem.

Disclaimer: NixOS developer, 23.05 Release Manager.


On the technical side, garage (https://garagehq.deuxfleurs.fr/) does multi master replication by default, so is probably better for this use case. Still with S3 API.


I have a few projects running off of a self-hosted garage instance, it’s very nice software.


You really don't need all that mess, just a bunch of mirrors around the world.


I mean, you make S3 look easy compared to that


If `minio server /mnt/sd[abcd]1` and the occasional `apt-get upgrade` and changeover of DNS records is so much work, maybe you should just spend 100x for S3.


Seriously, there are a large number of comments from people who seem terrified of this or throw out "but but updates and failing hardware". We run multi-PB storage on-prem at work and it's just fine. It's not some massive team running it.


That's the business model.


> Even if I had the hardware, as a pure software person, I have no clue how I'd even start setting up a physical location to serve data on that scale.

3TB/day is about a third of what you can serve out of any run of the mill PC with a gigabit ethernet connection. There are 22TB drives and individual servers with 24 drive bays, which exceeds 500TB. So you could do this with one machine connected to a gigabit fiber connection, and the machine would cost approximately what they're paying monthly (almost all of that for the drives).
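A rough check of both claims (a sketch, assuming decimal units and a fully saturated link):

  # what a saturated 1 Gbit/s link moves per day, in TB
  echo $(( 1000**3 / 8 * 86400 / 1000**4 ))   # prints 10, vs the ~3 TB/day needed

  # raw capacity of 24 bays filled with 22 TB drives
  echo $(( 24 * 22 ))                          # 528 TB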

In practice a 24-bay chassis is silly and you might use a separate storage array etc., but that's just details.

> Plus, even if you deal with the storage itself, how do you deal with serving the traffic globally?

At 3TB/day you're not Netflix. Just serve it directly from your own server.



It's not silly because it doesn't exist. It's silly because it weighs a hundred pounds and makes design trade offs to achieve a level of density you probably don't even care about if you only need one of them. The people who want that thing are the people who are going to fill a rack with them or more instead of several times that many racks with something else.

In practice if you need ~24 drives you get a couple of 2U array containers that each hold 12 drives and then buy a 2U server or two to plug them into and don't care at all that you used 8U for 24 drives instead of 4U for 72 because your 42U rack still has 34U to spare.


You colocate, like every other distro has done since the 90s.


And throw in some rsync mirrors for good measure.


> Plus, even if you deal with the storage itself, how do you deal with serving the traffic globally?

As a simple example, one of the cheapest Hetzner bare-metal nodes gives you 500 GB NVMe or 2 TB HDD storage with an unlimited gigabit connection in Europe. That works out to, I believe, about 10 TB of data per day.

Given the provided numbers, a single 50€ instance could handle the load, even if it's located in Europe. CDNs make sense for systems which are sensitive to high latency, or when you're pushing multiple TB per hour, which this just isn't.


I wonder how unlimited such cheap unlimited connections really are. In the simplest case, insufficient capacity in the provider network might become a bottleneck. They might notice that you are the noisiest one in the whole neighborhood and either ask for more money or throttle you. Offers that are too good to be true typically don't last very long.


I have quite a few servers with hetzner. I don't know what the exact criteria are but once you hit around 250TB/month per server with multiple servers for (likely) multiple months in a row then they will tell you to chill out or they will terminate. I've gotten it and I've seen a few others on lowendtalk mention it too.

I've been using hetzner for a couple years and have always used them for my high bandwidth services but I have only gotten that notice once despite being a very heavy user, and around the same time everyone else did around a year ago. Maybe they're gonna start to care? With all the price hikes and all. I ended up just getting more servers to level it all out.


At 1 EUR/TB, even the instance types that pay for traffic (e.g. the 10GBit dedicated servers) hardly break the bank.

However Hetzner does not have the best peerings, so bandwidth to some locations might be worse than you'd like.


That's true, and I didn't actually want to suggest using the cheapest server available as a singular node either.

I just wanted to put the given numbers into the scale of the cheapest bare-metal servers available on the market, to illustrate that this is not an amount of data you'd have to consider for scaling. Realistically speaking, you'd want to have more than 2 nodes so you don't have to worry about reboots/updates etc.


I’ve run some fairly network-heavy services (900mbps+ on a 1gbps connection, 24/7) at both leaseweb and hivelocity; they never complained about the traffic


Actually, for someone building an OS, having to face very practical challenges in the server space would likely be extremely beneficial. Dogfood, people, dogfood.


Surely you know what colocation is, don't you? Suggesting someone has to build a datacenter for it is like saying that programming is time consuming because people have to write their own compilers.

If someone asked me to do this, I'd colocate in two locations (likely Europe and west coast USA) two full copies of everything in 1/4 rack. After physically building, testing, installing, then testing some more, which likely would take a month, it'd be a part time job to monitor and babysit for the first few months of use, then it'd be an hour a month maintenance job.

CDNs cost money and come with tons of problems. For non-interactive static files, they're not necessary.

If it requires constant attention, you're doing it wrong :)


> Optimally you'd want a distributed CDN, but that doesn't really sound feasible without the cloud

A CDN is a good choice when you either need to serve a small volume (i.e. not 500 TB) with low latency, e.g. to reduce web page load time by hosting static assets (js/css/img) close to the customer, or when you have really huge traffic (e.g. Netflix, but they decided the public cloud is not cost-effective for a CDN and built their own).

For NixOS mirrors latency is not important, traffic is not huge (within the capacity of a single commodity server), and just two mirrors - in the US and EU - would be close enough to the majority of users.


Well, with a $9k per month budget you could rent an entire rack in a premium data center location (let's say Telehouse London Docklands, which hosts the London Internet Exchange - you can connect to it in a very cost-effective way). This would still leave 80%~90% of the budget (depending on power use) for hardware/software etc.

But a full rack is a huge overkill. How about a quarter rack for a few hundred $ per month?

What about connectivity? Well, as mentioned, Telehouse Docklands hosts LINX, so you can get an unlimited-transfer gigabit right into the Internet backbone for a couple hundred $ (mind you, I dealt with them a few years ago, but I doubt prices have changed that much - probably they're lower if anything). Also, there is a cluster of data centers in Docklands, so they compete heavily on price, as do resellers that often have good deals.

This is all just location and connectivity. How about hardware? Let's say they wanted proper support with their hardware, not DIY, so they were looking for a "solution provider". Send a request for quote to 10 of them, including Fujitsu, IBM, HP etc. Then tell them you're looking for a "charitable organisation discount". Unfortunately we're missing the most important metric to price this "solution": IOPS, and whether it's mostly small or large files (random or sequential access). But they host most of it in "infrequent access" S3, so let's assume it's not a lot, just to find the lower pricing boundary.

So let's say a pair of 2U servers with two HBA cards each and four 12-disk JBODs with 22 TB SAS drives: a total of a petabyte in raw storage, or half a PB mirrored (not that they would run it mirrored). A system like this from a big name (it's rather low performance, but we're after cheap, right? their S3 is likely very similar) might cost $80k. Over 5 years of lease that might be ~$3k per month; plus their hosting and connectivity, it's about half of their $9k... Then there is software, but that's a separate thing. Also, one could DIY such a system for half the price, one could add SSDs etc. This is just an example. To design this properly one would need usage data from their S3 and a few days of work to put it nicely together.


> mentioned Telehouse Docklands hosts LINX so you can get an unlimited transfer gigabit right into the Internet backbone

First, look at the LINX network map. They are not just in Telehouse Docklands.

Second, with all due respect, do you know anything about how the internet works? "Internet backbone", huh?

LINX is a peering exchange. A connection to LINX gets you nothing on its own. Sure, you can peer with the route servers. But you won't find any Tier 1 routes on the route servers, and the Tier 1s will also refuse to peer with you because you're too small or you don't meet their ultra-restrictive peering policies (for example, good luck getting BT to peer with you!).

Most people are better off getting a good connection to either a Tier 1 or a high-quality Tier 2 network and leaving it at that.

> Also there is a cluster of data centers in docklands so they compete on price heavily

London Docklands is the most expensive place in the UK to colocate, and probably in the top 5 most expensive globally. It's prime real estate; all of the big-name colo facilities know that and will charge you accordingly, ESPECIALLY Telehouse!

And trust me, you don't want to go anywhere near a "cheap" Docklands facility, they are cheap for a reason.

The days of £300 a month for a full rack in Docklands are long gone my friend.

There are many other perfectly good places both in the Greater London area and elsewhere in the UK and Europe where you will get more bang for your buck.


I’d be surprised if any of the top five most expensive colos weren’t all targeting HFT or other financial services.


> targeting HFT or other financial services

And the hyperscalers.

The Googles of this world only really build their own data centres in the US and maybe, if you're lucky, one or two other places elsewhere.

But generally, everywhere else in the world, and absolutely certainly in London, they're in the same buildings as everyone else ... except they rent entire floors and not just a few racks of course !


I would split it into pieces and create a DIY CDN (like lots of Linux distros have). From the numbers I've seen in the article, the big problem is the storage, not the transfer, although they say the space can be reduced by garbage collecting the less-used caches (although that's not ideal).

Note that they are using Fastly as a CDN in front, but that's not important for them (monetarily at least). The big part of the bill is just the storage.

With little detail about their data, I would choose one of these architectures...

  Debian (or other distro) mirror style:
  - Split the cache storage in pieces (by architecture, release or other categories)...
  - Deploy a bunch (10-30) of cheap servers in affordable hosting around the world. The most popular data is distributed in more servers (for example, the x86_64 build caches for the latest version of the packages)
  - Get some people to volunteer with small mirrors

  CDN style:
  - Get a couple or three medium/big sized (in disk, not CPU) servers for storing all the cache, for example, with Minio, so they don't even have to stop using S3 storage libraries.
  And then either:
  - Keep using Fastly
  or
  - Distribute tiny servers with Varnish or the like
  - Get some people to volunteer tiny edge cache servers

Some weeks ago this was on the front page:

- Serving 90TB/Day of Linux Updates from Thin Clients: https://news.ycombinator.com/item?id=35883053

- The scripts they use: https://github.com/PhirePhly/micromirrors


(piping up from another thread, but)

> I'm very interested if you happen to know a cheaper alternative!

and

> Optimally you'd want a distributed CDN

https://www.storj.io/ is this thing!


Using the pricing from your other comment and the numbers from the parent, if I estimate correctly, Storj comes in at about $8k/month? That's cheaper than AWS (considering there are no CDN costs), roughly on the same level as Cloudflare and Backblaze, but still more than the entirety of NixOS Foundation's (monthly) incoming funds in 2022.


According to https://discourse.nixos.org/t/nixos-foundations-financial-su..., they have 450TB and 29.5TB of S3 egress/month. By my math, assuming the Nix Foundation goes in through our front door, talks to no sales people, and pays list price, that's $2k/month.
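Using the list prices quoted elsewhere in this thread ($4/TB/month storage, $7/TB egress) and rounding the egress up to 30 TB, the back-of-the-envelope version is:

  echo $(( 450 * 4 + 30 * 7 ))   # $1,800 storage + $210 egress, i.e. about $2k/month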


You would replace the Fastly CDN with Storj, correct? Your link mentions 1500TB of Fastly traffic, which would cost $12k with Storj. Not sure if this is related to parent's 3TB/day number, but I went with that when I estimated the $8k.

I think the best cloud solution might be something like Storj + Fastly or Cloudflare end-to-end. But I'm really curious how much could be chopped off if you tried to solve this on-prem.


Storj + Fastly is genuinely a pretty good combo. There are certainly some things Fastly does that we don't do (besides being more specifically a CDN, they also have compute@edge, etc). We have a good relationship with Fastly (e.g. https://docs.fastly.com/en/guides/log-streaming-storj-dcs)

But yeah, I don't know how the Nix Foundation is getting their Fastly traffic paid for, maybe that's in/out of scope. Certainly they should keep Fastly if it makes the overall thing cheaper. Having a distributed origin (us) to dedicated edge cache (Fastly) makes for a strong distribution story.


They did not make the decision that the cloud is cost-efficient. They had someone else paying their cloud bill and the cloud is easy for them.

Now that that someone else is pulling out, they need to think for the first time about what would be cost-efficient.


> Now that the someone else is pulling out they need to think for the first time

They should have thought about that on day zero.

If someone else is picking up the tab then there is ALWAYS the possibility that someone will pull out for whatever reason.


When you start an open-source project you don't start with thinking what will happen when you have Terabytes of traffic.

Even for a start-up it's generally not advised to spend considerable effort on scalability before you have any customers.


>When you start an open-source project ...

Rubbish.

It's nothing to do with "start-up" or "open-source project".

As I clearly said, it's Dummies 101 ... if "somebody" is giving me money, then only an idiot doesn't consider what happens if that "somebody" stops giving me money.

It's not rocket science, it's basic planning.


What you're describing is premature optimisation. They made the right call going with the solution that required the least effort. Now that that solution is no longer viable, they can spend resources addressing it. Not fixing problems that don't exist is the right call.


I see people usually use the term "premature optimisation" to mean "I wouldn't think about this thing, so no one should". Also people really take that one quote and consider it gospel, forgetting that "premature" is contextual and subjective.

Meanwhile I think it's a bit silly that they didn't have a plan B for 425 TiB (!) of data and are scrambling to figure it out now. It's not about "fixing problems that don't exist", it's called "planning ahead for a scenario that is practically guaranteed to happen" and has high risk for the future of the project if they don't have such a plan.

They created essentially a SPOF where "failure" was losing the financial goodwill with some company.


> What you're describing is premature optimisation.

Have you bothered to read my post ? I don't know how I can make it any clearer ?

1. NixOS were dependent on DONATED service(s)

2. It is not sensible to assume the service will continue being donated forever, that is just foolhardy.

Anyone who has ever run a business or has been involved in charitable work knows this. NixOS appears to have just learnt the hard way instead.


They had assurances that they wouldn't just have it ripped from them, are being given time to find a replacement, and are engaging multiple avenues to do so. They are having absolutely no downtime from this process. You're making assumptions that aren't warranted at all to make a point that really just seems like an excuse for you to be angry at random strangers. I'm honestly not sure why you are so upset (and rude) about this.


> They had assurances that they wouldn't just have it ripped from them, are being given time to find a replacement

Sure, but we're still talking after the event here.

The overall point of being over-reliant on a single donor in the first place remains the fundamental problem that could have been avoided.

I suggest we draw a line in the sand and agree to disagree.


I think that is "something" for a huge number of businesses operating around the world.

For everyone here saying its trivial and they can do it cheap.

Get together and form a non-profit organization that offers super cheap infrastructure to the NixOS foundation. With a 99.99% SLA and around the clock staff ready if something goes wrong.

Which, I think, if you just colocate at a datacenter, means going to the physical racks when shit breaks. (Or pay for "remote hands" and "remote spare parts". Or just pay for managed hosting.)

I think that sounds like a lot of work for volunteers who have other jobs as well.


A perfect case for something like Backblaze to sponsor it and get some more publicity.


> they put themselves in this position

You say this like they're hopeless.


I think the implication is either naive or wilfully ignorant, rather than hopeless


Yep, the thing is they're using Fastly as a CDN, moving 1.5PB a month, and its cost doesn't seem to be a problem for them (I don't know if it's covered by Fastly or they have a good price). The bulk of their problem is the storage, and S3 is probably the most expensive place to put raw storage.

If they start paying the bill, their best option is moving out. Either to a cheaper provider (be it R2, B2 or whatever) or self-managing it (Minio...).

And seeing the $32K price for the migration to R2, and guessing that's transfer costs, I would garbage collect and move only hot data, and let the rest be rebuilt over some months.


Fastly sponsors NixOS.


That's good, because that bill would be even bigger than the S3 one :P


There are still open source hosting providers that could provide the bandwidth and maybe the storage for such a project, but I doubt many of them could provide something like S3; it would either be git, https, ftp or maybe rsync.

Building something like S3 yourself on Swift or Minio isn't easily done; the management of these solutions can be tricky, especially for multi-petabyte systems.

S3 most likely got them going and scaling really easily, but then at some point the cost comes back to bite you. Still, I don't see any good alternatives right now. I don't completely disagree with you, but the overhead of building an S3-like storage solution upfront would have actively prevented the NixOS project getting off the ground.


> While I feel for them, they put themselves in this position.

I agree entirely.

I feel the same way about OpenBSD when they come round begging for money.

I'm like "sure, great cause, but can we first have a chat about your totally energy-inefficient hosting of multiple racks full of ancient power-hungry kit in Theo's basement".

Money doesn't grow on trees, and especially in this era of inflation and high energy prices, I'm afraid I don't think it is unreasonable for me to expect the organisation asking me for money to get their affairs in order first.


Well, they're hardly going to find a colo that'll take those machines, so you must mean that they should stop building for them?

They've said before that hardware heterogeneity has been useful in bug hunting, and at worst it can be viewed as a heating expense for the fairly prolific guy.

Does qemu support all those architectures?


> Does qemu support all those architectures ?

Probably. But my understanding is that the OpenBSD core developers consider qemu as the antithesis to their beliefs.

At least that's what I recall reading when it was discussed at the time, i.e. it was met with one of Theo's famous hard "no".


> That's nothing, unless you're "saving money by using the cloud".

I've helped a number of clients save money by using the cloud. It's not that hard when you use the tools for the purposes they are suited for. It's not the hammer's fault that it does a poor job of pounding in that screw. At some point the "craftsman" needs to take the time to learn what their tools do and what they are best used for.


Cloudflyer (https://www.cloudflyer.io/) offers free migrations from S3 to Storj. Storj is $4/TB/mo, $7/TB egress. It's decentralized storage, so the base functionality has CDN-like performance (no Cloudfront needed), and you don't need to pay extra for multiregion redundancy.

I jumped on the Nix Foundation Matrix to try and help them directly, but for anyone reading this thread, Storj might be able to cut a zero off your storage costs. Check us out!

Full disclosure - I work for Storj. https://www.storj.io/


"you’ll be paid in STORJ Token." https://www.storj.io/node

This statement basically makes me unable to take this seriously.

At "Egress Bandwidth $20/TB" it should pay off to buy up all the cheap hetzner servers and convert them to storj nodes. They would pay themselves back after 2 TB and then you are off making profit.

But wait, there is more:

They charge $7 for egress bandwidth... How can they pay $20 to the one providing the bandwidth? Oh, right, they don't actually pay the operator, they give you a crypto token.

So all in all paying for storj feels like supporting yet another crypto scheme.


Certainly there are differing opinions about cryptocurrency in general.

In terms of "does the math work", you're absolutely right that charging $7/TB and paying $20/TB to our storage node operators is not sustainable. But it shouldn't be a surprise that this was an intentional early subsidy to grow the supply side of the network. In an open system like ours, you see these business dynamics, whereas you might just not see them otherwise.

To be fully transparent, we are working to reduce that subsidy now that we're hitting scale. You can see some of our conversations with our storage node operator community about this on our forum: https://forum.storj.io/t/announcement-changes-to-storj-node-.... You can also see full network stats here: http://stats.storjshare.io/


It is a bit mean to anyone who has invested in equipment to supply to the network and used the subsidised numbers to work out whether it will be profitable. This should be clearer on the website.


This seemed interesting, and I had a quick look at a few pages on the Storj website. The website looks like it's geared towards the enterprise segment. At the same time, I didn't see any prominent link to a comparison page. Comparing with Backblaze B2, Cloudflare R2, etc., would be quite useful.

On integrations, I see there’s a tech preview for Restic. Any plans to support Borg? Is there a platform roadmap available publicly? I’m primarily looking as a single user to store larger backups. The Storj pricing (only for storage, not for egress) seems to be a lot lower than that of rsync.net (even with the special HN Reader’s discount).


This is really good feedback, thank you.

The platform roadmap is available here: https://github.com/orgs/storj/projects/23. Good question about Borg. We are supported by rclone natively, and I've seen some success with people combining Borg and rclone. But that's certainly something we can look into more.


Very quickly, the roadmap page you linked to seems to be available only for GitHub users (and probably people who are also given permissions on your repo?). I only see a long list of “You can’t see this item” cards.


They're hiding their not-yet-cleared queue, if you scroll to the right there should be visible cards, even logged-off.


On an unrelated note, the Storj landing page spells the word 'compatibility' wrong ('compatability').


ooh, that's embarrassing. thanks for pointing that out. fixed now


The same typo seems to be in other places too. I just noticed it in the Products submenu under Why Storj.


And the canary "shows" that there's been subpoenas served:

https://www.storj.io/canary.txt


Oh, boy! It really raises the question of whether these are even trustworthy at all.


Here's the HN conversation about that: https://news.ycombinator.com/item?id=34192336


I really like the storj concept - distributed storage, local encryption. Comes from tahoe-lafs I believe, except you don't need to find 10 buddies.

There is one main reason I don't use it besides web share: despite being OSS, uplink-cli is not available in Linux distribution packages, which

- makes me uneasy about the stability/maturity of the platform (e.g. do you not provide packages because you don't have a stable API, and if so, should I really rely on you to keep my data?)

- increases friction - either build from source or manually copy binaries; you need to do this on all your machines and servers, and it means no automatic updates

- binaries/packages without trusted update channels may be against corporate/personal policies, and you usually want your storage accessible on all/most of your computers

- if available in non-distribution package repositories (I didn't see this in the quick-start), it would still create trust issues

Have you considered getting your cli packages in distributions?

If I could just `apt install uplink-cli` or similar, I would already be using it.


How do you convert Storj tokens to USD?


As a customer of Storj, this isn't something you'll need to worry about. We're integrated with Stripe and take credit cards and invoices. We will also take STORJ token if you want.

If you are looking to provide storage space to the network as a storage node operator, then your nearest reputable exchange should do the trick.


I think serving files over WebTorrent / BitTorrent with an (HTTP) web seed stored across blob stores at Cloudflare R2 / Backblaze B2 / Wasabi is a better model for a distributed CDN.


Perfect use case for Storj. I hope they take you up on that.


Why not offer to sponsor them and cut all the zeroes? When I built a hosting business that was most of our marketing and it worked really well.


We're hoping they respond to some of our reach outs for a deeper discussion! But if for some reason you're reading this here, NixOS Foundation folks, my email is in my profile!


One of my colocated servers has 60TB of space to spare and has a 10gbe unmetered uplink on the provider's IP-transit blend.

It is easy to configure traffic shaping on the box so that any unused upload bandwidth can be used without any impact to existing applications.

I would be happy to seed NixOS torrents. There is an extension to torrents where files can be added to a torrent after a torrent is released that might make sense to look at.


Makes you wonder why the whole protocol isn't Bittorrent with users seeding their builds to others.


Because the cloud movement in all of its hype stifled any investment in this area.


I believe it. I foresee a lot of holes being dug for projects buying into “the cloud” and “the edge”.


With NixOS the main problem is that a lot of local "packages" are configuration files with a lot of secrets. It is very easy to end up seeding your VPN and email passwords this way.


It's not required, but it's a best practice not to get your secrets into the nix store. There are solutions like env vars or agenix.


People do it anyway, and none of those solutions are as convenient.


Guess you should just check all your secrets into source control. That would be easier too.


The Nix store is world-readable. Storing plaintext secrets in the Nix store is absolutely insane and foolish. Who does this?

Either don't store your secrets this way or don't use Nix. Please.


I use agenix when I can, but I'm internally wincing at the thought of trying to explain it to friends I'm getting into nixos. It's not exactly usable.

/nix/store isn't world-readable. It's machine-readable, and most machines are single-user. I'll explain the caveats, but I can't tell them never to put secrets there; the predictable outcome then is they won't try nixos.


There are unfortunately lots of software packages that don't offer a convenient passwordFile option in nix and the easiest way to get running is to specify a password in a way that ends up in the nix store.


I knew of a few options like that, but didn't realize it was so widespread. I wish Nixpkgs had a policy against writing modules that way. This creates a policy nightmare for companies that might use Nix. Alternatively, we need ACLs for the Nix store, or at least some store paths, at a minimum. :-\


The reason usually cited for why not BitTorrent for software packages is because it makes public what package versions you have and haven’t downloaded.

This information could be useful to an attacker.


Hardly. Plus I'd rather have a random bunch of people get a small fraction of such information rather than a single company see everything.


Only if you're only seeding what you're using. It would be easy to add an obfuscation layer that seeds more.


At one point there were BitTorrent transports for Debian's apt.



I wonder if it’s possible to easily seed portions/parts of giant files. A little bit like another project mentioned here but just mediated with torrents and volunteers.


Google and some napkin math tell me that this kind of workload should not be on the cloud.

A 4U or 8U rack server with 1 PB of storage and plenty of cores: ~$40K

Colo facilities in the Bay Area: $400-500/mo

Compared against the status quo of $9K/mo, the ROI ends up being 5 months per box, for twice the storage capacity they need. Set up multiple of those boxes around the world to the extent that you care about redundancy/distribution, and they still end up far ahead in a few years time.
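The payback arithmetic, spelled out (a sketch using the figures above):

  echo $(( 40000 / 9000 ))   # prints 4: each ~$40K box pays for itself in 4-5 months of $9K/mo cloud spend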


I am no fan of the cloud, but with that many disks, the failure rate goes up. So you also need someone to go to the DC. All the disks will be bought around the same time, which means failures start to cluster as well.

The machine also needs to be replaced every $interval. After 4 or 5 years, keeping it up is more hassle than replacing it.

A single machine needs patching and will need to reboot. What should happen to people trying to access the store?

Keeping replicas helps, but ensuring a petabyte or even half a petabyte of small files is in sync across multiple sites with various link speeds is not as trivial as rsync.


Most datacenters offer remote hands, so there is no need to travel to the DC for those kinds of things (just ship them the drives and have them replace them). One can also easily add extra spare hard drives to the pool that ZFS or mdraid can automatically put into production in case of a drive failure, so there is no need for active intervention after each drive failure.
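A minimal sketch of the ZFS variant (device names are placeholders; the ZFS event daemon handles swapping a spare in, and mdadm has an equivalent --spare-devices option):

  # ten-disk raidz2 pool plus two hot spares
  zpool create tank raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf \
                           /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk \
                     spare /dev/sdl /dev/sdm
  # allow automatic replacement when a drive in the same slot is swapped
  zpool set autoreplace=on tank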

And if any downtime for upgrades is unacceptable, one can go for distributed storage solutions like ceph, seaweedfs or minio. Or simply 1:1 mirror everything to a second machine (which you want for disaster recovery anyway) and switch over during the maintenance period. And since for this specific use case the server(s) would only serve as a CDN origin, a brief downtime might have very little actual impact.


If the NixOS team cannot handle this then I question their capabilities. This is cake work.


Handle it, yes, but is this "cheaper"?


As is, Nix would have to spend $108k/year to continue to host from S3. For comparison, OpenStreetMap's entire budget for infra/ops is 169,197/year (FY2023).


Of the cloud storage providers, AWS S3 is the most expensive for both storage and bandwidth.

A half petabyte at other places such as wasabi would cost 35k a year, including bandwidth.
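That figure lines up with Wasabi's published per-TB rate (assuming roughly $6/TB/month; a sketch, not a quote):

  echo $(( 500 * 6 * 12 ))   # about $36k/year for half a petabyte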

OpenStreetMap, by comparison, is 196GB compressed, including full history. 20 years of full snapshots would amount to only 0.2PB, or 20k a month to host on cheaper storage providers.

I don't think using OSM's budget was the best example.


Sounds about right. You might possibly be able to bring the server cost down to like $20K if you really turn every stone looking for good offers (for a storage server like this you can probably go with a refurbished machine; the demand on the hardware is really nothing to write home about), but you should be paying hundreds per month, absolutely nowhere near $10k/month.


S3's costs are insane for this use case; I can't imagine using it for a Linux distribution's repositories.

edit: Oh man, there's potentially a $32k migration fee to pay for S3 egress, as well.


"We will cover egress and migration fees for customers migrating over 10 TB of data from US, Canada, and Europe regions, and storing it with us for at least 12 months." - https://www.backblaze.com/b2/c2c-migration.html

Also free egress from Backblaze to CDNs (Cloudflare or Fastly).


Hopefully France is successful in suing AWS for their anti-competitive egress fees. Either way, we've avoided any potential land mines like this by switching to Cloudflare R2.


Ofcom in the UK has proposed an investigation of the big three cloud providers' services for high egress fees[1].

[1] https://www.ofcom.org.uk/news-centre/2023/ofcom-proposes-to-...


Why just AWS? Google leases dark fiber to the other telcos; it basically runs the entire internet. That it charges egress fees at all is kinda nuts. They make 100% profit (or more) on your egress from GCP.


> Oh man, there's potentially a $32k migration fee to pay for S3 egress, as well.

I've known about this for a while, but never bothered to ask. Why is that kind of fee even legal? Is there anything pro-social or valuable about that kind of business practice?


Why should it be _illegal_? AWS egress fees are clearly documented, it's not like they are some kind of hidden deceptive fee.

Companies are free to impose whatever fees they want. You can simply just choose to not have business relationships with them. Monopolies complicate the question, but AWS is not a monopoly.


Because it may be viewed as an anti-competitive business practice intended to prevent customers from switching providers.

Yes, AWS’s customers knew what they were signing up for, but AWS’s competitors and potential competitors have a right to enter the market and compete with AWS, and they can’t do that effectively if AWS is hoarding all the customers with punitive switching fees.


> Because it may be viewed as an anti-competitive business practice intended to prevent customers from switching providers.

S3 egress rates do not prevent you from switching. Transferring out data costs about the same as 3 months of regular storage.

> AWS is hoarding all the customers with punitive switching fees.

The fees are not punitive. The Internet egress fees are similar to inter-region transfer fees within AWS itself.


Sounds like a libertarian trope. Their house, their rules.

AWS and all the clouds are too big. They structure prices to milk the most out of their captive customers. They all price their products at rough parity, so on the whole there isn't a huge difference in moving, which also makes for a nice natural churn between them. The clouds themselves offer almost no innovation in the computing space.

You know who the real rubes are? Everyone else: chip manufacturers, storage vendors, etc. They all made commodity hardware using shared protocols, and people are just pissing it away. And the stuff replacing everything is worse than what's getting recycled: hardware, ideas, all of it.


> Sounds like a libertarian trope. Their house, their rules.

Well, yes. I support regulation when it makes sense, but this situation isn't it.

> They structure prices to milk the most of their captured customers. They all price their products within parity, so on the whole there isn't a huge difference in moving.

You've got to make your money somehow. And the cost structures of the various large providers are pretty similar; there's no shady conspiracy here.


You asked why it isn't against the law, and the law about these things is libertarian.

The law says that since the fee was clearly communicated to you and you had a choice to use the service or not, there's nothing wrong with that.


> Companies are free to impose whatever fees they want. You can simply just choose to not have business relationships with them.

Maybe in libertarian paradise, but this line gets parroted a lot for something so patently untrue. Even in the US there are plenty of laws that rightfully limit the freedom of vendors for the benefit of consumers.


> Companies are free to impose whatever fees they want.

Wrong. You seem to be unaware of the whole concept of anti-competitive behavior.


You may be conflating ideals with the limited circumstances in which this is actually enforced, and with other countries such as those in Europe. If it's a US company, then there is little meaningful regulation: "laissez-faire" and "the invisible hand" are the prime directives.


No, I never claimed that US is currently enforcing antitrust laws often. But that does not mean they don't exist. See https://en.wikipedia.org/wiki/United_States_antitrust_law


It's not like AWS charges specifically for export.

This is their "normal" egress fee. Maybe too high, but it's not a "gotcha" that applies just to migrating away.


Why would that be illegal? That's just the normal egress fee times the GB stored. It's known in advance, so anyone storing this much data in the cloud is willingly putting themselves in this position.


Why would it be illegal? You agree to it when you sign up for the service. Don't like it? Don't use the service.


Because egress actually costs money.

Almost all cloud providers charge an egress fee or have some limits on egress, except Cloudflare.


The big clouds - AWS, GCP and Azure - all have utterly ridiculous egress fees compared to every non-"cloud" hosting service and compared to what ISPs charge for these kinds of connections. For example, AWS charges $90/TB while a typical price is more like $1/TB.

It's almost certainly done on purpose to discourage you from switching away, and in the hope that you won't notice until you've deleted all other copies of your data.


The question is why egress costs this much in the first place though. All cloud providers pay for the aggregate bandwidth, but its cost seems substantially lower than what they charge for the traffic volume:

https://blog.cloudflare.com/aws-egregious-egress/

https://www.lastweekinaws.com/podcast/aws-morning-brief/how-...

Also note that on many smaller VPS providers, customers rightfully pay for bandwidth (port speed), not total traffic. So this charging-by-traffic appears to have no explanation other than predatory practices.


It does seem like anti-competitive pricing.


If anything, it's their own competitiveness they're harming.


In case I wasn't clear, my argument is that egress is most expensive for someone trying to migrate away from the system (assuming a big dataset with relatively low traffic). When comparison shopping at the beginning of a project, people often neglect the cost of someday wanting to switch. So this can be hidden vendor lock-in.


Outbound traffic from S3 is not free. If the data is emigrating, of course you need some outbound traffic.


Why doesn't it cost to go in?


So then you have to pay to get it back.


Because in peering agreements it is convention to charge for only ingress or egress, whichever one is bigger. Most AWS servers use more egress bandwidth than ingress bandwidth, so AWS only charges for egress.


I know we want to jump to hating AWS, but is there any info on the breakdown of this?

Like, who is charging the amount? Is it all AWS, or is there a Cloudflare portion? Is part of it paying for an engineer's time? How much data is being transferred?

If it were pure egress fees, then it's roughly ~350 TB of data?
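A quick sanity check on that number, assuming the whole $32k is egress billed at S3's headline internet-transfer rate of about $0.09/GB (tiered discounts above 150 TB would push the real volume somewhat higher):

  # How much data does $32k of S3 egress roughly correspond to?
  fee_usd = 32_000
  rate_per_gb = 0.09   # headline rate; an assumption, ignores tiered discounts
  print(fee_usd / rate_per_gb / 1000)  # ~355 TB, in line with the ~350 TB guess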


$0.12 per GB.

Four grand. It can be done cheaper over time.


They should rebrand it Data Impound Lot.


There’s a Snowball device for this kind of migration, which would be much cheaper.


I don’t know much about NixOS but I imagine the data in those buckets exists elsewhere? There could be a community effort to push data to the new provider and avoid the egress fee (or some of it)?


AWS Snowball (export) is $30/day (10-day minimum) + $30/TiB.
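Taking those rates at face value, plus the ~425 TB cache size mentioned elsewhere in the thread, a rough estimate:

  # Rough Snowball export estimate; rates from the comment above, size from the thread.
  days, per_day = 10, 30        # 10-day minimum at $30/day
  size_tib, per_tib = 425, 30   # $30 per TiB exported
  print(days * per_day + size_tib * per_tib)  # 13050 -> ~$13k, well under the ~$32k egress quote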


And for just $12,500, you can keep the device instead of returning it!


At least use one of those s3 competitors with sane pricing.

GitHub? Backblaze? Cloudflare?


> same pricing

I think you meant "sane"? Otherwise there's not much point in switching.


Yeah, that was an unfortunate typo.


Some things can only be said as a joke or by accident or both.


The only thing I wonder is how much space it would use if the nar files were stored without .xz and in a store that does smart deduplication. Because of the way Nix works, there's going to be a lot of e.g. ELF files that are nearly identical except for the rpath.


See this post in the original discussion regarding flokli's work on tvix-store, which provides a completely different storage model, while still being able to provide NAR compatibility by producing them on demand:

https://discourse.nixos.org/t/the-nixos-foundations-call-to-...

The tvix project itself is quite fascinating and worth a look. It reminds me a bit of the Gentoo times when Paludis came around and blew Portage's performance out of the water - but tvix seems to care more about compatibility with the existing ecosystem instead of being a competitor.


tvix-store is neat and I look forward to what it might bring in the future (along with the rest of tvix). However, there are many deduplicating storage systems that exist right now. Switching to one of them could potentially be an intermediate solution, though compression would have to be solved; perhaps a low-memory version of zstd for compressing in transit would make sense.


It might be worth a try to run Caddy, which supports zstd out of the box (the only webserver which does so to my knowledge), in front of an uncompressed store to see whether on-the-fly encoding/compression is feasible.

https://caddyserver.com/docs/caddyfile/directives/encode#enc...


2 days ago someone said they had thousands of dollars of AWS credits to use.

https://news.ycombinator.com/item?id=36149966


The sponsor could use this to enable a one-time migration out of S3, into something like Cloudflare R2.

1. The sponsor generates an IAM user and the corresponding access key to access objects in the NixOS buckets.

2. The NixOS operator grants access to said IAM user and (this is important) enables "requester pays" on the buckets so that the sponsor is billed.

3. The sponsor's keys are added to Cloudflare R2, which pulls objects from S3 and stores them.
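A minimal sketch of step 2 with boto3, assuming a hypothetical bucket name (the R2 pull in step 3 would be configured on the Cloudflare side):

  # Hypothetical sketch: enable "requester pays" on the cache bucket so that
  # reads made with the sponsor's credentials are billed to the sponsor.
  # The bucket name is a placeholder, not the real NixOS value.
  import boto3

  s3 = boto3.client("s3")
  s3.put_bucket_request_payment(
      Bucket="example-nix-cache-bucket",
      RequestPaymentConfiguration={"Payer": "Requester"},
  )

  # The sponsor's reads must then pass RequestPayer="requester", e.g.:
  # s3.get_object(Bucket="example-nix-cache-bucket", Key="nar/example.nar.xz",
  #               RequestPayer="requester")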


I hope there's no room for error there. It would suck if the gifter ended up with unused credits plus a huge bill. :)


I've never heard of LogicBlox, but they were sponsoring $9000 a month? That's crazy; normally even big sponsors will give a one-off $5000. Well, well done to them.


The author and lead maintainer of Nix used to work there for some years, as well. Maybe he pushed for it, or maybe they saw both things as similar kinds of investments in their stack.


That seems like the perfect use case for IPFS. I can even bet that if the people at the NixOS foundation played their cards right, they could get the Filecoin bag holders to pay for the development of this new feature.



The nix store seems on the face of it like a perfect use case for something like IPFS


Unfortunately it currently isn't, because nix packages are not content addressed. They are "input addressed", which means that when requesting the contents of a derivation, you have to trust the source to give you the right bytes.

There are efforts underway to make things content addressed. This doesn't solve the trust problem -- you still have to trust someone to tell you which inputs map to which content address -- but it does help with the distribution problem. Mirrors no longer have to be trusted. Trust and distribution become decoupled.
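A toy illustration of the difference (not how Nix actually implements either scheme): with content addressing, any mirror can be verified against the hash alone, while an input-addressed path needs a signature from a trusted key.

  # Toy sketch only: content-addressed fetches verify themselves, so mirrors
  # need no trust; input-addressed paths rely on a trusted signer instead.
  import hashlib

  def fetch_content_addressed(mirror_fetch, expected_sha256: str) -> bytes:
      data = mirror_fetch()
      if hashlib.sha256(data).hexdigest() != expected_sha256:
          raise ValueError("mirror served wrong bytes")  # any mirror works, none are trusted
      return data

  def fetch_input_addressed(mirror_fetch, verify_signature) -> bytes:
      data = mirror_fetch()
      if not verify_signature(data):  # trust anchored in the signing key, not the bytes
          raise ValueError("signature check failed")
      return data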


If you went "meta-nix" and had a nix package hierarchy that was just constructed out of operations and inline literal binary blocks, then the results of those purely functional operations (splice, insert, f(n), swap, etc) could be content-addressed and deduplicated.

Views of purely functional data transformations of f_tran(n)(addr_start:addr_end), but if you symbolize it, you make it generic. The problem is that Nix isn't Haskell enough.

It also sounds like if they stored the cache on a deduplicating file system, their storage costs would plummet.


> If you went "meta-nix" and had a nix package hierarchy that was just constructed out of operations and inline literal binary blocks, then the results of those purely functional operations (splice, insert, f(n), swap, etc) could be content-addressed and deduplicated.

How do you build with an arbitrary build tool then? You have to call out to gcc/any other compiler in the process.


It adds another layer of indirection; you can still consume the artifacts of the process using normal tooling.


I might be missing something: how do you decide a priori, before building, what the output hash will be?


IPFS is not magic you can just sprinkle on things, say "p2p! decentralised!", and be done.

You still need to either pay someone to "pin" it, which is more expensive than dumping it on S3, or do it yourself, which is actually harder than just storing it. Pinning services (that's a thing) are all sketchy, as the big ones are all somehow connected to NFTs.


As a NixOS user (who doesn't even use the binaries), I would happily pin NixOS packages!

The point of IPFS is to decouple the names of resources from what's serving them. It's exactly what is needed for distributed libre software projects where many are willing to donate some resources, but few are able to donate all the resources.


But the problem here isn't distribution, it's durable storage. Something along the lines of Tahoe-LAFS would be far more suitable.


I had missed that the storage fees were significantly larger than the traffic fees.

I still don't know if Tahoe is appropriate though. It's been a while since I looked at it, but doesn't it work with a fixed set of storage nodes? That wouldn't work for nodes coming/going where people can join organically.


There has been work ongoing to support this in Nix for a long time, actually. Maybe this will help push that effort along, or direct more resources towards it.

See: https://github.com/NixOS/rfcs/pull/133


Or torrents


Yeah, it doesn't make sense for the team to keep hosting every single state of the distro just to speed up building some random dude's old/peculiar setup. It's almost like the Nix team is providing an artifact hosting service. The Nix team should expire old versions from their repository and let people pin what they need on IPFS.


My AppFS project [0] is basically this, a content-addressable remote package-oriented system.

[0] https://AppFS.net/


Can someone please enlighten my uninformed self as to why NixOS needs to store this mind-boggling amount of data, and why it would be damaging to the research community specifically if it were to be lost? $9000 a month is a LOT.


Nix works quite differently compared to other distributions. If you have eg. the hello package in Debian it specifies that it is dependent on the glibc C library with some version range. You can install another version of glibc and the package manager will consider hello's glibc dependency satisfied as long as it is within the version range.

In Nix, on the other hand, a particular hello package/binary is dependent on the exact glibc it was built with. That doesn't mean just the same glibc version; even if you changed something like glibc's build flags, it'd be considered a different version [1]. One of the benefits is reproducibility. If you build hello from the same nixpkgs revision as I do (barring any impurities), you'll have exactly the same hello with the same transitive dependencies.

Now, with hello this may not matter as much, because it's a simple program and glibc is fairly stable. But now imagine that you are doing some machine learning project and you want to provide a reproducible machine learning experiment. Building on a typical packaging ecosystem (eg. pip) is like building on quicksand. There are all kinds of subtle behaviors that can change and influence your experiment (eg. your machine learning library rewrites some compute kernels, and the small numerical differences pile up after a large number of layers). Though usually it ends at: my system has Python 3.11 and this experiment requires an old PyTorch version that was only built for Python 3.5 and earlier. The Nix approach gives you the benefit of pinning almost the whole world - your experiment and all its transitive dependencies.

Suppose you have a package repository like nixpkgs or Debian that contains tens of thousands of packages that are continuously updated. In Debian, if glibc is updated, it gets rebuilt and you are done. glibc has a stable ABI, so packages that were built against the previous glibc version don't need to be rebuilt. In nixpkgs, if glibc is updated, every package that uses glibc needs to be rebuilt due to Nix's use of exact dependencies. And this is true for almost all non-leaf packages --- a package update requires all packages that are (transitively) dependent on that package to be rebuilt. And you need to store all these builds so that users can download them; otherwise their machines would need to rebuild the packages on every update. This is how you end up with a gigantic binary cache.

[1] This is a bit different now that Nix supports content-addressing for non-fixed output derivations.
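A toy model of why a glibc bump cascades (hypothetical names, nothing to do with Nix's real hashing scheme): each package's store path hashes its own recipe together with the store paths of its dependencies, so changing one leaf changes every path downstream.

  # Toy model of input addressing, for illustration only.
  import hashlib

  def store_path(name: str, recipe: str, deps: list[str]) -> str:
      h = hashlib.sha256((recipe + "".join(sorted(deps))).encode()).hexdigest()[:8]
      return f"/nix/store/{h}-{name}"

  glibc = store_path("glibc", "glibc-2.37", [])
  curl  = store_path("curl", "curl-8.1", [glibc])
  hello = store_path("hello", "hello-2.12", [glibc, curl])

  # Bump glibc's recipe (even just a build flag) and every downstream path changes,
  # so every downstream package must be rebuilt and stored again in the cache.
  glibc2 = store_path("glibc", "glibc-2.37 --extra-flag", [])
  curl2  = store_path("curl", "curl-8.1", [glibc2])
  hello2 = store_path("hello", "hello-2.12", [glibc2, curl2])
  assert (glibc, curl, hello) != (glibc2, curl2, hello2)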


What's the compressibility of this? As a whole it looks highly compressible, but with something like S3 and everything being its own file, that may not be getting exploited.


The Nix store in and of itself should be highly compressible. Here's the output of "compsize -x /nix/store" on my BTRFS root with compress-force=zstd:6:

  Processed 974464 files, 327323 regular extents (751261 refs), 598287 inline.
  Type       Perc     Disk Usage   Uncompressed Referenced  
  TOTAL       40%       11G          28G          60G       
  none       100%      2.7G         2.7G         6.2G       
  zstd        34%      8.9G          25G          54G
Nix also supports automatic deduplication using hardlinks, which I have enabled in this case.


It seems they're already using compression at least for some assets, though a bit unclear which algo it is: https://discourse.nixos.org/t/nixos-foundations-financial-su...


The cache is highly duplicated, but the output of each derivation is compressed with xz, which severely limits options for deduping.


In comparison, it seems Debian is using about 1/100th of that: https://www.debian.org/mirror/size


I guess that reflects the fact that Debian is a distro from another time, when you had to be mindful of the resources you consumed.

But I too wonder if they really have to keep everything around.


With Nix one has hard dependencies, so a modification down the chain will cause a cascading recompilation.

As mentioned, they don't have to keep everything, but being able to check out any version has great value.


Sure, but at 425 TB I'm guessing they're storing either everything in full, or everything in full and uncompressed.

And compression can do a lot for text files like sources. For example (last time I checked), the Linux kernel source code was about 1.2 GB uncompressed, but ~60-70 MB in tar+gz format.

Surely there must be a way to reduce that size.


It's stored compressed with xz.


> I guess that reflects the fact that debian is a distro from another time, when you had to be mindful of the resources you consumed.

I am inclined to argue that the fact NixOS is having this problem and discussion goes to show that you *still* have to be mindful of the resources you consume :)

While public cloud resources may seem infinite to some, the depth of the pockets of those shouldering the resulting costs is not.


“That would incur a fixed cost of around $32k for the migration.”

They should consider Backblaze B2 if they want S3 API compatibility. Backblaze currently offers to cover the migration costs, assuming some spend commitment.


Personally, I'll shovel money to the TVL folks for tvix-store before I give a single penny that might end up in Amazon's pocket for S3 hosting...


As one of the TVL folks, we aim for cache compatibility (i.e. output path equality), so the NixOS cache itself is highly relevant to us, too.

Would we choose to host it on S3? Probably not, but that's where it is right now and we have to be realistic about whether or not that will change. Personally I don't think so, though even on S3 there are still some optimisations to be done.


I get that it's out of scope right now, but it'd be so cool to have a decentralized solution for regional caching


Just to clarify, the missing component in my mind is mirror advertisement. The need for manual compilation and review is a hindrance to casual participants; we need a config option that you just toggle (maybe a configuration struct with e.g. bandwidth limits) to opt in to IPFS or GNUnet distribution of the store items present on your system.


That's basically what every other distro does; it seems like Nix's architecture doesn't support this?


It does. Nix has an extremely powerful (and flexible) caching system, not to mention popular caching services like Cachix that are easy to integrate into your system.

Unlike most other packaging systems, though, Nix packages are rebuilt when there are changes to any of their dependencies. This results in a huge amount of storage used in their caches compared to other systems, not to mention the compute power required to rebuild those packages.

For example, let's say something like `glibc` updated with a new patch. Every single package that has `glibc` as a dependency needs to be rebuilt, along with every package that has any of those packages as a dependency. This could easily cause the rebuild of tens of thousands of packages, even if they are only dynamically linking to it.


openSUSE Tumbleweed also has to do this. I don't know what happens with NixOS in such cases, but with Tumbleweed we have to download and install all those new packages too. There have been two times in the last year where we had to download 2-3 TB of packages.

While we're talking about openSUSE, perhaps there's a way for them to exist on the openSUSE build system?


Maybe we should be using something like Guix grafts to save not just rebuild time, but space (and thus money).


Guix's graft system is a hack specifically made to get security updates out ASAP. Grafts should really only be seen as a last resort because they circumvent the primary benefit of using Guix as a functional build system (pure, reproducible builds). I'm not sure if Nix has a similar system.

IMHO I'd prefer to see compile-time caches like ccache or sccache be better integrated into Nix while still providing the guarantees that a functional build system provides. That alone would save massive amounts of compute power when rebuilding packages, as you'd essentially only need to recompile the files that change in a package. Not only on continuous build systems like Nix's Hydra, but also when just rebuilding packages locally. This won't solve the storage cost issue, but would definitely alleviate compute costs.


I've been curious but haven't seen any compile-time cache, even with an unsafe build. What's the state of the art for this today?


It's possible to bring something like `ccache` into your system's Nix sandbox. However, that change in the sandbox means all your packages have different hashes and you lose the ability to pull from upstream caches. The result would be rebuilding every single package when trying to update your system. Not a fun time.

The preferred way to do this is to override `stdenv` with `ccacheStdenv` on the individual packages that would actually benefit from the compile-time caching. Such as for large packages that you already have overrides for, or for nightly packages that get updated extremely frequently and don't have the chance to be cached upstream (such as firefox-nightly). The NixOS Wiki has a pretty good article on this: https://nixos.wiki/wiki/CCache

I personally use this for any packages I override in an overlay to reduce build times significantly.


Nix has security patch tech, too.


But AWS S3 is decentralized - all objects are replicated across a region's availability zones. /s


Nix supports mirroring. It's pretty easy to set up your own mirror.


Why are they not considering a decentralized mirrorlist?


The storage requirements are too large for most potential mirrors. Nix is not like any regular Linux distribution.


Anycast BGP and colo are the cheapest (and IMO simplest) way.


This seems to be about a cache of compiled stuff where it'll cost $32k to move the data out of S3. So... rebuild the cache somewhere else and drop the original data?


I think this is technically an option, but IIUC it'll have plenty of its own hurdles. The cache is populated by the results of builds run by NixOS's hydra build farm over a period of many years, and hydra stays busy enough that it's hard to imagine this approach being viable without standing up a separate cluster of build machines dedicated to the task. To complete it in a meaningful timeframe, the cluster might also need to be quite a bit bigger.

(NixOS hydra obviously has some spare capacity or it'd never catch up, but I would be surprised if it has enough idle capacity to rebuild everything over any meaningful timeframe without also significantly affecting velocity on everything else.)

Calling on the community to dedicate temporary build resources might work--but the systems need to be trusted, so it can't quite be Folding@home...

I guess if there's a bit of Pareto in the distribution of build output sizes, there might be meaningful savings in rebuilding the packages that take up the most storage and paying to move the small stuff. It might also be possible to triage the sources in the cache--re-fetch all that are still available from the original source or a mirror, and then just export the ones that either couldn't be fetched or could be fetched but the contents no longer match the hash.


Probably does take a lot of compute to recreate the cache, but doing it in order of requests works pretty well as a heuristic for most important first, gradually backfilling when idle. Adding machines to the build infra seems like less of a loss than paying egress fees too.

Overall seems plausible that this data store isn't just a cache of things that can be recreated at will. The sibling comment about source being unavailable (or still available but at a different location) for the earlier builds is interesting in that respect.


It is also probably impossible, given that some sources have completely disappeared.


I wonder if it would be cheaper to just rebuild some of these objects from the derivations. Isn't that the point of reproducible builds?


There has been a controversy around the cache.nixos.org server. The ed25519 cache signing key dates from 2015 and there is no plan to rotate it. It seems clear that a single key should not be relied on for that many years.

I was made aware of the matter when somebody on Matrix claimed that there is no plan to rotate the cache key. They said the reasons behind that are explained by Eelco D. in https://www.youtube.com/watch?v=yXPQdZh8o7c. Unfortunately, I couldn't confirm this. That person afterwards claimed that the video was cut and that part was removed intentionally.

If that's the case, it might be a reason why people stop paying for the cache server costs. Furthermore, if the foundation attempted to hide the facts by cutting the video and re-uploading it, that is, for me, a matter of dishonesty, and somebody should take responsibility for it. I believe this person is Ron E., the current director of the NixOS Foundation. But in any case, this claim should be backed by finding the original stream of the Nix 20th Community Panel recording.

For the rest of us, until some crypto geek fixes Nix to support certs and key lifecycles, the solution is to build a local cache with Hydra and flakes. There are many people keen on the subject; you can look for one on Matrix.


I can recommend Hetzner Storage Boxes https://www.hetzner.com/storage/storage-box?country=us - it's quite cheap (€40.60 for 20 TB).

Just order 25x BX41 boxes with 500 TB for €1,015. They do not support the S3 protocol, but have plenty of other protocol options (FTP, SFTP, HTTPS, rsync, mount as a network drive, ...).


These storage boxes have very narrow concurrent connection limits and are not suited for the problem in the OP without additional hardware/services.


Yeah, those boxes are meant for archival use only, and it shows: high latency, slow transfers, frequent maintenance downtime.

Not the SLA you're looking for.

