Nix/NixOS S3 Update and Recap of Community Call (nixos.org)
107 points by ghuntley on June 9, 2023 | 83 comments



Interesting to compare with other distros: https://chaos.social/@Foxboron/110480984761804936

> It's wild to me that the monthly cost of the #NixOS cache infrastructure could fund the entire Arch Linux infrastructure costs for a year.

> It's 6 times the amount Arch gets in monthly donations, and ~4 times what Debian gets in monthly donations.


This is probably in part explained by the sheer size of nixpkgs. It's the largest and freshest (i.e. with the most up-to-date versions) package repository among Linux package managers (see https://repology.org/repositories/statistics/total).

Unlike Arch, it is also not purely a rolling-release repository: it maintains both unstable and stable branches (along with different versions of each). In fact, the NixOS cache keeps every version of every package that has ever been built and pushed to it, so binaries for all historical package versions remain available.

Finally, it also includes macOS binaries as well as Linux ones.

I'm frankly surprised it actually costs as little as it does given that it's using pretty expensive S3 storage for everything.


The cache also stores the source code used to create each build. There have been numerous cases where upstream code was deleted from GitHub or elsewhere and then rescued from the Nix cache.


What percentage of the total repo is source code? I would think the compiled artifacts could be tossed at a moment’s notice, but would not want to lose the source.


I wonder if there’s a more cost effective storage option besides S3.


S3 is definitely not in a great spot on the cost effectiveness axis, but it wins on a lot of other axes


The costs involved are so small it makes me a bit sad. Such an important and useful project is struggling over costs that are less than an average engineer’s salary?

This is something that tens of thousands of engineers are getting huge value from in their daily life. I know this is a pretty common problem in the open source world, but it still sucks to see.


We have split this up into short and long term. Right now the goal is just to resolve the immediate issue, and then to look further into a more scalable, long-term option. We haven't started an external fundraising campaign yet, as I hope to make progress with one of the providers first, if possible. I hope to have an update early next week.


But that’s the question though: are they all getting such huge value from it if none of them are willing to pay for it? Or is this an issue of Nix just not even asking for money?


Even OpenSSL previously struggled to get enough funding until it finally blew up and the situation became widely known. Things aren’t easy for projects led by volunteers.


NixOS stores its build artifacts in S3, but it's now too expensive, so they're exploring other options, including Storj, Cloudflare R2, and self-hosting.


Amazon literally hasn't reduced prices for S3 since 2016: https://www.petercai.com/storage-prices-have-sort-of-stopped...

Hard drive costs per GB continue to fall: https://www.backblaze.com/blog/hard-drive-cost-per-gigabyte/

Self-hosting is looking more and more attractive.


AWS has repeatedly released new, lower-cost tiers. Indeed, Nix is already using the Infrequent Access tier. That tier already existed when that blog post was made, but it hurts the author's point so they didn't mention it. The cheapest tier, Glacier Deep Archive, is $0.00099/GB.

There are still lots of reasons to consider alternative hosting providers (the biggest one: egress) but that blog post--frequently reposted and annoyingly misleading--is not a good summary of the situation here.


> The cheapest tier, Glacier Deep Archive, is $0.00099/GB.

Bad example. "Latency", if it can even be called that, is 12-48 hours.

It's not cloud storage, it's archiving.


Yeah, $0.00099/GB (roughly $1/TB per month) sounds cheap, but retrieving that data will cost you almost two orders of magnitude more than the storage, so not really such a good deal. Also, obviously the latency of the "glacial" tier is so awful that it is useless for anything other than archival of long-term data without any need for speedy restoration.
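
For a rough sense of scale, here is a back-of-envelope comparison in Python, using illustrative list prices (roughly $0.00099/GB-month storage, $0.02/GB standard retrieval, $0.09/GB internet egress) and the ~425 TiB cache size mentioned elsewhere in the thread; none of these figures are quotes:

    # Back-of-envelope: Glacier Deep Archive storage vs. one full restore.
    gb = 425 * 1024                      # ~425 TiB, treated as GB for rough math
    storage_per_gb_month = 0.00099       # GDA storage (illustrative)
    retrieval_per_gb     = 0.02          # standard retrieval (bulk is cheaper, slower)
    egress_per_gb        = 0.09          # internet egress, first pricing tier

    monthly_storage  = gb * storage_per_gb_month
    one_full_restore = gb * (retrieval_per_gb + egress_per_gb)
    print(f"monthly storage:  ${monthly_storage:,.0f}")
    print(f"one full restore: ${one_full_restore:,.0f}")
    print(f"restore vs. a month of storage: {one_full_restore / monthly_storage:.0f}x")

Even this crude math puts a single full restore at roughly 100x the monthly storage bill, which matches the "two orders of magnitude" above.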


Glacier Deep Archive can take hours to access a bucket, likely because the right tape has to be retrieved from a warehouse.


We have... a lot of terabytes in GDA. I'm well aware :) I'm trying to show the wide range of tiering options available to fit various needs. OP's article pretends that only Standard exists. You'd be a fool to raw dog 425TB of cold data on the Standard tier, and indeed Nix is not doing that.


Intelligent tiering was released after 2016 and is fantastic at reducing prices?


They also store them without any distributed software supply chain integrity. One compromised person and the whole system fails.

I wish they were less concerned with where the artifacts are stored and more with how nixpkgs can prove they are authentic and who authored them.

Nix has so many innovations over other distributions, but the massive step backwards in supply chain integrity still makes it too dangerous to use for any system you would not trust strangers with ssh access to.


NixOS binary caches are secured by public-key signatures, as well as HTTPS. Hashes for all source artifacts are checked by builders, and this is strongly enforced by a build sandbox that disables network access for derivations without a known output hash. NixOS is leading the effort toward reproducible builds (https://reproducible.nixos.org/), with many sources of nonreproducibility automatically ruled out by design.
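
For the curious, here is a minimal sketch (in Python, using PyNaCl) of roughly what that signature check involves, assuming the usual .narinfo "Key: value" layout and an ed25519 cache key in the "name:base64" format; this is an illustration rather than Nix's actual implementation, and it glosses over details such as References being store-path basenames in real narinfo files:

    import base64
    from nacl.signing import VerifyKey            # pip install pynacl
    from nacl.exceptions import BadSignatureError

    def parse_narinfo(text):
        # .narinfo files are simple "Key: value" lines
        return dict(line.split(": ", 1) for line in text.splitlines() if ": " in line)

    def narinfo_signature_ok(narinfo_text, trusted_key):
        # trusted_key looks like "cache.nixos.org-1:<base64 public key>"
        info = parse_narinfo(narinfo_text)
        # The signed fingerprint covers the store path, NAR hash, NAR size and
        # references, so a cache cannot swap in a different NAR for a signed path.
        fingerprint = "1;{};{};{};{}".format(
            info["StorePath"], info["NarHash"], info["NarSize"],
            ",".join(info.get("References", "").split()))
        key_name, key_b64 = trusted_key.split(":", 1)
        sig_name, sig_b64 = info["Sig"].split(":", 1)
        if sig_name != key_name:
            return False
        try:
            VerifyKey(base64.b64decode(key_b64)).verify(
                fingerprint.encode(), base64.b64decode(sig_b64))
            return True
        except BadSignatureError:
            return False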

Supply chain security is of course a massively multifaceted technical and social problem, but I’m curious what other distributions you think are doing it better in practice?


Nix packages are blindly and automatically signed by a single private key, last I reviewed it. Bribe or compromise one person and the entire model fails. There is a strong incentive to do so, too, given that many major blockchain projects use it for builds: hundreds of millions to be made by whoever impersonates the right maintainer before the right software release.

Also, with no signed commits or signed authorship, someone with GitHub access can just fake history and inject whatever they want after code reviews are completed, and it will then be blindly and automatically signed.

Some of the people with write access to the nixpkgs repo even have SMS recovery enabled on their GitHub recovery email accounts. One SIM swap to compromise all Nix users. I will not call them out, but go try an email password reset on recent committers for yourself. A malicious GitHub employee could also, of course, do whatever they want to an unsigned repo. Or a well-placed BGP attack. Lots of options. It is hard to prevent such things, but author commit signing would mitigate the risk and can be enforced.
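
As an aside, auditing how much of a repo's history carries signatures at all is cheap; a quick sketch using git's %G? signature-status format code (the repo path and commit count are just examples):

    import subprocess
    from collections import Counter

    def signature_stats(repo_path, n=1000):
        # %G? prints one status per commit: G = good signature, B = bad,
        # E = can't be checked (e.g. missing key), N = no signature, etc.
        out = subprocess.run(
            ["git", "-C", repo_path, "log", f"-{n}", "--format=%G?"],
            capture_output=True, text=True, check=True)
        return Counter(out.stdout.split())

    print(signature_stats("nixpkgs"))   # e.g. Counter({'N': ..., 'G': ...})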

I made my case for this to the nix team but in the end it was concluded people would stop maintaining packages if they had to do the bare minimum like commit signing or hardware 2FA. https://github.com/NixOS/rfcs/pull/34

All this is fine, but it means effectively a decision was made for NixOS to be a hobby distro not suitable for any targeted applications or individuals. It really sucks, because I love everything else about nix design.

Instead I am forced to bootstrap high-security applications using Arch and Debian toolchains, which are worse than Nix in every way except supply chain integrity, given that all authors directly sign package sources with their personal, well-verified keys. They have a ton of other security problems, and even their own supply chain problems, but they can at least survive phishing, a malicious mirror, or a SIM swap. It is a low bar that Nix sadly does not meet.


There are efforts underway. Look into content-addressed Nix and Trustix.

But the way I understand it, the current trust model is no different than any other package manager, so this hardly seems like a fair criticism.


Trustix and CAN are -fantastic- for the problems they are intended to solve, but they only cover the consumption side of the supply chain. You can use these to ensure the binaries are built from the published code, or that you are using the right published code. You have no idea whether the published git commits were made by a SIM-swap-compromised GitHub account, because maintainers are not required to do the bare minimum, such as signing commits.

Compare to arch, fedora, debian, and basically every other linux distro that has existed more than a decade. Every maintainer signs their own contributions with well known keys so they cannot be impersonated and so later stages of the supply chain cannot tamper with them.

Newer distros like Nix and Alpine decided to get rid of all that security overhead in order to attract a huge pile of randos as maintainers. I mean, it worked, but at a very high price.


I see. Yes, that is a bit disappointing.


I don't follow. The NARs are signed by a key that (presumably?) only Hydra has access to. (The signature is then recorded in the narinfo file.) The integrity comes from the nixpkgs history, plus the signature verification that happens when you pull the narinfo+NAR from a cache that Hydra pushed to.


How can Hydra know the nixpkgs repo was not tampered with? Maintainers impersonated?

How can anyone know the Hydra signing key was not tampered with?

These are problems other linux distros have solved for decades by just requiring maintainers press a blinking yubikey or similar to sign their contributions.


Nix will refuse to accept any cache substitution that it doesn't have a trusted signature for. And if you distrust their cache, you can always disable it entirely or replace it with your own.


Signed by a single centralized key, on an internet-connected stack that who knows how many people have control of?


Would a decentralized storage option work to reduce some costs incurred by the centralized storage companies (ie, P2P via torrents)? Offload the bandwidth and storage to community members willing to loan out their unmetered home connections for hosting the nixOS packages? Even corps or startups can loan out their infra to assist on an as needed basis and use the depreciation as some sort of tax write off.


In the old days of the early web, we called these "mirrors"


The equivalent of mirroring would be using a CDN. I understand that NixOS already does this through Fastly.

The problem is that the base data is substantial (425 TiB) and expensive to store on S3. Just mirroring the base data would not change this. Some sort of way to distribute the 425 TiB among multiple hosts/etc might help, but seems like that would be really complicated to manage reliably—much harder than finding a new sponsor or moving off S3 to some more efficient alternative.


The problem is ensuring that files will reliably be available for years. P2P doesn't do a good job in that regard.


Some work has been done on integrating ipfs within nix. https://github.com/obsidiansystems/ipfs-nix-guide/blob/maste...


All the major clouds charge close to 9 cents per gigabyte for data out (egress).

Cloudflare's R2 storage charges 0 cents per gigabyte out.

I don’t know how they do it.
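
The gap is easy to see even with a made-up egress volume (the real cache numbers will differ; these are just the headline per-GB rates):

    egress_tb = 50                                   # hypothetical monthly egress
    gb = egress_tb * 1000
    print(f"at $0.09/GB egress: ${gb * 0.09:,.0f}/month")   # typical big-cloud first tier
    print(f"R2 egress:          ${gb * 0.00:,.0f}/month")   # R2 bills storage and ops, not egress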


There was a recent thread titled something like "How do I serve 5TB of data" (I can't seem to find it now). MANY of the responses said that bandwidth in Cloudflare is 0 cents "until you reach 100TB, then they want to talk to you about their Enterprise plans". This was echoed by 5+ responses, so before you go relying on CF's free bandwidth, check on this.

We were looking to migrate a small photo service over to Cloudflare workers and R2 storage, partly because of the Bandwidth Alliance, but squashed that when we heard there might be surprise charges.


They started Bandwidth Alliance [0] [1] to reduce, or even eliminate, bandwidth charges. So I think they're sticking to that commitment.

I can't speak to the overall health of the alliance, though. I remember reaching out to one of their listed partners (I forget which) in the Alliance a couple of years ago about the egress fee for bandwidth originating from Cloudflare, and the response I got was "we can talk about that when your fees get huge". That wasn't the response I was expecting from a partner.

I just clicked on one of the listed partners [2] and the page still says "all bandwidth between Vultr and Cloudflare will be free of charge in the upcoming months." The Wayback Machine tells me that that statement hasn't been updated ever since it was created.

So without a clear number or scale, customers are still at the mercy of a call to the partner's Sales dept.

[0]: https://blog.cloudflare.com/bandwidth-alliance/

[1]: https://www.cloudflare.com/bandwidth-alliance/

[2]: https://www.cloudflare.com/partners/technology-partners/vult...


It seems the answer for Cloudflare is that the real price of their services is just a lot more hidden.

9 cents per GB is still a lot more than it needs to be. It's a form of vendor lock-in. BunnyCDN charges[1] 1 to 6 cents per GB depending on the region, or 0.5 cents and lower for the "volume network" with fewer POPs. They've done so for years, and it funded their entire operation until their first investment round last October.

[1]: https://bunny.net/pricing/


> LogicBlox, the longstanding sponsor of https://cache.nixos.org, is ending its sponsorship

Wow, I suppose this was bound to happen eventually, but LogicBlox built their entire prod infra on NixOS a decade ago.

Guess they didn't see the value in continuing to use such a niche distro.


They got acquired a while ago and are now making some changes within the larger entity.


I'm surprised replit is not helping out financially considering they now use nix to power their repls.



The other option is to rent a computer with lots of storage from hetzner / ionos or similar.


There's a hell of a lot more involved in hosting 400+ TB of data [1] than "renting a computer".

[1]: https://discourse.nixos.org/t/nixos-foundations-financial-su...


400TB of disk and a 100Gbe NIC in a single server is easily affordable hardware for a hobbyist, let alone any kind of organization

yeah, there will be maintenance and setup work, but we've (web software engineers) developed a weirdly extreme fear of self hosting, even in cases where we're being killed on transfer costs


Errr... What do you mean by "easily affordable"? A quick look on Amazon says that I'd be paying something like €18.80/TB, so buying 400TB would be €7,520 - far out of the range of the majority of hobbyists. Plus a 100GBE NIC would be another €800 or so, plus you'll need all the other ancillary things like a case and a processor and RAM and cabling and whatever the monthly cost to host it somewhere, unless your ISP gives you a 100Gbps connection (which I'm sure will be extortionately expensive if they do). End result is the hardware costs alone approaching €10,000 - significantly more if you want any sort of redundancy or backups (or are you planning to run the whole thing in RAID0 YOLO mode?), plus probably low-hundreds per month, not including the time investment needed to maintain it.
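
Rerunning that estimate, with the €/TB and NIC figures above as assumptions rather than quotes:

    tb_needed  = 400
    eur_per_tb = 18.80           # retail spinning disk, per the estimate above
    nic_eur    = 800             # 100GbE NIC
    drives_eur = tb_needed * eur_per_tb
    print(f"drives ~€{drives_eur:,.0f} + NIC ~€{nic_eur}, before chassis/CPU/RAM/colo")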


I mean, I spent about half that just on the computers in my apartment and I only have a 56Gbe LAN

I shouldn't say "easily affordable" like anyone can afford it, but rather something that you can do with off the shelf retail components (within the budget of a hobbyist software engineer, anyway)

it's cheap compared to hobbies like driving a sports car, at least!


You seem to have seriously skewed notions of what’s considered a hobbyist


Why? You can fit that in a single server. None of the data is unrecoverable. Most is binary cache for built packages and the ISOs can simply be rebuilt from what’s in git. They’re paying ridiculous amounts of money to serve easily rebuilt data.


Depending on how bad & frequent you want outages to be and whether you want the whole thing to go down for scheduled maintenance, it can get quite a bit more complicated. A pair of servers with some level of RAID is probably the minimum to achieve any amount of availability better than a server in someone's closet. Then you need a cold backup, on top of that, no matter how many live servers you have.

I've also had much better luck with connectivity, transfer speeds, and latency, for clients in some regions, with AWS than with some other hosts (e.g. Digital Ocean). It seems peering agreement quality really matters. Not sure how the big physical server hosts fare on that front vs. AWS (which is basically the gold standard, as far as I can tell), and it's hard to find such info without trying and seeing what happens. This matters if you're trying to serve a large audience over much of the Earth. The easy single-server solution might well see some clients lose 95% of their transfer speed from your service, or not be able to connect at all, compared with AWS.


The old binaries are not necessarily simple to rebuild. Projects move / go offline. Tags get moved. Binaries for the impure systems may not be public anymore. You can't easily rebuild everything from that cache.


this is an oversimplification that borders on absurdity.


Not absurdity; it was seriously considered on the Nix Discourse and Matrix. Hetzner offers very attractive options like the SX line of dedicated storage servers, three of which would be sufficient for this project. Alas, this still requires more consideration than the time available for the switchover.

https://www.hetzner.com/dedicated-rootserver/matrix-sx


S3 stores 3 separate copies across 3 AZs so for that redundancy (at 400TiB), wouldn't you need more like 6x 14x16TB servers if you're making a direct comparison?
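
As a rough sketch of the capacity math, treating it as three full copies (S3 actually uses erasure coding across AZs, so this overstates the raw space needed):

    import math
    data_tib    = 425
    copies      = 3                         # naive "three full copies" model
    per_box_tb  = 14 * 16                   # 14 drives of 16 TB each
    per_box_tib = per_box_tb * 1000**4 / 1024**4
    print(math.ceil(data_tib * copies / per_box_tib))   # ~7 boxes of raw disk,
                                                        # before RAID/ZFS overhead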


Yes, for redundancy it could be useful, but doesn’t change the affordability materially.


there is a human cost associated with operating this that s3 and other fully managed object stores don’t have.

there is just no reality where you build your own service for this with better reliability, less management, and less maintenance than s3 (and when i say s3 i’m including all managed object stores).

pointing at hetzner and saying, “look how cheap this is”, is missing the point.


_shouldn't_ it work that way?


I was tempted to jump in the initial post and offer help with self storage, but honestly as someone who has done large-scale self-hosted object storage (many PB of storage in fact), it's just one of those things that are almost always going to be better left to a dedicated service like S3/R2 or something else.

I'm glad to see their short and long-term plan will probably involve a subsidized professional block storage service. I think that's the right move.


Following securing the runway, I wonder if something like IPFS would add value here.


A P2P cache might make a lot of sense to reduce traffic to the servers, but you still need an authoritative copy of everything. Also, the P2P cache would be opt-in, and its effectiveness would depend on its adoption.


It's Nix; you only need a hash database for trust.


Sure, but someone still has to store it. If the foundation wants to keep the old builds, they still need to pay for that 0.5 PB somewhere.


I am not 100% sure how IPFS works, but it's where some NFTs are stored, right? Is it even possible to only store blobs you care about? Either way, it does solve the migration problem.


I wish Nixpkgs were available over peer-served networks, like bittorrent.


Since the output of Nix is typically hashed, it seems ripe for rsync-based mirroring? Is there something particularly difficult about NixOS compared to a typical Arch Linux mirror, for example?


Nix is easy to mirror. That doesn't address the root problem, which is simply that it's a lot of data.


I mean... 100tb is like 4 hard drives worth of data these days


Does anyone know what the following refers to?:

> ... and potential for replacing Fastly.

As far as I can tell from this and the previous post, the Nix cache isn't using Fastly?


Lol, AWS or Cloudflare could cover these costs easily. Let's see who steps up.

Tangentially, if they self hosted using say ZFS would the cached artifacts and other such things be candidates for the dedup features of the file system?


Bezos could cover my costs easily, yet he is not stepping up to help. It seems presumptuous to say that these companies should donate just because they could afford it. As distasteful as it seems, the open source community really needs a better funding model.


“Distasteful“? Really?

Their current sponsor isn’t a deep pocketed publicly traded cloud provider. All of these cloud providers advocate for open source. Cloudflare could come in and cover these costs and write them off as donations or something similar. For AWS they’d probably not even make a material difference in the accounting software.

I think you've got some deep-seated angst against the uber-wealthy (join the club), but I never mentioned Bezos. I mentioned these companies sponsoring selfishly, for the goodwill it would bring.

I’ll try to make my case better next time.


This is not "not stepping up". These things take time.

> AWS S3 - We have been in contact with AWS throughout the week and awaiting a response to meet with the OSS team.


Taxes.


> Tangentially, if they self hosted using say ZFS would the cached artifacts and other such things be candidates for the dedup features of the file system?

Yes, block-level deduplication would be hugely beneficial. If you're running Nix locally you can set auto-optimise-store = true, which enables file-level deduplication using hardlinks; that already saves a great deal of space.
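
If you want a rough idea of what that file-level pass could reclaim on a given store, here is a sketch that groups regular files by content hash and counts the bytes past the first copy of each (already-hardlinked files are skipped via their inode; the default path is just an example):

    import hashlib, os, stat, sys
    from collections import defaultdict

    def dedup_savings(root="/nix/store"):
        by_hash, seen_inodes = defaultdict(list), set()
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    st = os.lstat(path)
                    if not stat.S_ISREG(st.st_mode):
                        continue                      # skip symlinks, devices, ...
                    if (st.st_dev, st.st_ino) in seen_inodes:
                        continue                      # already hardlinked/optimised
                    seen_inodes.add((st.st_dev, st.st_ino))
                    h = hashlib.sha256()
                    with open(path, "rb") as f:
                        for chunk in iter(lambda: f.read(1 << 20), b""):
                            h.update(chunk)
                    by_hash[h.hexdigest()].append(st.st_size)
                except OSError:
                    continue                          # unreadable file, ignore
        # every copy beyond the first of each distinct content is reclaimable
        return sum(sum(sizes[1:]) for sizes in by_hash.values())

    if __name__ == "__main__":
        root = sys.argv[1] if len(sys.argv) > 1 else "/nix/store"
        print(f"~{dedup_savings(root) / 2**30:.1f} GiB reclaimable by hardlinking")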


The nix cache compresses store artifacts in NAR archives, which I think defeats de-duplication.

It's possible to serve a Nix cache directly out of a Nix store, creating the NAR archives on the fly. But that would require a) a big filesystem and b) a lot more compute.

Which are not insurmountable challenges, but they are challenges.


My org does the live-store-on-zfs model with a thin proxy cache on the front end to take the load off the CPU constantly regenerating the top-5% most popular NARs.

It seems to work well for us, but I am curious how that would look at 425TB scale— would filesystem deduplication pay for the losses in compression, and if so, by how much? How does that change if you start sharding the store across multiple ZFS hosts?


Dedup can be used with compression I thought.


I think you are confusing deduplication and compression provided by the filesystem vs. deduplication and compression provided by nix.

If you are just storing a static set of pregenerated NAR archives, you will not see any benefit from filesystem-provided compression or deduplication.

If you host a live nix store (i.e. uncompressed files under /nix/store), then you could benefit from filesystem-provided compression and deduplication. Also, nix itself can replace duplicates with hard links. But the downside is then that you have to generate the NAR archives on the fly when a client requests a derivation.

That might be worth it, especially since they get great hit rates from Fastly. But on the other hand, that means a lot more moving parts to maintain and monitor vs. simple blob storage.


Definitely. It's not hard to imagine that even if they end up rolling their own for this, it might make sense for it to be a bunch of off-the-shelf Synology NASes stuffed with 22TB drives in donated rackspace around the world, running MinIO. If the "hot" data is all still in S3/CDN/cache, then you're really just keeping the rest of it around on in-case basis, and for that, the simpler the better.


Yeah, that's what I was thinking. Say 3 nodes in 3 different data centers, all synced, running enough spinning rust to meet their needs and then some, plus the overhead that ZFS requires (i.e. the rule of thumb of not using more than ~80% of the usable storage to prevent fragmentation, I think), and then exposing that via an S3-compatible API with MinIO plus a CDN of their choice.


Is this done with a off the shelf tool that I could run? Or did you build this in house?


nix-serve is off the shelf: https://github.com/edolstra/nix-serve

Then put whatever cache you want in front.


We started there, briefly tried nix-serve-ng [1], and then landed on Harmonia [2], which has worked well.

[1]: Ran into https://github.com/aristanetworks/nix-serve-ng/issues/19 with it

[2]: https://github.com/nix-community/harmonia


de-dupe in ZFS is a "here be dragons" sort of feature, best left alone unless you're already familiar with ZFS administration and understand the tradeoffs it brings [0]

in this case, the files being stored are compressed archives (NAR files) so the block-level dedupe that ZFS does would probably find very little to work with, because the duplicated content would be at different offsets within each archive.

beyond that, there's also the issue that they would need a single ZFS pool with 500TB+ of storage (given the current ~425TB size). that's certainly doable with ZFS, but the hardware needed for that server would be...non-trivial. and of course that's just a single host, they would want multiple of those servers for redundancy. distributed object storage like S3 is a much better fit for that volume of data.

0: https://www.truenas.com/docs/references/zfsdeduplication/#co...
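
A toy demonstration of that last point, using deflate (via zlib) as a stand-in for the compression actually applied to NARs: two inputs that differ by a single early byte share almost all of their aligned 4 KiB blocks uncompressed, and essentially none once compressed, because the compressed streams diverge from the change onward.

    import os, zlib

    def shared_blocks(a, b, block=4096):
        # count aligned blocks of b whose exact content also appears in a
        blocks_a = {a[i:i + block] for i in range(0, len(a), block)}
        return sum(1 for i in range(0, len(b), block) if b[i:i + block] in blocks_a)

    base    = os.urandom(16 * 1024) * 64            # ~1 MiB with repetition zlib can exploit
    variant = bytes([base[0] ^ 0xFF]) + base[1:]    # guarantee the first byte differs

    print("uncompressed:", shared_blocks(base, variant), "of", len(base) // 4096)
    print("compressed:  ", shared_blocks(zlib.compress(base), zlib.compress(variant)))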



