FWIW: a longstanding limitation for parallel downloads within Homebrew isn't architectural (it's not too hard to add!) but structural with respect to Homebrew's download sources: GitHub and others are very gracious with the amount of traffic we send their way, and we don't want to overtax services that have other major consuming parties.
(This is a perverse countereffect: small projects can make performance decisions that Homebrew and other larger projects can't make, because they don't have a large install base that reveals the limitations of those decisions.)
> FWIW: a longstanding limitation for parallel downloads within Homebrew isn't architectural (it's not too hard to add!) but structural with respect to Homebrew's download sources
I have heard that before.
Hmm... I wonder if you can get away with not doing parallel downloads, but just keep the sequential downloads going in the background while it is installing a package? It is the pause in downloads during an install that I find is the main slowdown in brew.
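The idea above is basically a producer/consumer pipeline: a background thread keeps the sequential downloads going while the main thread installs whatever has already arrived. A minimal sketch, where the bottle names and the sleep-based fetch/install steps are stand-ins rather than real Homebrew APIs:

```ruby
# Hypothetical bottle list; fetch/install are simulated with sleep.
bottles = %w[openssl sqlite python]

queue = Queue.new
installed = []

# Downloader: fetches sequentially in the background and enqueues results.
downloader = Thread.new do
  bottles.each do |name|
    sleep 0.01 # stand-in for one sequential HTTPS download
    queue << name
  end
  queue << :done # sentinel: no more bottles coming
end

# Installer: consumes bottles as they arrive, so the next download
# proceeds while the current bottle is being unpacked and linked.
while (name = queue.pop) != :done
  sleep 0.01 # stand-in for unpack/link work
  installed << name
end

downloader.join
p installed
```

Downloads stay strictly sequential (no extra load on the origin), but the install-time pause no longer blocks the next fetch.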
> I wonder if you can get away with not doing parallel downloads, but just keep the sequential downloads going in the background while it is installing a package?
I could be wrong, but I believe multiple people, including maintainers, have looked into exactly that :-)
(I also need to correct myself: there is some work ongoing into concurrent downloads[1]. That work hasn't hit `brew install` yet, where I imagine the question of concurrent traffic volume will become more pressing.)
Sure. There's also a reason no major OSS packaging ecosystem uses these protocols: the only thing worse than a slow distribution scheme is an unreliable one. Combine that with the (reasonable) lack of a reward scheme for seeding an OSS packaging ecosystem, and you have a distribution mechanism that's significantly more brittle than the current "throw a lot of bandwidth at it" approach.
(Among other technical challenges, like updating the P2P broadcast for each new bottle.)
I think you are misinformed: BitTorrent, for instance, is much more reliable than HTTPS alone. The reward scheme is built in already: the client uploads while it's downloading and installing, and prioritizes the clients it is downloading from. At worst, the reliability and performance are the same as the web seed.
Generating additional metadata at bottle build time doesn't appear to be much of a technical challenge either.
> The reward scheme is built in already: the client uploads while it's downloading and installing and prioritizes the clients it is downloading from.
These are asymmetric: brew runs at a point in time, and most people decidedly do not want brew running in the background or blocking while leechers are still being serviced. They want it to exit quickly once the task at hand is done.
> Generating additional metadata at bottle build time doesn't appear to be much of a technical challenge either.
That's not the challenge. The challenge is distributing those updates. My understanding is that there's no standard way to update a torrent file; you re-roll a new file with the changes. That means staggered delivery, which in turn means a long tail of clients that see different, incompatible views of the same majority-equal files.
> My understanding is that there's no standard way to update a torrent file; you re-roll a new file with the changes.
Kinda. You do create a new torrent, but you distribute it in a way that, to a swarm member, is functionally equivalent to updating an old one. Check out BEP-0039 and BEP-0046, which cover the HTTP and DHT mechanisms, respectively, for updating torrents: https://www.bittorrent.org/beps/bep_0039.html and https://www.bittorrent.org/beps/bep_0046.html
If that updated torrent is a BEP-0052 (v2) torrent, it will hash per-file, so the updated v2 torrent will have identical hashes for files that aren't changed: https://www.bittorrent.org/beps/bep_0052.html
This combines with BEP-0038 so the updated torrent can refer to the infohash of the older torrent with which it shares files, so if you already have the old one you only have to download files that have changed: https://www.bittorrent.org/beps/bep_0038.html
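The per-file property is the key point, and it can be sketched independently of any torrent library: hash each file separately, and an updated release re-uses the hashes of unchanged files. (Real v2 torrents use SHA-256 merkle trees over 16 KiB blocks; this collapses that to one digest per file, and the file names and contents are made up for illustration.)

```ruby
require "digest"

# One digest per file, BEP-0052-style (simplified: no merkle tree).
def per_file_hashes(files)
  files.transform_values { |data| Digest::SHA256.hexdigest(data) }
end

# Two hypothetical "releases" sharing one unchanged bottle.
old_release = { "wget-1.0.bottle" => "old wget bits", "curl-8.0.bottle" => "curl bits" }
new_release = { "wget-1.1.bottle" => "new wget bits", "curl-8.0.bottle" => "curl bits" }

old_hashes = per_file_hashes(old_release)
new_hashes = per_file_hashes(new_release)

# The unchanged file hashes identically across both "torrents", so a
# client holding the old payload only needs to fetch the changed file.
unchanged = new_hashes.select { |f, h| old_hashes[f] == h }.keys
p unchanged # => ["curl-8.0.bottle"]
```

With BEP-0038's reference to the older infohash on top of this, the swarm for the old release can service the shared pieces of the new one.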
That’s very cool! That addresses the basic update issue, although I would be surprised if there were a production-ready Ruby library for torrents that implemented these. The state of HTTP(S) in Ruby is sad enough :-)
(There’s also still the state/seeding problem and its collision with user expectations around brew getting faster, or at least not any slower.)
I agree with you about package manager usage patterns being a poor fit for seeding by end users. I definitely wouldn't want my computer to participate.
I could see institutional seeders doing it as a way to donate bandwidth though, like a CDN that's built into the distribution protocol instead of getting load-balanced to Microsoft's nearest PoP when hitting a GitHub `ghcr.io` URI like Homebrew does today. Or even better, use that as an HTTP Seed (BEP-0019) to combine benefits of both :)
Yeah, something at the institutional layer makes sense. Thank you for sharing these links!
(My skepticism around whether this makes sense for Homebrew might be obscuring it, but I’m generally quite a big fan of distributed/P2P protocols, and I strongly believe that existing CDN dependencies in packaging ecosystems are a risk that needs mitigating.)
There isn't an update issue. BitTorrent metadata are hashes; they get updated and distributed at the same time as the current URLs and hashes, in the same file (or maybe in a similarly named file right next to it) in the same pull request.
There is no state/seeding problem. The client downloads from the same HTTPS URL as always, but uses peers on an as-available basis to speed things up and reduce load on the origin.
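The fallback logic being described is simple to sketch: try peers first on an as-available basis, and fall back to the HTTPS origin, which is always there. The peer and origin fetchers below are stand-in lambdas, not a real BitTorrent or HTTP client:

```ruby
# Try each peer; a peer returns data if it has the piece, nil otherwise.
# The origin (HTTPS web seed) is the guaranteed fallback.
def hybrid_fetch(peers, origin)
  peers.each do |peer|
    data = peer.call
    return [:peer, data] if data
  end
  [:origin, origin.call]
end

flaky_peer = -> { nil }            # peer with nothing available yet
good_peer  = -> { "bottle bytes" } # peer that has the piece
origin     = -> { "bottle bytes" } # the HTTPS web seed

p hybrid_fetch([flaky_peer, good_peer], origin) # => [:peer, "bottle bytes"]
p hybrid_fetch([flaky_peer], origin)            # => [:origin, "bottle bytes"]
```

The claim is that the worst case degenerates to exactly today's behavior: every piece comes from the origin.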
The adjacent thread observes that there is an update issue, just one that has a technical solution.
> The client downloads from the same https url as always but uses peers on an as-available basis to speed things up and reduce load on the origin.
So some kind of hybrid scheme, which (to me) implies the worst of both worlds: clients are still going to hammer upstreams on package updates (since client traffic isn’t uniform), and every client incurs peering overhead that doesn’t pay off until the files are “hot.” In other words, upstreams still need to plan for the same capacity, and clients have to do more work.
(The adjacent thread observes that none of this is necessary if CDNs or other large operators do this between themselves, rather than involving clients. That seems strictly preferable to me.)
> These are asymmetric: brew runs at a point in time, and most people decidedly do not want brew running in the background or blocking while leechers are still being serviced. They want it to exit quickly once the task at hand is done.
Yes, brew exits when it is done installing; nothing would need to change about that if you used the BitTorrent protocol to speed up downloads. I'm sure you do have some helpful users who would volunteer to seed their caches, though, which would become feasible.
> That's not the challenge. The challenge is distributing those updates.
The metadata goes in the formula alongside the current metadata (URLs and hashes).
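Concretely, the proposal amounts to one extra field shipped through the same tap pull request. A hypothetical sketch, where the `btih` (infohash) key and all values are made up for illustration (Homebrew formulae have no such field today):

```ruby
# Hypothetical bottle metadata: an infohash next to the existing
# url/sha256 pair, updated in the same commit that bumps them.
formula_metadata = {
  url:    "https://ghcr.io/v2/homebrew/core/example/blobs/sha256:deadbeef",
  sha256: "deadbeef",
  btih:   "0123456789abcdef0123456789abcdef01234567", # made-up infohash
}

p formula_metadata.keys
```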
> My understanding is that there's no standard way to update a torrent file; you re-roll a new file with the changes.
You would only re-distribute the exact file that was downloaded, so a client can simply keep advertising the original torrent it downloaded.
But as you said earlier, brew is a point-in-time command, and this BitTorrent solution would only really work if brew switched to an always-on service. And I am not sure that many people want to do that, although I am sure some would.