CocoaPods downloads max out five GitHub server CPUs (github.com/cocoapods)
826 points by jergason on March 8, 2016 | 308 comments

Note how perfect that response from mhagger is. A clear, honest-sounding assurance of what Github wants to deliver. A perfectly comprehensible description of what the problem is and where it is coming from. And then a suggestion for a fix the project can actually work on, plus a mention of changes to git itself that Github is trying to make that would help. It not only shows great work going on behind the scenes (and if that is untrue, it at least gives me that impression, which is what counts), but also explains it in a great way.

I was astonished at how selfish/myopic/whatever alloy's response was.

To be blunt, you're abusing the shit out of SOMEONE ELSE'S product that you're not even paying for. Your first question shouldn't be to see what Github can do for you to make it so you don't have to make changes. You should be falling over yourself investigating all available avenues for reducing load.

It's an incredibly entitled way to think about things and I would have a real hard time employing someone whose first response was like this.

I don't know, it sounded to me like he just didn't totally understand what Github was saying. By the end of the thread, it seemed like everyone was agreeing. I wouldn't be comfortable using words like "selfish" to describe any of what I read.

I certainly don't think the barb about your willingness to employ people who write things on Github issues threads that you disagree with is helping anyone understand any part of this situation. I understand the urge to find ways to be emphatic about how much you disagree with things, and I often find myself compelled to write lines like that, but I think they're virtually always a bad idea.

> I certainly don't think the barb about your willingness to employ people who write things on Github issues threads that you disagree with is helping anyone understand any part of this situation.

It seems to be one of HN's go-to insults. "Look at this person's behavior, I would never hire them," as if everyone wants to work at your startup.

> as if everyone wants to work at your startup.

Do you really think that when people say "I would never hire that person" that there is an implication of "everyone wants to be hired by me (and by extension my company)?"

> Do you really think that when people say "I would never hire that person" that there is an implication of "everyone wants to be hired by me (and by extension my company)?"

Nailed it. Just because someone, a team, or a company has hiring criteria doesn't mean they assume everyone wants to work at their company. It means they have an idea of who they are looking for.

nope. they also want everyone to know.

it's the difference between buying a packet of your favorite snacks and telling all your friends what your favorite snack is.

you probably expect them to like the same snack.

But what if they respond with a snack I've never heard of and interest me so with its description that I've just found my NEW favorite snack.

Additionally what if they inform me about my snack with information that means I can't morally choose it anymore, or that it's dangerous to my health? I now have the opportunity to switch my viewpoint, or reduce the weight that it has in my criteria.

It begins the discussion if you as the person starting the thread are interested in having it and not just looking to be agreed with. I, whether I'm in the minority or not, am always looking to start the dialog. Being agreed with is boring.

You're right. The obvious answer is to never communicate with other people... ever. Communication with other people implies that you are full of yourself and looking to show off to other people how awesome you are.

I didn't say I disagree with his statement. I'm saying, maybe more implied, that I'm not going to hire someone who displays a lack of interest in finding a real solution to a real problem that has very much to do with what they're trying to build. And on top of that shows a serious streak of entitlement and a lack of empathy towards the very service they're essentially abusing and not even paying for.

I wonder how many times CocoaPods has ruined someone's day/night on some GH team. I wonder how many dinners some mom or dad has missed with their kids because their service alarms are going apeshit. I don't think it's hyperbole to say that if you are a top 10 repo at Github and you are hammering the system, you are responsible for ruining individuals' days and taking time away from their families.

Now, these are entirely my opinion and I'm not saying alloy is bad at what they do. I'm saying that is a collection of attitudes that I'm not going to put on my team.

I think it is a hyperbolic statement to say they're responsible for ruining people's days.

Let's think this through and ask ourselves a few questions.

1. Did they go out of their way to do damage?

2. Are they responsible for deciding the infrastructure and how well it can handle the load?

3. Did they force said people to work at GitHub?

4. Is open source culture and hosting a major part of GitHub's business plan?

5. Are they responsible for staffing to ensure people are scheduled to work when work needs doing?

Yup, only matched by 'Great article! We also use Javascript at <randomStartupNameNobodyEverHeard>.'

I wonder how many of the people that say that actually are employed in a position where they get to make hiring decisions.

Even as a non-manager I was employed in a position to make hiring decisions. Do you ever get brought into interview loops? You're part of hiring decisions. You may not have final say, but you are hopefully very much listened to.

Really? Alloy specifically said they were blasting a free service's infrastructure for their own benefit. Told about the issues, the response was basically Alloy et al wanted to invest no time or money into a better situation. Even cited HR and funding benefits.

That's selfish to the point that it could be a textbook example of an externality. Fortunately, like you said, things got agreeable by the end with Alloy taking simple steps he was given to make things better for everyone.

> Told about the issues, the response was basically Alloy et al wanted to invest no time or money into a better situation.

That's actually not true.

https://github.com/CocoaPods/CocoaPods/issues/4989#issuecomm... is the answer we are talking about, aren't we? What alloy is doing there is:

1. Thanking mhagger for the response

2. Asking for additional explanations

3. He then explains why the project is taking the route they took, and the benefits for them. Explaining alone does not mean unwillingness to change. It just makes sure it is clear why things are the way they currently are. Yes, that mentions the time and money benefit, but so what – it's honest, and it is valid to limit expectations on what is possible now.

4. He then indeed asks for a discussion on how to improve the current system without changing it completely. But this does not mean not investing time; on the contrary: he actively invites a continuation of the discussion and already makes clear that he is indeed willing to work on a better solution, and that is the core point making this a constructive answer.

How the discussion continues in my eyes clearly shows that the negative interpretation of this first answer, in your comment and this thread, is wrong. That's not someone blocking change, not even at first, that is someone asking for clarification and even clearer tasks to do. That's not a bad thing at all.

"> Told about the issues, the response was basically Alloy et al wanted to invest no time or money into a better situation."

"That's actually not true."

"taking the route they took, the benefits for them... Yes, that mentions the time and money benefit, but so what – it's honest, and it is valid to limit expectations on what is possible now."

So, it was true. Then the dialog continued from there with Alloy paying attention to the situation. Alloy refused to do any significant work on their end, for their maximal benefit, in a disruptive situation involving a free service. He even admitted they were using a service for something it's not designed for but didn't care about moving to a more appropriate one. Hence my calling it selfish.

Eventually, others had invested their own time and energy into the problem enough to come up with some simple recommendations for Alloy that take very little effort on his part. Alloy summarized those and agreed to attempt them. Thread was closed before we could see where that went.

Given the above, I stand by my claim that he was a selfish individual pushing his own liabilities onto others wherever possible. Even the remedy was mostly on others. On the other hand, he might make a good capitalist w/ that level of exploitation and externalizing. :)

Saw how you went from "no time and money" to "no significant work"? My point is that he signaled explicitly that he was willing to invest time and work. He literally writes so:

> I.e. I’d like us to continue this discussion, at first, from the notion of us maintaining the existing architecture. Where things are absolutely impossible, it would be great if you can include more links to docs/source that explain why things are impossible.

Notice the "at first", notice also the following proposed action of working with a snapshot.

You made clear you were expecting another reaction. And I see why. Still, I think you miss how much good faith was contained in this response. It is a bit sad this gets overlooked. People forget text is not that easy to interpret.

"Saw how you went from "no time and money" to "no significant work"? "

You're right that text is not easy to interpret. For instance, there were two interpretations of my text: a literal and precise one that focuses on how much effort I say he would commit; an interpretation that realizes I was speaking figuratively with hyperbole. It was the latter. The message was a counterpoint supporting that he was selfish rather than a precise statement of how selfish he was. You would be 100% right if we were talking literally about him such as in a court filing or HR report.

"Notice the "at first", notice also the following proposed action of working with a snapshot. You made clear you were expecting another reaction. And I see why. Still, I think you miss how much good faith was contained in this response."

This is possible. Let me re-read his post first. Alright, done. Here's a re-review.

His first response starts with thanks and statements that show either (a) an incredibly joyous and friendly personality or (b) brown-nosing of a salesman before a pitch. Horizontal line. Unclear on some things. Asks for more information. We then get to the reasons:

1. Did no work on syncing data to reduce funded development hours.

2. Don't want to operate a repo due to reduced effort or funding.

3. Easier for their users and adoption.

These are all self-centered. Honest as you said but already support my claim of selfishness. Let's keep looking. Upon a suggestion of other packaging systems, vaguely claims they are using a "smarter" method then reiterates HR and funding justifications above. Ignores alternatives in next sentence to reiterate their existing, strange, and broken solution with a dismissal about having to build a cathedral rather than just using existing solutions.

So, Alloy already laid a foundation of total selfishness in terms of time, funding, and design inflexibility. At this point, Alloy is interested in solutions that totally maintain their existing design and lack of commitment to anything else. Offers to make a few simple changes that "would still use your resources." Asks for information that basically leads to those in recommendations that they begin to apply.

So, re-reading his post, it comes off as incredibly selfish using text that's not hard to interpret. He clearly believes their design works, won't be changed unless forced, changes must take little effort from them, they must not use their funding, and must specifically use GitHub's resources. My claim of selfish and externalizing is fully supported at this point. I think the other commenter's claim of being "myopic" about what he's doing in the project is accurate, too.

yeah, my read was that he didn't totally get the context of the package dist / CDN suggestion immediately. i think the github peeps probably understood that they were using this approach because the developer workflow was simple and straightforward, but as you scale up, simple and straightforward approaches often break down, and CocoaPods seems to be hitting this tipping point with using Git and Github as a package dist system.

this is what we used to call the "good" problem (things breaking because they are successful), but that doesn't make it cheap or easy to fix. the other stuff they're talking about in the thread will alleviate some pain and buy some time, but it won't solve the fundamental problem if CocoaPods continues to get "big" (imagine apt or yum trying to run like this).

i understand their want to maintain the simple and coherent workflow... if i was writing a package dist sys, i would love to have it work off a standard git repo. maybe this is something that could be solved with a plugin architecture like the large binaries stuff so that developers could continue with their preferred workflow but end-users could take advantage of a CDN-like system for distribution.

> my read was that he didn't totally get the context of the package dist / CDN suggestion immediately.

This is mine as well, but it's also troubling to me, given that the repo in question is meant to be a package management system; it means there are fundamental holes in the user's understanding of packaging systems.

My mentor has a lot of contempt for the bevy of packaging solutions that people come up with - invariably people look at the old ones, think they're too complex and wrong, write up new code that is Slick(tm) and Fast(tm) and Cool(tm) and they are... until they hit scale. Whether that scale is number of users, or serving multiple environments, or serving a great many packages of different versions... the lack of domain knowledge in the design stage will cause huge amounts of issues.

There is no old "xcode"-compatible package system. The issue here is that the C++/Objective-C/C ecosystem doesn't have a standardized module system. CocoaPods fills a good part of the gap.

No, if this way of thinking came out in an interview, where the candidate's first response to the hypothetical was basically "I don't really understand what the problem is but I'm sure 'they' can mostly fix it" (and obviously I'm paraphrasing into how I interpreted Alloy's response), I wouldn't hire them.

Now granted, I'd try to give them a chance to step away from that statement and let them show me that they are interested in understanding the issue and interested in reducing their impact on the product. I'm okay if they don't yet know HOW, but if they basically just throw up their hands, say it's someone else's problem, and leave it at that then no. In fact, hell no. I'm not interested. It shows a level of entitlement and a lack of interest in their craft and I will not subject my team to someone like that.

I had the same impression from alloy's response. I've basically read it as "hi ho we will not change anything".

And it had this passive-aggressive ring to it, with the hand clapping and the hurray in the beginning and the stonewalling in effect.

Had exactly the same feeling. I felt bad for the GH employee who responded. He was helpful and thoughtful, made it clear what the problem was, offered advice and promised to make whatever's possible on their end...

...only to have someone come up and act like a paying customer whose expectations weren't being met. He answered suggestions by saying something that comes down to "I don't understand, can you repeat please", and never quite grasped that if he wants a better experience for his users, he also needs to work for it.

The introduction to the response, in typical douche manager style, was the cherry on top.

Even if they were a paying customer, what would you do if GH couldn't support that volume? Obviously there are SLAs in place if you're big enough, but at the end of the day, if they simply can't handle the volume, what are you going to do? 'Nothing' isn't the answer if you want to maintain users. You'll have to solve the problem no matter what; it just seems really egregious to me to have that response when you're not giving anything back (money or time) to the team that is essentially allowing your service to exist.

That was the smuggest use of emoji I've seen in a while

I took it to be more defensive than passive-aggressive.

I took the emojis to be the exact opposite of the almost sarcastic tone I think you're interpreting them to have.

I guess even with emojis, it's hard not to make tone ambiguous.

> I guess even with emojis, it's hard not to make tone ambiguous.

...probably especially with emoji.

Yeah. "That's your problem, and we don't want to change anything because that's a non-zero cost for us just to fix a cost on your side."

And what the hell was with the quotes around "free"? Are you paying? No? Then there's no quotes about it.

Not that I'm in the habit of giving people the benefit of the doubt, especially when it comes to mangling grammar, spelling, and punctuation... BUT, I've noticed that awfully many people nowadays seem to think that quote marks are some kind of emphasis sign. So those of you who ARE in the habit of giving people the benefit of the doubt on shit like this might take that into consideration here.

I think you could assume better of people. His response was maybe a bit tone deaf, but text is a very poor way of communicating mood and circumstance. There are any number of credible explanations, from tiredness to language skills, that could make sense of the response you're offended by.

fwiw: "I would have a really hard time working for someone whose first inclination was always towards criticism over accommodation or compassion." But then I also acknowledge there may be a whole bunch of other stuff going on here behind the scenes. ;)

That's valid, everyone has the attitudes and traits they're looking for on both sides. I should have better clarified in my first statement that I in fact would give them the opportunity to walk that statement back to show their interest in fixing it, and more importantly show some empathy to the GH team.

My biggest problem is that I can't imagine a time, even if I don't understand the problem, where I would say that I'm not interested in reducing the abuse of a free system that can't handle my load. He basically said he had better things to spend his time or money on. That shows a motivation/attitude that would likely be there regardless of the issue or whether they understood it. That's his knee-jerk reaction, which I would say is probably his most honest, and it seems very selfish and not empathetic to the GH team at all.

I'll caveat this with the fact that I'm a coworker of alloy's and orta's, so I'm admittedly biased to seeing them in a positive light.

But I think you may be reading his tone more negatively than necessary. What I see is him starting off by expressing gratitude and then switching voices to communicate very explicitly what the needs and desires of his stakeholders are. He's simply trying to reflect that as clearly as possible and discover the additional context. This was clearly effective, as you can see from the rest of the conversation that with all the information out there, everybody comes to a mutually acceptable consensus. Problem solved!

What we've ultimately got here is a free service built on a free service. CocoaPods has nothing but the time and effort of volunteers. GitHub has input of resources from the commercial side of their business. Both sides clearly want to preserve the utility of this end-to-end workflow in a more sustainable way.

"Falling over yourself" is a subjective amount of effort, but clearly CocoaPods has tried to minimize their impact on GitHub. As it turns out, the attempted optimization of shallow fetching backfired, but that's not from lack of regard for the resources they rely on. What was missing was exactly the context the Github employees provided.

Honestly, I think people are offended second-hand by a perceived lack of groveling on CocoaPods part, and to me, that's way overblown.

I think there's definitely a tone problem in his first response, but if you continue reading the thread, you'll see that everything else is very positive. I guess it was just a bit of the classic internet text expression problem. Massive high five for the Github guys for looking past it and being extreme pros.

It probably wasn't intentionally selfish. Just really short-sighted, which isn't uncommon. Sometimes it takes a while for things to penetrate for anybody. Alloy's later comments seem much more productive. Whether that's because of backlash, he reread the earlier comment and realized how it seemed, or just had some time to think about the imposition on Github, at least it looks like CocoaPods is going to reconsider how they handle their distribution.

It's hard to see the difference between selfish behavior and short-sighted behavior because they're often confounded. What I got from this is that Alloy will consider the deep vs. shallow Git clone issue, but wants to continue using Git and GitHub as a CDN for a massive user base, and doesn't want to re-architect because developers are expensive and time is limited.

The Git deep v. shallow issue just puts a band-aid on the CPU problem, but it doesn't do anything about the terabytes of bandwidth per week (it'll be worse), and it won't do anything about GitHub's claim that Git is not meant to be used as a CDN and doesn't scale well.
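The deep vs. shallow trade-off is easy to see locally. A rough sketch with stock git only (nothing here is CocoaPods' actual setup): a shallow clone transfers far less history up front, but every subsequent fetch forces the server to compute a custom pack for that client's truncated history, which is where the CPU cost comes from at scale.

```shell
# Rough local demo of shallow vs. full cloning (no network needed).
set -e
tmp=$(mktemp -d)
cd "$tmp"

# A tiny stand-in for an upstream repo with a few commits.
git init -q upstream
cd upstream
for i in 1 2 3; do
  echo "rev $i" > file.txt
  git add file.txt
  git -c user.email=a@b -c user.name=demo commit -qm "commit $i"
done
cd ..

# Shallow clone: only the tip commit is transferred, but later fetches
# make the server do per-client pack computation.
git clone -q --depth 1 "file://$tmp/upstream" shallow

# Full clone: all history up front, cheaper incremental fetches after.
git clone -q "file://$tmp/upstream" full

echo "shallow: $(git -C shallow rev-list --count HEAD) commit(s)"
echo "full:    $(git -C full rev-list --count HEAD) commit(s)"
```

The shallow copy ends up with a single commit of history while the full clone has all three, which is exactly the asymmetry the thread is discussing.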

They've become a big project that warrants thinking about revenue or organization strategy, but they're delaying it by externalizing their costs. Cases like these can pressure GitHub into rethinking the leniency of their policies.

I also think that if you're in the top-5 resource consuming group, more sympathy would go your direction if you were a paying customer, but they've indicated no interest.

> To be blunt, you're abusing the shit out of SOMEONE ELSE’S product that you're not even paying for.

I <3 GitHub, but for their own business/values/whatever reason they choose to host open source for free. It’s not like these people have found a loophole and are getting a paid service for free.

AFAIC, that makes every free user a customer. They may not be a paying customer, but it's GitHub’s choice to be in the free hosting business.

Just to clarify, GitHub hosts a very specific type of content: open source software development projects. They never offered to be a general purpose hosting provider.

From @mhagger's measured and thoughtful reply: "We understand that part of the CocoaPods workflow is that its end users (i.e., not just the people contributing to CocoaPods/Specs) fetch regularly from GitHub..."

I know where you’re going with this, but GitHub deliberately blurs that line too. For example, I host both of my blogs on GitHub.

I guess I’m saying that I don’t see CocoaPods as being a “bad actor” so much as the extreme tail of a distribution.

Are you hosting your blogs as plain git repos or are you publishing static pages to GitHub Pages, the static hosting option?

That's true that anyone with a repo is essentially a customer, paying or not. But CocoaPods is really a bad actor in this, in that they're not just hosting source code up there for development purposes. They're using it like a CDN, and bless Github for not finding a reason to boot them, though I'm sure they've got a bunch of legal ways to do it in their terms of service. I'd argue that CocoaPods is really breaking the spirit of what GH is trying to provide.

This is getting granular, but any GitHub user - whether free or paid - is bound by the Terms of Service. Yes, that makes them a customer, and they accordingly have to adhere to the conditions of using the product.

The one that CP's usage most likely confronts is G.12 - found here: (https://help.github.com/articles/github-terms-of-service/)

> If your bandwidth usage significantly exceeds the average bandwidth usage (as determined solely by GitHub) of other GitHub customers, we reserve the right to immediately disable your account or throttle your file hosting until you can reduce your bandwidth consumption.

> you're abusing

Cocoapods uses GitHub. No abuse here.

You're quibbling semantics.

Using a GitHub repo as a high traffic code CDN and keeping 5+ cores pegged while being the single biggest consumer of resources across the whole platform could be reasonably defined as an abuse of the service.

Primary definitions on Google:

"abuse" verb: use (something) to bad effect

"abuse" noun: the improper use of something

Dabbling in the 'kind of customer we can afford to lose' territory.

Not really. Imagine the backlash github would receive here.

As an ops person, I can tell you that I would probably upgrade my paid github account instead of cancel it if a project was thrown out that had the audacity to issue the statement that they don't care about the infrastructure they use for free so they can focus on the important things - namely developer funding.

yeah, it's the kind of backlash that would cancel a couple other non-paying projects, and there would be a half dozen discouraging blog posts, but paying customers that aren't open source aren't going to leave because a free-tier open source customer was abusing a service.

> Not really. Imagine the backlash github would receive

I don't know... I'm sure they'd get some, CocoaPods has a very large user base after all. But if GitHub laid out the reasons like they laid out the options to start this thread, I think they'd weather the storm fine and defuse some of the people who show up to be angry.

I could see why GH would drop them and think they'd be well within their right and in good moral standing in my book. I just don't think they would unless the maintainers became incredibly hostile or proved unable to fix the problem or even band-aid it. GH just seems too culturally invested in making things like this work to a satisfactory conclusion. I have a feeling that the CocoaPods team will be schooled in a lot of things directly from the GitHub team as they work to resolve the performance issues, just look how informative the initial post was.

It's very clear GitHub was not designed to be used as some project's personal CDN for this kind of traffic.

It's abuse. Wasn't intentional, but at the scale that Cocoapods is running it's abuse.

> personal CDN

Discounting the fact that CocoaPods is used by millions of iOS developers...

He definitely was referring to the CocoaPods team when he used the word "personal," not their users.

If CocoaPod was run by a business that was charging those millions of developers for their services, it would be reasonable to expect that business to pay for a real CDN.

They're not, so that's not a reasonable expectation. But it's no more reasonable to make these demands of GitHub. Giving out a free product doesn't mean that you're required to give it out unconditionally, or in unlimited amounts, or forever. In the end, GitHub owns the infrastructure and services it's providing and can do what it wants with it.

Pretty sure that's the point. If it wasn't used by so many developers, it wouldn't be causing load issues. :)

It's not the developers causing the issue, it's the end users.

In this case, developers are the end user population.

"This repository experiences a huge volume of fetches (multiple fetches per second on average). We understand that part of the CocoaPods workflow is that its end users (i.e., not just the people contributing to CocoaPods/Specs) fetch regularly from GitHub"

Yes, but he means that "end users" of CocoaPods are actually developers. It's a package manager for libraries.

Actually, in the text (I guess no one voting me down has actually read it), he pretty clearly defines the developers as the people who contribute to the packages, not the people who download and use them. It's like saying that, as an IDE user, you are a developer of the IDE project; you are not, you are an end user of a developer product. The end users, in this case software developers, are using it as a CDN. It's pretty clearly stated in multiple places: github is not a CDN.

Except the original quote in this comment chain was "Discounting the fact that CocoaPods is used by millions of iOS developers," so that's what 'developers' was referring to in the comment you originally replied to.

I totally disagree. The CocoaPods usage model is not at all the expected way to use GitHub. I'm surprised using GitHub that way is even allowed by the TOS. It's very obviously a hack to avoid paying for their own infrastructure. The project representative in the issues thread even admits as much in his response.

Just about every other package manager for every other language (Pip, CPAN, Hackage, etc.) uses its own infrastructure.

Homebrew uses Github in roughly the same way.

In this case the difference between 'abuse' and fair use may be a homebrew dev that works for github. I would expect that to be a perk of most places that I work.

I've run tor exit nodes and repo hosting when I worked for ISPs and datacenters, while at the same time shutting customers down who did the same. The difference being that I had that conversation with my boss and said, 'this will violate our normal terms of service but I would like to do this.' The boss is of course more willing to make that concession when he can walk down the hall and say, 'uh, we have extremely high usage today, can you shut down the repos until we can get another link installed?'

Hey, I run 2 middle relays (~80-90mb/s bandwidth total) on cheap VPSes, but was thinking of upping bandwidth and hosting an exit node on some dedicated hardware. Do you have any tips for running an exit node (legally and technically)?

If you want, you can email me (link in profile). Thanks!

Honestly, I don't like to give advice about this, due in part to the risks involved; and since I've been out of it for a couple years, I don't really feel qualified.

Everything you need is online but you'd have to find it for yourself and make decisions about the best way for you to do it.

Yes, it's a hugely positive advertisement for the essential role that Github plays in the open source community. Also impressive is the followup message from a Github API developer (and Homebrew maintainer) offering access to a beta API that Homebrew is working with to reduce load. https://github.com/CocoaPods/CocoaPods/issues/4989#issuecomm...

taking a look at homebrew's implementation of this new API feature, i fail to see how it would dramatically reduce fetches for their (homebrew's) use case. from what i understand, it will only be called when the user manually invokes `brew update`. how often are users calling this command over and over?

that being said, i do believe it could help cocoapod's use case since the fetches are done automatically (as i understand it)

Hello! I'm the Homebrew maintainer and GitHub employee who wrote this. The main thing this API does for Homebrew is make no-ops really fast for `brew update`. As you point out this results in no speedup (in fact a tiny slowdown) for the case where you only run it where you know you have changes. Where it becomes useful is if you are using multiple taps (Homebrew's 3rd-party repositories) which update infrequently or if you want to run `brew update` automatically in e.g. a project bootstrap script that's run frequently.

In the medium to long-term I'd like to consider Homebrew running `brew update` automatically before you `brew install` (https://github.com/Homebrew/homebrew/issues/38818). For us to ever be able to do that `brew update`'s no-op case needs to be extremely fast.
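For the curious, here is one hedged sketch of how such a fast no-op path can work. This is illustrative only, not Homebrew's or GitHub's actual implementation: cache a validator (e.g. an ETag) from the last API response and issue a conditional request, so an unchanged repository costs a single cheap round-trip.

```python
# Illustrative sketch (not Homebrew's actual code): a fast no-op check
# built on HTTP conditional requests. The client remembers the ETag
# from its last fetch; the server replies 304 Not Modified when nothing
# changed, so the no-op path transfers almost no data.

def build_headers(cached_etag):
    """Attach If-None-Match so the server can short-circuit."""
    headers = {"Accept": "application/vnd.github.v3+json"}
    if cached_etag:
        headers["If-None-Match"] = cached_etag
    return headers

def interpret_response(status, new_etag, cached_etag):
    """Return (needs_update, etag_to_cache)."""
    if status == 304:          # nothing changed: the fast no-op path
        return False, cached_etag
    return True, new_etag      # 200: real changes, do a full update
```

With something like this, running `brew update` in a bootstrap script that usually finds no changes becomes nearly free for both sides.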

Thank you for everything you do at Github and with Homebrew.

And thank you for taking the time to share some kind words.

If you plan to overhaul Homebrew, I suggest having by default 3 operations, "update", "fetch" and "install", because some people might find themselves in the situation of having bad connectivity (especially low bandwidth), and being able to fetch the sources to compile later is very important. In particular, having "install" issue synchronous "update" operations is bad if you're on a network with high packet loss, like tethering to one's phone during vacation, etc... Of course, that requires having repeatable "fetch" operations, and putting a local cache between the "fetch" and the "install" operation, so that if a "fetch" succeeds, a later "install" will not fail with "file not found". I've never used Homebrew, but that's advice from having used many package managers on Linux and other *nixen. My apologies if it's redundant.

I know you're trying to suggest some good ideas, but never having used homebrew I think you should find out more about homebrew itself before suggesting things. I don't mean you have to be an expert at it, but just to know a bit about how it's used, and who it's used by before trying to suggest things to people.

My comment isn't to pass judgement on your suggestion, but if you took a look at homebrew itself you'd be able to make better non-generic suggestions.

Re: your suggestion - It's a generically seemingly good thing to separate out fetch from install, but as a user of homebrew, it's not very applicable because when homebrewing things you're likely already connected to the internet, and it's hard to predict when you want to brew install something before hand. If you have the internet capacity to fetch something, it'd be just as easy to brew install it there on the spot.

By extension, it's important to run brew update before installing just to make sure the package index is up to date, so I agree with the dev above, integrating brew update step before brew install would be a good thing - except - perhaps print out on console the exact version number that's going to be installed. Current behaviour does put the version # in the file name of the package being installed, but it could be listed in a more obvious way.

Often times I do a brew info, find the version and details on it before brew install. If the installation step then installs a new version (because of the brew update step), then it's a bit strange that I didn't get the version I was intending to get.

It's not always that easy; I've found myself having to go to a place with high-bandwidth internet to fetch stuff, and being able to stay there only a few minutes. Or before boarding a plane (you don't really want to pay for internet on an intercontinental flight), etc... Also, while you might do things as-needed, it's not that difficult to always remember to run an update every time you run brew. And I wasn't even proposing to make it the default, just to make it possible to run those 3 things separately.

All I'm saying is that you really need to investigate into the things you are suggesting. Please do the investigation, even if cursory before you jump in.

I am guilty of the same, but in my case, I have an installation I can actually check. You can read the source code if you don't have a mac to work with, but the important thing is this already exists

from the brew man page:


fetch [--force] [-v] [--devel|--HEAD] [--deps] [--build-from-source|--force-bottle] formulae

Download the source packages for the given formulae. For tarballs, also print SHA-1 and SHA-256 checksums.

I don't get it... If you've never used homebrew, and are only really a user of other package managers, why are you attempting to offer advice on what they've been doing for years?

Is this classic "I have an opinion and all of my opinions are great, so I must pontificate!"?

Maybe you should just trust that the developers of the only successful package manager for OS X have some idea of what their users want and need... and that as someone who has NEVER used their software, and not a seasoned veteran of any sort of similar projects, your opinion counts WAY less than any of their actual users.

Because I'm interested in package managers in general, and I'd like them to improve across operating systems. And I am a seasoned (9-year) veteran of several Linux distributions.

A random, off topic reply to a comment on Hacker News by one of the maintainers probably isn't the best way to make suggestions.

We do actually do that: `brew update`, `brew fetch` and `brew install` commands are separate. If you haven't already fetched the sources `brew install` will do them for you. What I'm considering is also making it do `brew update` for you too.

brew update is being called over and over again in most use cases - if only to check, for example, whether the latest openssl release patching a critical exploit is out yet. (Of course on the Mac openssl is packaged with the system, but if you are using the brew version, or are compiling other software against openssl from brew, you'll need to check for updates diligently.)

It is also being used in scripts, etc. Since from the user's point of view it's a no-op if there are no updates, there is no reason not to do it on a schedule.

It's also being used in scripts. In our Mac CI environment (for iOS builds) we update homebrew to have the latest xctool.

When a package manager monopolizes that many resources from Github at the expense of others, there is no reason to "commit" all resources to this one project. Thus cocoapods getting rate limited because of the obvious bandwidth abuse going on here. mhagger's answer is pretty straightforward.

EDIT: the upside is that cocoapods will have to either rethink their architecture in order to consume fewer resources or move to their own paid infrastructure, because their package manager will soon be less than functional given the aggressive rate limiting github is performing.

> EDIT: the upside is that cocoapods will have to either rethink their architecture in order to consume fewer resources or move to their own paid infrastructure, because their package manager will soon be less than functional given the aggressive rate limiting github is performing.

I'd like to see both happen:

* CocoaPods refactoring to be more efficient

* GitHub providing open source projects the option to buy reserved capacity if they're using excessive resources (versus just saying "No").

    > GitHub providing open source projects
    > the option to buy reserved capacity.
I have no affiliation with GitHub, but I'd guess that if you were paying for one of their $200/month organization plans[1] you'd be having a very different conversation with them about rate limiting.

1. https://github.com/pricing

I would be interested if any of the top five open source projects consuming the most resources are paying Github anything.

They probably aren't, but they aren't going to be using the infrastructure in the pathological way CocoaPods is either, which requires you to have a client that uses GitHub on behalf of your users.

I'm just pointing out that the feature you're wishing exists very likely already exists in practice. Unless GitHub is stupid they aren't going to be complaining about you pegging 5 CPU cores for $200/month.

How much cash do the top five open source projects bring in? That's the other side of it. Funding a side project, let alone a large open source project, is hard

The top five open source projects on Github can't bring in $200/month each?

GitHub already has paid accounts. There's nothing I'm aware of that would prevent an open source project from paying for one.

That response was sheer excellence, except maybe it was too nice about how ridiculous the situation was. I'm pretty diplomatic on the job, but an aggressive freeloader bragging about what the damage saves them would try my patience.

GitHub people are truly going above and beyond in service even when barely warranted. I'll give them that.

Agreed. Perhaps better even would be an automatic message that says that rate-limiting is in effect, explaining the reasons.

From CocoaPods.org:

> CocoaPods is a dependency manager for Swift and Objective-C Cocoa projects. It has over ten thousand libraries and can help you scale your projects elegantly.

The developer response:

> [As CocoaPods developers] Scaling and operating this repo is actually quite simple for us as CocoaPods developers whom do not want to take on the burden of having to maintain a cloud service around the clock (users in all time zones) or, frankly, at all. Trying to have a few devs do this, possibly in their spare-time, is a sure way to burn them out. And then there’s also the funding aspect to such a service.


So they want to be the go-to scaling solution, but they don't want to have to spend any time thinking about how to scale anything. It should just happen. Other people have free scalable services, they should just hand over their resources.

Thank goodness Github thought about these kinds of cases from the beginning and instituted automatic rate limiting. Having an entire end user base use git to sync up a 16K+ directory tree is not a good idea in the first place. The developers should have long since been thinking about a more efficient solution.

It seems particularly galling that their response to GitHub was to essentially throw their hands up and say "We don't want to change anything, fix it for us". I think GitHub had a near perfect response to this, they analyzed the problem, came up with a set of changes that could be made to help fix it (both short and long term), and pointed to steps they've taken to help out. CocoaPods on the other hand (or at least one of their developers) did not handle this particularly well. When presented with the evidence of why they were seeing slow responses and long queues and suggestions of how to fix it, they complained that they didn't want to fix it and didn't have the time or resources to do so.

Honestly if I was GitHub, I'd be tempted to just increase the throttling on CocoaPods and call it done, it isn't their problem if the users of that project have a bad experience. GitHub has provided solutions to the problem, it's CocoaPods that's resisting implementing those solutions.

Yeah, I'd have to agree. I was not at all impressed by the CocoaPods response here, especially since it was made clear by the GitHub staff that CocoaPods is using up a lot of CPU and terabytes of bandwidth. If you get all that for free, I'd expect you to be a little more open to changes that make it easier for your provider to continue giving you all that for free.

A later comment from @alloy was a bit more gracious about this https://github.com/CocoaPods/CocoaPods/issues/4989#issuecomm..., but I agree, it wasn't a good look.

I think that's pretty unfair. It's really obvious that the initial reply didn't really understand what was going on, and what was being explained. A couple followup additional explanations later, the same dev grokked the problem, CocoaPods' responsibility for the problem, and outlined a list of how they're going to solve it. Seemed to me to be a pretty nice example of professional and helpful candor between GH and an OSS project working to figure out a long-term solution.

I don't know, maybe I'm being overly pessimistic here, but to me it just screams of backpedaling once they saw the reaction they were receiving in this thread. The position shifted from "it's the way we architected things, how can you fix this for us" to "okay, here's some things we can do" pretty quickly and dramatically when the HN thread went up and people were reacting to the response. Cocoapods is using Github resources for free, so the appropriate response from the start should have been what it eventually came down to, not pushing back on Github because they don't want to invest in an actual CDN solution. But, as I said, maybe I'm being overly pessimistic in my analysis here, that's just how it came off to me.

I get where you're coming from. I also had a similar initial reaction. However, as I read through the subsequent discussion, it began to read as though the commenter was really not grokking the problem—and, more importantly, what to do to fix it. I thought it was very impressive that none of the GH participants reacted like some of the HN commenters here. Instead, they showed a great deal of patience and restraint in fully explaining the technical details, offering actionable solutions, and keeping everything very civil and supportive. Then the same guy who sounded like he was possibly being a jerk came back and sounded totally different because he seemed to actually know what to do to fix his project. Maybe the CP commenter read this HN thread and reacted to it, but I'll admit HN is the last place I'd think of finding one of my GH issues discussed.

Perhaps I'm just being too charitable. Either way, the project rather rapidly seemed to come to the right conclusion and jump on board fixing their problem.

On a related note, I feel like this issue could be turned into a great teachable moment for OSS projects; one GH could use for a tech blog post and a guide on how to be a good citizen and avoid the things that can get your project rate-limited without you knowing.

Yeah, I really just think it went a bit too far in the other direction and overcompensated somewhat, which is what was giving me that view. The comment with the heart emoji really stood out to me as a "huh, this might be because of HN" since it basically touched on exactly what was being criticized in here, that they weren't really appreciating what GitHub was providing for free. That said, I can totally see it just that alloy realized it on his own and wanted to make it clear. It's just that the timing of it all and the fact that it's hitting the same point kind of led me to believe that it was a reaction.

Obviously, that's not to say the sentiment isn't genuine. The eventual conclusion makes it seem that yeah, they do appreciate what GH is providing and are trying to make it less strenuous on the servers to get a better experience all round. Making it work well is really in their best interests since the users are seeing a degraded experience until something can be done about it. Definitely also happy that the right conclusion was eventually reached.

I don't understand how you can read "our use of git as our package manager is better than some other package managers we won't name :wink:" as an example of not understanding the problem. It looks more like someone who doesn't want to accept that the problem is on their end.

To me, that comment shows precisely that someone isn't really understanding the problem in full. It feels to me to be a deflection—of responsibility, sure, but also of admitting one doesn't understand what's really going on, and how one is at fault.

Given how rapidly the same commenter changed gears, it strikes me as plausible there was an "ohhhhh eureka" moment, and suddenly the guy got it. His followup comments began dealing with the problem after a couple other GH participants explained further what was happening and why (as well as some actionable steps to take to correct the problem for good).

But perhaps I'm being too charitable.

> It's really obvious that the initial reply didn't really understand what was going on, and what was being explained.

If you are in such a position, then it seems like the best course action would be to ask questions rather than list off reasons that you don't want to deal with it.

Sure. If you know that you don't know what's going on. I walked away with the impression that the guy didn't actually know that.

If he can't understand that his project's resources are consuming 5 whole nodes and terabytes of throughput on Github's infrastructure, then I question his skills as a developer. Even if all of the other technical details are completely obtuse to him, he should at the very least be able to understand the sheer scope of the resources their project is consuming on Github's infrastructure.

Well, the flip side is that the CocoaPods developers are all volunteers (right?). They aren't really deriving any benefit out of the work they do on CocoaPods, and if you ask them to take on financial or ongoing maintenance obligations for a volunteer project, they probably just won't do it. The major benefits of CocoaPods existence go to iOS developers, but there's a tragedy of the commons effect here, where no individual developer is willing to pony up money for the extra convenience that CocoaPods offers.

I think that long-term, the solution will be the Swift Package Manager, and CocoaPods will just be deprecated in favor of it. Let Apple host iOS packages; they're the ones that gain the most benefit from easy iOS development; they have the developer expertise, and the hosting costs are a drop in the bucket compared to iCloud & CloudKit. But that's not all that helpful for people who need an Objective-C package now.

> They aren't really deriving any benefit out of the work they do on CocoaPods

I don't think working on CocoaPods is an altruistic endeavor. I imagine (know) that some of the cocoapods folks are app developers and ostensibly CocoaPods makes developing applications easier.

Side Note: its not a tragedy of the commons. Github owns the infrastructure and they enforced their private property rights by rate limiting a group of users that were disproportionately using resources. It is a collective action problem for CP users.

> They aren't really deriving any benefit out of the work they do on CocoaPods

No direct financial benefit, but they are deriving a benefit out of their work.

Presumably - otherwise they wouldn't be doing it - but it often doesn't take all that much to flip a volunteer from "Okay, this is cool, I can help other people and learn some stuff as well" to "Fuck this, it's way more trouble than it's worth." Top amongst this is when the people you're helping expect you to give them free work.

Jobs compete with other jobs, and most people expect that they'll have to do some unpleasant things in their job. Open-source & volunteer work competes with hobbies, and there are many hobbies where you never need to deal with demands, unexpected work, and interpersonal drama.

It also doesn't take much to move a company from "Okay, we'll help you by hosting your shit for free" to "Fuck you, you're banned since you are ungrateful bastards" (in nicer language of course).

I agree, and I think the GitHub employees who commented on the thread have been really patient, and that it's impressive that GitHub as a company has tolerated and supported this use case.

My point, though, is that it's not the CocoaPods developers who are ungrateful bastards. It's any Hacker News commenter here who also uses CocoaPods. If you think this behavior is insane, submit a pull request.

Ugh, I'm skeptical of giving Apple ownership of any kind of developer tool. We all saw how badly they screwed up TestFlight, and now you want to give them the only OSS package manager?

Amen to that. Fortunately the Swift Package Manager (which is also OSS) is run by their open source Swift team, who so far are doing a great job of being open and delivering the best solutions for the job.

TestFlight on the other hand....

If you're upset about Apple TestFlight, why not just use HockeyApp?

Haven't tried HockeyApp. I use Crashlytics Beta now and it's amazing. Literally "press of a button" deployment: no dealing with provisioning profiles, device UUIDs, or any of that garbage. Just build and deploy, and all your testers get the update instantly!

It's very "sharing economy": someone else has a resource you can use for free, so why not take it?

(Edit: </rhetorical> </sarcasm>)

Because abusing that resource makes sure you (or anyone else) won't be able to take it in the future. I thought that's pretty obvious?

This may very well be the first time that Cocoa Pods was told that they're consuming a huge amount of resources. They might not have even known that what they're doing is considered abuse.

hard for me to believe no one was ever curious enough to do the math

People are starting to get used to the idea that you hit someone's API and they scale for you. For a company based on scaling that's a strange attitude, but there's a lot of 'magic' that goes into a lot of providers being able to always respond to an API call, and most people don't see it.

Based on just looking at how some of my employer's customers use our service, plenty of them are completely clueless that they're well outside of normal usage patterns.

I think the parent was expressing the project owners' thought, "we could do this, why not," and not saying they agreed with it.

You're assuming that people are willing to defer gratification today in order to ensure long-term sustainability. To illustrate why this might be a problematic assumption, I would point to all of human history.

pjc50's question was rhetorical, and was offering a critique of the "sharing economy" mindset. That it should be pretty obvious is exactly the point.

OT (from primary discussion): This isn't the sharing economy. This is someone abusing a resource.

I take your use of "sharing economy </r></s>" as post-uber/airbnb/etc, instead of, as I first heard it, from couchsurfing, potlucks, bittorrent, etc well before that.

Also while nit-picking, I would clear up your use of "for free </r></s>" as "as a freebie", again post-[insert: x̄ȳz, inc].

It seems like a classic example of ignoring the negative externalities. Luckily, we live in a connected world where it is sometimes easy to trace the after-effects.

I understand the desire to personally maintain as few of one's own servers as possible, but when the result is negative effects on the service hosting the project and a worse experience for the end-user, it might be time to start looking over what google cloud offers.

1) I don't think they mean "scale your projects elegantly" in a "distribute your project to millions of customers" sense, but rather in a "add lots of libraries and not have it be a hassle" sense.

2) It makes perfect sense to let GitHub handle the performance hit until issues arise. Premature optimization is the devil, right? But once there start to be issues, it's definitely unfair to turn around and say "well you offer the service for free, so you should fix it"

Yea pretty sure they definitely mean "Add More Libraries" when they say "scaling", it wouldn't make sense otherwise given what CocoaPods does...

It was pretty shocking to see a couple of the responses. If the point is to be a package manager, then they should see this as an important part of the project and not just as Github's duty to provide as a free service. Basically they are saying their service cannot exist unless Github or someone else is willing to expend the apparently fairly significant capital to provide the backend for their service. And this is all happening at a time when many people have figured out how to provide package management services in a reasonable way.

I think the word "scale" is used to mean different things in the two sentences you quoted. In the first case, "scale" is about package management and dependency tracking, helping an individual software project approach large scale (many third-party dependencies with possibly conflicting requirements, new developers who need to get up-to-speed quickly, etc.). In the second case, "scaling" is about distribution of the CocoaPods metadata to large numbers of users, each with their own (possibly small) software project.

Sentence 1 would still be true if CocoaPods was only used by ten companies developing the ten biggest (in terms of lines of code) Objective-C projects, but there would no longer be a need to scale in the sense of sentence 2.

This reply: https://github.com/CocoaPods/CocoaPods/issues/4989#issuecomm...

"Not having to develop a system that somehow syncs required data at all means we get to spend more time on the work that matters more to us, in this case. (i.e. funding of dev hours)"

In other words, using github as a free unlimited CDN lets them be as inefficient as they like. Such as having 16k entries in a directory ( https://github.com/CocoaPods/Specs/tree/master/Specs ) which every user downloads.

Package management and sync seems to suffer really badly from NIH. Dpkg is over 20 years old and yum is over a decade old. What's up with this particular wheel that people keep reinventing it seemingly without improvement?

Debian's sync may be nicer, but their client-side solution leaves a bit to be desired.

Trivial apt operations (e.g. trying to install a package which is already installed) on an NSLU2 (an ancient 266MHz ARM machine) take several minutes, whereas the same operation takes several seconds on a modern laptop.

It turns out this is due to the fact that Debian "main" (Packages.gz) has ballooned to 32MB of plain text when uncompressed, comprising more than 41,000 packages, and it has to be parsed and assembled into a dependency tree for every apt operation. This problem screams for SQLite.
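A rough sketch of the SQLite approach: parse the control-style stanzas once into an indexed table, so later lookups hit the index instead of re-parsing 32MB of text on every invocation. The `Package`/`Depends` field names follow the Packages file format; everything else here is illustrative, not apt's actual code:

```python
# Sketch: index a Debian-style Packages file in SQLite so per-package
# lookups don't require re-parsing the whole file every time.
import sqlite3

def parse_stanzas(text):
    """Split a Packages file into per-package dicts (blank-line separated)."""
    for block in text.strip().split("\n\n"):
        pkg = {}
        for line in block.splitlines():
            # continuation lines start with a space; skip them here
            if ":" in line and not line.startswith(" "):
                key, _, value = line.partition(":")
                pkg[key.strip()] = value.strip()
        if pkg:
            yield pkg

def build_index(text):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE packages (name TEXT PRIMARY KEY, depends TEXT)")
    db.executemany(
        "INSERT INTO packages VALUES (?, ?)",
        [(p.get("Package"), p.get("Depends", "")) for p in parse_stanzas(text)],
    )
    return db
```

On disk (rather than `:memory:`) the index would be built once per `apt-get update` and then reused by every subsequent operation.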

A side project I've started looking into is to make a transparent apt proxy which provides a trimmed down Packages.gz (e.g., removing anything which uses X11), which would be a lot easier than rewriting apt to use a SQLite backend.

> This problem screams for SQLite.

This is precisely why yum/dnf has been switching from XML for repodata to SQLite. In fact, the only thing that is still XML-only is the comps file which just lists package groups, is updated rarely and "only" weighs in at half a MB.

> Trivial apt operations (e.g. trying to install a package which is already installed) on an NSLU2 (an ancient 266MHz ARM machine) take several minutes

Actually, I'm surprised one can actually run a modern Linux on the NSLU2 given its shameful lack of RAM and slooow USB port. But it was a nice gadget when it came out and it was fun to experiment with it.

> It turns out this is due to the fact that Debian "main" (Packages.gz) has ballooned to 32MB of plain text when uncompressed, comprising more than 41,000 packages, and it has to be parsed and assembled into a dependency tree for every apt operation. This problem screams for SQLite.

Correct me if I'm wrong, but isn't apt (and dpkg) basically composed out of a ton of different (perl/shellscript) modules? So it should be possible to create an interface-compatible sqlite data store.

Actually the part which parses the packages file is a bunch of c++

> A side project I've started looking into...

Wouldn't cleaning up the package search interface be a similar effort with much greater payoff?

It isn't just package search which is the problem, it is everything which has to parse the packages file, which is basically every apt command. So if you can trim "main" down to 10,000 packages, suddenly every part of apt is faster, and no one has to install any custom apt replacements.

But wouldn't just storing that file as a database solve most of the issues?

Yes, and that would be awesome, but it means rewriting apt.

Arch and Debian contributors have tried a good approach for package management:

0: p2pacman - Bittorrent powered pacman wrapper

1: pacman & torrent, feasible?

2: DebTorrent

(0) https://bbs.archlinux.org/viewtopic.php?id=163362

(1) https://bbs.archlinux.org/viewtopic.php?id=115731

(2) https://wiki.debian.org/DebTorrent

I believe scaling this could happen with either: 1) lightweight filesystem/directory versioning support, like how btrfs allows you to mount snapshots. This way, peers could update whichever version of a torrent they have. Or 2) very reliable means to update to the latest torrent release (as reliable as syncing with peers), which afaict means smarter bittorrent clients that can perform DHT-based "crawling". Those recent defcon(?) "hacks" to query peers for similar torrents based on user pools and connection histories (or something like that) would make sense here.

A cool side-note: In one of my few experiences diving into `.git`, I diff'd it before and after making changes to its tracked sources, like adding files and modifying them. It looked like a torrent that included version control data would make out just fine if an updated torrent expected similar data in the same location. Again, a smarter bittorrent client would need to sort some of this out. See also 0': Updating Torrents Via Feed URL. Anyway, most users would probably leave that part out, in favor of only which parts they need.

(0') http://www.bittorrent.org/beps/bep_0039.html

Another cool side-note: This would also allow for easily adding repos from multiple sources... Look at how many (non-automated :-( ) merge requests github.com/CocoaPods/Specs's caretakers have reviewed: 13,331 as of now (0'').

(0'') https://github.com/CocoaPods/Specs/pulls?q=is%3Apr+is%3Aclos...

> Arch and Debian contributors have tried a good approach for package management..

> 0: p2pacman - Bittorrent powered pacman wrapper

> 1: pacman & torrent, feasible?

> 2: DebTorrent

That's about distributing packages via p2p. The problematic repository doesn't store any package data, it stores package metadata (it's the cocoapods index if you will).

>~ package metadata, not package data

metadata != data ??

You got it.

I see metadata very much as "regular" data (in terms of needs and tooling); practically speaking, even from the same-ish data set. Simply put, it just looks different.. above the surface.

As a data point, "dnf" is the successor to yum. Started using it recently with Fedora 23... and it's pretty decent.

(It may not have been earlier on, I really don't know. ;>)

Something nifty about the new dnf is several of the older yum commands (eg builddep, yum-downloader) are now integrated directly so don't need extra utils installed. Seems like refinement is still happening.

If only my fingers didn't keep typing "dns" instead of "dnf" all the time, it would be great. :D

I just keep typing "yum", since it's an alias of dnf (albeit with an annoying nag message) and since I work on CentOS servers a lot and am automatically used to typing "yum".

There's also zypper from SUSE/openSUSE.

Perhaps because Cocoapods is not an OS package manager or anything close to it. It installs libraries within the context of an XCode project, regardless of the host system or what is installed for other projects.

It is a package manager, though? The fundamental idea of downloading a list of available options of which the user picks some, and the system pulls in dependencies, is almost exactly how dpkg and yum work. The location to which the packages are installed is a detail.

A language package manager must be able to "install" the same packages over and over again (and possibly "install" multiple versions of the same package in the same environment), and the ability to push packages is generally considered part of their duty, not so for OS package managers, you don't use dpkg to send a package to debian's repositories.

> The fundamental idea of downloading a list of available options of which the user picks some, and the system pulls in dependencies, is almost exactly how dpkg and yum work.

If you reduce it to the fundamentals you don't need yum or dpkg either to do that, just a dependency solver and curl.

The fundamentals for package management also include moving the package into its final resting place(s), where it's going to do its work. Curl doesn't do that; it just gives you a single file somewhere.

I'd also consider removing a package to be a fundamental part of a manager. The two items you describe would be a 'package grabber'.
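The "fundamentals" in this exchange, picking packages and pulling in their dependencies, reduce to a dependency solve. A toy sketch of the simplest version (a hypothetical dependency graph, no version constraints or conflict handling):

```python
# Minimal dependency resolution: depth-first traversal that yields an
# install order with dependencies first, and detects cycles. Real
# solvers (apt, yum) also handle versions, conflicts, and provides.

def install_order(graph, wanted):
    """graph maps package name -> list of dependency names."""
    order, done, in_progress = [], set(), set()

    def visit(pkg):
        if pkg in done:
            return
        if pkg in in_progress:
            raise ValueError(f"dependency cycle at {pkg}")
        in_progress.add(pkg)
        for dep in graph.get(pkg, []):
            visit(dep)          # install dependencies before the package
        in_progress.discard(pkg)
        done.add(pkg)
        order.append(pkg)

    for pkg in wanted:
        visit(pkg)
    return order
```

Everything beyond this (fetching, unpacking into final locations, tracking installed files for removal) is what separates a package manager from a "package grabber".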

CocoaPods is much more akin to RubyGems or PyPI or CPAN, all of which are established as useful tools outside of OS level package managers. There's a need for a iOS/Cocoa package manager (that understands Xcode!) and CocoaPods has so far been the most successful.

A package manager for a project can be the same OS package manager with a reduced dependency tree, defaulting to installation under a prefix (the project root or the vendor directory).

Care to substantiate that last paragraph? Are you really suggesting OS X users use yum?

Why not? dpkg is already being used by jailbroken devices via Cydia.

Actually it seems very likely that one or more of the popular Linux distro package manager ecosystems would fare well on other OSes. Arch Linux's pacman was ported to Windows, for example.

pacman has been ported to OS X a few times


It's absolutely possible to install yum on OS X (I know this from experience)

Why not? A friend of mine was employed by a Very Large Company at one point to (amongst other things) maintain their AIX port of rpm/yum.

Not literally yum, but something with a similar design rather than abusing a git repo.

I help run Mozilla's version control infrastructure and the problems described by the GitHub engineer have been known to me for years. Concerns over scaling Git servers are one of the reasons I am extremely reluctant to see Mozilla support a high volume Git server to support Firefox development.

Fortunately for us, Firefox is canonically hosted in Mercurial. So, I implemented support in Mercurial for transparently cloning from server-advertised pre-generated static files. For hg.mozilla.org, we're serving >1TB/day from a CDN. Our server CPU load has fallen off a cliff, allowing us to scale hg.mozilla.org cheaply. Additionally, consumers around the globe now clone faster and more reliably since they are using a global CDN instead of hitting servers on the USA west coast!

If you have Mercurial 3.7 installed, `hg clone https://hg.mozilla.org/mozilla-central` will automatically clone from a CDN and our servers will incur maybe 5s of CPU time to service that clone. Before, they were taking minutes of CPU time to repackage server data in an optimal format for the client (very similar to the repack operation that Git servers perform).

More technical details and instructions on deploying this are documented in Mercurial itself: https://selenic.com/repo/hg/file/9974b8236cac/hgext/clonebun.... You can see a list of Mozilla's advertised bundles at https://hg.cdn.mozilla.net/ and what a manifest looks like on the server at https://hg.mozilla.org/mozilla-central?cmd=clonebundles.
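For the curious: a clone bundles manifest is just a plain-text list of bundle URLs, each annotated with key=value attributes (such as the bundle's format) that clients filter on before picking one to download. A hypothetical pair of entries might look something like this (the URLs are invented; the attribute names follow the extension's documentation):

```
https://cdn.example.net/mozilla-central/bundle.gzip-v2.hg BUNDLESPEC=gzip-v2 REQUIRESNI=false
https://cdn.example.net/mozilla-central/bundle.packed1.hg BUNDLESPEC=none-packed1;requirements%3Dgeneraldelta%2Crevlogv1 REQUIRESNI=false
```

The client downloads and applies the advertised bundle, then does a much cheaper incremental pull from the origin server to catch up to tip.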

A number of months ago I saw talk on the Git mailing list about implementing a similar feature (which would likely save GitHub in this scenario). But I don't believe it has manifested into patches. Hopefully GitHub (or any large Git hosting provider) realizes the benefits of this feature and implements it.

Wow, this is pretty cool. Reminds me of the performance optimizations Facebook has done with Mercurial: https://code.facebook.com/posts/218678814984400/scaling-merc...

Mercurial was designed to be easy to extend, and it shows.

Git was created and designed to support Linus' workflow when developing the Linux kernel.

Hg was designed to be a DVCS system.

GitHub claim in this thread that they pay about 1s for a full clone without a caching CDN, due to their bitmap indexing patches.

Wow, really impressive response from GitHub. The right amount of technical detail coupled with balanced tone--halfway between "we support you" and "you make us crazy."

One correction to the post title: it's not maxing five nodes, but five CPUs.

Yeah, 5 CPUs is an order of magnitude difference. ;)

Wait, so the CPU isn't the big white tower sitting under my desk?!

According to my mother, that's "The hard drive".

Ok, we replaced "nodes" with "server CPUs" in the title.

I keep coming back to point #4 - who ever thought that 16k objects in a single directory would be a good idea? Ever since FAT that's been a bad idea, and while modern FSes will handle it without completely melting down it's still going to cause long access operations on anything to do with it.

Even Finder or `ls` will have trouble with that, and anything with * is almost certainly going to fail. Is the use-case for this something that refers to each library directly, such that nobody ever lists or searches all 16k entries?

I do think that your last sentence is the answer: if you're using a package manager instead of working with the directory heavily, this isn't a visible problem which is going to motivate people to work on it.

The other side to consider: “one directory per package” is a very simple policy and it feels right in many ways to people (e.g. Homebrew has a similar structure because it's a natural fit for the domain). If the filesystem and basic tools like ls work just fine (which is certainly the case on OS X, where even "ls -l" or the Finder take less than a second on a directory of that size), isn't there a valid argument that the answer should be some combination of fixing tools which don't handle that well or encouraging people to learn about things like `find` instead of using wildcards which match huge numbers of files?

One directory per package is completely sensible, just not all in one bunch. It's even fine if the mapping is to a flat namespace at something like the HTTP level - I can mod_rewrite /abcdefg to /a/b/c/abcdefg no problem. My only objection is to file- or directory-level structures that are this flat. I might be mentally deficient, but I can't even process anything that's structured this way.

As loath as I am to admit anything about Perl is good, CPAN got this right: 161k packages by 12k authors, grouped by A/AU/AUTHOR/Module. That even gives you the added bonus of authorship attribution. Debian splits in a similar way as well, /pool/BRANCH/M/Module/ and even /pool/BRANCH/libM/Module/ as a special case.
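The prefix-sharding idea above is nearly a one-liner to implement. A sketch (the 1/2/3-character scheme here is an invented example, not CPAN's author-based layout or any scheme CocoaPods actually uses):

```shell
#!/bin/sh
# Map a flat package name onto nested prefix directories so that no
# single directory accumulates thousands of entries.
shard_path() {
  name=$1
  printf '%s/%s/%s/%s\n' \
    "$(printf '%s' "$name" | cut -c1)" \
    "$(printf '%s' "$name" | cut -c1-2)" \
    "$(printf '%s' "$name" | cut -c1-3)" \
    "$name"
}

shard_path AFNetworking   # prints A/AF/AFN/AFNetworking
```

The mapping is mechanical in both directions, which is why a mod_rewrite rule (or the filesystem itself, as with ext4's hashed directories) can hide it entirely from users.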

Tooling can be considered part of the problem in this case. Because the tooling hides the implementation, nobody (in the project) noticed just how bad it was. I hadn't seen modern FS performance on something of this scale, apparently everything I've worked with has been either much smaller or much larger. Ext4 (and I assume HFS+) is crazy-fast for either `ls -l` or `find` on that repo.

It seems like tooling is part of the solution as well, but from the `git` side. Having "weird" behavior for a tool that's so integral to so many projects scares me a little, but it's awesome that Github has (and uses) enough resources to identify and address such weirdness.

My (perhaps naive) thoughts on this are - suppose a 16k-packages-in-one-directory solution were just as fast as a 16k-packages-sharded-by-prefix (the CPAN solution), then the former is conceptually simpler and so should be preferred. And the fact that you can mechanically transform one structure to the other means that the filesystem (or git) should be able to transparently do it for you (eg use the sharded approach as a hidden implementation, while the end user sees a flat directory). This seems to be similar to what ext4 does (https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Hash...).

The obvious question is how would you implement that. You might argue (as you should) that git has closer semantics to a filesystem than version control. But actually implementing this sharding would require git be a kernel module. Hardlinks and softlinks won't save you because they are both still dentries and thus have the same performance pathology. Maybe you could do it with fuse, but what have you gained by making your version control system even more annoying to use?

It's one of those shouldn't-be-arcane-but-somehow-is pieces of knowledge. Almost every job I've had I've ended up speaking to developers about more efficient file storage when I find yet another "shove everything in a single directory" implementation.

The criticism against CocoaPods here seems awfully harsh.

Think about it from their perspective. GitHub advertises a free service, and encourages using it. Partly it's free because it's a loss leader for their paid offerings, and partly it's free because free usage is effectively advertising for GitHub. CocoaPods builds their project on this free service, and everything is fine for years.

Then one day things start failing mysteriously. It looks like GitHub is down, except GitHub isn't reporting any problems, and other repositories aren't affected.

After lots of headscratching, GitHub gets in touch and says: you're using a ton of resources, we're rate limiting you, you're using git wrong, and you shouldn't even be using git.

That's going to be a bit of a shock! Everything seemed fine, then suddenly it turns out you've been a major problem for a while, but nobody bothered to tell you. And now you're in hair-on-fire mode because it's reached the point where the rate-limiting is making things fail, and nobody told you about any of these problems before they reached a crisis point.

It strikes me as extremely unreasonable to expect a group to avoid abusing a free service when nobody tells them that it's abuse, and as far as they know they're using it in a way that's accepted and encouraged. If somebody is doing something you don't like and you want them to stop, you have to tell them, or nothing will happen!

I'm not blaming GitHub here either. I'm sure they didn't make this a surprise on purpose, and they have a ton of other stuff going on. This looks like one of those things where nobody's really to blame, it's just an unfortunate thing that happened.

(And just to be clear, I don't have much of a dog in this fight on either side. My only real exposure to CocoaPods is having people occasionally bug me to tag my open source repositories to make them easier to incorporate into CocoaPods. I use GitHub for various things like I imagine most of us do, but am not particularly attached to them.)

I think Github's response was about as good as it could be. In hindsight, they probably should have contacted CocoaPods when they pegged one CPU. And they could have given the same general solution to Homebrew and others.

With respect to CocoaPods, I would hope someone on the team had thought through performance characteristics of their architecture.

It's like they brought a shopping cart onto a city bus and were then surprised that it inconvenienced the bus driver and the other passengers.

It's more like bringing a shopping cart onto a city bus, when the bus company said "bring all your stuff! we love it!" doing this for years with no problem, the bus driver says nothing, and then one day the bus driver says "hey, you've been causing a ton of problems with that shopping cart, you need to stop." Surprise seems entirely warranted.

I can't seem to find any posting by GitHub saying "yes! please use our free service as your git-based package manager's backend!" Advertising "host your code and assets with us" doesn't suddenly mean that it's justified to say "fuck it, GitHub can be our CDN".

Obvious in hindsight, but if you grew up from a little project to a big one, built so that your "users" are cloning your git repository, is it really clear that you've transitioned from "hosting source code" to "using it as a CDN" sometime along the way?

It's not like these guys thought, "Well, we really should use some dedicated high-end host for all our traffic, but we'll use GitHub because it's easier."

On the flip side, user 'alloy' gives the response that their decision to use github as a CDN was an explicit decision. In designing a product to scale, they apparently explicitly decided to outsource the 'scaling' part. While it may have been surprising to them, I don't think it should have been so surprising.

> It strikes me as extremely unreasonable to expect a group to avoid abusing a free service when nobody tells them that it's abuse

I don't think so at all. An experienced developer should expect that a free service will rate-limit their offerings at some point, and design around that. Viewing 'free' as 'an eternal resource sponge that we never have to think about' is the extremely unreasonable thing to do, in my opinion. I think that 'abuse' is probably the wrong word to use here, since that implies malice, and they don't appear to be malicious.

I have never seen anywhere that GitHub advertises using them as a CDN.

GitHub is for source control. That means a limited number of people pulling and submitting changes. That does not mean the general public using it as a CDN.

In fact I seem to remember seeing somewhere active discouragement of using it as a CDN.

They advertise their CDN for user/organization pages. I've always been a little bothered that they have you use git for that.

That's fair, but they're really advertising a specific feature. That is, statically generated sites hosted based on a specific branch in a repository. Nowhere do they advertise themselves as a CDN in the way CocoaPods is using them now.

I entirely agree with this. GitHub gets so much advertising + community from open source projects like this.

Also, I'm amazed this is even a problem. 5 CPUs is not a lot in the scheme of things (even if they mean physical instead of cores). TBs of bandwidth are also virtually free compared to a company the size of Github.

Even better: they are getting basically real world loadtested for free and finding loads of pain points, which may hit paying customers.

Unless I'm missing something, fire more metal at the problem. Many companies would love to be able to have every single cocoapod user (which is nearly every iOS developer) have to type github.com into their terminal for the cost of a bunch of servers + some bandwidth.

Pretty strange, unless this is hitting some really bad area of their service that can't easily be scaled out of (but i would be surprised)

>>Even better: they are getting basically real world loadtested for free and finding loads of pain points, which may hit paying customers.

I think their point is that it's using the system in a way that isn't intended or desired. How does that count as "real world" load testing?

And by that logic, shouldn't anybody who gets hit with a DoS attack just say "thanks"? It's tons of free load testing on your network infrastructure, and you'll definitely find some pain points.

They are not telling them to stop using GitHub, they are giving them advice on making it work better.

It's totally reasonable to host your code on GitHub and to build a package manager that loads the content of a package from its GitHub repo.

What seems insane is to use a single github repo as the universal directory of packages and their versions driving your package manager.

There's a reason rubygems has their own servers and web services to support this use case for the central library registry, even if the source for gems are all individually projects hosted on github.

I assume they modelled it after Homebrew, which has been working fine doing exactly that for the last 7 years.

That only has 3,000 packages vs 15,000 for CocoaPods or 115,000 for RubyGems.

In case somebody is interested in such figures (I certainly am) - NPM has 249,838 as of today [0].

[0]: https://www.npmjs.com

I wonder whether you could use a DHT for the package directory.

> Scaling and operating this repo is actually quite simple for us as CocoaPods developers whom do not want to take on the burden of having to maintain a cloud service around the clock (users in all time zones) or, frankly, at all.

The CocoaPods developers seem to be missing the entire point of git: it's a _distributed_ revision control system.

Set up a post-receive hook on GitHub to notify another server, one set up with a basic installation of git, to pull from GitHub and mirror the master repo. Then have your client program randomly choose one of these servers to pull from at the start of an operation. A simple load balancer solves this problem.
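The client half of that suggestion is tiny. A sketch, with placeholder mirror URLs (the server half would be a post-receive hook doing roughly `git push --mirror <mirror>` for each mirror):

```shell
#!/bin/sh
# Pick one mirror at random so clone/pull load is spread across the
# mirror fleet instead of all landing on one host.
MIRRORS="https://mirror-us.example.com/specs.git
https://mirror-eu.example.com/specs.git
https://mirror-ap.example.com/specs.git"

pick_mirror() {
  printf '%s\n' "$MIRRORS" | awk '
    BEGIN { srand() }
    { a[NR] = $0 }
    END { print a[int(rand() * NR) + 1] }'
}

pick_mirror   # prints one of the three URLs
```

A client would then run something like `git pull "$(pick_mirror)" master`, falling back to the canonical repo if the chosen mirror is stale or down.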

Rackspace is also known to sponsor significant resources for larger projects that ask nicely. GlusterFS is one project I used to be involved with that did this, and there are definitely others.

If CocoaPods reach out to Rackspace and/or other hosting providers, there's a decent chance they'll be able to pull together a good solution. :)

The downside though, is they'll need to figure out some way to keep it monitored/maintained. :/

Last I checked, Rackspace wasn't accepting any more projects.

I find it amusing how GitHub's contact[1] form has (probably a recent addition):

> GitHub Support is unable to help with issues specific to CocoaPods/CocoaPods.


[1]: https://github.com/contact

I think that contact page remembers the last repo you visited. I went to it in incognito mode and it wasn't there.

That's a pretty neat feature!

Looks like this shows the last project you looked at before heading to the contact page. I refreshed the leaf repo and then refreshed the contact page and it mentioned the leaf project at the top.

It's pretty nifty - it seems to pull in from your history, and then show `user/repo` for the most recent one you've looked at.

They seem to be doing this based on the last repository visited (for some popular ones). Try visiting another repository before going to the contact page and the message will change or disappear.

I'm seeing something similar, only with Carthage instead of CocoaPods

CocoaPods (and Homebrew) mainly exist because of a lack of tooling in the typical Apple ecosystem. So I would blame Apple for not supporting the community with money or tooling. Letting GitHub with its limited amount of funding pay the bill isn't a nice move. Apple dev relations should throw some money at GitHub so they can provide some dedicated resources or offer to pay the cost of other solutions (like a 3rd party CDN/AWS/Google Cloud/…).

CocoaPods exists because developers want to learn how to "build apps" but lack the resources to intelligently include and link to 3rd party code in their projects. CocoaPods doesn't enable anything not otherwise configurable via git submodules and Xcode project hierarchies / build settings.

Therefore, it's not Apple's problem. In fact, I've talked to a non-trivial number of engineers (both in Cupertino and long-time Cocoa devs) that disapprove of the shortcuts that CocoaPods takes all over, software architecture be damned. Reasonable parties can agree to disagree, but I do include 3rd party framework inclusion without a dependency manager as an interview screen for prospective iOS hires.

Since you mention developer relations, I'll assume you're not actually arguing that this is Apple's technical responsibility, but that they should throw around some $ to grease the wheels to make dependency inclusion better. As a platform vendor, funding hosting costs for some project that you don't agree with just to "support the community" is a bad idea. Better idea is to allocate resources to setup a structure that can fix the issue in a technically agreeable way while also benefitting from the independence a FOSS project provides. In doing so, you are correct that it'd be preferable for Apple to fund/use well-known FOSS standards, such as Github.

In conclusion, Apple should setup a FOSS project to address the current inconveniences associated with third party package inclusion and should involve and pay Github somehow.

Oh wait...https://github.com/apple/swift-package-manager

Apple decided to block dylibs from the start of the iOS App Store, and I think that friction with how things are usually done in OS X, C, and C++ land before iOS is what started the entire CocoaPods thing in the first place. The replacement with dynamic frameworks came about 8 years too late, in iOS 8.

well, the same could be said about Homebrew vs. MacPorts (which is hosted and maintained by Apple). Afaik even a lot of Apple developers don't use MacPorts anymore…

Afaik the Swift package manager works only for Swift code, so it's not a replacement.

Also it's a very bad habit to try to stand over the users that supply software for your most valuable product. We've seen a lot of stories lately that indie app development is dead. We also regularly see how weak Apple is in web services (cloud sync).

So either developers invest a lot of time to build something that works (and maybe even share it on GitHub) or they will stick with the holy Apple solution, provide a crappy user experience, and go bankrupt. Companies like Google or Amazon (AWS) do a very good propag^W developer relations job, IMHO way better than Apple ever did (in the last 10 years).

It's probably worth pointing out here that Apple is currently working on a package manager (https://github.com/apple/swift-package-manager).

I've always found Github's business model interesting. What if a massive open-source organization (e.g. Fedora, Apache) decided to use it for all of their development, integrating it with continuous builds and all the associated pulls. Of course this isn't likely to happen for a number of reasons, but there are large open source projects that could put a significant load on their infrastructure if they chose to use Github as their main code versioning system.

It's the same business model that Jira used in the past, when the alternatives were Mantis, Trac and Bugzilla. They had[1] one of the better and well designed issue trackers, free for OSS projects. That turned into a great way to champion adoption within paying customers' organizations.

[1] In my opinion, they have since lost that edge on the UI.

I remember seeing this new much improved UI for bug tracking on OSS, and thinking, "Thank goodness they're no longer using Bugzilla". And then using it a bit, and thinking, "I've got to get that bug tracker into the org I'm working for." Yep, it was Jira. It was a great strategy they used!

> What if a massive open-source organization (e.g. Fedora, Apache) decided to use it

This one is pretty big. https://github.com/torvalds/linux

Sure but it's just the kernel, and that's just a mirror. Linus does not use Github to manage kernel development. In fact he's been vitriolic in the past about how Github does pull requests.

I wonder how much traffic the Github Linux repo gets. Seems to me that people who want to use Linux, will go get a distro instead. And people who want to develop the kernel, will follow the kernel development process (which doesn't rely on Github).

For a period of time when Kernel.org was breached, GitHub was the repository for Linux. [1] I remember reading a review of GitHub by him shortly after. He did not like how Pull Requests or patches worked on GitHub. I'd link it but I'm having trouble finding it.

1. http://www.theregister.co.uk/2011/09/06/linus_torvalds_dumps...

This is just a mirror and not the repo the devs actually work on, so I don't think it's very taxing on the resources.

> What if a massive open-source organization (e.g. Fedora, Apache)

https://github.com/apache - Lots of mirrors but many projects use it as their main source.

I think they should do 'ok', though it won't make you wealthy:

- They collect money from businesses (who pay quite well)

- They collect money from private repos (like I have for lots of my config files and e.g. tex files; no, no passwords/keys in there ;)

- The large companies probably pay some form of 'support'

Presumably they'd get load-limited to the same degree and eventually be forced to move to something they control.

This bug report is a great step in the direction for GitHub. As of this comment there are 3 different GitHub staff members responding and providing feedback to the CocoaPods team. From the previous "Dear GitHub" messages and responses, this seems like perfect community involvement.

I have been seeing this trend of GitHub getting "abused" for purposes other than hosting source code.

- My school uses GitHub to host and track our software engineering project (which still can be argued as OSS).

- People using GitHub issue system as a forum.

- Friends uploading pdfs to GitHub.

- Recently people posted on HN about using GitHub to generate a status page.

I think this is a really bad trend and people should stop doing that.

The school example and friends uploading pdfs to GitHub are both uses that GitHub encourages.

Using GitHub Issues as a forum and a source for generating status pages are both ok from a use/abuse perspective, but you may not have the best experience since that isn't what Issues is intended for.

The issue tracker on new repos has a 'question' tag by default, so Github are gently encouraging using issues as a forum. Though my inner cynic says that makes sense for them - issues tie a project to Github more than the git repo itself does.

It should be fine to come up with new ways to use Github, as long as it's not causing excessive load.

GitTorrent: http://blog.printf.net/articles/2015/05/29/announcing-gittor...

Imagine a world where GitTorrent is fully developed, includes support for issue tracking, and has a nice GUI client that makes the experience on-par with browsing github.com.

I mention this not as an "Everybody bail out of GitHub and run to GitTorrent!!!" sort of statement, because I believe GitHub's response here was excellent and confidence inspiring. But it's an unnatural relationship for community supported, open source projects to host themselves on commercial platforms such as GitHub. GitHub primarily hosts them to promote its business. That's not necessarily a bad thing, but it results in impedance mismatches like the one demonstrated here.

That isn't to say that a mature GitTorrent would replace GitHub. Rather, I envision GitHub becoming a supernode in the network, an identity provider, and a paid seed offering, all alongside their existing private repo business.

Honestly, once I scrape a few projects off my plate, I'm inclined to dive into GitTorrent, see where it's at in development, and see if I can start contributing code. It just seems like such a cool and useful idea.

My main issue with Free Software projects using GitHub is that it's proprietary, not that it's commercial. Admittedly, I think GitTorrent is a really cool idea, but I'm wondering if a distributed filesystem might be a more elegant solution than using both BitTorrent and Bitcoin.

I've never really understood CocoaPods. Dragging a framework into Xcode was never much trouble, and the amount of 3rd party libraries in a OS X / iOS project ought to be fairly small, so the gains are trivial.

The potential downsides seem much more annoying. Do you really want to have your dependencies on an overloaded central server somewhere?

Until recently iOS only supported statically linked libraries which could lead to issues if you needed to use multiple components that had a shared dependency that you needed to upgrade for one reason or another. You couldn't touch the embedded version. Additionally there was no native package manager which made sharing libraries a clumsy affair. Cocoapods makes both cases easier.

This is correct. Starting with iOS 8 they finally allowed linking against custom frameworks which is why Carthage is now becoming much more popular. CocoaPods solved a critical problem of getting dependencies linked in, while creating a new problem of having your xcodeproj file and build settings be managed by them. I'll be happy enough to drop them for new projects going forward.

> Starting with iOS 8 they finally allowed linking against custom frameworks

I don't keep up with iOS development enough to know if anything has changed with respect to static/dynamic linking in iOS 8, but it has always been possible to use custom frameworks in iOS (eg, frameworks you build yourself, unless the community has another definition for 'custom framework').

The framework directory structure is a bit unorthodox, but it's really just your statically built library (absent any '.a' suffix) alongside any header files in a Headers folder. Again, not sure if this has changed with any support for dynamic linking.

Sorry, you're right, I should have been more explicit: starting in iOS 8 you can use dynamically linked frameworks.

Dynamic libraries don't help shared dependencies, do they? In both cases, you want the shared dependency to be in its own library. There is a difference in that duplicated dependencies in shared libraries will produce a subtle runtime error rather than an obvious build-time error, but that's not really better.

For the sake of brevity I didn't dive as deeply into the subject as I maybe should have, but the solution you get with Cocoapods to this is if you have two Pods that point at a shared dependency in another Pod you end up with just one copy. I forget how they handle version pinning as it's been a little while since I've had to actually worry about it (I do little iOS work these days) but I seem to remember that in this case the pinned version wins and if you have a version conflict it complains during the Pod installation.

I'm sure CocoaPods handles it, but what I'm wondering about is how static versus dynamic libraries come into the picture. As far as I can tell, that has no relevance to the problem of shared dependencies.

What an unusually reasonable discussion. Good on everyone.

I love how this was like the perfect storm of things that could go wrong, and how it seems like mhagger is just amazed more than anything else.

> I love how this was like the perfect storm of things that could go wrong

How so? I bet the CocoaPods team knew they were hammering GitHub with that gigantic repo. They just didn't care and expected GitHub to give them more bandwidth, for free.

It's nice to see that they post a technical analysis of the problems with some very reasonable sounding potential fixes rather than just killing the repo.

Hardly. Do they hit edge cases? Yes! Is using GitHub for your CDN a dick move and going to cause problems, regardless? Yes!

The response from mhagger is unnecessarily apologetic, and I predict we'll see an official update from GitHub on this soon.

> Is using GitHub for your CDN a dick move and going to cause problems, regardless? Yes!

I don't know about that. Both oh-my-zsh[1] and emacs prelude[2] use git repos as their code distribution mechanisms, and that works really well. I think the real issues here are exactly what is called out in the issue: poor usage of git, and poor directory layout.

[1] https://github.com/robbyrussell/oh-my-zsh [2] https://github.com/bbatsov/prelude

They are using GitHub for their intended purpose, hosting their code. That is perfectly fine.

What is not perfectly fine is using GitHub as your package host. CocoaPods/Specs is the equivalent of Debian's APT using one big GitHub repo to host all their packages. It has 92567 commits and 6872 contributors.

OTOH Homebrew also uses github as its package host and has a respectable 62000 commits and 5600 contributors and github seems to be just fine with it.

The big difference seems to be in the way they do their thing: I'm reasonably sure Homebrew just git clones then updates the local repository normally[0], it has "only" 2500 files in Library/Formula, and because of its different subject it is way less write-active. CocoaPods has 1k commits/week, which looks to be increasing pretty much constantly; Homebrew is around 350, with ups and downs.

Also not sure it matters, but Homebrew has lots of commits updating existing formulae; CocoaPods changes are almost solely additions (publishing a new version of a package adds a new spec and doesn't touch the old one).

[0] which is exactly the bread and butter of github

The fact that Github added an API specifically to reduce the server load from Homebrew suggests that it wasn't "just fine". One of the Homebrew maintainers works for GH so they just had a much more direct route to solving the problem than with CocoaPods.

In this case it's actually more that there's a Homebrew maintainer who works for GitHub (me) who has been working on a bunch of improvements to Homebrew's update system (in my spare time). The desire for the API came from Homebrew's side rather than GitHub's and, as it reduces load for GitHub, it was a net win for both parties.

Thankfully, none of Homebrew's binaries are stored on GitHub.

Most of the changes committed to Homebrew are formulae-based; of the ~5600 contributors in Homebrew's lifetime only ~430 have contributed to the core code.

CocoaPods' binaries are not stored in the problematic Specs repository; CocoaPods/Specs only stores the equivalent of Homebrew formulae (podspecs).

Prelude is using a git repo as a git repo; it absolutely isn't using it as a package manager's CDN. (AFAIK it doesn't even have packages of its own, and the packages it depends on are pulled from ELPA and MELPA.)

Are either of those anywhere near as popular?

Why do you think mhagger's response is not official?

Using GitHub as a CDN would be hotlinking to a .js file from your production website, which is not what's happening here. Other package managers do the same thing: "bower", for example, clones from GitHub, and it just does normal clones.

I think you're missing my distinction here. Yes, bower will clone from GitHub, but of course only the packages you actually use; in the same sense that Go can import source from GitHub repos. These repos are storing code and using GitHub as intended.

But CocoaPods/Specs is the equivalent of someone collecting all possible packages you could ever use in Bower in one big GitHub repo.

It's only the package metadata (the spec) though, same as Homebrew, it's not like the package code is pushed to the repo.

Though I wonder if the actual project-fetching is similarly daft.

I love GitHub's response, but I would urge the project more strongly to use modern CDN solutions. CDNs are dirt cheap and incredibly powerful nowadays, for the data sizes that we're talking about here.

How would you define "dirt cheap"? One of the most popular CDNs out there (Akamai) charges $3,500/month for 10TB/month. Who's going to foot that bill? :)

That's a list price that they charge those who don't care to negotiate.

CloudFlare starts at $0 and doesn't meter/charge for bandwidth. CloudFront charges 9 cents per GB and is integrated with other AWS APIs (which can be very useful). Both those solutions could be managed with a donation pool, I would try the CloudFlare free tier first.

CloudFlare's terms of service specifically forbid using it as a file hosting service.

Not surprising, Akamai is always at the top end of the pricing spectrum for CDN services.

That price point works out to $0.35/Gbyte. More typical list pricing for US/EU is in the $0.10-0.15/Gbyte ballpark. Prices decline rapidly as your utilization approaches 1PB per month.

10TB Amazon AWS is around $85-100, depending on how international your users are.

I believe that's off by an order of magnitude - that's actually the price per 1 TB. Also, you can turn off distribution into remote PoPs that charge more than $85/TB.
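Putting this subthread's figures side by side (taking the quoted $3,500/month Akamai rate and CloudFront's roughly $0.09/GB US/EU list price at face value; negotiated rates will differ):

```python
GB_PER_TB = 1000                 # billing uses decimal terabytes
monthly_gb = 10 * GB_PER_TB      # the 10 TB/month scenario above

akamai_total = 3500.00                      # quoted flat rate for 10 TB
akamai_per_gb = akamai_total / monthly_gb   # works out to $0.35/GB

cloudfront_per_gb = 0.09                    # US/EU list price
cloudfront_total = cloudfront_per_gb * monthly_gb

print(f"Akamai:     ${akamai_per_gb:.2f}/GB  (${akamai_total:,.0f}/month)")
print(f"CloudFront: ${cloudfront_per_gb:.2f}/GB  (${cloudfront_total:,.0f}/month)")
```

So at list prices the same traffic is roughly a quarter of the cost on CloudFront, and effectively free on CloudFlare's unmetered plan (terms of service permitting).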

I believe the problem was not the bandwidth usage (which a CDN would save) but the way CocoaPods uses git, which generates a lot of CPU-expensive operations on GitHub's servers.

The things GitHub suggested help with exactly that: a faster check for updates, and breaking up big directories so diffs are computed faster.
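On the "breaking up big directories" suggestion: one common way to do that (a sketch of the general technique, not necessarily the exact layout CocoaPods adopted) is to shard entries into nested directories keyed by a hash of the name, so no single tree object grows huge and tree diffs touch only a few entries:

```python
import hashlib

def sharded_path(pod_name: str) -> str:
    # Nest each spec under three single-character directories taken
    # from the hex MD5 of its name, producing paths of the form
    # "<x>/<y>/<z>/<PodName>" where x, y, z are hex digits. Each
    # directory stays small, so diffing two trees is cheap.
    digest = hashlib.md5(pod_name.encode("utf-8")).hexdigest()
    return "/".join([digest[0], digest[1], digest[2], pod_name])

print(sharded_path("AFNetworking"))
```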
