To be blunt, you're abusing the shit out of SOMEONE ELSE'S product that you're not even paying for. Your first question shouldn't be what GitHub can do for you so that you don't have to make changes. You should be falling over yourself investigating every available avenue for reducing load.
It's an incredibly entitled way to think about things, and I would have a real hard time employing someone whose first response was like this.
I certainly don't think the barb about your willingness to employ people who write things you disagree with on GitHub issue threads is helping anyone understand any part of this situation. I understand the urge to find ways to be emphatic about how much you disagree with something, and I often find myself compelled to write lines like that, but I think they're virtually always a bad idea.
It seems to be one of HN's go-to insults. "Look at this person's behavior, I would never hire them," as if everyone wants to work at your startup.
Do you really think that when people say "I would never hire that person" there is an implication of "everyone wants to be hired by me (and by extension my company)"?
Nailed it. Just because someone, a team, or a company has hiring criteria doesn't mean they assume everyone wants to work at their company. It means they have an idea of who they are looking for.
it's the difference between buying a packet of your favorite snacks and telling all your friends what your favorite snack is.
you probably expect them to like the same snack.
Additionally what if they inform me about my snack with information that means I can't morally choose it anymore, or that it's dangerous to my health? I now have the opportunity to switch my viewpoint, or reduce the weight that it has in my criteria.
It begins the discussion if you as the person starting the thread are interested in having it and not just looking to be agreed with. I, whether I'm in the minority or not, am always looking to start the dialog. Being agreed with is boring.
I wonder how many times CocoaPods has ruined someone's day or night on some GH team. I wonder how many dinners some mom or dad has missed with their kids because their service alarms were going apeshit. I don't think it's hyperbole to say that if you are a top-10 repo on GitHub and you are hammering the system, you are responsible for ruining individuals' days and taking time away from their families.
Now, this is entirely my opinion, and I'm not saying alloy is bad at what they do. I'm saying that's a collection of attitudes I'm not going to put on my team.
Let's think this through and ask ourselves a few questions.
1. Did they go out of their way to do damage?
2. Are they responsible for deciding the infrastructure and how well it handles load?
3. Did they force said people to work at GitHub?
4. Are open source culture and hosting a major part of GitHub's business plan?
5. Are they responsible for staffing to ensure people are scheduled to work when work needs doing?
That's selfish to the point that it could be a textbook example of an externality. Fortunately, like you said, things got agreeable by the end, with Alloy taking the simple steps he was given to make things better for everyone.
That's actually not true.
https://github.com/CocoaPods/CocoaPods/issues/4989#issuecomm... is the answer we are talking about, isn't it? What alloy is doing there is:
1. Thanking mhagger for the response
2. Asking for additional explanations
3. He then explains why the project took the route it took, and the benefits for them. Explaining alone does not mean unwillingness to change; it just makes clear why things are the way they currently are. Yes, that mentions the time and money benefit, but so what – it's honest, and it is valid to limit expectations on what is possible now.
4. He then asks for a discussion on how to improve the current system without changing it completely. But this does not mean refusing to invest time. On the contrary, he actively invites the discussion to continue and already makes clear that he is willing to work on a better solution, and that is the core point that makes this a constructive answer.
In my eyes, how the discussion continues clearly shows that the negative interpretation of this first answer, in your comment and in this thread, is wrong. That's not someone blocking change, not even at first; that is someone asking for clarification and for clearer tasks to do. That's not a bad thing at all.
"That's actually not true."
"taking the route they took, the benefits for them... Yes, that mentions the time and money benefit, but so what – it's honest, and it is valid to limit expectations on what is possible now."
So, it was true. Then the dialog continued from there, with Alloy paying attention to the situation. Alloy refused to do any significant work on their end, for their own maximal benefit, in a disruptive situation involving a free service. He even admitted they were using a service for something it's not designed for but didn't care about moving to a more appropriate one. Hence my calling it selfish.
Eventually, others had invested their own time and energy into the problem enough to come up with some simple recommendations for Alloy that take very little effort on his part. Alloy summarized those and agreed to attempt them. Thread was closed before we could see where that went.
Given the above, I stand by my claim that he was a selfish individual pushing his own liabilities onto others wherever possible. Even the remedy mostly fell on others. On the other hand, he might make a good capitalist with that level of exploitation and externalizing. :)
> I.e. I’d like us to continue this discussion, at first, from the notion of us maintaining the existing architecture. Where things are absolutely impossible, it would be great if you can include more links to docs/source that explain why things are impossible.
Notice the "at first", notice also the following proposed action of working with a snapshot.
You made clear you were expecting another reaction. And I see why. Still, I think you miss how much good faith was contained in this response. It is a bit sad this gets overlooked. People forget text is not that easy to interpret.
You're right that text is not easy to interpret. For instance, there were two interpretations of my text: a literal, precise one that focuses on how much effort I said he would commit, and a figurative one that recognizes I was speaking with hyperbole. It was the latter. The message was a counterpoint supporting that he was selfish rather than a precise statement of how selfish he was. You would be 100% right if we were talking about him literally, as in a court filing or HR report.
"Notice the "at first", notice also the following proposed action of working with a snapshot. You made clear you were expecting another reaction. And I see why. Still, I think you miss how much good faith was contained in this response."
This is possible. Let me re-read his post first. Alright, done. Here's a re-review.
His first response starts with thanks and statements that show either (a) an incredibly joyous and friendly personality or (b) the brown-nosing of a salesman before a pitch. Horizontal line. Unclear on some things. Asks for more information. We then get to the reasons:
1. Did no work on syncing data to reduce funded development hours.
2. Don't want to operate a repo due to reduced effort or funding.
3. Easier for their users and adoption.
These are all self-centered. Honest, as you said, but they already support my claim of selfishness. Let's keep looking. Upon a suggestion of other packaging systems, he vaguely claims they are using a "smarter" method, then reiterates the HR and funding justifications above. He ignores alternatives in the next sentence to reiterate their existing, strange, and broken solution, with a dismissal about having to build a cathedral rather than just using existing solutions.
So, Alloy had already laid a foundation of total selfishness in terms of time, funding, and design inflexibility. At this point, Alloy is interested only in solutions that fully maintain their existing design and lack of commitment to anything else. He offers to make a few simple changes that "would still use your resources," and asks for information, which basically leads to the recommendations they begin to apply.
So, re-reading his post, it comes off as incredibly selfish using text that's not hard to interpret. He clearly believes their design works, won't be changed unless forced, changes must take little effort from them, they must not use their funding, and must specifically use GitHub's resources. My claim of selfish and externalizing is fully supported at this point. I think the other commenter's claim of being "myopic" about what he's doing in the project is accurate, too.
this is what we used to call the "good" problem (things breaking because they are successful), but that doesn't make it cheap or easy to fix. the other stuff they're talking about in the thread will alleviate some pain and buy some time, but it won't solve the fundamental problem if CocoaPods continues to get "big" (imagine apt or yum trying to run like this).
i understand their wish to maintain the simple and coherent workflow... if i was writing a package dist sys, i would love to have it work off a standard git repo. maybe this is something that could be solved with a plugin architecture like the large-binaries stuff, so that developers could continue with their preferred workflow but end users could take advantage of a CDN-like system for distribution.
This is mine as well, but it's also troubling to me, given that the repo in question is meant to be a package management system; it means there are fundamental holes in the user's understanding of packaging systems.
My mentor has a lot of contempt for the bevy of packaging solutions that people come up with - invariably people look at the old ones, think they're too complex and wrong, write up new code that is Slick(tm) and Fast(tm) and Cool(tm), and they are... until they hit scale. Whether that scale is number of users, or serving multiple environments, or serving a great many packages of different versions... the lack of domain knowledge at the design stage will cause a huge number of issues.
Now granted, I'd try to give them a chance to step away from that statement and let them show me that they are interested in understanding the issue and interested in reducing their impact on the product. I'm okay if they don't yet know HOW, but if they basically just throw up their hands, say it's someone else's problem, and leave it at that, then no. In fact, hell no. I'm not interested. It shows a level of entitlement and a lack of interest in their craft, and I will not subject my team to someone like that.
And it had this passive-aggressive ring to it, with the hand clapping and the hurray at the beginning and the stonewalling in effect.
...only to have someone come up and act like a paying customer whose expectations weren't being met. He answered suggestions by saying something that comes down to "I don't understand, can you repeat please", and never quite grasped that if he wants a better experience for his users, he also needs to work for it.
The introduction to the response, in typical douche manager style, was the cherry on top.
I took those emojis to be the exact opposite of the almost sarcastic tone I think you're interpreting them to have.
I guess even with emojis, it's hard not to make tone ambiguous.
...probably especially with emoji.
And what the hell was with the quotes around "free"? Are you paying? No? Then there's no quotes about it.
fwiw: "I would have a really hard time working for someone whose first inclination was always towards criticism over accommodation or compassion." But then I also acknowledge there may be a whole bunch of other stuff going on here behind the scenes. ;)
My biggest problem is that I can't imagine a time, even if I didn't understand the problem, when I would say that I'm not interested in essentially not abusing a free system that can't handle my load. He basically said he had better things to spend his time or money on. That shows a motivation/attitude that would likely be there regardless of the issue or whether they understood it. That's his knee-jerk reaction, which I would say is probably his most honest, and it seems very selfish and not empathetic to the GH team at all.
But I think you may be reading his tone more negatively than necessary. What I see is him starting off by expressing gratitude and then switching voices to communicate very explicitly what the needs and desires of his stakeholders are. He's simply trying to reflect that as clearly as possible and discover the additional context. This was clearly effective, as you can see from the rest of the conversation that with all the information out there, everybody comes to a mutually acceptable consensus. Problem solved!
What we've ultimately got here is a free service built on a free service. CocoaPods has nothing but the time and effort of volunteers. GitHub has input of resources from the commercial side of their business. Both sides clearly want to preserve the utility of this end-to-end workflow in a more sustainable way.
"Falling over yourself" is a subjective amount of effort, but clearly CocoaPods has tried to minimize their impact on GitHub. As it turns out, the attempted optimization of shallow fetching backfired, but that's not from lack of regard for the resources they rely on. What was missing was exactly the context the Github employees provided.
Honestly, I think people are offended second-hand by a perceived lack of groveling on CocoaPods' part, and to me, that's way overblown.
The Git deep v. shallow issue just puts a band-aid on the CPU problem, but it doesn't do anything about the terabytes of bandwidth per week (it'll be worse), and it won't do anything about GitHub's claim that Git is not meant to be used as a CDN and doesn't scale well.
They've become a big project that warrants thinking about revenue or organization strategy, but they're delaying it by externalizing their costs. Cases like these can pressure GitHub into rethinking the leniency of their policies.
I also think that if you're in the top-5 resource consuming group, more sympathy would go your direction if you were a paying customer, but they've indicated no interest.
> To be blunt, you're abusing the shit out of SOMEONE ELSE’S
> product that you're not even paying for.
AFAIC, that makes every free user a customer. They may not be a paying customer, but it's GitHub’s choice to be in the free hosting business.
From @mhagger's measured and thoughtful reply: "We understand that part of the CocoaPods workflow is that its end users (i.e., not just the people contributing to CocoaPods/Specs) fetch regularly from GitHub..."
I guess I’m saying that I don’t see CocoaPods as being a “bad actor” so much as the extreme tail of a distribution.
The one that CP's usage most likely confronts is G.12 - found here: (https://help.github.com/articles/github-terms-of-service/)
> If your bandwidth usage significantly exceeds the average bandwidth usage (as determined solely by GitHub) of other GitHub customers, we reserve the right to immediately disable your account or throttle your file hosting until you can reduce your bandwidth consumption.
Cocoapods uses GitHub. No abuse here.
Using a GitHub repo as a high traffic code CDN and keeping 5+ cores pegged while being the single biggest consumer of resources across the whole platform could be reasonably defined as an abuse of the service.
Primary definitions on Google:
"abuse" verb: use (something) to bad effect
"abuse" noun: the improper use of something
I don't know... I'm sure they'd get some, CocoaPods has a very large user base after all. But if GitHub laid out the reasons like they laid out the options to start this thread, I think they'd weather the storm fine and defuse some people who show up angry.
I could see why GH would drop them and think they'd be well within their right and in good moral standing in my book. I just don't think they would unless the maintainers became incredibly hostile or proved unable to fix the problem or even band-aid it. GH just seems too culturally invested in making things like this work to a satisfactory conclusion. I have a feeling that the CocoaPods team will be schooled in a lot of things directly from the GitHub team as they work to resolve the performance issues, just look how informative the initial post was.
It's abuse. Wasn't intentional, but at the scale that Cocoapods is running it's abuse.
Discounting the fact that CocoaPods is used by millions of iOS developers...
If CocoaPods were run by a business that charged those millions of developers for its services, it would be reasonable to expect that business to pay for a real CDN.
They're not, so that's not a reasonable expectation. But it's no more reasonable to make these demands of GitHub. Giving out a free product doesn't mean that you're required to give it out unconditionally, or in unlimited amounts, or forever. In the end, GitHub owns the infrastructure and services it's providing and can do what it wants with it.
Just about every other package manager for every other language (Pip, CPAN, Hackage, etc.) uses its own infrastructure.
I've run Tor exit nodes and repo hosting when I worked for ISPs and datacenters, while at the same time shutting down customers who did the same. The difference is that I had that conversation with my boss and said, 'this will violate our normal terms of service, but I would like to do this.' The boss is of course more willing to make that concession when he can walk down the hall and say, 'uh, we have extremely high usage today, can you shut down the repos until we can get another link installed?'
If you want, you can email me (link in profile). Thanks!
Everything you need is online but you'd have to find it for yourself and make decisions about the best way for you to do it.
that being said, i do believe it could help CocoaPods' use case since the fetches are done automatically (as i understand it)
In the medium to long-term I'd like to consider Homebrew running `brew update` automatically before you `brew install` (https://github.com/Homebrew/homebrew/issues/38818). For us to ever be able to do that `brew update`'s no-op case needs to be extremely fast.
My comment isn't to pass judgement on your suggestion, but if you took a look at homebrew itself you'd be able to make better non-generic suggestions.
Re: your suggestion - In general it seems like a good thing to separate fetch from install, but as a user of homebrew, I don't find it very applicable: when homebrewing things you're likely already connected to the internet, and it's hard to predict beforehand when you'll want to brew install something. If you have the internet capacity to fetch something, it'd be just as easy to brew install it there on the spot.
By extension, it's important to run brew update before installing just to make sure the package index is up to date, so I agree with the dev above that integrating a brew update step before brew install would be a good thing, except that it should perhaps print the exact version number that's going to be installed to the console. Current behaviour does put the version number in the file name of the package being installed, but it could be shown more obviously.
Often times I do a brew info, find the version and details on it before brew install. If the installation step then installs a new version (because of the brew update step), then it's a bit strange that I didn't get the version I was intending to get.
I am guilty of the same, but in my case, I have an installation I can actually check. You can read the source code if you don't have a Mac to work with, but the important thing is that this already exists.
from the brew man page:
fetch [--force] [-v] [--devel|--HEAD] [--deps] [--build-from-source|--force-bottle] formulae
Download the source packages for the given formulae. For tarballs, also print SHA-1 and SHA-256 checksums.
Is this classic "I have an opinion and all of my opinions are great, so I must pontificate!"?
Maybe you should just trust that the developers of the only successful package manager for OS X have some idea of what their users want and need... and that as someone who has NEVER used their software, and not a seasoned veteran of any sort of similar projects, your opinion counts WAY less than any of their actual users.
It is also being used in scripts, etc. Since from the user's point of view it's a no-op if there are no updates, there is no reason not to do it on a schedule.
EDIT: the upside is that CocoaPods will have to either rethink their architecture to use fewer resources or move to their own paid infrastructure, because their package manager will soon be less than fully functional given the aggressive rate limiting GitHub is performing.
I'd like to see both happen:
* CocoaPods refactoring to be more efficient
* GitHub providing open source projects the option to buy reserved capacity if they're using excessive resources (versus just saying "No").
> GitHub providing open source projects
> the option to buy reserved capacity.
I'm just pointing out that the feature you're wishing exists very likely already exists in practice. Unless GitHub is stupid they aren't going to be complaining about you pegging 5 CPU cores for $200/month.
GitHub people are truly going above and beyond in service even when barely warranted. I'll give them that.
> CocoaPods is a dependency manager for Swift and Objective-C Cocoa projects. It has over ten thousand libraries and can help you scale your projects elegantly.
The developer response:
> [As CocoaPods developers] Scaling and operating this repo is actually quite simple for us as CocoaPods developers whom do not want to take on the burden of having to maintain a cloud service around the clock (users in all time zones) or, frankly, at all. Trying to have a few devs do this, possibly in their spare-time, is a sure way to burn them out. And then there’s also the funding aspect to such a service.
So they want to be the go-to scaling solution, but they don't want to have to spend any time thinking about how to scale anything. It should just happen. Other people have free scalable services, they should just hand over their resources.
Thank goodness GitHub thought about these kinds of cases from the beginning and instituted automatic rate limiting. Having an entire end-user base use git to sync a 16K+ directory tree is not a good idea in the first place. The developers should long since have been thinking about a more efficient solution.
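For anyone wondering what "automatic rate limiting" looks like mechanically, here's a minimal token-bucket sketch. It's a generic technique and an assumption on my part: nothing in the thread says this is how GitHub actually throttles, and the class name and parameters are made up for illustration.

```python
import time


class TokenBucket:
    """Generic token-bucket limiter (hypothetical, not GitHub's implementation).

    Tokens refill continuously at `rate` per second up to `capacity`;
    each request spends `cost` tokens or is rejected.
    """

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, so short bursts are allowed
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The design point is that a client fetching occasionally never notices the limiter, while one hammering the service gets squeezed down to the refill rate.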
Honestly if I was GitHub, I'd be tempted to just increase the throttling on CocoaPods and call it done, it isn't their problem if the users of that project have a bad experience. GitHub has provided solutions to the problem, it's CocoaPods that's resisting implementing those solutions.
Perhaps I'm just being too charitable. Either way, the project rather rapidly seemed to come to the right conclusion and jump on board fixing their problem.
On a related note, I feel like this issue could be turned into a great teachable moment for OSS projects; one GH could use for a tech blog post and a guide on how to be a good citizen and avoid things that can get your project rate-limited without you knowing.
Obviously, that's not to say the sentiment isn't genuine. The eventual conclusion makes it seem that yeah, they do appreciate what GH is providing and are trying to make it less strenuous on the servers to get a better experience all round. Making it work well is really in their best interests since the users are seeing a degraded experience until something can be done about it. Definitely also happy that the right conclusion was eventually reached.
Given how rapidly the same commenter changed gears, it strikes me as plausible there was an "ohhhhh eureka" moment, and suddenly the guy got it. His followup comments began dealing with the problem after a couple other GH participants explained further what was happening and why (as well as some actionable steps to take to correct the problem for good).
But perhaps I'm being too charitable.
If you are in such a position, then it seems like the best course of action would be to ask questions rather than list off reasons that you don't want to deal with it.
I think that long-term, the solution will be the Swift Package Manager, and CocoaPods will just be deprecated in favor of it. Let Apple host iOS packages; they're the ones that gain the most benefit from easy iOS development; they have the developer expertise, and the hosting costs are a drop in the bucket compared to iCloud & CloudKit. But that's not all that helpful for people who need an Objective-C package now.
I don't think working on CocoaPods is an altruistic endeavor. I imagine (know) that some of the cocoapods folks are app developers and ostensibly CocoaPods makes developing applications easier.
Side note: it's not a tragedy of the commons. GitHub owns the infrastructure, and they enforced their private property rights by rate limiting a group of users that was disproportionately using resources. It is a collective action problem for CP users.
No direct financial benefit, but they are deriving a benefit out of their work.
Jobs compete with other jobs, and most people expect that they'll have to do some unpleasant things in their job. Open-source & volunteer work competes with hobbies, and there are many hobbies where you never need to deal with demands, unexpected work, and interpersonal drama.
My point, though, is that it's not the CocoaPods developers who are ungrateful bastards. It's any Hacker News commenter here who also uses CocoaPods. If you think this behavior is insane, submit a pull request.
TestFlight on the other hand....
(Edit: </rhetorical> </sarcasm>)
Also while nit-picking, I would clear up your use of "for free </r></s>" as "as a freebie", again post-[insert: x̄ȳz, inc].
I understand the desire to personally maintain as few of one's own servers as possible, but when the result is negative effects on the service hosting the project and a worse experience for the end user, it might be time to start looking over what Google Cloud offers.
2) It makes perfect sense to let GitHub handle the performance hit until issues arise. Premature optimization is the devil, right? But once there start to be issues, it's definitely unfair to turn around and say "well you offer the service for free, so you should fix it"
Sentence 1 would still be true if CocoaPods was only used by ten companies developing the ten biggest (in terms of lines of code) Objective-C projects, but there would no longer be a need to scale in the sense of sentence 2.
"Not having to develop a system that somehow syncs required data at all means we get to spend more time on the work that matters more to us, in this case. (i.e. funding of dev hours)"
In other words, using github as a free unlimited CDN lets them be as inefficient as they like. Such as having 16k entries in a directory ( https://github.com/CocoaPods/Specs/tree/master/Specs ) which every user downloads.
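A common way out of a single 16k-entry directory is to shard it by a hash prefix. Here's a sketch of that idea; the three-level layout, the `shard_path` name, and the use of MD5 (as a stable, evenly spread hash, not for security) are assumptions for illustration, not a description of what CocoaPods/Specs does.

```python
import hashlib


def shard_path(name: str, levels: int = 3) -> str:
    """Map a package name to a nested path like '0/a/f/Name' (hypothetical layout).

    MD5 is used only as a stable, evenly spread hash, not for security.
    """
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # "/".join over a string inserts "/" between its characters: "0af" -> "0/a/f"
    prefix = "/".join(digest[:levels])
    return f"{prefix}/{name}"


# Spread 16,000 made-up pod names and count directories at each level.
names = [f"Pod{i}" for i in range(16_000)]
top_level = {shard_path(n).split("/")[0] for n in names}
leaf_dirs = {shard_path(n).rsplit("/", 1)[0] for n in names}
print(len(top_level), len(leaf_dirs))
```

With three hex levels there are at most 16 entries at the top and at most 4,096 leaf directories, so each leaf holds only a handful of pods and no single tree object is enormous.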
Package management and sync seems to suffer really badly from NIH. Dpkg is over 20 years old and yum is over a decade old. What's up with this particular wheel that people keep reinventing it seemingly without improvement?
Trivial apt operations (e.g. trying to install a package which is already installed) on an NSLU2 (an ancient 266MHz ARM machine) take several minutes, whereas the same operation takes several seconds on a modern laptop.
It turns out this is due to the fact that Debian "main" (Packages.gz) has ballooned to 32MB of plain text when uncompressed, comprising more than 41,000 packages, and it has to be parsed and assembled into a dependency tree for every apt operation. This problem screams for SQLite.
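To make the SQLite idea concrete: parse the control-format stanzas once, when the index is downloaded, then answer lookups from an indexed table instead of re-parsing 32MB on every operation. This is a sketch of the suggestion, not how apt works; the table layout and function names are hypothetical, and it keeps only three fields.

```python
import sqlite3


def parse_stanzas(text: str):
    """Yield dicts from Debian control-style text (stanzas separated by blank lines)."""
    for block in text.strip().split("\n\n"):
        fields, key = {}, None
        for line in block.splitlines():
            if line.startswith((" ", "\t")) and key:
                fields[key] += "\n" + line.strip()  # continuation of the previous field
            elif ":" in line:
                key, _, value = line.partition(":")
                key = key.strip()
                fields[key] = value.strip()
        if fields:
            yield fields


def load_index(text: str) -> sqlite3.Connection:
    """Build an in-memory table once; later lookups hit the primary-key index."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE packages (name TEXT PRIMARY KEY, version TEXT, depends TEXT)")
    db.executemany(
        "INSERT OR REPLACE INTO packages VALUES (?, ?, ?)",
        ((s.get("Package"), s.get("Version"), s.get("Depends", "")) for s in parse_stanzas(text)),
    )
    return db
```

A real on-disk version would persist the database next to the downloaded index and rebuild it only when the index changes.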
A side project I've started looking into is to make a transparent apt proxy which provides a trimmed-down Packages.gz (e.g., removing anything which uses X11), which would be a lot easier than rewriting apt to use a SQLite backend.
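A rough sketch of the filtering step such a proxy would need, assuming the rule is simply "drop any stanza whose Depends or Pre-Depends line mentions libx11". The rule and `trim_packages` are hypothetical. It only removes direct dependents (not packages that depend on them), and a real proxy would also have to regenerate and re-sign the Release file, since apt checks index hashes against it.

```python
import re

# Hypothetical rule: a stanza counts as "X11" if its Depends or
# Pre-Depends line mentions libx11 (e.g. libx11-6).
X11_DEP = re.compile(r"^(?:Pre-)?Depends:.*\blibx11", re.MULTILINE)


def trim_packages(text: str) -> str:
    """Return Packages-file text with directly X11-dependent stanzas removed."""
    stanzas = [s.strip("\n") for s in text.split("\n\n") if s.strip()]
    kept = [s for s in stanzas if not X11_DEP.search(s)]
    return "\n\n".join(kept) + "\n"
```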
This is precisely why yum/dnf has been switching from XML for repodata to SQLite. In fact, the only thing that is still XML-only is the comps file which just lists package groups, is updated rarely and "only" weighs in at half a MB.
Actually, I'm surprised one can actually run a modern Linux on the NSLU2 given its shameful lack of RAM and slooow USB port. But it was a nice gadget when it came out and it was fun to experiment with it.
> It turns out this is due to the fact that Debian "main" (Packages.gz) has ballooned to 32MB of plain text when uncompressed, comprising more than 41,000 packages, and it has to be parsed and assembled into a dependency tree for every apt operation. This problem screams for SQLite.
Correct me if I'm wrong, but isn't apt (and dpkg) basically composed out of a ton of different (perl/shellscript) modules? So it should be possible to create an interface-compatible sqlite data store.
Wouldn't cleaning up the package search interface be a similar effort with much greater payoff?
0: p2pacman - Bittorrent powered pacman wrapper
1: pacman & torrent, feasible?
I believe scaling this could happen with either: 1) lightweight filesystem/directory versioning support, like how btrfs allows you to mount snapshots. This way, peers could update whichever version of a torrent they have. Or 2) a very reliable means to update to the latest torrent release (as reliable as syncing with peers), which afaict means smarter bittorrent clients that can perform DHT-based "crawling". Those recent defcon(?) "hacks" to query peers for similar torrents based on user pools and connection histories (or something like that) would make sense here.
A cool side-note: In one of my few experiences diving into `.git`, I diff'd it before and after making changes to its tracked sources, like adding files and modifying them. It looked like a torrent that included version control data would make out just fine if an updated torrent expected similar data in the same location. Again, a smarter bittorrent client would need to sort some of this out. See also 0': Updating Torrents Via Feed URL. Anyway, most users would probably leave that part out, in favor of only which parts they need.
Another cool side-note: This would also allow for easily adding repos from multiple sources... Look at how many (non-automated :-( ) merge requests github.com/CocoaPods/Specs's caregivers have reviewed: 13,331 as of now (0'').
> 0: p2pacman - Bittorrent powered pacman wrapper
> 1: pacman & torrent, feasible?
> 2: DebTorrent
That's about distributing packages via p2p. The problematic repository doesn't store any package data, it stores package metadata (it's the cocoapods index if you will).
metadata != data ??
(It may not have been earlier on, I really don't know. ;>)
Something nifty about the new dnf is several of the older yum commands (eg builddep, yum-downloader) are now integrated directly so don't need extra utils installed. Seems like refinement is still happening.
If only my fingers didn't keep typing "dns" instead of "dnf" all the time, it would be great. :D
> The fundamental idea of downloading a list of available options of which the user picks some, and the system pulls in dependencies, is almost exactly how dpkg and yum work.
If you reduce it to the fundamentals you don't need yum or dpkg either to do that, just a dependency solver and curl.
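For the "dependency solver and curl" point, the solver part at its most naive is a transitive closure plus a topological sort. A sketch using Python's `graphlib` (3.9+); `install_order` and the `depends` mapping are hypothetical, and there are no version constraints, alternatives, or conflicts, which is exactly where apt and libsolv (behind dnf) earn their complexity.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def install_order(requested, depends):
    """Return every package needed for `requested`, dependencies first.

    `depends` maps a package name to the names it directly depends on.
    Raises KeyError for unknown packages and graphlib.CycleError on cycles.
    """
    needed, stack = set(), list(requested)
    while stack:  # transitive closure of the requested packages
        pkg = stack.pop()
        if pkg in needed:
            continue
        if pkg not in depends:
            raise KeyError(f"unknown package: {pkg}")
        needed.add(pkg)
        stack.extend(depends[pkg])
    # TopologicalSorter takes node -> predecessors and yields predecessors first.
    graph = {pkg: depends[pkg] for pkg in needed}
    return list(TopologicalSorter(graph).static_order())
```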
I'd also consider removing a package to be a fundamental part of a manager. The two items you describe would be a 'package grabber'.
Fortunately for us, Firefox is canonically hosted in Mercurial. So, I implemented support in Mercurial for transparently cloning from server-advertised pre-generated static files. For hg.mozilla.org, we're serving >1TB/day from a CDN. Our server CPU load has fallen off a cliff, allowing us to scale hg.mozilla.org cheaply. Additionally, consumers around the globe now clone faster and more reliably since they are using a global CDN instead of hitting servers on the USA west coast!
If you have Mercurial 3.7 installed, `hg clone https://hg.mozilla.org/mozilla-central` will automatically clone from a CDN and our servers will incur maybe 5s of CPU time to service that clone. Before, they were taking minutes of CPU time to repackage server data in an optimal format for the client (very similar to the repack operation that Git servers perform).
More technical details and instructions on deploying this are documented in Mercurial itself: https://selenic.com/repo/hg/file/9974b8236cac/hgext/clonebun.... You can see a list of Mozilla's advertised bundles at https://hg.cdn.mozilla.net/ and what a manifest looks like on the server at https://hg.mozilla.org/mozilla-central?cmd=clonebundles.
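The client side of this can be sketched roughly as follows. This is a simplified illustration, not Mercurial's implementation: it assumes each manifest line is a URL followed by space-separated, URI-encoded KEY=VALUE attributes, that a BUNDLESPEC attribute (such as gzip-v1) names the bundle format, and that entries are listed in the server's order of preference. The cdn.example.com URLs in the test are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple
from urllib.parse import unquote

# (url, attributes) for one manifest line
Entry = Tuple[str, Dict[str, str]]


def parse_manifest(text: str) -> List[Entry]:
    """Parse 'URL KEY=VALUE KEY=VALUE ...' lines, skipping blank lines."""
    entries: List[Entry] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        attrs: Dict[str, str] = {}
        for field in fields[1:]:
            key, _, value = field.partition("=")
            attrs[unquote(key)] = unquote(value)
        entries.append((fields[0], attrs))
    return entries


def choose_bundle(entries: List[Entry], supported: Set[str]) -> Optional[str]:
    """Return the first bundle URL whose BUNDLESPEC this client can read.

    Entries without a BUNDLESPEC are skipped in this sketch. None means
    "no usable bundle; fall back to a normal clone from the server".
    """
    for url, attrs in entries:
        if attrs.get("BUNDLESPEC") in supported:
            return url
    return None
```

The expensive work (building the bundle) happens once, ahead of time; each clone then costs the server only a manifest lookup, while the CDN serves the bytes.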
A number of months ago I saw talk on the Git mailing list about implementing a similar feature (which would likely save GitHub in this scenario). But I don't believe it has materialized as patches yet. Hopefully GitHub (or any large Git hosting provider) realizes the benefits of this feature and implements it.
Mercurial was designed to be easy to extend, and it shows.
Hg was designed to be a DVCS system.
One correction to the post title: it's not maxing five nodes, but five CPUs.
Even Finder or `ls` will have trouble with that, and anything with * is almost certainly going to fail. Is the use-case for this something that refers to each library directly, such that nobody ever lists or searches all 16k entries?
The other side to consider: “one directory per package” is a very simple policy and it feels right in many ways to people (e.g. Homebrew has a similar structure because it's a natural fit for the domain). If the filesystem and basic tools like ls work just fine (which is certainly the case on OS X, where even "ls -l" or the Finder take less than a second on a directory of that size), isn't there a valid argument that the answer should be some combination of fixing tools which don't handle that well or encouraging people to learn about things like `find` instead of using wildcards which match huge numbers of files?
As loath as I am to admit anything about Perl is good, CPAN got this right. 161k packages by 12k authors, grouped by A/AU/AUTHOR/Module. That even gives you the added bonus of authorship attribution. Debian splits in a similar way as well, /pool/BRANCH/M/Module/ and even /pool/BRANCH/libM/Module/ as a special case.
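Both layouts can be computed from the name alone, which is what makes them practical for sharding a flat index. A minimal sketch, assuming ASCII names and mirroring only the path shapes described in this comment (the real CPAN tree, for instance, nests these under a further authors/id/ prefix):

```python
def cpan_style_path(author: str, module: str) -> str:
    """A/AU/AUTHOR/Module: two short prefix levels, then the full author ID."""
    a = author.upper()
    return f"{a[0]}/{a[:2]}/{a}/{module}"


def debian_pool_path(branch: str, source: str) -> str:
    """pool/BRANCH/M/Module/, or pool/BRANCH/libM/Module/ for 'lib*' names."""
    # 'lib' packages are so numerous that they get a four-character bucket.
    prefix = source[:4] if source.startswith("lib") else source[0]
    return f"pool/{branch}/{prefix}/{source}/"
```

`cpan_style_path("author", "Module")` gives `A/AU/AUTHOR/Module`, and `debian_pool_path("main", "libxml2")` gives `pool/main/libx/libxml2/`, so no single directory has to hold every entry.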
Tooling can be considered part of the problem in this case. Because the tooling hides the implementation, nobody (in the project) noticed just how bad it was. I hadn't seen modern FS performance on something of this scale, apparently everything I've worked with has been either much smaller or much larger. Ext4 (and I assume HFS+) is crazy-fast for either `ls -l` or `find` on that repo.
It seems like tooling is part of the solution as well, but from the `git` side. Having "weird" behavior for a tool that's so integral to so many projects scares me a little, but it's awesome that Github has (and uses) enough resources to identify and address such weirdness.
Think about it from their perspective. GitHub advertises a free service, and encourages using it. Partly it's free because it's a loss leader for their paid offerings, and partly it's free because free usage is effectively advertising GitHub. CocoaPods builds their project on this free service, and everything is fine for years.
Then one day things start failing mysteriously. It looks like GitHub is down, except GitHub isn't reporting any problems, and other repositories aren't affected.
After lots of headscratching, GitHub gets in touch and says: you're using a ton of resources, we're rate limiting you, you're using git wrong, and you shouldn't even be using git.
That's going to be a bit of a shock! Everything seemed fine, then suddenly it turns out you've been a major problem for a while, but nobody bothered to tell you. And now you're in hair-on-fire mode because it's reached the point where the rate-limiting is making things fail, and nobody told you about any of these problems before they reached a crisis point.
It strikes me as extremely unreasonable to expect a group to avoid abusing a free service when nobody tells them that it's abuse, and as far as they know they're using it in a way that's accepted and encouraged. If somebody is doing something you don't like and you want them to stop, you have to tell them, or nothing will happen!
I'm not blaming GitHub here either. I'm sure they didn't make this a surprise on purpose, and they have a ton of other stuff going on. This looks like one of those things where nobody's really to blame, it's just an unfortunate thing that happened.
(And just to be clear, I don't have much of a dog in this fight on either side. My only real exposure to CocoaPods is having people occasionally bug me to tag my open source repositories to make them easier to incorporate into CocoaPods. I use GitHub for various things like I imagine most of us do, but am not particularly attached to them.)
With respect to CocoaPods, I would hope someone on the team had thought through performance characteristics of their architecture.
It's like they brought a shopping cart onto a city bus and were then surprised that it inconvenienced the bus driver and the other passengers.
It's not like these guys thought, "Well, we really should use some dedicated high-end host for all our traffic, but we'll use GitHub because it's easier."
> It strikes me as extremely unreasonable to expect a group to avoid abusing a free service when nobody tells them that it's abuse
I don't think so at all. An experienced developer should expect that a free service will rate-limit their offerings at some point, and design around that. Viewing 'free' as 'an eternal resource sponge that we never have to think about' is the extremely unreasonable thing to do, in my opinion. I think that 'abuse' is probably the wrong word to use here, since that implies malice, and they don't appear to be malicious.
GitHub is for source control. That means a limited number of people pulling and submitting changes. That does not mean the general public using it as a CDN.
In fact I seem to remember seeing somewhere active discouragement of using it as a CDN.
Also, I'm amazed this is even a problem. 5 CPUs is not a lot in the scheme of things (even if they mean physical CPUs rather than cores). TBs of bandwidth are also virtually free for a company the size of Github.
Even better: they are getting basically real world loadtested for free and finding loads of pain points, which may hit paying customers.
Unless I'm missing something, fire more metal at the problem. Many companies would love to be able to have every single cocoapod user (which is nearly every iOS developer) have to type github.com into their terminal for the cost of a bunch of servers + some bandwidth.
Pretty strange, unless this is hitting some really bad area of their service that can't easily be scaled out of (but I would be surprised).
I think their point is that it's using the system in a way that isn't intended or desired. How does that count as "real world" load testing?
And by that logic, shouldn't anybody who gets hit with a DoS attack just say "thanks"? It's tons of free load testing on your network infrastructure, and you'll definitely find some pain points.
What seems insane is to use a single github repo as the universal directory of packages and their versions driving your package manager.
There's a reason RubyGems has its own servers and web services to support this use case for the central library registry, even if the sources for the gems are all individual projects hosted on GitHub.
That only has 3,000 packages vs 15,000 for CocoaPods or 115,000 for RubyGems.
The CocoaPods developers seem to be missing the entire point of git: it's a _distributed_ revision control system.
Set up a post-receive hook on GitHub (on github.com this takes the form of a push webhook) to notify another server, set up with a basic installation of git, to pull from GitHub so as to mirror the master repo. Then have your client program randomly choose one of these servers to pull from at the start of an operation. It's a simple load balancer that solves this problem.
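The client half of that idea can be sketched in a few lines. This is a hypothetical helper, not part of any CocoaPods tool: the mirror list and the `fetch` callable are assumptions, and in practice `fetch` might wrap something like `subprocess.run(["git", "fetch", url]).returncode == 0`.

```python
import random
from typing import Callable, Optional, Sequence


def fetch_from_mirrors(
    mirrors: Sequence[str],
    fetch: Callable[[str], bool],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Try the mirrors in random order; return the URL that worked, or None."""
    if rng is None:
        rng = random.Random()
    order = list(mirrors)
    rng.shuffle(order)  # a random order spreads clients across the mirrors
    for url in order:
        if fetch(url):  # e.g. run `git fetch <url>` and report success
            return url
    return None  # every mirror failed; the caller could fall back to GitHub
```

Shuffling per operation gives a rough, even spread of load without any coordination, and trying the remaining mirrors in turn means one mirror being down doesn't break the client.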
If CocoaPods reach out to Rackspace and/or other hosting providers, there's a decent chance they'll be able to pull together a good solution. :)
The downside though, is they'll need to figure out some way to keep it monitored/maintained. :/
> GitHub Support is unable to help with issues specific to CocoaPods/CocoaPods.
That's a pretty neat feature!
Therefore, it's not Apple's problem. In fact, I've talked to a non-trivial number of engineers (both in Cupertino and long time Cocoa devs) that disapprove of the shortcuts that Cocoapods takes all over, software architecture be damned. Reasonable parties can agree to disagree, but I do include 3rd party framework inclusion without a dependency manager as an interview screen for prospective iOS hires.
Since you mention developer relations, I'll assume you're not actually arguing that this is Apple's technical responsibility, but that they should throw around some $ to grease the wheels to make dependency inclusion better. As a platform vendor, funding hosting costs for some project that you don't agree with just to "support the community" is a bad idea. A better idea is to allocate resources to set up a structure that can fix the issue in a technically agreeable way while also benefitting from the independence a FOSS project provides. In doing so, you are correct that it'd be preferable for Apple to fund/use well-known FOSS standards, such as Github.
In conclusion, Apple should set up a FOSS project to address the current inconveniences associated with third party package inclusion and should involve and pay Github somehow.
AFAIK the Swift package manager works only for Swift code, so it's not a replacement.
Also it's a very bad habit to try to stand over the users that supply software for your most valuable product. We've seen a lot of stories lately that indie app development is dead. We also regularly see how weak Apple is in web services (cloud sync).
So either developers invest a lot of time to build something that works (and maybe even share it on GitHub) or they will stick with the holy Apple solution and provide a crappy user experience and go bankrupt. Companies like Google or Amazon (AWS) do a very good propag^developer relations job, IMHO way better than Apple ever did (in the last 10 years).
 In my opinion, they have since lost that edge on the UI.
This one is pretty big. https://github.com/torvalds/linux
I wonder how much traffic the Github Linux repo gets. Seems to me that people who want to use Linux, will go get a distro instead. And people who want to develop the kernel, will follow the kernel development process (which doesn't rely on Github).
https://github.com/apache - Lots of mirrors but many projects use it as their main source.
- My school uses GitHub to host and track our software engineering project (which still can be argued as OSS).
- People using GitHub issue system as a forum.
- Friends uploading pdfs to GitHub.
- Recently people posted on HN about using GitHub to generate a status page.
I think this is a really bad trend and people should stop doing that.
Using GitHub Issues as a forum and a source for generating status pages are both ok from a use/abuse perspective, but you may not have the best experience since that isn't what Issues is intended for.
It should be fine to come up with new ways to use Github, as long as it's not causing excessive load.
Imagine a world where GitTorrent is fully developed, includes support for issue tracking, and has a nice GUI client that makes the experience on-par with browsing github.com.
I mention this not as an "Everybody bail out of GitHub and run to GitTorrent!!!" sort of statement, because I believe GitHub's response here was excellent and confidence inspiring. But it's an unnatural relationship for community supported, open source projects to host themselves on commercial platforms such as GitHub. GitHub primarily hosts them to promote its business. That's not necessarily a bad thing, but it results in impedance mismatches like the one demonstrated here.
That isn't to say that a mature GitTorrent would replace GitHub. Rather, I envision GitHub becoming a supernode in the network, an identity provider, and a paid seed offering, all alongside their existing private repo business.
Honestly, once I scrape a few projects off my plate, I'm inclined to dive into GitTorrent, see where it's at in development, and see if I can start contributing code. It just seems like such a cool and useful idea.
The potential downsides seem much more annoying. Do you really want to have your dependencies on an overloaded central server somewhere?
I don't keep up with iOS development enough to know if anything has changed with respect to static/dynamic linking in iOS 8, but it has always been possible to use custom frameworks in iOS (eg, frameworks you build yourself, unless the community has another definition for 'custom framework').
The framework directory structure is a bit unorthodox, but it's really just your statically built library (absent any '.a' suffix) alongside any header files in a Headers folder. Again, not sure if this has changed with any support for dynamic linking.
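A minimal sketch of assembling that layout by hand, assuming you already have a prebuilt static library and its public headers. The `MyLib` name and file paths are hypothetical, and the sketch builds only the two pieces described here; it doesn't add anything else a given toolchain might expect, such as an Info.plist or a module map.

```python
import shutil
from pathlib import Path
from typing import Iterable


def make_static_framework(name: str, static_lib: Path,
                          headers: Iterable[Path], out_dir: Path) -> Path:
    """Build out_dir/NAME.framework containing the binary and a Headers folder."""
    framework = out_dir / f"{name}.framework"
    header_dir = framework / "Headers"
    header_dir.mkdir(parents=True, exist_ok=True)
    # The binary is just the static library, copied without its '.a' suffix.
    shutil.copy2(static_lib, framework / name)
    for header in headers:
        shutil.copy2(header, header_dir / header.name)
    return framework
```

Calling `make_static_framework("MyLib", Path("build/libMyLib.a"), [Path("include/MyLib.h")], Path("dist"))` would produce `dist/MyLib.framework/MyLib` and `dist/MyLib.framework/Headers/MyLib.h`.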
How so? I bet the cocoapods team knew they were hammering Github with that gigantic repo. They just didn't care and expected Github to just give them more bandwidth, for free.
The response from mhagger is unnecessarily apologetic, and I predict we'll see an official update from GitHub on this soon.
I don't know about that. Both oh-my-zsh and emacs prelude use git repos as their code distribution mechanisms, and that works really well. I think the real issues here are exactly what is called out in the issue: poor usage of git, and poor directory layout.
What is not perfectly fine is using GitHub as your package host. CocoaPods/Specs is the equivalent of Debian's APT using one big GitHub repo to host all their packages. It has 92567 commits and 6872 contributors.
The big differences seem to be in the way they do their thing. I'm reasonably sure Homebrew just git clones, then updates the local repository normally. It has "only" 2500 files in Library/Formula, and because of its different subject it is way less write-active: CocoaPods has ~1k commits/week, which looks to be increasing pretty much constantly, while Homebrew is around 350, with ups and downs.
Also, not sure it matters, but Homebrew has lots of commits updating existing formulae, while CocoaPods changes are almost solely additions (publishing a new version of a package adds a new spec and doesn't touch the old one)
 which is exactly the bread and butter of github
Most of the changes committed to Homebrew are formulae-based; of the ~5600 contributors in Homebrew's lifetime only ~430 have contributed to the core code.
But CocoaPods/Specs is the equivalent of someone collecting all possible packages you could ever use in Bower in one big GitHub repo.
Though I wonder if the actual project-fetching is similarly daft.
CloudFlare starts at $0 and doesn't meter/charge for bandwidth. CloudFront charges 9 cents per GB and is integrated with other AWS APIs (which can be very useful). Both those solutions could be managed with a donation pool, I would try the CloudFlare free tier first.
That price point works out to $0.35/Gbyte. More typical list pricing for US/EU is in the $0.10-0.15/Gbyte ballpark. Prices decline rapidly as your utilization approaches 1PB per month.
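A back-of-the-envelope estimate makes these per-GB prices easier to compare. This sketch assumes a single flat rate (it ignores volume tiers, request fees, and regional pricing), and the 1,000 GB/day volume is hypothetical, chosen only to match the order of magnitude of the >1TB/day Mozilla figure earlier in the thread.

```python
def monthly_transfer_cost(gb_per_day: float, usd_per_gb: float, days: int = 30) -> float:
    """Flat-rate egress estimate in USD: GB per day x days x price per GB."""
    return gb_per_day * days * usd_per_gb


# Hypothetical 1,000 GB/day at the per-GB prices quoted in this thread.
for price in (0.09, 0.10, 0.15, 0.35):
    print(f"${price:.2f}/GB -> ${monthly_transfer_cost(1000, price):,.0f}/month")
```

At $0.09/GB that comes to about $2,700 a month; at $0.35/GB, about $10,500. That's before any tiered discounts at higher volumes.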
Things that GitHub suggested help with that: faster check for updates, breaking up big directories so diffs are computed faster.
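One way to implement a "faster check for updates" (not necessarily the mechanism GitHub had in mind) is to ask the server only for the branch's current SHA with `git ls-remote`, which requests the ref list rather than any objects, and skip the fetch entirely when it matches the local copy. A sketch assuming `git` is on the PATH; the ref names `refs/heads/master` and `refs/remotes/origin/master` are assumptions to adjust.

```python
import subprocess
from typing import Optional


def parse_ls_remote(output: str, ref: str) -> Optional[str]:
    """Find the SHA for `ref` in `git ls-remote` output (tab-separated 'sha ref' lines)."""
    for line in output.splitlines():
        sha, _, name = line.partition("\t")
        if name == ref:
            return sha
    return None


def needs_fetch(remote_url: str, repo_dir: str,
                remote_ref: str = "refs/heads/master",
                local_ref: str = "refs/remotes/origin/master") -> bool:
    """True if the remote branch points somewhere other than our local copy."""
    listing = subprocess.run(
        ["git", "ls-remote", remote_url, remote_ref],
        capture_output=True, text=True, check=True,
    ).stdout
    remote_sha = parse_ls_remote(listing, remote_ref)
    local_sha = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", local_ref],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    # If the ref vanished on the remote, a fetch wouldn't help either.
    return remote_sha is not None and remote_sha != local_sha
```

A client would call `needs_fetch("https://github.com/CocoaPods/Specs.git", local_specs_dir)` and run the real `git fetch` only when it returns True, so the common "nothing changed" case never makes the server compute a pack.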
At the time, I wrote a script that hammered git commits into a repository using different strategies and looked at what the git repository would look like after 100,000 and a million commits. The "one version per file, nested in a flat structure" had serious issues.
There may still be scaling limits with the Cargo approach, but if we reach them, we have plans to create a new registry with a new initial commit and let the old registry age out, then rinse/repeat. At the moment, we haven't hit limits yet (with about 1/3 of the packages that Cocoapods has).
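For comparison, the crates.io index keeps one file per crate, sharded by name length and prefix, with one JSON line per published version. A new release therefore appends a line instead of adding a directory and files. A sketch of that path rule (`1/`, `2/`, `3/{first letter}/`, otherwise `{first two}/{next two}/`, on the lowercased name); the index entries here are trimmed to three fields for illustration, whereas real entries carry more, such as a checksum.

```python
import json
from collections import defaultdict
from typing import DefaultDict, List


def cargo_index_path(name: str) -> str:
    """Path of a crate's file in a crates.io-style index."""
    n = name.lower()
    if len(n) <= 2:
        return f"{len(n)}/{n}"
    if len(n) == 3:
        return f"3/{n[0]}/{n}"
    return f"{n[:2]}/{n[2:4]}/{n}"


def publish(index: DefaultDict[str, List[str]], name: str, vers: str) -> None:
    """Append one JSON line for a new version; no new file or directory."""
    entry = {"name": name, "vers": vers, "deps": []}
    index[cargo_index_path(name)].append(json.dumps(entry))


index: DefaultDict[str, List[str]] = defaultdict(list)
publish(index, "serde", "1.0.0")
publish(index, "serde", "1.0.1")  # same file, one more line
```

After both calls the in-memory index still has a single path, `se/rd/serde`, holding two lines, which is the opposite of the one-directory-per-version growth described for CocoaPods.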
`go get`, on the other hand, doesn't keep any index. It just uses the URL to download the dependency, because of the "URL == project name" mapping that exists with Go projects.
$ git fetch --depth=2147483647
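A shallow (depth=1) clone can be converted into a full clone with the above command (2147483647 is the largest signed 32-bit integer, so it acts as an unlimited depth; newer Git versions also offer `git fetch --unshallow`).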
1 - https://bintray.com/
I had no idea it was just CocoaPods repo because my other repos were working fine. I accepted defeat, went to bed and everything was working great in the morning.
1. NuGet packages, which are built from the GitHub repository but then redistributed over a non-Github CDN (NuGet's)
2. tsd management tool (https://github.com/Definitelytyped/tsd), which looks like it prefers Github's CDN raw URLs rather than full/shallow git clones
3. typings management tool (https://github.com/typings/typings) with "ambient" typings searches (`--ambient` flag), which I believe also prefers Github's CDN raw URLs, as it essentially forked from TSD
That said, it's past time to move beyond the giant huge DefinitelyTyped repo, and I for one heartily recommend people migrate to typings which has better support for NPM and other module and package management systems, as well as smaller unit/module-focused Github repositories.
> We would like to provide a package index in the future, and are investigating possible solutions.
1. You can't edit the frameworks unless you open a separate project, re-edit and recompile. With Pods, you can edit the Pod in your workspace.
2. Carthage doesn't go the last mile to bundle the framework into your project.
If they addressed those two things, I feel like Cocoapods would probably start losing a lot of steam. Although, after watching some videos about what the goals of Carthage were from the creators, I doubt that those two things will be addressed so I'm waiting for Swift's Package Manager.
The main difference is that homebrew actually updates the git tree to provide updated versions of package specs. CocoaPods adds a new directory and some files for each package version, causing the repo to balloon.
No matter which way you slice it, what CocoaPods is doing is a bit daft, especially at their scale.
I don't know what's normal; last night was one of my first iOS projects.
Seems like a poor design decision on the CocoaPods side.
The problem isn't the packages; it's the index, which contains 16,000 subdirectories.
It's still absolutely ludicrous to use git for it, or worse, to bother github with it.
Short term sure, they’re doing the right thing, implementing a nice way to manage the free rider problem without hurting them too much.
But long term it’s different.
Financially, one average programmer = $80k/year, one average cloud server = $4k/year.
And GitHub has hundreds of millions in venture capital.
More than enough to provision a few more servers, even if they have to install new servers just for those pods.
The way they act now will lead someone to develop a decentralized git+torrent hybrid.
When that happens, sure, those pods will no longer consume GitHub’s precious resources.
However, for the rest of the github users, there will be no reason to stay on GitHub either.
Not to mention that "just buy another server for this one project" sounds like something CocoaPods should pay for.
Very likely true, but I don’t see how that’s related.
>they start to have pathologies when you have many entries in the same directory
Only true for inefficient filesystems like FAT.
For NTFS, 16k entries is nothing; the performance will start to degrade (due to directory fragmentation) at around 100k entries:
>"just buy another server for this one project" sounds like something CocoaPods should pay for.
I don’t think that’s how the 21st-century economy works in this case.
Github’s value is likely between $0.75B and $2B.
Bad PR caused by this story will exceed 10 years TCO of that extra server.