
CocoaPods downloads max out five GitHub server CPUs - jergason
https://github.com/CocoaPods/CocoaPods/issues/4989#issuecomment-193772935
======
onli
Note how perfect that response from mhagger is. A clear, honest sounding
assurance of what Github wants to deliver. A perfectly comprehensible
description of what is the problem, and where it is coming from. And then
suggestions for how to fix it that the project can actually work on, plus mentioning
changes to git itself that Github is trying to make that would help. It not
only shows great work going on behind the scenes (and if that is untrue, it at
least gives me that impression, which is what counts), but also explains it in
a great way.

~~~
manyxcxi
I was astonished at how selfish/myopic/whatever alloy's response was.

To be blunt, you're abusing the shit out of SOMEONE ELSE'S product that you're
not even paying for. Your first question shouldn't be to see what Github can
do for you to make it so you don't have to make changes. You should be falling
over yourself investigating all available avenues for reducing load.

It's an incredibly entitled way to think about things and I would have a real
hard time employing someone whose first response was like this.

~~~
tptacek
I don't know, it sounded to me like he just didn't totally understand what
Github was saying. By the end of the thread, it seemed like everyone was
agreeing. I wouldn't be comfortable using words like "selfish" to describe any
of what I read.

I certainly don't think the barb about your willingness to employ people who
write things on Github issues threads that you disagree with is helping anyone
understand any part of this situation. I understand the urge to find ways to
be emphatic about how much you disagree with things, and I _often_ find myself
compelled to write lines like that, but I think they're virtually always a bad
idea.

~~~
mintplant
> I certainly don't think the barb about your willingness to employ people who
> write things on Github issues threads that you disagree with is helping
> anyone understand any part of this situation.

It seems to be one of HN's go-to insults. "Look at this person's behavior, I
would never hire them," as if everyone wants to work at your startup.

~~~
manyxcxi
I didn't say I disagree with his statement. I'm saying, maybe more implied,
that I'm not going to hire someone who displays a lack of interest in finding
a real solution to a real problem that has to do very much with what they're
trying to build. And on top of that shows a serious streak of entitlement and
a lack of empathy towards the very service they're essentially abusing and not
even paying for.

I wonder how many times CocoaPods has ruined someone's day/night on some GH
team. I wonder how many dinners some mom or dad has missed with their kids
because their service alarms are going apeshit. I don't think it's hyperbole
to say that if you are a top 10 repo at Github, you are responsible for
ruining individuals' days and taking time away from their families if you are
hammering the system.

Now, these are entirely my opinion and I'm not saying alloy is bad at what
they do. I'm saying that is a collection of attitudes that I'm not going to
put on my team.

~~~
johnnyfaehell
I think it is a hyperbolic statement to say they're responsible for ruining
people's days.

Let's think this through and ask ourselves a few questions.

1\. Did they go out of their way to do damage?
2\. Are they responsible for deciding the infrastructure and how well it can
handle load?
3\. Did they force said people to work at GitHub?
4\. Is the open source culture and hosting a major part of GitHub's business
plan?
5\. Are they responsible for staffing to ensure people are scheduled to work
when work needs doing?

------
Gratsby
From CocoaPods.org:

> CocoaPods is a dependency manager for Swift and Objective-C Cocoa projects.
> It has over ten thousand libraries and can help you scale your projects
> elegantly.

The developer response:

> [As CocoaPods developers] Scaling and operating this repo is actually quite
> simple for us as CocoaPods developers whom do not want to take on the burden
> of having to maintain a cloud service around the clock (users in all time
> zones) or, frankly, at all. Trying to have a few devs do this, possibly in
> their spare-time, is a sure way to burn them out. And then there’s also the
> funding aspect to such a service.

\--

So they want to be the go-to scaling solution, but they don't want to have to
spend any time thinking about how to scale anything. It should just happen.
Other people have free scalable services, they should just hand over their
resources.

Thank goodness Github thought about these kinds of cases from the beginning
and instituted automatic rate limiting. Having an entire end user base use git
to sync up a 16K+ directory tree is not a good idea in the first place. The
developers should have long since been thinking about a more efficient
solution.

~~~
orclev
It seems particularly galling that their response to GitHub was to essentially
throw their hands up and say "We don't want to change anything, fix it for
us". I think GitHub had a near perfect response to this, they analyzed the
problem, came up with a set of changes that could be made to help fix it (both
short and long term), and pointed to steps they've taken to help out.
CocoaPods on the other hand (or at least one of their developers) did not
handle this particularly well. When presented with the evidence of why they
were seeing slow responses and long queues and suggestions of how to fix it,
they complained that they didn't _want_ to fix it and didn't have the time or
resources to do so.

Honestly if I was GitHub, I'd be tempted to just increase the throttling on
CocoaPods and call it done, it isn't their problem if the users of that
project have a bad experience. GitHub has provided solutions to the problem,
it's CocoaPods that's resisting implementing those solutions.

~~~
bobwaycott
I think that's pretty unfair. It's really obvious that the initial reply
didn't really understand what was going on, and what was being explained. A
couple followup additional explanations later, the same dev grokked the
problem, CocoaPods' responsibility for the problem, and outlined a list of how
they're going to solve it. Seemed to me to be a pretty nice example of
professional and helpful candor between GH and an OSS project working to
figure out a long-term solution.

~~~
mynameisvlad
I don't know, maybe I'm being overly pessimistic here, but to me it just
screams of backpedaling once they saw the reaction they were receiving in this
thread. The position shifted from "it's the way we architected things, how can
you fix this for us" to "okay, here's some things we can do" pretty quickly
and dramatically when the HN thread went up and people were reacting to the
response. Cocoapods is using Github resources for free, so the appropriate
response from the start should have been what it eventually came down to, not
pushing back on Github because they don't want to invest in an actual CDN
solution. But, as I said, maybe I'm being overly pessimistic in my analysis
here, that's just how it came off to me.

~~~
bobwaycott
I get where you're coming from. I also had a similar initial reaction.
However, as I read through the subsequent discussion, it began to read as
though the commenter was really not grokking the problem—and, more
importantly, what to do to fix it. I thought it was very impressive that none
of the GH participants reacted like some of the HN commenters here. Instead,
they showed a great deal of patience and restraint in fully explaining the
technical details, offering actionable solutions, and keeping everything very
civil and supportive. Then the same guy who sounded like he was possibly being
a jerk came back and sounded totally different because he seemed to actually
know what to do to fix his project. Maybe the CP commenter read this HN thread
and reacted to it, but I'll admit HN is the last place I'd think of finding
one of my GH issues discussed.

Perhaps I'm just being too charitable. Either way, the project rather rapidly
seemed to come to the right conclusion and jump on board fixing their problem.

On a related note, I feel like this issue could be turned into a great
teachable moment for OSS projects; one GH could use as a tech blog and guides
for how to be a good citizen and avoid things that can make your project get
rate-limited without you knowing.

~~~
mynameisvlad
Yeah, I really just think it went a bit too far in the other direction and
overcompensated somewhat, which is what was giving me that view. The comment
with the heart emoji really stood out to me as a "huh, this might be because
of HN" since it basically touched on exactly what was being criticized in
here, that they weren't really appreciating what GitHub was providing for
free. That said, I can totally see it being just that alloy realized it on his own
and wanted to make it clear. It's just that the timing of it all and the fact
that it's hitting the same point kind of led me to believe that it was a
reaction.

Obviously, that's not to say the sentiment isn't genuine. The eventual
conclusion makes it seem that yeah, they do appreciate what GH is providing
and are trying to make it less strenuous on the servers to get a better
experience all round. Making it work well is really in their best interests
since the users are seeing a degraded experience until something can be done
about it. Definitely also happy that the right conclusion was eventually
reached.

------
pjc50
This reply:
[https://github.com/CocoaPods/CocoaPods/issues/4989#issuecomm...](https://github.com/CocoaPods/CocoaPods/issues/4989#issuecomment-193810378)

 _" Not having to develop a system that somehow syncs required data at all
means we get to spend more time on the work that matters more to us, in this
case. (i.e. funding of dev hours)"_

In other words, using github as a free unlimited CDN lets them be as
inefficient as they like. Such as having 16k entries in a directory (
[https://github.com/CocoaPods/Specs/tree/master/Specs](https://github.com/CocoaPods/Specs/tree/master/Specs)
) which every user downloads.

Package management and sync seems to suffer really badly from NIH. Dpkg is
over 20 years old and yum is over a decade old. What's up with this particular
wheel that people keep reinventing it seemingly without improvement?

~~~
jahewson
Care to substantiate that last paragraph? Are you really suggesting OS X users
use yum?

~~~
wyldfire
Actually it seems very likely that one or more of the popular linux distro
package manager ecosystems would fare well on other OSs. Arch Linux's pacman
was ported to Windows, for example.

~~~
286c8cb04bda
pacman has been ported to OS X a few times

    
    
      https://bbs.archlinux.org/viewtopic.php?id=53960
      https://bbs.archlinux.org/viewtopic.php?id=122544

------
indygreg2
I help run Mozilla's version control infrastructure and the problems described
by the GitHub engineer have been known to me for years. Concerns over scaling
Git servers are one of the reasons I am extremely reluctant to see Mozilla
support a high volume Git server to support Firefox development.

Fortunately for us, Firefox is canonically hosted in Mercurial. So, I
implemented support in Mercurial for transparently cloning from server-
advertised pre-generated static files. For hg.mozilla.org, we're serving
>1TB/day from a CDN. Our server CPU load has fallen off a cliff, allowing us
to scale hg.mozilla.org cheaply. Additionally, consumers around the globe now
clone faster and more reliably since they are using a global CDN instead of
hitting servers on the USA west coast!

If you have Mercurial 3.7 installed,
`hg clone https://hg.mozilla.org/mozilla-central` will automatically clone
from a CDN and our servers will incur maybe
5s of CPU time to service that clone. Before, they were taking minutes of CPU
time to repackage server data in an optimal format for the client (very
similar to the repack operation that Git servers perform).

More technical details and instructions on deploying this are documented in
Mercurial itself:
[https://selenic.com/repo/hg/file/9974b8236cac/hgext/clonebun...](https://selenic.com/repo/hg/file/9974b8236cac/hgext/clonebundles.py).
You can see a list of Mozilla's advertised bundles at
[https://hg.cdn.mozilla.net/](https://hg.cdn.mozilla.net/) and what a manifest
looks like on the server at
[https://hg.mozilla.org/mozilla-central?cmd=clonebundles](https://hg.mozilla.org/mozilla-central?cmd=clonebundles).
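
For anyone curious what a client does with such a manifest, here is a rough
Python sketch. It assumes each manifest line is a URL followed by optional
whitespace-separated KEY=VALUE attributes, which is how I read the clonebundles
documentation. The `parse_manifest` helper and the example URLs are made up for
illustration, this is not Mercurial's own code, and it skips details such as
URL-decoding of attribute values.

```python
from typing import Dict, List, Tuple


def parse_manifest(text: str) -> List[Tuple[str, Dict[str, str]]]:
    """Parse manifest text into (url, attributes) pairs.

    Assumed format: one entry per line, a URL first, then optional
    KEY=VALUE attributes separated by whitespace. Blank lines are skipped.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        url = fields[0]
        attrs: Dict[str, str] = {}
        for field in fields[1:]:
            # Split on the first "=" only, so values may contain "=".
            key, _, value = field.partition("=")
            attrs[key] = value
        entries.append((url, attrs))
    return entries


# Hypothetical manifest; a real one lists CDN URLs of pre-generated bundles.
EXAMPLE = """\
https://cdn.example.com/repo/full.gzip.hg BUNDLESPEC=gzip-v2
https://cdn.example.com/repo/full.bzip2.hg BUNDLESPEC=bzip2-v2 REQUIRESNI=true
"""

if __name__ == "__main__":
    for url, attrs in parse_manifest(EXAMPLE):
        print(url, attrs)
```

The point is that the server only has to hand out this small text file; the
heavy download then comes from whichever static URL the client picks.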

A number of months ago I saw talk on the Git mailing list about implementing a
similar feature (which would likely save GitHub in this scenario). But I don't
believe it has manifested into patches. Hopefully GitHub (or any large Git
hosting provider) realizes the benefits of this feature and implements it.

~~~
_yy
Wow, this is pretty cool. Reminds me of the performance optimizations Facebook
has done with Mercurial:
[https://code.facebook.com/posts/218678814984400/scaling-merc...](https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/)

Mercurial was designed to be easy to extend, and it shows.

~~~
rjbwork
Git was created and designed to support Linus' workflow when developing the
Linux kernel.

Hg was designed to be a DVCS system.

------
jdcarter
Wow, really impressive response from GitHub. The right amount of technical
detail coupled with balanced tone--halfway between "we support you" and "you
make us crazy."

One correction to the post title: it's not maxing five nodes, but five CPUs.

~~~
justinclift
Yeah, 5 cpu's is an order of magnitude difference. ;)

~~~
joshribakoff
Wait, so the CPU isn't the big white tower sitting under my desk?!

~~~
minsight
According to my mother, that's "The hard drive".

------
web007
I keep coming back to point #4 - who ever thought that 16k objects in a single
directory would be a good idea? Ever since FAT that's been a bad idea, and
while modern FSes will handle it without completely melting down it's still
going to cause long access operations on anything to do with it.

Even Finder or `ls` will have trouble with that, and anything with * is almost
certainly going to fail. Is the use-case for this something that refers to
each library directly, such that nobody ever lists or searches all 16k
entries?

~~~
acdha
I do think that your last sentence is the answer: if you're using a package
manager instead of working with the directory heavily, this isn't a visible
problem which is going to motivate people to work on it.

The other side to consider: “one directory per package” is a very simple
policy and it feels right in many ways to people (e.g. Homebrew has a similar
structure because it's a natural fit for the domain). If the filesystem and
basic tools like ls work just fine (which is certainly the case on OS X, where
even "ls -l" or the Finder take less than a second on a directory of that
size), isn't there a valid argument that the answer should be some combination
of fixing tools which don't handle that well or encouraging people to learn
about things like `find` instead of using wildcards which match huge numbers
of files?
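
To make the "use something like `find` instead of a wildcard" point concrete,
here is a small Python sketch that walks a big directory one entry at a time
instead of expanding everything into one argument list. The
`iter_package_dirs` name and the `Pod00000`-style directory names are invented
for the demo; only `os.scandir` and `tempfile` from the standard library are
used.

```python
import os
import tempfile
from typing import Iterator


def iter_package_dirs(root: str, prefix: str = "") -> Iterator[str]:
    """Yield names of subdirectories of `root` that start with `prefix`.

    os.scandir streams entries lazily, so no giant argument list is built
    the way a shell wildcard expansion would build one.
    """
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir() and entry.name.startswith(prefix):
                yield entry.name


if __name__ == "__main__":
    # Build a throwaway directory with 16k entries to try it out.
    with tempfile.TemporaryDirectory() as root:
        for i in range(16000):
            os.mkdir(os.path.join(root, f"Pod{i:05d}"))
        matches = sorted(iter_package_dirs(root, prefix="Pod0000"))
        print(len(matches), matches[:3])
```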

~~~
web007
One directory per package is completely sensible, just not all in one bunch.
It's even fine if the mapping is to a flat namespace at something like the
HTTP level - I can mod_rewrite /abcdefg to /a/b/c/abcdefg no problem. My only
objection is to file- or directory-level structures that are this flat. I
might be mentally deficient, but I can't even process anything that's
structured this way.

As loath as I am to admit anything about Perl is good, CPAN got this right.
161k packages by 12k authors, grouped by A/AU/AUTHOR/Module. That even gives
you the added bonus of authorship attribution. Debian splits in a similar way
as well, /pool/BRANCH/M/Module/ and even /pool/BRANCH/libM/Module/ as a
special case.
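
The prefix mapping is only a few lines in any language. A Python sketch of
both layouts mentioned above, where `shard_path` and `cpan_style_path` are
names I made up and the depth of 3 just mirrors the /a/b/c/abcdefg example:

```python
def shard_path(name: str, depth: int = 3) -> str:
    """Map a flat name to a nested path built from its leading characters.

    "abcdefg" becomes "a/b/c/abcdefg". Names shorter than `depth` simply
    get fewer levels.
    """
    levels = list(name[:depth])
    return "/".join(levels + [name])


def cpan_style_path(author: str, module: str) -> str:
    """Build a path in the A/AU/AUTHOR/Module style."""
    author = author.upper()
    return "/".join([author[:1], author[:2], author, module])


if __name__ == "__main__":
    print(shard_path("abcdefg"))
    print(cpan_style_path("author", "Module"))
```

Because the mapping is a pure function of the name, anything that knows the
name can find the directory without listing the parent.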

Tooling can be considered part of the problem in this case. Because the
tooling hides the implementation, nobody (in the project) noticed just how bad
it was. I hadn't seen modern FS performance on something of this scale,
apparently everything I've worked with has been either much smaller or much
larger. Ext4 (and I assume HFS+) is crazy-fast for either `ls -l` or `find` on
that repo.

It seems like tooling is part of the solution as well, but from the `git`
side. Having "weird" behavior for a tool that's so integral to so many
projects scares me a little, but it's awesome that Github has (and uses)
enough resources to identify and address such weirdness.

~~~
zodiac
My (perhaps naive) thoughts on this are - suppose a 16k-packages-in-one-
directory solution were just as fast as a 16k-packages-sharded-by-prefix (the
CPAN solution), then the former is conceptually simpler and so should be
preferred. And the fact that you can mechanically transform one structure to
the other means that the filesystem (or git) should be able to transparently
do it for you (eg use the sharded approach as a hidden implementation, while
the end user sees a flat directory). This seems to be similar to what ext4
does
([https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Hash...](https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Hash_Tree_Directories)).
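
A toy Python version of that idea, to show what I mean by "hidden
implementation": callers see one flat namespace while entries are spread over
hash buckets internally. This is an in-memory illustration only, the
`ShardedDirectory` class is invented, and it is not how ext4's hash trees are
actually implemented.

```python
import hashlib
from typing import Dict, List


class ShardedDirectory:
    """Flat namespace stored in hash buckets (in-memory sketch)."""

    def __init__(self, buckets: int = 256) -> None:
        self._buckets: List[Dict[str, str]] = [{} for _ in range(buckets)]

    def _bucket(self, name: str) -> Dict[str, str]:
        # Hash the name and use the first 4 bytes to choose a bucket.
        digest = hashlib.sha1(name.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self._buckets)
        return self._buckets[index]

    def put(self, name: str, value: str) -> None:
        self._bucket(name)[name] = value

    def get(self, name: str) -> str:
        # Only one small bucket is consulted, never the whole namespace.
        return self._bucket(name)[name]

    def listing(self) -> List[str]:
        # The user-facing view is flat: bucket boundaries are invisible.
        return sorted(name for bucket in self._buckets for name in bucket)


if __name__ == "__main__":
    specs = ShardedDirectory(buckets=8)
    for pod in ["AFNetworking", "Alamofire", "Bolts"]:
        specs.put(pod, pod.lower())
    print(specs.get("Bolts"), specs.listing())
```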

~~~
cyphar
The obvious question is how would you implement that. You might argue (as you
should) that git has closer semantics to a filesystem than version control.
But actually implementing this sharding would require git be a kernel module.
Hardlinks and softlinks won't save you because they are both still dentries
and thus have the same performance pathology. Maybe you could do it with fuse,
but what have you gained by making your version control system even more
annoying to use?

------
mikeash
The criticism against CocoaPods here seems awfully harsh.

Think about it from their perspective. GitHub advertises a free service, and
encourages using it. Partly it's free because it's a loss leader for their
paid offerings, and partly it's free because free usage is effectively
advertising GitHub. CocoaPods builds their project on this free
service, and everything is fine for years.

Then one day things start failing mysteriously. It looks like GitHub is down,
except GitHub isn't reporting any problems, and other repositories aren't
affected.

After lots of headscratching, GitHub gets in touch and says: you're using a
ton of resources, we're rate limiting you, you're using git wrong, and you
shouldn't even be using git.

That's going to be a bit of a shock! Everything seemed fine, then suddenly it
turns out you've been a major problem for a while, but nobody bothered to tell
you. And now you're in hair-on-fire mode because it's reached the point where
the rate-limiting is making things fail, and nobody told you about any of
these problems before they reached a crisis point.

It strikes me as extremely unreasonable to expect a group to avoid abusing a
free service when nobody tells them that it's abuse, and as far as they know
they're using it in a way that's accepted and encouraged. If somebody is doing
something you don't like and you want them to stop, you have to tell them, or
nothing will happen!

I'm not blaming GitHub here either. I'm sure they didn't make this a surprise
on purpose, and they have a ton of other stuff going on. This looks like one
of those things where nobody's really to blame, it's just an unfortunate thing
that happened.

(And just to be clear, I don't have much of a dog in this fight on either
side. My only real exposure to CocoaPods is having people occasionally bug me
to tag my open source repositories to make them easier to incorporate into
CocoaPods. I use GitHub for various things like I imagine most of us do, but
am not particularly attached to them.)

~~~
martinald
I entirely agree with this. GitHub gets so much advertising + community from
open source projects like this.

Also, I'm amazed this is even a problem. 5 CPUs is not a lot in the scheme of
things (even if they mean physical instead of cores). TBs of bandwidth are
also virtually free for a company the size of Github.

Even better: they are getting basically real world loadtested for free and
finding loads of pain points, which may hit paying customers.

Unless I'm missing something, fire more metal at the problem. Many companies
would love to be able to have every single cocoapod user (which is nearly
every iOS developer) have to type github.com into their terminal for the cost
of a bunch of servers + some bandwidth.

Pretty strange, unless this is hitting some really bad area of their service
that can't easily be scaled out of (but i would be surprised)

~~~
breischl
>>Even better: they are getting basically real world loadtested for free and
finding loads of pain points, which may hit paying customers.

I think their point is that it's using the system in a way that isn't intended
or desired. How does that count as "real world" load testing?

And by that logic, shouldn't anybody who gets hit with a DoS attack just say
"thanks"? It's tons of free load testing on your network infrastructure, and
you'll definitely find some pain points.

------
wpeterson
It's totally reasonable to host your code on github and to build a package
manager that loads the content of a package from its github repo.

What seems insane is to use a single github repo as the universal directory of
packages and their versions driving your package manager.

There's a reason rubygems has their own servers and web services to support
this use case for the central library registry, even if the sources for gems
are all individual projects hosted on github.

~~~
lucaspiller
I assume they modelled it after Homebrew, which has been working fine doing
exactly that for the last 7 years.

That only has 3,000 packages vs 15,000 for CocoaPods or 115,000 for RubyGems.

~~~
philippnagel
In case somebody is interested in such figures (I certainly am) - NPM has
249,838 as of today [0].

[0]: [https://www.npmjs.com](https://www.npmjs.com)

------
riscy
> Scaling and operating this repo is actually quite simple for us as CocoaPods
> developers whom do not want to take on the burden of having to maintain a
> cloud service around the clock (users in all time zones) or, frankly, at
> all.

The CocoaPods developers seem to be missing the entire point of git: it's a
_distributed_ revision control system.

Set up a post-receive hook on Github to notify another server, that is set up
with a basic installation of git, to pull from Github so as to mirror the
master repo. Then, have your client program randomly choose one of these
servers to pull from at the start of an operation. Simple load balancer to
solve this problem.
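
The client side could be as small as this Python sketch. The mirror URLs are
placeholders, `fetch_from_mirrors` is a made-up helper, and the `fetch`
callback stands in for whatever actually runs the git pull. Falling back to
the next mirror on failure is my own addition on top of the plain random
choice.

```python
import random
from typing import Callable, Optional, Sequence

# Placeholder mirror list; a real client would ship or download this.
MIRRORS = [
    "https://mirror-a.example.com/specs.git",
    "https://mirror-b.example.com/specs.git",
    "https://mirror-c.example.com/specs.git",
]


def fetch_from_mirrors(
    mirrors: Sequence[str],
    fetch: Callable[[str], bool],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Try mirrors in random order; return the first URL `fetch` accepts.

    Shuffling spreads clients roughly evenly across mirrors, and walking
    the shuffled list gives a fallback when one mirror is down.
    Returns None if every mirror fails.
    """
    rng = rng or random.Random()
    order = list(mirrors)
    rng.shuffle(order)
    for url in order:
        if fetch(url):
            return url
    return None


if __name__ == "__main__":
    # Pretend mirror-a is down and the others are healthy.
    healthy = lambda url: "mirror-a" not in url
    print(fetch_from_mirrors(MIRRORS, healthy))
```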

~~~
justinclift
Rackspace is also known to sponsor significant resources for larger projects
that ask nicely. GlusterFS is one I used to be involved with doing this, and
there are definitely others.

If CocoaPods reach out to Rackspace and/or other hosting providers, there's a
decent chance they'll be able to pull together a good solution. :)

The downside though, is they'll need to figure out some way to keep it
monitored/maintained. :/

~~~
voltagex_
Last I checked, Rackspace wasn't accepting any more projects.

------
spoiler
I find it amusing how GitHub's contact[1] form has (probably a recent
addition):

> GitHub Support is unable to help with issues specific to
> CocoaPods/CocoaPods.

\---

[1]: [https://github.com/contact](https://github.com/contact)

~~~
jdreaver
I think that contact page remembers the last repo you visited. I went to it in
incognito mode and it wasn't there.

That's a pretty neat feature!

------
rmoriz
CocoaPods (and Homebrew) mainly exist because of a lack of tooling in the
typical Apple ecosystem. So I would blame Apple for not supporting the
community with money or tooling. Letting GitHub with its limited amount of
funding pay the bill isn't a nice move. Apple dev relations should throw some
money at GitHub so they can provide some dedicated resources or offer to pay
the cost of other solutions (like a 3rd party CDN/AWS/Google Cloud/…).

~~~
Gorbzel
CocoaPods exists because developers want to learn how to "build apps" but lack
the resources to intelligently include and link to 3rd party code in their
projects. CocoaPods doesn't enable anything not otherwise configurable via git
submodules and Xcode project hierarchies / build settings.

Therefore, it's not Apple's problem. In fact, I've talked to a non-trivial
amount of engineers (both in Cupertino and long time Cocoa devs) that
disapprove of the shortcuts that Cocoapods takes all over, software
architecture be damned. Reasonable parties can agree to disagree, but I do
include 3rd party framework inclusion without a dependency manager as an
interview screen for prospective iOS hires.

Since you mention developer relations, I'll assume you're not actually arguing
that this is Apple's technical responsibility, but that they should throw
around some $ to grease the wheels to make dependency inclusion better. As a
platform vendor, funding hosting costs for some project that you don't agree
with just to "support the community" is a bad idea. Better idea is to allocate
resources to set up a structure that can fix the issue in a technically
agreeable way while also benefitting from the independence a FOSS project
provides. In doing so, you are correct that it'd be preferable for Apple to
fund/use well-known FOSS standards, such as Github.

In conclusion, Apple should set up a FOSS project to address the current
inconveniences associated with third party package inclusion and should
involve and pay Github somehow.

Oh wait...[https://github.com/apple/swift-package-manager](https://github.com/apple/swift-package-manager)

~~~
mahyarm
Apple decided to block dylibs from the start of the iOS app store, and I think
that friction point from how things are usually done in OSX, C & C++ land
before iOS is what started the entire CocoaPods thing in the first place. The
replacement with dynamic frameworks came about 8 years too late in ios 8.

------
zymhan
I've always found Github's business model interesting. What if a massive open-
source organization (e.g. Fedora, Apache) decided to use it for all of their
development, integrating it with continuous builds and all the associated
pulls. Of course this isn't likely to happen for a number of reasons, but
there are large open source projects that could put a significant load on
their infrastructure if they chose to use Github as their main code versioning
system.

~~~
discreditable
> What if a massive open-source organization (e.g. Fedora, Apache) decided to
> use it

This one is pretty big.
[https://github.com/torvalds/linux](https://github.com/torvalds/linux)

~~~
snowwrestler
Sure but it's just the kernel, and that's just a mirror. Linus does not use
Github to manage kernel development. In fact he's been vitriolic in the past
about how Github does pull requests.

I wonder how much traffic the Github Linux repo gets. Seems to me that people
who want to use Linux, will go get a distro instead. And people who want to
develop the kernel, will follow the kernel development process (which doesn't
rely on Github).

~~~
discreditable
For a period of time when Kernel.org was breached, GitHub was _the_ repository
for Linux. [1] I remember reading a review of GitHub by him shortly after. He
did not like how Pull Requests or patches worked on GitHub. I'd link it but
I'm having trouble finding it.

1\.
[http://www.theregister.co.uk/2011/09/06/linus_torvalds_dumps...](http://www.theregister.co.uk/2011/09/06/linus_torvalds_dumps_kernel_for_github/)

------
iBotPeaches
This bug report is a great step in the direction for GitHub. As of this
comment there are 3 different GitHub staff members responding and providing
feedback to the CocoaPods team. From the previous "Dear GitHub" messages and
responses, this seems like perfect community involvement.

------
paradite
I have been seeing this trend of GitHub getting "abused" for purposes other
than hosting source code.

\- My school uses GitHub to host and track our software engineering project
(which still can be argued as OSS).

\- People using GitHub issue system as a forum.

\- Friends uploading pdfs to GitHub.

\- Recently people posted on HN about using GitHub to generate a status page.

I think this is a really bad trend and people should stop doing that.

~~~
WillAbides
The school example and friends uploading pdfs to GitHub are both uses that
GitHub encourages.

Using GitHub Issues as a forum and a source for generating status pages are
both ok from a use/abuse perspective, but you may not have the best experience
since that isn't what Issues is intended for.

~~~
takluyver
The issue tracker on new repos has a 'question' tag by default, so Github are
gently encouraging using issues as a forum. Though my inner cynic says that
makes sense for them - issues tie a project to Github more than the git repo
itself does.

It should be fine to come up with new ways to use Github, as long as it's not
causing excessive load.

------
fpgaminer
GitTorrent: [http://blog.printf.net/articles/2015/05/29/announcing-gittor...](http://blog.printf.net/articles/2015/05/29/announcing-gittorrent-a-decentralized-github/)

Imagine a world where GitTorrent is fully developed, includes support for
issue tracking, and has a nice GUI client that makes the experience on-par
with browsing github.com.

I mention this not as an "Everybody bail out of GitHub and run to
GitTorrent!!!" sort of statement, because I believe GitHub's response here was
excellent and confidence inspiring. But it's an unnatural relationship for
community supported, open source projects to host themselves on commercial
platforms such as GitHub. GitHub primarily hosts them to promote its business.
That's not necessarily a bad thing, but it results in impedance mismatches
like the one demonstrated here.

That isn't to say that a mature GitTorrent would replace GitHub. Rather, I
envision GitHub becoming a supernode in the network, an identity provider, and
a paid seed offering, all alongside their existing private repo business.

Honestly, once I scrape a few projects off my plate, I'm inclined to dive into
GitTorrent, see where it's at in development, and see if I can start
contributing code. It just seems like such a cool and useful idea.

~~~
cyphar
My main issue with Free Software projects using GitHub is that it's
proprietary, not that it's commercial. Admittedly, I think GitTorrent is a
really cool idea, but I'm wondering if a distributed filesystem might be a
more elegant solution than using both BitTorrent and Bitcoin.

------
pavlov
I've never really understood CocoaPods. Dragging a framework into Xcode was
never much trouble, and the number of 3rd party libraries in an OS X / iOS
project ought to be fairly small, so the gains are trivial.

The potential downsides seem much more annoying. Do you really want to have
your dependencies on an overloaded central server somewhere?

~~~
Moto7451
Until recently iOS only supported statically linked libraries which could lead
to issues if you needed to use multiple components that had a shared
dependency that you needed to upgrade for one reason or another. You couldn't
touch the embedded version. Additionally there was no native package manager
which made sharing libraries a clumsy affair. Cocoapods makes both cases
easier.

~~~
seanalltogether
This is correct. Starting with iOS 8 they finally allowed linking against
custom frameworks which is why Carthage is now becoming much more popular.
CocoaPods solved a critical problem of getting dependencies linked in, while
creating a new problem of having your xcodeproj file and build settings be
managed by them. I'll be happy enough to drop them for new projects going
forward.

~~~
jallmann
> Starting with iOS 8 they finally allowed linking against custom frameworks

I don't keep up with iOS development enough to know if anything has changed
with respect to static/dynamic linking in iOS 8, but it has always been
possible to use custom frameworks in iOS (eg, frameworks you build yourself,
unless the community has another definition for 'custom framework').

The framework directory structure is a bit unorthodox, but it's really just
your statically built library (absent any '.a' suffix) alongside any header
files in a Headers folder. Again, not sure if this has changed with any
support for dynamic linking.

~~~
seanalltogether
Sorry, you're right, I should have been more explicit: starting in iOS 8 you
can use dynamically linked frameworks.

------
jrochkind1
What an unusually reasonable discussion. Good on everyone.

------
sdegutis
I love how this was like _the_ perfect storm of things that could go wrong,
and how it seems like mhagger is just amazed more than anything else.

~~~
revelation
Hardly. Do they hit edge cases? Yes! Is using GitHub for your CDN a dick move
and going to cause problems, regardless? Yes!

The response from mhagger is unnecessarily apologetic, and I predict we'll see
an official update from GitHub on this soon.

~~~
zeveb
> Is using GitHub for your CDN a dick move and going to cause problems,
> regardless? Yes!

I don't know about that. Both oh-my-zsh[1] and emacs prelude[2] use git repos
as their code distribution mechanisms, and that works really well. I think the
real issues here are exactly what is called out in the issue: poor usage of
git, and poor directory layout.

[1] [https://github.com/robbyrussell/oh-my-zsh](https://github.com/robbyrussell/oh-my-zsh)
[2] [https://github.com/bbatsov/prelude](https://github.com/bbatsov/prelude)

~~~
revelation
They are using GitHub for their intended purpose, hosting their code. That is
perfectly fine.

What is not perfectly fine is using GitHub as your package host.
CocoaPods/Specs is the equivalent of Debian's APT using one big GitHub repo to
host all their packages. It has 92567 commits and 6872 contributors.

~~~
masklinn
OTOH Homebrew also uses github as its package host and has a respectable 62000
commits and 5600 contributors and github seems to be just fine with it.

The big difference seems to be in the way they do their thing: I'm reasonably
sure Homebrew just git clones and then updates the local repository
normally[0], it has "only" 2500 files in Library/Formula, and because of its
different subject it is way less write-active. CocoaPods has 1k commits/week,
which looks to be increasing pretty much constantly; Homebrew is around 350
with ups and downs.

Also not sure it matters, but Homebrew has lots of commits updating existing
formulae, while CocoaPods changes are almost solely additions (publishing a new
version of a package adds a new spec and doesn't touch the old one).

[0] which is exactly the bread and butter of github

~~~
plorkyeran
The fact that Github added an API specifically to reduce the server load from
Homebrew suggests that it wasn't "just fine". One of the Homebrew maintainers
works for GH so they just had a much more direct route to solving the problem
than with CocoaPods.

~~~
mikemcquaid
In this case it's actually more that there's a Homebrew maintainer who works
for GitHub (me) who has been working on a bunch of improvements to Homebrew's
update system (in my spare time). The desire for the API came from Homebrew's
side rather than GitHub's and, as it reduces load for GitHub, it was a net win
for both parties.

------
ak217
I love GitHub's response, but I would urge the project more strongly to use
modern CDN solutions. CDNs are dirt cheap and incredibly powerful nowadays,
for the data sizes that we're talking about here.

~~~
thecodemonkey
How would you define "dirt cheap"? One of the most popular CDN's out there
(Akamai) charges $3,500/month for 10TB/month. Who's going to foot that bill?
:)

~~~
ak217
That's a list price that they charge those who don't care to negotiate.

CloudFlare starts at $0 and doesn't meter/charge for bandwidth. CloudFront
charges 9 cents per GB and is integrated with other AWS APIs (which can be
very useful). Both those solutions could be managed with a donation pool, I
would try the CloudFlare free tier first.

~~~
teraflop
CloudFlare's terms of service specifically forbid using it as a file hosting
service.

------
tjdetwiler
Rust's cargo does something similar, however it looks like they were much more
conscious of git-scalability (ex: limiting the directories in a single level,
only appending lines to files to make diffs small).

[https://github.com/rust-lang/crates.io-index](https://github.com/rust-lang/crates.io-index)
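
To make those two ideas concrete, here is a minimal Python sketch of a prefix-sharded, append-only index. The function names and the exact bucketing rule are hypothetical and simplified for illustration; they are not taken from Cargo's source and may differ from the real crates.io-index layout.

```python
import json
from pathlib import PurePosixPath


def shard_path(name: str) -> PurePosixPath:
    """Map a package name to a nested path so no single directory grows huge.

    Assumed rule (hypothetical): short names go into a bucket named after
    their length; longer names are bucketed by their first two and next two
    characters.
    """
    n = name.lower()
    if len(n) <= 3:
        return PurePosixPath(str(len(n)), n)
    return PurePosixPath(n[:2], n[2:4], n)


def append_version(index_file_text: str, name: str, version: str) -> str:
    """Return the file contents with one JSON line appended for a new version.

    Appending a line (rather than adding a new file per version) keeps each
    commit's diff small and the number of tree entries stable.
    """
    line = json.dumps({"name": name, "vers": version}, sort_keys=True)
    return index_file_text + line + "\n"
```

With this scheme every published version of a package lands in the same file, so the directory fan-out depends on the number of packages, not the number of releases.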

~~~
wycats
For what it's worth, both of these characteristics indeed weren't an accident.

At the time, I wrote a script that hammered git commits into a repository
using different strategies and looked at what the git repository would look
like after 100,000 and a million commits. The "one version per file, nested in
a flat structure" had serious issues.

There may still be scaling limits with the Cargo approach, but if we reach
them, we have plans to create a new registry with a new initial commit and let
the old registry age out, then rinse/repeat. At the moment, we haven't hit
limits yet (with about 1/3 of the packages that CocoaPods has).

------
iamleppert
Amazing to me that people create inefficient systems like this and then
complain when they are rate limited.

------
maaku
Using Github as your CDN is a dick move. Kudos to GH for not banning the
project out-right, but CocoaPods should seriously reconsider what they are
doing.

------
xemdetia
As a current maintenance developer/systems guy I can definitely feel the
tempered annoyance from mhagger here. It's definitely nice to be reminded that
your own set of recurring issues isn't the only one people have to deal
with.

------
noahlt
Go's package manager, `go get`, also downloads from GitHub. I don't know the
details of how `go get` and CocoaPods work, but I would be interested in
learning why one is unscalable and the other seems to work.

~~~
karaziox
Because cloning the dependency is not the issue here. CocoaPods is keeping its
index in a git repo that is updated by the user as a way to get the latest
index. This is the repo that incurs a lot of requests from everyone.

Go get on the other hand doesn't keep any index. It just uses the url to
download the dependency because of the mapping "url==project name" that exists
with go projects.
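
A rough Python sketch of that "url==project name" idea, limited to GitHub-style import paths. The function name is made up for illustration, and the real `go get` handles more hosts and cases than this; treat it as a simplification, not a description of its implementation.

```python
def clone_url_for_import_path(import_path: str) -> str:
    """Derive a repository URL directly from an import path.

    No central index is consulted: the first three path segments
    (host/owner/repo) already name the repository. Only github.com paths
    are handled in this sketch.
    """
    parts = import_path.split("/")
    if len(parts) < 3 or parts[0] != "github.com":
        raise ValueError(f"unsupported import path: {import_path!r}")
    host, owner, repo = parts[:3]
    return f"https://{host}/{owner}/{repo}"
```

Because the name itself says where to fetch from, the load is spread across each dependency's own repository instead of one shared index repo.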

------
SuperKlaus
In fact, they are maxing out five _CPUs_ - not five nodes, big difference.

------
fokinsean
I found the solution humorous. Ironically shallow clones are causing the
problems, so fetch the max :)

$ git fetch --depth=2147483647

~~~
alblue
The problem is that the initial clone is depth=1, then subsequent fetches are
depth=MAX. If you did clone depth=MAX in the first place it is faster to
serve.

Shallow (depth=1) can be converted into a full clone with the above.
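
As a hedged illustration of how a client could decide between a normal fetch and a one-time conversion, here is a small Python sketch. The helper names are hypothetical; it relies on git recording shallow boundaries in a `.git/shallow` file and on the `git fetch --unshallow` option, and it assumes `.git` is a directory (not a worktree or submodule pointer file).

```python
from pathlib import Path


def is_shallow(repo_dir: str) -> bool:
    """Report whether the clone is shallow.

    A shallow clone lists its boundary commits in .git/shallow.
    """
    return (Path(repo_dir) / ".git" / "shallow").is_file()


def update_command(repo_dir: str) -> list[str]:
    """Build the git command line for updating the local copy.

    Shallow clones are converted to full clones once; full clones
    just do an ordinary incremental fetch afterwards.
    """
    if is_shallow(repo_dir):
        return ["git", "-C", repo_dir, "fetch", "--unshallow"]
    return ["git", "-C", repo_dir, "fetch"]
```

The returned list can be passed to `subprocess.run`; the sketch only builds the command so it can be inspected without touching the network.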

------
kodablah
Has any consideration been given to Bintray[1] as an alternative store for
this stuff?

1 - [https://bintray.com/](https://bintray.com/)

------
rcthompson
Reading the issue, it seems that one of the problems is a single directory
with lots and lots of files in it, which is something of a pathological case
for Git. Now, this could be "fixed" by splitting the files in that directory
into subdirectories, but the one giant directory will still exist in all the
past commits. So would this actually fix anything, or just keep it from
getting worse?

~~~
cyphar
Filesystems also have the same pathology, which is why git's object store is
of the form prefix/object.
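
For anyone who hasn't looked inside `.git/objects`, a small Python sketch of that layout for loose blob objects, assuming a SHA-1 repository. The function names are mine, chosen for illustration.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Compute the object id git assigns to a blob with this content.

    Git hashes a short header ("blob <size>" followed by a NUL byte)
    together with the content.
    """
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Return where a loose object lives: the first two hex digits form the
    directory and the remaining digits form the file name."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"
```

Splitting on the first two hex digits caps the top level at 256 directories, which keeps any single directory from accumulating every object.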

~~~
rcthompson
In a filesystem, you can address the problem by reorganizing the directory
structure. And you can do the same for future Git commits. But unless I'm
mistaken, that colossal directory is stuck in the git history unless you
actually rewrite that history, which would require everyone who cloned the
repo to either rebase or re-clone the repo. Maybe it's not a problem because
the cost is only paid when one checks out one of those old commits, which
would happen rarely?

------
kmm
Funny thing is that the repo is only 7 MB gzipped (or 4 with lzma). Not that
surprising, since it's just metadata of course. They say they have about 1
million fetches/clones per week, so that would make about 16 TB per month. I'm
not sure how much bandwidth costs, but wouldn't some sympathetic CDN host that
for free, since they're OSS?
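
Spelling out that estimate as a quick Python sketch: the 16 TB figure appears to correspond to the 4 MB lzma size at roughly four weeks per month. It assumes every fetch transfers the whole compressed repo, which is an upper bound rather than a measurement, and the function name is only illustrative.

```python
def monthly_transfer_tb(size_mb: float, fetches_per_week: int,
                        weeks_per_month: int = 4) -> float:
    """Estimate monthly transfer in TB (decimal units: 1 TB = 1,000,000 MB),
    assuming each fetch downloads the full compressed repository."""
    return size_mb * fetches_per_week * weeks_per_month / 1_000_000


lzma_estimate = monthly_transfer_tb(4, 1_000_000)   # 4 MB per fetch
gzip_estimate = monthly_transfer_tb(7, 1_000_000)   # 7 MB per fetch
```

Under the same assumptions, the 7 MB gzip size would come to about 28 TB per month.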

------
soheil
I was up until 2am last night trying to publish my Pod [1] and Github kept
timing out.

I had no idea it was just CocoaPods repo because my other repos were working
fine. I accepted defeat, went to bed and everything was working great in the
morning.

[1] [https://github.com/soheil/SwiftCSS](https://github.com/soheil/SwiftCSS)

------
sly010
I would be interested to know what the other top GitHub repositories are.
Afaik the Nix package manager uses a similar model (using a GitHub repo as a
database), however they periodically release snapshots and the default
configuration uses those instead of git.

~~~
vbezhenar
I'm not sure how DefinitelyTyped works, but their repository
[https://github.com/DefinitelyTyped/DefinitelyTyped](https://github.com/DefinitelyTyped/DefinitelyTyped)
is huge and maybe they use that github repository as a single distribution
point.

~~~
WorldMaker
People interact with DefinitelyTyped in a set of ways, from currently most
common to least common:

1. NuGet packages, which are built from the GitHub repository but then
redistributed over a non-Github CDN (NuGet's)

2. tsd management tool
([https://github.com/Definitelytyped/tsd](https://github.com/Definitelytyped/tsd)),
which looks like it prefers Github's CDN raw URLs rather than full/shallow git
clones

3. typings management tool
([https://github.com/typings/typings](https://github.com/typings/typings))
with "ambient" typings searches (`--ambient` flag), which I believe also
prefers Github's CDN raw URLs, as it essentially forked from TSD

That said, it's past time to move beyond the giant huge DefinitelyTyped repo,
and I for one heartily recommend people migrate to typings which has better
support for NPM and other module and package management systems, as well as
smaller unit/module-focused Github repositories.

------
zoul
Another reason I consider
[https://github.com/Carthage/Carthage](https://github.com/Carthage/Carthage) a
better solution to the dependency management problem.

~~~
cballard
And
[https://github.com/apple/swift-package-manager](https://github.com/apple/swift-package-manager)
once it's ready, which is similar to Carthage in structure (decentralized!).

~~~
kylef
Swift Package Manager will also have a centralised package index in the
future, here's a quote from their [package manager proposal][1]:

> We would like to provide a package index in the future, and are
> investigating possible solutions.

[1]: [https://github.com/apple/swift-package-manager/blob/master/D...](https://github.com/apple/swift-package-manager/blob/master/Documentation/PackageManagerCommunityProposal.md)

------
Negative1
When he says approaches similar to 'other packaging systems', which ones is he
referring to? I can see why this is a bad approach but am unfamiliar with what
would be considered a better practice (outside of just hosting a .tar on
CloudFront).

~~~
wpeterson
RubyGems.org has their own web server and web services for publishing library
versions and to allow the clients to fetch libraries and query the universal
registry.

~~~
s_kilk
Same goes for almost any language-specific package manager you could name:
Rust/Cargo, Clojure/Clojars, Node/npm, Elixir/hex, Python/PyPi.

No matter which way you slice it, what CocoaPods is doing is a bit daft,
especially at their scale.

~~~
steveklabnik
Small note: Cargo _does_ use an index (not the source code), in git, on
GitHub. However, we're already doing the directory layout that they recommend
in-thread, so we shouldn't have this specific problem.

------
superuser2
Just last night, all my pod installs were timing out after ~30ish minutes.
That explains it.

~~~
jessaustin
Should we infer from your behavior that such a timeframe is fairly normal for
CocoaPods? If I ran an "npm i" (or "pip install", etc.) that didn't respond for
a minute, I would suspect a problem and kill it. How can any development
process that takes longer than that be practical?

~~~
superuser2
Indeed, I was killing and restarting the process for a while, trying to get it
to be more verbose, etc. Eventually I decided to try letting it run, and that
didn't work either.

I don't know what's normal; last night was one of my first iOS projects.

------
joeblau
I just installed CocoaPods last night and tried to clone down the repo. It
took about 5 minutes and I thought to myself "Is my 150MB/s connection slow?"
This definitely clears up what was going on.

------
debacle
Why aren't the packages distributed? Composer is incredibly distributed and
likely doesn't cause nearly the same headaches for GitHub.

Seems like a poor design decision on the CocoaPods side.

~~~
donarb
What package? My project may have a dependency on a specific commit that's
between point releases.

The problem is not packages, it's the index, containing 16000 subdirectories.

~~~
debacle
So the packages themselves aren't part of the repository?

~~~
kmm
No, only references to the repositories themselves are. The total Specs repo
is only 300 megabytes, which comes to about 18 kilobytes per package on average.

It's still absolutely ludicrous to use git for it, or worse, to bother github
with it.

------
voltagex_
It's difficult to run an open source project on a budget of $0. You're always
relying on the goodwill of others.

------
LoneWolf
While I do not have much knowledge on the subject, why not use rsync?

------
nimish
hopefully we can now move to using real artifact repositories.

------
rdancer
tl;dr: "Using GitHub as your [free-of-charge] CDN is not ideal, for anybody
involved."

------
speps
Why is everyone talking about CocoaPods when the title is CoacoaPods anyway?
:)

~~~
rtkwe
The title was misspelled and has been fixed. It's CocoaPods right on Github.

------
whitehat2k9
Only the Apple development community would think it's OK to have 16,000
subdirectories in one place and abuse GitHub as a free CDN instead of putting
some actual effort in and developing their own repository infrastructure - you
know, like almost every other package manager in existence.

~~~
mayli
It should be "Apple or Ruby development community"; Homebrew is using a similar
tech stack to distribute its `Formula`.

~~~
whitehat2k9
I guess it's no coincidence that those two hipster communities are closely
related.

------
Const-me
I don’t think GitHub acts wisely here.

Short term sure, they’re doing the right thing, implementing a nice way to
manage the free rider problem without hurting them too much.

But long term it’s different.

Financially, one average programmer = $80k/year, one average cloud server =
$4k/year. And, GitHub has hundreds of millions of venture capital. More than
enough to provision a few more servers, even if they will be installing new
servers just for those pods.

The way they act now will lead to someone developing a decentralized
git+torrent hybrid. When that happens, sure, those pods will no longer consume
GitHub’s precious resources. However, for the rest of the GitHub users, there
will be no reason to stay on GitHub either.

~~~
cyphar
The way CocoaPods is using git is bad, so they'd probably run into some
problems no matter how they host it. If you know anything about filesystems,
they start to have pathologies when you have many entries in the same
directory (which is why git has its object store in the format
prefix/object). In addition, it looks like git has its own pathologies with
such large numbers of dentries. Having 16000 entries in your "specs" directory
is not a good idea. No matter how you store it.

Not to mention that "just buy another server for this _one_ project" sounds
like something CocoaPods should _pay_ for.

~~~
Const-me
>The way CocoaPods is using git is bad

Very likely true, but I don’t see how’s that related.

>they start to have pathologies when you have many entries in the same
directory

Only true for inefficient filesystems like FAT.

For NTFS, 16k entries is nothing, the performance will start to degrade (due
to directory fragmentation) at around 100k entries:
[http://stackoverflow.com/a/291292/126995](http://stackoverflow.com/a/291292/126995)

>"just buy another server for this one project" sounds like something
CocoaPods should pay for.

I don’t think that’s how the 21st-century economy works in this case.

Github’s value is likely between $0.75B and $2B.

The bad PR caused by this story will cost more than 10 years of TCO for that
extra server.

