CocoaPods downloads max out five GitHub server CPUs

onli · on March 8, 2016

Note how perfect that response from mhagger is. A clear, honest-sounding assurance of what Github wants to deliver. A perfectly comprehensible description of what the problem is and where it is coming from. And then a suggestion for a fix the project can actually work on, plus a mention of changes to git itself that Github is trying to make that would help. It not only shows great work going on behind the scenes (and if that is untrue, it at least gives me that impression, which is what counts), but also explains it in a great way.

manyxcxi · on March 8, 2016

I was astonished at how selfish/myopic/whatever alloy's response was.

To be blunt, you're abusing the shit out of SOMEONE ELSE'S product that you're not even paying for. Your first question shouldn't be to see what Github can do for you to make it so you don't have to make changes. You should be falling over yourself investigating all available avenues for reducing load.

It's an incredibly entitled way to think about things and I would have a real hard time employing someone whose first response was like this.

tptacek · on March 8, 2016

I don't know, it sounded to me like he just didn't totally understand what Github was saying. By the end of the thread, it seemed like everyone was agreeing. I wouldn't be comfortable using words like "selfish" to describe any of what I read.

I certainly don't think the barb about your willingness to employ people who write things on Github issues threads that you disagree with is helping anyone understand any part of this situation. I understand the urge to find ways to be emphatic about how much you disagree with things, and I often find myself compelled to write lines like that, but I think they're virtually always a bad idea.

mintplant · on March 8, 2016

> I certainly don't think the barb about your willingness to employ people who write things on Github issues threads that you disagree with is helping anyone understand any part of this situation.

It seems to be one of HN's go-to insults. "Look at this person's behavior, I would never hire them," as if everyone wants to work at your startup.

pyre · on March 9, 2016

> as if everyone wants to work at your startup.

Do you really think that when people say "I would never hire that person" that there is an implication of "everyone wants to be hired by me (and by extension my company)?"

manyxcxi · on March 9, 2016

> Do you really think that when people say "I would never hire that person" that there is an implication of "everyone wants to be hired by me (and by extension my company)?"

Nailed it. Just because someone, a team, or a company has hiring criteria doesn't mean they assume everyone wants to work at their company. It means they have an idea of who they are looking for.

lazylizard · on March 9, 2016

nope. they also want everyone to know.

its the difference between buying a packet of your favorite snacks and telling all your friends what your favorite snack is..

you probably expect them to like the same snack.

manyxcxi · on March 9, 2016

But what if they respond with a snack I've never heard of and interest me so with its description that I've just found my NEW favorite snack.

Additionally what if they inform me about my snack with information that means I can't morally choose it anymore, or that it's dangerous to my health? I now have the opportunity to switch my viewpoint, or reduce the weight that it has in my criteria.

It begins the discussion if you as the person starting the thread are interested in having it and not just looking to be agreed with. I, whether I'm in the minority or not, am always looking to start the dialog. Being agreed with is boring.

pyre · on March 9, 2016

You're right. The obvious answer is to never communicate with other people... ever. Communication with other people implies that you are full of yourself and looking to show off to other people how awesome you are.

manyxcxi · on March 9, 2016

I didn't say I disagree with his statement. I'm saying, maybe more implied, that I'm not going to hire someone who displays a lack of interest in finding a real solution to a real problem that has to do very much with what they're trying to build. And on top of that shows a serious streak of entitlement and a lack of empathy towards the very service they're essentially abusing and not even paying for.

I wonder how many times CocoaPods has ruined someone's day/night on some GH team. I wonder how many dinners some mom or dad has missed with their kids because their service alarms are going apeshit. I don't think it's hyperbole to say that if you are a top 10 repo at Github, you are responsible for ruining individuals' days and taking time away from their families if you are hammering the system.

Now, these are entirely my opinion and I'm not saying alloy is bad at what they do. I'm saying that is a collection of attitudes that I'm not going to put on my team.

johnnyfaehell · on March 9, 2016

I think it is a hyperbolic statement to say they're responsible for ruining people's days.

Let's think this through and ask ourselves a few questions.

1. Did they go out of their way to do damage?

2. Are they responsible for deciding the infrastructure and how well it can handle load?

3. Did they force said people to work at GitHub?

4. Is the open source culture and hosting a major part of GitHub's business plan?

5. Are they responsible for staffing to ensure people are scheduled to work when work needs doing?

matt4077 · on March 8, 2016

Yup, only matched by 'Great article! We also use Javascript at <randomStartupNameNobodyEverHeard>.'

justinlardinois · on March 8, 2016

I wonder how many of the people that say that actually are employed in a position where they get to make hiring decisions.

manyxcxi · on March 9, 2016

Even as a non-manager I was employed in a position to make hiring decisions. Do you ever get brought into interview loops? You're part of hiring decisions. You may not have final say, but you are hopefully very much listened to.

nickpsecurity · on March 9, 2016

Really? Alloy specifically said they were blasting a free service's infrastructure for their own benefit. Told about the issues, the response was basically Alloy et al wanted to invest no time or money into a better situation. Even cited HR and funding benefits.

That's selfish to the point that it could be a textbook example of an externality. Fortunately, like you said, things got agreeable by the end with Alloy taking simple steps he was given to make things better for everyone.

onli · on March 9, 2016

> Told about the issues, the response was basically Alloy et al wanted to invest no time or money into a better situation.

That's actually not true.

https://github.com/CocoaPods/CocoaPods/issues/4989#issuecomm... is the answer we are talking about, aren't we? What alloy is doing there is:

1. Thanking mhagger for the response

2. Asking for additional explanations

3. He explains then why the project is taking the route they took, the benefits for them. Explaining alone does not mean unwillingness to change. It just makes sure it is clear why it is like it currently is. Yes, that mentions the time and money benefit, but so what – it's honest, and it is valid to limit expectations on what is possible now.

4. He then indeed asks for a discussion on how to improve the current system without changing it completely. But this does not mean not investing time, to the contrary: He actively invites a continuation of a discussion and already there makes clear that he is indeed willing to work on a better solution, and that is the core point making this a constructive answer.

How the discussion continues in my eyes clearly shows that the negative interpretation of this first answer, in your comment and this thread, is wrong. That's not someone blocking change, not even at first, that is someone asking for clarification and even clearer tasks to do. That's not a bad thing at all.

nickpsecurity · on March 9, 2016

"> Told about the issues, the response was basically Alloy et al wanted to invest no time or money into a better situation."

"That's actually not true."

"taking the route they took, the benefits for them... Yes, that mentions the time and money benefit, but so what – it's honest, and it is valid to limit expectations on what is possible now."

So, it was true. Then the dialog continued from there with Alloy paying attention to the situation. Alloy refused to do any significant work on their end for their maximal benefit in a disruptive situation from a free service. Even admitted they were using a service for something it's not designed for but didn't care about moving to a more appropriate one. Hence, my calling it selfish.

Eventually, others had invested their own time and energy into the problem enough to come up with some simple recommendations for Alloy that take very little effort on his part. Alloy summarized those and agreed to attempt them. Thread was closed before we could see where that went.

Given above, I stand by my claim that he was a selfish individual pushing his own liabilities onto others wherever possible. Even in how it was remedied was mostly on others. On other hand, he might make a good capitalist w/ that level of exploitation and externalizing. :)

onli · on March 9, 2016

Saw how you went from "no time and money" to "no significant work"? My point is that he signaled explicitly that he was willing to invest time and work. He literally writes so:

> I.e. I’d like us to continue this discussion, at first, from the notion of us maintaining the existing architecture. Where things are absolutely impossible, it would be great if you can include more links to docs/source that explain why things are impossible.

Notice the "at first", notice also the following proposed action of working with a snapshot.

You made clear you were expecting another reaction. And I see why. Still, I think you miss how much good faith was contained in this response. It is a bit sad this gets overlooked. People forget text is not that easy to interpret.

nickpsecurity · on March 10, 2016

"Saw how you went from "no time and money" to "no significant work"? "

You're right that text is not easy to interpret. For instance, there were two interpretations of my text: a literal and precise one that focuses on how much effort I say he would commit; an interpretation that realizes I was speaking figuratively with hyperbole. It was the latter. The message was a counterpoint supporting that he was selfish rather than a precise statement of how selfish he was. You would be 100% right if we were talking literally about him such as in a court filing or HR report.

"Notice the "at first", notice also the following proposed action of working with a snapshot. You made clear you were expecting another reaction. And I see why. Still, I think you miss how much good faith was contained in this response."

This is possible. Let me re-read his post first. Alright, done. Here's a re-review.

His first response starts with thanks and statements that show either (a) an incredibly joyous and friendly personality or (b) brown-nosing of a salesman before a pitch. Horizontal line. Unclear on some things. Asks for more information. We then get to the reasons:

1. Did no work on syncing data to reduce funded development hours.

2. Don't want to operate a repo due to reduced effort or funding.

3. Easier for their users and adoption.

These are all self-centered. Honest as you said but already support my claim of selfishness. Let's keep looking. Upon a suggestion of other packaging systems, vaguely claims they are using a "smarter" method then reiterates HR and funding justifications above. Ignores alternatives in next sentence to reiterate their existing, strange, and broken solution with a dismissal about having to build a cathedral rather than just using existing solutions.

So, Alloy already laid a foundation of total selfishness in terms of time, funding, and design inflexibility. At this point, Alloy is interested in solutions that totally maintain their existing design and lack of commitment to anything else. Offers to make a few simple changes that "would still use your resources." Asks for information that basically leads to those in recommendations that they begin to apply.

So, re-reading his post, it comes off as incredibly selfish using text that's not hard to interpret. He clearly believes their design works, won't be changed unless forced, changes must take little effort from them, they must not use their funding, and must specifically use GitHub's resources. My claim of selfish and externalizing is fully supported at this point. I think the other commenter's claim of being "myopic" about what he's doing in the project is accurate, too.

nrser · on March 8, 2016

yeah, my read was that he didn't totally get the context of the package dist / CDN suggestion immediately. i think the github peeps probably understood that they were using this approach because the developer workflow was simple and straightforward, but as you scale up simple and straightforward approaches often break down, and CocoaPods seems to be hitting this tipping point with using Git and Github as a package dist system.

this is what we used to call the "good" problem (things breaking because they are successful), but that doesn't make it cheap or easy to fix. the other stuff they're talking about in the thread will alleviate some pain and buy some time, but it won't solve the fundamental problem if CocoaPods continues to get "big" (imagine apt or yum trying to run like this).

i understand their want to maintain the simple and coherent workflow... if i was writing a package dist sys, i would love to have it work off a standard git repo. maybe this is something that could be solved with a plugin architecture like the large binaries stuff so that developers could continue with their preferred workflow but end-users could take advantage of a CDN-like system for distribution.

vacri · on March 8, 2016

> my read was that he didn't totally get the context of the package dist / CDN suggestion immediately.

This is mine as well, but it's also troubling to me, given that the repo in question is meant to be a package management system; it means there are fundamental holes in the user's understanding of packaging systems.

My mentor has a lot of contempt for the bevy of packaging solutions that people come up with - invariably people look at the old ones, think they're too complex and wrong, write up new code that is Slick(tm) and Fast(tm) and Cool(tm) and they are... until they hit scale. Whether that scale is number of users, or serving multiple environments, or serving a great many packages of different versions... the lack of domain knowledge in the design stage will cause huge amounts of issues.

je42 · on March 9, 2016

There is not an old "xcode" compatible package system. The issue here is that the C++/Objective-C/C ecosystem doesn't have a standardized module system. CocoaPods fills a good part of the gap.

manyxcxi · on March 9, 2016

No, if this way of thinking came out in an interview, where the candidate's first response to the hypothetical was basically "I don't really understand what the problem is but I'm sure 'they' can mostly fix it" (and obviously I'm paraphrasing into how I interpreted Alloy's response), I wouldn't hire them.

Now granted, I'd try to give them a chance to step away from that statement and let them show me that they are interested in understanding the issue and interested in reducing their impact on the product. I'm okay if they don't yet know HOW, but if they basically just throw up their hands, say it's someone else's problem, and leave it at that then no. In fact, hell no. I'm not interested. It shows a level of entitlement and a lack of interest in their craft and I will not subject my team to someone like that.

BogusIKnow · on March 8, 2016

I had the same impression from alloy's response. I've basically read it as "hi ho we will not change anything".

And it had this passive-aggressive ring to it, with the hand clapping and the hurray in the beginning and the stonewalling in effect.

whatever_dude · on March 8, 2016

Had exactly the same feeling. I felt bad for the GH employee who responded. He was helpful and thoughtful, made it clear what the problem was, offered advice and promised to make whatever's possible on their end...

...only to have someone come up and act like a paying customer whose expectations weren't being met. He answered suggestions by saying something that comes down to "I don't understand, can you repeat please", and never quite grasped that if he wants a better experience for his users, he also needs to work for it.

The introduction to the response, in typical douche manager style, was the cherry on top.

manyxcxi · on March 9, 2016

Even if they were a paying customer, what would you do if GH couldn't support that volume? Obviously there are SLAs in place if you're big enough, but at the end of the day, if they simply just can't handle the volume what are you going to do? 'Nothing' isn't the answer if you want to maintain users. You'll have to solve the problem no matter what, it just seems really egregious to me to have that response when you're not giving anything back (money or time) to the team that is essentially allowing your service to exist.

melloclello · on March 8, 2016

That was the smuggest use of emoji I've seen in a while

Nexxxeh · on March 8, 2016

I took it to be more defensive than passive-aggressive.

I took the emojis to be the exact opposite of the almost sarcastic tone I think you're interpreting it to have.

I guess even with emojis, it's hard not to make tone ambiguous.

couchand · on March 9, 2016

> I guess even with emojis, it's hard not to make tone ambiguous.

...probably especially with emoji.

knorker · on March 8, 2016

Yeah. "That's your problem, and we don't want to change anything because that's a non-zero cost for us just to fix a cost on your side."

And what the hell was with the quotes around "free"? Are you paying? No? Then there's no quotes about it.

CRConrad · on March 8, 2016

Not that I'm in the habit of giving people the benefit of the doubt, especially when it comes to mangling grammar, spelling, and punctuation... BUT, I've noticed that awfully many people nowadays seem to think that quote marks are some kind of emphasis sign. So those of you who ARE in the habit of giving people the benefit of the doubt on shit like this might take that into consideration here.

cmyr · on March 8, 2016

I think you could assume better of people. His response was maybe a bit tone deaf, but text is a very poor way of communicating mood and circumstance. There are any number of creditable explanations from tiredness to language skills that could make sense of the response you're offended by.

fwiw: "I would have a really hard time working for someone whose first inclination was always towards criticism over accommodation or compassion." But then I also acknowledge there may be a whole bunch of other stuff going on here behind the scenes. ;)

manyxcxi · on March 9, 2016

That's valid, everyone has the attitudes and traits they're looking for on both sides. I should have better clarified in my first statement that I in fact would give them the opportunity to walk that statement back to show their interest in fixing it, and more importantly show some empathy to the GH team.

My biggest problem is that I can't imagine a time, even if I don't understand the problem, where I would say that I'm not interested in essentially not abusing a free system that can't handle my load. He basically said he had better things to spend his time or money on. That shows a motivation/attitude that would likely be there regardless of the issue or if they understood it. That's his knee-jerk reaction, which I would say is probably his most honest, and it seems very selfish and not empathetic to the GH team at all.

acjohnson55 · on March 9, 2016

I'll caveat this with the fact that I'm a coworker of alloy's and orta's, so I'm admittedly biased to seeing them in a positive light.

But I think you may be reading his tone more negatively than necessary. What I see is him starting off by expressing gratitude and then switching voices to communicate very explicitly what the needs and desires of his stakeholders are. He's simply trying to reflect that as clearly as possible and discover the additional context. This was clearly effective, as you can see from the rest of the conversation that with all the information out there, everybody comes to a mutually acceptable consensus. Problem solved!

What we've ultimately got here is a free service built on a free service. CocoaPods has nothing but the time and effort of volunteers. GitHub has input of resources from the commercial side of their business. Both sides clearly want to preserve the utility of this end-to-end workflow in a more sustainable way.

"Falling over yourself" is a subjective amount of effort, but clearly CocoaPods has tried to minimize their impact on GitHub. As it turns out, the attempted optimization of shallow fetching backfired, but that's not from lack of regard for the resources they rely on. What was missing was exactly the context the Github employees provided.

Honestly, I think people are offended second-hand by a perceived lack of groveling on CocoaPods' part, and to me, that's way overblown.

jwarren · on March 8, 2016

I think there's definitely a tone problem in his first response, but if you continue reading the thread, you'll see that everything else is very positive. I guess it was just a bit of the classic internet text expression problem. Massive high five for the Github guys for looking past it and being extreme pros.

Bluestrike2 · on March 8, 2016

It probably wasn't intentionally selfish. Just really short-sighted, which isn't uncommon. Sometimes it takes a while for things to penetrate for anybody. Alloy's later comments seem much more productive. Whether that's because of backlash, he reread the earlier comment and realized how it seemed, or just had some time to think about the imposition on Github, at least it looks like CocoaPods is going to reconsider how they handle their distribution.

threatofrain · on March 9, 2016

It's hard to see the difference between selfish behavior and short-sighted behavior because they're often confounded. What I got from this is that Alloy will consider the deep vs. shallow Git copying issue, but wants to continue using Git and GitHub as a CDN for a massive user base, and doesn't want to re-architect because developers are expensive and time is limited.

The Git deep v. shallow issue just puts a band-aid on the CPU problem, but it doesn't do anything about the terabytes of bandwidth per week (it'll be worse), and it won't do anything about GitHub's claim that Git is not meant to be used as a CDN and doesn't scale well.
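To put "terabytes of bandwidth per week" in perspective, here is a rough back-of-the-envelope sketch. The fetch rate and per-fetch size below are made-up placeholders, not figures from the GitHub issue; the only point is how quickly a few fetches per second add up over a week.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604800 seconds


def weekly_bandwidth_tb(fetches_per_second, avg_bytes_per_fetch):
    """Approximate weekly transfer in terabytes (1 TB = 10**12 bytes)."""
    total_bytes = fetches_per_second * avg_bytes_per_fetch * SECONDS_PER_WEEK
    return total_bytes / 1e12


# Hypothetical inputs (assumptions, not measurements):
# 5 fetches per second, 1 MB transferred per fetch.
print(round(weekly_bandwidth_tb(5, 1_000_000), 2))  # prints 3.02
```

With those assumed numbers the total is already about 3 TB per week, and it grows linearly with both the fetch rate and the size of each transfer.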

They've become a big project that warrants thinking about revenue or organization strategy, but they're delaying it by externalizing their costs. Cases like these can pressure GitHub into rethinking the leniency of their policies.

I also think that if you're in the top-5 resource consuming group, more sympathy would go your direction if you were a paying customer, but they've indicated no interest.

braythwayt · on March 9, 2016

  > To be blunt, you're abusing the shit out of SOMEONE ELSE’S
  > product that you're not even paying for.

I <3 GitHub, but for their own business/values/whatever reason they choose to host open source for free. It’s not like these people have found a loophole and are getting a paid service for free.

AFAIC, that makes every free user a customer. They may not be a paying customer, but it's GitHub’s choice to be in the free hosting business.

couchand · on March 9, 2016

Just to clarify, GitHub hosts a very specific type of content: open source software development projects. They never offered to be a general purpose hosting provider.

From @mhagger's measured and thoughtful reply: "We understand that part of the CocoaPods workflow is that its end users (i.e., not just the people contributing to CocoaPods/Specs) fetch regularly from GitHub..."

braythwayt · on March 9, 2016

I know where you’re going with this, but GitHub deliberately blurs that line too. For example, I host both of my blogs on GitHub.

I guess I’m saying that I don’t see CocoaPods as being a “bad actor” so much as the extreme tail of a distribution.

couchand · on March 10, 2016

Are you hosting your blogs as plain git repos or are you publishing static pages to GitHub Pages, the static hosting option?

manyxcxi · on March 9, 2016

That's true that anyone with a repo is essentially a customer, paying or not. But CocoaPods is really a bad actor in this in that they're not just hosting source code up there for development purposes. They're using it like a CDN, and bless Github for not finding a reason to boot them, but I'm sure they've got a bunch of legal ways to do it in their terms of service. I'd argue that CocoaPods is really breaking the spirit of what GH is trying to provide.

moby · on March 9, 2016

This is getting granular, but any GitHub user - whether free or paid - is bound by the Terms of Service. Yes, that makes them a customer, and they accordingly have to adhere to the conditions of using the product.

The one that CP's usage most likely confronts is G.12 - found here: (https://help.github.com/articles/github-terms-of-service/)

> If your bandwidth usage significantly exceeds the average bandwidth usage (as determined solely by GitHub) of other GitHub customers, we reserve the right to immediately disable your account or throttle your file hosting until you can reduce your bandwidth consumption.

sanjeetsuhag · on March 8, 2016

> you're abusing

Cocoapods uses GitHub. No abuse here.

Wintamute · on March 8, 2016

You're quibbling semantics.

Using a GitHub repo as a high traffic code CDN and keeping 5+ cores pegged while being the single biggest consumer of resources across the whole platform could be reasonably defined as an abuse of the service.

Primary definitions on Google:

"abuse" verb: use (something) to bad effect

"abuse" noun: the improper use of something

jethro_tell · on March 8, 2016

Dabbling in the 'kind of customer we can afford to lose' territory.

jenscow · on March 8, 2016

Not really. Imagine the backlash github would receive here.

mioelnir · on March 8, 2016

As an ops person, I can tell you that I would probably upgrade my paid github account instead of canceling it if a project was thrown out that had the audacity to issue the statement that they don't care about the infrastructure they use for free so they can focus on the important things - namely developer funding.

jethro_tell · on March 8, 2016

yeah, it's the kind of backlash that would cancel a couple other non paying projects and there would be half dozen discouraging blog posts but paying customers that aren't open source aren't going to leave because a free tier opensource customer was abusing a service.

manyxcxi · on March 9, 2016

> Not really. Imagine the backlash github would receive

I don't know... I'm sure they'd get some, CocoaPods has a very large user base after all. But if GitHub laid out the reasons like they laid out the options to start this thread, I think they'd weather the storm fine and defuse some people who show up to be angry.

I could see why GH would drop them and think they'd be well within their right and in good moral standing in my book. I just don't think they would unless the maintainers became incredibly hostile or proved unable to fix the problem or even band-aid it. GH just seems too culturally invested in making things like this work to a satisfactory conclusion. I have a feeling that the CocoaPods team will be schooled in a lot of things directly from the GitHub team as they work to resolve the performance issues, just look how informative the initial post was.

MBCook · on March 8, 2016

It's very clear GitHub was not designed to be used as some project's personal CDN for this kind of traffic.

It's abuse. Wasn't intentional, but at the scale that Cocoapods is running it's abuse.

odbol_ · on March 8, 2016

> personal CDN

Discounting the fact that CocoaPods is used by millions of iOS developers...

justinlardinois · on March 8, 2016

He definitely was referring to the CocoaPods team when he used the word "personal," not their users.

If CocoaPods was run by a business that was charging those millions of developers for their services, it would be reasonable to expect that business to pay for a real CDN.

They're not, so that's not a reasonable expectation. But it's no more reasonable to make these demands of GitHub. Giving out a free product doesn't mean that you're required to give it out unconditionally, or in unlimited amounts, or forever. In the end, GitHub owns the infrastructure and services it's providing and can do what it wants with it.

Lazare · on March 8, 2016

Pretty sure that's the point. If it wasn't used by so many developers, it wouldn't be causing load issues. :)

mianos · on March 8, 2016

It's not the developers causing the issue, it's the end users.

wslack · on March 9, 2016

In this case, developers are the end user population.

mianos · on March 9, 2016

"This repository experiences a huge volume of fetches (multiple fetches per second on average). We understand that part of the CocoaPods workflow is that its end users (i.e., not just the people contributing to CocoaPods/Specs) fetch regularly from GitHub"

accatyyc · on March 9, 2016

Yes, but he means that "end users" of CocoaPods are actually developers. It's a package manager for libraries.

mianos · on March 9, 2016

Actually, in the text, I guess no one voting me down has actually read it, he pretty clearly defines the developers as the people who contribute to the packages, not the people who download and use them. It's like saying, as an IDE user, you are a developer of the project, you are not, you are an end user of a developer product. The end users, in this case, software developers, are using it as a CDN. It's pretty clearly stated in multiple places, github is not a CDN.

SlashmanX · on March 9, 2016

Except the original quote in this comment chain was "Discounting the fact that CocoaPods is used by millions of iOS developers." so that's what 'developers' was referring to in the comment you originally replied to

jlarocco · on March 8, 2016

I totally disagree. The CocoaPods usage model is not at all the expected way to use GitHub. I'm surprised using GitHub that way is even allowed by the TOS. It's very obviously a hack to avoid paying for their own infrastructure. The project representative in the issues thread even admits as much in his response.

Just about every other package manager for every other language (Pip, CPAN, Hackage, etc.) uses its own infrastructure.

zyxley · on March 9, 2016

Homebrew uses Github in roughly the same way.

jethro_tell · on March 9, 2016

In this case the difference between 'abuse' and fair use may be a homebrew dev that works for github. I would expect that to be a perk of most places that I work.

I've run tor exit nodes and repo hosting when I worked for ISPs and Datacenters while at the same time shutting customers down who do the same. The difference being that I had that conversation with my boss and said, 'this will violate our normal terms of service but I would like to do this.' The boss is of course more willing to make that concession when he can walk down the hall and say, 'uh, we have extremely high usage today, can you shut down the repos until we can get another link installed?'

niij · on March 9, 2016

Hey, I run 2 middle relays (~80-90mb/s bandwidth total) on cheap VPS's, but was thinking of upping bandwidth and hosting an exit on some dedicated hardware. Do you have any tips for running an exit node (legally and technically)?

If you want, you can email me (link in profile). Thanks!

jethro_tell · on March 9, 2016

Honestly, I don't like to give advice about this. Due in part to the risks involved, and since I've been out of it for a couple years, I don't really feel qualified.

Everything you need is online but you'd have to find it for yourself and make decisions about the best way for you to do it.

dankohn1 · on March 8, 2016

Yes, it's a hugely positive advertisement for the essential role that Github plays in the open source community. Also impressive is the followup message from a Github API developer (and Homebrew maintainer) offering access to a beta API that Homebrew is working with to reduce load. https://github.com/CocoaPods/CocoaPods/issues/4989#issuecomm...

arrakeen · on March 8, 2016

taking a look at homebrew's implementation of this new API feature, i fail to see how it would dramatically reduce fetches for their (homebrew's) use case. from what i understand, it will only be called when the user manually invokes `brew update`. how often are users calling this command over and over?

that being said, i do believe it could help cocoapod's use case since the fetches are done automatically (as i understand it)

mikemcquaid · on March 8, 2016

Hello! I'm the Homebrew maintainer and GitHub employee who wrote this. The main thing this API does for Homebrew is make no-ops really fast for `brew update`. As you point out this results in no speedup (in fact a tiny slowdown) for the case where you only run it where you know you have changes. Where it becomes useful is if you are using multiple taps (Homebrew's 3rd-party repositories) which update infrequently or if you want to run `brew update` automatically in e.g. a project bootstrap script that's run frequently.

In the medium to long-term I'd like to consider Homebrew running `brew update` automatically before you `brew install` (https://github.com/Homebrew/homebrew/issues/38818). For us to ever be able to do that `brew update`'s no-op case needs to be extremely fast.
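To illustrate why a cheap "has anything changed?" check makes the no-op case fast, here is a minimal sketch. It does not use the real GitHub API; `FakeRemote`, `head_unless` and `update` are hypothetical names invented for the example, and the "expensive fetch" is only a counter.

```python
from typing import Optional


class FakeRemote:
    """Stand-in for a hosting API; this is NOT a real GitHub client."""

    def __init__(self, head_sha: str) -> None:
        self.head_sha = head_sha
        self.cheap_checks = 0
        self.full_fetches = 0

    def head_unless(self, known_sha: str) -> Optional[str]:
        """Return the current head SHA, or None if the caller already has it."""
        self.cheap_checks += 1
        return None if known_sha == self.head_sha else self.head_sha

    def fetch(self) -> str:
        """Pretend to run the expensive git fetch and return the new head."""
        self.full_fetches += 1
        return self.head_sha


def update(local_sha: str, remote: FakeRemote) -> str:
    """Run the expensive fetch only when the cheap check reports a new head."""
    if remote.head_unless(local_sha) is None:
        return local_sha  # no-op: nothing changed upstream
    return remote.fetch()


remote = FakeRemote("abc123")
update("abc123", remote)  # already current: only the cheap check runs
update("old000", remote)  # out of date: cheap check, then one fetch
```

When nothing has changed, the client pays for one small request instead of a full fetch negotiation, which is the case an automatic `brew update` would hit most often.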

manyxcxi · on March 9, 2016

Thank you for everything you do at Github and with Homebrew.

mikemcquaid · on March 9, 2016

And thank you for taking the time to share some kind words.

ragall · on March 8, 2016

If you plan to overhaul Homebrew, I suggest having by default 3 operations, "update", "fetch" and "install", because some people might find themselves with bad connectivity (especially low bandwidth), and being able to fetch the sources to compile later is very important. Having "install" issue synchronous "update" operations is especially bad if you're on a network with high packet loss, like tethering to one's phone during vacation, etc. Of course, that requires being able to have repeatable "fetch" operations, and putting a local cache between the "fetch" and the "install" operation, so that if a "fetch" succeeds, a later "install" will not fail with "file not found". I've never used Homebrew, but that's advice from having used many package managers on Linux and other *nixen. My apologies if it's redundant.
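A rough sketch of the "local cache between fetch and install" idea, under stated assumptions: `PackageCache` and its methods are hypothetical names, the download is a caller-supplied function, and "install" just reads the cached bytes. It says nothing about how Homebrew itself is implemented.

```python
from pathlib import Path
from typing import Callable


class PackageCache:
    """Hypothetical on-disk cache that sits between fetch and install."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def path_for(self, name: str, version: str) -> Path:
        return self.root / f"{name}-{version}.tar"

    def fetch(self, name: str, version: str, download: Callable[[], bytes]) -> Path:
        """Download once; repeating the call for the same version is a no-op."""
        target = self.path_for(name, version)
        if not target.exists():
            partial = target.with_name(target.name + ".part")
            partial.write_bytes(download())
            partial.replace(target)  # rename so a half-written file never looks complete
        return target

    def install(self, name: str, version: str) -> bytes:
        """Works offline: reads only from the cache, never from the network."""
        target = self.path_for(name, version)
        if not target.exists():
            raise FileNotFoundError(f"{name} {version} has not been fetched")
        return target.read_bytes()
```

With this split, a "fetch" on a good connection can be followed by an "install" later with no connectivity at all.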

barkingcat · on March 8, 2016

I know you're trying to suggest some good ideas, but never having used homebrew I think you should find out more about homebrew itself before suggesting things. I don't mean you have to be an expert at it, but just to know a bit about how it's used, and who it's used by before trying to suggest things to people.

My comment isn't to pass judgement on your suggestion, but if you took a look at homebrew itself you'd be able to make better non-generic suggestions.

Re: your suggestion - It's a generically seemingly good thing to separate out fetch from install, but as a user of homebrew, it's not very applicable because when homebrewing things you're likely already connected to the internet, and it's hard to predict when you want to brew install something before hand. If you have the internet capacity to fetch something, it'd be just as easy to brew install it there on the spot.

By extension, it's important to run brew update before installing just to make sure the package index is up to date, so I agree with the dev above, integrating brew update step before brew install would be a good thing - except - perhaps print out on console the exact version number that's going to be installed. Current behaviour does put the version # in the file name of the package being installed, but it could be listed in a more obvious way.

Often times I do a brew info, find the version and details on it before brew install. If the installation step then installs a new version (because of the brew update step), then it's a bit strange that I didn't get the version I was intending to get.

ragall · on March 8, 2016

It's not always that easy, I've found myself having to go to a place with high-bandwidth internet to fetch stuff, and being able to stay there only a few minutes. Or before boarding a plane(you don't really want to pay for internet on an intercontinental flight), etc... Also, while you might do things as-needed, it's not that difficult to always remember to run an update every time you run brew. And I wasn't even proposing to make it the default, just to make it possible to run those 3 things separately.

barkingcat · on March 8, 2016

All I'm saying is that you really need to look into the things you are suggesting. Please do the investigation, even if cursory, before you jump in.

I am guilty of the same, but in my case, I have an installation I can actually check. You can read the source code if you don't have a mac to work with, but the important thing is that this already exists.

from the brew man page:

brew(1)

fetch [--force] [-v] [--devel|--HEAD] [--deps] [--build-from-source|--force-bottle] formulae

Download the source packages for the given formulae. For tarballs, also print SHA-1 and SHA-256 checksums.

hatsix · on March 8, 2016

I don't get it... If you've never used homebrew, and are only really a user of other package managers, why are you attempting to offer advice on what they've been doing for years.

Is this classic "I have an opinion and all of my opinions are great, so I must pontificate!"?

Maybe you should just trust that the developers of the only successful package manager for OS X have some idea of what their users want and need... and that as someone who has NEVER used their software, and not a seasoned veteran of any sort of similar projects, your opinion counts WAY less than any of their actual users.

ragall · on March 8, 2016

Because I'm interested in package managers in general, and I'd like them to improve across operating systems. And I am a seasoned(9 years) veteran of several Linux distributions.

justinlardinois · on March 8, 2016

A random, off topic reply to a comment on Hacker News by one of the maintainers probably isn't the best way to make suggestions.

mikemcquaid · on March 9, 2016

We do actually do that: `brew update`, `brew fetch` and `brew install` commands are separate. If you haven't already fetched the sources, `brew install` will fetch them for you. What I'm considering is also making it do `brew update` for you too.

barkingcat · on March 8, 2016

brew update is being called over and over again in most use cases - if only to check for example, if the latest openssl release patching a critical exploit is out yet. (of course on the mac openssl is packaged with the system, but if you are using the brew version, or are compiling other software against openssl from brew, you'll need to check for updates diligently)

It is also being used in scripts, etc. Since from the user's point of view it's a no-op if there are no updates, there is no reason not to do it on a schedule.

ropiku · on March 8, 2016

It's also being used in scripts. In our Mac CI environment (for iOS builds) we update homebrew to have the latest xctool.

aikah · on March 8, 2016

When a package manager monopolizes that many resources from Github at the expense of others, there is no reason to "commit" all resources to this one project. Thus cocoapods getting rate limited because of the obvious bandwidth abuse going on here. mhagger's answer is pretty straightforward.

EDIT: the upside is that cocoapods will have to either rethink their architecture in order to eat fewer resources or move to their own paid infrastructure because their package manager will soon be less than functional given the aggressive rate limiting github is performing.

toomuchtodo · on March 8, 2016

> EDIT: the upside is that cocoapods will have to either rethink their architecture in order to eat fewer resources or move to their own paid infrastructure because their package manager will soon be less than functional given the aggressive rate limiting github is performing.

I'd like to see both happen:

* CocoaPods refactoring to be more efficient

* GitHub providing open source projects the option to buy reserved capacity if they're using excessive resources (versus just saying "No").

avar · on March 8, 2016

    > GitHub providing open source projects
    > the option to buy reserved capacity.

I have no affiliation with GitHub, but I'd guess that if you were paying for one of their $200/month organization plans[1] you'd be having a very different conversation with them about rate limiting.

1. https://github.com/pricing

toomuchtodo · on March 8, 2016

I would be interested if any of the top five open source projects consuming the most resources are paying Github anything.

avar · on March 8, 2016

They probably aren't, but they aren't going to be using the infrastructure in the pathological way CocoaPods is either, which requires you to have a client that uses GitHub on behalf of your users.

I'm just pointing out that the feature you're wishing exists very likely already exists in practice. Unless GitHub is stupid they aren't going to be complaining about you pegging 5 CPU cores for $200/month.

voltagex_ · on March 9, 2016

How much cash do the top five open source projects bring in? That's the other side of it. Funding a side project, let alone a large open source project, is hard

toomuchtodo · on March 9, 2016

The top five open source projects on Github can't bring in $200/month each?

blakeyrat · on March 8, 2016

GitHub already has paid accounts. There's nothing I'm aware of that would prevent an open source project from paying for one.

nickpsecurity · on March 9, 2016

That response was sheer excellence, except maybe it was too nice about how ridiculous the situation was. I'm pretty diplomatic on the job, but an aggressive freeloader bragging about what the damage saves them would try my patience.

GitHub people are truly going above and beyond in service even when barely warranted. I'll give them that.

amelius · on March 8, 2016

Agreed. Perhaps better even would be an automatic message that says that rate-limiting is in effect, explaining the reasons.

Gratsby · on March 8, 2016

From CocoaPods.org:

> CocoaPods is a dependency manager for Swift and Objective-C Cocoa projects. It has over ten thousand libraries and can help you scale your projects elegantly.

The developer response:

> [As CocoaPods developers] Scaling and operating this repo is actually quite simple for us as CocoaPods developers whom do not want to take on the burden of having to maintain a cloud service around the clock (users in all time zones) or, frankly, at all. Trying to have a few devs do this, possibly in their spare-time, is a sure way to burn them out. And then there’s also the funding aspect to such a service.

--

So they want to be the go-to scaling solution, but they don't want to have to spend any time thinking about how to scale anything. It should just happen. Other people have free scalable services, they should just hand over their resources.

Thank goodness Github thought about these kinds of cases from the beginning and instituted automatic rate limiting. Having an entire end user base use git to sync up a 16K+ directory tree is not a good idea in the first place. The developers should have long since been thinking about a more efficient solution.

orclev · on March 8, 2016

It seems particularly galling that their response to GitHub was to essentially throw their hands up and say "We don't want to change anything, fix it for us". I think GitHub had a near perfect response to this, they analyzed the problem, came up with a set of changes that could be made to help fix it (both short and long term), and pointed to steps they've taken to help out. CocoaPods on the other hand (or at least one of their developers) did not handle this particularly well. When presented with the evidence of why they were seeing slow responses and long queues and suggestions of how to fix it, they complained that they didn't want to fix it and didn't have the time or resources to do so.

Honestly if I was GitHub, I'd be tempted to just increase the throttling on CocoaPods and call it done, it isn't their problem if the users of that project have a bad experience. GitHub has provided solutions to the problem, it's CocoaPods that's resisting implementing those solutions.

mynameisvlad · on March 8, 2016

Yeah, I'd have to agree. I was not at all impressed by the CocoaPods response here, especially since it was made clear by the GitHub staff that CocoaPods is using up a lot of CPU and terabytes of bandwidth. If you get all that for free, I'd expect you to be a little more open to changes that make it easier for your provider to continue giving you all that for free.

dantiberian · on March 8, 2016

A later comment from @alloy was a bit more gracious about this https://github.com/CocoaPods/CocoaPods/issues/4989#issuecomm..., but I agree, it wasn't a good look.

bobwaycott · on March 8, 2016

I think that's pretty unfair. It's really obvious that the initial reply didn't really understand what was going on, and what was being explained. A couple followup additional explanations later, the same dev grokked the problem, CocoaPods' responsibility for the problem, and outlined a list of how they're going to solve it. Seemed to me to be a pretty nice example of professional and helpful candor between GH and an OSS project working to figure out a long-term solution.

mynameisvlad · on March 8, 2016

I don't know, maybe I'm being overly pessimistic here, but to me it just screams of backpedaling once they saw the reaction they were receiving in this thread. The position shifted from "it's the way we architected things, how can you fix this for us" to "okay, here's some things we can do" pretty quickly and dramatically when the HN thread went up and people were reacting to the response. Cocoapods is using Github resources for free, so the appropriate response from the start should have been what it eventually came down to, not pushing back on Github because they don't want to invest in an actual CDN solution. But, as I said, maybe I'm being overly pessimistic in my analysis here, that's just how it came off to me.

bobwaycott · on March 9, 2016

I get where you're coming from. I also had a similar initial reaction. However, as I read through the subsequent discussion, it began to read as though the commenter was really not grokking the problem—and, more importantly, what to do to fix it. I thought it was very impressive that none of the GH participants reacted like some of the HN commenters here. Instead, they showed a great deal of patience and restraint in fully explaining the technical details, offering actionable solutions, and keeping everything very civil and supportive. Then the same guy who sounded like he was possibly being a jerk came back and sounded totally different because he seemed to actually know what to do to fix his project. Maybe the CP commenter read this HN thread and reacted to it, but I'll admit HN is the last place I'd think of finding one of my GH issues discussed.

Perhaps I'm just being too charitable. Either way, the project rather rapidly seemed to come to the right conclusion and jump on board fixing their problem.

On a related note, I feel like this issue could be turned into a great teachable moment for OSS projects; one GH could use for a tech blog post and guides on how to be a good citizen and avoid things that can make your project get rate-limited without you knowing.

mynameisvlad · on March 9, 2016

Yeah, I really just think it went a bit too far in the other direction and overcompensated somewhat, which is what was giving me that view. The comment with the heart emoji really stood out to me as a "huh, this might be because of HN" since it basically touched on exactly what was being criticized in here, that they weren't really appreciating what GitHub was providing for free. That said, I can totally see it just that alloy realized it on his own and wanted to make it clear. It's just that the timing of it all and the fact that it's hitting the same point kind of led me to believe that it was a reaction.

Obviously, that's not to say the sentiment isn't genuine. The eventual conclusion makes it seem that yeah, they do appreciate what GH is providing and are trying to make it less strenuous on the servers to get a better experience all round. Making it work well is really in their best interests since the users are seeing a degraded experience until something can be done about it. Definitely also happy that the right conclusion was eventually reached.

cyphar · on March 9, 2016

I don't understand how you can read "our use of git as our package manager is better than some other package managers we won't name :wink:" as an example of not understanding the problem. It looks more like someone who doesn't want to accept that the problem is on their end.

bobwaycott · on March 9, 2016

To me, that comment shows precisely that someone isn't really understanding the problem in full. It feels to me to be a deflection—of responsibility, sure, but also of admitting one doesn't understand what's really going on, and how one is at fault.

Given how rapidly the same commenter changed gears, it strikes me as plausible there was an "ohhhhh eureka" moment, and suddenly the guy got it. His followup comments began dealing with the problem after a couple other GH participants explained further what was happening and why (as well as some actionable steps to take to correct the problem for good).

But perhaps I'm being too charitable.

pyre · on March 9, 2016

> It's really obvious that the initial reply didn't really understand what was going on, and what was being explained.

If you are in such a position, then it seems like the best course of action would be to ask questions rather than list off reasons that you don't want to deal with it.

bobwaycott · on March 9, 2016

Sure. If you know that you don't know what's going on. I walked away with the impression that the guy didn't actually know that.

pyre · on March 10, 2016

If he can't understand that his project's resources are consuming 5 whole nodes and terabytes of throughput on Github's infrastructure, then I question his skills as a developer. Even if all of the other technical details are completely obtuse to him, he should at the very least be able to understand the sheer scope of the resources their project is consuming on Github's infrastructure.

nostrademons · on March 8, 2016

Well, the flip side is that the CocoaPods developers are all volunteers (right?). They aren't really deriving any benefit out of the work they do on CocoaPods, and if you ask them to take on financial or ongoing maintenance obligations for a volunteer project, they probably just won't do it. The major benefits of CocoaPods existence go to iOS developers, but there's a tragedy of the commons effect here, where no individual developer is willing to pony up money for the extra convenience that CocoaPods offers.

I think that long-term, the solution will be the Swift Package Manager, and CocoaPods will just be deprecated in favor of it. Let Apple host iOS packages; they're the ones that gain the most benefit from easy iOS development; they have the developer expertise, and the hosting costs are a drop in the bucket compared to iCloud & CloudKit. But that's not all that helpful for people who need an Objective-C package now.

dfc · on March 8, 2016

> They aren't really deriving any benefit out of the work they do on CocoaPods

I don't think working on CocoaPods is an altruistic endeavor. I imagine (know) that some of the cocoapods folks are app developers and ostensibly CocoaPods makes developing applications easier.

Side Note: it's not a tragedy of the commons. Github owns the infrastructure and they enforced their private property rights by rate limiting a group of users that were disproportionately using resources. It is a collective action problem for CP users.

protomyth · on March 8, 2016

> They aren't really deriving any benefit out of the work they do on CocoaPods

No direct financial benefit, but they are deriving a benefit out of their work.

nostrademons · on March 8, 2016

Presumably - otherwise they wouldn't be doing it - but it often doesn't take all that much to flip a volunteer from "Okay, this is cool, I can help other people and learn some stuff as well" to "Fuck this, it's way more trouble than it's worth." Top amongst this is when the people you're helping expect you to give them free work.

Jobs compete with other jobs, and most people expect that they'll have to do some unpleasant things in their job. Open-source & volunteer work competes with hobbies, and there are many hobbies where you never need to deal with demands, unexpected work, and interpersonal drama.

mordocai · on March 8, 2016

It also doesn't take much to move a company from "Okay, we'll help you by hosting your shit for free" to "Fuck you, you're banned since you are ungrateful bastards" (in nicer language of course).

nostrademons · on March 9, 2016

I agree, and I think the GitHub employees who commented on the thread have been really patient, and that it's impressive that GitHub as a company has tolerated and supported this use case.

My point, though, is that it's not the CocoaPods developers who are ungrateful bastards. It's any Hacker News commenter here who also uses CocoaPods. If you think this behavior is insane, submit a pull request.

odbol_ · on March 8, 2016

Ugh, I'm skeptical of giving Apple ownership of any kind of developer tool. We all saw how badly they screwed up TestFlight, and now you want to give them the only OSS package manager?

jscampbell05 · on March 9, 2016

Amen to that. Fortunately the Swift Package Manager (which is also OSS) is run by their open source Swift team, who so far are doing a great job of being open and delivering the best solutions for the job.

TestFlight on the other hand....

cballard · on March 9, 2016

If you're upset about Apple TestFlight, why not just use HockeyApp?

odbol_ · on March 10, 2016

Haven't tried HockeyApp. I use Crashlytics Beta now and it's amazing. Literally "press of a button" deployment: no dealing with provisioning profiles, device UUIDs, or any of that garbage. Just build and deploy, and all your testers get the update instantly!

pjc50 · on March 8, 2016

It's very "sharing economy": someone else has a resource you can use for free, so why not take it?

(Edit: </rhetorical> </sarcasm>)

izacus · on March 8, 2016

Because abusing that resource makes sure you (or anyone else) won't be able to take it in the future. I thought that's pretty obvious?

kingnothing · on March 8, 2016

This may very well be the first time that Cocoa Pods was told that they're consuming a huge amount of resources. They might not have even known that what they're doing is considered abuse.

betenoire · on March 8, 2016

hard for me to believe no one was ever curious enough to do the math

jethro_tell · on March 8, 2016

People are starting to get used to the idea that you hit someone's API and they scale for you. I mean, from a company based on scaling that's strange, but there's a lot of 'magic' that goes into a lot of providers being able to always respond to an API call and most don't see it.

kingnothing · on March 8, 2016

Based on just looking at how some of my employer's customers use our service, plenty of them are completely clueless that they're well outside of normal usage patterns.

shortstuffsushi · on March 8, 2016

I think the parent was expressing the project owners' thought, "we could do this, why not," and not saying they agreed with it.

smacktoward · on March 8, 2016

You're assuming that people are willing to defer gratification today in order to ensure long-term sustainability. To illustrate why this might be a problematic assumption, I would point to all of human history.

bmm6o · on March 8, 2016

pjc50's question was rhetorical, and was offering a critique of the "sharing economy" mindset. That it should be pretty obvious is exactly the point.

Jtsummers · on March 8, 2016

OT (from primary discussion): This isn't the sharing economy. This is someone abusing a resource.

mitchtbaum · on March 8, 2016

I take your use of "sharing economy </r></s>" as post-uber/airbnb/etc, instead of, as I first heard it, from couchsurfing, potlucks, bittorrent, etc well before that.

Also while nit-picking, I would clear up your use of "for free </r></s>" as "as a freebie", again post-[insert: x̄ȳz, inc].

Kapura · on March 8, 2016

It seems like a classic example of ignoring the negative externalities. Luckily, we live in a connected world where it is sometimes easy to trace the after-effects.

I understand the desire to personally maintain as few of one's own servers as possible, but when the result is negative effects on the service hosting the project and a worse experience for the end-user, it might be time to start looking over what google cloud offers.

rqebmm · on March 8, 2016

1) I don't think they mean "scale your projects elegantly" in a "distribute your project to millions of customers" sense, but rather in a "add lots of libraries and not have it be a hassle" sense.

2) It makes perfect sense to let GitHub handle the performance hit until issues arise. Premature optimization is the devil, right? But once there start to be issues, it's definitely unfair to turn around and say "well you offer the service for free, so you should fix it"

carlosdp · on March 8, 2016

Yea pretty sure they definitely mean "Add More Libraries" when they say "scaling", it wouldn't make sense otherwise given what CocoaPods does...

dinkumthinkum · on March 8, 2016

It was pretty shocking to see a couple of the responses. If the point is to be a package manager, then they should see this as an important part of the project and not just see it as Github's duty to provide this free service. Basically they are saying their service cannot exist without Github or someone willing to expend the, apparently fairly significant, capital to provide the backend to their service. And this is all happening at a time when many people have figured out how to provide package management services in a reasonable way.

geofft · on March 8, 2016

I think the word "scale" is used to mean different things in the two sentences you quoted. In the first case, "scale" is about package management and dependency tracking, helping an individual software project approach large scale (many third-party dependencies with possibly conflicting requirements, new developers who need to get up-to-speed quickly, etc.). In the second case, "scaling" is about distribution of the CocoaPods metadata to large numbers of users, each with their own (possibly small) software project.

Sentence 1 would still be true if CocoaPods was only used by ten companies developing the ten biggest (in terms of lines of code) Objective-C projects, but there would no longer be a need to scale in the sense of sentence 2.

pjc50 · on March 8, 2016

This reply: https://github.com/CocoaPods/CocoaPods/issues/4989#issuecomm...

"Not having to develop a system that somehow syncs required data at all means we get to spend more time on the work that matters more to us, in this case. (i.e. funding of dev hours)"

In other words, using github as a free unlimited CDN lets them be as inefficient as they like. Such as having 16k entries in a directory ( https://github.com/CocoaPods/Specs/tree/master/Specs ) which every user downloads.

Package management and sync seems to suffer really badly from NIH. Dpkg is over 20 years old and yum is over a decade old. What's up with this particular wheel that people keep reinventing it seemingly without improvement?

cellularmitosis · on March 8, 2016

Debian's sync may be nicer, but their client-side solution leaves a bit to be desired.

Trivial apt operations (e.g. trying to install a package which is already installed) on an NSLU2 (an ancient 266MHz ARM machine) take several minutes, whereas the same operation takes several seconds on a modern laptop.

It turns out this is due to the fact that Debian "main" (Packages.gz) has ballooned to 32MB of plain text when uncompressed, comprising more than 41,000 packages, and it has to be parsed and assembled into a dependency tree for every apt operation. This problem screams for SQLite.
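As a rough illustration of the SQLite idea (not how apt works today), the sketch below parses a tiny, made-up Packages-style text once and stores it in an indexed table, so that later lookups do not re-parse the whole file. The sample data, the table layout and the function names are all assumptions for the example, and the parser handles only the simple `Key: value` stanza layout.

```python
import sqlite3

# Made-up sample in the blank-line-separated "Key: value" stanza layout.
SAMPLE = """\
Package: libfoo
Version: 1.2-1
Depends: libc6 (>= 2.14), libx11-6

Package: bar
Version: 0.9
Depends: libfoo
"""


def parse_stanzas(text):
    """Yield one dict per stanza; indented continuation lines are folded in."""
    for block in text.strip().split("\n\n"):
        fields, key = {}, None
        for line in block.splitlines():
            if line[:1] in (" ", "\t") and key:
                fields[key] += "\n" + line.strip()
            elif ":" in line:
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        if fields:
            yield fields


def build_index(text, conn):
    """Parse once and store; the primary key gives an index on the name."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS packages "
        "(name TEXT PRIMARY KEY, version TEXT, depends TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO packages VALUES (?, ?, ?)",
        (
            (s["Package"], s.get("Version", ""), s.get("Depends", ""))
            for s in parse_stanzas(text)
            if "Package" in s
        ),
    )
    conn.commit()


def lookup(conn, name):
    """Return (version, depends) for a package, or None if it is unknown."""
    return conn.execute(
        "SELECT version, depends FROM packages WHERE name = ?", (name,)
    ).fetchone()


conn = sqlite3.connect(":memory:")
build_index(SAMPLE, conn)
print(lookup(conn, "bar"))
```

The expensive parse happens once per index update rather than once per command, which is where a slow machine would save its time.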

A side project I've started looking into is to make a transparent apt proxy which provides a trimmed down Packages.gz (e.g., removing anything which uses X11), which would be a lot easier than rewriting apt to use a SQLite backend.

snuxoll · on March 8, 2016

> This problem screams for SQLite.

This is precisely why yum/dnf has been switching from XML for repodata to SQLite. In fact, the only thing that is still XML-only is the comps file which just lists package groups, is updated rarely and "only" weighs in at half a MB.

mschuster91 · on March 8, 2016

> Trivial apt operations (e.g. trying to install a package which is already installed) on an NSLU2 (an ancient 266MHz ARM machine) take several minutes

Actually, I'm surprised one can actually run a modern Linux on the NSLU2 given its shameful lack of RAM and slooow USB port. But it was a nice gadget when it came out and it was fun to experiment with it.

> It turns out this is due to the fact that Debian "main" (Packages.gz) has ballooned to 32MB of plain text when uncompressed, comprising more than 41,000 packages, and it has to be parsed and assembled into a dependency tree for every apt operation. This problem screams for SQLite.

Correct me if I'm wrong, but isn't apt (and dpkg) basically composed out of a ton of different (perl/shellscript) modules? So it should be possible to create an interface-compatible sqlite data store.

cellularmitosis · on March 8, 2016

Actually the part which parses the packages file is a bunch of c++

debacle · on March 8, 2016

> A side project I've started looking into...

Wouldn't cleaning up the package search interface be a similar effort with much greater payoff?

cellularmitosis · on March 8, 2016

It isn't just package search which is the problem, it is everything which has to parse the packages file, which is basically every apt command. So if you can trim "main" down to 10,000 packages, suddenly every part of apt is faster, and no one has to install any custom apt replacements.
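A minimal sketch of what the trimming step could look like, assuming the index is plain text in blank-line-separated stanzas with a single-line `Depends:` field. `dep_names` and `trim_index` are hypothetical helpers, and only direct dependencies are checked, so a real proxy would also have to handle transitive dependencies and packages that become uninstallable after trimming.

```python
import re
from typing import Set


def dep_names(depends: str) -> Set[str]:
    """Extract bare package names, ignoring version constraints and '|' alternatives."""
    names = set()
    for clause in depends.split(","):
        for alternative in clause.split("|"):
            match = re.match(r"\s*([a-z0-9][a-z0-9+.\-]*)", alternative)
            if match:
                names.add(match.group(1))
    return names


def trim_index(text: str, banned: Set[str]) -> str:
    """Drop every stanza that directly depends on one of the banned packages."""
    kept = []
    for stanza in text.strip().split("\n\n"):
        depends = ""
        for line in stanza.splitlines():
            if line.startswith("Depends:"):
                depends = line[len("Depends:"):]
        if dep_names(depends) & banned:
            continue  # e.g. anything that pulls in an X11 library
        kept.append(stanza)
    return "\n\n".join(kept) + "\n"
```

Calling `trim_index(index_text, {"libx11-6"})` would return the same text minus the stanzas that list `libx11-6` in `Depends:`.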

debacle · on March 8, 2016

But wouldn't just storing that file as a database solve most of the issues?

cellularmitosis · on March 8, 2016

Yes, and that would be awesome, but it means rewriting apt.

mitchtbaum · on March 8, 2016

Arch and Debian contributors have tried a good approach for package management..

0: p2pacman - Bittorrent powered pacman wrapper

1: pacman & torrent, feasible?

2: DebTorrent

(0) https://bbs.archlinux.org/viewtopic.php?id=163362

(1) https://bbs.archlinux.org/viewtopic.php?id=115731

(2) https://wiki.debian.org/DebTorrent

I believe scaling this could happen with either: 1) lightweight filesystem\directory versioning support, like how btrfs allows you to mount snapshots. This way, peers could update whichever version of a torrent they have. Or 2) very reliable means to update to the latest torrent release (as reliable as syncing with peers), which afaict means smarter bittorrent clients that can perform DHT-based "crawling". Those recent defcon(?) "hacks" to query peers for similar torrents based on user pools and connection histories (or something like that) would make sense here.

A cool side-note: In one of my few experiences diving into `.git`, I diff'd it before and after making changes to its tracked sources, like adding files and modifying them. It looked like a torrent that included version control data would make out just fine if an updated torrent expected similar data in the same location. Again, a smarter bittorrent client would need to sort some of this out. See also 0': Updating Torrents Via Feed URL. Anyway, most users would probably leave that part out, in favor of only which parts they need.

(0') http://www.bittorrent.org/beps/bep_0039.html

Another cool side-note: This would also allow for easily adding repos from multiple sources... Look at how many ( non-automated :-( ) merge requests com.github/CocoaPods/Specs's caregivers have reviewed: 13,331 as of now (0'').

(0'') https://github.com/CocoaPods/Specs/pulls?q=is%3Apr+is%3Aclos...

masklinn · on March 8, 2016

> Arch and Debian contributors have tried a good approach for package management..

> 0: p2pacman - Bittorrent powered pacman wrapper

> 1: pacman & torrent, feasible?

> 2: DebTorrent

That's about distributing packages via p2p. The problematic repository doesn't store any package data, it stores package metadata (it's the cocoapods index if you will).

mitchtbaum · on March 8, 2016

>~ package metadata, not package data

metadata != data ??

masklinn · on March 8, 2016

You got it.

mitchtbaum · on March 8, 2016

I see metadata very much as "regular" data (in terms of needs and tooling); practically speaking, even from the same-ish data set. Simply put, it just looks different.. above the surface.

justinclift · on March 8, 2016

As a data point, "dnf" is the successor to yum. Started using it recently with Fedora 23... and it's pretty decent.

(It may not have been earlier on, I really don't know. ;>)

Something nifty about the new dnf is several of the older yum commands (eg builddep, yum-downloader) are now integrated directly so don't need extra utils installed. Seems like refinement is still happening.

If only my fingers didn't keep typing "dns" instead of "dnf" all the time, it would be great. :D

hayleox · on March 8, 2016

I just keep typing "yum", since it's an alias of dnf (albeit with an annoying nag message) and since I work on CentOS servers a lot and am automatically used to typing "yum".

cyphar · on March 9, 2016

There's also zypper from SUSE/openSUSE.

superuser2 · on March 8, 2016

Perhaps because CocoaPods is not an OS package manager or anything close to it. It installs libraries within the context of an Xcode project, regardless of the host system or what is installed for other projects.

pjc50 · on March 8, 2016

It is a package manager, though? The fundamental idea of downloading a list of available options of which the user picks some, and the system pulls in dependencies, is almost exactly how dpkg and yum work. The location to which the packages are installed is a detail.

masklinn · on March 8, 2016

A language package manager must be able to "install" the same packages over and over again (and possibly "install" multiple versions of the same package in the same environment), and the ability to push packages is generally considered part of their duty, not so for OS package managers, you don't use dpkg to send a package to debian's repositories.

> The fundamental idea of downloading a list of available options of which the user picks some, and the system pulls in dependencies, is almost exactly how dpkg and yum work.

If you reduce it to the fundamentals you don't need yum or dpkg either to do that, just a dependency solver and curl.
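
To make the "just a dependency solver" part concrete, here is a toy sketch in Python (3.9+ for `graphlib`). The index and package names are hypothetical, and it ignores everything a real solver has to deal with, such as version constraints and conflicts; it only computes an install order with dependencies first.

```python
from graphlib import TopologicalSorter

# Hypothetical index: package name -> set of direct dependencies.
INDEX = {
    "app": {"http", "json"},
    "http": {"sockets"},
    "json": set(),
    "sockets": set(),
}


def install_order(wanted, index):
    """Return wanted packages plus transitive deps, dependencies first."""
    needed, stack = {}, list(wanted)
    while stack:
        name = stack.pop()
        if name not in needed:
            needed[name] = index[name]  # KeyError for an unknown package
            stack.extend(index[name])
    # TopologicalSorter takes node -> predecessors, so dependencies come out
    # before the packages that need them; it raises CycleError on cycles.
    return list(TopologicalSorter(needed).static_order())


order = install_order(["http"], INDEX)
print(order)  # ['sockets', 'http']
```

Each name in the resulting list would then be handed to whatever does the actual download.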

vacri · on March 8, 2016

The fundamentals for package management also move the package into its final resting place(s), where it's going to do its work. Curl doesn't do that, it just gives you a single file somewhere.

I'd also consider removing a package to be a fundamental part of a manager. The two items you describe would be a 'package grabber'.

pretz · on March 8, 2016

CocoaPods is much more akin to RubyGems or PyPI or CPAN, all of which are established as useful tools outside of OS level package managers. There's a need for an iOS/Cocoa package manager (that understands Xcode!) and CocoaPods has so far been the most successful.

yegle · on March 8, 2016

A package manager for a project can be the same os package manager with reduced dependency tree and default to install with a prefix (that is the project root or the vendor directory).

jahewson · on March 8, 2016

Care to substantiate that last paragraph? Are you really suggesting OS X users use yum?

yegle · on March 8, 2016

Why not? dpkg is already being used by jailbroken devices in Cydia.

wyldfire · on March 8, 2016

Actually it seems very likely that one or more of the popular linux distro package manager ecosystems would fare well on other OSs. Arch Linux's pacman was ported to Windows, e.g..

286c8cb04bda · on March 9, 2016

pacman has been ported to OS X a few times

  https://bbs.archlinux.org/viewtopic.php?id=53960
  https://bbs.archlinux.org/viewtopic.php?id=122544

umanwizard · on March 8, 2016

It's absolutely possible to install yum on OS X (I know this from experience)

rodgerd · on March 9, 2016

Why not? A friend of mine was employed by a Very Large Company at one point to (amongst other things) maintain their AIX port of rpm/yum.

pjc50 · on March 8, 2016

Not literally yum, but something with a similar design rather than abusing a git repo.

indygreg2 · on March 8, 2016

I help run Mozilla's version control infrastructure and the problems described by the GitHub engineer have been known to me for years. Concerns over scaling Git servers are one of the reasons I am extremely reluctant to see Mozilla support a high volume Git server to support Firefox development.

Fortunately for us, Firefox is canonically hosted in Mercurial. So, I implemented support in Mercurial for transparently cloning from server-advertised pre-generated static files. For hg.mozilla.org, we're serving >1TB/day from a CDN. Our server CPU load has fallen off a cliff, allowing us to scale hg.mozilla.org cheaply. Additionally, consumers around the globe now clone faster and more reliably since they are using a global CDN instead of hitting servers on the USA west coast!

If you have Mercurial 3.7 installed, `hg clone https://hg.mozilla.org/mozilla-central` will automatically clone from a CDN and our servers will incur maybe 5s of CPU time to service that clone. Before, they were taking minutes of CPU time to repackage server data in an optimal format for the client (very similar to the repack operation that Git servers perform).

More technical details and instructions on deploying this are documented in Mercurial itself: https://selenic.com/repo/hg/file/9974b8236cac/hgext/clonebun.... You can see a list of Mozilla's advertised bundles at https://hg.cdn.mozilla.net/ and what a manifest looks like on the server at https://hg.mozilla.org/mozilla-central?cmd=clonebundles.
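
A rough sketch of what a client might do with such a manifest, in Python. This is an assumption-heavy illustration, not Mercurial's implementation: it assumes the manifest is plain text with one entry per line, a URL followed by space-separated `KEY=value` attributes (URI-encoded), which is how I read the clonebundles documentation. The URLs and the selection rule below are hypothetical.

```python
from urllib.parse import unquote

# Hypothetical manifest contents; the hosts are placeholders.
SAMPLE_MANIFEST = """\
https://cdn.example.invalid/repo/full.zstd.hg BUNDLESPEC=zstd-v2 REQUIRESNI=true
https://cdn.example.invalid/repo/full.gzip.hg BUNDLESPEC=gzip-v2
"""


def parse_manifest(text):
    """Return a list of (url, attributes) pairs, one per non-empty line."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        attrs = {}
        for item in parts[1:]:
            key, _, value = item.partition("=")
            attrs[unquote(key)] = unquote(value)
        entries.append((parts[0], attrs))
    return entries


def pick_bundle(entries, supported_specs):
    """Pick the first entry whose BUNDLESPEC the client says it supports."""
    for url, attrs in entries:
        if attrs.get("BUNDLESPEC") in supported_specs:
            return url
    return None  # caller would fall back to a regular clone


entries = parse_manifest(SAMPLE_MANIFEST)
print(pick_bundle(entries, {"gzip-v2"}))
```

The server-side win described above comes from the fact that the bundle is a static file: the server only has to hand out this short list, and the bulk transfer is served by the CDN.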

A number of months ago I saw talk on the Git mailing list about implementing a similar feature (which would likely save GitHub in this scenario). But I don't believe it has manifested into patches. Hopefully GitHub (or any large Git hosting provider) realizes the benefits of this feature and implements it.

_yy · on March 8, 2016

Wow, this is pretty cool. Reminds me of the performance optimizations Facebook has done with Mercurial: https://code.facebook.com/posts/218678814984400/scaling-merc...

Mercurial was designed to be easy to extend, and it shows.

rjbwork · on March 8, 2016

Git was created and designed to support Linus' workflow when developing the Linux kernel.

Hg was designed to be a DVCS system.

sambe · on March 8, 2016

GitHub claim in this thread that they pay about 1s for a full clone without a caching CDN, due to their bitmap indexing patches.

jdcarter · on March 8, 2016

Wow, really impressive response from GitHub. The right amount of technical detail coupled with balanced tone--halfway between "we support you" and "you make us crazy."

One correction to the post title: it's not maxing five nodes, but five CPUs.

justinclift · on March 8, 2016

Yeah, 5 cpu's is an order of magnitude difference. ;)

joshribakoff · on March 8, 2016

Wait, so the CPU isn't the big white tower sitting under my desk?!

minsight · on March 8, 2016

According to my mother, that's "The hard drive".

dang · on March 8, 2016

Ok, we replaced "nodes" with "server CPUs" in the title.

web007 · on March 8, 2016

I keep coming back to point #4 - who ever thought that 16k objects in a single directory would be a good idea? Ever since FAT that's been a bad idea, and while modern FSes will handle it without completely melting down it's still going to cause long access operations on anything to do with it.

Even Finder or `ls` will have trouble with that, and anything with * is almost certainly going to fail. Is the use-case for this something that refers to each library directly, such that nobody ever lists or searches all 16k entries?

acdha · on March 8, 2016

I do think that your last sentence is the answer: if you're using a package manager instead of working with the directory heavily, this isn't a visible problem which is going to motivate people to work on it.

The other side to consider: “one directory per package” is a very simple policy and it feels right in many ways to people (e.g. Homebrew has a similar structure because it's a natural fit for the domain). If the filesystem and basic tools like ls work just fine (which is certainly the case on OS X, where even "ls -l" or the Finder take less than a second on a directory of that size), isn't there a valid argument that the answer should be some combination of fixing tools which don't handle that well or encouraging people to learn about things like `find` instead of using wildcards which match huge numbers of files?

web007 · on March 8, 2016

One directory per package is completely sensible, just not all in one bunch. It's even fine if the mapping is to a flat namespace at something like the HTTP level - I can mod_rewrite /abcdefg to /a/b/c/abcdefg no problem. My only objection is to file- or directory-level structures that are this flat. I might be mentally deficient, but I can't even process anything that's structured this way.

As loath as I am to admit anything about Perl is good, CPAN got this right. 161k packages by 12k authors, grouped by A/AU/AUTHOR/Module. That even gives you the added bonus of authorship attribution. Debian splits in a similar way as well, /pool/BRANCH/M/Module/ and even /pool/BRANCH/libM/Module/ as a special case.

Tooling can be considered part of the problem in this case. Because the tooling hides the implementation, nobody (in the project) noticed just how bad it was. I hadn't seen modern FS performance on something of this scale, apparently everything I've worked with has been either much smaller or much larger. Ext4 (and I assume HFS+) is crazy-fast for either `ls -l` or `find` on that repo.

It seems like tooling is part of the solution as well, but from the `git` side. Having "weird" behavior for a tool that's so integral to so many projects scares me a little, but it's awesome that Github has (and uses) enough resources to identify and address such weirdness.
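
The two layouts mentioned above (/abcdefg to /a/b/c/abcdefg, and CPAN's A/AU/AUTHOR/Module) are mechanical transforms of a flat name, which a few lines of Python can sketch. The function names are made up, and the sketch assumes plain alphanumeric names; real registries would need rules for short names, dots, and case.

```python
from pathlib import PurePosixPath


def shard_path(name, depth=3):
    """Map a flat name like 'abcdefg' to 'a/b/c/abcdefg'."""
    prefix = list(name.lower()[:depth])
    return PurePosixPath(*prefix, name)


def cpan_style(author, module):
    """Map ('AUTHOR', 'Module') to 'A/AU/AUTHOR/Module'."""
    author = author.upper()
    return PurePosixPath(author[:1], author[:2], author, module)


print(shard_path("abcdefg"))           # a/b/c/abcdefg
print(cpan_style("author", "Module"))  # A/AU/AUTHOR/Module
```

With three levels of single-character prefixes, 16k entries spread out so that no single directory holds more than a small fraction of them, as long as the names are reasonably evenly distributed.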

zodiac · on March 8, 2016

My (perhaps naive) thoughts on this are - suppose a 16k-packages-in-one-directory solution were just as fast as a 16k-packages-sharded-by-prefix (the CPAN solution), then the former is conceptually simpler and so should be preferred. And the fact that you can mechanically transform one structure to the other means that the filesystem (or git) should be able to transparently do it for you (eg use the sharded approach as a hidden implementation, while the end user sees a flat directory). This seems to be similar to what ext4 does (https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Hash...).

cyphar · on March 9, 2016

The obvious question is how would you implement that. You might argue (as you should) that git has closer semantics to a filesystem than version control. But actually implementing this sharding would require git be a kernel module. Hardlinks and softlinks won't save you because they are both still dentries and thus have the same performance pathology. Maybe you could do it with fuse, but what have you gained by making your version control system even more annoying to use?

Twirrim · on March 8, 2016

It's one of those shouldn't-be-arcane-but-somehow-is pieces of knowledge. Almost every job I've had I've ended up speaking to developers about more efficient file storage when I find yet another "shove everything in a single directory" implementation.

mikeash · on March 8, 2016

The criticism against CocoaPods here seems awfully harsh.

Think about it from their perspective. GitHub advertises a free service, and encourages using it. Partly it's free because it's a loss leader for their paid offerings, and partly it's free because free usage is effectively advertising GitHub. CocoaPods builds their project on this free service, and everything is fine for years.

Then one day things start failing mysteriously. It looks like GitHub is down, except GitHub isn't reporting any problems, and other repositories aren't affected.

After lots of headscratching, GitHub gets in touch and says: you're using a ton of resources, we're rate limiting you, you're using git wrong, and you shouldn't even be using git.

That's going to be a bit of a shock! Everything seemed fine, then suddenly it turns out you've been a major problem for a while, but nobody bothered to tell you. And now you're in hair-on-fire mode because it's reached the point where the rate-limiting is making things fail, and nobody told you about any of these problems before they reached a crisis point.

It strikes me as extremely unreasonable to expect a group to avoid abusing a free service when nobody tells them that it's abuse, and as far as they know they're using it in a way that's accepted and encouraged. If somebody is doing something you don't like and you want them to stop, you have to tell them, or nothing will happen!

I'm not blaming GitHub here either. I'm sure they didn't make this a surprise on purpose, and they have a ton of other stuff going on. This looks like one of those things where nobody's really to blame, it's just an unfortunate thing that happened.

(And just to be clear, I don't have much of a dog in this fight on either side. My only real exposure to CocoaPods is having people occasionally bug me to tag my open source repositories to make them easier to incorporate into CocoaPods. I use GitHub for various things like I imagine most of us do, but am not particularly attached to them.)

pkaler · on March 8, 2016

I think Github's response was about as good as it could be. In hindsight, they probably should have contacted CocoaPods when they pegged one CPU. And they could have given the same general solution to Homebrew and others.

With respect to CocoaPods, I would hope someone on the team had thought through performance characteristics of their architecture.

It's like they brought a shopping cart onto a city bus and were then surprised that it inconvenienced the bus driver and the other passengers.

mikeash · on March 9, 2016

It's more like bringing a shopping cart onto a city bus when the bus company said "bring all your stuff! we love it!", doing this for years with no problem while the bus driver says nothing, and then one day the bus driver says "hey, you've been causing a ton of problems with that shopping cart, you need to stop." Surprise seems entirely warranted.

cyphar · on March 9, 2016

I can't seem to find any posting by GitHub saying "yes! please use our free service as your git-based package manager's backend!" Advertising "host your code and assets with us" doesn't suddenly mean that it's justified to say "fuck it, GitHub can be our CDN".

mikeash · on March 9, 2016

Obvious in hindsight, but if you grew up from a little project to a big one, built so that your "users" are cloning your git repository, is it really clear that you've transitioned from "hosting source code" to "using it as a CDN" sometime along the way?

It's not like these guys thought, "Well, we really should use some dedicated high-end host for all our traffic, but we'll use GitHub because it's easier."

ars · on March 8, 2016

I have never seen anywhere that GitHub advertises using them as a CDN.

GitHub is for source control. That means a limited number of people pulling and submitting changes. That does not mean the general public using it as a CDN.

In fact I seem to remember seeing somewhere active discouragement of using it as a CDN.

swiley · on March 9, 2016

They advertise their CDN for user/organization pages. I've always been a little bothered that they have you use git for that.

mynameisvlad · on March 9, 2016

That's fair, but they're really advertising a specific feature. That is, statically generated sites hosted based on a specific branch in a repository. Nowhere do they advertise themselves as a CDN in the way CocoaPods is using them now.

vacri · on March 8, 2016

On the flip side, user 'alloy' gives the response that their decision to use github as a CDN was an explicit decision. In designing a product to scale, they apparently explicitly decided to outsource the 'scaling' part. While it may have been surprising to them, I don't think it should have been so surprising.

> It strikes me as extremely unreasonable to expect a group to avoid abusing a free service when nobody tells them that it's abuse

I don't think so at all. An experienced developer should expect that a free service will rate-limit their offerings at some point, and design around that. Viewing 'free' as 'an eternal resource sponge that we never have to think about' is the extremely unreasonable thing to do, in my opinion. I think that 'abuse' is probably the wrong word to use here, since that implies malice, and they don't appear to be malicious.

martinald · on March 8, 2016

I entirely agree with this. GitHub gets so much advertising + community from open source projects like this.

Also, I'm amazed this is even a problem. 5 CPUs is not a lot in the scheme of things (even if they mean physical CPUs rather than cores). TBs of bandwidth are also virtually free for a company the size of Github.

Even better: they are getting basically real world loadtested for free and finding loads of pain points, which may hit paying customers.

Unless I'm missing something, fire more metal at the problem. Many companies would love to be able to have every single cocoapod user (which is nearly every iOS developer) have to type github.com into their terminal for the cost of a bunch of servers + some bandwidth.

Pretty strange, unless this is hitting some really bad area of their service that can't easily be scaled out of (but i would be surprised)

breischl · on March 8, 2016

>>Even better: they are getting basically real world loadtested for free and finding loads of pain points, which may hit paying customers.

I think their point is that it's using the system in a way that isn't intended or desired. How does that count as "real world" load testing?

And by that logic, shouldn't anybody who gets hit with a DoS attack just say "thanks"? It's tons of free load testing on your network infrastructure, and you'll definitely find some pain points.

ars · on March 8, 2016

They are not telling them to stop using GitHub, they are giving them advice on making it work better.

wpeterson · on March 8, 2016

It's totally reasonable to host your code on github and to build a package manager that loads the content of a package from its github repo.

What seems insane is to use a single github repo as the universal directory of packages and their versions driving your package manager.

There's a reason rubygems has their own servers and web services to support this use case for the central library registry, even if the source for gems are all individually projects hosted on github.

lucaspiller · on March 8, 2016

I assume they modelled it after Homebrew, which has been working fine doing exactly that for the last 7 years.

That only has 3,000 packages vs 15,000 for CocoaPods or 115,000 for RubyGems.

_1tan · on March 8, 2016

In case somebody is interested in such figures (I certainly am) - NPM has 249,838 as of today [0].

[0]: https://www.npmjs.com

caf · on March 9, 2016

I wonder whether you could use a DHT for the package directory.

riscy · on March 8, 2016

> Scaling and operating this repo is actually quite simple for us as CocoaPods developers whom do not want to take on the burden of having to maintain a cloud service around the clock (users in all time zones) or, frankly, at all.

The CocoaPods developers seem to be missing the entire point of git: it's a _distributed_ revision control system.

Set up a post-receive hook on Github to notify another server, which is set up with a basic installation of git, to pull from Github so as to mirror the master repo. Then, have your client program randomly choose one of these servers to pull from at the start of an operation. Simple load balancer to solve this problem.
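
The client half of that idea (pick a mirror at random, fall back to the others on failure) could be sketched like this in Python. The `example.invalid` mirror hosts are hypothetical, the function names are made up, and keeping the mirrors in sync is assumed to be handled elsewhere.

```python
import random
import subprocess

# The example.invalid hosts are hypothetical mirrors, not real endpoints.
MIRRORS = [
    "https://github.com/CocoaPods/Specs.git",
    "https://mirror-a.example.invalid/Specs.git",
    "https://mirror-b.example.invalid/Specs.git",
]


def mirror_order(mirrors, rng=random):
    """Return the mirrors in random order so load spreads across them."""
    order = list(mirrors)
    rng.shuffle(order)
    return order


def git_fetch(url, repo_dir="."):
    """Fetch from one mirror; raises CalledProcessError if git fails."""
    subprocess.run(["git", "-C", repo_dir, "fetch", url], check=True)


def fetch_from_any(mirrors, fetch=git_fetch, rng=random):
    """Try mirrors in random order; return (url, result) for the first success."""
    errors = []
    for url in mirror_order(mirrors, rng):
        try:
            return url, fetch(url)
        except (OSError, subprocess.CalledProcessError) as exc:
            errors.append((url, exc))
    raise RuntimeError(f"all mirrors failed: {errors}")
```

Random choice is the crudest possible balancing; it spreads load evenly only on average and knows nothing about which mirror is closest or least busy.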

justinclift · on March 8, 2016

Rackspace is also known to sponsor significant resources for larger projects that ask nicely. GlusterFS is one I used to be involved with doing this, and there are definitely others.

If CocoaPods reach out to Rackspace and/or other hosting providers, there's a decent chance they'll be able to pull together a good solution. :)

The downside though, is they'll need to figure out some way to keep it monitored/maintained. :/

voltagex_ · on March 8, 2016

Last I checked, Rackspace wasn't accepting any more projects.

spoiler · on March 8, 2016

I find it amusing how GitHub's contact[1] form has (probably a recent addition):

> GitHub Support is unable to help with issues specific to CocoaPods/CocoaPods.

---

[1]: https://github.com/contact

jdreaver · on March 8, 2016

I think that contact page remembers the last repo you visited. I went to it in incognito mode and it wasn't there.

That's a pretty neat feature!

synunlimited · on March 8, 2016

Looks like this shows the last project you looked at before heading to the contact page. I refreshed the leaf repo and then refreshed the contact page and it mentioned the leaf project at the top.

jrgifford · on March 8, 2016

It's pretty nifty - it seems to pull in from your history, and then show `user/repo` for the most recent one you've looked at.

paulclinger · on March 8, 2016

They seem to be doing this based on the last repository visited (for some popular ones). Try visiting another repository before going to the contacts page and the message will change or disappear.

sehr · on March 8, 2016

I'm seeing something similar, only with Carthage instead of CocoaPods

rmoriz · on March 8, 2016

CocoaPods (and Homebrew) mainly exist because of a lack of tooling in the typical Apple ecosystem. So I would blame Apple for not supporting the community with money or tooling. Letting GitHub with its limited amount of funding pay the bill isn't a nice move. Apple dev relations should throw some money at GitHub so they can provide some dedicated resources or offer to pay the cost of other solutions (like a 3rd party CDN/AWS/Google Cloud/…).