I've been considering a bit of a blog post on this (particularly the unknown parts of wide scale OSS projects), but basically open source projects get harder as you have more contributors.
Time spent reviewing code properly (and being overly friendly and helpful in doing so) can completely eat into your ability to write and architect code properly, which strips out the ability to do the design that needs to be done. (And as you're doing this - taking maybe 15 minutes per patch - the odds of someone fixing the submission are, I would guess, about 25%, and you might need to do a couple dozen of these a day.)
People can get annoyed at filtering out of decisions unless you overcommunicate at a rate that is probably 5-10x what is expected in normal business conversation. (I was already communicating at a rate that was probably 3-5x what most developers do on the lists, and generally got crucified for it, with myths building up about my character, etc.)
Often decisions have no right answer, there's a good and a bad, and either decision will irk someone, and you'll get negative blog posts about what you are doing in either direction. You optimize for the thing that will help the most people, and the one guy who doesn't have his obscure itch scratched will assume you are deliberately ignoring him. You eliminate a problem user so you can concentrate on 1000 people and doing real work, and then others jump all over you after the micro-incident (that they only saw part of).
GitHub makes some things harder -- the issue tracker doesn't have issue templates, so you have to overcommunicate the need to fill out proper templates, and people also throw code at projects prior to asking what the code should be. I love GitHub for the OSS explosion it has enabled, but it is a chaotic way of managing a project as that project gets big.
It also makes it very obvious when a project is buried behind a lot of incomplete contributions, but similarly doesn't provide good tools to sort and manage large numbers of incoming tickets. It makes it very obvious when those ticket numbers build up, and arguments happen on closed tickets.
Twitter is probably one of the worst things, as there's a lot of passive aggressiveness on it. Twitter is an "argument machine". It doesn't provide context, but it does provide a place to rally a mob with pitchforks over the slightest perceived offense, one that often should not have been an offense at all.
Often you don't want people leaving trash on your lawn, but if you edit out the offensive things, the same people claim you are censoring them.
My advice is don't try to acquire users too fast. The "Go" project recently said something like "we had to open source this once it was a certain way along". My conditioning from Red Hat was "do everything in the open", but that was not really something Red Hat always did, as they had a lot of projects with low contribution numbers.
My ultimate feeling is that good projects have good central direction. Contributions can be good, and building a project around a base that encourages lots of shallow contributions can be a very successful strategy, but if you do that, you end up doing a lot of custodial work and can't always hit the goals you want to hit.
While it wasn't true in the last 5 years, now I feel the code is more important than the contribution process, and focusing on that allows the users to get a better experience. Users matter a lot - and folks trying to contribute matter a lot - but I don't like the way the inherent focus on contribution turns the creator of a thing into a project manager and a PR manager, and takes away their ability to innovate on the thing.
Being able to work on code is great, but I'd still want to see contribution structured around a mailing list. Strongly encourage talking about code/ideas before submission, but most people will not read it and will submit directly anyway.
I think part of my problem was the barrier to contribution was really low (and that was great) because it was pretty modular at the smaller ends, and we quickly got overwhelmed.
I like to thrash very complex codebases for not being contributor friendly, but the breathing room would have been nice.
I guess there's no clear answer - holding things longer before open sourcing them might help. Making sure you have very high coding standards helps. But eventually you're just going to have that very large number of people.
Almost everyone (95%) is awesome, it's just that something being so open exposes you to everyone who might not be - and even those people are probably awesome; the nature of low-bandwidth communication on the internet probably just exposes you to misunderstandings, and you end up stressing out over things vs being the friends you normally would.
Ignore what you can - it's a problem when others don't understand this and bug you about every single comment and interaction, and judge you on it. The ratio of complaints to thank you's is not always worth the pay at times, so just make sure you're doing it because you care about it, and find the best way to make it work for you, even if that's moving something to redmine and bitbucket :)
The opposite, of course, are projects with too few contributors that accept any patch out of desperation, be it reasonable or not. (ZFS on Linux comes to mind: it's a super nice community and Brian Behlendorf does a great job as project lead, but sometimes features and patches creep in where I wonder why nobody dared to say "no".)
The Linux kernel community solved growth by delegating responsibility to subsystem maintainers. Such a hierarchical model is not supported by GitHub. Also, the kernel community's process of submitting and discussing patches on mailing lists, while somewhat arcane, raises the barrier to entry and keeps at least a portion of the Twitter mob out.
Interesting. I was just the other day wondering if a github mirror of the OpenBSD ports tree existed. In my searching, I found this thread in which Ted Unangst sort of alluded to the same idea:
github is all about social coding and they have a point. But many
of the things they enable are considered antisocial in the OpenBSD
Django enforces this strongly. Anything other than trivial pull requests will be ignored. Contributors are strongly advised to start a discussion on the dev mailing list before working on code, and for anything big there's now a formal process for 'Django Enhancement Proposals' (obviously modelled on Python's PEPs).
To not derail this entirely: they say that there are two kinds of languages, those nobody uses and those people complain about. That is also true of projects in general, so while it may not feel that way, people complain because they care (maybe too much and maybe about the wrong things).
The great thing about open source is that you can basically do whatever you want with the software, and GitHub makes that easy by letting you just fork the repo, hack together the features you want (and post them back upstream).
Otherwise, great summary!! Really appreciate it
Also... when you deal with dependencies in a package manager (like npm dependencies), it is less than ideal to rely on a GitHub fork rather than the module itself.
If a PR is not quite right (which is often the case), I will clone their fork locally, and edit their commit(s) using git rebase and then merge it.
That way the user is still acknowledged as a contributor on GitHub (and they can see both of our names/profile pictures next to the commit message). Then I make a comment explaining why I edited their commit(s) on the pull request itself or the issue page. That keeps everyone happy and encourages further contribution.
It's not always practical to do this, but it feels natural in many cases.
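For anyone who wants to try the edit-then-merge approach described above, here is a rough sketch of the git steps, written as a small Python helper that only builds the command list and does not run anything. The function name, the `contributor` remote name, the `pr-` branch prefix and the `master` default are all made up for illustration, and it adds the fork as a remote rather than cloning it separately, which is a swap from what was described.

```python
def rebase_pr_commands(fork_url, pr_branch, base="master", remote="contributor"):
    """Return the git commands for tidying a contributor's PR locally.

    Hypothetical helper: it only returns strings, nothing is executed.
    """
    local = f"pr-{pr_branch}"  # local working branch name (assumed convention)
    return [
        f"git remote add {remote} {fork_url}",            # point at the contributor's fork
        f"git fetch {remote}",                            # download their branches
        f"git checkout -b {local} {remote}/{pr_branch}",  # check out their work locally
        f"git rebase -i {base}",                          # edit/squash/reword their commits
        f"git checkout {base}",                           # back to the main branch
        f"git merge --ff-only {local}",                   # merge the cleaned-up commits
    ]


if __name__ == "__main__":
    for cmd in rebase_pr_commands("https://example.com/someone/project.git", "fix-typo"):
        print(cmd)
```

A rebase keeps the original author on each commit and records whoever ran the rebase as the committer, which is why both names can show up next to the commit.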
So quickly outline the reason why it's not ready for merge, and if the contributor doesn't respond in a reasonable window you can reject the PR with a clear conscience. If someone else takes an interest and offers to clean up the code, you can reopen the PR.
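As a minimal sketch of that rule (give feedback, wait a reasonable window, then close), something like the following could drive a triage script. The `triage` function, its status labels and the 14-day default are assumptions for illustration; "reasonable window" is whatever the project decides.

```python
from datetime import date, timedelta


def triage(feedback_on, contributor_replied_on, today, window_days=14):
    """Suggest what to do with a PR (hypothetical labels).

    feedback_on: date the maintainer explained why the PR isn't mergeable, or None
    contributor_replied_on: date of the contributor's latest response, or None
    """
    if feedback_on is None:
        return "needs-review"  # nobody has looked at it yet
    if contributor_replied_on is not None and contributor_replied_on >= feedback_on:
        return "re-review"     # contributor responded after the feedback
    if today - feedback_on > timedelta(days=window_days):
        return "close"         # window elapsed with no response
    return "waiting"           # still inside the window


if __name__ == "__main__":
    print(triage(date(2024, 1, 1), None, date(2024, 2, 1)))
```

Closing here is not final: if someone else offers to clean up the code, the PR can simply be reopened.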
Personally, I think the balance between giving good user support and spending enough time on actual development work is really, really hard to figure out. But one way projects could help it work better is to make it very clear how users should get support, and how much they can expect. And I have yet to see any project do that. If it was clear, then you could politely point newbies who just don't know better at the support doc. People like me who spend a lot of time making sure our requests won't waste time would have solid guidance on how to do that. And jerks who are just wasting time could be politely pointed at the doc and then ignored when they don't follow it.