First off, the Apache Software Foundation isn't trying to absorb anyone or anything. Projects and people come to the ASF. It's a specific policy of the Foundation to NOT solicit projects. If someone says they're representing Apache and soliciting projects, they're wrong.
Secondly, Apache is very opinionated about how projects should be run. This comes from years of experience as not only a successful project, but as a successful non-profit organization overseeing dozens of projects. If a community doesn't like the ASF's style or rules (such as no dictators, benevolent or otherwise), they don't need to be there. No one wants to keep projects hostage. Part of the point of the Incubator is to get this figured out earlier than later.
Thirdly, about git and subversion. First off, there's increasing support for git at Apache (see http://git.apache.org/) but there are some serious drawbacks for use of git. Consider this: subversion was practically made for Apache in the way Linus made git for Linux. With that in mind, subversion isn't going anywhere at the ASF. Some of the rationale is just plain stubbornness, but some of it goes straight to the core values of the Foundation.
Apache has become, for better or worse, the place where lots of projects go when they grow up. Growing up is hard to do. It's not fun. You have to do things like get a job, pay taxes, etc. When a project grows up, people start caring about who contributed what, under which license and making sure every line of code is legit. A lot of engineers don't care about this, but businesses and their lawyers do. A lot of the Apache Foundation "bureaucracy" is to handle this oversight and paperwork.
Git is an impressive tool and github is awesome for what it is, but it's not a non-profit foundation and it won't replace one. Confusing the Apache Software Foundation for your coding sandbox only suggests you don't understand the true purpose of either.
This is perhaps the most depressing response I've received to my article.
As I said in my article this is far less about git and more about the chasm that has grown between Apache and the rest of the community.
Your first two points boil down to "nobody makes you join Apache, if you don't like our policies then you can get out". How does this help Apache or its projects?
Apache could still be valuable to the community but this kind of stubborn attitude will ensure that it continues to become irrelevant when it could be a leader.
I do understand the purpose of Apache and it is not hosting source code. That is the point I'm trying to make. If that is not its value, and its policies around hosting that source are no longer beneficial to its projects, then it should change its policy.
I think that you, and many people in the ASF, have married the existing policies of Apache with the purposes for which they were created. While the intentions of the policies may still be relevant, and in my opinion correct, the policies themselves will not remain relevant forever in a field as rapidly evolving as technology and GitHub may just be the first example of Apache policy incompatibility with evolution of open source.
The Apache Foundation is what you make of it. It will not change just because you post to your blog, but it will change if you engage the committer and membership population, build a consensus around your ideas and volunteer to do the actual work to make the changes happen.
No one will force you to do such work and if you don't want to do it, then you're not obligated to do so. No one will be upset if you, I or anyone else leaves the Foundation. It's cool. We're all here at-will, volunteering effort and code.
Apache cannot be everything to everyone, despite how much it is pulled to be so. Right now it fills a particular important role in the open source and larger software ecosystems. It's in that position due to both historical precedent as well as intentional decisions by the membership body.
But trust me, no one in Apache is ever, ever completely satisfied with the Foundation. That's to be expected -- the organization is driven by the compromises of a large group of people with different ideas and expectations. To balance between the chaos of constant change and the death of no change, the organization has grown guidelines and rules from the collected wisdom of its membership. This gives us at least some framework by which to evolve.
As for hosting code, there have been proposals at times for Apache to push the code hosting to some other organization. Once it was SourceForge, then Google Code, now Github. Of course, it's a tricky situation as the Foundation has particular requirements and wants to know its code will be around for decades. Moreover, the infra team is constantly understaffed and is thus a very, very conservative bunch. We've seen way too many people jump in with a great idea and then leave maintenance to someone else. They're stubborn for a reason.
And perhaps Apache and Github are incompatible. So what? Github is a tool. It's incompatible with lots and lots of organizations and ways of doing things. The FSF has its rules and culture. Same with the Linux kernel, distros and desktops like KDE and Gnome. Android is different too. Not all of those mesh with Github and that's fine.
> The Apache Foundation is what you make of it. It will not change just because you post to your blog, but it will change if you engage the committer and membership population, build a consensus around your ideas and volunteer to do the actual work to make the changes happen.
This is true only from a CYA standpoint. As Mikeal said in the article, it's possible to do a lot of work and build up a strong case for a change that's important to project maintainers, yet still have Apache come up with excuses for why it can't happen. This is what happened with git -- twice.
I believe he did engage the committer and membership population. Your response, disagreement or not, is proof of that. Disparaging the way he did it with statements like "just because you post to your blog" is completely unfair.
Linux is a bad example. It's not "community" development by any real definition of it, because Linus controls everything that goes into the mainline codebase. If anything, it's community maintenance, because that is delegated out.
More importantly, Git by itself does not promote community development. No source control system does. Some make that style of development easier, but none of them actually directly promote it.
GitHub is not Git. GitHub is the Git version of SourceForge. Nearly all the community development features (bug tracking, forums, etc.) on both sites are built beside the source control system, and aren't really integrated directly into either Git or SVN.
There's no reason Apache can't maintain its own "legally authoritative" git repo. Nothing in the author's post suggests that he is confusing the ASF with a "coding sandbox". Making that claim suggests to me that you are invested in the alternative and not thinking objectively.
And I disagree about subversion being "made for Apache in the way Linus made git for Linux". Subversion is an utterly derivative implementation of any server based VCS in existence, whereas git is an example of truly creative thought (not just from Linus) about what VCS should be for a large community that requires the accountability that you claim ASF requires.
tl;dr: Sign your Git commits cryptographically with PGP if you don't want the history to be editable.
Git's ability to edit the history is a very useful tool. I don't think Subversion or any other VCS prevents you from editing the history either. Maybe they just don't provide tools for that so you'd have to hack the internal data structures of the VCS or something, but you actually want a tool to modify the history. Think about the situation where somebody accidentally pushed a secret private key or a database password to a public repository, you want it out of there! (there's a ton of examples of this in GitHub. git filter-branch is what you should do).
In order to provide "safe" history for Git, the commits must be cryptographically signed by their authors. This is vastly superior compared to trying to use some server side authentication kludge, which can be broken into. And the data structures of the VCS database can be modified, with a hex editor if all else fails. Cryptographic signing provides a guarantee against hex editor hacking too.
If I read one more "Git sucks because you can edit ancient history" comment from someone who doesn't understand the concept of crypto signing, I will cry.
You of all people should know that Git history consists of an append-only log which is maintained using cryptographic hashes. If you edit one commit (even the metadata) you have to rewrite history, and all the hashes for commits after it change.
People will notice, and most importantly, everyone will still have the old commit chain locally.
This means that even the server cannot arbitrarily edit history. With SVN, afaik this is possible by manipulating the database.
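To make the hash-chain argument concrete, here is a minimal Python sketch. It is not Git's actual object format (a real commit object also records a tree, author and committer); `commit_id` and `build_history` are hypothetical helpers that only assume each commit ID is a SHA-1 over the parent ID plus the commit's own content.

```python
import hashlib


def commit_id(parent_id, message):
    """Hash the parent ID together with the message.

    Simplified stand-in for a commit object: the point is only that
    the parent's ID is part of the hashed payload.
    """
    payload = f"parent {parent_id}\n\n{message}".encode()
    return hashlib.sha1(payload).hexdigest()


def build_history(messages):
    """Return the list of commit IDs for a linear history."""
    ids = []
    parent = ""  # the first commit has no parent
    for message in messages:
        parent = commit_id(parent, message)
        ids.append(parent)
    return ids


original = build_history(["add parser", "fix typo", "release 1.0"])
# Same history, but the second commit's message has been edited.
tampered = build_history(["add parser", "fix typ0", "release 1.0"])

for old, new in zip(original, tampered):
    print(old[:10], new[:10], "same" if old == new else "CHANGED")
```

Under these assumptions the first ID is unchanged, and the edited commit and every commit after it get new IDs, which is what makes a rewrite visible to anyone holding the old chain.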
In discussion of an article which makes the claim "The problem here is less about git and more about the chasm between Apache and the new culture of open source." it is ironic that an objection is raised that is trivially answered by using one of the very proponents of this new "open knowledge" culture, Stack Overflow:
Oh, I can think of a million scary sounding consequences of using git and I'm sure that all were raised. This is what established groups do when confronted with change: raise any objection even though a moment's thought demonstrates the paucity of its merits.
I need not respond to the actual concern because the ASF has already done so. The ASF has already decided to allow git to be used. I assume that their lawyers OK'd this change. So I did not intend to continue an ongoing discussion: ASF has already concluded that discussion and approved git. Clearly, jaaron does not represent the views of all the "ASF People", and for him to raise issues as legal showstoppers when the lawyers have clearly approved is utterly disingenuous.
The purpose of my post was not to discuss the merit of the git vs subversion argument, but instead to discuss the merits of jaaron's criticism of mikeal's article.
One way established groups resist change is to keep bringing the discussion back to issues that have already been decided. It helps slow discussion on change by making it appear that a previous issue was not, in fact, resolved. In their mind, of course, it's not been resolved: the lawyers were wrong, or perhaps the lawyers didn't understand. Established groups don't just get over it and move on. Why would they?
ASF has decided to allow Git. I believe that those projects which use git will enjoy more success than if they use subversion. Mikeal makes some interesting observations about this. Jaaron spouts the traditional establishment bullshit:
1. The other side are children. We are grown ups.
2. Legal implications.
3. Nobody is forcing you to participate.
4. Condescension. "It's impressive for what it is" ... (but "what it is" is "just a sandbox")
That's the best part: if you have access to the source SVN repo you can change history and there will be no evidence that you did so. History in Git, on the other hand, cannot be modified without it showing up.
The reason is that in Git every commit gets its own unique hash so you can't change a commit without creating a new hash. To have this in SVN you have to buy 3rd party tools.
I do not think it is as black and white as you describe it. The way I see it: if somebody falsifies a complete repository, the only way to detect that it was changed is by comparing its content or a hash thereof with that of a (supposed) copy that is more trusted.
That is true for any digital archive, including those made by any SCM system. The only ways git differs from svn in that respect are a) that it computes such hashes for you, and (typically/AFAIK) shows those hashes in its UI, and b) that it is typical for people to store those hashes on other systems. The net effect of that may be large or small, depending on the number of people keeping a copy who will not blindly copy changes from the 'main' repository.
>I do not think it is as black and white as you describe it. The way I see it: if somebody falsifies a complete repository
No, it is. You can't "falsify a complete repository". We will all have checked out from that repo and as soon as someone replaces it with a fake none of the hashes will match up.
>the only way to detect that it was changed is by comparing its content or a hash thereof with that of a (supposed) copy that is more trusted.
Which happens in the system automatically. Have you actually worked with Git? Go change history on something you've pushed and other people have pulled.
>number of people keeping a copy who will not blindly copy changes from the 'main' repository.
It's not about "blindly copy changes". If you pull from a repo where someone has tried to rewrite history you'll see duplicate entries all over your log. If you have a graphical tool you'll see right where they started their modification.
As I understand it, if the repo has receive.denynonfastforwards=true, a user can't push changes that will destroy history. This flag has been available since 2006. (And I didn't mod you down. You ask a legitimate question). A bit more research shows that there are a couple more config changes required: http://stackoverflow.com/questions/2085871/strategy-for-prev...
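For readers unfamiliar with the term, a push is a "fast-forward" when the branch's old tip is an ancestor of the new tip, so nothing already published is dropped. The following Python sketch models that rule only; it is not how Git implements the check, and `is_ancestor`, `allow_push` and the commit names are made up for illustration.

```python
def is_ancestor(parents, ancestor, commit):
    """Return True if `ancestor` is reachable from `commit`.

    `parents` maps each commit ID to a list of its parent IDs.
    """
    stack = [commit]
    seen = set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False


def allow_push(parents, old_tip, new_tip):
    """Accept the push only if it is a fast-forward of the old tip."""
    return is_ancestor(parents, old_tip, new_tip)


# Hypothetical history: A <- B <- C, plus B2, a rewritten version of B.
parents = {"A": [], "B": ["A"], "C": ["B"], "B2": ["A"]}

print(allow_push(parents, "B", "C"))   # C builds on B
print(allow_push(parents, "C", "B2"))  # B2 would discard B and C
```

In this model, moving the branch from C to B2 is refused because C is not reachable from B2, which is the "destroy history" case the flag is meant to block.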
Which is utterly trivial (I've done it, seriously, it's not the big deal you seem to think it is, aside from the obvious difficulty of particularly large repos), and is not conceptually different from what's necessary for editing git's history, except that nobody can tell you've done it without comparing the "new" repo to the old one -- and under svn's internal model, no one but the server will normally have a complete history.
With git's model, not only does everybody have the history, but the commit IDs themselves are your insurance against tampering. You effectively validate that history every time you sync with another git repo.
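As a rough illustration of what "validating on sync" amounts to, this Python sketch compares the commit IDs a clone already holds with the IDs a remote now advertises. It assumes a purely linear history and uses invented IDs; `first_divergence` is a hypothetical helper, not a Git command.

```python
def first_divergence(local_ids, remote_ids):
    """Return the index of the first commit where the histories differ.

    Returns None when the remote merely extends the local history,
    i.e. nothing the clone already had was rewritten or removed.
    """
    for index, (local, remote) in enumerate(zip(local_ids, remote_ids)):
        if local != remote:
            return index
    if len(remote_ids) < len(local_ids):
        # The remote is missing commits the clone already has.
        return len(remote_ids)
    return None


local = ["a1", "b2", "c3"]

print(first_divergence(local, ["a1", "b2", "c3", "d4"]))  # normal update
print(first_divergence(local, ["a1", "x9", "y8"]))        # rewritten after a1
```

Every clone can run this kind of comparison independently, so a rewrite has to fool every copy at once to go unnoticed.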
Yeah, that particular bit of FUD is quite popular with the anti-git crowd. It's nonsense. Any attempt to edit the history of a public repository will be noticed instantly by anybody who tries to sync up, no matter what.
Stick in a post-commit hook to force a sync to a backup repo nobody has access to if you want to be really paranoid, but as it is, git is already far more resilient against tampering with the public history than svn ever was.
You are right, there is no reason why the ASF can't own its own "legally authoritative" git repo.
That is exactly why the ASF is conducting experiments in exactly that. Assuming those experiments are a success and representatives of ASF users are happy (which include business folk and lawyers, not just developers) then the ASF will roll Git out to all projects that want it.
I also agree with some of the replies stating that the ASF should do as much as they can to foster contribution and a more active/social community. Perhaps they've just been a bit slow to adopt newer solutions because active contributors are content with the existing setup, and it's not seen as a huge benefit to change this just to be hip with the new crowds every quarter. Either way, it's nice to see Git catching on a bit, since it's one of the most common choices nowadays.
As an organization, the ASF does an excellent job of overseeing crucial projects and the bureaucratic side of programming that comes along with developing services in the enterprise space. With such a large community and range of projects, a degree of bureaucracy is required to keep things running smoothly.
They also do a good job of staying a mostly neutral party and working strongly towards the success of any project they "take in" under their wing. As jaaron noted, they don't actively solicit anything, or go around trying to absorb projects. They simply help maintain the projects that have grown up (and many people rely on).
Being strong proponents of the open web, flexible software licenses, and doing a lot of the paperwork heavy-lifting in the internet and programming industry from a legalese standpoint, I think they do a lot more for the web as a whole than most people are aware.
On the community note, I've run a few Apache community forums over the years, and it always seemed Apache users are starving for more community interaction and very appreciative of the social environment. Many of the Apache projects have a bit of a learning curve to be put into practical use, and people appreciate some guidance in learning the ropes. Maybe the ASF will notice and embrace a larger sense of community as time goes on, or perhaps some of us here on HN can connect and brainstorm something to fill the void.
I think the title of this post was just hyperbole to catch people's attention regarding a web server vulnerability or newly discovered bug. Haha, seemed to work pretty well.
We're often caught up in the hype of new and catchy-sounding web technologies, but Apache is an organization that helped shape the modern web. Could they perhaps make the community more approachable to newbies? Sure. Are they harmful to the open source movement? I would think not. They bring a lot to the table.
While correct, it would benefit Apache in the long run to foster better communities and tools for their projects to grow. While they provide value in terms of the protection/process/management it's all for nothing if the developers can't collaborate effectively.
GitHub has changed many developers' UX of open source development to the point that they don't want to do it "the old way". Apache should be looking to grow in this direction to keep their developers and projects engaged.
Trouble is, their existing developers and projects seem to be doing fine with SVN. Don't forget those are the most important people, the workhorses you already have- not the flighty young birds you hope to one day trap.
Sure, there is always the future to think about, but the future isn't happening a week from now. They have time to watch Git grow and wait until the time is right.
Except that the existing developers aren't all doing fine with SVN. As referenced in the post, both CouchDB and PhoneGap (existing "workhorse" Apache projects!) prefer to use git, but have met with strong opposition from the ASF.
The time to allow for Git usage is already here. It's not just pie in the sky forecasting -- existing projects are being held back by bureaucracy.
>Trouble is, their existing developers and projects seem to be doing fine with SVN.
This statement has little meaning. If they were still on SCCS and had been using it for years I'm sure they would "do just fine" with it. But if they move to something modern they could do even better.
It's not terribly great when you have large files in the system. You end up with a huge repository on disk as those files change. But more importantly, you can't do a partial checkout of a particular path. I think I read that that'll be coming to git, which would be fantastic.
I've used centralized SCCSs for years and years. I guess I don't see why you can't use git as one, just have everybody agree to push to a central repository on a server.
There is the issue of the size of the local copy, but it doesn't seem to be a big deal in practice. I just don't check in the heavyweight frameworks as vendor drops into the same repos as I use for my own, smaller, codebase.
Granted, I'm scared to death of ending up in a merge-gone-wrong hell situation trying to do stuff that I'm very familiar with in Perforce. But I'm happy being a git newb taking baby steps a bit longer.
I don't think I've ever seen a situation where that's true. And I've converted a lot of repositories (I maintain the svn2git project). SVN has its metafiles, but git has the full history locally. In all but the most trivially-sized projects, the git clone is bound to be larger. For larger projects it can be several orders of magnitude larger.
I'm aware of that and didn't imply otherwise. But the storage mechanism is for the history, not for the materialized files in the working directory. That's going to be the same for either git or SVN, since they're checked out. So you're talking about comparing git's history DB to SVN's metadata files. The metadata files are effectively constant cost whereas the git history grows with each checkin. They get quite large.
An up-to-date snapshot of my source tree at work is several GB. The whole Perforce repo is probably 100GB. Most of that is vendor libraries. For example, every so often we update to a new version of the Boost C++ libraries and pre-build it for most common platforms. This amounts to a GB or two. This is easier on the other developers and it makes the process more repeatable for QA.
One of the great things about Perforce is that it's normal practice to map only selected subtrees of the repo. So I have several workspaces going at any one time.
As much as I am impressed with git's speed, this would not work with git. I used to try managing Boost's vendor drop as a git repo. I now just keep my notes in there about how to download and build it locally.
When a new vendor release comes out, we build it with our "official" compiler settings for the different platforms, branch the headers from the source and combine them into a convenient "SDK" tree, sometimes tweak something here or there, and update the document. Amounts to several GB being checked in from multiple machines. Occasionally a developer (usually me, but others too) needs to commit changes back to our central repo vendor tree.
I suppose we could do that into a separate repository and then define parts of that as a git submodule.
Which means that if a developer wants to do git-like stuff (say sharing a patch with a single other developer or committing a day's unfinished work to an alternate backup site), then he's just going to have to work around the SVN tool to do it.
The tool should exist to serve the people producing the work, not the other way around.
The more important question is: what are developers really missing when they have to use svn to hack ASF's code in comparison to using git? (leaving github out of the equation, it is not git) You can hack locally to your heart's content and in case you want to contribute, you can diff and contribute.
Part of the drive of open source is bootstrapping better ways of doing things. We could all likely do our jobs to some degree running Windows 95, but who wants to? Git improvements over SVN include increased performance, cheap branching, more detailed tracking of changes, the ability to code offline (not a huge deal, but it certainly has made traveling more fun for me), and DVCS collaboration capabilities.
All server based vcs are push based. Certainly they can be used pull-based, where others diff code, send the patch, and a committer then integrates and commits. Git, on the other hand does two things differently. 1) it makes it vastly easier to create, submit, and integrate patches (because it was designed for it) and 2) it makes it vastly easier for the people making the patches, who don't have commit privs, to maintain their changes in their own repo, while still syncing with the master.
svn does not offer these features, which means that when you don't have commit privs, it's a PITA to track and maintain your own version. I'll say that again: it makes it massively frustrating to maintain your own version. The result is that people don't, and the number of potential hackers is vastly reduced.
git, like svn, allows a small community, such as ASF, to manage and control a project - addressing legal and operational concerns. committers can push just as easily as with svn. but in addition, each committer can (if they so choose) also maintain relationships with a much greater community, each member of which can hack away on their own with a fully synced, fully versioned repo of their own. hackers get their own repo. committers have a really robust system for integrating patches.
If ASF wishes to do no more than have a small number of committers working on a project then indeed, svn suffices. it's "good enough". the point of mikeal's post is that this is missing out hugely on the vast army of hackers.
I said in an earlier post that I had no skin in the game. that's true. now. i used to track tomcat. but it was such a pain in the arse to handle merging my code with the official code that I gave up. it just wasn't worth it. git would have solved every problem I had. perhaps by now I would be a committer on the tomcat project. who knows? svn made that a certain "no".
an organization like ASF needs a finite number of committers. git allows keeping that group small, yet engaging a much greater community.
but as mikeal says, it's not about svn vs git. it's about the mindset that sees no need for git because it sees no need for more than just committers.
I would put up node.js as a counter-example. It is a large, important project successfully being managed on github. Its sponsor, Joyent, is a private company, but the role could easily be filled by a foundation like Apache.
There's actually not a lot to cover here? The question is why you're so scared of git and in six paragraphs you managed NOT to answer that question. This is really sad. One might hope an open source foundation would be forthright and fairly transparent. Apparently not.
Uh, I linked to the official git repos for Apache. Apache isn't afraid of git. Plenty of Apache people love git. At the same time there are issues with implementing and supporting it by the ASF infra team. If you want all those details, search those mailing lists.
Spot on, excellent points and I could not help but wonder about the article... most people would consider "apache" to be synonymous for the httpd and not the ASF, so the headline is clearly fishing.
Then, OP argues ASF's processes are broken and a github project with one maintainer is so much easier... you really have to consider the scale of the average sourceforge (back in the day) project vs. apache even back then - and they have only grown from there, so no wonder a project hosted on github now is more "fun" to contribute to and work with because processes are probably pretty much non-existent and communication-paths and hierarchies are pretty much flat. The same goes for working at a small start-up vs. working at HugeFatCat Inc.
But one thing I cannot stand: arguing git over subversion. Yes, both are great revision control systems, but they are meant for different applications. Where I am working, I would take subversion over git any day if all I really want is an absolutely sure-fire way of having a simple and easy-to-use central repository. I am in the unfortunate position of having met and worked with quite a few people (programmers, nonetheless) who have a hard time coping with basic CVS/SVN update-commit cycles. With git and mercurial offering you the option to commit your changes locally first before actually committing them to the repository you checked out from (or another repository or...), this would have literally created havoc and confusion for my small team here... I know them. Yes, it sounds ridiculous, but I know I have saved myself a hell of a lot of trouble and headache. SVN is an absolutely perfect tool for the job if you do not need the de-centralization, on-the-go commits and all the possibilities of creating custom processes in git with all its flexibility and features and options. SVN works just fine for what it is.
git and mercurial are not better, newer, shinier substitutes but different beasts altogether. (great, shiny, powerful and useful nonetheless) So could we please just let them peacefully co-exist and be thankful someone made them for us to make our lives easier?
So after telling us that your current environment is working with people who have a hard time with svn/cvs, forgive me if I find your opinion on git's ability to handle the requirements of large-scale, community driven development of the code that pretty much runs the internet to be utterly irrelevant. No. The kind of people who cannot handle cvs/svn update concepts are not the target demographic for a solution used to develop a fucking operating system or the world's http server.
Did you even read what I wrote? Your comment is irrelevant because you obviously did not understand my point: that git is not a "simply better" substitute for svn as a lot of people seem to think. There are situations where svn is the right tool for the job, my example is one of those situations in my opinion. So yes, definitely are they NOT the target audience for git - exactly my point.
I am sorry to tell you, but you are plain wrong! Git is better. The problem is that SVN embraces a workflow that is inferior. But if you are not willing to change your workflow, git will seem confusing, indeed.
> I am sorry to tell you, but you are plain wrong! Git is better.
Please, do elaborate! In detail, why is git better than subversion if all I need is a central repository for a few people working at a company on one site, sitting in the same room with permanent network access. No distributed or remote or on-the-go development, no forks.
> SVN embraces a workflow that is inferior.
It is different but what exactly makes it inferior?
> In detail, why is git better than subversion if all I need is a central repository for a few people working at a company on one site, sitting in the same room with permanent network access.
I was in exactly this situation at my last job (minus the same room, I was stuck in a separate office). In spite of the main source repo being SVN, we all used git-svn as our client.
The main benefit is that git makes it easier to create clean commits and push them to trunk.
E.g., I want to make module Foo, and use it in Bar. I can use git as I go, building module Foo incrementally over 5 commits (none of which has sufficient quality to go into trunk). Then I can do 5 more commits which integrate Foo into Bar. So my individual history is tracked while I'm developing. If, while building Bar, I make a bugfix to foo, I can commit it.
So logically, my commits look like:
101-104 Work on Foo
105-107 Work on Bar
108 Bug fix on Foo
109-111 Finish work on Bar
When it comes time to push to trunk, I can rebase 101-104 + 108 into one clean patch, "Built module foo", and 105-107 + 109-111 into "Incorporate module foo into bar". Then I eventually push these into the main repo.
Further, if I'm working on this with someone else, we can use git to track work between the two of us without committing to mainline.
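The regrouping described above (101-104 + 108 into one patch, 105-107 + 109-111 into another) can be sketched in Python as a simple grouping by topic. This only models which commits end up together; it does not reproduce what an interactive rebase does to the patches themselves, which can also hit conflicts when commits are reordered. The `(number, topic)` tuples and `squash_by_topic` are illustrative assumptions.

```python
# Local work-in-progress commits, in the order they were made.
commits = [
    (101, "foo"), (102, "foo"), (103, "foo"), (104, "foo"),
    (105, "bar"), (106, "bar"), (107, "bar"),
    (108, "foo"),  # late bug fix to Foo, made while working on Bar
    (109, "bar"), (110, "bar"), (111, "bar"),
]


def squash_by_topic(commits):
    """Group commit numbers by topic, keeping first-seen topic order."""
    groups = {}
    for number, topic in commits:
        groups.setdefault(topic, []).append(number)
    return groups


for topic, numbers in squash_by_topic(commits).items():
    print(f"one clean patch for {topic}: squashes {numbers}")
```

The two resulting groups correspond to the two clean patches that would eventually be pushed to trunk.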
This workflow sounds like it could be replaced by creating a feature branch, developing the feature there, then merging the branch back into trunk when the feature is complete. This can be implemented in ordinary SVN, without git or git-svn.
Almost, but not quite. As far as I know, svn doesn't support rebase -i.
Also, part of the point of the workflow is to make sure all commits to the official repo are clean code. Using a feature branch means that the official repo contains commits like "Halfassed implementation of foo, joe take a look at it".
> Please, do elaborate! In detail, why is git better than subversion if all I need is a central repository for a few people working at a company on one site, sitting in the same room with permanent network access. No distributed or remote or on-the-go development, no forks.
Because with svn you end up having dirty working directories that go uncommitted for days because committing would break the build, as a result:
1. Everyone ends up making a second working directory because the first one they have is dirty with changes they can't yet commit and they need to make a quick change
2. Those dirty working directories are branches in practice, even if svn doesn't call them that, they are just dealt with with inferior tools.
Also: cherry picking commits from a branch into another branch isn't as easy, making a new repository isn't as easy, git is way faster at (almost?) everything (including large binary files), and being able to check the history when you are not in the office or connected to the VPN is nice.
However, I agree that the conceptual model behind DVCS is harder to understand, significantly harder, and svn can be good enough, especially for a small, local team.
SVN has branches, too. The problem it had was with merge-tracking and that's about when everyone hopped over to git (myself included). But that hasn't been an issue for a couple years now. By all means, stick with git if you prefer it, but release some of the older criticisms as they've been addressed by the SVN team.
You certainly implied a branch-free workflow. Otherwise why would you have a dirty local workspace and multiple checkouts? And why would committing break a build? I'm unaware of any CI server configured to build every branch in the SVN tree.
Do you make a branch every time you change something? Do you make a branch for every developer's local copy?
The thing is, when you solve the problem of dirty local workspaces by making them actual branches, svn becomes just as complex as git, probably even more so given that all the branches exist for all the users. And you still don't have as good an interface for cherry-picking, you still need multiple local copies, it still isn't as fast, and you still don't have all the other benefits associated with a DVCS.
You don't need multiple local copies. That's why "svn switch" exists. That's all I was addressing. Some of the things you knock SVN for are either non-issues or issues with git as well. E.g., "branches exist for all users" is a non-issue. Otherwise it's also a problem when I push a git branch (which is a wise thing to do).
Cherry-picking and speed are legitimate benefits of git.
"Please, do elaborate! In detail, why is git better than subversion if all I need is a central repository for a few people working at a company on one site, sitting in the same room with permanent network access. No distributed or remote or on-the-go development, no forks"
You describe my environment almost exactly. I am pushing for a git migration mostly for the low-overhead branching/merging as I can have multiple discrete tasks on a given project which need to be rolled out individually.
The ASF is the first home that comes to mind when a successful open-source project needs independent stewardship. Often when a company wants to "spin off" an open source project, they turn to Apache.
What alternative organizations fill this need in a more lightweight fashion? Most other umbrella open source organizations I know of focus on copyleft and other issues that can be hostile to commercial interests.
Very cool. It seems like they are the type of organization Mikeal is encouraging ASF to become: legal and administrative support for open source projects, and other services if the project's leaders wish. Quite hands-off.
Comparing the lists of projects, I'm surprised to find I use more SFC software than ASF software.
"Conservancy doesn't care about licenses as long as they are free."
They do care about licenses and license terms are part of the requirements for application. The project license must be either free (per FSF) or open (per OSI). Docs must be made available under Creative Commons licenses. And the project must be completely non-profit. (All these requirements must be met.)
That being said, you're right, it's a good home without the politics discussed here.
Just to get two things out of the way: I'm an ASF member (albeit not very active lately) and a huge fan of git with or without GitHub. I'm one of the many people advocating for git internally at the ASF. I have been met with opposition in the past, but a lot of it has been around who's going to maintain the infrastructure, given it's a volunteer system. Let's just take it as axiomatic that the ASF is going to self-host its code. So it's at least a fairly pragmatic argument. And I think we finally have a solution.
My real issue is with the bouncing back-and-forth the author does in his post around the notion of IP. It's a shitty topic that most devs don't want to be bothered with, but alas, it's quite important in the real world. And GitHub is mostly a minefield when it comes to this. I don't think it's a failing of GitHub itself, but most projects just don't have licenses attached to them. Unlike with SourceForge, there's no requirement to have an OSS license on public projects. Then many that do fail to meet the copyright header requirements for the license. Or you could have a public project with a restrictive license. Being public doesn't mean you get to do whatever you want with the code. This is dangerous and bad for OSS.
Apache gives you that protection. There's never any question about it. That's the primary reason projects go through the incubator -- to make sure the IP is all in order. It's an annoying, bureaucratic, but necessary process in a litigious society. But because of the care and protections Apache provides in this regard, I think they've done more to get OSS adopted in traditionally closed companies than just about anyone else.
 I came across Tom Preston-Werner's repo for his site. He's one of the GitHub founders. It's a public repo with a license that restricts usage of certain portions of the project (generally his content): https://github.com/mojombo/mojombo.github.com
This is a good point. Anyone who starts an open-source project should, from day one, have a vetted Contributor License Agreement and ONLY accept pull requests that include signed CLAs (or from a person who has one on file).
It'd be kinda nifty if GitHub had this built-in. I personally don't require CLAs for every project because it can be onerous. But at the least I try to pick a license that wouldn't allow for submarine patent claims.
I've wondered for years why GitHub doesn't provide a license field as part of the repo alongside name and description. I've been known to pester people after they point me to their repo, and ask them to add a license before I'll use their code. Automating CLAs would be a dream.
Some good points, but github doesn't take the place of a community. When it's working well, it helps, but when there is a breakdown of collaboration and communication, you get one of those codebases that has been forked 1298 times where none of the people doing the forking is sharing anything. That's a community fail, not a version control system issue.
I do think that it'd be nice if the ASF offered git alongside svn, and concentrated on the community aspect of things, which it does tend to do fairly well.
Just because you can't 'see' the fork with subversion doesn't mean it doesn't exist. I'm sure plenty of ASF projects are 'forked' within companies and the code is never shared.
What I've been advocating for a while now on the members@apache list is that the ASF look at using Github (either they host it or the ASF does) as the basis to build a new type of community that the ASF has never experienced before. Something that isn't tied to the old school.
Sure, github may bring more things into the open, but it is ultimately just an aid to a community of people, who must communicate about the project. You can't just dump the code on github without communicating with other people working on it.
I agree that outsourcing some of the infrastructure to github might be an interesting idea.
I'm curious how does it harm anything? Did it kill any puppies? Maybe it's inefficient but harmful?
Also remember GitHub is a for-profit company. Its allowance for Open Source hosting is a marketing tactic. Anytime they feel the marketing value is not there, they will shut it down. Not that I'm against GitHub. It's a great company for itself. But comparing Apache to GitHub is like comparing apples and oranges.
Don't get caught up on GitHub specifically. GitHub is just the most popular example of how projects don't need Apache to host them anymore, yet Apache still expects to do so, and worse, expects to assert a lot of restriction over it.
The point is, that is at odds with what the community wants and needs. As the author pointed out, ten years ago, rolling your own SCM hosting was a big pain. Now, it's not, partly because of GitHub and Bitbucket and others, but also because rolling your own isn't as hard either.
Anyone with minimal server admin experience and knowledge of Git can run their own Git server on a VPS with something like Gitolite. I know because I succeeded in doing so myself, and I'm neither a pro server admin, nor did I have any Git experience at the time I did the initial setup of Gitolite. Prior to that, I had set up a Mercurial server with no prior Mercurial experience either. It's pretty easy now.
So, yeah, GitHub is there, but GitHub could disappear tomorrow and the community still wouldn't need to turn to Apache for project hosting. In that respect, they're still solving a problem nobody has anymore, and that was the point the author was making.
Note that I have nothing against Git. I use it. The main difference between Apache and GitHub is that one is a non-profit whose main goal is to shepherd open-source projects, while the other is a for-profit company out to make money. I don't think it's a fair comparison to substitute one for the other.
It harms the opportunity cost of the projects that reside within it. That's a real harm.
And I think the point is that while GitHub is there, use it. If it closes off, or goes bust, migrate. You can extract all your data - keep a backup elsewhere. At worst, you can move to Gitorious. I mean, currently they use JIRA for bug tracking - that's not an open project either, that's run by Atlassian.
Projects choose to go to the ASF. Who are you, or anyone else, to tell them it is not the right choice? Billions of dollars are made and saved every year thanks to ASF software (hell, it's even in space). There is a reason for that.
Can the ASF environment be improved for its projects? Of course it can - see my other comments where I address this point.
a) a complete misunderstanding of how and why the ASF operates the way it does
b) a desire for sensationalist blog pieces with almost no factual content
The ASF is working with Git, it has been for years. It doesn't yet provide a canonical repository from which to make releases. This is due to a number of non-trivial technical issues introduced by the processes adopted by Apache projects.
The Apache infrastructure team believe that they have now solved those issues and are testing them in CouchDB. Assuming the CouchDB experiment is a success the ASF will be rolling out Git as the canonical repository to all projects that want it.
Once the ASF has mapped the tools to the processes we can all move on and stop wasting our time with this spurious argument.
Disclaimer: Unlike the author of this blog I do have access to all the discussions about Git in the ASF and I am one of the mentors of PhoneGap, a project mentioned in the article.
I second this thoroughly. I was almost driven to start blogging last week by ASF's poor job of maintaining its projects. There was a small bug in Solr. I was not the first to find this bug, and someone had not only reported the bug, but filed a patch on the bug tracker a year and a half ago. The patch was never merged in, and nobody provided any feedback as to why the patch wasn't merged in.
One huge plus with Github is that if the official steward of a project would like to hand it off to someone else, or is failing to maintain it, it is trivial for someone else to take over the project.
So your problem is not with the Apache Software Foundation but with the committers of Apache Solr. On GitHub you can open a pull request and never have it accepted, so you get the same result as your experience with Solr.
As you say, it's trivial for someone to take over the project and maintain it, but it is not trivial for anyone to find the right fork when a project has 100+ forks.
I find it ironic that your first post complains that the author is comparing ASF to a sand box, but then you go and suggest that GP should just fork a project. I think you really are missing the point: sandboxing is a bunch of people just forking. A community is when those forks are then cherry picked and re-integrated. Subversion is shit at that. Git is awesome.
I have no skin in this game, but if I were to look at the requirements as you describe them, I'd recommend using git and think you were crazy to use subversion.
I'm not the OP, but if something raises my ire almost enough to make me blog about it, it probably isn't going to irritate me enough to fork a large software project. But the lower the bar for submitting a patch, the more likely you'll get one. And the more everyone will benefit from it (assuming it's a net positive patch).
It can sometimes be hard to fork a project just for a patch to one simple bug. Once again, GitHub really shines here: you can go to any project and see all of its pull requests, so you don't have to go hunting for patches attached to bugs, and it becomes quite clear and public when a project isn't properly or expediently merging in patches.
The GitHub layout is really telling of the new open source philosophy. They put the code front and center (main page), and right above it show you with first class status all the bugs it has (issues link) as well as all the proposed changes (pull requests link).
While forking is a solution, it hardly precludes discussion of other less drastic potential solutions to the problem at hand. It does nicely bound the maximum negative impact that Apache social problems can cause, though.
If you're using git, as the OP suggests, then you're going to fork it just to work on it. Forking isn't dramatic. Code is open source for a reason. If you have a critical bug in code you need running in your infrastructure, take ownership of it and fork it. Then do the dirty work to get the patch pushed back upstream.
If the maintainers really aren't doing what they volunteered to do, then volunteer yourself and get it done.
There isn't One Answer. The point is, you have plenty of alternatives. I don't know why the patch hasn't been applied. I know how I can find out though: I can join the developer mailing list and ask. If that doesn't work, I can track down a developer directly (they're not hard to find once you're on the mailing list) and bug 'em until I get a decent response. If it's clear the maintainers aren't doing their job, raise hell on the mailing lists and push to become a committer yourself so you can do the job right.
While that's happening, you can take the approach of maintaining your own patches so that you're not beholden to anyone in particular.
The whole point of open source is empowerment, not entitlement. No one is entitled to get any bug fixed. It's great when it happens but ultimately, everyone is empowered to make things happen themselves.
So you do agree that there are options beyond forking it or merely "raising a ruckus on the mailing list". My point was precisely that there are additional answers and that just glibly saying "Well, just fork it or accept what the mailing list result is" isn't a good summary of the alternatives, and in the context of what you were replying to borders on deceptive.
In the meantime, the fact that I am rich with options doesn't negate the original discussion, which is that the Apache processes are becoming distinctly suboptimal for the context they work in. The fact that I can just take the software and run with it doesn't fix their processes, and the fact that anybody can do so doesn't excuse broken processes. The fact that we can fork does not mean everybody should just stop discussing Apache processes; it doesn't follow.
I'm still in context: "I was almost driven to start blogging last week by ASF's poor job of maintaining its projects. There was a small bug in Solr. I was not the first to find this bug, and someone had not only reported the bug, but filed a patch on the bug tracker a year and a half ago. The patch was never merged in, nobody provided any feedback as to why the patch wasn't merged in." "You can just fork it" is not an answer to this problem. I'd say in its own way it's a disguised confession that the problems with the project are indeed so bad that your only hope is to fork it yourself. Well, that still says bad things about the project, regardless of whether I have mitigation options.
Because you act as if a "fork" is a drastic choice, when the reply calmly explained that "fork" doesn't have to mean "hey, let's create a new project and try to poach users into abandoning the original one".
The answer seems to be frank and, speaking as someone with absolutely no knowledge of Apache, maybe an appropriate one. It seems like they have a process that works for them, and they are quite interested in continuing with it. And that's fine. For people who want a more (for the lack of a more succinct way of expressing it) Git/GitHub style project, they can fork it and hack away to their heart's content.
That doesn't preclude upstream adoption of code, and it doesn't preclude discussions of improved workflow within ASF.
So you're saying a system that has had repeated successes is harmful. I really think you make a good point here about the need to remain open to change. So talk about that. Obviously github has some very positive impact. How can Apache adapt to that? You're not really talking about the tools here, you're talking about community.
I see a potential solution here being that Apache has different rules for projects in different stages. Do you think that would solve the issues?
Remember, your view of anarchy on GitHub will only last so long. Rules and order come out of anarchy for a reason and, like all things, GitHub will become the exact same stale community you're complaining of now in 10 years.
What I've observed from running http://www.Apache.com for several years, is indeed an older crowd (40+) by a nice margin compared to a lot of the younger projects floating around that are generating a ton of buzz.
It's been much more rare in my experience to see a 20 something hipster programmer seriously diving in with the ASF. I'm generalizing though of course...
The type of questions and people I interact with through that project are older engineering types, and those with a long history in the programming and computer scene. Usually with an old-school *nix approach to things.
Just wanted to chime in with that since, as stated elsewhere in the HN comments, I think this blog post is more about the organization structure and members of the ASF than the actual Apache Web Server project... which we all love so dearly. ;)
Hate to be the grammar cop here, but the consistent misspelling of "its" in the article is distracting. If the author is here, could you please fix that? It's taking away from a very well-written and insightful piece of writing.
I am not entirely sure what the article is trying to get at.
Politics and law in open source are real and needed, especially in the face of software patents.
Many contributors develop open source code as part of their paid work, so it is quite important to establish the legal framework that allows contributing the company's IP to an open source project (which includes the necessary patent grants). Committers need to submit an Individual Contributor License Agreement stating that they have the legal right to contribute the code they're contributing. If the contributions were made as an employee, the company typically also needs to submit a Corporate Contributor License Agreement.
Like it or not (and I personally do very much not like it), you cannot just upload some code somewhere these days.
As such, even the FSF is an extremely important organization. Much frowned upon and usually not understood.
Open source licenses would not work without copyright law; most developers don't know or understand that.
The main feature I need from a VCS is atomic commits. So CVS is out for me for that reason. Sure git's nice and all, but I spend 99.9% of my time writing or thinking about code and software architecture, not tinkering with my VCS, so as long as it works, I don't care.
Easy forking and branching is nice too. In the end, though, just as with Linux there is/are some de-facto master branch(es) somewhere from which "releases" are cut.
Currently with apache it's more convenient to use svn, so I do that.
I don't get the religious opposition against one version control system vs another.
Ever since I started using Git, I would never go back to SVN, because branching and merging is now an essential part of my workflow, even on small projects on which I work by myself with no other contributors.
> just as with Linux there is/are some de-facto
Well yeah, but I don't get how that's an argument for SVN. The thing I like about Git is that branching is now really, really cheap. You can now keep track of dozens of local branches with experiments that you don't have to push to master. You can now share your experiments with a colleague and push to master whenever something is actually ready. You can now also ban commits to the main repository that haven't been code-reviewed (something which is a PITA with SVN). And so on and so forth.
> I don't get the religious opposition against one version control system vs another.
Even though I prefer Git, neither do I, especially since you can just use the Git-SVN bridge :) I've used it for more than a year; it does have some quirks, but it works fine.
Also, the Apache Foundation does its job and does it well. There's room for both anarchy and bureaucracy and both are needed.
1) Apache Software Foundation and GitHub are two totally different things. Who cares about their internal preferences and bureaucracies. They're both producing outstanding open-source projects which are used by hundreds of thousands of companies and people.
Open-source (and the world) has only gained positive things out of these communities.
2) If you're suggesting that ASF needs to change its bureaucracy, I disagree. Frankly, I feel the bureaucracy has worked, given the success the foundation projects have had.
3) I'm not sure what other points your post brings, but if you're simply just saying that ASF needs to keep itself up-to-date with new tech (dunno git?) then this is also a totally absurd argument since the tech being used in Apache is totally amazing and new.
I feel like your outcry is aimed at institutions in general... you should probably direct it at governments and other political entities instead of bashing a foundation that has given the world amazing products.
Apache is pretty much the last major open source community to not move to some form of distributed version control. It's either politics (they host the Subversion project there) or negligence in my opinion.
This might be slightly offtopic, but might be a symptom of the "institutional"/"organizational" issues addressed in the article:
I always thought Apache 2 and Subversion were two of the best examples of second-system effect. I mentioned this once to one of the core Apache (and Svn) developers years ago, and not only was he blissfully unaware of the effect, he indicated that he had helped build incredibly successful pieces of software (i.e., Apache 1.x) and didn't need any advice from Fred Brooks or anyone on how to do it.
Both Apache 2 and svn have been extremely successful projects, but both were late, didn't really match expectations or even the success of their predecessors, and are slowly being outcompeted by much smaller and usually more efficient projects (e.g., nginx, lighttpd, git, hg) that are developed much more quickly by much smaller teams.
Free project infrastructure wasn't hard to set up five years ago. It hasn't really been a problem since 1999 when SourceForge opened. Before that, the SunSites did a nice job, and before that you basically had to know a friendly university sysadm (which wasn't _that_ hard to find).
I'm not sure what people think they gain by going under the Apache umbrella, but it must be something since they bother. There is no lack of alternatives.
I wish at least one open source replacement adopted .htaccess (and httpd.conf) compatibility.
Litespeed is the only product in existence which has made switching over from a complex Apache install a one hour affair, but its free version is limited to 5 hosts and the commercial version can only be justified in a profitable environment.
The performance difference is breathtaking however.
As an aside, the Clay Shirky quote ("Institutions will try to preserve the problem to which they are the solution.") was new to me, but puts the RIAA/MPAA pretty much perfectly into perspective. Not really related to the article, but it clicked as I was reading.
Where to start with this blog post? It appears that the author has seen a couple of private emails and thinks he knows all about the internal workings of the Apache Foundation. He is wrong on so many counts.
His entire dislike of the Apache Foundation appears to be predicated on the fact that the organisation did not force every project to move to this blogger's favourite version control tool. Making a change as large as this requires many different things, but in particular:
1. Community change. How committers interact with each other when there are lots of forks is quite different to the current situation. That suits some projects and not others. Not every project at Apache will benefit. Some will. All who change will need to think long and hard about release processes, merging strategies and much more. Git encourages the idea that every commit or fork is completely equal to every other fork or commit. The Apache Foundation is built on the concept of meritocracy: commit rights are given in response to demonstrated skill. This is not an intractable problem with git, but new challenges need to be solved.
2. Legal change. Right now there is a simple process for signing off intellectual property for contributions which were merged from external contributors (who have not signed a release). That changes with git and becomes more complex. There are solutions, but they require careful planning.
3. Infrastructure. Hosting a large git repository with the level of downtime acceptable to Apache isn't something you do quickly. That needs planning and maintenance.
4. Toolsets. Lots of things in Apache are tied into subversion. From mailing list commit hooks to build servers and much more. Changing those things takes work.
5. Splitting the community. Right now the entire organisation's intellectual property is held in a single repository. Everyone knows where everything is to be found. Changing this simplicity requires a very good reason.
So what do we have now? A blogger who (it appears) doesn't actually contribute code to any Apache project. Telling other people how to run their organisation (which is wildly successful). And that they should change to this blogger's favourite new tool (they should have done it in 2008!) or face irrelevance.
If Apache moved every project to github tomorrow would that satisfy this blogger? More importantly, would that have caused this guy to commit high quality code toward one of the Apache projects? Or is he just blowing a lot of hot air about something he knows little?
And what brought on this great complaint? That the Apache Foundation is currently underway with trials for one project to see how git would succeed for their workflow. And to then evaluate its suitability for other projects across Apache.
Apache is not Github. That is, Apache is much more than a website, a couple of tools and a repository of code of random quality.
Disclaimer: I am an Apache member, but not speaking on behalf of Apache
> 2. Legal change. Right now there is a simple process for signing off intellectual property for contributions which were merged from external contributors (who have not signed a release). That changes with git and becomes more complex. There are solutions, but they require careful planning.
I'm curious as to how exactly you feel git impacts legal processes versus the use of svn. I'd expect that the choice of tools and the legal issues surrounding merges made by those tools should be completely orthogonal.
With an svn workflow, the committer sends each patch in a single authenticated request directly to the Apache svn server. With every commit they are saying "this code is appropriately licensed", even though the code may have come from other committers, and the history of that code is completely obscured.
With a git workflow, the push (which is authenticated against a committer who has signed the appropriate license agreement) could contain multiple commits from other sources. This is particularly the case if it includes code from a pull request. The Apache git tree will then have commits with publicly visible attribution to people who are not Apache committers and may not have signed the appropriate license agreements.
I am not saying this is a deal breaker, but it does require some thought. We don't want some contributor to come back three years later and say "that contribution from me: it was only released under the GPL". We need clear guidelines around that original pull request and how copyright/patent signoff happens. Right now, third party contributions go through a Jira patch process which includes a copyright/assignment tick box.
Doesn't simply requiring that all pull requests be squashed down to a single clean commit from a developer known to have signed the license agreements give you back the exact scenario you have under SVN?
Anecdotally, a lot of projects I've been involved with have required that pull requests be squashed to avoid polluting the "main" repo with irrelevant/undesired third-party history.
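As a rough illustration of that squash rule, here is a toy check in Python. It is a sketch under stated assumptions, not anything the ASF actually runs: `Commit`, `cla_signers`, `unsigned_authors`, and `acceptable_push` are hypothetical names, the email addresses are placeholders, and a real check would sit in a server-side hook and read real commit metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    author: str   # email recorded on the commit
    message: str


# Hypothetical set of people with a signed license agreement on file.
cla_signers = {"committer@example.org"}


def unsigned_authors(push):
    """Return authors in the push with no agreement on file, in first-seen order."""
    seen = []
    for commit in push:
        if commit.author not in cla_signers and commit.author not in seen:
            seen.append(commit.author)
    return seen


def acceptable_push(push):
    """Accept only a single squashed commit authored by a known signer."""
    return len(push) == 1 and not unsigned_authors(push)


# A pull request pushed as-is carries third-party commits and attribution.
pull_request = [
    Commit("outsider@example.com", "First pass at foo"),
    Commit("outsider@example.com", "Fix tests"),
]

# The same work squashed into one commit by a committer with a signed agreement.
squashed = [Commit("committer@example.org", "Built module foo")]

print(acceptable_push(pull_request), unsigned_authors(pull_request))
print(acceptable_push(squashed))
```

The check only settles whose name is on the commit that reaches the main repo; it says nothing about whether the original contributor actually agreed to the license, which is the part that still needs a signoff process.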
The ASF is missing the boat here. It is like ... the 21st century? And core people at Apache not only think “SVN should be enough for everyone” but also make it exceptionally hard for projects to use the right tool for their job.
My bet is still on Git being shot down due to some random made-up “quality concerns” in the end.
ASF went from a helpful free software organization to a software graveyard: Ant, Maven, Subversion, Commons, OpenOffice ...
Are you just trolling or trying to make some point? What part of "Apache is currently evaluating git" makes you feel that git is being shot down?
Git is just a tool. In five years there will be another tool that everyone cannot live without. And people like you will be telling Apache that they are dinosaurs because they have not moved to that. Right now some very dedicated and skilled people are donating their free time to running the Apache organisation and evaluating the feasibility of making changes. And you accuse them of lying about quality concerns (which no one has even raised). Meanwhile, your assistance to the advancement of open source is what exactly?
Much of this is pretty unsurprising, especially for people like me who watched the attempted transition of OOo to Apache.
The way the article puts it is that the ASF is trying to solve problems that don't exist anymore, which is true to an extent, but the deeper problem is that the ASF has a particular view of how open source development and project management work, and attempts to impose that view on far too diverse a community, even as it tries to absorb more and more communities.
The ASF is simultaneously trying to be "big tent" and unified, and the balance is all out of whack. It's easy to draw parallels to recent political problems in the US and EU. In all three cases, there are going to have to be some transformations, probably in both society/community and structure, to come back to a place where the institution contributes to the greater good, instead of being a source of unending tension and meta-arguments.