Apache considered harmful

jaaron · on Nov 23, 2011

Ok, there's a lot to cover here.

First off, the Apache Software Foundation isn't trying to absorb anyone or anything. Projects and people come to the ASF. It's a specific policy of the Foundation to NOT solicit projects. If someone says they're representing Apache and soliciting projects, they're wrong.

Secondly, Apache is very opinionated about how projects should be run. This comes from years of experience as not only a successful project, but as a successful non-profit organization overseeing dozens of projects. If a community doesn't like the ASF's style or rules (such as no dictators, benevolent or otherwise), they don't need to be there. No one wants to keep projects hostage. Part of the point of the Incubator is to get this figured out sooner rather than later.

Thirdly, about git and subversion. First off, there's increasing support for git at Apache (see http://git.apache.org/) but there are some serious drawbacks to using git. Consider this: subversion was practically made for Apache in the way Linus made git for Linux. With that in mind, subversion isn't going anywhere at the ASF. Some of the rationale is just plain stubbornness, but some of it goes straight to the core values of the Foundation.

Apache has become, for better or worse, the place where lots of projects go when they grow up. Growing up is hard to do. It's not fun. You have to do things like get a job, pay taxes, etc. When a project grows up, people start caring about who contributed what, under which license and making sure every line of code is legit. A lot of engineers don't care about this, but businesses and their lawyers do. A lot of the Apache Foundation "bureaucracy" is to handle this oversight and paperwork.

Git is an impressive tool and github is awesome for what it is, but it's not a non-profit foundation and it won't replace one. Confusing the Apache Software Foundation for your coding sandbox only suggests you don't understand the true purpose of either.

dscape · on Nov 23, 2011

[this text was written by mikeal originally here but somehow got censored. gist exactly as he wrote it https://gist.github.com/1387977]

This is perhaps the most depressing response I've received to my article.

As I said in my article this is far less about git and more about the chasm that has grown between Apache and the rest of the community.

Your first two points boil down to "nobody makes you join Apache, if you don't like our policies then you can get out". How does this help Apache or its projects?

Apache could still be valuable to the community but this kind of stubborn attitude will ensure that it continues to become irrelevant when it could be a leader.

I do understand the purpose of Apache and it is not hosting source code. That is the point I'm trying to make. If that is not its value, and its policies around hosting that source are no longer beneficial to its projects, then it should change its policy.

I think that you, and many people in the ASF, have married the existing policies of Apache with the purposes for which they were created. While the intentions of the policies may still be relevant, and in my opinion correct, the policies themselves will not remain relevant forever in a field as rapidly evolving as technology and GitHub may just be the first example of Apache policy incompatibility with evolution of open source.

jaaron · on Nov 23, 2011

"How does this help Apache or its projects?"

The Apache Foundation is what you make of it. It will not change just because you post to your blog, but it will change if you engage the committer and membership population, build a consensus around your ideas and volunteer to do the actual work to make the changes happen.

No one will force you to do such work and if you don't want to do it, then you're not obligated to do so. No one will be upset if you, I or anyone else leaves the Foundation. It's cool. We're all here at-will, volunteering effort and code.

Apache cannot be everything to everyone, despite how much it is pulled to be so. Right now it fills a particular important role in the open source and larger software ecosystems. It's in that position due to both historical precedent as well as intentional decisions by the membership body.

But trust me, no one in Apache is ever, ever completely satisfied with the Foundation. That's to be expected -- the organization is driven by the compromises of a large group of people with different ideas and expectations. To balance between the chaos of constant change and the death of no change, the organization has grown guidelines and rules from the collected wisdom of its membership. This gives us at least some framework by which to evolve.

As for hosting code, there have been proposals at times for Apache to push the code hosting to some other organization. Once it was SourceForge, then Google Code, now Github. Of course, it's a tricky situation as the Foundation has particular requirements and wants to know its code will be around for decades. Moreover, the infra team is constantly understaffed and thus a very, very conservative bunch. We've seen way too many people jump in with a great idea and then leave maintenance to someone else. They're stubborn for a reason.

And perhaps Apache and Github are incompatible. So what? Github is a tool. It's incompatible with lots and lots of organizations and ways of doing things. The FSF has its rules and culture. Same with the Linux kernel, distros and desktops like KDE and Gnome. Android is different too. Not all of those mesh with Github and that's fine.

benatkin · on Nov 23, 2011

> The Apache Foundation is what you make of it. It will not change just because you post to your blog, but it will change if you engage the committer and membership population, build a consensus around your ideas and volunteer to do the actual work to make the changes happen.

This is true only from a CYA standpoint. As Mikeal said in the article, it's possible to do a lot of work and build up a strong case for a change that's important to project maintainers, yet still have Apache come up with excuses for why it can't happen. This is what happened with git -- twice.

lucisferre · on Nov 24, 2011

I believe he did engage the committer and membership population. Your response, disagreement or not, is proof of that. Disparaging the way he did it with statements like "just because you post to your blog" is completely unfair.

buff-a · on Nov 23, 2011

Mikeal isn't arguing that Apache source should be put on GitHub.

He is saying that the ASF would benefit from the kind of community development that git promotes, as exemplified by GitHub and Linux.

cube13 · on Nov 23, 2011

Linux is a bad example. It's not "community" development by any real definition of it, because Linus controls everything that goes into the mainline codebase. If anything, it's community maintenance, because that is delegated out.

More importantly, Git by itself does not promote community development. No source control system does. Some make that style of development easier, but none of them actually directly promote it.

GitHub is not Git. GitHub is the Git version of SourceForge. Nearly all the community development features (bug tracking, forums, etc.) on both sites are built beside the source control system, and aren't really integrated directly into either Git or SVN.

drivebyacct2 · on Nov 23, 2011

mike,

I don't understand why this response is "depressing" and characterizing the reply as "about git" is, frankly, not representative of the post I just read.

buff-a · on Nov 23, 2011

There's no reason Apache can't maintain its own "legally authoritative" git repo. Nothing in the author's post suggests that he is confusing the ASF with a "coding sandbox". Making that claim suggests to me that you are invested in the alternative and not thinking objectively.

And I disagree about subversion being "made for Apache in the way Linus made git for Linux". Subversion is an utterly derivative implementation of any server based VCS in existence, whereas git is an example of truly creative thought (not just from Linus) about what VCS should be for a large community that requires the accountability that you claim ASF requires.

tptacek · on Nov 23, 2011

I'm not totally familiar with the issues here, but from an earlier perusal of the email threads on this, it seems like ASF's concern involves things like git's ability to edit the repository history.

exDM69 · on Nov 23, 2011

tl;dr: Sign your Git commits cryptographically with PGP if you don't want the history to be editable.

Git's ability to edit the history is a very useful tool. I don't think Subversion or any other VCS prevents you from editing the history either. Maybe they just don't provide tools for that so you'd have to hack the internal data structures of the VCS or something, but you actually want a tool to modify the history. Think about the situation where somebody accidentally pushed a secret private key or a database password to a public repository, you want it out of there! (there's a ton of examples of this in GitHub. git filter-branch is what you should do).

In order to provide "safe" history for Git, the commits must be cryptographically signed by their authors. This is vastly superior compared to trying to use some server side authentication kludge, which can be broken into. And the data structures of the VCS database can be modified, with a hex editor if all else fails. Cryptographic signing provides a guarantee against hex editor hacking too.

If I read one more "Git sucks because you can edit ancient history" comment from someone who doesn't understand the concept of crypto signing, I will cry.
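
To illustrate the signing argument above, here is a minimal sketch of why a keyed signature over a commit survives "hex editor" tampering where server-side access control does not. It is an assumption-laden stand-in: Python's standard library has no PGP support, so an HMAC with a shared secret plays the role of the author's signature. Real PGP signing uses a private/public key pair, and the key, commit text, and function names below are all hypothetical.

```python
import hashlib
import hmac


def sign(secret_key: bytes, commit_bytes: bytes) -> str:
    """Compute a keyed tag over the commit bytes (stand-in for a PGP signature)."""
    return hmac.new(secret_key, commit_bytes, hashlib.sha256).hexdigest()


def verify(secret_key: bytes, commit_bytes: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(secret_key, commit_bytes), tag)


# Hypothetical author key and commit content, for illustration only.
author_key = b"hypothetical-author-key"
commit = b"author: alice\nmessage: add login\n"
tag = sign(author_key, commit)

# Someone with raw access to the storage edits the commit bytes directly.
edited = commit.replace(b"alice", b"mallory")

original_ok = verify(author_key, commit, tag)
edited_ok = verify(author_key, edited, tag)
print(original_ok, edited_ok)  # True False
```

Anyone who edits the stored bytes without the signing key cannot produce a matching tag, which is the property the comment attributes to signed commits.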

buff-a · on Nov 23, 2011

Discussion on this subject with a reply from Linus:

http://git.661346.n2.nabble.com/GPG-signing-for-git-commit-t...

wladimir · on Nov 23, 2011

You of all people should know that GIT history consists of an append-only log which is maintained using cryptographic hashes. If you edit one commit (even the metadata) you have to rewrite history, and all the hashes for commits after it change.

People will notice, and most importantly, everyone will still have the old commit chain locally.

This means that even the server cannot arbitrarily edit history. With SVN, afaik this is possible by manipulating the database.
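
The hash-chain property described above can be sketched in a few lines. This is a simplified model, not git's actual object format: the `commit_id` and `build_history` helpers and the sample commits are hypothetical, and a real commit hashes a tree object and more metadata. The point it shows is only that each ID covers its parent's ID, so an edit to one commit changes every ID after it.

```python
import hashlib


def commit_id(parent_id: str, author: str, message: str, content: str) -> str:
    """Hash the commit's own data together with its parent's ID."""
    payload = "\n".join([parent_id, author, message, content]).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def build_history(commits):
    """Return the IDs of a linear history; each ID depends on the one before it."""
    ids = []
    parent = ""
    for author, message, content in commits:
        parent = commit_id(parent, author, message, content)
        ids.append(parent)
    return ids


original = [
    ("alice", "initial import", "v1"),
    ("bob", "fix typo", "v2"),
    ("carol", "add feature", "v3"),
]

# Edit only the metadata (the message) of the second commit.
tampered = list(original)
tampered[1] = ("bob", "fix typo (edited)", "v2")

original_ids = build_history(original)
tampered_ids = build_history(tampered)

unchanged = [a == b for a, b in zip(original_ids, tampered_ids)]
print(unchanged)  # [True, False, False]
```

The third commit was not touched, yet its ID differs because its parent's ID changed. Anyone holding the old chain locally would see the mismatch.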

tptacek · on Nov 23, 2011

"Me of all people"? People sure are zealous about their version control systems. For the record, I use git. But I don't really give a shit about it. For most of my career, I used CVS.

Git is neat, and I like it, but I'm not planning on studying its internals any time soon.

wladimir · on Nov 24, 2011

Right. But it is useful to know, not so much because of the VCS aspect, but because of the security aspect. Hg/Mercurial BTW works in the same way.

JoshTriplett · on Nov 23, 2011

Git normally only allows you to edit unpublished history; the server can prohibit editing of published history. Similarly, svn allows history editing if the server permits it.

onedognight · on Nov 23, 2011

See Fedora's git repository for an example of how to do this.

jhawk28 · on Nov 23, 2011

I believe they use gitolite to do this. (https://github.com/sitaramc/gitolite)

buff-a · on Nov 23, 2011

In discussion of an article which makes the claim "The problem here is less about git and more about the chasm between Apache and the new culture of open source." it is ironic that an objection is raised that is trivially answered by using one of the very proponents of this new "open knowledge" culture, Stack Overflow:

http://stackoverflow.com/questions/2085871/strategy-for-prev...

buff-a · on Nov 23, 2011

Oh, I can think of a million scary sounding consequences of using git and I'm sure that all were raised. This is what established groups do when confronted with change: raise any objection even though a moment's thought demonstrates the paucity of its merits.

tptacek · on Nov 23, 2011

That moment of thought was apparently too expensive for you; you didn't respond to the actual concern, but rather raised an argument suggesting that any argument about git must be meritless.

I don't know who you expect to convince by baying at the moon.

The ASF people are right in at least one sense: if you don't want to run projects in the ASF style, you are free to take your work elsewhere.

buff-a · on Nov 23, 2011

I need not respond to the actual concern because the ASF has already done so. The ASF has already decided to allow git to be used. I assume that their lawyers OK'd this change. So I did not intend to continue an ongoing discussion: ASF has already concluded that discussion and approved git. Clearly, jaaron does not represent the views of all the "ASF People", and for him to raise issues as legal showstoppers when the lawyers have clearly approved is utterly disingenuous.

The purpose of my post was not to discuss the merit of the git vs subversion argument, but instead to discuss the merits of jaaron's criticism of mikeal's article.

One way established groups resist change is to keep bringing discussion back to issues that have already been decided. It helps slow discussion on change by making it appear that a previous issue was not, in fact, resolved. In their mind, of course, it's not been resolved: the lawyers were wrong, or perhaps the lawyers didn't understand. Established groups don't just get over it and move on. Why would they?

ASF has decided to allow Git. I believe that those projects which use git will enjoy more success than if they use subversion. Mikeal makes some interesting observations about this. Jaaron spouts the traditional establishment bullshit:

1. The other side are children. We are grown ups.

2. Legal implications.

3. Nobody is forcing you to participate.

4. Condescension. "It's impressive for what it is" ... (but "what it is" is "just a sandbox")

I'm calling it for what it is.

bct · on Nov 23, 2011

> 4. Condescension. "It's impressive for what it is" ... (but "what it is" is "just a sandbox")

You're jumping at shadows and putting words into his mouth. What he said was:

> Git is an impressive tool and github is awesome for what it is, but it's not a non-profit foundation and it won't replace one.

Which doesn't imply the same condescension as your "quote".

reissbaker · on Nov 23, 2011

The very next line:

    Confusing the Apache Software Foundation for your coding sandbox ...

bct · on Nov 23, 2011

But he does not say "just a sandbox", or otherwise imply that a "coding sandbox" is less valuable than a non-profit organization. They're different things.

DannoHung · on Nov 23, 2011

I'm not super familiar with subversion's internals, but couldn't a malicious user edit a subversion repo history?

danssig · on Nov 23, 2011

That's the best part: if you have access to the source SVN repo you can change history and there will be no evidence that you did so. History in Git, on the other hand, cannot be modified without it showing up.

The reason is that in Git every commit gets its own unique hash so you can't change a commit without creating a new hash. To have this in SVN you have to buy 3rd party tools.

Someone · on Nov 23, 2011

I do not think it is as black and white as you describe it. The way I see it: if somebody falsifies a complete repository, the only way to detect that it was changed is by comparing its content or a hash thereof with that of a (supposed) copy that is more trusted.

That is true for any digital archive, including those made by any SCM system. The only respects in which git differs from svn here are a) that it computes such hashes for you and (typically/AFAIK) shows them in its UI, and b) that it is typical for people to store those hashes on other systems. The net effect of that may be large or small, depending on the number of people keeping a copy who will not blindly copy changes from the 'main' repository.

danssig · on Nov 24, 2011

>I do not think it is as black and white as you describe it. The way I see it: if somebody falsifies a complete repository

No, it is. You can't "falsify a complete repository". We will all have checked out from that repo and as soon as someone replaces it with a fake none of the hashes will match up.

>the only way to detect that it was changed is by comparing its content or a hash thereof with that of a (supposed) copy that is more trusted.

Which happens in the system automatically. Have you actually worked with Git? Go change history on something you've pushed and other people have pulled.

>number of people keeping a copy who will not blindly copy changes from the 'main' repository.

It's not about "blindly copy changes". If you pull from a repo where someone has tried to rewrite history you'll see duplicate entries all over your log. If you have a graphical tool you'll see right where they started their modification.

tptacek · on Nov 23, 2011

Without access to the database itself? How?

This is part of git's interface. (I appreciate it and don't think it's a bogeyman, but can see how it could be incompatible with some projects).

buff-a · on Nov 23, 2011

As I understand it, if the repo has receive.denynonfastforwards=true, a user can't push changes that will destroy history. This flag has been available since 2006. (And I didn't mod you down. You ask a legitimate question). A bit more research shows that there are a couple more config changes required: http://stackoverflow.com/questions/2085871/strategy-for-prev...
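
The rule behind refusing non-fast-forward pushes can be sketched as a graph check: an update is a fast-forward only if the branch's current tip is still an ancestor of the proposed tip. This is an illustrative model under stated assumptions, not git's implementation; the `is_ancestor` and `accept_push` helpers and the commit graph are hypothetical.

```python
def is_ancestor(parents, ancestor, descendant):
    """Return True if `ancestor` is reachable from `descendant` via parent links."""
    stack = [descendant]
    seen = set()
    while stack:
        node = stack.pop()
        if node == ancestor:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents.get(node, []))
    return False


def accept_push(parents, current_tip, proposed_tip):
    """Accept only fast-forward updates: the current tip must stay in the history."""
    return is_ancestor(parents, current_tip, proposed_tip)


# Hypothetical commit graph: each commit maps to the list of its parents.
parents = {
    "a": [],
    "b": ["a"],
    "c": ["b"],    # ordinary follow-up to b
    "b2": ["a"],   # rewritten version of b
    "c2": ["b2"],  # history rebuilt on top of the rewrite
}

print(accept_push(parents, "b", "c"))   # True
print(accept_push(parents, "c", "c2"))  # False
```

The second push is refused because accepting it would drop commits `b` and `c` from the branch, which is exactly the kind of history loss the setting is meant to prevent.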

aidos · on Nov 23, 2011

I don't think it's easy to do. You can change a commit message, but even that's not easy (you basically need admin access to the repo files).

If you want to edit the contents of the repo I think you need to read > filter > rewrite the whole thing. I could be wrong about this, it's been a while since I thought about it.

nknight · on Nov 23, 2011

Which is utterly trivial (I've done it, seriously, it's not the big deal you seem to think it is, aside from the obvious difficulty of particularly large repos), and is not conceptually different from what's necessary for editing git's history, except that nobody can tell you've done it without comparing the "new" repo to the old one -- and under svn's internal model, no one but the server will normally have a complete history.

With git's model, not only does everybody have the history, but the commit ID themselves are your insurance against tampering. You effectively validate that history every time you sync with another git repo.

nknight · on Nov 23, 2011

Yeah, that particular bit of FUD is quite popular with the anti-git crowd. It's nonsense. Any attempt to edit the history of a public repository will be noticed instantly by anybody who tries to sync up, no matter what.

Stick in a post-commit hook to force a sync to a backup repo nobody has access to if you want to be really paranoid, but as it is, git is already far more resilient against tampering with the public history than svn ever was.

icefox · on Nov 23, 2011

Just turn off garbage collection (it isn't instant, but I wouldn't bet that it would still be there in six months) and even if you rewrite the history you won't lose the objects. No need for a backup sync.

rgardler · on Nov 23, 2011

You are right, there is no reason why the ASF can't own its own "legally authoritative" git repo.

That is exactly why the ASF is conducting experiments in exactly that. Assuming those experiments are a success and representatives of ASF users are happy (which include business folk and lawyers, not just developers) then the ASF will roll Git out to all projects that want it.

This whole argument is moot.

chaseideas · on Nov 23, 2011

Strongly agree with jaaron's post.

I also agree with some of the replies stating that the ASF should do as much as they can to foster contribution and a more active/social community. Perhaps they've just been a bit slow to adopt newer solutions because active contributors are content with the existing setup, and it's not seen as a huge benefit to change this just to be hip with the new crowds every quarter. Either way, it's nice to see Git catching on a bit, since it's one of the most common choices nowadays.

As an organization, the ASF does an excellent job of overseeing crucial projects and the bureaucratic side of programming that comes along with developing services in the enterprise space. With such a large community and range of projects, a degree of bureaucracy is required to keep things running smoothly.

They also do a good job of staying a mostly neutral party and working strongly towards the success of any project they "take in" under their wing. As jaaron noted, they don't actively solicit anything, or go around trying to absorb projects. They simply help maintain the projects that have grown up (and many people rely on).

Being strong proponents of the open web, flexible software licenses, and doing a lot of the paperwork heavy-lifting in the internet and programming industry from a legalese standpoint, I think they do a lot more for the web as a whole than most people are aware.

On the community note, I've run a few Apache community forums over the years, and it always seemed Apache users are starving for more community interaction and very appreciative of the social environment. Many of the Apache projects have a bit of a learning curve to be put into practical use, and people appreciate some guidance in learning the ropes. Maybe the ASF will notice and embrace a larger sense of community as time goes on, or perhaps some of us here on HN can connect and brainstorm something to fill the void.

I think the title of this post was just hyperbole to catch people's attention regarding a web server vulnerability or newly discovered bug. Haha, seemed to work pretty well.

We're often caught up in the hype of new and catchy-sounding web technologies, but Apache is an organization that helped shape the modern web. Could they perhaps make the community more approachable to newbies? Sure. Are they harmful to the open source movement? I would think not. They bring a lot to the table.

- Chase

blasdel · on Nov 23, 2011

The ASF isn't where projects go to "grow up". It's mostly where companies like IBM go to dump their enterprisey Java frameworks so they can be marketed as ostensibly open to outside developers.

bergie · on Nov 23, 2011

Nope, ASF actually provides and enforces a reasonable governance model for projects it adopts.

I wrote something on why this is important: http://bergie.iki.fi/blog/open_source-free_software-what_we_...

visural · on Nov 23, 2011

While correct, it would benefit Apache in the long run to foster better communities and tools for their projects to grow. While they provide value in terms of the protection/process/management it's all for nothing if the developers can't collaborate effectively.

GitHub has changed many developers' experience of open source development to the point that they don't want to do it "the old way". Apache should be looking to grow in this direction to keep their developers and projects engaged.

sliverstorm · on Nov 23, 2011

Trouble is, their existing developers and projects seem to be doing fine with SVN. Don't forget those are the most important people, the workhorses you already have- not the flighty young birds you hope to one day trap.

Sure, there is always the future to think about, but the future isn't happening a week from now. They have time to watch Git grow and wait until the time is right.

reissbaker · on Nov 23, 2011

Except that the existing developers aren't all doing fine with SVN. As referenced in the post, both CouchDB and PhoneGap (existing "workhorse" Apache projects!) prefer to use git, but have met with strong opposition from the ASF.

The time to allow for Git usage is already here. It's not just pie in the sky forecasting -- existing projects are being held back by bureaucracy.

danssig · on Nov 23, 2011

>Trouble is, their existing developers and projects seem to be doing fine with SVN.

This statement has little meaning. If they were still on SCCS and had been using it for years I'm sure they would "do just fine" with it. But if they move to something modern they could do even better.

thomaslangston · on Nov 23, 2011

There's no reason they can't do both at the same time.

And the time to adopt GIT was 2008.

mcantelon · on Nov 23, 2011

>there are some serious drawbacks for use of git

Such as?

nirvdrum · on Nov 23, 2011

It's not terribly great when you have large files in the system. You end up with a huge repository on disk as those files change. But more importantly, you can't do a partial checkout of a particular path. I think I read that that'll be coming to git, which would be fantastic.

hesselink · on Nov 23, 2011

Actually, you can, with sparse checkouts. They've been in since git 1.7, if I remember correctly. It's not very user friendly yet, and you do retain the entire path from the repository root. See for example here: http://vmiklos.hu/blog/sparse-checkout-example-in-git-1-7

nirvdrum · on Nov 23, 2011

Great. I'll have to check that out. Thanks for the link.

mtts · on Nov 23, 2011

It's not centralized.

marshray · on Nov 23, 2011

I've used centralized SCCSs for years and years. I guess I don't see why you can't use git as one, just have everybody agree to push to a central repository on a server.

There is the issue of the size of the local copy, but it doesn't seem to be a big deal in practice. I just don't check in the heavyweight frameworks as vendor drops into the same repos as I use for my own, smaller, codebase.

Granted, I'm scared to death of ending up in a merge-gone-wrong hell situation trying to do stuff that I'm very familiar with in Perforce. But I'm happy being a git newb taking baby steps a bit longer.

rimantas · on Nov 23, 2011

There is no issue with the size of the local copy. Most of the time the entire git repo is smaller than a single SVN checkout.

marshray · on Nov 23, 2011

An up-to-date snapshot of my source tree at work is several GB. The whole Perforce repo is probably 100GB. Most of that is vendor libraries. For example, every so often we update to a new version of the Boost C++ libraries and pre-build it for most common platforms. This amounts to a GB or two. This is easier on the other developers and it makes the process more repeatable for QA.

One of the great things about Perforce is that it's normal practice to map only selected subtrees of the repo. So I have several workspaces going at any one time.

As much as I am impressed with git's speed, this would not work with git. I used to try managing Boost's vendor drop as a git repo. I now just keep my notes in there about how to download and build it locally.

MattRogish · on Nov 23, 2011

I think the idea is to use git submodules for vendor libs. Anything you're not modifying and a 3rd party is maintaining should be a submodule; that way you don't keep any of those changes in the repo.

marshray · on Nov 23, 2011

When a new vendor release comes out, we build it with our "official" compiler settings for the different platforms, branch the headers from the source and combine them into a convenient "SDK" tree, sometimes tweak something here or there, and update the document. Amounts to several GB being checked in from multiple machines. Occasionally a developer (usually me, but others too) needs to commit changes back to our central repo vendor tree.

I suppose we could do that into a separate repository and then define parts of that as a git submodule.

Hmm, reading http://book.git-scm.com/5_submodules.html I don't see how to do that with git without using 100 GB on the library dev machines.

nirvdrum · on Nov 23, 2011

I don't think I've ever seen a situation where that's true. And I've converted a lot of repositories (I maintain the svn2git project). SVN has its metafiles, but git has the full history locally. In all but the most trivially-sized projects, the git clone is bound to be larger. For larger projects it can be several orders of magnitude larger.

fanf2 · on Nov 24, 2011

I have here a svn checkout of the FreeBSD HEAD branch, using 1550 MB disk in 199443 files.

I also have a git checkout of the same thing, which includes the history of the HEAD but not the other branches. It is 1234 MB in 54823 files.

rimantas · on Nov 23, 2011

Git also has a much more efficient storage mechanism.

nirvdrum · on Nov 23, 2011

I'm aware of that and didn't imply otherwise. But the storage mechanism is for the history, not for the materialized files in the working directory. That's going to be the same for either git or SVN, since they're checked out. So you're talking about comparing git's history DB to SVN's metadata files. The metadata files are effectively constant cost whereas the git history grows with each checkin. They get quite large.

mtts · on Nov 23, 2011

You can, but SVN more or less forces you to.

marshray · on Nov 23, 2011

Which means that if a developer wants to do git-like stuff (say sharing a patch with a single other developer or committing a day's unfinished work to an alternate backup site), then he's just going to have to work around the SVN tool to do it.

The tool should exist to serve the people producing the work, not the other way around.

skrebbel · on Nov 23, 2011

It's complicated.

kahawe · on Nov 23, 2011

>>there are some serious drawbacks for use of git

> Such as?

The more important question is: what are developers really missing when they have to use svn to hack ASF's code in comparison to using git? (leaving github out of the equation, it is not git) You can hack locally to your heart's content and in case you want to contribute, you can diff and contribute.

I don't really see the problem.

mcantelon · on Nov 23, 2011

>I don't really see the problem.

Part of the drive of open source is bootstrapping better ways of doing things. We could all likely do our jobs to some degree running Windows 95, but who wants to? Git improvements over SVN include increased performance, cheap branching, more detailed tracking of changes, the ability to code offline (not a huge deal, but it certainly has made traveling more fun for me), and DVCS collaboration capabilities.

buff-a · on Nov 23, 2011

What you describe is the original use case for git: pull-based community development. So I can only conclude that you don't see a problem because you've never actually done it.

cube13 · on Nov 23, 2011

Which is the exact basic use case for almost all modern source control systems. Git doesn't do anything new or unique for that use case.

buff-a · on Nov 24, 2011

All server based vcs are push based. Certainly they can be used pull-based, where others diff code, send the patch, and a committer then integrates and commits. Git, on the other hand does two things differently. 1) it makes it vastly easier to create, submit, and integrate patches (because it was designed for it) and 2) it makes it vastly easier for the people making the patches, who don't have commit privs, to maintain their changes in their own repo, while still syncing with the master.

svn does not offer these features, which means that when you don't have commit privs, it's a PITA to track and maintain your own version. I'll say that again: it makes it massively frustrating to maintain your own version. The result is that people don't, and the number of potential hackers is vastly reduced.

git, like svn, allows a small community, such as ASF, to manage and control a project - addressing legal and operational concerns. committers can push just as easily as with svn. but in addition, each committer can (if they so choose) also maintain relationships with a much greater community, each member of which can hack away on their own with a fully synced, fully versioned repo of their own. hackers get their own repo. committers have a really robust system for integrating patches.

If ASF wishes to do no more than have a small number of committers working on a project then indeed, svn suffices. it's "good enough". the point of mikeal's post is that this is missing out hugely on the vast army of hackers.

I said in an earlier post that I had no skin in the game. that's true. now. i used to track tomcat. but it was such a pain in the arse to handle merging my code with the official code that I gave up. it just wasn't worth it. git would have solved every problem I had. perhaps by now I would be a committer on the tomcat project. who knows? svn made that a certain "no".

an organization like ASF needs a finite number of committers. git allows keeping that group small, yet engaging a much greater community.

but as mikeal says, it's not about svn vs git. it's about the mindset that sees no need for git because it sees no need for more than just committers.

neilk · on Nov 23, 2011

Hey, I work at the Wikimedia Foundation -- a non-profit -- and we're eagerly anticipating a switchover to git. I don't see what one thing has to do with another.

mef · on Nov 23, 2011

what about Subversion goes "straight to the core" of the ASF that is not also true about git?

umarmung · on Nov 23, 2011

Subversion is a centralised or push model.

Git is a distributed or pull model.

This is at the heart of the processes one may use, certainly within a project but possibly at a cross-project or organisational level.

mef · on Nov 23, 2011

what processes does the ASF use cross-project or cross-organizational which could not be replicated in a git environment?

cpeterso · on Nov 23, 2011

Ironically, http://git.apache.org is not responding.

bergie · on Nov 23, 2011

There is also a mirror on GitHub: https://github.com/apache

tedsuo · on Nov 23, 2011

I would put up node.js as a counter-example. It is a large, important project successfully being managed on github. Its sponsor, Joyent, is a private company, but the role could easily be filled by a foundation like Apache.

mapgrep · on Nov 23, 2011

There's actually not a lot to cover here? The question is why you're so scared of git and in six paragraphs you managed NOT to answer that question. This is really sad. One might hope an open source foundation would be forthright and fairly transparent. Apparently not.

jaaron · on Nov 23, 2011

Uh, I linked to the official git repos for Apache. Apache isn't afraid of git. Plenty of Apache people love git. At the same time there are issues with implementing and supporting it by the ASF infra team. If you want all those details, search those mailing lists.

kahawe · on Nov 23, 2011

Spot on, excellent points and I could not help but wonder about the article... most people would consider "apache" to be synonymous with the httpd and not the ASF, so the headline is clearly fishing.

Then, OP argues ASF's processes are broken and a github project with one maintainer is so much easier... you really have to consider the scale of the average sourceforge (back in the day) project vs. apache even back then - and they have only grown from there, so no wonder a project hosted on github now is more "fun" to contribute to and work with because processes are probably pretty much non-existent and communication-paths and hierarchies are pretty much flat. The same goes for working at a small start-up vs. working at HugeFatCat Inc.

But one thing I cannot stand: arguing git over subversion. Yes both are great revision control systems but they are just meant for different applications. Where I am working, I would take subversion over git any day if all I really want is an absolutely sure-fire way of having a simple and easy to use central repository. I am in the unfortunate position of having met and worked with quite a few people (programmers nonetheless) who have a hard time coping with basic CVS/SVN update-commit cycles - with git and mercurial offering you the option to commit your changes locally first before actually committing them to the repository you checked out from (or another repository or...), this would have literally created havoc and confusion for my small team here... I know them, yes it sounds ridiculous, but I know I have saved myself a hell of a lot of trouble and headache. SVN is an absolutely perfect tool for the job if you do not need the de-centralization, on-the-go-commits and all the possibilities of creating custom processes in git with all its flexibility and features and options. SVN works just fine for what it is.

git and mercurial are not better, newer, shinier substitutes but different beasts altogether. (great, shiny, powerful and useful nonetheless) So could we please just let them peacefully co-exist and be thankful someone made them for us to make our lives easier?

buff-a · on Nov 23, 2011

So after telling us that your current environment is working with people who have a hard time with svn/cvs, forgive me if I find your opinion on git's ability to handle the requirements of large-scale, community driven development of the code that pretty much runs the internet to be utterly irrelevant. No. The kind of people who cannot handle cvs/svn update concepts are not the target demographic for a solution used to develop a fucking operating system or the world's http server.

kahawe · on Nov 23, 2011

Did you even read what I wrote? Your comment is irrelevant because you obviously did not understand my point: that git is not a "simply better" substitute for svn as a lot of people seem to think. There are situations where svn is the right tool for the job, my example is one of those situations in my opinion. So yes, they are definitely NOT the target audience for git - exactly my point.

jimbobimbo · on Nov 23, 2011

Spot on the "coping" with update/commit cycle - I have the exact same problem in my team.

kahawe · on Nov 23, 2011

You have my full sympathy... we are doomed :(

kiba · on Nov 23, 2011

I am not sure what the use case differences are. To me, git is simply a better version of svn.

ma2rten · on Nov 23, 2011

I am sorry to tell you, but you are plain wrong! Git is better. The problem is that SVN embraces a workflow that is inferior. But if you are not willing to change your workflow, git will seem confusing, indeed.

kahawe · on Nov 23, 2011

> I am sorry to tell you, but you are plain wrong! Git is better.

Please, do elaborate! In detail, why is git better than subversion if all I need is a central repository for a few people working at a company on one site, sitting in the same room with permanent network access. No distributed or remote or on-the-go development, no forks.

> SVN embraces a workflow that is inferior.

It is different but what exactly makes it inferior?

yummyfajitas · on Nov 23, 2011

> In detail, why is git better than subversion if all I need is a central repository for a few people working at a company on one site, sitting in the same room with permanent network access.

I was in exactly this situation at my last job (minus the same room, I was stuck in a separate office). In spite of the main source repo being SVN, we all used git-svn as our client.

The main benefit is that git makes it easier to create clean commits and push them to trunk.

E.g., I want to make module Foo, and use it in Bar. I can use git as I go, building module Foo incrementally over 5 commits (none of which has sufficient quality to go into trunk). Then I can do 5 more commits which integrate Foo into Bar. So my individual history is tracked while I'm developing. If, while building Bar, I make a bugfix to foo, I can commit it.

So logically, my commits look like:

101-104 Work on Foo

105-107 Work on Bar

108 Bug fix on Foo

109-111 Finish work on Bar

When it comes time to push to trunk, I can rebase 101-104 + 108 into one clean patch, "Built module foo", and 105-107 + 109-111 into "Incorporate module foo into bar". Then I eventually push these into the main repo.

Further, if I'm working on this with someone else, we can use git to track work between the two of us without committing to mainline.
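The regrouping described above can be sketched in Python. This is only a toy model of the bookkeeping, not how git implements rebase: the commit numbers and the two patch messages come from the comment, while `local_commits`, `MESSAGES`, and `squash_by_topic` are hypothetical names made up for illustration. A real `git rebase -i` replays diffs and can hit conflicts when commits are reordered, which this sketch ignores.

```python
# Hypothetical local history, mirroring the commit numbers in the comment above.
local_commits = (
    [(n, "foo") for n in range(101, 105)]      # 101-104: work on Foo
    + [(n, "bar") for n in range(105, 108)]    # 105-107: work on Bar
    + [(108, "foo")]                           # 108: bug fix on Foo
    + [(n, "bar") for n in range(109, 112)]    # 109-111: finish work on Bar
)

# Messages for the two clean patches that eventually go to trunk.
MESSAGES = {
    "foo": "Built module foo",
    "bar": "Incorporate module foo into bar",
}


def squash_by_topic(commits):
    """Group commit ids by topic, keeping topics in order of first appearance."""
    grouped = {}  # dicts preserve insertion order in Python 3.7+
    for commit_id, topic in commits:
        grouped.setdefault(topic, []).append(commit_id)
    return [(MESSAGES[topic], ids) for topic, ids in grouped.items()]


patches = squash_by_topic(local_commits)
for message, ids in patches:
    print(message, ids)
```

The point of the model is that the eleven messy local commits never reach trunk; only the two grouped patches do.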

spiffytech · on Nov 24, 2011

This workflow sounds like it could be replaced by creating a feature branch, developing the feature there, then merging the branch back into trunk when the feature is complete. This can be implemented in ordinary SVN, without git or git-svn.

yummyfajitas · on Nov 24, 2011

Almost, but not quite. As far as I know, svn doesn't support rebase -i.

Also, part of the point of the workflow is to make sure all commits to the official repo are clean code. Using a feature branch means that the official repo contains commits like "Halfassed implementation of foo, joe take a look at it".

EdiX · on Nov 23, 2011

> Please, do elaborate! In detail, why is git better than subversion if all I need is a central repository for a few people working at a company on one site, sitting in the same room with permanent network access. No distributed or remote or on-the-go development, no forks.

Because with svn you end up having dirty working directories that go uncommitted for days because committing would break the build, as a result:

1. Everyone ends up making a second working directory because the first one they have is dirty with changes they can't yet commit and they need to make a quick change

2. Those dirty working directories are branches in practice, even if svn doesn't call them that, they are just dealt with with inferior tools.

Also: cherry picking commits from a branch into another branch isn't as easy, making a new repository isn't as easy, git is way faster at (almost?) everything (including large binary files), and being able to check the history when you are not in the office or connected to the VPN is nice.

However, I agree that the conceptual model behind DVCS is harder to understand, significantly harder, svn can be good enough especially for a small, local team.

nirvdrum · on Nov 23, 2011

SVN has branches, too. The problem it had was with merge-tracking and that's about when everyone hopped over to git (myself included). But that hasn't been an issue for a couple years now. By all means, stick with git if you prefer it, but release some of the older criticisms as they've been addressed by the SVN team.

EdiX · on Nov 23, 2011

> SVN has branches, too.

I don't think I have said it doesn't

nirvdrum · on Nov 23, 2011

You certainly implied a branch-free workflow. Otherwise why would you have a dirty local workspace and multiple checkouts? And why would committing break a build? I'm unaware of any CI server configured to build every branch in the SVN tree.

EdiX · on Nov 24, 2011

Do you make a branch every time you change something? Do you make a branch for every developer's local copy?

The thing is, when you solve the problem of dirty local workspaces by making them actual branches svn becomes just as complex as git, probably even more so given that all the branches exist for all the users. And you still don't have as good an interface for cherry-picking, you still need multiple local copies, it still isn't as fast and you still don't have all the other benefits associated with a DVCS.

nirvdrum · on Nov 25, 2011

You don't need multiple local copies. That's why "svn switch" exists. That's all I was addressing. Some of the things you knock SVN for are either non-issues or issues with git as well. E.g., "branches exist for all users" is a non-issue. Otherwise it's also a problem when I push a git branch (which is a wise thing to do).

Cherry-picking and speed are legitimate benefits of git.

kahawe · on Nov 23, 2011

> I don't think I have said it doesn't

But those branches would be a solution to the "I cannot commit this yet" problem. And at least on the repository side it is a cheap FSFS copy.

However, I don't want to make them branch because it would pose the same issues unfortunately...

fuzzix · on Nov 23, 2011

"Please, do elaborate! In detail, why is git better than subversion if all I need is a central repository for a few people working at a company on one site, sitting in the same room with permanent network access. No distributed or remote or on-the-go development, no forks"

You describe my environment almost exactly. I am pushing for a git migration mostly for the low-overhead branching/merging as I can have multiple discrete tasks on a given project which need to be rolled out individually.

RyanMcGreal · on Nov 23, 2011

Don't forget speed. Git is fast.

mikealrogers · on Nov 23, 2011

This is perhaps the most depressing response I've received to my article.

As I said in my article this is far less about git and more about the chasm that has grown between Apache and the rest of the community.

Your first two points boil down to "nobody makes you join Apache, if you don't like our policies then you can get out". How does this help Apache or its projects?

Apache could still be valuable to the community but this kind of stubborn attitude will ensure that it continues to become irrelevant when it could be a leader.

I do understand the purpose of Apache and it is not hosting source code. That is the point I'm trying to make. If that is not its value, and its policies around hosting that source are no longer beneficial to its projects, then it should change its policy.

I think that you, and many people in the ASF, have married the existing policies of Apache with the purposes for which they were created. While the intentions of the policies may still be relevant, and in my opinion correct, the policies themselves will not remain relevant forever in a field as rapidly evolving as technology and GitHub may just be the first example of Apache policy incompatibility with evolution of open source.

apike · on Nov 23, 2011

The ASF is the first home that comes to mind when a successful open-source project needs independent stewardship. Often when a company wants to "spin off" an open source project, they turn to Apache.

What alternative organizations fill this need in a more lightweight fashion? Most other umbrella open source organizations I know of focus on copyleft and other issues that can be hostile to commercial interests.

sanxiyn · on Nov 23, 2011

Software Freedom Conservancy. http://sfconservancy.org/

Conservancy doesn't care about licenses as long as they are free. For example, jQuery (MIT license) is part of Conservancy.

Conservancy members include prominent projects such as Boost, BusyBox, Darcs, Git, Inkscape, jQuery, Mercurial, PyPy, Samba, Selenium, Squeak, uClibc, Wine.

apike · on Nov 23, 2011

Very cool. It seems like they are the type of organization Mikeal is encouraging ASF to become: legal and administrative support for open source projects, and other services if the project's leaders wish. Quite hands-off.

Comparing the lists of projects, I'm surprised to find I use more SFC software than ASF software.

ternaryoperator · on Nov 23, 2011

"Conservancy doesn't care about licenses as long as they are free."

They do care about licenses and license terms are part of the requirements for application. The project license must be either free (per FSF) or open (per OSI). Docs must be made available under Creative Commons licenses. And the project must be completely non-profit. (All these requirements must be met.)

That being said, you're right, it's a good home without the politics discussed here.

nirvdrum · on Nov 23, 2011

Just to get two things out of the way: I'm an ASF member (albeit not very active lately) and a huge fan of git with or without GitHub. I'm one of the many people advocating for git internally at the ASF. I have been met with opposition in the past, but a lot of it has been around who's going to maintain the infrastructure, given it's a volunteer system. Let's just take it as axiomatic that the ASF is going to self-host its code. So it's at least a fairly pragmatic argument. And I think we finally have a solution.

My real issue is with the bouncing back-and-forth the author does in his post around the notion of IP. It's a shitty topic that most devs don't want to be bothered with, but alas, it's quite important in the real world. And GitHub is mostly a landmine field when it comes to this. I don't think it's a failing of GitHub itself, but most projects just don't have licenses attached to them. Unlike with SourceForge, there's no requirement to have an OSS license on public projects. Then many that do fail to meet the copyright header requirements for the license. Or you could have a public project with a restrictive license [1]. Being public doesn't mean you get to do whatever you want with the code. This is dangerous and bad for OSS.

Apache gives you that protection. There's never any question about it. That's the primary reason projects go through the incubator -- to make sure the IP is all in order. It's an annoying, bureaucratic, but necessary process in a litigious society. But because of the care and protections Apache provides in this regard, I think they've done more to get OSS adopted in traditionally closed companies than just about anyone else.

[1] I came across Tom Preston-Werner's repo for his site. He's one of the GitHub founders. It's a public repo with a license that restricts usage of certain portions of the project (generally his content): https://github.com/mojombo/mojombo.github.com

MattRogish · on Nov 23, 2011

This is a good point. Anyone who starts an open-source project should, from day one, have a vetted Contributor License Agreement and ONLY accept pull requests that include signed CLAs (or from a person who has one on file).

nirvdrum · on Nov 23, 2011

It'd be kinda nifty if GitHub had this built-in. I personally don't require CLAs for every project because it can be onerous. But at the least I try to pick a license that wouldn't allow for submarine patent claims.

mikey_p · on Nov 23, 2011

I've wondered for years why GitHub doesn't provide a license field as part of the repo alongside name and description. I've been known to pester people after they point me to their repo, and ask them to add a license before I'll use their code. Automating CLAs would be a dream.

GitHub, are you listening?

davidw · on Nov 23, 2011

Some good points, but github doesn't take the place of a community. When it's working well, it helps, but when there is a breakdown of collaboration and communication, you get one of those codebases that has been forked 1298 times where none of the people doing the forking is sharing anything. That's a community fail, not a version control system issue.

I do think that it'd be nice if the ASF offered git alongside svn, and concentrated on the community aspect of things, which it does tend to do fairly well.

latchkey · on Nov 23, 2011

Just because you can't 'see' the fork with subversion doesn't mean it doesn't exist. I'm sure plenty of ASF projects are 'forked' within companies and the code is never shared.

What I've been advocating for a while now on the members@apache list is that the ASF look at using Github (either they host it or the ASF does) as the basis to build a new type of community that the ASF has never experienced before. Something that isn't tied to the old school.

davidw · on Nov 23, 2011

Sure, github may bring more things into the open, but it is ultimately just an aid to a community of people, who must communicate about the project. You can't just dump the code on github without communicating with other people working on it.

I agree that outsourcing some of the infrastructure to github might be an interesting idea.

ww520 · on Nov 23, 2011

I'm curious how does it harm anything? Did it kill any puppies? Maybe it's inefficient but harmful?

Also remember GitHub is a for profit company. Its allowance for Open Source hosting is a marketing tactic. Anytime they feel the marketing value is not there, they will shut it down. Not that I'm against GitHub. It's a great company for itself. But comparing Apache to GitHub is like comparing apples and oranges.

Legion · on Nov 23, 2011

Don't get caught up on GitHub specifically. GitHub is just the most popular example of how projects don't need Apache to host them anymore, yet Apache still expects to do so, and worse, expects to assert a lot of restriction over it.

The point is, that is at odds with what the community wants and needs. As the author pointed out, ten years ago, rolling your own SCM hosting was a big pain. Now, it's not, partly because of GitHub and Bitbucket and others, but also because rolling your own isn't as hard either.

Anyone with minimal server admin experience and knowledge of Git can run their own Git server on a VPS with something like Gitolite. I know because I succeeded in doing so myself, and I'm neither a pro server admin, nor did I have any Git experience at the time I did the initial setup of Gitolite. Prior to that, I had set up a Mercurial server with no prior Mercurial experience either. It's pretty easy now.

So, yeah, GitHub is there, but GitHub could disappear tomorrow and the community still wouldn't need to turn to Apache for project hosting. In that respect, they're still solving a problem nobody has anymore, and that was the point the author was making.

hrabago · on Nov 23, 2011

GitHub isn't a new version of Apache - GitHub is a new version of SourceForge. I don't think that even 10 years ago anyone with a line (or 100 lines) of code can set up their own Apache projects.

10 years ago if you needed free SCM, you'd use SourceForge, not Apache. I don't think it would've been that big of a pain even then.

GitHub projects don't necessarily come with its own community with diverse contributors, whereas Apache projects require it.

nl · on Nov 23, 2011

> In that respect, they're still solving a problem nobody has anymore, and that was the point the author was making.

That wasn't the author's point at all.

rgardler · on Nov 23, 2011

I just want to be clear. To the best of my knowledge The Apache Software Foundation has never killed any puppies.

Disclaimer: I am a Member of The Apache Software Foundation. I do not own any puppies. These two facts are not connected.

(thanks for bringing some reality to this post, I don't agree with your other comments, see my comments elsewhere for why, but your opening para is spot on)

ww520 · on Nov 23, 2011

Hyperbole is best answered with hyperbole.

Note that I have nothing against Git. I use it. The main difference between Apache and GitHub is one is a non-profit whose main goal is to shepherd OS projects while the other one is a for-profit company out to make money. I don't think it's a fair comparison to use one to substitute another.

v21 · on Nov 23, 2011

It harms the opportunity cost of the projects that reside within it. That's a real harm.

And I think the point is that while GitHub is there, use it. If it closes off, or goes bust, migrate. You can extract all your data - keep a backup elsewhere. At worst, you can move to Gitorious. I mean, currently they use JIRA for bug tracking - that's not an open project either, that's run by Atlassian.

rgardler · on Nov 23, 2011

Projects choose to go to the ASF. Who are you, or anyone else, to tell them it is not the right choice? Billions of dollars are made and saved every year thanks to ASF software (hell it's even in space)- there is a reason for that.

Can the ASF environment be improved for its projects? Of course it can - see my other comments where I address this point.

v21 · on Nov 27, 2011

You're right of course - and I should have tempered my post with caveats about "if the poster is correct", and "I don't really know much about the internal workings of the ASF".

The main point I was trying to convey was just that there's a cost to not improving things, just as much as there's a cost to things getting worse.

inopinatus · on Nov 23, 2011

This essay put me in mind of The Cathedral and the Bazaar. It is a neat demonstration of how tools and processes are inextricably linked.

Who knew, though, that the ASF would be cast as the new priests in the cathedral? I suppose it took a whole new level of social development, enabled by tools, to cast them in that role.

rgardler · on Nov 23, 2011

Actually I think it is either:

a) a complete misunderstanding of how and why the ASF operates the way it does

or

b) a desire for sensationalist blog pieces with almost no factual content

The ASF is working with Git, it has been for years. It doesn't yet provide a canonical repository from which to make releases. This is due to a number of non-trivial technical issues introduced by the processes adopted by Apache projects.

The Apache infrastructure team believe that they have now solved those issues and are testing them in CouchDB. Assuming the CouchDB experiment is a success the ASF will be rolling out Git as the canonical repository to all projects that want it.

Once the ASF has mapped the tools to the processes we can all move on and stop wasting our time with this spurious argument.

Disclaimer: Unlike the author of this blog I do have access to all the discussions about Git in the ASF and I am one of the mentors of PhoneGap, a project mentioned in the article.

polychrome · on Nov 23, 2011

So you're saying a system that has had repeated successes is harmful. I really think you make a good point here about the need to remain open to change. So talk about that. Obviously github has some very positive impact. How can Apache adapt to that? You're not really talking about the tools here, you're talking about community.

I see a potential solution here being that Apache has different rules for projects in different stages. Do you think that would solve the issues?

Remember, your view of anarchy on GitHub will only last so long. Rules and order come out of anarchy for a reason and like all things GitHub will become the exact same stale community you're complaining of now in 10 years.

apu · on Nov 23, 2011

Note that this article is about the Apache Software Foundation, not the webserver.

ccashell · on Nov 23, 2011

Yeah, I found the title to be misleading and quite annoying. It should definitely be fixed to clarify. To the vast majority of the IT world, Apache == Web Server, not Apache Software Foundation.

andrewflnr · on Nov 23, 2011

Indeed. It would be nice if the mods (we have mods here, right?) would change the title to make that clear.

ryanmarsh · on Nov 23, 2011

I'm curious as to the average age of committers by project under the ASF vs. popular projects on github. My hypothesis is that they would be older by a significant margin.

chaseideas · on Nov 23, 2011

I would agree with that hypothesis.

What I've observed from running http://www.Apache.com for several years, is indeed an older crowd (40+) by a nice margin compared to a lot of the younger projects floating around that are generating a ton of buzz.

It's been much more rare in my experience to see a 20 something hipster programmer seriously diving in with the ASF. I'm generalizing though of course...

The type of questions and people I interact with through that project are older engineering types, and those with a long history in the programming and computer scene. Usually with an old-school *nix approach to things.

Just wanted to chime in with that, since like stated elsewhere in the HN comments, I think this blog post is more about the organization structure and members of the ASF than the actual Apache Web Server project... which we all love so dearly. ;)

- Chase

nirvdrum · on Nov 23, 2011

I think you're right, but they overlap too, so hard to give a completely clear picture. I.e., every Apache person I've met also uses GitHub. Obviously the converse is not true.

compay · on Nov 23, 2011

Hate to be the grammar cop here, but the consistent misspelling of "its" in the article is distracting. If the author is here, could you please fix that? It's taking away from a very well-written and insightful piece of writing.

greatquux · on Nov 23, 2011

I stopped reading at the misuse of weary for wary.

rukkyg · on Nov 23, 2011

I agree. There are also several other typos like missing letters or extra words.

linuxhansl · on Nov 23, 2011

I am not entirely sure what the article is trying to get at. Politics and law in open source are real and needed, especially in the face of software patents.

Many contributors develop open source code as part of their paid work, as such it is quite important to establish the legal framework to allow contributing the company's IP to an open source project (which includes necessary patent grants). Committers need to submit an Individual Contributor License Agreement stating that they have the legal right to contribute the code they're contributing. If they worked on contributions as an employee, the company also typically needs to submit a corporate contributor license agreement.

Like it or not (and I personally do very much not like it), you cannot just upload some code somewhere these days.

As such even the FSF is an extremely important organization. Much frowned upon and usually not understood. Open source licenses would not work without Copyright Law, most developers don't know or understand that.

And the whole thing of svn vs git. Personally I don't get it. I use svn when it makes sense and I use git when that makes sense, just like I use Java/Ruby/C++/Python/Javascript/Closure/Scala/whatever or GoogleDocs/PDF/OpenOffice/MS Word(yes) when needed.

The main feature I need from VCS are atomic commits. So CVS is out for me for that reason. Sure git's nice and all, but I spend 99.9% of my time writing or thinking about code and software architecture, not tinkering with my VCS, so as long as it works, I don't care. Easy forking and branching is nice too. In the end, though, just as with Linux there is/are some de-facto master branch(es) somewhere from which "releases" are cut.

Currently with apache it's more convenient to use svn, so I do that.

I don't get the religious opposition against one version control system vs another.

Disclaimer: Apache committer here.

bad_user · on Nov 23, 2011

Ever since I started using Git, I would never go back to SVN, because branching and merging is now an essential part of my workflow, even on small projects on which I work by myself with no other contributors.

     just as with Linux there is/are some de-facto
     master branch(es)

Well yeah, but I don't get how that's an argument for SVN. The thing I like about Git is that branching is now really, really cheap. You can now keep track of dozens of local branches with experiments that you don't have to push to master. You can now share your experiments with a colleague and push to master whenever something is actually ready. You can now also ban commits to the main repository that haven't been code-reviewed (something which is a PITA with SVN). And so on and so forth.

    I don't get the religious opposition against one 
    version control system vs another.

Even though I prefer Git, neither do I, especially since you can just use the Git-SVN bridge :) I've used it for more than a year, it does have some quirks, but it works fine.

Also, the Apache Foundation does its job and does it well. There's room for both anarchy and bureaucracy and both are needed.

delinka · on Nov 23, 2011

Or perhaps "Apache [Foundation] considered harmful" since in common usage "Apache" tends to mean "The Apache web server."

rmc · on Nov 23, 2011

Agreed. The title should be edited.

sktrdie · on Nov 23, 2011

1) Apache Software Foundation and GitHub are two totally different things. Who cares about their internal preferences and bureaucracies. They're both producing outstanding open-source projects which are used by hundreds of thousands of companies and people.

Open-source (and the world) has only gained positive things out of these communities.

2) If you're suggesting that ASF needs to change its bureaucracy, I disagree. Frankly, I feel the bureaucracy has worked, given the success the foundation projects have had.

3) I'm not sure what other points your post brings, but if you're simply just saying that ASF needs to keep itself up-to-date with new tech (dunno git?) then this is also a totally absurd argument since the tech being used in Apache is totally amazing and new.

I feel like your outcry is really aimed at institutions in general... you should probably direct it at governments and other political entities instead of bashing a foundation that has given the world amazing products.

suprgeek · on Nov 23, 2011

Some people at the ASF think that Git may have some issues so are not wholeheartedly endorsing it for any and all projects.

Somehow this gets translated into "Apache considered harmful".

Why is this even on the front page?

funkah · on Nov 23, 2011

I would be seriously pissed if I donated my time and effort towards something like ASF, trying to help the open source world, only to be "considered harmful". WTF.

caniszczyk · on Nov 23, 2011

Well written... I wrote a response last night that dives into some numbers and my experiences with helping the Eclipse Foundation move to Git and Gerrit... http://aniszczyk.org/2011/11/23/apache-and-politics-over-cod...

Apache is pretty much the last major open source community to not move to some form of distributed version control. It's either politics (they host the Subversion project there) or negligence in my opinion.

latchkey · on Nov 23, 2011

Well written.

I wrote a similar article recently as well...

http://lookfirst.com/2011/11/contributing-to-open-source.htm...

luriel · on Nov 23, 2011

This might be slightly offtopic, but might be a symptom of the "institutional"/"organizational" issues addressed in the article:

I always thought Apache 2 and Subversion were two of the best examples of second-system effect. I mentioned this once to one of the core Apache (and Svn) developers years ago, and not only was he blissfully unaware of the effect, he indicated that he had helped build incredibly successful pieces of software (i.e., Apache 1.x) and didn't need any advice from Fred Brooks or anyone on how to do it.

Both Apache 2 and svn have been extremely successful projects, but both were late, didn't really match expectations or even the success of their predecessors, and are slowly being outcompeted by much smaller and usually more efficient projects (e.g., nginx, lighttpd, git, hg) that are developed much more quickly by much smaller teams.

abrahamsen · on Nov 23, 2011

Free project infrastructure wasn't hard to set up five years ago. It hasn't really been a problem since 1999 when SourceForge opened. Before that, the SunSites did a nice job, and before that you basically had to know a friendly university sysadmin (who wasn't _that_ hard to find).

I'm not sure what people think they gain by going under the Apache umbrella, but it must be something since they bother. There is no lack of alternatives.

ck2 · on Nov 23, 2011

I wish at least one open source replacement adopted .htaccess (and httpd.conf) compatibility.

Litespeed is the only product in existence which has made switching over from a complex Apache install a one hour affair, but its free version is limited to 5 hosts and the commercial version can only be justified in a profitable environment.

The performance difference is breathtaking however.

wyck · on Nov 23, 2011

..why isn't project x on git /rant

tl;dr

Github has a great social interface, it is a tool, there are others and there will be more.

Yes, it is an evolution towards DVC + great social interfaces; older industry mammoths will be slower to adopt than newer, more agile projects.

ps. I just switched from git to mercurial and it's a breath of fresh air.

pyrhho · on Nov 23, 2011

As an aside, the Clay Shirky quote ("Institutions will try to preserve the problem to which they are the solution.") was new to me, but puts the RIAA/MPAA pretty much perfectly into perspective. Not really related to the article, but it clicked as I was reading.

aristedes · on Nov 23, 2011

Where to start with this blog post? It appears that the author has seen a couple of private emails and thinks he knows all about the internal workings of the Apache Foundation. He is wrong on so many counts.

His entire dislike of the Apache Foundation appears to be predicated on the fact that the organisation did not force every project to move to this blogger's favourite version control tool. Making a change as large as this requires many different things, but in particular:

1. Community change. How committers interact with each other when there are lots of forks is quite different to the current situation. That suits some projects and not others. Not every project at Apache will benefit. Some will. All who change will need to think long and hard about release processes, merging strategies and much more. Git encourages the idea that every commit or fork is completely equal to every other fork or commit. The Apache Foundation is built on the concept of meritocracy: commit rights are given in response to demonstrated skill. This is not an intractable problem with git, but new challenges need to be solved.

2. Legal change. Right now there is a simple process for signing off intellectual property for contributions which were merged from external contributors (who have not signed a release). That changes with git and becomes more complex. There are solutions, but they require careful planning.

3. Infrastructure. Hosting a large git repository with the level of downtime acceptable to Apache isn't something you do quickly. That needs planning and maintenance.

4. Toolsets. Lots of things in Apache are tied into subversion. From mailing list commit hooks to build servers and much more. Changing those things takes work.

5. Splitting the community. Right now the entire organisation's intellectual property is held in a single repository. Everyone knows where everything is to be found. Changing this simplicity requires a very good reason.

So what do we have now? A blogger who (it appears) doesn't actually contribute code to any Apache project. Telling other people how to run their organisation (which is wildly successful). And that they should change to this blogger's favourite new tool (they should have done it in 2008!) or face irrelevance.

If Apache moved every project to github tomorrow would that satisfy this blogger? More importantly, would that have caused this guy to commit high quality code toward one of the Apache projects? Or is he just blowing a lot of hot air about something he knows little about?

And what brought on this great complaint? That the Apache Foundation is currently running a trial with one project to see how git would work for their workflow, and will then evaluate its suitability for other projects across Apache.

Apache is not Github. That is, Apache is much more than a website, a couple of tools and a repository of code of random quality.

Disclaimer: I am an Apache member, but not speaking on behalf of Apache

msbarnett · on Nov 23, 2011

> 2. Legal change. Right now there is a simple process for signing off intellectual property for contributions which were merged from external contributors (who have not signed a release). That changes with git and becomes more complex. There are solutions, but they require careful planning.

I'm curious as to how exactly you feel git impacts legal processes versus the use of svn. I'd expect that the choice of tools and the legal issues surrounding merges made by those tools should be completely orthogonal.

aristedes · on Nov 25, 2011

With an svn workflow the committer sends each patch in a single authenticated request directly to the Apache svn server. With every commit they are saying "this code is appropriately licensed", even though the code may have come from other committers. The history of that code is completely obscured.

With a git workflow, the push (which is authenticated against a committer who has signed the appropriate license agreement) could contain multiple commits from other sources. This is particularly the case if it includes code from a pull request. The Apache git tree will then have commits with publicly visible attribution to people who are not Apache committers and may not have signed the appropriate license agreements.

I am not saying this is a deal breaker, but it does require some thought. We don't want some contributor to come back three years later and say "that contribution from me: it was only released under the GPL". We need clear guidelines around that original pull request and how copyright/patent signoff happens. Right now, third party contributions go through a Jira patch process which includes a copyright/assignment tick box.
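A minimal sketch of the kind of check this implies, assuming a hypothetical list of commits in a push and a hypothetical set of email addresses with a license agreement on file. None of these names come from Apache's actual tooling; the point is only that a git push can carry several authors, so each one has to be looked up rather than just the person who pushed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical stand-in for one commit contained in a push."""
    sha: str
    author_email: str


def unsigned_authors(pushed_commits, cla_signers):
    """Return author emails in the push with no agreement on file.

    Emails are compared case-insensitively and reported once each,
    in the order they first appear in the push.
    """
    signers = {email.lower() for email in cla_signers}
    flagged = []
    for commit in pushed_commits:
        email = commit.author_email.lower()
        if email not in signers and email not in flagged:
            flagged.append(email)
    return flagged


# Illustrative data only: one committer pushing a merged pull request.
push = [
    Commit("a1", "committer@example.org"),
    Commit("b2", "outsider@example.com"),
    Commit("c3", "outsider@example.com"),
]
signers = {"committer@example.org"}

print(unsigned_authors(push, signers))
```

A real hook would have to decide what to do with the flagged authors (reject the push, or ask for a signoff), which is the policy question rather than the tooling one.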

msbarnett · on Nov 25, 2011

Doesn't simply requiring that all pull requests be squashed down to a single clean commit from a developer known to have signed the license agreements give you back the exact scenario you have under SVN?

Anecdotally, a lot of projects I've been involved with have required that pull requests be squashed to avoid polluting the "main" repo with irrelevant/undesired third-party history.
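To make the squash idea concrete, here is a toy model in Python. It is not git itself, and every name in it is hypothetical. It only shows the effect described: several third-party commits collapse into one commit whose recorded author is the known committer, which is roughly what `git merge --squash` followed by a commit gives you.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    """Toy commit: who wrote it, its message, and the files it sets."""
    author: str
    message: str
    changes: dict  # path -> new file contents


def squash(commits, committer, message):
    """Collapse a series of commits into one attributed to `committer`.

    Changes are applied in order, so a later commit's version of a
    file overwrites an earlier one. Original authors and messages are
    dropped, as they would be from the main repository's history.
    """
    merged = {}
    for commit in commits:
        merged.update(commit.changes)
    return Commit(author=committer, message=message, changes=merged)


# Illustrative pull request with two commits from an outside contributor.
pull_request = [
    Commit("outsider@example.com", "wip", {"parser.py": "v1"}),
    Commit("outsider@example.com", "fix typo", {"parser.py": "v2", "README": "docs"}),
]

squashed = squash(pull_request, "committer@example.org", "Fix parser bug")
print(squashed.author, squashed.changes)
```

Note what is lost: the contributor's name no longer appears in history, so whether they signed off has to be recorded somewhere else, which is the same position as the svn workflow.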

soc88 · on Nov 23, 2011

The ASF is missing the boat here. It is like ... the 21st century? And core people at Apache are not only thinking “SVN should be enough for everyone” but also making it exceptionally hard for projects to use the right tool for their job.

My bet is still on Git being shot down due to some random made-up “quality concerns” in the end.

ASF went from a helpful free software organization to a software graveyard: Ant, Maven, Subversion, Commons, OpenOffice ...

Sad, but that's reality.

aristedes · on Nov 25, 2011

Are you just trolling or trying to make some point? What part of "Apache is currently evaluating git" makes you feel that git is being shot down?

Git is just a tool. In five years there will be another tool that everyone cannot live without. And people like you will be telling Apache that they are dinosaurs because they have not moved to that. Right now some very dedicated and skilled people are donating their free time to running the Apache organisation and evaluating the feasibility of making changes. And you accuse them of lying about quality concerns (which no one has even raised). Meanwhile, your assistance to the advancement of open source is what exactly?

VladRussian · on Nov 23, 2011

push vs. pull. Basically the tools and overall environment grew up to have pull as a feasible model.

jmathes · on Nov 23, 2011

"Considered Harmful" considered pretentious

nknight · on Nov 23, 2011

Much of this is pretty unsurprising, especially for people like me who watched the attempted transition of OOo to Apache.

The way the article puts it is that the ASF is trying to solve problems that don't exist anymore, which is true to an extent, but the deeper problem is that the ASF has a particular view of how open source development and project management work, and attempts to impose that view on far too diverse a community, even as it tries to absorb more and more communities.

The ASF is simultaneously trying to be "big tent" and unified, and the balance is all out of whack. It's easy to draw parallels to recent political problems in the US and EU. In all three cases, there are going to have to be some transformations, probably in both society/community and structure, to come back to a place where the institution contributes to the greater good, instead of being a source of unending tension and meta-arguments.

jzawodn · on Nov 23, 2011

Very well said. I was going to write something like that but don't have to. Yay for up votes!

jowiar · on Nov 23, 2011

I second this thoroughly. I was almost driven to start blogging last week by ASF's poor job of maintaining its projects. There was a small bug in Solr. I was not the first to find this bug, and someone had not only reported the bug, but filed a patch on the bug tracker a year and a half ago. The patch was never merged in, nobody provided any feedback as to why the patch wasn't merged in.

One huge plus with Github is that if the official steward of a project would like to hand it off to someone else, or is failing to maintain it, it is trivial for someone else to take over the project.

plunchete · on Nov 23, 2011

So your problem is not with the Apache Software Foundation but with the committers of Apache Solr. On GitHub you can open a pull request and never have it accepted, so the result is the same as your experience with Solr.

As you say, it's trivial for someone to take over the project and maintain it, but it's not trivial for anyone to find the right fork when a project has 100+ forks.

jaaron · on Nov 23, 2011

Then fork it. It's open source for a reason. Or raise a fuss on the mailing list. That's how open source works.

buff-a · on Nov 23, 2011

Indeed.

Git => designed for forking and reintegrating.

Subversion => fucking painful for either.

I find it ironic that your first post complains that the author is comparing ASF to a sand box, but then you go and suggest that GP should just fork a project. I think you really are missing the point: sandboxing is a bunch of people just forking. A community is when those forks are then cherry picked and re-integrated. Subversion is shit at that. Git is awesome.

I have no skin in this game, but if I were to look at the requirements as you describe them, I'd recommend using git and think you were crazy to use subversion.

v21 · on Nov 23, 2011

I'm not the OP, but if something raises my ire almost enough to make me blog about it, it probably isn't going to irritate me enough to fork a large software project. But the lower the bar for submitting a patch, the more likely you'll get one. And the more everyone will benefit from it (assuming it's a net positive patch).

tolmasky · on Nov 23, 2011

It can sometimes be hard to fork a project just for a patch to one simple bug. Once again, GitHub really shines here: you can go to any project and see all of its pull requests, so you don't have to go hunting for patches attached to bugs, and it becomes quite clear and public when a project isn't properly or expediently merging in patches.

The GitHub layout is really telling of the new open source philosophy. They put the code front and center (main page), and right above it show you with first class status all the bugs it has (issues link) as well as all the proposed changes (pull requests link).

jerf · on Nov 23, 2011

While forking is a solution, it hardly precludes discussion of other less drastic potential solutions to the problem at hand. It does nicely bound the maximum negative impact that Apache social problems can cause, though.

jaaron · on Nov 23, 2011

If you're using git, as the OP suggests, then you're going to fork it just to work on it. Forking isn't dramatic. Code is open source for a reason. If you have a critical bug in code you need running in your infrastructure, take ownership of it and fork it. Then do the dirty work to get the patch pushed back upstream.

If the maintainers really aren't doing what they volunteered to do, then volunteer yourself and get it done.

jerf · on Nov 23, 2011

Sorry, explain to me clearly why that is the One Answer and we must stop discussing alternatives?

You're answering a question I'm not even remotely asking.

jaaron · on Nov 23, 2011

There isn't One Answer. The point is, you have plenty of alternatives. I don't know why the patch hasn't been applied. I know how I can find out though: I can join the developer mailing list and ask. If that doesn't work, I can track down a developer directly (they're not hard to find once you're on the mailing list) and bug 'em until I get a decent response. If it's clear the maintainers aren't doing their job, raise hell on the mailing lists and push to become a committer yourself so you can do the job right.

While that's happening, you can take the approach of maintaining your own patches so that you're not beholden to anyone in particular.

The whole point of open source is empowerment, not entitlement. No one is entitled to get any bug fixed. It's great when it happens but ultimately, everyone is empowered to make things happen themselves.

jerf · on Nov 23, 2011

So you do agree that there are options beyond forking it or merely "raising a ruckus on the mailing list". My point was precisely that there are additional answers and that just glibly saying "Well, just fork it or accept what the mailing list result is" isn't a good summary of the alternatives, and in the context of what you were replying to borders on deceptive.

In the meantime, the fact that I am rich with options doesn't negate the original discussion, which is that the Apache processes are becoming distinctly suboptimal for the context they work in. The fact that I can just take the software and run with it doesn't fix their processes, and the fact that anybody can do so doesn't excuse broken processes. The fact that we can fork does not mean everybody should just stop discussing Apache processes; it doesn't follow.

I'm still in context: "I was almost driven to start blogging last week by ASF's poor job of maintaining its projects. There was a small bug in Solr. I was not the first to find this bug, and someone had not only reported the bug, but filed a patch on the bug tracker a year and a half ago. The patch was never merged in, nobody provided any feedback as to why the patch wasn't merged in." "You can just fork it" is not an answer to this problem. I'd say in its own way it's a disguised confession that in fact the problems with the project are indeed so bad that your only hope is to fork it yourself. Well, that still says bad things about the project, regardless of whether I have mitigation options.

drivebyacct2 · on Nov 23, 2011

Because you act as if a "fork" is a drastic choice, when the reply calmly explained that "fork" doesn't have to mean "hey, let's create a new project and try to poach users into abandoning the original one".

The answer seems to be frank and, speaking as someone with absolutely no knowledge of Apache, maybe an appropriate one. It seems like they have a process that works for them, and they are quite interested in continuing with it. And that's fine. For people who want a more (for the lack of a more succinct way of expressing it) Git/GitHub style project, they can fork it and hack away to their heart's content.

That doesn't preclude upstream adoption of code, and it doesn't preclude discussions of improved workflow within ASF.

quadhome · on Nov 23, 2011

> Or raise a fuss on the mailing list.

No. That's not how open source works.