Abandon your DVCS and return to sanity (bitquabit.com)
331 points by drostie on Mar 3, 2015 | 310 comments



Anyone else remember the days when having your SVN server go down meant that you more or less had to wait until the server came back up before you could get back to doing things? (if you wanted a nice atomic commit). Let's not forget people that have to travel a lot for work - SVN + airplanes is a match made in hell.

Anyone else ever had someone delete or close an open source project that you cared about? Finding a copy with the history that mattered to you could be tricky.

Anyone else ever needed their own fork of something due to differing goals from the parent project, but needed to fold in upstream changes relatively often? That's by nature the sort of problem that a DVCS solves easily.

Other objections to this article:

* Nothing precludes you from using the patch model with a DVCS (I mean, Linux kernel development uses Git just fine with this)

* The author mentions that you have to retain the whole history of a project. For one thing, storage is cheap. Another point worth mentioning is that you can make shallow clones with Git. I don't know what the status is for committing to them these days, but there's nothing fundamental that should prevent a DVCS from letting you work on shallow clones if space is such a big deal.
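For reference, a shallow clone is a one-flag affair; a quick sketch (the URL is a placeholder):

```shell
# Clone only the most recent commit instead of the full history:
git clone --depth 1 https://example.com/project.git

# Later, deepen the clone if you need more history:
git -C project fetch --deepen=100

# Or convert it into a full clone after all:
git -C project fetch --unshallow
```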

I could go on, but the article seems to be griping about UX issues. Just because we haven't had tools in this space with wide adoption that are user friendly doesn't mean that we won't eventually.


> Anyone else remember the days when having your SVN server go down meant that you more or less had to wait until the server came back up before you could get back to doing things? (if you wanted a nice atomic commit).

That's now known as "I can't deploy because github is down". Note that the author proposes "local commits plus a centralized storage".

> The author mentions that you have to retain the whole history of a project. For one thing, storage is cheap.

Not in portable computers. I can get a TB, but that's it. I have build artifacts that clock in at about 300 - 500 MB and I'd version control them if possible. I can't, because that would fill my disk within a couple of months, so I have to push them to a server and somehow link them.

> Anyone else ever needed their own fork of something due to differing goals from the parent project, but needed to fold in upstream changes relatively often? That's by nature the sort of problem that a DVCS solves easily.

That's a strawman. The article does not argue that there's no use-case that's solved by git or any DVCS. It's just that not every use-case is solved by git. I'd even go out and argue that most use-cases are solved just as well by a solid centralized system with less complexity involved.

> Just because we haven't had tools in this space with wide adoption that are user friendly doesn't mean that we won't eventually.

I see a developer wedge his git repo with a pull + rebase about once a month. And then somebody needs to walk over and explain. DVCS fundamentally introduce complexity that is not always needed. And I doubt that the fundamental complexity can be abstracted away.


> That's now known as "I can't deploy because github is down". Note that the author proposes "local commits plus a centralized storage".

And in that case you can scp your repo somewhere and change your origin and move on.
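That recovery really is just a couple of commands; the host and paths below are made up:

```shell
# Back up the repo (as a bare clone) to any host you can reach over SSH:
git clone --bare project project.git
scp -r project.git backup-host:/srv/git/project.git

# Point the working clone at the new location and carry on:
cd project
git remote set-url origin backup-host:/srv/git/project.git
git remote -v    # verify the new URL
```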

Also, do people really rely on 3rd party services for their deployed code? I've used github as a repo and collaboration tool at companies, but we still deployed from repos owned by the company.

> I'd even go out and argue that most use-cases are solved just as well by a solid centralized system with less complexity involved.

I'd go out and argue against that, because the added complexity of git is far outweighed by the benefits of a local repo.


Hell, for a ton of companies (including mine) everything is on 3rd party services/servers. We don't have any servers we can actually call our own.


Which is fine, but the very obvious and up-front trade-off of using servers that aren't yours is that they aren't yours, and everything that implies. There are plenty of benefits as well, but that's why it's a trade-off.


    Also, do people really rely on 3rd party services for their deployed code?
Yes, this is very common.


Also the fact that each developer has a full copy of the repo on their laptop lets me sleep at night wrt github exploding.


Or your company exploding or pretty much anything exploding. When every dev has a full copy of the repo, you can restore all the work as long as a single dev machine is alive.

Not to mention that in case of an office Internet outage, you can just mesh together those repos and continue working. Knowing that gives me some peace of mind.
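A sketch of that mesh, with placeholder host and path names:

```shell
# With no central server reachable, clones can fetch from each other directly:
git remote add alice alice-laptop:/home/alice/project
git fetch alice
git merge alice/some-feature    # or cherry-pick, review, etc.
```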


Cloning doesn't require the full repo. Some of your devs may only have half of the repo. Look at the --depth option for git-clone.


Or, you could use Mercurial and sleep even better knowing that not only does every dev have a full copy, but does NOT have thousands of orphaned files of each revision or potential revision like Git does.


It also means that you get to lose control over all data in the repo including all versions and history when a single person loses the laptop, even if it's a person that only required read access to a tiny part of the data.

It's a valid tradeoff to make, but still something I'd keep in mind.


I'm going to go ahead and say that that's a red herring.

A successful business is built upon so much more than whatever code is in your repo.

Any competitors are probably more likely to look at the code and ignore it in favor of their own home-grown solution than to adopt it wholesale...and if they can beat you with your own code, they would've beaten you without it, because they probably run a much better business game.

Trade secrets are like the stupidest reason ever to hobble your developers.


I think you're wrong on that account, there's tons of pain that can be caused by such a loss. But I was thinking more about the SSL certificates and SSH keys and hardcoded passwords and other secrets that inevitably seem to end up in repositories. (Rails cookie secrets anyone?) Some of that data gets deleted but not purged correctly and oops, there's that one server that still accepts last year's key, didn't we delete that from the repo? I've seen some repos that could easily create damage in the 6-7 figure range if they ended up in the wrong hands. Enough to sink a small company in any case.


All of that is configuration, not code, and should never be in any repo external to one's company. Nor, frankly, should it be available to the developers. That's all production-specific, and should be restricted to deployment.

If your developers are deploying into production directly from their laptops, you have a problem.


So, admins never use version control for puppet manifests/chef cookbooks/terraform/saltstack/whatnot and all the config data that's put on the servers? And it would not be of value if we could version the configuration along in the same repository as the code to which it belongs? So it's easier to track? Now, currently we can't, because DVCS don't allow us to grant different privileges to different people/groups, but I could very well imagine that developers get access to a repo where they can change the configuration to the dev/test environments and the admins match that in the same repo for prod and other envs with higher level needs. I could also imagine building the final deployment from a single repo instead of having to merge two repos because I need to separate them due to permissions.

I could also imagine designers placing their assets in the repository without needing to deal with git or even seeing the source code that lives side-by-side. That totally used to work with SVN. I'd totally like it if the repository contained the PSD sources for all the assets/mockups that are relevant for a given version of the source, but alas, you can't check out the source without getting the PSDs, even if you're not interested in them. Heck, SVN even allowed exposing the repo via WebDAV, which you could mount as a networked folder, so people could access the last revision just as if it were an SMB drive.

git is a huge step forward in some regards, but we also lost quite a couple of good things along the way.

> If your developers are deploying into production directly from their laptops, you have a problem.

I never said they do.


So, the performance of binary assets in SVN is exactly why we used to--when doing game development--check all the source files out into their own repository, and occasionally sync over.

As for the configuration being in the same repo as the code--again, why? That tells me that you aren't packaging releases, or if you are, your documentation is less useful as a reference than your code.

These are antipatterns, again, and "fixing" your VCS isn't going to help in the long run.


> So, the performance of binary assets in SVN is exactly why we used to--when doing game development--check all the source files out into their own repository, and occasionally sync over.

The assets for game development are likely bigger than the assets in web development, but we never had major issues with binaries. Things also improved much in later SVN versions (in the beginning the client was way too naive and tried to diff binaries, which obviously could not work).

> As for the configuration being in the same repo as the code--again, why?

Why not? I don't have that right now, but I'd like to have it. Because I could easily compare the state of the code and the state of the configuration at any time at any commit. I could use bisect to figure out at which point things started failing. Helps when you're trying to figure out when config and code started drifting apart. I can use google's repo tool or submodules, or stitch together the config repo and the code repo by matching up dates or tagged versions, but that's all a hack. It's maybe the best hack currently available, but I still think there's room for improvement.

> That tells me that you aren't packaging releases, or if you are, your documentation is less useful as a reference than your code.

I can't imagine how you reach that conclusion. You could not be further from reality. Just because I'd consider it nice to have a single version identifier in a single repo that matches up both config and code doesn't mean I'm not rolling that into a package. And how does that relate to my documentation?


> So, admins never use version control for puppet manifests/chef cookbooks/terraform/saltstack/whatnot and all the config data that's put on the servers?

Of course they do. But those repos should live on an internal repo server, and on the production systems themselves.

Copying production authentication credentials to a non-production machine should be a firing offence.

> And it would not be of value if we could version the configuration along in the same repository as the code to which it belongs? So it's easier to track?

Hell no. Code without configuration is like a gun without bullets: it's an interesting piece of work, but that's it (I'm not a huge believer in intellectual property, you can tell). But configuration is the keys to the kingdom. If a cracker has your app's code, he might be able to figure out some flaws in your protocols or your implementation, but if a cracker has your database passwords, your hostnames and IP addresses, your firewall configuration, your bastion hosts—then you're dead.


Enlighten me, I'm seriously interested in knowing, because I'm missing the point:

Currently there's code in one repo and configuration in another. Both repos are on the same internal repo server. Both repos are accessible to multiple persons - some can access the code only (developers) and some can access the configuration repo only and some can access both repos (admin). A build process tags both repositories and builds an artifact that gets deployed.

How is that situation superior in terms of security over:

Code and configuration live side by side in the same repository that supports access controls. The repository is hosted on an internal repo server. The parts that are code are accessible to developers only and admins can access both parts. A build process tags the repository and builds an artifact that gets deployed.

The only point I could see is that with two repositories it's harder to mess up the authentication, but I doubt that's true. In both scenarios we have people with access to the configuration, and in a lot of organizations those people will have a copy on their laptop that they carry around. That's how most people use version control. In both setups it would be possible to only ever handle the sensitive data on a remote system, but that's a property of the workflow and not a property of the VCS used. I seriously don't see the issue with a shared repo - if it supports access controls.


Well, you do that anyway when you allow people to check out a local copy of the code. Just as in most VCSes, you can set up a git server to only allow checkout of specific branches.


>Well, you do that anyway when you allow people to check out a local copy of the code.

You grant that permission to the person legitimately checking out code, but not to the person finding or stealing a laptop with a clone of a repository. The latter is a side-effect of how a DVCS works. In SVN you don't even need to expose the full history, you can grant access to the last revision only.

> Just as in most VCSes, you can set up a git server to only allow checkout of specific branches

In SVN for example you can restrict people to single directories (or even files - I don't remember exactly). That at least is impossible in git. I can prevent pushes using hooks but not reads.
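Right - server-side hooks can veto writes but never reads, since no hook fires on fetch or clone. A minimal pre-receive sketch (the protected ref name is an arbitrary example):

```shell
#!/bin/sh
# hooks/pre-receive on the server-side (bare) repository.
# Rejects any push that updates the protected ref; reads are
# unaffected, because no hook runs on fetch/clone.
protected="refs/heads/release"
while read oldrev newrev refname; do
    if [ "$refname" = "$protected" ]; then
        echo "pushes to $protected are not allowed" >&2
        exit 1
    fi
done
exit 0
```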


> You grant that permission to the person legitimately checking out code, but not to the person finding or stealing a laptop with a clone of a repository. The latter is a side-effect of how a DVCS works.

I'm not sure what you're getting at. What difference is there (not that you would allow checkouts on unencrypted laptops anyway)?

> In SVN you don't even need to expose the full history, you can grant access to the last revision only.

> In SVN for example you can restrict people to single directories (or even files - I don't remember exactly). That at least is impossible in git. I can prevent pushes using hooks but not reads.

These restrictions may be useful in some cases, but I would wager that they are needed far less often than some of the advantages of git (like being able to work offline).


> I'm not sure what you're getting at. What difference is there

A checkout from SVN/CVS only contains the last version. Files that were deleted in an earlier version are only on the server. A clone of a DVCS contains all versions and all files that were ever in the repo (unless you use BFG or git-filter-branch, but people tend to forget that). So a clone can contain secrets that people are not aware of, such as accidentally committed and then deleted files. An interested party could find stuff in the history that you wouldn't be aware of from looking at HEAD.
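For the record, digging such files out of a clone takes two commands; "secrets.pem" below is an example path:

```shell
# List every file that was deleted anywhere in the history of any branch:
git log --all --diff-filter=D --name-only --pretty=format:

# Recover the contents from the parent of the commit that deleted it:
git show "$(git rev-list -1 --all -- secrets.pem)^:secrets.pem"
```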

> (not that you would allow checkouts on unencrypted laptops anyway)?

That's not my call to make, but I agree on that regard. Reality is sadly different from what we both wish.


> An interested party could find stuff in the history that you wouldn't be aware of from looking at HEAD.

Well, that goes without saying. But I think that security argument is a poor one compared to the huge benefit of having the history locally to inspect.

We've had instances where secrets were committed to local repositories by accident. It never got past review and into the master branch. If it had, we would probably have taken the effort to rewrite that commit out of the history.


> Well, that goes without saying. But I think that security argument is a poor one compared to the huge benefit of having the history locally to inspect.

If you go further upthread you'll find that I said "a valid tradeoff, but one I'd keep in mind"

> We've had instances where secrets were committed to local repositories by accident.

That's laudable, but countless examples show that not everyone is that diligent. I'd love it if I could lock down some parts of some repos so that they're only accessible by people that I have an elevated level of trust in (and where I can enforce a certain security level on the laptop).


> That's laudable, but countless examples show that not everyone is that diligent.

Sure, then again I would guess that the ones who are not that diligent are not likely to apply those access restrictions that you mention (although the "one revision" advantage is something they would get "for free" with SVN).


In any given larger organization there are people that exert control over only parts of the whole. I could possibly argue to tighten down security on parts of a repository for some people within boundaries (like declaring some folders unreadable to folks that don't need access to them) but I can't deny them all access, since they need some of the content stored in the repo. With git that's currently all or nothing, which exposes a flank that I'd prefer closed. In this particular case it's not a terrible issue, but for other folks with other data it can quite well be, so the tradeoffs may end up being in favor of SVN. I can imagine that that's one of the reasons I still see SVN deployed in corporate installations.


I'd go out and argue against that, because the added complexity of git is far outweighed by the benefits of a local repo.

But for precisely what use cases or in what situations? If one is a novice and just wants to have source control and versioning, centralized systems are going to have the better cost/benefit.


> If one is a novice and just wants to have source control and versioning, centralized systems are going to have the better cost/benefit.

This seems completely backwards to me. When I was in college, I thought it was a giant pain to source control my homework, because the options I was aware of (since all I knew about was SVN) were 1) public hosting on sourceforge (which was itself a pain to use), which professors were not keen on, 2) find a private host somewhere (not sure I ever did), 3) run my own server somewhere, 4) configure and run a local server. I went with (3) and (4) but it was never an easy or novice-friendly solution, and mostly I just didn't version control things. The local version control model of DVCS would have been much easier.

It's only when working with other people that any extra complexity even rears its head at all, which doesn't come up if you're a novice who just wants to have source control and versioning.


a purely local svn repository never needed a server.

svnadmin create /path/to/repo/on/local/disk followed by svn checkout file:///path/to/repo/on/local/disk is all it ever needed.


Same for git, though.
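For completeness, the server-less git setup looks like this (paths are placeholders):

```shell
# A working repo needs no server at all:
git init myproject

# A bare clone on local disk can act as a private "origin",
# much like the file:// SVN repo above:
git clone --bare myproject myproject.git
git -C myproject remote add origin "$PWD/myproject.git"
```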


> And in that case you can scp your repo somewhere and change your origin and move on.

If you happen to have - god forbid - a php app that pulls dependencies via composer you have a hard dependency on github since composer pulls practically all code from GH. I don't consider that a good idea, but that's how it currently is. [No, I don't do php and only sometimes have to clean up the resulting fallout]


If particular frameworks want to tie themselves to a single server/host, that's their problem, and is specifically not a problem with any of Github, git or DVCSes generally.

You really cannot stop all people from being idiots all of the time.


That's not a hard dependency, it's a sensible default that you should be overriding if you're running composer install as part of your deployment process instead of including composer's vendor directory in your repo. Documentation: https://getcomposer.org/doc/05-repositories.md


Practically all packages use git sources. Yes, you can vendor them - congratulations, you just kicked the can down the road and moved the issue to the build step. The issue exists for other languages as well - maven/mavencentral, ruby/rubygems.org - but only composer depends on github very much. I don't like that but it's not like the other solutions are much better.


Many companies mirror maven, so having mavencentral go down is not a big issue. Does composer make it possible to deploy a mirror of its dependencies?



You can use your own repository, so at least in theory that should be possible. I haven't done that in practice, so I can't comment any further.


I would bet its dependency is git, not github.


> Not in portable computers. I can get a TB, but that's it. I have build artifacts that clock in at about 300 - 500 MB and I'd version control them if possible. I can't, because that would fill my disk within a couple of months, so I have to push them to a server and somehow link them.

git-annex is a pretty good solution for this: https://git-annex.branchable.com/

> That's a strawman. The article does not argue that there's no use-case that's solved by git or any DVCS. It's just that not every use-case is solved by git. I'd even go out and argue that most use-cases are solved just as well by a solid centralized system with less complexity involved.

I'm not convinced that it is a strawman - I'm not an uber-developer, but I've had to do it reasonably often. A common case is that the originator loses interest, and you still need to do maintenance on it. I'd rather my go-to choice of a version control system support that notion than learn a different tool to deal with this case.

> I see a developer wedge his git repo with a pull + rebase about once a month. And then somebody needs to walk over and explain. DVCS fundamentally introduce complexity that is not always needed. And I doubt that the fundamental complexity can be abstracted away.

Sure, there is an additional level of abstraction, but to argue that it's insurmountable seems rather pessimistic. Most technologies that we take for granted now required multiple decades to get to a consumer-friendly state. This is perhaps a philosophical difference that only time will answer.


I'm not a fan of solutions like git-annex, every time I've seen people try to do this type of stuff, two years down the line they end up having built a buggy knockoff of maven/gradle. Why not do things cleanly from the start? The entry barrier is really not that high, and the benefits are huge.

Then again maybe I'm a bit psycho-rigid on build management :)


You're absolutely right, from my experience. I've seen many shops using svn:externals to tie build dependencies together. And now that git is starting to see wider adoption within industry (not just software companies), the same is attempted with git submodules and git-annex. It's horrible, because this is not the use case these features were designed for, but people try anyway and eventually run into roadblocks.

This tends to happen especially in shops that migrate off process-heavy monsters like clearcase and MKS. These tools do a lot more than just version control, and no open source version control system by itself matches what users of these systems expect. You have to throw at least an issue tracker and a good build tool into the mix as well.


Totally agree, big organizations need something integrated. With GitLab we include the issue tracker and build tool you mentioned as well as having git-annex for large binaries.


> The entry barrier is really not that high, and the benefits are huge.

Because of "we'll just add a submodule and we don't need a build system and it's only a single one and we'll replace that with a proper build tooling as soon as we have time". And then it's only a second one and suddenly submodules pop up like mushrooms after a light summer rain. It requires planning and forethought, just like a lot of things that should be an obvious huge benefit, but planning and forethought are in short supply in this industry.


What are you suggesting the right way from the start is? Just using maven/gradle?


A million times yes, if only for the dependency management side. Can you remember the dark ages of having to download libraries from the internet and check them into your source? Arghhhhh


git-annex is a tool for versioning files with git that you don't want to actually store IN git. It is not a build automation system.


Have you checked out ipfs?


That's now known as "I can't deploy because github is down".

git != github. I use git every day at work. I don't use github for that repo. If I did, I wouldn't make reliance on it part of my deployment strategy. And if I did do so, I hope I would realize that any problems that reliance on github caused were ones I created with that choice. Hopefully there would be some useful benefits from that choice as well.


> Not in portable computers. I can get a TB, but that's it. I have build artifacts that clock in at about 300 - 500 MB and I'd version control them if possible. I can't, because that would fill my disk within a couple of month, so I have to push them to a server and somehow link them.

Why in the world would you check in build artifacts into source control, especially those that clock in at 300-500 MB?


Because you are accustomed to centralized version control systems that do not punish you for doing something that dumb, and refuse to learn new tools when you are forced to use them.

Anyone checking in 500MB artifacts into git is almost certainly refusing to use git correctly.

It is like somebody who grows up using hand tools, gets handed a power screwdriver, then tries to use it like a pry-bar. You should not use a screwdriver as a pry-bar; that is an abuse of the tool. Nevertheless, many people abuse screwdrivers as pry-bars because conventional hand screwdrivers tolerate this practice.


Thanks for assuming I'm an idiot without knowing the constraint I'm trying to fulfill ;)


After a few days please review that comment of yours. Could anybody in an internet conversation really know all details of your constraints?

Maybe, maybe(!) there is an argument where you would need to version huge binaries which you could generate out of the same already-versioned sources, but even if there is, there are methods to back up huge blobs, and git is simply not one of them.


> Could anybody in an internet conversation really know all details of your constraints?

No - nor do they need to. But everybody on the internet could just assume that _I_ know them and refrain from saying that "I'm doing something dumb" and "refuse to learn new tools". So I'd also ask you to review that comment of yours - you're doing the same thing - you imply that "Maybe, maybe(!)" there might be a use-case, so in fact you doubt that I reviewed and chose my tools.

I reviewed my comment, and I'm at peace with it.


In that case, why did the parent poster assume anyone trying to check in a large binary file is "an idiot"?


The "idiot" part of course is wrong. But if you understand a technology and then use it for something it is explicitly bad at, then it's objectively a bad idea. The chance is very low that you would not be able to solve a binary sharing/backup problem with another tool that's made for these tasks.


> "almost certainly"

You might have a legitimate reason for putting 500MB artifacts into a git repo, but I reeeaaally doubt it.

It is a poor craftsman who blames tools that he is intent on misusing.


You're putting words in my mouth that I never said. I'm not blaming git for anything, nor do I consider git a bad tool. I'm using git for practically all my versioned data. I just don't consider it the tool that needs to solve all cases where I - legitimately or not - want to version data. A good craftsman should have more tools at his disposal than just a blunt hammer.

So you can doubt my use case but I consider it severely impolite to pretend that you know better. Actually you're quite nicely illustrating one of the points in the article.


And you're still not even trying to offer a real explanation, just engaging in a flamewar.


It's all discussed further down in the thread, and I gave as much info as I could. I'm not at liberty to discuss details in public - nor do I need to.

Further down on the page somebody else also mentions a legitimate use case for version-controlling large binaries (needed for comparison). Another use-case I've seen is version-controlling rendered video output and keeping the comments and metadata attached to the versions. Works just fine with SVN, fails hard in git. Yet another use case for a system that handles binaries better is what the rubygems folks do - they vendor all gems that a particular version of rubygems.org depends on so they can bootstrap without rubygems.org being available. They built a custom solution using multiple git repos which works for their use-case (it's been discussed on rubygems). Arguably, having a system where versioning large amounts of binary data works better than in git would have prevented that issue.

So there are use-cases that are ill-suited for what git can and cannot do - and just because I say I have one I get to be called a bad craftsman by someone who doesn't know the least bit about what I'm trying to do?

And now you're saying that I engage in a flamewar when I point out that I consider that an insult? Please note that I have not insulted Crito.


The real explanation is that someone would want to version binary data. Like say, images. The parent poster claimed that anyone who would want to version binary data is an idiot.


The original article addresses this.

"These are large, opaque files that, while not code, are nevertheless an integral part of your program, and need to be versioned alongside the code if you want to have a meaningful representation of what it takes to build your program at any given point."


In this case, git isn't the right tool for the job. Large gaming companies, IIRC, use Perforce for this reason.


> I see a developer wedge his git repo with a pull + rebase about once a month. And then somebody needs to walk over and explain. DVCS fundamentally introduce complexity that is not always needed.

Rebase and DVCS are orthogonal concepts. Rebase is a consequence of offline commits, not of decentralization. And I greatly prefer dealing with a `git rebase` issue than the equivalent in Subversion, which is to have your uncommitted local changes get irrevocably modified by the `svn update`, with conflict markers placed in your code and no way to abort the whole process.


But... but... but... git bad!


It's a prevalent meme, isn't it? I'm surprised at how many people never even realized how fundamentally dangerous it was to issue an `svn update` with local changes. And perhaps less seriously but still problematic, how `svn commit` can put the repository into a state that never existed on any developer's machine (because it effectively does a rebase on the server), which is not always safe to do even when there's no merge conflicts.


> how `svn commit` can put the repository into a state that never existed on any developer's machine

git add -p creates the same issue for git (a commit that never existed on disk), and git rebase with squash and/or reordering of commits does as well. That's usually considered best practice.
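That first point is easy to demonstrate in any scratch repo:

```shell
# Stage a file, then change it again before committing; the commit
# records the staged version, which no longer matches the worktree:
echo v1 > f
git add f
echo v2 > f
git commit -m "records v1, while the disk holds v2"
git show HEAD:f    # v1
cat f              # v2
```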

The problem I often observe with git (pull -)rebase is that people have a hard time wrapping their mind around the fact that all commits change, when they change, and what state the checkout is in when they get a merge conflict (remote HEAD plus whatever was already applied). The conflict is often unavoidable, and you can argue that rebase makes it easier to resolve the conflict, but it's harder to reason about than "ok, update does a merge and I get whatever is on the server plus my local changes".


The difference between `svn commit` and the various forms of `git add` or rebasing is that the developer can test the commits (or visually inspect them) before pushing. In fact, the argument that `git add -p` produces a commit that didn't exist on disk is exactly the same as the argument that developers can commit code that they never actually built & tested. Which is to say, the important point isn't that the commit tree existed in isolation on the developer's disk, but rather that the developer did any testing necessary to verify that the commit is good. And the reason why `svn commit` is bad in this regard is because it creates the never-before-seen result on the server.
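If the project happens to use git, one way to do that verification mechanically is to replay every unpushed commit through the test suite before pushing. A rough sketch (`make test` is a placeholder for whatever check the project actually uses):

```shell
# Run the check on every commit between origin/master and HEAD;
# the rebase stops at the first commit whose command fails, so broken
# intermediate commits are caught before they are ever pushed.
git rebase --exec "make test" origin/master
```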

> The conflict is often unavoidable

If the conflict is unavoidable, then you're going to get a conflict regardless of how you go about it (rebase vs merge vs `svn update`). The difference is with `svn update` you can't abort the whole process and start over if you need to, because you've already permanently lost the previous state of your work tree. Whereas with `git rebase` and `git merge`, you can abort and you'll get the exact state you had prior to the command.
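A minimal sketch of that escape hatch (branch and remote names are placeholders):

```shell
# A pull --rebase that hits a conflict can be backed out completely:
git pull --rebase origin master
# ... CONFLICT (content): merge conflict reported ...
git rebase --abort   # branch and working tree return to the pre-pull state
# The same escape hatch exists for merges:
git merge --abort
```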


Why are you putting build artifacts into source control?


Because they're intermediary steps of a process, regenerating them takes an hour and I don't feel like setting up everybody's environment to build them. Most people don't need the capability, but they need the result.

Why are you asking?


I think the question asked was closer to "why aren't you using an artifact repository?"

Nexus is pretty good, but if the language you are using isn't integrated well with gradle/maven you can always just use a shared drive fed by jenkins builds.


> Nexus is pretty good, but if the language you are using isn't integrated well with gradle/maven you can always just use a shared drive fed by jenkins builds.

Here's where I start to have problems with contemporary development culture. You mentioned using Nexus, Gradle, Maven and Jenkins, where the guy just wants to keep some binaries along with the source code they're generated from.

We're complicating things beyond reason nowadays.


To bring it back to the OP, this argument is in fact represented in the OP.

OP is arguing that these (having to use Nexus, Gradle, Maven, and Jenkins just to keep some binaries along with the source code they're generated from) are workarounds for limitations in git that ideally would not be there (and don't necessarily have to be there, and aren't there in all VCSs), and that instead git fans want to claim "No, that's just the way git SHOULD work, you SHOULD need to go use an 'artifact repo' in addition to git to keep a few binaries with your source code".

I tend to agree with the OP.


Agreed.

That said--and this is without knowing the exact build and tooling environment, so I may well be giving advice inappropriate to the situation at hand!--the second part of that "keep some binaries along with the source files they're generated from" is kind of an antipattern.

If it takes too long to generate them from source every single time, they need to fix that issue--not least because slow builds mean slow testing, and slow testing means no testing.

That's why I spent two days earlier this year moving a 30-minute build down to a 2.5 minute build.


Yes, your advice is sound for the general case but inapplicable in this specific incarnation of the problem.


Out of curiosity, what is the gist of your setup? Why is this incarnation such a departure from the general case?


The output of the build is basically a tool in itself. So most people don't need the build process, just the resulting tool. The input changes on pretty much a monthly basis and is not easily versioned. I could set up all dev machines to support the build and everybody could build it themselves from sources, but that would require me to

  * install all the required instruments for the ritual rain dance
  * teach the whole team how to do the ritual rain dance
  * support the people that break an arm or a leg doing the ritual rain dance
So I prefer to perform the dance on my machine, collect the tool, and point the main dev environment to the right location. It's all scripted, so I kick off the job and grab a coffee. However, I want to keep old instances around so I can track when bugs crept in, so I can't just overwrite the result, and I need to adjust the pointer every time. If I could just check the tool in with the regular dev setup that would be much easier, but - since we're using git - that would blow up quite quickly and overwhelm my disk space. (And folks would kill me for filling up their disks as well - rightfully so.) That's something SVN or another centralized VCS would handle much more gracefully. In short, I have a use-case for fairly stupid versioned file storage with a push/pull API. No complicated merging, no branching, nothing. git-annex could do it, but is overkill.

There are some better solutions to what we're currently doing, but there's so many yaks to shave and so few razors.


> That's something SVN or another centralized VCS would handle much more gracefully.

Subversion handles this gracefully because you don't download all of the repository's history to your machine. It's a trade-off that you're talking about here. Most people are ok with losing the ability to check in 500MB files in order to gain the decentralization of having a full copy of the repo (and not needing to query the server just to view history).


Thanks for such a good explanation. When I first saw artifacts of things generated by code in our repo, I had a big WTF moment, but it made a lot of sense once someone pointed out that it was Rather Handy for catching bugs in the code that does the generation.


What if, instead of binary build artifacts, they were images?


> We're complicating things beyond reason nowadays.

There isn't always a way to dumb things down to the level that people would like. It would be so much easier to get to work if I could fly in a straight line between work and home, but I don't complain that the world doesn't accommodate me.


He also mentioned using a simple shared drive. It would have been difficult to know which of these solutions was the right level of complexity and capability without knowing more context (a point which he seemed to make clear).


Shared drives are mutable which is a big problem with build output.


No, I didn't. I mentioned a shared location. It's not a networked drive; I'm not insane.


Fair point. Comment retracted. My apologies.


Valid question: Because the tooling does not think in terms of artifacts that are versioned in repositories, but rather in terms of "files" that are in a given location. I'm using a shared location, but every build requires modifying another file to point to the now current version. It's all solvable, but the easiest solution would be to just version the result.

I could fix the tooling, but alas, I have other yaks to shave. It's an imperfect world.


> but every build requires modifying another file to point to the now current version

If you could version the file in git, you would have to check in the new version of the file, so it's not like you're adding a step to have to update (e.g.) a symlink.


But I have a lot of files lying in a shared location that are named build-<datetime>, and I can accidentally break revisions in the repo by moving/renaming/deleting any of them. That may be a feature to some people, but it's something I consider a weak spot. It's brittle and prone to breakage, and I dislike brittle.


People get pretty religious about what should (or not) be checked in to version control.


I can tell from the downvotes :)


It's designed with a particular use-case in mind. When people complain that square pegs don't fit into round holes, it makes more sense for them to step back and evaluate what they are trying to accomplish, and the tools they are using to accomplish it.


>> Because they're intermediary steps of a process, regenerating them takes an hour and I don't feel like setting up everybody's environment to build them. Most people don't need the capability, but they need the result.

Put your build tools in a repo. It should be easy to set up a new developer with a complete build system by getting it from a repo. Now use make or anything that can check dependencies so those things are not regenerated unless they need to be. Always check in a freshly generated file along with its source.


Setting up a build system can be a bitch. Not every developer needs to be able to build every obscure module of all your company's tooling. A .NET developer shouldn't have to rebuild C++ Boost just because he relies on a small native DLL.

In some cases the tooling of one developer can even conflict with the one of another, one requires python 2.7 to be in your PATH while the other requires 3.3, etc.

I fully agree that it should be easy to set up but the reality is different, especially when you deal with legacy, rely on third party libraries or build open source projects from source.


What is annoying is when build artifacts are checked into source control systems alongside the source. There is no easy way to exclude them or check out only the binaries; you can only have both. This happens all too often, and I don't care if you want to keep using Subversion, you really should stop doing this.

    /src/foo.c
    /src/foo.o


Not the OP, but in our product we have very large binaries that are the output of the product itself. Each time a change is made to the program, the output needs to be compared to the existing output first automatically and if the output isn't byte-by-byte identical then manually (visually).

So essentially, without large binaries in source control we'd lose testing.


I think you're arguing in the same direction I often argue: Git is not for everybody. If you don't want to spend the time to learn git, you're probably better off using something else.

Additionally I would like to add something: there are more cases where you need it than you think, especially if you haven't learned it. E.g., you start a little prototype to present to your boss. It succeeds, and boom, ten years later you have a 100-man team working on the same prototype you started 10 years ago. Suddenly you really need forks, branches, some people spend their whole day merging stuff, etc. But hey, you can't switch from SVN because you chose it in the beginning and now everybody is using it; all kinds of scripts, tools, and optimized workflows require that you continue to use it.

Some people might think that spending two weeks in the beginning to learn to use a power tool is much better than saving that time up front and thereby getting years of pain later on.


I agree: whenever you find yourself saying "I don't have time to learn this advanced tool that will make my life better, I have to ship now!", you should really take a moment and carefully verify that this is a good choice. (Sometimes it is a good choice though.)

In the world of science, I have colleagues who have said this for ten years wrt. LaTeX vs. Word, or even just learning EndNote vs. typing up reference lists by hand. I cringe every time I watch them spend an order of magnitude more time on pure overhead, and always getting stuff wrong, for every single paper they put out.


That's a really good reason: When you are unable to learn it correctly (maybe simply because it's too complicated). With the simple tool you are still not able to do all the cool stuff, but at least you can get something simple done fast.


> That's now known as "I can't deploy because github is down".

I had that happen a few months ago. Had to push the repo to bitbucket and change where a config file pointed to. It took almost as long as doing a merge in SVN.

> I have build artifacts that clock in at about 300 - 500 MB and I'd version control them if possible. I can't, because that would fill my disk within a couple of months, so I have to push them to a server and somehow link them.

Those are very unlike source code. You're not going to want or be able to diff them, or blame them, or view their history. A different tool is appropriate.

> I see a developer wedge his git repo with a pull + rebase about once a month. And then somebody needs to walk over and explain.

Yeah. I see that at about the same rate I saw developers lose data back in the svn days. Git can "wedge itself" but never in a way that induces data loss, IME, in stark contrast to its predecessors.


> view their history

You can view the history of binary blobs in a git repo.
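For instance (the file path here is made up), inspecting and retrieving old revisions of a binary works much like it does for text:

```shell
# Commit history of a single binary file, following it across renames:
git log --oneline --follow -- assets/logo.png
# Extract the version from five commits back for a side-by-side check:
git show HEAD~5:assets/logo.png > logo-old.png
```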


The main reason for the abandonment of centralized version control was as much about Git and Mercurial having the advantages of DVCSes as it was about SVN starting to show its age.

I found SVN literally painful to use, perhaps because it was based around CVS's standards, and CVS was originally a series of shell scripts written in 1990.


> That's now known as "I can't deploy because github is down".

No. No it's not. "I can't deploy" is different from "I can't work," unless you "move fast and break things" by deploying as soon as you push your code. Even then, git allows you to continue to work, even if you haven't deployed your code.


"I see a developer wedge his git repo with a pull + rebase about once a month. And then somebody needs to walk over and explain."

Accidents happen, but the pattern of letting people make highly predictable mistakes and then calling in the experts is the result of not investing in basic training (and possibly of not hiring smart enough people, but that's another matter).

By the way, that "wedged" repository in all likelihood has an uncorrupted working copy containing all changes and can be repaired offline: Git is technically so superior to Subversion and TFS that it isn't even fun.


How do you get local commits plus a centralized storage without basically building a DVCS?


Local commits do not require the full history, only a known base. So you could in essence store the commits relative to the last pull from the server and push those later. No need for full history.


A DVCS without complete history is still a DVCS. In fact, what makes a VCS distributed is much more the local commits than the local history. At least in my opinion.

But well, however we decide to call that beast, a VCS with distributed commits and centralized (fetchable) history, with a simplified evolution graph (created from the point of view of a central server), would bring 99.99% of the benefit of a centralized VCS, 99.99% of the benefit of a DVCS, and almost no problems from either of them.


git practically already has this with the --depth option to git-clone. The only thing it's missing is the ability to enforce storing only the last commit.

I'm sure that you could probably build the system that you are describing on top of git, but you would still run into problems with large files and local commits, even if it was just "commits since last pull/push."


Yes, git almost has the shallow repository done. It's even in the article, and that "almost" means it's almost trouble free, but will catch everybody off-guard once in a while.

The other part about the simplified evolution graph is completely missing. I don't think one can solve that without completely rewriting the protocol.


You could indeed do that. But now you have two different kinds of commit that can conflict in surprising ways. I very much doubt you could manage this in a way that's simpler than what git does (remote branches and local branches are branches, you can merge one into the other using the normal tooling you use for merging branches).


I'm fairly sure that at least on the UI side you could improve over git, but the goal is not necessarily lower complexity. A semi-distributed system could have other advantages, for example fine-grained access control (read permissions at folder or file level), support for partial checkouts, better support for large binaries (only the last revision in the local repo, older revisions on the server), etc.

git currently can't do that because of the way it's designed and built.


I have to agree here. You only need a starting point. In essence a commit is a diff against that starting point, so "cloning" a single commit should be enough to create a new one and push it.


git has 'shallow' clones, where it checks out the tree as it was at a particular commit, and has a dummy commit which replaces the history of the repo up until that particular commit.
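Roughly like this (the URL is a placeholder):

```shell
# Fetch only the most recent commit instead of the full history:
git clone --depth 1 https://example.com/project.git
cd project
# If the full history turns out to be needed later, it can be filled in:
git fetch --unshallow
```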


A semi-distributed DVCS[1], which is what the author actually advocates, allows for local commits and work with no connectivity.

[1]: https://code.facebook.com/posts/218678814984400/scaling-merc...


> The author mentions that you have to retain the whole history of a project. For one thing, storage is cheap. Another point worth mentioning is that you can make shallow clones with Git.

And for a third, it opens options which are not available without that. Annotating or bisecting with server round-trips every time is not really an option.


I disagree - I've done bisects on very large codebases, and they take a while (5-10 seconds for a checkout, perhaps). This is all immaterial compared to the time to recompile the code (possibly a minute or two to get everything recompiled (a Java project)).


I have worked on plenty of projects where both of those would have taken much longer than that with a non-DVCS.


> Nothing precludes you from using the patch model with DVCS (I mean, Linux kernel development uses Git just fine with this)

Technically, no. But how many projects out there would accept an email patch? They'd probably reject it and tell you to issue a Pull Request instead.

I think his greatest argument is comparing the steps of contributing to github vs contributing to an svn repo.


> Technically, no. But how many projects out there would accept an email patch?

Mercurial works with email patches. Not only would they accept it, that's the only way to contribute, sending emails to mercurial-devel.

> They'd probably reject it and tell you to issue a Pull Request instead.

Obviously you're supposed to use the project's workflow, but the point is nothing prevents you from setting up a patch model with a DVCS. Quite the opposite, in fact: both Git and Mercurial have facilities for automatically formatting and sending patchsets, and for applying truckloads of patches.

Ref: git am, git format-patch, hg export, hg import and hg email
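A rough sketch of the git side of that workflow (addresses and paths here are made up, and `git send-email` needs to be configured separately):

```shell
# Contributor: turn the last two commits into mailable patch files...
git format-patch -2 -o outgoing/
# ...and send them to the project's mailing list:
git send-email --to dev@example.org outgoing/*.patch

# Maintainer: apply a mailbox of received patches as proper commits,
# preserving each patch's author and commit message:
git am < patches.mbox
```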


> Technically, no. But how many projects out there would accept an email patch? They'd probably reject it and tell you to issue a Pull Request instead.

That's more a social issue. How many projects accept patches that go against their submission guidelines? Or coding style guidelines?

> I think his greatest argument is comparing the steps of contributing to github vs contributing to an svn repo.

I found that particularly weak.

Let's look through them in more detail.

    1. Get a copy of the source code
    2. Make your change
    3. Generate a patch with diff
    4. Email it to the mailing list
    5. Watch it get ignored
Wrong. You can't generate a diff unless you first made a copy of the original sources, or re-download/unpack them. So there's an essential step missing.

And often enough, you simply don't have access to the svn repo to do step 1.

    1. Fork the repository on GitHub
    2. Clone your fork of the source code
    3. Make sure you’re on the right branch that upstream expects your patch to be based on, because they totally won’t take patches on master if they expect them on dev or vice-versa.
    4. Make a new local branch for your patch
    5. Go ahead and make the patch
    6. Do a commit
    7. Push to a new branch on your GitHub fork
    8. Go to the GitHub UI and create a pull request
    9. Watch it get ignored
1. is github specific. Gitlab and Bitbucket don't require that

3. applies to SVN projects too.

4. is optional (though highly recommended)

But it gets really interesting when you want to do a second, separate patch. Do that with svn when you can't commit directly? Well, either throw away your first set of changes, or make a complete copy of your whole checkout.
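With git, a second independent patch is just another branch cut from the same upstream base (branch names here are invented):

```shell
git checkout -b fix-typo origin/master
# ...edit, commit, push, open pull request #1...
git checkout -b new-feature origin/master
# ...edit, commit, push, open pull request #2...
# Neither branch carries the other's edits; both are based on upstream.
```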


"Watch it get ignored"

"Submit a patch" is open source's way of telling you to fuck off.

The Github business of creating a whole publicly visible fork just to submit a patch is a bit much. I have some obsolete forks on GitHub which I need to kill off so someone doesn't try to use them.


Even worse is when people are actually using them because they liked one of your PRs that was never accepted and yell at you when you kill it.


You could have added a few steps, like finding out which svn branch is the right one to commit source code to, because they might only accept patches to the dev branch vs the trunk branch, for example.


git has tools to accept email patches, it's just most people don't use it. I'd accept email patches as long as they merge in a sane fashion.


There are also projects that reject pull requests and require an email patch. Different projects, different work flows.

I agree with the author that the GitHub model of "always fork the repo publicly" is stupid. Why not simply push to the official project repo and (for unauthorized people) have it show up as a "pull request"?


I wouldn't be surprised if there are still more repos with >100 maintainers who mostly receive mail patches. Just because you haven't grown up learning them doesn't mean they aren't bigger than anything you know, right?


It's a fake comparison. It only looks like fewer steps because he's not counting how many steps it takes to send an email with an attached file. Not to mention joining a mailing list.


The idea of a "pull request" is a github specific thing, isn't it? Email patches are the normal way of contributing changes in the distributed projects I'm familiar with.


Actually the original idea for a pull-request is an email send from your local (but accessible via internet) repo to the original repo maintainers that asks them to fetch changes from your repo and merge them.

I think the name of the tool is git pull-request.


Ah, it's git-request-pull. Thanks, I'd never heard of it.


From the conclusion at the end of the article:

"We aren’t going to abandon DVCSes. And honestly, at the end of the day, I don’t know if I want you to. I am, after all, still a DVCS apologist, and I still want to use DVCSes, because I happen to like them a ton. But I do think it’s time all of us apologists take a step back, put down the crazy juice, and admit, if only for a moment, that we have made things horrendously more complicated to achieve ends that could have been met in many other ways."

In the next paragraph, the author links to a post that explains how Facebook will soon experience productivity bottlenecks because of their repository size. That post also explains why they don't want to split up their repository and that, "..the idea that the scaling constraints of our source control system should dictate our code structure just doesn't sit well with us."

These are not UX gripes and the problems aren't solved by adding more cheap storage.


Remember the days when working from home using Clearcase meant 3-minute right clicks?

Pepperidge Farm remembers.

Sorry, shellshock.


> UX issues

this is exactly what the author is saying, and he covers your post there much more concisely. all the use cases you mentioned come up infinitely less often than the everyday things git makes more cumbersome. that's the whole point, and i think you made your reply after reading only the title of the article.

but the trolling succeeded. everyone replied. sigh. even me.


> Note: we fell in love with DVCSes because they got branching right, but there’s nothing inherently distributed about a VCS with sane branching and merging.

No, I fell in love with it because it was distributed and I could work without an Internet connection; connections aren't reliable everywhere, and even in my house, in a large city, mine can be iffy. By work I mean things like blame, bisect, log, &c., not just committing.

> Let me tell you something. Of all the time I have ever used DVCSes, over the last twenty years if we count Smalltalk changesets and twelve or so if you don’t, I have wanted to have the full history while offline a grand total of maybe about six times.

Well lucky him for only being able to code when he has a nice connection, not all of us do.

Also, I like the distributed aspect as well. I like not having to give people commit access to my repo for them to have a proper dev env and then they can send me a patch or a PR and we can incorporate their changes. How would they be able to make any commits or anything otherwise without access to my repository?


I call that a filter bubble. I like using git for local commits, but 15 out of 20 people here don't even take their laptops with them, and I bet that 17 out of 20 won't touch code on the road. Maybe at home, if they have to. A decent centralized VCS would totally do.

There are a lot of companies that actually would prefer if the code never left the premises and have a use-case for finer grained permissions (some folks can only touch the assets, others can only ever see the dev branch, can't see history,...), things that are by definition not possible in a DVCS.

Storing large assets in git sort of sucks and requires ugly hacks. I'd love to version the toolchain and the VM images for the local development environment, but that's just not feasible with git.

I consider git a perfect match for loosely knit teams that are spread around the world and travel a lot. It's a great tool for OSS development, but its advantages quickly evaporate for teams that sit in a centralized location with a good connection to the server (cable, Gigabit) and only ever work from there.


> I consider git a perfect match for loosely knit teams that are spread around the world and travel a lot. It's a great tool for OSS development, but its advantages quickly evaporate for teams that sit in a centralized location with a good connection to the server (cable, Gigabit) and only ever work from there.

Yes, I think what a lot of people forget is that git was designed specifically with the Linux kernel in mind. Linus wrote it to the workflow of his project without much regard for what other projects do. That's fine; there's nothing wrong with that at all. It just means that it's not suitable for every project, and that's a good thing: different types of projects should use tools that are actually designed for them.

It also explains why git has such a steep learning curve: it was written for kernel hackers. The only people who were expected to use it are the kind of people who are used to delving deep into the nitty gritty. It's why I'm kind of disappointed GitHub became the dominant public source code host, because Mercurial is IMO much better at actually being penetrable to new users. I think people who are mostly familiar with SVN would be far, far more at home with Mercurial than with git.


Mercurial beats Git hands down across the board for me. I've worked on so many projects where the initial development was tossed into Git and then the devs spend three days trying to get their codebases synced and each using a different tool which may or may not implement the core commands of Git.

Mercurial? Works everywhere much more simply, even ties into .NET with VisualHG and gives a better version/branch management than TFS. And doesn't mismanage disk space like Git.

Mercurial + BitBucket is the cleanest, fastest way I have right now for adding devs to new projects. I avoid Git because so few people (ESPECIALLY those who have only used Git) understand source control well enough not to make a mess of it.


> each using a different tool which may or may not implement the core commands of Git

That's really one of the core problems of using git, and why it's not for everyone. If you want a tool to just do your job, then git won't make you happy. Using the core tools and learning how it works _inside_ is the only way to make it work efficiently.

If you want/need a bike (something easy like svn), use a bike. If you want/need to use an airplane (git), you need to learn how to fly, and that costs a lot of time. Putting something on the plane to make it look like a bike (a tool that may or may not expose all of git, but probably not) won't suffice in either case.


> I call that filter bubble. I like using git for local commits, but 15 out of 20 persons here don't even take their laptop with them and I bet that 17 out of 20 won't touch code on the road. Maybe at home, if they have to. A decent centralized VCS would totally do.

Maybe your team isn't a bunch of road-warriors but networks still drop packets, servers get overloaded and many people work remotely using imperfect VPNs & ISPs. It's really easy to forget how much time that used to waste but switching to Git meant that we no longer had daily chatter when any of those flared up & people just got on with life.

That said, I'd love to see some focus on tooling which improves the painful parts you mentioned. I'd love to share binary data in Git and it's possible but painful. Similarly, the main selling point for Git on internal projects is the massive performance and usability wins over most of the competitors but there's no reason why that must be the case other than inertia on the part of the other options.


Sure, networks drop packets and sometimes break down. But the number of issues with a solid office network is fairly low. Glass fiber, Gigabit to the server. Folks don't work via VPN. I'm not pretending that the "road warrior" and remote-worker use cases aren't well served by git - but the "office worker tied to a desk" use case still exists. And from what I see, it's more dominant than we'd expect.


> But the number of issues with a solid office network is fairly low

The point was simply that “low” is not the same as “does not apply”, and that matters when it's something that prevents someone from doing their job. Even when I worked on 100% on-site projects, I used git-svn so I could make local commits and ignore locking mishaps.

Don't get me wrong, however, I'm totally in agreement for having better tools for supporting the local, centralized workflow. The other reason I used git-svn was because merging was much more reliable and I could rebase changes to squash commits before sharing them with others. All three of those features should work well in any serious version control system regardless of whether it's centralized.


Your machine will stop working sometimes, far more often than your office network stops working(1) and then all your nice little, local commits which haven't been pushed so far are down the gutter. There's always a trade-off, you just have to know it and work accordingly.

(1) If this doesn't apply to your office network get a better network. Now.


> Your machine will stop working sometimes, far more often than your office network stops working(1)

Bollocks. I've had two machine failures in eight years, versus at least 20 network failures. Sure, maybe the network admins at five different companies I've worked for all just happen to be a bunch of muppets, but I highly doubt it.


> Sure, maybe the network admins at five different companies I've worked for all just happen to be a bunch of muppets, but I highly doubt it.

That's your privilege. I don't. The only time I remember the network going down was after a complete air conditioning failure in the server room (a highly unlikely event in itself, but not IMPOSSIBLE) which forced a complete shutdown of IT services. And even then people could still work. Sure, not as well as usual but working was possible. The last time a machine failed was .. oh right. Yesterday.


Your machine, or anyone's machine? Remember to multiply a network failure by the number of people if you're doing that kind of comparison.

Also, get better machines. The last time a machine failed at this all-macbook shop was several months ago.


This statement reflects a misunderstanding of the problem:

In either case, you will lose unpublished work in the event of a catastrophic local drive failure.

In the case of a centralized system, however, you will also be unable to work unless the entire network path and the remote server are available. That will almost never be a question of data loss, but it means you cannot perform version control operations until it's resolved.


> Folks don't work via VPN.

Plenty of people work via VPN. Not just people that have remote jobs, but people that just work from home one or two days per week.


Working remotely is pretty much the primary thing VPNs are used for.


Don't forget everyone working for a consulting company.


Entire remote offices work exclusively through VPNs to the main office. It's definitely a major usecase.


> I call that filter bubble. I like using git for local commits, but 15 out of 20 persons here don't even take their laptop with them and I bet that 17 out of 20 won't touch code on the road. Maybe at home, if they have to. A decent centralized VCS would totally do.

Why must I be part of a team? Why can't I just be hacking randomly and syncing my history with my personal server as I feel like it.


> Why must I be part of a team? Why can't I just be hacking randomly and syncing my history with my personal server as I feel like it.

Sure, go, do. Just don't pretend that there are no teams and no other people that have different use cases.


Likewise, acknowledge that many other people have use cases where git works very well for them. That might have more to do with its popularity than a mad love for DVCS.


Go three posts up and read the last paragraph. I'll quote it for your convenience:

> I consider git a perfect match for loosely knit teams that are spread around the world and travel a lot. It's a great tool for OS development, but its advantages quickly evaporate for teams that sit in a centralized location with a good connection to the server (cable, Gigabit) and only ever work from there.


Yes, and that argument was silly. There are many use cases besides dispersed teams and road warriors where git's weaknesses never actually come up and its strengths are useful. However, your arguments, like TFA's, rely on an unconvincing and entirely unproven premise that git doesn't actually suit most coders' use of it.


> However, your arguments, like TFA's, rely on an unconvincing and entirely unproven premise that git doesn't actually suit most coders' use of it.

No. The premise is "a system is conceivable that has git's upsides and fewer of its downsides" and look: Facebook is even building it.


That's a premise so inane that it slides right into meaninglessness. Everything is "conceivable", especially "something that works as well as what I'm using in every way, but doesn't have problem X". Merely conceiving of it accomplishes nothing, though.

As to what Facebook's building, meh. Anyone can try to build a better-in-every-way-for-every-application VCS, but look to TFA for a list of just some of the failed attempts to produce better version control mousetraps. It's more likely that they'll produce something that will be handy for niche uses than something that will be a clear win over git for everyone else.


> Sure, go, do. Just don't pretend that there are no teams and no other people that have different use cases.

Sure, there are. I've worked on them. I've had networking not work, servers fail, diffs take forever because of an overloaded server. Honestly, at this point, I can't conceive of working in a centralized VCS anymore, so unless you make a salient point about what can be done with one that can't be done with a DVCS it's all opinion vs opinion.


> I call that filter bubble. I like using git for local commits, but 15 out of 20 persons here don't even take their laptop with them and I bet that 17 out of 20 won't touch code on the road. Maybe at home, if they have to. A decent centralized VCS would totally do.

It's not only "for local commits", although being able to have local branches without polluting a public namespace is a huge win. It's also about _speed_ when you're doing VCS operations. Linus Torvalds actually made the case really well in his talk: https://www.youtube.com/watch?v=4XpnKHJAok8

> There are a lot of companies that actually would prefer if the code never left the premises and have a use-case for finer grained permissions (some folks can only touch the assets, others can only ever see the dev branch, can't see history,...), things that are by definition not possible in a DVCS.

That's a question that's completely orthogonal to whether or not you use a DVCS. How is a "traditional" VCS going to help you when you can check out the code locally and smuggle it out on a flash drive?

In my company, we use git and there are access restrictions as to who can access and commit to our branches.

> Storing large assets in git sort of sucks and requires ugly hacks. I'd love to version the toolchain and the VM images for the local development environment, but that's just not feasible with git.

..and that's not the use case for git. Linus has been very clear about _what_ git is optimized for, performance wise.

That doesn't mean that DVCSes in general are useless for storing large assets, but that the most popular implementation is. Also, I'm not really sure what traditional VCS you're referring to, that makes it easy to version VM images and remain storage efficient?


> I have wanted to have the full history while offline a grand total of maybe about six times.

I have probably run git init in a directory at least once a month to have full tracking capability without the expense of setting up a "server". It says something that the author assumes a DVCS can only exist if it has a central server and that your local copy is nothing but the "offline" copy. I have also run projects where the GitHub repo was just a copy and the version on my box was the official repo. I also run expensive (CPU/time) scripts that walk over the project's history, something a server admin would never let me do. And then we get into the realm of very expensive hooks that run on my desktop rather than on the "server". Lastly, even in an always-connected world, if your server is in Australia you can't change the speed of light; hitting it from Germany will always be a long, "slow" trip.

Having the full repository at your disposal without being tied to some other authoritative repository provides a lot of flexibility and enables capabilities that are just not possible otherwise. Some features could easily be included in non-dvcs systems (such as local hooks), but I do not know if we would have seen their success without dvcs systems providing the means for exploration.


Agree with you 100%, any project, no matter how small, gets "git init"ed with me simply to start tracking it in case I later regret that change I just made.

And I love that if I need it on another machine it's just "git clone ssh://..." instead of having to setup a server.
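A minimal sketch of that zero-setup workflow (a plain local path stands in here for the ssh:// URL; git treats both as remotes):

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Start tracking any directory: no server, no setup.
mkdir project && cd project
git init -q
git config user.email you@example.com
git config user.name "you"
echo 'hello' > notes.txt
git add notes.txt
git commit -qm "first version"

# Getting it onto another machine is just a clone; an
# ssh://user@host/path URL works exactly like this local path.
cd "$tmp"
git clone -q project project-copy
```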


Well, I mean, the author goes on to make a bunch of salient points like the difficulty of diffs of nested directories (solved in [1] but not widely implemented) and saving all of the binary blobs in every checkout (solved by [2] which has not "won out").

I think the big thing with DVCSes is that you can pretend you have a client-server model with a handful of directories on your local machine. At one point at my present job I replayed a bunch of SVN history through Hg so that I could by-hand divide the work that had been done into a few named branches; this helped me to figure out where things had "gone wrong" in the project. It was really effective to just have a day of SVN update, hg diff, copy files to ../branch_name, commit to hg, rinse, repeat. What I really needed was indeed the "killer feature" that he's saying -- sane branching and merging -- but the fact that it was all easily contained in my filesystem was a nice plus.

[1] http://www.cs.utexas.edu/~ecprice/papers/confluent_swat.pdf

[2] http://darcs.net/Internals/CacheSystem#lazy-repositories-and...


I fell in love with git because it's FAST. And it's fast because it's distributed.


Have you ever used Perforce?


Perforce is not fast, it cheats: you tell the server in advance what you are doing, so that the submit is fast. This comes at a tremendous price: your editor/IDE has to have a Perforce plugin to do what should be the VCS's job (tell Perforce what is happening in your workspace), and the connection to the server has to be reliable and low latency, unless you want to spend seconds every time you edit a file that hasn't been checked out already.

In practice, this model is a constant source of frustration, and everything Perforce has done in the last few years seems to be workarounds for this broken architecture.


Maybe it's just me, but "fast" isn't something that comes to my mind when I think of Perforce.

There are cases when P4 is the only choice (large binaries come to mind, and really really big code bases) but it's the kind of thing you shift to because you have to, not because you want to.


> Maybe it's just me, but "fast" isn't something that comes to my mind when I think of Perforce.

Agreed. I use git-p4 for interacting with p4 servers here at work. I love that I can create my commits and make them as granular as I want without having to interact with the server until I'm ready to submit my commits.

Using git-p4 means I don't have to 'p4 edit' my files before I edit them (which really sucks when the p4 server isn't available for any reason); I can simply put off any version control workflows until I'm done with my changes (and slice and dice the changes the way I want with interactive rebase).

Thinking of all the little interactions I do with git which a) aren't possible with p4 or, if they were, b) would involve talking to the server every step of the way makes me cringe. But then, out of necessity, p4 developers probably aren't creating fine-grained commits the way I like to (which indeed isn't even possible without a lot of advance planning in p4), so they wouldn't notice the speed impact.


I believe you are conflating two aspects: Large binaries, which are, in very specific circumstances, a (or maybe the only) valid reason to use Perforce, and large codebases, which usually aren't.

When looking at an actual example of the latter [1], you will see that they heavily optimize against contention on the central database by limiting the size of database operations. If you want to do something that would require a longer database query: enjoy your client-side error message about implementation details you never wanted to know about.

[1] http://research.google.com/pubs/pub39983.html


> I believe you are conflating two aspects: Large binaries, which are, in very specific circumstances, a (or maybe the only) valid reason to use Perforce, and large codebases, which usually aren't.

Unfortunately I did not mean to. I would agree with you that binaries are the 95% use case for P4. I think most developers typically wouldn't want to check many or any binaries in (maybe the odd icon or other small, slowly-changing asset, in which case git is adequate), but game developers and people with other binary stuff (e.g. circuit designs) will have large, changing binaries.

However, really big (GB-scale) repos can be painful in git. This is why Google gritted their teeth and used P4 until they outgrew it too. That's what I meant by "really really big" -- something of a scale that most of us will (hopefully) never see.


I quit my job because we were stuck using Perforce - how's that? :P


Try the git-p4 contrib. It has its issues, but it's not bad if you're forced to use p4.


I'll second this. git-p4 kept me sane through several years of working in a very large perforce shop.


I have, and at least at my work, it's terrible. Sometimes it gets to the point where my submit requests time out several tries in a row. Yes, I'm sure it's because the people maintaining the server aren't doing it properly, but that's sort of the point. With git, you don't have to worry about it.


> With git, you don't have to worry about it.

Wait until you see a badly managed git server that serves a central repository. You'll quickly change your mind if pushes start failing randomly.


> Wait until you see a badly managed git server that serves a central repository. You'll quickly change your mind if pushes start failing randomly.

But I can't code a better sysadmin into either git or Perforce. A badly managed Perforce server will have the same issues. (Unless, of course, you have an argument that Perforce under bad management somehow performs better than git under similar conditions.)

With git, however, I can still commit, and I can still push and pull changes from other people using side channels such as email. I can, for the most part, keep working. Is it more difficult? Of course. But in the particular scenario, git outperforms Perforce, in my opinion. (But this is not the primary reason I use git; at work, we use GitHub and git in very much a centralized manner. GitHub has its outages, and they're annoying. But not work-ending.)
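For the curious, a sketch of one such side channel: `git bundle` packs commits into a single file you can email or carry on a stick, no server involved (the repo here is a throwaway example):

```shell
set -e
work=$(mktemp -d); cd "$work"
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master   # pin the branch name for the example
git config user.email a@example.com
git config user.name a
echo one > f.txt;  git add f.txt; git commit -q -m "c1"
echo two >> f.txt; git commit -q -a -m "c2"

# Pack the branch into a single file you can send over any channel.
git bundle create ../repo.bundle HEAD master

# The recipient clones (or fetches) straight from the bundle file.
cd "$work"
git clone -q repo.bundle restored
```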


The GP's words were: "With git, you don't have to worry about it."

I never said that Perforce would perform in any way better, but I'd argue that if your VCS server is mismanaged you'd better replace the person managing it, because a badly managed VCS means trouble all around. Try pushing your changes via email to your CI server. Fully decentralized would be beautiful, but I seriously don't see many teams that use git (or any other system) in that manner. Some parts of the infrastructure fundamentally end up being centralized, as stupid and wasteful as that is.


I was responding to you. :-)

> I'd argue that if your VCS server is mismanaged you'd better change the person managing because a badly managed VCS means trouble all around.

I agree. The first time I read your argument, I interpreted it as a reason to not use git itself, but I think we're on the same page.

> Some parts of the infrastructure fundamentally end up being centralized, as stupid and wasteful as it is.

What else can be done? I don't really want to push changes to my co-workers individually, I want a place to push changes that any co-worker can then pull from — do I not? Toward this goal, certainly I could create n servers, make pushing redundant over those n servers, have them do consensus to agree on HEAD, etc., but that seems to me to be what I'm paying GitHub to do.


If Git pushes start failing randomly, it isn't even a failure: no file has been harmed, and a new central "server" can be freely improvised.


There are good managed git services for companies that don't want to run their own. If my sysadmins are truly incompetent I can use a free private bitbucket repository.


But at least you can run git log and not have to wait minutes.


Ironically, Perforce doesn't even have proper search in commit history when it's working as designed. No need for the server to break...


True. You are spared some of the pain, but not all of it.


I have used it for some full moons. I didn't like it. This was before I learned about DVCS, so my only real reference was SVN, but for what I did (medium-size business apps, a few years of history) I can't say it felt faster than SVN.

OTOH Perforce had some serious downsides like

  * immature eclipse plugin (would crash eclipse more 
    than once a week, forcing a full reset every.single.time)
  * no Netbeans plugin (back then)
  * the whole idea of having to decide in advance, -while in
    the office or otherwise connected, which files you 
    needed to modify and therefore "check out" -
  * which brings me to the next problem: by default only one 
    person can work at a file at the same time.
YMMV. I see some people liking Perforce; OTOH I see people using Macs for coding as well. (That last part is tongue-in-cheek; yes, I really wish I liked Macs and I recommend everyone consider them.)


For those wondering, Perforce just announced their own DVCS implementation[0][1].

[0] http://www.perforce.com/blog/150303/introducing-helix

[1] http://www.perforce.com/helix


Perforce sucks. From a UX perspective, it really is rotten. Furthermore, contrary to popular belief, it scales poorly.

Perforce scales provided you can continue to throw money and hardware at the machine your repo is on. After a while, for large software companies, that is no longer feasible. You have to split off into multiple perforce repos, at which point you abandon the benefits of a monolithic repo.

I have seen a very large software company abandon perforce for git for this reason. You can't push a single git repo as far as you can push a single perforce repo, but you can push a fleet of git repos way farther than you can push a single perforce repo, and a fleet of perforce repos is something you really don't want to deal with.


You mean that proprietary code? No, I avoid non F/OSS at all costs.


Wherein it is pointed out that Hacker News is proprietary software.



Where is the license file? Just because the code can be viewed does not make it legal to use, nor does it make it "free" or "open source".



Hacker News is a website (so even with their code being open-sourced, I cannot verify what exactly they're running anyway), not software I'm running (JS aside).

The firmware on the routers between me and the server is proprietary, and there is nothing I can do about that.

I can choose not to run proprietary software without having to not use the web.


You sure about that? Your system will be full of proprietary firmware, drivers, etc.


I avoid it as much as I possibly can. I don't know what more to tell you. I never said I'm proprietary code-free, but just that I avoid it at all costs.

Hell, I've recently been forced to install proprietary code to work with government generated data. Such a mess.


* until you work on a project which has a 9Gb repo and history...


You can do a shallow clone in any reasonable versioning system: http://stackoverflow.com/questions/6941889/is-git-clone-dept...
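A sketch of what that looks like (a file:// URL is needed for --depth to take effect on a local path; with a remote URL it's the same flag):

```shell
set -e
d=$(mktemp -d); cd "$d"
git init -q full && cd full
git config user.email a@example.com
git config user.name a
for i in 1 2 3 4 5; do
  echo "$i" > file.txt
  git add file.txt
  git commit -qm "commit $i"
done

# A shallow clone pulls only the most recent commit's history.
cd "$d"
git clone -q --depth 1 "file://$d/full" shallow
git -C shallow rev-list --count HEAD   # prints 1; the full repo has 5
```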


But then you lose the ability to git bisect, blame, etc., which is the root comment's argument for using git.

I've worked on a project where the git repository was many gigabytes - because at some point someone decided to put some binary files in the repository, which periodically got updated - and now, years on, the repository's about 10GB, and you can't really delete the stuff clogging it up without rewriting history from years prior and making the 200 devs' lives hell.

Importantly, you do need all that history, because there are commits from the same time that are relevant.


But then git blame or bisect don't really work anymore.


I'm not too sure what could realistically be done to implement offline bisect without having all the history you care about.


One approach would be to query the server for these operations. And unlike what happens in Git, where blame is an O(N) operation, a centralized server is free to spend some extra storage to add some caches or indexes to make these searches faster.


Querying the server rarely works when you're offline.


It may work well enough.


And if you never do, even after fifteen years working on dozens of different projects? Including a few projects that have run for most of those fifteen years?

I agree that git is weak for binaries, but the only binaries I need to keep in it are a few images and installers - nothing that has many versions, nothing that causes me problems. Similarly, a lot of open source software is mostly made up of code, not blobs.

As for distributed work, even when I'm the only developer on a project, it helps not to have to worry about a central server, especially if I'm on-site. It's also hugely faster than SVN ever was.


9 Gbit really aint that much.


Ours is 370GB. git-svn typically falls over and dies on that, if you're brave enough to try.


370GB is peanuts. Try keeping a centralized version control server up and operating when you have dozens of terabytes and thousands of developers.

Better to abandon the ultimately doomed monolithic repo scheme and tool your build and deployment systems to expect a multi-repo ecosystem. Then allow your teams to create their own repos on the fly, one for each individual project if they please. Once you reach that point, there is little reason to not use a more sensible VCS like hg or git.


>Well lucky him for only being able to code when he has a nice connection, not all of us do.

So basically, you're de facto the field tech example he gave, out in the virtual wilderness of iffy internet.


Really? This is one hell of a filter bubble.

Not the entire world is a goddamn Silicon Valley. Internet is not electricity, and won't be for a while - free and reliable WiFi ain't everywhere; sometimes you work where you're not allowed to connect to a network, sometimes your ISP decides to fuck up the link for an hour for no reason whatsoever, the other day you work at a venue with office-grade Internet (i.e. slow as hell). And then you might want to hack during commute. Data plans are expensive, not every place on Earth has LTE connection (or any form of reasonable data connection for that matter) - and I don't mean wilderness, I mean the highway between two big cities. And try to tether your laptop to your phone, and then suddenly something decides it's time to download 2GB of software updates and boom, there goes your plan.

Above all that, what happened to the concept of owning your data? I want my stuff available offline, because it's mine, period. I want to run arbitrary code I write on it, and hell, I sometimes want to open the data in a text editor and edit it by hand. Moreover, doing constant round-trips around the world for things you should be able to do locally is kind of stupid.

EDIT:

And don't tell me about YAGNI and stuff. Version control is infrastructure, and the iron rule of infrastructure is - you use what you have; the more you have, the more you use. If you give me a feature, I'll adapt my workflow to use it, even though I was fine without it before. Cutting down on infrastructure reduces the amount of things people will create with it.


>Not the entire world is a goddamn Silicon Valley. Internet is not electricity, and won't be for a while - free and reliable WiFi ain't everywhere; sometimes you work where you're not allowed to connect to a network, sometimes your ISP decides to fuck up the link for an hour for no reason whatsoever, the other day you work at a venue with office-grade Internet (i.e. slow as hell). And then you might want to hack during commute. Data plans are expensive, not every place on Earth has LTE connection (or any form of reasonable data connection for that matter)...

So, in other words, like someone working away from home base...like someone...in the field. I suppose I wasn't clear. I had hoped my use of "virtual wilderness" implied a different domain for "field" that is not necessarily a physical location.

>Internet is not electricity

And there was a time where electricity wasn't ubiquitous either (and it still isn't depending where you are). Such places, then and now, could be reasonably described as being relatively in the field.

>and I don't mean wilderness, I mean the highway between two big cities.

That could definitely fall under working in the field, especially compared to an office.

>Above all that, what happened with the concept of owning the data?

Choose whatever VCS you like for whatever reasons you want; it doesn't matter to me. I'm merely pointing out that I think the author covered that base. You may not be underwater in a submarine, but if you're on the road on your laptop trying to scrounge for LTE or zipping from firewalled office to firewalled office, you may as well be virtually underwater with respect to the Internet.


No, I'm not a field tech. I'm a hacker. I work where I am and when I'm in the mood. In a pavilion at the park and an idea strikes? Hack! At home and my internet goes out? No problem!


At the park when an idea strikes? Write it down on a piece of paper. No need to have your face buried in your computer every second you're at a park.

Edit: All you have to do is jot down the idea if you're worried you'll forget it. You don't have to handwrite all the code; that would be silly. I find having a small notebook with me is pretty handy.


It's no better to have your face buried in sheets of paper. And who carries paper and pens around anyway?


For the record, I carry a journal and pen in my pocket all the time. Writing down phone numbers and notes, taking a log of my life, keeping cards from businesses. Looking back through them is much more enjoyable than looking back through a Wordpress.

http://shop.moleskine.com/FileLibrary/4be794eb3e094e59ac8a8f...


I was being a tad facetious. I used to as well, but these days I just use a smart phone or tablet with a suitable note taking app, since I always have one with me anyway. My main point is I don't see the difference between taking notes digitally or on paper.


Are you taking your laptop with you to the park 'just in case'?


I sometimes am. Definitely when the park happens to be in another city I went to visit for some reason - I feel bad without having a proper computational device with me for longer periods of time. What's wrong with that?


I often have it with me because I'm normally out to meet with people and collaborate, but stay out before and after :)


Anyone can work without internet. You just don't commit. If you really need to commit without internet, you're probably an edge case. It's a bonus, but not something I'd choose a VCS system for. My central repo is where the build server is looking, so if I'm not connected, I'm not deploying no matter which VCS I have. The repo is most definitely centralized.


I suspect some of the down votes are because what you mean by commit and what a git user means by commit is different. I suspect that most people using git are "committing" in a way that would better be handled with a versioned file system.

AKA, translated for the git people, "you can work without the internet, you just don't issue pull requests (or pull for that matter)".

Where I work now we still use CVS, and frankly it works fine (even with large binaries checked in). I ran git-cvs for a while, and had a dozen different branches for all the separate features I was working on.

Then I found I was spending a metric crapload of time messing with the version control rather than writing code. CVS may be basic, but in a way that's a plus. It strongly encourages you not to have 20 different features being worked on at the same time, then merging this piece here and that piece there. No, it's simple: work on one thing in your working directory, commit it to the head, pull everyone's changes, go work on the next thing. Repeat.

Eventually I kicked my git-cvs to the curb for plain old CVS. Gives me more focus.

Now I can waste the time I spent on stack overflow (learning how to fix my git screwups) posting here...


Yeah, I was using "commit" in the SVN sense. Which is another good complaint against git that it rewrote the established lexicon to mean different things. Sure a git commit may be technically equivalent to an SVN commit, but when I "commit" I mean that I'm blessing a change as ready to share via a shared repo.


Unpopular opinion here but we're using SVN here on a large project. No intention of changing it. Why?

1. Deterministic "whodunnit", as we have AD integrated with it. That is needed when you have 220 people using a repo.

2. Partial check outs. Our platform is 6MLoC in total.

3. TortoiseSVN. Sorry there's nothing else out there as good as that. We use it for DMS tasks as well as you can merge word docs etc.

4. Binaries and locking. Works for design workflows and the odd file we can't merge easily (git/mercurial aren't magic bullets for that).

5. Centralised. All our shit is in one place so there's a "single source of truth". Also no patches flying around.

6. Easy to backup and replicate. Svndump into single file. Or svnsync and there's a true copy.

7. Perfect tooling/tracking integration due to the centralisation and hook support.

8. Easy to use for mere mortals (excel/word pokers).

9. Forces backup behaviour. A specific case but we had a couple of people using git and then pushing to our svn. One SSD fail and bang, the guy lost a week of work because he didn't push to master.

Merge tracking is a non-issue since 1.7, and if you need to work offline, just create patches and check them in when you're back online. Also, our SVN instance has been down for 3 minutes in the last 8 years...

I really can't justify a move to a DVCS and lose all the above so I'm quite happy with my sanity.


1. Are you worried about people spoofing someone else's committer/author entries?

2. `git clone --depth` or use submodules. I agree that they're not very friendly to use.

5. You can easily have a centralized place where everyone pushes 'master' with git.

6. `git push --mirror` to some other place.

7. You can have hooks on your git server too (for email, continuous integration, etc.).

9. Even with SVN I was using git-svn (and svk before that), purely for the offline commit feature and for the ability to make more than one local commit at a time and push a tested set. In your case it wouldn't have made a difference whether your server was SVN or git; I would've still used svk or git-svn locally.

Backups are important; have people run a daily backup on their workstations (like duplicity, but for a LAN even something like rsnapshot would work nicely). You might think SVN saves you, until you find out that someone forgot to 'svn add' a file, mistakenly cleaned their source tree, and now it's gone for good.
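On the mirroring point above, a sketch of how cheap a true copy is with git (all local paths, purely illustrative; note that --mirror replicates every ref, local branches included):

```shell
set -e
d=$(mktemp -d); cd "$d"
git init -q repo && cd repo
git config user.email a@example.com
git config user.name a
echo hi > a.txt; git add a.txt; git commit -qm "initial"
git branch feature            # a local branch; --mirror copies it too

# A bare repo stands in for the backup target.
git init -q --bare ../backup.git
git push -q --mirror ../backup.git
```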


Some replies...

1. Yes. When you have 220 people to herd, some of whom are contractors, there is inevitably a probability of a bad egg or two. Human personalities don't scale well from direct experience unfortunately.

2. Will investigate. I think one of our guys tried this and found a number of shortcomings.

5. Until someone doesn't or someone pushes to someone else's repo who then pisses off on holiday (this happens so often on our git projects it's unreal - yes we have a few test git projects on the go as well)

6. Aware of that. However it pushes local branches too which introduces some interesting problems.

7. Yes but on every rev that gets committed? With git people work offline for days at a time and then push to master in one huge chunk and our pre-commit validation would go apeshit (it does a lot of analysis and validation so would result in massive blocking of work if you don't commit early and often).

9. You would have used what you were allowed to ;-) (I tried git-svn and had some problems but they weren't terrible I admit).

Regarding backups of workstations, our workstations are disposable. It's not unusual (at least every 9-12 months) for one to get swapped or upgraded overnight with no notice or blow up and turn up with a base image. "meh" and 20 mins and you're back where you started. The mantra is "if it ain't on the fileserver or the SVN, it doesn't exist". Best to checkpoint your shit at the end of the day either by exporting a patch to the fileserver or committing on your feature branch. Nothing else matters really on a workstation and it needs to stay that way.

Mainly cat herding issues to be honest.

Edit: a final comment... You wouldn't believe how many ways there are to fuck up a software project with 200+ people on it. It's a warzone of competing ideas, personalities and politics. The only way to run it successfully is with an iron fist and strict control. That's probably a large reason why centralised VCS makes sense for some orgs.


> 3. TortoiseSVN. Sorry there's nothing else out there as good as that.

TortoiseHG and TortoiseGIT are good contenders.


On paper, yes. In reality, they're really nowhere near TSVN in reliability, documentation, flexibility and integration.


I am honestly baffled by your points 1 & 2.

You have 220 people working on 6MLoC and you think git can't handle this? You are aware that git is used for the Linux kernel, right? I mean, it must be its most famous use case.


1) Unlike the linux kernel, cssmoo's project has lots of large binary files. If he used git the repo sizes would blow out of control and designers would lose the ability to lock the files they are working on (locking is important because you can't merge binary files after the fact)

2) His organization is very different from what the Linux kernel does. Linux operates in a hierarchical manner where each subsystem has a maintainer who collects patches and sends them up to Linus. This system makes full use of Git's DVCS capabilities, but it's not perfect for everyone.


Git specifically does not allow partial checkouts. I can't check out a subset of the source tree.


gotta love how the ONLY reply that agrees with the article is full of borderline-trolling replies doing exactly what the article calls out. gotta love the HN hive mind!

it's basically:

> > I'm using that and it satisfies all my needs.

> but if you join the hive mind you can do all those much more complicated and futile workaround steps and get the exact same result.


The best part of the whole article was the line:

>> "If you say something’s hard, and everyone starts screaming at you—sometimes literally—that it’s easy, then it’s really hard."


Hahahaha love this. But I think all version control systems suck - not inherently, but because their usability is atrocious. I enjoy Git... well, perhaps enjoy is a strong word, perhaps what I really mean is I have Stockholm Syndrome.

I've tried to go back to TFS, SVN and other centralized systems, but I'm too emotionally bonded to having my source control system with me, everywhere I go... Starbucks, Tim Hortons, the Train, sitting on the beach. But I don't delude myself into thinking this is anything more than emotional. I like having it there, it gives me comfort - and there is some convenience.

I don't want to have to worry about "How do I undo all those changes from my last 10 commits and go back to where I started down this path?" Git gives me that nice fuzzy feeling of being able to play meaninglessly with my code until something takes form and if it comes to nothing, I can do git reset --hard 1f3c2047 and I'm right back to where I was before I went down this path of insanity...and if it takes a nice form, I can do git rebase -i and squash all my messy commits into a single commit with a single commit message. I didn't need to branch, I didn't need to clone, I didn't need to do anything crazy, I just fire up Bash and I'm done, no muss, no fuss, no having to be connected to the source control server, no dependence on anyone else... and if I did want to put that code aside, I can stash it or spin it onto another branch before coming back to my original branch and resetting back to a sane commit. It's nice having ultimate control.

But having that amount of control has its drawbacks...you can't expect the ability to drive stick, but not have to learn how to use a clutch and gearshift.
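A quick sketch of that undo-anything workflow in a throwaway repo (the file names, messages, and commit id variable here are invented for illustration; in real use you'd read the id off git log):

```shell
#!/bin/sh
set -e
# Scratch repo so the experiment can't touch real work
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo "stable" > app.txt
git add app.txt
git commit -q -m "known-good starting point"
good=$(git rev-parse HEAD)   # the id you'd normally copy from git log

# Play meaninglessly with the code in messy local commits...
echo "wild idea" >> app.txt
git commit -qam "WIP: try something"
echo "another idea" >> app.txt
git commit -qam "WIP: try something else"

# ...and if it comes to nothing, jump straight back:
git reset -q --hard "$good"
cat app.txt   # prints: stable
```

If instead you want to keep the experiment around, `git stash` or spinning it onto a branch before the reset preserves it, exactly as described above.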


> Let me tell you something. Of all the time I have ever used DVCSes, over the last twenty years if we count Smalltalk changesets and twelve or so if you don’t, I have wanted to have the full history while offline a grand total of maybe about six times. And this is merely going down over time as bandwidth gets ever more readily available. If you work as a field tech, or on a space station, or in a submarine or something, then okay, sure, this is a huge feature for you.

Or if you're a Kiln customer, and their site is taking 4 minutes to load a page, assuming it's not just down...

Sorry, couldn't resist...

Seriously, though, I don't see where the author is going with his rant. It doesn't say much, and almost seems like a snide jab at Git and GitHub.

None of his points are very good. Yeah, it doesn't take a DVCS to treat the whole repo as a unit. But right now they're the only ones who do. A hypothetical patch to SVN doesn't help me much.

Also, I know from experience that SVN is horrible at binary files, and especially horrible at large binary files. I'm not saying Git and Mercurial are great, but they're no worse than SVN.

The entire section about the amount of books and documentation is just silly. There was a LOT more subversion info around than "just the Red book". I owned two books, and I know we had a few different ones floating around the office, not to mention boatloads of tutorials and guides online.

It's also funny how he makes fun of GitHub's git tutorial, but doesn't mention Kiln's Mercurial and Git tutorials. He also rips on GitHub's pull requests, but fails to mention that they work better than Kiln's.

But it's great to see the founder of Kiln telling everybody not to use the technology his company is based on...


> Also, I know from experience that SVN is horrible at binary files, and especially horrible at large binary files.

Bug reports to users@subversion.apache.org please. Or it didn't happen ;-)


Indeed. I work on hg a lot (and custom git servers some), and I'd be quick to tell you that Subversion is a lot better in the common case than either git or hg on large binary files.


Interesting about kiln. My interpretation was:

1) Rant about a problem specific to you...

2) Provide no meaningful guidance or solution... (Except move back to SVN to solve a problem I don't have)

3) Make it to the top of Hacker News?

4) ???

5) Profit?


Actually he does point folks towards the work being done with Mercurial to embrace the centralized + offline commits model. Facebook's apparently been building some good stuff around that.


>"The actual reason is because GitHub let coders cleanly publish code portfolios, and GitHub happened to use Git as the underlying technology to enable that,"

Is this guy genuinely suggesting that people fell in "love" with distributed version control solely because of GitHub? I don't feel old enough to be telling this author to get off my lawn. I mean, GitHub is great and all, but it does not represent the sum total of all the commits in this world.

Granted, does anyone know the timelines of when DVCS took off (as opposed to launched)? Mercurial and Git were both released a good 3 years before GitHub was launched.

Okay, I looked up more about DVCS. Wikipedia article's history section on it:

History

Closed source DVCS systems such as Sun WorkShop TeamWare were widely used in enterprise settings in the 1990s and inspired BitKeeper (1998), one of the first open systems. BitKeeper went on to serve in the early development of the Linux kernel.

First generation open-source DVCS systems include Arch and Monotone. The second generation was initiated by the arrival of Darcs, followed by a host of others.

When the publishers of BitKeeper decided in 2005 to restrict the program's licensing,[5] free alternatives Bazaar, Git, and Mercurial followed not long after.


I think both of you are respectively overestimating and underestimating the significance of GitHub.

Are there a lot of projects that use Git without a hosting intermediary because of genuine advantages? Yes, absolutely. Plenty of core infrastructure and applications, many of which significantly predate Git or GitHub.

Is there a growing subculture of people who essentially equate Git with GitHub and end up completely subverting the "D" in DVCS, incurring unnecessary downtime and who pretend Git/GitHub are the be-all-end-all? Also yes, with circles such as those of HN being overrepresented with "GitHub is your resume" rhetoric and various startups or hip SaaS having GitHub be their single point of failure.

Being the #1 software hosting site in the world does have an effect on demographics, at the end of the day. By now Git and GitHub have an essential, if controversial relationship.


Back when I followed the long discussions about introducing git at the ASF (now live at git.apache.org) several git proponents said that git without github is pointless to them. I was quite surprised how adamant some people were about that. Before then I never thought of github as a sort of GUI for git, just as... one service for repository storage out of many, I guess?

The ASF has a strict policy of self-hosting. So git repos are now mirrored to github from ASF infrastructure, for the github fans who initially suggested the reverse (ignoring obvious problems such as ASF having no influence over what happens at github).


> Is this guy genuinely suggesting that people fell in "love" with distributed version control solely because of GitHub?

The author suggests that the "tipping point" for Git adoption vs e.g. Mercurial was when Github made open source development a much more gameable, visibly social activity. Prior to that, every OSS project had a web page, a mailing list, a Sourceforge or code.google.com repo, etc. But no single user ID existed for developers across all of those little islands of identity and conversation. Github changed all of that.

Coming from the scientific Python community, I can absolutely attest to the huge long email threads as projects decided whether they were going to stick with SVN+Trac or Mercurial+Bitbucket or Git+Github. (This was before Bitbucket supported git.) In the end, all of them ended up going to Github because of the social stickiness, even though its ticket and tracking system is really primitive compared to what even Trac was able to do.


I had the misfortune to have used Arch around 2005-2007. That was an excruciating experience. Updates that would take over an hour.


Note that the author of the article doesn't advocate forgoing local commits, but something like Facebook's work on a semi-distributed Mercurial, which allows local commits without cloning the entire history to each client. Here's the relevant section from the linked post[1]:

But what if the central Mercurial server goes down? A big benefit of distributed source control is the ability to work without interacting with the server. The remotefilelog extension intelligently caches the file revisions needed for your local commits so you can checkout, rebase, and commit to any of your existing bookmarks without needing to access the server. Since we still download all of the commit metadata, operations that don't require file contents (such as log) are completely local as well.

[1] https://code.facebook.com/posts/218678814984400/scaling-merc...


Git snapped into focus and I've had few problems since I was told it's simply a DAG, and all the branch/tag/etc. stuff are just labels pointing to a part of the graph. After that, I've seriously never been in a situation where I was totally at a loss as to what was going on.

Also, even in SF, Internet connectivity is often flaky enough that requiring a network roundtrip for anything to deal with my files is a non-starter. Right now, in downtown SF, we have only a single Internet provider option. It goes to ~100% packet loss about 20 times a day. We get about 10-15Mbps out to the wider Internet (but 10x more to servers nearby - probably a shitty ISP using HE or something). So it's really quite a negative to need connectivity to do things.
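The DAG-plus-labels model is easy to verify for yourself: a branch or tag is nothing but a name that resolves to a commit id somewhere in the graph. A tiny sketch in a scratch repo (file names and labels are made up):

```shell
#!/bin/sh
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo hello > a.txt
git add a.txt
git commit -q -m "first node in the graph"

# A branch and a tag are just two more labels on the same node
git branch experiment
git tag v0.1

git rev-parse HEAD experiment v0.1   # three identical commit ids
```

Once every branch, tag, and HEAD is understood as a movable pointer into the same graph, operations like reset, rebase, and merge stop looking magical.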


Yup. I spent less time learning in total how git works than I have spent trying to figure out workable branch merging in svn.

(For the latter, I ended up developing scripts that worked with patches as first class objects, so that e.g. every bug fix that needed to be applied on multiple branches would actually live in a patch file as my personal way of tracking a unit of work.)

As to distribution, I spent 5 years as a remote developer working for a CA company from London. 200+ms round trips and occasional VPN weirdness meant painful sessions with svn. Having a local repo is excellent. Large binaries that constantly change are still a problem though.


Mercurial and other DVCSs are also built on simple DAGs, but git has the disadvantage that you really need to learn the underlying data model and lots of terminology (trees, refs, blobs, the reflog, etc.)


Here's my problem with this article: Return to sanity by adopting what?

I work on several small-ish projects, and due to the leads coming and going, there are a smattering of source control solutions. On a weekly basis, I use SVN, TFS, and Git about equally.

However, the workflow supported by Git is by far the best for me: I can commit locally as much as I want, rebase the commits or just merge them with work other people are doing, bisect if I broke something, and even branch locally to experiment or when I get interrupted by a question or another task.

Neither TFS nor SVN support this at all. With both of them, I can't really check in until I'm completely done and sure I won't break the build or tests. I end up zipping my directory or fighting with patches/shelvesets that don't do what I want.

Now, does the way I want to work require a DVCS? I don't know - perhaps it doesn't in a theoretical sense. However, DVCS is the only one that actually supports that now.

So sure, we all push to the same repository and it could be a centralized system. But what would actually work? What can I switch to? I'm not abandoning Git for TFS or SVN, that's for sure. Nor Perforce which was also painful.

Yes, you convinced me I don't need the "D" in DVCS. So make a centralized VCS that supports local branching, committing, sane merging and diffing, and show it to me! But complaining that I'm not using one of the features of my DVCS has no bearing on whether I should abandon it or not.


The author gives an excellent opening summary of the fundamental architecture of common VCSes over time, and then accurately identifies the most painful points of (eg) git, but then dramatically understates the benefits of a good DVCS.

If you need binary files or otherwise very large repos, or centralized control and auditing, or non-developers to work with it, then git has major drawbacks. However there is so much code in the world—probably the majority—that works incredibly well with the full benefits of having it all in a single git repo.

Having the whole history and the ability to mutilate it at will in a very fine-grained fashion is a power tool. I like it for the same reason I like vim: the learning curve is steep, but the idea is that this is a core tool I can use across programming languages for the duration of a 40, 50, 60 year career. I don't want to conflate something as horrible as svn with a theoretical good centralized VCS, but I have to say that after using svn for 5 years I had much less idea how it worked than I did after 1 year of using git. Even though git was a steeper learning curve, the concepts in it are well-defined and the low-level pieces have a clarity of purpose which allows some implicit reasoning about the way things work without having to go in and actually read the source. This allows the fluent practitioner to devise flexible approaches to individual problems that a less modular system cannot provide.

I see commit messages and history as a fine craft. With git you can commit everything in small units that make sense only during construction as you are still exploring the problem space, then you can rebase it into cohesive changesets with deep documentation and a continuously unbroken build. A centralized VCS user could only achieve the same by withholding intermediate commits; that's okay, but it can become unwieldy for large changes. With git you can use initial commits like a master woodworker uses scrap wood, structuring them one way for building, and then discarding them in favor of finished pieces built to stand the test of time, providing massive forward benefit in terms of documentation/bisectability. Having worked on at least one project with a 7-year history, I can testify that investing in history curation pays huge dividends and can well make the difference between your successors thinking of you as a solid developer or a shotgun-programming hack.
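The scrap-wood workflow above can be sketched non-interactively. This uses `git reset --soft`, which for a simple linear history has the same effect as squashing the commits in `git rebase -i`; the repo contents and messages are invented:

```shell
#!/bin/sh
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo "base" > notes.txt
git add notes.txt
git commit -q -m "initial commit"

# Three messy "scrap wood" commits made while exploring the problem
for i in 1 2 3; do
  echo "step $i" >> notes.txt
  git add notes.txt
  git commit -q -m "WIP $i"
done

# Fold the three WIP commits into one finished piece for the
# permanent history (the squash step of rebase -i, done directly)
git reset -q --soft HEAD~3
git commit -q -m "add exploration notes as one cohesive change"

git log --format=%s   # two subjects: the squashed commit, then "initial commit"
```

The work itself is untouched; only the history entries are replaced by a single well-documented, bisectable commit.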


Problem 1: Developers unaware of the full capabilities of their DVCS system (thinking they can't do work because github is down). Solution: Education.

Problem 2: Lots of versions of big blobs. Solution: git clone --depth 1 remote-url

Problem 3: Big commit history. Solution: git clone --depth 1 remote-url

Problem 4: Bad UX. Solution: Write a better UI, or find someone to do it. Linus is not a UX guy and never will be.

Also, the example workflow differences are pretty similar to the differences between the github model and the Linux contribution model. This has nothing to do with DVCS vs VCS.

So, in short, there's no actual fundamental problem with DVCS. Nothing to see here. Move along.
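The --depth 1 fix can be seen end-to-end with a local stand-in for the remote (the file:// URL is only there to make the example self-contained; the flag works the same against any real remote):

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for a remote with a long history
git init -q origin-repo
cd origin-repo
git config user.email "dev@example.com"
git config user.name "Dev"
for i in 1 2 3 4 5; do
  echo "rev $i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done
cd ..

# Shallow clone: full working tree, but only one commit of history
# (file:// rather than a plain path, so --depth is honored)
git clone -q --depth 1 "file://$work/origin-repo" shallow
cd shallow
git rev-list --count HEAD   # prints: 1
```

The clone carries the latest tree and a single commit, so neither the big-blob history nor the long commit log is downloaded.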


Of all the problems we have as an industry, I'm convinced that version control is near the bottom of the list.

Sure, 'DVCS' in the form of git or mercurial doesn't deal well with binary files, and has large repo sizes. If these are problems, you might be using the wrong tool.

But that's fine; pretty much everybody should be using something like git or mercurial. They're flexible, relatively easy-to-use, and cover almost all use cases. If you need to store large frequently-changing binary blobs, then you are probably in a minority; there's nothing really wrong with git-annex for managing that content.

But this whole thing we're seeing at the moment with people objecting to DVCS utterly mystifies me, as someone who went from CVS to SVN to git and has utterly no interest in going back.


>The people yelling at you are trying desperately to pretend like it was easy so they don’t feel like an idiot for how long it took them to figure things out. This in turn makes you feel like an idiot for taking so long to grasp the “easy” concept, so you happily pay it forward, and we come to one of the two great Emperor Has No Clothes moments in computing.

This part doesn't make any sense. If anything, people would want to convince you it was hard, so as not to look stupid for taking so long.

That being said, the phenomenon clearly exists, but the reason is probably more along the lines of calling everything easy to try to look smart in general.


This drove me crazy too. I do not know anyone who conspicuously takes a long time to learn something, and then claims it's easy. In fact, it's quite the opposite, as you mentioned.

Now, I have observed cases where someone inconspicuously took a long time to learn something and then claimed it was easy. This was more to elevate one's own intelligence in the eyes of another. As in, "Wow, this is really hard, but Joe said it was easy. He must be super brilliant."

The other prevalent case of this behavior is when reading mathematical proofs. Oftentimes you'll run across a statement in a proof along the lines of: "It's easy to see that X is true," for some statement X. But X is always an existing result, and so this is really a way to focus on the details of the proof at hand and not get bogged down in proving every single non-trivial step.

In fact, when I was a Mathlete, our coaches instructed us to make use of this approach, even if we weren't 100% sure that X was true. If we were pretty sure X was true, but we couldn't remember the proof, and we knew it would take prohibitively long to derive the proof, then we would make the statement.


They want to convince you it's easy because they feel deep inside that if they were to explain things to you they'd realize they don't understand those things themselves.

A person who understands something doesn't shout at you that it's easy (nor that it's hard). They just go and explain it to you.

