But your solution isn't going to help much. In twenty years, will we even have a computer capable of reading the old floppy disks? Most computers these days don't come with a floppy drive, and the new Mac mini doesn't even have a CD drive. Sure, USB may be around, but twenty years ago you would have said the same thing about the 3.5-inch floppy.
Really, if you want to store source code like that, you would have to print it on physical paper and store it in a massive archive. And how would you handle changes?
Do you want to print everything each time you do an svn commit? Or just the diff (yeah, that is going to be fun to type back in)?
A central server, properly organized and upgraded, would probably be the best, but even so it is never going to be very good. In a world where the price of data is very close to nothing, good metadata seems increasingly expensive.
Just as hard with software, whatever version control system is used...
Who is going to remember the culture of old-but-still-running Ruby on Rails apps in 30 years time, when even Node.js isn't fashionable any more ;)
That would include BitKeeper --- copies were available for a few years at no cost to Linux kernel developers, but only on increasingly restrictive terms, which McVoy ultimately wound up revoking altogether. IIRC, Linus released the first embryonic version of git within weeks after McVoy withdrew the free-as-in-beer version of BitKeeper.
The resulting debacle was fairly ridiculous: Linus chastising Andrew Tridgell, continued flamewars about using a closed-source product.
What I find interesting here, though, is just how much hot water McVoy landed in. He gave away free licenses to Linux developers, then when someone in that community started reverse engineering his product with the intent to replace it, he revoked the free license, leading Linus to develop a replacement anyway -- one that has since consumed the vast majority of BitKeeper's target market.
People often complain about the idea of using version control for large binary files, as if wanting such a thing were unreasonable: version control systems should, as a point of principle, contain only text files, and the fact that many of them support binaries poorly is proof that you didn't want it anyway. But there are actually people who create, with their own hands, large binary files, often of the completely unmergeable variety, and they deserve version control just as much as the programmers do.
(And then once you have a system that works well for them, you can then use it to solve all manner of problems that might previously have involved storing files in public folders, mailing them round, or maybe just waiting for them to compile again. No need for any of that crap any more - just check the files in, they're there forever, and you can get them back quickly.)
It is, in the sense that a decentralized VCS is, essentially, a superset of a centralized one.
Blobs are certainly still an issue, though orthogonal to distribution (I don't think you intended to imply it was related, but it could be read as if you did).
Distributed systems rely on allowing people to (in effect) create multiple versions of the same file, and then merge them all together later. But it's very rare that binary files are mergeable! And if the file can't be merged, the distributed approach won't work. People will step on one another's changes by accident, and people will have to redo work.
The usual solution is simply not to allow multiple versions to exist: enforce some kind of locking system, so that each editor has to commit their changes before the next one can have a go. But now you need some centralized place to store the locking information...
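(For illustration, that's roughly the shape Git LFS's file locking took: the repo itself is distributed, but the locks live on the central LFS server. The commands below are real; the file path is made up:)

    git lfs track "*.psd" --lockable    # mark Photoshop files as lockable
    git lfs lock art/hero.psd           # take the server-side lock
    # ... edit, git add, git commit, git push ...
    git lfs unlock art/hero.psd         # release it so others can edit
    git lfs locks                       # list who currently holds what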
I think that the GP used the phrase correctly, but not being a native English speaker myself, I'm not entirely sure.
But the author addresses the point anyway:
> I’m not recording the first time anyone made the astonishing thing, but the first time it was productised and became popular.
But that said, git really isn't very similar to bitkeeper except insofar as it enables distributed development. Both model development as a forest of independent developer trees which communicate with each other through merges. But bitkeeper is still a traditional centralized server keeping a bunch of delta'd files.
The decision bitkeeper made to keep files in SCCS format was of course not revolutionary, but it tells you quite a bit about the target market (people whose makefiles relied on implicit commands like SCCS's get just working). They went to extra effort to make it look like just a bunch of delta'd files.
Is anyone doing a decent work of synthesis for the history of computing in general? Something like Judt's Postwar or Taruskin's History of Western Music?
I just became a little lightheaded, and my vision has gone all blurry and grey.
I hope you're claiming some sort of disability compensation from that company over the long term harm this must have caused you.
But all-in-all, it was quite serviceable for a 75,000 line C++ project. Just don't try to do branches.
Thank goodness we didn't need to do any branching.
So perhaps we can think of RCS as first gear, VS as second gear, CVS/SVN as third gear, and git as fourth gear?
cvs was great and all, but we couldn't version our directories, branching and merging was a mess, the wire protocol was hard to use, it had a ton of security holes, and the storage format took up too much space. So Subversion was created as a way to do a better cvs, without thinking about the larger intrinsic issues with the then-current state of version control. Thus we missed the whole 'distributed' boat and let Linus eat our lunch.
Note: I worked at CollabNet during this time and watched a lot of the discussions around Subversion. I have great respect for the Subversion developers. Karl is an awesome and brilliant guy. It was a bubble, technology wasn't anywhere near where it is today, and I think we were all misguided at that time. We all made a lot of mistakes.
I wrote a longish blog post detailing why it is a failure:
I think what's needed is an intelligent (as in AI) merge mechanism. Right now, if two people are adding two different features to a set of files, then merging those changes is error-prone and requires a lot of manual work.
If this ever gets perfected and automated, it will be a huge milestone.
We got the basics working, including simple diffs. One goal was to link the same variables between two versions; we did not manage to make that work, but had a very hacky approach that looked like it worked.
Doing any sort of merging with this data is nontrivial. We were planning to implement it, but unfortunately ran out of time. Still, we did have a cute demo of some commits and some diffs in the end--it actually worked a little, which is much more than I expected starting out.
However, despite not implementing merging, we did throw in some nice features. In particular, we were able to identify commits that did not change the function of the code (whitespace and comment changes only) and mark them. This was easy yet still useful, and a good indicator of the sorts of things one could do with a system like that.
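(A minimal sketch of that check, in Python via the ast module rather than our actual hackathon code: parsing discards comments, and dumping the tree without attributes discards whitespace and line numbers, so formatting-only edits compare equal. Docstrings do live in the AST, so those still count as changes.)

    import ast

    def is_semantic_noop(old_src: str, new_src: str) -> bool:
        # ast.parse throws away comments; ast.dump (without attributes)
        # ignores line/column info, so formatting-only changes vanish.
        return ast.dump(ast.parse(old_src)) == ast.dump(ast.parse(new_src))

    old = "x = 1  # set up the counter\n"
    new = "x =     1\n"
    print(is_semantic_noop(old, new))  # True: mark the commit as a no-op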
After the hackathon, one of my friends found some papers about a system just like ours. I don't remember where they were from, but if you're interested you could look for them. (I think the phrase "semantic version control" is good for Googling; that's what we called our project.)
Overall I think that it's a neat domain but in hindsight maybe it was a little too much for 18 hours of coding :) We did have fun, and it was cool, so I have no regrets.
One potential reason not to store code as text is that there are many equivalent programs that differ only in inconsequential text. A perfect example is trailing whitespace.
There are also some benefits of storing code as an AST. For one, it would make it trivial to identify commits that did not change the actual code--things like updated comments. This would help you filter out commits when looking for bugs. Another benefit would be better organized historical data: in a perfect system, you would be able to look at the progress of a function even if it got renamed part of the way through.
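(A hypothetical sketch of the rename case, again with Python's ast module: fingerprint each function by its body alone, so the function's own name doesn't affect the fingerprint. A real system would also want to compare signatures and handle recursive functions, whose bodies mention their own name.)

    import ast

    def bodies_by_name(src: str) -> dict[str, str]:
        # Fingerprint each top-level function by the dump of its body,
        # so renaming the function leaves the fingerprint unchanged.
        return {
            node.name: ast.dump(ast.Module(body=node.body, type_ignores=[]))
            for node in ast.parse(src).body
            if isinstance(node, ast.FunctionDef)
        }

    v1 = bodies_by_name("def frobnicate(x):\n    return x + 1\n")
    v2 = bodies_by_name("def frob(x):\n    return x + 1\n")
    for name, body in v2.items():
        for old_name, old_body in v1.items():
            if body == old_body and name != old_name:
                print(old_name, "appears to have been renamed to", name)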
Exactly! Try versioning data with ChronicDB:
I used to work on a large Lisp system where our entire source control system was provided by Emacs versions and locking on a central NFS server, with some explicit branching support in the build code, and with version freezing done by copying directories. I can hear you gagging, dear reader, but actually it didn't work that badly, except that it didn't handle distributed development.
As with everything, in the beginning there was no software.
“At my first job, we had a Source Control department. When you had your code ready to go, you took your floppy disks to the nice ladies in Source Control, they would take your disks, duly update the library, and build the customer-ready product from the officially reposed source.” (Miles Duke)
You're right that it talks about floppies, so it can't have been.
I'm enjoying the few extra comments the article has generated, with memories of those earlier days. Would love to see the earlier astonishments written up more precisely.
cd dir should propose to update it.
Saving a file should commit it and push it to a tmp branch.
"Save" can tag a state as interesting.
But please don't conflate "cd" and "update" - I rarely cd (emacs), but I really don't want to grab partial changes from other branches just because I'm working in a directory. Notification that a file has been changed in other commits would be fine, but "a directory" is a poor heuristic for unit-of-change.
Too much noise/versions is ALSO bad. I'm quite happy with having to tell git every hour or so "this is a version I'd like to go back to", plus continuous zfs snapshots for when I want to undo an unplanned deletion of the last 20 minutes of work followed by an immediate IDE crash (otherwise, ctrl-z is just as good).
Right tool for the right job.