Most of the stuff in the article I already know, but it's still nice to have a document that you can point folks to who aren't familiar with these things. I think one interesting point from the article is that CVS is from the 80s, SVN from the 90s, and git from the 2000s.
The past decade had no major VCS show up, but rather was characterized by the success story of git replacing everything else. Gcc's and LLVM's migration to git isn't that far in the past at all. So I wonder, as the last decade lacked a major leap in VCS technology, can VCS be considered a solved problem now?
I recently read a comment on HN saying that Linux will be the last OS. Is git the last VCS?
I think VCS is far from solved, it hasn't even started yet. So far we only have distributed VCS for sequences (mostly text / source code) and sometimes things which can be serialized unambiguously like trees and tables. But we don't have a general solution to distributed VCS.
We currently can't use distributed VCS for:
- DAGs
- Arbitrary graphs (Topology)
- 3D models (Geometry)
- Multi-dimensional sequences: Images, Videos, etc.
- most domain specific problems / formats
- ... almost everything else
Realistically, there seem to be only two ways you could solve this: built-in VCS support in the software used to edit these files, or exporting them to line-based text and dealing with them like we currently do. I strongly suspect that building support for millions of formats (and their countless variants across different software, and even across versions of the same software) into the VCS itself would be futile. The VCS would end up much bigger and more complex than the OS.
It would take a pretty monumental effort to build, but I think a huge step forward could be made with a VCS that has knowledge of code structure, and the relationships between source files.
Large-scale refactoring is such a fragile affair with git. Conceptually, though, it should be possible to cleanly and concisely represent such operations as renaming functions, reordering arguments, widening type parameters, etc.
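Purely as a hypothetical sketch (no such tool exists; the command name and flags here are made up for illustration), recording a refactoring as an operation rather than a text diff might look like:

    # record the intent of the change, not the resulting lines
    svcs record rename-function --from parse --to parse_document
    svcs record reorder-args --function resize --order height,width
    # a merge could then replay these operations onto the other branch,
    # instead of attempting a line-by-line text merge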
Stability of the programming language in question is of course highly desirable, if you were to build such a system. The pace of change of Go would be much more conducive to success than that of JavaScript (or TypeScript).
My experience here is much more limited, but I'm certain there are text-based versions of 3D model formats. The names of these formats just don't come immediately to mind, but having worked with antenna models, we'd stay with text representations whenever possible, and it worked just fine.
> - Multi-dimensional sequences: Images, Videos, etc.
> - most domain specific problems / formats
> - ... almost everything else
This is where things get tricky, and again, I don't have experience in these domains. I can say, I've worked on projects where binary assets were tracked in Perforce. Not sure how efficient it was with diffs, but I know it worked, and probably better than git, which yes, doesn't know anything about binaries.
Note: I am not a fan of Perforce, for other reasons, but I am not blind to the fact that it handles binary versioning better than git.
Note 2: I will also add that SVGs are just one image format that can be stored as text. NetPBM (https://en.wikipedia.org/wiki/Netpbm) has also been around forever.
People do not want to represent most of them as text.
Some of them need high amounts of compression to be viable. Others don't have any concept of locality that one can map into a single dimension. For some, locality of changes is even an anti-feature.
> People do not want to represent most of them as text.
"People" don't care what things are stored as. You could make GUI tools save to GraphViz's text based format (or heck PlantUML[0]) for graphs at least and it would work just fine.
For assets which have to be truly binary, Perforce offers versioning of those. I know it's not open source and costs money, but if it's really a need, it's worth it.
You can do that, but you will only get some of the features (a distributed backup system with an undo function). The most important one IMO is merging, and that does not work on arbitrary formats serialized to text. E.g. let two 3D artists fork a Blender 3D file in Git, both make changes and commit them. Up to this point everything seems fine; now they want to merge their changes... and suddenly everything falls apart.
Edit: To clarify, if you don't need merging then you also don't need distributed VCS, and a centralized VCS will be sufficient. That is why I explicitly talked about distributed VCS.
You are right, in that it is very hard. That is why I said that I think distributed VCS is far from being solved.
However, I think the "fault" lies neither in the VCS nor file format. It simply shows that merging requires domain knowledge and that future VCS will need to cooperate with file formats to achieve that.
Git has a really simple view of changes. That is, none. It just saves blobs. It doesn't really know what you commit. For example, it doesn't know whether commits are independent of each other.
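You can see this by poking at the object database yourself (a minimal sketch, assuming a repo with at least one commit and a file called README):

    git cat-file -p HEAD            # a commit: tree hash, parent hash(es), author, message
    git cat-file -p HEAD^{tree}     # a tree: a list of (mode, type, hash, name) entries
    git cat-file -p HEAD:README     # a blob: the full file content, not a diff

Every commit points at a complete snapshot; "changes" only exist as something git computes on demand by diffing two snapshots.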
Darcs was also from the 2000s but unfortunately not fast enough to compete with git.
If you want to see what a post-git VCS might look like, read this introduction to Pijul, a VCS that is based on patches and not on blobs like git:
I suspect the theory behind Pijul is where VCS development goes from being a practice (build a pragmatic solution to everyday problems and iterate) to having a solid theoretical foundation. It'll be exciting to see how it evolves from here.
Git is a funny one. I do believe the data model at the heart of git has a good chance of being the last VCS data model. Apart from little details like the choice of hash function, there is little wrong with it. Git's success, despite its perception of being difficult to learn and use, is a testament to that.
But we are long overdue a better porcelain for git. The standard CLI it comes with is just not good. Its level of abstraction is too high in a few places, where it tries to make things seem not-too-dissimilar to "old world" VCS systems. And it's too low in other places: for example, tags are a great, flexible tool, but the vast majority of users don't use or need them; they just need to mark releases. The fact that I need to write something like "git tag --annotate --sign" instead of just "git release" is silly.
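You can approximate that today with an alias, which rather proves the point that the porcelain should just ship it (a rough sketch; "release" here is my own made-up name, not a real git command):

    # 'git release v1.2.3' creates an annotated, signed tag named v1.2.3
    git config --global alias.release '!f() { git tag --annotate --sign -m "Release $1" "$1"; }; f'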
I think a concerted effort to take what git is and what we do with it (and forget old VCS systems), and to reinvent the interface, will create the closest thing to a successor to git. The only porcelain that is vaguely on the right track here is magit.
There are a few things wrong with the git data model; one is the lack of chunking, which makes versioning large files space-inefficient. Some backup systems like borg and restic have fixed that issue in a git-like data model, but suffer from a lack of chunking in directory listings; I assume git would have that issue too.
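You can see the cost directly with loose objects (a rough sketch; a later repack can delta-compress the blobs, but every version of the file is still hashed and stored as a full blob):

    dd if=/dev/urandom of=big.bin bs=1M count=50             # a 50 MB binary file
    git add big.bin && git commit -m "v1"
    printf 'x' | dd of=big.bin bs=1 count=1 conv=notrunc     # change a single byte
    git add big.bin && git commit -m "v2"
    git count-objects -vH   # loose objects: roughly twice the size, two full blobs
                            # (until a repack deltas them) -- no chunk sharing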
Maybe it's better to say git could be the last SCM then. A VCS is much more general and git certainly won't be the last version control system for things that aren't source code.
Actually, if you want to give credit where it is due, you should mention bup (https://github.com/bup/bup), which arguably came up with the idea of content-defined chunking for backups, and even stores it in git.
* Graham Keeling's master's thesis on content-defined chunking used in his Burp backup program (2014): https://burp.grke.org/images/burp2-report.pdf. Burp dates back to 2011 at least. Section 6 of the thesis has an interesting contemporary comparison of rsync, bup, burp, bacula, rdiff-backup and a few others.
* A Linux Weekly News review of Bup from 2010, with comments on strengths and weaknesses of the other approaches then available: https://lwn.net/Articles/380983/
Digging these out was a trip down memory lane. I tried most of them at various times, and now use borg-backup (https://borgbackup.readthedocs.io).
If anything I'm not sure Linus is held in enough esteem outside of tech circles.
I wish all the time that more of the people I know outside of software could benefit from something like Git. It's unbelievable how powerful a tool it is, and when I watch people doing awkward and error prone simultaneous-editing and remote synching etc, it just doesn't compare.
But then git is too abstract for many. Your files are a representation of a state of combined changes; the files themselves are not the owners of their content.
I even had to have a huge debate with a developer once about how storing a git repo in a Dropbox folder so they could have the same exact state across two machines was a terrible idea, and already solved with Git itself...
And then how few people also know how many places the Linux kernel or derivatives live, or even that many things they use are running something based on Linux.
What an incomprehensibly huge achievement on both counts!
>What an incomprehensibly huge achievement on both counts!
He is an amazing dev, don't get me wrong. BUT he had lots of luck, starting from not choosing the name Freax, choosing the same license as his compiler (GPL), a professor who did not understand free software (Tanenbaum and his Minix), the legal battles of the BSDs... and maybe a too-expensive and somewhat slow Solaris, and a really good but proprietary VCS (BitKeeper) as an idea blueprint for git.
True, although when you compare, for example, to the people behind large brands with frequently huge acclaim, and then glimpse inside their (frequently) empty shell that is a marketing PR machine, where all the real innovation, manufacturing and logistics is bought in, I think it's ok to give acclaim to somebody lucky, with prior art and inspiration from people on whose shoulders they stand, simply for the outcome that emerged.
What we call "luck" is very unevenly distributed in the world.
Linus is also lucky that he had decent nutrition in his childhood, and that Finland was not at war with Russia during his university years. He had the luck to be born to parents that were able to prioritize education over subsistence. So yes, he was lucky.
But in your previous comment, you also call Linus's choices "luck". That doesn't read fairly to me.
He had the luck to make good choices? Maybe in retrospect, but you must acknowledge that there's much more than luck involved.
Of the tens of thousands of people with enough "luck" to arrive at roughly the same place in 1991 -- by which I mean smart, well-fed, decent academic/life schedule with adequate free time, few or manageable commitments like spouses/dependent children or parents, broad and deep exposure to Unix, access to USENET and the Internet -- Linus remains exceptional.
My closing parenthetical was not a counterpoint to your message. It was to frame my perspective: I don't worship, but I do respect.
Perhaps I'm not communicating well today. I apologize if so.
Among the tens of thousands who were in the right place, single-digit dozens made the attempt, and only Linus succeeded in such a visible way. If you want to argue that he was merely the lucky one among dozens, then you still have a lot of extraordinary evidence to present, but you might have an interesting story to tell.
Your original comment reads like a dismissal because Linus drew inspiration from the world he lived in, and was just "lucky".
The point is that he did the work, he attracted the community, he managed the project. None of that was luck. And none of that is highly correlated with being an "amazing dev". Of course he stood on the shoulders of giants. Of course he learned from (and drew from) the environment around him. Of course Linux would be a forgotten academic project without the GNU userland (and license, maybe). Etc etc.
We all lived in that soup. Linus did the hard work and made something from it. I thought your comment was highly uncharitable in its disregard for all of the things that made Linux successful.
Git is at a vastly different level of abstraction than most of the population are used to (or capable of). When Linus made git, concepts like references, pointers, indirection, graphs, hashing, indexing etc. were already as obvious to him as the sky being above the ground. Even some programmers struggle to operate at this level. It does hurt to see non-technical people inventing their own terrible VCSs, though ("Copy of Important Document 2020-11-21 (2) (Updated PB).docx")
>It does hurt to see non-technical people inventing their own terrible VCSs, though ("Copy of Important Document 2020-11-21 (2) (Updated PB).docx")
To be honest, shared docs do revision control well enough for most text docs and presentations. I've used GitHub for some larger collaborations, but it's really overkill for most day-to-day docs.
>It does hurt to see non-technical people inventing their own terrible VCSs, though ("Copy of Important Document 2020-11-21 (2) (Updated PB).docx")
I don't think that is necessarily due to not being able to use git (as in not being competent to use it), but rather to not being able to use it due to restrictions in their workplace, or even just unfamiliarity with the possibility of such a tool existing.
Well, that and out of the box git isn't very user friendly, even for developers who are used to dealing with command-line tools. To this day, in my experience, most devs have a relatively shallow understanding of git and often need assistance for anything more complex than commit and push. I can't count the number of times I've had to assist a colleague with what ended up being a relatively simple merge.
I think you'd need an extremely good GUI if you wanted to market git to the masses.
Hahaha, watch as colleagues stare blankly when you ask them to look at the reflog after botching a rebase and force pushing... and loads of other tools are heavily under-utilised too, like bisect, blame, and custom git hooks... The list of under-utilised git commands is long, and while that is perhaps a sign of a poor UX, if you use the features it really is rare that you are surprised or that you cannot return a repo to a good state, regardless of what you do.
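For what it's worth, the recovery is usually only a couple of commands (a sketch; the reflog index is illustrative, you'd pick whichever entry predates the rebase):

    git reflog                         # every recent position of HEAD, including pre-rebase
    git reset --hard 'HEAD@{5}'        # move the branch back to that earlier state
    git push --force-with-lease        # repair the remote without clobbering others' work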
I do think that git's UI is a good case study in how not to design a command-line interface. It's improved over the years, but I remember that a decade ago the contrast was stark if you compared, say, Mercurial to git.
Git always felt more like you were exposed to a raw low-level API on the command line with a few shell macros on top to make it a bit more palatable, which is effectively what it was for a long time. I suppose it fits well within the unix philosophy of clunky but very flexible and not very opinionated tools.
I'm a programmer, I understand all those things well, but I struggle to use git beyond pull, commit, push.
Every time I do something more than that I end up needing to google error messages, and usually end up copying my changes to a temp directory, doing a hard reset, and starting over.
This way, "clone" (instead of pull) will make a local copy on your computer.
This is your local working copy. When you are ready to save, you can "snapshot" it and leave a little message. And you can do as much of this as you want.
Then, when you're ready to merge it back to the primary database, you can "commit" it.
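In actual git terms, that flow maps roughly to (a minimal sketch; the URL is a placeholder):

    git clone https://example.com/project.git    # make your own local copy
    # ...edit files...
    git commit -am "describe the change"          # the local "snapshot", as often as you like
    git push                                      # publish it back to the primary repository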
The whole concept of pull/commit/push seems nonsensical to me. As a layman, I can’t understand the importance of pull vs. push in this context. And this makes it difficult to explain to someone non-technical.
While I like Linus as much as anyone, he had a large impact in a small field. Even then, both Linux and git are community projects first and foremost; Torvalds is just the BDFL, and the projects would continue very well without him.
Almost nobody can name influential luminaries outside their field. Who outside the field of fiber communications has ever heard of Kao and Hockham? How many HN commenters could tell what Reynold Johnson's team invented without having to google it? Those are just some examples from software-adjacent fields that we all use every day. Imagine how many fields there are out there that you'd never think of.
In any case, what you mention as "counting" is a very narrow approach. If you mean "has to appear before congress for questioning", then having a lot of money helps. There are hundreds if not thousands of people doing impactful work that you will never read about. There is not enough time in the day to keep up with all the work done in modern society, but that does not make their work "count" any less.
I don't think that's true: he made a kernel, NOT an OS. And the last VCS... we will see. There is zero proof that any software survives forever... well, ok, COBOL maybe.
By some people's standards I'm a young 'un, but I can distinctly recall being 100% remote in 2000, working on adding real-time to the Linux kernel. We were using BitKeeper, and had no central office. We'd meet at each others' homes as needed, but otherwise just worked from our home offices (it was at this point I moved out of a house I rented with two other guys still in college (I had just graduated), and got my own 2-bedroom apartment to have a home office).
Now that I've been reminded of how much more productive and happy I can be working from home, I'm working on making 100% work from home a reality.
I have a hard time explaining to my non-developer colleagues exactly how our async way of communicating (we work via GitHub) makes the whole WFH situation so much more pleasant. I feel they could benefit a lot, and their constant complaining about telco-filled days could be much reduced. But how could they do this when they email office docs around (i.e. for grant applications or IP) and fill out bespoke systems for their project management steps...?
Shared docs (e.g. GSuite or Office365) help. I rarely mail around documents. And even to the degree that you also have calls, a shared Agenda is a good start.
Personally, I don't much care for using things like Trello or GitHub for non-software tasks and generally resist doing so.
I push for those, but some people lost data because at one point OneDrive just decided it would incorporate someone's desktop folder (seriously, why?!). And occasionally it errors with cryptic messages like "can't open document", no further info. People fear it, and I understand them, actually. It must really be flawless; mess up 3 hours of work for someone and it will take years before you can try to pry them away from email-only again.
I actually don't have any real experience with Microsoft's cloud products, just Google's. They're pretty flawless IMO and have just the right set of features for me in general. Certainly I've lost far more work over the years with desktop products crashing or just accidentally deleting the wrong file.
I have been doing remote software development, with commercial software, since around 2000, and back in the '90s, if you could afford modem call tariffs, with stuff like Novell Netware over WAN.
My dad mentioned that he did his graduate work from home in the early 1980s (maybe late 70s?) using a 32-column terminal that he made from an Intel 8008 and a TV. He apparently bypassed the color demodulator in the TV to get a crisper B&W image. He dialed in to the University computer with a 300 baud modem.
Yes, except management did not approve because they couldn't check to make sure you were wearing your required white shirt, dark tie, and work-appropriate slacks.
Even in the office they used terminals connected to mainframes over serial cables.
Webcams would have been so handy back then. I bet some hacker managed to send a picture of himself in correct attire every now and then, proving the whole thing is BS, of course.
Absolutely it happened, but it was uncommon. My dad had a Teletype at home in the 1970s, with a 110-baud acoustic coupler. Later upgraded to an ADM-3A and speed went to 300-baud.
I credit this for sparking some interest in programming, as I was able to use these to play around with small FORTRAN and BASIC programs.
The Bell Labs folks had phone lines to their homes, so they could have terminal access from home I believe. Or at least I think Brian Kernighan mentioned he did.
This is a common sentiment but I think it glosses over the upside of cloud computing. Yes, the managed services generally lock you in, but cloud computing has given us cheap and robust VMs in world-class infrastructure. If you want to keep your project portable, you only need to avoid the vendor-specific managed services.
As the FSF put it, there is no cloud, just other people's computers. Using someone else's computers doesn't have to mean lock-in.
One way in which I've often thought about it is that whenever you build something on a foundation over which you have no (or little) say or control, then at least work to make it possible to be able to substitute that foundation for an alternative in the future. Or at the very least, be aware of the risks that come with such an arrangement and mitigate them appropriately.
Even beyond software or cloud-based tech, this can be useful to think about and apply even more broadly in life, for example when considering any variety of real or virtual tools, platforms or services that you may depend on.
As you say, deliberately committing to lock-in isn't necessarily a mistake. The Amazon Aurora managed database solution looks very impressive, for instance, and I imagine it could make good sense to use it despite the lock-in. In that instance, the service is compatible with MySQL/Postgres, but I imagine it could still be a considerable challenge to move away from it (moving from one high-availability Postgres-based database solution to another).
The upsides may outweigh the downsides of lock-in, but the upside needs to be substantial.
I was thinking the worst case would be something like Google Cloud Datastore, a proprietary 'managed-only' database that isn't compatible with any existing product. I think there was an attempt to make a Free and Open Source drop-in replacement, but I don't think it got far.
AWS North Virginia offers t4g.micro instances (AArch64 VMs with 1GB of RAM) for $0.0084 per hour, which works out as just under $74/year. [0] (Not counting data-store cost.) It's backed by highly reliable server-grade hardware, and is connected to the high-speed AWS network. Doesn't that count as cheap?
I agree that 'lower tier' providers may be even cheaper, but they still count as cloud computing, so I think my point stands. Slightly off-topic, but I've found that the cheaper providers generally offer an inferior product. You can expect to encounter poorly considered features, broken features, or expected features missing entirely.
i'm mostly comparing to low-end on-prem hardware. $74 isn't a lot; but then, it's still much more expensive than something like a raspberry pi 4 with a USB SSD. also, the network isn't that relevant when going this low, since you won't be serving anything substantial out of it. you can get the price lower by using AWS features, but then it isn't 'just a VM' anymore.
IMHO getting into the cloud without using any cloud-native features isn't money well spent. you'll be much better off avoiding the VMs as much as possible - which is where your point about on-demand pricing makes sense for elastic workloads.
> i'm mostly comparing to low-end on-prem hardware
Running your own servers properly is a considerable undertaking. If you're just playing around, that's a different matter. Power-management equipment alone would cost more than what you'd spend on AWS. Even if you somehow got all your hardware for free, the time spent setting up would cost far more than AWS are asking.
> $74 isn't a lot; but then, it's still much more expensive than something like a raspberry pi 4 with a USB SSD
Serious cloud computing instances are backed by server-grade hardware, with automatic recovery systems to help cope with hardware failures if they do occur, in a dedicated facility with measures in place for everything from burglary to fire to power outages to network outages. The major cloud providers can offer all this for a few pounds/dollars a month due to economies of scale.
If it's your intent to just play around, a Raspberry Pi might be a fine choice, but a Raspberry Pi plugged into the wall isn't close to a proper server setup.
> the network isn't that relevant when going this low, since you won't be serving anything substantial out of it
Firstly, that's not necessarily true. If you're running something like an SFTP/FTPS server, you might make good use of the connection speed, despite only using a lightweight VM. Even if you're hosting a proper Web 2.0 site, an efficient architecture goes a long way. Hacker News and Stack Overflow famously use few servers, for instance.
Secondly, a lot of ISPs are hostile to web hosting from home connections, so a Raspberry Pi might not be an option even if you're only hosting a 'toy' site.
Thirdly, it's not just performance, it's also reliability. AWS invest in redundancy and robustness. A home/small office connection isn't in the same league.
> getting into the cloud without using any cloud-native features isn't money well spent
If you can get a serious virtual/physical server for a better price elsewhere, that might be worth considering, sure. GitHub does this - they're hosted with a company called Carpathia, rather than with one of the major cloud providers. [0] The 'cloudy' features of a platform like EC2 are powerful, though: automatic recovery from hardware failure, the ability to easily change the hardware spec of your instance with just a reboot's worth of downtime, the ability to easily transform your virtual HDD into a virtual SSD, etc.
To me this title feels a bit like ... paving the way paved the way to paving the way...
Open source software paved the way to many things; I'm not sure why we single out this one. It is like saying modern agriculture paved the way to the prescription opioid epidemic... I mean, it did, but why do we care about connecting these two things?
We can also say Romans paving the way paved the way to open source software paving the way to remote software development paving the way to a COVID-19 vaccine to pave the way back to on-location development. Why would we, though?
I have a teammate who loves to send messages like "I have a question about xyz, hey @my-handle can you jump on a Zoom call to discuss this?" I freaking hate this! Nobody in open source can do this!
I see this quite a bit. I also see a lot of people waiting around until the next Zoom call to present issues that should and could have been dealt with over Slack/email when they first arose. I'm not sure if it's a matter of people wanting brownie points for speaking, or an intention to avoid owning such issues by blurting them out in a time-constrained meeting where there isn't an opportunity to discuss details, impact, and next steps.