Linus Torvalds: Git proved I could be more than a one-hit wonder (techrepublic.com)
783 points by DarkCrusader2 on Nov 1, 2019 | 508 comments



To me the beauty of git stems from the fact that it is an implementation of a functional data structure. It's a tree index that is read-only, and updating it involves creating a complete copy of the tree and giving it a new name. Then the only challenge is to make that copy as cheap as possible - for which the tree lends itself, as only the nodes on the path to the root need to get updated. As a result, you get lock-free transactions (branches) and minimal overhead. And through git's pointer-to-parent commit you get full lineage. It is so beautiful in fact that when I think about systems that need to maintain long-running state in concurrent environments, my first reaction is "split up the state into files, and maintain it through git(hub)".
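
To make the "only the nodes on the path to the root need to get updated" point concrete, here is a minimal sketch of path copying on an immutable tree. It is an illustration of the idea only, not git's actual object format; the names `Blob`, `Tree` and `write` are made up for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Blob:
    """Immutable file contents (hypothetical stand-in for a git blob)."""
    data: str


@dataclass(frozen=True)
class Tree:
    """Immutable directory: a sorted tuple of (name, child) pairs."""
    entries: Tuple[Tuple[str, "Node"], ...] = ()

    def get(self, name: str) -> Optional["Node"]:
        for key, child in self.entries:
            if key == name:
                return child
        return None


Node = Union[Blob, Tree]


def write(root: Tree, path: Tuple[str, ...], data: str) -> Tree:
    """Return a new root with `path` set to `data`; `root` itself is never modified."""
    name, rest = path[0], path[1:]
    if rest:
        child = root.get(name)
        subtree = child if isinstance(child, Tree) else Tree()
        new_child: Node = write(subtree, rest, data)
    else:
        new_child = Blob(data)
    # Siblings are reused as-is; only the nodes along `path` are rebuilt.
    kept = tuple((k, v) for k, v in root.entries if k != name)
    return Tree(tuple(sorted(kept + ((name, new_child),), key=lambda e: e[0])))


v1 = write(write(Tree(), ("src", "main.c"), "old"), ("docs", "README"), "hello")
v2 = write(v1, ("src", "main.c"), "new")
```

After this, `v1` still sees the old file, `v2` sees the new one, and the untouched `docs` subtree is the very same object in both versions, which is why the "complete copy" is cheap.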


". . . unlike every single horror I've ever witnessed when looking closer at SCM products, git actually has a simple design, with stable and reasonably well-documented data structures. In fact, I'm a huge proponent of designing your code around the data, rather than the other way around, and I think it's one of the reasons git has been fairly successful. . . .

"I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships."

--- Linus Torvalds, https://lwn.net/Articles/193245/


That last comment is absolutely golden. Once upon a time I had the privilege to spend a few years working in Swansea University's compsci department, which punches above its weight in theoretical computer science. One of the moments that made me the programmer I am today (whatever that's worth) came when I was meeting with the head of the department to discuss a book he was writing, and while we were discussing this very point of data vs code, I said to him, realising the importance of choosing the right structure, "so the data is central to the subject" (meaning computer science in general), to which he replied emphatically that "the data IS the subject". That was a lightbulb moment for me. From then on I saw computer science as the study of how data is represented, and how those representations are transformed and transported: that's it, that basically covers everything. It's served me well.


That's great. It reminds me of a comment by Rich Hickey, the inventor of Clojure:

" Before we had all this high falutin' opinions of ourselves as programmers and computer scientists and stuff like that, programming used to be called data processing.

How many people actually do data processing in their programs? You can raise your hands. We all do, right? This is what most programs do. You take some information in, somebody typed some stuff, somebody sends you a message, you put it somewhere. Later you try to find it. You put it on the screen. You send it to somebody else.

That is what most programs do most of the time. Sure, there is a computational aspect to programs. There is quality of implementation issues to this, but there is nothing wrong with saying: programs process data. Because data is information. Information systems ... this should be what we are doing, right?

We are the stewards of the world's information. And information is just data. It is not a complex thing. It is not an elaborate thing. It is a simple thing, until we programmers start touching it.

So we have data processing. Most programs do this. There are very few programs that do not.

And data is a fundamentally simple thing. Data is just raw immutable information. So that is the first point. Data is immutable. If you make a data structure, you can start messing with that, but actual data is immutable. So if you have a representation for it that is also immutable, you are capturing its essence better than if you start fiddling around.

And that is what happens. Languages fiddle around. They elaborate on data. They add types. They add methods. They make data active. They make data mutable. They make data movable. They turn it into an agent, or some active thing. And at that point they are ruining it. At least, they are moving it away from what it is."

https://github.com/matthiasn/talk-transcripts/blob/master/Hi...


Many decades ago I was coaxed into signing-up for an APL class by my Physics professor. He was a maverick who had managed to negotiate with the school to create an APL class and another HP-41/RPN class with full credit that you could take instead of FORTRAN and COBOL (yeah, it was a while ago).

One of the things he pounded into everyone's heads back then was "The most important decision you have to make is how to represent the problem. Do that well and programming will be easy. Get it wrong and there isn't a force on this world that will help you write a good solution in any programming language."

In APL data representation is of crucial importance, and you see the effects right away. It turned out he was right on that point regardless of the language one chose to use. The advice is universal.


I also really like this quote and it has influenced the way I work a lot. When I started working professionally as a programmer I sometimes ended up with quite clunky data structures, a lot of expensive copying (C++ :-)) and difficult maintenance.

But based on this, I always take greatest care about the data structures. Especially when designing database tables, I keep all the important aspects of it in mind (normalization/denormalization, complexity of queries on it, ...) Makes writing code so much more pleasurable and it's also key to make maintenance a non-issue.

Amazing how far-sighted this is, when considering that most Web Apps are basically I/O - i.e. data - bound.


A mentor often repeated the title of Niklaus Wirth’s 1976 book “Algorithms Plus Data Structures Equals Programs”.

This encapsulates it for me and informs my coding every day. If I find myself having a hard time with complexity, I revisit the data structures.


I think part of the confusion stems from the word “computer” itself. Ted Nelson makes the point that the word is an accident of history, arising because main funding came from large military computation projects.

But computers don’t “compute”, they don’t do math. Computers are simplifying, integrating machines that manipulate symbols.

Data (and its relationships) is the essential concept in the term “symbolic manipulator”.

Code (ie a function) is the essential concept in the term “compute”.


But what is math, if not symbolic manipulation? Numbers are symbols that convey specific ideas of data, no? And once you go past algebra, the numbers are almost incidental to the more abstract concepts and symbols.

Not trying to start a flamewar, I just found the distinction you drew interesting.


Well, the question of whether there's more to math than symbolic manipulation or not was of course one of the key foundational questions of computer science, thrashed out in the early 20th century before anyone had actually built a general computing machine. Leibniz dreamt of building a machine to which you could feed all human knowledge and from which you could thus derive automatically the answer to any question you asked, and the question of whether that was possible occupied some of the great minds in logic a hundred years ago: how far can you go with symbolic manipulation alone? Answering that question led to the invention of the lambda calculus, and Turing machines, and much else besides, and famously to Gödel's seminal proof which pretty much put the nail in the coffin of Leibniz' dream: the answer is yes, there is more to math than just symbolic manipulation, because purely symbolic systems, purely formal systems, can't even represent basic arithmetic in a way that would allow any question to be answered automatically.

More basically and fundamentally, I'd suggest that no, numbers aren't symbols: numbers are numbers (i.e. they are themselves abstract concepts as you suggest), and symbols are symbols (which are much more concrete, indeed I'd say they exist precisely because we need something concrete in order to talk about the abstract thing we care about). We can use various symbols to represent a given number (say, the character "5" or the word "five" or a roman numeral "V", or five lines drawn in the sand), but the symbols themselves are not the number, nor vice versa.

This all scales up: a tree is an abstract concept; a stream is an abstract concept, a compiler is an abstract concept — and then our business is finding good concrete representations for those abstractions. Choosing the right representations really matters: I've heard it argued that the Romans, while great engineers, were ultimately limited because their maths just wasn't good enough (their know-how was acquired by trial-and-error, basically), and their maths wasn't good enough because the roman system is a pig for doing multiplication and division in; once you have arabic numerals (and having a symbol for zero really helps too BTW!), powerful easy algorithms for multiplication and division arise naturally, and before too long you've invented the calculus, and then you're really cooking with gas...


It involves symbolic manipulation, but it’s more than that. Math is the science of method. Science requires reason.

If one were to say computers do math, they would be saying computers reason. Reason requires free will. Only man can reason; machines cannot reason. (For a full explanation of the relationship between free will and reason, see the book Introduction to Objectivist Epistemology).

Man does math, then creates a machine as a tool to manipulate symbols.


You make some interesting points. There was a time I was intrigued by Objectivism but ultimately it fell flat for me. I sort of had similar ideas before encountering it in the literature, but these days I'm mostly captivated by what I learned from "Sapiens" to be known as inter-subjective reality, which I also mostly arrived at through my own questioning of Objectivism. I'm not sure we can conceive of any objective reality completely divorced from our own perceptive abilities.

> Reason requires free will

isn't it still kind of an open question whether humans have free will, or what free will even is? How can we be sure our own brains are not simply very complex (hah, sorry, oxymoron) machines that don't "reason" so much as react to or interpret series of inputs, and transform, associate and store information?

I find the answer to this question often moves into metaphysical, mystical or straight up religious territory. I'm interested to know some more philosophical approaches to this.


Your comment reminds me of the first line from Peikoff’s Objectivism: The Philosophy of Ayn Rand (OPAR): “Philosophy is not a bauble of the intellect, but a power from which no man can abstain.” There are many intellectual exercises that feel interesting, but do they provide you with the means—the conceptual tools—to live the best life?

If objective reality doesn’t exist, we can’t even have this conversation. How can you reason—that is, use logic—in relation to the non-objective? That would be a contradiction. Sense perception is our means of grasping (not just barely scratching or touching) reality (that which exists). If a man does not accept objective reality, then further discussion is impossible and improper.

Any system which rejects objective reality cannot be the foundation of a good life. It leaves man subject to the whim of an unknown and unknowable world.

For a full validation of free will, I would refer you to Chapter 2 of OPAR. That man has free will is knowable through direct experience. Science has nothing to say about whether you have free will—free will is a priori required for science to be a valid concept. If you don’t have free will, again this entire conversation is moot. What would it mean to make an argument or convince someone? If I give you evidence and reason, I am relying on your faculty of free will to consider my argument and judge it—that is, to decide about it. You might decide on it, you might decide to drift and not consider it, you might even decide to shut your mind to it on purpose. But you do decide.


Last idea, stated up front: sorry for the wall of text that follows!

It's not that I reject the idea of objective reality–far from it. However I do not accept that we can 1) perfectly understand it as individuals, and 2) perfectly communicate any understanding, perfect or otherwise, to other individuals. Intersubjectivity is a dynamical system with an ever-shifting set of equilibria, but it's the only place we can talk about objective reality–we're forever confined to it. I see objective reality as the precursor to subjective reality: matter must exist in order to be arranged into brains that may have differences of opinion, but matter itself cannot form opinions or conjectures.

I'll assume that book or other studies of objectivity lay out the case for some of the statements you make, but as far as I can tell, you are arguing for objectivity from purely subjective stances: "good life", "improper discussion"... and you're relying on the subjective judgement of others regarding your points on objectivity. Of course, I'm working from the assumption that the products of our minds exist purely in the subjective realm... if we were all objective, why would so much disagreement exist? Is it really just terminological? I'm not sure. Maybe.

Some other statements strike me as non-sequiturs or circular reasoning, like "That man has free will is knowable through direct experience". Is this basically "I think, therefore I am?" But how do you know what you think is _what you think_? How do you know those ideas were not implanted via others' thoughts/advertisements/etc, via e.g. cryptomnesia? Or are we really in a simulation? Then it becomes something like "I think what others thought, therefore I am them," which, translated back to your wording, sounds to me something like "that man has a free will modulo others' free will, is knowable through shared experience." What is free will then?

"free will is a priori required for science to be a valid concept" sounds like affirming the consequent, because as far as we know, the best way to "prove" to each other that free will exists is via scientific methods. Following your quote in my previous paragraph, it sounds like you're saying "science validates free will validates science [validates free will... ad infinitum]." "A implies B implies A", which, unless I'm falling prey to a syllogistic fallacy, reduces to "A implies A," (or "B implies B") which sounds tautological, or at least not convincing (to me).

I apologize if my responses are rife with mistakes or misinterpretations of your statements or logical laws, and I'm happy to have them pointed out to me. I think philosophical understanding of reality is a hard problem that I don't think humanity has solved, and again I question whether it's solvable/decidable. I think reality is like the real number line, we can keep splitting atoms and things we find inside them forever and never arrive at a truly basic unit: we'll never get to zero by subdividing unity, and even if we could, we'd have zero–nothing, nada, nihil. I am skeptical of people who think they have it all figured out. Even then, it all comes back to "if a tree falls..." What difference does it make if you know the truth, if nobody will listen? Maybe the truth has been discovered over and over again, but... we are mortal, we die, and eventually, so do even the memories of us or our ideas. But, I don't think people have ever figured it all out, except for maybe the Socratic notion that after much learning, you might know one thing: that you know nothing.

Maybe humanity is doing something as described in God's Debris by Scott Adams: assembling itself into a higher order being, where instead of individual free will or knowledge, there is a shared version? That again sounds like intersubjectivity. All our argumentation is maybe just that being's self doubt, and we'll gain more confidence as time goes on, or it'll experience an epiphany. I still don't think it could arrive at a "true" "truth", but at least it could think [it's "correct"], and therefore be ["correct"]. Insofar as it'll be stuck in a local minimum of doubt with nobody left to provide an annealing stimulus.

I will definitely check out that book though, thanks for the recommendation and for your thoughts. I did not expect this conversation going into a post about Git, ha. In the very very end (I promise we're almost at the end of this post) I love learning more while I'm here!


One problem is that, at least for certain actions, you can measure that motor neurons fire (somewhere in the order of 100ms) before the part of your brain that thinks it makes executive decisions.

At least for certain actions and situations, the "direct experience" of free will is measurably incorrect.

Doesn't mean free will doesn't exist (or maybe it does), but it's been established that that feeling of "I'm willing these actions to happen" oftentimes happens well after the action has already been set into motion.


Starting at 1:12:35 in this video, there is a discussion of those experiments with an academic neuroscientist. He explains why he believes they do not disprove free will.

https://youtu.be/X6VtwHpZ1BM


Oh, thank you for this :) Because I won't deny, a friend originally came to me with this theory and it has been bugging me :)


There is a lot here. For now, I will simply assert that morality, which means that which helps or harms man’s survival, is objective and knowable.

I’ve enjoyed this discussion. It has been civil beyond what I normally expect from HN. From our limited interaction, I believe you are grappling with these subjects in earnest.

This is a difficult forum to have an extended discussion. If you like, reach out (email is in my profile) and we can discuss the issues further. I’m not a philosopher or expert, but I’d be happy to share what I know and I enjoy the challenge because it helps clarify my own thinking.


Yeah, I expect we're nearing the reply depth limit. Thanks for the thought provoking discussion! Sent you an email. My email should be in my profile, too, if anyone wants to use that method.


In Spanish the preferred name is "ordenador" which would translate to something like "sorter" or "organizer machine".


That's in Spain. In American Spanish computador/a is most often used: http://lema.rae.es/dpd/srv/search?key=computador

There is also informática/computación; both Spanish words to refer to the same thing but used in Spain/America.

I guess that literally they'd be something like IT and CS.


Good points from both, indeed it's a country thing not a language thing. My bad!


In French, it's the same; it's about "putting things in order", similar in concept to an ordonnateur:

https://en.wikipedia.org/wiki/Ordonnateur


Similarly for French - "ordinateur"

https://www.dictionnaire-academie.fr/article/A9O0665

A search for "computer" does not find anything; though I suspect many French actually use computer not ordinateur.


No we don't. We use ordinateur.


In Finnish, it's an 'information machine'.

To use one is colloquially 'to data'; as in, a verb form of data :)


More accurately, it is called 'ordenador' in Spain. In Latin America, it is 'computadora'.


Hmm, interesting. In Norwegian the word for computer translates to "data machine" (datamaskin)


As in Swedish.


The Swedish name is “dator”, isn’t it? Its root is certainly “data”, but I like it better than the more cumbersome Norwegian word “datamaskin”.


I’ve always thought that dator was just a short form of datamaskin. But some other comments suggested otherwise, so I had to look it up. Apparently, dator is a made up word from 1968, from data and parallels the words tractor and doctor.


Yes, it's "dator". The word was initially proposed based on the same Latin -tor suffix as in e.g. doctor and tractor, so the word would fit just as well into English as it does in Swedish.


And in Danish we had "datamat", which has a nice ring to it. But everybody says "computer" instead.


In Anathem by Neal Stephenson, computers are called Syntactic Devices ("syndev").


Computer science indeed sounds a lot like you're working with computers.

In German the subject is called "Informatik", translating to information science. I find that quite elegant in contrast.


Yes, I've heard it said that calling it computer science is like calling astronomy "telescope science".


It also helps identify journalists who don't know what they are writing about. They frequently translate "computer science" literally as "Computerwissenschaft".


Interestingly, Computer Science is called "Datalogi" in Danish. I always liked that term better.

Coined by Peter Naur (of BNF-"fame"), by the way.


Same in Swedish. Also the Swedish word for computer is dator. Don't know if this in any way shifts the mental perspective though.


Informatika (Інформатика) in Ukrainian. Probably originated from German or French.


Linus was probably exposed to Wirth's book (from 1976) at some point.

I believe it was the first major CS book that emphasised data structures.

https://en.wikipedia.org/wiki/Algorithms_%2B_Data_Structures...


The Mythical Man-Month, published a year before Wirth's book, provides the most well-known quote on the subject (though he uses now antiquated language):

"Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious."

But I don't think Brooks was trying to suggest it was an original idea to him or his team, either. I imagine there were a decent number of people who reached the same conclusion independently.


There's a good bit about data structure-centric programming in The Art Of Unix Programming: http://www.catb.org/~esr/writings/taoup/html/ch01s06.html#id...

(Apologies for linking to esr but it's a good book)


What's wrong with esr?


He's kind of a nutcase. He's a gun rights advocate, to the point where immediately after 9/11 (within a day or two?) he argued that the solution should be for everyone to carry a gun always, especially on airplanes.

And then he accused women in tech groups of trying to "entrap" prominent male open source leaders to falsely accuse them of rape.

And then he claimed that "gays experimented with unfettered promiscuity in the 1970s and got AIDS as a consequence", and that police who treat "suspicious" black people like lethal threats are being rational, not racist.

Basically, he's a racist, bigoted old man who isn't afraid to spout off conspiracy theories because he thinks the world is against him.


At least half of these "nutcase" claims are plainly true. Thanks for the heads up, I'll be looking into this guy.


Which half?


Maybe someone willing to get into a politically fraught internet argument over plainly true things will jump in for me. I'm already put off by the ease and comfort with which HN seems to disparage someone's character for his ideas and beliefs, actions not even entering the picture.


Public utterances are actions which can have consequences. If you're in favor of free speech, buckle up because criticism of public figures is protected speech.

But in this case the "consequence" to esr was somebody apologizing for linking to him. Methinks the parent protests too much


Every action has consequences, it's either profound or meaningless to point this out. I see it used as a reason to limit speech because this speech that I disagree with is insidious and sinister. Rarely is any direct link provided between this sinister speech and any action that couldn't be better described as being entirely the responsibility of the actor.


Indeed, I point out that actions have consequences because it's a common trope that "free speech" implies a lack of consequence.

> I see it used as a reason to limit speech because this speech that I disagree with is insidious and sinister.

Limiting speech is a very nuanced issue, and there's a lot of common misconceptions surrounding it. For a counterexample, if you're wont to racist diatribes, that can make many folks in your presence uncomfortable; if you do it at work or you do it publicly enough that your coworkers find out about it, that can create a toxic work environment and you might quickly find yourself unemployed. In this case, your right to espouse those viewpoints has not been infringed -- you can still say that stuff, but nobody is obliged to provide audience.

And as a person's publicity increases, so do the ramifications for bad behavior -- as it should. Should esr be banned from the internet by court order? Probably not. Does any and every privately owned platform have the right to ban him or/and anybody who dis/agrees with him? Absolutely: nobody's right to free speech has been infringed by federal or state governments. And that's the only "free speech" right we have.


The reason free speech is called free is that it is supposed to be free of suppression and negative consequence where that speech does not infringe on the interests of others. That it is now protected only against interference from government does not make that narrow version of free speech the one that supporters of it (myself included) consider the ideal.

> Should esr be banned from the internet by court order? Probably not.

Where's the uncertainty in this?

> Does any and every privately owned platform have the right to ban him or/and anybody who dis/agrees with him?

Those that profess to be a platform and not a publisher should not be able to ban him, nor anybody else, for their views, whether or not expounded via their platform. That's why they get legal protections not afforded to others. Do you think the phone company should be able to cut you off for conversations you have on their system?


[flagged]


> I just explicitly affirmed at least two of four "racist, misogynistic, bigoted" statements of fact.

Well, that's how you're characterizing your actions, okay. But just so you know. Your employer is free to retain you, or fire you, on the basis of opinions that you express in public or private. Wicked tyranny, that freedom of association.

> Presumably now you'd like to...

Well, that's certainly a chain of assumptions you've made. Why would you, say, "respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize" when you're out in public? Oh right, that's a quote from HN guidelines. In any case, you're not changing minds by acting this way.

> because this is how you tyrants prefer we genuflect to avoid guilt by association.

Oh, no, the tyranny of public criticism! Hey did you know something? You're free to disagree with me. And criticize me. In public! And others are free to agree with me, or you, or even both of us, even if that makes zero sense!

> This is profoundly idiotic, but I again refrain from arguing because my audience has proven itself very unthinking and vicious

A personal attack, how droll.

> I hope you're not an American,

I am! And as an American I've got the freedom of association -- that means that I'm not legally obligated to verbally support or denounce anybody; nor is it unlawful for me to verbally support or denounce anybody! Funny thing about freedoms; we've all got 'em and it doesn't mean we need to agree on a damned thing.

> because you don't understand what "free speech" is or why we have it,

Well you're wrong there, but IANAL so here's first amendment attorney, Ken White.

>> Public utterances are actions which can have consequences

https://www.popehat.com/2013/09/10/speech-and-consequences/

>> buckle up because criticism of public figures is protected speech.

https://www.popehat.com/2012/07/31/the-right-not-to-be-criti...

> my friend

That's taking things too far. No thank you.


Next time you're in New York we'll get some boba, on me. I'm friends with everybody.


He's become somewhat controversial due to his worldview and political writings, which include climate change denial


And this has nothing to do with programming.

Imagine Einstein alive and denying climate change. Would you apologize every time when you are referring to the theory of relativity?

P.S. Sorry, if you don't agree with the apologising comment and were just informing about possible reasons.


Sorry if you're getting downvoted a lot. We as a group need to start learning a little subtlety when it comes to condemning all of a person's contributions because we don't like their opinions or their actions. We are smart enough that we should be able to condemn ESR's idiotic words and actions and still praise his extremely important contribution to technology.


Absolutely agree


I don't know, does it have to be a hard and fast rule?

Sometimes I quote HP Lovecraft and sometimes I feel like apologizing for his being racist (and somewhat stronger than just being a product of his times). But most of the time, also not. But it does usually cross my mind and I think that's okay and important. In a very real "kill your idols" way. Nobody's perfect.

And that's just for being a bigot in the early 20th century, which, as far as I know, is of no consequence today.

However if Einstein were alive and actively denouncing climate change today, I would probably add a (btw fuck einstein) to every mention of his theories. But that's just because climate change is a serious problem that's going to kill billions if we would actually listen to the deniers and take them seriously. This hypothetical Einstein being a public figure, probably even considered an authority by many, would in fact be doing considerable damage spouting such theories in public. And that would piss me off.

What I mean to say is, you don't have to, but it's also not wrong to occasionally point out that even the greatest minds have flaws.

Also, a very different reason to do it, is that some people with both questionable ideas and valuable insights, tend to mix their insightful writings with the occasional remark or controversial poke. In that case, it can be good to head off sidetracking the discussion, and making it clear you realize the controversial opinions, but want to talk specifically about the more valuable insights.

And it IS in fact important to keep both in mind, even if you think it is irrelevant. Because occasionally it turns out, for instance through the value of a good deep discussion, that the valuable insights in fact fall apart as you take apart the controversial parts. Much of the time it's just unrelated, but you wouldn't want to overlook it when it isn't.


I disagree.


The theory of relativity is a much bigger contribution to society than TAOUP.

The chapter I linked to was just a summary of ideas put forth by others - though admittedly written well.

My problem with esr is more his arrogance and conceit than politics (which I also find distasteful)


I'd say they are incomparable, but I hope it helped to get my point across :)

I've read and liked his book, btw, but I had to ignore all his stupid Windows-bashing where he attributes every bad practice to the Windows world and every good one - to the Unix world.


This is a good review of the book by Joel Spolsky which also touches on that point:

https://www.joelonsoftware.com/2003/12/14/biculturalism/


Referring to relativity and linking to Einstein's personal web page are surely two different things, no?


Yes, but I don't think this invalidates my analogy


Right, the book stands on its own. Thoughts on the author are irrelevant in the context of the work.


He's kind of crusty about climate change, but other than that he's just a guy with some strong opinions. I guess that scares some folks enough to require an apology.


Telling how this very reasonable, “maybe things aren’t completely black and white” comment got downvoted.


Not saying I agree with either sentiment, but there's a delicious irony in this comment in that you're reading into votes as if they're pure expressions of support or not for an issue that's not black and white... Even though the expressions are just projections of a spectrum of thoughts through a binary voting system!


Ah yes, the old insight. Fred Brooks: "Show me your [code] and conceal your [data structures], and I shall continue to be mystified. Show me your [data structures], and I won't usually need your [code]; it'll be obvious."


Yes, and here's one by Rob Pike, "Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self­evident. Data structures, not algorithms, are central to programming." --- https://www.lysator.liu.se/c/pikestyle.html

I think I found all these quotes on SQLite's website, https://www.sqlite.org/appfileformat.html


> I'm a huge proponent of designing your code around the data

That same comment was made to a class I was in by a University Professor, only he didn't word it like that. He was discussing design methodologies and tools (I guess things like UML), and his comment was that he "preferred Jackson, because it revolved around the data structures, and they changed less than the external requirements". (No, I have no idea what Jackson is either.)

Over the years I have come to appreciate the core truth in that statement - data structures do indeed evolve slower than APIs - far slower in fact. I have no doubt the key to git's success was that, after years of experience dealing with VCS systems Linus hated, he had an epiphany and came up with the fast and efficient data structure that captured the exact things he cared about, but left him the freedom to change the things that didn't matter (like how to store the diffs). Meanwhile others (hg, I'm looking at you) focused on the use cases and "API" (the command line interface in this case). The end result is git had a bad API, but you could not truly fuck it up because the underlying data structure did a wonderful job of representing a change history. Turns out hg's API wasn't perfect after all and it has found adapting difficult. Git has had hack upon hack tacked onto the side of its UI, but its data structure still shines through as strong and as simple as ever.

Data structures evolving much more slowly than APIs does indeed give them the big advantage of being a solid rock base for future design decisions. However they also have a big downside - if you decide that the data structure is wrong, it changes everything - including the APIs. Tacking on a new function API on the other hand is drop dead easy, and usually backwards compatible. Linus's git was wildly successful only because he did something remarkably rare - got it right on the first attempt.



My memory is a little fuzzy, but I think Jackson was/is an XML serializer/deserializer that operates on POJOs (potentially with annotations). You define your data structures as Java objects, and Jackson converts them to XML for you. As opposed to other approaches where you define some schema in non-Java (maybe an XSD) and have your classes auto-generated for you.


"It is better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures." This quote is from Alan Perlis' Epigrams on Programming (1982).


Which is also a base design principle of Clojure. There are few persistent data structures at the core, a sequence abstraction and lots of functions to work on them.


So you would have one data structure with 10 pointers to those 10 data structures you need and 10 times the functions?

I'd rather split up independent structures.


Having a smaller number of data structures makes the whole graph of code more composable. Creating a bespoke data structure for 10 different elements of a problem means writing quite a lot of code just to orchestrate each individual structure, mostly due to creating custom APIs for accessing what is simple data underneath the hood.

There’s a reason why equivalent Clojure code is much much shorter than comparable programs in other languages.


> "I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships."

It should be noted that the basic premise of Domain-Driven Design is that the basis of any software project is the data structure that models the problem domain, and thus the architecture of any software project starts by identifying that data structure. Once the data structure is identified then the remaining work consists of implementing operations to transform and/or CRUD that data structure.


DDD is about modeling which is data and behaviour.


> DDD is about modeling which is data and behaviour.

It really isn't. DDD is all about the domain model, not only how to synthesize the data structure that represents the problem domain (gather info from domain experts) but also how to design applications around it.


I remember that Richard Hipp (SQLite creator) once cited a bunch of similar quotes including the Linus' one.

https://www.percona.com/sites/default/files/hipp%20sqlite%20...

"Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowcharts; they'll be obvious." -- Fred Brooks, The Mythical Man-Month, pp. 102-103


>> Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

Tsk. Now I'll never know if I'm a good programmer. I do all my programming in Prolog and, in Prolog, data is code and code is data.


"The more I code the more I observe getting a system right is all about getting the data structures right. And unfortunately that means I spend a lot of time reworking data structures without altering (or improving) functionality..." https://devlog.at/d/dYGHXwDinpu


These are ideas echoed by some of the top people in the game development community as well. There is a nice book about these kinds of ideas:

https://www.amazon.com/dp/1916478700


Can anyone point to some good resources that teach how to code around data and not the other way round?


The Art of Unix Programming, by Eric Raymond, particularly chapter 9: http://www.catb.org/~esr/writings/taoup/html/generationchapt...


Switch to a language that emphasises functional programming (F#, Clojure, OCaml, etc) and it will happen naturally.


Surprisingly there is no one book AFAIK but techniques are spread across many books;

The first thing is to understand FSMs and state transition tables using a simple two-dimensional array. Implementing an FSM using while/if-then-else code vs. transition table dispatch will really drive home the idea behind data-driven programming. There is a nice explanation in Expert C Programming: Deep C Secrets.
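A minimal sketch of that contrast, in Python rather than C. The turnstile machine and every name below are made up for illustration and are not taken from any of the books mentioned; a dict keyed by (state, event) stands in for the two-dimensional array.

```python
# Hypothetical example: a coin-operated turnstile with two states
# and two events ("coin" and "push").
LOCKED, UNLOCKED = "locked", "unlocked"


def step_branching(state, event):
    """Control-flow version: the machine is encoded in if/else branches."""
    if state == LOCKED:
        if event == "coin":
            return UNLOCKED
        return LOCKED
    else:
        if event == "push":
            return LOCKED
        return UNLOCKED


# Data-driven version: the machine *is* this table.
TRANSITIONS = {
    (LOCKED, "coin"): UNLOCKED,
    (LOCKED, "push"): LOCKED,
    (UNLOCKED, "coin"): UNLOCKED,
    (UNLOCKED, "push"): LOCKED,
}


def step_table(state, event):
    """Table version: the code only performs a lookup."""
    return TRANSITIONS[(state, event)]


def run(step, events, state=LOCKED):
    """Feed a sequence of events through either step function."""
    for event in events:
        state = step(state, event)
    return state
```

Adding a state or an event to the table version means adding rows to `TRANSITIONS`; `step_table` and `run` stay as they are, whereas the branching version needs new control flow.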

SICP has a detailed chapter on data-driven programming.

An old text by Standish; Data Structure Techniques.

Also I remember seeing a lot of neat table-based data-driven code in old Data Processing books using COBOL. Unfortunately I can't remember their names now. Just browse some of the old COBOL books in the library.


Jonathan Blow is also talking about data-oriented programming as a basis for designing his Jai programming language.


Data-oriented programming in games is a completely different concept though. It’s about designing your data to be able to be operated on very efficiently with modern computer hardware. It speeds up a lot of the number crunching that goes on in extremely fast game loops.

The Linus comment is about designing your programs around a data representation that efficiently models your given problem.


If I could say just one thing about programming to my kids, I would quote this.


I sometimes like to explain things the other way around in terms of functional programming immutability being like version control for program state.

I rarely use functional programming but I certainly see its appeal for certain things.

I think the concept of immutability in functional programming confuses people. It really clicked for me when I stopped thinking of it in terms of things not being able to change and started instead to think of it in terms of each version of things having different names, somewhat like commits in version control.

Functional programming makes explicit, not only which variables you are accessing, but which version of it.

It may seem like you are copying variables every time you want to modify them, but really you are just giving different mutations different names. This doesn't mean things are actually copied in memory. Like git, the compiler doesn't need to make full copies for every version. If it sees that you are not going to reference a particular mutation, it might just physically overwrite it with the next mutation. In the background, "var a=i, var b=a+j" might compile as something like "var b = i; b+=j".
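A small Python sketch of the "different names for different versions" idea, assuming a hand-rolled immutable linked list (`Node`, `push` and `to_list` are illustrative names, not a library API). It shows the sharing between versions, not the compiler optimisation of overwriting in place.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """An immutable list cell; `rest` points at an older version."""
    value: int
    rest: Optional["Node"]


def push(lst, value):
    # "Modifying" the list returns a new version; the old one is untouched.
    return Node(value, lst)


def to_list(lst):
    """Walk the cells from newest to oldest and collect the values."""
    out = []
    while lst is not None:
        out.append(lst.value)
        lst = lst.rest
    return out


v1 = push(None, 1)
v2 = push(v1, 2)  # a new name for a new version
v3 = push(v1, 3)  # a second "branch" off v1

# Nothing was copied: both newer versions point at the very same v1 object.
assert v2.rest is v1 and v3.rest is v1
```

Much like a commit pointing at its parent, each version costs one new cell, and every older version stays readable under its own name.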


I encountered a large company where they had a private git server for their engineering teams.

Over time someone discovered that the number of repositories and usage was much greater than they expected. What they found was that non engineering folks who had contact with engineering had asked questions about how they manage their code, what branches were, and etc. Some friendly engineering teams had explained, then some capable non engineering employees discovered that the server was open to anyone with a login (as far as creating and managing your own repositories) and capable employees had started using it to manage their own files.

The unexpected users mostly used it on a per user basis (not as a team) as the terminology tripped up / slowed down a lot of non engineering folks, but individuals really liked it.

IT panicked and wanted to lock it down but because engineering owned it ... they just didn't care / nothing was done. They were a cool team.


Unfortunately git does not handle binary files elegantly (unless you use git-lfs). You can inflate storage rapidly by, say, editing a 10M zip file a few times. I've had to GC more than one repo where someone accidentally added an innocuous binary file, and the next thing you know the repo has exceeded 2G of storage space.
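To make the mechanism concrete, here is a hedged Python sketch of how git names file contents, assuming the default SHA-1 object format (the sample bytes are made up). Every distinct version of a file becomes its own blob; whether two versions end up stored compactly depends on delta compression in packfiles, which tends to do poorly on already-compressed formats such as zip.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID git gives a file's content under the SHA-1 object format."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# Stand-ins for two versions of a binary file that differ in a single byte.
original = b"PK\x03\x04" + b"\x00" * 32
edited = original[:-1] + b"\x01"

# One changed byte still yields a different ID, i.e. a whole new blob.
assert git_blob_id(original) != git_blob_id(edited)
```

This should match what `git hash-object <file>` prints for the same bytes.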


> I've had to GC more than one repo where someone accidentally added an innocuous binary file

My god, the things I've seen in repos. vim .swp files. Project documentation kept as Word documents and Excel spreadsheets. Stray core dumps and error logs in random subdirectories, dated to when the repo was still CVS. Binary snapshots of database tables. But the most impressive by far was a repo where someone had managed to commit and push the entirety of their My Documents folder, weighing in at 2.4GB.


If you crawl a package repository such as PyPI, you will find a lot of that same stuff in packages as well. Which is even weirder because those are created from a setup.py which does not have a `git add .` equivalent. People are not good at building clean archives.


I found git-lfs to be a huge pain, since the "public" server implementations are basically github and gitlab. We have plain git repos (via NFS/ssh plus bugzilla hooks), so we either have to use some random-github-user's virtually unmaintained implementation or roll our own - both not the best options. On the other hand, we put our custom built GCCs plus sources into a git, and trust me, having a 8GB repo (after a few version bumps) is really annoying, so having git-lfs would be plain amazing.

(I checked this out the day before I left for vacation, so to be fair, my research might have not been thorough enough to find each and every implementation - but I think it is comprehensive enough to make some preliminary judgement)


Did you try the lfs-test-server?

https://github.com/git-lfs/lfs-test-server


We've got Bitbucket's LFS pointed to our Artifactory server. Not the cleanest solution, but we haven't had any major problems in over a year.


External hosting is not an option for us ;) The gccs are the biggest pain point, but customer projects plus binaries are the other - and those are just too sensitive to be pushed into someone's cloud.


Both our Bitbucket and Artifactory instances are internally hosted.


Luckily storage is getting cheaper. I do wish someone hadn't checked a custom-built nginx binary into ours though.


Even with infinite storage, having lots of blobs can make a repo unmanageable. In order to get an 8GB repo onto github, I had to make temporary branches and push them incrementally.

I highly recommend git-annex. It is like git-lfs but a bit less mature but much more powerful. Especially good if you don't want to set up a centralized lfs server.


Yea, I recommend git-annex too.


It's not just a question of storage, as the size of the repository increases git starts having a hard time dealing.

Binary files don't cause the issue, but because binary files don't deltify / pack well significant use of them makes repos degenerate much faster.


I heard of a web consultancy around 2006 where the Subversion repository history contained a full copy of the Rolling Stones discography in MP3.


The real genius of git is the clear, concise user interface for that data structure.

(for those with a severe sugar hangover, I'm being a little bit sarcastic)


Quoting myself from yesterday:

https://news.ycombinator.com/threads?id=miohtama

"Git is what a version control UX would look like if it were written by kernel developers who only knew Perl and C"

Back in the day we had Subversion, Mercurial, Bazaar, and some others. I used all of these. All of them were more coherent than Git. However they were slower - but not much - and they were not used by the most popular software project in the world. Then GitHub popularized git, and GitHub became well funded enough to take over the software development world.

Bitbucket, now Atlassian, started as a hosted Mercurial repos. Bazaar was DVCS for Ubuntu, developed by Ubuntu folks.

Will we see another DVCS ever again? I hope yes. Now all software developers with less than 5 years of experience are using Google and GitHub as the user interface for Git. Git's cognitive burden is terrible and can be solved. However Git authors themselves are not prioritising this.

The software development industry could gain a lot of productivity from a more sane de facto version control system, with saner defaults and better discoverability.


> Will we see another DVCS ever again? I hope yes.

As just the first example off the top of my head, Pijul [1] is "new" compared to the others you listed. There will likely always be folks exploring alternatives.

> Now all software developers with less than 5 years of experience are using Google and GitHub as the user interface for Git. Git's cognitive burden is terrible and can be solved. However Git authors themselves are not priorising this.

I have the opposite impression, that the Git team is finally getting serious about the UX, whether it's just all the fresh blood (thanks at least partly to Microsoft moving some of their UX teams off of proprietary VCSes to converge on git), or that Git's internals are now stable enough that the Git team feels it is time to focus on UX (as that was always a stated goal that they'd return to UX when all the important stuff was done).

Clear example: The biggest and first announcement in the most recent Git release notes was about the split of `git checkout` into `git switch` and `git restore`. That's a huge UX change intended to make a lot of people's lives easier, simplifying what is often people's most common, but sometimes most conceptually confusing, git command given the variety of things that `git checkout` does.

The Git UX is better today than it was when it first "beat" Mercurial in the marketplace, and there seems to be at least some interests among git contributors to make it better.

[1] https://pijul.org/


> However they were slower - but not much

I don't think we should underestimate how much performance differences can color opinions here, especially for something like CLI tools that are used all the time. Little cuts add up. At work I use both git and bazaar, and bazaar's sluggishness makes me tend to avoid it when possible. I recall Mercurial recently announced an attempt to rewrite core(?) parts in Rust, because python was just not performant enough.


Related to this here is some discussion from Facebook, Microsoft, and why Facebook is using Mercurial over Git:

https://news.ycombinator.com/item?id=16565299


The speed of branching in git, compared to subversion, was a huge part of convincing me to move. Not an entirely "fair" comparison given that subversion is a centralized VCS, but speed is very important.


Isn't subversion's take on branching, as well as tagging, actually copying/duplicating whole directory trees?


This is the interface you're presented with, but it's actually a "cheap copy" underneath. So if you write "svn cp https://svn.myserver.com/trunk https://svn.myserver.com/branches/foo" that takes about 1-2 seconds in my experience (no matter how many files, how long the history, or how many binary files you have, etc.).


Likewise, Git has been steadily improving on the user interface front for years. It's much better than 1.0 was, but it's still not the easiest DVCS to learn.

In my experience, saving a little on runtime doesn't make up for having to crack open the manual even once. UI is a big cut.

From an implementation POV, it's also generally easier to rewrite core parts in a lower-level language, than it is to redesign a (scriptable, deployed) UI.


> From an implementation POV, it's also generally easier to rewrite core parts in a lower-level language, than it is to redesign a (scriptable, deployed) UI.

But only if the data structure is simple and works well for the problem domain. Bazaar (a DVCS from Ubuntu; I mean bzr, not baz) had a much simpler and consistent UI, but it had several revisions to the data structures, each one quite painful, and it was slow; they were planning the rewrite to a faster language but never got to it. (Mercurial also used Python and wasn't remotely as slow as bzr - the data structures matter more than the language).


>In my experience, saving a little on runtime doesn't make up for having to crack open the manual even once.

Please don't ask me to give up features so you don't have to read the documentation. It is the most basic step of being a good citizen in a software ecosystem.


> In my experience, saving a little on runtime doesn't make up for having to crack open the manual even once. UI is a big cut.

Maybe, but I used to work in a multi-GB hg repo, and I would have given up any amount of manual cracking to get the git speed up. Generally you only open the manual a few times, but you can sync many times a day. I'd give a lot to get big speeds up in daily operations for something I use professionally.


At my current place of employment, I don't find git to be super performant. We do have an ugly monolith of a repo though.


Unless you are hitting large binary checkins and not utilizing LFS I can't imagine a scenario where another current RCS would perform better.


"Large binary checkins" is not the actual issue. Git degrades as repository size increases. Large binary checkins make it much easier / faster to reach this situation, but you can also reach it just fine with regular text-based repository if they're big and have a significant history (long-lived and many contributors).


Unless you mean another DVCS, P4 can run circles around git on large repos (mostly on account of not conceptually trying to copy the entire state of the entire repo at every commit).


Run a reconcile offline changes on an actually large repo and come back and tell me that again. ;)


> However they were slower - but not much

CVS was much much much slower; multiple branch handling was horrible until ~2004 (and even on a single branch you did not have atomic commits). Also, no disconnected operation.

SVN was only a little slower than git, but didn't have disconnected operation, and horrible merge handling until even later (2007 or 2008, I think)

Bazaar 2, while comparable in features, was dead slow compared to git at the time. It also suffered from bazaar1 (branched from arch=tla) being incompatible with bazaar2, and from an overall confusing situation.

Mercurial and Git were a toss-up. Git was faster and had Linus aura, Mercurial had better UI and Windows support. But all the early adopters were on Unix, and thus the Linus aura played a much bigger part than the Win32 support.

Github became externally well funded after the war was over. But it was self well funded, because git was more popular (in part because github made it so ...)

Really, I think the crux of the matter is that Git's underlying data model is really simple, and the early adopters were fine with UX ... mostly because those adopters were Perl and C people. So the UX was not a factor, but speed and Linus aura were.


My experience was very different. SVN was far slower than git. It was slower for me to checkout my company's SVN repo at head than to use git-svn to clone the entire SVN history locally. And from then on, most git operations were effectively instant, save for pushing and pulling to SVN.

The killer feature though was that git didn't put my data at risk while I worked. With the normal SVN workflow, your working directory was the only copy of your changes. And when you sync'd with upstream it would modify the code in your working directory with merge information. Better hope that you get that merge right, because there's no second chances. Your original working directory state is gone forever, and it's up to you to recreate it with the pieces SVN hands you.


I have seen in the wild, at a previous job, a repo with over 200m commits that also contained single commits whose diffs changed over 500m lines, and git on modest hardware would get through it. Slowly, but eventually.


A very good insight, sir! Your memory serves better than mine.

However, I believe that in the long run Hg caught up on speed, and Bazaar was getting a lot better as well.

SVN merge was a nightmare. People avoided doing work that would result in a merge, as it hurt to get it executed nicely.


My experience was that SVN was a lot slower than CVS, from using the Apache and FreeBSD repos from the UK. The chatty protocol suffered a lot from transatlantic round trip times.


Fossil. It's very fast, works well on low bandwidth connections, much cleaner interface, highly customizable, very easy to setup and self host. And the repo is stored in a SQLite database, so it is very easy to backup and explore w/ SQL. It's a shame it isn't more widely used.


I love much about Fossil and used it a lot several years ago, but my workflow and mental model have since come to rely heavily on interactive rebase, a philosophy that Fossil abhors.



I don't understand why people aren't using Fossil[1]. It's still simple, while maintaining a consistent and sensible user interface. It might not be as flexible, but the repos are just SQLite databases, and SQLite can be called from almost any language, so there's huge potential for custom tools that solve a specific use case etc. Its main advantage, though, is that it's about much more than just code. Issues (called tickets), Wikis or even Forums can be a part of your repo. That means there's absolutely no vendor lock-in. In fact, you can host your repos by just SCPing a repo file to a public server. You can also collaborate on issues offline etc. It's written by the SQLite guy, so it's highly reliable and well documented, up to the technical details like file formats etc. It's designed so that repos can last for hundreds of years. The C code is also of very high quality.
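As a rough illustration of the "just SQLite" point, here is a Python sketch using only the standard `sqlite3` module. It lists whatever tables a given SQLite file contains and assumes nothing about Fossil's actual schema; `list_tables` and the file name in the usage note are hypothetical.

```python
import sqlite3


def list_tables(repo_path):
    """Return the names of all tables in an SQLite file, sorted by name."""
    conn = sqlite3.connect(repo_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling `list_tables("project.fossil")` on a repository file would be a first step for exploring it from a custom tool. Work on a copy of the file if you are not sure what else has it open.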


> I don't understand why people aren't using Fossil

For the same reason that BD won over HD-DVD: «Greater capacity tends to be preferred to better UX», except in this case it's performance rather than capacity.


I'm going to say this again and take the downvotes but it's comments like this that generally come from people who don't get git.

git is not the same as those other pieces of software mentioned.

git's default workflow encourages lots of parallel work and making tons of branches (which because of bad naming are confusing because git branches are not what other software calls branches) .

it's a fundamental difference and has increased my productivity and changed my work style for the positive in ways that would never have happened with CVS, svn, p4, hg, etc... all of which I used in the past for large projects.

If you're using git and your mental model is still one of those other systems you're doing it wrong or rather you still don't get it and are missing out.

I'm not suggesting the UX couldn't be better but when you finally get it you'll at least understand what it's doing and why the UXs for those other systems are not sufficient.


> Back in a day we had Subversion, Mercurial, Bazaar, some others.

I don’t agree bazaar was a UX panacea over git, and it was not just “not by much” slower. Subversion was a piece of shit full stop (especially if you had the misfortune of using the original bdb impl), bested in this regard only by VSS. I think slower “not by much” is the understatement of the century for a repo of any substantial size for all but mercurial on your list.

You don’t even mention perforce, leading me to think most of your experience is skewed by the niche of small open source projects.

Mercurial was a contender... great windows support too. I think it was less kernel that killed it and more github.


Bitbucket started about the same exact time as GitHub. It's not necessarily a given that Mercurial lost because of GitHub.

I think it was perceived performance that led git to best Mercurial, and the Linux kernel team certainly contributed to that drama, including the usual "C is faster than Python" one-upmanship. This is especially funny because, at the time, most of git was a duct-taped assortment of nearly as many bash, perl, awk, and sed scripts as C code.


>Software development industry could gain a lot of productivity in the form of more sane de facto version control system, with saner defaults and better discoverability.

So write it.

I'm sorry to be so dismissive but it seems notable that those who like git get along with using it while those who complain about it just throw peanuts from the gallery. If it's obvious to you where git's flaws lie, it should be easy to write an alternative. If saner defaults and better discoverability are all you need, you don't even have to change the underlying structure, meaning you can just write a wrapper which will be found by all the competent developers whose productivity is so damaged that they do what they do when they encounter any problem and search the internet for a solution.

It seems notable this hasn't happened.


About a year ago I dropped into a place that was still using SVN. Now they're switching to Git. This experience has really shown me how much SVN just gets out of the way compared to git-- much less I had to think about when using it.


> However they were slower - but not much

Depends, we went from CVS to Git and nightly jobs tagging the repository went from taking hours to being almost instant.


What don’t you like about git?


7000 votes on https://stackoverflow.com/questions/4114095/how-do-i-revert-...

If one cannot figure one of the most common use case of a version control system without Googling a StackOverflow answer then we have a problem somewhere.


Reading the answers to that Stack Overflow question provides great insight into why git is so successful. "One of the most common use case" is actually several closely related use cases, and git has one-line commands to cleanly handle all of them.

I will say from experience that it's not hard to use git productively with a bit of self-study and only a few of the most common commands. You still have to understand what those commands actually do, though.


This person didn't know (or at least didn't know how to say) which of the several "most common use cases" they wanted to actually accomplish. I think most of the value of this question comes from the distinctions the top voted answers make between "temporarily navigate to", "delete local progress", "publish the reverse of the published changes"; all three of these are very common operations. The actual commands git uses to accomplish these aren't important, and this question should be popular in any distributed version control system. It doesn't matter how much sense the names of your commands make, someone starting out won't know that these three things can even be accomplished.
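The three operations are easier to tell apart on a toy model of history. The Python sketch below is not how git is implemented; `Commit`, `Repo` and the method names are invented for illustration. The methods correspond only roughly to `git checkout <commit>`, `git reset --hard <commit>` and `git revert <commit>`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """Toy commit: a snapshot plus a pointer to its parent."""
    snapshot: str
    parent: Optional["Commit"] = None


class Repo:
    def __init__(self, tip):
        self.branch = tip  # what the branch name points at
        self.head = tip    # what the working copy currently shows

    def visit(self, commit):
        # "Temporarily navigate to": only HEAD moves, the branch is untouched.
        self.head = commit

    def discard_to(self, commit):
        # "Delete local progress": the branch itself moves back.
        self.branch = self.head = commit

    def undo_publicly(self, commit):
        # "Publish the reverse": history only grows. The new commit restores
        # the snapshot from before `commit`, a simplification that is only
        # correct when `commit` is the current branch tip.
        self.branch = self.head = Commit(commit.parent.snapshot, self.branch)
```

In all three cases the existing commits are never modified; only the pointers move, or a new commit is appended.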


To be fair, 'revert' is too vague a term, and the very first sentence of the chosen answer asks what the asker meant. I think the answer is quite clear and concise once the question is clear.


    man git
The problem is people are unwilling to read the documentation. I have little patience for them demanding I change my workflow to accommodate their sloth.

Fortunately, I don't have to worry, because the overlap of 'people who don't RTFM' and 'people who are capable of articulating how they want to change git' has so far failed to write a wrapper that's capable of manipulating git trees without frustrating everyone else on the same repo.

And of course they can't: version control[1] is not a trivial problem. So I see no reason for us to demand that someone knows how to do it without studying when we don't expect the same for other auxiliary parts of software development such as build systems or containerisation or documentation.

[1] As opposed to the backup system the link wants to use it as: asking better questions is another important step. There's little reason to checkout an older commit as a developer unless you want to change the history, in which case it's important you understand how that will interact with other users of the same branch. If you don't need it to be distributed, you already have diff or cp or rsync or a multitude of other tools to accomplish effective backups.


I am a big fan of git but honestly, if you can't recognize that there are unlikeable things about it you're suffering from some kind of stockholm syndrome. Just start with the fact that several of the most common actions / commands are named in ways that are either directly misleading or at very least severely conflict with standard use of common version control terms.

(one of my favorites, for example, is that `git checkout` causes silent data loss while every other git command will print out giant errors in that scenario)


You can't checkout if you have tracked changes. If you mean you lose untracked changes, then a) it's unsolvable in the general case unless we all start doing out-of-source builds, so don't have to worry about build artefacts and b) it's already solved by git-worktree, so if you haven't RTFM, adding new features won't matter anyway.


Using it, probably.


The trick to suffering the git user interface is using magit in emacs. Even if you don’t usually use emacs, it’s probably worth installing it and setting it up with a command to start it up straight into magit.

Otherwise I’m hoping for pijul to somehow gain popularity (and a bit of polish) and become mainstream. I guess a motto for it could be “the high quality user interface and semantics of darcs without the exponential time complexity”


There's a few good UIs for git if you don't like its command line; along the lines of magit, I've recently been using fugitive in Vim and it's terrific. For the Mac, there's the free and open source Gitup, and of course there's a host of commercial clients.

But, having said that, I made my peace with the git command line years ago, in part by learning to appreciate aliases:

    co = checkout
    ci = commit
    dt = difftool
    mt = mergetool
    amend = commit --amend
    pfwl = push --force-with-lease
(The first two are my personal hangovers from Subversion.) I also have a "gpsup" shell alias which expands to

    git push --set-upstream origin $(git_current_branch)
The latter is taken from Oh My Zsh -- which actually has dozens of git aliases, most of which I never used. (When I realized "most of which I never used" applied to all of Oh My Zsh for me, I stopped using it, but that's a different post.)

tl;dr: I used to have a serious hate-on for git's command line, but one of its underestimated powers is its tweakability.


You don't even need git aliases for this, I personally use bash aliases for 90% of git use cases. Thus I type gc instead of "git commit", gd instead of "git diff", ga instead of "git add", etc.


I especially like the intuitive order of command line arguments


some intuition you have...


Try gitlab.


Wow, that is a beautiful post, thank you for writing it out that way...it makes me pine for VCS in my job.

Can you or someone else reflect on my file system? I work for the government doing statistical analysis of healthcare data, and there is no VCS where I code, other than how you name the files and where you put them in folders and how you back them up manually.

I am facing a major data-branching event where I'm going from ~40 scripts (R, SQL, STATA) on one dataset, to then three overlapping but different datasets and having ~100 scripts. I just don't know if my brain and wits are up to the task of maintaining these 3 branches with 3 different languages, given all I have is folder/file names and my knowledge reservoir and memory...

I know this is a perfect use case for git, but I've never used it before and no one else in my department uses it. I don't know if I have the time and energy left at this job to implement a new system of VCS AND reproduce my code for 3 different-but-similar projects.

Burnout approaches...


Tell management that your current approach isn't going to work for much longer, and say you have some ideas that might improve the situation.

Get your department to pay for you and ~2 colleagues to go on a git training course for a few days. As well as teaching you how to use git, it'll give you some time with an expert to look at your problem, and give you some relaxation time helping the burnout, and with 3 of you on the course, you'll likely get buy-in for a new setup.

Beware that git isn't a silver bullet. While it solves a bunch of issues, it causes many new ones - especially when you have lots of people who aren't knowledgeable about version control using it. I wish git had better integration with 'regular files' - ie. so that Mary in the marketing department can update a readme file without having to learn a totally new way of working. I wish you could "mount" a git repo as a drive in Windows, and all changes would be auto-committed to a branch as soon as a file is saved, and that branch were auto-merged to master as long as tests pass. Then people without git knowledge can work as before.


> wish you could "mount" a git repo as a drive in Windows, and all changes would be auto-committed to a branch as soon as a file is saved, and that branch were auto-merged to master as long as tests pass. Then people without git knowledge can work as before.

Cool idea for a project



Cool find, didn't know about that.

Does it only check files passing tests? I read quickly and didn't see that


You can use Git on your own without anyone else being affected. It doesn't require a server to add benefit. Learn to work with it and then introduce your coworkers later.


I've done exactly this ~4 years ago when I briefly worked at a place that used Subversion, after an acquisition. I wanted to be able to dick around in my own branches, with proper diffing and tracking and all, without updating the server, which appeared to be impossible (more or less). There was a git-to-svn I could use but considering how easy it was to screw up other people's state in Subversion, it made me nervous. So I just worked in my own, local git then copied the files to SVN when ready to commit something worth sharing.


It’s possible you can’t install it in the computing environment


Sublime Merge (the GUI git client from the Sublime Text people) is available in a portable version, and so can be run as a .exe from the filesystem, or a mountable drive. Comes with its own git binary.

The GUI is stunningly beautiful and functional, and there are more than enough keyboard shortcuts to keep things snappy once you're in the flow. I used to live and die by the terminal, now I am in love with Sublime Merge.

I used the portable version for a job where I didn't have install rights to the corporate laptop, and it preserved my workflow and kept me sane during my dev work. The portable version can run a little slow, but it's a pretty good solution.


I'm in a similar situation and the entire git for windows setup (including git bash that works beautifully with things like Windows network drives!) can be used without ever needing admin privileges. So I not only have git but also vim and perl and the whole *nix kit I was so sorely missing.

Some truly locked down environments may not allow it but if the poster has other open source tools like R they can probably run .exe files.


git is actually pretty easy to drop into a terrible methodology without too much disruption.

git works by creating its own .git directory wherever you create a new git repository, but doesn't touch the files and directories outside of it until you tell it to.

So you can have a directory of old code and you just cd to it and run 'git init', and now you have a git repository in the same directory. It won't be managing any of the files yet, but it will technically be there.

Because git is just a bunch of extra data in a .git directory, and because git is also built as a distributed VCS, the "make a copy of a directory to back it up" methodology actually works pretty OK with git. Ideally you should be using 'git clone' to copy your directories and 'git pull' to keep them in sync, but if you just Control-C Control-V your source code directory, git will actually be just fine with that, and you can still later use git to sync the changes between those two directories.

I'm not going to put a full git tutorial into this post, about how you add files to the repository and make commits, but I just want to convey that while git has a justifiable reputation for sometimes devolving into arcane incantations -- it's actually low effort to get started and you only need to learn three or five commands to get like 95% of the value from it.

Once you learn those three or five commands, you'll find yourself running 'git init' in nearly every directory you make -- for your random scripts, for your free time coding projects, for your free time creative writing projects -- and you'll even find it easy to use on horrible "27 directory copies of the source code with 14 file renames" projects where none of your teammates use git; you can use git yourself in such cases without adding any real friction, and it still helps you even if your teammates just copy your code directory or send you copies of their code directories.

EDIT: One other note: git can also go away easily if you decide you don't like it. You don't need to run git commands to create, edit, copy or otherwise modify the files in your code base, like you do with some other source control systems, so you can just forget it is there if you are busy and don't want to worry about it, and then later go ahead and add or commit all of your work. If you really don't like it, you just stop running git commands and you're no longer using it: you don't need to 'export' or 'ungitify' your code base. So it's pretty low-risk in that way as well.


Other cool things about git being "just a directory full of files":

- you can put the git directory somewhere other than in your working directory, if you really want to. Or reference a bunch of .git directories in a series of commands without having to change your current directory. Sometimes this is handy (usually for automation or something like that).

- If you're nervous about some command you're about to run—something that might screw up your git tree—just copy the .git directory somewhere else first. You can copy it back to entirely restore your state before the command, no need to figure out how to reverse what you did (assuming it's even possible).


Wow, thank you for this, it is a gem of a comment. I truly want to implement this and I see a massive potential to improve what I do...but...

My brain is basically overloaded with stress and I'm headed for burnout...only 18 months into this position. I just can't handle the tech stack, the shitty office, the commute, the feelings of being the worst analyst and the worst researcher in every single room I'm in. It is totally wearing me down. Management said new employees can get work from home after 12 months, then at 18 months I asked, and they revoked their verbal agreement and said they'd reconsider their decision if I made an article and let someone else be first author on it (unethical).

Outside of my complaints...I'm just not a great worker. I just feel that the whole team and department would be better off without me, that I can not handle this tech stack and QoL and its frustrations...govt is a very very restrictive environment and I feel like a circle being jammed into the square hole. I can't implement most of what these comments stated because I can not install anything onto my computing environment...even Python, I have to go through red tape and request special access to use Python instead of R and STATA.

I'm sorry to vent but all of these shortcomings are seriously burning me out.


It's fine to vent; it's half of what the internet is for.

Since the internet is also for acting like you know what you are talking about and offering unsolicited advice, I'll also drop some here. Feel free to ignore it, and I hope your situation gets better, either at your current job or a new one.

I won't speak too much to your work skills, because I don't know you; but feeling like and worrying that you're terrible at your job is a pretty normal experience. You pretty much have to rely on whether other people think you are doing a good job because people in general are garbage at judging their own skill. It's pretty hard to tell the difference between "I think I'm doing poorly and am" and "I think I'm doing poorly and am actually doing fine", without a lot of feedback from people you trust (ideally, your coworkers).

If your coworkers think you're doing fine, well, you can't stop worrying about it, but you'll at least have some evidence against your feelings; if your coworkers think that you're under-performing, they might at least be able to offer some advice on how to do better.

The burnout advice I have to give is in three parts: first, focus on making some small, incremental progress every day; second, avoid the temptation to overwork; third, make sure to invest time in your life outside of work.

The first is both about positive thinking and also about developing good work habits. The second is because it doesn't usually work (you end up doing less with more time, which is even more depressing than feeling like you aren't getting enough done in 8 hours). The third is because you will feel better and be more resilient if your entire identity isn't invested in your job. It's easier both to avoid burnout and to recover from it when it does happen if your job is only one part of your life.


> I'm just not a great worker.

I sincerely doubt that. You sound like a conscientious employee in an environment not set up for the kind of work you were hired to do. You also sound like you want to leave your job - which can give you leverage. Not that you should threaten to quit, but that since you are so unhappy, you are willing to quit. That means you can start saying what kind of computing environment you need. Not want, but need.

Personally, I think that having source control is basic table-stakes when writing code as a part of a job.


I’m sorry to hear that. I’d recommend looking for a new job (if possible), the market is in your favour at the moment (edit: if you live in a big city in Europe or the USA).

Otherwise, another poster commented that a git training course paid for by the company could help (+ give you some relief from burning out).


Now you have hidden subfolders with .git in your source.

And remember git doesn't save directories that are empty.


Git being distributed means you can use it without any centralized "master"--your local repository contains the entire history.

And if git seems too difficult to start with, Subversion can also "host" a repository on the file system, in a directory separate from your working directory.


Agreed with this, SVN (short for subversion) is a good alternative.

I understood and was comfortable with SVN within a few minutes (using the TortoiseGit front-end, which I highly recommend).

I wrestled with git for months and at the end still feel I haven't subdued it properly. I can use it reliably but SVN is just so much friendlier.

So my suggestion is go with SVN + TortoiseGit. SVN is your butler. Git is a hydra that can do so much, once you've tied it down and cut off its thrashing heads.

It's not just me, our whole (small) company moved to it after git burnt too much of our time and mental resources.

Edit: after learning TortoiseGit, learn the SVN command line commands (it's easy), and learn ASAP how to make backups of your repository!


SVN is easy, Git is simple.

Getting started with SVN is very quick, but once you need to peek under the hood, you'll find out it's super complicated inside.

Git is just the other way around: the interface is a mess, but the internals are simple and beautiful. Once you understand four concepts (blobs, trees, commits, refs), the rest falls into place.

Recommended four page intro to git internals: https://www.chromium.org/developers/fast-intro-to-git-intern...


I'll check your link, thanks.

Could you explain what you mean by svn being super complicated inside? I presume you mean from a user's not a programmer's perspective; I never found it confusing, ever.

It has its flaws (tags are writable, unless that's been cured) but it's really pretty good, and far better than git for a beginner IMO.


I meant that SVN's internal concepts and workings are not simple. It's easy to use in the beginning, but it becomes difficult to even understand what's going on when you get into some kind of a tricky or unusual situation.

In Git, no matter how strange the situation, everything is still blobs, trees, commits, and refs. There are very few concepts used in Git, and they're simple and elegant.

SVN to Git is like WordPress to Jekyll - WordPress is easier to use than Jekyll, but Jekyll is simpler than WordPress.
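
For anyone who wants to see how little machinery those four concepts need, here is a toy in-memory model in Python. It is only a sketch: `ToyRepo` and its method names are made up for illustration, and the payload layout is simplified, so the IDs will not match what real git produces.

```python
import hashlib


class ToyRepo:
    """In-memory sketch of blobs, trees, commits and refs (NOT git's real format)."""

    def __init__(self):
        self.objects = {}  # object id -> (kind, payload bytes)
        self.refs = {}     # ref name -> commit id

    def _put(self, kind, payload):
        # Content addressing: the id is derived from the kind and the bytes.
        oid = hashlib.sha1(kind.encode() + b"\0" + payload).hexdigest()
        self.objects[oid] = (kind, payload)
        return oid

    def blob(self, data):
        return self._put("blob", data)

    def tree(self, entries):
        # entries maps a name to the id of a blob (or of another tree).
        lines = [f"{name} {oid}" for name, oid in sorted(entries.items())]
        return self._put("tree", "\n".join(lines).encode())

    def commit(self, tree_id, parent, message):
        lines = [f"tree {tree_id}"]
        if parent:
            lines.append(f"parent {parent}")
        lines += ["", message]
        return self._put("commit", "\n".join(lines).encode())


repo = ToyRepo()
readme = repo.blob(b"hello\n")
t1 = repo.tree({"README": readme})
c1 = repo.commit(t1, None, "first")
repo.refs["master"] = c1

script = repo.blob(b"print('hi')\n")
t2 = repo.tree({"README": readme, "run.py": script})
c2 = repo.commit(t2, c1, "add script")
repo.refs["master"] = c2  # "moving a branch" is just rewriting one pointer
```

Note that the second commit reuses the unchanged README blob; only the new blob, tree and commit objects are added, and moving the ref is a single dictionary update.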


I'm afraid I've still no idea what you mean. I've had plenty of confusion with git, and none that I can ever recall with SVN.

SVN's concepts are straightforward - commit stuff, branch, branches are COW so efficient, history is immutable unlike git (for better or worse) erm, other stuff. Never got confusing.


I'm not sure what you mean by SVN being super complicated? I've been using and administering it for years and it's just as straightforward as git (if a little easier because centralization is simpler to grok than decentralization).


Did you mean TortoiseSVN? TortoiseGit is a frontend for git, as the name implies, AFAIK it doesn't work with SVN at all.


(cringes)

Yes, I did. Thanks.


I faced a moment like this, where I realised I needed git to survive a big set of changes. (Though I was on SVN before, which was better than nothing but a far cry from git).

However, branching may not be the ideal solution given how you describe your issue. With git branches, we typically don't want to run something then switch branches then run something else. I would say branches are primarily for organizing sets of changes over time.

If you have multiple datasets with similarities, what you may need more than git is refactoring and design patterns. To handle the common data in a common way, and then cleanly organize the differences.

That said I would still definitely want all scripts in git. It is not that hard to learn, lean on someone you know or email me if you need to.


git for ML projects with data: https://dvc.org

In particular, dvc carefully handles large binary files.


Totally agree with this point, so much that I did a presentation on "Git Data Structure Design" two months ago for Papers We Love San Diego that hammers it home over the course of 50 minutes.

https://www.youtube.com/watch?v=fHSZz_Mx-Uo

To become a Git power user, it is far more beneficial to learn its underlying content-addressable-store data structure rather than explore the bazillion options in its command line interface. It is surprisingly easy to create a repository manually and then to start adding "blob" files to the store!
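
As a rough illustration of the content-addressable idea, the sketch below computes the ID git gives a blob in a SHA-1 repository: the hash of a small `blob <size>\0` header followed by the file bytes. The function names `git_blob_id` and `loose_object_path` are my own, and the sketch only covers blobs in the default SHA-1 object format, not trees, commits, or packfiles.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID a SHA-1 git repository assigns to a blob."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(oid: str) -> str:
    """Path of a loose object relative to the .git directory."""
    return f"objects/{oid[:2]}/{oid[2:]}"


oid = git_blob_id(b"hello world\n")
print(oid)                     # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(loose_object_path(oid))  # objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
```

The same bytes always map to the same ID, which is why identical files are stored only once no matter how many commits reference them.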


FYI this is called a Merkle tree. It's also the data structure that blockchains and a couple of other protocols use. They’re wonderful and surprisingly easy to implement.

https://en.wikipedia.org/wiki/Merkle_tree
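
A minimal sketch of the idea in Python, assuming SHA-256 and a simple rule for odd levels (the unpaired node is carried up unchanged). The function name `merkle_root` and the leaf/node prefixes are choices made for this example; git itself is closer to a Merkle DAG, where each tree object hashes the IDs of its children, than to this binary tree.

```python
import hashlib
from typing import Sequence


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: Sequence[bytes]) -> str:
    """Hash leaves pairwise, level by level, until one root hash remains."""
    if not leaves:
        raise ValueError("need at least one leaf")
    # Distinct prefixes keep leaf hashes and interior hashes from colliding.
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        nxt = [
            _h(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            nxt.append(level[-1])  # odd node out is promoted as-is
        level = nxt
    return level[0].hex()


root_a = merkle_root([b"a", b"b", b"c"])
root_b = merkle_root([b"a", b"b", b"x"])
print(root_a != root_b)  # True: changing any leaf changes the root
```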


Wow, 1979. And patented.


Most ideas in CS are old as dirt. There was a flurry of theory advancements mid-century and a lot of work since then has been putting those ideas into practice.


Even back in Renaissance days with the most advanced mathematics there was unknown prior art


I have a theory this is a consequence of Euclid not being taught anymore.


..so right about the time computers were invented, people figured out how to use them and did some basic research.


Patents from 1979 have expired a long time ago.


Git is great but all the major drawbacks stem from this design as well? Immutability and a full copy cause public rebase headaches, extremely large repos (before LFS) and the lack of partial checkouts to name a few.

Doesn't it show more of the drawbacks of this functional data structure?


I think rebases are causing so much pain that it is better to have tool support such that they are not needed. E.g. it was a revelation to discover that github doesn't care in PRs - just do a merge with master, done.

As for the working directory - yes, there could be more management around that. I'm not sure why the git community went for nested repos / submodules rather than partial checkouts. It's a different question than the data structure of the repo itself, though. Compared to other VCS it still seems miles ahead.

Large repos: It seems one could alleviate that by limiting the pull history (and LFS if needed), right?


It seems like git can also be used as an immutable database as well. Does anyone here have experience / comments on using git as a database backend?

EDIT: I found a couple of interesting references for folks who may be curious about this as well. I especially like [2] for its diagrams.

[1] https://stackoverflow.com/questions/20151158/using-git-repos...

[2] https://www.kenneth-truyers.net/2016/10/13/git-nosql-databas...


It seems you would want a hosted git installation like github / gitlab or VSO for that. The main concern to me is that API keys are usually too limiting (or too expensive). E.g. if you don't want to manage local state by having a working directory you need to do many API calls to make a change (create the objects, create the trees, then create the commit, then change the ref) so that it is almost more worthwhile to use the same data structure on a more generic backend like a NoSQL key-value store. I haven't done this though (though it's been on my wish list for long).

What it also doesn't give you for free is sensible search indexes. I do think though that combining it with a search index could be very powerful.


It is a beautifully simple data structure. I just wish its UI was that simple.


If I had known that from the start, I would have found git much easier to learn!


But git also has (and most use cases demand) the concept of tree deletion.


eh, it's just copy-on-write - an ancient OS technique. You can call it functional if you like but the idea is old and has been applied very generally for decades.


database version of this is datomic (closed source). Are there any others?

browser version of this is datascript (oss).


I think it's kind of weird that they left out any mention of BitKeeper in this article.

The whole impetus for git (someone correct me if I'm wrong):

1. Linux source was hosted on BitKeeper before git. It basically was one of the first distributed source control systems (or the first? not sure if anything predated it).

2. Linux devs got into conflict with the BitKeeper owner over open-sourcing and reverse engineering, so Linus realized they needed a new system because no other source control system had the features they needed like BitKeeper had (mainly, I understand, the distributed repos).

So basically, Linux is to Unix like git is to BitKeeper (roughly).


This gets at the underlying truth. Once you have been successful in one situation, it becomes far easier to extend that success into other closely related areas. Linus is a celebrity in the open source world; switching his project to git effectively forced many of the core infrastructure maintainers/developers to learn/use the tool. It's the equivalent of putting a Blu-ray drive in the PlayStation.


Mercurial was also spawned from the BitKeeper drama:

https://lkml.org/lkml/2005/4/20/45


And no-one's yet mentioned Monotone, Graydon's project a bit before Rust. My English Literature teacher was right, Nothing Is Written In A Vacuum.


Right, there was a golden age of interesting DVCSs around then - Darcs (2003), Monotone (2003), Bazaar (2005), Git (2005), Mercurial (2005), Fossil (2006).

I use Arch (2006) btw.


It shows how important having a big name behind a project is. Linux gave Git its momentum.


It's not just the big name -- Linus learned from the mistakes of the older DVCSs. I also used to use Arch a bit. Git was a vast improvement (eg. it didn't force you to use long ugly branch names).


If memory serves, DVCS were slow until Git came along


I'd not heard of it before your post, interesting!

https://en.wikipedia.org/wiki/Monotone_(software)


Monotone should have been what we're all using :(

It has a much cleaner interface, easier to use, better formalism, code signing is built-in and required rather than tacked-on and optional, and insanely good documentation (I still sometimes recommend people to read the monotone docs as an introduction to git/DVCS).

It was also basically finished and stable when the BitKeeper drama happened. It was one of the few alternatives Linus looked at and publicly evaluated before writing git. But unfortunately, its performance was pretty poor for a project the size of Linux at the time, and a combination of human failings + the well architected abstractions which hid the underlying monotone data structures convinced Linus that it was less effort to write git than to fix monotone.

In a literal sense that was true since git was basically written in a weekend.. but more than a decade on we're still basically stuck with all the design short-cuts and short-falls of a weekend project. The world would be a better place if we used monotone instead :(

Edit: these days I'm more a fan of the formalism of darcs, and really looking forward to a stable pijul. But there is a soft spot in my heart for monotone, and more than once I've considered forking it to modernize with compact elliptic curve crypto instead of RSA and a faster non-relational database or filesystem backend instead of SQLite.


Git, Mercurial, and Bazaar all came out within a month of each other, so I'm guessing Bazaar did as well.


That's so bizarre! I remember hearing about mercurial & bazaar in 2005, was still using svn at work in 2006/2007. I don't think I knew about git until Ruby on Rails switched to it in 2008. Interestingly the switch happened at the same time Github launched.


> 1. Linux source was hosted on BitKeeper before git. It basically was one of the first distributed source control systems (or the first? not sure if anything predated it).

There were several systems which predated it. The most arcane one I know of is Sun's TeamWare[1] which was released in 1993 (BitKeeper was released in 2000). It also was used to develop Solaris for a fairly long time.

Larry McVoy (the author of BitKeeper) worked on TeamWare back when he worked at Sun.

> 2. Linux devs got into conflict with the BitKeeper owner over open-sourcing and reverse engineering, so Linus realized they needed a new system because no other source control system had the features they needed like BitKeeper had (mainly, I understand, the distributed repos).

That is the crux of it. Linus does go over the BitKeeper history in the talk he gave at Google a decade ago on Git[2].

[1]: https://en.m.wikipedia.org/wiki/Sun_WorkShop_TeamWare [2]: https://www.youtube.com/watch?v=4XpnKHJAok8


Wasn't the final straw when Bitkeeper revoked usage for open source projects?


That appears to be accurate. BitKeeper froze out kernel developers in April 2005[0] and Git was started the same month[1]:

> Git development began in April 2005, after many developers of the Linux kernel gave up access to BitKeeper, a proprietary source-control management (SCM) system that they had formerly used to maintain the project. The copyright holder of BitKeeper, Larry McVoy, had withdrawn free use of the product after claiming that Andrew Tridgell had created SourcePuller by reverse engineering the BitKeeper protocols.

But as the BitKeeper article makes clear, the relationship between BitKeeper and OSS had been frosty for a while.

[0] https://en.wikipedia.org/wiki/BitKeeper

[1] https://en.wikipedia.org/wiki/Git#History


And in an interesting twist, it seems that BitKeeper is now Open Source anyway (or at least I see "Now available as Open Source under the Apache 2.0 License" on their webpage) meaning that this could all have been avoided


Yes. They even did a Show HN in 2016 when it happened: https://news.ycombinator.com/item?id=11667494

Larry McVoy comments here occasionally as https://news.ycombinator.com/user?id=luckydude


Going by the posts in the forums[1], it's more that BitKeeper has effectively ceased operations in terms of selling its system.

People have noted multiple times that its current build system is fairly hostile to packaging on *NIX, but nobody seems to be putting in the work and the time. I'm somewhat curious about BitKeeper, but not enough to make my first goal to go down into its guts (complete with a custom stdio from NetBSD with modifications!) to make it play nicely with my system.

[1] https://users.bitkeeper.org/t/thoughts-after-a-few-days-with...


Larry McVoy was known for threatening employees of companies using BK and developers of competing VCSs [1], so I understand why so many people wanted to keep away from it.

[1] https://lwn.net/Articles/153990/


One of the Samba people connected to a Bitkeeper server using telnet and typed ‘help’. The server helpfully returned the commands it would accept. This rather infuriated Larry who considered this reverse engineering.


I've never used BitKeeper. How much of Git's design is "inspired" by BitKeeper/how similar are they? Particularly at a file format/low level.


If anything they are as different as possible. And part of that was to avoid legal problems with BK.


BitKeeper was created in large parts to fit Linux and Linus came with a lot of feedback to McVoy. Perhaps even took part in the original design? Someone who can remember can surely fill in. I never really used it, but I expect there to be some shared ideas for those reasons.


BitKeeper was distributed sccs on steroids, but kept true to its sccs heritage -- as tridge had demonstrated when he "reverse engineered" BK by telneting to the server and typing "HELP" (and following examples from it).

SCCS is a single-file revision management system, which inspired RCS (but they are not compatible IIRC; one keeps changes forward and the other backwards). CVS was a management system on top of RCS to provide "repository-wide" actions and branches. SVN was supposed to be "CVS done right".

git is a blockchain/DAG content addressable storage, inspired by and borrowing a lot from monotone (a system created by Graydon Hoare who later went on to create Rust; Linus credits monotone and Graydon for inspiring the design of git, which is essentially monotone but much simplified).

McVoy and Linus did, in many ways, collaborate on making BitKeeper more comfortable for Linus and Linux development; but the design and implementation of BitKeeper predates that and goes back to SCCS through TeamWare, and maybe even farther.


SCCS uses a “weave” representation in which all revisions of a file are interleaved. Checking out any revision takes the same time; diffing is fast. RCS (and CVS) represents files as diffs backwards from the latest revision, and if there are branches, diffs forwards from each branch point.
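
To make the weave idea concrete, here is a heavily simplified sketch in Python. It assumes a linear history and tags each line with the revision that added it and the revision that deleted it (if any); the real SCCS file format uses control lines and handles branches, which this does not attempt. `WeaveLine` and `checkout` are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class WeaveLine(NamedTuple):
    text: str
    added: int               # revision that inserted this line
    deleted: Optional[int]   # revision that removed it, or None if still live


def checkout(weave: List[WeaveLine], rev: int) -> List[str]:
    """One pass over the weave yields any revision, old or new."""
    return [
        line.text
        for line in weave
        if line.added <= rev and (line.deleted is None or line.deleted > rev)
    ]


# Three revisions of one file, interleaved in a single list.
weave = [
    WeaveLine("alpha", 1, None),
    WeaveLine("beta", 1, 3),       # replaced in revision 3
    WeaveLine("beta v2", 3, None),
    WeaveLine("gamma", 2, None),   # appended in revision 2
]

print(checkout(weave, 1))  # ['alpha', 'beta']
print(checkout(weave, 3))  # ['alpha', 'beta v2', 'gamma']
```

Every revision is produced by one pass over the same interleaved list, which is why checkout cost does not depend on how old the revision is.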


He’s probably responsible for more wealth creation than anybody in the last hundred years just from those two projects.


> more wealth creation

Back in the 90’s, my sister asked what “Linux” was and I explained it to her as a free replacement for Windows (I know, I know, but that was the right description for her). She asked why it was free while Windows was expensive enough to make Bill Gates the richest man in the world. I told her that the guy who wrote it gave it away for free. She said, “wow, I’ll bet that guy feels really stupid now.”


If he had not given it away for free, today, nobody would even know what Linux is. It's all about network effects. He was severely disadvantaged because of a late start and no industry connections. There were plenty of competing commercial operating systems at that time. In fact, there were plenty of competing free operating systems too (e.g. MINIX).

Gaining user attention at a global scale is always extremely competitive, even if you give it all away for free.

Some people like Bill Gates got extremely lucky thanks to excellent social connections but others like Linus who were not so lucky had to go to extreme lengths to break through all the social and economic obstacles imposed on them by the incumbents.


Arguably the biggest impact on the early success of Microsoft was the rampant cloning of IBM PCs by third party manufacturers like Compaq, which generated competition that drove down prices and expanded the market for PCs. In that ecosystem, the cloned IBM spec was the "free" part that enabled the network effects.

Gates was smart enough, though, to negotiate a non-exclusive license with IBM.


> In fact, there were plenty of competing free operating systems too (e.g. MINIX).

I don't think there were many (any?) free UNIX or UNIX-like OSs at that time. MINIX wasn't, although Tanenbaum wanted it accessible to as many students as possible so the licence fee was relatively cheap compared to other UNIXes of the time. 386BSD (the precursor to FreeBSD and NetBSD) wasn't released until around a year after Linux, although they started quite a bit before.

I guess there were lots of free OSs in the hobbyist sense but nothing that actually competed with MINIX.


To be honest, most commercial Unix offerings at that time were really bad and/or really expensive.

Furthermore, his version worked on modern hardware.


If he had not given it away for free, nobody today would know who Linus Torvalds is, most likely.


I really don’t think you could conflate Gates’ success with luck...


- Born in the US and lived in Seattle; a major tech hub at the time.

- Father was a wealthy attorney.

- Mother worked for IBM.

- His first OS demo for IBM worked the first time even though they had only tested it on an emulator before. This is extremely unusual; there are a lot of factors which can make the emulator behave differently from the real thing.

- IBM did not see the value in software and did not ask for exclusivity (they could easily have demanded it).

Sure he is a very smart guy, but mostly he is a ridiculously lucky guy.


Hah. Torvalds is a millionaire and, since he is Finnish, he probably feels at least as rich as Bill Gates.


He actually did okay financially because both RedHat and VALinux gifted him shares during the dot-com bubble. Not "richest man in the world" rich, but "enough to be financially comfortable and not have to work" rich.


I think the bragging rights of having at least half the internet running on top of his software would count for quite a bit too. There are very few people on this planet who can boast that level of influence.


Not to mention something like 90% of handsets, tablets, TV set-top boxes, ebook readers, etc. Almost everything that isn't iOS is based on Linux (Android, Tizen, KaOS, TiVo, Roku, ChromeOS, Amazon's Fire stuff, etc.).


I've had almost the exact same experience with some of my friends, when they asked about Linux.


I've always thought that open source developers devalue their work by giving it away for free.


Isn't Linus making a low 8-figure income from a job that's much more relaxing than being the CEO of a multibillion-dollar worldwide company?


Linux in part won because the Regents were getting sued at a critical time. Without Linux we would run BSD and it would be fine.

Git is good in some ways, terrible in others. I've used it for years and still don't feel really comfortable with it, but I've never had something as multiheaded as the Linux repo.


Not really. Linux won because the GPL has proven to be more suitable to businesses, and especially commercial open source business models as practiced by companies such as Cygnus and Red Hat.

BSD-style licenses tend to be a better fit for business models that sell proprietary extensions to free software. These form lock-in moats that inhibit the growth of any deeper ecosystems. We've seen this over and over again with things such as graphical subsystems for non-free UNIX, but also more recent examples with firewalls and storage boxes. Those are great for what they do, but work on the free parts is seen more as a gift to the community than as a money maker.

The tit-for-tat model of the GPL enables those ecosystems to form. By forcing your competitors to free their code in exchange for yours, game theory dictates that moats cannot form, and when everyone stands on each other's shoulders development is faster.

I'd say that's pretty much experimentally proven by now. Of course, reality is not as black and white, especially when GPL-style companies contribute to BSD-licensed software and vice versa. Perhaps PostgreSQL is a prominent example of that. There are however traces of these patterns there too, for example in how the many proprietary clustering solutions for the longest time kept the community from focusing on a standard way.


> Linux won because the GPL has proven to be more suitable to businesses, and especially commercial open source business models as practiced by companies such as Cygnus and Red Hat.

That's an interesting take, but I'm not sure I understand.

Are you saying Linux would've lost (presumably to proprietary OSs?) if it used a permissive license?

If the GPL has been proven to be more suitable for business, why is the use of GNU GPL licenses declining in favor of permissive licenses?

I don't have a hog in this pen and so I'm not trying to provoke. I'd just like to hear thoughts on why it looks so different from where I sit.


Not the parent, but here's my thoughts:

The size and quantity of companies working on things related to a project determines whether a strong copyleft license or a permissive license makes the project more successful.

Say you want to make a business around a FOSS project. Which license should you choose for that project?

If your business starts gaining traction, people may realize it's a good business opportunity, and create companies that compete against you.

I'll simplify to two licenses, GPL and MIT. Then there's two options, based on which one you chose originally:

1) If you chose the GPL, then you can be sure that no competitor will get to use your code without allowing you to use theirs too. You can think of this as protection, ensuring no other company can make a product that's better than yours without starting from scratch. Because everyone is forced to publish their changes, your product will get better the more competition you have. However, your competitors will always be just a little behind you because you can't legally deny them access to the code.

2) OTOH if you chose MIT, a competitor can just take your project, make a proprietary improved version of it and drive you out of the market. The upside is if you get to be big enough, you can do exactly that to _your_ competitors.

You can see that when you are a small company the benefits of GPL outweigh the cons, but for big ones it's more convenient to use MIT or other permissive licenses. In fact, I think the answer to your question "why is the use of GNU GPL licenses declining?" is because tech companies tend to be bigger than before.

Now say you want to make a business around some already existing software. And say there's two alternative versions of that software, one under the GPL and one under the MIT license (for example, Linux and BSD). Which one should you base your business on? And contribute to? Well, it's the same logic as before.


In the context of Linux, think of the GPL as a joint development agreement, with teeth.

I used to follow LLVM development. There was lots of mailing list traffic of the form "I'll send you guys a patch as soon as management approves it..." followed by crickets.

Basically, RMS was exactly correct about the impact that loadable modules would have on GCC's development.


> Basically, RMS was exactly correct...

The last thirty years in a nutshell.


> If the GPL has been proven to be more suitable for business, why is the use of GNU GPL licenses declining in favor of permissive licenses?

It isn't, at least not exactly. It's declining in favor of a combined permissive/commercial license model. And it's only doing that for products that are meant to be software components.

The typical model there is that you use a permissive license for your core product as a way of getting a foot in the door. Apache 2.0 is permissive enough that most businesses aren't going to be afraid that integrating your component poses any real strategic risk. GPL, on the other hand, is more worrisome - even if you're currently a SAAS product, a critical dependency on GPLv2 components could become problematic if you ever want to ship an on-prem product, and might also become a sticking point if you're trying to sell the company.

But it's really just a foot in the door. The free bits are typically enough to keep people happy just long enough to take a proper dependency on your product, but not sufficient to cover someone's long-term needs. Maybe it's not up to snuff on compliance. Or the physical management of the system is kind of a hassle. Something like that. That stuff, you supply as commercial components.


I don't agree; I think the license doesn't matter at all for users. And there really aren't that many companies distributing Linux, such that they would need to comply with the GPL. Google can make all the patches they want for their servers. The only reason for them to contribute them back is to offload the maintenance cost.

As with most things, I think Linux succeeded because it was a worse-is-better clone of existing systems that happened to get lucky.


It's a shame how poor the command-line usability of the git client is. Commands are poorly named and often do the wrong thing. It's really hard to explain to new users why you need to run `git checkout HEAD *` instead of `git reset` as you'd expect, why `git branch [branchname]` just switches to a branch whereas `git checkout -b [branchname]` actually creates it, etc.

I really wish he'd collaborated with some more people in the early stages of writing git to come up with an interface that makes sense, because everyone is constantly paying the cost of those decisions, especially new git learners.


I don't even know what `git checkout HEAD * ` does lol. Does it just checkout the current branch to the HEAD of that branch?

I can never seem to guess what things do in Git, and I consider myself fairly comfortable with the core concepts of Git. Having written many types of append-only / immutable / content-addressed data systems (they interest me), you'd think Git would be natural to me. While the core concepts are natural, Git's presentation of them is not... at least, to me.

edit: formatting with the * .


git checkout <reference> [<files>]

So, that says: copy all of the files out of the current branch at the current commit into the local dir. What this will do in practice is "discard current changes to tracked files". So if I had files foo, bar, baz, and I had made edits to two of them, and I just want to undo those changes, that's what checking out * from HEAD does. It doesn't, however, delete new files you have created. So it doesn't make the state exactly the same.

So why not just git checkout HEAD? Well, you already have HEAD checked out (you are on that branch), so there's nothing for git to do. You want to specify that you _also_ want to explicitly copy the tracked file objects out. It's kind of like saying "give me a fresh copy of the tracked files".

The confusing thing is that in practice it is "reverting" the changes that were made to the tracked files. But `git revert` is the command you use to apply the inverse of a commit (undo a commit). One of the more confusing aspects of git is that many of the commands operate on "commits" as objects (the change set itself), and some other commands operate on files. But it's not obvious which is which.
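
For anyone who wants to check this locally, here is a minimal sketch in a throwaway repo. The file names (`foo`, `newfile`) and the demo identity are made up for illustration; the point is only that checking out a path from HEAD restores the tracked file and leaves untracked files alone.

```shell
# Sketch only: runs in a temporary directory, not in a real project.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # hypothetical identity for the demo
git config user.name "Demo"

echo "original" > foo
git add foo
git commit -q -m "add foo"

echo "edited" > foo        # modify a tracked file
echo "scratch" > newfile   # create a new, untracked file

git checkout HEAD -- foo   # copy foo out of HEAD, overwriting the edit

foo_content=$(cat foo)               # content of the tracked file after checkout
leftover=$(git status --porcelain)   # what git still considers changed
echo "$foo_content"
echo "$leftover"
```

The tracked file is back to its committed content, while `git status` still lists `newfile` as untracked, which matches the "doesn't delete new files" caveat above.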


That command would only discard changes to non-hidden files though, because * typically doesn't expand to hidden files. I think the command one really wants in these cases is

  git reset --hard


That throws away everything though, whereas `git checkout HEAD *` only throws away stuff in the current directory and below, or you can pass exact filepaths to be surgical about which changes exactly you're reverting. This is what I use it for most often -- reverting some, but not all, edits.


Gonna risk getting my head put on a stake, but why not just use a GUI git client at that point like TortoiseGit/GitKraken/SourceTree?


It's been a long time since I used a GUI source control client. Maybe I should try one out again. Certainly it makes diffs nicer.

It's just that I've been using git CLI for so long, and know exactly which commands to use in any circumstance without having to look them up, that I don't benefit much from switching to something new, whereas someone who hasn't yet put in that time to really learn git would stand to benefit more.


A small correction...

`git branch [branchname]` creates a branch without switching to it.

`git checkout -b [branchname]` creates a branch and checks it out.

And `git reset --hard` will also discard changes. (Arguably, this is better than `git reset` discarding local changes, as it is more explicit.)
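
A quick way to confirm all three statements, as a sketch in a scratch repo (the branch names `topic-a`/`topic-b` and the file `f` are invented for the demo):

```shell
# Sketch only: runs in a temporary directory.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # pin the initial branch name for the demo
git config user.email "demo@example.com"   # hypothetical identity
git config user.name "Demo"
echo v1 > f
git add f
git commit -q -m "initial"

git branch topic-a                                  # creates topic-a, stays where we are
after_branch=$(git rev-parse --abbrev-ref HEAD)

git checkout -q -b topic-b                          # creates topic-b and switches to it
after_checkout=$(git rev-parse --abbrev-ref HEAD)

echo v2 > f
git reset -q               # plain reset: only unstages, working tree is untouched
kept=$(cat f)
git reset -q --hard        # --hard: working tree is overwritten from HEAD
discarded=$(cat f)

echo "$after_branch $after_checkout $kept $discarded"
```

After `git branch` the current branch is unchanged; after `git checkout -b` it is the new branch; the edit survives a plain `git reset` and is gone after `git reset --hard`.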


Your comment only makes the GP's point stronger.


I don't think so; I'm not much of a developer but I've used git productively for years and I've never had to run the commands CydeWeys listed.

The commands johnmaguire2013 listed are the ones usually recommended for beginners and I have found them easy to understand.

"git branch [name]" is for creating branches; it tells you if the branch already exists. Pretty easy to understand.

"git checkout [name]" is for checking out branches; it tells you if you're already on that branch.

You can run these sequentially and it works fine; there's no need for `git checkout -b [branchname]`.

I think there is sometimes some productivity porn involved in discussions of git, where people feel really strongly that everything should be doable in one line, and also be super intuitive. It's a bit like the difference between `mkdir foo && cd "$_"` on the command line, vs just doing mkdir and cd sequentially. IMO the latter is easier to understand, but some experienced folks seem to get upset that it requires typing the directory name twice.


Mm, I'm not sure I agree. The first correction shows that the command works as GP would have expected. The second command shows why checkout works too -- because it is checking out the branch (like the GP expected) in addition to creating a branch.

And I have already explained why `git reset --hard` makes more sense in my opinion.

I agree that Git can be hard to wrap your head around, and that the commands could be more intuitive. But Git is complex in large part because the underlying data structure can be tricky to reason about -- not because the UI on top of it is terrible.


There may be complexity under the hood (I don't know), but in my experience, even pretty advanced users employ a pretty straightforward mental model that's considerably simpler than the commands themselves are to use.


I disagree. I think the commands are intuitive and work perfectly for the system. It's very expressive, but it all makes sense once you've learned it.

git is a tool. Different tools take different amounts of time to master. People should probably spend some time formally learning git just as one would formally learn a programming language.


Other version control systems don't have this problem as much.


And where are they now? If you're not using git as source control at this point, I wouldn't even consider downloading your project.



Git succeeded in spite of its CLI, not because of it.


It's a good point: just because an entire product won out doesn't mean that every single one of its features was individually superior to its competitors. This is definitely not true for git.


That's roughly the same as saying that you won't eat the food prepared by a chef if they don't use your favorite brand of knife. You've diminished the value of your previous comment substantially with this one.


The git storage structure is not that difficult. People could implement their own compatible clients on top of it with a quite different UI. That there is nothing that overtook git as UI in popularity seems to be an indication that the interface is not as bad as many people claim.


People have added better UIs on top of git. The problem is they don't come installed by default out of the box, and unless everyone else you work with is using them too it becomes quite hard for you to communicate properly over git issues (especially in writing development workflow docs). hg has a better UI out of the box, and is notably easier for new users to pick up and become productive with.

You're underestimating how much inertia is created simply by being the out-of-the-box default, and how hard that inertia is to overcome even by better alternatives.


Not sure who "many people" are, but no one in this thread is claiming that git is unworkable, only that it is confusing. A collaboration tool like git is highly subject to network effects. The usability delta between git and a given alternative must be very, very high before people will leave git for the alternative. Ergo, git can be both awful and "good enough" to have a majority market share (although I don't think anyone is even saying git is awful in this thread).


GoT (gameoftrees.org) did just that.

They built another tool, similar but not the same, using the same storage format underneath.


I use magit and it is a lifesaver.


I can still never remember the git equivalents for things like ‘hg log -r default:feature-branch’ or ‘hg diff -c <commit>’. People who haven’t used Mercurial really have no idea how pleasant a VCS interface can be.


The latter is 'git show <revision>', show being one of those fun commands that does a bazillion different things depending on what the argument is. My fun example of what's-the-git-command is 'hg cat -r <commit> <path>', whose git equivalent is another variant of 'git show'.


Complain about Git, but there are commercial alternatives (I'm thinking of Perforce) that make even less sense.


>but there are commercial alternatives (I'm thinking of Perforce) that make even less sense.

Have you ever worked with Rational ClearCase? It's a true horror show.


ClearCase is wonderful if you fit the use case, which is a sizable team on a fast LAN.

It's great being able to change a ClearCase config file to choose a different branch of code for just a few files or directories, then instantly get that branch active for just those specific files.


I have, on several occasions in the past (up to 2007), and while it is a monster, I still felt more productive with it than I do using git nowadays.


Remember, back then companies had to hire dedicated ClearCase engineers to get things working properly.


You mean just like we have to reach out to IT to sort out git issues for anyone that strays outside the path?


Not really a common occurrence here.

Does it happen often at your workplace? What kind of issues are we talking about?


Basically the usual ones that end up with copying the modified files to a temporary directory and doing a fresh clone followed by a manual merge and commit, because someone messed up their local repository while trying out some git command beyond clone/pull/push/checkout/commit, and now cannot push without messing up everyone else's work.


Interesting; that never, ever, ever happens where I work, and most of our engineers are fresh-outs.


How often does this happen? Doing a fresh clone should be a last resort.


A couple of times per month, not everyone is a git black belt.


And there are alternatives that make better sense, too.

But the existence of something even worse doesn't excuse something that is merely bad. And git is so much more widely used that its total overall harm on developer productivity is worse.


You really only need to know a half dozen commands for basic productivity.

I started in the late 90's when cvs was popular. Then we moved to svn. You had productivity issues of all sorts, mainly with branching and merging.


Thing is, most of these can simply be fixed using an alias. It's not that hard to remember that '-b' creates a new branch, but 'gcb' followed by the branch name I want is probably easier. Maybe someone else wants 'gcob' or just 'cob'; that's what aliases are for.


I don’t think those examples were meant to be an exhaustive list of git’s UI warts. There are many that are harder to remember, and creating aliases and functions for each of them requires building a full alternative UI. For example, how do you see all of the changes in a branch (IIRC the equivalent of ‘hg diff -b branch-name’)? How do you see the changes for just one commit (I.e., ‘hg diff -c $commit’). These things are all feasible in git, but I can never remember the incantation, so I have to Google every time. I haven’t used hg in 5 years and I still have an easier time remembering those commands.


> How do you see the changes for just one commit (I.e., ‘hg diff -c $commit’).

git show <commitish>

will show the log message and the diff from the parent commit tree.


The changes for just one commit are `git diff $commit`, while the changes for a branch are `git diff $branch`.

While there are a metric ton of things which are confusing about git, this was perhaps not the greatest example.


`git diff <commit>` is the set of changes since that commit, not the changes of that commit (the distinction between hg diff -r and hg diff -c). Similarly, `git diff <branch>` is the diff between that branch and the current HEAD, not the diff of the branch itself.

So perhaps it's a great example if you've gotten it wrong?
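
To make the distinction concrete, here is a small sketch with three commits that each add one file (the file names `a`, `b`, `c` are invented). `git diff <commit>` reports what differs between that commit and the current working tree, while diffing the commit against its own parent (the patch that `git show <commit>` prints along with the log message) reports what the commit itself changed.

```shell
# Sketch only: runs in a temporary directory.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # hypothetical identity
git config user.name "Demo"

echo one > a;   git add a; git commit -q -m "c1: add a"
echo two > b;   git add b; git commit -q -m "c2: add b"
c2=$(git rev-parse HEAD)
echo three > c; git add c; git commit -q -m "c3: add c"

since_c2=$(git diff --name-only "$c2")           # c2 vs. the working tree
in_c2=$(git diff --name-only "$c2^" "$c2")       # c2 vs. its parent: the commit itself

echo "since c2: $since_c2"
echo "in c2:    $in_c2"
```

The first variable names the file added after `c2`, the second names the file added by `c2`.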


/u/samatman is definitely mistaken, and I think you are correct (although I'm not sure about "the set of changes since that commit"). As far as I can tell, `git diff <commit>` and `git diff <branch>` both just diff your current workspace against the commit/branch respectively.

In the case of the branch, the correct result is something like "what has changed since my branch diverged from its parent"--basically what you see in a PR. I think this is unnecessarily obscure in Git because a "branch" isn't really a branch in the git data model; rather it's something like "the tip of the branch".

I don't think I've ever wanted to compare my workspace against a branch, but clearly diffing the branch is useful (as evidenced by PRs). Similarly, I'm much less inclined to diff my workspace against a particular commit, but I often want to see the contents of an individual commit (another common operation in the Github UI).

In essence, if Github is any indicator, Git's data model is a subpar fit for the VCS use case.


That's fair, insofar as I'm unfamiliar with the mercurial commands.

It's unfair insofar as it does what I expect it to, which is to diff between what I'm curious about, and where I am.

In other words, if you elide the second argument, it defaults to wherever HEAD is.

The point being, this is not something I personally need to look up. I'd venture a guess that your familiarity with hg is interfering because the conventions are different.


That brings up a deeper issue with git's philosophy. Git's UI is largely geared to introspecting the repository history only insofar as it relates to the current checkout--commands that interact with history without concerning themselves with the current checkout are far more inscrutable, confusing, and difficult to find.

By contrast, Mercurial's UI makes the repository history a more first-class citizen, and it is very easy to answer basic questions about the history of the repository itself. If you're doing any sort of source code archaeology, that functionality is far more valuable than comparing it to the current state: I don't want to know what changed since this 5-year-old patch, I want to know what this 5-year-old patch itself changed to fix an issue.


> I'd venture a guess that your familiarity with hg is interfering because the conventions are different.

Git users also need to answer questions like "What changes are in my feature branch?" (e.g., a PR) and "What changed in this commit?" (e.g., GitHub's single-commit-diff view). These aren't Mercurial-specific questions, they're applicable to all VCSes including Git, as evidenced by the (widely-used) features in GitHub.

Even with Git, I've never wanted to know how my workspace compares to another branch, nor how a given commit compares to my workspace (except when that commit is a small offset off my workspace).

> In other words, if you elide the second argument, it defaults to wherever HEAD is.

Yeah, I get that, but that's not helpful because I still need to calculate the second argument. For example, `git diff master..feature-branch` is incorrect, I want something like `git diff $(git merge-base master feature-branch)..feature-branch` (because the diff is between feature-branch and feature-branch's common ancestor with master, not with HEAD of master).

One of the cool things about Mercurial is it has standard selectors for things. `hg log -b feature-branch` will return just the log entries of the range of commits in the feature-branch (not their ancestors in master, unlike `git log feature-branch`). Similarly, `-c <commit>` always returns a single-commit range (something like <commit>^1..<commit> in git). It's this consistency and sanity in the UI that makes Mercurial so nice to work with, and which allows me to recall with better accuracy the hg commands that I used >5 years ago than the git commands that I've used in the last month.
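
For reference, a sketch of the difference in a scratch repo where `master` and `feature-branch` have diverged (file names are invented). As far as I know, the three-dot form `git diff A...B` is git's shorthand for the merge-base diff, and `git log master..feature-branch` lists only the commits that are on the feature branch and not on master, but treat the comments below as a sketch rather than a reference.

```shell
# Sketch only: runs in a temporary directory.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the initial branch name for the demo
git config user.email "demo@example.com"   # hypothetical identity
git config user.name "Demo"

echo base > base.txt; git add base.txt; git commit -q -m "base"
git checkout -q -b feature-branch
echo f > feature.txt; git add feature.txt; git commit -q -m "feature work"
git checkout -q master
echo m > master.txt;  git add master.txt;  git commit -q -m "master moves on"

# Tip-to-tip diff: also picks up what master did after the fork.
two_dot=$(git diff --name-only master..feature-branch)
# Diff against the common ancestor: only what the branch changed.
via_merge_base=$(git diff --name-only "$(git merge-base master feature-branch)"..feature-branch)
three_dot=$(git diff --name-only master...feature-branch)
# Commits reachable from feature-branch but not from master.
branch_log=$(git log --format=%s master..feature-branch)

echo "$two_dot"; echo "$via_merge_base"; echo "$three_dot"; echo "$branch_log"
```

The two-dot diff mentions both `feature.txt` and `master.txt`; the merge-base and three-dot diffs mention only `feature.txt`, which is the PR-style view.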


It'd be better to not have to do these things at all, i.e. if the commands just made sense out of the box.

These are problems that every single person learning git has to figure out and then come up with their own solutions for.


I use git with no aliases and have forever. I came from SVN, and myself and the entire team I worked with at the time enjoyed the transition and had very few issues.

So much of this drama seems propped up on things that just aren't that difficult.


I'd argue that the naming and usability of git are actually very much to the point. The naming reflects not what you want to do at a high level, but what you want to do with the underlying data structure and the checked-out files. This could be seen as a weakness or an unnecessary problem for newbies, but if you work with branches or in a multiuser environment, you will inevitably run into some complex and problematic conflict, and then "magical tools" would leave you with a mess, while I've yet to see a problem with a git repository that I couldn't solve.

I've actually taught many teams how to use git, and I always start with Merkle trees. They are actually easy to grasp even for designers and other nontechnical people, and can be explained in 10 minutes with a whiteboard. And then suddenly git starts to totally make sense and, I'd dare say, becomes intuitive.
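
The whiteboard version can also be poked at directly. This is only a sketch in a scratch repo (file names invented): object ids are derived from content, so an unchanged file keeps its id across commits, a changed file forces a new tree id, and each commit records its parent.

```shell
# Sketch only: runs in a temporary directory.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # hypothetical identity
git config user.name "Demo"

echo hello > a.txt
echo world > b.txt
git add a.txt b.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)
tree1=$(git rev-parse 'HEAD^{tree}')   # id of the root tree in the first commit
a1=$(git rev-parse HEAD:a.txt)         # id of the blob behind a.txt
b1=$(git rev-parse HEAD:b.txt)

echo changed > a.txt
git commit -q -a -m "second"
tree2=$(git rev-parse 'HEAD^{tree}')
b2=$(git rev-parse HEAD:b.txt)
parent=$(git rev-parse 'HEAD^')

# Hashing the same bytes outside of any commit yields the same blob id.
same_bytes=$(printf 'hello\n' | git hash-object --stdin)

echo "tree changed:        $tree1 -> $tree2"
echo "b.txt blob unchanged: $b1 == $b2"
echo "parent of second:     $parent"
```

Only the path from the changed file up to the root gets new ids; everything else is shared between the two commits.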


> It's a shame how poor the command-line usability of the git client is.

In comparison to an average CLI program's usability, I think git's got a very good one. It's not perfect, but I think saying it's "poor" is really exaggerating the problems.

In particular, I love how well it does subcommands. You can even add a script `git-foobar` in your $PATH and use it as `git foobar`. It even works with git options automatically like `git -C $repo_dir foobar`.
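
A minimal sketch of that, assuming a POSIX shell; the `git-foobar` script and the directory names are of course hypothetical:

```shell
# Sketch only: puts a throwaway script on PATH for the current shell.
set -e
bindir=$(mktemp -d)
workdir=$(mktemp -d)
mkdir "$workdir/myrepo"

# Any executable named git-<name> on PATH becomes available as `git <name>`.
cat > "$bindir/git-foobar" <<'EOF'
#!/bin/sh
echo "foobar ran in $(basename "$(pwd)") with $# args"
EOF
chmod +x "$bindir/git-foobar"
PATH="$bindir:$PATH"
export PATH

plain=$(git foobar one two)                  # arguments are passed through
with_c=$(git -C "$workdir/myrepo" foobar)    # -C changes directory before running it

echo "$plain"
echo "$with_c"
```

The second call reports `myrepo` as its working directory even though the script itself knows nothing about `-C`.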

> It's really hard to explain to new users why you need to run `git checkout HEAD * ` instead of `git reset` as you'd expect

Why would you ever do `git checkout HEAD * ` instead of `git reset --hard`? The only difference is that your checkout command will still leave the changes you've done to hidden files, and I can't think that's ever any good.

> why `git branch [branchname]` just switches to a branch whereas `git checkout -b [branchname]` actually creates it

If you think those behaviors should be switched, good, because they are.

EDIT: How did you manage to add the asterisk to the checkout command in your post so that it's not interpreted as italics without adding a space after it?


While I think a world where BSD would have become dominant would have thrived, things would have been different. Because of GNU existing before Linux, and it never fully adopting Linux as its kernel, Linux has always existed separate from a specific userland. In my mind, this allowed more variety to be created on top of it (for better or worse). Moreover, Linux' license has encouraged a culture of sharing around kernel components that the BSD license did not mandate.

In an alternative timeline where BSD would be dominant, would we have e.g. free software AMD drivers? Would we have such big variation in containers, VMs, and scalable system administration as we do on Linux? I wonder. No doubt that world would also be prettier than what we have now - in line with ways in which the BSDs are already better than Linux - but who knows.


I used to believe that absent the lawsuits BSD would have been THE choice instead of Linux, but I think there's a lot of truth to the position that Linux was far more experimental and evolving rapidly -- and exciting! -- than FreeBSD (et al.), which were busy doing things like powering the biggest web companies of the 90s (Yahoo and many more). Making waves and iterating rapidly was never going to mesh with the Unix Way (even open source Unix). As such, Linux got the mind-share of hackers, idealists, students, and startups and the rest is history.

(I think it's a pity that the useful innovations that happened in Linux cannot be moved back over to FreeBSD because of licensing -- the computing world would be better off if it could.)


Serious question, what innovations in Linux would FreeBSD even want? I honestly can't think of any.

IMO it's Linux that should want the features from FreeBSD/Solaris. I want ZFS, dtrace, SMF, and jails/zones. Linux is basically at feature parity, but the equivalents have a ton of fragmentation, weird pitfalls, and are overall half baked in comparison.

For example, eBPF is a pretty cool technology. It can do amazing things, but it requires 3rd party tooling and a lot of expertise to be useful. It's not something you can just use on any random box like dtrace to debug a production issue.


> Serious question, what innovations in Linux would FreeBSD even want? I honestly can't think of any.

systemd


But those lawsuits were well settled long before Linux saw a significant inflection point, mostly with the rise of cloud computing. For example, AWS launched EC2 in 2006 (and Android 2 years after that), 12 years after the BSD lawsuit was settled. Linux still doesn't have a desktop footprint outside of the workstation market. By contrast, Apple (well, NeXT) incorporated portions of FreeBSD and NetBSD into their operating system.

This might be a controversial opinion but: Linux likely "won" because it was better in the right areas.


Linux was already used heavily long before "cloud" computing became a coined term. Not just by cheap hosting providers either: in the early 00s Linux dominated the top 500 supercomputers. I also remember repairing an enterprise satellite box in 2002 which ran Linux.

You're right that those lawsuits were settled long before Linux gained momentum though. FreeBSD and NetBSD were released after Linux and their predecessor (386BSD) is very approximately as old as Linux (work started on it long before Linux but its first release was after Linux). As far as I can recall, 386BSD wasn't targeted by lawsuits.

Also wasn't BSD used heavily by local ISPs in the 90s?

In any case, I think Linux's success was more down to it being a "hacker" OS. People would tinker with it for fun in ways people didn't with BSD. Then those people eventually got decision making jobs and stuck with Linux because that's where their experience was. So if anything, Linux "won" not because it was "better" than BSD on a technical level but likely because it was "worse", which led to it becoming more of a fun project to play with.


> but I've never had something as multiheaded as the linux repo.

Git is overkill for so many projects; I hate being forced into it for everything.


Git is the simplest, lowest-friction, lowest-cost layer above plain file storage. How can there be something simpler atop an existing file system? (I know there are some versioning file systems but I've never used them.) I use git for practically anything I do. I run `git init` and my project is versioned, and I can, but don't have to, add a message to each of my versions. You don't have to use anything else if you don't want to, but you have so many options if you need them. You don't even have to use git online if you don't want to, but if you do there are multiple (even open source) git hosting services with free private repos. What is there not to like?


Mercurial is wonderfully simple, particularly for smaller teams. Also, not being able to throw away branches ensures the project maintains a history of some wrong paths that were pursued.


> How can there be something simpler atop an existing file system

Mercurial? Similar DVCS concepts, but you no longer have to worry about garbage collection or staging areas...


What garbage collection? Isn't staging area actually a feature? I've never used anything else since when I started needing something like 5 years ago git was already a recommended choice, but I also never felt like I needed anything else.


If your commits are not referenced by a branch or tag, then those are eventually garbage collected. Having to have a branch to keep the commit around means you need to come up with a name for it if you ever want more than one name. When I go back to Mercurial, it's actually quite relieving to not have to come up with a short name to describe what the current work branch is doing, only commit messages.
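
To make the reachability point concrete, here is a minimal Python sketch of the idea, not git's actual implementation. The commit graph, the `refs` dict and the function names are all made up for illustration, and real git also looks at things like the reflog before it prunes anything.

```python
def reachable(commits, refs):
    """Return every commit id reachable from some branch or tag.

    commits: dict mapping commit id -> list of parent ids (hypothetical model)
    refs:    dict mapping branch/tag name -> commit id
    """
    seen = set()
    stack = list(refs.values())
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        stack.extend(commits[cid])  # walk back through the parents
    return seen


def collectable(commits, refs):
    """Commits that no ref can reach, i.e. candidates for cleanup."""
    return set(commits) - reachable(commits, refs)


# Toy history: a <- b <- c on "master", plus an experiment d that forked from b.
commits = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"]}
refs = {"master": "c"}

print(collectable(commits, refs))                          # d has no name pointing at it
print(collectable(commits, {**refs, "experiment": "d"}))   # naming it keeps it alive
```

In this model the only way to keep `d` around is to invent a name for it, which is exactly the chore I am complaining about.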

And no staging area is strictly simpler than having a staging area, which is contrary to your assertion.


When would you be creating a branch to do work without knowing what work you're doing?


It's not that I don't know what work I'm doing, it's that I don't know how to give it a unique name.

Sometimes, I try a few different approaches to make something work. Each of these attempts is a different branch--I might need to revisit it, or pull stuff out of it. Good luck staring at a branch name and working out if it's landloop or landloop2 that had the most working version of the code.


Super late, but I think a good approach would be to use an issue/work tracker of really any flavor (even manually), log all of your to-do headings as issues, and then just name each branch after the associated issue/job number.


> Isn't staging area actually a feature?

I've been using git for a few years, and staging has been all cost with zero benefit so far.


"Isn't staging a feature?"

Well yes, but the GP is claiming that git is the most simple thing above file storage.

Staging may be a feature, but it adds complexity. Perhaps useful complexity, but complexity nonetheless.


mercurial with a list of different plugins for each project? no thanks


Why would you need a different list of plugins for each project?


> I know there are some versioning file systems but I've never used them

But those other systems were the whole point of the post you replied to ;)


`.git` is a directory, while in Fossil, the repo is a single SQLite file.

Not making any larger comparison here, what I'm saying is that a single file is simpler than a single directory.


care to elaborate? I fail to see how git would be considered 'overkill' for a project.


Other version control software has way simpler syntax and workflow. Subversion for example. The complexity of git makes total sense if you indeed have a complex, multi-HEADed project like the Linux kernel. But most software isn't Linux.


You literally need to know two commands to work with git in a simple project: add and commit. I don’t see how that is complicated.


By that argument, you literally need to know a single command to work with Mercurial (hg commit) or SVN (svn commit), or hell, even CVS (cvs commit).


> "Mercurial (hg commit) or SVN (svn commit), or hell, even CVS (cvs commit)."

Why would I [re]learn those tools if I already know git?

If I'm going to move to a new VCS, it's going to be one that actually gives me something I didn't have before, like Fossil. Not some other VCS that captures the same concepts with a slightly different cli UX (which hardly even impacts me at all, since I rarely interact with such systems on the command line rather than through porcelain provided by an editor extension.)


Sure, I'm not saying they're more difficult, but people here are saying that git adds too much complexity in simple projects. It doesn't, but it lets you expand into the complexity if you ever need it in the future.


git commit -a then.


You've misinterpreted you finding git to be difficult as everyone finding it difficult, leading to an argument based on git being difficult that will never be compelling to those who didn't find learning git to be difficult. I'm one of them - I don't have any CS training - and so are the interns and new starters who use it without complaint in my workplace.

If you are forced into using it for everything but still haven't taken the steps necessary for understanding it, why is that my problem?


Git is as simple or as complex as you need it to be. And the complexity doesn't come at a cost to anyone who doesn't require it but uses git anyway.


Subversion needs a server, for one. For a single user and a single file, Git is already less overhead.


It doesn't, actually, you can host a repo on the filesystem without any sort of server process.

http://svnbook.red-bean.com/en/1.7/svn.ref.svnadmin.c.create...


I agree - I even use it for tiny personal projects that I don’t even push anywhere because you can instantly get version control with a single ‘git init .’ in a directory. It’s plenty scalable and has very little overhead...


It has such minimal overhead I don't know how you could say that.


I agree. I have reasonable familiarity with git but I find that traditional SVN-type systems often (not always) have a lower cognitive overhead.

If I ever need to manage the codebase for a huge, multi-level project involving large numbers of geographically dispersed developers then I'm sure I'd use git. For simpler projects, not so likely.


Sure. But you know what is more complex than git? Git + <anything else>.

If you use git at all, you may as well use it for everything. If you have control over which version control system to use, there's no good reason to actively use multiple ones at the same time.


I use git at work where branching matters, and I also use git for home projects where git add/commit/push/pull are the only commands I use. Git is efficient at both scales; it is the opposite of overkill.


Furthermore, if you're forced to use it, it's because you need it to interact with others' versions of the repo, in which case branching matters.


How is git overkill? Perhaps you're conflating Git and Github? Or perhaps you're confusing git best practices or methodologies with git functionality?

Git costs nothing to use, you add it to a project and then it sits there until you do something with it. If you want to use it as a "super save" function, it'll do that. If you want to use it to track every change to every line of code you've written, it'll do that too.


Definitely. I find SVN so much easier. But we must all use Git because cargo cultism is cool, or something.


We use Git because GitHub happened to be the first non-shitty code repository website.


I'm talking about internal repos.


I know people like to complain about git's interface, but is it so lacking to the point that it justifies the time spent on learning multiple version control systems?


Yes.


Go easy on yourself and stop forcing yourself to use the CLI tools if you dislike them so much. For every editor and IDE under the sun, there exist extensions for these version control systems that provide you with a nicer interface than the CLI of any VCS.

For years probably 99% of my interactions with git, or any other VCS, have been through editor extensions like magit or fugitive.


How do you work on more than one thing at once with svn without manually managing patch files?


Branches.


Branches in SVN are such a pain! If I recall correctly, creating a branch in SVN consists of making a full copy of everything (remotely, usually). In Git, creating a new branch consists of creating a new pointer to an existing commit.
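
A toy model of the Git side of that, as a hedged sketch: the `Repo` class and its methods are hypothetical and only mimic the idea that a branch is a name pointing at a commit. It says nothing about how git actually stores things on disk.

```python
import hashlib


class Repo:
    """Hypothetical in-memory model: commits are immutable, branches are pointers."""

    def __init__(self):
        self.commits = {}   # commit id -> (parent id, message)
        self.branches = {}  # branch name -> commit id

    def commit(self, branch, message):
        """Add a commit on top of the branch and move the branch pointer to it."""
        parent = self.branches.get(branch)
        cid = hashlib.sha1(f"{parent}:{message}".encode()).hexdigest()
        self.commits[cid] = (parent, message)
        self.branches[branch] = cid
        return cid

    def create_branch(self, name, start):
        """Creating a branch copies one pointer and no commits at all."""
        self.branches[name] = self.branches[start]


repo = Repo()
first = repo.commit("master", "initial import")
repo.create_branch("topic", "master")

print(len(repo.commits))                                # still a single commit
print(repo.branches["topic"] == repo.branches["master"])

repo.commit("topic", "try something")                   # only the topic pointer moves
print(repo.branches["master"] == first)
```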


Yes, branches look like full copies, but they are sparse copies. So only any changed data on the branch gets actually stored in the repository.


That's basically all it is in SVN as well...

And of course it's remote, every action in SVN is remote since it's centralized (except for shelving).


imho git is not a version control system but a version control tool. when we started using git through a system mediating all the functionality through a goal/workflow oriented approach our whole experience radically changed.

both fork and the intellij ide are great for that, handling the common cases solidly and building up so many convenience functions I can't live without them now, like whitespace aware conflict resolution or single line commits.


Right, git sort of makes the easy things easier and the hard things harder. I think it's a large part of why "stable" software branching is unpopular. Tracking fixes against their core features over time is extremely difficult without an additional "shim" on top. Even knowing what group of commits comprises related functionality becomes difficult without layering a commit group id (as gerrit does, for example) on top. (AKA I'm looking at $file, which has a set of commits for feature $Y; what are the related commits in other parts of the system required to implement this feature?) Or for that matter, the inability to group a fix with its original commit (without rewriting public history) is why projects like the linux kernel implement "fixes:" tags, which are then scanned by scripts etc. to detect fixes for a given commit group for back-porting. Except in that case it's brittle, as frequently everyone involved forgets the tag.

Bottom line, git is the VCS equivalent of C: it is quite powerful, but it's got a lot of foot-gun moments and requires a lot of human rigor/experience to make it work well. Rigor that even the best users frequently mess up.


The thing about fiction is that it is... fiction!

So yes, if you make the hypothesis that things turned out so that we would all be using BSD, then we would. And yes, successful projects and people always owe a part to luck. But so what? What happened in reality is what happened, and if they got lucky, good for them, but this does not really take anything away from their achievements.


If your achievements came about by luck then how do you get off claiming credit for their success? I don’t think git was purely luck—it is a formidable tool in its own right, but there are better tools out there, and that was especially true at the time when git really took off, which is to say when GitHub began to be popular.


> If your achievements came about by luck then how do you get off claiming credit for their success?

The same way it doesn't stop gamblers, stock pickers, actors, and entrepreneurs from mistaking survivorship bias for talent.

That said, I don't think git was purely luck either.


what if Google built Android on top of BSD?


Almost there. The Linux kernel is the only GPL piece still standing.


wouldn't change a thing. Apple built on FreeBSD; Android userland and libc are all BSD


Slight clarification, Darwin is a mish-mash of CMU's Mach microkernel, some 4.3 BSD, some BSD userland, some GNU userland (although that seems to be going away), and then NeXT/Apple stuff on top of that.


I'd say a big part was attitude toward common hardware and existing common setups.

There were a couple of times early on when I wanted to try both Linux and one of the BSDs on my PC. I had CDs of both.

With Linux, I just booted from a Linux boot floppy with my Linux install CD in the CD-ROM drive, and ran the installation.

With BSD...it could not find the drive because I had an IDE CD-ROM and it only supported SCSI. I asked on some BSD forums or mailing lists or newsgroups where BSD developers hang out about IDE support, and was told that IDE is junk and no one would put an IDE CD-ROM in their server, so there was no interest in supporting it on BSD.

I was quite willing to concede that SCSI was superior to IDE. Heck, I worked at a SCSI consulting company that did a lot of work for NCR Microelectronics. I wrote NCR's reference SCSI drivers for their chips for DOS, Windows, OS/2, and Netware. I wrote the majority of the code in the SCSI BIOS that NCR licensed to various PC makers. I was quite thoroughly sold on the benefits of SCSI, and my hard disks were all SCSI.

But not for a sporadically used CD-ROM. At the time, SCSI CD-ROMs were about 4x as expensive as IDE CD-ROMs. So what if IDE was slower than SCSI or had higher overhead? The fastest CD-ROM drives still had maximum data rates well under what IDE could easily handle. If all you are going to use the CD-ROM for is installing the OS, and occasionally importing big data sets to disk, then it makes no sense to spring for an expensive SCSI CD-ROM. This is true on both desktops and servers.

The second problem I ran into when I wanted to try BSD is that it did not want to share a hard disk with a previous DOS/Windows installation. It insisted on being given a disk which it could completely repartition. I seem to recall that it would be OK if I left free space on that disk, and then added DOS/Windows after installing BSD.

Linux, on the other hand, was happy to come second after my existing DOS/Windows. It was happy to adjust the existing partition map to turn the unpartitioned space outside my DOS/Windows partition into a couple Linux partitions and install there.

As with the IDE thing, the reasons I got from the BSD people for not supporting installing second were unconvincing. The issue was drive geometry mapping. Once upon a time, when everything used the BIOS to talk to the disk, sectors were specified by giving their actual physical location, specifying what cylinder they were on (C), which head to get the right platter (H), and on the track that C and H specifies, which sector it is (S). This was commonly called a CHS address.

There were limits on the max values of C, H, and S, and when disks became available that had sectors whose CHS address would exceed those limits, a hack was employed. The BIOS would lie to the OS about the actual disk geometry. For example, suppose the disk had more heads than would fit in the H field of a BIOS disk request. The BIOS might report to the OS that the disk only has half that number of heads, and balance that out by reporting twice as many cylinders as it really has. It can then translate between this made up geometry that the OS thinks the disk is using and the actual geometry of the real disk. For disks on interfaces that don't even have the concept of CHS, such as SCSI which uses a simple block number addressing scheme, the BIOS would still make up a geometry so that BIOS clients could use CHS addressing.
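
For anyone who has not had to deal with this, the translation is just arithmetic. A minimal Python sketch, assuming the half-the-heads, twice-the-cylinders example above; the geometry numbers and function names are invented for illustration and are not taken from any real BIOS:

```python
def chs_to_lba(c, h, s, heads, sectors_per_track):
    """Flatten a CHS address into a linear block number (sectors count from 1)."""
    return (c * heads + h) * sectors_per_track + (s - 1)


def lba_to_chs(lba, heads, sectors_per_track):
    """Inverse of chs_to_lba for a given geometry."""
    c, rest = divmod(lba, heads * sectors_per_track)
    h, s0 = divmod(rest, sectors_per_track)
    return c, h, s0 + 1


# Made-up numbers: the drive really has 32 heads, but the BIOS reports 16
# heads (and therefore twice as many cylinders) to its clients.
REAL = {"heads": 32, "sectors_per_track": 63}
REPORTED = {"heads": 16, "sectors_per_track": 63}


def translate(c, h, s):
    """Map an address in the reported geometry onto the real geometry."""
    return lba_to_chs(chs_to_lba(c, h, s, **REPORTED), **REAL)


# Reported cylinder 1, head 0 is really the second half of physical cylinder 0.
print(translate(1, 0, 1))
```

Going through a linear block number in the middle is also why a made up geometry works for SCSI: the block number is all the drive itself cares about.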

If you have multiple operating systems sharing the disk, some using the BIOS for their I/O, and some not, they all really should be aware of that made up geometry, even if they don't use it themselves, to make sure that they all agree on which parts of the disk belong to which operating systems.

Fortunately, it turns out that DOS partitioning had some restrictions on alignment and size, and other OSes tended to follow those same restrictions for compatibility, so you could almost always look at an existing partition scheme and figure out from the sizes and positions of the existing partitions what CHS to real sector mapping the partition maker was using. Details on doing this were included in the SCSI-2 Common Access Method ANSI standard. The people who did Linux's SCSI stuff have a version [1].

I said "almost always" above. In practice, I never ran into a system formatted and partitioned by DOS/Windows for which it gave a virtual geometry that did not work fine for installing other systems for dual boot. But this remote possibility that somehow one might have an existing partitioning scheme that would get trashed due to a geometry mismatch was enough for the BSD people to say no to installing second to DOS/Windows.

In short, with Linux there was a good chance an existing DOS/Windows user could fairly painlessly try Linux without needing new hardware and without touching their DOS/Windows stuff. With BSD, a large fraction would need new hardware and/or be willing to trash their existing DOS/Windows installation.

By the time the BSD people realized they really should be supporting IDE CD-ROM and get along with prior DOS/Windows on the same disk, Linux was way ahead.

[1] https://github.com/torvalds/linux/blob/master/drivers/scsi/s...


That mostly matches my experience. I was following 386BSD's progress at that time and was really eager to try it for myself. However, the machines that it was targeting (SCSI disk of ~200MB, math coprocessor) were out of my reach. It made sense that a workstation-like OS was expecting workstation-class hardware, but it did rule out most 386 PCs that people actually owned.

However, I also agree with @wbl that the lawsuits were ultimately the decisive factor. The hardware requirements situation of BSD was a tractable problem; it just needed a flurry of helping hands to build drivers for the wide cacophony of PC hardware. The lawsuit-era stalled the project at just that critical point. By the time that FreeBSD was approaching an acceptable level of hardware support Linux already had opened up a lead... which it never gave up.


Hm. First thank you for writing the NCR-BIOS. It never let me down while deploying about 200 of them for SMBs. I had Adaptecs at the same time which were annoying to integrate. And there is the thing, from my point of view Adaptec did things differently while setting the pseudo-standard in the PC-world. There was this group-think that if SCSI then Adaptec, which i never understood, because they could be underwhelmingly fiddly to integrate and were expensive.

As to the C/H/S low-level-format, NCR could read some Adaptec formatted drives, while Adaptec couldn't read NCRs. Asshole move. Never mind.

As for the BSDs being behind? Not all the time. I had an Athlon XP 1800+ slightly overclocked by about 100MHz to 2000+ in some cheap board for which i managed to get 3x 512MB so called 'virtual channel memory' because the dealer thought it was cheap memory which ran only with VIA chipsets. Anyways 1,5GB RAM about twenty years ago was a LOT! With Linux of the times i needed to decide how to split it up, or even recompile the kernel to have it using it at all. No real problem because i was used to it, and it wasn't the large mess it is today.

Tried NetBSD. From a two or three floppy install set. I don't remember the exact text in the boot console anymore, just that i sat there dumbstruck because it just initialized it at once without further hassle. These are the moments which make you smile! So i switched my main 'workstation' from Gentoo to NetBSD for a few years, and had everything i needed, fast and rock solid in spite of overclocking and some cheap board from i can't even remember who anymore. But its BIOS had NCR support for ROM-less add-on controllers built in. Good times :-)

Regarding the CD-ROM situation, even then some old 4x Plextor performed better than 20x Mimikazeshredmybitz if you wanted to have a reliable copy.

As to sharing of Disks by different OS? Always bad practice. I really liked my hot-pluggable 5 1/4" mounting frames which took 3,5" drives, with SCSI-ID, termination, and what not. About 30 to 40USD per piece at the time.


And Microsoft at the time wasn't serious about POSIX.


Haber and Bosch would like to have a word with you.

https://en.wikipedia.org/wiki/Haber_process#Cause_of_populat...


That's just barely over 100 years ago.

Maybe Norman Borlaug as having the most wealth creation?


Not sure how large the contributions of Norman Borlaug are, but the work of Fritz Haber (who was unfortunately also responsible for weaponizing chemistry) and Carl Bosch enabled mass production of artificial (nitrogen) fertilizer.

About 50% of the world population depend on crops produced with artificial fertilizers. They enabled billions of people to live at all. In my opinion they are setting the bar quite high.


"Then, in 1909, German chemist Fritz Haber successfully fixed atmospheric nitrogen in a laboratory.[6][7] This success had extremely attractive military, industrial and agricultural applications. In 1913, barely five years later, a research team from BASF, led by Carl Bosch, developed the first industrial-scale application of the Haber process, sometimes called the Haber-Bosch process.[8][9]"

https://en.wikipedia.org/wiki/History_of_the_Haber_process

First extraction was 110 years ago, then it was industrialized 106 years ago.

That means they are out for "anybody in the last hundred years" just barely.

It's amazing to think of how much change they enabled in the last 103 years.

Roughly as I understand it, the Haber-Bosch process enabled us to get to the billions (e.g. 1B & 2B). Norm Borlaug, & the green revolution which he helped start, built on top of fertilizer & enabled the next couple billions.

https://en.wikipedia.org/wiki/Green_Revolution

Norm Borlaug is credited with saving over a billion people from starvation & famine.


Not the inventors of solid-state transistors in 1947? 100 years is a long time and solid-state transistors were the last piece of basic research needed for the digital age to begin.


And besides that, the total sum of intellectual development. I remember in my student years, first learning GNU userland, and then compiling a custom kernel, and finally poring over kernel module code.

There must be hundreds of thousands of me, hungry to try to understand what makes an OS tick.


Agreed 100%. I don't think I would work as a software professional if growing up I had to buy (or somehow finagle) computer manuals for closed-source systems in order to learn.

Maybe I didn't read the code, but I learned so much by compiling it. Knowledge I wouldn't have bothered with had I just downloaded a binary blob.


I would like to pose a question to you, what if our culture, the tech culture, is a pop culture? Often times technologies become popular at the expense of others. If the technology that got popular wasn't the best technology but got popular for other reasons like, let's say creator was popular, then it's clear that technology might have actually destroyed a lot of potential wealth that otherwise would have occurred!

And empirically hasn't this been the case in our history in tech? Windows being popular over Mac, IE being more popular than netscape, etc etc.


Norman Borlaug man.



> He played a major role in developing leaded gasoline (Tetraethyllead) and some of the first chlorofluorocarbons (CFCs), better known by its brand name Freon

Some would argue that his impact wasn't even net-positive. He might have done it in good faith, but it didn't really work out well.


> Midgley possessed "an instinct for the regrettable that was almost uncanny"

indeed


Not sure if Linux and git beat spreadsheets and Stack Overflow. But if we're measuring individual contributors and not products - then he's still probably ahead of everyone else.


You are completely diminishing the efforts of thousands of people who contributed to the project.


And interviewers ask him "are you a socialist?"


It's really astonishing to see all the recent drama around GitHub/GitLab. Hard to believe those are multibillion dollar companies when their entire existence can be summed up as "Filling in some gaps in git". Like, if Linus Torvalds would just dedicate a few months and build that functionality directly into git, both those companies' value propositions would fall off the face of the earth.

Edit: I mean, their value propositions to internal teams would disappear. They would still have value as social networks and as centralized hosting.


GitHub is like half social network and half hosting service. I don't see these as features that should or could be built into git itself.


Idk if "gaps" is fair. Building a web UI into git would go quite against the "do one thing well" unix philosophy. And even then, hosting services would still be a thing


But git already has a built in webserver...

https://git-scm.com/book/en/v2/Git-on-the-Server-GitWeb


Yes! This has been there for over a decade! And since then there have been hundreds of alternative UI tools that build upon this same tooling.

How are people here saying they've never heard of this? Oof... Typical developers, not even familiar with their tools.


TIL! Thanks! Interesting, though, that it doesn't appear to work with MacOS built-in apache (webrick works fine though).


Can you run it with python's built in http server? That's native on all OSX installs.

I believe OSX ships with Python2 by default, right?

https://docs.python.org/2/library/cgihttpserver.html

Hmm...


Yup MacOS ships with python. I didn’t try a python server since the ruby webrick server worked fine I just found it curious Apache didn’t work since it also ships in the OS.


The fact that this is news to people here is confirmation that it was a mistake.


I disagree, the fact that it's news to people here means people don't know their tools as well as one might assume.

Implying that it's a mistake because people don't know about it is very odd. Clearly, I know about it, as do others.

It's been possible for 10+ years to serve git repos using gitweb, and mercurial repos using hgweb. We did this in like 2007-2009ish, locally, in our LAN, among developers... because our code couldn't be pushed to a third party for... reasons.

Eventually I did set up our own internal SSH server to serve the repos, but for quick browsing of a team's repo state, using the built in HTTP server is just fine.


Are all HN stories about mistakes?


I definitely did not know that. Thanks!


I’m not sure. Git has proven that however good its core model might be, it’s pretty bad at putting a UI over it.


> both those companies' value propositions would fall off the face of the earth.

What do you think are their value propositions? I think the biggest part of their value proposition has to do with a centralized git repository as a service.

The centralized part is really important for most companies. To the point that many git users don't really understand its decentralized nature.


To me, the primary value propositions are: issues, merge requests, comments on issues and merge requests, related stuff like tags/milestones/etc., and the ability to expose this stuff in a friendly way to project managers who don't use the commandline.

I guess since my team uses a self-hosted instance of GitLab, I'm biased and don't put any value on the social network aspect or the hosting aspect.


You could get these things by running a Phabricator instance - not much reason to pay for GitLab self-hosted version unless you really care about the support aspect and some of their enterprise-focused niche features.


Social networks are not popular because of their tech stacks or their codebase being written. There are already several open source git frontends, GitLab included.

Social networks are popular because of their userbase and the ease of discovering other users and code. There is nothing you can build into a piece of client software that enables that.


p2p and trust networks, but it has never been pulled off


What an odd suggestion. Git follows the Unix philosophy of being designed to do one thing and do it well (concurrent version control). Building an issue tracker and more directly into it just doesn't make sense.


perhaps, but building a suite of tools that work together (but can still work individually) would not go against that philosophy.


I think you are misrepresenting what Github actually is.


Nothing happens in a few months with one guy. Even if it's Linus Torvalds. There are 1300 contributors to the project - https://github.com/git/git

