Why SQLite Does Not Use Git

jordigh · on April 11, 2018

"Nobody really understands git" is the truest part of that. While hyperbolic, it really has a lot of truth.

It's always a bit frustrating when working with a team because everyone understands a different part of git and has slightly different ideas of how things should be done. I still routinely have to explain to others what a rebase is and others have to routinely explain to me what a blob really is.

In a team of the most moderate size, teaching and learning git from each other is a regular task.

People say git is simple underneath, and if you just learn its internal model, you can ignore its complex default UI. I disagree. Even just learning its internal model leads to surprises all the time, like the blobs: I keep forgetting why they aren't just called files.
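For what it's worth, one reason they aren't called files is that a blob is only the content, with no name or path attached (names live in tree objects). Here is a minimal sketch, assuming git's default SHA-1 object format; `git_blob_id` is a made-up helper name, not something git ships:

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object id git would assign to a blob with this content.

    Assumption: the default SHA-1 object format, where the id is the hash
    of the header "blob <size>" plus a NUL byte, followed by the raw bytes.
    No filename is involved anywhere.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Two differently named files with identical bytes share one blob id.
    readme = b"same bytes\n"
    copy_of_readme = b"same bytes\n"
    print(git_blob_id(readme) == git_blob_id(copy_of_readme))
    # The id of the empty blob.
    print(git_blob_id(b""))
```

Since the id depends only on the bytes, renaming a file does not create a new blob, which is part of why "file" would be a misleading name for it.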

mabbo · on April 11, 2018

The day I got over what I feel was the end of the steep part of the learning curve, everything made so much sense. Everything became easy to do. I've never been confused or unsure of what was going on in git since.

What git needs is a chair lift up that hill. A way to easily get people there. But I have no idea what that would look like. Lots of people try, few do very well at it.

johnfn · on April 11, 2018

The whole point about abstractions is you shouldn't need to understand the internals to use them. If the best defense of git is "once you understand the inner workings, it's so clear" then it is by definition a poor abstraction.

andrewflnr · on April 11, 2018

Who said it's supposed to be an abstraction? The point, theoretically, of something like Git is that the actual unvarnished model is clear enough that you don't need an abstraction. The problem IMO is that the commands are kind of random and don't map cleanly to the model.

AstralStorm · on April 11, 2018

Indeed, the worst offenders are in my opinion checkout, reset and pull.

They mix multiple only slightly related commands in one.

simik · on April 11, 2018

There are a couple of projects that try to tackle this problem by providing an alternative CLI (on top of git's own plumbing), like gitless and g2. I haven't used any of them myself, but would be interested in the experience of others.

naasking · on April 11, 2018

Any interface means you'll build a mental model of the system you're manipulating. How else could you possibly know what you want to do and what commands to issue?

So given a mental model is inevitable, seems reasonable that that model should be the actual model.

tedivm · on April 11, 2018

You don't need to understand how media is encoded to watch a movie or listen to a song. You don't need to understand the on disk format of a Word document to write a letter. When writing a row to an SQL database I don't always understand how that software is going to record that data, but I do know I can use that SQL abstraction to get it back out.

fiedzia · on April 11, 2018

> You don't need to understand how media is encoded to watch a movie or listen to a song.

I recall the time when mp3 was too demanding for many CPUs, so you had to convert to non-compressed formats. Today you do need to know that downloading non-compressed audio will cost you a lot of network traffic. Once performance is a concern, all abstractions have to be discarded.

jsjohnst · on April 11, 2018

Exactly, if you stick to the very basics with git, you can live a happy life never caring about the internals. If you however want to dig into the depths of Git and use all its power, I don’t get why people don’t think there would be an obvious learning curve.

Same exact thing above applies to so many things in software development, from IDEs, to code editors (Vim/Emacs/Sublime/etc), to programming languages, to deploy tools, the list goes on. There’s a reason software development is classified as skilled labor and not a low end job generally. You’re expected to have knowledge of, or be willing to learn a lot, to do your job.

naasking · on April 11, 2018

The difference is that the video model abstracts over the encoding, the git model does not abstract over the storage model, it exposes it. git commands are operations on a versioned blob store.

blattimwind · on April 11, 2018

It's not versioned.

ethbro · on April 11, 2018

> So given a mental model is inevitable, seems reasonable that that model should be the actual model.

I think the longevity of SQL has proved there's value in non-leaky abstracted interfaces.

flukus · on April 11, 2018

> I think the longevity of SQL has proved there's value is non-leaky abstracted interfaces.

How is sql non-leaky? To be proficient with sql you have to understand how results are stored on disk, how indexes work, how joins work, etc. To debug and improve them you need to look at the query plan which is the database exposing its inner workings to you.

You have to know about the abstractions an sql server sits on as well. Why is it faster if it's on an SSD instead of an HDD? Why does the data disappear if it's an in memory DB?

bch · on April 11, 2018

> To be proficient with sql you have to understand how results are stored on disk, how indexes work, how joins work, etc

No, you don’t. As far as I know, the data is stored in discrete little boxes and indexes are a separate stack of sorted little boxes connected to the main boxes by spaghetti. This is the abstraction, it works, and I don’t need to know about btrees, blocksizes, how locks are implemented, or anything else to grok a database.

flukus · on April 11, 2018

You've never had to look at a query plan that explains what the database is doing internally? If not then I wouldn't consider you proficient, or you've only ever worked with tiny data sets.

Have you created an index? Was it clustered or non-clustered? That's not a black box, that's you giving implementation details to the database.

bch · on April 11, 2018

I don’t think being a professional DBA managing an enterprise Oracle installation is isomorphic to the general populace that might use git.

There’s no question that knowing more will get you more, but I think for the question of “when will things go sideways and I need to understand internals to save myself”, one would be able to use a relational database with success longer than git, getting by on abstractions alone. Running a high-performance installation of either is really outside the scope of the original point.

LyndsySimon · on April 11, 2018

Those things don't generally influence how you structure the query, though - you can choose to structure your query to fit the underlying structure better, or you can modify the underlying structure to better fit your data and the manipulations you are trying to perform.

Yes, most of us will have to do both at some point, but they can be thought of as discrete skills.

btschaegg · on April 12, 2018

This isn't a bad analogy though. Git itself is similar - once you understood the graph-like nature of commits (which isn't all that complicated to begin with), it's generally not hard to skim through a repository and understand its history. Diffing etc. is also simple enough this way.

If, on the other hand, you are working to create said history (and devise/use an advanced workflow for that), it's very helpful if you understand the underlying concepts. Which also goes for designing database layouts - someone who doesn't understand the basics of the optimizer will inevitably run into performance problems, just as someone who doesn't understand Git's inner workings will inevitably bork the repository.

DomreiRoam · on April 11, 2018

You don't need to know more than sql to manipulate the data. The semantics of your query are fully contained in sql.

You may need to go deeper and understand the underlying model if you want performance, but sticking to normal form can make that unnecessary for a lot of people a lot of the time.

You can have a useful separation of work between a developer understanding/using sql and a DBA doing the DDL part and the optimization when needed.

eesmith · on April 11, 2018

You have a very high standard for what 'proficient' means, and yet a very low one.

That is, I am not proficient with relational databases, and I can handwave why an SSD is faster, and why data may disappear from an in-memory DB.

But I couldn't do an outer join without help. Nor do I know when I would want to do one.

Bob Martin wrote the essay at http://blog.cleancoder.com/uncle-bob/2017/12/09/Dbtails.html , in which he writes:

> Relational databases abstract away the physical nature of the disk, just as file systems do; but instead of storing informal arrays of bytes, relational databases provide access to sets of fixed sized records.

This isn't true. SQLite does not use fixed size records.

This suggests to me that a lot of people who consider themselves proficient with SQL don't know how the results are stored on disk, nor the difference between the SQL model and the actual implementation details, making them not proficient under your definition.

flukus · on April 11, 2018

> That is, I am not proficient with relational databases, and I can handwave why an SDD is faster, and why data may disappear from an in-memory DB.

Because you know that information for other reasons as most people would. Just because the information is gained for other reasons does not make it irrelevant when using a database though.

> This isn't true. SQLite does not use fixed size records.

It's actually true of most/all modern databases these days. The point isn't knowing the exact structure the database uses to store its information (even though it can be useful) but knowing how efficiently it can find the information for any given request. Knowing when a database is doing an index lookup or a full table scan is very important and I wouldn't consider someone that can't make a reasonable guess to be proficient in sql. Many of these details are even exposed in the sql, when you create an index and decide if it's clustered or non-clustered you're giving the database specific directions about how the data will be physically stored.

The fact that you need to know anything about how they do their work internally to be reasonably competent at using them makes them a leaky abstraction.
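For anyone who hasn't looked at a query plan, here is a minimal sketch of the scan-versus-index distinction using Python's built-in `sqlite3` module. The `users` table, the `idx_users_email` index and the `query_plan` helper are made up for illustration, and the exact wording of the plan text depends on the SQLite version.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable lines of EXPLAIN QUERY PLAN for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the readable description.
    return [row[-1] for row in rows]


# Hypothetical schema and data, only here to have something to query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

lookup = "SELECT name FROM users WHERE email = ?"
args = ("user42@example.com",)

# Without an index on email, SQLite has to scan the whole table.
plan_before = query_plan(conn, lookup, args)

# With an index, the same SQL text gets a different plan.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, lookup, args)

print(plan_before)
print(plan_after)
```

The query itself never changes, which is the "non-leaky" half of the argument; the plan changing underneath it is the "leaky" half.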

blattimwind · on April 11, 2018

SQL leaks for complex queries and schemas if performance needs to be optimized. I argue virtually all abstractions leak heavily when performance is considered, some more than others. SQL leaks relatively little in comparison to some other technologies IME.

Also, SQL has well-established processes and formalisms to design schemas which generally result in solid performance by themselves. That's what RDBMS are around for, after all: enabling efficient and consistent record-oriented data manipulation. This is quite difficult to do correctly in reality; for example, if you write your own transaction mechanism for disk/solid-state storage, you are going to do it wrong. This is genuinely difficult stuff.

There is a ton of internals that SQL abstracts so well that very few DB programmers know or (have to) care about them. Things like commit and rollback protocols, checkpointing, on-disk layouts, I/O scheduling, page allocation strategies, caching etc.

eesmith · on April 12, 2018

You wrote "Just because the information is gained for other reasons does not make it irrelevant when using a database though."

Certainly. My comment, however, concerned what you meant by 'proficient', and not simple use.

You used the qualifier "all modern databases". Was that meant to imply that SQLite is not a modern database?

My point remains that there are many people who are proficient in SQL, and would do very well with SQLite, even without knowing the on-disk format.

That is why I disagree with your use of the term "proficient".

i2om3r · on April 11, 2018

You seem to be talking about a different kind of leakiness. In my mind, there are two kinds: conceptual and performance leakiness. You are talking about the latter. Pretty much any non-trivial system on modern hardware leaks performance details. From what I understand, git's UI tries to provide a different model than the actual implementation but still leaks a lot of details of the implementation model.

Mikhail_Edoshin · on April 12, 2018

It probably should be homomorphic to the actual model, but not the actual model. The map cannot be the terrain.

boudin · on April 11, 2018

I disagree with that. The point of an abstraction is to not have to know the implementation. Understanding the principles behind it will always lead to a much better use of your abstraction.

AlexCoventry · on April 14, 2018

I'd also say an abstraction could be carrying its weight even if it only reduces the amount you have to think about the implementation details when using it.

specialist · on April 11, 2018

Leaky abstractions is how we get stuff like ORMs.

pritambaral · on April 11, 2018

To be fair, most ORMs poorly implement the "leaky" principle. When implemented well, like with SQLAlchemy, the end result is a much nicer ORM.

In fact, one of the things in common among the ORMs that have left a bad taste in my mouth is that they all tried to abstract away SQL without leaking enough of it.

specialist · on April 11, 2018

I have a different thesis:

Picking the ideal interface to abstract is critically important (and very hard).

In the case of ORMs, available solutions abstract the schema (tables, rows, fields), the objects, or use templates. My solution abstracted JDBC/ODBC. The only leak in my abstraction was missing metadata, which I was able to plug (with much effort!).

My notions for interfaces, modularity, abstractions are mostly informed by the book "Design Rules: The Power of Modularity". http://a.co/hXOGJq1

jimmy1 · on April 11, 2018

This might sound a little out of touch, but am I the only one who doesn't think git is that hard? It is a collection of named pointers and a directed acyclic graph. The internals aren't really important once you have that concept down.
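A toy sketch of that "named pointers plus a DAG" picture, in Python. This is not how git stores anything; the single-letter commit ids and the `commit` and `ancestors` helpers are invented purely for illustration.

```python
# Toy model only: commit ids are made-up short strings, not real hashes.
commits = {}  # commit id -> tuple of parent ids (the edges of the DAG)
refs = {}     # branch name -> commit id (the named pointers)


def commit(branch, new_id, extra_parents=()):
    """Add a commit on top of `branch` and move the branch pointer to it."""
    parents = ((refs[branch],) if branch in refs else ()) + tuple(extra_parents)
    commits[new_id] = parents
    refs[branch] = new_id


def ancestors(commit_id):
    """Return every commit reachable from `commit_id`, itself included."""
    seen, stack = set(), [commit_id]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(commits[current])
    return seen


commit("master", "a")
commit("master", "b")
refs["feature"] = refs["master"]  # creating a branch just copies a pointer
commit("feature", "c")
commit("master", "d")
commit("master", "e", extra_parents=[refs["feature"]])  # a merge has two parents

print(refs)
print(sorted(ancestors(refs["feature"])))
```

In this model a branch is nothing more than an entry in `refs`, and "history" is whatever is reachable by following parent links from it.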

banku_brougham · on April 11, 2018

But what about the deal breaker in the article: a way to follow the descendants of a commit.

pdfernhout · on April 11, 2018

Took a few seconds with a search engine on "git descendants of a commit": https://stackoverflow.com/questions/27960605/find-all-the-di...

That said, I do feel some "porcelain" git commands are poorly named and operate inconsistently -- compared to the plumbing of the acyclic graph concepts which is good but limited.

chipotle_coyote · on April 11, 2018

So, in git, to show descendants of a commit, you use

    git rev-list --all --parents | grep "^.\{40\}.*<PARENT_SHA1>.*" | awk '{print $1}'

whereas in fossil, you use

    fossil timeline after <COMMIT>

I mean, one of these looks just a little more straightforward than the other, doesn't it?

Also, a cursory test in a local git repo just now showed that command seems to print out only immediate descendants--i.e., unless that commit is the start of a branch, it's only going to tell you the single commit that comes immediately after it, not the timeline of activity that fossil will--and all it gives you is the hash of those commit(s), with no other information.

I use git myself, not fossil, but if this is something you really want in your workflow, fossil is a pretty clear win.
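If you want the full set of descendants rather than just the direct children, one option is to post-process the same `git rev-list --all --parents` output, which prints each commit followed by its parents. A rough sketch in Python; the `descendants` function and the single-letter sample ids are hypothetical, and it returns only ids, not a timeline like fossil's.

```python
from collections import defaultdict, deque


def descendants(rev_list_output, start):
    """Return all commits that have `start` as an ancestor.

    `rev_list_output` is text in the format printed by
    `git rev-list --all --parents`: one line per commit,
    "<commit> <parent> <parent> ...".
    """
    # Invert the parent links so we can walk from a commit to its children.
    children = defaultdict(list)
    for line in rev_list_output.splitlines():
        if not line.strip():
            continue
        commit, *parents = line.split()
        for parent in parents:
            children[parent].append(commit)

    # Breadth-first walk over the child links.
    found, queue = set(), deque([start])
    while queue:
        for child in children[queue.popleft()]:
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


# Made-up sample in the same shape as the real output (ids shortened).
sample = """\
e d c
d b
c b
b a
a
"""
print(sorted(descendants(sample, "b")))
```

On a real repository the input could come from `subprocess.run(["git", "rev-list", "--all", "--parents"], capture_output=True, text=True, check=True).stdout`, with `start` given as a full hash.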

dbt00 · on April 11, 2018

I mean, sure. He really wanted this feature in fossil, gave it a first class command line ui, and it's super easy.

How many other ways of looking at commits or trees are there, that are hard in git but impossible in fossil because the author didn’t feel like it?

alain_gilbert · on April 11, 2018

I don't know why they have the need to retrieve the hash of the descendant commit, but usually what I'm doing is: I use a decent visual tool and just follow the branch (sourcetree).

You could alternatively use:

    git log --graph

glandium · on April 11, 2018

    git log <COMMIT>..

usr1106 · on April 12, 2018

`git log` stays in the current branch unless you give it the `--all` option. But when you give it the `--all` option the limitation by `<COMMIT>..` no longer works. So not a solution.

glandium · on April 12, 2018

    git log --all --ancestry-path ^<COMMIT>

e12e · on April 11, 2018

You mean you didn't just read git's easy-to-follow, well-structured man pages or built-in help? /s

Half the time, when I know what I want, I both keep forgetting git flags and sub-commands - and struggle to find them in the man pages.

Like the fine:

    git diff --name-only # I list only the files that are changed. But I'm not --list-files.

OJFord · on April 11, 2018

Diff lists files without --name-only though, so the flag specifies you want _only_ the filenames of those with a diff.

e12e · on April 11, 2018

True, but it's inconsistent with eg grep.

yawaramin · on April 14, 2018

Getting only the changed filenames is a fairly specialised operation. Often in normal use you can get away with a more generic operation that comes close to what you need, but is way more common, e.g.

    git diff --stat

mabbo · on April 11, 2018

If you're new to tech or you've got a different mental model of how version control works, getting across the gap to git is a challenge.

My current team are mostly controls engineers, working on PLCs. But the software we're now working with has its configurations tracked in git. These aren't dumb people, they're quite talented, but their education wasn't in CS, and "directed acyclic graph" is not a thing they have a mental model for.

kadenshep · on April 11, 2018

No you're definitely not the only one. Git is one of the simplest and dumbest tools developers have at our disposal. People's inability to conceptualize a pretty straightforward graph is something no amount of shiny UI can ever fix.

I don't understand HN's hardon for hating Git.

carussell · on April 11, 2018

Sure, and a piece table is a simple way to represent a file's contents. But if anyone wrote a shell or a text editor that required you to directly interact with the piece table to edit a file—instead of something sane—then they'd rightfully be called out on it. It wouldn't matter how much you argued about how simple the piece table is to understand, and it wouldn't matter how right you were about how simple the piece table is to understand. It's the wrong level of abstraction to expose in the UI.

kadenshep · on April 11, 2018

The only thing Git can really fix is changing its command flags to be consistent across aliases/internal commands. That's about it. The whole point of an SCM is that graph that you want to move away from. People have asserted your claim many times but can't ever give specific things to fix about the "abstraction."

There are about 5/6 fundamental operations you do in git/hg. If that's too much then again, there's not an abstraction that is going to help you out.

carussell · on April 11, 2018

See, you're trying to foist a position on me that isn't mine—that I'm scared of the essential necessities of source control. And you act as if source control were invented with Git. Neither of these are true.

> git/hg

Mercurial was a great solution to the same problem that Git set out to tackle, virtually free of Git's foibles. The tradeoff was a few minor foibles of its own, but a much better tool. It's a fucking shame that Git managed to suck all the air out of the room, and we're left with a far, far worse industry standard.

Grue3 · on April 11, 2018

>Mercurial was a great solution to the same problem that Git set out to tackle, virtually free of Git's foibles.

No, Mercurial's design is fundamentally inferior to Git, and practically the entire history of Mercurial development is trying to catch up to what Git did right from the start. For example having ridiculous "permanent" branches -> somebody makes "bookmarks" plugin to imitate Git's lightweight branches -> now there are two ways to branch, which is confusing. No way to stash -> somebody writes a shelve plugin -> need to enable plugin for this basic functionality instead of being proper part of VCS. Editing local history is hard -> Mercurial Queues plugin -> it's still hard -> now I think they have something like "phases". In Git all of this was easy from the start.

Another simple thing. How to get the commit id of the current revision. Let's search stack overflow:

https://stackoverflow.com/questions/2485651/print-current-me...

The top answer is `hg id -i`.

    $ hg id -i
    adc56745e928

The problem is, this answer is wrong! This simple command can execute for hours on a large enough repository, and requires write privileges to the repository! Moreover, it returns only a part of the hash. There's literally no option to display the full hash.

The "correct" answer is `hg parent --template '{node}'`. Except `hg parent` is apparently deprecated, so the actual correct way is some `hg log` invocation with a lot of arguments.

baud147258 · on April 11, 2018

I would not call "hg log -r tip" a lot of arguments.

Also, on the git/hg debate, I feel I've had problems (like the stash your modification and redownload everything) more often with git than hg. I mean perhaps it says something about my capability to understand a directed acyclic graph, but hg seems less brittle when I'm using it.

renox · on April 11, 2018

I disagree with some of your comments, is git stash really essential or unneeded complexity? That's debatable, I never use it personally.

What I don't like in git is the loss of history associated with squashing commits, I would prefer having a 'summary' that would keep the full history but by default would be used like a single commit.

WorldMaker · on April 11, 2018

In git you can use merge commits as your "summary" and `--first-parent` or other DAG depth flags to `git log` (et al) to see only summaries first. From the command line you can easily add that to key aliases and not worry about it. I think that if GitHub had a better way to surface that in their UI (ie, default to `--first-parent` and have accordions or something to dive deeper), there would be a lot less squashing in git life. (Certainly, I don't believe in branch squashing.)

The DAG is already powerful enough to handle both the complicated details and the top-level summaries, it's just dumb that the UIs don't default to smarter displays.

(I find git stash essential given that `git add --interactive` is a painful UX compared to darcs and git doesn't have anything near darcs' smarts for merges when pulling/merging branches. Obviously, your mileage will vary.)
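To make the `--first-parent` idea above concrete, here is a small Python sketch of what "follow only the first parent" means on a commit graph. It is a toy model, not git's implementation; the `first_parent_history` function and the commit names are invented, and it assumes every branch was merged with a real merge commit rather than fast-forwarded.

```python
def first_parent_history(parents, tip):
    """Walk back from `tip`, following only the first parent of each commit."""
    history = []
    current = tip
    while current is not None:
        history.append(current)
        commit_parents = parents.get(current, ())
        current = commit_parents[0] if commit_parents else None
    return history


# Hypothetical history: m1 and m2 are merge commits on the main line,
# f1, f2 and f3 are the detailed commits from the merged feature branches.
parents = {
    "root": (),
    "f1": ("root",),
    "f2": ("f1",),
    "m1": ("root", "f2"),
    "f3": ("m1",),
    "m2": ("m1", "f3"),
}

print(first_parent_history(parents, "m2"))
```

The walk only ever sees the merge commits on the main line, so they act as the summaries, while the feature commits stay in the graph for anyone who wants the detail.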

kadenshep · on April 11, 2018

>you're trying to foist a position on me that isn't mine

I just said you can't give specifics on what to change, because there isn't much to change.

>And you act as if source control were invented with Git

No I'm not?

>and we're left with a far, far worse industry standard.

Yeah, we definitely should have gone with the system that can't do partial checkouts correctly or even roll things back. Branching name conflicts across remote repositories and bookmark fun! Git won for a reason, because it's good and sane at what it does.

pjmlp · on April 11, 2018

That reason was called Linus and Linux kernel development.

The master can do no wrong.

AstralStorm · on April 11, 2018

No, the reason is mercurial sucked at performance with many commits at the time, and was extra slow when merging.

Lacked a few dubious features such as merging multiple branches at the same time too.

It has improved but git is still noticeably more efficient with large repositories. (Almost straight comparison is any operation on Firefox repository vs its git port.)

pjmlp · on April 11, 2018

Mercurial has always been better than Git on Windows.

Those dubious features are so relevant to daily work that I didn't even know they existed.

AstralStorm · on April 11, 2018

Git's main target is Linux. Obviously. Performance on the truly secondary platform was not relevant and it is mostly caused by slow lstat call.

Instead Mercurial uses additional cache file which instead is slower on Linux with big repos. But happens to be faster in Windows.

And the octopus merge is used by kernel maintainers sometimes if not quite a lot. That feature is impossible to add in Mercurial as it does not allow more than two commit parents.

pjmlp · on April 11, 2018

Which reinforces the position that git should have stayed a Linux kernel specific DVCS, as the Bitkeeper replacement it is, instead of forcing its use cases on the rest of us.

kadenshep · on April 11, 2018

>Which reinforces the position that git should have stayed a Linux kernel specific DVCS

No it doesn't? People use octopus merges all the time, every single day.

pjmlp · on April 11, 2018

Well, I only get blank stares when I mention octopus merges around here.

btschaegg · on April 12, 2018

...as I get stares (okay, mostly of fear) if I point out that we need a branch in my workplace. What you can/can't do (sanely) with your tool shapes how you think about its problem space.

To emphasize that even more: Try to explain the concept of an ML-style sum type (i.e. a discriminated union in F#) to someone who only knows languages with C++-based type systems. You'll have a hard time even explaining why this is a good idea, because they will try to map it to the features they know (i.e. enums and/or inheritance hierarchies), and fail to get the upsides.
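For readers who haven't met one, a sum type is a value that is exactly one of a fixed set of cases, each carrying its own data. A loose approximation in Python, which has no native ML-style sum types; the `Shape`, `Circle` and `Rectangle` names are made up, and unlike F# nothing here forces you to handle every case.

```python
import math
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Circle:
    radius: float


@dataclass(frozen=True)
class Rectangle:
    width: float
    height: float


# A Shape is exactly one of these cases, and each case has its own fields.
Shape = Union[Circle, Rectangle]


def area(shape: Shape) -> float:
    """Handle each case of the union explicitly."""
    if isinstance(shape, Circle):
        return math.pi * shape.radius ** 2
    if isinstance(shape, Rectangle):
        return shape.width * shape.height
    raise TypeError(f"not a Shape: {shape!r}")


print(area(Rectangle(2.0, 3.0)))
```

The upside that is hard to convey is the closed set of cases: in an ML-family language the compiler can warn when `area` forgets one of them.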

pjmlp · on April 12, 2018

Easy, it is called std::variant, available since C++17.

btschaegg · on April 12, 2018

Yeah, I guess. Except that std::variant is basically a glorified C union with all the drawbacks that entails.

yawaramin · on April 14, 2018

But git didn't force its use on anybody, lol. If you need a scapegoat, try GitHub!

eesmith · on April 11, 2018

You wrote: People have asserted your claim many times but can't ever give specific things to fix about the "abstraction."

That seems like you made an assertion as well. I think there are counter-examples.

For example, the point of gitless is (quoting http://gitless.com/ ):

> Many people complain that Git is hard to use. We think the problem lies deeper than the user interface, in the concepts underlying Git. Gitless is an experiment to see what happens if you put a simple veneer on an app that changes the underlying concepts

Some commentary is at https://blog.acolyer.org/2016/10/24/whats-wrong-with-git-a-c... .

Many HN discussions as well, including https://news.ycombinator.com/item?id=6927485 .

chriswarbo · on April 11, 2018

> The whole point of an SCM is that graph that you want to move away from.

I think that's an exaggeration. For example, Darcs and Pijul aren't based around a "graph of commits" like Git is, they use sets of inter-dependent patches instead. I'm sure there are other useful ways to model DVCS too.

Whilst this is mostly irrelevant for Git users, you mentioned Mercurial so I thought I'd chime in :)

> The only thing Git can really fix is changing it's command flags to be consistent across aliases/internal commands.

I mostly agree with this: Git is widespread enough that it should mostly be kept stable; anything too drastic should be done in a separate project, either an "overlay", or a separate (possibly Git-compatible) DVCS.

kadenshep · on April 13, 2018

>For example, Darcs and Pijul aren't based around a "graph of commits" like Git is, they use sets of inter-dependent patches instead.

I said graph, I didn't say which graph. Both systems still use graphs. And still a graph you have to understand how to edit with each tool. The abstraction is still the same, and if you have problems with Git, you're going to have problems with either of those tools as well. The abstraction is not the problem, it's the developer's inability to conceptualize the model in their head.

Where is the exaggeration?

chriswarbo · on April 13, 2018

> I said graph, I didn't say which graph

You said "that graph" which, in context, I took to mean the git graph.

> Both systems still use graphs

True

> The abstraction is still the same

Not at all, since those graphs mean different things. Each makes some things easier and some things harder. For example, time is easy in git ("what did this look like last week?"). Changes are easy in Darcs ("does this conflict with that?"). Both tools allow the same sorts of things, but some are more natural than others. I think it's easy enough to use either as long as we think in its terms; learning to think in those terms may be hard. For git in particular, I think the CLI terminology doesn't help with that (e.g. "checkout").

> if you have problems with Git, you're going to have problems with either of those tools as well

Not necessarily. As a simple example, some git operations "replay" a sequence of commits (e.g. cherrypicking). I've often had sequences which introduce something then later remove it (bugs, workarounds, stubs, etc.). If there's a merge conflict during the "replay", I'll have to spend time manually reintroducing those useless changes, just so I can resume the "replay" which will remove them again.

From what I understand, in Darcs such changes would "cancel out" and not appear in the diff that we end up applying.

> Where is the exaggeration?

The idea that "uses a graph" implies "equally hard to use". The underlying datastructure != the abstraction; the semantics is much more important.

For example, the forward/back buttons of a browser can be implemented as a linked list; blockchains are also linked lists, but that doesn't mean that they're both the same abstraction, or that understanding each takes the same level of knowledge/experience/etc.

kadenshep · on April 14, 2018

>The idea that "uses a graph" implies "equally hard to use".

What I'm getting at is that if you don't understand what the graph entails, and what you need to do to the graph, any system is going to be "hard to use." This idea that things should immediately make sense without understanding what you need to do or even what you're asking the system to do, is just silly.

I've never seen someone who understands git, darcs, mercurial, pijul, etc go "I totally understand how this data is being stored but it's just so hard to use!" I don't think that can be the case, because any of the graphs those applications choose to use have some shared cross section of operations:

* add

* remove

* merge

* reorder

* push

* pull

I see people confused about the above, because they don't understand what they're really asking the system to do. I don't think any abstraction is ever going to solve that.

Git does have a problem with its command line (or at least how inconsistent and ambiguous it can sometimes be), but you really should get past it after a week or two of using it. The rest is on you. If you know what you want/need to do getting past the CLI isn't hard. People struggle with the former and so they think the latter is what's stopping them.

sctb · on April 11, 2018

Could you please remove the thorniness and condescension from your posts? It breaks the guidelines and makes discussions worse.

https://news.ycombinator.com/newsguidelines.html

kadenshep · on April 11, 2018

Can you tell the other guy to not post false and disingenuous statements? Because I'm pretty sure that is what degrades discussions, not any tone I choose to exhibit. I highly encourage you to read the thread thoroughly. If I switched my position on git we wouldn't be having this discussion, as evidenced elsewhere in the thread where people are taking a notably blunter tone than I am just with the side with popular support on this forum.

I posted a bald statement. He replied directly with snide remarks and fallacies. Look at the timestamps and edits. I have every right to be annoyed and make it known that I am annoyed in my posts when the community refused to consistently adhere to guidelines.

Enforce guidelines that keep discussions rational, not because people don't want to be accosted in public for their misleading, emotionally bloated statements.

sctb · on April 11, 2018

It doesn't matter what you're replying to. The guidelines always apply, so please follow them.

kadenshep · on April 11, 2018

>The guidelines always apply

They are currently not being applied. Is it fair for me to point out how inconsistently the posts are being treated?

https://news.ycombinator.com/newsguidelines.html:

>Don't say things you wouldn't say face-to-face. Don't be snarky.

"Every day humans make me again realize that I love my dogs, and respect my dogs, more than humans. There are exceptions but they are few and far between." [2]

>Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize.

"Which sort of doesn't matter since everyone thinks GitHub is source management." [1]

>Please don't post shallow dismissals

"You all lost out on "the most sane and powerful" as a result." [1]

"Calling it a sane and powerful source control tool is just not supported by the facts, calling "the most ..." is laughable." [1]

"Calling Git sane just makes it clear that you haven't used a sane source management system." [1]

"Lots of people are too busy/whatever to know what they are missing, maybe that's you. It's not me" [3]

>When disagreeing, please reply to the argument instead of calling names. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."

"Arguing with some random dude who thinks he knows more than me is not really fun." [3]

"Dude, troll much?" [4]

[1] https://news.ycombinator.com/item?id=16806588

[2] https://news.ycombinator.com/item?id=16807652

[3] https://news.ycombinator.com/item?id=16806877

[4] https://news.ycombinator.com/item?id=16807763

At least I had the decency to correct dishonest statements with vanilla citations in my posts, and they still got ignored.

dang · on April 11, 2018

Some of the things you quoted there are admittedly borderline, but you went much further across the nastiness line. Could you please just not do that? It isn't necessary, and it weakens whatever substantive points you have.

kadenshep · on April 11, 2018

>borderline, but you went much further across the nastiness line.

I didn't insinuate that people are worth less than pets that I bought and own (who can't even choose who to be dependent on) because they don't agree with my perspectives over a piece of software. In what context would this be an acceptable statement to make face to face or in a public setting and you go "well, you know, it's kind of okay to say!"

I'm exceedingly interested in where I crossed that line in a considerable manner because that's one distant line to cross. Next time someone says something I perceive to be incorrect, or they get on my nerves for continually disagreeing with me, I'll be sure to tell them my dog is worth more than them since that's actively being allowed and has a precedent of moderator support.

And for the record, my tone is probably "abrasive" in this post because the above actions, and the blind eye turned towards outright lies and uncalled-for statements, are aggravating. I have a feeling you're not doing anything just because of who he is, and not because what he is saying is warranted or even accurate (it's definitely not, as I demonstrated across several different posts).

I've archived this thread so people are free to review my actions and moderator actions at a later date: https://web.archive.org/web/20180411062201/https://news.ycom...

I've said my piece.

iliaznk · on April 11, 2018

Exactly, it was no longer mysterious for me after I had to prepare a written branching procedure for our team, starting from how to branch off, commit and rebase, up to doing resets and working with reflog. While doing that I thoroughly read the official docs, examined lots of examples, and created a local repo with a couple of text files to test various commands. And then it became so clear and simple! Especially the reflog – so powerful!

So, my advice is to try to write some instructions for yourself for all the common cases you might run into during your work. It will not only help you realise what you actually need from git, but also will serve as a good cheat-sheet.

mikez302 · on April 11, 2018

This looks like a good chair lift: http://gitless.com/

sammygutierrez · on April 11, 2018

Pro Git's chapter on git internals does a good job of explaining some of the things going on under the hood.

https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Po...

zwischenzug · on April 11, 2018

I wrote a book that tries to :)

https://leanpub.com/learngitthehardway

I start with simple examples and work up from there. It's based on training I've conducted at various companies, and avoids talk of Merkle trees or DAG.

TomK32 · on April 11, 2018

Like `git help`? It has everything important grouped nicely and points you to even more subcommands.

ab0aa907 · on April 11, 2018

I am not a git expert or anything, but I have helped resolve weird git issues for my teammates usually using a lot of Google and StackOverflow.

I just know 5 basic commands: pull, push, commit, branch, and merge. Never ran into any issues. People who run into issues are usually editing git log or doing something fancy with “advanced” commands. I have a feeling that these people get into trouble with git because they issue commands without really knowing what those commands do or even what they want to achieve.

flatline · on April 11, 2018

Start working in a repo with submodules and you suddenly have to understand a lot more and can get into trouble with no idea how you did it.

xxpor · on April 11, 2018

I use submodules every day, never had a problem with them. What do people complain about when it comes to them?

My mental model is basically that they're separate repos, and the main repo has a pointer to a commit in the submodule. Do your work that needs to be done for the submodule, push your changes, and then check out that new commit. Make a commit in the main repo to officially bump the submodule to that new commit. Done.

The annoying part is when you do a pull on the main repo, you have to remember to run git submodule update --recursive.
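To make the "pointer to a commit" model concrete, here is a minimal sketch. It assumes a throwaway directory and, to stay self-contained, uses a plain nested repository added with `git add` rather than a real `git submodule add` (so no .gitmodules entry is written). The names `super` and `sub` are made up for the demo. The only point is what the main repo stores: a single tree entry of mode 160000 naming a commit in the other repo.

```shell
set -eu
work="$(mktemp -d)"
cd "$work"

# Main repository (hypothetical name "super").
git init -q super
cd super
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.invalid"

# Nested repository standing in for the submodule (hypothetical name "sub").
git init -q sub
git -C sub config user.name "Example"
git -C sub config user.email "example@example.invalid"
echo "hello" > sub/file.txt
git -C sub add file.txt
git -C sub commit -q -m "first commit in sub"

# The main repo records only the nested repo's HEAD commit (a "gitlink").
git add sub 2>/dev/null
git commit -q -m "point at sub"
first_entry="$(git ls-tree HEAD sub)"
echo "$first_entry"

# Do work in the nested repo, then bump the pointer in the main repo.
echo "more" >> sub/file.txt
git -C sub commit -q -am "second commit in sub"
sub_commit="$(git -C sub rev-parse HEAD)"
git add sub
git commit -q -m "bump sub"
second_entry="$(git ls-tree HEAD sub)"
echo "$second_entry"
```

Both `ls-tree` lines should show type `commit` rather than `blob` or `tree`, and the hash in the second one should be the new commit made inside `sub`.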

friend-monoid · on April 11, 2018

Because you have the .gitmodules file, the .git/config file, the index, and .git/modules directory, each of which can get out of sync with the others.

If, for example, you add a submodule with the wrong url, then want to change the url, then you instinctively change .gitmodules. But that won't work, and it won't even nearly work.

If you add a submodule, then remove it, but not from all of those places, and try to add the submodule again (say, to a different path), then you also get weird errors.

If you add a submodule and want to move it to another directory then just no.

Oh and also one time a colleague ran into problems because he had added the repo to the index directly - with git add ..

Oh and let's talk about tracking submodule branches and how you can mess that up by entering the submodule directories and running commands...

AstralStorm · on April 11, 2018

Why do you want to bypass the tool at first glance? The git submodule command has a way to update these urls...

friend-monoid · on April 11, 2018

Heh, good question.

But seriously, the fact that there is a .gitmodules file lulls you into a sense that that file is "the configuration file". If you don't know about these other files, then it's natural to edit .gitmodules. When you make errors, fixing them is pretty hard. There is no "git submodule remove x" or "git submodule set-url" or "git submodule mv".

For example, do you know how, off the top of your head, to get an existing submodule to track a branch?

How do you think someone who does not quite understand git would do it? Even with a pretty ok understanding of git internals, you can put yourself deep in the gutter. (Case in point: if you enter the submodule directory and push head to a new commit, you can just "git add submodule-directory" to point the submodule to the new commit. But if you were to change upstream url or branch or something else in the submodule, you're screwed. That's not intuitive by a long shot.)

Edit: git submodule sync is not enough by the way... You can fuck up your repo like crazy even if you sync the two configuration files.

flatline · on April 11, 2018

Right, it’s not that hard, but there are some gotchas. The most common problem I see is the local submodule being out of sync with the remote superproject. Pushes across submodules are not atomic. Accidentally working from a detached head then trying to switch to a long out of date branch can be an issue, as can keeping multiple submodules synced to the head on the same branch. Recursive submodules are, as you mentioned, even more fun.

AstralStorm · on April 11, 2018

The same problem appears in any non monolithic project. In any SCM I know of.

Git subrepo or subtree are something of a solution, but not quite complete or easy to use.

In some other SCMs (P4 and SVN, partly hg) the answer is don't do that, which has a whole lot of its own problems.

chris_wot · on April 11, 2018

Oh, so that's what you do!

xxpor · on April 11, 2018

Heh, I probably made it sound more complicated than it really is. Just think of it as a pointer that needs to be manually updated.

_doky · on April 11, 2018

I'm comfortable with most advanced git stuff. I don't touch submodules.

_pmf_ · on April 11, 2018

> I don't touch submodules.

What's the alternative? Managing all dependencies by an external dependency manager does not exactly reduce complexity (if you're not within a closed ecosystem like Java + Maven that has a mature, de-facto standard dependency manager; npm might count, too).

It's absolutely not feasible for C++ projects; all projects that do this have horrible hacks upon hacks to fetch and mangle data and usually require gratuitous "make clean"s to untangle.

buserror · on April 11, 2018

I use git sub-trees. Actually I love the thing. They give you a 'linear' history, and allow you to merge/pull/push into their original tree, keeping the history (if you require it).

_pmf_ · on April 11, 2018

Never heard of them (well, probably in passing); will look into it. Thanks!

yawaramin · on April 14, 2018

Why isn't it feasible for C++ projects?

lilbobbytables · on April 11, 2018

Oh you can fuck right off with submodules!

greasyjon1 · on April 11, 2018

^^^ this comment is supposed to be humor, not douchebaggery, by the way. Easy on the downvotes.

nurettin · on April 11, 2018

I never had any problems the past 6 years I've been using Git professionally. But then someone asked me what to do when Git prevents you from changing branches and, not knowing they had not staged their changes, I told them to stash or commit. They stashed and the changes were gone.

My point is, while your basic commands do the work, your habits and knowledge keep you from losing code like this without you knowing.

bonzini · on April 11, 2018

Why were the changes gone? Why couldn't they "git stash pop"?

nurettin · on April 11, 2018

Unstaged or untracked changes were gone. They couldn't get those back after pop. I can't remember which.

amalag · on April 11, 2018

Untracked files are not stashed, that is true.

bonzini · on April 12, 2018

They're also not deleted by "git stash" though.
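A quick way to check this for yourself, as a sketch in a throwaway repository (the file names are made up): a plain `git stash` puts away the change to the tracked file and leaves the untracked file sitting on disk, while `git stash -u` (`--include-untracked`) puts the new file away as well.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "initial"

echo "v2, edited" > tracked.txt      # unstaged change to a tracked file
echo "scratch" > untracked.txt       # new file git has never been told about

# Plain stash: the tracked change is put away, the untracked file stays on disk.
git stash -q
tracked_after_stash="$(cat tracked.txt)"
if [ -e untracked.txt ]; then untracked_after_stash="still on disk"; else untracked_after_stash="gone"; fi
git stash pop -q

# With -u the new file is stashed too, and pop restores it.
git stash -q -u
if [ -e untracked.txt ]; then untracked_after_u="still on disk"; else untracked_after_u="stashed"; fi
git stash pop -q

echo "$tracked_after_stash / $untracked_after_stash / $untracked_after_u"
```

In both cases `git stash pop` brings everything back, so neither variant on its own explains changes being lost.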

TomK32 · on April 11, 2018

Is no one reading git's help pages before running a command the first time?

Not even once have I lost code I worked on with git. stash is a reliable companion across branches and large timespans.

mrweasel · on April 11, 2018

I do like Git, most of the time, but really, not a single problem, in six years?

When using Git daily we never really did anything complicated, just a few feature branches per developer, commit, push, pull-request, merge. Basic stuff. We had Git crap out all the time. Never something that couldn't be fixed, but sometimes the fix was: copy your changes somewhere else, nuke your local repo, clone, copy changes in and then commit and continue as normal.

dbt00 · on April 11, 2018

I’ve been using git since 2007 and never ever even wanted to try nuking a checkout and starting over to recover from anything, much less did so. (Did have a nameless terrible Java ide plug-in do it for me once.)

Ace17 · on April 11, 2018

So you're not using checkout, reset, and diff?

ab0aa907 · on April 11, 2018

Good point, forgot about checkout, diff, clone, blame, add, rm, rebase, init, and probably a few more.

Haven't used reset personally, though, except when trying to fix someone's repo.

e12e · on April 11, 2018

Or fetch?

OJFord · on April 11, 2018

I think a lot of people ignore fetch and only ever pull.

e12e · on April 11, 2018

I think it's one of the most important sources of my cognitive dissonance around git. It strengthens the illusion that a working directory is somehow related to a git store, which it really isn't.

You have a working directory/checkout - that can be: identical (apart from ignored files) to some version in git; or different.

If it's different ; some or all changes can be marked for storing in the git repo - most commonly as a new commit.

It's a bit unfortunate that the repo typically is inside your work directory/checkout - under '.git' along with some files like hooks, that are not in the repo at all...

TomK32 · on April 11, 2018

But you'd have to stash before pull. At least with my config where a pull will rebase automatically.

OJFord · on April 11, 2018

I use `git config pull.rebase true` too, but that doesn't mean you _have_ to stash first, just as rebase manually wouldn't - depends if there's a conflict.

Same is true of merge-based pull.

e12e · on April 11, 2018

Except for saving some typing, is there any benefit to stash over local branches?

In other words, shouldn't git just fix ux for branches and rip out stash?

OJFord · on April 25, 2018

So "some typing" would be:

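    # git stash:
    prev_ref="$(git rev-parse --abbrev-ref HEAD)"  # remember the current branch
    git checkout -b wip-stash                      # park the work on a throwaway branch
    git add .
    git commit -m 'wip stuff'
    git checkout "$prev_ref"                       # back to a clean working tree

    # git stash pop:
    git checkout wip-stash -- .                    # copy the parked files back
    git branch -D wip-stash                        # delete the throwaway branch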

It's quite a considerable saving. I suppose by "fix UX" you mean make it so the saving would be less anyway, but I think really they're just conceptually different:

    - branch: pointer to a line of history, i.e. a commit and inherently its ancestors
    - stash: a single commit-like dump of patches

If stashing disappeared from git tomorrow, I think I'd use orphan commits rather than branches to replace it.
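The "commit-like" part can be inspected directly. A small sketch, assuming a throwaway repository with one tracked file (names are illustrative): after `git stash`, the ref `refs/stash` names an object of type commit whose first parent is the commit you were on, and no extra branch shows up under `refs/heads`.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "initial"

echo "v2, work in progress" > notes.txt
git stash -q

stash_type="$(git cat-file -t refs/stash)"       # kind of object the stash is
stash_parent="$(git rev-parse 'refs/stash^1')"   # its first parent
head_commit="$(git rev-parse HEAD)"
branch_count="$(git for-each-ref refs/heads | wc -l | tr -d ' ')"

echo "stash is a $stash_type, branches: $branch_count"
```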

_pmf_ · on April 11, 2018

pull == "just fuck my shit up"

whym · on April 11, 2018

EDIT: I guess I misread it. On reflection what I wrote really doesn't make sense so let me retract.

jhall1468 · on April 11, 2018

`cherry-pick` is just plucking a single commit and adding it to the commit history of the current branch, and `rebase` is what civilized people use when they don't want merge commits plaguing their entire code base.

YZF · on April 11, 2018

merge is what civilized people who care about getting history and context in their repository use ;) ... I worked a lot in git using both rebase and merge workflows and I'll be darned if I understand the fear of the merge commit ... If work happened in parallel, which it often does, we have a way of capturing that so we can see things in a logical order ...

jhall1468 · on April 11, 2018

Polluting the master repo with a bunch of irrelevant commits isn't giving you context, it's giving you pollution. There's nothing to fear about merge commits. It's about wasting everyone's time by adding your 9 commits to fix a single bug to the history. I work on teams, and we care about tasks. The fact that your task took you 9 commits is irrelevant to me. What is relevant is the commit that shows you completed the task.

sameerds · on April 11, 2018

It's not really a fear of the merge commit. In a massively collaborative project, almost everything is happening in parallel, and most of that history is not important. The merge makes sense when there is an "official" branch in the project, with a separate effort spent on it. It's likely that people working on that branch rebase within the branch when collaborating, and then merge the branch as a whole when it is ready to join the mainstream.

buserror · on April 11, 2018

Ah, you can learn the beauty of merge AND rebase at the same time then...

Here, to 'present' feature branches, we take a feature development branch with all the associated crud... Once it's ready to merge, the dev checks out a new 'please merge me' branch, resets (or rebase -i --autosquash) to the original head, and re-lays all the changes as a set of 'public' commits to the subsystems, with proper headings, documentation etc.

At the end, he has the exact same code as the dirty branch, but clean... So he merges --no-ff the dirty branch in (no conflicts, same code!) and then the maintainer can merge --no-ff that nice, clean branch in the trunk/master.

What it gives us is a real, true history of the development (the dirty branch is kept) -- and a nice clean set of commits that is easy to review/push (the clean branch).
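A rough sketch of that flow, under some assumptions: the branch names `feature-dirty` and `feature-clean` are invented, and `git reset --soft` followed by a single commit stands in for re-laying the work as a series of public commits (a real clean branch would usually have several).

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.invalid"

echo "base" > app.txt
git add app.txt
git commit -q -m "base"

# The development branch with all the crud.
git checkout -q -b feature-dirty
echo "step 1" >> app.txt; git commit -q -am "wip"
echo "step 2" >> app.txt; git commit -q -am "fix typo"
echo "step 3" >> app.txt; git commit -q -am "wip again"

# 'Please merge me' branch: same files, history rewound to the original head.
git checkout -q -b feature-clean
git reset -q --soft main
git commit -q -m "Add steps 1-3 to app.txt"

# Same code on both sides, so merging the dirty branch in cannot conflict.
git merge -q --no-ff -m "Merge feature-dirty (development history)" feature-dirty

# The maintainer merges the clean branch into the trunk.
git checkout -q main
git merge -q --no-ff -m "Merge feature-clean" feature-clean

git log --oneline --first-parent main
```

`git log --first-parent main` then shows only the base commit and the final merge, while the messy commits stay reachable through the merged dirty branch.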

salt-licker · on April 11, 2018

Sometimes I want to take a subset of the commits out of a coworker's merge on staging to push to production, and then put all non-pushed commits on top of the production branch to form a new staging branch. I find having a linear history with no merges helpful for reasoning about conflict resolution during this process. What advantages do merged timelines give in this context?

YZF · on April 16, 2018

What I like about merges is that they show you how the conflicts were resolved. You can see the two versions and the resolution, and you can validate it was resolved properly. With a rebase workflow you see the resolutions as if nothing else existed; you can't tell the difference between an intentional change and a bad resolution...

jsjohnst · on April 11, 2018

> merge is a what civilized people who care about getting history and context in their repository use

> I'll be darned if I understand the fear of the merge commit

I apologize in advance for not adding much substance in this reply, but I agree too much to just upvote alone.

johnx123-up · on April 11, 2018

Just curious... are you working in a team using git workflow?

ab0aa907 · on April 11, 2018

Yes, my direct team is small, 4 devs, but the main repo we work on is used by 100+ devs. We use git workflow (new branch for each feature) for the main repo and github style workflow (clone and then submit PR) for some other repos.

irrational · on April 11, 2018

The number 1 reason my team has not moved from Subversion to Git is we can't decide what branching model to use. Use flow, don't use flow, use this model, use that model, no, only a moron would use that model, use this one instead. Rebase, don't rebase, etc. No doubt people will say that it all depends on the project/team/environment/etc., but nobody ever says "If your project/team/environment/etc. look like this, then use this model." So we keep on using Subversion and figure that someday we will run across information that convinces us that it is the one true branching model.

naasking · on April 11, 2018

I have another solution: just switch to mercurial. I switched some big projects to mercurial from svn many years ago. Migration was painless, tooling was similar but better, the interface is simpler than git, and I haven't regretted it once.

prepend · on April 11, 2018

This is the path I took for a few projects years ago when Google Code didn’t support git.

Switched to mercurial from svn and workflow was painless for the team. Interestingly, we slowly started adopting more distributed techniques like developer merges being common. With svn, I think I was the only one who could merge and it would be rare and added product risk.

Then after about a year of mercurial we switched to git and our brains had adapted. Our team was small, 5-10 people.

Somewhat relatedly, in 2002, I worked in a large team of 75 people or so with a large codebase of a few hundred thousand lines of active dev. It used Rational ClearCase and had “big merges” that happened once or twice a release, with thousands of files requiring reconciliation. There was a team who did this so it was annoying to dev in, but largely I didn’t care.

Company went through layoffs and the team was down to one. He quit, the company couldn’t merge, so couldn’t release new software versions.

There was a big crisis so they went to the architects and pulled a few out of dev work. It turns out I was the one who could figure it out and dumb enough to admit it.

That sucked. It took a few weeks to sort out and modify our dev process to make merges easy and common. But it was not fun. Upside is we ended up not having any “non-programmer” op/configuration management people since the laid-off/quit team were ClearCase users, who didn’t code.

Moral: don’t let people know you can do hard, mundane tasks.

buvanshak · on April 11, 2018

I have converted all my mercurial repos to git and I have forgotten all mercurial now. It helps me feel less pain when I am forced to work in Git....

healsdata · on April 11, 2018

> but nobody ever says "If your project/team/environment/etc. look like this, then use this model."

Honestly, it's because a lot of it comes down to preference and what value you gain from using version control. It is very much like code style standards -- it doesn't matter what is in the standard so much as your teammates all using the same one.

If part of the blocker for your team is that no one is experienced enough with git to have a strong opinion, I'd be happy to brainstorm with you for an hour to learn about your current process and offer a tailored opinion.

whatshisface · on April 11, 2018

Why not replicate whatever you are doing in Subversion in Git? You'll still be able to take advantage of the better merging algorithms, while maintaining whatever political momentum seems to be driving the team's decisions.

MsMowz · on April 11, 2018

It really, really doesn't matter. That's one great thing about a distributed SCM.

petre · on April 11, 2018

We moved from SVN to Fossil and it has worked out great for us. The other option was Mercurial but it required Python.

sopooneo · on April 11, 2018

If it is important to switch to Git, I suggest a technical leader, imbued with authority from management, make those decisions and just do it. However, I don't necessarily think a team should switch away from Subversion if it's working for them.

f1notformula1 · on April 11, 2018

> everyone understands a different part of git and has slightly different ideas of how things should be done

This was a big problem that bugged me too, so for every team I've worked with I've created a few scripts for the team's most common version control operations.

Most devs, including me, are pretty lazy so they'd all rather run this script than go to Stack Overflow to figure out git arcana.

This helps standardize conventions too: Feature branches/linear DAGs/topic branches/dev branches/prod branches/whatever weird thing a team does they all just do that using the script so it's standardized.

specialist · on April 11, 2018

“rebase” is just “pull before push”, right?

While I have no opinion on git, I can’t abide by all the precious chaotic mutant misuse, like git-flow.

I’d happily accept a subset of primitives, if only to disallow bad ideas. Kinda like Git vs SVN, C/C++ vs Java, flamethrower vs peanut butter.

aidenn0 · on April 11, 2018

Rebase is "rewind local changes" "pull" "replay local changes"

Basically it makes it so that all of the local-only commits are sequenced after any remote changes that you have not seen yet.

[edit]

YZF is correct. In the context of pulling (i.e. "git pull --rebase") my description is correct. However in general rebasing branch X to Y that diverge from commit C is:

rewind branch Y to commit C; call the old tip of Y Y'

play all commits from C -> X on Y

play all commits from C -> Y' to branch Y.
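The general case can be tried in a scratch repository. This is only a sketch: the branch names `x` and `y` follow the description above, and the file names and commit messages are made up. Running `git rebase x` while on `y` gives the same end state as the three steps: `y` ends up as C, then X's commits, then copies of Y's old commits.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.invalid"

# Commit C, the point where the two branches diverge.
echo "c" > c.txt
git add c.txt
git commit -q -m "C"

# Branch x gets one commit on top of C.
git checkout -q -b x
echo "x" > x.txt
git add x.txt
git commit -q -m "X1"

# Branch y also starts at C and gets its own commit.
git checkout -q -b y main
echo "y" > y.txt
git add y.txt
git commit -q -m "Y1"
old_y="$(git rev-parse y)"   # this is Y' in the description

# Replay y's commits on top of x.
git rebase -q x
new_y="$(git rev-parse y)"

git log --format=%s y
```

The log lists Y1, X1, C in that order, and the Y1 commit has a new hash because it was re-created on a different parent.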

YZF · on April 11, 2018

You can rebase between two local branches. The rebase operation has nothing to do with pull or remote vs. local.

aidenn0 · on April 11, 2018

Yes. I thought we were in the context of git pull --rebase...

e12e · on April 11, 2018

"pull" might be the first thing I'd throw out, if thought there was any hope of fixing git ux. Then add a working merge --dry-run #do i have conflicts?.

aidenn0 · on April 11, 2018

I think a default of --ff-only would be fine for pull. This is great for when I'm merely a consumer of a project, and will never silently perform a merge or rebase.

specialist · on April 11, 2018

Thanks (all) for the clarifications.

When explaining to others, I should probably say 'pull, reapply, then push'.

Perhaps 'rebranch' is a better word choice than 'rebase', to conceptually more closely match what's actually happening under the hood.

joesb · on April 11, 2018

"rebase" is not just "pull before push", though.

It's pull then rewrite all your personal commits to be based on the latest tip from that pull.

YZF · on April 11, 2018

rebase is simply(tm) replaying a sequence of commits (or diffs or patches for that matter) over some arbitrary base, hence re-base ...

TomK32 · on April 11, 2018

rebase can do a lot more. Try `git rebase -i` to squash smaller commits, edit the commit msg, or even drop a commit before you push it to your colleagues.

Last time our devop did 20 commits to get something on elasticbeanstalk right, I squashed it all into just one clean commit that got merged into master branch.

It will help you to commit more often without worry until the moment you have to hand in your work.
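Since the `rebase -i` editor is awkward to show in a script, here is a sketch of the same end result using a different technique: `git reset --soft` back to the base, then one new commit. The file name and messages are invented, and this only covers the "squash everything into one" case, not reordering or dropping commits.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

echo "0" > deploy.cfg
git add deploy.cfg
git commit -q -m "base"

# Several small attempts at getting something right.
for n in 1 2 3; do
  echo "$n" >> deploy.cfg
  git commit -q -am "attempt $n"
done

# Move the branch back three commits but keep the final content staged...
git reset -q --soft HEAD~3
# ...and record it as one clean commit.
git commit -q -m "Configure deployment"

git log --oneline
```

Like any squash, this rewrites history, so it is best done before the commits are pushed anywhere shared.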

fulafel · on April 11, 2018

Rebase is a controversial history altering operation and makes it easy to paint yourself into a corner and get weird error messages or wrong results. It's very different from pull/merge.

y2kenny · on April 11, 2018

History altering is only controversial on things that are published. There is nothing wrong with reordering, combining or splitting your local commits to give more clarity to what you are doing. Keeping this in mind will give you the freedom to commit frequently.

This confusion happens because many popular SCMs historically have the "commit" and "push" operation in a single step. Git keeps them separate.

fulafel · on April 11, 2018

There is no tracking by git on what is published, so it's easy to make the mistake of rebasing things that are published and shared by others. Then you will have a bad time later when you try to sync with others, possibly days later.

y2kenny · on April 11, 2018

Um... git kind of does with remote tracking branches. You can also make it very obvious by your workflow? If you use local feature branches (which you should for juggling between development tasks, etc.), what you are working on vs what's upstreamed should be pretty clear. Sounds like you are not using local branches.

Not using local branches is another confusion caused by the perspective of historical/traditional SCMs (people thinking branches are the domain of a centralized server and are outside of their control.)
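As a sketch of how the remote tracking branch shows what is published: with an upstream configured, `@{upstream}..HEAD` lists exactly the commits that exist only locally. The setup below is an assumption for the demo, with a local bare repository standing in for the shared remote and a branch called `feature`.

```shell
set -eu
work="$(mktemp -d)"
cd "$work"

# A bare repository standing in for the shared remote (hypothetical).
git init -q --bare shared.git

git init -q dev
cd dev
git symbolic-ref HEAD refs/heads/feature
git config user.name "Example"
git config user.email "example@example.invalid"
git remote add origin "$work/shared.git"

echo "one" > a.txt
git add a.txt
git commit -q -m "published work"
git push -q -u origin feature       # sets origin/feature as the upstream

echo "two" >> a.txt
git commit -q -am "local only"

# Commits on this branch that the remote tracking branch does not have yet.
unpublished="$(git rev-list --count '@{upstream}..HEAD')"
git log --oneline '@{upstream}..HEAD'
```

Anything in that list can be reordered or squashed without affecting anyone else; anything outside it has already been pushed.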

fulafel · on April 11, 2018

Often you want to push changes to a remote, but not yet merge or PR them to upstream.

Keeping "local feature branches" just on your dev machine is bad for many many reasons:

- you want to encourage low barrier cooperation in your team -> sharing changes

- you want changes to the CI pipeline early so the potentially slow testing machinery works in parallel with the developer

- you want to keep the team up to date on what changes you make

- you don't want to lose work if the machine/OS dies, or the developer leaves/becomes sick/goes on a 4 week vacation during which they forget their disk crypto password

So, in practice you can try to use rebase opportunistically, when by chance your WIP work is still unpushed because the change was only made very recently. This is error prone. Or you can rebase published branches explicitly, by destroying the original branches in the PR merge phase. But all this is a big bother if the purpose is to just beautify history and at the same time hide the real trial and error that went into making the changes.

yawaramin · on April 14, 2018

Did you notice that y2kenny was talking about how, if you use local feature branches, then the remote tracking branches make it really clear what's been published vs not? The implicit meaning is that we should use local feature branches but also publish them to the repo while we're working on them.

But maybe to you, 'publish' means 'publish to master'? In that case I can assure you, they are not necessarily the same thing. I regularly work on a local feature branch, publish that branch to the shared repo, rebase it on top of master, then force-push to the shared tracking branch. When I'm done I merge it into master and don't rebase master on top of anything.

y2kenny · on April 11, 2018

who said anything about not publishing?

fulafel · on April 12, 2018

I'm not sure if you are being serious? The answer is that published advice on rebase overwhelmingly warns against rebasing published code, and for good reason.

y2kenny · on April 12, 2018

Who said anything about rebasing published code?

TomK32 · on April 11, 2018

I LOVE rebase but when I run into merge conflicts I'd rather `rebase --abort` and leave that merge commit as it is. But those instances are rare and having a merged branch's commits nice and compact in the log makes me happy every time.

emmelaich · on April 11, 2018

Nobody understands SVN or CVS either.

I discovered this supporting SVN servers for a whole bunch of developers.

muxator · on April 12, 2018

I always found the mercurial ui super easy.

The error messages are clearer, it is multiplatform, all the advanced functionalities are there, a nice graphic interface exists.

I really do not understand why git won, apart from github.

collyw · on April 11, 2018

What I find ironic is that github is massively popular as a central way to use a distributed version control system. The distributed nature only adds to the complexity and I am sure it is only used by a fraction of git users.

yawaramin · on April 14, 2018

Yes...? What's surprising about using a central repo to collaborate? There needs to be a single source of truth for a coherent project, otherwise you're just going to have chaos.

The distributed nature of git led to the simple and secure contribution model of everyone working on their own repos and not needing to give write access to anyone else. This pretty directly led to an explosion of open source software.

sathishvj · on April 11, 2018

Is there any really good tutorial on git that teaches the internal model? Ideally, it would illustrate each command and show the before and after of the internal objects.

_e21c · on April 11, 2018

https://learngitbranching.js.org/ is the best guide I've seen. It shows you the complete commit graph and all refs on that graph, and updates the graph when you type in commands. It covers and displays workflows involving remotes as well.

If you don't want the tutorial, you can go straight to the sandbox here: https://learngitbranching.js.org/?NODEMO

cup-of-tea · on April 11, 2018

Indeed. When the article said "younger developers only know git" I immediately thought, no, they don't know anything. These people don't even know what a DAG is. Git was made for people who know these concepts. I've tried explaining git to people and they just don't understand. They just don't.

What's annoying is that git is just expected knowledge these days and having a github account is enough to claim it. There's not a good way to sell the fact that you're a bit more into it than that.

I've even said to git "experts" that branches should really be called refs and their eyes glaze over. It's difficult for me to understand what git is in their heads.

y4mi · on April 11, 2018

Why would you call branches refs? They don't point to specific files or commits.

I know you can target commits through them - which utilizes the ref syntax... But they're still not really referencing anything directly.

They're completely arbitrary and are just a feature to improve git's workflow.

aequitas · on April 11, 2018

I started naming branches 'post-its', as to me that's what they are: labels you place on the real 'branches' (the commit tree). You can take them off easily, move them, discard them, whatever you want. They are just volatile.

cup-of-tea · on April 11, 2018

I should have said pointers. I didn't mean to overload existing git terminology. My point was just that they are pointers/references to some commit.

rantanplan · on April 11, 2018

> They don't point to specific files or commits.

A branch points to the tip (last commit) of a particular timeline.

friend-monoid · on April 11, 2018

But they are also called symbolic refs in git terminology...

ethomson · on April 11, 2018

A symbolic ref is a ref that points to another ref instead of a ref that points to a commit. `HEAD` is a symbolic ref. (It should be your only symbolic ref.)

friend-monoid · on April 11, 2018

Unless it is detached. :)

y4mi · on April 11, 2018

That term makes sense.

But just as you wouldn't call a symlink to a zip archive a zip file itself, you also shouldn't call a branch a ref.

friend-monoid · on April 11, 2018

Hrm, but a ref is a file containing a hash, right? So if the hash is equivalent to the file, then surely a ref is equivalent to a symlink? A symbolic ref, in turn, should be a symlink to a symlink... Or something like that...

y4mi · on April 11, 2018

A ref points to an object. That object doesn't change unless the hashing algo was tricked.

A branch points to anything you want it to point to. It can be any ref you want and can be changed at will.

friend-monoid · on April 11, 2018

sha1 - object (e.g. 5a480efb...)

file with sha1 - ref (e.g. master)

file with ref - symbolic ref (e.g. HEAD)

right? Seeing as you can git update-ref branches, but you need to git symbolic-ref HEAD.
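The three layers described in this subthread (hash, ref, symbolic ref) can be sketched as a toy model in Python. This is only an illustration, not how git is implemented: the hashes `a1` and `b2`, the dictionaries, and the `resolve` helper are all made up. The one detail taken from git itself is that an attached `HEAD` stores the text `ref: ` followed by the name of another ref.

```python
# Toy model only: the hashes and names below are invented for illustration.
objects = {
    "a1": {"type": "commit", "parents": []},
    "b2": {"type": "commit", "parents": ["a1"]},
}

# A ref is a name mapped to an object hash (like a file containing a sha1).
refs = {"refs/heads/master": "b2"}

# A symbolic ref stores the name of another ref instead of a hash.
head = "ref: refs/heads/master"


def resolve(value, refs):
    """Follow symbolic refs until a plain object hash is reached."""
    seen = set()
    while value.startswith("ref: "):
        name = value[len("ref: "):]
        if name in seen:
            raise ValueError("symbolic ref loop")
        seen.add(name)
        value = refs[name]
    return value


attached = resolve(head, refs)       # HEAD -> master -> b2
refs["refs/heads/master"] = "a1"     # moving the branch changes what HEAD resolves to
after_move = resolve(head, refs)
detached = resolve("a1", refs)       # a detached HEAD holds a hash directly
```

In this model the object behind a hash never changes, the branch is a movable name for a hash, and `HEAD` is one more level of indirection unless it is detached.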

rantanplan · on April 11, 2018

But it is a ref. It's an alias for the last commit of a particular timeline, as I said above.

cup-of-tea · on April 11, 2018

So would you rather say a branch is a commit?

y4mi · on April 11, 2018

A branch is a pointer or symlink if you will.

baud147258 · on April 11, 2018

> It's difficult for me to understand what git is in their heads.

In that case, they were thinking the git was you.

gaius · on April 11, 2018

Git is the solution to the problem of doing distributed development on the Linux kernel. People who aren’t doing that, I wonder if they’re entirely clear in their own minds why they use it. I’m certainly not... other than that it’s just the default choice these days, the path of least resistance...

neals · on April 10, 2018

I'm a big fan of Fossil myself. But the SQLite people have something that I don't really have within the teams I work in: the authority to dare to speak out against Git and not be laughed away like a hipster who is just trying to be different.

cies · on April 10, 2018

Hipster source code management? See this Rust project:

https://pijul.org

https://pijul.org/manual/why_pijul.html

A bit like DARCS (also very hipster, in Haskell and has some math behind it), but then fast.

https://pijul.org/model/#efficient-algorithms

https://pijul.org/faq/

Oh and it uses a cool hi-perf storage lib (also in Rust, by the same devs):

https://nest.pijul.com/pijul_org/sanakirja

hinkley · on April 11, 2018

    Pijul lets you describe your edits after you’ve made them, instead of beforehand.

Pardon my French, but about fuckin time.

On a big product, forensics matter. Not day to day, but often enough and if your metadata is rotten then you’re left with the oral history of the project as your only guide. And even that may not exist, depending on project structure.

pyre · on April 11, 2018

Git has something similar called git-notes [1], but at the time I tried using it, it was really early days. No idea how support is working for that now. You could also make an annotated tag, which has its own "commit message", but it will show up with all other tags.

[1] https://git-scm.com/docs/git-notes

hinkley · on April 11, 2018

Git notes is interesting but it’s a manual process.

When selecting technology I look for “a rising tide lifts all boats” situations and opt-in tools have limitations in that regard.

There’s a big gap between ‘can do’ and ‘will do’ and I feel like we downplay that frequently in our industry, and to our own peril.

Izkata · on April 11, 2018

Standalone, that sounds like a commit message - which you make after editing the code anyway. (And possibly tweak/update with git rebase before pushing)

In that section's context, it sounds like naming a branch after having already started on it. In which case, that seems to me the tiniest bit less useful than git's ability to rename branches (git branch -m oldname newname).

What am I missing?

peatmoss · on April 10, 2018

Darcs was also prior art (certainly the first DVCS I ever encountered), which makes me more inclined to call them innovative than hipster :-)

cies · on April 10, 2018

That would make Haskell innovative instead of hipster as well.. :)

peatmoss · on April 10, 2018

Woah woah woah, let’s not get ahead of ourselves! :-)

atombender · on April 11, 2018

I haven't used Pijul, but I did use Darcs for several large production projects back when it was still a thing.

Darcs was magical -- in both senses of the word. It was incredible to see it figure out which patches depended on which, allowing a fluid exchange of changes between branches in a way that quickly becomes a nightmare in git. But it was also magical in that nobody really understood the internals. Not in the sense of git where the underlying data model is pretty simple, and the "version control" aspect is a (thin!) UX veneer on top, but in the sense that it was like quantum physics. When something went wrong, it was almost always impossible to fix. And with Darcs, things did go wrong, because it had bugs, specifically a certain dreaded "exponential conflict" edge case where, if it encountered an identical line change in two patches from different branches (or something like that, it's been more than 10 years), computation time went through the roof and the merge command almost never finished. At several points we had to start history from scratch to avoid spending an entire day fighting the conflict problem. Another thing with Darcs (and presumably Pijul) was that since it tracks patch inter-dependencies, you can rarely cherry-pick individual patches -- pulling out one patch tends to pull with it a whole string of related patches, all connected. Which is often what you want (git just fails horribly in such cases), but sometimes you do want to "forcibly cherry-pick" and manually fix, change identity be damned. I don't know if Pijul supports this.

It looks like Pijul fixes the conflict problem, but it still seems to keep the "quantum theory of patches" that requires an above-average developer to understand. If it has no bugs, then maybe the problem is moot, but in our industry, transparent, "self-repairable" tech seems to win in the long run over the esoteric, opaque and magical.

That said, it's clear that Darcs/Pijul has a vastly better UX, which I'm all for. Git's data model works remarkably well for what it does, but it's always been obvious to me that its "record snapshots and try to make sense of them after the fact" philosophy is a bit flawed. The article mentions branch history. And rename detection doesn't work well with how most people work, for example; it's a clever kind of lazy evaluation, but probably designed for Linux kernel devs, so not clever enough. Darcs had a patch type specifically for renames, and it worked very well.

Another thing I wish version control systems had was what you might call a high-level changelog. It would let you group and annotate commits after the fact, but without changing them. For example, you might want to group a bunch of patches as a single "feature" commit. Then you could make a "release" group that groups a bunch of feature commits. In other words, several levels of nesting, with each commit containing child commits and so on. Viewing the log should show only the highest-level groups, with the option to expand them visually so you can see what they contain. You should be able to group things like this after the fact without changing commit order, and you should be able to annotate the log (e.g. add more information to a commit message) without mutating the underlying patches. Git was on the verge of venturing into this territory with its (now discouraged) "merge commits" -- a high-level commit that represents a single logical merge but encapsulates multiple physical patches -- but that didn't go anywhere. The nice thing about a high-level history like this is that you could use it to drive release notes and change logs, and it would greatly aid in project management and issue tracking, because you could manage entire sets of commits by what issues or pull requests or milestones or whatever they relate to.
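As a rough sketch of what such a nested changelog could look like as a data structure, here is a hypothetical Python model. Nothing in it corresponds to a feature of git, Darcs, or Pijul: the `Group` class, the `render` function, and the commit ids `c1` to `c3` are all invented for illustration. The groups live beside the commits and only refer to them by id, so the commits themselves are never modified.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical annotation layered over existing commits."""
    title: str
    # Children are nested Group objects or plain commit ids (strings).
    children: list = field(default_factory=list)


def render(node, depth=0, max_depth=0):
    """Return log lines, expanding groups down to max_depth levels."""
    if isinstance(node, str):
        return ["  " * depth + node]
    lines = ["  " * depth + node.title]
    if depth < max_depth:
        for child in node.children:
            lines.extend(render(child, depth + 1, max_depth))
    return lines


release = Group("Release 1.1", [
    Group("Feature: search", ["c1", "c2"]),
    Group("Feature: export", ["c3"]),
])

collapsed = render(release)                # highest-level groups only
one_level = render(release, max_depth=1)   # release plus its features
expanded = render(release, max_depth=2)    # down to individual commits
```

Because grouping is stored separately from the commits, regrouping or re-annotating after the fact does not change commit order or content.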

WorldMaker · on April 11, 2018

> It looks like Pijul fixes the conflict problem, but it still seems to keep the "quantum theory of patches" that requires an above-average developer to understand. If it has no bugs, then maybe the problem is moot, but in our industry, transparent, "self-repairable" tech seems to win in the long run over the esoteric, opaque and magical.

The patch theory is complex, but it isn't that complex. Especially since there are plenty of alternative implementations out there of Operational Transforms (OTs) and Conflict-free Replicated Data Types (CRDTs), its relatives/cousins/descendants. In theory, any developer that can grok a blockchain or a Redis cache should be able to grok the patch theory.

Darcs suffered much more from being written in Haskell, I think, than from the actual complexity of its patch theory.

Pijul being written primarily in Rust maybe has a chance of also getting over that hump a bit easier than Darcs had. Though now it also has the uphill climb of competing against git's inertia.

> Git was on the verge of venturing into this territory with its (now discouraged) "merge commits"

Discouraged only by people who don't know that `--first-parent` exists as a useful argument to `git log` and other commands. The useful thing about a DAG is you can very easily slice it to create arbitrary "straight line" views. You don't have to constantly smash and squash history to artificially force your DAG into a straight line.
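To illustrate the "straight line" slicing idea, here is a minimal Python sketch of a first-parent walk over a made-up commit graph. It assumes the convention that a merge commit lists the mainline parent first and the merged branch second; the commit names and the `first_parent_history` helper are hypothetical, and this is not git's own code.

```python
# Hypothetical commit graph: each commit maps to its parents, first parent first.
parents = {
    "A": [],
    "B": ["A"],
    "f1": ["B"],        # feature-branch commit
    "f2": ["f1"],       # feature-branch commit
    "M": ["B", "f2"],   # merge commit: mainline parent first, merged branch second
    "C": ["M"],
}


def first_parent_history(tip, parents):
    """Walk back from tip, following only the first parent of each commit."""
    history = []
    current = tip
    while current is not None:
        history.append(current)
        commit_parents = parents[current]
        current = commit_parents[0] if commit_parents else None
    return history


mainline = first_parent_history("C", parents)
```

The feature-branch commits stay in the graph, but the first-parent view skips them and shows the merge `M` as a single step on the mainline.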

sjellis · on April 11, 2018

"....in our industry, transparent, "self-repairable" tech seems to win in the long run over the esoteric, opaque and magical."

Quoted for truth.

FridgeSeal · on April 11, 2018

This is super, super cool - thanks for sharing!

cies · on April 11, 2018

Most welcome. It's one of those projects that I keep half an eye on because it is just waaay too fantastic while not being unfeasible.

MichaelRenor · on April 10, 2018

Git is a use-case that is excellent for 90% of development. Sqlite is just an example where the use-case isn't necessarily ideal, not an indicator that it's "better" than git.

bch · on April 10, 2018

I’m a fossil fan.

I’d say that git is fine for 90% of development (or some arbitrarily large number), but so is fossil. I don’t even think that SQLite-in-git would necessarily be a deal-breaker that couldn’t be worked around (drh ‘sqlite can chime in here). The whole space (from personal projects to global collaboration) is diverse enough that there’s no talking about “better” without qualifying the situation, either.

Fossil is good for a large subset of work that can benefit from source control management, regardless of git.

What git definitely has is

1) scalability, which is probably of no consequence for 99% of the cases it is employed

2) network effect, for better AND worse

nebulous1 · on April 10, 2018

> drh ‘sqlite can chime in here

He already has

> With Git, it is very difficult to find the successors (descendants) of a check-in ... This is a deal-breaker, a show-stopper.

AstralStorm · on April 11, 2018

Someone still thinks in the single main branch mode. It is sometimes the main case but definitely not in git world.

This operation is not easy in any DAG. It involves:

- find all or desired branch tips
- walk backwards until hitting the desired checkin
- memoize already seen parents to not walk them multiple times
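The steps above can be sketched in Python roughly as follows. This is only an illustration on a made-up graph, assuming commits store links to their parents only (so descendants cannot be looked up directly); the `descendants` function and the commit names are hypothetical.

```python
def descendants(target, tips, parents):
    """Return the commits reachable from tips that have target as an ancestor."""
    # Memo: commit -> whether target is reachable by walking back from it.
    reaches = {target: True}
    for tip in tips:
        stack = [tip]
        while stack:
            commit = stack[-1]
            if commit in reaches:
                stack.pop()          # already decided, do not walk it again
                continue
            pending = [p for p in parents[commit] if p not in reaches]
            if pending:
                stack.extend(pending)  # decide the parents first
            else:
                reaches[commit] = any(reaches[p] for p in parents[commit])
                stack.pop()
    return {c for c, ok in reaches.items() if ok and c != target}


# Hypothetical history: B forks into C and D, which merge into E; X branches off A.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["B"],
    "E": ["C", "D"],
    "X": ["A"],
}

result = descendants("B", ["E", "X"], parents)
```

The walk stops at the target on paths that reach it, but paths that miss it run all the way back to a root, which is why the operation costs more than following a stored child pointer would.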

bzbarsky · on April 11, 2018

Amusingly enough, git's scalability is also not that great (e.g. worse than mercurial last I checked).

The network effects are there, though.

AstralStorm · on April 11, 2018

Please provide the source about scalability. Also what kind of scalability?