
Definitely agree, and I'm in the same boat. I don't think the data model of git is that hard to grok; it's mostly that commands are very unclear about what they operate on, and in particular people get really tripped up by how many levels of state there are to interact with (stage, working tree, local branches, remote refs).

Like, I've had to explain many times why it's `git pull origin master`, but when you want to interact with that remote branch anywhere else it's `origin/master` instead. The lack of clarity is in which commands operate on which levels, with many of them operating on several at once.
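A quick illustration of the asymmetry (using the default names):

    # network operation: remote and branch are two separate arguments
    git pull origin master

    # everything local: a single ref name with a slash, the cached remote-tracking branch
    git log origin/master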

There have been some efforts to reform the command set to be clearer, like `git switch`, but the old commands will persist forever along with a lot of other footguns (for instance, `git push --force` really ought to behave like `git push --force-with-lease`, with the old behavior moved to something like `git push --force-I-really-mean-it`), so it hardly matters.




I've actually worked on git internals and I'm in the same boat.

As part of a security-related project some years ago, my team and I hacked jgit to use SHA256, which required changing the length of pretty much every on-disk data structure. Sadly, there was (probably still is) no HASH_LEN constant, just a lot of magic offsets strewn throughout the code. I had to compare lengths against the git spec at every step.

And yet I still scramble for stackoverflow every time something goes slightly amiss.


There's an ongoing effort to rework core Git so that the hash implementation can be swapped out for e.g. SHA-256. [1]

jGit is actually a separate project from core Git, but once SHA-256 support lands in core Git we can expect that jGit will follow suit, given that it's critical to Gerrit and other projects.

[1] https://lore.kernel.org/git/20191223011306.GF163225@camp.cru...


What a pointless project! I hope you were paid well, at least.


I was. But it wasn't quite as pointless as it sounds - the tool was a sort of tripwire-like system, with changes shipped to an append-only log that was itself checkpointed in an early blockchain-ish structure. The threat model was "nation state actor", so the client wouldn't accept SHA-1.

It was actually a pretty cool system. I don't think it was ever sold though.


Man, I thought zero days and secret backdoors were bad enough. Now we have to worry about manufactured hash collisions in all our repos' files dating back forever?


That seems like overkill. Couldn't you combine the hash with the date to obtain uniqueness?


The date isn't really meaningful, since it can be set to anything on a file. But if you can force two dissimilar files to have the same hash, you can combine that with some other attack to inject the altered file into a chain of trust, whether it's git or some other checksum-based system. Then combine that with a SolarWinds-like attack, and even if the victim tries to revert to something from years earlier, they can't guarantee the rollback files are unaltered unless they have multiple hashes to compare against or diff everything manually. Multiply that by X thousand files over Y commits during Z years and it would be very difficult to detect.


I do not remember jgit internals, but its API is pretty bad. I always assumed it was some kind of throwaway PoC suddenly turned popular.


> some kind of throwaway PoC suddenly turned popular

Wow, that description feels spot on.


> levels of state

This is the crux for me. Command naming is completely unrelated to, and gives no indication of, which level of state it operates on.

It feels like surely there's an opportunity for the basic CRUD operations to be collapsed down into a standard "{action} {source} {target}" style.

There will be nuances, specifically around branching, but the basics should be basic. As opposed to a Swiss Army knife, where you have to pull out the scissors and squeeze them three times before you can unfold and use the blade.
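As a purely hypothetical sketch of what that could look like (none of these commands exist in git; the names are made up for illustration):

    # hypothetical "{action} {source} {target}" syntax -- not real git
    git copy worktree stage            # stage changes from the working tree
    git copy stage local-branch        # commit the stage to the current branch
    git copy local-branch origin       # push the branch to the remote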


I can't stress just how great magit is. It's worth trying out emacs for just that. Something like spacemacs as a wrapper is useful too, since it gives you some well-configured defaults for file operations. Emacs is a kinda trash text editor but an amazing text-utility toolkit, and that's what enabled magit.


I'll stress with you. Even if you hate Emacs, magit alone is a valid single use case for starting up emacs.


Are there any git frontends that do this today?


The one built into IntelliJ IDEs is pretty good. SourceTree is decent too. They both cover the vast majority of day-to-day operations. I only very rarely have to resort to the command line for ritualistic summoning of the git demons.


Magit comes close to this action-source-target model whenever possible.


can you explain the `git pull origin master` thing one more time here?


I don't think using `git pull` is a particularly good way of working. A pull is a fetch combined with either a merge or a rebase.

If it's difficult to keep your mental model of some system up to date, I doubt that doing bigger steps at once makes things easier.

So

1. run `git fetch`

2. if the textual output does not tell you what has happened, run `gitk --all`

3. Decide what to do. Rebase, merge, whatever.
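In concrete terms (assuming the remote is called origin and the branch is master), that looks something like:

    # 1. update the remote-tracking refs only; nothing local moves
    git fetch origin

    # 2. see what actually changed
    gitk --all    # or: git log --oneline --graph master..origin/master

    # 3. now decide how to integrate it
    git merge origin/master    # or: git rebase origin/master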

Of course, if you know exactly what you are doing, pull can be fine; if you changed the repo yourself on another computer, that is the case. Otherwise, how can you decide on your second step before having even seen the data you are operating on? Well, it can work, but if it doesn't, don't complain.


> I don't think using `git pull` is a particular good way of working.

I agree. For a DVCS like git, separating the network transaction from updating the working copy on disk is the best way to go about it. Going in the other direction this is already the default, since `git add`, `git commit` and `git push` are executed separately.
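Spelled out, the symmetry looks like this (file and branch names are just placeholders):

    # outgoing: the local steps and the network step are already separate
    git add app.c
    git commit -m "Fix the thing"
    git push origin my-branch

    # incoming: the equivalent separation
    git fetch origin
    git merge origin/my-branch    # or: git rebase origin/my-branch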


This is literally the first advice I give when teaching people git: for the first months of use, just run the two commands separately. Many mistakes are avoided that way.


I agree, but I end up using `pull` anyway just because the alternative is so tedious. I wish there was a short command that did the same thing as pull without fetch: merge the remote-tracking version of the current branch's default upstream into the current branch.

Essentially the whole concept of "upstream" is weird and non-orthogonal. Another one that bothers me is that as far as I can see there's no way to globally turn off setting an upstream on newly created branches (I can pass a flag to the specific "git branch" command, but that's tedious and error-prone).


As in, why it's different?

`git fetch` (and by extension `git pull` when given a remote) and `git push` copy data to and from a remote. When you specify `git pull origin master` you're saying "pull down a copy of the remote ref master from origin", which it then saves locally as the ref `origin/master`.

Everything under `origin/` (or really `refs/remotes/origin/`) is just a cached pointer to the last known state of that ref on the remote.

All other commands operate only on these local references. So when you want to refer to what you know to be the state of things on `origin`, you can use `origin/master`. Otherwise that command has no particular knowledge of how to talk to origin.
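Concretely (branch names are just examples):

    # these actually touch the network
    git fetch origin
    git push origin my-branch

    # these only read the locally cached remote-tracking refs
    git diff origin/master...master
    git rebase origin/master
    git branch new-feature origin/master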

Incidentally this is a shortcut I use all the time to update my local master from a remote:

`git fetch origin master:master`

Which is super unclear in its meaning, but it means: fetch origin's master branch and update my local master ref to it. I actually use this more often than git pull nowadays.


I tend to default to `git pull --rebase`.


I have this configured as default everywhere and strongly believe that merge-pulls are always wrong. The first place I used git we were learning together (i.e. nobody knew what a sensible workflow was) and people would push their local merge commits back to master. It was horrible.
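For reference, the setting that makes `git pull` rebase by default is:

    git config --global pull.rebase true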


`git config merge.ff only` is really helpful for enforcing this. It makes you say explicitly what you want for any non-trivial update of a ref through pull or merge.


Strongly disagree. Never rewriting local commits is great for the same reasons that never rewriting published commits is great; if you rebase, you lose the ability to fearlessly work on multiple branches in parallel, which is the great advantage of git.

Pushing merges is great. Pushing random (unreviewed) local commits directly to master is bad, but it's no worse when those commits are merges than when they're not. Conversely, rebasing master (which is quite easy to do if you're inexperienced but have been advised to use git pull --rebase) and pushing that creates a self-perpetuating mess that is very hard to fix (because even if you fix what you did, any other user who did a rebase-pull of master in the meantime is going to reintroduce the problem). Using rebase also trains you to force-push which makes messing up published branches much easier.


Also, one advantage of `git pull origin master:master` is that you don't have to checkout master first.


so the distinction here is

- origin master <=== the actual remote version of the master branch

- origin/master <=== a locally cached copy of origin's master branch; may or may not be in sync with the real "origin master"


origin and master are completely arbitrary too...

`git pull remote_repository_name branch_name` is the generic way to look at it instead of some magic incantation.

I like to call origin "upstream" to differentiate them.

And `git pull` is roughly `git fetch` followed by `git merge`, as one command.


yep, that's right. Or rather, origin and master are just two parameters given to pull/fetch/push to describe a target while origin/master is just the local name for, as you say, the locally cached ref.

Comparing against that locally cached ref is also what git uses to tell you how far behind/ahead of the upstream you are in `git status` and the like. Fetch and push are the only git commands that actually talk to a remote (at the "user level" of the command set, anyway; they are themselves composed of lower-level commands).


> (or really `refs/remotes/origin/`)

It is worth the time to fully understand refspecs. Once people do, they tend to understand all essential ramifications of branch and repository naming.
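The default mapping is visible in the remote configuration that `git clone` writes into .git/config (the URL here is a placeholder); the fetch refspec is what maps the remote's branches into the `refs/remotes/origin/` namespace:

    [remote "origin"]
        url = git@example.com:some/repo.git
        fetch = +refs/heads/*:refs/remotes/origin/*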


What's wrong with `git push -f`? When I'm working on a branch that's been previously pushed with `-u`, it's pretty normal to force push it, particularly if you're amending or reordering commits in response to review feedback, or rebasing due to conflicts in preparation to merge.


Changing `-f/--force` to act like `--force-with-lease` would have no effect on that flow whatsoever. What it would prevent is you accidentally overwriting something on the remote because you didn't know its current state, potentially silently backing out changes someone else (or perhaps you yourself, on another machine) had pushed.

All it does is add this simple check before actually pushing:

    /* pseudocode: refuse the push if the remote branch has moved
       past what our remote-tracking ref last recorded */
    if (remote_ref("blah") != local_ref("remote/blah"))
        fail();
Most of the time it doesn't matter, and for most people's uses of --force it would have no effect (because most people are just pushing to a branch they're the only one pushing to). But every now and then it helps a lot to avoid losing data.
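Until that happens, you can opt into the check yourself (branch name is a placeholder):

    # fails if origin/my-branch has moved since your last fetch of it
    git push --force-with-lease origin my-branch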


It’s important to also understand where this might fall down: `--force-with-lease` (with no explicit expected value) just checks against your remote-tracking ref, so tools that fetch automatically in the background will update that ref and can quietly defeat the check.


Ultimately, I suppose, git usage is somewhat cultural. I personally have an aversion to push -f, on the premise that once it’s pushed it’s public and someone else may have branched (and pushed changes of their own) or simply had it checked out for review; doing push -f “changes reality,” while checking out a new branch is idempotent. If someone else has committed on that branch it’s especially jerky to push -f.

I try to be pragmatic about this sort of thing, yet `push --force` is one of those cultural no-nos for me.


It means you can't fearlessly pull from other people's feature branches. So people mostly don't bother looking at each other's feature branches (because there's nothing you can reasonably do with someone else's change-in-progress except wait for the branch to hit master), so you collaborate later and end up with more conflicts.



