
Git 2.11 has been released - stablemap
https://github.com/blog/2288-git-2-11-has-been-released
======
mixedmath
This is a really well written partial set of release notes. I was curious and
looked at the full release notes [1], and I think these are pretty well
written as well. I'm very impressed, especially given that git has such a
large set of contributors.

[1]:
[https://github.com/git/git/blob/v2.11.0/Documentation/RelNot...](https://github.com/git/git/blob/v2.11.0/Documentation/RelNotes/2.11.0.txt)

~~~
jakub_g
I liked this gem in L547:

> The code that we have used for the past 10+ years to cycle 4-element ring
> buffers turns out to be not quite portable in theoretical world.

~~~
unwind
That piqued my curiosity, and I had to dig up the relevant commit:
[https://github.com/git/git/commit/bb84735c80](https://github.com/git/git/commit/bb84735c80).

It deals with wrapping an integer index around after incrementing it. The old
code just used ++index and a bitmask; the new code uses + 1 and modulo.

I have problems understanding this right now. In my world, ++index for an int
really shouldn't trigger overflow when counting to at most 4, on any
(semi-)realistic environment?

Feeling extra dense, must have more coffee.

~~~
peff
In the old code the counter never resets; it just keeps going up and we only
look at the low bits. So eventually it will need to wrap. It's doubtful that
ever happened in practice, even on a 32-bit system (you'd need to print 2
billion SHA-1s in a single process, and even the largest repos have on the
order of millions).

So the key difference between the old and the new is that in the new code the
counter resets to zero every fourth call.
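
A minimal Python sketch of the two indexing schemes, for anyone who wants to see them side by side. The function names and the 32-bit `INT_MAX` are illustrative assumptions, not code from the Git commit. Python integers never overflow, so the overflow itself only shows up as a comment.

```python
NUM_BUFFERS = 4          # size of the ring, as in the release note
INT_MAX = 2**31 - 1      # assumption: a 32-bit signed int

def old_slots(calls):
    """Old scheme: the counter only ever grows; the low two bits pick the slot."""
    counter = 0
    slots = []
    for _ in range(calls):
        counter += 1  # in C, incrementing a signed int past INT_MAX is undefined behavior
        slots.append(counter & (NUM_BUFFERS - 1))
    return slots, counter

def new_slots(calls):
    """New scheme: the index itself wraps, so it always stays in 0..3."""
    index = 0
    slots = []
    for _ in range(calls):
        index = (index + 1) % NUM_BUFFERS
        slots.append(index)
    return slots, index

old, old_counter = old_slots(10)
new, new_index = new_slots(10)
print(old == new)              # True: both visit the same slots in the same order
print(old_counter, new_index)  # 10 2: only the old counter keeps growing
```

Both schemes hand out the same slots; the difference is only that the old counter would need `INT_MAX` calls before the increment became a problem.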

~~~
unwind
Ah, right, that was the thing I failed to see. Obvious now, of course. :)
Thanks.

------
godson_drafty
They were off by a factor of 10 with the likelihood of being struck and killed
by lightning, according to the nws website.

To clarify: the likelihood of being merely struck by lightning is about
1/1,000,000 per year. The likelihood of being struck _and killed_ is
1/10,000,000, or about 1/2^23.25.

Given this, you would only have to be struck and killed by lightning about 6.9
years in a row (160 / 23.25) to equal the 1/2^160 odds of two random SHA-1
hashes colliding.
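
The arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation above, taking the 1/10,000,000 figure as given and treating a SHA-1 collision as a 1-in-2^160 event; the variable names are made up for illustration.

```python
import math

p_killed_per_year = 1 / 10_000_000   # figure quoted above, taken as given
sha1_bits = 160                      # SHA-1 output size in bits

# Express the yearly odds as a power of two: 1 / 2**bits_per_year.
bits_per_year = -math.log2(p_killed_per_year)

# Number of consecutive years needed to reach 1 / 2**160.
years_in_a_row = sha1_bits / bits_per_year

print(round(bits_per_year, 2))   # 23.25
print(round(years_in_a_row, 2))  # 6.88
```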

~~~
cakoose
More importantly, the comparison is useless. The odds of running into issues
with SHA-1 collisions in Git is a very different question from just the odds
of two random SHA-1 hashes colliding.

~~~
arcatek
Doesn't the birthday paradox make it much more likely to eventually occur?

------
kannonboy
Great write-up! I love the focus on performance in this release.

I've put together another write-up of the Git 2.11 release that discusses some
of the other new features (and goes into a little more detail on some of the
'sundries'):
[https://medium.com/@kannonboy/whats-new-in-git-2-11-64860aea...](https://medium.com/@kannonboy/whats-new-in-git-2-11-64860aea6c4f)

~~~
peff
That is a nice writeup. One of the interesting things for me was to see which
topics you decided to cover and which to omit. For instance, I noted `clone
--reference --recurse-submodules` as a potential topic of interest, but I am
afraid to point anybody to the `--reference` option due to its hidden dangers.

I'm also curious how you came up with 19,290 for a birthday paradox on a 7-hex
hash. I think it's 16,384, but probability can sometimes be tricky. :)

~~~
kannonboy
Thanks Peff, congrats on the great release!

I came up with 19,290 using the generalized birthday formula[0] (actually
after double-checking it's slightly closer to 19,291).

16,384 is the value you get using the square approximation method[1] which I
believe is a bit less accurate in terms of probability, but faster to
calculate. I think Git's using square approximation under the hood -- which is
probably a good thing since I think it'll always yield a more conservative
result.

[0]:
[https://en.wikipedia.org/wiki/Birthday_problem#Cast_as_a_col...](https://en.wikipedia.org/wiki/Birthday_problem#Cast_as_a_collision_problem)

[1]:
[https://en.wikipedia.org/wiki/Birthday_problem#Square_approx...](https://en.wikipedia.org/wiki/Birthday_problem#Square_approximation)
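
For anyone who wants to reproduce the two numbers, here is a small Python sketch. It uses the common closed-form approximation sqrt(2 N ln(1/(1-p))) for the generalized formula and sqrt(N) for the square approximation; the function names are mine, and this is not a claim about how Git itself computes its abbreviation length.

```python
import math

def birthday_threshold(space_size, probability=0.5):
    """Approximate number of random values needed for a collision with the given probability."""
    return math.sqrt(2 * space_size * math.log(1 / (1 - probability)))

def square_approximation(space_size):
    """Cheaper rule of thumb: the square root of the number of possible values."""
    return math.sqrt(space_size)

seven_hex = 16 ** 7  # number of distinct 7-digit hex abbreviations

print(round(birthday_threshold(seven_hex)))    # 19291
print(round(square_approximation(seven_hex)))  # 16384
```

The square approximation comes out lower, so it errs on the side of assuming collisions show up sooner.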

~~~
peff
Right, I am so used to the square approximation being used for hash collisions
that I forgot it was an approximation. Thanks for setting me straight.

------
elevensies
I'm curious but not motivated enough to really search for it, but for
ambiguous hash abbreviations, why not select the oldest, since presumably it
was unique at the time it was created?

edit: I guess that information must not exist or I assume they'd be doing it.

~~~
emmelaich
Yeah, I guess dates are only available for commits but not blobs or trees.

This is suggested by the disambiguation listing; if there were dates they
would be displayed I hope.

I think approximate dates _might_ be inferred, but since it might be
misleading and costlier to determine it makes sense to leave it out - at least
in this version of git.

------
transfire
Converting to and from base 36 (or 32) would probably do more to help the
problem than any heuristics. Compare:

    66c22ba6fbe0724ecce3d82611ff0ec5c2b0255f

to:

    c04bo5604v5qsp6asgasjp9y4paxu8v

That's 31 characters instead of 40, a gain in compactness of about 22%.
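
A rough Python sketch of the conversion, assuming the usual base 36 alphabet of digits followed by lowercase letters. The helper names are hypothetical, and leading zeros are restored by padding on the way back.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase  # 0-9 then a-z, 36 symbols

def hex_to_base36(hex_digest):
    """Re-encode a hex string as base 36."""
    number = int(hex_digest, 16)
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))

def base36_to_hex(text, width=40):
    """Decode base 36 back to hex, padded to the SHA-1 width of 40 digits."""
    return format(int(text, 36), "x").zfill(width)

sha = "66c22ba6fbe0724ecce3d82611ff0ec5c2b0255f"
short = hex_to_base36(sha)
print(len(sha), len(short))         # 40 31
print(base36_to_hex(short) == sha)  # True
```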

~~~
based2
[https://github.com/thanatos/baseunicode](https://github.com/thanatos/baseunicode)

~~~
hossbeast
Interesting, I was hoping for an example at the end of the readme

~~~
deathanatos
(I'm the author.) That is an excellent suggestion. Now that you've mentioned
it, it seems like a glaring omission. I'll try to fix that up once I get home.

In case the other parts of the README weren't clear, the concept was to use
_any_ Unicode character. I was even thinking of (eventually) getting it to
encode data with combining accents. Note that it was intended to optimize the
string for screen display space (pixels), _not_ storage space (bytes).

I'm not sure it'd be a good fit for git hashes, simply b/c sometimes you need
to type or speak a git hash, and the output from baseunicode was definitely
not intended to be pronounceable. (Esp. since I was thinking of using CJK
characters, but trying to weight them down for their wider screen area; but
imagine trying to describe that to a co-worker who might only speak English.)

I wrote it mostly for fun, after I had a couple of difficult-to-transfer files
between machines in the cloud. I find myself ssh'd into weird places, and
scp'ing is sometimes trying. (I do machine-to-machine, so I almost always need
-3, which I don't know why that isn't the default; scp doesn't deal well with
the file being only accessible by root, not your user; scp has the weirdest
arg syntax if you have ill-advised characters in your filenames, like spaces…)
So I was cat'ing files, copying them from one window, and pasting into another
window. base64 for binary data, tar/gzip for making it smaller. But for the
copy/paste, scrolling is a pain, and heaven forbid if you're in screen/tmux.

(Also, if you find yourself with a file that you really can't scp, you can
"re-implement" scp with `ssh $hosta sudo tar -cz <stuff> | ssh $hostb sudo tar
-xz`; see also the -C flag, and don't forget you can also `ssh $host "sudo
bash -c 'cd /where && tar -cz <stuff>'"`)

------
guomanmin
Participate in Atlassian Research

My name is Angela and I do research for Bitbucket. I’m kicking off a round of
discussions with people who use Git tools. Ideally, I’d like to talk to people
that sit on a team of 3 or more. If this is you, I would love to talk to you
about your experience using Git tools, or just some of the pain points
that are keeping you up at night when doing your jobs.

We’ll just need 30 mins of your time, and as a token of my thanks to those
that participate, I’d like to offer a US$50 Amazon gift voucher.

If you’re interested, just shoot me an email with your availability over the
next few weeks and we can set up a time to chat for 30 minutes. Please also
include your timezone so we can schedule a suitable time (as I’m located in
San Francisco). Hope to talk to you soon!

Cheers, Angela Guo aguo@atlassian.com

------
MBCook
It's been a while since I looked into what Git was up to in the latest
version.

The release notes mentioned protocol improvements with git-filter that can
dramatically speed up git-LFS (the large file storage plugin).

Does anyone know if there are any plans to make git-LFS part of the base
instead of an add on?

~~~
shakna
I'd assume there are licensing issues before anything else: git-lfs is MIT
licensed, git GPLv2.

~~~
bonzini
It's a compatible license, so it's not an issue.

------
sulam
I'm curious if anyone knows if the optimizations Twitter made to improve fetch
performance for large, active repos have made it upstream yet? I don't work
there anymore and neither do any of the people who were originally doing that
work, but it was a pretty impressive speed up (I could git pull thousands of
commits and be done in under a second on a 3GB repo with no large objects). I
know the watchman support made it in, which was the other half of what made
large repos perform well, but I haven't seen mention of the log-structured
patch queue stuff that helped the server by eliminating most of the work to
calculate what to send on a fetch. Anyone know?

------
atemerev
Hexadecimal dumps of binary data are the worst of all worlds if used as
keys/references. Hard to memorize, hard to type, look ugly, aren't compact.

Better alternatives:

Base64 without padding: compact.

Grouped decimals: slightly less compact than hexadecimal, but extremely easy
to type and pronounce. E.g. 577-467-341-467

~~~
joallard
Case-insensitivity is important for some to be able to reliably remember a
string. I won't easily retain the difference between 'b4dQbFs31' and
'b4DqBfs31'.

Same thing when speaking it out loud. 'B four D capital Q B capital F s
thirty-one' is way more convoluted and error-prone than 'B four D Q B F S
thirty-one'.

The best thing I've found that fits this criterion is Crockford's Base 32 [1],
basically the extension of hex digits, removing letters ILOU.

But Base 32 (case-insensitivity by proxy) constrains us to 5 bits per
character, which is only a 20% reduction in length over the 4 bits per
character of base 16. So instead of the 20 bits `1ab2f` we could express them
with something like `1qm3`.

Or we could be using words...

[1]:
[http://www.crockford.com/wrmg/base32.html](http://www.crockford.com/wrmg/base32.html)
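
To make the 5-characters-versus-4 comparison concrete, here is a small Python sketch of Crockford-style Base 32 for integers. It covers only the symbol alphabet and the lenient decoding of I, L and O; check symbols are left out, and the function names are my own.

```python
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # digits plus letters, minus I, L, O, U

def encode(number):
    """Encode a non-negative integer using the Crockford alphabet."""
    if number == 0:
        return "0"
    symbols = []
    while number:
        number, remainder = divmod(number, 32)
        symbols.append(CROCKFORD[remainder])
    return "".join(reversed(symbols))

def decode(text):
    """Decode case-insensitively, reading O as 0 and I or L as 1."""
    cleaned = text.upper().replace("O", "0").replace("I", "1").replace("L", "1")
    number = 0
    for symbol in cleaned:
        number = number * 32 + CROCKFORD.index(symbol)
    return number

print(encode(0x1AB2F))            # 3ASF
print(decode("3asf") == 0x1AB2F)  # True
```

So the 20 bits that take five hex digits fit in four Base 32 symbols, and the result survives being read aloud without worrying about case.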

~~~
eridius
Regarding Base 32, I love the justification used for removing U. I, L, and O
all have potential confusion with digits, but U was removed because of
"Accidental obscenity".

~~~
OJFord
I'm surprised it wasn't just "and all vowels" with the same reasoning, or at
least 'a' (because I can more readily think of examples than for, say, 'e').

I suppose, though, there's an attraction in using b32 rather than b29...
(Though I notice mid-word apostrophes are double-tap-selectable at least on
macOS, so perhaps swapping 'a' for ''' would be advantageous, if more
complicated to explain.)

~~~
eridius
The two "worst" swearwords that pop into mind both have "u" in them.
Meanwhile, the swears with "a" in them seem to be the tamest of the lot.

~~~
OJFord
"Tameness" varies extraordinarily by region - an infamous example being 'twat'
which throughout the UK ranges from friendly to vulgar.

Regardless of variance, I don't think I'd regard either the above or the name
of a fat character in Austin Powers as "tamest of the lot".

~~~
eridius
I find it interesting that you're willing to say "twat" but not "bastard". I
admit that I didn't think of either of those words, though I think they're
still much tamer than the ones with "u" in them. Really what I was thinking of
was "crap" and "damn".

------
glandium
It's interesting that Git 2.11 shortened the delta chains on aggressive
repacks, when mercurial happily creates chains of > 1000 deltas (afaik, it
doesn't have a hard limit, it stops using deltas when the size of the required
deltas is larger than the full text).

Although it's worth noting mercurial and git use different delta formats.

Edit: This is apparently what chooses to store a delta or not in mercurial:
[https://www.mercurial-scm.org/repo/hg/file/9e29d4e4e08b/merc...](https://www.mercurial-scm.org/repo/hg/file/9e29d4e4e08b/mercurial/revlog.py#l1395)
self._maxchainlen is not set by default.

------
peterwwillis
Jesus christ Git's interface design is horrible.

    
    
      Master Coder: Hmm. We refer to changes by a long, totally non-human-parseable string of characters that nobody can memorize,
      and when we abbreviate it, it doesn't work 100% of the time. What can we do about it?
      
      Novice Apprentice: Well... how about we stop using a long totally non-human-parseable string of characters that no 
      human can memorize just to briefly refer to specific changes in human-readable output?
      
      MC: What?! HERESY. Making a human use a cryptographic hash to reference a single random logical reference point in a mass of
      logical binary objects among millions of others is clearly the best way to go. We just need some quick fixes.
      
      NA: But... you can't reference it via speech, it doesn't work reliably via text when abbreviated, and it gives absolutely no
      context whatsoever as to what it is. What's the point of using a cumbersome, inhuman reference for something you
      only need to talk about briefly through a computer interface?
      
      MC: SILENCE FOOL! ME DESIGN GOOD. YOU MAKE CODE MASTER ANGRY.
      
      NA: Err... but what if we just let the program rename the references temporarily to human-parseable short strings, and resolve
      what they are in between logs and commits?
      
      MC: I SAID SILENCE!! Just for that, I'm going to make you explain to a new user why we force people to regularly clean
      out their repositories after doing complicated things with them, like merging.

~~~
exDM69
> Git is full of some of the worst design decisions in modern software
> history.

It's also full of some of the best software design decisions. The internals of
Git are simple and elegant and they work like a charm. There have been very
few changes to the internal workings since the first commit of Git.

I agree that the user interface is inconsistent, ugly and hard to grasp. But
if you have a solid understanding of the internals, with the help of the git
manpages it's pretty easy to achieve what you want.

There's software that's intended to work like a black box, just poke around
the user interface and you can get stuff done. Git is not one of those. You
need to understand the internal model, accept that the UI sucks, embrace the
manpages, quit whining and get shit done.

~~~
peterwwillis
> I agree that the user interface is inconsistent, ugly and hard to grasp. But
> if you have a solid understanding of the internals,

You should not have to understand the design of the modern combustion engine
to operate a car. You shouldn't even have to understand bicycle geometry to
ride a bike. It's a friggin tool!

Why on earth would you _want_ to have to become a master of the design of a
tool to use it? The whole point of making a complicated tool is to make your
life easier! No other revision control system is this complicated and
annoying. And I'm not going to quit whining, it sucks and it's stupid and it
doesn't have to be, and people keep worshipping it like it's this amazing
invention like nobody's heard of a merkle tree before. It treats chunks of
changed text like blobs, wow, nobody's done that before. Oh a collection of
packed objects, how novel.

By the way, the internals aren't that great either. You have to constantly
"clean" your repository as it collects useless crap, merges become such a
headache you're asked to destroy your merges to make it somewhat sane to
maintain, handling "large" objects is a mystery to us, and the entire design
is intended to interface with others and yet it's designed as if your personal
repo were the only repo in the universe. You have to use a dozen filters and
options and processes to do what a single script could do if it asked you what
you wanted to get done, but we have to literally sacrifice a goat on the
mountain of Unix Philosophy in order to get something done and go back to
_doing real work_.

Why does it force us to send these anonymous patches and not allow merges to
happen intelligently among a group using locks? Why does it force a maintainer
to do all the work of managing patches? Why does it waste local storage when
we don't need 99% of the repository most of the time? Why can't log messages
and status be rendered in a quasi-usable way, or, heaven forbid, we have a
Curses frontend for the myriad of random chunks of text and commands we have
to memorize to accomplish one small simple operation? Why does the repository
fall apart into a completely unusable mess if you don't weed it once a day?
Why do we have to shuffle around a bunch of commands to perform a single
simple task that any reasonable program 15 years ago would have done for you?

Answer: Because the design is crap. If people would at least just _admit_ the
design is crap, I would stop whining. But at this point I feel like I'm the
only sane human in a world full of people doing the work _for the robots_ and
smiling about how much easier their lives are now.

------
epberry
Big fan of the negative parent selector for merge commits. Also enjoyed the
writeup of the algorithm improvements for the various caches.

------
emmelaich
What a _great_ writeup.

------
OJFord
When a non-ambiguous short-hash _becomes_ ambiguous, can't it be disambiguated
by simply disregarding those not in existence at time of reference?

~~~
farnsworth
Imagine if you merge a branch of old commits from another repo or something,
which introduce short hash collisions. Then you copy/paste a short hash, and
Git doesn't know when that reference is from or which branch it might refer
to.
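
A toy Python sketch of that scenario. The object IDs are invented for illustration (the second one is not a real hash), and `resolve` is a hypothetical helper, not anything from Git.

```python
def resolve(prefix, object_ids):
    """Return every known object ID that starts with the abbreviation."""
    return sorted(oid for oid in object_ids if oid.startswith(prefix))

# Hypothetical repository with one object matching the abbreviation.
repo = {"66c22ba6fbe0724ecce3d82611ff0ec5c2b0255f"}
before = resolve("66c22ba", repo)

# Merging old history from elsewhere brings in an object sharing the prefix.
repo.add("66c22ba0" + "0" * 32)  # made-up ID
after = resolve("66c22ba", repo)

print(len(before), len(after))  # 1 2
```

Nothing in the set records when each ID arrived or when the abbreviation was written down, which is the information the lookup would need.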

------
algesten
Why doesn't git upon commit ensure the sha-1 is unique by having some nonce as
part of it?

I guess if you only work with rebases instead of merges, it should be
possible, right?

~~~
nhaehnle
It would defeat the purpose of a content-addressable storage system. The fact
that the same file and the same tree will always have the same hash is
important for speeding up diffs, merges, and other operations.
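
A short Python sketch of why that holds for files: a blob's ID is the SHA-1 of a small header plus the content, so nothing but the content goes into it. The function name is mine.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """SHA-1 over the header 'blob <size>' + NUL byte + content, as Git does for blobs."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)

# Same bytes always give the same ID; mixing in a nonce would break this.
same = git_blob_id(b"hello\n") == git_blob_id(b"hello\n")
different = git_blob_id(b"hello\n") != git_blob_id(b"hello!\n")
print(same, different)   # True True
```

Because equal content means equal IDs, a diff or merge can skip a whole file or subtree as soon as the IDs match.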

------
supercoder
Git is the worst

