Introducing Git protocol version 2 (googleblog.com)
547 points by robmaceachern 11 months ago | 163 comments



The current (and pretty much only ever, aside from Linus as its creator) maintainer of git is a Google employee [1], in case anyone else was wondering.

[1] https://en.m.wikipedia.org/wiki/Junio_Hamano


>"Linus Torvalds said in 2012 that one of his own biggest successes was recognizing how good a developer Hamano was on Git, and trusting him to maintain it."


I came across this email from Linus announcing the handover: https://lwn.net/Articles/145123/

It's interesting how git, the first ever git project, was itself looking for a new maintainer almost as soon as it was created.


wow! what an accolade


Seconded. I can only imagine how good Junio must be to have earned that kind of praise and trust.


Thanks, that really helps.

As an open-source advocate, my first thought was, "Why the hell is Google releasing a version of a protocol that Linus Torvalds wrote?"

Without that context, it would be like Google throwing up an announcement, "Introducing Google's Linux Kernel 5.0!"


Yeah, that was my reaction, and it made me sad that Google has so eroded my trust over the decades that I was turned off at seeing an announcement implying they are deeply involved in core open source tools. I mean, who else but companies swimming in cash can truly deeply support this stuff, and for the most part, the people working on these tools really do care about the open source community. But Google's reputation is so tarnished that my gut reaction is at odds with my rational one, and that's a sad thing to realize.


I wasn't aware that the maintainer of Git works at Google, so I was a bit surprised by the announcement too. But it wasn't because of any drama like Google eroding my trust or whatnot, just that my information was incomplete so my gut reaction was irrational.


That may seem odd, but it could happen in an open-source world: multiple parties releasing different versions of the same piece of software and calling it the same thing.


One fun example of this is the so-called RPM "5" fork[0], which is basically dead and almost entirely unused[1].

The result is the main RPM everyone uses will probably stay at version 4.x forever.

[0]: http://rpm5.org/

[1]: https://en.wikipedia.org/wiki/Rpm_(software)#RPM_v5


They could skip a version like PHP did. Among other reasons, since books and articles about PHP 6 had already been written long before PHP 5+1 came out, they went with 7.


Sure, they could. I don't think they've felt the need to. RPM tends to change slowly and conservatively.


Both Git and Linux are trademarked, presumably to prevent such hijinx


Linux is trademarked because of some "hijinx"...

"Initially, nobody registered it, but on August 15, 1994, [...] filed for the trademark Linux, and then demanded royalties from Linux distributors. In 1996, Torvalds and some affected organizations sued him to have the trademark assigned to Torvalds, and, in 1997, the case was settled." https://en.wikipedia.org/wiki/Linux#Copyright,_trademark_and...


Isn't it customary to at least rename the fork?


Customary, but unless the name is trademarked, not required.


Some licences require a change of name for substantial modifications, e.g. the Artistic Licence and Apache Licence v1. But those kinds of clauses are pretty rare nowadays.


Same here. I was like, "If they are about to drop 3 versions at the same time like Angular, I need to use SVN ASAP".


Non-Mobile link for those on desktop: https://en.wikipedia.org/wiki/Junio_Hamano


The mobile version of Wikipedia works perfectly fine in a browser. I personally prefer it for readability.


If you link the normal desktop version, the reader will automatically be redirected to the version they prefer.


I don't...


Fun fact: if you google “git blame” it returns his wikipedia entry.


Alright, I was wondering why this was published on the Google Open Source website. I had no idea. That said, the Git project itself is not listed under their umbrella.

https://opensource.google.com/projects/list/featured


We currently only list projects that are or were primarily developed by Google. We decided to include projects that started at Google and were since donated to foundations, such as Kubernetes.

But we aren't yet including projects where we are just heavy contributors, but which aren't "Google projects". That includes Linux, git, LLVM, and a host of others. We do want to recognize them in our project directory, but want to make sure that they are distinguished from Google projects so that we're not implying something that isn't accurate.



[flagged]


I'm guessing that their job at Google allocates a significant percentage of their time to work generally on open source Git, with the rest on Google-specific needs and deployments like googlesource.com. This post on Google's open source blog fits squarely in the overlap of both.

As well, most of Google's releases in the developer tools, open source, or Cloud Platform worlds don't have ads - indeed GCP is an entire revenue stream not dependent on ads.

Source: Much of my day-job work on my last team at Google was open source on the GCP team, and I also got approval to own some side-project work done on my own time. So I got to see how they handle such things. I don't have specific knowledge of the Git team's arrangements, but this is an educated guess.


I guess others downvoted you because of your insinuation that Google will sneak ads into an open source project. They've never done that before, so it's a rather odd accusation.


They moved a bunch of open source functionality out of AOSP to force all-or-nothing adoption of their services + tracking.


Most likely it was noted because several of us probably immediately wondered why google would have taken it upon themselves to release a new version of something they weren't the maintainer of.


Google has as much incentive as anyone else to introduce a faster wire protocol. The article mentions the Chrome project, and don’t forget that Google Cloud has a git-repo-hosting product.


That's true, and while we all get to enjoy the new protocol, it seems like its primary beneficiary will be big organizations that have gigantic repos with massive numbers of refs in them. Furthermore, there's a clear orientation towards specialization -- putting that giant repo on a central server. Google clearly stands to benefit more from this work than those of us who use git as a distributed version control system. Who pays the piper calls the tune.


In all fairness, GitHub has much to benefit from this too, playing host to the Mozilla git mirror, which also has a relatively large number of refs: https://github.com/mozilla/gecko-dev


My first thought was “Ugh, Google forked git and now we’ll have two competing protocols”

So it's good to know why this was posted on Google’s site.


Even if he does it on Google's time, all his contributions are still licensed under GNU GPL V2 so Google cannot claim any ownership over Git.


The GPL is a license, not a Contributor Agreement or a Copyright Transfer. The author remains the owner of the copyright when they let others use it under a license, even open source licenses like the GPL.

Licensing software out does not mean the author can no longer "claim any ownership". In the case of OSS licenses, it just means that they can't claim sole ownership.


Yes, but from the perspective of the end user (me), the license guarantees that I can use the current version of the software for free forever without Google or anyone else forcing any changes.

Plus, if I don't like any changes they make, I can fork my own copy and continue developing on that (which the community will certainly do in a heartbeat if needed).


Yes, of course. I was just trying to clarify that that guarantee is due to the software being licensed to you and not because ownership gets transferred from the author.


So right now, Google has a say in the direction of git... But that's normal.


It is mentioned because this announcement is hosted on Googleblog.


Let that be a reminder to all the coders out there: if you ever design a protocol or file format to communicate between machines always remember to add a version field or some other way to allow for updates and revisions later without breaking everything. Having a way to specify extensions in a backward-compatible way is nice too.


> if you ever design a protocol or file format to communicate between machines always remember to add a version field or some other way to allow for updates and revisions later without breaking everything

Also, somehow make sure no servers, clients, or third-party middleboxes break when the version field is incremented. The TLS protocol designers had to give up on the version field; it's now going to forever be stuck at "TLS 1.2", since too much would break otherwise.


It's a universal truth: if you want to keep something from jamming up, you need to exercise it. It's true for the human body, for machine parts, and for protocol features.


HTTP was at 1.1 for a very long time but it appears the upgrade to version 2 is going fine. What is the difference here? The protocol version exchange mechanism?


The HTTP 1.1 to 2 upgrade is only going fine due to massive work (mostly by Google) over a period of years. HTTP/2 was also able to benefit from a lot of pain that SPDY and WebSockets went through earlier. Protocol ossification is still a hard problem.


IIRC WebSockets has basically nothing to do with the way http2 is handled, and websockets are still going over a simple HTTP1.1 Connect/Upgrade. What is the connection between websockets and http2?


Early version of WebSockets exposed bugs in HTTP proxies, some of which were security problems: http://www.adambarth.com/papers/2011/huang-chen-barth-rescor... To fix these kinds of problems, the final version of WebSockets does not use a straightforward upgrade but instead has a kludgey handshake and content masking: https://en.wikipedia.org/wiki/WebSocket#Protocol_handshake https://trac.ietf.org/trac/hybi/wiki/FAQ

HTTP/2 doesn't have the same problems because it requires TLS+ALPN, but IIRC that "clean" solution was only arrived at after years of discussion and experimentation.


This is one of the major reasons HTTP 2 is only supported via TLS and only via a complex upgrade protocol.

It's not that you can just do "GET / HTTP/2.0" or something like that.

The TLS part is interesting, as wrapping a protocol into an encrypted channel solves a lot of these issues (but it can break again if you have stupid man in the middle boxes). It just doesn't solve the issue for TLS itself.


The main difference is that a "side channel" of the TLS connection (the NPN or ALPN extensions) is used to negotiate HTTP/2. The upgrade to version 2 without the TLS wrapper failed; so many servers and/or middleboxes had issues with it, that all browser makers decided "HTTP/2 is going to be TLS only" (the current "encrypt all the things" push played a small part, but the main reason was the compatibility problems).


Clients can fall back, and falling back to HTTP/1.1 isn't a security problem the way falling back to, say, TLSv1.1 is.


Because it's up to the client to request HTTP 2 if they support it. https://http2.github.io/http2-spec/#discover-http
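
A quick way to watch that negotiation from the client side (a rough sketch; assumes a curl build with HTTP/2 support and that the target server speaks h2):

  # verbose output shows the ALPN offer/acceptance and the HTTP/2 response line
  curl -svo /dev/null --http2 https://example.com 2>&1 | grep -iE 'ALPN|HTTP/2'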


But why is this not supported by TLS? Is it set up in such a way that it could never be amended to have a fallback?


If the newest version of a secure communication protocol includes some way to negotiate down to an older version, that opens the door to downgrade attacks - you risk ending up with a protocol that, in practice, has all the vulnerabilities of both versions.


You can work around this with downgrade protection. TLS 1.3 has it out of the box, and it was also added belatedly to TLS 1.2 (though the problem there is that you can still downgrade whenever either the client or the server knows TLS 1.2 but doesn't have the protection yet).

In TLS 1.3 the downgrade protection works like this:

If I'm a TLS 1.3 server, and a connection arrives that says it can only handle TLS 1.2 or lower, I scribble the letters "DOWNGRD" (in ASCII) near the end of a field labelled Random that is normally entirely full of random bytes.

If I'm a TLS 1.3 client, I try to ask for TLS 1.3 from the server when I connect, if instead I get a TLS 1.2 or earlier reply, I check the Random field, and see if it spells out "DOWNGRD" near the end. If it does, somebody is trying to downgrade my connection, I am being attacked and can't continue.

This trick works because if bad guys tamper with the Random field then the connection mysteriously fails (client and server are relying on both knowing all these bytes to choose their encryption keys with ephemeral mode) while older clients won't see any meaning in the letters DOWNGRD near the end of these random bytes - so they won't freak out.

You might worry: What if somebody just randomly picked "DOWNGRD" by accident for a TLS 1.3 connection? If every single person in the world makes one connection per second, this is likely to happen to one person, somewhere, only once every few years. So we don't worry about this.


Oh that's a good question in context of middleboxes. I don't know of any that force HTTP/1.1, but they might actually!


Amen!


Per the TLS 1.3 RFC:

> In previous versions of TLS, this field was used for version negotiation and represented the highest version number supported by the client. Experience has shown that many servers do not properly implement version negotiation, leading to "version intolerance" in which the server rejects an otherwise acceptable ClientHello with a version number higher than it supports. In TLS 1.3, the client indicates its version preferences in the "supported_versions" extension (Section 4.2.1) and the legacy_version field MUST be set to 0x0303, which is the version number for TLS 1.2. (See Appendix D for details about backward compatibility.)

It's really too bad that the version field can't be used as a version field anymore, but thankfully the "extension" format is pretty flexible in that regard.


>but thankfully the "extension" format is pretty flexible in that regard

Just like the version field.

I'm sure middlebox software is being updated as we speak to terminate connections with unknown versions in the "supported_versions" extension.


If the extension field is anything like IP or TCP options, some middleboxes will also tamper the hell out of that field and strip unknown extensions, or just break connections.

Often-referenced paper in that field: http://conferences.sigcomm.org/imc/2011/docs/p181.pdf


In my opinion, they should break things that misuse the version field. Then maybe the makers will learn to develop properly.


It's almost invariably end users who suffer, not the "makers". And because of a human cognitive bias it doesn't matter that the middleboxes are wrong, if you get a new Chrome and it doesn't work you blame Chrome, you don't blame the middlebox that had been getting this wrong for five years.

Almost a year's work on TLS 1.3 was spent on working around problems with middleboxes. Because without that it would be impossible to deploy in practice. TLS 1.2 took years to deploy because so many middleboxes were incompatible and we had to wait for them to rust out.


What would break? Are you saying a TLS 1.3 client would not be able to connect to a TLS 1.2 server because the version request would cause the server to reject the client?


Yes. Or worse: a completely unrelated box in the middle of the path could drop all the TLS 1.3 packets, so instead of a clean rejection, the connection gets stuck.


Yes.


I'm mostly surprised they solved a critical server bug on the client side and by introducing even more hacks into the protocol. I mean, who in their right mind would run a public git server with this super easy to exploit DOS bug:

    Unfortunately due to a bug introduced in 2006 we aren't
    able to place any extra arguments (separated by NULs) other
    than the host because otherwise the parsing of those
    arguments would enter an infinite loop.
I'm not sure if entering an infinite loop means what I think it does in this context, but that's almost CVE-worthy; they should release a fix, mark that version as obsolete, and stop making their clients cater to it.
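
For context, the raw git:// request looks roughly like this (per the pack-protocol docs; the repo and host names are just illustrative), and v2 sneaks its parameter in behind a second NUL, which servers with the fix simply ignore if they don't understand it:

  git-upload-pack /project.git\0host=myserver.com\0                (existing request)
  git-upload-pack /project.git\0host=myserver.com\0\0version=2\0   (v2: extra args hidden behind a second NUL)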


It's been fixed for almost a decade. You're asking for a retroactive CVE?

You can read about their fix by clicking the next link in the article.


DNS has no version field. I'm torn as to the choice here. On the one hand, DNS is backwards compatible with everything.

EDNS is the only way to extend the protocol now, which is basically just adding additional Records to the Message that are designated as Extended DNS records, and treated specially.
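
As a rough illustration (assuming a stock dig; +qr prints the outgoing query so you can see the OPT pseudo-record that EDNS rides in):

  # request EDNS version 0 and show the OPT pseudosection of the query and response
  dig +qr +edns=0 example.com A | grep -A2 'OPT PSEUDOSECTION'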


The IETF is working on a document which describes many reasons why DNS may stop working. EDNS-related issues are in section 3.2:

https://tools.ietf.org/html/draft-ietf-dnsop-no-response-iss...


My own code to decode DNS packets [1] fell afoul of section 3.1.3 of the draft document. I fixed the issue, but the reason I originally rejected DNS packets with unknown flags was on the assumption of potential garbage being used as a possible exploit.

[1] https://github.com/spc476/SPCDNS


This is a great resource. Thank you for sharing.

I don’t read that as DNS ceasing to work, but more as reasons why DNS is flaky in different scenarios.

Some of the issues there are related to mitigations against reflection attacks, etc. I haven’t read the entire doc, but does it go into concerns around DDoS and other such things, and how DNS servers mitigate those attacks?

Edit: right in the intro. So a server needs to “understand” when it is under “attack” and only then put in mitigations against the attack. In the worst case, the server doesn’t do this, fixes the issues in this RFC so that it always responds, and then ends up amplifying the attack.


The message header hasn't been fully exhausted yet. Beyond the spare bit[1] in the header there are unassigned OPCODE values which can be used to bend the format in new ways[2].

[1] It was briefly used experimentally, if I recall

[2] https://tools.ietf.org/html/draft-ietf-dnsop-session-signal


The specification of the v2 protocol is here: https://github.com/git/git/blob/master/Documentation/technic...

One of the more exciting things is that it can now be extended to arbitrary new over-the-wire commands. So e.g. "git grep" could be made to execute over the network if that's more efficient in some cases.

This will also allow for making things that now use side transports part of the protocol itself if it made sense. E.g. the custom commands LFS and git-annex implement, and even more advanced things like shipping the new commit graph, already generated, from the server to the client.
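
If you want to poke at it yourself, something like this should show the v2 exchange on the wire (assuming a client built from master or a 2.18+ release, and a server that already speaks v2, such as the googlesource.com hosts the post mentions):

  # GIT_TRACE_PACKET dumps the raw pkt-line traffic; protocol.version=2 opts in to v2
  GIT_TRACE_PACKET=1 git -c protocol.version=2 ls-remote https://chromium.googlesource.com/chromium/src 2>&1 | head -40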


If you are going to link to a git repo, don't link to some unofficial mirror. That would just confuse search engines.

The specification of the v2 protocol is here: https://git.kernel.org/pub/scm/git/git.git/tree/Documentatio...

(There are a couple of repos listed as official mirrors, such as the googlesource.com one, but the one you linked to isn't one of them.)


The repository I linked to is official. See https://public-inbox.org/git/xmqqindt6g1r.fsf@gitster.mtv.co...

What list are you referring to? If it doesn't list the one on GitHub it needs to be fixed.


I had a release announcement open for other reasons and noted the different URLs, but I was wrong and the link is absolutely fine. Thanks. If I could edit the post I would, so I'll just let this sit here in case anyone is confused (as I was).


Too bad they didn't make Git LFS part of Version 2[0]. Most vendors[1] support LFS already, but because it isn't required, some still lack it and its support cannot be assumed.

[0] https://git-lfs.github.com/

[1] https://github.com/git-lfs/git-lfs/wiki/Implementations


I say that's LFS' fault. Why do you even need a custom server? It should just be able to use any ol' file server or S3-API compatible service, and do everything on the client side.

I find git-annex a much better solution, it's a shame everyone went with LFS.


My experience with git-annex is that it seems heavily designed for individuals and not for projects. The places it looks for files are often just a computer you were once developing on, and it sometimes expects you to go find that computer. It never forgets about any crazy place your files have been.

It was very hard to use in asymmetric cases where different people have different credentials, such as where one person has access to a computer and others don't, or where a couple of core developers have authenticated R/W access to a file server or an S3 bucket and everyone else just has HTTP.


Git-annex doesn't look for files in random places. It uses the regular git remotes, plus something it calls "special remotes" which are basically accounts on file servers/S3/etc that you can manually add.

If GitHub et al. thought this was confusing, they could have made a "beginner's mode" that auto-selected the storage server based on the git server, like LFS does. Which would still have been better, since it wouldn't have required a custom server API.

> It was very hard to use in asymmetric cases where different people have different credentials

Right, but LFS can't be used in asymmetric cases at all - it assumes anyone with access to the git repository has access to the LFS storage area.


> Right, but LFS can't be used in asymmetric cases at all - it assumes anyone with access to the git repository has access to the LFS storage area.

Wait, really? I thought that Git LFS let people with push access push files to the LFS area, which can then be read by anyone. That's asymmetric in the way everyone expects from GitHub. But I didn't use Git LFS because it's too expensive.

Yes, I probably encountered extra weirdness from git-annex, from the fact that the codebase was on GitHub, which doesn't support git-annex, so _everything_ in git-annex had to be on a different remote.

If it was meant to be used with the upstream as the only remote, that makes things make a lot of sense, and explains why my attempt to use it felt a lot like early Git, where there was no good upstream service like GitHub.


What kind of changes to the wire protocol would help git-lfs? It seems to have no specific dependencies on protocol features.

If standard git ever implements shallow blob fetching, it would preferably make git-lfs obsolete rather than help it.
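
For what it's worth, a first step in that direction already exists as the (experimental, at the time of writing) partial clone filter, though it needs server-side support (uploadpack.allowFilter) that most hosts don't enable yet. A sketch, with a placeholder URL:

  # blobs are omitted from the initial clone and fetched on demand at checkout
  git clone --filter=blob:none https://example.com/big-repo.git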


Requiring Git-LFS support would be rather problematic for anyone who self-hosts git repos over SSH.


They'd just stick to Protocol 1


Why should people who self-host repos not be able to benefit from the improvements in protocol 2? Especially if other future extensions to protocol 2 prove useful for self-hosters?

Git is a decentralized version control system. Its core networking protocol must remain useful for people who self-host.


Git LFS is not part of core-git, but an extension built and maintained by github, and the code lives outside of the git tree, so it cannot be a required part of the protocol.


> Git LFS is not part of core-git

I know, that's what I am suggesting should change in version 2.0. It is a widely supported popular extension that solves a major pain point for Git, most vendors have adopted it.

New things can absolutely be required as part of a new protocol version; in fact this blog post lists several things that will be new in 2.0 and beyond.

The analogy I'd use is HTTP/2 and SPDY. SPDY started out as a Google produced extension to HTTP, gained popularity, and was then standardized/merged into the HTTP/2 standard. All I am suggesting is Git LFS receive the same treatment.


The way to make that happen is for some interested party to take the LFS code and submit it for merging into git proper. If there were prior attempts, study them carefully and learn from them. It probably won't be accepted the first time, so you need to be persistent, addressing reviewer's comments along the way.


git v2.0 came out 4 years ago; these release notes are regarding a new version of the wire protocol used to communicate with remote repos.


I wouldn't be so sure. They said one of the motivations was "unblocking the path to more wire protocol improvements in the future".


> but because it is required

Just to confirm, but you meant "because it is not required", right?


Right. Edited.


I'm a very (very) minor contributor to git.

If you are at all interested in hacking on Git, it's not that difficult. Knowing C and portable shell scripting for writing tests are the big things.

One sticking point: you need to submit patches to the mailing list; you can't just do a GitHub pull request.

See https://github.com/git/git/blob/master/Documentation/Submitt...

I still see github pull requests rather frequently, even though they have never been allowed. All discussion AND patches go through the mailing list, much like the linux kernel.
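
The flow, roughly (Documentation/SubmittingPatches has the authoritative version; the branch and patch file names here are just placeholders):

  git checkout -b my-topic origin/master
  # ... hack, commit ...
  git format-patch origin/master                       # writes a 000N-*.patch file per commit
  git send-email --to=git@vger.kernel.org 0001-my-change.patch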


It's unfortunate that github doesn't let a project disable the on-website UI for pull request submission; as it is it's easy for somebody to end up wasting their time trying to submit a change that way. (QEMU has that issue too.)


Totally agree! I made nopullrequests.com to help solve this.


And that's nice, but I'd also love to see a bot that formats the pull request into a patch email for you.


Sometime in the far future, someone will write an interesting story about how a double null byte came into existence in the git request protocol, and it will be amusing and interesting to look back. As the saying goes, hindsight is always 20/20. I'm glad that they found ways to maintain backward compatibility, at only a minor cost to understanding things.


It's quite a comedy that this feature has not been implemented for at least 6 years, solely because the raw git:// protocol's parameter handling was severely broken, and feature detection by disconnecting and retrying [1] was ultimately deemed far too dirty.

[1] https://public-inbox.org/git/CAJo=hJtZ_8H6+kXPpZcRCbJi3LPuuF...


Wait, why was this posted by Google? I thought Git was made by Linus Torvalds.


Git was created by Linus Torvalds, but out of the 50k+ commits on the repo, only 250 or so are from him, with only 14 in the past 6 years. [1]

[1] https://github.com/git/git/graphs/contributors?from=2012-03-...
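
Easy to sanity-check from a local clone, too (counts will differ a bit from GitHub's graphs depending on merges and date ranges):

  # top committers by commit count, and Linus's total non-merge commits
  git shortlog -sn --no-merges | head -5
  git log --no-merges --author="Linus Torvalds" --oneline | wc -l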


Here is a more detailed analysis, which shows all contributors:

https://public.gitsense.com/insight/github?r=git/git#b%3Dgit...

These are contributions by Linus:

https://public.gitsense.com/insight/github?r=git/git#b%3Dgit...

and as you can see, his contributions really tapered off after 2010, while contributions from Hamano remained steady from 2008 to the present, as shown below:

https://public.gitsense.com/insight/github?r=git/git#b%3Dgit....


> Linus Torvalds said in 2012 that one of his own biggest successes was recognizing how good a developer Hamano was on Git, and trusting him to maintain it.


It's because a Google employee implemented protocol v2, and wrote a post about it.


Open source, though.


Is there a git protocol variant that allows the client to avoid downloading objects that it already has stored locally in another repository or cache?

For example: I have the Linux kernel already cloned in some directory. I clone a second repo which has the Linux kernel as a submodule. Can I clone the second repo straightforwardly without having to download Linux a second time? (Well yes, but only by manual intervention before doing the git submodule update - it'd be nice if objects could be shared in a cache across repos somehow).



You could literally link the two object directories?

I just tried this and it seems to work:

  git clone git://github.com/git/git
  mkdir git2
  cd git2
  git init
  cd .git/
  rm -rf objects
  ln -s ../../git/.git/objects objects  # point the new repo's objects dir at the first clone's
  cd ../
  git remote add origin git://github.com/git/git
  git fetch # returned without downloading anything
  git checkout master
  ls # etc.
If you seriously want to use this, you'll probably want to hard link the contents instead. But IIRC git clone from local disk already does that for you?

In short: clone your local copy and take it from there?


You can also use alternates:

  echo ../../../git/.git/objects >> git2/.git/objects/info/alternates
or use the original as a reference:

  git clone --reference git git://github.com/git/git git2
This sets up the alternates for you.
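
For the original submodule case upthread, --reference also works when populating submodules (assuming your existing kernel clone lives at ~/src/linux; the project URL is a placeholder):

  git clone https://example.com/project-with-linux-submodule.git
  cd project-with-linux-submodule
  # borrow objects from the existing local kernel clone instead of re-downloading them
  git submodule update --init --reference ~/src/linux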


There's a git command for that: https://git-scm.com/docs/git-worktree


Maybe this project could work for you?

https://github.com/jonasmalacofilho/git-cache-http-server


I'm assuming from your comment that you're already aware of --reference but it doesn't completely meet your needs? The only other thing I can think of would be to use the 'insteadOf' configuration to tell Git to use your local clone instead of the remote one. Search 'git help config' for 'url.<base>.insteadOf'.
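
A sketch of the insteadOf approach (paths and URL are hypothetical; any fetch of that URL, including by a submodule, will then read from the local clone instead):

  git config --global url.file:///home/me/src/linux/.insteadOf \
      https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git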


AIUI, the git ssh protocol is just the git protocol tunnelled through ssh. So why do they need different mechanisms for signalling V2?


Deploying Git over SSH entails locking down the precise command line executed, keyed to the public key you use to authenticate. Locking down SSH SendEnv is mandatory too, otherwise thousands of people would have shell access to GitHub.com!

This isn't even theoretical; there was an environment-related bug not 5 years ago involving Git. At least Bitbucket was impacted; I think GitHub was patched before it was announced.


I don't think that answers the parent's question, if the update was in the git protocol itself (encapsulated in the SSH session) then you wouldn't have to change anything at the SSH level.

As you point out selectively allowing a new environment variable could open a can of worms for shared hosts like github if they mess up their implementation.


Because if you tunnel through ssh, you can signal v2 using ssh's mechanism of setting environment variables. If you don't tunnel, you don't have this option. This is clearly described in the article.
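
Concretely, the ssh signalling works roughly like this: the client exports GIT_PROTOCOL and asks ssh to forward it, and the server's sshd has to be configured to accept it (a sketch; host and repo path are placeholders):

  # client side (roughly what git does for you with protocol.version=2 and OpenSSH):
  GIT_PROTOCOL=version=2 ssh -o SendEnv=GIT_PROTOCOL git@example.com "git-upload-pack '/my/repo.git'"

  # server side, in sshd_config, otherwise the variable is silently dropped:
  #   AcceptEnv GIT_PROTOCOL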


I think what the person you're replying to is asking is why not, in the case of ssh, use the signaling in the git protocol, since it will be there anyways. That is, if you don't tunnel, you must signal w/ the git protocol. If you do tunnel, why use a different mechanism, since the signal in the git protocol must be there?

I think this is because SSH isn't just encapsulating the git protocol directly (so the initial assumption isn't fully correct), and this is one of the parts that differs. On the git protocol side, we need to select a "service":

> a single packet-line which includes the requested service (git-upload-pack for fetches and git-receive-pack for pushes)

which in SSH is done not by transmitting that packet-line but by instructing SSH to run that particular executable.

> This is clearly described in the article.

It really isn't, IMO; if you don't have precise knowledge of the protocols involved, I don't think anything in the article particularly spells this out.


Yes, but once you've updated the git protocol, ssh support comes for free. Having one mechanism is simpler than having two. And as your sibling notes, setting env vars from ssh has disadvantages. So why bother?


> Server-side filtering of references

I wonder if this will be somehow exposed by git daemon. It could be used for easy per ref access controls.

For example, Git Switch [0], which uses Macaroons, had to clone the repository to implement per-ref ACLs.

[0]: https://github.com/rescrv/gitswitch


I thought Google uses hg; have they switched over to git as well?


For all the "big" Google projects they use a proprietary system called piper.

I think all their open-source stuff (Angular, GoLang, Android) uses git (and sometimes Gerrit).

Although given Google's scale, I'm sure there's some teams/projects that use Mercurial.


In fact, developers are allowed to use whichever VCS tool they want on their local machine (or on the online coding in the cloud CitC environment). Some opt to use hg. The canonical repo is in piper though, so the hg commits or git commits get converted before they land.


Gerrit is a review server that uses git. In fact, Gerrit now stores the majority of the information it uses in git itself.

So for Google external projects, they use git.

> Although given Google's scale, I'm sure there's some teams/projects that use Mercurial.

I doubt it. Their tooling is probably pretty specific, and now that code.google.com has shut down, they probably don't have any review servers that support it.


Yes and no. The answer is actually quite complicated... and I have no idea if I'm allowed to talk about it publically or not.


The most recent public reference I can find to this is from 2016: https://cacm.acm.org/magazines/2016/7/204032-why-google-stor...

Here's the money quote:

"The team is also pursuing an experimental effort with Mercurial an open source DVCS similar to Git. The goal is to add scalability features to the Mercurial client so it can efficiently support a codebase the size of Google's. This would provide Google's developers with an alternative of using popular DVCS-style workflows in conjunction with the central repository. This effort is in collaboration with the open source Mercurial community, including contributors from other companies that value the monolithic source model."

Project that forward logically by two years.


Not sure why I got heavily downvoted. The above was the piece of information that led me to think they were all on hg. So judging from the comment, I stand corrected.


Your assumption was pretty reasonable based on the public information. Honestly I’d love to talk about how Google does source control/ code review etc. because it’s actually pretty interesting at this point. You know... for some values of interesting.


Then don't waste people's time with vacuous comments.


Ironic reply.

Sincere apologies if you can't derive any information from my comment, but that doesn't mean there isn't any there.


The only information is “google employs me”.


Does Google use hg? “Yes and no. It’s complicated.” You can’t read anything into that?


Everything speaks Piper.

Devs can use the mercurial/git clients mentioned in the paper linked by harveynick.


> Gerrit is a review server that uses git

Yup! I use Gerrit at my company and share Administration duties with our Devops team.

I know Android uses Gerrit I just wasn't sure if Angular and co. did which is why I worded it a bit more vaguely.


Go started on Mercurial and then eventually moved to Git.


And they like neither. They really want a versioned filesystem.


That's interesting. Do you have more details / refs?


I understand this was private communication from Rob and Russ.


well dang! :-)


Is Git a Google project now?


No, but many of the core contributors are employed by Google and spend time on it as part of their day job (with Google's knowledge and permission). This post straddles both the open source part of their jobs and the "Git deployment at Google" part.


BRB switching to Mercurial


This is disgusting. So little foresight in the past... At least the outcome is quite useful.


Ah yes, how disgusting that the developers of this free software that I've done nothing for except use for years made an unfortunate decision a decade ago.


IIRC, they were in a serious time crunch when they drafted/made git. I can't remember the whole story...


At the time Linus was the sole author/contributor of git, and he needed a replacement for BitKeeper in a hurry. BitKeeper had been made unavailable for Linux kernel development because the proprietor of BitKeeper was really unhappy that Tridge had reverse engineered the protocol and created an open-source client[1] which could talk to the Bitkeeper server. Linus created the first version of git sufficient to do a kernel commit in ten days[2].

[1] https://sourceforge.net/projects/sourcepuller/

[2] https://www.linuxfoundation.org/blog/10-years-of-git-an-inte...

(As Tridge tells the story[3], he telnet'ed to the bk port and typed "help" so it wasn't that much of a reverse engineering effort. :-)

[3] https://lwn.net/Articles/132938/


And really, like other famous software that people love to throw shade at for how awful it is, focusing only on its warts instead of the immense productivity realized as a result, git really does have some nice parts, and it was the best option for a while. The fundamental concepts of git are really not that hard to understand -- its fundamental architectural model is event sourcing, and its fundamental data structure is a DAG. Those are pretty good choices.

I personally have stuck to kind of basic git usages (call it "Git: The Good Parts" if you will), and have never had the problems people claim to have with git. It just has always worked, and it has always been there for me.


I thought Linus Torvalds was almost wholly responsible for the initial development of git? Even so, everything, especially software, is easier in hindsight....


Sorry, I was using the 'singular' they.


Using BitKeeper as the SCM for the Linux kernel always seemed like a bad idea and when issues between the company and the community peaked git was created.

https://git-scm.com/book/en/v2/Getting-Started-A-Short-Histo...


Agreed.

Bitkeeper itself is open-source these days available via the Apache 2.0 License, but it is too little, too late:

http://www.bitkeeper.org/


s/they/he/. It was Linus himself who alone created git, within a few weeks (two or three). At first it was just a handful of shell scripts, but it was self-hosting pretty early on.


Linus wasn't the only one with git; all Linux devs were in a hurry. This is also why Mercurial happened.


oh neat, and it's on a google blog.

That's great. Another subtle reminder that this ad company has way too much control.


Interesting that they took to the Google blog to announce this; is there a corresponding LKML post?


Why LKML? Despite Git's origins from and use by the Linux project, it isn't especially tied to it now.

LKML would presumably be the place for Linux to announce when they adopt this.

The Google open source blog is among the several credible options for this post, since Google employs much of the core Git team, and this post discusses their experience deploying Git protocol v2 at Google.

As noted in the blog text, it's not in a released version of Git yet, just Git master branch. So maybe it'll appear on a dedicated Git announcement list, if any, once that happens.


Junio posts on the list when there's a new release, e.g. https://public-inbox.org/git/xmqqwoxw6kkk.fsf@gitster-ct.c.g...

It seems that https://groups.google.com/forum/#!forum/git-packagers is the closest thing to a formal announcement list that there is.


Okay, I guess that tie continues for historical reasons. At least people who don't otherwise subscribe to LKML can still receive Git release announcements via the second link.

I presume Git 2.18 (the first release supporting protocol v2) will be announced via both channels once it's out.


> support for v2 was recently merged to Git's master branch and is expected to be part of Git 2.18

Not yet, but presumably there will be a post like this: https://lkml.org/lkml/2018/4/2/425 when it is released. It is strange that the Google Blog was the first place to announce it, though.


As mentioned in another comment, protocol v2 was implemented by a Google employee, and they decided to write a blog post about it. This is not an official git announcement.


I found a mention of it in a "what's cooking" post on the git mailing list (Message-ID <xmqqvabm6csb.fsf@gitster-ct.c.googlers.com>). But I can't find a direct link on gmane.com right now.


Git didn't have a proper version number or extensibility field in its protocol? That's quite a bit of hubris.


Or, more likely, an oversight.


Hmm, I haven't designed very many data formats or wire protocols, and I won't claim I got it right any of those times, but I included some kind of extension possibility every time.


I damn near released a (private) message protocol without a version field a couple months ago, and I know better. Fortunately stopped myself and added it before any actual data got released.


Git was a 10-day urgent project. Given the timeframe, it's done remarkably well.



