Go Module Mirror and Checksum Database Launched (golang.org)
254 points by ingve 48 days ago | 118 comments

Vendor. Everything. Always.

I appreciate the fact that Google is investing in infrastructure to make the Go ecosystem robust, I really do.

Having said that I encourage everyone to do a simple experiment.

Check out your code into some random directory. Copy the directory to a USB drive and walk it over to an air gapped machine (no wi-fi, no ethernet, clean dev environment installed). Copy the directory to the box, make some small code change and try to build your binary.

If your build fails you're doing it wrong.

In real life one of the following will happen[1]:

- Your network connection to the Internet will be down

- Your network connection to the corporate network will be down

- The services you depend on (GitHub, Google proxy, Docker Hub) will be down

- The services you depend on will throttle you

- The services you depend on will serve you wrong or corrupt data

- The services you depend on will be offline for maintenance

- The certificates on the service you depend on were not renewed in time

- The account you use on services you depend on will be inexplicably suspended

Murphy's law being what it is, some or all of the above will happen when your SaaS is on fire, your customers are screaming, and you're trying to roll out a fix. Meanwhile your company is losing approximately your personal lifetime earnings in revenue every half an hour.

Vendor. Everything. Always.

(and have backups, lots of backups)

[1] Based on highly educational career experience

I don't see any of those as reasons to commit dependencies to the same repo your source code lives in.

Your first point is particularly laughable. If your network connection to the internet is down and you're a SaaS provider, you're probably going to need to fix that before you can roll out a fix to your hosted software.

That said, they're all good points to consider when building out a deployment pipeline, and in general mirroring your dependencies is a great idea. But committing a vendor directory is a poor solution/mitigation for most of those risks.

> Your first point is particularly laughable. If your network connection to the internet is down and you're a SaaS provider, you're probably going to need to fix that before you can roll out a fix to your hosted software.

Don't assume that your SaaS runs on the same network as your build machines. In most cases it doesn't - or at least it shouldn't. Your SaaS might be broken in some way, you're trying to create a build to fix it on your secure corp network, and your connectivity to the Net disappears. And it's 3 AM in San Francisco, everyone who can fix it is asleep, and meanwhile your European customers are not happy.

Let's be clear. You can mitigate the effect of all these problems one way or another. But the mitigations are easier if you have fewer moving parts and fewer dependencies on systems you don't control.

Let's face it, ultimately it all comes down to limiting your business risk at least cost (because money). One simple way to do that is to have your build environment travel with your code down the years.

> Don't assume that your SaaS runs on the same network as your build machines.

That wasn't the assumption. It's laughable because under any condition of where your hosting and build infrastructure live, you'll need to restore your internet connection before you can deploy whatever software fix you're trying to ('deploy' in the sense that your customers have access to the deployed fix).

> I don't see any of those as reasons to commit dependencies to the same VCS repo your source control lives in.

Why not? What's the downside, except having an "update dependency" commit from time to time?

It's a great solution to always have your dependencies at hand, offline and even versioned. Makes it easy to see when a bug got introduced by a dependency, or to roll back an update. Especially for projects that don't have a build pipeline or cached dependencies, or for smaller dependencies that aren't well mirrored elsewhere.

I think it's a great way to make sure things are still running after a few years (personal projects) and everything still works without having to hunt down some dependencies or find alternative mirrors.

> What's the downside, except having an "update dependency" commit from time to time?

This isn't a downside, it's a chance to vet and code review your dependencies which you should be doing anyways (but nobody does).

They themselves are not, but let's look at another consideration. How will you catch a backdoor added to one of your dependencies?

- Scan them and hope someone else noticed and reported it before you used it?

- Assume your teammates reviewed them when they were updated?

- Add SIEM-like capabilities and monitor network connections?

It should really be all three. There are counterpoints of course, but if you vendor them, they'll be in a PR and reviewed just the same as the rest of your code.

Of course, where do you stop? You trust your distro to do the right thing and you don't review that code, etc. So the more generic question is why do you trust the golang modules you use?

How is it a poor solution? It is extremely simple, pragmatic and effective.

I don't think it's a poor solution. I have had great success keeping all of GOPATH in git. It's just easier to deploy new packages. The build machine doesn't even need internet access at this point.

I'm not sure a solution that solves 100% of the problems, with minimal to no side effects, is a poor solution.

There are definitely downsides - a gigantic git repo is a big one. Lengthy clones, far slower git operations, etc. Many projects I've seen go this route will exceed 10s or 100s of gigabytes pretty quickly, and git is very unhappy about sizes like that.

> Vendor. Everything. Always. [...] Your network connection to the corporate network will be down

I think this is a bit extreme, and it becomes very hard to manage dependencies department-wide. I can walk over to the storage machine if the corporate network is down. I know that some people want the entire internet, every git repo, every docker image, every game/binary asset, and every needed dependency available locally, but this is just not reasonable. This kind of hardline stance disregards the benefit of intranet stores, be it for code, docker images, whatever.

There is a middleground between not relying on third party services and requiring the ability to build after a fresh clone in a vacuum. Some builds simply require more than vendorable pieces, and just telling them they're doing it wrong is like telling someone they are doing it wrong relying on their corporate email servers instead of walking across office and delivering a hand-written note.

Maybe something like, "vendor code as much as you can" is a more reasonable approach than "always" this, "you're doing it wrong" that.

> I know that some people want the entire internet, every git repo, every docker image, every game/binary asset, and every needed dependency available locally, but this is just not reasonable.

Sure it is. In most enterprise outfits it's even policy. Ask yourself why that is.

I don't mean to be flippant about it, your point is actually quite reasonable. If you are a solo developer, or a small company, then don't worry about it. Github will be up, Google proxy will have your back, your Net connection will work. Once per year when all the stars align just wrong your builds might get delayed but it isn't worth worrying about it.

However in certain industries, if the delay during some incident is costing you more than a few digits per minute, even once per year, management starts to notice. Long term support contracts are also a thing, e.g. ensuring that your code still has the artifacts to build on Red Hat 5 or some such. If you think about it, that's nothing but outsourcing vendoring to a third party for large amounts of money.

While I agree with you, we in the Java and .NET communities enjoy decentralized library distribution, so what most Java/.NET shops end up doing is setting up their Nexus/NuGet servers in-house, plus proxies that ensure all teams actually use those servers.

It's just like vendoring, but in a single place, and all teams profit from the speed of access.

The only downside is the update process to get IT to add new libraries, as they tend to require license validation from the legal department.

Alternatively: have a shared cache. Share / replicate that around instead. It's just a local mirror, and you can even literally run it as a local mirror, so you can transparently convert your not-vendored-anything project into an equivalent-to-vendored-everything project without immensely bloating each git repo with duplicates.
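One hedged way to sketch that setup: the module cache's download/ directory already uses the proxy protocol's on-disk layout, so a replicated copy of it can be pointed at directly (the path below assumes the default GOPATH of $HOME/go; adjust for your machines):

```shell
# A replicated copy of the module cache's download/ tree can serve as a
# read-only local mirror, because its layout matches the proxy protocol.
export GOPROXY="file://$HOME/go/pkg/mod/cache/download"

# From here on, module fetches resolve against the local copy, e.g.:
#   go build ./...
# Replicate the tree between machines with plain file tools:
#   rsync -a ~/go/pkg/mod/cache/download/ buildhost:go/pkg/mod/cache/download/
```

This gets you equivalent-to-vendored builds without a vendor commit per repo.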

I thought vendoring was being deprecated in favor of modules. Maybe I misunderstood though.

That's kind of the problem - well, issue - that I'm worried about. Now you have to do some special things in your build environment to vendor stuff (set env variables, compiler options), and by default your build infrastructure is depending on supposedly always-on external services.

I rather think it should be the other way around. By default your build environment should gather all required artifacts locally, and only if you want to depend on external services should you have to create non-default options.

I would have been slightly happier if vendoring were the default happy path, and module proxies the alternative.

I wonder if Google folks are subconsciously influenced by the idea that all this infrastructure will exist forever and never decay because they have access to seemingly infallible and highly available Google services. The gradual decay of Perl CPAN (which I've always loved BTW) and the reliability/security issues with the Node ecosystem are instructive counterexamples.

I have to say that I'm not dogmatic about this, there are good cases to be made for certain features which go with module proxies. But ultimately if you've got a long running business you now have to do more work to make sure your code will still compile in 5 years - e.g. create and maintain a module proxy service in perpetuity. This as opposed to just archiving a bunch of files organized as a git repo.

Anyway, my experience says vendor all the things. You'll be glad you did.

> I thought vendoring was being deprecated in favor of modules. Maybe I misunderstood though.

The original proposal (or sketch) of modules eliminated vendoring. It was quickly updated to include support for vendoring in response to feedback.

The decision not to use vendoring (by default) has been controversial. That said, vendoring isn't "deprecated" in the sense that Go modules still allow for vendoring; they just happen not to vendor by default.

I would also say: vendor your tools.

There is a bit of a pattern emerging, where you add a "tools" package that just contains blank _ imports of the external Go tools you use (e.g. gometalinter). This way, when you run go mod vendor, you will also vendor your tools and can build them locally from your repo. This works quite well with a Makefile for now. There is an open issue tracking this vendor-tools approach.
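A minimal sketch of such a tools file. This is a build-constrained pin file, not a runnable program; the two tool paths are examples (gometalinter is the one mentioned above), so swap in whatever your project actually uses:

```go
// +build tools

// Package tools pins build-time tool dependencies so that
// `go mod vendor` carries them alongside library code.
// The build tag keeps this file out of normal builds.
package tools

import (
	_ "github.com/alecthomas/gometalinter" // linter, as mentioned above
	_ "golang.org/x/tools/cmd/stringer"    // example code generator
)
```

After vendoring, a Makefile target can then `go build` each tool out of ./vendor instead of fetching it.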

> Check out your code into some random directory. Copy the directory to a USB drive and walk it over to an air gapped machine (no wi-fi, no ethernet, clean dev environment installed). Copy the directory to the box, make some small code change and try to build your binary.

I don’t follow. Why is the code change important?

Why not just use the go module cache? Seems like a much cleaner solution with very little overhead.

The idea of making a change and rebuilding is to prove that the build process does not require a network connection and that you have properly resolved the dependencies on that machine.

The change is unnecessary. You can just rebuild.

If you’ve built once using the standard go module functionality then you will be able to rebuild as long as you don’t pull in more dependencies. Naturally you can move the cache around.
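A way to prove that property to yourself (a sketch, assuming a warm module cache from one prior online build): GOPROXY=off turns any attempted module download into a hard error, so a successful build demonstrates that everything resolved from local state.

```shell
# With GOPROXY=off, any attempted network fetch is a hard error; a
# successful build is therefore served entirely from $GOPATH/pkg/mod.
export GOPROXY=off
go build ./...

# The cache is just files, so it can be copied between machines, e.g.:
#   rsync -a ~/go/pkg/mod/cache/ airgapped-host:go/pkg/mod/cache/
```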

Yeah, we do - in our Docker repo. But I don't want my Dockerfiles polluted with checksums of modules I download. This allows me to build images safely, and then it's all vendored in our private Docker repo. This just makes our lives easier and our setup more intelligent.

I'm curious: do you (or is it policy to) vendor node_modules the same way you vendor Go modules?

This isn't usually done, because node_modules often contain machine specific (e.g. 32-bit vs 64-bit x86) or OS specific (e.g. Linux vs Mac) compiled native code.

Good luck deploying your vendored, air-gap-built binary.

I agree. Unfortunately you now have to go out of your way to vendor with modules enabled which is soon the default.

You can create the vendor directory with "go mod vendor" and have everything use that vendor directory by having "-mod=vendor" in your GOFLAGS env variable.

Is there something I'm missing?
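The workflow described above, sketched out (run from the root of a module; the build line assumes your dependencies vendor cleanly):

```shell
# Materialize dependencies into ./vendor, recorded in vendor/modules.txt.
go mod vendor

# Make every subsequent go command in this shell prefer the vendor tree.
export GOFLAGS=-mod=vendor

# Builds now read dependencies from ./vendor, not the network.
go build ./...
```

Putting GOFLAGS in CI config or a Makefile makes the choice travel with the repo.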

For anybody wondering if vgo is production ready, we've been using it since it came out in go 1.11 and so far didn't have any problems.

Existing workflows don't break because there's the vendor command and dependency management is instantaneous. You can also roll it out gradually as it doesn't have any problems depending on non-module repositories (using commit hashes and dates)

There's a great blog post series too if you're interested in the internals of go modules: https://research.swtch.com/vgo (it's also one of the good counterexamples to people claiming go ignored x years of PL research and history)

I found the editor support to be still somewhat hit and miss though. The VS Code Go extension still does not work as well on module-enabled projects, for example. Otherwise we did not face any issues either.

with the addition of the `gopls` tool (https://github.com/golang/go/wiki/gopls), this shouldn't be nearly as much of a problem in the coming years.

That's a great reason to try out Goland, which has had excellent module support since day 1.

If you can afford that $100-200/year subscription.

I bought it and stopped using it. It does work with modules, but I don't like the IDE itself and went back to vs code even though it breaks on me regularly. The UI/UX is just too foreign for me in GoLand. I had to search Google to find out how to search the project. Everything I wanted to do was non-discoverable for me.

CMD+SHIFT+F just like any other editor (vscode, sublime etc) out there?

Also you have the action menu on CMD+SHIFT+A, which will give you any command the IDE can offer.

It's as discoverable as vscode if not more (you can even use IDE Features Trainer plugin if you want to)

Pretty sure cmd+shift+f (maybe it was just cmd+f) would search the open file.

With Shift: search whole Project. Without: search in open editor

Well, GoLand is $89/yr for the first year, with their standard continuity discounts for further years. $200/yr is only for orgs - if you're paying for it yourself, that's not the price you should be getting.

(I've been using GoLand since when they were calling it Gogland - no complaints so far)

It's much better to get IntelliJ Ultimate, in case you code in other languages, which I assume, most do. And it comes with a full-featured DataGrip.

I mean, if you're a professional Go developer, then yes, you can? You could even expense it or write it off.

> You could even expense it or write it off.

If you only use it for your employer's code, sure. Otherwise it's a license violation. Personally, I think it's better to buy an individual license, which you can use for your day job as well as long as you are not compensated for it.

But that’s the point isn’t it? Goland clearly goes above and beyond what free and open source can provide, so don’t feel like you’re getting ripped off paying for an IDE.

Which most professionals are able to.

every 6 months I spend half a day trying to get gocode and whatnot to work with my IDE and still can't get a reliable fix. Go modules really messed up a lot of dev tooling

I had the same problem with Emacs, couldn't get it to work reliably.

Oh yeah, that's something I heard about from my coworkers. Goland had support for go modules since the 1.11 prerelease as far as I remember, so I hadn't experienced it myself.

However, most tooling works if you use go mod vendor prior to using it, that was the case for me at least when using linters and similar stuff.

I also got this problem when using modules with go mod but then I did "go mod vendor" and all extensions are now happy again for that project! :)

> For anybody wondering if vgo is production ready, we've been using it since it came out in go 1.11 and so far didn't have any problems.

As a 2nd datapoint: We've been using it since 1.11.2, and have had problems, but have had no problems since 1.12.

Should anyone be using vgo anymore? The regular go command supports modules now.

Indeed, I meant go modules saying vgo.

I like the simple plain text signing format for the API calls.

For example: https://sum.golang.org/lookup/go.etcd.io/etcd@v0.4.0

Is signed with:

  — sum.golang.org Az3grobHchAJWrV4M34o1kLnZV4vrGSfFA+2Q9VClbmWqBjsnN4GzK1xB1RaYGSo0jIjWH9GDcR3Tja5sadw2ESoKwg=
The format is documented here: https://godoc.org/golang.org/x/mod/sumdb/note
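As a rough illustration of how little machinery the format needs, here is a stdlib-only sketch that splits one signature line into a key name and raw signature bytes. This is not the real parser (that, plus actual verification, lives in golang.org/x/mod/sumdb/note) and it does no cryptographic checking:

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// parseSigLine splits one signature line of the signed-note format,
// "— <key name> <base64(key hash || signature)>", into its parts.
// Sketch only: no verification is performed here.
func parseSigLine(line string) (name string, sig []byte, err error) {
	const prefix = "— " // em dash + space, per the note format
	if !strings.HasPrefix(line, prefix) {
		return "", nil, fmt.Errorf("not a signature line: %q", line)
	}
	fields := strings.SplitN(strings.TrimPrefix(line, prefix), " ", 2)
	if len(fields) != 2 {
		return "", nil, fmt.Errorf("missing signature: %q", line)
	}
	sig, err = base64.StdEncoding.DecodeString(fields[1])
	return fields[0], sig, err
}

func main() {
	line := "— sum.golang.org Az3grobHchAJWrV4M34o1kLnZV4vrGSfFA+2Q9VClbmWqBjsnN4GzK1xB1RaYGSo0jIjWH9GDcR3Tja5sadw2ESoKwg="
	name, sig, err := parseSigLine(line)
	fmt.Println(name, len(sig), err)
}
```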

Also, as an aside, I am building a transparency tool for arbitrary binary downloads with the rget project: https://github.com/merklecounty/rget

I really like rget, btw.

I posted a trivial little command-line demo of a client and server for some others who were thinking about a transparency tool. It implements the cacheable GET API we designed for the checksum database but obviously it applies to arbitrary key-value pairs. It might be a better foundation than stuffing things into CT logs (or not, you do get some interesting infrastructure that way).



I am almost certain I am going to need to move off of the CT log thing but I hope to onboard a few projects first[1].

But tlogdb looks like a great starting point (sent you a PR if you have a moment). I am trying to understand how all of the parts fit together at the moment, particularly whether I need to use Trillian or whether a direct SQL store on Spanner/CockroachDB, like the one tlogdb implements, is sufficient for a while.

The big advantage of implementing something like tlogdb is that it becomes a URL map which means that developers don't need to publish additional metadata (at the expense of the service needing to download entire files to calculate the digests).

All of that to say thanks for all of the work on sumdb. Great starting point.

[1] https://github.com/merklecounty/rget/issues/10

The format feels ad-hoc to me. These are intended almost exclusively for machine-reading, so why not JSON? One of the proxy endpoints (version metadata) returns JSON, so why not this one? If I were writing the client, I'd much prefer to parse a structured JSON object than that text file. So I'm curious why this choice -- I'm probably missing something.

In a security-critical context, I would much rather use a format where every byte is specified than delegate to something with as much flexibility as JSON. I mean, I guess we could have used a JSON object like {"Msg": "text", "Sig": "signature"} but there's so much opportunity for surprises there and so little benefit to JSON.

Is there a justifiable need for the proxy? It seems like we’re just adding a centralized point of failure, the lack thereof something I have absolutely touted as a feature.

For verification we already have the go.sum file validating the hash. We’ve got vendoring if you’re concerned about dependencies disappearing.

The touted speed feature seems like a non-issue? The absolute largest Go package I’ve ever pulled took maybe a minute to grab its dependencies, and then I had them and never had to again.

The cynical side of me feels like this is a way for Google to collect analytics on the Go ecosystem. The slightly less cynical part of me thinks people from other ecosystems demanded it due to cargo culting the idea.

The fact that it makes it more difficult to use private modules is pretty irritating as well, as I am going to need to make sure all our developers and build systems are aware when we move to 1.13. Having to set GOPRIVATE on each machine is absurd. It should be definable in the go.mod so it can be known via git.

> The cynical side of me feels like this is a way for Google to collect analytics on the Go ecosystem

> other ecosystems demanded it due to cargo culting the idea

And there it is, the hallmark of a typical comment on a thread like this

* Cynical

* Expresses concern about data being collected even though this data can't possibly be used for anything bad. (If you disagree, go to crates.io and tell me what concerns you have).

* Expresses disapproval of a large corporation even though they haven't done anything wrong here.

* Pithy dismissal of other programmers by calling them names ("cargo cult"). This change improves speed of downloads by 6-7x on poor connections and 3x on good connections. Clearly this is helpful for situations like CI which typically involve clean builds. What's the issue here exactly?

* Furious attack on a free product that's completely optional to use. If you want to keep vendoring your dependencies, go ahead and do that. Why are you angry that some people are going to use this proxy?

"Furious attack". For serious? I gave a light hearted criticism of adding a single lynchpin to the entire Go ecosystem. I think that's a legitimate source of concern.

I was by no means pithy, I was by no means being dismissive. Thinking you need something just because you had it elsewhere is the definition of cargo culting - and is something the Go developers have otherwise largely avoided.

Just because something is faster doesn't make it better or worth it, particularly when it comes with tradeoffs and is already really fast - which is my argument - that the tradeoffs aren't worth the benefit.

This is not a single point of failure. It's by no means a legitimate concern. Go read the proposal. If you don't want to use proxy.golang.org, use goproxy.io. Or host your private one. Fairly sure you'll be able to use github soon as well. Do whatever.

I notice you couldn't back up your FUD about analytics being collected.

Go programmers are happy that they have a solution where their dependencies (and specific versions of those dependencies) will be available forever without them having to take the trouble of vendoring. Your dismissive response? Just vendor. They're happy that clean builds are faster, saving CI time. Your response? It doesn't need to be faster.

You're doing the same thing you're criticising, being dismissive of my concern.

> If you don't want to use proxy.golang.org, use goproxy.io

Yes, I can do that, but as I mentioned in my original post, having to get an entire team of people and a fleet of CI systems to use a non-standard configuration which is not communicable in the project repo is a minor PITA.

If and when proxy.golang.org has a bad day, builds will fail. Go developers will have a bad day. Having the ability to work around it doesn't make it not a lynchpin.

As for the "FUD about analytics being collected" it was an offhanded quip. I thought it read pretty clearly as not entirely serious hence the prefix "The cynical side of me feels" and not "I feel". It doesn't merit the effort of defending.

If you read the blog post and the privacy policy, you'd see that they are collecting analytics and requests in logs, including failed private-module lookups.

> (If you disagree, go to crates.io and tell me what concerns you have).

We had a number of people express concerns with data tracking on crates.io, including google analytics, GDPR compliance, etc. We've since removed GA and had to talk to lawyers about GDPR, etc.

That's reasonable. I can see how GDPR would be tricky considering that a core feature is prevention of yanking of crates, a right that GDPR guarantees.

I can also see why people might have concerns about Google Analytics. I'm happy that crates.io has removed it.

But this is likely not the point that GP was making. His FUD about analytics was somehow implying that module download statistics could somehow be used for nefarious purposes.

It is naive to suggest that this data can't possibly be used for anything bad. This change results in Google collecting a lot more information about the software that is being created outside of Google. After this change, Go developers working on private projects will end up regularly transmitting their lists of dependencies to Google, unless the risks are properly understood and preventative steps are taken.

Google's business is collecting and monetizing information. It is entirely reasonable and appropriate to express cynicism about new initiatives that result in mass data collection.

That's a good example of cargo culting.

(Also the downvotes this message is getting. Nobody can dare to question The Google.)

What’s cargo culting exactly? Seriously asking

Thank you sir!

Paragraphs 3, 4 & 5 of TFA speak directly to the benefits of a module proxy and mirror w.r.t. module availability & performance.

And the Checksum Database section of the post explains how it provides security above and beyond a go.sum file. For example, it enables untrusted proxies, guarantees that targeted attacks are not possible, and provides accountability for the operator.

The talk is also an excellent explanation of why proxy and sumdb are valuable. (EDIT: direct link to the recording https://youtu.be/KqTySYYhPUE)

Which is what I’m attempting to refute the need for. These are solved and non-problems respectively.

You're saying things work fine for you, but it doesn't follow that they work fine for everyone else. Maybe there are larger Go modules out there than the largest one you've personally built? Or people with slower network access to Github? (Consider China and Australia.) Or maybe they're anticipating that there will be larger Go projects in the future?

Most Go projects are hosted on Github. Github is backed by a global CDN and can handle significantly large projects. I don't think the focus of this project was to address slow network access. It was primarily driven by the need to analyze dependency usage and perhaps add redundancy to some extent.

Fetching a module from GitHub often requires cloning the repository, when all you needed was the contents of a specific version, or even just the go.mod to do version resolution. The proxy protocol provides the artifacts you need, for sometimes a 10x bandwidth saving.

(Also, lots of security issues in "go get" came from using git to fetch untrusted repositories.)

Ok that makes sense. Thanks! Also, this sounds like a perfectly neat use case for svn sparse check-out.

The point about go.mod only protecting you from changes after you first use it is valid.

Another reason is what if upstream deletes the package? What if github goes down?

Still, maybe we are better off vendoring packages. At least that would make us consider our use of deps.

GitHub is already the central point of failure for the vast majority of the ecosystem, and that is run by a company that has no incentive to care about Go.

Guess who supports VS Code Go plugin and has sponsored the work on Go debugging tools like Delve?

I don't have the link handy, but I remember someone posting very impressive speed improvements.

More important though, to those of us writing and/or stewarding Go code at companies that deal with financial information, these kind of safeguards are absolutely necessary.

You can be cynical about Google's motivations, sure, but doing this to track you harder is not consistent with the conversation and approach they've taken with these features.

That might have been me: https://twitter.com/zekjur/status/1154934952477093888

> Oh, wow! Setting GOPROXY=https://proxy.golang.org results in a 7x speed up for module installation on the hotel WiFi: from 17.7s (no proxy) to 2.3s (http://proxy.golang.org)

My experience from building docker images of Go programs from Github repositories is that Github very aggressively rate limits you. More than one "go build ..." per 10 minutes and it just randomly won't send you the repository contents anymore. So a module proxy was a must. (You could also vendor dependencies, which is my preference, but my team was strongly opposed to vendoring. So we used Athens.)

Are you pulling as an authenticated GitHub user? The rate limit for an authenticated user is 5000 pulls per hour, unauthenticated is 60.

No I am not. I don't even know how to make "go build" authenticate to Github. (It's not my code I'm pulling, it's the random libraries my code depends on.)

This is why the proxy exists.

"go build" doesn't fetch the dependencies, does it? Go uses the local settings (i.e. the .netrc file in your home directory).

go build will fetch the dependencies if you don't have them. However, they're stored in the module cache under $GOPATH (which now defaults to ~/go, I believe), so chances are you already have them.
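For completeness, the .netrc setup mentioned above looks roughly like this (the username and token are placeholders; this appends to any existing file):

```shell
# cmd/go authenticates its HTTPS fetches with ~/.netrc credentials, which
# lifts GitHub's unauthenticated rate limit for dependency downloads.
cat >> "$HOME/.netrc" <<'EOF'
machine github.com
login your-github-username
password ghp_personal_access_token_here
EOF

# .netrc holds a secret, so restrict its permissions.
chmod 600 "$HOME/.netrc"
```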

A package repository can go down, vanish, versions can be deleted.

Which as I mentioned is why we have `go mod vendor` if that’s a concern.

Some people don't like vendoring and would prefer keeping their dependencies separate.

Not to mention that I used to be able to get my own metrics (eg. by tailing access logs on the vanity import path server, or checking stats on my version control service or server, etc.) now only Google has that information and module owners are left in the dark about how many times their code is being hit.

They also aren't being very forward thinking. The Go team keeps saying "we promise no Google exec is doing anything nefarious with that" but we don't know who will be the execs next year, or 10 years from now or what they'll do with the power the Go team just gave them by being so short sighted.

Marking packages as private in go.mod is a good idea. Bring it up on the issue tracker.

The default behavior is to fall back to fetching from source control directly if the proxy fails.
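Concretely, that default is expressed as a fallback list: in Go 1.13, GOPROXY is a comma-separated sequence of sources (the values below are the documented defaults):

```shell
# Try the mirror first; "direct" means fetch straight from the module's
# origin version-control server if the proxy doesn't have it.
export GOPROXY="https://proxy.golang.org,direct"

# The companion default for checksum verification.
export GOSUMDB="sum.golang.org"

# Opting out entirely:
#   GOPROXY=direct   # never touch the mirror
#   GOPROXY=off      # forbid all module downloads
```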


> ... this is a way for Google to collect analytics on the Go ecosystem.

Yeah. It's extremely tone deaf of the Go dev team to pull this crap.

Note - setting `export GOPROXY=direct` in your .bashrc (on *nix Bash) should stop the info collection.

Unless a "feature" is added to stop that working too.

Making the feature optional is a pretty normal way of catering to the minority of people who are more risk adverse and won't use something like this. Giving up entirely is not.

Optional yeah, but enabled by default.

"Below is an example of such a tree"... and there is a picture of a binary tree, with no indication as to how the tree is used.

I appreciate trying to explain the interesting technology behind authenticating modules, but this diagram appears to explain nothing at all.

There was a talk at GopherCon that I believe goes into more detail about how this works and the cryptography behind it (I've only skimmed it so far): https://youtu.be/KqTySYYhPUE

This is probably a dumb question, but here goes... I'm testing go modules with an internal project that uses a private github.com repo. It depends on other public go packages and modules is nice to manage those but the general public will probably never see or want to use this code.

Are modules in private repos impacted by this in any way? Would they show up in the index somehow?

If the private one is the main module, you don't have to do anything, it doesn't get sent anywhere because it's just what's on your disk. Indeed, you can call it "module foo", which is not a go-gettable name.

If you _depend on_ private modules, you need to follow these instructions: https://tip.golang.org/cmd/go/#hdr-Module_configuration_for_...

In any case, private modules never show up in the index, mirror, or checksum database.

...but references to them are stored by google in logs and used per the privacy policy at https://proxy.golang.org/privacy

See here[1]. From what I've gathered, you should set GOPRIVATE appropriately, otherwise the Go Module Mirror(s) and the Go Checksum Database(s) may know that you have a private repo called “github.com/you/very-private”. It could also probably reason about your dependencies a little bit.

[1]: https://tip.golang.org/cmd/go/#hdr-Module_configuration_for_...
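For example, something like this (a sketch; the repo paths here are placeholders, not real instructions from the linked docs):

```shell
# Glob patterns of module paths to treat as private: they are fetched
# directly from origin, never sent to the proxy, and never checked
# against the public checksum database.
export GOPRIVATE='github.com/you/*,*.internal.example.com'

# GOPRIVATE is effectively shorthand for setting both GONOPROXY and
# GONOSUMDB to the same patterns.
```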

Yes, I don't mind if it shows up in the index. I have nothing to hide.

That just seems wasteful and clutterish. If too much of that occurs, the index could be littered with it. Sort of like old PGP keys that never go away.

In 10,000 years, the Go code will be lost, but bored future people will reconstruct it from its hash through sheer brute-force searching.

More seriously, if the hashes turn out to have serious security flaws, someone may be able to reconstruct the file someday.

> More seriously, if the hashes turn out to have serious security flaws, someone may be able to reconstruct the file someday.

Unlikely. The pigeonhole principle applies here: there are many equally likely Go source files that hash to the same thing (regardless of hash function).

Unless the inputs were really adversarially crafted before hashing, though, the pigeonhole principle will mostly find colliding files that are not syntactically correct Go, or that are of unrealistic length.
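A rough counting sketch of why collisions must exist at all (using a 256-bit hash for concreteness; the specific lengths are just illustrative):

```latex
\underbrace{256^{n}}_{\text{byte strings of length } n} \;=\; 2^{8n}
  \;>\; 2^{256} \;=\; \underbrace{\text{number of hash outputs}}_{\text{256-bit digest}}
  \qquad \text{whenever } n > 32 .
```

So even among 33-byte files, some pair must collide. The catch, as noted in the parent comments, is that virtually none of the colliding preimages are plausible Go source.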

I filed a bug that this was not spoken about in the announcement.

It seems like it will cause a lot of drama as this actually rolls out.


You have to use the GOPRIVATE env variable to define what not to use the proxy for. https://tip.golang.org/cmd/go/#hdr-Module_configuration_for_...

Thank you for this. It seems they've thought of almost everything we do.

Also, I would like to bring up that since JFrog's module proxy, GoCenter, was publicly hosted first, it will more than likely have been the first to host a given module version. That means you are now open to the situation where someone uploaded v1.0 of a module to GoCenter and v1.0 of a module to Google's mirror, and they contain different code.

I am not sure how the Go team plans to handle this, but the only obvious solution would be for Google's proxy to reach out to GoCenter first to see if it has a copy of a specific version, then download, cache, and serve that. Does anyone know if that happens?

Why would two versions - specifically tied to a given commit - of say "v1.2.3" contain different code?

Well, you mentioned it in your comment: tags are not specific commits. Go modules standardize around SemVer and fall back to hashes only when needed. The reality is that anyone can `git push -f` and change what a tag currently points to.

So the scenario goes like this:

1. Someone requests a module from GoCenter that is currently not in its caches.

2. GoCenter downloads the requested version v1.2.3 and stores it.

3. Someone bad/careless/ignorant/evil (depending on circumstances, I guess) does a `git push -f`.

4. The module is requested from Google's proxy, which now loads a "v1.2.3" into its local caches, but it contains different code than GoCenter's "v1.2.3".

Outside of git push -f, there are other hacky techniques you can use to "update" the code a tag points to.
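A quick local demonstration of the tag mutability being described (a sketch in a throwaway repo; `git tag -f` is the local half of the operation, and a forced push would publish the moved tag):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo

# First commit, tagged v1.2.3.
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m "first"
git tag v1.2.3
first=$(git rev-list -n1 v1.2.3)

# New commit, then force the SAME tag name onto it.
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m "second"
git tag -f v1.2.3
second=$(git rev-list -n1 v1.2.3)

echo "v1.2.3 moved: $first -> $second"
```

After this, any cache that resolved `v1.2.3` before the retag holds different code than one that resolves it afterward, which is exactly the GoCenter-vs-Google divergence above.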

Hmmm, maybe my understanding of tags is incorrect.

When I tag something in git, that is for a specific commit, isn't it?


A Google privacy policy that's actually reasonable.

Does anyone know if the Go proxy server is using the original download protocol[0] defined by vgo?

    GET baseURL/module/@v/list fetches a list of all known versions, one per line.
    GET baseURL/module/@v/version.info fetches JSON-formatted metadata about that version.
    GET baseURL/module/@v/version.mod fetches the go.mod file for that version.
    GET baseURL/module/@v/version.zip fetches the zip file for that version.

[0] https://research.swtch.com/vgo-module
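For example, against the public mirror the endpoint layout looks like this (a sketch; github.com/pkg/errors is just an arbitrary all-lowercase module path chosen for illustration — uppercase letters in module paths get escaped in this protocol):

```shell
base=https://proxy.golang.org
mod=github.com/pkg/errors

echo "$base/$mod/@v/list"          # all known versions, one per line
echo "$base/$mod/@v/v0.8.1.info"   # JSON metadata for that version
echo "$base/$mod/@v/v0.8.1.mod"    # the go.mod file for that version
echo "$base/$mod/@v/v0.8.1.zip"    # the module source as a zip
# e.g.: curl -s "$base/$mod/@v/list"
```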

Yes, see also https://youtu.be/WDbbIS7m9bU?t=298 (sorry, slide deck not yet available it seems)

This looks great.

I'm interested in the possibility of running a private mirror. Does anybody know if they've released the source for proxy.golang.org? I can't seem to find it in the golang GitHub org.
