I appreciate the fact that Google is investing in infrastructure to make the Go ecosystem robust, I really do.
Having said that I encourage everyone to do a simple experiment.
Check out your code into some random directory. Copy the directory to a USB drive and walk it over to an air gapped machine (no wi-fi, no ethernet, clean dev environment installed). Copy the directory to the box, make some small code change and try to build your binary.
If your build fails you're doing it wrong.
In real life one of the following will happen:
- Your network connection to the Internet will be down
- Your network connection to the corporate network will be down
- The services you depend on (GitHub, the Google module proxy, Docker Hub) will be down
- The services you depend on will throttle you
- The services you depend on will serve you wrong or corrupt data
- The services you depend on will be offline for maintenance
- The certificates on the service you depend on were not renewed in time
- The account you use on services you depend on will be inexplicably suspended
Murphy's law being what it is, some or all of the above will happen when your SaaS is on fire, your customers are screaming, and you're trying to roll out a fix. Meanwhile your company is losing approximately your personal lifetime earnings in revenue every half an hour.
Vendor. Everything. Always.
(and have backups, lots of backups)
 Based on highly educational career experience
Your first point is particularly laughable. If your network connection to the internet is down and you're a SaaS provider, you're probably going to need to fix that before you can roll out a fix to your hosted software.
That said, they're all good points to consider when building out a deployment pipeline, and in general mirroring your dependencies is a great idea. But committing a vendor directory is a poor solution/mitigation for most of those risks.
Don't assume that your SaaS runs on the same network as your build machines. In most cases this is not the case - or at least it shouldn't be. Your SaaS might be broken in some way, you're trying to create a build to fix it on your secure corp network and your connectivity to the Net disappears. And it's 3 AM in San Francisco and everyone who can fix it is asleep, and meanwhile your European customers are not happy.
Let's be clear. You can mitigate the effect of all these problems one way or another. But the mitigations are easier if you have fewer moving parts and fewer dependencies on systems you don't control.
Let's face it, ultimately it all comes down to limiting your business risk at least cost (because money). One simple way to do that is to make sure your build environment travels with your code down the years.
That wasn't the assumption. It's laughable because under any condition of where your hosting and build infrastructure live, you'll need to restore your internet connection before you can deploy whatever software fix you're trying to ('deploy' in the sense that your customers have access to the deployed fix).
Why not? What's the downside, except having an "update dependency" commit from time to time?
It's a great solution to always have your dependencies at hand, offline and even versioned. It makes it easy to see when a bug was introduced by a dependency, or to roll back an update. That's especially true for projects that don't have a build pipeline or cached dependencies, or for smaller dependencies that aren't well backed up by other people.
I think it's a great way to make sure things are still running after a few years (personal projects) and everything still works without having to hunt down some dependencies or find alternative mirrors.
This isn't a downside, it's a chance to vet and code review your dependencies, which you should be doing anyway (but nobody does).
- Scan them and hope someone else noticed and reported before you used it?
- Assume your team mates reviewed them when they were updated?
- Add SIEM like capabilities and monitor network connections?
It should really be all 3. There are counterpoints of course, but if you vendor them, they'll be in a PR and reviewed just the same as the rest of your code.
Of course, where do you stop? You trust your distro to do the right thing and you don't review that code, etc. So the more generic question is why do you trust the golang modules you use?
I think this is a bit extreme and becomes very hard to manage dependencies department wide. I can walk over to the storage machine if corporate network is down. I know that some people want the entire internet, every git repo, every docker image, every game/binary asset, and every needed dependency available locally, but this is just not reasonable. This kind of hardline stance disregards the benefit of intranet stores, be it for code, docker images, whatever.
There is a middle ground between not relying on third-party services and requiring the ability to build after a fresh clone in a vacuum. Some builds simply require more than vendorable pieces, and just telling people they're doing it wrong is like telling someone they are doing it wrong by relying on their corporate email servers instead of walking across the office and delivering a hand-written note.
Maybe something like, "vendor code as much as you can" is a more reasonable approach than "always" this, "you're doing it wrong" that.
Sure it is. In most enterprise outfits it's even policy. Ask yourself why that is.
I don't mean to be flippant about it, your point is actually quite reasonable. If you are a solo developer, or a small company, then don't worry about it. Github will be up, Google proxy will have your back, your Net connection will work. Once per year when all the stars align just wrong your builds might get delayed but it isn't worth worrying about it.
However, in certain industries, if the delay during some incident is costing you more than a few digits per minute, even once per year, management starts to notice. Long-term support contracts are also a thing, e.g. ensuring that your code still has the artifacts to build on Red Hat 5 or some such. If you think about it, that's nothing but outsourcing vendoring to a third party for large amounts of money.
Just like vendoring, but in a single place, and all teams profit from the speed of access.
The only downside is the update process to get IT to add new libraries to it, as they tend to require license validations from legal department.
I rather think it should be the other way around. By default your build environment should gather all required artifacts locally, and only if you want to depend on external services should you have to create non-default options.
I would have been slightly happier if vendoring were the default happy path, and module proxies were an alternative.
I wonder if Google folks are subconsciously influenced by the idea that all this infrastructure will exist forever and never decay, because they have access to seemingly infallible and highly available Google services. The gradual decay of Perl CPAN (which I've always loved BTW) and the reliability/security issues with the Node ecosystem are instructive counterexamples.
I have to say that I'm not dogmatic about this, there are good cases to be made for certain features which go with module proxies. But ultimately if you've got a long running business you now have to do more work to make sure your code will still compile in 5 years - e.g. create and maintain a module proxy service in perpetuity. This as opposed to just archiving a bunch of files organized as a git repo.
Anyway, my experience says vendor all the things. You'll be glad you did.
The original proposal (or sketch) of modules eliminated vendoring. It was quickly updated to include support for vendoring in response to feedback.
The decision not to use vendoring (by default) has been controversial. That said, vendoring isn't "deprecated" in the sense that Go modules still allow for vendoring; they just happen not to vendor by default.
There is a bit of a pattern emerging, where you add a "tools" package that just contains blank _ imports of the external Go tools you use (e.g. gometalinter).
This way, when you run go mod vendor, you will also be able to vendor your tools and build them locally from your repo.
This works quite well with a Makefile for now. There is an open issue tracking this vendor-tools approach
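The tools file itself is tiny. A sketch (gometalinter's repo path is used as the example import; any tool works the same way, and the "tools" build tag keeps the file out of ordinary builds):

```go
// +build tools

// Package tools exists only so the module system treats our build-time
// tools as dependencies; the "tools" build tag above excludes this file
// from normal compilation.
package tools

import (
	_ "github.com/alecthomas/gometalinter"
)
```

After a go mod tidy, the tool is recorded in go.mod and go mod vendor copies its source into vendor/, so it can be built from the vendored copy offline.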
I don’t follow. Why is the code change important?
Why not just use the go module cache? Seems like a much cleaner solution with very little overhead.
If you’ve built once using the standard go module functionality then you will be able to rebuild as long as you don’t pull in more dependencies. Naturally you can move the cache around.
Is there something I'm missing?
Existing workflows don't break because there's the vendor command and dependency management is instantaneous. You can also roll it out gradually as it doesn't have any problems depending on non-module repositories (using commit hashes and dates)
There's a great blog post series too if you're interested in the internals of go modules: https://research.swtch.com/vgo (it's also one of the good counterexamples to people claiming go ignored x years of PL research and history)
Also you have the action menu on Cmd+Shift+A, which will give you any command the IDE can offer.
It's as discoverable as VS Code, if not more (you can even use the IDE Features Trainer plugin if you want to).
(I've been using GoLand since when they were calling it Gogland - no complaints so far)
If you only use it for your employer's code, sure. Otherwise it's a license violation. Personally, I think it's better to buy an individual license, which you can use for your day job as well as long as you are not compensated for it.
However, most tooling works if you run go mod vendor before using it; that was the case for me at least with linters and similar stuff.
As a 2nd datapoint: we've been using it since 1.11.2; we had problems early on, but have had none since 1.12.
For example: https://email@example.com
Is signed with:
— sum.golang.org Az3grobHchAJWrV4M34o1kLnZV4vrGSfFA+2Q9VClbmWqBjsnN4GzK1xB1RaYGSo0jIjWH9GDcR3Tja5sadw2ESoKwg=
Also, as an aside, I am building a transparency tool for arbitrary binary downloads with the rget project: https://github.com/merklecounty/rget
I posted a trivial little command-line demo of a client and server for some others who were thinking about a transparency tool. It implements the cacheable GET API we designed for the checksum database but obviously it applies to arbitrary key-value pairs. It might be a better foundation than stuffing things into CT logs (or not, you do get some interesting infrastructure that way).
I am almost certain I am going to need to move off of the CT log thing but I hope to onboard a few projects first.
But tlogdb looks like a great starting point (sent you a PR if you have a moment). I am trying to understand how all of the parts fit together at the moment, particularly whether I need to use Trillian, or whether a direct SQL store on Spanner/CockroachDB, like the one tlogdb implements, is sufficient for a while.
The big advantage of implementing something like tlogdb is that it becomes a URL map which means that developers don't need to publish additional metadata (at the expense of the service needing to download entire files to calculate the digests).
All of that to say thanks for all of the work on sumdb. Great starting point.
For verification we already have the go.sum file validating the hash. We’ve got vendoring if you’re concerned about dependencies disappearing.
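For context, each dependency gets two lines in go.sum: a hash of the module's file tree and a hash of just its go.mod. The hash values below are illustrative placeholders, not real digests:

```
golang.org/x/text v0.3.2 h1:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
golang.org/x/text v0.3.2/go.mod h1:BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB=
```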
The touted speed feature seems like a non-issue? The absolute largest Go package I’ve ever pulled took maybe a minute to grab its dependencies, and then I had them and never had to again.
The cynical side of me feels like this is a way for Google to collect analytics on the Go ecosystem. The slightly less cynical part of me thinks people from other ecosystems demanded it due to cargo culting the idea.
The fact that it makes it more difficult to use private modules is pretty irritating as well, since I am going to need to make sure all our developers and build systems are aware when we move to 1.13. Having to set GOPRIVATE on each machine is absurd. It should be definable in the go.mod so it can be known via git.
> other ecosystems demanded it due to cargo culting the idea
And there it is, the hallmark of a typical comment on a thread like this
* Expresses concern about data being collected even though this data can't possibly be used for anything bad. (If you disagree, go to crates.io and tell me what concerns you have).
* Expresses disapproval of a large corporation even though they haven't done anything wrong here.
* Pithy dismissal of other programmers by calling them names ("cargo cult"). This change improves speed of downloads by 6-7x on poor connections and 3x on good connections. Clearly this is helpful for situations like CI which typically involve clean builds. What's the issue here exactly?
* Furious attack on a free product that's completely optional to use. If you want to keep vendoring your dependencies, go ahead and do that. Why are you angry that some people are going to use this proxy?
I was by no means pithy, I was by no means being dismissive. Thinking you need something just because you had it elsewhere is the definition of cargo culting - and is something the Go developers have otherwise largely avoided.
Just because something is faster doesn't make it better or worth it, particularly when it comes with tradeoffs and is already really fast - which is my argument - that the tradeoffs aren't worth the benefit.
I notice you couldn't back up your FUD about analytics being collected.
Go programmers are happy that they have a solution where their dependencies (and specific versions of those dependencies) will be available forever without them having to take the trouble of vendoring. Your dismissive response? Just vendor. They're happy that clean builds are faster, saving CI time. Your response? It doesn't need to be faster.
> If you don't want to use proxy.golang.org, use goproxy.io
Yes, I can do that, but as I mentioned in my original post, having to get an entire team of people and a fleet of CI systems to use a non-standard configuration which is not communicable in the project repo is a minor PITA.
If and when proxy.golang.org has a bad day, builds will fail. Go developers will have a bad day. Having the ability to work around it doesn't make it not a lynchpin.
As for the "FUD about analytics being collected" it was an offhanded quip. I thought it read pretty clearly as not entirely serious hence the prefix "The cynical side of me feels" and not "I feel". It doesn't merit the effort of defending.
We had a number of people express concerns with data tracking on crates.io, including google analytics, GDPR compliance, etc. We've since removed GA and had to talk to lawyers about GDPR, etc.
I can also see why people might have concerns about Google Analytics. I'm happy that crates.io has removed it.
But this is likely not the point that GP was making. His FUD about analytics was somehow implying that module download statistics could somehow be used for nefarious purposes.
Google's business is collecting and monetizing information. It is entirely reasonable and appropriate to express cynicism about new initiatives that result in mass data collection.
(Also the downvotes this message is getting. Nobody can dare to question The Google.)
The talk is also an excellent explanation of why proxy and sumdb are valuable. (EDIT: direct link to the recording https://youtu.be/KqTySYYhPUE)
(Also, lots of security issues in "go get" came from using git to fetch untrusted repositories.)
Still, maybe we are better off vendoring packages. At least that would make us consider our use of deps.
More important though, to those of us writing and/or stewarding Go code at companies that deal with financial information, these kind of safeguards are absolutely necessary.
You can be cynical about Google's motivations, sure, but doing this to track you harder is not consistent with the conversation and approach they've taken with these features.
> Oh, wow! Setting GOPROXY=https://proxy.golang.org results in a 7x speed up for module installation on the hotel WiFi: from 17.7s (no proxy) to 2.3s (http://proxy.golang.org)
This is why the proxy exists.
They also aren't being very forward thinking. The Go team keeps saying "we promise no Google exec is doing anything nefarious with that" but we don't know who will be the execs next year, or 10 years from now or what they'll do with the power the Go team just gave them by being so short sighted.
Yeah. It's extremely tone deaf of the Go dev team to pull this crap.
Note - setting `export GOPROXY=direct` in your .bashrc (on *nix Bash) should* stop the info collection.
Unless a "feature" is added to stop that working too.
I appreciate trying to explain the interesting technology behind authenticating modules, but this diagram appears to explain nothing at all.
Are modules in private repos impacted by this in any way? Would they show up in the index somehow?
If you _depend on_ private modules, you need to follow these instructions: https://tip.golang.org/cmd/go/#hdr-Module_configuration_for_...
In any case, private modules never show up in the index, mirror, or checksum database.
That just seems wasteful and clutterish. If too much of that occurs, the index could be littered with it. Sort of like old PGP keys that never go away.
More seriously, if the hashes turn out to have serious security flaws, someone may be able to reconstruct the file someday.
Unlikely. The pigeonhole principle applies here: there are many equally likely Go source files that hash to the same thing (regardless of hash function).
It seems like it will cause a lot of drama as this actually rolls out.
I am not sure how the Go team plans to handle this, but the only obvious solution would be for Google's proxy to reach out to GoCenter first to see if they have a copy of a specific version, download and cache, and serve that. Does anyone know if that happens?
So the scenario goes like this:
1. Someone requests a module from GoCenter that is currently not in its cache.
2. GoCenter downloads the requested version v1.2.3 and stores it.
3. A bad/careless/ignorant/evil (depending on circumstances, I guess) maintainer runs git push -f.
4. The module is requested from Google's module proxy, which now loads a "v1.2.3" into its local cache, but it contains different code than GoCenter's "v1.2.3".
Outside of git push -f, there are other hacky techniques you can use to "update" the code a tag points to.
When I tag something in git, that is for a specific commit isn't it?
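It points at a specific commit only until someone moves it. A quick local demo of how a tag gets silently re-pointed (repo and tag names are made up):

```shell
git init demo && cd demo
git config user.email you@example.com && git config user.name you

git commit --allow-empty -m "first"
git tag v1.2.3                  # v1.2.3 -> "first"

git commit --allow-empty -m "second"
git tag -f v1.2.3               # v1.2.3 now -> "second"; no trace of the old target
# git push -f origin v1.2.3     # ...and published over the old tag upstream
```

Anyone who fetched v1.2.3 before the retag holds different code than anyone who fetched it after, which is exactly the cache-mismatch scenario described above.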
- `GET baseURL/module/@v/list` fetches a list of all known versions, one per line.
- `GET baseURL/module/@v/version.info` fetches JSON-formatted metadata about that version.
- `GET baseURL/module/@v/version.mod` fetches the go.mod file for that version.
- `GET baseURL/module/@v/version.zip` fetches the zip file for that version.
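Against the public mirror those endpoints look like this (golang.org/x/text and v0.3.2 used as the example; note that any uppercase letters in a module path must be escaped as `!` plus the lowercase letter):

```shell
base=https://proxy.golang.org
mod=golang.org/x/text

curl -s "$base/$mod/@v/list"            # all known versions, one per line
curl -s "$base/$mod/@v/v0.3.2.info"     # JSON metadata: version and timestamp
curl -s "$base/$mod/@v/v0.3.2.mod"      # the go.mod at that version
curl -s -O "$base/$mod/@v/v0.3.2.zip"   # the source zip itself
```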
I'm interested in the possibility of running a private mirror. Does anybody know if they've released the source for proxy.golang.org? I can't seem to find it in the golang GitHub org.