The Go team has been making progress toward a complete fix to this problem.
Go 1.19 added "go mod download -reuse", which lets the command be told about the previous download result, including the Git commit refs involved and their hashes. If the relevant parts of the server's advertised ref list are unchanged since the previous download, then the refresh will do nothing more than fetch the ref list, which is very cheap.
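For illustration, this is roughly how the flag is meant to be used, as far as I understand it; the module path and file names here are placeholders, and it assumes you run it inside some module directory:

    # first refresh: record what was downloaded
    go mod download -json example.com/some/module@latest > prev.json

    # later refresh: feed the previous result back in; if the server's advertised
    # refs are unchanged, the go command can skip re-downloading the module
    go mod download -json -reuse=prev.json example.com/some/module@latest > next.json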
The proxy.golang.org service has not yet been updated to use -reuse, but it is on our list of planned work for this year.
On the one hand Sourcehut claims this is a big problem for them, but on the other hand Sourcehut also has told us they don't want us to put in a special case to disable background refreshes (see the comment thread elsewhere on this page [1]).
The offer to disable background refreshes until a more complete fix can be deployed still stands, both to Sourcehut and to anyone else who is bothered by the current load. Feel free to post an issue at https://go.dev/issue/new or email me at rsc@golang.org if you would like to opt your server out of background refreshes.
I realize in the real world most modules are probably hosted by large providers that can absorb the bandwidth, like Github, but it seems incredibly discourteous not to prioritize fixing the hammering of small providers, especially two years on when the response is still "maybe later this year".
I think Drew is right that he shouldn't accept a personalized, Sourcehut-only exception, because it doesn't address the core issue for any new small providers that pop up.
Between this and the response in the original thread that said, "For boring technical reasons, it would be a fair bit of extra work for us to read robots.txt," it gives the impression that the Go team doesn't care. Sometimes what we _need_ to do to be good netizens is a fair bit of boring technical work but it's essential.
It's super-weird that the Google-side Go folks' responses to this have basically been "we don't have the resources to responsibly run this service that we decided to run and that's now misbehaving". Like... don't, then? Why take on that kind of thing in the first place if urgent fixes for the abusive traffic it generates for no good reason take three years?
The "exclusion" solution definitely isn't personalized or Sourcehut-only; it's available to anyone who requests it, and in the issue tracker you can see several people already using this exclusion.
True, an opt-out (which is what this solution boils down to) is not ideal, but it's way better than using your users' quality of service to try to strong-arm your side. And anyway, the Go team has made it clear that they are working on improving the refresh situation and that this opt-out is just a temporary measure until they fix the real issue.
Hi Russ! Thank you for sharing. I am pleased to hear that there is finally some progress towards a solution for this problem. If you or someone working on the issue can reach out via email (sir@cmpwn.com), I would be happy to discuss the issue further. What you described seems like an incomplete solution, and I would like to discuss some additional details with your team, but it is a good start. I'm also happy to postpone or cancel the planned ban on the Go proxy if there's active motion towards a fix from Google's end. I am, however, a bit uneasy that you mentioned that it's only prioritized for "this year" -- another year of enduring a DoS from Google does not sound great.
I cannot file an issue; as the article explains I was banned from the Go community without explanation or recourse; and the workaround is not satisfying for reasons I outlined in other HN comments and on GitHub. However, I would appreciate receiving a follow-up via email from someone knowledgeable on the matter, and so long as there is an open line of communication I can be much more patient. These things are easily solved when they're treated with mutual respect and collaboration between engineering teams, which has not been my experience so far. That said, I am looking forward to finally putting this issue behind us.
Why does the Go team and/or Google think that it's acceptable to not respect robots.txt and instead DDoS git repositories by default, unless they get put on a list of "special case[s] to disable background refreshes"?
Why was the author of the post banned without notice from the Go issue tracker, removing what is apparently the only way to get on this list aside from emailing you directly?
Do you, personally, find any of this remotely acceptable?
FWIW I don't think this really fits into robots.txt. That file is mostly aimed at crawlers. Not for services loading specific URLs due to (sometimes indirect) user requests.
...but as a place that could hold a rate limit recommendation it would be nice since it appears that the Git protocol doesn't really have the equivalent of a Cache-Control header.
> Not for services loading specific URLs due to (sometimes indirect) user requests.
A crawler has a list of resources it periodically checks to see if it changed, and if it did, indexes it for user requests.
Contrary to this totally-not-a-crawler, with its own database of existing resources, that periodically checks if anything changed, and if it did, caches content and builds checksums.
I'm taking the OP at his word here, but he specifically claims that the proxy service making these requests will also make requests independent of a `go get` or other user-initiated action, sometimes to the tune of a dozen repos at once and 2500 requests per hour. That sounds like a crawler to me, and even if you want to argue the semantic meaning of the word "crawler," I strongly feel that robots.txt is the best available solution to inform the system what its rate limit should be.
After reading this and your response to a sibling comment I wholeheartedly disagree with you on both the specific definition of the word crawler and what the "main purpose" of robots.txt is, but glad we can agree that Google should be doing more to respect rate limits :)
As annoying as it is, there is precedent for this opinion with RSS aggregator websites like Feedly. They discover new feed URLs when their users add them, and then keep auto-refreshing them without further explicit user interaction. They don't respect robots.txt either.
I wouldn't expect or want an RSS aggregator to respect robots.txt for explicitly added feeds. That is effectively a human action asking for that feed to be monitored so robots.txt doesn't apply.
What would be good is respecting `Cache-Control`, which unfortunately many RSS clients don't do; they just pick a schedule and poll on it.
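As a sketch of what polite polling could look like (the feed URL and ETag are placeholders): check what caching policy the server advertises, and use conditional requests so unchanged feeds cost almost nothing:

    # see whether the server advertises a caching policy
    curl -sI https://example.com/feed.xml | grep -i '^cache-control'

    # conditional re-poll: an unchanged feed comes back as a tiny 304, not the full body
    curl -s -H 'If-None-Match: "<etag-from-last-fetch>"' https://example.com/feed.xml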
I want my software to obey me, not someone else. If the software is discovering resources on its own, then obeying robots.txt is fair. But if the software is polling a resource I explicitly told it to, I would not expect it to make additional requests to fetch unrelated files such as a robots.txt
I can almost see both sides here... But ultimately when you are using someone else's resources, then not respecting their wishes (within reason) just makes you an asshole.
Google began pushing for it to become an Internet standard—explicitly to be applicable to any URI-driven Internet system, not just the Web—in 2019, and it was adopted as an Internet standard in 2022.
This is true but irrelevant to the parent's question -- in the article, it's made clear that Google's requests are happening over HTTP, which is the most obvious reason why robots.txt should be respected.
Read the OP; it's obvious based on the references to robots.txt, the User-Agent header, returning a 429 response, etc, that most (all?) of Google's requests are doing git clones over http(s).
I suspect they have a problem with this DDoS by default unless you ask to opt out behavior. Why is anyone getting hit with these expensive background refreshes until you have a chance to do it right? Why is it still not done right 2 years after this was first reported?
Maybe it should be an opt-in list where the big providers (such as github) can be hit by an army of bots and everyone else is safe by default.
This smells wildly overdramatic. They've been working on solutions big and small since it was reported; it's just that the big solutions take time, and this was communicated to Drew.
This reminds me a bit of a dysfunctional relationship: clearly Sourcehut wants Google to stop DDoSing their servers; clearly Google doesn't actually want to DDoS Sourcehut, but Sourcehut also doesn't want to ask Google to stop, and Google also wants to be asked to stop. And so nothing gets done.
The question is who will swallow their pride first: Sourcehut or Google.
This isn't true. Sourcehut reported a bug, and since the bug is somewhat involved to fix entirely, we asked what the impact of the bug is to them and offered to make a custom change for the site in the interim. The impact matters: the appropriate response is different for "I saw this in my logs and it looks weird but it's not bothering me" versus "this is causing serious problems for my site". We have been getting mixed signals about which it is, as I noted, but since Sourcehut told us explicitly not to put in a special case, we haven't.
Your comment in this thread is the first time I've seen anyone mention that it was being worked on since... June 2021? This despite repeatedly raising the issue up until I was banned without explanation. I was never told, and still don't know, what disabling the refresh entails, the ban prevents me from discussing the matter further, and I was under the impression that no one was working on it. We have suffered a serious communication failure in this incident. That said, I am looking forward to your follow-up email and seeing this issue resolved in a timely and amicable manner.
> the ban prevents me from discussing the matter further
Hi ddevault, FWIW, in May 2022 on that #44577 issue [0] you had opened, it looks like someone on the core Go team commented there [1] recommending that you email the golang-dev mailing list or email them directly.
Separately, it looks like in July 2022, in one of the issues tracking the new friendlier -reuse flag, there was a mention [2] of the #44577 issue you had opened. In the normal course, that would have triggered an automatic update on your #44577 issue... but I suspect because that #44577 issue had been locked by one of the community gardeners as "too heated", that automatic update didn't happen. (Edit: It looks like it was locked due to a series of rapid comments from people unrelated to Sourcehut, including about “scummy behavior”).
Of course, communication on large / sprawling open source projects is never quite perfect, but that's a little extra color...
The offer in [1] was to email the ML to ask for an exclusion, not to continue discussing the general issue which was still being discussed in the GH issue.
And given that they banned him for no reason, he is perfectly in the right to tell them that they should email him instead.
> the appropriate response is different for "I saw this in my logs and it looks weird but it's not bothering me" versus "this is causing serious problems for my site". We have been getting mixed signals about which it is
We have not been reading the same tickets and articles it seems
No problems were ever mentioned, serious or otherwise. Elevated traffic isn't automatically a problem. Drew's played it up quite a lot elsewhere, but the Go team can only be reasonably expected to follow the one issue filed, not Drew's entire online presence.
Yes. He had plenty of opportunity to state problems if they existed. Relaying harm caused would have likely accelerated things, and if harm was being done he would have taken up the still-open offer to solve this problem in the interim while the real solution is pushed out instead of writing misrepresentative and openly salty blog posts for years. Even with him being banned, the Go team is still tracking this issue, still brings it up internally, and has pushed a feature that would fix this ahead by an entire release.
So yes. The issue they banned him from. Because reality's more complicated than flippant one liners.
Thanks for the insight, Russ. Would you comment on what the potential consequences of opting out of background refreshes would be? Could there be any adverse effects for users?
Opting out of background refreshes would mean that a module version that (1) no one else had fetched in a few days and (2) does not use a recognized open-source license might not be in the cache, which would make 'go get' take a little extra time while the proxy fetched it on demand. The amount of time would depend on the size of the repo, of course.
The background refresh is meant to prefetch for that situation, to avoid putting that time on an actual user request. It's not perfect but it's far less disruptive than having to set GOPRIVATE.
Some people today raised concerns about disabling background refreshes (the temporary workaround originally suggested by the Go team) as having possibly unacceptable resulting performance for end users...
...but it sounds like disabling background refreshes would have strictly better end-user performance than what the Sourcehut team had been planning as described in their blog post today (GOPRIVATE and whatnot)?
Hey Russ, I got your messages that my emails aren't coming through but I'm not sure why. As an alternative, you can reach me on IRC at ddevault on Libera Chat. I'm in CEST, but my bouncer is always online. Cheers!
According to some comments in the linked GitHub issue, including [1] from last May, Drew could have simply asked to be excluded from automatic refresh traffic from the mirror. If I understand correctly, that would still leave traffic from the mirror when it’s acting as a direct proxy for someone’s request, but that is traffic that would be going to sr.ht regardless.
For some reason he did not do this and instead chose an option that causes breakage.
I think it is problematic that we are using github issues as a "support forum" for asking a git host provider to be excluded from the refresh list. It should never have come to this. Whatever happened to "reasonable defaults", so that a random person hosting a single Go module doesn't get DoSed - https://github.com/golang/go/issues/44577#issuecomment-86087... ?
Everyone can make their own assessment of what is a reasonable default and what counts as a DoS (and they are welcome to opt-out of any traffic), but note that 4GB per day is 0.3704 Mbps.
That comes to around $8-11 of monthly egress traffic on AWS. I would think twice before signing up for a service that charges me $10/month - not sure why this should be any different.
Also, how do you opt out? Imagine a random developer in a startup, running a Gitlab instance, pushing a Go module there, and then being left with an inexplicable traffic pattern (and bill). I have no skin in the game, but this default _does not_ sound reasonable to me, whichever way you slice it.
As an aside, I have done this and I do not have the problem. The most interesting question for me in all of this has always been, what triggers Google's request load?
AWS gouges the crap out of people with their egress fees, though. You can get 20TB of transfer for ~$5 a month from Hetzner plus a server to go with it.
I am very concerned that "own assessment" of what is a DoS means that source code is expected to be hosted only on large platforms or by large corporations, which is another way to say that "the little guys don't matter".
Self hosting of source code should be an option and the proxy should be there to reduce the traffic load, not amplify or artificially increase that load despite the "level of DoS".
One thing Drew is asking for is to respect robots.txt to allow the operator to determine what a reasonable level is for that operator and not apply a github bias to it.
This way of thinking is the exact opposite of what we need to do in IT to reduce our impact on the environment. 4 GB is an enormous amount of data. It's enough to listen to 21k hours of music, or to hold a small dump of all of French Wikipedia (no pics, main paragraphs only). It's enough to navigate all of Germany offline. It's enough to watch 3 to 4 movies in a good resolution. And that's per day.
That 4GB figure is for a repo at git.lubar.me, a self-hosted git repo where – quoting the person running it – "I am the only person in the world using this Go module".
In this context, that seems like a lot. Of course the module mirror can't know about this context, but there are certainly a lot of scenarios where this is comparatively a lot of bandwidth. Not everyone is running beefy servers.
Seems like an exceedingly poor and unreasonable default, and it doesn't take much imagination to see how this could be improved fairly easily (e.g. scaling with the number of actual go gets would already be an improvement).
Anyway, things get a bit kafka-esque when you realise that there is another company doing this WiFi thing and to opt out from that one you need a different SSID suffix. Since you can't have both, you end up with at least one company data mining you.
I’ve got to ask, which other Wi-Fi mapping provider requires a SSID suffix to opt out? Is it one of the big boys or is it someone like wigle.net or openwifimap.net?
I can confirm asking to be excluded from refreshes (which AFAIK is still a standing offer, but I obviously can't speak for the mirror team because I am not at Google anymore) would stop all automated traffic from the mirror, and that sr.ht could send a simple HTTP GET to proxy.golang.org for new tags if it wished not to wait for users to request new versions.
Should every person that hosts an instance of SourceHut, Gitea, or Forgejo have to opt-in to this? That just doesn't scale at all. Drew is standing up for all independent hosters as much as he is standing up for his own business interests.
No other independent hosters are having issues. I'm on one with no more than a few dozen users. Multiple of us write in Go, including the owner of the service, and yet Gitea and the VPS hosting it have never even blinked. If there were more of an issue, there would be more of a fuss than just two individuals. And one of those individuals was completely satisfied with the temporary hackjob while a more permanent solution has been in the works. The other just denied it with no reason ever given, because he wants to keep litigating the issue.
I could be wrong, but in the comments on https://github.com/golang/go/issues/44577 there are at least 4 hosters that were forced to manually disable the background refresh because of this exact issue?
How are you getting that number? There's Drew, who refused all solutions offered. There's Ben, who took the offer and was satisfied. If you're counting Gentoo, which is linked, you shouldn't because that's an unrelated issue caused by a regression test.
They should probably use GOPRIVATE[1] instead. GOPRIVATE doesn't disable the module cache globally, it just disables it for individual domains or paths on domains. This is mainly used with private repo dependencies on GitHub.
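For reference, this is roughly what that looks like; the patterns below are placeholders, but GOPRIVATE takes a comma-separated list of glob patterns:

    # bypass the proxy (and the checksum database) only for modules under these prefixes
    go env -w GOPRIVATE='git.sr.ht/*,git.example.com/*'

    # or per invocation
    GOPRIVATE='git.sr.ht/*' go get git.sr.ht/~someone/somemodule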
Or, as mentioned in the post, why they don't do a shallow clone if they have to fetch it every time for whatever reason. Seems like a weird decision either way.
Yep, a shallow clone is enough to get the latest version. And you can even filter the tree to make the download size even smaller given you only want the hash but not the contents (if the git server supports this feature)
A clone with these options can fetch little more than the tip commit itself:

    git clone --depth=1 --filter=tree:0 --no-checkout https://xxxx/repo.git
    cd repo
    git log
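And if the goal is only to notice whether anything changed at all, even a shallow clone is more than strictly needed; listing the advertised refs is about as cheap as it gets (same placeholder URL as above):

    # prints tag/branch names and their commit hashes without downloading any objects
    git ls-remote --tags --heads https://xxxx/repo.git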
People using Go modules should be using git tags, right? They should have at least one hash already that should be infinitely cacheable, the tag commit.
Of course, I have seen alleged examples of Go modules using tags like branches and force pushing them regularly, but that kind of horror sends shivers down my back, at least, and I don't understand why you'd build an ecosystem supporting that sort of nonsense and which needs to be this paranoid and do full repository clones just for caching tag contents. If anything: lock it down more by requiring tag signatures and throwing errors if a signed tag ever changes. So much of what I read about the Go module ecosystem sounds to me like they want supply chain failures.
Yeah, on third-party code hosting platforms :). And maaaaaybe in some short-lived cache somewhere. I mean, why spend on storage and complicate your life with state management, when you can keep re-requesting the same thing from the third-party source?
Joking, of course, but only a bit. There is some indication Google's proxy actually stores the clones. It just seems to mindlessly, unconditionally refresh them. Kind of like companies whose CI redownload half of NPM on every build, and cry loudly when Github goes down for a few hours - except at Google scale.
I can see this happening on my own git hosting too. I've started moving my Go code off GitHub and the module mirror shows up every 25min for each repo it's aware of doing a full clone. Thankfully the few modules I've moved are very small ones with very little history. This won't come anywhere near my egress allocation for that box.
So either you disable the proxy completely for your site or get overwhelmed by traffic from a not-too-well-written service, similar to a small DDoS attack, which is run by Google, and they are not planning to fix this? Did I get anything wrong here?
There is a refresh exclusion list which you can request your site to be added to. The proxy will continue to process requests for modules from your site but will not perform the automatic refresh which caused issues for Sourcehut. The Go team extended an offer to add Sourcehut to the list if a request to do so was made. The request never came and instead Sourcehut blocked the proxy.
"Opt out of me DoSing" you is not legit behavior -- especially when the victim has raised it with you multiple times, suggested fixes, and you have then blocked them from communicating in your issue tracker.
That's some really entitled thinking on the part of the Go team at Google, and it's sad to see people stanning for them.
If an amount of traffic that nobody else even notices brings you to your knees, you're doing something wrong. Go could be far more efficient but pointing at this and calling it a DDoS is silly.
Technically it'd be a DDoS, but how distributed it is makes no difference to what we're talking about here. If you could respond to substance, it'd be appreciated.
> and implies a trust relationship with Google to return authentic packages.
The entire point of the sumdb (go.sum), is to prevent the need for such a relationship. If Google (or any proxy you use) tries to return questionable packages, it will be detected by that system.
If the proxy tampers with a new version of a package, there's no way to detect it when you update, since the fetch goes through the cache anyway; the poisoned sum will be added to the sumdb, and anyone who isn't fetching their packages through Google's proxy will get told that whatever they're using is trying to trick them.
> anyone who isn't fetching their packages through Google's proxy will get told that whatever they're using is trying to trick them.
That is exactly the detection of a poisoned module in the ecosystem. It would break builds, issues would get filed, and a new version would be released (and the malicious party may not be so lucky this time since it’s trust on anyone’s first use).
Considering how few people do so, I'm fairly certain it would take more than a month for somebody to catch that.
But I guess it's also fairly easy to test it: just serve a slightly different version to the google's go mirror (by the user agent), and see how long until somebody complains to you about it.
I think every company I know of with private Go modules (6-8 or so?) is running a module proxy, which will detect this. The several times we've detected this it's always been within 2-3 days of the upstream mistake. When I go to report a bug we're not always the first either.
> anyone who isn't fetching their packages through Google's proxy will get told that whatever they're using is trying to trick them.
No, the error message you get is neutral about which side might be wrong - it says "verifying module: checksum mismatch" and "This download does NOT match the one reported by the checksum server." (I've seen it a lot because it also appears when module authors rebase, which a small but surprisingly high number do...)
I find it annoying that blocking the Go module proxy (on the server side) doesn't cause a graceful fallback. Am I unreasonable in thinking that it should? Doesn't the sum file prevent you from getting a maliciously modified copy?
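If I remember the GOPROXY docs correctly, there is a partial knob for this: the separator between proxy entries controls fallback, and the pipe form falls back to the next source on any error, not only on 404/410 like the default comma form:

    # fall back to fetching directly from the origin if the proxy returns any error
    go env -w GOPROXY='https://proxy.golang.org|direct'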
> Perform a shallow git clone rather than a full git clone; or, ideally, store the last seen commit hash for each reference and only fetch if it has been updated.
I'd be interested to understand why that solution hasn't been implemented yet.
> I was banned from the Go issue tracker without explanation, and was unable to continue discussing the problem with Google.
is completely asinine. But it's also par for the course when it comes to interacting with Google. When is anyone going to hold them to account for their terrible customer service and community interaction?
> I was banned from the Go issue tracker for mysterious reasons [ In violation of Go’s own Code of Conduct, by the way, which requires that participants are notified moderator actions against them and given the opportunity to appeal. I happen to be well versed in Go’s CoC given that I was banned once before without notice — a ban which was later overturned on the grounds that the moderator was wrong in the first place. Great community, guys. ]
When a story has two sides and one party chooses to keep silent when accused, I tend to favor the accuser.
I would share their side as well, but I never heard it. This is a violation of their own code of conduct, which requires them to notify the affected person, explain why, and offer the opportunity to mediate the situation. This is not the first time I was banned from the Go community without notice or explanation, and the first time turned out to be frivolous -- the ban was overturned months later with an admission that it was never justified in the first place. Community management in Go strikes me as very insular and unprofessional.
Regardless, I don't really want to re-litigate it here. The main issue is that Google has been DoS'ing SourceHut's servers for two years, and I think we can all agree that there is no conduct violation for which DoS'ing your servers is a valid recourse.
Could try and do both? Drew lives in The Netherlands nowadays. I don't know where Sourcehut is based legally, and I realize the Dutch court system is also overburdened & expensive but it might be worth a shot. I wonder if Drew considered it, and if he'd been able to use the Dutch court system. If Drew needs a good Dutch lawyer, I can recommend Arnoud Engelfriet of ICTRecht (has a blog on Iusmentis.com, legal advice is free if he's allowed to publicize it).
Knowing him, it probably wasn't nothing. But that also doesn't matter - he runs a hosting service that sees a fair amount of traffic, and the onus is on Google, as stewards of the golang community and the brilliant minds behind invisibly proxying requests through Google's servers, to continue to figure this issue out. They have his email address.
All I can figure is they're trying really, really hard to keep this very parallel on their side and also avoiding having to coordinate between nodes. It can't possibly be the reading of the robots.txt that's hard, so I think that statement has more to do with applying those policies across all the nodes—they must regard coordinating between nodes, to ensure the system as a whole isn't exceeding e.g. request rate maximums, as "a fair bit of extra work".
Judging just from the linked post, the issue on which this was discussed, and this thread, it's feeling a lot like this proxy was some kind of proof-of-concept that escaped its cage and got elevated to production.
From what I understand, the proxy also helps people make sure that an upstream deleting their GitHub repos doesn't result in builds breaking on new machines that don't have it cached locally. Imagine the problems that could happen if someone new joins your team, runs `go build` and then one of the vital dependencies 404s.
The other problem is that it's Google so their perception of "not much traffic" is "biblical floods" to other people.
> the proxy also helps people make sure that an upstream deleting their GitHub repos doesn't result in builds breaking
This is a consequence of what IMO is another bad decision by the Go team: having packages be obtained from (and identified by) random github repos, instead of one or more central repositories like Maven Central for Java or crates.io for Rust. The proxy ends up being nothing more than an ad-hoc, informally-specified, bug-ridden re-implementation of half of a central repository, to paraphrase Greenspun's tenth rule.
Why? If some open source maintainer goes off the rails and deletes all their packages, why should that break my builds? I still have a valid license to the code, I don't really care that a maintainer rage quit 4 dependencies down from my application. I certainly don't want to have to scramble to deal with that.
Do they change the behavior if the repo was dropped for legal issues, and no valid licenses could have been obtained because the repo owner didn't have one in the first place?
Sr.ht should set up their own graceful rate-limiting anyway, or they're going to have problems with badly coded CI setups as the service becomes more popular.
Wouldn't this just make the failures more mysterious and harder to track down for users? Failing sometimes for no reason that's obvious or apparent to end users is worse than always failing, IMO.
E.g.: source-based packages on distributions where users may not be Go programmers, or even non-programmers, will compile and install Go software where some nested library dependency is on sr.ht. These packages will now fail and, sadly, this is going to cause widespread disruption. I think it'd be worse if those failures only happened occasionally, and not reliably repeatably.
Are google's requests done on-demand, or part of some scheduled refresh?
I am wondering, because even if google's traffic is unreasonable, it might still be less than without the proxy.
CI configurations are notoriously inefficient with dependency fetching, so I would not be surprised if the actual client traffic is massive and might overwhelm sourcehut if all migrate to direct fetches.
I recently moved to a docker build pipeline for a project, and it's redownloading all deps on each source file change, unlike the efficient on-disk incremental compilation, because of how docker layer caching works, so my usage skyrocketed (and my build times went from seconds to minutes).
Since writing this, I have seen the trick - copy just the go.sum and go.mod files into an earlier layer, then get the deps, then copy all the other source code.
The proposed resolutions from Google have been along the lines of changes to the proxy application code. The resistance has been that this is hard because of the way responsibility is divided between the thin layer that provides the proxy web service and the ‘go’ command itself, which actually fetches the packages.
Would it be simple to solve this with an additional layer of the same proxy? Currently, end users request a package from proxy.golang.org as per the default value of the GOPROXY env var. Google runs many of these to handle the traffic, let's say 1000. They all maintain a mirror of all packages that have been recently requested (note that I expect there's more nuance here around shared lists of required packages, etc, but the structure should hold true)
The result is that every one of the 1000 proxy instances requests the source data from git.sr.ht every day.
Google could set GOPROXY on the existing instances to internalproxy.golang.org. They could then run 100, or maybe 10, of these internal instances. This would drop the traffic to git.sr.ht by one or two orders of magnitude.
I suspect it would require minimal if any change to application code. This might be accomplished entirely within the remit of a sysadmin (SRE?).
> For boring technical reasons, it would be a fair bit of extra work for us to read robots.txt […]
This is coming from one of the biggest, richest, most well-staffed companies on the planet. It’s too much work for them to read a robots.txt file like the rest of the world (and plenty of one-man teams) do before hammering a server with terabytes of requests.
If this is too much for them then no wonder they won’t implement smarter logic like differential data downloads or traffic synchronization among peer nodes.
Why did sourcehut not take the offer to be added to the refresh exclusion list like the other two small hosting providers did? It seems like that would have resolved this issue last year.
For a number of reasons. For a start, what does disabling the cache refresh imply? Does it come with a degradation of service for Go users? If not, then why is it there at all? And if so, why should we accept a service degradation when the problem is clearly in the proxy's poor engineering and design?
Furthermore, we try to look past the tip of our own nose when it comes to these kinds of problems. We often reject solutions which are offered to SourceHut and SourceHut alone. This isn't the first time this principle has run into problems with the Go team; to this day pkg.go.dev does not work properly with SourceHut instances hosted elsewhere than git.sr.ht, or even GitLab instances like salsa.debian.org, because they hard-code the list of domains rather than looking for better solutions -- even though they were advised of several.
The proxy has caused problems for many service providers, and agreeing to have SourceHut removed from the refresh would not solve the problem for anyone else, and thus would not solve the problem. Some of these providers have been able to get in touch with the Go team and received this offer, but the process is not easily discovered and is poorly defined, and, again, comes with these implied service considerations. In the spirit of the Debian free software guidelines, we don't accept these kinds of solutions:
> The rights attached to the program must not depend on the program's being part of a Debian system. If the program is extracted from Debian and used or distributed without Debian but otherwise within the terms of the program's license, all parties to whom the program is redistributed should have the same rights as those that are granted in conjunction with the Debian system.
Yes, being excluded from the refresh would reduce the traffic to our servers, likely with less impact for users. But it is clearly the wrong solution and we don't like wrong solutions. You would not be wrong to characterize this as somewhat ideologically motivated, but I think we've been very reasonable and the Go team has not -- at this point our move is, in my view, justified.
You are basically arguing that sr.ht is taking a "principled stand" against google. If that is what they are doing they should just say that and not pretend like there were no other options.
I'm ok with saying "google should do better!" But the compromise solution from the Go team seems reasonable to solve the immediate issue in a way that doesn't harm end users. The author should at least address why they have chosen the more extreme solution.
Or, we should not assume that Google's stance is correct and that sr.ht is expected to explain themselves. We should ask why Google is continuing on that path of DoSing upstream servers by default, unwilling to use standards for properly using network resources, and expecting all of them to do all the work.
EDIT: Moreover, sr.ht doing a workaround only for sr.ht, and lubar doing a workaround for lubar, etc... is not what Free Software is about. The point is that we're supposed to act as a community, for the betterment of the collective. Individualism is not a solution.
I wondered that too, but then I wondered if that's what sourcehut have actually done. I didn't notice any details about how go module mirror will be blocked.
Wouldn't the effect on sourcehut users be identical?
No. The Go team offered to add sourcehut to a list that would stop background refreshes. It would still allow fetches initiated by end users. The change sourcehut is making is to break end users unless they set some environment variables.
I've not seen any explanation about why the solution offered by the Go team was unacceptable. It's weird that that is completely left out of the blog post here.
They could also just add the site to that list, or better yet, make it opt in for sites instead of slamming them with shitty workers, and shitty defaults.
You know, like be good neighbors and respectful of other people's resources, maybe read robots.txt and not make excuses for why you are writing shitty stateless workers that spam the rest of the community.
> Without the proxy it would just be millions of users hammering their servers.
Doing shallow clones, which are significantly cheaper.
Google is DDoSing them, by their service design. Why a full git clone, why not shallow? Why do they need to do a full git clone of a repository up to a hundred times an hour. It doesn't need that frequency of a refresh.
The likely answer is that the shared state to handle this isn't a trivial addition, it's a lot simpler to just build nodes that only maintain their own state. Instead of doing it on one node and sharing that state across the service, just have every node or small cluster of nodes do its own thing. You don't need to build shared state to run the service, so why bother? That's just needless complexity after all, and all you're costing is bandwidth, right?
That's barely okay laziness when you're interacting with your own stuff and have your own responsibility for scaling and consequences. Google notoriously doesn't let engineers know the cost of what they run, because engineers will over-optimise on the wrong things, but that also teaches them not to pay attention to things like the costs they inflict on other people.
It's unacceptable to act in this kind of fashion when you're accessing third parties. You have a responsibility as a consumer to consume in a sensible and considered fashion. Avoiding this means your laziness isn't costing you money; it's costing other people who don't have stupid deep pockets like Google.
This is just another way in which operating at big-tech-money scales blinds you to basic good practice (I say this as someone who has spent over a decade now working for big tech companies...)
> Google notoriously doesn't let engineers know the cost of what they run
Huh? I left a few months ago but there was a widely used and well known page for converting between various costs (compute, memory, engineer time, etc).
Per TFA
> More importantly for SourceHut, the proxy will regularly fetch Go packages from their source repository to check for updates – independent of any user requests, such as running go get. These requests take the form of a complete git clone of the source repository, which is the most expensive kind of request for git.sr.ht to service. Additionally, these requests originate from many servers which do not coordinate with each other to reduce their workload. The frequency of these requests can be as high as ~2,500 per hour, often batched with up to a dozen clones at once, and are generally highly redundant: a single git repository can be fetched over 100 times per hour.
The issue isn't from user-initiated requests. It's from the hundreds of automatic refreshes that the proxy then performs over the course of the day and beyond. One person who was running a git server that hosts a Go repo only they use was hit with 4gb of traffic over the course of a few hours.
That's not how the proxy works. The proxy automatically refreshes its cache extremely aggressively and independently of user interactions. The actual traffic volume generated by users running go get is a minute fraction of the total traffic.
sourcehut's recommendations seem absolutely reasonable: (1) obey the robots.txt, (2) do bare clones instead of full clones, (3) maintain a local cache.
I could build a system that did this in a week without any support from Google using existing open source tech. It's mind boggling that Google isn't honoring robots.txt, is requesting full clones, and isn't maintaining a local cache.
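As a rough sketch of what "maintain a local cache" could mean in practice (the URL and paths are placeholders, and this assumes nothing about how the proxy is actually built): keep a bare mirror per repository and only fetch what changed on each refresh:

    # one-time setup: a bare mirror of the upstream repository
    git clone --mirror https://git.example.org/some/repo.git /var/cache/mirror/repo.git

    # each refresh: an incremental fetch that transfers only new objects
    git -C /var/cache/mirror/repo.git remote update --prune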
Despite the issue, I'm not convinced that Go doesn't do shallow fetches vs deep clones. Other issues (like Gentoo's issue with the proxy, I don't have a link handy sadly) point to fetches being done normally, not clones.
It's not about what Go allows, it's about what Google's proxy does on its own schedule. If there was a knob sr.ht could use to change this, it would've come up in the two years since this issue was raised with the Go team.
What does a local cache even mean at Google's scale though? Some of the cache nodes are likely closer to Sourcehut's servers than to Google HQ. I guess local would mean here that Google pays for the traffic. But then it is not a technical problem, but a "financial" one.
If you disregard the question who pays for a moment and only look at what makes sense for the "bits", the stateless architecture seems not so bad. Just a pity that in reality somebody else has to foot the bill.
Are you serious? Google Cloud Storage is a service that Google sells to folks using its cloud. If they can't use it for their own project, that would be shocking, no?
They are probably already using something like GCS to store the data at the cache nodes.
I was not talking about how the nodes store data, but about a central cache. Purely architecture wise, it doesn't make sense to introduce a central storage that just mirrors Sourcehut (and all other Git repositories). Sourcehut is already that central storage. You would just create a detour.
It's also not an easy problem. If the cache nodes try to synchronize writes to the central cache, you are effectively linearizing the problem. Then you might as well just have the one central cache access Sourcehut etc. directly. But then of course you lose total throughput.
I guess the technically "correct" solution would be to put the cache right in front of the Sourcehut server.
Go's Proxy service is already a detour for reasons of trust mentioned in the article. They are in a position to return non-authentic modules if necessary (e.g. court order). That settles all architecture arguments about sources-of-record vs sources-of-truth. The proxy service is a source of truth.
If Google is going to blindly hammer something because they must have their Google scale shared nothing architecture pointed at some unfortunate website, then they should deploy a gitea instance that mirrors sr.ht to Google Cloud Storage, and hammer that.
It's unethical to foist the egress costs onto sr.ht when the solution is so so simple.
Some intern could get this going on GCP in their 20% time and then some manager could hook the billing up to the internal account.
Drew has some... strong opinions on some things, but a straight reading of the issue suggests he's being perfectly reasonable here, and it's Google who can't be arsed to implement a caching service correctly - instead, they're subjecting other servers to excessive traffic.
It's about the clearest example of bad engineering justified by "developer velocity" - developer time is indeed expensive relative to inefficiency for which you don't pay because you externalize it to your users. Clearest, because there are fewer parties affected in a larger way, so the costs are actually measurable.
I do have a dog in this, in a way, because as one of the paying users of sr.ht, I'm unhappy that Google's indifference is wasting sr.ht budget through bandwidth costs.
Sure, but with it, the biggest impact will likely be... spam in logs on the Google side. Short-circuiting a request from a specific user agent to a 429 error code is cheap, compared to performing a full git clone instead.
I don't have any particular affinity for Google, but they're still a business and they're already developing the Go language (and relevant infrastructure) at their own expense. It's not like the Go team at Google has access to the entire Alphabet war chest like your "biggest, richest, well-staffed companies on the planet" suggests.
Go since inception has always been well funded. It is authored by some of the biggest names in programming and they are on staff at Google. This is not a side hobby. Not sure why you're suggesting that Go is lacking in resources.
No. It is a much smaller team as far as resources go. Compared to Swift for Apple or Java for Oracle, Go is not a strategic bet for Google. There is absolutely no dependency on Go for developing services on Google's platform. Hell, a large number of Google employees spend time disparaging Go. That does not happen for other company-sponsored languages.
Someone in the Go team (rsc, IIRC) commented on how a Google executive came to him in the cafeteria to congratulate him on the launch. It turns out the executive confused him with someone on the Dart or Flutter teams.
Now a bit of personal history. The Go project was started, by Rob,
Robert, and Ken, as a bottom-up project. I joined the project some 9
months later, on my own initiative, against my manager's preference.
There was no mandate or suggestion from Google management or
executives that Google should develop a programming language. For
many years, including well after the open source release, I doubt any
Google executives had more than a vague awareness of the existence of
Go (I recall a time when Google's SVP of Engineering saw some of us in
the cafeteria and congratulated us on a release; this was surprising
since we hadn't released anything recently, and it soon came up that
he thought we were working on the Dart language, not the Go language.)
Yes, Google staffs its Go team, but the original comment invokes Google's vast wealth as though its entire market cap is available for the development of Go, which is of course absurd. Google probably spends single-digit millions of dollars on Go annually, and it seems they've determined that supporting Drew's use case would require a nontrivial share of that budget which they feel could be spent to greater effect elsewhere.
Go is not only a "side project" at Google, but one of its most trivial side projects.
Knowing that "we only have a few million in funding per year" was a valid excuse for generating abusive traffic and refusing to do anything about it, would definitely have changed a few conversations I've had working at startups. Interesting.
Of course, Google doesn't materially benefit from optimizing the module proxy for Drew's use case, and I doubt your startups would have made traffic optimization its top priority either under similar circumstances (which is to say "no ROI from traffic optimization").
This is obviously untrue because we know that Google does write significant portions of its backend in Go and that Google derives ~0% of its revenue from Go (the very definition of a side project). My guess is that you're assuming that a side project for Google is the same as a side project for a lone developer or a small team, which is (pretty obviously, IMHO) untrue.
AdWords is mainly written in Go. YouTube is mainly written in Go. Just because they have strategic reasons for not directly monetizing Go doesn't make it a side project any more than any other internal tooling.
It's core to their ability to pull in revenue now. If they were somehow immediately deprived access to Go, the company would go under. That's how you know it's not a side project.
> AdWords is mainly written in Go. YouTube is mainly written in Go
Can you source these claims? Last I checked, YouTube was primarily written in Python, and I doubt that's changed dramatically in the intervening years given the size of YouTube. I assume there's some similar thing going on for AdWords.
> Just because they have strategic reasons for not directly monetizing Go doesn't make it a side project more than any other internal tooling.
Agreed, but all internal tooling is a side project pretty much by definition.
> It's core to their ability to pull in revenue now.
No, it's just the thing that they implemented some of their systems in. I'm a big Go fan, but they could absolutely ship software in other languages for a marginal increased operational overhead.
> If they were somehow immediately deprived access to Go, the company would go under. That's how you know it's not a side project.
I don't know what it means to be "deprived access to Go", but this is a pretty absurd definition of "side project" since it applies to just about everything Google does and a good chunk of the software Google depends on whether first party or third party (Google depends much more strongly on the Linux Kernel; that doesn't mean contributing to the Linux Kernel is Google's primary business activity). It seems you have a bizarre definition of "side project" which hinges on whether or not a business can back out of a given technology on a literal moment's notice irrespective of how likely it is that said technology becomes unavailable on that sort of timeline, and that these unusual semantics are at the root of our disagreement.
Not to mention, it's likely a quite impactful form of marketing / developer relations gain for them. I think so because when I talk to people who start to learn Go, I usually see a transfer of positive feelings and excitement from Go itself to Google as its creator/backer - one of the clearest examples of "halo effect" I've seen first-hand.
Do you really imagine some significant number of Google's search, cloud, etc customers were driven to Google over a competitor because of "good vibes" derived from Go? Google only develops Go because it's a useful internal tool, and I'm pretty sure neither the marketing team nor the executives spend any meeting minutes discussing Go.
Yes, I do imagine that people who are really into Go are more likely than average to join or start Go shops, and then pick GCP over competitors because they have to start with something, and being Go people, Google stuff comes first to mind.
Lots of companies across lots of industries spend a lot of money to achieve more or less this fuzzy, delayed-action effect.
> Yes, I do imagine that people who are really into Go are more likely than average to join or start Go shops, and then pick GCP over competitors because they have to start with something, and being Go people, Google stuff comes first to mind.
How many such people do you imagine there are? I'm active in the Go community, and I've been a cloud developer for the better part of a decade. It's never occurred to me to pick GCP over AWS because Google develops Go, nor have I ever heard anyone else espouse this temptation. I certainly can't imagine there are so many people out there for whom this is true that it recoups the cost that Google incurs developing Go.
Rather, I'm nearly certain that Google's value proposition RE Go is that developing and operating Go applications is marginally lower cost than for other languages, but that at Google's scale that "marginally lower cost" still dwarfs the cost of Google's sponsorship of the Go language.
This problem isn't really specific to Google. If some hobby project was DoSing sites it would get banned. "We don't have the resources to not DoS" is not a valid excuse. The Go team needs to scope their ambitions properly; if they can't make their proxy work safely they should not have bothered to develop it.
Surely Google of all places has the most tested, battle-hardened robots.txt library in existence, and they have a company-wide public monorepo to boot. There's no excuse for this.
I'm pretty sure parsing robots.txt isn't the challenge. The Go team asserts that there are technical difficulties to this traffic optimization, and I don't have any reason to disbelieve them (they're clearly not dumb people, and I certainly trust them more than Internet randoms when it comes to maintaining the Go module proxy). It's a bummer for Drew, but he isn't Google's top priority right now (it seems wild to me that you think there is "no excuse" for Google not to prioritize niche use cases like Drew's--how do you imagine large organizations choose what to work on?).
I feel like Drew has been in a pissing match with the Go team for a while, so this outcome doesn't surprise me.
I love Go but not Google's stewardship of it. The tracking proxy, Russ' takeover / squash of the package management work, the weird silence / stonewalling on other community issues...
Drew has a valid complaint. I hate to hear he was banned from the issue tracker but that sounds about right.
As a sibling said - GOPRIVATE is probably a good solution without throwing the baby out with the bathwater.
Seems like they could also let one request per hour through, say, and then serve the rest a 429.
Users trying to clone their project would hit an almost certainly up-to-date Google cache and thus be happy, and sr.ht would save on pretty much all that traffic and thus also be happy.
Why should the burden of making this Google-specific rate limiter fall on Sourcehut? They are a small team, and a per-user-agent, per-repo limiter has rare uses.
Google should definitely fix their shit, but when we see that a googler said "it would be a fair bit of extra work for us to read robots.txt", I understand the frustration on Sourcehut's side, and banning a user agent is usually just a single line of configuration to add.
I thought that too, but then sr.ht needs to keep state which IP made how many requests already, and as the post mentions google makes these requests from many different places. So they would have to count requests from the specific user agent and by that point it is again special purpose work and load on sr.ht (whereas general rate limiting per IP might be a good idea anyway).
But since it looks like the proxy fleet is made of a large set of nodes and each one of them only keeps track of their own state, if sourcehut were to only keep one counter for the whole user-agent it would lead to a large number of nodes not being able to refresh their cache for potentially long periods of time (there's nothing that would guarantee for example that it's always a new node that gets to make the one request per hour).
That's precisely why it's so unfair to ask the sourcehut team to try to come up with a solution to this problem: it's the very design of the proxy that Google put together that is causing this issue in the first place. And the sourcehut team has no control over their design. And to add insult to injury: when the sourcehut team offers recommendations on how to improve their implementation, Google responds with "it's too much work".
And to add a cherry on top, the only reason a single user-agent counter is even a possibility to discuss is because Drew made the Go team put a unique user-agent on the proxy in the early stage of this issue report.
The Linux firewall is designed specifically to be able to track and count packets by IP address and other metadata, and can do so extremely efficiently.
Setting up an IP based rate limiting rule in nftables takes a few minutes at max, and I am not a professional sysadmin.
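Something along these lines, as a rough sketch; the port, rate, and meter name here are made up, and the exact meter syntax varies a bit between nftables versions:

    # rate-limit new HTTPS connections per source IP; drop whatever exceeds the budget
    nft add table inet filter
    nft add chain inet filter input '{ type filter hook input priority 0; policy accept; }'
    nft add rule inet filter input tcp dport 443 ct state new \
        meter gitclones '{ ip saddr limit rate over 30/minute }' drop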
I like this because it feels like malicious compliance and it would allow Google to seamlessly continue working should Google figure out their caching situation. Plus it uses a rarely used HTTP status. Love it.
I have seen more 429s, almost all from large cloud providers, a fair amount in the last 5 years. Never before then; I was used to making rate limiting an in-band part of the API explicitly. AWS S3 will give you 429s before giving you 503s when you hit it with a Go program touching every object or every version in a giant bucket from a 56-core machine and a few hundred goroutine workers. It is a good signal: if you do exponential back-off per worker on the 429s, you won't see the 503s.
Drew strikes me as very ideological, so it's refreshing to see a move this pragmatic. This has a better chance at bringing awareness to these issues, and resolving the crawler bullying than his usual antics.
Have to agree, inasmuch as banning people with legitimate issues is unhelpful. I was also taken aback when Peter Bourgon, a great programmer and contributor to the go ecosystem was banned from all go channels.
> I was also taken aback when Peter Bourgon, a great programmer and contributor to the go ecosystem was banned from all go channels.
Bourgon was frequently helpful and great, but also frequently rude, condescending, dismissive, and generally just unpleasant. I've seen this countless times first-hand on Slack, Reddit, and Lobsters. I specifically stopped interacting with him long before he was banned. Whether he's a great programmer/contributor or not isn't really important here.
I kind of hate how it's brought up here because I think he's not a bad bloke at all and I know the entire situation caused great personal hurt to him. But that doesn't change that he was the kind of "brilliant jerk" that would chase people out of the community with his behaviour, and that he was unreceptive to criticism of it (often getting pretty defensive/aggressive). No one liked how all of this turned out, but it did make the Go community a better place. Being helpful yesterday doesn't cancel out being a jerk today.
Same with Drew: he posted legitimate helpful issues. And he also ranted about how people were all a bunch of morons. I don't blame anyone for getting tired of that.
> Whether he's a great programmer/contributor or not isn't really important here.
I don't see why not. Personalities fall on a broad spectrum. It still seems strange to me that the recent broad pushes for more inclusiveness, including neuro-atypicality, do not cover people who inconvenience you personally.
> that would chase people out of the community with his behaviour [...] but it did make the Go community a better place
I've noticed that claims like this are never backed by any evidence of this improvement, or evidence of people who have actually been chased away by rudeness. It no doubt causes great relief in the minds of those who dislike the exiled person, but it's always justified with a broader claim that "it's for the greater good".
I understand comments can be non-constructive, and that some people are more prone to it, but total exile is a big hammer that should be used more judiciously IMO.
> I've noticed that claims like this are never backed by any evidence of this improvement, or evidence of people who actually have been chased away by rudeness.
I am one of them. I've seen other people claim the same. I did not keep a list, nor did I keep a list of all of his posts that I found egregious, and I don't really feel like spending a lot of time crawling through all posts to find them, so I guess this is all I have.
It's hard to get "hard evidence" for these kinds of things in the first place. Most people just disengage and don't come back. The best I know of is "Assholes are Ruining Your Project" [1] from a few years back. It would be interesting to check similar numbers for Go and other projects. I'm not sure it's easy to get those kinds of numbers from e.g. Slack or Reddit though.
> total exile is a big hammer that should be used more judiciously.
It wasn't the first time he was banned, but I'm not privy to the exact details. Was "total exile" proportional? I don't know: obviously I didn't see everything. I just wanted to say he didn't "just" get banned over a minor thing, but after many years of problematic behaviour that had been raised plenty of times.
This is the kind of thing I'm asking about. Lots of numbers are trotted out but where's the actual data? Where's the methodology?
The blurb says, "This talk will teach you, using quantified data and academic research from the social sciences, about the dramatic impact assholes are having on your organization today and how you can begin to repair it."
Social science research has a dramatically poor replication rate, so on that basis alone I'm skeptical of the numbers even if he did interpret them correctly.
That said, I agree asshole behaviour has to be reined in, but exile is pretty dramatic if you really think about it. It's super easy, and I think that's why people do it, but that doesn't make it a good option.
No one considers it a good option. It's usually the last option, taken after a fair bit of mediation to try and improve the asshole's behavior. Only when it's clear that they can't or won't change do you ban.
> Only when it's clear that they can't or won't change do you ban.
I'm saying that's still not a reasonable measure even in that case. Why not an exponential backoff, where the first measure is that they only get one post a day. If they want to be heard they have to be more careful in how they word things and they have more time to think about how it might be received. If they transgress again, then it's upped to every three days, then once a week, then once every other week, and so on. A total ban is the limit of this more nuanced process.
No doubt this feature doesn't exist, so I'm suggesting something like this should be added because I'm not at all a fan of bans. Even this is a stopgap measure used to manage assholes because we don't yet understand what's at the root of asshole behaviour.
Edit: to clarify, I mean the backoff/retry strategy is still not ideal, but an easy first attempt at reframing this as a problem we can maybe address using programming abstractions that inhibit rather than facilitate communication. Most software is focused on reducing barriers to communication, which is why banning is the only recourse, but in cases like this you obviously want to raise barriers to communication in controlled ways so you don't have to use the ban hammer.
> Social science research has a dramatically poor replication rate, so on that basis alone I'm skeptical of the numbers even if he did interpret them correctly.
It's not a perfect science, but that doesn't mean "do nothing" is the best option, or that we can't just use common sense for that matter. If someone joins a community space and their first interaction is being insulted then the chance that they will come back is lower than if they're not insulted. I don't think you need a whole lot of rigorous science to accept this basic point, just as we don't need a whole lot of rigorous science to accept that dogs can feel pain, have an emotional life, have different personalities, etc.
> That said, I agree asshole behaviour has to be reined in, but exile is pretty dramatic if you really think about it. It's super easy, and I think that's why people do it, but that doesn't make it a good option.
It sure is dramatic! Like I said, I don't really have the full story on this, so it's very hard for me to judge if it's proportional. I don't think the decision was made lightly as everyone involved realized it's not J. Random Gopher but a fairly well-known person within the community.
Related story: in a community (unrelated to Go) I once sent a message to someone asking them not to insult people; pretty basic unambiguous "you can't call people idiots here" kind of stuff. They were also very helpful in other cases and I knew they were going to be sensitive about it, so I sent the kindest kid-gloves message I could come up with; no threats of any actions, just "hey, can you not do this here?" They just replied with "no, I will not change, fuck off". So ... I (temporarily) banned them. What else was I supposed to do at this point? Let them continue anyway even though it was clearly inappropriate? Anyone looking on might think "gosh, did you really have to ban them for those remarks? It wasn't that bad?" Not unreasonable, but ... they also weren't aware of the conversation I had with them, and their reply. No one made any remarks about it, but if they did, I wouldn't have commented on it because it's still a private conversation.
This is the kind of stuff we may be unaware of. In my first message I mentioned "unreceptive to criticism of it (often getting pretty defensive/aggressive)" for a reason. I don't know what happened behind the scenes, but from what I've seen in public cases where people commented on his behaviour I expect things didn't go swimmingly. It's one thing to screw up at times and at least acknowledge you screwed up, but it's quite another thing to be consistently dismissive about any concerns and outright reject the idea there is anything wrong with your behaviour. I expect that this attitude played a large factor in the decision.
> They just replied with "no, I will not change, fuck off".
As a former Rust moderator, this, so much. So many people don't see this part, where you reach out to folks and spend long grueling hours trying to get them to correct their behavior, precisely because no non-psychopath wants to drop the ban hammer on anyone. (Unless it's for obvious spammers and drive-by trolls.)
And the people saying "well I'm not suggesting do nothing, but just use better tools." Well, yeah, great, let's use better tools. Who's going to get GitHub to implement them? Or whatever other platform you're using? Some platforms have better support for this kind of tooling than others, but GitHub's is (last time I checked) pretty bad and coarse. It is slowly getting better over time. It used to be virtually non-existent.
But in the mean time, the people actually in the trenches doing the hard work of moderation have to do something. If the platform doesn't have this sort of idealistic tooling that's easy to navel gaze about on HN, then they have to do the best with what they have.
Sure, I'm not suggesting "do nothing", I elaborate on what I'm suggesting in another reply below, re: backoff/retry strategies. I think online community management software needs features to better handle defectors and other non-constructive interactions, and not just focus on features that facilitate or ease communication. Sometimes you don't want to increase communication speed, sometimes you want back pressure to slow things down.
Yes, I agree. Everyone deserves another chance, several of them even.
I'm reasonably sure there had been at least Slack bans before though; this wasn't the first ban (I thought I mentioned this before, but looks like I forgot).
I left various projects due to rudeness. I joined other projects as they felt welcoming.
In my (limited) experience, naming projects you left due to rudeness or bad behaviour tends to lead to the people responsible for that behaviour noticing your message, pestering you with questions about exactly why you left, and arguing that you are being unreasonable -- which is why I'm not naming those projects.
I dunno, aren't people a bunch of morons when you really get down to it? Like, isn't that a legitimate complaint too?
> Whether he's a great programmer/contributor or not isn't really important here.
I'd make a distinction here between having a reputation as a great contributor and having something important and correct to say in a given exchange. No, a community shouldn't put up with a person with a great reputation (or elevated title or higher pay grade) if they are unpleasant and wrong. But if they're right and they're a little impatient or impulsive, the community has more to gain from listening and simply pointing out they don't need to be impatient and impulsive. Let him build up a reputation for being a jerk rather than just ban him.
> I dunno, aren't people a bunch of morons when you really get down to it? Like, isn't that a legitimate complaint too?
It's still unproductive to say that in a community support space.
Who's the bigger moron, the moron or the guy holding a public grudge for over a year about how unfair it is they're not letting him in the channel full of morons?
It is true but usually not useful to say. Start with the assumption that 80% of everything is crap, be happy about the exceptions, and don't get too fussed about the crap. If you can't do these things, you lack the merit needed to work with others. You can always go off alone and do excellent stuff.
I don't have a full list of all posts at hand (some of which may be removed), but I've seen some other similar stuff as well; it's not an isolated incident. I was reading through the previous thread on this issue (goproxy sending loads of requests) and this one was posted as an example there.
I was indeed in the wrong when I made this comment four years ago. I have since apologized for it. I don't intend to re-litigate anything on HN at this point, but I have good reason to believe that this incident is unrelated to the reason I am presently banned.
The linked comment was indeed out of line, and perhaps you feel justified in thinking that it should be sufficient grounds for a permanent expulsion from the community. I won't argue with that, fair enough. However, I don't think it's reasonable to use it as grounds to suggest that anyone should have their servers DoSed by Google with no recourse, and I think blocking Google is a reasonable move given two years of inaction from the Go team to resolve the issue.
> I don't think it's reasonable to use it as grounds to suggest that anyone should have their servers DoSed by Google with no recourse
Of course not; this entire thread isn't necessarily hugely on-topic here, but it got brought up, so ... well ... here we are. And in fairness, you did bring up your ban in the posted article.
> The linked comment was indeed out of line, and perhaps you feel justified in thinking that it should be sufficient grounds for a permanent expulsion from the community. I won't argue with that, fair enough.
No, I don't think anyone should be banned for a singular comment, no matter how egregious. Everyone deserves second chances, and third ones, even fourth ones maybe. There's some decent data from Stack Overflow that shows that after a ban many people keep posting and many don't get a second ban (i.e. their behaviour improves).
> I have good reason to believe that this incident is unrelated to the reason I am presently banned.
I think the thing is that it's part of a pattern. Usually the "final straw" isn't the worst incident, or even that bad of an incident in itself. Incidents like this aren't isolated and previous behaviour does tend to factor in: "oh, that's the same guy who called us a bunch of morons last year".
> "oh, that's the same guy who called us a bunch of morons last year"
Wait, did some folks in the Go community write that EFAIL site that was referenced as a reason to drop OpenPGP? If so, that changes the context of the post a bit, but I didn't see anything indicating that was the case in the linked thread.
Obviously, your expulsion from the Go issue tracker for abusive conduct is a separable issue from the Go module proxy, as you can see from Go project participants reiterating that the offer to exclude you from the refresh list still stands.
Clearly it was not satisfactory to you, since it was made over 8 months ago, and you didn't take them up on it. I'm objecting here only to the framing you've created that your ouster from the Go issue forum --- which we can see was done with cause --- is what precipitated this situation.
We can behave like adults, ask why it's not satisfactory, and come to a more agreeable mutual solution, or we can blithely offer an incomplete solution, muzzle the other party, and just continue our DDoS.
See, here you just did it again: "muzzle the other party", as if it was causally connected to your disagreement about how the module proxy should work, and not to the abuse you inflicted on members of that community.
I think it's worth taking a step back here to say that IMHO regardless of whether the OP's previous comments justify his expulsion from the issue tracker, having the only other available "DDoS opt-out" mechanism be to email Russ Cox directly is _completely insane_ and unacceptable for an organization of Google's size and funding level. If they're going to ban members from the community (perhaps justifiably so), Google needs to either provide another public place to make one of these requests, or preferably make the DDoS feature opt-in rather than opt-out.
I admitted that my comments about EFAIL -- four years ago now -- were in the wrong, and apologized for them. Unless you're going to argue that this issue should justify consuming 70% of my system's network bandwidth without recourse, move on.
In the interest of not feeding the trolls, I think I can safely stop engaging with you on this thread. Or maybe on any thread -- you and I never seem to have a productive conversation on this website.
> In the interest of not feeding the trolls, I think I can safely stop engaging with you on this thread. Or maybe on any thread -- you and I never seem to have a productive conversation on this website.
HN would be so very much more pleasant with ignore-lists.
Since GH requires login to see minimized comments, here it is:
ddevault on Feb 15, 2019
"EFAIL" is an alarmist puff piece written by morons to slander PGP and inflate their egos. The standards don't need to change to fix the problems it mentions. The proposals help... marginally. The problem is not and was never with OpenPGP, it's with poorly written email clients (e.g. all email clients).
Virtually nobody uses PGP, and it is not at all pivotal. It is one of the least important widely-known cryptosystems on the Internet; like the book "Applied Cryptography", it has a cheering section because of the era in which it was released, and a generation of lay-engineers has taken PGP as a synecdoche for all privacy cryptography.
It is also badly broken and has an archaic design.
Most notably: Filippo had nothing to do with EFail, which was one of the most important cryptographic results of the last 5 years. You don't so much need Drew Devault to tell you that; it's peer-reviewed research.
I am no cheering fan, for sure, but I think it's disingenuous to say PGP is one of the least important systems on the internet. Debian package distribution, notably, depends rather pivotally on PGP to ensure authenticity. Keybase uses PGP as its root trust mechanism. There are plenty of email services that use PGP to secure messages. I've even come across some recent (as in the last few years) startups using PGP to implement their internal or application-level trust relationships (run by quite sane and well-adjusted individuals, no less). I worked at a unicorn in the last 10 years that implemented secret storage and distribution using GPG tooling. In fact, recently and close to home for me, we implemented some application-level key exchanges, and the security person we consulted for a second set of eyes actually said (paraphrasing), "I don't like this thing, it's custom, but if you use ElGamal I'd be more comfortable because at least it's well understood."
Of course these are all things that can and probably should be replaced by something more palatable. So why haven't they?
If it's not obvious, my argument is neither for nor against PGP, really. It's that I'm tired of hearing about how much PGP sucks without also hearing about the solution. I think the burden is on the people wishing to eradicate it to muster up the blesséd alternative and shepherd it into the vernacular.
It is one thing to make a case for the continued maintenance of PGP, or even to say that it has a place in modern cryptography (that's an outré thing to say among cryptography engineers, but, whatever).
It's another thing entirely to say that any cryptography engineer critical of PGP must have a weird personal vendetta against it, as you did upthread.
Harsh criticism of the failings of PGP is practically an orthodoxy among cryptography engineers. It is not a good design by modern standards, and lots of cryptographers would dearly love to be rid of it. Push back on them because you don't think it's worth the time for Debian to switch to minisign, fine, but don't slander people while you're doing it.
I didn't say "that any cryptography engineer critical of PGP must have a weird personal vendetta against it". I know the history and context around the matter. I know Filo has actually tried to do the work to replace PGP. I know it didn't stick. I imagine he more than many people understands how difficult the task of replacing it is. But in my opinion that should lead to a more tempered stance that represents an understanding of this subtlety. Instead we see him on team deprecate PGP software because it's not what We want golang users using. Excuse me if I attribute a small ounce of personal pride to that stance. I could be wrong. This is a discussion thread not a formal essay. I respect many things about Filo. I'm just critical of this particular crusade.
I mean yeah, you're right. PGP has been culturally deprecated for years now. There's no skirting that. I am quite happy that Debian is switching to minisign. Once that transition is complete that will be one less reason to keep PGP around. Really, I have absolutely zero allegiance to PGP. I'm just willing to admit that it works (and quite well) despite all the shortcomings that cryptography engineers love to spar with during happy hours. I sincerely do not disparage efforts to replace PGP. I am just tired of the passé mantra that PGP sux amirite or gtfo. As we both clearly understand, it's not really that simple.
You could reasonably agree or disagree with Filippo's take, and after quite a bit of discussion it was decided not to deprecate the openpgp package [1]. I'm pretty sure that Drew's comment contributed exactly 0% to that decision.
[1]: It was deprecated two years later as no one stepped up to maintain it, so it bitrotted even further, and there are other (better) third-party implementations anyway. Speaking up is nice; actually doing the work is better.
PGP is difficult to replace. It's very well supported and frankly works sufficiently well (sure, it's outdated, but so are SSH, TLS, etc.). There is other software that might be more secure and user friendly, but PGP is also secure. A lot of extremely sensitive information is encrypted with PGP.
This is a weird definition of "personal", like PGP kicked his dog or something. The arguments he makes against it are detailed and have the agreement of most working cryptographers, even if they don't agree with his specific deprecation schedule. Some people would call that "good engineering".
“Good engineering” would be to meticulously develop and standardize a replacement before idealistically purging the world of alleged “bad software”. Since this endeavor has yet to be undertaken, PGP it is. Good engineers understand this reality.
Look, you can make solid arguments till you are blue in the face about why PGP is unclean and unfit for modern cryptography. And you can be 100% right. But that doesn't mean people who disagree are wrong. There are 100% valid arguments and use cases for PGP too. It takes a mature personality to understand this nuance. And to understand that sharing a mic drop piece about why PGP sucks, getting your security buddies to laugh with you, and then trying to rip it out of existence is incredibly short sighted, ill mannered, and not in the least bit “good engineering”.
Come the fuck off this "mature personality" shit if you're going to write like this. He proposed freezing a module no one wanted to maintain in a library specifically meant to host stuff with weaker compat guarantees, he didn't hop in a DeLorean and kill Zimmermann's grandpa.
Meanwhile, the critical project Drew insisted he keep it for is... deprecated and unmaintained!
That was referencing past conversations I've had where it was very much like I describe. I'll admit I'm channeling some past frustrations and stereotyping and apologize for not making the distinction clear. I am not referring to you or anyone here or anyone on the golang thread, for the record.
> No one liked how all of this turned out, but it did make the Go community a better place
Doubtful. Knowing that the Go team has a habit of ousting contributors because some feefees got hurt ensures that I'll never even consider trying to contribute.
The question is: how many other valuable contributors are you missing out on because of that person?
The "classic" example of this is Ulrich Drepper, who maintained GNU libc for many years. Everyone agrees he's a great programmer. He's a better programmer than I am. But he was also ... difficult. More difficult than anyone else I've seen in a mainstream widely-used project. Many people didn't contribute purely because they just didn't want to deal with Drepper. Debian found it necessarily to fork GNU libc because of Drepper.
So even if we adopt a purely utilitarian attitude on this (and I don't think we should in the first place), I think it's still a bad idea to grant some people a license to be a jerk. In many cases you're not going to come out with better contributions and code.
>Many people didn't contribute purely because they just didn't want to deal with Drepper.
I have no numbers to be able to confirm or deny this, but...
>Debian found it necessary to fork GNU libc because of Drepper.
... Debian created eglibc because they needed glibc to support their use case and Drepper didn't. Even if Drepper had been the nicest person in the world, if he rejected patches to run glibc on non-x86 then forking was unavoidable.
I don't think Ulrich Drepper is a good example. He wasn't just rude; he also blocked merging important changes. I think a better example is Linus Torvalds.
> In many cases you're not going to come out with better contributions and code.
Why should better drivers be permitted to road rage? That's just a nonsensical question, being unable to drive without being a menace makes you a worse driver.
I've blocked many of the people that called for Peter's ban on Twitter. I don't want to be on their radar or a target of some sort of witch hunt. I consider myself a nice person and not inflammatory/offensive, but I'm a belt and suspenders type of person. It's more important that I can submit an issue on the tracker than interacting "socially" with people who have a higher chance of ostracizing targeted individuals. The risks in my mind now far outweigh the rewards.
Why were people calling for Peter's ban? I'm having a hard time imagining behavior that is so toxic that it merits a ban when "publicly advocating for banning someone from the community" is apparently fair play.
Both are abrasive, but in my mind, this is a cultural issue.
I grew up in a judgmental, holier-than-thou religion that I rejected at an early age and then was ex-communicated from on my 18th birthday. I've lived this pattern of judgement and ostracisation. It's dangerous, closed-minded, and wrong.
One of the reasons I work in software is because I'm a "strange" person. I say things people don't understand, and my value system is radically different from that of my peers and "normies". I am a very different type of person. Software was a safe place for weird people, including the productive but sometimes abrasive. It's a shame to make software not a place for a wide range of diverse people. Being different myself, I don't want my cultural differences to be targeted by witch hunts, by those looking for blood to prop up their own "moral superiority".
And I personally have no cultural issue with people who say things like, "shitty usage of the GPG command line tool by applications." They're welcome in my circles, as I sincerely consider myself less judgmental than those calling for a ban, but perhaps in kindness I can urge people who say stuff like that, over time, to be more kind and sensitive. Kindness is a skill that needs development. Not everyone is in the same place. It's certainly something I'm working on every day. And if not, that's okay too; not all of us are built with the same social skills, and that's okay. To _not_ be self-righteous requires long-suffering. Peter and Drew were worth more effort.
> I don't want to be on their radar or a target of some sort of witch hunt.
My first thought at the time was "oh gosh, it's not just me!" I think many people had a similar feeling. I don't think it's really a "witch hunt"; more a sigh of relief.
As far as I know, they don't. AFAIK the two cases discussed here are the only two notable ones (that is, people who are not outright trolls and the like).
For me it seems more that the few people at the helm didn't really bother to learn any modern languages or update themselves on theory, so it feels like a slightly nicer 90's language, not something developed recently.
And their weird entrenchment in "doing things the way plan9 did"
Hell, how do you design a statically typed language and go "nah, we don't need sum types! That's too complex!"...
I doubt this is true in any objective sense. "issues" above is almost certainly subjective, and I don't think even in this particular case it's so much "how Google does it" as much as "Google isn't prioritizing limited Go-team resources to this issue right now".
In general, I've been very happy with Go's design for the last decade with relatively few, relatively minor exceptions. I specifically have enjoyed that the Go team has resisted demands from the wider community to add every feature that every other programming language has support for. Biasing toward minimalism has made the language very easy to learn and consequently any Go programmer can jump into any Go project and start being productive in a matter of minutes, and any non-Go programmer can be productive in a matter of hours. Similarly, the tooling is novel in that it has sane defaults (static, native compilation; testing out of the box; profiling out of the box; build tooling out of the box; reproducible package management and hosting out of the box; documentation generation and hosting out of the box; etc).
Obviously this isn't true, it only indicates that Go didn't pick exactly the right set of features at inception (and of course, no one claimed it did).
Go package management was quite poor for many real use cases outside Google, so the community rallied around many competing proposals, and when it appeared there was a clear winner, the Go team came out with their own solution instead.
This after kind of supporting the ongoing community efforts.
Dep was very slow, tried to be very clever (often resulting in it "cleverly" doing the wrong thing), and was generally a pain to deal with. At $dayjob we migrated to dep and then had serious discussions about whether we should move back to glide (we didn't, as vgo was on the horizon by then). There were plans to fix some of this IIRC, but they never really went beyond "plans".
It's always difficult if people spend a lot of time on a particular solution and then it turns out that's actually not what's desired, or if someone else thinks of an even better solution. All options suck here: chucking out people's work sucks (especially in an open source, volunteer-effort context), but accepting a bad solution "because people spent time on it" and then being stuck with it for years or decades to come sucks even more.
The communication could certainly have been better; a sort of "community committee" was given a repo on github.com/golang/dep and told "good luck". In hindsight the Go team should have watched development more closely and provided feedback sooner so that the direction could be adjusted. Actually, the entire setup probably wasn't a good idea in hindsight.
This is actually one of the reasons I trust the Go project is in good hands. The Go leadership has smart engineers, with good taste, willing to say no. The Go module implementation and version selection mechanism are better than what the community had converged on.
Go has demonstrated that its governance model is capable of making better decisions than a pure democracy. This is not the case for most open source projects.
And they, unlike almost any open source group outside of Linux and maybe libc, take compatibility seriously. Volunteers break so much stuff so wantonly, and it just isn’t necessary technically.
I left a job and came back a few years later, and the Go code all still worked under 1.19.
And yet go mod is far superior to anything that was created before. Having used dep, which was honestly very slow and buggy, go mod works just fine and is one of the best package managers out there right now (all languages included).
Package management should not be created and maintained outside of the core team. Maybe the process by which they decided on the solution was not the best, but the end result speaks for itself: go mod is very good.
But the Go team ignored it for _years_, until dep got a lot of traction and we had this big community meeting about it, and live on that call Russ said he was going to work on minimal version selection and build a tool.
Sam Boyer had given a big talk about package management at GopherCon and on that call his only response when people asked what was going on was "no comment right now" or something like that. It was almost like it was his first time hearing of it.
It just seemed to be handled very poorly.
All parties involved are active on HN AFAIK so they can correct me. I am not trying to dig all the past back up just for funsies... maybe something will happen here and the Go core team will realize they need to work with Drew and sort it out. But history points to them doing what they want to do irrespective of the community.
I was kind of miffed by this early on, but the Go team's solution has been surprisingly effective and it seems to have stood the test of time. Package management is hard, and Go modules are one of the best solutions in the entire programming ecosystem, which is to say they mostly do what you would expect with relatively little debugging. Some superficial stuff (e.g., error messages) could be improved and there are probably some niche use cases that aren't as intuitive as they could be (maybe something like nested modules?), but the overall approach generally works well.
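For anyone who hasn't touched them, the day-to-day module workflow really is small. A sketch (the module path and dependency version are invented for illustration):

    $ go mod init example.com/demo           # create go.mod
    $ go get golang.org/x/text@v0.14.0       # record a dependency
    $ go build ./...                         # versions picked by minimal version selection

    # resulting go.mod
    module example.com/demo

    go 1.21

    require golang.org/x/text v0.14.0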
Even more fun, Google doesn't even use Go modules internally! They use their own homegrown build system abomination that requires teams of people to maintain.
Pretty sure virtually every big tech company does the same thing, because they have to integrate a bunch of languages, target any arbitrary platform/architecture tuple, and support use cases like code generation and so on. There aren't any good solutions to this problem that don't require teams of people to maintain--the best in class open source solutions seem to be Bazel and Nix and these are beyond my skill to manage even for pretty basic use cases. Further, Google's internal build system predates Go modules by a decade or more; why would they pivot to Go modules?
Everyone ends up having to do this kind of stuff, and so the best tools are the ones that take this in stride and make it easier to pull off, even if it is at the expense of more pain for trivial projects; and like, if you are going to try to build a baby version of the tool for beginners, you probably aren't the right person to even figure out what is required of that tool if you wouldn't normally use it.
> Everyone ends up having to do this kind of stuff, and so the best tools are the ones that take this in stride and make it easier to pull off, even if it is at the expense of more pain for trivial projects
I disagree. Lots of projects will never need support for multiple languages, code generation, targeting a vast array of platforms and architectures, etc., and for those that do, there are workarounds that are a lot less painful than Bazel or Nix. I can go a looong way with `go build` and some CI jobs before the pain of Bazel or Nix pays off. Basically, I don't think "trivial projects vs everyone else" is a very good taxonomy; it's more like "'approaching FAANG scale' vs everyone else".
If you're in the "approaching FAANG scale" group, then yeah, Bazel probably makes sense. For everyone else language-specific tools cobbled together with CI really is the least bad solution (and by a pretty big margin in my experience).
Is this something other than Bazel (or the progenitor thereof)? If so, calling it an abomination compared to the unfinished go build system is rather strong.
Indeed - Bazel is a complete build system. “go build” is really not, and go modules are typically used for dependency management in Bazel for Go anyway.
"Due to the relatively large traffic requirements of git clones, this represents about 70% of all outgoing network traffic from git.sr.ht. A single module can produce as much as 4 GiB of daily traffic from Google."
What does your ideal git host do when one of its (non-paying) clients uses bots to clones repos so hard that its (paying) human clients are unable to clone their repos?
Improve infrastructure. Apparently they are trivially DDOSable by anyone renting a few servers and running git clone in a loop? That's a problem they should solve. And no, blocking by user agent is not adequate protection.
You're not paying for the traffic load though, Microsoft does. I think it's unfair to compare a forge run by a giant and one run by a tiny company, or for that matter, self hosting users.
- Let's round and say 2,000 requests at 1 GB each = 2,000 GB/hour = 48,000 GB/day
- AWS bandwidth at $0.02/GB = $960/day = $28,800/month
So, one of the richest companies in the world is charging you nearly $30k monthly because they cannot be bothered to be polite. Would you be ok with that situation?
From what I can tell, per Drew's own characterization of the requests[0], this is wildly off in all aspects. Most repos are hundreds of KB. Large repos are a dozen MB. Per the other guy's characterization[1], his repo was less than 8MB, right in line with what I see in other repos. He saw 4GB of traffic split up across "more than 500" requests. And as Drew said, these bouts of activity aren't sustained even across an entire hour. To top it off, the cost of AWS bandwidth is about the least representative example you could have picked. You've wandered off the path at nearly every available opportunity.
> The situation remained so for over a year. In that time, I was banned from the Go issue tracker without explanation, and was unable to continue discussing the problem with Google
Something tells me there's another side to this story that we aren't hearing.
None of that bears any relation to what happens in the Go community.
Google is not a monolithic entity. The Go team operates pretty much independently from gmail and other stuff. Yes, there is a problem with Google randomly disabling accounts, but this doesn't really extend to Go, who make their own decisions on who to ban or not, and which are all manual. Just as you can comment on, say, github.com/facebook/zstd without a Facebook account, you can also comment on github.com/golang/go without a Google account.
What's the value proposition of not using Github? Github is such an incredibly useful project. I actually go out of my way to avoid projects on Gitlab and co, just because I don't want to have to worry that it's going to just disappear one day because they thought they could out-build Microsoft.
Your argument seems to be based on the idea that Microsoft would do a better job than its competition technically. The fact that you're happy to rely on them also seems to imply you think they're unlikely to act abusively. However, you're writing your comment in a thread about a similar megacorporation, Google, who is acting abusively, because their engineers are saying parsing a robots.txt file would be too difficult.
I actually wasn't making an argument but posing a question. Your reply seems defensive and emotional.
I'm not sure what you mean by "act abusively" since the core of the product is storing and viewing source code. I mean, I could verify checksums if I needed to, and such a thing is common in workplaces I frequent.
What I asked before, and will reiterate in another attempt to aid your comprehension, is: how can Sourcehut store and view source code BETTER than Github can?
Whether Drew's right or wrong, this is disappointing. At the very least, it's going to make recommending Sourcehut as an alternative to Github harder to justify.
This is really unfortunate, especially as sr.ht isn't free (any longer). As far as I know, it's the only remaining managed source hosting/CI option for Mercurial users.
This isn't a bad thing, it was always meant to be financially supported by its users.
The alternatives are ads or for Drew and co to run it at a loss indefinitely.