Recent changes to H1B allow organizations that conduct research "as a fundamental activity" to be eligible for cap-exempt status. Could you kindly share your opinion on this?
I don't think this has really expanded the set of organizations that legally qualify for cap-exempt status (we essentially argued this before). Rather, it signals greater openness/flexibility on the part of USCIS to approve organizations as cap-exempt. Query, however, whether this openness/flexibility will continue.
Again, I can only comment from the perspective of a user; I haven't worked on the VCS infrastructure.
The obvious generic challenges are availability and security: Firefox has contributors around the globe and if the VCS server goes down then it's hard to get work done (yes, you can work locally, but you can't land patches or ship fixes to users). Firefox is also a pretty high value target, and an attacker with access to the VCS server would be a problem.
To be clear, I'm not claiming that there were specific problems related to these things, just that they represent challenges Mozilla has to deal with when self-hosting.
The other obvious problem at scale is performance. With a large repo, both read and write performance are concerns. Cloning the repo is the first step that new contributors need to take, and if that's slow it can be a dealbreaker for many people, especially on less reliable internet connections. Our hg backend was using replication to help with this [1], but you can see from the link how much complexity that adds.
Firefox has enough contributors that write contention also becomes a problem; for example, pushing to the "try" repo (to run local patches through CI) often ended up taking tens of minutes waiting for a lock. This was (recently) mostly hidden from end users by pushing patches through a custom "lando" system that asynchronously queues the actual VCS push rather than blocking the user locally, but that's more of a mitigation than a real solution. (lando is still required with the GitHub backend, because it becomes the place where custom VCS rules that previously lived directly in the hg server, but which don't map onto GitHub features, are enforced.)
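A minimal sketch of the queueing pattern described above, assuming a single background worker that owns the push. This is a hypothetical illustration, not lando's actual code: `PushQueue`, `slow_push`, and the patch IDs are made-up names. The user's `submit` call returns immediately, while pushes happen one at a time in the background, so only the worker ever waits on the repository lock.

```python
import queue
import threading
import time
from typing import Callable, List, Tuple


class PushQueue:
    """Hypothetical asynchronous push queue (not lando's implementation).

    submit() only records the request and returns at once; a single
    background worker performs pushes one at a time, so the user never
    blocks on the repository lock.
    """

    def __init__(self, push_fn: Callable[[str], None]) -> None:
        self._push_fn = push_fn
        self._jobs: "queue.Queue[str]" = queue.Queue()
        self.completed: List[str] = []
        self.failed: List[Tuple[str, Exception]] = []
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def submit(self, patch_id: str) -> None:
        # Non-blocking from the user's point of view.
        self._jobs.put(patch_id)

    def wait(self) -> None:
        # Block until every queued push has been attempted.
        self._jobs.join()

    def _run(self) -> None:
        while True:
            patch_id = self._jobs.get()
            try:
                self._push_fn(patch_id)  # serialized: one push at a time
            except Exception as exc:  # keep the worker alive on failure
                self.failed.append((patch_id, exc))
            else:
                self.completed.append(patch_id)
            finally:
                self._jobs.task_done()


def slow_push(patch_id: str) -> None:
    # Stand-in for a push that has to wait on a server-side lock.
    time.sleep(0.05)


pushes = PushQueue(slow_push)
start = time.monotonic()
for pid in ["D1", "D2", "D3"]:  # hypothetical patch identifiers
    pushes.submit(pid)
submit_time = time.monotonic() - start  # how long the "user" waited
pushes.wait()
print(pushes.completed, f"submit took {submit_time:.4f}s")
```

The design choice is the one the comment calls a mitigation: the lock contention still exists, it has just moved from the user's terminal to the worker.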
I would say that using GitHub only for a public git repository is pretty good value.
It is free and robust, and there is not much bad Microsoft can do to you. Because it is standard git, there is no lock-in. If they make a decision you don't like, migrating is just a git clone. As for the "training Copilot" concern, the code is public, so Microsoft hosting the project on their own servers doesn't change anything; they can get the source like anyone else, and they probably already do.
Why not Codeberg? I don't know, maybe bandwidth, but since it's standard git, making a mirror on Codeberg should be trivial.
That's why git is awesome. The central repository is just a convention. Technically, there is no difference between the original and the clone. You don't even need to be online to collaborate, as long as you have a way to exchange files.
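One way to see why a clone is technically indistinguishable from the original: git names every object by a hash of its content, so the same file bytes get the same ID in every copy of the repository. A small sketch of how git computes a blob ID, assuming the SHA-1 object format (the long-standing default; repositories created with SHA-256 hash differently). The function name `git_blob_id` is made up for illustration; the result should match `git hash-object` on the same bytes.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID git assigns to a file's content (SHA-1 repositories)."""
    # git hashes a header "blob <size>\0" followed by the raw bytes.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
print(git_blob_id(b""))         # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```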
Question: could I offer a patch without having a GitHub account?
I can definitely access the source code, and the review tools are not on GitHub. But is it even possible to host my proposed changes somewhere other than GitHub? I suppose the answer is no, but surprises happen.
This is a relatively theoretical question, but it explores the "what bad Microsoft can do to you" avenue: it can close your GitHub account, likely seriously hampering your ability to contribute.
I am banned from GitHub because I didn't want to give them my phone number. They ignored a legally binding GDPR request to delete all my data. I haven't got around to suing them yet.
Recently I also got "rate limited" after opening about three web pages.
Microsoft can do something to you, and that is to arbitrarily deny you access after you've built a dependence on it, and then make you jump through hoops to get access back.
> Recently I also got "rate limited" after opening about three web pages.
People who haven’t used it logged out recently may be surprised to find that they have, for some time, made the site effectively unusable without an account. Doing one search and clicking a couple results gets you temporarily blocked. It’s effectively an account-required website now.
Just opened a private window to try this, I did one search and clicked on four results, then a second search and got a 429 error. That is wild. I guess it's an anti-scraper measure?
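GitHub doesn't say how this limit works, so the following is only a hypothetical sketch of one common technique that would produce the behaviour described: a per-client sliding-window counter. A handful of requests within a minute succeed, the next one gets a 429, and access comes back once the old requests age out of the window. The class name, limit, and window are all made-up values.

```python
from collections import deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Hypothetical per-client sliding-window rate limiter.

    Allows at most `limit` requests per client in any `window_s`-second
    window; anything beyond that gets HTTP 429 until old requests expire.
    """

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        self._hits: Dict[str, Deque[float]] = {}

    def check(self, client: str, now: float) -> int:
        hits = self._hits.setdefault(client, deque())
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return 429  # Too Many Requests; rejected requests are not counted
        hits.append(now)
        return 200


# Made-up numbers: 5 requests per 60 s for an anonymous client.
limiter = SlidingWindowLimiter(limit=5, window_s=60.0)
statuses = [limiter.check("anon", now=float(t)) for t in range(6)]
later = limiter.check("anon", now=70.0)
print(statuses, later)  # [200, 200, 200, 200, 200, 429] 200
```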
Given the occasional articles that crop up showing the sheer volume of badly-behaved (presumably) AI scraper bots, this makes all kinds of sense.
I can't find it now, but sometime in the past week or so I saw something that (IIRC) related to the BBC (?) blocking a ton of badly-behaved obvious scraper traffic that was using Meta (?) user-agents but wasn't coming from the ASNs that Meta uses. The graphs looked like this ended up reducing their sustained traffic load by about 80%.
Items where I'm doubting my recall (since I didn't find anything relevant doing some quick searches) are marked with (?)
Weird. Maybe it just hates my last two ISPs (Google Fiber, Frontier).
The usual way I notice I'm not logged in is by getting blocked after interacting with ~3 different parts of the site within a minute. If I search, click a single repo, and stay in that repo without using search, it seems to go OK, but if I interact with search and then a couple repos, or search again, temp-banned.
I made a search index for GitHub repos [1] because it takes quite some time for GitHub to load the repositories page (which is the page that allows searching).
And sometimes even using the exact repo name in Google search, I cannot see the corresponding (non-popular) repo.
At least you had the choice. Many potential contributors live in countries to which GitHub does not support SMS verification but still requires it. So there's a second tier of effectively blocked countries besides the officially sanctioned ones.
When did they ask you for a phone number? The last GitHub account I set up, at the end of February, didn't ask for one and does the mandatory 2FA step using a code sent via email.
They nagged me for a year for a phone number, threatening lockout. I finally gave in, and they almost immediately started nagging me to disable SMS 2FA because it is insecure.
This is kind of a weird hill to die on, but you’re well within your rights, so you do you.
However, it is clearly not correct to say that you were banned from GitHub. It’s like saying “I was banned from Google because I refuse to use computing devices.”
Not really a ban, just self-flagellation, which, again, whatever works for you.
Give me your social security number or you may not reply to my comments. If you don't give me your social security number, choosing instead to die on this weird hill, it's not correct to say you're banned - you're merely self-flagellating.
This seems like a poor argument. I don't much like being obliged to give GitHub my phone number either, but it's not the same thing as a Social Security number, now is it? Would you argue otherwise?
Not US but phone number is arguably worse: You can't legally get one without tying it to govt ID anymore and tends to be tied to your current physical location.
It is also commonly used for authentication codes and, like a Social Security number, is PII that should be default-deny.
Github seems to have no legit need for a user's phone number. Since there's not even a way to tell them to go pound sand, I'd say opting out of disclosing sensitive information they don't need by not signing in/up and equating their unreasonable demand with a ban is respectable.
A phone number given to a generally reputable company is hardly equivalent to giving a rando your social security number.
I mean, obviously you disagree with them being generally reputable, but you must realize that’s not a broad opinion, and they are certainly better at preventing data breaches than the average company that stores phone numbers.
Sincerely though, I hope you get your GDPR request sorted.
Hence the qualifier “generally”. I’m not saying they’re above reproach, but I am saying that companies that care far less about data security already have my phone number, such as most/all of my utilities - including my phone company. And those aren’t realistically optional.
> but I am saying that companies that care far less about data security already have my phone number
Not mine and it sucks that this means I'm not welcome as FireFox contributor anymore unless I move countries just to register a monthly contract for a dedicated GitHub-accepted SIM card.
Once you trigger the phone-number verification requirement, your account is globally shadowbanned and support is blocked pending SMS code verification. Aside from the privacy issue, it completely blocks people in several countries (beyond the ones officially banned outright due to sanctions) to which GitHub won't even try to send an SMS or call.
Remember that registering a second account would be violating GitHub ToS.
> sucks that this means I'm not welcome as FireFox contributor anymore
Nothing has changed regarding being a contributor: Bugzilla, Phabricator, Lando. You don't really interact with GH beyond read-only needs such as code search. (Which, funnily, is currently the most rate-limited thing on the whole of GH ;D) But luckily, as long as the Hg mirror is there, Searchfox can still be used for that as well.
I've been gone for a few years now and have no insight into this decision, so take anything I say with a grain of salt. Having said that, I think that, for better or worse, GitHub is probably the best location simply because it provides the lowest barrier to entry for new contributors.
I know that's spicy enough to trigger dozens of keyboard warriors hitting the reply button, but every little thing that deviates from "the norm" (for better or for worse, GitHub is that) causes a drop-off in people willing to contribute. There are still people out there, for example, who refuse to create an account on bugzilla.mozilla.org (not that this move to GitHub changes that).
I'm not sure Codeberg has managed two 9s of uptime while I've been using it. That's manageable when it's just a public mirror for occasionally publishing my small hobby projects, but I wouldn't recommend it for Firefox-sized projects.
[not OP, but making educated guesses from what has already been said]
Given the post above, issues regarding self-hosting were at least part of the reason for the switch so a new self-hosted arrangement is unlikely to have been considered at all.
I don't know what the state of play is right now, but non-self-hosted GitLab has had some notable performance issues (and, less often IIRC, availability issues) in the past. This would be a concern for a popular project with many contributors, especially one with a codebase as large as Firefox.
I had a similar thought. I am disappointed that Mozilla didn't take some of the money they were spending on a self-hosted homegrown solution and throw it to something like Codeberg. I guess that a little funding from the likes of Mozilla could go a long way in helping Forgejo pioneer some super interesting federation.
Of course Mozilla is free to make their own choices. But this choice will be read as the latest alarm bell for many already questioning the spirit of Mozilla management.
If availability is a concern, then why GitHub? It doesn't support IPv6 and simply cuts off people in parts of the world. It denies access from Iran and other countries that the US government "doesn't like". I understand small projects being hosted on GitHub, but Firefox should be too big to fit on GitHub.
I guess it's the CI/CD infrastructure. Pipeline and time requirement grows exponentially as the code supports more operating systems and configurations.
I used a GitLab + GitLab Runner (Docker) pipeline for my Ph.D. project, which ran some verification after every push (since the code was scientific), and even that took 10 minutes to complete despite being pretty basic. Some of Debian's packages need more than three hours in their own CI/CD pipeline.
Something like Mozilla Firefox, which is tested against regressions, performance, etc. (see https://www.arewefastyet.com), needs serious infrastructure and compute time to build in n different configurations (stable / testing / nightly + all the operating systems it supports) and then test at that scale. This essentially needs a server farm to complete in reasonable time.
An infrastructure of that size needs at least two competent people to keep it connected to all relevant cogs and running at full performance, too.
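The point about "n different configurations" can be made concrete: in a build matrix, the number of jobs is the product of the sizes of each dimension, so each new platform or channel multiplies the work rather than adding to it. The platform and suite names below are hypothetical placeholders, not Firefox's real CI configuration.

```python
from itertools import product

# Hypothetical dimensions; not Firefox's actual CI configuration.
channels = ["stable", "testing", "nightly"]
platforms = ["linux", "windows", "macos", "android"]
suites = ["build", "lint", "unit", "integration", "perf"]

# Every (channel, platform, suite) combination is a separate job.
base_jobs = [f"{c}/{p}/{s}" for c, p, s in product(channels, platforms, suites)]
print(len(base_jobs))  # 60 (3 * 4 * 5)

# Supporting one more platform adds a whole channel-by-suite slice of work.
more_platforms = platforms + ["linux-arm64"]
jobs_with_arm = [f"{c}/{p}/{s}" for c, p, s in product(channels, more_platforms, suites)]
print(len(jobs_with_arm))  # 75 (3 * 5 * 5)
```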
This is all true, but as the sibling says, not really related to the change discussed here.
Firefox does indeed have a large CI system and ends up running thousands of jobs on each push to main (formerly mozilla-central), covering builds, linting, multiple testsuites, performance testing, etc. all across multiple platforms and configurations. In addition there are "try" pushes for work in progress patches, and various other kinds of non-CI tasks (e.g. fuzzing). That is all run on our taskcluster system and I don't believe there are any plans to change that.
Your guess is wrong as Firefox doesn't use GitHub for any of that, and AFAIK there are no plans to either.
The blog post linked in the top comment goes into this in some detail, but in brief: git log, clone, diff, showing files, blame, etc. are CPU-expensive. You can see this locally on a large repo if you try something like "git log path/to/dir".
Add to this all the standard requirements of running any server that needs to be 1) fast, and 2) highly available.
And why bother when there's a free service available for you?
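A toy model of why a path-limited log such as "git log path/to/dir" costs CPU: without an index from paths to commits, the walk has to visit every commit in the history and compare its files with its parent's to decide whether the path changed. The `Commit` class, the flat path-to-blob mapping, and the linear history are simplifications made up for illustration; real git compares nested tree objects (skipping unchanged subdirectories by hash) and has further optimisations, but the basic shape of the work is similar.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Commit:
    """Toy commit: a flat path -> blob-id map instead of git's nested trees."""
    sha: str
    parent: Optional[Commit]
    files: Dict[str, str]


def log_path(head: Optional[Commit], prefix: str) -> Tuple[List[str], int]:
    """Return commits that changed anything under `prefix`, newest first,
    plus how many commits had to be visited to find them."""
    touched: List[str] = []
    visited = 0
    commit = head
    while commit is not None:  # linear history only, for simplicity
        visited += 1
        parent_files = commit.parent.files if commit.parent is not None else {}
        paths = set(commit.files) | set(parent_files)
        if any(p.startswith(prefix) and commit.files.get(p) != parent_files.get(p)
               for p in paths):
            touched.append(commit.sha)
        commit = commit.parent
    return touched, visited


# 1,000 commits where only c0 (which adds the file) and c500 touch docs/.
head: Optional[Commit] = None
files = {"src/main.c": "v0", "docs/readme": "d0"}
for i in range(1000):
    files = dict(files)
    if i == 500:
        files["docs/readme"] = "d1"
    else:
        files["src/main.c"] = f"v{i + 1}"
    head = Commit(sha=f"c{i}", parent=head, files=files)

touched, visited = log_path(head, "docs/")
print(touched, visited)  # ['c500', 'c0'] 1000
```

Two matching commits out of a thousand still cost a thousand comparisons, which is the kind of load a busy hosted server has to absorb for every such request.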
If the CI/CD is the most intensive part, it seems reasonable to move all of the other parts to a free provider to focus on the part that would be harder and more expensive to move. Even if they don't ever move any of the CI/CD over, I feel like I can understand the rationale for reducing the scope to just that rather than the source hosting. I've worked on plenty of projects with way less traffic than Firefox over the years that used GitHub for source hosting but alternate CI/CD; GitHub didn't even have built in CI for a while, so that was the only way to use it.
Given how often I see comments on this site about Mozilla trying to do far too much rather than focusing on core things like Firefox, I'm honestly a bit surprised that more people don't agree with this decision. Even with the other issues I have with Mozilla lately (like the whole debacle over the privacy policy changes and the extremely bizarre follow-up about what the definition of "selling user data" is), I don't see it as hypocritical to use GitHub while maintaining that open solutions are better than closed ones, because trying to make an open browser in the current era is a large and complicated enough goal that it's worth setting a high bar for taking on additional fights. Insisting on maintaining their own version control servers feels like an effort they don't need to take on right now, and I'd much rather Mozilla pick their battles carefully like this more often, not less. Fighting for more open source hosting at this point is a large enough battle that it might make more sense for a separate organization focused on that to lead the charge; providing an alternative to Chrome is a big enough struggle that it's not crazy for them to decide that GitHub's dominance has to be someone else's problem.
Yeah, I agree that everything that helps reduce maintenance overhead is good for Mozilla (although I believe there’s more low-hanging fruits that could be addressed before that).
I would love to see Mozilla moving to Codeberg.org (though I’d ask if they’re okay with it first) or something like that. Using GitHub is okay-ish? Personally, I frown upon it, but again I agree – it’s not the most important issue right now.
I think it can be done half/half: do some well-defined builds on GitHub and pull them in for testing. Another comment mentions that some users waited 10+ minutes for a lock to get their patches through CI, so maybe some sanity tests could be offloaded to GitHub Actions.
I'm not claiming that my comment was 100% accurate, but they plan to move some of the CI to GitHub, at least.
> but they plan to move some of the CI to GitHub, at least
Really? I've seen no indication of that anywhere, and I'd be amazed if they did.
They're not using github PRs, and github actions really fights against other development workflows... not to mention they already have invested a lot in TaskCluster, and specialized it to their needs.
Recent changes to H1B allow any organization that conducts research "as a fundamental activity" to be eligible for cap-exempt status. What's your comment on this?
Based on your work experience with startups, do they ever fit these criteria?
Interesting post. Thank you for taking the time to write it.
In the long stack trace example, the first two traces start at the same line 277 which, I think, solves part of the problem you've posed. I need to look at the source code for the rest of the stack trace.
Traceback (most recent call last):
  File "requests\packages\urllib3\contrib\pyopenssl.py", line 277, in recv_into
    return self.connection.recv_into(*args, **kwargs)
  File "OpenSSL\SSL.py", line 1335, in recv_into
    self._raise_ssl_error(self._ssl, result)
  File "OpenSSL\SSL.py", line 1149, in _raise_ssl_error
    raise WantReadError()
OpenSSL.SSL.WantReadError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "requests\packages\urllib3\contrib\pyopenssl.py", line 277, in recv_into
    return self.connection.recv_into(*args, **kwargs)
  File "OpenSSL\SSL.py", line 1335, in recv_into
    self._raise_ssl_error(self._ssl, result)
  File "OpenSSL\SSL.py", line 1166, in _raise_ssl_error
    raise SysCallError(errno, errorcode.get(errno))
OpenSSL.SSL.SysCallError: (10054, 'WSAECONNRESET')
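The "During handling of the above exception, another exception occurred" line is Python's implicit exception chaining: the second exception was raised while the first one was being handled, so Python stores the first as the new exception's `__context__` and prints both tracebacks. A minimal reproduction, using hypothetical stand-in classes rather than the real OpenSSL ones:

```python
import traceback


class WantReadError(Exception):
    """Hypothetical stand-in for OpenSSL.SSL.WantReadError."""


class SysCallError(Exception):
    """Hypothetical stand-in for OpenSSL.SSL.SysCallError."""


def recv_into() -> None:
    try:
        raise WantReadError()
    except WantReadError:
        # Raising while the first exception is being handled makes Python
        # store it as the new exception's __context__ (implicit chaining).
        raise SysCallError(10054, "WSAECONNRESET")


def capture():
    try:
        recv_into()
    except SysCallError as exc:
        return exc, traceback.format_exc()


exc, text = capture()
print(text)
```

Writing `raise SysCallError(...) from err` would instead record an explicit `__cause__` and print "The above exception was the direct cause of the following exception:", while `from None` hides the first traceback entirely.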
Not the author, but I make micro commits because it helps me backtrack my implementation decisions.
A lot of times I would look at a line of code thinking why I would write something like that and find the reasoning by git blaming that specific line or chunk of code.
In short, I do it to make up for my poor memory.
Interesting, I've almost never had this problem and I don't have a great memory. Not worth making small commits just for that reason (for me, of course!)