I’m at Google today and even with all the resources, I am absolutely most bottlenecked by the Presubmit TAP and human review latency. Making CLs in the editor takes me a few hours. Getting them in the system takes days and sometimes weeks.
I am also at Google, and I can corroborate this experience personally, as well as from comments teammates make to me directly, in group settings, and in team retrospectives.
There are a lot of technical challenges in maintaining code health in a monorepo with 100k+ active contributors, so teams and individuals have plenty of plausible excuses for kicking the problem down the road, and truly improving code health is not appropriately incentivized. A common occurrence is a broken monorepo: you just wait until someone fixes it, then retry submitting your change. It's such a common occurrence that people generally don't investigate the brokenness. Maybe the monorepo wasn't actually broken and your change made things flakier, but no one can distinguish that from a broken monorepo that eventually got fixed, because no one bothers to check anymore.
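As a rough illustration (not anything in Google's actual tooling): if people did bother to check, rerunning the same test at the same change a handful of times is usually enough to separate "my change made this flakier" from "the repo was transiently broken". A minimal sketch, where run_test() is a hypothetical stand-in for whatever actually executes one test invocation:

    import subprocess

    def run_test(cmd):
        # Hypothetical stand-in for a single test invocation (e.g. a bazel test run).
        return subprocess.run(cmd, capture_output=True).returncode == 0

    def failure_rate(cmd, runs=10):
        # Rerun the same test at the same change. A persistent nonzero failure
        # rate points at the change (or the test) being flaky, while failures
        # that vanish on reruns look like transient infrastructure breakage.
        failures = sum(0 if run_test(cmd) else 1 for _ in range(runs))
        return failures / runs

    # Example: compare the rate at the parent change vs. with your CL applied.
    # rate_with_cl = failure_rate(["bazel", "test", "//my:target"])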
Indeed. You'd think Google would test how well people cope with boredom, rather than running bait-and-switch interviews that make it seem like you'll be solving l33tcode every evening.
You think people work on a single issue at a time?
Maybe at Google they can afford that; where I worked, at one point I was juggling 2 or 3 projects and switching between issues. Of course all the projects used the same tech and mostly the same setup, but the business logic and tasks were different.
If I have to wait 2-3 hours, I have code to review and bug fixes to implement elsewhere. Even on a single project, if you wait 2 hours for your code to land in the test environment and have nothing else to do, someone is mismanaging the process.
Yes and no. I'd estimate 1/3 to 1/2 of that is down to test suites being flaky and time-consuming to run. IIRC the shortest build I had was 52 minutes, for the Android Wear iOS app, and easily 3 hours for Android.
While I'm not at Google myself, a lot of CI test failures are just knock-on effects from the complex, interdependent CI components delivering the whole experience. Oops, Artifactory or GitHub rate-limited you. Oops, the SAST checker from some new vendor just never finished. Even if your code passes locally, the added complexity of CI is often fraught with flaky, confusing errors that are intermittent or fall over because of environmental problems at the particular moment you tried.
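The usual band-aid for that rate-limit/transient-dependency class of failure is to wrap the fragile step in bounded retries with backoff rather than failing the whole pipeline on the first hiccup. A minimal sketch, assuming a hypothetical fetch_artifact() call that can raise on a 429 or a timeout:

    import random
    import time

    def with_retries(step, attempts=5, base_delay=2.0):
        # Retry a flaky CI step with exponential backoff plus jitter, so a
        # transient rate limit or environment hiccup doesn't sink the whole run.
        for attempt in range(attempts):
            try:
                return step()
            except Exception as exc:  # real CI code would catch narrower error types
                if attempt == attempts - 1:
                    raise
                delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
                print(f"attempt {attempt + 1} failed ({exc}); retrying in {delay:.1f}s")
                time.sleep(delay)

    # Example (fetch_artifact is hypothetical):
    # with_retries(lambda: fetch_artifact("https://artifactory.example/libfoo.jar"))

It doesn't fix anything underneath, but it at least keeps a momentary rate limit from masquerading as a genuine test failure.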