
There are a few examples of oral cultures ensuring perfect transmission across a large expanse of time.

One example is the several different recitation styles used to memorize Sanskrit verses. These methods give memorizers multiple ways to remember a line, and also prevent errors like the inadvertent mixing of adjacent words ("euphonic combination"). The "checksumming schemes" are far more elaborate than you'd imagine. [1] The result is perfect transmission of text and its pronunciation, including pitch accent.
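As a rough illustration of the "checksumming" idea (my own sketch, not from [1]): in the krama style of recitation, the verse is also memorized as overlapping word pairs, so a corruption of the plain word order shows up as a mismatch against the independently memorized pairs. A toy version in Python:

  # Illustration only: a toy, "krama"-style pairwise recitation acting as a
  # checksum on the plain word sequence. Words and names are just placeholders.
  def krama(words):
      """Recite consecutive word pairs: (w1 w2), (w2 w3), ..."""
      return [(words[i], words[i + 1]) for i in range(len(words) - 1)]

  def consistent(plain, pairs):
      """Check the plain recitation against the memorized pairwise one."""
      return krama(plain) == pairs

  verse = ["agnim", "ile", "purohitam", "yajnasya", "devam", "rtvijam"]
  memorized_pairs = krama(verse)

  # Swapping two adjacent words in the plain recitation is caught, because the
  # independently memorized pairs no longer line up with it.
  corrupted = ["agnim", "purohitam", "ile", "yajnasya", "devam", "rtvijam"]
  print(consistent(verse, memorized_pairs))      # True
  print(consistent(corrupted, memorized_pairs))  # False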

Another example is a "multi-party verification" scheme in some Aboriginal Australian cultures. "... storytelling among contemporary Aboriginal people can involve the deliberate tracking of teaching responsibilities. For example, a man teaches the stories of his country to his children. His son has his knowledge of those stories judged by his sister’s children—for certain kin are explicitly tasked with ensuring that those stories are learned and recounted properly—and people take those responsibilities seriously. ... the 'owner-manager' relationship, requiring a story to be discussed explicitly across three generations of a patriline, constitutes a cross-generational mechanism which may be particularly successful at maximising precision in replication of a story across successive generations". [2]

[1]: https://en.wikipedia.org/wiki/Vedic_chant

[2]: "Aboriginal Memories of Inundation of the Australian Coast Dating from More than 7000 Years Ago", Patrick D. Nunn, https://doi.org/10.1080/00049182.2015.1077539


I think the harsh conditions in Australia, where there are a thousand ways to die, put the onus on truth and its preservation in the oral tradition, unlike most other cultures where it's kind of ok to gradually deviate from the truth. It is worth paying attention to Aboriginal stories. They have also made astronomical observations that are pretty accurate to the date. https://cosmosmagazine.com/space/australias-indigenous-peopl...

Singing and ballads are also a popular way of preserving information with fidelity. Homer's works were based on ballads. We don't put a premium on memory now, but before the invention of the printing press, memory techniques were widely studied and used.


It kind of feels like a local-optimum situation where they relied on memory so much that they didn't feel the need for writing. Either that, or the fact that there was no writing for so long resulted in them specialising in memory techniques a lot.

We didn't evolve reading and writing. Literacy is just a recently developed hack. Memorization was the only way for the vast majority of human history.

Who is we?

I've also heard this about the Buddhist tradition: at least three groups of monks would memorize parts of the canon, and they would periodically come together to chant it. If one group differed from the other two, they would know that's an error (at least with fairly high probability). This seems to have been an accurate method of transmission, since independently written-down versions separated by centuries of oral tradition in Gandhara and Sri Lanka are very similar.
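A minimal sketch of that 2-of-3 check, purely as an illustration (the passages are placeholders):

  # Minimal sketch (my own illustration) of the 2-of-3 idea: a passage is
  # flagged wherever any one group disagrees with the other two. An error
  # slips through only if two groups drift in exactly the same way.
  def flag_discrepancies(group_a, group_b, group_c):
      return [i for i, (x, y, z) in enumerate(zip(group_a, group_b, group_c))
              if not (x == y == z)]

  a = ["evam maya shrutam", "ekam samayam", "bhagavan shravastyam viharati"]
  b = ["evam maya shrutam", "ekam samayam", "bhagavan shravastyam viharati"]
  c = ["evam maya shrutam", "ekam samayam", "bhagavan rajagrhe viharati"]  # drift

  print(flag_discrepancies(a, b, c))  # [2] -> re-chant passage 2 together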

There are worse schemes, but that is barely comparable to Vedic chanting.

3-pick-2 as a method is quite good, but also quite vulnerable to a bunch of transmission errors beyond simple memory issues. Over long periods of time there is a higher risk of groupthink or status plays leading to wild changes.

Of course the Vedas aren't immune, but it is a lot more effort to get them wrong than just a few people thinking "we want to change the story". A group would have to be hugely motivated to successfully change the chant.


Google's monorepo, and it's not even close - primarily for the tooling:

* Creating a mutable snapshot of the entire codebase takes a second or two.

* Builds are perfectly reproducible, and happen on build clusters. Entire C++ servers with hundreds of thousands of lines of code can be built from scratch in a minute or two tops.

* The build config language is really simple and concise.

* Code search across the entire codebase is instant.

* File history loads in an instant.

* Line-by-line blame loads in a few seconds.

* Nearly all files in supported languages have instant symbol lookup.

* There's a consistent style enforced by a shared culture, auto-linters, and presubmits.

* Shortcuts for deep-linking to a file/version/line make sharing code easy-peasy.

* A ton of presubmit checks ensure uniform code/test quality.

* Code reviews are required, and so is pairing tests with code changes.


I always find these comments interesting. Having worked at both Facebook and Google, I never quite felt this way about Google's monorepo. Facebook had many of the features you listed, and just as performantly if not more so. Facebook has no OWNERS files and no readability requirements, and I found abstraction boundaries to be much cleaner at FB. At Google, I found there was a ton of cruft in the monorepo that was too challenging / too much work for any one person to address.

For non-Googlers like me, here's some background about "readability":

https://www.moderndescartes.com/essays/readability/


OWNERS files rarely get in the way - you can always send a code change to an OWNER. They are also good for finding points of contact quickly, for files where the history is in the far past and changes haven't been made recently.

Readability really does help new engineers get up to speed on the style guide, and learn of common libraries they might not have known before. It can be annoying - hell, I'll have to get on the Go queue soon - but that's ok.


I have heard similar things from other Googlers, and I think there might be two factors behind why I feel this way:

- I worked on Google Assistant, which was responsible for integrating many services. This meant I had to work with other people's code way more regularly than many at Google.

- I moved from FB to google - I'm not really sure how many people have had this experience. I think many of my colleagues at google found it surprising how many of the things they thought were unique to google actually also existed at FB.

At the end of the day, any of these processes have pros/cons, but I think the cruft of having APIs that are a couple of steps harder to evolve, because you have to find readability/OWNERS approvers for everything you touch, just makes things slightly less cohesive and a trickier place to have a "good" codebase.

When I worked at FB, I would frequently rebase my code on Monday and find that, for example, the React framework authors or another smaller infra team had improved the API and had changed *every* callsite in the codebase to be improved. This type of iteration was possible in certain situations but was just much less common at google than at fb.


> I think many of my colleagues at google found it surprising how many of the things they thought were unique to google actually also existed at FB.

Google workers are groomed to believe Google is the best, and hence they are too. A corollary of that, then, is that nobody else has it that good, when in fact, others sometimes have it better.


I also made the move from FB to G and echo everything said above. Googlers have a massive superiority complex. In reality, it's naiveté.

My 2 cents: OWNERS is fairly useful, if only as a form of automating code reviewer selection. Readability is a massive drag on org-wide productivity. I have had diffs/CLs take MONTHS to be approved by every Tom, Dick, and Harry whose claws were added to my code, and they made me re-design whole project approaches when they were only there because they're supposed to check whether my newlines are in the right spot for that language. I thought about quitting.


People really underestimate how much productivity drain there is in having a bad code review culture. One of the worst things about working at Amazon was that any feedback on a merge request, no matter how small, required you to request a re-review.

It's not culture (organic), it's systems (planned): if you don't, on day zero, agree on what code reviews should cover (and what they shouldn't), then code reviews are a pissing contest first and a useful tool second.

I've noticed a lot of people understand neither the limitations of code reviews nor the issues they can and should solve. Writing good critique (of anything, not just code) is hard, we don't train people to do it, and we usually don't even regard it as something that needs training and understanding.


+1.

Going from FB to $REDACTED to Oculus was a pretty wild ride, there were a lot of different cultures, though I think generally speaking the best qualities filtered through.

(also, howdy former teammate)


(trying to recall unixname to human name mapping.....)

No, we’re quite aware the world outside has been catching up. There’s even a famous doc by a senior director about it…

The system still grooms Googlers to think they're better than everyone else, though. Until that root cause is fixed (which would come at a huge cost to Google, so no surprise it never will be), nothing will change.

Huh? Facebook has a lot of that infra because ex-Googlers built it there. It takes an insane amount of delusion to notice something common between a father and a son and say that the dad inherited it.

QED

This isn't true at all for OWNERS files. If you try developing a small feature on google search, it will require plumbing data through at least four to five layers and there is a different set of OWNERS for each layer. You'll spend at least 3 days waiting for code reviews to go through for something as simple as adding a new field.

3 days for a new change on the biggest service on the planet? Not bad.

I agree that it could be worse! Facebook also has significant (if not more) time spent, and I found adding features to News Feed a heck of a lot easier than adding features that interacted with Google Search. Generally a lot of this had to do with the number of people who needed to be involved to ensure the change was safe, which always felt higher at Google.

I'm only an outside observer in this conversation but could it be that the review process (or the lack thereof) and the ease with which you can add new features has had an impact on the quality of the software?

The thing is, in my experience as a user Facebook (the product, not the former company) is absolutely riddled with bugs. I have largely stopped using it because I used to constantly run into severe UI/UX issues (text input no longer working, scrolling doing weird things, abysmal performance, …), loading errors (comments & posts disappearing and reappearing), etc. Looking at the overall application (and e.g. the quality of the news feed output), it's also quite clear that many people with many different ideas have worked on it over time.

In contrast, Google search still works reasonably well overall 25 years later.


There are pretty different uptime and stability requirements for a social product and web search (or other Google products like Gmail). When news feed is broken life moves on, when those products break many people can't get any work done at all.

One of Google's major cultural challenges is imposing the move-slow-and-carefully culture on everything, though.


It’s not considered ok for newsfeed to break. It would be a massive issue that would command the full attention of everyone.

And yet folks who are on call for it say things like this: https://news.ycombinator.com/item?id=40826497

I have the same background: I find the code quality at G to be quite a lot higher (and test pass-rate, and bug report-rate lower) than News Feed, which was a total shit-show of anything-goes. I still hold trauma from being oncall for Feed. 70 bugs added to my queue per day.

The flip side is of course that I could complete 4 rounds of QuickExperiment and Deltoid to get Product Market Fit, in the time it takes to get to dogfooding for any feature in Google.


Huh, also having worked at both, I had exactly the opposite experience. Google's tools looked ugly but just worked. At Meta there were actually multiple repos you might have to touch, and tools worked unreliably across them. OWNERS files made sure there was less abandoned code, and parent owners would be found by gwsqueue bots to sign off on big changes across large parts of the repo just by reading these files.

Same, and another vote for Meta. Meta made the language fit their use case. Go into bootcamp, change the search bar text to 'this is a search bar!', press F5, and see the change (just don't ship that change ;D). It's incredibly smooth and easy.

Google's a mess. There's always a migration to the latest microservices stack that has been taking years and will take many more years to come.

Like, Meta just changed the damn language they work in to fit their needs and moved on. Google rewrites everything to fit the language. The former method is better in a large codebase. Meta is a way easier place to get shit done, to the point that Google was left in the dust the last time they competed with Meta.


I think what you're saying is true for www, but not fbcode, and the latter starts to look a lot like google3. I agree though, Meta's www codebase has the best developer experience in the industry.

> no owners owners files and no readability requirements

Move fast and break things, right?


I don’t think that owners files are the best way to ensure things don’t break.

I’m just one incompetent dev, but I’ll throw this in the convo just to have my perspective represented: every individual part of the google code experience was awesome because everyone cared a ton about quality and efficiency, but the overall ecosystem created as a result of all these little pet projects was to a large extent unmanaged, making it difficult to operate effectively (or, in my case, basically at all). When you join, one of the go-to jokes in their little intro class these days is “TFW you’re told that the old tool is deprecated, but the new tool is still in beta”; everyone laughs along, but hopefully a few are thinking “uhhh wtf”.

To end on as nice of a note as possible for the poor Googs: of all the things you bring up, the one I’d highlight the biggest difference on is Code Search. It’s just incredible having that level of deep semantic access to the huge repo, and people were way more comfortable saying “oh let’s take a look at that code” ad-hoc there than I think is typical. That was pretty awesome.


Imho the reason for the deprecated-and-beta thing is that there is constant forward momentum.

Best practices, recommendations, and tooling are constantly evolving and require investment in uptake.

I sometimes feel like everything is legacy the moment it's submitted and in a constant state of migration.

This requires time and resources that can slow the pace of development for new features.

The flip side is that this actually makes the overall codebase less fractured. This consistency, or common set of assumptions, is what allows people to build tools and features that work horizontally across many teams/projects.

This constant forward momentum to fight inconsistency is what allows google3 to scale and keeps macro-level development velocity scaling relative to complexity.


That’s all well said, thanks for sharing your perspective! Gives me some things to reflect on. I of course agree re:forward momentum, but I hope they’re able to regain some grace in that momentum with better organization going forward. I guess I was gesturing to people “passing the buck” on hard questions of team alignment and mutually exclusive decisions. Obviously I can’t cite specifics bc of secrecy and bad memory in equal amounts, so it’s very possible that I had a distorted view.

I will say, one of the things that hit me the hardest when the layoffs finally hit was all the people who have given their professional lives to making some seriously incredible dev tools, only to be made to feel disposable and overpaid so the suits could look good to the shareholders for a quarter or two. Perhaps they have a master vision, but I’m afraid one of our best hopes for an ethical-ish megacorp—or at least vaguely pro social—is being run for short term gain :(

However that turns out for society, hopefully it ends up releasing all those tools for us to enjoy! Mark my words, colab.google.com will be shockingly popular 5y from now, if they survive till then


I guarantee you that the master vision is exactly what you wrote.

Google is not a software company; it is an advertising system providing cash flow to a hedge fund that comprises a large part of every pension and retirement fund in America. It's far too important as simply a financial entity to risk anything on ... product development.


My experience with google3 was a bit different. I was shocked at how big things had gotten without collapsing, which is down to thousands of Googlers working to build world-class internal tooling. But you could see where the priorities were. Code Search was excellent - I'd rate it 10/10 if they asked.

The build system always felt more like a necessary evil than anything else. In some parts of google3 you needed three separate declarations of all module dependencies: you could have Angular's runtime dependency injection graph, the JavaScript ESM graph, and the Blaze graph, which all need to be in sync. Now, the beautiful part was that this still worked. And the final Blaze level means you can have a TypeScript codebase that depends on a Java module written in a completely unrelated part of google3, which itself depends on vendored C++ code somewhere else. Updating the vendored C++ code would cause all downstream code to rebuild and retest. But this is a multi billion dollar solution to problems that 99.99% of companies do not have. They are throwing thousands of smart people at a problem that almost everyone else has "solved" by default simply by being a smaller company.

The one piece of tooling I think every company could make use of, but doesn't seem to have, was all of the little hacks in the build system (maybe not technically part of Blaze?). You could require a developer who updates the file at /path/to/department/a/src/foo.java to simultaneously include a patch to /path/to/department/b/src/bar.java. Many files have an implicit dependency on each other outside of the build graph, and a human is needed to review whether extra changes are needed. And that's just one of a hundred little tricks project maintainers can employ.

The quality of the code was uniformly at least "workable" (co-workers updating parts of the Android system would probably not agree with that - many critical system components were written poorly by one person who soon after quit).


> But this is a multi billion dollar solution to problems that 99.99% of companies do not have.

I know it's trendy for people to advocate for simple architectures, but the honest-to-god truth is that it's insane that builds work ANY OTHER WAY. One of the highest priorities companies should have is to reduce siloing, and I can barely think of a better way to guarantee silos than by having 300 slightly different build systems.

There is a reason why Google can take a new grad SWE who barely knows how to code and turn them into a revenue machine. I've worked at several other places but none of them have had internal infrastructure as nice as the monorepo; it was the least amount of stress I've ever felt deploying huge changes.

Another amazing thing that I don't see mentioned enough was how robust the automatic deployments with Boq/Annealing/Stubby were. The internal observability library would automatically capture RPC traces from both the client and server, and the canary controller would do a simple p-test on whether or not the new service had a higher error rate than the old one. If it did? The rollback CL would be automatically submitted and you'd get a ping.

This might sound meh until I point out that EVEN CONFIG CHANGES were versioned and canaried.
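For a rough idea of what such a canary check involves, here is my own sketch of a generic two-proportion test for "is the canary's error rate significantly higher than the control's" - the internal controller is certainly more elaborate than this:

  # Rough sketch (not the internal implementation): compare error rates
  # between the old (control) and new (canary) jobs with a one-sided
  # two-proportion z-test, and recommend rollback if the canary is worse.
  from math import sqrt

  def should_roll_back(errs_old, reqs_old, errs_new, reqs_new, z_crit=2.33):
      p_old = errs_old / reqs_old
      p_new = errs_new / reqs_new
      pooled = (errs_old + errs_new) / (reqs_old + reqs_new)
      se = sqrt(pooled * (1 - pooled) * (1 / reqs_old + 1 / reqs_new))
      if se == 0:
          return False
      z = (p_new - p_old) / se   # one-sided: is the canary worse?
      return z > z_crit          # roughly 99% confidence

  # e.g. 120 errors in 100k canary RPCs vs 80 errors in 100k control RPCs
  print(should_roll_back(80, 100_000, 120, 100_000))  # True -> roll back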


I've worked at the majority of FAANG.

Facebook's build system works the same as Google's, because most of FB's infra was made by ex-Googlers around 10-15 years ago. The worst thing I can say about Blaze is basically what was already pointed out above: sometimes you need to write little notes to the presubmit system to ensure cross-boundary updates. Whatever, it's all text files in the end.

The wildest was at Apple. It's just as you said, 300 build systems. Not only that, but 300 source code repositories! Two teams in the same hall that hang out all the time could be using git and svn, for no good reason besides whatever someone ran "init" in 20 years ago. There was no cross-team communication, by design, because Steve Jobs was paranoid. Their sync mechanism was to build the entire stack once a night, and force everyone to full-reinstall their OS and toolchain to "rebase". Insane.


I definitely agree most companies should use a monorepo. Most companies don't need Blaze, though.

And the whole rollout system was excellent. I wish that tech was standard but I have a vague idea of how much work that would be to implement and few companies will be able to afford to get that right.

Edit: I forgot to mention - I absolutely hated Cider and all of the included plugins. Sure the code at Google was fine but the code editing experience destroyed all of the fun of coding. Is that function signature correct? You'll find out in 45 seconds when the Intellisense completes! And when I was there Copilot was a thing outside of Google but we were not allowed to use any AI (even Google's own AI) to write code. The whole situation was so bad I wrote a few paragraphs about it in my offboarding survey.


> You could require a developer who updates the file at /path/to/department/a/src/foo.java to simultaneously include a patch to /path/to/department/b/src/bar.java.

Could you elaborate on how this worked exactly?


There's a configuration directive you put in a plain text file in the monorepo which lets you configure:

* File A's path

* File B's path

* The message shown on the commit's review page if B isn't updated when A is updated

* How a developer can override the alert (Which would be a custom named directive added to a commit message, like "SKIP_FILE_SYNC_ALERT=true")

You then need to either commit a diff to file B when file A is changed or override it in order to get the commit added to HEAD. This is just one of many different "plugins" for the CI system that can be configured with code.
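A hypothetical sketch of what that check boils down to, written as a small standalone script (the real system is configured declaratively in those plain text files, and the paths, messages, and names here are invented):

  # Hypothetical standalone version of such a check; the real system is
  # configured in plain text files, and these paths/messages are invented.
  PAIRED_FILES = {
      "path/to/department/a/src/foo.java": {
          "required": "path/to/department/b/src/bar.java",
          "message": "foo.java and bar.java must be updated together.",
          "override": "SKIP_FILE_SYNC_ALERT=true",
      },
  }

  def check_paired_edits(changed_files, commit_message):
      """Return warnings for commits that touch file A without touching file B."""
      warnings = []
      for trigger, rule in PAIRED_FILES.items():
          if trigger in changed_files and rule["required"] not in changed_files:
              if rule["override"] not in commit_message:
                  warnings.append(rule["message"])
      return warnings

  print(check_paired_edits({"path/to/department/a/src/foo.java"}, "fix bug"))
  # ['foo.java and bar.java must be updated together.']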


Thanks!

Interesting to note that almost all of these are to do with tooling _around_ the codebase, not the contents _of_ the codebase!

Just as a man is the product of his genetic code, the codebase is invariably the product of the constraints on its edits enforced by tooling.

so we beat on, commits against the tooling, borne back ceaselessly into the technical debt

Is that true, though? Is the code itself good? Because it is sorely absent from GP's list...

If you are trying to say that people can't make bad code with good tools, I don't agree.


To extend the previous commenter's simile - a man is a _product_ of his genetic code, but is also affected by environment. Bringing it back to the point at hand - yes, you are right that people can make bad code with good tools, but they'll be _much more likely_ to make good code with them (and vice versa).

This Ask HN is not about "the code that should logically be best" but "the best code". There is no need for likelihood, people who have worked on it can report whether it is the case.

And people here seem to praise the tooling exclusively...

I would also point out that good tooling makes for good code, but big scale makes for bad legacy code. It is not at all obvious to me which of those effects should prevail at Google.


Question I've always wondered: Does Google's monorepo provide all its engineers access to ALL its code?

If yes, given the sheer number of developers, why haven't we seen a leak of Google code in the past (disgruntled employee, accidental button, stolen laptop, etc)?

Also, how do they handle "skunkworks"-style top-secret projects that need to fly under the radar until product launch?


Partial checkouts are standard because the entire code base is enormous. People only check out the parts they might be changing, and the rest magically appears during the build as needed.

There are sections of the code that are High Intellectual Property. Stuff that deals with spam fighting, for example. I once worked on tooling to help make that code less likely to be accidentally exposed.

Disclaimer: I used to work there, but that was a while back. They've probably changed everything a few times since. The need to protect certain code will never go away, however.


The very very important stuff is hidden, and the only two examples anyone ever gives are core search ranking algorithms and the self-driving car.

Even the battle-tested hyper-optimized, debugged-over-15-years implementation of Paxos is accessible. Though I’m sure folks could point out other valuable files/directories.


Former employee here. I remember a third example: the anti-DoS code is hidden. I remember this because I needed to do some very complicated custom anti-DoS configuration and as was my standard practice, I looked into how the configuration was being applied. I was denied access.

Fourth example: portions of the code responsible for extracting signals from employees' computers to detect suspicious activity and intrusion. I suspect it's hidden so that if an employee wants to do something nefarious, they can't just read the code to figure out how to evade detection. I only knew about this example because that hidden code made RPC calls to a service I owned; I changed a certain aspect of my service and it broke them. Of course they fixed it on their own; I only got a post-submit breakage notification.


Google3 monorepo source isn't, by policy, supposed to leave the corp network workstations, and can't even be on your corporate provided laptop (except for some edge cases in mobile apps dev). Even during full COVID lockdown WFH we had to remote into our machines. (I worked on embedded stuff and had to compile on my office workstation, scp the binaries home, and flash my device, and repeat. Super slow cycle.)

So, anyways, source code being basically on-premise only and on machines that they can fully audit and control... Would you be stupid enough to "cp -r srccheckout /media/MYUSBSTICK" on such a box?

Also believe it or not they used to have a very open internal culture at Google because the bulk of employees genuinely liked the company and its stated mission and there was a bit of a social contract that seemed to be upheld. Stuff didn't generally leak out of the wide open all hands, even. Past tense.


Google laptops and workstations (anything that can actually access srcfs to get this data) are extremely monitored and controlled.

Very critical stuff (ranking, spam/abuse, etc) can be further protected via silos which lock down sections of the code base (but still allow limited interactions with the build).

Google poured significant engineering $$$ into its development tools and policies (generally building custom with no intent to ever monetize, versus buying). I don't see a company today, in this climate, that would emulate that decision.


> I don't see a company today, in this climate, that would emulate that decision.

Why not? There are a lot of benefits from owning the tooling and being able to tailor it to do exactly what you want.


It's extremely easy to detect a disgruntled employee making a copy of source code. There's no accidental button to leak. There's no source code on laptops as policy doesn't allow it, with limited exceptions only.

But there was a giant leak a long time ago: Operation Aurora, carried out by China. Legend has it that to this date the Chinese search engine Baidu still uses stolen code from Google.


Within the monorepo there is the notion of "silos" where access to directories can be restricted to groups of people/bots. Though I believe that's exceedingly rare, I've never come across one.

I suspect this is part of the interview process and why it takes so long and so many people.

Character and trustworthiness is extremely important.


The Google interview process is overly focused on algorithm skills and absolutely does not select for character and trustworthiness. In fact, the leaks from Google to the news started circa 2017, and in response the leadership basically neutered internal forums like TGIF and memegen. Remember the Damore incident? While Damore was wrong, it wouldn't have been as big of a deal if the incident hadn't been leaked to the press. It's clear that Google would be a much better company if its interview process actually accounted for character and trustworthiness.

The old article, Three Years of Misery Inside Google, the Happiest Company in Tech is still the best description of what went wrong inside Google: https://www.wired.com/story/inside-google-three-years-misery...


I have interviewed with Google. It's not explicitly tested for but you can be sure the interviewers will have an opinion of your character.

Also, (in the middle of six hours of interviewing) I had lunch with someone completely outside of the particular group I was interviewing for. Rather genial chap, and I'm sure his opinion was sought too.

I'm not downplaying the tech questions, I'm saying there's a meta/side evaluation as well.


There hasn't been a lunch interview since the pandemic. Everything is online. The old Google is no more.

Also I don't think Damore was wrong .. or right. He was certainly naive.

Edit - I guess there haven't been zero leaks: https://searchengineland.com/google-search-document-leak-ran...

Code search works not just across the entire code base but across all of time.

I recently left Google and knew it was going to be a step down from Google's build ecosystem, but I wasn't prepared for how far a step down it would be. It's the only thing I miss about the place; it's so awesome.

People who have never had it have no concept of how much they are missing. It's so frustrating.

Why do so many people like monorepos?

I tend to much prefer splitting out reusable packages into their own repos, with their own packaging, unit tests, and tags, so consumers can pin to whatever version of that package they need. It makes it MUCH easier for someone to work on something with minimal overhead and be able to understand every line in the repo they are actually editing.

It also allows reusable components to have their own maintainers, and allows for better delegation of a large team of engineers.


Have you ever worked at FB / Google / whatever other company has huge mono repo with great tooling?

I went from many years at FB to a place like you describe - hundreds of small repos, all versioned. It's a nightmare to change anything. Endless git cloning and pulling and rebasing. Endless issues, since every repo ends up being configured slightly differently, and it's very hard to keep the repo metadata (think stuff like commit rules, merge rules, etc.) up to date. It's seriously much harder to be productive than with a well-oiled monorepo.

With a monorepo, you wanna update some library code to slightly change its API? Great, put up a code change for it, and you’ll quickly see whether it’s compatible or not with the rest of the whole codebase, and you can then fix whatever build issues arise, and then be confident it works everywhere wherever it’s imported. It might sound fragile, but it really isn’t if the tooling is there.


I have worked at a company that has huge monorepos and bad tooling.

Tooling isn't the problem though, the problems are:

- multiple monorepos copying code from each other, despite the fact that that code should be a library, an installable Python package, or even a deb package of its own

- you will never understand the entire monorepo, so you will never understand what things you might break. with polyrepos different parts can be locked down to different versions of other parts. imagine if every machine learning model had a copy of the pytorch source in it instead of just specifying torch==2.1.0 in requirements.txt?

- "dockerize the pile of mess and ship" which doesn't work well if your user wants to use it inside another container

- any time you want to commit code, 50000 people have committed code in-between and you're already behind on 10 refactors. by the time you refactor so that your change works, 4000 more commits have happened

- the monorepo takes 1 hour to compile, with no way to compile and unit test only a part of it

- ownership of different parts of the codebase is difficult to track; code reviews are a mess


I think all of your problems are root-caused by the phrase "multiple monorepos". This sounds more like "polyrepos" or something: multiple siloed repos that themselves might contain disparate projects, languages, and tooling.

Google3 is a true monorepo. 99% of code in the company is one repo, with minor exceptions for certain open source projects and locked down code.

Edit: for example, you can change YouTube code, Google Assistant code, a Maps API, some config for how URLs are routed at top level load balancers, etc all in one CL if you really wanted to/needed to.


All of these are solved with tooling.

- dont have multiple monorepos

- use blaze or similar and run all downstream tests, the binary for your ml model includes pytorch whether you build from source or requirements.txt.

- other people committing doesn't matter if you are isolating presubmits and similar.

- using a blaze-like you never compile the whole monorepo

- code owners etc. makes this straightforward.

Like, as someone whose career has been mostly at Google, these are not problems I encounter at all, or only in cases where you'd have similar scope of problems no matter the repo structure.


> other people committing doesn't matter if you are isolating presubmits and similar.

Could you elaborate on this one?


If you only need to run the tests you affect, and only need to sync and update files touched in your CL, external changes are generally not impactful, sync & submit is a quick process even if people are submitting things elsewhere.

It's only a problem if someone submits a file you're touching, in which case you just have standard merge conflict issues.


How do you handle tagging? Does the whole repo get a new tag / version every time something changes?

Yes, all changes (CLs) are atomic operations. The CL number reflects a specific moment in time for the entire repo.

If you have a reusable component in a separate repository and need a change, you have to submit that, merge, release, then bump the version in the downstream project to use it. Then if someone else using the component updates it but hits an issue that you introduced, they have to go fix it, perhaps a month later, with no context of what changed. Or they just don't upgrade the version and reimplement what they need. With a monorepo it would be one change, and your change breaking someone else's would get flagged and fixed with the code change. I've seen the amount of shared code get smaller and smaller, and more stale, with polyrepo.

Up to what scale?

This works well for a couple dozen repos per team in my experience. It’s also my preferred way to work.

It doesn’t scale so well to hundreds of repos per team without significant tooling. At some point anything cross-cutting (build tool updates, library updates, etc) becomes hard to track. Repos are left behind as folks change teams and teams are reorg’d.

I’ve never worked in a monorepo, but I can see the appeal for large, atomic changes especially.


I can change a dependency and my code at the same time and not need to wait for the change to get picked up and deployed separately. (If they are in the same binary. Still need cross binary changes to be made in order and be rollback safe and all that.)

Google's code, tooling, and accompanying practices are developing a reputation for being largely useless outside Google ... and many are starting to suspect its alleged value even inside Google is mostly cult dogma.

Google used to have a near monopoly on the most expensive, educated, devoted, and conscientiously willful people and imposed very few demands on their time. The lengths to which they were willing to go, to make everything they did with the tools pleasant and elegant, was orders of magnitude beyond anything I'd seen.

Some of us thought that the magic of these people would be imbued in the dev tools that they created, so if enterprises adopted the tools, then they'd reap the benefits of that same magic too. But this simply wasn't true. The tools didn't actually matter; it was the way they used them.

For example, when other companies started adopting tools like Bazel (open source Blaze) they wanted features like being able to launch ./configure scripts inside Bazel, which totally violates the whole point of Bazel, and never would have been allowed or even considered inside Google. The Bazel team was more than happy to oblige, and the users ended up with the worst of all worlds.


If Google open sourced their BUILD files for public libraries, we wouldn't have to resort to workarounds… Migrating something complex like ffmpeg to Bazel is not trivial.

Bazel is an awesome tool though, I’m very glad it was open sourced and receives constant attention from Google.


Google's systems were designed to index mountains of low value data at hitherto unseen scale, and they're good at that. But, to-the-second system-wide precision with full audit trails ... not so much.

You keep seeing startups with ex-Googlers that think they can "disrupt" Fintech with Google's "secret sauce" ... this tends to go badly.

I've had to clean up one of these messes where, in all seriousness, even a pre-2000 LAMP stack (never mind Java) implemented by people who understood the finance domain would have worked better.


> Google's code, tooling and accompanying practices are developing a reputation for being largely useless outside Google ...

Not that I don’t believe you, but where do you see this?


I can vouch for it. It's the main reason I quit: none of the "hard" skills necessary to code at Google were transferrable anywhere outside of Google. It would have been easy enough to skate and use "soft" skills to move up the management ladder and cash big checks, but I wasn't interested in that.

The reason it's not transferrable is that Google has its own version of EVERYTHING: version control, an IDE, build tools, JavaScript libraries, templating libraries, etc, etc. The only thing I can think of that we used that wasn't invented at Google was SCSS, and that was a very recent addition. Google didn't even use its own open-source libraries like Angular. None of the technologies were remotely usable outside Google.

It might sound cool to use only in-house stuff, and I understand the arguments about licensing. But it meant that everything was poorly-documented, had bugs and missing features that lingered for years, and it was impossible to find a SME because whoever initially built a technology had moved on to other things and left a mess behind them.

Some people may be able to deal with the excruciating slowness and scattered-ness, and may be OK with working on a teeny slice of the pie in the expectation that years later they'll get to own a bigger slice. But that ain't me so I noped out as soon as my shares vested.


12 year current Googler here. You are absolutely correct about "Google has its own version of EVERYTHING". Midway through my current career, I started to get existential dread about the fact that I wasn't "up to date" on any current development practices or frameworks.

Partly, this was assuaged through participating in open source projects in my free time. That's how I learned Docker, Github workflow, React, Vue, Bootstrap, Tailwind, etc.

But at the same time, I think it is a mistake to consider working with tools/languages/frameworks to be the only "hard" skills. Galaxy brain is realizing that anyone can learn a language/framework/workflow in a month or so. The real work is applying sound principles to the design and production of meaningful artifacts within those systems.


Though credit where it's due, some of their tools really have been years ahead of anything outside of Google, e.g. the Closure Compiler, which made JavaScript development scalable.

Their own IDE? Is it web based? So they don't use, say, IntelliJ/Eclipse for Java projects?

It's basically a fork of VSCode.

That’s a recent development, used to be something else altogether.

I have seen this discussed in hiring decisions. I don't know that it played a large factor in a decision, but lack of experience in the standard tools/practices/terms of software development because of a career at Google was definitely a discussion point.

I haven't worked at google, but this is something I have heard from a few people. Reputation is largely word of mouth, so it checks out for me. I suspect the skills/tools at most large companies are increasingly less transferrable as they continue to grow in scale and scope.

I had a bunch of very tenured teammates that didn’t really know how to use git, so there were only a few of us comfortable enough integrating and interacting with an open source dependency repo.

> Google's code, tooling and accompanying practices are developing a reputation for being largely useless outside Google .

It is almost a tautology

Why would they be useful for domains they are not designed for?


Most companies don't use proprietary tools for everything.


I’m surprised they didn’t turn that into a product, it sounds great.

Parts have been. Sourcegraph is basically the Code Search part, built originally by ex-Googlers. Bazel is the open-source build tool. Sadly, most of these things require major work to set up and manage yourself, but there's an alternate present where Google built a true competitor to GitHub and integrated their tooling directly into it.

I've published my Google proprietary stuff (when we decided to open source it) on GitLab, but they wouldn't let me do it on GitHub.

Building tools for others is a competency that is under rewarded at Google. They would never.

> Entire C++ servers with hundreds of lines of code can be built from scratch in a minute or two tops.

Hundreds, huh? Is this a typo? It makes me wonder if the whole comment is facetious. Or do C++ programmers just have very low expectations for build time?


I suspect they meant "hundreds of thousands"

Yes, oops - fixed!

That's the beauty of C++, an absurdly slow build is just an include away.

Boost always helps prop up falling compile times.

Especially when your juggling is getting out of practice.

Has anyone done LLM training with template metaprogramming? That seems like another excellent way to keep Google’s build servers warm.

The public version of Google's build tool is Bazel (it's Blaze internally). It has some really impressive caching while maintaining correctness. The first build is slow, but subsequent builds are very fast. When you have a team working on similar code, everyone gets the benefit.

As with all things Google, it's a pain to get up to speed on, but then very fast.
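For anyone who hasn't seen it, a BUILD file is a short Starlark (Python-like) description of targets and their dependencies; Bazel hashes each target's declared inputs so unchanged targets are served from cache, including a shared remote cache if one is configured. A minimal made-up example (targets and paths invented, not from any real project):

  # BUILD -- a made-up example; targets and paths are not from any real project.
  cc_library(
      name = "request_handler",
      srcs = ["request_handler.cc"],
      hdrs = ["request_handler.h"],
      deps = ["//common/logging"],  # another target elsewhere in the repo
  )

  cc_binary(
      name = "server",
      srcs = ["server_main.cc"],
      deps = [":request_handler"],
  )

Because every target's inputs are declared, editing server_main.cc rebuilds only :server, while :request_handler and everything beneath it come straight from the cache.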


Just wait until you try a “modern” language

Most of those arguments are not about code quality though.

Andrew O’Hagan's article on Assange is rather famous, not only for its contents, but also for being 25,000+ words in a magazine that still pays per word. The LRB can pull it off because they're subsidized by the editor's family funds.

I don't think the pay angle got serious until after Google went IPO - somewhere around 2005-2010. Before that, tech workers were paid well, but not much beyond, say, a financial analyst. Now there's a mass market of FAANG-M jobs, perhaps 500k jobs in total. They all pay way more than the average job, in multiplier terms.

There has been a quantitative shift in tech worker pay at the top end in the past twenty years.

In other words, the past twenty years have seen the rise of Big Tech: billion-user products, trillion-dollar valuations, eating up entire industries (retail, publishing, consumer electronics, entertainment). What enabled them are software economies of scale (production + distribution), the US's ability to train and attract talent, cheap money, and access to massive markets abroad.


> Before that, tech workers were paid well, but not so much beyond like a financial analyst. Now there's a mass market of FAANG-M jobs, perhaps like 500k jobs in total. They all pay way more compared to the average job, in multiplier terms.

We're really only talking about the top employees at the top tech companies in the top most expensive COL areas. It's not like every tech employee in the world is making $500K and driving a Ferrari, despite what HN commenters might sometimes say.

> There has been a quantitative shift in tech worker pay at the top end in the past twenty years.

Bingo--we're comparing only the top end of tech pay.


I think it’s important to calibrate what “great pay” is in a full spectrum way.

Cutting corners on details here: Jane makes $120-150k as an engineer at a non-FAANG, but she is fully (or at least with well-controlled risk) on the right side of every gnarly automation-unemployment theme we all know is coming, simply because she knows how to code, can link systems, and is ahead of the curve enough to learn how to use AI in time. She is in a much different and much more beneficial position than Sarah at a bank making $300k, whose knowledge stops at Excel wizardry, and whom an LLM is about to unemploy at 40 because it gets better than a CFA Level 3.


This is a potential outcome, but by no means guaranteed.


It is happening already, the extent to which it fully occurs is the potential part. The vendors are out there and shopping. Or, OAI and co are totally blowing smoke.

Latter is possible for sure, but I don’t think anyone really thinks the tech hasn’t arrived, or isn’t on edge enough to give it the serious benefit of the doubt. Things just have to develop naturally from Netflix in 2013 (laggy and small library) to Netflix in 2024, and we’re there. IMO it’s a willful blind spot from those benefiting from this change to fully argue otherwise.


People said similar things about the cloud: shrink your IT staff!

I don't think that dream quite materialized.

I doubt it will with AI either; they will just be supervising the AI, or spending time on more important things that the AI frees them up for.

That's my "prediction", anyway...


Where this equivalency fails, IMO, is that cloud's expansion into employment-market tasks was just providing a nice UI and automation wrapper on server admin tasks. That ability to scale quickly probably created products that expanded into other labor markets faster than if racking and stacking were required of every startup. But that expansion probably would have occurred anyway, just slower.

The set of labor tasks that AI provides a nice automation and UI wrapper on is… a much, much larger scope than what cloud could do.


“Making 500k… driving a Ferrari”

I'm guessing you're being sarcastic, but I make that much at a tech company and absolutely cannot afford a Ferrari. At least, not if I want a roof over my head. Of all the engineers I know making this much, and even double or triple that, the most expensive car driven is a Model S.

Actually the only person I know with a Ferrari is a cabinet contractor. He drove it over with paint cans that I requested for kitchen cabinet touch ups.


> I make that much at a tech company and absolutely can not afford a Ferrari. At least, not if I want a roof over my head.

Don't worry, I'm sure your luck will turn around.


Which is what I meant by “want to make the same pay as a financial analyst but without the hours and awful culture?”


Taste is a nebulous and rare thing. Microsoft rarely had it - it continues to flail around, with brief rays of hope - many of the Surface products are lovely.

Will Microsoft enshittify Windows? Yes, it already has.


I struggle to remember any point at which Windows (or DOS) was great. Perhaps Windows 2000. This is Windows thirteen years in:

https://www.youtube.com/watch?v=yeUyxjLhAxU


Windows XP was rather beloved as well.



This is so baffling to me. Is it just to reduce traffic to OneDrive or something?


Pernicious assumptions!

Google Cloud projects have three attributes: user-friendly names, system numbers, and system names. System names are alphanumeric. They can be chosen by the user, or derived from the friendly name if there's no collision.

But! There are some system names from the olden days that are actually all numbers - so not actually alpha-and-numeric. Thankfully we don't run into those often.

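To spell out the pernicious assumption with a toy Python check (my own example, not real GCP code): logic that treats "all digits" as "this must be the system number" misclassifies those legacy all-numeric system names.

  def looks_like_project_number(identifier: str) -> bool:
      # Naive heuristic: assume an all-digit identifier is the system number.
      # Wrong for legacy projects whose system name is itself all digits.
      return identifier.isdigit()

  print(looks_like_project_number("my-friendly-project"))  # False -> system name
  print(looks_like_project_number("123456789012"))         # True -> but this could
                                                           # also be a legacy name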

Hyperthreading is also useful for simply making a little progress on more threads. Even if you're not pinning a core by making full use of two hyperthreads, you can handle double the threads at once. Now, I don't know how important it is, but I assume that for desktop applications, this could lead to a snappier experience. Most code on the desktop is just waiting for memory loads and context switching.

Of course, the big elephant in the room is security - timing attacks on shared cores is a big problem. Sharing anything is a big problem for security conscious customers.

Maybe it's the case of the server leading the client here.


Just this morning I benchmarked my 7960X on RandomX (i.e. Monero mining). That's a 24-core CPU. With 24 threads (one per core), it gets about 10 kH/s and uses about 100 watts extra. With 48 threads, about 15 kH/s and 150 watts. It does make sense given the nature of RandomX.

Another benchmark I've done is .onion vanity address mining. Here it's about a 20% improvement in total throughput when using hyperthreading. It's definitely not useless.

However, I didn't compare to a scenario with hyperthreading disabled in the BIOS. Are you telling me the threads get 20-50% faster, each, with it disabled?


This shop is brilliant, thank you!

