Perhaps the operative phrase being "Many years ago". I currently work at Google; previously I worked at Square. Of the two, I generally prefer the OSS and off-the-shelf tooling at Square. Some things really are better at Google (code search and Blaze are definitely a big improvement over what we had), but many of our monitoring and CI tools feel antiquated and cluttered with inscrutable debris. Or take Gerrit, where each change has 3 different IDs and the developers decided that every workflow status should be expressed as an integer in the range [-2, 2] for some reason.
They were probably amazing (and much simpler) tools back in the day, but the world has moved on and we're somewhat constrained by what's familiar.
I wrapped up a 2-year stint at Facebook recently and my experience was similar to yours.
Some tools were amazing and cool, and I really appreciated how they evolved over time to manage huge amounts of resources. But most were subpar compared to OSS tooling: outdated, deprecated functionality, very little documentation, very few people working on them. The common approach was to learn "the duct tape way" to make something work, then pass it on to new engineers.
An example would be tool X for working with diffs (PRs). It's the latest and greatest, except that it only covers 75% of the functionality of its predecessor, tool Y, so you end up learning both. Tool Y has been "deprecated" for the past 3-4 years. Some of its features don't work, but you'll only find out when you try to execute them.
The things that bug me most about my company's tools are the ones young enough that OSS solutions already existed, meaning someone either didn't look very hard or didn't want to find anything (so they could write their own). Other priorities come along and those tools eventually can't keep up with OSS alternatives, but the apologists take over.
You will not continue to get accolades for tools you wrote three years ago. The only "value" you derive is the time and effort they save, offset by the effort expended. The time-and-effort expense of external tools is often lower. And when the tools are annoying, you can commiserate with your coworkers instead of being the target of their criticism, which is often under-appreciated.
This was actually pretty useful in OpenStack back in the day. The reviewers who couldn’t yet approve code could only put +1s on there (and there were a lot of low quality reviewers who slapped these everywhere) so it was very obvious when a patch still needed attention from a commit-privileged dev.
Then -1 was standard review feedback of stuff that needed improving and -2 would come from commit-privileged devs when the patch fundamentally didn’t fit with the project direction.
1 = WTF?
2 = Needs Improvement
3 = Good Enough
4 = Better than Good Enough
5 = Almost Perfect
Daniel Kahneman recommends a scale like this for evaluating job candidates across core competencies. It's a surprisingly powerful little heuristic.
-2 = WTF
-1 = Needs Improvement
0 = Good enough
1 = Better than Good Enough
2 = Almost Perfect
It's even intuitive, since anything above zero is 'above and beyond' and anything below zero isn't good enough yet.
-1: Changes Requested
Communication is hard. People actively in the team probably have the context to understand the differences, but that implied context is what tribal knowledge is made of.
OP indicated that there were non-trivial "non-linear" differences going from 1 to 2 and -1 to -2. The use of numbers implies a relationship that doesn't really exist. The use of accurate language makes the actual meaning immediately obvious without the need for implied context. For example, I had no idea that only privileged devs could give 2s. "Accepted" and "Declined" do imply a finality that more accurately mirrors the intended usage.
This may seem overly analytical for something that is easy to just explain, but this kind of thing builds up in layers.
Gerrit, Reviewable and Phabricator are all very similar to Critique, except that Critique is much more polished.
Having used all three, Gerrit works best and the new UI is starting to look pretty good. Reviewable has great UX, but inherits some of the disadvantages of GitHub PRs.
I think they kept the same keystrokes, though.
That is one nice thing about Critique: although it's a webapp, you can drive it pretty much entirely by keyboard. It has the feel of an old Usenet newsreader like nn, in a way.
But yeah... even I find the names difficult to handle and remember, even after using Phabricator for more than 4 years now. Nevertheless, it is a great piece of software; the code review process in particular is top-notch.
More pixels != more information. The visuals and the variety of ways to present data in Grafana always seemed more intuitive to me; being able to lay stuff out in different ways was great. I worked on a G-on-G project at Google and used both Grafana and Viceroy. I never had an issue with Grafana on mobile, but I was never able to see Viceroy in a mobile browser because of the corp network.
I'm sure Viceroy is more performant, and I'm not denying that. I may also be scarred a bit because I found writing Mash queries an arcane nightmare, and those are related to Viceroy.
The default views and metrics you get (RPC latencies, etc.) are amazing, however.
Mash is indeed arcane, although gmon (viceroy) makes it worse than usual.
It's still a big step up from, say, GitHub PRs, and the UX has improved significantly.
Exactly my feelings about Phabricator. GitLab simply runs circles around it if you consider the issue tracking, CI etc. as part of the problem.
This sounded awesome to me (imagining some Fortran code behind it), but disappointingly the cited screenshot only has the range [-1, 0, 1], which is hardly the same.
No, I'll take Github PRs any day. They could be better, of course.
Why not just leave them uncollapsed and allow one to mark them as resolved when the change is actually resolved? The other problem is that it's difficult to see what actually changed when a line is marked out of date, without having to scan through the entire updated diff and then jump back and forth between the diff and conversation view.
Collapsing them by default makes it difficult to find the comment by scanning. If they were uncollapsed until I marked them as resolved, then I could at least use the find feature in the browser and search for my username to find my comments.
Except that if the diff line I chose to comment on (about the commit message) changes in some way, then the comment is collapsed, which makes it difficult to keep track of what needs to change.
> since by the time the PR is done the commit msgs are pretty much not changeable, except for the last one
They can be changed via a git rebase.
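As a quick sketch (throwaway repo, hypothetical commit messages) of rewording an earlier commit via rebase: interactively you'd run `git rebase -i` and change `pick` to `reword`, but here both editors are scripted via `GIT_SEQUENCE_EDITOR` and `GIT_EDITOR` (GNU sed assumed) so it runs unattended:

```shell
# Throwaway demo repo
rm -rf /tmp/reword-demo && git init -q /tmp/reword-demo && cd /tmp/reword-demo
git config user.email demo@example.com && git config user.name Demo
echo a > a.txt && git add a.txt && git commit -qm "Add feature (tpyo in message)"
echo b > b.txt && git add b.txt && git commit -qm "Add tests"

# Mark the first (oldest) commit as "reword" in the todo list, and supply
# the replacement message non-interactively.
GIT_SEQUENCE_EDITOR='sed -i "1s/^pick/reword/"' \
  GIT_EDITOR='echo "Add feature (typo fixed in message)" >' \
  git rebase -qi --root

git log --format=%s   # newest first: "Add tests", then the reworded subject
```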
If you're interested, I'm happy to talk you through it - just book some time here: https://calendly.com/ericyu3/15min
Then, we can run git rebase -i --autosquash --keep-empty to handle applying the changes requested to the correct commits at the end of the review process.
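A minimal sketch of that `--autosquash` flow on a throwaway repo (commit messages hypothetical; `--keep-empty` omitted since nothing here is empty): review feedback lands as `fixup!` commits, which the final rebase folds back into the right commits:

```shell
# Throwaway demo repo
rm -rf /tmp/fixup-demo && git init -q /tmp/fixup-demo && cd /tmp/fixup-demo
git config user.email demo@example.com && git config user.name Demo
echo v1 > feature.txt && git add feature.txt && git commit -qm "Implement feature"
echo ok > tests.txt   && git add tests.txt   && git commit -qm "Add tests"

# A reviewer asks for a change in "Implement feature"; record it as a fixup.
echo v2 > feature.txt
git commit -aqm "fixup! Implement feature"    # or: git commit --fixup=<sha>

# Autosquash reorders the todo list and folds the fixup into its target;
# GIT_SEQUENCE_EDITOR=true just accepts the generated todo list.
GIT_SEQUENCE_EDITOR=true git rebase -qi --autosquash --root

git log --format=%s   # back to two commits: "Add tests", "Implement feature"
```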
Maybe you'll build the framework & subfeature A together, because you need some bit of meat in there to properly figure things out and be able to test things.
On Gerrit you'd probably create two commits which get pushed as two CLs
* CL1 Create Thing framework
* CL2 Implement A for Thing
They're now pushed for people to review. While they're looking at them, you continue working, creating CL3 implementing B
* CL3 Implement B for Thing
One of your coworkers points out an issue in CL1, so you fix it (by amending the commit) and repush the stack. Now your stack is
* CL1 Create Thing framework [v2]
* CL2 Implement A for Thing
* CL3 Implement B for Thing
CL1 & CL2 get approved so you merge them (though typically with Gerrit they're cherry-picked - CLs are individual Git commits). You push up CL4, implementing C, so your stack now looks like
* CL3 Implement B for Thing
* CL4 Implement C for Thing
The important point is that CLs are atomic (and are reviewed atomically) even if they depend upon each other (i.e. are part of a stack). When you're working in Git you typically just work off of the master branch (unless, rarely, you have multiple unrelated stacks on the go), so all the commits between `origin/master` and `master` - i.e. the set that automatically shows up in `git rebase -i` and similar tooling - are your stack. When you pull, you `git pull --rebase` (or set the config for your machine or repo to default to that). When you've revised something in your stack or want to add something to it, you just do `git push origin HEAD:refs/for/master` to update Gerrit with the latest version of your stack.
It takes a little time to get used to (and you have to learn how to hold `git rebase -i` properly), but once you're used to it, it's immensely more productive than the GitHub(-clone) PR flow (and doesn't involve manual branch juggling, etc.). I can't express just how badly those tools handle reviews of stacked PRs: either you create your stacked PR targeting master (and it gets all of the changes from the underlying PRs merged into its changes list, which makes reviewing it harder), or you target it at the branch of the underlying PR (which is non-obvious and painful, and you have to manually remember to shift it across when the PR beneath it gets merged).
When I'm in this situation with something that is PR-based, I tend to end up merging CL1 & CL2 first (because it's easier to review things when you have an example) and hanging on to CL3 & CL4 on my local machine until the first ones get merged.
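For concreteness, here is a minimal local sketch of revising CL1 mid-stack and repushing (throwaway repo, invented file names; the commented-out push at the end is what you'd run against a real Gerrit remote, which matches commits to existing changes via their Change-Id trailers):

```shell
# Build a three-CL stack on a throwaway repo
rm -rf /tmp/stack-demo && git init -q /tmp/stack-demo && cd /tmp/stack-demo
git config user.email demo@example.com && git config user.name Demo
echo fw > thing.txt && git add thing.txt && git commit -qm "Create Thing framework"
echo a  > a.txt     && git add a.txt     && git commit -qm "Implement A for Thing"
echo b  > b.txt     && git add b.txt     && git commit -qm "Implement B for Thing"

# Mark the bottom commit (CL1) for editing, non-interactively (GNU sed).
GIT_SEQUENCE_EDITOR='sed -i "1s/^pick/edit/"' git rebase -qi --root
echo "fw v2" > thing.txt            # the fix a reviewer asked for
git commit -aq --amend --no-edit    # fold the fix into CL1 itself
git rebase --continue               # replay CL2 and CL3 on top

git log --oneline                      # same three subjects; CL1 now has the fix
# git push origin HEAD:refs/for/master   # repush the stack to Gerrit
```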
What prevents a situation like CL2 getting approved and merged prior to the resolution of CL1? Or more generally, what ensures that the ordering of commits in a set of CLs in a given stack is preserved prior to merging?
> I can't express just how badly they deal at handling reviews of stacked PRs - either you create your stacked PR targeting master (and it gets all of the changes from the underlying PRs merged into its changes list, which makes reviewing it harder) or you target it at the branch of the underlying PR (which is non-obvious and painful and you have to manually remember to shift it across when the PR beneath it gets merged)
Rather than stacked PRs (which essentially doubles the number of commits, since GitHub (and possibly GitLab) introduces a merge commit for each PR even if it's a fast-forward merge), would it not be better to just combine CL1 through CL4 into a single feature branch where each commit corresponds to a CL? It would make for a large PR, but it can be reviewed on a per-commit basis.
One thing that is maybe not obvious: For an API author, code search in combination with a monorepo and the somewhat hermetic universe which is Google's code base provides immediate access to all uses of a library. You can see what worked well and what didn't, and it enables effective refactorings. That also means when writing a client, you can quickly figure out from other client code how stuff is supposed to be used. All this makes code search such an effective tool in Google's development environment (in the broader sense).
If anything like this were possible for open source (indexing code that depends on a library/API, across subrepos in any version control system, in a way that gets near complete coverage), it would enable similar possibilities of systematic improvement.
Alas, it does not seem realistic except in a few niches where the number of clients is bounded and code owners are willing and able to follow a protocol (approve changes to their code that unblock such global improvements.)
Gets me thinking: this is totally doable for open-source code. Someone just needs to build some giant indexes across git repos + package repos (maven/npm/pip/etc)...
Then write plugins for VScode/intellij, and when developing you can right-click a symbol and “Find uses in open source code”
Does something like this already exist?
1. Vastly expanding the size of our global search index to cover every public repository on github.com, gitlab.com, and bitbucket.org.
2. Enabling LSIF indexing (https://lsif.dev) in every major language for compiler-accurate code navigation (go-to-def, find-refs) across repository/dependency boundaries.
The latter is already working on a subset of languages for private Sourcegraph instances, and we want to scale it to the entire open-source world. We think a single search box that covers all the visible code in the world and allows you to seamlessly walk the reference graph is super super powerful and someday will be a thing that most developers use every day.
Do OSS licenses distinguish between corporate usage of repository source as code vs. as data?
That said, the community often takes authors' wishes into account.
Out of curiosity, why would you want to restrict this? It's not like Sourcegraph has any exclusivity to the concept or the ability to implement it.
But yeah, it could get expensive to keep the indexes current as things grow.
Sounds a lot like sourcegraph.
However, neither works great (maybe 30%-80% coverage?), and that's far from the "near complete coverage" the parent attributed to Google's monorepo.
It may be a case where you need 90% or 99% or even higher coverage before it becomes a game changer. But I think GitHub is headed there quickly.
For the latter, it's a known limitation in that we're only doing "fuzzy" or "ctags-like" Code Nav: defs and refs are only matched by their textual unqualified symbol names. For some languages that's not a big deal, because symbol names tend to be unique. But for a language like Go, you might have dozens (or hundreds!) of methods named "Parse", and we don't currently distinguish those in our Jump to Def UI.
That's something we're actively working on, though! One benefit to the current fuzzy approach is that it's incremental (we only have to analyze changed files in a commit), and does not depend on tapping into a build or CI process. That makes it much faster — we have a long-running indexer service live and hot, and typically have new commits indexed and live in the UI within 1-5 seconds of receiving a push. And it requires no configuration on the part of the repo owner — no need to tell us how to build your project, or to configure an Actions workflow to generate the data. That makes it easier for us to support entire ecosystems when we roll out support for a new language. We've also tried to make it fairly easy for external communities to add support for their languages — this is all driven by the open-source tree-sitter parsing framework. This file, for instance, is where the symbol extraction rules for Go are defined: https://github.com/tree-sitter/tree-sitter-go/blob/master/qu...
We've been working on a new approach to generate more precise symbol data, while keeping those incremental and zero-config properties. It's not ready for public launch, yet, but it's close!
For those who want to learn more, I gave a talk at FOSDEM 2020 (back in the Before Times) going into more detail on some of these constraints and design decisions. (Though please note that this talk references a legacy way of defining symbol extraction rules, which relied on writing code in our open-source Semantic project.) https://dcreager.net/talks/2020-fosdem/
I don’t have any links saved, since they feel kinda sketch (ads all over the place). But it has been very helpful, unlike searching within GitHub for a string that I know is in the repo.
If the change is small or can be automated (e.g. changing the number of parameters or function names), we run a script to make the change over the entire monorepo. This enormous CL is then approved by one of the Global owners.
If the change is complex (or non-obvious), you generally introduce the new API in one CL, change each use manually (say one CL per team), and then remove the old API in a final CL. In that case, you need each team to sign off. This isn't too hard in practice, teams are generally expected to approve cleanups.
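As a toy illustration of the automated case (all names hypothetical, run on a throwaway repo rather than a real monorepo): find every call site, rewrite it mechanically, and commit the result as one large change for the owners to approve:

```shell
# Seed a tiny fake "monorepo" with two files that call oldName()
rm -rf /tmp/rename-demo && git init -q /tmp/rename-demo && cd /tmp/rename-demo
git config user.email demo@example.com && git config user.name Demo
printf 'int oldName(int x);\n' > api.h
printf 'int y = oldName(3);\n' > client.cc
git add . && git commit -qm "seed"

# The actual automated change: list affected files, rewrite them, commit once.
git grep -lF 'oldName(' \
  | xargs sed -i 's/oldName(/newName(/g'
git commit -aqm "Rename oldName to newName across the repo"
git show --stat --oneline HEAD
```

In practice such a change would be generated by a proper refactoring tool rather than `sed`, but the shape (search, mechanical rewrite, single reviewed CL) is the same.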
We usually have a 3-step process: first, a new API interface/service using the new dependency is added and runs in parallel with the existing one; second, we blast email to users of the old one to migrate to the new one; third, when usage of the old one hits 0 it is removed. When a team slacks off on migrating (usually there are months between the first notification and service shutdown), things escalate and a director might get involved.
Anyway, this means that migrating to the new interface is clients' responsibility.
However, there's an exception: if the changes are trivial (like changing the name of one of the API endpoints), we have an automated tool that basically performs this change across the whole codebase. These kinds of changes still need to be approved by the codebase owners and shouldn't break any test, so they must be really trivial.
In this case it's usually the team owning the API/service that performs the change.
For third party dependencies there are two main options.
1. Do it all at once. This is good for relatively small numbers of users or small API changes.
2. Introduce the new version, migrate users, then delete the old version.
Note that for third party libraries Google does maintain a fairly strict one version policy so 2 is always a temporary measure.
* Ability to change other people's code safely, possibly gated by code review approval. Cultural and technical (e.g. unit tests must exist and be maintained) barriers may apply.
* Making incremental changes that are backward compatible (at least until you migrated all clients so you can cleanup later)
I recommend this talk https://youtu.be/tISy7EJQPzI (even if the scope is broader than your question)
At BazelCon 2020, Borja Lorente from Twitter gave a talk about how they were migrating away from Pants to Bazel. Seeing as Twitter was heavily involved in the development of Pants, I'm not sure what that means with respect to adoption and evangelism once they complete their migration.
Lately it seems that Bazel has basically won in the space of open-source Blaze clones, as more and more companies switch over.
Nothing finished yet, performance issues (because guess what, Polymer's routing turned out to be pretty much the same as a 'tab panel'), aaand Polymer 2 rolls around, backwards incompatible. Scramble to make all their components suitable for Polymer 2, and they're just about to breathe a sigh of relief aaaaand Polymer 3 rolls around and becomes deprecated in favor of lit-html at the same time.
Moral of the story: Ex-Google doesn't mean shit, and don't force the use of experimental technology for your whole fucking multi-billion company.
They should've stuck with Angular (it's fine), or migrated to React instead, which has become / is becoming an industry standard.
Material Components for the web is constantly introducing breaking changes that make building upon the framework a nightmare. Lately they've even set this expectation in the project description. The problem is that the components are rather buggy, so you must regularly update them to have upstream bugs fixed.
> Material Components Web tends to release breaking changes on a monthly basis, but follows semver so you can control when you incorporate them. We typically follow a 2-week release schedule which includes one major release per month with breaking changes, and intermediate patch releases with bug fixes.
The development team's decisions have been questionable, so consider this a warning in case you are tempted to use the project.
(I never worked at Google but did interview; I argued with the interviewer and hung up. He thought computers worked differently than they do in reality and just spoke with this air of authority, without having a speck of it.)
Hubris runs rampant in the ex-FAANG crowd, and businesses that aren't tech companies hire these folks precisely because they don't understand that what worked at a FAANG won't necessarily translate to their business.
And IMO, anyone who loves the field should have a goal of doing a stretch at one of the big ~5 (an internship may be enough). I speak as someone who was very anti-big-tech until 2014, when someone convinced me to get my head out of the sand.
Your technical abilities can still improve (as long as you keep practicing on your own, because dev speed will slow down at a big org), you will gain tons of practice working with a lot of talented people, and, irreplaceably, you will understand more about how the industry is piloted.
That's pretty close to the Google story.
I did read the article and yes it does have some good parts.
To me their absolute best technology edge is still their storage system (colossus)
I found the more meaningful thing is the ecosystem of smart engineers, and the ability to find others who face similar problems, exchange ideas and solutions. It's a skill of its own to find these people and learn their "language", but once you do, it's a huge multiplier that is hard to find elsewhere.
It’s when you get to a bigger size that you (probably) need to worry about these cases.
Also, I have been looking to move away from Makefiles and bash scripts, and Bazel does come up more often. What are your experiences working with it, and how does it compare to Blaze?
The good is that I find BUILD files easier to read/reason about than Makefiles and it also is multilingual.
The bad is that external dependencies are hard, especially if you want to take advantage of all of Bazel's features like hermetic builds. Most larger C and C++ projects don't use Bazel to build and don't have BUILD files, so you end up maintaining your own build files for that project OR using something like [this].
Plug: for those interested in codesearch within bazel repositories, my vscode plugin for bazel has a special feature for this: https://stackb.github.io/bazel-stack-vscode/searching
This makes code search a super power.
As an example that surprised me, even our oncall rotations are kept in a text file along with a list of upcoming assignments. The rotation tool checks out that file and appends the next few names in the list to the end of that file to prepare the "calendar" for the coming weeks.
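A toy sketch of that scheme (file format and names invented): the roster and the upcoming schedule live in one plain-text file, and the tool extends the schedule by cycling through the roster (roster wraparound is not handled in this sketch):

```shell
# One plain-text file holds both the roster and the upcoming assignments.
cat > /tmp/oncall.txt <<'EOF'
# roster
alice
bob
carol
# schedule
2024-01-01 alice
2024-01-08 bob
EOF

# Append the next assignment: the roster member after the last scheduled one.
last=$(tail -1 /tmp/oncall.txt | cut -d' ' -f2)        # last scheduled person
next=$(grep -A1 "^$last$" /tmp/oncall.txt | tail -1)   # their roster successor
echo "2024-01-15 $next" >> /tmp/oncall.txt
tail -1 /tmp/oncall.txt
```

The appeal of the scheme is exactly this: the "calendar" is just lines in a version-controlled text file, so swaps are a normal reviewed edit.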
Pretty much all the other "configure things that are at least moderately complex" languages are actually bad enough that they make GCL/BCL look like a really good idea.
There isn't anything said about end-to-end, integration, or functional testing here. I'm in a world where everyone hacks their own system together onto the same runtime, leading to some wonky outcomes and lots of operational support. It would be interesting if there were a 'Google' way to do it.
The recent semantic-references stuff they've added is helping, but that only seems to be available in certain languages/setups and doesn't work cross-repo. Google's xref system allowed you to browse essentially the whole Google codebase - it was amazing. Third-party code was indexed too; I remember my team used code search to track down a bug in NGINX once.
GitHub's normal search feature is bad. I can't even quote stuff for exact matches. I usually end up using BigQuery for GitHub-wide searches, or just pull down the repo and grep locally.
I was recently able to use it to find all repos in my org using git-lfs by searching for .gitattributes with certain properties.
And I was able to search all projects for a particular secret string.
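A local sketch of that kind of `.gitattributes` scan (directory layout and file contents invented), listing checkouts that route any file type through git-lfs:

```shell
# Two fake repo checkouts: one using git-lfs, one not.
mkdir -p /tmp/repos/uses-lfs /tmp/repos/plain
printf '*.bin filter=lfs diff=lfs merge=lfs -text\n' > /tmp/repos/uses-lfs/.gitattributes
printf '*.txt text\n' > /tmp/repos/plain/.gitattributes

# List every .gitattributes that declares an lfs filter (GNU grep).
grep -rl 'filter=lfs' /tmp/repos --include=.gitattributes
```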
Internal code search is miles ahead of GitHub/GitLab search and is super fast and reliable. I have not used Sourcegraph (which everyone seems to talk about), but in my past experience at other companies nothing comes close.
So you can compare
If you want to search code sitting on your local hard disk try this tool (built on top of Lucene):
I made it when I was frustrated by existing tools such as sourcegraph and opengrok.
That having been said, I love Kythe, and we've actually considered using it as a semantic backend for Sourcegraph (and still might in the future). For the time being, we're using indexers that emit LSIF (https://lsif.dev). This allows us to build on top of the substantial body of work provided by the many open-source language servers (https://microsoft.github.io/language-server-protocol). But Kythe has a far richer schema that can capture all sorts of useful relationships in code. It's awesome and I wish more people were building indexers for it.
Is it just a case of the visible implementation being a bit behind the ultimate capability of the system?
Like beliu said, the Kythe schema is far richer; it has a fully abstract semantic layer in the graph, and is a superset of what can be represented with LSIF. It's not tied to specific text regions -- there are representations of symbols/functions/classes/variables/types that do have pointers to/from text regions.
Note that because of the richness and abstractness, it's theoretically feasible to drive much more than code navigation from the Kythe graph.
And yes, the open source is just part. The large scale pieces are basically (1) do instrumented build (2) run through Kythe indexers (3) post-process output for serving.
The Kythe OSS project offers solutions for (2) for C++/Java/Go/Typescript/protobuf (and early Rust support). We do have plans to open source support for at least some other languages at some point in the future. (Hedging as best I can here.) Note that the best candidates for Kythe indexing are those languages that admit solid static analysis.
(1) is inextricably tied to the build system. Bazel support should be nearly turnkey; other systems require more (maybe significantly more) work.
There's not-full-scale support for (3) available. (Clearly we use something far more sophisticated internally.) While we'd like to see this fleshed out, expansion of that will depend on non-trivial community contributions.
It's not, and Google is hardly any authority on good software.
> Introducing code search and monitoring doesn't require asking anyone on the team to change existing workflows. Changing the code review tool, however, does.
It basically sounds like the author wants to take over and tell everyone how to do things.
After time at Google, comparing them to other organizations all the way from 8-person start-ups to other FAANG+Microsoft, there are a LOT of downsides to their tools and their isolated ecosystem. A lot of things are more difficult and take longer to just get done than elsewhere at the same quality.
> Google is well-known for its internal tools. It would help to educate yourself a bit.
This may also be exactly the attitude OP was talking about.
1) CitC (Client in the Cloud). Mounts your development environment on FUSE filesystems that exist in the cloud. The entire monorepo is mapped into your CitC directory. You can access it from your desktop shell, your home laptop, or a web-browser-based IDE. Any edits you make are overlaid onto the (read-only) source repository, looking seamless and creating reviewable changelists on the fly. Effortless sharing of editing between multiple machines. ObjFS, which also sits in your client, allows blaze (bazel) build artifacts to be shared as well, between clients, even between users. In other words, if I work on 3 machines, I don't need to "check out" my work 3 times. In fact, I almost never "check out" anything at all. I work in a single monorepo with Mercurial, editing files, which produces reviewable changelists against the main repo. I don't need to decide what files to check out or track, nor decide which machine I will work on, and I often switch between IntelliJ locally, IntelliJ via Chrome Remote Desktop on my office computer, and a VS Code-like web IDE.
2) Skyframe (https://bazel.build/designs/skyframe.html). Imagine parsing the entire monorepo and every single BUILD file into a massive pre-processed graph that knows every possible build target and its dependencies. This allows ultra-efficient determination of "what do I need to rebuild? what tests need to be re-run?" across all of Google. I guess the closest thing to this is MvnRepository.net or BinTray, but Skyframe doesn't just parse the stuff and give you a search box; it informs CI tools.
3) CitC/Critique extensions to Mercurial -- take a chain of commits and make them a single code review, or take a chain of commits and make them into a stacked chain of code reviews.
4) Critique presubmit tools (e.g. ErrorProne, Tricorder, etc.). Google has a huge number of analysis tools that can run on every review update, for bugs, security problems, privacy problems, optimizations, data races, etc. Yes, these are usually available outside, but it's just so easy to enable them internally compared to doing it on GitHub. There are lots of other code-health tools too, for automatically applying fixes, removing unused code, and auto-updating build files with correct dependencies.
5) Forge -- basically Blaze's remote build execution (what Bazel calls RBE). Almost every build at Google is extremely parallelized, and if you need to check for flaky tests, running a suite of tests 10,000 times is almost as fast as running it once.
6) Monitoring's been mentioned, but monitoring combined with CodeSearch hasn't been touched on. Depending on configuration, you can often see from Critique or CodeSearch what release or running server your code ended up in and what happened to it (did it cause bugs?). CodeSearch has an insane number of overlays; it can even overlay Google's Sentry-like exception logger and tell you how many times some line of code produced a crash.
A lot of Googlers use maybe 25% of all of the features in CodeSearch and Critique.
Here's an in-depth article from Mike Bland
This is available outside Google now - start at https://github.com/bazelbuild/remote-apis or https://docs.bazel.build/versions/master/remote-execution.ht... . It's a standardized API, supported by a growing set of build tools (notably Bazel, Pants, Please) with a variety of OSS and Commercial implementations. At this point almost anyone can set up Remote Execution if they wish to, and remote caching is even easier.
Minor terminology correction: RBE generally refers to Google's own implementation of the same name; Remote Execution (RE) and the REAPI are used to refer to the generic concept.
(Disclaimer: I work on this at Google.)
I really liked this article. It makes great points about the order in which to try to improve things. I feel seen about trying to push a build system before having built social capital. Great to learn about other build systems. I am still having trouble explaining "what's wrong with just using Makefiles"... working on it.
It's not surprising that Python doesn't work well in such a high-delay dev environment.
That's putting it nicely. Based on experience, I'd say it's arguing about preferences while demanding that code be coupled because "that's how we do things here".
> culture is far more like academia anyway
Academia is actually based on research and correctness to a much higher degree. The lack of focus on correctness in code worked on at Google is, honestly, appalling. Until, of course, you realize that all other FAANG employees still write loads of concurrency bugs and pronounce themselves gods for doing so.
That's certainly not been my experience (and I get the exact opposite impression from people involved in the formal readability process).
> Academia is actually based on research and correctness to a much higher degree.
This depends, greatly, on what part of academia you're in. In many ways, Google is much better about correctness than much of academia (reproducibility, for example, is often near trivial at Google but uncommon in most non-theoretical areas of academia).
> The lack of focus on correctness in code worked on at Google is, honestly, appalling
There are tradeoffs here. On the one hand, you have tens of thousands of engineers; you're not going to be able to enforce perfection from every single one with the tools available today at a reasonable cost. On the other hand, I see evidence that Google is willing to invest huge amounts into improving software correctness at the lowest levels (like proposing and upstreaming changes to languages to improve correctness-by-default).
To my eyes, we still develop software in plain text and the same languages are still dominant. It's still Linux and http.
It's not just Linux and http anymore. It's frameworks towering up to the heavens.
On the one hand I agree with you, in that these liabilities you describe are much like the legacy systems I see in our local banking sector.
On the other hand, these legacy systems just work. And adopting newer / more modern systems might definitely have advantages, but doing so creates a bunch of other liabilities - as I've seen take place in our new startup banks.
Stuck between a rock and a hard place.
Of course that's just in banking. I most certainly don't have the experience to comment on whether the same liability tradeoff would happen at FAANGs.
It's not being stuck between a rock and a hard place; you just need strong management that understands that replacing whole systems without a clear benefit is never a good idea. "Technical debt" is such a bad term for reliable software. Software doesn't age. I'm not sure how software engineers don't understand this, or maybe they do and just want to write more software.
And that's besides tribal knowledge that goes away as people leave or retire.
It’s actually been pretty refreshing to be around.
Wallah is Arabic slang for "by God".
Voila is French for "look there", meaning "just look at that".
They are semantically totally different interjections originally, yet one fits almost everywhere the other does, and, if Wiktionary is to be believed, in Danish one is a spelling of the other.
It's more to see the lay of the land than anything else; there might be something very cool for you to pick up. Most of it will be overkill for smaller teams.
Using GitHub, it’s between $4 and $21/user/month.
So this means that insight into my source sometimes costs more than actually managing my source.
I’ve never worked at Google but it seems like one benefit is that they’ve figured out how to scale the costs of this kind of functionality so you don’t have to run into conundrums like this.
Let’s say Sourcegraph saves a few minutes and is worth it. GitHub saves way more time, so its relative value is much higher.
I think this is a problem with many SaaS products in that it’s more “efficient” to just be software. It also seems like the per user costs are relatively low and the effort is in the code base indexing and whatnot.
I think it’s a good idea and think it’s great if users are happy. But I’d rather have an OSS product that I could install as part of the tons of other stuff I use than be on the hook for a monthly charge for every user.
I expect that they’ll be bought by GitHub or GitLab and rolled in at some point and will make more sense to me value wise when it’s like $.17 or something of an overall source management cost.
We developers often overestimate the value vs. cost of the tools we use ... many tools have a negative value vs. cost even though they are free.
“Product A is great value and product B is more expensive than (or same as) product A therefore product b is bad value and I won’t buy it.”
The ROI of the second product is not related to that of the first. Unless your budget is so limited that it’s either-or, why is this fallacy so prevalent?
I think that fairness is a factor in pricing and it’s not just an ROI calculation.
It’s like those EpiPens that went up in price, right? From an ROI perspective it’s great, because you pay $400 and get to live. But knowing it costs little to make makes me think it’s less just.
Relative ROI is also important as we have to prioritize. GitHub seems a much better value to cost than this tool.
It’s perfectly fine for them to price however they like and it seems from their site that lots of companies pay this.
This isn't anchoring in the true sense, though. That's my point. It's like saying "my car costs £10000 and my house is £300 000 and my car goes everywhere so this house is a rip off". It's apples and oranges. People are being irrational about software purchases because MS etc. are cheap.
> It’s like those epi pens that went up in price right?
it isn't, because the price of the SaaS hasn't changed.
> Relative ROI is also important as we have to prioritize. GitHub seems a much better value to cost than this tool.
For something that costs tens of dollars per month, for a company that employs software developers, there is zero debate about whether they can afford it. Provided the return exceeds the cost, why on earth wouldn't you buy it?
This just seems like irrational behaviour caused by the fact that software has near zero marginal costs and most people don't understand that.
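A back-of-the-envelope version of this argument, with hypothetical numbers (the seat price is taken from the thread's quoted range; the developer cost and minutes saved are my assumptions, not vendor figures):

```python
# Rough per-seat ROI sketch for a developer tool.
# All numbers are assumptions for illustration only.
SEAT_COST_PER_MONTH = 21.0   # upper end of the quoted $4-21/user/month
DEV_COST_PER_HOUR = 75.0     # assumed fully loaded developer cost
MINUTES_SAVED_PER_DAY = 5    # "saves a few minutes"
WORKDAYS_PER_MONTH = 21

# Value of time saved per seat per month, vs. the subscription cost.
value = DEV_COST_PER_HOUR * (MINUTES_SAVED_PER_DAY / 60) * WORKDAYS_PER_MONTH
roi = value / SEAT_COST_PER_MONTH

print(f"monthly value per seat: ${value:.2f}")  # ≈ $131.25
print(f"value / cost ratio: {roi:.2f}x")        # ≈ 6.25x
```

Even with conservative inputs the ratio stays comfortably above 1, which is the point: the marginal seat cost is tiny next to developer time, so the "is it cheaper than GitHub?" comparison doesn't decide anything.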
Lastly, I am just curious why companies don't adopt the open-source tools that are used everywhere, so that skills can transfer easily.
2. Why doesn't Google open source them?
Considering the age of the articles and tools themselves, they probably predate everything you're using these days. All the GitHubs and other shiny SV startup things.
IIRC they also don't ever want to share code with other companies which makes most of the SaaS offerings a no-go for them.
Hence why it might not make sense to use the same tools as they do :)
For 1, I guess it also boils down to the mindset with which you approach a problem. At my past companies, when you had a problem, the first thing you did was look for an OSS tool or product that solved it for you; even when you didn't find one, you tried to reframe the problem to fit the solution you already had in mind. At Google it normally involves solving it yourself, or reusing much lower levels of abstraction.
This doesn't mean you will always need to create your own database, but when you really need to, you have the skills to do a decent job.
Often there's no OSS version yet.
Usually they have better performance.
They want cluster and multi-cluster services.
> 2. Why doesn't Google open source them?
But they release academic papers that OSS implements.
Because it benefits Google.
Because that would benefit the competition.
Google open-sources a lot: TensorFlow, Kubernetes, Apache Beam, ... . And even when something doesn't end up as a fully open-sourced project, Google still releases white papers on the subject that allow startups to create something similar (CockroachDB, for instance).
However, while I admit that some decisions might be made to avoid benefiting competition (I think; that kind of stuff is way above my pay grade), some things cannot be open-sourced for purely technical reasons (without a complete rewrite, that is). For instance, within Google everything is a protocol buffer, and tools rely on that assumption heavily to work. Outside Google people don't use protocol buffers nearly as much, so the usage of those tools would be very low.
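For a sense of what "everything is a protocol buffer" means in practice: a schema like the hypothetical one below (all names invented for illustration, not a real Google definition) becomes the lingua franca between services, and internal tooling assumes every payload it inspects can be decoded against such a schema.

```proto
// Hypothetical message definition -- illustrative only.
syntax = "proto3";

package example.monitoring;

// A single metric sample as a service might report it.
message MetricSample {
  string metric_name = 1;          // e.g. "/http/server/latency"
  int64 timestamp_micros = 2;      // sample time, microseconds since epoch
  double value = 3;                // the measured value
  map<string, string> labels = 4;  // arbitrary key/value dimensions
}
```

A tool built on this assumption can introspect, log, diff, and route any message generically; port it to a codebase where payloads are ad hoc JSON and most of that machinery has nothing to hold on to.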
Other tools are tied to Google having many datacenters and multiple fibers between each of them for redundancy. Like Spanner, which also requires atomic clocks to work properly.
Plenty of people outside Google use protocol buffers, for example; I've run into them at every job I've had since Google, and in plenty of strange places that probably never cross-pollinated with Google (the most surprising place I found them was in Hearthstone). They're pretty popular and people aren't really surprised to see them anymore.
I think there is also a middle ground where people inside Google don't think there's interest, but there is. For example, I very much miss Monarch. I don't think the code is making them a lot of money; my understanding is that the Cloud monitoring stuff is completely different. But it is way better than Prometheus or InfluxDB. Queries that are trivial in Monarch you simply can't do with those products. (The one thing I found most valuable in Monarch was that pretty much every query started with an "align" step. I just haven't seen that anywhere else, so it's hard for me to reason about what a query is actually doing.)
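Monarch's query language isn't public in detail, but the "align" idea, resampling irregular samples onto a fixed time grid before any cross-series arithmetic, can be sketched in plain Python. The function name and the last-value window semantics here are my assumptions for illustration, not Monarch's actual behavior:

```python
from bisect import bisect_right

def align(samples, period, start, end):
    """Resample (timestamp, value) pairs onto a fixed grid.

    For each grid point, take the most recent sample at or before it
    (simple last-value alignment; a real system offers mean/rate/etc.).
    Returns one value per grid point, or None if no sample exists yet.
    """
    samples = sorted(samples)
    times = [ts for ts, _ in samples]
    out = []
    point = start
    while point <= end:
        i = bisect_right(times, point)
        out.append(samples[i - 1][1] if i > 0 else None)
        point += period
    return out

# Two series sampled at irregular times: aligning them first makes
# point-wise arithmetic (e.g. an error ratio) well defined.
errors = [(1, 2.0), (7, 3.0), (12, 5.0)]
total = [(2, 10.0), (6, 20.0), (13, 40.0)]

e = align(errors, period=5, start=5, end=15)  # grid points 5, 10, 15
t = align(total, period=5, start=5, end=15)
ratio = [a / b for a, b in zip(e, t) if a is not None and b is not None]
```

Without the align step, dividing two series sampled at different timestamps is ambiguous; making alignment the explicit first stage of every query is what made the semantics easy to reason about.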
As other people mention, the mere task of picking the transitive closure of dependencies out of google3 is hard. In fact, maintaining a bunch of non-monorepos is a huge chore compared to monorepos once you have the right tool. It's thankless work, literally, so I can believe that's one reason why there aren't more internal Google tools open sourced. But, it can be done if there is some thanks for doing the work. When Google split into Alphabet, work was done to let companies leaving Google take their chunks with them. There just had to be some sort of business reason to justify the tedium.
Unless it's low-level infrastructure or based on a research paper, it's not an easy task.
> Because it benefits Google.
Developing internal tools does not always benefit Google. Sometimes there is just no alternative at the time it's needed, so Google has to develop something that might become a liability later, accumulating technical debt and stagnating compared to a newer open-source shiny thing. Sometimes it's NIH syndrome or, in other words, wariness of adopting external solutions that Google has no control over and that don't fit it very well. However, Google does have a healthy internal ecosystem with a clear product life cycle and a balance of planned and organic change.
> Because that would benefit the competition.
Usually Google benefits from its protocols/tools/projects being used out in the open (TensorFlow, gRPC, Kubernetes, Angular, Android, Chrome), and that includes competitors.
There are plenty more mundane reasons this does not happen:
- Some of the tools are so dependent on the internal ecosystem that it would make little sense to open-source them in a standalone way. Many things at Google exist in only a single deployment in the world, and turning them into a product deployable in another setting would be a huge task without a clear purpose. Also, it's hard to open-source operational knowledge and expertise.
Google Cloud is an example (positive or negative, depending on your point of view) of efforts to repackage many internal services in a way that is well-documented, supported and accessible to anybody.
- Open-sourcing is a spectrum: just throwing code over the wall (the worst option), controlling a project while allowing external contributors, cooperating with other big companies on a standard solution, or supporting a more hobbyist-oriented project in line with your own needs. Every option has its own set of challenges and coordination problems. It's not always easy to reconcile the internal development model with an open-source workflow. And it's not always easy to keep the delicate power balance of working with a project instead of taking it over through sheer engineering weight and influence.