
Google Open Source Code Search - habosa
https://cs.opensource.google/
======
kannmig
Seems like this is based on Google's own internal code search tooling,
something most engineers at Google rely on for everyday code-level work. I
personally can't even begin to imagine how I'd navigate the gigantic codebase
without it.

(I work at Google)

~~~
arunaugustine
Would you know if the Google cloud product for hosting git projects [1] uses
the same underlying code search as the internal tool?

[1] [https://cloud.google.com/source-repositories](https://cloud.google.com/source-repositories)

~~~
hanwenn
It is the same tool.

~~~
snazz
It’s also used for [https://source.chromium.org](https://source.chromium.org).
I now host my monorepo on Cloud Source Repositories because it has a super
nice integration with the rest of their products.

------
coderdd
Shameless plug:
[https://github.com/TreeTide/underhood](https://github.com/TreeTide/underhood)
is a work in progress UI over Kythe indices (the same indices that power Code
Search).

If you already know how to index, this is a completely open source
alternative, likely with fewer bells and whistles.

I worked at Google and miss Code Search. But I also have lots of ideas about
how one can go beyond the status quo for code reading and debugging. Join if
interested.

------
vikinghckr
Googler here. We have the same Code Search tool internally; this is honestly
one of my favorite things about working at Google. Great to see this open
sourced.

~~~
cromwellian
This is missing tons of functionality and layers that the internal one has
tho, like all of the automatic code analysis and linting, coverage and fuzzing
integration, etc

~~~
xster
How could it though? Without (bazel) BUILD files, it couldn't even know how to
build everything.

------
malkia
Really loved this interface (also cs.chromium.org) while I worked at Google.
It was easy for me to orient myself, find what uses this and that, where it's
being used, and then it had a whole "debugging" facility:

You select your binary on borg (think kubernetes/docker), and it'll read from
the binary which CL (think perforce changelist) it was built at, plus any
additional cherry-picked CLs, then it'll somehow go back in time and show how
the source code looked then.

Later (I tried it in Java, but I believe it's available for other languages
too), you can inject statements right around the beginning of a function (a
kind of breakpoint), and that statement can be something like "log how this
function was called"; you were able to reference nearby statements. This could
be set from the command line and took a bit of mastery (I was a bit afraid the
first time I used it, or rather it had a chilling effect on me), but then my
task (with 10 or 11 instances) reported these log lines, and I was able to see
them in the browser.

(I have no experience with GCP, or the public face of Google Cloud, so I don't
know what's available there), but this was freakin cool.

------
phillco
Misread as "Google Open Sources Code Search" :'(

cs.opensource.google is amazing

~~~
escardin
There's this
[https://github.com/google/zoekt](https://github.com/google/zoekt). It's
pretty light on features, but dang if it isn't fast and precise.

~~~
hanwenn
thanks!

If you install sourcegraph, you get the same btw. Sourcegraph indexed search
is powered by zoekt.

~~~
aksx
zoekt is absolutely awesome, I was reading its code yesterday to figure out
how it does ngram indexing and search.
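
For readers unfamiliar with the idea, here is a minimal sketch of n-gram (trigram) indexing in general. It is not zoekt's actual code or data layout; the `TrigramIndex` class and its method names are made up for illustration. Each document is broken into 3-character substrings, a query is broken up the same way, and only documents containing every query trigram are checked with a real substring match.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Hypothetical toy index: trigram -> ids of documents that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> set of document ids
        self.docs = {}                    # document id -> full text

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for gram in trigrams(text):
            self.postings[gram].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing query as a substring."""
        grams = trigrams(query)
        if not grams:
            # Queries shorter than 3 characters cannot use the index; scan everything.
            candidates = set(self.docs)
        else:
            # A document can match only if it contains every trigram of the query.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        # Trigram presence is necessary but not sufficient, so verify each candidate.
        return sorted(d for d in candidates if query in self.docs[d])
```

The final verification step matters: a document can contain all of a query's trigrams in different places without containing the query itself.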

------
ptman
Check out Debian Code Search
[https://codesearch.debian.net/](https://codesearch.debian.net/)

------
snek
Working with chromium/v8, I can honestly say google's code search infra is one
of the most valuable resources available. I really hope they open source the
backend at some point.

~~~
coderdd
The backend is open sourced, it is Kythe.io. It supports go, c++, java out of
the box, for some definition of out of the box. Maybe even typescript.
Cross-references between protobufs and generated code also work, if you make
the stars align ;)

As for UI, treetide/underhood I mention elsewhere is the only open option now.

But Kythe comes with command line utils and an API you can query directly as
well.

What is missing from the open source is a production-ready parallel serving
table builder. There is one in golang which uses Apache Beam, but last time I
checked the go workers are not well supported on the Flink runner. It didn't
even work properly on the GCP runner. Hope this would change.

------
typon
Question for Googlers or others: What do you think is the most well-written
piece of software produced by Google? I would like to study how the world's
best engineers write code. (Preferably C++, as it's the language I'm most
familiar with)

~~~
thedance
The one that people at Google who are the keepers of C++ code quality
standards maintain themselves is Abseil.

[https://cs.opensource.google/kythe/kythe/+/master:external/c...](https://cs.opensource.google/kythe/kythe/+/master:external/com_google_absl/absl/)

~~~
alexhutcheson
By necessity, Abseil is full of dark template magic that would very rarely be
used elsewhere in the codebase. That's the point - it encapsulates a lot of
useful abstractions and allows them to be used without the client code author
thinking about the guts of the abstraction. But it makes it pretty unusual
relative to typical Google C++.

~~~
thedance
True for much of it, but if you look at something like cord.h, it's almost
free of template programs. Google C++ application code isn't all that spiffy,
to be honest. I would say most of the code is dedicated to stuff that nobody
outside of Google is going to care about. I think the base libraries are more
interesting.

------
robinshen
Glad to see google open sourced this. I also implemented code search in my
open source project OneDev
([https://github.com/theonedev/onedev](https://github.com/theonedev/onedev)).
To try it, please visit
[https://code.onedev.io/projects/android-framework-base/](https://code.onedev.io/projects/android-framework-base/). Press "t" for
quick symbol search, and "v" for advanced search with regular expression
support.

You may also hover the mouse over a symbol to find its declaration and
occurrences.

------
Hitton
Nice. I'm grateful for this being posted on HN, because discoverability of
that page seems to be zero (I couldn't find any link to it from
opensource.google). It doesn't even have a page title, so googling it would be
more complicated too.

------
ComputerGuru
Google used to have code search before (or at least far better than) anyone
else did. Then they killed it.

~~~
zerd
I misread and thought it said Google Open Sources Code Search, hoping they open
sourced
[https://en.wikipedia.org/wiki/Google_Code_Search](https://en.wikipedia.org/wiki/Google_Code_Search)

------
kediz
Code search is a great tool! It really helps with productivity! But sometimes
it is very easy to go down the rabbit hole.

I wonder if they can open source code search itself.

~~~
Eridrus
I don't know the state of it, but Kythe is open source:
[https://kythe.io/](https://kythe.io/)

But in reality you probably want something more like SourceGraph which
packages everything up nicely so that you don't need to worry about it, or
something more specialized.

~~~
thedance
Bringing up kythe from scratch is very daunting. First thing I did when I left
google, of course, but still really hard.

~~~
__float
What did you use for a UI?

~~~
thedance
That's one of the main problems. I started with the demo frontend that comes
in the box and just hacked on that. I'm no UI developer by any stretch.

------
dragonsh
If you want to do code search on your private git, mercurial, svn, cvs and
other repositories, try the fully open source opengrok
([https://github.com/oracle/opengrok](https://github.com/oracle/opengrok)).

It’s easy to self-install and use, has good documentation, and as an added
bonus is very fast.

------
gitgud
Sorry to be critical, but can someone explain to me the benefits of having
code search this powerful?

Surely code is easier to explore in an IDE which understands the _context_ and
_dependencies_ of the project... This just seems like a glorified "find"

~~~
marcyb5st
If you think about Google's codebase size, an IDE wouldn't cut it. You could
load and analyze dependencies/imports as you go, but that would make for a
terrible user experience (think of IntelliJ running its indexing task every
time you want to check the definition of something).

Also, Code Search has a lot of goodies baked in: history layer,
cross-references, call sites, ... and it's snappy. Moreover, it is really well
integrated with all the other internal tools used for coverage, code analysis,
issue tracking, web text editing, ...

I think an IDE (like IntelliJ IDEA) can't reach that level of integration with
several other systems unless you fully buy into the ecosystem a company like
JetBrains offers (their issue tracker, their code review tool, ...).

So, summarizing, it's a tool made by Googlers for Googlers' needs and it's
amazing using it every day for all the above reasons.

------
vkaku
This name is misleading. The domain is okay.

It should probably be 'Google Code Search'. I would have expected Google to
come up with a search engine for all Open Source code otherwise.

------
oscargrouch
Nice to see GN there. I wish more people knew about it.

For me it is as powerful as Bazel, but without the need for a JVM and all the
insanity that comes with it in a desktop/dev environment.

The syntax is great and powerful (insane customization), and together with
Ninja there's nothing like it.

It's in C++, and even though it is as powerful as Bazel, it's a light,
standalone library that can handle a huge amount of source code, dependencies,
tools and configurations.

~~~
londons_explore
Having tried to battle GN configs... I don't agree.

I was working on a big source tree and got frustrated that it kept rebuilding
files that hadn't changed just because I switched git branches to look at one
file, and then suddenly "Yay, another 18 hour full rebuild!".

I tried to fix it and found there is no option to ignore file timestamps, and
some guy has tried to patch it to do that[1]... But the patch requires putting
an option in GN files which seems to break them wherever I put it... I tried
to patch GN, but it wouldn't ever seem to pass that option through... Ended up
patching Ninja to always have the option on, but then random other operations
broke (like simple file copies).

A day wasted, and problem not solved. Maybe my use case isn't common, or a bad
workman blames his tools, but for me at least it wasn't a nice experience.

[1]: [https://github.com/ninja-build/ninja/issues/1459](https://github.com/ninja-build/ninja/issues/1459)
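
To illustrate why a branch switch can trigger rebuilds of unchanged files, here is a rough sketch contrasting a timestamp-based staleness check with a content-hash check. This is not Ninja's or GN's actual logic (a real build tool's check is more involved); the function names `stale_by_mtime` and `stale_by_content` are hypothetical, and the "branch switch" is simulated by bumping the file's modification time without changing its bytes.

```python
import hashlib
import os
import tempfile


def content_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def stale_by_mtime(path, recorded_mtime_ns):
    """Timestamp rule: rebuild whenever the mtime differs from the recorded one."""
    return os.stat(path).st_mtime_ns != recorded_mtime_ns


def stale_by_content(path, recorded_digest):
    """Content rule: rebuild only when the file's bytes actually changed."""
    return content_digest(path) != recorded_digest


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "file.cc")
    with open(src, "w") as f:
        f.write("int main() { return 0; }\n")

    # State recorded at the end of the last successful build.
    recorded_mtime_ns = os.stat(src).st_mtime_ns
    recorded_digest = content_digest(src)

    # Simulate switching branches and back: same bytes, new modification time.
    bumped = recorded_mtime_ns + 10**9
    os.utime(src, ns=(bumped, bumped))

    mtime_says_rebuild = stale_by_mtime(src, recorded_mtime_ns)
    content_says_rebuild = stale_by_content(src, recorded_digest)
    print(mtime_says_rebuild, content_says_rebuild)
```

The trade-off is that hashing every input costs I/O on each build, which is one reason timestamp checks remain common.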

------
Omnipresent
sidebar question - Anyone know how they've made the interaction/animation on
this page [1]? Feel like it is a great way to show a lot of info in a concise
way.

[1] -
[https://opensource.google/projects/explore/featured](https://opensource.google/projects/explore/featured)

~~~
evere
Agreed, it is a very nice little interaction! It seems like they're animating
the bubbles around a circle while randomly fluctuating the speed and radius at
which they rotate. Clicking on a bubble centers it by setting the rotation
radius to `0` and expanding the size.

Would be interested to know how they expand the bubbles as your cursor moves
closer.
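
A rough sketch of the motion described above, with only the geometry and no rendering. This is a guess at the technique, not the page's actual code: the `Bubble` class, the jitter ranges and the size factor are all invented for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Bubble:
    angle: float         # current angle on the orbit, in radians
    base_radius: float   # nominal orbit radius; 0 puts the bubble at the center
    base_speed: float    # nominal angular speed, in radians per second
    size: float = 20.0   # drawn diameter (hypothetical units)

    def step(self, dt, rng):
        """Advance the bubble by dt seconds and return its (x, y) position."""
        # Randomly fluctuate speed and radius around their nominal values.
        speed = self.base_speed * rng.uniform(0.8, 1.2)
        radius = self.base_radius * rng.uniform(0.95, 1.05)
        self.angle = (self.angle + speed * dt) % (2 * math.pi)
        return radius * math.cos(self.angle), radius * math.sin(self.angle)

    def select(self):
        """Center the bubble (orbit radius 0) and enlarge it, as on click."""
        self.base_radius = 0.0
        self.size *= 3
```

Calling `step` once per animation frame gives the orbiting motion; after `select`, every position collapses to the center because the radius is multiplied by zero. A real implementation would presumably smooth the fluctuation over time rather than re-rolling it every frame.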

------
repomono
We are building similar experiences for internal repos.

Demo:
[https://demo.repomono.com/cs/view.php](https://demo.repomono.com/cs/view.php)

Code is here: [https://github.com/repomono/cs](https://github.com/repomono/cs)

------
gerash
It looks like they haven't integrated the kythe cross reference DB as the
symbols aren't clickable

~~~
kwh5336
A few projects have configured kythe for at least one language. See bazel, go,
gvisor, kythe, and tensorflow.

------
excerionsforte
First impression is that it enables discoverability of code across the open
sourced Google projects, but trying to find this page even on Google search is
not a thing yet. Is that intentional?

------
rerx
This is very useful to read and search TensorFlow source code. It definitely
beats Github for me.

------
revertts
How strange - OpenDNS/Cisco Umbrella seems to flag the domain and gives me a
403 Forbidden.

------
enitihas
Does anyone know the pros and cons of this vs something like Elastic search?

------
MichaelMoser123
so far it doesn't seem to index a lot of stuff. I searched for some terms out
of my kubernetes/openshift dependencies and it didn't find them. Is this
correct?

------
dunk010
Is this a frontend for Grok - the thing that Steve Yegge built?

------
whadar
For Java and JavaScript there's also codota.com

------
CydeWeys
Nice, my team's project is on there! (Nomulus)

------
sqs
Sourcegraph CEO here. This is the same underlying code search offered for a
while by Google Cloud Source Repositories for private code, and it’s cool to
see this usable for Google’s own open-source code, too.

If you want to get universal code search for your own (private) code on
any/all code hosts, Sourcegraph is easy to set up internally (self-hosted
Docker install) at
[https://docs.sourcegraph.com/](https://docs.sourcegraph.com/). Or you can get
code search for all OSS projects at
[https://sourcegraph.com/search](https://sourcegraph.com/search). More general
info at [https://about.sourcegraph.com](https://about.sourcegraph.com).

Lots of Xooglers and current Googlers use Sourcegraph, too. Just mentioning
Sourcegraph because I’ve seen several other folks mention us in the comments
(thanks!).

~~~
monadic2
What’s the code licensed under? It’s not clear from your site at first glance.

~~~
sqs
It's open core (Apache 2 + some non-OSS parts for enterprise features). All of
the code is public and we develop in the open at
[https://github.com/sourcegraph/sourcegraph](https://github.com/sourcegraph/sourcegraph).

~~~
dragonsh
Sourcegraph only supports git repositories, so it's not very useful for
enterprises with mercurial, svn or other version control systems.

There is another open source application for code search, opengrok [1] (it's
completely open source, unlike sourcegraph, and supports multiple version
control systems besides git).

Take a look. It's easy to install and operate on bare metal, cloud and
containers, instead of the convoluted sourcegraph way of kubernetes or docker.

[1] [https://github.com/oracle/opengrok](https://github.com/oracle/opengrok)

~~~
aurelianito
You always can bridge to git from svn and mercurial. It is almost seamless,
and after generating the git repository everything will work.

~~~
dragonsh
Many organizations don’t use or want to use git. This is another convoluted
solution, trying to fit a square peg in a round hole.

Another reason not to use sourcegraph is that it’s proprietary (with some open
source parts), unlike opengrok, which is fully open source.

~~~
judge2020
Is the sourcegraph "open core" unlike how redis is "open core", eg. The main
code is open but there are paid, closed-source modules and extensions?

~~~
sqs
Sourcegraph is open core like how GitLab and VS Code are open core. You can
run "Sourcegraph OSS" and get limited features, or you can run Sourcegraph
(see
[https://docs.sourcegraph.com/#quickstart](https://docs.sourcegraph.com/#quickstart))
and get all the features, but you need a license key when you hit the user
limit.

------
duckmysick
I really hate that some of the elements on the page are translated into a
different language, seemingly based on my IP. When did it become acceptable to
ignore my browser or my system language settings? The same thing happens on
other Google services (like Google Groups), but I noticed this trend on other
websites too.

~~~
scruffyherder
This annoys me to no end. I live in Hong Kong. I speak English. We have 2
official languages here, one of which is English. I travel frequently to Japan
as well, with infrequent trips to either Europe or North America.

My 'preferences' and settings are a total disaster. I end up having to go onto
the gray market to buy gift cards and prepaid credit cards as I seemingly
never can buy stuff online when I want to, as I'm either in the wrong place,
or in the wrong language. But I know I'm still me.

What is with this '100% of people in this location read/speak the same?'

What if I want to learn Russian, but I'm in China? Why can't I just tell my
computer to show me Russian, and have the browser tell the site "give me
Russian if you have it"?

Why is this so hard?

I really dislike things that try to make it easy for me, as all they do is
prevent me from being able to function.

~~~
LeifCarrotson
The problem is that every big website operator wants to make it work correctly
for you, and they (1) have different definitions for success and (2) assume
you're incompetent.

In the first point, there's someone with a requirements document that assumes
every country has one official language and everyone in that country speaks
that language, and so feels successful and internationalization-ready when a
geo-IP served page is automatically switched to the "correct" language (much
like "Falsehoods programmers believe about names").

Second, configuring a computer's locale to set a browser's request headers
correctly is beyond the technical expertise of many users. It would be better
if things were consistent, but at the point where some locales were set
incorrectly and some were set intentionally to something unusual, your
analytics would have shown that you improved the situation on average by
guessing the locale (screwing over users who knew how to use their computer)
rather than by respecting it and eventually getting everyone to understand how
to set their desired language.

~~~
minusf
if you install a hungarian firefox, the accept language header will reflect
this (or it did when i tried it last time). Non-expert users also often choose
software in their own language version. I don't have numbers but i wouldn't be
surprised if a lot of browsers were sending correct accept language headers.

i don't know IE but it was in a very good position to guess the language of the
user as well.
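
For context, a minimal sketch of how a server could honor the `Accept-Language` request header instead of guessing from the IP address. The helper names `parse_accept_language` and `choose_language` are hypothetical, and the parsing is simplified: it handles only the common `lang;q=weight` form and ignores wildcards.

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into (language, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        lang, _, params = part.partition(";")
        q = 1.0  # a language without an explicit weight defaults to q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((lang.strip().lower(), q))
    # sorted() is stable, so equal weights keep the order the browser sent.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose_language(header, supported, default="en"):
    """Return the first supported language the user asked for, else default."""
    for lang, q in parse_accept_language(header):
        if q <= 0:
            continue
        if lang in supported:
            return lang
        base = lang.split("-")[0]  # fall back from e.g. "hu-hu" to "hu"
        if base in supported:
            return base
    return default
```

With this, a user in China whose browser sends `ru, en;q=0.5` would get Russian whenever the site has it, regardless of location.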

------
rochak
I hope I get to work for Google sometime.

~~~
vikinghckr
Just apply. Google is actively hiring all year round.

~~~
rochak
I am a grad student right now with 2 years of industry experience. Google
still prefers people who are extremely good at Data Structures and Algorithms.
I like doing them, but not so much to just grind them for the sake of getting
into Google. I like to learn how to design big systems and grinding Data
Structures and Algorithms seems like a waste of time.

~~~
CydeWeys
I put in "only" 40 hours of refreshing on data structures and algorithms, and
doing some practice coding problems, in the weeks leading up to my interview.
And I got the job.

Frankly, it's been the best hourly return on investment of anything I've done
in my life up to this point, by far. Assuming I wouldn't have gotten the job
otherwise (which seems reasonable), _each_ of those hours spent studying has
proven to be worth several tens of thousands of dollars. I'm not exaggerating;
I just did the math.

Maybe the interviewing process is broken or sub-optimal or whatever, but it is
what it is, and if you can get through it by doing some additional studying,
then it's absolutely worth it. Google is a good place to work on designing big
systems, so if that's your interest, consider just putting in the work.

~~~
rochak
This is solid advice. Thanks! I will try to dedicate a portion of my day to
brushing up on Data Structures and Algorithms and maybe, eventually, I will get
good enough to crack the interview.

------
jonathanoliver
I'm just gonna leave this here:
[https://killedbygoogle.com/](https://killedbygoogle.com/)

~~~
mav3rick
How is this contributing to the discussion? Do you feel cool now after a
snarky comment?

~~~
carapace
Do you?

~~~
mav3rick
I am pointing out what he did. And yes I do feel cool for pointing out
something egregious.

