
Github search sucks (and how it could be better) - RenaudWasTaken
https://github.com/isaacs/github/issues/908
======
rhelmer
I use the Mozilla DXR project (or the searchfox.org fork) every day, which is
pretty great for code.

Not only can it quickly search across large codebases, it parses JS/Python and
the output of clang (for C/C++) to allow quickly finding the definitions of
functions, declarations of variables, and so on (try hovering over a variable
for instance):

[https://dxr.mozilla.org/mozilla-central/source/browser/compo...](https://dxr.mozilla.org/mozilla-central/source/browser/components/about/AboutRedirector.cpp)
[https://dxr.mozilla.org/mozilla-central/source/toolkit/mozap...](https://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/extensions/AddonManager.jsm)

Nothing that many mainstream IDEs can't do, but having it on the web and being
able to quickly link people and not requiring local setup helps tremendously
to get people up to speed quickly.

~~~
ketralnis
I might just be searching for the wrong terms, but am I understanding that
this is specific to searching in the mozilla codebase? Is there a version that
can work on arbitrary codebases?

------
danpelota
I would love to be able to search all code for a string and then either (1)
sort the resulting repositories by stars/forks; or (2) limit the results to
repositories with >X stars/forks. When learning a new framework or library I
like to find popular projects that use it and read the code to get a sense of
conventions, architecture, etc. For instance, it'd be fantastic to find all
repositories with over 20 stars containing a *.py file with "import flask" or
"from flask" in them.

~~~
hbt
all files ending with py with "import flask"

[https://github.com/search?q=filename%3A%2A.py+%22import+flas...](https://github.com/search?q=filename%3A%2A.py+%22import+flask%22+&type=Code&utf8=%E2%9C%93)

use advanced search to specify stars:>20 etc.
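
A query URL like the one above can also be assembled programmatically. This is a rough sketch using only the Python standard library; the helper name `build_search_url` and its parameters are made up for illustration, and it only builds the URL. Whether the search backend honours every qualifier for a given result type is a separate question.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_search_url(phrase, filename_glob=None, min_stars=None, search_type="Code"):
    """Assemble a github.com/search URL from individual qualifiers.

    Hypothetical helper: it mirrors the qualifiers used in this thread
    (filename:, a quoted phrase, stars:>N) and nothing more.
    """
    terms = []
    if filename_glob:
        terms.append(f"filename:{filename_glob}")
    terms.append(f'"{phrase}"')
    if min_stars is not None:
        terms.append(f"stars:>{min_stars}")
    # urlencode percent-escapes ':', '*', '"' and '>' and turns spaces into '+'
    query = urlencode({"q": " ".join(terms), "type": search_type})
    return f"https://github.com/search?{query}"


url = build_search_url("import flask", filename_glob="*.py", min_stars=20)
print(url)
# Decode the query string again to check the round trip
print(parse_qs(urlsplit(url).query)["q"][0])
```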

~~~
danpelota
Unfortunately, you can search _code_ by file extension and phrase, and you can
use advanced search to search for repository _descriptions_ filtering by
stars, but I don't believe you can do both at once.

For instance, searching for "flask" and limiting the results to >1000 stars
returns only the 27 repositories with a matching description[0], but the code
search returns over 4 million results, ignoring the stars parameter[1].

[0]
[https://github.com/search?l=&q=flask+stars%3A%3E1000&ref=adv...](https://github.com/search?l=&q=flask+stars%3A%3E1000&ref=advsearch&type=Repositories&utf8=%E2%9C%93)

[1]
[https://github.com/search?l=&q=flask+stars%3A%3E1000&ref=adv...](https://github.com/search?l=&q=flask+stars%3A%3E1000&ref=advsearch&type=Code&utf8=%E2%9C%93)

~~~
chatmasta
How would you build the search results UI for a grouped query like this? If
one repository has 10k stars and has 1000 files with matching strings, should
the first 1000 results be from the same repository?
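
One possible answer, sketched in Python: collapse file-level hits into one entry per repository, rank repositories by stars, and show only a handful of matching paths plus a total count. Everything here is hypothetical (the `Hit` record, the `per_repo` cap, the sample repositories); it illustrates the grouping idea and says nothing about how GitHub actually ranks results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    """One matching file (hypothetical record shape)."""
    repo: str
    stars: int
    path: str


def group_hits(hits, min_stars=0, per_repo=3):
    """Return one row per repository: (repo, stars, total_matches, sample_paths)."""
    paths_by_repo = defaultdict(list)
    stars_by_repo = {}
    for hit in hits:
        if hit.stars > min_stars:
            paths_by_repo[hit.repo].append(hit.path)
            stars_by_repo[hit.repo] = hit.stars
    # Most-starred repositories first
    ranked = sorted(paths_by_repo, key=lambda repo: stars_by_repo[repo], reverse=True)
    return [
        (repo, stars_by_repo[repo], len(paths_by_repo[repo]), paths_by_repo[repo][:per_repo])
        for repo in ranked
    ]


# Made-up data: one large repository with 1000 matching files, two small ones
hits = [Hit("big/framework", 10000, f"src/mod{i}.py") for i in range(1000)]
hits += [Hit("small/app", 50, "app.py"), Hit("tiny/demo", 5, "demo.py")]

for repo, stars, total, sample in group_hits(hits, min_stars=20):
    print(f"{repo} ({stars} stars, {total} matching files): {sample}")
```

With this shape the 10k-star repository takes up one row instead of the first 1000, and the per-repository count can link to the full file list.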

------
jowiar
The (poor) choice of title here is the difference between "Here are some
things that would make my life easier" and "I'm an entitled prick".

~~~
nto
+1

------
londons_explore
GitHub should try to compile code. Where that succeeds, it will give them full
type information for every variable, and information on every function call,
just like a good IDE has when doing code completion.

With that info, they would be able to build an awesome search system.

~~~
applecrazy
That costs a LOT of money to do (infra is $$$) and you also have the issue of
sandboxing code.

It would be hard and expensive with not much benefit to their bottom line.

~~~
MaulingMonkey
GitLab manages to do CI (partnering with DigitalOcean), which managed to get
me at least partially switched, and has further potential for upsell (have
more CI servers! with exotic configurations!)

Not all languages that could significantly benefit from pulling out type info
even compile. A much smaller-scoped option would be to make GitHub's search
aware of, and able to consume, some kind of IntelliSense-esque database or
structured documentation format that any CI process could output. (Of course,
someone needs to write the tools to generate said output in the first
place...)
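
To give a feel for what such a CI-generated index might contain, here is a toy sketch for Python sources using the standard `ast` module. The JSON field names (`name`, `kind`, `file`, `line`) are invented for this example and do not follow any existing format, and it records definitions only, with no type information.

```python
import ast
import json


def index_source(source, filename):
    """Collect function and class definitions with their line numbers."""
    tree = ast.parse(source, filename=filename)
    entries = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"  # methods are reported as functions too
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue
        entries.append(
            {"name": node.name, "kind": kind, "file": filename, "line": node.lineno}
        )
    # ast.walk does not guarantee source order, so sort by line number
    return sorted(entries, key=lambda entry: entry["line"])


SAMPLE = '''\
class Greeter:
    def greet(self, name):
        return "hello " + name

def main():
    return Greeter().greet("world")
'''

index = index_source(SAMPLE, "sample.py")
# A CI job could write this JSON next to the build artifacts
print(json.dumps(index, indent=2))
```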

