
A Whole New Code Search - bencevans
https://github.com/blog/1381-a-whole-new-code-search
======
MattRogish
Interesting:
[https://github.com/search?p=4&q=gmail_password&ref=s...](https://github.com/search?p=4&q=gmail_password&ref=searchresults&type=Code)

[https://github.com/search?p=4&q=secret_token&ref=sea...](https://github.com/search?p=4&q=secret_token&ref=searchresults&type=Code)

~~~
bitops
Or SSH private keys.

[https://github.com/search?q=path%3A.ssh%2Fid_rsa&type=Co...](https://github.com/search?q=path%3A.ssh%2Fid_rsa&type=Code&ref=searchresults)

~~~
donretag
That is not the only disturbing part. An SSH private key by itself is not much of
a threat, but bundled together with known_hosts it is a recipe for disaster.

~~~
graywh
<https://github.com/gomachan/dotfiles/tree/master/.ssh>

~~~
enneff
At least now someone can push to his GitHub account to remove it for him. :-)

~~~
ibrahima
Not if he has a passphrase, right?

~~~
enneff
Correct.

------
obeattie
Thank goodness. This is the part of GitHub that has been driving me up the
wall for months. Google is pretty useless in this area when you're looking for
something buried _within_ a repo.

Fantastic job, it works beautifully. Congratulations (to GitHub and to Elastic
Search - I'm sure it's a big win for them too!)

------
ori_b
This just reminds me of Google code search and makes me miss it more.
Searching by regex was pretty useful.

~~~
tlrobinson
I was hoping for regex support, but I guess it's pretty tough without Google-
like scale.

Here's an interesting article describing how it worked:
<http://swtch.com/~rsc/regexp/regexp4.html>
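
For anyone who doesn't want to read the whole article, the core idea is a trigram index: record which documents contain each 3-character substring, use that to narrow down candidates, then run the real regex only on those. Below is a minimal sketch of that idea, not the actual Google Code Search implementation. The function names and sample files are made up for illustration, and as a simplification the caller supplies a literal that every match must contain, whereas the article derives the trigram query from the regex itself.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(docs):
    """Map each trigram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in trigrams(text):
            index[gram].add(doc_id)
    return index


def search(index, docs, literal, pattern):
    """Find documents matching pattern, given a literal every match contains.

    The index only narrows the candidate set; the regex is still run on
    each candidate to confirm the match.
    """
    grams = trigrams(literal)
    if grams:
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        # Literal shorter than 3 characters: the index cannot help.
        candidates = set(docs)
    regex = re.compile(pattern)
    return sorted(d for d in candidates if regex.search(docs[d]))


# Hypothetical sample corpus.
docs = {
    "a.go": "func formatArgs(args []string)",
    "b.go": "func format(args []string)",
    "c.py": "def parse(args):",
}
index = build_index(docs)
result = search(index, docs, "format(", r"format\(\w+")
print(result)
```

Here only b.go contains every trigram of `format(`, so it is the only file the regex has to be run against.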

------
xfax
Excellent feature. Thanks for making life a little better for a lot of us.

On a side note, I wonder how long before it'll be used to find security flaws
in code (that result in an exploit) - I bet there are hundreds of hard-coded
passwords, insecure defaults etc. all over the place.

------
aroman
Is anyone else impressed by how quickly and successfully* GitHub has been
rolling out new features over the past few months? I think almost every one of
their new features has in some way made my life a little easier.

Kudos to the whole team.

* Granted, uptime might have been a causality.

~~~
slashclee
I think you mean "casualty". :)

~~~
aroman
Hah, yep, good catch. Damn you [manual] spelling correct ;)

------
benmanns
This is pretty cool for finding local talent:
[https://github.com/search?l=Ruby&p=1&q=location%3A%2...](https://github.com/search?l=Ruby&p=1&q=location%3A%22Lynchburg%2C+Virginia%22+location%3A%22Lynchburg%2C+VA%22&ref=searchresults&type=Users)

------
oelmekki
Oh my. And suddenly, github gave grep to the web. Thanks for that great work.

~~~
dmit
Or rather, fgrep. But it's still a welcome feature, and there's
<https://code.google.com/p/codesearch/> for locally available code.

~~~
ConstantineXVI
`git grep` within repos as well

------
ghc
Just took it for a spin. The implementation is _fantastic_.

Bravo, Github!

------
boyter
Kudos to GitHub. I am running <http://searchco.de/>, so I know how hard this
problem actually is, and they have done a fantastic job of it.

~~~
simonz05
Nice! Impressed by the speed and quality of the results + features. Definitely
going to use this in the future. Also enjoyed your blog post on how to create
a search engine.

~~~
boyter
Thanks! It has been 2 years in the making to this point, straight after Google
Code Search closed and I was feeling the pain of its loss.

------
drhayes9
Github's just been killing it lately. Great start to the year.

------
vinhboy
Is there an "order by" functionality? When looking for code, I like to find
the code with the most stars.

Ugh.. that reminds me of the good old days when HN had visible karma points...

~~~
maxdemarzi
and order by last updated please

------
tazzy531
This is awesome. Searching GitHub code has been my way of learning idioms or
seeing how others have solved a similar problem. This will make it much
easier.

------
boundlessdreamz
Feature Request

Allow us to search only within our starred repositories. The current search
for starred repositories is not so great because it doesn't search the README
(or code but README is more useful in finding the repo I want)

------
PanMan
Any stats on how many documents or GB are in the index, how big the cluster is,
and how long this took to build?

~~~
pea53
For those interested:

1212672153 documents across 2866400 repositories taking up 17 TB of disk space
over 23 elasticsearch storage nodes fronted by 8 elasticsearch compute nodes

It took about a month to iterate over all the repositories stored on the file
servers and index the source code.
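
To put those figures in perspective, here is a quick back-of-the-envelope calculation. It is only a rough sketch: it assumes "17 TB" means decimal terabytes and that the data is spread evenly across the storage nodes, neither of which is stated above. The per-document figure is index size on disk, not source file size.

```python
# Figures quoted above.
documents = 1_212_672_153
repositories = 2_866_400
index_terabytes = 17
storage_nodes = 23

# Assumption: decimal terabytes (10**12 bytes).
index_bytes = index_terabytes * 10**12

docs_per_repo = documents / repositories
bytes_per_doc = index_bytes / documents
tb_per_node = index_terabytes / storage_nodes  # assumes an even spread

print(f"documents per repository: {docs_per_repo:.0f}")
print(f"index bytes per document: {bytes_per_doc:.0f}")
print(f"TB per storage node:      {tb_per_node:.2f}")
```

That works out to roughly 423 documents per repository, about 14 KB of index per document, and about 0.74 TB per storage node.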

------
ushi
Are there any plans to make this available through the API?

~~~
pea53
Yes. There will be a blog post when API access is available.

------
dexcs
Nice one! Elasticsearch is great software for that purpose. I guess github's
cluster is one of the bigger ones out there now...

~~~
donretag
It is ironic since one of GitHub's latest changes broke plugin
distribution for ElasticSearch.

------
proexploit
This is a great enhancement. It's been a problem finding quality repos on
Github as one of the key indicators for me was the "freshness" or time since
last commit. The previous interface did not make this easy to evaluate but it
looks like this new search has enough options to make my searching actually
useful.

------
eps
Nice. There are six C++ projects that use bool_t type.

~~~
unwind
And _two_ (one of which has two additional forks) C projects that reference
<stdbool.h>. Sadness!

------
robrenaud
> To ensure better relevancy, we're being conservative in what we add to the
> search index. Repository forks will not be searchable unless the fork has
> more stars than the parent repository, for example.

This has a grandfathering problem when the maintainers switch. The new active
branch of development is overshadowed by the previous branch. I've had someone
take over my project, but I still have 2 years of accumulated stars from when
the project was fresh. The new development has less than 1/10 the number of
stars as my branch. But I guess fixing this kind of corner case might be left
for v2.
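
Spelled out as code, the rule quoted above is just a comparison. This is a sketch of the rule as the announcement words it, not GitHub's actual indexing logic; the function name and the star counts are hypothetical.

```python
def fork_is_searchable(fork_stars: int, parent_stars: int) -> bool:
    """Rule as quoted: a fork is indexed only if it has more stars than its parent."""
    return fork_stars > parent_stars


# Hypothetical numbers for the situation described: the actively developed
# fork has less than 1/10 of the stars of the original repository.
original_stars = 500
active_fork_stars = 40

print(fork_is_searchable(active_fork_stars, original_stars))
```

With numbers like these the active fork stays out of the index until it overtakes the stale original.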

~~~
tedunangst
Have you considered renaming your fork? Like xyz-old? One of the things that
annoys (and sometimes infuriates) me about github is identifying the fork I
want to be using. A million old blog posts point to the wrong place. If that
repo were replaced with a new blank repo saying "moved to here" it'd be a big
help.

~~~
robrenaud
I did change the readme to point to the new mainline development. I still
think there might be some utility in having the code as I left it around, so I
didn't delete the repo.

~~~
plorkyeran
Deleting your repo and recreating it as a fork of the new upstream may be a
good idea, although it does break the links to all of the other repositories
forked from yours. It really would be nice if Github handled this case better.

~~~
dewski
Deleting and recreating your repo isn't necessary. If there's a problem with
your fork not showing up, let us know and we'll look into it.

------
adamnemecek
Now someone only needs to integrate this with the Facebook social graph
search.

~~~
arcatek
"People who like ruby and python and have write access in the php organization
repositories"

------
jasonkolb
This is sooooooo awesome. I have needed to search my private repos many times,
and descending to the command line requires too much googling for syntax for
my taste.

~~~
d0ugal
If you install ack [1] searching via the command line is really quite easy.

[1]: <http://betterthangrep.com/>

------
simonz05
Still feel this is missing the preciseness of former google code search. It
doesn't respect many of the symbols common in programming languages.

Here is an example. The query "format(args" will match "func formatArgs(args",
but not "func format(args".

[https://github.com/search?q=format%28args+repo%3Asimonz05%2F...](https://github.com/search?q=format%28args+repo%3Asimonz05%2Fgodis+path%3Aexp%2Fformat.go+&type=Code&ref=searchresults)

~~~
boyter
It's not as comprehensive as the github search (I don't have the same hooks
into the data they do) but this
<http://searchco.de/?q=format(args%20lang%3AGo> and
<http://searchco.de/?q=format(args%20repo%3Agodis%20lang%3AGo> respects your
query as you would expect.

------
axefrog
Where is the link to the search page? When I go to github.com and it displays
my dashboard, I can't find anything resembling a link to the search page.
There's the command bar, but it doesn't seem to provide code search unless you
click the advanced search link. I love the new features Github has developed
over time, but if it's a pain to find the feature, it's not going to see a lot
of use.

~~~
adamnemecek
<https://github.com/search> and <https://github.com/search/advanced>
respectively. But you can just use the dashboard search.

------
geetarista
I am crying tears of joy right now. Just. So happy.

------
GVRV
While this is excellent, most of my searching likely just revolves around
particular repositories. So while I'm addressing issues or writing comments, I
could make searches like `branch:feature/foobar OrderModel` or
`file:app/config.yml` and the like, returning blazingly fast results.

Is GitHub working on something similar? Or am I the only one who'd want search
for this purpose?

------
BinaryBullet
I just updated a userscript I created a while back so it works with the new
code search: <https://github.com/skratchdot/github-code-search.user.js/>

It's just a shortcut for searching the current repository you are viewing (by
adding a search box next to the tag count).

~~~
BillSaysThis
Your script would be awesome, something I don't understand why Github doesn't
provide natively. Unfortunately it isn't working for me that well.

Go to one of my org's repos: <http://github.com/railsforcharity/spokenvote>

and search this string (which is in our seeds.rb): users << User.create({name:
'Voter1', email: 'voter1@example.com', password: 'abc123',
password_confirmation: 'abc123'})

I not only don't get the right result, I don't get any text in the search
results area of the page, not even a nothing found message. Same thing if I do
the search on my personal fork.

~~~
BinaryBullet
Sorry for the delay. I was at work so couldn't respond right away.

All the userscript does is proxy the advanced search results page, so the
search you tried hits this page:

[https://github.com/search?type=Code&q=users+%3C%3C+User....](https://github.com/search?type=Code&q=users+%3C%3C+User.create%28{name%3A+%27Voter1%27%2C+email%3A+%27voter1%40example.com%27%2C+password%3A+%27abc123%27%2C+password_confirmation%3A+%27abc123%27}%29+repo%3Arailsforcharity%2Fspokenvote&p=1)

If you only search for voter1@example.com, results are shown:

[https://github.com/search?type=Code&q=voter1%40example.c...](https://github.com/search?type=Code&q=voter1%40example.com+repo%3Arailsforcharity%2Fspokenvote&p=1)

Anyways, you did find a "0 results" bug which should now be fixed. Thanks for
that!

~~~
BillSaysThis
Glad to point out someone else's mistake for a change ;)

------
juddlyon
Is it me or does it seem like every day either Github or Stripe is shipping some
sweet new feature?

I need to read less and code more.

~~~
adamnemecek
It's probably the fact that they ship a lot and HN is non-platonically in love
with both so new features get a lot of attention.

------
danso
Being able to filter by stars is great...One other filter that would be nice
is days-since-last-commit. I guess number of stars is a rough estimate of how
active a repo is, but as the years go on, there will probably be more and more
projects that are basically dead, yet still have a high number of stars.

~~~
rohanjon
Filters and sorting are definitely next on our priority. We will do another
blog post when that happens.

------
hcarvalhoalves
Awesome. I searched my email and was surprised to find my most shared code
is... a Vim theme! I would never have thought that.

[https://github.com/search?l=vim&p=1&q=hcarvalhoalves...](https://github.com/search?l=vim&p=1&q=hcarvalhoalves&ref=simplesearch&type=Code)

------
knewter
This is awesome, but I'm a little confused about the treatment of underscores.
If you search for something like "secret_key" (teehee) it will return results
for just 'secret' and just 'key' seemingly :-\ Not what I expect out of code
search, but easy enough to fix if it's deemed a bug.

~~~
knewter
It looks like I jumped the gun. The search itself seems fine, but the
highlighter is a bit overzealous

------
zippie
A different kind of use case: attribution.

In my case I used this to correct an improper attribution to me. I searched
for my name and came across:

<https://github.com/dks/TinyOAuth>

Except I didn't write OAuth.php. I wrote a different library.

~~~
rurounijones
Hah, fantastic!

I just did a search on my name and found a little project I had never heard of
that was based off a little thing I did.

Kind of ego boosting and nice to know that some things you do are actually
found useful by people.

------
systematical
A lot of potential for recruiter spam if someone just takes an hour or two to
write a tool.

------
zem
very nice. one bad setting (perhaps a default that never got overridden) is
that searching for foo_bar searches for foo and bar separately, rather than
treating foo_bar as an indivisible token.

~~~
ushi
Just search for '"foo_bar"' rather than 'foo_bar'.

~~~
zem
i know, but in the context of code search foo_bar is almost always a single
token, since the underscore is nigh-universally used in variable and function
names.
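
To illustrate the difference, here is a small sketch of two tokenizers, one that treats the underscore as a separator and one that treats it as part of a word. This is not GitHub's or Elasticsearch's actual analyzer; the function names and sample lines are made up.

```python
import re


def split_tokens(text):
    """Underscore acts as a separator, so foo_bar becomes two tokens."""
    return re.findall(r"[A-Za-z0-9]+", text)


def identifier_tokens(text):
    """Underscore is a word character, so foo_bar stays one token."""
    return re.findall(r"\w+", text)


def matches(query, line, tokenize):
    """True when every query token appears among the line's tokens."""
    line_tokens = set(tokenize(line))
    return all(tok in line_tokens for tok in tokenize(query))


# Hypothetical lines of source code.
line_a = "foo_bar = compute()"
line_b = "foo = bar + 1"

for tokenize in (split_tokens, identifier_tokens):
    print(tokenize.__name__,
          matches("foo_bar", line_a, tokenize),
          matches("foo_bar", line_b, tokenize))
```

With the splitting tokenizer the query foo_bar also hits the unrelated line that merely mentions foo and bar; with the identifier-aware one it only hits the line that actually uses foo_bar.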

------
sharjeel
It's pretty neat now. They forgot the "I'm feeling lucky" button though.

------
rkuykendall-com
I don't see a clear way to find projects in a size range. For example, between
300 and 500 stars.

Though I suppose you can just search for <500 and stop looking when you get to
300.

------
anonymouz
When I try the advanced search with "something stars:>500" it also seems to
match the string "500" in code for some reason.

Edit: And apparently "stars" for that matter.

------
amujumdar
Check out <http://code.ohloh.net/> as well -- you can search beyond just
Github.

------
huskyr
I really like the fact that i can finally see how many people are using my
library, a lot more than i thought :) Thanks Github!

------
fidz
I wonder how they combine the database and the repository/git data raw

------
ratherbefuddled
I'll be really pleased if this works in a usable manner; I'd pretty much given
up on search in github.

~~~
dlf
It seems better than usable. It's quite good.

------
bobcattr
Can you search wikis yet?

------
stroebjo
Facebook Graph Search, now GitHub Search, what's next?

------
dude3
FINALLY!!!!

------
doktrin


~~~
doktrin
Not sure what this phantom comment is. I must have inadvertently left it while
browsing on my phone with the iOS app. Would delete if I could.

