A Whole New Code Search (github.com/blog)
366 points by bencevans on Jan 23, 2013 | 109 comments

That is not the only disturbing part. An SSH private key by itself is not much of a threat, but bundled together with known_hosts it is a recipe for disaster.

At least now someone can push to his GitHub account to remove it for him. :-)

Not if he has a passphrase, right?


Why would people put dotfiles like ssh keys up on public github?

This kind of thing is best suited for a private repo (github is still ok, just make it private) - cause it's most likely of no use to anyone but that single user.

I would not suggest that it's okay even for a private repo. Never let your private keys leave your machine or its dedicated, encrypted backup.

Although I would never do this myself, if the keys themselves are encrypted with a password and then uploaded, it's not nearly as bad.

In the case of ssh keys, you usually should use a different key per device/home directory and let your server accept all the keys.

And that was just the one on the first page of results I got.

Nothing new about that, you didn't need github's improved search to do it.


Or Bitcoin RPC password!

A hacker-with-a-heart-of-gold will write a script to harvest these emails and send them a warning message with a link to this thread.

The one time those spammy GitHub bots could be put to good use

Most of those are examples or dummy data. Most of the messages this bot sends will be annoying and unwanted.

That is terrifying, I just logged in with three separate accounts and they worked. Obviously I logged out without fucking around with anything; why mess with somebody's professional work.

This is dangerous. But then again, is it Github's responsibility to keep these people from shooting themselves in the foot?

> is it Github's responsibility to keep these people from shooting themselves in the foot?


Actually I'd like the presence of a "Report fool user" button just after "Report user"

Out of the ones I tried fb_secret seemed to have the most real results.


Someone is interested in what you and I have to say: https://github.com/ruggeri/hn-local-copy

Found via Github search

I'm one of the students at App Academy (which Ned Ruggeri co-founded). The reason for that repo is that one of today's tasks was to create a version of HN in our terminals. HN was blocking people due to repeated requests, so Ned made a local version of HN for students to use.


That's Ned Ruggeri, co-founder of App Academy (http://www.appacademy.io/). His HN account: http://news.ycombinator.com/user?id=ruggeri

I think github should keep an active list of filters that they apply to all code submitted to their service.

Such as when it is a key file, or is a known credential file -- "amazon_s3.yml" for example, they should send a warning to the committer.

And then show a big red flag on the website if the repo is public.

And of course, remove the results from search.

I know it's not github's responsibility, but it would help make the web a bit safer.
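A minimal sketch of the kind of filename filter proposed above. The pattern list and the function name `flag_sensitive_paths` are hypothetical, chosen only for illustration; nothing here reflects how GitHub actually handles this.

```python
import fnmatch
import posixpath

# Hypothetical patterns; a real list would need careful curation.
SENSITIVE_PATTERNS = [
    "id_rsa",
    "id_dsa",
    "*.pem",
    "amazon_s3.yml",
]


def flag_sensitive_paths(paths):
    """Return the paths whose file name matches a sensitive pattern."""
    flagged = []
    for path in paths:
        # Match on the file name only, not on the directory part.
        name = posixpath.basename(path)
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in SENSITIVE_PATTERNS):
            flagged.append(path)
    return flagged
```

A push hook could run something like this over the committed paths and warn the committer. Matching on names alone would miss secrets pasted into ordinary source files, so it is a first line of defence at best.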

It took me all of two seconds to think of the same thing, too. Here's hoping this is a big net win for parameterized security tokens.

I found out about that a short time ago while crawling github with Nuuton. A lot of people don't seem to be security aware. This is one of those things that search allows you to have fun with (by fun I mean be surprised, and by with I mean to only look and not use). You should see the stuff to be found on facebook.

real "crypto_key" is pretty widespread as well :(

Thank goodness. This is the part of GitHub that has been driving me up the wall for months. Google is pretty useless in this area when you're looking for something buried within a repo.

Fantastic job, it works beautifully. Congratulations (to GitHub and to Elastic Search - I'm sure it's a big win for them too!)

This just reminds me of Google code search and makes me miss it more. Searching by regex was pretty useful.

I was hoping for regex support, but I guess it's pretty tough without Google-like scale.

Here's an interesting article describing how it worked: http://swtch.com/~rsc/regexp/regexp4.html
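For anyone who doesn't want to read the whole article: the core idea is a trigram index. The sketch below is a heavily simplified version that handles only literal substrings, whereas the article converts full regular expressions into trigram queries. All function names are made up for this example.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(docs):
    """Map each trigram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in trigrams(text):
            index[gram].add(doc_id)
    return index


def search_literal(index, docs, literal):
    """Narrow to candidate docs via trigrams, then verify with a real match."""
    grams = trigrams(literal)
    if not grams:
        # Queries shorter than 3 characters cannot use the index: scan everything.
        candidates = set(docs)
    else:
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    pattern = re.compile(re.escape(literal))
    return sorted(d for d in candidates if pattern.search(docs[d]))
```

The index only prunes the set of files to look at. The final `pattern.search` is still needed, because a file can contain every trigram of the query without containing the query itself.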

It still exists at https://code.google.com/codesearch though it no longer searches everything, only those repositories hosted on googlecode.com itself.

Excellent feature. Thanks for making life a little better for a lot of us.

On a side note, I wonder how long before it'll be used to find security flaws in code (that result in an exploit) - I bet there are hundreds of hard-coded passwords, insecure defaults etc. all over the place.

Is anyone else impressed by how quickly and successfully* GitHub has been rolling out new features over the past few months? I think almost every one of their new features has in some way made my life a little easier.

Kudos to the whole team.

* Granted, uptime might have been a causality.

I think you mean "casualty". :)

Hah, yep, good catch. Damn you [manual] spelling correct ;)

This is pretty cool for finding local talent: https://github.com/search?l=Ruby&p=1&q=location%3A%2...

Oh my. And suddenly, github gave grep to the web. Thanks for that great work.

Here's.. a grep for the web, though: https://blekko.com/webgrep

Or rather, fgrep. But it's still a welcome feature, and there's https://code.google.com/p/codesearch/ for locally available code.

`git grep` within repos as well

Just took it for a spin. The implementation is fantastic.

Bravo, Github!

Kudos to GitHub. I am running http://searchco.de/, so I know how hard this problem actually is, and they have done a fantastic job of it.

Nice! Impressed by the speed and quality of the results, plus the features. Definitely going to use this in the future. Also enjoyed your blog post on how to create a search engine.

Thanks! It has been 2 years in the making to this point, straight after Google Code Search closed and I was feeling the pain of its loss.

Github's just been killing it lately. Great start to the year.

Is there an "order by" functionality? When looking for code, I like to find the code with the most stars.

Ugh.. that reminds me of the good old days when HN had visible karma points...

and order by last updated please

This is awesome. Searching GitHub code has been my way of learning idioms or seeing how others have solved a similar problem. This will make it much easier.

Feature Request

Allow us to search only within our starred repositories. The current search for starred repositories is not so great because it doesn't search the README (or code but README is more useful in finding the repo I want)

Any stats on how many documents or GB are in the index, how big the cluster is, and how long this took to build?

For those interested:

1,212,672,153 documents across 2,866,400 repositories, taking up 17 TB of disk space over 23 Elasticsearch storage nodes, fronted by 8 Elasticsearch compute nodes.

It took about a month to iterate over all the repositories stored on the file servers and index the source code.
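Some back-of-the-envelope averages from those figures, as a rough sketch. It assumes decimal terabytes and an even spread across nodes, and the 17 TB may include replicas and index overhead, so treat the per-document size as an upper bound on average source size.

```python
documents = 1_212_672_153
repositories = 2_866_400
disk_tb = 17
storage_nodes = 23

# Average number of indexed documents per repository.
docs_per_repo = documents / repositories

# Average disk used per storage node, in TB.
tb_per_storage_node = disk_tb / storage_nodes

# Average index footprint per document, in KB (assumes 1 TB = 10**12 bytes).
kb_per_document = disk_tb * 10**12 / documents / 1000
```

That works out to roughly 423 documents per repository, about 0.74 TB per storage node, and around 14 KB of index per document.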

Are there any plans to make this available through the API?

Yes. There will be a blog post when API access is available.

+1 I'm anxiously awaiting the answer to this question too.

Nice one! Elasticsearch is great software for that purpose. I guess github's cluster is one of the bigger ones out there now...

It is ironic, since one of GitHub's latest changes broke plugin distribution for ElasticSearch.

This is a great enhancement. It's been a problem finding quality repos on Github as one of the key indicators for me was the "freshness" or time since last commit. The previous interface did not make this easy to evaluate but it looks like this new search has enough options to make my searching actually useful.

Nice. There are six C++ projects that use bool_t type.

And two (one of which has two additional forks) C projects that reference <stdbool.h>. Sadness!

> To ensure better relevancy, we're being conservative in what we add to the search index. Repository forks will not be searchable unless the fork has more stars than the parent repository, for example.

This has a grandfathering problem when the maintainers switch. The new active branch of development is overshadowed by the previous branch. I've had someone take over my project, but I still have 2 years of accumulated stars from when the project was fresh. The new development has less than 1/10 the number of stars of my branch. But I guess fixing this kind of corner case might be left for v2.
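To make the corner case concrete, here is a sketch of the rule as quoted from the blog post. The `Repo` class, the `is_searchable` function, and the star counts are all hypothetical; this is not GitHub's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Repo:
    name: str
    stars: int
    parent: Optional["Repo"] = None  # None means the repo is not a fork


def is_searchable(repo):
    """Apply the quoted rule: forks are indexed only if they out-star their parent."""
    if repo.parent is None:
        return True
    return repo.stars > repo.parent.stars


# Hypothetical numbers matching the situation described above.
old_mainline = Repo("original-author/project", stars=500)
new_mainline = Repo("new-maintainer/project", stars=50, parent=old_mainline)
```

Under that rule `is_searchable(new_mainline)` is false even though it is where development now happens, while the stale original stays in the index.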

Have you considered renaming your fork? Like xyz-old? One of the things that annoys (and sometimes infuriates) me about github is identifying the fork I want to be using. A million old blog posts point to the wrong place. If that repo were replaced with a new blank repo saying "moved to here" it'd be a big help.

I did change the readme to point to the new mainline development. I still think there might be some utility in having the code as I left it around, so I didn't delete the repo.

Deleting your repo and recreating it as a fork of the new upstream may be a good idea, although it does break the links to all of the other repositories forked from yours. It really would be nice if Github handled this case better.

Deleting and recreating your repo isn't necessary. If there's a problem with your fork not showing up, let us know and we'll look into it.

Didn't you transfer the ownership of the repository?

Now someone only needs to integrate this with the Facebook social graph search.

"People who like ruby and python and have write access in the php organization repositories"

This is sooooooo awesome. I have needed to search my private repos many times, and descending to the command line requires too much googling for syntax for my taste.

If you install ack [1] searching via the command line is really quite easy.

[1]: http://betterthangrep.com/

Still feel this is missing the preciseness of former google code search. It doesn't respect many of the symbols common in programming languages.

Here is an example. The query "format(args" will match "func formatArgs(args", but not "func format(args".


It's not as comprehensive as the github search (I don't have the same hooks into the data they do) but this http://searchco.de/?q=format(args%20lang%3AGo and http://searchco.de/?q=format(args%20repo%3Agodis%20lang%3AGo respects your query as you would expect.

Where is the link to the search page? When I go to github.com and it displays my dashboard, I can't find anything resembling a link to the search page. There's the command bar, but it doesn't seem to provide code search unless you click the advanced search link. I love the new features Github has developed over time, but if it's a pain to find the feature, it's not going to see a lot of use.

https://github.com/search and https://github.com/search/advanced respectively. But you can just use the dashboard search.

I am crying tears of joy right now. Just. So happy.

While this is excellent, most of my search likely just revolves around particular repositories. So while I'm addressing issues or writing comments, I can make searches like `branch:feature/foobar OrderModel` or `file:app/config.yml` and the like returning blazingly fast results.

Is GitHub working on something similar? Or am I the only one who'd want search for this purpose?

I just updated a userscript I created a while back so it works with the new code search: https://github.com/skratchdot/github-code-search.user.js/

It's just a shortcut for searching the current repository you are viewing (by adding a search box next to the tag count).

Your script would be awesome; I don't understand why GitHub doesn't provide something like it natively. Unfortunately it isn't working that well for me.

Go to one of my org's repos: http://github.com/railsforcharity/spokenvote

and search this string (which is in our seeds.rb): users << User.create({name: 'Voter1', email: 'voter1@example.com', password: 'abc123', password_confirmation: 'abc123'})

I not only don't get the right result, I don't get any text in the search results area of the page, not even a nothing found message. Same thing if I do the search on my personal fork.

Sorry for the delay. I was at work so couldn't respond right away.

All the userscript does is proxy the advanced search results page, so the search you tried hits this page:


If you only search for voter1@example.com, results are shown:


Anyways, you did find a "0 results" bug which should now be fixed. Thanks for that!

Glad to point out someone else's mistake for a change ;)

Is it me, or does it seem like every day either GitHub or Stripe is shipping some sweet new feature?

I need to read less and code more.

It's probably the fact that they ship a lot and HN is non-platonically in love with both so new features get a lot of attention.

Being able to filter by stars is great. One other filter that would be nice is days-since-last-commit. I guess the number of stars is a rough estimate of how active a repo is, but as the years go on, there will probably be more and more projects that are basically dead, yet still have a high number of stars.

Filters and sorting are definitely next on our priority. We will do another blog post when that happens.

Awesome. I searched my email and was surprised to find my most shared code is... a Vim theme! I would never have thought that.


This is awesome, but I'm a little confused about the treatment of underscores. If you search for something like "secret_key" (teehee) it will return results for just 'secret' and just 'key' seemingly :-\ Not what I expect out of code search, but easy enough to fix if it's deemed a bug.

It looks like I jumped the gun. The search itself seems fine, but the highlighter is a bit overzealous

Surround in quotes and it won't split on the underscore.

A different kind of use case: attribution.

In my case I used this to correct an improper attribution to me, I searched for my name and came across:


Except I didn't write OAuth.php. I wrote a different library.

Hah, fantastic!

I just did a search on my name and found a little project I had never heard of that was based off a little thing I did.

Kind of ego boosting and nice to know that some things you do are actually found useful by people.

A lot of potential for recruiter spam if someone just takes an hour or two to write a tool.

very nice. one bad setting (perhaps a default that never got overridden) is that searching for foo_bar searches for foo and bar separately, rather than treating foo_bar as an indivisible token.

Just search for '"foo_bar"' rather than 'foo_bar'.

i know, but in the context of code search foo_bar is almost always a single token, since the underscore is nigh-universally used in variable and function names.
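The difference being argued about can be sketched with two toy tokenizers. These are illustrations only, not the analyzer GitHub actually uses; both function names are invented.

```python
import re


def split_on_non_alnum(text):
    """Treat every non-alphanumeric character, underscore included, as a separator."""
    return [token for token in re.split(r"[^A-Za-z0-9]+", text) if token]


def split_keep_identifiers(text):
    """Keep identifier-like runs, underscores included, as single tokens."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text)
```

With the first tokenizer, `foo_bar` is indexed as `foo` and `bar`, which gives the behaviour described above. The second keeps `foo_bar` whole, at the cost of no longer matching a search for just `foo`.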

It's pretty neat now. They forgot the "I'm feeling lucky" button though.

I don't see a clear way to find projects in a size range. For example, between 300 and 500 stars.

Though I suppose you can just search for <500 and stop looking when you get to 300.

When I try the advanced search with "something stars:>500" it also seems to match the string "500" in code for some reason.

Edit: And apparently "stars" for that matter.

Checkout http://code.ohloh.net/ as well -- you can search beyond just Github.

I really like the fact that I can finally see how many people are using my library, a lot more than I thought :) Thanks Github!

I wonder how they combine the database with the raw repository/git data.

I'll be really pleased if this works in a usable manner, I'd very much given up on search in github.

It seems better than usable. It's quite good.

Can you search wikis yet?

Facebook Graph Search, now GitHub Search, what's next?


Not sure what this phantom comment is. I must have inadvertently left it while browsing on my phone with the iOS app. Would delete if I could.
