GitHub Code Search (Preview) (cs.github.com)
153 points by judge2020 87 days ago | 78 comments

Sourcegraph CEO here. We make a popular code search/intelligence product that partially competes with this (referenced many times in the comments here). Right now, most devs don't use code search or know why it's great. GitHub's new code search is introducing many more devs to code search, and that is helping us.

Devs win when Sourcegraph and GitHub compete. And yes, Microsoft/GitHub will compete fiercely with us--and we'll compete back. Devs should cheer on the competition, rather than let Microsoft/GitHub get complacent.

There are 2 situations where Sourcegraph shines:

1. When you use something many times daily, then you need/deserve the best. We're 100% focused on being 100x better: speed, precise/cross-repo code nav, comprehensive result sets, all branches/commits, more code metadata, all code hosts (GitLab+Bitbucket+any others), way more languages, batch changes, insights, etc. GitHub's code search will be a good place to start if you're new to code search, but Sourcegraph is and will continue to be better for devs who rely on code search frequently, which in my experience is most devs once they get hooked on code search. :) (If we aren't much better than GitHub code search, we don't deserve to exist. Please give us direct feedback on how we can be better--the more competition Microsoft/GitHub feels, the better for everyone!)

2. For companies whose needs differ from what GitHub's core userbase wants, such as self-hosted deployment, single-tenant/managed-instance cloud deployment, massive codebases, numerous repositories, many languages, many code hosts, custom integrations, heavy API usage, etc.

Whether you use Sourcegraph or GitHub, we'll keep improving Sourcegraph to keep up the competitive pressure on GitHub to continue improving their code search. Everyone wins. :)

Also, regarding Microsoft/GitHub code search's distribution advantage for companies whose code is already all on GitHub.com:

1. That means our product needs to be even better, and we're up for that challenge! Devs thankfully have a lot of choice over what tools they use, and a better product that devs demand will thrive even in the face of a Microsoft/GitHub distribution advantage.

2. Sourcegraph has plenty of advantages, too: we're 100% focused on this (vs. it just being one of Microsoft/GitHub's many features), our customers are extremely sophisticated eng teams with heavy usage who help us see and build the future of code search, we address a much larger market (not just companies with code exclusively on GitHub.com), etc.

3. Not all the world's code is on GitHub.com. Far from it! And thank goodness. GitLab, Bitbucket, and tons of other code hosts exist, which means that Microsoft/GitHub doesn't have a monopoly. Let's keep it that way.

4. To the extent Microsoft/GitHub has a distribution advantage for code on GitHub.com, that makes it all the more important to keep the competitive pressure on Microsoft/GitHub so they don't get complacent and exploit their advantage in one area (code hosting) to dominate another area. So even if you're using GitHub code search today, root for more competition!

It's interesting that you bring up "more code metadata" and "self-hosted deployment" as strengths of Sourcegraph. I work for a company that licenses the self-hosted enterprise SG. We have had bugs open for months due to code metadata issues crippling the usefulness of the tool. Specifically, deleted repositories are not getting de-indexed and archived repositories are not getting indexed as such. This makes search results extremely noisy and in some cases outright incorrect. This forces our teams to create extensive "search contexts" in order to find relevant results. I use this tool daily and advocate for more usage internally, which speaks to its usefulness, but it's painful to teach new users all the caveats of poor data quality.

I'm really sorry about those problems. Can you please email me at sqs@sourcegraph.com and let me know which company you're at? I will see why these haven't been fixed yet and figure out how we can get them resolved for your company ASAP.

edit: The person emailed me (thank you!), and we will get this resolved ASAP.

@sqs: Quinn, is it common to leave enterprise customers' bugs open for months? Why does it take social media escalation to gain traction?

I say this as a big fan of SG, because issues like this ring like a big red alarm bell to me.

The GitHub monopoly stranglehold over OSS needs to be broken, not further enriched. You are one of our only hopes..

No, this is neither common nor acceptable.

Just passing by, but I’d love to read the story of what happened with this user’s bug report. This looks like a serious usability problem for those who delete or re-create repositories often, and I guess that it wasn’t prioritised correctly or went into dev/null due to pilot’s error. So where exactly did this report get stuck?

+1, good idea @COMMENT___.

I remember that back in the day there was code.google.com (or something like that), which allowed searching across repositories. I was assigned to create a Firefox add-on that included a native C++ part, and it was a daunting task because there was little documentation. This tool was indispensable for me: it let me quickly look up usages of any API. I wouldn't have been able to complete that project without it.

Since Google closed that project, I have never used anything similar, though I think I should. I tried the ordinary GitHub search a few times, but it wasn't good enough to be useful. Hopefully this project will change that. Done properly, it would be a very powerful tool.

What's not clear to me is whether this tool searches only GitHub or all repositories on the Internet. Hopefully it's the latter.

We still use that tool internally at Google, and we do have some of our open source projects browsable like:




for more Google related projects

Do you know https://grep.app? :)

So I can get a better sense of how to use this tool, if I wanted to see examples of say a 'Users' collection API from multiple Django projects, how would I do so? Looks like I can do keyword matches and language specific filtering but would love tips beyond that!

Awesome! I sometimes search across very niche projects on GitHub, where their code search doesn't show the results I want. I KNOW that the result exists and can see it in the repository, but the search still doesn't return it.

This is one of my most used tools. Love using it!

Probably former

for the latter, use https://sourcegraph.com

I've been using it for a couple months and it's kind of a weird experience. If it works you can feed it regex and stuff which is nice. But sometimes it will just tell you that it doesn't know anything about the repository, or it will miss results entirely that the normal GitHub search (bad as it is) finds easily. And the website is this weird div soup that doesn't really let you copy or anything, which makes it kind of frustrating to work with.

I've been in the alpha for about six months now and I love this - I use it several times a day.

It's become my default option for API documentation: given basically any API I want to use, this gets me real-world examples of it in seconds - often far more useful than the official documentation.

GitHub code search has been terrible since forever. They keep braying about preview stuff, AI search, etc. but when will this make it into their mainstream (let alone their on-prem) Enterprise product? (This has been in preview since December last year, but https://github.blog/2018-09-18-towards-natural-language-sema... was posted back in 2018 yet is still to be released.)

I agree that semantic code search would be a nice tool in exploratory and onboarding scenarios. The GitHub blog you posted and the CodeSearchNet project was a catalyst that got me interested in semantic code search. That's why I've been hacking on https://codesearch.ai in my spare time at Sourcegraph, which implements a version of semantic code search w/ transformers.

I've tried it for several months and compared the results to https://grep.app (I normally use it many times a day) and while GitHub cs has way more indexed repositories, the results of grep.app are still way better and more helpful to me.

I wish they would have some concept of "reputable" projects, which would be shown more prominently. For example, if I'm trying to search all of GitHub for a certain language API or library reference, I would rather get code examples from large actively maintained repositories first instead of 10 forks of the same long forgotten repo.

So, stars ?

Probably also other measures like pageviews, artifact downloads, etc. that aren't publicly exposed.

See also:

• searchcode: https://searchcode.com/

• Debian Code Search: http://codesearch.debian.net/

• Python Code Examples: https://www.programcreek.com/python/

• Linux kernel code search: https://livegrep.com/search/linux


C and C++ code search: https://codesearch.isocpp.org/ (uses Debian as source)

Seems very similar to https://sourcegraph.com/search, except that github will be limited to, well, github only.

I've been working on building plugins on top of JupyterLab and their documentation leaves much to be desired. I usually have 15 tabs open on their public GitHub repositories learning how to use their own API. GitHub does a decent job of parsing symbols (class names, etc.), so you can double-click on those to see the dependencies and references of some symbols, which is quite helpful. Of course this feature breaks down when code is split across multiple repositories.

Also one thing that has been missing from all these code search engines is tracking how a variable might traverse through the code. In cybersec this is called taint analysis. But for my purposes it greatly helps with just program understanding.

My .02 is whatever GitHub does here should be directly integrated into the main GitHub UI.

For the other commenters who are wondering: in their announcement [1] they said that this is a beta trial, and once they are ready it will be integrated into the main GitHub search.

[1] https://github.blog/2021-12-08-improving-github-code-search/

Discussed on HN:

https://news.ycombinator.com/item?id=29487237 (217 comments)

I know they’ve said it’ll be integrated into the main GitHub site, but the way it’s pitched on the landing page makes me wonder if they’ll roll it into an individual subscription like the Copilot product.

I use this a fair bit in daily work but really it's an absolute gold mine when learning a new language or framework etc.

Currently learning Rust and I use it multiple times a day to learn patterns of usage for different libraries and language features. Documentation examples may not be that great and with code search you can find actual use cases.

Absolute game changer for me.

I legit appreciate you sharing this tip because I'm learning rust too and many times, documentation just doesn't cut it.

I'm one of the engineers at GitHub that built this - happy to answer questions (or fast-track your access into the preview!)

Hey, I’m the developer of a popular tool for searching all of public GitHub for sensitive information (GitHub.com/tillson/git-hound) … would love to get access to play with it and see how it can improve secret detection. github.com/tillson

I use GitHub search all the time to read and understand code on my phone, this thing will work on the web UI, right?

I would be delighted to get fast-tracked and get access to better search. github.com/fiatjaf

Would definitely appreciate being fast tracked!


That’d be awesome, would love to move off sourcegraph! github.com/bencooper222

What's wrong with sourcegraph?

The domain is a bit annoyingly long but other than that..?

How well does it work with C++ code?

Very nicely, they have a ton of high quality language integrations! Try it out and you'll discover how good it is.

Interesting because on https://lsif.dev/ I see that LSIF support for C++, which basically is just a wrapper around clangd AFAIU, is deprecated. Is there something else that replaced it?

I've been using the preview for a few months, it works pretty well. That said, one thing I miss from the regular code search feature is the ability to use a form to filter search results, or to refine results using suggested filters after an initial search.

The regular search results page usually has a few pre-set choices that you can quickly click on to refine the results, like a list of languages found. Code Search supports many such filters[1], and even supporting suggestions or completion in the search box would be a good improvement. With the new UI once you're on the search results page the examples of the home page are no longer visible, so I never know if I should use lang:py or language:python, or if it's path:*.txt or filetype:txt.

[1] https://cs.github.com/about/syntax

having used the preview for a while, when searching across all repos, it's like the standard search box you don't have to sign up for, except occasionally less liable to give you 50 matches in a row on identical files across forks

it has a few Modern Webdev™ quirks, like when you ^f for something on a result page, it'll respond to your “selection” (read: ^f highlight) with a popup prompting you to try that selection as another query, which of course will pop in and out of existence as you cycle through ^f results and i could go on but ugh

the public sourcegraph instance at sourcegraph dot com is honestly more useful. pro tip: `repo:^github\.com/.*` works as a filter. unfortunately since it's not a first-class citizen of github with access to whatever internal indexing they have, it's slower, but slow and useful is better than fast and infuriating
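For anyone unfamiliar with the filter in that tip: it is a regular expression anchored at the start of the repository name. A minimal sketch of what such a pattern keeps and drops, assuming made-up repository names and using Python's `re` purely as a stand-in for whatever regex engine the search backend actually runs:

```python
import re

# Pattern quoted in the comment above; ^ anchors it at the start of the
# repository name and \. matches a literal dot.
REPO_FILTER = re.compile(r"^github\.com/.*")

# Hypothetical repository names, for illustration only.
repos = [
    "github.com/example/project",
    "gitlab.com/example/project",
    "mirror.example.org/github.com/example/project",
]


def matching_repos(names):
    """Return the names that the repo filter would keep."""
    return [name for name in names if REPO_FILTER.search(name)]


print(matching_repos(repos))
```

Only the first name survives: the second is a different host, and the third contains `github.com/` but does not start with it.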

How does this compare with something like https://sourcegraph.com?

Will this require a subscription like Github Copilot?

I guess there are 2 options:

  * surveillance capitalism: tracking and ads

  * paying

No idea where Microsoft will drive it, but they are not here for the common good.

The third option is that they are adding features to a paid product. I already pay a lot for enterprise GitHub and the search is really bad. This will help me not leave GitHub and motivate more people to sign up and use it.

You omitted the most likely scenario (3):

    * all of the above

Or both of course (see Windows)

Running this kind of stuff costs serious money.

Of course they must see a way to make money with it in the long term.

Like they did with GitHub itself: as a data source for CoPilot ;)

Code search code.google.com (I believe) existed years ago. Obviously Google did not see a way to monetize it and killed it.

Wonder what this will mean for Sourcegraph and other code search engines as a player like GitHub enters the space.

Sourcegraph wisely chased the enterprise market, so you use it to index your private repositories. Plus it can do other things, such as finding all definitions and modifying them across multiple repositories. They don’t really exist in the same space, at least not yet anyway.

Also there is no money to be made in public code search. Take my word for it. However, it is something that every code hosting service should probably have as a useful tool, even if it’s not a direct revenue stream.

I mean… it feels obvious that this will index private repositories for users authenticated to access those repositories?

I don’t think this will be a separate paid product, just a standard feature as part of paying for GitHub. Sourcegraph will be in the same position as every other SaaS vendor that tries to compete with Microsoft: a bad one, with a worse go-to-market strategy than MSFT.

Sourcegraph CEO here. I posted my thoughts on the positioning. Code search for Microsoft/GitHub is not a winner-take-all proposition unlike many of their other products (eg Microsoft Teams vs. Slack), so the analogy is a bit different. A lot of code lives outside of GitHub, thanks to developers historically having more freedom and choice. See https://news.ycombinator.com/item?id=31968184. Would love your thoughts on this.

Just a heads up, you can search Github here too:



If you search something that is also used in a popular opensource project, github codesearch will then show the same exact hit in a zillion forks, completely drowning out the interesting hits.

Is this somehow fixed in the new codesearch?
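Nothing in this thread says how the new index handles forks. As a rough illustration of one way the problem could be tackled (an assumption, not GitHub's method), here is a sketch that collapses hits with byte-identical file content and keeps the copy from the most-starred repository. The `Hit` type, repository names, and star counts are all hypothetical:

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Hit:
    repo: str      # hypothetical "owner/name" of the repository
    path: str      # file path inside the repository
    content: str   # full text of the matched file
    stars: int     # star count, used here as a crude reputation signal


def dedupe_hits(hits):
    """Keep one hit per identical file content, preferring the most-starred repo."""
    best = {}
    for hit in hits:
        key = hashlib.sha256(hit.content.encode("utf-8")).hexdigest()
        if key not in best or hit.stars > best[key].stars:
            best[key] = hit
    # Most-starred repositories first.
    return sorted(best.values(), key=lambda h: h.stars, reverse=True)


hits = [
    Hit("someone/fork-a", "src/api.c", "int f(void) { return 1; }", 2),
    Hit("upstream/project", "src/api.c", "int f(void) { return 1; }", 900),
    Hit("other/tool", "lib/api.c", "int f(void) { return 2; }", 40),
]

for hit in dedupe_hits(hits):
    print(hit.repo, hit.path, hit.stars)
```

Exact hashing only catches untouched forks; forks that have drifted by even one byte would need some form of near-duplicate detection instead.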

Code search SEO here we come!

Ha ha this is going to end up with paid advertisements formatted as code comments in underfunded, but popular OS projects. Oh god I really hope that’s not going to happen.

I don’t understand why I would ever need to do this?

If I need to replace or find something across all repos in my organisation, or god forbid, Github, then something went extremely wrong architecturally.

In my experience it's great for:

1. When your code is too big for your IDE's built in global search.

2. When you want to share searches with others.

#2 is great for design docs. You can say "<link> shows the bug is common in API usages" or "<link> shows x% of our files do this"

Edit 3. When something does go wrong.

> If I need to replace or find something across all repos in my organisation, or god forbid, Github, then something went extremely wrong architecturally.

If your plan B for messing something up is to not have a plan B you're going to have a bad time when plan A fails :)

Unfortunately, hope is not a strategy.

This is very useful to see examples of how people have used APIs that are either poorly documented or not at all. Or even that are well documented, really. Going from docs to code is not always straightforward.

To give you just one example, recently I've been using it to find how people have written code to interface with e-ink displays. I usually have the datasheet which lists all the commands the protocol supports, but building it all into a valid startup sequence of ~20 commands to activate the display is left as an exercise for the reader.

So the docs will look like this: https://www.waveshare.com/w/upload/6/6a/4.2inch-e-paper-spec...

And what I need is a sequence like this: https://github.com/joeycastillo/The-Open-Book/blob/5c5054c58...

Hard disagree. A very common pattern we face is that we bugfix an internal npm package and then have to roll out that change to a large subset of the 200 microservices that consume the package. Our go-to right now is Sourcegraph search to figure out what the subset is.
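That rollout question boils down to "which services list this package as a dependency?". As a rough local stand-in for such a search (not how Sourcegraph answers it), here is a sketch that assumes every service is checked out as a sibling directory with its own `package.json`. The package name `@acme/logger` and the service names are invented:

```python
import json
import tempfile
from pathlib import Path

# package.json sections in which a dependency may be declared.
DEP_SECTIONS = ("dependencies", "devDependencies", "peerDependencies")


def consumers_of(package_name, root):
    """Return service directories under root whose package.json lists package_name."""
    found = []
    for manifest in sorted(Path(root).glob("*/package.json")):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or malformed manifests
        if not isinstance(data, dict):
            continue
        if any(package_name in (data.get(s) or {}) for s in DEP_SECTIONS):
            found.append(manifest.parent.name)
    return found


def demo():
    """Build three fake services in a temp dir and find the consumers."""
    services = {
        "billing": {"dependencies": {"@acme/logger": "^1.2.0"}},
        "search": {"dependencies": {"express": "^4.18.0"}},
        "web": {"devDependencies": {"@acme/logger": "^1.1.0"}},
    }
    with tempfile.TemporaryDirectory() as root:
        for name, manifest in services.items():
            service_dir = Path(root) / name
            service_dir.mkdir()
            (service_dir / "package.json").write_text(
                json.dumps(manifest), encoding="utf-8"
            )
        return consumers_of("@acme/logger", root)


print(demo())
```

This only sees direct dependencies in top-level manifests; transitive consumers or monorepo workspaces would need a lockfile-aware approach, which is part of why a hosted search across all repositories is attractive.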

Like fixing a recently discovered severe security issue affecting a large number of repositories?

That would be terrible indeed. But it might happen. Good to have a powerful tool at hand in that case.

I use the current search to answer stuff like “what projects are using GitHub actions” by searching for the action yamls and look forward to this new tool working better.

I have found it handy on my work repo to search code.

As for searching all public repo code I am not sure I see the need unless maybe you have a super-specific error code to search for.

I e.g. often check various code search tools to see whether it's worth maintaining a backward compat layer for APIs I'm in the process of changing.

Searching stuff in std libs/runtimes/kernel, etc?

It’s honestly incredible. I would say it's as helpful as Copilot.

Looking forward to all the integrations with code editors etc. (if they provide APIs).

Is it open source?

This is GitHub we're talking about, a Microsoft company. Not open source friendly on the inside.


Off-topic question.

How do I get Safari to look like it does in the screenshot, without the Red / Yellow / Green buttons? The title bar also seems to be thinner.

Or is that a different browser?

It's obviously not a real screenshot, since the codepoint overlay is a separate image.

Can this search be used to find the code github is laundering through copilot?
