If anyone from GitHub is listening, being able to exclude test code with a few clicks would be an absolute game-changer. Test code is by far the biggest source of noise in my GH code search results, and I use the tool (and similar tools like Searchfox) super heavily. Either way, stoked to try this out.
Thanks for the feedback! We downrank test files with a heuristic, though we'll definitely be looking to make this more sophisticated. You can also exclude results using a regular expression, like `foo NOT path:/_test\.go$/`.
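To make the semantics of that qualifier concrete, here is a minimal local sketch in Python (not GitHub's implementation, which filters server-side): it applies the same regex to a hypothetical list of result paths and keeps only the paths that do not match.

```python
import re

# Same pattern as in the qualifier `NOT path:/_test\.go$/`:
# a path ending in "_test.go".
TEST_FILE = re.compile(r"_test\.go$")


def exclude_tests(paths):
    """Return only the paths that do NOT match the test-file pattern."""
    return [p for p in paths if not TEST_FILE.search(p)]


# Hypothetical result paths, for illustration only.
results = ["server/handler.go", "server/handler_test.go", "cmd/main.go"]
print(exclude_tests(results))  # ['server/handler.go', 'cmd/main.go']
```

The `$` anchor matters: without it, a path like `a_test.go.bak` would also be excluded.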
And also note that if you often need to add this kind of qualifier to many searches, you can create a "custom scope" that includes it for you transparently.
> Search for an exact string, with support for substring matches and special characters, or use regular expressions (enclosed in / separators).
Finally!
Search-for-literal is so important when you have technical users working on non-prose text.
They say this is going in a dedicated search page 'to start with'. If "<literally any text>" doesn't eventually work in the top bar, this is still going to be miserable.
I'm from the team that developed this at GitHub - if you are in the technology preview, then you can jump into cs.github.com from searches done at the top bar.
I use GitHub's UI for exploring and searching codebases more often than my own environment, since I do a lot of curious browsing.
No offense, but the search is so bad for anything more complex than a single word that I've developed a sort of intuition for how to phrase things, and then I still spend a lot of time crawling through pages of results haha.
Hi Abdallah. If you haven't signed up yet, you can do so here: https://github.com/features/code-search/signup. People on the waitlist are getting access as quickly as the team can support it.
What's your take on developing a new code search instead of partnering with an existing global code graph like Sourcegraph? What are the advantages of GitHub Code Search over Sourcegraph?
Well, in the past I've tried Sourcegraph several times, but it never gave me an experience that matched Google Code Search, which died many years ago. I wish the new GitHub code search would do that.
Hey Edwin, if you're open to providing feedback, I'd love to understand which types of searches worked well for you in Google Code Search but not in Sourcegraph. We've invested a lot of thought into our query syntax, supporting literal matches, regex, and the Comby pattern matching syntax with a rich set of keywords and filters—but we know the syntax isn't always intuitive for every user. We're always trying to improve the experience for all our users (I'm the CTO at Sourcegraph), so if you have any recollections you're open to sharing, would love to hear them!
For one thing, although I do use Sourcegraph for searching public code (thanks for creating and maintaining it!), I find the website too slow and heavy-feeling. Not so slow that it's impossible to use, for sure, but just slow enough to feel "bulky" and laggy rather than "snappy". This (and maybe some other UX choices, like not always fully supporting right-click and "open in new tab"; I'm not sure that's actually the case, but I always feel a bit afraid to do it, or maybe I just fear slowing down my browser) makes me search with it less than I need or want to, and makes me hesitate every single time I consider opening a new tab with Sourcegraph. I know the results will be very good, but I also know the lagginess will tire me out.
It's not quite what you're asking, but if you want to limit your searches to a particular repo, you can just type that into the URL bar (and then bookmark it): https://cs.github.com/$OWNER/$REPO, just like the repo's primary page on GitHub is https://github.com/$OWNER/$REPO
I use GitHub search a lot and this would be an insane productivity boost. I signed up for the waitlist. Can you please give me a nudge in the queue? This is my profile: https://github.com/abdallahmansour6
Nope, I would have used an existing search solution, like Xapian. It does so much more, and much faster.
You need to support a proper query syntax, with tags, rankings, stopwords, and stemming. Then you need a proper DB backend (reverse indices). Trigrams don't help for regex. Then a templated representation. Google Code Search did only the second of those three. Elasticsearch is commercial, and Java-only.
Oh OK, you have clearly spent more time thinking about this problem than the team of engineers at GitHub who've been researching code search at scale for more than four years. I bet they feel real silly right now knowing they could have shipped this search engine in a couple weeks taping together off-the-shelf libraries if they only had your talent for software architecture.
Sure, I did. I've implemented a proper search for a very big company's document and knowledge base, similar to Gmane, which also used Xapian. It's much better than what I see there, or than the old Google Code Search.
Everything you are talking about is useful for full-text search, but practically pointless for code search. Trigrams definitely help for regex: a trigram index gives you a quick lookup of candidate files, which may include false positives, and only those candidates then need to be checked against the full regex.
Code and log search are two specialized use-cases of search that definitely warrant a non-full-text approach and as far as I am aware the trigram bitmap index would be considered state of the art for both.
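A toy Python sketch of that two-phase idea, in the spirit of trigram indexes like the one Google Code Search used (it is not GitHub's engine): the index narrows the files down to candidates, which may include false positives, and only those are checked with the real regex. The `literal` argument is an assumption of this sketch; real engines derive the required trigrams from the regex automatically, whereas here the caller passes a substring that every match must contain.

```python
import re
from collections import defaultdict


def trigrams(text):
    """All overlapping 3-character substrings of `text`."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(docs):
    """Inverted index: trigram -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tri in trigrams(text):
            index[tri].add(doc_id)
    return index


def search(index, docs, literal, pattern):
    """Two-phase lookup.

    Phase 1: intersect the posting sets for the trigrams of `literal`.
    This is cheap but may yield false positives.
    Phase 2: confirm each candidate with the real regex.
    """
    tris = trigrams(literal)
    if tris:
        candidates = set.intersection(*(index.get(t, set()) for t in tris))
    else:
        candidates = set(docs)  # literal too short to use the index: scan all
    regex = re.compile(pattern)
    return sorted(d for d in candidates if regex.search(docs[d]))


docs = {
    "a.c": "p = strstr(buf, key); q = strtok_r(s, d, &save);",
    "b.c": "p = strstrip(s);",
    "c.c": "x = strchr(s, c);",
}
index = build_index(docs)
print(search(index, docs, "strstr", r"strstr.*_r"))  # ['a.c']
```

Here `b.c` survives phase 1 (it contains "strstr") but is rejected in phase 2, which is exactly the "false positive" case; `c.c` never reaches the regex at all.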
That isn't to say you can't solve the problem with a full-text search engine; many people do, with the solutions you alluded to. However, they are drastically less efficient and probably out of the question at GitHub's scale.
Code search typically does not need many (most?) full-text search features like TF-IDF, stopwords, stemming, tagging, etc. It's a categorically different domain.
I love how the Microsoft acquisition continues to result in increased investment in GitHub, with Microsoft's resources and real vision; that's not always how an acquisition goes.
There was a moment, probably right around the acquisition, when I thought GitLab was poised to hit critical mass and GitHub's best days were behind it.
Microsoft has done a great job of actually improving the product and investing in it, something that seems to not happen most of the time with giant acquisitions.
GitHub today, at least for me, is definitely improved over the GitHub of five or so years ago. It does feel like some things are bloated due to a push for feature after feature, but the core features have gotten better to the point that I don't care. I just wish I didn't have to spend 2 minutes on every new repo I create turning off all the annoying features I don't need for small projects.
(From a GitHub product manager) Thank you for this feedback about the pain of having to repeatedly change repo settings. It makes great sense that you'd want to repeat certain settings. Would it meet your needs if repository templates allowed you to have settings that got copied when you created a repo from the template (https://docs.github.com/en/repositories/creating-and-managin...)? I also wonder if you'd be interested in repository settings being settable in a text file. If you want to continue chatting about this it would be great if you could post it in GitHub's feedback discussions here: https://github.com/github/feedback/discussions/categories/ge.... Thanks again!
The Win32 API isn't nice, but Microsoft was always relatively good with documentation etc., and don't forget all the developer support within Excel, VBA, Visual Basic, etc. Bill Gates understood early on the premise of building a platform and not breaking it, even if that meant the Win32 API became ugly over time. Old Windows programs still work on newer releases.
What makes you say that? I love to hate on Microsoft, wouldn’t touch Windows with a 100ft pole, but I always give credit to Microsoft for VS, VS Code (and my favorite languages, TS and C#).
I admittedly haven’t used Visual Studio in a while (though I use VS Code daily), but I remember it as my favorite IDE. Certainly better than Xcode, and I even preferred it to IntelliJ.
Still waiting for the ability to search in other branches. It's a pain when some codebases have stable releases on the next/dev branch but keep their main branch to the previous release.
Absolutely. I get that they don't want to index every branch, but at least set some heuristics, like whether a branch has a certain amount of activity per repo. Or even allow a repo to opt in to one or two other branches besides main, especially for bigger projects.
I can't use Sourcegraph to search a public repo I work on without granting it read access to every private repo I have. Attempting to deny just that permission causes it to error out.
So, to summarize: no, you cannot do that on Sourcegraph, since doing so would require everybody to violate their NDAs.
Appreciate the feedback here (Sourcegraph CTO). I agree that's annoying. Kicking off a conversation on our end to figure out how to fix that. In the meantime, the workaround would be to create a separate GH login, which can be used to add any public repo to our index. Alternatively, tell us the URL of the repo(s) you'd like indexed and we'll get those added to our index.
Great. I usually use grep.app[1], since for me GitHub search is mostly useless. Your mileage may vary, though.
That being said, there are many other great search interfaces that I use often when I'm trying to find solutions to common problems or specific design patterns. Chromium search[2] comes to mind, as do Mozilla's Firefox[3], Android[4], and of course Google[5].
Now, can we please get GitHub issues back into third-party search engines? Whenever I search for something I know is in an issue, I only ever get results from those crappy GitHub scraper sites. This is happening on both Google and DuckDuckGo.
Only thing missing is indexing of branches and forks.
My main use case for GitHub search is identifying provenance of misc. changes in vendor source code tarballs for e.g. Android kernel releases. It's hard, but sometimes possible to rehydrate most of the existing commits through cherry-picks and careful rebases.
The biggest problem with the lack of branch and fork indexing is that vendors sometimes make releases through branches, or that repos of interest are sometimes forks of e.g. `torvalds/linux`.
Hopefully we can see those being indexed in the future.
I'm also curious: has the plan to drop "less active" repos from the index gone through? Has anything changed?
> I'm also curious: has the plan to drop "less active" repos from the index gone through? Has anything changed?
Whaaat? I hope it doesn't go through. I use GitHub code search for clues when reverse engineering cheap Chinese IoT crap. Usually I can find some headers / SDKs accidentally uploaded and set to public by a random Chinese guy. Those repos usually have one commit and zero traffic, but they contain invaluable information about proprietary MCUs.
I would personally like to see less indexing of duplicate files! There are many things I’ve searched for which return 100s of results from independent checkin-uploads of big libraries like the Android SDK. It would be great if results were filtered by file similarity regardless of git history (if that is in fact the issue).
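As a rough illustration of filtering by file content rather than git history, here is a minimal Python sketch that collapses byte-identical files using a SHA-256 digest. It only catches exact duplicates; near-duplicates (e.g. slightly modified SDK copies) would need a similarity measure such as MinHash, which is beyond this sketch. The `(repo, path, content)` tuples are a hypothetical result format, not GitHub's.

```python
import hashlib


def dedupe_by_content(results):
    """Keep the first result for each distinct file content.

    `results` is a list of (repo, path, content) tuples. Identical
    content is detected via a SHA-256 digest, regardless of which repo
    or path it came from.
    """
    seen = set()
    unique = []
    for repo, path, content in results:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((repo, path, content))
    return unique


# Hypothetical search results: two uploads of the same file, one distinct file.
results = [
    ("alice/app", "sdk/View.java", "class View {}"),
    ("bob/upload", "android/View.java", "class View {}"),
    ("carol/lib", "src/Util.java", "class Util {}"),
]
for repo, path, _ in dedupe_by_content(results):
    print(repo, path)
```

Which copy counts as "first" depends entirely on the input order, so in practice the ranking step decides which duplicate is shown.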
Got into the preview, can finally search for actual code! One thing I'd like to see, though, is the ability to mark directories to be ignored in the search results. No one needs to search the raw HTML of my generated documentation, yet it shows up in every search for project symbols. And since HTML is considered "source", I can't filter it out unless I select a particular language.
Also, the search text field is a bit messed up in Safari when the text gets longer than the field.
GitHub Code Search developer here - try creating a custom scope to filter out that stuff! Click on the scopes dropdown and scroll to the bottom. You can filter out HTML by using a query like:
It would be great if this used the same filter format as sourcegraph and other internal code search tools. ex. -file:.html is enough to filter away files ending in html in the main search box.
Having to use dropdowns and multiple input fields is more cumbersome than the filter language of repo:, file:, lang: etc.
Are there any open source powerful code search engines out there? As a Googler the internal code search we have here is one of the most incredible things I've ever seen, it's so fast and powerful I'm amazed by it daily. Is there anything near that quality out there?
We built Sourcegraph taking inspiration from Google Code Search (https://about.sourcegraph.com/blog/ex-googler-guide-dev-tool...) to bring the power of code search—and precise code intelligence that just works—to every dev. Try it out here: https://sourcegraph.com. A super common thing we see is people leaving Google, missing code search, and then bringing Sourcegraph into their new org. We'd love to hear your feedback!
The best thing about the Sourcegraph instance hosted on sourcegraph.com is that you can edit the URL in your browser from https://github.com/foo/bar to https://sourcegraph.com/github.com/foo/bar to be dropped down into a Sourcegraph search for that GH repo. I've been using it for a long time because of this convenience.
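The URL trick above is a plain prefix rewrite. As a tiny illustration (a hypothetical helper, e.g. for a bookmarklet-style script; it is not part of Sourcegraph), in Python:

```python
from urllib.parse import urlsplit


def to_sourcegraph(github_url):
    """Rewrite https://github.com/foo/bar to
    https://sourcegraph.com/github.com/foo/bar by prefixing the path."""
    parts = urlsplit(github_url)
    if parts.netloc != "github.com":
        raise ValueError("not a github.com URL: " + github_url)
    return "https://sourcegraph.com/github.com" + parts.path


print(to_sourcegraph("https://github.com/foo/bar"))
```

Query strings and fragments are dropped on purpose, since only the `owner/repo` path is needed to land on the repo.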
(Though it would be even better if the two options for case-sensitivity and regex search were enabled by default instead of needing me to toggle them on every time.)
You should be able to do that over in your User Settings (Click your picture in the top right and then Settings.) Adding these two things should change that default for you:
Sourcegraph is open-core, with a dual licensing approach. You can run the open-source version here: https://github.com/sourcegraph/sourcegraph#sourcegraph-oss, and we have an enterprise offering for companies that want to adopt for their teams. Similar to GitLab, both our enterprise and OSS code is publicly available.
I helped write DXR for indexing Mozilla's source code based on an instrumented compiler run; this has eventually been developed into mozsearch (https://github.com/mozsearch/mozsearch), whose indexing for mozilla-central is visible here: https://searchfox.org.
I thought it was abandoned! This is great to hear it just moved. Is there anyone at Mozilla that can update the old DXR repo [0] to direct people to MozSearch?
I work on a very large C++ monolith at work, and DXR has been a real game changer in helping me figure out how so much of the codebase works. Thanks!!
My job uses https://oracle.github.io/opengrok/ and I'm generally happy with it. It has some problems with special character searches at times but generally does what I want. It's certainly better than code search in our on-prem github instance.
For the grepping aspect, https://github.com/google/zoekt is a powerful one-stop shop. For the navigating, I don't know. Sourcegraph maybe, but I assume the linking is somewhat heuristic rather than powered by the compilation graph. But maybe that changes, or varies by language.
We're using https://searchcodeserver.com/ internally and it works quite well; it was miles ahead of GH search. After today, I'll have to test GH again to re-check that statement.
DXR has largely been replaced with mozsearch (https://github.com/mozsearch/mozsearch), and a quick glance through the really early history does show that it adopted a fair amount of stuff from DXR. The downside is that it's not as easy to set up a local mozsearch instance as old-school DXR was.
It really looks like they took a lot of inspiration from https://sourcegraph.com/search with this. Not a bad thing at all. I hope Sourcegraph doesn't get obsoleted by this though; they're great people.
Sourcegraph CEO here. Imitation is the sincerest form of flattery. We are very transparent, have a ton of users, and are open-core, so it's easy to get inspiration from us. :) We want way more devs to be using code search, since it's valuable 10+ times a day, and if this helps, then we are very happy about that. Devs get to choose the code search tool they use, so the best tool will win (you wouldn't use Bing if your boss made you...likewise, code search isn't like team chat or team docs).
I've met two of their devs randomly in different Discord servers. Both were great people (Noah, Olaf) and are very active in OSS communities. Perhaps not coincidentally, both worked on Language Server related stuff.
Ólafur is responsible for a lot of Scala tooling and some pretty neat original ideas.
Sourcegraph also came up with LSIF, which is a useful format for building tooling for language servers:
Sourcegraph CEO here. I'm really sorry about that. We work really hard on making our interviews good for everyone, including documenting it publicly at https://handbook.sourcegraph.com/talent/interview_process. Could you please email me at sqs@sourcegraph.com so I could find out what happened?
I interviewed with Sourcegraph and it was one of the best interview experiences I've had. Super transparent process, open-source handbook, fun coding tasks; really nothing to complain about. I'd be curious to know what made your experience so different.
I just want to say: about time. When using libraries with inadequate documentation, being able to find usages of a method or class often gives really good insight into the library. But the current code search's stemming strips out all the context needed to find those usages, and then returns alternate spellings too.
I always thought GitHub's bad search functionality was a business decision. It was so bad for so long. Even if basic improvements are significantly harder at their scale, I just can't comprehend how Microsoft let something so potentially useful stay so bad for so long.
There is a way to search for comments using the "global search", but no way to search for text over issues and their comments. In particular, no way to search from the issue tab, no way to search over comments only in issues (or only in merge requests), no way to combine a text search with label/milestone/status filters, etc.
So it's a workaround, but a bad one.
Here's the ticket (2015): https://gitlab.com/gitlab-org/gitlab/-/issues/13891. The fact that it has so many duplicates in your own project's issue tracker is a good indicator of how bad your issue search is.
Today I wanted to search for "strstr[a-z]+?_r" but got the error message "This is a partial result set. The search was stopped early because it would take too long to check every file for this regular expression.". However, I got results for the less restrictive regex "strstr.+?_r" which is weird since I'd expect that it would be easier to return results for more restrictive regular expressions. Not sure if there is a perfect solution for this, but in many cases, you could probably search for the less restrictive version and filter the results with the more restrictive one after that.
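The commenter's suggestion can be sketched locally in Python over a hypothetical list of code lines (this is an illustration, not how cs.github.com executes queries): run the looser pattern first, then narrow its results with the stricter one. Because `[a-z]` is a subset of `.`, every line the strict pattern matches is also matched by the loose one, so the second pass cannot lose a true match.

```python
import re

# The looser pattern that returned results in the comment above.
LOOSE = re.compile(r"strstr.+?_r")
# The stricter pattern that hit the time limit.
STRICT = re.compile(r"strstr[a-z]+?_r")


def two_stage(lines):
    """Filter with LOOSE first, then keep only survivors matching STRICT."""
    candidates = [line for line in lines if LOOSE.search(line)]
    return [line for line in candidates if STRICT.search(line)]


# Hypothetical code lines, for illustration only.
lines = [
    "strstrcase_r(a, b)",
    "strstr(a, b); strtok_r(c, d)",
    "strstr_r(x)",
]
print(two_stage(lines))  # ['strstrcase_r(a, b)']
```

The second line shows why the strict pass is still needed: the loose pattern happily spans from `strstr` to the unrelated `strtok_r`.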
Also, it would be great if more repositories were indexed. How do things work behind the scenes? Maybe it's possible to build a more memory-efficient index just for exact string searches, which probably make up most queries.
Anyway, this website is amazing and I use it quite often. Thank you a lot for working on this!
* Copy & Paste does not work. When trying to select text from code snippets, the code is dragged like an image instead of being selected.
* The GitHub icon during search is not clickable. Clicking to the right of the GitHub icon does not go to GitHub but instead shows an embedded view of that repository. There, the GitHub icon somewhere on the right is a bit hard to find (maybe write "View on GitHub" next to it so it is discoverable with Ctrl + F?), but at least it is clickable.
Wow, HUGE feature, congrats to the team working on it! GH code search is a feature with such massive potential utility, but the old implementation was so weak it was basically useless. Looking forward to this, will use it constantly if it’s good.
Yes please! I like to search for examples of how to use libraries and often times the results are all the same exact call in forks or copies of the same code in multiple places. Perhaps deduplication could be optional when searching?
Of all the tools I use on a daily basis, GitHub is probably the worst. I mean, the "Find a repository..." input field on the start page can't even reliably find, by name, the repositories I have access to across all my organizations. It works for some repos but not all.
Search improvements? It is impossible to create a worse search experience than GitHub's. Just clone and use git grep instead in most cases.
De-duping exact matches is a game changer; search has been miserable to use because of the dupes for so long. I can live with near-duplicate documents. Very excited to test this out.
Another GitHub Code Search developer here - to add more to this, we rank all the search results, and try to bring the most relevant results to the top. Ideally, if you have 10 pages of results, you shouldn't have to leave page 1 to find what you're looking for :D
That would be a tough problem. When de-duplicating, you probably want to show or point to the 'original' tree. But which one is the source? Or, even worse, if someone abandons a project but someone else forks it and keeps going, should it show that one instead? Or should it show the one it was forked from, depending on the version number? Which one is the 'true' repo now? Certainly an interesting problem.
I get that. Just removing the 'extra' copies is a good first pass. I was thinking that longer term you'd want to show the 'original' higher in the list, wouldn't you? What criteria would you use to show one copy over another? In many cases it probably wouldn't matter much, but it could if you wanted to figure out the lineage of imports. Some projects have thousands of forks, yet maybe only a dozen of those actually have anything going on. Wouldn't those be more useful to show?
I use GitHub search a lot and this would be an insane productivity boost. I signed up for the waitlist. Does anyone working at GitHub want to bump me in the queue? This is my profile: https://github.com/adamnemecek/
This looks awesome! Two things I've always wanted and haven't found satisfying solutions for in code search (in an editor):
1) an ability to easily express higher-level concepts in a search that's aware of code semantics ("match only function names", "find call sites of a method", etc.). Maybe this is possible with existing tools (it probably is?), but I tend to get lazy about learning DSLs; I'd love to see this in a UI if it's possible
2) ability to save searches I do frequently - after a certain level of complexity in a query (I've added ignore rules, I crafted the right regex, etc), I want to be able to save the "context" of a search so that I can easily return to it later
We do have code navigation via the UI, so in a way it's possible!
> ability to save searches I do frequently
Absolutely! This is possible using "custom scopes". If you're in the technology preview, click on the scope dropdown, scroll to the bottom, and choose "custom scopes". You can make a custom scope to search a set of repositories, a particular language, within a directory, or any combination with boolean operators!
It's local-only search, but you reminded me that this is possible with macOS Spotlight. I wrote an indexer (for Common Lisp) that let you search for function definitions, etc.
1) This doesn't seem to exist in quite that way, but you can prefix a literal with "def:" and the engine will return only definitions of that thing (so far as it can tell). It's not quite what you (or I!) want, but close.
2) This exists and is called "scopes". On the landing page, to the left of the search bar, click the grey pill that says "All repos". At the bottom there is a "custom scopes" option.
One feature I would absolutely kill for is a setting that lets you hide issues and PRs from bot users in the global search.
I've been lucky enough to have a few projects that others have found useful, and so they've ended up in Conda forge, Gentoo, and other package repos. After I make a release, the Github-wide search is just absolutely flooded with dependabot PRs, Snyk PRs, and dozens of other bots. Literally thousands (and sometimes tens of thousands for CVEs) of automated PRs and issues that make it impossible for me to see how others are using or discussing my packages (usually to see if I've just horribly broken something).
It looks like this doesn't really understand the code... If I have a bunch of functions in different files all called "print", it won't be able to determine exactly which one is linked in and called at runtime.
Google's codesearch tool[1] actually compiles the code and uses the compiler's parse tree to build the search index. The only time it doesn't work is when you're looking for code that has been `#if 0`'d out, in which case it falls back to regular string matching, and the difference is night and day.
Please, GitHub... please try to build a search index by compiling everyone's files. Plenty of projects have CI buildbots that have all the info needed to automatically compile millions of projects and, at the same time, generate the necessary parse trees. Even for interpreted languages like JavaScript/npm and Python/pip, you can use heuristics to build an accurate cross-module function call graph most of the time.
We’ve been working on a framework called “stack graphs” that lets us extract exactly this kind of information without having to build anything. More details in my Strange Loop talk from October: https://dcreager.net/talks/2021-strange-loop/
This is great! As a project manager, I use GitHub search every day to find specific methods or parts of the code in order to track down logical issues or bugs.
Got an opportunity to try it a few minutes ago and it's awesome so far. I was able to look for my code in repos I don't own, e.g. `not org:user foo::bar`
Ah, great. GitHub throwing out special characters in searches was infuriating for languages with sigils and patterns, like $somevar or %sql% and so on.
Curious whether this is something completely bespoke or simply a beefy Elasticsearch cluster that uses the (relatively) new "wildcard" field type to enable regex search on select fields. The search syntax certainly maps 1:1 to the Elasticsearch query string syntax, including phrase search, boolean operations, grouping, regex search, etc.
Just in case people from GitHub are listening to feedback here: please stop blocking search for logged-out users. You're not as terrible as your main competitor gitlab.com, which entirely blocks Tor users from cloning repos (unless you add ".git" at the end of the URI), but having to log in every time I'm looking for a string, e.g. across an org, is the worst. I understand I have to log in to publish code and engage in discussions, but requiring login for read-only content is bad UX.
PS: How do you feel about being bought by Microsoft? Maybe some of you feel it's a good time to implement server-to-server inter-forge federation to put a nail in the coffin? It sounds like Gitea is well on its way to supporting it based on ActivityPub/ForgeFed, and it would be sad if GitHub were relegated to a for-profit walled garden.
Some logic to exclude duplicate results would be useful. I often search to see how many external users there are of some API in Postgres, but there are hundreds of separate repos with similar contents showing up in the search results...
The addition of exact match search is so exciting that I haven’t internalized any of the other new features. I’ve abandoned an ungodly number of semi-common-word searches after getting 30 pages of results in a monorepo
I didn't even see this in the feature list before doing the signup. One of the signup questions is "how do you usually search?" or so, I wrote in the blank "I want to search for symbols, not substrings, so if I'm searching for `bar` I don't want `foo_bar` to show up as a match". I usually do this with word boundaries in regexes, but I pretty much have to have the repo downloaded, so it's useless for searching on github.com this way.
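For reference, the word-boundary workaround mentioned above looks like this in Python's `re` module (a local sketch; `find_symbol` is a hypothetical helper, and the engine's own regex dialect may differ). `\b` treats letters, digits, and `_` as word characters, so there is no boundary between the `_` and `bar` in `foo_bar`, and that occurrence is skipped.

```python
import re


def find_symbol(name, text):
    """Return start offsets where `name` occurs as a whole identifier."""
    # re.escape guards against names containing regex metacharacters.
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    return [m.start() for m in pattern.finditer(text)]


text = "bar = 1\nfoo_bar = bar + 2\n"
print(find_symbol("bar", text))      # [0, 18]
print(find_symbol("foo_bar", text))  # [8]
```

One caveat: `\b` only exists next to word characters, so for names with sigils such as `$somevar` the leading `\b` would not behave as intended and a different anchor would be needed.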
This is great; targeted search on GitHub has previously been very hit or miss. Generally I use the search feature for learning, or to see if something I'm trying to do already exists. I personally think VS Code has the best code search implementation in terms of "exact", "partial", and "regex" matching. The UI is clear, non-technical team members can navigate their way around it, and it's relatively fast, assuming you don't have too many extraneous plugins installed.
Any plans to integrate Copilot goodness into this? All I want is to search for "the function that concatenates multiple items with an Oxford comma" and get to that bad boy!
I wish they would fix the advanced search feature. Searches with multiple filters don't show any of the matches (this has been broken for over a year; they acknowledged it as a known issue). For example, the search "camera -filename:camera.css -filename:depend.make" will say there are 100 million matches, but won't show any of them. Super useful feature when it's working.
Whenever I search for code, it will say something like "Last indexed on Apr 2", but if you go to the actual file, the date will say 5 years ago or something. So the listed "Last indexed" date is currently useless, and you basically have to click through to every result.
Yes, sadly, that is literally when the file was _indexed_. So it's not particularly useful. It's a difficult problem to solve, but I'll bring up your feedback to the team.
I was actually really surprised that this did not exist when I went to search GitHub for the first time. You would think an open-source giant would have this ability, but I guess achieving search in general takes a ton of computational load. I'll probably get downvoted for bringing up a wacky idea, but imagine some type of referencing system done through multi-node P2P, so you could search certain systems using shared resources. I guess the major problem would be whether devs would actually spare some of their personal computational resources to help the community find things, rather than relying on special interest groups. I get it, I am old school as well; I started out on Pascal and BASIC. But I still think using creative solutions is fun. You know, Napster was cool back in the day before its lawsuits, and P2P was starting to pick up speed.
There was a recent post on search engines where I believe a P2P solution was mentioned (but maybe it was on some related post within a few days of this one): https://news.ycombinator.com/item?id=29417061
Now can they fix doing a language search for “Visual Basic”? If you filter a users repos or stars on that language it just shows all their repos or stars. Code search for language “Visual Basic” returns all repositories and does not limit by language like it should.
Will it be possible to see all the search results within each file? The lack of that feature is why I almost never use GitHub's code search for my repositories. Instead I'll download the repository locally and search there.
Thank f for this. GitHub search has been complete garbage for years tbh.
Being able to search for, say, 'def method' without first finding EVERY other def is gonna be kinda nice
I have 10 different Git + GitHub instances across my org (~50k-strong workforce, pre-GitHub repos, M&A, etc.).
Does this cs offer aggregated searches across all those distributed repos?
Hi zxienin. I'm a GitHub product manager. May I assume the GitHub instances you're describing are GitHub Enterprise Server instances? We plan to bring advanced code search features to all GitHub plans including Enterprise Server once we've stabilized the UX and feature set. But it sounds like your situation goes beyond that, where the search needs to include code from Git repositories outside of GitHub Enterprise Server. That makes good sense, and we'll definitely consider it. If you want to keep in touch about it, please feel free to post in our feedback forum: https://github.com/github/feedback/discussions/categories/co.... Thank you!
grep.app doesn't index all repositories on GitHub though. I was doing some research a few months ago and couldn't find anything that would search all of github quickly.
Check out https://cs.github.com/about/syntax -- indeed, by default terms are searched in both content and paths. You can restrict to one or the other with `content:` or `path:`.
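A toy model of those semantics, assuming plain substring matching on a single term (the `search` function is hypothetical; the real syntax page also covers regexes and boolean operators):

```python
from typing import Dict, List

def search(files: Dict[str, str], query: str) -> List[str]:
    """Toy model: a bare term matches a file's path OR its content;
    `path:` and `content:` prefixes restrict the match to one field."""
    if query.startswith("path:"):
        term = query[len("path:"):]
        return sorted(p for p in files if term in p)
    if query.startswith("content:"):
        term = query[len("content:"):]
        return sorted(p for p, text in files.items() if term in text)
    return sorted(p for p, text in files.items() if query in p or query in text)

files = {
    "src/user_cache.py": "class Cache: ...\n",
    "src/db.py": "from user_cache import Cache\n",
}
print(search(files, "user_cache"))          # ['src/db.py', 'src/user_cache.py']
print(search(files, "path:user_cache"))     # ['src/user_cache.py']
print(search(files, "content:user_cache"))  # ['src/db.py']
```

Under these semantics a bare `user_cache` should find the file whose name contains it, even when no file's contents do.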
Then their documentation is wrong. I learned the hard way that GitHub code search didn't search file names in my case. I searched for a short bare string made of letters and a single underscore, and it failed to find the file with that exact string in its name, which cost me a lot of time because I kept missing what I was looking for.
Unfortunately I can't reproduce the problem publicly because it happened while searching a private repo.
This is awesome, but I don't know why we needed faster search; it seems like the time would've been better spent on more search features. I guess this is classic programmer pointless optimization.
Hopefully I'll be able to search for usages of a library's function without getting 30 pages of that library's source code cloned into vendor directories.
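The `NOT path:/regex/` exclusion mentioned upthread could presumably be pointed at vendor directories. Here's a sketch of what that filter does to a result list, assuming vendored code lives under a `vendor/` path component (the pattern and `drop_vendored` are hypothetical, not GitHub's implementation):

```python
import re
from typing import Iterable, List

# Hypothetical pattern: any path with a `vendor/` directory component.
VENDOR = re.compile(r"(^|/)vendor/")

def drop_vendored(paths: Iterable[str], exclude: re.Pattern = VENDOR) -> List[str]:
    """Keep only result paths the exclusion regex does NOT match,
    mimicking a `NOT path:/.../` qualifier applied to search hits."""
    return [p for p in paths if not exclude.search(p)]

hits = [
    "app/services/billing.go",
    "vendor/github.com/acme/lib/lib.go",
    "tools/vendor/acme/lib.go",
    "docs/vendoring.md",
]
print(drop_vendored(hits))  # ['app/services/billing.go', 'docs/vendoring.md']
```

Anchoring on the trailing slash keeps files like `docs/vendoring.md` that merely contain the word.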
Slight tangent: the video has a guy describing the tool, and he mentions that it’s written in Rust when introducing it. I’ve always found this sort of name-dropping by Rust projects/devs baffling. Is there anything I’m expected to infer from it? That its backend is memory safe? I can’t think of anything else. It may very well be memory safe, but why include that highly specific detail when talking about something as high-level as the UX of search? If it were written in Haskell or C#, would it still be brought up? It’s almost as if being written in Rust is a feature in itself these days. As a technical guy, I can’t help but take the person less seriously, especially when it’s as unwarranted as this.
Not ashamed to be a Rust evangelist! The reason I mentioned Rust is because we spent a lot of time making the experience really fast - which is super important for a product like this. I really think getting the performance we have would have been enormously more difficult in any other language.
Fellow Rustacean here. Is the search engine secret sauce or something that could perhaps be open sourced? I'd like better tooling for searching private code bases. Also, would you consider writing about optimization techniques you used?
We are looking into open sourcing some libraries that we've developed for search. And we're going to write a blog post with way more technical details soon!
I agree with you, but I just wanted to point out the following:
In general, Rust, C, and C++ are going to be faster than languages like Ruby*. He brought up Rust while discussing the performance of the new tool. Although performance is more complex than language choice, etc., saying it's written in Rust gives the viewer an approximate lower bound as to how fast the tool should be.
*: (GH started as a Ruby shop, so I wouldn't be surprised if that's what the original tool was written in).
He’s talking about text search, and the post thanks @BurntSushi. That means they’re using the fastest text search tool out there: ripgrep. I won’t mention what it’s written in, because that clearly upsets you.
Go had this issue for a while, too; it's finally started to calm down as Go hits a mainstream that is (imo) much farther along than Rust is currently. I think much of it is just people trying to add validity to Rust for large-scale production workloads, in the same way that Kubernetes was "a compute scheduler written in Go" or Terraform was "infrastructure as code written in Go" (maybe those are bad examples, but I know I've seen the "X written in Go" thing going on).
This is exactly how I see it as well. Rust used to be an obscure language with a compiler written in OCaml. If something is written in D or Zig, it’s noteworthy, so you mention it. I think Rust has come into the mainstream enough that we can drop the “written in Rust” line, imo.
I think depending on where the audience is coming from—for example people who primarily work in scripting/interpreted languages—Rust can also be a positive signal for performance.