Hacker News
Ripgrep 11 Released (github.com/burntsushi)
410 points by burntsushi on April 15, 2019 | 88 comments



Ripgrep is awesome. Thank you for making it!

In addition to being really fast, it "just works". By that I mean it automatically excludes the files I want to be excluded, like `.gitignore`d files and binary files. I know I can configure ack-grep (ag) and other tools to do that, but not needing to configure it is nice.
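A tiny sandbox illustrates the default filtering (hypothetical file names; the rg line is guarded since this sketch assumes only standard tools plus, optionally, ripgrep):

```shell
# Throwaway directory with one ignored file and one tracked file.
tmp=$(mktemp -d) && cd "$tmp"
mkdir .git                        # make the directory look like a git repo
echo 'secret.log' > .gitignore
echo 'needle' > code.txt
echo 'needle' > secret.log
grep -rl needle .                 # grep reports both files
command -v rg >/dev/null && rg -l needle || true   # rg should list only code.txt
```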

BTW, if anyone hasn't read https://blog.burntsushi.net/ripgrep/, highly recommended. It's about how ripgrep is so fast. (Edit: Just saw another comment mentioned this, too. Goes to show that making a single high-quality blog post has a big impact.)


It is a very good blog post. Recommended!

Now, I will do my obligatory "burntsushi needs to clag the rest of the Teddy code" post which I must do on every discussion of ripgrep. :-)

The "subtitles_alternate_casei" examples would be a good benchmark - these 48 strings should not overwhelm Teddy as they could be sensibly merged into Teddy's 8 "buckets" (in fact, they could be merged into 5 buckets) using the simple greedy merging strategy in the original Teddy implementation.

This would probably be a quite good project for someone who wants to contribute to ripgrep and could likely get some nice performance wins...


I agree, this could be a nice project! If anyone wants to work on it, this is the place to start: https://github.com/rust-lang/regex/blob/master/src/literal/t...

I'll get to it myself eventually if someone else doesn't, but it will likely be a while.


I've been thinking about a Teddy successor (which I suppose needs to be called "Taft").

Teddy is very much 'of its time' (SSSE3) and there are a lot of new approaches that seem interesting (AVX512 of Skylake generation, VBMI, Sunny Cove's even bigger slate of instructions, ARM NEON, SVE).

I also have better ideas about followup confirm than I used to. There are also some prospects for picking a 'fragment' out of the whole string within Teddy (or equivalent) at a position not strictly at its suffix; this can even be done with ordering preserved if you are careful not to make fragment choices that allow out-of-order matches (only possible if strings overlap).

I might do a bit of work on this, but I'm a bit jaded on string matching and regex matching after 13 years.


On the other hand, by now you probably have more fixed function hardware in your brain for string/regex matching than any other human alive.


What an unnerving concept! I think there are many more people with better algorithmic understanding of the problems. I am more of a 'bang on the problem with a stick until it kinda works' type meathead with a few cheesy SIMD tricks up my sleeve.

I am hoping to move on, but I admit I do have a "few more ideas" in that area - possibly even slightly less 'meatheaded' than previous outings. Maybe (although everything looks better on paper).


ack and ag (the_silver_searcher) are two distinct tools. The latter automatically excludes .gitignored and binary files, just like ripgrep. In fact, it seems ag inspired ripgrep's behavior; in the blog post[0] introducing Ripgrep, BurntSushi specifically highlights that rg aims for the usability of ag:

> I will introduce a new command line search tool, ripgrep, that combines the usability of The Silver Searcher [ag] …

[0]: https://blog.burntsushi.net/ripgrep/


>ack and ag (the_silver_searcher) are two distinct tools. The latter automatically excludes .gitignored and binary files, just like ripgrep.

Only I've never been able to make ag (or pt, another similar tool) respect .gitignore and other such settings as well as rg does out of the box. Plus it's slower.


Yes, I believe BurntSushi specifically highlights better .gitignore support in the (long) blog post I linked.


Both of these are slower than rg, and ag has issues with big files; it's unreliable.


I've not used ag, but just a small shout out for ack: it's a single Perl file (easy to install; v1.x will work with even antique Perl), it works on weird hardware/OSes (no compilation required), it contains its own man page (ack -man), and it is fast enough for typical sysadmin tasks and moderate-size codebases.

(Ack also supports the full PCRE syntax but that's less of an issue now rg has -P).

If your work takes you to minority platforms ack is certainly worth keeping in your toolbox.


What file sizes did you have issues with?

I'm using ag inside my source directories and cannot remember it not doing the job.


Thanks, did not know that!


> addition to being really fast, it "just works". By that I mean it automatically excludes the files I want to be excluded, like `.gitignore`d files

I basically never want this, so for me, ripgrep's one flaw is that it never "just works".


The nature of defaults is that they will never please everyone, because opinions on what is useful have been encoded into the defaults. If you want to completely disable ripgrep's smart filtering, then just do this:

    alias rg="rg -uuu"


You can configure ripgrep to default to whatever flags you want by creating a configuration file and defining `RIPGREP_CONFIG_PATH`: https://github.com/BurntSushi/ripgrep/blob/11.0.0/GUIDE.md#c.... Try putting `-uuu` in the file as burntsushi suggested.
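For instance, something along these lines (hypothetical path and flag choices; `-uuu` would go in the file the same way):

```shell
# Create a ripgrep config file and point RIPGREP_CONFIG_PATH at it.
mkdir -p "$HOME/.config/ripgrep"
cat > "$HOME/.config/ripgrep/rc" <<'EOF'
--smart-case
--max-columns=150
EOF
export RIPGREP_CONFIG_PATH="$HOME/.config/ripgrep/rc"
# Every later `rg` invocation now picks these flags up automatically.
```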


Also see: A series of posts on regex parsers by Russ Cox.

https://swtch.com/~rsc/regexp/


I agree, ripgrep is amazing.


I’ve learned a lot by reading burntsushi’s code, so thank you for everything you’ve written.

Separately, ripgrep is a great advertisement for what Rust is capable of. First, it shows it’s possible to write highly performant and reliable applications without resorting to unreadable code. Second, it shows how simple it is to separate an application into a library + binary thanks to Rust’s package management. Ripgrep is merely a command line interface to libripgrep, which is reused in other applications like VS Code. The regex and directory walking code that was once part of the main ripgrep code base is now a set of crates that are reused to great effect throughout the ecosystem.


Last time I checked, VS Code didn't use libripgrep directly. It was calling the ripgrep binary instead.


This is correct. VS Code uses ripgrep's JSON output format from the binary.

libripgrep is nice for building your own Rust programs for searching stuff. But there is a not insignificant amount of code that translates your argv into uses of libripgrep.


If you've been holding back on Ripgrep because of its 0.x version numbering, now's the time to adopt it. Ripgrep is faster than grep, faster than ag (the Silver Searcher), and even faster than git grep.

The author breaks down all of these details in this 17,000 word blog post, full of detailed benchmarks. https://blog.burntsushi.net/ripgrep/ (It's from 2016, but it's still mostly in good shape; ripgrep has only gotten faster since then!)


The blog post is good, and fair at the time. One caution I would add is that time has not stopped for the other projects, either. I believe ag (the silver searcher) has also seen considerable performance improvements since 2016.


> I believe ag (the silver searcher) has also seen considerable performance improvements since 2016.

Do you know what they are? I haven't noticed them, although it has been a while since I've rigorously benchmarked it. From looking at the source code, I don't think much has changed. e.g., it appears to still be using memory maps on Linux when searching large directories.


Out of curiosity, is there a better approach than memory maps?


The author goes over that here:

https://blog.burntsushi.net/ripgrep/


Using mmap is generally slower than reading chunks of data, and mmap does not work well if you have to handle huge files with several GBs of text data. You can find performance benchmark results and a detailed analysis on @burntsushi's blog and in Lemire's blog post (https://lemire.me/blog/2012/06/26/which-is-fastest-read-frea...).


ag is slower than ripgrep in all of my benchmarks (https://github.com/hungptit/fastgrep). And ag might be slower than GNU grep if files are stored on fast storage devices, i.e., SSDs, even though GNU grep is a single-threaded command.


Those benchmarks are interesting. They are more thorough than most, but there are still a few fairly significant problems with them. Firstly, I don't see any verification that the commands are producing the same or similar output (e.g., a match count). Secondly, you split out fgrep into separate mmap and no-mmap benchmarks, but didn't do the same for ripgrep. Thirdly, some of the flags you're giving to ripgrep are a bit weird. e.g., Why are you asking it to follow symlinks? None of the other commands do that (or at least, grep doesn't). Fourthly, it looks like your benchmark corpora are pretty small. That's going to introduce a fair bit of variance. Hyperscan in particular might let you show a bigger performance difference on larger inputs.
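The match-count verification can be as cheap as one line per tool before any timing happens; a sketch with standard tools standing in for the real contenders (substitute rg/ag/fgrep invocations as needed):

```shell
# Before timing anything, check that the contenders agree on what they match.
printf 'foo\nbar\nfoo baz\n' > /tmp/bench-corpus.txt
a=$(grep -c 'foo' /tmp/bench-corpus.txt)
b=$(awk '/foo/{n++} END{print n+0}' /tmp/bench-corpus.txt)
echo "grep: $a, awk: $b"                       # both should report 2
[ "$a" = "$b" ] || echo 'WARNING: tools disagree; the benchmark compares apples to oranges'
```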

Also, your README describes "modularity" as a difference between fgrep and ripgrep, but ripgrep appears _significantly_ more modular. Its components are split into several libraries, each with their own good API documentation. You might consider mentioning that, as your README kind of makes it sound like ripgrep isn't available as a library whereas fgrep is.


@burntsushi I used "-L" by mistake in the benchmark. I have updated the README file and the performance benchmark results using the latest version of ripgrep. For searching lines in the Boost source code, ripgrep 11.0 is the clear winner, i.e., 50% faster than GNU grep 3.3. I tried to be as fair as I can in the performance comparison, so feel free to create a GitHub ticket if you see any issue in my benchmarks. I do have unit tests to make sure that fastgrep produces the correct results, and I did many manual tests to make sure that the output of fastgrep is consistent with the output of GNU grep and ripgrep. Note that the matched lines may not be the same for a given search pattern; this has been explained here https://rust-leipzig.github.io/regex/2017/03/28/comparison-o... or @glangdale, the author of Teddy, would be the best person to ask.

BTW, I see a 20% performance jump from rg-0.10 to rg-11.0 for a single file benchmark. What are the key differences between these two versions?


> BTW, I see a 20% performance jump from rg-0.10 to rg-11.0 for a single file benchmark. What are the key differences between these two versions?

I don't know. It would be easier to explain if I could more easily see what actual commands are being run. Your README just has you running `./all_tests`, but I want to see the actual command line invocations so that I can reproduce them. I'm also not sure which benchmark in particular you're referring to, so I don't know which corpus to use. Look at ripgrep's README for an example of what I mean. All the inputs are carefully specified and the commands being run are clear. You can even see the raw commands for the full benchmark suite: https://github.com/BurntSushi/ripgrep/blob/master/benchsuite...

I realize doing benchmarks right is a lot of work. So if you just have a particular command for me to try and compare performance, then I'd be happy to just do that.

> Note that the matched lines may not be the same for a search pattern

Indeed. ;-) That's exactly why I asked. That's a really important UX concern IMO.


The latest version of fastgrep does not highlight the matched text segment, and I think there is a reasonable amount of work that I have to do to make sure the highlighted text makes sense. Again, writing a usable grep command is very hard, and I appreciate the effort you have spent on developing and maintaining ripgrep. BTW, I do use ripgrep in my daily workflow and only use my fastgrep command when I have to deal with huge text files.


Vscode integrates ripgrep by default in their “search project” feature.

I mostly use Vscode because of its fast search, decent auto complete and simple editing experience.

Whoever made ripgrep, deserves massive kudos.


The OP is by the author of ripgrep


Atom's fuzzy finder also uses ripgrep as of 1.37.0

[0] https://github.com/atom/fuzzy-finder/pull/369


Same reason I switched. The projects at my work needed 2 minutes for a single regex search; with ripgrep it went down to 3 seconds.


I cannot understand all this excitement. Are we under heavy fanbot attack, or am I missing something? If the whole thing is that it is faster than grep and "in Rust we trust", then please feel free to ignore my comment.

I use OSX's slow grep every single day to find things, and I have never had any of the non-problems which ripgrep is trying to solve. For recursive, case-insensitive search with line numbers, just use grep with the flags "-Rni", so I don't see in which aspect it is superior. I have never had half a thought about grep's performance on the systems where I use it (it is already faster than my cognitive bottleneck). If I ever needed something better than grep, or had to deal with lots of unstructured data, I would perhaps move to a local search engine using indexes.

And personally, I would consider it an antipattern to skip the .gitignore contents by default. What kind of machines or use cases are the people in the comments hitting?


I use rg on a very large project (8.5 million LOC, 5.2G of files including the build). Searching with rg takes less than 2 seconds and I don't ever have to think about it. rg automatically handles not searching the build files and autogenerated files which often contain lots of matches but are never what I want to search. Additionally, rg has an excellent --vimgrep feature which makes it easy to integrate with the vim quickfix list.

Vanilla grep, meanwhile, takes 30 seconds -- too long to search while maintaining flow.

For me, the whole thing _is_ that it is faster than grep. AFAIK, a lot of its speed is due to breaking compatibility with grep to allow for more convenient defaults (skip binary files by default, skip .gitignore by default, etc) which also happen to be faster. For me this is a clear win-win.

Also, I have an 8 core machine. To be waiting for a 30s search knowing 7 cores are doing nothing makes me sad.


Maybe you never bothered giving grep the same 2 or 3 folders you usually scan as input parameters? Anyway, I didn't know grep is not multicore, and I wouldn't expect CPU to be the main bottleneck.

That's a good use case for a new tool. I might try it the next time my greps take longer than 2 seconds.

P.S. May I ask which kind of project/tech has this ratio of LOC to size?


Some builds in this project are out-of-tree (the best practices cmake way with a separate build directory). Some are built in-tree (with json files, libraries, object files, LLVM ir files, and more scattered in the same directories as source code). I can't be bothered to spend the time to learn, write, debug, and maintain the correct grep invocation to do this. rg has saved me the 10 minutes it would take to do this, and 28 seconds * ~1000 grep invocations. I'm a very happy customer.

The project is a fork of LLVM. We're implementing compiler support for a new design for threading on x86.


> I never had any of the non-problems problems which ripgrep is trying to solve

That's OK. Not everyone has the same set of problems, and not every tool that is built must be used by everybody. I document pretty clearly what ripgrep does in the README, so if it doesn't target your use cases, then obviously you just don't need to use it. But also, just as obviously, it should be clear that plenty of other people do fit into the use cases that ripgrep targets. It turns out that a lot of people work on big code bases, and as a result, running the default grep command can be quite slow compared to something that is just a little bit smarter. It gets super annoying to always remember to type out the right `--include` commands to grep. I know, because I did it for ten years before I built ripgrep. I mean, this is the same premise that drives other tools like `git grep`, ack and ag. ripgrep isn't alone here.

See also: https://github.com/BurntSushi/ripgrep/blob/master/FAQ.md#pos...

> And personally I would consider an antipattern to skip the gitignore contents per default.

Then use

    alias rg="rg -uuu"
and you'll never have to worry about that specific anti-pattern ever again. ;-)
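(As I understand the ripgrep 11 docs, the stacked -u flags unfold roughly like this; treat the equivalences as assumptions and confirm with `rg --help`:)

```shell
# Assumed long-form equivalents of the stacked -u shortcuts.
alias rgu='rg --no-ignore'                      # -u   : don't respect ignore files
alias rguu='rg --no-ignore --hidden'            # -uu  : also search hidden files
alias rguuu='rg --no-ignore --hidden --binary'  # -uuu : also search binary files
```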


Thanks for answering, burntsushi; I didn't expect you to reply personally. It's true that there is some good value in having smart, sane defaults and skipping the hunt for the 2 or 3 advanced flags which are only used from time to time, so I ended up giving it a chance (installing it via brew). The difference showed up immediately.

At the moment first impressions are good, seems to be killing it at the most common grep use case I am having, so I will continue using it and trying for some weeks until I can have a better formed opinion.

The only unknowns for me are the edge cases, such as piping to other processes, where I might end up falling back to the system's grep, or maybe I'll learn the rg possibilities better.


Neat. FYI, ripgrep works fine in pipelines. You should be able to use it just like grep in those cases.


Can I just say thank you ripgrep?

In my use case ripgrep is killer: I use it both to search through around 10GB of a monorepo and to search through 400GB of text files (custom format, similar to JSON) every day, and not having rg's performance would make my life considerably worse.

When you have very large sets of data to search through, the difference from rg to other solutions is like night and day.


Glad it helps. :) Although, with 400GB, I think any tool would probably get similar performance, since it's likely bottlenecked by your disk's read speed.


You're not wrong, it's a combination! Yes, faster than grep, yes "ooh shiny rust", + "ooh shiny" in general, used as the search function in the most popular editor (statistically, for users of stackoverflow) (https://insights.stackoverflow.com/survey/2019?utm_source=so...).

More "technical" reasons here: https://github.com/BurntSushi/ripgrep#why-should-i-use-ripgr....

For me, the performance is best in class, ux is great, and has defaults that make sense.

Perhaps this helps add context. Every day there are probably thousands of new people learning to write software and learning command line search tools. They would compare grep and perhaps something like ripgrep on how they stack up today. They probably don't want to search their JavaScript dependencies by default (gitignored, e.g. node_modules). They probably don't want to search their build output (binary/gitignored). They would like an easy way to say "only search JavaScript files" or "only search Go files". They would like the fastest tool for the job; why would you opt for the slower one? None of grep's historical clout/muscle memory has an impact on them. They would probably appreciate colored output, formatted for easier human consumption, by default.

Sorry for the rant!


Use case: grepping (big) repos. Rg is great for that, and grep, because it is much slower, is almost unusable.


Maybe I have not had this problem yet... if I used grep on the Linux sources I would just skip the build folder, and it usually works flawlessly (I cannot even recall it being too slow, unless grepping over binary data).

Do you mean deeply interdependent JavaScript projects, in which a small library requires hundreds of MB of dependencies? How many GB and how many lines do your projects have?

Please include deps & builds (my current project is a small Python one, 1.8 GB with deps, and I would usually grep only over the same folders all the time):

    $ time find ./ -type f -print | wc -l
    140197

    real 0m0.626s
    user 0m0.165s
    sys  0m0.473s


Anyway, if it is about grepping over several gigabytes of unstructured code and there is no chance of having a better tool that indexes everything, that could be a use case.


I use "ack" rather than ripgrep, but the reason I changed was that it automatically ignores a lot of files that I don't care about: VCS files largely. That's the big feature for me, because I'm usually searching in Mercurial repos and every match appears twice (real file and .hg file).


For me it's not about speed at all. I've used ack and loved it because it automatically ignores everything I want ignored (with a .ackrc file).

I just tried ripgrep and like it even more because it automatically follows .gitignore, which is huge. To me, that's the killer feature.


For find/grep to catch up, they really should ignore .git/.cvs/.svn/.repo directories by default these days; I don't want to type --exclude etc. I'm not that sensitive to speed here, but ignoring .git is reason enough to make me use ripgrep and fd daily instead of grep/ack and find.


Apart from what other people have mentioned, proper UTF-8 support is a big deal for folks who might be searching non-English text.


I love ripgrep. For me it has done an exceptional job at searching and finding a large list of search strings over pretty large text files.

Tangentially, it is because of great tools like these written in Rust that I have major respect for the language.


The --pre flag is rarely mentioned but was a revelation for me for dealing with MS Office files and PDF documents. I use it every single day.

alias rgpre='rg -i --max-columns=1500 --pre rgpre'

*edit Adding my script here, and would be grateful for any improvements. Goal was to catch doc, docx, ppt, pptx, xls, xlsx, epub, mobi, and pdf, and still work on termux with only the default packages available there.

https://gist.github.com/ColonolBuendia/314826e37ec35c616d705...
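For anyone skimming, a stripped-down sketch of what such a --pre preprocessor can look like (hypothetical paths; the PDF branch assumes poppler's pdftotext is installed, and the real gist above handles far more formats):

```shell
# Write a minimal --pre script; ripgrep calls it with the file path as $1
# and searches whatever the script prints to stdout.
cat > /tmp/rgpre-sketch <<'EOF'
#!/bin/sh
case "$1" in
  *.pdf) exec pdftotext "$1" - ;;  # PDF -> text on stdout (needs poppler-utils)
  *)     exec cat "$1" ;;          # everything else passes through unchanged
esac
EOF
chmod +x /tmp/rgpre-sketch
# Demo the pass-through branch on a plain file:
echo 'plain text passes through' > /tmp/rgpre-demo.txt
/tmp/rgpre-sketch /tmp/rgpre-demo.txt
```

Wired up as `rg --pre /tmp/rgpre-sketch pattern`, ideally together with --pre-glob so the extra process is only spawned for file types that need converting.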


That's a lovely script. And yeah, I should try to publicize --pre a bit more, since it is pretty handy. I don't use it a ton myself, but when it works, I'm really thankful for it. (That might sound weird since it's in my tool, but I was initially kind of opposed to adding it.)

One bummer is that it can slow ripgrep down quite a bit: one process per file is not good. As a tip, if you add

    --pre-glob '*.{pdf,xlsx,docx,doc,pptx,...}'
and whatever else you need, then ripgrep will only invoke an extra process for those files specifically, which can significantly speed up ripgrep if you're searching a mixed directory (or just want to have `--pre` on by default).


That addition was an incredible time savings in my average search, I'm very grateful.

A more prominent mention, and a sample script, would be very useful imho. Just googling for a way to do this for even just one of these file types yields inferior commercial solutions at best, often just nothing.

And while I'm not sure how to make it much faster from a search perspective, I was able to get a 40% improvement in invocation time by switching from rgpre to rgg as my alias.

I ended here:

alias rgg="rg -i --pre-glob '*.{pdf,xlsx,xls,docx,doc,pptx,ppt,html,epub,mobi}' --pre rgpre"


Hah nice! I'll noodle on this and figure out how to publicize it more. Your script would be very helpful as an example. Do you mind if I use it in the docs somewhere?


Please feel free.

Even a cursory mention of the ability at the end of the README would likely help many, with a verbatim script and a verbatim command using it helping many more. I only stumbled upon this feature, with enough context to really get it, in the Ubuntu man pages, and I don't know if I would ever have gotten to the pre-glob feature, to be honest.

I do believe that technically oriented non-programmers would be the biggest winners from more exposure around it.



That's indeed a lovely script. It makes for seamless searching my trove of PDFs and ebooks. Thanks for sharing it, and thanks BurntSushi for considering including it to rg's README :) !


I also love ripgrep and use it all the time. Thanks!

I've lately been doing some Windows programming. Ripgrep from a Git Bash command-line can generally find ALL instances of some word before Visual Studio's Find In Solution can even return one result!

The only downside is that rg does not understand namespaces when searching, so a commonly used member variable will show up all over the place. Can still usually find the right location by visual inspection before VS2017 though!


I tried ripgrep 0.10 when it came out. I did variations of "rg foo", "rg -i foo", "rg -ri foo", "rg -ri foo .". All of them returned 0 results. Where "ack foo | wc" returned 79 lines. Sooooo. I uninstalled "rg".

I just tried again and same thing. If I run with "-u" I get results.

I've just spent some time hunting and figured out that in my "projects" directory I had a ".gitignore" containing "*", which I added a long time ago to stop "git status" from spamming me if I ran it in the wrong directory (the base projects directory, or one of the hg projects, of which I have a lot).

For some reason "ack" recognizes that it is in a hg repo when I run it, and "rg" also recognizes that, but it ALSO searches up the tree looking for a ".gitignore", which ack does not. Good to know.


If you use the `--debug` flag, it will tell you which files are being ignored and why.

If ripgrep doesn't search any files at all, then it should print a warning to stderr.

ripgrep ascends your directory hierarchy for .gitignore files because that's how git works; .gitignore files in parent directories apply. ripgrep doesn't know anything about Mercurial, so it doesn't know to stop ascending your directories.

ack is different since it doesn't look at your .gitignore files, so it doesn't have the same behavior here.


Oooh, ack ignores ".git" and ".hg" but pays no attention to the ignore files. That actually seems like pretty ideal functionality. I guess the equivalent would be "rg --no-ignore-vcs". I had tried "-vvv" to no avail; thanks for the pointer to "--debug".


I love ripgrep and I’m thankful for the efforts that went into the project. It seems like a delicate programming project. Works great with Emacs, btw.


The only reason why I install Rust on all my machines is to install ripgrep with Cargo. :-)


This.


Does anyone know how ripgrep's pcre support compares to python/Perl/awk/sed for speed? I noticed it can do substitutions, so I'm curious.


I haven't tried it yet, but it would depend on what sort of features you need.

Python wouldn't be a good choice for CLI usage.

Perl is awesome to use from the CLI, and it is not just simple search and replace; see my tutorial[1] if you want examples.

sed and awk are awesome on their own for CLI usage; sed is meant for line-oriented tasks and awk for field-oriented ones (there is overlap too). One main difference compared to Perl is that their regex flavor is BRE/ERE, which is generally faster but lacks many features like lookarounds, non-greedy quantifiers, named capture groups, etc.

You could check out sd[2] for a Rust implementation of sed-like search and replacement (a small subset of sed's features).

[1] https://github.com/learnbyexample/Command-line-text-processi...

[2] https://github.com/chmln/sd


Thanks! I was mostly asking because I've found Python's regular expressions to be extremely slow compared to egrep, with Perl in the middle. It's annoying to have to call subprocess functions in Python just to launch egrep on large files.


ripgrep enables PCRE2's JIT, so it should be very fast.


Is there a way to output the results in Visual Studio format, which is <filename>(<line number>)<result>? This way when I hook up ripgrep as an External Tool, the line shown in the Output window can be clicked on to open the file and go to the specific line. I see in the source code that the separator is ":"; VS expects () around the line number.


Not natively, no. You could probably write a simple script that transforms the output into what you want though. For example, on Unix at least,

    rg -n pattern | sed 's/:/(/' | sed 's/:/)/'
That won't work in every case (e.g., in the case of a file path containing a `:`), but it might get you pretty far. And it could probably be made more precise.

Otherwise, you could always use ripgrep's --json output mode to craft its output into whatever format you like, but that's probably more work. :-) The JSON format is documented here: https://docs.rs/grep-printer/0.1.2/grep_printer/struct.JSON....


I use this program on a daily basis at work, it is so fast and easy to use, I love it. Thank you for creating it :D


Here I thought I was hip using ag; the announcement blog post linked in this thread is fascinating and thorough. Looks like it's time for me to try out rg.


Ripgrep is so awesome, "some performance improvements" doesn't make any sense because it's already horrifically fast.


What makes it so fast? Apart from skipping gitignored files and binaries.

Parallelism? Some Rust-specific features?


Here's a blog post from the author explaining how rg works: [1]. It's from 2016 so the benchmarks may not be up to date, including for rg's competitors.

[1] https://blog.burntsushi.net/ripgrep/



version 0.10.0 to 11.0.0 is a big version leap!


Hey they changed some exit codes ;)


emacs did that too - 1.12 to 13.


Solaris jumped from 2.6 to 7 (the "2." is now implicit). Of course, Windows NT jumped from 4.0 to 2000. ;)


Java went from 1.4 to 5. Although not completely; it took until 9 for the actual directory names etc to change.


But it didn't, really. NT 4.0 was followed by NT 5.0 (branded Windows 2000) and then by 5.1 (Windows XP) and then by 6.0 (Windows 7).


Vista was 6.0. Windows 7 was 6.1 and 8 was 6.2 (8.1 was 6.3). They finally bumped up the version to 10 with Windows 10.


goim was insanely good so I guess I’ll have to check this out!


If you liked goim, then you'll definitely want to check out its replacement: https://github.com/BurntSushi/imdb-rename/

It's even faster and comes with its own built in evaluation. There's also no more reliance on an external database.



