Ripgrep is faster (2016) (burntsushi.net)
98 points by tosh 10 months ago | 40 comments

Not sure if this is a coincidence, or whether this was already mentioned on HN, but ripgrep had a release yesterday with a few new features.



I doubt it's a coincidence. There's a common pattern where a popular new topic/thing on HN generates a few reposts and posts of old articles about the popular new topic/thing.

Fantastic, I use ripgrep every day and just noticed I was still on 0.4.0. I wish there were a PPA or some easier way to have it update automatically...

I would love for someone to maintain a PPA. :-)

I've looked into it before, but it looks like a fairly weighty process to go through.

It does look like someone is working on it though! https://github.com/x4121/ripgrep-ubuntu

Ah, fantastic! I've never deployed a PPA so I don't know the process, but how hard would it be to write a tool that would take a few debs and render the static dir to host the PPA? It seems like it shouldn't be as hard as it is?

I don't use Ubuntu or Debian. I honestly have not much idea about how any of it works. Like I said, I looked into it and balked. The Linux distro I use has a much simpler build system. It had ripgrep in its repos in a matter of days from the initial release IIRC.

ripgrep's releases do include binary debs, but there's no auto-update:

    $ curl -LO https://github.com/BurntSushi/ripgrep/releases/download/0.10.0/ripgrep_0.10.0_amd64.deb
    $ sudo dpkg -i ripgrep_0.10.0_amd64.deb

Yeah, unfortunately those are versioned so I can't even make a script to have it download https://someurl/ripgrep_latest_amd64.deb. I just added `cargo install ripgrep` to my script, but that might prove too slow to run every time, so I'll see. Thank you!

Here's a quick little script I put together that should do the trick. It's idempotent, so it won't do anything if there are no updates. I did use `jq` for convenience though.

    if command -V rg > /dev/null 2>&1; then
        curvers="$(rg -V | cut -d' ' -f2)"
    fi
    latesturl="https://api.github.com/repos/BurntSushi/ripgrep/releases/latest"
    latestvers="$(curl -s "$latesturl" | jq -r .tag_name)"
    if [ "$curvers" = "$latestvers" ]; then
        echo "ripgrep is up to date"
        exit 0
    fi
    name="ripgrep_${latestvers}_amd64.deb"
    url="https://github.com/BurntSushi/ripgrep/releases/download/${latestvers}/${name}"
    (cd /tmp && curl -LO "$url" && sudo dpkg -i "$name")
    echo "ripgrep updated from $curvers to $latestvers"

I changed it a bit to be able to update more than one package: https://www.pastery.net/mushya/

This is perfect, thank you! I didn't know that there was a GH API endpoint to give you latest release names, that's pretty neat.
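For anyone else who didn't know about it: a sketch of that endpoint, using `jq` to pull out the tag name (this assumes GitHub's standard REST API response shape, where the latest release is exposed under `releases/latest`):

```shell
# Fetch metadata for ripgrep's most recent release and extract its tag
# name, e.g. "0.10.0". Requires curl and jq.
curl -s https://api.github.com/repos/BurntSushi/ripgrep/releases/latest \
    | jq -r .tag_name
```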

It's in Ubuntu, fwiw (via being in Debian). https://packages.ubuntu.com/cosmic/ripgrep

It might even be installable on older versions of Ubuntu.

Great news, thanks! I might as well upgrade, I've been putting it off.

If you don’t mind the wait to compile rustc from source once (only for tools written in rust, of course), linuxbrew is a pretty neat way of keeping tools that aren’t directly on your distro’s repos up-to-date. Nix is another option.

Oh that's very interesting, thank you! Do you know if Nix works well with Ubuntu? I'd love to use it but I didn't think it would play well with apt.

It stands separate from apt, yes.

linuxbrew trying to install stuff at a non-standard location like /home/linuxbrew makes me question their technical decisions before even getting started with it.

I agree that `/home/linuxbrew/.linuxbrew` looks weird, but `~/.linuxbrew` seems somewhat okay.

Other locations like `/usr/local` or `/opt/linuxbrew` or somesuch might not be writable by the current user, and it seems they would like to avoid requiring root access.

How would /home/linuxbrew/ be writable by a non-root user?

Using /home/ like /opt/ is just weird. It seems the installer picks ~/.linuxbrew/ if the user doesn't have sudo privileges, but I have no idea how it arrived at /home/linuxbrew/ otherwise.

I agree with you about `/home/linuxbrew`. `/opt/linuxbrew` would be better.

Interestingly enough, a friend of mine kept trying to get me to use ag, since he always saw me doing relatively long "find" commands. A discussion about performance, ease of use, &c. later, and I wrote an ag clone in shell, called "the copper searcher." Performance is similar between the two, and I think I got most of the major features implemented.

Also note that if you still have UUCP infrastructure, the name "cu" will collide with an already installed program.


Neat! That's definitely better than some of the other shell clones I've seen, in that it makes an attempt to filter out files based on .gitignore. But it looks like it still gets some .gitignore stuff wrong. It's hard to handle .gitignore correctly in a simple way.

In some simple tests on my Linux checkout, it is about an order of magnitude slower than ripgrep though. It looks like a good chunk of time is spent in gitignore filtering?

I don't think it has multi-line search, which is something I've learned that ag users happen to love, which is why it's in the most recent release of ripgrep. :-)

Yeah, this was just a 1 afternoon project, and the .gitignore syntax is more complicated than I first thought.

This is a truly fantastic product. I didn't think ag could be improved but somehow ripgrep does it. Well done.

Searching through my 'projects' directory, piping all output to wc so my terminal doesn't become a factor, it is about 16 times faster than ack (26 times when using the gitignore filtering), while using ~7 times more CPU to do so.

Is there a reason that you prefer wc over /dev/null?

To check that the outputs are roughly the same?

Since practically everything I work on is in a git repository, I use git grep a lot. I haven't found it wanting in speed.

Ripgrep does have some neat advanced features though...

One reason I still use git grep is this: I have some minified .js files in the repository, which I don't want to be included in grep results. So I mark the files as binary in gitattributes, then git grep just says "Binary file foo matches".
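For anyone who wants to replicate this, a sketch of that gitattributes trick (`binary` is a built-in attribute macro; the `*.min.js` pattern is just the example from above):

```shell
# Mark minified JS as binary; `git grep` will then print
# "Binary file foo.min.js matches" instead of dumping the matched line.
echo '*.min.js binary' >> .gitattributes
```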

For ripgrep, you can achieve this via a custom ignore file, e.g., `echo '*.min.js' > .ignore`.

Another approach I've seen people take is to put `-M300` in your ripgrep config file, and then any super long lines are automatically omitted from output.
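For context, ripgrep's config file is just a list of flags, one per line, and ripgrep finds it via the `RIPGREP_CONFIG_PATH` environment variable (the `~/.ripgreprc` path below is only a common convention, not required):

```shell
# Write a config that suppresses lines longer than 300 columns,
# then tell ripgrep where to find it.
printf -- '--max-columns=300\n' > "$HOME/.ripgreprc"
export RIPGREP_CONFIG_PATH="$HOME/.ripgreprc"
# Every rg invocation now behaves as if -M300 were passed.
```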

I've seen a few tools decide to use .ignore as the file name, but I worry that such a generic name was chosen without enough discussion. Even for users, it's hard to figure out what the file is for.

I consider .grepignore to be more comprehensible.

You can also use .rgignore.

Have you considered parsing .gitattributes and offering support for reducing files marked as "binary" down to a "binary file matches" notice?

Nope. I don't think I've ever used .gitattributes. Might be a neat idea. Not sure.

git grep is 8 chars and rg is 2.

yes, that's the primary reason for me :)

    alias g="git grep"

Ripgrep feels a lot faster (basically instant) to me than the alternatives. Even on very large repositories.

That's why I've switched to using it pretty exclusively now.

Git grep feels pretty much instant to me. Can't really understand the need for something faster, unless it's for searching through data files...

git grep can be fast, depending on what you're searching for and how much you need to search. If you commonly search literals, then git grep's literal optimizations probably make it good enough for you. But increasing the pattern complexity just a bit can result in large performance cliffs. For example: (times reported after running each command a few times to account for I/O cache)

    $ git clone --depth 1 https://github.com/BurntSushi/linux
    $ cd linux
    $ time LC_ALL=en_US.UTF-8 git grep -E '\w+_RESUME' | wc -l
    real    20.616
    user    2:02.49
    sys     0.363
    maxmem  64 MB
    $ time rg '\w+_RESUME' | wc -l
    real    0.127
    user    0.673
    sys     0.617
    maxmem  26 MB
Both of these invocations are doing roughly equivalent work, including respecting gitignores. Both of them are using a Unicode aware `\w` character class. OK, so you might say you don't care about Unicode. That's fine, ripgrep is still faster by an order of magnitude:

    $ time LC_ALL=C git grep -E '\w+_RESUME' | wc -l
    real    4.546
    user    27.741
    sys     0.420
    maxmem  63 MB
With that said, `git grep` can now be made to use PCRE2, which gives it a significant speed boost on this workload:

    $ time LC_ALL=en_US.UTF-8 git grep -P '(*UCP)\w+_RESUME' | wc -l
    real    0.894
    user    5.821
    sys     0.493
    maxmem  63 MB
    $ time LC_ALL=C git grep -P '\w+_RESUME' | wc -l
    real    0.517
    user    2.962
    sys     0.596
    maxmem  59 MB
ripgrep can do the same as of this release, but faster:

    $ time rg -P '\w+_RESUME' | wc -l
    real    0.511
    user    4.795
    sys     0.544
    maxmem  24 MB
    $ time rg -P --no-pcre2-unicode '\w+_RESUME' | wc -l
    real    0.422
    user    4.119
    sys     0.479
    maxmem  24 MB
Do these performance differences matter to you? Maybe not. But you said you couldn't understand; hopefully the numbers above add some clarity. :-) On top of that, ripgrep works just as well outside of git repos, on huge log files, binary data or even in shell pipelines.

