
How often do you handle files large enough to observe a difference between grep, ack, ag and rg?

I'm willing to bet (and happy to lose) that most people, even in the subset who use grep "a lot" (defining "a lot"...), wouldn't see a significant improvement. There are people (I'm betting fewer) who need speed above all other concerns, and those people already make it to the top 1000.




The popularity of ripgrep, ag, ack, etc., is an object lesson in "defaults matter." I don't say this in the prescriptive sense, i.e., "hey, you, you should care about defaults!", but rather, in the descriptive sense, i.e., "there are a lot of people out there that care about the defaults." The second lesson to learn is that people care about the difference between "results are instant" and "there is a bit of noticeable lag." I don't personally care all that much, but other people do. (AIUI, some people use ripgrep in their fuzzy searchers, and maybe "instant" matters there. A lot of engineering went into ripgrep to make its directory traversal fast.)

Before I wrote ripgrep, I was a grep user. I hadn't migrated to ack or the silver searcher because I didn't see the need. (Sound familiar? :P) In my $HOME/bin, I had grepf:

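    #!/bin/sh
    
    # Usage: grepf <path-glob> <pattern>
    # Search files whose path matches the glob; -I skips binary files.
    find ./ -type f -wholename "$1" -print0 | xargs -0 grep -nHI "$2"

and grepfr: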

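    #!/bin/bash
    
    # Usage: grepfr <fixed-string> [path-or-flag...]
    # Recursive fixed-string search; remaining arguments are passed through.
    first=$1
    shift
    grep -nrHIF "$first" "$@"

And that was pretty much all I ever needed. If ack had never come along, I'm not sure I ever would have changed. The tools I had were good enough.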

ripgrep didn't begin life as something that I intended to release as its own project. It began life as a way to test the performance of Rust's regex engine under workloads similar to those of the regex engine in GNU grep. In other words, it was a benchmark that I used. (In fact, I used it quite a bit to reduce per-match overhead inside the regex engine. The second commit in ripgrep says, "beating 'grep -E' on some things.") I didn't really start to convert it to a tool that other people could use until I realized that it was actually as fast as, or faster than, GNU grep. That, plus I was bored in an airport. :-)

A lot of people are happy with their tools that are good enough. I know I was. Has my life been dramatically changed by using ripgrep? No, not really. But I do like using it over my previous tools. It's a minor quality of life thing. It turns out, a lot of people care about minor quality of life things!

But yeah, I hear roughly the same sentiments that you express from a lot of people. All it really comes down to is different strokes and different common workloads that magnify the improvements in the tool.


Have you considered the problem of printing search results to the terminal? I saw it is covered a little in your blog, but one thing that bothers me is this: say I am searching for the string "foo" in a 2GB log file. There is a typical number of matches, nothing unusual.

But typically, I am not really looking for the string "foo". I guess most users who grep log files are also looking for text that appears slightly before and slightly after the match. This mostly happens when I am searching for an error/exception that triggered "foo". I find grep's usability frustrating when I need to search around something. It usually means I have to restart the search with `grep -C` or something like that, and even then the number of context lines I specified may not be enough.


Thoughts have crossed my mind, but it's a wickedly hard problem. My personal opinion is that once you start trying to solve the problem you're describing, then you really start to venture away from "line oriented searcher" to "code aware searcher" in a way that invites a lot of trade offs. The most important trade off is probably maintenance or complexity of the code.

In particular, in order to better show results, I kind of feel like the search tool needs to know something about what it's showing. Right? How else do you intelligently pick the context window for each search result? For code, maybe it's the surrounding function, for example.

The grep -C crutch (which is also available in ripgrep) is kind of the best I've got for the moment for a strictly line oriented searcher. `git grep` has some interesting bits in it that will actually try to look for the enclosing function and emit that as context. I think it's the `-p/--show-function` flag. ... But that doesn't really help with your log files.
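
As a rough illustration of the line-oriented context flags mentioned above (a sketch only, not anything taken from ripgrep itself; the sample log and the `show_context` helper are made up for the example, and a grep that supports `-C` is assumed):

```shell
#!/bin/sh
# Hypothetical sample log, created only for this illustration.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
10:00:01 INFO  starting worker
10:00:02 WARN  retrying connection
10:00:03 ERROR foo failed
10:00:04 INFO  cleaning up
10:00:05 INFO  worker stopped
EOF

# -C N prints N lines on both sides of each match; -n adds line numbers.
# -B N and -A N do the same for only the lines before or after a match.
show_context() {
    # $1 = pattern, $2 = number of context lines, $3 = file
    grep -n -C "$2" -- "$1" "$3"
}

show_context 'foo' 1 "$log"
```

In the output, the matching line is marked with `:` after its line number and the context lines with `-`. The window is a fixed line count, which is exactly the limitation being discussed: if the interesting line is further away than N, the search has to be rerun with a larger N.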

In any case, I am very interested in this path and even have an issue on the ripgrep tracker for it: https://github.com/BurntSushi/ripgrep/issues/95 --- I'm not sure if it really belongs in ripgrep proper, but I would really love to collect user stories. If you have any, that would be great. Examples of what you'd like the search results to look like would be great!


I use -C 20 combined with a pager.

    rg foo -C 20 -p | less -R

Also, yuck, that command line. Glad I have those hidden behind shell scripts.


Hehe, yeah, I have `rgp` in my $HOME/bin:

    #!/bin/sh
    
    exec rg -p "$@" | less -RFX


First, thank you for your reply and your work.

Second, I include myself in the users of ripgrep (and the silver searcher before); I also dislike the slight waiting time when I have a better alternative.

Third, I'd like one of those to be a default package in Debian.


:-)

> Third, I'd like one those to be a default package in Debian.

Yeah, I'd love that too! I know there have been people pushing on this, but AFAIK, it's stalled on "how do we package Rust applications in Debian."

(I don't use Debian and I'm not terribly familiar with their policies, so I don't really know the details.)


Rustc and Cargo are packaged for Debian; both are Rust applications. That shouldn't be the holdup.


Oh, I thought Cargo hadn't made it into Debian. I'm way behind the times then. :-)


It's not in stable, but it is in sid and buster. Now that rustc builds with Cargo, you gotta get both. :)

https://crates.io/crates/debcargo is also a big help.


> How often do you handle files large enough to observe a difference between grep, ack, ag and rg ?

> I'm willing to bet that most people, even in the subset who use grep "a lot" (defining "a lot"...), wouldn't see a significant improvement.

Daily, but not for the reasons that I think you're thinking. I work in Python a lot, which means there is typically a virtualenv in the tree somewhere, sometimes more than one. Typically, I want to search the code base itself — not a virtualenv, not the .git directory, etc. ripgrep, by default, will ignore entries in the .gitignore (and the virtualenvs are listed there, as they're not source, and cannot be committed), and repository directories like .git, and will thus not even consider those files. For my use case (searching my own code base), this is exactly what I want, and culling out those entire subtrees makes rg considerably faster than grep.

Yes — I could exclude those directories with grep by passing the appropriate flag. But it's time consuming to do so: ripgrep wins out by doing — by default — exactly what I need.
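
For comparison, a rough sketch of the kind of grep invocation that approximates this by hand. It assumes a grep with `--exclude-dir` (GNU grep has it) and a hypothetical tree where the virtualenv lives in `venv/`; the layout and the `search_code` helper are invented for illustration:

```shell
#!/bin/sh
# Hypothetical project layout, built in a temp dir for illustration only.
proj=$(mktemp -d)
trap 'rm -rf "$proj"' EXIT
mkdir -p "$proj/src" "$proj/venv/lib" "$proj/.git"
echo 'def handler(): pass' > "$proj/src/app.py"
echo 'def handler(): pass' > "$proj/venv/lib/dep.py"
echo 'handler'             > "$proj/.git/packed-refs"

# Recursive grep that skips the directories named explicitly on the
# command line. Every directory to skip needs its own --exclude-dir.
search_code() {
    # $1 = pattern, $2 = directory to search
    grep -rnI --exclude-dir=.git --exclude-dir=venv -- "$1" "$2"
}

search_code 'handler' "$proj"
```

Only the file under `src/` is reported. The list of excluded directories has to be kept up to date by hand here, whereas rg derives it from the .gitignore.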

I also greatly prefer ripgrep's output format; the layout and colors make it much easier to scan than grep's.

Most of the people I've recommended ripgrep to are using grep, and passing flags to it to get it to do what essentially rg does quicker and/or by default. Ripgrep is an excellent tool.

(I used `git grep`, which is also considerably faster for similar reasons, prior to rg. But `git grep` requires a repository — for obvious reasons — and thus fails in cases where you're not in one. I often need to search several codebases when doing cross repository refactors, and ripgrep has been quite useful there.)


Possibly redundant information, but still: ag has those same features. I see lots of reasons to choose rg/ag over grep, but none yet to choose rg over ag.


ripgrep's gitignore support is more complete. ripgrep also supports seamless UTF-16 search, and can be made to search files of any number of other encodings as well.

But yes, the feature sets are very similar.


Well, the main one would be speed, and possibly even stability (Rust vs. C), but if you're not after those, there's little reason to choose rg over ag. On the other hand, speed is also the primary reason to go with ag over ack, so the question is: why not go with the fastest alternative?


Ripgrep’s antipitch [0] lists some reasons to prefer ag over rg.

[0]: http://blog.burntsushi.net/ripgrep/#anti-pitch


I intended to mention this in my original post, but I appear to have forgotten: I'm only comparing rg and grep; I have no experience with ag, so I can't speak to it. rg was my first "better than grep" tool, and it's filled my needs quite well. (Enough so that I've not felt the need to investigate ag.)


Every time I search in VS Code, which now uses rg by default.

You'll easily notice the difference on any recursive search. grep is really slow.


Depends on the size of the codebase you're grepping. I routinely deal with larger ones that I have to dissect and the difference is more than noticeable.



