Haven't benchmarked *grep implementations, but assuming those are just CLI wrappers around RegEx libraries, I'd expect the RegEx benchmarks to be broader and more representative.
Your claim is true to a first approximation. But greps are line oriented, and that means there are optimizations that can be done that are hard to do in a general regex library. You can read more about that here: https://blog.burntsushi.net/ripgrep/#anatomy-of-a-grep (greps are more than simple CLI wrappers around a regex engine).
If you read my commentary in the ripgrep discussion above, you'll note that it isn't just about the benchmarks themselves being accurate, but the model they represent. Nevertheless, I linked the hypergrep benchmarks not because of Hyperscan, but because they were done by someone who isn't the author of either ripgrep or ugrep.
Hyperscan also has some peculiarities in how it reports matches. You won't notice it in basic usage, but it will appear when using something like the -o/--only-matching flag. For example, for the regex \w+ against the haystack abc, Hyperscan will report matches of a, ab and abc, whereas a normal grep will just report a match of abc. (And this makes sense given the design and motivation for Hyperscan.) Hypergrep goes to some pain to paper over this, but IIRC the logic is not fully correct. I'm on mobile, otherwise I would link to the reddit thread where I had a convo about this with the hypergrep author.
I want to be clear that these are intended semantics as part of Hyperscan. It's not a bug with Hyperscan. But it is something you'll need to figure out how to deal with (whether that's papering over it somehow, although I'm not sure that's possible, or documenting it as a difference) if you're building a grep around Hyperscan.
It might be the intended behavior of Hyperscan but it really feels like a bug in Hypergrep to report the matches like this - you cannot report a match which doesn't fully match the regex...
I also wonder whether there's a performance issue when matching a really long line, because Hyperscan is not greedy and will call back into Hypergrep for every sub-match. I'm guessing this is the reason for those shenanigans in the callback [0].
I don't disagree. It's why I brought this up. It's tricky to use Hyperscan, as-is, as a regex engine in a grep tool for these reasons. I don't mean to claim it is impossible, but there are non-trivial issues you'll need to solve.
It's hard to learn too much from hypergrep. It still has some rough spots:
$ hgrep -o 'foo.*bar' foobarbar.txt
foobarbar.txt
1:[Omitted long line with 1 matches]
$ hgrep -M0 -o 'foo.*bar' foobarbar.txt
Too few arguments
For more information try --help
$ hgrep -M 0 -o 'foo.*bar' foobarbar.txt
foobarbar.txt
1:[Omitted long line with 1 matches]
$ hgrep -M 0 'foo.*bar' foobarbar.txt
foobarbar.txt
1:[Omitted long line with 1 matches]
$ hgrep -M0 'foo.*bar' foobarbar.txt
terminate called after throwing an instance of 'std::invalid_argument'
what(): pattern not found
zsh: IOT instruction (core dumped) hgrep -M0 'foo.*bar' foobarbar.txt
Another issue with Hyperscan is that if you enable HS_FLAG_UTF8[1], which hypergrep does[2,3], and then search invalid UTF-8, then the result is UB.
> This flag instructs Hyperscan to treat the pattern as a sequence of UTF-8 characters. The results of scanning invalid UTF-8 sequences with a Hyperscan library that has been compiled with one or more patterns using this flag are undefined.
That's another issue you'll need to grapple with if you use Hyperscan. PCRE2 used to have this issue[4], but they've since defined the semantics of searching invalid UTF-8 with Unicode mode enabled. ripgrep 14 uses that new mode, but I haven't updated that FAQ answer yet.
Hyperscan isn't alone. Many regex engines do not support searching arbitrary byte sequences[5]. And this is why many/most regex engines are awkward to use in a fast grep implementation: you really do not want your grep to fall over when it comes across invalid UTF-8, and the overhead of doing UTF-8 checking in the first place (perhaps so you could skip over lines that contain invalid UTF-8) would make it difficult to be competitive in performance. It also inhibits their use in OSINT work.
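For a concrete (if contrived) illustration of what I mean by not falling over, here's a quick sketch on the command line; the file name is made up and \377 is just an arbitrary byte that makes the line invalid UTF-8:
$ printf 'foo \377 bar\n' > invalid.txt
$ rg -c bar invalid.txt
1
A literal search still finds its match even though the line isn't valid UTF-8.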
You mean two different regex engines for the same search? That is perhaps conceptually fine, but in practice any two regex engines are likely to have differences that will make that strategy fall apart in some cases. (Perhaps unless those regex engines rigorously stick to a spec like POSIX or ECMAScript. But that's not the case here. IIRC Hyperscan meticulously matches the behavior of a subset of PCRE2, but ripgrep's default engine is not PCRE2.)
You could perhaps work around this by only applying it as an optimization when you know the pattern has identical semantics in both regex engines. But you would have to do the work to characterize them.
I would rather just make the regex crate faster. If you look at the rebar benchmarks, it's not that far behind and is sometimes even faster. The case where Hyperscan really destroys everything else is for searches for many patterns.
Hyperscan has other logistical issues. It is a beast to build. And its pattern compilation times can be large (again, see rebar). Hyperscan itself only supports x86-64, so one would probably want to actually use Vectorscan (a fork of Hyperscan that supports additional architectures).
rg uses a lot of memory in the OpenSubtitles test. 903M vs 29M for ugrep. Unlike the previous test, we are not told the size of the file being searched.
Would be interesting to see comparisons where memory is limited, i.e., where the file being searched will not fit entirely into memory.
Personally I'm interested in "grep -o" alternatives. The files I'm searching are text but may have few newlines. For example I use ired instead of grep -o. ired will give the offsets of all matches, e.g.,
> rg uses a lot of memory in the OpenSubtitles test. 903M vs 29M for ugrep. Unlike the previous test, we are not told the size of the file being searched.
Which test exactly? That's just likely because of memory maps futzing with the RSS data. Not actually more heap memory. Try with --no-mmap.
I'm not sure I understand the rest of your comment about grep -o. Grep tools usually have a flag to print the offset of each match.
EDIT: Now that I have hands on a keyboard, I'll demonstrate the mmap thing. First, ugrep:
$ time ugrep-4.4.1 -c '\w+\s+Sherlock\s+Holmes\s+\w+' sixteenth.txt
72
real 22.115
user 22.015
sys 0.093
maxmem 30 MB
faults 0
$ time ugrep-4.4.1 -c '\w+\s+Sherlock\s+Holmes\s+\w+' sixteenth.txt --mmap
72
real 21.776
user 21.749
sys 0.020
maxmem 802 MB
faults 0
And now for ripgrep:
$ time rg-14.0.3 -c '\w+\s+Sherlock\s+Holmes\s+\w+' opensubtitles/2018/en/sixteenth.txt
72
real 0.076
user 0.046
sys 0.030
maxmem 779 MB
faults 0
$ time rg-14.0.3 -c '\w+\s+Sherlock\s+Holmes\s+\w+' opensubtitles/2018/en/sixteenth.txt --no-mmap
72
real 0.087
user 0.033
sys 0.053
maxmem 15 MB
faults 0
It looks like the difference is that ripgrep chooses to use a memory map by default. I don't think it makes much of a difference to the timings here.
If the file were bigger than available memory, then the OS would automatically handle paging.
observation: 1.sh does not include the newline after match #187; some workaround is required for 1.sh
conclusion: for me, ripgrep is too large and complicated for this simple task involving relatively small files; it's overkill. it does not feel any faster than ired at the command line. in fact, it feels slower. like python or java, or other large rust/go binaries, there is a small initial delay, a jank. whereas ired feels very smooth.
I love how you continue to ignore the fact that ired produces incorrect results.
Also:
You can use -F to make the argument to ripgrep be interpreted as a literal. No knowledge of regex is needed. It's a standard grep flag.
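For example (a tiny sketch with made-up input):
$ printf 'foo.bar\nfooXbar\n' | rg 'foo.bar'
foo.bar
fooXbar
$ printf 'foo.bar\nfooXbar\n' | rg -F 'foo.bar'
foo.bar
Without -F, the . is a regex metacharacter and matches the X too; with -F, only the literal line matches.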
You also aren't using PCRE. You're using ripgrep's default engine, which is the regex crate. You need to pass -P to use PCRE2. Although I don't see the point in doing so.
I find your overall comparison here to be disingenuous personally. You can't even be arsed to acknowledge that ired returns incorrect results. And every benchmark I've run has shown ripgrep to be faster or just as fast. There's no jank.
I already acknowledged that the rg binary is beefy. It is actually statically linked by default (although it may dynamically link C libraries). I don't care if rg is 5MB. If you do, then rg isn't for you. You can keep using broken software instead.
We cannot add the surrounding context characters as literals because we do not know the identity of these characters. That is what we are attempting to find out.
Would I ever search for a repeating pattern such as \(a\(a using ired? The answer is no; I am looking for context. I would search for \(a and then add a request for context, a number of characters before and/or after, as in the examples. Again, I do not know what those characters will be; that is what I am searching for. If the pattern repeats, this would be visible from viewing the context.
For line-delimited files where data is presented in a regular format, grep -A -B and -C work great for printing context. But for files that can be idiosyncratic in how they present data and/or files that lack consistent newline delimiters, for me, grep -o is inadequate for printing context.
> It would be nice to have a ripgrep without libpcre2.
Blame your packagers for that, not me. PCRE2 is an optional dependency that isn't enabled by default. If you bothered to read the README, you'd know that.
Your packagers probably enable PCRE2 because nobody gives a fuck about a MB or two. And if you do, compile your own damn software.
> It also would be nice to use BRE by default and make ERE optional, similar to grep.
Disagree. No thanks.
> What would compiling ripgrep from source entail. Would it be as easy as compiling ired.
cargo build --release
Read the docs maybe?
> ired compiles in seconds and compiling requires less than 1MB of disk space. No connection to any server is required to compile the program.
I don't care. Go talk to the Debian people. They know how to build Rust software without connecting to the Internet. Cargo supports that workflow, but it's not the default.
ripgrep isn't minimalist software. Never was. Never will be. If that's a requirement, then tough cookies.
> The answer is no; I am looking for context.
And who knows if you'll find it with ired given that it doesn't implement substring search properly. I demonstrated this with a simple example days ago. You continue to ignore it. This is why your commentary is dishonest. You keep looking for other reasons to prop up ired. Shifting goalposts. And not even bothering to acknowledge that I already told you days ago that ripgrep and ired are two different tools. They don't target the same use cases.
Go back to the start of our conversation. All I did was correct an erroneous complaint about perf where you snubbed your nose. Since then, you've hemmed and hawed and tried to turn this into a broader discussion about whether ripgrep can replace ired. I never said it could. rg -o might be able to cover some subset of ired's use cases faster than ired does, but I don't claim anything beyond that. I mean obviously ripgrep works better on line oriented data. It is a grep!
Your tunnel vision is so intense you can't even realize that you're trying to rationalize the use of broken software. Who the fuck cares if it's fast or lets you show windowed context if it can't even report every match?
Fast and user friendly doesn't mean shit if it's wrong. The fact that you can't acknowledge that reveals the dishonesty in your commentary.
Wrong again. The . doesn't match newlines by default. That's standard for regex. As with just about any regex engine, you need to explicitly instruct . to match newlines. In ripgrep, that means `(?s)..query..` for example. Or pass `--multiline-dotall`. Or use `\p{any}` to match the UTF-8 encoding of any Unicode scalar value. Or `(?-u:[\x00-\xFF])` to match any byte value.
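A quick sketch from the command line (contrived input; -U enables multiline matching so a match can span the line boundary):
$ printf 'foo\nbar\n' | rg -o 'foo.bar'
$ printf 'foo\nbar\n' | rg -Uo '(?s)foo.bar'
foo
bar
The first command finds nothing because . refuses to match the newline; the second matches foo\nbar and prints it across two lines.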
In contrast, not being able to find ABAB in ABAABAB is definitely a false negative. There's no qualification needed there. You're using badly broken software. One wonders why you can't seem to bring yourself to acknowledge it as a downside.
You have failed to provide a working example that does the simple task I presented, which ired does perfectly. That's all I am looking for. A working example. You also claimed you would provide an explanation of why traditional grep -o is so slow when adding preceding context, e.g., .{500}pattern. You never did. You decided to start whining incessantly about not being able to match substrings instead.
Again, I am not searching for repeating strings such as ABAB. I am searching for AB preceded or followed by unknown characters. Thus the "problem" you found is not one I am going to have given the way I'm using the program. Why would I care about it? It's a hex editor, not a program I use to do searches. I never searched for a repeating pattern. You did. Then you proceeded to whine about it. Incessantly. Hoping to create some sort of diversion.
Further, if you were really paying attention, you would notice that in the scripts I presented I'm not using ired to search for strings (/"string"). I am searching multiple, catenated hex pairs (/737472696e67). ired is not intended to be used that way. Although it works for my purposes, ired is only intended to be used to search a single hex pair. The issue you spotted is applicable when searching _strings_, not searching a single hex pair. But I am not searching strings. I'm searching multiple hex pairs. And because the program is not intended to be used to match catenated hex pairs, I fully expected this might not work. As it happens, it works.
Unless an example is provided, it appears that using rg -o to _print_ (not just match) characters found in trailing context that happen to be newlines (task #1) works about as well as using ired to search for repeating strings in a file (task #2). It does not work. This is not surprising to me since IMO there are better programs to do those tasks. As stated in the very first comment I made, I am interested in programs that perform task #1. I do not need solutions for task #2. Despite what you may be selling.
I already told you how to make a dot match a newline.
I explained why repeating patterns can be expensive for regex engines to execute. It's a known issue among most regex engines.
I find it funny that you whine and dance around all your little use cases, but when it comes to actual reporting correct output, you just happen to conveniently assume that none of the strings you search for produce incorrect results.
As I already said, this entire conversation started by me correcting a perf claim where you snubbed your nose. That was my only point. You then decided to talk about a grander problem that I have never engaged in. I never once said rg meets your use cases. (Your use cases seem quite strange.) I don't give a shit if you think rg can replace broken software or not.
3. Test grep v3.6, ripgrep v14.0.3 and shell script; busybox is v1.34.1
busybox time grep -Eo .{35}https:.{4} test.json;
busybox time rg -o .{35}https:.{4} test.json;
busybox time sh -c "echo https:|1.sh 45 35 test.json"
We can make the script slower by using bash
busybox time bash -c "echo https:|1.sh 45 35 test.json"
Program size
du -h /usr/bin/grep
216K    /usr/bin/grep
du -h /usr/bin/rg
5.7M    /usr/bin/rg
du -hc /usr/bin/ired /bin/dash /usr/bin/tr /usr/bin/sed /usr/bin/od
456K    /bin/dash
40K     /usr/bin/ired
56K     /usr/bin/tr
68K     /usr/bin/od
104K    /usr/bin/sed
724K    total
du -h /usr/bin/busybox /usr/bin/ired
772K    /usr/bin/busybox
40K     /usr/bin/ired
812K    total
readelf -d /bin/dash /usr/bin/busybox
File: /bin/dash
There is no dynamic section in this file.
File: /usr/bin/busybox
There is no dynamic section in this file.
$ busybox time grep -Eo .{35}https:.{4} test.json
real 0m 0.15s
user 0m 0.15s
sys 0m 0.00s
$ busybox time rg-14.0.3 -o .{35}https:.{4} test.json
real 0m 0.00s
user 0m 0.00s
sys 0m 0.00s
$ busybox time dash -c "echo https:|./1.sh test.json 45 35"
real 0m 0.01s
user 0m 0.01s
sys 0m 0.00s
$ busybox time bash -c "echo https:|./1.sh test.json 45 35"
real 0m 0.00s
user 0m 0.00s
sys 0m 0.00s
$ busybox time dash -c "echo https:|./busy-1.sh test.json 45 35"
real 0m 0.00s
user 0m 0.01s
sys 0m 0.00s
$ busybox time bash -c "echo https:|./busy-1.sh test.json 45 35"
real 0m 0.01s
user 0m 0.01s
sys 0m 0.00s
So grep -o takes 150ms, but both ripgrep and ired are seemingly instant. But if I use zsh's builtin `time` command with my own TIMEFMT[1], it gives me numbers greater than 0:
$ time grep -Eo .{35}https:.{4} test.json
real 0.324
user 0.317
sys 0.007
maxmem 16 MB
faults 0
$ time rg-14.0.3 -o .{35}https:.{4} test.json
real 0.008
user 0.003
sys 0.003
maxmem 16 MB
faults 0
$ time dash -c "echo https:|./1.sh test.json 45 35"
real 0.010
user 0.011
sys 0.007
maxmem 16 MB
faults 0
$ time bash -c "echo https:|./1.sh test.json 45 35"
real 0.011
user 0.014
sys 0.004
maxmem 16 MB
faults 0
Would you look at that. ripgrep is faster! By a whole 2 milliseconds! WOW!
OK, since I'm a software developer and thus apparently cannot understand the lowly needs of an "ordinary user," I'll hop over to my machine with a i5-7600, which was released 6 years ago. Is that ordinary enough, or still too super charged to do any meaningful comparison whatsoever?
$ time grep -Eo .{35}https:.{4} test.json
real 0.641
user 0.620
sys 0.017
maxmem 6 MB
faults 0
$ time rg-14.0.3 -o .{35}https:.{4} test.json
real 0.010
user 0.008
sys 0.000
maxmem 8 MB
faults 0
$ time dash -c "echo https:|./1.sh test.json 45 35"
real 0.011
user 0.009
sys 0.011
maxmem 6 MB
faults 0
$ time bash -c "echo https:|./1.sh test.json 45 35"
real 0.013
user 0.021
sys 0.003
maxmem 6 MB
faults 0
(I ran the commands above each several times and took the minimum.)
OK, so ripgrep is still 1ms faster even on "ordinary user" hardware.
All right, so your other comment also shared another benchmark:
$ time grep -Eo .{100}https:.{50} test.json
real 1.777
user 1.772
sys 0.003
maxmem 6 MB
faults 0
$ time rg-14.0.3 -o .{100}https:.{50} test.json
real 0.013
user 0.006
sys 0.000
maxmem 8 MB
faults 0
$ time rg-14.0.3 --color never -o .{100}https:.{50} test.json
real 0.006
user 0.006
sys 0.000
maxmem 8 MB
faults 0
$ time dash -c "echo https:|./1.sh test.json 156 100"
real 0.015
user 0.024
sys 0.004
maxmem 7 MB
faults 0
$ time bash -c "echo https:|./1.sh test.json 156 100"
real 0.016
user 0.028
sys 0.000
maxmem 7 MB
faults 0
(Notice that disabling color and line numbers for ripgrep improves its speed a fair bit. ired isn't doing either of those things, so it's only fair. GNU grep doesn't count line numbers by default and disabling color doesn't improve its perf here.)
This one is more interesting because it exposes the fact that many regex engines have trouble dealing with bounded repeats. Something like `.{100}` for example is not executed particularly efficiently in most regex engines. And indeed, in ripgrep by default, `.` actually matches the UTF-8 encoding of any Unicode scalar value (so between 1 and 4 bytes) and not any arbitrary byte. You'd need to pass the `--no-unicode` flag or prefix your pattern with `(?-u)` to match any arbitrary byte. And indeed, even then, `.` doesn't match `\n`. So you might even want `(?s-u)`. But since this is a grep and *greps are line oriented*, you'd need to enable multi-line mode in ripgrep (GNU grep doesn't have this):
$ time rg-14.0.3 -Uo '(?s-u).{100}https:.{50}' test.json
real 0.057
user 0.041
sys 0.006
maxmem 8 MB
faults 0
$ time rg-14.0.3 --color never -N -Uo '(?s-u).{100}https:.{50}' test.json
real 0.042
user 0.041
sys 0.000
maxmem 8 MB
faults 0
This actually runs slower, I believe, because it disables the line oriented optimizations that ripgrep uses. In this case, it isn't as good at detecting the `https:` literal and looking for that first. That's where `ired` can do (a lot) better: it isn't line oriented and doesn't need to support arbitrary regex patterns, while greps are and do.
To complete this analysis, I'm going to do something that I realize is blasphemous to you and increase the input size by ten-fold. This will help us understand where time is being spent:
$ time grep --color=never -Eo .{100}https:.{50} test.10x.json
real 17.931
user 17.906
sys 0.017
maxmem 7 MB
faults 0
$ time rg-14.0.3 --color never -N -o '.{100}https:.{50}' test.10x.json
real 0.032
user 0.017
sys 0.010
maxmem 23 MB
faults 0
$ time rg-14.0.3 --color always -N -o '.{100}https:.{50}' test.10x.json
real 0.137
user 0.034
sys 0.019
maxmem 23 MB
faults 0
$ time dash -c "echo https:|./1.sh test.10x.json 156 100"
real 0.067
user 0.089
sys 0.069
maxmem 7 MB
faults 0
I compared the profiles of `rg --color=never` and `rg --color=always`, and they look about the same to me. This suggests to me that color is slower simply because rendering it in my terminal emulator is slower.
For grins, I also tried ugrep:
$ time ugrep-4.4.1 --color=never -o '.{100}https:.{50}' test.10x.json
real 6.003
user 5.977
sys 0.007
maxmem 6 MB
faults 0
Owch. But not as bad as GNU grep.
So with a bigger input, we can see that `rg -o` is about twice as fast as ired, even on "ordinary" hardware.
And IMO, for inputs of the size you've provided, the difference is not meaningful.
Going back to your original prompt:
> Personally I'm interested in "grep -o" alternatives.
It seems to me like `rg -o` is quite serviceable in that regard, and at the very least, substantially better than GNU grep.
At this point, I wondered what ired did for substring search[2]. That immediately stuck out to me as something that looked wrong. Indeed:
$ cat haystack
ABAABAB
$ echo -n BAB | od -An -tx1 | sed 's>^>/>;s/ //g' | ired -n haystack
0x4
$ echo -n ABAB | od -An -tx1 | sed 's>^>/>;s/ //g' | ired -n haystack
$ rg -o ABAB haystack
1:ABAB
So ired is a toy. One wonders how many search results you've missed over the years because of ired's feature "it's so minimal that it's wrong!" I mean sometimes tools have bugs. ripgrep has had bugs too. But this one has been in ired since 2009.
What is it that you said? YIKES. Yeah. Seems appropriate.
OK, so first of all, let's get one thing cleared up. What the heck is ired? It isn't in the Archlinux package repos. I found this[1], but it looks like an incomplete and abandoned project. It doesn't even have proper docs:
So like, I don't even know what `ired -n` is doing. From what I can tell from your commands, it's searching for `string`, but you first need to convert it to a hexadecimal representation.
But okay, let's also check the output between the commands and make sure they're the same. I used my own file:
$ time grep -ob string 1-2048.txt
333305:string
333380:string
920494:string
5166701:string
5210094:string
6775219:string
real 0.006
user 0.006
sys 0.000
maxmem 15 MB
faults 0
$ time rg -ob string 1-2048.txt
13123:333305:string
13124:333380:string
33382:920494:string
159885:5166701:string
161059:5210094:string
211466:6775219:string
real 0.003
user 0.000
sys 0.003
maxmem 15 MB
faults 0
$ time sh -c "echo -n string|od -An -tx1|sed 's>^>/>;s/ //g'|ired -n 1-2048.txt"
0x515f9
0x51644
0xe0bae
0x4ed66d
0x4f7fee
0x6761b3
real 0.013
user 0.010
sys 0.004
maxmem 15 MB
faults 0
Indeed, the hexadecimal offsets printed by ired line up with the offsets printed by grep and ripgrep. Notice also the timing. ired is slower here for me.
OK, now let's do context:
$ time grep -ob string 1-2048.txt
[..snip..]
real 0.006
user 0.006
sys 0.000
maxmem 16 MB
faults 0
$ time grep -ob .string 1-2048.txt
[..snip..]
real 0.005
user 0.003
sys 0.003
maxmem 16 MB
faults 0
$ time grep -ob ..string 1-2048.txt
[..snip..]
real 0.006
user 0.003
sys 0.003
maxmem 16 MB
faults 0
$ time rg -ob string 1-2048.txt
[..snip..]
real 0.004
user 0.003
sys 0.000
maxmem 16 MB
faults 0
$ time rg -ob .string 1-2048.txt
[..snip..]
real 0.004
user 0.000
sys 0.003
maxmem 16 MB
faults 0
$ time rg -ob ..string 1-2048.txt
[..snip..]
real 0.004
user 0.004
sys 0.000
maxmem 16 MB
faults 0
I don't see anything worth saying "yikes" about here.
One possible explanation for the timing differences is that your search has a lot of search results. Match counts are a crucial part of benchmarking, and you've made the same mistake as the ugrep author by omitting them. But okay, let me try a search with more hits.
$ time rg -ob the 1-2048.txt | wc -l
60509
real 0.011
user 0.006
sys 0.006
maxmem 16 MB
faults 0
$ time rg -ob .the 1-2048.txt | wc -l
60477
real 0.014
user 0.014
sys 0.000
maxmem 16 MB
faults 0
$ time rg -ob ..the 1-2048.txt | wc -l
60359
real 0.014
user 0.014
sys 0.000
maxmem 16 MB
faults 0
A little slower, but that's what you'd expect with the higher match frequency. Now let's try your 1.sh script:
$ echo the | time sh 1.sh 1-2048.txt 6 | wc -l
63304
real 0.048
user 0.072
sys 0.052
maxmem 16 MB
faults 0
$ echo the | time sh 1.sh 1-2048.txt 7 1 | wc -l
63336
real 0.056
user 0.096
sys 0.042
maxmem 16 MB
faults 0
$ echo the | time sh 1.sh 1-2048.txt 8 2 | wc -l
63419
real 0.053
user 0.079
sys 0.049
maxmem 16 MB
faults 0
(The counts are a little different because `..the` matches fewer things than `the` when given to grep, but presumably `ired` doesn't care about that.)
But in any case, ired is quite a bit slower here.
OK, let's pop up a level. Your benchmark is somewhat flawed, for a few reasons. First, the timings are so short that the differences here are generally irrelevant to human perception. It reminds me of the time when ripgrep came out, and someone would respond with a "gotcha" that `ag` was faster because it ran a search on a tiny repository in 10ms whereas ripgrep took 12ms. That's not quite exactly the same as what's happening here, but it's close. Second, the haystack is so short that overhead is likely playing a role here. The timings are just too short to be reliable indicators of performance as the haystack size scales. See my commentary on ugrep's benchmarks[2].
Let's try a bigger file:
$ stat -c %s eigth.txt
1621035918
$ file eigth.txt
eigth.txt: ASCII text
$ time rg -ob Sherlock eigth.txt | wc -l
1068
real 0.154
user 0.103
sys 0.050
maxmem 1551 MB
faults 0
$ time rg -ob .Sherlock eigth.txt | wc -l
935
real 0.156
user 0.096
sys 0.060
maxmem 1551 MB
faults 0
$ time rg -ob ..Sherlock eigth.txt | wc -l
932
real 0.154
user 0.107
sys 0.047
maxmem 1551 MB
faults 0
And now ired:
$ echo Sherlock | time sh 1.sh eigth.txt 6 | wc -l
1068
real 1.393
user 0.671
sys 0.729
maxmem 16 MB
faults 0
$ echo Sherlock | time sh 1.sh eigth.txt 7 1 | wc -l
1201
real 1.391
user 0.604
sys 0.793
maxmem 16 MB
faults 0
$ echo Sherlock | time sh 1.sh eigth.txt 8 2 | wc -l
1204
real 1.395
user 0.578
sys 0.823
maxmem 16 MB
faults 0
Yikes. Over an order of magnitude slower.
Note that the memory usage reported for ripgrep is high just because it's using file-backed memory maps. It's not actual heap usage. You can check this by disabling memory maps:
$ time rg -ob ..Sherlock eigth.txt --no-mmap | wc -l
932
real 0.179
user 0.063
sys 0.116
maxmem 16 MB
faults 0
And if we increase the match frequency on the same large haystack, the gap closes a little, but ired is still about 4x slower:
$ time rg -ob ..the eigth.txt | wc -l
13141187
real 2.470
user 2.418
sys 0.050
maxmem 1551 MB
faults 0
$ echo the | time sh 1.sh eigth.txt 8 2 | wc -l
13894916
real 10.027
user 16.293
sys 8.122
maxmem 402 MB
faults 0
I'm not clear on why you're seeing the results you are. It could be because your haystack is so small that you're mostly just measuring noise. ripgrep 14 did introduce some optimizations in workloads like this by reducing match overhead, but I don't think it's anything huge in this case. (And I just tried ripgrep 13 on the same commands above and the timings are similar if a tiny bit slower.)
> Does piping rg output to wc -l affect time(1) output?
Oh yes absolutely! If `rg` is printing to a tty, it will automatically enable showing line numbers and printing with colors. Both of those have costs (over and beyond just printing matches and their byte offsets) that appear irrelevant to your use case. Neither of those things are done by ired. It's not about `wc -l` specifically, but about piping into anything. And of course, with `wc -l`, you avoid the time needed to actually render the results. But I used `wc -l` with ired too, so I "normalized" the benchmarking model and simplified it.
But either way, my most recent comment before this one capitulated to your demands and avoided the use of piping results into anything. It was for this reason that I showed commands with `--color=never -N`.
And yes, `grep -Eo` gets slower with more `.`. ripgrep does too, but is a bit more robust than GNU grep. I already demonstrated this in my previous comment and even wrote some words about it explicitly: regex engines typically can't do as well with increasing window sizes like this when compared to a purpose-built tool like ired. Nevertheless, ired is still slower than ripgrep in most of the tests I showed in my previous comment.
But optimally speaking, could something even faster than both ired and ripgrep be built? I believe so, yes. But only for some workloads, I suspect, with high match frequency. And ain't nobody going to build such a thing just to save a few milliseconds. Lol. The key is really to implement the windowing explicitly instead of relying on the regex engine to do it for you. Alternatively, one could add a special optimization pass to the regex engine that recognizes "windowing" patterns and does something clever. I have a ticket open for something similar[1].
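To sketch the explicit windowing idea with stock tools (purely illustrative and slow, since dd runs with bs=1; it also assumes every match has at least 100 bytes in front of it, and the 156/100 numbers mirror the `.{100}https:.{50}` examples above):
$ rg -bo 'https:' test.json | cut -d: -f1 | while read -r off; do dd if=test.json bs=1 skip=$((off - 100)) count=156 2>/dev/null; echo; done
The regex engine only ever looks for the `https:` literal; the 100-before/50-after window is carved out afterwards from the byte offsets.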
I tried your `_test2` and `rg` without `wc -l` takes 0.02s while with `wc -l` it takes 0.01s. The difference is meaningless. I don't believe you if you say that impacts your edit-compile-run cycle.
+ echo .{150}(https:)|(http:).{10}
+ n=0
+ test 0 -le 3
+ busybox time rg -o .{150}(https:)|(http:).{10} test.json
real 1m 33.37s
user 0m 1.25s
sys 0m 2.97s
+ sleep 2
+ echo
+ echo Now try this with a pipe to wc -l...
+ echo
+ sleep 2
+ busybox time+ rg -o .{150}(https:)|(http:).{10} test.json
wc -l
real 0m 0.49s
user 0m 0.45s
sys 0m 0.02s
+ echo
+ sleep 5
+ echo
+ n=1
+ test 1 -le 3
+ busybox time rg -o .{150}(https:)|(http:).{10} test.json
real 1m 34.23s
user 0m 1.75s
sys 0m 4.22s
+ sleep 2
+ echo
+ echo Now try this with a pipe to wc -l...
+ echo
+ sleep 2
+ busybox time rg -o .{150}(https:)|(http:).{10} test.json
wc -l
real 0m 0.40s
user 0m 0.37s
sys 0m 0.02s
+ echo
+ sleep 5
+ echo
+ n=2
+ test 2 -le 3
+ busybox time rg -o .{150}(https:)|(http:).{10} test.json
real 1m 33.59s
user 0m 1.05s
sys 0m 1.76s
+ sleep 2
+ echo
+ echo Now try this with a pipe to wc -l...
+ echo
+ sleep 2
+ busybox time rg -o .{150}(https:)|(http:).{10}+ test.json
wc -l
real 0m 0.45s
user 0m 0.37s
sys 0m 0.04s
+ echo
+ sleep 5
+ echo
+ n=3
+ test 3 -le 3
+ busybox time rg -o .{150}(https:)|(http:).{10} test.json
real 1m 33.99s
user 0m 1.93s
sys 0m 4.82s
+ sleep 2
+ echo
+ echo Now try this with a pipe to wc -l...
+ echo
+ sleep 2
+ busybox time rg -o .{150}(https:)|(http:).{10} test.json
wc -l
real 0m 0.40s
user 0m 0.37s
sys 0m 0.02s
+ echo
+ sleep 5
+ echo
+ n=4
+ test 4 -le 3
+ exit
No, I am not going to stare at the screen for a minute and a half as thousands of matches are displayed. (In fact I am unlikely to even be examining a file of this size. It's more likely to be under 6M.) With a file this size, what I would do is examine a sample of the matches, let's say for example the first 20.
Look at the speed of a shell script with 9 pipes, using ired 3x to examine the first 20 matches.
Not sure where "wc -l" came from. It was not in any of the tests I authored. That's because I am not interested in line counts. Nor am I interested in very large files either, or ASCII files of Shakespeare, etc. As I stated in the beginning, I am working with files that are like "a wall of text". Long lines, few if any linebreaks. Not the type of files that one can read or edit using less(1), ed(1) or vi(1).
What I am interested in is how fast the results display on the screen. For files of this type in the single digit MB range, piping the results to another program does not illustrate the speed of _displaying results to the screen_. In any event, that's not what I'm doing. I am not piping to another program. I am not looking for line counts. I am looking for patterns of characters and I need to see these on the screen. (If I wanted line counts I would just use grep -c -o.)
When working with these files interactively at the command line, performing many consecutive searches,^1 the differences in speed become readily observable. Without need for time(1). grep -o is ridiculously slow. Hence I am always looking for alternatives. Even a shell script with ired is faster than grep. Alas, ripgrep is not a solution either. It's not any faster than ired _at this task for files of this type and size_.
Part of the problem in comparing results is that we are ignoring the hardware. For example, I am using a small, low resource, underpowered computer; I imagine most software developers use more expensive computers that are much more powerful with vast amounts of resources.
Try some JSON as a sample but note this is not necessarily the best example of the "walls of text" I am working with; ones that do not necessarily conform to a standard.
curl "https://api.crossref.org/works?query=unix&rows=1000" > test.json
busybox time grep -Eo .{100}https:.{50} test.json
real 0m 7.93s
user 0m 7.81s
sys 0m 0.02s
This is still a tiny file in the world of software developers. Obviously, if this takes less than 1 second on some developer machine, then any comparison with me, an end user with an ordinary computer, is not going to make much sense.
1. Not an actual "loop", but an iterative, loop-like process of search file, edit program, compile program, run program, search file, edit program, ...
With this, speed becomes noticeable even if the search task is relatively short-lived.
Well then can you share such a file? I wasn't measuring the time of wc. I just did that to confirm the outputs were the same. The fact is that I can't reproduce your timings, and ired is significantly slower in the tests I showed above.
I tried my best to stress the importance of match frequency, and even varied the tests on that point. Yet I am still in the dark as to the match frequency in your tests.
The timing differences even in your tests also seem insignificant to me, although they can be significant in a loop or something. Hence the reason I used a larger file. Otherwise the difference in wall time appears to be a matter of milliseconds. Why does that matter? Maybe I'm reading your timings wrong, but that would only deepen the mystery as to why our results are so different. Hence my request for an input that you care about so that we can get on the same page.
Not sure if it was clear or not, but I'm the author of ripgrep. The benefit to you from this exchange is that I should be able to explain why the perf difference has to exist (or is difficult to remedy) or file a TODO for making rg -o faster.
Another iterative loop-like procedure is search, adjust pattern and/or amount of context, search, adjust pattern and/or amount of context, search, ...
If a program is sluggish, I will notice.
The reason I am searching for a pattern is because there is something I consider meaningful that follows or precedes it. Repeating patterns would generally not be something I am interested in. For example, a repeating pattern such as "httphttphttp". The search I would do would more likely be "http". If for some reason it repeats, then I will see that in the context.
For me, neither grep nor grep clones are as useful as ired. ired will show me the context including the formatting, e.g., spaces, carriage returns. It will print the pattern plus context to the screen exactly as it appears in the file, also in hexdump or formatted hex, like xxd -p.
And it will do all this faster than grep -o and nearly as fast as a big, fat grep clone in Rust that spits out coloured text by default, even when ired is in a shell script with multiple pipes and other programs. TBH, neither grep nor grep clones are as flexible; they are IMO not suitable for me, for this type of task. But who knows there may be some other program I do not know about yet.
Significance can be subjective. What is important to me may not be important to someone else, and vice versa. Every user is different. Not every user is using the same hardware. Nor is every user trying to do the exact same things with their computer.
For example, I have tried all the popular UNIX shells. I would not touch zsh with a ten-foot pole. Because I can feel the sluggishness compared to working in dash or NetBSD sh. I want something smaller and lighter. I intentionally use the same shell for interactive and non-interactive use. Because I like the speed. But this is not for everyone. Some folks might like some other shell, like zsh. Because [whatever reasons]. That does not mean zsh is for everyone, either. Personally, I would never try to proclaim that the reasons these folks use zsh are "insignificant". To those users, those reasons are significant. But the size and speed differences still exist, whether any particular user deems them "significant" or not.
Well yes of course... But you haven't demonstrated ripgrep to be sluggish for your use case.
> For me, neither grep nor grep clones are as useful as ired. ired will show me the context including the formatting, e.g., spaces, carriage returns. It will print the pattern plus context to the screen exactly as it appears in the file, also in hexdump or formatted hex, like xxd -p.
Then what are you whinging about? grep isn't a command line hex editor like ired is. You're the one who came in here asking for grep -o to be faster. I never said grep (or ripgrep) could or even should replace ired in your workflow. You came in here talking about it and making claims about performance. At least for ripgrep, I think I've pretty thoroughly debunked your claims about perf. But in terms of functionality, I have no doubt whatsoever that ired is better fitted for the kinds of problems you talk about. Because of course it is. They are two completely different tools.
ired will also helpfully not report all substring results. I love how you just completely ignore the fact that your useful tool is utterly broken. I don't mean "broken" lightly. It has had hidden false negatives for 14 years. Lmao. YIKES.
> they are IMO not suitable for this type of task
Given that ripgrep gives the same output as your ired shell script (with a lot less faffing about) and it does it faster than ired, I find this claim baseless and without evidence. Of course, ripgrep will not be as flexible as ired for other hex editor use cases. Because it's not a hex editor. But for the specific case you brought up on your own accord because you wanted to come complain on an Internet message board, ripgrep is pretty clearly faster.
> nearly as fast as a big, fat grep clone in Rust
At least it doesn't have a 14 year old bug that can't find ABAB in ABAABAB. Lmao.
But I see the goalposts are shifting. First it was speed. Now that that has been thoroughly debunked, you're whinging about binary size. I never made any claims about that or said that ripgrep was small. I know it's fatter than grep (although your grep is probably dynamically linked). If people like you want to be stingy with your MBs to the point that you won't use ripgrep, then I'm absolutely cool with that. You can keep your broken software.
> Significance can be subjective. What is important to me may not be important to someone else, and vice versa. Every user is different.
A trivial truism, and one that I've explicitly acknowledged throughout this discussion. I said that milliseconds in perf could matter for some use cases, but it isn't going to matter in a human paced iteration workflow. At least, I have seen no compelling argument to the contrary.
It's even subjective whether or not you care if your tool has given you false negatives because of a bug that has existed for 14 years. Different strokes for different folks, amiright?
Interesting, it supports an n-gram indexer. ripgrep has had this planned for a few years now [1] but hasn't implemented it yet. For large codebases I've been using csearch, but it has a lot of limitations.
Unfortunately... I just tried the indexer and it's extremely slow on my machine. It took 86 seconds to index a Linux kernel tree, while csearch's cindex tool took 8 seconds.
A little off-topic, but I'd love to see a tool similar to this that provides real-time previews for an entire shell pipeline which, most importantly, integrates into the shell. This allows for leveraging the completion system to complete command-line flags and using the line editor to navigate the pipeline.
In zsh, the closest thing I've gotten to this was to bind Ctrl-\ to the `accept-and-hold` zle widget, which executes what is in the current buffer while still retaining it and the cursor position. That gets me close (no more ^P^B^B^B^B for editing), but I'd much rather see the result of the pipeline in real-time rather than having to manually hit a key whenever I want to see the result.
Any particular reason why newer tools don't follow the well-established XDG standard for config files? Those folder structures probably already exist on end user machines, and it keeps your home directory from getting cluttered with tens of config files.
Slight rant/aside but Firefox is bad for this. You can point it to a custom profile path (e.g. .config/mozilla) but ~/.mozilla/profile.ini MUST exist. Only that one file - you can move everything else.
For ripgrep at least, you set an environment variable telling it where to look for a config file. You can put it anywhere, so you don't need to put it in $HOME.
I didn't do XDG because this route seemed simpler, and XDG isn't something that is used everywhere.
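Concretely, it looks something like this (a minimal sketch; the path and the flags in the config file are just examples):
$ export RIPGREP_CONFIG_PATH="$HOME/.config/ripgrep/rc"
$ cat "$RIPGREP_CONFIG_PATH"
--smart-case
--max-columns=150
The config file is just one flag per line, and nothing is read at all if the variable is unset.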
The standard should be: the tool tells you where it's configured, how to change the config, and chooses a 'standard' default config location, such as XDG.
Assuming you aren't doing weird things with paths, I can work around 'dumb lazy' developers releasing half-assed tools with symlinks/junctions, but I really don't want to spend a ton of time configuring your tool or fighting its presumptions.
Oh okay, I guess you've got it figured out. Now specify it in enough detail for others to implement it, get all stakeholders to agree and get everyone to implement it exactly to the spec.
Good luck. You're already off to a rough start with XDG, since that isn't what is used on Windows. And it's unclear whether it ought to be used on macOS.
No, you don't understand. I'm not saying the XDG variables might not be defined. Give me a little credit here lol. I have more than a passing familiarity with XDG. I've implemented it before. I'm saying the XDG convention itself may not apply. For example, Windows. And it's controversial whether to use it on macOS, at least as of when I last looked into it.
I don't see any significant problem with defining an environment variable. You likely already have dozens defined. I know I do.
I'm not trying to convince you of anything. Someone asked why. This is why for ripgrep at least.
Could ripgrep not simply add a check for the XDG environment variables and use those, if no rg environment variable is given? Of course, if neither is available, you would use the default.
Of course. But now you've complicated how config files are found and it doesn't seem like an improvement big enough to justify it.
Bottom line is that while ripgrep doesn't follow XDG, it also doesn't force you to litter your HOME directory. That's what most people care about in my experience.
I would encourage you to search the ripgrep issue tracker for XDG. This has all been discussed.
The issue is complexity - we could create some sort of 'standard tool' library that 'just works' on all platforms, but now building the tool and runtime bootstrapping of the tool become more complex, and hence more likely to _break_.
Really most people want it in their path and it just to work in as many scenarios as possible. Config almost shouldn't be the responsibility of the tool at all... (Options passed to the tool via env variables, perhaps)...
Yes, that's the problem. You need to maintain a close attention level to know which things are POSIX. And in the case of GNU grep, you actually need to set POSIXLY_CORRECT=1. Otherwise its behavior is not a subset.
POSIX also forbids greps from searching UTF-16 because it mandates that certain characters always use a single byte. ripgrep, for example, doesn't have this constraint and thus can transparently search UTF-16 correctly via BOM sniffing.
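A quick demonstration (a sketch; it assumes your iconv emits a BOM for the generic UTF-16 target, which glibc's does):
$ printf 'Sherlock Holmes\n' | iconv -t UTF-16 > utf16.txt
$ grep -c Sherlock utf16.txt
0
$ rg -c Sherlock utf16.txt
1
GNU grep can't see the match because every other byte is a NUL; ripgrep sniffs the BOM, transcodes and finds it.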
Slightly off topic, but how does one publish so many installable versions of a binary across all the package managers? I figured out how to do it for Brew, but the rest seems like a billion different steps that need to be done and I feel like I am missing something.
You only have to set up CI/CD once for each package type, afterwards all the packaging work is done for you automatically.
Ripgrep is also quite a large project (judging by both star count and contribution count), so people probably volunteer to support their platform/package manager of choice.
ripgrep, grab, ugrep, hypergrep... Any of the four are probably fast enough for any of my use cases but I suddenly feel tempted to micro-optimize and spend ages comparing them all.
The reason is very simple: I can trust 'grep' to be on any system I ever touch. Learning ugrep doesn't make any sense as I can't trust it to be available.
I could still use it on my own systems, but I work on customer systems which won't have this tool installed.
And I'm proficient enough with grep that it's 'good enough', I'm not focussing on a better grep. I'm focussing on fixing a problem, or trying something new.
I'd rather invest my time into something that will benefit me across all environments I work with.
Because a tool may be 'better' (whatever that means) doesn't mean it will see adoption.
This is not about being closeminded, but it's about focus on what's really important.
Okay, this provides a feature I've occasionally missed for a long time: searching for several terms in files (the "Googling files" feature). I wrote an 8-line script a few weeks ago to do this, which I will gladly throw away. I'll look into the TUI too.
(I've been using ripgrep for quite some time now, how does this otherwise compare to it? would I be able to just replace rg with ug?)
For regular use, I use ugrep’s %u option with its format feature to only get one match per line same as other grep tools.
Overall, I’m a happy user of ugrep. ugrep works as well as ripgrep for me. It’s VERY fast and has built-in option to search archives within archives recursively.
They are more or less equivalent. One has obscure feature X other has obscure feature Y, one is a bit faster on A, other is a bit faster on B, the defaults are a bit different, and one is written in Rust, the other in C++.
Pick the one you like, or both. I have both on my machine, and tend to use the one that does what I want with the least options. I also use GNU grep when I don't need the speed or features of either ug and rg.
One thing I never liked about ripgrep is that it doesn't have a pager. Yes, it can be configured to use the system-wide ones, but it's an extra step (and every time I have to google how to preserve colors) and on Windows you're SOL unless you install gnu utils or something. The author always refused to fix that.
Ugrep not only has a pager built in, but it also allows searching the results which is super nice! And that feature works on all supported platforms!
Interesting - for me a built-in pager is an antifeature. I don't want to figure out how to leave the utility. Worst of all, a pager usually means that sometimes you get multiple pages and you need to press q to exit, and sometimes not. Annoying. I often type the next command right away and the pager means I get stuck, or worse, the pager starts doing something in response to my keys (looking at you, `git log`).
Then again I'm on Linux and can always pipe to less if I need to. I'm also not the target audience for ugrep because I've never noticed that grep would be slow. :shrug:
Some terminal emulators (kitty for sure) support "open last command output in pager". Works great with a pager that can understand ANSI colors - less fussing around with variables and flags to preserve colors in the pager
I referenced bat because I've found that suggesting cygwin sometimes provokes a negative reaction. The GP also mentioned needing to install GNU tooling as if it were a negative.
I'm sure you know but windows command prompt always came with its inbuilt pager -- more. So, you could always do "dir | more" or "rg -p "%*" | more ". (more is good with colors without flags)
I didn't! I'm not a Windows user. Colors are half the battle, so that's good. Will it only appear if paging is actually needed? That's what the flags to `less` do in my wrapper script above. They are rather critical for this use case.
For me, it's a lot easier to compile a static binary of a C++ app than a Rust one. Never got that to work. Also nice to have compatibility with all of grep's arguments.
For an up-to-date performance comparison of the latest ugrep, please see the ugrep performance benchmarks [at https://github.com/Genivia/ugrep-benchmarks]. Ugrep is faster than GNU grep, Silver Searcher, ack, sift. Ugrep's speed beats ripgrep in most benchmarks.
Although faster in some cases, ripgrep lacks archive search support (no, transparent decompression that ignores the archive structure is not enough), which works great in ugrep.
I find myself returning to grep from my default of rg because I'm just too lazy to learn a new regex language. Stuff like word boundaries "\<word\>" or multiple patterns "\(one\|two\)".
If you consider it "the weirdest ever", I'm guessing that I'm probably older than you. I've certainly been using regex long before PCRE became common.
As a vim user I compose 10s if not 100s of regexes a day. It does not use PCRE. Nor does sed, a tool I've been using for decades. Do you also recommend not using these?
I use all of those tools but the inconsistency drives me crazy as it's hard to remember which syntax to use where. Here's how to match the end of a word:
ripgrep, Python, JavaScript, and practically every other non-C language: \b
Did you know that not all of those use the same definition of what a "word" character is? Regex engines differ on the inclusion of things like \p{Join_Control}, \p{Mark} and \p{Connector_Punctuation}. Although in the case of \p{Connector_Punctuation}, regex engines will usually at least include underscore. See: https://github.com/BurntSushi/rebar/blob/f9a4f5c9efda069e798...
And then there's \p{Letter}. It can be spelled in a lot of ways: \pL, \p{L}, \p{Letter}, \p{gc=Letter}, \p{gc:Letter}, \p{LeTtEr}. All equivalent. Very few regex engines support all of them. Several support \p{L} but not \pL. See: https://github.com/BurntSushi/rebar/blob/f9a4f5c9efda069e798...
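For what it's worth, ripgrep's default engine accepts several of those spellings (a small sketch, assuming a UTF-8 locale):
$ echo 'Δ' | rg '\pL'
Δ
$ echo 'Δ' | rg '\p{L}'
Δ
$ echo 'Δ' | rg '\p{Letter}'
Δ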
Isn't that inconsistent with the way Perl's regex syntax was designed? In Perl's syntax an escaped non-alphanumeric character is always a literal [^1], and that is guaranteed not to change.
That's nice for beginners because it saves you from having to memorize all the metacharacters. If you are in doubt about whether something has a special meaning, you just escape it.
Yes, it's inconsistent with Perl. But there are many things in ripgrep's default regex engine that are inconsistent with Perl, including the fact that all patterns are guaranteed to finish a search in linear time with respect to the haystack. (So no look-around or back-references are supported.) It is a non-goal of ripgrep to be consistent with Perl. Thankfully, if you want that, then you can get pretty close by passing the -P/--pcre2 flag.
With that said, I do like Perl's philosophy here. And it was my philosophy too up until recently. I decided to make an exception for \< and \> given their prevalence.
It was also only relatively recently that I made it possible for superfluous escapes to exist. Prior to ripgrep 14, unrecognized escapes were forbidden:
I had done it this way to make it possible to add new escape sequences in a semver compatible release. But in reality, if I were to ever add new escape sequences, I would use one of the ASCII alphanumeric characters, as Perl does. So I decided it was okay to forever and always give up the ability to make, e.g., `\@` mean something other than just matching a literal `@`.
`\<` and `\>` are forever and always the lone exceptions to this. It is perhaps a trap for beginners, but there are many traps in regexes, and this seemed worth it.
Note that `\b{start}` and `\b{end}` also exist and are aliases for `\<` and `\>`. The more niche `\b{start-half}` and `\b{end-half}` also exist, and those are what are used to implement the -w/--word-regexp flag. (Their semantics match GNU grep's -w/--word-regexp.) For example, `\b-2\b` will not match in `foo -2 bar` since `-` is not a word character and `\b` demands `\w` on one side and `\W` on the other. However, `rg -w -e -2` will match `-2` in `foo -2 bar`:
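A quick sketch (searching from stdin, so ripgrep prints no line numbers):
$ echo 'foo -2 bar' | rg '\b-2\b'
$ echo 'foo -2 bar' | rg -w -e -2
foo -2 bar
The first command finds nothing, since both characters around the `-` are non-word characters and `\b` needs a word character on one side; the second matches because the half boundaries used by -w only check that the match isn't immediately surrounded by word characters.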
Cool, but in a real-life scenario where the system is not able to pull from external packages because it is in a secured environment, this seems moot to me, since you'll fall out of practice with actually running grep. I would avoid falling out of practice with grep.
On the other hand for a non-work environment where security isn't in question this is cool.
I feel like if you're going to make a new grep and put a web page for it, your webpage should start with why your grep is better than the default (or all the other ones).
> I feel like if you're going to make a new grep and put a web page for it, your webpage should start with why your grep is better than the default (or all the other ones).
No snark here, but is the subtitle not enough to start? "a more powerful, ultra fast, user-friendly, compatible grep"
> * ultra fast -- This at least means something, but it should be quantified in some way. "50%+ faster for most uses cases" or something like that.
That would be begging for nerd rage posts, just like so many disputing the benchmarks. >:D
> * user-friendly -- not even sure what this means. Seems kind of subjective anyway. I find grep plenty user friendly, for a command line tool.
Just below is a huge, captioned screenshot of the TUI?
> * compatible grep -- I mean, they all are pretty much, but I guess it's good to know this?
One would think so... but I have so many scars concerning incompatibilities with different versions of grep (as do others in the comments). If you don't know, then that feature isn't listed for you. :)
no snark here, but the subtitle was the start of my confusion: what does "user-friendly" mean in the context of grep, and why should I believe the claim?
regular expressions are not friendly, but the user friendly way for a cli filter to behave is to return retvals appropriately, output to stdout, error messages to stderr... does user friendly mean copious output to stderr? what else could it possibly mean? do I want copious output to stderr?
> no snark here, but the subtitle was the start of my confusion: what does "user-friendly" mean in the context of grep, and why should I believe the claim?
Granted, it is far from a thing of beauty, but there is a large, captioned screenshot of the included text user interface just beneath. Then again, it is a website for a command line tool. "Many Bothans died to bring us this information."