This wouldn't build for me, so I had to apply the patch suggested by a sibling comment.
Once I got it building, my first benchmark attempt shows it as being slower:
$ curl -LO 'https://burntsushi.net/stuff/subtitles2016-sample.en.gz'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  265M  100  265M    0     0  48.6M      0  0:00:05  0:00:05 --:--:-- 49.9M
$ gzip -d subtitles2016-sample.en.gz
$ hyperfine --ignore-failure "rg -c 'ZQZQZQZQ' subtitles2016-sample.en" "krep -c 'ZQZQZQZQ' subtitles2016-sample.en"
Benchmark 1: rg -c 'ZQZQZQZQ' subtitles2016-sample.en
  Time (mean ± σ):      80.7 ms ±   1.6 ms    [User: 57.7 ms, System: 22.7 ms]
  Range (min … max):    75.3 ms …  83.3 ms    35 runs
  Warning: Ignoring non-zero exit code.
Benchmark 2: krep -c 'ZQZQZQZQ' subtitles2016-sample.en
  Time (mean ± σ):     122.8 ms ±   1.4 ms    [User: 372.6 ms, System: 24.4 ms]
  Range (min … max):   120.2 ms … 125.5 ms    24 runs
Summary
  rg -c 'ZQZQZQZQ' subtitles2016-sample.en ran
    1.52 ± 0.03 times faster than krep -c 'ZQZQZQZQ' subtitles2016-sample.en
That's a benchmark with no matches, which is essentially the best case for throughput. Now I want to try a benchmark with a high match frequency:
$ hyperfine "rg -c 'the' subtitles2016-sample.en" "krep -c 'the' subtitles2016-sample.en"
Benchmark 1: rg -c 'the' subtitles2016-sample.en
  Time (mean ± σ):     411.8 ms ±   3.6 ms    [User: 389.7 ms, System: 21.1 ms]
  Range (min … max):   404.8 ms … 415.7 ms    10 runs
Benchmark 2: krep -c 'the' subtitles2016-sample.en
  Time (mean ± σ):     121.2 ms ±   1.9 ms    [User: 364.6 ms, System: 24.9 ms]
  Range (min … max):   113.2 ms … 123.0 ms    24 runs
Summary
  krep -c 'the' subtitles2016-sample.en ran
    3.40 ± 0.06 times faster than rg -c 'the' subtitles2016-sample.en
Which is very nice. So I decided to poke at it:
$ krep -c the subtitles2016-sample.en
Found 29794426 matches
$ rg -c the subtitles2016-sample.en
6123710
$ grep -c the subtitles2016-sample.en
6123710
The counts are way off here. At first I thought maybe it was counting every occurrence of `the` instead of every matching line, but when I ask ripgrep to do that, it gets a different answer:
$ rg -oc the subtitles2016-sample.en
7739791
$ rg -o the subtitles2016-sample.en | wc -l
7739791
$ grep -o the subtitles2016-sample.en | wc -l
7739791
So not sure what's going on here, but it looks like `krep` might not be giving accurate results.
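For reference, here's the difference between the two counting semantics in a quick Rust sketch (plain std, nothing to do with either tool's actual implementation):

    // Counting matching lines (grep -c / rg -c) vs. counting
    // non-overlapping occurrences (rg -oc). krep's 29,794,426
    // matches neither semantic.
    fn count_matching_lines(haystack: &str, needle: &str) -> usize {
        haystack.lines().filter(|line| line.contains(needle)).count()
    }

    fn count_occurrences(haystack: &str, needle: &str) -> usize {
        // str::matches yields non-overlapping matches, like grep -o.
        haystack.matches(needle).count()
    }

    fn main() {
        let haystack = "the theory\nnothing here\nthe the the\n";
        assert_eq!(count_matching_lines(haystack, "the"), 2);
        assert_eq!(count_occurrences(haystack, "the"), 5);
    }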
Pushing it a bit more, it seems like it just kind of falls over?
$ time rg -c 'You read Sherlock Holmes to deduce that\?' subtitles2016-sample.en
10
real 0.076
user 0.049
sys 0.026
maxmem 923 MB
faults 0
$ time krep -c 'You read Sherlock Holmes to deduce that?' subtitles2016-sample.en
Found 0 matches
real 0.935
user 3.597
sys 0.029
maxmem 918 MB
faults 0
I ran the above benchmarks in `/dev/shm` on Linux with an i9-12900K.
In terms of the approach here, ripgrep is already using a pretty sophisticated substring search algorithm: https://github.com/BurntSushi/memchr?tab=readme-ov-file#algo...
And it uses memory maps sometimes, when it thinks they will be fast, which includes the single-file case on Linux.
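For illustration, a single-file search over a memory map might look like the toy sketch below, using the memmap2 crate (this is not ripgrep's code, just the shape of the idea):

    // mmap skips the read/copy loop but has setup cost per file, which
    // is roughly why it can win on one big file and lose on many small ones.
    use memmap2::Mmap;
    use std::fs::File;

    fn main() -> std::io::Result<()> {
        let file = File::open("subtitles2016-sample.en")?;
        // SAFETY: undefined behavior if the file is truncated while mapped.
        let haystack = unsafe { Mmap::map(&file)? };
        let lines = haystack.iter().filter(|&&b| b == b'\n').count();
        println!("{lines} lines");
        Ok(())
    }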
ripgrep also uses parallelism, but at the inter-file level. It sounds like `krep` also uses parallelism, but it will use multiple threads even when searching a single file. I've considered doing the same in ripgrep, but I haven't done enough experiments (or seen enough from someone else) to be convinced that it's the right way to go in general. It might edge out single-threaded search in some cases for sure though.
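To make one of the hazards concrete, here's a minimal sketch of chunked single-file parallelism (my own toy code, not krep's; I don't know how krep actually splits its work):

    use std::thread;

    // Count non-overlapping occurrences of `needle` in one chunk.
    fn count_in(chunk: &[u8], needle: &[u8]) -> usize {
        chunk.windows(needle.len()).filter(|w| *w == needle).count()
    }

    // Split the haystack across threads and sum the per-chunk counts.
    // NOTE: a match straddling a chunk boundary is silently missed, and
    // the naive fix (overlapping chunks) double counts instead. Getting
    // this right is exactly the kind of detail that's easy to botch.
    fn parallel_count(haystack: &[u8], needle: &[u8], threads: usize) -> usize {
        let chunk_len = ((haystack.len() + threads - 1) / threads).max(needle.len());
        thread::scope(|s| {
            let handles: Vec<_> = haystack
                .chunks(chunk_len)
                .map(|chunk| s.spawn(move || count_in(chunk, needle)))
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).sum()
        })
    }

    fn main() {
        // The split lands in the middle of the only match, so the
        // parallel count reports 0 where the sequential count finds 1.
        let haystack = b"xxxZQxxx";
        println!("sequential: {}", count_in(haystack, b"ZQ"));
        println!("parallel:   {}", parallel_count(haystack, b"ZQ", 2));
    }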
EDIT: Looking at the timings in https://dev.to/daviducolo/introducing-krep-building-a-high-p..., I see, for example, ripgrep taking >40 seconds to search for the literal pattern `error` in a 5GB file. Even if you're reading from disk (and the OP is using an SSD), that does not seem right at all. Even for an exceptionally common word like `the` in this haystack, ripgrep can chew through a 13GB file in 5 seconds on my machine:
$ time rg -c the full.txt
83499915
real 5.404
user 5.092
sys 0.302
maxmem 12511 MB
faults 0
Even if I force reading from disk, we get nowhere near 40 seconds:
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time rg -c the full.txt
83499915
real 10.577
user 5.191
sys 2.105
maxmem 12511 MB
faults 42
I'm not saying the benchmark results are definitely wrong, but something looks off here that I can't easily explain. OP, can you please share a way to fully reproduce your benchmark? (Like I did above for `subtitles2016-sample.en`.)
I had a similar experience, though my test was running `strings` on the Steam Deck repair image (the largest file I had handy) to create a 203 MB strings file with 34,206,436 lines, and then searching it for the string "Steam":
$ time fgrep -c "Steam" /tmp/steamstrings
241
grep --color=auto --exclude-dir={.bzr,CVS,.git,.hg,.svn,.idea,.tox,.venv,venv 0.09s user 0.03s system 99% cpu 0.112 total
$ time rg -c Steam /tmp/steamstrings
241
rg -c Steam /tmp/steamstrings 0.03s user 0.02s system 92% cpu 0.054 total
$ time ~/source/other/krep/krep "Steam" /tmp/steamstrings
Found 2226035 matches
Search completed in 0.0338 seconds (5991.67 MB/s)
Search details:
- File size: 202.56 MB
- Pattern length: 5 characters
- Using AVX2 acceleration
- Case-sensitive search
~/source/other/krep/krep "Steam" /tmp/steamstrings 0.08s user 0.02s system 225% cpu 0.045 total
So krep is:
1. Extremely fast
2. Extremely inaccurate
3. Not useful if you actually want to see what the lines actually are rather than just knowing how many there aren't
Not to be facetious, but if the goal is to write a program that gives incorrect output as fast as possible, I don't think you need to go as far as using AVX2.
I have tried this on a couple of different machines. On one machine it gives ridiculous answers like you found. On the other it at least works as expected, although it's kinda useless since it doesn't print the matched lines.
On the working machine it reported using SSE4.2 acceleration, while the broken one used AVX2 acceleration. However, the machine using SSE4.2 didn't see nearly as much speedup as the AVX2 machine: regular system grep on the SSE4.2 machine took 0.186 seconds to do the search, while krep needed 0.154 seconds. The biggest test file I had handy was only 123 MB, though, so maybe the lead would grow with a larger file?
That's probably because pcmpestri is trash for substring search. There is a good reason why ripgrep doesn't use it. :-)
I looked for an authoritative source on why pcmpestri is trash, and I couldn't find anything I was happy linking to other than Agner Fog's instruction tables: https://www.agner.org/optimize/instruction_tables.pdf You can see that the throughput and latency for pcmpestri are just awful.
And yes, not having any code to print the matching lines means that the only code path in krep is just counting things. If that's all your tool is doing, you can totally beat ripgrep or any other tool that is more applicable to generalized use cases. It's why the `memchr` crate (what ripgrep uses for single substring search) has a specialized routine for counting occurrences of bytes (which ripgrep uses for line counting): https://github.com/BurntSushi/memchr/blob/746182171d2e886006...
Because it's faster to do that than it is to reuse the generalized `memchr` API for finding the location of matching bytes.
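The intuition, in a stripped-down scalar sketch (the real routine behind that link is SIMD and far more careful):

    // Specialized counting: one pass, no match positions materialized,
    // and trivially vectorizable.
    fn count_bytes(haystack: &[u8], byte: u8) -> usize {
        haystack.iter().filter(|&&b| b == byte).count()
    }

    // The generalized alternative: repeatedly ask "where's the next
    // match?" and count the answers. Every call re-enters the search
    // routine and computes a position nobody needs.
    fn count_via_find(haystack: &[u8], byte: u8) -> usize {
        let (mut count, mut at) = (0, 0);
        while let Some(i) = haystack[at..].iter().position(|&b| b == byte) {
            count += 1;
            at += i + 1;
        }
        count
    }

    fn main() {
        let haystack = b"one\ntwo\nthree\n";
        assert_eq!(count_bytes(haystack, b'\n'), 3);
        assert_eq!(count_via_find(haystack, b'\n'), 3);
    }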
And counting matches in a multi-threaded context is way easier than actually managing the printing of matches in the same order that you get them.
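A minimal sketch of why ordered output is the hard part: workers finish out of order, so someone has to buffer and reorder before printing (toy code, not ripgrep's actual printer):

    use std::collections::BTreeMap;
    use std::sync::mpsc;
    use std::thread;

    fn main() {
        let chunks: Vec<String> =
            (0..4).map(|i| format!("results for chunk {i}\n")).collect();
        let (tx, rx) = mpsc::channel::<(usize, String)>();
        thread::scope(|s| {
            for (i, chunk) in chunks.into_iter().enumerate() {
                let tx = tx.clone();
                // Pretend each worker does an expensive search over its chunk.
                let _ = s.spawn(move || tx.send((i, chunk)).unwrap());
            }
            drop(tx); // so the receive loop ends when all workers are done
            // Buffer out-of-order results until the next expected chunk
            // index arrives, then flush in order.
            let mut pending = BTreeMap::new();
            let mut next = 0;
            for (i, out) in rx {
                pending.insert(i, out);
                while let Some(out) = pending.remove(&next) {
                    print!("{out}");
                    next += 1;
                }
            }
        });
    }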
krep isn't big. You can skim its source code in a few minutes and get a good idea of how it works.