
ag is slower than ripgrep in all of my benchmarks (https://github.com/hungptit/fastgrep). ag might also be slower than GNU grep when files are stored on fast storage devices such as SSDs, even though GNU grep is single-threaded.



Those benchmarks are interesting. They are more thorough than most, but there are still a few fairly significant problems with them. Firstly, I don't see any verification that the commands are producing the same or similar output (e.g., a match count). Secondly, you split out fgrep into separate mmap and no-mmap benchmarks, but didn't do the same for ripgrep. Thirdly, some of the flags you're giving to ripgrep are a bit weird. For example, why are you asking it to follow symlinks? None of the other commands do that (or at least, grep doesn't). Fourthly, it looks like your benchmark corpora are pretty small. That's going to introduce a fair bit of variance. Hyperscan in particular might let you show a bigger performance difference on larger inputs.

Also, your README describes "modularity" as a difference between fgrep and ripgrep, but ripgrep appears _significantly_ more modular. Its components are split into several libraries, each with its own good API documentation. You might consider mentioning that, as your README kind of makes it sound like ripgrep isn't available as a library whereas fgrep is.
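To make the first point concrete, here is a minimal sketch of the kind of output check I mean. It assumes each tool prints one line per matching line, and the helper names (`count_output_lines`, `mismatches`) are made up for illustration; they are not part of any of the tools discussed here.

```python
import subprocess
from typing import Dict, List


def count_output_lines(argv: List[str]) -> int:
    """Run a command and return how many lines it wrote to stdout."""
    # grep-like tools exit with status 1 when nothing matches,
    # so the exit status is deliberately not treated as an error.
    result = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return len(result.stdout.splitlines())


def mismatches(counts: Dict[str, int], reference: str) -> Dict[str, int]:
    """Return every tool whose count differs from the reference tool's count."""
    expected = counts[reference]
    return {name: n for name, n in counts.items() if n != expected}


# Hypothetical usage: the pattern and corpus path are placeholders.
# counts = {
#     "grep": count_output_lines(["grep", "pattern", "corpus.txt"]),
#     "rg": count_output_lines(["rg", "pattern", "corpus.txt"]),
# }
# print(mismatches(counts, reference="grep"))
```

An empty result means all tools agree with the reference on the number of output lines. It says nothing about whether the lines themselves are identical, so diffing the sorted output would be a stricter check.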


@burntsushi I used "-L" by mistake in the benchmark. I have updated the README file and the performance benchmark results using the latest version of ripgrep. For searching lines in the Boost source code, ripgrep 11.0 is the clear winner, i.e., 50% faster than GNU grep 3.3. I tried to be as fair as I can in the performance comparison, so feel free to create a GitHub ticket if you see any issue in my benchmarks. I do have unit tests to make sure that fastgrep produces correct results, and I did many manual tests to make sure that the output of fastgrep is consistent with the output of GNU grep and ripgrep. Note that the matched lines may not be the same for a search pattern. This has been explained here https://rust-leipzig.github.io/regex/2017/03/28/comparison-o... and @glangdale, the author of Teddy, would be the best person to ask.

BTW, I see a 20% performance jump from rg-0.10 to rg-11.0 for a single file benchmark. What are the key differences between these two versions?


> BTW, I see a 20% performance jump from rg-0.10 to rg-11.0 for a single file benchmark. What are the key differences between these two versions?

I don't know. It would be easier to explain if I could more easily see what actual commands are being run. Your README just has you running `./all_tests`, but I want to see the actual command line invocations so that I can reproduce them. I'm also not sure which benchmark in particular you're referring to, so I don't know which corpus to use. Look at ripgrep's README for an example of what I mean. All the inputs are carefully specified and the commands being run are clear. You can even see the raw commands for the full benchmark suite: https://github.com/BurntSushi/ripgrep/blob/master/benchsuite...

I realize doing benchmarks right is a lot of work. So if you just have a particular command for me to try and compare performance, then I'd be happy to just do that.
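As a rough sketch of what "clear commands" could look like in a harness: the function below records the exact command line next to its timings, so anyone can copy it and rerun it. The function name, the number of runs, and the reported fields are all assumptions for illustration, not how either project's benchmark suite actually works.

```python
import shlex
import statistics
import subprocess
import time
from typing import Dict, List, Union


def time_command(argv: List[str], runs: int = 5) -> Dict[str, Union[str, int, float]]:
    """Time a command several times and report it with its exact invocation."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output goes to a pipe rather than /dev/null, since some grep
        # implementations may stop early when stdout is /dev/null.
        subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        samples.append(time.perf_counter() - start)
    return {
        "command": shlex.join(argv),  # copy-pasteable shell command
        "runs": runs,
        "median_s": statistics.median(samples),
        "min_s": min(samples),
    }


# Hypothetical usage: the pattern and corpus path are placeholders.
# print(time_command(["rg", "pattern", "corpus.txt"]))
```

Reporting the median and minimum over several runs also helps a bit with the variance you get from small corpora.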

> Note that the matched lines may not be the same for a search pattern

Indeed. ;-) That's exactly why I asked. That's a really important UX concern IMO.


The latest version of fastgrep does not highlight the matched text segment, and I think there is a reasonable amount of work that I have to do to make sure that the highlighted text makes sense. Again, writing a usable grep command is very hard, and I appreciate the effort that you have spent on developing and maintaining ripgrep. BTW, I do use ripgrep in my daily workflow and only use my fastgrep command when I have to deal with huge text files.



