Hacker News new | past | comments | ask | show | jobs | submit login
Why GNU grep is fast (2010) (freebsd.org)
511 points by jacobedawson 57 days ago | hide | past | web | favorite | 218 comments



It's probably worth pointing out that OS X is using BSD grep. You will see a phenomenal difference if you install the gnu grep via homebrew:

    $ homebrew info grep
    GNU grep, egrep and fgrep
    https://www.gnu.org/software/grep/
    /usr/local/Cellar/grep/3.3 (21 files, 885.3KB) *
      Poured from bottle on 2019-01-03 at 12:23:37
    From: https://github.com/Homebrew/homebrew-core/blob/master/Formula/grep.rb
    ...
This is a very arbitrary example, but I've got 1.5Gb application log sitting on my machine right now that'll suffice. I'll even do this in a way that might give a slight performance advantage to BSD, by using gnu grep first:

    $ time /usr/local/bin/ggrep "foobarbaz" application.log
    
    real 0m1.319s
    user 0m0.948s
    sys 0m0.345s
vs OSX's grep:

    $ time /usr/bin/grep "foobarbaz" application.log
    
    real 0m37.225s
    user 0m31.036s
    sys 0m1.286s
1s vs 37s. Same file.

There's nothing odd about the file, straight boring application logs, and the line starts with the logging level in capital letters. So I can use a regex that is pinned to the start of the line, about as optimal as you can get:

first gnugrep:

    $ time /usr/local/bin/ggrep -c "^INFO" application.log
    1527786
    
    real 0m0.622s
    user 0m0.323s
    sys 0m0.292s
then OS X's native bsd grep:

    $ time /usr/bin/grep -c "^INFO" application.log
    1527786
    
    real 0m3.588s
    user 0m3.206s
    sys 0m0.349s
BSD grep was significantly better than the prior search, but it is still notably slower than the gnu grep.

In my experience, this isn't the most pathological example, but it's close. Note that this also applies to other tools that leverage grep under the surface, like zgrep etc.

I've specifically aliased grep over to ggrep in my shell so that I'm avoiding bsd grep whenever I can:

    $ alias grep
    alias grep='/usr/local/bin/ggrep'


All the OS X tools are horribly broken due to something fucked up inside their unicode support.

try something like tr:

  time gtr a b < application.log > /dev/null
  time tr a b < application.log > /dev/null
if you set LC_ALL=c it might be a little faster.

  $ time gtr a b < http_10x.log > /dev/null

  real 0m0.061s
  user 0m0.032s
  sys 0m0.028s

  $ time tr a b < http_10x.log > /dev/null

  real 0m2.284s
  user 0m2.267s
  sys 0m0.014s


The same LC_ALL=C trick also speeds up GNU tools, sort in particular:

    $ shuf /var/log/syslog > shuf-syslog
    $ time sort shuf-syslog > /dev/null

    real    0m0.536s
    user    0m0.870s
    sys     0m0.031s
    $ time LC_ALL=C sort shuf-syslog > /dev/null

    real    0m0.094s
    user    0m0.109s
    sys     0m0.024s
It's more dramatic with a bigger file, of course.

I once got annoyed with sort's speed, threw together a parallel external sort program that dramatically outperformed it, then realized the difference was not as dramatic with LC_ALL=C...oh well, it was a fun afternoon program anyway.


In fairness, when I'm dealing with unicode data which is actually unicode data, I don't want to set `LC_ALL=C` and fuck up my data set.


It's certainly worth considering, but the answer depends on what you're doing with it. If your goal is to binary search over it for an exact string match, for example, the important thing is that your sorting and searching code are consistent. Might as well have both use the simpler/faster LC_ALL=C semantics.


Actually sorting should be fine. The order of UTF-8 will always be the same as the codepoints encoded under it if you sort it as binary.

I really appreciate this property of UTF-8, and I can highly recommend to do a pen&paper exercise to see why it preserves order :-)

It wouldn't work if you want any type of normalisation and especially solving composition/decomposition.


Locale specific sorting is not the same as sorting by codepoint. The Unicode collation algorithm is fairly involved: https://www.unicode.org/reports/tr10/


If you are just using sort because you need to use uniq then you can replace the whole thing with an awk command that gets unique lines without sorting. I think it's faster.


Yep. For this reason I override the standard UNIX tools on macOS with Nix + home-manager. E.g.:

  ~ % which grep
  /Users/daniel/.nix-profile/bin/grep
  ~ % which sed
  /Users/daniel/.nix-profile/bin/sed
  ~ % which ls
  /Users/daniel/.nix-profile/bin/ls


So far as I can see, setting LC_ALL=c shaves about 5 seconds off the "foobarbaz" case, and nothing off the "^INFO" case.


It is definitely worth pointing out that MacOS is not using BSD grep. Apple has not tracked the updates, I am told.

* https://unix.stackexchange.com/a/398249/5132


I stand corrected. May be more accurate to say "BSD derived grep"


also worth updating your initial comment to clear that up?


It's not possible. Comments only have a small window in which they can be edited.


Doesn’t that doesn’t mean OS X isn’t using BSD grep. That just means it’s using an outdated version of BSD grep.


Just don't forget about these aliases since GNU grep and the builtin grep can vary in what they support. I once ran into a head-scratching afternoon where `sed` would work with GNU extensions from the command line but not inside a shell script, and it took far too long to figure it out!

https://stackoverflow.com/questions/40183218/why-does-sed-be...


I did the very same thing, and asked about it on SO too! https://stackoverflow.com/questions/54658932/grep-v-f-of-emp...


Keep in mind the "BSD grep" in OS X is some ancient version. Some work was done in the last few years to improve it somewhat. Today, I'd skip over both tools in favor of smarter searchers like Ripgrep or Ag (the_silver_searcher).


Just a quick callout that pinning the search to start of line does nothing. It is just another character, after all.

Now, getting it to report line numbers will kill it, since you then have to scan every character.


Why wouldn't pinning to start of line help for some patterns/data? You do still have to find the next newline, but you don't have to make other, additional comparisons once ^x is false, for that line.

Or, roughly, grepping for '^x' instead of 'x' should result in less work if the line does not contain x in the first character. One fewer comparison for each character after the first.


You seem to be thinking about handling a line at a time. In that context, checking for an x at the beginning would be much faster than looking for an x anywhere in the line. However, for anything fast you're going to be handling a large buffer at a time, if not the entire memory-mapped file. In that context, there's no difference between ^x and yx, except that ^x also needs to match at start-of-file (so if anything it should be marginally slower). Finding just x is probably faster than finding yx - longer patterns can be faster when you can skip looking at some fraction of the input (as per TFA) but I expect 2 characters isn't enough to allow that.


Just to elaborate a little on my quick callout. Doing line oriented searches actually requires you to look at every character, if only to find all the line breaks.

Instead, treat the file as a collection of characters and just look for patterns. Which may include the line break character.

That make sense?

Edit: I should also add that you should look for burntsushi's posts. Turns out some of this had become out of date. And largely depends on size of what you are searching.


newline is just another character, so grepping '^x' should be no different than grepping 'yx'.


Except at start-of-file.


I'm guessing the instructions necessary to special case beginning of file are in the nanosecond range.


Probably not much performance difference, for sure.


It's a pity that Apple lets license politics outweigh technological excellence in the software it chooses to ship.

Way back when, the first thing I'd do on any non-GNU host was to install a full GNU userland. I could write a book on my issues with GNU tools, but they are on balance preferable to the alternatives. All IMHO, of course.


Apple ships GPL v2 software, but they don't ship any GPL v3 software.

I'm uncertain if it's the patent clause or the fact that apple prevents you from modifying and running changed software (such as on the iphone).

What's funny is that I think they are technicall in violation of the GPL (their modifications to the GPL v2 bash are not entirely distributed)


> It's a pity that Apple lets license politics outweigh technological excellence in the software it chooses to ship.

Is there much else they could do? Don't see how Apple could reasonably ship GPL code...


While you're at it, could you also benchmark ripgrep and silversearcher on the same exact file? Thanks!


I personally use ripgrep instead.


You can also do `brew install grep --with-default-names` to avoid having to alias.


You can no longer do this because homebrew/core has dropped options.


In the latest version, if you need to use these commands with their normal names, you can add a "gnubin" directory to your PATH from your bashrc like: PATH="/usr/local/opt/grep/libexec/gnubin:$PATH"


Wow, many thanks. I actually do a lot of greps regularly, partly on OS X. Will replace now.

If anyone knows other tools where the gnu/home brew alternative is much better, please chime in (already saw tr on another comment).


How can it crunch through 1GB in 1s? Even just reading the data would take longer on any system I know.


According to Apple's MacBook Pro marketing page [0], their SSD supports sequential read speeds of up to 3.2GB/s. So this isn't as far-fetched as you imply, even if we're discounting other factors like the filesystem cache.

[0] https://www.apple.com/macbook-pro/


Yes, I see these specs all the time but I never see them materialize in the real world. Agree about the cache but that's also an easy way to fool yourself when you are measuring performance.


> easy way to fool yourself when you are measuring performance

In this case isn't the "already in RAM" test a more accurate reflection of performance anyway, as we are talking about the performance of grep and not the IO subsystem?

There are many cases where grep's performance won't be bottlenecked by IO, or at least not impacted directly by a bottleneck there. Anywhere when the input is coming from another program, essentially, and even if that task is IO bound it might be sending output to grep much faster than it is consuming input (perhaps gunzip <file | grep searchtext).

And in the case of searching a log file interactively, it is common that you won't just run grep of the file just once in a short space of time, instead doing it a couple of times as you refine your search, so for most of the runs it will be in cache (assuming the file is not to large).


The data could also be in the VFS cache/buffers for a number of reasons depending on the system setup and access pattern; including being written.


This usage is literally ideal for pretty much any file I/O - it's a straight sequential read. Even spinning rust will achieve >400MB/s on this type of load. Take a look at the sequential burst speeds at QD1, first chart: https://www.anandtech.com/show/13761/the-samsung-970-evo-plu...

Nearly ever SSD listed achieves well over 1GB/s in an actual benchmark, not just on a spec sheet. And these are just boring old off the shelf consumer drives. Nothing crazy.


HDDs almost never sustain 400 MB/s unless you are talking about something pretty exotic. 5200 RPM drives are generally in the 100-130 MB/s range and 7200 proportionally faster but still usually under 200 MB/s.


https://www.tomshardware.com/reviews/wd-red-10tb-8tb-nas-hdd...

So yeah maybe not over 400MB/s, but all of them are over 200MB/s. Sequential speeds really spiked as densities kept increasing.


Definitely seeing GB/s IO spikes on my samsung NVMe drives. E.g. when persisting application state into sqlite.

Note that you're not going to get this with SATA SSDs, you need NVMe, it's a 5x difference in throughput and IOPS.


On my puny laptop with an SSD I get ~400MB/s from cold cache, and ~5GB/s after the first run. So the answer is likely "it's in the FS cache".

That's a very common use-case with grep. Either grepping a file you recently wrote, or running grep multiple times as you refine the regex, at which point the files will be in the FS cache.


The SM0512G in this machine is capable of over 1.5GB/s in sequential read.

It's also possible that the file is cached in memory (I ran grep a few times through the file before I carried out the specific measurements).


As TFA mentions,

> #1 trick: GNU grep is fast because it AVOIDS LOOKING AT EVERY INPUT BYTE.

TFA is incredibly short, and will explain it much better than I can.


> it AVOIDS LOOKING AT EVERY INPUT BYTE

This would not help, since the backing storage doesn't provide support for this kind of resolution. It would end up reading in the entire file anyways, unless your input string is on the order of an actual block.


Sure it would help, not for the IO part, but the CPU-bound part of actually checking each character, which is apparently a much lower bound in this case.


Yeah, that's why the article talks about decreasing the amount of CPU work. From the context of disk IO though (which is what this thread seems to be about) this can't help.


DMA would like a word with you


I'm still not seeing how this significantly changes things?


Loading the file from disk into memory doesn’t require reads by the CPU. That’s significantly different than the cpu doing comparisons (or even reads) on each byte.


That doesn't avoid it needing to read the data off disk though. AFAIK even SSDs still only read in 4K chunks.

As the other reply mentioned though, it's just that MacBook SSDs are that fast.


All SSDs are that fast (and way faster). Nothing special on Macbook ones.


"GNU grep is fast because it AVOIDS LOOKING AT EVERY INPUT BYTE"

Somewhat confusing since it has to look at every byte to find the newlines. They are using a pretty specific definition of "look".


This is why the linked article specifically says it does not look for newlines:

> Moreover, GNU grep AVOIDS BREAKING THE INPUT INTO LINES. Looking for newlines would slow grep down by a factor of several times, because to find the newlines it would have to look at every byte!


... until you ask for line numbers.


It can't avoid breaking the output into lines, though, so it probably looks for newlines after a match is found.

I assume the Boyer Moore preprocessor reads a lot of bytes also.

Not disputing it's more efficient, but there's no magic. It avoids reading some bytes when and if it can.


> It can't avoid breaking the output into lines

It can whenever you don't ask for line numbers, can't it?

> It probably looks for newlines after a match is found

Probably, yeah. Counting number of newlines in a range, without caring just where they fall, can probably be pretty darned fast with vector instructions. Idk if that's worth the additional pass, though.


It doesn't need to look for newlines. That is looking for ^ABCD is not harder than looking for ABCD. The set of matches for the former is a subset of the latter, so if you have some fast way of doing the latter you have a fast way of doing the former (with an additional check for the preceding newline).

Another way of looking at is just considering the ^ another character (plus one-off special handling for start of file).


Others have mentioned the SSD and VFS cache, however spinning disks in a RAID configuration can easily surpass this in raw sequential read performance.


Not really "easily" - the second test does 1.5 GB in 0.622s for a throughput of 2.41 GB/s.

If we assume something like 100 MB/s sustained for spinning disks, that's a lot of disks to get to 2.41 GB/s even ignoring overheads.


100 MB/s sustained is very slow these days. Even a Barracuda budget 7200 will hit >150 MB/s real world and it goes up from there.


Even at 200 MB/s that is at least a dozen disks in zero redundancy RAID to get the needed speed.

This test was hitting the OS disk cache.


NVMe SSD's can read about 3GB/s these days.


Faster than that even, but you're not likely to see them in laptops at the moment :D

Samsung's PM1725b (https://www.samsung.com/semiconductor/ssd/enterprise-ssd/MZP...) has a Seq. Read of 6300 MB/s and Seq. Write of 3300 MB/s.


This is answered in the article.


No it isn't. Unless grep is capable of skipping entire blocks (4096 bytes) it still has to pull that data off the disk.


No because most blocks will be found in the file system cache.


GNU grep is an excellent tool! A similar tool that is rising in popularity is ripgrep (https://github.com/BurntSushi/ripgrep), which can be orders of magnitude faster than GNU grep. Give it a try!


In my personal tests, it's not at all orders of magnitude faster than GNU grep. In fact, for single file, non-unicode greps (most of my usage) the difference is so small as to be imperceptible when interactively using it.

Even for multi-file scenarios, the difference is nowhere near close the difference between GNU grep and BSD grep. This means that compatibility with GNU grep takes priority for me and it's not worth switching over to ripgrep.


It's a simple break down of communication. I try to be upfront about this, but no matter how hard I try or how many times I try to clarify it, I can't stop the spread of inaccurate claims. (And my goodness, this entire HN thread has numerous inaccuracies just with GNU grep alone.)

On equivalent tasks, ripgrep is not orders of magnitude faster than GNU grep, outside of pathological cases that involve Unicode support. (I can provide evidence for that if you like.)

For example, in my checkout of the Linux kernel, here's a recursive grep that searches everything:

    $ time LC_ALL=C grep -ar PM_RESUME | wc -l
    17

    real    1.176
    user    0.758
    sys     0.407
    maxmem  7 MB
    faults  0
Now compare that with ripgrep, with a command that uses the same amount of parallelism and searches the same amount of data:

    $ time rg -j1 -uuu PM_RESUME | wc -l
    17

    real    0.581
    user    0.187
    sys     0.384
    maxmem  7 MB
    faults  0
Which is 2x faster, but not "order of magnitude." Now compare it with how long ripgrep takes using the default command:

    $ time rg PM_RESUME | wc -l
    17

    real    0.125
    user    0.646
    sys     0.654
    maxmem  19 MB
    faults  0
At 10x faster, this is where you start to get to "order of magnitude" faster claims. But for someone who cares about precise claims with respect to performance, this is uninteresting because ripgrep is 1) using parallelism and 2) skipping some files due to `.gitignore` and other such rules.

You can imagine that if your directory has a lot of large binary files, or if you're searching in a directory with high latency (a network mount), then you might see even bigger differences from ripgrep without generally seeing a difference in search results because ripgrep tends to skip things you don't care about anyway.

In summary, there is an impedance mismatch when talking about performance because most people don't have a good working mental model of how these tools work internally. Many people report on their own perceived performance improvements and compare that directly to how they used to use grep. They aren't wrong in a certain light, because ultimately, the user experience is what matters. But of course, they are wrong in another light if you're interpreting it as a precise technical claim about the performance characteristics of a program.


> [...] because ripgrep tends to skip things you don't care about anyway.

so you bring the trick #1 from the grep author to a new level:

> #1 trick: GNU grep is fast because it AVOIDS LOOKING AT EVERY INPUT BYTE.

You stop looking at entire files, which can avoid a lot of bytes to go through.

Also parallelisms surely helps with a lot of files, or big ones really big ones which could be processed in chunks.


No, that's a bad trick to rely on for speed these days. GNU grep's speed comes from using memchr in its skip loop. Skipping bytes is only going to matter when the thing you're searching for is longer. For short needles, there isn't much opportunity for skipping, so it's better to say inside a vectorized routine like memchr (which, for glibc, is written in Assembly on most platforms).


Whoa, which "time" command are you using that provides memory and page fault info?


I'm using the `time` built-in from zsh, with this config in my ~/.zshrc:

    TIMEFMT=$'\nreal\t%*E\nuser\t%*U\nsys\t%*S\nmaxmem\t%M MB\nfaults\t%F'


From what I can tell, the GNU time command has a -f "%F" flag that will provide it.

http://man7.org/linux/man-pages/man1/time.1.html


Excellent tool. rg is very fast. I am using it at work now :)


Second that. Well done.


^ this. For common usage of searching for decently long patterns (>3-6 bytes) with several literal characters/limited wildcarding on a text with a largish alphabet, it is algorithmically very difficult to do _far_ better than GNU Grep’s Boyer-Moore-based implementation.

That situation changes when you have very short patterns, very long patterns, many patterns, or small alphabets (eg DNA). As burnsushi notes, ripgrep’s performance difference for common feel usage comes more from being smarter about its input than from algorithms (its algorithms are solid, of course).


So if you cripple ripgrep, to act like grep, it is not much faster. This is kind of obvious. Ripgrep is faster because it makes assumptions, that will hold true for a large part of its users.

This is like saying "if I remove wings from a plane, then it won't go much faster than my car. So a plane is not technically faster than a car"

Of course, users should know the difference between grep and ripgrep (especially with tweaks like .gitignore, which could be confusing if you don't know).


They should also know that

  git ls-files -z | xargs -0 grep -nHE <regexp>
can be almost as fast.


FYI one thing I have found that rg is much slower than GNU grep is when I have 100K exact match patterns.


Yup, that's a known bug. It will be fixed in the next release. I've already done the leg work to fix it, and it should be on master soon. My local copy of ripgrep can now search with my local /usr/share/dict/words as exact patterns just as fast as grep.


> Which is 2x faster, but not "order of magnitude."

Well, it's a base-2 order of magnitude!


There’s a reason VSCode uses ripgrep under the hood for searching. It can make a huge difference in a large (1M+ LOC) project.


This "huge diffrence" is just miliseconds for a 20M LoC, as the author showed in this thread using the linux kernel source, he gives importance to the UX when compared to grep though.


Read the second half of his comment and ask what percentage of the Linux kernel source would be excluded by file type rules versus other projects. I mostly work on web and data archival projects and ripgrep is noticeably seconds (or more!) faster because it can skip images, archives, etc.

If you’re talking about the core algorithm, it’s not that much faster but the clever adjustments for common patterns and modern CPU architectures also counts for real world results. It’s just important to be precise so people know there hasn’t been some fundamental breakthrough in searching.


For all my searches so far, ripgrep is about 30-100x faster. Perhaps you're on some hardware which ripgrep isn't well suited for.


That is simply impossible.

You are measuring the wrong thing or a case where rg can skip most files.


+1 for ripgrep. It's crazy fast


It's more that it uses concurrency intelligently. The underlying stuff is similar to GNU grep.


ripgrep's README has this to say about speed:

> ripgrep is built on top of Rust's regex engine. Rust's regex engine uses finite automata, SIMD and aggressive literal optimizations to make searching very fast. Rust's regex library also maintains performance with full Unicode support by building UTF-8 decoding directly into its deterministic finite automaton engine.


The author said in a sister thread[1] that for similar tasks the performance is comparable. ripgrep (and ag) both feel incredibly fast because they skip whole files and directories which are usually unwanted when searching.

[1]: https://news.ycombinator.com/item?id=19523263


Worth mentioning that the main author of ripgrep is also the main author of Rust's regex library, so their development nicely complements each other.


Early on in my career, like most novice programmers, I thought that custom written C programs could be much faster than unix tools if written well and for a specific purpose. However, I could not beat the speed of unix tools like grep, cut or cat even once. That is when I realized just how well written these tools are and just how much optimization work has been done.


It's amazing to me how programs like these have seemed to avoid the universal phenomenon of technical debt. They've crystallized into an ideal version of themselves, and haven't continued to decay past that point. Maybe it's because of the Unix philosophy of single-purpose programs; no feature-creep tends to mean no technical debt.


By keeping a narrow focus and avoiding feature creep, the technical debt has been pretty minimal, and they've been able to pay it down easily. There's at least something to learn from the approach.

It does also help that there is a certain type of developer that sees such things as a personal challenge and will go through hell and high water to come up with more and more efficient ways to do things.


There is truth in what you say, however, if you think GNU tools are free of technical debt or feature creep, look into how ./configure works, as an example.

Or, there's the old joke about GNU echo: https://www.gnu.org/fun/jokes/echo-msg.en.html


Autoconf code is actually quite clean--you just need to know M4 and have an appreciation of the problems autoconf was built to overcome. What changed over time is that most proprietary unix systems disappeared[1], and POSIX compliance and general consistency have improved dramatically, BSDs and GNU included.

Also, best practices have shifted to a continuous development model which keeps everybody on the upgrade treadmill. There's less concern with maintaining backwards compatibility and catering to those not running the latest environments. So if you make use of some newer Linux kernel API there's only a short window where people will put in the effort to maintain a compatibility mode, assuming they bother at all.

Lastly, containers mean people often develop and ship static environments that can be maintained independently, sometimes never upgraded at all.

What I find interesting is how people have begun to ditch autoconf in favor of even more complex (but newer and therefore cooler) build systems when ironically there's less need than ever for these things. Autoconf doesn't need replacing; such layers can often be left out entirely.

That said, when feature detection and backwards compatibility truly matters there's no good alternative. CMake, for example, effectively requires the end user to install the latest version of CMake, and if you already expect someone to install the latest version of something then why the contrivance at all? I always sigh aloud whenever I download a project that relies on CMake because I know that I now have two problems on my hand, not just one. (But better CMake than the other alternatives--I just won't even bother.)

[1] All that's left are AIX and Solaris. HP-UX/PA-RISC will be officially dead next year, and EOL for HP-UX/Itanium is 2025. From a commercial perspective Solaris seems to be deader than AIX, however Solaris still seems to see more development--especially improved POSIX and Linux compatibility. It's much easier to port to Solaris than AIX. It's a real shame Solaris is disappearing because on big machines with heavy workloads the OOM killer is a fscking nightmare on Linux. Solaris and Windows (and maybe AIX?) are the only operating systems that do proper and thorough memory accounting, permitting you to write reliable software.[2] The cloud services principle that says individual processes are expendable doesn't work when your job takes hours or days to run. (Or even just minutes, because workloads accumulate when the OOM killer starts shooting things down, and even in the cloud you run into hard limits on resource usage--i.e. cap on numbers of nodes. Memory overcommit is just like network buffer bloat--intended to improve things at the small scale but which results in catastrophic degradation at the macro level.)

[2] You can disable overcommit on Linux but the model of overcommit is baked too deeply into the kernel's design. A machine can still end up with the OOM killer shooting down processes if, for example, the rate of dirty page generation outpaces the patience of the allocator trying to reclaim and access memory that is technically otherwise available.


Well, there's a lot of gross things about CMake (the syntax of its DSL is horrifying), but in my experience if you want Windows support it's a lot better than autoconf and make.


good post. i agree that the main problem that autoconf solves has gone away. cmake is regularly a pain point. its own version, as well as how un-understandable and un-debuggable the cmakelist.txt files are. sharing code on unix systems used to be painful. now the whole world is linux. until about 10 years ago, the whole world was 'gcc', but nowdays it is more likely to have a clang/gcc difference than it is to autoconf to make the right header to include.


> There is truth in what you say, however, if you think GNU tools are free of technical debt or feature creep, look into how ./configure works, as an example.

./configure is a sad thing, and the cluster of madness around it is even worse (autoconf, automake, and the ridiculous libtool). However, most gnu stuff is excellent, including gnu make. Fortunately, today most unix systems are sufficiently posix-compliant so that you can ship a gnumakefile that builds your program directly without much fuss.



That's romanticizing it a bit... there's been a ton of feature creep[1] and a fair amount of technical debt. It's just more manageable because they've contained the scope of most of the core utilities and packages better than most 'modern' software. Also, don't underestimate the value of having a relatively stable set of maintainers over long periods of time.

[1] If you think about it, things like colorization belong in grep about as much as SVG generation belongs in systemd.


How would you do colorization of matches outside of grep, other than reimplementing grep?


By optionally emitting generic structural markup which you could then use to format via colorization (or whatever other means you wanted) in a terminal, a GUI, a web page, etc. That would have been more consistent with the 'one tool, one job' philosophy of Unix. So then you could do something along the lines of 'grep --emit-markup ... | colorize-markup' (completely made up switch and command names) and use aliases/environment variables/etc. so that to a user could type something like 'cgrep' to get colorized output.


but the structure generator and decoder would be so closely coupled they might as well be the same program.

Alternatively, you could have grep take a list of colors from a config file or environment variable and blindly interpolate those strings to make colored output[1]. This also allows color to show up automatically for interactive use.

[1] this is in fact how grep works.


That's not all one might want to do the structured out.

Think of converting to HTML, as one example.


ripgrep does have a JSON output mode, if you're curious.

Of course, ripgrep handles coloring like GNU grep does.


lsusb | grep --color=always Linux | aha > grepped.html


I guess what I meant was, most software projects eventually reach a point where they have to be burned down and rebuilt (or simply deprecated in favor of a younger project). Depending on the quality of the engineering it could be 5 years or 25, but it feels like an inevitability. These single-purpose GNU tools seem to be free of that phenomenon.


> most software projects eventually reach a point where they have to be burned down and rebuilt (or simply deprecated in favor of a younger project)

Is it really that, or do new people just come along and rebuild the same thing again and again without firm understanding of the older thing?

One approach would be to call GNU tools very exceptional and marvel at them (keep in mind its origins as a clone of older Unix stuff), but perhaps it's more appropriate to ask tougher questions of people operating in this other modality, that you are calling normal expectations.


If it only does one thing it is a tool.

If it does many things it becomes a subordinate with an intelligence of its own, something you need to communicate with or talk to, as opposed to something you can simply use


It takes a lot of focused work to keep them cohesive and performant

https://www.gnu.org/software/coreutils/rejected_requests.htm...


GNU Grep’s source code is nigh-unreadable, and edge-case bugs are not unknown. I would say it has a fair amount of technical debt.


It's possible to beat the speed of cat, actually. See e.g. https://hub.darcs.net/vmchale/fastcat. One trick is to not free the memory used by the buffer :p


If you're on Linux, you can go way faster by using the splice syscall. :) See https://matthias-endler.de/2018/fastcat





Thanks! It's great to provide links to previous discussions.

Just so everybody knows: the links are for curiosity purposes. Reposts are ok after about a year, as the FAQ explains: https://news.ycombinator.com/newsfaq.html


For many situations the more complex substring search algorithms gave way to raw, brute force some time ago, I believe. For example, if you are looking for a given relatively short string, you can just take a prefix of say four bytes and then make parallel comparisons within a vector; this simple technique gets you already down to the general area of 1 cpb. The SSE 4.2 PCMPSTR family of instructions is basically the same thing but microcoded, and is a bit faster.

For short patterns, which are imho by far the most common use, any algorithm that tries to be smart and skip a couple bytes wastes cycles on being smart where a simpler brute force algorithm has already fetched the next 16 bytes and started to compare them while the prefetcher already went off to get the next line from L2.


When did you measure SSE4.2 against straightforward vector usage and find it faster? In my experience this was true more or less never. There's almost nothing in SSE4.2 that can't be done better with SSSE3. Intel has slowly deprecated SSE4.2 by not promoting it to wider vectors (it is still 128b when most other instructions are 512b) and letting the throughput/latency stagnate.

1 cycle per byte is not a good result for single string search; you can practically do a shift-and comparison of your input in that speed using general purpose registers.


That work was all in the context of searching for a binary string with a mask. I didn't try too long to optimize it, since performance was quickly satisfactory, so it's not just possible, but rather very likely that my implementation isn't particularly good. I'll keep your advice about SSE 4.2 in mind should I revisit that code.

https://github.com/rust-lang/regex/blob/master/src/literal/t... might be of interest to casual readers (not necessarily you, considering you were probably very involved in developing that algorithm ;)


I wouldn't use "Teddy" to look for single strings, at least not without heavy modification. The boring approach of hunting for a predicate or two with PCMPEQB or equivalent then shift-and'ing things together has worked well in practice for that sort of thing, although it can be a bit brittle if you get the predicate(s) wrong.


Ripgrep uses SSE3 parallelization instead of skipping input bytes to get faster on current architectures.


I'm assuming it still does both. Parallel is nice. Never touching a byte is tough to beat, though.

I've often thought of making sure my IDs are uncommon characters to exploit the ability to skip a lot.


It does not. ripgrep does not use Boyer-Moore in most searches.

In particular, the advice in the OP is generally out of date. The "secret" sauce to ripgrep's speed in simple literal searches is a simple heuristic: choose the rarest byte in the needle and feed that to memchr. (The "heuristic" is that you can't actually know the optimal choice, but it turns out that a guess works pretty well most of time since most things you search have a similar frequency distribution.)

The SSSE3 optimizations come from Hyperscan, and are only applicable when searching a small number of small patterns. e.g., `Holmes|Watson|Moriarty|Adler`.

In other words, for common searches (which are short strings), it is much better to spend more time in a vectorized routine than to try to skip bytes.


> In other words, for common searches (which are short strings), it is much better to spend more time in a vectorized routine than to try to skip bytes.

Complexity analysis of substring search focuses on the number of comparisons – at least those I saw –, much like sorting, and of course that's not an accurate model at all.


>... but it turns out that a guess works pretty well most of time since most things you search have a similar frequency distribution.)

This kind of thing is an pattern unless the benefit over Boyer-Moore huge.

A small performance gain in the common-case is not worth the pain of introducing pathological cases that only bite you once you are deeply committed.

Presumably this is not too bad for ripgrep itself, as long as it falls back to something sensible when the assumption fails.


This is wrong. Boyer-Moore does the same thing. It just always selects the last byte to feed into memchr. If that byte happens to be rare, then it works great. But if it happens to be common, then it gets into the weeds pretty quickly. My suggested heuristic does the same thing, but increases the chances that it selects a rare byte.

And yes, the performance difference can be very large. Here's an example on a ~9.3GB file:

    $ time rg --no-mmap 'Sherlock ' OpenSubtitles2016.raw.en | wc -l
    6698
    
    real    3.006
    user    1.658
    sys     1.345
    maxmem  8 MB
    faults  0
    
    $ time grep 'Sherlock ' OpenSubtitles2016.raw.en | wc -l
    6698
    
    real    9.023
    user    7.921
    sys     1.092
    maxmem  8 MB
    faults  0
Notice that the pattern is `Sherlock `. The last byte is an ASCII space character, which is incredibly common. Boyer-Moore blindly picks this as the skip byte, but ripgrep uses a simple pre-computed frequency table to select a better byte as the skip byte.

> Presumably this is not too bad for ripgrep itself, as long as it falls back to something sensible when the assumption fails.

It does. That's why I said, "ripgrep does not use Boyer-Moore in most searches."


What pathological case are you referring to? The worst case I can think of is a completely random haystack in which case it is approximately equivalent to a naive search.

Even Boyer-Moore is not superior to a naive search in literally every case, e.g. short needle or large alphabet.


Interesting. Could this perform even faster if you (for large files) took a random sample of bytes before doing the search, to get the best distribution? Or just update the distribution as the search progresses.


It's probably more terrible than its worth, and you'd need to figure out how to take the sample in a way that doesn't kill performance. It would be an interesting experiment, but the usual heuristic is just to give up on fast skipping if it's ineffective.


This makes sense. Surprises me, but makes sense.

I typically search for the largest string I can nowadays. Though, I suspect even those are not large enough to tip the needle, since I'm usually just talking about a uuid.


I've been using ack (https://beyondgrep.com/) for a while and it seems to be better suited for my specific purpose -- looking into source files and avoiding things like data dumps and VCS. It may be slower than grep, but it faster in the sense that I don't spend time remembering all the parameter combinations when I search for something.

Here are a few aliases I use:

alias ack="ack --color" # color output

alias ackl="ack -l" # show file names only

acksubl () { ack -l -i "${1}" | xargs subl } # Do case-insensitive search and open files in sublime.

Turns out in practice I don't use regular expressions very often when searching text, and the most frequent question is -- where is this function/variable might have been used?


This is the journal referenced in the mailing list: https://onlinelibrary.wiley.com/journal/1097024x

Seems the articles are free, if you don't want to pay the €6085 yearly instituational subscription.

Other than Adrian's https://blog.acolyer.org/ what else is there worth watching that is closer to engineering than theory? I wish we still had DDJ or C/C++ User Journal, but today's periodicals are things like Rasperry Pi enthusiast or Monthly Minecraft Tips.


See also “The Treacherous Optimization (2006)”:

http://ridiculousfish.com/blog/posts/old-age-and-treachery.h...


One of my favorite idioms for searching for a string: find /path -type f |xargs grep blah

Has been replaced by `ag` https://github.com/ggreer/the_silver_searcher , ag is instant in a big repo compared to find|grep


it's mentioned elsewhere in this thread but you should try ripgrep

https://github.com/BurntSushi/ripgrep


ag has PCRE enabled by default. That's why I prefer it over ripgrep.


You might already know this, but the current release of ripgrep has support for PCRE2 (as opposed to PCRE1 in ag). All you need to do is pass the -P flag. If you want it enabled by default, then:

    echo -P > $HOME/.ripgreprc
and add this to your .bashrc or equivalent:

    export RIPGREP_CONFIG_PATH="$HOME/.ripgreprc"


will do, thanks !


Yes, though it should be noted that the reason it's faster is fewer command invocations and ignoring usually unwanted files (like .git/ and files listed in .gitignore).

grep has a -R option which allows you to avoid the first problem, but avoiding the second one is a bit more difficult.

Also there is ripgrep which I've heard is effectively a faster version of ag written in Rust.


> Also there is ripgrep which I've heard is effectively a faster version of ag written in Rust.

I've tried it but had it crashing on me in some directories so I'm sticking with ag


Can you file a bug report? Crashing is a serious bug, and I don't think one has been reported against ripgrep during a search for quite some time, so I'd be very keen on hearing about it.


Well, I did not find a crash (yet), but at least some deadlock (probably). https://github.com/BurntSushi/ripgrep/issues/1231


Thanks! Yeah that was the last crashing related bug, due to a problem doing streaming transcoding on one of ruby's utf16 encoded files. It's fixed on master.


I do grep ruby sources quite a bit, so if there was a crash it's probably this. Was there a release with a fix?


No. But one should be out soonish.


I will try to make it crash again, it's like half a year back so my memory is little fuzzy on where exactly it did happen.


For what it's worth, I use it extensively on several different platforms, and have never had a crash.


I love how you always just instantly appear outta nowhere when someone mentions a problem with ripgrep or makes a false statement about it :)


maybe he has scripts that grep for keywords on hacker news etc. :-)


The second one isn't that hard for common cases - I've got `grepr` for `grep -r --exclude-dir .git --exclude-dir .mypy_cache --exclude-dir __pycache__`. Adjust as you like for the particular sorts of things you commonly need to ignore.


Except you also need to do `--exclude *.o` and whatever else is present in the .gitignore of any subdirectory. I personally use "git grep" when searching in source trees, as it automatically gives you gitignore support.


I think the author phrased it somewhat differently, but my understanding is grep's high throughput also comes from its use of what we (Computer Engineering grad research group) referred to as a variable state machine. A colleague was researching implementing analogous on an FPGA, for gigabit line-speed throughput. Preferred Computer Science term is apparently Thompson's NFA. https://en.wikipedia.org/wiki/Thompson%27s_construction#The_...


Thompson's NFA construction, when used directly to search via an NFA simulation, is dog slow. It does give you a O(nm) search, but in practice it's slow. AIUI, GNU grep uses a lazy DFA, which uses Thompson's NFA construction to build a DFA at search time. This does indeed lead to pretty good performance for the regex engine. But GNU grep's speed largely comes from optimizing the common case: extracting literals from your pattern, identifying candidate matching lines, and then checking with the full regex engine to confirm (or reject) the match.


I suspect Thompson's NFA is not inherently dog slow (Glushkov can be done reasonably fast for decent-sized NFAs). The fact is that most Thompson-lineage engines opted for the 'lazy DFA' approach and optimized that (which is effective until it isn't). I imagine a more aggressive 'native' Thompson NFA is possible. A nice benefit of that is not having to write to your bytecode - there's a good deal of systems-level complexity stuff in RE2 that springs out of a consequence of the 'lazy DFA construction' decision.

That being said, matching literals is always going to be faster, especially if you decompose the pattern to get more use out of your literal matcher - the downside of filtration is that if the literal is always present, you are just doing strictly more work. At least with decomposition you've taken the literal out of the picture. See https://branchfree.org/2019/02/28/paper-hyperscan-a-fast-mul... for those who don't know what I'm talking about (I know you've read it).

Am flirting with doing another regex engine that gets some of the benefit of decomposition and literal matching without taking on the nosebleed complexity of Hyperscan...


Do you know of any fast Thompson NFA simulation implementation? I don't think I've seen one outside of a JIT.

Is there a fast glushkov implementation that isn't bit parallel? I've never been able to figure out how to use bit parallel approaches with large Unicode classes. Just using a single Unicode aware \w puts it into the weeds pretty quickly. That's where the lazy DFA shines, because it doesn't need to build the full DFA for \w (which is quite large, even after the standard DFA compression tricks).


Unicode is a PITA. In Hyperscan, it's not pretty what gets generated for a bare \w in UCP mode if you force it into an NFA (it's rather more tractable as a DFA, even if you aren't lazily generating, although of course betting the farm that you can always 'busily' generate a DFA isn't great).

I've always thought that a better job of doing NFAs (Gluskov or otherwise) and staying bit-parallel would be done with having character reachability on codepoints, not bytes, generally remapping down to 'which codepoints make an actual difference'. This sounds ugly/terrifying, but the nice thing is that remapping a long stream of codepoints could be done in parallel (as it's not hard to find boundaries) and with SIMD. Step by step NFA or DFA work is more ponderous as every state depends on previous states.


Yeah, I've looked at glushkov based primarily on your comments about it, but Unicode is always where I get stuck. In my regex engine, Unicode is enabled by default and \w is fairly common, so it needs to be handled well.

And of course, one doesn't need to bet the farm on a lazy DFA if you have one, although it is quite robust in a large number of practical scenarios. (I think RE2 does bet the farm, to be fair.)


Unicode + UCP is a perfectly principled thing, but it wasn't a design point that made any sense for Hyperscan as a default. The bulk of our customers were not interested in turning 1 state for ASCII \w into 600 states for UCP \w unless it was free.

I think both Glushkov and Thompson can be done fast, but I agree that they are both going to be Really Big for UCP stuff. Idle discussions among the ex-Hyperscan folks generally leans towards 'NFA over codepoints' being the right way of doing things.

Occam's razor suggests if you do only 1 thing in a regex system (i.e. designing for simplicity/elegance, which would be an interesting change after Hyperscan) it must be NFA, as not all patterns determinize. If you are OK with a lazy DFA system that can be made to create a new state per byte of input (in the worst case) then I guess you can do that too.

I am not sure how to solve the problem of "NFA over codepoints", btw. Having no more than 256 distinct characters was easy, but even with remapping, the prospect of having to handle arbitrary Unicode is... unnerving.


Yeah, my Thompson NFA uses codepoints for those reasons. But not in particularly smart way; mostly just to reduce space usage. It is indeed an annoying problem to deal with!


... and no, I don't know of any fast Thompson NFA simulations, but I don't see why they shouldn't be possible. They have a very simple "next" function, modulo the awfulness of getting past epsilons, but that seems to be roughly parallel to the awfulness of computing arbitrary 'next' functions in Glushkov-land. I'm not aware of anyone that's actually tried.


FWIW, BSD grep has significantly closed the gap since then, often by replication GNU approach in some ways.

Also BSD grep has other advantages, primarily it's not GNU grep.


This is curious, as I just did a test with grep and ggrep. The latter is almost 3 times faster for a very common use case I have.


Presumably on macOS? Basically all BSD- or GNU-derived tools that ship on macOS are ancient, ancient versions that deserve to be burnt away and should not be considered indicative of what the current state-of-the-art is capable of.


grep -V ? ldd grep ? Use case sample ?


grep (BSD grep) 2.5.1-FreeBSD

The other commenter in this thread pointed out that this is a very old version and the newer bsd version is better.

    real 0m2.044s
    user 0m1.932s
    sys 0m0.085s
    wgl@pondera:~$ time grep LiteonTe *.text | wc -l
       11020
    
    real 0m1.939s
    user 0m1.905s
    sys 0m0.038s
    wgl@pondera:~$ time ggrep LiteonTe *.text | wc -l
       11020
    
    real 0m0.130s
    user 0m0.087s
    sys 0m0.037s
    wgl@pondera:~$ time ggrep LiteonTe *.text | wc -l
       11020
    
    real 0m0.119s
    user 0m0.088s
    sys 0m0.035s
    wgl@pondera:~$ du -h -s *.text
    128M Kismetkismet-kali-pondera-20190325-08-46-27-1.pcapdump.text
First one was done to cache then the number discarded. Thus the 'grep' you see above is the second run over the 128 mb pcap file expanded with tshark.

Dramatic.

I'll stay with the gnu grep and not update the regular ones for now.


It would help if you tested just grep when benchmarking grep. These datapoints tell a much different story.

  # /usr/bin/grep -V
  grep (BSD grep) 2.6.0-FreeBSD
  
  root@m6600:~ # /usr/local/bin/grep -V
  grep (GNU grep) 3.3
  Copyright (C) 2018 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.
  
  Written by Mike Haertel and others; see
  <https://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
  
  root@m6600:~ # /usr/bin/time /usr/bin/grep X-User-Agent packetdump.pcap -c
  60
          0.54 real         0.45 user         0.07 sys
  root@m6600:~ # /usr/bin/time /usr/bin/grep X-User-Agent packetdump.pcap -c
  60
          0.54 real         0.44 user         0.08 sys
  root@m6600:~ # /usr/bin/time /usr/bin/grep X-User-Agent packetdump.pcap -c
  60
        0.54 real         0.41 user         0.11 sys
  root@m6600:~ # /usr/bin/time /usr/local/bin/grep X-User-Agent packetdump.pcap -c
  60
          0.58 real         0.49 user         0.08 sys
  root@m6600:~ # /usr/bin/time /usr/local/bin/grep X-User-Agent packetdump.pcap -c
  60
          0.60 real         0.48 user         0.11 sys
  root@m6600:~ # /usr/bin/time /usr/local/bin/grep X-User-Agent packetdump.pcap -c
  60
          0.59 real         0.50 user         0.08 sys
  root@m6600:~ # du -h -s packetdump.pcap
  225M packetdump.pcap


That is a very good point. Taking this better approach, here is what I get on my (not updated grep) system:

    wgl:$ /usr/bin/grep --version
    /usr/bin/grep --version
    grep (BSD grep) 2.5.1-FreeBSD
    
    wgl:$ /usr/local/bin/ggrep --version
    /usr/local/bin/ggrep --version
    ggrep (GNU grep) 3.3
    Packaged by Homebrew
    Copyright (C) 2018 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.
    
    Written by Mike Haertel and others; see
    <https://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
    
    wgl:$ /usr/bin/time /usr/local/bin/ggrep LiteonTe really-big.text | wc -l
    /usr/bin/time /usr/local/bin/ggrep LiteonTe really-big.text | wc -l
            2.30 real         1.04 user         0.67 sys
        1228
    wgl:$ /usr/bin/time /usr/bin/grep LiteonTe really-big.text | wc -l
    /usr/bin/time /usr/bin/grep LiteonTe really-big.text | wc -l
            5.65 real         5.30 user         0.33 sys
        1228
    wgl:$ /usr/bin/time /usr/local/bin/ggrep LiteonTe really-big.text >/dev/null
    /usr/bin/time /usr/local/bin/ggrep LiteonTe really-big.text >/dev/null
            0.05 real         0.03 user         0.01 sys
    wgl:$ /usr/bin/time /usr/bin/grep LiteonTe really-big.text >/dev/null
    /usr/bin/time /usr/bin/grep LiteonTe really-big.text >/dev/null
            6.50 real         5.71 user         0.58 sys
    wgl:$ /usr/bin/time /usr/local/bin/ggrep LiteonTe really-big.text -c
    /usr/bin/time /usr/local/bin/ggrep LiteonTe really-big.text -c
    1228
            2.33 real         1.05 user         0.69 sys
    wgl:$ /usr/bin/time /usr/bin/grep LiteonTe really-big.text -c
    /usr/bin/time /usr/bin/grep LiteonTe really-big.text -c
    1228
            5.37 real         5.05 user         0.31 sys
The wc -l is clearly polluting the result. However, I suspect that the >/dev/null is as well. But in the worst case, I see a halving of time over the old grep (edited), which correlates with my most common use of grep in looking through source files.


Compiler is also going to make an impact which for me is consistent across both grep binaries.

FreeBSD clang version 6.0.1 as well as -O2

I suspect there are still edge cases where BSD grep is quite a bit slower or not compatible with GNU grep. However with a closer apples to apples comparison there isn't much difference anymore for my usage. Which is a lot of grep use but that is pretty vanilla.

There may also be other OS differences in our comparison. My tests where run against a fairly recent FreeBSD 12-STABLE.


> The key to making programs fast is to make them do practically nothing. ;-)

Also key to making them small, stable, intuitive, and usable


FWIW GNU's tools tend to be much larger and more complicated that competing versions of the same command.


That's natural when you try to support a standard, are multi-platform, and have extra features.


I like to call this the Zen of optimization. Calm memory is fast.


Grep is a damn useful tool, I use it every day. But is it just me or does grep sacrifice convenience for performance ? Eg. it rather skips some bytes, then to make sure it finds everything. And case sensitive is default, where you most of the time want to find all case variations.


No. Skipping bytes is only done when you can prove there isn't a match. Please see my other comments in this thread.

If you want case insensitive matching by default, then use:

    alias grep="grep -i"


"Old age and treachery will beat youth and skill every time." http://ridiculousfish.com/blog/posts/old-age-and-treachery.h...


I often have the experience of being like, "Surely I can't just do this entire task with a bunch of greps in a timely manner, it's going to take forever, right?" and then boom, the script takes like a minute to finish sorting through bazillions of lines.


Yes, Boyer-Moore is ridiculously fast, but it's a string search algorithm, not a regular expression search algorithm. You can't use it for regexes. Is TFA still true when you're not searching for literal strings?


Yes, because it will extract literals from your regex and seaech for those first. When a match of the literal is found, all you then need to do is find the line boundaries of that match and run the regex engine on just that line. It ends up working pretty well.

Of course, if your regex has no required literals at all, then this optimization can't be performed. GNU grep can slow down quite a bit in these cases, especially if you have a non-C locale enabled and use some Unicode aware features such as \w.


I keep this on my bookmarks: [1] (that I fished from this other blog post [2]).

Not sure if any actual search tools other than Scalyr use that specific implementation, but it makes for an interesting read anyway if you are into substring search algorithms (the algo described improves on Boyer-Moore).

1: http://volnitsky.com/project/str_search/

2: https://www.scalyr.com/blog/searching-1tb-sec-systems-engine...


This would depend upon which version of GNU grep one installs... About 10 years ago (when this article was written), I upgraded a RedHat system and discovered that some of my scripts were running 10 times slower than they had on the previous install. I isolated the issue to grep and benchmarked the grep from the previous release against the current release. The new grep was 10 times slower. I didn't have much more time to waste on the issue so I simply copied the grep binary from the previous release to the new install and everything was fine afterward.


This sounds a lot like when they started making it handle Unicode correctly. We ran into the same problem but fixed it by setting `LC_ALL=C` in the affected log processing scripts.


That’s so 2010. Here is what I like to read: https://blog.burntsushi.net/ripgrep/


It’s not that fast (comparatively): https://blog.burntsushi.net/ripgrep/


Boyer-Moore + Loop Unrolling... fast.

But, perhaps it could be improved further via SSE instructions, or by sending the data to a GPGPU and parallelizing the algorithm...


see ripgrep mentioned elsewhere in comments.


Just in case anybody wants to look at the code, I think these two files [0, 1] contain the main logic when it comes to searching. Please correct me if these files aren't the right ones:

[0] https://git.savannah.gnu.org/cgit/grep.git/tree/src/kwset.c

[1] https://git.savannah.gnu.org/cgit/grep.git/tree/src/kwsearch...


If you're going to link to gnu grep's source code, you might as well link to the actual git repo grep uses: https://git.savannah.gnu.org/cgit/grep.git/tree/src


Thanks, didn't realize you could actually look at the source in browser for the savannah repo


Also interesting, the Linux network stack is now faster than BSDs, which while a recent development, is still a feat.


Well, the Netflix folks claim the exact opposite :-)

Also, note that there were some quite fundamental optimizations in FreeBSD 12, such as introduction of the “epoch” mechanism - basically RCU. 11 still only uses old-fashioned locks.


could you please point me to a link where i can read about it? i am curious.



thanks, I have got something to read for the weekend :)


This is simply a very good technical mailing list discussion. Instead of the whole BSD vs. GNU thing, you get to the point information sharing on processing structures.

I do wonder about the later messages discussing the other string matching algorithms (on of them using a Trie to easily track back), while they are considered in the discussion, I didn't find any search/regex implementing them.


Are you referring to Aho-Corasick? If so, there are definitely some regex engines that use it, including I believe GNU grep: http://git.savannah.gnu.org/cgit/grep.git/tree/src/kwset.c


I must've been temporarily blind while scanning the src, thanks! The kinds of implementations that bring together various data structures and algorithms are usually rather interesting to read and learn from.


From my experience GNU CLI tools make a particular effort to be really fast (even on embedded hardware).


I think this is originally to avoid any possible plagiarism claims by Unix--if a Unix tool was originally optimized for one of size/efficiency/speed/memory use, the GNU tools were optimized for one of the others. This is part of why the `yes` utility is so freakin fast in GNU, despite being kind of a weird use case for speed.

Mac builtin:

yes | pv > /dev/null

139MiB 0:00:05 [28.4MiB/s]

GNU:

gyes | pv > /dev/null

3.31GiB 0:00:06 [ 584MiB/s]


I don't know if it is or isn't the plagiarism thing but GNU tools were written with a very different, way more modern style than Unix tools.

Unix was pretty much "do a quick loop using terse code" and "see what fits in this static small buffer, and if it doesn't, quietly truncate". GNU would reallocate buffers to fit, and take reasonable precautions on performing well on tiny/huge inputs.

GNU was actually considered bloated by the Unix crowd ("why so many more kilobytes of code?" ). It's an approach that scaled better for the future though.


See section 2.1 of [0] and the top response to this comment [1]. You'll note in the GNU standards there, it encourages optimizing for the opposite of Unix to avoid collisions in strategy.

[0] https://www.gnu.org/prep/standards/standards.html#Reading-No...

[1] https://news.ycombinator.com/item?id=14543536


It obviously has a bit of work put into it to make it fast: https://github.com/coreutils/coreutils/commit/35217221c211f3...


How is looking at the final letter of a string faster than starting with the first?



Yeah, thats easy to digest.


I miss the days when the internet was mostly html pages


The Internet still mostly is HTML pages. It's just the portion of the Internet that you look at on a daily basis that has moved to JS-heavy SPAs.


use ripgrep it's faster https://github.com/BurntSushi/ripgrep and appropriately named



TLDR: It cheats.


That’s an interesting term in this context. The technique is so clever and the gains so big that it does feel a “cheat”. But as long as it replicates the behavior and effects and constraints of the previous algorithm, is it really a “cheat” even in the loosest sense of the word? It feels far more apt to view prior work as “tryhard”


I mean is it?




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: