
Why GNU grep is Fast - giu
http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html
======
jakevoytko
I tested string searching algorithms a few years ago[0][1], and Brute Force
was surprisingly performant, enough that picking a string searching algorithm
becomes an engineering tradeoff. It found a sentence fragment at the end of
Moby Dick within 8ms, 7 times slower than Boyer-Moore. But most of us aren't
searching Moby Dick, but rather an HTTP header, some user input, or a paragraph
of a document. Even long-winded users won't write Moby Dick into your
<textarea>. Most languages and libraries use brute force for this reason -
initializing and using Boyer-Moore may be slower than brute-forcing the text.
Sometimes "practically nothing" is just a brute-force search.
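
For scale, brute force here is just the obvious double loop. A minimal Python sketch (the function name and sample strings are illustrative):

```python
def brute_force_find(text, pattern):
    """Naive search: try every alignment, compare left to right."""
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):
        if text[i:i + m] == pattern:
            return i  # index of the first match
    return -1

print(brute_force_find("a paragraph of a document", "document"))  # 17
```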

But Boyer-Moore is the perfect choice for Grep! The likely inputs on Unix are
all huge: log files, entire directory trees, output pipes from loud programs,
etc. The cost of initializing a small skip table is overwhelmed by the cost of
I/O and the potential volume of text. It's not surprising that they've gone to
some lengths to optimize the core inner loops and the I/O in that context.
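
As a rough illustration of the skip-table idea (not GNU grep's actual code), here is the Horspool simplification of Boyer-Moore's bad-character rule in Python. Note how a longer pattern lets each mismatch skip further:

```python
def horspool_find(text, pattern):
    """Boyer-Moore-Horspool: compare from the end of the pattern; on a
    mismatch, shift by how far the text character aligned with the
    pattern's last position is from the pattern's end (the whole
    pattern length if that character doesn't occur in the pattern)."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Skip table: distance from each pattern character to the end.
    skip = {c: m - i - 1 for i, c in enumerate(pattern[:-1])}
    i = m - 1  # text index aligned with the last pattern character
    while i < n:
        j = 0
        while j < m and text[i - j] == pattern[m - 1 - j]:
            j += 1
        if j == m:
            return i - m + 1
        i += skip.get(text[i], m)
    return -1
```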

[0] <http://www.jakevoytko.com/blog/2007/12/11/fun-with-string-searching/>
I've declared bankruptcy on broken TeX and code examples... WordPress mangles
them every few updates.

[1] <http://www.lysium.de/blog/index.php?/archives/201-Fun-With-String-Searching.html>
A few improvements to the code in my post

~~~
jacquesm
> Brute Force was surprisingly performant, enough that picking a string
> searching algorithm becomes an engineering tradeoff. It found a sentence
> fragment at the end of Moby Dick within 8ms, 7 times slower than Boyer-Moore.

I think that line of thinking is actually symptomatic of producing the kind of
software that eats up our present day powerhouses and makes them dog slow.

~~~
nostrademons
His next line was "But most of us aren't searching Moby Dick, but rather an
HTTP header, some user input, or a paragraph of a document."

An HTTP header is several orders of magnitude shorter than Moby Dick.

~~~
fleitz
The chances of analyzing just one HTTP header in an application are almost
nil. Use a Boyer-Moore skip table.

~~~
adamtj
This assumes that you are going to the trouble of initializing your skip table
once and re-using it. Now you have state to maintain beyond the life of the
function call, or else your performance is slower than brute-force. That, in
addition to probably needing to write the function in the first place, means
you've got all sorts of bugs to find and fix.

And anyway, what are you doing searching HTTP headers in anything more than a
one-off script? More likely, you are parsing the whole header and sticking it
in a hash table. So, not only aren't you searching, but even if you were,
that's not the hard part. And even that is dwarfed by the application that's
going to service the HTTP request. (Unless you are Google, in which case you
don't need my advice.)

Searching HTTP headers is not your bottleneck. Use your language's built in
string search. Premature optimization makes code slower.

------
jacquesm
Boyer-Moore is one of the examples that made me realize clearly that on the
larger scale of programmer competence I'm nobody special. Some algorithms show
such out-of-the-box thinking that it blows your mind.

The most interesting thing is that most pattern matching algorithms up to then
got slower with longer match strings, but Boyer-Moore actually got faster!

To quote Majikthise: "Bloody hell, now that is what I call thinking."...

~~~
Daishiman
If you think Boyer-Moore is trippy, try to understand the partial match lookup
table creation algorithm in Knuth-Morris-Pratt:
[http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pr...](http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm#.22Partial_match.22_table_.28also_known_as_.22failure_function.22.29)
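
For the curious, that table fits in a few lines. This is a standard Python rendering of the KMP failure function, not code from any particular implementation:

```python
def kmp_failure(pattern):
    """KMP 'partial match' table: fail[i] is the length of the longest
    proper prefix of pattern[:i+1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]  # fall back to the next shorter border
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

print(kmp_failure("ababaca"))  # [0, 0, 1, 2, 3, 0, 1]
```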

~~~
mhansen
They're about the same amount of trippiness. Boyer-Moore makes the same kind
of jump table, except Boyer-Moore starts comparing at the end of the search
string.

------
RiderOfGiraffes
This:

    > The key to making programs fast is
    > to make them do practically nothing.
    > ;-)

is a paraphrase of something I posted here a long time ago:

    > You can't make programs run faster,
    > you can only make them do less.

While not entirely true (and Ph.D. theses have been written about the corners
where it's wrong) it's an excellent start when you have to make a program run
faster.

~~~
vinutheraj
_While not entirely true_...

One counter-example I can think of is that of Judy arrays -
<http://judy.sourceforge.net/> - which use more instructions to try to keep
the data in cache for as long as possible.

 _A (CPU) cache-line fill is additional time required to do a read reference
from RAM when a word is not found in cache. In today's computers the time for
a cache-line fill is in the range of 50..2000 machine instructions. Therefore
a cache-line fill should be avoided when fewer than 50 instructions can do the
same job._

More info at <http://judy.sourceforge.net/doc/10minutes.htm>

~~~
anamax
> One counter-example I can think of is that of Judy arrays -
> <http://judy.sourceforge.net/> - which use more instructions to try to keep
> the data in cache for as long as possible.

It's not a counter-example. The cost of running a program includes the cost to
access memory as well as the cost of executing instructions. It also includes
the cost to access disk/flash.

------
thaumaturgy
One of the issues of Apple's Develop magazine, a long time ago, explained
Boyer-Moore, Tuned Boyer-Moore, and Self-Tuning Boyer-Moore, which is what I
used for a C library I was working on at the time.

If you think Boyer-Moore is a trip, check out Self-Tuning Boyer-Moore. There's
actually a really good writeup on it at
<http://www.grouse.com.au/ggrep/string.html>

------
rntz
Boyer-Moore is a great algorithm, but it's for fixed-string searching. grep,
in the general case, handles regular expression searching. Is there some way
to extend Boyer-Moore to regular expressions that I'm unaware of? Or is GNU
grep's use of Boyer-Moore limited to when the search string is just a literal?

~~~
kscaldef
In practice, most regular expressions contain some literal strings. You can
use Boyer-Moore to anchor the match, then do a full regexp match from there.
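
A minimal sketch of that idea (illustrative only; `str.find` stands in for a Boyer-Moore literal scan, and the helper name is made up):

```python
import re

def anchored_search(text, literal, regex):
    """Find candidate positions with a fast literal search, then
    confirm with the full regex on the line containing each hit."""
    pat = re.compile(regex)
    start = 0
    while (i := text.find(literal, start)) != -1:
        # Restrict the full regex match to the line around the hit.
        line_start = text.rfind("\n", 0, i) + 1
        line_end = text.find("\n", i)
        line_end = len(text) if line_end == -1 else line_end
        m = pat.search(text, line_start, line_end)
        if m:
            return m
        start = i + 1  # literal hit, but the regex didn't match
    return None
```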

~~~
jimbokun
...and the classical answer to implementing a full regex engine is a finite
state automaton, correct? At least, there is a one-to-one correspondence.

I'm curious, though, about what tricks are used in actual implementations to
speed things up, and what modern Regex features necessitate climbing further
up the Chomsky hierarchy. (I seem to recall reading about features getting
slipped into regex engines that made them no longer finite state, but can't
recall what they were right now.)

~~~
sharkbot
<http://en.wikipedia.org/wiki/Shift-or>

Check out Shift-Or, used in agrep and another example of a very clever
algorithm. Rather than translating a non-deterministic finite automaton into
a deterministic one, it uses the hardware's word-wide boolean operations to
simulate the NFA directly. The result: linear-time matching for patterns no
longer than the bit-length of a machine register, i.e. 32 bytes on x86, 64
bytes on x86_64, etc.
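
A Python sketch of exact-match Shift-Or (agrep's real version is C operating on machine words; here Python integers play that role, and the function name is made up):

```python
def shift_or_find(text, pattern):
    """Shift-Or (bitap) exact matching: one bit per pattern position,
    0 meaning 'this prefix still matches'. Each text character updates
    the whole automaton with a couple of word-wide operations."""
    m = len(pattern)
    assert 0 < m <= 64, "pattern must fit in a 64-bit machine word"
    all_ones = (1 << m) - 1
    # mask[c] has a 0 bit at each position i where pattern[i] == c.
    mask = {}
    for i, c in enumerate(pattern):
        mask[c] = mask.get(c, all_ones) & ~(1 << i)
    state = all_ones
    for j, c in enumerate(text):
        state = ((state << 1) | mask.get(c, all_ones)) & all_ones
        if not state & (1 << (m - 1)):  # bit m-1 clear: full match
            return j - m + 1
    return -1
```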

------
jules
Is there a simple generalization of Boyer Moore to regular expressions? Can
something be said about the optimality of string searching, for example by
assuming a probability distribution of input texts?

------
terinjokes
"The key to making programs fast is to make them do practically nothing. ;-)"

In a perfect world, I would prefer many programs that did practically nothing
and worked together over a monolithic program that does everything (and even
contains a kitchen sink!) but is slow.

So come on fellow developers, let's make a bunch of nothing!

~~~
tszming
This is essentially the same as the "RISC patent" - a patent that said "if
you make something simpler, it'll go faster".

Quoted from James Gosling,
<http://nighthacks.org/roller/jag/entry/quite_the_firestorm>

Also, kudos to GNU grep's maintainer of 15 years.

~~~
jonah
Isn't that a big part of the GNU mentality too? Lots of little programs that
can be easily chained together.

~~~
telemachos
_Lots of little programs that can be easily chained together._

I think that's usually described as "the Unix philosophy." It's not limited to
GNU (nor does it originate with GNU). See, for example, the Wikipedia article
on "Unix philosophy"[1]:

 _Doug McIlroy, the inventor of Unix pipes and one of the founders of the Unix
tradition, summarized the philosophy as follows:[2]

This is the Unix philosophy: Write programs that do one thing and do it well.
Write programs to work together. Write programs to handle text streams,
because that is a universal interface.

This is usually abridged to "Write programs that do one thing and do it
well"._

[1]<http://en.wikipedia.org/wiki/Unix_philosophy#McIlroy:_A_Quarter_Century_of_Unix>

[2]<http://www.faqs.org/docs/artu/ch01s06.html>

~~~
jacquesm
There is this strange wall between 'programs' and 'subroutines' that at times
feels completely artificial.

Why shouldn't 'grep' be automatically available as a routine once programmed?

I can see some of the charm of 'images' such as those used by Smalltalk.

~~~
silentbicycle
There are benefits to both approaches.

An external program communicating over pipes can let the OS handle buffering,
run on another processor core, be swapped out for another program that speaks
the same protocol (perhaps in a faster language), won't crash the whole
system, etc.

A program running as a library subroutine has a bit less overhead, and can use
more context (library-native data structures, rather than piped text), but
this is also usually more language-specific. Working within a "full
environment" language like Smalltalk has a lot of advantages, but it also
needs comprehensive libraries for your problem domain, or you're back to using
external programs.
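
The tradeoff can be sketched concretely. Both helpers below are illustrative: `subprocess` spawns the external tool over a pipe, while a plain in-process loop stands in for the library call:

```python
import subprocess

def grep_external(pattern, path):
    """Run the system grep as a separate process: the OS buffers the
    pipe, the work runs on another core, and a crash in grep cannot
    take down this program. Counts matching lines via `grep -c`."""
    result = subprocess.run(
        ["grep", "-c", pattern, path],
        capture_output=True, text=True,
    )
    return int(result.stdout or 0)

def grep_in_process(pattern, path):
    """The same job as an in-process routine: less overhead and direct
    access to native data structures, but a bug here crashes the whole
    program. Fixed-string containment stands in for a real matcher."""
    with open(path) as f:
        return sum(1 for line in f if pattern in line)
```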

There's an insightful aside about this in Joe Armstrong's _Programming
Erlang_, in the chapter about ports - Erlang code can load foreign code as
linked-in libraries, but a buggy library will make the whole system unstable
in a way that code running in a foreign process and communicating via message
passing will not. He argues for running code in an external process (a "port")
by default.

Of course, having a comprehensive (but low-level) library in C with wrappers
in higher-level languages is an option. High-level languages' type systems /
object models can be very different, though, and it takes experience to
translate a C API to feel native to Python/Lua/Ruby/etc.

It also works to structure a program as a C library but provide a small
standalone program that gives it a command-line interface; SQLite and Lua are
good examples of this approach.

~~~
jacquesm
I (still) think that message passing is a vastly undervalued mechanism. Erlang
is a really interesting language by the way, I wished I had more time to
devote to learning it.

------
Qerub
Everybody should read the great post "The Treacherous Optimization" about
`grep` at <http://ridiculousfish.com/blog/archives/2006/05/30/old-age-and-treachery/>.

------
powdahound
ack (<http://betterthangrep.com>) is a great alternative to grep, although
certainly slower, as it's written in Perl. Great for working with smaller
files such as source code, though.

~~~
koenigdavidmj
It's faster when you are trying to skip .svn directories, and certainly easier
on the fingers.

------
frou_dh
The Security Now podcast did an episode on the Boyer-Moore algorithm. Not the
most information-dense way to learn about it, but it might be of interest.

( Starts at 34m10s -- <http://twit.tv/sn203> )

------
acqq
Speaking as a guy who writes in assembly too, I'm not sure that much can be
gained by Boyer-Moore, at least on modern out-of-order processors. The cost
of comparing every byte is low, and skipping a few bytes can cost more in
additional instructions (which can stall the pipeline) than simply scanning
every byte for the first character of the pattern. The other speedups (mmap)
sound reasonable.

------
konad
Plan 9 grep is competitive in my simple test, often faster. It is also immune
to the pathological data that will kill GNU grep - see
<http://swtch.com/~rsc/regexp/> and the BUGS section of your local GNU grep
man page.

    
    
    #!/usr/local/plan9/bin/rc
    fn 9grep { /usr/local/plan9/bin/grep '2010-[0-9][0-9]-23 02:01:57' /home/maht/lighttpd.error.log > /dev/null }
    fn ggrep { /usr/bin/grep '2010-[0-9][0-9]-23 02:01:57' /home/maht/lighttpd.error.log > /dev/null }
    fn mgrep { /usr/bin/grep -mmap '2010-[0-9][0-9]-23 02:01:57' /home/maht/lighttpd.error.log > /dev/null }

    switch($1) {
    case -9
            9grep
    case -g
            ggrep
    case -m
            mgrep
    case *
            ls -l /home/maht/lighttpd.error.log
            time /tmp/gtest -9
            time /tmp/gtest -g
            time /tmp/gtest -m
    }

    /tmp/gtest
    -rw-r--r--  1 www  wheel  1113325534 Aug 23 18:51   /home/maht/lighttpd.error.log
           23.67 real         3.88 user         3.74 sys
           24.28 real         0.63 user         3.89 sys
           23.09 real         0.56 user         3.87 sys

~~~
kragen
It looks like all three of your grep commands there are I/O-bound (where are
you getting a machine with much less than a gig of RAM, but 50 megabytes per
second of disk streaming? Is this an old server with a RAID?), but plan9 grep
uses six times as many CPU cycles as the other greps.

I wouldn't call that "competitive", even if system-call overhead _does_ knock
that crippling slowdown down to less than a factor of two.

Also, what kind of kernel are you running there that would peg your CPU with a
mere 300 megabytes per second of disk I/O? Is DMA disabled on your disk or
something? Surely not, because there aren't any IDE PIO modes that are
anywhere close to 50 megabytes per second.

------
ramki
I heard most editors use Boyer-Moore; can somebody please confirm? :(

