
Why GNU grep is fast (2010) - kurren
http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html
======
ColinWright
In case you're interested to see what older HN contributors had to say about
this, here are some of the previous submissions, all of which have significant
discussion:

[https://news.ycombinator.com/item?id=1626305](https://news.ycombinator.com/item?id=1626305),
1193 days ago, 115 comments

[https://news.ycombinator.com/item?id=2393587](https://news.ycombinator.com/item?id=2393587),
972 days ago, 68 comments

[https://news.ycombinator.com/item?id=2860759](https://news.ycombinator.com/item?id=2860759),
842 days ago, 43 comments

( ionelm also pointed out a previous submission:
[https://news.ycombinator.com/item?id=6814153](https://news.ycombinator.com/item?id=6814153)
)

Also worth mentioning is "The Treacherous Optimization" (
[http://ridiculousfish.com/blog/posts/old-age-and-treachery.html](http://ridiculousfish.com/blog/posts/old-age-and-treachery.html)
), although previous submissions of that provoked no discussion at all.

[https://news.ycombinator.com/item?id=1624402](https://news.ycombinator.com/item?id=1624402)

[https://news.ycombinator.com/item?id=5257874](https://news.ycombinator.com/item?id=5257874)

 _ADDED IN EDIT: More rigorous searching has turned up substantial discussion
of the Treacherous Optimization:_
[https://news.ycombinator.com/item?id=1627367](https://news.ycombinator.com/item?id=1627367)

~~~
antirez
It's great that you posted this, but it would be greater if HN was capable of
doing it automatically since it is mostly trivial, but provides a lot of
value.

~~~
steveklabnik
In the past, there were semi-automated accounts that did this, but they made
more people upset than happy.

I would also prefer to see this happen on every story.

~~~
gitaarik
hey hey hey, if you want hacker news to stay fast you shouldn't make it do too
much!

~~~
ColinWright
The systems I wrote were independent of HN and didn't slow it down. I got
significant grief for them and pulled them.

Standard "wisdom" for start-ups includes "If you're not embarrassed by your
first product then you didn't launch early enough," and "Try the minimal
viable product before investing too much time." I both launched early, and
made sure the "product" was absolutely minimal. It got slated by the people it
was trying to help, so I didn't bother refining it.

The experience was educational and instructive.

 _(minor edit to remove some inappropriate snark - apologies)_

~~~
AlisdairO
I wouldn't be surprised if that was because people misinterpreted it as
complaining about reposting, which is a contentious topic on HN.

------
michaelsbradley
The author of The Silver Searcher[1] has some blog posts about his efforts at
optimizing his own grep-like tool.

There's nothing wrong with good ole grep, but on boxen I spend any significant
time on, I always install ag. The integration with Emacs provided by ag.el[2]
is awesome too!

[1] [https://github.com/ggreer/the_silver_searcher#how-is-it-so-fast](https://github.com/ggreer/the_silver_searcher#how-is-it-so-fast)

[2] [https://github.com/Wilfred/ag.el](https://github.com/Wilfred/ag.el)

------
yelnatz
I'm actually more familiar with the KMP algorithm than BM. So I looked it up
to see what the difference was:

The classic Boyer-Moore algorithm suffers from the phenomenon that it tends
not to work so efficiently on small alphabets like DNA.

The skip distance tends to stop growing with the pattern length because
substrings re-occur frequently.

By remembering more of what has already been matched, one can get larger skips
through the text.

One can even arrange 'perfect memory' and thus look at each character at most
once, whereas the Boyer-Moore algorithm, while linear, may inspect a character
from the text multiple times.

1: [http://stackoverflow.com/questions/12656160/what-are-the-main-differences-between-the-knuth-morris-pratt-and-boyer-moore-sea](http://stackoverflow.com/questions/12656160/what-are-the-main-differences-between-the-knuth-morris-pratt-and-boyer-moore-sea)
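For a concrete picture of the skips being discussed, here is a minimal Python sketch of the Horspool variant of Boyer-Moore (the names are my own, and this is illustrative, not grep's implementation):

```python
# Boyer-Moore-Horspool: align the pattern against the text and, on a
# mismatch, shift by how far the text character under the pattern's last
# position is from the end of the pattern. Characters absent from the
# pattern allow a full-length shift.
def horspool_search(text, pattern):
    m = len(pattern)
    if m == 0:
        return 0
    # Bad-character table, built from all but the last pattern character.
    skip = {c: m - i - 1 for i, c in enumerate(pattern[:-1])}
    i = 0
    while i + m <= len(text):
        if text[i:i + m] == pattern:
            return i
        # Shift based on the text character aligned with the pattern's end.
        i += skip.get(text[i + m - 1], m)
    return -1

print(horspool_search("the quick brown fox", "brown"))  # 10
print(horspool_search("ACGTACGTACGT", "GTAC"))          # 2
```

On DNA-like input the table entries stay small because every alphabet character tends to occur near the end of the pattern, which is exactly the weakness the quoted answer describes.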

~~~
tetha
Couldn't you speed up a small-alphabet search by considering pairs or triplets
of characters, so you have N^2 or N^3 glyphs instead of N characters (going
from 4 characters to 16 characters to 64 characters)?

~~~
mtdewcmu
I'm not sure exactly how your proposal might work, but, generally, the problem
you would run into is that the group of characters can be shifted. So you'd
end up having to compare every possible shift.

------
stcredzero
That's also a Forth way of looking at optimization.

Also reminds me of Kent Beck's quip when he was asked to optimize Chrysler's
C3 system. He asked for validated sets of input and output. The programmers on
site said the system wasn't producing correct results yet. His response:
In that case, I can make this _real fast!_

~~~
gruseom
I recall nearly the same story about Jerry Weinberg:

Engineers: Oh your system can only do a thousand cards a minute? Ours can do
ten thousand.

Weinberg: Yeah but your system doesn't work! If mine doesn't have to work I
can do a million cards a minute.

I thought I got it from one of Weinberg's books.

------
BrainInAJar
Conversely, Why GNU grep is slow (and how to make it not-slow):
[http://dtrace.org/blogs/brendan/2011/12/08/2000x-performance-win/](http://dtrace.org/blogs/brendan/2011/12/08/2000x-performance-win/)

------
ionelm
Older submission here:
[https://news.ycombinator.com/item?id=1626305](https://news.ycombinator.com/item?id=1626305)

------
pestaa
By this logic you can write really fast programs in Haskell, because it avoids
computing unused values (and therefore complete calltrees). (Unfortunately,
the management of such calltrees -- thunks -- often has higher cost than
outright computing them in the first place.)

I try hard, but I'm not smart enough to write programs that do nothing. :)

~~~
mseebach
You can do that in any programming language by, well, not computing unused
values.

~~~
chii
That's like saying anyone can make money by just making money. It's a null
statement!

~~~
mseebach
In the context of the comment I replied to, no, it's not.

Haskell gives you some constructs that makes this easier but there exists no
program whose Haskell implementations will run faster because of this, only
programs that are easier to make fast.

~~~
nostrademons
The big advantage lazy evaluation has is composability. In an eager language,
you have to explicitly code operations in a lazy fashion, which means that if
you use a library, it will probably be eager. With Haskell you can be pretty
sure that the runtime will evaluate only as much of the program as it needs
to.
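As a rough illustration of that composability (a Python analogue using generators, not Haskell's pervasive laziness):

```python
# Generator pipelines are lazy: a downstream consumer pulls only as much
# work from upstream as it actually needs.
def squares(nums):
    for n in nums:
        yield n * n  # computed on demand, one value at a time

def first_over(limit, nums):
    for n in nums:
        if n > limit:
            return n

# Only enough squares are computed to find the answer; the rest of the
# (potentially huge) range is never touched.
result = first_over(100, squares(range(10**9)))
print(result)  # 121
```

The difference is that in Python you must opt in to laziness per function, whereas in Haskell it is the default, which is the composability point above.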

That said, the downside is that it then becomes very difficult to reason about
the performance characteristics of your library, because its performance
depends on how it's _used_. Simon Peyton-Jones is on record as saying that
laziness is probably the wrong default for a language - the big benefit for
Haskell was that it "kept them honest" wrt purity, but in a production system
you probably want strictness.

------
bane
The inverse is this...what is modern software doing that makes them so slow?

~~~
weland
> GNU grep also tried very hard to set things up so that the _kernel_ could
> ALSO avoid handling every byte of the input, by using mmap() instead of
> read() for file input. At the time, using read() caused most Unix versions
> to do extra copying.

Good luck pulling this off in Chrome.

~~~
jamesaguilar
With nacl, you could. But I think you're talking about js, and yeah, you are
right.

------
noonespecial
The fastest programs are the ones that don't prepare themselves to do
something and then end up not doing it. Another way to say it: just do one
thing well.

~~~
badman_ting
The fastest programs return 0 immediately and don't do shit.

~~~
noonespecial
Returning 0 is technically doing something. It might not be something that you
find useful but it is something.

~~~
Danieru
On unix you'll find two programs which do this exact thing: `false` and
`true`. False returns 1, and true returns 0.

You might not believe that these are real programs but they are. You can find
the binaries using `whereis`. For example on my linux install true is
/bin/true.

The command line is pretty awesome.
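A quick check of those exit codes, assuming the usual coreutils binaries are on the PATH:

```python
import subprocess

# `true` does nothing and exits 0 (success);
# `false` does nothing and exits 1 (failure).
print(subprocess.run(["true"]).returncode)   # 0
print(subprocess.run(["false"]).returncode)  # 1
```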

~~~
staz
And here is the actual source code for them:

[http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob_p...](http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob_plain;f=src/true.c;hb=HEAD)

/bin/true's source is a bit longer than expected, but the /bin/false
implementation is really interesting:
[http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob_plain;f=src/false.c;hb=HEAD](http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob_plain;f=src/false.c;hb=HEAD)

~~~
mtdewcmu
true.c could be useful as a kind of helloworld -- a minimal example of the
conventions followed by gnu command line programs.

~~~
rm445
GNU Hello exists for this purpose.

------
jontro
Please add [2010] to the title

~~~
ohwp
Why? Because it doesn't apply anymore?

~~~
unwind
Because it can help people realize they might have read this before (I had),
which in turn can affect how one prioritizes which articles to read.

Also because "it's how it's done around here": the site is about news so it
makes sense to at least tag possible re-posts to flag them as "not quite news
yet still interesting and recommended reading".

------
_delirium
Does this still work in Unicode? It seems like it'd cause the Boyer-Moore
lookup tables to blow up in size.

~~~
FreakLegion
You can use Unicode code points for the shift table(s) -- the Horspool variant
has only one table -- but make them sparse, so the tables only contain
characters that occur in the pattern. Hash tables make for easy lookups, but
with a bit vector of a few hundred kB you can also use a bitmask.

Of course, you can also just reduce the Unicode pattern to bytes, so your
alphabet is never larger than 256. This will run slower, but not as much as
you'd think: Boyer-Moore does benefit from larger alphabets, but only to the
extent that the alphabet is actually used.
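A sketch of the sparse-table idea in Python (illustrative only; the function name and pattern are my own examples):

```python
# A Horspool shift table keyed by Unicode code points, stored in a dict
# so that only characters occurring in the pattern take any space at all.
def make_skip_table(pattern):
    m = len(pattern)
    # All but the last character contribute a shift distance.
    return {ch: m - i - 1 for i, ch in enumerate(pattern[:-1])}

pattern = "héllo"
skip = make_skip_table(pattern)
print(skip)  # {'h': 4, 'é': 3, 'l': 1}

# Any character absent from the pattern falls back to a full-length shift,
# so the table never grows with the size of the alphabet:
print(skip.get("漢", len(pattern)))  # 5
```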

~~~
_delirium
> Of course, you can also just reduce the Unicode pattern to bytes

Ah, right. I was under the impression this was unsafe, since you could end up
with spurious byte matches that are not on character boundaries. But it seems
the keyword is "self synchronizing", and UTF-8 (but not UTF-16) is safe to do
byte-oriented searching on.
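A small Python demonstration of why the byte-level trick is safe for UTF-8 but not UTF-16 (the specific strings are my own examples):

```python
# UTF-8 is self-synchronizing: continuation bytes (10xxxxxx) can never be
# confused with lead bytes, so a byte-level match of a valid UTF-8 needle
# always starts on a character boundary.
text = "naïve café".encode("utf-8")
print(text.find("café".encode("utf-8")))  # 7 -- a genuine match

# UTF-16 lacks this property. U+4E00 encodes as bytes 4E 00 in UTF-16-BE,
# and "NN" encodes as 00 4E 00 4E, so a byte search finds a spurious match
# straddling two character boundaries:
print("NN".encode("utf-16-be").find("\u4e00".encode("utf-16-be")))  # 1
```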

------
prostoalex
Link to a no-registration-required PDF version of "Fast String Searching"
mentioned in that post

[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.13.9460&rep=rep1&type=pdf](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.13.9460&rep=rep1&type=pdf)

------
4ngle
Original title is better. I'm gonna call the cops.

------
Myrmornis
In case it helps anyone else, to install GNU grep on OSX using homebrew I did

    
    
      brew tap homebrew/dupes
      brew install --default-names homebrew/dupes/grep
    

following [http://www.heystephenwood.com/2013/09/install-gnu-grep-on-mac-osx.html](http://www.heystephenwood.com/2013/09/install-gnu-grep-on-mac-osx.html)

~~~
agnokapathetic
Be careful with:

    
    
      --default-names do not prepend 'g' to the binary
    

On OS X (a BSD derivative with BSD-grep), --default-names will put GNU grep in
your path. A lot of things in OS X depend on BSD sed/awk/grep. Omitting
--default-names installs the GNU variants as ggrep/gsed/gawk instead.

------
chj
Quote of the day:

    
    
      The key to making programs fast is to make them do practically nothing. 
             -- Mike Haertel

------
ivanhoe
I know all this, but it still feels like a magic when you run it against a fat
log file and it's done in a couple of seconds, while awk and tail were
previously struggling with it for like 15 minutes...

------
mamcx
That is similar to how I explain to some students how to have a fast database
system: the key to a fast query is one that does little, and the way to make
it do little is to have the data stored exactly as it is needed.

------
pmiller2
This applies very well to optimization, too. There are only really two ways to
optimize code:

1) make it do less

2) make it do more at a time.

The first corresponds to using more efficient algorithms and data structures.
The second is parallelism.

~~~
tfb
I've found that taking a considerable amount of time (days even) to determine
the very best solution to a problem _always_ results in less effort and less
time spent coding overall. It's also no coincidence that the code becomes
incredibly simple, more readable, more efficient, more flexible, and easier to
build on top of, with fewer issues down the road. Plus, in the end, depending
on the magnitude of the project, less time is spent on overall development.

It is never a good idea to rush any large coding project. By rush, I mean just
sit down and start churning out code just so you can have something tangible
within a day or two. And by large, I mean anything requiring at least a few
thousand LOC. Anything less can be okay to rush though; i.e., small projects
where maintainability and scalability aren't as important; e.g., some MVP to
test some market.

~~~
pacala
I wholly agree that it's worth investing in finding the best solution.
However, in my experience I found that having concrete code to work with is
_invaluable_. The magic bullet for me is to slap something concrete together,
then aggressively refactor / cut the crap out of it. In some sense, building
software is like sculpture. You've got this amorphous block of half baked
ideas, and you need to turn it somehow into a svelte shape.

* We end up finding more crap to remove than we'd thought at the start.

* We can cut down misguided arguments from wannabe architects of "what if we'll need x" by saying "we'll add this back when someone asks for x". It's much easier to point out that x is currently pointless with concrete code than at the demiurge phase where everything is possible and timelines are ignored.

* We can chop at the problem as a team, without having a long sequential and solitary step of "designing the best system". Amdahl's law applies to dev teams as well.

~~~
pmiller2
I call that "sketching in code." I've gotten in some trouble in interviews for
"starting to write code right away," so it's a habit I'm having to fight at
the moment, but I find having something in code (that I'm fully willing to
throw away) _really_ helps.

It's like Fred Brooks said:

    
    
        ...plan to throw one away; you will, anyhow.

~~~
tfb
I agree. It helps to have something tangible and to refactor later. I think
the main issue that some people (myself included of course) have with that is
refactoring too late. You never want to build something big on top of sketchy
code. I've done this a few times myself to try to meet a deadline, and it's
really bit me in the ass. We're talking weeks maybe months down the drain.

"Sketching in code" is a great analogy. Similar to how you'd trim off the
rough edges in a sketched drawing, I've found myself cutting a lot of cruft
when refactoring.

------
amit_m
I wonder how much these tricks matter nowadays. Even if we ignore the time to
read from disk, for simple algorithms that scan data linearly, it is often the
case that the CPU is spending most of its time waiting for the data to be read
from RAM.

Also, in modern x86 processors, an entire cache line (64 bytes) is read
whenever a single byte is needed. So for strings smaller than 64, even
algorithms that don't check every byte will read every byte from disk to RAM
and from RAM to the processor cache.

~~~
colanderman
_Even if we ignore the time to read from disk, for simple algorithms that scan
data linearly, it is often the case that the CPU is spending most of its time
waiting for the data to be read from RAM._

Nope. Even DDR-3 can perform sequential transfers at speeds well exceeding one
byte per CPU clock cycle.

And yes, you _can_ ignore the time to read from disk. Think a SAN over a
10-gig link. That gets you close to byte-per-cycle territory as well. It takes
on the order of a dozen cycles (more or less depending on architecture and
algorithm) to perform one "step" of a search. So yes, these algorithms very
much matter.

 _Also, in modern x86 processors, an entire cache line (64 bytes) is read
whenever a single byte is needed. So for strings smaller than 64, even
algorithms that don't check every byte will read every byte from disk to RAM
and from RAM to the processor cache._

Yep. But this is not necessarily the bottleneck; see my above comments about
the CPU.

~~~
amit_m
The CPU can also execute instructions exceeding one per clock cycle. For
example instructions such as MOV, CMP and even conditional jumps run at a
typical throughput of 3 instructions per clock cycle on the intel Core line of
CPUs. (source:
[http://www.agner.org/optimize/instruction_tables.pdf](http://www.agner.org/optimize/instruction_tables.pdf))

This means that the simplest handcrafted loop that loads a byte, compares its
value to the first byte of the searched string and then does a conditional
jump should (if unrolled) take about 1 clock cycle per byte examined!

This means that if our DDR3 memory can read 6GB of data per second and our CPU
core is clocked at 3GHz, this completely naive algorithm will run at half of
the theoretical maximum speed.

Using XMM instructions (that work on 16 bytes at a time), should probably get
us to the limit of the RAM speed.

Regarding SAN-over-10-gig link. I don't know about you, but the computer I'm
typing this on has an SSD that can read only a few hundred megabytes per
second.

~~~
colanderman
SANs typically have dozens of drives configured as a RAID array to achieve 10
Gbps performance.

------
myg204
Corollary (from my fortunes file):

Deleted code is debugged code. - Jeff Sickel

~~~
jsickel
The phrase should be attributed to Ray Ontko; I first heard it while working
on database projects for him in the early nineties. -- Jeff Sickel

------
MikeTaylor
Impressive, but if the goal is for BSD to have a faster grep then clearly the
solution is to use GNU grep instead of rolling their own redundant and
inferior clone.

~~~
jtmcmc
presumably GNU grep is GPL and incompatible with their BSD license

~~~
snogglethorpe
The GNU GPL is perfectly compatible with their BSD license... :]

It's just that the end result is then restricted by the GPL, and they don't
want that.

~~~
jamesaguilar
That is what "not compatible" means.

~~~
snogglethorpe
You can release some software that contains both GNU GPL and BSD-licensed
code, and it will be legal to distribute as long as you follow the rules of
both licenses; as the rules of these licenses do not conflict, there's no
legal problem.

This is what most people mean when they say two FOSS licenses are
"compatible."

------
smackay
In other words the correct choice of algorithm and data structure can
dramatically simplify a problem and the amount of code and time needed to
solve it.

It also means that having tools where you can quickly apply different
techniques, ahem, composable functions, that you can search for more efficient
solutions with a lot less effort. That doesn't solve the smartness problem but
it makes it a lot more tractable.

~~~
bouk
This is not just because of the choice of algorithm and data structure,
though; it's also very well implemented.

------
zamalek
I don't know who to attribute the quote to, but there is one that goes
something along the lines of:

"The fastest method to execute is an empty method."

~~~
levosmetalo
> "The fastest method to execute is an empty method."

The fastest method to execute is an empty method that was never called.

The fastest method to execute is an empty method that was never called and
never written.

The fastest method to execute is an empty method that was never called and
never written and never planned.

~~~
mhurron
This is why I say I'm at my most efficient when I just don't do any work at
all.

~~~
jmilloy
Funny, but I think you finally went too far! _Speed_ is one thing, and not
executing a function sure can happen fast, but _efficiency_ is productivity
per unit time, and if there is no productivity, it can't be "most efficient".
(Yes, I'm a pedant sometimes.)

------
TheAceOfHearts
In case anyone is interested in checking it out, you can download the source
here: ftp://mirrors.kernel.org/gnu/grep/

~~~
eru
You will need some gnu common libraries. I tried building it from scratch
recently, as a first step for implementing an extension idea, but didn't
succeed.

(If anyone is interested, I want to add more operators, like intersection or
difference of regular languages.)

~~~
pmiller2
The shell, combined with grep already implements these intersection and
difference operators. For intersection, all you have to do is pipe into a
second grep [1]; for difference, you pipe into a second grep with -v [2].

[1] Intersection: grep regex1 file1 | grep regex2

[2] Difference: grep regex1 file1 | grep -v regex2
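The same line-level semantics, sketched in Python for comparison (the data is hypothetical):

```python
import re

lines = ["foo bar", "foo baz", "qux bar"]

# Intersection: lines matching both regexes (grep re1 | grep re2)
both = [l for l in lines if re.search("foo", l) and re.search("bar", l)]
print(both)  # ['foo bar']

# Difference: lines matching re1 but not re2 (grep re1 | grep -v re2)
diff = [l for l in lines if re.search("foo", l) and not re.search("bar", l)]
print(diff)  # ['foo baz']
```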

~~~
eru
Yes, you can do that. But you can only do that at the top level of your
expression (ie outside the expression), not intra regular expression.

------
a3n
The article mentions the difference in times from forcing mmap.

    
    
      $ grep --mmap "fleagle" *
      grep: the --mmap option has been a no-op since 2010
      ...
    

Coincidentally, the article is from 2010.

------
dicroce
The art of solving the problem in question and ONLY the problem in question.

------
dbbolton
In some cases, Perl can actually be faster than grep and sed at basic text
processing-- IIRC particularly with larger files-- but I'm not sure about the
mechanics behind it.

------
enthdegree
Original submission title was: "The key to making programs fast is to make
them do practically nothing."

------
leeoniya
if you don't need backtracking, re2 [1] should be faster.

[1] [http://code.google.com/p/re2/](http://code.google.com/p/re2/)

------
jrgnsd
I've always loved grep. Now I know why.

------
fibo
Thanks to grep, it almost paid my rent

------
niuzeta
Leaving a comment here and upvoting simply to have an access for the
resources. I will really need to update my way of bookmarking pages...

------
fsniper
Practically doing nothing is not the same as doing things efficiently and
intelligently. It is a joke, but it may be misleading.

~~~
zamalek
> It is a joke but it may be misleading

It's something along the lines of a mantra, not a joke.

------
UNIXgod
This is a classic post.

------
jhhn
what is grep?

~~~
Zash
A text search utility in Unix. Name comes from a command in the text editor ed
and stands for "global regular expression print".

~~~
jhhn
Thank you, guys, but I was kidding. I've been using Linux and other Unixes
since 1998. I was just impressed by how busy this post got with everything but
Linux programming.

