
GNU grep is 10x faster than Mac grep - jlebar
http://jlebar.com/2012/11/28/GNU_grep_is_10x_faster_than_Mac_grep.html
======
pooriaazimi
I'm not trying to start a theological war about grep/ack here; I'm just
mentioning ack in case someone hasn't heard of it before and (like me) might
find it extremely useful: <http://betterthangrep.com>

It's grep, just better. It highlights the matched text, shows which file and
line each match was found in (in vivid colors, so you can tell them apart
easily), and by default ignores .git and .hg directories (among others that
shouldn't be searched). You can tell it to search only `--cpp` or `--objc` or
`--ruby` or `--text` files, for example (with a flag, not a filename pattern).
It has many other neat features that I'm sure grep has too, but with grep you
have to memorize them. ack has sensible defaults.

Why ack? <http://betterthangrep.com/why-ack/>

manpage: <http://betterthangrep.com/documentation/>

Oh, and ack is written in Perl and doesn't require admin privileges to
install.

~~~
ComputerGuru
Do you know of any C ports of ack? Ack is beautiful and productive, but
nowhere near as fast as grep (orders of magnitude slower, in fact).

    
    
        gfind . -type f -exec grep -i mbr {} \; >| /dev/null  
        1.10s user 0.81s system 90% cpu 2.113 total
    
        gfind . -type f -exec ack -i mbr {} \; >| /dev/null  
        24.34s user 4.17s system 96% cpu 29.678 total
    

(Yes, I know about the flag to search recursively. This is the most fair
comparison.)
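
A caveat on the numbers above: `-exec ... \;` starts one new process per file, so a tool with a heavier startup (such as a Perl script) likely pays that cost once for every file. A minimal sketch of a batched variant, assuming a POSIX `find` that supports `-exec ... {} +`. The function name `search_batched` is hypothetical, and `grep` can be swapped for `ack`:

```shell
# Hypothetical helper: search every regular file under a directory,
# passing file names to grep in large batches instead of one at a time.
# "{} +" makes find append as many paths as fit on one command line,
# so grep is started only a handful of times.
search_batched() {
    dir=$1
    pattern=$2
    find "$dir" -type f -exec grep -i -- "$pattern" {} +
}
```

Running `search_batched . mbr` alongside the `\;` form should show how much of the gap is process startup rather than matching speed.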

~~~
AngryParsley
I wrote a mostly-clone of Ack in C:
<https://github.com/ggreer/the_silver_searcher> . Output format and most flags
are the same. Besides the speed, most users won't notice a difference.

I spared no effort in optimizing. Pthreads, mmap(), Boyer-Moore-Horspool
strstr, it's all there. Searching my ~/code (5.2GB of stuff), I get this:

    
    
        ag blahblahblah  1.93s user 3.54s system 313% cpu 1.749 total
    
        ack blahblahblah  9.75s user 2.79s system 98% cpu 12.690 total
    

Both programs ignore a lot of extraneous files by default (hidden files,
binary files, stuff in .gitignore, etc). The real amount of data searched is
closer to 500MB.

~~~
kamaal
The catch is that Ack's core strength will always evolve with, and depend on,
Perl's regular expression and text processing powers.

So rewriting it in C will fundamentally mean endlessly growing a language
that looks similar to the Perl implementation, or a Perl DSL.

Not that it's a bad thing; I find it interesting. I would say you'd better
start with a specification.

~~~
AngryParsley
Ag supports the same regexes as Ack. I use the PCRE library. I only call
pcre_study once, and I use the new PCRE-JIT[1] on systems where it's
available. These tweaks add up to a 3-5x speedup over Ack when regex-matching.

1\. <http://sljit.sourceforge.net/pcre.html>

~~~
btilly
If you use PCRE, you do NOT support the same regexes as Ack.

"Perl Compatible" isn't really Perl compatible, see
<http://en.wikipedia.org/wiki/PCRE> for details.

~~~
AngryParsley
Yes, there are a few edge cases, but hardly anyone uses those features. In
fact, 90% of the time, most people seem to use literal string matching.

------
martinp
'why GNU grep is fast' from the FreeBSD mailing list:
<http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html>

~~~
haberman
Also classic:
<http://ridiculousfish.com/blog/posts/old-age-and-treachery.html>

------
pixelbeat
I notice these Mac tools becoming a bit stale. sort is derived from GNU sort,
but from some ancient version. I guess this might be due in part to these
tools now being GPLv3?

~~~
paxswill
Almost certainly. Apple stopped updating their tools past the GPLv2 versions,
with the most noticeable example being gcc, which was frozen at 4.2 until they
removed it.

------
X-Istence
This may also be because the default grep, i.e. BSD grep, actually pays
attention to what you have set in your environment variable LANG. The default
on OS X is en_US.UTF-8.

If the author were to set LANG to C, he would find that BSD grep suddenly
speeds up tremendously.

~~~
pdw
GNU grep certainly honors locale settings, and recent versions are fast even
when you're using UTF-8 (since release 2.7 or so).

~~~
X-Istence
Hmm, interesting. Work is being done on BSD grep to make it faster than it is,
so hopefully in the near future the two will be on par.

------
mattparlane
For those using homebrew:

    
    
        brew install https://raw.github.com/Homebrew/homebrew-dupes/master/grep.rb

~~~
paxswill
If you tap the `homebrew-dupes` repository, you will get updates in the
future:

    
    
        brew tap homebrew/dupes
        brew install grep

~~~
lyso
Are there any other utils worth installing from that tap? Awk? OpenSSH?

~~~
paxswill
The repository with all of the formulas is here:
<https://github.com/homebrew/homebrew-dupes>

The two I end up using semi-frequently are gcc and apple-gcc, for those
projects that Clang just won't compile.

------
eik3_de
You should tack "LC_CTYPE=C" onto the front of the grep command to get
comparable results. A multibyte CTYPE can slow grep down by up to a factor of
30.
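
A minimal sketch of that suggestion as a reusable wrapper. The function name `cgrep` is hypothetical, and it uses `LC_ALL=C`, which overrides both `LANG` and `LC_CTYPE` for that one command:

```shell
# Hypothetical wrapper: run grep in the C locale for a single invocation.
# The variable assignment applies only to this grep process; the shell's
# own locale settings are left untouched. All arguments pass through.
cgrep() {
    LC_ALL=C grep "$@"
}
```

Timing `cgrep -r keyword .` against plain `grep -r keyword .` on the same tree shows whether locale handling accounts for the difference on a given system. Note that in the C locale, non-ASCII text is matched byte by byte.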

------
emidln
Is speed really that much of a concern with grep? I typically use :vimgrep
inside of vim, not because it's faster (it's orders of magnitude slower due to
being interpreted vimscript), but because I hate remembering the differences
between pcre/vim/gnu/posix regex syntax.

~~~
jlebar
I regularly search my whole Firefox clone for keywords. If this takes 2s,
that's plenty fast; if it takes 20s, I'd have to come up with some other way
of doing it.

~~~
Evbn
Ctags?

~~~
jlebar
Firefox is quite complicated; we have code written in C, C++, JS, Python,
Make, m4, plus at least three custom IDL formats. grep handles these with
ease.

------
buster
Obviously this means Linux is 10x faster than Mac, ha!

Seriously though, it's really amazing what performance they squeezed out of
that tool. Always amazing to grep through gigabytes of files in a few seconds.

------
pooriaazimi
I once tried a sed script on a couple million text files (60 GB in total) -
they were web pages downloaded in some format (WARC? I don't remember what it
was called) and I needed to change the formatting slightly (to feed them to
Nutch) - Mac's default sed was literally 50 times slower than gsed (on the
same machine). If I remember correctly, gsed finished the task in under two
hours.

------
tehwalrus
Just tried on Snow Leopard: not quite 10x, but nearly 2x faster, certainly.
(Admittedly, my Firefox checkout is Mercurial, and hg locate seems to pass
something invalid to xargs halfway through, but I guess the first chunk of
files is the same.)

Someone commented on the article that this might be caused by missing off the
-F flag; I tried this, and -F makes both versions slightly faster again.
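
For anyone unfamiliar with `-F`: it makes grep treat the pattern as a literal string rather than a regular expression, which changes what matches as well as how fast. A small self-contained sketch, using a throwaway sample file made up for illustration:

```shell
# Throwaway two-line sample file, created only for this illustration.
sample=$(mktemp)
printf 'a.b\naxb\n' > "$sample"

# Without -F, "." is a regex wildcard, so both lines match.
regex_hits=$(grep -c 'a.b' "$sample")

# With -F, "." is a literal dot, so only the first line matches.
fixed_hits=$(grep -cF 'a.b' "$sample")

echo "regex: $regex_hits, fixed: $fixed_hits"
rm -f "$sample"
```

This prints `regex: 2, fixed: 1`, so `-F` is only a drop-in speedup when the search term contains no regex metacharacters.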

------
xtrahotsauce
Does "git grep" use a system grep or does it implement grep on its own?

~~~
meaty
It uses its own. See <https://github.com/git/git/blob/master/builtin/grep.c>

~~~
unwind
That seems to be the "command infrastructure" for the grep builtin. The actual
grep engine is in <https://github.com/git/git/blob/master/grep.c>.

------
wildranter
Or...

    
    
      brew install ack

