
The Silver Searcher: An attempt to make something better than ack - robin_reala
https://github.com/ggreer/the_silver_searcher
======
kibwen
I haven't used ack since I discovered git's built-in `git grep` command,
which, as I understand it, doesn't even need to sift through your files; it
just examines your git index, so it's waaaay faster than either grep or ack.
It's also recursive by default, doesn't require a file-matching parameter, and
colorizes its output, which together were the reasons that I used ack over
grep in the first place. Pretty brilliant, all told.

~~~
js2
The index is just a list of the files and associated metadata; it does not
contain their contents. So git grep still needs to read the files from disk,
though it doesn't need to walk the filesystem to locate them[@].

[@] Technically git grep has five modes of operation:

1\. Search the contents of the tracked files as they currently are on disk.
This is the default.

2\. With --cached, search the contents of the tracked files as they are in the
index (i.e. ignore any un-added changes).

3\. With --no-index, search all files recursively from the current directory
down. This allows you to use "git grep" as a "grep -R" replacement even when
your CWD is not inside a repo.

4\. With --untracked, search all files recursively from the current directory
down in addition to files in the index. (The difference between this and
--no-index when used inside a repo is that --untracked honors the .gitignore
mechanism by default, i.e., --untracked is a synonym for --no-index
--exclude-standard when inside a repo.)

5\. With a tree-ish (commit, tag, branch name, etc.), search all files in the
tree.
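
Those five modes can be exercised against a throwaway repo. A minimal sketch,
assuming git is on the PATH; the file names and the search pattern are
invented for illustration:

```shell
# Build a disposable repo to demonstrate the modes.
tmp=$(mktemp -d); cd "$tmp"
git init -q
echo 'hello tracked' > a.txt
git add a.txt
git -c user.email=x@x -c user.name=x commit -qm init
echo 'hello untracked' > b.txt            # never added
echo 'hello working-tree edit' >> a.txt   # un-added change

git grep hello              # 1. tracked files as they are on disk
git grep --cached hello     # 2. contents as recorded in the index
git grep --no-index hello   # 3. everything under CWD, tracked or not
git grep --untracked hello  # 4. index plus untracked files
git grep hello HEAD         # 5. files in a tree-ish (here, HEAD)
```

Note how mode 2 misses the un-added edit to a.txt, and how b.txt only shows
up in modes 3 and 4.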

------
colomon
Much as I've loved ack, I'd be all in favor of a replacement that was really
significantly better.

That said, the readme there lists five reasons why Silver Searcher is better
than ack. Two of them are nonsense (who cares what language it's written in,
or same-order-of-magnitude differences in how big the executables are?), two
sound like they would be pretty trivial changes to ack, and the last is a
significant speed improvement. But then you read further down the readme, and
it says the current development state is somewhere between "Runs" and "Behaves
correctly". Isn't it kind of premature to be bragging about how fast you are
before your code actually behaves correctly?

Also, it makes me wonder how much of the speed increase is based on the easy
changes filtering out more files...

~~~
mturmon
Not arguing with your basic point, which is good, but when the developer says
"The binary name is 33% shorter than ack!", he's referring to the difference
between typing "ag" vs. typing "ack". Not the size in bytes of the executable.

It's a little inside joke, because one of the points in favor of "ack",
offered jokingly years ago by _its_ developer, is that "ack" is 25% fewer
letters to type compared to "grep".

(This feature is still there as point #10 in
<http://betterthangrep.com/why-ack/>, now not offered 100% jokingly.)

~~~
petdance
It's a joke but not. The less typing you have to do, the better. Less typing
means fewer mistakes and less time waiting for the search to start.

Defaults matter. ack is all about having sensible defaults for your most
common uses.

------
AngryParsley
I'm the author. People are asking how this thing is faster than ack or grep.
Here's how:

\- Literal matches use Boyer-Moore-Horspool strstr.[1]

\- Files are mmap()ed instead of read into a buffer.

\- If you're building with PCRE 8.21 or greater, regex searches use the JIT
compiler.[2] Also I call pcre_study() before executing the regex on a jillion
files.

\- Ag reads your .gitignore and .hgignore files to ignore code you don't care
about.

\- Instead of calling fnmatch() on every pattern in your ignore files,
non-regex patterns are loaded into an array and binary searched.
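
As an aside, git itself exposes its .gitignore matching through the
`git check-ignore` plumbing command, which is a quick way to see from the
shell which files a tool honoring those ignore rules would skip. A minimal
sketch; the repo and patterns are invented for illustration:

```shell
# Disposable repo with a couple of made-up ignore rules.
tmp=$(mktemp -d); cd "$tmp"
git init -q
printf '*.log\nbuild/\n' > .gitignore

git check-ignore -v debug.log                    # exit 0: ignored (-v names the matching rule)
git check-ignore src/main.c || echo not ignored  # exit 1: not ignored
```

The paths don't even have to exist; check-ignore matches on the path name
alone, which makes it handy for debugging why a file is (or isn't) searched.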

I wrote a couple of blog posts about profiling The Silver Searcher and
improving performance.
<http://geoff.greer.fm/2012/01/23/making-programs-faster-profiling/> is the
most informative one, IMO.

1\. <http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore%E2%80%93Horspool_algorithm>

2\. <http://sljit.sourceforge.net/pcre.html>

~~~
js2
FWIW, git grep also uses threads in some circumstances to get better
performance.

Also, obligatory link whenever Boyer-Moore is mentioned:
<http://ridiculousfish.com/blog/posts/old-age-and-treachery.html>

~~~
AngryParsley
That's a cool blog post. I've tried not to look at the grep source code until
I've written my own solutions, so I didn't know grep made that tradeoff.

------
pixelbeat
I prefer the simpler approach of reusing the existing utils via a short
shell script, which is much faster than ack:

<http://www.pixelbeat.org/scripts/findrepo>

A quick test on a moderately big repo:

    
    
        $ time findrepo test '*' | wc -l
        158819
        real	0m0.532s
        $ time ack -a test | wc -l
        76526
        real	0m8.762s

~~~
petdance
I'm more than happy to include findrepo on the betterthangrep.com/more-tools
page. If you'll make a page for it, or at least have something for newbies to
read, I'll add a link.

My concern is that someone who's new to all of this isn't going to understand
what to do if I just link to <http://www.pixelbeat.org/scripts/findrepo>.

------
StavrosK
Wait, how is anything faster than grep?

~~~
steve-howard
I believe it's because ack and friends skip files you ordinarily don't want by
default. So there's no .svn-base duplicates, no cache files generated by
tools, etc.
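
For comparison, plain GNU grep can be told to skip the same cruft by hand.
A rough sketch; the directory and file names are invented for illustration:

```shell
# Fake project tree with one real source file and one piece of VCS cruft.
tmp=$(mktemp -d); cd "$tmp"
mkdir -p src .svn
echo 'needle' > src/main.c
echo 'needle' > .svn/main.c.svn-base   # cruft ack would skip by default

grep -r needle . | wc -l                     # searches everything: 2 hits
grep -r --exclude-dir=.svn needle . | wc -l  # skips .svn: 1 hit
```

The point of ack (and ag) is that you never have to remember the
--exclude-dir incantations; the filtering is the default.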

~~~
troels
So `find ... | xargs grep ...` is still faster, right?

~~~
petdance
Depends on whether you include the time to type the command.

If your find command looks like

    
    
        find . -name '*.pl' -o -name '*.pm' | xargs grep foo
    

and it takes 1 second to finish, and your ack command is

    
    
        ack foo --perl
    

and it takes 1.5 seconds to finish, you can say the grep is faster.

But I just timed how long it takes to type each of those, and they took me
9.2 vs. 2.3 seconds.

So which is faster: 9.2 + 1.0 = 10.2 seconds, or 2.3 + 1.5 = 3.8 seconds?

