

Use ack instead of grep to parse text files - gnosis
http://stevengharms.com/blog/2012/04/10/use-ack-instead-of-grep-to-parse-text-files/

======
Groxx
You can do a lot of things with `grep -E`, fwiw - there's not much here to
really sell ack.

Things that _do_ sell ack, for me:

    
    
      ack css_class --sass       # search .sass and .scss
      ack some_method --no-flash # ignore .as and .mxml
      # ignore compiled css in every Rails project on
      # my system (as long as I `ack` from the root)
      --ignore-dir=public/stylesheets/compiled
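
The type filters above come down to a per-path yes/no check applied during a recursive walk. Here is a minimal Python sketch under stated assumptions. The extension lists mirror the comments above. The names `wanted` and `files_to_search` are hypothetical and are not part of ack, which is written in Perl and has its own type definitions.

```python
import os
from typing import Iterable, Iterator

# Hypothetical stand-ins for ack's --sass and --no-flash type options.
SASS_EXTS = (".sass", ".scss")
FLASH_EXTS = (".as", ".mxml")


def wanted(path: str, include: Iterable[str] = (), exclude: Iterable[str] = (),
           ignore_dirs: Iterable[str] = ()) -> bool:
    """Return True if a path (relative to the search root) should be searched."""
    norm = path.replace(os.sep, "/")
    # --ignore-dir style: skip anything under an ignored directory.
    for d in ignore_dirs:
        if norm.startswith(d.rstrip("/") + "/"):
            return False
    inc, exc = tuple(include), tuple(exclude)
    # --sass style: keep only files with one of the listed extensions.
    if inc and not norm.endswith(inc):
        return False
    # --no-flash style: drop files with one of the listed extensions.
    if exc and norm.endswith(exc):
        return False
    return True


def files_to_search(root: str, **filters) -> Iterator[str]:
    """Walk root recursively and yield relative paths that pass `wanted`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            if wanted(rel, **filters):
                yield rel
```

For example, `list(files_to_search(".", include=SASS_EXTS, ignore_dirs=["public/stylesheets/compiled"]))` would list the Sass files under the current directory. Pruning `_dirnames` in place would also stop the walk from descending into ignored directories at all.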
    
    

And the fact that it prints out like this:

    
    
      path/to/file.ext
      123: some text matching
      234: more text matching
    
      path/to/other/file.ext
      480: a match
    

instead of like this (with `-n`):

    
    
      path/to/file.ext:123:  a match
      path/to/other/file.ext:567:  another match
      path/to/that/file/you/didnt/know/you_had.ext:32:  yet another match
    

makes it _massively_ more useful for human-viewing of the results than the
normal behavior of grep. And it reverts to grep-like output when you pipe it
into something, so you can go from exploration to composition with no effort.
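
The grouped view can be rebuilt from `grep -n`-style lines. Switching back to one line per match when piped is commonly done by checking whether stdout is a terminal. Below is a rough Python sketch of that idea, not ack's actual code. `group_grep_output` and `render` are hypothetical names, and paths containing `:` would need smarter parsing.

```python
import sys
from typing import List, TextIO


def group_grep_output(lines: List[str]) -> str:
    """Regroup grep -n style 'path:lineno:text' lines into ack-style blocks."""
    blocks: List[List[str]] = []
    current_path = None
    for raw in lines:
        path, lineno, text = raw.split(":", 2)
        if path != current_path:
            blocks.append([path])  # new file: start a block headed by its path
            current_path = path
        blocks[-1].append(f"{lineno}:{text}")
    return "\n\n".join("\n".join(block) for block in blocks)


def render(lines: List[str], stream: TextIO = sys.stdout) -> str:
    """Group for a human at a terminal; keep one match per line when piped."""
    if stream.isatty():
        return group_grep_output(lines)
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        "path/to/file.ext:123: some text matching",
        "path/to/file.ext:234: more text matching",
        "path/to/other/file.ext:480: a match",
    ]
    print(render(sample))
```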

~~~
obviouslygreen
Entirely agreed. Other users' points about the tone of the article ring true
to me as well, and that tone only hurts people's impression of the tool,
which is unfortunate; I use both ack and grep (my regex comfort level is not
extremely high, so grep -v is still a common fallback).

For the curious: It's "ack-grep" in Ubuntu's package manager (and presumably
Debian, though I can't say for sure); I stick it on every machine/server I set
up just to have it handy. Queries to the effect of _ack-grep --python
ClassName_ yield fast, readable, extremely useful output, as you mention.
That's why I use it _in addition_ to grep.

------
gnosis

      ack '(?=silver).*needle' haystack
    

is really not the same as:

    
    
      grep needle haystack|grep silver
    

Because with the double grep method, a match will be made whether "silver" is
before or after "needle"; while with the single ack command shown above, a
match will only be made when "silver" comes before "needle".

Also, I'm not sure why the author uses

    
    
      ack '(?=silver).*needle' haystack
    

instead of simply:

    
    
      ack 'silver.*needle' haystack
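
Both of gnosis's points are easy to check with Python's `re` module, whose lookahead syntax follows Perl's. This is a minimal sketch with three made-up haystack lines. The substring tests stand in for grep because the search terms contain no metacharacters.

```python
import re

# Hypothetical haystack lines; not taken from the article.
haystack = [
    "a silver needle",
    "a needle made of silver",
    "a rusty needle",
]


def ack_lookahead(lines):
    # Like: ack '(?=silver).*needle' haystack
    return [line for line in lines if re.search(r"(?=silver).*needle", line)]


def ack_plain(lines):
    # Like: ack 'silver.*needle' haystack
    return [line for line in lines if re.search(r"silver.*needle", line)]


def double_grep(lines):
    # Like: grep needle haystack | grep silver (both terms, any order)
    return [line for line in lines if "needle" in line and "silver" in line]


print(ack_lookahead(haystack))  # ['a silver needle']
print(ack_plain(haystack))      # ['a silver needle']
print(double_grep(haystack))    # ['a silver needle', 'a needle made of silver']
```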

~~~
nene
Maybe it's because the latter can also be written with plain grep:

    
    
        grep 'silver.*needle' haystack

~~~
Firehed
That assumes that "silver" will appear _before_ "needle", which may not always
be the case. `grep needle file | grep silver` gets you lines containing both
"silver" and "needle" but not necessarily in that order.

------
martininmelb
While ack is a great tool, I don't think the author pointed out its strengths
in this article. From my perspective, the strengths are using it recursively
and its ability to 'recognise' files containing source code (and yes, I know
that grep has a recursive option - it's more innate, though, in ack).

------
spullara
This article is a wonderful demonstration of why simple methods piped together
are better and easier to use than a giant monolithic application like awk.

~~~
martininmelb
Are you confusing awk with ack?

~~~
spullara
I confused the apps in the text but the point was that those regexes look far
worse than the equivalent grep pairs.

~~~
wonderzombie
Agreed. He says he can't remember the syntax for such and such in grep, but
the regexen he follows up with seem complicated enough to me.

Now, there's stuff that never sticks in my brain (tests in shell, sigh). But
generally there's less syntax and therefore less to remember in a chain of
greps. Composing simple pieces is easier to understand than one equivalent,
and therefore more complex, piece. Heck, the power of the shell is predicated
on this idea.

Perhaps the best part about ack is that it's simple to restrict your search of
files to a given pattern with a command line flag rather than using shell
globbing. You could wrap invocations of grep with a shell function or another
script, but that's still not great.

------
lemmsjid
A little feedback on voice. When someone says "You should stop using them.
Now.", I expect the article to be about some system-killing security problem,
not an argument for the elegance of one tool over another. And if the argument
is going to be about elegance, it had better be absolutely compelling. The benefit
of piped grep expressions is that you don't have to know anything beyond the
principles of Unix to intuit their usage. For many uses (most uses for me), no
thought is required -- grep fades into the background and becomes part of the
programmatic brain stem. Commanding the reader to no longer use it is as
effective as telling them to stop breathing.

------
FreakLegion
_> The primary virtue of these commands is that they use the Perl regular
expression engine._

You mean the engine that lets you write pathological regular expressions[1]
and accidentally ReDoS[2] yourself? To be fair, it's fine if you understand
how the engine works well enough to avoid these cases. But how many people can
actually say this?

1\. <http://swtch.com/~rsc/regexp/regexp1.html>

2\. <http://en.wikipedia.org/wiki/ReDoS>
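
Here is a minimal sketch of the problem using Python's `re`, which, like Perl's engine, matches by backtracking. The pattern `(a+)+$` is a standard textbook example, not something from the article. Absolute timings will vary by machine and Python version. The point is that the nested version's cost grows exponentially with the length of a failing input, while the flat version stays negligible.

```python
import re
import time

# Textbook pathological pattern: the nested quantifier lets a backtracking
# engine try exponentially many ways to split a run of "a"s.
NESTED = re.compile(r"(a+)+$")
# Matches the same lines, without the nesting.
FLAT = re.compile(r"a+$")


def elapsed(pattern: re.Pattern, text: str) -> float:
    start = time.perf_counter()
    pattern.search(text)
    return time.perf_counter() - start


for n in (12, 16, 20):
    text = "a" * n + "b"  # the trailing "b" guarantees both searches fail
    print(f"n={n:2d}  nested: {elapsed(NESTED, text):.4f}s  "
          f"flat: {elapsed(FLAT, text):.6f}s")
```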

~~~
jevinskie
I was looking into breaking a Perl IRC bot the other day and couldn't get any
of the examples to work (that is, take more than a split second to execute).
Does perl now detect these pathological cases and work around them or was I
just not trying the examples correctly?

~~~
FreakLegion
I believe I read somewhere that recent versions of Perl detect certain
obviously pathological cases and abort them, but I don't know if they fail
silently or display an error. Whether a particular needle is pathological
depends on the haystack, too, so it could just be that there was a mismatch
between the two in your case.
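
The dependence on the haystack can be illustrated with a nested-quantifier pattern such as `(a+)+$` in Python's `re`. This is an assumed textbook example, not the IRC bot's actual input. The identical regex is instant on a haystack it matches and slow on one it narrowly fails to match.

```python
import re
import time

# Textbook nested-quantifier pattern; whether it blows up depends on the input.
NESTED = re.compile(r"(a+)+$")


def elapsed(text: str) -> float:
    start = time.perf_counter()
    NESTED.search(text)
    return time.perf_counter() - start


n = 20
matching = "a" * n       # the greedy first attempt succeeds immediately
failing = "a" * n + "b"  # every split of the a's is tried before giving up

print(f"matching haystack: {elapsed(matching):.6f}s")
print(f"failing haystack:  {elapsed(failing):.4f}s")
```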

------
miles
_if you want the silver needle, the unsophisticated, greppy way of doing this
would be: $ grep needle haystack|grep silver_

Why not simply $ grep "silver needle" haystack?

~~~
gnosis
Well, with a simple haystack like the one used in the example, there really
would be no reason not to grep for "silver needle" in the first place. So it's
really not the best or most realistic example of the usefulness of the double
grep method.

When I use double grep in real life, I often tend to do so on a relatively
large haystack, where I don't necessarily know what the second search term
will be. In that situation, I'll usually do the first grep, look through its
output, and add on the second grep once I see something in the first grep's
output that I want to narrow the results down to.

Of course, instead of adding on a second grep, I could modify the original
regex (and sometimes I do); but if the original regex is complicated, then
modifying it is error prone. And, anyway, using a shell abbreviation, it's
very easy to type " G " and have that expand to " | grep " to simply add on
another grep, without touching the first regex.

A second, quite common use case for a double grep is when I want the second
search term to match whether it's before or after the first term. There's
probably some convoluted way to get the same effect using a single regex, but
it probably won't be nearly as easy or intuitive as a double grep.
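
The usual single-regex way to require two terms in any order is one lookahead per term, anchored at the start of the line. It works in Perl-compatible engines, so it should also work with ack or `grep -P`, though it arguably proves the point about intuitiveness. Here is a sketch in Python's `re`, using made-up haystack lines.

```python
import re

# One lookahead per term, both anchored at the start of the line, so each
# term may appear anywhere and in either order.
BOTH_TERMS = re.compile(r"^(?=.*silver)(?=.*needle)")

# Hypothetical haystack lines for illustration.
haystack = [
    "a silver needle",
    "a needle made of silver",
    "a rusty needle",
    "a silver spoon",
]

matches = [line for line in haystack if BOTH_TERMS.search(line)]
print(matches)  # ['a silver needle', 'a needle made of silver']
```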

------
lispertoascheme

      cp /usr/bin/grep ack
    

Find the needles

    
    
       ./ack needle haystack
    

Find the silver needles

    
    
       ./ack 'silver.*needle' haystack
    

Find all needles except lead ones

    
    
       ./ack '[^^][^e.][^a.][^d.] needle' haystack
    

That last one could be tricky if there are other types of needles with names
like "ead needle" or "mead needle". But for the haystack he gives us, BRE can
do the job easily.
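
The per-position classes are stricter than they look. `[^e.]`, `[^a.]` and `[^d.]` each reject their own letter independently, so any needle whose preceding word has an `e`, `a` or `d` in the corresponding position is dropped too. So is any needle with fewer than four characters before it. The quick check below uses Python's `re` on hypothetical haystack lines and compares against a negative lookbehind, a Perl-style feature. With plain grep, the equivalent is `grep needle haystack | grep -v 'lead needle'`.

```python
import re

# lispertoascheme's pattern: one negated class per character position.
PER_CHAR = re.compile(r"[^^][^e.][^a.][^d.] needle")
# Negative lookbehind (Perl-style, not BRE): reject only "lead needle".
LOOKBEHIND = re.compile(r"(?<!lead) needle")

# Hypothetical haystack lines; not the article's.
haystack = ["silver needle", "lead needle", "gold needle", "steel needle", "ead needle"]

per_char_hits = [line for line in haystack if PER_CHAR.search(line)]
lookbehind_hits = [line for line in haystack if LOOKBEHIND.search(line)]
print(per_char_hits)    # ['silver needle']
print(lookbehind_hits)  # ['silver needle', 'gold needle', 'steel needle', 'ead needle']
```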

Perl regexes may be easy to use, but they are inferior from a performance
perspective. As someone else said, they're slower than BRE or ERE. Moreover,
even if speed is not an issue, you pay a price in the amount of memory you
will need compared with line-based utilities such as sed and awk.

Find the needles

    
    
      sed '/needle/!d;/needle/q' haystack
    

Find the silver needles

    
    
      sed '/silver needle/!d;/silver needle/q' haystack
    

Find all needles except lead ones

    
    
      sed '/lead needle/d;/needle/!d;/needle/q' haystack
    

My preference is to use (f)lex if I want a fast "parser" (scanner). Its regex
is more than adequate.
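
Each of these sed scripts prints only the *first* matching line. `/pat/!d` discards non-matching lines, and `/pat/q` prints the current line and quits at the first match, so sed stops processing there. Below is a rough Python model of that streaming behaviour. Plain substring tests stand in for sed's BREs, which is fine here because the patterns contain no metacharacters. `first_match` and the haystack lines are hypothetical.

```python
from typing import Iterable, Optional


def first_match(lines: Iterable[str], want: str,
                reject: Optional[str] = None) -> Optional[str]:
    """Stream lines, stopping at the first one that contains `want`.

    Models sed '/reject/d;/want/!d;/want/q': rejected lines are skipped,
    non-matching lines are dropped, and the first match is printed before
    sed quits, so only the current line needs to be held in memory.
    """
    for line in lines:
        if reject is not None and reject in line:
            continue  # /reject/d
        if want not in line:
            continue  # /want/!d
        return line   # /want/q: print this line and stop reading
    return None       # input exhausted without a match


haystack = ["hay", "lead needle", "hay", "silver needle", "gold needle"]
print(first_match(haystack, "needle"))                        # lead needle
print(first_match(haystack, "silver needle"))                 # silver needle
print(first_match(haystack, "needle", reject="lead needle"))  # silver needle
```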

------
andmarios
\- grep does regular expressions.

\- grep uses by default the same regular expressions as sed, which is another
frequently used tool.

\- grep also supports perl regular expressions.

\- grep is available on every linux/bsd/*nix system out there, so it just
works and makes your scripts work.

\- We use grep to search through gigabyte-sized files (i.e. logs). You didn't
show us how well ack performs there.

~~~
wonderzombie
That'd be my other concern. ack is perl, as far as I can tell. I have no idea
how perl performs at these tasks. But grep is written in C, and there're fun
examples of how exactly it gets to be so fast
(<http://ridiculousfish.com/blog/posts/old-age-and-treachery.html> comes to mind).

------
qwertyboy
Ack is sweet. Ag is sweeter:

<http://github.com/ggreer/the_silver_searcher>

------
Mozai
I do wish the author described which 'ack' is being lauded. When I attempt to
use 'ack' on my linux workstation the results are confusing.

    
    
        moses@deunan:~$ </etc/mime.types grep application |grep x-ruby
        application/x-ruby				rb
    
        moses@deunan:~$ </etc/mime.types ack application x-ruby
        application: No such file or directory
        x-ruby: No such file or directory
    
        moses@deunan:~$ ack -h
        ack v1.39 Copyright 1993,94 Ogasawara Hiroyuki (COR.)
        usage: ack [-{e|s|j|c[c]}] [-{a|A|o<file>}] [-zCntud] [-{E|S}] [<file>..]

------
carbocation
Why is:

    
    
        ack -C5 'scope(?!.*lambda)' app/models
    

Better than:

    
    
        grep -C5 scope app/models | grep -v lambda
    
?

~~~
anonymoushn
The first one gets you five lines around all of the scopes without lambdas.
The second one gets you five lines around all the scopes (including those with
lambdas) and then omits all of the lines with lambdas, some of which will have
caused a match in the first grep, and some of which may be part of the context
of matches in the first grep. You will get both contexts with no match in the
middle and matches with incomplete contexts.
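
This can be reproduced with a small Python model of `-C` context. It uses one line of context instead of five to keep it short. `context_lines` and the Rails-style model file are hypothetical.

```python
import re
from typing import List


def context_lines(lines: List[str], pattern: str, n: int) -> List[str]:
    """Lines within n of a regex match, like grep -C n (minus '--' separators)."""
    rx = re.compile(pattern)
    keep = set()
    for i, line in enumerate(lines):
        if rx.search(line):
            keep.update(range(max(0, i - n), min(len(lines), i + n + 1)))
    return [lines[i] for i in sorted(keep)]


# Hypothetical Rails model, not from the article.
model = [
    "class Post < ActiveRecord::Base",
    "  scope :published, where(published: true)",
    "  def title; end",
    "  scope :recent, lambda { where('created_at > ?', 1.week.ago) }",
    "  def body; end",
    "end",
]

# ack -C1 'scope(?!.*lambda)': context only around lambda-free scopes.
single = context_lines(model, r"scope(?!.*lambda)", 1)
# grep -C1 scope | grep -v lambda: context around every scope, then lambda lines dropped.
piped = [line for line in context_lines(model, r"scope", 1) if "lambda" not in line]

print("\n".join(single))
print("---")
print("\n".join(piped))  # "  def body; end" is context for a match that was filtered out
```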

~~~
carbocation
Quite right. The downfalls of posting untested code...

------
jjoe
Is anyone else turned off by the tone of the blog post and the attitude of the
poster? There's gotta be a better way to showcase the usefulness of a tool.

~~~
Firehed
I'm more turned off by hard-to-read regular expressions, especially ones that
look like they may break depending on what terminal emulator I'm using, how
quotes are escaped, etc. The "you've been doing it wrong for years" tone I
could do without, but I see people use it with good enough intentions so
frequently that I'm no longer bothered by it.

Also, `ack` is not installed by default, which is reason enough to not get too
used to it. Some people will say "optimize for being on your own machine,
since you are 99% of the time", but I'm not. Installing additional utilities
on multiple production servers is annoying enough, and can actually become
problematic in a PCI-compliant environment like mine. I'm also frequently
helping out other members of my team, and having a magic one-liner that often
results in "-bash: ack: command not found" is not terribly useful to me. YMMV.

------
colomon
Strange article -- I love ack, but this article doesn't really hit any of the
reasons why, and instead focuses only on cases where I'd use grep anyway.

------
esert
if all you need is Perl regexps, you can use grep -P; no need to stop using grep

------
GlennS
Is it worth learning grep or ack or a similar tool?

When I need to do these sort of tasks, I do them in a scripting language with
some combination of split() and regex instead of using command line tools.
But, I'm just doing that because it's what I know.

Would I end up saving a significant amount of time if I learned to use grep
instead?

~~~
grabastic
I'd say start with ack for its ease of use, but knowing grep is important. I
prefer ack but most of the boxes I work on don't have ack installed (and
won't).

~~~
grabastic
Also, these aren't particularly good examples of ack vs grep... as others have
pointed out.

------
ClayM
"Look at that one character shorter than grep and just as easy."

Well, let's apply that same criterion to searching for two terms:

> grep needle haystack|grep silver

> ack '(?=silver).*needle' haystack

Look at that one character shorter than ack and just 10 times easier.

grep wins, by a knockout.

------
pg_bot
Wasn't mentioned in the article but you can install ack (on OSX) through the
homebrew package manager with: `brew install ack`. Check out the docs at
<http://betterthangrep.com/>

------
pixelbeat
I find ack slow and overcomplicated (in implementation at least). As an
alternative consider:

<http://www.pixelbeat.org/scripts/findrepo>

------
bobcattr
I thought perl regex was much, much slower. At least according to Russ Cox.

------
kylemaxwell
grep has the great strength of near-ubiquitous installation. If you find
yourself on a Unix-like system, it will have grep. Perhaps if you only ever
really work on a very small set of systems, then this doesn't matter, but for
those of us who frequently need to work on different systems and don't always
have the ability to install new packages, grep does a fine job. In fact, it
has a lot in common with vi here: you can (nearly) always count on having it
available.

------
halayli
grep -E is good enough. But whatever works.

Saying "You should stop using them. Now." is ridiculous. Why do I need to stop
anything if it does what I need?

------
dfc
ack-grep is not ack's related cousin. Ack-grep is the name Debian uses for the
ack executable in order to avoid a namespace collision.

    
    
      dfc@ronin:~$ apt-file search bin/ack
      ack: /usr/bin/ack
      ack-grep: /usr/bin/ack-grep
      ...
      dfc@ronin:~$ apt-cache search --names-only ^ack
      ack - Kanji code converter
      ack-grep - grep-like program specifically for large source trees
      dfc@ronin:~$

------
eliben
ack is a great tool for _searching within_ text files, not for parsing them.

[shameless plug] Try also the pure-Python based alternative to ack called
"pss" (pip/easy_install or <https://bitbucket.org/eliben/pss>).

------
mhitza
Searching text files, not parsing.

------
mistermcgruff
No.

------
lucian303
Love it since I started using it a couple of months back. :)

