Use ack instead of grep to parse text files (stevengharms.com)
24 points by gnosis on Dec 24, 2012 | 48 comments



You can do a lot of things with `grep -E`, fwiw - there's not much here to really sell ack.

Things that do sell ack, for me:

  ack css_class --sass       # search .sass and .scss
  ack some_method --no-flash # ignore .as and .mxml
  # ignore compiled css in every Rails project on
  # my system (as long as I `ack` from the root)
  --ignore-dir=public/stylesheets/compiled

And the fact that it prints out like this:

  path/to/file.ext
  123: some text matching
  234: more text matching

  path/to/other/file.ext
  480: a match
instead of like this (with `-n`):

  path/to/file.ext:123:  a match
  path/to/other/file.ext:567:  another match
  path/to/that/file/you/didnt/know/you_had.ext:32:  yet another match
makes it massively more useful for human-viewing of the results than the normal behavior of grep. And it reverts to grep-like output when you pipe it into something, so you can go from exploration to composition with no effort.
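
For anyone wondering how a tool can switch formats like that: a script can check whether stdout is a terminal. Below is a minimal sketch of the idea built on grep and awk. The `search` function is a hypothetical wrapper of my own, not ack's actual implementation, and it assumes filenames contain no colons.

```shell
# search PATTERN FILE...: hypothetical wrapper, NOT how ack itself is written.
search() {
  pattern=$1; shift
  if [ -t 1 ]; then
    # stdout is a terminal: group matches under a filename heading.
    # /dev/null is added so grep always prints the filename prefix.
    grep -n -- "$pattern" "$@" /dev/null |
      awk '{
        split($0, parts, ":")            # parts[1] = file, parts[2] = line number
        text = $0
        sub(/^[^:]*:[^:]*:/, "", text)   # strip the "file:line:" prefix
        if (parts[1] != prev) {
          if (prev != "") print ""
          print parts[1]
          prev = parts[1]
        }
        print parts[2] ": " text
      }'
  else
    # stdout is a pipe or a file: keep grep's one-line-per-match format.
    grep -n -- "$pattern" "$@" /dev/null
  fi
}

# Demo on a throwaway file; the output is captured, so stdout is not a terminal.
demo=$(mktemp -d)
printf 'a match\nno\n' > "$demo/one.txt"
piped=$(cd "$demo" && search match one.txt)
printf '%s\n' "$piped"
rm -rf "$demo"
```

Run interactively, the same call would take the grouped branch; captured or piped, it stays grep-like so the next command in the pipeline can parse it.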


Entirely agreed. Other users' points about the tone of the article ring true to me as well, and they are only hurting people's impression of the tool, which is unfortunate; I use both ack and grep (my regex comfort level is not extremely high, so grep -v is still a common fallback).

For the curious: It's "ack-grep" in Ubuntu's package manager (and presumably Debian, though I can't say for sure); I stick it on every machine/server I set up just to have it handy. Queries to the effect of `ack-grep --python ClassName` yield fast, readable, extremely useful output, as you mention. That's why I use it in addition to grep.


  ack '(?=silver).*needle' haystack
is really not the same as:

  grep needle haystack|grep silver
Because with the double grep method, a match will be made whether "silver" is before or after "needle"; while with the single ack command shown above, a match will only be made when "silver" comes before "needle".

Also, I'm not sure why the author uses

  ack '(?=silver).*needle' haystack
instead of simply:

  ack 'silver.*needle' haystack


Maybe it's because you can write the latter also with plain grep:

    grep 'silver.*needle' haystack


That assumes that "silver" will appear before "needle", which may not always be the case. `grep needle file | grep silver` gets you lines containing both "silver" and "needle" but not necessarily in that order.
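
To make the difference concrete, here is a small sketch. The three-line haystack is made up for illustration and is not the one from the article.

```shell
# Hypothetical haystack, written to a temp file for illustration.
haystack=$(mktemp)
printf '%s\n' 'silver needle' 'needle made of silver' 'gold needle' > "$haystack"

# Ordered regex: only lines where "silver" comes before "needle".
ordered=$(grep 'silver.*needle' "$haystack")

# Double grep: lines containing both words, in either order.
either=$(grep needle "$haystack" | grep silver)

printf 'ordered:\n%s\n\neither order:\n%s\n' "$ordered" "$either"
rm -f "$haystack"
```

The ordered regex misses "needle made of silver"; the double grep catches it.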


Also, traditional grep can easily find 'silver needle': `grep silver\ needle haystack`.


ack highlights results by default, so `(?=silver.*)needle` would potentially be useful. But, yes, not the same as the grep|grep.


While ack is a great tool, I don't think the author pointed out its strengths in this article. From my perspective, the strengths are using it recursively and its ability to 'recognise' files containing source code (and yes, I know that grep has a recursive option - it's more innate, though, in ack).


This article is a wonderful demonstration of why simple methods piped together are better and easier to use than a giant monolithic application like awk.


Are you confusing awk with ack?


I confused the apps in the text but the point was that those regexes look far worse than the equivalent grep pairs.


Agreed. He says he can't remember the syntax for such and such in grep, but the regexen he follows up with seem complicated enough to me.

Now, there's stuff that never sticks in my brain (tests in shell, sigh). But generally there's less syntax and therefore less to remember in a chain of greps. A composition of simple pieces is easier to understand than one equivalent, and therefore more complex, piece. Heck, the power of the shell is predicated on this idea.

Perhaps the best part about ack is that it's simple to restrict your search of files to a given pattern with a command line flag rather than using shell globbing. You could wrap invocations of grep with a shell function or another script, but that's still not great.
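
For reference, this is roughly what such a wrapper could look like. It is a sketch only: `grep_sass` is a hypothetical name, and it assumes GNU grep (or another grep that supports `--include` and `--exclude-dir`). The demo tree is made up.

```shell
# grep_sass PATTERN [DIR]: hypothetical helper, roughly like `ack PATTERN --sass`.
# Assumes a grep with --include/--exclude-dir (e.g. GNU grep).
# --exclude-dir is given a directory base name here, not a full path.
grep_sass() {
  grep -rn --include='*.sass' --include='*.scss' \
       --exclude-dir=compiled -- "$1" "${2:-.}"
}

# Demo on a throwaway tree.
demo=$(mktemp -d)
mkdir -p "$demo/app" "$demo/public/stylesheets/compiled"
printf '.css_class\n  color: red\n' > "$demo/app/site.sass"
printf 'css_class appears here too\n' > "$demo/app/notes.txt"
# A stray copy inside the compiled directory, which should be skipped.
printf '.css_class { color: red; }\n' > "$demo/public/stylesheets/compiled/site.scss"

hits=$(cd "$demo" && grep_sass css_class)
printf '%s\n' "$hits"
rm -rf "$demo"
```

It works, but you end up maintaining one function per file type, which is the part ack gives you for free.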


It sounds like it.

But that too seems like a demonstration of something. The more "simple" methods with obscure names that populate the Unix toolbox, the more confusing it gets. I've gone from find to locate recently, for example, but their functionalities kind of overlap and so when I do find, I'm rusty with it.


A little feedback on voice. When someone says "You should stop using them. Now.", I expect the article to be about some system killing security problem, not an argument for the elegance of one tool over another. And if the argument is going to be about elegance, it better be absolutely compelling. The benefit of piped grep expressions is that you don't have to know anything beyond the principles of Unix to intuit their usage. For many uses (most uses for me), no thought is required -- grep fades into the background and becomes part of the programmatic brain stem. Commanding the reader to no longer use it is as effective as telling them to stop breathing.


>The primary virtue of these commands is that they use the Perl regular expression engine.

You mean the engine that lets you write pathological regular expressions[1] and accidentally ReDoS[2] yourself? To be fair, it's fine if you understand how the engine works well enough to avoid these cases. But how many people can actually say this?

1. http://swtch.com/~rsc/regexp/regexp1.html

2. http://en.wikipedia.org/wiki/ReDoS


I was looking into breaking a Perl IRC bot the other day and couldn't get any of the examples to work (that is, take more than a split second to execute). Does perl now detect these pathological cases and work around them or was I just not trying the examples correctly?


I believe I read somewhere that recent versions of Perl detect certain obviously pathological cases and abort them, but I don't know if they fail silently or display an error. Whether a particular needle is pathological depends on the haystack, too, so it could just be that there was a mismatch between the two in your case.


> if you want the silver needle, the unsophisticated, greppy way of doing this would be: $ grep needle haystack|grep silver

Why not simply $ grep "silver needle" haystack?


Well, with a simple haystack like the one used in the example, there really would be no reason not to grep for "silver needle" in the first place. So it's really not the best or most realistic example of the usefulness of the double grep method.

When I use double grep in real life, I often tend to do so on a relatively large haystack, where I don't necessarily know what the second search term will be. In that situation, I'll usually do the first grep, look through its output, and add on the second grep once I see something in the first grep's output that I want to narrow the results down to.

Of course, instead of adding on a second grep, I could modify the original regex (and sometimes I do); but if the original regex is complicated, then modifying it is error prone. And, anyway, using a shell abbreviation, it's very easy to type " G " and have that expand to " | grep " to simply add on another grep, without touching the first regex.

A second, quite common use case for a double grep is when I want the second search term to match whether it's before or after the first term. There's probably some convoluted way to get the same effect using a single regex, but it probably won't be nearly as easy or intuitive as a double grep.
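
For the record, a single-command version does exist. A sketch with a made-up haystack, once with ERE alternation and once with awk:

```shell
# Hypothetical haystack, written to a temp file for illustration.
haystack=$(mktemp)
printf '%s\n' 'silver needle' 'needle made of silver' 'gold needle' > "$haystack"

# ERE alternation: spell out both orderings explicitly.
with_grep=$(grep -E 'silver.*needle|needle.*silver' "$haystack")

# awk: two independent pattern tests joined with &&.
with_awk=$(awk '/needle/ && /silver/' "$haystack")

printf '%s\n' "$with_grep"
rm -f "$haystack"
```

The alternation needs one branch per ordering, so it gets unwieldy fast once there are three or more terms; the extra `| grep` scales better.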


  cp /usr/bin/grep ack
Find the needles

   ./ack needle haystack
Find the silver needles

   ./ack 'silver.*needle' haystack
Find all needles except lead ones

   ./ack '[^^][^e.][^a.][^d.] needle' haystack
That last one could be tricky if there are other types of needles with names like "ead needle" or "mead needle". But with the haystack he gives us, BRE can do the job easily.

Perl regexes may be easy to use, but they are inferior from a performance perspective. As someone else said, they're slower than BRE or ERE. Moreover, even if speed is not an issue, you pay a price in the amount of memory you will need compared with line-based utilities like, e.g., sed and awk.

Find the needles

  sed '/needle/!d;/needle/q' haystack
Find the silver needles

  sed '/silver needle/!d;/silver needle/q' haystack
Find all needles except lead ones

  sed '/lead needle/d;/needle/!d;/needle/q' haystack
My preference is to use (f)lex if I want a fast "parser" (scanner). Its regex is more than adequate.
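
A note on the sed idiom above, since it is easy to misread: `!d` followed by `q` prints only the first matching line and then quits, which is what keeps it cheap on big files. A small sketch with a made-up haystack, with `sed -n '/needle/p'` added for comparison as the print-everything form:

```shell
# Hypothetical haystack, written to a temp file for illustration.
haystack=$(mktemp)
printf '%s\n' 'hay' 'lead needle' 'silver needle' 'hay' 'gold needle' > "$haystack"

# Delete non-matching lines; on the first match, print it and quit.
first=$(sed '/needle/!d;/needle/q' "$haystack")

# Print every matching line (the closer equivalent of plain grep).
all=$(sed -n '/needle/p' "$haystack")

# Skip lead needles, then stop at the first remaining needle.
first_non_lead=$(sed '/lead needle/d;/needle/!d;/needle/q' "$haystack")

printf 'first: %s\nfirst non-lead: %s\n' "$first" "$first_non_lead"
rm -f "$haystack"
```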



- grep does regular expressions.

- grep uses by default the same regular expressions as sed, which is another frequently used tool.

- grep also supports perl regular expressions.

- grep is available on every linux/bsd/*nix system out there, so it just works and makes your scripts work.

- We use grep to search through gigabyte-sized files (i.e. logs). You didn't show us how well ack performs there.


That'd be my other concern. ack is perl, as far as I can tell. I have no idea how perl performs at these tasks. But grep is written in C, and there're fun examples of how exactly it gets to be so fast (http://ridiculousfish.com/blog/posts/old-age-and-treachery.h... comes to mind).


I do wish the author described which 'ack' is being lauded. When I attempt to use 'ack' on my linux workstation the results are confusing.

    moses@deunan:~$ </etc/mime.types grep application |grep x-ruby
    application/x-ruby				rb

    moses@deunan:~$ </etc/mime.types ack application x-ruby
    application: No such file or directory
    x-ruby: No such file or directory

    moses@deunan:~$ ack -h
    ack v1.39 Copyright 1993,94 Ogasawara Hiroyuki (COR.)
    usage: ack [-{e|s|j|c[c]}] [-{a|A|o<file>}] [-zCntud] [-{E|S}] [<file>..]


Is anyone else turned off by the tone of the blog post and the attitude of the poster? There's gotta be a better way to showcase the usefulness of a tool.


I'm more turned off by hard-to-read regular expressions, especially ones that look like they may break depending on what terminal emulator I'm using, how quotes are escaped, etc. The "you've been doing it wrong for years" tone I could do without, but see people using it with good enough intentions so frequently that I'm no longer bothered by it.

Also, `ack` is not installed by default, which is reason enough to not get too used to it. Some people will say "optimize for being on your own machine, since you are 99% of the time", but I'm not. Installing additional utilities on multiple production servers is annoying enough, and can actually become problematic in a PCI-compliant environment as mine is. I'm also frequently helping out other members of my team, and having a magic one-liner that often results in "-bash: ack: command not found" is not terribly useful to me. YMMV.


Strange article -- I love ack, but this article doesn't really hit any of the reasons why, but focuses only on cases where I'd use grep anyway.


if all you need is Perl regexp you can use grep -P, no need to stop using grep


Why is:

    ack -C5 'scope(?!.*lambda)' app/models
Better than:

    grep -r -C5 scope app/models | grep -v lambda

?


The first one gets you five lines around all of the scopes without lambdas. The second one gets you five lines around all the scopes (including those with lambdas) and then omits all of the lines with lambdas, some of which will have caused a match in the first grep, and some of which may be part of the context of matches in the first grep. You will get both contexts with no match in the middle and matches with incomplete contexts.
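
Here is a small reproduction of both failure modes, using one line of context instead of five. The model file is made up for illustration.

```shell
# Hypothetical Rails-style model, written to a temp file for illustration.
model=$(mktemp)
cat > "$model" <<'EOF'
class Post
  # no lambda needed here
  scope :recent, order("created_at DESC")
  validates :title, :presence => true
  scope :published, lambda { where(:published => true) }
  attr_accessible :title
end
EOF

# One line of context around every "scope", then drop lines mentioning lambda.
piped=$(grep -C1 scope "$model" | grep -v lambda)
printf '%s\n' "$piped"
rm -f "$model"
```

The `:recent` scope loses the comment line above it (it mentions lambda), and `attr_accessible` survives as orphaned context for the `:published` match that was filtered out. If your grep supports `-P`, something like `grep -P -C1 'scope(?!.*lambda)'` on the same file should behave like the ack version instead.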


Quite right. The downfalls of posting untested code...


Agreed. The article says:

  $ grep needle haystack|grep silver

  This sucks.
It would be nice if the argument against this had some sort of substance.

In most cases grep is going to be faster than ack. If you are searching large files this can make quite a difference.


Is it worth learning grep or ack or a similar tool?

When I need to do these sorts of tasks, I do them in a scripting language with some combination of split() and regex instead of using command line tools. But, I'm just doing that because it's what I know.

Would I end up saving a significant amount of time if I learned to use grep instead?


In my opinion, yes, it is. grep is fast and versatile.

Shell tools in general are relatively simple or at least specialized, and they're built to be composed in novel/useful ways. The interface between all of these is text, aka data, aka what is arguably the simplest interface.


I'd say start with ack for its ease of use, but knowing grep is important. I prefer ack but most of the boxes I work on don't have ack installed (and won't).


Also, these aren't particularly good examples of ack vs grep... as others have pointed out.


Yes!


Wasn't mentioned in the article but you can install ack (on OSX) through the homebrew package manager with: `brew install ack`. Check out the docs at http://betterthangrep.com/


"Look at that one character shorter than grep and just as easy."

Well, let's apply that same criterion to searching for two terms:

> grep needle haystack|grep silver

> ack '(?=silver).*needle' haystack

Look at that one character shorter than ack and just 10 times easier.

grep wins, by a knockout.


I thought Perl regex was much, much slower, at least according to Russ Cox.


I find ack slow and overcomplicated (in implementation at least). As an alternative consider:

http://www.pixelbeat.org/scripts/findrepo


grep has the great strength of near-ubiquitous installation. If you find yourself on a Unix-like system, it will have grep. Perhaps if you only ever really work on a very small set of systems, then this doesn't matter, but for those of us who frequently need to work on different systems and don't always have the ability to install new packages, grep does a fine job. In fact, it has a lot in common with vi here: you can (nearly) always count on having it available.


ack-grep is not ack's related cousin. ack-grep is the name Debian uses for the ack executable in order to avoid a namespace collision.

  dfc@ronin:~$ apt-file search bin/ack
  ack: /usr/bin/ack
  ack-grep: /usr/bin/ack-grep
  ...
  dfc@ronin:~$ apt-cache search --names-only ^ack
  ack - Kanji code converter
  ack-grep - grep-like program specifically for large source trees
  dfc@ronin:~$
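
If you hop between machines, a snippet along these lines in your shell startup file can paper over the naming difference. It is only a sketch: the `searcher` variable is a hypothetical name, and the fallback to `grep -rn` is my own choice.

```shell
# Pick a search command: prefer Debian's "ack-grep", then "ack", else plain grep.
# Caveat from this thread: on some Debian/Ubuntu boxes "ack" is the Kanji code
# converter, which is why ack-grep is checked first.
if command -v ack-grep >/dev/null 2>&1; then
  searcher='ack-grep'
elif command -v ack >/dev/null 2>&1; then
  searcher='ack'
else
  searcher='grep -rn'
fi
echo "using: $searcher"
```

After that, `$searcher ClassName .` runs whichever tool was found.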


grep -E is good enough. But whatever works.

Saying "You should stop using them. Now." is ridiculous. Why do I need to stop anything if it does what I need?


ack is a great tool for searching within text files, not for parsing them.

[shameless plug] Try also the pure-Python based alternative to ack called "pss" (pip/easy_install or https://bitbucket.org/eliben/pss).


Searching text files, not parsing.


No.


Love it since I started using it a couple of months back. :)



