

A Crash Course In Awk - zdw
http://blog.bignerdranch.com/3799-a-crash-course-in-awk/

======
tux1968
Having used bash, awk, sed and a hodgepodge of utilities for twenty years, it
was a revelation to finally spend some time learning perl. It's not a panacea
but it does offer a much larger return for time invested in learning the
basics. It is today as ubiquitous as sed & awk, and provides much more power
and sophistication for no extra complexity.

$0.02

~~~
ivanhoe
Perl is great, but today you can just as well (or more easily) use ruby, python or
even php for scripting one-liners and small tasks. However, awk and sed are
still a fantastic option when you need just quick processing on the command
line. Pipe it in, do some transformations or replacements, pipe it out. The same
goes for other unix power-tools like sort, uniq, wc, head, tail, split,
etc.; they are all still very useful...
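For instance, a typical throwaway pipeline of that shape (the data here is made up):

```shell
# Count the most frequent values in the first CSV column:
# pipe in, pull the field out with awk, let sort/uniq/head do the rest.
printf 'alice,1\nbob,2\nalice,3\nalice,4\nbob,5\n' \
  | awk -F, '{print $1}' \
  | sort | uniq -c | sort -rn | head -n 2
```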

~~~
berntb
Good luck doing one-liners with Python, as long as it uses whitespace indentation
for e.g. loops and doesn't support {}s as an option. :-(

I'd argue, with most others, that Perl is the single most useful command line
tool. Not the only one, of course. But afaik, you can't e.g. load a JSON lib in
awk as part of a pipeline. (I deserialize dumped data structures multiple times
a week in [ad hoc testing with] pipelined cmds.)

Imho, if you know Ruby or PHP like the back of your hand, don't learn another
scripting language for command line use. Learn some completely different
language for some other use instead.

~~~
mtdewcmu
No, you can't work with JSON very easily in awk. Any kind of hierarchical
format like JSON and XML will give you a headache in awk, and CSV can be
difficult as well.

~~~
bsg75
While not as complete as a full CSV parsing lib, finding this made working
with CSV in awk much easier:
[http://www.gnu.org/software/gawk/manual/html_node/Splitting-By-Content.html](http://www.gnu.org/software/gawk/manual/html_node/Splitting-By-Content.html)

    
    
      gawk -vFPAT='[^,]*|"[^"]*"'
    

[http://stackoverflow.com/questions/4205431/parse-a-csv-using-awk-and-ignoring-commas-inside-a-field/17287068#17287068](http://stackoverflow.com/questions/4205431/parse-a-csv-using-awk-and-ignoring-commas-inside-a-field/17287068#17287068)

~~~
berntb
That regexp fails for fields containing embedded '"' characters, but I guess
you can grep for embedded double quotes ("") first.

Are there multiple variants of coding '"' in CSV fields? I don't know -- but
_some people who do know are those who write the CSV libs I use_!

Edit: And as your link notes, it fails for embedded \n's. Imnsho, awk needs
csv (and json, etc) built in, preferably as a plugin architecture. But then,
why not just use the Perl superset?

~~~
billjings
Man, seconded. I'm not sure if a CSV-ized awk is a sensible idea, but I'd love
to have it if it were. CSV might be #1 on my list of "things that will cause
problems for you because they are slightly harder than you think they are".

~~~
berntb
I hear you, re CSV.

Join the dar... cough, Perl side, we have cookies. :-) We have CSV parsers and
everything else, all the way up to e.g. good web libraries and the best OO
among the scripting languages (Moose, ~ like the Common Lisp OO environment;
more or less std for new Perl projects today.)

 _And_ there is more! You can reuse most everything you know from awk! Write:
perldoc perlrun

Check for -n, -p, -i, -E flags. And, as many have noted, there is a2p.

[http://perldoc.perl.org/5.16.2/perlrun.html](http://perldoc.perl.org/5.16.2/perlrun.html)

[http://perldoc.perl.org/5.16.2/a2p.html](http://perldoc.perl.org/5.16.2/a2p.html)

But the main reason is that we have fun. An insane programming language which
throws all this "minimal mathematical notation" stuff out the window, with some
inspiration from linguistics, but still works wonderfully. (Do insist on keeping
to the coding standards in your group. Seriously. At a minimum -- lie and say
that you do, when people interview for a job at your place. :-) )

------
mtdewcmu
A pretty good article. Events aren't a bad metaphor, but it's actually simpler
than that. The whole program, except the BEGIN and END blocks, is one big loop
over input lines. Blocks with conditions in front are just if statements without
having to type if. They're executed one after the other, and they can affect
each other by setting variables. If awk encounters a next statement, no further
blocks are executed for that line and it starts over on the next one. The beauty
of it is that it saves you from having to set up the loop and write I/O commands
to iterate through a text file.
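A tiny sketch of that structure (toy input):

```shell
# Two pattern-action blocks plus `next`: the second block never sees
# a line that matched the first, because `next` restarts the implicit loop.
printf '1\n2\n3\n' | awk '$1 == 2 { print "skip"; next } { print "keep", $1 }'
```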

~~~
gingerlime
The event / pattern-matching analogy was pretty great for me. I never really
bothered much with awk, possibly out of sheer laziness, but with this small
piece of knowledge, it somehow feels much more accessible and logical to use.

That said, the loop concept and using next is also a very valuable piece of
information. So thanks to both!

------
hvd
Nice article. awk is a fantastic tool for getting quick insights into not-so-large
data. Incidentally, this is my take, which I wrote last Tuesday:
[http://hkelkar.com/2013/10/15/rolling-up-data-with-awk/](http://hkelkar.com/2013/10/15/rolling-up-data-with-awk/)

~~~
tux1968
Your post caught my eye. Since I mentioned earlier that learning perl was a
good investment, I thought I'd just show a quick equivalent to the awk script in
your blog post.

This produces the same output as your awk script (except it's sorted
alphabetically):

    
    
      #!/usr/bin/perl -F, -an
      $wins{$F[0]} += $F[6];
      $losses{$F[0]} += $F[7];
      END {
        print "manager,total_wins,total_losses\n";
        for (sort keys %wins) {
          print "$_,$wins{$_},$losses{$_}\n";
        }
      }
    

EDIT: should also mention that you can automatically convert awk scripts to
perl with the a2p utility which should already be installed along with perl.

~~~
mtdewcmu
Here is another awk version of that script; it's a little more concise and,
like the above, sorts the output, using newer features found in gawk:

    
    
      #!/usr/local/bin/gawk -E
      BEGIN{
      	FS=","
      }
      {
      	total_wins[$1]+=$7;
      	total_losses[$1]+=$8;
      }
      END{
      	print "manager,total_wins,total_losses"
      	n = asorti(total_wins, managers)
      	for (i=1; i<=n; i++){
      		m = managers[i]
      		print m "," total_wins[m] "," total_losses[m]
      	}
      }
    

(I recommend using gawk over mawk or nawk. You might need to install it over
the awk that came with your OS.)

~~~
hvd
Thank you mtdewcmu, I did not know about the built-in sort in gawk; will make
the switch.

~~~
mtdewcmu
Using gawk-only features means sacrificing portability, but gawk is so much
more refined than the other awks. I don't know why it's not the default awk on
a lot of OSes.

~~~
hvd
wouldn't installing gawk on all target machines take care of portability?

~~~
epo
Portability usually means taking the machine as you find it. If you're pre-
installing stuff then that is just configuring your environment.

------
nonchalance
A correct FizzBuzz in awk (the version in the article doesn't print numbers in
the general case):

    
    
        $ seq 1 100 | awk '!($1%3){printf "Fizz";p=1}!($1%5){printf "Buzz";p=1}!p{printf $1}{printf "\n";p=0}'

~~~
billjings
Argh. I should know what FizzBuzz is before I write one. I've updated the
implementation in my post to be... not wrong. (It's a little different from
yours, though.)

~~~
nonchalance
I hope it's shorter :) 60 chars:

    
    
        !($1%3){$2="Fizz"}!($1%5){$2=$2 "Buzz"}!$2{$2=$1}{print $2}

~~~
billjings
Nice. Using $2 as an auto-clearing global is a neat little hack!

------
brickmort
Crash course indeed. I learned the basic concepts of awk in just a few
minutes. Superb article!

