
Is there still any reason to learn AWK? - luu
http://stackoverflow.com/q/107603/334816
======
brutos
Like the sibling post, I also have a bioinformatics background, and awk is
one of the most important tools in my toolkit. It is just so extremely fast.
A few years ago I laughed at a friend who did a lot of simple data
transformation in awk, and told him to save his time and use Python instead.
He challenged me to do the same task in Python. It took me longer to write,
and it was orders of magnitude slower. That was eye-opening.

It is useful to be aware of the different awk implementations, for example
mawk. That awk version is stupidly fast. GNU awk is already extremely fast,
but if I work on a > 200 GB file, I just prefix the m and the task finishes
even faster. However, mawk does not have all the features of gawk. The
differences in regex support are the most painful.
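
As a rough sketch of what "prefix the m" can look like in a script: the
snippet below picks mawk when it is installed and falls back to plain awk
otherwise. The function name sum_col3 and the sample input are made up for
illustration, and the awk program sticks to basic POSIX features so it should
behave the same under either implementation.

```shell
# Pick mawk when it is installed, otherwise fall back to whatever `awk` is.
if command -v mawk >/dev/null 2>&1; then
    AWK=mawk
else
    AWK=awk
fi

# Hypothetical helper: sum the third tab-separated column of stdin.
# Only basic awk features are used, so any implementation should do.
sum_col3() {
    "$AWK" -F'\t' '{ total += $3 } END { print total }'
}

# Made-up sample input standing in for a large tab-delimited file.
printf 'chr1\t0\t10\nchr1\t5\t25\n' | sum_col3
```

Keeping the interpreter in one variable makes it easy to switch back to gawk
when a script needs a feature that mawk lacks.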

A very useful feature of the GNU version of awk is the debugger. It is super
useful to be able to step through an awk script line by line.

If I ever have too much time, I would like to take a look at that experimental
llvm-awk (lawk) someone started and get it production-ready. Sadly, it seems
abandoned currently.

------
a_bonobo
In bioinformatics, awk is such a wonderfully useful tool. Most of the data in
read alignments/genomics is either in tab-delimited format or is easily
converted to it (via bedtools, samtools, bamtools etc.), so it's easy to get
relevant stats via a quick chain that ends in awk.
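
To give a flavour of such a chain, here is a minimal sketch that computes the
mean interval length from BED-like input (chrom, start, end, tab-separated).
The function name mean_interval_length and the sample records are invented
for illustration; in practice the input would come from an upstream tool
rather than printf.

```shell
# Hypothetical helper: mean interval length from BED-like input on stdin.
# Columns are chrom, start, end; length is end minus start.
mean_interval_length() {
    awk -F'\t' '
        { total += $3 - $2; n++ }
        END { if (n > 0) printf "%.1f\n", total / n }
    '
}

# Made-up sample records with lengths 100, 150 and 50.
printf 'chr1\t100\t200\nchr1\t300\t450\nchr2\t0\t50\n' | mean_interval_length
```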

There's even a fork of awk called bioawk, which adds a few commonly needed
functions (treating common bioinformatics files as tab-delimited files etc.).

The recent bioinformatics book Bioinformatics Data Skills by Vince Buffalo
has a whole chapter on using awk in bioinformatics. Since reading it I've gone
from writing 10- or 20-line Python helper scripts that calculate stats for the
publication to just using one line of awk. (Downside: a 20-line Python helper
script is still in the folder next week and is at least a little bit
reproducible, while simpler awk stuff disappears in the shell history if
you're not careful.)
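
One way around that downside, sketched under the assumption that the
one-liner is worth keeping: write the awk program to a file next to the data
and run it with awk -f. The file name count_by_chrom.awk and the sample input
are hypothetical.

```shell
# Save the awk program to a file so it stays in the project folder.
cat > count_by_chrom.awk <<'EOF'
# Count records per value of the first tab-separated column.
BEGIN { FS = "\t" }
{ count[$1]++ }
END { for (c in count) print c "\t" count[c] }
EOF

# Run the saved program; sort makes the output order deterministic,
# because the iteration order of awk's for-in loop is unspecified.
printf 'chr1\t0\t10\nchr2\t5\t25\nchr1\t30\t40\n' | awk -f count_by_chrom.awk | sort
```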

tl;dr: in bioinformatics it's a boon.

