
The Awk Programming Language (1988) [pdf] - shawndumas
https://ia802309.us.archive.org/25/items/pdfy-MgN0H1joIoDVoIC7/The_AWK_Programming_Language.pdf
======
coliveira
I consider awk to be the most useful and underused language in the UNIX
ecosystem. I use it daily to analyze, transform, and assemble data, and it
always blows my mind that so few people really know how to use it at a decent
level. This is an excellent book to give a real idea of what awk is capable
of.
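For instance, the kind of analyze-and-assemble job awk excels at can be sketched as a minimal group-and-sum one-liner (sample data invented for illustration):

```shell
# sum the second column grouped by the first, using an associative array;
# sort the output since awk's for-in order is unspecified
printf 'a 1\nb 2\na 3\n' | awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' | sort
```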

~~~
ajross
> it always blows my mind that so few people really know how to use it at a
> decent level

Not nearly as surprising as it is to me that now that most developers have
forgotten perl, they're turning to awk as an inspiring example of a bygone
era.

Seriously: perl basically replaced awk in the mid-90's. It absorbed all the
great lessons and added two or three dozen innovations. But it had scary
syntax, so everyone used python (which did not replace awk very effectively)
and forgot perl. So now we're back at awk. And the scary syntax is all in the
Rust world.

~~~
olskool
Python was never intended to replace Perl. Perl was designed to extract stuff
from text files. Python was designed as a scripting language for system
programming. IM(Seldom Humble)O, Python beat Perl for two reasons: (a) batteries
included (CPAN became a clusterfuck); (b) the C API led to stuff like SciPy,
NumPy and Pandas.

FWIW I've used both Perl and Python professionally and Python rules.

~~~
Zash
> CPAN became a clusterfuck

This gave me flashbacks to installing some perl module and watching it
download half of CPAN.

~~~
vgy7ujm
Still, NPM is so much worse...

~~~
Zash
Those who don't learn from history etc.

------
wernsey
I'll just share my biggest Awk-based project here: A tool that creates HTML
documentation from Markdown comments in your source code.

It does essentially what Javadoc does, except it uses Markdown to format text,
which I find much more pleasing to the eyes when you read source code.

The benefit of doing it in Awk is that if you want to use it in your project,
you can just distribute a single script with your source code and add two
lines to your Makefile. Because of the ubiquity of Awk, you never have to
worry whether people building your library have the correct tools installed.

It doesn't have all the features that more sophisticated tools like Doxygen
provides, but I'm going to keep on doing the documentation for my small hobby
projects this way.

[1] [https://github.com/wernsey/d.awk](https://github.com/wernsey/d.awk)
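A sketch of what those two Makefile lines might look like (the file names `mylib.c` and `docs.html` are placeholders; check the d.awk README for the exact invocation):

```make
# hypothetical rule: build HTML docs from Markdown comments in mylib.c
docs.html: mylib.c d.awk
	awk -f d.awk mylib.c > docs.html
```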

~~~
mauvehaus
It is true that there exists _an_ awk most everywhere, but I've been burned a
few times by the discovery that nawk (Solaris (dating myself a bit there...)),
gawk (many GNU/Linux distros), and mawk (some GNU/Linux distros? Don't know
where I ran into it, but I have in the last couple of years) all have subtle
incompatibilities around the edges.

As I recall, gawk in particular has some extensions that are explicitly a
superset of the standard functionality. Which is great if you're only targeting
gawk and know about it. It's less great if you _think_ you're only targeting
gawk, and discover later on that there's a system with a different awk you
need to support.

Your project looks neat, by the way. I'm looking forward to taking a closer
look.
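As a concrete instance of those edge differences: gensub() exists only in gawk, while gsub() is POSIX and works in nawk and mawk too. A minimal sketch:

```shell
# gawk-only: gensub() is a gawk extension, absent from POSIX awk and mawk
#   gawk 'BEGIN { print gensub(/o/, "0", "g", "foo") }'    # prints f00 under gawk
# portable equivalent using POSIX gsub(), which modifies the string in place:
awk 'BEGIN { s = "foo"; gsub(/o/, "0", s); print s }'
```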

~~~
wernsey
Yes, there are differences. I used only standard awk facilities, and tested my
project with gawk, mawk and nawk.

I've also been burned when I discovered that Raspbian ships with an older
version of mawk that did something differently in the way it processed regexen
that caused my script to break.

------
ianmcgowan
I had this exact book and used awk on DOS/Novell in the early 90's when
scripting choices were pretty scarce. The writing is tremendous - a model of
clarity, and worth reading just for that. Anything with Kernighan, Pike or
Plauger as author is worth checking out just for the example of clear
thinking.

------
orionblastar
In 1996 I worked as a federal contractor for a US Army base. They had
different Unix systems locked down for security reasons. Had Awk and Sed to
work with and ordered the books from Amazon.

Oracle databases and other databases exported data in fixed width files and I
had to download from several Nix systems to import into one general Nix system
using Oracle and then a DOS based Clipper 5 system and an Access 2.0 Windows
system and they all had to get the same results.

If not for Awk I could not filter the files from the Nix systems.

~~~
thomastjeffery
Saying "Nix" instead of "*nix" is confusing because of the Nix package
manager[0]

[0][https://nixos.org/](https://nixos.org/)

------
znpy
I printed this book and went through it, and IMHO even just skimming all of it
is worth it: understanding how awk works beyond the basic '{print $2}', and
being exposed to some 'advanced' techniques, gives you a set of tools that you
can reuse in your daily chores (in particular if you're a sysadmin).

------
totalperspectiv
This book is worth the read. Just to get in the mindset of the authors. I wish
that more programming books could be as concise and useful at the same time.

------
kazinator
TXR Lisp provides a Lisp-ified _awk_ in a macro:

[http://www.nongnu.org/txr/txr-manpage.html#N-000264BC](http://www.nongnu.org/txr/txr-manpage.html#N-000264BC)

> _" Unlike Awk, the awk macro is a robust, self-contained language feature
> which can be used anywhere where a TXR Lisp expression is called for,
> cleanly nests with itself and can produce a return value when done. By
> contrast, a function in the Awk language, or an action body, cannot
> instantiate an local Awk processing machine. "_

The manual contains a translation of all of the Awk examples from the POSIX
standard:

[http://www.nongnu.org/txr/txr-manpage.html#N-03D16283](http://www.nongnu.org/txr/txr-manpage.html#N-03D16283)

The (-> name form ...) syntax above is scoped to the surrounding awk macro.
Like in Awk, the redirection is identified by string. If multiple such
expressions appear with the same name, they denote the same stream (within the
lexical scope of the awk macro instance to which they belong). These are
implicitly kept in a hash table. When the macro terminates (normally or via
non-local jump like an exception), these streams are all closed.

------
3uclid
For context, I'm in university, but during one of my internships, a lot of the
older developers always seemed to use awk/sed in really powerful ways. At the
same time, I noticed a lot of the younger developers hardly used it.

I'm not sure if it's a generational thing, but I thought that was interesting.

Anyways, are there any good resources to learn awk/sed effectively?

~~~
MikeTaylor
Awk really has been superseded by Perl (and therefore arguably by Python,
Ruby, etc.) But sed remains a thing of beauty, all its own, and very well
worth learning. Hardly a day goes by that I don't use it in some one-off
command like

    
    
        for i in *.png; do pngtopnm < "$i" | cjpeg > "$(echo "$i" | sed 's/png$/jpeg/')"; done

~~~
kazinator

      for i in *.png ; do
        pngtopnm "$i" | cjpeg > "${i%.png}.jpeg"
      done

------
olskool
Over the years I've written too many awk one-liners to count. Most of them
look ugly - hell, awk makes Perl look elegant - but having awk in your toolkit
means that you don't have to drop out of the shell to extract some weird shit
out of a text stream. Thanks, Aho, Weinberger, and Kernighan!

------
hawski
And I'm still waiting for the structural regular expressions version of awk
[0].

I very much like awk, I prefer it over sed, because it's easy to read. Also
proper man page is all one needs. But I find myself many times doing something
like this:

    
    
      match($0, /regex/) {
        x = substr($0, RSTART, RLENGTH)
        if (match(x, /regex2/)) {
          ...
        } else if (match(x, /regex3/)) {
          ...
        }
      }

Then I sometimes want to mix and match those strings. Or do some math on a
matched number. It's a bit tedious in awk.

[0]
[http://doc.cat-v.org/bell_labs/structural_regexps/](http://doc.cat-v.org/bell_labs/structural_regexps/)
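For reference, the match/RSTART/RLENGTH pattern above can be sketched end to end, including doing math on the matched number (sample input invented):

```shell
# extract the first run of digits from the line, then compute with it
echo 'price: 21 usd' | awk 'match($0, /[0-9]+/) {
  n = substr($0, RSTART, RLENGTH)   # the matched text
  print n * 2                       # arithmetic on the match
}'
```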

~~~
nur0n
It seems that work has already started:
[https://github.com/martanne/vis](https://github.com/martanne/vis).

~~~
hawski
Correct me if I'm wrong, but as fine as vis is, it will not feature a
stand-alone strex-awk.

------
jph
Awk is great for quick command line scripts and also for running on a very
wide range of systems.

I recently wrote a simple statistics tool using Awk to calculate median,
variance, deviation, etc. and people say the code is readable and good for
seeing the simplicity of Awk.

[https://github.com/numcommand/num/blob/master/bin/num](https://github.com/numcommand/num/blob/master/bin/num)
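The core of such a tool fits in a few lines of awk; here is a minimal sketch of a one-pass mean and population variance (not necessarily how the linked `num` tool computes them):

```shell
# accumulate sum and sum of squares per line, then report in END
printf '1\n2\n3\n4\n' | awk '
  { s += $1; ss += $1 * $1; n++ }
  END {
    mean = s / n
    printf "mean=%g variance=%g\n", mean, ss / n - mean * mean
  }'
```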

------
rurban
If you want a fast awk, use mawk.

[https://github.com/mikebrennan000/mawk-2](https://github.com/mikebrennan000/mawk-2)

~~~
davidgould
Yes. mawk is shockingly fast.

In my perfect world mawk would have some of the gawk extensions, and it would
have a csv reader mode to properly split csv into $1...$NF. Because that would
be the killer tool.
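The pain point is easy to demonstrate: a plain comma split miscounts fields as soon as a quoted field contains the separator.

```shell
# 'a,"b,c",d' has 3 logical CSV fields, but a naive -F, split sees 4
echo 'a,"b,c",d' | awk -F, '{ print NF }'
```

gawk (4.0+) can cope with this via something like `FPAT = "([^,]*)|(\"[^\"]*\")"`, but that is a gawk extension rather than standard awk.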

~~~
rurban
The new mawk has it; just the old Debian version doesn't yet.

~~~
davidgould
Where is the new mawk please?

------
JepZ
That is a nice book: it starts with a practical tutorial and then goes into
the structure and language features, all in a reasonable page count of just
about 200 pages.

I like to use awk when I need something a little more powerful than grep.
Nevertheless, when I look at the examples and where the book is heading I
prefer R for many of the tasks (in particular Rscript with a shebang).

Just to give an example: if you have to manipulate a CSV file, that would most
certainly be possible with awk, but some day there might be a record that
contains the separator inside a field, and your program will produce garbage.
R, on the other hand, comes with sophisticated routines to handle CSV files
correctly.

I truly respect awk for what it was and is, but I also think that the use-cases
where it is the best tool for the job have become very narrow over time.

------
samuell
As I do most of my daily work in cheminformatics with a (shell-based) workflow
engine ([http://scipipe.org](http://scipipe.org)), awk has turned out to be
the perfect way of defining even quite complicated components based on just a
shell command. These days, pretty much 50% of my components are CSV/TDV data
munging with awk! :D

(Can be hard to explain how this works without an image, so an (older) image
is found in:
[https://twitter.com/smllmp/status/984173696448434176](https://twitter.com/smllmp/status/984173696448434176)
)

------
hi41
I find awk so beautiful. I've written many scripts in awk. It is so good at data
transformation. I used it to write a script to delete old and unused records
from tables. The book is so beautifully written, with amazing clarity of thought.

------
oblio
I recently had a sort of "contest" with someone for parsing the output of a
tool. I had to parse some text output into a tree structure.

The other person wrote it in awk, quite quickly. After writing my own version
in Python (my version was waaaay over-engineered), I decided to blatantly rip
off the awk solution and re-implement it in Python.

It was almost as simple and as short.

Awk is much more compact as a language, but also way more limited. And it
still has its quirks and a certain volume of information you have to gather.
I'd say it's more worthwhile to learn Python instead, because you'll be able
to use it for other purposes.

------
thomastjeffery
From the introduction to Chapter 2:

> Because it's a description of the complete language, the material is
> detailed, so we recommend that you skim it, then come back as necessary to
> check up on details.

Any book that recommends skimming is doing something right.

------
dokem
This pdf looks like a scanned book but I can highlight and copy text from it?
What exactly is going on here? Does Chrome pdf viewer have built-in OCR?

~~~
fulafel
PDF has long had this feature, and most if not all readers support it. A hidden
textual layer is included along with the scanned image that is displayed.

~~~
dokem
Interesting, why not just replace the text then? Where can I find more info
about this? I was actually trying to find a good source of info about PDF's
recently and couldn't really find much.

~~~
fulafel
Replacing the typeset text with any reasonable fidelity seems like a much
harder problem than reproducing the scan and providing the OCR'd text
content. It might still be a good idea to do; maybe some software does this.

I don't have any references, sorry.

------
mbubb
The lovely and humbling thing about this is that it was written three decades
ago and the examples still work. Makes me think of another short, elegant piece
by Kenneth Church (?) called "Unix for Poets", which shows how to use core UNIX
utils to work with text. Also from the mid-to-late 80s. Perl may have replaced
sed and awk, but they endure.

------
asicsp
See also: [http://www.faqs.org/faqs/computer-lang/awk/faq/](http://www.faqs.org/faqs/computer-lang/awk/faq/) from 2002

For latest manual/book:
[https://www.gnu.org/software/gawk/manual/](https://www.gnu.org/software/gawk/manual/)

------
segmondy
I have this book and highly recommend it. I have referenced it numerous times
to pull out text manipulation wizardry that stunned others.

~~~
technofiend
Lol. This is the IT equivalent of the shaggy-dog story about the itemized
million dollar invoice: $1 - drill a hole, $999,999 - knowing where to drill
the hole.

And by that I mean sometimes it seems easy to solve a problem because you have
the skill to do so. It _looks_ easy but that's only because of the time
invested in making it easy _for you_. For anyone else the challenge remains.

------
excitom
I always found it interesting that the Awk paradigm is also the basis for
IBM's RPG language. Two very different environments coming up with basically
the same elegant solution for the same problem:

1. Run zero or more setup operations.

2. Loop over the lines of a text file and process its columns into an output
format.

3. Run zero or more cleanup operations at the end.
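In awk the three phases map directly onto BEGIN, the per-record rules, and END; a minimal sketch with invented data:

```shell
printf 'alice 3\nbob 5\n' | awk '
  BEGIN { print "name hours" }   # 1. setup
  { total += $2; print $1, $2 }  # 2. per-line processing of columns
  END { print "total", total }   # 3. cleanup / summary
'
```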

------
gerbilly
If you want a binary alternative to awk, try using lex.[1]

You can feed in regexps and c code fragments and it will generate c code for
you.

[1] [https://www.tldp.org/HOWTO/Lex-YACC-HOWTO-3.html](https://www.tldp.org/HOWTO/Lex-YACC-HOWTO-3.html)

------
anoonmoose
I wanted to know what the language looked like so I went to the first example
in the book and found this:

    
    
      This is the kind of job that awk is meant for, so it's easy.
      Just type this command line:
    
        awk '$3 > 0 { print $1, $2 * $3 }' emp.data

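That example runs as-is given the book's emp.data file (name, pay rate, hours worked); a few of its lines, as I recall them, reproduced here:

```shell
# first lines of the book's emp.data: name, pay rate, hours worked
printf 'Beth 4.00 0\nDan 3.75 0\nKathy 4.00 10\n' > emp.data

# print name and pay (rate * hours) for everyone who worked
awk '$3 > 0 { print $1, $2 * $3 }' emp.data
# -> Kathy 40
```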
------
vasili111
Lots of comments about awk, perl and sed for text processing. What about Tcl?

------
wenc
Recycling an older HN discussion on Awk vs Perl

[https://news.ycombinator.com/item?id=14647022](https://news.ycombinator.com/item?id=14647022)

------
carlmr
I just skimmed it in 30 minutes. I feel I can write some simple stuff now.
Apart from all the examples, it doesn't feel that overwhelming.

------
ulzeraj
I was joking with my coworker some weeks ago about how awk is condemned to be
forever used as a cut replacement.

------
rbc
I have this book. I use awk daily to do analysis of Suricata logs. It's great
for querying structured text.

------
mxschumacher
I love the typography

~~~
kps
That's troff at work.

~~~
tux1968
This has bothered me for a long time. It looks like I'm seeing something other
than what other people are seeing. So many of these documents just look awful.
In this particular case it does not look good either:

[http://i.imgur.com/e11d0aK.png](http://i.imgur.com/e11d0aK.png)

[http://i.imgur.com/0Ysr7QQ.png](http://i.imgur.com/0Ysr7QQ.png)

Look at the kerning on "Awk", it's not good. And look at the zoomed in
version, the characters all have pixelisation and jaggies.

These were just viewed using Firefox's default pdf viewer. Is there a way to
view them and see better a quality version of the document?

~~~
kps
Given that this is a scan of a book produced on a phototypesetter (no pixels),
you probably want a real copy.

~~~
tux1968
Yeah, it was just surprising to me that people were praising this document.

------
lasermike026
Awk has its uses. If you use the command line you'll probably use Awk
occasionally.

I don't get the Perl hate. Perl's unpopularity may have something to do with
some of the language's design choices. I think what really killed it was Perl
coders. Some of the worst code I've seen happened to be written in Perl. If
you follow clean code principles, Perl is fine. Mojolicious is an awesome
framework. I like it a lot.

Today I code Python and C. I used to code Ruby and before that Perl. I loved
Ruby's syntax but Ruby seems to be waning. I'm looking forward to coding in
Go. I'll be coding Javascript but I'm not looking forward to it.

Use the tool that fits the job. I have no loyalties to any programming
language.

~~~
snaky
>Perl's unpopularity

Is a myth actually.

> Tom Radcliffe, recently presented a talk at YAPC North America titled “The
> Perl Paradox.” This concept refers to the fact that Perl has become
> virtually invisible in recent years while remaining one of the most
> important and critical modern programming languages. Little, if any, media
> attention has been paid to it, despite being ubiquitous in the backrooms of
> the enterprise.

> Yet at ActiveState, we have seen our Perl business continue to grow and
> thrive. Increasingly, our customers tell us that not only are they using
> more Perl but they’re doing more sophisticated things with it. Perl itself
> recently made it back into the Top 10 of the Tiobe rankings, and it remains
> one of the highest paying technologies. Therein lies the paradox.

[https://www.activestate.com/blog/2016/07/perl-paradox](https://www.activestate.com/blog/2016/07/perl-paradox)

~~~
zzzcpan
At the bottom of top 20 now and pretty much no job openings in most countries.

~~~
Verdex_3
With perl's use cases, I would be a bit surprised to see a job specifically
for programming in perl. It's a tool that you use to support your other
infrastructure. It's not a tool that you generally use to build up that
infrastructure.

Similarly, I don't expect to see many job openings for wrenchers, but I fully
expect a mechanic being hired someplace to be able to use a wrench.

~~~
kazinator
> _see a job specifically for programming in perl_

That was exactly what you saw some twenty years ago. Perl was used to build
infrastructure, like entire back-ends for sites and whatever.

You don't see those jobs _anymore_.

------
florinutz
_bold_ was called _heavy_, lol

------
mdavid626
I know many people will downvote, but in my opinion, just say no to this
ancient "programming language". It's confusing, completely text-based, and was
designed ages ago in an entirely different environment. There are many better
alternatives, like Python or PowerShell. Why not use them?

