
The Awk Programming Language (1988) [pdf] - dang
https://archive.org/download/pdfy-MgN0H1joIoDVoIC7/The_AWK_Programming_Language.pdf
======
nprescott
One of my favorite books - I initially bought a copy based on a review by
Brandon Rhodes [0]:

> But the real reason to learn awk is to have an excuse to read the superb
> book The AWK Programming Language by its authors Aho, Kernighan, and
> Weinberger. You would think, from the name, that it simply teaches you awk.
> Actually, that is just the beginning. Launching into the vast array of
> problems that can be tackled once one is using a concise scripting language
> that makes string manipulation easy — and awk was one of the first — it
> proceeds to teach the reader how to implement a database, a parser, an
> interpreter, and (if memory serves me) a compiler for a small project-
> specific computer language! If only they had also programmed an example
> operating system using awk, the book would have been a fairly complete
> survey introduction to computer science!

[0]:
[http://stackoverflow.com/a/703174/2912179](http://stackoverflow.com/a/703174/2912179)

~~~
wyclif
Why you should learn just a little Awk: [https://gregable.com/2010/09/why-you-should-know-just-little...](https://gregable.com/2010/09/why-you-should-know-just-little-awk.html)

------
david-given
I wrote a compiler in awk!

To bytecode; I wanted to use the awk-based compiler as the initial bootstrap
stage for a self-hosted compiler. Disturbingly, it worked fine.
Disappointingly, it was actually faster than the self-hosted version. But it's
_so_ not the right language to write compilers in. Not having actual
data structures was a problem. But it was a surprisingly clean 1.5kloc or so.
awk's still my go-to language for tiny, one-shot programming and text
processing tasks.

[http://cowlark.com/mercat](http://cowlark.com/mercat) (near the bottom)

(...oh god, I wrote that in _1997_?)

~~~
thisrod
I've always thought that AWK's most important feature is its self-limiting
nature: no one would ever contemplate writing an AWK program longer than a
page, but once Perl exists the world is doomed to have nuclear reactors driven
by millions of lines of regexps.

But no, there's always one. :-)

~~~
ams6110
I've written one or two awk programs that probably went beyond what the tool
was intended for, but mostly I use short one-liners or small scripts. I use
awk, grep, sed, and xargs pretty much daily for all kinds of ad-hoc
automation.

~~~
joepvd
> beyond what the tool was intended for

Not sure what that would mean. I think the tool was designed to be a user's
programming language. I like to think that `awk` was the Excel + VBScript of
its day.

~~~
throwaway7645
VBScript was largely replaced on Windows by Powershell. Awk is still popular
for what it's good at.

------
chubot
Last year I dug up Kernighan's 2012 release of awk, fixed up the test suite
packaging and automated it, and wrote a makefile which adds clang ASAN
support.

It found a couple bugs because the test suite is quite comprehensive. I think
it's somewhat interesting that 5000 or so lines of C code polished over 20
years still has memory bugs.

I didn't fix the bugs, but anyone should feel free to clone it and maybe get
some karma points from Kernighan. Maybe he will make a 2017 release. He is
fairly responsive to email from what I can tell :)

[https://github.com/andychu/bwk](https://github.com/andychu/bwk)

~~~
nickpsecurity
" I think it's somewhat interesting that 5000 or so lines of C code polished
over 20 years still has memory bugs."

Typical might be the word you're looking for there. The CompSci people doing
static analysis for C programs often apply their tools to popular FOSS. They
find new errors nearly every time.

~~~
qwertyuiop924
As much as I defend C, if you're using C in a non-embedded environment, and
you're handling any sort of textual input... don't. In fact, even if you're
not handling textual input, think about not doing it.

------
vram22
One of my commonly used Unix one-liners, using awk, is to get the sum of the
file sizes for the files listed by the ls command (with the -R for recursive
option if wanted):

ls -lR /path/to/dir | awk ' { s += $5 } END { print s / 1024 " K" } '

$5 is the 5th field of the output, which is the file size field in the case of
ls output. The code inside the first set of braces runs once for every line of
input (which comes from standard input, so from the ls command, in this case),
and the code inside the second set of braces runs at the end of the input,
calculating and printing the desired result of the total of all file sizes for
files found by ls, in kilobytes. It can easily be changed to output the total
in bytes or megabytes by dropping the '/ 1024' or adding another one after the
first. Variable s is initialized to 0 by default at the start.

You can get similar info with "du -hs /path/to/dir" but the ls plus awk
pipeline lends itself to more customization, such as adding conditions for the
type or owner of the file, etc.
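
To illustrate the customization point, a condition in front of the action
block filters which lines get summed. A sketch restricted to one owner (field
3 of `ls -l` output); the sample lines and the name "alice" are just
illustrative stand-ins for real `ls -lR` output:

```shell
# Sum sizes only for files owned by "alice": the pattern $3 == "alice"
# guards the action, so other owners' lines are skipped.
printf '%s\n' \
  '-rw-r--r-- 1 alice staff 2048 Jan 1 00:00 a.txt' \
  '-rw-r--r-- 1 bob   staff 4096 Jan 1 00:00 b.txt' \
  '-rw-r--r-- 1 alice staff 1024 Jan 1 00:00 c.txt' |
awk '$3 == "alice" { s += $5 } END { print s / 1024 " K" }'
# prints: 3 K
```

The same pattern-before-action idea works for any field: file type (first
character of field 1), date fields, and so on.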

~~~
mnaydin
I'd use the _find_ command with the _-printf_ option (GNU _find_ has this
option but POSIX _find_ doesn't define it) instead of _ls_. For instance:

find /path/to/dir -type f -printf "%s\n" | awk ' { s += $0 } END { print s " bytes" } '

The _find_ command has much more powerful file-filtering capabilities than the
_ls_ command and works better with weird characters in filenames.

~~~
vram22
Thanks. Yes, I'm aware that in general find is a better option (long time Unix
guy) than even a recursive ls command (ls -R) for finding files under a
directory and processing them in some way (often together with xargs, to get
around the args length limit). But mine was just a quick example, so I didn't
use find. Actually, find is also better for this example, because with it, you
do not have to deal with per-dir header lines like "dirname:" and "total n" (n
blocks) that ls outputs. (The headers may not matter for my example, because I
only process field 5, but they can matter for other kinds of processing of the
output.)

There is also the -print0 option to find to handle filenames with newlines in
them.

-print0 may be non-POSIX and a GNU extension.

POSIX has -print, but interestingly, in some Unixes I have seen that omitting
-print still prints the filenames found, by default.
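
For what it's worth, a minimal sketch of the -print0 idea: NUL-terminated
names survive spaces (and even newlines) in filenames, and xargs -0 splits on
the NULs instead of on whitespace:

```shell
# Create a scratch dir with one awkwardly named file, then list both files.
# Without -print0 / -0, "a file" would be split into two arguments.
d=$(mktemp -d)
touch "$d/a file" "$d/plain"
find "$d" -type f -print0 | xargs -0 -n1 echo
```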

~~~
mnaydin
That's the expected behaviour. Quoting from spec:

If no expression is present, -print shall be used as the expression.
Otherwise, if the given expression does not contain any of the primaries
-exec, -ok, or -print, the given expression shall be effectively replaced by:
( given_expression ) -print

[http://pubs.opengroup.org/onlinepubs/9699919799/utilities/fi...](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/find.html)

~~~
vram22
Yes, I wasn't implying the behavior is wrong. Was just mentioning it. Anyway,
thanks for that link, which explains why. That Open Group info on POSIX
utilities is a great resource for when you want to know the comprehensive,
well-specified behavior of the commands.

------
luckydude
I've got the source code to both the book (in English and French) as well as
awk. How? I sent email to bwk saying that we were trying to extend awk to be
sort of threaded (think awk scripts as first class, so you have awk foo { } awk
bar { } and you could do foo | bar). We called it bawk, BitMover's awk.

Anyhow, I asked Brian if we could base it off the one true awk and he tarred
up ~bwk/awk and sent it to me.

I love that guy, the culture of the Bell Labs people and the people that
worked with them is great.

I've stolen a bunch of awk ideas over the years. BitKeeper (first DSCM) has a
programming "language" for digging info out of the repository. For example,
this:

[http://www.mcvoy.com/lm/bkdocs/dspec-changes-json-v.txt](http://www.mcvoy.com/lm/bkdocs/dspec-changes-json-v.txt)

prints out the repo history as a json stream. One of my guys said that it
couldn't be done, heh, it could be :)

Everyone should learn some awk, it's so handy.

------
cagey
PolyAWK (by Polytron), which included a copy of this book with each unit sold
(see sticker on the cover of the linked PDF), was a _favorite_ tool of mine
"back in the DOS (and early Windows) days". It was IIRC developed by Thompson
Automation Software[0], who later sold the software package directly. The
Thompson Automation Awk package included an awesome _awk compiler_, allowing
creation of standalone .EXE files (using a 32-bit DOS extender, and later a
Win32 version) from 1+ awk source files. The compiler presumably generated
bytecode which was bundled into the .EXE file along with a 32-bit runtime
which provided data capacity sufficient for a wide range of real-world
projects. Anyway, TAWK gave me a huge productivity boost for a number of years
during a time when such languages were only beginning to become available on
the PC platform. And the ability to create single-file standalone EXE files
greatly eased distribution of the tools I created. Good times.

[0] [http://www.tasoft.com/](http://www.tasoft.com/)

~~~
throwaway7645
Wtf? Why doesn't anything like this exist today? Windows doesn't have any way
to create an .exe (that I'm aware of) besides C#, C++, Turbo Basic, and that's
about it. All I really want is a way to write terse code and release it to
other users without installation of a runtime, etc. I can't even distribute
PowerShell scripts, because you can't guarantee another user has the right
version.

~~~
cagey
I certainly agree with your sentiment. Yet I'm unsure that even C# or C++ _by
default_ builds .EXE files that can be copied onto another (same OS) machine
and run successfully (for C++ this is almost always possible, but IIRC not
default build behavior).

"What happened" was industry-wide standardization on dynamic linking of
prerequisite (library) code (IOW, this code stays in separate DLL files
typically stored in system-global locations), leading to the need for
"installer" software whose purpose (I presume; I've entirely avoided dealing
with that stuff) is to ensure that all prerequisite dynamic libraries are
upgraded to the minimum version needed by the SW being installed, to replace
any old version(s) of the program with the new, and to modify the Windows
Registry in various and sundry ways (can you say "system-global variables run
amok"?).
The solution which I prefer is to build static-linked .EXEs (binaries) instead
of dynamic-linked. Convincing toolchains to do this is a small exercise for
the reader. OBTW: I think go (golang) static-links by default.

I stopped using the TAWK compiler when I discovered Lua (5.1; IMHO a substantially
better language than TAWK (this is not a criticism of TAWK)). I even went so
far as to commission a "Lua Compiler" for Win32 which behaved almost
identically to the TAWK compiler; I used this with great success for a few
years. Unfortunately it was an internal tool which I lost access to when I
departed that employer.

P.S. IIRC Borland Delphi also builds static-linked EXEs by default. I wrote
one Delphi 2 (Win NT 4.0 era) program whose source code I've kinda lost track
of which still runs fine on Win10 x64. TAWK and Lua are more productive
languages than Delphi/Pascal, and it's trivially easy to add your own C
library functions to Lua (for improved performance or added functionality), so
I gravitated toward an overall preference for Lua.

~~~
throwaway7645
Yea, but neither Lua nor LuaJIT can make a true binary without some hack where
you package up the interpreter as well.

~~~
etiene
It's not complicated and the interpreter is super light though

------
dang
Plain text version here, but the formatting is off in places:
[https://archive.org/stream/pdfy-MgN0H1joIoDVoIC7/The_AWK_Pro...](https://archive.org/stream/pdfy-MgN0H1joIoDVoIC7/The_AWK_Programming_Language_djvu.txt).

------
jph
Awk is the #1 language I learned this year for fun.

I wrote a simple command line statistics tool that uses awk to calculate sum,
stddev, and more.
[https://github.com/numcommand](https://github.com/numcommand)

------
technofiend
Since this is Hacker News my plea may be answered: does anyone have the
artwork or an actual example of the infamous AWK T-shirt? From memory it
features a bird jumping (parachuting) out of a plane and is titled with AWK's
most famous error message: "awk: bailing out near line 1."

------
lucidguppy
This book should be required reading for anyone looking to write their own
tech books.

It's short, clear, and concise. It's useful and helps you solve real problems
with AWK. Who could ask for anything more?

~~~
vram22
Agreed.

The fact that one of the authors is Brian Kernighan is partly why, IMO. Just a
few days back, I commented here in reply to someone about the quality of his
K&P and K&R books (both of which I've used for trainings), on Unix and C
respectively.

------
bglazer
I wish that certain simple tasks in awk were a little less verbose, especially
for command line use.

The number one example for me is counting by string in a csv file:

>> awk -F',' '{a[$1] +=1} END {for(v in a) print v,a[v]}'

Not that this is particularly difficult stuff, it's just a bit exhausting to
find myself typing that over and over again. I'd love a more concise
alternative to this.

Also, 'sort | uniq -c' is not a viable alternative for very large files.

~~~
jmts
Sounds like an opportunity to save it in a file! My $HOME/bin is full of shell
scripts containing this sort of stuff so I can avoid retyping them.
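
The counting one-liner could live in $HOME/bin as, say, `countby` (a
hypothetical name), reading either named files or stdin:

```shell
#!/bin/sh
# countby: count occurrences of the first comma-separated field.
# Usage: countby file.csv   or   some-command | countby
exec awk -F',' '{ a[$1] += 1 } END { for (v in a) print v, a[v] }' "$@"
```

After `chmod +x ~/bin/countby`, the whole incantation shrinks to one word.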

~~~
vram22
Yes, that approach is normal automation and also part of the Unix philosophy.
And of course you can pass command-line arguments to the scripts too.

A useful fact that I've seen some people didn't know: Unix metacharacters such
as star and question mark (for filename matching) and $ (in various uses such
as $*, $#, $!, etc.) are all expanded by the _shell_ , not by individual
commands, so use of metacharacters is actually available to _all_ commands and
scripts that are run at the shell prompt - not just to selected or built-in
ones.
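
A small demonstration of the point: the glob is expanded before the command
ever runs, so even a trivial inline script sees the expanded filenames as
ordinary arguments:

```shell
# The shell, not the command, expands *.txt: the inline script only prints
# its argument count and arguments, yet it receives the matched filenames.
d=$(mktemp -d)
cd "$d"
touch a.txt b.txt
sh -c 'echo "$# args:" "$@"' argv0 *.txt
# prints: 2 args: a.txt b.txt
```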

Contrast that with DOS (at least in earlier versions), which had the problem
that some commands supported wild-card characters for filename matching, but
others did not. You could write your own logic for that using OS API calls
(Int 21H, IIRC, or calls named like FindFirst and FindNext), but it was not
built in and freely available.

Edited for formatting.

------
carlisle_
People are often surprised when I mention that awk is Turing complete. It's
quite a powerful tool, I can't imagine loving the command line as much as I do
without it.

~~~
bluesmoon
Even sed is Turing complete. I once did a talk about this at Open Source
Bridge: [http://tech.bluesmoon.info/2008/09/programming-patterns-in-s...](http://tech.bluesmoon.info/2008/09/programming-patterns-in-sed.html)

------
joepvd
I love awk for text processing purposes. When analyzing log files, I often
drop down into awk-mode to check the exceptional constellation that is
currently under investigation. Very powerful to be able to say after three
minutes: This happens in 0.5% of the cases.

Bought this book 2nd hand online. This book costs $150 one day and $2 the
next. The first bit has been an awesome read, never got to read much more. I
tend to read much more from $READER. Sure this PDF will get me going again!

------
gallerdude
Man, I need to study some more weird languages. Just got done with the basics
of Python and C for my first CS class. Over the summer I want to tackle LISP.

~~~
rashkov
I can definitely recommend the University of Washington's Coursera on
Programming languages. It's available here, and starts every few weeks I
think: [https://www.coursera.org/learn/programming-languages](https://www.coursera.org/learn/programming-languages) You'll learn
SML (a strongly typed functional language), Racket (in the Lisp-scheme family
of languages), and Ruby (to compare the previous languages with object-
oriented ones). You'll even write your own language on top of Racket. It is a
challenging class but I can say for sure that it has changed the way I think
about programming.

------
banku_brougham
Are the /AA and /ObjStm items a concerning indicator? This is the limit of my
familiarity with pdf-id:

    
    
    	$> python2.7 pdfid.py The_AWK_Programming_Language.pdf 
    	PDFiD 0.2.1 The_AWK_Programming_Language.pdf
    	PDF Header: %PDF-1.6
    	..
    	/Page                  0
    	/Encrypt               0
    	/ObjStm                7
    	/JS                    0
    	/JavaScript            0
    	/AA                    1
    	/OpenAction            0
    	/AcroForm              0
    	/JBIG2Decode         222
    	...
    

It has /AA which is an automatic load action, and it has a lot of objects
which could contain javascript, would need closer scrutiny I think.

~~~
banku_brougham
I posted the above to prompt explanation from someone with expertise in pdf
malware to validate the safety of the linked pdf. I'm concerned that it has
open actions and objects that could be used to obfuscate js code. The author
of pdf-id flags these attributes as requiring further inspection.

Wouldn't a tech pdf of a popular book that is impossible to obtain legally in
digital form be an excellent vector to deliver malware to tech users with
probably lots of stored credentials to resources?

------
michaelsbradley
Robbins' open source book may be of interest as well:

 _GAWK: Effective AWK Programming_

[https://www.gnu.org/software/gawk/manual/gawk.pdf](https://www.gnu.org/software/gawk/manual/gawk.pdf)

------
zoom6628
Used AWK a whole lot in early 90s for massaging source code. Mostly to analyse
and refactor 1m+ LOC of COBOL. And Awk was brilliant for that. Have used it
ever since when needed to text process. Around 2000 I was using it a lot to
convert systems by running reports on the old system and then getting the data
from the output text files. Clunky way to do it, but faster than typing when
there is no way to get the data directly. If a system can print to a text
file, then the data is available. I still use awk on Windows, OSX and Linux.
It's an essential tool when faced with string/text processing tasks.

------
kazinator
Awk as Lisp macro in TXR:

[http://www.nongnu.org/txr/txr-manpage.html#N-000264BC](http://www.nongnu.org/txr/txr-manpage.html#N-000264BC)

It has direct counterparts to all POSIX features, plus a number of extensions
similar to ones found in Gawk, as well as some of its own: for instance, range
expressions which freely combine with other expressions (including other range
expressions), and range expressions which exclude either or both endpoints.

------
sstanfie
Literally writing a small awk script, took a break to check Hacker News. Nice.

------
iconara
It triggers my OCD that the names of the authors are in alphabetical order on
the cover and not in, you know, the logical order.

~~~
torrent-of-ions
Go on... explain why it's not the logical order.

------
contr-error
Is there anything that "explains" sed only half as well as this book? I know
how to use basic sed, but haven't yet completely grokked the way pattern space
and hold space really go together.
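
Not a book, but one classic illustration of how the two spaces interact is
the sed one-liner that reverses its input (a tac emulation):

```shell
# 1!G  on every line but the first, append the hold space to the pattern space
# h    copy the (reversed-so-far) pattern space back into the hold space
# $p   on the last line, print the accumulated pattern space
printf 'a\nb\nc\n' | sed -n '1!G;h;$p'
# prints c, b, a on separate lines
```

The hold space acts as an accumulator that survives from one input line to
the next, which is the key mental model for most non-trivial sed scripts.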

~~~
hackermailman
The Unix Programming Environment covers sed/ed enough to grok it
[https://en.m.wikipedia.org/wiki/The_Unix_Programming_Environ...](https://en.m.wikipedia.org/wiki/The_Unix_Programming_Environment)

------
qwertyuiop924
AWK is still my go-to scripting language for quick tasks, like simple
computation and basic data analysis. It is still the best thing in its problem
space.

Granted, AWK's problem space is very small, but still...

------
mcintyre1994
I've been learning this at work as part of a get-good-at-Linux regime :) One
of the most surprising things for me is that, as horrible as some of the
command-line one-liners can look to a beginner, it's actually quite a
forgiving scripting language. I don't think I've seen another language where
you can increment a variable without declaring/initialising it, nor where you
can set indices on an array without it being declared (except in a constructor
fashion, I guess).

~~~
tyingq
Perl does those things if you don't "use strict;"

    
    
      $ perl -e '++$i && print $i;$foo[2]=99;$foo[1]=88;for (@foo) {print}'
      18899

~~~
mcintyre1994
Thanks for pointing that out! I haven't used perl yet, it should probably be
on my list of things to learn though :)

~~~
tyingq
A fair amount of it is based on awk, so awk might be better to start with.

~~~
throwaway7645
Perl5 is a really big language...especially when you look at extending it with
the 10 gigs of CPAN code. It grew from a need to have a better Awk, but I'm
not sure if they're related closely enough for starting with Awk to matter.
Perl6 is an all-around sister language that isn't ready for production yet,
but has a ton of power and features, including the ability to call other
languages (Python, Perl5, Lua, Scheme).

~~~
tyingq
Sure. I meant that though in the context of _" I've been learning this at work
as part of a get-good-at-Linux regime"_. Which I suppose involves using awk
mostly for command pipelines / one liners. Awk and Perl would have a lot of
overlap (autosplit, matching, BEGIN, END, etc) in that space.

~~~
throwaway7645
Yea, if you're just doing one-liners...probably easier to start with Awk. For
more complicated scripts, Perl or Python should be built in.

------
getpost
I used awk until I learned Python (long ago). For me, awk was yet another
example of the "worse is better" approach to things so common in unix. For
example, if you make a syntax error, you might get a message like "glob: exec
error," rather than an informative message. "Worse is better" is probably a
good strategy in business and for getting things done, but still, mediocrity
and the sense of entitlement that so often goes with carelessness, sickens me.

------
nat
Awk meshes very well with a lot of my natural inclinations about text
processing. I've sadly stopped using it lately, as it seems that the majority
of my use cases these days run up against a (to me) glaring deficiency in the
language: the lack of capture groups in pattern regexes. It's probably one of
those "you're doing it wrong" kind of things, but if awk had that one feature,
I probably wouldn't ever need to use perl.
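
For what it's worth, POSIX awk does offer a partial workaround: match() sets
the RSTART and RLENGTH builtins, so substr() can carve out the matched text
(gawk goes further and accepts a third array argument to match() for real
capture groups). A sketch with made-up sample input:

```shell
# Extract the digits after "id=": RSTART is where the match begins,
# RLENGTH its length; +3 / -3 skip over the literal "id=" prefix.
printf 'user=joe id=42 ok\n' |
awk '{ if (match($0, /id=[0-9]+/)) print substr($0, RSTART + 3, RLENGTH - 3) }'
# prints: 42
```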

------
elchief
awk one liners:

[http://www.pement.org/awk/awk1line.txt](http://www.pement.org/awk/awk1line.txt)

~~~
pkrumins
explained:

[http://www.catonmat.net/blog/awk-one-liners-explained-part-o...](http://www.catonmat.net/blog/awk-one-liners-explained-part-one/)

------
gwu78
Not sure about GNU, but BSD build systems depend on AWK for building
installation media.

crunchgen, a compiled C program, has to call AWK.

Anyone out there do AWK-less builds?

Why did I need to learn a little AWK?

Because I could not have worked out how crunched binaries were built without
knowing some AWK.

Best thing about AWK IMO is the C-like syntax.

For anyone learning C and AWK concurrently, this kills two birds with one
stone.

------
throwaway7645
I love Awk on Unix. I really wish Windows had something closer to this.

~~~
sizzzzlerz
You can get it, along with a number of other Unix tools, with MinGW
([http://www.mingw.org/](http://www.mingw.org/))

~~~
cylinder714
[https://mingw-w64.org/](https://mingw-w64.org/) is actively developed,
whereas MinGW appears to be dormant.

------
based2
[http://jawk.sourceforge.net/](http://jawk.sourceforge.net/)

------
kworker
The printed book is still expensive on Amazon. I guess it's still important
these days.

------
patrickg_zill
I was working with a VOIP startup, and they needed to find some unique numbers
in their CDRs (call detail records, basically a CSV list of calls made,
duration, etc.).

Loading the file into Excel took literally minutes as Excel tried to parse
every field. It bogged down a 16GB RAM machine.

Using awk and uniq, the total run time of getting a solution, including
reading the many MB of files and generating a summary into another file, was
about 6 seconds.

