
Awk As A Major Systems Programming Language, Revisited (2018) - kick
http://www.skeeve.com/awk-sys-prog.html
======
ojosilva
Back in the 90s when I got my first job, Awk was the gateway drug to the
stronger stuff, Perl. Once you were hooked, productivity skyrocketed and
managers would wander around the office saying "how does he do that 10-day
thing in just 10 minutes?" Those were the days when dynamic languages were
game changers in the Unix landscape. But, in fact, the mainframe folks and the
PC crowd had had Rexx and BASIC for at least a decade prior. I still can't
believe C was the standard for data ETL and general reporting at so many
venues.

~~~
cs101
> productivity skyrocketed

Was the increased productivity due to Awk or was it due to Perl?

~~~
wainstead
Really, any of the scripting languages.

My recollection is there was a lot of disdain for scripting languages because
they could not match the speed of C or other low level languages. Today,
computer speed is so many orders of magnitude faster it’s hard to believe the
speed of scripting languages was ever an issue.

John Ousterhout’s paper on programmer productivity gains comes to mind:

[https://web.stanford.edu/~ouster/cgi-
bin/papers/scripting.pd...](https://web.stanford.edu/~ouster/cgi-
bin/papers/scripting.pdf)

~~~
adrianN
Just a few days ago I wrote a simple awk script to parse some log files, but
it was horrendously slow. I had to replace understandable loops with weird
calls to builtin functions to make it fast enough for my use case.
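
For illustration, here is the kind of rewrite the parent is likely describing (a guess; the actual script and log format aren't shown). Replacing a per-character loop with a single builtin call lets awk's internals do the scanning:

```shell
# Count commas per line of a hypothetical log file access.log.

# Slow: an explicit loop calls substr() once per character.
awk '{ n = 0
       for (i = 1; i <= length($0); i++)
           if (substr($0, i, 1) == ",") n++
       print n }' access.log

# Fast: gsub() returns the number of substitutions it made, and
# replacing each comma with itself ("&") leaves the line intact.
awk '{ print gsub(/,/, "&") }' access.log
```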

~~~
dublin
You're doing something wrong. I've used awk to run big data reformatting jobs
in under an hour that took most of a day to run in Scala on an Apache Spark
cluster. In the vast majority of cases today, if speed is your problem, then
_you_ are the problem - especially since most problems fit into RAM these
days, even w/o exotic stuff like RAMcloud...

------
mjcohen
In my career, I wrote a number of multi-thousand line gawk programs. The
longest was a text formatter about 6,000 lines long. I have now been retired
for 9 years and still write occasional small gawk programs.

The good old days.

------
jasonpeacock
Years ago, I bought the O'Reilly sed & awk book with the intention of becoming
a Linux guru and mastering the command line.

Then I realized most of the awk/sed stuff looked very similar to Perl, which I
already knew, and I ended up just becoming very good at Perl one-liners.

------
snarfy
My first programming job was on a telephone exchange system that was 50k lines
of sed, awk, and korn shell.

~~~
sigzero
I'm sorry. ;)

~~~
snarfy
lol. It was a complete system with accounts, billing, 'tui' screens, all glued
together by scripts, cron jobs, and awk text processing. It definitely
embodied The Unix Way™ of everything is a file and small tools you can pipe
in/out of to build larger systems. They did.

~~~
mikeyjk
How did that function in terms of maintenance? Was it clear where bugs existed
or where new functionality needed to be added? I'm curious, as I've never
gotten to work on a system like that before.

~~~
snarfy
New features? They started rewriting it shortly after I was hired. Maintenance
issues were always fixed by the grey beards. One of them was blind and used a
talk box, a wyse terminal, and vi.

~~~
nickpeterson
That sounds like a man who could handle his abstractions.

------
thewhitetulip
A few months back I finally discovered the true power of the shell and began
writing an intro book:

[https://github.com/thewhitetulip/awk-anti-
textbook](https://github.com/thewhitetulip/awk-anti-textbook)

~~~
bla3
Your "Gitbooks" link links to a Go book, not to your awk book.

~~~
thewhitetulip
Oh yes. Apologies for that. I haven't created a Gitbook of the awk guide yet.
Will fix it ASAP.

------
markus_zhang
From a right-tool-for-the-job perspective, can any expert tell me when it's
best to use awk or sed if other tools are also available? I know awk and sed
are different tools, so maybe I'll ask a more general question: given that
other tools, e.g. Python, are available, what's the suitable scenario for
command-line tools?

~~~
jumpman500
Awk and sed are generally superior to Python for simple data transformations,
like cutting out specific lines or columns, or replacing characters in files.
They also perform very well on large files.

Once things get complicated though you should probably push whatever
transformations you need to do to a database.
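
A few hedged examples of the kind of simple transformations meant here (the file name, delimiter, and field positions are made up):

```shell
awk '{ print $2, $5 }' data.txt   # keep only columns 2 and 5
awk '$3 > 100' data.txt           # keep lines where field 3 exceeds 100
sed -n '10,20p' data.txt          # print only lines 10 through 20
sed 's/;/,/g' data.txt            # replace every ';' with ','
```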

~~~
manifoldgeo
I would like to offer a counter-argument. I’ve been torrenting TV series for
years, and I always have to do a mass rename of the files afterward to get
Plex to process them properly and download the metadata. They’re always named
something like “TV Show Name 2014.L337.H4XXX1080p-420.mkv”.

The other day I set out to use BASH to do the mass rename, and I had such
trouble backslash-escaping the dot characters and whitespace to prevent sed
from interpreting them specially. Eventually I gave up and searched for the
Pythonic way to rename them, and it was as simple as a string replacement
followed by a call to os.rename() inside a for loop. It was a breath of fresh
air to escape “command line kung fu” and stop fearing a thrashing from shell
globbing.

To be fair, I got my start using sed and awk as power tools in the BASH
command pipeline, but I don’t miss them compared to a language with strong
data types and simple built-in methods for handling complex manipulation.
Python is built in to basically every Linux distro that has BASH, and for the
sake of simple transformations, it offers a lot of succinct methods that work
on either 2.7 or 3.x with no external packages.

~~~
beojan
You need zmv

~~~
manifoldgeo
Thanks! I hadn't heard of zmv before, and it fits my use case perfectly.

------
andrewstuart
I've got a real soft spot for Awk but any time I want to get _anything_ done I
have to read the manual.

I know I like it, but I'd have to be working with it a lot more for the basics
to sink in.

And the thing is that in almost all cases Python or (ugh) bash can get the
same job done so I rarely pick up Awk.

~~~
mistahenry
I use awk very often at my current job when analyzing our logs. I've been
slowly compiling a list of useful shell pipelines, the majority of which
involve awk. Any time I need to do some novel analysis, I can steal the
syntax/ideas from myself.

Mawk on Debian is super fast and can compute statistics like messages/sec or
unique IP addresses for a particular user from 10GB log files very quickly.
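
As a sketch of those two statistics — the real log format isn't shown, so assume whitespace-separated lines of `<epoch-second> <ip> <user> <message>`, and a made-up file name and user:

```shell
# Messages per second: count lines sharing the same timestamp field.
awk '{ count[$1]++ } END { for (t in count) print t, count[t] }' app.log

# Unique IPs for one user: count each address only the first time it appears.
awk '$3 == "alice" && !seen[$2]++ { n++ } END { print n + 0 }' app.log
```

Both run in a single pass over the file, which is what keeps them fast on multi-GB inputs.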

------
timClicks
Awk could be a sleeping giant, because it's required as part of POSIX
compliance. With the dominance of Linux, though, I don't know if portability
is as large a concern as it might have been historically.

~~~
shakna
I still regularly need to write portable scripts, but it is niche, and not
nearly as easy as it should be.

Several of our "grunt work" servers run macOS. So you're either left with a
seriously outdated set of utilities, or need to write wrappers that first
attempt to use the g-prefixed GNU versions and then fall back to the standard
names.

The options I'm trying to use aren't GNU specific, and can be found across the
BSDs. But Apple is old, so you can't fully count on POSIX.

~~~
throw0101a
> _But Apple is old, so you can't fully count on POSIX._

Strange thing to say, since macOS is officially UNIX(r)-certified (which
includes POSIX conformance), and has been for many years:

* [https://www.opengroup.org/openbrand/register/brand3653.htm](https://www.opengroup.org/openbrand/register/brand3653.htm)

* [https://en.wikipedia.org/wiki/POSIX#POSIX-certified](https://en.wikipedia.org/wiki/POSIX#POSIX-certified)

~~~
shakna
However, if you check the compliance documents, you'll find macOS has a number
of waivers. For example [0].

[0]
[https://www.opengroup.org/csq/repository/noreferences=1&RID=...](https://www.opengroup.org/csq/repository/noreferences=1&RID=apple%252FCX1%25252F12.html)

------
PaulHoule
I like awk as a lisp replacement.

I have to look at the manual to remember how to write loops in bash, so I
write an awk script that writes a bash script and pipes to 'bash'. People will
tell you this is a bad idea because you get character escaping risks as with
SQL injection - they are right, but it is so much fun.

I have done this with three layers of code generation.
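
A minimal sketch of the trick (hypothetical input; the escaping caveat above applies — any shell metacharacter in the data would be executed):

```shell
# awk emits one bash command per input line; bash executes the stream.
printf 'alpha\nbeta\n' |
    awk '{ printf "echo processing %s\n", $1 }' |
    bash
```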

~~~
hnlmorg
I totally understand your pain about iteration in bash. That was one of the
main reasons I ended up writing my own shell and scripting language.

------
a3n
'89 or '90 I wrote a personal version control tool in awk, shell and rcs.
Because I didn't know better.

------
glofish
The problem with awk today is that it has very few features that would make it
superior to Perl, Ruby, or Python.

The only instance where awk has utility for me is when the program is short
enough to be explicitly specified at the command line, for example:

    
    
      $3 > 10 { print $6 - $5 }
    

For that it is awesome: you don't have to look inside another program to
figure out exactly what is happening, it is explicit, etc. It is also super
fast, much, much faster than splitting with a typical scripting language.
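
For instance, run against a hypothetical whitespace-separated file, the whole program sits visibly on the command line:

```shell
# Print field6 - field5 for every line whose third field exceeds 10.
awk '$3 > 10 { print $6 - $5 }' times.txt
```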

For anything more complicated than that, it offers very few benefits (I'd be
hard pressed to name any) and significant limitations.

Thus IMO the problem with awk is that it does too little and offers too little
room to grow.

~~~
Spivak
But you’re missing out on so much good stuff when you do this. Awk might not
replace Python once you start needing external dependencies but awk can
absolutely replace bash in a lot of situations.

~~~
pizza234
Bash and Awk are not the same class of languages, so they can't be compared
to/replace each other.

Bash is a glue language, while Awk is a scripting one, so it rather makes
sense to compare Awk with Python/Ruby/similar.

------
thrwaway69
I tried awk before (the default implementation on most distros) for the simple
task of making a template engine. I learnt a bit about awk and sed (basic
stuff), but I couldn't manage to do what I wanted, which I could do in Python
with a few lines in one minute.

The man pages are nice, but I didn't have the patience to read through
everything just to do simple stuff like replacing a regex pattern with the
contents of a file located at a path generated from a capture group of that
regex, and some other things.

~~~
flukus
> to do a simple task of making a template engine

Not quite sure what you mean but it does sound like awk was the wrong tool for
the job there. For the sort of templating I'm thinking of shell scripts or m4
would have been a better tool. Taking some structured data and piping it to
one of those is where awk shines (that and pattern matching).

~~~
thrwaway69
It was for making a static site generator. The templating engine was a part of
it, where I wanted to add functionality for components, generated event
handling, adding SEO (meta tags), gluing in shell code, etc., similar to
jsx/vue.

------
ggm
I have a small one-liner wrapped in shell that I use to get uniq-like
behavior, but emitting lines as first seen and never repeating them (a hash
table counting refs).
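
That one-liner is presumably a variant of the classic idiom (a guess at the exact script): the hash table counts references, and a line prints only on its first appearance, preserving input order with no sort:

```shell
# Print each distinct line the first time it is seen.
awk '!seen[$0]++' input.txt
```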

I routinely use awk when cut is obsessing about IFS. Awk does LWSP
compression, so you can get ' one two. three' to match properly when cut
thinks field 1 (oh god, code which counts from 1, not zero...) is a ' ' space;
awk '{print $1}' just works.
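
To make the contrast concrete (hypothetical input): cut treats every single space as a delimiter, so leading spaces and runs of blanks produce empty fields, while awk's default splitting skips leading whitespace and collapses the runs.

```shell
printf ' one two  three\n' | cut -d ' ' -f 1     # empty: field 1 sits before the leading space
printf ' one two  three\n' | awk '{ print $1 }'  # prints "one"
```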

I used awk to compile a list of unique IP addresses seen over GB inputs, 350m+
unique IPs. It was within scale, for both memory footprint and speed, of
Python and Perl hash constructs. Basically, Brian coded it efficiently; all
the Perl claims of maximal hash efficiency did not add much in terms of speed
OR size outcome.

I choose to code in Python 3, but I use awk for one-liners. It's great. I
avoid gawk-isms; I don't see the need.

~~~
jgehring
Agreed, it's so convenient for grabbing fields that I ended up writing a bash
script that generates an awk script, since '{print $1}' is cumbersome to type
and I can never remember how to properly output multiple fields.

At some point I thought I had the ideal use case for awk (a git --graph
filter) and spent an evening desperately putting it together because, as other
commenters mentioned, it's hard to find good documentation and examples
online. Sure, I have a fast and mostly-working filter now, but the code is
also hard to understand or even debug. On the other hand, the examples linked
in the article are actually a lot more readable than I expected, so maybe it's
something to consider for small but frequently-used log parsing scripts.

~~~
ggm
If I wound up on a host in /rescue mode, awk would be my go-to for fixing up
'convert this to that' changes, maybe even grobbling into the piped inputs of
other commands to get debug data marshalled up. If you have a bigger system,
there are better tools. If you have to live in the small state of a /rescue,
knowing how to use sed/ed/grep/awk is data-saving.

Sometimes people observe I'm using three tools with pipes to do one job, and I
freely admit grep <pat> file | awk '{print $2}' | sed -e 's/this/that/g' is
probably stupid, but I do think of these atoms as tools for the job. Grep
aside, sed and awk should be fully interchangeable for many pipe jobs, and
when you don't need BEGIN{} ... END{} you could do the whole thing in awk or
sed alone. If it has pre- and post- states, awk is ideal. But... the mind does
what the fingers remember.
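
For the record, that three-stage pipe can indeed collapse into one awk program (same hypothetical pattern and replacement as in the pipe above):

```shell
# grep 'pat' file | awk '{print $2}' | sed -e 's/this/that/g'
# is roughly equivalent to:
awk '/pat/ { gsub(/this/, "that", $2); print $2 }' file
```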

Pipes are cheap.

------
derjames
The AWK Programming Language. PDF ahead:
[https://ia802309.us.archive.org/25/items/pdfy-
MgN0H1joIoDVoI...](https://ia802309.us.archive.org/25/items/pdfy-
MgN0H1joIoDVoIC7/The_AWK_Programming_Language.pdf)

~~~
every
Heh. My 1988 copy is sitting on the shelf next to me...

------
downerending
It has a great history, but the only thing I use it for anymore is splitting
fields on whitespace, and that only because the 'cut' maintainers won't add
this (tiny) feature.

~~~
samatman
Fex is your new best friend!

[https://www.semicomplete.com/projects/fex/](https://www.semicomplete.com/projects/fex/)

~~~
lousyd
Ugh. Why braces ( { } )? That's the biggest tiny pain of using awk. It's an
odd stretch for my fingers to type them and I have to look down to be sure I
hit them right.

~~~
adrianN
Use one of the many tools for customizing your keyboard layout. {} are Alt+J
and Alt+K for me.

------
dublin
Awk is the ideal AWS Lambda language, and should be supported as a first-class
citizen there. Add the ability to tag such a function from a URL (w/o the API
gateway cruft), and AWK and netcat could replace tons of troublesome and
expensive dynamic data management and ETL code that winds up living in much
more complex and expensive environments today...

------
known
[https://rosettacode.org/wiki/Category:AWK](https://rosettacode.org/wiki/Category:AWK)
has versatile code snippets in AWK

