
Learn Just a Little Awk (2010) - ingve
https://gregable.com/2010/09/why-you-should-know-just-little-awk.html
======
Stratoscope
I don't write much Awk code any more, but at one time it was my favorite
programming language, and not just for little one-liners.

A while ago I ran across a couple of my old Awk programs and uploaded them to
GitHub. One is an implementation of Mark V. Shaney (the infamous Markov chain
program), the other takes source code and prints it out in a nice "two up"
landscape format on a LaserJet II, including graphical boxes to replace the
ASCII dividers in comments like this:

      //------------------
      // Some comment here
      //------------------

I wish I had a sample printout, but they are long gone.

Awk was the first language I used that had associative arrays built into the
language, which made SHANEY.AWK almost trivial to write. Especially compared
with implementations I saw in C and other languages of the day.

I think LJPII.AWK is a good example of how you can write a fairly substantial
program in Awk and have it be (mostly) readable. (It was written for either
MKS Awk or Thompson Awk, I forget which - so there are a few things in it that
may not be available in a typical Awk today.)

[https://github.com/geary/awk](https://github.com/geary/awk)

[https://en.wikipedia.org/wiki/Mark_V._Shaney](https://en.wikipedia.org/wiki/Mark_V._Shaney)

------
kylek
Awk is a fantastic "swiss army knife", and always included (in some form) on
every *nix.

The article mentions FS, but this is easier to set with the -F flag, e.g. for
some quick and dirty CSV munging:

    awk -F, '{print $2}'

Something else I find super useful is the built-in regex matching, e.g.:

    awk '/ERROR/{print $(NF-1)}'

or maybe picking out all the 503s from a web server access log:

    awk '/503/{print}'

edit: formatting....wow hn is bad for this

~~~
pizza234
For some time I've been using both Awk and Perl for text processing, based on
which offered the most convenient command for a given task.

There are two (IMO) glaring omissions in Awk:

- using the captured group(s) of a regular expression

- slicing the arguments array

Mawk supports the first functionality, but it's inconvenient.

Then at some point I found that virtually every task I do in my text
processing via Awk can be done via Perl in a comparable way... and much more -
including the two functionalities above - so I've ditched awk.

The only thing inconvenient is to type something like `perl -lane` as opposed
to `awk`, but this can be aliased :-)

~~~
Scarbutt
AWK can be learned much faster than Perl though.

~~~
webreac
Yes, but as soon as your awk script becomes complex, you need to switch to a
more complete language like perl (or python or ruby). Switch slightly earlier
and you avoid learning all the subtleties of awk (the same applies to sed).
You need to learn one scripting language well, and to know the most common use
cases of sed and awk.

~~~
Annatar
No you don't. There is no reason to switch. I've written many full-blown
programs in AWK and I see no reason to switch them to another language --
they're fast, portable and readable, with tiny memory requirements. It all
comes down to mastery: someone who's mastered AWK will have no issues with a
large AWK program and therefore will feel no urge to translate that program to
another language. Large AWK programs are easy to maintain because the syntax
is so small that the code is always intrinsically understandable.

------
asicsp
Reminds me of a similar article [1]

Once you get the hang of shell constructs like pipes, redirections, globs,
variable/command substitutions, etc., the various CLI text processing tools
like grep, sed, awk, perl, sort, tr, pr, paste, etc. are worth learning, at
least the basics. They have been through years of use and are heavily
optimized for performance. Just today, a friend of mine called to ask how to
improve a Python script's performance on a 5-10 MB text file. After confirming
he wasn't using any other special modules, I advised him to check whether the
performance improves by implementing it in awk.

If anyone's interested in examples based tutorials on cli text processing
tools, check out my github repo [2]

[1] [https://blog.jpalardy.com/posts/why-learn-
awk/](https://blog.jpalardy.com/posts/why-learn-awk/)

[2] [https://github.com/learnbyexample/Command-line-text-
processi...](https://github.com/learnbyexample/Command-line-text-processing)

~~~
msla
The GNU Project hosts this essay, "The UNIX Shell As a Fourth Generation
Language":

[https://www.gnu.org/software/nosql/4gl.txt](https://www.gnu.org/software/nosql/4gl.txt)

The concept of language "generations" is mostly lost now, much like how we
stopped inventing names for how densely we pack transistors onto silicon once
we'd gotten to VLSI (Very Large Scale Integration). The point of the essay is
that the shell, complete with a userspace, is a programming language in
itself, and one which is much more programmable and extensible than, say,
getting all of your data into a database and using SQL to manipulate it. It's
a programmable environment, as opposed to a data silo, where the only tools
you have are the ones the silo's implementer deigned to provide.

~~~
marble-drink
There is a short talk by Eben Moglen [0] which is only 10 minutes long but jam
packed with wisdom. One part which changed the way I think about using
computers is how he describes GUI interfaces: as a "point and grunt" or
"caveman" interface. Using UNIX, on the other hand, is to use language to
communicate with the computer and comes with all the benefits that we have
when using it to interact with people.

[0] [https://youtu.be/uKxzK9xtSXM](https://youtu.be/uKxzK9xtSXM)

------
ssivark
Thinking over this, it just (finally) struck me that awk (and sed, grep, etc.)
are basically just _libraries_ (more like DSLs), and the shell is _just
another scripting language_ with a REPL/notebook environment.

When that's the case, what makes them particularly useful compared to some
other language with a uniform syntax and possibly better forms of "glue" --
say Python, or lisp, or whatever? Why not allow people to acquire proficiency
in just one language, and use it consistently to communicate with their
computer? I don't think ubiquity/portability is a compelling answer, since
(for many/most people who care about this) it is relatively easy to install
the environment of their choice on their personal computing device. BTW, we
must also strive to maintain that flexibility in the next generation of
computing devices.

As an aside, folks who do like the task-focused nature of shell DSLs might
appreciate language-oriented programming with Racket.

~~~
mistahenry
Command line tools are very terse, heavily optimized (i.e., fast),
interactive, and easily chainable. I would never want to write a Python
program for quick analysis of a log file when I can grep/sed/awk/sort/uniq my
way to a solution.

The other day my boss asked me to find out whether there was any second in the
last 90 days where we processed more than 100 transactions a second on our
server. I needed to go through hundreds of GB of data, which mawk managed in
no time with a relatively simple one-liner. I imagine my idiomatic Python
script would still be running.
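The commenter's actual one-liner isn't shown, but the counting idea can be
sketched like this, assuming the log's first field is a per-second timestamp
(the threshold is lowered to 2 so the demo input stays small):

```shell
# Count lines per first-field timestamp, then report any second whose
# count exceeds the limit (100 in the real case, 2 for this demo input).
printf '%s\n' '12:00:01 txn' '12:00:01 txn' '12:00:01 txn' '12:00:02 txn' |
  awk -v limit=2 '{ n[$1]++ } END { for (t in n) if (n[t] > limit) print t, n[t] }'
# prints: 12:00:01 3
```

The whole pass is a single associative-array increment per line, which is why
mawk can chew through hundreds of GB quickly.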

Oftentimes, analysis is an evolving thing. As you uncover one answer, new
questions arise. It's far easier, IMO, to tweak my command line pipelines than
to modify program flow.

At the end of the day, general purpose programming languages are jacks of all
trades and masters of none. They're inherently held back by their flexibility
when compared to DSLs. Text processing is so easy in awk because it's a text
processing domain specific language. If you really think we should use one
language for everything, you're ignoring the value that specialization can
provide, whether in speed, power, or ease of use.

~~~
umvi
> I would never want to write a python program for quick analysis of a log
> file when I can grep/sed/awk/sort/uniq my way to a solution.

Yeah but I would wager very few people actually know awk. I would also wager
90+% of awk programs are pasted directly from StackOverflow.

That's why I like the idea of pawk[1] - it's like awk, except with python
syntax instead of some esoteric syntax that nobody knows.

[1] [https://github.com/alecthomas/pawk](https://github.com/alecthomas/pawk)

~~~
zwkrt
I guarantee that more people know awk syntax than pawk!

------
ineedasername
I've found the combination of sed & awk to be a really powerful duo.

One of my earliest development projects, circa 2002, was to create a complex
awk & sed program that took tons of database blobs with inconsistent semi-
structured text from a publishing company and converted it to XML,
automatically prompting the user for input (using the Pico text editor)
whenever there was something that couldn't be resolved. It saved thousands of
hours as part of a metadata digitization project that got their journals into
ProQuest.

As only a semi-literate programmer at the time, I found it significantly
easier than trying the same thing with a "traditional" programming language.
On occasion I still call out to awk within Python code when I need to do a
one-off quick & dirty <something>.

For those interested, if you can find a copy, there's a truly excellent book
on awk:

_"The AWK Programming Language"_, by Alfred Aho, Brian Kernighan, and Peter
J. Weinberger

~~~
yesenadam
A copy:

[http://gen.lib.rus.ec/book/index.php?md5=0AD0779BF3602F1EA22...](http://gen.lib.rus.ec/book/index.php?md5=0AD0779BF3602F1EA22734B6EEFCA0EA)

Also Arnold Robbins' books _Sed and Awk_ and _Effective Awk Programming_ are
excellent, and also detail the many features added since the latest edition of
_The AWK Programming Language_ (1988).

[http://gen.lib.rus.ec/search.php?&req=awk+robbins&sort=year&...](http://gen.lib.rus.ec/search.php?&req=awk+robbins&sort=year&sortmode=DESC)

~~~
rapfaria
Buy the books, do not pirate them.

------
forinti
I use a lot of Perl and occasionally some awk. My experience is that Perl is a
lot faster (it really shows when you have millions of lines of text to chew).

For example, I compared two scripts that find the longest line in a file:
Perl took about a second while awk took 21s.

~~~
rukuu001
Same. Also, many tasks grew in complexity until I threw away my awk/sed combos
and wrote some Perl instead.

------
salgernon
I used to struggle with dtrace until I realized it was just awk with probes
instead of regular expressions.

------
kuharich
Prior discussions:

[https://news.ycombinator.com/item?id=1738688](https://news.ycombinator.com/item?id=1738688)

[https://news.ycombinator.com/item?id=2932450](https://news.ycombinator.com/item?id=2932450)

------
norswap
I wonder. Unless you have this syntax (and its semantics) memorized so that it
just flies from your fingers, it seems much easier to me to use whatever your
language of choice is (Python, Java, ...) to whip up a script. Sure, awk will
be terser, but in my case at least the bottleneck will not be typing, it will
be looking up awk features and syntax.

Maybe if you do this every day though. I only have need for things like this
once every few months.

------
splittingTimes
I love awk, grep and sed.

But I am wondering: in a professional setup, are there better tools out there
to analyse and interpret structured log messages?

------
thewhitetulip
I recently learnt awk but had a hard time figuring things out (I had been
trying to learn it for six years).

So I wrote a small guide which is hands on for awk

[https://github.com/thewhitetulip/awk-anti-
textbook](https://github.com/thewhitetulip/awk-anti-textbook)

------
xvilka
Awk and sed could provide better integration with "modern" formats like
JSON/YAML/Protobuf/etc., and with the corresponding tools like jq/pq/rq. That
would help keep them popular.

------
llacb47
403 Forbidden

------
j2kun
Why is this linking to the wayback machine? This blog post is live on the same
website at the same url.

[https://gregable.com/2010/09/why-you-should-know-just-
little...](https://gregable.com/2010/09/why-you-should-know-just-little-
awk.html)

~~~
grzm
Some people link to archive.org to future-proof against dead urls.

~~~
why-oh-why
That's kinda backwards. One should point to the source, and if that is no
longer available, then fire up the Machine.

In this case the archive.org link is a 403 for me while the original works.

~~~
kazinator
I suspect you misunderstand part of the point.

When you make a posting on HN or anywhere else and cite a URL, that URL could
go down years later. Your historic posting then has a dangling URL. You can't
go back and edit old postings to fix it.

Also, some content changes. The URL is valid, but the cited material is not
there. With the archive, you can point to a specific date when that was
retrieved.

~~~
cbaleanu
I get a 403 as well; there's no point in linking to something inaccessible
just because. GDPR is hard, but people in Europe should not be denied access
just for the sake of immutable URIs.

~~~
johnisgood
Wait what? [https://web.archive.org](https://web.archive.org) is inaccessible
to you because you are European? FWIW: I am European, and it works for me.

------
jadia
I think the traffic is too much. 403 Forbidden.

~~~
gregable
Weird. The site is served by cloudflare and backed by an s3 bucket. Totally
static. You shouldn't get forbidden unless cloudflare or your isp are doing
something odd.

~~~
gregable
Oh nevermind, the original url was archive.org apparently.

------
jpxw
A lot of the examples are just reimplementing cut.

------
tempguy9999
I have, and read, a book on awk (the AWK programming language, by guess-the-
initials) and didn't see the point. If I need regexes I have emacs. For CSVs I
use excel or LibreOffice.

It has been a while, so could someone remind me why awk is so fab? Perhaps
I've just never met the right use case to make me fall in love.

Sorry if it's a dumb question. TIA

~~~
the_jeremy
Excel can't handle millions of lines. At every job I've had (only 2, but
still), I've had a problem at least once where I was given millions of lines
and needed to parse them for something.

That being said, I used python in those instances, but I'm willing to believe
I could've done it faster and more easily with awk.

~~~
tempguy9999
Yes, this was my feeling too: why not just use python and save the effort of
learning another language? Perhaps because you can't assume python will be
present in a basic *nix install.

