I recently published my ebook on GNU awk one-liners [1]. It starts from the basics of awk syntax and then discusses one-liner examples. There's a chapter on regular expressions as well. The GitHub repo has details on how to get the PDF version, all the example files and code snippets used in the book, sample chapters, as well as the markdown source used to generate the PDF.
I made all my ebooks [2] free last month amidst the pandemic fears. These include GNU grep & ripgrep, GNU sed and three books on regular expressions (Python, Ruby, JavaScript).
I'd appreciate your feedback and hope the books are useful. Happy learning :)
I always like it when classic lean-and-mean Unix tools get attention, as opposed to big language ecosystems. Speaking of which, is there a particular reason why only gawk is covered? gawk has only very minor enhancements over POSIX awk, and gawk isn't even the default in many places. For example, Debian uses (or used to use) mawk as the default, while the BSDs and Mac OS have nawk. I think the point of awk is that it's portable, and introducing gawkisms in your program not only makes it non-portable, but also makes it impossible to run on mawk, which is much, much faster for, e.g., log file analysis. That might not matter all that much for one-liners, though.
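As an illustration of the portable style described here, the sketch below tallies log lines by their first field (for example a client address). It assumes nothing beyond POSIX awk: associative arrays, `for (k in n)` and `print`. The function name is hypothetical, and the trailing `sort` is there because the iteration order of `for (k in n)` is unspecified.

```shell
# count_by_first_field (hypothetical helper): tally input lines by their first
# field. Only POSIX awk features are used, so gawk, mawk and nawk should all
# run it; sort makes the output order stable, since for-in order is unspecified.
count_by_first_field() {
  awk '{ n[$1]++ } END { for (k in n) print k, n[k] }' "$@" | sort
}
```

Usage would be something like `count_by_first_field access.log`, where `access.log` stands in for whatever log file you have.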
Agreed about the portability issues. I cover only gawk because I don't know all the differences between the various implementations. This started as a chapter in my command line text processing repo, where I cover various tools. I had come across various posts on stackoverflow/unix.stackexchange about implementation differences. I use Ubuntu, so I chose to stick to GNU/Linux to make my life simpler.
I'm not sure about your point that gawk has "only very minor enhancements". When I posted about my book on Reddit, I got this comment [1] noting feature differences.
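One concrete example of such a difference: gawk's `match()` accepts an optional third array argument that receives the matched text, while POSIX awk only sets `RSTART` and `RLENGTH`. The sketch below (the function name is hypothetical) uses the POSIX form, so it should also run on mawk and nawk.

```shell
# first_number (hypothetical helper): print the first run of digits on each
# line that contains one. In gawk alone you could write match($0, /[0-9]+/, m)
# and print m[0]; the RSTART/RLENGTH form below is the portable POSIX one.
first_number() {
  awk 'match($0, /[0-9]+/) { print substr($0, RSTART, RLENGTH) }' "$@"
}
```

Lines without any digits produce no output, because `match()` returns 0 for them.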
while echo -n '] ' ; do read a; awk 'BEGIN{print '"$a"'}' ; done
It’s a calculator; type in something like 2 + 2 and it will give you 4. Since standard AWK has math functions like log, sqrt and sin, it’s a full-blown scientific calculator.
The only tricky part is that you exit with Ctrl + C, not Ctrl + D: the loop condition is the echo, which always succeeds, so reaching end of input doesn't stop the loop.
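A minimal variant of the calculator loop, assuming a POSIX shell, moves `read` into the loop condition so that Ctrl + D ends it as well. It also uses `printf` instead of `echo -n`, whose behaviour varies between shells. The function name `calc` is just an illustrative choice. Each input line is still spliced into an awk program, so whatever you type is executed as awk code.

```shell
# calc (hypothetical name): evaluate each input line as an awk expression.
# read sits in the loop condition, so end of input (Ctrl + D) stops the loop.
calc() {
  while printf '] ' && IFS= read -r expr; do
    # A BEGIN-only program never reads stdin; /dev/null makes that explicit.
    awk "BEGIN { print $expr }" </dev/null
  done
  echo   # move to a fresh line after Ctrl + D
}
```

Typing `2 + 2` still prints 4, and `log(1)` prints 0.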
gawk is pretty much everywhere that awk sees substantial use, and it offers rather great enhancements over POSIX awk [1]. I usually don't think GNU tools make much sense to focus on, but gawk is, generally speaking, worth focusing on, especially given that it runs almost everywhere under the sun.
“It runs everywhere” (gawk) is very different from “it’s installed out of box everywhere” (awk). The latter can be used in provisioning scripts, shell script libraries/modules, etc. where the former usually can’t.
(Not saying gawk isn’t significantly better than awk for some tasks.)
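For provisioning scripts, one hedged middle ground is to prefer gawk when it happens to be installed and fall back to the preinstalled awk otherwise. The function and variable names below are illustrative. Because the fallback may be mawk, nawk or another awk, the programs run through `$AWK` should still stick to POSIX features.

```shell
# pick_awk (hypothetical helper): print the path of gawk if it is on PATH,
# otherwise the path of the system awk.
pick_awk() {
  command -v gawk 2>/dev/null || command -v awk
}

AWK=$(pick_awk)
# Keep the program portable: the fallback is not guaranteed to be gawk.
printf 'a b c\n' | "$AWK" '{ print NF }'
```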
> It is very rare that you see a system that awk is frequently used on that doesn't have GNU awk installed.
Wait, what? Ubuntu server and macOS both have awk preinstalled (as does any POSIX system) but not gawk. Those are the systems I (like many other developers) spend 95% of my time on, and I do use awk frequently.
Edit: Okay, /usr/bin/awk is actually gawk on Ubuntu, so I was wrong about that. Still, macOS isn't very rare.
How many OS X users do you think are actively using awk in any of its variants? What percentage of total awk usage is on OS X, do you think? OS X may not be rare, but it seems highly unlikely that many OS X systems see frequent usage of awk.
To be fair, /usr/bin/awk was mawk for a long time on Ubuntu. I remember getting into a heated discussion with GAWK’s maintainer, arguing that he should provide a way for [A-Z] never to match lowercase letters (it does in some locales), so that we wouldn't have to use constructs like [[:upper:]] and [[:lower:]], which do not work in mawk.
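A common workaround for the range problem, offered here as one option rather than the fix the commenter asked for, is to force the C locale for the awk invocation. In that locale a bracket range like `[A-Z]` covers only the bytes A through Z, so it cannot match lowercase letters, and the script doesn't need `[[:upper:]]`. The function name is hypothetical.

```shell
# upper_only (hypothetical helper): print lines made up solely of ASCII
# uppercase letters. LC_ALL=C turns [A-Z] into a plain byte range (A..Z),
# whatever locale the user normally runs in.
upper_only() {
  LC_ALL=C awk '/^[A-Z]+$/' "$@"
}
```

Note that the C locale also changes how non-ASCII text is handled, so this suits ASCII-oriented data best.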
The claim under discussion in this deep subthread is "it is very rare that you see a system that awk is frequently used on that doesn't have GNU awk installed", so this is irrelevant.
I don't see Unix tools as lean and mean, but as brittle and slow. It's never just a "one liner", you end up piping text data through dozens of programs with inconsistent unstructured interfaces.
I don't understand the reverence for these tools. I think it's a bunch of junk that somehow can't get replaced because it's not quite bad enough and it's "already installed" everywhere.
I tried making it scalable, but unfortunately, the server sockets in gawk don't set SO_REUSEPORT. So, I can't fork usable children. It does work if you use LD_PRELOAD tricks, or edit the gawk binary to change SO_REUSEADDR to SO_REUSEPORT, but both are pretty hacky.
If gawk would separate the listen() and accept() calls out, you could do a lot more with their server socket code.
I want a Perl one. I never saw the need to do AWK because I knew Perl fairly well.
That said, youngsters should learn some of these old-school tools. Python is a nice language, but its regexes are crap compared to Perl's. I always need to look up the documentation, whereas Perl's are built in, clean and concise.
my ($key, $value) = $string =~ /(\w+)=(\w+)/;   # the capture groups land directly in the list
I remember that, despite not having touched Perl in a while. I miss it.
I have yet to learn Perl, but I've often found it not preinstalled on systems, and you sometimes also need CPAN to be able to run scripts. awk might not be as powerful, but at least you know it's small, self-contained, and likely to be available in some form on most systems. That's part of its value, I think... likely a consequence of not trying to do as much.
Someone else already asked for Perl one-liners in this thread. I started my command line text processing repo [1] about three years back. It covers grep/sed/awk/perl/ruby one-liners along with many other tools. I may convert the Perl one-liners to a book as well later.
Python's built-in 're' module does indeed lack many features, but there's the third-party 'regex' module, which Perl users may find easier to adapt to.
There is an available replacement for Python's standard re library, regex[1], which adds a long list of features and enhancements. It is too little known IMO, despite having existed and been continually maintained and enhanced for nearly a decade.
Yeah, these are troubling times for sure. I was about halfway done with the book when things became serious in my country in early March. I did think about delaying the release, but then made the opposite decision. I made all my books free, released markdown sources for them and then published this book early (cutting down on some topics, no exercises yet, etc.).
The thing about cookbooks is that you rarely find something that matches what you need in the moment - you really need to learn and use these tools regularly and then you'll be prepared without documentation.
Some others have asked for an epub version too. I have this article [1] bookmarked, so if it goes well, I'll add epub as well this month. If you know pandoc, you could use the markdown source in the repo to generate a basic version and see if it works with your reader.
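For anyone trying the pandoc route, a minimal sketch follows. The function name and file names are hypothetical placeholders, not the repo's actual layout. `--toc` asks pandoc for a table of contents, `--metadata title=...` sets the book title, and pandoc picks the EPUB writer from the `.epub` extension of the output file. Expect a basic result without the PDF's styling.

```shell
# md_to_epub (hypothetical helper): build a basic EPUB from markdown files.
# Usage: md_to_epub OUT.epub "Book Title" chapter1.md [chapter2.md ...]
md_to_epub() {
  out=$1
  title=$2
  shift 2
  pandoc --toc --metadata title="$title" -o "$out" "$@"
}
```

For example, `md_to_epub gnu_awk.epub "GNU awk one-liners" gnu_awk.md`, with `gnu_awk.md` standing in for the actual source file(s).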