In a previous discussion about awk, the user comex made a wish that I strongly desire as well:
"I wish Awk had capture groups. It would fit in so well with typical Awk one-liners to be able to say:
awk '/foo=([0-9]+)/ { print $1 }'
although I suppose the syntax would have to be different since $1 has a meaning already."
I use Awk all the time, and the new additions in the article are pretty nice; but for typical uses of Awk, the feature comex wants would make a tremendous difference in usability.
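For what it's worth, you can fake a single capture group in POSIX awk today with match() plus substr(), though it's clunkier than what comex is asking for (GNU awk's three-argument match() does have real capture groups, but that's not POSIX). The input line here is made up:

```shell
# match() sets RSTART/RLENGTH for the whole match;
# skip past the literal "foo=" prefix (4 characters) to get just the digits
echo 'x foo=123 y' | awk 'match($0, /foo=[0-9]+/) { print substr($0, RSTART + 4, RLENGTH - 4) }'
# prints 123
```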
Of course, I understand capture groups are most useful in awk when you're already using awk for other reasons; ripgrep can only handle the very simplistic cases. But those simplistic cases tend to be quite common.
Anything too big to be coded in AWK should be done in Perl. Nothing bad about that. Perl is almost a sed/awk/sh replacement for when shell-script-based code gets clunky.
Why? Perl has cryptic syntax, while AWK has syntax that is easily understood by anyone with experience in common programming languages like C, C++, or Java.
If you find an unknown syntax construct in Perl code, it is hard to identify what it is, but if you find an unknown function in AWK, you can just google its name.
Also, AWK is mandated by POSIX, so one can assume it is installed, while Perl is optional.
IMHO the nice thing about sed&awk is that a quick cheatsheet that covers most usage fits nicely on one page. Perl, in contrast, is more a 'proper' language and the Camel book is a pretty chunky tome. If you're going through all the trouble of learning a language in that space, you might as well go with something like python. Though admittedly python isn't a very good sed/awk replacement.
Who said you had to learn the entirety of Perl? I've been using Perl as an awk replacement for close to five years now, and I barely know a lick of Perl outside of what I need to use it that way. Used like that, it's a much smaller language.
(That said, I do still have the tabs of perldata, perlobj, perlmod, and perlop open because I want to learn it better.)
It actually does (to an extent). There are BEGIN and END blocks (not used that much in regular scripting but they exist). There are also a bunch of perl command flags that can make things more awk-ish or sed-ish.
This random blog post gives something of the flavor.
There's one fairly obvious error in that article: the fields are $F[0] etc, not @F[0]. Otherwise yes, that's a good way to make Perl look like awk. I personally translate awk concepts to more idiomatic Perl (I had no idea about -MEnglish, for example) but the concepts are still the same.
A capture group is not "too big"; even sed has them. It's a reasonable feature request for a small tool.
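Indeed; here's a sketch of the sed equivalent of the wished-for awk one-liner, with made-up input:

```shell
# \(...\) captures, \1 substitutes it back;
# -n plus the p flag prints only lines where the substitution matched
printf 'x foo=123 y\nno match here\n' | sed -n 's/.*foo=\([0-9][0-9]*\).*/\1/p'
# prints 123
```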
sh, awk and sed are fine. They are easy, small, powerful tools that are easy to compile and understand.
Perl, python, nushell, etc. The options listed here are great if you're writing cute snippets on the terminal or hacking together some higher level automation.
These more elaborate tools, however, are terrible if you're trying to be lean in the build/bootstrap process and have a small set of auditable, easy-to-compile tools.
Speaking of sed/awk/sh, in this case precomposing the awk with a sed preprocessor (in a sh pipeline) will give you both data structures (awk) and capture groups (sed).
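A minimal sketch of that precomposition, with made-up input: sed does the capturing, awk does the aggregation.

```shell
# sed emits only the captured digits; awk then treats each as field $1 and sums
printf 'a foo=40 b\nfoo=2\nnoise\n' \
  | sed -n 's/.*foo=\([0-9][0-9]*\).*/\1/p' \
  | awk '{ sum += $1 } END { print sum }'
# prints 42
```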
Maybe, but what comex and I are talking about isn't too big to be coded in Awk; it is an extremely common pattern that Awk just makes more difficult than it needs to be.
22k stars is not really that obscure, not that it matters. I didn't realize we were forever limiting ourselves to tools written in the '70s and '80s; my apologies.
Would you consider trying something other than Perl once it is no longer packaged for you? Because it is an ex-language, bereft of life.
Use the tools you like. It's nobody's place to tell you what tools to use, and the reverse of that is also true.
For a lot of us, we want to develop scripts (and skills) that are portable across different environments. There are limited hours in a workday, and I get the most value out of learning (and using) tools that I can find on the servers / build-machines / workstations that I have to use every day. Those machines run Ubuntu, Debian, Rocky, and RHEL.
Yes, it's a slow process to get new packages into mainstream distros. That's not a bad thing, because those packages have to be maintained for a very long time. Stability is a virtue here.
There are some ecosystems (I'm looking at you, Javascript) where anything older than a year might as well be abandonware. It's great that there are some fast-moving areas in our industry, and also great that there are slow-moving areas.
Don't make the mistake of assuming something is bad just because it's feature-complete. You might be surprised at how feature-rich something like Awk really is.
If your argument boils down to "Awk and Bash are ugly and outdated", I'd encourage you to think more flexibly about the tools you choose. There's nothing wrong with learning the basics of a widespread tool that you are guaranteed to find anywhere.
Perl is not bad because it is feature complete. Perl might even be good, except that it is dead.
> if your argument boils down to "Awk and Bash are ugly and outdated"
Not at all. However, Awk was written almost 50 years ago. The awk book is good and it is very true that the thing is installed most everywhere so if you're already invested in it and it does everything you want, keep using it of course. But it just might be possible to improve on a tool after half a century.
I simply shared new tools that, if you like awk, might really be up your alley. They might not be packaged by your favorite package manager anytime soon, but you can grab them with cargo; if that's not portable enough, so be it. No harm, no foul.
i like awk, but i hate regexps (YMMV). for a different perspective, check out SNOBOL or its speedier variant SPITBOL. although not one-liners, i find its patterns easier to compose, like functions.
&ANCHOR = 0
digits = '0123456789'
patt = 'foo=' SPAN(digits) . num
while line = INPUT
    if line ? patt
        OUTPUT = num
    endif
end
NOTE1: reading the variable INPUT reads a line from input; assigning to OUTPUT writes a line to output. They are normally attached to STDIN and STDOUT, and can be reconfigured.
NOTE2: there is no real while or if; flow is through labels and jumps (considered by some to be power, but i disagree). the example above uses another script to transform while and if into labels and jumps. the point is pattern composition and assignment on a successful match.
AWK is a great tool, but I found it lacking when working with CSV files where the separator appears inside quoted fields [0]. In my case, European number formatting was the problem, tripping up SQLite's import.
I resorted to pandas, where the CSV import has parameters for the thousands and decimal separators.
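For reference, a sketch of those read_csv parameters (thousands and decimal are real pandas options; the data here is made up):

```shell
python3 - <<'EOF'
import io

import pandas as pd

# European formatting: "." as thousands separator, "," as decimal point
data = io.StringIO('name;amount\nwidget;1.234,56\n')
df = pd.read_csv(data, sep=';', thousands='.', decimal=',')
print(df['amount'][0])  # 1234.56
EOF
```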
The frawk implementation (not quite POSIX, but extremely fast) and my GoAWK implementation both have an "-i csv" option that puts it in "input mode CSV". More here: https://benhoyt.com/writings/goawk-csv/
$ cat quoted.csv
"Smith, Bob",42
$ awk -F, '{ print $1 }' quoted.csv
"Smith # you want to print the first field: Smith, Bob
$ goawk -i csv '{ print $1 }' quoted.csv
Smith, Bob # that's better!
I always enjoy seeing what a good awk & sed user can achieve in bash.
However, one of the not-so-good devs I worked with used awk to load a large, deeply hierarchical JSON file. They refused to use a library to parse the JSON, and it became a many-hundreds-of-lines monstrosity.
Luckily, when they left we were able to parse it with jq instead.
If you want to see a real awk expert, ask ChatGPT to write a script that lets you query columns foo and bar across 100 CSV files, where the column header may be anywhere in columns 1 through 100 and the header row may start anywhere between lines 1 and 20. All my dirty Excel data can be handled so easily now.