The State of the Awk (2020) (lwn.net)
105 points by xrayarx on Jan 19, 2023 | hide | past | favorite | 43 comments



In a previous discussion about awk, the user comex made a wish that I strongly desire as well:

"I wish Awk had capture groups. It would fit in so well with typical Awk one-liners to be able to say:

    awk '/foo=([0-9]+)/ { print $1 }'
although I suppose the syntax would have to be different since $1 has a meaning already."

I use Awk all the time, and the new additions in the article are pretty nice; but for typical uses of Awk, the feature comex wants would make a tremendous difference in usability.


  awk 'match($0, /foo=([0-9]+)/, g) { print g[1] }'
works in gawk (whose extended match() accepts a third parameter, an array that receives the capture groups).
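For example (assuming gawk is installed):

```shell
# gawk's three-argument match() stores the whole match in g[0]
# and each capture group in g[1], g[2], ...
echo 'quux=123 foo=456 bar=789' | gawk 'match($0, /foo=([0-9]+)/, g) { print g[1] }'
# → 456
```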


ripgrep can do it! :)

    rg 'foo=([0-9]+)' --replace '$1'
or more succinctly:

    rg 'foo=([0-9]+)' -r '$1'
Example:

    $ echo 'quux=123 foo=123 bar=123' | rg 'foo=([0-9]+)' -r '$1' 
    quux=123 123 bar=123
Named groups work too:

    $ echo 'quux=123 foo=123 bar=123' | rg 'foo=(?P<digits>[0-9]+)' -or '$digits'
    123
You can also print only the replaced match, rather than the whole line, by combining --replace with --only-matching:

    $ echo 'quux=123 foo=123 bar=123' | rg 'foo=([0-9]+)' -or '$1'
    123
Of course, I understand that capture groups are most useful in awk when you're already writing awk; ripgrep only handles the simplest cases. But those cases tend to be quite common.


I've needed these capabilities often while using awk for converting messy logs/error outputs into tables/commands.

Nowadays I like the nushell approach to composition:

    echo 'quux=123 foo=123 bar=123' | str replace '.*quux=([0-9]+).*foo=([0-9]+).*' $"$2,$1" | from csv -n | each {|r| $r.column1 + $r.column2}
which of course relies on the same regex library as ripgrep (hat tip).


It can't be comfortably used as part of a larger Awk script.


Yes, as I acknowledged toward the end of my comment.


Anything too big to code in AWK should be done in Perl. Nothing wrong with that: Perl is almost a sed/awk/sh replacement for when shell-based scripting gets clunky.


Why? Perl has cryptic syntax, while AWK's syntax is easily understood by anyone with experience in common programming languages like C, C++, or Java.

If you find an unknown syntax construct in Perl code, it is hard to identify what it is; but if you find an unknown function in AWK, you can just google its name.

Also, AWK is mandated by POSIX, so one can assume it is installed, while Perl is optional.


Perl and awk are on the same level of cryptic syntax IMO, unless Perl is abused (fully utilized).


> Perl has cryptic syntax,

perldoc perlintro. Have you seen real-life Perl outside of one-liners?


IMHO the nice thing about sed & awk is that a quick cheat sheet covering most usage fits nicely on one page. Perl, in contrast, is more of a 'proper' language, and the Camel book is a pretty chunky tome. If you're going to all the trouble of learning a language in that space, you might as well go with something like Python. Though, admittedly, Python isn't a very good sed/awk replacement.


Who said you had to learn the entirety of Perl? I've been using Perl as an awk replacement for close to five years, and I barely know a lick of Perl outside of what I need to use it as awk. If you use it that way, it's a much smaller language.

(That said, I do still have the tabs of perldata, perlobj, perlmod, and perlop open because I want to learn it better.)


What does this look like in practice? Does Perl have a similar control flow syntax to awk?


It actually does (to an extent). There are BEGIN and END blocks (not used that much in regular scripting, but they exist). There are also a number of Perl command-line flags that can make things more awk-ish or sed-ish.

This random blog post gives something of the flavor.

https://lifecs.likai.org/2008/10/using-perl-like-awk-and-sed...


There's one fairly obvious error in that article: the fields are $F[0] etc, not @F[0]. Otherwise yes, that's a good way to make Perl look like awk. I personally translate awk concepts to more idiomatic Perl (I had no idea about -MEnglish, for example) but the concepts are still the same.


You're right, but it will still work as written.


A capture group is not "too big", sed has them. It's a reasonable feature request for a small tool.
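For reference, a sed capture group in action (-E enables extended regular expressions):

```shell
# \1 in the replacement refers to the first parenthesized group
echo 'quux=1 foo=123 bar=9' | sed -E 's/.*foo=([0-9]+).*/\1/'
# → 123
```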

sh, awk and sed are fine. They are easy, small, powerful tools that are easy to compile and understand.

Perl, Python, nushell, etc.: the options listed here are great if you're writing cute snippets at the terminal or hacking together some higher-level automation.

These more elaborate tools, however, are terrible if you're trying to be lean in the build/bootstrap process and have a small set of auditable, easy-to-compile tools.

The graph on this page illustrates the bootstrapping problem well: https://bootstrappable.org/projects/mes.html

These small UNIX tools, 50+ years on, are still around for a reason: they are small and have zero build dependencies.


> awk '/foo=([0-9]+)/ { print $1 }'

The equivalent Perl isn't that far off from the above:

    perl -nE '/foo=([0-9]+)/ && say $1'


I don't regularly use either language, but I prefer to use Ruby for this: https://robm.me.uk/2013/11/ruby-enp/

Example from that article

    $ echo "foo\nbar\nbaz" | ruby -ne 'BEGIN { i = 1 }; puts "#{i} #{$_}"; i += 1'
output

    1 foo
    2 bar
    3 baz


Speaking of sed/awk/sh, in this case precomposing the awk with a sed preprocessor (in a sh pipeline) will give you both data structures (awk) and capture groups (sed).
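A sketch of that composition: sed does the capturing and rewrites the line, then awk gets the result as an ordinary field:

```shell
# sed -E extracts the captured number; awk then treats it as $1
echo 'quux=1 foo=21 bar=3' | sed -E 's/.*foo=([0-9]+).*/\1/' | awk '{ print $1 * 2 }'
# → 42
```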


Maybe, but what comex and I are talking about isn't too big to be coded in Awk; it's an extremely common pattern that Awk just makes more difficult than it needs to be.


Absolutely nothing should be done with Perl, it is 2023.

If you're jonesing for a sed replacement (it even has capture groups) try: https://github.com/chmln/sd

Worth being aware of:

https://github.com/theryangeary/choose

https://github.com/nushell/nushell


So obscure tools that are not packaged even in Debian?


22k stars is not really that obscure, not that it matters. I didn't realize we were forever limiting ourselves to tools written in the '70s and '80s; my apologies.

Would you consider trying something other than Perl once it is no longer packaged for you? Because it is an ex-language, bereft of life.


Use the tools you like. It's nobody's place to tell you what tools to use, and the reverse of that is also true.

For a lot of us, we want to develop scripts (and skills) that are portable across different environments. There are limited hours in a workday, and I get the most value out of learning (and using) tools that I can find on the servers / build-machines / workstations that I have to use every day. Those machines run Ubuntu, Debian, Rocky, and RHEL.

Yes, it's a slow process to get new packages into mainstream distros. That's not a bad thing, because those packages have to be maintained for a very long time. Stability is a virtue here.

There are some ecosystems (I'm looking at you, Javascript) where anything older than a year might as well be abandonware. It's great that there are some fast-moving areas in our industry, and also great that there are slow-moving areas.

Don't make the mistake of assuming something is bad just because it's feature-complete. You might be surprised at how feature-rich something like Awk really is.

If your argument boils down to "Awk and Bash are ugly and outdated", I'd encourage you to think more flexibly about the tools you choose. There's nothing wrong with learning the basics of a widespread tool that you are guaranteed to find anywhere.


Perl is not bad because it is feature complete. Perl might even be good, except that it is dead.

> if your argument boils down to "Awk and Bash are ugly and outdated"

Not at all. However, Awk was written almost 50 years ago. The awk book is good and it is very true that the thing is installed most everywhere so if you're already invested in it and it does everything you want, keep using it of course. But it just might be possible to improve on a tool after half a century.

I simply shared new tools that, if you like awk, might really be up your alley. They might not be packaged by your favorite package manager anytime soon, but you can grab them with cargo; if that's not portable enough, so be it. No harm, no foul.


Perl comes with OpenBSD's base, with pledge(2) and unveil(2) support.


i like awk, but i hate regexps (YMMV). For a different perspective, check out SNOBOL or its speedier variant SPITBOL. It's not a one-liner, but I find patterns easier to compose, like functions:

    $anchor = 0
    digits = '0123456789'
    patt = "foo=" SPAN(digits) . num
    while line = INPUT
        if line ? patt
            OUTPUT = num
        endif
    end

NOTE1: reading the variable INPUT reads from the input stream; assigning to OUTPUT writes to the output stream. They are normally attached to STDIN and STDOUT, and can be reconfigured.

NOTE2: there is no real while or if; control flow is through labels and jumps (considered by some as power, but I disagree). The example above uses another script to transform the while and if into labels and jumps. The point is pattern composition and assignment on a successful match.


i messed up the code; here it is again:

  digits = '0123456789'
  patt = "foo=" SPAN(digits) . num
  while line = INPUT
      if  line ? patt
          OUTPUT = num
      endif
   end


Why not use a perl one-liner?


AWK is a great tool, but I found it lacking when working with CSV files where the separator appears inside quoted fields [0]. In this case, European number formatting was the problem, tripping up SQLite's import.

I resorted to pandas, whose CSV import has parameters for the thousands and decimal separators.

Also see AWK HN post from 2021 [1].

[0] https://earthly.dev/blog/awk-csv/

[1] https://news.ycombinator.com/item?id=28707463



The frawk implementation (not quite POSIX, but extremely fast) and my GoAWK implementation both have an "-i csv" option that puts it in "input mode CSV". More here: https://benhoyt.com/writings/goawk-csv/

  $ cat quoted.csv
  "Smith, Bob",42
  $ awk -F, '{ print $1 }' quoted.csv
  "Smith        # you want to print the first field: Smith, Bob
  $ goawk -i csv '{ print $1 }' quoted.csv
  Smith, Bob    # that's better!


gawk's FPAT is sufficient for any CSV file without newlines inside fields. I've never run into a CSV with newlines in the fields in the wild.
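For example, with the FPAT value suggested in the gawk manual (a field is either a run of non-commas or a double-quoted string):

```shell
# FPAT describes what a field looks like, rather than what separates fields
echo '"Smith, Bob",42' | gawk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]*\")" } { print $1 }'
# → "Smith, Bob"   (quotes included; strip them separately if needed)
```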


Of course, the most important story related to AWK is that of Peter Weinberger's face.

https://en.wikipedia.org/wiki/Peter_J._Weinberger

https://spinroot.com/pico/pjw.html


I always enjoy seeing what a good awk & sed user can achieve in bash.

However, one of the not-so-good devs I worked with used awk to load a large, deeply hierarchical JSON file. They refused to use a library to parse the JSON. It was a monstrosity many hundreds of lines long.

Luckily, when they left, we were able to parse it with jq instead.


Previous discussions:

* https://news.ycombinator.com/item?id=23240800 (213 points | May 19, 2020 | 86 comments)

* https://news.ycombinator.com/item?id=25142867 (207 points | Nov 18, 2020 | 58 comments)


Because reasons, here's a gawk fizzbuzz (nearly a one-liner):

  #!/usr/bin/env gawk -f
  
  BEGIN {
      for (i=1;i<=100;i++) {
          printf(" %2s", i%(3*5)!=0 ? i%5!=0 ? i%3!=0 ? i : "fizz" : "buzz" : "fizzbuzz\n" )
      }
      printf("\n")
  }


If you want to see a real awk expert, ask ChatGPT to write a script that lets you query columns foo and bar in 100 CSV files, where the column header can be anywhere in columns 1 through 100 and may start anywhere between lines 1 and 20. All my dirty Excel data can be handled so easily now.


Somebody should give out a prize for best awk usage... though it might be awk-ward.


Terrible pun, but A(WK) for effort!


Clearly someone needs to make a Rust implementation called RAWK




