GNU Awk 5.0 (lists.gnu.org)
229 points by Aissen on April 16, 2019 | 49 comments



In 2016 I fuzzed awk and found a bunch of segfaults. Sadly, it seems they are still not fixed:

      $ ./bin/gawk -V | head -n 1
      GNU Awk 5.0.0, API: 2.0

      $ ./bin/gawk 'for (i = ) in steve kemp rocks'
      gawk: cmd. line:1: for (i = ) in steve kemp rocks
      gawk: cmd. line:1: ^ syntax error
      gawk: cmd. line:1: for (i = ) in steve kemp rocks
      gawk: cmd. line:1:          ^ syntax error
      gawk: cmd. line:1: fatal error: internal error
      Aborted (core dumped)

I'll have to forward bugs from the Debian tracker to the maintainers directly, I guess.


I don't know Steve. Can't leave you alone with anything :-)

Nicely found!


Did you try contacting Alfred, Peter or Brian directly? :)


I reported to the Debian bug-tracker where I assumed it would either be forwarded upstream, or be searched by the project authors:

https://bugs.debian.org/816277

I appreciate that most of the time reporting directly is the right thing to do, but often there is a good relationship between packager and developer and things flow well in both directions.

I guess I should make sure I chase old reports and reroute if they seem to be orphaned/ignored/overlooked.


From https://www.gnu.org/software/gawk/: "Bug reports and feature suggestions for gawk should be sent to bug-gawk@gnu.org."

Maybe try sending an email? I agree that reporting to the Debian packager often works, but in this case it's possible they just have too many things to triage appropriately.


Dropped a mail there now, thanks.

Edit: Too late. I see it was already reported before the release and apparently wasn't regarded as significant:

http://lists.gnu.org/archive/html/bug-gawk/2019-04/msg00015....

I guess that means no fix.


I know from a "getting bugs fixed" perspective it's important to make sure that things are being fixed, or followed up on... but geez, is it ever a time sink.


The oldest bug report I think I've made is now reaching 15 years old.

Sometimes I go through things, but it's easy to think you've done your part if you submit a bug report, a reproducer, and even a patch.


> even a patch.

I'd expect that once a patch is there and it is consistent it would be applied, whereas just reporting what a fuzzer finds is not too helpful.


This really should be easier. Every distro has its own bugzilla, some projects use GitHub issues, and a lot of bug tracking just happens on mailing lists. It's difficult to keep track of everything. Maybe we need federated bugzilla.

By the way, we've all seen the XKCD about competing standards before.


Did you know fossil has a built-in issue tracker? https://www.fossil-scm.org/xfer/doc/trunk/www/bugtheory.wiki


Congratulations and thank you! Namespaces are a big help for me personally.

From the GNU Awk 5 manual: A qualified name consists of two simple identifiers joined by a double colon (::). The left-hand identifier represents the namespace and the right-hand identifier is the variable within it.
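
As a rough sketch of what that looks like in practice (assuming gawk 5.0 or later is on PATH; the namespace "mylib" and the names "greet" and "count" are made up for illustration and are not taken from the manual):

```shell
# Sketch only: needs gawk >= 5.0. "mylib", "greet" and "count" are
# hypothetical names chosen for this example.
out=$(gawk '
# Identifiers below this directive live in the mylib namespace.
@namespace "mylib"

function greet(name) {
    # Unqualified, so this is mylib::count rather than a global count.
    count++
    return "hello, " name
}

# Switch back to the default namespace for the main program.
@namespace "awk"

BEGIN {
    # This count is awk::count and is unrelated to mylib::count.
    count = 10
    print mylib::greet("world")
    print mylib::count, count
}
')
printf '%s\n' "$out"
```

On a 5.x gawk this should print "hello, world" followed by "1 10": the library's counter and the main program's counter no longer collide, which is the property that matters when writing reusable awk libraries.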

Namespaces are a big help for writing libraries, especially for clients with systems (e.g. banking, government, education, medical) where awk code is fine yet other kinds of languages (e.g. python, go, perl) are not available and/or require management audit practices.


Thanks for pointing to the Awk 5 manual (in the tar archive; I still haven't seen it online).

I wrote my previous comment asking "why namespaces" based only on the information directly accessible online. Now that I've had the time to download and unpack the sources and read the documentation there, it sounds much better than it appeared from the announcement.

Long term it shouldn't make gawk less compatible than it already is. Gawk already has gawk-specific extensions, and this would be one more. It seems that the command-line example is the only "breaking" aspect. Even that could theoretically be modified to be non-breaking, but it's hopefully used seldom enough not to affect too much.

And "namespaces" are in fact a "module" infrastructure extension, introduced specifically to support the construction of module-like libraries (solving the problem of exporting only a limited number of variables from the module).

I especially like the writing of the author of the Gawk manual; I really enjoyed it even almost thirty years ago.


Why would awk not require audit and code management practices vs those other languages in a controlled environment?


I took it to mean that getting support for those languages would require management audit practices, i.e. "why are we installing a new package on all of our servers?". In that case, awk would be better since the GNU utilities are much more likely to already be installed and in use. The resulting awk code would still probably fall under code management.


Go, which you mentioned, does not require a package to be installed. It compiles to a single, native binary.


If the place you worked at was backwards enough you may not be able to install a Go compiler on your desktop. Anything is possible when the people making the rules are disconnected enough from the people getting the work done.


This. If I want anything installed, it requires a survey from IT and Legal and can take months.

If it is Java or C# and already in use, it is pretty simple. If I wanted to install a Go compiler, it could get complicated. Not having admin rights really slows down an organization, especially if you're someone who likes to use multiple technologies (right tool for the right job). IT hates that. Note that I don't write production tools, just things to get my job done quicker (automation scripts, data analysis, etc.).


always assumed all those organizations stayed in the 90s

not figuratively


Probably because awk is available as a stock build of the approved OS. It's been around a long time. I mean, to be POSIX-compliant, you need an awk interpreter in your system https://pubs.opengroup.org/onlinepubs/9699919799/utilities/a...


I remember the days when I was working in a data company and one of the product check tools was a 2000-line GNU awk script with some Korn shell in the mix. It was both scary and awesome at the same time.


I wonder if the new namespaces functionality will break compatibility for users. I guess I always assumed gawk was feature-complete, or at least a mature enough tool that I wouldn't expect compatibility breaks.

    11. Namespaces have been implemented! See the manual.  One consequence of this
        is that files included with -i, read with -f, and command line program
        segments must all be self-contained syntactic units. E.g., you can no
        longer do something like this:
    
            gawk -e 'BEGIN {' -e 'print "hello" }'
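
For anyone with scripts that pass several -e segments, a small sketch of the distinction (assuming gawk is on PATH; the variable name "msg" is just for illustration, and the rejection branch is what the NEWS item leads one to expect on 5.0 rather than something verified on every build):

```shell
# Sketch only: the rejection described below applies to gawk >= 5.0.

# Each -e segment is a complete rule, so this form is still accepted.
out=$(gawk -e 'BEGIN { msg = "hello" }' -e 'BEGIN { print msg }')
printf '%s\n' "$out"

# A rule split across two -e segments is what the NEWS item rules out;
# on 5.0 it is expected to fail with a syntax error instead of printing.
if gawk -e 'BEGIN {' -e 'print "hello" }' >/dev/null 2>&1; then
    echo "split rule accepted (pre-5.0 behaviour)"
else
    echo "split rule rejected"
fi
```

So multiple segments remain fine as long as each one parses on its own.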


It's not entirely clear, but I believe that some of these changes, namely namespacing, may impact compatibility with POSIX awk[1] (as shipped on BSD, macOS, et al.). Historically, GNU awk has been a strict superset of "standard" awk. I for one am concerned that this may impact portability of awk-based scripts between platforms using gawk and other implementations.

It could be that I have missed a detail and this isn't really the case; I also certainly don't want to discourage progress, but there is something to be said for being able to write an awk program once and be confident it will work on any POSIX-ish system, rather than only those with a GNU userland.

1 - http://pubs.opengroup.org/onlinepubs/9699919799.2018edition/
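
One way to guard against accidentally depending on gawk-only features is to run scripts through gawk's --posix mode, which turns the extensions off. A minimal sketch, assuming gawk is on PATH and using gensub() purely as an example of an extension:

```shell
# Sketch only: uses gawk's --posix option to approximate a POSIX-only awk.

# A portable construct behaves the same under POSIX rules.
portable=$(gawk --posix 'BEGIN { print length("abc") }')
printf '%s\n' "$portable"

# gensub() is a gawk extension, so POSIX mode is expected to reject it.
if gawk --posix 'BEGIN { print gensub(/a/, "b", "g", "aaa") }' >/dev/null 2>&1; then
    posix_ok=yes
else
    posix_ok=no
fi
echo "gensub accepted in POSIX mode: $posix_ok"
```

This does not prove a script works on every other awk, but it catches the most obvious reliance on GNU-specific behaviour.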


Amazing how long gawk has been around, and still with the same maintainers.

My port of gawk to MacOS in 1990/91 was my first active involvement in open source. I was rather crushed when told that my patches would not get accepted, because RMS at the time had a Fatwa against accepting MacOS code in GNU projects, and, though the gawk maintainers were quite civil about it, it was clear that they would not go against RMS' wishes.

Shortly afterward I discovered perl, so I never did all that much awk, and when, a few years ago, I decided a 5-line awk script was the most elegant way to solve a configuration problem, my coworkers vetoed the plan.


Does anyone know a good awk tutorial or book?

I kind of skipped over it in my journey into the world of Unix... I usually either use bash shell scripting (tools like cut and grep + pipes and redirection) or use python.


https://news.ycombinator.com/item?id=13451454

The Aho and Kernighan book is amazing.


I have a rule of thumb that I'll buy and read any book which Kernighan is an author of regardless of subject. He's more or less the gold standard of technical writing for me.


Cool, thanks for the suggestion! Are the basics pretty much unchanged? I see that edition is from 1988. I like free stuff, but I don't mind paying if you think a more recent edition is worth the $.


_The AWK Programming Language_ ( https://smile.amazon.com/AWK-Programming-Language-Alfred-Aho... ) is one of the best programming books, on any language, in my opinion. Worth reading even if you don't use awk. In less than 200 pages it covers an introduction to the language, through to implementing a relational database, recursive-descent parsing, and graph-based algorithms.

For gawk, the manual covers the gaps between the language introduced in the book and the latest implementation.


The 6-page summary in the 7th edition manual turned out to be comprehensive while still being approachable. I knew next to nothing about awk, not counting some snippets found on the web. After reading through this document twice, awk feels like home.

https://9p.io/7thEdMan/v7vol2b.pdf (starting on page 105)


Assuming the syntax is $name::spaces it could look like Awk borrowed something from Perl, which is a funny full circle.


Congrats! Long live {,g}awk !


Differences from gawk 4.2.1 are not available; they would be too large.

That's my kind of changelog. I, too, like to live dangerously.


> Bug reports should be sent to address@hidden

Will do.


The release notes are a bit weird. It's as if the author is shy about the one actual major change, NAMESPACES, and hides it behind mentions of C99 compliance and test infrastructure :-/


Arnold Robbins is a busy dude. He maintains awk, bash, GNU grep and several other must-have packages.

A little slack would be nice.


Actually, Chet Ramey maintains GNU Bash, and Jim Meyering maintains GNU Grep. As far as I can tell, Arnold Robbins "just" maintains GNU awk. Don't get me wrong, I don't want to trivialize this at all, just correct the record.


[flagged]


> Is it just because Arnold Robbins is a woman?

Arnold and his family would be surprised to hear that.

https://www.oreilly.com/pub/au/459


Also, if one person really did maintain all those things and was overworked, my reaction wouldn't be "Wow, you do so much!" but "Why haven't you found other people to help out so you're not overworked?" All this software is too important to be left to people who don't have time for them.


> "Why haven't you found other people to help out so you're not overworked?"

Sadly it does not rain maintainers everywhere, especially when the role does not pay.


But isn't that the state of a large percent of the essentials of modern life? Especially when they don't make money.


Wow, I had no idea those things were maintained by the same guy! That's amazing! Is there somewhere one can contribute to his development efforts? I guess just donate to GNU Foundation?


How so? It is actually one of the very first things mentioned:

> This is a major new release, with new or improved features, including namespaces. The relevant part of the NEWS file is appended below.

So you would be happier to have the item as number 1 rather than 11 in the list further down?

> 11. Namespaces have been implemented! See the manual. One consequence of this is that files included with -i, read with -f, and command line program segments must all be self-contained syntactic units. E.g., you can no longer do something like this:

         gawk -e 'BEGIN {' -e 'print "hello" }'

I found the announcement rather concise and to the point.

Apart from that, it would have been nice to have a reference to which section of the manual to look at, especially as https://www.gnu.org/software/gawk/manual/ still references 4.2 (it will probably be updated soon).


Perhaps the changes were listed in the chronological order of version control history, as is often the case with release notes, as opposed to any subjective notion like significance.


I would rather say “who asked for namespaces in awk”?

Who needs that and what’s the point? Any compatibility breakage in awk should start with “negative points”, so what are the positive ones to even qualify it as desired?



Something I just learned: there is an awk debugger! From the above page: “Beginning with version 4.0 in 2011, gawk provides an awk-level debugger: dgawk, which is modeled after GDB. This is a full debugger, with breakpoints, watchpoints, single statement stepping and expression evaluation capabilities.”


There are no details there. Just an “it’s needed” claim.

Brian Kernighan is also cited there, and also not convinced.

I’m in good company, at least.


It seems to have some breaking changes and non-user facing improvements though. I'm not sure major changes are needed but it should allow them more easily now.



