An Awk Implementation in C99 (raygard.net)
107 points by asicsp 49 days ago | 17 comments



The second edition of The AWK Programming Language by Aho, Kernighan, and Weinberger was just released in February:

* https://awk.dev

ToC and preface of the new edition:

* https://awk.dev/pp1-13.pdf


I just finished reading Kernighan & Pike's The Unix Programming Environment this weekend, and it has a lot of awk. I read it more as a history book (it was published 40 years ago!), but the awk parts were a highlight, second only to the long penultimate chapter building your own calculator/programming language. It kind of made me want to learn more than just the basics I've been using for almost 25 years. But it also recalled Larry Wall's intro in Learning Perl, and how Perl started as a "better awk". That intro is really a direct response to K&P's book, I think. So I don't know. I loved Perl in those days, and I agree about how not everything is doable as a shell pipeline. But a second edition makes me want to read it anyway. :-)


It was in October or November, I think. The book doesn't add much to the first edition; I was quite deceived.


It's a second edition, not a new book -- why did you feel deceived? There's a fair bit of restructuring of the first few chapters, and a whole new chapter on exploratory data analysis (which uses the new CSV support) and a bunch of new stuff about the new Unicode-character support.


Then my expectations were certainly too high from what I'd read in the reviews; I was expecting a lot of changes, like the second edition of the dragon book. You can see that some programs were obviously written to get around the memory limitations of the computers of the time, such as the assembler and emulator for an imaginary machine.


2nd Edition != Volume 2


I've chatted with Ray (the author of wak -- love the name) a few times about our respective AWK implementations. He's a nice guy and a really thoughtful programmer, and he even found several edge-case bugs in my GoAWK implementation: https://github.com/search?q=repo%3Abenhoyt%2Fgoawk+raygard&t...


I have a mirror of an awk-related gopher hole from an awk fan, as sadly the former domain is offline. Here's the git repo; the author also has nice add-ons such as a universal awk library with functions missing from awk (tan, bitwise ops...) https://git.sr.ht/~luxferre/

Luckily, the AWK library is used for a project:

https://git.sr.ht/~luxferre/DALE-8A/blob/master/tgl.awk

Also, there's a solitaire written in POSIX AWK:

git://git.luxferre.top/nnfc.git

His homepage:

https://luxferre.top


Cool. Awk is a nice tool for semi-simple record-at-a-time processing.

As far as I can tell, this implementation is much slower than other awks. Most awk implementations use a non-backtracking regex implementation (as do Rust and Go). By contrast, this seems to be using a backtracking implementation (e.g., like Perl and Python). For simplicity that makes sense, but for some regexes this kind of implementation can be many orders of magnitude slower.
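
To make the backtracking versus non-backtracking difference concrete, here is a minimal sketch in Python. It is not how any awk, or regcomp()/regexec(), is implemented. It assumes a toy pattern format invented for this example: a list of (char, optional) pairs that must match the whole text. The names backtrack_match and nfa_match are hypothetical. The classic bad case is n optional 'a's followed by n required 'a's, matched against n 'a's.

```python
def backtrack_match(pattern, text):
    """Whole-string match by recursive backtracking; returns (matched, steps)."""
    steps = 0

    def go(i, j):
        nonlocal steps
        steps += 1
        if i == len(pattern):
            return j == len(text)
        ch, optional = pattern[i]
        # Greedy: try to consume a character first...
        if j < len(text) and text[j] == ch and go(i + 1, j + 1):
            return True
        # ...then fall back to skipping the item if it is optional.
        return optional and go(i + 1, j)

    return go(0, 0), steps


def nfa_match(pattern, text):
    """Whole-string match by tracking the set of live states; returns (matched, steps)."""
    steps = 0

    def closure(states):
        # Add every state reachable by skipping optional items.
        seen = set()
        stack = list(states)
        while stack:
            i = stack.pop()
            if i in seen:
                continue
            seen.add(i)
            if i < len(pattern) and pattern[i][1]:
                stack.append(i + 1)
        return seen

    current = closure({0})
    for c in text:
        advanced = set()
        for i in current:
            steps += 1
            if i < len(pattern) and pattern[i][0] == c:
                advanced.add(i + 1)
        current = closure(advanced)
    return len(pattern) in current, steps


if __name__ == "__main__":
    n = 14  # kept small so the backtracking version finishes quickly
    pattern = [("a", True)] * n + [("a", False)] * n
    print("backtracking:", backtrack_match(pattern, "a" * n))
    print("state sets:  ", nfa_match(pattern, "a" * n))
```

The backtracking version explores every consume/skip combination of the optional items before it finds the one that works, so its step count grows exponentially with n. The state-set version does at most one step per live state per input character.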

My guess is that this slowdown doesn't matter for a typical toybox user, where small code size is valued. Thanks for the link!


Looks like it uses the standard POSIX regular expression functions - regcomp() and regexec().


Do you mean for the parser or the engine that does the awking?

I was looking over the POSIX spec to see what's so difficult about parsing it and just kind of wondering.


You can write a Z-Machine interpreter in awk, a Tetris, and, with Gawk, even a gopher client.


Always good to see some love for the awk language. I, however, went down the rabbit hole to find out about toybox and some background. Came back just puzzled (a 260-slide deck only for the message that it has a more permissive licence).


It’s also less of a grab-bag of random imported software than busybox: the tools are mostly written in a single style with some care taken to use the common library. (Making one utility smaller and making the whole multicall binary smaller are different goals.) The advantage is there’s much less redundant or disproportionately oversized stuff. The disadvantage is that most things need to be written anew, so there’s less stuff in general. (In particular, the author has spent a lot of time writing a shell and it’s still not ready.)


Oblique, but since this topic might be catnip for people who'd know:

Anyone aware of a Python parser for the Awk language?

I found a number of Awk-alikes implemented in Python, but they all sounded like they took enough liberties that I wouldn't expect them to have a parser worth trying to lean on to parse wild Awk scripts.


You would probably get reasonable results from simply feeding the POSIX grammar to lrparsing[1], but unfortunately it doesn’t look like there’s much support in it for lexer feedback (aka the “lexer hack”), which you’ll need for the regular expression syntax. It’s likely possible to hack that in, but I don’t know how much work it’d be.

[1] https://lrparsing.sourceforge.net/
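
For anyone unsure what lexer feedback means here, a rough sketch in Python: the tokenizer has to look at the previous token to decide whether a "/" is the division operator or the start of a regex literal. The rule used below (a "/" after a number, a name, ")" or "]" is division, otherwise it opens a regex) is a simplifying assumption, not the exact rule from the POSIX grammar. The function name tokenize and the token labels are made up for this example, and strings, comments and multi-character operators are left out.

```python
import re

# Deliberately tiny token set; a real awk lexer has many more cases.
_SIMPLE = re.compile(
    r"(?P<NUMBER>\d+(?:\.\d+)?)"
    r"|(?P<NAME>[A-Za-z_]\w*)"
    r"|(?P<OP>[-+*/%(){}\[\]~!<>=,;$])"
)


def tokenize(src):
    """Split src into (kind, text) pairs, using the previous token to classify '/'."""
    tokens = []
    pos = 0
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        prev_kind, prev_text = tokens[-1] if tokens else (None, None)
        # Assumption: after an operand, '/' can only mean division.
        after_operand = prev_kind in ("NUMBER", "NAME") or (
            prev_kind == "OP" and prev_text in (")", "]")
        )
        if src[pos] == "/" and not after_operand:
            # Regex literal: scan to the next unescaped slash.
            end = pos + 1
            while end < len(src) and src[end] != "/":
                end += 2 if src[end] == "\\" else 1
            if end >= len(src):
                raise SyntaxError("unterminated regex literal")
            tokens.append(("ERE", src[pos + 1:end]))
            pos = end + 1
            continue
        m = _SIMPLE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    print(tokenize("a / b / c"))
    print(tokenize("$0 ~ /b\\/c/"))
```

A grammar-driven parser generator normally wants the token stream fixed before parsing, which is why this kind of context-dependent decision is awkward to express without a hook between the lexer and the parser.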


Is there a Tree Sitter grammar for AWK? That could be a good way to go, via the Python bindings for Tree Sitter.



