Show HN: Jacinda, a functional Awk (text stream processing on the comamnd-line) (haskell.org)
81 points by vmchale 13 days ago | 12 comments
Typed, functional (folds, scans) stream processing backed by Andrew Gallant/burntsushi's regular expressions library.

There's a guide here! https://vmchale.github.io/jacinda/

Nice work with the guide, the bevy of examples makes it easy to digest.

The colon being used in multiple contexts is tricky. As I was scanning the examples, I found postfix `:` doing type conversion, as in `(%)\. {%/Apple/}{`3:}`, and then I wondered what it does when it has nothing on its left-hand side, as in `[(+)|0 [:1"x]`. Then I noticed that the `[` brackets were unbalanced in the latter example, and eventually figured out that `[:` is its own operator, separate from `:`, and that the middle `[` had nothing to do with function syntax.


Is there a succinct summary of what one gains from this being ‘functional’? I find the succinctness of regular awk to be a good advantage, and it feels like some of that comes from it being non-functional.

When I think about how I use awk, I think it’s mostly something like:

  awk '!a[$2]++' # first occurrence of each value in the second field
Or

  awk '{a[$2]+=$3} END {for(x in a) print x, a[x]}'
Or just as an advanced version of cut. A fourth example is something that is annoying to do in a streaming way but easy with a shell pipeline: compute the moving average of the second field, grouped by the third field, over a span of size 20 (backwards) in the first field.

  awk '{ print $1, $3, 1, $2; print $1+20, $3, -1, -$2 }' | sort -n |  # emit +1/-1 window-edge events, ordered by time ($1)
    awk '{ a[$2]+=$3; b[$2]+=$4; if (a[$2]) print $1, $2, b[$2]/a[$2] }'  # running count/sum per group; the guard avoids 0/0 at a group's final removal
The above all feel somewhat functional as computations – the first is a folding filter, the second a fold, the third a map, and the fourth a folding concat map if done on-line, or, as written, a concat map followed by a sort and a folding map.
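
To make that concrete, here's a rough Haskell sketch of the first two (the list-of-records framing and all the names are mine, nothing to do with Jacinda's actual internals):

  import Data.List (foldl')
  import qualified Data.Map.Strict as M
  import qualified Data.Set as S

  type Record = [String]

  -- awk '!a[$2]++': keep the first occurrence of each $2 (a filter threading fold state)
  firstByField2 :: [Record] -> [Record]
  firstByField2 = go S.empty
    where
      go _ [] = []
      go seen (r : rs)
        | S.member (r !! 1) seen = go seen rs
        | otherwise              = r : go (S.insert (r !! 1) seen) rs

  -- awk '{a[$2]+=$3} END {...}': sum $3 grouped by $2 (a plain left fold)
  sumByField2 :: [Record] -> M.Map String Double
  sumByField2 = foldl' step M.empty
    where step m r = M.insertWith (+) (r !! 1) (read (r !! 2)) m

  -- wire up with: interact (unlines . map unwords . firstByField2 . map words . lines)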

The awk features that feel ‘non-functional’ to me are less the mutation and more operations like next, or the lack of compositionality: one can’t write an awk program that is, in some sense, made of several awk programs (i.e. sets of pattern–expr rules) joined together. That compositionality is the main advantage, in my opinion, of the ‘functional’ jq, which feels somewhat awk-adjacent. Is there some way to get composition of ja programs without falling back to byte streams in between?
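
The composition I'm asking about is easy to state in the sketch above: if each program denotes a typed stream transformer, joining two programs is plain function composition, with no byte stream in between:

  -- continuing the sketch: programs as typed stream transformers
  type Prog = [Record] -> [Record]

  project :: Int -> Prog              -- the 'advanced cut': keep only the nth field
  project n = map (\r -> [r !! (n - 1)])

  pipeline :: Prog                    -- two 'programs' joined with plain (.)
  pipeline = project 2 . firstByField2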


Ok, the thing I hadn’t worked out when I wrote the above is that this is also APL-like, which is pretty fun. It feels like the main missing feature is some way to do grouping, though I’m not sure. I tried translating my first two examples, but I didn’t try executing them, so they mightn’t work:

  (->2)"(->1)~.*{|`2.`0}
  "(\g. g.((+)|0 {g=`2}{`3})) ~. $2
I think I might also be missing something about the ` operator: the first example feels a bit strained because it first needs to put the fields in a tuple and then extract from the tuple. I feel like I want $0 to give a list of some records that can be converted to strings but which can also have fields extracted. Then the example might look like (`1)~.*$0. I don’t know if the $0 could be implicit too. Another horrible feature would be for integers to be coerced into functions when needed – take the nth field from a record/list/tuple.
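
For what it's worth, that last coercion can be faked in Haskell today by abusing `fromInteger` (illustrative only: every other Num method is nonsense, which is part of why the feature is horrible):

  {-# LANGUAGE FlexibleInstances #-}

  -- integer literals as 1-indexed field projections
  instance Num ([String] -> String) where
    fromInteger n = (!! (fromInteger n - 1))
    (+) = error "+"; (*) = error "*"; abs = error "abs"
    signum = error "signum"; negate = error "negate"

  -- ghci> (2 :: [String] -> String) ["a","b","c"]
  -- "b"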

I’m not sure I really understand the language (and I definitely don’t understand the implementation!) but it seems pretty interesting. Perhaps a better motto than “functional awk” is “streaming APL with type checking (and type classes)”?


Maybe spell check that title

See also: https://github.com/gelisam/hawk - Transform text from the command-line using Haskell expressions. Similar to awk, but using Haskell as the text-processing language.

[flagged]


What do you mean by an explicit parser? Regexps are basically grammars for the languages that finite automata recognize (typically extended with features like lookahead). Haven't used parser generators in a while, but is there anything better for general use today?

Regexes at the command line are usually about writing a quick throwaway pipeline that you tweak into working once in a specific case. It doesn't have to be a correct or reliable implementation, it just has to get the job done correctly in this instance. Special cases you could have encountered but didn't can be ignored.

If you are doing something repeatedly, especially if it's automated, then yes, if possible you should write/adopt a real parser and add error handling and such.
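
For a toy contrast, here's what the "real parser" route can look like in Haskell for a made-up key=value line format: you get an explicit error channel instead of a regex silently not matching.

  import Data.Char (isSpace)

  -- parse a "key=value" line, reporting malformed input instead of skipping it
  parseKV :: String -> Either String (String, String)
  parseKV s = case break (== '=') s of
    (k, '=' : v) | not (null (trim k)) -> Right (trim k, trim v)
    _                                  -> Left ("malformed line: " ++ show s)
    where
      trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse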

ETA: A notable exception is file paths and similar. Regexes are a totally acceptable way to parse those (modulo issues like escaping user input).


The link provided is bad. (It was never good, so it's likely just a mistake on the author's part.) Googling for "rust regex library" will take you to the right place. https://docs.rs/regex/latest/regex/

What do you use instead of grep on the CLI? Or do you "trust regex" when using grep?


Oh, I fixed the link. It was easy enough to figure out. It never bodes well when an author fails to proof their summary line, though.

> It never bodes well

People make mistakes, links rot, etc. It's not a sensible generalization and there's this thing:

Please don't pick the most provocative thing in an article or post to complain about in the thread. Find something interesting to respond to instead.

https://news.ycombinator.com/newsguidelines.html


To be fair, it's very difficult to proof your own writing. You can look over the same mistake 3 times and mentally read in the correct word. It's hard to buy that this betrays a lack of care when the author clearly put a lot of work into writing the code and documentation.

Part of the point of publishing open source software is that many eyes make bugs (and documentation errors) shallow.


A simple typo means nothing. Someone whinging about it as if it means more than it does on the Internet, however...


