
Show HN: Mawkdown – (Toy) Markdown Parser in Awk - rethab
https://github.com/rethab/mawkdown
======
asicsp
Nice. I haven't gone through it fully, but the header parsing stood out as
something to improve. Use match to capture the run of '#' characters and take
its length, for example:

        $ echo '# ' | awk 'match($0, /^#+ /, m){print length(m[0])-1}'
        1
        $ echo '### ' | awk 'match($0, /^#+ /, m){print length(m[0])-1}'
        3

You can also use capture groups, so that you need neither the -1 nor the
substr call:

        awk 'match($0, /^(#+) (.+)/, m){l=length(m[1]); print "<h" l ">" m[2] "</h" l ">"}'

~~~
rethab
Thanks for the hint. I started with the intention of staying within the limits
of "traditional" AWK and only resorted to using match (which is only available
on GAWK, for those who don't know) to parse links and images.

As you read on, you'll certainly find more areas for improvement, because
this is pretty much based on an idea I had in the shower and then typed out
in an hour :)

~~~
asicsp
As far as I know, the 'match' function is part of the POSIX spec for awk. Only
the third (array) argument is specific to gawk. So this should work in any awk:

        awk 'match($0, /^#+ /){l=RLENGTH-1; print "<h" l ">" substr($0,RLENGTH+1) "</h" l ">"}'

I checked it on [https://awk.js.org/](https://awk.js.org/) and it worked.
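
For anyone who wants to try the portable variant on more than one line, here is a slightly fuller sketch of the same idea. It is not taken from mawkdown: the function name `md_headers` is made up for illustration, and capping the level at 6 (so longer runs of '#' pass through unchanged) is an assumption of this sketch. It relies only on the two-argument `match` and the `RLENGTH` variable it sets.

```shell
# Hypothetical helper (not part of mawkdown): convert "# ..." header lines
# to HTML using only two-argument match(), which sets RSTART and RLENGTH.
md_headers() {
    awk '
    match($0, /^#+ /) {
        level = RLENGTH - 1                 # matched length minus the trailing space
        if (level <= 6) {                   # assumption: only emit h1 through h6
            text = substr($0, RLENGTH + 1)  # everything after the "# " prefix
            print "<h" level ">" text "</h" level ">"
            next
        }
    }
    { print }                               # all other lines pass through unchanged
    '
}

printf '%s\n' '# Title' '### Sub' 'plain text' | md_headers
```

The `next` keeps a converted header from also being printed by the final pass-through rule.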

------
khm
Would it be ok to use elements of this to improve the one we ship with Werc?

[http://code.9front.org/hg/werc/file/2ace198c631b/bin/contrib...](http://code.9front.org/hg/werc/file/2ace198c631b/bin/contrib/md2html.awk)

