 Cool write-up! I am a huge fan of Nim. I use it as a replacement for small scripts, big scripts, and even real code at work, including to interface with other languages that also interface with C. I see in your post that you are about to look into Nim/C interop; have a look at nimterop[1]. It has been immensely useful to me for wrapping Nim libraries around C code exported from Matlab and for talking to the C API for SystemVerilog. Finally, your blog post shows a snippet of Python code, but please also show the world how the Nim code looks [I know that you are sharing the link to the git repo containing the Nim code, but still] :)
 Thanks for the link! I will certainly be checking that out! I'm sure there will be a follow-up with the Nim code front and center :)
big.tsv is basically the same line repeated 2 000 000 times:

```
$ awk \
  'BEGIN{for (i=0; i<2000000; i++){print "abcdef\tghijk\tlmnop\tqrstuv\twxyz1234\tABCDEF\tHIJK\tLMNOP\tQRSTUV\tWXYZ123"}}' \
  > big.tsv
$ cat big.tsv |sort|uniq -c
2000000 abcdef ghijk lmnop qrstuv wxyz1234 ABCDEF HIJK LMNOP QRSTUV WXYZ123
```

You can first sort all the lines and count the duplicates. The first column is now the number of instances of that line.

The idea is that if 2 lines are identical, the number of "count++" emitted will be exactly the same, so each distinct line only needs to be scanned once and its matches added in proportion to that first column.

I took your gawk code, but I am starting the for loop at 2 instead of 1, and I removed the -F'\t' option. With the default whitespace splitting, the count written by uniq -c becomes field 1 and the data fields start at 2.

```
$ time cat big.tsv |sort | uniq -c \
  | gawk '{for (i=2; i <= NF; i++) {if (index(tolower(substr($i, 1, 3)), "bc") != 0) {count += $1}}}END{print count}'
4000000

real 0m0.724s
user 0m0.605s
sys 0m0.322s
```

Edit: added backslashes to split lines.
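For anyone who wants to check that the deduplicated count really matches the per-line count, here is a minimal sketch on a tiny input. The function names `count_naive` and `count_weighted` and the demo lines are made up for illustration; the awk bodies follow the one-liner above, with plain `awk` instead of `gawk`. It assumes the fields contain no spaces, since otherwise tab splitting and default whitespace splitting would disagree.

```shell
#!/bin/sh
# Sketch: count fields whose first three characters contain "bc"
# (case-insensitive), once per line and once per distinct line.

# Baseline: scan every line of the tab-separated file.
count_naive() {
  awk -F'\t' '{
    for (i = 1; i <= NF; i++)
      if (index(tolower(substr($i, 1, 3)), "bc") != 0) count++
  } END { print count + 0 }' "$1"
}

# Weighted: collapse duplicates with sort | uniq -c, then add the
# multiplicity ($1) for every matching field. Default whitespace
# splitting makes the uniq count field 1, so data fields start at 2.
# Assumption: fields contain no spaces.
count_weighted() {
  sort "$1" | uniq -c | awk '{
    for (i = 2; i <= NF; i++)
      if (index(tolower(substr($i, 1, 3)), "bc") != 0) count += $1
  } END { print count + 0 }'
}

# Hypothetical demo input: one line repeated twice plus one other line.
demo=$(mktemp)
printf 'abcdef\tghijk\tABCDEF\n' >  "$demo"
printf 'abcdef\tghijk\tABCDEF\n' >> "$demo"
printf 'xbcx\tnothing\n'         >> "$demo"

echo "naive:    $(count_naive "$demo")"
echo "weighted: $(count_weighted "$demo")"
rm -f "$demo"
```

Both functions should report 5 here: two matches on each of the two identical lines, plus one match on the last line. The weighted version only pays off when the file has many duplicate lines, because the sort itself is not free.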

