
Ask HN: How would you improve this (C) code? - juliangoldsmith
I've been working on some C code to do video encoding for a while, and I'm curious as to how I might improve it.

I'd be interested to hear what you all think of my Huffman coding implementation, and what I could do to make it better: https://github.com/julian-goldsmith/uChat/blob/master/huffman.c

I'd also welcome any comments on the rest of the code, particularly lzw.c. (The rest is available at https://github.com/julian-goldsmith/uChat .)
======
bjourne
There seem to be edge case bugs when the data is empty or consists of only one
character repeated ("aaaaaaa..."). I haven't run the code though. A good way
to avoid making such bugs is to write more unit tests, because then your mind
goes into a state where it actively looks for corner cases. The end result is
more robust code all around.

Another small thing you might want to consider is being more consistent with
the types. E.g., you are using int and unsigned int in places where size_t
probably would be better. Using unsigned short for the frequencies array will
cause bugs on specially crafted input data, since a counter that is typically
16 bits wide wraps around once a symbol occurs more than 65535 times.
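
To illustrate the point about counter width, here is a minimal sketch of
frequency counting with size_t counters. It is not taken from huffman.c; the
names count_frequencies, count_distinct and NUM_SYMBOLS are made up for this
example, and it assumes the symbols are plain bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_SYMBOLS 256 /* assumption: one symbol per byte value */

/* Count how often each byte value occurs in data[0..len).
 * size_t counters cannot overflow for any buffer that fits in memory,
 * unlike a 16-bit unsigned short, which wraps after 65535 occurrences. */
static void count_frequencies(const unsigned char *data, size_t len,
                              size_t freq[NUM_SYMBOLS])
{
    assert(freq != NULL);
    memset(freq, 0, NUM_SYMBOLS * sizeof freq[0]);
    for (size_t i = 0; i < len; i++)
        freq[data[i]]++;
}

/* Number of distinct symbols seen. A result of 0 (empty input) or
 * 1 (one repeated byte) marks the edge cases a tree builder has to
 * handle explicitly. */
static size_t count_distinct(const size_t freq[NUM_SYMBOLS])
{
    size_t distinct = 0;
    for (size_t s = 0; s < NUM_SYMBOLS; s++)
        if (freq[s] != 0)
            distinct++;
    return distinct;
}
```

Checking count_distinct before building the tree is one way to catch the empty
and single-symbol inputs mentioned above.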

I think you should also consider what your goal is. If it is speed, then you
would develop your code one way; if it is memory efficiency, another; and if
it is readability of the source code, a third.

------
mamaniscalco
Encoding can be greatly sped up by constructing an array of the codes for
each symbol (left justified) plus the code length. During encode you look up
the code for the current symbol, shift it to align with the current output bit
position, bitwise or it into the output stream and adjust your output position
according to the code length. Similarly an array of the codes can be used to
decode quickly as well using a binary search. Avoid using tree traversal when
encoding and decoding to greatly increase performance.
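
A rough sketch of the table-driven encode step described above, under some
assumptions of mine: codes are at most 16 bits long and stored left-justified
in a 32-bit word, the output buffer is zero-filled beforehand, and the type
and function names (huff_code, bit_writer, put_code, encode_bytes) are
invented for the example rather than taken from any existing code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t bits;   /* code, left-justified: first code bit is bit 31,
                        all bits below the code must be zero */
    uint8_t  length; /* code length in bits, assumed 1..16 */
} huff_code;

typedef struct {
    uint8_t *out;    /* output buffer, must be zero-filled by the caller */
    size_t   cap;    /* capacity of out in bytes */
    size_t   bitpos; /* number of bits written so far */
} bit_writer;

/* Append one code to the stream. Returns 0 on success, -1 if out is full. */
static int put_code(bit_writer *w, huff_code c)
{
    assert(c.length >= 1 && c.length <= 16);
    size_t end_bits = w->bitpos + c.length;
    if ((end_bits + 7) / 8 > w->cap)
        return -1;

    size_t first = w->bitpos >> 3;           /* byte holding the next free bit */
    size_t last = (end_bits - 1) >> 3;       /* byte holding the final code bit */
    unsigned shift = (unsigned)(w->bitpos & 7);
    uint32_t aligned = c.bits >> shift;      /* line the code up with the free bits */

    for (size_t i = first; i <= last; i++) {
        w->out[i] |= (uint8_t)(aligned >> 24); /* OR the top byte into the stream */
        aligned <<= 8;
    }
    w->bitpos = end_bits;
    return 0;
}

/* Encode len bytes with one table lookup per symbol and no tree traversal. */
static int encode_bytes(bit_writer *w, const huff_code table[256],
                        const unsigned char *data, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (put_code(w, table[data[i]]) != 0)
            return -1;
    return 0;
}
```

With the codes a = 0, b = 10, c = 11, the input "abccc" becomes the bits
0 10 11 11 11, i.e. the bytes 0x5F 0x80 with 9 bits used. The table itself
would be filled in once from the finished Huffman tree before encoding starts.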

Also, Huffman is fast becoming obsolete with the introduction of ANS.

~~~
juliangoldsmith
Thank you, this is exactly the sort of feedback I was looking for.

And that's a nice find on the ANS. I hadn't heard of that before, but will
have to look into it.

~~~
mamaniscalco
I'll push a high-performance Huffman to GitHub for reference later on today
and post a link here.

For ANS info start with this link that leads to a thread by its creator:

[https://encode.ru/threads/2078-List-of-Asymmetric-Numeral-
Sy...](https://encode.ru/threads/2078-List-of-Asymmetric-Numeral-Systems-
implementations)

------
blackflame7000
Run it through PVS-Studio or some other static analysis tool

