Hacker News new | comments | show | ask | jobs | submit login
Let's build a JPEG decoder: Huffman tables (imrannazar.com)
69 points by jamesmiller5 1699 days ago | hide | past | web | 13 comments | favorite



Slightly off topic, but heck.

I was lucky enough to have David Huffman as an instructor at UC Santa Cruz. Very engaging and smart guy. He got a little tired about being asked about "Huffman Coding" all the time, given it was so long in the past and he had done a number of other things.

One of the things he enjoyed talking about during office hours (if help wasn't needed) was his paper folding:

http://www.graficaobscura.com/huffman/

Gives a good example.


Huffman encoding should be a mandatory topic you can explain before you are allowed to write a web api.

I hate when people output json that's not self documenting enough because they don't understand that aReallyNiceLongNameThatDescribesTheAttribute and nam1 compress to functionally the same size once you turn on compression.

One API recently I saw had a few single letter names. I could still figure it out, but it was figuring it out instead of just reading the name.


technically your example is incorrect unless the same attribute name is used quite a few times. Regardless, premature optimisation is the root of all evil, especially in places that are highly unlikely to be a bottleneck and even more so when it's used to remove valuable documentation.


The particular example I'm referring to, it is used over 100 times.


Huffman coding alone wouldn't do a lot in that example. Huffman coding lets you build a prefix code to pack symbols into variable length sequences of bits, depending on the frequency of the symbol. A dictionary encoder, like LZ77, would replace repeated sequences by references, and that would make the two variable names equivalent. Deflate uses Huffman coding to pack the LZ77 compressed data using as few bits as possible, making it more efficient, but LZ77 is the part removing the redundancy.

I agree with the general sentiment though. Compression isn't magic, and understanding how it works helps you to work with compressed formats.

Writing your own implementation of some compression algorithms is a lot of fun to. I learned a lot from implementing Huffman coding and an LZ77 variant a few years ago.


>Huffman coding alone wouldn't do a lot in that example. Huffman coding lets you build a prefix code to pack symbols into variable length sequences of bits, depending on the frequency of the symbol. A dictionary encoder, like LZ77, would replace repeated sequences by references, and that would make the two variable names equivalent. Deflate uses Huffman coding to pack the LZ77 compressed data using as few bits as possible, making it more efficient, but LZ77 is the part removing the redundancy.

That's how gzip works, but all you have to know that gzip WILL be packed via huffman coding in a compression algorithm, and you will be able to conclude you can have a far longer variable name length with no cost.

The actual algorithm itself is a far less common topic than huffman coding.


While I do agree that web APIs should keep compression in mind, as the user of the API I don't necessarily want to type aReallyNiceLongNameThatDescribesTheAttribute either. I'd rather see a word or two than a letter, but I'd rather see a letter than a sentence.


I was more being hyperbolic. I don't want to see sentences either, but I'm okay with fullDesignSpec. No reason to call that "s"


This seems fitting.

There are only two hard things in Computer Science: cache invalidation and naming things.

-- Phil Karlton


"...and off by one errors"

:)


I had a very heated argument with a friend on saterday about 0 indexed arrays. God damn I hate them.


Just stop using the word index and start using the word offset. Problem solved.

The notion that the first brick in a run has an offset of 0 as does the first upright stud in a frame is something that's been basic to bricklayers and carpenters for a very long time.

Sometimes it's natural to refer to elements by their position, or offset, other times it's natural to talk about sequence numbers or indexes.


They're not indexed. That's an offset.

Naming things. Again.

1 indexed arrays are truly indexed :D




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact

Search: