Hacker News new | past | comments | ask | show | jobs | submit login
Succincter (2008) [pdf] (mit.edu)
52 points by espeed 3 months ago | hide | past | web | favorite | 5 comments

Abstract: We can represent an array of n values from {0, 1, 2} using n log_2 3 bits (arithmetic coding), but then we cannot retrieve a single element efficiently. Instead, we can encode every block of t elements using t log_2 3 bits, and bound the retrieval time by t. This gives a linear trade-off between the redundancy of the representation and the query time.

In fact, this type of linear trade-off is ubiquitous in known succinct data structures, and in data compression. The folk wisdom is that if we want to waste one bit per block, the encoding is so constrained that it cannot help the query in any way. Thus, the only thing a query can do is to read the entire block and unpack it.

We break this limitation and show how to use recursion to improve redundancy. It turns out that if a block is encoded with two (!) bits of redundancy, we can decode a single element, and answer many other interesting queries, in time logarithmic in the block size.

Our technique allows us to revisit classic problems in succinct data structures, and give surprising new upper bounds. We also construct a locally-decodable version of arithmetic coding.

Anyone know of published experiments about porting traditional apps over to succinct data structures?

Thanks, @espeed! I did a quick browse through the paper. I'm wondering why the focus seems to be on "trits" instead of bits.

For more on trits, see Mihai's 2010 paper...

An Alternative to Arithmetic Coding with Local Decodability (2010) [pdf] http://people.csail.mit.edu/mip/papers/trits/paper.pdf

Changing base without losing space (2010) [pdf] https://dl.acm.org/citation.cfm?id=1806771 (same paper as above, different title)

"The ternary numeral system (also called base 3) has three as its base...a ternary digit (trit) is analogous to a bit. One trit is equivalent to log2 3 (about 1.58496) bits of information" [1].

NB: Base 3 is the integer base with the lowest average radix economy; however, base e has the lowest average radix economy overall [2].

[1] Ternary numeral system https://en.wikipedia.org/wiki/Ternary_numeral_system

[2] Radix economy https://en.wikipedia.org/wiki/Radix_economy

[3] Three-valued logic https://en.wikipedia.org/wiki/Three-valued_logic

Bits already have an efficient encoding in binary. ;)

Representing trits (0, 1, 2) efficiently in binary is harder. (Which is the subject of the paper. It's a great paper, and we lost a marvelous mind when Mihai died.)

Applications are open for YC Summer 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact