Is the probability density actually uniform? Once all million numbers have been processed (which is the only case that matters - anything with <1M numbers has a strictly smaller encoding than it would if you tacked zero deltas onto the end to reach 1M), you're guaranteed that the average delta value will be <= 100.

This means that even in a worst-case input set, "low" delta values will be substantially more common than "high" ones.
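
To make that concrete, here's a quick sanity check (a sketch in Python; the 10^6 values in [0, 10^8) setup is my assumption, inferred from the 27-bit values and the <= 100 average mentioned in this thread):

    # Sketch: the deltas of 10^6 sorted values drawn from [0, 10^8) telescope
    # to the largest value, so their mean can never exceed (10^8 - 1) / 10^6 < 100.
    # The range and count are assumptions inferred from the thread.
    import random

    def mean_delta(n=1_000_000, limit=10**8):
        values = sorted(random.randrange(limit) for _ in range(n))
        deltas = [values[0]] + [b - a for a, b in zip(values, values[1:])]
        return sum(deltas) / n  # equals values[-1] / n, so always < 100

    print(mean_delta())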




I think in the worst case it can be uniform: you can have an average delta of 100 where the deltas range uniformly from 0 to 200. That case on its own is fine, since a value in 0-200 can still be encoded in under eight bits on average, but I'm worried that some worst-case number spacing will waste bits in whatever encoding scheme you choose, and that the wastage will be enough to ruin the result. You have to pick an encoding scheme in advance, but you don't know the distribution.

Say you encode with 8-bit words: when the high bit is not set, the word holds a delta from 0 to 127; when the high bit is set, the full 27-bit value follows. This fails for certain distributions. For example, if a quarter of the numbers are spaced 257 apart and the rest are spaced 47 apart, you'll go over budget. Perhaps an adaptive scheme can be proven to work in all cases, but I don't know what that scheme is. Or maybe there's a simple scheme that can be proven to work for all distributions.
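
Rough arithmetic for that example (a sketch; the 2^23-bit budget is my assumption from the usual 1 MB framing of this problem, and I'm reading the escape case as an 8-bit tag followed by 27 literal bits):

    # Sketch: total bits used by the flag-bit scheme above on the 257/47 example.
    # BUDGET_BITS (1 MB) is an assumed figure, not something stated in this comment.
    N = 1_000_000
    BUDGET_BITS = 8 * 2**20

    def bits_for_delta(delta):
        # high bit clear: one 8-bit word covers deltas 0..127
        # high bit set: the 8-bit word is followed by the full 27-bit value
        return 8 if delta < 128 else 8 + 27

    quarter, rest = N // 4, N - N // 4
    avg_delta = (quarter * 257 + rest * 47) / N       # 99.5, so a legal input
    total_bits = quarter * bits_for_delta(257) + rest * bits_for_delta(47)
    print(avg_delta, total_bits, BUDGET_BITS)          # 14,750,000 bits vs 8,388,608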


Arithmetic coding entirely avoids the "wastage" you're talking about - there are no wasted bits in the middle of the stream like there would be with a fixed-size encoding.
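
As a rough illustration (a sketch, not an actual arithmetic coder): once the coder's model has learned the two delta frequencies from the 257/47 example above, the ideal rate is just the entropy of that distribution, which is nowhere near the budget:

    # Sketch: ideal entropy of the 257/47 delta stream. A well-modelled
    # arithmetic coder approaches this rate; a real adaptive coder pays a
    # little extra while its model is still learning the frequencies.
    from math import log2

    N = 1_000_000
    probs = {257: 0.25, 47: 0.75}

    bits_per_delta = -sum(p * log2(p) for p in probs.values())  # ~0.811 bits
    print(bits_per_delta * N)  # ~811,278 bits total, far under the ~8.4M-bit budget

Genuinely spread-out deltas still cost close to their entropy, of course - the point is that arithmetic coding doesn't add byte-alignment overhead on top of that.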



