The article is painfully verbose, and it doesn't even get into what the objective function is! (It just gives a high-level description of what dynamic programming is.) And the paper is worse, droning on about the idea for 82 pages!! The idea is interesting, but it's on the level of a programming puzzle.
The idea:
Take a histogram whose bins are indexed by k. The "fitness" of bin k is
N[k] * (log N[k] − log T[k])
where N[k] is the number of elements in bin k and T[k] is the width of bin k.
The fitness of a histogram is the sum of the fitnesses of its bins; the larger, the better. The optimal segmentation into bins can be found by a simple recursion over the sorted elements:
(Best Histogram for points 1..n)
= max_j { (Best Histogram for points 1..j) + (Fitness of a single final bin holding points j+1..n) }
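Here's a minimal Python sketch of that recursion, if anyone wants to play with it. The edge placement (midpoints between neighboring sorted points) and the per-bin penalty knob are my own assumptions, not necessarily the paper's scheme. The penalty matters: by the log-sum inequality, splitting a bin never lowers the raw fitness, so with penalty=0 the optimum degenerates to one bin per point; if I remember right, the paper handles this with a prior on the number of bins.

    import math

    def bin_fitness(count, width):
        # Fitness of one bin: N[k] * (log N[k] - log T[k]).
        if count <= 0 or width <= 0:
            return float("-inf")
        return count * (math.log(count) - math.log(width))

    def best_histogram(points, penalty=0.0):
        # O(n^2) DP: F[j] = max_i { F[i] + fitness(bin over points i..j-1) - penalty }.
        # Candidate edges sit at midpoints between neighboring sorted points
        # (my choice for this sketch; assumes distinct points, n >= 2).
        xs = sorted(points)
        n = len(xs)
        edges = [xs[0]] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1]]
        F = [0.0] + [float("-inf")] * n  # F[j]: best fitness covering the first j points
        back = [0] * (n + 1)             # back[j]: where the final bin starts
        for j in range(1, n + 1):
            for i in range(j):
                cand = F[i] + bin_fitness(j - i, edges[j] - edges[i]) - penalty
                if cand > F[j]:
                    F[j], back[j] = cand, i
        cuts = [n]                       # recover the chosen edges from the backpointers
        while cuts[-1] > 0:
            cuts.append(back[cuts[-1]])
        return F[n], [edges[c] for c in reversed(cuts)]

e.g. best_histogram(data, penalty=1.0) returns the total fitness and the chosen bin edges.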
Finally, the formula has a simple interpretation. If we think of the histogram as a probability density function, we can be said to be trying to minimize the KL divergence between the histogram's distribution and the empirical distribution.
> Finally, the formula has a simple interpretation. If we think of the histogram as a probability density function, we can be said to be trying to minimize the KL divergence between the histogram's distribution and the empirical distribution.
Can you expand on that? I mean it does look like the KL divergence from N to T, but you're maximizing it so that doesn't really make sense. According to the paper it's the maximum log likelihood of the samples assuming a piecewise constant rate.
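For what it's worth, I think the two readings line up: take the piecewise-constant density whose height on bin k is the MLE h_k = N[k] / (N * T[k]), N being the total count. The sample log-likelihood is then sum_k N[k] * log h_k = fitness − N * log N, so maximizing the fitness maximizes that likelihood, which is the same as minimizing the cross-entropy (and so, up to a constant, the KL divergence) from the empirical distribution to the histogram. A quick numeric check, with made-up counts and widths:

    import math

    bins = [(5, 1.0), (12, 0.5), (3, 2.0)]  # made-up (N[k], T[k]) pairs
    N = sum(c for c, _ in bins)

    fitness = sum(c * (math.log(c) - math.log(w)) for c, w in bins)

    # Log-likelihood under the MLE piecewise-constant density
    # h_k = N[k] / (N * T[k]): each of the N[k] points contributes log h_k.
    loglik = sum(c * math.log(c / (N * w)) for c, w in bins)

    assert abs(fitness - (loglik + N * math.log(N))) < 1e-9  # they differ by a constant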
Thanks, I was surprised when the article just ended without going into further detail on what the dynamic programming recursion looks like or how you define a good fit for a bin.
Is there a downside to using adaptive bin-sizes (other than complexity or making them possibly harder to interpret)?
If I understand the theory correctly, the adaptive bin sizes give a better fit than a fixed step size (since you're choosing the optimal partition, and the adaptive partitions include all possible fixed-step partitions).
You can of course use the same criterion to choose an optimal step size; this is fairly easy to do with brute force, as in the sketch below.
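Something like this, say (a sketch; it scores with the same per-bin fitness, and like the adaptive version it really wants a per-bin penalty, since the raw criterion keeps rewarding finer bins):

    import math

    def best_fixed_bins(points, max_bins=50):
        # Try every fixed bin count 1..max_bins, score each with the same
        # per-bin fitness N * (log N - log T), and keep the best. Empty bins
        # add nothing to the likelihood, so they're skipped in the sum.
        xs = sorted(points)
        lo, hi = xs[0], xs[-1]  # assumes hi > lo
        best = (float("-inf"), 1)
        for m in range(1, max_bins + 1):
            width = (hi - lo) / m
            counts = [0] * m
            for x in xs:
                counts[min(int((x - lo) / width), m - 1)] += 1  # clamp x == hi into the last bin
            total = sum(c * (math.log(c) - math.log(width)) for c in counts if c > 0)
            best = max(best, (total, m))
        return best  # (best fitness, its bin count)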