The technique is efficient and straightforward, and it has provable error bounds. It was published in 2001, so it would also be worth looking at subsequent papers that have cited it for further improvements.
I think that in your case your data (server response time?) bins well on a log scale because you probably have a log-logistic or log-normal distribution.
Suppose instead you were working with values that are exponentially distributed, which is also a reasonable hypothesis for your data. In that case the log-binned histogram would look like a plateau, with the exception of the first and last bins. In that scenario a linear-binning approach would probably be better.
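For what it's worth, here's roughly what I mean by log vs. linear binning, sketched in Python with numpy; the log-normal sample is just made-up data standing in for your real response times:

```python
import numpy as np

# Made-up log-normal "latency" sample; substitute your real measurements.
rng = np.random.default_rng(0)
data = rng.lognormal(mean=5.0, sigma=1.0, size=100_000)

# Log binning: bin edges spaced evenly in log space.
log_edges = np.logspace(np.log10(data.min()), np.log10(data.max()), num=50)
log_counts, _ = np.histogram(data, bins=log_edges)

# Linear binning for comparison: edges spaced evenly in value space.
lin_counts, lin_edges = np.histogram(data, bins=50)
```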
Unfortunately, I don't think there is any bucketing approach that is good for all situations. The best approach usually depends on your data and on what you are trying to analyze.
Do you really need to know the shape of the distribution, or is it sufficient to know, say, a few percentiles (e.g. 50%, 75%, 99%, 99.9%)? If you're looking at metrics like response time, the 99th percentile is usually all you care about. If that's the case, there are efficient algorithms that estimate these values from streaming data in O(1) space, with high-probability bounds on the error.
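As one toy illustration of that family (not what I'd reach for in production; Greenwald-Khanna, t-digest or KLL sketches give much better guarantees), here's a "frugal" streaming quantile estimator that genuinely uses O(1) memory. The class name and step parameter are just made up for this sketch:

```python
import random

class FrugalQuantile:
    """Rough Frugal-1U-style streaming quantile estimate.

    Keeps a single running estimate and nudges it up or down as values stream in.
    Converges to roughly the q-th quantile for stationary data; accuracy is coarse,
    but memory is O(1).
    """
    def __init__(self, q=0.99, step=1.0):
        self.q = q            # target quantile, e.g. 0.99 for p99
        self.step = step      # size of each nudge; tune to your data's scale
        self.estimate = 0.0
        self.initialized = False

    def update(self, x):
        if not self.initialized:
            self.estimate = x
            self.initialized = True
            return
        if x > self.estimate and random.random() < self.q:
            self.estimate += self.step
        elif x < self.estimate and random.random() < (1 - self.q):
            self.estimate -= self.step

# Example: feed simulated response times and read off a rough p99.
est = FrugalQuantile(q=0.99, step=0.5)
for t in (random.expovariate(1 / 200.0) for _ in range(100_000)):
    est.update(t)
print("approx p99:", est.estimate)
```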
If you just need counts indexed by some key, then a count-min sketch might be sufficient (e.g. http://www.auai.org/uai2012/papers/231.pdf).
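In case it helps, a count-min sketch is only a few lines. This is a minimal sketch of the idea, with arbitrary placeholder sizes and a hashing scheme chosen for brevity rather than speed:

```python
import hashlib

class CountMinSketch:
    """Approximate per-key counts in fixed memory.

    Estimates only overcount, never undercount; error shrinks as width grows,
    and the confidence improves as depth grows.
    """
    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # One independent-ish hash per row, derived by salting with the row number.
        h = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(h, "big") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key):
        return min(self.table[row][self._index(key, row)] for row in range(self.depth))

# Example usage:
cms = CountMinSketch()
for k in ["GET /", "GET /", "POST /login", "GET /"]:
    cms.add(k)
print(cms.estimate("GET /"))   # ~3 (may overestimate slightly)
```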
Finally, if you really do need to know the shape of the distribution, should you be considering kernel density estimation (KDE) instead of histograms (see http://en.wikipedia.org/wiki/Kernel_density_estimation)? Most people would argue you should. It looks like there are streaming KDE algorithms, though I don't know how practical they are.
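If you can hold the data in memory (i.e. the non-streaming case), scipy already gives you a Gaussian KDE in a couple of lines; a quick sketch with stand-in data:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Stand-in sample; replace with your actual measurements.
rng = np.random.default_rng(1)
sample = rng.lognormal(mean=5.0, sigma=1.0, size=10_000)

kde = gaussian_kde(sample)          # bandwidth picked by Scott's rule by default
xs = np.linspace(sample.min(), sample.max(), 500)
density = kde(xs)                   # smooth density estimate; plot this instead of a histogram
```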
In general, binning/bucketing can be seen as filtering the empirical density function of your dataset with a box filter and then sampling it. The frequency response of a box filter is the sinc function, which has a lot of energy above the Nyquist frequency of that sampling rate. Kernel density plots with a Gaussian kernel, on the other hand, can be seen as filtering the empirical density function with a Gaussian filter, and are thus approximately bandlimited.
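Putting the standard Fourier pairs behind that (with $f$ the frequency, $h$ the bin width, and $\sigma$ the kernel bandwidth):

$$H_{\mathrm{box}}(f) = \operatorname{sinc}(fh) = \frac{\sin(\pi f h)}{\pi f h}, \qquad H_{\mathrm{gauss}}(f) = e^{-2\pi^2 \sigma^2 f^2}$$

The sinc sidelobes fall off only like $1/f$, so sampling at the bin rate $1/h$ aliases a fair amount of that energy back in, while the Gaussian is already negligible well before the Nyquist frequency $1/(2h)$ for any reasonable bandwidth.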
To be clear, this isn't an area where I'd consider myself all that knowledgeable, so coming up with a useful idea felt like an achievement.
So as I read through, comparing each proposed solution to mine, I decided I quite liked mine. Then I saw that the last proposal was my first idea. Great!
So over I wander to the HN comments and see that this is a common approach. I didn't just come up with a nice, neat solution; I probably learned it, forgot it, and regurgitated it from my subconscious.
That sure doesn't feel as nice as the sparkly feeling I had just 3 minutes ago...
OTOH it's extra difficult because when we did a first implementation we didn't have a full understanding of the shape of the problem.
That said, the challenge with online (streaming) data is that you have to constantly estimate the distribution you are sampling from. Someone here might know better =)
Might not meet your memory or computation constraints, but grouping is such a wide topic.
Sure, it's an estimate, so you'd have some error term, but you could pass that on to the new buckets. If the total error is small enough in practice, it wouldn't be a problem.
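For a concrete flavor of what "pass the error on to the new buckets" can look like, here's a rough sketch of a merge-based streaming histogram, in the spirit of Ben-Haim & Tom-Tov's streaming decision-tree histogram; the bin limit and merge rule here are purely illustrative:

```python
import bisect

class StreamingHistogram:
    """Keeps at most max_bins (center, count) pairs. When a new value pushes it
    over the limit, the two closest bins are merged into their weighted average;
    that merge is exactly where the approximation error gets folded into the
    surviving bucket."""

    def __init__(self, max_bins=64):
        self.max_bins = max_bins
        self.bins = []  # sorted list of [center, count]

    def add(self, x):
        centers = [c for c, _ in self.bins]
        i = bisect.bisect_left(centers, x)
        if i < len(self.bins) and self.bins[i][0] == x:
            self.bins[i][1] += 1
        else:
            self.bins.insert(i, [x, 1])
            if len(self.bins) > self.max_bins:
                self._merge_closest()

    def _merge_closest(self):
        # Merge the adjacent pair of bins whose centers are closest together.
        gaps = [self.bins[j + 1][0] - self.bins[j][0] for j in range(len(self.bins) - 1)]
        j = gaps.index(min(gaps))
        (c1, n1), (c2, n2) = self.bins[j], self.bins[j + 1]
        self.bins[j:j + 2] = [[(c1 * n1 + c2 * n2) / (n1 + n2), n1 + n2]]
```

The merge step is the whole trick: counts are conserved exactly, and only the positions blur, so the error stays bounded by how far apart the merged centers were.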