
Logswan – Fast Web log analyzer using probabilistic data structures - mulander
https://github.com/fcambus/logswan
======
ar7hur
If you're interested in the probabilistic approach, this is how it works:
[https://en.wikipedia.org/wiki/HyperLogLog](https://en.wikipedia.org/wiki/HyperLogLog)

"The basis of the HyperLogLog algorithm is the observation that the
cardinality of a multiset of uniformly-distributed random numbers can be
estimated by calculating the maximum number of leading zeros in the binary
representation of each number in the set. If the maximum number of leading
zeros observed is n, an estimate for the number of distinct elements in the
set is 2^n."
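
To make the quoted observation concrete, here is a minimal Python sketch of the single-estimator idea only. It is not the full HyperLogLog algorithm, which splits the input across many registers and combines them with a harmonic mean to reduce variance, and it is not Logswan's implementation. The helper names (`hash32`, `leading_zeros`, `crude_estimate`) and the choice of the top 32 bits of SHA-1 as the hash are assumptions made for illustration.

```python
import hashlib

HASH_BITS = 32  # assumption: use the top 32 bits of a SHA-1 digest


def hash32(item: str) -> int:
    """Map an item to a roughly uniform 32-bit integer (hypothetical helper)."""
    digest = hashlib.sha1(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def leading_zeros(value: int, bits: int = HASH_BITS) -> int:
    """Count leading zero bits of `value` in a fixed-width `bits` representation."""
    return bits - value.bit_length()


def crude_estimate(items) -> int:
    """Estimate the distinct count as 2^n, n being the max leading-zero run seen."""
    max_zeros = 0
    for item in items:
        max_zeros = max(max_zeros, leading_zeros(hash32(item)))
    return 2 ** max_zeros
```

Duplicates hash to the same value, so they never change the maximum, which is why the estimate tracks distinct elements rather than total elements. A single estimator like this is very noisy and can only return powers of two.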

------
cwilkes
If anyone involved in the project is reading this: the DNS entry for
"www.logswan.org", which is linked from the GitHub page, does not exist.

~~~
fcambus
Thanks for reporting this.

Indeed, there was no site configured on logswan.org when this was posted to
HN. I made the required changes, but because resolvers cache negative
responses, it'll still return NXDOMAIN for some users until those caches expire.

