
Verifying a Sorting Algorithm - lorenzhs
https://4z2.de/2016/02/07/verifying-sorting-algorithms
======
dalke
What about using a counting or layered Bloom filter? That also gives a
probabilistic confirmation that the sorted list contains the same elements as
the unsorted one.
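
A minimal sketch of the counting-filter idea (illustrative only; I'm using a
salted blake2b as a stand-in for any family of independent hash functions, and
the parameters are arbitrary):

    import hashlib

    def hashes(x, k, m):
        # Derive k hash values for x by salting a cryptographic hash.
        for salt in range(k):
            digest = hashlib.blake2b(repr((salt, x)).encode()).digest()
            yield int.from_bytes(digest[:8], "big") % m

    def counting_filter(items, k=4, m=1 << 16):
        counts = [0] * m
        for x in items:
            for slot in hashes(x, k, m):
                counts[slot] += 1
        return counts

    # The same multiset always yields identical counters, regardless of
    # order; different multisets collide only with small probability.
    data = [5, 3, 3, 1]
    assert counting_filter(data) == counting_filter(sorted(data))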

Another solution, if the goal is to verify the sort algorithm, is to augment
the sort data with position information (which isn't used in the sort), then
use that to confirm that each of the original elements is used.
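
Concretely, something like this (a sketch; the tag travels with each element
but is not part of the sort key, and the whole check runs in linear time):

    def sort_with_positions(data):
        # Tag each element with its original index; sort by value only.
        tagged = sorted(enumerate(data), key=lambda pair: pair[1])
        return [v for _, v in tagged], [i for i, _ in tagged]

    def verify(data, values, pos):
        # Linear-time check: pos must be a permutation of 0..n-1, and
        # each output value must match the input element it claims to
        # have come from.
        seen = [False] * len(pos)
        for p in pos:
            if not 0 <= p < len(pos) or seen[p]:
                return False
            seen[p] = True
        return all(data[p] == v for p, v in zip(pos, values))

    original = [9, 2, 7, 2]
    values, pos = sort_with_positions(original)
    assert verify(original, values, pos)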

~~~
lorenzhs
That would be another option, but you would have to evaluate several hash
functions per element, which is a lot more expensive than a few
multiplications _and_ would use a lot more space.

What do you want to gain by augmenting the data with position information?
Wouldn't you just have to check whether the positions array is a permutation
of (1, ..., n)?

~~~
dalke
> a lot more expensive than a few multiplications

It's "t" times the few multiplications, because each one gives a 75% success
factor. This is equivalent to doing multiple hash functions.
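
(Concretely: t independent trials that each miss a bad result with probability
at most 0.25 all miss together with probability at most 0.25^t, so t=10
already gives roughly 10^-6 and t=30 roughly 10^-18.)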

> and would use a lot more space

Well, yes and no. A Bloom filter is space-efficient, so it depends on what you
compare it to.

If you have both lists in memory then each verification test can be run in
constant memory.

But suppose you sort in-place? That cuts your overall memory use in half. You
can use some of the saved space to construct the Bloom filter from the input,
then sort, then build a second filter from the result and compare the two.

It's also possible to compute all "t" values first, then sort in-place, then
compare. This is another sort of signature. I don't know enough to tell if
this will require more or less space than an equivalently powerful Bloom
filter.
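
Roughly like this, if I understand the signature idea (a sketch using the
classic product fingerprint prod(z - a_i) mod p; I'm guessing at the exact
construction, but any constant-size multiset fingerprint works the same way):

    import random

    P = (1 << 61) - 1  # a Mersenne prime, comfortably above 32-bit keys

    def fingerprint(items, z):
        # prod of (z - a) mod P: equal multisets always agree, and
        # distinct ones agree only with probability about n/P per z.
        f = 1
        for a in items:
            f = f * (z - a) % P
        return f

    data = [4, 1, 4, 2]
    zs = [random.randrange(1, P) for _ in range(30)]  # t = 30 trials

    before = [fingerprint(data, z) for z in zs]  # 30 ints of state
    data.sort()                                  # sort in place
    assert [fingerprint(data, z) for z in zs] == before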

> "What do you want to gain by augmenting the data with position information?"

It's a linear-time check, rather than O(n log n). But at a cost of n log n
bits (or 32n bits if we assume 32-bit indices).

~~~
lorenzhs
A hypothetical optimal Bloom-filter-like data structure needs -log₂ ε bits
per key, as noted in [1]. For the entirely arbitrary value ε=0.25 that's 2
bits per key (so 2n bits), and you need that t times. Or if you do it directly
with a higher success probability, say 10^-6, then it's 20 bits per element.
That's more than half of the input size, assuming 32-bit keys. The approach I
wrote about uses O(log₂ p) = O(log₂ (n/ε)) bits, or put differently: a
constant number of integers. Bloom filters are awesome (and I know HN loves
them), but this isn't a good application for them.

Sorting in place would be a neat trick; you'd have to store the t integers,
but as I showed, t=30 is more than enough. That's 30 ints, again way less than
a Bloom filter.

Re: position information, I'd consider that linear space too (measured in
machine words). It's a good idea, but keep in mind that if the input was in a
random order, these are n accesses to completely random memory locations. Due
to the way virtual memory is implemented (a TLB miss is resolved by walking
the page table, a tree-like structure), they actually take closer to log n
time. There's an older paper showing that this takes roughly the same time as
sorting the input; I believe it was in [2] somewhere. Huge pages might
mitigate this nowadays.

[1] https://en.wikipedia.org/wiki/Bloom_filter#Alternatives - the citation
goes to Pagh, Anna; Pagh, Rasmus; Rao, S. Srinivasa (2005), "An optimal Bloom
filter replacement", SODA '05, pp. 823–829.

[2] http://www.cs.le.ac.uk/people/rraman/rahman-thesis.pdf - PhD thesis of
Naila Yasmeen Rahman. The PDF resists my attempts to do full-text search. It
might have been "Analysing cache effects in distribution sorting" from
WAE '99. I'm sorry that I can't be more specific right now.

~~~
dalke
It makes sense (to me now, in retrospect) that a Bloom filter would require
more memory than a characteristic fingerprint because the Bloom filter
supports element tests in addition to equivalency. Thanks for working through
the details!

Regarding "Due to the way virtual memory is implemented", the essay does not
use that memory model. It says a binary search take O(n log n) time, but it's
actually O(n log n) essentially random lookups. If we assume log n lookup time
then the binary search is better described as O(n (log n) (log n)), yes?

~~~
lorenzhs
In the RAM model, a single binary search takes O(log n) accesses. If nothing
is specified, this is the model that's commonly used to analyse algorithms,
and one of its assumptions is that access to any memory location takes
constant time. That this might not be true on actual machines is one of the
shortcomings of that model, so you might say that it takes O(log² n) _cycles_
on a certain machine. But the accesses also converge to the position where the
element is, so the last couple of accesses will be in pages the TLB knows and
that will even be in cache, so they'll take only a couple of cycles. Analysing
this gets very complicated very quickly.

In the end, the RAM model is just that: a model. No model will ever cover all
the peculiarities of the hardware that things are actually implemented on, as
that would be nearly impossible and in any case too complex to work with
during analysis. You just have to be aware of its limitations and do lots of
benchmarks if you want to maximize performance :)

