This is one of those topics where my mind starts racing. How many optimal, distinct networks are there per order (i.e. ignoring trivial exchanges in the sequence of comparators)? What if you have additional constraints, like limited temporary storage (i.e. not all inputs can be loaded at once)?
This isn't my field, but it's fun just trying to think about it.
Those are interesting questions! For example, generating and examining all networks of optimal size that have distinct circuits (assuming unlabeled inputs, so that rearranging those doesn't count as distinct) is something I still want to do at some point. Extending my search to produce this wouldn't be too complicated... but I haven't found the time for it so far.
Computing s(11) and s(12) together took just below 5 hours on a 24-core server, but it also required almost 200 GB of RAM. Verifying the computation took about 3 days in total on the same server.
Computing s(9) and s(10) with the same approach takes under a second and a few megabytes of RAM, so while this is much faster than the previous approach (the one I described in the blog post), resource usage still grows very quickly.
I don't know exactly how much s(13) would take, but extrapolating from the numbers I have for smaller n, my estimate is that it would require over 20,000 TB of RAM and take proportionally longer. Distributing the computation would also slow it down a lot, as all cores are doing frequent random accesses to all the used memory.
So s(12) seems to be the end of what's possible with this exact approach, but I think some of the theory I developed might be usable beyond.
Nice piece of work, if a bit esoteric (or maybe that's the best part!).
I was struck by this paragraph, towards the end (edited to remove copying
artifacts):
>> This is not quite yet the full approach used by Codish et al. Instead of
using set inclusion to compare output sets, they define a relation called
subsumption. Remember that above, where we introduced sorting networks, we noted
that unconditional exchanges do not count towards the network's size as we can
always rearrange the network to eliminate them. Therefore for two candidate
sorting network prefixes A and B for which there is a sequence of unconditional
exchanges X such that outputs(AX) ⊆ outputs(B), the same extension argument can
be made to show that outputs(AXS) ⊆ outputs(BS) for any suffix S. In this case A
subsumes B, and again there is no need to keep the candidate B around, as for
any sorting network BS, the network AXS will also be a sorting network of the
same size. Note that any sequence of unconditional exchanges corresponds to a
permutation and vice versa, so effectively this exploits symmetry under
permutation.
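For concreteness, the quoted check can be sketched by brute force (my own illustration, not code from the article): enumerate a network's outputs on all 0/1 inputs, then search for a channel permutation witnessing subsumption. This is hopelessly slow beyond tiny n, but it shows the definition directly.

```python
from itertools import permutations, product

def outputs(network, n):
    """All distinct output vectors of a comparator network on 0/1 inputs."""
    outs = set()
    for bits in product((0, 1), repeat=n):
        v = list(bits)
        for i, j in network:          # comparator between channels i and j
            if v[i] > v[j]:
                v[i], v[j] = v[j], v[i]
        outs.add(tuple(v))
    return outs

def subsumes(a, b, n):
    """True if some channel permutation maps outputs(a) into outputs(b);
    prefix b can then be dropped from the search."""
    outs_a, outs_b = outputs(a, n), outputs(b, n)
    return any(
        all(tuple(o[p[k]] for k in range(n)) in outs_b for o in outs_a)
        for p in permutations(range(n))   # brute force; fine only for tiny n
    )

# A full 3-input sorter subsumes the single-comparator prefix [(0, 1)],
# but not the other way around.
```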
This is almost precisely the definition of subsumption between first-order clauses, given in Gordon Plotkin's 1971 thesis [1], a foundational piece of work in Inductive Logic Programming (ILP). A more modern definition is as follows:
Let C, D be two clauses (considered as sets of literals interpreted as a
disjunction). C ≼ D (read "C subsumes D") iff there exists a substitution θ of
the variables in C such that Cθ ⊆ D.
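As a toy illustration of that definition (my own sketch, not from Plotkin, and restricted to flat terms so that trying only the terms occurring in D suffices), a clause can be represented as a set of (predicate, arguments) literals and the substitution search done by brute force:

```python
from itertools import product

def is_var(t):
    # convention: variables are capitalised, constants lowercase
    return isinstance(t, str) and t[:1].isupper()

def apply_sub(clause, theta):
    """Apply substitution theta to every literal of the clause."""
    return {(p, tuple(theta.get(a, a) for a in args)) for p, args in clause}

def subsumes(c, d):
    """C subsumes D iff C.theta is a subset of D for some substitution theta.
    Brute force: try every mapping of C's variables to terms occurring in D."""
    vars_c = sorted({a for _, args in c for a in args if is_var(a)})
    terms_d = {a for _, args in d for a in args}
    return any(
        apply_sub(c, dict(zip(vars_c, vals))) <= d
        for vals in product(terms_d, repeat=len(vars_c))
    )

# {p(X), q(X)} subsumes {p(a), q(a), r(b)} via theta = {X -> a}
```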
This is a central result in ILP and logic programming, not least because it turns out that if C ≼ D, then C |= D, read "C entails D": D is true in any interpretation where C is true. This is a generalisation relation that can be
exploited to prune a search space for logic programs that must entail a set of
positive examples and not entail a set of negative examples. These mathematics
of generalisation are the foundation of one of the main branches of ILP and set
ILP apart from machine learning approaches that are based on optimisation.
To make an analogy: reading the above paragraph from the article was, for me, as surprising as I imagine it would be for a physicist reading an article on sorting algorithms and finding that there is a computer science concept called "general relativity", and that it is defined as E = mc² to boot.
So I was curious to see if the sorting network concept and the logic programming
concept are connected in some way, other than the obvious derivation of
"subsumes" from "subset".
I looked briefly for the original definition of "subsumption" in the context of
sorting networks, but could only find a definition in the 2014 Codish et al.
paper cited in the article ("Sorting Networks: to the End and Back Again"):
The second notion involves permutations. Given two comparator networks C and C′
on n channels, we say that C subsumes C′, denoted C ≼ C′, if π(outputs(C)) ⊆
outputs(C′) for some permutation π of {1,...,n}. In this situation, if C′ can be
extended to a sorting network, then so can C (within the same size or depth),
which allows C′ to be removed from the search space.
Not only does this definition use the same notation as Plotkin's subsumption, where A ≼ B is read as _A_ subsumes B (and not, as might come more naturally, the other way around), but the relation is also used to prune a search space. Borrowing ILP parlance, we would speak of a refinement operator that replaces C with its specialisations.
It is interesting to note that while much work in ILP has used the subsumption relation to prune a search space (or order it, or bound it, etc.), Plotkin himself proposed instead to construct the least general generalisation of two clauses, a unique object that can be constructed without an expensive search. I now wonder if this approach can be applied to sorting networks. But I only wonder a little. After all, this is not my field :)
I'm still intrigued by the similarities of definitions and notations between the two fields. Does anyone know where they come from?
If you track this down, note it's an ancient text, type-written with symbols added in by hand, and difficult to read. I'm still not done with it. There are two shorter papers that get right to the point:
Codish et al. have a background in SAT solving, which is somewhat related to logic programming. In SAT solving subsumption of clauses usually doesn't involve any substitutions. That's the notion of subsumption I was familiar with before reading Codish et al. Given that, extending subsumption to handle the symmetries inherent in the problem felt like a natural extension as it maintains all the relevant properties, but I also wouldn't be surprised if this was inspired by ILP.
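In that propositional setting, subsumption degenerates to a plain subset test on literals, which is how SAT solvers use it to delete redundant clauses. A minimal sketch (my own, using DIMACS-style signed-integer literals):

```python
def subsumes(c, d):
    """Propositional subsumption: clause c subsumes clause d iff c's
    literals are a subset of d's (literal 3 means x3, -3 means not x3)."""
    return set(c) <= set(d)

def remove_subsumed(cnf):
    """Drop every clause that is strictly subsumed by another clause."""
    cs = [frozenset(c) for c in cnf]
    return [c for c in cs if not any(o < c for o in cs)]

# (x1 or not x2) subsumes (x1 or not x2 or x3): any assignment satisfying
# the shorter clause satisfies the longer one, so the longer is redundant.
```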
>> In SAT solving subsumption of clauses usually doesn't involve any substitutions.
I guess not, because SAT is propositional, so substituting terms for variables would just yield a ground propositional clause. Actually, I wasn't even sure we say "clause" in propositional logic. Anyway, Plotkin's subsumption is really based on Robinson's unification algorithm, and it makes a lot more sense in that context and in the context of resolution theorem proving. Or I'm just hopelessly stuck forever thinking in the first order.
Do you know if Codish et al. based their definition of subsumption on an earlier paper? It really sounds like they are simply describing it, rather than introducing it as a new concept.
Edit: of course, the obvious question in my mind now is whether it would be possible to learn sorting networks using one of the ILP approaches that build on Plotkin's work and don't have to perform an expensive search of a large hypothesis space.
The paper where they introduce subsumption is "Michael Codish, Luís Cruz-Filipe, Michael Frank, and Peter Schneider-Kamp. 2016. Sorting nine inputs requires twenty-five comparisons. Journal of Computer and System Sciences 82, 3 (May 2016), 551–563." and I just realized that I lost the actual citation while editing my blog post, fixed that now.