This is one of those topics where my mind starts racing. How many optimal, distinct networks are there per order (i.e. ignoring trivial exchanges in the sequence of comparators)? What if you have additional constraints, like limited temporary storage (i.e. not all inputs can be loaded at once)?
This isn't my field, but it's fun just trying to think about it.
Those are interesting questions! For example, generating and examining all networks of optimal size that have distinct circuits (assuming unlabeled inputs, so that rearranging those doesn't count as distinct) is something I still want to do at some point. Extending my search to produce this wouldn't be too complicated... but I haven't found the time for it so far.
Computing s(11) and s(12) together took just below 5 hours on a 24-core server, but it also required almost 200 GB of RAM. Verifying the computation took about 3 days in total on the same server.
Computing s(9) and s(10) with the same approach takes under a second and a few megabytes of RAM, so while this is much faster than the previous approach (the one I described in the blog post), resource usage still grows very quickly.
I don't know exactly how much s(13) would take, but extrapolating from the numbers I have for smaller n, my estimate is that it would require over 20,000 TB of RAM and take proportionally longer. Distributing the computation would also slow it down a lot, as all cores are doing frequent random accesses to all the used memory.
So s(12) seems to be the end of what's possible with this exact approach, but I think some of the theory I developed might be usable beyond.
Nice piece of work, if a bit esoteric (or maybe that's the best part!).
I was struck by this paragraph, towards the end (edited to remove copying
artifacts):
>> This is not quite yet the full approach used by Codish et al. Instead of
using set inclusion to compare output sets, they define a relation called
subsumption. Remember that above, where we introduced sorting networks, we noted
that unconditional exchanges do not count towards the network's size as we can
always rearrange the network to eliminate them. Therefore for two candidate
sorting network prefixes A and B for which there is a sequence of unconditional
exchanges X such that outputs(AX) ⊆ outputs(B), the same extension argument can
be made to show that outputs(AXS) ⊆ outputs(BS) for any suffix S. In this case A
subsumes B, and again there is no need to keep the candidate B around, as for
any sorting network BS, the network AXS will also be a sorting network of the
same size. Note that any sequence of unconditional exchanges corresponds to a
permutation and vice versa, so effectively this exploits symmetry under
permutation.
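For concreteness, the quoted check can be sketched by brute force (my own illustration, not code from the article): enumerate a network's outputs on all 0/1 inputs, then search for a channel permutation witnessing subsumption. This is hopelessly slow beyond tiny n, but it shows the definition directly.

```python
from itertools import permutations, product

def outputs(network, n):
    """All distinct output vectors of a comparator network on 0/1 inputs."""
    outs = set()
    for bits in product((0, 1), repeat=n):
        v = list(bits)
        for i, j in network:          # comparator between channels i and j
            if v[i] > v[j]:
                v[i], v[j] = v[j], v[i]
        outs.add(tuple(v))
    return outs

def subsumes(a, b, n):
    """True if some channel permutation maps outputs(a) into outputs(b);
    prefix b can then be dropped from the search."""
    outs_a, outs_b = outputs(a, n), outputs(b, n)
    return any(
        all(tuple(o[p[k]] for k in range(n)) in outs_b for o in outs_a)
        for p in permutations(range(n))   # brute force; fine only for tiny n
    )

# A full 3-input sorter subsumes the single-comparator prefix [(0, 1)],
# but not the other way around.
```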
This is almost precisely the definition of subsumption between first-order clauses, given in Gordon Plotkin's 1971 thesis [1], a foundational piece of work in Inductive Logic Programming (ILP). A more modern definition is as follows:
Let C, D be two clauses (considered as sets of literals interpreted as a
disjunction). C ≼ D (read "C subsumes D") iff there exists a substitution θ of
the variables in C such that Cθ ⊆ D.
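As a toy illustration of that definition (my own sketch, not from Plotkin, and restricted to flat terms so that trying only the terms occurring in D suffices), a clause can be represented as a set of (predicate, arguments) literals and the substitution search done by brute force:

```python
from itertools import product

def is_var(t):
    # convention: variables are capitalised, constants lowercase
    return isinstance(t, str) and t[:1].isupper()

def apply_sub(clause, theta):
    """Apply substitution theta to every literal of the clause."""
    return {(p, tuple(theta.get(a, a) for a in args)) for p, args in clause}

def subsumes(c, d):
    """C subsumes D iff C.theta is a subset of D for some substitution theta.
    Brute force: try every mapping of C's variables to terms occurring in D."""
    vars_c = sorted({a for _, args in c for a in args if is_var(a)})
    terms_d = {a for _, args in d for a in args}
    return any(
        apply_sub(c, dict(zip(vars_c, vals))) <= d
        for vals in product(terms_d, repeat=len(vars_c))
    )

# {p(X), q(X)} subsumes {p(a), q(a), r(b)} via theta = {X -> a}
```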
This is a central result in ILP and logic programming, not least because it turns out that if C ≼ D, then C |= D, read "C entails D": D is true in any interpretation where C is true. This is a generalisation relation that can be
exploited to prune a search space for logic programs that must entail a set of
positive examples and not entail a set of negative examples. These mathematics
of generalisation are the foundation of one of the main branches of ILP and set
ILP apart from machine learning approaches that are based on optimisation.
To make an analogy: reading the above paragraph from the article was, for me, as surprising as I imagine it would be for a physicist reading an article on sorting algorithms and finding that there is a computer science concept called "general relativity", and that it is defined as E = mc² to boot.
So I was curious to see if the sorting network concept and the logic programming
concept are connected in some way, other than the obvious derivation of
"subsumes" from "subset".
I looked briefly for the original definition of "subsumption" in the context of
sorting networks, but could only find a definition in the 2014 Codish et al.
paper cited in the article ("Sorting Networks: to the End and Back Again"):
The second notion involves permutations. Given two comparator networks C and C′
on n channels, we say that C subsumes C′, denoted C ≼ C′, if π(outputs(C)) ⊆
outputs(C′) for some permutation π of {1,...,n}. In this situation, if C′ can be
extended to a sorting network, then so can C (within the same size or depth),
which allows C′ to be removed from the search space.
Not only does this definition use the same notation as Plotkin's subsumption, where A ≼ B is read as _A_ subsumes B (and not, as might come more naturally, the other way around), but the relation is also used to prune a search space. Borrowing ILP parlance, we would speak of a refinement operator that replaces C with its specialisations.
It is interesting to note that while much work in ILP has used the subsumption relation to prune a search space (or order it, or bound it, etc.), Plotkin himself proposed instead to construct the least general generalisation of two clauses, a unique object that can be constructed without an expensive search. I now wonder if this approach can be applied to sorting networks. But I only wonder a little. After all, this is not my field :)
I'm still intrigued by the similarities of definitions and notations between the two fields. Does anyone know where they come from?
If you track this down, note it's an ancient text, type-written with symbols added in by hand, and difficult to read. I'm still not done with it. There are two shorter papers that get right to the point:
Codish et al. have a background in SAT solving, which is somewhat related to logic programming. In SAT solving subsumption of clauses usually doesn't involve any substitutions. That's the notion of subsumption I was familiar with before reading Codish et al. Given that, extending subsumption to handle the symmetries inherent in the problem felt like a natural extension as it maintains all the relevant properties, but I also wouldn't be surprised if this was inspired by ILP.
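In that propositional setting, subsumption degenerates to a plain subset test on literals, which is how SAT solvers use it to delete redundant clauses. A minimal sketch (my own, using DIMACS-style signed-integer literals):

```python
def subsumes(c, d):
    """Propositional subsumption: clause c subsumes clause d iff c's
    literals are a subset of d's (literal 3 means x3, -3 means not x3)."""
    return set(c) <= set(d)

def remove_subsumed(cnf):
    """Drop every clause that is strictly subsumed by another clause."""
    cs = [frozenset(c) for c in cnf]
    return [c for c in cs if not any(o < c for o in cs)]

# (x1 or not x2) subsumes (x1 or not x2 or x3): any assignment satisfying
# the shorter clause satisfies the longer one, so the longer is redundant.
```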
>> In SAT solving subsumption of clauses usually doesn't involve any substitutions.
I guess not, because SAT is propositional, so substituting terms for variables would just yield a ground propositional clause. Actually, I wasn't even sure we say "clause" in propositional logic. Anyway, Plotkin's subsumption is really based on Robinson's unification algorithm, and it makes a lot more sense in that context and in the context of resolution theorem proving. Or I'm just hopelessly stuck forever thinking in the first order.
Do you know if Codish et al. based their definition of subsumption on an earlier paper? It really sounds like they are simply describing it, rather than introducing it as a new concept.
Edit: of course, the obvious question in my mind now is whether it would be possible to learn sorting networks using one of the ILP approaches that build on Plotkin's work and don't have to perform an expensive search of a large hypothesis space.
The paper where they introduce subsumption is "Michael Codish, Luís Cruz-Filipe, Michael Frank, and Peter Schneider-Kamp. 2016. Sorting nine inputs requires twenty-five comparisons. Journal of Computer and System Sciences 82, 3 (May 2016), 551–563." and I just realized that I lost the actual citation while editing my blog post, fixed that now.