
The Traveling Salesperson Problem - merusame
http://nbviewer.ipython.org/url/norvig.com/ipython/TSPv3.ipynb
======
danso
It's amazing to me that someone of Norvig's stature still finds time and joy
in tackling "common" CS questions, and not just writing his own code, but
going the extra mile by doing it in a presentation format (an IPython
notebook) that's designed to be reader- and replication-friendly... never mind
creating a fleshed-out narrative of the process.

A list of his notebooks here:
[http://norvig.com/ipython/](http://norvig.com/ipython/)

His explanation of the Cheryl's birthday brain teaser, which was posted here
last week:
[http://nbviewer.ipython.org/url/norvig.com/ipython/Cheryl.ip...](http://nbviewer.ipython.org/url/norvig.com/ipython/Cheryl.ipynb)

edit: One of my favorite bits: Norvig even takes time to describe his thought
process behind what most people would consider the most basic object-oriented
design: how to construct a simple Point class, and then explains how that will
affect his design and implementation of subsequent functions and routines:
[http://nbviewer.ipython.org/url/norvig.com/ipython/TSPv3.ipy...](http://nbviewer.ipython.org/url/norvig.com/ipython/TSPv3.ipynb#Representing-Points-and-Computing-distance)

------
j2kun
Here's a more or less complete list of references for the known approximation
and inapproximability bounds for TSP and its special subproblems.

[http://cstheory.stackexchange.com/questions/9241/approximati...](http://cstheory.stackexchange.com/questions/9241/approximation-algorithms-for-metric-tsp)

One thing to note in particular is that general TSP has _no nontrivial
approximation guarantee._ Norvig's article might not stress this strongly
enough: the guarantees he describes really depend on the triangle inequality.
At least, I think the gap between what's possible for these two versions is
deep enough to mention.

Another important note is that the TSP subproblem Norvig attacks, Euclidean
TSP, has much better theoretical guarantees than a 2-approximation. In fact,
the problem has what's called a polynomial time approximation scheme (for a
fixed dimension) allowing one to efficiently compute an approximation that is
arbitrarily close to optimal. The runtime of the algorithm is something like n
(log(n))^c where c depends on both the accuracy and the dimension.

------
Xcelerate
It's amazing to see all of this described in such detail. Very nice
presentation!

Also, I'm going to mildly hijack this thread with a question since I know
people familiar with this kind of problem will be looking at the comments ;)

The TSP is essentially trying to find the correct permutation of an ordered
list of cities (up to a cycle) that minimizes the total distance traveled.

I have a somewhat similar problem. Given a set of vectors (in R^3), I am
trying to find a way to generate a permutationally invariant representation of
these vectors _that is also_ rotationally invariant. The rotational invariance
is easily solved by constructing the Gram matrix of the set of vectors (each
element M_ij in the matrix is the inner product between vector x_i and vector
x_j). Of course, the Gram matrix is overcomplete now (redundant), so I take
the Cholesky decomposition to get a unique representation that has three
fewer degrees of freedom (all the rotational degrees) than the original set.
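A minimal sketch of that rotational-invariance step, for the simplest
well-posed case of three linearly independent vectors in R^3 (for more than
three vectors the Gram matrix is rank-deficient, so plain Cholesky fails and
a rank-revealing factorization would be needed instead):

```python
import numpy as np

def gram_cholesky(X):
    """Rotation-invariant representation of a full-rank (3, 3) vector set."""
    G = X @ X.T                   # Gram matrix: G[i, j] = <x_i, x_j>
    return np.linalg.cholesky(G)  # lower triangular: 6 free entries vs 9

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3))   # three random vectors in R^3

# Apply a random proper rotation R (orthogonal, det +1) to every vector.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
R = Q * np.sign(np.linalg.det(Q))
X_rot = X @ R.T

# The representation is identical before and after the rotation.
assert np.allclose(gram_cholesky(X), gram_cholesky(X_rot))
```

Since `G = (XR^T)(XR^T)^T = X R^T R X^T = X X^T`, the Gram matrix is exactly
unchanged by any orthogonal transform, which is why the assertion holds.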

For permutational invariance though, I'm having a heck of a time. I've been
extensively searching the academic literature on this topic, and maybe I'm
just not using the right search terms, but I can't find anything. Recently
I've been reading about symmetric groups and trying to figure out if there's
some kind of linear basis that I can construct its elements out of, but I'm
not having much luck. Anyone have any ideas?

If that problem is too hard, consider this simpler problem: Given a metric
between two sets of vectors (of the same order), I want to find the
permutation of the second set that minimizes this metric. So basically, min(||
A - P A' P' ||), where ||X|| denotes the metric, and P is a permutation
matrix. I feel like there is some kind of relation between this question and
the TSP, but I can't quite figure out what.

~~~
psb217
This is just the problem of forming automorphism-invariant encodings of a
graph, but extended to permit weighted graphs.

First, given the Gram matrix for your vectors, interpret its entries (read
from left-to-right and top-to-bottom) as defining a cumulative sum.

Next, order your vectors such that the values in the order-induced cumulative
sum are maximal for as many of the partial sums as possible, compared with the
cumulative sums induced by any other possible ordering. This produces the same
"canonical ordering" and hence the same "canonical Gram matrix" for any set of
vectors that have the same collection of pair-wise relationships, as measured
by dot-products.

For graphs with binary adjacency matrices this can be stated more simply as
sorting the vertices such that the resulting adjacency matrix, when read
l-to-r and t-to-b, produces the largest binary number. This encoding uniquely
identifies the automorphism group of the encoded graph (i.e. all isomorphic
graphs produce the same encoding).

If you were willing to represent your pair-wise dot-products using fixed-
precision binary representations, then you could just use the "matrix as
binary number" approach directly. Though, each matrix entry would now
contribute multiple bits to the binary number, rather than just one.

Note that this approach is not efficient. But, your problem subsumes standard
graph isomorphism, so a polynomial-time solution would be noteworthy. All of
what I've said makes no assumption about the vectors' dimensions. There's
probably a more efficient approach for vectors constrained to a relatively
low-dimensional space.
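A brute-force sketch of the "adjacency matrix as binary number" canonical
form described above: try every vertex ordering and keep the one whose
row-major bits form the largest binary number. As noted, this is exponential
in the number of vertices, so it's only usable on tiny graphs.

```python
import itertools

def canonical_encoding(adj):
    """Max row-major bit string over all vertex orderings (isomorphism-invariant)."""
    n = len(adj)
    best = None
    for perm in itertools.permutations(range(n)):
        bits = tuple(adj[perm[i]][perm[j]] for i in range(n) for j in range(n))
        if best is None or bits > best:
            best = bits
    return best

# Two different labelings of the same 4-vertex path graph get the same
# encoding, since they are isomorphic.
path1 = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
relabel = [2, 0, 3, 1]
path2 = [[path1[relabel[i]][relabel[j]] for j in range(4)] for i in range(4)]
assert canonical_encoding(path1) == canonical_encoding(path2)
```

Equal encodings imply the two adjacency matrices agree under some vertex
ordering, i.e. the graphs are isomorphic, which is the uniqueness property
claimed above.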

------
mck-
I'm surprised that there is no mention of Tabu Search or other metaheuristics
(like simulated annealing, as pointed out in this thread). In my research, I
found it to be one of the best algorithms for this class of problems
(deterministic, non-black-boxy, flexible, can incorporate expert knowledge).

Also worth mentioning is that while the TSP is interesting from an algorithms
class perspective, a problem that occurs far more often in real-life is the
Vehicle Routing Problem.

The Vehicle Routing Problem (with time-windows, types, multiple vehicles,
multi-depot, capacities, etc) is a much more complex problem compared to the
TSP, and I would argue that pretty much all the algorithms mentioned in the
article would fail to transfer.

Here's an open-source library written in Common Lisp that implements the Tabu
Search approach [1] -- shameless plug, but we also built an API for it [2].

[1] [https://github.com/mck-/Open-VRP](https://github.com/mck-/Open-VRP)

[2] [https://www.routific.com/developers](https://www.routific.com/developers)

~~~
alfiedotwtf
I'm wanting to dabble with metaheuristics. You mention VRP... what are other
"hello world" problems for testing different algorithms? Thanks!

------
graycat
For the set of real numbers R, a positive integer n, and the Euclidean norm
on R^n, there is the old R. Karp paper where he shows amazing results for the
following: find a minimum spanning tree and do a depth-first traversal, but
delete arcs that have already been traversed and go directly to the city not
yet visited that the depth-first traversal would visit next.

So, just code up one of the (at least two) famous, amazingly fast
polynomial-time algorithms for minimum spanning trees (Prim's or Kruskal's),
then write the traversal code.

For the Euclidean case, likely tough to beat, all things considered, in
practice.
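The MST-then-traverse idea above can be sketched as follows (a minimal
version, assuming cities are (x, y) tuples and using an O(n^2) Prim's
algorithm; the depth-first preorder of the tree, with repeats skipped, is
exactly the shortcut tour described):

```python
import math

def mst_tour(cities):
    """2-approximate metric TSP tour: MST + depth-first preorder."""
    n = len(cities)
    dist = lambda a, b: math.dist(cities[a], cities[b])

    # Prim's algorithm: grow the tree from city 0, tracking for each
    # outside city its cheapest connection (cost, tree vertex).
    parent = {}
    best = {v: (dist(0, v), 0) for v in range(1, n)}
    while best:
        v = min(best, key=lambda u: best[u][0])
        _, parent[v] = best.pop(v)
        for u in best:
            if dist(v, u) < best[u][0]:
                best[u] = (dist(v, u), v)

    # Depth-first preorder of the tree = the tour with shortcuts.
    children = {v: [] for v in range(n)}
    for v, p in parent.items():
        children[p].append(v)
    tour, stack = [], [0]
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    return tour

print(mst_tour([(0, 0), (0, 1), (1, 1), (1, 0)]))  # → [0, 1, 2, 3]
```

The triangle inequality guarantees each shortcut is no longer than the tree
path it replaces, which is where the factor-2 bound comes from.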

Why not stop there?

(1) In some cases want to be quite close to optimality or have actual
optimality.

(2) In some cases, the Euclidean norm need not hold even roughly, e.g., job
shop scheduling with setup times.

------
brador
> There are more sophisticated approximate algorithms that can handle hundreds
> of thousands of cities and come within 0.01% or better of the shortest
> possible tour.

Anyone have more details on this more sophisticated algorithm? Even just a
name to google would be great.

~~~
_delirium
The earliest one to be proposed (1958) is called "2-opt":
[https://en.wikipedia.org/wiki/2-opt](https://en.wikipedia.org/wiki/2-opt)

The Lin-Kernighan heuristic is a related heuristic developed later, which
works well in practice on many problems. Here is an implementation (freely
available source, but not freely licensed):
[http://www.akira.ruc.dk/~keld/research/LKH/](http://www.akira.ruc.dk/~keld/research/LKH/)
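For reference, 2-opt itself fits in a few lines: repeatedly reverse a tour
segment whenever swapping the two edges it breaks for the two it creates
shortens the tour, until no improving reversal remains. A minimal sketch,
assuming cities are (x, y) tuples:

```python
import math

def two_opt(cities, tour):
    """Improve a tour in place by 2-opt segment reversals; returns the tour."""
    d = lambda a, b: math.dist(cities[a], cities[b])
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            # Skip j + 1 == n when i == 0: those edges share a city.
            for j in range(i + 2, n - (i == 0)):
                a, b = tour[i], tour[i + 1]
                c, e = tour[j], tour[(j + 1) % n]
                # Replacing edges (a,b) and (c,e) with (a,c) and (b,e)
                # is equivalent to reversing tour[i+1 .. j].
                if d(a, c) + d(b, e) < d(a, b) + d(c, e):
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour

# A crossed tour around a unit square is uncrossed to the perimeter:
print(two_opt([(0, 0), (1, 1), (1, 0), (0, 1)], [0, 1, 2, 3]))  # → [0, 2, 1, 3]
```

Lin-Kernighan generalizes this by chaining several such edge exchanges per
move instead of just two.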

------
octatoan
I like how Norvig isn't a Real Programmer. His way of describing the problem
before starting to work on it is refreshing.

~~~
aikah
> I like how Norvig isn't a Real Programmer

I'm not sure what you mean by "Real Programmer". If one writes programs for a
living, one is a programmer. It reminds me of that French joke about the
"real hunter" and the "fake hunter".

~~~
jeorgun
I think they mean it in the "real programmers use FORTRAN" [1], not the
literal, sense.

[1] [http://www.catb.org/jargon/html/R/Real-Programmer.html](http://www.catb.org/jargon/html/R/Real-Programmer.html)

------
userbinator
I don't remember the algorithm exactly and Google isn't yielding any useful
results, but I seem to recall that if all the distances are integers (or if
you make all the distances integers via scaling/rounding) there is a much
faster algorithm that relies on "stepping upward" and considering
increasingly large distances. All I remember is that it's very similar to the
principle that makes radix sort sub-linearithmic.

Anyone know what I'm talking about?

~~~
j2kun
They would certainly have to be _bounded_ integers, right?

~~~
userbinator
Yes, I believe it is something like O(n) where n is the sum of all the
distances.

It's probably related to the "pseudopolynomial time subset-sum" algorithm.

------
Animats
The classic practical algorithm comes from Bell Labs work in the 1960s.
Connect the points in some random order. Then pick two links and cut there,
leaving three connected chains. Try all 12 possibilities for reconnecting
those three chains, and keep the shortest. Repeat until no improvement is
observed for a while.

This was one of the first useful nondeterministic computer algorithms.

~~~
jessaustin
_Try all 12 possibilities for reconnecting those three chains..._

I'm only coming up with 8 possibilities. Pick one chain and designate it Aa.
That may be followed by:

    
    
        Bb, Cc
        Bb, cC
        bB, Cc
        bB, cC
        Cc, Bb
        Cc, bB
        cC, Bb
        cC, bB
    

If distance depends on direction, multiply by 2 (for aA in addition to Aa),
but that still isn't 12.

~~~
Animats
Aa can be in the middle.

~~~
jessaustin
Aa Bb Cc is identical to Cc Aa Bb (or bB aA cC). That's why we fixed one
chain, to simplify the rest of the calculation.

EDIT: on reading TFA I find Norvig does essentially the same thing: " _So let
's arbitrarily say that all tours must start with the first city in the set of
cities. We'll just pull the first city out, and then tack it back on to all
the permutations of the rest of the cities._" Interestingly he doesn't also
eliminate the Aa-aA redundancy, because " _First, it would mean we can never
handle maps where the distance from A to B is different from B to A. Second,
it would complicate the code (if only by a line or two) while not saving much
run time._ "

~~~
Animats
OK, non-standard problem definition. Sorry. Usually it's an interconnection
problem without specified ends.

------
idunning
Great stuff from Norvig, as per usual. I find it strange that the next step
after enumeration is a heuristic approach, something that comes up a lot in
"solve a TSP" posts - there are many ways to solve TSP to provable optimality
that perform really well on non-pathological instances, and can often be
stopped at any point to get both a probably near optimal solution and, even
better, a bound on how good the best solution could be. I would argue that the
implementation cost is outweighed by the possibility of usually getting
optimal solutions and by having confidence in how far off you are (no chance
of producing a terrible solution).

------
plg
what about global optimization methods, e.g. simulated annealing?

~~~
jcoffland
I too was disappointed that SA wasn't even mentioned. I've gotten good
results with it in the past. Its ability to get out of local minima makes it
quite powerful. Still, it's an excellent article.
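For the curious, a bare-bones simulated-annealing sketch for TSP: propose a
random segment reversal, always accept improvements, and accept worsening
moves with probability exp(-delta / T) as the temperature T cools (all the
schedule parameters below are illustrative choices, not tuned values):

```python
import math
import random

def anneal(cities, steps=20000, t0=1.0, cooling=0.9995, seed=0):
    """Simulated annealing over segment-reversal moves; returns (tour, length)."""
    rng = random.Random(seed)
    n = len(cities)
    d = lambda a, b: math.dist(cities[a], cities[b])
    tour = list(range(n))
    rng.shuffle(tour)
    length = sum(d(tour[k], tour[(k + 1) % n]) for k in range(n))
    t = t0
    for _ in range(steps):
        # Pick a segment tour[i..j] to reverse (i >= 1 keeps the edge
        # bookkeeping below valid at the wrap-around).
        i, j = sorted(rng.sample(range(1, n), 2))
        a, b = tour[i - 1], tour[i]
        c, e = tour[j], tour[(j + 1) % n]
        delta = d(a, c) + d(b, e) - d(a, b) - d(c, e)
        # Metropolis rule: accept downhill always, uphill with prob exp(-delta/t).
        if delta < 0 or rng.random() < math.exp(-delta / t):
            tour[i:j + 1] = reversed(tour[i:j + 1])
            length += delta
        t *= cooling
    return tour, length
```

On a unit square, `anneal([(0, 0), (0, 1), (1, 1), (1, 0)])` settles on the
perimeter tour of length 4.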

~~~
fogleman
Simulated annealing is my go-to optimization algorithm. I'm using it at work
to solve a vehicle scheduling problem.

------
apricot
Just the right mix of theory and practice to make the data structures class I
had as an undergraduate pale in comparison. Thank you Mr. Norvig, and please
keep them coming.

------
avyfain
A while back someone posted a cool simulated annealing post of this[0]. The
app seems to be down, but the code is there in case anyone wants to take a
look.

[0] [http://toddwschneider.com/posts/traveling-salesman-with-simu...](http://toddwschneider.com/posts/traveling-salesman-with-simulated-annealing-r-and-shiny/)

------
vacri
To me, a 1089-point 'Travelling Salesperson Problem' is finding the sales
staffer who is willing to do that in one run... :)

------
yodsanklai
Is it the non sexist version of the "Traveling Salesman problem"?

