Only decision problems can be NP-complete; other problems can only be NP-hard. The article is a horribly long-winded rant over this nitpick.
By the way, in the computational complexity community it is understood that if you say "TSP is NP Complete" you mean the decision version of the problem (is there a path shorter than K?) so nobody will bother to nitpick.
The great thing is that the author doesn't seem to understand the distinction. He states in his update that he has a decision problem of finding the tour of minimal distance.
Isn't that decision problem also in NP? The traditional TSP decision problem is in NP, and it seems like you should be able to run that a polynomial number of times to solve his version.
Perhaps the parent you're replying to means to say that the author doesn't appear to understand the distinction, because the wording being used doesn't make it a decision problem either. As long as we're nitpicking: decision problems have yes-or-no answers.
It's true that TSP is not NP complete because it is not in NP, because it is an optimization problem, not a decision problem.
It's also true that TSP is NP hard. The simplest way to see this is that if you have solved a TSP instance you have also automatically solved the Hamiltonian Path problem for the corresponding graph, and the Hamiltonian Path problem is known to be NP complete.
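For what it's worth, here is a tiny sketch of that argument for the cycle variant (a TSP tour is a cycle, so Hamiltonian Cycle is the most direct fit): build a complete weighted graph and let a black-box TSP solver do the work. The function tsp_optimal_cost is a hypothetical solver used only for illustration, not any particular library.

    from itertools import combinations

    def has_hamiltonian_cycle(n, edges, tsp_optimal_cost):
        # Vertices are 0..n-1; `edges` is a set of frozensets {u, v}.
        # Weight 1 for edges of the original graph, 2 for everything else;
        # the original graph has a Hamiltonian cycle iff the cheapest tour
        # costs exactly n (n weight-1 edges and nothing more).
        weights = {}
        for u, v in combinations(range(n), 2):
            weights[(u, v)] = 1 if frozenset((u, v)) in edges else 2
        return tsp_optimal_cost(n, weights) == n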
What the OP does not state, and which is also very important to realize, is that TSP is also NP-easy (and hence NP-equivalent). In other words, if you have an algorithm that can solve any NP-complete problem in polynomial time (i.e. P == NP), then you can use that algorithm to solve TSP (actually any finite optimization problem) in polynomial time. There is a sketch of a proof of this in one of my previous HN comments, but I'm pretty sure this is well known to complexity theorists, though there seems to be a fair amount of confusion about it on the web.
> ... there seems to be a fair amount of confusion
> about it on the web.
There's a lot of confusion on the web about this. And while it's true that this is largely a small technical point, this is an area where small technical details can matter.
> It's true that TSP is not NP complete because it is
> not in NP, because it is an optimization problem, not
> a decision problem.
So we make it a decision problem by saying: "Is there a tour with cost less than K?"
> ... if you have an algorithm that can solve any NP complete
> problem in polynomial time (ie. P == NP) then you can use that
> algorithm to solve TSP (actually any finite optimization problem)
> in polynomial time.
And, roughly speaking, you do that by binary searching on the cost of a solution. Set a lower limit of 0 and an arbitrary upper bound. Keep doubling that until you get a solution, then binary search.
It's not a path that we're feeding to the decision problem. The decision problem is when we have a graph with costs on the edges, and when someone says "Is there a solution of cost less than K?" We're assuming we have a way to answer that, and using that to get a solution to the optimization problem.
We need to use that to find the best possible path. We start by finding the cost of the best possible path. Set K=1 and ask if there's a solution of cost less than K. If not, double K and ask again. Repeat until the answer is "Yes".
Now you know there is a solution with cost in the range [0,K]. Set L=0 and U=K. Now do a binary search. Set K = L + (U-L)/2 and ask if there is a solution of cost less than K. If not, set L=K; if so, set U=K. Lather, rinse, repeat, until U-L is sufficiently small (1 if using integer weights - more care required otherwise).
Now we know the optimum cost. Remove an edge and see if there's still a solution of that cost. If there is, there's an optimal path without it. If not, the edge is in the optimal path. Repeat for all edges - some efficiencies can be made.
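To make the recipe concrete, here is a rough Python sketch, assuming nonnegative integer edge weights, a graph on which a tour exists (e.g. a complete graph), and a hypothetical decision oracle exists_tour(edges, bound) that answers "is there a tour of cost less than bound using only these edges?" - the oracle's name and signature are made up for illustration.

    def optimal_cost(edges, exists_tour):
        # Exponential search: double K until some tour of cost < K exists.
        k = 1
        while not exists_tour(edges, k):
            k *= 2
        # Binary search for the smallest K with a tour of cost < K.
        lo, hi = 0, k
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if exists_tour(edges, mid):
                hi = mid
            else:
                lo = mid
        return hi - 1  # the cheapest tour costs exactly hi - 1

    def optimal_tour_edges(edges, exists_tour):
        best = optimal_cost(edges, exists_tour)
        kept = set(edges)
        for e in list(edges):
            trial = kept - {e}
            # Delete e for good if an optimal tour survives without it;
            # deleting permanently (rather than testing each edge against
            # the full graph) keeps the answer consistent even when several
            # optimal tours exist.
            if exists_tour(trial, best + 1):
                kept = trial
        return kept  # exactly the edges of one optimal tour

Altogether this makes only O(log(optimum)) + |E| oracle calls, so if the decision version is polynomial, so is the whole procedure.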
Removing one edge at a time won't work if there is a second, redundant, optimal path (every edge can be removed without altering the cost). I don't remember how you get around this, but it shouldn't be too complicated...
This doesn't get you the path itself, it only gets you the length of the path.
Think of it like this:
the decision version question is: Is there a path of length k that hits all of the cities?
The algorithm just has to answer yes/no, it doesn't have to give you the path itself. So you can use binary search to repeatedly ask whether there is a path with length less than k, and home in on the optimal length. You still don't know what the path is.
We assume it would be difficult for the algorithm to make this determination without having actually discovered the path, but the problem doesn't require that it output any such path.
Edit: This is wrong, ColinWright details how to get the actual path from just knowing the length
That doesn't seem at all obvious. It's pretty easy to prove that if you can answer whether or not there exists a path of length k in polynomial time then you can also find what that path is in polynomial time. They prove this in the Udacity course on complexity, which I found very interesting if not necessarily practically useful. But it seems to me that the number of possible values of k grows exponentially with problem size, so you can't just use your decision algorithm on each value of k to find the optimal k in polynomial time.
The size of the set of possible k's may be exponential, but if you have a decision function D(C) == 1 iff there exists a path of cost < C, you can use that function to do a binary search over all k's in polynomial time, because binary search only needs logarithmically many calls.
Each call to D gives you 1 bit of the solution, and with n bits you can represent numbers from 0 up to O(2^n).
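A toy sketch of that counting argument, with D as the hypothetical decision function from above (D(c) is True iff some path costs less than c) and n an assumed bound on the bit-length of the optimal cost:

    def optimal_cost_bitwise(D, n):
        # Recover the optimal cost one bit per call to D, most significant
        # bit first, assuming the optimum is an integer below 2**n.
        cost = 0
        for bit in reversed(range(n)):
            candidate = cost | (1 << bit)
            # If no path costs less than `candidate`, this bit of the
            # optimum must be set.
            if not D(candidate):
                cost = candidate
        return cost  # exactly n calls to D, despite 2**n possible values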
It depends what encoding is used. If the edge costs are integers, this is true. Also such bounds are possible for rational costs (but you have to pay more attention). It gets a lot more difficult if non-rational edge weights are allowed.
I think we're talking here about problems that can be fed to a digital computer and hence can be represented by a finite number of bits. I believe that would exclude non-rational weights.
No, it wouldn't exclude non-rational weights. Consider for example the field Q[sqrt{2}]. Any element a + b*sqrt{2} of it can be represented by two rational numbers a, b. Since any field extension is a vector space (in this case of dimension 2) the addition can be implemented trivially, and the multiplication is (a + b*sqrt{2})(c + d*sqrt{2}) = (ac + 2bd) + (ad + bc)*sqrt{2}, which is again of the same form.
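As a rough illustration (class name and representation are arbitrary choices, not from any library), exact arithmetic in Q[sqrt{2}] only needs pairs of rationals:

    from fractions import Fraction

    class QSqrt2:
        # Represents a + b*sqrt(2) exactly as a pair of rationals.
        def __init__(self, a, b=0):
            self.a = Fraction(a)
            self.b = Fraction(b)

        def __add__(self, other):
            return QSqrt2(self.a + other.a, self.b + other.b)

        def __mul__(self, other):
            # (a + b*sqrt2)(c + d*sqrt2) = (ac + 2bd) + (ad + bc)*sqrt2
            return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                          self.a * other.b + self.b * other.a)

        def __lt__(self, other):
            # Decide the sign of (a - c) + (b - d)*sqrt(2) exactly by
            # comparing squares, so edge costs can still be compared.
            da, db = self.a - other.a, self.b - other.b
            if db == 0:
                return da < 0
            if da == 0:
                return db < 0
            if da < 0 and db < 0:
                return True
            if da > 0 and db > 0:
                return False
            if da < 0:                      # da < 0 < db
                return da * da > 2 * db * db
            return da * da < 2 * db * db    # db < 0 < da

Comparisons are the point here: the binary-search argument above only ever asks "is this cost smaller than that one?", and that can be answered exactly without ever computing sqrt(2) numerically.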
Not necessarily - quantities like, for example, sqrt(2) can be represented symbolically. So can quantities like e and pi. You can get around this by multiplying all edge costs by twice the number of edges divided by the smallest non-zero difference between costs, then rounding to integers. Or something similar.
This might come across as quite harsh but the article could have been much shorter.
NP is the class of decision problems that can be solved by a non-deterministic Turing machine in polynomial time. A problem is called NP-hard when there is a polynomial-time reduction of 3-SAT to said problem. A problem is called NP-complete if it is both NP-hard and in NP. Since TSP is not a decision problem, it is not in NP, therefore it cannot be NP-complete. However, it is NP-hard.
One can call this a minor nitpick. I still point it out every now and then when I hear people talking about such things. However, the really important thing is that we haven't found a way to solve NP-hard (and hence of course NP-complete) problems efficiently on a deterministic Turing machine (and hence computers as we can actually build them). An informal way of saying the same thing is "TSP is really really hard to solve".
Either way, I'm always surprised at how many problems we encounter in our everyday lives are actually NP-hard. Almost every decent puzzle game, for instance, is NP-hard. Goes to show that we don't really enjoy solving easy problems.
EDIT: Of course there's also a decision version of TSP, as others have pointed out. This one is NP-complete of course.
>Every decent puzzle game for instance is usually NP-hard.
An interesting question is how many of these puzzles are NP-hard in practice. For example, Minesweeper is NP-hard. However, it is possible to solve many games of Minesweeper using simple deductive reasoning in polynomial time. This leads to the question of what the average difficulty of a random game is. My best attempt at formalizing this question is to imagine reducing every game with a polynomial-time algorithm, then looking at the complexity of the remaining games. The complexity of an original game of size n could be the average after you reduce all games of size n with your polynomial-time algorithm. If this new complexity is sub-exponential, then it feels as if there is some sense in which the game is not often NP-hard.
A more concrete example that comes to mind is Haskell's type inference. In general this type inference is NP-hard. However, there are algorithms that can solve it in polynomial time assuming a bound on the nesting depth. Because no such bound actually exists, type inference is still exponential in the worst case. But, because we never see arbitrarily deep nesting (except in contrived examples), this does not matter.
Many NP-hard or NP-complete problems have (infinitely many) instances that are "easy" to solve. It is quite straightforward to write up a procedure that produces infinitely many instances of 3-SAT that can be solved in polynomial time by the DPLL algorithm. But it is important to note that a problem is actually a set of instances.
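As a throwaway illustration of such a procedure (the names and parameters are arbitrary): make every clause contain the positive literal x1, so setting x1 = True satisfies the formula, and DPLL disposes of it immediately via the pure-literal rule no matter how large the instance is.

    import random

    def easy_3sat_instance(num_vars, num_clauses, seed=0):
        # Every clause contains the literal x1 (encoded as 1); negative
        # integers denote negated variables. Requires num_vars >= 3.
        rng = random.Random(seed)
        clauses = []
        for _ in range(num_clauses):
            v1, v2 = rng.sample(range(2, num_vars + 1), 2)
            clauses.append((1,
                            v1 * rng.choice((1, -1)),
                            v2 * rng.choice((1, -1))))
        return clauses  # satisfiable by x1 = True, whatever else is chosen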
Your observation basically boils down to "how rare is the worst case?". There are actually algorithms with exponential worst case complexity that are used all over the place because - on average - they perform really well. The simplex algorithm for linear programming is an example of this. Every common variant of the algorithm comes with a worst-case family of problems on which it has exponential runtime. However, on average it is quite a fast algorithm that can handle millions of variables and corresponding constraints on modern hardware. (Linear programming itself can actually be solved in polynomial time, e.g. by the ellipsoid or interior-point methods, but simplex remains the practical workhorse.) As is the case with DPLL, good heuristics can speed up an algorithm a lot without changing worst case complexity. Other examples are all over the place in AI. It's also in AI where the concept of pruning is probably encountered most often. This is the practice of discarding whole subtrees of the search tree by some sort of heuristic or deductive reasoning.
Those things aside, I think what we should be looking at from a theoretical point of view is when it makes sense to consider the average case complexity over the worst case complexity. This brings us back to "how rare is the worst case". Quicksort for instance is said to be an O(n log n) algorithm but it has O(n^2) worst case complexity (trivially so). However, it can be decided in O(n) whether this case actually occurs, and hence a good sorting implementation can switch to a different algorithm once this has been detected (something similar happens in the std::sort function of typical STL implementations, which falls back to heapsort when the recursion gets too deep). This is definitely a case where it makes sense to "forget" worst case complexity.
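In that spirit, here is a toy version of the switch-over idea (not the real std::sort, which works in place; this just shows the mechanism of falling back to an O(n log n) algorithm when quicksort's recursion gets suspiciously deep):

    import heapq
    from math import log2

    def introsort_like(items):
        def sort(lst, depth_limit):
            if len(lst) <= 1:
                return lst
            if depth_limit == 0:
                # Looks like the quadratic case: switch to heapsort.
                heapq.heapify(lst)
                return [heapq.heappop(lst) for _ in range(len(lst))]
            pivot = lst[len(lst) // 2]
            less = [x for x in lst if x < pivot]
            equal = [x for x in lst if x == pivot]
            greater = [x for x in lst if x > pivot]
            return (sort(less, depth_limit - 1) + equal +
                    sort(greater, depth_limit - 1))
        lst = list(items)
        return sort(lst, 2 * int(log2(len(lst))) if lst else 0)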
For SAT, researchers have actually worked up statistics on how hard certain instances are in terms of the number of clauses and the number of symbols. It turns out that there is a range of ratio between the two where problems are the hardest (roughly between clauses/symbols is between 4 and 6). I don't have a source right now but googling "satisfiability threshold conjecture" might turn up some results.
As such, yes, not all instances of NP-hard problems are actually hard to solve, but there are infinitely many that are. Encountering such problems in practice one can usually make use of the special problems of the instance (or class of instances) one actually encounters and devise an algorithm that performs rather well on average.
As others have pointed out, this glosses over the distinction between the decision version of the problem and the optimization version. But the reason that it often gets glossed over is that it doesn't make much of a difference when it comes to the complexity class of the problem. In this case, the decision version of the problem is: does a path exist shorter than K? But once you have a solution to that, you can use binary search to answer the optimization version of the problem: find k such that there exists a path of cost k and no path of cost less than k.
But when it comes to the question of verifiers, the distinction does matter. Because NP is only concerned with decision problems, the OP is incorrect when he says that there's no poly-time verifier: there is one for the decision version of the problem, and that's all that matters.
I'm a little confused by this issue. According to Proof Wiki: "Because the Traveling Salesman problem is both NP and NP-hard the Traveling Salesman Problem is NP-complete."
NP is a class of decision problems - the solution has to be "yes" or "no". NP-complete problems are a subset of NP, so only decision problems can be NP-complete.
TSP, the optimization version (what's the shortest path?) is an optimization problem therefore it is NP-hard, not NP-complete. TSP, the decision version (is there a path shorter than K?) is NP-complete.
There is a chance someone will correct me here, as I'm not an expert. Take this with a grain of salt. Or a handful.
The second claim, that it's not in NP, is what this article is talking about. The optimization problem, as usually given, is not in NP because optimization problems are, by definition, not in NP. NP is for decision problems.
That's why this is regarded by some as simply a technical nit-pick, but continues to be a major source of confusion for those unwilling or unable to reason in nit-picking detail.
Certainly the decision problem "Is there a path of cost less than K?" is in NP, is NP-hard, and is therefore NP complete.
This can then be leveraged up into a solution of the optimization problem. Use a lower bound of 0, an upper bound of #(E) * max{ cost(e) : e in E }, and do a binary search for the optimal cost.
Does that help? With that information, you now need to re-read other sources.
The OP's update is still incorrect. A problem in NP is a set of strings which a deciding Turing machine must accept (and it must reject all other strings). You cannot define a problem in NP by what the machine outputs, because for a decision problem you can only output a single bit.
What the OP meant to say is that you can define TSP as a decision problem as follows:
TSP = {<G, T> : G is a weighted graph, and T is a tour of minimal total weight}
Then this achieves what we want: that deciding whether a given tour has minimal weight is NP-hard but not known to be in NP. You can also formulate the TSP problem as a decision problem this way:
TSP = {<G, v>: G is a weighted graph with a tour of weight at most v}.
This formulation is essentially equivalent to the first problem.