Sorry about that, I've added the credit now. Hope you don't mind. :)
I have to say that I find these pictures extremely aesthetic. It's crazy how the brutally powerful CPUs we have today are successors to processors created not so long ago that, though very intricate, still fit into one picture...
Thanks! No worries, I just found it amusing to randomly encounter one of my die photos. I agree, it's remarkable how much complexity there is in a modern CPU compared to the early ones.
Despite the video being mentioned throughout the post, I don't see a link to it. I saw it when it was posted a few weeks back and really enjoyed it. I would highly recommend giving it a watch, especially if the ramifications of P vs NP are new to you. [1]
Thanks. I also added the link to the post now. Frankly, I did not count on people getting to the blog post any other way than by being curious after watching the video. :D
> We recently made a Polylog video about the P vs NP problem. As usual, our goal was to present an underrated topic in a broadly understandable way
Interesting post, but I'm not sure why the author calls it underrated. It seems to get a fair share of attention and love amongst those that care about such things.
Interestingly, the shortest-route version of the traveling salesman problem (which is what everyone associates it with) is NP-hard and thus not necessarily even within NP. P vs NP is specifically about P vs NP-complete, whereas NP-hard problems are at least as hard as NP-complete ones but need not be in NP, nor decision problems at all. So make sure you give the "is there a route <= k" version, because the much more common "find the shortest route" version is not in NP.
Somebody really should come up with better names. I know for a fact I knew the difference once, but right now I cannot even remember what it was without clicking the link. Surely many people who heard about this problem (and that's a lot of people, it's about as pop-mathy as it gets) don't even suspect these aren't synonymous.
Yes. An upper bound on k can be calculated as (largest edge weight in the graph) times (number of nodes in the graph). At that point you can do binary search to find the smallest k for which a solution exists.
Under the assumption that the <= question can be answered in polynomial time, this entire algorithm will also complete in polynomial time.
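For concreteness, here's a minimal sketch of that binary search, assuming a hypothetical decision oracle `tour_exists_within(graph, k)` for the "is there a route <= k" question (the oracle itself is the hard part, of course):

```python
def shortest_tour_length(graph, tour_exists_within):
    """Binary search for the optimal tour length.

    `graph` maps edges (node pairs) to integer weights;
    `tour_exists_within(graph, k)` is a hypothetical oracle for the
    decision question "is there a tour of total length <= k?".
    """
    nodes = {v for edge in graph for v in edge}
    lo = 0
    # Upper bound from above: largest weight times number of nodes.
    hi = max(graph.values()) * len(nodes)
    while lo < hi:
        mid = (lo + hi) // 2
        if tour_exists_within(graph, mid):
            hi = mid        # a tour of length <= mid exists: tighten from above
        else:
            lo = mid + 1    # no such tour: the optimum must be larger
    return lo               # smallest k answered "yes"
```

That's O(log(max_weight * n)) oracle calls, i.e. polynomial in the number of bits of the input, so a polynomial-time oracle gives a polynomial-time algorithm overall.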
It’s indeed NP-complete if edge lengths are integers or otherwise discretized. But the general traveling salesman problem has no such restriction, so you can’t finitely enumerate in the general case.
In the general case, you just have a big list of edge weights, which means you have to be able to write the edge weights down.
If you want to represent the problem as a bunch of points in a Euclidean plane with free travel, that's a different (and easier) problem.
And even then, while it might take an infinite number of steps to specify the answer at infinite resolution, it will only take a finite number of steps to specify the answer at any level you're capable of writing down.
> it will only take a finite number of steps to specify the answer at any level you're capable of writing down.
That’s basically the good old “everything is O(1) because int64/float64 has 2^64 possibilities” misconception. It’s not how complexity theory works. For any N, 2^N is still finite, we’re obviously not talking about undecidable problems.
Edit: I see that you may be responding to the “finitely enumerate” part of my comment. Sure, it wasn’t phrased well. Replace with “enumerate in P”.
What are you fixing? Suppose you want to know the answer to within one part in 10¹⁰⁰. That will take you 333 questions.
Suppose you don't actually need 100 decimal places of the answer, or more likely that even if you had them you'd be unable to use them, and you can only represent the answer to 20 decimal places. That will take you 67 questions.
You can easily enumerate this answer in P. The problem in your argument isn't that float64 only has 2^64 values. Use as many bits to hold the answer as you want. No matter how many that is, it will be a finite number, and you'll be able to specify them all in a polynomial amount of time. Each question takes polynomial time to answer and fills one bit of the solution.
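Spelling out the arithmetic behind those counts (each yes/no question halves the remaining interval, i.e. yields one bit):

```python
import math

def questions_needed(decimal_digits):
    # Bits needed to resolve one part in 10**decimal_digits.
    return math.ceil(decimal_digits * math.log2(10))

print(questions_needed(100))  # 333
print(questions_needed(20))   # 67
```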
Okay apparently my complexity theory is rusty and I’m remembering results but not remembering the reasons for those results, so sorry about that. Determining optimal path length to a certain degree is indeed in the same class as the decision problem. It’s finding that optimal path (optimization problem) that’s not NP-complete.
> It’s finding that optimal path (optimization problem) that’s not NP-complete.
I'm having some trouble with this. As far as I can see, finding a path of a given maximum length must be in NP, because it's very easy to deterministically verify that a given path has length no more than the maximum.
If determining the optimal path length is in NP, and identifying a path of that length is also in NP, how can determining an optimal path fail to be in NP?
In this case NP-complete and NP-hard are closely linked: the optimization problem can be reduced to the decision version. Not all decision problems are like that (e.g. the happy net problem), but for this one I don't see the point of making the distinction.
You can always reduce anything to anything if you ignore the cost, but in this case the reduction is relevant because it only adds a polynomial factor. Showing that a problem can be reduced to another one in polynomial time is core to proving NP-completeness.
That means that TSP and TSP-opt are in the same ballpark of complexity. The reason TSP-opt isn't called NP-complete isn't related to its complexity, but to the fact that only decision problems get to be called that.
As I said, there are problems where the decision problem and the optimization problem are completely unrelated in terms of complexity, but TSP isn't one of those.
Nope, the usual way of reducing an optimization problem is to ask the corresponding yes/no question: for some fixed value x, is the optimal value greater than x?
If you can solve the decision problem, you can solve the optimization problem by doing a binary search on x.
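And once the binary search has pinned down the optimal value, a standard self-reduction recovers an actual optimal tour with the same hypothetical oracle; a sketch (assuming `tour_exists_within(graph, k)` as in the sketch upthread):

```python
def extract_optimal_tour(graph, tour_exists_within, opt):
    """Recover an actual optimal tour using only the decision oracle,
    given the optimal length `opt` found by binary search.

    `tour_exists_within(graph, k)` is the same hypothetical oracle.
    """
    remaining = dict(graph)
    for edge in list(graph):
        trial = {e: w for e, w in remaining.items() if e != edge}
        # If some optimal tour survives without this edge, drop it for good.
        if tour_exists_within(trial, opt):
            remaining = trial
    return set(remaining)   # the edges that couldn't be dropped form a tour
```

At the end every remaining edge lies on every surviving optimal tour, so the leftover edge set is exactly one optimal tour -- one oracle call per edge, still polynomial.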
> Not all decision problems are like that (e.g. happy net problem)
Assuming you have an oracle for any question that you know how to verify, this is very easy to reduce:
Is there a solution in which vertices {1, 4, 6} are all positive?
First, let's establish that the question is legal: presented with a working solution, we can calculate the defining constraint of the problem, and we can check the polarity of vertices 1, 4, and 6. If this problem is in NP, verifying the constraint will take at most polynomial time. If not, not, but our question can be verified in however much time it takes to calculate the constraint.
The complexity of determining a solution by getting answers to questions of this form is interesting.
Case 1: The answer is "no" for all single-vertex sets. All vertices must be negative. This cannot actually happen, because flipping every sign preserves a solution, so all vertices being negative is the same thing as all vertices being positive.
Case 2: All vertices may be simultaneously positive. In this case, the answer to every question will be "yes", and we'll ask one question for every vertex in the input graph, solving the problem with one question per vertex (sublinear in the size of the input, which includes the edges).
Case 3: We need some positive and some negative vertices. By asking about single-vertex sets, we can quickly identify a vertex that can be positive, vertex A.
We will include that vertex in every subsequent question. By the time we get there, we will have already asked about zero or more other vertices and been answered "no". We need never ask about those vertices again; they have to be negative and if we include them in any set, we'll get another "no".
So, let's ask about a never-before-examined vertex, vertex B. Can {A, B} be simultaneously positive?
If not, we can toss B into the "must be negative" pile, since we know A can be positive.
If so, we can include it in our developing solution; future questions will include vertices A and B. We still never need to reexamine any vertex that has ever been included in any of our questions.
So, unless I'm missing something, in this case we'll also produce a solution using one question per vertex in the graph.
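If it helps, here's the whole strategy as a sketch, with the oracle left as a hypothetical parameter `can_all_be_positive(S)` answering "is there a solution in which all vertices of S are positive?":

```python
def find_happy_assignment(vertices, can_all_be_positive):
    """The querying strategy above, one question per vertex.

    `can_all_be_positive(S)` is a hypothetical oracle answering
    "is there a solution in which all vertices of S are positive?".
    """
    positive, negative = set(), set()
    for v in vertices:
        if can_all_be_positive(positive | {v}):
            positive.add(v)   # keep v in the developing solution
        else:
            negative.add(v)   # v must be negative, given our commitments
    return positive, negative
```

The invariant is that some solution has all of `positive` positive, so every "no" answer commits the queried vertex to the negative side, exactly as in the case analysis above.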
> but for this one I don’t see the point making the distinction.
The distinction between NP-complete and NP-hard has nothing to do with the distinction between yes/no questions and open-ended questions. Those are unrelated concepts.
An NP-hard problem is at least as hard as any NP problem. It might be much harder.
An NP-complete problem is NP-hard, but it's also guaranteed to be an NP problem. That's the distinction.
> An NP-complete problem is NP-hard, but it's also guaranteed to be an NP problem. That's the distinction.
My point being that transforming TSP-opt to TSP only adds a polynomial factor. So in the case of TSP-opt, it's called NP-hard rather than NP-complete because it's not a decision problem, not because it fails to be equivalent in time complexity to a problem in NP.
People simply call it NP-hard because that term is better known than NPO or NP-equivalent.
No, NP-complete is only defined for decision problems. As OP specifically and correctly points out, NP-hard applies to decision problems as well as search and optimization problems (and others as well).
You may review the following to clarify the distinction between the various NP complexity classes:
I for one have no idea about topology in general. I had to look up what a nice shape was... I thought it was some mathematical term.
The only thing I remember about topology is someone telling me how it would be easier for me to untangle cables if I knew topology, but alas, I never learned.
I think that word is in the process of being re-defined. I see it used a lot in the context of older pop music or singers/bands that are no longer at their peak.
My own pet theory is that it's an evolution of speech towards "hardened" claims that you can't easily disagree with. You cannot disagree with "over/under-rated" because there's no official "rating" method. You cannot disagree with "vibes" because the source of a vibe is impossible to determine. They both allow a speaker to make a claim without evidence.
This is just me being bad at English. I've now rewritten the first paragraph of the blog post to hopefully make it clear that I am just claiming that our nonstandard way of viewing P vs NP is underrated, not the problem itself. :)
(A) It's going to stop attacks, because people are learning and adapting to having their ideas challenged; it's easier to use weasel words than it is to study language and make a persuasive argument. Your claim is false.
(B) Anything can be "obviously false" under contrived/extreme situations. Try again with "Sting is underrated" and suddenly you can easily argue it either way. Try using an argument that doesn't rely on p-values being at the 3+ sigma level, but of course you won't, because you know that would invalidate your position.
Might be the clickbait treadmill, indeed, or algospeak. But it doesn't need an objective meaning to be treated that way. The word "literally" can no longer be trusted to mean "in a literal sense". It's frequently used as an intensifier.
There's another term, I'm sure, because politicians use it all the time: "I never said X," when it was very heavily implied and the listener was led down the garden path to the conclusion, only to find out the conclusion is bitter and unpleasant. The speaker can say "oh, that's your own biases/misconceptions/dogma, I never said <something extremely specific>".
I'm in the weeds here, but a simple analogy would be: someone says "I don't like sunny days without clouds or fog," and I say "why don't you like blue skies?" and they say "I never said that," when the salient points of a sunny day with no clouds or fog are 1) there's a sun, and 2) the sky is blue.
I'm pretty sure the author is specifically referring to the framing of P vs NP used in the video, not to the problem itself, since a bit into the article they write this:
"I think that the framing with inverting a function f is quite underrated."
Yes! We are not trying to claim that P vs NP is underrated in the video, just the bit with interpreting it as a question about inverting functions. I think using "underrated" is justified, I talked with a bunch of people about this topic and it was a new way of looking at it for a lot of them.
Of course, using the title "What P vs NP is actually about" is a bit of a stretch -- "Interesting take on P vs NP" would be more honest.
Totally justified, I hadn't thought about that framing of it myself.
I think it's just that the first time you use underrated, it seems to be referring to the problem itself, not to your specific take. With the video as context, or even just reading a few sentences in, the meaning is clearer.
1) With others, I work on high-effort YouTube videos; our channel's called Polylog. We recently made a video about an underrated take on P vs NP. [1]
2) After working on a high-effort video, I always write a low-effort, stream-of-consciousness-style, very-much-unedited blog post where I clarify some details or explain some more esoteric stuff that did not make it into the video. The blog post you are discussing is such a post. So you should watch the video first and read the blog post only if you really liked the video and want to know more. :)
3) If you have any questions, I'll be happy to answer them!
BBC "in our time" radio show did an outstanding episode on the topic. As usual it had an amazing calibre of guests but a totally understated style. Also no long pointless intro full of terrible banter or lengthy bullet list of what will be discussed.
I wish podcasts could follow this example:
Framing this as inverting a function is interesting, but raises a question, since any function that maps multiple inputs to the same output is not invertible.
For example, let's say we invert a checker algorithm and "run it backward from YES". Does that find an arbitrary single solution? Does it find all solutions? What if there are an infinite number of solutions?
"An arbitrary single solution" would be the proper answer for NP. The goal is to find some solution that the verifier accepts, like how in SAT the goal is to find some satisfying assignment for all clauses.
Informally, an example of this problem is calculus class. A kid who's interested in programming and takes calculus can dream up a program that computes the derivative of any expression. But now try doing that with the antiderivative. And to be sure, you can write an expression in multiple ways, but the core problem remains.
Disclosure: I was that kid. My program was a mess, but good enough for a proof of concept.
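For anyone curious why differentiation is the "easy direction": it's one mechanical rewrite rule per node of the expression tree. A toy sketch (my own, presumably far simpler than that old program); no analogous local rule set exists for antiderivatives:

```python
# Expressions are tuples: ("x",), ("const", c), ("add", f, g), ("mul", f, g).
def diff(e):
    if e[0] == "x":
        return ("const", 1)
    if e[0] == "const":
        return ("const", 0)
    if e[0] == "add":   # (f + g)' = f' + g'
        return ("add", diff(e[1]), diff(e[2]))
    if e[0] == "mul":   # (f * g)' = f'g + fg'
        return ("add", ("mul", diff(e[1]), e[2]),
                       ("mul", e[1], diff(e[2])))
    raise ValueError(f"unknown node {e!r}")

# d/dx (x*x + x): one local rule per node, no search required.
print(diff(("add", ("mul", ("x",), ("x",)), ("x",))))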
In some contexts, the inverse of a function f : A -> B is defined as a function g : B -> A such that g(f(x)) = x and f(g(y)) = y for all x,y.
In other contexts, what is meant is g : B -> 2^A such that x is in g(f(x)) for all x and f(x') = f(x) for all x' in g(f(x)). Here, g is said to produce the pre-image under f, and is a generalization of the above definition of inverse for non-injective functions.
In other contexts still, what is meant is g : B -> A such that f(g(y)) = y for all y such that there exists some x with f(x) = y. This is a different generalization called a one-sided inverse. (People probably call it a left inverse or right inverse, but I can never remember which is which because it emerges from an arbitrary notational convention.)
The article uses inverse in this third sense. "Find some input to f which would yield a given output y, if one exists."
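A concrete instance of the three definitions, with f(x) = x² on the reals (my example, not the article's):

```python
import math

f = lambda x: x * x   # not injective: f(-3) == f(3)

# (1) A two-sided inverse g with g(f(x)) == x for ALL x cannot exist,
#     since f(-1) == f(1) would force g(1) to be both -1 and 1.

# (2) The pre-image map returns every input hitting y:
preimage = lambda y: {math.sqrt(y), -math.sqrt(y)}

# (3) A one-sided inverse picks one element of each pre-image, so that
#     f(g(y)) == y -- the sense the article uses.
g = lambda y: math.sqrt(y)

print(preimage(9.0))        # {3.0, -3.0}
print(f(g(9.0)) == 9.0)     # True
print(g(f(-3.0)) == -3.0)   # False: g(9.0) is 3.0, the other pre-image
```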
Hi,
a few commenters on YouTube also asked us this question. I just now added a last section to the blog post that discusses it. Basically, although we only show how to find one arbitrary solution, you can use this trick to also find all of them, efficiently in terms of the size of the input and the number of solutions.
The idea is to add enough constraints to the function so that inverting it is nontrivial (where inversion finds ANY input). As the video said, inverting multiplication with output 15 could just yield 15 = 15 * 1, so make the function f(a, b) = a * b with a > 1 and b > 1 for a more interesting, RSA-breaking case.
Given that there is potentially an exponential number of solutions, it would be impossible to enumerate them all in anything smaller than exponential time.
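One standard way to get the "all solutions, efficiently in input size plus number of solutions" behavior mentioned upthread is blocking constraints; a sketch with a hypothetical one-solution oracle `solve(constraints)`:

```python
def enumerate_solutions(constraints, solve):
    """Enumerate all solutions with a hypothetical oracle `solve`, which
    takes a list of clauses (lists of (var, wanted_bool) literals) and
    returns one satisfying assignment as a dict, or None.

    The oracle runs (number of solutions + 1) times, so the total cost
    is polynomial in input size plus output size -- and necessarily
    exponential when there are exponentially many solutions.
    """
    solutions = []
    while True:
        sol = solve(constraints)
        if sol is None:
            return solutions
        solutions.append(sol)
        # Block this exact assignment: "at least one variable must differ".
        blocking_clause = [(var, not val) for var, val in sol.items()]
        constraints = constraints + [blocking_clause]
```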
For an example of harder-than-np-complete, I was shocked to find out how hard vector reachability is after being presented with a vector reachability problem and assuming I could just look up a reasonable-time algorithm.
I incorrectly assumed it would have some basic linear algebra solution because of how simple the problem seemed.
> To me, the essence of deep learning has nothing to do with trying to mimic biological systems or something in that sense; it’s the observation that if your circuits are continuous, there’s a clear algorithmic way of inverting/optimizing them using backpropagation.
> From the perspective of someone used to algorithms like Dijkstra’s algorithm, quicksort, and so on, this declarative approach of thinking in terms of loss functions and architectures, rather than how the net is actually optimized, sounds very alien. But this is how the whole algorithmic world would look if P equaled NP! In that world, we’d all program declaratively in Prolog and use some kind of .solve() function at the end that would internally run a fast SAT solver to solve the problem defined by our declarations.
I think a much closer analogy to function inversion is MCMC (or Bayesian inference in general), where we can easily compute the density p(x) of any point x, but finding the x given a p(x) is intractable. Strictly speaking it's about finding a set of x's that are distributed as p(X), not finding the x given any single density p(x), but it's close.
Relatedly, probabilistic programming was originally imagined pretty much like your second quote: you define a model, get some data, run them both through the built-in inference engine, and you get the parameters of the model likely to have produced the data. In practice though, there's no universal inference engine that works for everything (some people disagree, but they're NUTS ;)
I guess pretty much for the same reason P is probably not equal to NP.
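For a feel of the asymmetry: evaluating an (unnormalized) density pointwise is trivial, while producing samples distributed according to it takes something like this minimal Metropolis sketch (a standard algorithm; toy implementation mine):

```python
import math, random

def metropolis(log_density, x0, steps, scale=1.0):
    """Minimal Metropolis sampler: a random walk whose stationary
    distribution is the (unnormalized) density, approached only
    asymptotically -- the 'hard direction'."""
    x, samples = x0, []
    for _ in range(steps):
        proposal = x + random.gauss(0.0, scale)
        delta = log_density(proposal) - log_density(x)
        # Accept with probability min(1, p(proposal) / p(x)).
        if random.random() < math.exp(min(0.0, delta)):
            x = proposal
        samples.append(x)
    return samples

# The 'easy direction' is just evaluating log p(x) = -x**2 / 2 pointwise.
draws = metropolis(lambda x: -x * x / 2, x0=0.0, steps=10_000)
```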
> I guess pretty much for the same reason P is probably not equal to NP.
Yep, in particular there are classes called #P and PP that are closely connected to NP and can capture the hardness of problems like computing partition functions, sampling from posterior distributions, and so on.
Didn't know about these, thanks for the pointer! Do you have a good resource for learning about these (specifically about the hardness of sampling from posterior distributions)?
I just skimmed the article and didn't watch the video, but the bit about backpropagation is just wrong. Backpropagation doesn't compute an inverse of the Jacobian, it computes its transpose. (A similar idea to backpropagation could possibly be used to compute an inverse of several reversible layers, but that's not typically how neural networks work.)
Backprop itself doesn't invert the computation, but it does give you the direction for an incremental move towards the inverse (a 'nudge', as the article puts it). That is, given a sufficiently nice function f and an appropriate loss ||f(x) - y*||^2, gradient descent wrt x will indeed recover the inverse x* = f^{-1}(y*), since that is what minimizes the loss. I assume this is what the article is pointing at.
If you want to be picky, it's true that the direct analogue of continuous optimization would be discrete optimization (integer programming, TSP, etc) rather than decision problems like SAT. But there are straightforward reductions between the two so it's common to speak of optimization problems as being in P or NP even though that's not entirely accurate.
"So, for example, if you can solve some problem \Pi by running a SAT solver ten times, this doesn’t mean that you have reduced that problem to SAT— in reduction, you can only run the SAT solver once."
The section was written horribly -- while I was talking about backpropagation, I was thinking about "what can be done in polynomial time" and there's a mismatch as you explained. Thanks and shame on me, I rewrote it.
In any case, I would recommend watching the video first; the article is just an accompanying stream of consciousness for those few who really liked the video.
This is one of the many comments I see on HN where I genuinely have no idea whether it’s satire or just high-level technical talk. I assume the latter, but I don’t know enough to disprove the former
It's not satire. If you have some function with several inputs and several outputs, the Jacobian is the matrix of all the partial derivatives of each output with respect to each input variable[1]. Since derivatives give the slope of a function, if you think about your function as being like a bumpy surface with the height at each point being the output, this matrix tells you which way (and how far) to change any input if you want to go "up" or "down" in the output.
Backpropagation is a way to optimise a neural network. You want to know how best to nudge the weights of the network to optimise some loss function, so what you do is compute the gradient (i.e. the partial derivatives) of that function with respect to each of the weights. This allows you to then tweak the weights of the network so it gets better at whatever task you're trying to get it to learn. See [2] to understand how this works and [3] to understand how this relates to the Jacobian, but generally if you're trying to go "downhill" in your loss function it's easy to see intuitively that knowing which way the function slopes (i.e. the effect of tweaking each of the weights) is important, and that's what the Jacobian tells you.
The inverse of a matrix[4] and its transpose[5] are two different operations in linear algebra. Transpose turns rows into columns and columns into rows and the inverse of a matrix is a little harder to grasp maybe without background, but you could think of multiplying one matrix by the inverse of another as a little like division (since you can't actually divide matrices).[6]
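A tiny numpy illustration of the transpose-vs-inverse distinction (toy function of my choosing):

```python
import numpy as np

# Toy function R^2 -> R^2: f(x) = (x0 * x1, x0 + x1).
def jacobian(x):
    # Rows are outputs, columns are inputs: J[i][j] = d f_i / d x_j.
    return np.array([[x[1], x[0]],
                     [1.0,  1.0]])

J = jacobian(np.array([2.0, 3.0]))
g = np.array([1.0, 0.0])     # gradient arriving from downstream

print(J.T @ g)               # backprop: a vector-Jacobian (transpose) product
print(np.linalg.inv(J) @ g)  # inversion: a genuinely different result
```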
They're almost certainly not equal, because solving NP-complete problems in polynomial time would require magically pruning an exponential space of decisions in polynomial time. What's interesting is that it's been so hard to prove one way or the other.
Is the interesting part of P vs NP (since it seems very unlikely that P = NP) the mathematical machinery that would be required to prove that P != NP? Presumably doing so would require a way to determine a lower bound for _all_ possible algorithms that solve a particular problem.
Exactly! In theoretical computer science, we have a lot of experience coming up with new algorithms but no idea how to prove lower bounds. So I would say that a new lower-bound technique is perhaps the most interesting thing we would learn from a proof of P != NP.
But, of course, P vs NP is clearly central to computer science and math for a bunch of other reasons; hopefully our video provided some intuitions.
Well, it was recent, so I'm currently in the 'feedback and validation' stage. I've reached out to experts for feedback, but these things do take time.
If you have a genuine interest, and especially if you could provide feedback or warm introductions to someone who can, I'm happy to share.
You can email me at atmanthedog at Google's free email service domain. Put something HN related in the subject. If you do, please share your mathematical background so I can better tailor a response.