Hacker News new | past | comments | ask | show | jobs | submit login
P vs NP on simple english wikipedia - feedback, please (wikipedia.org)
93 points by MarkPNeyer on Sept 15, 2011 | hide | past | web | favorite | 59 comments

Your example at the end - you should go in to less detail about the number of combinations, and more in to how it's easy for a computer to check the right answer has been reached.

This. It's a really great example, but you never make the full connection by explaining that it's easy to check that 2 towers have the same mass.

I see posting this here as a mistake. You should post it somewhere where people dont know anything about it. Posting it here will result in us adding more and more detail, possibly making it more complicated.

I used to use my grandmother as a subject for testing out paper abstracts/intros for this reason. She could barely figure out scrolling.

My 2c: Don't write "Problems in P", write "P Problems".

Also you definition of "why its interesting" conflicts with the other Wikipedia page which says:

The P versus NP problem is a major unsolved problem in computer science. Informally, it asks whether every problem whose solution can be efficiently checked by a computer can also be efficiently solved by a computer.

i'm not sure i see the conflict... care to elaborate?

He's changed it =)

This is what it said originally: That means they would like to know if there are any problems where the answer cannot easily be found by a computer, but if someone says he has the answer, it is easy to use a computer to check if that answer is correct.

I think you might be missing the point of the simple English wikipedia. It's the vocabulary that you use that is simple, not the way in which you express the concepts.

Exactly. I was suprised why someone would use "mass" in a simple english wikipedia

I somewhat agree. For example why use a term like 'mass', when 'size' or 'weight' would be much simpler.

and wrong

...but equally useful for the example, which is just about dividing rocks into two groups. We're interested in the process of dividing the rocks, not the criteria for division.

Use 'height' (and maybe blocks instead of rocks, because then it's clearer they stack).

In what way is the word 'size' wrong?

I guess that Lamby means that one should not use "weight" instead of "mass", the two being two very different concepts.

They're very different concepts, but for the sake of the example it doesn't matter which you are measuring, you could equally be measuring the number of atoms in each one if you happen to know it.

The Simple English Wikipedia is also for people with different needs. Readers may be students, children, and adults who may find it hard to learn, and people who are trying to learn English. Other people use the Simple English Wikipedia because the simple language helps them understand difficult ideas or topics they do not know about.

From: http://simple.wikipedia.org/wiki/Simple_English_Wikipedia

i debated this one a lot. on the one hand, people are more likely to recognize the word weight, but on the other hand, weight isn't an inherent property of objects, and (to be overly pedantic) two rocks of equal mass, one at the top of the tower, and one at the bottom, will have difference weights.

  P problems are considered 'easy' for computers to solve. NP problems are considered easy for computers to check the answers to them.
To improve clarity:

1. Avoid putting pronouns far from the main noun.

2. Use simple sentences. Reduce the number of verbs and prepositional phrases per sentence.

3. Avoid scare quotes. If you need to qualify a term, do so explicitly.

Consider, for the above:

NP problems have answers which can be easily checked. P problems can be easily solved.

(Maybe precede or follow with a quick definition of what you mean by 'easy.' But I'm not sure it's necessary here.)

i guess the article is currently being edited, but at the moment it isn't so clear what is common and what different between the two groups. i think it would be better if the motivation came earlier:

  solving a problem is sometimes harder than checking you have the right answer.  
  for example, it is easy to check that you have the secret key to a code.  you 
  can decode a message and see the original text.  but it is hard to find the key 
  if you do not know it.

  so the problem of finding the key to a code is much harder than checking whether a 
  key is correct.  other problems can be easy to solve and to check.  for example,
  adding 2 and 2 is both easy to solve and to check.

  this difference can be used to divide problems into two groups.  these groups are 
  called P and NP.  it is easy to check the answers for problems in both groups, 
  but only problems in group P are easy to solve.

  the problem of adding two numbers is in P, because it is easy to solve and to 
  check. the problem finding a secret key to a code is in NP because is is easy to 
  check but hard to solve.
i know my writing isn't simple enough here, but an approach like the above makes it easier to understand the relationship between them. in particular, the commonality between the two groups is emphasised.

I had a similar response to the article, but I can see at least one reason why this explanation doesn't fit. Despite any inclinations we may have P vs. NP is still unsolved. So, claiming that a problem is in NP is "easy to check but hard to solve" is not necessarily true (if P == NP, it is both easy to check and to solve).

The original article does a good job of not making this assumption, and because of this gains the ability to pose the question:

> Many people want to know if there are any NP problems that are not P Problems

hmmm. it's true that my text is wrong in that way, but couldn't you fix it with a "maybe" or "appear" and then explain the open question afterwards? or maybe the conditional nature of the text then becomes too complex?

You might want to mention P VS NP is similar to the following scenario: people feel they can identify a piece of art (say a painting) as being of merit very quickly just by seeing it; but won't be able to come up with any piece of equal merit even if they had all the time in the world.

  Many people want to know if there are any NP problems that are not P Problems. 
  That means they would like to know if there are any problems where the answer 
  is hard to find, but if someone says he has the answer, it is easy to check 
  if that answer is correct.
This is probably the core of the article, but I find it misleading. The truth is that we need to either prove that the answer to a certain problem will always be hard to find (hence proving that P =/= NP) or we need to prove that all NP problems have in theory a P algorithm (P = NP). We currently have many NP (="easily" verifiable) problems that have no known solving algorithm in polynomial time. The evidence would so far suggest that P =/= NP, but this is empirical not mathematical. It's possible that all those non-P problems have some way to resolve them in P, but it doesn't look like it right now.

I would thus rephrase the core explanation as something like this:

"For problems where the correct answer can be checked easily, there are two possibilities: If (P = NP) is correct, that means there is always an easy way to come up with a solution no matter how complicated the problem is. If the opposite is correct (P =/= NP), there are some problems where the answer will always be difficult to find no matter how clever the solution is. We don't know yet which one of these possibilities is true."

the lack of a known polynomial time algorithm could be interpreted as evidence that P =/= NP, but it could just as easily be interpreted as evidence of our primitive state of knowledge with regard to algorithms. Suppose that P = NP, but the minimum complexity of any NP-Complete problem is O(n^(# of atoms in the universe)); we might never discover such a complicated algorithm, but that in no way implies it doesn't exist.

still, i see your point, but i think your phrasing is too complicated; i've changed it to this:

  It is known that all P problems are NP problems; the proof
  requires math which is too complicated for this article but 
  may be explained in the future. Many people want to know if
  there are any NP problems that are not P Problems. That 
  means they would like to know if there are any problems
  where the answer cannot easily be found by a computer, but
  if someone says he has the answer, it is easy to use
  a computer to check if that answer is correct.

I like mine better ;-)

Your idea about the the primitive state of our algorithm knowledge is interesting. This has nothing to do with the article, but I do believe there are deep philosophical implications here. If we knew (and could make use of the fact) that everything is P, it would fundamentally shift our relationship with reality. I would go so far as to say that P =/= NP is an instinctive model we have formed about the universe as it presents itself today. If this were to change, it would affect pretty much everything we're doing - I can't even begin to imagine what would happen on the day abstract problem solving becomes a P algorithm...

N^(1,000!!!!) is P but you need a rediculusly large number before 2^(N) is worse (N >= 2). Generally N^5 is about as far as you want before you can do anything with 3 digit numbers.

what happens in the example i gave above? if the minimum complexity of solving any np-complete problem is O(n^2e80) then we'd need such massively parallel computing that you'd have to turn pretty much every atom in the universe into a computing device.

i agree that p != np is the instinctive model most theoreticians have about the universe, but if science has taught us anything in the last 100 years, it's that our instincts are often wrong, and the story of science will probably never be finished.

Absolutely, but I'd say that the idea of brute-forcing the entire universe - while theoretically and mathematically sound - goes against the spirit of what we're trying to express with Big O. Solutions like these work in a theoretical universe with infinite matter and possibly over infinite time, but if the world is indeed finite (which it may very well be) any Big O expression containing the sum of all states of all atoms in the universe does contain a practical infinity.

i'm not sure what you mean by the 'spirit' of Big-O; i find the question fascinating because it is relatively easy to understand the concepts involved, yet it's been 40+ years since we understood enough to be able to ask the question, and we still don't know the answer. when you couple that with the fact that the prevailing wisdom that our inability to find an algorithm implies that it can't exist, the contrarian in me goes nuts: the academic attitude is always "we're so smart, if we can't find it, it must not exist," and i get a kick out of arguing for more humility on their part.

even though a polynomial-time algorithm with a massive polynomial would still be computationally intractable, even a proof that such an algorithm exists, but is too complicated to express using all the atoms in the universe as bits would probably open the doors to all kinds of proofs and techniques we cant' even imagine.

The page says people "would like to know if there are any problems that are easy for computers to see if a one person's answer is correct, but hard for computers to find a correct answer." However the example given seems to answer this question in the affirmative: it's a problem that's easy to check using a set of scales, but very hard to find a correct answer.

maybe the example needs to be more clear - it's hard for the computer to check all possible answers, but maybe it's easy for the computer to find the answer another way.

Rating 3/10. I don't think this article is written by someone who knows the topic well.

e.g. you don't too complicated math as claimed in this line:

>> It is known that all P problems are NP problems; the proof requires math which is too complicated for this article but may be explained in the future.

For P problems can be solved easily and then the solution can be easily verified. This means all P problems are NP problems too.

Good news!! Better Article Exists. Read: http://cstheory.stackexchange.com/questions/5188/explain-p-n...

I would agree that the math required is not 'complicated' from the point of view of anyone with basic understanding of discrete logic, but it's certainly too complicated to express in simple english terms that are understandable at the level i was shooting for.

if you want to do the proof that all problems in P are also in NP, you have to explain the notion of a decision problem and show that any np-complete numerical optimization problem can be converted to a decision problem without increasing its computational complexity by more than a function bounded by a polynomial.

someone named 'balabiot' changed the article by adding an incorrect 'proof' that P is a subset of NP, at the same time you made this post. the incorrect proof reads:

  All P problems are NP problems: if a problem is easy to 
  solve, to check an answer you just solve it and check that 
  the results match.
this invalid proof makes the assumption that an np-complete problem has a single solution. many np-complete problems have multiple solutions; there many be many ways to pack the knapsack, color the graph, or satisfy the boolean circuit. if you are presented with a valid solution, but the algorithm you use to solve the problem instance produces a different, but also valid solution, your comparison would fail.

note that simply solving the problem (because it is in p) is NOT the same thing as verifying a solution correct.

I came from an electronics background when I first started reading algorithms my head was spinning. And I am sure a lot of other novices/laymen from the same problem.

The reason I have seen over the years is simple. Expressing the problems in terms of mathematical equations. Mathematical equations some how denote a way of modifying elements through equations. They tend to build on existing established principles of maths. There fore in order to understand what the other person is saying a person must get into a high learning curve.

When I first read about P Vs NP problem, I really wasn't able to co relate it to anything in the real world. Later on as I read more on puzzles like The Bin packing problem, or the traveling salesman problems. Things began to make sense.

The way to explain people about such things is to give them problems defined in simple English. And then allow them to build their own way of representing objects and elements, and the way they choose to modify them. Like this article does, However while explaining to people start from very small input sizes. Help them work out outputs for large input sizes.

Given the problem at hand, explain them why such large computation time doesn't make sense in the physical world.

Representing the problem in pure math will never help the guy next to your home. Giving him a day to day use case, will force him to think.

So here we go, what all common every day use cases fit into the P Vs NP problem? If we can create such a list, a lot more laymen can think and build on those problems easily!

Is there such a list lurking somewhere?

doubtful. a programmer could spend an entire career in industry and never run into an np complete problem he needed to solve.

i added more examples to the article, and made them as simple and practical sounding as possible, but to be honest my primary interest in the problem is mostly from a theoretical perspective. to me, the idea that P != NP implies a host of bizarre conclusions that seem incredibly unlikely to me. i am so convinced of this that i had it emblazoned my license plate:


Thanks Mark,

Good work Indeed.

Can you please explain why you think P=NP. I am deeply interested in knowing about them.

to be honest, mostly it's because i enjoy disagreeing with people who are convinced of things they don't have a full understanding of. if you are decently educated, you realize how much there is we don't know, and approach every unsolved puzzle with a combination of curiosity, confidence and humility. philosophy aside, here's my thinking:

LONG VERSION: Suppose P is an NP complete decision problem; for each instance, the answer is either 'yes' or 'no'. Let P have verifying function V. Let the language L be the set of all binary strings s that represent instances of P for which V(s) = 1, i.e. the set of all strings s that represent instances of P for which the answer is 'yes'. If L (the set of all instances of P for which the answer is 'yes') is finite, then you can easily answer all instances of the problem P in constant time by using a lookup table that stores all members of L (i.e. all finite languages are regular.)

So, consider the case when L is infinite. We know that V must be expressible in terms of a finite sequence of symbols. Suppose we have manged to express V using a total of C bits. The entropy of all C-bit binary sequences is bounded by a function of C alone; i.e. it is independent of the specific content of V, all that matters is how many bits are in some (it doesn't matter which) implementation of V. My primitive understanding of information theory tells me that if V has a finite amount of information, and L is uniquely determined by V, then L cannot possibly have MORE information than V; if you have V and enough time, you can compute all of L. This limits the possibilities of L, and suggest to me that there must be a way to find a (possibly complex and incomprehensible algorithm) for solving all instances of P, by 'compressing' L in just the right way. I admit that this is very hand-wavey; i just don't know enough information theory to pin everything down.

SHORT VERSION: if you can verify all solutions to a problem using a function expressible using a finite sequence of symbols, then the total information content of a sequence representing all solutions to the problem can't exceed the information content of the verifying algorithm.

if P != NP, this (to me) implies that a list of answers to a problem somehow has more information than an algorithm that can be used to generate that list of all answers. it seems to me that the algorithm could easily contain more information than the list, but not the other way around.

to put it another way, if you see a verifying function as a compression algorithm on the list of all solutions (because they contain the same amount of information), then P != NP implies that some infinite sequences (which represent np complete problems) are simultaneously so redundant that they can be compressed to a finite length, but at the same time are so complicated that you can't figure out an arbitrary element without doing a massive amount of computation. as i see it, some patterns can appear random unless you look at them from just the right perspective, and then the pattern emerges. because np-complete problems can be expressed with a finite amount of information, there has to be a pattern that we're not seeing yet.

I think this generally does a good job of simplifying the terminology without falling into "durr, people who don't speak English natively are dumb" territory.

I agree. Simple errs towards 5 year olds, not people whos native language is not English, as they have their own wikipedias.

As someone who has an interest in this kind of thing, but very little in the way of formal education on it (my last formal maths education was my 'A' levels around 25 years ago), I'm possibly the exact target audience for this article.

Given that, I thought it was very helpful. I've read a whole host of attempts to explain it before, and they rapidly descend into 1000 other mathematical concepts that I don't fully understand and end up reading about instead of the actual P v NP issue.

The one bit that I think is missing is a better explanation of what's meant by "easy" in this context. Clearly a problem that takes more than the life of the universe to solve is likely to classify as hard. But what about if it was solvable in a year, or a minute, or a second? My vague understanding is that there's not really a "less than 5 seconds to solve"-type cutoff, and that it's actually to do with "nondeterministic polynomial time". But at that point I disappear into one of the mathematical things that I don't really grasp.

i agree, it could use more math. i plan to add that in; my concern is that i don't want to scare people away.

I would offer a suggestion, but I just had another go on the Wikipedia version and I honestly don't have a clue what it's telling me. I think I get what non-deterministic means, but then I'm totally flummoxed as to what that has to do with problems whose solution can be quickly verified.

I liked the article!

I would mention that it is part of the millenium problems and the one million dollar prize that comes with it.

It seems to be missing the philosophical introduction to what P = NP means [1]:

If P=NP, then the world would be a profoundly different place... Everyone who could appreciate a symphony would be Mozart

[1] http://www.scottaaronson.com/blog/?p=122

Many people find it difficult to see which of two large numbers is greater, unless they are reading on paper and take up a pencil and count the groups of three. This issue must come up a lot in simple-English. The solution is to introduce scientific notation, and to have a link to that. (As has been pointed out, mass is a tricky word, being used here.) Perhaps another method would be to invent something quirky, saying perhaps "4+(20 digits)" or something.

In this particular case, there is another solution: do not quote the individual numbers, but only their ratio. Anyone can tell that "1000" is big and "1/1000" is small, because all that's needed is to see if there are more than two or so digits.

I dislike the common implication that NP complete means "the stupidest brute force thing is the best way we know how to solve the problem." For example, meet in the middle will turn subset sum from 2^n to 2^(n/2).

i agree that there are a lot of problems, but the goal of this article was to make the topic as simple as possible. any simplification of a complex, mathematical construct will invariably result in hand-waving and undefined terms; i tried my best to keep those to a minimum. i plan to expand the article in the future, to clarify some mathematical details. thanks for the feedback, though.

I suggest emphasizing that problems that can be solved easily can be checked easily. It's a simple idea, but it wouldn't necessarily occur to the casual reader and is quite important.

agreed that the example is not the best, a better example would be a picture puzzle that can be verified in just a glance but to build it you need to test all pieces against each other to see if they match.

Picture puzzle is in P because you can build the puzzle incrementally in quadratic time

I wonder if there is a good way to explain that the nature of these problems is to find any and all exact solutions to the puzzle and that the 95% solution which is vastly easier and entirely sufficient for any practical application (in terms of the given examples) is NOT interesting.

E.g.: Travelling sales-person: In terms of P/NP, she doesn't care if she can as much as if there is any possible route she might take that is even an inch shorter than the one she can fairly trivially plan out.

your claim that a 95% solution is entirely sufficient only applies to numerical optimization problems. every numerical optimization problem has a corresponding decision problem, and an algorithm that could solve these decision problems would be tremendously useful. think about circuit satisfiablility; if you're building an airplane, you want to know if the circuits for the airplane could ever cause the engines to shut off mid-flight. the concept of an approximation algorithm is meaningless here; either you know the answer, or you don't.

Although I understand what your are trying to say here.

But the problems you tend to describe never exist in the real world either. What I mean to say is, you can never be Boolean sure about anything in the real world as you can never control all the parameters yourself.

Which in case you are always going to measure things in percentages like x% safer or y% safer.

I guess that's what the parent poster tends to imply.

you can never control all of the parameters yourself, true, but think of it this way:

the cockpit of an airplane has hundreds of switches and nobs. suppose we want to ask the question "is there some way the pilot could toggle switches or flip knobs that would open the doors in flight?" you're never going to be able to guarantee that the doors won't open, but you can guarantee that the circuit alone won't cause them to open.

or, consider that many games are np-complete. if you are playing such a game against another player, whoever gets the better solution will win.

in addition, the original author's claim that the 95% solution is "vastly easier and entirely sufficient" _may_be true for the numerical optimization problems given in the example, but there are most certainly np-complete problems which cannot be approximated to any degree of accuracy.

for example: given a graph, the max-clique problem asks 'what is the largest fully-connected subgraph?' this problem arises very naturally in social networks; it's equivalent to asking 'what is the largest group of people who are all friends with each other?' This problem cannot be approximated within ANY bound of the optimal solution unless P = NP. The same problem arises in computational biology and chemistry as well.

That's a very good point, but my concern is that the provided examples dumb down issues to a level where this is not easily understood. I mean, who gets closer to grasping this nuance bu supposing a woman building towers of rocks on the beach?

Actually, I think the cockpit-switches-example would be a valuable example.

A picture like this http://en.wikipedia.org/wiki/File:Exponential.svg would help.

To rub it in, the x axis should go to 20 or so, e.g. http://www.wolframalpha.com/input/?i=2**x+vs.+x**2+%28x+from...

This sentence doesn't read quite right to me: "Most mathematicians also believe that are any NP problems that are not P problems."

It feels like there are a couple of words missing, but I'm reluctant to guess at them and potentially change the meaning of the sentence.

I think your subset sum problem (the rock piles) is decent. However, for it to be useful I think you have to make it clear which class it is in (NP) and why (it is easy to check a solution and 2^n is not polynomial in n).

The real difficult part of explaining this concept to the layperson is getting them to understand what asymptotic complexity is. The example I always use when I explain this to people with little or no math or computer science background is sorting algorithms, explained using the following example of a deck of cards. Everybody's familiar with a deck of cards, and it doesn't take much more math than the most intuitive bits statistics.

Imagine I give you a deck of perfectly shuffled (i.e. uniformly distributed) cards, and decree that you need to find the ace of spades. What you have to do is go through each card one-by-one, and stop when you've found it. You'll have to look at, on average, 26 cards, since there are 52 cards in the deck, and each card has an equal chance of being the ace of spades. Now, let's say I add another 52 cards, this time with a defaced ace of spades, and wanted you to find the original (non-defaced) ace of spades, which is now just one card out of 104. On average, this'll take 52 cards.

Notice that when I double the amount of cards I gave you, the amount of work (that is, looking at cards) you had to do also doubles, meaning that for N cards, the amount of work is N/2. When the amount of work is directly related to the number of cards, excluding dividing or multiplying by a number that does not depend on the number of cards (in this example, dividing by 2), then we simplify things and say that this process for finding the Ace of Spades takes BigO(N) work to finish.

Now, imagine I give you that same deck, and instead of demanding that you find the ace of spades, I require you to sort it out into suits, with the cards in each suit in increasing order. One way to do this is to keep a separate pile for each suit, with that pile in increasing order. Deciding where to put the first card of each suit is easy, you just make a new pile for it. The later cards are not as easy, since you have to go through the pile and find the right place to put the new card.

Right away you can tell that this sorting takes more work from you than finding the ace of spades does. Instead of just looking at each card to see if it's the ace of spades, you first have to check the suit of the card to determine which pile it should go in. Then, you have to look through that pile for the proper place to insert the card. I won't go into the math, but it turns out the best way to sort a deck of cards into piles of suits[1] like this takes N * lgN work, where lgN is a fancy number that isn't constant, but gets bigger when N gets bigger, unlike the division by 2 in the ace of spades example. That means that sorting cards is harder than finding the ace of spades.

This business of determining how hard tasks like sorting cards are is called "analyzing the complexity" of a task. Except, instead of saying task, computer scientists like to say "function." The question of whether or not P = NP is a question about whether a certain kind of tasks called "deterministic Polynomial" tasks (which are referred to by the P) are as hard as another kind of tasks called "Non-deterministic Polynomial" tasks (referred to by the NP).

I have a second part that tries to explain the difference between determinism and non-determinism using a magic deck of cards where we can always draw the card that we want (the "magic" deck is, of course, merely a "non-deterministic" deck), but I've already spend an hour or so writing this first part up. I can post the second part of the explanation tomorrow, if folks are interested.

[1]: This is meant to be a memory constraint. Obviously, radix sort takes linear time too if the key sizes are constant, but has a memory overhead of N. As far as I'm aware–and please correct me if I'm wrong–the fastest sorting algorithms with constant memory overhead are comparison-based sorting algorithms, which take N lgN time. But, even if I am wrong, it's incidental to the main purpose of the explanation.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact