
Helping a stranger, and why you should understand NP-complete - lifebeyondfife
http://lifebeyondfife.com/helping/
======
thaumasiotes
> NP-complete is essentially shorthand for belonging to a class of problems
> that are computationally very hard

It's totally correct, but it always bothers me a little to see people talk
about NP in those terms; much, much more difficult problems are known.
Recreational Wikipedia reading taught me a while ago that the regexp
equivalence problem ("do these two given regular expressions define the same
language?") is EXPSPACE-complete, where EXPSPACE contains all the problems
that:

\- Require additional space bounded above by an exponential function in the
input.

\- May require an amount of time up to doubly exponential in the input size
(bounded, but astronomically large).

That second point is a doozy, and the problem is so accessible that I'm fairly
sure many people have independently come up with it and wished for a solution
(for instance: do (a|b)* and (a*b*)* define the same language? They do; both
describe every string over {a, b}). That's not going to happen, though; a
solution would be good for much more than just deciding whether two regexps
are equivalent. Any NP-complete problem is utterly trivial by comparison.
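
A brute-force check over short strings is easy to write, but it can only
refute equivalence, never confirm it. A toy sketch in C using POSIX regexes
(the patterns and the length bound are my own illustrative choices):

    
    
      #include <regex.h>
      #include <stdio.h>
      
      /* Compare two regexes on every string over {a,b} up to MAXLEN.
         Agreement on all short strings proves nothing in general, which
         is part of what makes the real decision problem so hard. */
      #define MAXLEN 10
      
      static int matches(regex_t *re, const char *s) {
          return regexec(re, s, 0, NULL, 0) == 0;
      }
      
      int main(void) {
          regex_t re1, re2;
          regcomp(&re1, "^(a|b)*$", REG_EXTENDED);  /* all strings over {a,b} */
          regcomp(&re2, "^(a*b*)*$", REG_EXTENDED); /* same language, written differently */
          char buf[MAXLEN + 1];
          for (int len = 0; len <= MAXLEN; ++len)
              for (long i = 0; i < (1L << len); ++i) {
                  for (int j = 0; j < len; ++j)
                      buf[j] = ((i >> j) & 1) ? 'b' : 'a';
                  buf[len] = '\0';
                  if (matches(&re1, buf) != matches(&re2, buf)) {
                      printf("counterexample: \"%s\"\n", buf);
                      return 1;
                  }
              }
          printf("no counterexample up to length %d\n", MAXLEN);
          return 0;
      }
    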

~~~
vph
>> NP-complete is essentially shorthand for belonging to a class of problems
>> that are computationally very hard

That is actually incorrect, and is a common and serious misunderstanding among
beginners. NP-complete problems are not hard to solve (e.g. they yield to
brute force). They are, however, very hard to solve _efficiently_ (i.e. in
polynomial time).

~~~
reverius42
I think "computationally very hard" means exactly what you say NP-complete
problems are ("very hard to solve efficiently"). Typically when one talks
about the computational difficulty of an algorithm, they are talking about the
computational resources it takes, not the amount a human has to think to come
up with the algorithm.

------
jdmitch
_Think of NP-complete problems as if they were a brick wall like the red plot
in the above graph. Understanding the theory behind the NP-complete complexity
class means recognising intractable problems in your code – it’s like having a
bomb sniffer dog when walking through a minefield._

Very helpful analogy - hadn't thought of it like that. Now I'll know whether
the seemingly intractable problems in my code are only temporarily intractable
due to my incompetence, or actually intractable for grander reasons :)

------
fjwolski
There are a few other things you can do to possibly speed up the second code
snippet by reducing dimensionality:

1) recognize q1 * q2 = 2^p1 * 2^p2 = 2^(p1 + p2), so instead of iterating over
q1 then q2, iterate over p = p1 + p2

2) memoize some of the combinations; e.g. if you iterate over p, r1 & r2, then
you know that all you're trying to find is two integers f1 & f2 whose product
equals output * r1 * r2 * 2^p / input; so before you start solving the
problem, calculate table T:

    
    
      int T[256 * 256 + 1] = {0};   /* T[x] = a factor f1 of x, or 0 */
      
      for (int f1 = 1; f1 <= 256; ++f1)
        for (int f2 = 1; f2 <= 256; ++f2)
          T[f1 * f2] = f1;          /* record that x = f1 * f2 is achievable */
    

Now, to check whether an integer pair (f1, f2) exists, you can just check
whether T[output * r1 * r2 * 2^p / input] is non-zero. Also, output * r1 * r2
* 2^p has to be divisible by input, so after you iterate over p & r1, you can
iterate only over values of r2 that are multiples of input / gcd(input,
output * r1 * 2^p).
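
Put together, the check for one (p, r1, r2) combination might look roughly
like this (a sketch with my own variable names; it tests divisibility directly
rather than striding r2 by the gcd):

    
    
      /* With T built as above, test one (p, r1, r2) combination. */
      long long num = ((long long)output * r1 * r2) << p;  /* output * r1 * r2 * 2^p */
      if (num % input == 0 && num / input <= 256LL * 256) {
          int prod = (int)(num / input);    /* candidate value of f1 * f2 */
          if (prod >= 1 && T[prod] != 0) {
              int f1 = T[prod];             /* the stored factor */
              int f2 = prod / f1;           /* its cofactor */
              /* (f1, f2) completes a solution */
          }
      }
    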

There are possibly some more tricks to be found.

~~~
bazzargh
Another way to reduce the number of loops:

    
    
              for (int q2 = 1; q2 <= 256; q2 <<= 1)
                if (output * r1 * r2 * q1 * q2 == input * f1 * f2)
    

This doesn't need a loop. Set q2 directly and test if it is a power of 2.

    
    
         int n2 = input * f1 * f2;
         int d2 = output * r1 * r2 * q1;
         int q2 = n2 / d2;      /* candidate q2, computed directly */
         /* valid iff the division was exact, q2 is a power of two (one 1 bit),
            and q2 respects the original loop's upper bound */
         if (q2 >= 1 && q2 <= 256 && __builtin_popcount(q2) == 1 && q2 * d2 == n2) {
    

... taking advantage of the fact that calculating the Hamming weight of q2
doesn't need a loop (a positive integer is a power of two exactly when its
Hamming weight, i.e. its number of 1 bits, is one).

[http://en.wikipedia.org/wiki/Hamming_weight](http://en.wikipedia.org/wiki/Hamming_weight)

------
ColinWright
Here is one case where the original title would be significantly better than
the title given:

    
    
        Helping a stranger (and why you
        should understand NP-complete)

~~~
lifebeyondfife
I ummed and ahhed over it. Really it's two posts, one about helping someone
out and another about NP-complete problems in the wild. Thought I'd go with
the Hacker News angle for Hacker News (hopefully to encourage / keep up the
helping).

~~~
p4bl0
Well, for me the _Hacker_ News angle would have been the part about NP-
complete problems ;-).

------
kfk
_You have to be knowledgeable about a domain that can’t easily be simplified
by series of web searches, and more importantly, there has to be a suggestion
that someone might be seeking that clarification._

You wouldn't believe how difficult it is to offer help if you are not a
professional programmer of some sort (on HN and outside it). Even for open
source projects. Even if you try to provide insights from your "professional"
job that, in theory, is full of "startup potential" (finance, say, where we
are still using MS Access and messy spreadsheets).

------
gms7777
> They can tell you that the most efficient algorithm for ordering
> (randomised) data is Quicksort, which has a Big-O notation of O(nlogn)

I know this is nitpicky, but quicksort is, worst-case, O(n^2), meaning a large
number of comparison-based sorting algorithms are faster than it. It's in the
average case that Quicksort wins.

~~~
lifebeyondfife
That's why I said "(randomised)". The worst case occurs when the list to be
sorted is already ordered and you select the head of the list as the pivot
point.
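
Picking the pivot at random sidesteps that case; a minimal sketch (generic
code, not from the post):

    
    
      #include <stdlib.h>
      
      /* Swap a random element into the pivot position so that an already
         sorted input no longer triggers the O(n^2) worst case. */
      void choose_pivot(int a[], int lo, int hi) {
          int p = lo + rand() % (hi - lo + 1);
          int tmp = a[lo]; a[lo] = a[p]; a[p] = tmp;
      }
    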

Even though other algorithms such as Mergesort also run in O(nlogn), Quicksort
is normally the preferred implementation because it's relatively easy to do
in-place and is generally the most efficient[1] of all the sorting algorithms.

[1]Not my field of expertise though, happy to be told I'm wrong.

~~~
gms7777
You're definitely not wrong, it's generally the fastest. "Efficient" is a bit
of a loaded word in these contexts, and I feel as though when discussing
algorithms, unless otherwise specified, the assumption is that you're
discussing _worst-case_ efficiency (because while worst case isn't always the
most useful metric, it is usually the most easily defined and derived one).
And randomizing data doesn't necessarily guarantee that you won't still land
on a worst-case starting order (as is the nature of randomness). As I said
though, that is an entirely nitpicky point.

~~~
tubbzor
> And randomizing data doesn't necessarily guarantee that you won't still
> land on a worst-case starting order (as is the nature of randomness).

This is the beauty of randomized analysis of the worst case. The worst case
occurs if and only if the random generator spits out a sorted list. If all
permutations are equally likely, a list of n elements has probability 1/n! of
coming out sorted, which is the only way to hit the O(n^2) behaviour.

Even in practice this is very pessimistic, as the worst case occurs with
probability 1/n! and is therefore extremely rare; for n = 20 that is already
roughly 4 * 10^-19.

------
picomancer
This is actually a number theory problem. You want to have

    
    
        input * A = output * B         (1)
    

where

    
    
        A = f1 * f2                    (2)
        B = r1 * r2 * q1 * q2          (3)
    

where q1, q2 are powers of two.

Any good elementary number theory book will tell you that the smallest
solution to (1) is to let M = lcm(input, output) and set A = M / input, B = M
/ output. Given the further constraint that A <= 2^16, you can compute offline
a 2^16 element lookup table which gives values of f1, f2 in the desired ranges
for each possible value of A. You can re-use the same table to go from your B
value to r1, r2 if you first pull out as many powers of two as you can and put
them in q1, q2.
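
A minimal sketch of that computation in C (the sample input/output values are
mine, purely for illustration):

    
    
      #include <stdio.h>
      
      /* Euclid's algorithm. */
      static long long gcd(long long a, long long b) {
          while (b != 0) { long long t = a % b; a = b; b = t; }
          return a;
      }
      
      int main(void) {
          long long input = 44100, output = 48000;           /* illustrative values */
          long long M = input / gcd(input, output) * output; /* lcm(input, output) */
          long long A = M / input, B = M / output;           /* smallest solution to (1) */
          printf("M = %lld, A = %lld, B = %lld\n", M, A, B);
          /* input * A == output * B holds by construction */
          return 0;
      }
    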

------
danbruc
The first point of your definition of NP-completeness is backwards: every
problem in NP must be (polynomial-time) reducible to the problem in question,
not the other way round. The paragraph following the definition has it
backwards, too.

------
saosebastiao
Great post, I really enjoyed it. I wish there were more people with your skill
set in my realm of expertise... it is amazing how many problems could be
solved if they were identified, classified, and attacked methodically the way
you just showed (instead of with loosey-goosey hand-cobbled code). I can think
of a few huge problems (big $$$) in my org that could be optimized really well
if we had more people who could understand and apply constraint programming.

~~~
eru
You might want to check out Zimpl. It's a language that makes it really simple
to formulate mixed linear programming problems. (Mixed here means both integer
and continuous variables.)

