The Easiest Hard Problem (2002)

phkahler · on Aug 7, 2015

Here's an interesting partitioning problem: Split the first N primes into 2 groups such that the difference of products is as small as possible but not 1. If this number is less than the square of the Nth prime, it will be prime. For example:

split 2,3,5 into 2 groups: (2,5) and (3) the difference of their products 2x5-3 = 7.

split 2,3,5,7 into 2 groups (3,7) and (2,5) the difference is 11. Note that (2,7) and (3,5) has a difference of 1 so don't go there.

5x11 - 2x3x7 = 13

Is it always possible to create a partition that produces the N+1th prime?

AFAIK this is a problem of my own conjuring, but I also know that most such things already exist and have a name.

nick-parker · on Aug 7, 2015

It's an interesting heuristic, but I think your limitation that it be less than the square of the Nth prime is a case of overfitting.

Here's a list of results for the first N primes, for different Ns: 3: 7 4: 11 5: 13 6: 17 7: 107 8: 41 9: 157 10: 1811 11: 1579 12: 18859 13: 95533 14: 310469 15: 1995293 16: 208303 17: 2396687 18: 58513111 19: 299808329 20: 2933961157 21: 3952306763 22: 33298242781 23: 115405393057

I just wrote a script to brute force all combinations here, so the 24th iteration got very slow, but of all those the nonprime ones are n=23, 19, 18, 16, 14, and 13.

So, your rule appears to hold true as long as N is less than the cube of the Nth prime. The actual rule is probably more complex than a simple power, considering 83^5 is far less than the result in N=23.

I think I'll play with this a bit more this evening then ping some mathematician friends about it. They always love playing with weird properties of primes.

phkahler · on Aug 7, 2015

>> It's an interesting heuristic, but I think your limitation that it be less than the square of the Nth prime is a case of overfitting.

Maybe, but the result is obviously guaranteed to not be a multiple of any of the N primes. Since the smallest factor has to be greater than N, that means prime for up to N^2 (or (N+1)^2 I suppose).

phkahler · on Aug 7, 2015

Confused N for P_n in that.

harryjo · on Aug 7, 2015

What's the largest example you have verified?

phkahler · on Aug 7, 2015

>> What's the largest example you have verified?

I don't recall. I don't think I went beyond the use of 32bit integers (or was it 64) when I first looked at it, so not very large. The intermediate products can get quite big quickly. I do remember using a binary number to represent group membership of each prime, and seeing a pattern that broke down after only a few primes.

phkahler · on Aug 7, 2015

Just had a great idea for exploring this... If you add the logarithms of the primes in a group you can know how big the product will be. The products must be of similar size, and if so you can do the math mod k where k is a power of 2, like 2^64. This might allow some speedy C code without bignum support.

SilasX · on Aug 7, 2015

Great article! This gives an excellent overview of NP-complete problems, the difficulty of constructing hard instances, the critical ratio in determining the solution count, and the analogy to physical systems' phase transitions. Also gives some intuition on why the greedy algorithms can be good enough in pracitce and when they wouldn't be.

vog · on Aug 7, 2015

This reminds me of a challenge at the Free University Berlin. A lecturer gave the following list of numbers to his students. The task was to find two different subsets that have the same sum:

http://www.inf.fu-berlin.de/lehre/WS07/mafi1/90zahlen.txt

His point was: It it simple to prove that those subsets must exist, but it is very hard to compute them. To make his point even more clear, he offered some prize money if anyone found an explicit solution.

A friend told me about that challenge, and I tried. With a lot of conceptual work, programming, optimization and some luck I was finally able find such a solution (and received the prize):

https://njh.eu/90

(Sorry, German language, but the solution is readable, I guess :-))

Strilanc · on Aug 7, 2015

Why is it easy to prove that this particular set of numbers can be broken into equal subsets?

vog · on Aug 7, 2015

You just have to compare the number of subsets with the number of possible sum values.

1) There are 90 numbers, so we have 2^90 subsets.

2) All numbers are < 10^25, so the sum of all 90 numbers is < 90 * 10^25 < 10^27. So each sum of a subset is an integer in the range of 0 to 10^27-1. So we have ≤ 10^27 possible sum values.

Since 2^90 > 10^27, we have more subsets than possible sum values. Hence, there are at least two subsets having the same sum.

BTW, this is a beautiful application of the Pigeonhole principle: https://en.wikipedia.org/wiki/Pigeonhole_principle

crystalmeph · on Aug 7, 2015

Regarding item 1, yes there are 2^90 ways to choose one subset, but since you are comparing two separate subsets, those subsets can't overlap (or else you could just choose the same subset twice!), so it seems like there should be "more" to this calculation? Am I missing something here?

Edit: wait I think I've got it:

Each subset implicitly also chooses it's counter subset, e.g. if you choose the subset consisting of the first 45 numbers, you've also therefore said the other subset consists of the last 45 numbers. Since there are more of these "dual subsets" than possible sums, the pigeonhole principle yadda-yadda.

vog · on Aug 7, 2015

I'm think you misunderstood the question.

You are not forced to use all numbers. (And in fact, my explicit solution doesn't use all numbers.) There is no restriction that the union of both subsets must be equal to the full set. Note that this restriction would change the question dramatically. See also: https://news.ycombinator.com/item?id=10023098

Also note that it is allowed for both chosen subsets to overlap. In fact, the proof just says that two different (i.e. not entirely equal) subsets with the same sum exist.

However, once you have a pair of different overlapping subsets of the same sum, you can simply remove the intersection from both sets. Both sums decrease by the same amount. You then get a pair of two disjoint sets that have the same sum.

harryjo · on Aug 7, 2015

The number of partitions is 2^90 / 2, since each partition has 2 representations. (The rest of the proof still holds.)

vog · on Aug 7, 2015

You misunderstood the question. This is not about partitions. The union of the subsets is not required to be the full set. See also: https://news.ycombinator.com/item?id=10023034

And no, I don't think the proof would not work when requiring the two subsets to form a partition. Also note that for partitions, the number of sums would be one, as the only possible sum is half of the sum of all numbers. (And the problem would be unsolvable if the sum of all numbers was odd, but all numbers were still integers.)

Strilanc · on Aug 7, 2015

Ah, of course.

(I didn't realize you were allowed to use each number in both subsets, so I have no idea if I would have figured it out or not on my own.)

noobie · on Aug 7, 2015

> The sum of all 90 numbers is < 2^89

Why? (Obviously you wouldn't compute it so what's the "trick"?)

SilasX · on Aug 7, 2015

The sum of 90 numbers in this context is considered to be easy (esp relative to finding the sum of many possible subsets).

vog · on Aug 7, 2015

That's true. However, it is also easy to estimate that sum rather than calculating it exactly, so I adjusted my proof accordingly.

unlikelymordant · on Aug 7, 2015

Given the subsets it is easy to verify the subsets come from the set and their sum is the same. Finding them is hard.

vog · on Aug 7, 2015

I'm afraid you missed the point.

skyrockect9 · on Aug 7, 2015

The first heuristic algorithm that jumps to my mind to solve this problem goes like this: First sort the numbers into a list. Second take elements from the head of the list to construct a subset A. Third take elements from the tail to construct a subset B The second and third steps are done in such a way that the sum of A and B are nearly equal. Then one can hope that this procedure make the elements that are still in the list to be near identical and so allow you to construct nearly identical sums.

skyrockect9 · on Aug 7, 2015

The hope is that the remaining elements in the list l are similar in the sense that max(l)/min(l) is small, the heuristic could be improved one you try to make a trade off between max(l)/min(l) in the list and the difference between sum A and sum B.

deepnet · on Aug 7, 2015

Stephen Mertens Paper * The Easiest Hard Problem: Number Partitioning* is a good review paper of the subject covering the phase transition between P and NP this problem exhibits and extending it to k-partitions.

http://arxiv.org/abs/cond-mat/0310317

bit-player · on Aug 7, 2015

PDF version here: http://bit-player.org/bph-publications/AmSci-2002-03-Hayes-N... (from the author's website)

malnourish · on Aug 7, 2015

At first his language was a little confusing -- when I saw "set" and then saw two 2s listed in his set. Are "set" and "multiset" often this interchangeable?

vog · on Aug 7, 2015

No, they aren't. This is just sloppy wording in the article. The concept of sets is deeply established in mathematics and when you talk about multisets rather than sets you have to state this explicitly.

malnourish · on Aug 7, 2015

That's what I thought -- it was what I was taught in school. But I was also taught that no one can agree on terminology.

harryjo · on Aug 7, 2015

In this, case, it's trivial to word the problem a "sets of objects, each with integer size, partition into sets of equal sum size". It is still polite to state that explicitly outside of the sample data, though.

ChrisArgyle · on Aug 7, 2015

Very enjoyable read, thanks OP.

Even though my math muscles haven't been exercised in a decade this article was easy to grasp.