Hacker News new | past | comments | ask | show | jobs | submit login
Genetic Algorithm for Knapsack Problem (kataklinger.com)
45 points by signa11 on June 19, 2013 | hide | past | web | favorite | 27 comments



Nitpicking, I know, but I think the author has misinterpreted the XKCD strip. IMO, " - as fast as possible, of course. Want something on the Travelling Salesman?" is not a continuation of the client's request (Give me set of appetizers, fulfilling these requirements, as quickly as possible), but rather a response to the waiter ("I have to see to these other tables" - "and you have to do so as quickly as possible"). This makes the mention of "Travelling Salesman" more appropriate (since it deals with optimizing travel time for certain visits).

Am I misinterpreting the dialogue?

(Not that it matters, of course!)


Every NP problem can be converted in another NP problem, so you could probably try to convert your subset problem to a TSP problem (but it wouldn't be useful in practice)


Nitpicking: In general, an NP problem can be reduced to an NP-hard problem (in poly-time).


No, you're not misinterpreting. The character is drawn holding a bunch of papers, and he's apparently offering a paper on Traveling Salesman to be "helpful" to the waiter who has 6 tables to visit "as fast as possible".

So, in a way, this entire solution is the product of an error in reproduction. :)


Huh, I didn't get that, but I think you're right. deep :-)


You're right, its actually the sum of subsets problem


I'm mildly annoyed that they don't actually post the solution their algorithm found.


In this case, since the possibilities are fairly small, you can solve this pretty quickly anyways. I wrote a very naive algorithm in ruby and it solved it it in 7 seconds.

  # Initialize it with the most we can need of any individual numbers, plus the maximum possible zero's we could use
  # (Just picked 5, because it can't take more than 7, and we have at least 2 of everything. I know there are better ways of doing this, but I was just hacking something up quickly).
  x = [2.15,2.75,3.35,3.55,4.2,5.8].map{|n| [n]*(15.05/n).to_i}.flatten + [0]*5
  # Find all of the combinations that add up to 15.05
  x.combination(7).select{|c| c.inject(:+)==15.05}.uniq
There are two answers to this.

  [2.15,2.15,2.15,2.15,2.15,2.15,2.15]
  [2.15,3.55,3.55,5.80]


from what I understand from Koza about genetic programs, it will be indistinguishable from garbage anyway.

That's one of the problems with Genetic Programming in general - you might get a specific program optimised to a specific problem, but you never learn anything about the problem space from it.


I think he means the best combinations of dishes found.


Genetic Programming is a slightly different thing than a Genetic Algorithm, a subclass, really. I didn't read the code too closely on account that C++ template code makes me want to dig my eyes out, but he said he was doing a GA. A GP is a GA where the genes represent Abstract Syntax Trees and the alleles represent instructions.


Nitpick: there's actually no template code except for passing a few type parameters to the GA library, no different from using ArrayList<MyClass> in Java. However, there's a massive overuse of namespaces and fully-qualified class names - a few using directives would make the code much more legible.


Oh, so this blogger is misusing the term (I scanned a little way down the article and thought I'd concluded he was doing GP). In that case this is much less interesting!


I don't get the need for the GA library here. It doesn't seem to have simplified the problem at all. This code looks several orders of magnitude more complex (or at least just ugly) compared to a hand-written GA that just used std::vectors as the genes.


How does this compare to a the typical branch-and-bound or dynamic programming solution?


It is very likely worse. GAs and ESs are good only if you can't come up with a dedicated heuristic that understands the structure of the problem it aims to solve.


For this specific problem, if it's allowed to have several items of the same type, the solution is easily noticeable while modelling dynamic programming problem, without writing any code.


Can you use the word 'solve' as they do in the article? As this only ever gives a best-found-so-far answer, is that 'solving' it?


Also without knowing the details of the library or the implementation given's interaction with said library whether or not this is capable of getting stuck on local maximas and minimas (or perhaps already is).


In reality this is a subset sum problem (unless I overlooked weights on the appetizers).

Rather than using the GA for this, there are extremely efficient dynamic programming methods available (in pseudo-polynomial time). References on the Wikipedia page (https://en.wikipedia.org/wiki/Knapsack_problem).

Since GA is a metaheuristic, it is really great if you don't know how to tackle a problem more efficiently. But like most metaheuristics, not really an ideal method for optimization.

To get a better visual sense of what GAs do wrong, take a look at: http://www.demo.cs.brandeis.edu/pr/buildable/crane/

You can see that the bridges and cranes have redundant bricks, are not straight, and just generally messy. However, GA gets the job done.

There is a lot of enthusiasm over GAs, but it is just a refined form of random search, and not that different from Simulated Annealing.


Are you taking Discrete Optimization in coursera ( https://class.coursera.org/optimization-001/class/index )?


In my experience, genetic algorithms with a known target goal (i.e. your fitness function is "how far away from the target is this gene?") are no better than hill-climbing. The problem is that allele sharing is apparently useless (I have never witnessed two wrong parents produce a right child, over numerous experiments with GAs. I get the theory behind it, it just doesn't seem to happen in the real world) and mutation--in order to provide enough variation to find the answer in a short period of time--needs to be nearly 100%, at which point you might as well just iterate. With the propensity for genetic algorithms to fall into local minima and maxima and the time it takes to tweak all of the parameters of mutation and sharing and number of parents, etc., you might as well have just hill-climbed the problem and gotten just as good of an answer in a quarter of the time.

Genetic algorithms are useful if you have three conditions: your parameter range is over a practically-infinite space, you know that hill-climbing will end in a local min/max, and your fitness function is open-ended and relative to previous generations, not an end goal. GAs can sometimes get out of local min/max on their own, so they offer a slightly better chances than randomly distributing seeds in a hill climbing algorithm, and a very large parameter range favors mutation and allele sharing over directed hill climbing. Finally, lacking a known end point, you may actually be interested in the intermediate results of the GA, rather than just the end result.

Anyway, GAs are completely worth studying, because they are fascinating tools for problem solving for the right class of problem, just don't make the same mistakes I did early in my playing that they are a general problem solving algorithm.


Since GAs are analogue of biological natural selection/evolution, I will explore that a little:

Evolution didn't build humans by starting with the problem of 'build a human' - it branches into solving various local niche problems simultaneously (primarily motivated by, how to replicate self). See also: Darwin's finches.

I would assume that biological evolution also spends a lot of time stuck in local maxima with short periods of volatile change as a new local maximum is identified (this hypothesis fits perhaps conveniently with the fossil record).

So natural selection is excellent for blindly solving problems with progressive goal(s) - which are rarely of the type identified by pure computer science. But, hard optimization problems with an open-ended solution could be well-suited.

A problem such as this where a near-miss answer doesn't get you any closer to the correct answer makes it difficult or impossible to identify relative genetic fitness - evolution is more likely to get stuck in an ineffective local maximum. Evolution may be slower at solving this problem than pure random guessing.

Interestingly, the article introduces the aspect of preparation time, which is not exactly what I understood from the xkcd comic. I believe this would allow an artificial escape route for the algorithm, since it's less likely to get stuck in a wrong local maximum considering price alone.

tl;dr: Agreed, biological evolution doesn't seem to effectively solve a yes/no problem like this.


I saw a phenomenal talk in grad school about why genetic algorithms are generally not as good as stochastic hill climbing. Unfortunately, I can't track down any references to it.

The punch line was that they mathematically proved that genetic algorithms produce less optimal solutions than hill-climbing.

So why are "genetic algorithms" so common in nature? Well, it turned out that genetic algorithms were better at producing solutions that were resilient to changing environmental conditions. They produced a less sharp peak at the solution they did find, so when a change occurred in the underlying conditions, the still had a reasonable chance of being at least ok.


Yes, my personal experience bares that out. If one has the notion of trying a genetic algorithm, one should first try hill-climbing. The GA is useful insomuch as the intermediate results are interesting.


The validity of your statement pivots on the domain in which you are applying GA. I do not think your three conditions cover the the following case.

In a previous life, we applied GA to finding defects in design and implementation. The genes were atomic actions one could take with the product. Chromosomes were scenarios. In this case two valid scenarios can be crossed to expose a defect. The system was monitored while the scenario was processed, and fitness was measured as whether or not a defect was detected.

One excellent side-affect of this process was that when defective scenarios were used to seed population, we received convergence to a minimum reproduction.


Well, I felt inspired to do my own GA program http://wedusc.com/ga/

with a discussion thread here: https://news.ycombinator.com/item?id=5909829




Applications are open for YC Summer 2020

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: