A Field Guide to Genetic Programming (2008) [pdf] (metabiblioteca.org)



A long time ago a lone ricer experimented with this in Gentoo to find the best compiler flags:

[1] https://web.archive.org/web/20110312082043/http://www.coyote...

[2] https://github.com/Acovea/libacovea

[3] http://swmath.org/software/28819

[4] https://ieeexplore.ieee.org/document/5380987

I think he was way ahead of his time back then.

Nowadays? Not so much: integrate it with all sorts of fuzzers, ccache, distcc, whatever, and let the "compile cloud" churn.


This looks like a follow-up to the 2002 "Foundations of Genetic Programming", by the same authors, Langdon and Poli.

I've found Genetic Programming to be a powerful technique when the solution space is itself an algorithm.

ML research has moved on to deep learning, but I've always found that, coming from a CS background, GA/GP is a much more intuitive mechanism.


FoGP is more a theory book. This is designed to be an accessible hands-on guide.


The discussion of these things is so weird.

The basic genetic programming algorithm, according to the text:

    1: Randomly create an initial population of programs
       from the available primitives (more on this in
       Section 2.2).
    2: repeat
    3:   Execute each program and ascertain its fitness.
    4:   Select one or two program(s) from the population
         with a probability based on fitness to participate
         in genetic operations (Section 2.3).
    5:   Create new individual program(s) by applying
         genetic operations with specified probabilities
         (Section 2.4).
    6: until an acceptable solution is found or some other
       stopping condition is met (e.g., a maximum number
       of generations is reached).
    7: return the best-so-far individual
By making fairly reasonable variations on the concepts of fitness and "genetic operation", it seems like one could recover simulated annealing, stochastic gradient descent, particle filters, and a host of other standard algorithms. Remember, they say "select one or two program(s)" -- selecting one and mutating it is basically any Monte Carlo algorithm, right?
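
To make that concrete, here is a minimal sketch of the loop above (Python; init, evaluate, mutate, and crossover are hypothetical placeholders, not anything from the book). Notice that the choices made in steps 4-5 are exactly what decides which "standard algorithm" you end up with:

    import random

    def evolve(init, evaluate, mutate, crossover,
               pop_size=100, generations=50):
        # Step 1: random initial population
        pop = [init() for _ in range(pop_size)]
        best = max(pop, key=evaluate)
        for _ in range(generations):                    # steps 2-6
            # Step 3: evaluate fitness
            ranked = sorted(pop, key=evaluate, reverse=True)
            if evaluate(ranked[0]) > evaluate(best):
                best = ranked[0]
            # Step 4: selection biased by fitness (truncation here)
            parents = ranked[: pop_size // 2]
            # Step 5: genetic operations with set probabilities
            pop = []
            while len(pop) < pop_size:
                if random.random() < 0.9:               # crossover
                    a, b = random.sample(parents, 2)
                    pop.append(crossover(a, b))
                else:                                   # mutation
                    pop.append(mutate(random.choice(parents)))
        return best                                     # step 7

Collapse the population to a single individual and keep only mutation, and essentially the same skeleton becomes stochastic hill climbing -- which is the point.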

Which is to say GP seems more like a mix-in that can be added to whatever algorithm (add some binary operators to the unary operators most algorithms effectively use to evolve). Yet GP is spoken of as an alternative to all the other approaches, something wholly different. Plus, what is spoken of as "genetic programming proper" is this basic loop plus a bunch of less explicit but strongly assumed approaches.


Genetic programming is not an algorithm. It is a community of researchers interested in applying stochastic optimization / metaheuristics methods to discover and/or optimize computer programs and other computational devices.

The methods applied can be fairly broad, but not every Monte Carlo algorithm will work. Because of the nature of the problem space (computer programs), it's less common to see methods which assume the existence of a gradient in the space, or that the space is metric. More often you need more general-purpose, "weaker" techniques. You see simulated annealing and hill climbing, various combinatorial optimization techniques (ant colony optimization, say), etc., but even now the dominant approach is to use evolutionary computation methods, because they make few assumptions about the nature of the space beyond a smoothness criterion, and because they are highly parallelizable.


> Genetic programming is not an algorithm. It is a community of researchers interested in applying stochastic optimization / metaheuristics methods to discover and/or optimize computer programs and other computational devices.

-- Which is to say that it can be formally just about anything, but is knit together by various hunches and analogies.

This sort of approach has a certain strength through flexibility. I would offer that a fully formally defined system has a different strength: it can be meta-analyzed -- propositions about what the entire system is capable of can be proven (for example, see the universal approximation theorem for neural networks).

Also, neural networks today have settled on using "relu" layers, which are basically anything but smooth (continuous, but only piecewise-linear, functions), and gradient descent works fine.


Basically, you use GP as a last resort, when you don't have any knowledge of the actual problem space that would help you make better guesses from previous ones.

This is also usually the default situation, because often we really don't know anything about the problem space.

Of course, if you do know something about the problem space, in the form of a gradient or a probabilistic graph, then you use those things -- and you won't be doing GP anymore.


No, you are missing the thing that sets GP apart from GAs and other metaheuristics. It is the P -- GP is about creating programs.


Genetic programming works basically the same as other genetic algorithms, except that the individuals are variable-size programs, whereas a GA uses fixed-length values like a vector.
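
A toy illustration of that difference (Python; the tuple encoding is made up for the example):

    import random

    # GA individual: a fixed-length vector of values.
    ga_individual = [random.uniform(-1, 1) for _ in range(8)]

    # GP individual: a variable-size program -- here the expression
    # x * (x + 1) as a nested tuple. Mutation and crossover swap or
    # regrow whole subtrees, so individuals change size and shape.
    gp_individual = ("mul", "x", ("add", "x", 1.0))

    def eval_tree(node, x):
        """Recursively evaluate an expression tree at the point x."""
        if node == "x":
            return x
        if isinstance(node, (int, float)):
            return node
        op, left, right = node
        a, b = eval_tree(left, x), eval_tree(right, x)
        return a + b if op == "add" else a * b

    print(eval_tree(gp_individual, 3.0))  # 3.0 * (3.0 + 1.0) = 12.0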


Somewhat related, I've just been reading a summary [0] of a lecture by physicist/mathematician Stephen Wolfram on his quest to find the Program That Runs The Universe.

An excerpt:

"Wolfram goes off onto something else. He talks about searching through the space of possible computer programs, for ones of a certain type that can solve problems, rather than the having to go through the labor of humans programming something for a specific task. His meaning is more than a little unclear here. Wolfram says we're constrained to follow historical patterns of engineering, to only create things such that we have designed them with an understanding of what they will do. Nature, he says, doesn't have this constraint. Alright, this make more sense. He's talking about mining all potential programs, including useful ones that we ourselves might not have thought of. That's quite an idea, and sounds like no small task to actually perform it.

Now he's bringing it in the direction I thought he would — comparing the search among the space of possible programs to natural selection, which is the natural equivalent of searching the space of possible biological programs. Now Wolfram is talking about whether we can search for our physical universe among the space of possible universes, and he's going still another level "meta."

--

etc etc

But doesn't a search in general program space almost immediately result in a combinatorial explosion?

How on Earth are we supposed to find these programs that Wolfram dreams about?

[0] http://futurisms.thenewatlantis.com/2009/10/stephen-wolframs...


Well, GP is not random search, but still, the space is too huge for it to find anything of significance yet. I remember reading a paper that used GP to evolve a sorting program; that is around the most complex problem we can solve with it now.

In my opinion, what makes natural selection work but not GP is that the lifetime of Earth and the number of individuals that have existed vastly exceed the typical parameters of a GP simulation. Biological programs are also less likely to be killed off by a mutation compared to GP programs, so in some ways, the gradient is a lot smoother.


There are lots of algorithms for optimizing the search of a space. If you have a goal in mind, it's possible to come up with a strategy that will get you there faster. Think breadth-first search vs. A*.


The terminology is bad. Don't confuse "search and optimisation"-type search with "pathfinding"-type search. A-star is the latter, but GP (and other metaheuristics like GA) are the former. To be clear -- if you're looking for eg a good exam timetable, you just need a good timetable -- you don't need a path from some other point in the space to that timetable.
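
A tiny sketch of the distinction (Python; score and neighbor stand in for whatever the timetabling problem defines):

    def local_search(start, score, neighbor, steps=1000):
        """Optimisation-type search: returns a good point, not a route."""
        best = start
        for _ in range(steps):
            candidate = neighbor(best)
            if score(candidate) > score(best):
                best = candidate
        return best  # e.g. a good exam timetable

    # A pathfinding-type search such as A* would instead return the
    # sequence of states leading from a start node to a goal node.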


From personal experience, I've always had better results from Simulated Annealing than Genetic Programming. I hasten to point out that it's entirely possible I've been doing GP poorly - but part of the problem seems to me that GP has so many more knobs/dials/parameters that need to be tweaked correctly in order to yield good results.
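
For comparison, a minimal simulated-annealing sketch (Python; energy and neighbor are problem-specific placeholders). Note that it has essentially two knobs -- starting temperature and cooling rate -- versus GP's population size, selection scheme, operator probabilities, tree-depth limits, and so on:

    import math
    import random

    def anneal(state, energy, neighbor,
               t0=1.0, cooling=0.995, steps=10000):
        best = state
        t = t0
        for _ in range(steps):
            candidate = neighbor(state)
            delta = energy(candidate) - energy(state)
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / t), which shrinks as t cools.
            if delta < 0 or random.random() < math.exp(-delta / t):
                state = candidate
            if energy(state) < energy(best):
                best = state
            t *= cooling
        return best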


Are you referring to GA or GP? GP is not comparable to simulated annealing, as it's kind of non-parametric.

Many ideas of GP are getting reused in Bayesian program induction, in conjunction with differentiable programming, SAT solvers, etc [1]. IMHO a very promising route to AGI.

[1] https://web.mit.edu/ellisk/www/documents/dreamcoder_with_sup...


These are orthogonal things. Simulated annealing is an algorithm. Genetic programming is a pursuit, which can (among other things) use simulated annealing to achieve its goal. Are you sure you're not confusing this with something else? Did you mean Koza-style GP perhaps? Or did you mean a genetic algorithm (GA)?


Results on what? The effectiveness of an algorithm depends on the problem you target.

Edit: Also, if you broaden the definition just a bit, you could call simulated annealing a kind of genetic program.


Do you have any good references for SA? Thanks!


Just read Wikipedia. [0] is also a good high-level overview of the different algorithms in metaheuristics.

[0]: https://cs.gmu.edu/~sean/book/metaheuristics/Essentials.pdf


Reminder that Karl Sims produced some interesting results with Genetic Programming:

https://karlsims.com/evolved-virtual-creatures.html

https://karlsims.com/papers/alife94.pdf


Mind blowing and vastly ahead of their time.


In my opinion it's the closest we've come to "artificial intelligence" and probably the only true future path -- that of course only applies if AGI is possible, which is still in dispute.


I've never managed to produce a useful function using GP, but have found that in some instances GAs can outperform both simulated annealing and the great deluge algorithm. Adding random mutations to the process can be especially useful when there are local minima/maxima that would trap e.g. a GDA before it finds the global minimum/maximum.


I can’t speak to GA functions, but GA antenna synthesis has been very successful. Probably because the number of decision variables can be tightly constrained.



Obligatory note: Genetic Programming is pretty much dead, except for niche applications (antenna design, fun animations of robots trying to walk). It's been almost 100% replaced by deep learning for all production applications and cutting-edge AI research.

So if you're new to the field, I would treat this as an interesting "what if", a vision of the direction AI could have taken... if it had ended up working. But not as a direction to invest a lot of your time in hopes of it paying off in a job.

(And of course, maybe someone will make a breakthrough which rejuvenates the field. If you want to have fun hacking with it, I would never discourage you! Just want to emphasize that it's not the current state-of-the-art, and doesn't _look_ like the future of AI, as far as we can tell.)


Super wrong.

Genetic programming has been expanded and generalized and is now a core aspect of many AutoML systems.

As another commenter mentioned, when the solution space consists of algorithms, programs, or even just functions, GP is still useful.

I am using ideas inspired by, and aligned with, GP in my work today on anomaly detection for multivariate time series data.

Deep learning is not the be-all, end-all just because it outperforms other algorithms on a small subset of possible data types (primarily images, and to a lesser extent, audio).


My PhD student is doing GP for (univariate) time-series one-class classification.


Also text -- i.e., GPT-3.


GP has dedicated conferences and journals, hardly dead.

One interesting direction is the mixture of traditional ML and GP, and some of the “deep GP” variants that combine some of GP’s strengths with some of DL’s.


Just arrived in my inbox: IEEE Software CFP for special issue on program repair. The editor is a GP person.

https://www.computer.org/digital-library/magazines/so/call-f...


Would you mind explaining why it didn't take off?

I imagine it's a problem of domain, but something definite would be nice, as I'm having trouble finding anything.

Thanks


GP hasn’t progressed much because the problem is super hard. In general, how do you search program spaces in a way that gives you smooth gradients to search, or even operators that have nice properties? Turns out semantic GP (a relatively new direction) can give you nice properties but it has other flaws.

Furthermore, how do you know if your program is "correct"? A programmer may use test cases, and GP does the same, but a programmer doesn't just write code to pass test cases.

DL code-writing methods are fine if you just want to generate a bit of code for something that pretty much already exists, but GP is supposed to find completely new solutions to hard problems... so you run into this issue of "what does it mean to solve this problem?"

I also think it hasn’t received the funding or talent it needs to progress. Probably something will come out of the field that suddenly sparks interest and it will take off again.


>> GP hasn’t progressed much because the problem is super hard. In general, how do you search program spaces in a way that gives you smooth gradients to search, or even operators that have nice properties?

That's where coding theory comes in. I did some work in this area over 20 years ago during my PhD. The natural genetic coding is a very clever thing - it doesn't code the organism directly - it codes the instructions for building an organism. So naturally, there is a hierarchy of sites on the genome. The sites that code for the earliest instructions have the most impact on the final form, sites which are used later have smaller impacts. So as a human embryo develops, it only differentiates itself from a fish embryo, or a mammal embryo or a primate embryo by stages.

So the sites used at the earliest stages of phenotype development have the biggest impact on the shape of the overall fitness landscape, with smaller impacts from sites used later. This gives rise to a fractal-type fitness landscape -- which has sufficient smoothness to enable the genetic algorithm to explore efficiently.

So if you want to make GP or any other type of GA effective, you need to think about your coding schema - coding schemas that build solutions through a process rather than those that directly code the solution are more likely to give rise to suitable fitness landscapes.
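
A minimal sketch of such a developmental encoding (Python, purely illustrative): the genome is a sequence of build instructions applied in order, so earlier genes reshape the whole phenotype while later genes make only local tweaks.

    def develop(genome, size=16):
        """Grow a phenotype (a vector) by executing genome instructions."""
        phenotype = [0.0] * size
        scale = 1.0
        for op, value in genome:
            if op == "scale_all":        # early, global instruction
                scale *= value
                phenotype = [v * value for v in phenotype]
            elif op == "set_region":     # mid-level instruction
                start = int(value * size)
                for i in range(start, size):
                    phenotype[i] = scale
            elif op == "tweak":          # late, local instruction
                phenotype[int(value * size) % size] += 0.1 * scale
        return phenotype

    # Mutating the first gene changes everything downstream; mutating
    # the last gene nudges a single site -- a hierarchy of impact.
    genome = [("scale_all", 2.0), ("set_region", 0.5), ("tweak", 0.25)]
    print(develop(genome))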


>antenna design

you are confusing genetic programming with genetic algorithms



