
Deep Representation Learning with Genetic Programming (2018) [pdf] - henning
https://ccc.inaoep.mx/archivos/CCC-17-009.pdf
======
OkGoDoIt
Just read through most of the paper. It's basically an attempt to build an
autoencoder feature extractor using genetic programming, by having multiple
levels of feature extractors, somewhat similar to how convolutional neural
networks look for basic features (edges) at a low level and then complex
features (eyes, etc.) at higher levels. They make a point of only using
arithmetic operations (add, multiply, etc.) rather than higher-level operations
(edge detection, etc.) to avoid introducing any domain-knowledge requirement
or bias. The paper also contains a lot of background on the current state of
genetic programming and a proposal for further research.
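
As a rough illustration of the arithmetic-only primitive set described above, here is a minimal sketch of a GP expression tree in Python. The function names and tree encoding are my own for illustration, not the paper's:

```python
import random

# Arithmetic-only primitives, in the spirit of the paper: no
# domain-specific operators like edge detectors.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def random_tree(n_inputs, depth=3):
    """Grow a random expression tree over input variables x0..x(n-1)."""
    if depth == 0 or random.random() < 0.3:
        return ("x", random.randrange(n_inputs))  # leaf: an input variable
    op = random.choice(list(OPS))
    return (op, random_tree(n_inputs, depth - 1), random_tree(n_inputs, depth - 1))

def evaluate(tree, inputs):
    """Evaluate an expression tree on a vector of input values."""
    if tree[0] == "x":
        return inputs[tree[1]]
    op, left, right = tree
    return OPS[op](evaluate(left, inputs), evaluate(right, inputs))

# A hand-built tree: x0 + (x1 * x1)
t = ("add", ("x", 0), ("mul", ("x", 1), ("x", 1)))
assert evaluate(t, [2.0, 3.0]) == 11.0
```

A "layer" of feature extractors would then just be a list of such trees evaluated on the same inputs, with the next layer's trees reading the previous layer's outputs.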

I think it's an interesting attempt to push GP using some of the structure
we've learned is useful for deep neural networks. I've always found genetic
programming intriguing, although it's hard to deny its inefficiency these
days compared to DNNs.

------
nickpsecurity
For those new to Genetic Programming, you might find the Wikipedia entry, main
site, and Humie Awards interesting.

[https://en.wikipedia.org/wiki/Genetic_programming](https://en.wikipedia.org/wiki/Genetic_programming)

[http://www.genetic-programming.org/](http://www.genetic-programming.org/)

[https://www.human-competitive.org/awards](https://www.human-competitive.org/awards)

~~~
verdverm
GP Bibliography is another great resource

[http://www.cs.bham.ac.uk/~wbl/biblio/](http://www.cs.bham.ac.uk/~wbl/biblio/)

------
meh2frdf
I wonder how many people actually read this 40-page doc after seeing this
link. It would be nice if the OP gave us a clue as to why they posted it,
perhaps with a summary.

~~~
henning
I read a substantial fraction of it before submitting it.

I'm interested in evolutionary computation because it's simple to implement
and historically has competed with neural networks.

They offer the potential to create interpretable models.

~~~
jostmey
The method of optimization probably has little to do with the interpretability
of the model. Gradient optimization (i.e. backpropagation) can be used to fit
logistic regression models, which are interpretable. Evolutionary algorithms
can be used to fit deep neural networks, and will not make it more
interpretable.

I'd say the least interpretable systems are biological, which are the outcome
of an evolutionary process.
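
The optimizer/model distinction above can be sketched concretely. Below, a toy (1+1)-style evolutionary hill climber fits a logistic regression; the fitted model stays as interpretable (one weight, one bias) as one fit by gradient descent. The data and settings are purely illustrative:

```python
import math, random

# Tiny separable dataset, chosen for illustration only.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [0, 0, 0, 1, 1]

def nll(w, b):
    """Negative log-likelihood of the logistic model p = sigmoid(w*x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))
        p = min(max(p, 1e-9), 1 - 1e-9)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

# (1+1) evolution: mutate the parent, keep the child only if it improves.
rng = random.Random(0)
w, b = 0.0, 0.0
for _ in range(2000):
    w2 = w + rng.gauss(0, 0.1)   # mutate
    b2 = b + rng.gauss(0, 0.1)
    if nll(w2, b2) < nll(w, b):  # select
        w, b = w2, b2

assert nll(w, b) < nll(0.0, 0.0)  # evolved fit beats the initial model
```

The point: the evolved model is still just `sigmoid(w*x + b)`, readable coefficients and all; swapping the optimizer changed nothing about interpretability.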

------
verdverm
GP is not a great algo; it's fraught with issues that come from using an RNG
to search. Check out the PGE paper (link below). It will give you an overview
of these issues, how to deal with them, and a better alternative to GP search.
In recent years, I've started to think of PGE as related to AutoML and model
search in general. Would love to see what RL for PGE could do.

[https://github.com/verdverm/go-pge/blob/master/pge_gecco2013...](https://github.com/verdverm/go-pge/blob/master/pge_gecco2013.pdf)

(disclaimer, author of the paper, it was my PhD subject)

~~~
pmoriarty
GP is a gigantic field, comprising a huge number of approaches to many
different problems, while the paper you cite only offers a brief critique of a
small subset of GP as applied only to symbolic regression. So it cannot
fairly be taken as a critique of GP as a whole.

Even when limited to symbolic regression alone, the objections they cite: 1 -
that "state-of-the-art GP implementations often fail to return the original
formula from which the input data was generated", and 2 - "the results
returned are inconsistent and difficult to reproduce. A user who is not an
expert in GP will not likely trust an algorithm which cannot reliably
reproduce the same results with each invocation..." are not irrefutable
arguments.

On point 1, not everyone is interested in "returning the original formula from
which the input data was generated". For one, for many problems the original
data was not generated with any formula, and any solution would suffice as
long as it meets whatever requirements the user may have. Take the evolving of
radio antennas with GP -- something that was done to a human-competitive level
decades ago with GP. There the user wants a working antenna with certain
properties, they are not interested in "returning the original formula from
which the input data was generated". Or maybe the users are interested in
evolving a team of soccer-playing robots (which was also done with GP). Once
again, they're not interested in "returning the original formula from which the
input data was generated". So for these users this critique would be
irrelevant.

On point 2, yes, there is a random number generator involved with GP, and if
the user is interested in evolving the exact same solution on a different run
of the GP without reusing the same PRNG seed, they'd have a problem. But I'm
not sure why they'd want to do that instead of just using the best solution(s)
on new data. If they just used the best solutions, then they should get the
same result every time when the solutions are used on the same data (unless
the solutions themselves involved randomness, which is not usually the case,
and would be independent of GP anyway).
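
The point above can be made concrete in a couple of lines: once the best individual is extracted from a GP run, it is an ordinary deterministic formula, and the randomness was only in the search. The formula below is a hypothetical stand-in for an evolved solution:

```python
# Hypothetical "best individual" extracted from a GP run.
best = lambda x: 3.0 * x * x + 2.0 * x - 1.0

data = [0.0, 1.0, 2.0]
run1 = [best(x) for x in data]
run2 = [best(x) for x in data]
assert run1 == run2  # same predictions on the same data, every time
```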

~~~
verdverm
The main point of the paper is that you should account for the structure of
the solution space and take advantage of it. For SR, one example is a+b =
b+a. For physical objects, like antennas, that would be symmetries along axes.
PGE can be generalized to any problem whose solution space can be represented
as a trie, with a grammar for local modifications or movements in the search
space.
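
The a+b = b+a example can be sketched as follows (this is an illustrative toy, not PGE's actual code): canonicalizing commutative operators collapses algebraically equivalent expressions into one representative, shrinking the search space the algorithm has to enumerate.

```python
# Expressions are nested tuples: ("x", i) for input i, or (op, left, right).
COMMUTATIVE = {"add", "mul"}

def canonical(tree):
    """Return a canonical form so equivalent expressions compare equal."""
    if tree[0] == "x":
        return tree
    op, left, right = tree
    left, right = canonical(left), canonical(right)
    if op in COMMUTATIVE and repr(right) < repr(left):
        left, right = right, left  # order operands deterministically
    return (op, left, right)

a_plus_b = ("add", ("x", 0), ("x", 1))
b_plus_a = ("add", ("x", 1), ("x", 0))
assert canonical(a_plus_b) == canonical(b_plus_a)  # one point, not two
```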

With regard to PRNG, GP can only produce statistically significant results if
run a minimum number of times, because it is a probabilistic algorithm. My
stats professor said that was at least 29 times. If you fix the random seed,
you lose the power of the GP algo.

It's hard to trust results from GP research, as the vast majority is not
reproducible. There is one guy who publishes the world's best results to
Springer anytime someone outperforms his results. Totally bogus work. The GP
field needs to adopt the open publishing and open code/data practices that we
find in the DL/RL community.

~~~
nestorD
GP can take the structure of the space into account (you just need to build it
into your mutation and crossover functions), and algorithms are typically
compared on enough runs to provide statistically significant results (29 runs
might be far from enough if you have high variance...). But you don't need
reproducibility when solving an optimisation problem in the real world; you
just want a good result in a reasonable time. The fact that you can use GP to
optimize about anything (both parameters and structures) makes it a good first
approach (where it fails is that it tends to be slow and to have way too many
parameters for a casual user).
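
One way to picture "building structure into the mutation function" is an operator that can only make structure-preserving changes. The toy mutation below (my own illustration, with made-up names) swaps one arithmetic operator for another of the same arity, so the tree's shape, and any constraint encoded in that shape, survives by construction:

```python
import random

BINARY_OPS = ["add", "sub", "mul"]

def mutate_op(tree, rng):
    """Maybe replace operators, but never change the tree's structure."""
    if tree[0] == "x":
        return tree  # leaves carry no operator to mutate
    op, left, right = tree
    if rng.random() < 0.5:
        op = rng.choice([o for o in BINARY_OPS if o != op])
    return (op, mutate_op(left, rng), mutate_op(right, rng))

def shape(tree):
    """The tree's structure, ignoring which operators sit at the nodes."""
    return None if tree[0] == "x" else (shape(tree[1]), shape(tree[2]))

rng = random.Random(0)
t = ("add", ("x", 0), ("mul", ("x", 1), ("x", 0)))
mutated = mutate_op(t, rng)
assert shape(mutated) == shape(t)  # structure preserved by design
```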

(I do think your paper is interesting in that it invites us to think about
non-GP approaches to symbolic regression.)

