
Genetic brainfuck programming - r00nk
https://github.com/r0nk/corvus
======
jostmey
As someone with a background in computer programming, I found biology quite
frustrating. I wanted to see some underlying structure or meaning in the
genetic code. It took me a long time to realize that it wasn't there.
Biological systems and their genetic codes are the result of random trial and
error filtered through natural selection. Biological systems clearly have a
design because you see the same "form" reappear in each generation, but the
design itself is completely random. In other words, the design has to work,
but it does not have to be sensible.

To summarize, the design of biological systems is random. It was a hard pill
for me to swallow, and an utterly dissatisfying one at that.

~~~
fragsworth
> To summarize, the design of biological systems is random. It was a hard pill
> for me to swallow.

There are so many things about our universe that are very non-random, such as
the physical laws, and more specifically the spherical shapes of
planets/stars, the flatness of their surfaces, the flow of energy from stars
to planets, etc.

An analogy could be that life forms randomly evolving with these constraints
are like water particles randomly floating through a river. There are plenty
of variations in where the particles can go, but they flow in a general
direction.

~~~
jostmey
I guess I was trying to draw a distinction between biological systems and
man-made machines. I find the difference striking. I can look at a car engine, and
if I stare at it long enough, I can make sense of it. The engine makes sense
to me not just because it was designed by another human but also because its
design is non-random. When I stare at the protein KcsA, which is the
biological system that I study, I cannot make sense of its design. My
conclusion is that its design is an outcome of chance. The protein works and
has a design but it is not one that I can figure out.

Man-made machines have a rational design. Biological systems have an
"_irrational_" design. The most striking characteristic of a randomly designed
system is that you cannot restart it. Once you shut it off it dies
permanently.

~~~
apl

> The most striking characteristic of a randomly designed system is that you
> cannot restart it. Once you shut it off it dies permanently.

I don't see how that is a consequence of the underlying design principle.
There are many organisms that survive de facto death through freezing etc. You
could also engineer machines to not be restartable, and under certain
circumstances that may be a desirable feature. There's no connection between
what you call rational or irrational design and restarting.

Moreover, there are natural mechanisms with "sensible" design. To me, the
mammalian retina is very carefully calibrated, perfectly understandable
machinery implementing various filters and feature extraction networks.
Examples abound.

~~~
nazka
Yes, saying it is that random is like expecting nature to build a complex
organism like us by being a monkey typing randomly on the keyboard of
chemistry, physics, and so on, and somehow creating that.

Even if that were the case, the earth would be full of defective cells and
animals across the whole tree of life, with only a few working well. That's
not what we find when we mine the earth, or what we have today.

And even with a "keyboard" that complex, you would need more than billions of
years to accomplish something like that randomly.

------
akkartik
My #1 link for anyone interested in genetic programming is the language
designed from the ground up to be programmed genetically:
[http://faculty.hampshire.edu/lspector/push.html](http://faculty.hampshire.edu/lspector/push.html)

------
reitzensteinm
It's been a decade since I last wrote a BF program, but if I'm not mistaken,
the winning strand in the gif example has quite a bit of dead code, for
example here:

[[]+<-,<<.><+>+,]

This entire loop must always be skipped, because if it were entered, the inner
[] loop would also be entered and run indefinitely.
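This is easy to check mechanically with a step-capped interpreter. The sketch below is not corvus's actual evaluator; I'm assuming the common simplifications that ',' reads 0 (EOF) and '.' output is collected but unused:

```python
def run_bf(code, tape=None, max_steps=10_000):
    """Minimal Brainfuck interpreter with a step cap. Returns the tape if
    the program halts within the cap, or None otherwise (a stand-in for
    "runs indefinitely")."""
    tape = list(tape if tape is not None else [0] * 30)
    # Precompute matching-bracket positions.
    jumps, stack = {}, []
    for i, c in enumerate(code):
        if c == '[':
            stack.append(i)
        elif c == ']':
            j = stack.pop()
            jumps[i], jumps[j] = j, i
    ip, dp, out, steps = 0, 0, [], 0
    while ip < len(code):
        steps += 1
        if steps > max_steps:
            return None                      # budget exhausted: treat as non-halting
        c = code[ip]
        if c == '+':   tape[dp] = (tape[dp] + 1) % 256
        elif c == '-': tape[dp] = (tape[dp] - 1) % 256
        elif c == '>': dp += 1
        elif c == '<': dp -= 1
        elif c == ',': tape[dp] = 0          # EOF convention
        elif c == '.': out.append(tape[dp])
        elif c == '[' and tape[dp] == 0: ip = jumps[ip]
        elif c == ']' and tape[dp] != 0: ip = jumps[ip]
        ip += 1
    return tape

strand = '[[]+<-,<<.><+>+,]'
# Current cell 0 at the outer '[': the whole loop is skipped, so it halts.
print(run_bf(strand, [0] * 8) is not None)        # True
# Current cell nonzero: the empty inner '[]' bounces until the step cap.
print(run_bf(strand, [1] + [0] * 7) is not None)  # False
```

So the loop body can only ever be reached with the cell already zero, i.e. it's dead code.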

It would be cool to run it through more iterations, to get to the local
minimum of length, rather than just a correct program.

~~~
robert_tweed
This appears to be GA rather than GP, but I did some experiments with GP a
while ago and the most counterintuitive thing I discovered was how _essential_
junk code turns out to be.

First, for anyone unaware of the difference, in Genetic Algorithms (GA) you
have fixed-length strings of instructions. Genetic Programming (GP) is a
superset in which you have ASTs, which can get arbitrarily large (which can be
a problem if you're not careful). I find GP much more interesting than GA, but
the practical issues with AST bloat are why you see a lot more GA examples.

Anyway, not wanting to run into overly large ASTs myself, I went ahead and
prematurely optimised my GP implementation by having it always collapse
functions into simpler forms where possible, e.g. (+ (+ 1 2) 3) becomes 6.
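For concreteness, that kind of constant-folding pass might look like this sketch over tuple-encoded ASTs (not my actual implementation, and only '+' and '*' are handled):

```python
def fold(expr):
    """Recursively collapse constant subtrees in a tuple-encoded AST,
    e.g. ('+', ('+', 1, 2), 3) -> 6. Non-constant subtrees are kept."""
    if not isinstance(expr, tuple):
        return expr                      # leaf: number or variable name
    op, *args = expr
    args = [fold(a) for a in args]       # fold children first
    if all(isinstance(a, (int, float)) for a in args):
        if op == '+':
            return sum(args)
        if op == '*':
            product = 1
            for a in args:
                product *= a
            return product
    return (op, *args)

print(fold(('+', ('+', 1, 2), 3)))    # 6
print(fold(('*', ('+', 1, 1), 'x')))  # ('*', 2, 'x')
```

The point of the anecdote is that running a pass like this on every generation destroys exactly the redundancy crossover needs.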

It turns out that this causes huge problems, because all that "junk" code is
actually useful during evolution. Each generation inherits parts of its
parents (called "crossover"), but if you cut down the information available to
inherit, as per the above optimisation, you get much less variety between the
generations and become more reliant on random mutations. As a result, it's
like falling back on bogosort to find your solution.

Once you get down into the final generations you can start running
optimisations like this and removing dead code, but if you do it too soon,
you won't converge on a solution quickly, if at all.

Interestingly, this principle seems to apply to DNA too, but DNA doesn't come
with an optimiser. Hence most DNA is "junk" but the junk is actually more
important than it seems.

It also turns out that genetically optimising (as opposed to hand-coding the
optimisation rules, like I did) is non-trivial too. You can either adjust your
main cost function to account for code size/speed alongside the problem you're
trying to solve, or you can do a second pass that aims to optimise the code
after you've already found an acceptable solution. Either way, there's a
certain amount of black magic involved in deciding how many iterations to run,
or what constitutes a "good enough" or "fast enough" or "small enough"
solution so you can decide when to stop.
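One common way to fold size into the main cost function is a parsimony term; in this sketch the weight `alpha` is a made-up knob, and lower cost is better:

```python
def cost(output, target, program, alpha=0.1):
    """Error against the target byte string, plus a small penalty per
    instruction so equally-correct shorter programs score better."""
    error = sum(abs(a - b) for a, b in zip(output, target))
    error += 256 * abs(len(output) - len(target))  # missing/extra bytes
    return error + alpha * len(program)

target = [ord(c) for c in 'hi']
# Same (correct) output, different program lengths: the shorter one wins.
print(cost(target, target, '+' * 10) < cost(target, target, '+' * 50))  # True
```

Tuning `alpha` is exactly the black magic described above: too large and correctness is sacrificed for brevity, too small and bloat is never pruned.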

Code size is also irrelevant in GA because the code is always the same size,
but different instructions might have different costs, with NOPs having the
lowest cost. If you allow goto or recursion in your VM then you have to impose
a runtime constraint to avoid infinite loops, and can also assign a cost per
instruction executed or per microsecond of runtime.
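As a sketch, that per-instruction costing can be a lookup table plus a hard cap on executed steps; the numbers here are invented:

```python
# Hypothetical cost table: NOPs cheapest, loop instructions dearest.
COST = {'NOP': 0, '+': 1, '-': 1, '>': 1, '<': 1, '[': 2, ']': 2}
MAX_STEPS = 10_000   # hard runtime cap against infinite loops

def trace_cost(trace):
    """Charge each executed instruction its table cost; a trace that
    exceeds the step cap is treated as worst-case (infinite cost)."""
    total = 0
    for step, instr in enumerate(trace):
        if step >= MAX_STEPS:
            return float('inf')
        total += COST[instr]
    return total

print(trace_cost(['NOP', '+', '[', ']']))   # 5
```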

~~~
ctdonath
I hope you've written this up somewhere for publication. You've solved the
"junk DNA" issue with computer science.

Best post on HN ever.

~~~
robert_tweed
It's already documented; I just hadn't paid attention, so I was able to
independently verify what others have observed. I highly recommend the book "A Field Guide
to Genetic Programming", which covers this and other associated topics. It's
one of the best CS books I've read in the last few years.

------
signa11
reminds me of this:
[http://www.generation5.org/content/2003/gahelloworld.asp](http://www.generation5.org/content/2003/gahelloworld.asp)

where a C++ program is 'evolved' to output "hello world".

------
rekibnikufesin
props for
[https://github.com/r0nk/corvus/blob/master/bird.c#L80](https://github.com/r0nk/corvus/blob/master/bird.c#L80)

------
codezero
Next step, do this with Malbolge :)

~~~
r00nk
Mal-wat?

~~~
codezero
[http://en.wikipedia.org/wiki/Malbolge](http://en.wikipedia.org/wiki/Malbolge)

Get to work! :)

~~~
r00nk
[http://imgur.com/r/futurama/bZNjJll](http://imgur.com/r/futurama/bZNjJll)

