Neuroevolution of augmenting topologies (NEAT algorithm) (wikipedia.org)
91 points by elsewhen 4 months ago | 33 comments



I post a link to NEAT here about once a week.

The big problem is that NEAT can't leverage a GPU effectively at scale (arbitrary topologies vs bipartite graphs)

Other than that, it feels like a "road not taken" of machine learning. It handles building complex agent behaviors really well, and the pressure toward minimal topologies results in networks that are understandable and straightforward to interpret.

It's easier to learn and implement than backpropagation; speciation is its most complex but also most valuable feature. It develops multiple solutions, clustering them so they can be iterated on separately. An online-learning NEAT agent is, in practice, a collection of behaviors that adapt and swap dominance as their fitness changes.
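
To make the speciation idea concrete, here is a minimal Python sketch. It's a simplification, not any particular implementation: genomes are dicts of innovation number -> weight, disjoint and excess genes are lumped together, and the coefficients are just illustrative.

    def compatibility(g1, g2, c_disjoint=1.0, c_weight=0.4):
        """Compatibility distance between two genomes, in the spirit of the NEAT paper."""
        keys1, keys2 = set(g1), set(g2)
        matching = keys1 & keys2
        mismatched = len(keys1 ^ keys2)      # disjoint + excess genes, lumped together
        n = max(len(g1), len(g2), 1)
        w_diff = (sum(abs(g1[k] - g2[k]) for k in matching) / len(matching)) if matching else 0.0
        return c_disjoint * mismatched / n + c_weight * w_diff

    def speciate(population, threshold=3.0):
        """Cluster genomes into species by compatibility with each species' first member."""
        species = []                         # list of lists of genomes
        for genome in population:
            for s in species:
                if compatibility(genome, s[0]) < threshold:
                    s.append(genome)
                    break
            else:
                species.append([genome])     # no compatible species: start a new one
        return species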


> The big problem is that NEAT can't leverage a GPU effectively at scale

I worry about how much experimentation we are leaving on the table simply because these paths wouldn't benefit Nvidia stockholders.

Modern CPUs might as well be GPUs when you are running GA experiments from the 80s and 90s on them. We can do millions of generations per hour and run populations that span many racks of machines with the technology on hand right now.


The more immediate causal reason is that there is no off-the-shelf, high-performance hardware to accelerate these paths. GPUs were made for graphics, not machine learning. They just happened to be really good at running certain kinds of models that can be reduced to a ton of matrix math, which GPUs do really, really fast.

Something like Intel Xeon Phi or one of the many-many-core ARM designs I've heard talked about for years would be better for more open ended research in this field. You want loads and loads of simple general purpose cores with local RAM. Put like 4096 32-bit ARM or RISC-V cores with super fast local RAM on a die and transactional or DMA-like access to larger main memory, and make it so these chips can be cross linked to form even larger clusters the way nVidia cards can. This is the kind of hardware you'd want.

When I was in college in the early 2000s I played around with genetic algorithms and artificial life simulations on a cluster of IBM Cell Broadband Engine processors at the University of Cincinnati. That CPU was an early hybrid many-core design: one PowerPC-based controller and a bunch of simplified, specialized cores, like a GPU but designed more for general-purpose compute. Programming it was hairy, but you could get great performance for the time on a lot of things.

Computing is unfortunately full of probably-better roads not taken for circumstantial reasons. JavaScript was originally supposed to be a functional language, a real, cleaner one. There were much, much better OSes around than Unix in the 80s and 90s, but they were proprietary or hardware-specific. We will be stuck with IPv4 for a long time because IPv6 came too late, and if they'd just added say 1-2 more octets to v4 then v6 would not be necessary. I could go on for a while.

This has led to a "worse is better" view that I think might just be a rationalization for the tyranny of path-dependent effects and lock-in after deployment.


> When I was in college in the early 2000s I played around with genetic algorithms and artificial life simulations on a cluster of IBM Cell Broadband Engine processors

In the early-to-late 2000s I was working on a hobby project for artificial life with GA-evolved brains. I spent a lot of time investigating options like the Cell processor (I bought a PlayStation to test with). I also looked at the interesting multi-core CPUs that were being introduced.

I ended up with a combo of CPU+GPU on networked PCs, which was better than nothing but not ideal.


> I ended up with a combo of CPU+GPU on networked PCs, which was better than nothing but not ideal.

The biggest constraint I've seen with scaling up these simulations is maintaining coherence of population dynamics across all of the processing units. The less often population members are exchanged, the more likely you are to wind up with a decoupled population, stuck in a ditch somewhere. Since modern CPUs can go so damn fast, you need to exchange members quite frequently. Memory bandwidth and latency are the real troublemakers.

You can spread a simulation across many networked nodes, but then your effective cycle time is potentially millions of times greater than if you keep it all in one socket. I think the newest multi-die CPUs hit the perfect sweet spot for these techniques (~100ns latency domain).
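
For concreteness, here is a rough Python sketch of that member-exchange step in island-model form. The ring topology, the (fitness, genome) layout, and the migration size are assumptions for illustration; how often you call it is the coupling knob being discussed.

    def migrate(islands, n_migrants=2):
        """islands: list of subpopulations, each a list of (fitness, genome) pairs.
        Send each island's best genomes to the next island in a ring, replacing
        that island's worst members. Run this every K generations; K controls
        how tightly the subpopulations stay coupled."""
        for island in islands:
            island.sort(key=lambda fg: fg[0], reverse=True)   # best first
        best = [island[:n_migrants] for island in islands]
        for i, batch in enumerate(best):
            target = islands[(i + 1) % len(islands)]          # ring topology for simplicity
            target[-n_migrants:] = list(batch)                # overwrite the target's worst members
        return islands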


> The biggest constraint I've seen with scaling up these simulations is maintaining coherence of population dynamics across all of the processing units

Agreed. For mine, I purposely avoided this problem by keeping the population and world relatively small (<100 creatures in a fairly confined space they could move in), so each PC handled one instance of a world. After each generation there was a data-sharing step, so successful brains were available to every world for continued evolution.

My dream was to scale this thing up significantly, but life intervened, maybe someday.


> The big problem is that NEAT can't leverage a GPU effectively at scale (arbitrary topologies vs bipartite graphs)

Is that true? These graphs can be transformed into a regular tensor shape with zero weights on unused connections. If you were worried about too much time/space used by these zero weights, you could introduce parsimony pressure related to the dimensions of transformed tensors rather than on the bipartite graph.
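
As a NumPy sketch of that transformation (the (src, dst, weight, enabled) gene format is an assumed representation; the point is just that unused connections stay at zero):

    import numpy as np

    def genome_to_matrix(connections, n_nodes):
        """Pack an arbitrary-topology genome into a dense n_nodes x n_nodes weight matrix."""
        W = np.zeros((n_nodes, n_nodes), dtype=np.float32)
        for src, dst, weight, enabled in connections:
            if enabled:
                W[dst, src] = weight       # absent connections simply stay at zero
        return W

    # A propagation step then becomes dense matrix math a GPU is comfortable with,
    # padded to the largest network in the batch, e.g. x = np.tanh(W @ x).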


Or even use cuSPARSE, if you don't mind a little bit of extra work over plain cuDNN.

https://developer.nvidia.com/cusparse


Just because you can pack the topology into a sparse matrix doesn't make it actually go faster.

Sparse matrices often don't see good speedup from GPUs.

In addition, each network is unique, each neuron can have an entirely different activation function, and the topology is constantly changing. You will burn a lot of time constantly re-packing into matrices that then don't see the speedups a denser, more wasteful topology appears to offer.

On the flip side, our narrative of "speedup" assumes bipartite graphs crunching faster on GPUs, and it might not hold up if the basis of comparison is actually the utility of the behaviors generated by the networks. A cousin thread explores this better.


Another plausible strategy to neutralize arbitrary topologies: compile individual solutions or groups of similar solutions into big compute shaders that execute the network and evaluate expensive fitness functions, with parallel execution over multiple test cases (aggregated in postprocessing) and/or over different numerical parameters for the same topology.
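
A CPU-side stand-in for that idea in NumPy, evaluating many parameter variants of one fixed topology over many test cases at once. The fixed number of propagation steps and the tanh activation are assumptions; on a GPU the same batched contraction maps to a single kernel.

    import numpy as np

    def evaluate_variants(W_batch, inputs, steps=3):
        """W_batch: (P, N, N) weight matrices sharing one topology (P parameter variants).
        inputs:  (T, N) initial activations for T test cases.
        Returns: (P, T, N) activations after a few propagation steps."""
        x = np.broadcast_to(inputs, (W_batch.shape[0],) + inputs.shape).copy()
        for _ in range(steps):
            x = np.tanh(np.einsum('pij,ptj->pti', W_batch, x))   # one batched matvec per step
        return x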


We can also show that sparse NNs under some conditions are ensembles of discrete subnets, and the authors of the original dropout paper argue that [dropout] effectively creates something akin to a forest of subnets all in "superposition".


I feel a bit embarrassed as, despite almost 15 years working in the field, I've not played with NEAT yet nor read up well on it. TIME TO CHANGE THAT :)


I think the major iteration that will come next is "NEAT in a dream," where a robot or agent online-trains a library of behaviors on a model of the environment that is constantly updated with new experiences generated by the dominant behaviors interacting with reality.


I've used NEAT for a few different things. The main upside is that it requires a lot less hyperparameter tuning than modern reinforcement learning options. But that's really the only advantage. It only works on a subset of reinforcement learning tasks (online episodic). Also, it is a very inefficient search of the solution space compared to modern options like PPO. And it only works on problems with fairly low-dimensional inputs/outputs.

That being said, it's elegant and easy to reason about. And it's a nice intro into reinforcement learning. So definitely worth learning.


I never understood the appeal of NEAT. It's easy to conceive of mutation operators on fully connected layers instead of graphs of neurons, and those evaluate a lot faster. NEAT also seems to have at least a dozen hyperparameters.


It mostly manages its hyperparameters itself.

There is a reward function, one of the hyperparameters we do have to set, for condensing functionality into topology instead of smearing it obscurely across layers. You can just look at a NEAT network analytically and know what is going on there.
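
Roughly, the parsimony pressure being described amounts to something like this (a sketch; the coefficients are made up, not values from any particular NEAT library):

    def adjusted_fitness(raw_fitness, n_nodes, n_connections,
                         node_cost=0.01, conn_cost=0.002):
        # Task fitness minus a small charge per node and connection,
        # which rewards condensing functionality into a minimal topology.
        return raw_fitness - node_cost * n_nodes - conn_cost * n_connections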


HyperNEAT is also really cool: https://en.wikipedia.org/wiki/HyperNEAT

It uses NEAT to evolve a CPPN which is then sampled at some specified resolution to determine the actual network topology. The really cool thing is the sampling resolution can be varied to scale the size of the network while maintaining the overall "shape" of the topology. I took Ken's neuroevolution course back in the day and worked with HyperNEAT before deep learning got big. Ever since, deep learning network architectures have always reminded me of the dense layered topologies that result from higher sampling resolutions with HyperNEAT. It would be interesting to see if HyperNEAT along with Ken's novelty search technique could be used to evolve useful deep learning architectures.
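
A hedged Python sketch of that sampling step: the 2D substrate, the pruning threshold, and the toy CPPN below are illustrative stand-ins, since in real HyperNEAT the CPPN is itself a NEAT-evolved network and substrates are usually layered.

    import numpy as np

    def sample_substrate(cppn, resolution, threshold=0.2):
        # Lay substrate nodes on a resolution x resolution grid in [-1, 1]^2 and
        # query the CPPN for the weight between every pair of node coordinates.
        coords = [(x, y) for x in np.linspace(-1, 1, resolution)
                         for y in np.linspace(-1, 1, resolution)]
        n = len(coords)
        W = np.zeros((n, n))
        for i, (x1, y1) in enumerate(coords):
            for j, (x2, y2) in enumerate(coords):
                w = cppn(x1, y1, x2, y2)
                if abs(w) > threshold:        # prune weak connections
                    W[i, j] = w
        return W

    # Toy stand-in for an evolved CPPN: the weight depends only on the coordinates,
    # so raising the resolution grows the network while preserving the same pattern.
    toy_cppn = lambda x1, y1, x2, y2: np.sin(3 * (x1 - x2)) * np.exp(-(y1 - y2) ** 2)
    W_small = sample_substrate(toy_cppn, resolution=4)     # 16-node substrate
    W_large = sample_substrate(toy_cppn, resolution=16)    # 256 nodes, same "shape"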


For anyone interested in NEAT, you’ll likely enjoy Ken’s later work on novelty search, open-endedness, etc.

His book "Why Greatness Cannot Be Planned: The Myth of the Objective" is one of the most perspective-altering works I have ever read.

Very excited to see what he does next. He has mentioned on Twitter a couple of times his interest in representation learning and how objective-based search affects it. Very interesting stuff.


Shameless plug:

I have an implementation in Rust (CPU only) with added GRU gates

https://crates.io/crates/neat-gru


Damn, that's awesome, might be the thing that really gets me to mess around with Rust.


Legendary Sethbling video from 2015 where he implemented NEAT for SMW: https://www.youtube.com/watch?v=qv6UVOQ0F44&t=1


That video inspired me to read the NEAT paper. I think that was the first scientific paper I ever printed out and read


Thanks for that link!

There is another video posted 7 months ago from Seth where he interviews Ken Stanley, the creator of the NEAT algorithm.

https://www.youtube.com/watch?v=5zg_5hg8Ydo


This is really interesting. I remember seeing the MarI/O video and being psyched that someone took Kenneth O. Stanley's work and applied it to something fun, but I had no idea how impactful that video actually turned out to be. Having spent O(hundreds) of hours reading his papers and thinking about those ideas a decade+ ago, this interview serves a really interesting role for me personally, because it connects conversationally in a way papers don't.


There's a lot of really interesting work in neuroevolution that has the potential to enable some really interesting unsupervised training regimes. I think there are intriguing possibilities for unique encoding schemes like ACE encoding to speed up training and produce much smarter behavior out the other end, especially if "genes" can form reusable elements of neural topology that make scaling networks faster. Reusing components all over the body is how we fit so much complexity into the relatively little unique DNA we have. The other interesting thing about using genetic algorithms for a portion of training/network mapping is that it allows you to have heterogeneous networks, so simulations or representations of astrocyte/glial behavior can easily be integrated with neural networks. With traditional training methods it's a massive fucking pain to train a non-feed-forward network.

I do think that languages like Elixir and other tools with strong CPU concurrency can really be leveraged to make some dynamite libraries.


I wrote a NEAT implementation in Rust a few years back. It was able to train XOR and pole balancing.

https://sgolem.com/blog/neat-simple-neuroevolution-framework...
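
For reference, the XOR benchmark usually boils down to a fitness along these lines (a generic Python sketch, not code from the linked post):

    XOR_CASES = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

    def xor_fitness(net):
        # net: any callable mapping two inputs to one output in [0, 1].
        error = sum((net(a, b) - target) ** 2 for (a, b), target in XOR_CASES)
        return 4.0 - error      # 4.0 means all four cases answered exactly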


NEAT: https://en.wikipedia.org/wiki/Neuroevolution_of_augmenting_t...

NEAT > See also links to EANT.

EANT: https://en.wikipedia.org/wiki/Evolutionary_acquisition_of_ne... :

> Evolutionary acquisition of neural topologies (EANT/EANT2) is an evolutionary reinforcement learning method that evolves both the topology and weights of artificial neural networks. It is closely related to the works of Angeline et al. [1] and Stanley and Miikkulainen [2] [NEAT (2002)]. Like the work of Angeline et al., the method uses a type of parametric mutation that comes from evolution strategies and evolutionary programming (now using the most advanced form of the evolution strategies, CMA-ES, in EANT2), in which adaptive step sizes are used for optimizing the weights of the neural networks. Similar to the work of Stanley (NEAT), the method starts with minimal structures which gain complexity along the evolution path.

(in Hilbert space)

> [EANT] introduces a genetic encoding called common genetic encoding (CGE) that handles both direct and indirect encoding of neural networks within the same theoretical framework. The encoding has important properties that makes it suitable for evolving neural networks:

> It is complete in that it is able to represent all types of valid phenotype networks.

> It is closed, i.e. every valid genotype represents a valid phenotype. (Similarly, the encoding is closed under genetic operators such as structural mutation and crossover.)

> These properties have been formally proven. [3]

Neither MOSES: Meta-Optimizing Semantic Evolutionary Search nor PLN: Probabilistic Logic Networks are formally proven FWIU.

/? Z3 ... https://news.ycombinator.com/item?id=41944043 :

> Does [formal verification [with Z3]] distinguish between xy+z and z+yx, in terms of floating point output?

> math.fma() would be a different function call with the same or similar output.

> (deal-solver is a tool for verifying formal implementations in Python with Z3.)


EANT2 looks really cool, thanks for the link.


Some more interesting approaches in the same space:

- https://github.com/openai/evolution-strategies-starter

- https://cloud.google.com/blog/topics/developers-practitioner...

And perhaps most close:

- https://weightagnostic.github.io/

Which also showed that you can make NNs weight agnostic and just let the architecture evolve using a GA.
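
The core trick there, roughly, in Python. Here run_network, the adjacency representation, and the particular shared-weight values are placeholders for illustration; the paper itself also factors in complexity when ranking architectures.

    import numpy as np

    SHARED_WEIGHTS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]

    def weight_agnostic_score(adjacency, run_network, test_cases):
        # adjacency: (N, N) 0/1 connectivity matrix; run_network(W, x) -> scalar score.
        # Score the architecture by averaging over several shared weight values,
        # so no individual weight is ever trained.
        scores = []
        for w in SHARED_WEIGHTS:
            W = adjacency * w                 # every enabled connection gets the same weight
            scores.append(np.mean([run_network(W, x) for x in test_cases]))
        return float(np.mean(scores))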

Even though these approaches are cool, and NEAT is even somewhat easier to implement than getting started with RL (at least, that's my impression based on how many AI YouTubers start with NEAT first), they never seemed to fully take off. Still, knowing about metaheuristics is a good tool to have IMO.


If you like NEAT you might also want to have a look at PDGP by R. Poli.

This is a more traditional/general GP in the sense that it expresses computer programs rather than NNs specifically, but it is graph-based rather than tree-based, in a way that allows one to add a "link set" on top of the "function set" and "terminal set," allowing it to fully express neural networks as well as more general graph structures. And as the name implies, it lends itself well to parallel/distributed operations.

The big difference with NEAT is that your initial population is randomly generated programs rather than "base" networks ... but you could easily treat these programs the same way Koza used GP to evolve circuits from "embryonic circuits," by having terminals act as circuit-modifying operations.


SharpNeat is an independent implementation that targets C# and .NET 8.

https://github.com/colgreen/sharpneat

https://sharpneat.sourceforge.io/


Wow, started 7 years ago and is still updated. Props to the authors!


I did my high school senior project on NEAT! It was a C++ project set up as an "ants eating food" experiment where their minuscule brains learned over several iterations to go to the closest food and eat it.

I later did a version (in Common Lisp) where instead of an ant knowing where the nearest food was, it had a visual field extending forward that would feed as inputs to the NN. Then I added predators, poisonous food items, etc. It was fascinating seeing it all unfold.
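
A sketch of what such a visual-field sensor might look like in Python; everything here (ray count, field of view, the crude ray-thickness test) is a guess at the setup described, not the actual code.

    import math

    def visual_field(ant_x, ant_y, heading, objects, n_rays=5, fov=math.pi / 2, max_dist=10.0):
        # Cast n_rays forward within the field of view and report the normalized
        # distance to the nearest object along each ray; these become the NN inputs.
        inputs = []
        for i in range(n_rays):
            angle = heading - fov / 2 + fov * i / (n_rays - 1)
            dx, dy = math.cos(angle), math.sin(angle)
            nearest = max_dist
            for ox, oy in objects:
                t = (ox - ant_x) * dx + (oy - ant_y) * dy        # projection onto the ray
                if 0 < t < nearest:
                    off = math.hypot(ox - (ant_x + t * dx), oy - (ant_y + t * dy))
                    if off < 0.5:                                # crude ray "thickness"
                        nearest = t
            inputs.append(nearest / max_dist)
        return inputs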

Genetic algorithms seem like a somewhat forgotten but very handy tool for ML. I've always wondered how the cost of running experiments on species/populations over and over for however many hundreds of iterations compares to something like backpropagation on a large network.



