
The connectedness of neurons in neural nets is usually fixed from the start (i.e. between layers, or somewhat more complicated in the case of CNNs etc.). If we could eliminate this and let neurons "grow" towards each other (like this article shows), would that enable smaller networks with similar accuracy? There's some ongoing research to prune weights by finding "subnets" [1] but I haven't found any method yet where the network grows connections itself. The only counterpoint I can come up with is that it probably wouldn't generate a significant speedup because it defeats the use of SIMD/matrix operations on GPUs. Maybe we would need chips that are designed differently to speed up these self-growing networks?

I'm not an expert on this subject, does anybody have any insights on this?

1. https://www.technologyreview.com/2019/05/10/135426/a-new-way...

I think this is a really interesting area of machine learning. Some efforts have been made on ideas that are tangential to this one. Lots of papers in neuroevolution deal with evolving topologies. NEAT is probably the prime example: http://nn.cs.utexas.edu/downloads/papers/stanley.ec02.pdf. Another paper I read recently, PathNet, is different but very interesting: https://arxiv.org/abs/1701.08734

This is very cool! Thanks!

I experimented with networks where weights were removed if they did not contribute much to the final answer.

My conclusion was I could easily set >99% of weights to zero on my (fully connected) layers with minimal performance impact after enough training, but the training time went up a lot (effectively after removing a bunch of connections, you have to do more training before removing more), and inference speed wasn't really improved because sparse matrices are sloooow.

Overall, while it works out for biology, I don't think it will work for silicon.

Would you say you found a result similar to the lottery ticket hypothesis? https://arxiv.org/abs/1803.03635

Not really - I had to do multiple steps of 'prune a bit, train a bit' to be able to prune to 99%. If I had done all the pruning in one big step as they do, I don't think it would have trained well, even if I had been able to see the future and remove the same weights.
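For readers who want the "prune a bit, train a bit" loop spelled out, here is a minimal sketch in plain Python. It assumes magnitude-based pruning (drop the smallest surviving weights each round), which the comment above does not actually specify. The names `prune_step`, `iterative_prune` and `fake_retrain` are hypothetical, and `fake_retrain` is only a stand-in for real training.

```python
import random

def prune_step(weights, mask, fraction):
    """Deactivate the smallest-magnitude `fraction` of the still-active weights."""
    active = [i for i, keep in enumerate(mask) if keep]
    n_prune = int(len(active) * fraction)
    # Sort surviving indices by absolute weight, smallest first.
    active.sort(key=lambda i: abs(weights[i]))
    new_mask = list(mask)
    for i in active[:n_prune]:
        new_mask[i] = False
    return new_mask

def iterative_prune(weights, rounds, fraction, retrain):
    """Alternate pruning and retraining instead of pruning in one big step."""
    mask = [True] * len(weights)
    for _ in range(rounds):
        mask = prune_step(weights, mask, fraction)
        weights = retrain(weights, mask)  # assumption: fine-tunes the survivors
    return weights, mask

def fake_retrain(weights, mask):
    # Stand-in for real training: only forces pruned weights to zero.
    return [w if keep else 0.0 for w, keep in zip(weights, mask)]

random.seed(0)
initial = [random.gauss(0.0, 1.0) for _ in range(1000)]
final, mask = iterative_prune(initial, rounds=7, fraction=0.5, retrain=fake_retrain)
remaining = sum(mask)  # 1000 halved seven times leaves 8 weights (over 99% pruned)
```

With a real `retrain`, each round is a full fine-tuning pass, which is where the extra training time described above comes from.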

(See sibling comment: NEAT is awesome.)

The only reason we architect ANNs the way we do is optimization of computation. The bipartite graph structure is optimized for GPU matrix math. Systems like NEAT have not been used at scale because they are a lot more expensive to train, and the trained networks are more expensive to run. ASICs and FPGAs have a chance to utilize a NEAT-generated network in production, but we still don't have a computer well suited to training a NEAT network.

So this might be an enormous opportunity for low-cost and more performant AI if someone were able to build an FPGA of some sort that could handle these types of computations as efficiently, right?

Running the post-training network is a solved problem (FPGAs and ASICs can do it just fine). TRAINING the network is the difficulty. The problem is that the structure of the network is arbitrary and is a result of the learning process. You can't optimize a computation for a structure you don't know yet. Bipartite layer networks have the benefit of never changing structure, but they can approximate other subset structures. I don't know if we could easily tell where we are on the tradeoff of "bipartite graphs are trained efficiently but are inefficiently simulating a smaller network in practice".
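To make "inefficiently simulating a smaller network" concrete, here is a small sketch in plain Python. The layer sizes and weights are made up for illustration. It shows a fully connected layer whose weights are mostly zero computing the same output as an explicit connection list, while doing four times as many multiplications.

```python
def dense_forward(weights, x):
    """Fully connected layer: every output multiplies every input, zeros included."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def sparse_forward(connections, x, n_out):
    """The same layer stored as explicit (out_index, in_index, weight) connections."""
    y = [0.0] * n_out
    for out_i, in_i, w in connections:
        y[out_i] += w * x[in_i]
    return y

# Illustrative 3x4 layer in which only three connections carry any signal.
dense = [
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, -1.0],
    [0.5, 0.0, 0.0, 0.0],
]
connections = [
    (o, i, w)
    for o, row in enumerate(dense)
    for i, w in enumerate(row)
    if w != 0.0
]
x = [1.0, 2.0, 3.0, 4.0]

dense_out = dense_forward(dense, x)
sparse_out = sparse_forward(connections, x, n_out=len(dense))
dense_mults = len(dense) * len(x)   # 12 multiplications
sparse_mults = len(connections)     # 3 multiplications
```

The dense version does more arithmetic, but its access pattern is fixed in advance, which is the property that lets hardware be optimized for it before the structure is learned.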

NEAT just doesn't have good, modern GPU powered implementations.

NEAT would totally be competitive if someone actually got a version running in PyTorch/TensorFlow.

It's not that simple. Backpropagating through a bipartite graph of nodes works out to a series of matrix operations that parallelize efficiently on a GPU as long as the matrices fit into the GPU's working memory. Running a GA (part of NEAT) doesn't normally work well on a GPU. The good NEAT algorithms even allow different neurons to have different firing response curves. This inherently defies the "same operation, multiple values" style of parallelization in GPUs. The way GPUs work just fundamentally isn't well suited to speeding up NEAT.
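As a rough illustration of why this resists "same operation, multiple values" execution, here is a sketch of evaluating an evolved feed-forward network node by node. The genome encoding (a dict of node id to activation name plus incoming connections) is a hypothetical simplification, not NEAT's actual representation, and it assumes no recurrent links.

```python
import math

# Each node may use a different response curve.
ACTIVATIONS = {
    "sigmoid": lambda v: 1.0 / (1.0 + math.exp(-v)),
    "tanh": math.tanh,
    "relu": lambda v: max(0.0, v),
}

def evaluate(nodes, inputs, output):
    """Evaluate an arbitrary acyclic graph one node at a time."""
    values = dict(inputs)

    def value_of(node_id):
        if node_id not in values:
            activation, incoming = nodes[node_id]
            total = sum(weight * value_of(src) for src, weight in incoming)
            values[node_id] = ACTIVATIONS[activation](total)
        return values[node_id]

    return value_of(output)

# Hypothetical evolved topology: uneven fan-in, mixed activations,
# and a skip connection from input x0 straight to the output.
nodes = {
    "h1": ("relu", [("x0", 1.0), ("x1", -1.0)]),
    "h2": ("tanh", [("x1", 0.5)]),
    "out": ("sigmoid", [("h1", 2.0), ("h2", 1.0), ("x0", -1.0)]),
}
result = evaluate(nodes, {"x0": 1.0, "x1": 0.0}, "out")
```

Every node here has its own fan-in and its own activation, so there is no single matrix multiply followed by one elementwise function to hand to the GPU.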

You may be interested in this implementation [1] which builds the networks using PyTorch.

[1] https://github.com/uber-research/PyTorch-NEAT

It uses PyTorch (and I'm probably going to use it), but doesn't effectively leverage a GPU for training.

What do you think is the best way to accomplish this?

You don't. You need a different parallelism model than a GPU provides. It could work well on machines with very high CPU count, but the speedup on GPUs is the main reason bipartite graph algorithms have seen such investment.

Here is a relevant paper, which was the coolest thing I saw at this past NeurIPS: https://weightagnostic.github.io/

It is based on NEAT (as other commenters mentioned) and also ties in some discussion of the Lottery Ticket Hypothesis as you mentioned.
