You shouldn't view GAs as competitors to gradient descent. Whenever gradient descent applies to a problem, it will probably outperform a GA. But GAs are flexible, applicable to a wider range of problems, and can be used in conjunction with other optimisation methods.
A few days ago I used a GA to perform feature selection: a binary encoding where 0 means a feature is excluded and 1 means it's included, with the fitness function being the accuracy of a NN that takes the included features as inputs and is trained with gradient descent for x epochs. This made my final network smaller, faster to train, etc.
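The setup above can be sketched as a small GA over bitmasks. This is a minimal, self-contained illustration, not the actual code: the real fitness function would train a NN on the selected features and return its validation accuracy, but here a toy stand-in (which simply rewards the first five "informative" features and penalises the rest) is used so the example runs on its own. All names and parameters are invented for the sketch.

```python
import random

random.seed(0)

N_FEATURES = 10
POP_SIZE = 20
GENERATIONS = 30
MUTATION_RATE = 0.05

def fitness(mask):
    # Stand-in for "train a NN on the included features for x epochs
    # and return its accuracy". Here we pretend features 0-4 carry
    # signal and features 5-9 only add noise, so the ideal mask is
    # 1111100000.
    signal = sum(mask[:5])
    noise = sum(mask[5:])
    return signal - 0.5 * noise

def select(pop):
    # Tournament selection of size 2: the fitter of two random
    # individuals becomes a parent.
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(a, b):
    # Single-point crossover of two parent bitmasks.
    point = random.randint(1, N_FEATURES - 1)
    return a[:point] + b[point:]

def mutate(mask):
    # Flip each bit independently with probability MUTATION_RATE.
    return [bit ^ 1 if random.random() < MUTATION_RATE else bit
            for bit in mask]

# Random initial population of feature masks.
pop = [[random.randint(0, 1) for _ in range(N_FEATURES)]
       for _ in range(POP_SIZE)]

best = max(pop, key=fitness)
for _ in range(GENERATIONS):
    pop = [mutate(crossover(select(pop), select(pop)))
           for _ in range(POP_SIZE)]
    # Keep track of the best mask seen so far (simple elitism).
    candidate = max(pop, key=fitness)
    if fitness(candidate) > fitness(best):
        best = candidate

print("best mask:", best)
```

In the real version, each fitness evaluation is a full (short) NN training run, which is why the GA sits on the outside of the gradient-descent loop rather than competing with it.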