> gradient descent isn't good at combinatorial optimisation.
[1] https://en.wikipedia.org/wiki/Natural_evolution_strategy#Nat...
In its original formulation it requires O(N^4) evaluations to compute the Fisher information matrix for an N-dimensional parameterization of the problem. But there are closed-form solutions and more economical representations of the covariance matrix (LoRA, hehe).
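To make the "more economical representation" point concrete, here is a rough sketch of the cheapest case I know of: a diagonal covariance (usually called separable NES). With a diagonal Gaussian the Fisher matrix is diagonal too, so the natural gradient has a closed form and nothing has to be estimated or inverted. This is a swapped-in simplification, not the full-covariance method from the link. The function name `snes`, its parameters, the rank-based utilities and the default learning rates are my own assumptions for illustration.

```python
import math
import random


def snes(f, mu, sigma, iters=300, pop=20, lr_mu=1.0, lr_sigma=None, seed=0):
    """Minimise f with a separable (diagonal-covariance) NES sketch."""
    rng = random.Random(seed)
    mu, sigma = list(mu), list(sigma)
    n = len(mu)
    if lr_sigma is None:
        # Assumed default step size for sigma; tune for your problem.
        lr_sigma = (3 + math.log(n)) / (5 * math.sqrt(n))

    # Rank-based utilities: best sample gets the largest weight, weights sum to zero.
    raw = [max(0.0, math.log(pop / 2 + 1) - math.log(k)) for k in range(1, pop + 1)]
    total = sum(raw)
    util = [r / total - 1.0 / pop for r in raw]

    for _ in range(iters):
        # Draw standard-normal noise s; each candidate is z = mu + sigma * s.
        noise = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(pop)]
        # Sort the noise vectors by the fitness of their candidates, best (lowest) first.
        scored = sorted(
            noise,
            key=lambda s: f([m + sd * si for m, sd, si in zip(mu, sigma, s)]),
        )
        for i in range(n):
            # Closed-form natural gradient per coordinate: no Fisher matrix is built.
            g_mu = sum(u * s[i] for u, s in zip(util, scored))
            g_sigma = sum(u * (s[i] ** 2 - 1.0) for u, s in zip(util, scored))
            mu[i] += lr_mu * sigma[i] * g_mu
            sigma[i] *= math.exp(0.5 * lr_sigma * g_sigma)
    return mu, sigma


def sphere(x):
    """Toy objective with its minimum at (3, 3, ..., 3)."""
    return sum((xi - 3.0) ** 2 for xi in x)


if __name__ == "__main__":
    best_mu, best_sigma = snes(sphere, [0.0] * 5, [1.0] * 5)
    print(best_mu, sphere(best_mu))
```

The price is that a diagonal covariance can't capture correlations between parameters, which is where low-rank-plus-diagonal representations come in.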