Simulated annealing is gradient based. It's an extension of gradient descent and in the degenerate case (zero temperature) they're the same: it generates random neighbouring states, and if the fitness of that state is better than the current one then it jumps there. That is, it seeks the local minimum.
The key part that simulated annealing adds is that when the temperature is non-zero there's also a chance of jumping to worse states. The probability depends on how much worse the new state is and what the temperature is. It's unlikely to make the jump if it's much worse or if the temperature is too cold. The idea here is to jump out of local minima to (hopefully) find the global minimum.
The temperature is a knob that trades off between exploration (high temperature) and exploitation (low temperature). You start off with a high temperature and jump around a lot, hopefully seeking the general vicinity of the global minimum. As you do so, you gradually decrease the temperature and settle on a locally-optimal solution.
Let's say the fitness landscape looks like this:
\ B /
\A/ \C /
\D/
If you are at A and you're seeking the lowest point, then there's a probability that you will jump to B even though it is worse. The hope is that you'll then discover C and finally D, the global minimum. There's no route from A to D with monotonically increasing fitness, so gradient descent won't find it. But simulated annealing might.
The local minimization step does not have to be gradient based. You could use something like Nelder-Mead to do the local minimization step, and still use an annealing schedule to decide whether to accept or reject that particular minima.
The key part that simulated annealing adds is that when the temperature is non-zero there's also a chance of jumping to worse states. The probability depends on how much worse the new state is and what the temperature is. It's unlikely to make the jump if it's much worse or if the temperature is too cold. The idea here is to jump out of local minima to (hopefully) find the global minimum.
The temperature is a knob that trades off between exploration (high temperature) and exploitation (low temperature). You start off with a high temperature and jump around a lot, hopefully seeking the general vicinity of the global minimum. As you do so, you gradually decrease the temperature and settle on a locally-optimal solution.
Let's say the fitness landscape looks like this:
If you are at A and you're seeking the lowest point, then there's a probability that you will jump to B even though it is worse. The hope is that you'll then discover C and finally D, the global minimum. There's no route from A to D with monotonically increasing fitness, so gradient descent won't find it. But simulated annealing might.