> The perceptrons are trained by an algorithm that increments a weight when the branch outcome agrees with the weight’s correlation and decrements the weight otherwise.
A "perceptron" seems to be a single linear function, akin to a single neuron in a neural network. So complex learning algorithms like back-propagation don't apply here.
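To make that concrete, a single perceptron's output is just a weighted sum of its inputs plus a bias weight. A minimal sketch (function and variable names are my own):

```python
# Sketch of a single perceptron as used in branch prediction: the output
# is a bias weight plus a dot product of weights with a bipolar (-1/+1)
# branch-history vector.
def perceptron_output(weights, history):
    # weights[0] is the bias weight, paired with a constant input of 1
    return weights[0] + sum(w * h for w, h in zip(weights[1:], history))

# Example: 3 history bits, all "taken" (+1)
print(perceptron_output([1, 2, -1, 3], [1, 1, 1]))  # 1 + 2 - 1 + 3 = 5
```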
The complete process, from the paper, is:
> 1. The branch address is hashed to produce an index "i" into the table of perceptrons.
> 2. The "i"th perceptron is loaded from the table into a vector register "P" of weights.
> 3. The value of "y" is computed as the dot product of "P" and the global history register. [This is a shift register containing values of -1 or 1 for the last N branches seen. -1 means not taken, and 1 means taken.]
> 4. The branch is predicted "not taken" when "y" is negative, or "taken" otherwise.
> 5. Once the actual outcome of the branch becomes known, the training algorithm uses this outcome and the value of "y" to update the weights in "P".
> 6. "P" is written back to the "i"th entry in the table.
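Putting the six steps together, here is a toy model in Python. The table size, the modulo hash, and the threshold value are my own assumptions for illustration (the paper derives its training threshold from the history length, and real hardware would use saturating fixed-width weights):

```python
HISTORY_LEN = 8       # N history bits (assumption: kept small for illustration)
NUM_PERCEPTRONS = 64  # table size (assumption)
THRESHOLD = 16        # training threshold (assumption; the paper ties it to N)

# Table of perceptrons; each row holds a bias weight plus one weight per
# history bit. The global history register holds +1 = taken, -1 = not taken.
table = [[0] * (HISTORY_LEN + 1) for _ in range(NUM_PERCEPTRONS)]
history = [1] * HISTORY_LEN

def predict_and_train(branch_addr, outcome):
    """outcome: +1 if the branch was actually taken, -1 otherwise."""
    i = branch_addr % NUM_PERCEPTRONS    # 1. hash the address (simple modulo)
    P = table[i]                         # 2. load the i-th perceptron
    y = P[0] + sum(w * h for w, h in zip(P[1:], history))  # 3. dot product
    prediction = 1 if y >= 0 else -1     # 4. predict taken if y is non-negative
    # 5. train on a misprediction, or when |y| is below the threshold:
    # increment weights that agreed with the outcome, decrement the rest
    if prediction != outcome or abs(y) <= THRESHOLD:
        P[0] += outcome
        for j in range(HISTORY_LEN):
            P[j + 1] += outcome * history[j]
    table[i] = P                         # 6. write back (a no-op here, since
                                         #    P aliases the table row)
    # shift the actual outcome into the global history register
    history.pop(0)
    history.append(outcome)
    return prediction
```

With all-zero weights the first prediction is "taken" (y = 0 is non-negative); after a few outcomes the weights shift toward whatever correlation the history shows.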
The article says that AMD's microarchitecture uses a hashed perceptron system in its branch prediction. So, from the above, it seems that "hashed" means you hash the branch address to pick which perceptron (model) to use, and "perceptron" is just a dot product. (Actually, the Wikipedia article suggests that the name "perceptron" refers to the training algorithm, i.e. to how the weights are updated.)
Daniel A. Jiménez and Calvin Lin. 2002. Neural Methods for Dynamic Branch Prediction. https://www.cs.utexas.edu/~lin/papers/tocs02.pdf