I had 25 training images for this perceptron, and it
still wrongly identified my unseen images. I noticed that
$\boldsymbol\sigma$ was still changing if I trained it
repeatedly on the same data, and it didn't stabilize
until around the tenth pass. Maybe it needs more
training data, or maybe the perceptron is a bad algorithm
for this task; I really don't know.
Perceptrons are particularly delightful because you can answer all of these questions based on properties of the data. The perceptron is almost lawyer-proof, if you do not mind me saying so.
The other beautiful aspect is that, unlike many learning algorithms that have a training phase and a deployment phase (where learning stops), a perceptron can keep on learning nonstop, even when deployed. The reason you can do this is that the learning algorithm only needs to process the new data point (and not the set of all old training data points augmented by the new examples).
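To make the "only the new data point" part concrete, here is a minimal sketch of one online perceptron update (the function name, variable names, and learning rate are my own illustration, not anything from the question):

```python
import numpy as np

def perceptron_update(w, b, x, y, lr=1.0):
    """One online perceptron step: x is a feature vector, y is the label (+1 or -1).
    The update touches only this single new example; no old data is revisited,
    which is why the same code can keep running after deployment."""
    if y * (np.dot(w, x) + b) <= 0:   # the new example is misclassified
        w = w + lr * y * x            # nudge the weights toward the example
        b = b + lr * y
    return w, b

# Usage: keep feeding new points one at a time, forever.
w, b = np.zeros(3), 0.0
stream = [(np.array([1.0, 0.5, -0.2]), +1), (np.array([-0.7, 1.2, 0.3]), -1)]
for x, y in stream:
    w, b = perceptron_update(w, b, x, y)
```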
Now to the lawyer-proof part; this requires that the algorithm keeps learning while it is deployed (what follows is correct, but nevertheless tongue in cheek). Say your client is the "I will sue you at the drop of a hat" kind, and he/she wants you to promise a certain quality of performance. Well, with perceptrons there is a way. Just make him/her also sign that all past, current, and future data points the algorithm will ever see in its lifetime will be such that you can pass a slab of thickness $d$ between the two classes.
Now you can guarantee that the algorithm will never make more than $R^2/d^2$ mistakes, regardless of how many examples it is tested on, and even though you do not know exactly which examples it will be tested on in the future. If the client comes back with a log showing that the algorithm made more mistakes than you promised, don't worry if they threaten to sue: with that log you can sue them back, because the log itself is a counterexample to their claim that you can pass a slab of thickness $d$ through the data the algorithm was exposed to.
EDIT: $R$ is the radius of the sphere that contains all the data the algorithm has seen. Search for "perceptron" and "mistake bound"; the proof is surprisingly lucid (no calculus, just geometry and a little high-school algebra).
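For reference, here is the shape of that argument, just a sketch, with $\gamma$ denoting the margin that the slab guarantees (so $d$ above plays the role of $\gamma$). Assume there is a unit vector $w^*$ with $y_i\,(w^* \cdot x_i) \ge \gamma$ for every example the algorithm ever sees, that $\|x_i\| \le R$, and that $w$ starts at $0$. On a mistake the perceptron updates $w \leftarrow w + y_i x_i$, so after $k$ mistakes

$$w \cdot w^* \ge k\gamma \qquad \text{(each mistake adds at least } \gamma\text{)},$$

$$\|w\|^2 \le kR^2 \qquad \text{(on a mistake } y_i\,(w \cdot x_i) \le 0\text{, so each update adds at most } R^2\text{)}.$$

By Cauchy–Schwarz, $k\gamma \le w \cdot w^* \le \|w\| \le \sqrt{k}\,R$, and therefore $k \le R^2/\gamma^2$.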