In a 3-layer network you have an input layer, a hidden layer, and an output layer. Each connection from a node in one layer to a node in the next has a weight, and a node's output is the weighted sum of how strongly it's activated by nodes in the previous layer. To calculate, you set the activations in the input layer to your input. These then activate the hidden nodes according to the weights, which in turn activate the output nodes, and that is what you see as the outcome.
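To make that concrete, here's a rough sketch of the calculation in plain Python. The function names and the weight values are made up for illustration, and it uses bare weighted sums with no activation function or bias terms, which is a simplification.

```python
def layer_output(activations, weights):
    # weights[j][i] is the weight from node i in the previous layer
    # to node j in this layer; each node sums its weighted inputs.
    return [sum(w * a for w, a in zip(row, activations)) for row in weights]

def forward(inputs, hidden_weights, output_weights):
    # Input layer -> hidden layer -> output layer.
    hidden = layer_output(inputs, hidden_weights)
    return layer_output(hidden, output_weights)

# Hypothetical network: 2 inputs, 2 hidden nodes, 1 output node.
hidden_weights = [[0.5, -1.0], [1.0, 2.0]]
output_weights = [[1.0, 0.5]]
result = forward([1.0, 2.0], hidden_weights, output_weights)
print(result)
```

Each row of a weight list belongs to one node in the receiving layer, so adding hidden nodes just means adding rows to `hidden_weights` and columns to `output_weights`.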
It can be shown that any continuous function can be approximated arbitrarily closely by a network with a single hidden layer, given enough hidden nodes.
Backpropagation uses a set of training data, and expected outputs, to adjust weights. Each training example is fed in as input to the network, and how much each node is activated is recorded. The algorithm then works backwards through the layers, adjusting weights so that the output will be the/closer to the desired output. Backpropagation itself is about a page worth of differentiation derivation, as you need to work out how much each weight affects each node.
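As a minimal sketch of what that looks like once the derivation is done: the code below assumes sigmoid activations, a squared-error measure, one output node, and no bias terms. None of that is required by the description above, and all the names and starting weights are invented for the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_out):
    # Record the hidden activations; the backward pass needs them.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def train_step(x, target, w_hidden, w_out, rate):
    hidden, output = forward(x, w_hidden, w_out)
    # Output node: derivative of 0.5 * (output - target)**2 through the sigmoid.
    delta_out = (output - target) * output * (1.0 - output)
    # Hidden nodes: pass the output delta back through the output weights
    # (computed before those weights are changed).
    delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                    for j in range(len(hidden))]
    # Adjust every weight in proportion to how much it affected the error.
    for j in range(len(hidden)):
        w_out[j] -= rate * delta_out * hidden[j]
        for i in range(len(x)):
            w_hidden[j][i] -= rate * delta_hidden[j] * x[i]
    return 0.5 * (output - target) ** 2

# Hypothetical starting weights and a single training example.
w_hidden = [[0.1, -0.2], [0.3, 0.4]]
w_out = [0.2, -0.1]
errors = [train_step([1.0, 0.5], 0.9, w_hidden, w_out, rate=0.5)
          for _ in range(200)]
print(errors[0], errors[-1])
```

`train_step` returns the error measured before its update, so comparing the first and last entries of `errors` shows whether the output has moved closer to the target.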
There are other details, such as the sigmoid function (it increases the 'contrast' of whether a node is activated or not, to give sharper decision boundaries) and the issue of over-training, where your network becomes too closely fitted to your specific training data and actually gives worse results on new inputs than a less trained network.
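For a feel of what the sigmoid does, here's a tiny sketch using the standard logistic function; the sample inputs are arbitrary.

```python
import math

def sigmoid(x):
    # Logistic function: squashes any real number into the range (0, 1).
    # Note: math.exp can overflow for very large negative x.
    return 1.0 / (1.0 + math.exp(-x))

# Weighted sums well below zero land near 0, well above zero land near 1,
# and only sums close to zero stay in the undecided middle.
samples = [-5.0, -1.0, 0.0, 1.0, 5.0]
squashed = [sigmoid(s) for s in samples]
print(squashed)
```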
"The algorithm then works backwards through the layers, adjusting weights so that the output will be the/closer to the desired output. Backpropagation itself is about a page worth of differentiation derivation, as you need to work out how much each weight affects each node."
For reference, I think that last bit may oversell it a little if you are familiar with multivariable calculus, which is a required course in many computer science programs, so that may not be much of a stretch. Getting all the details of backpropagation right can be a challenge, but it boils down to considering each weight as a dimension, taking the local gradient of the error with respect to those weights, then moving the vector represented by all the weights down (or up, depending on your sign convention) the gradient by the "learning rate", which is nothing more than a statement of how far in the direction of the gradient you'll go.
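Here's a rough sketch of just that "follow the gradient" step. To keep it short it estimates the gradient numerically with finite differences instead of doing the backpropagation derivation, and the `error` function is a made-up bowl-shaped surface standing in for a real network's error; every name here is hypothetical.

```python
def numeric_gradient(f, weights, eps=1e-6):
    # Estimate the slope along each weight dimension by nudging that
    # weight slightly and seeing how much the error changes.
    base = f(weights)
    grad = []
    for i in range(len(weights)):
        bumped = list(weights)
        bumped[i] += eps
        grad.append((f(bumped) - base) / eps)
    return grad

def descend(f, weights, learning_rate, steps):
    for _ in range(steps):
        grad = numeric_gradient(f, weights)
        # Move the whole weight vector against the gradient; the learning
        # rate is just how far to go on each step.
        weights = [w - learning_rate * g for w, g in zip(weights, grad)]
    return weights

# Stand-in error surface with its minimum at weights (3, -1).
def error(w):
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2

final = descend(error, [0.0, 0.0], learning_rate=0.1, steps=100)
print(final)
```

Backpropagation gets the same gradient analytically, which is far cheaper than nudging every weight one at a time, but the update step is the same idea.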
Sadly, non-looping (feedforward) neural networks are actually quite boring as classifiers go, and unless someone has made a breakthrough that I haven't heard of, nobody really knows what to do with looping (recurrent) neural networks. (It is possible that such a breakthrough has been made without me hearing about it, but it is the sort of thing that my feeds really should have picked up on; it would be big news.)
(Actually one of my personal projects someday is to fire one AI buzzword at another and see if I can't use genetic programming to write something that can train a looping network. :) )