1. Neurons, or more specifically, connections between neurons ("synapses"), absolutely do have weights, and the "strength" of synapses can be adjusted by a variety of mechanisms that act on scales of seconds to hours or days. At the "semi-permanent" end of the spectrum, the location of a synapse matters a lot: input arriving far from the cell body has much less influence on the cell's spiking. The number (and location?) of receptors on the cell surface can also affect the relative impact of a given input. Receptors can be trafficked to/from the membrane (a fairly slow process) or switched on and off more rapidly by intracellular processes. You may want to read up on long-term potentiation/depression (LTP/LTD), which are activity-dependent changes in synaptic strength. There is a whole host of these processes, and even some (limited) evidence that the electric fields generated by neurons can "ephaptically" affect their neighbors without any direct contact, which would allow for millisecond-scale changes.
2. While you can start by dividing neurons into excitatory and inhibitory populations, there's a lot more going on. On the glutamate (excitatory) side, AMPA receptors let glutamate rapidly excite a cell and make it more likely to fire. However, glutamate also gates NMDA receptors that, under certain circumstances, allow calcium into the cell. These calcium ions are involved in all sorts of signaling cascades (and are involved--we think--in tuning synaptic weights). GABA typically hyperpolarizes cells (i.e., makes them less likely to fire) and is secreted by cells called interneurons. However, there's a huge diversity of interneurons. Some seem to "subtract" from excitatory activity; others can affect it more strongly, in a divisive sort of way, or even cancel it completely. Furthermore, there's a whole host of other neurotransmitters. Dopamine, which is heavily involved in reward, can have excitatory or inhibitory effects, depending on whether it activates D1 or D2 receptors.
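To make the subtractive-vs-divisive distinction concrete, here's a toy rate-model sketch. The functional forms and numbers are illustrative, not taken from any specific interneuron study:

```python
import numpy as np

def subtractive_inhibition(excitation, inhibition):
    # Inhibition shifts the response curve rightward: same gain, higher threshold.
    return np.maximum(0.0, excitation - inhibition)

def divisive_inhibition(excitation, inhibition):
    # Inhibition rescales the response curve: lower gain, same threshold.
    return np.maximum(0.0, excitation) / (1.0 + inhibition)

drive = np.linspace(0, 10, 6)
print(subtractive_inhibition(drive, 2.0))  # zero until drive exceeds 2
print(divisive_inhibition(drive, 2.0))     # every response scaled by 1/3
```

Same inhibitory "strength," qualitatively different computations, which is part of why interneuron diversity matters.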
3. While textbook feed-forward neural networks certainly have "instant" signal propagation, there are lots of other computational models that do include time. Time-delay neural networks are essentially convnets extended over time instead of space. Reservoir computing methods like liquid state machines also handle time, but in a much more complicated way.
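As an illustration of the time-delay idea, here's a minimal sketch of one TDNN-style layer in plain numpy; the kernel, nonlinearity, and sizes are arbitrary choices for the example:

```python
import numpy as np

def tdnn_layer(signal, kernel):
    # Each output depends on the last len(kernel) inputs, so the kernel
    # weights act on delayed copies of the signal -- a convolution over time.
    T, K = len(signal), len(kernel)
    out = np.zeros(T - K + 1)
    for t in range(len(out)):
        out[t] = np.tanh(np.dot(kernel, signal[t:t + K]))
    return out

x = np.sin(np.linspace(0, 4 * np.pi, 50))  # toy input stream
w = np.array([0.5, -0.25, 0.25])           # weights over 3 time delays
print(tdnn_layer(x, w)[:5])
```

Stacking such layers gives progressively longer temporal receptive fields, exactly analogous to spatial receptive fields in a convnet.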
4. I chuckled at the idea of finding a biological analog for reinforcement learning, since reinforcement learning was initially inspired by the idea of reinforcement in psychology/animal behavior. People have shown that brain areas--and individual neurons within them--encode action values, state estimates, and other building blocks of reinforcement learning. Clearly, we have a lot to discover still, but the general idea isn't at all implausible.
Finally, some people are fairly skeptical that the fields have much to learn from each other; Jürgen Schmidhuber said this a lot at NIPS last year. However, other, equally-smart people (e.g., Geoff Hinton) seem to think that there may be a common mechanism, or at least a useful source of inspiration there. But, if you want to work on something like this (and it is awesomely interesting), it really helps to have a solid grounding in both.
There are a couple of standard neurobiology textbooks, like Kandel, Schwartz, and Jessell's Principles of Neural Science, Purves et al.'s Neuroscience, and Squire et al.'s Fundamental Neuroscience. These are huge books that cover a bit of everything, and you should know that they exist, but I wouldn't necessarily start there.
If you're specifically interested in computation, I would start with David Marr's Vision. It's quite old, but worth reading for the general approach he takes to problem-solving. He proposes attacking a problem along three lines: at the computational level ("what operations are performed?"), the algorithmic level ("how do we do those operations?"), and the implementation level ("how is the algorithm implemented?").
From there, it depends on what you're interested in. At the single-cell level, Christof Koch has a book called The Biophysics of Computation that "explains the repertoire of computational functions available to single neurons, showing how individual nerve cells can multiply, integrate, and delay synaptic input" (among other things). Michael London and Michael Häusser have a 2005 Annual Review of Neuroscience article about dendritic computation that hits on some similar themes (here: https://www.researchgate.net/publication/7712549_Dendritic_c... ), along with this short review (http://www.nature.com/neuro/journal/v3/n11s/full/nn1100_1171...) by Koch and Segev, and a 2014 review by Brunel, Hakim, and Richardson (http://www.sciencedirect.com/science/article/pii/S0959438814...). Larry Abbott has also done interesting work in this space, as have Haim Sompolinsky and many others. Gordon Shepherd and his colleagues maintain NEURON (a simulation package/platform) and a database of associated models (ModelDB) here: https://senselab.med.yale.edu/ if you want something to download and play with (they also do good original work themselves!)
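Before diving into full biophysics, it helps to have the simplest single-cell model in hand. Here's a leaky integrate-and-fire neuron in a few lines; the parameters are typical textbook-style values, not tied to any particular cell type:

```python
import numpy as np

def simulate_lif(current, dt=0.1, tau=10.0, v_rest=-65.0,
                 v_thresh=-50.0, v_reset=-70.0):
    # Euler integration of dv/dt = (v_rest - v + I) / tau (input resistance
    # folded into the current for simplicity); spike and reset at threshold.
    v = v_rest
    trace, spike_times = [], []
    for step, i_t in enumerate(current):
        v += dt * ((v_rest - v) + i_t) / tau
        if v >= v_thresh:
            spike_times.append(step * dt)
            v = v_reset
        trace.append(v)
    return np.array(trace), spike_times

# Constant suprathreshold drive for 100 ms -> regular spiking.
trace, spikes = simulate_lif(np.full(1000, 20.0))
print(f"{len(spikes)} spikes in 100 ms")
```

Koch's point is that real neurons do far more than this (dendritic nonlinearities, multiplication, delays), but this caricature is the usual baseline they're compared against.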
Moving up a bit, the keywords for "weight adjustment" are something like synaptic plasticity, long-term potentiation/depression (LTP/LTD), and perhaps spike-timing dependent plasticity. The Scholarpedia article on spike-timing dependent plasticity is pretty good (http://www.scholarpedia.org/article/Spike-timing_dependent_p...). Scholarpedia is actually a pretty good resource for most of these topics. The intro books above will have pretty good treatments of this, though maybe not explicitly computational ones.
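For intuition, the canonical pair-based STDP curve can be written down directly; the amplitudes and time constant below are illustrative values in the usual ballpark, not fit to any dataset:

```python
import numpy as np

def stdp_dw(dt, a_plus=0.01, a_minus=0.012, tau=20.0):
    # Pair-based STDP: dt = t_post - t_pre in milliseconds.
    # Pre-before-post (dt > 0) strengthens the synapse (LTP);
    # post-before-pre (dt <= 0) weakens it (LTD); both decay with |dt|.
    if dt > 0:
        return a_plus * np.exp(-dt / tau)
    return -a_minus * np.exp(dt / tau)

print(stdp_dw(5.0))    # pre leads post by 5 ms -> positive weight change
print(stdp_dw(-5.0))   # post leads pre by 5 ms -> negative weight change
```

The asymmetry (a_minus slightly larger than a_plus) is one common way to keep runaway potentiation in check in model networks.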
More to come; in the meantime, I also just found this class from a bunch of heavy-hitters at NYU (http://www.cns.nyu.edu/~rinzel/CMNSF07/). Those papers are a good place to start!
I probably should have led with this, but there's been a lot of interest in backprop-like algorithms in the brain:
* Geoff Hinton has a talk (and slide deck) about how back-propagation might be implemented in the brain. (Slide deck: https://www.cs.toronto.edu/~hinton/backpropincortex2014.pdf Video: http://sms.cam.ac.uk/media/2017973?format=mpeg4&quality=720p )
* As always, the French part of Canada has its own, slightly different version of things, care of Yoshua Bengio (slide deck from NIPS 2015: https://www.iro.umontreal.ca/~bengioy/talks/NIPS2015_NeuralS... preprint: https://arxiv.org/abs/1502.04156 )
* Here is another late 2015 take on back-prop in the brain by Whittington and Bogacz (http://biorxiv.org/content/early/2015/12/28/035451). This one is interesting because they view the brain as a predictive coding device which is continuously estimating the future state of the world and then updating its predictions. (I think the general predictive coding idea is cool and probably under-explored.)
* There's a much older paper by Pietro Mazzoni, Richard Andersen, and Michael I. Jordan attempting to derive a more biologically plausible learning rule (here: http://www.pnas.org/content/88/10/4433.full.pdf). This work is particularly neat because it builds on earlier work by Zipser and Andersen (https://www.vis.caltech.edu/documents/54-v331_88.pdf), who trained a three-layer network (via back-prop) to transform data from gaze-centered ('retinotopic') to head-centered coordinates, and noticed that the hidden units performed transforms that look a lot like the work done by individual neurons in Area 7A. The Mazzoni paper then replaces the backprop with a learning procedure that is more biologically plausible.
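In the same spirit as that line of work, here's a hedged sketch of one generic family of "more biologically plausible" rules: perturb the weights, observe only a scalar reward, and correlate the two. This is a REINFORCE/node-perturbation-style rule, not the specific rule from the Mazzoni paper, and all names and numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: match a fixed linear map using only a delayed scalar reward,
# with no backpropagated error signal.
target_w = np.array([1.0, -2.0, 0.5])
w = np.zeros(3)
lr, sigma = 0.1, 0.1
baseline = 0.0  # running average reward, a crude "reward prediction"

for trial in range(5000):
    x = rng.normal(size=3)
    noise = rng.normal(scale=sigma, size=3)   # exploratory weight perturbation
    y = (w + noise) @ x                       # perturbed trial output
    reward = -((y - target_w @ x) ** 2)       # scalar feedback only
    # Three-factor update: perturbation, correlated with
    # better/worse-than-expected reward.
    w += lr * (reward - baseline) * noise
    baseline += 0.05 * (reward - baseline)

print(np.round(w, 2))  # drifts toward target_w
```

The appeal is that every quantity here is locally available to a synapse plus one globally broadcast scalar (dopamine-like); the cost is much higher variance than true backprop.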
For backprop, you need some sort of error signal. Wolfram Schultz's group has done a lot of work demonstrating that dopamine neurons encode something like "reward prediction error." (e.g., this: http://jn.physiology.org/content/80/1/1, but they have lots of similar papers: http://www.neuroscience.cam.ac.uk/directory/profile.php?Schu...). For reinforcement learning, you might also want to maintain some sort of value estimate. There are tons of studies looking at value representation in orbitofrontal cortex (OFC), using mostly humans and monkeys, but occasionally rats. Here's a review from Daeyeol Lee and his postdoc Hyojung Seo describing neural mechanisms for reinforcement learning (http://onlinelibrary.wiley.com/doi/10.1196/annals.1390.007/f... ) The Lee lab has done a lot of interesting value-related things too.
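To see how a "reward prediction error" falls out of temporal-difference learning, here's a tiny tabular TD(0) sketch; the states and parameters are made up for illustration:

```python
# A cue reliably followed by reward. The per-step delta plays the role of
# the dopamine-like reward prediction error in the Schultz work above.
gamma, alpha = 0.9, 0.1
values = {"cue": 0.0, "wait": 0.0, "reward_port": 0.0}

def td_update(state, next_state, reward):
    # Prediction error: actual outcome vs. what the value table predicted.
    delta = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * delta
    return delta

for episode in range(200):
    td_update("cue", "wait", 0.0)
    td_update("wait", "reward_port", 1.0)

print({s: round(v, 2) for s, v in values.items()})
```

Early in training the large delta occurs at reward delivery; once the values converge, the surprise has effectively migrated back to the cue, which is the qualitative pattern Schultz's dopamine recordings show.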
Switching gears slightly, there is also considerable interest around unsupervised learning and related methods for finding "good" representations for things. This is potentially interesting because it would allow for improvements within individuals and even across individuals (e.g., by evolution).
Olshausen and Field kicked this off by demonstrating that maximizing the sparseness of a linear code for natural images produces a filter bank that resembles the way neurons in primary visual cortex process images. (http://courses.cs.washington.edu/courses/cse528/11sp/Olshaus...)
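For a hands-on version of the sparse coding idea (not Olshausen and Field's original learning procedure, just the standard lasso/ISTA formulation of inferring a sparse code given a fixed dictionary), here's a short sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)

def sparse_code(x, D, lam=0.1, steps=100):
    # ISTA: minimize ||x - D a||^2 / 2 + lam * ||a||_1 over the code a.
    L = np.linalg.norm(D, 2) ** 2              # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(steps):
        grad = D.T @ (D @ a - x)
        a = a - grad / L
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft threshold
    return a

# Overcomplete random dictionary standing in for learned V1-like filters.
D = rng.normal(size=(16, 32))
D /= np.linalg.norm(D, axis=0)
x = 2.0 * D[:, 3]                              # signal built from one dictionary atom
a = sparse_code(x, D)
print(np.count_nonzero(np.abs(a) > 0.05), "active coefficients")
```

The L1 penalty forces most coefficients to exactly zero, so the inferred code is dominated by the atom that generated the signal; in the Olshausen and Field setup the dictionary itself is also learned, and that's where the V1-like filters come from.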
Michael Lewicki has done similar things in a variety of sensory modalities. Here's a recent paper from him looking at coding in the retina (http://journals.plos.org/ploscompbiol/article?id=10.1371/jou...), but he has similar work in the auditory system and work building on the Olshausen and Field paper linked above to explain complex cells (and more!) in visual cortex. Bill Geisler has also done a lot of work looking at the statistics of natural scenes and how the brain (and behavior) appears to be adapted for them.