Well we're in luck. https://www.youtube.com/watch?v=CaCcOwJPytQ.
It's a great set of 55 video lectures on Kalman filtering. (I'm only on number 5, but so far they've been great.)
I would have paid good money for this if I wasn't already familiar with the material. The clarity is up there with The Essence of Linear Algebra: https://www.youtube.com/watch?v=kjBOesZCoqc
Trading math videos on the Internet--definitely nerds.
It's a little simpler to derive the least squares smoothing function
"In both models, there's an unobserved state that changes over time according to relatively simple rules, and you get indirect information about that state every so often. In Kalman filters, you assume the unobserved state is Gaussian-ish and it moves continuously according to linear-ish dynamics (depending on which flavor of Kalman filter is being used). In HMMs, you assume the hidden state is one of a few classes, and the movement among these states uses a discrete Markov chain. In my experience, the algorithms are often pretty different for these two cases, but the underlying idea is very similar." - THISISDAVE
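The linear-Gaussian case described above fits in a few lines of code. Here is a minimal 1D sketch (the random-walk dynamics and the noise levels q and r are made up for illustration, not taken from any particular source):

```python
# Minimal 1D Kalman filter: hidden state x drifts as a random walk
# with process noise q; we observe noisy measurements z with noise r.

def kalman_step(x, p, z, q=1e-3, r=0.1):
    """One predict/update cycle for a scalar state (x: estimate, p: variance)."""
    # Predict: identity dynamics, uncertainty grows by the process noise.
    x_pred = x
    p_pred = p + q
    # Update: blend prediction and measurement via the Kalman gain.
    k = p_pred / (p_pred + r)
    x_new = x_pred + k * (z - x_pred)
    p_new = (1 - k) * p_pred
    return x_new, p_new

# Filter a noisy, roughly constant signal.
x, p = 0.0, 1.0
for z in [1.1, 0.9, 1.05, 0.98, 1.02]:
    x, p = kalman_step(x, p, z)
```

After a handful of measurements near 1.0, the estimate x converges toward 1.0 and the variance p shrinks, which is the "indirect information narrows the belief about the hidden state" idea in the quote.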
-- HMM vs LSTM/RNN:
"Some state-of-the-art industrial speech recognition is transitioning from HMM-DNN systems to "CTC" (connectionist temporal classification), i.e., basically LSTMs. Kaldi is working on "nnet3" which moves to CTC, as well. Speech was one of the places where HMMs were _huge_, so that's kind of a big deal." -PRACCU
"HMMs are only a small subset of generative models that offers quite little expressiveness in exchange for efficient learning and inference." - NEXTOS
"IMO, anything that can be done with an HMM can now be done with an RNN. The only advantage that an HMM might have is that training it might be faster using cheaper computational resources. But if you have the $$$ to get yourself a GPU or two, this computational advantage disappears for HMMs." - SHERJILOZAIR
In those scenarios, once computational power allows, I think other approaches such as particle filters, which can handle arbitrary distributions (e.g., multimodal), will start taking over. But we're not there yet (?).
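To make the contrast concrete, here is a minimal bootstrap particle filter sketch for a 1D state. Unlike a Kalman filter, the particle cloud can represent arbitrary (e.g. multimodal) posteriors; the dynamics and noise levels here are invented purely for illustration:

```python
import math
import random

def particle_filter_step(particles, z, q=0.1, r=0.5):
    """One predict/weight/resample cycle of a bootstrap particle filter."""
    # Predict: propagate each particle through noisy random-walk dynamics.
    particles = [x + random.gauss(0, q) for x in particles]
    # Weight: Gaussian likelihood of the measurement z under each particle.
    weights = [math.exp(-((z - x) ** 2) / (2 * r * r)) for x in particles]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Resample: draw a new cloud, with probability proportional to weight.
    return random.choices(particles, weights=weights, k=len(particles))

random.seed(0)
particles = [random.uniform(-5, 5) for _ in range(500)]
for z in [1.0, 1.1, 0.9, 1.0]:
    particles = particle_filter_step(particles, z)
estimate = sum(particles) / len(particles)
```

The catch, as the comment suggests, is cost: every step touches every particle, so this is far more expensive per update than the constant-time Kalman recursion.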
It's also worth noting that the Kalman filter follows the EM pattern of many ML / statistical models.
The real draw of these filters, though, is that they are very fast. In my experience, most of the compute time in each update cycle is spent on sensing, because your sensors dump a ton of data that you need to process as part of your CV / SLAM / whatever pipeline (the outputs of these then go into your KF). The dream is to get a 10ms update loop so your control algorithms can do a good job, but this is easier said than done.
Please stop saying that.
For example, missile guidance.
Consider standard deviation. You can calculate the standard deviation of a stream of numbers without storing all of them, or knowing where the stream will end. 'The standard deviation so far', in effect.
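The standard single-pass way to do this is Welford's algorithm: O(1) memory, and "the standard deviation so far" is available after every update. A minimal sketch:

```python
import math

class RunningStd:
    """Streaming mean and standard deviation via Welford's algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Sample standard deviation; undefined for fewer than two values.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

rs = RunningStd()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    rs.update(x)
```

The naive alternative (accumulating sum and sum-of-squares, then subtracting) can lose precision catastrophically when the mean is large relative to the spread; Welford's update avoids that.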
That makes them applicable to a wide range of embedded applications. As generalized ML tools, I'm pretty doubtful unless you wanted to create something in hardware as a large set of coupled noisy state-spaces.
Yes, you are right, but this is both a good and a bad example, because my old Casio calculator could do that too :)
I'm also a huge fan of the use of colors to understand all the different concepts at work. Yesterday I actually asked the secretary of my department to get me an 8-pack of multicolored pens for this exact purpose (red, blue, and black aren't enough!).