The magic of the Kalman filter, in pictures (bzarg.com)
443 points by tbabb on Aug 11, 2015 | 48 comments



I'll be shameless and point you to my book on Kalman filtering which I wrote in IPython Notebook, which allows you to experiment within your browser.

https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Pyt...


I've been using your book to learn about unscented Kalman filters for something I'm working on, and I need to take this opportunity to say thank you! It would've taken me much longer to understand it all without your book - you've saved me much time and frustration.


Looks like a very good resource; however, what prior knowledge does one need to understand this book?

E.g., do I need to know Bayes' rule? Bayesian inference? Etc.


This looks incredible, and I don't say that lightly. Thanks for posting it!


Nice. Julia version here: https://github.com/wkearn/Kalman.jl


What do you mean by "Julia version"? I thought you meant that you were linking to a version of the ipython-notebook-based Kalman Filter textbook that had been ported to Julia. But... you seem to only be linking to a Julia library that implements Kalman Filters.


Obviously the important thing (and the one being ported) is the Kalman filter, which is what we're discussing in this post.

Not specifically iPython notebook versions of them.


yes, that's what I meant, there's a Julia library that implements Kalman filters.


Well, I was sorta interested until this:

>The key insight to this entire book is in the next paragraph. Read it carefully!

Okay. What is it!!!

>If we only take data from the measurement then the prediction will not affect the result. If we only take data from the prediction then the measurement will be ignored. If this is to work we need to take some kind of blend of the prediction and measurement (I've italicized the key point).

Unless I am missing something, this "key insight" is redundant and has essentially zero information content, apart from the assertion that to get a good result it needs to be a function of two variables, measurement and prediction. Not one. Not the other. But two variables, and that's what they are called, yessir.


I think what you're missing is that the ratio is updated continuously.
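A minimal 1-D sketch of what I mean (my own toy code, not from the book or the article): the blend weight K is recomputed every step from the current uncertainties, so it is not a fixed average.

    # toy 1-D Kalman filter: x is the estimate, P its variance,
    # Q the process noise variance, R the measurement noise variance
    def kalman_1d(x, P, measurements, Q=0.1, R=1.0):
        for z in measurements:
            P = P + Q                # predict: uncertainty grows
            K = P / (P + R)          # blend weight, recomputed every step
            x = x + K * (z - x)      # mix the prediction with the measurement
            P = (1 - K) * P          # update: uncertainty shrinks
        return x, P

    # with a vague prior the first K is ~1 (trust the measurement);
    # as P shrinks, K falls and the prediction carries more weight
    print(kalman_1d(0.0, 1000.0, [1.2, 0.9, 1.1]))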


Another cool resource for learning to apply and implement Kalman filters is Udacity's AI for Robotics (focused on self driving cars) course by Sebastian Thrun. Apparently Kalman filters are how Google's self driving cars predict the velocity of other cars from their position.

https://www.udacity.com/course/artificial-intelligence-for-r...


Thank you to everyone who has posted resources in this thread. Just yesterday I was talking to a younger engineer about how one of our GPS systems works. I know the complete unit has a GPS and an IMU, and I knew of the Kalman filter, but was unable to explain it beyond "it combines the GPS and IMU inputs to create a position and velocity solution with greater precision and accuracy than can be achieved with either source separately". Now I have much reading to do, and so does the young engineer. Thanks! This will help for the long wait I have in the doctor's office tomorrow...


"In other words, the new best estimate is a prediction made from previous best estimate, plus a correction for known external influences.

And the new uncertainty is predicted from the old uncertainty, with some additional uncertainty from the environment."

Crystal clear - great article, thanks!
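In symbols, those two sentences are just the predict step (my shorthand, not copied from the article, which uses F for the state transition, B and u for the known control input, and Q for the environment noise):

    x_new = F x + B u        (prediction from the previous best estimate, plus the known external influence)
    P_new = F P F^T + Q      (old uncertainty pushed through the dynamics, plus extra uncertainty from the environment)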

I also recommend Ramsey Faragher's lecture notes on teaching the Kalman Filter: http://www.cl.cam.ac.uk/~rmf25/papers/Understanding%20the%20...


It's much easier to understand Kalman filtering in one dimension: http://credentiality2.blogspot.com/2010/08/simple-kalman-fil...


I find the Kalman filter explanation in terms of the Cholesky decomposition by R. Eubank to be excellent. It has a prerequisite of knowing linear regression (the linear-algebra, actually-know-what-you-are-doing type), but it made the inference the Kalman filter performs very clear. This is definitely not a 10-minute blog post explanation, but understanding how it functions on a deep level is worth it.

http://read.pudn.com/downloads158/ebook/707373/A_Kalman_Filt...

I don't know why people jump through hoops to start out with a multidimensional Kalman filter, and even sometimes dive deep into complex topics. The subject seems to suffer from the same terminology and confusion gap as pre-Kolmogorov statistics.


I like particle filtering because it's easy to understand and implement - https://en.wikipedia.org/wiki/Monte_Carlo_localization - and it's correct even for non-gaussian uncertainty.

Is Kalman filtering computationally more efficient (obviously particle filtering is stochastic and so trades off accuracy for compute) or does it have some other advantage?


Firstly, for a linear system with Gaussian noise, Kalman filtering is optimal; that is, it produces exactly the correct posterior distribution. As you mention, particle filters cannot achieve this.

It's a rough heuristic that to achieve a certain accuracy for a linear/Gaussian system with a particle filter, you need a number of particles exponential in the number of dimensions of the system. I feel like this could probably be stated more formally and shown, but I don't think I've seen anything in that vein. The Kalman filter, being simply matrix operations, should scale as the number of dimensions cubed.

So yes, Kalman filtering is computationally more efficient, and (obviously) more accurate.

I also wouldn't discount the fact that the Kalman filter is, in a sense, simpler than the particle filter for a linear/Gaussian system; you don't need to worry about resampling or setting a good number of particles, and you don't need to compute estimates of the mean/covariance statistics (which are sufficient since the posterior should be a Gaussian).
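For concreteness, the "simply matrix operations" in the update step are (standard textbook notation, not lifted from the article, with H the measurement matrix, R the measurement noise covariance, and z the measurement):

    K = P H^T (H P H^T + R)^-1      (Kalman gain)
    x = x + K (z - H x)             (blend the prediction with the measurement)
    P = (I - K H) P                 (the covariance shrinks)

The matrix products and the inverse are where the roughly cubic cost comes from.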


Under the same conditions of linearity and Gaussian noise you can get essentially optimal performance. Of course, why would you, when the KF is so much more efficient?


This is true but the assumption of Gaussian noise carries with it the assumption that the model is correct.

Most models are not correct which is why particle filters perform much better than people expect.


Particle filters can be an option if the noise is not additive and Gaussian (e.g. it scales with the signal amplitude).


Kalman Filters are super efficient to calculate - they're what kept the Apollo program on track (60s compute!). Another post gives the asymptotic complexity - but as a rule of thumb if you can do any practical computation at all you can run a Kalman Filter.

Basically they're both implementations of a recursive Bayesian filter, but the Kalman filter requires very strong assumptions about the distribution (all Gaussian) and the particle filter requires none.

The Kalman filter is optimal for the Gaussian case (and is very efficient to calculate), whilst the particle filter can use more accurate distributions but is far less efficient to calculate.

You kinda use them in different places - a Kalman filter is useless for pedestrian dead reckoning (step made + estimate of direction), whilst a particle filter would be similarly dumb on submarines.


True. Our robot uses a particle filter since we're mostly localizing by odometry and lidar returns. But if we had an outdoor robot using GPS like the one in the article then our uncertainty would be a lot more Gaussian and I think we'd switch.

Also, for determining the position of your own robot the efficiency isn't a big deal. But if you're tracking a lot of objects in your environment then it becomes more important. And since you're looking at the objects and are tracking relative position, the distribution is uni-modal too.


The KF is very efficient.

It's optimal for linear Gaussian systems and computation is polynomial in measurement dimensionality k and state dimensionality n: 


O(k^2.376 + n^2)

(Yes I had to look that up.)


This appears to be a really nice writeup. However, at the end:

For nonlinear systems, we use the extended Kalman filter, which works by simply linearizing the predictions and measurements about their mean.

I would recommend looking at an Unscented Kalman filter:

http://www.control.aau.dk/~tb/ESIF/slides/ESIF_6.pdf

which sucks a lot less.


Even better, use a Sigma Point Kalman Filter.


The Unscented Kalman Filter is a type of Sigma Point Kalman Filter -- I just had to look this up to be sure, but it is kind of obvious.


Certainly. However, if you look at the popular literature, UKF is more represented than the SPKF by far, even given its better performance on highly nonlinear systems.


A few months ago I was trying to wrap my head around Kalman filters and this was the clearest explanation I found anywhere:

http://www.cs.unc.edu/~welch/kalman/media/pdf/maybeck_ch1.pd...


The literature on Kalman filters has traditionally been horrendous to a degree that is hard to believe, so this is a fantastic resource.


It was a surprisingly light read. I really liked the strong intuition-based reasoning in each step.

When I was playing with different compass implementations on F-Droid I noticed that many of them use a Kalman filter to reduce the noise from the raw sensor data. Some of them (maybe only one) had problems near angle 0, where the sensor data frequently jumps between almost 2pi and slightly above 0. The problem is that the assumption that the measurement uncertainty is Gaussian breaks down badly there, since the distribution ends up as half a Gaussian near 0 and the other half near 2pi. I don't know what the general approach to solving this is. I would solve it with either:

- Convert the angle into a unit vector and use that as the measurement input, then predict the actual vector and use its orientation for the compass (a sketch of this is below).

- Move the periodic window boundaries with a slow relaxation: if I hold my compass at angle 0, all angle data is transformed into the [-pi, pi) range; if I hold it towards pi, the raw data is transformed into the [0, 2pi) range.
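A rough sketch of the first option (hypothetical helper names, just to illustrate the idea): filter cos(theta) and sin(theta) as two ordinary measurements instead of theta itself, then recover the heading with atan2, which has no 2*pi discontinuity for the Gaussian assumption to trip over.

    import math

    def angle_to_vector(theta):
        # represent the heading as a point on the unit circle
        return math.cos(theta), math.sin(theta)

    def vector_to_angle(c, s):
        # atan2 handles the wrap-around; the result is in (-pi, pi]
        # (the filtered vector won't be exactly unit length, which is fine here)
        return math.atan2(s, c)

    # run your filter of choice on the two components separately, e.g.:
    #   c_est = filter_c.update(math.cos(z)); s_est = filter_s.update(math.sin(z))
    #   heading = vector_to_angle(c_est, s_est)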

TL;DR: Be careful when applying a Kalman filter to angles (or, more generally, to R/Z).


Perhaps this is a stupid question, but why is it called a _filter_? To me it just seems like a (very clever) linear projection.


As far as I know there is no deep reason, but it's traditional in the signal processing community, where a filter is a function that modifies a signal.

https://en.wikipedia.org/wiki/Filter_(signal_processing)

Kalman trained as an electrical engineer (MIT and Columbia) so that's the lingo he was used to. There's a brief note in his Wikipedia page about having trouble publishing the technique at first. I'd like to hear the story.

https://en.wikipedia.org/wiki/Rudolf_E._Kálmán


It is a filter and an adaptive one at that. To see it let's compare it with the simplest IIR:

x(k) = h * x(k-1) + (1-h) * u(k)

To use the same language as the blog post, at time k-1 our best estimate of the state is x(k-1). Here we don't know anything about the system dynamics (but if we do, we can incorporate it as an FIR as well), so our projection for time k is simply x(k-1). Our measurement at time k is u(k). By the same criterion of combination we should blend these two sources of information in appropriate ratios, controlled by h. Without an adaptive algorithm we appeal to our prior knowledge of the signal and noise statistics to fix h. The Kalman filter as presented in the blog, on the other hand, can estimate the covariance of the signal to determine the best h at any time, and therefore adapts to changes in the signal.
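Concretely, in the 1-D case the adaptive "h" at each step works out to (my notation, with P(k) the predicted estimate variance and R the measurement noise variance):

    h(k) = R / (P(k) + R)

so when the prediction is very uncertain (P large), h is small and the new measurement dominates; when the prediction is very certain, h approaches 1 and the measurement is mostly ignored.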


It's a part of a "signals and systems" discipline. Operators on/in signals and systems tend to be called filters.


This is great! Back in the '90s, I played with Kalman filters for a predictive navigation system. I had a couple of textbooks, but it was a bear to make sense of the math. Really wish I'd had this back then!

Nav applications are the ones you see most often; it would be interesting to see an example from a completely different domain.


There are plenty of applications in system control, when you want to estimate a variable within a system (sometimes, it's just too hard or expensive to set a sensor).

I've been close to applications that estimate the state-of-health and state-of-charge of lithium-ion batteries. I myself worked on volatility estimation of financial returns using a particle filter (very similar principle, but usable with non-additive and/or non-Gaussian noise).

Here's some work on applications in Failure Prognosis: http://icsl.gatech.edu/aa/images/e/ed/Pfafdfp.pdf


We use these in the nuclear data community. Measurements of interaction probabilities between neutrons and nuclides are really complex and uncertain, especially for a few reactions like inelastic scattering. Kalman filters help tie the nuclear models and experiments together.


Awesome! Kind of similar to the Viterbi algorithm, except that Kalman is on-line, while Viterbi works on the entire observed sequence at once, after it is fully known.


Not only is it possible to run the Viterbi algorithm on-line, there are probably half a dozen pieces of hardware currently running the Viterbi algorithm on-line within arm's reach of you: https://en.wikipedia.org/wiki/Viterbi_decoder


Conceptually they are very similar; as you mention, Kalman is generally online, and trying to estimate the current state, whereas Viterbi can factor 'future' information into its estimate as well.

Another big difference is that Viterbi works on a discrete state space (DNA base pairs, bits in a convolutional code, etc), while the Kalman filter works in a continuous space (e.g. position).

https://en.wikipedia.org/wiki/Recursive_Bayesian_estimation


Once you see the full sample you can run a Kalman smoother (after the filter). The main difference is that the filter extrapolates and the smoother interpolates.


In a past life, when I was mixing GPS, accelerometer, gyros, and tachometer sensors, Aided Navigation by Jay A Farrel (http://www.amazon.com/Aided-Navigation-High-Rate-Sensors/dp/...) was super handy.

Optimal State Estimation by Dan Simon helped, too: http://www.amazon.com/Optimal-State-Estimation-Nonlinear-App...


Out of curiosity, how much better, typically, is the 'optimal' blending rule than blending with some random or 'reasonable' weights? Never really thought to ask myself that question...


I still find it astonishing how good an abstraction matrices are for representing physical data. In CS terms, matrix notation has made the description of complex multidimensional systems scalable: instead of adding a term to every equation whenever a variable is added, you represent ALL the variables in one matrix and change that single matrix instead. This way of representing things also makes the relationships between the numbers easier to observe.

Little wonder that beginner physicists spend so much time mastering matrices and linear algebra.


This is a beautiful, well written, and clear explanation. The web needs much more of this, please!



This is the clearest description I've ever seen of a Kalman filter.


As a mechatronics engineer, I didn't understand Kalman filters all that well.




