Hacker News new | past | comments | ask | show | jobs | submit login
How a Kalman filter works, in pictures (2015) (bzarg.com)
390 points by sytelus on May 21, 2018 | hide | past | favorite | 34 comments

Kalman filters are wonderful things and great for keeping track of where other things are in relation to yourself. But for localizing my robot I'd generally use a Particle Filter[1] instead to allow me to represent multiple hypothesis about where I am. For example if I see a doorway I know I'm going to be by one of the doorways on the corridor but I won't be able to tell which one without further observation. But if there's a source of absolute but imprecise location, like GPS, then a Kalman filter is a good choice again.


Couldn't you fuse this extra fact into a kalman filter?

If you mean your GPS readings then yes you can and should. If you mean having multiple predictions of where you are then no, you would have to either run multiple Kalman filters explicitly or change things so much that it's no longer a Kalman filter.

This is my favourite explanation of how a Kalman filter works


Agree. I like how it starts "if you had 2 sensors, here is how you could combine the measurements". Then goes on to say what if you only had 1 sensor, but you know the system evolves in a linear way, you can use that to make a prediction and treat it as the input from a second sensor.

That finally made me understand it better, and it's much shorter than most explanations without losing necessary details.

Also even though I took both ML and probability and statistics courses saying "it's just Bayesian inference to update the posterior" makes it more confusing for me.

> "it's just Bayesian inference to update the posterior" makes it more confusing for me.

I do not know the jargon, but perhaps that is because this sentence does not give you any context for what "the posterior" is referring to?

Here’s a good HN discussion of that article:


HN comments were quite informative too.

The main comment there I found made it more complicated for me. That is, saying "it's simple because it is like a bayesian thing". Sometimes raising the level of abstraction helps, but it didn't help me here because now I had to remember all the statistics stuff about bayesian inference.

Also check out Sebastian Thrun's videos from the "AI for robotics" Udacity course (2012). He explains with similar examples:





Needs a (2015), plus, recent discussion:

https://news.ycombinator.com/item?id=13449229 (213 points, Jan 21 2017, 32 comments)

It looks very similar to Bayesian estimation. I wonder what is really the difference. Does any of you know, or have a link that explains it?

Yes it is a special case of Bayesian inference under the assumptions that the system is linear and that the priors and likelihoods are Gaussian.

I should have added that there's a very good reason for making these assumptions: they let you represent the system with extremely little data! In general for a system with a continuous state space (like position) then to represent a general Bayesian prior you need to store one data point for each of the infinitely many possible states. Obviously this is impossible so you might instead discretize the space into a grid and store the probability that it is at each that point in the grid. (Think of an image where each pixel's brightness tells you how likely you are to be there - you can store any shape of probability distributions this way but it's of limited resolution). Same for the likelihood and posterior distribution - this uses a ton of memory and is slow and limits your resolution.

Instead, if you assume that the priors are Gaussian, then you can store that information as just two numbers: the mean and the variance (or a matrix of numbers for higher dimensional state spaces). And Guassian's have the remarkable property that if you start with Gaussians and perform a Bayesian update, then you end up with a Gaussian that can also be represented with this same amount of data. Assuming that the system is linear means that you can also represent it's update from one time to the next as a matrix. Moreover, both of these approximations are pretty good for a large class of real-world problems.

There are other sophisticated ways to get around the downsides of the general approach. Namely, there's particle filters which discretize your distributions by a sample of points, but unlike the grid discretization above, the points aren't at specific fixed locations. They're allowed to move around and are constantly being resampled from the distribution you have. This allows lots of the points to get very close together and accurately represent the most interesting (most likely) parts of the state space without wasting tons of memory on extremely unlikely points in the state space. It's very clever and fun to watch in practice!

There is really no difference. You can frame the Kalman filter as a Bayesian posterior inference problem.

For example, for a stationary linear Gaussian model, you have a transition model of the form: z_t = Az_{t-1} + Bu_t + e where e ~ Gaussian(0,Q) and an observation model of the form: x_t = Cz_{t} + Du_t + d, where, d ~ Gaussian (0,R)

Since, z_t and x_t are both multivariate gaussians in this model, you can compute the posterior distribution on z_t's, which will also be a Gaussian. That is basically the Kalman filter.

As the writeup mentions, you might choose a non-Gaussian noise model, in which case the posterior distribution is not a Gaussian and then you employ something like a unscented Kalman filter or extended Kalman filter.

In the event of Gsussian noise, linear state transition and linear measurement, KF yields the exact posterior. Both unscented and extended refer to non-linear measurement or state transition equations. Neither are designed for non gaussian noise.

Though the KF and it's variants are one of the simplest, well-performing estimation methods out there, so it wouldn't suprise me if it's used for everything, appropriate or not.

As all the other replies have mentioned, it is the same thing if you assume linear dynamics and Gaussian noise in the KF. Besides non-linearity and non-Gaussianity, there are a few other extensions that make super useful - one is that you can add an external obervable/controllable control signal. Thinking of this as an "independent" variable that you can observe or manipulate to get your desired observed value of y. This will take you into the realm of control systems engineering. Also, it's trivial (computationally) to let the dynamics be a changing function of time (i.e. instead of the fixed matrix A, let it be A(t)) - this allows you to model some pretty sophisticated dynamical systems.

I wrote this [http://www.tina-vision.net/docs/memos/1996-002.pdf] many years ago which derives the Kalman filter from a state space perspective and then as chi-squared merit function. Certainly helped me as an engineer appreciate the statistical underpinnings.

That's exactly where it came from.


^ A very simple Python implementation of a Kalman Filter with Matplotlib for visualisation.

Here’s a C++/Eigen example implementation of something very similar: https://github.com/google/detectorgraph/blob/master/examples...

With autogenerated block diagram and description here: https://google.github.io/detectorgraph/docs/html/robotlocali...

Hm, this is useful to know...there are a lot of conference posters and publications about using Kalman filters to interpret neural signals, particularly from Utah microarrays for the Gee-whiz brain-machine interfaces out of the BrainGate consortium. Not having an a background in EE, I was always fuzzy as to how it worked.

Nice pics for exam studies.

But in industry Worse Is Better and Kalman Filters is ofent just the optimal way to remove noise you yourself added in your simulation.

I've never seen any implementation use of Kalman filters where the covariance matrix is actually sound ... and I usually go with just lowpassfilters or moving averages.

It's industry standard in aerospace, granted one usually uses the EKF with nonlinear dynamics so the covariance matrix is not estimated perfectly. That setup is also flexible enough to let you introduce of band measurements from the past but with the correct timestamp and correct.

It's kind of interesting that how particular fields chose to embrace certain particular methods; till I have worked with aerospace domain,I never had heard of the Kalman filter, whereas some of the newer and even more fundamental methods like wavelets and compressed sensing appear to have such low traction in Aerospace till now.

You probably know it by some other name. It is so classical its unlikely that one hasn't run into it in some form or the other. If you assume a hidden Markov model with Gaussian state transition and Gaussian output and work out the recursive update equation KF is what you will get. It might not have been call a Kalman filter. May be Weiner filter will ring a bell, under certain assumptions they become the same thing. If not Weiner filter, recursive least squares would surely ring a bell.

Kalman filters were invented for moon lander navigation in the Apollo program, so they had taken firm root especially in the days of limited data processing capability.

A while ago (around 2 or so years) I did quite a bit of mechatronics and Kalman Filters were used resonable effectively for data fusion in IMUs. I wouldn't be surprised if there were better methods around but all the standard chipsets used them at that time.

EDIT: I forgot but I've seen them used more recently by CERN for particle trajectory prediction as part of their hit tracking system. I only know this since they wrote about it in some of their supporting documentation for their latest kaggle comp [0].

[0]: https://www.kaggle.com/c/trackml-particle-identification

We do control stuff in marine radar and target tracking in noisy environments. We use EKF and UKF all the time, and in our cases the covariance matrix is definitely sound, useful, and provides information that we use.

Ye well maybe it's a domain thing. I do driveline programming for cars.

As soVeryTired wrote "In general I find Kalman filters a little suspect where the underlying dynamics of the system can change over time, and when the physical process that gives rise to the dynamics isn't known."

This maybe applies in a less degree to radars etc?

In general we use only one sensor source at a time for eg. rpm rather then fusing them. If one is obvously off just select another. Maybe it's rather about how exact you need the estimate to be.

Also chiming in - I do control stuff in aerospace, Kalman Filters are the industry standard, have been so for decades, verging on low-tech now. Lots of interest in online particle filters in my little niche currently.

What you're saying is definitely appropriate for finance. In general I find Kalman filters a little suspect where the underlying dynamics of the system can change over time, and when the physical process that gives rise to the dynamics isn't known.

I've used them in radar with an actual covariance matrix. I know some people use them in robotics but I've always relied on Particle Filters. And a woman I've been dating has used them in her econometrics work and I'm pretty sure she used an actual covariance matrix given the academic context.

Still waiting for the EKF blog post

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact