
Derive Yourself a Kalman Filter - NougatRillettes
https://ngr.yt/blog/2019-04-10-kalman.html
======
Waterluvian
"It's really quite simple. Here, let me show you all these formulas..."

This reminds me of last week. I bought a house and want to get into
woodworking so I looked up intro videos on YouTube. "It's easy. Just follow me
over to this table saw and router and planer and all these other tools you
don't own."

I know I'm not being fair. But a recurring frustration I have is when experts
claim it's easy or simple or for beginners and then talk right over you. They
don't mean to, but it can be insulting and demoralizing. "well if it's for
beginners and I don't know what these glyphs mean, the problem must be me."

So back to the topic. Am I wrong or could you begin with, "a kalman filter is
a way to get a guesstimate of a value from different sources where the trust
in each source can vary."

~~~
gjm11
You could, but you should follow it up with "where one of those sources is
your (incomplete, noisy) information about what the value _used_ to be and how
it changes over time".

If you just have (say) three different ways of estimating the position of an
aeroplane, then you don't need a Kalman filter. The place where the Kalman
filter adds value is where you measure its position _repeatedly_ and make use
of the fact that it's known to be moving approximately in a straight line.

~~~
bigred100
How do EKF and ensemble Kalman Filters fit into this?

~~~
gjm11
The Kalman filter assumes that all errors are gaussian, that all updates are
linear (i.e., new state = linear function of old state), and that observations
are linear (i.e., what you measure is a noisy version of a linear function of
the state).
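Under those assumptions the whole filter is just two small matrix updates. A
minimal numpy sketch (tracking 1D position with a constant-velocity model;
every number here is invented for illustration, not taken from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])   # new state = F @ old state (linear)
H = np.array([[1.0, 0.0]])              # we observe position only (linear)
Q = 0.01 * np.eye(2)                    # process noise covariance
R = np.array([[1.0]])                   # measurement noise covariance

x = np.array([0.0, 1.0])                # state estimate: [position, velocity]
P = np.eye(2)                           # covariance of that estimate

estimates = []
for t in range(1, 21):
    true_pos = float(t)                 # the object really moves at 1 m/s
    z = true_pos + rng.normal(0.0, 1.0) # noisy position measurement
    # Predict: push the estimate through the linear motion model.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update: blend the prediction with the measurement via the Kalman gain.
    y = z - (H @ x)[0]                  # innovation (scalar here)
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain (2x1)
    x = x + K[:, 0] * y
    P = (np.eye(2) - K @ H) @ P
    estimates.append(x[0])
```

Because the filter exploits the motion model, the posterior position variance
P[0, 0] ends up well below the raw measurement variance R.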

The _extended Kalman filter_ allows for state updates and observations to be
nonlinear, by the straightforward expedient of replacing them with linear
approximations near to the current estimate.
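In code, "replace with a linear approximation" just means using a Jacobian
where the plain KF uses a fixed matrix. A scalar EKF sketch with a made-up
nonlinear observation (we measure the square root of the state; the motion
model is kept linear so only the observation needs linearizing):

```python
import numpy as np

rng = np.random.default_rng(1)

def h(x):                     # nonlinear observation: we measure sqrt(state)
    return np.sqrt(x)

def h_jac(x):                 # derivative of h, evaluated at the estimate
    return 1.0 / (2.0 * np.sqrt(x))

q, r = 0.01, 0.05             # process / measurement noise variances
x_est, p = 5.0, 1.0           # state estimate and its variance
true_x = 5.0
for _ in range(50):
    true_x += 1.0                               # true state grows by 1
    z = h(true_x) + rng.normal(0.0, np.sqrt(r))
    # Predict (the motion model itself is linear here: x -> x + 1).
    x_est += 1.0
    p += q
    # Update: replace h by its linear approximation around the estimate.
    Hj = h_jac(x_est)
    S = Hj * p * Hj + r
    K = p * Hj / S
    x_est += K * (z - h(x_est))
    p *= (1.0 - K * Hj)
```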

The _ensemble Kalman filter_ also allows for nonlinear updates and
observations; instead of keeping track of the expectation and covariance of
the state (i.e., everything you need to define a Gaussian model, things that
behave nicely under linear transformations but _not_ under nonlinear ones) it
keeps track of an "ensemble" of sample values, applies the update and
observation functions to those, and then estimates expectations and
covariances from this ensemble. It's a sort of Monte Carlo Kalman filter.
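A sketch of that idea (a basic stochastic EnKF with a scalar state; the
dynamics, observation function, and all noise levels are made up just to show
the mechanics):

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):                                # nonlinear state update (made up)
    return x + 0.5 * np.sin(x)

def h(x):                                # nonlinear observation (made up)
    return x ** 3

r = 0.5                                  # measurement noise variance
n = 500                                  # ensemble size
ensemble = rng.normal(2.0, 0.5, n)       # samples representing our belief
true_x = 2.0
for _ in range(20):
    true_x = f(true_x)
    z = h(true_x) + rng.normal(0.0, np.sqrt(r))
    # Forecast: push every sample through the nonlinear model.
    ensemble = f(ensemble) + rng.normal(0.0, 0.05, n)
    # Analysis: estimate the needed covariances from the ensemble itself.
    hx = h(ensemble)
    cov_xy = np.cov(ensemble, hx)[0, 1]  # sample cov(state, predicted obs)
    var_y = np.var(hx, ddof=1) + r
    gain = cov_xy / var_y                # "Kalman gain" from sample stats
    # Update each member against a perturbed copy of the observation.
    ensemble = ensemble + gain * (z + rng.normal(0.0, np.sqrt(r), n) - hx)

estimate = ensemble.mean()
```

Note that the nonlinear f and h are only ever applied pointwise to samples;
no Jacobians are needed anywhere.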

A couple of other things worth knowing about:

Intermediate between the extended KF and the ensemble KF is the "unscented
Kalman filter". Like the ensemble KF it estimates things using a number of
samples; but instead of propagating those samples through repeated steps, it
picks the sample points at each step on the basis of the estimated expectation
and covariance, and uses them only to compute new expectations and
covariances. More expensive than the EKF but copes better with substantial
nonlinearities.
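The heart of the UKF is the unscented transform: the step that propagates a
mean and covariance through a nonlinearity via a handful of deterministically
chosen sigma points. A scalar sketch (kappa=2 and the test numbers are my own
choices for illustration), compared against naive linearization:

```python
import numpy as np

def unscented_transform(mean, var, f, kappa=2.0):
    """Scalar unscented transform: pick sigma points from (mean, var),
    push them through f, and recover the transformed mean and variance."""
    n = 1
    spread = np.sqrt((n + kappa) * var)
    points = np.array([mean, mean + spread, mean - spread])
    weights = np.array([kappa / (n + kappa),
                        0.5 / (n + kappa),
                        0.5 / (n + kappa)])
    y = f(points)
    new_mean = weights @ y
    new_var = weights @ (y - new_mean) ** 2
    return new_mean, new_var

# For x ~ N(1, 0.5) and f(x) = x^2, the exact mean is mu^2 + var = 1.5.
m_ut, v_ut = unscented_transform(1.0, 0.5, lambda x: x ** 2)
# Naive EKF-style linearization around the mean would predict f(1) = 1.0,
# missing the curvature's contribution entirely.
```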

Extrapolating beyond the ensemble KF is the "particle filter", which uses the
same "track an ensemble of samples" approach but gives up the assumption that
all the errors are Gaussian. You need a larger ensemble to get good results, I
think, but it can cope with a wider range of scenarios. (I find the name
"particle filter" annoyingly distracting; the "particles" are the samples,
which I guess you're supposed to think of as a cloud of points in possible-
configuration-of-the-system space, and of course it's a "filter" in the same
way as the Kalman filter is.)
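A minimal bootstrap particle filter sketch of that "track an ensemble of
samples" approach; note the prior is uniform and the measurement noise is
Laplace, i.e. nothing is Gaussian (all the concrete numbers here are
invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

n = 2000
particles = rng.uniform(-5.0, 5.0, n)    # no Gaussian assumption on the prior
weights = np.ones(n) / n

def likelihood(z, x):
    # Non-Gaussian measurement noise: Laplace (heavier tails than Gaussian).
    return 0.5 * np.exp(-np.abs(z - x))

true_x = 1.0
for _ in range(30):
    true_x += 0.1                                 # true state drifts slowly
    z = true_x + rng.laplace(0.0, 1.0)
    # Propagate each particle through the motion model (plus process noise).
    particles = particles + 0.1 + rng.normal(0.0, 0.1, n)
    # Reweight particles by how well they explain the new measurement.
    weights = weights * likelihood(z, particles)
    weights = weights / weights.sum()
    # Resample to avoid weight degeneracy.
    idx = rng.choice(n, size=n, p=weights)
    particles = particles[idx]
    weights = np.ones(n) / n

estimate = particles.mean()
```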

------
heinrichhartman
If you enjoyed this, you might also like this article from my blog:

[http://heinrichhartmann.com/blog/2014/12/11/Generative-Models-for-Time-Series.html](http://heinrichhartmann.com/blog/2014/12/11/Generative-Models-for-Time-Series.html)

This is a study of generating time series/stochastic processes. The estimators
for the parameters lead straight up to Kalman filters. The state space models
are taken from the Kalman setup. This was the first time I understood how
Kalman filters come about. It's really a three-step process:

1. Stationary processes --> Classical parameter estimation.

2. Discrete state space --> Markov models.

3. Continuous state space --> Kalman filters.

------
xchip
Make sure to read this bit to make the formal explanation easier to
understand.

[https://aguaviva.github.io/KalmanFilter/KalmanFilter.html](https://aguaviva.github.io/KalmanFilter/KalmanFilter.html)

------
paillou
This one works great for me: fewer formulas and more drawings :)

[https://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures/](https://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures/)

------
richajak
A few months ago I tried to use a Kalman filter as one of the filters to
stabilize the GPS readings from mobile phones, which are degraded by various
factors: loss of signal, jumpy stationary coordinates, fake GPS, etc. I found
that the Kalman filter was not giving me satisfactory results, even after
adjusting its parameters many times without further improvement. In the end I
just needed to write my own simple filter, without fancy math filters
(Kalman/etc.), since I can distinguish the speed of a car/motorbike/pedestrian,
the distance of a jumpy signal (less than 20 meters in four directions), the
accuracy of the GPS signal in the city/suburb, etc.

------
bonoboTP
I think there is no such thing as a conditional random variable. There are
conditional probability distributions, but I've never heard of A|B itself
being a random variable and I think it doesn't fit the definition. A random
variable (I'm not a mathematician) is a function that takes an elementary
event and returns the value of the variable.

You may say this is pedantry, but I think it's important to keep track of what
is what, especially when being a beginner. You can afford to be sloppy once
you're more advanced.

~~~
kgwgk
[http://www.maths.qmul.ac.uk/~pettit/MTH5122/notes15.pdf](http://www.maths.qmul.ac.uk/~pettit/MTH5122/notes15.pdf)

~~~
bonoboTP
Hm, I googled it now, the issue is discussed in detail here:
[https://math.stackexchange.com/questions/612468/how-to-formalize-conditional-random-variables](https://math.stackexchange.com/questions/612468/how-to-formalize-conditional-random-variables)

~~~
NougatRillettes
Thanks for your comments! I don't see how what is discussed here conflicts
with the notation I introduced in the post; do you still believe there is a
soundness issue in what I have written?

~~~
bonoboTP
I'm just saying that notation like A * X|Y * B seemed unfamiliar to me. I
only know conditional notation within a P(...), or an expectation, etc.
Apparently your way of writing is used by others as well, but it may be good
to know that it is not fully rigorous.

Again, there are different people preferring different presentations. I as a
student was often frustrated by abused notations and was often confused by
such things when trying to understand something in detail. For a more cursory
and "practical" understanding it could be good enough.

~~~
kgwgk
> it may be good to know that it is not fully rigorous

What is the problem with A|B=b being a random variable? (Apart from your
unfamiliarity with the concept, I mean.)

Edit: I don’t say there are no problems; I’m asking what you think the
problem is. There is no problem in the discrete case. In the continuous
setting things are indeed more complicated (but if the limiting process is
well defined there are no issues).

Note that the same lack of rigour that you find in conditional random
variables affects conditional probabilities. If you can accept the latter,
there is no reason to reject the former.

~~~
bonoboTP
A random variable is a different concept from a distribution. For me personally
it is helpful to keep them separate, but I can see that others may not care
about the complete conceptual picture.

In the PDF file linked above I can see conditional probabilities, conditional
distributions and conditional expectation etc, which are all valid and
rigorous. I can see that the author thinks it's a good idea to merge these
into a single concept of conditional random variable for didactic reasons, but
that's not a rigorous concept.

Practically, if you have two random variables then you can take their joint
distribution. What would be the joint distribution of (A|B) and (C|D)? For
actual random variables it's simple: you can take intersections in event
space, but a "conditional random variable" does not correspond to any subset
of the event space.

Very simply speaking (this is my working model, not the exact precise math
definition which involves a lot of measure theory): in probability theory we
have an event space containing atomic events that cover all possible outcomes
for _the whole experiment /observation_. A random variable is a function that
maps from each such potential (atomic) event to a number. That's right. The
random variable is a function but not the mass function, which maps from a
number to a probability.

Conditional probability P(A|B) is an expression defined to mean P(A,B)/P(B).
That's a clear definition. I have yet to see an actual definition of a
conditional random variable.

Again, disclaimer 1: I can see the practicality of disregarding formality.
Still I argue this is best done only when you do know better but it would be
tedious to be technically correct all the time. But as a beginner I find it
more useful to keep track of the correct concepts. For example not
distinguishing random variables and distributions can be very confusing when
considering more advanced things, like mutual information and KL-divergence.
The former operates on random variables, the latter on distributions. I
remember this was a difficult realization for me because the material we used
didn't emphasize the difference enough, probably in the name of practicality.

Disclaimer 2: my point is a minor one overall.

~~~
srean
I think it will help if you think in terms of conditioning on (for example, a
coarser sigma algebra). You would get another random variable that is
measurable on the sigma algebra you conditioned on. If that is coarser so
would be the new function you obtained by conditioning.

~~~
bonoboTP
Let's talk about a fair die roll to make it concrete: let the rolled number
be X and let E be the event that we rolled an even number. P(X=6|E) = 1/3.
P(X|E) is a distribution where 1, 3, and 5 have 0 probability mass and 2, 4,
and 6 have 1/3 each.

If we consider X|E as a random variable, what is its value if we roll an odd
number? Undefined? What does that mean? Random variables always have some
value.
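The distinction is easy to see by brute-force enumeration. A quick check of
the die example, treating the random variable X as an honest function on the
six-element sample space:

```python
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]                # sample space of a fair die
X = {w: w for w in omega}                 # the r.v. X: outcome -> number
E = {w for w in omega if w % 2 == 0}      # the event "rolled an even number"

# Conditional probability is a defined ratio: P(X=6 | E) = P({6} & E) / P(E).
p_E = Fraction(len(E), len(omega))
p_6_and_E = Fraction(len([w for w in E if X[w] == 6]), len(omega))
p_cond = p_6_and_E / p_E                  # -> Fraction(1, 3)

# But X itself assigns a value to EVERY outcome, odd ones included:
x_at_3 = X[3]                             # -> 3
```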

Sure you can build a new event space (sigma algebra) but then you can't use
random variables over the original one.

Let's consider two independent rolls, X and Y. You can't compute the joint
distribution P(Y, (X|E)); it just doesn't make sense, as the two "variables"
are defined over different spaces. Note that this is not the same as P(X,Y |
E). The latter is simply a conditional _probability_, without any concept of
"conditional _random variables_".

Again, this is totally obvious to people who have experience with
probabilities, but could be confusing to students. Such cases are where
students who try to understand the details may be left more confused than
students who just want to get the main idea.

~~~
srean
Sure you can. The TL;DR would be "piecewise-constant projection".

I think picking up a standard graduate probability book will clear this up
better than any long comment trail. There is no problem with defining a
coarser sigma algebra from the original one and then defining a function
measurable on the new sigma algebra. Note that this continues to be an r.v.
in the original space, as measurability is preserved. A consistent definition
of the values of the conditioned r.v. would be the piecewise-constant
approximation of the original r.v. over the indivisible elements of the
coarser sigma algebra.

Let me try another route.

You seem to be accepting of conditional expectation. Now, what is a
conditional expectation if not a function? All we need is for that function
to be measurable with respect to the new sigma algebra, and that's ensured by
construction. Hope that helps some.
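The same die makes this concrete: conditioning X on the parity partition
yields a new function on the original six outcomes, constant on each block of
the partition, so it is itself a random variable on the original space. A
tiny sketch:

```python
omega = [1, 2, 3, 4, 5, 6]
odd = [w for w in omega if w % 2 == 1]
even = [w for w in omega if w % 2 == 0]

# E[X | parity] is piecewise constant on the blocks {1,3,5} and {2,4,6}.
cond_exp = {}
for w in omega:
    block = even if w % 2 == 0 else odd
    cond_exp[w] = sum(block) / len(block)   # average of X over w's block

# cond_exp maps 1,3,5 -> 3.0 and 2,4,6 -> 4.0: a bona fide function on the
# ORIGINAL sample space, measurable w.r.t. the parity sigma algebra.
tower = sum(cond_exp.values()) / len(omega)  # E[E[X|parity]] = 3.5 = E[X]
```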

~~~
bonoboTP
> I think picking up a standard graduate probability book will clear this up
> better than any long comment trail.

Can you recommend one? I just picked up _Probability and Measure_ by
Billingsley and it does not mention "conditional random variable" a single
time in over 600 pages. It does have a lot of "conditional probability",
"conditional distribution", "conditional expectation" etc.

> You seem to be accepting of a conditional expectation.

Conditional expectation is defined in terms of conditional _probabilities_ ,
and those are in turn explicitly defined as P(A|B)=P(A,B)/P(B), so there's
nothing not to accept.

~~~
srean
Billingsley is pretty darn good. It might have left the connection as a dotted
line, given that the notion is no different from conditional expectation. The
only connection you have to make is that conditional expectation is a function
and a random variable. You must have seen an expectation taken of a
conditional expectation. That should convince you that conditional expectation
is indeed a random variable. Since that r.v. was obtained by conditioning,
it's not a stretch to call it a conditioned r.v.

Any book that explains conditioning over a sigma algebra should suffice. You
could try Loève, Dudley, or Neveu, but I don't remember if it's mentioned
explicitly.

BTW, conditional expectation is really more fundamental than conditional
probability. It's the former that yields the latter in measure-theoretic
probability. If you want to drink from the source, that would be Kolmogorov.

Finally, if you are reading Billingsley, you are adequately qualified to call
yourself a mathematician.

~~~
bonoboTP
It's getting a little tedious. Please show me a concrete citation of a serious
textbook (not a tutorial/handout by a grad student or a paper by a random
researcher) that puts the three words "conditional random variable" next to
each other (consistently, not simply as a one-off potential mistake). Google
doesn't show serious sources for it.

While I agree with isolated points of your comment I think it doesn't add up
to a useful/coherent concept of conditional random variable.

~~~
srean
That's a little too much to ask; perhaps if they were grep'able I could have
obliged, but unfortunately I don't have a photographic memory.

More concretely, it's just another name for conditional expectation. I am
assuming you are aware that conditional expectation is a random variable
obtained via conditioning (equivalently, as a piecewise approximation in L_2).
If you aren't familiar with that viewpoint, that would be the place to start.
Kolmogorov, Neveu, Dudley, and Billingsley all cover it.

~~~
bonoboTP
> I am assuming you are aware that conditional expectation is a random
> variable

That's not what we're considering here, but things of the form X|Y=y for a
concrete y. Even as E[X|Y=y], that's not a function, y is specified. Do you
agree we shouldn't call X|Y=y a conditional random variable?

~~~
srean
Oh, absolutely: for a specific y it's not a function (or a random variable).
One usually thinks of Y as a variable and not a constant.

~~~
kgwgk
The expectation E[X|Y=y] is a fixed value. (Edit: it’s the expectation of the
random variable “X|Y=y”, while E[X|Y] is a random variable because it’s a
function of the random variable Y: for each element in the sample space there
is a corresponding value of “y” and in turn there is a value of the
expectation E[X|Y=y].)

X|Y=y (as used in the blog post being discussed) is a random variable: it’s a
function from a subset of the original sample space (corresponding to the
elements for which the value of the random variable Y is y) to real values (or
whatever the image of the X random variable is).

~~~
srean
Yes, you are right. I had messed up in the comment above. It continues to be
a function on the restriction Y=y.

------
a_imho
In my very ignorant world view Kalman filters make some prediction from some
input. As do various ML techniques. As do various statistical models and
techniques. What are some good sources that show the connection between all
these techniques and help me pick the right one for specific use cases?

For example, I want to predict a time series, let's say the number of visitors
to a site. I know some characteristics of the series (periodic, seasonal), but
how should I go about it?

~~~
bonoboTP
> In my very ignorant world view Kalman filters make some prediction from some
> input.

The word "filter" indicates that it turns one sequence (usually a time series
of measurements) into another sequence (an estimate for underlying states).
You can think of it as denoising the sequence of measurement by using
knowledge about how the underlying system behaves. For example, if our GPS
measurement says we suddenly jumped 100 meters compared to a second ago, we
can weigh this against our prior knowledge that a car (the underlying system)
is not likely to make such a sudden position change.

The Kalman filter weighs the incoming measurement against what our model
would predict. Both the prediction and the measurement are probabilistic, and
they are weighted according to their uncertainties: the more certain source of
information is weighted more heavily.
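That weighting is easiest to see in the scalar case, where the Kalman update
reduces to a precision-weighted blend. A sketch (the GPS numbers are made up
to mimic the jumpy-signal situation described above):

```python
def fuse(pred_mean, pred_var, meas_mean, meas_var):
    """Blend a prediction and a measurement, each with its own uncertainty.
    This is exactly the scalar Kalman update step."""
    gain = pred_var / (pred_var + meas_var)  # how much to trust the measurement
    mean = pred_mean + gain * (meas_mean - pred_mean)
    var = (1.0 - gain) * pred_var            # fused estimate is more certain
    return mean, var

# A confident motion-model prediction vs. a wildly jumpy GPS fix:
# the fused estimate barely moves toward the outlier.
m, v = fuse(pred_mean=10.0, pred_var=0.5, meas_mean=110.0, meas_var=50.0)
# gain = 0.5 / 50.5, so m stays near 11 rather than jumping toward 110.
```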

> For example, I want to predict a time series let's say the number of
> visitors of a site. I know some characteristics of the series (periodic,
> seasonal), but how should I go about it?

That's not what the Kalman filter is for, as you are not trying to denoise a
sequence of noisy measurements.

