Hacker News new | past | comments | ask | show | jobs | submit login
Kalman filters and functional programming (johndcook.com)
215 points by edwintorok on Jan 31, 2017 | hide | past | web | favorite | 34 comments

Kalman filters are important in advanced control systems. One of the things the Apollo computer system did was run Kalman filtering on trajectory data, in order to continuously verify its course.



The first paper linked (page 5 of http://vixra.org/pdf/1606.0328v1.pdf) makes a cute connection that was new to me.

It starts with the familiar formula for on-line computation of the sample mean (e.g., https://en.wikipedia.org/wiki/Algorithms_for_calculating_var...) --

  mu_n <- mu_n-1 + (1/n) (z_n - mu_n-1)
This corrects the mean "mu" based on the new observation "z_n".

The paper then uses this simple expression to motivate the Kalman filter update equation, in which a new observation z_n is used to compute an innovation (or error signal) y:

  y = z_n - H x_n-1
The innovation is then used to update the state-estimate x,

  x_n = x_n-1 + K y
The two updates are formally identical, with the Kalman state "x" being analogous to the running-mean "mu". I had never noticed the connection. There is a similar connection to the on-line variance.

Great. I have another form of mathematics that I need to learn. I'll put it on the pile on top of geometrical algebra and stochastic processes.

-- Kalman Filters vs HMM (Hidden Markov Model):

"In both models, there's an unobserved state that changes over time according to relatively simple rules, and you get indirect information about that state every so often. In Kalman filters, you assume the unobserved state is Gaussian-ish and it moves continuously according to linear-ish dynamics (depending on which flavor of Kalman filter is being used). In HMMs, you assume the hidden state is one of a few classes, and the movement among these states uses a discrete Markov chain. In my experience, the algorithms are often pretty different for these two cases, but the underlying idea is very similar." - THISISDAVE


"Some state-of-the-art industrial speech recognition [0] is transitioning from HMM-DNN systems to "CTC" (connectionist temporal classification), i.e., basically LSTMs. Kaldi is working on "nnet3" which moves to CTC, as well. Speech was one of the places where HMMs were _huge_, so that's kind of a big deal." -PRACCU

"HMMs are only a small subset of generative models that offers quite little expressiveness in exchange for efficient learning and inference." - NEXTOS

"IMO, anything that be done with an HMM can now be done with an RNN. The only advantage that an HMM might have is that training it might be faster using cheaper computational resources. But if you have the $$$ to get yourself a GPU or two, this computational advantage disappears for HMMs." - SHERJILOZAIR

Sweet. What are the names you mentioned in capital after the quotes? Are they HN usernames?

Sometimes I wish I could quit my job for a few years and just catch up on all of the math I'd like to know. There is an amazing amount of stuff to learn that I just have no clue about and ultimately is much more interesting than yet another HR tool.

My advice is to just start a side programming project where you have to learn the concept. I think it's a much, much faster way of learning, to have to implement the technique, than to go through textbooks and exercises. I don't think most people who aren't career mathematicians will truly understand linear algebra until they write a few Matrix and Vector libraries in support of some other thing they are deeply interested in building.

Seconded. The slightly better option, if you can get away with it, is working it into a project at work (rather than a side project) so you can devote a few of your fresh-brained 9-5 hours to it each week and not feel too guilty.

You could just work for a couple hours per day on it, or 10-20 hours per week on it.

Smaller opportunity cost missed and more likely to complete the goal than just taking a sabbatical.

I've done this, learning advanced orbital mechanics on my spare time. But I manage to put in 1-3 hours a week, about one week per month at most and it's been almost half a year since the last time I was able to put my mind into it.

It's not like I don't have the time, it's the fact that I'm way too tired after work to do any substantial amount of learning. I think I could manage something that's much easier, say learning some languages or learning how to knit. But learning math, physics, etc just requires more than I have to give after work.

That said, I managed to get a firm understanding of orbital mechanics up to a level where I could probably ace all the tests for freshman-level courses at an aerospace university and I did all of it on Saturday afternoons (but it took me 5 years to do so).

I have carved out 1 hour in my morning before work to do things that I consider important. It's not quite enough time. I'm going to try to improve it to 1.5 hours. But either way, it means the best of my brain goes to me. Sometimes I give it to my employer, but not always. Most likely, you don't need the best of your brainpower to do your day-to-day work.

This is a great technique. I have a 2.5 hour morning routine, I do before I head to the office.

You'll often hear about famous writers, hollywood-writers, etc. doing this type of technique of doing their best creative work before the chaos of the day sets in.

> I've done this, learning advanced orbital mechanics on my spare time.

Sooo...... you play KSP? ;)

Yes, I play KSP and Orbiter.

In my study project, I learned thoroughly how to solve different two body problems (initial value, boundary value, closest approach) and wrote a C library [0] that's got all the orbital mechanics code you need to write your own Kerbal Space Program clone (but I don't intend to do that). During the process, I read 3-4 books and an inch thick pile of research papers.

[0] https://github.com/rikusalminen/twobody

If you'd like to start looking at three-body problems, you might be interested in studying the Kozai-Lidov mechanism. It's a really fascinating subject, so much so that I did my dissertation on it! Here's a really good paper (one of my all time favorites) to get you started: https://arxiv.org/abs/1309.5244

FWIW my first encounter with a Bayesian explanation of Kalman filters was this series: https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Pyt...

Kalman filters (unless you get into really esoteric modifications) are just one tool and are related to stochastic processes. It's more like learning an algorithm than learning a whole field of math.

Also, the wikipedia article for it is possibly the best-written math article I've read: https://en.wikipedia.org/wiki/Kalman_filter

This looks really interesting. The idea of splitting the accumulator from the data seems a clear win. Would love to see evidence confirming it works.

Brian Beckman works for Amazon Prime Air (drone delivery service)[1]. I submit the video "First Prime Air Delivery"[2] as circumstantial evidence that it does indeed work.

[1] https://twitter.com/lorentzframe [2] https://www.amazon.com/Amazon-Prime-Air/b?ie=UTF8&node=80377...

I'm not sure I follow. Would you let me submit as evidence an old piece of software that is littered with GOTOs all over the place as evidence that that is a great way to write software?

I fully believe this can work. Especially with a talented practitioner. However, the question is if it will lead to more maintainable (e.g., by needing fewer iterations?) and faster (e.g., by lots of people using it) output with fewer bugs (e.g., by surviving the lindy effect).

That is, my question is entirely empirical and will hinge on repeated success. Not initial success. I am personally hopeful for it, but I have reasons not to trust my hopes.

I wish the various papers linked in that page used a different language than Wolfram Alpha to illustrate the various concepts, especially since that language doesn't support even basic functional constructs such as pattern matching.

I am very confused by this comment. While I do find the Wolfram language difficult to read and generally obfuscated, it definitely has a ton of advanced functional constructs. If anything, simple pattern matching is the main way I used to write code in Mathematica.

Among the typical "scientific computing" languages, Wolfram is the only one that has "cool" functional programming tricks as the standard way to solve a problem in it.

1: https://reference.wolfram.com/language/guide/RulesAndPattern...

P.S. Seriously! The amazing functional style of Mathematica is pretty much the one thing I like about it.

I've only worked through the first paper thus far, but translating it all to Scheme has been a great help in understanding what is going on. If it had been in a nicer language by default, I think I would have lost something from the explanations.

As those papers are hosted on vixra.org, how credible is the content? Has somebody with deeper knowledge on the subject had a look into it?

How about you go through a tiny bit of trouble and figure out for yourself that John Cook is legit?

The papers are written by Brian Beckman as you have found out by yourself certainly. However, this doesn't answer my question.

Kalman filter is a natural example of iterative algorithm. Writing it in Fold does not gain much.

glad to see this guy on here again. I hadn't seen him on HN in a while and I forgot his name so I couldn't google. he's great at explaining advanced math concepts.

You can add his blog to your RSS feed, or bookmark the articles you like and tag them (I do both).

In fact I think bookmarks are underrated, I tried various other ways of organizing information (Zim, Org, etc.) and bookmarks are by far the simplest and most effective. For example just tag this blogpost with 'math, concepts'. Then you just type 'math concepts ' in your Firefox address bar and see all related pages. If you setup firefox sync then you can bookmark pages (even from other apps by sharing the URL with firefox and using the 'add bookmark' action), and tag it later on the desktop.

So, is it like error compensation in interpreting sensor data?

It does sensor fusion. The google term you want is "optimal estimation".

A Kalman filter isnt really a filter. It keeps a state estimate of a system. The system state is updated on a regular basis using control inputs. That runs open loop, with a model of the drift gradually reducing your confidence in the open loop estimate. As sensor inputs arrive, they are incorporated in the state to refine the state estimate. The sensor noise model is your confidence in the measurement.

The Baysian arithmetic comes into play when incorporating measurements. The confidence in current estimate and confidence sensor are used to decide how much of each to average into the new estimate.

Ah, but it is a filter - it filters out high frequency noise. One interpretation of the Kalman filter is a variable cutoff low-pass filter. The cutoff is determined by the Kalman gain. Note the similarity between the Kalman state update equation and a first-order low-pass IIR filter:

x_post = (1-KH)x_prior + K*z

Well, yeah. It has a smoothing effect on the sensor data. But you don't use it because you want a filter. You use it because you want to build a controller.

Applications are open for YC Summer 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact