
Intro to Hidden Markov Models (2010) [pdf] - kercker
http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-410-principles-of-autonomy-and-decision-making-fall-2010/lecture-notes/MIT16_410F10_lec20.pdf
======
platz
\-- vs kalman filters:

"In both models, there's an unobserved state that changes over time according
to relatively simple rules, and you get indirect information about that state
every so often. In Kalman filters, you assume the unobserved state is
Gaussian-ish and it moves continuously according to linear-ish dynamics
(depending on which flavor of Kalman filter is being used). In HMMs, you
assume the hidden state is one of a few classes, and the movement among these
states uses a discrete Markov chain. In my experience, the algorithms are
often pretty different for these two cases, but the underlying idea is very
similar." \- THISISDAVE

\-- vs LSTM/RNN:

"Some state-of-the-art industrial speech recognition [0] is transitioning from
HMM-DNN systems to "CTC" (connectionist temporal classification), i.e.,
basically LSTMs. Kaldi is working on "nnet3" which moves to CTC, as well.
Speech was one of the places where HMMs were _huge_, so that's kind of a big
deal." -PRACCU

"HMMs are only a small subset of generative models that offers quite little
expressiveness in exchange for efficient learning and inference." \- NEXTOS

"IMO, anything that can be done with an HMM can now be done with an RNN. The
only advantage that an HMM might have is that training it might be faster
using cheaper computational resources. But if you have the $$$ to get yourself
a GPU or two, this computational advantage disappears for HMMs." \- SHERJILOZAIR

~~~
the_decider
An RNN can be trained to a) simulate the random output behavior of the
sequence, or b) predict the categorical hidden states at any given sequence
location from supervised training data. An HMM is able to carry out both a and
b. It also has the capacity to do c) predict the categorical hidden states at
any sequence location without reliance on supervised training data.
Effectively, an HMM can identify/reveal hidden categorical states directly
from unsupervised learning. To my knowledge, unsupervised labeling of hidden
states is not a trivial task for RNNs.
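As a sketch of point (c), here is a minimal, unscaled Baum-Welch (EM) implementation in plain Python; the two-state, two-symbol model and all parameter values below are made up for illustration. Each EM step provably does not decrease the data log-likelihood, and the hidden-state structure is learned without any labels.

```python
import math
import random

def forward(obs, pi, A, B):
    # alpha[t][i] = P(o_1..o_t, X_t = i); no scaling, fine for short sequences
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[j] * A[j][i] for j in range(n)) * B[i][o]
                      for i in range(n)])
    return alpha

def backward(obs, A, B):
    # beta[t][i] = P(o_{t+1}..o_T | X_t = i)
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta

def baum_welch_step(seqs, pi, A, B):
    # one EM step: accumulate expected counts, then renormalize
    n, m = len(A), len(B[0])
    pi_new = [0.0] * n
    A_num = [[0.0] * n for _ in range(n)]
    A_den = [0.0] * n
    B_num = [[0.0] * m for _ in range(n)]
    B_den = [0.0] * n
    for obs in seqs:
        al, be = forward(obs, pi, A, B), backward(obs, A, B)
        p = sum(al[-1])                      # P(obs | current model)
        for t in range(len(obs)):
            for i in range(n):
                g = al[t][i] * be[t][i] / p  # gamma_t(i) = P(X_t=i | obs)
                if t == 0:
                    pi_new[i] += g
                B_num[i][obs[t]] += g
                B_den[i] += g
                if t < len(obs) - 1:
                    A_den[i] += g
                    # xi_t(i,j) = P(X_t=i, X_{t+1}=j | obs)
                    for j in range(n):
                        A_num[i][j] += (al[t][i] * A[i][j]
                                        * B[j][obs[t + 1]] * be[t + 1][j] / p)
    return ([x / len(seqs) for x in pi_new],
            [[A_num[i][j] / A_den[i] for j in range(n)] for i in range(n)],
            [[B_num[i][k] / B_den[i] for k in range(m)] for i in range(n)])

def loglik(seqs, pi, A, B):
    return sum(math.log(sum(forward(o, pi, A, B)[-1])) for o in seqs)

# generate unlabeled data from a hidden two-state model
rng = random.Random(0)
true_A = [[0.9, 0.1], [0.2, 0.8]]
true_B = [[0.8, 0.2], [0.1, 0.9]]
seqs = []
for _ in range(50):
    s, obs = 0, []
    for _ in range(30):
        obs.append(0 if rng.random() < true_B[s][0] else 1)
        s = 0 if rng.random() < true_A[s][0] else 1
    seqs.append(obs)

# start from a deliberately wrong guess and run EM; track the log-likelihood
pi, A, B = [0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.6, 0.4], [0.3, 0.7]]
lls = [loglik(seqs, pi, A, B)]
for _ in range(10):
    pi, A, B = baum_welch_step(seqs, pi, A, B)
    lls.append(loglik(seqs, pi, A, B))
```

Note this recovers hidden structure purely from the observations; whether the learned states mean anything to a human is a separate question, as the next comment points out.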

~~~
howlin
HMMs are a good model if you know what the categorical states are before you
start modeling. If you let the HMM learn the hidden states (e.g. with
expectation maximization), there is no good reason to believe they will have
obvious semantic value to you.

For RNNs, you can get very similar info. If you do have meaningful hidden
state, you can treat a few labeled examples as training data for a supervised
learning procedure. The RNN winds up predicting the state variable along with
the future of the sequence. If you want something exploratory, you can use the
RNN's hidden state activation over the course of a sequence as a general real
valued vector that is amenable to cluster analysis. If the RNN state vectors
segment well, it's likely these segmentations will have at least as much
meaning as learned HMM states.

------
melling
Markov Chains Explained Visually

[http://setosa.io/ev/markov-chains/](http://setosa.io/ev/markov-chains/)
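As a complement to the visualization, a tiny simulation (the two-state weather chain here is made up) showing how long-run state frequencies approach the stationary distribution pi solving pi P = pi:

```python
import random

# hypothetical two-state weather chain; each row's weights sum to 1
P = {"sunny": [("sunny", 0.9), ("rainy", 0.1)],
     "rainy": [("sunny", 0.5), ("rainy", 0.5)]}

def run_chain(start, steps, rng):
    # simulate the chain and count visits to each state
    state = start
    counts = {"sunny": 0, "rainy": 0}
    for _ in range(steps):
        nexts, weights = zip(*P[state])
        state = rng.choices(nexts, weights=weights)[0]
        counts[state] += 1
    return counts

counts = run_chain("sunny", 50_000, random.Random(0))
frac_sunny = counts["sunny"] / 50_000
# solving pi P = pi gives stationary P(sunny) = 5/6, about 0.833
```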

~~~
GnarlyWhale
The interactive nature of those visualisations does a great job of increasing
one's grasp of the concept.

------
mjt0229
A coworker of mine used to ask job candidates (usually folks with PhDs) who
listed HMMs on their CV, "what's hidden in a hidden Markov model?" Lots of
people couldn't answer that question.

------
gallamine
Are there any open tools for solving HMMs for large datasets? I.e., if I have
millions of observations from millions of users and want to learn an HMM from
the data, what are my options?

~~~
fizzbitch
mlpack ([http://www.mlpack.org/](http://www.mlpack.org/)) has a nice
implementation of HMMs. The library provides a few command-line programs you
can use to work with HMMs:

[http://mlpack.org/docs/mlpack-2.0.1/man/mlpack_hmm_generate....](http://mlpack.org/docs/mlpack-2.0.1/man/mlpack_hmm_generate.html)

[http://mlpack.org/docs/mlpack-2.0.1/man/mlpack_hmm_train.htm...](http://mlpack.org/docs/mlpack-2.0.1/man/mlpack_hmm_train.html)

[http://mlpack.org/docs/mlpack-2.0.1/man/mlpack_hmm_loglik.ht...](http://mlpack.org/docs/mlpack-2.0.1/man/mlpack_hmm_loglik.html)

[http://mlpack.org/docs/mlpack-2.0.1/man/mlpack_hmm_viterbi.h...](http://mlpack.org/docs/mlpack-2.0.1/man/mlpack_hmm_viterbi.html)
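For a sense of what a tool like mlpack_hmm_viterbi computes, here is a minimal log-space Viterbi decoder in plain Python (the toy model parameters below are made up for illustration):

```python
import math

def viterbi(obs, pi, A, B):
    """Most likely hidden-state path for a discrete-emission HMM."""
    n = len(pi)
    # delta[i]: log-probability of the best path so far ending in state i
    delta = [math.log(pi[i]) + math.log(B[i][obs[0]]) for i in range(n)]
    back = []
    for o in obs[1:]:
        ptr, nxt = [], []
        for j in range(n):
            # best predecessor state for landing in state j
            best = max(range(n), key=lambda i: delta[i] + math.log(A[i][j]))
            ptr.append(best)
            nxt.append(delta[best] + math.log(A[best][j]) + math.log(B[j][o]))
        delta, back = nxt, back + [ptr]
    # backtrack from the best final state through the pointers
    path = [max(range(n), key=lambda i: delta[i])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# toy sticky 2-state model whose emissions strongly indicate the state
path = viterbi(obs=[0, 0, 1, 1],
               pi=[0.5, 0.5],
               A=[[0.8, 0.2], [0.2, 0.8]],
               B=[[0.9, 0.1], [0.1, 0.9]])
```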

There is also the HTK toolkit, but last I used that, years ago, I was pretty
disappointed; it did not compile on a 64-bit system. Both of these are single-
machine toolkits, so there is a limit to how much you can scale, but I think
you may be able to deal with a million or so points with mlpack. There's also
MATLAB's implementation, but that is not very fast.

------
graycat
> A Markov chain is a sequence of random variables X_1, X_2, X_3, ..., X_t,
> ..., such that the probability distribution of X_{t+1} depends only on t and
> x_t (Markov property), in other words:

No. In a Markov process, the future does depend on the past, even all of the
past. But what is special is that the past and the future are conditionally
independent given the present. If we are not given the present, then all of
the past can be relevant in predicting the future.

~~~
btaitelb
Markov chains are much simpler than Markov models. The lecture starts off with
the simpler idea to motivate the more complex one that you're describing.

~~~
graycat
Sorry. The statement I quoted is just flatly wrong. Okay?

~~~
evanpw
It's more reasonable to parse "depends only on" as expressing conditional
independence than unconditional, and that parsing has the added bonus of
making the original statement correct.

~~~
graycat
No. Again, once again, over again, one more time, in a Markov process,
including a discrete time Markov chain, even one with a finite state space,
the past can, and in practice usually does, provide a lot of powerful
information to predict the future. The future really can depend on the past.
Typically the past and the future are not independent -- not even close.

But, the Markov assumption (property) says that _GIVEN_ (excuse my emphasis --
it is just crucial here) the present, the past does not contribute more
information for predicting the future. That is, the past and the future may not
be independent at all, but the past and the future are _conditionally
independent_ given the present.
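A quick simulation makes this concrete (the two-state chain here is made up for illustration): unconditionally, knowing the past X_1 shifts the distribution of the future X_3, but once the present X_2 is given, also conditioning on X_1 changes essentially nothing.

```python
import random

rng = random.Random(0)
A = [[0.9, 0.1], [0.3, 0.7]]       # sticky two-state transition matrix

def step(s):
    # one transition out of state s
    return 0 if rng.random() < A[s][0] else 1

# sample many length-3 trajectories X1 -> X2 -> X3, with X1 uniform
trajs = []
for _ in range(100_000):
    x1 = rng.randrange(2)
    x2 = step(x1)
    x3 = step(x2)
    trajs.append((x1, x2, x3))

def prob(event, given=lambda t: True):
    # empirical P(event | given) over the sampled trajectories
    sel = [t for t in trajs if given(t)]
    return sum(event(t) for t in sel) / len(sel)

# unconditionally, the past is very informative about the future:
p_future = prob(lambda t: t[2] == 0)
p_future_given_past = prob(lambda t: t[2] == 0, lambda t: t[0] == 0)

# but given the present X2, additionally conditioning on the past
# changes almost nothing -- conditional independence:
p_given_present = prob(lambda t: t[2] == 0, lambda t: t[1] == 0)
p_given_both = prob(lambda t: t[2] == 0,
                    lambda t: t[1] == 0 and t[0] == 0)
```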

To discuss the Markov property, we desperately need the concept of conditional
independence -- there is no substitute, no easier terminology or description.

Alas, conditional independence is not so easy to discuss, even in the simplest
cases. And in the grown-up cases, we need the Radon-Nikodym theorem of measure
theory and the associated machinery of conditional expectation.

Tellingly, the part I quoted never mentioned conditional independence or even
conditioning.

Sorry, again, the quote was wrong. I tried to correct the quote and, thus,
keep readers who are trying to learn from being misled.

My sympathies are with the readers trying to learn. Early in my career, I read
lots of such elementary, easy to understand introductions to lots of topics in
applied math. The biggie problem was, as here, that the content ranged from not
very good down to actually wrong. Later I got a first-class, rock-solid foundation
in grad school. In particular my background in Markov processes was from a
star student of E. Cinlar, a world class expert long at Princeton. And my
Ph.D. dissertation was in stochastic optimal control where the Markov
assumption is just crucial -- indeed, stochastic optimal control is also
commonly called Markov decision processes. In addition I've applied Markov
processes in some serious work in US national security.

What I'm saying here is on the center of the target. What I quoted no one
should try to learn from -- it's wrong.

------
platz
link to MIT course:

[http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-4...](http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-410-principles-of-autonomy-and-decision-making-fall-2010/)

