>In other words, as my TA explained it: when you have a really complex high-dimensional distribution, ordinary MCMC takes forever to explore it. HMC, on the other hand, adds momentum that helps the chain explore the regions where the probability is high. Imagine the probability translated into a 3D landscape, where high probability corresponds to deep areas and low probability to high ones. A ball rolling under gravity will follow those curvatures instead of needlessly jumping over the walls into low-probability regions.
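The ball-on-a-landscape picture maps directly onto the algorithm: sample a random momentum (the push), simulate the ball's motion with a leapfrog integrator, then apply a Metropolis correction for integration error. Here's a minimal NumPy sketch on a toy 2D Gaussian target; the step size and trajectory length are arbitrary choices for illustration, not tuned values:

```python
import numpy as np

def hmc_sample(log_prob, log_prob_grad, x0, n_samples=1000,
               step_size=0.1, n_leapfrog=20, seed=0):
    """Minimal Hamiltonian Monte Carlo sampler using a leapfrog integrator."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        # Fresh random momentum: the "push" given to the ball each iteration.
        p = rng.standard_normal(x.shape)
        x_new, p_new = x.copy(), p.copy()
        # Leapfrog integration: simulate the ball rolling on the landscape.
        p_new += 0.5 * step_size * log_prob_grad(x_new)   # half step in momentum
        for _ in range(n_leapfrog - 1):
            x_new += step_size * p_new                    # full step in position
            p_new += step_size * log_prob_grad(x_new)     # full step in momentum
        x_new += step_size * p_new
        p_new += 0.5 * step_size * log_prob_grad(x_new)   # final half step
        # Metropolis accept/reject corrects the discretization error.
        h_old = -log_prob(x) + 0.5 * p @ p
        h_new = -log_prob(x_new) + 0.5 * p_new @ p_new
        if rng.random() < np.exp(h_old - h_new):
            x = x_new
        samples.append(x.copy())
    return np.array(samples)

# Toy target: standard 2D Gaussian, log p(x) = -0.5 * ||x||^2 up to a constant.
log_prob = lambda x: -0.5 * (x @ x)
log_prob_grad = lambda x: -x

samples = hmc_sample(log_prob, log_prob_grad, x0=np.array([3.0, 3.0]))
print(samples.mean(axis=0))  # should land near [0, 0]
```

Real implementations (Stan, PyMC) add adaptive step sizes and dynamic trajectory lengths (NUTS) on top of this core loop.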
>Also, HMC is the current state-of-the-art MCMC algorithm for very high-dimensional problems. Regular MCMC can still be applied if the distribution is much simpler. Often, though, VI is used instead of MCMC, since it can give really good results with little work. Sure, you have to choose your approximating distribution, but after that it's dead simple. Only if you need really high accuracy might you reach for something like HMC.
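To show how little work VI can be once the approximating family is chosen: below is a hand-rolled sketch of mean-field VI with the reparameterization trick, fitting a Gaussian q(z) to a made-up 1D Gaussian target (the target, learning rate, and step count are all illustrative assumptions; in practice you'd use a library's ADVI rather than write this yourself):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target: p(z) = N(2, 0.5^2). VI only needs its score function.
target_mu, target_sigma = 2.0, 0.5
dlogp = lambda z: -(z - target_mu) / target_sigma**2

# Variational parameters of the approximation q(z) = N(mu, exp(log_sigma)^2).
mu, log_sigma = 0.0, 0.0
lr = 0.01
for step in range(5000):
    eps = rng.standard_normal()
    sigma = np.exp(log_sigma)
    z = mu + sigma * eps            # reparameterization trick: z as a
                                    # deterministic function of noise eps
    g = dlogp(z)
    # Stochastic gradient ascent on the ELBO; the Gaussian entropy term
    # contributes a constant +1 to the log_sigma gradient.
    mu += lr * g
    log_sigma += lr * (g * sigma * eps + 1.0)

print(mu, np.exp(log_sigma))  # drifts toward roughly (2.0, 0.5)
```

Because the target here is itself Gaussian, q can match it exactly; for non-Gaussian targets the same loop finds the closest Gaussian in KL divergence, which is exactly the accuracy trade-off mentioned above.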
This is why I get so excited about probabilistic programming in general: likelihood estimation on million-dimensional data sets in reasonable time, right on your laptop. Real-world samples are usually sparse and heterogeneous. Abstracting out your analysis not only reduces the chance of human error, it allows for ingestion of even more disparate archival and de novo data sources. I have little doubt this will lead to more nuanced theories, and to higher reproducibility of results.
A recent example of the state-of-the-art: predicting rare events in pediatric transplant surgeries.
The folk theorem of computational statistics suggests that if the model has convergence problems, it's a bad model ;)