Inference and learning in densely connected graphical models is a huge difficulty. It's not a big problem for MRFs with nearest-neighbour connections only, but it is for anything interesting (e.g. long-range connections), so I felt quite jealous of neural networks with their simple feed-forward computation. Using feed-forward computation in one direction (most likely observed->latent) of a hybrid model is really appealing to me. Instead of having to fall back on approximations because exact computation is intractable (e.g. Contrastive Divergence (CD) instead of actually sampling from an RBM), why not build the model around the easy computation in the first place, so that direction is exact, and confine the approximation to the other direction only? I'm not very familiar with Helmholtz machines, but they seem to do exactly that, yet (unsurprisingly) they seem just as difficult to train? I'm not sure. Either way, I don't like that Helmholtz machines and VAEs put Gaussian priors over the latents; that seems like making zero use of the power of graphical models. I want the best of both!
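
To make the CD point concrete, here's a minimal sketch (numpy, all names mine) of a CD-1 update for a binary RBM. Exact maximum-likelihood needs a negative-phase sample from the model distribution, which means running a Gibbs chain to equilibrium; CD-k just truncates that chain after k steps starting from the data, which is exactly the kind of approximation I'd rather design away.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.1, rng=None):
    """One CD-1 step for a binary RBM: weights W, visible bias b, hidden bias c.

    The gradient is approximated by (data statistics) - (one-step
    reconstruction statistics), instead of the intractable expectation
    under the model distribution.
    """
    rng = rng or np.random.default_rng()
    # Positive phase: hidden probabilities given the data
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # One Gibbs step: reconstruct visibles, then hidden probs again
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # Approximate gradient, averaged over the batch
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / v0.shape[0]
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c
```

Note that even the "positive phase" here is easy only because the RBM's bipartite structure makes observed->latent a single feed-forward pass; the approximation is entirely in the negative phase.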