Van Hasselt argues this is because accurate updates to your Q-values require that the learned model you use for simulated rollouts is trustworthy on the states you sample from. But if your simulation model is trustworthy on those states, it is because it saw a lot of real transitions from them in the actual environment. In that case you might as well have stored those transitions in a sufficiently large replay buffer and run ordinary Q-learning with experience replay. This indeed seems to be the case: given a nice big replay buffer, Rainbow DQN is more sample efficient than SimPLe, whether you count real samples alone or real plus imagined ones. Van Hasselt does leave some wiggle room for learned models to help with action selection and credit assignment, though.
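To make the replay-buffer side of the argument concrete, here is a minimal sketch of tabular Q-learning with experience replay on a hypothetical toy chain MDP (the environment, hyperparameters, and function names are my own illustration, not from the paper). Each replayed transition plays the same role a model-generated rollout step would in SimPLe, except it comes from stored real experience rather than a learned model:

```python
import random
from collections import defaultdict

random.seed(0)

# Toy deterministic chain MDP (hypothetical illustration): states 0..N-1,
# action 0 moves left, action 1 moves right; reward 1 only on reaching the
# rightmost state, which ends the episode.
N = 6

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(N - 1, s + 1)
    done = s2 == N - 1
    return s2, (1.0 if done else 0.0), done

def greedy(Q, s):
    # Break ties randomly so early exploration isn't stuck on one action.
    q0, q1 = Q[(s, 0)], Q[(s, 1)]
    return random.randrange(2) if q0 == q1 else (0 if q0 > q1 else 1)

def q_learning_with_replay(episodes=200, replay_updates=10,
                           alpha=0.5, gamma=0.9, eps=0.1):
    Q = defaultdict(float)   # tabular Q[(state, action)]
    buffer = []              # every real transition ever seen
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = random.randrange(2) if random.random() < eps else greedy(Q, s)
            s2, r, done = step(s, a)
            buffer.append((s, a, r, s2, done))
            # Replay: re-apply the Q-learning update to sampled past
            # transitions. These extra updates play the same role as
            # rollouts from a learned model, but use stored real data.
            for bs, ba, br, bs2, bdone in random.sample(
                    buffer, min(replay_updates, len(buffer))):
                target = br if bdone else br + gamma * max(Q[(bs2, 0)], Q[(bs2, 1)])
                Q[(bs, ba)] += alpha * (target - Q[(bs, ba)])
            s = s2
    return Q

Q = q_learning_with_replay()
# Greedy policy from every non-terminal state; action 1 (right) is optimal.
policy = [max((0, 1), key=lambda a: Q[(s, a)]) for s in range(N - 1)]
print(policy)
```

Swapping the replay sample for a rollout from a learned transition model turns this into Dyna-Q; van Hasselt's point is that when the model is only accurate where you have plenty of real data anyway, the replay version gets you the same extra updates without the model error.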
My counterargument to this (supplied with zero evidence, of course!) would be that with the right inductive biases, a learned model can generalize quite accurately from very few seen transitions, and hence be so sample efficient that it outperforms the replay-memory approach. I'd imagine the kinds of inductive biases appropriate for a varied meta-environment like Atari are quite general things, like 'visually localized objects typically only interact when they approach or touch each other' and 'the arrow keys likely control one localized object'. There are approaches to encoding such priors;  is a good survey paper, and  employs some of these ideas for RL. Moreover, these are the kinds of priors one imagines are encoded, or biased towards, by evolution in actual animal brains.
There's a lengthy (and depressing) series of posts by Ben Recht that basically outlines all the reasons model-free RL is effectively bunk whenever you have access to any model of how the environment evolves. The SimPLe authors (as far as I can tell) haven't made any attempt to disentangle how much of the improvement they see comes from having a model of the environment _at all_, as opposed to the specific model they propose. There's probably more they could do to show they're actually onto something, and aren't just confusing throwing compute at the problem with a genuinely better solution.