There's some information theory that goes over my head, but I think the point is that you can approximate E[f(x)] with an integral over the typical set and expect good results, for non-pathological functions f.
Defined in this way, you could of course include the origin in the typical set. But then there would be another, smaller typical set that would do the job just as well.
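For intuition, here's a quick numerical sketch (the standard D-dimensional Gaussian is just an illustrative choice, as are D and the number of draws): samples land in a thin shell at radius around sqrt(D), nowhere near the mode at the origin, so a plain Monte Carlo estimate of E[f(x)] only ever evaluates f on that shell anyway.

```python
import numpy as np

# Illustrative sketch: draws from a standard D-dimensional Gaussian
# concentrate at radius ~sqrt(D), far from the mode at the origin.
rng = np.random.default_rng(0)
D, N = 100, 100_000
x = rng.standard_normal((N, D))
r = np.linalg.norm(x, axis=1)

print(f"mean radius        : {r.mean():.2f}  (sqrt(D) = {np.sqrt(D):.2f})")
print(f"std of radius      : {r.std():.2f}")
print(f"fraction with r < 5: {(r < 5).mean():.1e}")  # essentially no draws near the mode
```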
Thanks. I'm sure the formalism makes sense if properly developed and can lead to useful approximations. It's just that the "conceptual introduction" in that paper makes no sense to me... I understand it's just a simplification, but simplifying too much removes the essence of things.
> A grid of length N distributed uniformly in a D-dimensional parameter space requires N^D points and hence N^D evaluations of the integrand. Unless N is incredibly large, however, it is unlikely that any of these points will intersect the narrow typical set, and the exponentially-growing cost of averaging over the grid yields worse and worse approximations to expectations.
In fact any point within the hypersphere is as good as (or better than) any point in that narrow typical set not including the mode. The problem is not missing the "narrow typical set"; it is missing the whole hypersphere (because the volume of the hypersphere is tiny relative to the volume the grid has to cover). I fail to see where in that introduction the fact that the typical set excludes the mode of the distribution makes any difference.
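To put rough numbers on that (assuming the standard D-dimensional Gaussian again, and a uniform grid over a box [-L, L]^D wide enough to hold essentially all of the mass; D, L and R below are arbitrary illustrative choices):

```python
import math

D = 100
L = 12.0   # half-width of the gridded box [-L, L]^D
R = 12.0   # ball radius that contains essentially all of the Gaussian's mass (sqrt(D) = 10)

# log-volume of the D-ball of radius R: pi^(D/2) / Gamma(D/2 + 1) * R^D
log_vol_ball = (D / 2) * math.log(math.pi) - math.lgamma(D / 2 + 1) + D * math.log(R)
log_vol_box = D * math.log(2 * L)

frac = math.exp(log_vol_ball - log_vol_box)
print(f"fraction of the box (and hence of a uniform grid) inside the ball: {frac:.1e}")
# on the order of 1e-70: the grid misses the whole hypersphere, not just the thin typical shell
```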