I think that MAP_POPULATE here will fill the page table with entries rather than leaving the page table empty and letting the CPU interrupt at (almost) every time a new page is accessed. That would be about 200k less interrupts for a 1G file.
MAP_POPULATE will probably also do the whole disk read in one go rather than in a lazy+speculative manner.
Page size is probably not affected and neither is number of TLB misses. I in my testing that the size of the file (and the mapping) will affect the page size, a 4G had significantly less page fault interrupts than a 500MB file.
And obviously, MAP_POPULATE is bad if physical memory is getting exhausted.
in my testing that the size of the file (and the mapping) will affect the page size
I'm doubtful of this, although it might depend on how you have "transparent huge pages" configured. But even then, I don't think Linux currently supports huge pages for file backed memory. I think something else might be happening that causes the difference you see. Maybe just the fact that the active TLB can no longer fit in L1?
I'm confused by this, but this does appear to be the case. It seems strange to me that the MAP_POPULATE|MAP_NONBLOCK is no longer possible. I was slow to realize this may be closely related to Linus's recent post: https://plus.google.com/+LinusTorvalds/posts/YDKRFDwHwr6