According to a processor designer I talked to, popcount is expensive to implement in hardware, which is a more plausible explanation for its absence than "OMG NSA conspiracy". Also, some of the latest billion-transistor processors do have popcount.
Secret instructions wouldn't work on emulators such as Bochs or Virtual PC because they aren't in on the conspiracy and the resulting errors would raise suspicion.
This would be a good point were it not for transparent fall-back (transparent in every way except speed, and that slowdown would be dwarfed by the cost of running under an emulator in the first place). All the OS has to do is hook the invalid-opcode exception and run a short routine that emulates the instruction in question. Apps will still work, just without the magic speed-up of the secret instruction. This assumes that the OS vendor is in on the conspiracy, as Microsoft was found to have been on two separate occasions. (Read about how they killed OS/2.)
You could still find the "evil" code by monitoring the invalid-opcode trap using the emulator's debug interface.
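To make the exchange above concrete, here is a toy model in Python of the trap-and-emulate fall-back and of why it remains observable. Everything in it is hypothetical: the opcode values, the `ToyCPU` and `ToyOS` classes, and the `trap_log` list are made up for illustration and do not correspond to any real processor, OS, or emulator interface.

```python
class InvalidOpcode(Exception):
    """Raised by the toy CPU when it meets an opcode it does not implement."""

    def __init__(self, opcode, operand):
        super().__init__(f"invalid opcode {opcode:#04x}")
        self.opcode = opcode
        self.operand = operand


# Both encodings are invented for this sketch.
OP_NOT = 0x01        # an ordinary, documented instruction
OP_POPCOUNT = 0xF1   # stands in for the "secret" instruction


def software_popcount(x):
    """Count set bits by repeatedly clearing the lowest set bit."""
    count = 0
    while x:
        x &= x - 1
        count += 1
    return count


class ToyCPU:
    """A CPU that may or may not implement the secret instruction."""

    def __init__(self, has_secret_instruction):
        self.has_secret_instruction = has_secret_instruction

    def execute(self, opcode, operand):
        if opcode == OP_NOT:
            return ~operand & 0xFFFFFFFF
        if opcode == OP_POPCOUNT and self.has_secret_instruction:
            return bin(operand).count("1")  # the "fast hardware" path
        raise InvalidOpcode(opcode, operand)


class ToyOS:
    """Hooks the invalid-opcode trap and emulates the missing instruction."""

    def __init__(self, cpu):
        self.cpu = cpu
        self.trap_log = []  # what someone watching the trap would see

    def run(self, opcode, operand):
        try:
            return self.cpu.execute(opcode, operand)
        except InvalidOpcode as trap:
            self.trap_log.append((trap.opcode, trap.operand))
            if trap.opcode == OP_POPCOUNT:
                return software_popcount(trap.operand)  # slow fall-back
            raise  # genuinely unknown opcode: let the fault propagate


real = ToyOS(ToyCPU(has_secret_instruction=True))       # "real" hardware
emulated = ToyOS(ToyCPU(has_secret_instruction=False))  # e.g. under an emulator
```

Calling `run(OP_POPCOUNT, value)` on `real` and on `emulated` gives the same result, so the application cannot tell the difference except by timing. On `emulated`, however, every such call leaves an entry in `trap_log`, while `real.trap_log` stays empty, which is the footprint the previous comment suggests looking for.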