Any open source system can be screwed with in a variety of ways. The simplest and most effective option is to publish both the source and the binaries, but build the latter from an altered source. This will work in the vast majority of cases, because a lot of people make this ridiculous assumption that publishing the source automatically implies that the guy is good, open and trustworthy all over, and they won't bother verifying the binaries. Virtually everyone will assume that since it's open, there will be someone who will do the verification. Guess what? That someone will assume the same thing.
That's your good old social engineering. It's the humans that are exploitable, not the tech.
But let's say, as unlikely as it is, such a person materialized. Easy enough to run an independent build and verify the binaries, right? Sure. In theory. In a lot of cases, due to dependencies, it's either hard or nearly impossible to do. In other cases it translates into a non-trivial amount of work, which needs to be justified. I am aware of just one project - PGPfone - that published not just the code, but the exact build instructions to produce matching binaries. Everything else is just the "open source, trust us" model. And so the bottom line is that in a heck of a lot of cases you will not be able to produce matching binaries.
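The verification step itself is trivially simple, which is what makes the dependency problem so frustrating. A minimal sketch of what "verify the binaries" means (file contents here are placeholders standing in for a vendor download and your own rebuild):

```python
import hashlib

# Hypothetical verification: hash the vendor's published binary and an
# independently rebuilt one, then compare. The byte strings below are
# stand-ins for real build outputs.

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

published = b"\x7fELF...vendor build output"   # placeholder bytes
rebuilt   = b"\x7fELF...vendor build output"   # placeholder: reproducible build succeeded

assert sha256_hex(published) == sha256_hex(rebuilt)
```

The catch is that in practice even a one-byte divergence from embedded timestamps, build paths, or a slightly different compiler version makes the hashes differ completely, so getting `rebuilt` to actually match is where all the work is.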
Now, even if the binaries differ in just several bytes, that is 100% enough to screw everyone over. This can be done by messing with the initialization of an internal random number generator, which all crypto stacks have. All you need to do is make the PRNG (semi)predictable, and the best crypto won't stand a chance, as there'll be no secrets.
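To make that concrete, here's a toy sketch of the kind of sabotage a few changed bytes can buy. The function names are invented for illustration; the "altered build" simply truncates the seed entropy, after which an attacker recovers the PRNG state by brute force:

```python
import random

# Toy illustration (not any real crypto stack): an honest build seeds
# its PRNG with 128 bits of entropy; the sabotaged build silently uses
# only 8 bits, leaving just 256 possible internal states.

def honest_seed():
    return random.getrandbits(128)

def sabotaged_seed():
    return random.getrandbits(8)   # same call shape, almost no entropy

# Victim generates "secret" keystream material with the sabotaged build:
rng = random.Random(sabotaged_seed())
keystream = [rng.getrandbits(32) for _ in range(4)]

# Attacker brute-forces all 2**8 possible seeds:
recovered = None
for guess in range(256):
    g = random.Random(guess)
    if [g.getrandbits(32) for _ in range(4)] == keystream:
        recovered = guess
        break

assert recovered is not None   # full PRNG state recovered in 256 tries
```

The output of the sabotaged build looks perfectly normal to the user, which is the whole point.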
In the end, if you are using pre-made binaries (and who doesn't?) that are not built by a trusted entity from a specific peer-reviewed snapshot of the sources, you have the exact same chances of running a flawed version regardless of whether its source is open or not. Except that in a closed source case you are likely to be more on guard for the surprises.
With open source you can get intelligent, experienced experts to look through it. And if enough of them look through it and say it's good (for example, as many experts have done with the Bitcoin client and protocol), you can at least gain some degree of assurance, even if the possibility of a critical exploit being found in the future still always remains.
Also, you are right in that in this particular scenario, open source would only be the first step as you couldn't know if their servers are actually running the source they published. Open source + full end-to-end encryption and authentication are both required, as is the case with OTR and Moxie's project.
Probably worth linking to Ken Thompson's Reflections on Trusting Trust paper, which illustrates exactly what you're saying with a hypothetical (or not) C compiler backdoor.
"You can't trust code that you did not totally create yourself [...] No amount of source-level verification or scrutiny will protect you from using untrusted code"
"I could have picked on any program-handling program such as an assembler, a loader, or even hardware microcode. As the level of program gets lower, these bugs will be harder and harder to detect. A well installed microcode bug will be almost impossible to detect."
Presumably it would be possible to introduce a bug into every CPU manufactured that even the manufacturer is unaware of.
Full paper: http://people.umass.edu/gbecker/BeckerChes13.pdf
In the attack they describe, Intel's hardware RNG is supposed to return the results of encrypting 128 random bits using a 128-bit random key with the AES cipher. So an attacker trying to guess the random number returned would have to try 2^256 options. Instead, the attacker modifies the chip so it returns the results of a known key used to encrypt n random bits and 128-n known bits. They therefore only have to try 2^n options -- they can make guessing the random number as easy or hard as they want for themselves.
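The structure of the attack can be sketched in a few lines. This is a stand-in model, not the paper's implementation: I'm using SHA-256 in place of AES so the demo is stdlib-only, and the key and parameter names are invented, but the shape is the same - only n bits of the cipher input are actually unknown:

```python
import hashlib
import secrets

# Sketch of the Becker et al. backdoor: the RNG output is a cipher
# applied to an input containing only n unknown bits. SHA-256 stands
# in for AES here; KNOWN_KEY models the constant baked into the chip.

N = 8                                # attacker-chosen "dial"
KNOWN_KEY = b"attacker-known-const"  # hypothetical hardcoded value

def backdoored_rng() -> bytes:
    x = secrets.randbits(N)          # only 2**N possible internal states
    return hashlib.sha256(KNOWN_KEY + x.to_bytes(16, "big")).digest()[:16]

sample = backdoored_rng()            # looks uniformly random to the user

# The attacker, knowing KNOWN_KEY and N, enumerates every possible output:
candidates = {
    hashlib.sha256(KNOWN_KEY + x.to_bytes(16, "big")).digest()[:16]
    for x in range(2 ** N)
}
assert sample in candidates          # "random" value pinned down in 2**8 guesses
```

For everyone without `KNOWN_KEY`, the outputs are indistinguishable from the honest RNG's.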
Now the AES part makes the results of the hacked chip appear random -- the results of AES(0), AES(1), AES(2) ... will have a random distribution across the full 2^128 output range. (The whole point of a cipher is that it should be impossible to draw any conclusions about the inputs by examining a bunch of outputs, even if the inputs are predictable.) To detect it in software, we'll have to generate enough numbers to start to see a suspicious number of repeated results. So if the attacker sets n==2, they'll have a really easy time cracking the resulting encryption, but we'll easily detect it -- we'll quickly notice that the RNG always returns one of four numbers. On the other hand, if they set n==32, they'll have to try 2^32 options to crack the resulting encryption, but there will be few enough repeats that we'll give up testing before we notice anything is wrong. (Of course if they're more paranoid and have better resources, they could go with n==64 or whatever -- it's like a dial they can use to set the difficulty where they want it.)
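The detection side of that trade-off is just birthday statistics: an RNG limited to n unknown bits can emit at most 2^n distinct values, so repeats show up after roughly sqrt(2^n) draws. A quick sketch (function names invented; the cipher whitening is omitted since it doesn't change how often outputs repeat):

```python
import secrets

# Models only the collision statistics of an RNG with n bits of real
# entropy; the AES layer in the actual backdoor preserves these stats.

def weak_rng(n: int) -> int:
    return secrets.randbits(n)       # at most 2**n distinct outputs

def draws_until_repeat(n: int, limit: int = 2000):
    seen = set()
    for i in range(limit):
        v = weak_rng(n)
        if v in seen:
            return i + 1             # draw number of the first repeat
        seen.add(v)
    return None                      # gave up: looked random within budget

# n == 2: four possible outputs, so pigeonhole forces a repeat by draw 5.
assert draws_until_repeat(2) <= 5

# n == 32: a repeat within 2,000 draws of a 2**32-value space has
# probability well under 0.1% -- almost always None here, i.e. the
# weakened RNG sails through this casual test.
r = draws_until_repeat(32)
```

That's the dial in action: n==2 fails the repeat test immediately, while n==32 passes any testing budget a casual auditor would commit to.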
The neat thing here, of course, is that this backdoor is only valuable if you know the bits that have been hardcoded into the chip. So it remains secure against everyone in the world, except the one three-letter agency that managed to modify the chip.
Or their private contractors and consultants, I suppose.