Agreed, it's very hard to catch errors in feature behaviour during code review. If you check out the code and start testing it, you are doing something other than code review. That is sometimes worthwhile for a big changeset, but most of the time the review itself should be enough, and the original author should already have tested the code.
On the other hand, I have seen plenty of (tested) production code that didn't do what was intended, yet no one noticed in practice. I have also seen cases where people or the documentation said the code did a specific thing, and reading the code showed it didn't.