I think what he is trying to say is that Phil didn't test the higher-order interactions enough, i.e. he didn't check the effect that a pair of actions has when done together (or a triple, etc.). This is a good point and important to note. (Ideally one would run a fully crossed factorial experiment, so that every combination is tested.)
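To make "fully crossed" concrete, here is a minimal sketch in Python. The factor names and levels are made up for illustration and are not from the article; the point is only that every level of every factor is paired with every level of the others.

```python
from itertools import product

# Hypothetical factors and levels, purely for illustration.
factors = {
    "button_color": ["red", "green"],
    "headline": ["short", "long"],
    "layout": ["one_column", "two_column"],
}


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


cells = full_factorial(factors)
print(len(cells))  # 2 * 2 * 2 = 8 experimental cells
```

The number of cells is the product of the level counts, so it grows quickly as factors are added. That is the practical cost of being able to estimate pairwise and higher-order interactions.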
But that amounts to saying "don't do A/B testing wrong", which is obviously true. I think a lot of problems with A/B testing are caused by people without much statistical knowledge missing some of the subtleties that come with any experimental design.
(On that note, there are many articles, backed by statistics, on why one needs to be careful with A/B testing.)
http://www.evanmiller.org/how-not-to-run-an-ab-test.html, http://www.cennydd.co.uk/2009/statistical-significance-other...