Here are some things I've found:
1) Absolute conversion rate. After a certain point, whatever you add will just detract from the performance of something else. That detraction could either be from the landing page itself (if your lucky) or some longer term variable (hope those "conversions" don't cost you too much money.) I have had both occur.
2) "Statistically significant" can just be noise when variables are fairly close to each other. After getting rid of the obvious losers, I've watched the "winner" of elements like button color change back and forth daily for weeks, with no clear winner, even with 30,000+ conversions a day flowing through. This is the kind of thing visualwebsiteoptimizer would write a case study on 1 hours worth of traffic and declare a winner.
3) You brand can be shit on when dealing with returning users. They are used to seeing one thing and now they see something else. Imagine if the colors, theme, and button locations changed every day (or to be fair, once a week) when you visited hn. Often "better converting" designs actually convert worse when introduced on existing customers.
4) Failure to account for external variables, especially when dealing with small sample sizes. Testing is often done most vigorously with paid traffic sources as the monetary incentive is direct. The traffic source itself is often a bigger determining factor behind the conversion rate than the design. Small budget/sample size, and you could end up with some pretty poor test results that the math will tell you are correct.
I am not saying don't test. I am saying a/b testing, split testing, multivariate testing, etc is abused