
Split Testing May Cause Google to Accidentally Think You’ve Been Compromised - melvinram
http://www.webdesigncompany.net/split-testing-google-compromised/
======
Matt_Cutts
There's a simpler explanation, which is that the site really was hacked as we
claimed. Here's an example cached page that we crawled on May 2nd:
[http://webcache.googleusercontent.com/search?q=cache:zl9t7DQ...](http://webcache.googleusercontent.com/search?q=cache:zl9t7DQwkH4J:model1.webdesigncompany.net/articles/&hl=en&gl=us&prmd=imvns&strip=0)

I count the name of one drug repeated 100+ times.

I've been quite clear that there's nothing wrong with A/B testing. In fact,
less than a month ago I tweeted that "A/B testing can be really helpful" and
included a link to best practices:
<https://twitter.com/#!/mattcutts/statuses/191658511149711360>

~~~
saurik
That is not the same website: that is model1.webdesigncompany.net, which seems
to be some kind of lorem-ipsum design or functionality demonstration. That
site does not seem to be linked from www.webdesigncompany.net (based on a
complete recursive wget of that site) and is not hosted on the same server as
www.webdesigncompany.net: the only thing it shares with
www.webdesigncompany.net is the domain name. It is also not entirely clear
that it actually has been hacked. Is this kind of circumstantial evidence
really relevant for Google?

~~~
Matt_Cutts
Hi saurik, here's what I can tell from home on a Sunday night:

\- multiple pages on the domain were definitely hacked, and had been for
weeks.

\- we detected hacked pages on the site both manually and algorithmically
based on our hacked site classifier.

\- we sent a message via the webmaster console on May 5th to alert the site
owner so they'd have a heads-up.

\- it looks like this has nothing at all to do with A/B testing (which as I've
said before is perfectly fine to do).

That's what I know with near 100% confidence.

Here's what I believe, but won't be able to confirm until tomorrow when I can
talk to a few folks. I think the domain was hacked in _multiple_ ways. I found
the hacked pages on model1.webdesigncompany.net just from doing site:
searches. And you're right that those are auxiliary pages, not related to the
main part of the site. But I suspect the core site was also hacked, based on
looking at the manual action that was submitted. I'll be happy to talk to the
relevant webspam people to ask for more details.

------
hartror
The analysis here makes sense. It is likely that the redirection test tripped
Google's URL cloaking detection algorithm [1]. I've not used redirects for
split testing before; rather, I use client- or server-side variations.
Redirects don't make sense from a page-speed and UX standpoint, which is why
you don't see them advocated for split testing.

One thing that rang alarm bells, and that you and other readers should be
aware of, has to do with this statement:

> _The VWO javascript automatically redirected equal portions of the traffic
> to the test pages._

If you are not randomly assigning visitors to a group, and instead just put
odds and evens into separate groups, then when you run more than one test at
once you will synchronize those tests and generate incorrect results.
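
A quick sketch of the fix (made-up function names, not VWO's implementation):
hash the visitor id together with a per-test id, so the bucket a visitor lands
in for one test says nothing about the bucket they land in for another.

```typescript
// Sketch only (not VWO's code): derive a bucket per test by hashing the
// visitor id together with the test id, so assignments for different tests
// are independent instead of synchronized.
function hashString(s: string): number {
  // FNV-1a 32-bit hash; any reasonably uniform hash works here.
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

function bucketFor(visitorId: string, testId: string, variants: number): number {
  // Mixing in testId means visitor 42 can land in group A for one test
  // and group B for another, instead of always "odds" or always "evens".
  return hashString(`${visitorId}:${testId}`) % variants;
}

// Example: the same visitor gets independent assignments for two tests.
const headlineBucket = bucketFor("visitor-42", "headline-test", 2);
const pricingBucket = bucketFor("visitor-42", "pricing-test", 2);
```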

[1]
[http://support.google.com/webmasters/bin/answer.py?hl=en&...](http://support.google.com/webmasters/bin/answer.py?hl=en&answer=66355)

~~~
melvinram
That was a simplification on my end. I'm sure VWO does randomly assign the
traffic to the variations. I just wanted to get across that part of the
traffic went to page A and part of it went to page B.

~~~
hartror
If you were using VWO, why not have a single landing page with different
content?

Also, why did you segment the traffic during the test (ignoring traffic that
didn't match your criteria) instead of doing it at the analysis stage? Is this
a limitation of VWO?

I haven't used VWO; rather, I have rolled my own solutions, and I found that
segmenting traffic at the analysis stage allowed for a great deal of discovery
and pointed toward further tests. What if one of your test pages converted
users who searched with a phrase that didn't indicate an intention to buy
better than the other page did?

~~~
melvinram
I typically use the client-side variations approach but didn't in this
instance because the two pages were created separately and were pretty
different in terms of CSS, so it was just easier to keep them separate.

Moving forward, I'll do the extra work to make VWO swap content on the same
page and adjust the CSS appropriately.

Segmenting traffic after the test is something I wish VWO would allow me to do
but that is currently a limitation of their system.
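
For what it's worth, swapping content in place on a single URL can stay fairly
small; here's a rough sketch (hypothetical element ids, cookie name, and
stylesheet path; not VWO's actual API):

```typescript
// Rough sketch (hypothetical ids/paths, not VWO's API): pick a variant once,
// remember it in a cookie, and swap the content in place so every visitor
// stays on the same URL instead of being redirected.
function chooseVariant(): "a" | "b" {
  const match = document.cookie.match(/landing_variant=(a|b)/);
  if (match) return match[1] as "a" | "b";
  const variant: "a" | "b" = Math.random() < 0.5 ? "a" : "b";
  document.cookie = `landing_variant=${variant}; path=/; max-age=2592000`;
  return variant;
}

if (chooseVariant() === "b") {
  // Variant B rewrites the hero copy and points at its own stylesheet.
  document.getElementById("hero")!.innerHTML =
    "<h1>Alternative headline</h1><p>Alternative pitch.</p>";
  document.getElementById("variant-css")!.setAttribute("href", "/css/landing-b.css");
}
```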

------
glimcat
I usually robots.txt any big text variations anyway. A short delay in indexing
isn't going to do any major harm and that way you get the right version
cached.
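
For example, if the big text variations live at their own paths (the paths
below are made up), keeping them out of the index is a couple of robots.txt
lines:

```
User-agent: *
Disallow: /landing-page-variant-b/
Disallow: /experiments/
```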

~~~
saurik
This makes sense if you are doing a one-time split test, but not if you are
doing ongoing tests akin to the scheme described in the article I link to
below (which comes up on HN from time to time). The problem is that in
these cases you never really stop testing: instead, you always assign some
random proportion of incoming traffic to your various buckets based on your
current statistical certainty.

[http://untyped.com/untyping/2011/02/11/stop-ab-testing-and-m...](http://untyped.com/untyping/2011/02/11/stop-ab-testing-and-make-out-like-a-bandit/)

This not only has the advantage mentioned where you limit your regret for
making a bad decision on too little evidence, but with some modifications to
the algorithm (such as discounting old evidence from your inference) you can
also deal with situations where the world in which you are running the
different variations is itself changing over time, or where different types of
users coming from different browsers may have different behaviors.
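
To make that concrete, here is a minimal sketch of the idea (not the linked
article's code): an epsilon-greedy allocator whose evidence is decayed on
every update, so recent visits weigh more than old ones and the split can
track a changing world.

```typescript
// Sketch of the idea only: epsilon-greedy allocation with decayed evidence.
interface Arm {
  trials: number;
  successes: number;
}

const DECAY = 0.999;   // discount applied to accumulated evidence per visit
const EPSILON = 0.1;   // fraction of traffic still spent exploring

function conversionRate(a: Arm): number {
  return a.trials > 0 ? a.successes / a.trials : 0;
}

function pickArm(arms: Arm[]): number {
  if (Math.random() < EPSILON) {
    return Math.floor(Math.random() * arms.length); // explore
  }
  let best = 0; // exploit: highest observed conversion rate so far
  for (let i = 1; i < arms.length; i++) {
    if (conversionRate(arms[i]) > conversionRate(arms[best])) best = i;
  }
  return best;
}

function recordVisit(arms: Arm[], arm: number, converted: boolean): void {
  for (const a of arms) {
    a.trials *= DECAY;      // decay old evidence for every arm
    a.successes *= DECAY;
  }
  arms[arm].trials += 1;
  if (converted) arms[arm].successes += 1;
}
```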

From the perspective of someone searching for your site, however, the results
might be nearly identical: even if you have massive changes to the phrasing
(having versions written in various regional dialects of English, for
example), the resulting page will still largely be identical, possibly
sentence-for-sentence (though not word-for-word). In this case, you will still
want to get one of those variations indexed, and it might not matter which
one.

As described, of course, this is both "cloaking" and not: the site is not
treating GoogleBot differently than any other visitor on purpose; it simply
ends up doing so because GoogleBot behaves differently on the site than a
normal user does, so the algorithm will learn and optimize (possibly somewhat
randomly or uselessly) from this and end up treating that user differently
than one coming from Internet Explorer (which may very well represent a
different demographic of user).

It sadly does not seem like this advanced usage of testing is allowed by
Google.

~~~
Drbble
If your site looks different to Google than to other users, even for a
legitimate reason, why should Google index your site based on what it saw? And
it can't index your site based on what it didn't see, so what's left? Either
make your site look the same to Google and the public, or get your link juice
in a way that doesn't depend on site content.

------
saurik
It's really despicable that Google has decided there is a single way that the
Internet must be used, and that if you don't abide by their rules they can use
their near-monopoly on both search (discovery) and advertising (revenue) to
entirely exclude you from it. We are now living in a world where Google has
managed to replace the URL bar for 99% of users with a search box (whether or
not the URL bar actually is a search box, which it now actually is in many
browsers).

Hell, I myself am guilty of not bothering to type in URLs anymore (yes, just
like those poor people who used to search "Facebook login" and one day got
highly confused when an article about logging in to Facebook, one that
happened to use Facebook Connect for its comments, ranked higher): I just search for
things like "cdnetworks portal" and "1and1 client login"; I even, and I shit
you not, do a Google search using the Chrome OmniBar for "google images", as I
don't remember what the URL for that part of Google is.

[edit: As adgar correctly points out below, the following paragraph is a
misunderstanding of the mechanism that was used by this website to implement
this split-test algorithm. However, it seems fairly obvious to me that the
mechanism used to implement the split test does not matter. Meanwhile, the
reason I went in this direction is due to Matt Cutts specifically stating the
trade-off in the below paragraph on the Google webmaster video series; it is
not because I assumed something from this article's conclusions.]

Seriously... there is /nothing wrong/ with a site choosing to A/B test people
by returning different results from the server, and yet Google insists that
doing so is somehow harmful and that it would be better to wait entire extra
round-trips to do client-side testing using JavaScript, a process that in the
end is not only a worse experience for the user and a more complex and less
secure mechanism for the developer but has /the exact same fundamental
behavior that Google claims is evil/.
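
To be concrete about what "returning different results from the server" might
mean, here is a minimal sketch (made-up cookie and variant names; not a
pattern Google has blessed): the server picks a variant once per visitor, pins
it with a cookie, and keeps serving that visitor the same variant on every
request, crawler or not.

```typescript
// Minimal sketch (made-up names): server-side assignment pinned by a cookie,
// so any individual visitor keeps seeing the same page on repeat visits.
import { createServer } from "http";

const PAGES: Record<string, string> = {
  a: "<h1>Original headline</h1>",
  b: "<h1>Alternative headline</h1>",
};

createServer((req, res) => {
  const cookie = req.headers.cookie ?? "";
  let variant = /ab_variant=(a|b)/.exec(cookie)?.[1];
  if (!variant) {
    variant = Math.random() < 0.5 ? "a" : "b";
    res.setHeader("Set-Cookie", `ab_variant=${variant}; Path=/; Max-Age=2592000`);
  }
  res.writeHead(200, { "Content-Type": "text/html" });
  res.end(PAGES[variant]);
}).listen(8080);
```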

It is /exceptionally/ irritating: their rules may have some (highly
arguable) philosophical purity, but in the majority of cases they lead to a /worse
result/. For example, it would be /much more correct/ for sites like Hacker
News to mark that the comment/title at the top of each page "is what search
engines are allowed to see and index" and that the rest of the comments below
it are "ancillary content that absolutely must not be indexed". Otherwise,
when you search for a comment using Google, you find every single point along
the tree that connects from that comment up to the root of the page, as the
comment is present on all of them.
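
The closest existing mechanism I know of along these lines is the
robots-nocontent class attribute, which Yahoo's crawler honors but which, to
my knowledge, Google's web search does not; a sketch of how a thread page
might use it:

```html
<!-- Sketch only: Yahoo's crawler honors class="robots-nocontent" to mark
     sections that should not be used for indexing. -->
<div class="comment">
  This is the comment the page is actually about; index this.
</div>
<div class="robots-nocontent">
  <!-- Ancestor/child comments repeated for navigation; don't index these. -->
  ...
</div>
```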

I found myself thinking about these issues a lot recently while working on
writing some custom forum software, and even went and skimmed through every
single Google webmaster video that Matt Cutts put out, and the end result
simply made me angry: I was finding myself purposely designing worse things so
that they could be "indexed better", and when I'd look at what I was doing and
go "this is nuts: is Google really that important?" I'd have to sigh and sadly
remind myself "yes, it probably is".

~~~
adgar
> Seriously... there is /nothing wrong/ with a site choosing to A/B test
> people by returning different results from the server, and yet Google
> insists that doing so is somehow harmful and that it would be better to wait
> entire extra round-trips to do client-side testing using JavaScript, a
> process that in the end is not only a worse experience for the user and a
> more complex and less secure mechanism for the developer but has /the exact
> same fundamental behavior that Google claims is evil/.

Halfway through your rant, you clearly demonstrate that you did not bother to
adequately understand the linked article before ranting about Google, throwing
words around like "evil" and "despicable" without even knowing what you are
ranting about, like so very many well-intentioned but woefully ignorant
commenters in today's technorati.

FTA, emphasis mine:

> [The split-page approach used] is sort of a hybrid of Client-side and
> Server-side variations. Here’s how it works. Let’s say you want to test your
> landing page at yourdomain.com/landinag-page. You would create additional
> pages _and using javascript, the visitor would be redirected_ to one of the
> pages.

You seem to have completely misunderstood what the author was saying.
Naturally, since you are both misinformed _and_ critical of Google, that makes
you the most highly upvoted comment on today's Google Hacker News thread!

~~~
saurik
Can you please explain to me how this oversight on my behalf makes the
situation better? Does the fact that Google is also disallowing client-side
A/B testing of this fashion cause it all to make sense? Seriously: you can
tell me I misunderstood the article (which I will happily admit I skimmed: I
read the various paragraphs as "different common mechanisms for implementing
split tests", one per paragraph), but can you honestly tell me that this
misunderstanding would have changed anything about my rant, excepting possibly
making Google come off even worse?

<http://www.youtube.com/watch?v=xepQgGcit2A#t=3m23>

^ This is the reason why I went in that direction, by the way. This is a video
from Matt Cutts, one of the many that I sat through watching during a massive
five-hour MattCutts-and-GoogleWebmasterHelp-a-thon that I forced myself
through a week or two ago. I am hotlinking to 3:23 into the video as this
video answers multiple questions (as is common on the MattCutts channel, as
opposed to on GoogleWebmasterHelp, where each question tends to get its own
video).

In this question's answer, it is clearly stated that Google may very well
consider a page that returns different content on different loads for purposes
of A/B testing to be "cloaking", and that webmasters should instead use client-side
mechanisms to perform these tests. If tests /are/ done, it is claimed that the
webmaster should only do so on areas of the site that are not being indexed
(which may involve explicitly telling Google to stop indexing that part of
your site).

~~~
fpgeek
Your rant doesn't make sense because your starting point "Google is evil and
despicable for making rules for the Internet" is incoherent.

Both Google's action and Google's inaction "make rules for the Internet". In
an alternate universe where Google didn't implement this penalty, attacks
using this kind of client-side redirection could easily be common and a
serious problem. In that universe, alternate-saurik could come to Hacker News
and complain that "Google is evil and despicable for not clamping down on
these client-side hacks that are almost always used by attackers, not
legitimate developers."

Google has to make choices. Those choices are going to feel like "rules for
the Internet" for many people they influence. That's unavoidable. If you're
going to criticize Google for this sort of issue, you need to focus on the
details of those choices. In this case, your lack of attention to detail makes
it clear you're not making a credible case.

~~~
saurik
In my rant I provided a specific example of how Google's rules regarding
content cloaking make it difficult to search or build sites like Hacker News
(and will now further point out that Yahoo provides that feature); I have also
provided evidence that Google recommends against doing A/B testing using some
mechanisms as it may be considered cloaking, and that webmasters should first
deindex their content from Google.

I can even, if you demand, find other videos from Google (also from Matt
Cutts) that show that the heavily-hedged suggestion in this video to serve
different content from the server (even small snippets of text, such as titles
or breadcrumbs) that is not based on IP address (and thereby might not be
stable for GoogleBot) is also highly dangerous and can lead to Google
believing that you are cloaking.

I therefore take issue with the suggestion that, because I misunderstood this
specific article, all of the other research I've done on this matter, and even
the arguments I make in my rant (where the specific paragraph currently under
contention is backed up by evidence outside of this single article, which is
from someone none of us has ever heard of and who, for all we know, could be
lying about what Google did anyway), are now void.

