

Demystifying SEO with experiments - jwegan
http://engineering.pinterest.com/post/109318939139/demystifying-seo-with-experiments

======
aresant
"Figure 4 shows the result of an experiment we ran to test rendering with
JavaScript for better web performance. . . [The test failed and] even after we
turned off the experiment, it took almost a month for pages in the enabled
group to recover..."

This is the incredibly frustrating part of White Hat SEO w/Google because:

(a) At nearly every SEO conference, by Google's own actions in releasing Page
Speed Insights, and by Matt Cutts engaging and talking about page speed being
an important indicator, it seems like speed is pretty !@$!@% important to
Google and worth pouring resources into. (1) (2)

(b) As a result, a well-equipped and well-connected organization like Pinterest
launches tests designed to improve this important signal. I'm going to assume
they're organizationally smart enough not to damage or ignore other important
Google ranking signals, like usability, time on site, etc., that you have to
balance with the JS page speed test. (3)

(c) Google penalizes them.

WTF!

My frustration as a customer acquisition guy - encompassing CRO / SEM / SEO /
etc - is that I try to discuss and push best practices for my own projects,
for clients, and for public facing blogs / presentations / etc.

I get that they don't want people gaming / pushing - but when they push out a
"best practices" methodology like page speed, and then execute a penalty as
described by Pinterest, I just want to throw my hands up.

(1)
[https://developers.google.com/speed/pagespeed/insights/](https://developers.google.com/speed/pagespeed/insights/)

(2) [http://www.webpronews.com/today-on-the-matt-cutts-show-page-...](http://www.webpronews.com/today-on-the-matt-cutts-show-page-speed-as-a-ranking-factor-2013-08)

(3) I'm going to add the disclaimer that I haven't seen the JavaScript
Pinterest used, and perhaps they're not properly weighting / aware of other
important SEO signals that GOOG penalizes when using JavaScript, but I'm sure
they are. Happy to answer more on that directly via my profile or this thread.

~~~
mhoad
While I too couldn't comment on the actual JavaScript they put in place for
the test and how it might have impacted other factors, I just wanted to make a
quick comment on the idea of a page speed penalty.

1. I have never in my life seen evidence of this happening before (I'm an
enterprise consultant in this field). I am positive in my mind that it was not
a case of "page is faster = drop in rankings".

I would be willing to bet that it had a lot more to do with the "rendering
content in JS" which has traditionally been a huge issue for search engines
(despite what they claim).

But every single time I have seen someone claim this, or something like it,
there have been many other factors at play.

2. It is highly likely that the amount of impact any given variable (in this
case page speed) has is not the same across the board. In fact, it is much
more aligned with the kind of industry you are in and the keywords relating to
that search.

As an example, page speed is MUCH more likely to be a bigger factor for an
e-commerce website and keywords showing any kind of transactional intent than
it would be for someone looking for detailed information on a medical
condition.*

*This isn't a confirmed fact as far as I know, but it seems to be a relatively well-established theory in some more advanced SEO circles.

~~~
jnem
The main issue with JS and SEO is that most JS Frameworks we are talking about
are used to create SPAs (Single Page Applications). Usability-wise, these are
great; everybody hates page loads. SEO-wise, there is a problem because the
search engines can’t see all the HTML because everything is being rendered
client side.

Basically, with a traditional web page written in PHP, the HTML is constructed
from a template. The template spits out HTML from the server to the client.
Every time you click a new link to load a new page, it’s delivered to your
client via a GET request (EDIT: Your client/browser "asks" for the file from
the server, the server sends the requested document back to the client).

With a SPA, JS is doing some DOM manipulation, and you’re not making round-
trips to the server to display new content. For example, if you were to look
at the source of a SPA written in Angular, you might see some <div ng-view>
</div> elements, but almost no actual HTML. The web crawlers would see
something similar.
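
To make that concrete, here's a minimal Python sketch (the URL is hypothetical) of what a crawler that doesn't execute JavaScript actually receives from a SPA: the raw HTML comes back with the view container still empty, because the content only appears after the framework runs in a browser.

    import requests                      # assumes the requests library is installed
    from html.parser import HTMLParser

    class TextExtractor(HTMLParser):
        """Collect text outside <script>/<style>, i.e. what a non-JS crawler can index."""
        def __init__(self):
            super().__init__()
            self.chunks = []
            self.skip_depth = 0

        def handle_starttag(self, tag, attrs):
            if tag in ("script", "style"):
                self.skip_depth += 1

        def handle_endtag(self, tag):
            if tag in ("script", "style") and self.skip_depth:
                self.skip_depth -= 1

        def handle_data(self, data):
            if not self.skip_depth and data.strip():
                self.chunks.append(data.strip())

    # Hypothetical SPA page; swap in a real URL to try it.
    raw_html = requests.get("https://example.com/some-angular-page").text

    parser = TextExtractor()
    parser.feed(raw_html)
    visible_text = " ".join(parser.chunks)

    # For a client-rendered SPA this is typically close to empty: the
    # <div ng-view></div> is only filled in after the JS framework executes.
    print(len(raw_html), "bytes of HTML,", len(visible_text), "chars of visible text")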

There are several tactics for circumventing this issue, and I'm curious
whether the Pinterest team considered them during this experiment. Anybody on
the team here?

~~~
dangayle
Angular.js, courtesy of Google.

------
willu
Unfortunately this only reinforces the mystical nature of SEO.

1. Why does Webmaster Tools tell you duplicate titles are a problem but changing them has no impact?

2. Why does repeating pin descriptions improve traffic drastically when we're told not to duplicate content?

3. Why do some changes have a lingering impact while others revert back to pre-change behavior?

That said, I applaud the scientific approach to coping with the black box.

~~~
motherwell
What makes you think that the solution they implemented was a good one, and
should have worked?

~~~
dangayle
I agree. I took one look at their test and thought: that's missing the point.
They don't actually care whether the title tags hash differently; they care
whether the title tags describe the content in such a way that the pages can
be distinguished by a human or an NLP bot.

But then again, what more could they do? Adding the board's owner would likely
have had much higher statistical relevance.

------
voyweb
I think they could have had more fun experimenting with image indexing. Say
you Google a nice location, for example: Edinburgh, Scotland. On that SERP
there are two locations for images: the 5th result position, and the knowledge
box to the right, which has a map and an image.

The first image in the 5th position image area is Wikipedia (hard to beat
that), but the last three are local blogs and Flickr (easier to beat). The
very last image is the same image used in the knowledge box which sits nicely
in eye line with the 1st position SERP link.

After a quick bit of detective work I've found that in Google Images,
Pinterest links back to its own pages but is not the image source.

Type into Images search: site:pinterest.com intitle:Edinburgh, Scotland

Back to the Edinburgh, Scotland SERP: looking at those images in the 5th
position, we can see that all of them are both the page and the source.

The Flickr image that's third in the 5th position image area gives grounds to
warrant even a small experiment to test whether the theory is correct. The
theory being that if Pinterest were both the page and the source, they could
see a benefit from it reflected in their organic search traffic.

What Pinterest lacks is content, as they stated in the post. What they don't
lack is images and titles.

------
nlh
I'm presuming the negative results they saw from "rendering with JavaScript"
meant, specifically, that they moved certain page rendering tasks from the
server side to the client side. (It wasn't explicitly stated that this was the
case, but it was implied.)

If so, that's a big reinforcement of the importance of server-side rendering
for SEO purposes or, for you JavaScript fans, isomorphic applications.

I know this is talked about a lot anecdotally, but it's interesting to see it
so starkly laid out in an experiment by a major site.

~~~
dmnd
I'd love to hear about that experiment in more detail. People often cite
Google's Understanding web pages better[1] as evidence that it's now OK to
render everything with JS, but this is the first time I've seen someone
publish actual evidence.

[1]: [http://googlewebmastercentral.blogspot.com/2014/05/understan...](http://googlewebmastercentral.blogspot.com/2014/05/understanding-web-pages-better.html)

------
somberi
When I spoke with some senior Google search guys, this is what I walked away
with:

Be a good player. Provide good content that your users care about. Positively
add to the Web. Google will find you.

~~~
IndianAstronaut
Not all sites are about content. A friend of mine recently struggled with
this. He is able to get top-quality swords and has a cool selling platform.
The problem is that this does him no good in search rankings.

~~~
schoen
That's an interesting point conceptually: the best result for some purposes
might be best for a reason that comes from the offline world. For example, if
you want to buy rare things, you want to find the dealer with the best
expertise and access to sources of those things. That dealer might have an
incredibly minimal web site with almost no content -- maybe just contact
information.

The original link-structure analysis idea in PageRank was meant to address
issues like this a little bit: if everybody links to that dealer's page, it's
a good suggestion that that dealer is important, regardless of the content of
the page. But there are also things that people don't talk about on the web
that much, or don't link to on the web that much (especially if they relate to
a secretive, insular, or otherwise not-heavily-web-using community).

You could say it's no fair expecting search engines to know about social facts
they can't possibly observe, but in any case it's a reminder of how
complicated the idea of relevance or the best result really is!

------
lurchpop
On the duplicate title test, I wonder if they saw no difference because they
put the unique element after the pipe (e.g. "... on Pinterest | {pins}").

Maybe Google ignores whatever comes after the pipe, because that's where
people always put branding: {title} | {meaningless company name}.

------
sixQuarks
This doesn't demystify SEO. There are just so many factors involved in SEO,
and unknown factors. Something that works today may not work tomorrow. The
only true guideline to go by is to create great content for humans, period.

~~~
dangayle
I agree with you about the great content for humans bit, but that doesn't mean
you shouldn't still aim to maximize your current traffic through technical
means.

If it gives you a whitehat traffic increase of even a few percentage points,
that can still be a big deal: hundreds of thousands, if not millions, of
dollars.

~~~
sixQuarks
Here's the crazy thing about SEO - specifically SEO in relation to Google.
This is a gut feeling based on over 10 years of building sites primarily with
organic traffic:

I don't think you should try to do "best practices" with SEO. Over the past
couple of years, I feel like Google has been penalizing sites that try to dot
all the i's and cross all the t's. And why shouldn't they? White hat SEO is
still gaming the system in a way. In Google's eyes, the pages that contain the
very best content for humans should show up higher in search, despite not
being optimized for SEO.

I've been seeing more success with pages that are not optimized, pages where I
didn't pay any attention to SEO. The content on those pages is geared toward
humans and contains great info, that's it. I don't pay attention to URLs,
title tags, meta tags, etc. Google is getting very good at filtering this out.
I'm not sure if they have a team of humans whitelisting sites now, but I've
given up on trying to optimize for SEO and it's worked wonders.

------
web007
Good to see A/B testing applied to SEO instead of just UX changes. The tl;dr
version:

A/B testing is:

    
    
      bucket(hash(experiment, user identifier))
    

A/B testing for SEO is:

    
    
      bucket(hash(experiment, url))
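
A minimal Python sketch of that idea (the experiment name, URL, function names and bucket count here are placeholders of mine, not from the post): hashing the experiment name together with the URL gives every page a deterministic, stable group assignment, so a crawler sees the same variant of a given page on every visit.

    import hashlib

    def bucket(experiment: str, key: str, num_buckets: int = 2) -> int:
        """Deterministically map (experiment, key) to a bucket in [0, num_buckets)."""
        digest = hashlib.md5(f"{experiment}:{key}".encode("utf-8")).hexdigest()
        return int(digest, 16) % num_buckets

    # For UX experiments the key is a user identifier; for SEO experiments it's
    # the URL, so the same page stays in the same group across every crawl.
    url = "https://www.pinterest.com/someuser/someboard/"
    group = "enabled" if bucket("unique-titles", url) == 1 else "control"
    print(group)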

~~~
kanzure
> bucket(hash(experiment, url))

Unfortunately it often seems to be per domain, and not per url. And then you
have to factor in backlinks...

------
weavie
I don't think I have ever come across Pinterest by searching. Am I just
searching for the wrong things? I thought Pinterest was largely a glorified
bookmarking service - what original content is there that the search engines
could pick up?

~~~
JonLim
Personal guess, as I never have either, but I'd hazard that they show up for a
lot of recipe, fashion, and interior decoration queries.

------
Kiro
Are they comparing two URLs on the same domain? Is that really worthwhile? How
much is it about the domain and how much about the single URL on the domain?

If I link to a specific URL, do I give PR to that URL or to the domain itself?

~~~
greyman
To that URL. PR is always url-specific.

------
codezero
Is there a chance that a site as large as Pinterest might have their search
rankings dominated by some hand picked value rather than the many other
factors that might affect a typical site?

~~~
lugg
The time to index is certainly dependent on their size, but I don't see why
this sort of A/B test on SEO wouldn't work on other sites.

The sample size of pages they have to play with, and the fact that they get
indexed within a couple of days does make it all the more viable though.

------
wyck
This is pretty meaningless in context: when you visit Pinterest, the site is
login-gated. Sure, that might look good on paper, but it's a short-term
strategy.

~~~
dchuk
The homepage might be, but they have 184,000,000 pages indexed in Google:
[https://www.google.com/search?q=site%3Apinterest.com&oq=site...](https://www.google.com/search?q=site%3Apinterest.com&oq=site%3Apinterest.com&aqs=chrome..69i57j69i58.2341j0j7&sourceid=chrome&es_sm=119&ie=UTF-8)

------
butler14
Whoever wrote this doesn't understand what a strategy or a tactic is. It makes
reading on difficult.

------
compbio
Pinterest likely cloaks traffic. Internal site traffic to a pinboard will
require a log-in/register to continue viewing the board. Traffic from the
Google index is allowed to continue viewing the board. That is treating search
engines differently than human users (unless they throw up this log-in wall
for crawling Googlebots too, which would severely hamper crawl-ability of the
site).

You do not change the page titles of a site so you can get a few more visitors
from Google's algorithm; you change the page titles of a site because they are
ambiguous for all your users. If you want to create more unique page titles,
you can add the username that created the board to the page title, instead of
a meaningless and ever-changing "number of pins on this board". For example:
"Mickey Mouse on Pinterest by John Doe" or "Mickey Mouse | John Doe |
Pinterest".

You run A/B tests to test if user engagement with the site increases. If you
run A/B tests to test if certain changes increase your search engine
rankings/Google visitors then you are reverse engineering Google. Especially
with a large site like Pinterest this may gain you some ill-gotten benefit
over sites that do play nice:

"If we discover a site running an experiment for an unnecessarily long time,
we may interpret this as an attempt to deceive search engines and take action
accordingly." [1]

Even on a site like Pinterest I see low-hanging on-page SEO stuff that could
be implemented better. For instance, the header for a pinboard starts at line
788. Proper content stacking/HTML code ordering ensures that information
retrieval bots do not have to wade through many menus of boilerplate text
before they get to the unique meat of the page.

There is basically one single way to do legit SEO and most of the tips and
techniques for that are transparently written in the Google Webmaster
Guidelines [3]. The good news is that this has not changed much at all over
the years, so one can stop algo chasing, and start improving the site for all
users and all search engines.

BTW: The blog has no canonical tag [2] and puts the _entire_ article inside
the contents of '<meta name="twitter:description"'.

[1] [http://googlewebmastercentral.blogspot.nl/2012/08/website-te...](http://googlewebmastercentral.blogspot.nl/2012/08/website-testing-google-search.html)

[2] [http://googlewebmastercentral.blogspot.nl/2009/02/specify-yo...](http://googlewebmastercentral.blogspot.nl/2009/02/specify-your-canonical.html)

[3]
[https://support.google.com/webmasters/answer/35769?hl=en](https://support.google.com/webmasters/answer/35769?hl=en)
"Following these guidelines will help Google find, index, and rank your site."

------
franze
first: i will not comment on the actual findings teased in this blog post,
because we're missing lots of information, data and context (javascript to
make rendering faster - was it really the first pageview that was faster, or
was this aimed at the second? client side rendering actually makes rendering
of the first pageview slower (please, prove me wrong))

second: this is the way SEO should be done - a systematic, analytics- and
dev-driven approach - and they solved one of the challenges big sites
regularly face SEO-wise: running multiple onpage tests (SEO is just one
aspect) simultaneously over chunks of their sites.

most of the time you are stuck with setting a custom variable (or virtual
tracker) in google analytics for the pages you changed (and a control group).
the issue with this approach is that GA only reports a sample of data (50,000
rows a day) and for big sites this sample becomes insignificant very fast,
especially if you run tests. additionally it's not easy to compare the traffic
figures of the tracked page-group with log-data like crawling, so you need a
custom built solution to connect these dots.

this leads us to a serious limitation of the GA and pinterest approach:
connecting their data with google serp impressions, average rankings and
clicks. yeah, traffic is the goal of SEO, but it is pretty late in the funnel;
crawling is pretty early in the funnel, and you can optimize everything in
between. for the in-between we are stuck with google webmaster tools for
reliable data (at least it's data directly from google and not some third
party). so to get the most out of such tests you must set them up in a way
that they are traceable via google webmaster tools.

and to make something traceable in google webmaster tools basically means you
have to slice and dice them via namespaces in the URL.

simple setup

    
    
       www.example.com/ -> verify in google webmaster tools
       www.example.com/a/ -> verify in google webmaster tools to get data only for this segment
       www.example.com/b/ -> verify in google webmaster tools, ...
       ...
    

make tests on /a/ -> if it performs better than the rest of the site, good

the issue there is that to have a control group you basically need to move a
comparable chunk of the site to a new namespace, i.e. /z/, and site redirects
are their own hassle, but on big sites they are most of the time worth it.
also you don't have to move millions of pages; most of the time a sample on
the scale of 50,000 pages is enough (p.s.: every (test) segment should of
course have its own sitemap.xml to get communicated / indexed data - see the
sketch below)
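
for illustration, a rough python sketch (the urls and filename are placeholders, not from the post) that writes a sitemap.xml for one test segment in the standard sitemap protocol format:

    # write a sitemap.xml for the /a/ test segment (urls below are placeholders)
    from xml.sax.saxutils import escape

    segment_urls = [
        "http://www.example.com/a/page-1/",
        "http://www.example.com/a/page-2/",
        # ... the sitemap protocol allows up to 50,000 URLs per file
    ]

    entries = "\n".join(
        f"  <url><loc>{escape(u)}</loc></url>" for u in segment_urls
    )

    sitemap = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )

    with open("sitemap-segment-a.xml", "w", encoding="utf-8") as f:
        f.write(sitemap)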

one more thing: doing positive-result tests is actually quite hard - doing
negative-result tests is much easier. make a test group of pages slow, see how
your traffic plummets. make your titles duplicate, see your traffic plummet,
... yeah, these tests suck business-wise, but from an SEO and development
point of view they are a lot of fun.

shameless plug: hey pinterest, check out my contacts on my profile. the goal
of my company is to make all SEO agencies - including my own - redundant. we
should do stuff.

------
butler14
This reads like a poor attempt by a software engineer dabbling in SEO.
'Growth' team indeed.

Hire an SEO - or at least a digital marketer with SEO credentials - and do
some proper optimisation.

