How The Guardian successfully moved its domain to theguardian.com (theguardian.com)
150 points by malditojavi on Feb 18, 2014 | 53 comments

The only clever thing they did was this:

> the Identity team started laying cookies on www.theguardian.com in advance. This was a nice touch because it meant that visitors would still be logged into the site when we eventually changed domain.

Everything else? Yeah uhm, not very interesting. As they wrote themselves, there's a thing called 301 - permanently moved.

No, there was one other interesting point.

Management wanted a "big splash" public rollout. Development teams wanted to avoid a "big bang" development effort. They solved this by going live many months ahead of time but ONLY for clients using special headers. This allowed anyone to test the system while still not making it "public" until the day of the big reveal.

I had not previously heard of that particular technique (using special HTTP headers) and it's a useful one.

> I had not previously heard of that particular technique (using special HTTP headers) and it's a useful one.

I am pretty sure by "special HTTP headers", they mean cookies.

I wouldn't be surprised if they were just manually adding a header.

I've seen it done like that before, using a browser extension to add a header and then mod_rewrite to apply a special set of rules if that exists.
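For anyone trying to picture the header approach, here is a rough sketch of the server-side decision, written as a plain shell function rather than real mod_rewrite rules. The header value `1` and the variant names `preview` and `public` are made up for illustration; nothing here is taken from The Guardian's actual setup.

```shell
#!/bin/sh
# Hypothetical sketch: decide which site variant serves a request, based on an
# opt-in request header. The header value "1" and the variant names "preview"
# and "public" are invented for illustration.

# choose_variant PREVIEW_HEADER_VALUE
# Prints "preview" when the opt-in header is present with the value 1,
# otherwise "public" (what every normal visitor sees).
choose_variant() {
    if [ "${1:-}" = "1" ]; then
        echo "preview"
    else
        echo "public"
    fi
}

choose_variant ""    # ordinary visitor, no header sent
choose_variant "1"   # tester whose browser extension adds the header
```

A tester without a browser extension could send such a header by hand with `curl -H`, assuming the server looks for it.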

The article specifically mentions how this special HTTP header was implemented client side using a browser extension and how it's handled server side

I doubt it. I think they've actually just made up an HTTP header.

(Cookies wouldn't work anyway... how would you place them? What about expiration? How would they interact with existing cookies? What if you have to clear your cookies while debugging? Whereas a custom header requires a browser plugin, but otherwise is innocuous.)

More likely a useragent change

I think you are slightly underplaying the challenges of making one of the biggest domain changes ever.

Yes, the technical details are not too complex. But the risk is massive and the legwork still considerable.

If you work for a very large website and want to change domain you will have been following The Guardian's move closely.

I read that and understood how it would be beneficial, but not how it would be possible. Say I own x.com, and y.com and even have them both being served from the same box. If a user requests a page for x.com, how can I get them to accept a cookie for y.com?

On the old domain, make a request to the new domain with query parameters that have the information necessary to login as that user. You can do this using e.g. a hidden image, an iframe, or using javascript. The request on the new domain saves that login information in a cookie.

Of course. I should have thought of that. Thank you.

You could do it through redirects from x.com to y.com/login, which sets cookies on y.com's domain and then redirects back to x.com. Either do it only on login, or set another cookie on x.com once you've done it. (I actually work on a web property that does something similar, although for different reasons.)

I was thinking the same thing. As I understand it, it's only possible for subdomains.

I suppose one way of doing it is with a script included from an already-live theguardian.com, so the cookie is set by the domain the JS is served from.

The other answers to your question are technically all feasible, but are all extremely unlikely to be how this was actually done.

This is just what third party cookies are. Cookies set from the server-side (ie, the Set-Cookie header) can set whatever domain they want -- and if that cookie happens to not match the domain of the page you're on, that's what's called a third party cookie.

Some browsers (primarily Safari IIRC), however, will automatically reject those cookies, either in all instances or depending on if you've interacted with that domain before.

Thank you, and I appreciate learning. But I think I'm reading something else on Wikipedia. The article there says you can only set cookies on "the top domain and its subdomains" [1] and that third party cookies are those set by page assets (like images within a page) that are served from a different domain. [2]

What do you think?

[1] http://en.wikipedia.org/wiki/HTTP_cookie#Domain_and_Path

[2] http://en.wikipedia.org/wiki/HTTP_cookie#Third-party_cookie

Ah, yes. I should have been clearer in my comment. Obviously, the request needs to be served from the domain or subdomain in question, but that domain doesn't need to match the URL of the page you're on. It can be a simple image request; it doesn't need to be an iframe, and it doesn't need to involve complex JavaScript.

My point was that it doesn't need to require much complexity at all; just an HTTP request served by the domain in question that passes along the cookies that need to be served from the new domain.

You are correct. Parent is mistaken.

My point was that the domain of the top URL you're on isn't relevant. Should have been clearer that the domain of the request itself needs to match.
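To make the rule this sub-thread converged on concrete, here is a simplified sketch of cookie domain matching: the host that serves the response has to equal the cookie's Domain or be a subdomain of it, regardless of which page embedded the request. The function name is made up, and the check ignores things real browsers also enforce (public-suffix rules, IP-address hosts, case-insensitivity).

```shell
#!/bin/sh
# Simplified sketch of cookie domain matching: may a response served by
# REQUEST_HOST set a cookie with Domain=COOKIE_DOMAIN?
# Ignores public-suffix rules and IP-address hosts, which real browsers check.

# domain_matches REQUEST_HOST COOKIE_DOMAIN
# Succeeds (exit 0) if the host equals the domain or is a subdomain of it.
domain_matches() {
    host=$1
    domain=${2#.}                  # a leading dot in Domain= is ignored
    case "$host" in
        "$domain")   return 0 ;;   # exact match
        *".$domain") return 0 ;;   # host is a subdomain of the cookie domain
        *)           return 1 ;;
    esac
}

# A response from the old domain cannot set a cookie for the new one...
domain_matches www.guardian.co.uk theguardian.com  && echo allowed || echo rejected
# ...but a request served by the new domain (image, iframe, script) can.
domain_matches www.theguardian.com theguardian.com && echo allowed || echo rejected
```

This is why the hidden-image and redirect approaches work: the request itself goes to the new domain, so the new domain gets to set its own cookie.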

You could place an invisible iframe on the page. Ugly, but possible.

you sound like the comic book guy.

back here in reality, moving sites that generate many millions of dollars is always a big deal, and when it goes correctly, acknowledgement is due.

> We attempted to speak with all our major referrers including search engines and social media.

Reworded: "Google don't have a phone".

They do for large organisations like The Guardian.

I wonder how large you have to be before Google will answer.

Spend > $250k/month on adwords and you can be friends with Google.

I'm surprised they went live with this without an expiration date on their permanent redirects. Now there's no way back, even if anything breaks. Looks like an unintentional Big Bang launch to me.

http://getluky.net/2010/12/14/301-redirects-cannot-be-undon/

http://mark.koli.ch/set-cache-control-and-expires-headers-on...
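A rough sketch of what a 301 with an expiry looks like on the wire, since browsers may otherwise cache a permanent redirect for a very long time. The function name and the one-day default are arbitrary choices for illustration; the target URL is just an example.

```shell
#!/bin/sh
# Sketch: print the status line and headers of a permanent redirect that
# caches may only reuse for a limited time.
# The one-day default max-age is an arbitrary illustrative choice.

# redirect_headers TARGET_URL [MAX_AGE_SECONDS]
redirect_headers() {
    target=$1
    max_age=${2:-86400}
    printf 'HTTP/1.1 301 Moved Permanently\r\n'
    printf 'Location: %s\r\n' "$target"
    printf 'Cache-Control: max-age=%s\r\n' "$max_age"
    printf '\r\n'
}

# Example: a redirect that clients should re-check after an hour
redirect_headers "http://www.theguardian.com/us" 3600
```

Starting with a short max-age and raising it once the move has settled keeps a way back open if something breaks.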

The Graun's address will forever be www.grauniad.co.uk to me.

(note to confused and/or non-UK people: look up the magazine Private Eye)

And even if you are familiar with Private Eye you might not get the joke that the Guardian was in the past notorious for tpyos and misprinst.

I had always heard an interesting story about that — since the newspaper was printed in Manchester, it was the first press runs that had to be sent to far-off London. Then the later runs (with misspellings often fixed) were sent to the closer cities.

On the other hand, newspapers based in London would send their typo-ridden newspapers to far-off locales first, and the corrected editions would stay in London.

Since the tastemakers were in London this resulted in a situation where the newspaper becomes notorious for being ridden with errors.

No idea if there's any truth to it; the Wikipedia page presents a different story that sounds like problems with the collaboration tools (e.g. TTY) used between the two cities.

The Guardian actually owns thegrauniad.com -- somebody needs to set up the redirect

Not a very helpful article. I was hoping they'd share how they managed the SEO portion in a way that would prevent a drop in rankings. They glossed over almost every point.

They need to talk to Sean Parker. He'll convince them to drop "the"

Ha, the issue there is really about domain availability and cost.

In previous years, say 5+ years ago, this was a scary concept to anyone working on sites and still believing in nonexistent SEO voodoo. But it has become commonplace and more than simple to 301 a site from one domain to another, updating the usual suspects like Google to make sure it all goes smoothly. So nothing really that super here. Just nice to hear about the process behind the scenes and that everything was taken into account, as it should be.

I am not sure how many domain migrations you have done in the past but based on this comment my guess would be few if any. There is a bit more to it than just slapping on a couple of 301's and hoping for the best.

I have done many. Obviously there's a lot of legwork with links and content, but it's not some technically groundbreaking thing or mass mystery like it used to be. Maybe I've just gotten used to it.

Random related trivia: `gu.com` also redirects to `theguardian.com` (useful on mobile, way faster to type).

>Our goal was simple: “to serve all desktop and mobile traffic on www.theguardian.com and no longer serve any content on www.guardian.co.uk, m.guardian.co.uk or www.guardiannews.com"


So is the consensus that .mobi was one of the worst ideas in existence?

Once upon a time it was thought that device TLDs would be a useful thing, that's all. It just so happened that the smartphone was invented in the interim, and media queries and responsiveness and, heck, HTML became the standard way of representing mobile content.

their first byte is not that fast :


also many requests on the page

Try comparing that against the new responsive version of the site: http://www.webpagetest.org/result/140218_42_RAG/

you have tested the mobile version

There's something ironic about how long it takes tests to run with that site.

Interesting that they contacted Yoast for SEO advice.

I was wondering about this too. The Guardian can't possibly run their site on Wordpress, or can they?

No, it's all internal, but that guy knows his stuff when it comes to SEO in general.

I've been doing large site SEO for almost a decade, working for brands like eBay, Disney and others, it's not that weird. It's just that that's not the thing people know me for ;)

>If the host was www.theguardian.com, we would rewrite all the URLs on the site to be www.theguardian.com. If the Host was www.guardian.co.uk we would rewrite all the URLs on the site to be www.guardian.co.uk.


They couldn't change all URLs to be relative, so instead they wrote a filter which would rewrite absolute URLs to match the selected hostname. A simple fix for a relatively complex problem.
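As a rough illustration of that kind of filter (not their actual code), here is a sketch that rewrites absolute links in an HTML stream to whichever hostname the request came in on. The two hostnames come from the quote above; the function name and the choice of `sed` are assumptions for the sake of a small example.

```shell
#!/bin/sh
# Sketch of a response filter: rewrite absolute links so they all point at
# whichever hostname the request came in on.
# The two known hostnames are taken from the quote above; the function name
# and the use of sed are illustrative choices.

# rewrite_links REQUEST_HOST   (reads HTML on stdin, writes HTML on stdout)
rewrite_links() {
    sed -E "s~http://www\.(theguardian\.com|guardian\.co\.uk)~http://$1~g"
}

# A page with an old-domain link, requested via the new hostname
printf '<a href="http://www.guardian.co.uk/world">World</a>\n' |
    rewrite_links www.theguardian.com
```

The same page can then be served correctly under either hostname, which is what lets both domains stay live during the transition.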

Or a hack which will never be removed from the code-base, depending on your point of view.

I'm intrigued as to why changing to relative URLs wasn't possible. If nothing else, pushing 'http://www.theguardian.com' out for every link adds up to a lot of bytes for a busy site.

    pushing 'http://www.theguardian.com' out for every link
    adds up to a lot of bytes for a busy site

Fewer than you'd think after gzip compression:

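    # Raw size, absolute URLs as served
    $ curl -s http://www.theguardian.com/us | wc -c
    # Raw size with the scheme and host stripped from every link
    $ curl -s http://www.theguardian.com/us | \
       sed 's~http://www\.theguardian\.com~~g' | wc -c
    # Gzipped size, absolute URLs
    $ curl -s http://www.theguardian.com/us | \
       gzip | wc -c
    # Gzipped size with the scheme and host stripped
    $ curl -s http://www.theguardian.com/us | \
       sed 's~http://www\.theguardian\.com~~g' | gzip | wc -c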

They have 7.7k of extra HTML due to repeating "http://www.theguardian.com" for every link, but once gzip-compressed the difference is only 229 bytes.
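
The same effect can be reproduced offline with synthetic HTML, which avoids depending on the live page. This is only a sketch: the link count of 300 and the made-up paths are assumptions, so the exact byte counts will differ from the figures above.

```shell
#!/bin/sh
# Offline sketch of the same measurement using synthetic HTML, so it runs
# without fetching the live page. The link count (300) and paths are made up.

# make_page PREFIX: print 300 links, each starting with PREFIX
make_page() {
    i=1
    while [ "$i" -le 300 ]; do
        printf '<a href="%s/section/%d">story %d</a>\n' "$1" "$i" "$i"
        i=$((i + 1))
    done
}

# Sizes with absolute URLs versus host-relative URLs, raw and gzipped
abs_raw=$(make_page "http://www.theguardian.com" | wc -c)
rel_raw=$(make_page "" | wc -c)
abs_gz=$(make_page "http://www.theguardian.com" | gzip -c | wc -c)
rel_gz=$(make_page "" | gzip -c | wc -c)

echo "raw overhead:     $((abs_raw - rel_raw)) bytes"
echo "gzipped overhead: $((abs_gz - rel_gz)) bytes"
```

The repeated prefix is exactly the kind of redundancy gzip's back-references absorb, so the gzipped overhead comes out far smaller than the raw one.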

Very nice writeup. What was the reason for switching?

They want to move from a local newspaper to a global news website.
