Website outages and blackouts the right way (plus.google.com)
329 points by amirmc on Jan 16, 2012 | 57 comments



Another, in my opinion, great example of the need for subdomains on HN submissions. I surely wasn't the only one who thought this title meant Google were endorsing blackouts?


Every time there's a submission from google.com someone complains that it looks like an official Google statement and we need subdomains to differentiate. I find this obnoxious, mostly because I have never found this to be any kind of problem but also because you people keep repeating this feature request for every link from google.com.

Probably 99.99% of google.com submissions are not linking to any official Google statements. And when submissions do happen to deal with official Google statements, they tend to be Google+ posts from someone who represents Google. So subdomains won't magically fix anything.

Why can't you just click through if it is ambiguous to you?


I think the point is that there is some ambiguity, and that it's a fairly easy to solve problem. Yeah, I understand it's not the end of the world or really likely to cause serious confusion, but I don't see any downside in showing the subdomain either.


Can't you just hover over the link and see the entire url on the bottom of your browser window?


In most cases you can, but why provide the easily accessible domain information at all if it's misleading in so many situations?


Not on my phone!


Good point, didn't think of that.


Agreed, I've seen a greasemonkey script that can do it, but honestly I can't see a reason why this isn't adopted as standard!

https://gist.github.com/1522657


Because pg has other priorities. All the whining in these threads about this is seriously irritating and derails the topic as it gets voted up past relevant things. Do it in its own post if you need to - which has already been tried several times. Just keep it out of on-topic discussion please.


I use this Greasemonkey script to show the whole URL, except the www.: https://github.com/jcs/better_hn_domains

(Note: I am not jcs)


This https://gist.github.com/1522657 has the slight advantage of working on comment pages as well.


I believe just an exception for plus.google.com needs to be added, as subdomains are shown for some URLs (for example, I see that the subdomains on tumblr are shown).


Better yet, fetch the page in the background, get the microformat info[1] which will contain link to author’s vcard[2] and show something like this instead:

    [Pierre Far]
Google+ is just a platform — who cares for “plus.google.com” if it’s not about Google+.

[1] http://www.google.com/webmasters/tools/richsnippets?url=http...

[2] http://www.google.com/webmasters/tools/richsnippets?url=http...


Would subdomains even help with posts from Google+? Google has been using +Pages to communicate lately, so even if it were an article from Google it would still likely have said plus.google.com.


Yes, because the URL hint would be "plus.google.com" rather than just "google.com." The latter could link to a Groups post, a Google+ post, or a page under the official Google site (google.com).

Each represents a rather different type of communication, which may or may not originate from Google HQ.


Public Service Announcement: Don't take URLs (shortened or otherwise) as political advice.


Yeah, G+ is eating up the Internet. I don't understand why people blog there.


But we sure can understand why a Google employee posts there.


Hello all

I'm the author of this post. There are some suggestions for Javascript-based alternatives. A well-implemented Javascript overlay for the blackout message is a valid option, but keep in mind the following when thinking about it:

1. Googlebot does run some types of Javascript. We blogged about this recently: http://googlewebmastercentral.blogspot.com/2011/11/get-post-...

This means that the content included via Javascript may be indexed until the next time we crawl and process the page and find it's gone.

2. Consider how the JS will affect your pages' instant previews. Overlays are likely to show up in the previews, and these will take some time to update as we re-generate the previews.

3. Consider your users. Some webmasters are suggesting keeping the blackout overlay visible without any means of hiding it. That may not be the best user experience, and may be annoying to your non-US based users.

I'm happy to answer any other questions.

Thanks, Pierre


You should be able to help googlebot index the right content by using different HTTP status codes in the response. Also, although googlebot executes JS, in my experience it doesn't use it for indexing content. My guess is that it's primarily used for the instant preview and for verifying that you're not cheating by hiding SEO keywords on load or whatnot. This could change any day, of course.


Who is this guy now? I keep seeing announcements posted on Google Plus by people speaking for Google, but they never bother to mention so much as their function or position within Google.

They ought to start off with "Hi I'm XYZ, lead PQR coordinator of Google ABC." or something.

At least this time it says "google" if you mouse over his photo; the last one didn't even do that.


You can go to his about page. https://plus.google.com/u/0/115984868678744352358/about "I work at Google as a Webmaster Trends Analyst. This is my personal profile.

What do I do at Google? I help webmasters build better websites, so you'll see me talking to webmasters on our and other forums, writing blog posts, saying things like "I'll ask internally" and the like."


"I work at Google as a Webmaster Trends Analyst. This is my personal profile."

Security-wise, this is a bad idea. For example, the email address of an actual company employee (at a company that offers email) might normally be name@corp.yahoo.com or name@corp.google.com, not name@yahoo.com etc.

With posts made in this manner, there is nothing to prevent someone from posting the wrong information.


Hi,

I'm the author of the post. Your concern is valid and it's one of the reasons the Google+ account is verified.

Also, there is plenty of evidence that it's a real Googler's account. For example, when I blog on our official Webmaster Central blog, I link to this Google+ profile; for example the most recent post: http://googlewebmastercentral.blogspot.com/2012/01/better-pa...

Hope this assures you a bit!

Thanks,

Pierre



"but they never bother to mention as much as their function or position within Google."

And even if they did how do you even know that info is true? Who verifies that?


If HN has "sticky posts", this thing should be on top at least until the 18th of Jan to keep people from hurting their website's SEO too much for the solidarity action.


Until I saw this post, I hadn't even considered the possible SEO implications.


What I took away from the article is that if it's done right, with a 503 error, SEO won't be damaged (long term at least) and everything will be back to normal within a couple of days.


This "technique" is highly recommended for your error pages also (50x). It also helps if you set the "Retry-After" HTTP header (value in seconds), that tells google (and hopefully other crawlers) that they shouldn't bother with crawling for another X seconds. Helps with the load if you are experiencing problems with your server.

Retry-After is also useful for "down for maintenance" pages since you usually know how long your page will be down.
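
As a rough illustration of the Retry-After idea above, here is a minimal Python sketch that works out the header value from a known "back online" time. The function names and the example times are assumptions made for illustration; they do not come from the article or from any particular framework.

```python
from datetime import datetime, timezone


def retry_after_seconds(now: datetime, back_at: datetime) -> int:
    """Whole seconds until the site is expected back, never negative."""
    return max(0, int((back_at - now).total_seconds()))


def maintenance_response(now: datetime, back_at: datetime):
    """Return (status_line, headers) for a temporary-outage response."""
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        # Retry-After given as a delay in seconds, as described above.
        ("Retry-After", str(retry_after_seconds(now, back_at))),
    ]
    return "503 Service Unavailable", headers


if __name__ == "__main__":
    # Hypothetical window: it is 20:00 UTC and the page returns at midnight.
    now = datetime(2012, 1, 18, 20, 0, tzinfo=timezone.utc)
    back_at = datetime(2012, 1, 19, 0, 0, tzinfo=timezone.utc)
    print(maintenance_response(now, back_at))
```

Clamping at zero keeps the header valid if the page is still down after the planned end time.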


This technique would not be needed if Google joined the blackout and didn't crawl for a day.


Even if Google joined the blackout, only the web frontend would be blacked out, not the backend crawling.


Can someone explain how to change a site to return a 503 HTTP header?


Put Redirect 503 in your Apache configuration file.


I wrote this up awhile ago for how to do this in pure httpd configuration:

http://journal.paul.querna.org/articles/2009/08/24/downtime-...


You can do it on a page-by-page basis. If your server supports PHP, add <?php header('HTTP/1.1 503 Service Unavailable'); ?> before any other output in the file that you want to 503.
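
For sites served by a Python application rather than PHP, the same page-by-page approach could look something like the sketch below. It is only an illustration under assumptions: `blackout`, `site`, `BLACKOUT_PATHS`, the one-hour Retry-After value and the message text are all hypothetical names and values, and the wrapper relies on nothing beyond the standard WSGI calling convention.

```python
BLACKOUT_PATHS = {"/", "/about"}  # hypothetical paths to black out


def blackout(app, paths=BLACKOUT_PATHS, retry_after=3600):
    """Wrap a WSGI app so the listed paths answer 503 instead of content."""
    def wrapped(environ, start_response):
        if environ.get("PATH_INFO", "/") in paths:
            body = b"<h1>Blacked out in protest</h1>"
            start_response("503 Service Unavailable", [
                ("Content-Type", "text/html; charset=utf-8"),
                ("Content-Length", str(len(body))),
                ("Retry-After", str(retry_after)),
            ])
            return [body]
        # Every other path is passed through to the real application.
        return app(environ, start_response)
    return wrapped


def site(environ, start_response):
    """Stand-in for the real application."""
    body = b"regular content"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Any WSGI server can host `blackout(site)`; the status line is set before any body bytes are sent, which is the same requirement as putting the PHP header call first.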


Which HTTP server are you using? Apache? nginx? Each one is different.


Wouldn't it be possible to detect if it's a web crawler visiting your site (e.g. user agent) and then let them crawl the site as normal? If it's not a web crawler, then display the "blackout" version of the site?


That's cloaking and can cause you to be de-indexed.


Interesting. This site claims that the intent behind the cloaking is taken into account: http://www.smart-it-consulting.com/article.htm?node=148&... ... Do you agree? Or have you had a different experience?


Yes, at least in the past, Google has taken intent into account when it comes to cloaking, letting some big sites get away with it and still rank well in SERPs.

However, I can see how your initial suggestion of showing full content to Google could be viewed as solely for preserving rankings in an artificial manner.

As an example of cloaking, some News Sites let Google index all their pages, while requiring actual users to login/register to view it.

Typically, if the user has a Google Referrer, they can view the page one time for free and then need to login/register to view anything else.

Visiting the page directly or with a non-google referrer shows a register/login page.

New York Times was one that does (did?) this. I stopped visiting them when they started. I think the Washington Post, or one of the Posts, was doing it too, as well as a number of other sites.

Experts Exchange used to basically be the same way, although I think they are doing it differently now, and they were slapped by Google a long time ago for cloaking, so changed to a different method of cloaking...


Am I alone in thinking that if Google won't join the blackout then the next best thing would be a Google search full of anti-SOPA messages?

Will a 503 make it into the listing? If so, be sure to put your message in the title.


Except that doesn't work. You lose your rank for your existing keywords if you replace your site with an anti-SOPA message, which is what this is about.

If you just add a banner or message to your site and keep the rest of your content intact then sure, it may show up.


Any day we're set on, Black Friday? Weekly, bi-weekly, first of every month?


AFAIK this is all for the 18th of January 2012... or Wednesday, as we mortals call it.


Not enough. I think there needs to be something preventative in place so this sort of thing can't raise its ugly head again.

If they want censorship, let's give them censorship.


You're free to blackout your own website for as long as you want. No one is going to stop you. Good luck with your regular traffic/visitors disappearing though.


A little bit of medicine can cure you, a lot can poison you.


While I'm no fan of SOPA, people who are outraged enough to black out their site but at the same time are concerned with SEO implications remind me of protesters who don't want their pictures taken. You want to show up and feel good but don't want anyone to know about it later.

Why not set the page titles and meta descriptions for all your pages so that "This site was blacked out in protest to SOPA" shows up in your search results for a while? That would honestly probably have more impact and Google will get around to indexing you again.


It's not the same as not wanting your picture taken at a demonstration, it's the same as not wearing a T-shirt for the next year that says "I was at this demonstration".

Doing a blackout for twelve hours harms your business, fucking up your SEO harms it again. Obviously in this, as in any protest, there is a decision to be made on what to do. Why not only blackout for one hour? Why not blackout for an entire month?


Because it doesn't work that way. Removing the keywords you rank for and replacing them with blackout pages will kill your rank for your target keywords.

If you want people searching for anti-SOPA messages to find your site and not your regular traffic then go for it.


Since GoogleBot doesn't run javascript, does that mean that an all Javascript solution would work without disrupting indexing?


Be warned, there's some evidence that GoogleBot now executes Javascript.

http://ipullrank.com/googlebot-is-chrome/



That is good to know. I guess this helps catch SEO cheats with javascript


Another alternative is to simply use a jQuery-like framework to create a modal dialog window and display the message. Hide all the close buttons. You can put this script in the WordPress header, and then all pages will show the dialog once the document has finished loading. This has the least impact on your domain and on your content.


I wonder if a JS redirect to some central page that any number of people can use would work too (maybe a 302 redirect would be better). It can display the desired page in an iframe. I don't know what could be on that central page, but it could get a ton of traffic if done right: donations, petitions, a forum, a guest book, chat, politicians' contact details, etc. Perhaps it can be displayed as the requested URL somehow. Is that possible?

The other thing it could do is redirect back to the source site every x visitors, so one could set the ratio: 1/3 visitors gets the redirect. Search bots don't get the redirect.

It'd also make a good point for media coverage, providing some metrics on the effect of the blackout.. so long as people use it.

"xx,xxx internets were censored in the last yy seconds/minutes/hours from zz domains"



