I pinged the indexing team at Google, but this is almost certainly something weird going on with cakecentral.com's webhost:
wget http://cakecentral.com/
--2011-01-14 09:12:50-- http://cakecentral.com/
Resolving cakecentral.com... 174.129.211.41
Connecting to cakecentral.com|174.129.211.41|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2011-01-14 09:13:01 ERROR 404: Not Found.
Once the root page of a site starts returning 404s, we have to start taking guesses about the best way to handle it. Best advice for Cake Central: make sure your root page returns a valid HTML page with a 200 response code.
P.S. If the webhost is trying to do something sneaky, e.g. things work for browsers, but wget or Googlebot is treated differently somehow, the owner of Cake Central can use our free "Fetch as Googlebot" feature in our webmaster console to help diagnose the problem.
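A rough local approximation of that check is to request the same URL with a browser-like User-Agent and a Googlebot-like one and compare the status codes. This is only a sketch, not Google's tool. The function names are hypothetical, and it assumes plain http:// URLs and cloaking keyed on the User-Agent header alone:

```python
import http.client
from urllib.parse import urlsplit

# Any browser-like string will do; the Googlebot string is its published UA.
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0"
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def status_for(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the status code for one GET of `url` (redirects are not followed)."""
    parts = urlsplit(url)  # plain http:// only; query strings are ignored
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        conn.request("GET", parts.path or "/", headers={"User-Agent": user_agent})
        return conn.getresponse().status
    finally:
        conn.close()

def looks_cloaked(url: str) -> bool:
    """True if browser-like and Googlebot-like requests get different status codes."""
    return status_for(url, BROWSER_UA) != status_for(url, GOOGLEBOT_UA)
```

If the two codes differ, the host is treating crawlers differently. The real Googlebot also crawls from Google's own IP ranges, so cloaking keyed on IP address would slip past this check, which is presumably where Fetch as Googlebot earns its keep.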
Summary: not the weirdest search result I've seen by far. Webhosts that serve up 404s, redirects, or duplicate error pages can cause arbitrary things to happen in search engines. Bing doesn't have the URL cakecentral.com indexed at all, for example. Blekko has it, but their copy is from Nov. 11, 2010, so it is probably a couple of months too old to reflect the 404 issue.
If you look at http://174.129.211.41/ without a Host header, you'll see that it's an nginx reverse proxy/cache.
Both beerby and cakecentral are on EC2.
It's pretty likely that an error was made at some point in the cakecentral nginx config that put a beerby EC2 private IP into a load-balancing pool or made it the single back end (either fat-fingered or by retaining an old IP as instances were stopped and started).
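To see why one stale upstream entry produces intermittent weirdness, here is a toy model of round-robin balancing (nginx's default strategy for an upstream pool). The IPs and the pool are made up; the third entry stands in for a private IP that now belongs to an unrelated instance:

```python
from itertools import cycle

# Hypothetical upstream pool for the cakecentral proxy. 10.0.0.99 stands in
# for a stale private IP that now belongs to an unrelated instance (beerby).
UPSTREAM_POOL = ["10.0.0.11", "10.0.0.12", "10.0.0.99"]
STALE = {"10.0.0.99"}

def route(n_requests: int, pool: list[str]) -> list[str]:
    """Assign requests to back ends in plain round-robin order."""
    backends = cycle(pool)
    return [next(backends) for _ in range(n_requests)]

assigned = route(9, UPSTREAM_POOL)
misrouted = sum(ip in STALE for ip in assigned)
print(f"{misrouted} of {len(assigned)} requests went to the wrong site")
# prints: 3 of 9 requests went to the wrong site
```

With one bad entry out of three, roughly a third of a crawler's visits would land on the wrong site; if the stale IP is the single back end, every request does.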
It has (probably) since been corrected, but because cakecentral.com is returning a 404 to robots on its homepage, the most recent good response Google has is from when it was misdirected.
STEP 4: If your results contain a 302 redirect, Google now shows the destination page, not the source page:
> Many months ago, if you saw someresult.com/search2.php?url=mydomain.com, that would sometimes have content from mydomain. That could happen when the someresult.com url was a 302 redirect to mydomain.com and we decided to show a result from someresult.com. Since then, we’ve changed our heuristics to make showing the source url for 302 redirects much more rare. We are moving to a framework for handling redirects in which we will almost always show the destination url.
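In crawler terms, "show the destination URL" means following the Location chain until something other than a redirect comes back. A sketch over a hypothetical in-memory crawl table (an illustration of the idea, not Google's actual logic), with loop and hop-limit checks:

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical crawl table: url -> (status code, Location header or None).
Crawl = Dict[str, Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}

def destination(url: str, crawl: Crawl, max_hops: int = 10) -> str:
    """Follow redirects recorded in `crawl` and return the final URL."""
    seen = set()
    for _ in range(max_hops + 1):
        if url in seen:
            raise ValueError(f"redirect loop at {url}")
        seen.add(url)
        status, location = crawl[url]
        if status not in REDIRECT_CODES or location is None:
            return url  # not a redirect: this is the page to show
        url = urljoin(url, location)  # Location may be relative
    raise ValueError("too many redirects")

crawl: Crawl = {
    "http://someresult.com/search2.php?url=mydomain.com": (302, "http://mydomain.com/"),
    "http://mydomain.com/": (200, None),
}
print(destination("http://someresult.com/search2.php?url=mydomain.com", crawl))
# prints: http://mydomain.com/
```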
It's also interesting that when searching {+cake central} or {cake +central} instead of {cake central}, the first result is the correct one.
I thought that the "+" just disabled spelling, synonym, and similar alterations, but apparently in this query it does some kind of post-filtering for exact matches (since those words do not appear on the page).
+ historically means "Do not give me any pages that don't contain this word." This used to be very important back when search engines were stupid, and basically ranked on the sum total number of occurrences of each term in the search. Searches for multiple terms would frequently be dominated by results that mentioned one of the terms many times.
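A toy illustration of both behaviours (nothing like a modern ranker): score documents by the raw count of query terms, and treat a "+term" as a hard filter. The documents and function names are made up:

```python
from collections import Counter

def naive_score(text: str, terms: list[str]) -> int:
    """Old-style ranking: total occurrences of every query term in the text."""
    counts = Counter(text.lower().split())
    return sum(counts[t.lower()] for t in terms)

def search(docs: dict[str, str], query: str) -> list[str]:
    """Rank docs by naive_score; a '+term' must appear or the doc is dropped."""
    tokens = query.split()
    required = [t[1:].lower() for t in tokens if t.startswith("+") and len(t) > 1]
    terms = [t.lstrip("+") for t in tokens if t.lstrip("+")]
    hits = [name for name, text in docs.items()
            if all(r in text.lower().split() for r in required)]
    return sorted(hits, key=lambda name: naive_score(docs[name], terms), reverse=True)

docs = {
    "spam": "cake cake cake cake cake recipes",
    "real": "cake central forum",
}
print(search(docs, "cake central"))   # ['spam', 'real']: spam wins on raw counts
print(search(docs, "cake +central"))  # ['real']: '+central' filters out spam
```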
I think you're on the right trail, but it's still confusing why this would take the number one ranking over the root domain? Maybe Google thought the entire domain had moved?
I don't know if this is what's going on with cakecentral, but I did something like this inadvertently many years ago.
We needed to have a CGI script handle all hits under a certain location, and for various reasons mod_rewrite wasn't an option. So I put something like this in an .htaccess file:
ErrorDocument 404 /path/to/script.cgi
I didn't realize until later that I needed to explicitly set "Status: 200" in the script's headers. As far as browsers were concerned, everything worked, even in IE, since the "error message" (our page content) was long enough not to trigger its built-in error message.
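For anyone hitting the same trap, a minimal sketch of such a script in Python. The original script isn't shown, so the page content and the `render` helper here are placeholders. The key line is the CGI "Status:" header; Apache also passes the originally requested URL to ErrorDocument handlers in the REDIRECT_URL environment variable:

```python
#!/usr/bin/env python3
"""Sketch of a CGI script used as: ErrorDocument 404 /path/to/script.cgi"""
import html
import os
import sys

def render(path: str) -> str:
    """Build the complete CGI response (headers, blank line, body) for `path`."""
    body = f"<html><body><h1>Content for {html.escape(path)}</h1></body></html>\n"
    headers = [
        # Without an explicit Status line, the client still sees the 404 that
        # triggered the ErrorDocument, even though the body is real content.
        "Status: 200 OK",
        "Content-Type: text/html; charset=utf-8",
    ]
    return "\r\n".join(headers) + "\r\n\r\n" + body

if __name__ == "__main__":
    # Apache sets REDIRECT_URL to the URL that originally produced the 404.
    sys.stdout.write(render(os.environ.get("REDIRECT_URL", "/")))
```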
It is definitely an anomaly. Take a look at the following comparison between the top 10 results for the keyword "cake central". It's worse than the other results in every significant way, yet it sits at #1.
Market Samurai: http://www.marketsamurai.com/c/Antonio (referral link). It's an excellent (and expensive) program for internet marketing and SEO research, available for Mac and Windows (it's made in Flex/Air). If you buy during the trial period, you can get a big discount though. I got my copy for $97.
Try http://360voltage.com and run a Voltmeter report. You have to have an account, but it's free. There are tons of services like this; SEOmoz is probably the best.
Interesting that people, when searching for 'cake central', still click on a link with a title saying 'ERROR: backend server did not respond in time' even though the second result has 'CakeCentral' in bold.
I would guess it was either a server (hosting) accident or a DNS f*ckup that let beerby and cakecentral switch places (in an erroneous state) for a short time, and unfortunately Google picked up the cakecentral home page URL at that moment. It saw it as either a redirect or a direct duplicate of the beerby site and decided to show the older indexed page with the same content (the beerby error page).
Yeah, either this or Google screwed up.
Update: the reason I guess this is that I have seen similar errors when somebody screws up redirects from the home page (making an HTTP 302 redirect from the home page to another page, and then changing that page or the redirect to something else). But this is the first time I have ever seen such an error between two unrelated sites.
I could reproduce this weird behaviour. At first I thought that there might be some pages linking to the error page with the anchor text in question; this is also what the cache page claims. There are also many scraper and auto-generated spam sites with broken links that never really show up in the Google index.
Similar cases have happened before. Google runs a forum for webmasters where you can tell Google about problems with your website:
This is why you should always configure your web server to serve an HTTP 503 "Service Unavailable" when your backend is not online or not fast enough. This tells Googlebot to come back later rather than index the result.
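Sketched from the proxy's side (a toy, not nginx's implementation): if the backend refuses the connection, drops it, or times out, answer 503 with a Retry-After header instead of a 200 error page. The function name, its host/port arguments, and the 120-second Retry-After are arbitrary choices for illustration:

```python
import http.client

def proxy_get(host: str, port: int, path: str = "/", timeout: float = 5.0,
              retry_after: int = 120) -> tuple[int, dict[str, str], bytes]:
    """Fetch `path` from the backend; map backend failures to 503 + Retry-After."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        headers = {"Content-Type": resp.getheader("Content-Type", "text/html")}
        return resp.status, headers, resp.read()
    except (OSError, http.client.HTTPException):
        # Refused, reset, timed out (socket timeouts are OSErrors), or a
        # garbled response: tell crawlers to come back later instead of
        # letting them index an error page.
        headers = {"Retry-After": str(retry_after), "Content-Type": "text/plain"}
        return 503, headers, b"Backend unavailable, please retry later.\n"
    finally:
        conn.close()
```

Passing the backend's own status through (including a genuine 404) and only synthesizing the 503 on failure keeps real errors distinguishable from temporary outages.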
Could this be some new slang term that is not quite popular yet? "Cake central" = drinking lots of beer. "The other day I got caked at that bar, it was cake central down there"
Yes, there seems to be something else wrong with the Google index, as some pages from the cakecentral.com domain show up with content from beerby; someone in this thread has noted this, and I was just able to reproduce it.
On the other hand, we can never be sure which pages, if any, link to our pages with a certain anchor text, or where on the web they are. The link: operator has been broken for a long time and shows only a small subset of the pages linking to the page in question, if anything at all.
A more complete list of links can be found in Google Webmaster Tools, but it is also never 100% complete or up to date. We can also use Site Explorer to hunt for a specific link:
Site Explorer is utterly useless, and Google's link: operator is crippled. But a link bomb needs quite a lot of links with exactly matching link text, and a simple search for ["cake central" beerby] turns up nothing (nor do other queries using the link: and inanchor: operators). So it can be assumed with relative safety that this is not a link bomb (with a link bomb you always find some of the links).
Or let's phrase it like this:
There is an absence of evidence that it was a link bomb.
I'm pretty sure the web crawl Google does to figure out your rankings is separate from the one that saves the cached version and probably that snippet.
That said, I have no idea why that page would rank on those terms, error or not.
Doesn't this just reflect the dirty little secret that Google doesn't really have to get any particular details right, just mostly right most of the time?
Another interesting thing to notice is that Google Instant comes up with the right result (cakecentral.com). Only when I press Enter (or the search button) do I get the beerby.com result.
Google Instant does, however, claim that it's showing results for "cake central magazine" and offers to search instead for "cake central".
EDIT: even better - searching for cakecentral.com also leads to the same error page on beerby.com
This likely happens when a page was the top result but then got re-spidered. Google probably keeps the old rank for a while, even though the content has changed.
In popular French slang, "cake traces" refers to brown marks in underpants. I guess the French expression "cake face" is a later derivation from it. So I'm trying to guess what "cake central" might mean...
STEP 1: Requesting CakeCentral.com returns a 404 "Not Found" HTTP status code:
1. Go to http://www.rexswain.com/httpview.html and enter http://cakecentral.com/
2. Look at the response codes; note the 404.
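The same check can be scripted. A small sketch that issues one GET without following redirects, roughly what the HTTP viewer shows (plain http:// only; the `http_view` name is made up):

```python
import http.client
from urllib.parse import urlsplit

def http_view(url: str, timeout: float = 10.0) -> tuple[int, str, dict[str, str]]:
    """Issue one GET (no redirect following) and return status, reason, headers."""
    parts = urlsplit(url)  # plain http:// only; query strings are ignored
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        conn.request("GET", parts.path or "/")
        resp = conn.getresponse()
        return resp.status, resp.reason, dict(resp.getheaders())
    finally:
        conn.close()
```

Pointed at http://cakecentral.com/ at the time of this thread, it should have reported the same 404 Not Found.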
STEP 2: Previously, and inexplicably, _actual_ error pages on CakeCentral.com, such as those listed at http://www.google.com/search?q=site:www.cakecentral.com%2Fca..., returned 302 redirects to Beerby.com.
STEP 3: Beerby.com uses a "Soft" error page, meaning if you type in a URL like: http://www.beerby.com/adfadi you get a 302 TEMPORARY redirect to a 200 OK page.
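A soft error page can be detected by requesting a random path that almost certainly doesn't exist and following redirects by hand: a site with proper error handling ends on a 404, while a soft-404 site ends on a 200. A sketch (plain http:// only; `final_status` and `is_soft_404` are hypothetical names):

```python
import http.client
import uuid
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}

def final_status(url: str, max_hops: int = 5, timeout: float = 10.0) -> tuple[int, str]:
    """Follow redirects by hand; return (final status code, final URL)."""
    for _ in range(max_hops + 1):
        parts = urlsplit(url)  # plain http:// only; query strings are ignored
        conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
        try:
            conn.request("GET", parts.path or "/")
            resp = conn.getresponse()
            status, location = resp.status, resp.getheader("Location")
        finally:
            conn.close()
        if status not in REDIRECT_CODES or not location:
            return status, url
        url = urljoin(url, location)  # Location may be relative
    raise RuntimeError(f"more than {max_hops} redirects")

def is_soft_404(site: str) -> bool:
    """Probe a random path that should not exist; a final 200 means a soft 404."""
    probe = urljoin(site, "/" + uuid.uuid4().hex)
    status, _ = final_status(probe)
    return status == 200
```

For a correctly configured site `is_soft_404` should return False; a site behaving as described above (302 to a 200 page) would return True.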